EEP: 7 Title: Foreign Function Interface (FFI) Version: $Revision: 19 $ Last-Modified: $Date: 2007-09-12 16:06:29 +0200 (mer, 12 set 2007) $ Author: Alceste Scalas Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 3-Sep-2007 Erlang-Version: R12B Post-History: Abstract ======== This EEP describes a Foreign Function Interface (FFI) for Erlang/OTP, that allows to easily perform direct calls of external C functions. It introduces three new BIFs (`ffi:raw_call/3`_, `erl_ddll:load_library/3`_ and `ffi:raw_call/2`_) that accomplish the main FFI tasks: loading generic C libraries, making external function calls and performing automatic Erlang-to-C and C-to-Erlang type conversions. It also introduces two auxiliary BIFs for converting C buffers/strings into binaries (`ffi:raw_buffer_to_binary/2`_ and `ffi:raw_cstring_to_binary/1`_), a new ``ffi`` Erlang module that provides a higher-level API with stricter type checking, and some utility macros. Finally, it extends erl_ddll:info/2 with FFI information. Motivation ========== The current Erlang extension mechanisms can be divided in two main categories: 1. absolute stability at the price of speed (C nodes, pipe drivers); 2. more speed at the (potential) price of stability (linked-in drivers). Linked-in drivers have thus become the standard way for creating library bindings when efficiency is an issue. In both cases, however, the Erlang driver interface implies the development of relevant amounts of glue code, mostly because the communication between Erlang and C always requires data parsing and (de)serialization. Several tools have been created in order to autogenerate (at least part of) that glue: from the (now unmaintained) IG driver generation tool [1]_ to the newer Erlang Driver Toolkit (EDTK) [2]_ and Dryverl [3]_. But, even with the help of these tools, developing an Erlang driver is a difficult and time-consuming task (especially when interfacing external libraries with tens or hundreds of functions), and the glue code itself increases the possibility to introduce bugs --- that, in the case of linked-in drivers, usually mean VM crashes. For all these reasons, the lack of libraries and the difficulty of interfacing them from other languages is one of the negative aspects that are usually associated with Erlang. The same problems also arise when a developer needs to replace performance-critical portions of his/her Erlang code with optimized C functions. In this case, also the data serialization/deserialization overhead may be a significant issue. An easier method for interfacing Erlang and C code could drastically extend the Erlang capabilities and open new usage scenarios. Rationale ========= This EEP proposes a Foreign Function Interface (FFI) extension that would allow to easily perform direct C function calls. This concept is implemented in almost every language, with two main (non-exclusive) approaches: 1. automatic type conversions between the host and the foreign language (examples: Python [7]_, Haskell [8]_); 2. documented C interface for handling host language types from the foreign language (examples: Java [9]_, Python [10]_ [11]_). This EEP follows the first approach, but (when possible) also reuses part of the existing C Driver API (and, thus, allows to manage ``ErlDrvBinary`` and ``ErlIOVec`` pointers in the external C functions). The FFI has been designed *not* to require language changes or introduce incompatibilities. The BIFs and functions proposed in this EEP don't give any access to the Erlang VM internals --- but the called C functions could leak memory and/or cause the Erlang VM to crash. The FFI is, thus, not intended for "casual" Erlang developers: this is a tool designed for library bindings developers (that should take care of hiding FFI calls from final users), and advanced programmers looking for an easy (and efficient) way to call C code from Erlang. Overview ======== In order to call a C function, the FFI needs a port opened towards the required C code. Thus, with the current driver loading mechanism, a developer would be required to: 1. create a C file with a void ``ErlDrvEntry`` structure and driver init function; 2. compile it and possibly link it against the required C libraries, thus obtaining a void Erlang driver; 3. load the driver in the Erlang VM, by using erl_ddll:load/2. In order to simplify this procedure, this EEP proposes the `erl_ddll:load_library/3`_ function, that allows to load a generic library in the Erlang VM --- even if it lacks the structure of an Erlang linked-in driver. erl_ddll:load_library/3 also offers an option to preload a list of C function symbols and signatures, thus precompiling the internal structures needed for performing dynamic function calls. Information about preloaded data can be retrieved with `erl_ddll:info/2`_. Once a library or driver has been loaded, erlang:open_port/2 or `erlang:open_port/1`_ could be used to get a port for the FFI functions, and perform calls either through the low-level or the high-level APIs. Low-level API ------------- The low-level FFI methods are denoted by the ``raw_`` prefix. The main function is the `ffi:raw_call/3`_ BIF, that performs a direct C function call through an open port. It converts C types to/from Erlang types. When taken alone, ffi:raw_call/3 has got a major drawback: it introduces great call overhead, due to the C symbol lookup and the dynamic construction of the function call. In order to exploit preloading option of erl_ddll:load_library/3, the `ffi:raw_call/2`_ BIF is introduced: it avoids symbol lookup and call structure compilation, thus guaranteeing a lower call overhead than ffi:raw_call/3. Furthermore, the low-level interface provides two BIFs for creating an Erlang binary from a C pointer (possibly returned by a FFI call). These BIFs are `ffi:raw_buffer_to_binary/2`_ and `ffi:raw_cstring_to_binary/1`_. High-level API -------------- The high-level interface is built upon the low-level one. It introduces the concept of type-tagged values: any value passed to or returned from FFI calls has the form of a `{Type, Value}` tuple. This allows to: 1. increase the readability of FFI calls; 2. make the C calls safer: the consistency of tagged values is checked before the values themselves are passed to the low-level API. Furthermore, the preload information given to erl_ddll:load_library/3 is used (when available) to ensure that the tagged values actually match the function signature; 3. simulate the static typing of C code, thus requiring proper and explicit "casts" when a tagged value needs to be converted to another type. These checks are performed by `ffi:call/3`_, `ffi:buffer_to_binary/2`_ and `ffi:cstring_to_binary/1`_ (the type-tagged equivalents of the low-level BIFs). Type-tagged values can also be checked with `ffi:check/1`_. Furthermore, the allowed minimum and maximum value of each FFI type can be examined with `ffi:min/1`_ and `ffi:max/1`_. Utility macros -------------- The FFI defines a series of utility macros in the `ffi_hardcodes.hrl`_ header file, that could be used for binary matching of C buffers and structures. Specifications ============== Types ----- c_func_name() ''''''''''''' `c_func_name() = atom() | string()` Name of a C function. type_tag() '''''''''' `type_tag() = atom()` Valid FFI type atom. For the list of allowed values, see the Appendix_. tagged_value() '''''''''''''' `tagged_value() = tuple(type_tag(), term())` Type-tagged value used for FFI calls. tagged_func_name() '''''''''''''''''' `tagged_func_name() = tuple(type_tag(), c_func_name())` C function name with return type. func_index() '''''''''''' `func_index() = integer()` Function position on the list of preloads given to `erl_ddll:load_library/3`_. tagged_func_index() ''''''''''''''''''' `tagged_func_index() = tuple(type_tag(), func_index())` C function index with return type. signature() ''''''''''' `signature() = tuple(type_tag(), ...)` Signature of a C function: return type followed by arguments types (if any). erl_ddll:load_library/3 ----------------------- :: erl_ddll:load_library(Path, Name, OptionsList) -> ok | {error, ErrorDesc} Types: - Path = Name = string() | atom() - OptionList = [Option] - Option = tuple(preload, [Preload]) - Preload = tuple(`c_func_name()`_, `signature()`_) Load a generic shared library. If an ``ErlDrvEntry`` structure and a driver init function are found when loading the library, this BIF will behave like erl_ddll:load/2. The function parameters are also the same of erl_ddll:load/2, with the following addition: **OptionList** A list of options for library/driver loading. The supported options are: **{preload, PreloadList}** Preload the given list of functions, and prepare their call structures. Each PreloadList element is a tuple in the form: tuple(`c_func_name()`_, `signature()`_) i.e. the function name followed by its return and arguments types. The function return values are the same of erl_ddll:load/2. Once a library has been loaded, it is possible to use erlang:open_port/2 to get a port. That port could *always* be used with `ffi:call/3`_, `ffi:raw_call/3`_ or `ffi:raw_call/2`_. However, if the loaded library does *not* contain a proper ``ErlDrvEntry`` structure and a driver init function, the port will **not** be usable with erlang:port_command/2, erlang:port_control/3 etc. The following example loads the C standard library and preloads some functions: :: ok = erl_ddll:load_library("/lib", libc, [{preload, [{puts, {sint, nonnull}}, {putchar, {sint, sint}}, {malloc, {nonnull, size_t}}, {free, {void, nonnull}}]}]). erl_ddll:load_library/2 ----------------------- :: erl_ddll:load_library(Path, Name) Utility function that calls `erl_ddll:load_library/3`_ with an empty OptionsList. erlang:open_port/1 ------------------ :: erlang:open_port(Library) Types: - Library = string() | atom() Open a port towards the specified shared library, possibly loaded with `erl_ddll:load_library/3`_. Calling this function is equivalent to: :: erlang:open_port({spawn, Library}, [binary]) erl_ddll:info/2 --------------- This EEP proposes a new parameter for the erl_ddll:info/2 BIF: the 'preloads' atom. It allows to retrieve information about FFI preloads for the given library. The preload information is a list of proplists, one for each preloaded function. Each proplist, in turn, has the following format: :: [ { index, integer() }, % Position in the preload list { name, string() }, % Function name { address, integer() }, % Function address { signature, signature() } ] % Function signature This information would be made available also through erl_ddll:info/0 and erl_ddll:info/1. ffi:raw_call/3 -------------- :: ffi:raw_call(Port, CallArgs, Signature) -> term() Types: - Port = port() - CallArgs = tuple(`c_func_name()`_, Arg1, ...) - Arg1, ... = term() - Signature = `signature()`_ Call the specified C function. This BIF accepts the following parameters: **Port** A port opened towards the required driver/library. **CallArgs** A tuple with the function name (atom or string) followed by its arguments (if any). **Signature** Function signature This BIF returns the return value of the C function being called (or 'void' if the return type is void). It automatically converts Erlang terms to/from C values. The supported C types and conversions are reported in the Appendix_. The following example calls the malloc() and free() functions from the standard C library (it should work with any Erlang linked-in driver): :: Pointer = ffi:raw_call(Port, {malloc, 1024}, {pointer, size_t}), ok = ffi:raw_call(Port, {free, Pointer}, {void, pointer}). **WARNING:** bugs and/or misuses of the external C functions can affect the Erlang VM, possibly making it crash. Use this BIF with extreme care. ffi:raw_call/2 -------------- :: ffi:raw_call(Port, OptimizedCall) -> term() Types: - Port = port() - OptimizedCall = {FuncIndex, Arg1, ...} - FuncIndex = func_index() - Arg1, ... = term() Call a function preloaded with the 'preload' option of `erl_ddll:load_library/3`_. This BIF accepts the following parameters: **Port** A port opened towards the required driver/library (that **must** have been loaded with `erl_ddll:load_library/3`_). **OptimizedCall** A tuple with the function index (i.e. its position in the preload list) followed by its arguments (if any). This BIF returns the return value of the C function being called (or 'void' if the return type is void). It automatically converts Erlang terms to/from C values. The supported C types and conversions are reported in the Appendix_. The following example calls malloc() and free(), after they have been preloaded with the code sample shown in `erl_ddll:load_library/3`_: :: Port = open_port({spawn, "libc"}, [binary]), Pointer = ffi:raw_call(Port, {3, 1024}), ffi:raw_call(Port, {4, Pointer}) **WARNING:** bugs and/or misuses of the external C functions can affect the Erlang VM, possibly making it crash. Use this BIF with extreme care. ffi:raw_buffer_to_binary/2 -------------------------- :: ffi:raw_buffer_to_binary(Pointer, Size) -> binary() Types: - Pointer = integer() - Size = integer() Return a binary with a copy of Size bytes read from the given C pointer (represented by an integer, possibly returned by a FFI call). **WARNING:** passing the wrong pointer to this BIF may cause the Erlang VM to crash. Use with extreme care. ffi:raw_cstring_to_binary/1 --------------------------- :: ffi:raw_cstring_to_binary(CString) -> binary() Types: - CString = integer() Return a binary with a copy of the given NULL-terminated C string (an integer representing a pointer, possibly returned by a FFI call). The binary will include the trailing 0. **WARNING:** passing a wrong pointer to this BIF may cause the Erlang VM to crash. Use with extreme care. ffi:call/3 ---------- :: call(Port, CFunc, Args) -> RetVal Types: - Port = port() - CFunc = `c_func_name()`_ | `func_index()`_ | `tagged_func_name()`_ | `tagged_func_index()`_ - Args = [`tagged_value()`_] - RetVal = `tagged_value()`_ Call the C function ``CFunc`` with the given list of arguments, using the port ``Port``. If the function was preloaded with ffi:load_library/3, all the type tags will be matched against the preloaded signature before performing the call. Return the return value of the C function, with the proper type tag. **Note:** if ``CFunc`` is not of type `tagged_func_name()`_, the C function will be called if and only if it was preloaded with `erl_ddll:load_library/3`_ (it is required in order to determine its return type). As an example, the following ``malloc()`` calls are all valid and equivalent when executed after the code sample shown in `erl_ddll:load_library/3`_: :: %% Use function name, but require preloads for return type {nonnull, Ptr1} = ffi:call(Port, "malloc", [{size_t, 1024}]), {nonnull, Ptr2} = ffi:call(Port, malloc, [{size_t, 1024}]), %% Use function index from preloads list {nonnull, Ptr3} = ffi:call(Port, 3, [{size_t, 1024}]), {nonnull, Ptr4} = ffi:call(Port, {nonnull, 3}, [{size_t, 1024}]), %% These calls do not require any preload information {nonnull, Ptr5} = ffi:call(Port, {nonnull, "malloc"}, [{size_t, 1024}]), {nonnull, Ptr6} = ffi:call(Port, {nonnull, malloc}, [{size_t, 1024}]), **WARNING:** bugs and/or misuses of the external C functions can affect the Erlang VM, possibly making it crash. Use this BIF with extreme care. ffi:buffer_to_binary/2 ---------------------- :: ffi:buffer_to_binary(TaggedNonNull, Size) -> binary() Types: - TaggedNonNull = tuple(nonnull, integer()) - Size: integer() Return a binary with a copy of ``Size`` bytes read from the given C pointer. **WARNING:** passing a wrong pointer to this function may cause the Erlang VM to crash. Use with extreme care. ffi:cstring_to_binary/1 ----------------------- :: ffi:cstring_to_binary(TaggedCString) -> binary() Types: - TaggedCString = tuple(cstring, integer()) Return a binary with a copy of the given NULL-terminated C string. **WARNING:** passing a wrong pointer to this function may cause the Erlang VM to crash. Use with extreme care. ffi:sizeof/1 ------------ :: ffi:sizeof(TypeTag) -> integer() Types: - TypeTag: `type_tag()`_ Return the size (in bytes) of the given FFI type, on the current platform. ffi:check/1 ----------- :: ffi:check(TaggedValue) -> true | false Types: - TaggedValue = `tagged_value()`_ Returns 'true' if the given type-tagged value is well-formed and consistent (i.e. it falls in the allowed range for its type, on the current platform). Otherwise, returns 'false'. ffi:min/1 --------- :: ffi:min(TypeTag) -> integer() Types: - TypeTag = `type_tag()`_ Return the minimum value allowed for the given FFI type, on the current platform. ffi:max/1 --------- :: ffi:max(TypeTag) -> integer() Types: - TypeTag = `type_tag()`_ Return the maximum value allowed for the given FFI type, on the current platform. ffi_hardcodes.hrl ----------------- The `ffi_hardcodes.hrl` file is part of the Erlang ffi library. It defines a set of macros for handling FFI types sizes, and for easy binary matching on C buffers and structures: **FFI_HARDCODED_** An Erlang bit-syntax snippet (Size/TypeSpecifier) that could be used to match the given FFI type inside a binary (possibly obtained from a C buffer). For example, the following binary matching :: <> = Binary on x86-64 will expand to: :: <> = Binary **FFI_HARDCODED_SIZEOF_** The type size in *bytes* **FFI_HARDCODED__BITS** The type size in *bits* As implied by their name, the ``ffi_hardcodes.hrl`` contents are *specific to the build platform*, and when they are used, they will be hard-coded in the resulting ``.beam`` files. Thus, these macros should be avoided if a developer expects his/her FFI-based code to be *portable without recompilation*. The recommended method for getting FFI type sizes in a portable way is the `ffi:sizeof/1`_ function. Further notes ============= Notes on FFI preloading ----------------------- When a library is loaded with `erl_ddll:load_library/3`_, it may be reloaded or unloaded just like any Erlang linked-in driver. If the 'preload' option is used, then two additional behaviors arise: - if `erl_ddll:load_library/3`_ is called two or more times with the same library, then the associated preload list must be rebuilt according to the last call. If no 'preload' option is used, then the last preloads (if any) must be kept intact; - if an erl_ddll:reload/2 is issued, then the last preloads must be refreshed by performing a new symbol lookup in the loaded library. If one or more symbols could not be found anymore, then they must be disabled (and an error must raised when trying to use them with `ffi:raw_call/2`_). Notes on vararg functions ------------------------- `ffi:call/3`_ and `ffi:raw_call/3`_ may be used to call vararg C functions, simply by providing the desired number of arguments. In order to exploit the preloading optimizations, however, it is necessary to use a different preload for each different function call signature. For example, if a developer is going to call ``printf()`` with different arguments, he/she will need to use a preloading list like the following one: :: ok = erl_ddll:load_library("/lib", libc, [{preload, [{printf, {sint, cstring}}, {printf, {sint, cstring, double}}, {printf, {sint, cstring, uint, sint}}, {printf, {sint, cstring, cstring}}]}]). Notes on C pointers and Erlang binaries --------------------------------------- As reported in the Appendix_, an Erlang binary can be passed to a C function as a 'pointer' value. In this case, the C function will receive a pointer to the first byte of binary data. That pointer will be valid *only* until the C function returns. If the C side needs to access the pointer data later, then it should use the 'binary' FFI type (see next paragraph) or copy the data itself in a safe place. Notes on Erlang binaries and reference counting ----------------------------------------------- As reported in the Appendix_, when the 'binary' FFI type is used as argument, the C function will also receive a binary (in the form of an ``ErlDrvBinary`` pointer). Correspondingly, a C function with 'binary' FFI return type must return an ``ErlDrvBinary`` pointer. Furthermore, an 'erliovec' argument type will cause the conversion of an Erlang ``iolist()`` into an ``ErlIOVec`` (and its pointer will be passed to the C function). There are three rules for properly handling the refcount of binaries passed to, or returned from, the C side through a FFI call. 1. when a binary is received as argument (either directly, or inside an ``ErlIOVec``), and the C side needs to keep a reference, then the refcount must be increased; 2. when a binary is created with ``driver_alloc_binary()``, it will have the refcount value of 1. It is considered to be *still* referenced by the C side; 3. as a consequence of the previous point, if the C side wants to return a newly-crated binary *without* keeping references, it must call ``driver_binary_dec_refc()`` before returning. Notes on type-tagged values --------------------------- As reported above, the high-level FFI API is based on type-tagged values. Type tags, however, may introduce yet another way to annotate/represent the types of Erlang function parameters --- and it may become an annoying redundancy, expecially now that type contracts are (probably) going to be introduced in Erlang [12]_. Thus, the high-level FFI API should be considered highly experimental and subject to change, depending on how type contracts will allow to achieve the same tasks (see `High-level API`_). This issue will need to be explored if/when contracts will be available in the standard Erlang/OTP distribution. Backwards Compatibility ======================= This EEP, and the proposed FFI patches (see below), do not introduce incompatibilities with the standard OTP release. However, three (possibly) relevant internal changes are required: 1. the ``driver_binary_dec_refc()`` function must be allowed to reach the refcount of 0 without errors or warnings (even when debugging). This is necessary in order to allow a C function to create a binary, drop its references and return it to the Erlang VM (see `Notes on Erlang binaries and reference counting`_); 2. as a consequence of the previous point, ``driver_binary_inc_refc()`` must be allowed to reach a minimum refcount of 1 without errors or warnings (the current minimum value is 2); 3. the ``iolist()`` -> ``ErlIOVec`` conversion code in ``io.c`` needs to be exposed as a stand-alone function, to be used by the FFI. Reference implementation ======================== An implementation of this EEP is available on [4]_ as a set of patches against OTP R11B-5. The code is based on the GCC FFI library (libffi) [5]_. libffi is multi-platform, can be packaged and used separately from the GCC source code, and is released under a very permissive license [6]_ (compatible with the Erlang Public License). It has been used to implement the FFI interface of several applications and languages, including Python [7]_. The current EEP implementation looks for libffi on the build system, and links the Erlang emulator against it (preferring the libffi shared library, when available). It may be a "good enough" approach, since libffi is usually pre-packaged and easily available on GNU/Linux, BSD and Solaris distributions. However, this approach may create troubles for developers that compile everything from scratch, could not install a precompiled libffi package, or just want to force static linking between the Erlang emulator and libffi. In order to address these issues, it is customary that a copy of libffi is distributed together with the host language, and possibly kept in sync with the upstream version. This is what Python actually does, and Erlang/OTP could possibly adopt the same approach depending on the developers' feedback. References ========== .. [1] IG: the Erlang Interface Generator, Törnquist and Lundell http://www1.erlang.org/documentation/doc-4.8.2/lib/ig-1.8/doc/index.html .. [2] The Evolution of Erlang Drivers and the Erlang Driver Toolkit, Fritchie http://www.erlang.se/workshop/2002/Fritchie.pdf .. [3] The Dryverl Erlang/C binding compiler http://dryverl.objectweb.org/ .. [4] Foreign Function Interface (FFI) for Erlang/OTP http://muvara.org/crs4/erlang/ffi .. [5] libffi: the GCC Foreign Function Interface Library http://gcc.gnu.org/viewcvs/trunk/libffi/ .. [6] The libffi license `http://gcc.gnu.org/viewcvs/*checkout*/trunk/libffi/LICENSE `_ .. [7] The CPython package http://python.net/crew/theller/ctypes/ .. [8] The Haskell 98 Foreign Function Interface http://www.cse.unsw.edu.au/~chak/haskell/ffi/ .. [9] The Java Native Interface http://java.sun.com/j2se/1.5.0/docs/guide/jni/ .. [10] Extending and Embedding the Python Interpreter http://docs.python.org/ext/ext.html .. [11] Python/C API Reference Manual http://docs.python.org/api/api.html .. [12] A Language for Specifying Type Contracts in Erlang and its Interaction with Success Typings, Jiménez Lindahl and Sagonas (Presented at the 2007 SIGPLAN Erlang Workshop). Paper: http://user.it.uu.se/~kostis/Papers/contracts.pdf Slides: http://www.erlang.se/workshop/2007/proceedings/03lindah.pdf Appendix ======== Erlang-to-C automatic type conversions -------------------------------------- The following table reports the Erlang-to-C conversions, used for passing Erlang terms as C function call arguments. ====================== =============================== C argument type Supported Erlang types ====================== =============================== uchar integer() schar integer() ushort integer() sshort integer() uint integer() sint integer() ulong integer() slong integer() uint8 integer() sint8 integer() uint16 integer() sint16 integer() uint32 integer() sint32 integer() uint64 integer() sint64 integer() float float() double float() longdouble float() pointer binary() | integer() cstring binary() | integer() nonnull binary() | integer() size_t integer() ssize_t integer() pid_t integer() off_t integer() binary binary() erliovec iolist() ====================== =============================== C-to-Erlang automatic type conversions -------------------------------------- The following table reports the C-to-Erlang conversions, used for converting C function return values into Erlang terms. ====================== =============================== C return type Resulting Erlang type ====================== =============================== uchar integer() schar integer() ushort integer() sshort integer() uint integer() sint integer() ulong integer() slong integer() uint8 integer() sint8 integer() uint16 integer() sint16 integer() uint32 integer() sint32 integer() uint64 integer() sint64 integer() float float() double float() longdouble float() pointer integer() cstring integer() nonnull integer() size_t integer() ssize_t integer() off_t integer() pid_t integer() binary binary() ====================== =============================== Copyright ========= Copyright (C) 2007 by CRS4 (Center for Advanced Studies, Research and Development in Sardinia) - http://www.crs4.it/ Author: Alceste Scalas This EEP is released under the terms of the Creative Commons Attribution 3.0 License. See http://creativecommons.org/licenses/by/3.0/ .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: