Binder Data Model

Baiqin Wang
Published in The Startup · Feb 4, 2021


Simply put, the only thing you use Binder for is transferring data between processes. During a Binder transaction, the source process packs all of the data to be transferred into a serializable form and hands it over to the Binder driver. When the driver handles the transaction, the serialized data is copied into the target process's memory space. The target process then deserializes the data into various objects. So in a typical Binder transaction the data is copied three times: once during serialization, once by the driver, and once during deserialization.

Parcel

The serialization and deserialization of data happen in native user space. In Binder's terminology the serialization process is called parceling, and it is modeled by the Parcel class in both Java and C++. The Parcel facilities are used only in user space; they are a concept introduced by the Android framework. Before data is sent to the Binder driver, a Parcel needs to be converted into a form the driver understands, but that conversion is simple.

You can roughly think of a Parcel as a contiguous area of user-space memory plus a bunch of helper methods that flatten all kinds of data into it. The Java version of Parcel doesn't have much logic of its own; most of the work is delegated to its C++ peer. In this article we will start from the Java version. An instance of the Java Parcel is created before a Binder transaction call. Let's take the requestOrder interface method in IHamKingInterface as an example [1]:
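Here is a sketch of what the AIDL compiler typically generates for such a proxy method; the exact parameter list of requestOrder in HamKing is an assumption:

```java
// Sketch of an AIDL-generated proxy method; the OrderInfo parameter and
// the transaction code name are assumptions, not the actual HamKing code.
@Override
public void requestOrder(OrderInfo order) throws android.os.RemoteException {
    android.os.Parcel _data = android.os.Parcel.obtain();
    android.os.Parcel _reply = android.os.Parcel.obtain();
    try {
        _data.writeInterfaceToken(DESCRIPTOR);
        if (order != null) {
            _data.writeInt(1);
            order.writeToParcel(_data, 0);
        } else {
            _data.writeInt(0);
        }
        // mRemote is the proxy Binder; this call crosses the process boundary.
        mRemote.transact(Stub.TRANSACTION_requestOrder, _data, _reply, 0);
        _reply.readException();
    } finally {
        _reply.recycle();
        _data.recycle();
    }
}
```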

The majority of Parcel instances are created to facilitate a Binder transaction and are released immediately afterwards. In the example above, _data is used to serialize the call parameters and _reply is used to receive the reply from the target process. A Parcel is created with the obtain method:
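A slightly abridged version of obtain from AOSP's android.os.Parcel (the details vary across releases):

```java
// Abridged from android.os.Parcel.
public static Parcel obtain() {
    final Parcel[] pool = sOwnedPool;
    synchronized (pool) {
        Parcel p;
        for (int i = 0; i < POOL_SIZE; i++) {
            p = pool[i];
            if (p != null) {
                pool[i] = null;   // take a pooled instance if one is available
                return p;
            }
        }
    }
    // Pool exhausted: create a fresh Parcel, which also creates a native peer.
    return new Parcel(0);
}
```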

The obtain method implements a typical object pool pattern. Parcel objects are acquired and released very often, so the pool greatly reduces memory allocations and garbage collection. mNativePtr holds the memory address of the native peer and mOwnsNativeParcelObject tells whether the Java Parcel owns that native Parcel. During a Binder transaction, the data to be transferred moves in this flow: Java proxy Binder -> C++ proxy Binder -> Binder driver -> C++ local Binder -> Java local Binder. So on the proxy side a Java Parcel creates and owns its native peer, while on the local Binder side a Java Parcel is initialized from an already created native peer that it wraps but does not own. We can see from the source code of Binder that execTransact is called by native code; dataObj and replyObj are the memory addresses of already created native peers:
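An abridged sketch of execTransact from android.os.Binder (exception handling simplified):

```java
// Abridged from android.os.Binder; called from native code with the
// addresses of native Parcels already created for this transaction.
private boolean execTransact(int code, long dataObj, long replyObj, int flags) {
    Parcel data = Parcel.obtain(dataObj);   // wraps, but does not own, the native peer
    Parcel reply = Parcel.obtain(replyObj);
    boolean res;
    try {
        // Dispatch to the local Binder's business logic.
        res = onTransact(code, data, reply, flags);
    } catch (RemoteException | RuntimeException e) {
        reply.setDataPosition(0);
        reply.writeException(e);
        res = true;
    }
    reply.recycle();
    data.recycle();
    return res;
}
```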

The nativeCreate method just creates a native Parcel object and returns its address to Java code. Let's have a look at the shape of the C++ Parcel class:
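An abridged view of the class, based on Parcel.h in frameworks/native; most members and methods are omitted:

```cpp
// Abridged from frameworks/native/libs/binder/include/binder/Parcel.h.
class Parcel {
    // ... a long list of write*/read* methods ...
private:
    uint8_t*        mData;            // contiguous buffer holding the flattened payload
    size_t          mDataSize;        // bytes of valid payload currently in mData
    size_t          mDataCapacity;    // bytes allocated for mData
    mutable size_t  mDataPos;         // cursor where the next read/write happens

    binder_size_t*  mObjects;         // offsets of flattened live objects inside mData
    size_t          mObjectsSize;     // number of offsets stored
    size_t          mObjectsCapacity; // slots allocated for mObjects
};
```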

mData points to a contiguous area of memory allocated for this Parcel and mDataCapacity is the size of that allocation. mDataSize is the number of bytes that actually contain payload data; of course mDataSize <= mDataCapacity. mDataPos is the position in the buffer where the next read or write happens.

There is a long list of write* and read* methods that help users write and read different kinds of data in a Parcel. We cannot cover all of them, so I will pick several representative ones. All data types fall into two categories: primitives and live objects. Primitives are raw data types that are transferred as-is; integers, strings and booleans are all primitive types. Live objects carry additional information beyond their bit value, so the Binder driver needs to do extra processing on them. The types of live objects are as follows:
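These type tags are defined in the Binder UAPI header, roughly as follows:

```c
/* From include/uapi/linux/android/binder.h; each tag packs four characters. */
enum {
    BINDER_TYPE_BINDER      = B_PACK_CHARS('s', 'b', '*', B_TYPE_LARGE),
    BINDER_TYPE_WEAK_BINDER = B_PACK_CHARS('w', 'b', '*', B_TYPE_LARGE),
    BINDER_TYPE_HANDLE      = B_PACK_CHARS('s', 'h', '*', B_TYPE_LARGE),
    BINDER_TYPE_WEAK_HANDLE = B_PACK_CHARS('w', 'h', '*', B_TYPE_LARGE),
    BINDER_TYPE_FD          = B_PACK_CHARS('f', 'd', '*', B_TYPE_LARGE),
    BINDER_TYPE_FDA         = B_PACK_CHARS('f', 'd', 'a', B_TYPE_LARGE),
    BINDER_TYPE_PTR         = B_PACK_CHARS('p', 't', '*', B_TYPE_LARGE),
};
```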

BINDER_TYPE_BINDER and BINDER_TYPE_WEAK_BINDER are native Binder objects; BINDER_TYPE_HANDLE and BINDER_TYPE_WEAK_HANDLE are proxy Binder objects. The term "weak" means the holder of the object should use a weak reference to it. BINDER_TYPE_FD and BINDER_TYPE_FDA are a single file descriptor and a file descriptor array, respectively. BINDER_TYPE_PTR means the object is a pointer to another data buffer; it is only used by the scatter-gather transactions introduced in Android Oreo.

All types of data are packed into the same memory buffer pointed to by mData, so the Binder driver needs to know the locations of the live objects; otherwise the data would just appear to be raw bits to the driver. mObjects is a separate buffer containing the offsets of the live objects within mData. mObjectsCapacity is the allocated size of that buffer and mObjectsSize is the number of offsets in it; of course mObjectsSize <= mObjectsCapacity.

When a Parcel is initialized, no buffer is allocated. The actual allocation happens when you start writing data into the Parcel:
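An abridged version of the boolean write path from AOSP's Parcel.cpp:

```cpp
// Abridged from frameworks/native/libs/binder/Parcel.cpp.
status_t Parcel::writeBool(bool val) {
    return writeInt32(int32_t(val));   // a bool is widened to a 4-byte int
}

status_t Parcel::writeInt32(int32_t val) {
    return writeAligned(val);
}

template<class T>
status_t Parcel::writeAligned(T val) {
    // Fast path: enough capacity already, just store the value.
    if ((mDataPos + sizeof(val)) <= mDataCapacity) {
restart_write:
        *reinterpret_cast<T*>(mData + mDataPos) = val;
        return finishWrite(sizeof(val));   // advances mDataPos / mDataSize
    }
    // Slow path (always taken for the very first write): grow the buffer.
    status_t err = growData(sizeof(val));
    if (err == NO_ERROR) goto restart_write;
    return err;
}
```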

The writeBool method converts the boolean to an integer before writing. In theory you need only one bit to pack a boolean, but the data is aligned to a four-byte boundary to improve load performance: loading unaligned data causes extra memory accesses on many CPU architectures, so some wasted padding space is traded for speed. The writeAligned method first checks whether there is enough capacity left for the value; if so, it stores the value in the buffer and calls finishWrite to adjust mDataPos and mDataSize accordingly. When the first data item is written the capacity check fails, so growData is called to allocate memory:
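growData, abridged from the same file:

```cpp
// Abridged from Parcel.cpp.
status_t Parcel::growData(size_t len) {
    // Grow to 1.5x the required size, with a floor of 128 bytes.
    size_t newSize = ((mDataSize + len) * 3) / 2;
    return (newSize <= mDataSize)
            ? (status_t) NO_MEMORY                       // size_t overflow check
            : continueWrite(std::max(newSize, (size_t)128));
}
```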

The new size of the data buffer will be the larger of 128 bytes and 1.5 times the current size. When the data buffer is allocated for the first time, continueWrite uses libc's malloc to allocate a chunk of memory; on subsequent growths it uses realloc or malloc, depending on whether the caller wants the expanded area initialized with zeros.

Let’s look at another method writeString16:
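An abridged version from Parcel.cpp (the String16 overload forwards the character pointer and length to this one):

```cpp
// Abridged from Parcel.cpp.
status_t Parcel::writeString16(const char16_t* str, size_t len) {
    if (str == nullptr) return writeInt32(-1);   // length -1 denotes a null string

    status_t err = writeInt32(len);              // length prefix goes first
    if (err == NO_ERROR) {
        len *= sizeof(char16_t);
        // writeInplace reserves space in mData, padded to a 4-byte boundary.
        uint8_t* data = (uint8_t*)writeInplace(len + sizeof(char16_t));
        if (data) {
            memcpy(data, str, len);                        // copy the characters
            *reinterpret_cast<char16_t*>(data + len) = 0;  // NUL terminator
            return NO_ERROR;
        }
        err = mError;
    }
    return err;
}
```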

A string is really an array of characters, so the length of the string needs to be written into the buffer first; otherwise the other process won't know how many bytes to read out. Parcel usually uses a length value of -1 to denote a null object. For a non-null string, the length is written first and then the characters are copied into the buffer. Note that a serialized string is padded for the same reason a serialized boolean is.

Note that these primitives are serialized into the buffer without any type information. How can the receiving side of the buffer know what types they are? For example, when writeString16 writes a length as an integer, how can the target process tell whether that value is a standalone integer or the size of a string? The answer is that the user of a Parcel must make sure data items are read out in exactly the same order they were written in. Let's take a look at OrderInfo in HamKing:
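A sketch of what OrderInfo's Parcelable implementation looks like; the field types and the CREATOR boilerplate are assumptions based on the field names:

```java
import android.graphics.Bitmap;
import android.os.Parcel;
import android.os.Parcelable;
import java.util.List;

// Sketch of OrderInfo in the HamKing demo [1]; field types are assumptions.
public class OrderInfo implements Parcelable {
    private int mOrderId;
    private int mProteinType;
    private boolean mSpicy;
    private List<String> mSauces;
    private Bitmap mBurgerImage;

    @Override
    public void writeToParcel(Parcel dest, int flags) {
        dest.writeInt(mOrderId);
        dest.writeInt(mProteinType);
        dest.writeInt(mSpicy ? 1 : 0);
        dest.writeStringList(mSauces);
        dest.writeParcelable(mBurgerImage, flags);
    }

    public void readFromParcel(Parcel in) {
        // Read fields in exactly the order they were written.
        mOrderId = in.readInt();
        mProteinType = in.readInt();
        mSpicy = in.readInt() != 0;
        mSauces = in.createStringArrayList();
        mBurgerImage = in.readParcelable(Bitmap.class.getClassLoader());
    }

    @Override
    public int describeContents() { return 0; }

    public static final Creator<OrderInfo> CREATOR = new Creator<OrderInfo>() {
        @Override public OrderInfo createFromParcel(Parcel in) {
            OrderInfo info = new OrderInfo();
            info.readFromParcel(in);
            return info;
        }
        @Override public OrderInfo[] newArray(int size) {
            return new OrderInfo[size];
        }
    };
}
```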

As we can see, writeToParcel writes the data in the order mOrderId -> mProteinType -> mSpicy -> mSauces -> mBurgerImage. readFromParcel then needs to read the data out in exactly the same order it was written into the Parcel.

The writeParcelable method in a Parcel just calls the writeToParcel method of the Parcelable:
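Abridged from android.os.Parcel:

```java
// Abridged from android.os.Parcel.
public final void writeParcelable(Parcelable p, int parcelableFlags) {
    if (p == null) {
        writeString(null);      // a null class name denotes a null Parcelable
        return;
    }
    writeParcelableCreator(p);  // writes the class name so the reader can
                                // look up the matching CREATOR
    p.writeToParcel(this, parcelableFlags);
}
```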

Now let’s look at how live objects are serialized into a Parcel. Before that let's look at the data structures that represent serialized live objects:
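The relevant structures, slightly abridged from the Binder UAPI header:

```c
/* From include/uapi/linux/android/binder.h (abridged). */
struct binder_object_header {
    __u32 type;                        /* one of the BINDER_TYPE_* tags */
};

/* A serialized BBinder or BpBinder. */
struct flat_binder_object {
    struct binder_object_header hdr;
    __u32 flags;
    union {
        binder_uintptr_t binder;       /* local object: address of its weak ref */
        __u32 handle;                  /* remote object: driver-issued handle */
    };
    binder_uintptr_t cookie;           /* local object: address of the BBinder */
};

/* A serialized file descriptor. */
struct binder_fd_object {
    struct binder_object_header hdr;
    __u32 pad_flags;
    union {
        binder_uintptr_t pad_binder;
        __u32 fd;                      /* the descriptor value being passed */
    };
    binder_uintptr_t cookie;
};
```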

The binder_object_header is a common header embedded in every data structure that represents a Binder live object. When parsing a live object, the kernel first reads the header to determine the type of the object, then reads the whole structure. All these structures live in the uapi directory of the Linux headers, so they are contracts between user space and the Binder driver: serialized live objects have the same shape in the Parcel and in the driver.

The flat_binder_object data structure represents a serialized BBinder or BpBinder. For a BBinder the type field is BINDER_TYPE_BINDER or BINDER_TYPE_WEAK_BINDER; the binder field holds the memory address of the weak-reference object of the corresponding BBinder and the cookie field holds the address of the BBinder itself. For a BpBinder the type field is BINDER_TYPE_HANDLE or BINDER_TYPE_WEAK_HANDLE; the cookie field is not used and the handle field holds the handle value generated by the Binder driver, which we talked about in the article "Binder architecture and core components". The cookie, binder and handle fields all act as identifiers of a Binder object: cookie and binder are generated in user space and registered with the Binder driver, while handle is generated by the driver and used by user space.

The binder_fd_object represents a file descriptor to be transferred. As we know, a file descriptor represents an opened file within a process and is just an integer value. However, we cannot pass this integer to the target process as-is, because file descriptors are per-process: a descriptor value in one process has no meaning in another. So when a file descriptor crosses a process boundary, the kernel needs to open a descriptor in the target process that refers to the same underlying file. The fd field in binder_fd_object holds the descriptor value.

Let's go back to the source code of Parcel. The writeStrongBinder method writes a strongly referenced BBinder or BpBinder into the Parcel. If val is a local BBinder, localBinder returns non-null and the header type and the two memory addresses of the flat_binder_object are set; if val is a BpBinder, the handle field is set instead. Once the flat_binder_object is prepared, writeObject writes it into the data buffer and appends the offset of this flattened IBinder to the offsets buffer mObjects. (The code that allocates and grows the offsets buffer is omitted since it follows essentially the same logic as the mData buffer.) As we can see, the actual content that gets serialized is just a memory address or a handle value, so strictly speaking this is not real serialization.
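A condensed sketch of the code, abridged from AOSP's Parcel.cpp; reference-counting details and error handling are trimmed:

```cpp
// Condensed from Parcel.cpp (flatten_binder / writeObject).
status_t Parcel::writeStrongBinder(const sp<IBinder>& val) {
    return flatten_binder(ProcessState::self(), val, this);
}

static status_t flatten_binder(const sp<ProcessState>& /*proc*/,
                               const sp<IBinder>& binder, Parcel* out) {
    flat_binder_object obj;
    obj.flags = 0x7f | FLAT_BINDER_FLAG_ACCEPTS_FDS;
    if (binder != nullptr) {
        IBinder* local = binder->localBinder();
        if (local == nullptr) {
            // Proxy side: serialize the driver-issued handle.
            BpBinder* proxy = binder->remoteBinder();
            obj.hdr.type = BINDER_TYPE_HANDLE;
            obj.handle = proxy ? proxy->handle() : 0;
            obj.cookie = 0;
        } else {
            // Local side: serialize user-space addresses identifying the BBinder.
            obj.hdr.type = BINDER_TYPE_BINDER;
            obj.binder = reinterpret_cast<uintptr_t>(local->getWeakRefs());
            obj.cookie = reinterpret_cast<uintptr_t>(local);
        }
    } else {
        obj.hdr.type = BINDER_TYPE_BINDER;
        obj.binder = 0;
        obj.cookie = 0;
    }
    // writeObject appends obj to mData and records its offset in mObjects.
    return out->writeObject(obj, false /*nullMetaData*/);
}
```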

The writeFileDescriptor method serializes a file descriptor into a Parcel. The most important piece of data written to the buffer is the integer value of the descriptor itself:
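Abridged from Parcel.cpp; note that in this form the descriptor travels in the handle field of a flat_binder_object tagged BINDER_TYPE_FD, which occupies the same bytes as binder_fd_object's fd field:

```cpp
// Abridged from Parcel.cpp.
status_t Parcel::writeFileDescriptor(int fd, bool takeOwnership) {
    flat_binder_object obj;
    obj.hdr.type = BINDER_TYPE_FD;
    obj.flags = 0x7f | FLAT_BINDER_FLAG_ACCEPTS_FDS;
    obj.binder = 0;
    obj.handle = fd;                     // the file descriptor value itself
    obj.cookie = takeOwnership ? 1 : 0;  // should this Parcel close fd later?
    return writeObject(obj, true /*nullMetaData*/);
}
```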

Transferring data with the kernel

We have seen how different kinds of data are serialized into a user-space buffer. Now let's look at how that buffer is sent to the kernel. A transaction starts when the transact method is called on a proxy Binder. The Java proxy Binder class is BinderProxy and the C++ proxy Binder class is BpBinder. BinderProxy just delegates the transaction to its peer BpBinder, so we will only look at the BpBinder class:
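Abridged from BpBinder.cpp:

```cpp
// Abridged from frameworks/native/libs/binder/BpBinder.cpp.
status_t BpBinder::transact(uint32_t code, const Parcel& data,
                            Parcel* reply, uint32_t flags)
{
    // Proxies can only be transacted on while the remote object is alive.
    if (mAlive) {
        status_t status = IPCThreadState::self()->transact(
                mHandle, code, data, reply, flags);
        if (status == DEAD_OBJECT) mAlive = 0;
        return status;
    }
    return DEAD_OBJECT;
}
```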

IPCThreadState is the class a user-space thread uses to interact with the Binder driver. Each thread has a singleton IPCThreadState, which means multiple calls to IPCThreadState::self on the same thread return the same instance. It manages the state of a thread that is interacting with the Binder driver:
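An abridged view of the class; nearly all members are omitted:

```cpp
// Abridged from IPCThreadState.h.
class IPCThreadState {
public:
    static IPCThreadState* self();   // per-thread singleton, backed by TLS
    status_t transact(int32_t handle, uint32_t code,
                      const Parcel& data, Parcel* reply, uint32_t flags);
private:
    const sp<ProcessState> mProcess; // process-wide Binder state (driver fd, ...)
    Parcel mIn;    // data received from the Binder driver
    Parcel mOut;   // data queued to be sent to the Binder driver
};
```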

The only two fields of interest in this article are mIn and mOut, two Parcel instances used exclusively by this thread: mIn receives data from the Binder driver and mOut holds data to be sent to it. The way a thread interacts with the Binder driver is discussed in detail in the article "Binder threading model".

The writeTransactionData method packs the incoming Parcel into mOut for the transaction; after that, waitForResponse is called to send the data to the Binder driver. If TF_ONE_WAY is set the transaction is asynchronous, though most Binder transactions are synchronous. writeTransactionData creates a binder_transaction_data structure, points its data buffer and offsets buffer at the Parcel's mData and mObjects buffers, and then writes the structure into the outgoing Parcel:
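A condensed version from IPCThreadState.cpp:

```cpp
// Condensed from IPCThreadState.cpp; the error-reply path is omitted.
status_t IPCThreadState::writeTransactionData(int32_t cmd, uint32_t binderFlags,
        int32_t handle, uint32_t code, const Parcel& data, status_t* statusBuffer)
{
    binder_transaction_data tr;
    tr.target.ptr = 0;
    tr.target.handle = handle;        // which Binder object to reach
    tr.code = code;                   // which method to invoke on it
    tr.flags = binderFlags;
    tr.cookie = 0;
    tr.sender_pid = 0;
    tr.sender_euid = 0;

    const status_t err = data.errorCheck();
    if (err == NO_ERROR) {
        // Point the kernel at the Parcel's payload and offsets buffers.
        tr.data_size = data.ipcDataSize();
        tr.data.ptr.buffer = data.ipcData();
        tr.offsets_size = data.ipcObjectsCount() * sizeof(binder_size_t);
        tr.data.ptr.offsets = data.ipcObjects();
    }
    mOut.writeInt32(cmd);             // e.g. BC_TRANSACTION
    mOut.write(&tr, sizeof(tr));
    return NO_ERROR;
}
```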

The waitForResponse method calls talkWithDriver to transfer the data to the Binder driver; the reply code BR_REPLY from the driver means the Binder transaction is finished. The binder_write_read structure defines the format in which the Binder driver receives and sends payload data. User space needs to convert a Parcel to this format, then use the BINDER_WRITE_READ command with the ioctl system call to transfer the data to the driver. The binder_write_read data structure is as follows:
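From the Binder UAPI header:

```c
/* From include/uapi/linux/android/binder.h. */
struct binder_write_read {
    binder_size_t    write_size;      /* bytes the caller wants to send */
    binder_size_t    write_consumed;  /* bytes the driver actually consumed */
    binder_uintptr_t write_buffer;    /* address of the data to send */
    binder_size_t    read_size;       /* max bytes the caller can receive */
    binder_size_t    read_consumed;   /* bytes the driver actually returned */
    binder_uintptr_t read_buffer;     /* address of the receive buffer */
};
```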

During a single BINDER_WRITE_READ, the calling thread can both send data to and receive data from the Binder driver. write_size tells the driver how many bytes the caller wants to send and write_buffer points to the memory address of that data; the driver fills in write_consumed with the number of bytes it actually consumed. read_size tells the driver the maximum number of bytes the caller can receive and read_buffer points to the location that receives the data; the driver fills in read_consumed with the number of bytes it returned. The mIn and mOut fields in IPCThreadState match this transfer mode.

The doReceive parameter of talkWithDriver defaults to true, which means the calling thread wants to receive data from the driver; it is set to false only when the calling thread is merely flushing pending commands to the driver. As we can see, talkWithDriver basically assigns the buffer address of mOut to write_buffer and that of mIn to read_buffer, then issues the ioctl call:
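A condensed version from IPCThreadState.cpp (the bookkeeping that consumes mOut and resizes mIn afterwards is trimmed):

```cpp
// Condensed from IPCThreadState::talkWithDriver.
status_t IPCThreadState::talkWithDriver(bool doReceive)
{
    binder_write_read bwr;

    // Only ask to read when there is nothing left unprocessed in mIn.
    const bool needRead = mIn.dataPosition() >= mIn.dataSize();
    const size_t outAvail = (!doReceive || needRead) ? mOut.dataSize() : 0;

    bwr.write_size = outAvail;
    bwr.write_buffer = (uintptr_t)mOut.data();
    if (doReceive && needRead) {
        bwr.read_size = mIn.dataCapacity();
        bwr.read_buffer = (uintptr_t)mIn.data();
    } else {
        bwr.read_size = 0;
        bwr.read_buffer = 0;
    }
    if ((bwr.write_size == 0) && (bwr.read_size == 0)) return NO_ERROR;

    bwr.write_consumed = 0;
    bwr.read_consumed = 0;
    status_t err;
    do {
        if (ioctl(mProcess->mDriverFD, BINDER_WRITE_READ, &bwr) >= 0)
            err = NO_ERROR;
        else
            err = -errno;
    } while (err == -EINTR);
    // ... advance mOut by write_consumed and size mIn to read_consumed ...
    return err;
}
```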

In the Binder driver, the binder_ioctl_write_read function handles a BINDER_WRITE_READ command. It performs four clear steps: copy the binder_write_read structure from user space, consume the data in write_buffer, fill read_buffer with data for the caller, and copy the updated binder_write_read structure back to user space:
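A condensed sketch of the function from the kernel driver; locking, tracing and most error handling are removed, and the exact signature varies across kernel versions:

```c
/* Condensed from drivers/android/binder.c. */
static int binder_ioctl_write_read(struct file *filp, unsigned long arg,
                                   struct binder_thread *thread)
{
    struct binder_proc *proc = filp->private_data;
    void __user *ubuf = (void __user *)arg;
    struct binder_write_read bwr;
    int ret = 0;

    /* 1. Copy the binder_write_read descriptor from user space. */
    if (copy_from_user(&bwr, ubuf, sizeof(bwr)))
        return -EFAULT;
    /* 2. Consume the commands the caller sent in write_buffer. */
    if (bwr.write_size > 0)
        ret = binder_thread_write(proc, thread, bwr.write_buffer,
                                  bwr.write_size, &bwr.write_consumed);
    /* 3. Fill read_buffer with work destined for the caller. */
    if (ret >= 0 && bwr.read_size > 0)
        ret = binder_thread_read(proc, thread, bwr.read_buffer,
                                 bwr.read_size, &bwr.read_consumed,
                                 filp->f_flags & O_NONBLOCK);
    /* 4. Copy the updated descriptor back to user space. */
    if (copy_to_user(ubuf, &bwr, sizeof(bwr)))
        ret = -EFAULT;
    return ret;
}
```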

This article focuses on the data-modeling aspects of Binder, so I will not go deep into the details of a Binder transaction right now; that is covered in the article "Binder transaction". In short, in user space Binder uses Parcel to model data. User space converts the data into a binder_write_read structure and uses the BINDER_WRITE_READ command with ioctl to send data to, and receive data from, the driver.

External links

[1] https://github.com/androidonekb/HamKing
