Binder Data Model
Simply put, the only thing you use Binder for is transferring data between processes. During a Binder transaction, the source process packs all data to be transferred into a serializable form and hands it over to the Binder driver. When the driver handles the Binder transaction, the serialized data is copied into the target process's memory space. The target process then deserializes the data back into various objects. So in a typical Binder transaction the data is copied three times.
Parcel
The serialization and deserialization of data happen in native user space. In Binder's terminology, the serialization process is called parceling, and it is modeled by the Parcel class in both Java and C++. The Parcel facilities are only used in user space; they are a concept introduced by the Android framework. Before data is sent to the Binder driver, a Parcel needs to be converted to a form that the driver understands, but the conversion process is simple.
You can roughly think of a Parcel as an area of contiguous user space memory plus a bunch of helper methods that flatten all types of data into it. The Java version of Parcel doesn't have much business logic of its own; most of the work is delegated to its C++ peer Parcel. In this article we will start from the Java version. An instance of the Java Parcel is created before a Binder transaction call. Let's take the requestOrder interface method in IHamKingInterface as an example [1]:
The majority of Parcel instances are created to facilitate Binder transactions and are released immediately after that. In the above example, _data is used to serialize the incoming parameters and _reply is used to receive the reply from the target process. A Parcel is created with the obtain method:
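The gist of obtain is an object pool. As a rough illustration of the pattern only (the class and field names below are hypothetical, not the framework's; the real pool lives in the Java Parcel class), a minimal C++ sketch of the same acquire/recycle idea:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical parcel type; stands in for the real Parcel.
struct PooledParcel {
    std::vector<unsigned char> data;
    void reset() { data.clear(); }
};

// A tiny object pool in the spirit of Parcel.obtain()/recycle():
// reusing instances avoids an allocation per Binder transaction.
class ParcelPool {
public:
    static constexpr size_t kMaxPoolSize = 6;  // AOSP uses a similarly small cap

    ~ParcelPool() {
        for (PooledParcel* p : pool_) delete p;
    }

    PooledParcel* obtain() {
        if (!pool_.empty()) {
            PooledParcel* p = pool_.back();  // reuse a recycled instance
            pool_.pop_back();
            return p;
        }
        return new PooledParcel();           // pool empty: allocate a fresh one
    }

    void recycle(PooledParcel* p) {
        p->reset();                          // clear state before reuse
        if (pool_.size() < kMaxPoolSize) pool_.push_back(p);
        else delete p;                       // pool full: really free it
    }

private:
    std::vector<PooledParcel*> pool_;
};
```

Recycling a parcel and then obtaining one hands back the same instance, which is exactly why a recycled Java Parcel must never be touched after recycle() is called.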
The obtain method implements a typical object pool pattern. Parcel objects are acquired and released very often, so such a pattern greatly reduces memory allocations and garbage collection. mNativePtr points to the memory address of the native peer, and mOwnsNativeParcelObject tells whether the Java Parcel owns the native Parcel. During a Binder transaction, the data to be transferred moves in this flow: Java proxy Binder -> C++ proxy Binder -> Binder driver -> C++ local Binder -> Java local Binder. So on the proxy side, a Java Parcel creates and owns its native peer. On the local Binder side, a Java Parcel is initialized from an already created native peer, and the native peer owns the Java Parcel. We can see from the source code of Binder that execTransact will be called by native code. The dataObj and replyObj parameters are the memory addresses of already created native peers:
The nativeCreate method just creates a native Parcel object and returns its address to Java code. Let's have a look at the shape of the C++ Parcel class:
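Stripped down to just the fields discussed below, the shape is roughly this (a simplified sketch for orientation, not the AOSP declaration):

```cpp
#include <cstddef>
#include <cstdint>

// Simplified sketch of the C++ Parcel's buffer bookkeeping.
class ParcelSketch {
public:
    uint8_t*   mData = nullptr;        // start of the contiguous payload buffer
    size_t     mDataSize = 0;          // bytes of payload currently written
    size_t     mDataCapacity = 0;      // bytes allocated; mDataSize <= mDataCapacity
    size_t     mDataPos = 0;           // cursor where the next read or write happens

    uintptr_t* mObjects = nullptr;     // offsets of live objects inside mData
    size_t     mObjectsSize = 0;       // number of offsets stored
    size_t     mObjectsCapacity = 0;   // offsets allocated; mObjectsSize <= mObjectsCapacity
};
```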
mData points to an area of contiguous memory allocated to this Parcel, and mDataCapacity is the size of the allocated memory. mDataSize is the size of the memory that contains payload data; of course mDataSize <= mDataCapacity. mDataPos points to the position in the buffer where the next read or write happens.
There is a long list of write* and read* methods that help users write and read different kinds of data in a Parcel. We cannot cover all of them, so I will pick several representative ones. All data types fall into two categories: primitives and live objects. Primitives are raw data types that are transferred as is; for example, integers, strings and booleans are all primitive data types. Live objects are data types that carry additional information beyond their bit value, so the Binder driver needs to do extra processing on them. The types of live objects are as follows:

BINDER_TYPE_BINDER and BINDER_TYPE_WEAK_BINDER are native Binder objects. BINDER_TYPE_HANDLE and BINDER_TYPE_WEAK_HANDLE are proxy Binder objects. The term "weak" means the holder of the object should use a weak reference to it. BINDER_TYPE_FD and BINDER_TYPE_FDA are a single file descriptor and a file descriptor array respectively. BINDER_TYPE_PTR means the object is a pointer to another data buffer; it is only used by the scatter-gather transactions introduced in Android Oreo.
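In the uapi header these type tags are packed from four characters. To the best of my knowledge the definitions look like the following (mirroring the general shape of include/uapi/linux/android/binder.h; verify against your kernel version before relying on the exact values):

```cpp
#include <cstdint>

// How the uapi binder header builds the live-object type tags:
// three identifying characters plus a B_TYPE_LARGE marker byte.
constexpr uint32_t B_PACK_CHARS(char c1, char c2, char c3, uint32_t c4) {
    return (uint32_t(c1) << 24) | (uint32_t(c2) << 16) | (uint32_t(c3) << 8) | c4;
}
constexpr uint32_t B_TYPE_LARGE = 0x85;

constexpr uint32_t BINDER_TYPE_BINDER      = B_PACK_CHARS('s', 'b', '*', B_TYPE_LARGE);
constexpr uint32_t BINDER_TYPE_WEAK_BINDER = B_PACK_CHARS('w', 'b', '*', B_TYPE_LARGE);
constexpr uint32_t BINDER_TYPE_HANDLE      = B_PACK_CHARS('s', 'h', '*', B_TYPE_LARGE);
constexpr uint32_t BINDER_TYPE_WEAK_HANDLE = B_PACK_CHARS('w', 'h', '*', B_TYPE_LARGE);
constexpr uint32_t BINDER_TYPE_FD          = B_PACK_CHARS('f', 'd', '*', B_TYPE_LARGE);
constexpr uint32_t BINDER_TYPE_FDA         = B_PACK_CHARS('f', 'd', 'a', B_TYPE_LARGE);
constexpr uint32_t BINDER_TYPE_PTR         = B_PACK_CHARS('p', 't', '*', B_TYPE_LARGE);
```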
All types of data are packed into the same memory buffer pointed to by mData, so the Binder driver needs to know the locations of the live objects; otherwise the data just appears as raw bits to the driver. mObjects is a separate buffer containing the offsets of the live objects within mData. mObjectsCapacity is the allocated size of this buffer and mObjectsSize is the number of offsets stored in it; of course mObjectsSize <= mObjectsCapacity.
When a Parcel is initialized, no buffer is allocated. The actual allocation happens when you start writing data into the Parcel:
The writeBool method converts a boolean to an integer before writing. In theory you only need one bit to pack a boolean, but we want to align the data to a four-byte boundary to improve data load performance, because loading unaligned data causes additional memory accesses on many CPU architectures. Some wasted padding space is traded for better performance. The writeAligned method first checks whether there is enough space left for the data; if so, it just writes the data to the buffer on line 17, then calls finishWrite to adjust mDataPos and mDataSize accordingly. When the first data item is being written, line 15 evaluates to false, so growData is called to allocate memory:
The new size of the data buffer will be the larger of 128 bytes and 1.5 times the original size. When the data buffer is allocated for the first time, continueWrite uses malloc from libc to allocate a chunk of memory. Otherwise realloc or malloc is used, depending on whether the user wants the expanded area initialized with zeros.
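The sizing rule alone fits in a couple of lines. This is a sketch of the policy as just described, not the actual growData code:

```cpp
#include <algorithm>
#include <cstddef>

// Growth policy described above: the new buffer size is the larger of
// 128 bytes and 1.5 times the requested size, amortizing reallocations.
size_t grown_size(size_t desired) {
    return std::max<size_t>(desired * 3 / 2, 128);
}
```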
Let’s look at another method, writeString16:

The string type is actually an array of characters, so the length of the string needs to be written into the buffer first; otherwise the other process won’t know how many bytes to read from the buffer. Parcel usually uses a length value of -1 to denote a null object. Otherwise, the length of the string is first written at line 6, then the characters of the string are copied into the buffer at line 11. Note that a serialized string is padded for the same reason a serialized boolean is.
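The framing can be sketched with a hypothetical helper (using plain char instead of the UTF-16 char16_t that writeString16 actually handles): length prefix first, -1 for null, payload padded to a four-byte boundary.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Serialize a possibly-null string as described above: a 4-byte length
// prefix (-1 marks a null object), then the characters, padded up to
// the next 4-byte boundary for alignment.
void write_string(std::vector<uint8_t>& buf, const std::string* s) {
    int32_t len = s ? int32_t(s->size()) : -1;
    const uint8_t* lp = reinterpret_cast<const uint8_t*>(&len);
    buf.insert(buf.end(), lp, lp + sizeof(len));
    if (!s) return;                                 // null: length -1, no payload
    buf.insert(buf.end(), s->begin(), s->end());
    while (buf.size() % 4 != 0) buf.push_back(0);   // pad to 4-byte boundary
}
```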
Note that these primitives are serialized into the buffer without any data type information. How can the receiving side of the buffer know what types they are? For example, at line 6 an integer is written into the buffer; how can the target process know whether the value is a standalone integer or the size of a string? The answer is that the user of a Parcel needs to make sure data items are read out in exactly the same order as they were written. Let's take a look at the OrderInfo class in HamKing:

As we can see, writeToParcel writes data in the order mOrderId -> mProteinType -> mSpicy -> mSauces -> mBurgerImage. readFromParcel then needs to read the data out in exactly the same order as it was written into the Parcel.
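This contract can be demonstrated with a toy round trip (the types and method names are illustrative only): because the buffer carries no type tags, reading fields in a different order than they were written silently produces garbage.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// A toy flat buffer with position-based reads, mimicking a Parcel's
// "read in exactly the order you wrote" contract.
struct ToyParcel {
    std::vector<uint8_t> data;
    size_t pos = 0;                    // read cursor, like mDataPos

    void writeInt32(int32_t v) {
        const uint8_t* p = reinterpret_cast<const uint8_t*>(&v);
        data.insert(data.end(), p, p + 4);
    }
    int32_t readInt32() {
        int32_t v;
        std::memcpy(&v, data.data() + pos, 4);
        pos += 4;
        return v;
    }
};
```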
The writeParcelable method of a Parcel just calls the writeToParcel method of the Parcelable:
Now let’s look at how live objects are serialized into a Parcel. Before that, let's look at the data structures that represent serialized live objects:
binder_object_header is a common header embedded in all the data structures that represent Binder live objects. When parsing a live object, the kernel first reads out the header to determine the type of the live object, then reads out the whole data structure. All these data structures live in the uapi directory of the Linux headers, so they are contracts between user space and the Binder driver. Serialized live objects have the same shape in both the Binder driver and Parcel.
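For reference, the uapi declarations have roughly the following shape. The typedefs stand in for the kernel's __u32 and binder_uintptr_t; consult include/uapi/linux/android/binder.h for the authoritative definitions:

```cpp
#include <cstdint>

// Stand-ins for the kernel's fixed-width types.
typedef uint32_t  u32_t;             // __u32 in the uapi header
typedef uintptr_t binder_uintptr_t;  // pointer-sized integer

// Common header embedded in every serialized live object;
// the kernel reads it first to learn the object's type.
struct binder_object_header {
    u32_t type;   // e.g. BINDER_TYPE_BINDER, BINDER_TYPE_HANDLE, BINDER_TYPE_FD
};

// A serialized BBinder or BpBinder.
struct flat_binder_object {
    struct binder_object_header hdr;
    u32_t flags;
    union {
        binder_uintptr_t binder; // local: address of the BBinder's weak ref
        u32_t handle;            // proxy: driver-generated handle
    };
    binder_uintptr_t cookie;     // local: address of the BBinder itself
};

// A serialized file descriptor.
struct binder_fd_object {
    struct binder_object_header hdr;
    u32_t pad_flags;
    union {
        binder_uintptr_t pad_binder;
        u32_t fd;                // the sender's file descriptor value
    };
    binder_uintptr_t cookie;
};
```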
The flat_binder_object data structure represents a serialized BBinder or BpBinder. For a BBinder, the type field is BINDER_TYPE_BINDER or BINDER_TYPE_WEAK_BINDER, the binder field is the memory address of a weak reference object in the corresponding BBinder, and the cookie field is the address of the BBinder itself. For a BpBinder, the type field is BINDER_TYPE_HANDLE or BINDER_TYPE_WEAK_HANDLE; the cookie field is not used, and the handle field is the handle value generated by the Binder driver, which we talked about in the article "Binder architecture and core components". The cookie, binder and handle fields all act as identifiers of a Binder object: cookie and binder are generated by user space and registered with the Binder driver, while handle is generated by the Binder driver and used by user space.
binder_fd_object represents a file descriptor to be transferred. As we know, a file descriptor represents an opened file for a process, and it is just an integer value. However, we cannot pass this integer to the target process as is, since file descriptors have per-process scope: a file descriptor value in one process has no meaning in another. So when a file descriptor crosses a process boundary, the kernel needs to open a file descriptor in the target process that points to the same underlying file. The fd field in binder_fd_object holds the file descriptor value.
Let’s go back to the source code of Parcel. The writeStrongBinder method writes a strongly referenced BBinder or BpBinder into the Parcel. If val is a BBinder, then local will not be null, and lines 23 to 25 set the header type and the memory addresses in the flat_binder_object. If val is a BpBinder, the handle field is set instead. After the flat_binder_object is prepared, it is written into the data buffer inside writeObject at line 39. In addition, the offset of this flattened IBinder is written to the offsets buffer mObjects at line 41. The code that allocates and expands the offsets buffer is not shown, since it follows pretty much the same logic as the mData buffer. As we can see, the actual content that gets serialized is just a memory address or a handle value, so strictly speaking this is not real serialization.
The writeFileDescriptor method serializes a file descriptor into a Parcel. The most important piece of data written to the buffer is the integer value of the file descriptor, at line 9.
Transferring data to and from the kernel
We have seen how different kinds of data are serialized into a user space buffer. Now let’s look at how that buffer is sent to the kernel. A transaction starts when the transact method is called on a proxy Binder. The Java proxy Binder class is BinderProxy and the C++ proxy Binder class is BpBinder. BinderProxy just calls its peer BpBinder to do the transaction, so we will only look at the BpBinder class:
IPCThreadState is the class a user space thread uses to interact with the Binder driver. Each thread has a singleton IPCThreadState, which means multiple calls to IPCThreadState::self from the same thread return the same instance. It manages the state of a thread that is interacting with the Binder driver:
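The per-thread singleton behavior can be sketched with C++ thread_local (AOSP actually implements it with pthread thread-specific storage; the class name here is a stand-in):

```cpp
// Sketch of IPCThreadState::self()'s per-thread singleton behavior,
// using C++ thread_local instead of AOSP's pthread_getspecific().
class ThreadState {
public:
    static ThreadState* self() {
        thread_local ThreadState instance;  // one instance per thread
        return &instance;
    }
private:
    ThreadState() = default;                // only self() can create it
};
```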
The only two fields of interest in this article are mIn and mOut, two Parcel instances that are used solely by this thread. mIn is used to receive data from the Binder driver and mOut is used to send data to it. The way a thread interacts with the Binder driver will be discussed in detail in the article "Binder threading model".
The writeTransactionData method writes the incoming Parcel into mOut for transaction. After that, waitForResponse is called to send the data to the Binder driver. If TF_ONE_WAY is set, the transaction is asynchronous; most Binder transactions are synchronous, though. writeTransactionData creates a binder_transaction_data structure and sets the data buffer pointer and offsets buffer pointer on lines 26 to 29; then line 33 writes the structure into the outgoing Parcel.
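The uapi shape of binder_transaction_data is roughly the following (stand-in typedefs, a simplified data member, and field order as I recall it from include/uapi/linux/android/binder.h, so verify before relying on it):

```cpp
#include <cstddef>
#include <cstdint>

typedef uint32_t  u32_t;         // __u32
typedef uintptr_t bt_uintptr_t;  // binder_uintptr_t
typedef size_t    bt_size_t;     // binder_size_t

// What writeTransactionData packs into mOut: who to call, which method,
// and where the serialized Parcel payload and offsets live.
struct binder_transaction_data_sketch {
    union {
        u32_t        handle;     // target proxy handle (outgoing call)
        bt_uintptr_t ptr;        // target BBinder (incoming call)
    } target;
    bt_uintptr_t cookie;         // target cookie on the local side
    u32_t code;                  // interface method code
    u32_t flags;                 // e.g. TF_ONE_WAY for asynchronous calls
    int   sender_pid;            // filled in by the driver, not the sender
    u32_t sender_euid;           // filled in by the driver
    bt_size_t data_size;         // bytes in the Parcel's mData buffer
    bt_size_t offsets_size;      // bytes in the mObjects offsets buffer
    struct {
        bt_uintptr_t buffer;     // -> Parcel::mData
        bt_uintptr_t offsets;    // -> Parcel::mObjects
    } data;
};
```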
The waitForResponse method calls talkWithDriver to transfer the data to the Binder driver. The Binder reply code BR_REPLY from the driver means the Binder transaction is finished. The binder_write_read structure defines the format in which the Binder driver receives and sends payload data. User space needs to convert a Parcel to this format, then use the BINDER_WRITE_READ command with the ioctl system call to transfer the data to the Binder driver. The binder_write_read data structure is as follows:
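Its uapi shape is essentially this (stand-in typedefs for the kernel's 64-bit binder_size_t and binder_uintptr_t; see include/uapi/linux/android/binder.h for the authoritative version):

```cpp
#include <cstdint>

typedef uint64_t bwr_size_t;     // binder_size_t in the uapi header
typedef uint64_t bwr_uintptr_t;  // binder_uintptr_t

// The payload of a BINDER_WRITE_READ ioctl: one send buffer and one
// receive buffer, with "consumed" counters the driver fills in.
struct binder_write_read_sketch {
    bwr_size_t    write_size;     // bytes user space wants to send
    bwr_size_t    write_consumed; // bytes the driver actually consumed
    bwr_uintptr_t write_buffer;   // -> mOut's data
    bwr_size_t    read_size;      // max bytes user space can receive
    bwr_size_t    read_consumed;  // bytes the driver actually delivered
    bwr_uintptr_t read_buffer;    // -> mIn's data
};
```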
During a single BINDER_WRITE_READ, the calling thread can both send data to and receive data from the Binder driver. write_size tells the driver how many bytes the thread wants to send to the kernel, and write_buffer points to the memory address of the data to be sent. The Binder driver fills in the write_consumed field with the number of bytes it actually consumed. read_size tells the Binder driver the maximum number of bytes the thread can receive from the kernel, and read_buffer points to the location where the received data should be placed. The Binder driver fills in the read_consumed field with the number of bytes delivered from the kernel. The mIn and mOut fields in IPCThreadState match this BINDER_WRITE_READ transfer mode.
The doReceive parameter of talkWithDriver defaults to true, which means the calling thread wants to receive data from the driver. It is usually true unless the calling thread is merely flushing pending commands to the driver. As we can see, the talkWithDriver method basically assigns the buffer address of mOut to write_buffer and the address of mIn to read_buffer, then issues the ioctl call.
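Putting it together, the user space side of one BINDER_WRITE_READ boils down to something like this sketch (the struct mirrors the uapi binder_write_read layout; the actual ioctl call is shown only as a comment since it needs an open /dev/binder fd):

```cpp
#include <cstdint>
#include <vector>

typedef uint64_t bwr_u64;  // stands in for binder_size_t / binder_uintptr_t

// Mirrors the uapi binder_write_read layout.
struct bwr_sketch {
    bwr_u64 write_size, write_consumed, write_buffer;
    bwr_u64 read_size,  read_consumed,  read_buffer;
};

// What talkWithDriver essentially does: point write_buffer at mOut's
// payload and read_buffer at mIn's spare capacity.
bwr_sketch make_bwr(const std::vector<uint8_t>& out_data,
                    std::vector<uint8_t>& in_data,
                    bool do_receive) {
    bwr_sketch bwr{};
    bwr.write_size   = out_data.size();
    bwr.write_buffer = reinterpret_cast<uintptr_t>(out_data.data());
    if (do_receive) {                 // doReceive == true in the common case
        bwr.read_size   = in_data.capacity();
        bwr.read_buffer = reinterpret_cast<uintptr_t>(in_data.data());
    }
    // The real code then issues:
    //   ioctl(binder_fd, BINDER_WRITE_READ, &bwr);
    // after which the driver has updated write_consumed / read_consumed.
    return bwr;
}
```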
In the Binder driver, the binder_ioctl_write_read function handles a BINDER_WRITE_READ command. The four steps for handling the command are very clear: line 22 copies the binder_write_read structure from user space; lines 23 to 28 handle receiving data from write_buffer; lines 29 to 34 handle sending data to read_buffer; and line 35 copies the updated binder_write_read structure back to user space.
This article focuses on the data modeling aspects of Binder, so I will not go deep into the details of a Binder transaction right now; Binder transactions are discussed in detail in the article “Binder transaction”. In short, in user space Binder uses Parcel to model data. User space needs to convert the data into a binder_write_read structure and use the BINDER_WRITE_READ command with ioctl to send data to and receive data from the driver.