Binder Lifecycle Management

Feb 4, 2021

Death notification

Death notification is a Binder mechanism that lets a BpBinder get notified when the BBinder it references dies. I briefly touched on this topic in the "Binder architecture and core components" article. In this article we are going to take a deeper look at its implementation. When we say a BBinder dies, we mean that the process that created the BBinder has terminated, either by crashing unexpectedly or by exiting intentionally. For example, in the server module of the HamKing project [1], the requestOrder method in RemoteService uses the linkToDeath method to register a death listener for the ICreditCard so that it can remove any order that was "purchased" with this ICreditCard instance:

The ICreditCard instance is created in the client app process before requestOrder is called, and is then passed to the server app process. The server app process therefore holds a proxy of ICreditCard, and the asBinder method returns a BinderProxy object.

The DeathRecipient interface contains a single callback method, binderDied, which is invoked by the Binder framework when the remote BBinder dies. The linkToDeath method is a native method that calls the corresponding linkToDeath method on its peer BpBinder:

mObitsSent is a flag that gets set once this BpBinder receives a death notification. The flag is never cleared: if the remote process dies, it dies forever, and so does the BBinder the proxy references. There is no concept of reviving a BBinder. So if the flag is set, the method just returns DEAD_OBJECT, which causes the JNI layer to throw a DeadObjectException in Java code. Otherwise, the recipient is wrapped inside an Obituary object and added to the mObituaries list. If this is the first death listener to register, the requestDeathNotification and flushCommands methods in IPCThreadState are called to register a death listener with the Binder driver. As we can see, no matter how many user space death listeners are registered through linkToDeath, the Binder framework registers a death recipient with the driver only once.
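To make this bookkeeping concrete, here is a minimal C++ sketch of the logic described above. It is a toy model under stated assumptions, not the libbinder code: ProxyModel, Status, onRemoteDeath and driverRegistrations are hypothetical names, and the round trip to the driver is reduced to a counter.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical status codes standing in for the real status_t values.
enum class Status { Ok, DeadObject };

// Simplified stand-in for a BpBinder; not the real libbinder class.
class ProxyModel {
public:
    using Recipient = std::function<void()>;

    Status linkToDeath(Recipient recipient) {
        assert(recipient && "recipient must be callable");
        if (obitsSent_) {
            // The remote object is already dead and can never come back.
            return Status::DeadObject;
        }
        if (obituaries_.empty()) {
            // First listener: the only time the driver is asked to notify us.
            ++driverRegistrations_;
        }
        obituaries_.push_back(std::move(recipient));
        return Status::Ok;
    }

    // Models the arrival of a death notification from the driver.
    void onRemoteDeath() {
        obitsSent_ = true;  // never reset afterwards
        std::vector<Recipient> pending;
        pending.swap(obituaries_);
        for (Recipient& recipient : pending) recipient();
    }

    int driverRegistrations() const { return driverRegistrations_; }

private:
    bool obitsSent_ = false;
    int driverRegistrations_ = 0;
    std::vector<Recipient> obituaries_;
};
```

Registering two recipients on one ProxyModel leaves driverRegistrations() at 1, and any linkToDeath call made after onRemoteDeath returns Status::DeadObject.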

A BC_REQUEST_DEATH_NOTIFICATION command code is written to the thread's output buffer. The handle parameter identifies the target BBinder, and the memory address of the BpBinder is used as a cookie so that IPCThreadState can find the corresponding BpBinder when a death notification is received from the Binder driver. The flushCommands method triggers an ioctl system call that registers the death recipient with the driver. Conversely, unlinkToDeath clears a previously registered death notification, and the corresponding method in IPCThreadState is clearDeathNotification:

The only difference is the command code. We will now jump to the relevant code in the Binder driver. Please read through the previous articles in this series if you find this hard to follow.
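As a rough illustration of the shared layout of the two commands (command code, then handle, then cookie), the following C++ sketch writes them into a byte buffer. OutBuffer and queueDeathCommand are hypothetical names, and the enum values are placeholders: the real command codes are ioctl-encoded constants from the kernel's Binder header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Placeholder values; the real BC_* codes are defined by the kernel header.
enum Command : uint32_t {
    kRequestDeathNotification = 1,
    kClearDeathNotification = 2,
};

// Simplified stand-in for the per-thread output buffer.
class OutBuffer {
public:
    void writeUint32(uint32_t v) { append(&v, sizeof(v)); }
    void writePointer(uintptr_t v) { append(&v, sizeof(v)); }

    uint32_t readUint32At(size_t offset) const {
        uint32_t v;
        assert(offset + sizeof(v) <= bytes_.size());
        std::memcpy(&v, bytes_.data() + offset, sizeof(v));
        return v;
    }
    uintptr_t readPointerAt(size_t offset) const {
        uintptr_t v;
        assert(offset + sizeof(v) <= bytes_.size());
        std::memcpy(&v, bytes_.data() + offset, sizeof(v));
        return v;
    }
    size_t size() const { return bytes_.size(); }

private:
    void append(const void* p, size_t n) {
        const uint8_t* b = static_cast<const uint8_t*>(p);
        bytes_.insert(bytes_.end(), b, b + n);
    }
    std::vector<uint8_t> bytes_;
};

// Both commands carry the same payload: the handle and the proxy address.
void queueDeathCommand(OutBuffer& out, Command cmd, uint32_t handle,
                       const void* proxy) {
    assert(proxy != nullptr);
    out.writeUint32(cmd);
    out.writeUint32(handle);
    out.writePointer(reinterpret_cast<uintptr_t>(proxy));  // the cookie
}
```

Because the cookie is just the proxy's address, the receiving side can cast it back to find the proxy without any lookup table.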

BC_CLEAR_DEATH_NOTIFICATION and BC_REQUEST_DEATH_NOTIFICATION are handled together because their data contracts are the same. (They both need a handle value and the address of the BpBinder.) The binder_ref_death structure represents a death recipient in the Binder driver. It has an embedded binder_work so that it can be enqueued onto a todo list. The cookie is used by user space to identify the death notification receiver; it is opaque to the driver, which never interprets it.

binder_get_ref returns the corresponding binder_ref structure using the handle value. In short, all that BC_REQUEST_DEATH_NOTIFICATION does is create a binder_ref_death structure and let the corresponding binder_ref point to it. However, line 35 to 43 handles a special case where the target process is already dead. In this case, a death notification is sent back right away. As we can see, the driver decides whether the owning process is alive by checking whether the target binder_node still points to a binder_proc structure. Line 36 initializes a work item and enqueues it onto a todo list. If the calling thread is a Binder thread, line 38 enqueues the work item onto that thread's todo list. Otherwise it is enqueued onto the process's todo list so that a Binder thread in the process can pick it up. In the HamKing example code above, the linkToDeath method is called by a Binder thread in the server app process, so the death notification handling will be scheduled on the same Binder thread. However, if a UI thread calls linkToDeath, the death notification handling will be scheduled on some Binder thread in the process.

Let’s look at the handling of the BC_CLEAR_DEATH_NOTIFICATION command. If line 47 is true, the binder_ref_death hasn't been enqueued onto a todo list, which is the normal case. Otherwise, the remote process is dead and the driver has already scheduled the work on a todo list, but the target thread hasn't handled it yet. In the normal case, the work type is set to BINDER_WORK_CLEAR_DEATH_NOTIFICATION to indicate a successful death recipient removal, and the work is then enqueued onto a todo list. Otherwise line 56 changes the work type from BINDER_WORK_DEAD_BINDER to BINDER_WORK_DEAD_BINDER_AND_CLEAR, which is a combination of the previous two work types. In that case the work doesn't need to be scheduled since it is already on a todo list.
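The request and clear paths can be summarized as a small state machine. The C++ sketch below is a simplified model of that logic, assuming a single todo list and hypothetical names (RefModel, RefDeath, TodoList); it leaves out locking, error reporting and the choice between thread and process todo lists.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

enum class WorkType { None, DeadBinder, DeadBinderAndClear, ClearDeathNotification };

struct RefDeath {
    uintptr_t cookie = 0;
    WorkType work = WorkType::None;  // None means "not on any todo list"
};

struct RefModel {
    bool nodeAlive = true;            // stands in for the node still having a proc
    std::shared_ptr<RefDeath> death;  // at most one recipient per reference
};

using TodoList = std::vector<std::shared_ptr<RefDeath>>;

bool requestDeathNotification(RefModel& ref, uintptr_t cookie, TodoList& todo) {
    if (ref.death) return false;  // a recipient is already registered
    ref.death = std::make_shared<RefDeath>();
    ref.death->cookie = cookie;
    if (!ref.nodeAlive) {
        // Target already dead: schedule the notification right away.
        ref.death->work = WorkType::DeadBinder;
        todo.push_back(ref.death);
    }
    return true;
}

bool clearDeathNotification(RefModel& ref, uintptr_t cookie, TodoList& todo) {
    if (!ref.death || ref.death->cookie != cookie) return false;
    std::shared_ptr<RefDeath> death = std::move(ref.death);  // ref.death is now empty
    if (death->work == WorkType::None) {
        // Normal case: confirm the removal with a new work item.
        death->work = WorkType::ClearDeathNotification;
        todo.push_back(death);
    } else {
        // The death work is already queued, so only its type changes.
        assert(death->work == WorkType::DeadBinder);
        death->work = WorkType::DeadBinderAndClear;
    }
    return true;
}
```

Clearing a recipient on a live node adds one ClearDeathNotification item to the list, while clearing one whose DeadBinder work is already queued leaves the list length unchanged.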

In the corner case where the remote process is already dead when a death recipient is being registered or deregistered, a death notification is sent back immediately. In the normal case, where the target process is still alive, the driver distributes death notifications to all death recipients listening for it when it later detects the death of the target process.

How does the Binder driver detect the death of a BBinder? When a process crashes or exits, the Linux kernel needs to clean up the resources that the process uses. One of the cleanups is to call the release function hook of every file that the process has open. For /dev/binder, the function hook points to binder_release, which eventually calls binder_deferred_release:

Line 8 to 13 gets all the binder_node structures in the dying process and calls binder_node_release on each binder_node. Line 21 clears the proc field in the binder_node structure so that the binder_refs pointing to it can later tell that the binder_node is already dead. Line 24 to 32 tries to deliver a death notification to each binder_ref that points to this binder_node. If a binder_ref does have a death recipient registered, line 29 to 31 initializes a work item and enqueues it onto the referencing process's todo list.
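The fan-out performed for each node can be pictured with a short C++ sketch. The types below (ProcModel, RefModel, NodeModel) and the releaseNode function are hypothetical simplifications: a todo list is reduced to a vector of cookies, and all locking and reference counting is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ProcModel {
    std::string name;
    std::vector<int> todo;  // cookies of queued death notifications
};

struct RefModel {
    ProcModel* owner;  // the process that holds this reference
    bool hasDeath;     // whether a death recipient is registered
    int cookie;        // value handed back to user space
};

struct NodeModel {
    ProcModel* proc;              // owning process; nullptr once it is dead
    std::vector<RefModel*> refs;  // references held by other processes
};

// Marks the node dead and queues one notification per listening reference.
// Returns the number of notifications queued.
int releaseNode(NodeModel& node) {
    node.proc = nullptr;  // later checks see the node as dead
    int queued = 0;
    for (RefModel* ref : node.refs) {
        assert(ref != nullptr && ref->owner != nullptr);
        if (!ref->hasDeath) continue;  // nobody is listening on this reference
        ref->owner->todo.push_back(ref->cookie);
        ++queued;
    }
    return queued;
}
```

Only references with a registered recipient receive work, so a process that never called linkToDeath is not woken up when the remote process dies.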

We have seen how a death notification work item is scheduled. Now let’s study how the listening process handles the death notification.

Like all other kinds of work items, death notification work items are processed in binder_thread_read, in which a thread tries to read data from the Binder driver. If the work type is BINDER_WORK_CLEAR_DEATH_NOTIFICATION, it is just a confirmation that the death recipient removal succeeded, so the return code BR_CLEAR_DEATH_NOTIFICATION_DONE is sent to user space. Otherwise the remote process is dead, so BR_DEAD_BINDER is sent to user space. Line 45 to 48 writes the return command and the cookie value to the reading thread's receive data buffer.

BR_CLEAR_DEATH_NOTIFICATION_DONE is handled by just decrementing the weak reference count on the corresponding BpBinder object. BR_DEAD_BINDER indicates an actual death notification: line 10 casts the cookie back to a BpBinder and calls the sendObituary method to invoke the death notification callbacks. In addition, a BC_DEAD_BINDER_DONE command is sent to the Binder driver to tell it that the death notification has been handled in user space. We won't go into BC_DEAD_BINDER_DONE since what it does is quite trivial.

The sendObituary method invokes the binderDied callback on each DeathRecipient object. In addition, it removes the death recipient from the Binder driver.

Binder reference counting

In the article “Smart pointers” we learned about a reference counting based framework for automatic object deallocation. This framework enables you to use the sp and wp classes to reference objects that support reference counting inside a process's memory space. The Binder framework extends the concept of reference counting across process boundaries. This is necessary because Binder is an object oriented IPC mechanism where each entity that can cross process boundaries is an object that inherits IBinder. The BBinder is the target object to be remotely referenced by BpBinders in other processes. With this object oriented design, it is natural for Binder to implement a reference counting based mechanism to manage the lifetimes of Binder objects. The reference counting is implemented with the help of the Android framework's smart pointers.

There are two parts in an object referencing structure: the referencer and the target object. For example, in the smart pointer framework, the wp or sp is the referencer and the underlying object that inherits from RefBase or LightRefBase is the target object. The target object keeps the reference counters and the referencer increments or decrements the counters. There is only one link in this chain of reference, but when a referencing structure crosses process boundaries, the referencing chain gets longer. There are three links in a typical Binder referencing chain. Let's assume a BpBinder in process "A" references a BBinder in process "B":

(1) BpBinder references a kernel space binder_ref structure in the scope of process "A". Here BpBinder is the referencer and the binder_ref is the target object. So the binder_ref needs to keep the reference counters and BpBinder needs to increment and decrement the counters. BpBinder uses four command codes to change the reference counters on binder_ref: BC_INCREFS, BC_ACQUIRE, BC_DECREFS and BC_RELEASE.

(2) A binder_ref kernel structure references a target binder_node structure. Here the binder_ref is the referencer and the binder_node is the target object. Conceptually, the binder_node structure keeps the counters for this kind of reference and the binder_ref changes them. (I say "conceptually" because all this happens in the kernel, so it is somewhat blurry who keeps the counters and who changes them.) The Binder driver calls this kind of reference an "internal reference" since it is a reference inside the Binder driver.

(3) A binder_node references a BBinder object in the user space of process "B". Here the binder_node is the referencer and BBinder is the target object. BBinder enables reference counting by inheriting the RefBase class. binder_node uses four return codes to change reference counters on BBinder: BR_INCREFS, BR_ACQUIRE, BR_DECREFS and BR_RELEASE.

This three-link chain describes a stable referencing structure in Binder. I use the word “stable” because this chain exists as long as the BpBinder and BBinder are alive. There are at least two other types of referencing structures in Binder. They are all counted by the same counters in BpBinder, binder_ref, binder_node and BBinder. So at any given time, a reference counter value is the sum of the references coming from all three referencing structures.

The second referencing chain comes from an ongoing Binder transaction. It is a dynamic referencing structure in that it only contributes to the reference counters during a Binder transaction. Let’s say process “A” sends process “B” some serialized data during a transaction. While process “B” is processing this incoming BR_TRANSACTION, it holds the buffer that contains the data copied from process "A". Since this buffer may contain some flat_binder_object structures, which are basically serialized binder_refs and binder_nodes, the corresponding counters in the binder_refs and binder_nodes need to be incremented. As soon as process "B" is done with the transaction, it frees the buffer with the BC_FREE_BUFFER command code, after which these references are dropped from the corresponding binder_refs and binder_nodes. The same reference counting changes are applied when process "A" handles the BR_REPLY return code by holding and releasing the reply data buffer. While a process is handling the transaction buffer, it often wants to hold a long-lived reference to a Binder object contained in the buffer. In this case the process needs to create a BpBinder and use the four BC_* commands to get a stable reference to the remote BBinder.

The third referencing structure doesn’t have to do with Binder. We know that both BBinder and BpBinder inherit the RefBase class, so you can use a wp or sp to reference them in the local process. Binder mostly doesn't care how a process references them with wp and sp locally. If a process creates a BBinder but only uses it locally and manages it with wp and sp, that's totally fine. The Binder driver knows nothing about it as long as the process doesn't pass it through the Binder driver. For example, the LocalService in the server module of HamKing is such a BBinder. But things are a little different on the BpBinder side. When you use a wp or sp to reference a BpBinder, you are actually referencing the underlying BBinder. You can't use wp or sp to reference a BpBinder without letting the Binder driver know. In fact, referencing a BpBinder with wp or sp triggers the BC_* commands in certain cases.

Strong and weak Binders

Before we look into the internals of how Binder implements these reference counting structures, let’s talk a bit about strong and weak Binders. Strong and weak references are a common concept in object oriented programming, and Binder adopts this concept in its design. We have seen the writeStrongBinder method of the Parcel class in the "Binder data model" article:

The writeStrongBinder method serializes a strongly referenced IBinder object into the Parcel in preparation for a transaction. By calling this method, the caller tells the Binder framework that it wants to pass a strong Binder type to the remote process. Let's assume process "A" creates a BBinder and calls writeStrongBinder to serialize it. It doesn't matter how process "A" references this BBinder object locally: as long as it calls writeStrongBinder with an sp type argument, it tells the Binder framework that it wants to pass a strong type Binder for IPC purposes. When another process "B" receives the transaction buffer, it can read out the Binder object as a strong proxy with readStrongBinder, or as a weak proxy with readWeakBinder. If process "B" reads it out as a strong proxy, it can directly interact with the remote BBinder; otherwise it needs to first promote it to a strong proxy before interacting with the remote BBinder.

Accordingly, there are writeWeakBinder and readWeakBinder methods in Parcel. These methods have been removed in recent Android versions, so you can only find them in older Android releases:

If process “A” creates a BBinder and serializes it with writeWeakBinder, then a remote process "B" can only read out a weak proxy through readWeakBinder to get a wp pointing to a BpBinder. If process "B" wants to interact with the remote BBinder, it needs to first promote the weak handle to a strong handle. The promotion is done with the BC_ATTEMPT_ACQUIRE command code. Process "B" sends this command code to the Binder driver, which will eventually enqueue a BR_ATTEMPT_ACQUIRE return code for process "A" to process. Process "A" handles this BR_ATTEMPT_ACQUIRE by trying to promote the native BBinder. Process "A" then sends back a boolean value indicating whether the promotion was successful using the BC_ACQUIRE_RESULT command code. This sequence basically extends the logic of the attemptIncStrong method in RefBase across process boundaries.

As we described in the “Smart pointers” article, the onIncStrongAttempted callback is invoked when you try to promote a wp that points to a BpBinder to an sp for the first time. BpBinder then tries to promote the remote BBinder to a strong Binder through the BC_ATTEMPT_ACQUIRE command. The target process in turn tries to promote the local BBinder and returns the result.

Because of the concept of weak references in Binder, when a BBinder is serialized, the address of its weak reference object (weakref_impl) is written to the Binder driver, and this address is used as the unique key of the binder_node. This is necessary because weak references may outlive the target object, so it is possible that the BBinder is already deallocated while some remote processes are still weakly referencing it. In this case, the weakref_impl still needs to be alive for the Binder driver to manage the already deallocated BBinder.

This was an introduction to the concept of strong and weak Binders. The bummer is that weak Binder was never fully implemented in Android. The Binder code base had a blueprint for weak Binder support, and the Android native framework had code supporting it. However, the corresponding support was never implemented in the Binder driver. Specifically, the BC_ATTEMPT_ACQUIRE and BC_ACQUIRE_RESULT command codes were never implemented by the Binder driver:

Without the capability of promoting a weak Binder to a strong Binder, the weak Binder is useless. Recent Android versions have removed the readWeakBinder and writeWeakBinder code from Parcel as well. Even though weak Binder is not supported right now, it is still important to understand it because Binder was designed with it in mind.

Binder reference counting implementation

The reference counters in BpBinder and BBinder come from the RefBase base class. In addition, binder_ref and binder_node need their own reference counters to support the different reference structures described above.

The strong and weak fields in binder_ref_data record the number of strong and weak references coming from user space. In user space it is the BpBinder that manages these counts. In theory, many instances of BpBinder could be created in user space and each would contribute to the reference count. But in reality, ProcessState keeps a global cache of BpBinder instances so that at most one BpBinder instance exists for a given handle value. So in fact only one BpBinder contributes to the reference counts. In addition to the references coming from BpBinders, a Binder transaction adds to the counts if the transaction buffer contains a flat_binder_object that represents the corresponding binder_ref.

The internal_strong_refs field is the number of binder_ref structures that are strongly referencing this binder_node. The term "internal" comes from the fact that the reference comes from inside the Binder driver. Since a binder_ref indicates a reference from another process, this number actually means the number of strong references coming from other processes. The refs field is a hash list of all binder_ref structures referencing this binder_node. Since a strong reference always implies a weak reference, the size of refs is the number of weak internal references. (You can also say the weak internal count is the size of refs minus internal_strong_refs; it doesn't matter.) Since a process can have at most one binder_ref pointing to a given binder_node, internal_strong_refs and the size of refs actually indicate the number of remote processes that are referencing this binder_node strongly and weakly.

While a process is handling BR_TRANSACTION or BR_REPLY, the transaction buffer may contain flat_binder_object structures of type BINDER_TYPE_BINDER or BINDER_TYPE_WEAK_BINDER, which represent BBinder objects created in the local process. While user space is processing the transaction buffer, the local_weak_refs or local_strong_refs field needs to be incremented to record such references coming from the user space of the local process. The local reference counts are decremented after user space frees the transaction buffer with BC_FREE_BUFFER. The local_weak_refs and local_strong_refs fields only make sense during a transaction. Outside of a transaction, a process just references a local BBinder through an sp or wp, and the driver doesn't need to know about these references, let alone keep reference counts for them. For example, in the pickupOrder method of the RemoteService class in the HamKing project, an IOrderSession object that was created by the server app process is sent back from the client app process. While a Binder thread in the server app process is handling the pickupOrder request, the local_strong_refs in the binder_node structure for the IOrderSession needs to stay incremented until the server app is done with the pickupOrder call.

The has_strong_ref and has_weak_ref fields indicate whether the Binder driver has asked the corresponding BBinder to increment its strong or weak reference count. In other words, they indicate whether this binder_node is currently referencing the user space BBinder object. As long as the binder_node still has any reference count greater than zero, the binder_node needs to use the four BR_* return codes to ask the BBinder to adjust its reference counts, so as to keep the BBinder alive. If all reference counts on a binder_node drop to zero, the driver frees the binder_node and no longer cares whether the user space BBinder is alive. After the driver asks user space to increment the reference counts of the BBinder and before user space sends back a confirmation, the pending_strong_ref or pending_weak_ref flag is set.

Next let’s trace through the implementation of the stable referencing structure in Binder.

A BpBinder object is created as soon as a process receives a flat_binder_object of type BINDER_TYPE_HANDLE or BINDER_TYPE_WEAK_HANDLE. As soon as a BpBinder is created, a BC_INCREFS command is used to increment the weak reference count on the corresponding binder_ref structure. This is essential to prevent the binder_ref from being deallocated by the Binder driver. The weak count is decremented with BC_DECREFS when the BpBinder is destroyed. When the BpBinder is referenced by an sp for the first time, a BC_ACQUIRE is used to increment the strong count on the binder_ref structure. The strong count is decremented when the last sp on the BpBinder is removed. The extendObjectLifetime method is used to keep the BpBinder from being deallocated when the last strong reference on it is removed. This method was added to the smart pointer framework specifically for BpBinder and BpRefBase to handle this case.

The four reference count changing commands are handled together. BC_INCREFS and BC_ACQUIRE increment the weak and strong reference count respectively; BC_DECREFS and BC_RELEASE decrement the weak and strong reference count respectively. Line 30 to 38 handles a special case for the context manager node. Reference counting for the context manager node is a little special but similar to the normal case, so we won't talk about it separately. Basically, binder_update_ref_for_handle is used to update the reference counts.

Line 11 gets the binder_ref structure from the process's binder_proc structure. binder_inc_ref_olocked handles count incrementing and binder_dec_ref_olocked handles count decrementing. In the normal case, these two functions just change the weak and strong fields in binder_ref_data, which is what the BC_* commands are meant to do. Sometimes the reference count change needs to propagate along the chain of reference to the target binder_node. This only happens when a count value changes from 0 to 1 or from 1 to 0. When a value changes from 0 to 1, binder_inc_node is called to increment the internal reference count on the binder_node; in the 1 to 0 case, binder_dec_node is called to decrement the internal reference count. If both the strong and the weak reference count on the binder_ref drop to zero after the change, the count change not only propagates to the binder_node, but the binder_ref structure itself also needs to be cleaned up and freed.
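The edge-triggered propagation described above can be sketched in a few lines of C++. This is a simplified model with hypothetical names (NodeModel, RefModel, incRef, decRef); in particular, the weakHolders counter is an assumption standing in for the size of the node's refs list, since the driver does not keep a separate internal weak counter.

```cpp
#include <cassert>

// Counters kept on the node side of the link.
struct NodeModel {
    int internalStrong = 0;
    int weakHolders = 0;  // modelling assumption: stands in for the refs list size
};

// Counters kept on the reference side of the link.
struct RefModel {
    explicit RefModel(NodeModel* n) : node(n) {}
    NodeModel* node;
    int strong = 0;
    int weak = 0;
};

void incRef(RefModel& ref, bool strong) {
    int& count = strong ? ref.strong : ref.weak;
    int& nodeCount = strong ? ref.node->internalStrong : ref.node->weakHolders;
    if (count == 0) {
        ++nodeCount;  // 0 -> 1: the change propagates to the node
    }
    ++count;
}

// Returns true when both counts reached zero and the reference can be freed.
bool decRef(RefModel& ref, bool strong) {
    int& count = strong ? ref.strong : ref.weak;
    int& nodeCount = strong ? ref.node->internalStrong : ref.node->weakHolders;
    assert(count > 0 && "unbalanced decrement");
    --count;
    if (count == 0) {
        --nodeCount;  // 1 -> 0: the change propagates to the node
    }
    return ref.strong == 0 && ref.weak == 0;
}
```

Taking a second strong reference on the same RefModel leaves internalStrong unchanged, which is why the node's counters end up counting referencing processes rather than individual user space references.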

Both binder_dec_node_nilocked and binder_inc_node_nilocked adjust the corresponding reference count based on the internal and strong arguments. Note that in our scenario internal is set to 1 because the reference is coming from a binder_ref. Besides adjusting the reference counters in the binder_node, the reference count changes need to propagate to the user space BBinder in certain cases. Line 14 to 20 handles the case where a strong reference count is incremented on the binder_node but the driver hasn't yet asked user space to increment the strong reference count on the BBinder. Line 24 to 26 handles the similar case for the weak reference count. The way the Binder driver asks user space to adjust reference counts on the BBinder is by enqueuing the binder_node onto the todo list of the target process. The binder_work structure embedded in a binder_node has type BINDER_WORK_NODE. Similarly, in binder_dec_node_nilocked, line 56 to 61 handles the case where the total number of strong or weak references drops to zero after decrementing, by enqueuing the binder_node onto the todo list of the target thread. Line 62 to 69 handles the case where all kinds of references on the binder_node drop to zero after decrementing, by returning true so that binder_free_node is called to free the binder_node structure on line 79. As we can see, the driver asks user space to adjust the reference count on the BBinder object only when a type of reference count on the binder_node changes from 0 to 1 or from 1 to 0. This is very similar to how the reference counts on a binder_ref affect the target binder_node.

Like all other kinds of binder_work, the BINDER_WORK_NODE is processed by a Binder thread in binder_thread_read:

The way a target thread handles a BINDER_WORK_NODE is to compare the expected reference counting state of the target BBinder with its current state and, if they differ, write a corresponding BR_* command for user space to handle. The strong and weak variables indicate whether the BBinder should hold a strong reference count or a weak reference count, respectively. For example, line 29 to 33 handles the case where the weak reference count on the BBinder should be incremented but the driver never asked user space to do so. In this case a BR_INCREFS return code is written to user space so that the weak reference count on the BBinder can be incremented. The other three kinds of mismatch are handled similarly. Let's see how user space handles the reference counting related BR_* return codes:
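The comparison between the expected and the current state can be modelled as a small reconcile step. In the C++ sketch below, NodeState and reconcile are hypothetical names, strongRefs and weakRefs collapse the node's several counters into two totals, and the pending flags are left out.

```cpp
#include <cassert>
#include <vector>

enum class ReturnCode { IncRefs, Acquire, Release, DecRefs };

struct NodeState {
    int strongRefs;     // all strong references on the node, summed
    int weakRefs;       // all weak references on the node, summed
    bool hasStrongRef;  // driver already asked user space for a strong ref
    bool hasWeakRef;    // driver already asked user space for a weak ref
};

// Emits the return codes needed to bring user space in line with the node.
std::vector<ReturnCode> reconcile(NodeState& node) {
    assert(node.strongRefs >= 0 && node.weakRefs >= 0);
    const bool strong = node.strongRefs > 0;
    const bool weak = node.weakRefs > 0 || strong;  // strong implies weak
    std::vector<ReturnCode> codes;
    if (weak && !node.hasWeakRef) {
        codes.push_back(ReturnCode::IncRefs);
        node.hasWeakRef = true;
    }
    if (strong && !node.hasStrongRef) {
        codes.push_back(ReturnCode::Acquire);
        node.hasStrongRef = true;
    }
    if (!strong && node.hasStrongRef) {
        codes.push_back(ReturnCode::Release);
        node.hasStrongRef = false;
    }
    if (!weak && node.hasWeakRef) {
        codes.push_back(ReturnCode::DecRefs);
        node.hasWeakRef = false;
    }
    return codes;
}
```

Calling reconcile twice in a row without changing the counts yields nothing the second time, which mirrors the point that user space is only contacted when the expected state and the recorded state disagree.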

The BR_* return codes are handled by just incrementing or decrementing the reference counter in the BBinder object. Reference count decrements are added to mPendingStrongDerefs or mPendingWeakDerefs to be processed a little later. Decrementing is less time critical than incrementing: a delayed increment may cause the BBinder to be deallocated incorrectly, whereas a delayed decrement only keeps the object alive slightly longer. A BC_ACQUIRE_DONE or BC_INCREFS_DONE command code is sent to the Binder driver to acknowledge the completion of the reference count increment. The driver code that handles these is easy to understand, so I will not discuss it.
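The asymmetry between increments and decrements can be shown with a short C++ sketch. ThreadModel, ObjectModel and Ack are hypothetical stand-ins; the pending lists play the role of mPendingStrongDerefs and mPendingWeakDerefs, and the acknowledgements are collected in a vector instead of being written to the driver.

```cpp
#include <cassert>
#include <vector>

enum class ReturnCode { IncRefs, Acquire, Release, DecRefs };
enum class Ack { IncRefsDone, AcquireDone };

// Stand-in for a local object with strong and weak counters.
struct ObjectModel {
    int strong = 0;
    int weak = 0;
};

class ThreadModel {
public:
    void handle(ReturnCode code, ObjectModel* obj) {
        assert(obj != nullptr);
        switch (code) {
            case ReturnCode::IncRefs:  // applied immediately, then acknowledged
                ++obj->weak;
                acks_.push_back(Ack::IncRefsDone);
                break;
            case ReturnCode::Acquire:  // applied immediately, then acknowledged
                ++obj->strong;
                acks_.push_back(Ack::AcquireDone);
                break;
            case ReturnCode::Release:  // deferred
                pendingStrongDerefs_.push_back(obj);
                break;
            case ReturnCode::DecRefs:  // deferred
                pendingWeakDerefs_.push_back(obj);
                break;
        }
    }

    // Applies the deferred decrements once the thread has time for them.
    void processPendingDerefs() {
        for (ObjectModel* obj : pendingWeakDerefs_) --obj->weak;
        pendingWeakDerefs_.clear();
        for (ObjectModel* obj : pendingStrongDerefs_) --obj->strong;
        pendingStrongDerefs_.clear();
    }

    const std::vector<Ack>& acks() const { return acks_; }

private:
    std::vector<Ack> acks_;
    std::vector<ObjectModel*> pendingStrongDerefs_;
    std::vector<ObjectModel*> pendingWeakDerefs_;
};
```

After a Release is handled, the object's strong count is still unchanged until processPendingDerefs runs.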

We have traced the sequence of reference counting propagation through the stable referencing structure. Next we are going to look at how a Binder transaction contributes to reference counting temporarily. In the “Binder transaction” article, we saw how the Binder driver implements special handling for live objects in the transaction buffer. However, we skipped the reference counting part of it. Now it’s time to look at it again.

As a reminder, binder_translate_binder is called to do special handling on a serialized BBinder that is being sent to another process. The node variable points to the binder_node structure for this BBinder. The target_proc variable points to the binder_proc structure of the transaction's target process. The binder_inc_ref_for_node function finds a binder_ref structure in the target process that references this binder_node and increments the reference count on the binder_ref.

The binder_translate_handle function is called when a serialized BpBinder is about to be sent to another process during a transaction. If the target process is the process where the underlying BBinder was created, binder_inc_node_nilocked is called to increment the local reference count on the binder_node. Note that the internal argument is set to 0 so that local_weak_refs or local_strong_refs is incremented. If the target process is not the process where the underlying BBinder was created, the live object in the transaction buffer remains a handle. So binder_inc_ref_for_node is called to increment the reference count on the binder_ref structure for the corresponding live object in the buffer.

The reference count increments make sense because these Binder objects are serialized in a transaction buffer and copied to a remote process, and the remote process holds on to the buffer during the transaction. However, as soon as the target process releases the transaction buffer, the increments are reverted.

We have seen how BR_TRANSACTION is handled in the "Binder transaction" article, but now let's look at it from another angle. Line 17 creates a temporary Parcel and calls the ipcSetDataReference method to let the temporary Parcel point to the transaction buffer copied from the source process. Note the braces on line 13 and 38: they confine the scope of this temporary Parcel and ensure that its destructor is called. Also note that a function pointer, freeBuffer, is passed into ipcSetDataReference.

As we just described, line 6 sets the mOwner function hook to freeBuffer, so when the Parcel is destroyed, it sends BC_FREE_BUFFER to free the kernel transaction buffer.
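The scoping trick relies on ordinary C++ destructor semantics. The sketch below shows the same pattern with a hypothetical BufferView class whose setDataReference method takes a release callback; it is an illustration of the idea, not the Parcel API, and the callback merely stands in for sending BC_FREE_BUFFER.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a Parcel that borrows a driver-owned buffer.
class BufferView {
public:
    using ReleaseFn = std::function<void(const void*)>;

    BufferView() = default;
    BufferView(const BufferView&) = delete;
    BufferView& operator=(const BufferView&) = delete;
    ~BufferView() { release(); }

    // Points the view at borrowed memory and remembers how to give it back.
    void setDataReference(const void* data, ReleaseFn owner) {
        assert(data != nullptr);
        release();  // give back any buffer held previously
        data_ = data;
        owner_ = std::move(owner);
    }

    const void* data() const { return data_; }

private:
    void release() {
        if (owner_) {
            owner_(data_);  // the real hook would queue a free-buffer command
            owner_ = nullptr;
        }
        data_ = nullptr;
    }

    const void* data_ = nullptr;
    ReleaseFn owner_;
};
```

Declaring a BufferView inside a brace-delimited block guarantees that the release callback runs exactly once when control leaves the block, including on early returns.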

The binder_free_buf function frees the transaction buffer managed by a binder_buffer structure. binder_alloc_free_buf returns this buffer to the process's free buffer list and returns physical pages to the Linux kernel. binder_transaction_buffer_release does all the work of reverting the reference counts.

This function walks through the transaction buffer and locates all live objects in it. If a serialized BBinder is found, line 37 calls binder_dec_node to revert the local reference count on the corresponding binder_node. Similarly, if a serialized BpBinder is found, line 46 calls binder_dec_ref_for_handle to revert the reference count on the corresponding binder_ref. After the transaction buffer is released, all reference counts contributed by this transaction have been reverted.
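The symmetry between delivering a buffer and releasing it can be captured in a small C++ model. CountsModel, FlatObject, retainForTransaction and releaseBuffer are hypothetical names, the two maps stand in for the counters on binder_node and binder_ref, and only strong counts are modelled.

```cpp
#include <cassert>
#include <map>
#include <vector>

enum class ObjectType { Binder, Handle };

// One live object found in a transaction buffer.
struct FlatObject {
    ObjectType type;
    int key;  // hypothetical node id for Binder, handle value for Handle
};

struct CountsModel {
    std::map<int, int> nodeLocalStrong;  // per-node local strong count
    std::map<int, int> refStrong;        // per-handle strong count
};

// Applies the same delta to the counter behind every live object.
static void adjust(CountsModel& counts, const std::vector<FlatObject>& objects,
                   int delta) {
    for (const FlatObject& obj : objects) {
        int& count = (obj.type == ObjectType::Binder)
                         ? counts.nodeLocalStrong[obj.key]
                         : counts.refStrong[obj.key];
        count += delta;
        assert(count >= 0 && "released more often than retained");
    }
}

// Models the increments made while the buffer is delivered to the target.
void retainForTransaction(CountsModel& counts,
                          const std::vector<FlatObject>& objects) {
    adjust(counts, objects, +1);
}

// Models the walk done when the target frees the buffer.
void releaseBuffer(CountsModel& counts, const std::vector<FlatObject>& objects) {
    adjust(counts, objects, -1);
}
```

A reference that already had a stable count of 1 goes to 2 for the duration of the transaction and returns to 1 once releaseBuffer has run, so the transaction leaves no trace in the counters.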

External links

[1] https://github.com/androidonekb/HamKing