Binder Lifecycle Management
Death notification
Death notification is a Binder mechanism that lets a BpBinder get notified when the BBinder it references dies. I briefly touched on this topic in the "Binder architecture and core components" article. In this article we are going to take a deeper look at its implementation. When we say a BBinder dies, we mean that the process that created this BBinder has terminated, either by crashing unexpectedly or by exiting intentionally. For example, in the server module of the HamKing project [1], the requestOrder method in RemoteService uses the linkToDeath method to register a death listener for the ICreditCard so that it can remove any order that was "purchased" with this ICreditCard instance:
The ICreditCard instance is created in the client app process before requestOrder is called and is then passed to the server app process. The server app process therefore holds a proxy of ICreditCard, and the asBinder method returns a BinderProxy object.
The DeathRecipient interface contains a single callback method, binderDied, which the Binder framework invokes when the remote BBinder dies. The linkToDeath method is a native method that calls the corresponding linkToDeath method on its peer BpBinder:
mObitsSent is a flag that gets set once this BpBinder receives a death notification. The flag is never cleared: when the remote process dies, it dies for good, and so does the BBinder this proxy references. There is no concept of reviving a BBinder. So if the flag is set, the method just returns DEAD_OBJECT, which causes the JNI layer to throw a DeadObjectException in Java code. Otherwise, the recipient is wrapped inside an Obituary object and added to the mObituaries list. If this is the first death listener to register, the requestDeathNotification and flushCommands methods in IPCThreadState are called to register a death recipient with the Binder driver. As we can see, no matter how many user space death listeners are registered through linkToDeath, the Binder framework registers a death recipient with the driver only once.
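To make the bookkeeping above concrete, here is a minimal, self-contained sketch. It is not the framework's code: ProxySketch, simulateDeath, driverRegistrations and the numeric status values are hypothetical names invented for this illustration. It only mirrors the two behaviours described in the text, namely failing fast once the obituary has been sent and contacting the driver only for the first recipient.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Placeholder status values for this sketch only.
enum Status { OK = 0, DEAD_OBJECT = -1 };

// Hypothetical stand-in for a proxy's death-recipient bookkeeping.
class ProxySketch {
public:
    using Recipient = std::function<void()>;

    Status linkToDeath(Recipient recipient) {
        if (obitsSent_) {
            return DEAD_OBJECT;  // the remote object already died; it never comes back
        }
        if (obituaries_.empty()) {
            // First listener: this is where the real code would ask the driver
            // for a death notification. Later listeners skip this step.
            ++driverRegistrations_;
        }
        obituaries_.push_back(std::move(recipient));
        return OK;
    }

    // Simulates the arrival of a death notification from the driver.
    void simulateDeath() {
        obitsSent_ = true;
        std::vector<Recipient> pending;
        pending.swap(obituaries_);
        for (auto& notify : pending) {
            notify();
        }
    }

    int driverRegistrations() const { return driverRegistrations_; }

private:
    bool obitsSent_ = false;
    int driverRegistrations_ = 0;
    std::vector<Recipient> obituaries_;
};
```

Registering two recipients on one ProxySketch leaves driverRegistrations() at 1, and any linkToDeath call made after simulateDeath() returns DEAD_OBJECT.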
A BC_REQUEST_DEATH_NOTIFICATION command code is written to the thread's output buffer. The handle parameter identifies the target BBinder, and the memory address of the BpBinder is used as a cookie so that IPCThreadState can find the corresponding BpBinder when a death notification is received from the Binder driver. The flushCommands method triggers an ioctl system call that delivers the command to the driver, which then registers the death recipient. On the other hand, unlinkToDeath is used to clear a previously registered death notification, and the corresponding method in IPCThreadState is clearDeathNotification:
The only difference is the command code. Next we will jump to the relevant code in the Binder driver. Please read through the previous articles in this series if you find it hard to follow.
BC_CLEAR_DEATH_NOTIFICATION and BC_REQUEST_DEATH_NOTIFICATION are handled together because their data contracts are the same. (They both need a handle value and the address of the BpBinder.) The binder_ref_death structure represents a death recipient in the Binder driver. It has an embedded binder_work so that it can be enqueued onto a todo list. The cookie is used by user space to identify the death notification receiver; the driver stores it and passes it back without interpreting it.
binder_get_ref returns the corresponding binder_ref structure using the handle value. In short, all that BC_REQUEST_DEATH_NOTIFICATION does is create a binder_ref_death structure and let the corresponding binder_ref point to it. However, lines 35 to 43 handle a special case where the target process is already dead. In this case, a death notification is sent back right away. As we can see, the Binder driver decides whether a process is alive by checking whether the target binder_node still points to a binder_proc structure. Line 36 initializes a work item to be enqueued onto a todo list. If the calling thread is a Binder thread, line 38 enqueues the work item onto that thread's todo list. Otherwise it is enqueued onto the process's todo list so that a Binder thread in the process can pick it up. In the HamKing example code above, the linkToDeath method is called by a Binder thread in the server app process, so the death notification handling is scheduled on the same Binder thread. However, if a UI thread calls linkToDeath, the death notification handling is scheduled on some Binder thread in the process.
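The registration path described above can be sketched in user-space C++ as follows. This is a simplified model, not the kernel code: Node, Ref, RefDeath, Caller and requestDeathNotification are hypothetical names, the ownerAlive flag stands in for "the binder_node still points to a binder_proc", and the rule that a ref holds at most one recipient is an assumption of the sketch.

```cpp
#include <cstdint>
#include <deque>
#include <memory>

struct Work { std::uintptr_t cookie; };   // a queued "dead binder" work item

struct Node { bool ownerAlive; };         // models binder_node->proc being set or cleared

struct RefDeath {                         // models binder_ref_death
    std::uintptr_t cookie = 0;
    bool queued = false;
};

struct Ref {                              // models binder_ref
    Node* node = nullptr;
    std::unique_ptr<RefDeath> death;
};

struct Caller {                           // the thread issuing the command
    bool isBinderThread = false;
    std::deque<Work> threadTodo;
    std::deque<Work> procTodo;
};

// Returns false if this ref already has a recipient (sketch assumption).
bool requestDeathNotification(Ref& ref, std::uintptr_t cookie, Caller& caller) {
    if (ref.death) {
        return false;
    }
    ref.death = std::make_unique<RefDeath>();
    ref.death->cookie = cookie;

    if (!ref.node->ownerAlive) {
        // Special case: the target is already dead, so notify right away.
        ref.death->queued = true;
        std::deque<Work>& todo =
            caller.isBinderThread ? caller.threadTodo : caller.procTodo;
        todo.push_back(Work{cookie});
    }
    return true;
}
```

When the node is alive, nothing is queued and the recipient simply waits. When it is dead, the work lands on the caller's own queue if the caller is a Binder thread, and on the process queue otherwise.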
Let’s look at the handling of the BC_CLEAR_DEATH_NOTIFICATION command. If the condition on line 47 is true, the binder_ref_death has not been enqueued onto a todo list, which is the normal case. Otherwise, the remote process is dead and the driver has already scheduled the work on a todo list, but the target thread hasn't handled it yet. In the normal case, the work type is set to BINDER_WORK_CLEAR_DEATH_NOTIFICATION to indicate a successful death recipient removal, and the work is then enqueued onto a todo list. Otherwise line 56 changes the work type from BINDER_WORK_DEAD_BINDER to BINDER_WORK_DEAD_BINDER_AND_CLEAR, which is a combination of the previous two work types. The work doesn't need to be scheduled again since it is already on a todo list.
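The two branches of the clear path boil down to a small state transition, sketched below. DeathWork and clearDeathNotification are hypothetical names for this illustration; the enum values merely echo the work types named in the text.

```cpp
enum WorkType {
    WORK_NONE,
    WORK_DEAD_BINDER,
    WORK_CLEAR_DEATH_NOTIFICATION,
    WORK_DEAD_BINDER_AND_CLEAR
};

struct DeathWork {
    WorkType type = WORK_NONE;
    bool queued = false;   // true once the item sits on some todo list
};

// Returns true if the caller still has to enqueue the work item.
bool clearDeathNotification(DeathWork& work) {
    if (!work.queued) {
        // Normal case: nothing pending, so queue a plain "clear done" item.
        work.type = WORK_CLEAR_DEATH_NOTIFICATION;
        work.queued = true;
        return true;
    }
    // The death was already queued but not yet read: fold the clear into it.
    work.type = WORK_DEAD_BINDER_AND_CLEAR;
    return false;
}
```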
In the corner case where the remote process is already dead when a death recipient is being registered or deregistered, a death notification is sent back immediately. But in the normal case where the target process is still alive, the driver distributes death notifications to all death recipients listening for it once it later detects the death of the target process.
How does the Binder driver detect the death of a BBinder? When a process crashes or exits, the Linux kernel needs to clean up the resources that the process uses. One of the cleanup steps is to call the release function hook of every file that the process has open. For /dev/binder, the function hook points to binder_release, which eventually calls binder_deferred_release:
Lines 8 to 13 get all the binder_node structures in the dying process and call binder_node_release on each binder_node. Line 21 clears the proc field in the binder_node structure so that the binder_refs pointing to it can later tell that the binder_node is already dead. Lines 24 to 32 try to deliver a death notification to each binder_ref that points to this binder_node. If a binder_ref does have a death recipient registered, lines 29 to 31 initialize a work item and enqueue it onto the referencing process's todo list.
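The release walk described above can be modelled in a few lines. This is a user-space sketch with hypothetical types (Proc, Ref, Node, releaseNode), not the driver's code; a todo list is reduced to a vector of cookies.

```cpp
#include <cstdint>
#include <vector>

struct Proc {                        // a process that holds references
    std::vector<std::uintptr_t> todo;   // cookies of queued death notifications
};

struct Ref {                         // one process's reference to the node
    Proc* owner;
    bool hasDeathRecipient;
    std::uintptr_t cookie;
};

struct Node {
    bool ownerAlive = true;          // models the proc field being non-null
    std::vector<Ref*> refs;
};

// Marks the node dead and notifies every ref that registered a recipient.
// Returns the number of notifications queued.
int releaseNode(Node& node) {
    node.ownerAlive = false;
    int delivered = 0;
    for (Ref* ref : node.refs) {
        if (!ref->hasDeathRecipient) {
            continue;                // this process never asked to be told
        }
        ref->owner->todo.push_back(ref->cookie);
        ++delivered;
    }
    return delivered;
}
```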
We have seen how a death notification work item is scheduled. Now let’s study how the listening process handles the death notification.
Like all other kinds of work items, death notification work items are processed in binder_thread_read, in which a thread reads data from the Binder driver. If the work type is BINDER_WORK_CLEAR_DEATH_NOTIFICATION, it merely signals that the death recipient was removed successfully, so the return code BR_CLEAR_DEATH_NOTIFICATION_DONE is sent to user space. Otherwise the remote process is dead, so BR_DEAD_BINDER is sent to user space. Lines 45 to 48 write the return command and the cookie value to the reading thread's receive data buffer.
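As a rough illustration of that mapping, the sketch below picks a return code from the work type and appends the code and the cookie to an output buffer. The enum values and the function name writeDeathWork are placeholders chosen for this example; they are not the real protocol constants.

```cpp
#include <cstdint>
#include <vector>

enum WorkType {
    WORK_DEAD_BINDER,
    WORK_DEAD_BINDER_AND_CLEAR,
    WORK_CLEAR_DEATH_NOTIFICATION
};

// Placeholder numeric values; the real return codes are defined by the driver.
enum ReturnCode : std::uint32_t {
    RC_DEAD_BINDER = 1,
    RC_CLEAR_DEATH_NOTIFICATION_DONE = 2
};

// Appends [return code, cookie] to the reader's buffer.
void writeDeathWork(WorkType type, std::uint64_t cookie,
                    std::vector<std::uint64_t>& out) {
    const ReturnCode code = (type == WORK_CLEAR_DEATH_NOTIFICATION)
                                ? RC_CLEAR_DEATH_NOTIFICATION_DONE
                                : RC_DEAD_BINDER;
    out.push_back(code);
    out.push_back(cookie);
}
```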
BR_CLEAR_DEATH_NOTIFICATION_DONE is handled by just decrementing the weak reference count on the corresponding BpBinder object. BR_DEAD_BINDER indicates an actual death notification: line 10 casts the cookie back to a BpBinder and calls the sendObituary method to invoke the death notification callbacks. In addition, a BC_DEAD_BINDER_DONE command is sent to the Binder driver to tell it that the death notification has been handled in user space. We won't go into BC_DEAD_BINDER_DONE since what it does is quite trivial.
The sendObituary method invokes the binderDied callback on each DeathRecipient object. It also removes the death recipient from the Binder driver.
Binder reference counting
In the article “Smart pointers” we learned about a reference counting based framework for automatic object deallocation. This framework lets you use the sp and wp classes to reference reference-counted objects inside a process's memory space. The Binder framework extends the concept of reference counting across process boundaries. This is necessary because Binder is an object oriented IPC mechanism in which each entity that can cross process boundaries is an object that inherits IBinder. The BBinder is the target object to be remotely referenced by BpBinders in other processes. With this object oriented design, it is natural for Binder to implement a reference counting based mechanism to manage the lifetimes of Binder objects. The reference counting is implemented with the help of the Android framework's smart pointers.
There are two parts in an object referencing structure: the referencer and the target object. For example, in the smart pointer framework, the wp or sp is the referencer and the underlying object that inherits from RefBase or LightRefBase is the target object. The target object keeps the reference counters and the referencer increments or decrements the counters. There is only one link in this chain of references, but when a referencing structure crosses process boundaries, the chain gets longer. There are three links in a typical Binder referencing chain. Let's assume a BpBinder in process "A" references a BBinder in process "B":
(1) BpBinder
references a kernel space binder_ref
structure in the scope of process "A". Here BpBinder
is the referencer and the binder_ref
is the target object. So the binder_ref
needs to keep the reference counters and BpBinder
needs to increment and decrement the counters. BpBinder
uses four command codes to change the reference counters on binder_ref: BC_INCREFS, BC_ACQUIRE, BC_DECREFS and BC_RELEASE.
(2) A binder_ref
kernel structure references a target binder_node
structure. Here the binder_ref
is the referencer and binder_node
is the target object. The binder_node
structure needs to keep counters of this kind of reference and binder_ref
changes the counters, conceptually. (I say "conceptually" because all this happens in the kernel, so it might be vague to define who keeps the counters and who changes the counters.) Binder driver calls this kind of reference "internal reference" since this is a reference inside Binder driver.
(3) A binder_node
references a BBinder
object in the user space of process "B". Here the binder_node
is the referencer and BBinder
is the target object. BBinder
enables reference counting by inheriting RefBase
class. binder_node
uses four return codes to change reference counters on BBinder: BR_INCREFS, BR_ACQUIRE, BR_DECREFS and BR_RELEASE.
This three-link chain describes a stable referencing structure in Binder. I use the word “stable” because this chain exists as long as the BpBinder and BBinder are alive. There are at least two other types of referencing structures in Binder. They are all counted by the same counters in BpBinder, binder_ref, binder_node and BBinder. So at any given time, a reference counter value is the sum of the references coming from all three referencing structures.
The second referencing chain comes from an ongoing Binder transaction. It is a dynamic referencing structure in that it only contributes to the reference counters during a Binder transaction. Let’s say process “A” sends process “B” some serialized data during a transaction. While process “B” is processing this incoming BR_TRANSACTION, it holds the buffer that contains the data copied from process "A". Since this buffer may contain some flat_binder_object structures, which are basically serialized binder_refs and binder_nodes, the corresponding counters in those binder_refs and binder_nodes need to be incremented. As soon as process "B" is done with the transaction, it frees the buffer with the BC_FREE_BUFFER command code, after which the references contributed by the buffer are removed from the corresponding binder_refs and binder_nodes. The same reference counting changes are applied when process "A" handles the BR_REPLY return code by holding and releasing the reply data buffer. While a process is handling the transaction buffer, it often wants to keep a long-lived reference to a Binder object contained in the buffer. In this case the process needs to create a BpBinder and use the four BC_* commands to get a stable reference to the remote BBinder.
The third referencing structure has little to do with Binder itself. We know that both BBinder and BpBinder inherit the RefBase class, so you can use wp and sp to reference them in the local process. Binder mostly doesn't care how a process references them with wp and sp locally. If a process creates a BBinder but only uses it locally and manages it with wp and sp, that's totally fine. The Binder driver knows nothing about the object as long as the process doesn't pass it through the Binder driver. For example, the LocalService in the server module of HamKing is such a BBinder. But things are a little different on the BpBinder side. When you use a wp or sp to reference a BpBinder, you are actually referencing the underlying BBinder. You can't use wp or sp to reference a BpBinder without letting the Binder driver know. In fact, referencing a BpBinder with wp or sp triggers the BC_* commands in certain cases.
Strong and weak Binders
Before we look into the internals of how Binder implements these reference counting structures, let’s talk a bit about strong and weak Binders. Strong and weak references are a common concept in object oriented programming, and Binder adopts this concept in its design. We have seen the writeStrongBinder method in the Parcel class in the "Binder data model" article:
The writeStrongBinder method serializes a strongly referenced IBinder object into the Parcel in preparation for a transaction. By calling this method, the caller tells the Binder framework that it wants to pass a strong Binder type to the remote process. Let's assume process "A" creates a BBinder and calls writeStrongBinder to serialize it. It doesn't matter how process "A" references this BBinder object locally: as long as it calls writeStrongBinder with an sp type argument, it tells the Binder framework that it wants to pass a strong type Binder for IPC purposes. When another process "B" receives the transaction buffer, it can read out the Binder object as a strong proxy with readStrongBinder, or as a weak proxy with readWeakBinder. If process "B" reads it out as a strong proxy, it can directly interact with the remote BBinder. Otherwise it first needs to promote the weak proxy to a strong proxy before interacting with the remote BBinder.
Accordingly, there are writeWeakBinder and readWeakBinder methods in Parcel. These methods have been removed in recent Android versions, so you can only find them in older Android releases:
If process “A” creates a BBinder
and serializes it with writeWeakBinder
, then a remote process "B" can only read out a weak proxy through readWeakBinder
to get a wp
pointing to a BpBinder
. If process "B" wants to interact with the remote BBinder
, it needs to first promote the weak handle to a strong handle. The promotion is done by the BC_ATTEMPT_ACQUIRE
command code. Process "B" sends this command code to Binder driver which will eventually enqueue a BR_ATTEMPT_ACQUIRE
return code for process "A" to process. Process "A" handles this BR_ATTEMPT_ACQUIRE
by trying to promote the native BBinder
. Process "A" will then send back a boolean value indicating whether the promotion is successful using the BC_ACQUIRE_RESULT
command code. This sequence is basically expanding the logic of attemptIncStrong
method in RefBase
across process boundaries.
As we described in the “Smart pointers” article, the onIncStrongAttempted
callback will be invoked if you are trying to promote a wp
to sp
that points to a BpBinder
for the first time. BpBinder
will then try to promote the remote BBinder
to a strong Binder through BC_ATTEMPT_ACQUIRE
command. The target process will then try to promote the local BBinder
and return the result.
Due to the concept of weak references in Binders, when a BBinder
is serialized, the address of the weak reference type weakref_impl
will be written to Binder driver and this address is used as the unique key of binder_node
. This is necessary because weak references may outlive the target object so it is possible that the BBinder
is already deallocated when some remote processes are still weakly referencing it. In this case, the weakref_impl
still needs to be alive for Binder driver to manage the already deallocated BBinder
.
That concludes the introduction to the concept of strong and weak Binders. Unfortunately, weak Binder support was never completed in Android. The Binder code base had a blueprint for weak Binder support, and the Android native framework had code supporting it. However, the corresponding support was never implemented in the Binder driver. Specifically, the BC_ATTEMPT_ACQUIRE and BC_ACQUIRE_RESULT command codes were never implemented by the Binder driver:
Without the ability to promote a weak Binder to a strong Binder, the weak Binder is useless. Recent Android versions have removed the readWeakBinder and writeWeakBinder code from Parcel as well. Even though weak Binder is not supported right now, it is still important to understand it because Binder was designed with it in mind.
Binder reference counting implementation
The reference counters in BpBinder and BBinder come from the RefBase base class. In addition, binder_ref and binder_node need reference counters of their own to support the different referencing structures described above.
The strong and weak fields in binder_ref_data record the number of strong and weak references coming from user space. In user space it is the BpBinder that manages these counts. In theory, many instances of BpBinder could be created in user space and each would contribute to the reference count. But in practice, ProcessState keeps a process-wide cache of BpBinder instances so that at most one BpBinder instance exists for a given handle value. So in fact only one BpBinder contributes to the reference counts. In addition to the references coming from BpBinders, a Binder transaction adds to the counts if the transaction buffer contains a flat_binder_object that represents the corresponding binder_ref.
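The "at most one proxy per handle" rule can be illustrated with a small cache keyed by handle. This is only a sketch of the idea using the standard library's shared_ptr and weak_ptr in place of Android's sp and wp; ProxyCache, proxyForHandle and createdCount are hypothetical names, not the ProcessState API.

```cpp
#include <map>
#include <memory>

struct Proxy {
    explicit Proxy(int h) : handle(h) {}
    int handle;
};

// Hypothetical per-process cache: at most one live Proxy per handle.
class ProxyCache {
public:
    std::shared_ptr<Proxy> proxyForHandle(int handle) {
        auto it = entries_.find(handle);
        if (it != entries_.end()) {
            if (std::shared_ptr<Proxy> alive = it->second.lock()) {
                return alive;            // reuse the existing proxy
            }
        }
        // No live proxy for this handle: create one and remember it weakly,
        // so the cache itself does not keep the proxy alive.
        std::shared_ptr<Proxy> fresh = std::make_shared<Proxy>(handle);
        entries_[handle] = fresh;
        ++created_;
        return fresh;
    }

    int createdCount() const { return created_; }

private:
    std::map<int, std::weak_ptr<Proxy>> entries_;
    int created_ = 0;
};
```

Because every lookup of the same handle returns the same object while it is alive, only that one object ever needs to report reference count changes to the driver.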
The internal_strong_refs field is the number of binder_ref structures that strongly reference this binder_node. The term "internal" comes from the fact that the reference comes from inside the Binder driver. Since a binder_ref represents a reference from another process, this number is effectively the number of strong references coming from other processes. The refs field is a list of all binder_ref structures referencing this binder_node. Since a strong reference always implies a weak reference, the size of refs is the number of weak internal references. (You could also say the weak internal count is the size of refs minus internal_strong_refs; it doesn't matter.) Since a process can have at most one binder_ref pointing to a given binder_node, internal_strong_refs and the size of refs indicate the number of remote processes that reference this binder_node strongly and weakly. While a process is handling BR_TRANSACTION or BR_REPLY, the transaction buffer may contain flat_binder_object structures of type BINDER_TYPE_BINDER or BINDER_TYPE_WEAK_BINDER, which represent BBinder objects created in the local process. While user space is processing the transaction buffer, the local_strong_refs or local_weak_refs field needs to be incremented to record such references coming from the user space of the local process. The local reference counts are decremented after user space frees the transaction buffer with BC_FREE_BUFFER. The local_weak_refs and local_strong_refs fields only make sense during a transaction. Outside a transaction, a process just references a local BBinder through an sp or wp, and the driver does not need to know about these references, let alone keep counts for them. For example, in the pickupOrder method of the RemoteService class in the HamKing project, an IOrderSession object that was created by the server app process is sent back from the client app process. While a Binder thread in the server app process is handling the pickupOrder request, local_strong_refs in the binder_node structure for the IOrderSession stays incremented until the server app is done with the pickupOrder call.
The has_strong_ref and has_weak_ref fields indicate whether the Binder driver has asked the corresponding BBinder to increment its strong or weak reference count. In other words, they indicate whether this binder_node is referencing the user space BBinder object. As long as the binder_node still has any reference count greater than zero, it needs to use the four BR_* return codes to ask the BBinder to increment its reference counts and thereby keep the BBinder alive. If all reference counts on a binder_node drop to zero, the driver frees the binder_node and no longer cares whether the user space BBinder is alive. After the driver asks user space to increment the reference counts of the BBinder and before user space sends back a confirmation, the pending_strong_ref or pending_weak_ref flag is set.
Next let’s trace through the implementation of the stable referencing structure in Binder.
A BpBinder object is created as soon as a process receives a flat_binder_object with type BINDER_TYPE_HANDLE or BINDER_TYPE_WEAK_HANDLE. As soon as a BpBinder is created, a BC_INCREFS command is used to increment the weak reference count on the corresponding binder_ref structure. This is essential to prevent the binder_ref from being deallocated by the Binder driver. The weak count is decremented with BC_DECREFS when the BpBinder is destructed. When the BpBinder is referenced by an sp for the first time, BC_ACQUIRE is used to increment the strong count on the binder_ref structure. The strong count is decremented when the last sp on the BpBinder is removed. The extendObjectLifetime method is used to keep the BpBinder from being deallocated when the last strong reference to it is removed. This method was added to the smart pointer framework specifically for BpBinder and BpRefBase to handle this case.
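The order in which a proxy sends these commands can be sketched by recording them in a log. ProxyLifecycleSketch is a hypothetical class for illustration only, and it simplifies the real behaviour by sending BC_ACQUIRE on every transition of the strong count from 0 to 1.

```cpp
#include <string>
#include <vector>

// Hypothetical proxy that logs the commands it would send to the driver.
class ProxyLifecycleSketch {
public:
    explicit ProxyLifecycleSketch(std::vector<std::string>& sent) : sent_(sent) {
        sent_.push_back("BC_INCREFS");       // weak count taken on creation
    }
    ~ProxyLifecycleSketch() {
        sent_.push_back("BC_DECREFS");       // weak count dropped on destruction
    }
    ProxyLifecycleSketch(const ProxyLifecycleSketch&) = delete;
    ProxyLifecycleSketch& operator=(const ProxyLifecycleSketch&) = delete;

    void incStrong() {
        if (strong_++ == 0) {
            sent_.push_back("BC_ACQUIRE");   // first strong reference
        }
    }
    void decStrong() {
        if (--strong_ == 0) {
            sent_.push_back("BC_RELEASE");   // last strong reference gone
        }
    }

private:
    std::vector<std::string>& sent_;
    int strong_ = 0;
};
```

Taking and dropping two strong references inside the proxy's lifetime produces exactly four commands: BC_INCREFS, BC_ACQUIRE, BC_RELEASE, BC_DECREFS.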
The four reference count changing commands are handled together. BC_INCREFS and BC_ACQUIRE increment the weak and strong reference counts respectively; BC_DECREFS and BC_RELEASE decrement the weak and strong reference counts respectively. Lines 30 to 38 handle a special case for the context manager node. Reference counting for the context manager node is a little special but similar to the normal case, so we won't talk about it separately. Basically, binder_update_ref_for_handle is used to update the reference counts.
Line 11 gets the binder_ref structure from the process's binder_proc structure. binder_inc_ref_olocked handles count increments and binder_dec_ref_olocked handles count decrements. In normal cases, these two functions just change the weak and strong fields in binder_ref_data, which is what the BC_* commands are meant to do. Sometimes the reference count changes need to propagate along the chain of references to the target binder_node. This only happens when a count value changes from 0 to 1 or from 1 to 0. When a value changes from 0 to 1, binder_inc_node is called to increment the internal reference count on the binder_node; in the 1 to 0 case, binder_dec_node is called to decrement the internal reference count. If both the strong and the weak reference count on the binder_ref drop to zero after the change, the count change not only propagates to the binder_node, but the binder_ref structure itself also needs to be cleaned up and freed.
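The "propagate only on 0 to 1 and 1 to 0" rule can be captured in a short sketch. NodeSketch, RefSketch, incRef and decRef are hypothetical names. As a simplification, the node keeps an explicit internalWeakRefs counter here, whereas the text explains that the driver derives the weak number from the size of refs.

```cpp
#include <cassert>

struct NodeSketch {
    int internalStrongRefs = 0;
    int internalWeakRefs = 0;   // simplification: modelled as a plain counter
};

struct RefSketch {
    int strong = 0;
    int weak = 0;
    NodeSketch* node = nullptr;
};

void incRef(RefSketch& ref, bool strong) {
    int& count = strong ? ref.strong : ref.weak;
    if (count == 0) {
        // 0 -> 1: this ref now holds the node, so tell the node once.
        int& nodeCount = strong ? ref.node->internalStrongRefs
                                : ref.node->internalWeakRefs;
        ++nodeCount;
    }
    ++count;
}

// Returns true when both counts reached zero and the ref should be freed.
bool decRef(RefSketch& ref, bool strong) {
    int& count = strong ? ref.strong : ref.weak;
    assert(count > 0);
    --count;
    if (count == 0) {
        // 1 -> 0: this ref no longer holds the node.
        int& nodeCount = strong ? ref.node->internalStrongRefs
                                : ref.node->internalWeakRefs;
        --nodeCount;
    }
    return ref.strong == 0 && ref.weak == 0;
}
```

However many strong references pile up on one RefSketch, the node sees a single internal strong reference from it, which matches the idea that the node counts referencing processes rather than individual references.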
Both the binder_dec_node_nilocked and binder_inc_node_nilocked functions adjust the corresponding reference count based on the internal and strong arguments. Note that in our scenario internal is set to 1 because the reference comes from a binder_ref. Besides adjusting the reference counters in the binder_node, the reference count changes need to propagate to the user space BBinder in certain cases. Lines 14 to 20 handle the case where a strong reference count is incremented on the binder_node but the driver hasn't yet asked user space to increment the strong reference count on the BBinder. Lines 24 to 26 handle the similar case for the weak reference count. The way the Binder driver asks user space to adjust reference counts on the BBinder is by enqueuing the binder_node onto the todo list of the target process. The binder_work structure embedded in a binder_node has type BINDER_WORK_NODE. Similarly, in binder_dec_node_nilocked, lines 56 to 61 handle the case where the total number of strong references or weak references drops to zero after decrementing, by enqueuing the binder_node onto the todo list of the target thread. Lines 62 to 69 handle the case where all kinds of references on the binder_node drop to zero after decrementing, by returning true so that binder_free_node is called on line 79 to free the binder_node structure. As we can see, the driver asks user space to adjust a reference count on the BBinder object only when a type of reference count on the binder_node changes from 0 to 1 or from 1 to 0. This is very similar to how the reference counts on a binder_ref affect the target binder_node.
Like all other kinds of binder_work, a BINDER_WORK_NODE item is processed by a Binder thread in binder_thread_read:
A target thread handles a BINDER_WORK_NODE by comparing the expected reference counting state of the target BBinder with its current state. If they differ, it writes the corresponding BR_* command for user space to handle. The strong and weak variables indicate whether the BBinder should hold a strong reference count or a weak reference count, respectively. For example, lines 29 to 33 handle the case where the weak reference count on the BBinder should be incremented but the driver has never asked user space to do so. In this case a BR_INCREFS return code is written to user space so that the weak reference count on the BBinder can be incremented. The other three kinds of mismatch are handled similarly. Let's see how user space handles the reference counting related BR_* return codes:
The BR_* return codes are handled by just incrementing or decrementing the reference counter in the BBinder object. Reference count decrements are added to mPendingStrongDerefs or mPendingWeakDerefs to be processed a little later. Decrements are less time-critical than increments: an increment that is not applied right away may let the BBinder be deallocated while it is still referenced, whereas a delayed decrement only keeps the object alive slightly longer. A BC_ACQUIRE_DONE or BC_INCREFS_DONE command code is sent to the Binder driver to acknowledge the completion of the increment. The driver code that handles it is easy to understand, so I will not discuss it.
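Putting the last two steps together, the sketch below models both sides: a driver-side function that compares the expected state of a node with what user space has been told and emits the matching return codes, and a user-space handler that applies increments immediately and defers decrements. All names (NodeSketch, reconcile, LocalObjectSketch, ThreadSketch, remoteRefCount) are hypothetical, and details such as the pending flags are left out.

```cpp
#include <string>
#include <vector>

// Driver-side view of one node (simplified).
struct NodeSketch {
    int internalStrongRefs = 0;
    int localStrongRefs = 0;
    int remoteRefCount = 0;      // stands in for the size of refs
    int localWeakRefs = 0;
    bool hasStrongRef = false;   // user space was asked to hold a strong count
    bool hasWeakRef = false;     // user space was asked to hold a weak count
};

// Compares expected and current state; returns the return codes to deliver.
std::vector<std::string> reconcile(NodeSketch& node) {
    const bool strong = node.internalStrongRefs > 0 || node.localStrongRefs > 0;
    const bool weak = strong || node.remoteRefCount > 0 || node.localWeakRefs > 0;
    std::vector<std::string> codes;
    if (weak && !node.hasWeakRef) {
        node.hasWeakRef = true;
        codes.push_back("BR_INCREFS");
    }
    if (strong && !node.hasStrongRef) {
        node.hasStrongRef = true;
        codes.push_back("BR_ACQUIRE");
    }
    if (!strong && node.hasStrongRef) {
        node.hasStrongRef = false;
        codes.push_back("BR_RELEASE");
    }
    if (!weak && node.hasWeakRef) {
        node.hasWeakRef = false;
        codes.push_back("BR_DECREFS");
    }
    return codes;
}

// User-space view of the target object.
struct LocalObjectSketch {
    int strong = 0;
    int weak = 0;
};

struct ThreadSketch {
    std::vector<std::string> acks;                        // commands sent back
    std::vector<LocalObjectSketch*> pendingStrongDerefs;
    std::vector<LocalObjectSketch*> pendingWeakDerefs;

    void handle(const std::string& code, LocalObjectSketch& obj) {
        if (code == "BR_ACQUIRE") {
            ++obj.strong;                                 // applied immediately
            acks.push_back("BC_ACQUIRE_DONE");
        } else if (code == "BR_INCREFS") {
            ++obj.weak;                                   // applied immediately
            acks.push_back("BC_INCREFS_DONE");
        } else if (code == "BR_RELEASE") {
            pendingStrongDerefs.push_back(&obj);          // deferred
        } else if (code == "BR_DECREFS") {
            pendingWeakDerefs.push_back(&obj);            // deferred
        }
    }

    void processPendingDerefs() {
        for (LocalObjectSketch* obj : pendingStrongDerefs) --obj->strong;
        for (LocalObjectSketch* obj : pendingWeakDerefs) --obj->weak;
        pendingStrongDerefs.clear();
        pendingWeakDerefs.clear();
    }
};
```

Calling reconcile twice without changing the node yields no codes the second time, which reflects the point that user space is only contacted when the expected state actually changes.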
We have traced how reference counting propagates through the stable referencing structure. Next we are going to look at how a Binder transaction contributes to reference counting temporarily. In the “Binder transaction” article, we saw how the Binder driver implements special handling for live objects in the transaction buffer. However, we skipped the reference counting part of that handling. Now it’s time to look at it again.
As a reminder, binder_translate_binder is called to do special handling on a serialized BBinder that is being sent to another process. The node variable points to the binder_node structure for this BBinder. The target_proc variable points to the binder_proc structure of the transaction's target process. The binder_inc_ref_for_node function finds a binder_ref structure in the target process that references this binder_node and increments the reference count on the binder_ref.
The binder_translate_handle
function will be called when a serialized BpBinder
is about to be sent to another process during transaction. If the target process is the process where the underlying BBinder
is created, then binder_inc_node_nilocked
will be called to increment the local reference count on the binder_node
. Note that the internal
argument is set to 0 so that the local_weak_refs
or local_strong_refs
will be incremented. If the target process is not the process where the underlying BBinder
is created, the live object in the transaction buffer remains a handle. So binder_inc_ref_for_node
will be called to increment reference count on the binder_ref
structure for the corresponding live object in the buffer.
The reference count increments make sense because these Binder objects are serialized in a transaction buffer and copied to a remote process, and the remote process holds on to the buffer during the transaction. However, as soon as the target process releases the transaction buffer, those increments are undone.
We have seen how BR_TRANSACTION is handled in the "Binder transaction" article, but now let's look at it from another angle. Line 17 creates a temporary Parcel and calls the ipcSetDataReference method to let the temporary Parcel point to the transaction buffer copied from the source process. Note the braces on lines 13 and 38: they confine the scope of this temporary Parcel to ensure that its destructor is called. Also note that a function pointer, freeBuffer, is passed into ipcSetDataReference.
As we just described, line 6 sets the mOwner
function hook to freeBuffer
so when a Parcel
is destructed, it will call BC_FREE_BUFFER
to free the kernel transaction buffer.
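The scoped-release pattern described above is ordinary RAII and can be sketched without any Binder types. BorrowedBuffer and handleTransactionSketch are hypothetical names; the release callback stands in for the hook that would send BC_FREE_BUFFER.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical view over a buffer owned elsewhere. It runs a release hook on
// destruction, the way the temporary Parcel frees the kernel buffer.
class BorrowedBuffer {
public:
    using ReleaseFn = std::function<void(const unsigned char*)>;

    BorrowedBuffer(const unsigned char* data, std::size_t size, ReleaseFn release)
        : data_(data), size_(size), release_(std::move(release)) {}
    BorrowedBuffer(const BorrowedBuffer&) = delete;
    BorrowedBuffer& operator=(const BorrowedBuffer&) = delete;

    ~BorrowedBuffer() {
        if (release_) {
            release_(data_);             // stands in for sending BC_FREE_BUFFER
        }
    }

    std::size_t size() const { return size_; }

private:
    const unsigned char* data_;
    std::size_t size_;
    ReleaseFn release_;
};

// Returns how many times the release hook ran once the inner scope closed.
int handleTransactionSketch(const unsigned char* data, std::size_t size) {
    int released = 0;
    {
        BorrowedBuffer view(data, size,
                            [&released](const unsigned char*) { ++released; });
        // ... the transaction would be processed here while the buffer is held ...
        assert(released == 0);
    }   // the closing brace destroys the view and triggers the release hook
    return released;
}
```

The inner braces play the same role as the braces the text points out: they guarantee that the release happens at a known point, right after the transaction has been handled.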
The binder_free_buf
function is used to free the corresponding transaction buffer managed by a binder_buffer
structure. The binder_alloc_free_buf
will return this buffer to the process's free buffer list and return physical pages to Linux kernel. The binder_transaction_buffer_release
does all the reference counting resetting work.
This function walks through the transaction buffer and locates all live objects in it. If a serialized BBinder is found, line 37 calls binder_dec_node to undo the local reference count increment on the corresponding binder_node. Similarly, if a serialized BpBinder is found, line 46 calls binder_dec_ref_for_handle to undo the reference count increment on the corresponding binder_ref. After the transaction buffer is released, all reference counts contributed by this transaction have been removed.
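As a closing illustration, the release walk can be sketched as a loop over the live objects recorded for a buffer. NodeCounters, RefCounters, BufferObject and releaseBuffer are hypothetical names; the real function also handles other object types that are not covered in this article.

```cpp
#include <vector>

struct NodeCounters {
    int localStrongRefs = 0;
    int localWeakRefs = 0;
};

struct RefCounters {
    int strong = 0;
    int weak = 0;
};

enum ObjectKind { LOCAL_BINDER, LOCAL_WEAK_BINDER, REMOTE_HANDLE, REMOTE_WEAK_HANDLE };

struct BufferObject {
    ObjectKind kind;
    NodeCounters* node;   // set for the LOCAL_* kinds
    RefCounters* ref;     // set for the REMOTE_* kinds
};

// Undoes the temporary counts that the buffer's live objects contributed.
void releaseBuffer(const std::vector<BufferObject>& objects) {
    for (const BufferObject& object : objects) {
        switch (object.kind) {
            case LOCAL_BINDER:       --object.node->localStrongRefs; break;
            case LOCAL_WEAK_BINDER:  --object.node->localWeakRefs;   break;
            case REMOTE_HANDLE:      --object.ref->strong;           break;
            case REMOTE_WEAK_HANDLE: --object.ref->weak;             break;
        }
    }
}
```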