WorryFree Computers   »   [go: up one dir, main page]

Last year we wrote about how moving native code in Android from C++ to Rust has resulted in fewer security vulnerabilities. Most of the components we mentioned then were system services in userspace (running under Linux), but these are not the only components typically written in memory-unsafe languages. Many security-critical components of an Android system run in a “bare-metal” environment, outside of the Linux kernel, and these are historically written in C. As part of our efforts to harden firmware on Android devices, we are increasingly using Rust in these bare-metal environments too.

To that end, we have rewritten the Android Virtualization Framework’s protected VM (pVM) firmware in Rust to provide a memory safe foundation for the pVM root of trust. This firmware performs a similar function to a bootloader, and was initially built on top of U-Boot, a widely used open source bootloader. However, U-Boot was not designed with security in a hostile environment in mind, and there have been numerous security vulnerabilities found in it due to out of bounds memory access, integer underflow and memory corruption. Its VirtIO drivers in particular had a number of missing or problematic bounds checks. We fixed the specific issues we found in U-Boot, but by leveraging Rust we can avoid these sorts of memory-safety vulnerabilities in future. The new Rust pVM firmware was released in Android 14.

As part of this effort, we contributed back to the Rust community by using and contributing to existing crates where possible, and publishing a number of new crates as well. For example, for VirtIO in pVM firmware we’ve spent time fixing bugs and soundness issues in the existing virtio-drivers crate, as well as adding new functionality, and are now helping maintain this crate. We’ve published crates for making PSCI and other Arm SMCCC calls, and for managing page tables. These are just a start; we plan to release more Rust crates to support bare-metal programming on a range of platforms. These crates are also being used outside of Android, such as in Project Oak and the bare-metal section of our Comprehensive Rust course.

Training engineers

Many engineers have been positively surprised by how productive and pleasant Rust is to work with, providing nice high-level features even in low-level environments. The engineers working on these projects come from a range of backgrounds. Our comprehensive Rust course has helped experienced and novice programmers quickly come up to speed. Anecdotally the Rust type system (including the borrow checker and lifetimes) helps avoid making mistakes that are easily made in C or C++, such as leaking pointers to stack-allocated values out of scope.

One of our bare-metal Rust course attendees had this to say:

"types can be built that bring in all of Rust's niceties and safeties and 
yet still compile down to extremely efficient code like writes
of constants to memory-mapped IO."

97% of attendees that completed a survey agreed the course was worth their time.

Advantages and challenges

Device drivers are often written in an object-oriented fashion for flexibility, even in C. Rust traits, which can be seen as a form of compile-time polymorphism, provide a useful high-level abstraction for this. In many cases this can be resolved entirely at compile time, with no runtime overhead of dynamic dispatch via vtables or structs of function pointers.

There have been some challenges. Safe Rust’s type system is designed with an implicit assumption that the only memory the program needs to care about is allocated by the program (be it on the stack, the heap, or statically), and only used by the program. Bare-metal programs often have to deal with MMIO and shared memory, which break this assumption. This tends to require a lot of unsafe code and raw pointers, with limited tools for encapsulation. There is some disagreement in the Rust community about the soundness of references to MMIO space, and the facilities for working with raw pointers in stable Rust are currently somewhat limited. The stabilisation of offset_of, slice_ptr_get, slice_ptr_len, and other nightly features will improve this, but it is still challenging to encapsulate cleanly. Better syntax for accessing struct fields and array indices via raw pointers without creating references would also be helpful.

The concurrency introduced by interrupt and exception handlers can also be awkward, as they often need to access shared mutable state but can’t rely on being able to take locks. Better abstractions for critical sections will help somewhat, but there are some exceptions that can’t practically be disabled, such as page faults used to implement copy-on-write or other on-demand page mapping strategies.

Another issue we’ve had is that some unsafe operations, such as manipulating the page table, can’t be encapsulated cleanly as they have safety implications for the whole program. Usually in Rust we are able to encapsulate unsafe operations (operations which may cause undefined behaviour in some circumstances, because they have contracts which the compiler can’t check) in safe wrappers where we ensure the necessary preconditions so that it is not possible for any caller to cause undefined behaviour. However, mapping or unmapping pages in one part of the program can make other parts of the program invalid, so we haven’t found a way to provide a fully general safe interface to this. It should be noted that the same concerns apply to a program written in C, where the programmer always has to reason about the safety of the whole program.

Some people adopting Rust for bare-metal use cases have raised concerns about binary size. We have seen this in some cases; for example our Rust pVM firmware binary is around 460 kB compared to 220 kB for the earlier C version. However, this is not a fair comparison as we also added more functionality which allowed us to remove other components from the boot chain, so the overall size of all VM boot chain components was comparable. We also weren’t particularly optimizing for binary size in this case; speed and correctness were more important. In cases where binary size is critical, compiling with size optimization, being careful about dependencies, and avoiding Rust’s string formatting machinery in release builds usually allows comparable results to C.

Architectural support is another concern. Rust is generally well supported on the Arm and RISC-V cores that we see most often, but support for more esoteric architectures (for example, the Qualcomm Hexagon DSP included in many Qualcomm SoCs used in Android phones) can be lacking compared to C.

The future of bare-metal Rust

Overall, despite these challenges and limitations, we’ve still found Rust to be a significant improvement over C (or C++), both in terms of safety and productivity, in all the bare-metal use cases where we’ve tried it so far. We plan to use it wherever practical.

As well as the work in the Android Virtualization Framework, the team working on Trusty (the open-source Trusted Execution Environment used on Pixel phones, among others) have been hard at work adding support for Trusted Applications written in Rust. For example, the reference KeyMint Trusted Application implementation is now in Rust. And there’s more to come in future Android devices, as we continue to use Rust to improve security of the devices you trust.

Android 14 is the third major Android release with Rust support. We are already seeing a number of benefits:

These positive early results provided an enticing motivation to increase the speed and scope of Rust adoption. We hoped to accomplish this by investing heavily in training to expand from the early adopters.

Scaling up from Early Adopters

Early adopters are often willing to accept more risk to try out a new technology. They know there will be some inconveniences and a steep learning curve but are willing to learn, often on their own time.

Scaling up Rust adoption required moving beyond early adopters. For that we need to ensure a baseline level of comfort and productivity within a set period of time. An important part of our strategy for accomplishing this was training. Unfortunately, the type of training we wanted to provide simply didn’t exist. We made the decision to write and implement our own Rust training.

Training Engineers

Our goals for the training were to:

  • Quickly ramp up engineers: It is hard to take people away from their regular work for a long period of time, so we aimed to provide a solid foundation for using Rust in days, not weeks. We could not make anybody a Rust expert in so little time, but we could give people the tools and foundation needed to be productive while they continued to grow. The goal is to enable people to use Rust to be productive members of their teams. The time constraints meant we couldn’t teach people programming from scratch; we also decided not to teach macros or unsafe Rust in detail.
  • Make it engaging (and fun!): We wanted people to see a lot of Rust while also getting hands-on experience. Given the scope and time constraints mentioned above, the training was necessarily information-dense. This called for an interactive setting where people could quickly ask questions to the instructor. Research shows that retention improves when people can quickly verify assumptions and practice new concepts.
  • Make it relevant for Android: The Android-specific tooling for Rust was already documented, but we wanted to show engineers how to use it via worked examples. We also wanted to document emerging standards, such as using thiserror and anyhow crates for error handling. Finally, because Rust is a new language in the Android Platform (AOSP), we needed to show how to interoperate with existing languages such as Java and C++.

With those three goals as a starting point, we looked at the existing material and available tools.

Existing Material

Documentation is a key value of the Rust community and there are many great resources available for learning Rust. First, there is the freely available Rust Book, which covers almost all of the language. Second, the standard library is extensively documented.

Because we knew our target audience, we could make stronger assumptions than most material found online. We created the course for engineers with at least 2–3 years of coding experience in either C, C++, or Java. This allowed us to move quickly when explaining concepts familiar to our audience, such as "control flow", “stack vs heap”, and “methods”. People with other backgrounds can learn Rust from the many other resources freely available online.

Technology

For free-form documentation, mdBook has become the de facto standard in the Rust community. It is used for official documentation such as the Rust Book and Rust Reference.

A particularly interesting feature is the ability to embed executable snippets of Rust code. This is key to making the training engaging since the code can be edited live and executed directly in the slides:

In addition to being a familiar community standard, mdBook offers the following important features:

  • Maintainability: mdbook test compiles and executes every code snippet in the course. This allowed us to evolve the class over time while ensuring that we always showed valid code to the participants.
  • Extensibility: mdBook has a plugin system which allowed us to extend the tool as needed. We relied on this feature for translations and ASCII art diagrams.

These features made it easy for us to choose mdBook. While mdBook is not designed for presentations, the output looked OK on a projector when we limited the vertical size of each page.

Supporting Translations

Android has developers and OEM partners in many countries. It is critical that they can adapt existing Rust code in AOSP to fit their needs. To support translations, we developed mdbook-i18n-helpers. Support for multilingual documentation has been a community wish since 2015 and we are glad to see the plugins being adopted by several other projects to produce maintainable multilingual documentation for everybody.

Comprehensive Rust

With the technology and format nailed down, we started writing the course. We roughly followed the outline from the Rust Book since it covered most of what we need to cover. This gave us a three day course which we called Rust Fundamentals. We designed it to run for three days for five hours a day and encompass Rust syntax, semantics, and important concepts such as traits, generics, and error handling.

We then extended Rust Fundamentals with three deep dives:

  • Rust in Android: a half-day course on using Rust for AOSP development. It includes interoperability with C, C++, and Java.
  • Bare-metal Rust: a full-day class on using Rust for bare-metal development. Android devices ship significant amounts of firmware. These components are often foundational in nature (for example, the bootloader, which establishes the trust for the rest of the system), thus they must be secure.
  • Concurrency in Rust: a full-day class on concurrency in Rust. We cover both multithreading with blocking synchronization primitives (such as mutexes) and async/await concurrency (cooperative multitasking using futures).

A large set of in-house and community translators have helped translate the course into several languages. The full translations were Brazilian Portuguese and Korean. We are working on Simplified Chinese and Traditional Chinese translations as well.

Course Reception

We started teaching the class in late 2022. In 2023, we hired a vendor, Immunant, to teach the majority of classes for Android engineers. This was important for scalability and for quality: dedicated instructors soon discovered where the course participants struggled and could adapt the delivery. In addition, over 30 Googlers have taught the course worldwide.

More than 500 Google engineers have taken the class. Feedback has been very positive: 96% of participants agreed it was worth their time. People consistently told us that they loved the interactive style, highlighting how it helped to be able to ask clarifying questions at any time. Instructors noted that people gave the course their undivided attention once they realized it was live. Live-coding demands a lot from the instructor, but it is worth it due to the high engagement it achieves.

Most importantly, people exited this course and were able to be immediately productive with Rust in their day jobs. When participants were asked three months later, they confirmed that they were able to write and review Rust code. This matched the results from the much larger survey we made in 2022.

Looking Forward

We have been teaching Rust classes at Google for a year now. There are a few things that we want to improve: better topic ordering, more exercises, and more speaker notes. We would also like to extend the course with more deep dives. Pull requests are very welcome!

The full course is available for free at https://google.github.io/comprehensive-rust/. We are thrilled to see people starting to use Comprehensive Rust for classes around the world. We hope it can be a useful resource for the Rust community and that it will help both small and large teams get started on their Rust journey!

Thanks!

We are grateful to the 190+ contributors from all over the world who created more than 1,000 pull requests and issues on GitHub. Their bug reports, fixes, and feedback improved the course in countless ways. This includes the 50+ people who worked hard on writing and maintaining the many translations.

Special thanks to Andrew Walbran for writing Bare-metal Rust and to Razieh Behjati, Dustin Mitchell, and Alexandre Senges for writing Concurrency in Rust.

We also owe a great deal of thanks to the many volunteer instructors at Google who have been spending their time teaching classes around the globe. Your feedback has helped shape the course.

Finally, thanks to Jeffrey Vander Stoep, Ivan Lozano, Matthew Maurer, Dmytro Hrybenko, and Lars Bergstrom for providing feedback on this post.

For more than a decade, memory safety vulnerabilities have consistently represented more than 65% of vulnerabilities across products, and across the industry. On Android, we’re now seeing something different - a significant drop in memory safety vulnerabilities and an associated drop in the severity of our vulnerabilities.

Looking at vulnerabilities reported in the Android security bulletin, which includes critical/high severity vulnerabilities reported through our vulnerability rewards program (VRP) and vulnerabilities reported internally, we see that the number of memory safety vulnerabilities have dropped considerably over the past few years/releases. From 2019 to 2022 the annual number of memory safety vulnerabilities dropped from 223 down to 85.

This drop coincides with a shift in programming language usage away from memory unsafe languages. Android 13 is the first Android release where a majority of new code added to the release is in a memory safe language.

As the amount of new memory-unsafe code entering Android has decreased, so too has the number of memory safety vulnerabilities. From 2019 to 2022 it has dropped from 76% down to 35% of Android’s total vulnerabilities. 2022 is the first year where memory safety vulnerabilities do not represent a majority of Android’s vulnerabilities.

While correlation doesn’t necessarily mean causation, it’s interesting to note that the percent of vulnerabilities caused by memory safety issues seems to correlate rather closely with the development language that’s used for new code. This matches the expectations published in our blog post 2 years ago about the age of memory safety vulnerabilities and why our focus should be on new code, not rewriting existing components. Of course there may be other contributing factors or alternative explanations. However, the shift is a major departure from industry-wide trends that have persisted for more than a decade (and likely longer) despite substantial investments in improvements to memory unsafe languages.

We continue to invest in tools to improve the safety of our C/C++. Over the past few releases we’ve introduced the Scudo hardened allocator, HWASAN, GWP-ASAN, and KFENCE on production Android devices. We’ve also increased our fuzzing coverage on our existing code base. Vulnerabilities found using these tools contributed both to prevention of vulnerabilities in new code as well as vulnerabilities found in old code that are included in the above evaluation. These are important tools, and critically important for our C/C++ code. However, these alone do not account for the large shift in vulnerabilities that we’re seeing, and other projects that have deployed these technologies have not seen a major shift in their vulnerability composition. We believe Android’s ongoing shift from memory-unsafe to memory-safe languages is a major factor.

Rust for Native Code

In Android 12 we announced support for the Rust programming language in the Android platform as a memory-safe alternative to C/C++. Since then we’ve been scaling up our Rust experience and usage within the Android Open Source Project (AOSP).

As we noted in the original announcement, our goal is not to convert existing C/C++ to Rust, but rather to shift development of new code to memory safe languages over time.

In Android 13, about 21% of all new native code (C/C++/Rust) is in Rust. There are approximately 1.5 million total lines of Rust code in AOSP across new functionality and components such as Keystore2, the new Ultra-wideband (UWB) stack, DNS-over-HTTP3, Android’s Virtualization framework (AVF), and various other components and their open source dependencies. These are low-level components that require a systems language which otherwise would have been implemented in C++.

Security impact

To date, there have been zero memory safety vulnerabilities discovered in Android’s Rust code.


We don’t expect that number to stay zero forever, but given the volume of new Rust code across two Android releases, and the security-sensitive components where it’s being used, it’s a significant result. It demonstrates that Rust is fulfilling its intended purpose of preventing Android’s most common source of vulnerabilities. Historical vulnerability density is greater than 1/kLOC (1 vulnerability per thousand lines of code) in many of Android’s C/C++ components (e.g. media, Bluetooth, NFC, etc). Based on this historical vulnerability density, it’s likely that using Rust has already prevented hundreds of vulnerabilities from reaching production.

What about unsafe Rust?

Operating system development requires accessing resources that the compiler cannot reason about. For memory-safe languages this means that an escape hatch is required to do systems programming. For Java, Android uses JNI to access low-level resources. When using JNI, care must be taken to avoid introducing unsafe behavior. Fortunately, it has proven significantly simpler to review small snippets of C/C++ for safety than entire programs. There are no pure Java processes in Android. It’s all built on top of JNI. Despite that, memory safety vulnerabilities are exceptionally rare in our Java code.

Rust likewise has the unsafe{} escape hatch which allows interacting with system resources and non-Rust code. Much like with Java + JNI, using this escape hatch comes with additional scrutiny. But like Java, our Rust code is proving to be significantly safer than pure C/C++ implementations. Let’s look at the new UWB stack as an example.

There are exactly two uses of unsafe in the UWB code: one to materialize a reference to a Rust object stored inside a Java object, and another for the teardown of the same. Unsafe was actively helpful in this situation because the extra attention on this code allowed us to discover a possible race condition and guard against it.

In general, use of unsafe in Android’s Rust appears to be working as intended. It’s used rarely, and when it is used, it’s encapsulating behavior that’s easier to reason about and review for safety.

Safety measures make memory-unsafe languages slow

Mobile devices have limited resources and we’re always trying to make better use of them to provide users with a better experience (for example, by optimizing performance, improving battery life, and reducing lag). Using memory unsafe code often means that we have to make tradeoffs between security and performance, such as adding additional sandboxing, sanitizers, runtime mitigations, and hardware protections. Unfortunately, these all negatively impact code size, memory, and performance.

Using Rust in Android allows us to optimize both security and system health with fewer compromises. For example, with the new UWB stack we were able to save several megabytes of memory and avoid some IPC latency by running it within an existing process. The new DNS-over-HTTP/3 implementation uses fewer threads to perform the same amount of work by using Rust’s async/await feature to process many tasks on a single thread in a safe manner.

What about non-memory-safety vulnerabilities?

The number of vulnerabilities reported in the bulletin has stayed somewhat steady over the past 4 years at around 20 per month, even as the number of memory safety vulnerabilities has gone down significantly. So, what gives? A few thoughts on that.

A drop in severity

Memory safety vulnerabilities disproportionately represent our most severe vulnerabilities. In 2022, despite only representing 36% of vulnerabilities in the security bulletin, memory-safety vulnerabilities accounted for 86% of our critical severity security vulnerabilities, our highest rating, and 89% of our remotely exploitable vulnerabilities. Over the past few years, memory safety vulnerabilities have accounted for 78% of confirmed exploited “in-the-wild” vulnerabilities on Android devices.

Many vulnerabilities have a well defined scope of impact. For example, a permissions bypass vulnerability generally grants access to a specific set of information or resources and is generally only reachable if code is already running on the device. Memory safety vulnerabilities tend to be much more versatile. Getting code execution in a process grants access not just to a specific resource, but everything that that process has access to, including attack surface to other processes. Memory safety vulnerabilities are often flexible enough to allow chaining multiple vulnerabilities together. The high versatility is perhaps one reason why the vast majority of exploit chains that we have seen use one or more memory safety vulnerabilities.

With the drop in memory safety vulnerabilities, we’re seeing a corresponding drop in vulnerability severity.



With the decrease in our most severe vulnerabilities, we’re seeing increased reports of less severe vulnerability types. For example, about 15% of vulnerabilities in 2022 are DoS vulnerabilities (requiring a factory reset of the device). This represents a drop in security risk.

Android appreciates our security research community and all contributions made to the Android VRP. We apply higher payouts for more severe vulnerabilities to ensure that incentives are aligned with vulnerability risk. As we make it harder to find and exploit memory safety vulnerabilities, security researchers are pivoting their focus towards other vulnerability types. Perhaps the total number of vulnerabilities found is primarily constrained by the total researcher time devoted to finding them. Or perhaps there’s another explanation that we have not considered. In any case, we hope that if our vulnerability researcher community is finding fewer of these powerful and versatile vulnerabilities, the same applies to adversaries.

Attack surface

Despite most of the existing code in Android being in C/C++, most of Android’s API surface is implemented in Java. This means that Java is disproportionately represented in the OS’s attack surface that is reachable by apps. This provides an important security property: most of the attack surface that’s reachable by apps isn’t susceptible to memory corruption bugs. It also means that we would expect Java to be over-represented when looking at non-memory safety vulnerabilities. It’s important to note however that types of vulnerabilities that we’re seeing in Java are largely logic bugs, and as mentioned above, generally lower in severity. Going forward, we will be exploring how Rust’s richer type system can help prevent common types of logic bugs as well.

Google’s ability to react

With the vulnerability types we’re seeing now, Google’s ability to detect and prevent misuse is considerably better. Apps are scanned to help detect misuse of APIs before being published on the Play store and Google Play Protect warns users if they have abusive apps installed.

What’s next?

Migrating away from C/C++ is challenging, but we’re making progress. Rust use is growing in the Android platform, but that’s not the end of the story. To meet the goals of improving security, stability, and quality Android-wide, we need to be able to use Rust anywhere in the codebase that native code is required. We’re implementing userspace HALs in Rust. We’re adding support for Rust in Trusted Applications. We’ve migrated VM firmware in the Android Virtualization Framework to Rust. With support for Rust landing in Linux 6.1 we’re excited to bring memory-safety to the kernel, starting with kernel drivers.

As Android migrates away from C/C++ to Java/Kotlin/Rust, we expect the number of memory safety vulnerabilities to continue to fall. Here’s to a future where memory corruption bugs on Android are rare!

Posted by Matthew Maurer and Mike Yu, Android team

To help keep Android users’ DNS queries private, Android supports encrypted DNS. In addition to existing support for DNS-over-TLS, Android now supports DNS-over-HTTP/3 which has a number of improvements over DNS-over-TLS.

Most network connections begin with a DNS lookup. While transport security may be applied to the connection itself, that DNS lookup has traditionally not been private by default: the base DNS protocol is raw UDP with no encryption. While the internet has migrated to TLS over time, DNS has a bootstrapping problem. Certificate verification relies on the domain of the other party, which requires either DNS itself, or moves the problem to DHCP (which may be maliciously controlled). This issue is mitigated by central resolvers like Google, Cloudflare, OpenDNS and Quad9, which allow devices to configure a single DNS resolver locally for every network, overriding what is offered through DHCP.

In Android 9.0, we announced the Private DNS feature, which uses DNS-over-TLS (DoT) to protect DNS queries when enabled and supported by the server. Unfortunately, DoT incurs overhead for every DNS request. An alternative encrypted DNS protocol, DNS-over-HTTPS (DoH), is rapidly gaining traction within the industry as DoH has already been deployed by most public DNS operators, including the Cloudflare Resolver and Google Public DNS. While using HTTPS alone will not reduce the overhead significantly, HTTP/3 uses QUIC, a transport that efficiently multiplexes multiple streams over UDP using a single TLS session with session resumption. All of these features are crucial to efficient operation on mobile devices.

DNS-over-HTTP/3 (DoH3) support was released as part of a Google Play system update, so by the time you’re reading this, Android devices from Android 11 onwards1 will use DoH3 instead of DoT for well-known2 DNS servers which support it. Which DNS service you are using is unaffected by this change; only the transport will be upgraded. In the future, we aim to support DDR which will allow us to dynamically select the correct configuration for any server. This feature should decrease the performance impact of encrypted DNS.

Performance

DNS-over-HTTP/3 avoids several problems that can occur with DNS-over-TLS operation:

  • As DoT operates on a single stream of requests and responses, many server implementations suffer from head-of-line blocking3. This means that if the request at the front of the line takes a while to resolve (possibly because a recursive resolution is necessary), responses for subsequent requests that would have otherwise been resolved quickly are blocked waiting on that first request. DoH3 by comparison runs each request over a separate logical stream, which means implementations will resolve requests out-of-order by default.
  • Mobile devices change networks frequently as the user moves around. With DoT, these events require a full renegotiation of the connection. By contrast, the QUIC transport HTTP/3 is based on can resume a suspended connection in a single RTT.
  • DoT intends for many queries to use the same connection to amortize the cost of TCP and TLS handshakes at the start. Unfortunately, in practice several factors (such as network disconnects or server TCP connection management) make these connections less long-lived than we might like. Once a connection is closed, establishing the connection again requires at least 1 RTT.

    In unreliable networks, DoH3 may even outperform traditional DNS. While unintuitive, this is because the flow control mechanisms in QUIC can alert either party that packets weren’t received. In traditional DNS, the timeout for a query needs to be based on expected time for the entire query, not just for the resolver to receive the packet.

Field measurements during the initial limited rollout of this feature show that DoH3 significantly improves on DoT’s performance. For successful queries, our studies showed that replacing DoT with DoH3 reduces median query time by 24%, and 95th percentile query time by 44%. While it might seem suspect that the reported data is conditioned on successful queries, both DoT and DoH3 resolve 97% of queries successfully, so their metrics are directly comparable. UDP resolves only 83% of queries successfully. As a result, UDP latency is not directly comparable to TLS/HTTP3 latency because non-connection-oriented protocols have a different notion of what a "query" is. We have still included it for rough comparison.

Memory Safety

The DNS resolver processes input that could potentially be controlled by an attacker, both from the network and from apps on the device. To reduce the risk of security vulnerabilities, we chose to use a memory safe language for the implementation.

Fortunately, we’ve been adding Rust support to the Android platform. This effort is intended exactly for cases like this — system level features which need to be performant or low level (both in this case) and which would carry risk to implement in C++. While we’ve previously launched Keystore 2.0, this represents our first foray into Rust in Mainline Modules. Cloudflare maintains an HTTP/3 library called quiche, which fits our use case well, as it has a memory-safe implementation, few dependencies, and a small code size. Quiche also supports use directly from C++. We considered this, but even the request dispatching service had sufficient complexity that we chose to implement that portion in Rust as well.

We built the query engine using the Tokio async framework to simultaneously handle new requests, incoming packet events, control signals, and timers. In C++, this would likely have required multiple threads or a carefully crafted event loop. By leveraging asynchronous in Rust, this occurs on a single thread with minimal locking4. The DoH3 implementation is 1,640 lines and uses a single runtime thread. By comparison, DoT takes 1,680 lines while managing less and using up to 4 threads per DoT server in use.

Safety and Performance — Together at Last

With the introduction of Rust, we are able to improve both security and the performance at the same time. Likewise, QUIC allows us to improve network performance and privacy simultaneously. Finally, Mainline ensures that such improvements are able to make their way to more Android users sooner.

Acknowledgements

Special thanks to Luke Huang who greatly contributed to the development of this feature, and Lorenzo Colitti for his in-depth review of the technical aspects of this post.


  1. Some Android 10 devices which adopted Google Play system updates early will also receive this feature. 

  2. Google DNS and Cloudflare DNS at launch, others may be added in the future. 

  3. DoT can be implemented in a way that avoids this problem, as the client must accept server responses out of order. However, in practice most servers do not implement this reordering. 

  4. There is a lock used for the SSL context which is accessed once per DNS server, and another on the FFI when issuing a request. The FFI lock could be removed with changes to the C++ side, but has remained because it is low contention. 

One of the main challenges of evaluating Rust for use within the Android platform was ensuring we could provide sufficient interoperability with our existing codebase. If Rust is to meet its goals of improving security, stability, and quality Android-wide, we need to be able to use Rust anywhere in the codebase that native code is required. To accomplish this, we need to provide the majority of functionality platform developers use. As we discussed previously, we have too much C++ to consider ignoring it, rewriting all of it is infeasible, and rewriting older code would likely be counterproductive as the bugs in that code have largely been fixed. This means interoperability is the most practical way forward.

Before introducing Rust into the Android Open Source Project (AOSP), we needed to demonstrate that Rust interoperability with C and C++ is sufficient for practical, convenient, and safe use within Android. Adding a new language has costs; we needed to demonstrate that Rust would be able to scale across the codebase and meet its potential in order to justify those costs. This post will cover the analysis we did more than a year ago while we evaluated Rust for use in Android. We also present a follow-up analysis with some insights into how the original analysis has held up as Android projects have adopted Rust.

Language interoperability in Android

Existing language interoperability in Android focuses on well defined foreign-function interface (FFI) boundaries, which is where code written in one programming language calls into code written in a different language. Rust support will likewise focus on the FFI boundary as this is consistent with how AOSP projects are developed, how code is shared, and how dependencies are managed. For Rust interoperability with C, the C application binary interface (ABI) is already sufficient.

Interoperability with C++ is more challenging and is the focus of this post. While both Rust and C++ support using the C ABI, it is not sufficient for idiomatic usage of either language. Simply enumerating the features of each language results in an unsurprising conclusion: many concepts are not easily translatable, nor do we necessarily want them to be. After all, we’re introducing Rust because many features and characteristics of C++ make it difficult to write safe and correct code. Therefore, our goal is not to consider all language features, but rather to analyze how Android uses C++ and ensure that interop is convenient for the vast majority of our use cases.

We analyzed code and interfaces in the Android platform specifically, not codebases in general. While this means our specific conclusions may not be accurate for other codebases, we hope the methodology can help others to make a more informed decision about introducing Rust into their large codebase. Our colleagues on the Chrome browser team have done a similar analysis, which you can find here.

This analysis was not originally intended to be published outside of Google: our goal was to make a data-driven decision on whether or not Rust was a good choice for systems development in Android. While the analysis is intended to be accurate and actionable, it was never intended to be comprehensive, and we’ve pointed out a couple of areas where it could be more complete. However, we also note that initial investigations into these areas showed that they would not significantly impact the results, which is why we decided to not invest the additional effort.

Methodology

Exported functions from Rust and C++ libraries are where we consider interop to be essential. Our goals are simple:

  • Rust must be able to call functions from C++ libraries and vice versa.
  • FFI should require a minimum of boilerplate.
  • FFI should not require deep expertise.

While making Rust functions callable from C++ is a goal, this analysis focuses on making C++ functions available to Rust so that new Rust code can be added while taking advantage of existing implementations in C++. To that end, we look at exported C++ functions and consider existing and planned compatibility with Rust via the C ABI and compatibility libraries. Types are extracted by running objdump on shared libraries to find external C++ functions they use1 and running c++filt to parse the C++ types. This gives functions and their arguments. It does not consider return values, but a preliminary analysis2 of those revealed that they would not significantly affect the results.

We then classify each of these types into one of the following buckets:

Supported by bindgen

These are generally simple types involving primitives (including pointers and references to them). For these types, Rust’s existing FFI will handle them correctly, and Android’s build system will auto-generate the bindings.

Supported by cxx compat crate

These are handled by the cxx crate. This currently includes std::string, std::vector, and C++ methods (including pointers/references to these types). Users simply have to define the types and functions they want to share across languages and cxx will generate the code to do that safely.

Native support

These types are not directly supported, but the interfaces that use them have been manually reworked to add Rust support. Specifically, this includes types used by AIDL and protobufs.

We have also implemented a native interface for StatsD as the existing C++ interface relies on method overloading, which is not well supported by bindgen and cxx3. Usage of this system does not show up in the analysis because the C++ API does not use any unique types.

Potential addition to cxx

This is currently common data structures such as std::optional and std::chrono::duration and custom string and vector implementations.

These can either be supported natively by a future contribution to cxx, or by using its ExternType facilities. We have only included types in this category that we believe are relatively straightforward to implement and have a reasonable chance of being accepted into the cxx project.

We don't need/intend to support

Some types are exposed in today’s C++ APIs that are either an implicit part of the API, not an API we expect to want to use from Rust, or are language specific. Examples of types we do not intend to support include:

  • Mutexes - we expect that locking will take place in one language or the other, rather than needing to pass mutexes between languages, as per our coarse-grained philosophy.
  • native_handle - this is a JNI interface type, so it is inappropriate for use in Rust/C++ communication.
  • std::locale& - Android uses a separate locale system from C++ locales. This type primarily appears in output due to e.g., cout usage, which would be inappropriate to use in Rust.

Overall, this category represents types that we do not believe a Rust developer should be using.

HIDL

Android is in the process of deprecating HIDL and migrating to AIDL for HALs for new services.We’re also migrating some existing implementations to stable AIDL. Our current plan is to not support HIDL, preferring to migrate to stable AIDL instead. These types thus currently fall into the “We don't need/intend to support'' bucket above, but we break them out to be more specific. If there is sufficient demand for HIDL support, we may revisit this decision later.

Other

This contains all types that do not fit into any of the above buckets. It is currently mostly std::string being passed by value, which is not supported by cxx.

Top C++ libraries

One of the primary reasons for supporting interop is to allow reuse of existing code. With this in mind, we determined the most commonly used C++ libraries in Android: liblog, libbase, libutils, libcutils, libhidlbase, libbinder, libhardware, libz, libcrypto, and libui. We then analyzed all of the external C++ functions used by these libraries and their arguments to determine how well they would interoperate with Rust.

Overall, 81% of types are in the first three categories (which we currently fully support) and 87% are in the first four categories (which includes those we believe we can easily support). Almost all of the remaining types are those we believe we do not need to support.

Mainline modules

In addition to analyzing popular C++ libraries, we also examined Mainline modules. Supporting this context is critical as Android is migrating some of its core functionality to Mainline, including much of the native code we hope to augment with Rust. Additionally, their modularity presents an opportunity for interop support.

We analyzed 64 binaries and libraries in 21 modules. For each analyzed library we examined their used C++ functions and analyzed the types of their arguments to determine how well they would interoperate with Rust in the same way we did above for the top 10 libraries.

Here 88% of types are in the first three categories and 90% in the first four, with almost all of the remaining being types we do not need to handle.

Analysis of Rust/C++ Interop in AOSP

With almost a year of Rust development in AOSP behind us, and more than a hundred thousand lines of code written in Rust, we can now examine how our original analysis has held up based on how C/C++ code is currently called from Rust in AOSP.4

The results largely match what we expected from our analysis with bindgen handling the majority of interop needs. Extensive use of AIDL by the new Keystore2 service results in the primary difference between our original analysis and actual Rust usage in the “Native Support” category.

A few current examples of interop are:

  • Cxx in Bluetooth - While Rust is intended to be the primary language for Bluetooth, migrating from the existing C/C++ implementation will happen in stages. Using cxx allows the Bluetooth team to more easily serve legacy protocols like HIDL until they are phased out by using the existing C++ support to incrementally migrate their service.
  • AIDL in keystore - Keystore implements AIDL services and interacts with apps and other services over AIDL. Providing this functionality would be difficult to support with tools like cxx or bindgen, but the native AIDL support is simple and ergonomic to use.
  • Manually-written wrappers in profcollectd - While our goal is to provide seamless interop for most use cases, we also want to demonstrate that, even when auto-generated interop solutions are not an option, manually creating them can be simple and straightforward. Profcollectd is a small daemon that only exists on non-production engineering builds. Instead of using cxx it uses some small manually-written C wrappers around C++ libraries that it then passes to bindgen.

Conclusion

Bindgen and cxx provide the vast majority of Rust/C++ interoperability needed by Android. For some of the exceptions, such as AIDL, the native version provides convenient interop between Rust and other languages. Manually written wrappers can be used to handle the few remaining types and functions not supported by other options as well as to create ergonomic Rust APIs. Overall, we believe interoperability between Rust and C++ is already largely sufficient for convenient use of Rust within Android.

If you are considering how Rust could integrate into your C++ project, we recommend doing a similar analysis of your codebase. When addressing interop gaps, we recommend that you consider upstreaming support to existing compat libraries like cxx.

Acknowledgements

Our first attempt at quantifying Rust/C++ interop involved analyzing the potential mismatches between the languages. This led to a lot of interesting information, but was difficult to draw actionable conclusions from. Rather than enumerating all the potential places where interop could occur, Stephen Hines suggested that we instead consider how code is currently shared between C/C++ projects as a reasonable proxy for where we’ll also likely want interop for Rust. This provided us with actionable information that was straightforward to prioritize and implement. Looking back, the data from our real-world Rust usage has reinforced that the initial methodology was sound. Thanks Stephen!

Also, thanks to:

  • Andrei Homescu and Stephen Crane for contributing AIDL support to AOSP.
  • Ivan Lozano for contributing protobuf support to AOSP.
  • David Tolnay for publishing cxx and accepting our contributions.
  • The many authors and contributors to bindgen.
  • Jeff Vander Stoep and Adrian Taylor for contributions to this post.


  1. We used undefined symbols of function type as reported by objdump to perform this analysis. This means that any header-only functions will be absent from our analysis, and internal (non-API) functions which are called by header-only functions may appear in it. 

  2. We extracted return values by parsing DWARF symbols, which give the return types of functions. 

  3. Even without automated binding generation, manually implementing the bindings is straightforward. 

  4. In the case of handwritten C/C++ wrappers, we analyzed the functions they call, not the wrappers themselves. For all uses of our native AIDL library, we analyzed the types used in the C++ version of the library. 

The Android team has been working on introducing the Rust programming language into the Android Open Source Project (AOSP) since 2019 as a memory-safe alternative for platform native code development. As with any large project, introducing a new language requires careful consideration. For Android, one important area was assessing how to best fit Rust into Android’s build system. Currently this means the Soong build system (where the Rust support resides), but these design decisions and considerations are equally applicable for Bazel when AOSP migrates to that build system. This post discusses some of the key design considerations and resulting decisions we made in integrating Rust support into Android’s build system.

Rust integration into large projects

A RustConf 2019 meeting on Rust usage within large organizations highlighted several challenges, such as the risk that eschewing Cargo in favor of using the Rust Compiler, rustc, directly (see next section) may remove organizations from the wider Rust community. We share this same concern. When changes to imported third-party crates might be beneficial to the wider community, our goal is to upstream those changes. Likewise when crates developed for Android could benefit the wider Rust community, we hope to release them as independent crates. We believe that the success of Rust within Android is dependent on minimizing any divergence between Android and the Rust community at large, and hope that the Rust community will benefit from Android’s involvement.

No nested build systems

Rust provides Cargo as the default build system and package manager, collecting dependencies and invoking rustc (the Rust compiler) to build the target crate (Rust package). Soong takes this role instead in Android and calls rustc directly for several reasons:

  • In Cargo, C dependencies are handled independently in an ad-hoc manner via build.rs scripts. Soong already provides a mechanism for building C libraries and defining them as dependencies, and Android carefully controls the compiler version and global compilation flags to ensure libraries are built a particular way. Relying on Cargo would introduce a second non-Soong mechanism for defining/building C libraries that would not be constrained by the carefully selected compilation controls implemented in Soong. This could also lead to multiple different versions of the same library, negatively impacting memory/disk usage.
  • Calling compilers directly through Soong provides the stability and control Android requires for the variety of build configurations it supports (for example, specifying where target-specific dependencies are and which compilation flags to use). While it would technically be possible to achieve the necessary level of control over rustc indirectly through Cargo, Soong would have no understanding of how the Cargo.toml (the Cargo build file) would influence the commands Cargo emits to rustc. Paired with the fact that Cargo evolves independently, this would severely restrict Soong’s ability to precisely control how build artifacts are created.
  • Builds which are self-contained and insensitive to the host configuration, known as hermetic builds, are necessary for Android to produce reproducible builds. Cargo, which relies on build.rs scripts, doesn’t yet provide hermeticity guarantees.
  • Incremental builds are important to maintain engineering productivity; building Android takes a considerable amount of resources. Cargo was not designed for integration into existing build systems and does not expose its compilation units. Each Cargo invocation builds the entire crate dependency graph for a given Cargo.toml, rebuilding crates multiple times across projects1. This is too coarse for integration into Soong’s incremental build support, which expects smaller compilation units. This support is necessary to scale up Rust usage within Android.

    Using the Rust compiler directly allows us to avoid these issues and is consistent with how we compile all other code in AOSP. It provides the most control over the build process and eases integration into Android’s existing build system. Unfortunately, avoiding it introduces several challenges and influences many other build system decisions because Cargo usage is so deeply ingrained in the Rust crate ecosystem.

    No build.rs scripts

    A build.rs script compiles to a Rust binary which Cargo builds and executes during a build to handle pre-build tasks, commonly setting up the build environment, or building libraries in other languages (for example C/C++). This is analogous to configure scripts used for other languages.

    Avoiding build.rs scripts somewhat flows naturally from not relying on Cargo since supporting these would require replicating Cargo behavior and assumptions. Beyond this however, there are good reasons for AOSP to avoid build scripts as well:

    • build.rs scripts can execute arbitrary code on the build host. From a security perspective, this introduces an additional burden when adding or updating third-party code as the build.rs script needs careful scrutiny.
    • Third-party build.rs scripts may not be hermetic or reproducible in potentially subtle ways. It is also common for build.rs files to access files outside the build directory (such as /usr/lib). When they are not hermetic, we would need to either carry a local patch or work with upstream to resolve the issue.
    • The most common task for build.rs is to build C libraries which Rust code depends on. We already support this through Soong.
    • Android likewise avoids running build scripts while building for other languages, instead, simply using them to inform the structure of the Android.bp file.

For instances in third-party code where a build script is used only to compile C dependencies, we either use existing cc_library Soong definitions (such as boringssl for quiche) or create new definitions for crate-specific code.

When the build.rs is used to generate source, we try to replicate the core functionality in a Soong rust_binary module for use as a custom source generator. In other cases where Soong can provide the information without source generation, we may carry a small patch that leverages this information.

Why proc_macro but not build.rs?

Why do we support proc_macros, which are compiler plug-ins that execute code on the host within the compiler context, but not build.rs scripts?

While build.rs code is written as one-off code to handle building a single crate, proc_macros define reusable functionality within the compiler which can become widely relied upon across the Rust community. As a result popular proc_macros are generally better maintained and more scrutinized upstream, which makes the code review process more manageable. They are also more readily sandboxed as part of the build process since they are less likely to have dependencies external to the compiler.

proc_macros are also a language feature rather than a method for building code. These are relied upon by source code, are unavoidable for third-party dependencies, and are useful enough to define and use within our platform code. While we can avoid build.rs by leveraging our build system, the same can’t be said of proc_macros.

There is also precedence for compiler plugin support within the Android build system. For example see Soong’s java_plugin modules.

Generated source as crates

Unlike C/C++ compilers, rustc only accepts a single source file representing an entry point to a binary or library. It expects that the source tree is structured such that all required source files can be automatically discovered. This means that generated source either needs to be placed in the source tree or provided through an include directive in source:

include!("/path/to/hello.rs");

The Rust community depends on build.rs scripts alongside assumptions about the Cargo build environment to get around this limitation. When building, the cargo command sets an OUT_DIR environment variable which build.rs scripts are expected to place generated source code in. This source can then be included via:

include!(concat!(env!("OUT_DIR"), "/hello.rs"));

This presents a challenge for Soong as outputs for each module are placed in their own out/ directory2; there is no single OUT_DIR where dependencies output their generated source.

For platform code, we prefer to package generated source into a crate that can be imported. There are a few reasons to favor this approach:

  • Prevent generated source file names from colliding.
  • Reduce boilerplate code checked-in throughout the tree and which needs to be maintained. Any boilerplate necessary to make the generated source compile into a crate can be centrally maintained.
  • Avoid implicit3 interactions between generated code and the surrounding crate.
  • Reduce pressure on memory and disk by dynamically liking commonly used generated sources.

    As a result, all of Android’s Rust source generation module types produce code that can be compiled and used as a crate.

    We still support third-party crates without modification by copying all the generated source dependencies for a module into a single per-module directory similar to Cargo. Soong then sets the OUT_DIR environment variable to that directory when compiling the module so the generated source can be found. However we discourage use of this mechanism in platform code unless absolutely necessary for the reasons described above.

    Dynamic linkage by default

    By default, the Rust ecosystem assumes that crates will be statically linked into binaries. The usual benefits of dynamic libraries are upgrades (whether for security or functionality) and decreased memory usage. Rust’s lack of a stable binary interface and usage of cross-crate information flow prevents upgrading libraries without upgrading all dependent code. Even when the same crate is used by two different programs on the system, it is unlikely to be provided by the same shared object4 due to the precision with which Rust identifies its crates. This makes Rust binaries more portable but also results in larger disk and memory footprints.

    This is problematic for Android devices where resources like memory and disk usage must be carefully managed because statically linking all crates into Rust binaries would result in excessive code duplication (especially in the standard library). However, our situation is also different from the standard host environment: we build Android using global decisions about dependencies. This means that nearly every crate is shareable between all users of that crate. Thus, we opt to link crates dynamically by default for device targets. This reduces the overall memory footprint of Rust in Android by allowing crates to be reused across multiple binaries which depend on them.

    Since this is unusual in the Rust community, not all third-party crates support dynamic compilation. Sometimes we must carry small patches while we work with upstream maintainers to add support.

    Current Status of Build Support

    We support building all output types supported by rustc (rlibs, dylibs, proc_macros, cdylibs, staticlibs, and executables). Rust modules can automatically request the appropriate crate linkage for a given dependency (rlib vs dylib). C and C++ modules can depend on Rust cdylib or staticlib producing modules the same way as they would for a C or C++ library.

    In addition to being able to build Rust code, Android’s build system also provides support for protobuf and gRPC and AIDL generated crates. First-class bindgen support makes interfacing with existing C code simple and we have support modules using cxx for tighter integration with C++ code.

    The Rust community produces great tooling for developers, such as the language server rust-analyzer. We have integrated support for rust-analyzer into the build system so that any IDE which supports it can provide code completion and goto definitions for Android modules.

    Source-based code coverage builds are supported to provide platform developers high level signals on how well their code is covered by tests. Benchmarks are supported as their own module type, leveraging the criterion crate to provide performance metrics. In order to maintain a consistent style and level of code quality, a default set of clippy lints and rustc lints are enabled by default. Additionally, HWASAN/ASAN fuzzers are supported, with the HWASAN rustc support added to upstream.

    In the near future, we plan to add documentation to source.android.com on how to define and use Rust modules in Soong. We expect Android’s support for Rust to continue evolving alongside the Rust ecosystem and hope to continue to participate in discussions around how Rust can be integrated into existing build systems.

    Thank you to Matthew Maurer, Jeff Vander Stoep, Joel Galenson, Manish Goregaokar, and Tyler Mandry for their contributions to this post.

    Notes


    1. This can be mitigated to some extent with workspaces, but requires a very specific directory arrangement that AOSP does not conform to. 

    2. This presents no problem for C/C++ and similar languages as the path to the generated source is provided directly to the compiler. 

    3. Since include! works by textual inclusion, it may reference values from the enclosing namespace, modify the namespace, or use constructs like #![foo]. These implicit interactions can be difficult to maintain. Macros should be preferred if interaction with the rest of the crate is truly required.  

    4. While libstd would usually be shareable for the same compiler revision, most other libraries would end up with several copies for Cargo-built Rust binaries, since each build would attempt to use a minimum feature set and may select different dependency versions for the library in question. Since information propagates across crate boundaries, you cannot simply produce a “most general” instance of that library. 

In our previous post, we announced that Android now supports the Rust programming language for developing the OS itself. Related to this, we are also participating in the effort to evaluate the use of Rust as a supported language for developing the Linux kernel. In this post, we discuss some technical aspects of this work using a few simple examples.

C has been the language of choice for writing kernels for almost half a century because it offers the level of control and predictable performance required by such a critical component. Density of memory safety bugs in the Linux kernel is generally quite low due to high code quality, high standards of code review, and carefully implemented safeguards. However, memory safety bugs do still regularly occur. On Android, vulnerabilities in the kernel are generally considered high-severity because they can result in a security model bypass due to the privileged mode that the kernel runs in.

We feel that Rust is now ready to join C as a practical language for implementing the kernel. It can help us reduce the number of potential bugs and security vulnerabilities in privileged code while playing nicely with the core kernel and preserving its performance characteristics.

Supporting Rust

We developed an initial prototype of the Binder driver to allow us to make meaningful comparisons between the safety and performance characteristics of the existing C version and its Rust counterpart. The Linux kernel has over 30 million lines of code, so naturally our goal is not to convert it all to Rust but rather to allow new code to be written in Rust. We believe this incremental approach allows us to benefit from the kernel’s existing high-performance implementation while providing kernel developers with new tools to improve memory safety and maintain performance going forward.

We joined the Rust for Linux organization, where the community had already done and continues to do great work toward adding Rust support to the Linux kernel build system. We also need designs that allow code in the two languages to interact with each other: we're particularly interested in safe, zero-cost abstractions that allow Rust code to use kernel functionality written in C, and how to implement functionality in idiomatic Rust that can be called seamlessly from the C portions of the kernel.

Since Rust is a new language for the kernel, we also have the opportunity to enforce best practices in terms of documentation and uniformity. For example, we have specific machine-checked requirements around the usage of unsafe code: for every unsafe function, the developer must document the requirements that need to be satisfied by callers to ensure that its usage is safe; additionally, for every call to unsafe functions (or usage of unsafe constructs like dereferencing a raw pointer), the developer must document the justification for why it is safe to do so.

Just as important as safety, Rust support needs to be convenient and helpful for developers to use. Let’s get into a few examples of how Rust can assist kernel developers in writing drivers that are safe and correct.

Example driver

We'll use an implementation of a semaphore character device. Each device has a current value; writes of n bytes result in the device value being incremented by n; reads decrement the value by 1 unless the value is 0, in which case they will block until they can decrement the count without going below 0.

Suppose semaphore is a file representing our device. We can interact with it from the shell as follows:

> cat semaphore

When semaphore is a newly initialized device, the command above will block because the device's current value is 0. It will be unblocked if we run the following command from another shell because it increments the value by 1, which allows the original read to complete:

> echo -n a > semaphore

We could also increment the count by more than 1 if we write more data, for example:

> echo -n abc > semaphore

increments the count by 3, so the next 3 reads won't block.

To allow us to show a few more aspects of Rust, we'll add the following features to our driver: remember what the maximum value was throughout the lifetime of a device, and remember how many reads each file issued on the device.

We'll now show how such a driver would be implemented in Rust, contrasting it with a C implementation. We note, however, we are still early on so this is all subject to change in the future. How Rust can assist the developer is the aspect that we'd like to emphasize. For example, at compile time it allows us to eliminate or greatly reduce the chances of introducing classes of bugs, while at the same time remaining flexible and having minimal overhead.

Character devices

A developer needs to do the following to implement a driver for a new character device in Rust:

  1. Implement the FileOperations trait: all associated functions are optional, so the developer only needs to implement the relevant ones for their scenario. They relate to the fields in C's struct file_operations.
  2. Implement the FileOpener trait: it is a type-safe equivalent to C's open field of struct file_operations.
  3. Register the new device type with the kernel: this lets the kernel know what functions need to be called in response to files of this new type being operated on.

The following outlines how the first two steps of our example compare in Rust and C:

impl FileOpener<Arc<Semaphore>> for FileState {
    fn open(
        shared: &Arc<Semaphore>
    ) -> KernelResult<Box<Self>> {
        [...]
    }
}
 
impl FileOperations for FileState {
    type Wrapper = Box<Self>;
 
    fn read(
        &self,
        _: &File,
        data: &mut UserSlicePtrWriter,
        offset: u64
    ) -> KernelResult<usize> {
        [...]
    }
 
    fn write(
        &self,
        data: &mut UserSlicePtrReader,
        _offset: u64
    ) -> KernelResult<usize> {
        [...]
    }
 
    fn ioctl(
        &self,
        file: &File,
        cmd: &mut IoctlCommand
    ) -> KernelResult<i32> {
        [...]
    }
 
    fn release(_obj: Box<Self>, _file: &File) {
        [...]
    }
 
    declare_file_operations!(read, write, ioctl);
}
static 
int semaphore_open(struct inode *nodp,
                   struct file *filp)
{
    struct semaphore_state *shared =
        container_of(filp->private_data,
                     struct semaphore_state,
                     miscdev);
    [...]
}
 
static
ssize_t semaphore_write(struct file *filp,
                        const char __user *buffer,
                        size_t count, loff_t *ppos)
{
    struct file_state *state = filp->private_data;
    [...]
}
 
static
ssize_t semaphore_read(struct file *filp,
                       char __user *buffer,
                       size_t count, loff_t *ppos)
{
    struct file_state *state = filp->private_data;
    [...]
}
 
static
long semaphore_ioctl(struct file *filp,
                     unsigned int cmd,
                     unsigned long arg)
{
    struct file_state *state = filp->private_data;
    [...]
}
 
static
int semaphore_release(struct inode *nodp,
                      struct file *filp)
{
    struct file_state *state = filp->private_data;
    [...]
}
 
static const struct file_operations semaphore_fops = {
        .owner = THIS_MODULE,
        .open = semaphore_open,
        .read = semaphore_read,
        .write = semaphore_write,
        .compat_ioctl = semaphore_ioctl,
        .release = semaphore_release,
};

Character devices in Rust benefit from a number of safety features:

  • Per-file state lifetime management: FileOpener::open returns an object whose lifetime is owned by the caller from then on. Any object that implements the PointerWrapper trait can be returned, and we provide implementations for Box<T> and Arc<T>, so developers that use Rust's idiomatic heap-allocated or reference-counted pointers have no additional requirements.

    All associated functions in FileOperations receive non-mutable references to self (more about this below), except the release function, which is the last function to be called and receives the plain object back (and its ownership with it). The release implementation can then defer the object destruction by transferring its ownership elsewhere, or destroy it then; in the case of a reference-counted object, 'destruction' means decrementing the reference count (and actual object destruction if the count goes to zero).

    That is, we use Rust's ownership discipline when interacting with C code by handing the C portion ownership of a Rust object, allowing it to call functions implemented in Rust, then eventually giving ownership back. So as long as the C code is correct, the lifetime of Rust file objects work seamlessly as well, with the compiler enforcing correct lifetime management on the Rust side, for example: open cannot return stack-allocated pointers or heap-allocated objects containing pointers to the stack, ioctl/read/write cannot free (or modify without synchronization) the contents of the object stored in filp->private_data, etc.

  • Non-mutable references: the associated functions called between open and release all receive non-mutable references to self because they can be called concurrently by multiple threads and Rust aliasing rules prohibit more than one mutable reference to an object at any given time.

    If a developer needs to modify some state (and they generally do), they can do so via interior mutability: mutable state can be wrapped in a Mutex<T> or SpinLock<T> (or atomics) and safely modified through them.

    This prevents, at compile-time, bugs where a developer fails to acquire the appropriate lock when accessing a field (the field is inaccessible), or when a developer fails to wrap a field with a lock (the field is read-only).

  • Per-device state: when file instances need to share per-device state, which is a very common occurrence in drivers, they can do so safely in Rust. When a device is registered, a typed object can be provided and a non-mutable reference to it is provided when FileOperation::open is called. In our example, the shared object is wrapped in Arc<T>, so files can safely clone and hold on to a reference to them.

    The reason FileOperation is its own trait (as opposed to, for example, open being part of the FileOperations trait) is to allow a single file implementation to be registered in different ways.

    This eliminates opportunities for developers to get the wrong data when trying to retrieve shared state. For example, in C when a miscdevice is registered, a pointer to it is available in filp->private_data; when a cdev is registered, a pointer to it is available in inode->i_cdev. These structs are usually embedded in an outer struct that contains the shared state, so developers usually use the container_of macro to recover the shared state. Rust encapsulates all of this and the potentially troublesome pointer casts in a safe abstraction.

  • Static typing: we take advantage of Rust's support for generics to implement all of the above functions and types with static types. So there are no opportunities for a developer to convert an untyped variable or field to the wrong type. The C code in the table above has casts from an essentially untyped (void *) pointer to the desired type at the start of each function: this is likely to work fine when first written, but may lead to bugs as the code evolves and assumptions change. Rust would catch any such mistakes at compile time.

  • File operations: as we mentioned before, a developer needs to implement the FileOperations trait to customize the behavior of their device. They do this with a block starting with impl FileOperations for Device, where Device is the type implementing the file behavior (FileState in our example). Once inside this block, tools know that only a limited number of functions can be defined, so they can automatically insert the prototypes. (Personally, I use neovim and the rust-analyzer LSP server.)

    While we use this trait in Rust, the C portion of the kernel still requires an instance of struct file_operations. The kernel crate automatically generates one from the trait implementation (and optionally the declare_file_operations macro): although it has code to generate the correct struct, it is all const, so evaluated at compile-time with zero runtime cost.

Ioctl handling

For a driver to provide a custom ioctl handler, it needs to implement the ioctl function that is part of the FileOperations trait, as exemplified in the table below.

fn ioctl(
    &self,
    file: &File,
    cmd: &mut IoctlCommand
) -> KernelResult<i32> {
    cmd.dispatch(self, file)
}
 
impl IoctlHandler for FileState {
    fn read(
        &self,
        _file: &File,
        cmd: u32,
        writer: &mut UserSlicePtrWriter
    ) -> KernelResult<i32> {
        match cmd {
            IOCTL_GET_READ_COUNT => {
                writer.write(
                    &self
                    .read_count
                    .load(Ordering::Relaxed))?;
                Ok(0)
            }
            _ => Err(Error::EINVAL),
        }
    }
 
    fn write(
        &self,
        _file: &File,
        cmd: u32,
        reader: &mut UserSlicePtrReader
    ) -> KernelResult<i32> {
        match cmd {
            IOCTL_SET_READ_COUNT => {
                self
                .read_count
                .store(reader.read()?,
                       Ordering::Relaxed);
                Ok(0)
            }
            _ => Err(Error::EINVAL),
        }
    }
}
#define IOCTL_GET_READ_COUNT _IOR('c', 1, u64)
#define IOCTL_SET_READ_COUNT _IOW('c', 1, u64)
 
static
long semaphore_ioctl(struct file *filp,
                     unsigned int cmd,
                     unsigned long arg)
{
    struct file_state *state = filp->private_data;
    void __user *buffer = (void __user *)arg;
    u64 value;
 
    switch (cmd) {
    case IOCTL_GET_READ_COUNT:
        value = atomic64_read(&state->read_count);
        if (copy_to_user(buffer, &value, sizeof(value)))
            return -EFAULT;
        return 0;
    case IOCTL_SET_READ_COUNT:
        if (copy_from_user(&value, buffer, sizeof(value)))
            return -EFAULT;
        atomic64_set(&state->read_count, value);
        return 0;
    default:
        return -EINVAL;
    }
}

Ioctl commands are standardized such that, given a command, we know whether a user buffer is provided, its intended use (read, write, both, none), and its size. In Rust, we provide a dispatcher (accessible by calling cmd.dispatch) that uses this information to automatically create user memory access helpers and pass them to the caller.

A driver is not required to use this though. If, for example, it doesn't use the standard ioctl encoding, Rust offers the flexibility of simply calling cmd.raw to extract the raw arguments and using them to handle the ioctl (potentially with unsafe code, which will need to be justified).

However, if a driver implementation does use the standard dispatcher, it will benefit from not having to implement any unsafe code, and:

  • The pointer to user memory is never a native pointer, so the developer cannot accidentally dereference it.
  • The types that allow the driver to read from user space only allow data to be read once, so we eliminate the risk of time-of-check to time-of-use (TOCTOU) bugs because when a driver needs to access data twice, it needs to copy it to kernel memory, where an attacker is not allowed to modify it. Excluding unsafe blocks, there is no way to introduce this class of bugs in Rust.
  • No accidental overflow of the user buffer: we'll never read or write past the end of the user buffer because this is enforced automatically based on the size encoded in the ioctl command. In our example above, the implementation of IOCTL_GET_READ_COUNT only has access to an instance of UserSlicePtrWriter, which limits the number of writable bytes to sizeof(u64) as encoded in the ioctl command.
  • No mixing of reads and writes: we'll never write buffers for ioctls that are only meant to read and never read buffers for ioctls that are only meant to write. This is enforced by read and write handlers only getting instances of UserSlicePtrWriter and UserSlicePtrReader respectively.

All of the above could potentially also be done in C, but it's very easy for developers to (likely unintentionally) break contracts that lead to unsafety; Rust requires unsafe blocks for this, which should only be used in rare cases and brings additional scrutiny. Additionally, Rust offers the following:

  • The types used to read and write user memory do not implement the Send and Sync traits, which means that they (and pointers to them) are not safe to be used in another thread context. In Rust, if a driver developer attempted to write code that passed one of these objects to another thread (where it wouldn't be safe to use them because it isn't necessarily in the right memory manager context), they would get a compilation error.
  • When calling IoctlCommand::dispatch, one might understandably think that we need dynamic dispatching to reach the actual handler implementation (which would incur additional cost in comparison to C), but we don't. Our usage of generics will lead the compiler to monomorphize the function, which will result in static function calls that can even be inlined if the optimizer so chooses.

Locking and condition variables

We allow developers to use mutexes and spinlocks to provide interior mutability. In our example, we use a mutex to protect mutable data; in the tables below we show the data structures we use in C and Rust, and how we implement a wait until the count is nonzero so that we can satisfy a read:

struct SemaphoreInner {
    count: usize,
    max_seen: usize,
}
 
struct Semaphore {
    changed: CondVar,
    inner: Mutex<SemaphoreInner>,
}
 
struct FileState {
    read_count: AtomicU64,
    shared: Arc<Semaphore>,
}
struct semaphore_state {
    struct kref ref;
    struct miscdevice miscdev;
    wait_queue_head_t changed;
    struct mutex mutex;
    size_t count;
    size_t max_seen;
};
 
struct file_state {
    atomic64_t read_count;
    struct semaphore_state *shared;
};

fn consume(&self) -> KernelResult {
    let mut inner = self.shared.inner.lock();
    while inner.count == 0 {
        if self.shared.changed.wait(&mut inner) {
            return Err(Error::EINTR);
        }
    }
    inner.count -= 1;
    Ok(())
}
static int semaphore_consume(
    struct semaphore_state *state)
{
    DEFINE_WAIT(wait);
 
    mutex_lock(&state->mutex);
    while (state->count == 0) {
        prepare_to_wait(&state->changed, &wait,
                        TASK_INTERRUPTIBLE);
        mutex_unlock(&state->mutex);
        schedule();
        finish_wait(&state->changed, &wait);
        if (signal_pending(current))
            return -EINTR;
        mutex_lock(&state->mutex);
    }
 
    state->count--;
    mutex_unlock(&state->mutex);
 
    return 0;
}

We note that such waits are not uncommon in the existing C code, for example, a pipe waiting for a "partner" to write, a unix-domain socket waiting for data, an inode search waiting for completion of a delete, or a user-mode helper waiting for state change.

The following are benefits from the Rust implementation:

  • The Semaphore::inner field is only accessible when the lock is held, through the guard returned by the lock function. So developers cannot accidentally read or write protected data without locking it first. In the C example above, count and max_seen in semaphore_state are protected by mutex, but there is no enforcement that the lock is held while they're accessed.
  • Resource Acquisition Is Initialization (RAII): the lock is unlocked automatically when the guard (inner in this case) goes out of scope. This ensures that locks are always unlocked: if the developer needs to keep a lock locked, they can keep the guard alive, for example, by returning the guard itself; conversely, if they need to unlock before the end of the scope, they can explicitly do it by calling the drop function.
  • Developers can use any lock that implements the Lock trait, which includes Mutex and SpinLock, at no additional runtime cost when compared to a C implementation. Other synchronization constructs, including condition variables, also work transparently and with zero additional run-time cost.
  • Rust implements condition variables using kernel wait queues. This allows developers to benefit from atomic release of the lock and putting the thread to sleep without having to reason about low-level kernel scheduler functions. In the C example above, semaphore_consume is a mix of semaphore logic and subtle Linux scheduling: for example, the code is incorrect if mutex_unlock is called before prepare_to_wait because it may result in a wake up being missed.
  • No unsynchronized access: as we mentioned before, variables shared by multiple threads/CPUs must be read-only, with interior mutability being the solution for cases when mutability is needed. In addition to the example with locks above, the ioctl example in the previous section also has an example of using an atomic variable; Rust also requires developers to specify how memory is to be synchronized by atomic accesses. In the C part of the example, we happen to use atomic64_t, but the compiler won't alert a developer to this need.

Error handling and control flow

In the tables below, we show how open, read, and write are implemented in our example driver:

fn read(
    &self,
    _: &File,
    data: &mut UserSlicePtrWriter,
    offset: u64
) -> KernelResult<usize> {
   if data.is_empty() || offset > 0 {
        return Ok(0);
    }
 
    self.consume()?;
    data.write_slice(&[0u8; 1])?;
    self.read_count.fetch_add(1, Ordering::Relaxed);
    Ok(1)
}
 
static
ssize_t semaphore_read(struct file *filp,
                       char __user *buffer,
                       size_t count, loff_t *ppos)
{
    struct file_state *state = filp->private_data;
    char c = 0;
    int ret;
 
    if (count == 0 || *ppos > 0)
        return 0;
 
    ret = semaphore_consume(state->shared);
    if (ret)
        return ret;
 
    if (copy_to_user(buffer, &c, sizeof(c)))
        return -EFAULT;
 
    atomic64_add(1, &state->read_count);
    *ppos += 1;
    return 1;
}

fn write(
    &self,
    data: &mut UserSlicePtrReader,
    _offset: u64
) -> KernelResult<usize> {
   {
        let mut inner = self.shared.inner.lock();
        inner.count = inner.count.saturating_add(data.len());
        if inner.count > inner.max_seen {
            inner.max_seen = inner.count;
        }
    }
 
    self.shared.changed.notify_all();
    Ok(data.len())
}
static
ssize_t semaphore_write(struct file *filp,
                        const char __user *buffer,
                        size_t count, loff_t *ppos)
{
    struct file_state *state = filp->private_data;
    struct semaphore_state *shared = state->shared;
 
    mutex_lock(&shared->mutex);
    shared->count += count;
    if (shared->count < count)
        shared->count = SIZE_MAX;
 
    if (shared->count > shared->max_seen)
        shared->max_seen = shared->count;
 
    mutex_unlock(&shared->mutex);
 
    wake_up_all(&shared->changed);
    return count;
}

fn open(
    shared: &Arc<Semaphore>
) -> KernelResult<Box<Self>> {
    Ok(Box::try_new(Self {
        read_count: AtomicU64::new(0),
        shared: shared.clone(),
    })?)
}
static 
int semaphore_open(struct inode *nodp,
                   struct file *filp)
{
    struct semaphore_state *shared =
        container_of(filp->private_data,
                     struct semaphore_state,
                     miscdev);
    struct file_state *state;
 
    state = kzalloc(sizeof(*state), GFP_KERNEL);
    if (!state)
        return -ENOMEM;
 
    kref_get(&shared->ref);
    state->shared = shared;
    atomic64_set(&state->read_count, 0);
 
    filp->private_data = state;
 
    return 0;
}

They illustrate other benefits brought by Rust:

  • The ? operator: it is used by the Rust open and read implementations to do error handling implicitly; the developer can focus on the semaphore logic, the resulting code being quite small and readable. The C versions have error-handling noise that can make them less readable.
  • Required initialization: Rust requires all fields of a struct to be initialized on construction, so the developer can never accidentally fail to initialize a field; C offers no such facility. In our open example above, the developer of the C version could easily fail to call kref_get (even though all fields would have been initialized); in Rust, the user is required to call clone (which increments the ref count), otherwise they get a compilation error.
  • RAII scoping: the Rust write implementation uses a statement block to control when inner goes out of scope and therefore the lock is released.
  • Integer overflow behavior: Rust encourages developers to always consider how overflows should be handled. In our write example, we want a saturating one so that we don't end up with a zero value when adding to our semaphore. In C, we need to manually check for overflows, there is no additional support from the compiler.

What's next

The examples above are only a small part of the whole project. We hope it gives readers a glimpse of the kinds of benefits that Rust brings. At the moment we have nearly all generic kernel functionality needed by Binder neatly wrapped in safe Rust abstractions, so we are in the process of gathering feedback from the broader Linux kernel community with the intent of upstreaming the existing Rust support.

We also continue to make progress on our Binder prototype, implement additional abstractions, and smooth out some rough edges. This is an exciting time and a rare opportunity to potentially influence how the Linux kernel is developed, as well as inform the evolution of the Rust language. We invite those interested to join us in Rust for Linux and attend our planned talk at Linux Plumbers Conference 2021!


Thanks Nick Desaulniers, Kees Cook, and Adrian Taylor for contributions to this post. Special thanks to Jeff Vander Stoep for contributions and editing, and to Greg Kroah-Hartman for reviewing and contributing to the code examples.