Intro
This is the final post in a series of four, in which we set out to revisit various BeyondCorp topics and share lessons learned along the internal implementation path at Google.

The first post in this series focused on providing necessary context for how Google adopted BeyondCorp, Google’s implementation of the zero trust security model. The second post focused on managing devices - how we decide whether or not a device should be trusted and why that distinction is necessary. The third post focused on tiered access - how to define access tiers and rules and how to simplify troubleshooting when things go wrong.

This post introduces the concept of gated services, explains how to identify and then migrate them, and shares the lessons we learned along the way.

High level architecture for BeyondCorp

Identifying and gating services

How do you identify and categorize all the services that should be gated?

Google began as a web-based company, and as it matured, most internal business applications were developed with a web-first approach. These applications were hosted on internal infrastructure similar to that of our external services, except that they could only be accessed from corporate office networks. Identifying services to be gated by BeyondCorp was therefore easier for us because most internal services were already properly inventoried and hosted via standard, central solutions. Migration, in many cases, was as simple as a DNS change. Solid IT asset inventory systems and maintenance are critical to migrating to a zero trust security model.
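
As a purely illustrative sketch (all hostnames here are hypothetical), such a migration might amount to repointing an internal application's DNS record at the BeyondCorp access proxy instead of at a frontend reachable only from the corporate network:

; Hypothetical zone file change: route the application through the access proxy
tool.internal.example.com.   300   IN   CNAME   access-proxy.example.com.

Once the record points at the proxy, access decisions are made per request based on user and device trust rather than network location.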

Enforcement of zero trust access policies began with services which we determined would not be meaningfully impacted by the change in access requirements. For most services, requirements could be gathered via typical access log analysis or by consulting with service owners. Services that could not readily be gated by the default ACL requirements required their owners to develop strict access groups and/or eliminate risky workflows before they could be migrated.

How do you know which trust tier is needed for every service?

As discussed in our previous blog post, Google makes internal services available based on device trust tiers. Today, those services are accessible by the highest trust tier by default.

When the intent of the change is to restrict access to a service to a specific group or team, service owners are free to propose access changes to add or remove restrictions to their service. Access changes which are deemed to be sufficiently low risk can be automatically approved. In all other cases, such as where the owning team wants to expose a service to a risky device tier, they must work with security engineers to follow the principle of least privilege and devise solutions.

What do you do with services that are incompatible with BeyondCorp ideals?

It may not always be possible to gate an application by the preferred zero trust solution. Services that cannot be easily gated typically fall into these categories:
  • Type 1: "Non-proxyable protocols", e.g. non-HTTP/HTTPS traffic.
  • Type 2: Low latency requirements or localized high throughput traffic.
  • Type 3: Administrative and emergency access networks.
The typical first step in these cases is to find a way to remove the need for the service altogether. In many cases, this was made possible by deprecating or replacing systems that could not be made compatible with the BeyondCorp implementation.

When that was not an option, we found that no single solution would work for all critical requirements:
  • Solutions for "Type 1" traffic have generally involved maintaining a specialized client tunnel that strongly enforces authentication and authorization decisions at both the client and server ends of the connection. This is usually client/server traffic that resembles HTTP traffic in that connectivity is typically multipoint-to-point.
  • Solutions to the "Type 2" problems generally rely on moving BeyondCorp-compatible compute resources locally or developing a solution tightly integrated with network access equipment to selectively forward "local" traffic without permanently opening network holes.
  • As for “Type 3,” it would be ideal to completely eliminate all privileged internal networks. However, the reality is that some privileged networking will likely always be required to maintain the network itself and also to provide emergency access during outages.
It should be noted that server-to-server traffic in secure production data center environments does not necessarily rely on BeyondCorp, although many systems are integrated regardless, due to the Service-Oriented Design benefits that BeyondCorp inherently provides. 

How do you prioritize gating?

Prioritization starts by identifying all the services that are currently accessible via internal IP access alone, migrating the most critical services to BeyondCorp first, and working to slowly ratchet down permissions via exception management processes. The criticality of a service may depend on the number and type of its users, the sensitivity of the data it handles, and the security and privacy risks it could enable.

Migration logistics

Most services required integration testing with the BeyondCorp proxy. Service teams were encouraged to stand up "test" instances of their services to validate functionality behind the BeyondCorp proxy. Most services that performed their own access control enforcement were reconfigured to instead rely on BeyondCorp for all user/group authentication and authorization. Service teams have been encouraged to implement their own "fine-grained" discretionary access controls in their services by leveraging session data provided by the BeyondCorp proxy.
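
To make the division of responsibility concrete, here is a purely hypothetical sketch (the header names are invented for illustration and are not the proxy's actual interface): the proxy authenticates the user, evaluates the device trust tier, and forwards the request to the backend with the resulting session identity attached, which the service can then use for its own discretionary checks.

GET /reports/export HTTP/1.1
Host: tool.internal.example.com
X-Proxy-Authenticated-User: alice@example.com
X-Proxy-Device-Trust-Tier: fully-trusted

A backend that accepts traffic only from the proxy can, for example, restrict an administrative path to a smaller group than the proxy-level ACL allows.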

Lessons learned

Allow coarse gating and exceptions

Inventory: It’s easy to overlook the importance of keeping a good inventory of services, devices, owners, and security exceptions. The journey to a BeyondCorp world should start by solving the organizational challenges of managing and maintaining data quality in inventory systems. In short, knowing how a service works, who should access it, and what makes that access acceptable are the central tenets of managing BeyondCorp. Fine-grained access control is severely complicated when this insight is missing.

Legacy protocols: Most large enterprises will inevitably need to support workflows and protocols which cannot be migrated to a BeyondCorp world (in any reasonable amount of time). Exception management and service inventory become crucial at this stage while stakeholders develop solutions.

Run highly reliable systems

The BeyondCorp initiative would not be sustainable at Google’s scale without the involvement of various Site Reliability Engineering (SRE) teams across the inventory systems, the BeyondCorp infrastructure, and client-side solutions. The ability to successfully achieve widespread adoption of changes this large can be hampered by perceived (or, in some cases, actual) reliability issues. Understanding the user workflows that might be impacted, working with key stakeholders, and ensuring the transition is smooth and trouble-free for all users helps protect against backlash and keeps users from finding undesirable workarounds. By applying our reliability engineering practices, those teams helped ensure that the components of our implementation have availability and latency targets and operational robustness compatible with our business needs and intended user experiences.

Put employees in control as much as possible

Employees cover a broad range of job functions with varying requirements of technology and tools. In addition to communicating changes to our employees early, we provide them with self-service solutions for handling exceptions or addressing issues affecting their devices. By putting our employees in control, we help to ensure that security mechanisms do not get in their way, helping with the acceptance and scaling processes.

Summary

Throughout this series of blog posts, we set out to revisit and demystify BeyondCorp, Google’s internal implementation of a zero trust security model. The four posts had different focus areas - setting context, devices, tiered access and, finally, services (this post).

If you want to learn more, you can check out the BeyondCorp research papers. In addition, getting started with BeyondCorp is now easier using zero trust solutions from Google Cloud (context-aware access) and other enterprise providers. Lastly, stay tuned for an upcoming BeyondCorp webinar on Cloud OnAir in a few months where you will be able to learn more and ask us questions. We hope that these blog posts, research papers, and webinars will help you on your journey to enable zero trust access.

Thank you to the editors of the BeyondCorp blog post series, Puneet Goel (Product Manager), Lior Tishbi (Program Manager), and Justin McWilliams (Engineering Manager).


The Linux kernel is responsible for enforcing much of Android’s security model, which is why we have put a lot of effort into hardening the Android Linux kernel against exploitation. In Android 9, we introduced support for Clang’s forward-edge Control-Flow Integrity (CFI) enforcement to protect the kernel from code reuse attacks that modify stored function pointers. This year, we have added backward-edge protection for return addresses using Clang’s Shadow Call Stack (SCS).
Google’s Pixel 3 and 3a phones have kernel SCS enabled in the Android 10 update, and Pixel 4 ships with this protection out of the box. We have made patches available to all supported versions of the Android kernel and also maintain a patch set against upstream Linux. This post explains how kernel SCS works, the benefits and trade-offs, how to enable the feature, and how to debug potential issues.

Return-oriented programming

As kernel memory protections increasingly make code injection more difficult, attackers commonly use control flow hijacking to exploit kernel bugs. Return-oriented programming (ROP) is a technique where the attacker gains control of the kernel stack to overwrite function return addresses and redirect execution to carefully selected parts of existing kernel code, known as ROP gadgets. While address space randomization and stack canaries can make this attack more challenging, return addresses stored on the stack remain vulnerable to many overwrite flaws. The general availability of tools for automatically generating this type of kernel exploit makes protecting against it increasingly important.

Shadow Call Stack

One method of protecting return addresses is to store them in a separately allocated shadow stack that’s not vulnerable to traditional buffer overflows. This can also help protect against arbitrary overwrite attacks.
Clang added the Shadow Call Stack instrumentation pass for arm64 in version 7. When enabled, each non-leaf function that pushes the return address to the stack will be instrumented with code that also saves the address to a shadow stack. A pointer to the current task’s shadow stack is always kept in the x18 register, which is reserved for this purpose. Here’s what instrumentation looks like in a typical kernel function:
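
What follows is a simplified, representative sketch of the arm64 code Clang emits for such a function, not an exact reproduction of the figure from the original post:

// Prologue: push the return address (x30) onto the shadow stack;
// x18 holds the pointer to the top of the current task's shadow stack.
    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!    // normal frame setup on the regular stack
    mov     x29, sp

    // ... function body ...

// Epilogue: restore the frame, then reload x30 from the shadow stack,
// so the address used by ret never comes from the regular stack.
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret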

SCS doesn’t require error handling as it uses the return address from the shadow stack unconditionally. Compatibility with stack unwinding for debugging purposes is maintained by keeping a copy of the return address in the normal stack, but this value is never used for control flow decisions.
Despite requiring a dedicated register, SCS has minimal performance overhead. The instrumentation itself consists of one load and one store instruction per function, which results in a performance impact that’s within noise in our benchmarking. Allocating a shadow stack for each thread does increase the kernel’s memory usage but as only return addresses are stored, the stack size defaults to 1kB. Therefore, the overhead is a fraction of the memory used for the already small regular kernel stacks.
SCS patches are available for Android kernels 4.14 and 4.19, and for upstream Linux. It can be enabled using the following configuration options:

CONFIG_SHADOW_CALL_STACK=y
# CONFIG_SHADOW_CALL_STACK_VMAP is not set
# CONFIG_DEBUG_STACK_USAGE is not set

By default, shadow stacks are not virtually allocated to minimize memory overhead, but CONFIG_SHADOW_CALL_STACK_VMAP can be enabled for better stack exhaustion protection. With CONFIG_DEBUG_STACK_USAGE, the kernel will also print out shadow stack usage in addition to normal stack usage which can be helpful when debugging issues.

Alternatives

Signing return addresses using ARMv8.3 Pointer Authentication (PAC) is an alternative to shadow stacks. PAC has similar security properties and comparable performance to SCS, but without the memory allocation overhead. Unfortunately, PAC requires hardware support, which means it cannot be used on existing devices, but it may be a viable option for future devices. For x86, Intel’s Control-flow Enforcement Technology (CET) extension will offer native shadow stack support, but it also requires compatible hardware.

Conclusion

We have improved Linux kernel code reuse attack protections on Pixel devices running Android 10. Pixel 3, 3a, and 4 kernels have both CFI and SCS enabled and we have made patches available to all Android OEMs.

The Chrome Security team values having multiple lines of defense. Web browsers are complex, and malicious web pages may try to find and exploit browser bugs to steal data. Additional lines of defense, like sandboxes, make it harder for attackers to access your computer, even if bugs in the browser are exploited. With Site Isolation, Chrome has gained a new line of defense that helps protect your accounts on the Web as well.

Site Isolation ensures that pages from different sites end up in different sandboxed processes in the browser. Chrome can thus limit the entire process to accessing data from only one site, making it harder for an attacker to steal cross-site data. We started isolating all sites for desktop users back in Chrome 67, and now we’re excited to enable it on Android for sites that users log into in Chrome 77. We've also strengthened Site Isolation on desktop to help defend against even fully compromised processes.

Site Isolation helps defend against two types of threats. First, attackers may try to use advanced "side channel" attacks to leak sensitive data from a process through unexpected means. For example, Spectre attacks take advantage of CPU performance features to access data that should be off limits. With Site Isolation, it is harder for the attacker to get cross-site data into their process in the first place.

Second, even more powerful attackers may discover security bugs in the browser, allowing them to completely hijack the sandboxed process. On desktop platforms, Site Isolation can now catch these misbehaving processes and limit their access to cross-site data. We're working to bring this level of hijacked process protection to Android in the future as well.

Thanks to this extra line of defense, Chrome can now help keep your web accounts even more secure. We are still making improvements to get the full benefits of Site Isolation, but this change gives Chrome a solid foundation for protecting your data.

If you’d like to learn more, check out our technical write up on the Chromium blog.




Securing access to online accounts is critical for safeguarding private, financial, and other sensitive data online. Phishing - where an attacker tries to trick you into giving them your username and password - is one of the most common causes of data breaches. To protect user accounts, we’ve long made it a priority to offer users many convenient forms of 2-Step Verification (2SV), also known as two-factor authentication (2FA), in addition to Google’s automatic protections. These measures help to ensure that users are not relying solely on passwords for account security.

For users at higher risk (e.g., IT administrators, executives, politicians, activists) who need more effective protection against targeted attacks, security keys provide the strongest form of 2FA. To make this phishing-resistant security accessible to more people and businesses, we recently built this capability into Android phones, expanded the availability of Titan Security Keys to more regions (Canada, France, Japan, the UK), and extended Google’s Advanced Protection Program to the enterprise.

Starting tomorrow, you will have an additional option: Google’s new USB-C Titan Security Key, compatible with your Android, Chrome OS, macOS, and Windows devices.



USB-C Titan Security Key

We partnered with Yubico to manufacture the USB-C Titan Security Key. We have had a long-standing working and customer relationship with Yubico that began in 2012 with the collaborative effort to create the FIDO Universal 2nd Factor (U2F) standard, the first open standard to enable phishing-resistant authentication. This is the same security technology that we use at Google to protect access to internal applications and systems.

USB-C Titan Security Keys are built with a hardware secure element chip that includes firmware engineered by Google to verify the key’s integrity. This is the same secure element chip and firmware that we use in our existing USB-A/NFC and Bluetooth/NFC/USB Titan Security Key models manufactured in partnership with Feitian Technologies.

USB-C Titan Security Keys will be available tomorrow individually for $40 on the Google Store in the United States. USB-A/NFC and Bluetooth/NFC/USB Titan Security Keys will also become available individually in addition to the existing bundle. Bulk orders are available for enterprise organizations in select countries.


We highly recommend that all users at higher risk of targeted attacks get Titan Security Keys and enroll in the Advanced Protection Program (APP), which provides Google’s industry-leading security protections to defend against evolving methods that attackers use to gain access to your accounts and data. You can also use Titan Security Keys for any site where FIDO security keys are supported for 2FA, including your personal or work Google Account, 1Password, Coinbase, Dropbox, Facebook, GitHub, Salesforce, Stripe, Twitter, and more.

Update (04/06/2020): Mixed image autoupgrading was originally scheduled for Chrome 81, but will be delayed until at least Chrome 84. Check the Chrome Platform Status entry for the latest information about when mixed images will be autoupgraded and blocked if they fail to load over https://. Sites with mixed images will continue to trigger the “Not Secure” warning.


Today we’re announcing that Chrome will gradually start ensuring that https:// pages can only load secure https:// subresources. In a series of steps outlined below, we’ll start blocking mixed content (insecure http:// subresources on https:// pages) by default. This change will improve user privacy and security on the web, and present a clearer browser security UX to users.
In the past several years, the web has made great progress in transitioning to HTTPS: Chrome users now spend over 90% of their browsing time on HTTPS on all major platforms. We’re now turning our attention to making sure that HTTPS configurations across the web are secure and up-to-date.
HTTPS pages commonly suffer from a problem called mixed content, where subresources on the page are loaded insecurely over http://. Browsers block many types of mixed content by default, like scripts and iframes, but images, audio, and video are still allowed to load, which threatens users’ privacy and security. For example, an attacker could tamper with a mixed image of a stock chart to mislead investors, or inject a tracking cookie into a mixed resource load. Loading mixed content also leads to a confusing browser security UX, where the page is presented as neither secure nor insecure but somewhere in between.
In a series of steps starting in Chrome 79, Chrome will gradually move to blocking all mixed content by default. To minimize breakage, we will autoupgrade mixed resources to https://, so sites will continue to work if their subresources are already available over https://. Users will be able to enable a setting to opt out of mixed content blocking on particular websites, and below we’ll describe the resources available to developers to help them find and fix mixed content.
Timeline
Instead of blocking all mixed content all at once, we’ll be rolling out this change in a series of steps.
  • In Chrome 79, releasing to stable channel in December 2019, we’ll introduce a new setting to unblock mixed content on specific sites. This setting will apply to mixed scripts, iframes, and other types of content that Chrome currently blocks by default. Users can toggle this setting by clicking the lock icon on any https:// page and clicking Site Settings. This will replace the shield icon that shows up at the right side of the omnibox for unblocking mixed content in previous versions of desktop Chrome.



Accessing Site settings, from which users will be able to unblock mixed content loads in Chrome 79.
  • In Chrome 80, mixed audio and video resources will be autoupgraded to https://, and Chrome will block them by default if they fail to load over https://. Chrome 80 will be released to early release channels in January 2020. Users can unblock affected audio and video resources with the setting described above.
  • Also in Chrome 80, mixed images will still be allowed to load, but they will cause Chrome to show a “Not Secure” chip in the omnibox. We anticipate that this is a clearer security UI for users and that it will motivate websites to migrate their images to HTTPS. Developers can use the upgrade-insecure-requests or block-all-mixed-content Content Security Policy directives to avoid this warning (an example of both directives follows this list).



Omnibox treatment for websites that load mixed images in Chrome 80. 

  • In Chrome 81, mixed images will be autoupgraded to https://, and Chrome will block them by default if they fail to load over https://. Chrome 81 will be released to early release channels in February 2020.
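
As mentioned in the Chrome 80 step above, pages can already opt in to upgrading or blocking their own insecure subresources with Content Security Policy. Illustrative response headers for the two directives named earlier:

Content-Security-Policy: upgrade-insecure-requests
Content-Security-Policy: block-all-mixed-content

The first directive asks the browser to rewrite http:// subresource requests to https:// before fetching them; the second blocks all mixed content outright instead of upgrading it.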
Resources for developers
Developers should migrate their mixed content to https:// immediately to avoid warnings and breakage. Here are some resources:
  • Use Content Security Policy and Lighthouse’s mixed content audit to discover and fix mixed content on your site (see the report-only example after this list).
  • See this guide for general advice on migrating servers to HTTPS.
  • Check with your CDN, web host, or content management system to see if they have special tools for debugging mixed content. For example, Cloudflare offers a tool to rewrite mixed content to https://, and WordPress plugins are available as well.
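
For the Content Security Policy approach in the first item above, a low-risk way to find mixed content without blocking anything is report-only mode. A sketch of such a header (the reporting endpoint is hypothetical):

Content-Security-Policy-Report-Only: default-src https: 'unsafe-inline' 'unsafe-eval'; report-uri https://example.com/csp-reports

Browsers will load the page normally but send a report for each resource that violates the policy, i.e. anything not fetched over https://, which you can aggregate to find pages that still need fixing.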


[Cross-posted from the Chromium blog]

Update (04/06/2020): The removal of legacy TLS versions was originally scheduled for Chrome 81, but is being delayed until at least Chrome 84. Chrome will continue to show the “Not Secure” indicator for sites using TLS 1.0 or 1.1, and Chrome 81 Beta will show the full page interstitial warning for affected sites. Our hope is that this will help alert affected site owners ahead of the delayed removal. Check the Chrome Platform Status entry for the latest information about the removal of TLS 1.0 and 1.1 support.

Last October we announced our plans to remove support for TLS 1.0 and 1.1 in Chrome 81. In this post we’re announcing a pre-removal phase in which we’ll introduce a gentler warning UI, and previewing the UI that we’ll use to block TLS 1.0 and 1.1 in Chrome 81. Site administrators should immediately enable TLS 1.2 or later to avoid these UI treatments.
While legacy TLS usage has decreased, we still see over 0.5% of page loads using these deprecated versions. To ease the transition to the final removal of support and to reduce user surprise when outdated configurations stop working, Chrome will discontinue support in two steps: first, showing new security indicators for sites using these deprecated versions; and second, blocking connections to these sites with a full page warning.
Pre-removal warning
Starting January 13, 2020, for Chrome 79 and higher, we will show a “Not Secure” indicator for sites using TLS 1.0 or 1.1 to alert users to the outdated configuration:
The new security indicator and connection security information that will be shown to users who visit a site using TLS 1.0 or 1.1 starting in January 2020.

When a site uses TLS 1.0 or 1.1, Chrome will downgrade the security indicator and show a more detailed warning message inside Page Info. This change will not block users from visiting or using the page, but will alert them to the downgraded security of the connection.
Note that Chrome already shows warnings in DevTools to alert site owners that they are using a deprecated version of TLS.
Removal UI
In Chrome 81, which will be released to the Stable channel in March 2020, we will begin blocking connections to sites using TLS 1.0 or 1.1, showing a full page interstitial warning:
The full screen interstitial warning that will be shown to users who visit a site using TLS 1.0 or 1.1 starting in Chrome 81. Final warning subject to change.

Site administrators should immediately enable TLS 1.2 or later. Depending on server software (such as Apache or nginx), this may be a configuration change or a software update. Additionally, we encourage all sites to revisit their TLS configuration. In our original announcement, we outlined our current criteria for modern TLS.
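
As an illustrative example (exact directives and supported versions depend on your server and OpenSSL builds), limiting a server to modern TLS might look like the following.

For nginx:

ssl_protocols TLSv1.2 TLSv1.3;

For Apache with mod_ssl:

SSLProtocol -all +TLSv1.2 +TLSv1.3
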
Enterprise deployments can preview the final removal of TLS 1.0 and 1.1 by setting the SSLVersionMin policy to “tls1.2”. This will prevent clients from connecting over these protocol versions. For enterprise deployments that need more time, this same policy can be used to re-enable TLS 1.0 or TLS 1.1 and disable the warning UIs until January 2021.
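
For example, on Linux this could be a managed policy file such as the following (the file name is arbitrary; on Windows the same policy is typically deployed via Group Policy or the registry):

/etc/opt/chrome/policies/managed/tls.json:

{
  "SSLVersionMin": "tls1.2"
}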