A major data leak at Microsoft illustrates the importance of robust security processes: the company has revealed that an employee accidentally exposed an “overly permissive” shared access signature (SAS) token while publishing open-source artificial intelligence (AI) training data. If even the largest multinational tech companies can fall victim to SAS-related exposures, what should individuals do to keep their data secure?
What Happened at Microsoft?
In 2020, while contributing open-source AI learning models to a public GitHub repository, a Microsoft employee shared a URL created with the SAS token feature of Azure, Microsoft’s cloud computing platform, which lets users share data from Azure Storage accounts. The SAS token in the URL pointed to an internal storage account and was configured with excessive privileges, granting access to the entire account rather than only the files meant to be shared.
The account contained 38TB of private data, including a disk backup of workstation profiles for two former employees. The backup included private keys, passwords to Microsoft services, and more than 30,000 internal Microsoft Teams messages from 359 employees.
The security threat was not identified until June 2023, when analysts at cloud security firm Wiz.io found the accidental data exposure during an Internet scan for misconfigured storage containers. Wiz.io worked with Microsoft to revoke the SAS token, prevent external access to the account, and investigate any impact on customers or business continuity. “No customer data was exposed, and no other internal services were put at risk because of this issue,” Microsoft stated.
Beyond exposing the entire account, the token was misconfigured to grant “full control” instead of read-only permissions. This meant an attacker could not only view the private files but also delete or overwrite them. Because the GitHub repository directed users to download AI models from this storage account, an attacker could have injected malicious code into those models and potentially compromised every user who downloaded and ran them.
GitHub’s secret scanning service monitors all public open-source code changes for exposed credentials and other secrets. This includes a Microsoft-provided SAS detection that flags Azure Storage SAS URLs pointing to sensitive content. Since the incident, Microsoft has expanded this detection to include any SAS token that may have overly permissive privileges or expiration times.
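The kind of scan described above can be sketched in a few lines. This is a minimal illustration, not GitHub’s or Microsoft’s actual detector: it assumes a leaked SAS URL looks like an Azure Storage endpoint (`<account>.blob.core.windows.net` or a sibling service) carrying a `sig=` query parameter, which is where the token’s signature lives. The function name `find_sas_urls` and the sample URLs are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed pattern: an Azure Storage endpoint (blob, file, queue, table, or dfs)
# for an account name of 3-24 lowercase letters/digits, followed by a path and query.
# This is an illustration, not GitHub's or Microsoft's actual detection rule.
STORAGE_URL = re.compile(
    r"https://[a-z0-9]{3,24}\.(?:blob|file|queue|table|dfs)\.core\.windows\.net/[^\s\"'<>]*"
)


def find_sas_urls(text: str) -> list[str]:
    """Return Azure Storage URLs in `text` that carry a SAS signature (`sig`)."""
    hits = []
    for match in STORAGE_URL.finditer(text):
        url = match.group(0)
        query = parse_qs(urlparse(url).query)
        if "sig" in query:  # a signature means the URL embeds a usable credential
            hits.append(url)
    return hits


sample = (
    "weights: https://examplestore.blob.core.windows.net/models/ckpt.bin"
    "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlac&se=2051-10-06T00:00:00Z&sig=abc123\n"
    "docs: https://examplestore.blob.core.windows.net/public/readme.txt"
)
print(find_sas_urls(sample))  # only the first URL, which carries a signature
```

Finding a signed URL is only the first step; judging whether it is dangerous means looking at its permissions and expiry, which is what Microsoft’s expanded detection targets.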
What is a SAS Token?
A SAS token is a signed string, typically appended to a URL, that provides limited and time-bound access to specific resources in a cloud storage account. It works much like the shared link you might send a friend to access a file in Microsoft OneDrive or Google Drive. There are three types of SAS tokens: Account, Service, and User Delegation. An Account SAS token was used in Microsoft’s repository.
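The three types share one URL format, but they can often be told apart by their query parameters: an account SAS carries `ss` (signed services) and `srt` (signed resource types), a user delegation SAS carries `skoid` and related `sk*` fields describing the Microsoft Entra ID principal whose key signed it, and a service SAS names a single resource in `sr`. The sketch below applies those rules; `classify_sas` is a hypothetical helper, and the heuristic is a simplification rather than an exhaustive parser.

```python
from urllib.parse import parse_qs, urlparse


def classify_sas(url: str) -> str:
    """Guess a SAS token's type from its query parameters (heuristic sketch)."""
    params = parse_qs(urlparse(url).query)
    if "sig" not in params:
        return "not a SAS URL"
    if "skoid" in params:  # user delegation SAS: identifies the Entra ID principal that signed it
        return "user delegation"
    if "ss" in params and "srt" in params:  # account SAS: services + resource types
        return "account"
    if "sr" in params:  # service SAS: one resource within a single service
        return "service"
    return "unknown"


base = "https://examplestore.blob.core.windows.net/models/ckpt.bin?sv=2020-08-04"
print(classify_sas(base + "&ss=b&srt=sco&sp=rl&sig=x"))             # account
print(classify_sas(base + "&sr=b&sp=r&sig=x"))                      # service
print(classify_sas(base + "&skoid=abc&sktid=def&sr=b&sp=r&sig=x"))  # user delegation
```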
SAS tokens are a Microsoft Azure feature, but other cloud platforms offer similar mechanisms, such as presigned URLs in Amazon Web Services (AWS) and signed URLs in Google Cloud Platform (GCP). These mechanisms grant temporary, controlled access to data or services without exposing the account’s credentials or allowing unrestricted access.
SAS tokens can give users access to a single file, a container, or an entire storage account. Their permissions can be customized (for example, read, write, delete, or list), ranging from read-only to full control. Each token also has a fully customizable expiration time, which limits how long it can be used and reduces the risk of long-term exposure. Tokens can also be revoked before they expire, terminating access to the data.
SAS tokens are often appended to the URL of the file, folder, or other resource they provide access to, eliminating the need for the user to have the account credentials.
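To make that structure concrete, the sketch below splits a SAS URL into the resource it points to and the token fields that travel with it, notably `sp` (signed permissions) and `se` (signed expiry). `inspect_sas_url` and `parse_expiry` are hypothetical helpers; only the most common permission letters are decoded, and rarer ones are reported as-is because their meanings vary between SAS types.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse, urlunparse

# Only the most common permission letters; meanings of rarer letters vary by SAS type.
PERMISSION_NAMES = {"r": "read", "a": "add", "c": "create", "w": "write", "d": "delete", "l": "list"}


def parse_expiry(value: str) -> datetime:
    """Parse an ISO 8601 `se` value such as 2021-10-05 or 2021-10-05T00:00:00Z as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def inspect_sas_url(url: str) -> dict:
    """Split a SAS URL into the bare resource URL and its permissions and expiry."""
    parts = urlparse(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        # The bare URL: without the token, a private resource is not accessible.
        "resource": urlunparse(parts._replace(query="")),
        "permissions": [PERMISSION_NAMES.get(ch, f"other ({ch})") for ch in params.get("sp", "")],
        "expires": parse_expiry(params["se"]) if "se" in params else None,
    }


info = inspect_sas_url(
    "https://examplestore.blob.core.windows.net/models/ckpt.bin"
    "?sv=2020-08-04&sr=b&sp=rl&se=2021-10-05T00:00:00Z&sig=abc123"
)
print(info["resource"])     # https://examplestore.blob.core.windows.net/models/ckpt.bin
print(info["permissions"])  # ['read', 'list']
print(info["expires"])      # 2021-10-05 00:00:00+00:00
```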
To generate a SAS token, you need appropriate permissions on the resource you want to share, plus a service-specific tool that can create the token with the right permissions and duration, such as the Azure SDKs, the Azure portal, or command-line tools.
Benefits and Shortcomings of SAS Tokens
SAS tokens provide a secure mechanism for giving users access to specific data within a storage account, unlike a Shared Key (the storage account key), which grants full access to the entire account. A SAS token can restrict which resources a user can access, which operations they can perform, which IP addresses and protocols a client can connect from, and how long access lasts. This gives the token issuer control and the user flexibility, but it also creates the risk of granting too much access, as happened in Microsoft’s case.
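The network and protocol restrictions are carried in the token itself, as the `sip` (allowed IP address or range) and `spr` (allowed protocols) parameters. The sketch below shows how such restrictions evaluate. It illustrates the rules rather than Azure’s implementation, handles IPv4 addresses only (an assumption), and its helper names are hypothetical.

```python
import ipaddress


def ip_allowed(sip: str, client_ip: str) -> bool:
    """Check a client IPv4 address against a `sip` value: one address or 'start-end'."""
    start_text, _, end_text = sip.partition("-")
    start = ipaddress.IPv4Address(start_text)
    end = ipaddress.IPv4Address(end_text) if end_text else start
    return start <= ipaddress.IPv4Address(client_ip) <= end


def protocol_allowed(spr: str, scheme: str) -> bool:
    """Check a request scheme against an `spr` value such as 'https' or 'https,http'."""
    return scheme.lower() in {p.strip() for p in spr.split(",")}


print(ip_allowed("168.1.5.60-168.1.5.70", "168.1.5.65"))  # True: inside the range
print(ip_allowed("168.1.5.60-168.1.5.70", "10.0.0.8"))    # False: outside the range
print(protocol_allowed("https", "http"))                  # False: HTTPS-only token
```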
SAS tokens can be configured to last “effectively forever,” Wiz.io noted. The token Microsoft shared in the AI GitHub repository was first added in July 2020 with an expiry date of October 5, 2021; on October 6, 2021, the expiry was extended to October 6, 2051.
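The arithmetic behind “effectively forever” is easy to check. This short sketch uses only the dates reported by Wiz.io to compute how long the updated token would have stayed valid, and compares that with a hypothetical seven-day maximum lifetime (an assumed organizational limit, not an Azure default).

```python
from datetime import date, timedelta


def lifetime(issued: date, expiry: date) -> timedelta:
    """How long a token stays usable between two dates."""
    return expiry - issued


MAX_LIFETIME = timedelta(days=7)  # hypothetical organizational limit, not an Azure default

extended = lifetime(date(2021, 10, 6), date(2051, 10, 6))  # the October 2021 update
print(extended.days)            # 10957: 30 years of 365 days plus 7 leap days
print(extended > MAX_LIFETIME)  # True
```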
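The problem is that SAS tokens are created on the client side and are not tracked as Azure resources, so an Azure account administrator has no built-in way to know when a user creates a highly permissive, non-expiring token or where it is circulating. Revoking such a token is also disruptive: an account SAS can only be revoked by rotating the account key that signed it, which invalidates every other token signed with the same key. This makes SAS tokens attractive to attackers looking to exploit unintentional data exposure.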
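The revocation problem follows from how key-signed SAS tokens work: the signature is a Base64-encoded HMAC-SHA256 computed over the token’s fields with the account key. Before it expires, a validly signed token can therefore be stopped only by changing the key or by withdrawing something the token references, such as a stored access policy. The sketch below models both routes. It is a simplification: the real string-to-sign contains many more fields in a fixed, version-specific order, and `issue`, `is_authorized`, and the policy name are hypothetical.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional


def sign(key: bytes, payload: str) -> str:
    """Base64-encoded HMAC-SHA256, the scheme used for key-signed SAS signatures."""
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def _payload(fields: str, policy_id: Optional[str]) -> str:
    # Simplified string-to-sign; the policy ID is signed so it cannot be stripped off.
    return f"{fields}\n{policy_id or ''}"


def issue(key: bytes, fields: str, policy_id: Optional[str] = None) -> dict:
    """Create a hypothetical token, optionally tied to a stored access policy."""
    return {"fields": fields, "policy": policy_id, "sig": sign(key, _payload(fields, policy_id))}


def is_authorized(key: bytes, token: dict, policies: set) -> bool:
    expected = sign(key, _payload(token["fields"], token["policy"]))
    if not hmac.compare_digest(expected, token["sig"]):
        return False  # signed with a different (for example, rotated) key, or tampered with
    # A policy-bound token works only while its stored access policy still exists.
    return token["policy"] is None or token["policy"] in policies


account_key = secrets.token_bytes(64)  # stand-in for a 512-bit storage account key
policies = {"partner-read"}            # hypothetical stored access policy ID

leaked = issue(account_key, "sp=rwdlac;se=2051-10-06")  # ad hoc, like the leaked token
partner = issue(account_key, "sp=r;se=2023-07-01", "partner-read")

policies.discard("partner-read")  # revoking the policy stops only the policy-bound token
print(is_authorized(account_key, leaked, policies), is_authorized(account_key, partner, policies))  # True False

account_key = secrets.token_bytes(64)  # rotating the key is the only way to stop `leaked`
print(is_authorized(account_key, leaked, policies))  # False, and every other old token fails too
```

The design point is that policy-bound tokens can be revoked selectively, while ad hoc tokens share the fate of the key that signed them.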
“A recent Microsoft report indicates that attackers are taking advantage of the service’s lack of monitoring capabilities in order to issue privileged SAS tokens as a backdoor,” Wiz.io’s analysts said.
Best Practices for Using SAS Tokens
Large data breaches can cost individuals their personal resources and safety, and cost businesses millions of dollars in regulatory fines, remediation, and lost customer trust. Because anyone who obtains a SAS token can use it to access your cloud data or services, it is essential to manage tokens carefully. Here are some best practices to follow:
- Secure your network. Before adopting Microsoft Azure, consider how to secure access to the cloud network, such as with network security groups and Azure Firewall.
- Limit permissions. When generating SAS URLs, apply the principle of least privilege and restrict them to only the necessary resources, such as a single file or folder, with just the permissions the user needs to do their job, such as read-only or write-only access.
- Set short expiration times. Always set a relatively short expiration time and have users request new SAS tokens as needed.
- Create dedicated storage accounts for external sharing. Keeping resources that will be shared with external users in separate accounts limits the potential impact of an overly permissive token or security breach.
- Exercise caution. Treat tokens as sensitive data and share them only with users who require access to a particular storage account.
- Avoid using account SAS for external sharing. The lack of security and governance over account SAS tokens means they should be considered as sensitive as an account key and not shared externally.
- Establish a strategy to revoke tokens. Tie service SAS tokens to a stored access policy, and be prepared to revoke access by changing or deleting the policy if tokens become compromised.
- Monitor and audit applications. Track activity and check how requests to your storage account are authenticated. Set an expiration policy to identify users with long-lasting SAS URLs.
- Train users or employees. Educate users on the importance of keeping SAS URLs secure to limit who has access to them.
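Several of these practices can be checked mechanically before a URL is shared. The reviewer below is a sketch under assumed organizational rules (read and list permissions only, HTTPS only, at most eight hours of remaining lifetime, and no account SAS); the limits and the `review_sas_url` name are hypothetical, not Azure defaults.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

# Hypothetical organizational rules for externally shared links, not Azure defaults.
ALLOWED_PERMISSIONS = set("rl")     # read and list only
MAX_REMAINING = timedelta(hours=8)  # short-lived links


def review_sas_url(url: str, now: datetime) -> list[str]:
    """Return policy violations for a SAS URL; an empty list means it passes."""
    params = {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
    problems = []
    if "ss" in params and "srt" in params:
        problems.append("account SAS: do not share externally")
    extra = set(params.get("sp", "")) - ALLOWED_PERMISSIONS
    if extra:
        problems.append("extra permissions: " + "".join(sorted(extra)))
    if "se" not in params:
        problems.append("no expiry in URL (it may come from a stored access policy)")
    else:
        expiry = datetime.fromisoformat(params["se"].replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        if expiry - now > MAX_REMAINING:
            problems.append("expiry too far in the future")
    if params.get("spr") != "https":
        problems.append("HTTPS-only access not enforced")
    return problems


now = datetime(2023, 6, 22, tzinfo=timezone.utc)
risky = ("https://examplestore.blob.core.windows.net/?sv=2020-08-04&ss=bfqt&srt=sco"
         "&sp=rwdlacup&se=2051-10-06T00:00:00Z&spr=https,http&sig=abc123")
scoped = ("https://examplestore.blob.core.windows.net/shared/data.csv?sv=2020-08-04&sr=b"
          "&sp=r&se=2023-06-22T04:00:00Z&spr=https&sig=abc123")
for problem in review_sas_url(risky, now):
    print(problem)
print(review_sas_url(scoped, now))  # []
```

A check like this complements, rather than replaces, training and monitoring: it catches risky links at the moment they are created, while monitoring catches the ones that slip through.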
Following best practices to create and handle SAS tokens appropriately can help minimize the risk of unintended access or abuse.
In the wake of the breach, Microsoft said it will improve its detection and scanning tools to proactively identify over-provisioned SAS URLs and enhance its default security posture.
Microsoft’s security breach highlights the risks of oversharing data and of supply chain attacks. Because the overly permissive SAS token granted full write access to the storage account, an attacker could have planted malicious code in it to attack other researchers using the GitHub repository, and the damage could have spread further if that code reached the wider public.
These threats will only grow as more researchers and companies share large datasets for training AI models. Security teams must set clear guidelines for external data sharing that follow best practices for SAS tokens and other forms of cloud access, such as limiting permissions and siloing shared AI data.
“This case is an example of the new risks organizations face when starting to leverage the power of AI more broadly,” Wiz.io’s analysts stated. “As data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards.”