Hunting for Bucket Traversals in Google's Client Libraries
Preface
This writeup picks up roughly where the last one ended: I had found an exploitable instance of a bucket traversal vulnerability and stumbled on an N-day in the Go Cloud Storage client library.
Intrigued by this finding, I decided to audit other Cloud Storage client libraries, focusing solely on variants of similar issues.
Bucket Traversal 101
So what is a Bucket Traversal to begin with?
The above definition is not wrong but it doesn’t mention the vital Cloud IAM prerequisite, which also has to be met:
The Service Account (non-human identity) attached to the targeted workload has to be granted IAM permissions to read/write/edit bucket(s) other than the intended one.
For the purpose of this writeup, I will differentiate between two subsets of bucket traversal issues:
- Application level, i.e. faulty business logic, lack of input validation etc.
- Library level, i.e. security bugs in opinionated SDKs, often maintained by Cloud Service Providers
There is little I could add about the practices that prevent issues from class #1 (AppSec 101).
Therefore, I will focus primarily on class #2, that is, cases where the implementation of the vendor-maintained library is itself vulnerable.
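For completeness, here is a minimal sketch of the kind of application-level check that class #1 calls for. The function name `is_safe_object_name` and the policy it enforces are my own assumptions for illustration, not something taken from any Google library; a real application may need a stricter or looser rule.

```python
from urllib.parse import unquote


def is_safe_object_name(name: str) -> bool:
    """Hypothetical allow-check for user-supplied object names.

    Rejects empty names, backslashes, and any empty, "." or ".."
    path segment, both in the raw value and after one round of
    percent-decoding.
    """
    for candidate in (name, unquote(name)):
        if not candidate or "\\" in candidate:
            return False
        if any(segment in ("", ".", "..") for segment in candidate.split("/")):
            return False
    return True


if __name__ == "__main__":
    for sample in ("uploads/report.pdf", "../other-bucket/object", "uploads/%2e%2e/x"):
        print(sample, is_safe_object_name(sample))
```

A check like this only helps when the application remembers to call it, which is exactly why a bug inside the library is the more interesting case.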
Case study
TL;DR
The method google.cloud.storage.transfer_manager.upload_chunks_concurrently() was vulnerable to a variant of a path (bucket) traversal.
Timeline:
- Faulty function was introduced in version 2.11.0 on September 19th 2023
- I submitted this issue to Google VRP on July 14th 2024
- The vulnerability was fixed in version 2.18.1 on August 6th 2024
Here’s the recently disclosed report -> https://bughunters.google.com/reports/vrp/h1K5SciPh
You might wonder why no GitHub Security Advisory (GHSA) and/or CVE has been published.
Well, that’s a topic for a separate discussion - I was told that at least a post factum comment will eventually be added.
Overview
Python Client for Google Cloud Storage is an Open Source project maintained by Google.
This library is used in many foundational Python-based ML/AI Open Source projects such as:
The relatively modest number of stars on GitHub does not properly reflect its significance.
Technical analysis
Google Cloud Storage exposes three distinct APIs:
I decided to focus on the XML API due to its subjectively error-prone schema and its interoperability with Amazon Simple Storage Service (Amazon S3).
After some brief source code review & grey box testing, I pinpointed a spot where a traversal could occur:
https://github.com/googleapis/python-storage/blob/d5d3c68a6e5c6f8cefc59892c1ccceaf181ff32d/google/cloud/storage/transfer_manager.py#L1084-L1087
The issue stemmed from the fact that the URL path was constructed insecurely, without context-specific encoding:
url = "{hostname}/{bucket}/{blob}".format(
    hostname=hostname, bucket=bucket.name, blob=blob.name
)
As a result, if blob.name was supplied from user input, an attacker could use the classic dot-dot-slash technique, e.g. ../bucket/object, to upload a file to a bucket the victim never intended.
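To make the mechanics concrete, here is a minimal, standard-library-only sketch. It assumes that some layer between the caller and the storage backend collapses dot segments in the URL path (approximated here with `posixpath.normpath`). The helper names and bucket names are hypothetical, and the encoded variant illustrates context-specific encoding in general, not necessarily the exact fix shipped in 2.18.1.

```python
import posixpath
from urllib.parse import quote, urlsplit

HOSTNAME = "https://storage.googleapis.com"


def build_url_unsafe(bucket_name: str, blob_name: str) -> str:
    # Same pattern as the vulnerable snippet: no encoding of the blob name.
    return "{hostname}/{bucket}/{blob}".format(
        hostname=HOSTNAME, bucket=bucket_name, blob=blob_name
    )


def build_url_encoded(bucket_name: str, blob_name: str) -> str:
    # Percent-encode the blob name, including "/" (safe=""), so it stays
    # a single path segment.
    return "{hostname}/{bucket}/{blob}".format(
        hostname=HOSTNAME, bucket=bucket_name, blob=quote(blob_name, safe="")
    )


def effective_path(url: str) -> str:
    # Approximation of dot-segment removal applied to the URL path.
    return posixpath.normpath(urlsplit(url).path)


if __name__ == "__main__":
    name = "../other-bucket/object"
    print(effective_path(build_url_unsafe("intended-bucket", name)))
    print(effective_path(build_url_encoded("intended-bucket", name)))
```

In the unsafe variant the normalized path no longer contains the intended bucket at all, while in the encoded variant the slashes become `%2F` and the name stays inside the intended bucket. Note that `quote` does not encode `.` characters, so a name consisting solely of `..` would still need to be rejected separately.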
PoC
Here’s the original PoC recording, based on the official sample snippet
Attack scenario
Depending on the IAM permissions granted to the underlying Service Account this could lead to malicious scenarios such as:
- overwriting existing files (data & integrity loss)
- upload of an object later consumed by the application (config override, XSS etc.)
The potential impact associated with vector #1 is self-evident.
I think that scenario #2 is far more interesting.
Diagram of a sample vulnerable application
I prepared a diagram of a sample vulnerable application, GigaUpload, to better convey the idea.
This fictitious service meets the following criteria:
- large file upload implemented using google.cloud.storage.transfer_manager.upload_chunks_concurrently()
- client side behaviour (features, flags etc.) managed via config files fetched from a dedicated GCS bucket
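The scenario can be simulated without touching any real cloud resources. The sketch below is an in-memory model only: the bucket names, the `flags.json` object, and the `simulated_upload` helper are all hypothetical, and the path handling merely mimics an unencoded `/{bucket}/{blob}` path whose dot segments get collapsed somewhere along the way.

```python
import posixpath

# Hypothetical in-memory stand-ins for two GCS buckets that the same
# Service Account is allowed to write to.
buckets = {
    "gigaupload-uploads": {},
    "gigaupload-config": {"flags.json": b'{"banner": "Welcome"}'},
}


def simulated_upload(bucket_name: str, blob_name: str, data: bytes):
    """Store data at the location an unencoded, normalized path resolves to."""
    path = posixpath.normpath("/{}/{}".format(bucket_name, blob_name))
    target_bucket, _, target_object = path.lstrip("/").partition("/")
    buckets[target_bucket][target_object] = data
    return target_bucket, target_object


if __name__ == "__main__":
    payload = b'{"banner": "<script>alert(1)</script>"}'
    # The application believes it is writing to the uploads bucket.
    print(simulated_upload("gigaupload-uploads", "../gigaupload-config/flags.json", payload))
    print(buckets["gigaupload-config"]["flags.json"])
```

In this model the upload never lands in the uploads bucket. Instead it silently replaces the config object that clients later fetch, which is the config override / XSS vector described above.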
Summary
Bucket traversal appears to be an underresearched class of vulnerabilities, requiring significant context-specific knowledge for comprehensive understanding.
It exists at the intersection of traditional Application Security (AppSec) and Cloud Security, underscoring the critical need to integrate these two domains.