Downloading files from Google Cloud Storage using Python

Last updated: Jul 1, 2022
Tags: Cloud Computing

Prerequisites

To follow along with this guide, please make sure to have:

  • created a GCP (Google Cloud Platform) project

  • created a service account and downloaded the private key (JSON file) for authentication

  • installed the Python client library for Google Cloud Storage (GCS):

    pip install --upgrade google-cloud-storage

If you haven't, then please check out my detailed guide first!
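
As a quick sanity check that authentication is set up correctly, you can list the buckets in your project before going further. This is a minimal sketch; the key file name here is just a placeholder:

from google.cloud import storage

# Authenticate using the service account's private key (placeholder file name)
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)

# If authentication works, this prints the name of every bucket in the project
for bucket in client.list_buckets():
    print(bucket.name)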

Downloading a single file from Google Cloud Storage using Python

Suppose we have a text file called uploaded_sample.txt that lives in the bucket example-bucket-skytowner on Google Cloud Storage (GCS).

To download this file from GCS, use the download_to_filename(~) method:

from google.cloud import storage

# Authenticate using the service account's private key (JSON file)
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)

# Reference the bucket and the file (blob) that we wish to download
bucket = storage.Bucket(client, 'example-bucket-skytowner')
blob = bucket.blob('uploaded_sample.txt')
# Download the blob to the given local path
blob.download_to_filename('./downloaded_file')

Note the following:

  • the credential JSON file for the service account resides in the same directory as this Python script

  • example-bucket-skytowner is the name of the bucket in which the file resides

  • uploaded_sample.txt is the name of the file on GCS that you wish to download

  • the download_to_filename(~) method takes as argument the local path to which the file should be downloaded.

After running this code, we should see a file called downloaded_file in the same directory as this Python script.

Referencing blob and bucket names

We can reference the names of our file and bucket using the name property:

bucket = storage.Bucket(client, 'example-bucket-skytowner')
blob = bucket.blob('uploaded_sample.txt')
print(f'Bucket name: {bucket.name}')
print(f'Blob name: {blob.name}')
blob.download_to_filename(f'{bucket.name}_{blob.name}')
Bucket name: example-bucket-skytowner
Blob name: uploaded_sample.txt

The name property often comes in handy when deciding where files should be downloaded locally. We will see examples of this later in this guide.

Downloading to a directory using relative path

The download_to_filename(~) method will throw an error if we supply a local path that does not exist. For instance, suppose we wanted to download a file into a local downloads directory, which currently does not exist:

blob.download_to_filename(f'./downloads/{blob.name}')
FileNotFoundError: [Errno 2] No such file or directory: './downloads/uploaded_sample.txt'

The way to get around this is to create the folders using the mkdir(~) method from the standard library's pathlib module before we call the download_to_filename(~) method:

from pathlib import Path
path_folder = f'./downloads/{bucket.name}'
# Create this folder locally if it does not exist
# parents=True will create intermediate directories if they do not exist
Path(path_folder).mkdir(parents=True, exist_ok=True)
blob = bucket.blob('uploaded_sample.txt')
blob.download_to_filename(f'{path_folder}/{blob.name}')

When running this code, the directory downloads/example-bucket-skytowner will be created if it does not exist yet, and the file will be downloaded into this directory. The final local path of the downloaded file would therefore be:

./downloads/example-bucket-skytowner/uploaded_sample.txt
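
Equivalently, if you prefer the os module over pathlib, os.makedirs(~) with exist_ok=True achieves the same result. A small sketch, reusing the client and blob from above:

import os

# Create the local folder (and any intermediate folders) if missing
path_folder = f'./downloads/{bucket.name}'
os.makedirs(path_folder, exist_ok=True)
blob.download_to_filename(f'{path_folder}/{blob.name}')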

Handling errors when a file is not found

Trying to download files that do not exist in GCS will throw a 404 NotFound error:

blob = bucket.blob('.some_non_existing_file')
blob.download_to_filename('./downloaded_file')
NotFound: 404 GET https://storage.googleapis.com/download/storage/v1/b/example-bucket-skytowner/o/.some_non_existing_file?alt=media:
No such object: example-bucket-skytowner/.some_non_existing_file:
('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)

To account for this case, we can wrap our methods in a try-except clause:

from google.cloud.exceptions import NotFound

try:
    blob = bucket.blob('.some_non_existing_file')
    blob.download_to_filename('./downloaded_file')
except NotFound:
    print(f'🚨 {blob.name} does not exist - do something')
    # Handle this case
🚨 .some_non_existing_file does not exist - do something

Note that we had to import the NotFound error from google.cloud.exceptions.
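
Alternatively, you can check whether the blob exists up front using the exists(~) method, at the cost of one extra API call. A sketch:

blob = bucket.blob('.some_non_existing_file')
# exists(~) issues a metadata request to GCS and returns a boolean
if blob.exists():
    blob.download_to_filename('./downloaded_file')
else:
    print(f'{blob.name} does not exist')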

Downloading multiple files from Google Cloud Storage

GCS does not offer a single API call that downloads multiple files at once. Therefore, we must call the download_to_filename(~) method once per file to download multiple files from GCS.

The following code block extends the case of downloading a single file:

from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
bucket = storage.Bucket(client, 'example-bucket-skytowner')

# Download each file in turn into the current directory
list_files_to_download = ['uploaded_sample.txt', 'cat.png']
for file_to_download in list_files_to_download:
    blob = bucket.blob(file_to_download)
    blob.download_to_filename(f'./{blob.name}')

After running this code, we should see the files uploaded_sample.txt and cat.png downloaded into the same directory as this Python file.
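
For larger batches, recent versions of the client library also ship a transfer_manager module that performs these downloads concurrently with worker threads or processes. A sketch, assuming a sufficiently recent library version and reusing the bucket from above:

from google.cloud.storage import transfer_manager

blob_names = ['uploaded_sample.txt', 'cat.png']
# Download the blobs concurrently into the current directory
results = transfer_manager.download_many_to_path(
    bucket, blob_names, destination_directory='./', max_workers=4
)
# Each result is None on success, or the raised exception on failure
for name, result in zip(blob_names, results):
    if result is not None:
        print(f'Failed to download {name}: {result}')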

Downloading a folder from Google Cloud Storage

Suppose we have the following two files under a folder called my_folder on GCS:

📁 my_folder
├─ cat.png
├─ uploaded_sample.txt

To download all files inside the folder my_folder:

from pathlib import Path
from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
bucket = storage.Bucket(client, 'example-bucket-skytowner')

str_folder_name_on_gcs = 'my_folder/'

# Create the directory locally if it does not exist
Path(str_folder_name_on_gcs).mkdir(parents=True, exist_ok=True)

# Fetch all blobs whose names begin with 'my_folder/'
blobs = bucket.list_blobs(prefix=str_folder_name_on_gcs)
for blob in blobs:
    if not blob.name.endswith('/'):
        # This blob is not a directory!
        print(f'Downloading file [{blob.name}]')
        blob.download_to_filename(f'./{blob.name}')
Downloading file [my_folder/cat.png]
Downloading file [my_folder/uploaded_sample.txt]

After running this code, we should see a new my_folder folder containing the two files in our current directory:

├─ script.py
├─ 📁 my_folder
│  ├─ cat.png
│  └─ uploaded_sample.txt

Now, let's explain how our code works:

  • the list_blobs(~) method takes in a prefix argument, which fetches all blobs whose names begin with that prefix.

  • in our case, we are fetching blobs whose names begin with 'my_folder/'. Unfortunately, my_folder/, which represents a directory in GCS, is also fetched as a blob. Since we do not want to download directory blobs, we filter them out by ignoring names that end with the '/' character.

  • even though the file name is my_folder/cat.png, the download_to_filename(~) method will place cat.png inside the folder my_folder. We must make sure that this folder exists locally using the pathlib module - otherwise a FileNotFoundError will occur. For folders containing nested subfolders, see the sketch after this list.
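
If the GCS folder contains nested subfolders, we can generalize the code above by creating each blob's parent directory on the fly instead of creating one folder up front. A sketch, reusing the bucket from above:

from pathlib import Path

blobs = bucket.list_blobs(prefix='my_folder/')
for blob in blobs:
    if not blob.name.endswith('/'):
        # Recreate the blob's folder structure locally before downloading
        Path(blob.name).parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(blob.name)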

Downloading the content of files in memory

Instead of downloading an actual file to a local path, suppose we wanted to store the content of the file in a variable. For instance, let's read the content of a text file on GCS called uploaded_sample.txt into memory using the download_as_string(~) method:

from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
# The name of our bucket
bucket = storage.Bucket(client, 'example-bucket-skytowner')
# The name of the file on GCS
blob = bucket.blob('uploaded_sample.txt')

byte_str_file_content = blob.download_as_string()
str_file_content = byte_str_file_content.decode('utf-8')
print(str_file_content)
This is some sample text.
Hello World.

Note the following:

  • the download_as_string(~) method returns a byte string. Note that recent versions of the library deprecate download_as_string(~) in favor of the equivalent download_as_bytes(~).

  • we use the decode('utf-8') method to convert the byte string into a standard string

  • the content of our text file ('uploaded_sample.txt') is printed in the output
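
The same pattern works for binary files. A sketch, assuming a file named cat.png exists in the bucket: the raw bytes can be wrapped in an in-memory buffer and handed to downstream libraries such as PIL or pandas.

from io import BytesIO

# Download the raw bytes of a binary file (cat.png is assumed to exist)
blob = bucket.blob('cat.png')
byte_content = blob.download_as_bytes()

# Wrap the bytes in a file-like object for downstream libraries
buffer = BytesIO(byte_content)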

Published by Isshin Inada