Downloading files from Google Cloud Storage using Python

Last updated: Aug 10, 2023
Tags: Cloud Computing

Prerequisites

To follow along with this guide, please make sure to have:

  • created a GCP (Google Cloud Platform) project

  • created a service account and downloaded the private key (JSON file) for authentication

  • installed the Python client library for Google Cloud Storage (GCS):

    pip install --upgrade google-cloud-storage

If you haven't, then please check out my detailed guide first!

Downloading a single file from Google Cloud Storage using Python

Suppose we have a text file called uploaded_sample.txt that lives in the bucket example-bucket-skytowner on Google Cloud Storage (GCS).

To download this file from GCS, use the download_to_filename(~) method:

from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
bucket = storage.Bucket(client, 'example-bucket-skytowner')
blob = bucket.blob('uploaded_sample.txt')
blob.download_to_filename('./downloaded_file')

Note the following:

  • the credential JSON file for the service account resides in the same directory as this Python script

  • example-bucket-skytowner is the name of the bucket in which the file resides

  • uploaded_sample.txt is the name of the file on GCS that you wish to download

  • the download_to_filename(~) method takes as its argument the local path to which the file should be downloaded

After running this code, we should see a file called downloaded_file in the same directory as this Python script.

Referencing blob and bucket name

We can reference the names of our file and bucket using the name property:

bucket = storage.Bucket(client, 'example-bucket-skytowner')
blob = bucket.blob('uploaded_sample.txt')
print(f'Bucket name: {bucket.name}')
print(f'Blob name: {blob.name}')
blob.download_to_filename(f'{bucket.name}_{blob.name}')
Bucket name: example-bucket-skytowner
Blob name: uploaded_sample.txt

The name property is oftentimes quite handy when organizing where the files should be locally downloaded to. We will see examples of this later in this guide.
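As a minimal sketch of how the two names can be combined into a local path, consider the helper below. The function local_path_for and the default root folder downloads are hypothetical names introduced for illustration; they are not part of the GCS client library.

```python
from pathlib import Path


def local_path_for(bucket_name: str, blob_name: str, root: str = "downloads") -> Path:
    """Build a local path of the form <root>/<bucket_name>/<blob_name>."""
    # Blob names use '/' as a separator regardless of operating system,
    # so split on it and let pathlib join the parts for the local platform.
    parts = [part for part in blob_name.split("/") if part]
    return Path(root).joinpath(bucket_name, *parts)


print(local_path_for("example-bucket-skytowner", "uploaded_sample.txt"))
```

We could then pass str(local_path_for(bucket.name, blob.name)) to download_to_filename(~). Keep in mind that the helper only builds the path; it does not create any folders.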

Downloading to a directory using relative path

The download_to_filename(~) method raises an error if the directory part of the local path does not exist. For instance, suppose we wanted to download a file into a local downloads directory, which currently does not exist:

blob.download_to_filename(f'./downloads/{blob.name}')
FileNotFoundError: [Errno 2] No such file or directory: './downloads/uploaded_sample.txt'

The way to get around this is to create the folders using the mkdir(~) method of Path from the built-in pathlib module before we call the download_to_filename(~) method:

from pathlib import Path
path_folder = f'./downloads/{bucket.name}'
# Create this folder locally if it does not exist
# parents=True will create intermediate directories if they do not exist
Path(path_folder).mkdir(parents=True, exist_ok=True)
blob = bucket.blob('uploaded_sample.txt')
blob.download_to_filename(f'{path_folder}/{blob.name}')

When we run this code, the directory downloads/example-bucket-skytowner is created if it does not exist yet, and the file is downloaded into this directory. The final local path of the downloaded file is therefore:

./downloads/example-bucket-skytowner/uploaded_sample.txt
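The two steps above (create the folder, then download) can be wrapped into one small helper. This is only a sketch: download_to_path is a hypothetical name, and the blob argument is assumed to be a Blob object from the GCS client library (any object with a download_to_filename(~) method would work).

```python
from pathlib import Path


def download_to_path(blob, local_path):
    """Download `blob` to `local_path`, creating parent folders first."""
    destination = Path(local_path)
    # Create every missing parent directory; do nothing if they already exist
    destination.parent.mkdir(parents=True, exist_ok=True)
    blob.download_to_filename(str(destination))
    return destination
```

For example, download_to_path(blob, f'./downloads/{bucket.name}/{blob.name}') would reproduce the behavior of the code above without a separate mkdir(~) call.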

Handling error in case of file not found

Trying to download files that do not exist in GCS will throw a 404 NotFound error:

blob = bucket.blob('.some_non_existing_file')
blob.download_to_filename('./downloaded_file')
NotFound: 404 GET https://storage.googleapis.com/download/storage/v1/b/example-bucket-skytowner/o/.some_non_existing_file?alt=media:
No such object: example-bucket-skytowner/.some_non_existing_file:
('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)

To account for this case, we can wrap our methods in a try-except clause:

from google.cloud.exceptions import NotFound

try:
    blob = bucket.blob('.some_non_existing_file')
    blob.download_to_filename('./downloaded_file')
except NotFound:
    print(f'🚨 {blob.name} does not exist - do something')
    # Handle this case
🚨 .some_non_existing_file does not exist - do something

Note that we had to import the NotFound error from google.cloud.exceptions.
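An alternative to catching the exception is to check whether the blob exists before downloading it, using the exists() method of the blob. The sketch below assumes blob is a Blob object from the client library; download_if_exists is a hypothetical helper name.

```python
def download_if_exists(blob, local_path):
    """Download `blob` only if it exists on GCS; return True on success."""
    # blob.exists() sends a separate request to GCS, so this costs one
    # extra round trip compared with catching NotFound.
    if not blob.exists():
        print(f"{blob.name} does not exist - skipping")
        return False
    blob.download_to_filename(local_path)
    return True
```

Since the file could be deleted between the check and the download, the try-except approach remains the more robust of the two; the check is mainly useful when we want to branch early.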

Downloading multiple files from Google Cloud Storage

The download_to_filename(~) method downloads one file per call. Therefore, to download multiple files from GCS, we call it once for each file in a loop.

The following code block extends the case of downloading a single file:

from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
bucket = storage.Bucket(client, 'example-bucket-skytowner')

list_files_to_download = ['uploaded_sample.txt', 'cat.png']
for file_to_download in list_files_to_download:
    blob = bucket.blob(file_to_download)
    blob.download_to_filename(f'./{blob.name}')

After running this code, we should see the files uploaded_sample.txt and cat.png downloaded in the same directory as this Python file.
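When there are many files, the loop can be run concurrently with a thread pool from Python's standard library. This is a generic sketch rather than a feature of the GCS client: download_many is a hypothetical name, and we assume that sharing one client across threads is acceptable for your version of the library. Recent versions of the library also ship a transfer_manager module for bulk transfers; check its documentation for the exact signatures.

```python
from concurrent.futures import ThreadPoolExecutor


def download_many(bucket, blob_names, max_workers=4):
    """Download several blobs concurrently into the current directory.

    Returns a dict mapping each blob name to None on success, or to the
    exception raised while downloading it.
    """
    def _download(blob_name):
        try:
            bucket.blob(blob_name).download_to_filename(f"./{blob_name}")
            return blob_name, None
        except Exception as error:  # e.g. NotFound for a missing file
            return blob_name, error

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return dict(executor.map(_download, blob_names))
```

Calling download_many(bucket, ['uploaded_sample.txt', 'cat.png']) would download the same two files as the loop above, and a failed download does not stop the others.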

Downloading a folder from Google Cloud Storage

Suppose we have the following two files under a folder called my_folder on GCS:

📁 my_folder
├─ cat.png
├─ uploaded_sample.txt

To download all files inside the folder my_folder:

from pathlib import Path
from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
bucket = storage.Bucket(client, 'example-bucket-skytowner')

str_folder_name_on_gcs = 'my_folder/'

# Create the directory locally
Path(str_folder_name_on_gcs).mkdir(parents=True, exist_ok=True)

blobs = bucket.list_blobs(prefix=str_folder_name_on_gcs)
for blob in blobs:
    if not blob.name.endswith('/'):
        # This blob is not a directory!
        print(f'Downloading file [{blob.name}]')
        blob.download_to_filename(f'./{blob.name}')
Downloading file [my_folder/cat.png]
Downloading file [my_folder/uploaded_sample.txt]

After running this code, we should see a new my_folder folder containing the two files in our current directory:

├─ script.py
📁 my_folder
├─ cat.png
├─ uploaded_sample.txt

Now, let's explain how our code works:

  • the list_blobs(~) method accepts a prefix argument, which restricts the results to blobs whose names start with that prefix.

  • in our case, we are fetching blobs whose names begin with 'my_folder/'. GCS has no real directories, but a folder created through the console is stored as an empty placeholder blob named my_folder/, which is also returned by list_blobs(~). Since we do not want to download such placeholder blobs, we skip any blob whose name ends with the '/' character.

  • because the blob name is my_folder/cat.png, the local path ./my_folder/cat.png places cat.png inside the folder my_folder. We must make sure that this folder exists beforehand by using Path from the built-in pathlib module; otherwise, a FileNotFoundError will be raised.
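The code above creates only the top-level my_folder directory, so it would fail if my_folder contained subfolders. A minimal sketch that handles nested folders is shown below. Here download_prefix and destination_root are hypothetical names, and blobs is assumed to be an iterable of Blob objects such as the result of bucket.list_blobs(prefix='my_folder/').

```python
from pathlib import Path


def download_prefix(blobs, destination_root="."):
    """Download every blob in `blobs`, recreating the folder structure locally."""
    downloaded = []
    for blob in blobs:
        if blob.name.endswith("/"):
            continue  # placeholder blob representing a folder, not a file
        local_path = Path(destination_root).joinpath(*blob.name.split("/"))
        # Create the blob's own parent folder, however deeply nested it is
        local_path.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(local_path))
        downloaded.append(local_path)
    return downloaded
```

Because the parent folder is created per blob, a file such as my_folder/images/cat.png would end up in ./my_folder/images/ without any extra setup.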

Downloading the content of files in memory

Instead of downloading an actual file to a local path, suppose we wanted to store the content of the file in a variable. For instance, let's read the content of a text file on GCS called uploaded_sample.txt in memory using the download_as_string() method:

from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
# The name of our bucket
bucket = storage.Bucket(client, 'example-bucket-skytowner')
# The name of the file on GCS
blob = bucket.blob('uploaded_sample.txt')

byte_str_file_content = blob.download_as_string()
str_file_content = byte_str_file_content.decode('utf-8')
print(str_file_content)
This is some sample text.
Hello World.

Note the following:

  • despite its name, the download_as_string(~) method returns a byte string (a bytes object), not a standard string

  • we use the decode('utf-8') method to convert the byte string into a standard string

  • the content of our text file ('uploaded_sample.txt') is printed in the output
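Newer releases of the client library mark download_as_string(~) as deprecated in favor of download_as_bytes(~), which returns the same byte string. The sketch below wraps that call in a hypothetical helper called read_blob_text; the blob argument is assumed to be a Blob object from the client library.

```python
def read_blob_text(blob, encoding="utf-8"):
    """Return the content of `blob` as a str, without writing a local file."""
    # download_as_bytes() returns the raw bytes of the file
    raw = blob.download_as_bytes()
    return raw.decode(encoding)
```

The library also provides download_as_text(~), which performs the decoding step for us, so read_blob_text(blob) and blob.download_as_text() should give the same result for a UTF-8 text file.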

Published by Isshin Inada