Getting list of file names in bucket in Google Cloud Storage using Python
Prerequisites
To follow along with this guide, please make sure to have:
created a GCP (Google Cloud Platform) project
created a service account and downloaded the private key (JSON file) for authentication
installed the Python client library for Google Cloud Storage (GCS):
pip install --upgrade google-cloud-storage
If you haven't, then please check out my detailed guide first!
Getting list of file names in Google Cloud Storage bucket
Suppose we have the following two files on Google Cloud Storage (GCS):
├─ cat.png
├─ uploaded_sample.txt
To get the list of file names in a certain bucket, use the list_blobs(~) method, which returns an iterator over the blobs (files) in the bucket:
from google.cloud import storage

# Authenticate ourselves using the service account private key
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)

# Iterate over every blob in the bucket and print its file name
blobs = client.list_blobs('example-bucket-skytowner')
for blob in blobs:
    print(blob.name)
uploaded_sample.txt
cat.png
Here, note the following:
our private key (JSON file) resides in the same directory as this Python script.
our bucket is called example-bucket-skytowner.
each blob has the name property that represents the file name.
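If we want the file names as a Python list rather than printed output, we can collect them from the iterator. Below is a minimal sketch: list_file_names and FakeClient are hypothetical names of ours, and FakeClient only mimics the list_blobs(~) method so that the sketch runs without GCS credentials.

```python
from types import SimpleNamespace

def list_file_names(client, bucket_name):
    """Collect the name of every blob in the bucket into a plain Python list."""
    # list_blobs(~) yields blobs lazily; the list comprehension consumes the iterator
    return [blob.name for blob in client.list_blobs(bucket_name)]

# Hypothetical stand-in for storage.Client (an assumption for illustration only)
class FakeClient:
    def list_blobs(self, bucket_name):
        return iter([
            SimpleNamespace(name='cat.png'),
            SimpleNamespace(name='uploaded_sample.txt'),
        ])

names = list_file_names(FakeClient(), 'example-bucket-skytowner')
print(names)
```

With a real bucket, we would pass in the client created by storage.Client.from_service_account_json(~) instead of FakeClient().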
Getting list of file names under a specific folder
Suppose we have a folder called my_folder that contains the following two files in GCS:
📁 my_folder
   ├─ cat.png
   ├─ uploaded_sample.txt
To fetch the list of file names under my_folder, use the list_blobs(~) method with the prefix argument:
from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
bucket = storage.Bucket(client, 'example-bucket-skytowner')

# Only fetch blobs whose names start with the folder prefix
str_folder_name_on_gcs = 'my_folder/'
blobs = bucket.list_blobs(prefix=str_folder_name_on_gcs)
for blob in blobs:
    print(blob.name)
my_folder/
my_folder/cat.png
my_folder/uploaded_sample.txt
Notice how the first blob represents a folder, while the latter two are the files. By setting the prefix argument to 'my_folder/', we fetch all the blobs whose names begin with 'my_folder/', which includes the directory blob. Since directories are characterized by a trailing '/', we can filter them out as we iterate by skipping every blob whose name ends with '/':
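A minimal sketch of this filter is shown below. The file_names helper is a hypothetical name of ours, and the stand-in blobs only mimic the name property of real blobs so that the sketch runs without GCS credentials.

```python
from types import SimpleNamespace

def file_names(blobs):
    """Return the names of blobs that are files, skipping directory blobs."""
    # A directory blob has a name ending in '/', so we leave those out
    return [blob.name for blob in blobs if not blob.name.endswith('/')]

# Stand-in objects (an assumption for illustration) mimicking what
# bucket.list_blobs(prefix='my_folder/') yields; only `name` is used here
fake_blobs = [
    SimpleNamespace(name='my_folder/'),
    SimpleNamespace(name='my_folder/cat.png'),
    SimpleNamespace(name='my_folder/uploaded_sample.txt'),
]

for name in file_names(fake_blobs):
    print(name)
```

With a real bucket, we would pass bucket.list_blobs(prefix='my_folder/') to file_names(~) instead of fake_blobs. The directory blob is dropped, leaving only the two files: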
my_folder/cat.png
my_folder/uploaded_sample.txt