Writing a Pandas DataFrame to Google Cloud Storage in Python

Last updated: Aug 10, 2023
Tags: Cloud Computing

Prerequisites

To follow along with this guide, please make sure to have:

  • created a service account and downloaded the private key (JSON file) for authentication (please check out my detailed guide)

  • installed the Python client library:

    pip install --upgrade google-cloud-storage

Writing Pandas DataFrame to Google Cloud Storage as a CSV file

Consider the following Pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'A':[3,4],'B':[5,6]})
df.head()
   A  B
0  3  5
1  4  6

Case when you already have a bucket

To write this Pandas DataFrame to Google Cloud Storage (GCS) as a CSV file, use the blob's upload_from_string(~) method:

from google.cloud import storage
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)

# The bucket on GCS in which to write the CSV file
bucket = client.bucket('test-bucket-skytowner')
# The name assigned to the CSV file on GCS
blob = bucket.blob('my_data.csv')
blob.upload_from_string(df.to_csv(), 'text/csv')

Note the following:

  • if the bucket with the specified name does not exist, then an error will be thrown

  • the DataFrame's to_csv() method converts the DataFrame into a CSV string:

    df.to_csv()
    ',A,B\n0,3,5\n1,4,6\n'
  • the second argument of upload_from_string(~) is the content type of the file
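As an aside, to_csv() includes the DataFrame's index as an unnamed first column by default. If the index carries no meaning, passing index=False (an option not used above) keeps it out of the uploaded file:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 4], 'B': [5, 6]})

# Default: the index appears as an unnamed first column
print(repr(df.to_csv()))             # ',A,B\n0,3,5\n1,4,6\n'

# index=False omits the index column entirely
print(repr(df.to_csv(index=False)))  # 'A,B\n3,5\n4,6\n'
```

The resulting string can be passed to upload_from_string(~) exactly as before.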

After running this code, we can see that my_data.csv has been written to our test-bucket-skytowner bucket in the GCS web console.

Case when you do not have a bucket

The above solution only works when the bucket in which to place the file already exists on GCS; specifying a bucket that does not exist will throw an error. In that case, we must first create a bucket on GCS using the create_bucket(~) method, which returns the newly created bucket:

from google.cloud import storage
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)

# The NEW bucket on GCS in which to write the CSV file
bucket = client.create_bucket('test-v2-bucket-skytowner')
# The name assigned to the CSV file on GCS
blob = bucket.blob('my_data.csv')
blob.upload_from_string(df.to_csv(), 'text/csv')
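The two cases can also be folded into one helper that checks whether the bucket exists via the bucket's exists(~) method and creates it only when needed. This is a sketch; the function name and its parameters are mine, not part of the library:

```python
import pandas as pd

def upload_df_as_csv(df: pd.DataFrame, bucket_name: str,
                     blob_name: str, key_path: str) -> None:
    # Sketch: create the bucket if it is missing, then upload the CSV.
    # bucket_name, blob_name and key_path are placeholders.
    from google.cloud import storage
    client = storage.Client.from_service_account_json(
        json_credentials_path=key_path)
    bucket = client.bucket(bucket_name)
    if not bucket.exists():  # avoids the error thrown for missing buckets
        bucket = client.create_bucket(bucket_name)
    bucket.blob(blob_name).upload_from_string(df.to_csv(), 'text/csv')
```

Called as upload_df_as_csv(df, 'test-bucket-skytowner', 'my_data.csv', path_to_private_key), this behaves like the first snippet when the bucket already exists and like the second when it does not.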

Writing Pandas DataFrame to Google Cloud Storage as a feather file

The logic for writing a Pandas DataFrame to GCS as a feather file is very similar to the CSV case, except that we must first write the feather file locally, and then upload this file using the method upload_from_filename(~):

import pyarrow.feather as feather
feather.write_feather(df, './feather_df')

# The bucket in which to place the feather file on GCS
bucket = storage.Bucket(client, 'example-bucket-skytowner')
# The name to assign to the feather file on GCS
blob = bucket.blob('my_data.feather')
blob.upload_from_filename('./feather_df')

After running this code, we should see the my_data.feather file appear in the GCS web console.

Published by Isshin Inada