search
Search
Login
Unlock 100+ guides
menu
menu
web
search toc
close
Comments
Log in or sign up
Cancel
Post
account_circle
Profile
exit_to_app
Sign out
What does this mean?
Why is this true?
Give me some examples!
search
keyboard_voice
close
Searching Tips
Search for a recipe:
"Creating a table in MySQL"
Search for an API documentation: "@append"
Search for code: "!dataframe"
Apply a tag filter: "#python"
Useful Shortcuts
/ to open search panel
Esc to close search panel
to navigate between search results
d to clear all current filters
Enter to expand content preview
icon_star
Doc Search
icon_star
Code Search Beta
SORRY NOTHING FOUND!
mic
Start speaking...
Voice search is only supported in Safari and Chrome.
Navigate to
chevron_leftMiscellaneous Cookbook
Adjusting number of rows that are printedAppending DataFrame to an existing CSV fileChecking differences between two indexesChecking if a DataFrame is emptyChecking if a variable is a DataFrameChecking if index is sortedChecking if value exists in IndexChecking memory usage of DataFrameChecking whether a Pandas object is a view or a copyConcatenating a list of DataFramesConverting a DataFrame to a listConverting a DataFrame to a SeriesConverting DataFrame to a list of dictionariesConverting DataFrame to list of tuplesCounting the number of negative valuesCreating a DataFrame using cartesian product of two DataFramesDisplaying DataFrames side by sideDisplaying full non-truncated DataFrame valuesDrawing frequency histogram of DataFrame columnExporting Pandas DataFrame to PostgreSQL tableHighlighting a particular cell of a DataFrameHighlighting DataFrame cell based on valueHow to solve "ValueError: If using all scalar values, you must pass an index"Importing BigQuery table as Pandas DataFramePlotting two columns of DataFramePrinting DataFrame on a single linePrinting DataFrame without indexPrinting DataFrames in tabular formatRandomly splitting DataFrame into multiple DataFrames of equal sizeReducing DataFrame memory sizeSaving a DataFrame as a CSV fileSaving DataFrame as Excel fileSaving DataFrame as feather fileSetting all values to zeroShowing all dtypes without truncationSplitting DataFrame into multiple DataFrames based on valueSplitting DataFrame into smaller equal-sized DataFramesWriting DataFrame to SQLite
check_circle
Mark as learned
thumb_up
0
thumb_down
1
chat_bubble_outline
0
Comment
auto_stories Bi-column layout
settings

Importing BigQuery table as Pandas DataFrame

schedule Aug 12, 2023
Last updated
local_offer
BigQueryPandasPython
Tags
mode_heat
Master the mathematics behind data science with 100+ top-tier guides
Start your free 7-days trial now!

To import a BigQuery table as a DataFrame, Pandas offer a built-in method called read_gbq that takes in as argument a query string (e.g. SELECT * FROM users;) as well as a path to the JSON credential file for authentication. Let's first go through the steps on creating this credential file!

Creating a service account for authentication

Before we can import our BigQuery table, we must first create a service account for authentication. We can think of service accounts as users who can be assigned certain privileges to interact with Google Cloud Platform (GCP) products - including BigQuery.

A service account can be created on the GCP console hereopen_in_new. Pick the project where your BigQuery data lives, and proceed to create the service account. There are 2 simple steps in total - the first is to give a name to the service account:

Next, we have to grant privileges to this service account:

For demonstration sake, we will assign the highest BigQuery Admin privilege, but do keep in mind that we should always assign just the right level of privileges in a production setting for better security.

We can skip the third optional step and finish creating the service account.

Once this step is done, we should see our new service account like so:

Generating a JSON credential file

Now that the service account is created, we need to generate a JSON credential file to pass to Pandas. To do so, click on the three dots under Actions, and click on Manage keys:

Next, click on ADD KEY, and then Create new key:

Finally, select JSON and then click CREATE:

This will download the private key, which is a JSON file that holds all the credentials required for authentication. Make sure to not share this JSON file publicly (e.g. on GitHub repository) because others can gain access to your BigQuery data and wreak havoc.

Importing BigQuery table as a Pandas DataFrame

Now that we have the JSON credential file, we can use Pandas' read_gbq(~) method to read some data from our BigQuery table as a Pandas DataFrame:

import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
'./demobigquery-353007-1fb4e20ef41a.json',
scopes=['https://www.googleapis.com/auth/cloud-platform'],
)
query_string = 'SELECT * FROM `demobigquery-353007.forum.users`;'
df = pd.read_gbq(query_string, credentials=credentials)
df.head()
name age
0 Cathy 40
1 Alex 20
2 Bob 30

Here, note the following:

  • for the code to run, you may have to install the pandas-gbq library via pip or conda.

  • the ./demobigquery-353007-1fb4e20ef41a.json credential file created in previous step lives in the same directory as this Python script.

  • demobigquery-353007 is the ID of our GCP project - you can find this information by logging into the GCP console.

  • the forum in forum.users is the dataset name (or sometimes referred to as the schema), and users is the name of our table.

NOTE

If you want to create this BigQuery table, please follow my "Getting started with BigQuery" guide.

Importing a public BigQuery dataset

BigQuery has a number of datasets available for public access such as data from Stack Overflow and Hacker News.

For instance, Stack Overflow's data lives in the following:

Project ID: bigquery-public-data
Dataset: stackoverflow
Table: post_questions

Therefore, to access some records from this BigQuery table:

query_string = """
SELECT id, title
FROM `bigquery-public-data.stackoverflow.posts_questions`
LIMIT 100
"""
df = pd.read_gbq(query_string, credentials=credentials)
df.head()
id title
0 68922691 Vue 3 + Module Federation
1 68923973 Replace newline \n with expression using sed (...
2 68926624 Wait_For_Component is not working in cogs [dis...
3 68979195 How to merge date and time from two separate c...
4 68997505 Slick Slider is not working on the website?
robocat
Published by Isshin Inada
Edited by 0 others
Did you find this page useful?
thumb_up
thumb_down
Comment
Citation
Ask a question or leave a feedback...
thumb_up
0
thumb_down
1
chat_bubble_outline
0
settings
Enjoy our search
Hit / to insta-search docs and recipes!