Get data from a Google BigQuery table using Python 3 – a specific task that took me a while to complete because online resources were sparse or confusing
You might ask why a Data Scientist got stuck solving such a trivial Data Engineering task. Well… because most of the time there is no proper Data Engineering support in an organization.
Steps to follow if you want to connect to Google BigQuery and pull data using Python:
- Ask your GCP admin to generate a Google Cloud service-account secret key and save it in a JSON file (this is "Step 0" below).
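If your own account has the right IAM permissions, the key file can also be created from the command line instead of asking an admin. This is a sketch only – the service-account name and project below are placeholders:

```shell
# create a JSON key for an existing service account (placeholder names)
gcloud iam service-accounts keys create filename.json \
    --iam-account=my-service-account@yourprojectname.iam.gserviceaccount.com
```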
- install libraries:
- pip install google-cloud-bigquery
- pip install google-cloud (optional – this umbrella package is deprecated; google-cloud-bigquery above is the one you actually need)
- pip install tqdm
- pip install pandas_gbq
- import libraries:
- import os
- import io
- from google.cloud.bigquery.client import Client
- from google.cloud import bigquery
- import pandas_gbq
- set Google Credentials (your json file created at Step 0):
- os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path to your json file/filename.json'
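Note that setting the variable this way only affects the current Python process. A quick sanity check that the path was picked up (the filename is the same placeholder as above):

```python
import os

# point Google client libraries at the key file (placeholder path)
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path to your json file/filename.json'

# sanity-check: the variable is visible to this process
print(os.environ['GOOGLE_APPLICATION_CREDENTIALS'])

# os.path.exists(...) on the value is a cheap way to catch a typo in the path
```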
- define a BQ client:
- client = bigquery.Client(project='yourprojectname')
- define the query and save it in a variable (note that BigQuery standard SQL expects the fully qualified project.dataset.table reference):
- query = f""" SELECT
*
FROM `projectname.datasetname.tablename`;
"""
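Because the query is built with an f-string, the table reference can come from variables, which keeps the script reusable across projects. The identifiers below are placeholders:

```python
# placeholder identifiers – replace with your own
project_id = "yourprojectname"
dataset_id = "yourdataset"
table_id = "yourtable"

# build the fully qualified reference `project.dataset.table`
query = f"""SELECT *
FROM `{project_id}.{dataset_id}.{table_id}`
LIMIT 1000
"""
print(query)
```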
- use pandas_gbq to read the results and save them in a dataframe:
- queryResultsTbl = pandas_gbq.read_gbq(
query,
project_id=project_id,
dialect="standard"
)
Something like this:
import os
import io
from google.cloud.bigquery.client import Client
from google.cloud import bigquery
import pandas_gbq

# point the Google client libraries at the service-account key (Step 0)
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'google-key_Cristina.json'

project_id = "project-name"
client = bigquery.Client(project=project_id)

# fully qualified table reference: project.dataset.table
query = """ SELECT
*
FROM `project-name.dataset-name.table-name`;
"""

queryResultsTbl = pandas_gbq.read_gbq(
    query,
    project_id=project_id,
    dialect="standard"
)
queryResultsTbl.head(10)

This is a personal blog. My opinion on what I share with you is that "All models are wrong, but some are useful". Improve the accuracy of any model I present and make it useful!
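One more option: as an alternative to pandas_gbq, the BigQuery client itself can return a DataFrame via query(...).to_dataframe(). This is a sketch, not tested against a live project – it needs google-cloud-bigquery (plus the db-dtypes package in recent versions), and the function and parameter names are my own:

```python
def fetch_table(project_id, table_ref):
    """Run SELECT * on `table_ref` and return the result as a pandas DataFrame.

    Sketch only: assumes GOOGLE_APPLICATION_CREDENTIALS is already set and
    `table_ref` is a fully qualified `project.dataset.table` name.
    """
    # imported inside the function so the sketch loads even without the library
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    query = f"SELECT * FROM `{table_ref}`"
    return client.query(query).to_dataframe()
```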