Google BigQuery Quick 5min Understanding

Cloud BigQuery is a fully managed analytics data warehouse designed for enterprise-level analytics and dashboards, offering seamless data synchronization with no operational overhead. It provides features like flexible data ingestion, global availability, and cost control, along with best practices for optimizing query performance and managing costs. Users can interact with BigQuery through various tools, including the command-line utility and integrations with other Google products, and pricing is based on the amount of data processed and the storage used.


##############################################################

############## Cloud BigQuery ##############

Cloud BigQuery: fully managed analytics data warehouse,

enterprise-level data warehouse,

for analytics and dashboard purposes,

a sync point for all data reports,

a no-ops platform

GCP --> navigation menu ---> Big Data --> BigQuery

A database is called a dataset in BigQuery.

Resources ---> create dataset

data location (data locality principle)

default table expiration

encryption (Google-managed or customer-managed keys); see the command-line sketch below
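A minimal sketch of the same dataset creation from the command line (the dataset name, location, description, and expiration value are illustrative):

$ bq --location=US mk --dataset --default_table_expiration 3600 --description "Example dataset" mydataset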

dataset ---> create table (source: empty table, Google Cloud Storage, upload, Drive, Google Cloud Bigtable)

---> project name

---> dataset name, table name, table type

---> partitioning and clustering (see the sketch below)
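A minimal sketch of creating a partitioned, clustered table from the command line (the table name, schema, and column choices are illustrative):

$ bq mk --table --time_partitioning_field sale_date --clustering_fields qtr mydataset.sales qtr:STRING,sales:FLOAT,sale_date:DATE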

GCP --> navigation menu ---> Big Data --> BigQuery ---> query editor

---> query history, saved queries, job history, transfers, scheduled queries, BI Engine

BigQuery partitions: for low-cardinality columns,

column-based partitions,

data is stored within its partition,

no need to manually shard tables (see the query sketch below).
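A minimal SQL sketch, assuming the date-partitioned table created above; filtering on the partitioning column prunes partitions, so only the matching data is scanned:

SELECT qtr, SUM(sales) AS total_sales
FROM mydataset.sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-03-31'
GROUP BY qtr;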

BigQuery advantages: IO: 100k disks in parallel, separation of storage and compute, CPU: 100 CPUs, Jupiter network

BigQuery best practices:

query performance: avoid SQL anti-patterns, avoid SELECT *, sample data using the preview option,
check the price before running,

limit cost before running by restricting the number of bytes billed (see the sketch below), partition data
by date to speed up retrieval, materialize query results in stages, consider the cost of large result sets,
use streaming inserts with caution
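A minimal sketch of restricting bytes billed from the command line; the query fails instead of running if it would scan more than the cap (the table and the 10 MB cap are illustrative):

$ bq query --nouse_legacy_sql --maximum_bytes_billed=10000000 'SELECT qtr, SUM(sales) FROM mydataset.sales GROUP BY qtr'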

BigQuery limits are documented at cloud.google.com/bigquery/quotas.

storage optimizations

BigQuery:

any size, from anywhere

BigQuery calculates the amount of data a query processes, adds computation accordingly, and charges on a per-query basis.

features: flexible data ingestion, global availability, security and permissions, cost control, highly available, fully integrated, connects with Google products, automatic data transfer service

when to use BigQuery:

when data is structured and you don't need a low-latency system

BigQuery ---> charges for the size of storage used, and compute is charged only when a query is executed

BigQuery can be used with Cloud Datalab, Tableau, Qlik, Data Studio, Google Sheets, and shared with coworkers

structured data + analytics workload + low latency === Cloud Bigtable (NoSQL database)

unstructured data + mobile SDK === Cloud Storage for Firebase

unstructured data === Cloud Storage

structured data + no analytics workload + relational data + no horizontal scaling === Cloud SQL

structured data + no analytics workload + relational data + horizontal scalability === Cloud Spanner

structured data + no analytics workload + relational data + mobile SDK === Firebase Realtime Database

pricing in BigQuery: calculated according to slots:

slots are units of computation capacity

slots are automatically calculated per query

flat-rate pricing can be used to purchase a fixed number of slots

Estimation: you can estimate a query's cost before submitting it, either in the online console or with a dry run in bq (see the sketch below)
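A minimal dry-run sketch; nothing is executed or billed, and bq reports how many bytes the query would process (the table name is illustrative):

$ bq query --nouse_legacy_sql --dry_run 'SELECT qtr, sales FROM mydataset.sales'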

BigQuery command line:

bq: command-line utility

Load Data

$ bq --location=US load --source_format=CSV mydataset.mytable ./myfile.csv qtr:STRING,sales:FLOAT,year:STRING

$ bq --location=asia-northeast1 load --source_format=CSV mydataset.mytable ./myfile.csv qtr:STRING,sales:FLOAT,year:STRING

$ bq mk --table [PROJECT_ID]:[DATASET].[TABLE] [SCHEMA]

$ bq mk --table mydataset.mytable qtr:STRING,sales:FLOAT,year:STRING

$ bq mk --table mydataset.mytable ./myschema.json

mk --> make
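A minimal sketch of what ./myschema.json might contain, mirroring the inline schema used above (field names and modes are illustrative):

[
  {"name": "qtr",   "type": "STRING", "mode": "REQUIRED"},
  {"name": "sales", "type": "FLOAT",  "mode": "NULLABLE"},
  {"name": "year",  "type": "STRING", "mode": "NULLABLE"}
]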

bq head --max_rows=10 mydataset.mytable ====> show the first 10 rows of a table

bq head myotherproject:mydataset.mytable ====> show the first rows (default 100) of a table in another project

bq head --start_row 100 --selected_fields "field1,field2" mydataset.mytable ====> show only the selected fields, starting at row 100

Copy Table

bq --location=US cp mydataset.mytable mydataset2.mytable2

bq --location=asia-northeast1 cp mydataset.mytable mydataset2.mytable2

Exporting Data

bq --location=US extract --compression GZIP 'mydataset.mytable' gs://example-bucket/myfile.csv

bq --location=US extract --destination_format NEWLINE_DELIMITED_JSON 'mydataset.mytable' gs://example-bucket/myfile.json

bq --location=US extract --destination_format AVRO --compression SNAPPY 'mydataset.mytable' gs://example-bucket/myfile.avro


Delete

bq rm -t myotherproject:mydataset.mytable

bq rm -t mydataset.mytable

bq ls --format=prettyjson --project_id [PROJECT_ID]

bq ls --format=prettyjson

bq show --format=prettyjson [PROJECT_ID]:[DATASET]

bq query --nouse_legacy_sql \

'SELECT * EXCEPT(schema_owner) FROM INFORMATION_SCHEMA.SCHEMATA'

Update Dataset

bq update --description "Description of mydataset" mydataset

bq update --default_table_expiration 7200 mydataset

Remove

bq rm -r -f -d [PROJECT_ID]:[DATASET]

Access Control

bq show --format=prettyjson [PROJECT_ID]:[DATASET] > [PATH_TO_FILE]

bq update --source [PATH_TO_FILE] [PROJECT_ID]:[DATASET]
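A minimal sketch of the access section you would edit in the dumped JSON before running the update (the email address is illustrative):

"access": [
  {"role": "READER", "userByEmail": "analyst@example.com"},
  {"role": "WRITER", "specialGroup": "projectWriters"},
  {"role": "OWNER",  "specialGroup": "projectOwners"}
]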

############ IAM roles for BigQuery ############

permissions: bigquery.jobs.<create/listAll/list/get>

bigquery.datasets.<delete/get/insert>

predefined roles: roles/bigquery.<metadataViewer/dataViewer/dataEditor>

primitive roles: dataset READER/WRITER/OWNER

project editor/owner ...
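A minimal sketch of granting a predefined role at the project level with gcloud (the project ID and user are illustrative):

$ gcloud projects add-iam-policy-binding my-project --member=user:analyst@example.com --role=roles/bigquery.dataViewer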
