##############################################################
############## cloud big query ##############
Cloud BigQuery : fully managed analytics data warehouse,
enterprise-level data warehouse,
for analytics and dashboard purposes,
sync for all data reports,
no-ops platform
GCP --> navigation menu ---> Big Data --> BigQuery
A database is called a dataset in BigQuery.
Resources ---> create dataset
data location (data locality principle)
default table expiration
encryption (Google-managed or customer-managed)
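A minimal command-line sketch of creating a dataset with these settings (the dataset name, location, expiration value, and description are placeholders, not from the notes):

bq mk --dataset --location=US --default_table_expiration=7200 --description "example reporting dataset" mydataset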
dataset ---> create table (empty table, Google Cloud Storage, upload, Drive, Google Cloud Bigtable)
---> project name
---> dataset name, table name, table type
---> partitioning and clustering (see the sketch below)
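A sketch of the same table creation with partitioning and clustering from the command line (mydataset.sales and the field names are illustrative placeholders):

bq mk --table \
  --time_partitioning_field=sale_ts --time_partitioning_type=DAY \
  --clustering_fields=region \
  mydataset.sales sale_ts:TIMESTAMP,region:STRING,amount:FLOAT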
GCP --> navigation menu ---> Big Data --> BigQuery ---> query editor
---> query history
---> saved queries
---> job history
---> transfers
---> scheduled queries
---> BI Engine
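A simple query that can be pasted into the query editor or run through bq; the public dataset bigquery-public-data.usa_names.usa_1910_2013 is used here only as an illustrative example:

bq query --nouse_legacy_sql \
'SELECT name, SUM(number) AS total
 FROM `bigquery-public-data.usa_names.usa_1910_2013`
 GROUP BY name
 ORDER BY total DESC
 LIMIT 10'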
BigQuery partitions: used for low-cardinality columns,
column-based partitions,
data is stored within a partition,
no need to manually shard tables (DDL sketch below).
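A DDL sketch of a date-partitioned, clustered table (table and column names are placeholders):

bq query --nouse_legacy_sql \
'CREATE TABLE mydataset.sales_by_day (
   sale_date DATE,
   region STRING,
   amount FLOAT64
 )
 PARTITION BY sale_date
 CLUSTER BY region'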
BigQuery advantages: I/O: 100k disks in parallel; separation of storage and compute; CPU: 100 CPUs;
Jupiter network
Best practices for BigQuery:
query performance: avoid SQL anti-patterns, avoid SELECT *, sample data using the preview option,
check the price before running,
limit the cost before running by restricting the number of bytes billed (example below), partition data
by date to retrieve data efficiently, materialize query results in stages, consider the cost of large
result sets, use streaming inserts with caution
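For example, the bytes-billed restriction mentioned above can be enforced per query with --maximum_bytes_billed; the query fails instead of billing past the cap (table name and limit are placeholders):

bq query --nouse_legacy_sql --maximum_bytes_billed=1000000000 \
'SELECT region, SUM(amount) AS total
 FROM mydataset.sales_by_day
 WHERE sale_date = DATE "2024-01-01"
 GROUP BY region'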
BigQuery limits are documented at cloud.google.com/bigquery/quotas.
Storage optimization
BigQuery:
any size, from anywhere
BigQuery calculates the amount of data a query processes, adds computation accordingly, and charges on a
per-query basis.
Features: flexible data ingestion, global availability, security and permissions, cost controls, highly
available, fully integrated, connects with Google products, automatic data transfer service
When to use BigQuery:
when you have structured data and do not need a low-latency system
BigQuery ---> charges for the size of storage used; query charges apply only when a query is executed
BigQuery can be used with Cloud Datalab, Tableau, Qlik, Data Studio, Google Sheets, and shared with co-workers
structured data + analytics workload + low latency === Cloud Bigtable (NoSQL database)
structured data + analytics workload + no low-latency requirement === BigQuery
unstructured data + mobile SDK === Cloud Storage for Firebase
unstructured data === Cloud Storage
structured data + no analytics workload + relational data + no horizontal scaling === Cloud SQL
structured data + no analytics workload + relational data + horizontal scalability === Cloud Spanner
structured data + no analytics workload + non-relational data + mobile SDK === Firebase Realtime DB
Pricing in BigQuery: calculated according to slots:
slots are units of computation capacity
slots are automatically calculated for each query
flat-rate pricing can be used to purchase a fixed number of slots
Estimation: you can estimate a query before submitting it,
in the online console or with a dry run in bq (example below)
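A dry run validates the query and reports the estimated bytes it would process, without running it or incurring charges (the table is a placeholder):

bq query --dry_run --nouse_legacy_sql \
'SELECT region, amount FROM mydataset.sales_by_day WHERE sale_date >= DATE "2024-01-01"'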
BigQuery command line:
bq : command-line utility
bq load ---> load data into a table:
$ bq --location=US load --source_format=CSV mydataset.mytable ./myfile.csv qtr:STRING,sales:FLOAT,year:STRING
$ bq --location=asia-northeast1 load --source_format=CSV mydataset.mytable ./myfile.csv qtr:STRING,sales:FLOAT,year:STRING
$ bq mk --table [PROJECT_ID]:[DATASET].[TABLE] [SCHEMA]
$ bq mk --table mydataset.mytable qtr:STRING,sales:FLOAT,year:STRING
$ bq mk --table mydataset.mytable ./myschema.json
mk --> make
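The schema file passed to bq mk above is a JSON array of field definitions; a sketch matching the inline schema used in these examples (the modes are assumed):

[
  {"name": "qtr",   "type": "STRING", "mode": "REQUIRED"},
  {"name": "sales", "type": "FLOAT",  "mode": "NULLABLE"},
  {"name": "year",  "type": "STRING", "mode": "NULLABLE"}
]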
bq head --max_rows=10 mydataset.mytable ====> see the first 10 rows of a table
bq head myotherproject:mydataset.mytable =====> head command to preview a few rows without running a query
bq head --start_row 100 --selected_fields "field1,field2" mydataset.mytable ====> view selected fields starting at row 100
Copy Table
bq --location=US cp mydataset.mytable mydataset2.mytable2
bq --location=asia-northeast1 cp mydataset.mytable mydataset2.mytable2
Exporting Data
bq --location=US extract --compression GZIP 'mydataset.mytable' gs://example-bucket/myfile.csv
bq --location=US extract --destination_format NEWLINE_DELIMITED_JSON 'mydataset.mytable' gs://example-bucket/myfile.json
bq --location=US extract --destination_format AVRO --compression SNAPPY 'mydataset.mytable' gs://example-bucket/myfile.avro
Delete
bq rm -t myotherproject:mydataset.mytable
bq rm -t mydataset.mytable
bq ls --format=prettyjson --project_id [PROJECT_ID]
bq ls --format=prettyjson
bq show --format=prettyjson [PROJECT_ID]:[DATASET]
bq query --nouse_legacy_sql \
'SELECT * EXCEPT(schema_owner) FROM INFORMATION_SCHEMA.SCHEMATA'
Update Dataset
bq update --description "Description of mydataset" mydataset
bq update --default_table_expiration 7200 mydataset
Remove
bq rm -r -f -d [PROJECT_ID]:[DATASET]
Access Control
bq show --format=prettyjson [PROJECT_ID]:[DATASET] > [PATH_TO_FILE]
bq update --source [PATH_TO_FILE] [PROJECT_ID]:[DATASET]
############ IAM roles for BigQuery ##########
permissions : bigquery.jobs.<create/listAll/list/get>
bigquery.datasets.<delete/get/insert>
predefined roles : roles/bigquery.<metadataViewer/dataViewer/dataEditor>
primitive roles : dataset READER/WRITER/OWNER
project editor/owner ...
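A sketch of granting one of these predefined roles at the project level with gcloud (the project ID and user email are placeholders):

gcloud projects add-iam-policy-binding my-project \
  --member="user:analyst@example.com" \
  --role="roles/bigquery.dataViewer"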