# Data Processing using Google Cloud Functions
## Overview of Google Cloud Functions
Google Cloud Functions is the serverless functions offering that is part of
Google Cloud Platform. Key benefits include:
* Simplified developer experience and increased developer velocity
* Pay only for what you use
* Avoid lock-in with open technology
You can refer to this [page](https://cloud.google.com/functions) for more details
about Google Cloud Functions.
## Create First Google Cloud Function using Python
Here are the instructions to create your first Google Cloud Function using Python.
* Search for Cloud Function
* Follow the steps as demonstrated to create your first Google Cloud Function (a sketch of a typical entry point is shown after this list).
* Name: `file_format_converter`
* Language: `python3.9`
* Trigger Type: `Cloud Storage` (the default is HTTP)
* Review Default Memory and Timeout Settings
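
When the `Cloud Storage` trigger type is selected, the function receives the event
payload and context as arguments. A minimal sketch of what the generated entry point
typically looks like, assuming 1st gen background functions and `file_format_converter`
as the entry point name, is shown below.

```python
def file_format_converter(event, context):
    """Background function triggered by a change to a Cloud Storage bucket.

    Args:
        event (dict): Event payload (includes keys such as 'bucket' and 'name').
        context: Metadata about the event (event id, type, timestamp).
    """
    print(f"Processing file: {event['name']}.")
```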
## Run and Validate Google Cloud Function
Here are the instructions to run and validate the Google Cloud Function.
* Go to the Testing tab and add this JSON as the test event.
```json
{
  "name": "testing.csv"
}
```
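
The JSON provided in the Testing tab is delivered to the function as the `event`
dictionary. Assuming the entry point sketched earlier, the invocation can also be
simulated locally as shown below; when the test is run in the console, the same
message should show up in the function's logs.

```python
# Simulate the Testing tab invocation locally (entry point from the earlier sketch)
def file_format_converter(event, context):
    print(f"Processing file: {event['name']}.")

event = {"name": "testing.csv"}
file_format_converter(event, None)  # prints: Processing file: testing.csv.
```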
## Review File Format Converter Logic using Pandas
> TBD: Need to make the logic dynamic.
* Source: `landing/retail_db/orders`
* Source File Format: `CSV`
* Schema: `landing/retail_db/schemas.json`
* Target: `bronze/retail_db/orders`
* Target File Format: `parquet`
Here is the design for the file format conversion.
* The application should take the table name as an argument.
* It has to read the schema from `schemas.json` and apply it to the `CSV`
  data while creating the Pandas Data Frame (a representative `schemas.json`
  structure is shown after this list).
* The Data Frame should be written to the target location using the target file format.
* The source bucket, target bucket, and base folders should be passed as
  environment variables.
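
The logic below assumes that `schemas.json` maps each dataset name to a list of
column descriptors carrying a `column_name` attribute. A representative structure
for the `orders` dataset might look like the following; the actual file may carry
additional attributes such as data types or column positions. The core conversion
logic, reviewed locally using Pandas, follows the example.

```json
{
  "orders": [
    {"column_name": "order_id"},
    {"column_name": "order_date"},
    {"column_name": "order_customer_id"},
    {"column_name": "order_status"}
  ]
}
```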
```python
import json
import os

import pandas as pd


def get_columns(input_base_dir, ds_name):
    # Look up the column names for the given dataset from schemas.json
    schemas = json.load(open(f'{input_base_dir}/schemas.json'))
    columns = list(map(lambda td: td['column_name'], schemas[ds_name]))
    return columns


input_base_dir = os.environ.get('INPUT_BASE_DIR')
output_base_dir = os.environ.get('OUTPUT_BASE_DIR')
ds_name = 'orders'

columns = get_columns(input_base_dir, ds_name)
print(columns)

# Convert every CSV file under the dataset folder to Snappy-compressed Parquet
for file in os.listdir(f'{input_base_dir}/{ds_name}'):
    print(file)
    df = pd.read_csv(f'{input_base_dir}/{ds_name}/{file}', names=columns)
    os.makedirs(f'{output_base_dir}/{ds_name}', exist_ok=True)
    df.to_parquet(f'{output_base_dir}/{ds_name}/{file}.snappy.parquet')
```
## Deploy Inline Application as Google Cloud Function
Now that we have reviewed the core logic, let us deploy the file format
converter as a Google Cloud Function.
* Create the Function with the relevant runtime (Python 3.9).
* Update `requirements.txt` with all the required dependencies.
* Update the program file with the logic to convert the file format (a possible sketch is shown after this list).
* Review the configuration and make sure the memory is upgraded to 1 GB (from 256 MB).
* Update the environment variables for the bucket names as well as the base folder names.
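
For reference, here is one possible shape of the inline `main.py` once the local
logic is adapted to read from and write to GCS. This is only a sketch under
assumptions: it expects `pandas`, `pyarrow`, and `gcsfs` in `requirements.txt`,
it reuses `INPUT_BASE_DIR` and `OUTPUT_BASE_DIR` as full `gs://` base paths, and
it picks the table name from the `name` field of the event payload; the code
deployed in the course may differ.

```python
import json
import os

import gcsfs
import pandas as pd


def file_format_converter(event, context):
    # Assumption: bucket and base folder names come in as full gs:// base paths
    input_base_dir = os.environ.get('INPUT_BASE_DIR')    # e.g. gs://<bucket>/landing/retail_db
    output_base_dir = os.environ.get('OUTPUT_BASE_DIR')  # e.g. gs://<bucket>/bronze/retail_db
    # Assumption: the table name is passed via the 'name' field of the event
    ds_name = event.get('name', 'orders')

    # Read the schema definition from GCS and extract the column names
    fs = gcsfs.GCSFileSystem()
    with fs.open(f'{input_base_dir}/schemas.json') as fp:
        schemas = json.load(fp)
    columns = [td['column_name'] for td in schemas[ds_name]]

    # Convert every CSV file under the dataset folder to Snappy-compressed Parquet
    for file in fs.ls(f'{input_base_dir}/{ds_name}'):
        file_name = file.split('/')[-1]
        df = pd.read_csv(f'gs://{file}', names=columns)
        df.to_parquet(f'{output_base_dir}/{ds_name}/{file_name}.snappy.parquet')
```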
## Run Inline Application as Google Cloud Function
As the File Format Converter is deployed as a Cloud Function, let us go through the
details of running it. We will also validate that the Cloud Function is working as
expected.
* Run the Cloud Function by passing the table name as a run-time argument.
* Review the logs to confirm that the Cloud Function executed without any errors.
* Review the files in GCS in the target location.
* Use Pandas `read_parquet` to see if the data in the converted files can be read
  into a Pandas Data Frame (see the snippet after this list).
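
A quick check from a local Python session or notebook might look like the snippet
below; the bucket name is a placeholder, and reading `gs://` paths locally requires
`gcsfs` and `pyarrow` to be installed.

```python
import pandas as pd

# Placeholder bucket and path; point this at the actual target location
df = pd.read_parquet('gs://<your-bucket>/bronze/retail_db/orders')
print(df.shape)
print(df.head())
```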
## Setup Project for Google Cloud Function
Let us go ahead and set up the project for the Google Cloud Function using VS Code.
* Create a new project.
* Create a Python virtual environment using Python 3.9.
* Add dependencies for local development to `requirements_dev.txt`.
* Add a driver program for the Google Cloud Function (a minimal sketch is shown after this list).
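
Here is a minimal sketch of such a driver, assuming the Cloud Function's entry point
lives in `main.py` and is named `file_format_converter` (both names are assumptions
based on the earlier steps). For local development, `requirements_dev.txt` would
typically include the same libraries the function depends on, such as `pandas`,
`pyarrow`, and `gcsfs`.

```python
# driver.py (name assumed): invoke the Cloud Function's entry point locally
# with a sample event so the logic can be exercised from VS Code.
from main import file_format_converter

if __name__ == '__main__':
    # Sample event mirroring the JSON used in the console's Testing tab
    event = {'name': 'orders'}
    file_format_converter(event, None)
```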
## Build and Deploy Application in GCS as Google Cloud Function
## Run Deployed Application in GCS as Google Cloud Function