THE DATA BUZZ WRAP
COMMUNITY
WEEKEND DATA ENGINEERING / DATA SCIENCE COURSE
BATCH 6
Class Already Started, 2023
Course Overview
This is a complete end-to-end Data Engineering / Data Science course covering
Spark, Hive, SQL, Python, AWS Cloud, Airflow, and Git, along with Guesstimates and
Problem Solving. It is particularly helpful for freshers, college students, or
anyone who wants to make a transition into the engineering-science-analytics field. If you
want to upskill or brush up on your knowledge, this course is well suited,
given its comprehensiveness and short duration.
Course Duration: 3 to 4 months
Class Timing: 10:00 AM - 11:30 AM (Sat - Sun)
Doubt Lecture: 1 hour (Sunday)
Live Lectures would be conducted on Zoom.
A recording of each live session, with lifetime access, would also be provided to you.
But we would urge you to attend the live lectures for better understanding.
SPARK
Spark Overview
Why Spark is used everywhere instead of MapReduce
Advantages & Disadvantages of Spark
Spark Components
Spark Architecture
Spark RDDs and DataFrames in detail
Different File Formats used in Spark
Spark Operations (Transformations & Actions)
Shuffling in Spark
Parallelism in Spark
Spark Built-in Functions
Spark SQL in detail
Spark Joins
Spark Optimization techniques
Shared Variables in Spark
Spark Computations
Real-time problem and solution
Spark Assignment
HIVE
Hive Overview & Architecture
Hive vs RDBMS
Hive Metastore
OLAP vs OLTP
Hive Execution engines
HQL vs SQL
Hive Built-in Functions
ORC file format
Different tables in Hive
Table level optimizations
Query level optimizations
Partitioning vs Bucketing
Different types of Hive partitions
SerDe in Hive
SCD implementations in Hive
Hive Optimization techniques
Hive Assignment
AWS CLOUD
A 1-year free AWS Cloud account will be provided
Amazon S3 Overview
Different S3 buckets overview
S3 lifecycle
Real-time use case of S3
EMR
Autoscaling & Cooldown
Real-time use of EMR
Amazon Athena Overview
Tables & View Creation
MSCK REPAIR
Glue
Redshift
Practice Problems
AIRFLOW
Airflow Overview
Why Airflow
What is a DAG
DAG Creation
Operators & Sensors in Airflow
Integration of Spark jobs with Airflow
Real-time problem statement
SQL
● Introduction to SQL
● What databases and SQL are, and how they are used together
● How to store and modify the data in a database:
● DDL Commands: CREATE, ALTER, DROP, TRUNCATE, etc.
● Data types: VARCHAR, INT, DECIMAL, DATE, BOOLEAN, etc.
● Constraints: PRIMARY KEY, UNIQUE KEY, NOT NULL, etc.
● DML Operations: INSERT, UPDATE, DELETE, etc.
● How to retrieve data: SELECT Statement
● Basic SELECT clause operations: DISTINCT, LIMIT, ORDER BY
● The filter (WHERE) clause: logical operations, comparison
operators, advanced filters
● Aggregation and Advanced Aggregation: GROUP BY, PARTITION BY,
ROWS BETWEEN clause, rolling calculations, filtering with the HAVING
clause
● SQL JOINS: INNER, LEFT, RIGHT, FULL OUTER, SELF, CROSS
● Set Operations: UNION, UNION ALL, MINUS, INTERSECT
● Calculated Columns and SQL Functions: CASE WHEN, date
functions, string functions, data type conversion functions, etc.
● Queries within queries: Subqueries and CTEs (WITH clause)
● Window Analytical Functions: RANK, ROW_NUMBER,
DENSE_RANK, LEAD/LAG, NTILE
● Performance tuning: Clustered and non-clustered indexes, best
practices for SQL optimization
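As a small taste of the window functions listed above, the sketch below compares RANK, DENSE_RANK, and ROW_NUMBER within each group. The `sales` table, its columns, and the sample rows are invented for illustration and are not course material. It runs the SQL through Python's built-in `sqlite3` module, which assumes an SQLite build of 3.25 or newer for window function support.

```python
import sqlite3

# Hypothetical sample data: (region, rep, amount). Not from the course.
SAMPLE_ROWS = [
    ("East", "Asha", 500),
    ("East", "Bala", 500),
    ("East", "Chitra", 300),
    ("West", "Dev", 700),
]


def rank_sales(rows):
    """Load rows into an in-memory table and rank them per region."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT region, rep, amount,
               RANK()       OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
               DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS dense_rnk,
               ROW_NUMBER() OVER (PARTITION BY region
                                  ORDER BY amount DESC, rep) AS row_num
        FROM sales
        ORDER BY region, row_num
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rank_sales(SAMPLE_ROWS):
        print(row)
```

With tied amounts, RANK leaves a gap after the tie, DENSE_RANK does not, and ROW_NUMBER always gives each row a distinct number.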
PYTHON
● Introduction to Python
● Variables, keywords, indentation, quotes
● Operators: comparison, arithmetic, and logical
● Loops
● pass, break, and continue
● String (type casting, string formatting, slicing, string methods)
● List (type casting, string formatting, slicing, string methods)
● Set (type casting, different operations)
● map (use cases)
● lambda (uses of lambda functions)
● NumPy, pandas (Python libraries in detail)
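To give a feel for the `map` and `lambda` topics listed above, here is a minimal sketch using only built-in Python. The function names and the sample values are hypothetical and chosen purely for illustration.

```python
def apply_discount(prices, percent):
    """Return each price reduced by `percent`, rounded to 2 decimals.

    map() applies the lambda to every element; list() collects the results.
    """
    return list(map(lambda p: round(p * (1 - percent / 100), 2), prices))


def sort_by_length(words):
    """Sort words by length, using a lambda as the sort key."""
    return sorted(words, key=lambda w: len(w))


if __name__ == "__main__":
    # Hypothetical sample inputs, not taken from the course.
    print(apply_discount([100.0, 80.0, 50.0], 10))
    print(sort_by_length(["airflow", "hive", "spark"]))
```

A lambda is handy for short, one-off functions like these; a named `def` is usually clearer once the logic grows beyond a single expression.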
GUESSTIMATES
● Key points about answering guesstimate questions
● Steps for solving a guesstimate question
● Guesstimate interview questions and answer examples
● Conclusion
● What is a guesstimate?
● What skills are assessed while answering guesstimate questions?
PROBLEM SOLVING
● To be prepared to actively listen in order to accurately
understand the problem
● To help you know how to take the first step in solving a problem
● To clarify and define the problem
● To understand the usefulness of collaborative problem solving
and decision making
WHAT ELSE?
RESUME MAKING GUIDANCE
• My main focus would be to present you as a person who has done
some work as a data engineer and doesn't just have knowledge.
• For a fresher, it would be through your projects; for an
experienced person, it would be through resume molding.
• I would also guide you on how to make an attractive and
creative resume.
GUIDANCE TO USE VARIOUS JOB BOARDS
● We would guide you on how to leverage various job
portals like Naukri.com and LinkedIn to get a job.
● A proper template to reach out to people on LinkedIn or via email
would be shared.
● Referrals would be provided for 1 year.
GUIDANCE FOR HR ROUND
• Guidance for HR Round
• Answering the questions most often asked by HR
• One-on-one mock HR rounds, if you have failed in any
REASONS TO JOIN THIS COURSE INSTEAD OF ANY YouTube VIDEO:
● One-to-one interaction during the live class.
● Assignments would be given, and you can ask for help if you are unable to solve them.
● Questions that have already been asked in interviews would be
solved while teaching the concepts.
● Certification after completing the course.
THE COST OF THIS COURSE IS RS 5000 ONLY
Mode of Payment:
● All payments are to be made via GPay or PhonePe.
● After that, send me a screenshot of the payment along with your
email ID.
● You would be added to the WhatsApp group within 30 minutes.
In case of any query, please connect with me.
8095821145 - WhatsApp or call
Regards,
Your Data Guy
7+ Years Exp in the Industry with Top Product Companies
SUBHADIP DAS
THANK YOU