Session1 PythonIntro Keywords Comments Indentation

session 1

Uploaded by

parthu12347299

DATA ENGINEERING --> AWS + PYSPARK --> AWS DATA ENGINEER --> AWS CLOUD

LEARNINGS -->
1) PYTHON
2) PYSPARK --> TABLE IN SPARK --> DATAFRAMES
3) AWS --> DATA ENGINEERING SERVICES

BIG DATA --> 200 plus frameworks -->

SPARK --> DATA PROCESSING FRAMEWORK --> 2012 --> SCALA ( JAVA )

2019 - 2021
1) SCALA --> 80 to 90%
2) Python --> 7 - 9 %
3) Java --> 3 to 5 %

2022 -->

PYTHON --> 65% --> PYSPARK --> SPARK + PYTHON --> 1
Scala --> 20 to 30 % --> Scala --> 2
Java --> less than 2 % --> FORD --> 3

PYSPARK --> HEART OF BIG DATA --> PYTHON + SPARK

=========================
Python --> Data Processing , Data Analysis

1) Data types
2) Collections --> List , tuple , dictionary ( IMP)
3) Conditions and loops --> IF , FOR
4) Functions --> This is very imp
5) Class , method --> Different class methods
6) Error handling mechanism

=========================

THINGS TO BE DONE BEFORE TOMORROW'S SESSION -->

1) Download Python --> SHARE THE DOCUMENT

2) IDEs --> Jupyter Notebook , PyCharm , IntelliJ , Eclipse

Download PyCharm --> Share the document

3) Sublime Text --> Share the document

4) You can download Anaconda --> Jupyter notebook --> YOUTUBE LINK

============================

What is Python ?

SPARK -->

Python -->
1) Simple to learn
2) Fewer lines of code --> Debugging will be very easy
3) Vast availability of libraries
1) General Purpose Programming language -->

--> Data Engineering --> Data processing


--> Web development
--> Reporting purpose
--> Machine learning

DE , DS , DA , WD

2) Scala is an object-oriented programming language

Python is also an object-oriented programming language

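3) Why is Python interpreted ?

Python --> 10 lines --> executed one line at a time
line 1 --> mc ( machine code )
line 2 --> mc ( machine code )

Compile --> the whole program is translated before it runs
Why is Scala faster than Python ?
SCALA --> CL ( compiled language ) --> MC
PYTHON --> IL ( interpreted language )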

4) pyspark code --> ERROR --> Interactive mode ...

CLI --> Command Line Interface

5) Dynamically typed -->
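
A minimal sketch of what dynamic typing means in practice: the type belongs to the value, not to the variable name, so the same name can be rebound to values of different types without any declaration. The name `data` is only illustrative.

```python
# The same name can point to values of different types at different times.
data = 10
first_type = type(data).__name__     # the value is an int

data = "ten"
second_type = type(data).__name__    # now the value is a str

data = [1, 0]
third_type = type(data).__name__     # now the value is a list

print(first_type, second_type, third_type)
```

No type was written anywhere in the code; Python works it out at run time from the value that is assigned.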

Platform independent --> WINDOWS , MAC , LINUX

===================================
Structured programming --> Modularized programming

Functional programming --> Immutability --> Pure functions --> Removing mutability in code

Mutability --> Mutable code

Function --> Logic

Impure function --> same input can give a different output
fn(1,2) --> JAN 1 --> 100
fn(1,2) --> JAN 2 --> 200

Pure function --> same input always gives the same output
fn(1,2) --> JAN 1 --> 100
fn(1,2) --> JAN 2 --> 100
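
The difference between the two kinds of function can be sketched in a few lines. This is only an illustration: the names `impure_add`, `pure_add` and `running_total` are hypothetical, and shared mutable state stands in for whatever changed between JAN 1 and JAN 2 in the notes.

```python
running_total = 0  # mutable state shared across calls


def impure_add(a, b):
    # Impure: reads and modifies state outside the function,
    # so the same arguments can return a different result each time.
    global running_total
    running_total += a + b
    return running_total


def pure_add(a, b):
    # Pure: the result depends only on the arguments and nothing is modified.
    return a + b


first_impure = impure_add(1, 2)
second_impure = impure_add(1, 2)   # same input, different output

first_pure = pure_add(1, 2)
second_pure = pure_add(1, 2)       # same input, same output

print(first_impure, second_impure, first_pure, second_pure)
```

Removing the shared `running_total` is what "removing mutability in code" looks like here: once the function stops touching outside state, every call with `(1, 2)` behaves the same.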

Immutable -->

Functions -->

==================
Automatic Garbage Collection -->
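
A small sketch of automatic garbage collection, assuming the standard `weakref` and `gc` modules. The class `Record` is a hypothetical placeholder. A weak reference lets us observe the object without keeping it alive, so we can see that Python frees it once the last normal reference is gone.

```python
import gc
import weakref


class Record:
    """Hypothetical placeholder class used only for this demonstration."""


record = Record()
watcher = weakref.ref(record)     # a weak reference does not keep the object alive

alive_before = watcher() is not None

del record                        # drop the only strong reference
gc.collect()                      # ask the collector to run, for interpreters that do not free immediately

alive_after = watcher() is not None

print(alive_before, alive_after)
```

There is no explicit call to free the memory: deleting the last reference is enough, and the interpreter reclaims the object on its own.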

Indentation --> 4 spaces or 1 tab -->

python 2 or python 3 ...

100% --> Python 3

==================

1) Keywords
2) Variables
3) Indentation
4) Comments
5) Loops
6) Output format

1) Keywords -->

Reserved Words in python
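
The reserved words can be listed from Python itself with the standard `keyword` module. A minimal sketch; the exact number of keywords depends on the Python version, so no count is assumed here.

```python
import keyword

reserved_words = keyword.kwlist          # list of all reserved words in this Python version
print(reserved_words)

for_is_reserved = keyword.iskeyword("for")     # 'for' is a keyword
data_is_reserved = keyword.iskeyword("data")   # 'data' is an ordinary name

print(for_is_reserved, data_is_reserved)
```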

2) Identifiers --> Rules

data ---> identifier

1) Don't create an identifier with a digit at the start
2) Use lower case or upper case letters , digits and underscore
3) Don't create an identifier with a keyword as its name
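
These rules can be checked in code. The sketch below uses `str.isidentifier()` and `keyword.iskeyword()` from the standard library; the helper name `is_valid_identifier` and the sample names are hypothetical.

```python
import keyword


def is_valid_identifier(name):
    """Hypothetical helper: True if name follows the identifier rules and is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


sample_names = ["data", "data_2", "Data", "2data", "class", "my-data"]
checks = {name: is_valid_identifier(name) for name in sample_names}

for name, ok in checks.items():
    print(name, "-->", ok)
```

`2data` fails rule 1 (digit at the start), `my-data` fails rule 2 (a hyphen is not allowed) and `class` fails rule 3 (it is a keyword).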

3) Comments in python

Why do we need comments ?


# --> Single-line comment

DOC STRING -->

class ETL_Pipelines:

    def extract(self):
        """
        Extracting the data from Hive tables customer and creating the
        dataframe out of it
        """
        pass  # extraction logic goes here
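
A short sketch of how a `#` comment differs from a doc string. The comment is thrown away by the interpreter, while the doc string stays attached to the function and can be read back through `__doc__` or `help()`. The function body is only a placeholder; no real Hive extraction happens here.

```python
def extract():
    """Extract the data from the Hive table customer and create a dataframe from it."""
    # A '#' comment is only for people reading the code; Python ignores it.
    return None  # placeholder body for the illustration


doc_text = extract.__doc__    # the doc string is available at run time
print(doc_text)
```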

4) Indentation -->

{} --> Block of code in Scala / Java


Indentation --> Block of code in Python --> 4 spaces or 1 tab

pyspark code --> spark-submit --> testing --> IndentationError: unexpected indent
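
The error can be reproduced without Spark. The sketch below passes two small source strings (both made up for this illustration) to the built-in `compile()`: one correctly indented, one with a stray indent.

```python
# Correct: the body of the 'if' block is indented by 4 spaces.
good_source = "if True:\n    result = 'inside the block'\n"

# Wrong: the second line is indented although no block was opened.
bad_source = "result = 1\n    other = 2\n"

namespace = {}
exec(compile(good_source, "<good>", "exec"), namespace)

try:
    compile(bad_source, "<bad>", "exec")
    error_message = None
except IndentationError as exc:
    error_message = exc.msg

print(namespace["result"])
print(error_message)
```

Python rejects the badly indented code before running any of it, which is why a job sent through spark-submit fails straight away with this error.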

CONTROL + ENTER
Data types -->

1) Python Numbers --> Integer data type , float data type and Complex Data type
2) Boolean
3) String data

isinstance()
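
A minimal sketch of `isinstance()` against the data types listed above. The variable names and values are only illustrative.

```python
count = 10            # integer
price = 99.5          # float
signal = 2 + 3j       # complex
is_active = True      # boolean
name = "customer"     # string

checks = {
    "count is int": isinstance(count, int),
    "price is float": isinstance(price, float),
    "signal is complex": isinstance(signal, complex),
    "is_active is bool": isinstance(is_active, bool),
    "name is str": isinstance(name, str),
    "count is str": isinstance(count, str),
    # bool is a subclass of int, so this check is also True
    "is_active is int": isinstance(is_active, int),
}

for label, outcome in checks.items():
    print(label, "-->", outcome)
```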
