Introduction to Python for Data Analysis
Why Python for Data Analysis?
• Fast processing
• Very simple and easy to learn
• Rich ecosystem of powerful libraries
• Scalability and Flexibility
• Python is an ideal tool for data scientists in fields such as
Machine Learning, Deep Learning, NLP, etc.
What is Anaconda?
• Anaconda is a free and open-source distribution of Python, focused
on providing the tools necessary for data science.
Anaconda Interface
Jupyter Notebook
• Jupyter grew out of the IPython project. The name Jupyter refers to
three languages it supports, Julia, Python, and R, all of which are
popular for data science.
• Jupyter Notebook is an open-source, web-based application that
brings live code and visualizations together in one place.
• Supports more than 40 programming languages.
Exploring The Interface
Toolbar buttons shown in the figure:
• Save
• New block
• Cut, copy, paste
• Move block
• Stop execution
• Reset block and clear output
• Block type
Working Modes
• THE COMMAND MODE (The Blue Cell): In command mode you act on the
notebook as a whole, e.g. you can add a cell or delete a cell.
• THE EDIT MODE (The Green Cell): In edit mode you are inside a cell
and can edit its contents. To leave edit mode, press the Esc key.
Printing in Python
We can print any value to the console in Python using the built-in function print()
Example 1
print("This is my first program")
Example 2
name = "ben"
print("My name is", name)
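A few more ways to call print, as a minimal sketch. The variable age and its value are illustrative assumptions, not part of the original examples.

```python
name = "ben"
age = 30  # hypothetical value, for illustration only

# By default, print separates its arguments with a single space.
print("My name is", name)

# The sep argument changes the separator placed between arguments.
print("My name is", name, sep=": ")

# An f-string builds the whole message as one string before printing.
message = f"My name is {name} and I am {age} years old"
print(message)
```

The f-string form is often easier to read once a message mixes several values with text.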
Commenting in Jupyter Notebook
• Useful when your code needs further explanation, either for your
future self or for anybody else.
• Useful when you want to temporarily stop some code from running
without deleting it.
• Comments in Python start with #
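Both uses of comments can be sketched in a short cell. The names price, discount, and total are hypothetical and chosen only for illustration.

```python
# A full-line comment explains the code that follows.
price = 100
discount = 20  # an inline comment can follow a statement

# The next line is commented out, so it does not run but is not deleted:
# discount = 50

total = price - discount
print(total)
```

Removing the leading # from the disabled line would bring it back into execution.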
Variables
• Variables are defined to store values such as characters (text),
integers (numbers), Boolean values (True or False), etc.
• A variable name must start with a letter or an underscore (not a
number).
• A variable name should contain only numbers, letters, or underscores.

VARIABLE   VALIDITY   REASON
var1       Valid      Starts with a letter
var1?      Invalid    Only _ is allowed as a special character
var1_      Valid      Starts with a letter and contains _
1var       Invalid    Starts with a number
_var       Valid      Can start with _
_1var      Valid      Can start with _
var_1      Valid      _ and numbers are allowed
.var       Invalid    Can only start with _ or A-Z or a-z
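The naming rules can be checked in code. This is a minimal sketch that uses the built-in string method isidentifier() on names like those in the table; the list and dictionary names are assumptions made for the example.

```python
candidates = ["var1", "var1?", "var1_", "1var", "_var", "_1var", "var_1", ".var"]

# Map each candidate name to True (valid) or False (invalid).
results = {}
for candidate in candidates:
    results[candidate] = candidate.isidentifier()
    label = "Valid" if results[candidate] else "Invalid"
    print(f"{candidate:<6} {label}")
```

Note that isidentifier() only checks the characters in the name. Reserved words such as for also pass this check even though they cannot be used as variable names; keyword.iskeyword() from the standard library can be used to test for those.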
Data Types
Data Type    Example
Integer      5, 10, 22, 36
Float        5.25, 3.26, 5.32
Complex      3 + 4j, 2j
Boolean      True, False
String       'Hello', "Hi", '23.56'
List         [1, 2, 3, "Hello", True]
Tuple        (1, 2, 3, 'Hello', True)
Set          {1, 2, 3, 'Hello', True}
Dictionary   {1: 'Sam', 2: 'Sarah'}
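The built-in type() function reports the data type of any value. The sketch below runs it over one sample per row of the table; the names samples and type_names are assumptions made for the example.

```python
samples = [
    5,                          # Integer
    5.25,                       # Float
    3 + 4j,                     # Complex
    True,                       # Boolean
    "Hello",                    # String
    [1, 2, 3, "Hello", True],   # List
    (1, 2, 3, "Hello", True),   # Tuple
    {1, 2, 3, "Hello"},         # Set
    {1: "Sam", 2: "Sarah"},     # Dictionary
]

# type(value).__name__ gives the short name Python uses for each type.
type_names = [type(value).__name__ for value in samples]
for value, type_name in zip(samples, type_names):
    print(type_name, value)

# A set keeps only unique values, and True compares equal to 1,
# so writing both 1 and True in a set literal stores a single element.
set_size = len({1, 2, 3, "Hello", True})

# A string that looks like a number is still text until it is converted.
number = float("23.56")
```

The last two lines illustrate two easy-to-miss details: sets drop duplicates, and '23.56' in quotes is a string rather than a float.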
Some common libraries
• Pandas: The cornerstone of our data analysis with Python
• Matplotlib: The foundational library for visualization
• NumPy: The core library for numerical computation in Python
• Seaborn: A statistical visualization tool built on top of Matplotlib
• Statsmodels: A library with many advanced statistical functions
• SciPy: Advanced scientific computing
• Scikit-learn: The most popular machine learning library
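Before using these libraries it can help to check which ones are installed in the current environment. The following is a minimal sketch using only the standard library; the dictionary names are assumptions made for the example, and the result depends on your own installation.

```python
from importlib.util import find_spec

# Import names for the libraries above; scikit-learn is imported as "sklearn".
import_names = {
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "NumPy": "numpy",
    "Seaborn": "seaborn",
    "Statsmodels": "statsmodels",
    "SciPy": "scipy",
    "Scikit-learn": "sklearn",
}

# find_spec returns None when a top-level module cannot be found.
installed = {}
for library, module_name in import_names.items():
    installed[library] = find_spec(module_name) is not None
    status = "installed" if installed[library] else "not installed"
    print(f"{library:<13} {status}")
```

Once a library is available, it is commonly imported under a short alias, for example import pandas as pd, import numpy as np, and import matplotlib.pyplot as plt.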