NumPy, Pandas, and Matplotlib - Detailed Guide
Submitted by: Adreeja Mahato
This document is a comprehensive guide to three essential Python libraries for data analysis
and visualization: NumPy, Pandas, and Matplotlib. It consolidates examples, explanations,
and corrected code snippets from the original content provided, along with expanded
details for better understanding. Each section is organized into subsections with examples
and expected outputs.
1. NumPy
NumPy (Numerical Python) is a powerful library for numerical computations and array
manipulations.
1.1 Array Creation
Single Dimensional Array:
import numpy as np
n1 = np.array([10,20,30,40])
# Output: array([10, 20, 30, 40])
Multi Dimensional Array:
n2 = np.array([[10,20,30,40], [50,60,70,80], [90,80,70,60]])
# Output:
# [[10 20 30 40]
#  [50 60 70 80]
#  [90 80 70 60]]
1.2 Special Arrays
Zeros Array:
n1 = np.zeros((5,5))
# Output: 5x5 matrix of zeros
Full Array:
n1 = np.full((6,6), 9)
# Output: 6x6 matrix filled with 9
Arange:
n1 = np.arange(0, 101, 10)
# Output: [0,10,20,30,40,50,60,70,80,90,100]
Random Integers:
n1 = np.random.randint(1, 100, 5)
# Output: five random integers from 1 to 99 (the upper bound is excluded), e.g., [52, 5, 74, 37, 66]
1.3 Shape, Indexing, and Slicing
Example:
n1 = np.array([[1,2,3],[4,5,6]])
n1.shape
# Output: (2, 3)
Indexing:
n1[0] -> first row
n1[:,1] -> second column
n1.transpose() -> transpose of matrix
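As a minimal sketch of the shape, indexing, and transpose operations above, the following uses the same 2x3 array. The names row, col, and t are illustrative only.

```python
import numpy as np

n1 = np.array([[1, 2, 3], [4, 5, 6]])

row = n1[0]          # first row      -> [1 2 3]
col = n1[:, 1]       # second column  -> [2 5]
t = n1.transpose()   # rows and columns swapped (same result as n1.T)

print(n1.shape)      # (2, 3)
print(row, col)
print(t.shape)       # (3, 2)
```

Note that the transpose of a (2, 3) array has shape (3, 2).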
1.4 Set Operations
Intersection:
np.intersect1d(n1, n2)
Difference:
np.setdiff1d(n1, n2)
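The two set operations can be sketched as follows. The arrays a and b are made-up sample data, not taken from the sections above.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
b = np.array([40, 50, 60, 70])

common = np.intersect1d(a, b)   # values present in both arrays -> [40 50]
only_a = np.setdiff1d(a, b)     # values in a but not in b      -> [10 20 30]
only_b = np.setdiff1d(b, a)     # argument order matters        -> [60 70]

print(common, only_a, only_b)
```

Both functions return sorted, unique values.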
1.5 Arithmetic & Statistics
Element-wise operations:
n1+1, n1-1, n1*2, n1/2
Statistics:
NumPy's statistics functions take the array as an argument, for example np.mean(n1), np.median(n1), np.std(n1), np.min(n1), np.max(n1)
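A minimal sketch of element-wise arithmetic and summary statistics on one array. The particular statistics functions used here (np.mean, np.median, np.std, np.min, np.max) are common choices picked for illustration.

```python
import numpy as np

n1 = np.array([10, 20, 30, 40])

# Element-wise arithmetic returns a new array; n1 itself is unchanged.
plus_one = n1 + 1    # [11 21 31 41]
doubled = n1 * 2     # [20 40 60 80]
halved = n1 / 2      # true division produces floats: [ 5. 10. 15. 20.]

# Summary statistics computed over all elements.
mean = np.mean(n1)        # 25.0
median = np.median(n1)    # 25.0
std = np.std(n1)          # population standard deviation (ddof=0)
low, high = np.min(n1), np.max(n1)   # 10 and 40

print(plus_one, doubled, halved)
print(mean, median, std, low, high)
```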
1.6 Linear Algebra
Matrix Multiplication:
n1.dot(n2) or np.dot(n1, n2)
(the number of columns in n1 must equal the number of rows in n2)
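A short sketch of matrix multiplication with compatible shapes. The matrices a and b are made-up sample data; the @ operator is shown as an equivalent form for 2-D arrays.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)
b = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # shape (3, 2)

p1 = a.dot(b)       # method form
p2 = np.dot(a, b)   # function form
p3 = a @ b          # operator form

print(p1)           # [[22 28]
                    #  [49 64]]
```

A (2, 3) matrix times a (3, 2) matrix gives a (2, 2) result; mismatched inner dimensions raise a ValueError.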
2. Pandas
Pandas is a library for data manipulation and analysis. It provides Series (1D) and
DataFrame (2D) structures.
2.1 Series
Creating a Series:
import pandas as pd
s1 = pd.Series([1,2,3,4,5])
Custom Index:
s1 = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
From Dictionary:
s1 = pd.Series({'a':1, 'b':2, 'c':3})
Extracting elements:
s1[2], s1[2:5], s1[-3:]
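When a Series has a custom (non-integer) index, plain integer keys such as s1[2] become ambiguous, and recent pandas versions warn about treating them as positions. The sketch below shows one way to keep label access and position access explicit, using .iloc for positions. The variable names are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_label = s1['c']           # lookup by index label      -> 3
by_position = s1.iloc[2]     # lookup by integer position -> 3
middle = s1.iloc[2:5]        # positions 2, 3, 4          -> values 3, 4, 5
last_three = s1.iloc[-3:]    # last three elements        -> values 3, 4, 5

print(by_label, by_position)
print(middle)
```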
2.2 DataFrame
Creating from Dictionary:
data = {'Name': ['Shubham','Yash','Yugansh'], 'Marks':[97,92,23]}
df = pd.DataFrame(data)
Reading CSV:
df = pd.read_csv('[Link]')
df.head(), df.tail()
2.3 DataFrame Operations
Selection: df['Name'], df[['Name','Marks']]
Filtering: df[df['Marks'] > 50]
GroupBy: df.groupby('Name').mean()
Handling Missing Data: df.dropna(), df.fillna(0)
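The operations above can be sketched on a small DataFrame. The fourth row, with a missing mark, is an assumption added only so that the missing-data calls have something to act on.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Shubham', 'Yash', 'Yugansh', 'Yash'],
    'Marks': [97, 92, 23, np.nan],   # last value is missing (illustrative)
})

names = df['Name']                     # single column -> Series
passed = df[df['Marks'] > 50]          # boolean filter on rows
mean_marks = df.groupby('Name')['Marks'].mean()   # mean() skips NaN
dropped = df.dropna()                  # drop rows containing NaN
filled = df.fillna(0)                  # replace NaN with 0

print(passed)
print(mean_marks)
```

dropna() and fillna() return new DataFrames; df itself is left unchanged unless the result is assigned back.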
3. Matplotlib
Matplotlib is a library for creating visualizations. It can generate plots, histograms, scatter
plots, etc.
3.1 Line Plot
import matplotlib.pyplot as plt
x = [1,2,3,4,5]
y = [10,20,25,30,40]
plt.plot(x, y, marker='o')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()
3.2 Bar Chart
x = ['A','B','C','D']
y = [3,7,1,5]
plt.bar(x, y)
plt.title('Bar Chart')
plt.show()
3.3 Histogram
data = [1,2,2,3,3,3,4,4,4,4]
plt.hist(data, bins=4)
plt.title('Histogram')
plt.show()
3.4 Scatter Plot
x = [5,7,8,7,2,17,2,9,4,11]
y = [99,86,87,88,100,86,103,87,94,78]
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.show()
ARTIFICIAL INTELLIGENCE
Artificial Intelligence (AI) is a branch of computer science that focuses on creating
systems that can perform tasks that normally require human intelligence. These
tasks include learning, reasoning, problem-solving, understanding natural language,
recognizing patterns, decision-making, and adapting to new situations.
John McCarthy coined the term Artificial Intelligence in his 1955 proposal for the Dartmouth workshop, which was held in 1956.
Types of AI-
Narrow AI: Designed for a specific task (e.g., Google Maps, Siri,
chatbots).
General AI: Hypothetical AI that can perform any intellectual task
like a human.
Superintelligent AI: Hypothetical AI that would surpass human intelligence.
Advantages-
a. Automates repetitive tasks.
b. Processes large amounts of data quickly.
c. Improves accuracy in predictions and decisions.
Challenges-
a. Job displacement.
b. Bias in AI systems.
c. Data privacy issues.
d. Ethical concerns.
MACHINE LEARNING
Arthur Samuel first coined the term Machine Learning in the year 1959.
Machine Learning is a subset of Artificial Intelligence (AI) that allows computers to
learn patterns from data and improve their performance automatically without
being explicitly programmed.
Types of Machine Learning—
1. Supervised Learning: The model learns from labeled data (input +
correct output).
Example: Predicting house prices, spam detection, stock price prediction.
2. Unsupervised Learning: The model learns from unlabeled data, finding
hidden patterns.
Example: Customer segmentation, grouping similar products, market basket analysis.
3. Reinforcement Learning: The model learns by trial and error using
rewards and punishments.
Example: Self-driving cars, game-playing AIs (like AlphaGo).
Key Steps in Machine Learning:-
1. Data Collection → Gather raw data.
2. Data Preprocessing → Clean and prepare the data.
3. Feature Selection/Engineering → Pick the right variables (features).
4. Model Training → Use algorithms (like Decision Trees, Neural
Networks, etc.) to learn patterns.
5. Testing & Evaluation → Check model accuracy using test data.
6. Deployment → Use the trained model in real-world applications.
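The six steps above can be sketched end to end with NumPy alone. Everything specific here is an assumption made for illustration: the data is synthetic (y = 3x + 5 plus noise), the model is ordinary least squares, the split is 160 training and 40 test samples, and predict is a hypothetical helper standing in for deployment.

```python
import numpy as np

# 1. Data collection: synthetic samples (an assumption, not real data).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(0, 0.5, size=200)

# 2. Preprocessing: keep only finite values (all are finite here).
mask = np.isfinite(x) & np.isfinite(y)
x, y = x[mask], y[mask]

# 3. Feature engineering: add a constant column so an intercept is learned.
X = np.column_stack([x, np.ones_like(x)])

# 4. Model training: least-squares fit on the first 160 samples.
X_train, y_train = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# 5. Testing and evaluation: mean squared error on the held-out samples.
mse = np.mean((X_test @ coef - y_test) ** 2)

# 6. Deployment: a function that serves predictions for new inputs.
def predict(new_x):
    return coef[0] * np.asarray(new_x, dtype=float) + coef[1]

print("slope, intercept:", coef)
print("test MSE:", mse)
print("prediction for x=2:", predict(2.0))
```

The fitted slope and intercept should land close to 3 and 5, and the test error should be close to the noise variance, which is what a held-out evaluation is meant to confirm.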
Common ML Algorithms:-
Linear Regression → Predict continuous values (e.g., salary
prediction).
Logistic Regression → Classification (e.g., spam or not spam).
Decision Trees & Random Forests → For both classification &
regression.
Support Vector Machines (SVM) → Classification with margin
separation.
K-Means Clustering → Unsupervised grouping.
Neural Networks & Deep Learning → Advanced tasks like
image and speech recognition.
Advantages:-
o Automates decision-making.
o Learns and improves from experience.
o Can analyze massive amounts of data.
Challenges:-
o Requires lots of data.
o Risk of bias in data → unfair results.
o Lack of transparency in complex models (“black box problem”).
NUMPY
NumPy is a Python library.
NumPy is used for working with arrays.
NumPy is short for "Numerical Python".
Creating a NumPy array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
o/p- [1 2 3 4 5]
<class 'numpy.ndarray'>
0-D Arrays:
import numpy as np
arr = np.array(42)
print(arr)
o/p- 42
1-D Arrays:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
o/p- [1 2 3 4 5]
2-D Arrays:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
o/p- [[1 2 3]
 [4 5 6]]
3-D Arrays:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
o/p- [[[1 2 3]
  [4 5 6]]
 [[1 2 3]
  [4 5 6]]]
Accessing Array Elements:
1. Getting the first element-
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
o/p- 1
2. Adding the third and fourth elements-
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
o/p- 7
Array Slicing:
1. Slicing in Python means taking elements from one given index to another
given index.
2. We pass a slice instead of an index, like this: [start:end]. The element at end is not included.
3. We can also define the step, like this: [start:end:step].
4. If we don't pass start, it is taken to be 0.
5. If we don't pass end, it is taken to be the length of the array in that dimension.
6. If we don't pass step, it is taken to be 1.
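The slicing rules above, including the defaults and the step, can be sketched on one array. The variable names a to d are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

a = arr[1:5]     # start=1, end=5 (excluded)     -> [2 3 4 5]
b = arr[:4]      # start defaults to 0           -> [1 2 3 4]
c = arr[4:]      # end defaults to the length    -> [5 6 7]
d = arr[1:6:2]   # step=2 takes indices 1, 3, 5  -> [2 4 6]

print(a, b, c, d)
```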
Slice elements from index 1 to index 5 (not included) from the following array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
o/p- [2 3 4 5]
Slice elements from the beginning to index 4 (not included):
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[:4])
o/p- [1 2 3 4]
Negative Slicing:
1. Use the minus operator to refer to an index counted from the end.
2. Slice from the third-to-last element up to, but not including, the last element:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[-3:-1])
o/p- [5 6]
NumPy Array Shape:
1. The shape of an array is the number of elements in each dimension.
2. Print the shape of a 2-D array:
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
o/p- (2, 4)
Joining NumPy Arrays:
1. Join two arrays:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
o/p- [1 2 3 4 5 6]
Sorting Arrays:
Sorting means putting elements in an ordered sequence. np.sort returns a sorted copy and leaves the original array unchanged.
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
o/p- [0 1 2 3]
import numpy as np
arr = np.array(['banana', 'cherry', 'apple'])
print(np.sort(arr))
o/p- ['apple' 'banana' 'cherry']
Generating a Random Number (an integer from 0 to 99; the output varies on each run):
from numpy import random
x = random.randint(100)
print(x)
o/p- 67
Generate Random Float (a value from 0 up to, but not including, 1):
from numpy import random
x = random.rand()
print(x)
o/p- 0.8578710965891362
Generate Random Array:
from numpy import random
x = random.randint(100, size=(5))
print(x)
o/p- [55 1 96 12 63]
Initializing a NumPy array with zeros:
n1 = np.zeros((3,3))
n1
o/p- array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
Initializing a NumPy array with the same number:
n1 = np.full((3,3), 1)
n1
o/p- array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
Vstack (stacks the arrays as rows; the arrays are passed as one tuple):
n1 = np.array([1,2,3])
n2 = np.array([1,2,3])
np.vstack((n1,n2))
o/p- array([[1, 2, 3],
[1, 2, 3]])
Hstack (joins the arrays end to end):
n1 = np.array([1,2,3])
n2 = np.array([1,2,3])
np.hstack((n1,n2))
o/p- array([1, 2, 3, 1, 2, 3])
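NumPy's stacking functions take the arrays to combine as a single tuple (or list) argument. The sketch below compares three of them on the same pair of 1-D arrays. np.hstack and np.column_stack are included here for comparison as assumptions about what is useful to show alongside np.vstack, and the values in n2 are changed so the result layout is easier to read.

```python
import numpy as np

n1 = np.array([1, 2, 3])
n2 = np.array([4, 5, 6])

v = np.vstack((n1, n2))         # one array per row      -> shape (2, 3)
h = np.hstack((n1, n2))         # joined end to end      -> shape (6,)
c = np.column_stack((n1, n2))   # one array per column   -> shape (3, 2)

print(v)
print(h)
print(c)
```

Calling np.vstack(n1, n2) with two separate arguments raises a TypeError, which is why the extra pair of parentheses is needed.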
PANDAS
Pandas is a Python library.
Pandas is used to analyze data.
Loading A CSV file into Pandas Dataframe:
import pandas as pd
df = pd.read_csv('[Link]')
print(df.to_string())
o/p- Duration Pulse Maxpulse Calories
0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
5 60 102 127 300.5
6 60 110 136 374.0
7 45 104 134 253.3
8 30 109 133 195.1
9 60 98 124 269.0
10 60 103 147 329.3
Series: A Pandas Series is like a column in a table.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
o/p- 0 1
1 7
2 2
dtype: int64
DataFrames:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
myvar = pd.DataFrame(data)
print(myvar)
o/p- calories duration
0 420 50
1 380 40
2 390 45
Labels:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
o/p- x 1
y 7
z 2
dtype: int64
Viewing the data:
import pandas as pd
df = pd.read_csv('[Link]')
print(df.head(5))
o/p-
Duration Pulse Maxpulse Calories
0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
print(df.tail())
o/p-
Duration Pulse Maxpulse Calories
164 60 105 140 290.8
165 60 110 145 300.4
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4
print(df.info())
o/p-
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Duration 169 non-null int64
1 Pulse 169 non-null int64
2 Maxpulse 169 non-null int64
3 Calories 164 non-null float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
None
MATPLOTLIB
Matplotlib is a low-level graph plotting library in Python that serves as a
visualization utility.
Matplotlib was created by John D. Hunter.
Matplotlib is open source and we can use it freely.
Matplotlib is mostly written in Python; a few segments are written in C,
Objective-C and JavaScript for platform compatibility.
Matplotlib Pyplot:
1. Draw a line in a diagram from position (0, 0) to position (6, 250):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()
Matplotlib Plotting:
1. Draw a line in a diagram from position (1, 3) to position
(8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints)
plt.show()
Matplotlib Markers:
1. Mark each point with a circle:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o')
plt.show()
Format Strings:
1. Use a format string of the form marker, line, color ('o:r' = circle markers, dotted line, red):
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, 'o:r')
plt.show()
Line Reference:
Line Syntax Description
'-' Solid line
':' Dotted line
'--' Dashed line
'-.' Dashed/dotted line
Color Reference:
Color Syntax Description
'r' Red
'g' Green
'b' Blue
'c' Cyan
'm' Magenta
'y' Yellow
'k' Black
'w' White
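The line and color codes in the two reference tables combine with a marker code into a single format string. As a minimal sketch, the following plots with 'o--g' (circle markers, dashed line, green) and reads the resulting style back from the line object. The Agg backend is selected only so the sketch runs without a display, which is an assumption about the environment; in an interactive session plt.show() would be used instead.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# Format string: marker 'o', line style '--', color 'g'.
(line,) = plt.plot(ypoints, 'o--g')

marker = line.get_marker()        # marker code taken from the format string
linestyle = line.get_linestyle()  # line style code taken from the format string
print(marker, linestyle, line.get_color())

plt.close()
```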
Marker Size:
Set the size of the markers to 20 (ms is short for markersize):
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o', ms = 20)
plt.show()
Marker Color (mec sets the marker edge color):
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o', ms = 20, mec = 'r')
plt.show()
Linestyle:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, linestyle = 'dotted')
plt.show()
Line color:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, color = 'r')
plt.show()
Multiple Lines:
import matplotlib.pyplot as plt
import numpy as np
y1 = np.array([3, 8, 1, 10])
y2 = np.array([6, 2, 7, 11])
plt.plot(y1)
plt.plot(y2)
plt.show()
Create Labels for a Plot:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()
Create a Title for a Plot:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()