
Detailed NumPy Pandas Matplotlib Intro

This document is a detailed guide on three essential Python libraries for data analysis and visualization: NumPy, Pandas, and Matplotlib, providing examples and explanations for each. It also covers concepts in Artificial Intelligence and Machine Learning, detailing their definitions, types, advantages, and challenges. The guide includes code snippets for practical applications of these libraries and concepts.

Uploaded by

Adreeja Mahato

NumPy, Pandas, and Matplotlib - Detailed Guide
Submitted by: Adreeja Mahato

This document is a comprehensive guide to three essential Python libraries for data analysis
and visualization: NumPy, Pandas, and Matplotlib. It consolidates examples, explanations,
and corrected code snippets from the original content provided, along with expanded
details for better understanding. Each section is organized into subsections with examples
and expected outputs.

1. NumPy
NumPy (Numerical Python) is a powerful library for numerical computations and array
manipulations.

1.1 Array Creation


Single Dimensional Array:

import numpy as np
n1 = np.array([10, 20, 30, 40])
# Output: array([10, 20, 30, 40])

Multi Dimensional Array:

n2 = np.array([[10, 20, 30, 40], [50, 60, 70, 80], [90, 80, 70, 60]])
# Output:
# [[10 20 30 40]
#  [50 60 70 80]
#  [90 80 70 60]]

1.2 Special Arrays


Zeros Array:
n1 = np.zeros((5, 5))
# Output: 5x5 matrix of zeros

Full Array:
n1 = np.full((6, 6), 9)
# Output: 6x6 matrix filled with 9

Arange:
n1 = np.arange(0, 101, 10)
# Output: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

Random Integers:
n1 = np.random.randint(1, 100, 5)
# Output: five random integers from 1 to 99, e.g., [52, 5, 74, 37, 66]

1.3 Shape, Indexing, and Slicing


Example:
n1 = np.array([[1, 2, 3], [4, 5, 6]])
n1.shape
# Output: (2, 3)

Indexing:
n1[0] -> first row
n1[:, 1] -> second column
n1.transpose() -> transpose of matrix
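
The indexing expressions above can be tried on a small array. This is a minimal sketch; the variable names `first_row`, `second_col`, and `transposed` are illustrative and not part of the original guide.

```python
import numpy as np

n1 = np.array([[1, 2, 3], [4, 5, 6]])

first_row = n1[0]        # row at index 0
second_col = n1[:, 1]    # every row, column at index 1
transposed = n1.T        # shorthand for n1.transpose()

print(first_row)          # [1 2 3]
print(second_col)         # [2 5]
print(transposed.shape)   # (3, 2)
```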

1.4 Set Operations


Intersection:
np.intersect1d(n1, n2)

Difference:
np.setdiff1d(n1, n2)
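
A short sketch of both set operations on two one-dimensional arrays. The arrays `a` and `b` are made-up sample data; both functions return sorted, unique values.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
b = np.array([40, 50, 60, 70])

common = np.intersect1d(a, b)   # values present in both arrays
only_a = np.setdiff1d(a, b)     # values in a that are not in b

print(common)   # [40 50]
print(only_a)   # [10 20 30]
```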

1.5 Arithmetic & Statistics


Element-wise operations:
n1+1, n1-1, n1*2, n1/2

Statistics:
[Link](n1), [Link](n1), [Link](n1), [Link](n1), [Link](n1)
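
A minimal sketch of element-wise arithmetic and a few summary statistics. The choice of `np.mean`, `np.median`, `np.std`, `np.min`, and `np.max` is an assumption made for illustration; they are common NumPy statistics functions, not necessarily the ones the statistics line above refers to.

```python
import numpy as np

n1 = np.array([10, 20, 30, 40])

# Element-wise operations apply to every element of the array
plus_one = n1 + 1     # 11, 21, 31, 41
halved = n1 / 2       # 5.0, 10.0, 15.0, 20.0 (true division yields floats)

# Summary statistics (assumed selection, see the note above)
mean_value = np.mean(n1)        # 25.0
median_value = np.median(n1)    # 25.0
std_value = np.std(n1)          # population standard deviation
smallest, largest = np.min(n1), np.max(n1)   # 10 and 40

print(mean_value, median_value, smallest, largest)
```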

1.6 Linear Algebra


Matrix Multiplication:
n1.dot(n2) or np.dot(n1, n2)
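
Matrix multiplication requires the number of columns of the first matrix to equal the number of rows of the second. The sketch below uses two small made-up matrices `a` (2x3) and `b` (3x2); the `@` operator is shown as an equivalent spelling.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
b = np.array([[1, 0],
              [0, 1],
              [2, 2]])         # shape (3, 2)

product = np.dot(a, b)         # shape (2, 2)
same_product = a @ b           # the @ operator performs the same multiplication

print(product)
# [[ 7  8]
#  [16 17]]
```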

2. Pandas
Pandas is a library for data manipulation and analysis. It provides Series (1D) and
DataFrame (2D) structures.

2.1 Series
Creating a Series:
import pandas as pd
s1 = pd.Series([1, 2, 3, 4, 5])

Custom Index:
s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

From Dictionary:
s1 = pd.Series({'a': 1, 'b': 2, 'c': 3})

Extracting elements (by position):
s1.iloc[2], s1.iloc[2:5], s1.iloc[-3:]

2.2 DataFrame
Creating from Dictionary:
data = {'Name': ['Shubham','Yash','Yugansh'], 'Marks':[97,92,23]}
df = pd.DataFrame(data)

Reading CSV:
df = pd.read_csv('[Link]')
[Link](), [Link]()

2.3 DataFrame Operations


Selection: df['Name'], df[['Name','Marks']]

Filtering: df[df['Marks'] > 50]

GroupBy: df.groupby('Name').mean()

Handling Missing Data: [Link](), [Link](0)
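
The operations above can be sketched on a small DataFrame. The fourth row (a second "Yash" entry with a missing mark) is hypothetical sample data, and the use of `dropna()` and `fillna(0)` for missing values is an assumption about which methods are intended.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Shubham", "Yash", "Yugansh", "Yash"],
    "Marks": [97.0, 92.0, 23.0, np.nan],   # last value is deliberately missing
})

passed = df[df["Marks"] > 50]                      # filtering: rows with Marks above 50
mean_by_name = df.groupby("Name")["Marks"].mean()  # groupby: mean mark per name (NaN is skipped)
dropped = df.dropna()                              # missing data: drop rows containing NaN
filled = df.fillna(0)                              # missing data: replace NaN with 0

print(passed)
print(mean_by_name)
```

Note that the comparison `df["Marks"] > 50` is False for a missing value, so the row with NaN is excluded from `passed`.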

3. Matplotlib
Matplotlib is a library for creating visualizations. It can generate plots, histograms, scatter
plots, etc.

3.1 Line Plot


import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 40]
plt.plot(x, y, marker='o')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()

3.2 Bar Chart
x = ['A', 'B', 'C', 'D']
y = [3, 7, 1, 5]
plt.bar(x, y)
plt.title('Bar Chart')
plt.show()

3.3 Histogram
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
plt.hist(data, bins=4)
plt.title('Histogram')
plt.show()

3.4 Scatter Plot

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.show()

ARTIFICIAL INTELLIGENCE

• Artificial Intelligence (AI) is a branch of computer science that focuses on creating
systems that can perform tasks that normally require human intelligence. These
tasks include learning, reasoning, problem-solving, understanding natural language,
recognizing patterns, decision-making, and adapting to new situations.

• John McCarthy coined the term Artificial Intelligence in 1955, in the proposal for the
1956 Dartmouth workshop.

• Types of AI:

➢ Narrow AI: Designed for a specific task (e.g., Google Maps, Siri, chatbots).

➢ General AI: Hypothetical AI that can perform any intellectual task like a human.

➢ Superintelligent AI: Hypothetical AI that surpasses human intelligence.

• Advantages:

a. Automates repetitive tasks.

b. Processes large amounts of data quickly.

c. Can improve accuracy in predictions and decisions.

• Challenges:

a. Job displacement.

b. Bias in AI systems.

c. Data privacy issues.

d. Ethical concerns.

MACHINE LEARNING

• Arthur Samuel coined the term Machine Learning in 1959.

• Machine Learning is a subset of Artificial Intelligence (AI) that allows computers to
learn patterns from data and improve their performance automatically without
being explicitly programmed.

• Types of Machine Learning:

1. Supervised Learning: The model learns from labeled data (input + correct output).

Example: Predicting house prices, spam detection, stock price prediction.

2. Unsupervised Learning: The model learns from unlabeled data, finding hidden patterns.

Example: Customer segmentation, grouping similar products, market basket analysis.

3. Reinforcement Learning: The model learns by trial and error using
rewards and punishments.

Example: Self-driving cars, game-playing AIs (like AlphaGo).

• Key Steps in Machine Learning:

1. Data Collection → Gather raw data.
2. Data Preprocessing → Clean and prepare the data.
3. Feature Selection/Engineering → Pick the right variables (features).
4. Model Training → Use algorithms (like Decision Trees, Neural Networks, etc.) to learn patterns.
5. Testing & Evaluation → Check model accuracy using test data.
6. Deployment → Use the trained model in real-world applications.
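
The six steps above can be sketched end to end with NumPy alone. This is an illustrative sketch under stated assumptions, not a production workflow: the data is synthetic (generated from y = 3x + 5 plus noise), the model is a least-squares linear regression, and the names `predict`, `x_mean`, and `x_std` are hypothetical.

```python
import numpy as np

# 1. Data collection: synthetic data stands in for real measurements (assumption)
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(0, 1.0, size=200)

# Hold out the last 40 samples as test data
x_train, x_test = x[:160], x[160:]
y_train, y_test = y[:160], y[160:]

# 2. Data preprocessing: standardize the feature using training statistics only
x_mean, x_std = x_train.mean(), x_train.std()

# 3. Feature engineering: scaled feature plus a constant column for the intercept
def make_features(values):
    scaled = (np.asarray(values, dtype=float) - x_mean) / x_std
    return np.column_stack([scaled, np.ones_like(scaled)])

# 4. Model training: fit the coefficients by least squares
coef, *_ = np.linalg.lstsq(make_features(x_train), y_train, rcond=None)

# 5. Testing and evaluation: mean squared error on the held-out data
test_predictions = make_features(x_test) @ coef
mse = np.mean((test_predictions - y_test) ** 2)
print("test MSE:", mse)

# 6. Deployment: wrap the trained model in a function that accepts new inputs
def predict(new_x):
    return make_features(np.atleast_1d(new_x)) @ coef

print(predict(2.0))   # close to 3 * 2 + 5 = 11
```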

• Common ML Algorithms:

▪ Linear Regression → Predict continuous values (e.g., salary prediction).

▪ Logistic Regression → Classification (e.g., spam or not spam).

▪ Decision Trees & Random Forests → For both classification & regression.

▪ Support Vector Machines (SVM) → Classification with margin separation.

▪ K-Means Clustering → Unsupervised grouping.

▪ Neural Networks & Deep Learning → Advanced tasks like image and speech recognition.

• Advantages:

o Automates decision-making.

o Learns and improves from experience.

o Can analyze massive amounts of data.

• Challenges:

o Requires lots of data.

o Risk of bias in data → unfair results.

o Lack of transparency in complex models ("black box problem").

NUMPY

• NumPy is a Python library.

• NumPy is used for working with arrays.

• NumPy is short for "Numerical Python".

• Creating a NumPy array:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)
print(type(arr))

o/p- [1 2 3 4 5]
<class 'numpy.ndarray'>

• 0-D Arrays:

import numpy as np

arr = np.array(42)

print(arr)

o/p- 42

• 1-D Arrays:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

o/p- [1 2 3 4 5]

• 2-D Arrays:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)

o/p- [[1 2 3]
 [4 5 6]]

• 3-D Arrays:

import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr)

o/p- [[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]

• Accessing Array Elements:

1. Getting the first element:

import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[0])

o/p- 1

2. Adding the third and fourth elements:

import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[2] + arr[3])

o/p- 7

• Array Slicing:

1. Slicing in Python means taking elements from one given index to another
given index.

2. We pass a slice instead of an index, like this: [start:end]. The element at end is not included.

3. We can also define the step, like this: [start:end:step].

4. If we don't pass start, it is considered 0.

5. If we don't pass end, it is considered the length of the array in that dimension.

6. If we don't pass step, it is considered 1.

• Slice elements from index 1 up to (but not including) index 5 from the following array:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5])

o/p- [2 3 4 5]

• Slice elements from the beginning to index 4 (not included):

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[:4])

o/p- [1 2 3 4]

• Negative Slicing:

1. Use the minus operator to refer to an index from the end.

2. Slice from index 3 from the end up to (but not including) index 1 from the end:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[-3:-1])

o/p- [5 6]
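
The slicing rules mention a step, which none of the examples above use. A brief sketch on the same seven-element array; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

every_other_in_range = arr[1:5:2]   # indices 1 and 3
every_other = arr[::2]              # start and end omitted, step 2
reversed_arr = arr[::-1]            # a negative step walks the array backwards

print(every_other_in_range)   # [2 4]
print(every_other)            # [1 3 5 7]
print(reversed_arr)           # [7 6 5 4 3 2 1]
```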

• NumPy Array Shape:

1. The shape of an array is the number of elements in each dimension.

2. Print the shape of a 2-D array:

import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)

o/p- (2, 4)

• Joining NumPy Arrays:

1. Join two arrays:

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr)

o/p- [1 2 3 4 5 6]

• Sorting Arrays:

Sorting means putting elements in an ordered sequence. Numbers are sorted numerically and strings alphabetically.

import numpy as np

arr = np.array([3, 2, 0, 1])

print(np.sort(arr))

o/p- [0 1 2 3]

import numpy as np

arr = np.array(['banana', 'cherry', 'apple'])

print(np.sort(arr))

o/p- ['apple' 'banana' 'cherry']

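• Generating a Random Integer (from 0 to 99):

from numpy import random

x = random.randint(100)

print(x)

o/p- 67 (the value varies per run)

• Generating a Random Float (between 0 and 1):

from numpy import random

x = random.rand()

print(x)

o/p- 0.8578710965891362 (the value varies per run)

• Generating a Random Array:

from numpy import random

x = random.randint(100, size=(5))

print(x)

o/p- [55 1 96 12 63] (the values vary per run)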
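
Because random output changes on every run, a seed is useful when results must be repeatable. The sketch below uses NumPy's `default_rng` generator interface, which is an alternative to the `random.randint` and `random.rand` calls shown above rather than something the guide itself uses; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

single_int = rng.integers(100)         # one integer from 0 to 99
single_float = rng.random()            # one float in the range [0, 1)
int_array = rng.integers(100, size=5)  # five integers from 0 to 99

# A second generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(seed=42)
repeated_int = rng_again.integers(100)

print(single_int == repeated_int)   # True
```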

• Initializing a NumPy array with zeros:

n1 = np.zeros((3, 3))

n1

o/p- array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

• Initializing a NumPy array with the same number:

n1 = np.full((3, 3), 1)

n1

o/p- array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

• Vstack (stacks arrays vertically, as rows):

n1 = np.array([1, 2, 3])

n2 = np.array([1, 2, 3])

np.vstack((n1, n2))

o/p- array([[1, 2, 3],
       [1, 2, 3]])

n1 = np.array([1, 2, 3])
n2 = np.array([1, 2, 3])

[Link](n1,n2)

o/p- array[]
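
A short sketch contrasting vertical and horizontal stacking. `np.hstack` is included here for comparison as an assumption; the second array uses different values (hypothetical sample data) so the two results are easy to tell apart.

```python
import numpy as np

n1 = np.array([1, 2, 3])
n2 = np.array([4, 5, 6])

stacked_rows = np.vstack((n1, n2))   # each input becomes a row
stacked_flat = np.hstack((n1, n2))   # inputs are joined end to end

print(stacked_rows.shape)   # (2, 3)
print(stacked_flat)         # [1 2 3 4 5 6]
```

Both functions take a single tuple (or list) of arrays, so the double parentheses are required.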

PANDAS

• Pandas is a Python library.

• Pandas is used to analyze data.

• Loading a CSV file into a Pandas DataFrame:

import pandas as pd

df = pd.read_csv('[Link]')

print(df.to_string())

o/p-
    Duration  Pulse  Maxpulse  Calories
0         60    110       130     409.1
1         60    117       145     479.0
2         60    103       135     340.0
3         45    109       175     282.4
4         45    117       148     406.0
5         60    102       127     300.5
6         60    110       136     374.0
7         45    104       134     253.3
8         30    109       133     195.1
9         60     98       124     269.0
10        60    103       147     329.3

• Series: A Pandas Series is like a column in a table.

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

o/p-
0    1
1    7
2    2
dtype: int64

• DataFrames: A Pandas DataFrame is a 2-D structure, like a table with rows and columns.

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

myvar = pd.DataFrame(data)

print(myvar)

o/p-
   calories  duration
0       420        50
1       380        40
2       390        45

• Labels: Use the index argument to name each value.

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

o/p-
x    1
y    7
z    2
dtype: int64
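
Once a Series has labels, values can be looked up by label as well as by position. A minimal sketch using the same data; the variable name `subset` is illustrative.

```python
import pandas as pd

myvar = pd.Series([1, 7, 2], index=["x", "y", "z"])

by_label = myvar["y"]         # lookup by label
by_position = myvar.iloc[0]   # lookup by integer position
subset = myvar[["x", "z"]]    # a list of labels returns a smaller Series

print(by_label)      # 7
print(by_position)   # 1
print(subset.tolist())   # [1, 2]
```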

• Viewing the data:

import pandas as pd

df = pd.read_csv('[Link]')

print(df.head(5))

o/p-
   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0

print(df.tail())

o/p-
     Duration  Pulse  Maxpulse  Calories
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

print(df.info())

o/p-
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
None

MATPLOTLIB

• Matplotlib is a low-level graph plotting library in Python that serves as a
visualization utility.

• Matplotlib was created by John D. Hunter.

• Matplotlib is open source and we can use it freely.

• Matplotlib is mostly written in Python; a few segments are written in C,
Objective-C and JavaScript for platform compatibility.

• Matplotlib Pyplot:

1. Draw a line in a diagram from position (0, 0) to position (6, 250):

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()

• Matplotlib Plotting:

1. Draw a line in a diagram from position (1, 3) to position (8, 10):

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints)
plt.show()
• Matplotlib Markers:

1. Mark each point with a circle:

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o')
plt.show()

• Format Strings:

1. A format string combines marker, line style, and color. Here 'o:r' means circle markers, a dotted line, and red:

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, 'o:r')
plt.show()
• Line Reference:

Line Syntax    Description
'-'            Solid line
':'            Dotted line
'--'           Dashed line
'-.'           Dashed/dotted line

• Color Reference:

Color Syntax   Description
'r'            Red
'g'            Green
'b'            Blue
'c'            Cyan
'm'            Magenta
'y'            Yellow
'k'            Black
'w'            White
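
The line and color codes above combine with a marker code into a single format string. The sketch below plots with 'o--g' (circle markers, dashed line, green) and reads the settings back from the resulting line object. The `Agg` backend is selected only so the sketch runs without a display, which is an assumption about the environment; in an interactive session that line can be dropped and `plt.show()` called as in the other examples.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

fig, ax = plt.subplots()
# 'o' = circle marker, '--' = dashed line, 'g' = green
(line,) = ax.plot(ypoints, "o--g")

# The line object exposes the parsed settings
print(line.get_marker())      # o
print(line.get_linestyle())   # --
print(line.get_color())       # g

plt.close(fig)
```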

• Marker Size:

Set the size of the markers to 20 with ms (short for markersize):

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o', ms = 20)
plt.show()

• Marker Color:

Set the edge color of the markers to red with mec (short for markeredgecolor):

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o', ms = 20, mec = 'r')
plt.show()
• Linestyle:

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, linestyle = 'dotted')
plt.show()

• Line Color:

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, color = 'r')
plt.show()

• Multiple Lines:

import matplotlib.pyplot as plt
import numpy as np

y1 = np.array([3, 8, 1, 10])
y2 = np.array([6, 2, 7, 11])

plt.plot(y1)
plt.plot(y2)

plt.show()
• Create Labels for a Plot:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)

plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()

• Create a Title for a Plot:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)

plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()
