Unit IV Python Part1

The document outlines the syllabus for a course on data wrangling using Python libraries, focusing on NumPy and Pandas. It details the data wrangling process, which includes discovery, organization, cleaning, enrichment, validation, and publishing of data. Additionally, it covers the basics of NumPy arrays, including their attributes, indexing, slicing, reshaping, and concatenation.

Uploaded by

dineshk20005


UNIT IV

PYTHON LIBRARIES FOR DATA WRANGLING
Dr A Mithila
CS3352 FOUNDATIONS OF DATA SCIENCE
SYLLABUS

– Basics Of NumPy Arrays
– Aggregations
– Computations On Arrays
– Comparisons, Masks, Boolean Logic
– Fancy Indexing
– Structured Arrays
– Data Manipulation With Pandas
– Data Indexing And Selection
– Operating On Data
– Missing Data
– Hierarchical Indexing
– Combining Datasets
– Aggregation And Grouping
– Pivot Tables
Data Wrangling

• Data wrangling is the process of gathering, collecting, and transforming raw data into another format so that it can be understood, accessed, and analyzed in less time and used for decision-making.
• Data wrangling is also known as data munging.
Data Wrangling Process

1. Discovery: Before starting the wrangling process, think critically about what may lie beneath your data, what results you anticipate from it, and what you will use it for once the wrangling process is complete. Once you've determined your objectives, you can gather your data.
2. Organization: After you've gathered your raw data within a particular dataset, you must structure your data. Due to the variety and complexity of data types and sources, raw data is often overwhelming at first glance.
3. Cleaning: When your data is organized, you can begin cleaning it. Data cleaning involves removing outliers, formatting nulls, and eliminating duplicate data. Cleaning data collected by web scraping can be more tedious than cleaning data collected from a database, because web data can be highly unstructured and require more time than structured data from a database.
4. Data enrichment: This step requires that you take a step back from your data to determine whether you have enough data to proceed. Finishing the wrangling process without enough data may compromise insights gathered from further analysis. For example, investors looking to analyze product review data will want a significant amount of data to portray the market and increase investment intelligence.
5. Validation: After determining that you have gathered enough data, you will need to apply validation rules to it. Validation rules, performed in repetitive sequences, confirm that your data is consistent throughout your dataset. They also help ensure quality as well as security. This step follows logic similar to that of data normalization, a data standardization process involving validation rules.
6. Publishing: The final step of the data munging process is data publishing, which involves preparing the data for future use. This may include providing notes and documentation of your wrangling process and creating access for other users and applications.
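
The cleaning and validation steps can be illustrated with a minimal sketch. It assumes pandas is installed; the column names, the sample values, and the rule that ratings lie between 1 and 5 are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical raw data: one duplicate row and one missing rating.
raw = pd.DataFrame({
    "product": ["A", "B", "B", "C"],
    "rating": [4.0, 5.0, 5.0, None],
})

# Cleaning: drop exact duplicate rows, then drop rows with a missing rating.
clean = raw.drop_duplicates().dropna(subset=["rating"])

# Validation: an assumed rule that every rating lies between 1 and 5.
is_valid = clean["rating"].between(1, 5).all()

print(clean)
print("valid:", is_valid)
```

Real datasets usually need rules specific to the domain; the single range check here only shows where such a rule would sit in the process.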
The Basics of NumPy Arrays

• Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas are built around the NumPy array.
• The examples here use NumPy array manipulation to access data and subarrays, and to split, reshape, and join arrays.
1. Attributes of arrays - Determining the size, shape, memory consumption, and data types of arrays
2. Indexing of arrays - Getting and setting the value of individual array elements
3. Slicing of arrays - Getting and setting smaller subarrays within a larger array
4. Reshaping of arrays - Changing the shape of a given array
5. Joining and splitting of arrays - Combining multiple arrays into one, and splitting one array into many
1. Attributes of arrays – shape, size, data type

• N-dimensional arrays, or ndarray
• A fixed-size array in memory whose elements share one data type, for example integers or floating-point values
• ndarray.ndim : number of dimensions
• ndarray.shape : a tuple giving the size of each dimension (it does not resize the array)
• ndarray.size : total number of elements in the array
• ndarray.dtype : describes the data type of the elements
• ndarray.itemsize : size in bytes of each element
import numpy as np
np.random.seed(0)  # seed for reproducibility
x1 = np.random.randint(10, size=6)          # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:", x3.dtype)

Output
x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
dtype: int64
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

Output:
itemsize: 8 bytes
nbytes: 480 bytes
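
As a short sketch of how these attributes relate, the example below uses a hypothetical array `a` with an explicit dtype, so the byte counts do not depend on the platform's default integer type. It illustrates that nbytes equals size multiplied by itemsize.

```python
import numpy as np

# Hypothetical array with an explicit dtype so the sizes are platform independent.
a = np.zeros((3, 4, 5), dtype=np.int64)

print(a.ndim)      # 3
print(a.shape)     # (3, 4, 5)
print(a.size)      # 60
print(a.itemsize)  # 8 bytes per int64 element
print(a.nbytes)    # 480, which is size * itemsize

# A smaller integer type stores the same number of elements in less memory.
b = a.astype(np.int16)
print(b.nbytes)    # 120
```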
2. Array Indexing: Accessing Single Elements

[ ] – Index the elements of the array

Code        Output
x1          array([5, 0, 3, 3, 7, 9])
x1[0]       5
x1[4]       7

Positive index:   [0]  [1]  [2]  [3]  [4]  [5]
Value:             5    0    3    3    7    9
Negative index:  [-6] [-5] [-4] [-3] [-2] [-1]

In a multidimensional array, access items with a comma-separated tuple of indices:

Code        Output
x2          array([[3, 5, 2, 4],
                   [7, 6, 8, 8],
                   [1, 6, 7, 7]])
x2[0, 0]    3
x2[2, 0]    1
x2[2, -1]   7
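
Indexing is also used to set values, not only to read them. The sketch below rebuilds the x1 and x2 values shown above as literal arrays (rather than relying on the random seed) and modifies individual elements. Note that assigning a floating-point value to an integer array silently truncates it.

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])
x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

x2[0, 0] = 12      # set the element in row 0, column 0
x1[-1] = 4         # negative indices count from the end
x1[0] = 3.14159    # truncated to 3 because x1 holds integers

print(x2[0])  # [12  5  2  4]
print(x1)     # [3 0 3 3 7 4]
```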
3. Array Slicing: Accessing Subarrays

• To access subarrays, use the slice notation, marked by the colon (:) character: x[start:stop:step]. The defaults are start=0, stop=size of dimension, step=1.
• One-dimensional subarrays:

Code                                                      Output
x = np.arange(10)
x                                                         array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[:5]     # first five elements                           array([0, 1, 2, 3, 4])
x[5:]     # elements from index 5 onward                  array([5, 6, 7, 8, 9])
x[4:7]    # middle subarray                               array([4, 5, 6])
x[::2]    # every other element                           array([0, 2, 4, 6, 8])
x[1::2]   # every other element, starting at index 1      array([1, 3, 5, 7, 9])
x[::-1]   # all elements, reversed                        array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
x[5::-2]  # reversed every other from index 5             array([5, 3, 1])
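
The same notation extends to several dimensions, with one slice per axis separated by commas. The sketch below uses a literal copy of the x2 values from the indexing examples. It also illustrates that a NumPy slice is a view of the original data, so copy() is needed when an independent subarray is wanted.

```python
import numpy as np

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

print(x2[:2, :3])   # first two rows, first three columns
print(x2[:, 0])     # first column: [3 7 1]
print(x2[0, :])     # first row:    [3 5 2 4]

# A slice is a view, so writing to it changes the original array.
sub = x2[:2, :2]
sub[0, 0] = 99
print(x2[0, 0])     # 99

# copy() gives an independent subarray.
sub_copy = x2[:2, :2].copy()
sub_copy[0, 0] = 42
print(x2[0, 0])     # still 99
```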
4. Reshaping of Arrays

• reshape() method

To put the numbers 1 through 9 in a 3×3 grid:

Code                                       Output
grid = np.arange(1, 10).reshape((3, 3))
print(grid)                                [[1 2 3]
                                            [4 5 6]
                                            [7 8 9]]

To convert a one-dimensional array into a two-dimensional row or column matrix:

Code                                       Output
x = np.array([1, 2, 3])
# row vector via reshape
x.reshape((1, 3))                          array([[1, 2, 3]])
# column vector via reshape
x.reshape((3, 1))                          array([[1],
                                                  [2],
                                                  [3]])
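
As a supplementary sketch (not taken from the slides), the same row and column vectors can be produced with np.newaxis inside a slice, and reshape accepts -1 for one dimension that NumPy should infer. The reshaped array must have the same number of elements as the original; otherwise reshape raises a ValueError.

```python
import numpy as np

x = np.array([1, 2, 3])

row = x[np.newaxis, :]   # shape (1, 3), same result as x.reshape((1, 3))
col = x[:, np.newaxis]   # shape (3, 1), same result as x.reshape((3, 1))
print(row.shape, col.shape)  # (1, 3) (3, 1)

# -1 asks NumPy to infer that dimension from the array's size.
grid = np.arange(1, 10).reshape((3, -1))
print(grid.shape)  # (3, 3)

# reshape fails when the element counts do not match (9 elements vs 2 * 5).
try:
    np.arange(1, 10).reshape((2, 5))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```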
5. Array Concatenation and Splitting (Joining and Splitting of Arrays)

• Concatenation of arrays
• Splitting of arrays
Concatenation of arrays

Code                                      Output
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])                    array([1, 2, 3, 3, 2, 1])

z = [99, 99, 99]
print(np.concatenate([x, y, z]))          [ 1  2  3  3  2  1 99 99 99]

grid = np.array([[1, 2, 3], [4, 5, 6]])
np.concatenate([grid, grid])              array([[1, 2, 3],
                                                 [4, 5, 6],
                                                 [1, 2, 3],
                                                 [4, 5, 6]])
np.concatenate([grid, grid], axis=1)      array([[1, 2, 3, 1, 2, 3],
                                                 [4, 5, 6, 4, 5, 6]])
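
When the arrays being joined have different numbers of dimensions, np.vstack (vertical stack) and np.hstack (horizontal stack) are often clearer than np.concatenate. The following is a supplementary sketch with hypothetical sample values:

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# Stack the 1-D array on top of the 2-D grid as an extra row.
stacked_rows = np.vstack([x, grid])
print(stacked_rows)
# [[1 2 3]
#  [9 8 7]
#  [6 5 4]]

# Append a column to the grid.
y = np.array([[99],
              [99]])
stacked_cols = np.hstack([grid, y])
print(stacked_cols)
# [[ 9  8  7 99]
#  [ 6  5  4 99]]
```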
Splitting of arrays

Code                                      Output
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)                         [1 2 3] [99 99] [3 2 1]

np.hsplit and np.vsplit

Code                                      Output
grid = np.arange(16).reshape((4, 4))
grid                                      array([[ 0,  1,  2,  3],
                                                 [ 4,  5,  6,  7],
                                                 [ 8,  9, 10, 11],
                                                 [12, 13, 14, 15]])
upper, lower = np.vsplit(grid, [2])
print(upper)                              [[0 1 2 3]
                                           [4 5 6 7]]
print(lower)                              [[ 8  9 10 11]
                                           [12 13 14 15]]
left, right = np.hsplit(grid, [2])
print(left)                               [[ 0  1]
                                           [ 4  5]
                                           [ 8  9]
                                           [12 13]]
print(right)                              [[ 2  3]
                                           [ 6  7]
                                           [10 11]
                                           [14 15]]
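
A further sketch of np.split, using a hypothetical array: an integer argument splits the array into that many equal parts, a list of N split points yields N + 1 subarrays, and concatenating the pieces restores the original array.

```python
import numpy as np

x = np.arange(9)

# An integer argument splits the array into that many equal-sized pieces.
a, b, c = np.split(x, 3)
print(a, b, c)  # [0 1 2] [3 4 5] [6 7 8]

# N split points produce N + 1 subarrays.
parts = np.split(x, [2, 6])
print(len(parts))  # 3

# Concatenating the pieces restores the original array.
restored = np.concatenate(parts)
print(np.array_equal(restored, x))  # True
```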
