Lecture 2 : Data wrangling
2025 Spring, Intro to Data Science
Congratulations!!!
You have collected or have been given
a box of data.
What does this "data" actually look like?
How will you work with it?
Data Scientists Love Tabular Data
• Tabular data = data in a table.
• Typically:
• A row represents one observation (here, a single person running for president in a particular year).
• A column represents some characteristic, or feature, of that observation (here, the political party of
that person).
Standard Python Data Science Tool: pandas
Using pandas, we can:
• Arrange data in a tabular format.
• Extract useful information filtered by specific conditions.
• Operate on data to gain new insights.
• Apply NumPy functions to our data.
Stands for "panel data"
DataFrames
• In the "language" of pandas, we call a table a DataFrame.
• We think of DataFrames as collections of named columns, called Series.
Series - Custom Index
• We can provide index labels for items in a Series by passing an index list.
• A Series index can also be changed.
Selection in Series
• We can select a single value or a set of values in a Series using:
• A single label
• A list of labels
• A filtering condition
Selection in Series
• Say we want to select values in the Series that satisfy a particular condition:
1. Apply a boolean condition to the Series. This creates a new Series of boolean values.
2. Index into our Series using this boolean condition. pandas will select only the entries in the
Series that satisfy the condition.
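The two steps above can be sketched with a toy Series (the values here are illustrative):

```python
import pandas as pd

s = pd.Series([4, -2, 0, 6], index=["a", "b", "c", "d"])

# Step 1: applying a boolean condition produces a new Series of booleans.
mask = s > 0  # a → True, b → False, c → False, d → True

# Step 2: indexing with that boolean Series keeps only the True entries.
print(s[mask])  # keeps 4 (label "a") and 6 (label "d")
```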
DataFrames of Series!
• Typically, we will work with Series using the perspective that they are columns in a DataFrame.
• We can think of a DataFrame as a collection of Series that all share the same Index.
Creating a DataFrame
• The syntax for creating a DataFrame is:
pandas.DataFrame(data, index, columns)
• Many approaches exist for creating a DataFrame. Here, we will go over the
most popular ones.
• From a CSV file.
• Using a list and column name(s).
• From a dictionary.
• From a Series.
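A quick sketch of all four approaches; the file path, column names, and values below are hypothetical:

```python
import pandas as pd

# 1. From a CSV file (hypothetical path):
# elections = pd.read_csv("elections.csv")

# 2. From a list plus column name(s):
df_list = pd.DataFrame([1, 2, 3], columns=["Number"])

# 3. From a dictionary mapping column names to column values:
df_dict = pd.DataFrame({"Fruit": ["Strawberry", "Orange"],
                        "Price": [5.49, 3.99]})

# 4. From a Series (a named Series becomes a one-column DataFrame):
s = pd.Series(["a", "b"], name="Letter")
df_series = s.to_frame()
```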
Indices Are Not Necessarily Row Numbers
An Index (a.k.a. row labels) can also:
• Be non-numeric.
• Have a name, e.g. "Candidate".
Indices Are Not Necessarily Unique
The row labels that constitute an index do not have to be unique.
• Left: The index values are all unique and numeric, acting as a row number.
• Right: The index values are named and non-unique.
Modifying Indices
• We can select a new column and set it as the index of the DataFrame.
• Example: Setting the index to the "Party" column.
Resetting the Index
• We can change our mind and reset the Index back to the default list of integers.
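A sketch of setting and resetting the index, on a small stand-in for the lecture's elections table (the rows here are invented):

```python
import pandas as pd

elections = pd.DataFrame({"Candidate": ["Carter", "Reagan"],
                          "Party": ["Democratic", "Republican"]})

# Set the "Party" column as the index...
by_party = elections.set_index("Party")
print(by_party.index.tolist())  # ['Democratic', 'Republican']

# ...then change our mind and reset to the default integer index.
restored = by_party.reset_index()
print(restored.index.tolist())  # [0, 1]
```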
Column Names Are Usually Unique!
• Column names in pandas are almost always unique.
• Example: You really shouldn’t have two columns named "Candidate".
Retrieving the Index, Columns, and shape
• Sometimes you'll want to extract the list of row and column labels.
• For row labels, use DataFrame.index:
• For column labels, use DataFrame.columns:
• For the shape of the DataFrame, use DataFrame.shape:
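All three attributes on a tiny illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Candidate": ["Carter", "Reagan"],
                   "Year": [1976, 1980]})

print(df.index)    # RangeIndex(start=0, stop=2, step=1)
print(df.columns)  # Index(['Candidate', 'Year'], dtype='object')
print(df.shape)    # (2, 2) → (number of rows, number of columns)
```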
The Relationship Between DataFrames, Series, and Indices
• We can think of a DataFrame as a collection of Series that all share the
same Index.
• Candidate, Party, %, Year, and Result Series all share an Index from 0 to
5.
Data Science Lifecycle
Recall the Data Science Lifecycle
Data wrangling / visualization
Intro to Data Wrangling
• Raw data is often messy, incomplete, and inconsistent
• Data wrangling is the process of cleaning, transforming, and organizing data into a
structured and usable format for analysis.
• The purpose is to ensure that data is accurate, consistent, and ready for modeling and
visualization.
Intro to Data Wrangling
• Hadley Wickham says that five verbs solve 90% of data-manipulation challenges:*
• filter: select rows (observations) in a data frame;
• select: select columns (variables) in a data frame;
• mutate: add new columns to a data frame;
• arrange: reorder rows in a data frame;
• summarise: collapses a data frame to a single row;
• We will cover essential tools for implementing these verbs using pandas.
* Grammar of Data Manipulation with R (2020)
1. Extracting Data (filter, select)
• One of the most basic tasks for manipulating a DataFrame is to extract rows and columns of
interest.
1. Extracting Data (filter, select)
• Common ways we may want to extract data:
• Grab the first or last k rows in the DataFrame.
• Grab data with a certain label.
• Grab data at a certain position.
• We'll find that all three of these methods are useful to us in data
manipulation tasks.
.head and .tail
• The simplest scenarios: We want to extract the first or last n rows from the
DataFrame.
• df.head(k) will return the first k rows of the DataFrame df.
• df.tail(k) will return the last k rows.
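A minimal illustration (the DataFrame here is a toy with 10 rows):

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

print(df.head(3))  # first 3 rows: x = 0, 1, 2
print(df.tail(2))  # last 2 rows:  x = 8, 9
```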
Label-based Extraction: .loc
• A more complex task: We want to extract data with specific column or index labels.
• The .loc accessor allows us to specify the labels of rows and columns we wish to extract.
• We describe "labels" as the bolded text at the top and left of a DataFrame.
Label-based Extraction: .loc
• Arguments to .loc can be:
• A list.
• A slice (syntax is inclusive of the right hand side of the slice).
• A single value.
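The three argument forms in one sketch, on a small stand-in for the elections DataFrame (rows invented):

```python
import pandas as pd

elections = pd.DataFrame(
    {"Candidate": ["Carter", "Reagan", "Mondale", "Bush"],
     "Year": [1976, 1980, 1984, 1988]})

# A list of row labels and a list of column labels:
print(elections.loc[[0, 2], ["Candidate"]])

# A slice — note .loc slices INCLUDE the right endpoint:
print(elections.loc[0:2, "Candidate":"Year"])  # rows 0, 1, AND 2

# A single row label and column label → a single value:
print(elections.loc[1, "Candidate"])  # 'Reagan'
```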
• A single row label returns a Series; a single row and column label returns a single value (here, a string).
Integer-based Extraction: .iloc
• Arguments to .iloc can be:
• A list.
• A slice (syntax is exclusive of the right hand side of the slice).
• A single value.
Integer-based Extraction: .iloc
• Arguments to .iloc can be:
• A list.
• A slice (syntax is exclusive of the right hand side of the slice).
• A single value.
elections.iloc[[1, 2, 3], [0, 1, 2]]
Integer-based Extraction: .iloc
• Arguments to .iloc can be:
• A list.
• A slice (syntax is exclusive of the right hand side of the slice).
• A single value.
elections.iloc[[1, 2, 3], 0:3]
Integer-based Extraction: .iloc
• Arguments to .iloc can be:
• A list.
• A slice (syntax is exclusive of the right hand side of the slice).
• A single value.
elections.iloc[:, 0:3]
Integer-based Extraction: .iloc
• Arguments to .iloc can be:
• A list.
• A slice (syntax is exclusive of the right hand side of the slice).
• A single value.
elections.iloc[[1, 2, 3], 1]
elections.iloc[0, 1]
.loc vs .iloc
• Remember:
• .loc performs label-based extraction
• .iloc performs integer-based extraction
• When choosing between .loc and .iloc, you'll usually choose .loc.
• Safer: If the order of data gets shuffled in a public database, your code still works.
• Readable: Easier to understand what elections.loc[:, ["Year", "Candidate", "Result"]]
means than elections.iloc[:, [0, 1, 4]]
• .iloc can still be useful.
• Example: If you have a DataFrame of movie earnings sorted by earnings, can use .iloc to get the
median earnings for a given year (index into the middle).
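The median-earnings idea can be sketched as follows; the movie data is hypothetical:

```python
import pandas as pd

# Hypothetical movie earnings, sorted by earnings.
movies = pd.DataFrame({"Title": ["A", "B", "C", "D", "E"],
                       "Earnings": [10, 40, 55, 80, 120]}).sort_values("Earnings")

# With an odd number of rows, the middle integer position holds the median.
middle = len(movies) // 2
print(movies.iloc[middle]["Earnings"])  # 55
```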
Context-dependent Extraction: [ ]
• Selection operators:
• .loc selects items by label. First argument is rows, second argument is columns.
• .iloc selects items by integer. First argument is rows, second argument is columns.
• [] only takes one argument, which may be:
• A slice of row numbers.
• A list of column labels.
• A single column label.
• That is, [] is context sensitive.
Context-dependent Extraction: [ ]
•[] only takes one argument, which may be:
• A slice of row integers.
• A list of column labels.
• A single column label.
elections[3:7]
Context-dependent Extraction: [ ]
•[] only takes one argument, which may be:
• A slice of row integers.
• A list of column labels.
• A single column label.
elections[["Year", "Candidate", "Result"]]
Context-dependent Extraction: [ ]
•[] only takes one argument, which may be:
• A slice of row integers.
• A list of column labels.
• A single column label.
elections["Candidate"]
Why Use []?
• In short: [ ] can be much more concise than .loc or .iloc
• Consider the case where we wish to extract the "Candidate" column. It is far simpler to write
elections["Candidate"] than it is to write elections.loc[:, "Candidate"]
• In practice, [ ] is often used over .iloc and .loc in data science work. Typing time adds up!
Boolean Array Input for .loc and [ ]
• We learned to extract data according to its integer position (.iloc) or its label (.loc)
• What if we want to extract rows that satisfy a given condition?
• .loc and [ ] also accept boolean arrays as input.
• Rows corresponding to True are extracted; rows corresponding to False are not.
babynames_first_10_rows = babynames.loc[:9, :]
Boolean Array Input for .loc and [ ]
We can perform the same operation using .loc.
babynames_first_10_rows.loc[[True, False, True, False, True, False, True, False, True, False], :]
Boolean Array Input
Useful because boolean arrays can be generated by using logical operators on Series.
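A sketch of generating the boolean array with a comparison, on toy babynames-style data:

```python
import pandas as pd

babynames = pd.DataFrame({"Name": ["Ann", "Bob", "Ann"],
                          "Count": [10, 5, 7]})

# Comparing a Series to a value produces a boolean Series...
mask = babynames["Name"] == "Ann"  # [True, False, True]

# ...which [ ] (or .loc) uses to keep only the True rows.
print(babynames[mask])  # rows 0 and 2
```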
Boolean Array Input
Can also use .loc.
Boolean Array Input
• Boolean Series can be combined using various operators, allowing filtering of results by
multiple criteria.
• The & operator keeps rows where condition_1 and condition_2 both hold.
• The | operator keeps rows where condition_1 or condition_2 holds.
babynames[(babynames["Sex"] == "F") | (babynames["Year"] < 2000)]
Bitwise Operators
• & and | are examples of bitwise operators. They allow us to apply multiple logical conditions.
• If p and q are boolean arrays or Series:
• Boolean array selection is a useful tool, but can lead to overly verbose code for complex conditions.
babynames[(babynames["Name"] == "Bella") | (babynames["Name"] == "Alex") |
(babynames["Name"] == "Narges") | (babynames["Name"] == "Lisa")]
• pandas provides many alternatives, for example:
• .isin
names = ["Bella", "Alex", "Narges", "Lisa"]
babynames[babynames["Name"].isin(names)]
• .str.startswith
babynames[babynames["Name"].str.startswith("N")]
2. Adding Data
• Remember the five verbs
• filter: select rows (observations) in a data frame;
• select: select columns (variables) in a data frame;
• mutate: add new columns to a data frame;
• arrange: reorder rows in a data frame;
• summarise: collapses a data frame to a single row;
Syntax for Adding a Column
•Adding a column is easy:
1. Use [ ] to reference the desired new column.
2. Assign this column to a Series or array of the appropriate length.
Syntax for Modifying a Column
• Modifying a column is very similar to adding a column.
1.Use [ ] to reference the existing column.
2.Assign this column to a new Series or array of the appropriate length.
# Modify the "name_lengths" column to be one less than its original value
babynames["name_lengths"] = babynames["name_lengths"]-1
Syntax for Renaming a Column
• Rename a column using the (creatively named) .rename() method.
•.rename() takes in a dictionary that maps old column names to their new ones.
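A one-line sketch of the dictionary-based rename:

```python
import pandas as pd

babynames = pd.DataFrame({"name_lengths": [8, 3]})

# The dictionary maps old column names to new ones.
babynames = babynames.rename(columns={"name_lengths": "Length"})
print(babynames.columns.tolist())  # ['Length']
```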
Syntax for Dropping a Column (or Row)
•Remove columns using the (also creatively named) .drop method.
•The .drop() method assumes you're dropping a row by default. Use axis="columns" to drop a column
instead.
babynames = babynames.drop("Length", axis="columns")
An Important Note: DataFrame Copies
• Notice that we re-assigned babynames to an updated value on the previous slide.
babynames = babynames.drop("Length", axis="columns")
• By default, pandas methods create a copy of the DataFrame, without changing the original
DataFrame at all.
•To apply our changes, we must update our DataFrame to this new, modified copy.
3. Arrange rows
• Remember the five verbs
• filter: select rows (observations) in a data frame;
• select: select columns (variables) in a data frame;
• mutate: add new columns to a data frame;
• arrange: reorder rows in a data frame;
• summarise: collapses a data frame to a single row;
.sort_values()
• The DataFrame.sort_values and Series.sort_values methods sort a DataFrame (or Series).
• Series.sort_values( ) will automatically sort all values in the Series.
• DataFrame.sort_values(column_name) must specify the name of the column to be used for
sorting.
babynames["Name"].sort_values()
babynames.sort_values(by="Count", ascending=False)
Sorting By Length
• Let’s try to solve the sorting problem with different approaches
Approach 1: Create a Temporary Column and Sort Based on the New Column
• Sorting the DataFrame as usual
Approach 2: Sorting Using the key Argument
Approach 3: Create a Temporary Column Using .map and Sort on It
• Suppose we want to sort by the number of occurrences of "dr" and "ea"s.
• Use the Series.map method.
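A sketch of this approach on toy names; the counting function and temporary column name are assumptions:

```python
import pandas as pd

babynames = pd.DataFrame({"Name": ["Deandra", "Bob", "Drew"]})

# Count occurrences of "dr" and "ea" in each (lower-cased) name.
def dr_ea_count(name):
    name = name.lower()
    return name.count("dr") + name.count("ea")

# Temporary column via Series.map, sort on it, then drop it.
babynames["dr_ea"] = babynames["Name"].map(dr_ea_count)
babynames = (babynames.sort_values("dr_ea", ascending=False)
                      .drop("dr_ea", axis="columns"))
print(babynames["Name"].tolist())  # ['Deandra', 'Drew', 'Bob']
```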
4. Summarize Data
• Remember the five verbs
• filter: select rows (observations) in a data frame;
• select: select columns (variables) in a data frame;
• mutate: add new columns to a data frame;
• arrange: reorder rows in a data frame;
• summarise: collapses a data frame to a single row;
Why Group?
Our goal:
• Group together rows that fall under the same category.
• For example, group together all rows from the same year.
• Perform an operation that aggregates across all rows in the category.
• For example, sum up the total number of babies born in that year.
Why Group?
.groupby()
• A .groupby() operation involves some combination of splitting the object, applying a
function, and combining the results.
• Calling .groupby() generates DataFrameGroupBy objects → "mini" sub-DataFrames
• Each subframe contains all rows that correspond to the same group (here, a particular year)
.groupby().agg()
• We cannot work directly with DataFrameGroupBy objects! The diagram below is to help understand what goes on
conceptually – in reality, we can't "see" the result of calling .groupby.
• Instead, we transform a DataFrameGroupBy object back into a DataFrame using .agg
• .agg is how we apply an aggregation operation to the data.
Putting it all together
dataframe.groupby(column_name).agg(aggregation_function)
•babynames[["Year", "Count"]].groupby("Year").agg(sum) computes the total number of
babies born in each year.
Alternatives …
• Now, we create groups for each year.
babynames.groupby("Year")[["Count"]].agg(sum)
or
babynames.groupby("Year")[["Count"]].sum()
or
babynames.groupby("Year").sum(numeric_only=True)
Concluding groupby.agg
• A groupby operation involves some combination of splitting the object, applying a function, and combining the
results.
• So far, we've seen that df.groupby("Year").agg(sum):
• Split df into sub-DataFrames based on Year.
• Apply the sum function to each column of each sub-DataFrame.
• Combine the results of sum into a single DataFrame, indexed by Year.
Groupby Review Question
Answer
Case Study: Name "Popularity"
• Goal: Find the baby name with sex "F" that has fallen in popularity the most in California.
f_babynames = babynames[babynames["Sex"]=="F"]
f_babynames = f_babynames.sort_values(["Year"])
jenn_counts_series = f_babynames[f_babynames["Name"] == "Jennifer"]["Count"]
Number of Jennifers Born in California Per Year.
What Is "Popularity"?
Goal: Find the baby name with sex "F" that has fallen in popularity the most in California.
How do we define "fallen in popularity?"
• Let’s create a metric: "Ratio to Peak" (RTP).
• The RTP is the ratio of babies born with a given name in 2022 to the maximum number of babies born
with that name in any year.
Example for "Jennifer":
• In 1972, we hit peak Jennifer. 6,065 Jennifers were born.
• In 2022, there were only 114 Jennifers.
• RTP is 114 / 6065 = 0.018796372629843364.
Calculating RTP
max_jenn = max(f_babynames[f_babynames["Name"]=="Jennifer"]["Count"])
6065
curr_jenn = f_babynames[f_babynames["Name"]=="Jennifer"]["Count"].iloc[-1]
114
Remember: f_babynames is sorted by year, so .iloc[-1] means "grab the latest year."
rtp = curr_jenn / max_jenn
0.018796372629843364
def ratio_to_peak(series):
return series.iloc[-1] / max(series)
jenn_counts_ser = f_babynames[f_babynames["Name"]=="Jennifer"]["Count"]
ratio_to_peak(jenn_counts_ser)
0.018796372629843364
Calculating RTP Using .groupby()
• .groupby() makes it easy to compute the RTP for all names at once!
rtp_table = f_babynames.groupby("Name")[["Count"]].agg(ratio_to_peak)
Renaming Columns After Grouping
• By default, .groupby will not rename any aggregated columns (the column is still named "Count", even
though it now represents the RTP).
• For better readability, we may wish to rename "Count" to "Count RTP".
rtp_table = f_babynames.groupby("Name")[["Count"]].agg(ratio_to_peak)
rtp_table = rtp_table.rename(columns={"Count":"Count RTP"})
Some Data Science Payoff
• By sorting rtp_table we can see the names whose popularity has decreased the most.
rtp_table.sort_values("Count RTP")
Some Data Science Payoff
• We can get the list of the top 10 names and then plot popularity with:
top10 = rtp_table.sort_values("Count RTP").head(10).index
Raw GroupBy Objects and Other Methods
•The result of a groupby operation applied to a DataFrame is a DataFrameGroupBy object.
•It is not a DataFrame!
grouped_by_year = elections.groupby("Year")
type(grouped_by_year)
• Given a DataFrameGroupBy object, we can use various functions to generate DataFrames (or Series);
agg is only one choice:
groupby.size() and groupby.count()
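A sketch of the difference between the two on toy data with a missing value:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2020, 2020, 2021],
                   "Count": [10, None, 7]})

# .size() → number of rows in each group, including rows with NaN.
print(df.groupby("Year").size())   # 2020 → 2, 2021 → 1

# .count() → number of NON-null entries per column in each group.
print(df.groupby("Year").count())  # Count: 2020 → 1, 2021 → 1
```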
Filtering by Group
•Another common use for groups is to filter data.
• groupby.filter takes an argument func.
• func is a function that:
• Takes a DataFrame as input.
• Returns either True or False.
• filter applies func to each group/sub-DataFrame:
• If func returns True for a group, then all rows belonging to the group are preserved.
• If func returns False for a group, then all rows belonging to that group are filtered out.
• Notes:
• Filtering is done per group, not per row. Different from boolean filtering.
• Unlike agg(), the column we grouped on does NOT become the index!
groupby.filter()
Filtering Elections Dataset
• Going back to the elections dataset.
• Let's keep only election year results where the max '%' is less than 45%.
elections.groupby("Year").filter(lambda sf: sf["%"].max() < 45)
groupby Quiz
• We want to know the best election by each party.
groupby Quiz
• We want to know the best election by each party.
• Best election: The election with the highest % of votes.
• For example, Democrat’s best election was in 1964, with candidate Lyndon Johnson
winning 61.3% of votes.
Attempt #1
• Why does the table seem to claim that Woodrow Wilson won the presidency in 2020?
elections.groupby("Party").max().head(10)
Problem with Attempt #1
• Why does the table seem to claim that Woodrow Wilson won the presidency in 2020?
• Every column is calculated independently! Among Democrats:
• Last year they ran: 2020.
• Alphabetically the latest candidate name: Woodrow Wilson.
• Highest % of vote: 61.34%.
Attempt #2: Motivation
• We want to preserve entire rows, so we need an aggregate function that does that.
Attempt #2: Solution
Attempt #2: Solution
• First sort the DataFrame so that rows are in descending order of %.
• Then group by Party and take the first item of each sub-DataFrame.
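The sort-then-first idea, sketched on a few invented election rows:

```python
import pandas as pd

elections = pd.DataFrame({
    "Party": ["Democratic", "Democratic", "Republican"],
    "Candidate": ["Johnson", "Carter", "Reagan"],
    "%": [61.3, 50.1, 50.7]})

# Sort so the highest % in each party comes first, then keep that whole row.
best = elections.sort_values("%", ascending=False).groupby("Party").first()
print(best.loc["Democratic", "Candidate"])  # 'Johnson'
```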
Grouping by Multiple Columns
• Suppose we want to build a table showing the total number of babies born of each sex in each year.
• One way is to groupby using both columns of interest:
babynames.groupby(["Year", "Sex"])[["Count"]].agg(sum).head(6)
Pivot function
# Note: DataFrame.pivot() does not aggregate; since (Year, Sex) pairs repeat,
# we use pivot_table(), which accepts an aggfunc.
babynames_pivot = babynames.pivot_table(
    index = "Year",     # rows (turned into index)
    columns = "Sex",    # column values
    values = ["Count"], # field(s) to process in each group
    aggfunc = np.sum,   # group operation
)
babynames_pivot.head(6)
groupby(["Year", "Sex"]) vs. pivot
• The pivot() output more naturally represents our data.
Pivot output
babynames.groupby(["Year", "Sex"])[["Count"]].agg(sum).head(6)
Pivot Tables with Multiple Values
• pivot_table() works like pivot(), but it can aggregate and take multiple columns as values.
babynames_pivot = babynames.pivot_table(
    index = "Year",             # rows (turned into index)
    columns = "Sex",            # column values
    values = ["Count", "Name"],
    aggfunc = np.max,           # group operation
)
babynames_pivot.head(6)
Pivot Table Mechanics
Where are we?
• 5 verbs for data manipulation
• Joining tables
• Tidy data
Joining Tables
• Suppose we want to know the popularity of presidential candidates' names in 2022.
• Example: Dwight Eisenhower's name Dwight is not popular today, with only 5 babies born
with this name in California in 2022.
• To solve this problem, we’ll have to join tables.
Creating Table 1: Babynames in 2022
• Let's set aside names in California from 2022 first:
babynames_2022 = babynames[babynames["Year"] == 2022]
babynames_2022
Creating Table 2: Presidents with First Names
•To join our table, we’ll also need to set aside the first names of each candidate.
elections["First Name"] = elections["Candidate"].str.split().str[0]
Joining Our Tables: Two Options
merged = pd.merge(left = elections, right = babynames_2022, left_on = "First Name", right_on = "Name")
merged = elections.merge(right = babynames_2022, left_on = "First Name", right_on = "Name")
Tidy data
• You can represent the same underlying data in multiple ways.
• Let’s see an example table that records: country, year, population, and number of
documented cases of TB (tuberculosis).
• Which one is the best representation?
Another example
<Table 1>
<Table 2>
Tidy data
• Tidy data is data where:
• 1. Each variable is in a column.
• 2. Each observation is a row.
• 3. Each value is a cell.
Tidy data
Tidy data –cont.
• Why tidy data?
• 1. You can easily add observations
• 2. You can easily add columns
• 3. Many python packages are designed with tidy data in mind
Two verbs for tidy data
<pivot>
• It makes "long" data wider.
• Useful when observations are spread over multiple rows.
<melt>
• It makes "wide" data longer.
• Useful when multiple observations are stored in one row.
melt()
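A sketch of melt on a small "wide" TB-style table (the counts here are illustrative):

```python
import pandas as pd

# A "wide" table: one row per country, one column per year.
wide = pd.DataFrame({"country": ["Afghanistan", "Brazil"],
                     "1999": [745, 37737],
                     "2000": [2666, 80488]})

# melt turns the year columns into (year, cases) rows — longer, tidier data.
long = wide.melt(id_vars="country", var_name="year", value_name="cases")
print(long)  # 4 rows: one per (country, year) observation
```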