Introduction to Python
List addition:
# Create lists first and second
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]
#Paste together first and second: full
full = first + second
List Sorting:
# Sort full in descending order: full_sorted
full_sorted = sorted(full, reverse=True)
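The two snippets above can be run end to end; a minimal sketch using the sample values from the notes:

```python
# Paste the lists together, then sort in descending order
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

full = first + second
full_sorted = sorted(full, reverse=True)
print(full_sorted)  # [20.0, 18.0, 11.25, 10.75, 9.5]
```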
NumPy Arrays:
import numpy as np
sample_array = np.array(sample_list)
Arithmetic operators (+, *, -, etc.) work element-wise on arrays.
Boolean arrays:
sample_boolean_array = sample_array > 50
Returns an array of True/False values
Can be used for indexing, as in: sample_array[sample_boolean_array] # keeps only the elements where the mask is True
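A runnable sketch of the boolean masking described above (the array values are made up for illustration):

```python
import numpy as np

sample_array = np.array([10, 60, 30, 80])
sample_boolean_array = sample_array > 50   # [False, True, False, True]
print(sample_array[sample_boolean_array])  # [60 80]
```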
2D NumPy Arrays:
Structured as a list of lists:
sample_2darray = np.array(sample_list_of_lists)
Can be indexed similarly to lists:
sample_2darray[x, y]
x and y can be ":" or other slices such as 2:5
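A small example of the 2D indexing described above (made-up numbers):

```python
import numpy as np

sample_2darray = np.array([[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9]])

print(sample_2darray[0, 2])    # 3: row 0, column 2
print(sample_2darray[:, 1])    # [2 5 8]: every row, column 1
print(sample_2darray[1:3, 0])  # [4 7]: rows 1-2, column 0
```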
Also can be indexed with boolean filters, among different arrays:
# Convert positions and heights to numpy arrays: np_positions, np_heights
np_positions = np.array(positions)
np_heights = np.array(heights)
# Heights of the goalkeepers: gk_heights
gk_heights = np_heights[np_positions == 'GK']
2D Array Multiplication:
A y*x array can be multiplied by a 1*x array: the Ath column of the 2D array is multiplied by the Ath value of the 1*x array (NumPy broadcasting).
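A minimal broadcasting sketch (made-up numbers):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])       # shape (2, 3)
scale = np.array([10, 100, 1000])  # shape (3,)

# Column A of grid is multiplied by element A of scale
print(grid * scale)
# [[  10  200 3000]
#  [  40  500 6000]]
```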
2D Array Basic Statistics:
np.mean, np.median, np.std, np.corrcoef
Intermediate Python:
import matplotlib.pyplot as plt
Linear and scatter plots can be made from two lists of equal length, x and y:
plt.plot(x, y) OR plt.scatter(x, y)
Histograms: (from a list)
plt.hist(sample_list, bins=int)
# the optional bins argument specifies the number of bins
matplotlib.pyplot function samples:
plt.xscale('log') # logarithmic x-axis
plt.xlabel(xlab) # title for the x-axis
plt.ylabel(ylab)
plt.title(plot_title)
plt.xticks([numbers_for_the_axis], [labels_for_each_number])
# The second list is optional. The two lists must have the same length.
plt.yticks(y_axis_ticks)
plt.text(x_coordinate, y_coordinate, 'text_string')
# The first two arguments are floats
sample_list.index('element_in_list') # returns the index of the specified element (a list method)
pyplot used as a method (not a function) accomplishes similar results with slightly different syntax:
avocados.plot(kind='scatter', x='nb_sold', y='avg_price', title='Number of avocados sold vs. average price')
avocados[avocados['type']=='conventional']['avg_price'].hist()
legend:
# Add a legend
plt.legend(['conventional', 'organic'])
the following snippet can be used to plot the number of NaN values in a DF’s columns:
df.isna().sum().plot(kind='bar')
the .isna() method returns a boolean for every value in the DataFrame; .sum() then counts the True values per column.
the .dropna() method can be used to delete all rows containing NaN values:
avocados_complete = avocados_2016.dropna()
Dictionaries:
sample_dict.keys() # returns the keys in the dictionary, can be printed
Delete:
del sample_dict['sample_key']
Dictionaries of dictionaries can be double-indexed to get to the values in the secondary
dictionaries.
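A small sketch with a hypothetical nested dictionary:

```python
# Hypothetical nested dictionary (made-up data)
europe = {'spain': {'capital': 'madrid', 'population': 46.77},
          'france': {'capital': 'paris', 'population': 66.03}}

# Double-index to reach a value in a secondary dictionary
print(europe['france']['capital'])  # paris

# del is a statement, not a function
del europe['spain']
```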
Pandas DataFrames
import pandas as pd
Can be constructed from a dictionary of lists (the values) with the keys as the column names:
cars = pd.DataFrame(dict_of_lists_as_values)
The row IDs can be specified:
sample_dataframe.index = sample_list_of_strings
Open files:
cars = pd.read_csv('sample_file.csv', index_col=0)
The optional index_col argument specifies the ID column; pandas uses integers by default for the index.
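A self-contained sketch of read_csv with index_col; io.StringIO stands in for a real file here, and the CSV content is made up:

```python
import io
import pandas as pd

# In-memory CSV stands in for 'sample_file.csv' (illustration only)
csv_text = "code,country,drives_right\nUS,United States,True\nJPN,Japan,False\n"

# index_col=0 uses the first column ('code') as the row index
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars.loc['JPN', 'drives_right'])  # False
```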
Indexing (for returning or printing):
# One column as a Series:
sample_dataframe['column_title']
# One or more columns as a DataFrame; use double brackets (a list of column names):
sample_dataframe[['column_title']]
sample_dataframe[['column_title', 'second_column_title']]
# Note: single brackets select columns, not rows; selecting rows needs slices or loc/iloc (below).
#Indexing using loc or iloc:
# Print out observation for Japan
print(cars.loc['JPN'])
# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])
# Print out drives_right value of Morocco
print(cars.loc[['MOR'], ['drives_right']])
# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
# Print out drives_right column as DataFrame
print(cars.loc[:, ['drives_right']])
Comparisons:
Boolean operations on NumPy arrays:
import numpy as np
np.logical_and()
np.logical_or()
np.logical_not()
#example:
np.logical_and(sample_array > 21, sample_array < 22)
#returns an array of boolean values for all the elements between 21 and 22
bmi[np.logical_and(bmi>21, bmi<22)]
#returns an array of all the elements between 21 and 22
Filtering pandas DataFrames:
#DataFrame filtering:
sample_filter = sample_dataframe["column_name"] > x
print(sample_dataframe[sample_filter])
# can also use loc or column selection to further customize the filtering
NumPy boolean functions can also be used on DataFrames, since pandas is built on NumPy:
sample_dataframe[np.logical_and(sample_dataframe["column_name"] > x, sample_dataframe["column_name"] < y)]
# returns all the DataFrame rows with x < column_name < y
Lists:
for i, v in enumerate(sample_list): # enumerate yields (index, value) tuples
    print(str(i) + str(v))
For loop with NumPy arrays:
the np.nditer function can be used to loop over every element in the array, regardless of its shape.
import numpy as np
array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])
total = np.array([array1, array2])
print("basic for-loop print:")
for i in total: print(i)
print("for-loop with nditer print:")
for i in np.nditer(total): print(i)
print("simple print:")
print(total)
Printing pandas DataFrames with for loops:
for row_label, row in sample_dataframe.iterrows():
    print(row_label)
    print(row)
indexing:
for row_label, row in sample_dataframe.iterrows():
    print(row_label + ": " + row["column_title"])
Adding a column to a DataFrame:
for row_label, row in sample_dataframe.iterrows():
    sample_dataframe.loc[row_label, "new_column"] = len(row["specified_column"])
the apply method is a much better way to do this:
sample_dataframe["new_column"] = sample_dataframe["specified_column"].apply(len) # or any other function instead of len
Numpy.random
import numpy.random as npr
npr.seed(123) # or any other number, if you want reproducibility
for i in range(1, 10):
    print(npr.rand(), end="\t")
for i in range(1, 10):
    print(npr.randint(1, 10), end="\t")
Selecting min and max rows of a column from a pd.DataFrame:
# Display the biggest blowout(s) and the closest game(s)
display(super_bowls[super_bowls['difference_pts'] == super_bowls['difference_pts'].max()])
display(super_bowls[super_bowls['difference_pts'] == super_bowls['difference_pts'].min()])
Data Manipulation with pandas:
specific dataframe attributes:
# Print the values of df
print(df.values)
# Print the column index of df
print(df.columns)
# Print the row index of df
print(df.index)
homelessness_ind = homelessness.sort_values("individuals")
#can have an argument for descending sort:
homelessness_ind = homelessness.sort_values("individuals", ascending=False)
basic column summary statistics: head, info, mean, median, max, cumsum, and cummax methods:
# Print the head of the sales DataFrame
print(sales.head())
# Print the info about the sales DataFrame
print(sales.info())
# Print the mean of weekly_sales
print(sales["weekly_sales"].mean())
# Print the median of weekly_sales
print(sales["weekly_sales"].median())
and also custom function(s) using the agg method (pass a list of functions or function names for multiple functions):
# A custom IQR function
def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)
# Print IQR of the temperature column
print(sales["temperature_c"].agg(iqr))
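Passing a list to .agg() computes several statistics at once; a toy example with made-up temperatures:

```python
import pandas as pd

# Toy table standing in for the course's sales DataFrame
sales = pd.DataFrame({"temperature_c": [10.0, 20.0, 30.0, 40.0]})

def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)

# A list mixing a custom function and a built-in name
stats = sales["temperature_c"].agg([iqr, "median"])
print(stats)
```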
Counting:
drop duplicate rows, based on one column or a combination of columns, using the drop_duplicates method with the column name or a list of column names:
store_depts = sales.drop_duplicates(["store", "department"])
count the number of rows belonging to each category using the value_counts() method:
store_counts = stores["store_type"].value_counts()
value_counts can also take normalize and sort arguments:
# Get the proportion of stores of each type
store_props = stores["store_type"].value_counts(normalize=True)
# Count the number of each department number and sort
dept_counts_sorted = departments["department_num"].value_counts(sort=True)
summary stats can be calculated for a column for different categories of rows using the groupby()
method:
sales_by_type = sales.groupby("type")["weekly_sales"].sum()
combining this with the agg method:
sales_stats = sales.groupby("type")["weekly_sales"].agg([np.max, np.min, np.mean, np.median])
Pivot tables:
make data queries much simpler. pivot_table uses the mean function by default.
index: the column used to categorize the data.
values: the quantitative column to calculate a descriptive stat for.
columns: a second level of categorization, resulting in a tabular pivot table.
margins: descriptive stats for the rows and columns of 2D pivot tables.
fill_value: default value for null values.
aggfunc: the function performed on the values.
# Pivot for mean weekly_sales for each store type
mean_sales_by_type = sales.pivot_table(index="type", values="weekly_sales")
Summary stats can be calculated for pivot tables. the mean() method returns the mean of each column by default, and the mean of each row with axis="columns".
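A toy pivot_table run using the arguments described above (made-up sales numbers):

```python
import pandas as pd

sales = pd.DataFrame({
    "type":         ["A", "A", "B", "B", "A"],
    "is_holiday":   [False, True, False, True, False],
    "weekly_sales": [100.0, 50.0, 200.0, 80.0, 120.0],
})

pivot = sales.pivot_table(index="type",          # rows: one per store type
                          columns="is_holiday",  # second categorization level
                          values="weekly_sales", # stat computed on this column
                          fill_value=0,          # replaces missing combinations
                          margins=True)          # adds "All" row/column means
print(pivot)
```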
DataFrame indexing:
setting an index is an optional design choice.
you can set an index with the set_index() method, with one or several columns as the index:
temperatures_ind = temperatures.set_index("city")
you can also remove the indexing. the drop argument is for deleting or keeping the index data:
# Reset the index, dropping its contents
print(temperatures_ind.reset_index(drop=True))
the main feature of indexes is the loc and iloc accessors, which give a much simpler way of filtering data than [] subsetting:
# Subset temperatures using square brackets
print(temperatures[temperatures["city"].isin(cities)])
#vs:
# Subset temperatures_ind using .loc[]
print(temperatures_ind.loc[cities])
multilevel indexes use a list of columns for set_index(), and a list of tuples for loc:
# Index temperatures by country & city
temperatures_ind = temperatures.set_index(["country", "city"])
# Subset for rows to keep
print(temperatures_ind.loc[[("Brazil", "Rio De Janeiro"), ("Pakistan", "Lahore")]])
the sort_index() method can be used to sort a dataframe based on single- or multi-level indexes:
# Sort temperatures_ind by index values
print(temperatures_ind.sort_index())
# Sort temperatures_ind by index values at the city level
print(temperatures_ind.sort_index(level="city"))
# Sort temperatures_ind by country then descending city
print(temperatures_ind.sort_index(level=["country", "city"], ascending=[True, False]))
Multi-level index slicing is done using tuples:
print(temperatures_srt.loc[("Pakistan", "Lahore"):("Russia", "Moscow")])
example of column-and-row indexing:
# Subset in both directions at once
print(temperatures_srt.loc[("India", "Hyderabad"):("Iraq", "Baghdad"),
"date":"avg_temp_c"])
Note that indexing using loc is inclusive on both ends, unlike list and iloc indexing, which exclude the end.
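A quick demonstration of the inclusivity difference (hypothetical index labels):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

print(df.loc["b":"d"])  # rows b, c AND d: both ends included
print(df.iloc[1:3])     # rows b and c only: end excluded, like lists
```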
Creating DataFrames:
DataFrames can be created from a dictionary of lists (each list being a column) or a list of dictionaries (each dict being a row), using the pd.DataFrame() function. The .to_csv("csv_file_name") method does the reverse of this.
A common way of creating DFs is the read_csv() function, which needs the name or path of the CSV file as a string. It can take several arguments, such as index_col="column_name", parse_dates=True or parse_dates=["column_name"], etc. Reindexing is done afterwards with the .reindex(list_of_desired_index_values) method; chaining .reindex(...).ffill() forward-fills NaNs.
Several methods are available after creating a DF, such as .sort_values(), .dropna(), etc.
Joining DataFrames
The .append() method and the concat() function accomplish this. concat is more flexible.
The .append() method joins two DataFrames (or Series):
df3 = df1.append(df2) # will retain the previous index values
df3 = df1.append(df2).reset_index() # will reset the indexes to have unique values
concat takes a list of DataFrames (or Series):
df3 = pd.concat([df1, df2]) # will retain the previous index values
df3 = pd.concat([df1, df2]).reset_index() # will reset the indexes to have unique values
The optional axis=0 ('index') and axis=1 ('columns') arguments can be used to specify the concatenation axis.
Horizontal concatenating is actually an outer join.
If inner joining is desired, the optional join=’inner’ argument can be used.
If the DFs have repeating index values, the optional keys= argument (a list) can be used to specify an outer index value related to each DF, to avoid ambiguity:
rain1314 = pd.concat([rain2013, rain2014], keys=[2013, 2014], axis=0)
The keys argument actually works with both vertical and horizontal concatenating.
The concat function can also be performed on a dictionary of DFs! Here, the dictionary keys will act
as the keys argument for a list concat. #fascinating.
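A sketch of concat on a dictionary of DFs, with made-up rainfall numbers:

```python
import pandas as pd

rain2013 = pd.DataFrame({"rain_mm": [30, 40]})
rain2014 = pd.DataFrame({"rain_mm": [25, 55]})

# The dict keys act like the keys= argument: they become
# the outer level of the resulting MultiIndex
rain = pd.concat({2013: rain2013, 2014: rain2014})
print(rain.loc[2014])
```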
Merging DataFrames
The pd.merge() function performs an inner join on two DFs by default. the on= argument takes a column (or list of columns) to join on, just like SQL joining:
mergedDF = pd.merge(DF1, DF2, on=['col1', 'col2'])
If, for example, 'col1' is in both DFs and listed in on=, the merged DataFrame will have only one copy of that column; otherwise, the suffixes _x and _y specify the parent DF for each shared column. The suffixes can be changed using the suffixes= argument:
mergedDF = pd.merge(DF1, DF2, on=['col1', 'col2'], suffixes=['_DF1', '_DF2'])
The on= argument works only when the desired joining column has the same name in both DFs. If they differ in name, two arguments, left_on='DF1_column_name' and right_on='DF2_column_name', must be used.
pd.merge() can also perform a left join by using the argument how='left'. how='inner' is the default behavior. how='right' and how='outer' are also possible.
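A small left-join sketch with suffixes (made-up keys and values):

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "val": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b"], "val": [10, 20]})

# how='left' keeps every row of the left DF; unmatched rows get NaN
merged = pd.merge(left, right, on="key", how="left",
                  suffixes=("_left", "_right"))
print(merged)
```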
Joining DFs
Another solution is the merged_DF = DF1.join(DF2) method. join, too, can take a how argument.
4. Data visualization with Matplotlib
# Import the matplotlib.pyplot submodule and name it plt
import matplotlib.pyplot as plt
# Create a Figure and an Axes with plt.subplots
fig, ax = plt.subplots()
This creates a framework to work with for plotting. Multiple datasets can be drawn on the same Axes with repeated ax.plot() calls.
The ax.plot() method can have several optional arguments besides the necessary X and Y axis,
including color=, marker=, and linestyle=:
# Plot Seattle data, setting data appearance
ax.plot(seattle_weather["MONTH"], seattle_weather["MLY-PRCP-NORMAL"], color='b', marker='o', linestyle='--')
The default behavior of fig, ax = plt.subplots() is fig, ax = plt.subplots(1, 1).
Passing numbers other than 1 (for the number of rows and columns, in order) allows us to work
with an array of plots. Indexing each subplot is similar to array indexing:
# In the top right (index 0,1), plot month and Seattle temperatures
ax[0, 1].plot(seattle_weather["MONTH"], seattle_weather["MLY-TAVG-NORMAL"])
If the array is one-dimensional, only one number is needed for indexing, e.g. ax[2].
Time-series data with Matplotlib
Typically, we start with something like this:
climate_change = pd.read_csv('climate_change.csv', parse_dates=["date"], index_col="date")
The whole process can be summarized with a function:
# Define a function called plot_timeseries
def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    # Plot the inputs x,y in the provided color
    axes.plot(x, y, color=color)
    # Set the x-axis label
    axes.set_xlabel(xlabel)
    # Set the y-axis label
    axes.set_ylabel(ylabel, color=color)
    # Set the color of the y-axis tick params
    axes.tick_params('y', colors=color)
Annotation example:
ax.annotate(">1 degree", xy=(pd.Timestamp('2015-10-06'), 1))
which has the following options:
ax2.annotate(">1 degree", xy=(pd.Timestamp('2015-10-06'), 1), xytext=(pd.Timestamp('2008-10-06'), -0.2), arrowprops={'arrowstyle':'->', 'color':'gray'})
Quantitative (bar-plots, histograms):
# Plot a bar-chart of gold medals as a function of country
ax.bar(medals.index, medals['Gold'])
# Set the x-axis tick labels to the country names
ax.set_xticklabels(medals.index, rotation = 90)
# Set the y-axis label
ax.set_ylabel("Number of medals")
plt.show()
Stacked bar-plot:
# Add bars for "Gold" with the label "Gold"
ax.bar(medals.index, medals['Gold'], label='Gold')
# Stack bars for "Silver" on top with label "Silver"
ax.bar(medals.index, medals['Silver'], bottom=medals['Gold'], label = 'Silver')
# Stack bars for "Bronze" on top of that with label "Bronze"
ax.bar(medals.index, medals['Bronze'], bottom=medals['Gold'] + medals['Silver'], label='Bronze')
Histogram:
fig, ax = plt.subplots()
# Plot a histogram of "Weight" for mens_rowing
ax.hist(mens_rowing["Weight"], histtype='step', label="Rowing", bins=5)
# Compare to histogram of "Weight" for mens_gymnastics
ax.hist(mens_gymnastics["Weight"], histtype='step', label="Gymnastics", bins=5)
ax.set_xlabel("Weight (kg)")
ax.set_ylabel("# of observations")
Error bars:
Can be added with the argument yerr=DF['column_name'].std() to the ax.bar() method; or, for a time plot, with the ax.errorbar(x, y, yerr=DF['y_sd']) method.
ax.bar("Rowing", mens_rowing["Height"].mean(), yerr=mens_rowing["Height"].std())
Box-plots:
Take a list of DF columns:
ax.boxplot([mens_rowing["Height"], mens_gymnastics["Height"]])
Labeling is also done using a list:
ax.set_xticklabels(["Rowing", "Gymnastics"])
Scatter plots can use the c= and s= arguments to show additional data with color and size,
respectively.
# Add data: "co2", "relative_temp" as x-y, index as color
ax.scatter(climate_change['co2'], climate_change['relative_temp'], c=climate_change.index)
PLT exporting
plt.style.use('style_name') # e.g. 'ggplot'
fig.savefig('file_name.png', dpi=300)
fig.set_size_inches([5, 3])
Seaborn
import seaborn as sns
we start with sns.countplot() and sns.scatterplot():
sns.countplot(y=region) # region is a list or Series of values
you can use a pandas DataFrame in seaborn with the data= argument:
sns.countplot(x='Spiders', data=df)
x and y then take DataFrame column names.
Additional information can be shown using the hue= argument, which will take another dataframe
column or a list as an input:
sns.scatterplot(x='absences', y='G3', data=student_data, hue='location')
When working with hue, options such as palette= and hue_order=[] are available:
# Create a dictionary mapping subgroup values to colors
palette_colors = {'Rural': "green", 'Urban': "blue"}
# Create a count plot of school with location subgroups
sns.countplot(x='school', data=student_data, hue='location', palette=palette_colors)
relplot() is the more sophisticated way of working with seaborn, since it allows multiple subplots. Here you'll need a kind="" argument to specify the type of plot you want. You'll also need the col= and row= arguments to use multiple subplots. More specific subplot allocation uses the col_wrap= and col_order= arguments.
sns.relplot(x="absences", y="G3",
data=student_data,
kind="scatter")
sns.relplot(x="G1", y="G3",
data=student_data,
kind="scatter",
col="schoolsup",
col_order=["yes", "no"],
row='famsup',
row_order=['yes', 'no'])
Further customizations:
size=, alpha=, style=:
sns.relplot(x="horsepower", y="mpg",
data=mpg, kind="scatter",
size="cylinders",
hue='cylinders')
sns.relplot(kind='scatter', data=mpg,
x='acceleration', y='mpg',
style='origin', hue='origin')
Line plots:
sns.relplot(x="model_year", y="horsepower",
data=mpg, kind="line",
ci=None, style="origin",
hue="origin",
dashes=False,
markers=True)
Catplot()
Has the same col= and row= parameters as relplot()
# Create column subplots based on age category
sns.catplot(y="Internet usage", col='Age Category', data=survey_data, kind="count")
So far we have seen kind=’bar’ and kind=’count’.
Boxplot: kind=’box’
# Create a box plot with subgroups and omit the outliers
sns.catplot(data=student_data, kind='box',
x='internet', y='G3',
hue='location')
Pointplot: kind=’point’
sns.catplot(x="romantic", y="absences",
data=student_data,
kind="point",
hue="school",
ci=None)
Style and other customizations
sns.set_style():
'white', 'dark', 'whitegrid', 'darkgrid' and 'ticks' are the 5 default styles.
sns.set_palette():
several palette styles are available, including diverging (for showing dichotomies) and sequential
(for showing a possible trend).
Two examples are 'Purples' and 'RdBu'.
sns.set_context():
Sets the scale of the plot elements for use in different contexts, including 'paper', 'notebook', 'talk', 'poster'.
Facet Grid and subplots
catplot() and relplot() create FacetGrid objects, which can be assigned to a variable. Setting a title on the FacetGrid tells seaborn we want a title for the whole plot, not an individual subplot. If we want a title for a subplot, we should assign it to an AxesSubplot object.
FacetGrid title:
# Create scatter plot
g = sns.relplot(x="weight",
y="horsepower",
data=mpg,
kind="scatter")
# Add a title "Car Weight vs. Horsepower"
g.fig.suptitle("Car Weight vs. Horsepower")
AxesSubplot:
# Create box plot (sns.boxplot returns an AxesSubplot; it has no kind= argument)
g = sns.boxplot(x="weight",
y="horsepower",
data=mpg)
# Add a title "Car Weight vs. Horsepower"
g.set_title("Car Weight vs. Horsepower", y=1.03) # the y= argument sets the height of the title
Tick and label options:
plt.xticks():
# Create point plot
sns.catplot(x="origin",
y="acceleration",
data=mpg,
kind="point",
join=False,
capsize=0.1)
# Rotate x-tick labels
plt.xticks(rotation=90)
Labels:
# Create line plot
g = sns.lineplot(x="model_year", y="mpg_mean",
data=mpg_mean,
hue="origin")
# Add a title "Average MPG Over Time"
g.set_title("Average MPG Over Time")
# Add x-axis and y-axis labels
g.set(xlabel='Car Model Year', ylabel='Average MPG')
Examples putting it all together:
# Set palette to "Blues"
sns.set_palette("Blues")
# Adjust to add subgroups based on "Interested in Pets"
g = sns.catplot(x="Gender",
y="Age", data=survey_data,
kind="box", hue='Interested in Pets')
# Set title to "Age of Those Interested in Pets vs. Not"
g.fig.suptitle("Age of Those Interested in Pets vs. Not")
# Show plot
plt.show()
# Set the figure style to "dark"
sns.set_style("dark")
# Adjust to add subplots per gender
g = sns.catplot(x="Village - town", y="Likes Techno",
data=survey_data, kind="bar",
col='Gender')
# Add title and axis labels
g.fig.suptitle("Percentage of Young People Who Like Techno", y=1.02)
g.set(xlabel="Location of Residence",
ylabel="% Who Like Techno")
# Show plot
plt.show()