All code is highlighted.
Press Ctrl+F and search for the key words of your question to find the matching code.
Copy and paste it into Jupyter.
Edit the code to fit the question, for example:
(the element number to access, column names, etc.)
If the question asks you to create your own list or dataframe, use
your own names and numbers.
Example: my_list = [1, 2, 3, "four", "five"]
Change this to: my_list = [4, 8, 10, "fifty", "six"]
Otherwise, just use ChatGPT.
LIST
# create a list
my_list = [1, 2, 3, "four", "five"]
# accessing elements
print(my_list[0]) # 1
print(my_list[3]) # "four"
# append element
my_list.append("six")
# insert element
my_list.insert(2, "two")
# replace element
my_list[4] = 5
# delete element
del my_list[1]
TUPLE
# create a tuple
my_tuple = (1, 2, 3, "four", "five")
# accessing elements
print(my_tuple[0]) # 1
print(my_tuple[3]) # "four"
STRING
# create a string
my_string = "hello world"
# accessing elements
print(my_string[0]) # "h"
print(my_string[6]) # "w"
# replace element
my_string = my_string.replace("world", "python")
DICTIONARY
# create a dictionary
my_dict = {"name": "John", "age": 30, "city": "New York"}
# accessing elements
print(my_dict["name"]) # "John"
print(my_dict["age"]) # 30
# add element
my_dict["country"] = "USA"
# delete element
del my_dict["city"]
PANDAS SERIES (PDSERIES)
import pandas as pd
# create a Pandas Series
my_series = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
# accessing elements
print(my_series["a"]) # 1
print(my_series["d"]) # 4
# add element
my_series["f"] = 6
# delete element
my_series = my_series.drop("b")
DATAFRAMES
import pandas as pd
# create a Pandas DataFrame
my_data = {"name": ["John", "Alice", "Bob"],
           "age": [30, 25, 35],
           "country": ["USA", "Canada", "UK"]}
df = pd.DataFrame(my_data)
# accessing rows
print(df.loc[0]) # first row
# accessing columns
print(df["name"]) # name column
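# add row (DataFrame.append was removed in pandas 2.0, so use pd.concat)
new_row = {"name": "Mary", "age": 28, "country": "Australia"}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
# delete row (drops the row whose index label is 1)
df = df.drop(1)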
IMPORT CSV FILE
import pandas as pd
df = pd.read_csv("my_data.csv")
DATA CLEANING
import pandas as pd
# read csv file
df = pd.read_csv("my_data.csv")
# drop rows with missing values
df = df.dropna()
# replace missing values with a specific value
df = df.fillna(0)
# replace values based on a condition
df.loc[df["age"] > 30, "age"] = 40
# remove duplicates
df = df.drop_duplicates()
# change data type of a column
df["age"] = df["age"].astype(float)
# rename column
df = df.rename(columns={"name": "full_name"})
DATA MANIPULATION
import pandas as pd
# read csv files
df1 = pd.read_csv("my_data1.csv")
df2 = pd.read_csv("my_data2.csv")
# merge two dataframes
df = pd.merge(df1, df2, on="id")
# filter rows based on a condition
df = df[df["age"] > 30]
# group by a column and calculate mean of another column
df_grouped = df.groupby("country")["age"].mean()
# sort dataframe by a column
df = df.sort_values("age")
# apply a function to a column
df["age_squared"] = df["age"].apply(lambda x: x**2)
FREQUENCY TABLE
import pandas as pd
# read csv file
df = pd.read_csv("my_data.csv")
# create frequency table
freq_table = df["age"].value_counts()
CROSS TABLE
import pandas as pd
# read csv file
df = pd.read_csv("my_data.csv")
# create cross table
cross_table = pd.crosstab(df["gender"], df["age_group"])
DESCRIPTIVE STATISTICS
import pandas as pd
# read csv file
df = pd.read_csv("my_data.csv")
# compute descriptive statistics
stats = df["age"].describe()
DATA VISUALIZATION
import pandas as pd
import matplotlib.pyplot as plt
# read csv file
df = pd.read_csv("my_data.csv")
# create histogram
plt.hist(df["age"], bins=10)
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
Example question:
1. Construct a data frame from a dictionary with the default
Python index.
```python
import pandas as pd
data = {'name': ['John', 'Alice', 'Bob', 'Jane'],
        'age': [32, 25, 45, 19],
        'gender': ['M', 'F', 'M', 'F']}
df = pd.DataFrame(data)
print(df)
```
Output:
```
    name  age gender
0   John   32      M
1  Alice   25      F
2    Bob   45      M
3   Jane   19      F
```
2. Construct a series from a dictionary with the default
Python index.
```python
data = {'John': 32, 'Alice': 25, 'Bob': 45, 'Jane': 19}
s = pd.Series(data)
print(s)
```
Output:
```
John     32
Alice    25
Bob      45
Jane     19
dtype: int64
```
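Note that when a series is built from a dictionary, the dictionary keys become the index labels, as the output above shows. If the question really wants the default integer index (0, 1, 2, ...), one possible approach is to pass only the values. This is a minimal sketch; `s_labeled` and `s_default` are hypothetical names used for illustration.

```python
import pandas as pd

data = {'John': 32, 'Alice': 25, 'Bob': 45, 'Jane': 19}

# Dictionary keys become the index labels
s_labeled = pd.Series(data)

# Passing only the values gives the default integer index 0..3
s_default = pd.Series(list(data.values()))

print(s_labeled.index.tolist())
print(s_default.index.tolist())
```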
3. Construct a data frame with user defined index.
```python
data = {'name': ['John', 'Alice', 'Bob', 'Jane'],
        'age': [32, 25, 45, 19],
        'gender': ['M', 'F', 'M', 'F']}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
print(df)
```
Output:
```
    name  age gender
a   John   32      M
b  Alice   25      F
c    Bob   45      M
d   Jane   19      F
```
4. Import the data frame and name it as “df”
Assuming the data is in a CSV file called "cars.csv", the following code can be used to import it
as a data frame:
```python
df = pd.read_csv('cars.csv')
```
5. Access the price column from the data frame.
```python
price_column = df['price']
```
6. Write the code to determine the number of
missing values in every column.
```python
missing_values_count = df.isnull().sum()
print(missing_values_count)
```
Output:
```
car_ID              0
symboling           0
CarName             0
fueltype            0
aspiration          0
doornumber          0
carbody             0
drivewheel          0
enginelocation      1
wheelbase           0
carlength           0
carwidth            0
carheight           0
curbweight          0
enginetype          0
cylindernumber      0
enginesize          0
fuelsystem          0
boreratio           0
stroke              0
compressionratio    0
horsepower          0
peakrpm             0
citympg             0
highwaympg          0
price               0
dtype: int64
```
7. If you find any missing values in numerical
variables, replace them with the column mean.
```python
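# numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.x
df.fillna(df.mean(numeric_only=True), inplace=True)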
```
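To see what mean replacement does, here is a small sketch on hypothetical toy data (not the cars data set). The column names and values are made up for illustration; only the numeric columns are filled, and the text column is left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the cars data set
toy = pd.DataFrame({
    'price': [100.0, np.nan, 300.0],
    'horsepower': [50.0, 70.0, np.nan],
    'fueltype': ['gas', 'diesel', 'gas'],
})

# Column means are computed from the non-missing values only
numeric_means = toy.mean(numeric_only=True)

# fillna with a Series fills each column using the value stored under its own name
toy_filled = toy.fillna(numeric_means)
print(toy_filled)
```

Here the missing price becomes 200.0 (the mean of 100 and 300) and the missing horsepower becomes 60.0 (the mean of 50 and 70).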
8. There is a missing value in “enginelocation”
variable, replace it with “front” category.
```python
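# assign the result back; inplace=True on a selected column is unreliable in recent pandas
df['enginelocation'] = df['enginelocation'].fillna('front')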
```
9. Construct a frequency table for “carbody” and
interpret.
```python
# pd.value_counts() is deprecated in recent pandas; call the method on the column instead
frequency_table = df['carbody'].value_counts()
print(frequency_table)
```
Output:
```
sedan 96
hatchback
```
10. To construct a cross table between “carbody”
and “enginelocation” and express the figures in
percentages by rows, we can use the pandas
`crosstab()` function with the argument
`normalize='index'`.
```python
import pandas as pd
# assuming 'df' is the name of the data frame with the relevant columns
cross_tab = pd.crosstab(df['carbody'], df['enginelocation'], normalize='index')
print(cross_tab)
```
This will give us a table with the percentage of each engine location for each car body type.
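Strictly speaking, `normalize='index'` returns proportions between 0 and 1, so each row sums to 1. To show actual percentages, the table can be multiplied by 100. A minimal sketch on hypothetical toy data (the rows below are made up and stand in for the cars data set):

```python
import pandas as pd

# Hypothetical toy data standing in for the cars data set
toy = pd.DataFrame({
    'carbody': ['sedan', 'sedan', 'sedan', 'sedan', 'hatchback', 'hatchback'],
    'enginelocation': ['front', 'front', 'front', 'rear', 'front', 'front'],
})

# normalize='index' makes each row sum to 1; multiply by 100 for percentages
row_pct = pd.crosstab(toy['carbody'], toy['enginelocation'], normalize='index') * 100
print(row_pct.round(1))
```

In this toy example, 75% of sedans have a front engine and 25% a rear engine, while all hatchbacks have a front engine.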
11. To determine the average price of sedan cars
whose drive wheel is “rwd” and “fwd”, we can first
filter the data frame down to sedans, then filter by
each drive wheel type and calculate the mean of the
price column.
```python
sedan_df = df[df['carbody'] == 'sedan']
rwd_mean_price = sedan_df[sedan_df['drivewheel'] == 'rwd']['price'].mean()
fwd_mean_price = sedan_df[sedan_df['drivewheel'] == 'fwd']['price'].mean()
print("Average price for sedan cars with rwd drive wheel:", rwd_mean_price)
print("Average price for sedan cars with fwd drive wheel:", fwd_mean_price)
```
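The code above filters the rows step by step. The same averages can also be obtained in one step with `groupby()`. A minimal sketch on hypothetical toy data (the prices are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the cars data set
toy = pd.DataFrame({
    'carbody': ['sedan', 'sedan', 'sedan', 'sedan', 'wagon'],
    'drivewheel': ['rwd', 'rwd', 'fwd', 'fwd', 'fwd'],
    'price': [20000.0, 30000.0, 8000.0, 10000.0, 12000.0],
})

# Keep only sedans, then average the price within each drive wheel type
sedans = toy[toy['carbody'] == 'sedan']
mean_by_wheel = sedans.groupby('drivewheel')['price'].mean()
print(mean_by_wheel)
```

The result is a series indexed by drive wheel type, so `mean_by_wheel['rwd']` and `mean_by_wheel['fwd']` give the two averages.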
12. To describe various descriptives for `carlength`,
`wheelbase`, `citympg`, `highwaympg`, and `price`
by “carbody”, we can use the pandas `groupby()`
function to group the data by `carbody` and then
use the `describe()` function to get the summary
statistics for each column.
```python
grouped_by_carbody = df.groupby('carbody')[['carlength', 'wheelbase', 'citympg',
                                            'highwaympg', 'price']]
description_by_carbody = grouped_by_carbody.describe()
print(description_by_carbody)
```
This will give us the summary statistics for each column, grouped by car body type.
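The result of `describe()` on grouped data has two-level column labels (variable, statistic), which can be hard to read for five variables. The sketch below, on hypothetical toy data, shows how a single statistic might be picked out and how `agg()` could be used to request only the statistics needed. The names `toy`, `desc`, and `summary` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the cars data set
toy = pd.DataFrame({
    'carbody': ['sedan', 'sedan', 'wagon', 'wagon'],
    'price': [10.0, 20.0, 30.0, 50.0],
    'citympg': [30, 20, 25, 15],
})

# Full summary: columns are (variable, statistic) pairs
desc = toy.groupby('carbody')[['price', 'citympg']].describe()

# Pick out one statistic for one variable
price_mean = desc[('price', 'mean')]
print(price_mean)

# Ask only for the statistics of interest
summary = toy.groupby('carbody')[['price', 'citympg']].agg(['mean', 'median'])
print(summary)
```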
13. To construct a bar chart for “enginelocation”, we
can use the pandas `value_counts()` function to get
the count of each engine location and then use the
`plot()` function with the argument `kind='bar'` to
create a bar chart.
```python
import matplotlib.pyplot as plt
engine_loc_counts = df['enginelocation'].value_counts()
engine_loc_counts.plot(kind='bar')
plt.title('Engine Location')
plt.xlabel('Location')
plt.ylabel('Count')
plt.show()
```
This will give us a bar chart with the count of each engine location.
14. To construct a boxplot for price by cylinder
number, we can use the pandas `boxplot()` function
with the relevant columns.
```python
df.boxplot(column='price', by='cylindernumber')
plt.title('Price by Cylinder Number')
plt.show()
```
This will give us a boxplot of the price column grouped by cylinder number.
15. To construct a scatter plot between price as
dependent variable and horsepower as
independent variable, we can use the `scatter()`
function from matplotlib.
```python
plt.scatter(df['horsepower'], df['price'])
plt.title('Price vs. Horsepower')
plt.xlabel('Horsepower')
plt.ylabel('Price')
plt.show()
```
This will give us a scatter plot of price against horsepower.
1. Here's a program to print the value 20 from the
given tuple1:
```python
tuple1 = ("Orange", [10, 20, 30], (5, 15, 25))
# Access the second element of tuple1, which is a list,
# and then access the second element of that list
print(tuple1[1][1])
```
Output:
```
20
```
2. Here's a program to access elements 44 and 55
from the given tuple2:
```python
tuple2 = (11, 22, 33, 44, 55, 66)
# Access the fourth and fifth elements of tuple2
element1 = tuple2[3]
element2 = tuple2[4]
print(element1, element2)
```
Output:
```
44 55
```
3. Here's a program to create a 5 x 2 array from a
range between 100 and 200 with a step of 10:
```python
import numpy as np
array = np.arange(100, 200, 10).reshape(5, 2)
print(array)
```
Output:
```
[[100 110]
 [120 130]
 [140 150]
 [160 170]
 [180 190]]
```
4. Here's a program to return an array of items from
the third column of Array1:
```python
import numpy as np
Array1 = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
# Access the third column of Array1 using slicing
third_column = Array1[:, 2]
print(third_column)
```
Output:
```
[30 60 90]
```
5. Here's a program to delete the second column
from Array2:
```python
import numpy as np
Array2 = np.array([[34, 43, 73], [82, 22, 12], [53, 94, 66]])
# Delete the second column (index 1) of Array2 using np.delete along axis=1
Array2 = np.delete(Array2, 1, axis=1)
print(Array2)
```
Output:
```
[[34 73]
 [82 12]
 [53 66]]
```
6. Here's a program to create two 2-D arrays and
concatenate them:
```python
import numpy as np
array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6], [7, 8]])
# Concatenate the two arrays vertically using vstack
concatenated_array = np.vstack((array1, array2))
print(concatenated_array)
```
Output:
```
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
```
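If the question asks for the arrays to be joined side by side instead of stacked, `np.concatenate()` with the `axis` argument covers both cases. A minimal sketch using the same two arrays; `vertical` and `horizontal` are hypothetical names:

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6], [7, 8]])

# axis=0 stacks rows (same result as np.vstack)
vertical = np.concatenate((array1, array2), axis=0)

# axis=1 joins columns side by side (same result as np.hstack)
horizontal = np.concatenate((array1, array2), axis=1)

print(vertical.shape)
print(horizontal)
```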
7. Here's a program to add an element value 65 to
List1 and an element value 72 in the index position
2:
```python
List1 = [10, 20, 30, 40]
# Add 65 to the end of List1 using append
List1.append(65)
# Insert 72 in the index position 2 using insert
List1.insert(2, 72)
print(List1)
```
Output:
```
[10, 20, 72, 30, 40, 65]
```
8. Here's a program to remove element 9 from
List2:
```python
List2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,]