Steps to Generate the Hypothetical Dataset
Each step is listed with its code and an explanation.

Step: Install and Import Libraries
Code:
    import pandas as pd
    import numpy as np
Explanation:
The first step is installing and importing the necessary libraries. We use the import statement to bring Pandas and NumPy into our Python environment.

Step: Generate Random Features
Code:
    num_rows = 1000000
    num_features = 10
    X = pd.DataFrame(np.random.rand(num_rows, num_features),
                     columns=[f'col{i+1}' for i in range(num_features)])
Explanation:
• num_rows: Specifies the desired number of rows in the dataset.
• num_features: Defines the number of features (columns) we want in the dataset.
• np.random.rand(num_rows, num_features): Generates a NumPy array of random numbers between 0 (inclusive) and 1 (exclusive).
• pd.DataFrame: Converts the array into a Pandas DataFrame.
• [f'col{i+1}' for i in range(num_features)]: A list comprehension that creates the list of column names.
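
As a quick sanity check on this step, the following sketch builds a smaller feature frame and inspects it. The row count of 1,000 and the fixed seed are assumptions chosen to keep the example fast and repeatable; they are not part of the original steps.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so repeated runs give the same numbers

num_rows = 1_000   # assumption: reduced from the 1,000,000 rows used in the step
num_features = 10

X = pd.DataFrame(
    np.random.rand(num_rows, num_features),
    columns=[f'col{i+1}' for i in range(num_features)],
)

print(X.shape)                        # (rows, columns)
print(list(X.columns))                # col1 through col10
print(X.min().min(), X.max().max())   # every value lies in [0, 1)

# Each float64 value occupies 8 bytes, so the data of the full
# 1,000,000 x 10 frame alone needs about 80 MB of memory.
bytes_per_value = X.to_numpy().itemsize
full_size_mb = 1_000_000 * num_features * bytes_per_value / 1e6
print(full_size_mb)
```

Estimating the size first is useful because the full dataset is large enough to be slow to write and reload on a modest machine.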

Step: Generate Random Target Variable
Code:
    y = pd.DataFrame(np.random.rand(num_rows), columns=['target_column'])
Explanation:
• pd.DataFrame: Creates a DataFrame named y containing the randomly generated target variable.

Step: Merge Features and Target Columns
Code:
    df = pd.concat([X, y], axis=1)
Explanation:
• pd.concat: Combines the target variable DataFrame y with the existing features DataFrame X.
• axis=1: Concatenates along the columns, merging the two DataFrames side by side.
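
To make the effect of axis=1 concrete, the sketch below compares it with axis=0 on two tiny frames. The names X_small and y_small and their values are hypothetical stand-ins for X and y.

```python
import pandas as pd

# Hypothetical three-row frames standing in for X and y
X_small = pd.DataFrame({'col1': [0.1, 0.2, 0.3], 'col2': [0.4, 0.5, 0.6]})
y_small = pd.DataFrame({'target_column': [0.7, 0.8, 0.9]})

# axis=1 places the frames side by side, matching rows by index label
side_by_side = pd.concat([X_small, y_small], axis=1)

# axis=0 stacks the frames on top of each other instead,
# filling columns that a frame lacks with NaN
stacked = pd.concat([X_small, y_small], axis=0)

print(side_by_side.shape)
print(stacked.shape)
print(stacked.isna().sum().sum())  # number of NaN cells created by stacking
```

Because axis=1 matches rows by index label, it relies on X and y sharing the same index, which holds here since both are created with the default index.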

Step: Save the Dataset as CSV File
Code:
    df.to_csv('dataset.csv', index=False)
Explanation:
• df.to_csv: Saves the DataFrame df as a comma-separated values (CSV) file.
• index=False: Excludes the row index from the saved file for a cleaner format.
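
The difference that index=False makes can be seen by rendering a small frame to CSV text both ways. This is a minimal sketch: the frame named small is hypothetical, and to_csv is called without a file path so that it returns the CSV as a string instead of writing a file.

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration
small = pd.DataFrame({'col1': [0.1, 0.2], 'target_column': [0.3, 0.4]})

with_index = small.to_csv()                # default index=True adds a leading index column
without_index = small.to_csv(index=False)  # only the data columns are written

print(with_index)
print(without_index)
```

With the default setting, the header row starts with an empty field and every data row starts with its index value. Reloading such a file would bring that column back as an extra unnamed column.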

Step: Load the Dataset
Code:
    df = pd.read_csv('dataset.csv')
Explanation:
• pd.read_csv: Loads the dataset from the CSV file into the DataFrame df.
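
The steps above can be sketched end to end as a single script that also checks the reloaded data against what was saved. Two details are assumptions made for this example: the row count is reduced to 1,000, and the file is written to a temporary directory so that the run leaves nothing behind.

```python
import os
import tempfile

import numpy as np
import pandas as pd

num_rows = 1_000   # assumption: reduced from 1,000,000 to keep the run short
num_features = 10

# Generate the random features and the random target variable
X = pd.DataFrame(
    np.random.rand(num_rows, num_features),
    columns=[f'col{i+1}' for i in range(num_features)],
)
y = pd.DataFrame(np.random.rand(num_rows), columns=['target_column'])

# Merge features and target side by side
df = pd.concat([X, y], axis=1)

# Save and reload; the temporary directory is an assumption of this sketch
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'dataset.csv')
    df.to_csv(path, index=False)
    loaded = pd.read_csv(path)

print(loaded.shape)                                   # rows, features plus target
print(loaded.columns.equals(df.columns))              # same column names and order
print(np.allclose(loaded.to_numpy(), df.to_numpy()))  # values survive the round trip
```

The comparison uses np.allclose and not exact equality because parsing floating-point numbers from text can differ from the originals in the last digit.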
Code to Explore the Dataset
Filter and Subset the Dataset
Group and Aggregate the Data
Visualise the Data
Mathematical and Statistical Functions
Visualisation Using Matplotlib
Line Graph
Scatter Plot
Bar Graph
Histogram
Box Plot
Heatmaps