
Code To Create Hypothetical Data in Python

Uploaded by

Shalini Tilak

Steps to Generate the Hypothetical Dataset

Step: Install and Import Libraries
Code:
    import pandas as pd
    import numpy as np
Explanation:
    The first step is installing and importing the necessary libraries. We use the import statement to bring Pandas and NumPy into our Python environment.

Step: Generate Random Features
Code:
    num_rows = 1000000
    num_features = 10
    X = pd.DataFrame(np.random.rand(num_rows, num_features),
                     columns=[f'col{i+1}' for i in range(num_features)])
Explanation:
    num_rows: Specifies the desired number of rows in the dataset.
    num_features: Defines the number of features (columns) we want in the dataset.
    np.random.rand(num_rows, num_features): Generates a NumPy array of random numbers between 0 (inclusive) and 1 (exclusive).
    pd.DataFrame: Converts the array into a Pandas DataFrame.
    [f'col{i+1}' for i in range(num_features)]: A list comprehension that creates the list of column names, col1 to col10.

Step: Generate Random Target Variable
Code:
    y = pd.DataFrame(np.random.rand(num_rows), columns=['target_column'])
Explanation:
    pd.DataFrame: Creates a DataFrame named y containing the randomly generated target variable.

Step: Merge Features and Target Columns
Code:
    df = pd.concat([X, y], axis=1)
Explanation:
    pd.concat: Combines the target variable DataFrame y with the existing features DataFrame X.
    axis=1: Concatenates along the columns, placing the two DataFrames side by side.

Step: Save the Dataset as CSV File
Code:
    df.to_csv('dataset.csv', index=False)
Explanation:
    df.to_csv: Saves the DataFrame df as a comma-separated values (CSV) file.
    index=False: Excludes the row index from the saved file for a cleaner format.

Step: Load the Dataset
Code:
    df = pd.read_csv('dataset.csv')
Explanation:
    pd.read_csv: Loads the dataset from the CSV file into a DataFrame.

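The steps above can be combined into one short script. The following is a minimal sketch rather than a definitive version: the fixed seed, the reduced row count of 1,000 (instead of 1000000) and the temporary directory used for the CSV file are all assumptions made so that the example runs quickly and leaves no files behind.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumptions: a fixed seed and a small row count, so the sketch runs quickly.
np.random.seed(0)
num_rows = 1_000        # the steps above use 1000000
num_features = 10

# Features: uniform random values in [0, 1), one column per feature.
X = pd.DataFrame(
    np.random.rand(num_rows, num_features),
    columns=[f"col{i+1}" for i in range(num_features)],
)

# Target: one more column of uniform random values.
y = pd.DataFrame(np.random.rand(num_rows), columns=["target_column"])

# Place features and target side by side.
df = pd.concat([X, y], axis=1)

# Save and reload through a CSV file in a temporary directory (hypothetical path).
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "dataset.csv")
    df.to_csv(csv_path, index=False)
    loaded = pd.read_csv(csv_path)

print(loaded.shape)
print(list(loaded.columns))
```

Because index=False was used when saving, the reloaded DataFrame has the same eleven columns as df and no extra index column.
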
Code to Explore the Dataset
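
A first look at a dataset usually covers its shape, the first rows, the column types and any missing values. The sketch below illustrates this on a small stand-in DataFrame; the three column names, the 100-row size and the seed are assumptions chosen for illustration, not the original exploration code.

```python
import numpy as np
import pandas as pd

# Small stand-in dataset (assumed size and columns) so the example is self-contained.
np.random.seed(0)
df = pd.DataFrame(np.random.rand(100, 3), columns=["col1", "col2", "target_column"])

print(df.shape)      # (number of rows, number of columns)
print(df.head())     # first five rows
print(df.dtypes)     # data type of each column
df.info()            # column names, non-null counts and memory usage

# Count missing values per column.
missing = df.isna().sum()
print(missing)
```
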

Filter and Subset the Dataset
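
Filtering keeps the rows that satisfy a condition, and subsetting keeps selected columns. The following sketch shows common pandas idioms for both; the thresholds (0.5 and 0.2), the column names and the small stand-in DataFrame are hypothetical values chosen for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in dataset (assumed size and columns).
np.random.seed(0)
df = pd.DataFrame(np.random.rand(100, 3), columns=["col1", "col2", "target_column"])

# Row filter: keep rows where col1 exceeds a hypothetical threshold of 0.5.
high_col1 = df[df["col1"] > 0.5]

# Combined conditions use & (and) or | (or), with parentheses around each comparison.
both = df[(df["col1"] > 0.5) & (df["target_column"] < 0.5)]

# Column subset: select columns by name.
subset = df[["col1", "target_column"]]

# Rows and columns together with .loc.
rows_and_cols = df.loc[df["col2"] < 0.2, ["col2", "target_column"]]

print(len(df), len(high_col1), len(both), len(rows_and_cols))
```
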


Group and Aggregate the Data
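
Grouping needs a categorical key, which a dataset of purely random floats does not have. As a sketch under that assumption, the example below derives a hypothetical key by binning col1 into three ranges with pd.cut, then aggregates the target column per bin. The bin edges, labels and column names are illustrative choices.

```python
import numpy as np
import pandas as pd

# Small stand-in dataset (assumed size and columns).
np.random.seed(0)
df = pd.DataFrame(np.random.rand(200, 2), columns=["col1", "target_column"])

# Hypothetical grouping key: bin col1 into three equal-width ranges over [0, 1].
df["col1_bin"] = pd.cut(
    df["col1"],
    bins=[0.0, 1 / 3, 2 / 3, 1.0],
    labels=["low", "mid", "high"],
    include_lowest=True,
)

# Aggregate the target column within each bin.
summary = df.groupby("col1_bin", observed=True)["target_column"].agg(
    ["count", "mean", "min", "max"]
)
print(summary)
```
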

Visualise the Data


Mathematical and Statistical Functions
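
Pandas exposes the usual summary statistics as DataFrame and Series methods, and NumPy functions apply element-wise to columns. The following is a minimal sketch on a small stand-in DataFrame; the column names, size and seed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in dataset (assumed size and columns).
np.random.seed(0)
df = pd.DataFrame(np.random.rand(500, 3), columns=["col1", "col2", "target_column"])

# Summary table: count, mean, std, min, quartiles and max per column.
print(df.describe())

col_means = df.mean()                          # mean of each column
col_stds = df.std()                            # sample standard deviation (ddof=1)
median_target = df["target_column"].median()   # median of one column
corr = df.corr()                               # pairwise Pearson correlation

# NumPy functions apply element-wise to a column.
sqrt_col1 = np.sqrt(df["col1"])

print(col_means)
print(corr)
```
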

Visualisation Using Matplotlib


Line Graph
Scatter Plot

Bar Graph

Histogram
Box Plot

Heatmaps
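
The six plot types listed above can each be drawn with a single Matplotlib call. The sketch below places them in one figure as an illustration, not as the original plotting code: the stand-in DataFrame, the choice of columns for each plot, the bin count and the non-interactive Agg backend are all assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small stand-in dataset (assumed size and columns).
np.random.seed(0)
cols = ["col1", "col2", "col3", "target_column"]
df = pd.DataFrame(np.random.rand(200, 4), columns=cols)

fig, axes = plt.subplots(2, 3, figsize=(15, 8))

# Line graph: first 50 values of col1 against the row index.
axes[0, 0].plot(df.index[:50], df["col1"].iloc[:50])
axes[0, 0].set_title("Line Graph")

# Scatter plot: one feature against the target.
axes[0, 1].scatter(df["col1"], df["target_column"], s=10)
axes[0, 1].set_title("Scatter Plot")

# Bar graph: mean of each column.
means = df.mean()
axes[0, 2].bar(list(means.index), means.to_numpy())
axes[0, 2].set_title("Bar Graph")

# Histogram: distribution of col1 in 20 bins.
axes[1, 0].hist(df["col1"], bins=20)
axes[1, 0].set_title("Histogram")

# Box plot: one box per feature column.
axes[1, 1].boxplot(df[["col1", "col2", "col3"]].to_numpy())
axes[1, 1].set_xticklabels(["col1", "col2", "col3"])
axes[1, 1].set_title("Box Plot")

# Heatmap: correlation matrix rendered as an image.
im = axes[1, 2].imshow(df.corr().to_numpy(), cmap="viridis", vmin=-1, vmax=1)
axes[1, 2].set_xticks(range(len(cols)))
axes[1, 2].set_xticklabels(cols, rotation=45)
axes[1, 2].set_yticks(range(len(cols)))
axes[1, 2].set_yticklabels(cols)
axes[1, 2].set_title("Heatmap")
fig.colorbar(im, ax=axes[1, 2])

fig.tight_layout()
```

To view the result, call fig.savefig with a file name of your choice, or remove the Agg line and call plt.show() in an interactive session.
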
