Chapter 1+ Python Basics-1

Uploaded by

faryalqayyumktk


Data Science

Chapter 1
1. What is Data Science?
Definition:
Data Science is an interdisciplinary field that uses data to extract insights, build models, and
support decision-making.
It combines three key elements:
o Statistics & Mathematics → to understand patterns in data.
o Computer Science (Programming) → to process and analyze data using algorithms.
o Domain Knowledge → to apply findings to real-world problems.
Examples:
o Healthcare: Using data to predict diseases.
o Netflix: Recommending movies based on your watch history.
o Banks: Detecting fraudulent transactions.

2. Why Data Science?

In the modern world, huge amounts of data are generated every second (social media, online
shopping, healthcare, finance, etc.).
Raw data is useless unless converted into meaningful insights.
Data Science helps in decision-making, predictions, and automation.
Applications of Data Science:
Netflix: Movie/TV show recommendations.
Facebook & Instagram: Friend suggestions and content ranking.
Banks: Fraud detection and credit scoring.
Healthcare: Disease risk prediction and drug discovery.

3. The Data Science Workflow

The typical process followed by a data scientist is:
o Define the Problem → What are we trying to solve?
o Collect Data → Gather information from databases, APIs, sensors, etc.
o Clean & Prepare Data → Handle missing values, remove duplicates, fix errors.
o Explore Data (EDA – Exploratory Data Analysis) → Use statistics and visualizations
to understand data.
o Build Models (Machine Learning/AI) → Train algorithms to make predictions.
o Evaluate & Interpret Results → Check accuracy, performance, and meaning.
o Communicate Insights → Present findings through reports, dashboards, or
visualizations.
Think of it as a cycle – data science is an iterative process.

4. Case Study: DataSciencester

To make these ideas practical, Joel Grus (in Data Science from Scratch) introduced a fictional
social network called DataSciencester.
(a) Representing Users
Each user is stored in Python as a dictionary with an id and name.
users = [
    {"id": 0, "name": "Hero"},
    {"id": 1, "name": "Dunn"},
    {"id": 2, "name": "Sue"},
]
Here, id is unique, making it easy to track users.
(b) Representing Friendships
Friendships are stored as pairs of user IDs.
friendships = [
    (0, 1),
    (0, 2),
    (1, 2),
]
(0, 1) means user 0 (Hero) is friends with user 1 (Dunn).
(c) Building a Friend Network
We can attach a list of friends to each user:
for user in users:
    user["friends"] = [] # start with empty list
for i, j in friendships:
    users[i]["friends"].append(users[j])
    users[j]["friends"].append(users[i])
Now:
Hero’s friends → Dunn, Sue
Dunn’s friends → Hero, Sue
(d) Analyzing Connections
Number of Friends:
def number_of_friends(user):
    return len(user["friends"])
Average Connections:
total_connections = sum(number_of_friends(user) for user in users)
avg_connections = total_connections / len(users)
This tells us how connected people are on average.
(e) Finding Popular Users
We can sort users by their number of friends:
num_friends_by_id = [(user["id"], number_of_friends(user)) for user in users]
print(sorted(num_friends_by_id, key=lambda x: x[1], reverse=True))
This identifies influencers in the network.
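(f) Friend-of-a-Friend (FOAF)
We can suggest new friends by looking at the friends of a user's friends:
def friends_of_friend_ids(user):
    return [foaf["id"]
            for friend in user["friends"]
            for foaf in friend["friends"]]
Example: If Hero is friends with Dunn, and Dunn is friends with Sue, then Sue is a
friend-of-a-friend of Hero.
Note that this simple version also returns the user's own id and the ids of people who are
already friends: for Hero (id 0) it returns [0, 2, 0, 1].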
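A minimal sketch of how the friend-of-a-friend idea can be turned into actual suggestions, assuming we want to skip the user and their existing friends and count shared friends. The fourth user (Chi) and the friendship (2, 3) are hypothetical additions so that a suggestion exists, and mutual_friend_counts is an illustrative name, not part of the notes above.

```python
from collections import Counter

users = [
    {"id": 0, "name": "Hero"},
    {"id": 1, "name": "Dunn"},
    {"id": 2, "name": "Sue"},
    {"id": 3, "name": "Chi"},   # hypothetical extra user for this sketch
]
friendships = [(0, 1), (0, 2), (1, 2), (2, 3)]  # (2, 3) is also hypothetical

for user in users:
    user["friends"] = []
for i, j in friendships:
    users[i]["friends"].append(users[j])
    users[j]["friends"].append(users[i])

def mutual_friend_counts(user):
    """Count, per candidate id, how many friends the user shares with them."""
    friend_ids = {friend["id"] for friend in user["friends"]}
    return Counter(
        foaf["id"]
        for friend in user["friends"]
        for foaf in friend["friends"]
        if foaf["id"] != user["id"]          # skip the user themselves
        and foaf["id"] not in friend_ids     # skip people who are already friends
    )

print(mutual_friend_counts(users[0]))  # Counter({3: 1})
```

Here Hero is offered user 3 (Chi) because they share one friend, Sue.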

5. Why This Case Study is Important

Teaches data representation → users and relationships stored in Python.
Demonstrates basic analysis → counting, averaging, ranking.
Shows real-world relevance → friend suggestions, influencer ranking, community detection.
This is a mini version of Facebook or LinkedIn.

Roles in Data Science

Data science is teamwork where different professionals handle different parts of the process.
Data Scientist
Analyzes data to find patterns, insights, and predictions.
Builds machine learning models.
Communicates results to decision-makers.
Example: Predicting which customers are likely to leave a telecom company.
Data Engineer
Designs and manages data pipelines, databases, and storage systems.
Ensures data is clean, accessible, and reliable for analysis.
Example: Building the system that collects streaming data from YouTube users.
Machine Learning Engineer
Focuses on deploying ML models into production.
Optimizes algorithms for speed and accuracy.
Example: Implementing recommendation models that run live on Netflix.
Business Analyst
Acts as a bridge between technical teams and business teams.
Converts insights into strategies and decisions.
Example: Using sales data to advise a retail store on stocking inventory.

Skills Required for Data Science

Programming Skills
Python, R, SQL for data analysis and automation.
Libraries: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn.
Mathematics & Statistics
Probability, hypothesis testing, linear algebra.
Understanding distributions, correlation, regression.
Data Visualization
Tools: Tableau, Power BI, Matplotlib, Seaborn.
Skill: Present results in charts and graphs for better storytelling.
Soft Skills
Problem-Solving: Breaking down complex problems into steps.
Communication: Explaining results to non-technical people.
Storytelling with Data: Turning numbers into actionable insights.
Example: Presenting a fraud detection system to bank managers in simple terms.

Scope of Data Science

Data Science is one of the fastest-growing fields with applications across industries.
o Healthcare: Predicting diseases, drug discovery.
o Finance: Fraud detection, credit scoring.
o Retail & E-commerce: Customer segmentation, product recommendation.
o Social Media: News feed ranking, friend recommendations.
o Manufacturing: Predictive maintenance of machines.

Class Activity
Task: Pick one Pakistani company (for example, Careem, Daraz, Jazz, HBL).
Discuss in groups:
How does it already use data science?
If not, how could it use data science to improve services?
Example: Careem uses data science for estimating ride fares, matching drivers with passengers,
and predicting demand in different areas.

Python Basics – Syntax, Data Types, and Loops

Python Setup
Install Python from python.org or install Anaconda which includes Python, Jupyter Notebook,
and libraries.
IDEs (Integrated Development Environments) and editors:
o Jupyter Notebook: Best for data science.
o VS Code: Lightweight and widely used.
o PyCharm: Professional IDE.

Basic Python Syntax

Print Statement
print("Hello, Data Science")
Output:
Hello, Data Science
Variables
Variables are used to store values.
name = "Aleeza" # string
age = 21 # integer
gpa = 3.7 # float
is_student = True # boolean
Variable Naming Rules:
Must start with a letter or underscore.
Cannot start with a number.
Case-sensitive (Name ≠ name).

Data Types in Python

Text Type
str → string (text in quotes).
message = "Hello World"
Boolean Type
bool → True or False
is_active = True
Numeric Types
int → integers (e.g., 10)
float → decimal numbers (e.g., 3.14)
x = 10 # int
y = 3.14 # float
Collections
List → Ordered, changeable.
fruits = ["apple", "banana", "cherry"]
Tuple → Ordered, unchangeable.
coordinates = (4, 5)
Dictionary → Key-value pairs.
student = {"name": "Aleeza", "age": 21}
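A small sketch for checking which type a value has, using the built-in type() function. The values list is made up from the examples above purely for illustration.

```python
values = ["Hello World", 10, 3.14, True, ["apple", "banana"], (4, 5), {"name": "Aleeza"}]

for value in values:
    # type(value).__name__ gives the short type name, e.g. "str" or "int"
    print(repr(value), "->", type(value).__name__)

type_names = [type(value).__name__ for value in values]
print(type_names)
# ['str', 'int', 'float', 'bool', 'list', 'tuple', 'dict']

# Converting between types is explicit
age = int("21")          # str -> int
print(age + 1)           # 22
```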

Loops in Python
For Loop
Used when the number of iterations is known.
for i in range(5):
    print(i)
Output:
0
1
2
3
4
While Loop
Used when the number of iterations is not fixed.
i = 0
while i < 5:
    print(i)
    i += 1
Output:
0
1
2
3
4
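As a further illustration, the sketch below shows range() with a start, stop, and step, and a while loop whose number of iterations depends on the data. The variable names are illustrative only.

```python
# range(start, stop, step): the stop value is excluded
evens = []
for n in range(0, 10, 2):
    evens.append(n)
print(evens)  # [0, 2, 4, 6, 8]

# while loop that runs until a condition is met
total = 0
count = 0
while total < 20:
    count += 1
    total += count   # adds 1, 2, 3, ...
print(count, total)  # 6 21
```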

Data Structures in Python

Introduction
Data structures are ways of storing and organizing data so that they can be used efficiently in
programs. Python provides several built-in data structures that are widely used in data science
tasks.
The main data structures are:
o Lists
o Tuples
o Dictionaries
o Sets

Lists
A list is an ordered collection of items. Lists are mutable, which means items can be added,
removed, or changed.
Creating a List
fruits = ["apple", "banana", "cherry"]
numbers = [10, 20, 30, 40]
Accessing Elements
print(fruits[0]) # apple
print(fruits[2]) # cherry
Modifying Elements
fruits[1] = "mango"
print(fruits) # ['apple', 'mango', 'cherry']
List Methods
fruits.append("orange") # add item
fruits.remove("apple") # remove item
len(fruits) # length of list
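A short sketch of how these methods behave in practice, including two details that often surprise beginners: remove() deletes only the first match, and it raises a ValueError when the item is missing. The duplicate "banana" is added here only for illustration.

```python
fruits = ["apple", "banana", "cherry", "banana"]

fruits.append("orange")      # add to the end
fruits.remove("banana")      # removes only the first "banana"
print(fruits)                # ['apple', 'cherry', 'banana', 'orange']

fruits.insert(1, "mango")    # insert at index 1
last = fruits.pop()          # remove and return the last item
print(last, fruits)          # orange ['apple', 'mango', 'cherry', 'banana']

try:
    fruits.remove("kiwi")    # item is not in the list
except ValueError as error:
    print("Cannot remove:", error)
```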

Tuples
A tuple is similar to a list but immutable (cannot be changed after creation).
Creating a Tuple
coordinates = (10, 20)
Accessing Elements
print(coordinates[0]) # 10
Immutability
coordinates[0] = 50 # TypeError: 'tuple' object does not support item assignment
Use tuples when data should not change (for example, fixed locations, constant values).
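The sketch below shows tuple unpacking and what happens when code tries to modify a tuple. The name moved is illustrative.

```python
coordinates = (10, 20)

x, y = coordinates           # unpacking: x = 10, y = 20
print(x, y)                  # 10 20

try:
    coordinates[0] = 50      # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# To "change" a tuple, build a new one instead
moved = (50, coordinates[1])
print(moved)                 # (50, 20)
```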

Dictionaries
A dictionary stores data in key-value pairs. It is mutable and, since Python 3.7, preserves the
order in which keys were inserted.
Creating a Dictionary
student = {"name": "Aleeza", "age": 21, "grade": "A"}
Accessing Values
print(student["name"]) # Aleeza
Adding/Updating Values
student["age"] = 22
student["city"] = "Lahore"
Removing Keys
del student["grade"]
Dictionaries are very useful in data science for structured data like JSON files.
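To illustrate the link with JSON, here is a minimal sketch using Python's built-in json module to convert a dictionary to JSON text and back. The "city" lookup is a made-up example of a missing key.

```python
import json

student = {"name": "Aleeza", "age": 21, "grade": "A"}

text = json.dumps(student)          # dictionary -> JSON string
print(text)                         # {"name": "Aleeza", "age": 21, "grade": "A"}

restored = json.loads(text)         # JSON string -> dictionary
print(restored["name"])             # Aleeza

# .get() returns a default instead of raising KeyError for a missing key
print(restored.get("city", "unknown"))  # unknown
```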

Sets
A set is an unordered collection of unique items.
Creating a Set
numbers = {1, 2, 3, 4, 4, 5}
print(numbers) # {1, 2, 3, 4, 5}
Set Operations
A = {1, 2, 3}
B = {3, 4, 5}
print(A.union(B)) # {1, 2, 3, 4, 5}
print(A.intersection(B)) # {3}
print(A.difference(B)) # {1, 2}
Sets are useful when uniqueness of items is required.
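A common use in data work is removing duplicates from a list. The sketch below assumes a made-up list of email addresses.

```python
emails = ["ali@example.com", "sara@example.com", "ali@example.com", "omar@example.com"]

unique_emails = set(emails)              # duplicates collapse into one entry
print(len(emails), len(unique_emails))   # 4 3

# Sets are unordered, so sort when a stable order is needed
print(sorted(unique_emails))
# ['ali@example.com', 'omar@example.com', 'sara@example.com']

# Membership tests read naturally with "in"
print("sara@example.com" in unique_emails)   # True
```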

Class Activity

Activity 1
Create a list of five student names. Add two more names and remove one.

students = ["Ali", "Sara", "Hassan", "Fatima", "Omar"]

students.append("Areeba")
students.append("Bilal")
students.remove("Omar")
print(students)
# Output: ['Ali', 'Sara', 'Hassan', 'Fatima', 'Areeba', 'Bilal']

Activity 2
Create a tuple of three cities and try to change one element (observe the error).

cities = ("Karachi", "Lahore", "Islamabad")

# cities[0] = "Multan" # TypeError: 'tuple' object does not support item assignment

Activity 3
Create a dictionary to store details of a book (title, author, year). Update the year.

book = {"title": "Data Science 101", "author": "John Smith", "year": 2018}
book["year"] = 2023
print(book)
# Output: {'title': 'Data Science 101', 'author': 'John Smith', 'year': 2023}
Activity 4
Create two sets of numbers and find their union and intersection.

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(A.union(B)) # {1, 2, 3, 4, 5, 6}
print(A.intersection(B)) # {3, 4}

File Handling and Data Input/Output in Python

Introduction
File handling is an important part of programming because it allows reading and writing data
permanently. Unlike variables that are temporary, files store data even after a program stops. In
Python, we use the built-in open() function for working with files.

Opening and Closing Files

file = open("example.txt", "r") # open file in read mode
file.close() # always close after use
Modes:
"r" → Read (default, error if file not found)
"w" → Write (creates new file or overwrites existing)
"a" → Append (adds data at the end of file, creates the file if needed)
"r+" → Read and Write (error if file not found)

Writing to a File
file = open("data.txt", "w")
file.write("Hello, this is my first file.\n")
file.write("Python makes file handling easy.")
file.close()
This will create a file named data.txt with two lines of text.

Reading from a File

Method 1: Read entire file
file = open("data.txt", "r")
content = file.read()
print(content)
file.close()
Method 2: Read line by line
file = open("data.txt", "r")
for line in file:
    print(line.strip()) # strip removes newline character
file.close()
Method 3: readlines() into a list
file = open("data.txt", "r")
lines = file.readlines()
print(lines) # ['Hello, this is my first file.\n', 'Python makes file handling easy.']
file.close()

Using with Statement (Recommended)

The with statement automatically closes the file after use.
with open("data.txt", "r") as file:
    content = file.read()
    print(content)
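The following sketch checks the automatic closing through the file object's closed attribute. It writes to a temporary folder (an assumption made here so no real file is overwritten), and the ValueError is raised deliberately for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")   # temporary copy for the demo

with open(path, "w") as file:
    file.write("Hello, this is my first file.\n")
    print(file.closed)      # False: still open inside the block

print(file.closed)          # True: closed automatically on leaving the block

# The file is closed even if an error happens inside the block
try:
    with open(path, "r") as file:
        raise ValueError("simulated problem")   # deliberate error for the demo
except ValueError:
    pass
print(file.closed)          # True
```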

Appending to a File
with open("data.txt", "a") as file:
    file.write("\nThis line is added later.")
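To compare the file modes side by side, here is a small sketch that uses a temporary folder. The file names modes_demo.txt and missing.txt are hypothetical.

```python
import os
import tempfile

# Work inside a temporary folder so no real files are touched
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")   # hypothetical file name

with open(path, "w") as f:      # "w" creates the file
    f.write("first\n")
with open(path, "w") as f:      # "w" again overwrites the old content
    f.write("second\n")
with open(path, "a") as f:      # "a" keeps the content and adds to the end
    f.write("third\n")

with open(path, "r") as f:
    content = f.read()
print(content)                  # "second" then "third"; "first" was overwritten

missing_raises = False
try:
    open(os.path.join(folder, "missing.txt"), "r")   # "r" needs an existing file
except FileNotFoundError:
    missing_raises = True
    print("File not found")
```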

Handling CSV Files

CSV (Comma Separated Values) files are very common in data science.
import csv

# Writing CSV
with open("students.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Name", "Age", "Grade"])
    writer.writerow(["Ali", 20, "A"])
    writer.writerow(["Sara", 22, "B"])

# Reading CSV
with open("students.csv", "r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
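When a CSV file has a header row, csv.DictReader can be more convenient because each row becomes a dictionary keyed by column name. The sketch below reads from an in-memory string (an assumption, so it runs without a file on disk) holding the same student data.

```python
import csv
import io

# io.StringIO stands in for a file on disk so the sketch is self-contained
csv_text = "Name,Age,Grade\nAli,20,A\nSara,22,B\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)
print(rows[0]["Name"])      # Ali

# csv returns every value as a string, so convert numbers explicitly
ages = [int(row["Age"]) for row in rows]
print(sum(ages) / len(ages))   # 21.0
```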

Class Activities
Activity 1
Write a program to create a file called notes.txt and write three lines into it.
with open("notes.txt", "w") as f:
    f.write("This is line 1\n")
    f.write("This is line 2\n")
    f.write("This is line 3\n")
Activity 2
Write a program to read the contents of notes.txt and display them.
with open("notes.txt", "r") as f:
    print(f.read())
Activity 3
Append one more line to notes.txt and then display all lines.
with open("notes.txt", "a") as f:
    f.write("This is line 4\n")

with open("notes.txt", "r") as f:
    for line in f:
        print(line.strip())
Activity 4
Create a CSV file of three employees with their names and salaries. Then read and display the
data.
import csv

with open("employees.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Salary"])
    writer.writerow(["Areeba", 50000])
    writer.writerow(["Bilal", 60000])
    writer.writerow(["Omar", 55000])

with open("employees.csv", "r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

Introduction to NumPy and Pandas

Introduction
In Data Science, handling and analyzing large datasets efficiently is very important. Python
provides two powerful libraries for this purpose: NumPy and Pandas.
NumPy (Numerical Python): Used for numerical operations, arrays, and mathematical functions.
Pandas: Built on NumPy, used for data manipulation and analysis in tabular (row/column)
format.

Part 1: NumPy Basics

Importing NumPy
import numpy as np

Creating Arrays
arr = np.array([1, 2, 3, 4, 5])
print(arr) # [1 2 3 4 5]
print(type(arr)) # <class 'numpy.ndarray'>
1D Array: np.array([1,2,3])
2D Array:
arr2d = np.array([[1,2,3],[4,5,6]])
print(arr2d)

Useful Array Functions

print(np.zeros(5)) # [0. 0. 0. 0. 0.]
print(np.ones((2,3))) # 2x3 array of ones
print(np.arange(1,10,2)) # [1 3 5 7 9]
print(np.linspace(0,1,5)) # [0.   0.25 0.5  0.75 1.  ]

Array Operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b) # [5 7 9]
print(a * b) # [ 4 10 18]
print(a ** 2) # [1 4 9]
print(np.dot(a, b)) # 32 (dot product)

Statistical Functions
arr = np.array([10, 20, 30, 40, 50])
print(np.mean(arr)) # 30.0
print(np.median(arr)) # 30.0
print(np.std(arr)) # 14.142135623730951 (population standard deviation)
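To see what the standard deviation measures, the sketch below computes it step by step in plain Python. By default np.std divides by n (the population formula); the built-in statistics module offers both the population and the sample version for comparison.

```python
import math
import statistics

values = [10, 20, 30, 40, 50]

mean = sum(values) / len(values)                       # 30.0
squared_diffs = [(x - mean) ** 2 for x in values]      # [400.0, 100.0, 0.0, 100.0, 400.0]
population_std = math.sqrt(sum(squared_diffs) / len(values))
print(population_std)                                  # about 14.14

# statistics.pstdev divides by n, statistics.stdev divides by n - 1
print(statistics.pstdev(values))                       # about 14.14
print(statistics.stdev(values))                        # about 15.81
```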

Part 2: Pandas Basics

Importing Pandas
import pandas as pd

Series (1D Data)

s = pd.Series([10, 20, 30, 40], index=["a","b","c","d"])
print(s)
Output:
a    10
b    20
c    30
d    40
dtype: int64

DataFrame (2D Data)

data = {
    "Name": ["Ali", "Sara", "Omar"],
    "Age": [22, 24, 21],
    "Marks": [85, 90, 78]
}
df = pd.DataFrame(data)
print(df)
Output:
   Name  Age  Marks
0   Ali   22     85
1  Sara   24     90
2  Omar   21     78

Accessing Data in DataFrame

print(df["Name"]) # column access
print(df.iloc[0]) # row by integer position
print(df.loc[1, "Marks"]) # single cell by row label and column name

Basic Operations
print(df.describe()) # summary statistics
print(df.head(2)) # first 2 rows
print(df.tail(1)) # last row

Reading and Writing CSV with Pandas

# Save to CSV
df.to_csv("students.csv", index=False)

# Read from CSV

df2 = pd.read_csv("students.csv")
print(df2)

Class Activities
Activity 1: NumPy
Create a NumPy array of numbers from 1 to 10 and calculate their mean and standard deviation.
import numpy as np
arr = np.arange(1,11)
print("Mean:", np.mean(arr))
print("Standard Deviation:", np.std(arr))

Activity 2: Pandas DataFrame

Create a DataFrame of 3 students with columns: Name, Age, GPA. Then display only the GPA
column.
import pandas as pd
data = {
    "Name": ["Hina", "Bilal", "Owais"],
    "Age": [20, 21, 22],
    "GPA": [3.5, 3.8, 3.2]
}
df = pd.DataFrame(data)
print(df["GPA"])

Activity 3: CSV Handling with Pandas

Create a DataFrame for 3 products with Price and Quantity, save it to a CSV file, then read it
back.
data = {
    "Product": ["Pen", "Notebook", "Eraser"],
    "Price": [20, 50, 10],
    "Quantity": [5, 2, 10]
}
df = pd.DataFrame(data)
df.to_csv("products.csv", index=False)

df2 = pd.read_csv("products.csv")
print(df2)

Data Cleaning and Preparation

Introduction
Before analysis or modeling, real-world data usually needs cleaning.
Data is often incomplete, inconsistent, or contains errors.
Data Cleaning and Preparation ensures high-quality, accurate datasets for analysis.

1. Common Problems in Raw Data

Missing Values: Some entries are empty.
Duplicates: Same record appears multiple times.
Incorrect Data Types: Numbers stored as text, dates stored as strings.
Inconsistent Formatting: "Male"/"M", "Female"/"F".
Outliers: Unusual values (e.g., salary = 999999).

2. Handling Missing Data

Checking Missing Data
import pandas as pd
data = {
    "Name": ["Ali", "Sara", "Omar", "Hina"],
    "Age": [22, None, 21, 23],
    "Marks": [85, 90, None, 88]
}
df = pd.DataFrame(data)

print(df.isnull()) # shows True where values are missing

print(df.isnull().sum()) # counts missing values per column
Filling Missing Values
df["Age"] = df["Age"].fillna(df["Age"].mean()) # replace with mean
df["Marks"] = df["Marks"].fillna(0) # replace with 0
Dropping Missing Values
df.dropna(inplace=True) # removes rows with any missing value

3. Removing Duplicates
df = pd.DataFrame({
    "Name": ["Ali", "Sara", "Ali"],
    "Age": [22, 23, 22]
})
df = df.drop_duplicates()

4. Correcting Data Types

df["Age"] = df["Age"].astype(int) # convert to integer (raises an error if missing values remain)
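When numbers have been stored as text, some entries may not be parseable at all. The sketch below is one possible approach, assuming a made-up Age column containing the text "unknown": pd.to_numeric with errors="coerce" turns unparseable entries into missing values, which can then be filled before converting to integers.

```python
import pandas as pd

# Hypothetical column in which numbers were read as text
df = pd.DataFrame({"Name": ["Ali", "Sara", "Omar"], "Age": ["22", "24", "unknown"]})

# errors="coerce" turns values that cannot be parsed into NaN instead of raising
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
print(df["Age"].isnull().sum())   # 1

# Fill the gap first, then the integer conversion is safe
df["Age"] = df["Age"].fillna(df["Age"].median()).astype(int)
print(df["Age"].tolist())         # [22, 24, 23]
```

Filling with the median is only one choice; dropping the row or using another estimate may suit a given dataset better.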

5. Handling Inconsistent Data

Example: Different labels for gender (this assumes df has a Gender column).
df["Gender"] = df["Gender"].replace({"M":"Male","F":"Female"})

6. Detecting Outliers
Using statistical methods:
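import numpy as np
# 100 is an outlier. The sample must be large enough: with 5 values or fewer,
# no value can be more than 2 standard deviations from the mean.
arr = np.array([10, 12, 15, 14, 11, 13, 12, 100])
mean = np.mean(arr)
std = np.std(arr)

for x in arr:
    if abs(x - mean) > 2 * std:
        print("Outlier:", x)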
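An outlier inflates both the mean and the standard deviation, which can hide it in small samples. A common alternative is the interquartile range (IQR) rule, sketched below with the built-in statistics module. The 1.5 multiplier is the usual convention, and the choice of method="inclusive" is an assumption of this sketch.

```python
import statistics

values = [10, 12, 15, 14, 100]

# Quartiles; method="inclusive" treats the data as the whole population
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [x for x in values if x < lower or x > upper]

print(q1, q3)        # 12.0 15.0
print(outliers)      # [100]
```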

7. Renaming Columns
df.rename(columns={"Marks":"Score"}, inplace=True)

8. Feature Scaling (Normalization/Standardization)

Scaling helps when data values have different ranges.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df[["Marks"]] = scaler.fit_transform(df[["Marks"]])
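To make the two terms in the heading concrete, here is a plain-Python sketch of both formulas, using the marks from the earlier DataFrame example. With its default settings MinMaxScaler applies the first formula; the second corresponds to standardization (z-scores).

```python
import statistics

marks = [85, 90, 78]

# Min-max normalization: (x - min) / (max - min), result lies in [0, 1]
low, high = min(marks), max(marks)
normalized = [(x - low) / (high - low) for x in marks]
print([round(v, 3) for v in normalized])    # [0.583, 1.0, 0.0]

# Standardization (z-score): (x - mean) / std, result has mean 0 and std 1
mean = statistics.mean(marks)
std = statistics.pstdev(marks)
standardized = [(x - mean) / std for x in marks]
print(round(statistics.pstdev(standardized), 3))   # 1.0
```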
Class Activities and Solutions
Activity 1: Handling Missing Data
Create a DataFrame of 5 students with some missing ages and marks. Replace missing ages with
average age, and missing marks with 0.
data = {
    "Name": ["Ali", "Sara", "Omar", "Hina", "Bilal"],
    "Age": [22, None, 21, None, 24],
    "Marks": [85, 90, None, 88, None]
}
df = pd.DataFrame(data)
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Marks"] = df["Marks"].fillna(0)
print(df)

Activity 2: Removing Duplicates

Create a DataFrame with duplicate rows and remove duplicates.
df = pd.DataFrame({
    "Product": ["Pen", "Pen", "Notebook", "Eraser"],
    "Price": [20, 20, 50, 10]
})
df = df.drop_duplicates()
print(df)

Activity 3: Gender Formatting

A DataFrame has inconsistent gender values. Replace them with standard labels.
df = pd.DataFrame({
    "Name": ["Ali", "Sara", "Omar"],
    "Gender": ["M", "Female", "F"]
})
df["Gender"] = df["Gender"].replace({"M":"Male","F":"Female"})
print(df)
