Data Science Laboratory
PCEL102
by
Dr. Pradnya Kamble
Lab Prerequisite
1. Python programming
2. R programming
3. MATLAB
Lab Objectives
1. To study the pandas and NumPy libraries.
2. To study supervised and unsupervised learning algorithms.
3. To learn complete data analysis.
4. To learn mathematical methods used in data science.
Lab Outcomes
1. Apply quantitative modelling and data analysis techniques to the solution of real-
world business problems, communicate findings, and effectively present results
using data visualization techniques.
2. Implement exploratory data analysis.
3. Evaluate the performance of machine learning algorithms.
4. Apply principles of Data Science to the analysis of business problems.
5. Implement statistical methods used in data science applications.
6. Apply ethical principles like timeliness and adhere to the rules of the laboratory.
Role of Data Science in Artificial Intelligence (AI)
Data Collection & Cleaning – Prepares raw data into usable formats for AI.
Feature Engineering – Extracts and selects important features from datasets for
AI models.
Exploratory Data Analysis (EDA) – Identifies trends, patterns, and insights to guide
AI modeling.
Statistical Foundations – Provides probability, statistics, and linear algebra
concepts essential for AI.
Model Training & Optimization – Supplies methods and tools to train and
evaluate AI/ML models.
Data-Driven Decision Making – Ensures AI systems make reliable, evidence-
based predictions.
Bias & Quality Control – Reduces errors by ensuring AI learns from unbiased,
high-quality data.
Continuous Monitoring & Updating – Uses data science to track AI performance
and retrain models.
Visualization & Interpretation – Translates AI outputs into understandable insights
for humans.
Applications of Data Science in the real world
Healthcare – Disease prediction, drug discovery, medical image analysis, patient
monitoring.
Finance – Fraud detection, credit scoring, risk analysis, algorithmic trading.
Retail & E-commerce – Customer segmentation, recommendation systems, inventory
optimization.
Marketing – Targeted advertising, customer behavior analysis, sentiment analysis.
Transportation – Traffic prediction, route optimization, autonomous vehicles.
Manufacturing – Predictive maintenance, quality control, supply chain optimization.
Social Media & Entertainment – Personalized content recommendations (Netflix,
YouTube), trend analysis.
Cybersecurity – Intrusion detection, anomaly detection, phishing/fraud prevention.
Agriculture – Crop yield prediction, soil analysis, precision farming.
Energy Sector – Smart grids, energy consumption forecasting, renewable energy
optimization.
Government & Public Policy – Crime prediction, disaster management, urban planning.
Education – Adaptive learning systems, student performance prediction, personalized
tutoring.
Assessment & Term Work
• Minimum 8 experiments required
• Assignments on data science foundations
• Evaluation:
• Experiments: 15 Marks
• Attendance: 5 Marks
• Assignments: 5 Marks
• Oral/Practical exam based on lab work
Tools Used in Laboratory
• Google Colab:
Cloud-based Python environment
Free CPU/GPU access for ML
Collaboration & sharing
• MATLAB:
Mathematical computation & modeling
Data analysis & simulations
• R Programming:
Strong statistical analysis & visualization
Data wrangling and plotting
Experiment Execution Process
1. Define objective of experiment
2. Study prerequisites/theory
3. Design methodology/flowchart
4. Implement code
5. Execute and test
6. Analyze results/outputs
7. Document steps & observations
8. Write conclusion
9. Submit for evaluation
Start
1. Import Libraries
2. Create Dataset
3. Visualize Data
4. Split Data
5. Train Model
6. Make Prediction
7. Evaluate Model
End
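The flowchart above can be sketched in Python as follows. The slides do not name a dataset or a model, so the synthetic data (y = 3x + 4 plus noise), the choice of LinearRegression, and the helper name `visualize` are assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Create Dataset: synthetic points around the line y = 3x + 4 (assumed example)
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 4.0 + rng.normal(0, 1.0, size=200)


def visualize(X, y):
    """Visualize Data: scatter plot of the raw points.

    matplotlib is imported here so the rest of the sketch runs without it.
    """
    import matplotlib.pyplot as plt

    plt.scatter(X[:, 0], y, s=10)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("Synthetic dataset")
    plt.show()


# Split Data: hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train Model
model = LinearRegression().fit(X_train, y_train)

# Make Prediction
y_pred = model.predict(X_test)

# Evaluate Model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"MSE={mse:.3f}, R^2={r2:.3f}")
```

In Google Colab, calling `visualize(X, y)` after the dataset is created shows the scatter plot for the "Visualize Data" step.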
Start
1. Load Dataset (NLTK Movie Reviews)
2. Data Processing
- Tokenization
- Stop word removal
- Join words
3. Split Data into Train and Test Sets
4. Convert Text to Numerical Features (Count Vectorizer)
5. Train Model (Naïve Bayes Classifier)
6. Evaluate Model
- Classification Report
- Confusion Matrix
7. Test on Custom Sentences
End
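A minimal sketch of this text-classification flow is shown below. To keep it runnable offline, a tiny hand-written corpus stands in for the NLTK Movie Reviews dataset, and a small hard-coded stop-word set with regex tokenization replaces the NLTK tokenizer and stop-word list. These substitutions, along with the names `STOP_WORDS` and `preprocess`, are assumptions for illustration. The vectorizer and classifier are scikit-learn's CountVectorizer and MultinomialNB.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load Dataset: toy stand-in for the NLTK movie reviews (assumed data)
positive = [
    "A wonderful film with brilliant acting",
    "I loved the story, it was great",
    "Brilliant plot and excellent acting",
    "The movie was wonderful and enjoyable",
    "Great story, I loved this film",
    "An excellent and enjoyable movie",
    "Wonderful acting and a great plot",
    "I loved it, a brilliant and excellent story",
]
negative = [
    "A boring film with awful acting",
    "I hated the story, it was terrible",
    "Awful plot and poor acting",
    "The movie was boring and dull",
    "Terrible story, I hated this film",
    "A poor and dull movie",
    "Boring acting and a terrible plot",
    "I hated it, an awful and poor story",
]
texts = positive + negative
labels = ["pos"] * len(positive) + ["neg"] * len(negative)

# Small illustrative stop-word set (assumption; NLTK ships a fuller list)
STOP_WORDS = {"a", "an", "the", "and", "i", "it", "was", "with", "this", "what"}


def preprocess(text):
    """Data Processing: tokenize, drop stop words, join the words back."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(kept)


cleaned = [preprocess(t) for t in texts]

# Split Data into Train and Test Sets (stratified to keep classes balanced)
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.25, random_state=0, stratify=labels
)

# Convert Text to Numerical Features: fit the vocabulary on training data only
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Train Model
clf = MultinomialNB().fit(X_train_counts, y_train)

# Evaluate Model
y_pred = clf.predict(X_test_counts)
print(classification_report(y_test, y_pred, zero_division=0))
cm = confusion_matrix(y_test, y_pred, labels=["neg", "pos"])
print(cm)

# Test on Custom Sentences
custom = ["Brilliant and wonderful, I loved it", "Awful, boring and terrible"]
custom_pred = clf.predict(vectorizer.transform([preprocess(s) for s in custom]))
for sentence, label in zip(custom, custom_pred):
    print(f"{sentence!r} -> {label}")
```

With the real dataset, the same steps apply after loading the reviews through `nltk.corpus.movie_reviews`. The scores from a 16-sentence toy corpus say nothing about real-world accuracy.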
Start
1. Define Input Matrix A (e.g., 2×3 matrix)
2. Perform SVD using U, S, Vt = np.linalg.svd(A)
3. Display Results:
- U (Left Singular Vectors)
- S (Singular Values)
- Vt (Right Singular Vectors)
4. Construct Sigma Matrix Σ (diagonal matrix of S)
5. Reconstruct Matrix A' = U × Σ × Vt
6. Compare A and A': Check if np.allclose(A, A')
7. Display Reconstructed A and Verification Result
End
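The SVD steps can be sketched with NumPy as below. The entries of the 2×3 matrix are an arbitrary example, not values from the lab manual. One detail the flowchart leaves implicit: with the default `full_matrices=True`, U is 2×2 and Vt is 3×3, so Σ has to be a 2×3 matrix with the singular values on its diagonal and zeros elsewhere, not a square `np.diag(S)`.

```python
import numpy as np

# Define Input Matrix A (2x3, arbitrary example values)
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Perform SVD: U is 2x2, S holds the 2 singular values, Vt is 3x3
U, S, Vt = np.linalg.svd(A)

# Display Results
print("U (left singular vectors):\n", U)
print("S (singular values):", S)
print("Vt (right singular vectors):\n", Vt)

# Construct Sigma Matrix: same shape as A, singular values on the diagonal
Sigma = np.zeros(A.shape)
Sigma[:len(S), :len(S)] = np.diag(S)

# Reconstruct Matrix A' = U x Sigma x Vt
A_reconstructed = U @ Sigma @ Vt

# Compare A and A'
is_close = np.allclose(A, A_reconstructed)
print("Reconstructed A:\n", A_reconstructed)
print("Reconstruction matches A:", is_close)
```

For this particular A, the product A·Aᵀ has eigenvalues 12 and 10, so the singular values are √12 and √10. This gives a quick hand check of the output.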
Start
1. Import Libraries
2. Load Dataset
- Load Iris dataset
- Separate features (X) and labels (y)
- Print original shape of data
3. Standardize Data
- Apply StandardScaler
- Transform features to zero mean & unit variance
4. Apply PCA
- Reduce dimensions (n_components=2)
- Fit PCA on scaled data
- Transform data to principal components
- Print reduced data shape
5. Visualize PCA Results
- Plot first 2 principal components
- Color-code by class labels
- Add legend, labels, grid, title
6. Analyze Explained Variance
- Print explained variance ratio for each component
- Print total variance explained by 2 components
End
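The PCA flow on the Iris dataset might look like the sketch below, using scikit-learn's bundled copy of the data. The helper name `plot_pca` is an assumption. It imports matplotlib lazily so the numerical part runs in environments without a display.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load Dataset and separate features (X) from labels (y)
iris = load_iris()
X, y = iris.data, iris.target
print("Original shape:", X.shape)  # (150, 4)

# Standardize Data: zero mean and unit variance per feature
X_scaled = StandardScaler().fit_transform(X)

# Apply PCA: fit on the scaled data and project onto 2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("Reduced shape:", X_pca.shape)  # (150, 2)

# Analyze Explained Variance
ratios = pca.explained_variance_ratio_
total_variance = ratios.sum()
print("Explained variance ratio per component:", ratios)
print("Total variance explained by 2 components:", total_variance)


def plot_pca(X_pca, y, target_names):
    """Visualize PCA Results: first two components, colored by class."""
    import matplotlib.pyplot as plt

    for label, name in enumerate(target_names):
        mask = y == label
        plt.scatter(X_pca[mask, 0], X_pca[mask, 1], label=name)
    plt.xlabel("Principal Component 1")
    plt.ylabel("Principal Component 2")
    plt.title("PCA of the Iris dataset")
    plt.legend()
    plt.grid(True)
    plt.show()
```

Calling `plot_pca(X_pca, y, iris.target_names)` in a notebook draws the scatter plot. Standardizing first matters because PCA is sensitive to feature scale.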
Start
1. Input Text Sentence
2. Text Preprocessing
- Tokenization (split into words)
3. Apply POS Tagging Algorithm
- Using NLTK POS Tagger
- Using spaCy NLP Model
4. Assign POS tags to each token (e.g., Noun, Verb)
5. Display Tagged Output
- Tokens with POS tags
- Compare NLTK & spaCy
End
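A hedged sketch of the POS tagging comparison follows. The function names, the example sentence, and the choice of the `en_core_web_sm` spaCy model are assumptions. The NLTK resource names differ between NLTK versions, so both the older and newer names are requested. The spaCy model must be installed separately (for example with `python -m spacy download en_core_web_sm`). spaCy's fine-grained `token.tag_` is used because it follows the same Penn Treebank style as NLTK's default English tagger.

```python
def tag_with_nltk(sentence):
    """Tokenize and tag a sentence with NLTK's default English tagger."""
    import nltk

    # Resource names vary by NLTK version; names unknown to the installed
    # version are reported by nltk.download but do not raise an exception.
    for resource in ("punkt", "punkt_tab",
                     "averaged_perceptron_tagger",
                     "averaged_perceptron_tagger_eng"):
        nltk.download(resource, quiet=True)

    tokens = nltk.word_tokenize(sentence)
    return nltk.pos_tag(tokens)


def tag_with_spacy(sentence, model_name="en_core_web_sm"):
    """Tokenize and tag a sentence with a spaCy pipeline (model assumed installed)."""
    import spacy

    nlp = spacy.load(model_name)
    return [(token.text, token.tag_) for token in nlp(sentence)]


def compare_tags(nltk_tags, spacy_tags):
    """Return (token, nltk_tag, spacy_tag, agree) rows for identical tokenizations."""
    if [tok for tok, _ in nltk_tags] != [tok for tok, _ in spacy_tags]:
        raise ValueError("tokenizations differ; align tokens before comparing")
    return [
        (tok, tag_a, tag_b, tag_a == tag_b)
        for (tok, tag_a), (_, tag_b) in zip(nltk_tags, spacy_tags)
    ]


def main(sentence="The quick brown fox jumps over the lazy dog"):
    """Run both taggers on one sentence and print the comparison."""
    nltk_tags = tag_with_nltk(sentence)
    spacy_tags = tag_with_spacy(sentence)
    for token, tag_a, tag_b, agree in compare_tags(nltk_tags, spacy_tags):
        marker = "" if agree else "  <- differs"
        print(f"{token:<10}{tag_a:<6}{tag_b:<6}{marker}")
```

Running `main()` in Colab prints one row per token. The two libraries can tokenize differently (for instance around contractions), which is why `compare_tags` refuses to compare misaligned sequences.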