Machine Learning Tasks

The document outlines a series of machine learning tasks for internships of varying lengths, requiring participants to complete a specified number of tasks. Each task includes a description, recommended dataset, tools and libraries, covered topics, and bonus challenges. The tasks range from student score prediction to traffic sign recognition, covering various machine learning concepts such as regression, classification, clustering, and recommendation systems.

Uploaded by

Muzamal1656

Note: Participants must complete at least 3 tasks for the 2-week internship and 4 tasks for the 1-month internship, from any level.

Level 1
Task 1: Student Score Prediction
Description:
Dataset (Recommended): Student Performance Factors (Kaggle)
Build a model to predict students' exam scores based on their study hours
Perform data cleaning and basic visualization to understand the dataset
Split the dataset into training and testing sets
Train a linear regression model to estimate final scores
Visualize predictions and evaluate model performance

Tools & Libraries:
Python | Pandas | Matplotlib | Scikit-learn

Covered Topics
Regression | Evaluation metrics

Bonus:

Try polynomial regression and compare performance

Try experimenting with different feature combinations (e.g., removing or adding features like sleep, participation, etc.)
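
The Task 1 workflow (split, fit, evaluate, then the polynomial bonus) can be sketched roughly as follows. This is only an illustration on synthetic data: the arrays `hours` and `scores` are made-up stand-ins for the study-hours and exam-score columns, whose real names in the Kaggle file are not given here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): score rises roughly linearly with hours.
rng = np.random.default_rng(0)
hours = rng.uniform(1, 10, size=200)
scores = 40 + 5 * hours + rng.normal(0, 3, size=200)

X = hours.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, scores, test_size=0.2, random_state=0
)

# Plain linear regression.
linear = LinearRegression().fit(X_train, y_train)
pred = linear.predict(X_test)
mae = mean_absolute_error(y_test, pred)
r2_linear = r2_score(y_test, pred)

# Bonus: degree-2 polynomial regression, evaluated on the same test split.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X_train, y_train)
r2_poly = r2_score(y_test, poly.predict(X_test))

print(f"linear: slope={linear.coef_[0]:.2f} MAE={mae:.2f} R^2={r2_linear:.3f}")
print(f"poly:   R^2={r2_poly:.3f}")
```

With the real dataset, the two arrays would come from a Pandas DataFrame, and a scatter plot of `y_test` against `pred` in Matplotlib would cover the visualization step.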

Task 2: Customer Segmentation

Description:
Dataset (Recommended): Mall Customer (Kaggle)
Cluster customers into segments based on income and spending score
Perform scaling and visual exploration of groupings
Apply K-Means clustering and determine optimal number of clusters
Visualize clusters using 2D plots

Tools & Libraries:
Python | Pandas | Matplotlib | Scikit-learn

Covered Topics
Clustering | Unsupervised learning

Bonus:

Try different clustering algorithms (e.g., DBSCAN)

Analyze average spending per cluster
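
One possible way to approach Task 2 is sketched below. It scales two features, tries several values of k, and picks the one with the best silhouette score; the silhouette criterion is a choice made for this sketch (an elbow plot of inertia is an equally common option). The data is synthetic, with made-up income and spending values standing in for the Mall Customer columns.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Three synthetic customer groups as (income, spending score); values are made up.
centers = np.array([[30, 80], [60, 50], [90, 20]])
X = np.vstack([c + rng.normal(0, 4, size=(50, 2)) for c in centers])

# K-Means is distance-based, so put both features on the same scale first.
X_scaled = StandardScaler().fit_transform(X)

# Score each candidate k by its mean silhouette coefficient.
silhouettes = {}
for k in range(2, 7):
    candidate = KMeans(n_clusters=k, n_init=10, random_state=0)
    silhouettes[k] = silhouette_score(X_scaled, candidate.fit_predict(X_scaled))
best_k = max(silhouettes, key=silhouettes.get)

# Refit with the chosen k and report average spending per cluster (bonus item).
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)
avg_spending = {c: float(X[labels == c, 1].mean()) for c in range(best_k)}
print("best k:", best_k)
print("average spending per cluster:", avg_spending)
```

For the 2D visualization step, a Matplotlib scatter plot of the two original columns colored by `labels` is usually enough.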

Level 2
Task 3: Forest Cover Type Classification

Description:
Dataset (Recommended): Covertype (UCI)
Predict the type of forest cover based on cartographic and environmental features
Clean and preprocess the data including categorical handling
Train and evaluate multi-class classification models
Visualize confusion matrix and feature importance

Tools & Libraries:
Python | Pandas | Scikit-learn | XGBoost

Covered Topics
Multi-class classification | Tree-based modeling

Bonus:

Compare different models (e.g., Random Forest vs. XGBoost)

Perform hyperparameter tuning
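
A minimal sketch of the Task 3 pipeline follows, assuming a random forest as the tree-based model and a synthetic 3-class dataset from `make_classification` in place of the real Covertype data. It shows the stratified split, the confusion matrix, and the feature-importance ranking; plotting is left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic multi-class data (assumption): 6 informative and 4 noise features.
X, y = make_classification(
    n_samples=1500, n_features=10, n_informative=6, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Stratify so every class appears in both splits in similar proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)  # rows: true class, columns: predicted class

# Indices of the three features the forest relied on most.
top_features = np.argsort(clf.feature_importances_)[::-1][:3]

print(f"accuracy={acc:.3f}")
print(cm)
print("top features:", top_features.tolist())
```

For the comparison bonus, an XGBoost classifier could be fitted on the same split and scored with the same metrics.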

Task 4: Loan Approval Prediction

Description:
Dataset (Recommended): Loan-Approval-Prediction-Dataset (Kaggle)
Build a model to predict whether a loan application will be approved
Handle missing values and encode categorical features
Train a classification model and evaluate performance on imbalanced data
Focus on precision, recall, and F1-score

Tools & Libraries:
Python | Pandas | Scikit-learn

Covered Topics
Binary classification | Imbalanced data

Bonus:

Use SMOTE or other techniques to address class imbalance

Try logistic regression vs. decision tree
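
The sketch below illustrates why Task 4 asks for precision, recall, and F1 rather than accuracy. It uses a synthetic dataset with roughly 10% positives and compares logistic regression with and without `class_weight="balanced"`. Class weighting is swapped in here as a simpler alternative to SMOTE, which lives in the separate imbalanced-learn package; the data and the 90/10 split are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary data (assumption): about 10% positive class.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

results = {}
for name, class_weight in [("unweighted", None), ("balanced", "balanced")]:
    clf = LogisticRegression(max_iter=1000, class_weight=class_weight)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    # Metrics are computed for the minority (positive) class, label 1.
    results[name] = {
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Swapping `LogisticRegression` for `DecisionTreeClassifier` in the loop gives the logistic regression vs. decision tree comparison from the bonus list.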

Task 5: Movie Recommendation System
Description:
Dataset (Recommended): MovieLens 100K Dataset (Kaggle)
Build a system that recommends movies based on user similarity
Use a user-item matrix to compute similarity scores
Recommend top-rated unseen movies for a given user
Evaluate performance using precision at K

Tools & Libraries:
Python | Pandas | Numpy | Scikit-learn

Covered Topics
Recommendation systems | Similarity-based modeling

Bonus:

Implement item-based collaborative filtering

Try matrix factorization (SVD)
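
As a rough illustration of Task 5, the sketch below builds user-based recommendations from a tiny, made-up user-item matrix using cosine similarity, then scores them with precision at K. The helper names (`cosine_similarity_matrix`, `recommend`, `precision_at_k`) and the held-out `relevant` set are hypothetical and only serve this example.

```python
import numpy as np

# Made-up user-item ratings (rows: users, columns: movies, 0 = not rated).
ratings = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 5, 1, 0],
    [1, 0, 2, 5, 4],
    [0, 1, 1, 4, 5],
], dtype=float)

def cosine_similarity_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = m / norms
    return unit @ unit.T

def recommend(user, ratings, k=2):
    """Return indices of the k unseen movies with the highest predicted rating."""
    sim = cosine_similarity_matrix(ratings)[user].copy()
    sim[user] = 0.0  # a user should not vote for themselves
    rated = (ratings > 0).astype(float)
    # Similarity-weighted average over the users who actually rated each movie.
    weight_sum = sim @ rated
    weight_sum[weight_sum == 0] = 1.0
    scores = (sim @ ratings) / weight_sum
    scores[ratings[user] > 0] = -np.inf  # never recommend already-seen movies
    return [int(i) for i in np.argsort(scores)[::-1][:k]]

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are in the relevant set."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

recs = recommend(0, ratings, k=2)
relevant = {2}  # hypothetical held-out movies that user 0 actually liked
print(recs, precision_at_k(recs, relevant, k=2))  # → [2, 3] 0.5
```

Item-based collaborative filtering follows the same pattern with the matrix transposed, so that similarity is computed between movies instead of users.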

Level 3
Task 6: Music Genre Classification
Description:
Dataset (Recommended): GTZAN (Kaggle)
Classify songs into genres based on extracted audio features
Preprocess features such as MFCCs or use spectrogram images
Train and evaluate a multi-class model using tabular or image data
If image-based, use a CNN model

Tools & Libraries:
Python | Librosa (for features) | Scikit-learn or Keras

Covered Topics
Audio data / CNNs | Multi-class classification

Bonus:

Try both tabular and image-based approaches and compare results

Use transfer learning on spectrograms

Task 7: Sales Forecasting

Description:
Dataset (Recommended): Walmart Sales Forecast (Kaggle)
Predict future sales based on historical sales data
Create time-based features (day, month, lag values)
Apply regression models to forecast next period's sales
Plot actual vs. predicted values over time

Tools & Libraries:
Python | Pandas | Matplotlib | Scikit-learn

Covered Topics
Time series forecasting | Regression

Bonus:

Use rolling averages and seasonal decomposition

Apply XGBoost or LightGBM with time-aware validation
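
The feature-engineering and validation ideas in Task 7 can be sketched as below, assuming a synthetic weekly sales series rather than the Walmart data. The column names (`date`, `sales`, `lag_1`, `lag_52`, `rolling_4`) are hypothetical. The key points are that lag and rolling features only look backwards, and that the train/test split is chronological instead of random.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic weekly sales with trend, yearly seasonality, and noise (made up).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-05", periods=156, freq="W")
t = np.arange(len(dates))
sales = 1000 + 2 * t + 100 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 20, len(t))
df = pd.DataFrame({"date": dates, "sales": sales})

# Time-based features. shift() ensures each row only sees past values.
df["month"] = df["date"].dt.month
df["lag_1"] = df["sales"].shift(1)    # last week's sales
df["lag_52"] = df["sales"].shift(52)  # same week one year earlier
df["rolling_4"] = df["sales"].shift(1).rolling(4).mean()  # mean of previous 4 weeks
df = df.dropna().reset_index(drop=True)

# Time-aware split: train on the earliest 80%, test on the most recent 20%.
features = ["month", "lag_1", "lag_52", "rolling_4"]
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

model = LinearRegression().fit(train[features], train["sales"])
pred = model.predict(test[features])
mae = mean_absolute_error(test["sales"], pred)
print(f"test MAE: {mae:.1f}")
```

Plotting `test["date"]` against both `test["sales"]` and `pred` gives the actual vs. predicted chart. For the bonus, the same split can be reused with XGBoost or LightGBM.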

Industry Level
Task 8: Traffic Sign Recognition

Description:
Dataset (Recommended): GTSRB (Kaggle)
Classify traffic signs based on their image using deep learning
Preprocess images (resizing, normalization)
Train a CNN model to recognize different traffic sign classes
Evaluate performance using accuracy and confusion matrix

Tools & Libraries:
Python | Keras | TensorFlow | OpenCV

Covered Topics
Computer vision (CNN) | Multi-class classification

Bonus:

Add data augmentation to improve performance

Compare custom CNN vs. pre-trained model (e.g., MobileNet)
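
The preprocessing step of Task 8 (resizing and normalization) might look roughly like this. The sketch uses plain NumPy with a nearest-neighbour resize so that it runs without OpenCV; in practice `cv2.resize` would normally do the resizing. The helper names, the 32x32 target size, and the random images are assumptions for illustration only.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of an HxWxC image to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols]

def preprocess(images, size=32):
    """Resize every image to a common shape and scale pixels to [0, 1]."""
    batch = np.stack([resize_nearest(img, size) for img in images])
    return batch.astype(np.float32) / 255.0

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows for a softmax output layer."""
    return np.eye(num_classes, dtype=np.float32)[labels]

# Random stand-ins for sign photos of different sizes (not real data).
rng = np.random.default_rng(0)
raw = [
    rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)
    for h, w in [(40, 50), (64, 64), (28, 35)]
]

X = preprocess(raw)
y = one_hot(np.array([0, 2, 1]), num_classes=3)
print(X.shape, X.dtype, y.shape)
```

The resulting `X` and `y` arrays have the shape a Keras CNN typically expects: a batch of equally sized float images and one row of class indicators per image.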
