
Assignment: End-to-End Machine Learning Pipeline
Objective
Apply everything you have learned so far to build a complete machine learning pipeline —
from raw data to model evaluation.

Dataset
Download a dataset of your choice from Kaggle. Do not use any built-in library datasets, such as those bundled with scikit-learn or seaborn.

🔹 Assignment Tasks

1. Data Handling (NumPy & Pandas)


• Load the dataset into a Pandas DataFrame.
• Perform initial checks (e.g., shape, data types, summary statistics, missing-value counts).
• Handle missing values and duplicates.
• Convert categorical features into numerical form if needed.
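
A minimal sketch of this step, assuming the Kaggle file is saved as data.csv and has a label column named target (both names are placeholders, not part of the assignment):

```python
import pandas as pd

# Load the raw Kaggle CSV (file name is a placeholder).
df = pd.read_csv("data.csv")

# Initial checks: shape, dtypes, summary statistics, missing-value counts.
print(df.shape)
df.info()
print(df.describe())
print(df.isna().sum())

# Handle duplicates and missing values (dropping is the simplest strategy;
# imputation with fillna() or sklearn's SimpleImputer is often preferable).
df = df.drop_duplicates().dropna()

# One-hot encode categorical feature columns, leaving the placeholder
# "target" label column untouched.
cat_cols = [c for c in df.columns if c != "target" and df[c].dtype == "object"]
df = pd.get_dummies(df, columns=cat_cols, drop_first=True)
```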

2. Exploratory Data Analysis (EDA)


• Use NumPy & Pandas for basic statistics.
• Visualize data using:
- Matplotlib / Seaborn
- Plotly: at least one interactive plot (e.g., scatter or bar chart).
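
A possible EDA sketch continuing from the DataFrame above; feature_1, feature_2, and target are placeholder column names to be replaced with real ones from your dataset:

```python
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Basic statistics with Pandas.
print(df.describe())
print(df.corr(numeric_only=True))

# Static plots: class balance and a correlation heatmap.
df["target"].value_counts().plot(kind="bar", title="Class balance")
plt.show()
sns.heatmap(df.corr(numeric_only=True), cmap="coolwarm")
plt.show()

# One interactive Plotly chart (column names are placeholders).
px.scatter(df, x="feature_1", y="feature_2", color="target").show()
```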

3. Feature Engineering
• Split dataset into features (X) and target (y).
• Normalize/scale data if necessary.
• Perform train-test split.
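
One way this step could look, again assuming a placeholder target column; splitting before scaling keeps test data out of the scaler fit:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split into features (X) and target (y); "target" is a placeholder name.
X = df.drop(columns=["target"])
y = df["target"]

# Train-test split first, so the scaler is fit only on training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features: fit on the training set, transform both sets.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```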

4. Model Training
• Train the following models:
- KNN Classifier
- Decision Tree Classifier
- Random Forest Classifier
• Compare baseline results.
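
A baseline-training sketch under the same assumptions, using default scikit-learn hyperparameters:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

models = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Fit each model with default hyperparameters and record baseline test accuracy.
baseline = {name: m.fit(X_train, y_train).score(X_test, y_test)
            for name, m in models.items()}
print(baseline)
```
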
5. Feature Importance
• Extract and visualize feature importance from Random Forest.
• Discuss which features contribute most to predictions.
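
A sketch of extracting the impurity-based importances from the Random Forest fitted above; the column names come from the pre-scaling feature DataFrame:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Impurity-based importances are exposed after fitting the Random Forest.
rf = models["Random Forest"]
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values()

importances.plot(kind="barh", title="Random Forest feature importances")
plt.tight_layout()
plt.show()
```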

6. Hyperparameter Tuning
• Use RandomizedSearchCV to optimize hyperparameters:
- KNN → n_neighbors, weights, metric
- Decision Tree → max_depth, min_samples_split
- Random Forest → n_estimators, max_depth, min_samples_split
• Compare default vs tuned models.
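
A tuning sketch with RandomizedSearchCV; the value ranges below are illustrative assumptions, not prescribed by the assignment:

```python
from sklearn.model_selection import RandomizedSearchCV

# Illustrative search spaces; adjust the ranges to your dataset.
param_distributions = {
    "KNN": {"n_neighbors": list(range(1, 31)),
            "weights": ["uniform", "distance"],
            "metric": ["euclidean", "manhattan", "minkowski"]},
    "Decision Tree": {"max_depth": [None, 5, 10, 20, 30],
                      "min_samples_split": [2, 5, 10, 20]},
    "Random Forest": {"n_estimators": [100, 200, 500],
                      "max_depth": [None, 5, 10, 20],
                      "min_samples_split": [2, 5, 10]},
}

# Tune each baseline model and keep the best estimator for later evaluation.
tuned = {}
for name, model in models.items():
    search = RandomizedSearchCV(model, param_distributions[name],
                                n_iter=20, cv=5, random_state=42, n_jobs=-1)
    search.fit(X_train, y_train)
    tuned[name] = search.best_estimator_
    print(name, search.best_params_, round(search.best_score_, 3))
```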

7. Model Evaluation
• Evaluate models using:
- Accuracy
- Precision, Recall, F1-score
- Confusion Matrix
• Plot ROC Curve for the best-performing model.
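
An evaluation sketch for the tuned models; the "best" model is picked by hand here, and the ROC curve assumes a binary target:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, RocCurveDisplay)

# Evaluate each tuned model on the held-out test set.
for name, model in tuned.items():
    y_pred = model.predict(X_test)
    print(name, "accuracy:", round(accuracy_score(y_test, y_pred), 3))
    print(classification_report(y_test, y_pred))  # precision, recall, F1
    print(confusion_matrix(y_test, y_pred))

# ROC curve for the best-performing model (assumes a binary target; for a
# multi-class target, plot one-vs-rest curves per class instead).
best = tuned["Random Forest"]  # replace with whichever model scored best
RocCurveDisplay.from_estimator(best, X_test, y_test)
plt.show()
```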

8. Conclusion
• Which model performed best and why?
• Which features were most important?
• How did hyperparameter tuning improve results?

Deliverables
1. Jupyter Notebook with well-commented code and results.
2. Report (2–3 pages) summarizing:
- Dataset insights
- Visualization findings
- Model comparison table
- Key conclusions
