Employee Salary Prediction Using Machine Learning Algorithms

BY: S. ADRIAN JUBAL
Bharath Institute of Higher Education and Research
Outline
• Problem statement
• System approach
• Algorithm & Deployment
• Result
• Conclusion
• References
Problem Statement
• Predict whether an individual's income exceeds or falls below $50K based on demographic and employment attributes.
• Automate the income classification process using machine learning algorithms.
• Enhance pre-qualification procedures in sectors such as HR, finance, and insurance.
• Use structured data to minimize manual screening, improving efficiency and accuracy.
• Showcase the practical application of ensemble learning techniques to real-world classification problems.
System Approach
• System Requirements: Python 3.12, Scikit-learn, XGBoost, Pandas,
• Libraries Used: sklearn, xgboost, pandas, numpy, matplotlib,
• Backend: Ensemble model that combines Random Forest, Logistic Regression, and XGBoost through a voting classifier.
• Data Source: UCI Adult Income dataset in CSV format, containing demographic and employment attributes.
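A minimal sketch of loading and cleaning Adult-style data with pandas is shown here. The inline sample rows, the presence of a header row, the column subset, and the helper name load_and_clean are all assumptions made for illustration. The "?" placeholder for missing values follows the way the UCI Adult files mark unknown entries.

```python
import io

import pandas as pd

# Tiny inline sample that mimics the UCI Adult layout.
# These rows are hypothetical and exist only so the sketch runs on its own.
SAMPLE_CSV = """age,workclass,education,hours-per-week,income
39, State-gov, Bachelors,40, <=50K
50, ?, Bachelors,13, <=50K
38, Private, HS-grad,40, >50K
38, Private, HS-grad,40, >50K
"""


def load_and_clean(source):
    """Read an Adult-style CSV, drop incomplete and duplicate rows,
    and split it into features X and a binary target y (1 means >50K)."""
    # skipinitialspace strips the blank after each comma, so " ?" is read as "?"
    df = pd.read_csv(source, skipinitialspace=True, na_values="?")
    df = df.dropna().drop_duplicates().reset_index(drop=True)
    X = df.drop(columns="income")
    y = (df["income"] == ">50K").astype(int)
    return X, y


X, y = load_and_clean(io.StringIO(SAMPLE_CSV))
print(X.shape, y.tolist())
```

With a real file, a path such as "adult.csv" (a hypothetical name) would be passed in place of the StringIO object.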
Algorithms
• Random Forest Classifier
• Logistic Regression
• XGBoost Classifier
• Voting Classifier (Ensemble)
• KMeans Clustering
• Isolation Forest
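The three classifiers can be combined with scikit-learn's VotingClassifier roughly as sketched here. The slides do not state the voting mode or any hyperparameters, so soft voting, the estimator counts, and the synthetic data from make_classification are assumptions. If xgboost is not installed, the sketch falls back to scikit-learn's GradientBoostingClassifier, which is a different boosting implementation and not the one named in the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier
    boosted = XGBClassifier(n_estimators=50, random_state=42)
except ImportError:
    # Fallback only so the sketch runs without xgboost (not the deck's model).
    from sklearn.ensemble import GradientBoostingClassifier
    boosted = GradientBoostingClassifier(n_estimators=50, random_state=42)

# Synthetic stand-in for the preprocessed Adult features (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Soft voting averages the predicted class probabilities of the three models.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("xgb", boosted),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

y_pred = ensemble.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(report)
```

Soft voting requires every member to expose predict_proba, which all three models here do. Hard voting (majority vote on predicted labels) would be the alternative.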
Steps & Measures
• Import required libraries
• Load and clean the dataset
• Separate features and target
• Identify numerical and categorical columns
• Create preprocessing pipelines
• Apply preprocessing using ColumnTransformer
• Detect and remove outliers using Isolation Forest
• Convert sparse matrix to dense if needed
• Add KMeans cluster label as a new feature
• Split data into training and testing sets
• Define individual models: Random Forest, Logistic Regression, XGBoost
• Create an ensemble model using VotingClassifier
• Train the ensemble model
• Evaluate the model: Accuracy + Classification Report
• Plot and save model accuracy
• Save the trained model with preprocessing and clustering using joblib
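The preprocessing steps above, from column identification through the train/test split, could look roughly like the following sketch. The synthetic DataFrame, the imputation strategies, the contamination rate of 0.05, and the choice of three clusters are assumptions, since the slides do not give these values.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the cleaned Adult features and target (assumption).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "hours_per_week": rng.integers(10, 60, n),
    "workclass": rng.choice(["Private", "State-gov", "Self-emp"], n),
})
y = rng.integers(0, 2, n)

# Identify numerical and categorical columns.
num_cols = X.select_dtypes(include="number").columns.tolist()
cat_cols = X.select_dtypes(exclude="number").columns.tolist()

# One preprocessing pipeline per column type, joined by ColumnTransformer.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer([
    ("num", numeric, num_cols),
    ("cat", categorical, cat_cols),
])
X_proc = preprocessor.fit_transform(X)

# Isolation Forest labels inliers as 1 and outliers as -1; keep the inliers.
iso = IsolationForest(contamination=0.05, random_state=42)
inliers = iso.fit_predict(X_proc) == 1
X_clean, y_clean = X_proc[inliers], y[inliers]

# Convert sparse matrix to dense if needed.
if sparse.issparse(X_clean):
    X_clean = X_clean.toarray()

# Append the KMeans cluster label as one extra feature column.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X_clean)
X_final = np.column_stack([X_clean, clusters])

X_train, X_test, y_train, y_test = train_test_split(
    X_final, y_clean, test_size=0.2, random_state=42
)
print(X_final.shape, X_train.shape, X_test.shape)
```

This follows the order given in the list, so the preprocessor, Isolation Forest, and KMeans are fitted before the split. Fitting them on the training portion only would be a common variation that avoids leaking test-set statistics.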
Results
Links
• Model Link: Prediction model code
• Google Colab: Prediction Model
• Short Description:
  • 🔗 [GitHub Link]: Complete project as ZIP (includes outcome, trained model, training code, and presentation)
  • 📦 [Drive Link]: Direct download of trained model file (.pkl)
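A sketch of how a model file saved with joblib could be written and later reloaded for prediction is given here. The bundle layout (a dict with the keys "preprocessor", "kmeans", and "model"), the file name salary_model.pkl, and the helper predict are hypothetical. A StandardScaler and a single LogisticRegression stand in for the full preprocessing and ensemble to keep the example short.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic numeric data standing in for the real features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Fit the three pieces that must travel together: preprocessing,
# clustering, and the classifier trained on features plus cluster label.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)
features = np.column_stack([X_scaled, kmeans.labels_])
model = LogisticRegression(max_iter=1000).fit(features, y)

# Save everything in one file so prediction can replay the same steps.
bundle = {"preprocessor": scaler, "kmeans": kmeans, "model": model}
path = os.path.join(tempfile.mkdtemp(), "salary_model.pkl")
joblib.dump(bundle, path)


def predict(bundle, X_new):
    """Apply preprocessing, add the cluster label, then classify."""
    X_p = bundle["preprocessor"].transform(X_new)
    cluster_labels = bundle["kmeans"].predict(X_p)
    return bundle["model"].predict(np.column_stack([X_p, cluster_labels]))


loaded = joblib.load(path)
preds = predict(loaded, X[:5])
print(preds)
```

joblib files are pickle-based, so a downloaded .pkl should be loaded only when its source is trusted, and ideally with the same library versions that were used to save it.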
References

1. Scikit-learn Documentation
https://scikit-learn.org/stable/documentation.html

2. XGBoost Documentation
https://xgboost.readthedocs.io/

3. PyInstaller Docs
https://pyinstaller.org/
Conclusion

Summarize the findings and discuss the effectiveness of the proposed solution. Highlight any challenges encountered during the implementation and potential improvements.
THANK YOU