BSc Data Science Sem 4
ASSIGNMENT 2: Machine Learning
Marks: 10
Date of Submission: 10/02/2025
• DOMAIN: Banking and finance
• CONTEXT: Bank X is undergoing a massive digital transformation across all its departments. The bank has a growing
customer base, the majority of whom are liability customers (depositors) rather than borrowers. The bank is interested in
rapidly expanding its borrower base to bring in more business through loan interest. A campaign that the bank ran last quarter
showed an average single-digit conversion rate. With digital transformation as the core strength of its business
strategy, the marketing department wants to devise more effective, better-targeted campaigns that
raise the conversion rate to double digits on the same budget as the last campaign.
• DATA DESCRIPTION: The data consists of the following attributes:
1. ID: Customer ID
2. Age: Customer’s approximate age.
3. CustomerSince: Customer of the bank since. [encrypted unit]
4. HighestSpend: Customer’s highest spend so far in one transaction. [encrypted unit]
5. ZipCode: Customer’s zip code.
6. HiddenScore: A score associated with the customer, masked by the bank as its IP.
7. MonthlyAverageSpend: Customer’s monthly average spend so far. [encrypted unit]
8. Level: A level associated with the customer, masked by the bank as its IP.
9. Mortgage: Customer’s mortgage. [encrypted unit]
10. Security: Customer’s security asset with the bank. [encrypted unit]
11. FixedDepositAccount: Customer’s fixed deposit account with the bank. [encrypted unit]
12. InternetBanking: Whether the customer uses internet banking.
13. CreditCard: Whether the customer uses the bank’s credit card.
14. LoanOnCard: Whether the customer has a loan on the credit card.
• PROJECT OBJECTIVE: Build an AIML model to perform focused marketing by predicting the potential
customers who will convert, using the historical dataset.
Steps and tasks: [ Total : 60 points ]
1. Import:
• Import all the given datasets and explore the shape and size of each.
• Merge all datasets into one and explore the final shape and size.
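As a hint for the import step, the following is a minimal sketch rather than a prescribed solution. It assumes the data arrives as two tables that share the ID column; the frame names `part1` and `part2` and all values are made up for illustration. With the real files, each frame would instead be loaded with `pd.read_csv` on the provided path.

```python
import pandas as pd

# Tiny in-memory stand-ins for the provided files (all values are made up).
part1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Age": [25, 45, 39, 35],
    "HighestSpend": [49, 34, 11, 100],
})
part2 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Mortgage": [0, 0, 0, 155],
    "LoanOnCard": [0.0, 0.0, None, 1.0],
})

# Shape is (rows, columns); size is rows * columns.
for name, frame in [("part1", part1), ("part2", part2)]:
    print(name, "shape:", frame.shape, "size:", frame.size)

# Inner join on the shared key. validate raises an error if an ID is
# duplicated on either side, which would silently multiply rows otherwise.
merged = part1.merge(part2, on="ID", how="inner", validate="one_to_one")
print("merged shape:", merged.shape, "size:", merged.size)
```

Comparing the merged row count with the row counts of the inputs is a quick check that no customers were lost or duplicated by the join.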
2. Data cleansing:
• Explore and, if required, correct the datatypes of each attribute.
• Explore the attributes for null values and, if required, drop or impute them.
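One possible way to approach the cleansing step is sketched below on made-up data. Two choices here are assumptions, not requirements of the assignment: rows with a missing target are dropped rather than imputed, and the coded columns (ZipCode, Level, CreditCard) are treated as categories.

```python
import pandas as pd

# Made-up sample using column names from the data description.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Age": [25, 45, 39, 35, 52],
    "ZipCode": [91107, 90089, 94720, 94112, 91330],
    "Level": [1, 1, 2, 3, 2],
    "CreditCard": [0, 1, 0, 1, 0],
    "LoanOnCard": [0.0, None, 0.0, 1.0, 0.0],
})

print(df.dtypes)          # datatype of each attribute
print(df.isnull().sum())  # number of nulls per attribute

# Assumption: a row without a target label cannot be used for supervised
# training, so it is dropped instead of imputed.
clean = df.dropna(subset=["LoanOnCard"]).copy()
clean["LoanOnCard"] = clean["LoanOnCard"].astype(int)

# Assumption: these columns are codes, not quantities, so they are
# stored as categories rather than integers.
for col in ["ZipCode", "Level", "CreditCard"]:
    clean[col] = clean[col].astype("category")

print(clean.dtypes)
```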
3. EDA:
• Perform detailed statistical analysis on the data.
• Perform detailed univariate, bivariate and multivariate analysis, with appropriate detailed comments after
each analysis.
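The three levels of analysis can be illustrated without any plotting library. The sketch below uses made-up values and only a handful of the columns; it is a starting point, and plots plus written comments would still be needed for a full answer.

```python
import pandas as pd

# Made-up sample; LoanOnCard is the target.
df = pd.DataFrame({
    "Age": [25, 45, 39, 35, 52, 29, 61, 48],
    "HighestSpend": [49, 34, 11, 100, 180, 22, 150, 60],
    "MonthlyAverageSpend": [1.6, 1.5, 1.0, 2.7, 6.5, 0.4, 5.0, 2.0],
    "CreditCard": [0, 1, 0, 1, 0, 0, 1, 0],
    "LoanOnCard": [0, 0, 0, 0, 1, 0, 1, 0],
})

# Univariate: summary statistics per column and the target distribution.
summary = df.describe().T
print(summary[["mean", "std", "min", "max"]])
print(df["LoanOnCard"].value_counts(normalize=True))

# Bivariate: how one predictor relates to the target.
by_target = df.groupby("LoanOnCard")["HighestSpend"].mean()
conv_by_card = pd.crosstab(df["CreditCard"], df["LoanOnCard"], normalize="index")
print(by_target)
print(conv_by_card)

# Multivariate: pairwise correlations across all numeric columns.
corr = df.corr()
print(corr.round(2))
```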
4. Data pre-processing:
• Segregate predictors from the target attribute.
• Check the target for class balance and fix it if found imbalanced.
• Perform train-test split.
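The pre-processing tasks above could be sketched as follows, on synthetic data with a deliberate 10% positive rate. Note one design choice that differs from the order of the bullets: the split is done first and only the training portion is oversampled, so that duplicated minority rows cannot leak into the test set. Random oversampling with `sklearn.utils.resample` is just one of several balancing options.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic data: 200 customers, every 10th one is a converter.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "HighestSpend": rng.integers(10, 200, size=n),
    "MonthlyAverageSpend": rng.uniform(0, 8, size=n).round(1),
    "Mortgage": rng.integers(0, 300, size=n),
})
df["LoanOnCard"] = (np.arange(n) % 10 == 0).astype(int)

# Segregate predictors and target, then inspect the class balance.
X = df.drop(columns="LoanOnCard")
y = df["LoanOnCard"]
print(y.value_counts())

# Stratified split keeps the class ratio the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Oversample the minority class in the training data only.
train = X_train.assign(LoanOnCard=y_train)
majority = train[train["LoanOnCard"] == 0]
minority = train[train["LoanOnCard"] == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)

X_train_bal = balanced.drop(columns="LoanOnCard")
y_train_bal = balanced["LoanOnCard"]
print(y_train_bal.value_counts())
```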
5. Model training:
• Design and train Logistic Regression and Naive Bayes classifiers.
• Design and train a Random Forest model.
• Design a Logistic Regression model.
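A minimal sketch of training the three model families with scikit-learn is given below. The data comes from `make_classification` and merely stands in for the bank dataset; the hyperparameters shown (`max_iter`, `n_estimators`, `class_weight="balanced"`) are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the bank data (about 10% positives).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Scaling helps the logistic regression solver converge.
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
}

# Fit every model on the same training split.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "trained on", X_train.shape[0], "rows")
```

Keeping the models in one dictionary makes it easy to loop over them again later when comparing train and test scores.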
6. Model selection:
• Display the classification accuracies for train and test data.
• Display and explain the classification report in detail.
• Apply all the possible tuning techniques to train the best model for the given data. Select the
final best trained model, with your comments on why you selected it.
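The evaluation and tuning tasks could look roughly like the sketch below, again on synthetic data. Grid search over a small, arbitrary parameter grid is shown as one example of a tuning technique; the grid values and the choice of F1 as the selection metric are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the bank data.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; F1 is used because accuracy alone can look
# good on imbalanced data even when few converters are found.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid, scoring="f1", cv=3,
)
search.fit(X_train, y_train)
best = search.best_estimator_
print("best parameters:", search.best_params_)

# Train vs test accuracy: a large gap suggests overfitting.
train_acc = accuracy_score(y_train, best.predict(X_train))
test_acc = accuracy_score(y_test, best.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# Precision, recall and F1 per class on the held-out test data.
report = classification_report(y_test, best.predict(X_test), zero_division=0)
print(report)
```

For a campaign with a fixed budget, the precision and recall of the positive class in the report are usually more informative than overall accuracy.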
7. Conclusion and improvisation:
• Write your conclusion on the results.
• Give detailed suggestions for improving the quality, quantity, variety, velocity, veracity, etc. of the data
points collected by the bank, so that a better data analysis can be performed in future.