AN
INTERNSHIP REPORT ON
"Creating a Diabetes Prediction Model Using Support Vector Machine"
SUBMITTED TO SAVITRIBAI PHULE PUNE UNIVERSITY IN PARTIAL
FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD OF THE DEGREE OF
THIRD YEAR OF ENGINEERING (INFORMATION TECHNOLOGY)
SUBMITTED BY
Mr. Kaif Mahaldar Exam Seat No. T190438541
Guided by Prof. P.S. Bangare
DEPARTMENT OF INFORMATION TECHNOLOGY
SINHGAD ACADEMY OF ENGINEERING
KONDHWA, PUNE 411048
SAVITRIBAI PHULE PUNE UNIVERSITY
2022-23
DEPARTMENT OF INFORMATION TECHNOLOGY
SINHGAD ACADEMY OF ENGINEERING
KONDHWA, PUNE 411048
CERTIFICATE
This is to certify that the internship report entitled
"Creating a Diabetes Prediction Model Using Support Vector Machine"
submitted by
Mr. Kaif Mahaldar, Exam Seat No. T190438541,
is a bonafide work carried out by him under the supervision of Prof. P.S. Bangare and
is submitted towards the fulfillment of the requirements of Savitribai Phule Pune University for
the award of the degree of Third Year of Engineering (Information Technology).
Prof. P.S. Bangare                    Dr. S.S. Kulkarni
Guide                                 HOD
Department of Information Technology
Dr. K.P. Patil
Principal,
Sinhgad Academy of Engineering, Pune – 48
Place:
Date:
Table of Contents
List of Figures
Internship Completion Certificate
Acknowledgement
Abstract
Chapter 1: About Company
1.1 Business and Experience
Chapter 2: Introduction
2.1 Introduction
2.1.1 Comprehensive Data Cleaning
2.1.2 Support Vector Machine (SVM)
2.1.3 Making a Predictive System
2.2 Basic Terminologies
2.2.1 Language Used
2.2.2 How Prediction Works in ML
2.2.3 Version Control
2.2.4 Why Python Is Used
Chapter 3: System Requirements
3.1 Software and Tools
3.2 Hardware
Chapter 4: Model Building
4.1 Import Dependencies
4.2 Load and Prepare Data
4.3 Feature Scaling
4.4 Train the SVM Model
4.5 Make Predictions
4.6 Evaluate the Model
Chapter 5: Implementation
Chapter 6: Conclusion
LIST OF FIGURES
Fig. No. 1: Company Logo
Fig. No. 2: Support Vector Machine (SVM)
Fig. No. 3: Importing Dependencies
Fig. No. 4: Confusion Matrix
Internship Completion Certificate
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to everyone who made my internship possible and
helped me throughout the duration of the program.
I would like to thank the entire team for their support and assistance. Their encouragement and
motivation helped me to overcome challenges and complete my tasks successfully. I am thankful
to my faculty mentor, Prof. P.S. Bangare, for giving me valuable suggestions for completing my
internship report.
I am thankful to Dr. S.S. Kulkarni, Head of Department, Information Technology, for his
motivating and valuable support throughout the course.
Finally, I would like to thank my colleagues for their cooperation and support. Their camaraderie
and teamwork made my internship experience enjoyable and memorable.
Thank you once again for this wonderful opportunity, and I look forward to applying the
knowledge and skills gained during my internship in my future endeavors.
Mr. Kaif Mahaldar
ABSTRACT
It has been an excellent opportunity for me to work in one of the most illustrious companies in its
field. This training has certainly helped me to bridge the gap between theory and practice. It
has empowered me to see how the knowledge gained through textbooks is implemented in practice.
I encountered the latest fields of technology, which were previously unknown to me, thus broadening
my knowledge base as well as strengthening my credentials. From these projects, I got an opportunity
to understand how work is carried out. I also learnt how to research the various things needed for a
project. This internship has helped me to understand the need for the collaborative efforts of various
people at different levels in different departments in achieving business growth and set targets.
The report provides an overview of the tasks undertaken during the internship, including the
building of a diabetes prediction model. Additionally, the report details the skills and knowledge
gained during the internship.
The report concludes with a summary of the valuable experience gained during the internship and
how it has contributed to the development of my professional skills and knowledge in the field of
machine learning.
Chapter No. 01
About Company
1.1 Business and Experience
Fig. No. 1: Company Logo
INTERNSHALA is a dot com business with the heart of dot org. We are a technology company on a
mission to equip students with relevant skills & practical exposure to help them get the best possible
start to their careers. Imagine a world full of freedom and possibilities. A world where you can discover
your passion and turn it into your career. A world where you graduate fully assured, confident, and
prepared to stake a claim on your place in the world.
Internshala is an online recruitment and training platform. On the recruitment portal of Internshala,
internship seekers and job-seeking freshers from all over India, across different education
streams, can search and apply for internships and fresher jobs of their choice at various
organisations. Additionally, startups, corporates, SMEs, NGOs, education institutes, and big
brands can post their intern and entry-level job requirements to hire university students and fresh
graduates all over India. On Internshala Trainings, the e-learning arm of Internshala, online
learners, including students and professionals, can avail themselves of online trainings in the
latest in-demand industry skills.
Chapter No. 02
Introduction
2.1 Introduction:
During my AIML internship at Internshala, I delved into the transformative
realms of Artificial Intelligence and Machine Learning. Collaborating with seasoned professionals,
I honed skills in algorithm development, model implementation, and data analysis. This report
encapsulates my journey, highlighting projects undertaken, challenges tackled, and lessons gleaned.
It underscores the pivotal role of AIML in reshaping industries and outlines its significance within
the context of Internshala's mission. Through this succinct reflection, I aim to
convey the profound impact of AIML while expressing gratitude for the enriching opportunity to
contribute to its advancement.
2.1.1 Comprehensive Data Cleaning:
Comprehensive data cleaning is a critical aspect of preparing datasets for Artificial Intelligence and
Machine Learning (AIML) projects. This process involves a series of essential steps to ensure the
dataset's accuracy, consistency, and suitability for training machine learning models. It begins with
handling missing values, employing techniques like imputation or deletion, and extends to dealing
with outliers through detection and treatment methods. Categorical variables are encoded into
numerical format to make them compatible with machine learning algorithms, while duplicate
records are identified and removed to enhance model performance. Data quality issues are
addressed, including inconsistencies or errors, through manual inspection or validation checks.
Feature engineering may be employed to create or derive new features from existing ones,
enhancing the predictive power of the models. Skewed data distributions are normalized or
transformed to improve model performance, and imbalanced data in classification tasks is handled
through oversampling, undersampling, or specialized algorithms. Additionally, the dataset is split
into training, validation, and test sets to evaluate model performance effectively and mitigate
overfitting risks. Thorough documentation of the data cleaning process ensures transparency and
reproducibility, contributing to the overall reliability of AIML models trained on the dataset.
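As a brief illustration, a minimal pandas sketch of a few of these cleaning steps follows; the file name 'data.csv' and the column name 'category' are hypothetical and not taken from the internship project.

    import pandas as pd

    # Load a hypothetical dataset (file name is illustrative only)
    df = pd.read_csv('data.csv')

    # Remove duplicate records to avoid biasing the model
    df = df.drop_duplicates()

    # Impute missing numeric values with each column's median
    df = df.fillna(df.median(numeric_only=True))

    # One-hot encode a hypothetical categorical column into numeric format
    df = pd.get_dummies(df, columns=['category'])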
2.1.2 Support Vector Machine (SVM):
Support vector machine is a supervised learning algorithm used for classification and regression
problems. Support vector machine is extremely favored by many as it produces notable accuracy
with less computation power. It is mostly used in classification problems. We have three types of
learning: supervised, unsupervised, and reinforcement learning. A support vector machine is a
discriminative classifier formally defined by a separating hyperplane. Given labeled training data,
the algorithm outputs the best hyperplane, which classifies new examples. In two-dimensional space, this
hyperplane is a line splitting the plane into two parts, with each class lying on either side. The intention
of the support vector machine algorithm is to find a hyperplane in an N-dimensional space that
distinctly classifies the data points.
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for
classification and regression tasks. The main idea behind SVM is to find the best boundary (or
hyperplane) that separates the data into different classes.
In the case of classification, an SVM algorithm finds the best boundary that separates the data into
different classes. The boundary is chosen in such a way that it maximizes the margin, which is the
distance between the boundary and the closest data points from each class. These closest data points
are called support vectors.
Fig. No. 2: Support Vector Machine (SVM)
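To make the idea concrete, the following minimal scikit-learn sketch (illustrative, not taken from the internship project) fits a linear SVM on two toy clusters of 2-D points; the support vectors it reports are the closest points that define the margin described above.

    from sklearn.svm import SVC

    # Toy 2-D data: class 0 near the origin, class 1 further out
    X = [[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]]
    y = [0, 0, 0, 1, 1, 1]

    clf = SVC(kernel='linear')
    clf.fit(X, y)

    print(clf.predict([[1.5, 1.5], [9.5, 9.5]]))  # expected: [0 1]
    print(clf.support_vectors_)                   # the margin-defining points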
2.1.3 Making a Predictive System:
Creating a predictive system in AI and ML involves a series of interconnected steps aimed at
harnessing data to make accurate predictions. The process begins with clearly defining the problem
to be addressed and collecting relevant data from diverse sources. Once gathered, the data
undergoes preprocessing, including cleaning, normalization, and feature engineering to prepare it
for model training. With the data ready, suitable machine learning algorithms are selected and
trained using the processed dataset, followed by rigorous evaluation to ensure model accuracy and
reliability. Upon selecting the best-performing model, it is deployed into a production environment,
often accompanied by a user-friendly interface for input and output. Post-deployment, continuous
monitoring and maintenance are essential to ensure model performance remains optimal over time.
This iterative process allows for ongoing improvement and refinement of the predictive system,
adapting to changing data and requirements, ultimately delivering valuable insights and predictions
in various domains.
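As an example of the final prediction step of such a system, the sketch below assumes a trained model and fitted scaler like the ones built later in Chapter 4; the input values are illustrative only, and the feature order assumes the Pima diabetes dataset.

    import numpy as np

    # One hypothetical patient record (illustrative values, Pima-style feature order)
    sample = np.array([[6, 148, 72, 35, 0, 33.6, 0.627, 50]])

    # Apply the same scaling used during training, then predict
    sample_scaled = scaler.transform(sample)   # 'scaler' fitted during training (see Chapter 4)
    prediction = model.predict(sample_scaled)  # 'model' trained earlier (see Chapter 4)
    print('Diabetic' if prediction[0] == 1 else 'Non-diabetic')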
2.2 Basic Terminologies:
2.2.1 Language Used:
Python, with libraries like TensorFlow, PyTorch, and scikit-learn, is a popular choice for AIML
projects. TensorFlow and PyTorch offer deep learning capabilities, while scikit-learn provides a
wide range of machine learning algorithms. Pandas and NumPy are essential for data manipulation,
and Matplotlib aids in data visualization. Python's simplicity, versatility, and extensive libraries
make it well-suited for tasks ranging from data preprocessing and model training to deployment in
AIML applications.
2.2.2 How Prediction Works in ML:
Prediction in machine learning involves using trained models to estimate the outcome or value of a
new input based on patterns learned from historical data. When presented with unseen data, the
model applies the learned relationships between input features and target values to make
predictions. This process typically involves feeding the input data into the trained model, which
applies mathematical operations based on the model's parameters to generate a prediction.
Depending on the problem type, predictions may involve classification (assigning a category or
label) or regression (predicting a continuous value). The accuracy of predictions is evaluated using
performance metrics tailored to the specific problem domain.
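A minimal sketch, with made-up numbers, of the two prediction types described above:

    from sklearn.linear_model import LinearRegression, LogisticRegression

    X = [[1], [2], [3], [4]]

    # Classification: the model assigns a category or label
    clf = LogisticRegression().fit(X, [0, 0, 1, 1])
    print(clf.predict([[2.5]]))   # a class label, 0 or 1

    # Regression: the model predicts a continuous value
    reg = LinearRegression().fit(X, [2, 4, 6, 8])
    print(reg.predict([[2.5]]))   # approximately 5.0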
2.2.3 Version Control:
GitHub is a widely used collaboration platform built on the Git version control system that allows
developers to manage code projects. It enables users to track changes, coordinate work with team members, and
maintain a complete history of revisions. By using GitHub, developers can streamline development
workflows, ensure code integrity, and facilitate seamless collaboration across distributed teams.
2.2.4 Why Python Is Used:
Python is preferred in AI and ML due to its simplicity, versatility, and rich ecosystem of libraries.
Libraries like TensorFlow, PyTorch, and scikit-learn offer powerful tools for building and training
complex models with ease. Python's readability and clean syntax facilitate rapid development and
experimentation, while its extensive community support and vast array of resources make it
accessible to developers of all levels. Its integration capabilities with other languages and platforms
further enhance its utility in AI and ML projects.
Chapter No. 03
System Requirements
3.1 Software and Tools:

Software              Version
Jupyter               v7.1
Python                v3.10

3.2 Hardware:

Hardware              Minimum Requirement
Processor             2.0 GHz core processor
RAM                   4 GB
Storage Space         10 GB
Display               1920 x 1080 resolution
Chapter No. 04
Model Building
4.1 Import Dependencies:
In this step, necessary libraries are imported. pandas is used for data manipulation and loading the dataset.
train_test_split from Scikit-learn is employed to split the dataset into training and testing sets. StandardScaler is
used to scale the features, ensuring they have a mean of 0 and a standard deviation of 1, which is important for
SVMs. SVC is the Support Vector Classifier from Scikit-learn, the implementation of SVM. Finally, we
import evaluation metrics like accuracy_score, classification_report, and confusion_matrix to assess the
model's performance.
Fig. No. 3: Importing Dependencies
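Since Fig. No. 3 is a screenshot, the imports it describes would read roughly as follows in text form:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix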
4.2 Load and Prepare Data:
In this step, the dataset is loaded using pd.read_csv from pandas. The dataset is assumed to be a CSV file, and
the URL is replaced with the actual dataset URL. Next, the features (X) and the target variable (y) are
separated. The drop function is used to remove the target variable from the features. Then, the dataset is split
into training and testing sets using train_test_split. Here, 80% of the data is used for training (X_train, y_train),
and 20% is used for testing (X_test, y_test).
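A sketch of this step is given below; 'diabetes.csv' stands in for the actual dataset URL (which the report does not reproduce), 'Outcome' is assumed to be the target column name as in the Pima diabetes dataset, and a fixed random_state is added here only for reproducibility.

    # Load the dataset ('diabetes.csv' is a placeholder for the real URL)
    data = pd.read_csv('diabetes.csv')

    # Separate features (X) from the target variable (y)
    X = data.drop('Outcome', axis=1)
    y = data['Outcome']              # 0 = non-diabetic, 1 = diabetic

    # 80% of the data for training, 20% for testing
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)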
4.3 Feature Scaling:
Feature scaling is performed to ensure all features have the same scale. This step is crucial for SVMs because
they are sensitive to the scale of features. The StandardScaler object is created and fitted to the training data
(X_train) using fit_transform. This calculates the mean and standard deviation of each feature and transforms
the data. The same transformation is then applied to the testing data (X_test) using transform. Now, both
X_train and X_test are scaled and ready for model training.
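Continuing from the split above, this step might look like:

    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # fit on training data, then transform it
    X_test = scaler.transform(X_test)        # reuse the same transformation on test data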
4.4 Train the SVM Model:
Here, the SVM model is initialized using SVC (Support Vector Classifier) with an RBF (Radial Basis
Function) kernel. The RBF kernel is commonly used and effective for non-linear datasets. The gamma
parameter is set to 'scale', which automatically uses 1 / (n_features * X.var()) as the value of gamma. The C
parameter, set to 1.0, is the regularization parameter that controls the trade-off between smooth decision
boundary and classifying points correctly. The model is then trained on the scaled training data (X_train,
y_train) using the fit method.
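The corresponding code might look like:

    # RBF-kernel SVM with the parameters described above
    model = SVC(kernel='rbf', gamma='scale', C=1.0)
    model.fit(X_train, y_train)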
4.5 Make Predictions:
After training the model, it's used to make predictions on the testing set (X_test) using the predict method. The
predicted values (y_pred) are obtained based on the features in X_test.
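In code, continuing from the trained model above:

    y_pred = model.predict(X_test)   # predicted labels for the test set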
4.6 Evaluate the Model:
Finally, the model's performance is evaluated using various metrics. The accuracy_score calculates the
accuracy of the model by comparing the predicted values (y_pred) with the actual values (y_test). The
classification_report provides a detailed summary of the model's performance, including precision, recall, F1-
score, and support for each class (0 for non-diabetic, 1 for diabetic). The confusion_matrix displays the counts
of true positives, false positives, true negatives, and false negatives, providing insight into the model's behavior
across different classes.
Fig. No. 4: Confusion Matrix
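A sketch of the evaluation step, continuing from the predictions above:

    print('Accuracy:', accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))   # precision, recall, F1-score, support
    print(confusion_matrix(y_test, y_pred))        # [[TN, FP], [FN, TP]] counts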
Chapter No. 05
Implementation
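The original chapter presented the implementation as code screenshots, which did not survive extraction. Under the same assumptions noted in Chapter 4 (a hypothetical 'diabetes.csv' path and an 'Outcome' target column, as in the Pima diabetes dataset), a consolidated sketch of the full pipeline might look like this:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

    # Load and split the data (path and column name are assumptions; see Chapter 4)
    data = pd.read_csv('diabetes.csv')
    X = data.drop('Outcome', axis=1)
    y = data['Outcome']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Scale features to zero mean and unit variance
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Train the RBF-kernel SVM and evaluate it
    model = SVC(kernel='rbf', gamma='scale', C=1.0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print('Accuracy:', accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))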
Chapter No. 06
Conclusion
My internship was a valuable and enriching experience that provided me with
practical knowledge and hands-on experience in the field of Data Science. During
my 4-week internship, I was able to learn about and work on data science, including
machine learning.
Throughout my internship, I was able to work with a team of experienced
professionals who provided guidance and support, allowing me to gain valuable
insights into the industry's best practices and standards. I was also able to develop
my communication and collaboration skills by working with other team members
and communicating with clients.
Overall, my internship has helped me to gain practical knowledge and experience
in the field of data science, which will be useful in my future
career endeavors. I am grateful for the opportunity to have worked with such a
talented and experienced team and am confident that the skills and knowledge
gained during my internship will help me to succeed in my future career.