0% found this document useful (0 votes)

2K views23 pages

Exposys Data Labs: Internship Report On Data Science Project

This internship report summarizes a data science project to predict diabetes using machine learning algorithms. The intern analyzed diabetes data and built prediction models using K-Nearest Neighbors and Support Vector Machine algorithms. The K-Nearest Neighbors model achieved the highest accuracy in predicting whether a patient was diabetic or not. The goal of the project was to help identify individuals at risk of diabetes so they can better manage their health.

Uploaded by

Dhyeaya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2K views23 pages

Exposys Data Labs: Internship Report On Data Science Project

Uploaded by

Dhyeaya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 23

Internship Report

on
Data Science Project

Topic :
Diabetes Disease Prediction

Submitted
by
Richa Jha

Under the Guidance

of
Y .Vishnuvardhan

Performed at

Exposys Data Labs

P.M R. Residency
Ground Floor, No-5/3 Sy. No.10/6-1
Doddaballapur Main Road
Yelahanka Bengaluru, Karnataka 560064

Duration of Internship :
1 Month

Abstract
Diabetes is a most common disease in these world affecting major part of
the population. Although it is a global concern but still most part of
population are unaware to deal with it. When the level of glucose in our
blood increased (often called hyperglycaemia) above the required level
then the state is called Diabetes. Inability of pancreas to produce required
insulin or inability to use it properly to convert glucose into energy is the
cause of Diabetes. Diabetes majorly of 3 types : Diabetes Mellitus,
Diabetes Insipidus, Gestational Diabetes. So in this project. It is a Data
science project in which we need to predict whether a given data point is
diabetic or not. Since it comes under Classification technique which is a
type of Supervised Machine Learning, here the model is built using
different classification algorithms like K-Nearest Neighbors, and Support
Vector Machine. The model built with K-Nearest Neighbors gave high
accuracy out of all in predicting whether the given data point is diabetic or
not.

CONTENTS
1 INTRODUCTION 1-6

1.1 BASIC INFORMATION 1

1.2 DATA SCIENCE 2

1.3 MACHINE LEARNING 3

1.4 MACHINE LEARNING MODELS 4-6

2 PROPOSED METHODOLOGY 7-14

2.1 PROJECT DETAILS 7-9

2.2 PROCESS FLOW 9-14

3 IMPLEMENTATION 15-18

3.1 PROJECT DETAILS 15

3.2 REQUIREMENTS 15

3.3 IMPLEMENTATION STEPS 15-18

4
CONCLUSION 19

5
REFERENCES 20

CHAPTER 1
INTRODUCTION
1.1 BASIC INFORMATION
Diabetes is a most common disease in these world affecting major part of the
population. Although it is a global concern but still most part of population are
unaware to deal with it. When the level of glucose in our blood increased (often
called hyperglycaemia) above the required level then the state is called
Diabetes. Inability of pancreas to produce required insulin or inability to use it
properly to convert glucose into energy is the cause of Diabetes.

Diabetes majorly of 3 types :

• Diabetes Mellitus :
This is the most common form of Diabetes which can occur at any age because
of the current lifestyle of Humans.
These disease is caused due to inability of functioning of pancreas thus
fluctuating the level of glucose in the blood.
Its general symptom can include Weakness, Shaking , sweating , Hunger etc
Diabetes Mellitus types are:
Type 1 Diabetes : It is a state where pancreas are unable to produce enough
insulin. It is an Autoimmune disorder which develops rapidly often in younger
ages. It immediately need for insulin to be injected in body to make insulin
balance stable. It can be deadly if not caught in time. This are not preventable
just after diagnose we can make the insulin level stable by insulin therapy.
Symptoms can be : Unexplained Weight loss, Tiredness and weakness,
Increased Thirst etc.
Type 2 Diabetes : This is state of body where insulin is produced properly but
unable to use it properly and thus makes body insulin resistant. It can prevented
through proper diet, exercise and lifestyle. Symptoms can be obesity and
weakness etc. It is not genetic disorder so it can come even if it not in
hereditary.
Page 1

• Gestational Diabetes :
It is a temporary state in pregnancy which occur during the gestation period.
Develop in 2nd or 3rd trimester which ends after pregnancy. It can be treated
through proper diet and exercise. But If proper care not taken can risk to Type 2
Diabetes later in life.

• Diabetes Insipidus :
It is a rare state where there is no role with glucose in blood while this takes
place in kidney. Kidney produces a lot of dilute pale urine which does not have
glucose.
Symptom can be excessive thirst or hunger , excessive urine , excessive thirst
and dehydration.

1.2 DATA SCIENCE

Data Science has an intersection with artificial intelligence but is not a subset of
artificial intelligence. This means it is that technique which uses some
techniques of artificial intelligence, machine learning, and deep learning. By
using this techniques data are first visualized, then builds model, and at last do
prediction.

Artificial
Intelligence

Machine Data
Learning

Deep
Learning

Fig 1.2.1 Intersection of Data Science with AI, ML, DL

Page 2

Applications of Data science are :

• Healthcare
• Telecom
• Bank
• Retail, and etc.

1.3 MACHINE LEARNING

Machine Learning is a subset of Artificial Intelligence. It is a set of algorithms
that train on a data set to make predictions or take actions in order to optimize
some systems.
It is a technique in which computer program learn from experiences or past
experiences. Then specifically related to that it performs the task.
Machine Learning algorithm create model using sample data or training data.
Which takes the decision or prediction to perform the task. Then model checks
the accuracy of that prediction, if accuracy is acceptable and prediction is
correct, then machine learning algorithm gets deploy. But if accuracy is not
acceptable then model again get trained until accuracy doesn’t get accepted.

In machine learning whatever the processes are involved are similar to data
mining and predictive modeling.
• Data mining is the process of finding the patterns from the large dataset in
which machine learning, statistics, and database systems are used. It is also
known as KDD(Knowledge discovery in databases).
• Predictive Modeling is the process of predicting future outcomes by using
data mining and probability.

It uses mathematical module to find out hidden pattern from the data. This
solves the real life problem or business problem. Machine Learning do complex
task easily. For example face detection, self driving car, Alexa, etc.
Page 3

Machine learning algorithms has three parts:-

- Input
- Output
- Objective Function or Performance matrix

The steps involved in machine learning are:-

• Problem Identification
• Data Collection and validation
• Model Building
• Feedback

1.2.1 MACHINE LEARNING MODELS

In Machine Learning, techniques and data are more important, but to identify
the type of problem (supervised or unsupervised) is equally important.

Machine Learning

Supervised Unsupervised

Fig 1.2.1 Types of Machine Learning

Supervised Learning:-
In this input and output are already available. On the basis of this input and
output or labeled data model is created and on that model new input are given
and checked whether valid output is coming or not. In this model machine
knows the features.
Supervised Learning are,
• Already Tagged Data
• Features and labels present

Page 4

• There are two types of supervised Learning:-

• Classification:-
- Classify which label a given set of features belongs to.
- This algorithm is used when there are limited number of answers
Like, yes or no, 0 or 1, true or false, etc.
For example, Is it cold? -> Yes or No
- In these algorithm we have only two outputs. So, when there are
only two outputs (only two choices) then that type is called as 2
class classification algorithm. But if there are more than two
choices then this type is called as multi class classification.
- To solve the classification problem the algorithms required are,
- Logistic regression
- K-Nearest Neighbors (KNN)
- Support Vector machine (SVM)
- Naive Bayes
- Decision Tree Classification
- Random Forest Classification
• Regression:-
- Find out the value of the label using previous data.
For example, What will be the temperature tomorrow?
So the output will be suppose, 28.
- To solve the regression problem the algorithms required are,
- Linear Regression
- Multiple Linear Regression
- Polynomial Regression
- Support Vector Regression
- Decision Tree Regression
- Random Forest Regression

Unsupervised Learning:-
All a machine knows is the data in front of it. No Features, No Labels. Machine
seeing the data for the first time.
Page 5

In this model,
• Data are not already tagged
• Features and labels not present
• Training not done
• Clustering:- Discover the inherit groupings in the data, such as growing
customers by purchasing behaviour. There are two types of algorithm used for
clustering and they are,
- K-Means Clustering
- Hierarchical Clustering
• Association:- Association rule learning problem, such as people that buy X
also tend to buy Y. Algorithm used for association are,
- Apriori
- Eclat

To know how a problem cane solved using machine learning models, some of
the examples are:-
1. Is this person is diabetic? (Yes or no)
Algorithm used will be Classification Algorithm.
2. Is this A or B?
Algorithm used will be Classification Algorithm
3. Is this weird?
Algorithm used will be Anomaly Detection Algorithm.
4. How many?
Algorithm used will be Regression Algorithms.
5. How is this organised?
Algorithm used will be Clustering Algorithm.

Page 6

CHAPTER 2
PROPOSED METHODOLOGY
2.1 PROJECT DETAILS
It is a Data science project in which we need to predict whether a given data
point is diabetic or not. Since it comes under Classification technique which is a
type of Supervised Machine Learning, here the model is built using different
classification algorithms like K-Nearest Neighbors, Naive Bayes , and Support
Vector Machine. The model built with K-Nearest Neighbors gave high accuracy
out of all in predicting whether the given data point is diabetic or not.

Libraries used in this project are:-

• NumPy :
- Here, Num means numeric and Py means python.
- It is a scientific computing library for python.
- It supports multi-dimensional array. It is used to represent large number of
data in the form of array.
- NumPy Library is used for numeric calculation.
- To store data it uses less memory. It is very convenient and process fast.
- NumPy for Machine Learning:- Any machine learning model can’t directly
perform operation on images so for that images are converted into the
numpy array format then machine learning can perform operations.

Fig 2.1.1 NumPy for Machine Learning

Page 7

Here in fig., image is converted to binary format(numeric format) then in

NumPy array format. Now machine learning model can perform operation on
this format.

• Pandas:
- Pandas is a powerful python data analysis toolkit.
- Pandas is used for data manipulation, analysis and cleaning.
- Open Source
- For data manipulation it is a fast and efficient data frame.
- It can read different types of formats like, csv, JSON, etc.
- Importance of Pandas:- Suppose in data science projects, to find useful
insights or patterns from raw data firstly raw data should be in proper
format. This is known as data preprocessing. In data preprocessing panda
libraries are used. To prepare and process the data this, library is very fast
and efficient. This library have many features like, it can use for handling
missing values, reshaping the datasets, etc.

• SKLearn:
Using SKLearn for machine learning:-
- Simple and efficient tools for data mining and data analysis.
- Accessible to everybody and reusable in various contexts.
- Built on Numpy, Scipy, and Matplotlib.
- Open source, commercially usable-BSD license.

• Matplotlib:
- Matplotlib is a 2D and 3D plotting python library.
- This library is used when there is a bulk of data at that time we can
transform that data in a graphical representation and can analyse the data
easily.
- It is use to create high quality graph.
- Matplotlib graphs are Histogram, Bar charts, Scatterplots, etc.

Page 8

Fig 2.1.2 Matplotlib Plots

• Seaborn:
- It is a python library use for data visualization. This library is build using
matplotlib library.
- It is use for statistical library.
- Seaborn graphics are Heatmap, Pair plot, Facet grid, etc.
- Dependency of Seaborn:- Seaborn library is dependent upon Python,
NumPy, Pandas, Scipy, Matplotlib.

2.2 PROCESS FLOW

The steps involved in the machine learning process,

Project Objectives Data Collection Data Preprocessing

Data visualization

Deployment Model Building Model Selection

Fig 2.2.1 Process Flow

Page 9

Step 1: Project Objectives

This is the first step in which business problems and requirements are specified.
For example, In this project the objective is, to predict whether the person has
Diabetes or not based on various features like Number of Pregnancies, Insulin
Level, Age, BMI.

Step 2: Data Collection

In this process, the data required for project are collected or gathered.
There are two types of data collection:-
1. Primary data
2. Secondary data

Primary data :- This are the original data collected through researcher or user
who is doing project. This is the first hand source of data collection.

Secondary data :- If the data is collected from a person other than the user then
that data is called as secondary data. This type of data are passed through
statistical process or some modification and given to user.

Now by using this process data are collected. But in this raw data their are some
impurities like,
• Between integer value there are some text values,
• Some values are missing,
• Inconsistency in data i.e data is not in ordered form or not in proper format,
and etc.
So because of these impurities or problem work can’t be done on this data.
Therefore, to overcome from this problem after the data collection process data
preprocessing is done.

Step 3: Data Preprocessing

In this process, raw data or original data are purified. For data preprocessing the
python libraries like NumPy, Panda, Sklearn, etc are used.
Page 10

The steps in this process are,

1. Data Cleaning:
In this step, missing data are filled using nan or 0. Any row who’s heading
or label is not given are ignored or removed. Any type of irrelevant data
are removed. After this, noisy data (meaningless data or corrupted data)
are deleted or removed.

2. Data Transformation:
In this step, feature scaling is done. In this scikit-learn library is used.
Normally for solving classification problem data transformation is used.

3. Dimensionality Reduction:
In this step, dimensions are reduced like 3D data is converted to 2D data.
So in this way data are analysed.

Step 4: Data Visualization

Data visualization is a graphical representation of data. Here, data are
transformed to piecharts, graphs, bar graphs, histograms, etc. For data
visualization python libraries like matplotlib, seaborn, etc are used. Values are
created from data by doing Visualization. New patterns are discovered and to
find out useful insights. This is like comparative analysis here data are
compared and find out the relationships between x(independent variable) and
y(dependent variable or target variable) variable.

Step 5: Model Selection

In this step, we have to select best machine learning model according to the
data which will give us the accurate result. In simple language, we have to find
out the which algorithm should be used for this particular data.
Suppose we have to predict the numeric value, then linear regression algorithm
will be used or if we have to do classification then logistic regression algorithm
can be used and so on. So basically perfect model or algorithms are selected
according to the problem statement.

Page 11

There are many machine learning models, but in this project we will see
classification algorithms. Since this project (diabetes disease prediction) comes
under Classification technique which is a type of Supervised Machine
Learning, here the model is built using different classification algorithms and
some of them are:
‣ Logistic regression :-
This algorithm is used to solve the classification problems. The nature of
the response variable is categorical. It helps to calculate the possibility of
a particular event taking place.

It takes features and training data (labels). It fits a linear model. Instead of
giving the result, it gives the logistic of the result.

Logistic regression is a sigmoid function and shape of the curve is S.

The formula for Sigmoid curve is,
Y = 1 / 1+e-x

Fig 2.2.2 Logistic regression

Applications of Logistic regression are weather prediction, disease

prediction, image categorisation, and etc.

‣ K-Nearest Neighbors (KNN) :-

It is a supervised learning algorithm that can perform both classification
and regression task using k (numbers) neighbours (instances). In this,
Page 12

firstly train sets fit into the model and then input the test set into the
trained model to obtain the required results.

‣ Support Vector machine (SVM):

It is a supervised learning algorithm that can perform both classification
and regression task. Support vector machine makes sure that when
hyperpath plane is created then two margin line is also created. This
margin lines are created, have some distance so that it is easily linearly
separable for both the classification points.

Fig 2.2.3 SVM

The main goal of the SVM algorithm is to develop the best decision
boundary that can segregate n-dimensional space into classes. So that the
new data points can be easily put in the correct category – Hyperplane.

‣ Naive Bayes:
- It is a supervised learning algorithm in which features are independent.
- In this, labels are already assigned and feature detection is defined in
advance.
- Classifiers are created by training them on labeled data.
- It can solve both continuous and categorical value attributes.
- It is a process of two step,
- Model Construction and
- Model Usage
Page 13

- It is a classifier which uses Bayes Theorem. Bayes theorem finds the

probability of an event occurring given the probability of another event
that has already occurred. Bayes theorem is stated mathematically as
the following equation,
P(A/B) = P(B|A) P(A)/P(B)

Step 6: Model building

In this step, by using algorithm data are split into training and testing sets like
70-30, 80-30, and etc.
Suppose data are split into 70-30 then this means that,
• 70% part of the data are for training phase in which model get trained and
• remaining 30% data are for testing phase.
After this process, prediction is done from that model. If prediction is not
accurate then again model gets trained and testing is done again.
This happen again and again until accurate prediction is not calculated. And in
this way model improve itself according to desired output. Now model is ready
for prediction and for solving problem.

Step 7: Model Deployment

This is the last stage of the machine learning life cycle. Machine learning model
are deployed into production environment for taking decision.

Page 14

CHAPTER 3
IMPLEMENTATION
3.1 PROBLEM STATEMENT:
The goal of the project is to,

1. Prepare the data-set using several methods to train the model.

2. Build a model which can give high accuracy of predicting the disease.

3.2 REQUIREMENTS:
• SOFTWARE:
• Anaconda navigator's Jupiter notebook
• WEBSITE:
• Here, Dataset is taken from kaggle site (https://www.kaggle.com/uciml/
pima-indians-diabetes-database).
• LIBRARIES:
• Pandas,
• Numpy,
• Matplotlib,
• Seaborn,
• Scikit Learn

3.3 IMPLEMENTATION STEPS:

Step by Step implementation of the project:-
Step 1:-
Import all the required libraries such as numpy, pandas, etc.
For example,
import numpy as np

Page 15

Fig. 3.3.1 Libraries

This is the syntax one should import libraries like this. But this libraries should
be installed first then import the libraries.

Step 2:-
Second step is to read the dataset and stored it into new variable.
Here, Dataset is taken from kaggle site (https://www.kaggle.com/uciml/pima-
indians-diabetes-database).

In this, datasets consists of several medical predictor variables and one target
variable, Outcome. Predictor variables include the number of pregnancies the
pa ent has had, their BMI, insulin level, age, and so on.
The following is the descrip on of the 6 variables:

1. Pregnancies: It is no of pregnancies the pa ent has had which is in terms

of numerical values.

2. Glucose: It is blood glucose levels.

3. Blood Pressure: It's Diastolic Blood pressure.

4. Skin Thickness: Triceps skin fold thickness in mm

5. Insulin: It’s the insulin levels in the pa ent blood.

6. BMI: Body Mass Index i.e., (weight in kg/(height in m)^2).

7. Diabetes Pedigree Func on.

8. Age: In years.

Page 16
ti

ti
ti

9. Outcome: In 0 or 1 where 1 represents presents of diabetes and 0 shows

there is no diabetes.

To read any dataset we need pandas library. In this project, dataset is in the csv
format. Here csv means comma separated value and to read the data of csv
format we use,
read_csv()
For example,

Fig 3.3.2 Screenshot of code

Pandas library are used to read any format of the dataset. Suppose if dataset is
in excel form then,
read_excel().

Now after reading the dataset new variable will be assigned in which this
dataset will be stored.
For example, in this project “data” variable is assigned,
data = pd.read_csv()

We assign this variable so that whenever we have to use the dataset we can call
this variable.

Step 3:-
Next step is to check the shape of the dataset or dataframe.
For example,
Shape of the dataset : (768, 9)
This means, in dataset there are 768 rows and 9 columns.

Page 17

Step 4:-
Now next step is to check if there is any missing values present in the dataset.
Here there is no missing value present so we can move further.

Step 5:-
Now in this step we will perform the data visualization process. And will
perform training and testing.

Step 6:-
Now Next step is to select the model. So, in this project we used the algorithms
like K Neighbors Classifier, and Support Vector Classifier.

Step 7:-
After model building next step is to check the highest accuracy among all the
algorithms used.

Step8:-
After finding highest accuracy, check whether the model selected is correct or
not.

Fig 3.3.3 Checking the Model

Page 18

CHAPTER 4
CONCLUSION
In this project, all the steps were performed from Data collecting to Model
deployment. In this, classification Learning algorithms and different libraries
are used. During Model evaluation, machine learning algorithms on the basis of
accuracy score were compared and finds the best one. So, the best accuracy was
given by K- Nearest Neighbors (KNN). This model is perfectly predicting the
person suffering from diabetes disease.

Page 19

References
I. https://www.kaggle.com/uciml/pima-indians-diabetes-database

II. Vapnik, V..Statistical learning theory. 1998 (Vol. 3). .New York,
NY: Wiley，1998:Chapter 10-11, pp.401-492

III. Research on Diabetes Prediction Method Based on Machine

Learning To cite this article: Jingyu Xue et al 2020 J. Phys.:
Conf. Ser. 1684 012062

Page 20

Diabetes Prediction Using Data Mining
No ratings yet
Diabetes Prediction Using Data Mining
13 pages
21cs644 Module 3
100% (1)
21cs644 Module 3
95 pages
Cognizant SDE Sheet: Interview Questions and Answers
No ratings yet
Cognizant SDE Sheet: Interview Questions and Answers
16 pages
Skin Cancer Detection with DL
100% (2)
Skin Cancer Detection with DL
5 pages
Web Development Problem Statements
100% (1)
Web Development Problem Statements
2 pages
Batch C03 Medicine Recommendation System Using Machine Learning
No ratings yet
Batch C03 Medicine Recommendation System Using Machine Learning
17 pages
Analytics Insights PWC Launchpad24 Handbook 15th Feb24
No ratings yet
Analytics Insights PWC Launchpad24 Handbook 15th Feb24
18 pages
Infosys - Data Sufficiency
No ratings yet
Infosys - Data Sufficiency
25 pages
Final Year Project
No ratings yet
Final Year Project
41 pages
Avasoft Written Test Preparation
100% (1)
Avasoft Written Test Preparation
2 pages
Gold Price Prediction Using Ensemble Based Supervised Machine Learning
100% (2)
Gold Price Prediction Using Ensemble Based Supervised Machine Learning
30 pages
ML - CSA 301 - ML Perspective and Issues
No ratings yet
ML - CSA 301 - ML Perspective and Issues
34 pages
Report 20220209 Talent Matcht Prueba 1 Backend C.buitron Outlook - Com77911720767
No ratings yet
Report 20220209 Talent Matcht Prueba 1 Backend C.buitron Outlook - Com77911720767
10 pages
UNIT IV (Well Posed Leaning Problems)
100% (1)
UNIT IV (Well Posed Leaning Problems)
16 pages
PHP Practicals
No ratings yet
PHP Practicals
36 pages
Wissen Test
100% (1)
Wissen Test
10 pages
AKTU Notes Machine Learning (ROE083) Unit-1 - UPTU Notes PDF
50% (2)
AKTU Notes Machine Learning (ROE083) Unit-1 - UPTU Notes PDF
66 pages
TCS CodeVita Season 12 Complete Guide
No ratings yet
TCS CodeVita Season 12 Complete Guide
91 pages
ML - Lab Manual (BAI702) - Updated 2-7-2025
100% (1)
ML - Lab Manual (BAI702) - Updated 2-7-2025
32 pages
Cisco C Programming MCQ Review
No ratings yet
Cisco C Programming MCQ Review
9 pages
ML Lab Manual-Bcsl606
No ratings yet
ML Lab Manual-Bcsl606
64 pages
Question 1: Make A List of Project Numbers For Projects That Involve An Employee Whose Last Name Is Smith', Either As A Worker or As A Manager of The Department That Controls The Project
100% (1)
Question 1: Make A List of Project Numbers For Projects That Involve An Employee Whose Last Name Is Smith', Either As A Worker or As A Manager of The Department That Controls The Project
6 pages
Unit I Notes Machine Learning Techniques 1
No ratings yet
Unit I Notes Machine Learning Techniques 1
21 pages
ML UNIT-5 Notes PDF
No ratings yet
ML UNIT-5 Notes PDF
41 pages
(Updated) TCS NQT Syllabus 2025 & Exam Pattern - PrepInsta
No ratings yet
(Updated) TCS NQT Syllabus 2025 & Exam Pattern - PrepInsta
9 pages
Mental Health Detection Using Machine Learning
100% (1)
Mental Health Detection Using Machine Learning
31 pages
Scenario - Based - Que - Dbms
100% (1)
Scenario - Based - Que - Dbms
8 pages
Cognizant Resume Template for Freshers
No ratings yet
Cognizant Resume Template for Freshers
1 page
Problem-Solving Approach To Typical AI Problems
50% (2)
Problem-Solving Approach To Typical AI Problems
4 pages
F.Y.M.Sc. (CS) Sem-I AI Pract Slip
No ratings yet
F.Y.M.Sc. (CS) Sem-I AI Pract Slip
22 pages
Testpress Coding Round Challenges
100% (1)
Testpress Coding Round Challenges
2 pages
HP Placement Papers With Answers
100% (1)
HP Placement Papers With Answers
29 pages
Data Science Mathematics Quizzes Overview
100% (1)
Data Science Mathematics Quizzes Overview
25 pages
VTU 6th Sem Machine Learning Lab Manual
No ratings yet
VTU 6th Sem Machine Learning Lab Manual
49 pages
TCS Aptitude Test 1
No ratings yet
TCS Aptitude Test 1
16 pages
Evaluating Hypotheses in Machine Learning
No ratings yet
Evaluating Hypotheses in Machine Learning
8 pages
Machine Learning Unit 5
No ratings yet
Machine Learning Unit 5
43 pages
KakushIN Problem Statement
No ratings yet
KakushIN Problem Statement
3 pages
Sciative Questions & Answar
No ratings yet
Sciative Questions & Answar
3 pages
Machine Learning Techniques Unit-1 (KAI-601)
No ratings yet
Machine Learning Techniques Unit-1 (KAI-601)
78 pages
Stateless Algorithms in Reinforcement Learning
No ratings yet
Stateless Algorithms in Reinforcement Learning
4 pages
CNN Architecture & Training Guide
No ratings yet
CNN Architecture & Training Guide
7 pages
Hypothesis Space Search in Decision Trees
No ratings yet
Hypothesis Space Search in Decision Trees
15 pages
ICICI - Sample Placement Question Papers
No ratings yet
ICICI - Sample Placement Question Papers
12 pages
Smart Health Disease Prediction Django
No ratings yet
Smart Health Disease Prediction Django
41 pages
Fake Job Post Detection Using Machine Learning
100% (1)
Fake Job Post Detection Using Machine Learning
24 pages
Flipkart Reviews Sentiment Analysis
100% (1)
Flipkart Reviews Sentiment Analysis
22 pages
Data Mining - Classification Using Frequent Pattern
No ratings yet
Data Mining - Classification Using Frequent Pattern
8 pages
Prodapt Sample Campus Assessment Paper
100% (1)
Prodapt Sample Campus Assessment Paper
25 pages
Frequent Pattern Based Clustering
100% (1)
Frequent Pattern Based Clustering
18 pages
JD From ADVI Group of Companies
No ratings yet
JD From ADVI Group of Companies
7 pages
Python IEEE Project Titles 2024 2025 Machine Learning Projects Artificial Intelligence Projects Deep Learning Projects
No ratings yet
Python IEEE Project Titles 2024 2025 Machine Learning Projects Artificial Intelligence Projects Deep Learning Projects
10 pages
A Mini Project Report On: "Big Mart Sales Prediction" by
67% (3)
A Mini Project Report On: "Big Mart Sales Prediction" by
23 pages
AVASOFT Application Developer Technical Screening - MCQs
No ratings yet
AVASOFT Application Developer Technical Screening - MCQs
23 pages
Business Analytics Anna University
No ratings yet
Business Analytics Anna University
40 pages
Managing State: 5.1 The Problem of State in Web Applications
No ratings yet
Managing State: 5.1 The Problem of State in Web Applications
17 pages
Machine Learning - Version Space - Problem 1
No ratings yet
Machine Learning - Version Space - Problem 1
4 pages
Internship
No ratings yet
Internship
15 pages
Diabetes Prediction
No ratings yet
Diabetes Prediction
15 pages
A Survey On Medical Diagnosis of Diabetes Using Machine Learning Techniques
No ratings yet
A Survey On Medical Diagnosis of Diabetes Using Machine Learning Techniques
12 pages
Elbow Joint Mechanics Explained
No ratings yet
Elbow Joint Mechanics Explained
32 pages
FAI Unit 1
No ratings yet
FAI Unit 1
40 pages
Knee Joint Biomechanics Overview
No ratings yet
Knee Joint Biomechanics Overview
14 pages
BM - Lec 24 - Biomechanics of Soft Tissue (Muscles)
No ratings yet
BM - Lec 24 - Biomechanics of Soft Tissue (Muscles)
40 pages
Biomechanics of Cartilage Mechanics
No ratings yet
Biomechanics of Cartilage Mechanics
10 pages
Diabetes Prediction Using ML Techniques
No ratings yet
Diabetes Prediction Using ML Techniques
18 pages
Grade 5 Math Exam Solutions by Ulka Shah
No ratings yet
Grade 5 Math Exam Solutions by Ulka Shah
4 pages
Advanced Math Problems
No ratings yet
Advanced Math Problems
4 pages
14 DPP Jee Neet Wave PDF
No ratings yet
14 DPP Jee Neet Wave PDF
2 pages
NCP 2
No ratings yet
NCP 2
1 page
(Colombo & Mella, 2022) Organizational Behavior - Attention, Knowledge, and Control Systems
No ratings yet
(Colombo & Mella, 2022) Organizational Behavior - Attention, Knowledge, and Control Systems
27 pages
Baroody The Development of Adaptive Expertise and Flexibility The Integration of Conceptual and Procedural Knowledge
No ratings yet
Baroody The Development of Adaptive Expertise and Flexibility The Integration of Conceptual and Procedural Knowledge
52 pages
A-Level Child Psychology Insights
No ratings yet
A-Level Child Psychology Insights
13 pages
VB2 DA 2020.01.07 Nguyenthisongthuong
No ratings yet
VB2 DA 2020.01.07 Nguyenthisongthuong
5 pages
Frequency and Ability in Language Use
No ratings yet
Frequency and Ability in Language Use
4 pages
Case Formulation in Emotion-Focused Therapy Co-Creating Clinical Maps For Change (Rhonda N. Goldman and Leslie S. Greenberg)
100% (11)
Case Formulation in Emotion-Focused Therapy Co-Creating Clinical Maps For Change (Rhonda N. Goldman and Leslie S. Greenberg)
249 pages
DLunit 3
No ratings yet
DLunit 3
13 pages
Methodology in Tesol
No ratings yet
Methodology in Tesol
11 pages
Technical Writing Process Overview
No ratings yet
Technical Writing Process Overview
20 pages
Mindsets Behaviors
No ratings yet
Mindsets Behaviors
6 pages
Accomplishment Report: Oct 22-26, 2018
No ratings yet
Accomplishment Report: Oct 22-26, 2018
1 page
Sentience in Animals and Human Relations
No ratings yet
Sentience in Animals and Human Relations
26 pages
Action Plan 8WRRP 2021 2022 5
No ratings yet
Action Plan 8WRRP 2021 2022 5
4 pages
Pretest English For Academic and Professional Purposes (Write The Word of The Correct Answer)
No ratings yet
Pretest English For Academic and Professional Purposes (Write The Word of The Correct Answer)
3 pages
Theories
No ratings yet
Theories
2 pages
Technical Writing Style
No ratings yet
Technical Writing Style
18 pages
Therapist Microskills Analysis
No ratings yet
Therapist Microskills Analysis
4 pages
Developing Self-Regulated Learning
No ratings yet
Developing Self-Regulated Learning
2 pages
7th Grade English Grammar Guide
No ratings yet
7th Grade English Grammar Guide
44 pages
3rd Grade Solar System Lesson
No ratings yet
3rd Grade Solar System Lesson
4 pages
Unit 1: Embracing The Humanities
0% (1)
Unit 1: Embracing The Humanities
35 pages
Special Education Assessment Article
No ratings yet
Special Education Assessment Article
85 pages
Self MGMT ES Lesson Grades 3 5
No ratings yet
Self MGMT ES Lesson Grades 3 5
2 pages
Mind Control Language Techniques
100% (3)
Mind Control Language Techniques
32 pages
Article MCQ Questions and Answers - English Aptitude MCQ
No ratings yet
Article MCQ Questions and Answers - English Aptitude MCQ
7 pages
Miroslav Kubat Ivan Bratko Ryszard Michalski
No ratings yet
Miroslav Kubat Ivan Bratko Ryszard Michalski
72 pages
Lesson Plan in English 7: Department of Education
No ratings yet
Lesson Plan in English 7: Department of Education
6 pages
CRM AI Viva Cheat Sheet
No ratings yet
CRM AI Viva Cheat Sheet
2 pages
Awareness
No ratings yet
Awareness
681 pages

Exposys Data Labs: Internship Report On Data Science Project

Uploaded by

Exposys Data Labs: Internship Report On Data Science Project

Uploaded by

Internship Report

Under the Guidance

Exposys Data Labs

1.1 BASIC INFORMATION 1

1.2 DATA SCIENCE 2

1.3 MACHINE LEARNING 3

1.4 MACHINE LEARNING MODELS 4-6

2 PROPOSED METHODOLOGY 7-14

2.1 PROJECT DETAILS 7-9

2.2 PROCESS FLOW 9-14

3.1 PROJECT DETAILS 15

3.3 IMPLEMENTATION STEPS 15-18

Diabetes majorly of 3 types :

1.2 DATA SCIENCE

Fig 1.2.1 Intersection of Data Science with AI, ML, DL

Applications of Data science are :

1.3 MACHINE LEARNING

Machine learning algorithms has three parts:-

The steps involved in machine learning are:-

1.2.1 MACHINE LEARNING MODELS

Fig 1.2.1 Types of Machine Learning

• There are two types of supervised Learning:-

Libraries used in this project are:-

Fig 2.1.1 NumPy for Machine Learning

Here in fig., image is converted to binary format(numeric format) then in

Fig 2.1.2 Matplotlib Plots

2.2 PROCESS FLOW

Project Objectives Data Collection Data Preprocessing

Deployment Model Building Model Selection

Fig 2.2.1 Process Flow

Step 1: Project Objectives

Step 2: Data Collection

Step 3: Data Preprocessing

The steps in this process are,

Step 4: Data Visualization

Step 5: Model Selection

Logistic regression is a sigmoid function and shape of the curve is S.

Fig 2.2.2 Logistic regression

Applications of Logistic regression are weather prediction, disease

‣ K-Nearest Neighbors (KNN) :-

‣ Support Vector machine (SVM):

Fig 2.2.3 SVM

- It is a classifier which uses Bayes Theorem. Bayes theorem finds the

Step 6: Model building

Step 7: Model Deployment

1. Prepare the data-set using several methods to train the model.

3.3 IMPLEMENTATION STEPS:

Fig. 3.3.1 Libraries

1. Pregnancies: It is no of pregnancies the pa ent has had which is in terms

2. Glucose: It is blood glucose levels.

3. Blood Pressure: It's Diastolic Blood pressure.

4. Skin Thickness: Triceps skin fold thickness in mm

5. Insulin: It’s the insulin levels in the pa ent blood.

6. BMI: Body Mass Index i.e., (weight in kg/(height in m)^2).

7. Diabetes Pedigree Func on.

9. Outcome: In 0 or 1 where 1 represents presents of diabetes and 0 shows

Fig 3.3.2 Screenshot of code

Fig 3.3.3 Checking the Model

III. Research on Diabetes Prediction Method Based on Machine

You might also like