
Proceedings of CONF-MPCS 2024 Workshop: Quantum Machine Learning: Bridging Quantum Physics and Computational Simulations

DOI: 10.54254/2753-8818/51/2024CH0161

Predicting insurance charges using linear regression models

Wenyu Dai
College of Letters & Science, University of Wisconsin-Madison, WI, 53706, United
States

[email protected]

Abstract. Linear regression can be used to predict an outcome from one or more input values.
Its versatility allows it to be applied to many datasets that contain correlated values.
However, research on the application of linear regression to medical insurance costs, an
important part of people’s lives, is scarce. This paper studies an insurance dataset from Kaggle by
applying linear regression to it. The author first validates the dataset and then explores the
correlation between each individual factor and the corresponding charges to show how
insurance costs differ among people with different backgrounds. Several figures are
included to help visualize the correlation between factors. Finally, the author creates a
multilinear regression model to predict the insurance charges. The R-Squared score of the model
and a result table including the regression coefficients are provided to show the accuracy and
details of the model.

Keywords: Insurance, charge prediction, linear regression.

1. Introduction
Medical insurance is an important form of risk management in people’s lives. In the case of major
diseases, it helps people get through these difficulties by paying for their treatments and medical
expenses. However, the cost of medical insurance can be exceptionally high, and it varies from one
policyholder to another. People with bad habits or unhealthy lifestyles are more likely to be charged a
higher insurance price because they are more prone to major diseases. This paper uses a medical
insurance dataset from Kaggle to study the potential causes of high insurance charges. Since this dataset
contains not only charges but also other important information about the policyholder, it is suitable for
exploratory analysis of the statistical relationship between policyholders’ characteristics and their
insurance charges.
In the dataset, each row represents an individual policyholder. In a single row, the first six columns
represent characteristics of the policyholder, and the last column is the medical insurance charge that
person needs to pay. These characteristics all contribute to the final pricing of the insurance, but they
do not contribute equally.
It is expected that smoking and a high body mass index (bmi) contribute more to higher charges, which
will be discussed in the results section [1]. The author also constructed a multilinear regression model to
predict the insurance charges. The multiple linear regression (MLR) equation is expected to be
$h_\theta(x_i) = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \cdots + \theta_n x_{in}$, where $\theta_j$ is
the parameter of the $j$-th characteristic of the policyholders, $x_{ij}$ is the value of the $j$-th
independent variable (such as age or bmi) for the $i$-th policyholder, $n$ is the number of variables, and
$i$ runs from 1 to $m$, the number of observations. For this dataset, the multilinear regression equation is
$h_\theta(x_i) = \theta_0 + \theta_1 \cdot \mathrm{age} + \theta_2 \cdot \mathrm{sex} + \theta_3 \cdot \mathrm{bmi} + \theta_4 \cdot \mathrm{children} + \theta_5 \cdot \mathrm{smoker} + \theta_6 \cdot \mathrm{region}$.
For example, the MLR formula for the first policyholder will be written like this:
$16884.92400 = \theta_0 + \theta_1 \cdot 19 + \theta_2 \cdot \mathrm{female} + \theta_3 \cdot 27.900 + \theta_4 \cdot 0 + \theta_5 \cdot \mathrm{smoker(yes)} + \theta_6 \cdot \mathrm{southwest}$.
The author uses Python to construct this MLR model and test its accuracy.

© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).

2. Literature review
Tranmer et al. explained multilinear regression as an extension of simple linear regression [2]. It is a
statistical model that can be applied to predict one dependent variable Y from a collection of independent
variables ($x_1, \dots, x_n$). The dataset presented in this paper is suitable for multilinear regression
because the dependent variable, charges, is determined by several independent variables such as age and sex.
Although this statistical model is advantageous in forecasting outcomes from several variables, there
can be drawbacks associated with it. In cases where the relationship between different variables is
complex, it is difficult to determine which variables may distort the linear relationship due to
multicollinearity [3].
A good example of implementing MLR is found in the work of Uyanık et al. [4]. They constructed
an MLR model on State Employees Selection Exam (KPSS) scores. They used five other scores, such as
course evaluation scores and guidance scores, as independent variables to predict the dependent
variable, the KPSS score. Their MLR model achieved an R-Squared score of 0.87, which suggests
that MLR should be effective on similar datasets like the one in this paper. Moreover, using MLR allows
all variables that may affect the final outcome to be considered.
Similarly, Rossi et al. tried to predict the specific methane production (SMP) using several other
chemicals [5]. They first developed a simple linear regression model using only the most correlated
chemical, lignin, to predict the SMP. The resulting model was unsatisfactory, with a low R-Squared score
of 0.3. Consequently, they developed an MLR model including all variables correlated with the SMP, which
resulted in a successful model with a high R-Squared score of 0.87. Moreover, Rath et al. collected
COVID-19 data from the World Health Organization to predict the number of active cases on the next day [6].
Their MLR model produced a high R-Squared score, which shows the potential of using MLR in
predicting the spread of contagious diseases.
Sample size is also an important factor in producing a successful MLR model [7]. As
Knofczynski et al. showed in their work, the Squared Population Multiple Correlation decreases as the
sample size increases. It is important to increase the sample size if the dataset has a large number of
independent variables.
One drawback of the MLR model is that common measurements like mean square error and R-
Squared only test the goodness of fit of the model but not the accuracy of the prediction [8, 9]. In
circumstances where predictive performance is needed, other methods like Random Forest may need
to be applied [10].

3. Methodology

3.1. Data source

The data for this paper is retrieved from Kaggle. The original data is from Brett Lantz’s Machine
Learning with R, and it is available on GitHub. Miri Choi reposted this dataset from GitHub to Kaggle.

3.2. Variable selection

Table 1 below shows the first five rows of the dataset. The dataset contains 1336 policyholders, of whom
51% are male and 49% are female. The dataset contains 7 variables: the age of the policyholder as “age”,
the sex as “sex”, the body mass index as “bmi”, the number of children as “children”, the smoking
behavior as “smoker”, the region of the United States where the policyholder resides as “region”, and
the insurance charge of that particular policyholder as “charges”.

Table 1. Dataset Preview

   age  sex     bmi     children  smoker  region     charges
0  19   female  27.900  0         yes     southwest  16884.924
1  18   male    33.770  1         no      southeast  1725.552
2  28   male    33.000  3         no      southeast  4449.462
3  33   male    22.705  0         no      northwest  21984.471
4  32   male    28.880  0         no      northwest  3866.855

The medical insurance dataset is imported into Python as a comma-separated values (CSV) file from Kaggle.
The dataset contains no null values, which means the integrity of the dataset is well preserved.
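
The import and null-value check described above can be sketched as follows. This is a minimal illustration that uses only Python's standard library and embeds the five preview rows of Table 1 as text; the paper does not show its loading code, so the function names and the use of the `csv` module are assumptions.

```python
import csv
import io

# The five preview rows from Table 1, embedded so the sketch runs without the Kaggle file.
# With the real data, the StringIO object would be replaced by an opened CSV file.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.900,0,yes,southwest,16884.924
18,male,33.770,1,no,southeast,1725.552
28,male,33.000,3,no,southeast,4449.462
33,male,22.705,0,no,northwest,21984.471
32,male,28.880,0,no,northwest,3866.855
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def count_missing(rows):
    """Count cells that are absent or empty, i.e. the null values of the dataset."""
    return sum(
        1
        for row in rows
        for value in row.values()
        if value is None or value.strip() == ""
    )

rows = load_rows(SAMPLE_CSV)
missing = count_missing(rows)
print(f"{len(rows)} rows, {missing} missing values")
```

A count of zero corresponds to the statement that the dataset contains no null values.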

Figure 1. Multicollinearity of the dataset

To better deal with the dataset, categorical values such as sex, smoker, and region are converted to numeric
data. Sex and smoker are converted to 0s and 1s to distinguish whether a policyholder is
male or female, smoker or non-smoker. For region, there are four categories:
northeast, southeast, southwest, and northwest. They are converted to the integers 0 to 3, each representing one region.
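
The conversion above can be sketched with plain dictionaries. The exact codes are assumptions: the sex codes follow the description of the violin plot in the results section (male as 0, female as 1), while the smoker codes and the order of the region codes are hypothetical choices made for illustration, since the text does not state them.

```python
# Assumed code tables. Sex follows the text of the results section (male = 0, female = 1);
# the smoker codes and the region ordering are hypothetical.
SEX_CODES = {"male": 0, "female": 1}
SMOKER_CODES = {"no": 0, "yes": 1}
REGION_CODES = {"northeast": 0, "southeast": 1, "southwest": 2, "northwest": 3}

def encode_row(row):
    """Convert one raw record (string values) into a list of numeric features."""
    return [
        int(row["age"]),
        SEX_CODES[row["sex"]],
        float(row["bmi"]),
        int(row["children"]),
        SMOKER_CODES[row["smoker"]],
        REGION_CODES[row["region"]],
    ]

# First row of Table 1, as read from the CSV file.
first = {"age": "19", "sex": "female", "bmi": "27.900", "children": "0",
         "smoker": "yes", "region": "southwest"}
encoded = encode_row(first)
print(encoded)
```

Integer codes for region impose an artificial ordering on the four categories; one-hot encoding is a common alternative when that ordering is not wanted.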
After the dataset is preprocessed, a heat map, shown in figure 1, is made to check the multicollinearity of
the dataset. The independent variables do not show significant correlation with each other,
which means the coefficient estimates of the MLR model will be more reliable. As a result, the author
selects all variables to construct the MLR model.
It is also shown in figure 1 that age, bmi, and smoker are more correlated with charges; the
correlations between each independent variable and the dependent variable will be explained in the
results section.
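
The correlation matrix behind such a heat map can be computed as in the sketch below. It uses NumPy on the five preview rows of Table 1 with assumed numeric codes for sex, smoker, and region, so the printed values only illustrate the mechanics and say nothing about the full dataset. Drawing the heat map itself is left out.

```python
import numpy as np

# Encoded preview rows from Table 1.
# The sex, smoker and region codes are assumed for illustration.
COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]
data = np.array([
    [19, 1, 27.900, 0, 1, 2, 16884.924],
    [18, 0, 33.770, 1, 0, 1, 1725.552],
    [28, 0, 33.000, 3, 0, 1, 4449.462],
    [33, 0, 22.705, 0, 0, 3, 21984.471],
    [32, 0, 28.880, 0, 0, 3, 3866.855],
])

# Pearson correlation between every pair of columns
# (rowvar=False treats each column as one variable).
corr = np.corrcoef(data, rowvar=False)

# Correlation of each independent variable with charges (the last column).
for name, value in zip(COLUMNS[:-1], corr[:-1, -1]):
    print(f"{name:>8}: {value:+.2f}")
```

Large off-diagonal entries between two independent variables would signal multicollinearity; the last row or column shows how strongly each variable is correlated with charges.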

3.3. Linear regression method

The linear regression function can be simplified to its basic form of a regression line,
$f_{w,b}(x) = wx + b \rightarrow \hat{y}$, where $w$ is the slope and $b$ is the intercept. To find the
optimal $w$ and $b$, it is important to minimize the mean square error (MSE),
$\frac{1}{N}\sum_{i=1}^{N}\left(f_{w,b}(x_i) - y_i\right)^2$, which penalizes poor estimations. Before
minimizing the MSE, the variables x, w, and y should be written in matrix and vector form:

$$
X = \begin{pmatrix} 1 & \cdots & x_1 \\ 1 & \cdots & x_i \\ 1 & \cdots & x_N \end{pmatrix}
  = \begin{pmatrix} x_{10} & x_{11} & \cdots & x_{1D} \\ x_{i0} & x_{i1} & \cdots & x_{iD} \\ x_{N0} & x_{N1} & \cdots & x_{ND} \end{pmatrix}, \quad
w = \begin{pmatrix} w_0 = b \\ \vdots \\ w_D \end{pmatrix}, \quad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}.
$$

Now, the MSE can be minimized by setting its partial derivative with respect to $w_k$ to zero for $k$ from 0 to $D$:

$$
\frac{\partial}{\partial w_k}(\mathrm{MSE}) = \frac{1}{N}\sum_{i=1}^{N} 2\left(y_i - \sum_{j=0}^{D} w_j x_{ij}\right)(-x_{ik}) = 0
\;\rightarrow\; \sum_{i=1}^{N} y_i x_{ik} = \sum_{i=1}^{N}\sum_{j=0}^{D} w_j x_{ij} x_{ik}
$$

$$
\rightarrow\; \sum_{i=1}^{N} X^T_{ki} y_i = \sum_{j=0}^{D}\left(\sum_{i=1}^{N} X^T_{ki} x_{ij}\right) w_j
\;\rightarrow\; [X^T y]_k = [(X^T X) w]_k
\;\rightarrow\; w = (X^T X)^{-1} X^T y \qquad (1)
$$
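
Equation (1) can be checked numerically with a short NumPy sketch. The data below are synthetic and noise-free, generated from hypothetical weights, so the solver should recover those weights; the function name `fit_normal_equation` is introduced here for illustration and is not taken from the paper.

```python
import numpy as np

def fit_normal_equation(features, targets):
    """Return w = (X^T X)^{-1} X^T y, with w[0] as the intercept b."""
    # Prepend a column of ones so that x_{i0} = 1 and w_0 plays the role of b.
    X = np.column_stack([np.ones(len(features)), features])
    # Solve (X^T X) w = X^T y rather than forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ targets)

# Synthetic, noise-free data generated from known (hypothetical) weights.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
true_w = np.array([3.0, 2.0, -1.0])  # intercept, then one weight per feature
targets = true_w[0] + features @ true_w[1:]

w = fit_normal_equation(features, targets)
print(np.round(w, 6))
```

Solving the linear system is numerically preferable to computing the inverse of $X^T X$ directly, although both express the same closed-form solution.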

4. Results and discussion

4.1. Descriptive analysis

Figure 2 contains two histograms that show the distribution of medical insurance charges for
smokers and non-smokers.

Figure 2. Distribution of charges for smoker and non-smoker

From figure 2, it is clear that people who smoke tend to spend more on insurance charges,
while people who do not smoke spend much less. This is a strong indicator that smoking is highly correlated
with worse health conditions that tend to incur high charges. It is also worth noting that gender is not
an important factor influencing the insurance charges. As the violin plot in figure 3 below shows, the
shape of the plots is relatively similar for both male (sex 0 in figure 3) and female (sex 1), showing
that the distribution of charges is also similar for both sexes. The only factor that causes the plots to
differ is smoking. Smoking is associated with a higher insurance charge for both men and
women, and smoking women appear to pay more than smoking men.

Figure 3. Distribution of charges for male, female, distinguished by smoking behavior

Figure 4 below shows the distribution of charges for different ages. For non-smoking policyholders,
the charges remain low and increase steadily with age. For smoking policyholders,
the charges also increase steadily, but they are much higher than for non-smokers. It is also notable
that the plot for smokers has two stripes of dots. This is related to figure 3, which shows that the
violin plot is wider at higher charges for smoking women than for smoking men. It is likely that the higher
stripe of dots represents smoking women and the lower stripe represents smoking men. It is also assumed
that smokers have a higher tendency to develop other unhealthy behaviors that subsequently increase
the insurance charges.

Figure 4. Distribution of charges for different ages, distinguished by smoking behavior

Figure 5 shows the correlation between body mass index (bmi) and charges. A bmi over 30 is often
considered obese, which carries a greater risk of serious diseases [6]. The graph shows that for
non-smokers, insurance charges increase only slightly as bmi increases. However, for smokers, the
insurance charges increase dramatically as bmi rises. This can potentially explain why there
are two stripes of points in the second plot of figure 4.

Figure 5. Distribution of charges for different bmi, distinguished by smoking behavior

Figures 6 and 7 below show the histogram of the number of children that policyholders have and the
correlation between the number of children and medical insurance charges. The histogram shows that the
number of policyholders decreases as the number of children increases, and that most policyholders
have one child or none. The violin plot in figure 7 shows the relationship between charges and number
of children. They possess only a weak correlation, since the shape of the violin is almost
identical across the number of children, with the exception of 4 and 5 children, for which few
samples are available.

Figure 6. Distribution of number of children

Figure 7. Correlation between number of children and charges

Although the number of children possesses only a weak correlation with charges, it may not be wise
to drop this variable from the MLR model, as Rossi et al. suggest in their paper [5]. Therefore, the author
decides to include all variables in the MLR model. In the end, the author uses Python’s scikit-learn to
build the MLR model:
$$\mathrm{charge} = -11815.45 + 257.29 \cdot \mathrm{age} - 131.11 \cdot \mathrm{sex} + \cdots - 353.64 \cdot \mathrm{region} \qquad (2)$$
From this MLR model, smoking is the most influential variable, with a large
coefficient of 23820.43. Smoking deteriorates health, which in turn prompts
insurance companies to charge a higher price. It is notable that sex has the weakest association with the
insurance charges: its coefficient is not statistically significant (P = 0.694), so the model shows that
sex is not an important factor in determining the insurance charges. An unexpected outcome is that bmi
is also only weakly correlated with charges, given the fact
that a higher bmi leads to worse health conditions (table 2).
Table 2. MLR Results
Variables  Beta       Standard Error  t       P
constant   -11815.45  955.13          -12.37  0.000
age        257.29     11.87           21.65   0.000
sex        -131.11    332.81          -0.39   0.694
bmi        332.57     27.72           12.00   0.000
children   479.37     137.64          3.48    0.001
smoker     23820.43   411.84          57.84   0.000
region     -353.64    151.93          -2.33   0.020
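
As an illustration of how the coefficients in Table 2 turn into a prediction, the sketch below evaluates the fitted linear model for the first row of Table 1. The coefficients are copied from Table 2; the numeric codes used for sex, smoker, and region are assumptions (female = 1, smoker = 1, and a hypothetical code of 2 for southwest), so the resulting figure only demonstrates the arithmetic and should not be read as the paper's own prediction.

```python
# Coefficients copied from Table 2.
INTERCEPT = -11815.45
COEFFICIENTS = {
    "age": 257.29,
    "sex": -131.11,
    "bmi": 332.57,
    "children": 479.37,
    "smoker": 23820.43,
    "region": -353.64,
}

def predict_charge(encoded):
    """Evaluate the fitted linear model for one numerically encoded record."""
    return INTERCEPT + sum(COEFFICIENTS[name] * encoded[name] for name in COEFFICIENTS)

# First row of Table 1 under an assumed encoding:
# female = 1, smoker yes = 1, southwest = 2 (hypothetical region code).
first_row = {"age": 19, "sex": 1, "bmi": 27.9, "children": 0, "smoker": 1, "region": 2}
predicted = predict_charge(first_row)
print(f"predicted charge: {predicted:.2f}  (recorded charge: 16884.92)")
```

Individual predictions can deviate substantially from the recorded charge even when the overall fit is reasonable, because a linear model only captures the average effect of each variable.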

Lastly, the author uses the score function of scikit-learn to obtain the R-Squared score of this MLR
model. The score is 0.75, which indicates that the model explains about 75% of the variance in charges
and is considered a successful attempt.
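
The `score` method of scikit-learn regressors returns the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below computes the same quantity in plain Python to make the definition concrete; the observed and predicted values are hypothetical and are not taken from the insurance dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: variation left unexplained by the model.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted values, for illustration only.
observed = [3.0, -0.5, 2.0, 7.0]
predicted_values = [2.5, 0.0, 2.0, 8.0]
score = r_squared(observed, predicted_values)
print(round(score, 3))
```

A score of 1 means the predictions match the observations exactly, while a score of 0 means the model does no better than always predicting the mean.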

5. Conclusion
In this paper, the author conducts an exploratory analysis of a medical insurance dataset from Kaggle.
The author first preprocesses the dataset and shows the correlation between each independent variable
and the dependent variable. In the end, the author provides the MLR model with a satisfactory R-Squared
score of 0.75.
It cannot be denied that overfitting may be a problem for this dataset since it contains only
1336 records. More data should be incorporated to prevent this issue in future studies. Other prediction
models like Random Forest can also be incorporated in future studies and compared with MLR in order to
achieve a better prediction result.

References
[1] James W, et al. 2004 Overweight and Obesity (High Body Mass Index). Comparative
Quantification of Health Risks: Global and Regional Burden of Disease Attributable to
Selected Major Risk Factors, 497-596.
[2] Tranmer M, Murphy J, Elliot M and Pampaka M 2020 Multiple Linear Regression (2nd Edition).
Cathie Marsh Institute, Working Paper, 1.
[3] Joshua C 2023 Implementing Multiple Linear Regression model using Neural Networks.
ResearchGate.
[4] Gülden K U and Neşe G A 2013 Study on Multiple Linear Regression Analysis. Procedia - Social
and Behavioral Sciences, 234-240.
[5] Elena R, Isabella P and Renato I 2022 Multilinear Regression Model for Biogas Production
Prediction from Dry Anaerobic Digestion of OFMSW. Sustainability 14, 4393.
[6] Smita R, Alakananda T and Alok R T 2020 Prediction of new active cases of coronavirus disease
(COVID-19) pandemic using multiple linear regression model. Diabetes & Metabolic
Syndrome: Clinical Research & Reviews, 14(5), 1467-1474.
[7] Knofczynski G T and Mundfrom D 2007 Sample Sizes When Using Multiple Linear Regression
for Prediction. Educational and Psychological Measurement, 68(3), 431-442.
[8] Stephen Y, et al. 2018 Predicting Students' Academic Performance Using Multiple Linear
Regression and Principal Component Analysis, Journal of Information Processing.
[9] Stavroula D and Konstantinos G N 2022 Multiple Linear Regression Models with Limited Data
for the Prediction of Reference Evapotranspiration of the Peloponnese, Greece. Hydrology, 9,
124.
[10] Biau G and Scornet E 2016 A random forest guided tour. TEST, 25, 197-227.
