Better Exponential Curve Fitting
Using Excel
Mike Middleton
DSI 2010 San Diego
Michael R. Middleton, Ph.D.
Decision Toolworks
Mike@[Link]
415.310.7190
Background
The exponential function, Y=c*EXP(b*x), is useful for fitting
some non-linear single-bulge data patterns.
In Excel, you can create an XY (Scatter) chart and add a best-fit
trendline based on the exponential function.
Problem: Regarding the fitted curve for Excels Exponential
Trendline,
(1) the reported value for R Squared is incorrect, and
(2) the fitted values do not minimize Sum of Squared
Deviations.
DSI 2010 San Diego
[Link]
Cisco Revenue Example
Data from example originally presented in Winston (2004)
Model for growth of Cisco revenue during 1900-1999
Potentially useful for projecting revenues and determining
company value
For 1900-1999, Cisco revenue seems to grow by
approximately the same percentage each year
The exponential function, Y=c*EXP(b*X), has the property
that for each unit increase in X the value of Y increases by a
constant percentage
DSI 2010 San Diego
[Link]
Cisco Data and XY Chart
Year
X
1
2
3
4
5
6
7
8
9
10
B
$ Millions
Revenue
Y
$70
$183
$340
$649
$1,243
$1,979
$4,096
$6,440
$8,459
$12,154
Cisco Annual Revenue, 1990-1999
$20,000
Revenue, in Millions
A
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
$15,000
$10,000
$5,000
$0
10
Year
In Excel 2010, select data A4:B13. Insert XY Scatter chart. Use Chart
Tools Layout to add chart title and axes titles.
Right-click a data point to select the data series, and choose Add
Trendline from the shortcut menu.
DSI 2010 San Diego
[Link]
Trendline Dialog Box
DSI 2010 San Diego
[Link]
Excel Chart with Exponential Trendline
Year
X
1
2
3
4
5
6
7
8
9
10
B
$ Millions
Revenue
Y
$70
$183
$340
$649
$1,243
$1,979
$4,096
$6,440
$8,459
$12,154
Cisco Annual Revenue, 1990-1999
$20,000
Revenue, in Millions
A
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
y = 58.553e0.5694x
R = 0.9828
$15,000
$10,000
$5,000
$0
0
10
Year
Next, compute the fitted values for Y, and use worksheet functions
and formulas to compute the actual value of R Squared
DSI 2010 San Diego
[Link]
Actual R Squared for Exponential Trendline
A
1
2
3
4
5
6
7
8
9
10
11
12
13
Year
X
1
2
3
4
5
6
7
8
9
10
B
$ Millions
Revenue
Y
$70
$183
$340
$649
$1,243
$1,979
$4,096
$6,440
$8,459
$12,154
Fitted Y
$103
$183
$323
$571
$1,009
$1,783
$3,151
$5,568
$9,840
$17,389
E
Exponential Trendline
SS Total
SS Regression
SS Residual
156,733,316
125,667,007
31,066,309
R Squared
StDev(Residuals)
0.802
Total SS
Explained SS
Unexplained SS, SSD
Explained SS / Total SS
$1,763
Excels Trendline reports R Squared = 0.9828
Actual R Squared = 0.802
Approximately 80% of the variation in Y is explained by X
using the fitted exponential function
DSI 2010 San Diego
[Link]
Shortcut Excel functions for R Squared calculations
A
1
2
3
4
5
6
7
8
9
10
11
12
13
Year
X
1
2
3
4
5
6
7
8
9
10
B
$ Millions
Revenue
Y
$70
$183
$340
$649
$1,243
$1,979
$4,096
$6,440
$8,459
$12,154
Fitted Y
$103
$183
$323
$571
$1,009
$1,783
$3,151
$5,568
$9,840
$17,389
E
Exponential Trendline
SS Total
SS Regression
SS Residual
=COUNT(B4:B13)*VARP(B4:B13)
=F4-F6
=SUMXMY2(B4:B13,C4:C13)
Total SS
Explained SS
Unexplained SS, SSD
R Squared
=F5/F4
Explained SS / Total SS
StDev(Residuals)
=SQRT(F6/COUNT(B4:B13))
Note that we cannot use Excels worksheet functions RSQ or PEARSON^2
or CORREL^2 to compute R Squared because those functions are based
on a linear fit between Y and X.
DSI 2010 San Diego
[Link]
Setup display for better fit using Excels Solver
A
1
2
3
4
5
6
7
8
9
10
11
12
13
Year
X
1
2
3
4
5
6
7
8
9
10
B
C
D
E
$ Millions
Coeff c
Revenue
58.55266
Y c*EXP(b*X)
$70
$103
SS Total
$183
$183
SS Regression
$340
$323
SS Residual
$649
$571
$1,243
$1,009
R Squared
$1,979
$1,783
$4,096
$3,151
StDev(Residuals)
$6,440
$5,568
$8,459
$9,840
$12,154
$17,389
F
G
Coeff b
0.569367
156,733,316
125,666,623
31,066,693
0.802
Total SS
Explained SS
Unexplained SS, SSD
Explained SS / Total SS
$1,763
Tentative values for coefficients in E2:F2 (Solver Changing Cells)
Formula for fitted value in C4 depends on coefficients and X, copied to C5:C13
Sum of Squared Deviations formula in F6 (Solver Objective) to be minimized
DSI 2010 San Diego
[Link]
Setup formulas for better fit using Excels Solver
A
1
2
3
4
5
6
7
8
9
10
11
12
13
Year
X
1
2
3
4
5
6
7
8
9
10
B
$ Millions
Revenue
Y
$70
$183
$340
$649
$1,243
$1,979
$4,096
$6,440
$8,459
$12,154
Coeff c
58.55266
c*EXP(b*X)
=$E$2*EXP($F$2*A4)
=$E$2*EXP($F$2*A5)
=$E$2*EXP($F$2*A6)
=$E$2*EXP($F$2*A7)
=$E$2*EXP($F$2*A8)
=$E$2*EXP($F$2*A9)
=$E$2*EXP($F$2*A10)
=$E$2*EXP($F$2*A11)
=$E$2*EXP($F$2*A12)
=$E$2*EXP($F$2*A13)
Coeff b
0.569367
SS Total
SS Regression
SS Residual
=COUNT(B4:B13)*VARP(B4:B13)
=F4-F6
=SUMXMY2(B4:B13,C4:C13)
Total SS
Explained SS
Unexplained SS, SSD
R Squared
=F5/F4
Explained SS / Total SS
StDev(Residuals)
=SQRT(F6/COUNT(B4:B13))
Tentative values for coefficients in E2:F2 (Solver Changing Cells)
Formula for fitted value in C4 depends on coefficients and X (absolute
references to E2:F2, relative reference to A4), copied to C5:C13
Sum of Squared Deviations formula in F6 (Solver Objective) to be minimized
DSI 2010 San Diego
[Link]
10
Excel 2010 Solver Parameters Dialog Box
DSI 2010 San Diego
[Link]
11
Excel 2010 Solver Options Dialog Boxes
DSI 2010 San Diego
[Link]
12
Results for Exponential Fit using Solver
A
1
2
3
4
5
6
7
8
9
10
11
12
13
Year
X
1
2
3
4
5
6
7
8
9
10
B
C
$ Millions
Revenue
Y c*EXP(b*X)
$70
$325
$183
$488
$340
$732
$649
$1,098
$1,243
$1,648
$1,979
$2,471
$4,096
$3,707
$6,440
$5,560
$8,459
$8,339
$12,154
$12,509
F
Coeff c
Coeff b
217.0084285 0.40542436
SS Total
SS Regression
SS Residual
156,733,316
154,746,736
1,986,580
R Squared
0.987
StDev(Residuals)
$446
Total SS
Explained SS
Unexplained SS, SSD
Explained SS / Total SS
Excels Trendline reported R Squared = 0.9828, but its
actual R Squared = 0.802 and StDev(Residuals) = $1,763
Solvers better fit has actual R Squared = 0.987 and
StDev(Residuals) = $446
DSI 2010 San Diego
[Link]
13
Visual Comparison of Fits
Cisco Annual Revenue, 1990-1999
$18,000
Trendline
$16,000
$14,000
Revenue, in Millions
$12,000
$10,000
Solver
$8,000
$6,000
$4,000
$2,000
$0
0
10
Year
DSI 2010 San Diego
[Link]
14
Comparison of Current/Previous Ratios
Year
X
1
2
3
4
5
6
7
8
9
10
$ Millions
Revenue
Actual
Y Current/Previous
$70
$183
2.614
$340
1.858
$649
1.909
$1,243
1.915
$1,979
1.592
$4,096
2.070
$6,440
1.572
$8,459
1.314
$12,154
1.437
Average Ratio, 2 to 10
1.809
Average Ratio, 3 to 10
1.708
Average Ratio, 8 to 10
1.441
DSI 2010 San Diego
R^2=0.802, SD(Resid)=$1763
Trendline Exponential
Fitted Y Current/Previous
$103
$183
1.767
$323
1.767
$571
1.767
$1,009
1.767
$1,783
1.767
$3,151
1.767
$5,568
1.767
$9,840
1.767
$17,389
1.767
[Link]
R^2=0.987, SD(Resid)=$446
Solver Fit Exponential
Fitted Y Current/Previous
$325
$488
1.500
$732
1.500
$1,098
1.500
$1,648
1.500
$2,471
1.500
$3,707
1.500
$5,560
1.500
$8,339
1.500
$12,509
1.500
15
Excels Method for Fitting Exponential Trendline, 1 of 2
The exponential model creates a trendline using the equation
y = c * ebx.
Excel uses a log transformation of the original y data to determine fitted values, so
the values of the dependent variable in your data set must be positive.
The exponential trendline feature does not find values of b and c that minimize the
sum of squared deviations between actual y and predicted y (= c * ebx). Instead,
Excel's method takes the logarithm of both sides of the exponential formula, which
then can be written as
Ln(y) = Ln(c) + b * x
and uses standard linear regression with Ln(y) as the dependent variable and x as
the explanatory variable. That is, Excel finds the intercept and slope that minimize
the sum of squared deviations between actual Ln(y) and predicted Ln(y), using the
formula
Ln(y) = Intercept + Slope * x.
Therefore, the Intercept value corresponds to Ln(c), and c in the exponential
formula is equal to Exp(Intercept). The Slope value corresponds to b in the
exponential formula. - Middleton (1995)
DSI 2010 San Diego
[Link]
16
Excels Method for Fitting Exponential Trendline, 2 of 2
Y
70
183
340
649
1243
1979
4096
6440
8459
12154
Ln(Y)
4.248495
5.209486
5.828946
6.475433
7.125283
7.590347
8.317766
8.770284
9.042986
9.405414
Plot of Ln(Y) vs X
Ln(Y), Ln(Revenue)
X
1
2
3
4
5
6
7
8
9
10
12
10
8
6
4
2
0
0
10
X, Year
Ln(Y), Ln(Revenue)
Linear Fit for Ln(Y) vs X
12
10
8
6
4
2
0
Y = c*EXP(b*X)
Ln(y) = 0.5694x + 4.0699
R = 0.9828
LN(Y) = LN(c) + b*X
Fit: LN(Y) = 4.0699 + 0.5694*X
b = 0.5694
10
X, Year
DSI 2010 San Diego
[Link]
c = EXP(LN(c))
c = EXP(4.0699)
c = 58.55
17
General Steps for Curve Fitting
Goal: explain variation in a variable of interest, Y
Prepare a histogram for Y, the dependent (or response) variable
Find data for explanatory variable(s) that make sense
Look at the data: plot XY (Scatter) charts to see relationships
Propose a functional form for the relationship, based on
knowledge of the underlying process,
visual examination of the plot, parsimony, etc.
Determine values for the parameters of the function
best fit, minimize sum of squared deviations
answers the question: What is the relationship?
Perform diagnostics, e.g., R Squared, StDev(Residuals), etc.
answers the question: How good is the relationship?
Use the function
prediction for cross-sectional data, mostly interpolation
forecasts for time-series data, mostly extrapolation
DSI 2010 San Diego
[Link]
18
Summary of Excel Trendline Options
Exponential: Y=c*EXP(b*X), transforms data before fit, not the
best fit, inaccurate R Squared
Linear: Y=b0+b1*X, OK
Logarithmic: Y = c*LN(X)+b, OK
Polynomial: Y=b0+b1*X1+b2*X2+, OK
Power: Y = c*X^b, transforms data before fit, not the best fit,
inaccurate R Squared
Moving Average: OK, but non-standard diagnostics
DSI 2010 San Diego
[Link]
19
Excels Method for Fitting Power Trendline
The power model creates a trendline using the equation
y = c * xb.
Excel uses a log transformation of the original x and y data to determine fitted
values, so the values of both the dependent and explanatory variables in your data
set must be positive.
The power trendline feature does not find values of b and c that minimize the sum
of squared deviations between actual y and predicted y (= c * xb). Instead, Excel's
method takes the logarithm of both sides of the power formula, which then can be
written as
Ln(y) = Ln(c) + b * Ln(x),
and uses standard linear regression with Ln(y) as the dependent variable and Ln(x)
as the explanatory variable. That is, Excel finds the intercept and slope that
minimize the sum of squared deviations between actual Ln(y) and predicted Ln(y),
using the formula
Ln(y) = Intercept + Slope * Ln(x).
Therefore, the Intercept value corresponds to Ln(c), and c in the power formula is
equal to Exp(Intercept). The Slope value corresponds to b in the power formula.
DSI 2010 San Diego
[Link]
20
References
Middleton, M.R. 1995. Data Analysis Using Microsoft Excel 5.0. Duxbury
Press, Belmont, CA.
Winston, W.L. 2004. Microsoft Excel Data Analysis and Business Modeling.
Microsoft Press, Redmond, WA.
DSI 2010 San Diego
[Link]
21
Better Exponential Curve Fitting
Using Excel
Mike Middleton, DSI 2010 San Diego
Michael R. Middleton, Ph.D.
Decision Toolworks
Mike@[Link]
415.310.7190
PowerPoint Slides, Slides PDF File, and Excel Workbook
[Link]
[Link]
[Link]