Tutorial:
Regression 102
[Link], we continue the analysis discussion we started earlier and leverage an advanced technique, stepwise regression, to help us find an optimal set of explanatory variables for the model. Again, [Link] attempts to explain and predict weekly sales for each salesperson (dependent variable) using two explanatory variables: intelligence (IQ) and extroversion.
Data Preparation
Similar to what we did in an earlier tutorial, we organize our sample data by placing the values of each variable in a separate column and each observation in a separate row. Next, [Link] (0,1), which chooses which variable is included in (or excluded from) the analysis. Initially, at the top of the table, let's insert the mask cells array, each with a value of 1 ([Link]). The array is shown highlighted below.
In this example, we have 20 observations and two independent (explanatory) variables. The dependent variable is the weekly sales.
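The effect of the mask row can be sketched outside the spreadsheet. The snippet below is only an illustration of the idea: the helper name `apply_mask` and the sample values are hypothetical, not part of NumXL or of the tutorial's data set.

```python
# Hypothetical sample rows (IQ, extroversion); the values are made up for illustration.
rows = [
    (110, 22),
    (95, 30),
    (102, 27),
]
names = ["IQ", "Extroversion"]


def apply_mask(rows, names, mask):
    """Keep only the columns whose mask value is 1 (1 = include, 0 = exclude)."""
    keep = [j for j, m in enumerate(mask) if m == 1]
    kept_names = [names[j] for j in keep]
    kept_rows = [tuple(row[j] for j in keep) for row in rows]
    return kept_names, kept_rows


# All variables switched on: the data passes through unchanged.
all_names, all_rows = apply_mask(rows, names, [1, 1])

# First variable switched off: only the second column reaches the regression.
ext_names, ext_rows = apply_mask(rows, names, [0, 1])
```

The data itself is never edited; changing a single 0/1 value is enough to change which columns enter the analysis.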
Process
Now, [Link], select an empty cell in your worksheet where you wish the output to be generated, then locate and click on the regression icon in the NumXL tab (or toolbar).
Regression 102 Tutorial
Spider Financial Corp, 2013
The Regression wizard appears.
Select the cells range for the response/dependent variable values ([Link]). Select the cells range for the explanatory (independent) [Link] (X) Mask, select the cells at the top of the data table (Boolean array).
Notes:
1. The cells range includes (optional) the heading (Label) cell, which would be used in the output tables where it references those variables.
2. The explanatory variables (i.e. X) are already grouped by columns (each column represents a variable), so we don't need to change that.
3. By default, the output cells range is set to the currently selected cell in your worksheet.
Please note that once we select the X and Y cells range, the Options, Forecast and Missing Values tabs become available (enabled).
Next, select the Options tab.
Initially, the tab is set to the following values:
The regression intercept/[Link] [Link] ([Link] (0)), enter it there.
The significance level (a.k.a. α) is set to 5%.
In the Output section, the most common regression analyses are selected.
[Link].
Now, click on the Missing Values tab.
In this tab, you can select an approach to handle missing values in the data set (X and Y). By default, any missing value found in X or in Y in any observation would exclude the observation from the analysis. This treatment is a good approach for our analysis, so let's leave it unchanged.
Now, click OK to generate the output tables.
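The default treatment of missing values described above, dropping any observation that has a missing X or Y value, can be sketched as follows. This is a minimal illustration and not NumXL's implementation; the function names and the sample values are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete(x_rows, y):
    """Exclude every observation that has a missing value in X or in Y."""
    kept_x, kept_y = [], []
    for xi, yi in zip(x_rows, y):
        if is_missing(yi) or any(is_missing(v) for v in xi):
            continue  # the whole observation is excluded
        kept_x.append(xi)
        kept_y.append(yi)
    return kept_x, kept_y


# Made-up observations: the second and third miss an X value, the fourth misses Y.
x_rows = [(110, 22), (None, 30), (102, float("nan")), (98, 25)]
y = [3000.0, 2500.0, 2800.0, None]
clean_x, clean_y = drop_incomplete(x_rows, y)
```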
Analysis
Aside from the Variables (X) Mask settings, everything is exactly the same as what we did in the prior tutorial, so what's our next step?
The Mask variable determines which variable is included in the regression analysis, so let's take another look at the Coefficients table.
First, [Link] mask value for this cell to zero.
Now, if you have the Calculation option set to manual, [Link], the spreadsheet recalculates automatically.
Checking the output tables, we find the following:
R-squared dropped by 6%.
Adjusted R-squared dropped by 1.5%.
Standard error increased by $3.
AIC dropped by one (1).
The ANOVA table shows the regression is significant.
Residual diagnosis checks out for all tests.
In the regression coefficients table, the intercept and the coefficient of the Extroversion variable are both statistically significant.
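The goodness-of-fit measures compared above can be computed from the observed and fitted values. The sketch below uses the textbook definitions; the AIC is written in its Gaussian form n·ln(SSE/n) + 2p, which drops additive constants, and this is an assumption because the tutorial does not state which AIC formula NumXL reports. The function name `fit_stats` and the sample numbers are hypothetical.

```python
import math


def fit_stats(y, y_hat, n_params):
    """Goodness-of-fit summary; n_params counts every coefficient, intercept included."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    sst = sum((a - mean_y) ** 2 for a in y)            # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    std_err = math.sqrt(sse / (n - n_params))          # standard error of the regression
    aic = n * math.log(sse / n) + 2 * n_params         # assumed Gaussian form, constants dropped
    return {"r2": r2, "adj_r2": adj_r2, "std_err": std_err, "aic": aic}


# Made-up observed and fitted values for a model with an intercept and one slope.
stats = fit_stats([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], n_params=2)
```

Adjusted R-squared and AIC both penalize the number of parameters, which is why they can move much less than R-squared (or even improve) when a weak variable is dropped.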
This model has fewer parameters ([Link]) and explains the variation in the values of the response variable almost as well as the model with two (2) explanatory variables.
Now, let's plot the estimated values against the actual.
[Figure: actual weekly sales ($ Sales/Week) and estimated values for observations 1 to 20; the vertical axis runs from $1,500 to $4,500.]
The shaded area represents the 95% confidence interval for the estimates of the regression model.
So far, we have demonstrated that dropping a variable from the analysis is as easy as flipping a switch; [Link], but you might be wondering: if I had more explanatory variables (say 10), what is the optimal set of variables? Should I try every single subset? With 10 variables, there are 2^10 = 1,024 of them.
NumXL supports an interesting functionality, stepwise regression, to help you select this optimal set. Let's demonstrate how you would use it.
(1) In the Mask cells range, turn on or off the variables that you wish the stepwise regression to [Link], we will turn them all on.
(2) Locate and click on the regression icon in the NumXL tab.
(3) The Regression Wizard pops up.
(4) In the General tab, select the input cells range and the mask cells range.
(5) Under the Options tab, check the Stepwise Regression box.
(6) Leave the 3 different methods checked.
(7) Click OK.
(8) The output tables are generated.
The stepwise regression generates one additional table next to the coefficients table.
Let's take a closer look at this new table.
The stepwise regression carries out a series of partial F-tests to include (or drop) variables from the regression model.
Forward selection: we start with an intercept only, and examine adding one additional variable at a time.
Backward elimination: we start from the full model with all variables in, and consider dropping one regressor at a time.
Bidirectional elimination is a hybrid of the two methods.
[Link] (1) stands for inclusion and zero (0) for exclusion.
At the bottom of the table, [Link] this case, the three models came back with the same set of variables, so no comparison is needed.
Please note that, given the same set of input variables and responses, the mask is used to differentiate one model from others simply by listing the inclusion/exclusion list.
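To make the forward-selection idea concrete, here is a minimal sketch of a partial F-test driving variable inclusion, returning a 0/1 mask like the one in the table. It is not NumXL's algorithm: the fixed entry threshold `f_to_enter=4.0`, the function names, and the sample data are all assumptions made for illustration, and a real implementation would normally compare each F statistic against a critical value at the chosen significance level.

```python
def ols_sse(columns, y):
    """Residual sum of squares for y regressed on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] * n] + [[float(v) for v in col] for col in columns]
    p = len(design)
    # Augmented normal equations [X'X | X'y].
    a = [
        [sum(u * v for u, v in zip(design[i], design[j])) for j in range(p)]
        + [sum(u * v for u, v in zip(design[i], y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(p):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [u - factor * v for u, v in zip(a[r], a[i])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(beta[j] * design[j][k] for j in range(p)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


def partial_f(sse_reduced, sse_full, n, p_full, q=1):
    """Partial F statistic for adding q variables; p_full includes the intercept."""
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - p_full))


def forward_select(columns, y, f_to_enter=4.0):
    """Return a 0/1 inclusion mask; f_to_enter is an assumed fixed threshold."""
    n = len(y)
    mask = [0] * len(columns)
    while True:
        current = [c for c, m in zip(columns, mask) if m]
        sse_now = ols_sse(current, y)
        best_j, best_f = None, f_to_enter
        for j, m in enumerate(mask):
            if m:
                continue
            trial = current + [columns[j]]
            f = partial_f(sse_now, ols_sse(trial, y), n, len(trial) + 1)
            if f > best_f:
                best_j, best_f = j, f
        if best_j is None:
            return mask  # no remaining variable passes the entry test
        mask[best_j] = 1


# Made-up data: y depends on x1 only; x2 is unrelated to y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, -1, 1, 1, -1, -1, 1]
y = [5.5, 7.5, 11.5, 13.5, 16.5, 20.5, 22.5, 26.5]
selected = forward_select([x1, x2], y)
```

Backward elimination would run the same test in the opposite direction, starting from a mask of all ones and dropping the variable with the smallest partial F statistic while it stays below a removal threshold.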
Conclusion
So far, we have created a regression model, examined its significance, verified that it satisfies the underlying assumptions, and found the optimal subset of variables for the model.
For many, this is the end of the analysis, and they would probably start using the model for forecasting.
Before we can use the model for forecasting, there are two more questions we ought to answer:
(1) Do we have any observation that exerts a significant influence ([Link]) on the regression model?
(2) Is the regression model stable over the sample data?
[Link], read on.