100% found this document useful (1 vote)

3K views667 pages

NLOGIT 5 Reference Guide

Nlogit 5 reference file

Uploaded by

Bharat

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

3K views667 pages

NLOGIT 5 Reference Guide

Nlogit 5 reference file

Uploaded by

Bharat

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 667

NLOGIT

Version 5
Reference Guide

William H. Greene
Econometric Software, Inc.
1986 - 2012 Econometric Software, Inc. All rights reserved.

This software product, including both the program code and the accompanying
documentation, is copyrighted by, and all rights are reserved by Econometric Software, Inc. No part
of this product, either the software or the documentation, may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means without prior written permission of Econometric
Software, Inc.

LIMDEP and NLOGIT are registered trademarks of Econometric Software, Inc. All other
brand and product names are trademarks or registered trademarks of their respective companies.

Econometric Software, Inc.

15 Gloria Place
Plainview, NY 11803
USA
Tel: +1 516-938-5254
Fax: +1 516-938-2441
Email: [email protected]
Websites: www.limdep.com and www.nlogit.com.

Econometric Software, Australia

215 Excelsior Avenue
Castle Hill, NSW 2154
Australia
Tel: +61 (0)4-1843-3057
Fax: +61 (0)2-9899-6674
Email: [email protected]
End-User License Agreement
This is a contract between you and Econometric Software, Inc. The software product refers
to the computer software and documentation as well as any upgrades, modified versions, copies or
supplements supplied by Econometric Software. By installing, downloading, accessing or otherwise
using the software product, you agree to be bound by the terms and conditions of this Agreement.
Subject to the terms and conditions of this Agreement, Econometric Software, Inc. grants
you a non-assignable, non-transferable license, without the right to sublicense, to use the licensed
software and documentation in object-code form only, solely for your internal business, research, or
educational purposes.

Copyright, Trademark, and Intellectual Property

This software product is copyrighted by, and all rights are reserved by Econometric Software,
Inc. No part of this software product, either the software or the documentation, may be reproduced,
distributed, downloaded, stored in a retrieval system, transmitted in any form or by any means, sold or
transferred without prior written permission of Econometric Software. You may not, or permit any
person, to: (i) modify, adapt, translate, or change the software product; (ii) reverse engineer, decompile,
disassemble, or otherwise attempt to discover the source code of the software product; (iii) sublicense,
resell, rent, lease, distribute, commercialize, or otherwise transfer rights or usage to the software
product; (iv) remove, modify, or obscure any copyright, registered trademark, or other proprietary
notices; (v) embed the software product in any third-party applications; or (vi) make the software
product, either the software or the documentation, available on any website.
LIMDEP and NLOGIT are registered trademarks of Econometric Software, Inc. The
software product is licensed, not sold. Your possession, installation and use of the software product
does not transfer to you any title and intellectual property rights, nor does this license grant you any
rights in connection with software product registered trademarks.

Use of the Software Product

You have only the non-exclusive right to use this software product. A single user license is
registered to one specific individual as the sole authorized user, and is not for multiple users on one
machine or for installation on a network, in a computer laboratory or on a public access computer.
For a single user license only, the registered user may install the software on a primary stand alone
computer and one home or portable secondary computer for his or her exclusive use. However, the
software may not be used on the primary computer by another person while the secondary computer
is in use. For a multi-user site license, the specific terms of the site license agreement apply for scope
of use and installation.
Limited Warranty

Econometric Software warrants that the software product will perform substantially in
accordance with the documentation for a period of ninety (90) days from the date of the original
purchase. To make a warranty claim, you must notify Econometric Software in writing within ninety
(90) days from the date of the original purchase and return the defective software to Econometric
Software. If the software does not perform substantially in accordance with the documentation, the
entire liability and your exclusive remedy shall be limited to, at Econometric Softwares option, the
replacement of the software product or refund of the license fee paid to Econometric Software for the
software product. Proof of purchase from an authorized source is required. This limited warranty is
void if failure of the software product has resulted from accident, abuse, or misapplication. Some states
and jurisdictions do not allow limitations on the duration of an implied warranty, so the above
limitation may not apply to you. To the extent permissible, any implied warranties on the software
product are limited to ninety (90) days.
Econometric Software does not warrant the performance or results you may obtain by using
the software product. To the maximum extent permitted by applicable law, Econometric Software
disclaims all other warranties and conditions, either expressed or implied, including, but not limited
to, implied warranties of merchantability, fitness for a particular purpose, title, and non-infringement
with respect to the software product. This limited warranty gives you specific legal rights. You may
have others, which vary from state to state and jurisdiction to jurisdiction.

Limitation of Liability

Under no circumstances will Econometric Software be liable to you or any other person for
any indirect, special, incidental, or consequential damages whatsoever (including, without limitation,
damages for loss of business profits, business interruption, computer failure or malfunction, loss of
business information, or any other pecuniary loss) arising out of the use or inability to use the
software product, even if Econometric Software has been advised of the possibility of such damages.
In any case, Econometric Softwares entire liability under any provision of this agreement shall not
exceed the amount paid to Econometric Software for the software product. Some states or
jurisdictions do not allow the exclusion or limitation of liability for incidental or consequential
damages, so the above limitation may not apply to you.
Preface
NLOGIT is a major suite of programs for the estimation of discrete choice models. It is built
on the original DISCRETE CHOICE (or CLOGIT as is used in the current versions) command in
LIMDEP Version 6, which provided some of the features that are described with the estimator
presented in Chapter N17 of this reference guide. NLOGIT, itself, began in 1996 with the development
of the nested logit command, originally an extension of the multinomial logit model. With the
additions of the multinomial probit model and the mixed logit model among several others, NLOGIT
has now grown to a self standing superset of LIMDEP. The focus of most of the recent development is
the random parameters logit model, or mixed logit model as it is frequently called in the literature.
NLOGIT is now the only generally available package that contains panel data (repeated measures)
versions of this model, in random effects and autoregressive forms. We note, the technology used in
the random parameters model, originally proposed by Dan McFadden and Kenneth Train, has proved
so versatile and robust, that we have been able to extend it into most of the other modeling platforms
that are contained in LIMDEP. They, like NLOGIT, now contain random parameters versions. Finally,
a major feature of NLOGIT is the simulation package. With this program, you can use any model that
you have estimated to do what if sorts of simulations to examine the effects on predicted behavior of
changes in the attributes of choices in your model.
NLOGIT Version 5 continues the ongoing (since 1985) collaboration of William Greene
(Econometric Software, Inc.) and David Hensher (Econometric Software, Australia.) Recent
developments, especially the random parameters and generalized mixed logit in its cross section and
panel data variants have also benefited from the enthusiastic collaboration of John Rose (Econometric
Software, Australia).
We note, the monograph Applied Choice Analysis: A Primer (Hensher, D., Rose, J. and
Greene, W., Cambridge University Press, 2005) is a wide ranging introduction to discrete choice
modeling that contains numerous applications developed with NLOGIT. This book should provide a
useful companion to the documentation for NLOGIT.

Econometric Software, Inc.

Plainview, NY 11803
2012
NLOGIT 5 Table of Contents vi

Table of Contents
Table of Contents....................................................................................................................vi

Whats New in Version 5? ................................................................................................... N-1

WN1 The NLOGIT 5 Reference Guide ...................................................................................... N-1
WN2 New Multinomial Choice Models ..................................................................................... N-1
WN2.1 New Scaled and Generalized Mixed Logit Models .......................................... N-1
WN2.2 Estimation in Willingness to Pay Space ........................................................... N-2
WN2.3 Random Regret Logit Model ............................................................................ N-2
WN2.4 Latent Class Models.......................................................................................... N-2
WN2.5 Error Components Logit Model ........................................................................ N-3
WN2.6 Nonlinear Random Parameters Logit Model .................................................... N-3
WN.3 Model Extensions............................................................................................................. N-4
WN3.1 General -888 format .......................................................................................... N-4
WN3.2 Mixed Logit Models ......................................................................................... N-4
WN3.3 Elasticities and Partial Effects .......................................................................... N-4
WN3.4 Robust Covariance Matrix ................................................................................ N-4

N1: Introduction to NLOGIT Version 5................................................................................ N-5

N1.1 Introduction ....................................................................................................................... N-5
N1.2 The NLOGIT Program ....................................................................................................... N-5
N1.3 NLOGIT and LIMDEP Integration and Documentation .................................................... N-5
N1.4 Discrete Choice Modeling with NLOGIT .......................................................................... N-6
N1.5 Types of Discrete Choice Models in NLOGIT .................................................................. N-7
N1.5.1 Random Regret Logit Model ............................................................................. N-9
N1.5.2 Scaled Multinomial Logit Model..................................................................... N-10
N1.5.3 Latent Class and Random Parameters LC Models .......................................... N-10
N1.5.4 Heteroscedastic Extreme Value Model............................................................ N-10
N1.5.5 Multinomial Probit Model ............................................................................... N-10
N1.5.6 Nested Logit Models ....................................................................................... N-11
N1.5.7 Random Parameters and Nonlinear RP Logit Model ...................................... N-11
N1.5.8 Error Components Logit Model....................................................................... N-12
N1.5.9 Generalized Mixed Logit Model ..................................................................... N-12
N1.6 Functions of NLOGIT ...................................................................................................... N-12

N2: Discrete Choice Models .............................................................................................. N-13

N2.1 Introduction ..................................................................................................................... N-13
N2.2 Random Utility Models ................................................................................................... N-13
N2.3 Binary Choice Models ..................................................................................................... N-14
N2.4 Bivariate and Multivariate Binary Choice Models .......................................................... N-16
N2.5 Ordered Choice Models ................................................................................................... N-17
N2.6 Multinomial Logit Model ................................................................................................ N-20
N2.6.1 Random Effects and Common (True) Random Effects ................................... N-21
N2.6.2 A Dynamic Multinomial Logit Model ............................................................. N-22
N2.7 Conditional Logit Model ................................................................................................. N-22
NLOGIT 5 Table of Contents vii

N2.7.1 Random Regret Logit and Hybrid Utility Models ........................................... N-23
N2.7.2 Scaled MNL Model ......................................................................................... N-24
N2.8 Error Components Logit Model....................................................................................... N-24
N2.9 Heteroscedastic Extreme Value Model............................................................................ N-25
N2.10 Nested and Generalized Nested Logit Models .............................................................. N-26
N2.10.1 Alternative Normalizations of the Nested Logit Model ................................ N-27
N2.10.2 A Model of Covariance Heterogeneity .......................................................... N-29
N2.10.3 Generalized Nested Logit Model ................................................................... N-30
N2.10.4 Box-Cox Nested Logit ................................................................................... N-30
N2.11 Random Parameters Logit Models ................................................................................ N-31
N2.11.1 Nonlinear Utility RP Model........................................................................... N-32
N2.11.2 Generalized Mixed Logit Model ................................................................... N-33
N2.12 Latent Class Logit Models ............................................................................................. N-33
N2.12.1 2K Latent Class Model ................................................................................... N-34
N2.12.2 Latent Class Random Parameters Model .................................................... N-35
N2.13 Multinomial Probit Model ............................................................................................. N-35

N3: Model and Command Summary for Discrete Choice Models .................................. N-37
N3.1 Introduction ..................................................................................................................... N-37
N3.2 Model Dimensions ........................................................................................................... N-37
N3.3 Basic Discrete Choice Models ......................................................................................... N-38
N3.3.1 Binary Choice Models ..................................................................................... N-38
N3.3.2 Bivariate Binary Choices ................................................................................. N-38
N3.3.3 Multivariate Binary Choice Models ................................................................ N-39
N3.3.4 Ordered Choice Models ................................................................................... N-39
N3.4 Multinomial Logit Models............................................................................................... N-39
N3.4.1 Multinomial Logit............................................................................................ N-39
N3.4.2 Conditional Logit ............................................................................................. N-40
N3.5 NLOGIT Extensions of Conditional Logit ....................................................................... N-41
N3.5.1 Random Regret Logit ...................................................................................... N-41
N3.5.2 Scaled Multinomial Logit ................................................................................ N-41
N3.5.3 Heteroscedastic Extreme Value ....................................................................... N-41
N3.5.4 Error Components Logit .................................................................................. N-42
N3.5.5 Nested and Generalized Nested Logit ............................................................. N-42
N3.5.6 Random Parameters Logit ............................................................................... N-43
N3.5.7 Generalized Mixed Logit ................................................................................. N-44
N3.5.8 Nonlinear Random Parameters Logit .............................................................. N-44
N3.5.9 Latent Class Logit ............................................................................................ N-44
N3.5.10 2K Latent Class Logit ..................................................................................... N-45
N3.5.11 Latent Class Random Parameters .................................................................. N-45
N3.5.12 Multinomial Probit......................................................................................... N-45
N3.6 Command Summary ........................................................................................................ N-46
N3.7 Subcommand Summary ................................................................................................... N-47

N4: Data for Binary and Ordered Choice Models............................................................. N-51

N4.1 Introduction ..................................................................................................................... N-51
N4.2 Grouped and Individual Data for Discrete Choice Models ............................................. N-51
NLOGIT 5 Table of Contents viii

N4.3 Data Used in Estimation of Binary Choice Models ......................................................... N-52

N4.3.1 The Dependent Variable .................................................................................. N-52
N4.3.2 Problems with the Independent Variables ....................................................... N-52
N4.3.3 Dummy Variables with Empty Cells ............................................................... N-55
N4.3.4 Missing Values ................................................................................................ N-58
N4.4 Bivariate Binary Choice .................................................................................................. N-59
N4.5 Ordered Choice Model Structure and Data...................................................................... N-59
N4.5.1 Empty Cells ..................................................................................................... N-59
N4.5.2 Coding the Dependent Variable....................................................................... N-60
N4.6 Constant Terms ................................................................................................................ N-60

N5: Models for Binary Choice ........................................................................................... N-61

N5.1 Introduction ..................................................................................................................... N-61
N5.2 Modeling Binary Choices ................................................................................................ N-61
N5.2.1 Underlying Processes....................................................................................... N-61
N5.2.2 Modeling Approaches...................................................................................... N-62
N5.2.3 The Linear Probability Model ......................................................................... N-63
N5.3 Grouped and Individual Data for Binary Choice Models .................................................. N-64
N5.4 Variance Normalization ................................................................................................... N-64
N5.5 The Constant Term in Index Function Models ................................................................ N-65

N6: Probit and Logit Models: Estimation ......................................................................... N-66

N6.1 Introduction ..................................................................................................................... N-66
N6.2 Probit and Logit Models for Binary Choice .................................................................... N-66
N6.3 Commands ....................................................................................................................... N-66
N6.4 Output .............................................................................................................................. N-67
N6.4.1 Reported Estimates .......................................................................................... N-67
N6.4.2 Fit Measures .................................................................................................... N-69
N6.4.3 Covariance Matrix ........................................................................................... N-71
N6.4.4 Retained Results and Generalized Residuals ................................................... N-72
N6.5 Robust Covariance Matrix Estimation............................................................................. N-73
N6.5.1 The Sandwich Estimator .................................................................................. N-73
N6.5.2 Clustering......................................................................................................... N-73
N6.5.3 Stratification and Clustering ............................................................................ N-76
N6.6 Analysis of Partial Effects ............................................................................................... N-77
N6.6.1 The Krinsky and Robb Method ....................................................................... N-78
N6.7 Simulation and Analysis of a Binary Choice Model ....................................................... N-83
N6.8 Using Weights and Choice Based Sampling ................................................................... N-85
N6.9 Heteroscedasticity in Probit and Logit Models................................................................ N-87

N7: Tests and Restrictions in Models for Binary Choice ................................................ N-93
N7.1 Introduction ..................................................................................................................... N-93
N7.2 Testing Hypotheses.......................................................................................................... N-93
N7.2.1 Wald Tests ....................................................................................................... N-93
N7.2.2 Likelihood Ratio Tests..................................................................................... N-95
N7.2.3 Lagrange Multiplier Tests................................................................................ N-97
N7.3 Two Specification Tests .................................................................................................. N-99
NLOGIT 5 Table of Contents ix

N7.3.1 A Test for Nonnested Probit Models ............................................................... N-99

N7.3.2 A Test for Normality in the Probit Model ..................................................... N-100
N7.4 The WALD Command .................................................................................................. N-101
N7.5 Imposing Linear Restrictions ......................................................................................... N-103

N8: Extended Binary Choice Models .............................................................................. N-104

N8.1 Introduction ................................................................................................................... N-104
N8.2 Sample Selection in Probit and Logit Models ............................................................... N-104
N8.3 Endogenous Variable in a Probit Model ........................................................................ N-105

N9: Fixed and Random Effects Models for Binary Choice ............................................ N-108
N9.1 Introduction ................................................................................................................... N-108
N9.2 Commands ..................................................................................................................... N-109
N9.3 Clustering, Stratification and Robust Covariance Matrices........................................... N-110
N9.4 One and Two Way Fixed Effects Models...................................................................... N-112
N9.5 Conditional MLE of the Fixed Effects Logit Model ..................................................... N-118
N9.5.1 Command....................................................................................................... N-119
N9.5.2 Application .................................................................................................... N-120
N9.5.3 Estimating the Individual Constant Terms .................................................... N-122
N9.5.4 A Hausman Test for Fixed Effects in the Logit Model ................................. N-123
N9.6 Random Effects Models for Binary Choice................................................................... N-124

N10: Random Parameter Models for Binary Choice ...................................................... N-131

N10.1 Introduction ................................................................................................................. N-131
N10.2 Probit and Logit Models with Random Parameters ..................................................... N-132
N10.2.1 Command for the Random Parameters Models ........................................... N-132
N10.2.2 Results from the Estimator and Applications .............................................. N-134
N10.2.3 Controlling the Simulation .......................................................................... N-141
N10.2.4 The Parameter Vector and Starting Values.................................................. N-142
N10.2.5 A Dynamic Probit Model............................................................................. N-143
N10.3 Latent Class Models for Binary Choice....................................................................... N-145
N10.3.1 Application .................................................................................................. N-146

N11: Semiparametric and Nonparametric Models for Binary Choice........................... N-153

N11.1 Introduction ................................................................................................................. N-153
N11.2 Maximum Score Estimation - MSCORE..................................................................... N-154
N11.2.1 Command for MSCORE.............................................................................. N-155
N11.2.2 Options Specific to the Maximum Score Estimator .................................... N-155
N11.2.3 General Options for MSCORE .................................................................... N-157
N11.2.4 Output from MSCORE ................................................................................ N-158
N11.3 Klein and Spadys Semiparametric Binary Choice Model ............................................ N-159
N11.3.1 Command..................................................................................................... N-160
N11.3.2 Output .......................................................................................................... N-160
N11.3.3 Application .................................................................................................. N-161
N11.4 Nonparametric Binary Choice Model .......................................................................... N-163
N11.4.1 Output from NPREG ................................................................................... N-165
N11.4.2 Application .................................................................................................. N-165
NLOGIT 5 Table of Contents x

N12: Bivariate and Multivariate Probit and Partial Observability Models .................... N-168
N12.1 Introduction ................................................................................................................. N-168
N12.2 Estimating the Bivariate Probit Model ........................................................................ N-169
N12.2.1 Options for the Bivariate Probit Model ....................................................... N-169
N12.2.2 Proportions Data .......................................................................................... N-171
N12.2.3 Heteroscedasticity ........................................................................................ N-172
N12.2.4 Specification Tests ....................................................................................... N-172
N12.2.5 Model Results for the Bivariate Probit Model ............................................. N-174
N12.2.6 Partial Effects .............................................................................................. N-175
N12.3 Tetrachoric Correlation................................................................................................ N-181
N12.4 Bivariate Probit Model with Sample Selection............................................................ N-183
N12.5 Simultaneity in the Binary Variables ........................................................................... N-183
N12.6 Recursive Bivariate Probit Model................................................................................ N-184
N12.7 Panel Data Bivariate Probit Models............................................................................. N-186
N12.8 Simulation and Partial Effects ..................................................................................... N-192
N12.9 Multivariate Probit Model ........................................................................................... N-194
N12.9.1 Retrievable Results ...................................................................................... N-195
N12.9.2 Partial Effects .............................................................................................. N-195
N12.9.3 Sample Selection Model .............................................................................. N-196

N13: Ordered Choice Models .......................................................................................... N-198

N13.1 Introduction ................................................................................................................. N-198
N13.2 Command for Ordered Probability Models ................................................................. N-199
N13.3 Data Problems.............................................................................................................. N-200
N13.4 Output from the Ordered Probability Estimators ......................................................... N-200
N13.4.1 Robust Covariance Matrix Estimation......................................................... N-203
N13.4.2 Saved Results ............................................................................................... N-204
N13.5 Partial Effects and Simulations.................................................................................... N-205

N14: Extended Ordered Choice Models ......................................................................... N-210

N14.1 Introduction ................................................................................................................. N-210
N14.2 Weighting and Heteroscedasticity ............................................................................... N-210
N14.3 Multiplicative Heteroscedasticity ................................................................................ N-211
N14.3.1 Testing for Heteroscedasticity ..................................................................... N-212
N14.3.2 Partial Effects in the Heteroscedasticity Model ........................................... N-216
N14.4 Sample Selection and Treatment Effects ..................................................................... N-218
N14.4.1 Command..................................................................................................... N-219
N14.4.2 Saved Results ............................................................................................... N-219
N14.4.3 Applications ................................................................................................. N-220
N14.5 Hierarchical Ordered Probit Models ............................................................................ N-224
N14.6 Zero Inflated Ordered Probit (ZIOP, ZIHOP) Models ................................................ N-227
N14.7 Bivariate Ordered Probit and Polychoric Correlation.................................................. N-229

N15: Panel Data Models for Ordered Choice ................................................................. N-234

N15.1 Introduction ................................................................................................................. N-234
N15.2 Fixed Effects Ordered Choice Models......................................................................... N-235
N15.3 Random Effects Ordered Choice Models .................................................................... N-238
NLOGIT 5 Table of Contents xi

N15.3.1 Commands ................................................................................................... N-239

N15.3.2 Output and Results....................................................................................... N-240
N15.3.3 Application .................................................................................................. N-241
N15.4 Random Parameters and Random Thresholds Ordered Choice Models ...................... N-243
N15.4.1 Model Commands........................................................................................ N-244
N15.4.2 Results ......................................................................................................... N-247
N15.4.3 Application .................................................................................................. N-247
N15.4.4 Random Parameters HOPIT Model ............................................................. N-251
N15.5 Latent Class Ordered Choice Models .......................................................................... N-257
N15.5.1 Command..................................................................................................... N-257
N15.5.2 Results ......................................................................................................... N-258

N16: The Multinomial Logit Model .................................................................................. N-267

N16.1 Introduction ................................................................................................................. N-267
N16.2 The Multinomial Logit Model MLOGIT.................................................................. N-268
N16.3 Model Command for the Multinomial Logit Model .................................................... N-269
N16.3.1 Imposing Constraints on Parameters ........................................................... N-269
N16.3.2 Starting Values ............................................................................................ N-270
N16.4 Robust Covariance Matrix ........................................................................................... N-270
N16.5 Cluster Correction........................................................................................................ N-271
N16.6 Choice Based Sampling ............................................................................................... N-272
N16.7 Output for the Logit Models ........................................................................................ N-273
N16.8 Partial Effects............................................................................................................... N-276
N16.8.1 Computation of Partial Effects with the Model ........................................... N-277
N16.8.2 Partial Effects Using the PARTIALS EFFECTS Command ....................... N-280
N16.9 Predicted Probabilities ................................................................................................. N-281
N16.10 Generalized Maximum Entropy (GME) Estimation .................................................. N-282
N16.11 Technical Details on Optimization ............................................................................ N-284
N16.12 Panel Data Multinomial Logit Models ...................................................................... N-285
N16.12.1 Random Effects and Common (True) Random Effects ............................. N-285
N16.12.2 Dynamic Multinomial Logit Model........................................................... N-291

N17: Conditional Logit Model.......................................................................................... N-293

N17.1 Introduction ................................................................................................................. N-293
N17.2 The Conditional Logit Model CLOGIT.................................................................... N-294
N17.3 Clogit Data for the Applications .................................................................................. N-295
N17.3.1 Setting Up the Data...................................................................................... N-297
N17.4 Command for the Discrete Choice Model ................................................................... N-298
N17.5 Results for the Conditional Logit Model ..................................................................... N-300
N17.5.1 Robust Standard Errors ................................................................................ N-303
N17.5.2 Descriptive Statistics ................................................................................... N-304
N17.6 Estimating and Fixing Coefficients ............................................................................. N-306
N17.7 Generalized Maximum Entropy Estimator .................................................................. N-308
N17.8 MLOGIT and CLOGIT ............................................................................................... N-310

N18: Data Setup for NLOGIT ........................................................................................... N-312

N18.1 Introduction ................................................................................................................. N-312
NLOGIT 5 Table of Contents xii

N18.2 Basic Data Setup for NLOGIT ..................................................................................... N-312

N18.3 Types of Data on the Choice Variable ......................................................................... N-313
N18.3.1 Unlabeled Choice Sets ................................................................................. N-315
N18.3.2 Simulated Choice Data ................................................................................ N-315
N18.3.3 Checking Data Validity ............................................................................... N-316
N18.4 Weighting .................................................................................................................... N-317
N18.5 Choice Based Sampling ............................................................................................... N-317
N18.6 Entering Data on a Single Line .................................................................................... N-319
N18.7 Converting One Line Data Sets for NLOGIT .............................................................. N-322
N18.7.1 Converting the Data Set to Multiple Line Format ....................................... N-323
N18.7.2 Writing a Multiple Line Data File for NLOGIT .......................................... N-326
N18.8 Merging Invariant Variables into a Panel .................................................................... N-326
N18.9 Modeling Choice Strategy ........................................................................................... N-328
N18.10 Scaling the Data ......................................................................................................... N-329
N18.11 Data for the Applications ........................................................................................... N-330
N18.12 Merging Revealed Preference (RP) and Stated Preference (SP) Data Sets ............... N-332

N19: NLOGIT Commands and Results ........................................................................... N-333

N19.1 Introduction ................................................................................................................. N-333
N19.2 NLOGIT Commands .................................................................................................... N-333
N19.3 Other Optional Specifications on NLOGIT Commands .............................................. N-337
N19.4 Estimation Results ....................................................................................................... N-338
N19.4.1 Descriptive Headers for NLOGIT Models ................................................... N-338
N19.4.2 Standard Model Results ............................................................................... N-339
N19.4.3 Retained Results .......................................................................................... N-342
N19.4.4 Descriptive Statistics for Alternatives ......................................................... N-343
N19.5 Calibrating a Model ..................................................................................................... N-345

N20: Choice Sets and Utility Functions.......................................................................... N-347

N20.1 Introduction ................................................................................................................. N-347
N20.2 Choice Sets .................................................................................................................. N-347
N20.2.1 Fixed and Variable Numbers of Choices ..................................................... N-349
N20.2.2 Restricting the Choice Set ........................................................................... N-351
N20.2.3 A Shorthand for Choice Sets ....................................................................... N-353
N20.2.4 Large Choice Sets A Panel Data Equivalence .......................................... N-353
N20.3 Specifying the Utility Functions with Rhs and Rh2 .................................................... N-355
N20.3.1 Utility Functions .......................................................................................... N-356
N20.3.2 Generic Coefficients .................................................................................... N-356
N20.3.3 Alternative Specific Constants and Interactions with Constants ................... N-357
N20.3.4 Command Builders ...................................................................................... N-359
N20.4 Building the Utility Functions ..................................................................................... N-360
N20.4.1 Notations for Sets of Utility Functions ........................................................ N-362
N20.4.2 Alternative Specific Constants and Interactions .......................................... N-363
N20.4.3 Logs and the Box Cox Transformation ....................................................... N-365
N20.4.4 Equality Constraints..................................................................................... N-366
N20.5 Starting and Fixed Values for Parameters ................................................................... N-367
N20.5.1 Fixed Values ................................................................................................ N-368
NLOGIT 5 Table of Contents xiii

N20.5.2 Starting Values and Fixed Values from a Previous Model .......................... N-368

N21: Post Estimation Results for Conditional Logit Models......................................... N-369

N21.1 Introduction ................................................................................................................. N-369
N21.2 Partial Effects and Elasticities ..................................................................................... N-369
N21.2.1 Elasticities.................................................................................................... N-371
N21.2.2 Influential Observations and Probability Weights ....................................... N-373
N21.2.3 Saving Elasticities in the Data Set ............................................................... N-374
N21.2.4 Computing Partial Effects at Data Means.................................................... N-376
N21.2.5 Exporting Results in a Spreadsheet ............................................................. N-378
N21.3 Predicted Probabilities and Logsums (Inclusive Values) ............................................... N-380
N21.3.1 Fitted Probabilities ....................................................................................... N-380
N21.3.2 Computing and Listing Model Probabilities................................................ N-381
N21.3.3 Utilities and Inclusive Values ...................................................................... N-382
N21.3.4 Fitted Values of the Choice Variable........................................................... N-383
N21.4 Specification Tests of IIA and Hypothesis .................................................................. N-384
N21.4.1 Hausman-McFadden Test of the IIA Assumption ....................................... N-384
N21.4.2 Small-Hsiao Likelihood Ratio Test of IIA .................................................. N-387
N21.4.3 Lagrange Multiplier, Wald, and Likelihood Ratio Tests ............................. N-389

N22: Simulating Probabilities in Discrete Choice Models............................................. N-391

N22.1 Introduction ................................................................................................................. N-391
N22.2 Essential Subcommands .............................................................................................. N-392
N22.3 Multiple Attribute Specifications and Scenarios ......................................................... N-393
N22.4 Simulation Commands................................................................................................. N-394
N22.4.1 Observations Used for the Simulations ....................................................... N-394
N22.4.2 Variables Used for the Simulations ............................................................. N-394
N22.4.3 Choices Simulated ....................................................................................... N-394
N22.4.4 Other NLOGIT Options ............................................................................... N-394
N22.4.5 Observations Used for the Simulations ....................................................... N-394
N22.5 Arc Elasticities ............................................................................................................. N-395
N22.6 Applications ................................................................................................................. N-395
N22.7 A Case Study ............................................................................................................... N-401
N22.7.1 Base Model Multinomial Logit (MNL) .................................................... N-402
N22.7.2 Scenarios...................................................................................................... N-404

N23: The Multinomial Logit and Random Regret Models.............................................. N-412

N23.1 Introduction ................................................................................................................. N-412
N23.2 Command for the Multinomial Logit Model ............................................................... N-413
N23.3 Results for the Multinomial Logit Model .................................................................... N-415
N23.4 Application .................................................................................................................. N-415
N23.5 Partial Effects............................................................................................................... N-420
N23.6 Technical Details on Maximum Likelihood Estimation ................................................ N-422
N23.7 Random Regret Model................................................................................................. N-424
N23.7.1 Commands for Random Regret ................................................................... N-424
N23.7.2 Application .................................................................................................. N-425
N23.7.3 Technical Details: Random Regret Elasticities ........................................... N-427
NLOGIT 5 Table of Contents xiv

N24: The Scaled Multinomial Logit Model ...................................................................... N-429

N24.1 Introduction ................................................................................................................. N-429
N24.2 Command for the Scaled MNL Model ........................................................................ N-430
N24.3 Application .................................................................................................................. N-430
N24.4 Technical Details ......................................................................................................... N-433

N25: Latent Class and 2K Multinomial Logit Model........................................................ N-434

N25.1 Introduction ................................................................................................................. N-434
N25.2 Model Command ......................................................................................................... N-435
N25.3 Individual Specific Results .......................................................................................... N-436
N25.4 Constraining the Model Parameters............................................................................. N-437
N25.5 An Application............................................................................................................. N-440
N25.6 The 2K Model ............................................................................................................... N-442
N25.7 Individual Results ........................................................................................................ N-445
N25.7.1 Parameters ................................................................................................... N-446
N25.7.2 Willingness to Pay ....................................................................................... N-446
N25.7.3 Elasticities.................................................................................................... N-448
N25.8 Technical Details ......................................................................................................... N-449

N26: Heteroscedastic Extreme Value Model .................................................................. N-451

N26.1 Introduction ................................................................................................................. N-451
N26.2 Command for the HEV Model..................................................................................... N-452
N26.3 Application .................................................................................................................. N-454
N26.4 Constraining the Precision Parameters ........................................................................ N-456
N26.5 Individual Heterogeneity in the Variances .................................................................. N-461
N26.6 Technical Details ......................................................................................................... N-463

N27: Multinomial Probit Model ........................................................................................ N-465

N27.1 Introduction ................................................................................................................. N-465
N27.2 Model Command ......................................................................................................... N-466
N27.3 An Application............................................................................................................. N-467
N27.4 Modifying the Covariance Structure............................................................................ N-469
N27.4.1 Specifying the Standard Deviations............................................................. N-470
N27.4.2 Specifying the Correlation Matrix ............................................................... N-472
N27.5 Testing IIA with a Multinomial Probit Model ............................................................. N-475
N27.6 A Model of Covariance Heterogeneity ........................................................................ N-476
N27.7 Panel Data The Multinomial Multiperiod Probit Model ............................................ N-476
N27.8 Technical Details ......................................................................................................... N-477
N27.9 Multivariate Normal Probabilities ............................................................................... N-478

N28: Nested Logit and Covariance Heterogeneity Models ........................................... N-480

N28.1 Introduction ................................................................................................................. N-480
N28.2 Mathematical Specification of the Model.................................................................... N-481
N28.3 Commands for FIML Estimation................................................................................. N-483
N28.3.1 Data Setup.................................................................................................... N-483
N28.3.2 Tree Definition ............................................................................................ N-483
N28.3.3 Utility Functions .......................................................................................... N-485
NLOGIT 5 Table of Contents xv

N28.3.4 Setting and Constraining Inclusive Value Parameters................................. N-486

N28.3.5 Starting Values ............................................................................................ N-487
N28.3.6 Command Builder........................................................................................ N-489
N28.4 Partial Effects and Elasticities ..................................................................................... N-491
N28.5 Inclusive Values, Utilities, and Probabilities ............................................................... N-493
N28.6 Application of a Nested Logit Model .......................................................................... N-494
N28.7 Alternative Normalizations .......................................................................................... N-498
N28.7.1 Nondegenerate Cases ................................................................................... N-501
N28.7.2 Degenerate Cases ......................................................................................... N-504
N28.8 Technical Details ......................................................................................................... N-506
N28.9 Sequential (Two Step) Estimation of Nested Logit Models ........................................ N-508
N28.10 Combining Data Sets and Scaling in Discrete Choice Models .................................. N-511
N28.10.1 Joint Estimation ......................................................................................... N-512
N28.10.2 Sequential Estimation ................................................................................ N-514
N28.11 A Model of Covariance Heterogeneity ...................................................................... N-515
N28.12 The Generalized Nested Logit Model........................................................................ N-517
N28.13 Box-Cox Nested Logit Model ................................................................................... N-520

N29: Random Parameters Logit Model........................................................................... N-523

N29.1 Introduction ................................................................................................................. N-523
N29.2 Random Parameters (Mixed) Logit Models ................................................................ N-524
N29.3 Command for the Random Parameters Logit Models ................................................. N-528
N29.3.1 Distributions of Random Parameters in the Model ..................................... N-529
N29.3.2 Spreads, Scaling Parameters and Standard Deviations ................................ N-532
N29.3.3 Alternative Specific Constants .................................................................... N-536
N29.3.4 Heterogeneity in the Means of the Random Parameters.............................. N-537
N29.3.5 Fixed Coefficients........................................................................................ N-538
N29.3.6 Correlated Parameters.................................................................................. N-538
N29.3.7 Restricted Standard Deviations and Hierarchical Logit Models.................. N-541
N29.3.8 Special Forms of Random Parameter Specifications................................... N-543
N29.3.9 Other Optional Specifications...................................................................... N-547
N29.4 Heteroscedasticity and Heterogeneity in the Variances ................................................ N-548
N29.5 Error Components........................................................................................................ N-549
N29.6 Controlling the Simulations ......................................................................................... N-552
N29.6.1 Number and Initiation of the Random Draws.............................................. N-552
N29.6.2 Halton Draws and Random Draws for Simulations..................................... N-552
N29.7 Model Estimates .......................................................................................................... N-553
N29.8 Individual Specific Estimates ...................................................................................... N-556
N29.8.1 Computing Individual Specific Parameter Estimates .................................. N-557
N29.8.2 Examining the Distribution of the Parameters............................................. N-562
N29.8.3 Conditional Confidence Intervals for Parameters ........................................ N-567
N29.8.4 Willingness to Pay Estimates....................................................................... N-568
N29.9 Applications ................................................................................................................. N-570
N29.10 Panel Data .................................................................................................................. N-572
N29.10.1 Random Effects Model .............................................................................. N-573
N29.10.2 Error Components Model .......................................................................... N-575
N29.10.3 Autoregression Model ............................................................................... N-576
NLOGIT 5 Table of Contents xvi

N29.11 Technical Details ....................................................................................................... N-578

N29.11.1 The Simulated Log Likelihood .................................................................. N-578
N29.11.2 Random Draws for the Simulations ........................................................... N-579
N29.11.3 Halton Draws for the Simulations ............................................................. N-580
N29.11.4 Functions and Gradients ............................................................................ N-583
N29.11.5 Hessians ..................................................................................................... N-585
N29.11.6 Panel Data and Autocorrelation ................................................................. N-586

N30: Error Components Multinomial Logit Model ......................................................... N-587

N30.1 Introduction ................................................................................................................. N-587
N30.2 Command for the Error Components MNL Model ..................................................... N-587
N30.3 Heteroscedastic Error Components ............................................................................. N-589
N30.4 General Form of the Error Components Model ........................................................... N-590
N30.5 Results for the Error Components MNL Model .......................................................... N-591
N30.6 Application .................................................................................................................. N-594
N30.7 Technical Details on Maximum Likelihood Estimation ................................................ N-595

N31: Nonlinear Random Parameters Logit Model ......................................................... N-597

N31.1 Introduction ................................................................................................................. N-597
N31.2 Model Command for Nonlinear RP Models ................................................................ N-597
N31.2.1 Parameter Definition.................................................................................... N-598
N31.2.2 Nonlinear Components ................................................................................ N-598
N31.2.3 Utility Functions .......................................................................................... N-599
N31.2.4 The Error Components Model ..................................................................... N-599
N31.2.5 Scaling function, i The Scaled Nonlinear RP Model .............................. N-599
N31.2.6 Panel Data .................................................................................................... N-600
N31.2.7 Ignored Attributes ........................................................................................ N-600
N31.3 Results ......................................................................................................................... N-600
N31.3.1 Individual Specific Parameters .................................................................... N-601
N31.3.2 Willingness to Pay ....................................................................................... N-601
N31.4 Application .................................................................................................................. N-602
N31.4.1 Elasticities and Partial Effects ..................................................................... N-607
N31.4.2 Variables Saved in the Data Set................................................................... N-608
N31.5 Technical Details ......................................................................................................... N-608

N32: Latent Class Random Parameters Model .............................................................. N-610

N32.1 Introduction ................................................................................................................. N-610
N32.2 Command..................................................................................................................... N-610
N32.2.1 Output Options ............................................................................................ N-611
N32.2.2 Post Estimation ............................................................................................ N-611
N32.3 Applications ................................................................................................................. N-612
N32.4 Technical Details ......................................................................................................... N-620

N33: Generalized Mixed Logit Model .............................................................................. N-623

N33.1 Introduction ................................................................................................................. N-623
N33.2 Commands ................................................................................................................... N-624
N33.2.1 Controlling the GMXLOGIT Parameters .................................................... N-625
NLOGIT 5 Table of Contents xvii

N33.2.2 The Scaled MNL Model .............................................................................. N-626

N33.2.3 Alternative Specific Constants .................................................................... N-626
N33.2.4 Heteroscedasticity........................................................................................ N-626
N33.3 Estimation in Willingness to Pay Space ...................................................................... N-627
N33.4 Results ......................................................................................................................... N-629

N34: Diagnostics and Error Messages ........................................................................... N-632

N34.1 Introduction ................................................................................................................. N-632
N34.2 Discrete Choice (CLOGIT) and NLOGIT ................................................................... N-633

NLOGIT 5 References ...................................................................................................... N-641

NLOGIT 5 Index ................................................................................................................ N-645

Whats New in Version 5? N-1

Whats New in Version 5?

NLOGIT 5 takes advantage of all the new features developed in LIMDEP 10. The main
update specifically in NLOGIT 5 is the large number of new models that we have added. These are
several major expansions of the modeling capability of the program, such as the new generalized
mixed logit model and nonlinear random parameters logit model. We have also continued to add
enhancements to give you greater flexibility in analyzing data and organizing results. We have
added dozens of features in NLOGIT 5, some clearly visible ones such as the new models and some
behind the scenes that will smooth the operation and help to stabilize the estimation programs. The
following will summarize the important new developments.

WN1 The NLOGIT 5 Reference Guide

Users of earlier versions of NLOGIT will see that we have reworked the NLOGIT manual.
The new electronic format will make it much simpler to navigate the manual and find specific topics
of interest, and, of course, will make the documentation much more portable. As in Version 4, we
have included in this manual documentation of the foundational discrete choice models described in
greater detail in the LIMDEP Econometric Modeling Guide, including binary choice and ordered
choice models. These are presented here to develop a complete picture of the use of NLOGIT to
analyze data on discrete choices. Second, we have included extensive explanatory text and dozens of
new examples, with applications, for every technique and model presented. The number of chapters
in the model has increased from 19 to 34 to accommodate the new models, to organize specific
topics more compactly and to make it easier for you to find the documentation you are looking for.

WN2 New Multinomial Choice Models

We have added several major model classes to the package. Some of these are extensions of
the random parameters models that are at the forefront of current practice. We have also extended
the latent class model in two major directions.

WN2.1 New Scaled and Generalized Mixed Logit Models

The base case multinomial logit model departs from a model with linear utility functions and
fixed (nonrandom) coefficients;

Uij = xij + ij

with familiar assumptions about the random components of the random utility functions. The scaled
multinomial logit model builds overall scaling heterogeneity into the MNL model, with

i = i
Whats New in Version 5? N-2

where i is randomly distributed across individuals. The base case random parameters (mixed) logit
model departs from the parameter specification,

i = + zi + wi.

The generalized mixed logit model combines the specification of the scaled MNL with an allocation
parameter, , that distributes two sources of random variation, scale heterogeneity in i and
preference heterogeneity in wi. The encompassing formulation in the generalized mixed logit
model is
i = i[ + zi] + [ + i(1-)]wi.

The scaled MNL as well as several other interesting specifications are special cases of the
generalized mixed logit model.

WN2.2 Estimation in Willingness to Pay Space

Estimation of willingness to pay values is a standard exercise in choice modeling. Recent
research has motivated a search for formulations that allow researchers to avoid using ratios of
coefficients that have dubious statistical properties. One promising approach that is built into our
formulation of the generalized mixed logit model is to transform the model parameters so that
estimation takes place in willingness to pay space, rather than in preference. By this device,
willingness to pay values are estimated directly as the coefficients in the transformed model.

WN2.3 Random Regret Logit Model

The use of random utility maximization as the fundamental platform for choice modeling has
long been the standard approach. Random regret minimization suggests a useful alternative criterion
whereby the individual makes a choice based on avoiding the disutility that results from making
alternative choices that might be less or more attractive. This formulation presents an alternative to
the IIA formulation of the multinomial logit, random utility model.

WN2.4 Latent Class Models

Two new types of latent class models are provided. The first is a random parameters latent
class model. Both features are present in the same model. The central result is that there is a random
parameters model that characterizes each of the latent classes.
The second extended latent class model extends the -888, ignored attributes feature to latent
classes. Up to 32 different classes we have raised the maximum number of classes from 9 to 32
are defined to accommodate the possible patterns of deliberately missing values in the data set.
Whats New in Version 5? N-3

WN2.5 Error Components Logit Model

The multinomial logit model,

exp ( x jit )
Prob(yit = j|E1i,E2i,...) = ,
exp ( x qit )
Ji
q=1

has served as the basic platform for discrete choice modeling for decades. Among its restrictive
features is its inability to capture individual choice specific variation due to unobserved factors. The
error components logit model,

exp ( x jit + j E ji )
Prob(yit = j|E1i,E2i,...) = ,
exp ( x qit + q Eqi )
Ji
q =1

has emerged as a form that allows this. In a repeated choice (panel data) situation, this will play the
role of a type of random effects model.

WN2.6 Nonlinear Random Parameters Logit Model

The nonlinear random parameters logit model expands the range of random parameters
models of the form
i = i[ + zi + wi].

(This is a generalized mixed logit model with = 1.) Parameters may enter the utility functions
nonlinearly. The model also encompasses the error components specification, producing

exp (V j [ i , x jit ] + j E ji )
Prob(yit = j|E1i,E2i,...) = ,
exp (V j [ i , x jit ] + j E ji )
Ji
q =1

where Vj[ i,xjit] is an arbitrary nonlinear function that you define.

WN2.7 Box-Cox Nested Logit Model

The nested logit model is extended to allow an automated handling of the Box-Cox
transformation of the attributes. This provides some elements of a nonlinear utility function model,
though it is much less general than the model in the previous section.
Whats New in Version 5? N-4

WN.3 Model Extensions

In addition to the new model frameworks and many new features built into LIMDEP, we
have added some extensions to the multinomial choice models. As noted, some of these are rather
behind the scenes. For example, we have expanded the limit on model sizes from 100 to 500 choices
and from 150 to 300 model parameters.

WN3.1 General -888 format

The -888 feature that allows you to accommodate deliberately ignored attributes has been
extended so that it is now available in all models.

WN3.2 Mixed Logit Models

Numerous specifications have been added to build realistic, plausible parameter
distributions. For example, the Weibull and triangular distributions provide useful alternatives to the
lognormal for imposing sign constraints on coefficients. There are now 20 different stochastic
specifications for the random parameters in a mixed logit model. We have also built optional
specifications into the definitions of the random parameters to allow variation in the characteristics
that appear in the means and standard deviations of different distributions.

WN3.3 Elasticities and Partial Effects

The formatting of results for elasticities has been completely revised. We have also added a
feature to allow you to export tables of elasticities directly to any version of Excel.

WN3.4 Robust Covariance Matrix

The cluster estimator for clustered data sets that has been built into the other estimators in
LIMDEP has now been added to the models in NLOGIT. The cluster estimator is a correction to the
standard errors of an estimator for assumed panel data effects.
N1: Introduction to NLOGIT Version 5 N-5

N1: Introduction to NLOGIT Version 5

N1.1 Introduction
NLOGIT is a package of programs for analyzing data on multinomial choice. The program,
itself, consists of a special set of estimation and analysis routines, specifically for this class of
models and style of analysis. LIMDEP provides the foundation for NLOGIT, including the full set of
tools used for setting up the data, such as importing data files, transforming variables (e.g.,
CREATE), and so on. NLOGIT is created by adding a set of capabilities to LIMDEP. The notes
below describe this connection in a bit more detail.

N1.2 The NLOGIT Program

NLOGIT adds one (very powerful) command to LIMDEP,

NLOGIT ; specification of choice variable

; specification of choice model behavioral equations
; definition of choice modeling framework (e.g., nested logit)
; other required and optional features $

The NLOGIT command is the gateway to the large set of features that are described in this NLOGIT
Reference Guide. All other features and commands in LIMDEP are provided in the NLOGIT package
as well.
The estimation results produced by NLOGIT look essentially the same as by LIMDEP, but
at various points, there are differences that are characteristic of this type of modeling. For example,
the standard data configuration for NLOGIT looks like a panel data set analyzed elsewhere in
LIMDEP. This has implications for the way, for example, model predictions are handled. These
differences are noted specifically in the descriptions to follow. But, at the same time, the estimation
and post estimation tools provided for LIMDEP, such as matrix algebra and the hypothesis testing
procedures, are all unchanged. That is, NLOGIT is LIMDEP with an additional special command.

N1.3 NLOGIT and LIMDEP Integration and Documentation

NLOGIT 5 is a suite of programs for estimating discrete choice models that are built around
the logit and multinomial logit form. This is a superset of LIMDEPs models NLOGIT 5 is all of
LIMDEP 10 plus the set of tools and estimators described in this guide. LIMDEP 10 contains the
CLOGIT command and the estimator for the conditional logit (or multinomial logit) model.
CLOGIT is the same as the most basic form of the NLOGIT command described in Chapter N19.
The full set of features of LIMDEP 10 is part of this package. We assume that you will use
the other parts of LIMDEP as part of your analysis. To use NLOGIT, you will need to be familiar
with the LIMDEP platform. At various points in your operation of the program, you will encounter
LIMDEP, rather than NLOGIT as the program name, for example in certain menus, dialog boxes,
window headers, diagnostics, and so on. Once again, these result from the fact that in obtaining
NLOGIT, you have installed LIMDEP plus some additional capabilities. If you are uncertain which
program is actually installed on your computer, go to the About box in the main menu. It will
clearly indicate which program you are operating.
N1: Introduction to NLOGIT Version 5 N-6

This NLOGIT Reference Guide provides documentation for some aspects of discrete choice
models in general but is primarily focused on the specialized tools and estimators in NLOGIT 5 that
extend the multinomial logit model. These include, for example, extensions of the multinomial logit
model such as the nested logit, random parameters logit, generalized mixed logit and multinomial
probit models. This guide is primarily oriented to the commands added to LIMDEP that request the
set of discrete choice estimators. However, in order to provide a more complete and useful package,
Chapters N4-N17 in the NLOGIT Reference Guide describe common features of LIMDEP 10 and
NLOGIT 5 that will be integral tools in your analysis of discrete choice data, as shown, for example,
in many of the examples and applications in this manual.
Users will find the LIMDEP documentation, the LIMDEP Reference Guide and the LIMDEP
Econometric Modeling Guide, essential for effective use of this program. It is assumed throughout
that you are already a user of LIMDEP. The NLOGIT Reference Guide, by itself, will not be
sufficient documentation for you to use NLOGIT unless you are already familiar with the program
platform, LIMDEP, on which NLOGIT is placed.
The LIMDEP and NLOGIT documentation use the following format: The LIMDEP
Reference Guide chapter numbers are preceded by the letter R. The LIMDEP Econometric
Modeling Guide chapter numbers are preceded by E, and the NLOGIT Reference Guide chapter
numbers are preceded by N.

N1.4 Discrete Choice Modeling with NLOGIT

NLOGIT is a set of tools for building models of discrete choice among multiple alternatives.
The essential building block that underlies the set of programs is the random utility model of choice,

U(choice 1) = f1 (attributes of choice 1, characteristics of the chooser, 1,v,w)

...
U(choice J) = fJ (attributes of choice J, characteristics of the chooser, J,v,w)

where the functions on the right hand side describe the utility to an individual decision maker of J
possible choices, as functions of the attributes of the choices, the characteristics of the chooser,
random choice specific elements of preferences, j, that may be known to the chooser but are
unobserved by the analyst, and random elements v and w, that will capture the unobservable
heterogeneity across individuals. Finally, a crucial element of the underlying theory is the
assumption of utility maximization,

The choice made is alternative j such that U(choice j) > U(choice q) q j.

The tools provided by NLOGIT are a complete suite of estimators beginning with the simplest binary
logit model for choice between two alternatives and progressing through the most recently developed
models for multiple choices, including random parameters, mixed logit models with individual
specific random effects for repeated observation choice settings and the multinomial probit model.
N1: Introduction to NLOGIT Version 5 N-7

Background theory and applications for the programs described here can be found in many
sources. For a primer that develops the theory for multinomial choice modeling in detail and
presents many examples and applications, all using NLOGIT, we suggest

Hensher, D., Rose, J., and Greene, W., Applied Choice Analysis, Cambridge University
Press, 2005.

A general reference for ordered choice models, also based on NLOGIT is

Greene, W. and Hensher, D., Modeling Ordered Choices, Cambridge University Press,
Cambridge, 2010.

It is not possible (nor desirable) to present all of the necessary econometric methodology in a manual of
this sort. The econometric background needed for Applied Choice Analysis as well as for use of the
tools to be described here can be found in many graduate econometrics books. One popular choice is

Greene, W., Econometric Analysis, 7th Edition, Prentice Hall, Englewood Cliffs, 2011.

N1.5 Types of Discrete Choice Models in NLOGIT

The order and organization of presentations in this manual are partly oriented to the types of
models you will analyze and partly toward the types of data you will use. Chapters N2 and N3
describe discrete choice models including NLOGIT model and command summaries.
In Chapters N4-N15, we develop basic choice models that have occupied a large part of the
econometrics literature for several decades. The situations are essentially those in which the
characteristics of decision makers and the choices that they make form the observational base for the
model building. The fundamental building block for all of these, as well as for the more elaborate
models, is the binary choice model: The structural equations for a model of consumer choice based
on a single alternative either to choose an outcome or not to choose it are

U(choice) = x + ,
Prob(choice) = Prob(U > 0)
= F(x),
Prob(not choice) = 1 - F(x),

where x is a vector of characteristics of the consumer such as age, sex, education, income, and other
sociodemographic variables, is a vector of parameters and F(.) is a suitable function that describes the
model. The choice of vote for a political candidate or party is a natural application. Models for binary
choice are developed at length in Chapters E26-E32 in the LIMDEP Econometric Modeling Guide.
They will be briefly summarized in Chapters N4-N7 to provide the departure point for the models that
follow. Useful extensions of the binary choice model presented in Chapters N8-N12 include models
for more than one simultaneous binary choice (of the same type), including bivariate binary choice
models and simultaneous binary choice models and a model for multivariate binary choices (up to 20).
N1: Introduction to NLOGIT Version 5 N-8

The ordered choice model described in Chapters N13-N15 describe a censoring of the
underlying utility in which consumers are able to provide more information about their preferences.
In the binary choice model, decision makers reveal through their decisions that the utility from
making the choice being modeled is greater than the utility of not making that choice. In the ordered
choice case, consumers can reveal more about their preferences we obtain a discretized version of
their underlying utility. Thus, in survey data, voters might reveal their strength of preferences for a
candidate or a food or drink product, from zero (strongly disapprove), one (somewhat disapprove) to,
say, four (strongly approve).
The appropriate model might be

Prob(strongly dislike) = Prob(U < 0),

Prob(dislike) = Prob(0 < U < 1),
Prob(indifferent) = Prob(1 < U < 2),
and so on.

We can also build extensions of the ordered choice model, such as a bivariate ordered choice model
for two simultaneous choices and a sample selection model for nonrandomly selected samples.
The multinomial logit (MNL) model described in Chapters N16 and N17 is the original
formulation of this model for the situations in which, as in the binary choice and ordered choice
models already considered, we observe characteristics of the individual and the choices that they
make. The classic applications are the Nerlove and Press (1973) and Schmidt and Strauss (1976)
studies of labor markets and occupational choice. The model structure appears as follows:

exp ( j xi )
Prob[yi = j] = .
exp ( q xi )
Ji
q=1

Note the signature feature, that the determinants of the outcome probability are the individual
characteristics. This model represents a straightforward special case of the more general forms of
the multinomial choice model described in Chapters N16 and N17 and in the extensions that follow
in Chapters N23-N33.
Chapters N18-N22 document general aspects of operating NLOGIT. Chapter N18 describes
the way that your data will be arranged for estimation of multinomial discrete choice models.
Chapter N19 presents an overview of the command structure for NLOGIT models. The commands
differ somewhat from one model to another, but there are many common elements that are needed to
set up the essential modeling framework. Chapter N20 describes choice sets and utility functions.
Chapter N21 describes results that are computed for the multinomial choice models beyond the
coefficients and standard errors. Finally, Chapter N22 describes the model simulator. You will use
this tool after fitting a model to analyze the effects of changes in the attributes of choices on the
aggregate choices made by individuals in the sample.
N1: Introduction to NLOGIT Version 5 N-9

The models developed in Chapters N23-N33 extend the binary choice case to situations in
which decision makers choose among multiple alternatives. These settings involve richer data sets in
which the attributes of the alternatives are also part of the observation, and more elaborate models of
behavior. The broad modeling framework is the multinomial logit model. With a particular
specification of the utility functions and distributions of the unobservable random components, we
obtain the canonical form of the logit model,

exp ( xij )
Prob[yi = j] = ,
exp ( xiq )
Ji
q=1

where yi is the index of the choice made. This is the basic, core model of the set of estimators in
NLOGIT. (This is the model described in Chapters N16 and N17.)
The basic setup for this model consists of observations on N individuals, each of whom
makes a single choice among Ji choices, or alternatives. There is a subscript on J because we do not
restrict the choice sets to have the same number of choices for every individual. The data will
typically consist of the choices and observations on K attributes for each choice. The attributes that
describe each choice, i.e., the arguments that enter the utility functions, may be the same for all
choices, or may be defined differently for each utility function. It is also possible to incorporate
characteristics of the individual which do not vary across choices in the utility functions. The
estimators described in this manual allow a large number of variations of this basic model.
In the discrete choice framework, the observed dependent variable usually consists of an
indicator of which among Ji alternatives was most preferred by the respondent. All that is known
about the others is that they were judged inferior to the one chosen. But, there are cases in which
information is more complete and consists of a subjective ranking of all Ji alternatives by the
individual. NLOGIT allows specification of the model for estimation with ranks data. In addition,
in some settings, the sample data might consist of aggregates for the choices, such as proportions
(market shares) or frequency counts. NLOGIT will accommodate these cases as well.
The multinomial model has provided a mainstay of empirical research in this literature for
decades. But, it does have limitations, notably the assumption of independence from irrelevant
alternatives, which limit its generality. Recent research has produced many new, different
formulations that have broadened the model. NLOGIT contains most of these, all of which remove
the crucial IIA assumption of the multinomial logit (MNL) model. Chapters N23-N33 describe these
frontier extensions of the multinomial logit model. In brief, these are as follows:

N1.5.1 Random Regret Logit Model

The random regret logit model is a variant of the basic conditional logit model. The form of
the utility functions involves more direct comparisons of the attributes of the alternatives. Whereas
in the essential MNL model, the utility functions enter the probability linearly in terms of the
attributes, so the coefficients are marginal utilities, in the random regret model, the attributes enter
the probabilities through the regret functions,

Rij (m) = log[1 + exp(m ( x jm xim ))]

which compare attribute m in alternative j to that attribute in alternative i.

N1: Introduction to NLOGIT Version 5 N-10

N1.5.2 Scaled Multinomial Logit Model

The scaled multinomial logit model accommodates individual heterogeneity in choice
structures through the scaling of the marginal utilities rather than in the location parameters. The
coefficients in the scaled MNL model take the form

i = i
where i = exp(zi + vi).

This is a type of random parameters model; the scale parameter can vary systematically with the
observables, zi and randomly across individuals with vi.

N1.5.3 Latent Class and Random Parameters LC Models

The latent class model is a semiparametric approximation to the random parameters
multinomial logit model. It embodies many of the features of the RPL model. But, the parameters
are modeled as having a discrete distribution with a small number of support points. An alternative
interpretation is that individuals are intrinsically sorted into a small number of classes, and
information about class membership is extracted from the sample along with class specific parameter
vectors. The RP variant, which is new with this version of NLOGIT, provides a random parameters
logit model (see Section N1.5.7) in each class.

N1.5.4 Heteroscedastic Extreme Value Model

In the base case, multinomial logit model, the assumption of equal variances produces great
simplicity in the mathematical results, but at considerable cost in the generality of the model. In
particular, if the assumption of equal variances is inappropriate, then the different scaling that is
present in the variances will, instead, be forced on the coefficients in the utility functions, in ways
that may distort the predictions of the model. The heteroscedastic extreme value model relaxes this
assumption by allowing the disturbances in the utility functions each to have their own variance. An
extension of this model allows these unequal variances to be dependent on characteristics of the
individual as well. Thus, the heteroscedasticity assumption allows us to relax the assumption of
equal variances across choices and to incorporate individual heterogeneity in the scaling as well as
the locations of the utility functions.

N1.5.5 Multinomial Probit Model

This model is much more general than the multinomial logit model, but until recently was
largely impractical because of the multinormal integrals required for estimation. We include an
implementation based on the GHK simulation method. The multinomial probit (MNP) model
relaxes the assumptions of the MNL model by assuming joint normality for the random terms in the
utility functions and by allowing (subject to some identification restrictions) the random terms to
have different variances and unrestricted correlations.
N1: Introduction to NLOGIT Version 5 N-11

N1.5.6 Nested Logit Models

The choice among alternatives could be viewed as taking place at more than one level. For
instance, in an application developed in the chapters to follow, we consider transportation mode
choice among four alternatives, car, train, bus, and air. One might view the choice among these
four as first between public (bus, train) and private (air, car) transportation and then, within each of
the two branches of the tree,, a second choice of specific mode. This sort of hierarchical choice is
handled in the setting of nested logit models. NLOGIT allows tree structures to have up to four
levels. There are also several specific forms of the nested logit model that enforce the implications
of utility maximization on the model parameters.
The nested logit (NL) model described in the previous paragraph is appropriately viewed as
a relaxation of the strong IID structure of the multinomial logit model that implies the IIA
assumption. In particular, the nested logit model allows for different variances for the groups of
alternatives in the branches and for (equal) correlation across the alternatives within a branch. (The
earlier interpretation of a decision structure is only superimposed on the nested logit model; it is not
the statistical basis of the NL model. The decision part of the model rests at the lowest level,
among the alternatives.) The covariance heterogeneity model extends this model a bit further by
allowing the variances to depend on variables in the model. The covariance heterogeneity model is a
model of heteroscedasticity.
One of the weaker parts of the nested logit specification is the narrow specific assumption of
which alternative appears in each branch of the tree. This is often not known with certainty. The
generalized nested logit model allows alternatives to appear in more than one branch, in a
probabilistic fashion.

N1.5.7 Random Parameters and Nonlinear RP Logit Model

This is the most general model contained in NLOGIT. As argued by McFadden and Train
(2000), it may be the most flexible form of discrete choice model available generally, as they argue
that any behavior pattern can be captured by this model form. The random parameters logit (RPL)
model extends the MNL model by allowing its parameters to be random across individuals. The
random parameters may have their own data dependent means, their own variances, and may be
correlated. By this device, we obtain an extremely general, flexible model. The assumptions about
the covariance matrix of the random parameters are transmitted to the random terms in the utility
functions so that both the uncorrelatedness and equal variance assumptions are relaxed in the
process. This model also allows a panel data treatment, with either random effects or an
autoregressive pattern in the random terms. The error components logit model provides a method of
incorporating a rich structure of individual specific random effects in the conditional logit and
random parameters models. The nonlinear RP variant allows the utility functions in the probability
model to be arbitrary nonlinear functions of the data and parameters.
N1: Introduction to NLOGIT Version 5 N-12

N1.5.8 Error Components Logit Model

The error components logit model is essentially a random effects model for the MNL
framework. The basic model structure for a repeated choice (panel data) setting would be

exp ( x ji + sM=1d js vis )

Prob[yit = j| vi1,...,viM) = ,
exp ( xiq + sM=1d qs vis )
Ji
q =1

where vi1,...,viM are M individual effects that appear in the Ji utility functions and djs are binary
variables that place specific effects in the different alternatives. Different sets of effects, or only
particular ones, appear in each utility function, which allows a nested type of arrangement.

N1.5.9 Generalized Mixed Logit Model

The generalized mixed logit model is an encompassing model for many of the specifications
already noted, and a variety of new specifications as well. The model follows the random
parameters model of Section N1.5.7, but adds several layers to the specification of the random
parameters. Specifically,

i = i + [ + (1 - )i]vi,

where i is the heterogeneous scale factor noted in Section N1.5.2, is a distribution parameter that
moves emphasis to or away from the random part of the model, is (essentially) the correlation
matrix among the random parameters. As noted, several earlier specifications are special cases.
This form of the RP model allows a number of useful extensions, including estimation of the
model in willingness to pay (WTP) space, rather than utility space.

N1.6 Functions of NLOGIT

The chapters to follow will describe the different features of NLOGIT and the various
models it will estimate. The functionality of the program consists of these major features:

Estimation programs. These are full information maximum likelihood estimators for the
collection of models.

Description and analysis. Model results are used to compute elasticities, marginal effects,
and other descriptive measures.

Hypothesis testing, including the IIA assumption and tests of model specification.

Computation of probabilities, utility functions, and inclusive values for individuals in the
sample.

Simulation of the model to predict the effects of changes in the values of attributes in the
aggregate behavior of the individuals in the sample. For example, if x% of the sampled
individuals choose a particular alternative, how would x change if a certain price in the
model were assumed to be p% higher for all individuals?
N2: Discrete Choice Models N-13

N2: Discrete Choice Models

N2.1 Introduction
This chapter will provide a short, thumbnail sketch of the discrete choice models discussed
in this manual. NLOGIT supports a large array of models for both discrete and continuous variables,
including regression models, survival models, models for counts and, of relevance to this setting,
models for discrete outcomes. The group of models described in this manual are those that arise
naturally from a random utility framework, that is, those that arise from an individual choice setting
in which the model is of an individuals selection among two or more alternatives. This includes
several of the models described in the LIMDEP manual, such as the binary logit and probit models,
but also excludes some others, including the models for count data and censored and truncated
regression models, and some of the loglinear models such as the geometric regression model.
Two groups of models are considered. The first set are the binary, ordered and multivariate
choice models that are documented at length in Chapters E26-E35 in the LIMDEP Econometric
Modeling Guide. These form the basic building blocks for the NLOGIT extensions that are the main
focus of this part of the program. Since they are developed in detail elsewhere, we will only provide
the basic forms and only the essential documentation here. The second group of estimators are the
multinomial logit models and extensions of them that form the group of tools specific to NLOGIT.

N2.2 Random Utility Models

The random utility framework starts with a structural model,

U(choice 1) = f1 (attributes of choice 1, characteristics of the consumer, 1,v,w),

...
U(choice J) = fJ (attributes of choice J, characteristics of the consumer, J,v,w),

where 1,...,J denote the random elements of the random utility functions and in our later treatments,
v and w will represent the unobserved individual heterogeneity built into models such as the error
components and random parameters (mixed logit) models. The assumption that the choice made is
alternative j such that

U(choice j) > U(choice q) q j.

The observed outcome variable is then

y = the index of the observed choice.

The econometric model that describes the determination of y is then built around the assumptions
about the random elements in the utility functions that endow the model with its stochastic
characteristics. Thus, where Y is the random variable that will be the observed discrete outcome,

Prob(Y = j) = Prob(U(choice j) > U(choice q) q j).

N2: Discrete Choice Models N-14

The objects of estimation will be the parameters that are built into the utility functions including
possibly those of the distributions of the random components and, with estimates of the parameters
in hand, useful characteristics of consumer behavior that can be derived from the model, such as
partial effects and measures of aggregate behavior.
To consider the simplest example, that will provide the starting point for our development,
consider a consumers random utility derived over a single choice situation, say whether to make a
purchase. The two outcomes are make the purchase and do not make the purchase. The random
utility model is simply

U(not purchase) = 0x0 + 0,

U(purchase) = 1x1 + 1.

Assuming that 0 and 1 are random, the probability that the analyst will observe a purchase is

Prob(purchase) = Prob(U(purchase) > U(not purchase))

= Prob( 1x1 + 1 > 0x0 + 0)
= Prob(1 - 0 < 1x1 - 0x0)
= F( 1x1 - 0x0),

where F(z) is the CDF of the random variable 1 - 0. The model is completed and an estimator,
generally maximum likelihood, is implied by an assumption about this probability distribution. For
example, if 0 and 1 are assumed to be normally distributed, then the difference is also, and the
familiar probit model emerges. (The probit model is developed in Chapters E26 and E27.)
The sections to follow will outline the models described in this manual in the context of this
random utility model. The different models derive from different assumptions about the utility
functions and the distributions of their random components.

N2.3 Binary Choice Models

Continuing the example in the previous section, the choice of alternative 1 (purchase)
reveals that U1 > U0, or that

0 - 1 < 1x1 - 0x0.

Let = 1 - 0 and x represent the difference on the right hand side of the inequality x is the union
of the two sets of covariates, and is constructed from the two parameter vectors with zeros in the
appropriate locations if necessary. Then, a binary choice model applies to the probability that
x, which is the familiar sort of model developed in Chapter E26. Two of the parametric model
formulations in NLOGIT for binary choice models are the probit model based on the normal
distribution:
'x i exp(t 2 / 2)
F = 2
dt = (xi),
N2: Discrete Choice Models N-15

and the logit model based on the logistic distribution

exp(xi )
F = = (xi).
1 + exp(xi )

Numerous variations on the model can be obtained. A model with multiplicative

heteroscedasticity is obtained with the additional assumption

i ~ normal or logistic with variance [exp(zi)]2,

where zi is a set of observed characteristics of the individual. A model of sample selection can be
extended to the probit and logit binary choice models. In both cases, we depart from

Prob(yi = 1 |xi) = F(xi),

where F(t) = (t) for the probit model and (t) for the logit model,
di* = zi + ui, ui ~ N[0,1], di = 1(di* > 0),
yi, xi observed only when di = 1.

where zi is a set of observed characteristics of the individual. In both cases, as stated, there is no
obvious way that the selection mechanism impacts the binary choice model of interest. We modify
the models as follows: For the probit model,

yi* = xi + i, i ~ N[0,1], yi = 1(yi* > 0),

which is the structure underlying the probit model in any event, and

ui, i ~ N2[(0,0),(1,,1)].

(We use NP to denote the P-variate normal distribution, with the mean vector followed by the
definition of the covariance matrix in the succeeding brackets.) For the logit model, a similar
approach does not produce a convenient bivariate model. The probability is changed to

exp(xi + i )
Prob(yi = 1 | xi,i) = .
1 + exp(xi + i )

With the selection model for zi as stated above, the bivariate probability for yi and zi is a mixture of a
logit and a probit model. The log likelihood can be obtained, but it is not in closed form, and must
be computed by approximation. We do so with simulation. The model and the background results
are presented in Chapter E27.
N2: Discrete Choice Models N-16

There are several formulations for extensions of the binary choice models to panel data
setting. These include

Fixed effects: Prob(yit = 1) = F(xit + i),

i correlated with xit.

Random effects: Prob(yit = 1) = Prob(xit + it + ui > 0),

ui uncorrelated with xit.

Random parameters: Prob(yit = 1) = F( ixit),

i | i ~ h(|i) with mean vector and covariance matrix .

Latent class: Prob(yit = 1|class j) = F( jxit),

Prob(class = j) = Gj(,zi),

where zi is a set of observed characteristics of the individual. Other variations include simultaneous
equations models and semiparametric formulations.

N2.4 Bivariate and Multivariate Binary Choice Models

The bivariate probit model is a natural extension of the model above in which two decisions
are made jointly;
yi1* = 1xi1 + i1, yi1 = 1 if yi1* > 0, yi1 = 0 otherwise,
yi2* = 2xi2 + i2, yi2 = 1 if yi2* > 0, yi2 = 0 otherwise,
[i1,i2] ~ N2[0,0,1,1,], -1 < < 1,
individual observations on y1 and y2 are available for all i.

This model extends the binary choice model to two different, but related outcomes. One might, for
example, model y1 = home ownership (vs. renting) and y2 = automobile purchase (vs. leasing). The
two decisions are obviously correlated (and possibly even jointly determined).
A special case of the bivariate probit model is useful for formulating the correlation between
two binary variables. The tetrachoric correlation coefficient is equivalent to the correlation
coefficient in the following bivariate probit model:

yi1* = + i1, yi1 = 1(yi1* > 0),

yi2* = + i2, yi2 = 1(yi2* > 0),
(i1,i2) ~ N2[(0,0),(1,1,)].

The bivariate probit model has been extended to the random parameters form of the panel data
models. For example, a true random effects model for a bivariate probit outcome can be formulated
as follows: Each equation has its own random effect, and the two are correlated.
N2: Discrete Choice Models N-17

The model structure is

yit1* = 1xit1 + it1 + ui1, yit1 = 1 if yit1* > 0, yit1 = 0 otherwise,

yit2* = 2xit2 + it2 + ui2, yit2 = 1 if yit2* > 0, yit2 = 0 otherwise,
[it1,it2] ~ N2[0,0,1,1,], -1 < < 1,
[ui1 , ui2] ~ N2[0,0,1,1,], -1 < < 1.

Individual observations on yi1 and yi2 are available for all i. Note, in the structure, the idiosyncratic
itj creates the bivariate probit model, whereas the time invariant common effects, uij create the
random effects (random constants) model. Thus, there are two sources of correlation across the
equations, the correlation between the unique disturbances, , and the correlation between the time
invariant disturbances, .
The multivariate probit model is the extension to M equations of the bivariate probit model

yim* = mxim+ im, m = 1,,M

yim = 1 if yim* > 0, and 0 otherwise,
im, m = 1,...,M ~ NM[0,R],

where R is the correlation matrix. Each individual equation is a standard probit model. This
generalizes the bivariate probit model for up to M = 20 equations.

N2.5 Ordered Choice Models

The basic ordered choice model can be cast in an analog to our random utility specification.
We suppose that preferences over a given outcome are reflected as earlier, in the random utility
function:
yi* = xi + i,
i ~ F(i |), = a vector of parameters,
E[i|xi] = 0,
Var[i|xi] = 1.

The consumers are asked to reveal the strength of their preferences over the outcome, but are given
only a discrete, ordinal scale, 0,1,...,J. The observed response represents a complete censoring of the
latent utility as follows:

yi = 0 if yi* 0,
= 1 if 0 < yi* 1,
= 2 if 1 < yi* 2,
...
= J if yi* > J-1.
N2: Discrete Choice Models N-18

The latent preference variable, yi* is not observed. The observed counterpart to yi* is yi. (The
model as stated does embody the strong assumption that the threshold values are the same for all
individuals. We will relax that assumption below.) The ordered probit model based on the normal
distribution was developed by Zavoina and McElvey (1975). It applies in applications such as
surveys, in which the respondent expresses a preference with the above sort of ordinal ranking. The
ordered logit model arises if i is assumed to have a logistic distribution rather than a normal. The
variance of i is assumed to be the standard, one for the probit model and 2/6 for the logit model,
since as long as yi*, , and i are all unobserved, no scaling of the underlying model can be deduced
from the observed data. (The assumption of homoscedasticity is arguably a strong one. We will also
relax that assumption.) Since the s are free parameters, there is no significance to the unit distance
between the set of observed values of yi. They merely provide the coding. Estimates are obtained by
maximum likelihood. The probabilities which enter the log likelihood function are

Prob(yi = j) = Prob(yi* is in the jth range).

The model may be estimated either with individual data, with yi = 0, 1, 2, ... or with grouped data, in
which case each observation consists of a full set of J + 1 proportions, pi0,...,piJ.
There are many variants of the ordered probit model. A model with multiplicative
heteroscedasticity of the same form as in the binary choice models is

Var[i] = [exp(zi)]2.

The following describes an ordered probit counterpart to the standard sample selection model. (This
is only available for the ordered probit specification.) The structural equations are, first, the main
equation, the ordered choice model that was given above and, second, a selection equation, a
univariate probit model,
di* = zi + ui,
di = 1 if di* > 0 and 0 otherwise.

The observation mechanism is

[yi,xi] is observed if and only if di = 1,
i,ui ~ N2[0,0,1,1,]; there is selectivity if is not equal to zero.
The general set of panel data formulations is also available for the ordered probit and logit models.

Fixed effects: Prob(yit = j) = F[j -(xit + i)] - F[j-1-(xit + i)],

i correlated with xit.

Random effects: Prob(yit = j) = F[j -(xit + ui)] - F[j-1-(xit + ui)],

ui uncorrelated with xit.

Random parameters: Prob(yit = j) = F(j - ixit) - F(-j-1 ixit),

i | i ~ h(|i) with mean vector and covariance matrix .

Latent class: Prob(yit = j|class c) = F(j,c - cxit) - F(j-1,c -cxit),

Prob(class = c) = Gc(,zi).
N2: Discrete Choice Models N-19

The hierarchical ordered probit model, or generalized ordered probit model, relaxes the
assumption that the threshold parameters are the same for all individuals. Two forms of the model
are provided.
Form 1: ij = exp(j + zi),
Form 2: ij = exp(j + jzi).

Note that in Form 1, each j has a different constant term, but the same coefficient vector, while in
Form 2, each threshold parameter has its own parameter vector.
Harris and Zhao (2004, 2007) have developed a zero inflated ordered probit (ZIOP)
counterpart to the zero inflated Poisson model. The ZIOP formulation would appear

di* = zi + ui, di = 1 (di* > 0),

yi* = xi + i, yi = 0 if yi* < 0 or di = 0,
1 if 0 < yi* < 1 and di = 1,
2 if 1 < yi* < 2 and di = 1,
and so on.
The first equation is assumed to be a probit model (based on the normal distribution) this estimator
does not support a logit formulation. The correlation between ui and i is , which by default equals
zero, but may be estimated instead. The latent class nature of the formulation has the effect of
inflating the number of observed zeros, even if u and are uncorrelated. The model with correlation
between ui and i is an optional specification that analysts might want to test. The zero inflation
model may also be combined with the hierarchical (generalized) model given above.
The bivariate ordered probit model is analogous to the seemingly unrelated regressions
model for the ordered probit case:
yij* = jxji + ij,
yij = 0 if yij* < 0,
1 if 0 < yij* < 1,
2, ... and so on, j = 1,2,

for a pair of ordered probit models that are linked by Cor(i1,i2) = . The model can be estimated
one equation at a time using the results described earlier. Full efficiency in estimation and an
estimate of are achieved by full information maximum likelihood estimation. Either variable (but
not both) may be binary. (If both are binary, the bivariate probit model should be used.) The
polychoric correlation coefficient is used to quantify the correlation between discrete variables that
are qualitative measures. The standard interpretation is that the discrete variables are discretized
counterparts to underlying quantitative measures. We typically use ordered probit models to analyze
such data. The polychoric correlation measures the correlation between y1 = 0,1,...,J1 and y2 = 0,1,...,J2.
(Note, J1 need not equal J2.) One of the two variables may be binary as well. (If both variables are
binary, we use the tetrachoric correlation coefficient described in Section E33.3.) For the case noted,
the polychoric correlation is the correlation in the bivariate ordered probit model, so it can be
estimated just by specifying a bivariate ordered choice model in which both right hand sides contain
only a constant term.
N2: Discrete Choice Models N-20

N2.6 Multinomial Logit Model

The canonical random utility model suggested by the structure of Section N2.2 is as follows:

U(alternative 0) = 0xi0 + i0,

U(alternative 1) = 1xi1 + i1,
...
U(alternative J) = J xiJ + iJ,
Observed yi = choice j if Ui (alternative j) > Ui (alternative q) q j.

The disturbances in this framework (individual heterogeneity terms) are assumed to be

independently and identically distributed with identical type 1extreme value distribution; the CDF is

F(j) = exp(-exp(-j)).

Based on this specification, the choice probabilities are

Prob(choice j) = Prob(Uj > Uq), q j

exp(j xij )
= , j = 0,...,J.
q=0 exp(q xiq )
J

At this point we make a purely semantic distinction between two cases of the model. When the
observed data consist of individual choices and (only) data on the characteristics of the individual,
identification of the model parameters will require that the parameter vectors differ across the utility
functions, as they do above. The study on labor market decisions by Schmidt and Strauss (1975) is a
classic example. For the moment, we will call this the multinomial logit model. When the data also
include attributes of the choices that differ across the alternatives, then the forms of the utility
functions can change slightly and the coefficients can be generic, that is the same across
alternatives. Again, only for the present, we will call this the conditional logit model. (It will
emerge that the multinomial logit is a special case of the conditional logit model, though the reverse
is not true.) The conditional logit model is defined in Section N2.7.
The general form of the multinomial logit model is

exp(j xi )
Prob(choice j) = , j = 0,...,J.
q=0 exp(q xi )
J

A possible J + 1 unordered outcomes can occur. In order to identify the parameters of the model, we
impose the normalization 0 = 0. This model is typically employed for individual or grouped data in
which the x variables are characteristics of the observed individual(s), not the choices. The data
will appear as follows:

Individual data: yi coded 0, 1, ..., J,

Grouped data: yi0, yi1,...,yiJ give proportions or shares.
N2: Discrete Choice Models N-21

N2.6.1 Random Effects and Common (True) Random Effects

The structural equations of the multinomial logit model are

Uijt = jxit + ijt, t = 1,...,Ti, j = 0,1,...,J,i=1,...,N,

where Uijt gives the utility of choice j by person i in period t we assume a panel data application
with t = 1,...,Ti. The model about to be described can be applied to cross sections, where Ti = 1.
Note also that as usual, we assume that panels may be unbalanced. We also assume that ijt has a
type 1 extreme value distribution and that the J random terms are independent. Finally, we assume
that the individual makes the choice with maximum utility. Under these (IIA inducing) assumptions,
the probability that individual i makes choice j in period t is

exp(j xit )
Pijt = .
exp(q xit )
J
q= 0

We now suppose that individual i has latent, unobserved, time invariant heterogeneity that enters the
utility functions in the form of a random effect, so that

Uijt = jxit + ij + ijt, t = 1,...,Ti, j = 0,1,...,J,i=1,...,N.

The resulting choice probabilities, conditioned on the random effects, are

exp(j xit + ij )
Pijt | i1,...,iJ = .
exp(q xit + iq )
J
q= 0

To complete the model, we assume that the heterogeneity is normally distributed with zero means
and (J+1)(J+1) covariance matrix, . For identification purposes, one of the coefficient vectors,
q, must be normalized to zero and one of the iqs is set to zero. We normalize the first element
subscript 0 to zero. For convenience, this normalization is left implicit in what follows. It is
automatically imposed by the software. To allow the remaining random effects to be freely
correlated, we write the J1 vector of nonzero s as

i = vi

where is a lower triangular matrix to be estimated and vi is a standard normally distributed (mean
vector 0, covariance matrix, I) vector.
N2: Discrete Choice Models N-22

N2.6.2 A Dynamic Multinomial Logit Model

The preceding random effects model can be modified to produce the dynamic multinomial
logit model proposed in Gong, van Soest and Villagomez (2000). The choice probabilities are

exp(j xit + j z it + ij )
Pijt | i1,...,iJ = t = 1,...,Ti, j = 0,1,...,J,i=1,...,N,
exp(q xit + q z it + iq )
J
q=1

where zit contains lagged values of the dependent variables (these are binary choice indicators for the
choice made in period t) and possibly interactions with other variables. The zit variables are now
endogenous, and conventional maximum likelihood estimation is inconsistent. The authors argue
that Heckmans treatment of initial conditions is sufficient to produce a consistent estimator. The
core of the treatment is to treat the first period as an equilibrium, with no lagged effects,

exp(j xi 0 + ij )
Pij0 | i1,...,iJ = , t = 0, j = 0,1,...,J,i=1,...,N,
exp(q xi 0 + iq )
J
q=1

where the vector of effects, , is built from the same primitives as in the later choice probabilities.
Thus, i = vi and i = vi, for the same vi, but different lower triangular scaling matrices. (This
treatment slightly less than doubles the size of the model it amounts to a separate treatment for the
first period.) Full information maximum likelihood estimates of the model parameters,
( 1,..., J,1,...,J,1,...,J,,) are obtained by maximum simulated likelihood, by modifying the
random effects model. The likelihood function for individual i consists of the period 0 probability as
shown above times the product of the period 1,2,...,Ti probabilities defined earlier.

N2.7 Conditional Logit Model

If the utility functions are conditioned on observed individual, choice invariant
characteristics, zi, as well as the attributes of the choices, xij, then we write

U(choice j for individual i) = Uij = xij + jzi + ij, j = 1,...,Ji.

(For this model, which uses a different part of NLOGIT, we number the alternatives 1,...,Ji rather
than 0,...,Ji. There is no substantive significance to this it is purely for convenience in the context
of the model development for the program commands.) The random, individual specific terms,
(i1,i2,...,iJ) are once again assumed to be independently distributed across the utilities, each with
the same type 1 extreme value distribution

F(ij) = exp(-exp(-ij)).

Under these assumptions, the probability that individual t chooses alternative j is

Prob(Uij > Uiq) for all q j.

N2: Discrete Choice Models N-23

It has been shown that for independent type 1 extreme value distributions, as above, this probability
is
exp ( xij + j z i )
Prob(yi = j) =
q=i 1 exp (xiq + q z i )
J

where yi is the index of the choice made. We note at the outset that the IID assumptions made about
j are quite stringent, and induce the Independence from Irrelevant Alternatives or IIA features that
characterize the model. This is functionally identical to the multinomial logit model of Section N2.6.
Indeed, the earlier model emerges by the simple restriction j = 0. We have distinguished it in this
fashion because the nature of the data suggests a different arrangement than for the multinomial logit
model and, second, the models in the section to follow are formulated as extensions of this one.

N2.7.1 Random Regret Logit and Hybrid Utility Models

We consider two direct extensions of the conditional logit model, one related to the forms of
the utility functions and a second related to the treatment of heterogeneity.
The random utility form of the model is based on linear utility functions of the alternatives,

Uijt = xit + ijt, t = 1,...,Ti, j = 0,1,...,J,i=1,...,N.

The random regret form bases the choices at least partly on attribute level regret functions,

Rij(k) = log[1+exp(k(xjk xik))]

where k denotes the specific attribute and i and j denote association with alternatives i and j,
respectively. (See Chorus (2010) and Chorus, Greene and Hensher (2011).) The systematic regret
of choice i can then be written

j 1=
k 1 log[1 + exp(k ( x jk xik ))] .
J K
=Ri =

The random regret form of the choice model is then

exp( R j )
Pj =

J
j =1
exp( R j )

This model does not impose the IIA assumptions. The model may also be specified with only a
subset of the attributes treated in the random regret format. This hybrid model is

exp( R j + xij )
Pj =
exp( R j + xij )
J
j =1
N2: Discrete Choice Models N-24

N2.7.2 Scaled MNL Model

The scaled multinomial logit model allows the model to accommodate broad heterogeneity
across individuals, for example when two or more data sets from different groups are combined.
This is a special case of the generalized mixed logit model described in Section N2.11.2. The
general form of the scaled MNL model is

exp ( i xij )
Prob(yi = j) =
exp ( i xiq )
Ji
q=1

where i = exp(zi + vi)

The scaling factor, i differs across individuals, but not across choices. It has a deterministic
component, exp(zi), and a random component, exp(vi). Either (or both) may equal 1.0, that is,
either or both restrictions = 0 or = 0. For example, a simple nonstochastic scaling differential
between two groups would result if = 0 and if zi were simply a dummy variable that identifies the
two groups. Other forms of scaling heterogeneity can be produced by different variables in zi. The
scaling may also be random through the term vi. In this instance, vi is a random term (usually, but
not necessarily normally distributed). With = 0 and 0, we obtain a randomly scaled
multinomial logit model.

N2.8 Error Components Logit Model

When the sample consists of a panel of data, that is, when individuals are observed in more
than one choice situation, the conditional logit model can be augmented with individual effects,
similar to the use of common effects models in regression and other single equation cases. A panel
data form of this model that is a counterpart to the random effects model is what we label the error
components model. (This has been called the kernel logit model in some treatments in the
literature.) The model arises by introducing M up to maxi Ji alternative and individual specific
random terms in the utility functions as in

U(choice j for individual i in choice setting t)

= Uijt
= xij + jzi + ij + mM=1d jm m uim , j = 1,...,Ji, t = 1,...,Ti.
where djm = 1 if effect m appears in utility function j, 0 if not,
m = the standard deviation of effect m (to be estimated),
vim = effect m for individual i.
N2: Discrete Choice Models N-25

The M random individual specifics are muim. They are distributed as normal with zero means and
variances m2. The constants djm equal one if random effect m appears in the utility function for
alternative j, and zero otherwise. The error components account for unobserved, alternative specific
variation. With this device, the sets of random effects in different utility functions can overlap, so as
to accommodate correlation in the unobservables across choices. The random effects may also be
heteroscedastic, with

m,i2 = m2 exp(mzi).

The probabilities attached to the choices are now

exp ( xij + j z i + mM=1d jm m uim )

Prob(yi = j) = .
q =1 exp (xiq + q z i mM=1dqm muim )
Ji

This is precisely an analog to the random effects model for single equation models. Given the
patterns of djm, this can provide a nesting structure as well. Examples in Chapter N30 will
demonstrate.

N2.9 Heteroscedastic Extreme Value Model

In the conditional logit model,

U(choice j for individual i) = Uij = xij + jzi + ij, j = 1,...,Ji,

exp ( xij + j z i )
Prob(yi = j) = ,
exp ( xim + m z i )
Ji
m=1

an implicit assumption is that the variances of ji are the same. With the type 1 extreme value
distribution assumption, this common value is 2/6. This assumption is a strong one, and it is not
necessary for identification or estimation. The heteroscedastic extreme value model relaxes this
assumption. We assume, instead, that

F(ij) = exp(-exp(-jij)],

Var[ij] = j2 (2/6) where j2 = 1/j2,

with one of the variance parameters normalized to one for identification. (Technical details for this
model including a statement of the probabilities appears in Chapter N26.) A further extension of this
model allows the variance parameters to be heterogeneous, in the standard fashion,

ij2 = j2 exp(zi).
N2: Discrete Choice Models N-26

N2.10 Nested and Generalized Nested Logit Models

The nested logit model is an extension of the conditional logit model. The models supported
by NLOGIT are based on variations of a four level tree structure such as the following:

ROOT root

TRUNKS trunk1 trunk2

LIMBS limb1 limb2 limb3 limb4

BRANCHES branch1 branch2 branch3 branch4 branch5 branch6 branch7 branch8

ALTS a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16

The choice probability under the assumption of the nested logit model is defined to be the
conditional probability of alternative j in branch b, limb l, and trunk r, j|b,l,r:

exp(x j|b ,l , r ) exp(x j|b ,l ,r )

P(j|b,l,r) = = ,
q|b,l ,r exp(xq|b,l ,r ) exp( J b|l , r )

exp(z l |r + l |r I l |r ) exp(z l |r + l |r I l |r )
P(l|r) = = ,
s|r exp(z q|r + s|r I s|r ) exp( H r )

where Hr is the inclusive value for trunk r, Hr = log s|lexp(zs|r + s|rIs|r).

N2: Discrete Choice Models N-27

Finally, the probability of choosing a particular limb is

exp(h r + r H r )
P(r) = .
s exp(h s + s H s )
By the laws of probability, the unconditional probability of the observed choice made by an
individual is
P(j,b,l,r) = P(j|b,l,r) P(b|l,r) P(l|r) P(r).

This is the contribution of an individual observation to the likelihood function for the sample.
The nested logit aspect of the model arises when any of the b|l,r or l|r or r differ from 1.0.
If all of these deep parameters are set equal to 1.0, the unconditional probability reduces to

exp(x j|b ,l , r + y b|l , r + z l |r + h r )

P(j,b,l,r) = ,
r l b j exp(x j ,b,l ,r + y b,l ,r + z l ,r + h r )
which is the probability for a one level conditional (multinomial) logit model.

N2.10.1 Alternative Normalizations of the Nested Logit Model

The formulation of the nested logit model imposes no restrictions on the inclusive value
parameters. However, the assumption of utility maximization and the stochastic underpinnings of
the model do imply certain restrictions. For the former, in principle, the inclusive value parameters
must be between zero and one. For the latter, the restrictions are implied by the way that the random
terms in the utility functions are constructed. In particular, the nesting aspect of the model is
obtained by writing
j|b,l,r = uj|b,l,r + vb|l,r.

That is, within a branch, the random terms are viewed as the sum of a unique component, uj|b,l,r, and a
common component, vb|l,r. This has certain implications for the structure of the scale parameters in
the model. NLOGIT provides a method of imposing the restrictions implied by the underlying
theory.
There are three possible normalizations of the inclusive value parameters which will produce
the desired results. These are provided in this estimator for two and three level models only. This
includes most of the received applications. We will detail the first two of these forms here and
describe how to estimate all of them in Chapter N28. For convenience, we label these random utility
formulations RU1, RU2 and RU3. (RU3 is just a variant of RU2.)
N2: Discrete Choice Models N-28

RU1

The first form is

where Jb|l is the inclusive value for branch b in limb l,

Jb|l = log q|b,l exp(xq|b,l).

At the next level up the tree, we define the conditional probability of choosing a particular branch in
limb l,
exp b|l (y b|l + J b|l ) exp b|l (y b|l + J b|l )
P(b|l) = = ,
s|l exp s|l (y s|l + J s|l ) exp( I l )

where Il is the inclusive value for limb l,

Il = log s|l exp[s|l (ys|l + Js|l)].

The probability of choosing limb l is

exp[ l (z l | + I l )] exp[ l (z l + I l )]
P(l) = = .
s exp [ s (z s + I s )] exp( H )

Note that this the same as the familiar normalization used earlier; this form just makes the scaling
explicit at each level.

RU2

The second form moves the scaling down to the twig level, rather than at the branch level.
Here it is made explicit that within a branch, the scaling must be the same for alternatives.

exp b|l (x j|b ,l ) exp b|l (x j|b ,l )

P(j|b,l) = = .
q|b ,l
exp b|l (x q|b ,l ) exp( J b|l )

Note in the summation in the inclusive value that the scaling parameter is not varying with the
summation index. It is the same for all twigs in the branch. Now, Jb|l is the inclusive value for
branch j in limb l,

Jb|l = log q|b,l exp[b|l (xq|b,l)].

N2: Discrete Choice Models N-29

At the next level up the tree, we define the conditional probability of choosing a particular branch in
limb l,

exp l ( y b|l + (1/ b|l ) J b|l ) exp l ( y b|l + (1/ b|l ) J b|l )
P(b|l) = = ,
s
exp s ( y s|l + (1/ s|l ) J s|l ) exp( I l )

where Il is the inclusive value for limb l,

Il = log s|l exp l ( ' y s|l + (1/ s|l ) J s|l ) .

Finally, the probability of choosing limb l is

exp [ z l + (1/ l ) I l ] exp [ z l + (1/ l ) I l ]

P(l) = = ,
s exp [z s + (1/ s ) I s ] exp( H )

where the log sum for the full model is

H = log s exp [ ' z s + (1/ s ) I s ] .

N2.10.2 A Model of Covariance Heterogeneity

This is a modification of the two level nested logit model. The base case for the model is

exp(x j|b )
P ( j | b) = .
q =1 exp(xq|b )
J |b

Denote the logsum, the log of the denominator, as Jb = inclusive value for branch b = IV(b). Then,

exp(y b + b J b )
P (b) = .
s =1 exp(y s + s J s )
B

The covariance heterogeneity model allows the b inclusive value parameters to be functions of a set
of attributes, vb , in the form

b* = b exp(vb),

where is a new vector of parameters to be estimated. Since the inclusive parameter is a scaling
parameter for a common random component in the alternatives within a branch, this is equivalent to
a model of heteroscedasticity.
N2: Discrete Choice Models N-30

N2.10.3 Generalized Nested Logit Model

The generalized nested logit model is an extension of the nested logit model in which
alternatives may appear in more than one branch. Alternatives that appear in more than one branch
are allocated across branches probabilistically. The model estimated includes the usual nested logit
framework (only two levels are supported in this framework), as well as the matrix of allocation
parameters. The only difference between this and the more basic nested logit model is the
specification of the tree. For the allocations of choices to branches, a multinomial logit form is used,

j,b = Prob(alternative j is in branch b) = exp(j,b) / s exp(j,s),

where the parameters are estimated by the program. Note the denominator summation is over
branches that the alternative appears in. The probabilities sum to one. The identification rule that
one of the s for each alternative modeled equals one is imposed. These allocations may depend on
an individual characteristic (not a choice attribute), such as income. In this instance, the multinomial
logit probabilities become functions of this variable,

j,b = Prob(alternative j is in branch b) = exp(j,b + j,bzi ) / s exp(j,s+ j,szi).

Now, to achieve identification, one of the s is set equal to zero and one of the s is set equal to zero.
It is convenient to form the matrix = [j,b]. This is a JB matrix of allocation parameters. The
rows sum to one, and note that some values in the matrix are zero. But, no rows have all zeros
every alternative appears in at least one branch, and no columns have all zeros every branch
contains at least one alternative. The probabilities for the observed choices are formed as

Prob(alternative, branch) = P(j,b)

= P(j|b) P(b)

[ j ,bU j ]b
where P ( j | b) =
[ j , sU s ]s
B
s =1

(the denominator summation is over the alternatives in that branch)

1/ b
[ j ,bU j ]b
P (b) =
j |b
and 1/ b
.
b=1 j|b [ j ,bU j ]
b
B

N2.10.4 Box-Cox Nested Logit

The Box-Cox form of the nested logit model automates a model specification that was
already in NLOGIT 4. This form can replace the function transformation BCX(variable) in the utility
functions.
N2: Discrete Choice Models N-31

N2.11 Random Parameters Logit Models

In its most general form, we write the multinomial logit probability as

exp( ji + j z i + j f ji + ji x ji )
P( j | vi ) = ,
exp( qi + q z i + qf qi + qi x qi )
J
q =1

where U(j,i) = ji + j z i + j f ji + ji x ji , j = 1,...,Ji alternatives in individual is choice set

ji is an alternative specific constant which may be fixed or random, Ji = 0,

j is a vector of nonrandom (fixed) coefficients, Ji = 0,
j is a vector of nonrandom (fixed) coefficients,
ji is a coefficient vector that is randomly distributed across individuals;
vi enters ji,
zi is a set of choice invariant individual characteristics such as age or income,
fji is a vector of M individual and choice varying attributes of choices,
multiplied by j,
xji is a vector of L individual and choice varying attributes of choices,
multiplied by ji.
The term mixed logit is often used in the literature (e.g., Revelt and Train (1998)) for this model. The
choice specific constants, ji and the elements of ji are distributed randomly across individuals such
that for each random coefficient, ki = any (not necessarily all of) ji or jki, the coefficient on attribute
xjik, k = 1,...,K,
jki = ji or jki = jk + kwi + kvki,
or jki = ji or jki = exp(jk + jkwi + jkvjki).

The vector wi (which does not include one) is a set of choice invariant characteristics that produce
individual heterogeneity in the means of the randomly distributed coefficients; jk is the constant
term and jk is a vector of deep coefficients which produce an individual specific mean. The
random term, vjki is normally distributed (or distributed with some other distribution) with mean 0
and standard deviation 1, so jk is the standard deviation of the marginal distribution of jki. The vjkis
are individual and choice specific, unobserved random disturbances the source of the
heterogeneity. Thus, as stated above, in the population

ji or jki ~ Normal or Lognormal [jk + jkwi, jk2].

(Other distributions may be specified.) For the full vector of K random coefficients in the model, we
may write
i = + wi + vi
N2: Discrete Choice Models N-32

where is a diagonal matrix which contains k on its diagonal. A nondiagonal allows the random
parameters to be correlated. Then, the full covariance matrix of the random coefficients is = .
The standard case of uncorrelated coefficients has = diag(1,2 ,,k). If the coefficients are
freely correlated, is a full, unrestricted, lower triangular matrix and will have nonzero off
diagonal elements. An additional level of flexibility is obtained by allowing the distributions of the
random parameters to be heteroscedastic,
ijk2 = jk2 exp(jkhi).
This is now built into the model by specifying
i = + wi + i vi
where i = diag[ijk2]

and now, is a lower triangular matrix of constants with ones on the diagonal. Finally,
autocorrelation can also be incorporated by allowing the random components of the random
parameters to obey an autoregressive process,

vki,t = ki vki,t-1 + cki,t

where cki,t is now the random element driving the random parameter.
This produces, then, the full random parameters logit model

exp( ji + i x ji )
P( j | vi ) = ,
exp( mi + i x mi )
J
m =1

i = + zi + i vi
vi ~ with mean vector 0 and covariance matrix I.
The specific distributions may vary from one parameter to the next. We also allow the parameters to
be lognormally distributed so that the preceding specification applies to the logarithm of the specific
parameter.

N2.11.1 Nonlinear Utility RP Model

The nonlinear utility function (NLRP) form of the mixed model is one of two major extensions
of this model that appear in NLOGIT 5 the other is the generalized mixed model in the next section.
In the NLRP model, the model parameters may be specified as in the model above. But, the utility
functions need not be linear in the attributes and characteristics. This more general model is
exp[U j (i , x ji )]
P( j | vi ) = ,
m=1 exp[U j (i , x ji )]
J

where i = + zi + i vi
vi ~ with mean vector 0 and covariance matrix I.
and U j (i , x ji ) is any nonlinear function of the data and parameters.
N2: Discrete Choice Models N-33

N2.11.2 Generalized Mixed Logit Model

The second major extension of the random parameters model is the generalized mixed logit
model developed by Fiebig, Keane, Louviere and Wasi (2010). The extension of the random
parameters model is
i = i + vi + (1 - )ivi

The generalized mixed logit model embodies several different forms of heterogeneity in the random
parameters and random scaling, as well as the distribution parameter, , which allocates the influence
of the parameter heterogeneity and the scaling heterogeneity. Several interesting model forms are
produced by different restrictions on the parameters. For example, if = 0 and = 0, we obtain the
scaled MNL model in Section N2.7.2. A variety of other special cases are also provided. One
nonlinear normalization in particular allows the model to be transformed from a specification in
utility space as above to willingness to pay space by analyzing an implicit ratio of coefficients.

N2.12 Latent Class Logit Models

In the latent class formulation, parameter heterogeneity across individuals is modeled with a
discrete distribution, or set of classes. The situation can be viewed as one in which the individual
resides in a latent class, c, which is not revealed to the analyst. There are a fixed number of
classes, C. Estimates consist of the class specific parameters and for each person, a set of
probabilities defined over the classes. Individual is choice among J alternatives at choice situation t
given that individual i is in class c is the one with maximum utility, where the utility functions are

Ujit|c = cxjit + jit

where Ujit = utility of alternative j to individual i in choice situation t

xjit = union of all attributes that appear in all utility functions. For
some alternatives, xjit,k may be zero by construction for some
attribute k which does not enter their utility function for
alternative j.
jit = unobserved heterogeneity for individual i and alternative j in
choice situation t.
c = class specific parameter vector.

Within the class, choice probabilities are assumed to be generated by the multinomial logit model

exp ( c x jit )
Prob[yit = j | class = c] = .
exp ( c x jit )
Ji
j =1
N2: Discrete Choice Models N-34

As noted, the class is not observed. Class probabilities are specified by the multinomial logit form,

exp ( c z i )
Prob[class = c] = Qic = , C = 0.
c=1 exp ( c z i )
C

where zi is an optional set of person, situation invariant characteristics. The class specific
probabilities may be a set of fixed constants if no such characteristics are observed. In this case, the
class probabilities are simply functions of C parameters, c, the last of which is fixed at zero. This
model does not impose the IIA property on the observed probabilities.
For a given individual, the models estimate of the probability of a specific choice is the
expected value (over classes) of the class specific probabilities. Thus,

exp ( x )
Prob(yit = j) = Ec Ji
c jit

exp ( c x jit )
j =1

exp ( x )
c =1 .
C
=
c jit
= Prob(class c )
Ji exp ( c x jit )
j =1

N2.12.1 2K Latent Class Model

NLOGIT accommodates attribute nonattendance by the -888 feature described in Chapter
N18. In particular, in some choice analyses, some, but not all individuals indicate that they did not
pay attention to certain attributes. The appropriate model building strategy is to impose zero
restrictions on the utility parameters, , for these specific individuals. NLOGIT provides this
capability throughout the estimation suite all models are fit with this capability. (This feature is
unique to NLOGIT.) This feature accommodates cases in which individuals explicitly reveal the
form of their utility functions. The model noted here is usable when the sorting of individuals in this
way is latent there is no observed indicator. Consider a model with four attributes, x1, x2, x3, x4.
All individuals attend to x1 and x2. Some ignore x3, some ignore x4, and some ignore both x3 and
x4 (and some attend both). Thus, in terms of the possible utility functions, there are four types of
individuals in the population, distinguished by the type of utility function that is appropriate:

(x3 and x4) Uij = 1x1 + 2x2 + 3x3 + 4x4 +

(x3 only) Uij = 1x1 + 2x2 + 3x3 +
(x4 only) Uij = 1x1 + 2x2 + 4x4 +
(Neither) Uij = 1x1 + 2x2 +

The difference that is built into this model form is that the analyst does not know which individual is
in which group. This can be treated as a latent class model. The number of classes is 2K where K is
the number of attributes that treated by the latent class specification.
N2: Discrete Choice Models N-35

N2.12.2 Latent Class Random Parameters Model

The LCRP model is a combination of the latent class model described above and the random
parameters model in Section N2.11. This is a latent class model in which a random parameters
model applies within each class.

N2.13 Multinomial Probit Model

In this model, the individuals choice among J alternatives is the one with maximum utility,
where the utility functions are

Uji = xji + ji
where Uji = utility of alternative j to individual i
xjit = union of all attributes that appear in all utility functions. For
some alternatives, xjit,k may be zero by construction for some
attribute k which does not enter their utility function for
alternative j.

The multinomial logit model specifies that ji are draws from independent extreme value
distributions (which induces the IIA condition). In the multinomial probit model, we assume that ji
are normally distributed with standard deviations Sdv[ji] = j and correlations Cor[ji, qi] = jq (the
same for all individuals). Observations are independent, so Cor[ji,qs ] = 0 if i is not equal to s, for
all j and q. A variation of the model allows the standard deviations and covariances to be scaled by a
function of the data, which allows some heteroscedasticity across individuals.
The correlations jq are restricted to -1 < jq < 1, but they are otherwise unrestricted save for
a necessary normalization. The correlations in the last row of the correlation matrix must be fixed at
zero. The standard deviations are unrestricted with the exception of a normalization two standard
deviations are fixed at 1.0 NLOGIT fixes the last two.
This model may also be fit with panel data. In this case, the utility function is modified as
follows:
Uji,t = xji,t + ji,t + vji,t

where t indexes the periods or replications. There are two formulations for vji,t,

Random effects vji,t = vji,t (the same in all periods)

First order autoregressive vji,t = j vji,t-1 + aji,t.

It is assumed that you have a total of Ti observations (choice situations) for person i. Two situations
might lend themselves to this treatment. If the individual is faced with a set of choice situations that
are similar and occur close together in time, then the random effects formulation is likely to be
appropriate. However, if the choice situations are fairly far apart in time, or if habits or knowledge
accumulation are likely to influence the latter choices, then the autoregressive model might be the
better one.
N2: Discrete Choice Models N-36

You can also add a form of individual heterogeneity to the disturbance covariance matrix.
The model extension is

Var[i] = exp[hi]

where is the matrix defined earlier (the same for all individuals), and hi is an individual (not
alternative) specific set of variables not including a constant.
N3: Model and Command Summary for Discrete Choice Models N-37

N3: Model and Command Summary for

Discrete Choice Models
N3.1 Introduction
The chapters to follow will provide details on the various discrete choice models you can
estimate with NLOGIT and on the model commands you will use to request the estimates. This chapter
will provide a brief summary listing of the models and model commands. The variety of logit models
now use a set of specific names, rather than qualifiers to more general model classes as in earlier
versions. For example, the model name OLOGIT can be used instead of ORDERD ; Logit. The
earlier formats remain available, but the newer ones may prove more convenient. The full listing of
these commands is also given below. The commands below specify the essential parts needed to fit the
model. The numerous options and different forms are discussed in the chapters to follow (and, were
noted in the LIMDEP Econometric Modeling Guide as well).

N3.2 Model Dimensions

The descriptions below present the different discrete choice models that are the main feature
of NLOGIT. NLOGIT contains all of LIMDEP, so all of the models documented in the LIMDEP
Econometric Modeling Guide, including the regression models, limited dependent variable models,
generalized linear models, sample selection models, and so on are supported in NLOGIT, as well as
the ancillary tools including MATRIX, etc.
There are various built in limits in the estimators. These are noted at the specific points
below where necessary. The following lists the most important internal constraints on the
estimators:

Multinomial choice model estimators in NLOGIT: maximum numbers of:

Alternatives 500
Attributes 300
Branches in nested logit models 25
Limbs in nested logit models 10
Random error components 10
Maximum number of choices in the MLOGIT form of the model 25
Heteroscedasticity models, maximum number of variables 75
Ordered choice models: maximum number of outcomes 25
Unconditional fixed effects models, number of individuals 100,000
Random parameters models, maximum number of RPs 25
Latent class models, maximum number of classes 30
N3: Model and Command Summary for Discrete Choice Models N-38

N3.3 Basic Discrete Choice Models

The binomial probit and logit models and the ordered probit and logit models are the primary
model frameworks for single equation, single decision, discrete choice models. The ordered choice
and the bivariate and multivariate probit models are multivariate extensions of the simple probit model.

N3.3.1 Binary Choice Models

There are six binary choice models, probit, logit, complementary log log, Gompertz, Burr,
and arctangent documented in Chapter E27. The ones that interest us here are the binary probit and
logit models. The probit model is requested with

PROBIT ; Lhs = dependent variable

; Rhs = independent variables $

The binary logit model may be invoked with

BLOGIT ; Lhs = dependent variable

; Rhs = independent variables $

In earlier versions, you would use the LOGIT command, which is still useable. LOGIT is the same
as BLOGIT when the data on the dependent variable are either binary (zeros and ones) or
proportions (strictly between zero and one). Chapters E26-E29 document numerous extensions of
these models. Chapters E30-E32 consider semiparametric and nonparametric approaches and
extensions of the binary choice models for panel data.

N3.3.2 Bivariate Binary Choices

The command for the bivariate probit model is

BVPROBIT ; Lhs = variable 1, variable 2

; Rh1 = independent variables for equation 1
; Rh2 = independent variables for equation 2 $

In this form, the Lhs specifies two binary dependent variables. You may use proportions data
instead, in which case, you will provide four proportions variables, in order, p00, p01, p10, p11.
This command is the same as BIVARIATE PROBIT in earlier versions. (You may still use
BIVARIATE PROBIT.)
N3: Model and Command Summary for Discrete Choice Models N-39

N3.3.3 Multivariate Binary Choice Models

The multivariate probit model is specified with

MVPROBIT ; Lhs = y1, y2, ..., yM

; Eq1 = Rhs variables for equation 1
; Eq2 = Rhs variables for equation 2
...
; EqM = Rhs variables for equation M $

Data for this model must be individual. The Lhs specifies a set of binary dependent variables. This
command is the same as MPROBIT (which may still be used) in earlier versions.

N3.3.4 Ordered Choice Models

Chapter E34 describes five forms for the ordered choice model, probit, logit, complementary
log log, Gompertz and arctangent. The first two interest us here. The ordered probit model is
requested with

OPROBIT ; Lhs = dependent variable

; Rhs = independent variables $

This is the same as the ORDERED PROBIT command, which may still be used. In this model, the
dependent variable is integer valued, taking the values 0, 1, ..., J. All J+1 values must appear in the
data set, including zero. You may supply a set of J+1 proportions variables instead. Proportions will
sum to 1.0 for every observation. Chapter E35 documents a bivariate version of the ordered probit
model for two joint ordered outcomes, and a sample selection model.
The ordered logit model is requested with

OLOGIT ; Lhs = dependent variable

; Rhs = independent variables $

The same arrangement for the dependent variables as for the ordered probit model is assumed. This
command is the same as ORDERED ; Logit in earlier versions.

N3.4 Multinomial Logit Models

The multinomial logit model is a special case of the conditional logit model, which, itself,
is the gateway model to the main model extensions described in Section N2.5.

N3.4.1 Multinomial Logit

The multinomial logit model described in Section N2.6 and Chapter E37 is invoked with

MLOGIT ; Lhs = dependent variable

; Rhs = independent variables $
N3: Model and Command Summary for Discrete Choice Models N-40

Data for the MLOGIT model consist of an integer valued variable taking the values 0, 1, ..., J. This
model may also be fit with proportions data. In that case, you will provide the names of J+1 Lhs
variables that will be strictly between zero and one, and will sum to one at every observation. The
MLOGIT command is the same as LOGIT. The program inspects the command (Lhs) and the data,
and determines internally whether BLOGIT or MLOGIT is appropriate. Note, on proportions data,
if you want to fit a binary logit model with proportions data, you will supply a single proportions
variable, not two. (What would be the second one is just one minus the first.) If you want to fit a
multinomial logit model with proportions data with three or more outcomes, you must provide the
full set of proportions. Thus, you would never supply two Lhs variables in a LOGIT, BLOGIT or
MLOGIT command.

N3.4.2 Conditional Logit

The command for the conditional model, and the commands in the sections to follow, are
variants of the NLOGIT command. This is a full class of estimators based on the conditional logit
form. There are several forms of the essential command for fitting the conditional logit model with
NLOGIT. The simpler one is

CLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics $

As discussed in Chapter N20 and in Section E38.3, the data for this estimator consist of a set of J
observations, one for each alternative. (The observation resembles a group in a panel data set.) The
command just given assumes that every individual in the sample chooses from the same size choice
set, J. The choice sets may have different numbers of choices, in which case, the command is
changed to
; Lhs = dependent variable, choice set size variable

The second Lhs variable is structured exactly the same as a ; Pds variable for a panel data estimator.
In the second form of the model command, the utility functions are specified directly, symbolically.
The ; Rhs and ; Rh2 specifications can be replaced with

; Model: ... specification of the utility functions

This is discussed in Chapter N21 and Chapter E39.

The CLOGIT command is the same as DISCRETE CHOICE. It is also the same as
NLOGIT when the only information given in the command is that specified above, that is when
none of the specifications that invoke the model extensions that are described in the sections to
follow are provided.
N3: Model and Command Summary for Discrete Choice Models N-41

N3.5 NLOGIT Extensions of Conditional Logit

The conditional logit model provides the basic framework for a very large number of
extensions that are provided by NLOGIT. The following lists the basic commands for most of these.
Each model form is developed in greater detail in one of the chapters that follow. Each model may
be specified with a variety of options and different specifications for numerous variants. The
following shows the essential command for the most basic form of the model.

N3.5.1 Random Regret Logit

The random regret form of the model is specified with

RRLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics $
The command is otherwise the same as CLOGIT, with the same formats for variable choice set
sizes, etc. The utility functions must be specified as above, not using ; Model: , owing to the
particular form of the utility functions in the random regret format.

N3.5.2 Scaled Multinomial Logit

The scaled multinomial logit model is a randomly scaled MNL, with i = i, where i is a
heterogeneous scalar. The model command is

SMNLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics $

N3.5.3 Heteroscedastic Extreme Value

The heteroscedastic extreme value model is requested with the command

HLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics $

The command is otherwise the same as the CLOGIT command, with the same formats for variable
choice set sizes and utility function specifications. The HLOGIT command is the same as

NLOGIT ; Heteroscedasticity
; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics $
that was used in earlier versions of NLOGIT. (This may still be used if desired.)
N3: Model and Command Summary for Discrete Choice Models N-42

N3.5.4 Error Components Logit

The error components model is described in Section N2.8 and in Chapter N30. The model
command is

ECLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics
; ECM = specification of the tree structure for the error components $

This command is the same as NLOGIT ; ECM = specification ... $ The error components model
may also be specified as a part of the random parameters model. Thus, your RPLOGIT command
may also contain the ; ECM = specification.

N3.5.5 Nested and Generalized Nested Logit

The nested logit model is the default form of the NLOGIT command. Request the nested
logit model with

NLOGIT ; Tree = specification of the tree structure

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics $

The generalized nested logit model command is

GNLOGIT ; Tree = specification of the tree structure

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics $

The GNLOGIT command in place of the NLOGIT command tells NLOGIT that the tree structure
may have overlapping branch specifications. (You may also use NLOGIT ; GNL.) If you specify
that alternatives appear in more than one branch in the NLOGIT command, this will produce an
error message. The option is available only for the GNLOGIT command. The specification of
variable choice set sizes and utility functions is the same as for the CLOGIT command.
N3: Model and Command Summary for Discrete Choice Models N-43

N3.5.6 Random Parameters Logit

The random parameters logit model (mixed logit model) is requested by specifying a
conditional logit model, and adding the specification of the random parameters. The model
command is

RPLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics
; Fcn = the specifications of the random parameters
; ... other specifications for the random parameters model $

Once again, variable choice set sizes and utility function specifications are specified as in the
CLOGIT command. This command is the same as

NLOGIT ; RPL
; ... the rest of the command $

There is one modification that might be necessary. If you are providing variables that affect the
means of the random parameters, you would generally use

NLOGIT ; RPL = the list of variables

; ... the rest of the command $
The RPL specification may still be used this way. The command can be NLOGIT as above, or

RPLOGIT ; RPL = the list of variables

; ... the rest of the command $

These are identical.

The random parameters model may also include an error components specification defined
in the next section. The command will be

RPLOGIT ; Lhs = dependent variable

N3.5.7 Generalized Mixed Logit

The generalized mixed logit model is an extension of the random parameters model. The
command has several parts that produce the various model types. The essential command is

GMXLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics
; Fcn = specification of the random parameters $

N3.5.8 Nonlinear Random Parameters Logit

This command extends the random parameters model by allowing the utility functions to be
any nonlinear that you specify. There are numerous variants of this model. The essential command
is
NLRPLOGIT ; Lhs = dependent variable
; Choices = the names of the J alternatives
; Labels = the labels used for the model parameters
; Start = starting values for iterations
; Fn1 = specification of a nonlinear function
; up to 50 nonlinear function specifications
; Model: U(name) = one of the nonlinear functions defined /
U(name) = another one of the functions, etc.
; Fcn = specifications of the random parameters $
The model is set up by defining the choice variable and a set of nonlinear functions that will be combined to
make the utility functions. The functions may be arbitrarily complex

N3.5.9 Latent Class Logit

The essential form of the command for the latent class model is

LCLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics
; Pts = the number of classes $
Like the RPLOGIT command, you need to modify this command if you are providing variables that
affect the class probabilities. You would generally use

NLOGIT ; LCM = the list of variables

; ... the rest of the command $
The LCM specification may still be used this way. The command can be NLOGIT as above, or
identically,

LCLOGIT ; LCM = the list of variables

; ... the rest of the command $
N3: Model and Command Summary for Discrete Choice Models N-45

N3.5.10 2K Latent Class Logit

The 2K model is a particular latent class model in which there are simple constraints across
the classes, but only one parameter vector used for the whole model. The model is set up as a latent
class model with an additional specification:

LCLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics
; Pts = the number of classes $

In this form of the model, the number of points is specified as 102, 103, or 104, corresponding to
whether the first 2, 3, or 4 variables in the RHS list are given the special treatment that defines the
model.

N3.5.11 Latent Class Random Parameters

The latent class random parameters model extends the latent class model. The essential
command is

LCRPLOGIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics
; Fcn = definition of the random parameters part
; Pts = the number of classes $

N3.5.12 Multinomial Probit

The multinomial probit model is described in Chapter N27 and Section N2.13. The essential
command is

MNPROBIT ; Lhs = dependent variable

; Choices = the names of the J alternatives
; Rhs = list of choice specific attributes
; Rh2 = list of choice invariant individual characteristics $

Variable choice set sizes and utility function specifications are specified as in the CLOGIT
command. This command is the same as

NLOGIT ; MNP
; ... the rest of the command $
N3: Model and Command Summary for Discrete Choice Models N-46

N3.6 Command Summary

The following lists the current and where applicable, alternative forms of the discrete choice
model commands. The two sets of commands are identical, and for each model, in NLOGIT 5, either
command may be used for that model.

Models Command Alternative Command Form

Binary Choice Models

Binary Probit PROBIT PROBIT
Binary Logit BLOGIT LOGIT
Bivariate Probit BVPROBIT BIVARIATE PROBIT
Multivariate Probit MVPROBIT MPROBIT

Ordered Choice Models

Ordered Probit OPROBIT ORDERED PROBIT
Ordered Logit OLOGIT ORDERED ; Logit

Multinomial Logit Models

Multinomial Logit MLOGIT LOGIT
Conditional Logit CLOGIT DISCRETE CHOICE

Conditional Logit Extensions

Conditional Logit CLOGIT CLOGIT
Multinomial Logit NLOGIT NLOGIT (Same as CLOGIT)
Scaled Multinomial Logit SMNLOGIT GMXLOGIT ; SMNL
Random Regret Multinomial Logit RRLOGIT
Error Components Logit ECLOGIT NLOGIT ; ECM = ...
Heteroscedastic Extreme Value HLOGIT NLOGIT ; Het
Nested Logit NLOGIT ; Tree = ... NLOGIT ; Tree = ...
Generalized Nested Logit GNLOGIT ; Tree = ... NLOGIT ; GNL ; Tree = ...
Random Parameters Logit RPLOGIT NLOGIT ; RPL
Generalized Mixed Logit GMXLOGIT
Nonlinear Random Parameters NLRPLOGIT
Latent Class Logit LCLOGIT NLOGIT ; LCM
2K Latent Class LCLOGIT
Random Parameters Latent Class LCRPLOGIT
Multinomial Probit MNPROBIT NLOGIT ; MNP

NLOGIT contains an additional command that is used for a specific purpose:

NLCONVERT ; Lhs = ... ; Rhs = ... ; Other parameters $

This command is used to reconfigure a data set from a one line format to a multiple line format that is
more convenient in NLOGIT. NLCONVERT is described in Chapter N18.
N3: Model and Command Summary for Discrete Choice Models N-47

N3.7 Subcommand Summary

The following subcommands are used in NLOGIT model commands. The BLOGIT,
BPROBIT, BVPROBIT, MVPROBIT, OLOGIT and OPROBIT commands have additional
specifications that are documented in the LIMDEP Econometric Modeling Guide for these specific
models. The specifications below are those that may appear in the NLOGIT command or the
conditional logit extensions described above.

General Model Specification and Data Setup

Data on Dependent Variable

; Ranks indicates that data are in the form of ranks, possibly ties at last place.
; Shares indicates that data are in the form of proportions or shares.
; Frequencies indicates that data are in the form of frequencies or counts.
; Checkdata checks validity of the data before estimation.
; Wts = name specifies a weighting variable. (Noscale is not used here.)
; Scale (list of variables) = values for scaling loop specifies scaling of certain variables
during iterations.
; Pds = spec indicates multiple choice situations for individuals. Used by RPL, LCM, ECM,
MNP and by binary choice models to indicate a panel data set.

Specification of the Dependent Variable

; Lhs = names specifies model dependent variable(s).

Second Lhs variable indicates variable choice set size.
Third Lhs variable indicates specific choices in a universal choice set.
First Lhs variable is a set of utilities if ; MCS is used.
; MCS requests data generated by Monte Carlo simulation.
; Choices = list lists names for alternatives.

Specification of Utility Functions

; Rhs = names lists choice varying attribute variables.

; Rh2 = names lists choice invariant characteristic variables.
; Model: alternative way to specify utility functions, followed by definitions of
utility functions.
; Fix = list lists names of and values for coefficients that are to be fixed.
; Uset (list of alternatives) = list of values or [list of values] alternative method of
specifying starting values or fixed coefficients.
; Lambda = value specifies coefficient to use for Box-Cox transformation.
; Attr = list lists names for attributes used in one line entry format.
N3: Model and Command Summary for Discrete Choice Models N-48

Output Control

List and Retain Variables and Results

; Prob = name keeps predicted probabilities from estimated model as variable.

; Keep = name keeps predicted values from estimated model as variable. Used by
PROBIT and BLOGIT only.
; Utility = name keeps predicted utilities as variable.
; List lists predicted probabilities and predicted outcomes with model results.
; Parameters retains additional parameters as matrices. With RPL and LCM, keeps
matrices of individual specific parameter means.
; WTP = list lists specifications to retain computations of willingness to pay.

Covariance Matrices

; Covariance Matrix displays estimated asymptotic covariance matrix (normally not shown),
same as ; Printvc.

; Robust computes robust sandwich estimator for asymptotic covariance matrix.

; Cluster = spec computes robust cluster corrected asymptotic covariance matrix.

Display of Estimation Results

; Show displays model specification and tree structure.

; Describe lists descriptive statistics for attributes by alternative.
; Odds includes odds ratios in estimation results. Used only by BLOGIT.
; Crosstab includes crosstabulation of predicted and actual outcomes.
; Table = name adds model results to stored tables.

Marginal Effects

; Effects: spec displays estimated marginal effects. Used by NLOGIT.

; Partial Effects displays marginal effects, same as ; Marginal Effects. Used by PROBIT,
BLOGIT, BVPROBIT, MVPROBIT, OLOGIT, OPROBIT.
; Means computes marginal effects using data means. Uses average partial effects if
this is not specified.
; Pwt uses probability weights to compute average partial effects.

Hypothesis Testing

; Test: spec defines a Wald test of linear restrictions.

; Wald: spec defines a Wald test of linear restrictions, same as ; Test: spec.
; IAS = list lists choices used with CLOGIT to test IIA assumption.
N3: Model and Command Summary for Discrete Choice Models N-49

Optimization

Iterations Controls

; Alg = name specifies optimization method.

; Maxit = n sets the maximum iterations.
; Tlg [ = value] sets the convergence value for convergence on the gradient.
; Tlf [ = value] sets the convergence value for function convergence.
; Tlb[ = value] sets the convergence value for convergence on change in parameters.
; Set keeps current setting of optimization parameters as permanent.
; Output = n requests technical output during iterations; the level n is 1, 2, 3 or 4.

Starting Values

; Start = list provides starting values for all model parameters.

; PR0 = list provides starting values for free parameters only. (Generally not used.)

Constrained Estimation

; CML: spec defines a constrained maximum likelihood estimator.

; Rst = list imposes fixed value and equality constraints.
; Calibrate fixes parameters at previously estimated values.
; ASC initially fit model with just ASCs.

Criterion Function for CLOGIT

; GME [ = number of support points] generalized maximum entropy. Used by MLOGIT

and CLOGIT.
; Sequential sequential two step estimator for nested logit. (Generally not used.)
; Conditional conditional estimator for two step nested logit. (Generally not used.)

Simulation Based Estimation

; Pts = number sets number of replications for simulation estimator. Used by ECM and
MNP. (Also used by LCM to specify number of latent classes.)
; Shuffled uses shuffled uniform draws to compute draws for simulations.
; Halton uses Halton sequences for simulation based estimators.

Simulation Processor (BINARY CHOICE Command for PROBIT and BLOGIT)

; Simulation [ = list of choices] simulates effect of changes in attributes on aggregate outcomes.

; Scenarios specifies changes in attributes for simulations.
; Arc computes arc elasticities during simulations.
; Merge merges revealed and stated preference data during simulations.
N3: Model and Command Summary for Discrete Choice Models N-50

Specific NLOGIT Model Commands

; LCM [ = list of variables] specifies latent class model. Optionally, specifies variables that
enter the class probabilities. (Command is also LCLOGIT.) Also used by
PROBIT and BLOGIT.
; ECM = list of specifications specifies error components logit model. (Command is also
ECLOGIT.)
; HEV specifies heteroscedastic extreme value model. (Command is also
HCLOGIT.)
Heteroscedastic Models
; Het specifies a heteroscedastic model. Used by RPL, ECL and HEV.
; Hfr = names specifies heteroscedastic function in RPL, HEV and covariance
heterogeneity form of nested logit model.
; Hfe = names specifies heteroscedasticity for ECM.
Nested Logit Model
; Tree = spec specifies tree structure in nested logit model.
; GNL specifies generalized nested logit model. (Command is also GNLOGIT.)
; RU1 specifies parameterization of second and third levels of the tree.
; RU2 specifies parameterization of second and third levels of the tree.
; RU3 specifies parameterization of second and third levels of the tree.
; IVSET: spec imposes constraints on inclusive value parameters.
; IVB = name keeps branch level inclusive values as a variable.
; IVL = name keeps limb level inclusive values as a variable.
; IVT = name keeps trunk level inclusive values as a variable.
; Prb = name keeps branch level probabilities as a variable.
; Cprob = name keeps conditional probabilities for alternatives.
Random Parameters Logit Model
; RPL [ = list of variables] requests mixed logit model. Optionally specifies variables to
enter means of random parameters.
; AR1 AR(1) structure for random terms in random parameters.
; Fcn: defines names and types of random parameters.
; Correlation specifies that random parameters are correlated.
; Hfr = names defines variables in heteroscedasticity. Also used by HEV and covariance
heterogeneity.
Multinomial Probit
; MNP specifies multinomial probit model. (Command is also MNPROBIT.)
; EQC = list specifies a set of choices whose pairwise correlations are all equal.
; RCR = list specifies configurations for correlations for multinomial probit model.
Also used by RPL.
; SDV = list specifies diagonal elements of covariance matrix. Also used by RPL
and HEV.
; REM specifies random effects form of the model.
N4: Data for Binary and Ordered Choice Models N-51

N4: Data for Binary and Ordered Choice

Models
N4.1 Introduction
The data arrangement needed for discrete choice modeling depends on the model you are
estimating. For the models described in Chapters N4-N15, you are fitting either cross section or
panel models, and the observations are arranged accordingly. This is needed because in this part of
the environment, you are fitting models for a single choice, and you need only a single observation to
record that choice. For the models in Chapters N16 and N17 and N23-N33, the basic format of your
data set will resemble a panel, even though it will usually be a cross section. This is because you are
fitting models for choice sets with multiple alternatives, with one observation (data record) for each
alternative. For panel models in the discrete choice environment, your data will consist of sets of
groups of observations. This is developed in detail in Chapter N20.

N4.2 Grouped and Individual Data for Discrete Choice

Models
There are two types of data which may be analyzed. We say that the data are individual if
the measurement of the dependent variable is physically discrete, consisting of individual responses.
The familiar case of the probit model with measured 0/1 responses is an example. The data are
grouped if the underlying model is discrete but the observed dependent variable is a proportion. In
the probit setting, this arises commonly in bioassay. A number of respondents have the same values
of the independent variables, and the observed dependent variable is the proportion of them with
individual responses equal to one. Voting proportions are a common application from political
science.
With only two exceptions, all of the discrete response models estimated by LIMDEP and
NLOGIT can be estimated with either individual or grouped data. The two exceptions are

the multivariate probit model described in Chapter N12 (and E33)

the multinomial probit model described in Chapter N27

You do not have to inform the program which type you are using. If necessary, the data are inspected
to determine which applies. The differences in estimation arise only in the way starting values are
computed and, occasionally, in the way the output should be interpreted. Cases sometimes arise in
which grouped data contain cells which are empty (proportion is zero) or full (proportion is one).
This does not affect maximum likelihood estimation and is handled internally in obtaining the
starting values. No special attention has to be paid to these cells in assembling the data set. We do
note, zero and unit proportions data are sometimes indicative of a flawed data set, and can distort
your results.
N4: Data for Binary and Ordered Choice Models N-52

N4.3 Data Used in Estimation of Binary Choice Models

The following lists the specific features of the data needed to enable estimation of binary
choice models. Certain features of the data that are inconsequential or irrelevant in linear regression
modeling can impede estimation of a discrete choice model.

N4.3.1 The Dependent Variable

Data on the dependent variable for binary choice models may be individual or grouped. The
estimation program will check internally, and adjust accordingly where necessary. The log
likelihood function computed takes the same form for either case. The only special consideration
concerns the computation of the starting values for the iterations. If you do not provide your own
starting values, they are determined for the individual data case by simple least squares. The OLS
estimator is not useful in itself, but it does help to adjust the scale of the coefficient vector for the
first iteration. For the grouped data case, however, the initial values are determined by the minimum
chi squared, weighted least squares computation. Since this will generally involve logarithms or
other transformations which become noncomputable at zero or one, they are not computed for
individual data.

N4.3.2 Problems with the Independent Variables

There is a special consideration for the independent variables in a binary choice model. If a
variable xk is such that the range of xk can be divided into two parts and within the two parts, the
value of the dependent variable is always the same, then this variable becomes a perfect predictor for
the model. The estimator will break down, sometimes by iterating endlessly as the coefficient vector
drifts to extreme values. The following program illustrates the effect: The variable z is positive
when y equals one and negative when it equals zero. Notice, first, it spun for 100 iterations, which
is almost certainly problematic. A probit model should take less than 10 iterations. Second, note that
the log likelihood function is essentially zero, indicative of a perfect fit. Finally, note that the
coefficients are nonsensical, and the standard errors are essentially infinite. All are indicators of a
bad data set and/or model. The extreme (perfect) values for the fit measures on the next page
underscore the point. Finally, note the prediction table shows that the model predicts the dependent
variable perfectly.

SAMPLE ; 1-100 $
CALC ; Ran(12345) $
CREATE ; x = Rnn(0,1) ; d = Rnu(0,1) > .5 $
CREATE ; y = (-.5 + x + d + Rnn(0,1)) > 0 $
CREATE ; If(y = 1)z = Rnu(0,1)
; If(y = 0)z = -Rnu(0,1) $
PROBIT ; Lhs = y
; Rhs = one,x,z
; Output = 4 $
N4: Data for Binary and Ordered Choice Models N-53

Maximum of 100 iterations. Exit iterations with status=1.

-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable Y
Log likelihood function .00000
Restricted log likelihood -69.13461
Chi squared [ 2 d.f.] 138.26922
Significance level .00000
McFadden Pseudo R-squared 1.0000000
Estimation based on N = 100, K = 3
Inf.Cr.AIC = 6.0 AIC/N = .060
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
Constant| -.98505 148462.2 .00 1.0000 *********** ***********
X| .14766 120032.6 .00 1.0000 *********** ***********
Z| 144.424 345728.4 .00 .9997 -677470.698 677759.546
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

+---------------------------------------------------------+
|Predictions for Binary Choice Model. Predicted value is |
|1 when probability is greater than .500000, 0 otherwise.|
|Note, column or row total percentages may not sum to |
|100% because of rounding. Percentages are of full sample.|
+------+---------------------------------+----------------+
|Actual| Predicted Value | |
|Value | 0 1 | Total Actual |
+------+----------------+----------------+----------------+
| 0 | 53 ( 53.0%)| 0 ( .0%)| 53 ( 53.0%)|
| 1 | 0 ( .0%)| 47 ( 47.0%)| 47 ( 47.0%)|
+------+----------------+----------------+----------------+
|Total | 53 ( 53.0%)| 47 ( 47.0%)| 100 (100.0%)|
+------+----------------+----------------+----------------+
+---------------------------------------------------------+
|Crosstab for Binary Choice Model. Predicted probability |
|vs. actual outcome. Entry = Sum[Y(i,j)*Prob(i,m)] 0,1. |
|Note, column or row total percentages may not sum to |
|100% because of rounding. Percentages are of full sample.|
+------+---------------------------------+----------------+
|Actual| Predicted Probability | |
|Value | Prob(y=0) Prob(y=1) | Total Actual |
+------+----------------+----------------+----------------+
| y=0 | 52 ( 52.0%)| 0 ( .0%)| 53 ( 52.0%)|
| y=1 | 0 ( .0%)| 46 ( 46.0%)| 47 ( 46.0%)|
+------+----------------+----------------+----------------+
|Total | 53 ( 52.0%)| 46 ( 46.0%)| 100 ( 98.0%)|
+------+----------------+----------------+----------------+
-----------------------------------------------------------------------
Analysis of Binary Choice Model Predictions Based on Threshold = .5000
-----------------------------------------------------------------------
Prediction Success
-----------------------------------------------------------------------
Sensitivity = actual 1s correctly predicted 97.872%
Specificity = actual 0s correctly predicted 98.113%
Positive predictive value = predicted 1s that were actual 1s 100.000%
Negative predictive value = predicted 0s that were actual 0s 98.113%
Correct prediction = actual 1s and 0s correctly predicted 98.000%
-----------------------------------------------------------------------
Prediction Failure
-----------------------------------------------------------------------
False pos. for true neg. = actual 0s predicted as 1s .000%
False neg. for true pos. = actual 1s predicted as 0s .000%
False pos. for predicted pos. = predicted 1s actual 0s .000%
False neg. for predicted neg. = predicted 0s actual 1s .000%
False predictions = actual 1s and 0s incorrectly predicted .000%
-----------------------------------------------------------------------

In general, for every Rhs variable, x, the minimum x for which y is one must be less than the
maximum x for which y is zero, and the minimum x for which y is zero must be less than the maximum
x for which y is one. If either condition fails, the estimator will break down. This is a more subtle, and
sometimes less obvious failure of the estimator. Unfortunately, it does not lead to a singularity and the
eventual appearance of collinearity in the Hessian. You might observe what appears to be convergence
of the estimator on a set of parameter estimates and standard errors which might look reasonable. The
main indication of this condition would be an excessive number of iterations the probit model will
usually reach convergence in only a handful of iterations and a suspiciously large standard error is
reported for the coefficient on the offending variable, as in the preceding example.
N4: Data for Binary and Ordered Choice Models N-55

You can check for this condition with the command:

CALC ; Chk (names of independent variables to check,

name of dependent variable) $

The offending variable in the previous example would be tagged by this check;

CALC ; Chk(z,y) $
Error 462: 0/1 choice model is inestimable. Bad variable = Z
Error 463: Its values predict 1[Y = 1] perfectly.
Error 116: CALC - Unable to compute result. Check earlier message.

This computation will issue warnings when the condition is found in any of the variables listed.
(Some computer programs will check for this condition automatically, and drop the offending
variable from the model. In keeping with LIMDEPs general approach to modeling, this program
does not automatically make functional form decisions. The software does not accept the job of
determining the appropriate set of variables to include in the equation. This is up to the analyst.)

N4.3.3 Dummy Variables with Empty Cells

A problem similar to the one noted above arises when your model includes a dummy
variable that has no observations equal to one in one of the two cells of the dependent variable, or
vice versa. An example appears in Greene (1993, p. 673) in which the Lhs variable is always zero
when the variable Southwest is zero. Professor Terry Seaks has used this example to examine a
number of econometrics programs. He found that no program which did not specifically check for
the failure only one did could detect the failure in some other way. All iterated to apparent
convergence, though with very different estimates of this coefficient and differing numbers of
iterations because of their use of different convergence rules. This form of incomplete matching of
values likewise prevents estimation, though the effect is likely to be more subtle. In this case, a
likely outcome is that the iterations will fail to converge, though the parameter estimates will not
necessarily become extreme.
Here is an example of this effect at work. The probit model looks excellent in the full
sample. In the restricted sample, d never equals zero when y equals zero. The estimator appears to
have converged, the derivatives are zero, but the standard errors are huge:

SAMPLE ; 1-100 $
CALC ; Ran(12345) $
CREATE ; x = Rnn(0,1)
; d = Rnu(0,1) > .5 $
CREATE ; y = (-.5 + x + d + Rnn(0,1)) > 0 $
PROBIT ; Lhs = y
; Rhs = one,x,d $

In this subset of data, d is always one when y equals zero.

REJECT ;y=0&d=0$
PROBIT ; Lhs = y
; Rhs = one,x,d $
N4: Data for Binary and Ordered Choice Models N-56

Nonlinear Estimation of Model Parameters

Method=NEWTON; Maximum iterations=100
1st derivs. .35811D+02 -.19962D+02 .12369D+01
Itr 1 F= .6981D+02 gtHg= .6608D+01 chg.F= .6981D+02 max|db|= .9613D+01
1st derivs. .49044D+01 -.74989D+01 -.29693D+00
Itr 2 F= .4521D+02 gtHg= .2003D+01 chg.F= .2460D+02 max|db|= .5302D+00
...
Itr 5 F= .4282D+02 gtHg= .1305D-03 chg.F= .4625D-03 max|db|= .2534D-04
1st derivs. -.10201D-08 -.76739D-08 -.32583D-08
Itr 6 F= .4282D+02 gtHg= .2445D-08 chg.F= .8516D-08 max|db|= .4705D-09
* Converged
Normal exit from iterations. Exit status=0.
Function= .69808104286D+02, at entry, .42822158396D+02 at exit

+---------------------------------------------+
| Binomial Probit Model |
| Dependent variable Y |
| Number of observations 100 |
| Iterations completed 6 |
| Log likelihood function -42.82216 |
+---------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
---------+Index function for probability
Constant| -.93917517 .23373657 -4.018 .0001
X | 1.17177061 .24254318 4.831 .0000 .10291147
D | 1.53191876 .35304007 4.339 .0000 .45000000

The second model required 24 iterations to converge, and produced these results: The apparent
convergence is deceptive, as evidenced by the standard errors.

Nonlinear Estimation of Model Parameters

Method=NEWTON; Maximum iterations=100
Itr 21 F= .1660D+02 gtHg= .3006D-04 chg.F= .1614D-08 max|db|= .2668D-01
1st derivs. -.19854D-08 .10979D-08 -.28588D-14
Parameters: .70037D+01 .14126D+01 -.63569D+01
Itr 22 F= .1660D+02 gtHg= .1787D-04 chg.F= .5692D-09 max|db|= .2530D-01
1st derivs. -.72119D-09 .39979D-09 .11824D-13
Parameters: .71645D+01 .14126D+01 -.65178D+01
Itr 23 F= .1660D+02 gtHg= .1064D-04 chg.F= .2012D-09 max|db|= .2406D-01
1st derivs. -.26221D-09 .14554D-09 -.35527D-14
Parameters: .73213D+01 .14126D+01 -.66746D+01
Itr 24 F= .1660D+02 gtHg= .6336D-05 chg.F= .7126D-10 max|db|= .2294D-01
* Converged
Normal exit: 24 iterations. Status=0, F= 16.60262
Function= .26413087151D+02, at entry, .16602624379D+02 at exit
N4: Data for Binary and Ordered Choice Models N-57

-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable Y
Log likelihood function -16.60262
Restricted log likelihood -32.85957
Chi squared [ 2 d.f.] 32.51388
Significance level .00000
McFadden Pseudo R-squared .4947400
Estimation based on N = 61, K = 3
Inf.Cr.AIC = 39.2 AIC/N = .643
Hosmer-Lemeshow chi-squared = 4.91910
P-value= .08547 with deg.fr. = 2
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
Constant| 7.32134 24162.78 .00 .9998 *********** 47365.49187
X| 1.41264*** .39338 3.59 .0003 .64163 2.18365
D| -6.67459 24162.78 .00 .9998 *********** 47351.49594
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

You can check for this condition if you suspect it is present by using a crosstab. The command is

CROSSTAB ; Lhs = dependent variable

; Rhs = independent dummy variable $

The 22 table produced should contain four nonempty cells. If any cells contain zeros, as in the
table below, then the model will be inestimable.

+-----------------------------------------------------------------+
|Cross Tabulation |
|Row variable is Y (Out of range 0-49: 0) |
|Number of Rows = 2 (Y = 0 to 1) |
|Col variable is D (Out of range 0-49: 0) |
|Number of Cols = 2 (D = 0 to 1) |
|Chi-squared independence tests: |
|Chi-squared[ 1] = 6.46052 Prob value = .01103 |
|G-squared [ 1] = 9.92032 Prob value = .00163 |
+-----------------------------------------------------------------+
| D |
+--------+--------------+------+ |
| Y| 0 1| Total| |
+--------+--------------+------+ |
| 0| 0 14| 14| |
| 1| 16 31| 47| |
+--------+--------------+------+ |
| Total| 16 45| 61| |
+-----------------------------------------------------------------+
N4: Data for Binary and Ordered Choice Models N-58

N4.3.4 Missing Values

Missing values in the current sample will always impede estimation. In the case of the
binary choice models, if your sample contains missing observations for the dependent variable, you
will receive a warning about improper coding of the values of the Lhs variable. This message will be
given whenever values of the dependent variable appear to be neither binary (0/1) or a proportion,
strictly between 0 and 1.

Probit: Data on Y are badly coded. (<0,1> and <=0 or >= 1).

Missing values for the independent variables will also badly distort the estimates. Since the
program assumes you will be deciding what observations to use for estimation, and -999 (the missing
value code) is a valid value, missing values on the right hand side of your model are not flagged as
an error. You will generally be able to see their presence in the model results. The sample means
for variables which contain missing values will usually look peculiar. In the small example below,
x2 is a dummy variable. Both coefficients are one, which should be apparent in a sample of 1,000.
The results, which otherwise look quite normal, suggest that missing values are being used as data in
the estimation. With SKIP, the results, based on the complete data, look much more reasonable.

CALC ; Ran(12345) $
SAMPLE ; 1-1000 $
CREATE ; x1 = Rnn(0,1)
; x2 = (Rnu(0,1) > .5) $
CREATE ; y = (-.5 + x1 +x2+rnn(0,1)) > 0 $
CREATE ; If(_obsno > 900)x2 = -999 $
PROBIT ; Lhs = y
; Rhs = one,x1,x2 $
SKIP $
PROBIT ; Lhs = y
; Rhs = one,x1,x2 $

Normal exit: 5 iterations. Status=0, F= 549.5785

-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable Y
Log likelihood function -549.57851
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
Constant| -.08623* .04601 -1.87 .0609 -.17640 .00394
X1| .81668*** .05541 14.74 .0000 .70807 .92529
X2| .00029* .00015 1.95 .0517 .00000 .00058
--------+--------------------------------------------------------------------
N4: Data for Binary and Ordered Choice Models N-59

-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable Y
Log likelihood function -441.38989
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
Constant| -.57123*** .07004 -8.16 .0000 -.70850 -.43396
X1| .97268*** .06611 14.71 .0000 .84310 1.10225
X2| .98082*** .10134 9.68 .0000 .78219 1.17945
--------+--------------------------------------------------------------------

You should use either SKIP or REJECT to remove the missing data from the sample. (See
Chapter R7 for details on skipping observations with missing values.)

N4.4 Bivariate Binary Choice

The bivariate probit model can be fit with either grouped data (you provide four proportions
variables) or individual data (you provide two binary variables). In either case, the data must contain
observations in both off diagonal cells. If your binary data are such that either the (y1=0,y2=1) or the
(y1=1,y2=0) have no observations, then the correlation coefficient cannot be estimated, and the
estimator will iterate endlessly, eventually converging to a value of -1 or +1 for . Note that this
does not apply to the bivariate probit with selection, but that is a different model. For the grouped
data case, if one of the proportions variables is always zero, the same problem will arise.

N4.5 Ordered Choice Model Structure and Data

Data for the ordered choice models must obey essentially the same rules as those for binary
choice models. Data may be grouped or individual. (Survey data might logically come in grouped
form.) If you provide individual data, the dependent variable is coded 0, 1, 2, ..., J. There must be at
least three values. Otherwise, the binary probit model applies. If the data are grouped, a full set of
proportions, p0, p1, ..., pJ, which sum to one at every observation must be provided. In the individual
data case, the data are examined to determine the value of J, which will be the largest observed value
of y that appears in the sample. In the grouped data case, J is one less than the number of Lhs
variables you provide. There are two additional considerations for ordered choice modeling.

N4.5.1 Empty Cells

If you are using individual data, the Lhs variable must be coded 0,1,...,J. All the values must
be present in the data. NLOGIT will look for empty cells. If there are any, the estimation is halted.
(If the value j is not represented in the data, then the threshold parameter, j cannot be estimated.
In this case, you will receive a diagnostic such as
ORDE, Panel, BIVA PROBIT: A cell has (almost) no observations.
Empty cell: Y never takes the value 2.

This diagnostic means exactly what it says. The ordered probability model cannot be estimated
unless all cells are represented in the data
N4: Data for Binary and Ordered Choice Models N-60

N4.5.2 Coding the Dependent Variable

Users frequently overlook the coding requirement, y = 0,1,... If you have a dependent
variable that is coded 1, 2,..., you will see the following diagnostic

Models - Insufficient variation in dependent variable

The reason this particular diagnostic shows up is that NLOGIT creates a new variable from your
dependent variable, say y, which equals zero when y equals zero and one when y is greater than zero.
It then tries to obtain starting values for the model by fitting a regression model to this new variable.
If you have miscoded the Lhs variable, the transformed variable always equals one, which explains
the diagnostic. In fact, there is no variation in the transformed dependent variable. If this is the case,
you can simply use CREATE to subtract 1.0 from your dependent variable to use this estimator.

N4.6 Constant Terms

In general, discrete choice models should contain constant terms. Omitting the constant term
is analogous to leaving the constant term out of a linear regression. This imposes a restriction that
rarely makes sense.
The ordered probit model must include a constant term, one, as the first Rhs variable. Since
the equation does include a constant term, one of the s is not identified. We normalize 0 to zero.
(Consider the special case of the binary probit model with something other than zero as its threshold
value. If it contains a constant, this cannot be estimated.) Other programs sometimes use different
normalizations of the model. For example, if the constant term is forced to equal zero, then one will
instead, have a nonzero threshold parameter, 0, which equals zero in the presence of a nonzero
constant term.
In the more general multinomial choice models, when choices are unlabelled, there may be
no case for including alternative specific constants (ASCs) in the model, since they are not actually
associated with a particular choice. On the other hand, ASCs in a model with unlabelled choices
might simply imply that after controlling for the effects of the attributes, the indicated alternative is
chosen more or less frequently than the base alternative. It is possible that this might occur because
the alternative is close to the reference alternative or that culturally, those undertaking the
experiment might tend to read left to right. Failure to include ASCs in the model would in this case
correlate the alternative order effect into the other estimated parameters, possibly distorting the
model results.
N5: Models for Binary Choice N-61

N5: Models for Binary Choice

N5.1 Introduction
We define models in which the response variable being described is inherently discrete as
qualitative response (QR) models. This and the next several chapters will describe NLOGITs
qualitative dependent variable model estimators. The simplest of these are the binomial choice
models, which are the subject of this chapter and Chapters E27-E29. This will be followed by the
progressively more intricate formulations such as bivariate and multivariate probit, multinomial logit
and ordered choice models. NLOGIT supports a large variety of models and extensions for the
analysis of binary choice. The parametric model formulations, probit, logit, extreme value
(complementary log log) etc. are treated in detail in Chapter E27. We will focus on the first two of
these here.

N5.2 Modeling Binary Choices

A binomial response may be the outcome of a decision or the response to a question in a
survey. Consider, for example, survey data which indicate political party choice, mode of
transportation, occupation, or choice of location. We model these in terms of probability
distributions defined over the set of outcomes. There are a number of interpretations of an
underlying data generating process that produce the binary choice models we consider here. All of
them are consistent with the models that NLOGIT estimates, but the exact interpretation is a function
of the modeling framework.

N5.2.1 Underlying Processes

Consider a process with two possible outcomes indicated by a dependent variable, y, labeled
for convenience, y = 0 and y = 1. We assume, as well, that there is a set of measurable covariates, x,
which will be used to help explain the occurrence of one outcome or the other. Most models of
binary choice set up in this fashion will be based upon an index function, x, where is a vector of
parameters to be estimated. The modeling of discrete, binary choice in these terms, is typically done
in one of the following frameworks:

Random Utility Approach

The respondent derives utility

U0 = 0x + 0 from choice 0, and U1 = 1x + 1 from choice 1,

in which 0 and 1 are the individual specific, random components of the individuals utility that are
unaccounted for by the measured covariates, x. The choice of alternative 1 reveals that U1 > U0, or that

0 - 1 < 0x - 1x.
N5: Models for Binary Choice N-62

Let = 0 - 1 and let x represent the difference on the right hand side of the inequality x is the
union of the two sets of covariates, and is constructed from the two parameter vectors with zeros in
the appropriate locations if necessary. Then, the binary choice model applies to the probability that
x, which is the familiar sort of model shown in the next paragraph. This is a convenient way to
view migration behavior and survey responses to questions about economic issues.

Latent Regression Approach

A latent regression is specified as

y* = x + .
The observed counterpart to y* is
y = 1 if and only if y* > 0.
This is the basis for most of the binary choice models in econometrics, and is described in further
detail below. It is the same model as the reduced form in the previous paragraph. Threshold models,
such as labor supply and reservation wages lend themselves to this approach.

Conditional Mean Function Approach

We assume that y is a binary variable, taking values 0 and 1, and formulate a priori that
Prob[y=1] = F(x), where F is any function of the index that satisfies the axioms of probability,

0 < F(x) < 1

F (x) > 0,

limz- F(z) = 0, limz+ F(z) = 1.

It follows that,
F(x) = 0 Prob[y = 0 | x] + 1 Prob[y = 1 | x]

is the conditional mean function for the observed binary y. This may be treated as a nonlinear
regression or as a binary choice model amenable to maximum likelihood estimation. This is a useful
departure point for less parametric approaches to binary choice modeling.

N5.2.2 Modeling Approaches

NLOGIT provides estimators for three approaches to formulating the binary choice models
described above:

Parametric Models Probit, Logit, Extreme etc.

Most of the material below (and the received literature) focuses on models in which the full
functional form, including the probability distribution, are defined a priori. Thus, the probit model
which forms the basis of most of the results in econometrics, is based on a latent regression model in
which the disturbances are assumed to have a normal distribution. The logit model, in contrast, can be
construed as a random utility model in which it is assumed that the random parts of the utility functions
are distributed as independent extreme value.
N5: Models for Binary Choice N-63

Semiparametric Models Maximum Score, Semiparametric Analysis

A semiparametric approach to modeling the binary choice steps back one level from the
previous model in that the specific distributional assumption is dropped, while the covariation (index
function) nature of the model is retained. Thus, the semiparametric approach analyzes the common
characteristics of the observed data which would arise regardless of the specific distribution
assumed. Thus, the semiparametric approach is essentially the conditional mean framework without
the specific distribution assumed. For the models that are supported in NLOGIT, MSCORE and
Klein and Spadys framework, it is assumed only that F(x) exists and is a smooth continuous
function of its argument which satisfies the axioms of probability. The semiparametric approach is
more general (and more robust) than the parametric approach, but it provides the analyst far less
flexibility in terms of the types of analysis of the data that may be performed. In a general sense, the
gain to formulating the parametric model is the additional precision with which statements about the
data generating process may be made. Hypothesis tests, model extensions, and analysis of, e.g.,
interactions such as marginal effects, are difficult or impossible in semiparametric settings.

Nonparametric Analysis NPREG

The nonparametric approach, as its name suggests, drops the formal modeling framework. It
is largely a bivariate modeling approach in which little more is assumed than that the probability that
y equals one depends on some x. (It can be extended to a latent regression, but this requires prior
specification and estimation, at least up to scale, of a parameter vector.) The nonparametric
approach to analysis of discrete choice is done in NLOGIT with a kernel density (largely based on
the computation of histograms) and with graphs of the implied relationship. Nonparametric analysis
is, by construction, the most general and robust of the techniques we consider, but, as a consequence,
the least precise. The statements that can be made about the underlying DGP in the nonparametric
framework are, of necessity, very broad, and usually provide little more than a crude overall
characterization of the relationship between a y and an x.

N5.2.3 The Linear Probability Model

One approach to modeling binary choice has been to ignore the special nature of the
dependent variable, and use conventional least squares. The resulting model,

Prob[yi = 1] = xi + i

has been called the linear probability model (LPM). The LPM is known to have several problems,
most importantly that the model cannot be made to satisfy the axioms of probably independently of
the particular data set in use. Some authors have documented approaches to forcing the LPM on the
data, e.g., Fomby, et al., (1984), Long (1997) and Angrist and Pischke (2009). These computations
can easily be done with the other parts of NLOGIT, but will not be pursued here.
N5: Models for Binary Choice N-64

N5.3 Grouped and Individual Data for Binary Choice Models

There are two types of data which may be analyzed. We say that the data are individual if the
measurement of the dependent variable is physically discrete, consisting of individual responses. The
familiar case of the probit model with measured 0/1 responses is an example. The data are grouped if
the underlying model is discrete but the observed dependent variable is a proportion. In the probit
setting, this arises commonly in bioassay. A number of respondents have the same values of the
independent variables, and the observed dependent variable is the proportion of them with individual
responses equal to one. Voting proportions are a common application from political science.
All of the qualitative response models estimated by NLOGIT can be estimated with either
individual or grouped data. You do not have to inform the program which type you are using; if
necessary, the data are inspected to determine which applies. The differences arise only in the way
starting values are computed and, occasionally, in the way the output should be interpreted. Cases
sometimes arise in which grouped data contain cells which are empty (proportion is zero) or full
(proportion is one). This does not affect maximum likelihood estimation and is handled internally in
obtaining the starting values. No special attention has to be paid to these cells in assembling the data set.

N5.4 Variance Normalization

In the latent regression formulation of the model, the observed data are generated by the
underlying process
y = 1 if and only if x + > 0.

The random variable, , is assumed to have a zero mean (which is a simple normalization if the
model contains a constant term). The variance is left unspecified. The data contain no information
about the variance of . Let denote the standard deviation of . The same model and data arise if
the model is written as

y = 1 if and only if (/)x + / > 0.

which is equivalent to
y = 1 if and only if x + w > 0.

where the variance of w equals one. Since only the sign of y is observed, no information about
overall scaling is contained in the data. Therefore, the parameter is not estimable; it is assumed
with no loss of generality to equal one. (In some treatments (Horowitz (1993)), the constant term in
is assumed to equal one, instead, in which case, the constant in the model is an estimator of 1/.
This is simply an alternative normalization of the parameter vector, not a substantive change in the
model.)
N5: Models for Binary Choice N-65

N5.5 The Constant Term in Index Function Models

A question that sometimes arises is whether the binary choice model should contain a
constant term. The answer is yes, unless the underlying structure of your model specifically dictates
that none be included. There are a number of useful features of the parametric models that will be
subverted if you do not include a constant term in your model:

Familiar fit measures will be distorted. Indeed, omitting the constant term can seriously
degrade the fit of a model, and will never improve it.

Certain useful test statistics, such as the overall test for the joint significance of the
coefficients, may be rendered noncomputable if you omit the constant term.

Some properties of the binary choice models, such as their ability to reproduce the average
outcome (sample proportion) will be lost.

Forcing the constant term to be zero is a linear restriction on the coefficient vector. Like any other
linear restriction, if imposed improperly, it will induce biases in the remaining coefficients.
(Orthogonality with the other independent variables is not a salvation here. Thus, putting variables
in mean deviation form does not remove the constant term from the model as it would in the linear
regression case.)
N6: Probit and Logit Models: Estimation N-66

N6: Probit and Logit Models: Estimation

N6.1 Introduction
We define models in which the response variable being described is inherently discrete as
qualitative response (QR) models. This and the next several chapters will describe two of NLOGITs
qualitative dependent variable model estimators, the probit and logit models. More extensive
treatment and technical background are given in Chapters E27-29. Several model extensions such as
models with endogenous variables, and sample selection, are treated in Chapter E29. Panel data
models for binary choice appear in Chapters E30 and E31. Semi- and nonparametric models are
documented in Chapter E32.

N6.2 Probit and Logit Models for Binary Choice

These parametric model formulations are provided as internal procedures in NLOGIT for
binary choice models. The probabilities and density functions are as follows:

Probit

'x i exp(t 2 / 2)
F= 2
dt = (xi), f = (xi)

Logit

exp(xi )
F= = (xi), f = (xi)[1 - (xi)]
1 + exp(xi )

N6.3 Commands
The basic model commands for the two binary choice models of interest here are:

PROBIT ; Lhs = dependent variable

or BLOGIT ; Rhs = regressors $

Data on the dependent variable may be either individual or proportions for both cases. When the
dependent variable is binary, 0 or 1, the model command may be LOGIT the program will inspect
the data and make the appropriate adjustments for estimation of the model.
N6: Probit and Logit Models: Estimation N-67

N6.4 Output
The binary choice models can produce a very large amount of optional output.
Computation begins with some type of least squares estimation in order to obtain starting values.
With ungrouped data, we simply use OLS of the binary variable on the regressors. If requested, the
usual regression results are given, including diagnostic statistics, e.g., sum of squared residuals, and
the coefficient estimates. The OLS estimates based on individual data are known to be inconsistent.
They will be visibly different from the final maximum likelihood estimates. For the grouped data
case, the estimates are GLS, minimum chi squared estimates, which are consistent and efficient. Full
GLS results will be shown for this case.

NOTE: The OLS results will not normally be displayed in the output. To request the display, use
; OLS in any of the model commands.

N6.4.1 Reported Estimates

Final estimates include:

logL = the log likelihood function at the maximum,

logL0 = the log likelihood function assuming all slopes are zero. If your Rhs variables do
not include one, this statistic will be meaningless. It is computed as

logL0 = n[PlogP + (1-P)log(1-P)]

where P is the sample proportion of ones.

McFaddens pseudo R2 - 1 - logL/logL0.

The chi squared statistic for testing H0: = 0 (not including the constant) and the
significance level = probability that 2 exceeds test value. The statistic is

2 = 2(logL - logL0).

Akaikes information criterion, -2(logL - K) and the normalized AIC, = -2(logL - K)/n.

The sample and model sizes, n and K.

Hosmer and Lemeshows fit statistic and associated chi squared and p value. (The Hosmer
and Lemeshow statistic is documented in Section E27.8.)

The standard statistical results, including coefficient estimates, standard errors, t ratios, p
values and confidence intervals appear next. A complete listing is given below with an example.
After the coefficient estimates are given, two additional sets of results can be requested, an analysis
of the model fit and an analysis of the model predictions.
N6: Probit and Logit Models: Estimation N-68

We will illustrate with binary logit and probit estimates of a model for visits to the doctor
using the German health care data described in Chapter E2. The first model command is

LOGIT ; Lhs = doctor

; Rhs = one,age,hhninc,hhkids,educ,married
; OLS ; Summary
; Output = IC $ (Display all variants of information criteria)
Note that the command requests the optional listing of the OLS starting values and the additional fit
and diagnostic results. The results for this command are as follows. With the exception of the table
noted below, the same results (with different values, of course) will appear for all five parametric
models. Some additional optional computations and results will be discussed later.
-----------------------------------------------------------------------------
Binomial Logit Model for Binary Choice
There are 2 outcomes for LHS variable DOCTOR
These are the OLS estimates based on the
binary variables for each outcome Y(i)=j.
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| .63280*** .05584 11.33 .0000 .52335 .74224
AGE| .00387*** .00082 4.73 .0000 .00226 .00547
HHNINC| -.08338** .03967 -2.10 .0356 -.16114 -.00563
HHKIDS| -.08456*** .01943 -4.35 .0000 -.12264 -.04647
EDUC| -.00804** .00355 -2.27 .0234 -.01500 -.00109
MARRIED| .03209 .02131 1.51 .1321 -.00968 .07387
--------+--------------------------------------------------------------------
Binary Logit Model for Binary Choice
Dependent variable DOCTOR
Log likelihood function -2121.43961
Restricted log likelihood -2169.26982
Chi squared [ 5 d.f.] 95.66041
Significance level .00000
McFadden Pseudo R-squared .0220490
Estimation based on N = 3377, K = 6
Inf.Cr.AIC = 4254.879 AIC/N = 1.260
FinSmplAIC = 4254.904 FIC/N = 1.260
Bayes IC = 4291.628 BIC/N = 1.271
HannanQuinn = 4268.018 HIC/N = 1.264
Hosmer-Lemeshow chi-squared = 17.65094
P-value= .02400 with deg.fr. = 8
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| .52240** .24887 2.10 .0358 .03463 1.01018
AGE| .01834*** .00378 4.85 .0000 .01092 .02575
HHNINC| -.38750** .17760 -2.18 .0291 -.73559 -.03941
HHKIDS| -.38161*** .08735 -4.37 .0000 -.55282 -.21040
EDUC| -.03581** .01576 -2.27 .0230 -.06669 -.00493
MARRIED| .14709 .09727 1.51 .1305 -.04357 .33774
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
N6: Probit and Logit Models: Estimation N-69

N6.4.2 Fit Measures

The model results are followed by a cross tabulation of the correct and incorrect predictions
of the model using the rule

y = 1 if F( xi ) > .5, and 0 otherwise.

For the models with symmetric distributions, probit and logit, the average predicted probability will
equal the sample proportion. If you have a quite unbalanced sample high or low proportion of ones
the rule above is likely to result in only one value, zero or one, being predicted for the Lhs variable.
You can choose a threshold different from .5 by using

; Limit = the value you wish

in your command. There is no direct counterpart to an R2 in regression. Authors very commonly

report the
log L(model)
Pseudo R2 = 1 .
log L(constants only)

We emphasize, this is not a proportion of variation explained. Moreover, as a fit measure, it has some
peculiar features. Note, for our example above, it is 1 - (-17673.10)/(-18019.55) = 0.01923, yet with
the standard prediction rule, the estimated model predicts almost 63% of the outcomes correctly.

The next set of results examines the success of the prediction rule

Predict yi = 1 if Pi > P* and 0 otherwise

where P* is a defined threshold probability. The default value of P* is 0.5, which makes the
prediction rule equivalent to Predict yi = 1 if the model says the predicted event yi = 1 | xi is more
likely than the complement, yi = 0 | xi. You can change the threshold from 0.5 to some other value
with
; Limit = your P*

This table computes a variety of conditional and marginal proportions based on the results using the
defined prediction rule. For examples, the 66.697% equals (1482/2222)100% while the 66.727% is
(1482/2221)100%.

-----------------------------------------------------------------------
Analysis of Binary Choice Model Predictions Based on Threshold = .5000
-----------------------------------------------------------------------
Prediction Success
-----------------------------------------------------------------------
Sensitivity = actual 1s correctly predicted 66.697%
Specificity = actual 0s correctly predicted 35.931%
Positive predictive value = predicted 1s that were actual 1s 66.727%
Negative predictive value = predicted 0s that were actual 0s 35.931%
Correct prediction = actual 1s and 0s correctly predicted 56.174%
-----------------------------------------------------------------------
N6: Probit and Logit Models: Estimation N-71

-----------------------------------------------------------------------
Prediction Failure
-----------------------------------------------------------------------
False pos. for true neg. = actual 0s predicted as 1s 63.983%
False neg. for true pos. = actual 1s predicted as 0s 33.258%
False pos. for predicted pos. = predicted 1s actual 0s 33.273%
False neg. for predicted neg. = predicted 0s actual 1s 63.983%
False predictions = actual 1s and 0s incorrectly predicted 43.767%
-----------------------------------------------------------------------

N6.4.3 Covariance Matrix

The estimated asymptotic covariance matrix of the coefficient estimator is not automatically
displayed it might be huge. You can request a display with

; Covariance

If the matrix is not larger than 55, it will be displayed in full. If it is larger, an embedded object that
holds the matrix will show, instead. By double clicking the object, you can display the matrix in a
window. An example appears in Figure N6.1 below.

Figure N6.1 Embedded Matrix

N6: Probit and Logit Models: Estimation N-72

N6.4.4 Retained Results and Generalized Residuals

The results saved by the binary choice models are:

Matrices: b = estimate of (also contains for the Burr model)

varb = asymptotic covariance matrix

Scalars: kreg = number of variables in Rhs

nreg = number of observations
logl = log likelihood function

Variables: logl_obs = individual contribution to log likelihood

score_fn = generalized residual. See Section E27.9.

Last Model: b_variables

Last Function: Prob(y = 1 | x) = F(bx). This varies with the model specification.

Models that are estimated using maximum likelihood automatically create a variable named
logl_obs, that contains the contribution of each individual observation to the log likelihood for the
sample. Since the log likelihood is the sum of these terms, you could, in principle, recover the
overall log likelihood after estimation with

CALC ; List ; Sum(logl_obs) $

The variable can be used for certain hypothesis tests, such as the Vuong test for nonnested models.
The following is an example (albeit, one that appears to have no real power) that applies the Vuong
test to discern whether the logit or probit is a preferable model for a set of data:

LOGIT ;$
CREATE ; lilogit = logl_obs $
PROBIT ;$
CREATE ; liprobit = logl_obs ; di = liprobit - lilogit $
CALC ; List ; vtest = Sqr(n) * Xbr(di) / Sdv(di) $

The generalized residuals in a parametric binary choice model are the derivatives of the log
likelihood with respect to the constant term in the model. These are sometimes used to check the
specification of the model (see Chesher and Irish (1987)). These are easy to compute for the models
listed above in each case, the generalized residual is the derivative of the log of the probability with
respect to x. This is computed internally as part of the iterations, and kept automatically in your
data area in a variable named score_fn. The formulas for the generalized residuals are provided in
Section E27.12 with the technical details for the models. For example, you can verify the
convergence of the estimator to a maximum of the log likelihood with the instruction

CALC ; List ; Sum(score_fn) $

N6: Probit and Logit Models: Estimation N-73

N6.5 Robust Covariance Matrix Estimation

The preceding describes a covariance estimator that accounts for a specific, observed aspect
of the data. The concept of the robust covariance matrix is that it is meant to account for
hypothetical, unobserved failures of the model assumptions. The intent is to produce an asymptotic
covariance matrix that is appropriate even if some of the assumptions of the model are not met. (It is
an important, but infrequently discussed issue whether the estimator, itself, remains consistent in the
presence of these model failures that is, whether the so called robust covariance matrix estimator is
being computed for an inconsistent estimator.) (Chapter R10 provides general discussion of robust
covariance matrix estimation.)

N6.5.1 The Sandwich Estimator

A robust covariance matrix estimator adjusts the estimated asymptotic covariance matrix for
possible misspecification in the model which leaves the MLE consistent but the estimated asymptotic
covariance matrix incorrectly computed. One example would be a binary choice model with
unspecified latent heterogeneity. A frequent adjustment for this case is the sandwich estimator,
which is the choice based sampling estimator suggested above with weights equal to one. (This
suggests how it could be computed.) The desired matrix is

1 1
n 2 log Fi n log Fi log Fi n 2 log Fi
Est.Asy.Var = i 1 ' i 1

= i 1= =

Three ways to obtain this matrix are

; Wts = one ; Choice based sampling

or ; Robust
or ; Cluster = 1

The computation is identical in all cases. (As noted below, the last of them will be slightly larger, as
it will be multiplied by n/(n-1).)

N6.5.2 Clustering
A related calculation is used when observations occur in groups which may be correlated.
This is rather like a panel; one might use this approach in a random effects kind of setting in which
observations have a common latent heterogeneity. The parameter estimator is unchanged in this
case, but an adjustment is made to the estimated asymptotic covariance matrix. The calculation is
done as follows: Suppose the n observations are assembled in G clusters of observations, in which
the number of observations in the ith cluster is ni. Thus,

G
i =1
ni = n.
N6: Probit and Logit Models: Estimation N-74

Let the observation specific gradients and Hessians be

log Lij
gij =

2 log Lij
Hij = .
'

The uncorrected estimator of the asymptotic covariance matrix based on the Hessian is

( )
1

G ni
VH = -H-1 = =i 1 =j 1
H ij

Estimators for some models such as the Burr model will use the BHHH estimator, instead. In
general,

( )
1
i 1=
j 1 gij gij
G ni
VB = =

Let V be the estimator chosen. Then, the corrected asymptotic covariance matrix is

Est.Asy.Var = V
G G
i 1 =
G 1
=
ni
j 1 (

g ij =
ni
)(
j 1
g ij ) V
Note that if there is exactly one observation per cluster, then this is G/(G-1) times the sandwich
estimator discussed above. Also, if you have fewer clusters than parameters, then this matrix is
singular it has rank equal to the minimum of G and K, the number of parameters.
This procedure is described in greater detail in Section E27.5.3. To request the estimator,
your command must include

; Cluster = specification

where the specification is either the fixed value if all the clusters are the same size, or the name of an
identifying variable if the clusters vary in size. Note, this is not the same as the variable in the Pds
function that is used to specify a panel. The cluster specification must be an identifying code that is
specific to the cluster. For example, our health care data used in our examples is an unbalanced
panel. The first variable is a family id, which we will use as follows

; Cluster = id

The results below demonstrate the effect of this estimator. Three sets of estimates are given. The
first are the original logit estimates that ignore the cross observation correlations. The second use the
correction for clustering. The third is a panel data estimator the random effects estimator described
in Chapter E30 that explicitly accounts for the correlation across observations. It is clear that the
different treatments change the results noticeably.
N6: Probit and Logit Models: Estimation N-75

Uncorrected covariance matrix:

Cluster corrected covariance matrix:

+---------------------------------------------------------------------+
| Covariance matrix for the model is adjusted for data clustering. |
| Sample of 27326 observations contained 7293 clusters defined by |
| variable ID which identifies by a value a cluster ID. |
+---------------------------------------------------------------------+
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| -.20205 .12997 -1.55 .1200 -.45678 .05269
AGE| .01935*** .00176 11.00 .0000 .01590 .02280
EDUC| -.02477*** .00811 -3.05 .0023 -.04067 -.00888
MARRIED| .12023*** .04556 2.64 .0083 .03093 .20953
HHNINC| -.21388** .09276 -2.31 .0211 -.39568 -.03209
HHKIDS| -.24879*** .03842 -6.48 .0000 -.32409 -.17349
FEMALE| .58305*** .03744 15.57 .0000 .50967 .65644
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Random effects estimates:

N6.5.3 Stratification and Clustering

The clustering estimator is extended to include stratum level grouping, where a stratum
includes one or more clusters, and weighting to allow finite population correction. We suppose that
there are a total of S strata in the sample. Each stratum, s, contains Cs clusters. The number of
observations in a cluster is Ncs. Neglecting the weights for the moment,

Variance estimator = VGV

V = the inverse of conventional estimator of the Hessian
G = s =1 ws G s
S

Gs = ( Cs
c =1 )
g cs gcs - C1s g s gs

g s = c =1
sC
g cs
g cs = i=1cs wics g ics
N

where gics is the derivative of the contribution to the log likelihood of individual i in cluster c in
stratum s. The remaining detail in the preceding is the weighting factor, ws. The stratum weight is
computed as
ws = fs hs d
where fs = 1 or a finite population correction, 1 - Cs/Cs* where Cs* is the true
number of clusters in stratum s, where Cs* > Cs.
hs = 1 or Cs/(Cs - 1)
d = 1 or (N-1)/(N-K) where N is the total number of observations in the
entire sample and K is the number of parameters (rows in V).
Use
; Cluster = the number of observations in a cluster (fixed) or the name of a
stratification variable which gives the cluster an identification. This
is the setup that is described above.
; Stratum = the number of observations in a stratum (fixed) or the name of a
stratification variable which gives the stratum an identification
; Wts = the name of the usual weighting variable for model estimation if
weights are desired. This defines wics.
; FPC = the name of a variable which gives the number of clusters in the
stratum. This number will be the same for all observations in a
stratum repeated for all clusters in the stratum. If this number is
the same for all strata, then just give the number.
; Huber Use this switch to request hs. If omitted, hs = 1 is used.
; DFC Use this switch to request the use of d given above. If omitted,
d = 1 is used.

Further details on this estimator may be found in Section E30.3 and Section R10.3.
N6: Probit and Logit Models: Estimation N-77

N6.6 Analysis of Partial Effects

Partial effects in a binary choice model are
E[ y | x] F (x) dF (x)
= = = F(x) = f(x)
x x d (x)

That is, the vector of marginal effects is a scalar multiple of the coefficient vector. The scale factor,
f(x), is the density function, which is a function of x. This function can be computed at any data
vector desired. Average partial effects are computed by averaging the function over the sample
observations. The elasticity of the probability is

lo Eg[ y | x] xk E[ y | x] xk
= = marginal effect
lo xgk E[ y | x] xk E[ y | x ]

When the variable in x that is changing in the computation is a dummy variable, the
derivative approach to estimating the marginal effect is not appropriate. An alternative which is
closer to the desired computation for a dummy variable, that we denote z, is
Fz = Prob[y = 1 | z = 1] - Prob[y = 1 | z = 0]
= F(x + z | z = 1) - F(x + z | z = 0)
= F(x + ) - F(x).
NLOGIT examines the variables in the model and makes this adjustment automatically.
There are two programs in NLOGIT for obtaining partial effects for the binary choice (and
most other) models, the built in computation provided by the model command and the PARTIAL
EFFECTS command. Examples of both are shown below.
The LOGIT, PROBIT, etc. commands provide a built in, basic computation for partial
effects. You can request the computation to be done automatically by adding
; Partial Effects (or ; Marginal Effects)
to your command. The results below are produced for logit model in the earlier example. The
standard errors for the partial effects are computed using the delta method. See Section E27.12 for
technical details on the computation. The results reported are the average partial effects.
-----------------------------------------------------------------------------
Partial derivatives of E[y] = F[*] with
respect to the vector of characteristics
Average partial effects for sample obs.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00402*** .26013 4.92 .0000 .00242 .00562
HHNINC| -.08666** -.05857 -2.22 .0267 -.16331 -.01001
HHKIDS| -.08524*** -.05021 -4.33 .0000 -.12382 -.04667 #
EDUC| -.00779** -.13620 -2.24 .0252 -.01461 -.00097
MARRIED| .03279 .03534 1.52 .1288 -.00952 .07510 #
--------+--------------------------------------------------------------------
# Partial effect for dummy variable is E[y|x,d=1] - E[y|x,d=0]
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
N6: Probit and Logit Models: Estimation N-78

The equivalent PARTIAL EFFECTS command, which would immediately follow the LOGIT
command, would be

PARTIAL EFFECTS ; Effects: age / hhninc / hhkids / educ / married

; Summary $

---------------------------------------------------------------------
Partial Effects for Probit Probability Function
Partial Effects Averaged Over Observations
* ==> Partial Effect for a Binary Variable
---------------------------------------------------------------------
Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
---------------------------------------------------------------------
AGE .00402 .00082 4.92 .00242 .00562
HHNINC -.08666 .03911 2.22 -.16331 -.01001
* HHKIDS -.08524 .01968 4.33 -.12382 -.04667
EDUC -.00779 .00348 2.24 -.01461 -.00097
* MARRIED .03279 .02159 1.52 -.00952 .07510
---------------------------------------------------------------------

The second method provides a variety of options for computing partial effects under various
scenarios, plotting the effects, etc. See Chapter R11 for further details.

NOTE: If your model contains nonlinear terms in the variables, such as age^2 or interaction terms
such as age*female, then you must use the PARTIAL EFFECTS command to obtain partial effects.
The built in routine in the command, ; Partial Effects, will not give the correct answers for variables
that appear in nonlinear terms.

N6.6.1 The Krinsky and Robb Method

An alternative to the delta method described above that is sometimes advocated is the
Krinsky and Robb method. By this device, we have our estimate of the model coefficients, b, and
the estimated asymptotic covariance matrix, V. The marginal effects are computed as a function of b
and the vector of means of the sample data, x , say gk(b, x ) for the kth variable. The Krinsky and
Robb technique involves sampling R draws from the asymptotic normal distribution of the estimator,
computing the function with these R draws, then computing the empirical variance. This is not done
automatically by the binary choice estimator, but you can easily do the computation using the
WALD command. For an example, we will use this method to compute the marginal effects for two
variables in the logit model estimated earlier. The program would be

NAMELIST ; x = one,age,hhninc,hhkids,educ,married $
LOGIT ; Lhs = doctor ; Rhs = x ; Partial Effects $
MATRIX ; xbar = Mean(x) $
CALC ; kx = Col(x) ; Ran(12345) $
WALD ; Start = b ; Var = varb ; Labels = kx_b
; Fn1 = b2 * Lgd(b1'xbar)
; Fn2 = b3 * Lgd(b1'xbar)
; K&R ; Pts = 2000 $
N6: Probit and Logit Models: Estimation N-79

-----------------------------------------------------------------------------
WALD procedure. Estimates and standard errors
for nonlinear functions and joint test of
nonlinear restrictions.
Wald Statistic = 27.72506
Prob. from Chi-squared[ 2] = .00000
Krinsky-Robb method used with 2000 draws
Functions are computed at means of variables
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
Fncn(1)| .00409*** .00084 4.85 .0000 .00244 .00575
Fncn(2)| -.08694** .03913 -2.22 .0263 -.16363 -.01025
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
---------------------------------------------------------------------
Partial Effects for Probit Probability Function
Partial Effects Averaged Over Observations
* ==> Partial Effect for a Binary Variable
---------------------------------------------------------------------
Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
---------------------------------------------------------------------
AGE .00402 .00082 4.92 .00242 .00562
HHNINC -.08666 .03911 2.22 -.16331 -.01001
---------------------------------------------------------------------

There is a second sources of difference between the Krinsky and Robb estimates and the delta
method results that follow: The Krinsky and Robb procedure is based on the means of the data while
the delta method averages the partial effects over the observations. It is possible to perform the
K&R iteration at every observation to reproduce the APE calculations by adding ; Average to the
WALD command. The results below illustrate.

--------+--------------------------------------------------------------------
Fncn(1)| .00407*** .00085 4.80 .0000 .00241 .00573
Fncn(2)| -.08673** .03929 -2.21 .0273 -.16373 -.00973
--------+--------------------------------------------------------------------

We do not recommend this as a general procedure, however. It is enormously time consuming and
does not produce a more accurate result.

Estimating Marginal Effects by Strata

Marginal effects may be calculated for indicated subsets of the data by using

; Margin = variable
where variable is the name of a variable coded 0,1,... which designates up to 10 subgroups of the
data set, in addition to the full data set. For example, a common application would be
; Margin = sex
in which the variable sex is coded 0 for men and 1 for women (or vice versa). The variable used in
this computation need not appear in the model; it may be any variable in the data set.
N6: Probit and Logit Models: Estimation N-80

For example, using our logit model above, we now compute marginal effects separately for
men and women:

LOGIT ; Lhs = doctor

; Rhs = one,age,hhninc,hhkids,educ,married
; Margin = female $

-----------------------------------------------------------------------------
Binary Logit Model for Binary Choice
Dependent variable DOCTOR
Log likelihood function -2121.43961
Restricted log likelihood -2169.26982
Chi squared [ 5 d.f.] 95.66041
Significance level .00000
McFadden Pseudo R-squared .0220490
Estimation based on N = 3377, K = 6
Inf.Cr.AIC = 4254.879 AIC/N = 1.260
Hosmer-Lemeshow chi-squared = 17.65094
P-value= .02400 with deg.fr. = 8
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| .52240** .24887 2.10 .0358 .03463 1.01018
AGE| .01834*** .00378 4.85 .0000 .01092 .02575
HHNINC| -.38750** .17760 -2.18 .0291 -.73559 -.03941
HHKIDS| -.38161*** .08735 -4.37 .0000 -.55282 -.21040
EDUC| -.03581** .01576 -2.27 .0230 -.06669 -.00493
MARRIED| .14709 .09727 1.51 .1305 -.04357 .33774
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Partial derivatives of probabilities with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Observations used are FEMALE=0
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00414*** .26343 4.84 .0000 .00247 .00582
HHNINC| -.08756** -.06038 -2.18 .0291 -.16619 -.00893
HHKIDS| -.08714*** -.05161 -4.34 .0000 -.12645 -.04783 #
EDUC| -.00809** -.14612 -2.27 .0234 -.01509 -.00109
MARRIED| .03351 .03549 1.50 .1334 -.01025 .07728 #
--------+--------------------------------------------------------------------
# Partial effect for dummy variable is E[y|x,d=1] - E[y|x,d=0]
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N6: Probit and Logit Models: Estimation N-81

-----------------------------------------------------------------------------
Partial derivatives of probabilities with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Observations used are FEMALE=1
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00404*** .26337 4.88 .0000 .00242 .00567
HHNINC| -.08545** -.05555 -2.18 .0290 -.16217 -.00873
HHKIDS| -.08519*** -.04911 -4.33 .0000 -.12379 -.04659 #
EDUC| -.00790** -.13086 -2.28 .0225 -.01468 -.00111
MARRIED| .03279 .03550 1.50 .1345 -.01015 .07573 #
--------+--------------------------------------------------------------------
# Partial effect for dummy variable is E[y|x,d=1] - E[y|x,d=0]
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Partial derivatives of probabilities with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Observations used are All Obs.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00410*** .26352 4.86 .0000 .00244 .00575
HHNINC| -.08660** -.05811 -2.18 .0291 -.16436 -.00884
HHKIDS| -.08626*** -.05044 -4.34 .0000 -.12524 -.04727 #
EDUC| -.00800** -.13893 -2.27 .0230 -.01490 -.00110
MARRIED| .03318 .03551 1.50 .1339 -.01021 .07658 #
--------+--------------------------------------------------------------------
# Partial effect for dummy variable is E[y|x,d=1] - E[y|x,d=0]
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
+-------------------------------------------+
| Marginal Effects for Logit |
+----------+----------+----------+----------+
| Variable | FEMALE=0 | FEMALE=1 | All Obs. |
+----------+----------+----------+----------+
| AGE | .00414 | .00404 | .00410 |
| HHNINC | -.08756 | -.08545 | -.08660 |
| HHKIDS | -.08714 | -.08519 | -.08626 |
| EDUC | -.00809 | -.00790 | -.00800 |
| MARRIED | .03351 | .03279 | .03318 |
+----------+----------+----------+----------+

The computation using the built in estimator is done at the strata means of the data. The
computation can be done by averaging across observations using the PARTIAL EFFECTS (or just
PARTIALS) command. For example, the corresponding results for the income variable are
obtained with

PARTIAL EFFECTS ; Effects: hhninc @ female=0,1$

N6: Probit and Logit Models: Estimation N-82

---------------------------------------------------------------------
Partial Effects Analysis for Logit Probability Function
---------------------------------------------------------------------
Effects on function with respect to HHNINC
Results are computed by average over sample observations
Partial effects for continuous HHNINC computed by differentiation
Effect is computed as derivative = df(.)/dx
---------------------------------------------------------------------
df/dHHNINC Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
---------------------------------------------------------------------
Subsample for this iteration is FEMALE = 0 Observations: 1812
APE. Function -.08585 .03925 2.19 -.16278 -.00892
---------------------------------------------------------------------
Subsample for this iteration is FEMALE = 1 Observations: 1565
APE. Function -.08355 .03820 2.19 -.15841 -.00868

Examining the Effect of a Variable Over a Range of Values

Another useful device is a plot of the probability (conditional mean) over the range of a
variable of interest either holding other variables at their means, or averaging over the sample values.
The figure below does this for the income variable in the logit model for doctor visits. The figure is
plotted for hhkids = 1 and hhkids = 0 to show the two effects. We see that the probability falls with
increased income, and also for individuals in households in which there are children.

SIMULATE ; Scenario: & hhninc = 0(.05).5 | hhkids=0,1 ; plot$

Figure N6.2 Probabilities Varying with Income

N6: Probit and Logit Models: Estimation N-83

N6.7 Simulation and Analysis of a Binary Choice Model

This section describes a procedure that is used with all of the parametric models described
above. It is used for two specific analyses. This procedure allows you to analyze the predictions
made by a binary choice when the variables in the model are changed. The analysis is provided in
two parts:

Change specific variables in the model by a prescribed amount, and examine the changes in
the model predictions.

Vary a particular variable over a range of values and examine the predicted probabilities
when other variables are held fixed at their means.

This program is available for the six parametric binary choice models: probit, logit, Gompertz,
complementary log log, arctangent and Burr. The probit and logit models may also be
heteroscedastic. The routine is accessed as follows. First fit the model as usual. Then, use the
identical model specification as shown below with the specifications indicated:

(MODEL) ; Lhs = ... ; Rhs = ... $

Then

BINARY CHOICE ; Lhs = (the same) ; Rhs = (the same) ; ... (also the same)
; Model = Probit, Logit, Gompertz, Comploglog or Burr
; Start = B (from the preceding model)

(optional, the value to use for predicting Lhs = 1, default = .5)

; Threshold = P*

(optional) ; Scenario: variable operation = value /

(variable operation = value) / ... (may be repeated)

(optional) ; Plot: variable (lower limit, upper limit) $

In the ; Plot specification, the limits part may be omitted, in which case the range of the variable is
used. This will replicate for the one variable the computation of the program in the preceding section.
The ; Scenario section computes all predicted probabilities for the model using the sample
data and the estimated parameters. Then, it recomputes the probabilities after changing the variables
in the way specified in the scenarios. (The actual data are not changed the modification is done
while the probabilities are computed.) The scenarios are of the form

variable operation = value

such as hhkids + = 1 (effect of additional kids in the home)

or hhninc * = 1.1 (effect of a 10% increase in income)
N6: Probit and Logit Models: Estimation N-84

You may provide multiple scenarios. They are evaluated one at a time. This is an extension of the
computation of marginal effects.
In the example below, we extend the analysis of marginal effect in the logit model used
above. The scenario examined is the impact of every individual having one more child in the
household then having a 50% increase in income. (Since hhkids is actually a dummy variable for the
presence of kids in the home, increasing it by one is actually an ambiguous experiment. We retain it
for the sake of a simple numerical example.) The plot shows the effect of income on the probability
of visiting the doctor, according to the model.

NAMELIST ; x = one,age,educ,married,hhninc,hhkids $
LOGIT ; Lhs = doctor ; Rhs = x $
BINARY ; Lhs = doctor ; Rhs = x
; Model = Logit ; Start = b
; Scenario: hhkids + = 1 / hhninc * = 1.5 $

The model output is omitted for brevity.

+-------------------------------------------------------------+
|Scenario 1. Effect on aggregate proportions. Logit Model |
|Threshold T* for computing Fit = 1[Prob > T*] is .50000 |
|Variable changing = HHKIDS , Operation = +, value = 1.000 |
+-------------------------------------------------------------+
|Outcome Base case Under Scenario Change |
| 0 33 = .98% 831 = 24.61% 798 |
| 1 3344 = 99.02% 2546 = 75.39% -798 |
| Total 3377 = 100.00% 3377 = 100.00% 0 |
+-------------------------------------------------------------+
+-------------------------------------------------------------+
|Scenario 2. Effect on aggregate proportions. Logit Model |
|Threshold T* for computing Fit = 1[Prob > T*] is .50000 |
|Variable changing = HHNINC , Operation = *, value = 1.500 |
+-------------------------------------------------------------+
|Outcome Base case Under Scenario Change |
| 0 33 = .98% 106 = 3.14% 73 |
| 1 3344 = 99.02% 3271 = 96.86% -73 |
| Total 3377 = 100.00% 3377 = 100.00% 0 |
+-------------------------------------------------------------+

The SIMULATE command used in the example provides a greater range of scenarios that
one can examine to see the effects of changes in a variable on the overall prediction of the binary
choice model. The advantage of the BINARY command used here is that for straightforward
scenarios, it can be used to provide useful tables such as the ones shown above.
N6: Probit and Logit Models: Estimation N-85

N6.8 Using Weights and Choice Based Sampling

The ; Wts option can always be used in the usual fashion for the probit and logit models.
However, in the grouped data case, a somewhat different treatment may be desired. The
observations may consist of pi, xi and ni, where ni is the number of replications used to obtain pi. The
usual treatment assumes that pi is a sample of one from a distribution with variance pi(1-pi). But pi is
more precise than this. Its unconditional variance is pi(1-pi)/ni. Thus, the efficiency of the estimator
of is underestimated. There is also an inherent heteroscedasticity which must be accounted for.
The heteroscedasticity due to pi is built into the likelihood function. But if your proportions are
based on different numbers of observations, the variances will differ correspondingly. This can be
accounted for by including ni as a weighting variable. Since the weighting procedure automatically
scales the weights so that they sum to the sample size, which would be inappropriate here, it is
necessary to modify the specification. Use

; Wts = variable, Noscale

or just ; Wts = variable, N

to prevent the automatic scaling. This produces a replication of the observations, which is what is
needed for grouped data.
This usage often has the surprising side effect of producing implausibly small standard
errors. Consider, for example, using unscaled weights for statewide observations on election
outcomes. The implication of the Noscale parameter is that each proportion represents millions of
observations. Once again, this is an issue that must be considered on a case by case basis.

Choice Based Sampling

In some individual data cases, the data are deliberately sampled so that one or the other
outcome is overrepresented in the sample. For example, suppose that in a binary response setting,
the true proportion of ones in the population is .05 and the true proportion of zeros is .95. One might
over sample the ones in order to learn more about the decision process. However, some account
must be taken of this fact in the estimation since it obviously will impart some biases. The following
assumes that these population proportions are known, which must be true to apply the technique.
We use the assumed values to demonstrate the technique; other values would be substituted in the
analogous manner.
The general principle involved is as follows: Suppose that the sample is deliberately drawn
so that it contains 50% ones and 50% zeros while it is known that the true proportions in the
population are .05 and .95. Then, the ones are overrepresented by a factor of .50/.05 = 10 while the
zeros are underrepresented by a factor of .50/.95 = .5263. To obtain the right mix in the sample, it
is necessary to scale down the ones by a factor of .05/.50 = .1 and scale up the zeros by a factor of
.95/.50 = 1.9. This can be handled simply by using a weighting variable during estimation to
reweight the observations. The precise method of doing so is discussed below. (See, also, Manski
and McFadden (1981).)
N6: Probit and Logit Models: Estimation N-86

An additional change must be made in order to obtain the correct asymptotic covariance
matrix for the estimates. Let H be the Hessian of the (weighted) log likelihood, i.e., the usual
estimator for the variance matrix of the estimates, and let GG be the summed outer products of the
first derivatives of the (weighted) log likelihood. (This is the inverse of the BHHH estimator.)
Manski and McFadden (1981) show that the appropriate covariance matrix for the estimates is

V = (-H)-1 GG (-H)-1.

The computation of the weighted estimator and the corrected asymptotic covariance is handled
automatically in NLOGIT by the following estimation programs:

univariate probit, logit, extreme value and Gompertz model,

bivariate probit model with and without sample selection,
binomial and multinomial logit models,
discrete choice (conditional logit).

With the exception of the last of these, you request the estimator with

; Wts = name of weighting variable

; Choice Based

The weighting variable can usually be created with a single command. For example, the weighting
variable suggested in the example used above would be specified as follows:

CREATE ; wt = (.95/.50)(y = 0) + (.05/.50)(y = 1) $

For models that do not appear in the list above, there is a general way to do this kind of
computation. How the weights are obtained will be specific to your application if you wish to do
this. To compute the counterpart to V above, you can do the following:

CREATE ; wt = the desired weighting variable $

Model name ; ... specification of the model
; Wts = the weighting variable
; Cluster = 1 $

Since the cluster estimator computes a sandwich estimator, we need only trick the program by
specifying that each cluster contains one observation. The observations in the parts will be weighted
by the variable given, so this is exactly what is needed.
N6: Probit and Logit Models: Estimation N-87

N6.9 Heteroscedasticity in Probit and Logit Models

The univariate choice model with multiplicative heteroscedasticity is

yi* = xi + i,
yi = 1 if yi* > 0 and yi = 0 if yi* 0,
2
i ~ Normal or Logistic with mean 0, and variance [exp(wi)]

(In the logistic case, the true variance is scaled by 2/3.)

NOTE: These heteroscedasticity models require individual data.

Request the model with heteroscedasticity with

PROBIT ; Lhs = dependent variable

or LOGIT ; Rhs = regressors in x
; Rh2 = list of variables in w
; Heteroscedasticity (or just ; Het) $

Other options and specifications for this model are the same as the basic model. Two general
options that are likely to be useful are

; Keep = name to retain predicted values

; Prob = name to retain fitted probabilities

and the controls of the iterations and the amount of output.

NOTE: Do not include one in the Rh2 list. A constant in is not identified.

This model differs from the basic model only in the presence of the variance term. The
output for this model is also the same, with the addition of the coefficients for the variance term. The
initial OLS results are computed without any consideration of the heteroscedasticity, however.
Since the log likelihood for this model, unlike the basic model, is not globally concave, the
default algorithm is BFGS, not Newtons method.
For purposes of hypothesis testing and imposing restrictions, the parameter vector is

= [1,...,K,1,...,L].

If you provide your own starting values, give the right number of values in exactly this order.
You can also use WALD and ; Test: to test hypotheses about the coefficient vector. Finally,
you can impose restrictions with

; Rst = ....
or ; CML: restrictions...
N6: Probit and Logit Models: Estimation N-88

NOTE: In principle, you can impose equality restrictions across the elements of and with
; Rst = ..., (i.e., force an element in to equal one in ), but the results are unlikely to be satisfactory.
Implicitly, the variables involved are of different scales, and this will place a rather stringent
restriction on the model.

Use
; Robust
or ; Cluster = id variable or group size

to request the sandwich style robust covariance matrix estimator or the cluster correction.

NOTE: There is no robust covariance matrix for the logit or probit model that is robust to
heteroscedasticity, in the form of the White estimator for the linear model. In order to accommodate
heteroscedasticity in a binary choice model, you must model it explicitly.

NOTE: ; Maxit = 0 provides an easy way to test for heteroscedasticity with an LM test.

To test the hypothesis of homoscedasticity against the specification of this more general
model, the following template can be used: (The model may be LOGIT if desired.)

NAMELIST ; x = ... the Rhs of the probit model

; w = ... the Rh2 of the heteroscedasticity model $
CALC ; m = Col(w) $
PROBIT ; Lhs = ...
; Rhs = x $
PROBIT ; Lhs = ...
; Rhs = x
; Rh2 = w ; Het
; Start = b, m_0
; Maxit = 0 $

This produces an LM statistic and (superfluously) reproduces the restricted model.

The results that are saved automatically are the same as for the basic model, that is, b, varb,
and the scalars. In this case, b will contain the full set of estimates, with the slopes followed by the
variance parameters, i.e., [b,c]. The Last Model labels for the WALD command are [b_variable,
c_variable].
We note, this model may be rather weakly identified by the observed data, unless they are
plentiful and the model is sharply consistent with the data. In fact, identification is not a problem,
and the model is straightforward to estimate. But, one could argue that the specification problem
addressed by this model is one of functional form rather than heteroscedasticity. That is, the model
specification is arguably indistinguishable from one with a peculiar kind of conditional mean
function, which, in turn, could be standing in for some other, perhaps reasonable, albeit nonlinear
model. In addition, it is common for the estimated standard errors that are computed for this model
to be quite large, as a result of a kind of multicollinearity the high correlation of the derivatives of
the log likelihood.
N6: Probit and Logit Models: Estimation N-89

Application

To illustrate the model, we have refit the specification of the previous section with a
variance term of the form Var[] = [exp(1female + 2working )]2. Since both of these are binary
variables, this is equivalent to a groupwise heteroscedasticity model. The variances are 1.0, exp(21),
exp(22) and exp(21+22) for the four groups. We have fit the original model without
heteroscedasticity first. The second LOGIT command carries out the LM test of heteroscedasticity.
The third command fits the full heteroscedasticity model.

INCLUDE ; New ; year = 1994 $

NAMELIST ; x = one,age,educ,married,hhninc,hhkids,female $
LOGIT ; Lhs = doctor ; Rhs = x
; Partial Effects $
NAMELIST ; w = female,working $
CALC ; m = Col(w) $
LOGIT ; Lhs = doctor ; Rhs = x
; Heteroscedasticity ; Rh2 = w
; Start = b,m_0
; Maxit = 0 $
LOGIT ; Lhs = doctor ; Rhs = x
; Heteroscedasticity ; Rh2 = w
; Partial Effects $
PARTIALS ; Effects: female $

The model results have been rearranged in the listing below to highlight the differences in the
models. Also, for convenience, some of the results have been omitted.

Binary Logit Model for Binary Choice

Dependent variable DOCTOR
Log likelihood function -2085.33796

The LM statistic is included in the initial diagnostic statistics for the second model estimated.

LM Stat. at start values 3.11867

LM statistic kept as scalar LMSTAT

These are the results for the model with homoscedastic disturbances.

Inf.Cr.AIC = 4184.676 AIC/N = 1.239

Restricted log likelihood -2169.26982
McFadden Pseudo R-squared .0386913

These are the coefficient estimates for the two models.

N6: Probit and Logit Models: Estimation N-90

Homoscedastic disturbances
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| .14726 .25460 .58 .5630 -.35173 .64626
AGE| .01643*** .00384 4.28 .0000 .00891 .02395
EDUC| -.01965 .01608 -1.22 .2219 -.05117 .01188
MARRIED| .15536 .09904 1.57 .1167 -.03875 .34947
HHNINC| -.39474** .17993 -2.19 .0282 -.74739 -.04208
HHKIDS| -.41534*** .08866 -4.68 .0000 -.58911 -.24157
FEMALE| .64274*** .07643 8.41 .0000 .49295 .79253
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Heteroscedastic disturbances
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| .12927 .30739 .42 .6741 -.47320 .73174
AGE| .02036*** .00501 4.06 .0000 .01053 .03018
EDUC| -.02913 .01984 -1.47 .1421 -.06803 .00976
MARRIED| .19969 .12639 1.58 .1141 -.04803 .44742
HHNINC| -.36965* .22169 -1.67 .0954 -.80414 .06485
HHKIDS| -.53029*** .12783 -4.15 .0000 -.78083 -.27974
FEMALE| 1.24685*** .45754 2.73 .0064 .35009 2.14361
|Disturbance Variance Terms
FEMALE| .44128* .25946 1.70 .0890 -.06725 .94982
WORKING| .08459 .10082 .84 .4014 -.11300 .28219
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

These are the marginal effects for the two models. Note that the effects are also computed for the
terms in the variance function. The explanatory text indicates the treatment of variables that appear
in both the linear part and the exponential part of the probability.

Homoscedastic disturbances
-----------------------------------------------------------------------------
Partial derivatives of E[y] = F[*] with
respect to the vector of characteristics
Average partial effects for sample obs.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00352*** -.00205 4.29 .0000 .00191 .00512
EDUC| -.00421 .00058 -1.22 .2218 -.01096 .00254
MARRIED| .03357 -.00031 1.56 .1194 -.00868 .07582 #
HHNINC| -.08452** .00044 -2.20 .0282 -.16000 -.00905
HHKIDS| -.09058*** .00027 -4.65 .0000 -.12876 -.05240 #
FEMALE| .13842*** -.00119 8.60 .0000 .10687 .16997 #
--------+--------------------------------------------------------------------
# Partial effect for dummy variable is E[y|x,d=1] - E[y|x,d=0]
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Heteroscedastic disturbances
-----------------------------------------------------------------------------
Partial derivatives of probabilities with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Effects are the sum of the mean and var-
iance term for variables which appear in
both parts of the function.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
AGE| .00337*** .20980 3.84 .0001 .00165 .00509
EDUC| -.00482 -.08104 -1.47 .1404 -.01123 .00159
MARRIED| .03306 .03424 1.59 .1119 -.00769 .07380
HHNINC| -.06119 -.03975 -1.63 .1038 -.13492 .01254
HHKIDS| -.08778*** -.04969 -4.45 .0000 -.12640 -.04916
FEMALE| .20639*** .13969 5.09 .0000 .12687 .28592
|Disturbance Variance Terms
FEMALE| -.07388 -.05000 -1.08 .2784 -.20747 .05972
WORKING| -.01416 -.01493 -.71 .4801 -.05347 .02514
|Sum of terms for variables in both parts
FEMALE| .13252*** .08969 3.52 .0004 .05875 .20629
--------+--------------------------------------------------------------------
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

The partial effects for the heteroscedasticity model are computed at the means of the
variables. It is possible to obtain average partial effects by using the PARTIAL EFFECTS program
rather than the built in marginal effects routine. The following shows the results for female, which
appears in both parts of the model.

PARTIAL EFFECTS ; Effects: female $

N6: Probit and Logit Models: Estimation N-92

---------------------------------------------------------------------
Partial Effects Analysis for Heteros. Logit Prob.Function
---------------------------------------------------------------------
Effects on function with respect to FEMALE
Results are computed by average over sample observations
Partial effects for binary var FEMALE computed by first difference
---------------------------------------------------------------------
df/dFEMALE Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
---------------------------------------------------------------------
APE. Function .13430 .01653 8.12 .10190 .16669

These are the summaries of the predictions of the two estimated models. The performance of the
two models in terms of the simple count of correct predictions is almost identical the
heteroscedasticity model correctly predicts three observations more than the homoscedasticity
model. The mix of correct predictions is very different, however.

Homoscedastic disturbances
+---------------------------------------------------------+
|Predictions for Binary Choice Model. Predicted value is |
|1 when probability is greater than .500000, 0 otherwise.|
|Note, column or row total percentages may not sum to |
|100% because of rounding. Percentages are of full sample.|
+------+---------------------------------+----------------+
|Actual| Predicted Value | |
|Value | 0 1 | Total Actual |
+------+----------------+----------------+----------------+
| 0 | 82 ( 2.4%)| 1073 ( 31.8%)| 1155 ( 34.2%)|
| 1 | 85 ( 2.5%)| 2137 ( 63.3%)| 2222 ( 65.8%)|
+------+----------------+----------------+----------------+
|Total | 167 ( 4.9%)| 3210 ( 95.1%)| 3377 (100.0%)|
+------+----------------+----------------+----------------+

Heteroscedastic disturbances
+---------------------------------------------------------+
|Predictions for Binary Choice Model. Predicted value is |
|1 when probability is greater than .500000, 0 otherwise.|
|Note, column or row total percentages may not sum to |
|100% because of rounding. Percentages are of full sample.|
+------+---------------------------------+----------------+
|Actual| Predicted Value | |
|Value | 0 1 | Total Actual |
+------+----------------+----------------+----------------+
| 0 | 131 ( 3.9%)| 1024 ( 30.3%)| 1155 ( 34.2%)|
| 1 | 139 ( 4.1%)| 2083 ( 61.7%)| 2222 ( 65.8%)|
+------+----------------+----------------+----------------+
|Total | 270 ( 8.0%)| 3107 ( 92.0%)| 3377 (100.0%)|
+------+----------------+----------------+----------------+
N7: Tests and Restrictions in Models for Binary Choice N-93

N7: Tests and Restrictions in Models for Binary

Choice
N7.1 Introduction
We define models in which the response variable being described is inherently discrete as
qualitative response (QR) models. Chapter N6 presented the model formulation and estimation and
analysis tools. This chapter will detail some aspects of hypothesis testing. Most of these results are
generic, and will apply in other models as well.

N7.2 Testing Hypotheses

The full set of options is available for testing hypotheses and imposing restrictions on the
binary choice models. In using these, the set of parameters is

1, ..., K plus for the Burr model

In the parametric models, hypotheses can be done with the standard trinity of tests: Wald,
likelihood ratio and Lagrange Multiplier. All three are particularly straightforward for the binary
choice models.

N7.2.1 Wald Tests

Wald tests are carried out in two ways, with the ; Test: specification in the model command
and by using the WALD command after fitting the model. The former is used for linear restrictions.
The WALD command is more general and allows for tests of nonlinear restrictions on parameters.
The Wald statistic is computed using the estimates of an unrestricted model. The hypothesis
implies a set of restrictions

H0: c() = 0.

(This may involve linear distance from a constant, such as 23 - 1.2 = 0. The preceding formulation
is used to achieve the full generality that NLOGIT allows.) The Wald statistic is computed by the
formula
( ) ( ){ ( )} ( ) ()
1
W = c ' G Est. Asy.Var G ' c

()
c
where ()
G =
'

and is the vector of estimated parameters.

N7: Tests and Restrictions in Models for Binary Choice N-94

You can request Wald tests of simple restrictions by including the request in the model
command. For example:

PROBIT ; Lhs = doctor

; Rhs = one,age,educ,married,hhninc,hhkids
; Test: age + educ = 0,
married = 0 ,
hhninc + 2*hhkids = -.3 $

-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable DOCTOR
Log likelihood function -17670.94233
Restricted log likelihood -18019.55173
Chi squared [ 5 d.f.] 697.21881
Significance level .00000
McFadden Pseudo R-squared .0193462
Estimation based on N = 27326, K = 6
Inf.Cr.AIC =35353.885 AIC/N = 1.294
Hosmer-Lemeshow chi-squared = 105.22799
P-value= .00000 with deg.fr. = 8
Wald test of 3 linear restrictions
Chi-squared = 26.06, P value = .00001
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
Constant| .15500*** .05652 2.74 .0061 .04423 .26577
AGE| .01283*** .00079 16.24 .0000 .01129 .01438
EDUC| -.02812*** .00350 -8.03 .0000 -.03498 -.02125
MARRIED| .05226** .02046 2.55 .0106 .01216 .09237
HHNINC| -.11643** .04633 -2.51 .0120 -.20723 -.02563
HHKIDS| -.14118*** .01822 -7.75 .0000 -.17689 -.10548
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Note that the results reported are for the unrestricted model, and the results of the Wald test are
reported with the initial header information. To fit the model subject to the restriction, we change
; Test: in the command to ; CML: with the following results:

PROBIT ; Lhs = doctor

; Rhs = one,age,educ,married,hhninc,hhkids
; CML: age + educ = 0,
married = 0 ,
hhninc + 2*hhkids = -.3 $
N7: Tests and Restrictions in Models for Binary Choice N-95

-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable DOCTOR
Log likelihood function -2125.57999
Restricted log likelihood -2169.26982
Chi squared [ 2 d.f.] 87.37966
Significance level .00000
McFadden Pseudo R-squared .0201403
Estimation based on N = 3377, K = 3
Inf.Cr.AIC = 4257.160 AIC/N = 1.261
Linear constraints imposed 3
Hosmer-Lemeshow chi-squared = 20.93392
P-value= .00733 with deg.fr. = 8
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
Constant| .04583 .06144 .75 .4557 -.07458 .16624
AGE| .01427*** .00192 7.44 .0000 .01052 .01803
EDUC| -.01427*** .00192 -7.44 .0000 -.01803 -.01052
MARRIED| 0.0 .....(Fixed Parameter).....
HHNINC| -.06304 .07079 -.89 .3731 -.20178 .07569
HHKIDS| -.11848*** .03539 -3.35 .0008 -.18785 -.04911
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
Fixed parameter ... is constrained to equal the value or
had a nonpositive st.error because of an earlier problem.
-----------------------------------------------------------------------------

When the restrictions are built into the estimator with CML, the information reported is only that the
restrictions were imposed. The results of the Wald or LR test cannot be reported because the
unrestricted model is not computed.

N7.2.2 Likelihood Ratio Tests

Use the log likelihood functions from both restricted and unrestricted models. Log
likelihood functions are saved automatically by the estimators. Do keep in mind that these are
overwritten each time the scalar logl gets replaced by each model command. Your general strategy
for carrying out a likelihood ratio test would be

Model name ; ... - specifies the unrestricted model

CALC ; lu = logl $ Capture log likelihood function
Model name ; ... - specifies the restricted model
CALC ; lr = logl
; List ; chisq = 2*(lu - lr )
; 1 - Chi(chisq, degrees of freedom) $

You must supply the degrees of freedom. If the result of the last line is less than your significance
level usually 0.05 then, the null hypothesis of the restriction would be rejected. Here are two
examples: We continue to examine the German health care data. For purposes of these tests, just for
the illustrations, we will switch to a probit model.
N7: Tests and Restrictions in Models for Binary Choice N-96

Simple Linear Restriction

The following tests the pair of linear restrictions suggested above. Looking at the unrestricted
results from earlier, the restrictions dont look like they are going to pass. The results bear this out.

SAMPLE ; All $
NAMELIST ; x = one,age,educ,married,hhninc,hhkids $
LOGIT ; Lhs = doctor ; Rhs = x $
CALC ; lu = logl $
LOGIT ; Lhs = doctor ; Rhs = x
; Rst = b0, b1, b1, 0, b2, b3 $
CALC ; lr = logl
; List ; chisq = 2*(lu - lr) ; 1 - Chi(chisq,2) $
[CALC] CHISQ = 158.9035080
[CALC] *Result*= .0000000
Calculator: Computed 3 scalar results

Homogeneity Test
We are frequently asked about this. The sample can be partitioned into a number of
subgroups. The question is whether it is valid to pool the subgroups. Here is a general strategy that
is the maximum likelihood counterpart to the Chow test for linear models: Define a variable, say,
group, that takes values 1,2,...,G, that partitions the sample. This is a stratification variable. The test
statistic for homogeneity is

2 = 2[(groups log likelihood for the group) - log likelihood for the pooled sample]
The degrees of freedom is G-1 times the number of coefficients in the model.
Create the group variable.

SAMPLE ; Pooled sample ... however defined ... $

Model name ; ... ; Quiet $ Specify the appropriate model. Suppress the output.
CALC ; chisq = -2*logl ; df = -kreg $
Automate the model fitting estimation, and accumulate the statistic.
PROC
INCLUDE ; New ; Group = i $
Model name ; ... ; Quiet $ Specify the same model. Suppress the output.
CALC ; chisq = chisq + 2*logl ; df = df + kreg $
ENDPROC
Determine the number of groups.

CALC ; g = Max(group) $
Estimate the model once for each group.
EXEC ; i = 1,g $
CALC ; List ; chisq ; df ; 1 - Chi(chisq,df) $
N7: Tests and Restrictions in Models for Binary Choice N-97

This procedure produces only the output of the last CALC command, which will display the test
statistic, the degrees of freedom and the p value for the test.
To illustrate, well test the hypothesis that the same probit model for doctor visits applies to
both men and women. This command suppresses all output save for the actual test of the hypothesis.

NAMELIST ; x = one,age,educ,married,hhninc,hhkids $
PROBIT ; If [ female = 0] ; Lhs = doctor ; Rhs = x ; Quiet $
CALC ; l0 = logl $
PROBIT ; If [ female = 1] ; Lhs = doctor ; Rhs = x ; Quiet $
CALC ; l1 = logl $
PROBIT ; Lhs = doctor ; Rhs = x ; Quiet $
CALC ; l01 = logl ; List
; chisq = -2*(l01 - l0 - l1)
; df = 2*kreg ; pvalue = 1 - Chi(chisq,df) $

The results of the chi squared test strongly reject the homogeneity restriction.

[CALC] CHISQ = 549.3141072

[CALC] DF = 12.0000000
[CALC] PVALUE = .0000000
Calculator: Computed 4 scalar results

N7.2.3 Lagrange Multiplier Tests

The third procedure available for testing hypotheses is the Lagrange Multiplier, or LM
approach. The Lagrange Multiplier statistic is computed as a Wald statistic for testing the hypothesis
that the derivatives of the log likelihood are zero when evaluated at the restricted maximum
likelihood estimator;
( ) { ( )} ( )
1
LM = g R ' Est. Asy.Var g R g R

where R = MLE of the parameters of the model, with restrictions imposed

( )
g R = derivatives of log likelihood of full model, evaluated at R

The estimated asymptotic covariance matrix of the gradient is any of the usual estimators of the
asymptotic covariance matrix of the coefficient estimator, negative inverse of the actual or expected
Hessian, or the BHHH estimator based on the first derivatives only.
Your strategy for carrying out LM tests with NLOGIT is as follows:

Step 1. Obtain the restricted parameter vector. This may involve an unrestricted parameter vector in
some restricted model, padded with some zeros, or a similar arrangement.

Step 2. Set up the full, unrestricted model as if it were to be estimated, but include in the command

; Start = restricted parameter vector

; Maxit = 0
N7: Tests and Restrictions in Models for Binary Choice N-98

The rest of the procedure is automated for you. The ; Maxit = 0 specification takes on a particular
meaning when you also provide a set of starting values. It implies that you wish to carry out an LM
test using the starting values.
To demonstrate, we will carry out the test of the hypothesis

_age + _educ = 0
_married = 0
_hhninc + _hhkids = - .3

that we tested earlier with a Wald statistic, now with the LM test. The commands would be as follows:

PROBIT ; Lhs = doctor

; Rhs = one,age,educ,married,hhninc,hhkids
; CML: age+educ = 0, married = 0 , hhninc + 2*hhkids = -.3 $
PROBIT ; Lhs = doctor
; Rhs = one,age,educ,married,hhninc,hhkids
; Maxit = 0 ; Start = b $

The results of the second model command provide the Lagrange multiplier statistic. The value of
26.06032 is the same as the Wald statistic computed earlier, 26.06.

Maximum of 0 iterations. Exit iterations with status=1.

Maxit = 0. Computing LM statistic at starting values.
No iterations computed and no parameter update done.
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable DOCTOR
LM Stat. at start values 26.06032
LM statistic kept as scalar LMSTAT
Log likelihood function -17683.96508
Restricted log likelihood -18019.55173
Chi squared [ 5 d.f.] 671.17331
Significance level .00000
McFadden Pseudo R-squared .0186235
Estimation based on N = 27326, K = 6
Inf.Cr.AIC =35379.930 AIC/N = 1.295
Model estimated: Jun 13, 2011, 19:40:02
Hosmer-Lemeshow chi-squared = 132.57086
P-value= .00000 with deg.fr. = 8
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
Constant| -.06593 .05655 -1.17 .2437 -.17678 .04491
AGE| .01484*** .00079 18.76 .0000 .01329 .01639
EDUC| -.01484*** .00351 -4.23 .0000 -.02171 -.00796
MARRIED| 0.0 .02049 .00 1.0000 -.40156D-01 .40156D-01
HHNINC| -.09655** .04636 -2.08 .0373 -.18741 -.00568
HHKIDS| -.10173*** .01821 -5.59 .0000 -.13742 -.06603
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N7: Tests and Restrictions in Models for Binary Choice N-99

To complete the trinity of tests, we can carry out the likelihood ratio test, which we could do
as follows:

PROBIT ; Quiet ; Lhs = doctor

; Rhs = one,age,educ,married,hhninc,hhkids
; CML: b(2) + b(3) = 0, b(4) = 0, b(5) + b(6) = -.3 $
CALC ; lr = logl $
PROBIT ; Quiet ; Lhs = doctor
; Rhs = one,age,educ,married,hhninc,hhkids $
CALC ; lu = logl ; List
; lrstat = 2*(lu lr) $

The result of the computation (which displays only the last statistic) is

[CALC] LRSTAT = 26.0455042

Calculator: Computed 2 scalar results

The value of 26.0455 differs only trivially from the other values. This is actually not surprising,
since they should all converge to the same statistic, and the sample in use here is very large.

N7.3 Two Specification Tests

The following are two specialized tests for the probit model, one for testing which of two
competing models appears to be appropriate, and one test against the hypothesis of normality that
underlies the probit model.

N7.3.1 A Test for Nonnested Probit Models

Davidson and MacKinnon (1993) present a test of the nonnested hypothesis that an
alternative set of variables, zi, is the appropriate one for the structural equation of the probit model.

Testing y* = x + vs. y* = z + u

NAMELIST ; x = the independent variables

; z = the competing list of independent variables $
CREATE ; y = the dependent variable $
PROBIT ; Quiet ; Lhs = y ; Rhs = x $
CREATE ; xbeta = xb; fx = N01(xbeta) ; px = Phi(xbeta)
; v = Sqr(px*(1-px)) ; dev = (y - px) / v
; xv = fx*xbeta / v $
PROBIT ; Quiet ; Lhs = y ; Rhs = z $
CREATE ; pz = Phi(zb) ; test = (px - pz) / v $
REGRESS ; Lhs = dev ; Rhs = xv,test $
N7: Tests and Restrictions in Models for Binary Choice N-100

The test is carried out by referring the t ratio on test to the t table. A value larger than the critical
value argues in favor of z as the correct specification. For example, the following tests for which of
two specifications of the right hand side of the probit model is preferred.

NAMELIST ; x = one,age,educ,married,hhninc,hhkids,self
; z = one,age,educ,married,hhninc,female,working $
CREATE ; y = doctor $

The remaining commands are identical.

The essential regression results are as follows. We also reversed the roles of x and z.
Unfortunately, as often happens in specifications, the results are contradictory.

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DEV| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
XV| .04569** .01985 2.30 .0214 .00678 .08459
TEST| -.79517*** .03995 -19.90 .0000 -.87348 -.71687
--------+--------------------------------------------------------------------
XV| .04668** .02033 2.30 .0217 .00684 .08652
TEST| -.26126*** .04273 -6.11 .0000 -.34500 -.17751

The t ratio of -19.9 in the first regression argues in favor of z as the appropriate specification. But,
the also significant t ratio of -6.11 in the second argues in favor of x.

N7.3.2 A Test for Normality in the Probit Model

The second test is a Lagrange multiplier test against the null hypothesis of normality in the
probit model. (The test was developed in Bera, Jarque and Lee (1984).) As usual in normality tests,
the statistic is computed by comparing the third and fourth moments of an underlying variable to
their expected value under normality. The computations are as follows, where i indicates the ith
observation:
ai = xi
i = (ai)
i = (ai)
di = i (yi - i) / [i(1 - i)]
ci = i2 / [i(1 - i)]
m3i = -1/2(ai2 1)
m4i = 1/4 (ai (ai2 + 3))
zi = (xi, m3i, m4i)

(
)( )( )
1
= i 1=
di z i i 1 ci z= i 1 di z i
N N N
Then, LM
= i zi
N7: Tests and Restrictions in Models for Binary Choice N-101

The commands below will carry out the test. The chi squared reported by the last line has two
degrees of freedom.

NAMELIST ; x = one,... $
CREATE ; y = the dependent variable $
PROBIT ; Lhs = y ; Rhs = x $
CREATE ; ai = b'x ; fi = Phi(ai) ; dfi = N01(ai)
; di = (y-fi) * dfi /(fi*(1-fi)) ; ci = dfi^2 /(fi*(1-fi))
; m3i = -1/2*(ai^2-1) ; m4i = 1/4*(ai*(ai^2+3)) $
NAMELIST ; z = x,m3i,m4i $
MATRIX ; List ; LM = diz * <z'[ci]z> * z'di $

We executed the routine for our probit model estimated earlier, with

NAMELIST ; x = one,age,educ,married,hhninc,hhkids,self $
CREATE ; y = doctor $

The result of 93.12115 would lead to rejection of the hypothesis of normality; the 5% critical value
for the chi squared variable with two degrees of freedom is 5.99.

LM| 1
--------+--------------
1| 93.1211

N7.4 The WALD Command

The WALD command may be used for linear and nonlinear restrictions. The model
commands produce a set of names that can be used in WALD commands after estimation. For the
binary choice commands, these are b_variable. The WALD command can be used with these names
in specified restrictions, with no other information needed. For example:

PROBIT ; Lhs = doctor

; Rhs = one,age,educ,married,hhninc,hhkids $
WALD ; Fn1 = b_age + b_educ - 0
; Fn2 = b_married - 0
; Fn3 = b_hhninc + b_hhkids + .3 $

(The latter restriction doesnt make much sense, but we can test it anyway.) The results of this pair
of commands are shown below. (The PROBIT command was shown earlier.)

-----------------------------------------------------------------------------
WALD procedure. Estimates and standard errors
for nonlinear functions and joint test of
nonlinear restrictions.
Wald Statistic = 24.95162
Prob. from Chi-squared[ 3] = .00002
Functions are computed at means of variables
-----------------------------------------------------------------------------
N7: Tests and Restrictions in Models for Binary Choice N-102

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
WaldFcns| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
Fncn(1)| -.01528*** .00369 -4.14 .0000 -.02252 -.00805
Fncn(2)| .05226** .02046 2.55 .0106 .01216 .09237
Fncn(3)| .04239 .05065 .84 .4027 -.05689 .14166
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

You may follow a model command with as many WALD commands as you wish.
You can use WALD to obtain standard errors for linear or nonlinear functions of parameters.
Just ignore the test statistics. Also, WALD produces some useful output in addition to the displayed
results. The new matrix varwald will contain the estimated asymptotic covariance matrix for the set of
functions. The new vector waldfns will contain the values of the specified functions. A third matrix,
jacobian, will equal the derivative matrix, c()/. For the computations above, the three matrices are

Figure N7.1 Matrix Results for the WALD Command

Thus, the command

MATRIX ; w = waldfns <varwald> waldfns $

would recompute the Wald statistic.

Matrix W has 1 rows and 1 columns.
1
+--------------
1| 24.95162
N7: Tests and Restrictions in Models for Binary Choice N-103

N7.5 Imposing Linear Restrictions

Fixed Value and Equality Restrictions

Fixed value and equality restrictions are imposed with

; Rst = the list of settings symbols for free parameters,

values for specific values
For example,

NAMELIST ; x = one,age,educ,married,hhninc,hhkids $
LOGIT ; Lhs = doctor ; Rhs = x
; Rst = b0, b1, b1, 0, b2, b3 $

will force the second and third coefficients to be equal and the fourth to equal zero.

Linear Restrictions

These are imposed with

; CML: the set of linear restrictions

(See Section R13.6.3.) This is a bit more general than the Rst function, but similar. For example, to
force the restriction that the coefficient on age plus that on educ equal twice that on hhninc, use

; CML: age + educ - 2*hhninc = 0

N8: Extended Binary Choice Models N-104

N8: Extended Binary Choice Models

N8.1 Introduction
NLOGIT supports a large variety of models and extensions for the analysis of binary choice.
This chapter documents sample selection models, models with endogenous right hand side variables
and two step estimation of models that build on probit and logit models.

N8.2 Sample Selection in Probit and Logit Models

The model of sample selection can be extended to the probit and logit binary choice models.
In both cases, we depart from

Prob[yi = 1 |xi] = F(xi)

where F(t) = (t) for the probit model and (t) for the logit model,
zi* = wi + ui, ui ~ N[0,1], zi = 1(zi* > 0)
yi, xi observed only when zi = 1.

In both cases, as stated, there is no obvious way that the selection mechanism impacts the binary
choice model of interest. We modify the models as follows:
For the probit model,

yi* = xi + i, i ~ N[0,1], yi = 1(yi* > 0)

which is the structure underlying the probit model in any event, and

ui, i ~ BVN[(0,0),(1,,1)].

This is precisely the structure underlying the bivariate probit model. Thus, the probit model with
selection is treated as a bivariate probit model. Some modification of the model is required to
accommodate the selection mechanism. The command is simply

BIVARIATE ; Lhs = y,z

; Rh1 = variables in x
; Rh2 = variables in w
; Selection $

For the logit model, a similar approach does not produce a convenient bivariate model. The
probability is changed to
exp(xi + i )
Prob(yi = 1 | xi,i) = .
1 + exp(xi + i )
N8: Extended Binary Choice Models N-105

PROBIT ; Lhs = z ; Rhs = variables in w ; Hold $

LOGIT ; Lhs = y ; Rhs = variables in x ; Selection $

The motivation for a probit selection mechanism into a logit model does seem ambiguous.

N8.3 Endogenous Variable in a Probit Model

This estimator is for what is essentially a simultaneous equations model. The model
equations are
y1=* x + y2 + , y=
1 1[ y1 * > 0] ,
y=2 z + u ,
0 1
( , u ) ~ N , 2
.
0

Probit estimation based on y1 and (x1,y2) will not consistently estimate (,) because of the
correlation between y2 and induced by the correlation between u and . Several methods have been
proposed for estimation. One possibility is to use the partial reduced form obtained by inserting the
second equation in the first. This will produce consistent estimates of /(1+22+2)1/2 and
/(1+22+2)1/2. Linear regression of y2 on z produces estimates of and 2, but there is no
method of moments estimator of produced by this procedure, so this estimator is incomplete.
Newey (1987) suggested a minimum chi squared estimator that does estimate all parameters. A
more direct, and actually simpler approach is full information maximum likelihood. Details on the
estimation procedure appear in Section E29.3.
To estimate this model, use the command

PROBIT ; Lhs = y1, y2

; Rh1 = independent variables in probit equation
; Rh2 = independent variables in regression equation $

(Note, the probit must be the first equation.) Other optional features relating to fitted values,
marginal effects, etc. are the same as for the univariate probit command. We note, marginal effects
are computed using the univariate probit probabilities,

Prob[y1 = 1] ~ [x + y2]

These will approximate the marginal effects obtained from the conditional model (which contain u).
When averaged over the sample values, the effect of u will become asymptotically negligible.
Predictions, etc. are kept with ; Keep = name, and so on. Likewise, options for the optimization,
such as maximum iterations, etc. are also the same as for the univariate probit model.
N8: Extended Binary Choice Models N-106

Retained Results
The results saved by this binary choice estimator are:

Matrices: b = estimate of (,,). Using ; Par adds and to b.

varb = asymptotic covariance matrix.

Scalars: kreg = number of variables in Rhs

nreg = number of observations
logl = log likelihood function

Last Model: b_variable (includes ) and, c_variables.

Last Function: (bx + ay2) = Prob(y1 = 1 | x,y2).

The Last Model names are used with WALD to simplify hypothesis tests. The last function is the
conditional mean function. The extra complication of the estimator has been used to obtain a
consistent estimator of ,. With that in hand, the interesting function is E[y1| x,y2].

NAMELIST ; xdoctor = one,age,hsat,public,hhninc$

NAMELIST ; xincome = one,age,age*age,educ,female,hhkids $
PROBIT ; Lhs = doctor,hhninc
; Rh1 = xdoctor ; Rh2 = xincome $
-----------------------------------------------------------------------------
Probit Regression Start Values for DOCTOR
Dependent variable DOCTOR
Log likelihood function -16634.88715
Estimation based on N = 27326, K = 5
Inf.Cr.AIC =33279.774 AIC/N = 1.218
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
Constant| 1.05627*** .05508 19.18 .0000 .94831 1.16423
AGE| .00895*** .00073 12.24 .0000 .00752 .01038
HSAT| -.17520*** .00395 -44.31 .0000 -.18295 -.16745
PUBLIC| .12985*** .02515 5.16 .0000 .08056 .17914
HHNINC| -.01332 .04581 -.29 .7712 -.10310 .07645
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
Ordinary least squares regression ............
LHS=HHNINC Mean = .35208
Standard deviation = .17691
No. of observations = 27326 Degrees of freedom
Regression Sum of Squares = 88.9621 5
Residual Sum of Squares = 766.216 27320
Total Sum of Squares = 855.178 27325
Standard error of e = .16747
Fit R-squared = .10403 R-bar squared = .10386
Model test F[ 5, 27320] = 634.40260 Prob F > F* = .00000
Diagnostic Log likelihood = 10059.42844 Akaike I.C. = -3.57369
Restricted (b=0) = 8558.60603 Bayes I.C. = -3.57189
Chi squared [ 5] = 3001.64483 Prob C2 > C2* = .00000
N8: Extended Binary Choice Models N-107

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
HHNINC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
Constant| -.40365*** .01704 -23.68 .0000 -.43705 -.37024
AGE| .02555*** .00079 32.43 .0000 .02400 .02709
AGE*AGE| -.00029*** .9008D-05 -31.68 .0000 -.00030 -.00027
EDUC| .01989*** .00045 44.22 .0000 .01901 .02077
FEMALE| .00122 .00207 .59 .5538 -.00283 .00527
HHKIDS| -.01146*** .00231 -4.96 .0000 -.01599 -.00693
--------+--------------------------------------------------------------------
Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
Initial iterations cannot improve function.Status=3
Error 805: Initial iterations cannot improve function.Status=3
Function= .61428384629D+04, at entry, .61358027527D+04 at exit
-----------------------------------------------------------------------------
Probit with Endogenous RHS Variable
Dependent variable DOCTOR
Log likelihood function -6135.80156
Restricted log likelihood -16599.60800
Chi squared [ 11 d.f.] 20927.61288
Significance level .00000
McFadden Pseudo R-squared .6303647
Estimation based on N = 27326, K = 13
Inf.Cr.AIC =12297.603 AIC/N = .450
--------+--------------------------------------------------------------------
DOCTOR| Standard Prob. 95% Confidence
HHNINC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Coefficients in Probit Equation for DOCTOR
Constant| 1.05627*** .07626 13.85 .0000 .90681 1.20574
AGE| .00895*** .00074 12.03 .0000 .00749 .01041
HSAT| -.17520*** .00392 -44.72 .0000 -.18288 -.16752
PUBLIC| .12985*** .02626 4.94 .0000 .07838 .18131
HHNINC| -.01332 .14728 -.09 .9279 -.30200 .27535
|Coefficients in Linear Regression for HHNINC
Constant| -.40301*** .01712 -23.55 .0000 -.43656 -.36946
AGE| .02551*** .00081 31.37 .0000 .02391 .02710
AGE*AGE| -.00028*** .9377D-05 -30.39 .0000 -.00030 -.00027
EDUC| .01986*** .00040 50.26 .0000 .01908 .02063
FEMALE| .00122 .00207 .59 .5552 -.00284 .00528
HHKIDS| -.01144*** .00226 -5.06 .0000 -.01587 -.00701
|Standard Deviation of Regression Disturbances
Sigma(w)| .16720*** .00026 639.64 .0000 .16669 .16772
|Correlation Between Probit and Regression Disturbances
Rho(e,w)| .02412 .02550 .95 .3442 -.02586 .07409
--------+--------------------------------------------------------------------
Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N9: Fixed and Random Effects Models for Binary Choice N-108

N9: Fixed and Random Effects Models for

Binary Choice
N9.1 Introduction
The parametric models discussed in Chapters N5-N6 are extended to panel data formats.
Four specific parametric model formulations are provided as internal procedures in NLOGIT for
these binary choice models. These are the same ones described earlier, less the Burr distribution
which is not included in this set. Four classes of models are supported:

Fixed effects: Prob[yit = 1] = F(xit + i),

i may be correlated with xit,

Random effects: Prob[yit = 1] = Prob[xit + it + ui > 0],

ui is uncorrelated with xit,

Random parameters: Prob[yit = 1] = F( ixit),

i | i ~ h(|i) with mean vector and covariance matrix

Latent class: Prob[yit = 1|class j] = F( jxit),

Prob[class = j] = Fj()

The last two models provide various extensions of the basic form shown above.

NOTE: None of these panel data models require balanced panels. The group sizes may always
vary.

NOTE: None of these panel data models are provided for the Burr (scobit) model.

All formulations are treated the same for the five models, probit, logit, extreme value, Gompertz and
arctangent.

NOTE: The random effects estimator requires individual data. The fixed effects estimator allows
grouped data.

The third and fourth arise naturally in a panel data setting, but in fact, can be used in cross section
frameworks as well. The fixed and random effects estimators require panel data. The fixed and
random effects models are described in this chapter. Random parameters and latent class models are
documented in Chapter N10.
N9: Fixed and Random Effects Models for Binary Choice N-109

The applications in this chapter are based on the German health care data used throughout
the documentation. The data are an unbalanced panel of observations on health care utilization by
7,293 individuals. The group sizes in the panel number as follows: Ti: 1=1525, 2=2158, 3=825,
4=926, 5=1051, 6=1000, 7=987. There are altogether 27,326 observations. The variables in the file
that are used here are

doctor = 1 if number of doctor visits > 0, 0 otherwise,

hhninc = household nominal monthly net income in German marks / 10000,
hhkids = 1 if children under age 16 in the household, 0 otherwise,
educ = years of schooling,
married = marital status,
female = 1 for female, 0 for male,
docvis = number of visits to the doctor,
hospvis = number of visits to the hospital,
newhsat = self assessed health satisfaction, coded 0,1,...,10.

The data on health satisfaction in the raw data file, in variable hsat, contained some obvious coding
errors. Our corrected data are in newhsat.

N9.2 Commands
The essential model command for the models described in this chapter are

PROBIT ; Lhs = dependent variable

or LOGIT ; Rhs = independent variables - not including one
; Panel
; ... specification of the panel data model $

As always, panels may be balanced or unbalanced. The panel is indicated with

SETPANEL ; Group = group identifier

; Pds = count variable to be created $
Thereafter,
; Panel

in the model command is sufficient to specify the panel setting. In circumstances where you have set
up the count variable yourself, you may also use the explicit declaration in the command:

; Pds = the fixed number of periods if the panel is balanced

; Pds = a variable which, within a group, repeats the number
of observations in the group

One or the other of these two specifications is required for the fixed and random effects estimators.

NOTE: For these estimators, you should not attempt to manage missing data. Just leave
observations with missing values in the sample. NLOGIT will automatically bypass the missing
values. Do not use SKIP, as it will undermine the setting of ; Pds = specification.
N9: Fixed and Random Effects Models for Binary Choice N-110

The estimator produces and saves the coefficient estimator, b and covariance matrix, varb, as usual.
Unless requested, the estimated fixed effects coefficients are not retained. (They are not reported
regardless.) To save the vector of fixed effects estimates, in a matrix named alphafe, add

; Parameters
to the command. The fixed effects estimators allow up to 100,000 groups. However, only up to
50,000 estimated constant terms may be saved in alphafe.

N9.3 Clustering, Stratification and Robust Covariance

Matrices
The robust estimator based on sample clustering and stratification is available for the
parametric binary choice models. Full details appear in Chapter R10 for the general case and
Section E27.5.2 for the parametric binary choice models of interest here. The option for clustering is
offered in the command builders for most of the nonlinear model and binary choice routines in the
Model Estimates submenu. This will differ a bit from model to model. The one for the probit
model is shown below in Figure N9.1. The Model Estimates dialog box is selected at the bottom of
the Output page, then the clustering is specified in the next dialog box.

Figure N9.1 Command Builder for a Probit Model

This sampling setup may be used with any of the binary choice estimators. Do note, however, you
should not use it with panel data models. The so called clustering corrections are already built into
the panel data estimators. (This is unlike the linear regression case, in which some authors argue that
the correction should be used even when fixed or random effects models are estimated.)
To illustrate, the following shows the setup for the panel data set described in the preceding
section. We have also artificially reduced the sample to 1,015 observations, 29 groups of 35
individuals, all of whom were observed seven times. The information below would appear with a
model command that used this configuration of the data to construct a robust covariance matrix.
N9: Fixed and Random Effects Models for Binary Choice N-111

The commands are:

SAMPLE ; 1-5000 $
REJECT ; _groupti < 7 $
NAMELIST ; x = age,educ,hhninc,hhkids,married $
PROBIT ; Lhs = doctor ; Rhs = one,x
; Cluster = 7
; Stratum = 35
; Describe $

These results appear before any results of the probit command. They are produced by the ; Describe
specification in the command.
========================================================================
Summary of Sample Configuration for Two Level Stratified Data
========================================================================
Stratum # Stratum Number Groups Group Sizes
Size (obs) Sample FPC. 1 2 3 ... Mean
========== ========== ============= =================================
1 35 5 1.0000 7 7 7 ... 7.0
2 35 5 1.0000 7 7 7 ... 7.0
(Rows 3 28 omitted)
29 35 5 1.0000 7 7 7 ... 7.0
+---------------------------------------------------------------------+
| Covariance matrix for the model is adjusted for data clustering. |
| Sample of 1015 observations contained 145 clusters defined by |
| 7 observations (fixed number) in each cluster. |
| Sample of 1015 observations contained 29 strata defined by |
| 35 observations (fixed number) in each stratum. |
+---------------------------------------------------------------------+
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable DOCTOR
Log likelihood function -621.15030
Restricted log likelihood -634.14416
Chi squared [ 5 d.f.] 25.98772
Significance level .00009
McFadden Pseudo R-squared .0204904
Estimation based on N = 1015, K = 6
Inf.Cr.AIC = 1254.301 AIC/N = 1.236
Hosmer-Lemeshow chi-squared = 18.58245
P-value= .01726 with deg.fr. = 8
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
Constant| .71039 2.41718 .29 .7688 -4.02720 5.44797
AGE| .00659 .03221 .20 .8378 -.05655 .06973
EDUC| -.05898 .14043 -.42 .6745 -.33421 .21625
HHNINC| -.13753 1.25599 -.11 .9128 -2.59921 2.32416
HHKIDS| -.11452 .56015 -.20 .8380 -1.21240 .98336
MARRIED| .29025 .82535 .35 .7251 -1.32741 1.90791
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N9: Fixed and Random Effects Models for Binary Choice N-112

N9.4 One and Two Way Fixed Effects Models

The fixed effects models are estimated by unconditional maximum likelihood. The
command for requesting the model is

PROBIT ; Lhs = dependent variable

or LOGIT ; Rhs = independent variables - not including one
; Panel
; Fixed Effects or ; FEM $

NOTE: Your Rhs list should not include a constant term, as the fixed effects model fits a complete
set of constants for the set of groups. If you do include one in your Rhs list, it is automatically
removed prior to beginning estimation.

Further documentation and technical details on fixed effects models for binary choice appear in
Chapter E30.
The fixed effects model assumes a group specific effect:

Prob[yit = 1] = F(xit + i)

where i is the parameter to be estimated. You may also fit a two way fixed effects model

Prob[yit = 1] = F(xit + i + t)

where t is an additional, time (period) specific effect. The time specific effect is requested by
adding
; Time

to the command if the panel is balanced, and

; Time = variable name

if the panel is unbalanced. For the unbalanced panel, we assume that overall, the sample
observation period is
t = 1,2,..., T

and that the Time variable gives for the specific group, the particular values of t that apply to the
observations. Thus, suppose your overall sample is five periods. The first group is three
observations, periods 1, 2, 4, while the second group is four observations, 2, 3, 4, 5. Then, your
panel specification would be

; Pds = Ti, for example, where Ti = (3, 3, 3), (4, 4, 4, 4)

and ; Time = Pd, for example, where Pd = (1, 2, 4), (2, 3, 4, 5).
N9: Fixed and Random Effects Models for Binary Choice N-113

Results that are kept for this model are

Matrices: b = estimate of
varb = asymptotic covariance matrix for estimate of .
alphafe = estimated fixed effects if the command contains ; Parameters

Scalars: kreg = number of variables in Rhs

nreg = number of observations
logl = log likelihood function

Last Model: b_variables

Last Function: None

The upper limit on the number of groups is 100,000. Partial effects are computed locally with
; Partial Effects in the command. The post estimation PARTIAL EFFECTS command does not
have the set of constant terms, some of which are infinite, so the probabilities cannot be computed.

Application

The gender and kids present dummy variables are time invariant and are omitted from the
model. Nonlinear models are like linear models in that time invariant variables will prevent
estimation. This is not due to the within transformation producing columns of zeros. The within
transformation of the data is not used for nonlinear models. A similar effect does arise in the
derivatives of the log likelihood, however, which will halt estimation because of a singular Hessian.
The results of fitting models with no fixed effects, with the person specific effects and with
both person and time effects are listed below. The results are partially reordered to enable
comparison of the results, and some of the results from the pooled estimator are omitted.

SAMPLE ; All $
SETPANEL ; Group = id ; Pds = ti $
NAMELIST ; x = age,educ,hhninc,newhsat $
PROBIT ; Lhs = doctor ; Rhs = x,one
; Partial Effects $
PROBIT ; Lhs = doctor ; Rhs = x
; FEM
; Panel
; Parameters
; Partial Effects $
PROBIT ; Lhs = doctor ; Rhs = x
; FEM
; Panel
; Time Effects
; Parameters
; Partial Effects $
N9: Fixed and Random Effects Models for Binary Choice N-114

These are the results for the pooled data without fixed effects.

-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable DOCTOR
Log likelihood function -16639.23971
Restricted log likelihood -18019.55173
Chi squared [ 4 d.f.] 2760.62404
Significance level .00000
McFadden Pseudo R-squared .0766008
Estimation based on N = 27326, K = 5
Inf.Cr.AIC =33288.479 AIC/N = 1.218
Hosmer-Lemeshow chi-squared = 20.51061
P-value= .00857 with deg.fr. = 8
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
AGE| .00856*** .00074 11.57 .0000 .00711 .01001
EDUC| -.01540*** .00358 -4.30 .0000 -.02241 -.00838
HHNINC| -.00668 .04657 -.14 .8859 -.09795 .08458
NEWHSAT| -.17499*** .00396 -44.21 .0000 -.18275 -.16723
Constant| 1.35879*** .06243 21.77 .0000 1.23644 1.48114
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

These are the estimates for the one way fixed effects model.

-----------------------------------------------------------------------------
FIXED EFFECTS Probit Model
Dependent variable DOCTOR
Log likelihood function -9187.45120
Estimation based on N = 27326, K =4251
Inf.Cr.AIC =26876.902 AIC/N = .984
Unbalanced panel has 7293 individuals
Skipped 3046 groups with inestimable ai
PROBIT (normal) probability model
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
AGE| .04701*** .00438 10.74 .0000 .03844 .05559
EDUC| -.07187* .04111 -1.75 .0804 -.15244 .00870
HHNINC| .04883 .10782 .45 .6506 -.16249 .26015
NEWHSAT| -.18143*** .00805 -22.53 .0000 -.19721 -.16564
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N9: Fixed and Random Effects Models for Binary Choice N-115

Figure N9.2 Estimated Fixed Effects

Note that the results report that 3046 groups had inestimable fixed effects. These are
individuals for which the Lhs variable, doctor, was the same in every period, including 1525 groups
with Ti = 1. If there is no within group variation in the dependent variable for a group, then the fixed
effect for that group cannot be estimated, and the group must be dropped from the sample. The
; Parameters specification requests that the estimates of i be kept in a matrix, alphafe. Groups for
which i is not estimated are filled with the value -1.E20 if yit is always zero and +1.E20 if yit is
always one, as shown above.
The log likelihood function has increased from -16,639.24 to -9187.45 in computing the fixed
effects model. The chi squared statistic is twice the difference, or 14,903.57. This would far exceed
the critical value for 95% significance, so at least at first take, it would seem that the hypothesis of no
fixed effects should be rejected. There are two reasons why this test would be invalid. First, because
of the incidental parameters issue, the fixed effects estimator is inconsistent. As such, the statistic just
computed does not have precisely a chi squared distribution, even in large samples. Second, the fixed
effects estimator is based on a reduced sample. If the test were valid otherwise, it would have to be
based on the same data set. This can be accomplished by using the commands

CREATE ; meandr = Group Mean(doctor, Str = id) $

REJECT ; meandr < .1 | meandr > .9 $
PROBIT ; Lhs = doctor ; Rhs = one,x $
N9: Fixed and Random Effects Models for Binary Choice N-116

(The mean value must be greater than zero and less than one. For groups of seven, it can be as high as
6/7 = .86.) Using the reduced sample, the log likelihood for the pooled sample would be -10,852.71.
The chi squared is 11,573.31 which is still extremely large. But, again, the statistic does not have the
large sample chi squared distribution that allows a formal test. It is a rough guide to the results, but not
precise as a formal rule for building the model.
In order to compute marginal effects, it is necessary to compute the index function, which
does require an i. The mean of the estimated values is used for the computation. The results for the
pooled data are shown for comparison below the fixed effects results.

These are the partial effects for the fixed effects model.
-----------------------------------------------------------------------------
Partial derivatives of E[y] = F[*] with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Estimated E[y|means,mean alphai]= .625
Estimated scale factor for dE/dx= .379
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .01783*** 1.22903 6.39 .0000 .01237 .02330
EDUC| -.02726 -.49559 -1.40 .1628 -.06554 .01102
HHNINC| .01852 .01048 .45 .6542 -.06253 .09957
NEWHSAT| -.06882*** -.77347 -5.96 .0000 -.09144 -.04619
--------+--------------------------------------------------------------------
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

These are the partial effects for the pooled model.

-----------------------------------------------------------------------------
Partial derivatives of E[y] = F[*] with
respect to the vector of characteristics
Average partial effects for sample obs.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00297*** .20554 11.66 .0000 .00247 .00347
EDUC| -.00534*** -.09618 -4.30 .0000 -.00778 -.00291
HHNINC| -.00232 -.00130 -.14 .8859 -.03401 .02937
NEWHSAT| -.06075*** -.65528 -49.40 .0000 -.06316 -.05834
--------+--------------------------------------------------------------------
# Partial effect for dummy variable is E[y|x,d=1] - E[y|x,d=0]
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N9: Fixed and Random Effects Models for Binary Choice N-117

These are the two way fixed effects estimates. The time effects, which are usually few in number,
are shown in the model results, unlike the group effects.

-----------------------------------------------------------------------------
FIXED EFFECTS Probit Model
Dependent variable DOCTOR
Log likelihood function -9175.69958
Estimation based on N = 27326, K =4257
Inf.Cr.AIC =26865.399 AIC/N = .983
Model estimated: Jun 15, 2011, 11:00:11
Unbalanced panel has 7293 individuals
Skipped 3046 groups with inestimable ai
No. of period specific effects= 6
PROBIT (normal) probability model
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
AGE| .03869*** .01310 2.95 .0031 .01301 .06437
EDUC| -.07985* .04130 -1.93 .0532 -.16080 .00109
HHNINC| .05329 .10807 .49 .6219 -.15852 .26510
NEWHSAT| -.18090*** .00806 -22.44 .0000 -.19670 -.16510
Period1| -.08649 .15610 -.55 .5795 -.39244 .21946
Period2| -.00782 .13926 -.06 .9552 -.28076 .26513
Period3| .08766 .12423 .71 .4804 -.15583 .33116
Period4| .03048 .10907 .28 .7799 -.18330 .24425
Period5| -.02437 .09372 -.26 .7948 -.20807 .15932
Period6| .05075 .07761 .65 .5131 -.10136 .20287
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
Partial derivatives of E[y] = F[*] with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Estimated E[y|means,mean alphai]= .625
Estimated scale factor for dE/dx= .379
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .01467*** 1.01123 4.35 .0000 .00806 .02129
EDUC| -.03029 -.55056 -1.49 .1370 -.07021 .00964
HHNINC| .02021 .01144 .48 .6289 -.06176 .10218
NEWHSAT| -.06861*** -.77109 -4.34 .0000 -.09962 -.03761
--------+--------------------------------------------------------------------
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N9: Fixed and Random Effects Models for Binary Choice N-118

N9.5 Conditional MLE of the Fixed Effects Logit Model

Two nonlinear models, the binomial logit and Poisson regression can be estimated by
conditional maximum likelihood. This is a specialized approach that was devised to deal with the
problem of large numbers of incidental parameters discussed in the preceding section. (This model
was studied, among others, by Chamberlain (1980).) The log likelihood for the binomial logit model
with fixed effects is

i 1=
t i 1 log ( 2 yit 1) (xit + i )
N T
logL = =

The first term, 2yit - 1, makes the sign negative for yit = 0 and positive for yit = 1, and (.) is the
logistic probability, (z) = 1/[1 + exp(-z)]. Direct maximization of this log likelihood involves
estimation of N+K parameters, where N is the number of groups. As N may be extremely large, this
is a potentially difficult estimation problem. As we saw in the preceding section, direct estimation
with up to 100,000 coefficients is feasible. But, the method discussed here is not restricted the
number of groups is unlimited because the fixed effects coefficients are not estimated. Rather, the
fixed effects are conditioned out of the log likelihood. The main appeal of this approach, however, is
that whereas the brute force estimator of the preceding section is subject to the incidental parameters
bias, the conditional estimator is not; it is consistent even for small T (even for T = 2).
The contribution to the likelihood function of the Ti observations for group i can be
conditioned on the sum of the observed outcomes to produce the conditional log likelihood,
Ti

t =1
exp[ yit xit ]
Lc = Ti

all arrangements of Ti outcomes with the same sum

s =1
exp[ yisxis ]

exp t =i 1 yit xit

= .
all arrangements of Ti outcomes with the same sum exp si=1 disxis
T

This function can be maximized with respect to the slope parameters, , with no need to estimate the
fixed effects parameters. The number of terms in the denominator of the probability may be
Ti
exceedingly large, as it is the sum of T* terms where T* is equal to the binomial coefficient and
Si
Si is the sum of the binary outcomes for the ith group. This can be extremely large. The computation
of the denominator is accomplished by means of a recursion presented in Krailo and Pike (1984). Let
the denominator be denoted A(Ti,Si). The authors show that for any T and S the function obeys the
recursion
A(T,S) = A(T-1,S) + exp(xiT)A(T-1,S-1)

with initial conditions A(T,s) = 0 if T < s and A(T,0) = 1.

N9: Fixed and Random Effects Models for Binary Choice N-119

This enables rapid computation of the denominator for Ti up to 200 which is the internal limit. (If
your model is this large, expect this computation to be quite time consuming. Although 200 periods
(or more) is technically feasible, the number of terms rises geometrically in Ti, and more than 20 or
30 or so is likely to test the limits of the program (as well as your patience). Note, as well that when
the sum the observations is zero or Ti, the conditional probability is one, since there is only a single
way that each of these can occur. Thus, groups with sums of zero or Ti fall out of the computation.
Estimation of this model is done with Newtons method. When the data set is rich enough
both in terms of variation in xit and in Si, convergence will be quick and simple.

N9.5.1 Command
The command for estimation of the model by this method is

LOGIT ; Lhs = dependent variable

; Rhs = dependent variables (do not include one)
; Pds = fixed number of periods or variable for group sizes $

NOTE: You must omit the ; FEM from the logit command. This is the default panel data estimator
for the binary logit model. Use ; Fixed Effects or ; FEM to request the unconditional estimator
discussed in the previous section.

You may use weights with this estimator. Presumably, these would reflect replications of
the observations. Be sure that the weighting variable takes the same value for all observations within
a group. The specification would be

; Wts = variable, Noscale

The Noscaling option should be used here if the weights are replication factors. If not, then do be
aware that the scaling will make the weights sum to the sample size, not the number of groups.
Results that are retained with this estimator are the usual ones from estimation:

Matrices: b = estimate of
varb = asymptotic covariance matrix for estimate of

Scalars: kreg = number of variables in Rhs

nreg = number of observations
logl = log likelihood function

Last Model: b_variables

Last Function: None

N9: Fixed and Random Effects Models for Binary Choice N-120

N9.5.2 Application
The following will fit the binary logit model using the two methods noted. Bear in mind that
with Ti < 7, the unconditional estimator is inconsistent and in fact likely to be substantially biased.
The conditional estimator is consistent. Based on the simulation results cited earlier, the second
results should exceed the first by roughly 40%. Partial effects are shown as well.

NAMELIST ; x = age,educ,hhninc,newhsat $
LOGIT ; Lhs = doctor ; Rhs = x,one $
LOGIT ; Lhs = doctor ; Rhs = x
; Panel $ (Chamberlain conditional estimator)
LOGIT ; Lhs = doctor ; Rhs = x
; Panel ; FEM $ (unconditional estimator)

These are the pooled estimates.

-----------------------------------------------------------------------------
Binary Logit Model for Binary Choice
Dependent variable DOCTOR
Log likelihood function -16639.86860
Restricted log likelihood -18019.55173
Chi squared [ 4 d.f.] 2759.36627
Significance level .00000
McFadden Pseudo R-squared .0765659
Estimation based on N = 27326, K = 5
Inf.Cr.AIC =33289.737 AIC/N = 1.218
Hosmer-Lemeshow chi-squared = 23.04975
P-value= .00330 with deg.fr. = 8
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
AGE| .01366*** .00121 11.26 .0000 .01128 .01604
EDUC| -.02604*** .00585 -4.45 .0000 -.03750 -.01458
HHNINC| -.01231 .07670 -.16 .8725 -.16264 .13801
NEWHSAT| -.29181*** .00681 -42.86 .0000 -.30515 -.27846
Constant| 2.28922*** .10379 22.06 .0000 2.08580 2.49265
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

These are the conditional maximum likelihood estimates followed by the unconditional fixed effects
estimates. For these data, the unconditional estimates are closer to the conditional ones than might
have been expected, but still noticeably higher as the received results would predict. The suggested
proportionality result also seems to be operating, but with an unbalanced panel, this would not
necessarily occur, and should not be used as any kind of firm rule (save, perhaps for the case of Ti = 2).
+--------------------------------------------------+
| Panel Data Binomial Logit Model |
| Number of individuals = 7293 |
| Number of periods =TI |
| Conditioning event is the sum of DOCTOR |
+--------------------------------------------------+
N9: Fixed and Random Effects Models for Binary Choice N-121

-----------------------------------------------------------------------------
Logit Model for Panel Data
Dependent variable DOCTOR
Log likelihood function -6092.58175
Estimation based on N = 27326, K = 4
Inf.Cr.AIC =12193.164 AIC/N = .446
Hosmer-Lemeshow chi-squared = *********
P-value= .00000 with deg.fr. = 8
Fixed Effect Logit Model for Panel Data
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .06391*** .00659 9.70 .0000 .05100 .07683
EDUC| -.09127 .05752 -1.59 .1126 -.20401 .02147
HHNINC| .06121 .16058 .38 .7031 -.25352 .37594
NEWHSAT| -.23717*** .01208 -19.63 .0000 -.26086 -.21349
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
FIXED EFFECTS Logit Model
Dependent variable DOCTOR
Log likelihood function -9279.06752
Estimation based on N = 27326, K =4251
Inf.Cr.AIC =27060.135 AIC/N = .990
Unbalanced panel has 7293 individuals
Skipped 3046 groups with inestimable ai
LOGIT (Logistic) probability model
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
AGE| .07925*** .00738 10.74 .0000 .06479 .09372
EDUC| -.11803* .06779 -1.74 .0817 -.25090 .01484
HHNINC| .07814 .18102 .43 .6660 -.27665 .43294
NEWHSAT| -.30367*** .01376 -22.07 .0000 -.33064 -.27670
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.

When the panel is balanced, the estimator also produces a frequency count for the
conditioning sums. For example, if we restrict our sample to the individuals who are in the sample
for all seven periods, the following table will also appear with the results.
+--------------------------------------------------+
| Panel Data Binomial Logit Model |
| Number of individuals = 887 |
| Number of periods = 7 |
| Conditioning event is the sum of DOCTOR |
| Distribution of sums over the 7 periods: |
| Sum 0 1 2 3 4 5 6 |
| Number 48 73 82 100 115 116 151 |
| Pct. 5.41 8.23 9.24 11.27 12.97 13.08 17.02 |
| Sum 7 8 9 10 11 12 13 |
| Number 202 0 0 0 0 0 0 |
| Pct. 22.77 .00 .00 .00 .00 .00 .00 |
+--------------------------------------------------+

This count would be meaningless in an unbalanced panel, so it is omitted.

N9: Fixed and Random Effects Models for Binary Choice N-122

How should you choose which estimator to use? We should note that the two approaches
will generally give different numerical answers. The conditional and unconditional log likelihoods
are different. In general, you should use the conditional estimator if T is not relatively large. The
conditional estimator is less efficient by construction, but consistency trumps efficiency at this level.
In addition, if you have more than 100,000 groups, you must use the conditional estimator. If, on the
other hand, T is larger than, say, 10, and N is less than 100,000, then the unconditional estimator
might be preferred. The additional consideration discussed in the next section might also weigh in
favor of the unconditional estimator.

N9.5.3 Estimating the Individual Constant Terms

The conditional fixed effects estimator for the logit model specifically eliminates the fixed
effects, so they are not directly estimated. Without them, however, the parameter estimates are of
relatively little use. Fitted probabilities and marginal effects will both require some estimate of a
constant term. You can request post estimation computation of the fixed effects by using the
specification
; Parameters
This saves a matrix named alphafe in your matrix work area. This will be a vector with number of
elements equal to the number of groups, containing an ad hoc estimate of i for the groups for which
there is within group variation in yit. We note how this is done. The logit model is

Prob[yit = 1|xit] = (xit + i) where (z) = exp(z)/[1+exp(z)]

After estimation of , we treat the xit part of this as known, and let zit = xit. These are now just
data. As such, the log likelihood for group i would be

log Li = t log [(2yit 1)(zit + i)]

The likelihood equation for i would be

t (yit Pit) = 0 where Pit = (zit + i)

The implicit solution for i is given by

t yit = t wit / (ai + wit) where wit = exp(zit) and ai = exp(-i).

If yit is always zero or always one in every period, t, then there is no solution to maximizing this
function. The corresponding element of alphafe will be set equal to -1.d20 or +1.d20 But, if the yits
differ, then the i that equates the left and right hand sides can be found by a straightforward search.
The remaining rows of alphafe will contain the individual specific solutions to these equations.
(This is the method that Heckman and MaCurdy (1980) suggested for estimation of the fixed effects
probit model.)
We emphasize, this is not the maximum likelihood estimator of i because the conditional
estimator of is not the unconditional MLE. Nor, in fact, is it consistent in N. It is consistent in Ti,
but that is not helpful here since Ti is fixed, and presumably small. This estimator is a means to an
end. The estimated marginal effects can be based on this estimator it will give a reasonable
estimator of an overall average of the constant terms, which is all that is needed for the marginal
effects. Individual predicted probabilities remain ambiguous.
N9: Fixed and Random Effects Models for Binary Choice N-123

N9.5.4 A Hausman Test for Fixed Effects in the Logit Model

The fixed effects estimator is illustrated with the data used in the preceding examples: Note
that the first estimator is the pooled estimator. Under the alternative hypothesis of fixed effects, it is
inconsistent. Under the null, it is consistent and efficient. The second estimator is the conditional MLE
and the third one is the unconditional fixed effects estimator. The unconditional fixed estimator cannot
be used for formal testing because of the incidental parameters problem it is inconsistent. The pooled
estimator and the conditional fixed effects estimator use different samples, so the likelihoods are not
comparable. Therefore, testing for the joint significance of the effects is problematic for the
conditional estimator. What one can do is use a Hausman test. The test is constructed as follows:
H0: There are no fixed effects; unconditional ML estimators are b0 and V0
H1: There are fixed effects: conditional ML estimators are b1 and V1
Under H0, b0 is consistent and efficient, while b1 is consistent but inefficient. Under H1, b0 is
inconsistent while b1 is consistent and efficient. The Hausman statistic would therefore be
H = (b1 - b0) [V1 - V0]-1(b1 - b0)
The statistic can be constructed as follows:
NAMELIST ; x = the independent variables, not including one $
LOGIT ; Lhs = ... ; Rhs = x, one $
CALC ; k = Col(x) $
MATRIX ; b0 = b(1:k) ; v0 = varb(1:k,1:k) $
LOGIT ; Lhs = ... ; Rhs = x ; Pds = ... ; FEM $
MATRIX ; b1 = b ; v1 = varb $
MATRIX ; d = b1 - b0 ; List ; h = d * Nvsm(v1, -v0) * d $
We apply this to our innovation data by defining x = imprtshr,fdishare,logsales,relsize,prod and the
dependent variable is innov. The remaining commands are generic.
The three sets of parameter estimates were given earlier. The Hausman statistic using the
procedure suggested above is
SAMPLE ; All $
SETPANEL ; Group = id ; Pds = ti $
NAMELIST ; x = age,educ,hhninc,newhsat $
LOGIT ; Lhs = doctor ; Rhs = x, one $
CALC ; k = Col(x) $
MATRIX ; b0 = b(1:k) ; v0 = Varb(1:k,1:k) $
LOGIT ; Lhs = doctor ; Rhs = x ; Panel $
MATRIX ; b1 = b ; v1 = varb $
MATRIX ; d = b1 - b0 ; List ; h = d' * Nvsm(v1, -v0) * d $
The final result of the MATRIX command is
H| 1
--------+--------------
1| 98.1550

This statistic has four degrees of freedom. The critical value from the chi squared table is 9.49, so
based on this test, we would reject the null hypothesis of no fixed effects.
N9: Fixed and Random Effects Models for Binary Choice N-124

N9.6 Random Effects Models for Binary Choice

The five models we have developed here can also be fit with random effects instead of fixed
effects. The structure of the random effects model is

zit | ui = xit + it + ui

where ui is the unobserved heterogeneity for the ith individual,

ui ~ N[0,u2],

and it is the stochastic term in the model that provides the conditional distribution.

Prob[yit = 1| xit, ui] = F(xit + ui), i = 1,...,N, t = 1,...,Ti.

where F(.) is the distribution discussed earlier (normal, logistic, extreme value, Gompertz). Note
that the unobserved heterogeneity, ui is the same in every period. The parameters of the model are
fit by maximum likelihood. As usual in binary choice models, the underlying variance,

2 = u2 + 2

is not identified. The reduced form parameter,

u2
= ,
2 + u2

is estimated directly. With the normalization that we used earlier, 2 = 1, we can determine

u = .
1

Further discussion of the estimation of these structural parameters appears at the end of this section.
The model command for this form of the model is

PROBIT ; Lhs = dependent variable

or LOGIT ; Rhs = independent variables - not including one
; Panel
; Random Effects $

NOTE: For this model, your Rhs list should include a constant term, one.
N9: Fixed and Random Effects Models for Binary Choice N-125

Partial effects are computed by setting the heterogeneity term, ui to its expected value of zero.
Restrictions may be tested and imposed exactly as in the model with no heterogeneity. Since
restrictions can be imposed on all parameters, including , you can fix the value of at any desired
value. Do note that forcing the ancillary parameter, in this case, , to equal a slope parameter will
almost surely produce unsatisfactory results, and may impede or even prevent convergence of the
iterations.
Starting values for the iterations are obtained by fitting the basic model without random
effects. Thus, the initial results in the output for these models will be the binary choice models
discussed in the preceding sections. You may provide your own starting values for the parameters
with
; Start = ... the list of values for , value for

There is no natural moment based estimator for , so a relatively low guess is used as the starting
value instead. The starting value for is approximately .2 ( = [2/(1-)]1/2 .29 see the technical
details below. Maximum likelihood estimates are then computed and reported, along with the usual
diagnostic statistics. (An example appears below.) This model is fit by approximating the necessary
integrals in the log likelihood function by Hermite quadrature. An alternative approach to estimating
the same model is by Monte Carlo simulation. You can do exactly this by fitting the model as a
random parameters model with only a random constant term.
Your data might not be consistent with the random effects model. That is, there might be no
discernible evidence of random effects in your data. In this case, the estimate of will turn out to be
negligible. If so, the estimation program issues a diagnostic and reverts back to the original,
uncorrelated formulation and reports (again) the results for the basic model.
Results that are kept for this model are

Matrices: b = estimate of
varb = asymptotic covariance matrix for estimate of

Scalars: kreg = number of variables in Rhs

nreg = number of observations
logl = log likelihood function
rho = estimated value of
varrho = estimated asymptotic variance of estimator of

Last Model: b_variables, ru

Last Function: Prob(y = 1|x,u=0) (Note: None if you use ; RPM to fit the RE model.)

The additional specification ; Par in the command requests that be included in b and the additional
row and column corresponding to be included in varb. If you have included ; Par, rho and varrho
will also appear at the appropriate places in b and varb.

NOTE: The hypothesis of no group effects can be tested with a Wald test (simple t test) or with a
likelihood ratio test. The LM approach, using ; Maxit = 0 with a zero starting value for does not
work in this setting because with = 0, the last row of the covariance matrix turns out to contain
zeros.
N9: Fixed and Random Effects Models for Binary Choice N-126

Application

The following study fits the probit model under four sets of assumptions. The first uses the
pooled estimator, then corrects the standard errors for the clustering in the data. The second is the
unconditional fixed effects estimator. The third and fourth compute the random effects estimator,
first by quadrature, using the Butler and Moffitt method and the second using maximum simulated
likelihood with Halton draws. The output is trimmed in each model to compare only the estimates
and the marginal effects.

NAMELIST ; x = age,educ,hhninc,newhsat $
SAMPLE ; All $
SETPANEL ; Group = id ; Pds = ti $
PROBIT ; Lhs = doctor ; Rhs = x,one ; Partial Effects
; Cluster = id $
PROBIT ; Lhs = doctor ; Rhs = x ; Partial Effects
; Panel ; FEM $
PROBIT ; Lhs = doctor ; Rhs = x,one ; Partial Effects
; Panel ; Random Effects $

The random parameters model described in Chapter E31 provides an alternative estimator for the
random effects model based on maximum simulated likelihood rather than with Hermite quadrature.
The general syntax is used below for a probit model to illustrate the method.

PROBIT ; Lhs = doctor ; Rhs = x,one ; Partial Effects

; Panel ; RPM ; Fcn = one(n) ; Pts = 25 ; Halton $
CALC ; List ; b(6)^2/(1+b(6)^2) $

These are the pooled estimates with corrected standard errors.

+---------------------------------------------------------------------+
| Covariance matrix for the model is adjusted for data clustering. |
| Sample of 27326 observations contained 7293 clusters defined by |
| variable ID which identifies by a value a cluster ID. |
+---------------------------------------------------------------------+
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable DOCTOR
Log likelihood function -16639.23971
Restricted log likelihood -18019.55173
Chi squared [ 4 d.f.] 2760.62404
Significance level .00000
McFadden Pseudo R-squared .0766008
Estimation based on N = 27326, K = 5
Inf.Cr.AIC =33288.479 AIC/N = 1.218
Hosmer-Lemeshow chi-squared = 20.51061
P-value= .00857 with deg.fr. = 8
-----------------------------------------------------------------------------
N9: Fixed and Random Effects Models for Binary Choice N-127

The unconditional fixed effects estimates appear next. They differ greatly from the pooled estimates.
It is worth noting that under the random effects assumption, neither the pooled nor these fixed effects
estimates are consistent.

-----------------------------------------------------------------------------
FIXED EFFECTS Probit Model
Dependent variable DOCTOR
Log likelihood function -9187.45120
Estimation based on N = 27326, K =4251
Inf.Cr.AIC =26876.902 AIC/N = .984
Model estimated: Jun 15, 2011, 14:02:10
Unbalanced panel has 7293 individuals
Skipped 3046 groups with inestimable ai
PROBIT (normal) probability model
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
AGE| .04701*** .00438 10.74 .0000 .03844 .05559
EDUC| -.07187* .04111 -1.75 .0804 -.15244 .00870
HHNINC| .04883 .10782 .45 .6506 -.16249 .26015
NEWHSAT| -.18143*** .00805 -22.53 .0000 -.19721 -.16564
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

These are the random effects estimates. The variance of u and correlation parameter are given
explicitly in the results. In the MSL random effects estimates that appear next, only the standard
deviation of u is given. Squaring the 1.37554428 gives 1.892122, which is nearly the same as the
1.888060 given in the first results. In order to compare the first estimates to the MSL estimates, it is
necessary to divide the first by the estimate of 1+. Thus, the scaled coefficient on age in the first
set of estimates would be 0.019322; that on educ would be -.027611, and so on. Thus, the two sets of
estimates are quite similar.
N9: Fixed and Random Effects Models for Binary Choice N-128

-----------------------------------------------------------------------------
Random Effects Binary Probit Model
Dependent variable DOCTOR
Log likelihood function -15614.50229
Restricted log likelihood -16639.23971
Chi squared [ 1 d.f.] 2049.47485
Significance level .00000
McFadden Pseudo R-squared .0615856
Estimation based on N = 27326, K = 6
Inf.Cr.AIC =31241.005 AIC/N = 1.143
Unbalanced panel has 7293 individuals
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .01305*** .00119 10.97 .0000 .01072 .01538
EDUC| -.01840*** .00594 -3.10 .0020 -.03005 -.00675
HHNINC| .06299 .06387 .99 .3240 -.06218 .18817
NEWHSAT| -.19418*** .00520 -37.32 .0000 -.20437 -.18398
Constant| 1.42666*** .09644 14.79 .0000 1.23765 1.61567
Rho| .39553*** .01045 37.84 .0000 .37504 .41601
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
Random Coefficients Probit Model
Dependent variable DOCTOR
Log likelihood function -15619.14356
Restricted log likelihood -16639.23971
Chi squared [ 1 d.f.] 2040.19230
Significance level .00000
McFadden Pseudo R-squared .0613067
Estimation based on N = 27326, K = 6
Inf.Cr.AIC =31250.287 AIC/N = 1.144
Model estimated: Jun 15, 2011, 14:04:01
Unbalanced panel has 7293 individuals
PROBIT (normal) probability model
Simulation based on 25 Halton draws
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Nonrandom parameters
AGE| .01288*** .00083 15.58 .0000 .01126 .01450
EDUC| -.01823*** .00395 -4.61 .0000 -.02598 -.01048
HHNINC| .06741 .05108 1.32 .1870 -.03271 .16752
NEWHSAT| -.19383*** .00435 -44.58 .0000 -.20235 -.18531
|Means for random parameters
Constant| 1.42554*** .06828 20.88 .0000 1.29172 1.55936
|Scale parameters for dists. of random parameters
Constant| .80930*** .01088 74.38 .0000 .78797 .83062
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N9: Fixed and Random Effects Models for Binary Choice N-129

The random parameters approach provides an alternative way to estimate a random effects
model. A comparison of the two sets of results illustrates the general result that both are consistent
estimators of the same parameters. We note, however, the Hermite quadrature approach produces an
estimator of = u2/(1 + u2) while the RP approach produces an estimator of u. To check the
consistency of the two approaches, we compute an estimate of based on the RP results. The result
below demonstrates the near equivalence of the two approaches.

CALC ; List ; b(6)^2/(1+b(6)^2)$

[CALC] *Result*= .3957574

These are the four sets of estimated partial effects.

Pooled
-----------------------------------------------------------------------------
Average partial effects for sample obs.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00297*** .20554 8.83 .0000 .00231 .00363
EDUC| -.00534*** -.09618 -3.09 .0020 -.00874 -.00195
HHNINC| -.00232 -.00130 -.12 .9058 -.04074 .03610
NEWHSAT| -.06075*** -.65528 -39.87 .0000 -.06374 -.05777
--------+--------------------------------------------------------------------
Unconditional Fixed Effects
-----------------------------------------------------------------------------
Partial derivatives of E[y] = F[*]
Estimated E[y|means,mean alphai]= .625
Estimated scale factor for dE/dx= .379
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .01783*** 1.22903 6.39 .0000 .01237 .02330
EDUC| -.02726 -.49559 -1.40 .1628 -.06554 .01102
HHNINC| .01852 .01048 .45 .6542 -.06253 .09957
NEWHSAT| -.06882*** -.77347 -5.96 .0000 -.09144 -.04619
--------+--------------------------------------------------------------------
Random Effects
-----------------------------------------------------------------------------
Partial derivatives of E[y] = F[*]
Observations used for means are All Obs.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00376*** .25254 11.06 .0000 .00310 .00443
EDUC| -.00531*** -.09261 -3.10 .0020 -.00866 -.00195
HHNINC| .01817 .00986 .99 .3239 -.01793 .05426
NEWHSAT| -.05600*** -.58577 -37.33 .0000 -.05894 -.05306
--------+--------------------------------------------------------------------
N9: Fixed and Random Effects Models for Binary Choice N-130

Random Constant Term

-----------------------------------------------------------------------------
Partial derivatives of expected val. with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Scale Factor for Marginal Effects .3541
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00456*** .28882 11.14 .0000 .00376 .00536
EDUC| -.00646*** -.10635 -5.06 .0000 -.00896 -.00396
HHNINC| .02387 .01223 1.32 .1882 -.01168 .05942
NEWHSAT| -.06864*** -.67771 -33.24 .0000 -.07269 -.06459
--------+--------------------------------------------------------------------
N10: Random Parameter Models for Binary Choice N-131

N10: Random Parameter Models for Binary

Choice
N10.1 Introduction
The probit and logit models are extended to panel data formats as internal procedures. Four
classes of models are supported:

Fixed effects: Prob[yit = 1] = F(xit + i),

i correlated with xit,

Random effects: Prob[yit = 1] = Prob[xit + it + ui > 0],

ui uncorrelated with xit,

Random parameters: Prob[yit = 1] = F( ixit),

i | i ~ h(|i) with mean vector and covariance matrix

Latent class: Prob[yit = 1|class j] = F( jxit),

Prob[class = j] = Fj()

The first two were developed in Chapter E30. This chapter documents the use of random parameters
(mixed) and latent class models for binary choice. Technical details on estimation of random
parameters are given in Chapter R24. Technical details for estimation of latent class models are
given in Chapter R25.

NOTE: None of these panel data models require balanced panels. The group sizes may always vary.

The random parameters and latent class models do not require panel data. You may fit them with a
cross section. If you omit ; Pds and ; Panel in these cases, the cross section case, Ti = 1, is assumed.
(You can also specify ; Pds = 1.) Note that this group of models (and all of the panel data models
described in the rest of this manual) does not use the ; Str = variable specification for indicating the
panel that is only for REGRESS.
The probabilities and density functions supported here are as follows:

Probit

'x i exp(t 2 / 2)
F= 2
dt = (xi), f = (xi)

Logit
exp(xi )
F= = (xi), f = (xi)[1 - (xi)]
1 + exp(xi )
N10: Random Parameter Models for Binary Choice N-132

N10.2 Probit and Logit Models with Random Parameters

We have extended the random parameters model to the binary choice models as well as
many other models including the tobit and exponential regression models. Some of the relevant
background literature includes Revelt and Train (1998), Train (1998), Brownstone and Train (1999),
and Greene (2001). (In that literature, the models are described under the heading mixed logit
models. We will require a broader rubric for our purposes.) The structure of the random parameters
model is based on the conditional probability

Prob[yit = 1| xit, i] = F(ixit), i = 1,...,N, t = 1,...,Ti.

where F(.) is the distribution discussed earlier (normal, logistic, extreme value, Gompertz). The model
assumes that parameters are randomly distributed with possibly heterogeneous (across individuals)

E[ i| zi] = + zi,

(the second term is optional the mean may be constant),

Var[ i| zi] = .

The model is operationalized by writing

i = + zi + vi where vi ~ N[0,I].

As noted earlier, the heterogeneity term is optional. In addition, it may be assumed that some of the
parameters are nonrandom. It is convenient to analyze the model in this fully general form here.
One can easily accommodate nonrandom parameters just by placing rows of zeros in the appropriate
places in and . The command structure for these models makes this simple to do.

NOTE: If there is no heterogeneity in the mean, and only the constant term is considered random
the model may specify that some parameters are nonrandom then this model is equivalent to the
random effects model of the preceding section.

N10.2.1 Command for the Random Parameters Models

The basic model command for this form of the model is

PROBIT ; Lhs = dependent variable

or LOGIT ; Rhs = independent variables
; Panel or Pds = fixed periods or count variable
; RPM
; Fcn = random parameters specification $

NOTE: For this model, your Rhs list should include a constant term.

NOTE: The ; Pds specification is optional. You may fit these models with cross section data.
N10: Random Parameter Models for Binary Choice N-133

Specifying Random Parameters

The ; Fcn = specification is used to define the random parameters. It is constructed from
the list of Rhs names as follows: Suppose your model is specified by

; Rhs = one, x1, x2, x3, x4

This involves five coefficients. Any or all of them may be random; any not specified as random are
assumed to be constant. For those that you wish to specify as random, use

; Fcn = variable name (distribution),

variable name (distribution), ...

Three distributions may be specified. All random variables have mean 0.

n = standard normal distribution, variance = 1,

t = triangular (tent shaped) distribution in [-1,+1], variance = 1/6,
u = standard uniform distribution [-1,1], variance = 1/3,
l = lognormal distribution, variance = exp(.5),
o = tent shaped distribution with one anchor at zero
g = log gamma
or c = variance = 0. (The parameter is not random.)

Each of these is scaled as it enters the distribution, so the variance is only that of the random draw
before multiplication. The normal distribution is used most often, but there are several other
possibilities. Numerous other formats for random parameters are described in Section R24.3. Those
results all apply to the binary choice models. To specify that the constant term and the coefficient on
x1 are each normally distributed with given mean and variance, use

; Fcn = one(n), x1(n).

This specifies that the first and second coefficients are random while the remainder are not. The
parameters estimated will be the mean and standard deviations of the distributions of these two
parameters and the fixed values of the other three.
The results include estimates of the means and standard deviations of the distributions of the
random parameters and the estimates of the nonrandom parameters. The log likelihood shown in the
results is conditioned on the random draws, so one might be cautious about using it to test
hypotheses, for example, that the parameters are random at all by comparing it to the log likelihood
from the basic model with all nonrandom coefficients. The test becomes valid as R increases, but the
50 used in our application is probably too few. With several hundred draws, one could reliably use
the simulated log likelihood for testing purposes.
N10: Random Parameter Models for Binary Choice N-134

Correlated Random Parameters

The preceding defines an estimator for a model in which the covariance matrix of the
random parameters is diagonal. To extend it to a model in which the parameters are freely
correlated, add
; Correlation (or just ; Cor)

to the command. An example appears below.

Heterogeneity in the Means

The preceding examples have specified that the mean of the random variable is fixed over
individuals. If there is measured heterogeneity in the means, in the form of

E[ki] = k + m km zmi

where zm is a variable that is measured for each individual, then the command may be modified to

; RPM = list of variables in z

In the data set, these variables must be repeated for each observation in the group. In the application
below, we have specified that the random parameters have different means for individuals depending
on gender and marital status.

Autocorrelation

You may change the character of the heterogeneity from a time invariant effect to an AR(1)
process,
vkit = kvki,t-1 + wkit.

N10.2.2 Results from the Estimator and Applications

The results produced by this estimator begin with the familiar diagnostic statistics, likelihood
function, information criteria, etc. The coefficient estimates are possibly rearranged so that the
nonrandom parameters appear first. In the base case of a diagonal covariance matrix, the means of
the random parameters appear next, followed in the same order by the estimated scale parameters.
The example below illustrates. For normally distributed parameters, these are the standard
deviations. For other distributions, these scale factors are multiplied by the relevant standard
deviation to obtain the standard deviation of the parameter. For example, if we had specified

; Fcn = educ(u)

in the model command, then the parameter on educ would be defined to have mean 1.697 and
standard deviation .08084 times 1/sqr(6). (The uniform draw is transformed to be U[-1,+1].)
N10: Random Parameter Models for Binary Choice N-135

The commands are:

SAMPLE ; All $
SETPANEL ; Group = id ; Pds = ti $
NAMELIST ; x = age,educ,hhninc,hsat $
LOGIT ; Lhs = doctor ; Rhs = x,one
; Partial Effects ; Panel ; RPM
; Fcn = one(n),hhninc(n),hsat(n)
; Pts = 25 ; Halton $
-----------------------------------------------------------------------------
Logit Regression Start Values for DOCTOR
Dependent variable DOCTOR
Log likelihood function -16639.59764
Estimation based on N = 27326, K = 5
Inf.Cr.AIC =33289.195 AIC/N = 1.218
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .01366*** .00121 11.25 .0000 .01128 .01603
EDUC| -.02603*** .00585 -4.45 .0000 -.03749 -.01457
Constant| 2.28946*** .10379 22.06 .0000 2.08604 2.49288
HHNINC| -.01221 .07670 -.16 .8735 -.16254 .13812
HSAT| -.29185*** .00681 -42.87 .0000 -.30519 -.27850
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
Random Coefficients Logit Model
Dependent variable DOCTOR
Log likelihood function -15617.53717
Restricted log likelihood -16639.59764
Chi squared [ 3 d.f.] 2044.12094
Significance level .00000
McFadden Pseudo R-squared .0614234
Estimation based on N = 27326, K = 8
Inf.Cr.AIC =31251.074 AIC/N = 1.144
Unbalanced panel has 7293 individuals
LOGIT (Logistic) probability model
Simulation based on 25 Halton draws
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Nonrandom parameters
AGE| .01541*** .00100 15.39 .0000 .01344 .01737
EDUC| -.02538*** .00475 -5.34 .0000 -.03469 -.01607
|Means for random parameters
Constant| 1.77433*** .08285 21.42 .0000 1.61195 1.93671
HHNINC| .08517 .06181 1.38 .1682 -.03598 .20632
HSAT| -.23532*** .00541 -43.50 .0000 -.24592 -.22471
|Scale parameters for dists. of random parameters
Constant| 1.37499*** .01982 69.36 .0000 1.33614 1.41384
HHNINC| .18336*** .03792 4.84 .0000 .10904 .25768
HSAT| .00080 .00204 .39 .6960 -.00319 .00479
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N10: Random Parameter Models for Binary Choice N-136

-----------------------------------------------------------------------------
Partial derivatives of expected val. with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Conditional Mean at Sample Point .6436
Scale Factor for Marginal Effects .2294
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00353*** .23902 15.53 .0000 .00309 .00398
EDUC| -.00582*** -.10241 -5.36 .0000 -.00795 -.00369
HHNINC| .01954 .01069 1.38 .1686 -.00827 .04735
HSAT| -.05398*** -.56914 -29.82 .0000 -.05753 -.05043
--------+--------------------------------------------------------------------
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

When the random parameters are specified to be correlated, the output is changed. The
parameter vector in this case is written

i = 0 + vi

where is a lower triangular Cholesky matrix. In this case, the nonrandom parameters and the
means of the random parameters are reported as before. The table then reports in two parts. The
diagonal elements are reported first. These would correspond to the case above. The nonzero
elements of below the diagonal are reported next, rowwise. In the example below, there are three
random parameters, so there are 1 + 2 elements below the main diagonal of in the reported results.
The covariance matrix for the random parameters in this specification is

Var [ i] = = A

where A is the known diagonal covariance matrix of vi. For normally distributed parameters, A = I.
This matrix is reported separately after the tabled coefficient estimates. Finally, the square roots of
the diagonal elements of the estimate of are reported, followed by the correlation matrix derived
from . The example below illustrates.

LOGIT ; Lhs = doctor ; Rhs = x,one

; Partial Effects
; Pds = _groupti
; RPM
; Fcn = one(n),hhninc(n),newhsat(n)
; Correlated
; Pts = 25
; Halton $
N10: Random Parameter Models for Binary Choice N-137

-----------------------------------------------------------------------------
Random Coefficients Logit Model
Dependent variable DOCTOR
Log likelihood function -15606.79747
Restricted log likelihood -16639.59764
Chi squared [ 6 d.f.] 2065.60035
Significance level .00000
McFadden Pseudo R-squared .0620688
Estimation based on N = 27326, K = 11
Inf.Cr.AIC =31235.595 AIC/N = 1.143
Unbalanced panel has 7293 individuals
LOGIT (Logistic) probability model
Simulation based on 25 Halton draws
-----------------------------------------------------------------------------

Implied covariance matrix of random parameters

Var_Beta| 1 2 3
--------+------------------------------------------
1| 3.63867 -.00447279 -.154960
2| -.00447279 .832783 .0865698
3| -.154960 .0865698 .0158724

Implied standard deviations of random parameters

S.D_Beta| 1
--------+--------------
1| 1.90753
2| .912570
3| .125986

Implied correlation matrix of random parameters

Cor_Beta| 1 2 3
--------+------------------------------------------
1| 1.00000 -.00256946 -.644803
2| -.00256946 1.00000 .752973
3| -.644803 .752973 1.00000
N10: Random Parameter Models for Binary Choice N-138

-----------------------------------------------------------------------------
Partial derivatives of expected val. with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Conditional Mean at Sample Point .6464
Scale Factor for Marginal Effects .2286
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00336*** .22640 14.71 .0000 .00291 .00381
EDUC| -.00626*** -.10967 -5.78 .0000 -.00838 -.00414
HHNINC| .02157 .01175 1.43 .1522 -.00796 .05110
HSAT| -.05864*** -.61557 -27.65 .0000 -.06280 -.05448
--------+--------------------------------------------------------------------

Finally, if you specify that there is observable heterogeneity in the means of the parameters
with
; RPM = list of variables

then the model changes to

i = 0 + zi + vi.

The elements of , rowwise, are reported after the decomposition of . The example below, which
contains gender and marital status, illustrates. Note that a compound name is created for the
elements of .

LOGIT ; Lhs = doctor ; Rhs = x,one

; Partial Effects
; Panel
; RPM = female,married
; Fcn = one(n),hhninc(n),hsat(n)
; Correlated
; Pts = 25
; Halton $

-----------------------------------------------------------------------------
Random Coefficients Logit Model
Dependent variable DOCTOR
Log likelihood function -15470.04441
Restricted log likelihood -16639.59764
Chi squared [ 12 d.f.] 2339.10646
Significance level .00000
McFadden Pseudo R-squared .0702874
Estimation based on N = 27326, K = 17
Inf.Cr.AIC =30974.089 AIC/N = 1.134
Model estimated: Jun 15, 2011, 18:43:49
Unbalanced panel has 7293 individuals
LOGIT (Logistic) probability model
Simulation based on 25 Halton draws
-----------------------------------------------------------------------------
N10: Random Parameter Models for Binary Choice N-139

Implied covariance matrix of random parameters

Var_Beta| 1 2 3
--------+------------------------------------------
1| 3.42595 .291109 -.124767
2| .291109 1.40195 .0832340
3| -.124767 .0832340 .0109393

Implied standard deviations of random parameters

S.D_Beta| 1
--------+--------------
1| 1.85093
2| 1.18404
3| .104591

Implied correlation matrix of random parameters

Cor_Beta| 1 2 3
--------+------------------------------------------
1| 1.00000 .132831 -.644484
2| .132831 1.00000 .672107
3| -.644484 .672107 1.00000
N10: Random Parameter Models for Binary Choice N-140

-----------------------------------------------------------------------------
Partial derivatives of expected val. with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Conditional Mean at Sample Point .6687
Scale Factor for Marginal Effects .2215
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00305* .19821 1.89 .0591 -.00012 .00621
EDUC| -.00202 -.03425 -1.28 .1994 -.00511 .00107
HHNINC| .02238 .01178 .38 .7014 -.09203 .13679
HSAT| -.05744 -.58287 -.70 .4825 -.21776 .10288
--------+--------------------------------------------------------------------
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Results saved by this estimator are:

Matrices: b = estimate of
varb = asymptotic covariance matrix for estimate of .
gammaprm = the estimate of
beta_i = individual specific parameters, if ; Par is requested
sdbeta_i = individual specific parameter standard deviations if ; Par
is requested

Scalars: kreg = number of variables in Rhs

nreg = number of observations
logl = log likelihood function

Last Model: b_variables

Last Function: None

Simulation based estimation is time consuming. The sample size here is fairly large (27,326
observations). We limited the simulation to 25 Halton draws. The amount of computation rises
linearly with the number of draws. A typical application of the sort pursued here would use perhaps
300 draws, or 12 times what we used. Estimation of the last model required two minutes and 30
seconds, so in full production, estimation of this model might take 30 minutes. In general, you can
get an idea about estimation times by starting with a small model and a small number of draws. The
amount of computation rises linearly with the number of draws that is the main consumer. It also
rises linearly with the number of random parameters. The time spent fitting the model will rise only
slightly with the number of nonrandom numbers. Finally, it will rise linearly with the number of
observations. Thus, a model with a doubled sample and twice as many draws will take four times as
long to estimate as one with the original sample and number of draws.
When you include ; Par in the model command, two additional matrices are created, beta_i
and sdbeta_i. Extensive detail on the computation of these matrices is provided in Section R24.5.
For the final specification described above, the results would be as shown in Figure N10.1.
N10: Random Parameter Models for Binary Choice N-141

Figure N10.1 Estimated Conditional Parameter Means

N10.2.3 Controlling the Simulation

R is the number of points in the simulation. Authors differ in the appropriate value. Train
recommends several hundred. Bhat suggests 1,000 is an appropriate value. The program default is
100. You can choose the value with

; Pts = number of draws, R

The value of 50 that we set in our experiments above was chosen purely to produce an example that
you could replicate without spending an inordinate amount of waiting for the results.
The standard approach to simulation estimation is to use random draws from the specified
distribution. As suggested immediately above, good performance in this connection requires very
large numbers of draws. The drawback to this approach is that with large samples and large models,
this entails a huge amount of computation and can be very time consuming. Some authors have
documented dramatic speed gains with no degradation in simulation performance through the use of
a small number of Halton draws instead of a large number of random draws. Authors (e.g., Bhat
(2001)) have found that a Halton sequence of draws with only one tenth the number of draws as a
random sequence is equally effective. To use this approach, add

; Halton

to your model command.

In order to replicate an estimation, you must use the same random draws. One implication
of this is that if you give the identical model command twice in sequence, you will not get the
identical set of results because the random draws in the sequences will be different. To obtain the
same results, you must reset the seed of the random number generator with a command such as

CALC ; Ran(seed value) $

N10: Random Parameter Models for Binary Choice N-142

(Note that we have used Ran(12345) before some of our earlier examples, precisely for this reason.
The specific value you use for the seed is not of consequence; any odd number will do.
The random sequence used for the model estimation must be the same in order to obtain
replicability. In addition, during estimation of a particular model, the same set of random draws
must be used for each person every time. That is, the sequence vi1, vi2, ..., viR used for each
individual must be same every time it is used to calculate a probability, derivative, or likelihood
function. (If this is not the case, the likelihood function will be discontinuous in the parameters, and
successful estimation becomes unlikely.) One way to achieve this which has been suggested in the
literature is to store the random numbers in advance, and simply draw from this reservoir of values
as needed. Because NLOGIT is able to use very large samples, this is not a practical solution,
especially if the number of draws is large as well. We achieve the same result by assigning to each
individual, i, in the sample, their own random generator seed which is a unique function of the global
random number seed, S, and their group number, i;

Seed(S,i) = S + 123.0 i, then minus 1.0 if the result is even.

Since the global seed, S, is a positive odd number, this seed value is unique, at least within the
several million observation range of NLOGIT.

N10.2.4 The Parameter Vector and Starting Values

Starting values for the iterations are obtained by fitting the basic model without random
parameters. Other parameters are set to zero. Thus, the initial results in the output for these models
will be the binary choice models discussed in the preceding sections. You may provide your own
starting values for the parameters with

; Start = ... the list of values for .

The parameter vector is laid out as follows, in this order:

1, ..., K are the K nonrandom parameters,

1,...,M are the M means of the distributions of the random parameters,
1,2,...,M are the M scale parameters for the distributions of the random parameters.

These are the essential parameters. If you have specified that parameters are to be correlated, then
the s are followed by the below diagonal elements of . (The s are the diagonal elements.) If you
have specified heterogeneity variables, z, then the preceding are followed by the rows of .
Consider an example: The model specifies:

; RPM = z1,z2
; Rhs = one,x1,x2,x3,x4 ? base parameters 1, 2, 3, 4, 5
; Fcn = one(n),x2(n),x4(n)
; Cor
N10: Random Parameter Models for Binary Choice N-143

Then, after rearranging, the model becomes

Variable Parameter
x1 1
x3 2
one 1 + 1vi1 + 11zi1 + 12zi2
x2 2 + 2vi2 + 21vi1 + 11zi1 + 12zi2
x4 3 + 3vi3 + 31vi1 + 32vi2 + 11zi1 + 12zi2

and the parameter vector would be

= 1, 2, 1, 2, 3, 1, 2, 3, 21, 31, 32, 11, 12, 21, 22, 31, 32.

You may use ; Rst and ; CML to impose restrictions on the parameters. Use the preceding as a
guide to the arrangement of the parameter vector. We do note, using ; Rst to impose fixed value,
such as zero restrictions, will generally work well. Other kinds of restrictions, particularly across the
parts of the parameter vector, will generally produce unfavorable results.
The variances of the underlying random variables are given earlier, 1 for the normal
distribution, 1/3 for the uniform, and 1/6 for the tent distribution. The parameters are only the
standard deviations for the normal distribution. For the other two distributions, k is a scale
parameter. The standard deviation is obtained as k/ 3 for the uniform distribution and k/ 6 for
the triangular distribution. When the parameters are correlated, the implied covariance matrix is
adjusted accordingly. The correlation matrix is unchanged by this.

N10.2.5 A Dynamic Probit Model

We consider estimation of the dynamic (habit persistence) probit model

yit* = + xit + yi,t-1 + it + ui, t = 0,...,Ti, i = 1,...,N

yit = 1(yit* > 0).

Simple estimation of the model by maximum likelihood is clearly inappropriate owing to the random
effect. ML random effects is likewise inconsistent because yi,t-1 will be correlated with the random
effect. Following Heckman (1981), a suggested formulation and procedure for estimation are as
follows: Treat the initial condition as an equilibrium, in which

yi0* = + xi0 + i0 + ui
yi0 = 1(yi0* > 0)

and retain the preceding model for periods 1,...,Ti. Note that the same random effect, ui appears
throughout, but the scaling parameter and the slope vector are different in the initial period. The
lagged value of yit does not appear in period 0. This model can be estimated in this form with the
random parameters estimator in NLOGIT. Use the following procedure.
N10: Random Parameter Models for Binary Choice N-144

Set up the variables:

dit = 1 in period 1, 0 in all other periods,

fit = 1 - dit = 1 in all periods except period 1,
xit = the set of regressors in the model, 0 in the first period,
xi0 = the set of regressors in the model in period 0, 0 in all other periods,
yi,-1 = yi,t-1 in periods 1,...,Ti, 0 in the first period.

Then, the encompassing model is

yit* = xit + xi0 + dit + fit + yi,-1 + it + fitui + ditui,

yit = 1(yit* > 0), t = 0,1,...,Ti.

The commands you might use to set up the data would follow these steps. First, use CREATE to set
up your group size count variable, _groupti.

CREATE ; yit = the dependent variable

; yit1 = yit[-1] ? Make sure that yit1 = 0 in the first period.
; t = Trn(-ti,1) or whatever means to set up 1,2,...Ti + 1
; dit = (t=1) ; fit = (t > 1) $
CREATE ; set up the xit and xi0 sets of variables $

The estimation command is a random parameters probit model. We make use of a special feature of
the RPM that allows the random component of the random parameters to be shared by more than one
parameter. This is precisely what is needed to have both ui and ui appear in the equation without
forcing = .

PROBIT ; Lhs = yit

; Rhs = xit,xi0,yit1,dit,fit
; Panel
; RPM
; Fcn = dit(n), fit(n)
; Common
; ... any other desired specifications for the estimation $

A refinement of this model assumes that ui = zi + wi for a set of time invariant variables. (See
Hyslop (1999) and Greene (2011.) One possibility is the vector of group means of the variables xit.
(Only the time varying variables would be included in these means.) These can be created and
included as additional Rhs variables.
N10: Random Parameter Models for Binary Choice N-145

N10.3 Latent Class Models for Binary Choice

The binary choice model for a panel of data, i = 1,...,N, t = 1,...,Ti is

Prob[Yit = yit | xit] = F(yit,xit) = P(i,t), yit = 0 or 1.

Henceforth, we use the term group to indicate the Ti observations on respondent i in periods t =
1,...,Ti. Unobserved heterogeneity in the distribution of yit is assumed to impact the density in the
form of a random effect. The continuous distribution of the heterogeneity is approximated by using
a finite number of points of support. The distribution is approximated by estimating the location of
the support points and the mass (probability) in each interval. In implementation, it is convenient
and useful to interpret this discrete approximation as producing a sorting of individuals (by
heterogeneity) into J classes, j = 1,...,J. (Since this is an approximation, J is chosen by the analyst.)
Thus, we modify the model for a latent sorting of yit into J classes with a model which
allows for heterogeneity as follows: The probability of observing yit given that regime j applies is

P(i,t|j) = Prob[Yit = yit| xit, j]

where the density is now specific to the group. The analyst does not observe directly which class,
j = 1,...,J generated observation yit|j, and class membership must be estimated. Heckman and Singer
(1984) suggest a simple form of the class variation in which only the constant term varies across the
classes. This would produce the model

P(i,t|j) = F[yit, xit + j], Prob[class = j] = Fj

We formulate this approximation more generally as,

P(i,t|j) = F[yit, xit + jxit], Fj = exp(j) / j exp(j), with J = 0.

In this formulation, each group has its own parameter vector, j = + j, though the variables that
enter the mean are assumed to be the same. (This can be changed by imposing restrictions on the
full parameter vector, as described below.) This allows the Heckman and Singer formulation as a
special case by imposing restrictions on the parameters. You may also specify that the latent class
probabilities depend on person specific characteristics, so that

ij = jzi, J = 0.

The estimation command for this model is

PROBIT or LOGIT ; Lhs = dependent variable

; Rhs = independent variables
; Panel or Pds = fixed periods or count variable
; LCM $
N10: Random Parameter Models for Binary Choice N-146

The default number of support points is five. You may set J from two to 30 classes with

; Pts = the value

Use ; LCM = list of variables in zi

to specify the multinomial logit form of the latent class probabilities.

Estimates retained by this model include

Matrices: b = full parameter vector, [ 1, 2,... F1,...,FJ]

varb = full covariance matrix
Note that b and varb involve J(K+1) estimates.

Two additional matrices are created:

b_class = a JK matrix with each row equal to the corresponding j

class_pr = a J1 vector containing the estimated class probabilities

If the command specifies ; Parameters, then the additional matrix created is:

beta_i = individual specific parameters

Scalars: kreg = number of variables in Rhs list

nreg = total number of observations used for estimation
logl = maximized value of the log likelihood function
exitcode = exit status of the estimation procedure

N10.3.1 Application
To illustrate the model, we will fit probit models with three latent classes as alternatives to
the continuously varying random parameters models in the preceding section. This model requires a
fairly rich data set it will routinely fail to find a maximum if the number of observations in a group
is small. In addition, it will break down if you attempt to fit too many classes. (This point is
addressed in Heckman and Singer.)
The model estimates include the estimates of the prior probabilities of group membership. It
is also possible to compute the posterior probabilities for the groups, conditioned on the data. The
; List specification will request a listing of these. The final illustration below shows this feature for
a small subset of the data used above. The models use the following commands: The first is the
pooled probit estimator. The second is a basic, three class LCM. The third models the latent class
probabilities as functions of the gender and marital status dummy variables. The final model
command fits a comparable random parameters model. We will compare the two estimated models.
N10: Random Parameter Models for Binary Choice N-147

Fit the pooled probit model first, basic latent class, then latent class with the gender and
marital status dummy variables in the class probabilities.

PROBIT ; Lhs = doctor ; Rhs = x,one

; Partial Effects
; Cluster = id $
MATRIX ; betapool = b $
PROBIT ; Lhs = doctor ; Rhs = x,one
; Partial Effects
; Pds = _groupti
; LCM
; Pts = 3 $
PROBIT ; Lhs = doctor ; Rhs = x,one
; Partial Effects
; Pds = _groupti
; LCM = female,married
; Pts = 3
; Parameters $
Fit the random parameters probit model with heterogeneity in means.

PROBIT ; Lhs = doctor ; Rhs = x,one

; Partial Effects
; Pds = _groupti
; RPM = female,married
; Fcn = one(n),hhninc(n),newhsat(n)
; Correlated
; Pts = 25
; Halton
; Parameters $

These are the estimated parameters of the pooled probit model. The cluster correction is shown with
the pooled results.
+---------------------------------------------------------------------+
| Covariance matrix for the model is adjusted for data clustering. |
| Sample of 27326 observations contained 7293 clusters defined by |
| variable ID which identifies by a value a cluster ID. |
+---------------------------------------------------------------------+
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable DOCTOR
Log likelihood function -16638.96591
Restricted log likelihood -18019.55173
Chi squared [ 4 d.f.] 2761.17165
Significance level .00000
McFadden Pseudo R-squared .0766160
Estimation based on N = 27326, K = 5
Inf.Cr.AIC =33287.932 AIC/N = 1.218
Hosmer-Lemeshow chi-squared = 20.59314
P-value= .00831 with deg.fr. = 8
-----------------------------------------------------------------------------
N10: Random Parameter Models for Binary Choice N-148

These are the estimates of the basic three class latent class model.

-----------------------------------------------------------------------------
Latent Class / Panel Probit Model
Dependent variable DOCTOR
Log likelihood function -15609.05992
Restricted log likelihood -16638.96591
Chi squared [ 13 d.f.] 2059.81198
Significance level .00000
McFadden Pseudo R-squared .0618972
Estimation based on N = 27326, K = 17
Inf.Cr.AIC =31252.120 AIC/N = 1.144
Unbalanced panel has 7293 individuals
PROBIT (normal) probability model
Model fit with 3 latent classes.
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Model parameters for latent class 1
AGE| .01388*** .00228 6.10 .0000 .00942 .01835
EDUC| -.00381 .01146 -.33 .7399 -.02627 .01866
HHNINC| -.07299 .15239 -.48 .6320 -.37166 .22569
HSAT| -.20115*** .01709 -11.77 .0000 -.23466 -.16765
Constant| 2.08411*** .23986 8.69 .0000 1.61399 2.55424
|Model parameters for latent class 2
AGE| .01336*** .00183 7.29 .0000 .00977 .01696
EDUC| -.01886** .00815 -2.31 .0206 -.03483 -.00289
HHNINC| .06824 .10660 .64 .5221 -.14069 .27717
HSAT| -.20129*** .00994 -20.26 .0000 -.22076 -.18181
Constant| 1.15407*** .17393 6.64 .0000 .81317 1.49498
|Model parameters for latent class 3
AGE| .00547 .00464 1.18 .2390 -.00363 .01456
EDUC| -.04318** .01911 -2.26 .0239 -.08063 -.00572
HHNINC| .30044 .21747 1.38 .1671 -.12579 .72668
HSAT| -.14638*** .01965 -7.45 .0000 -.18489 -.10786
Constant| .24354 .31547 .77 .4401 -.37478 .86186
|Estimated prior probabilities for class membership
Class1Pr| .40689*** .04775 8.52 .0000 .31331 .50048
Class2Pr| .45729*** .03335 13.71 .0000 .39192 .52266
Class3Pr| .13581*** .02815 4.82 .0000 .08063 .19100
--------+--------------------------------------------------------------------
N10: Random Parameter Models for Binary Choice N-149

The three class latent class model is extended to allow the prior class probabilities to differ by sex
and marital status.
-----------------------------------------------------------------------------
Latent Class / Panel Probit Model
Dependent variable DOCTOR
Log likelihood function -15471.73843
Restricted log likelihood -16638.96591
Chi squared [ 19 d.f.] 2334.45496
Significance level .00000
McFadden Pseudo R-squared .0701502
Estimation based on N = 27326, K = 21
Inf.Cr.AIC =30985.477 AIC/N = 1.134
Unbalanced panel has 7293 individuals
PROBIT (normal) probability model
Model fit with 3 latent classes.
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Model parameters for latent class 1
AGE| .01225*** .00240 5.11 .0000 .00755 .01695
EDUC| .01438 .01311 1.10 .2725 -.01130 .04007
HHNINC| -.02303 .16581 -.14 .8895 -.34801 .30194
HSAT| -.17738*** .01802 -9.84 .0000 -.21271 -.14205
Constant| 1.76773*** .25126 7.04 .0000 1.27528 2.26018
|Model parameters for latent class 2
AGE| .00185 .00409 .45 .6508 -.00616 .00986
EDUC| -.03067** .01439 -2.13 .0331 -.05888 -.00245
HHNINC| .23788 .18111 1.31 .1890 -.11709 .59285
HSAT| -.15169*** .01623 -9.35 .0000 -.18349 -.11989
Constant| .44044* .26021 1.69 .0905 -.06957 .95045
|Model parameters for latent class 3
AGE| .01401*** .00199 7.02 .0000 .01010 .01791
EDUC| -.00399 .00847 -.47 .6372 -.02060 .01261
HHNINC| .03018 .11424 .26 .7916 -.19372 .25408
HSAT| -.21215*** .01178 -18.01 .0000 -.23524 -.18906
Constant| 1.13165*** .18329 6.17 .0000 .77241 1.49088
|Estimated prior probabilities for class membership
ONE_1| -.53375** .21925 -2.43 .0149 -.96347 -.10403
FEMALE_1| 1.18549*** .13400 8.85 .0000 .92284 1.44813
MARRIE_1| -.33518** .16234 -2.06 .0390 -.65336 -.01700
ONE_2| -.51961* .26512 -1.96 .0500 -1.03924 .00002
FEMALE_2| -.31028* .18197 -1.71 .0882 -.66694 .04638
MARRIE_2| -.42489** .18253 -2.33 .0199 -.78265 -.06713
ONE_3| 0.0 .....(Fixed Parameter).....
FEMALE_3| 0.0 .....(Fixed Parameter).....
MARRIE_3| 0.0 .....(Fixed Parameter).....
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
Fixed parameter ... is constrained to equal the value or
had a nonpositive st.error because of an earlier problem.
-----------------------------------------------------------------------------
+------------------------------------------------------------+
| Prior class probabilities at data means for LCM variables |
| Class 1 Class 2 Class 3 Class 4 Class 5 |
| .36905 .17087 .46008 .00000 .00000 |
+------------------------------------------------------------+
N10: Random Parameter Models for Binary Choice N-150

Since the class probabilities now differ by observation, the program reports an average using
the data means. The earlier fixed prior class probabilities are shown below the averages for this
model. The extension brings only marginal changes in the averages, but this does not show the
variances across the different demographic segments (female/male, married/single) which may be
substantial.
These are the estimated individual parameter vectors.

Figure N10.2 Latent Class Parameter Estimates

The random parameters model in which parameter means differ by sex and marital status and are
correlated with each other is comparable to the full latent class model shown above.

-----------------------------------------------------------------------------
Random Coefficients Probit Model
Dependent variable DOCTOR
Log likelihood function -15469.87914
Restricted log likelihood -16638.96591
Chi squared [ 12 d.f.] 2338.17354
Significance level .00000
McFadden Pseudo R-squared .0702620
Estimation based on N = 27326, K = 17
Inf.Cr.AIC =30973.758 AIC/N = 1.133
Unbalanced panel has 7293 individuals
PROBIT (normal) probability model
Simulation based on 25 Halton draws
-----------------------------------------------------------------------------
N10: Random Parameter Models for Binary Choice N-151

Implied covariance matrix of random parameters

Var_Beta| 1 2 3
--------+------------------------------------------
1| 1.09579 .109842 -.0344941
2| .109842 .496629 .0285454
3| -.0344941 .0285454 .00319490

Implied standard deviations of random parameters

S.D_Beta| 1
--------+--------------
1| 1.04680
2| .704719
3| .0565235

Implied correlation matrix of random parameters

Cor_Beta| 1 2 3
--------+------------------------------------------
1| 1.00000 .148897 -.582977
2| .148897 1.00000 .716624
3| -.582977 .716624 1.00000

These are the estimated marginal effects from the three models estimated, the pooled probit model,
the three class latent class model and a comparable random parameters model, respectively.
N10: Random Parameter Models for Binary Choice N-152

Pooled
-----------------------------------------------------------------------------
Partial derivatives of E[y] = F[*] with
respect to the vector of characteristics
Average partial effects for sample obs.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00297*** .20548 8.83 .0000 .00231 .00363
EDUC| -.00534*** -.09614 -3.09 .0020 -.00873 -.00195
HHNINC| -.00230 -.00129 -.12 .9066 -.04072 .03612
HSAT| -.06076*** -.65534 -39.87 .0000 -.06375 -.05777
--------+--------------------------------------------------------------------
3 Class Latent Class
--------+--------------------------------------------------------------------
AGE| .00446*** .28510 7.28 .0000 .00326 .00566
EDUC| -.00572*** -.09511 -2.64 .0082 -.00997 -.00148
HHNINC| .01510 .00780 .61 .5433 -.03360 .06381
HSAT| -.06917*** -.68884 -19.60 .0000 -.07609 -.06225
--------+--------------------------------------------------------------------
3 Class Heterogeneous Priors
-----------------------------------------------------------------------------
AGE| .00406*** .26197 7.00 .0000 .00292 .00520
EDUC| -.00064 -.01069 -.27 .7838 -.00519 .00391
HHNINC| .01657 .00865 .68 .4953 -.03106 .06420
HSAT| -.06804*** -.68420 -20.83 .0000 -.07444 -.06164
--------+--------------------------------------------------------------------
Random Parameters
-----------------------------------------------------------------------------
AGE| .00424*** .27768 3.18 .0015 .00162 .00685
EDUC| -.00257 -.04379 -1.48 .1385 -.00597 .00083
HHNINC| .03226 .01711 .55 .5814 -.08242 .14695
HSAT| -.07827 -.79992 -1.22 .2216 -.20379 .04724
--------+--------------------------------------------------------------------
N11: Semiparametric and Nonparametric Models for Binary Choice N-153

N11: Semiparametric and Nonparametric

Models for Binary Choice
N11.1 Introduction
This chapter will present three non- and semiparametric estimators for binary choice models.
Familiar parametric estimators of binary response models, such as the probit and logit are based on
the log likelihood criterion,

i =1
1 n
log L = log F ( yi | ' xi ) .
n
The Cramer-Rao theory justifies this procedure on the basis of efficiency of the parameter estimates.
But, it is to be noted that the criterion is not a function of the ability of the model to predict the
response. Moreover, in spite of the widely observed similarity of the predictions from the different
models, the issue of which parametric family (normal, logistic, etc.) is most appropriate has never
been settled, and there exist no formal tests to resolve the question in any given setting. Various
estimators have been suggested for the purpose of broadening the parametric family, so as to relax
the restrictive nature of the model specification. Two semiparametric estimators are presented in
NLOGIT, Manskis (1975, 1985) and Manski and Thompsons (1985, 1987) maximum score
(MSCORE) estimator and Klein and Spadys (1993) kernel density estimator.
The MSCORE estimator is constructed specifically around the prediction criterion
Choose to maximize S = i [yi* zi*],
where yi* = sign (-1/1) of the dependent variable
zi* = the sign (-1/1) of xi.
Thus, the MSCORE estimator seeks to maximize the number of correct predictions by our familiar
prediction rule predict yi = 1 when the estimated Prob[yi = 1] is greater than .5, assuming that the
true, underlying probability function is symmetric. In those settings, such as probit and logit, in
which the density is symmetric, the sign of the argument is sufficient to define whether the
probability is greater or less than .5. For the asymmetric distributions, this is not the case, which
suggests a limitation of the MSCORE approach. The estimator does allow another degree of
freedom in the choice of a quantile other than .5 for the prediction rule see the definition below
but this is only a partial solution unless one has prior knowledge about the underlying density.
Klein and Spadys semiparametric density estimator is based on the specification

Prob[yi = 1] = P(xi)
where P is an unknown, continuous function of its argument with range [0,1]. The function P is not
specified a priori; it is estimated with the parameters. The probability function provides the location
for the index that would otherwise be provided by a constant term. The estimation criterion is
1 n
log L = [ yi log Pn (xi ) + (1 yi ) log(1 Pn (xi ))]
n i =1
where Pn is the estimator of P and is computed using a kernel density estimator.
N11: Semiparametric and Nonparametric Models for Binary Choice N-154

The third estimator is a nonparametric treatment of binary choice based on the index
function estimated from a parametric model such as a logit model.

N11.2 Maximum Score Estimation - MSCORE

Maximum score is a semiparametric approach to estimation which is based on a prediction
rule. The base case (quantile = ) is

S = i [yi* zi* ],

where yi* is the sign (-1/1) of the dependent variable and zi* is the counterpart for the fitted model;
zi* = the sign (-1/1) of xi. Thus, this base case is formulated precisely upon the ability of the sign
of the estimated index function to predict the sign of the dependent variable (which, in the binary
response models, is all that we observe). Formally, MSCORE maximizes the sample score function

MaxB Sn() = (1/n)i[yi* - (1-2)]Sgn(xi),

where B = { RK : = 1}.

The sample data consist of n observations [yi* ,xi] where yi* is the binary response. Input of yi is the
usual binary variable taking values zero and one; yi* is obtained internally by converting zeros to
minus ones. The quantile, , is between zero and one and is provided by the user. The vector xi is
the usual set of K regressors, usually including a constant. An equivalent problem is to maximize the
normalized sample score function

SN*() = (1/n)[Sn() / Wn + 1],

where Wn = (1/n)iwi
and wi = abs(yi* - (1-2)).

This may then be rewritten as

Sn() = i wi 1[yi* = Sgn(xi)],

where wi* = wi / Wn.

and 1[] is the indicator function which equals 1 if the condition in the brackets is true and 0
otherwise. Thus, in the preceding, 1[] equals 1 if the sign of the index function, xi, correctly
predicts yi*. The normalized sample score function is, thus, a weighted average of the prediction
indicators. If = , then wi* equals 1/n, and the normalized score is the fraction of the observations
for which the response variable is correctly predicted. Maximum score estimation can therefore be
interpreted as the problem of finding the parameters that maximize a weighted average number of
correct predictions for the binary response.
The following shows how to use the MSCORE command and gives technical details about
the procedure. An application is given with the development of NPREG, which is a companion
program, in Section N11.4.
N11: Semiparametric and Nonparametric Models for Binary Choice N-155

N11.2.1 Command for MSCORE

The mandatory part of the command for invoking the maximum score estimator

MSCORE ; Lhs = y ; Rhs = x list of independent variables $

The first element of x should be one. The variable y is a binary dependent variable, coded 0/1. The
following are the optional specifications for this command. The default values given are used by
NLOGIT if the option is not specified on the command. MSCORE is designed for relatively small
problems. The internal limits are 15 parameters and 10,000 observations.

N11.2.2 Options Specific to the Maximum Score Estimator

Quantile

The quantile defines the way the score function is computed. The default of .5 dictates that
the score is to be calculated as (1/n) times the number of correctly predicted signs of the response
variable. You may choose any value between 0 and 1with

; Qnt = quantile (default = .5; this is ).

Number of Bootstrap Replications

Bootstrap estimates are computed as follows: After computing the point estimate,
MSCORE generates R bootstrap samples from the data by sampling n observations with
replacement. The entire point estimation procedure, including computation of starting values is
repeated for each one. Let b be the maximum score estimate, R be the number of bootstrap
replications, and di be the ith bootstrap estimate. The mean squared deviation matrix,

MSD = (1/R)i [(di - b)(di - b)],

is computed from the bootstrap estimates. This is reported in the output as if it were the estimated
covariance matrix of the estimates. But, it must be noted that there is no theory to suggest that this is
correct. In purely practical terms, the deviations are from the point estimate, not the mean of the
bootstrap estimates. The results are merely suggestive. The use of ; Test: should also be done with
this in mind. Use
; Nbt = number of bootstraps (default = 20)

to set the number of bootstrap iterations.

N11: Semiparametric and Nonparametric Models for Binary Choice N-156

Analysis of Ties

The specification for analysis of ties is

; Ties to analyze ties (default = no)

If the ; Ties option is chosen, MSCORE reports information about regions of the parameter space
discovered during the endgame searches for which the sample score is tied with the score at the final
estimates. If a tie is found in a region, MSCORE records the endpoints of the interval, the current
search direction, and some information which records each observations contribution to the sample
score in the region. It is possible to determine whether ties found on separate great circle searches
represent disjoint regions or intersections of different great circles. Since the region containing the
final estimates is partially searched in each iteration, the tie checking procedure records extensive
information about this region. For each region, MSCORE reports the minimum and maximum
angular direction from the final estimates. These are labeled PSI-low and PSI-high. The parameter
values associated with these endpoints are also reported.
If tie regions are found that are far from the point estimate, it may be that the global
maximum remains to be found. If so, it may be useful to rerun the estimator using a starting value in
the tied region. The existence of many tie regions does not necessarily indicate an unreliable
estimate. Particularly in large samples, there may be a large number of disjoint regions in a small
neighborhood of the global maximum.

Number of Endgame Iterations

The number of endgame iterations is specified with

; End = number endgame iterations (default = 5).

A given set of great circle searches may miss a direction of increase in the score function. Moreover,
even if the trial maximum is a true local maximum, it may not be a global maximum. For these reasons,
upon finding a trial maximum, MSCORE conducts a user specified number of endgame iterations.
These are simply additional iterations of the maximization algorithm. The random search method is
such that with enough of these, the entire parameter space would ultimately be searched with
probability one. If the endgame iterations provide no improvement in the score, the trial maximum is
deemed the final estimate. If an improvement is made during an endgame search, the current estimate
is updated as usual and the search resumes. The logic of the algorithm depends on the endgame
searches to ensure that all regions of the parameter space are investigated with some probability. The
density of the coverage is an increasing function of the number of endgame searches.
There are no formal rules for the number of endgame searches. It should probably increase
with K and (perhaps a little less certainly) with n. But, because the step function more closely
approximates a continuous population score function, it may be that fewer endgame searches will be
needed as N increases.
N11: Semiparametric and Nonparametric Models for Binary Choice N-157

Starting Values

Starting values are specified with

; Start = starting values (default = none).

If starting values are not provided by the user, they are computed as follows: For each of the K
parameters, we form a vector equal to the kth column of an identity matrix. The sample score
function is evaluated at this vector, and the kth parameter is set equal to this value. At the
conclusion, the starting vector is normalized to unit length. If you do provide your own starting
values, they will be normalized to unit length before the iterations are begun.

Technical Output

Technical output is specified with

; Output = 4 or 5 for output of trace of bootstraps to output file

(default = neither).

This is used to control the amount of information about the bootstrap iterations that is produced.
This can generate hundreds or thousands of lines of output, depending on the number of bootstrap
estimates computed and the number of endgame searches requested. This information is displayed
on the screen, in order to trace the progress of execution. In general, the output is not especially
informative except in the aggregate. That is, individual lines of this trace are likely to be quite
similar. The default is not to retain information about individual bootstraps or endgame searches in
the file. Use ; Output = 4 to request only the bootstrap iterations (one line of output per). Use
; Output = 5 to include, in addition, the corresponding information about the endgame searches.

N11.2.3 General Options for MSCORE

The following general options used with the nonlinear estimators in NLOGIT are available
for MSCORE:
; Covariance Matrix to display MSE matrix (default = no),
same as ; Printvc
; List to display predicted values (default = no list)
; Keep = name to retain predictions in name (default = no)
; Res = name to retain fitted values in name (default = no)
; Test: spec to specify restriction (default = none)
; Maxit = n to set maximum iterations (default = 50)

Note the earlier caution about the MSD matrix when using the ; Test: option. The ; Rst = ... and
; CML: options for imposing restrictions are not available with this estimator.
N11: Semiparametric and Nonparametric Models for Binary Choice N-158

N11.2.4 Output from MSCORE

Output from MSCORE consists of the following, in the order in which it will appear on
your screen or your output file:

1. The iteration summary for the primary estimation procedure (this is labeled bootstrap sample
0) and, if you have requested them, the bootstrap sample estimations. With each one, we
report the number of iterations, the number of completed endgame iterations (see the
discussion above), the maximum normalized score, and the change in the normalized score.

2. Echo of input parameters in your command.

3. The score function and normalized score function evaluated at three different points:

a. naive, the first element of is 1 or -1 and all other values are 0,

b. the starting values,
c. the final estimates.

4. The deviations of the bootstrap estimates from the point estimates are summarized in the
root mean square error and mean absolute angular deviation between them.

5. The point estimates of the parameters.

NOTE: The estimates are presented in NLOGITs standard format for parameter estimates.
If you have computed bootstrap estimates, the mean square deviation matrix (from the point
estimate) is reported as if it were an estimate of the covariance matrix of the estimates. This
includes standard errors, t ratios, and prob. values. These may, in fact, not be
appropriate estimates of the asymptotic standard errors of these parameter estimates.
Discussion appears in the references below.

If you change the number of bootstrap estimates, you may observe large changes in these
standard errors. This is not to be interpreted as reflecting any changes in the precision of the
estimates. If anything, it reflects the unreliability of the bootstrap MSD matrix as an
estimate of the asymptotic covariance matrix of the estimates. It has been shown that the
asymptotic distribution of the maximum score estimator is not normal. (See Kim and
Pollard (1990).) Moreover, even under the best of circumstances, there is no guarantee that
the bootstrap estimates or functions of them (such as t ratios), converge to anything useful.

6. A cross tabulation of the predictions of the model vs. the actual values of the Lhs variable.

7. If the model has more than two parameters, and you have requested analysis of the ties, the
results of the endgame searches are reported last. Records of ties are recorded in your output
file if one is opened, but not displayed on your screen.
N11: Semiparametric and Nonparametric Models for Binary Choice N-159

The predicted values computed by MSCORE are the sign of bxi, coded 0 or 1. Residuals
are yi - y i, which will be 1, 0, or -1. The ; List specification also produces a listing of bxi. The last
column of the listing, labeled Prob[y = 1] is the probabilities computed using the standard normal
distribution. Since the probit model has not been used to fit the model, these may be ignored.
Results which are saved by MSCORE are:

b = final estimates of parameters

varb = mean squared deviation matrix for bootstrap estimates
score = scalar, equal to the maximized value of the score function

The Last Model labels are b_variable. But, note once again, that the underlying theory needed to
justify use of the Wald statistic does not apply here.

N11.3 Klein and Spadys Semiparametric Binary Choice Model

Klein and Spadys semiparametric density estimator is based on the specification

Prob[yi = 1] = P(xi)

where P is an unknown, continuous function of its argument with range [0,1]. The function P is not
specified a priori; it is estimated with the parameters. The probability function provides the location
for the index that would otherwise be provided by a constant term. The estimation criterion is

1 n
log L = [ yi log Pn (xi ) + (1 yi ) log(1 Pn (xi ))]
n i =1

where Pn is the estimator of P and is computed using a kernel density estimator. The probability
function is estimated with a kernel estimator,

yj (xi x j )

n
j =1
K
h h .
Pn(xi) =
1 ( x x )

n i j
j =1
K
h h

Two kernel functions are provided, the logistic function, (z) and the standard normal CDF, (z).
As in the other semiparametric estimators, the bandwidth parameter is a crucial input. The
program default is n-(1/6), which ranges from .3 to about .6 for n ranging from 30 to 1000. You may
provide an alternative value.
N11: Semiparametric and Nonparametric Models for Binary Choice N-160

N11.3.1 Command
The command for this estimator is

SEMIPARAMETRIC
; Lhs = dependent, binary variable
; Rhs = independent variables $

Do not include one on the Rhs list. The function itself is playing the role of the constant. Optional
features include those specific to this model,

; Smooth = desired value for h

; Kernel = Normal the logistic is standard
and the general ones available with other estimators,

; Partial Effects
; Prob = name to retain fitted probabilities
; Keep = name to retain predicted values
; Res = name to retain residuals
; Covariance Matrix to display the estimated asymptotic covariance matrix,
same as ; Printvc
The semiparametric log likelihood function is a continuous function of the parameters which is
maximized using NLOGITs standard tools for optimization. Thus, the options for controlling
optimization are available,
; Maxit = n to set maximum iterations
; Output = 1, 2, 3 to control intermediate output
; Alg = name to select algorithm

Restrictions may be imposed and tested with

; Test: spec to specify restriction (default = none)

; Rst = list to specify fixed value and equality restrictions
; CML: spec to specify other linear constraints

N11.3.2 Output
Output from this estimator includes the usual table of statistical results for a nonlinear
estimator. Note that the estimator constrains the constant term to zero and also normalizes one of the
slope coefficients to one for identification. This will be obvious in the results. Since probabilities
which are a continuous function of the parameters are computed, you may also request marginal
effects with
; Partial Effects
(In previous versions, the command was ; Marginal Effects. This form is still supported.) Partial
effects are computed using Pn(xi) and its derivatives (which are simple sums) computed at the
sample means.
N11: Semiparametric and Nonparametric Models for Binary Choice N-161

Results Kept by the Semiparametric Estimator

The model results kept by this estimator are
Matrices: b = final estimates of parameters
varb = mean squared deviation matrix for bootstrap estimates
Scalars: logl = log likelihood
kreg = number of Rhs variables
nreg = number of observations used to fit the function
exitcode = exit status for estimator
Last Model: The labels are b_variable

Last Function: None

N11.3.3 Application
The Klein and Spady estimator is computed with the binary logit model. We use only a
small subset of the data, the observations that are observed only once. The complete lack of
agreement of the two models is striking, though not unexpected.

REJECT ; _groupti > 1 $

SEMI ; Lhs = doctor
; Rhs = one,age,hhninc,hhkids,educ,married
; Partial Effects $
LOGIT ; Lhs = doctor
; Rhs = one,age,hhninc,hhkids,educ,married
; Partial Effects $
-----------------------------------------------------------------------------
Semiparametric Binary Choice Model
Dependent variable DOCTOR
Log likelihood function -1001.96124
Restricted log likelihood -1004.77427
Chi squared [ 4 d.f.] 5.62607
Significance level .22887
McFadden Pseudo R-squared .0027997
Estimation based on N = 1525, K = 4
Inf.Cr.AIC = 2011.922 AIC/N = 1.319
Hosmer-Lemeshow chi-squared = *********
P-value= .00000 with deg.fr. = 8
Logistic kernel fn. Bandwidth = .29475
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Odds Ratio Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
| Characteristics in numerator of Prob[Y = 1]
AGE| .98652 .02284 -.59 .5577 .94176 1.03128
HHNINC| .02962** .04607 -2.26 .0236 -.06067 .11991
HHKIDS| 3.16366 4.50864 .81 .4190 -5.67311 12.00042
EDUC| .96226 .11808 -.31 .7539 .73083 1.19368
MARRIED| 2.71828 .....(Fixed Parameter).....
--------+--------------------------------------------------------------------
N11: Semiparametric and Nonparametric Models for Binary Choice N-162

--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
Odds ratio = exp(beta); z is computed for the original beta
Fixed parameter ... is constrained to equal the value or
had a nonpositive st.error because of an earlier problem.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Partial derivatives of probabilities with
respect to the vector of characteristics.
They are computed at the means of the Xs.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| -.00025 -.01488 -.59 .5523 -.00107 .00057
HHNINC| -.06479*** -.03782 -76.40 .0000 -.06645 -.06313
HHKIDS| .02120 .01063 .26 .7984 -.14148 .18388
EDUC| -.00071 -.01305 -.33 .7445 -.00497 .00355
MARRIED| .01841 .....(Fixed Parameter).....
--------+--------------------------------------------------------------------
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
Fixed parameter ... is constrained to equal the value or
had a nonpositive st.error because of an earlier problem.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Binary Logit Model for Binary Choice
Dependent variable DOCTOR
Log likelihood function -996.30681
Restricted log likelihood -1004.77427
Chi squared [ 5 d.f.] 16.93492
Significance level .00462
McFadden Pseudo R-squared .0084272
Estimation based on N = 1525, K = 6
Inf.Cr.AIC = 2004.614 AIC/N = 1.315
Hosmer-Lemeshow chi-squared = 10.56919
P-value= .22732 with deg.fr. = 8
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| .46605 .34260 1.36 .1737 -.20544 1.13754
AGE| .00509 .00448 1.14 .2556 -.00369 .01387
HHNINC| -.49045* .26581 -1.85 .0650 -1.01142 .03052
HHKIDS| -.36639*** .12639 -2.90 .0037 -.61410 -.11867
EDUC| .00783 .02419 .32 .7461 -.03957 .05523
MARRIED| .16046 .12452 1.29 .1975 -.08360 .40451
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N11: Semiparametric and Nonparametric Models for Binary Choice N-163

-----------------------------------------------------------------------------
Partial derivatives of E[y] = F[*] with
respect to the vector of characteristics
Average partial effects for sample obs.
--------+--------------------------------------------------------------------
| Partial Prob. 95% Confidence
DOCTOR| Effect Elasticity z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00117 -.00127 1.14 .2554 -.00085 .00320
HHNINC| -.11304* .00087 -1.85 .0648 -.23301 .00694
HHKIDS| -.08606*** .00019 -2.87 .0041 -.14476 -.02736 #
EDUC| .00180 -.00053 .32 .7461 -.00912 .01273
MARRIED| .03702 -.00057 1.29 .1971 -.01924 .09327 #
--------+--------------------------------------------------------------------
# Partial effect for dummy variable is E[y|x,d=1] - E[y|x,d=0]
z, prob values and confidence intervals are given for the partial effect
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

N11.4 Nonparametric Binary Choice Model

The kernel density estimator is a device used to describe the distribution of a variable
nonparametrically, that is, without any assumption of the underlying distribution. This section
describes an extension to a simple regression function. The kernel density function estimates any
sufficiently smooth regression function, F(z) = E[|x=z], using the method of kernels, for any
parameter vector . must be a response variable with bounded range [0,1]. In the special case in
which is a binary response taking values 0/1, NPREG estimates the probability of a positive
response conditional on the linear index x. With an appropriate choice of x and , and by rescaling
the response, this estimator can estimate any sufficiently smooth univariate regression function with
known bounded range. One simple approach is to assume that x is a single variable and equals 1.0,
in which case, the estimator describes E[yi|xi]. Alternatively, NPREG may be used with the
estimated index function, xi, from any binary choice estimator. The natural choice in this instance
would be MSCORE, since MSCORE does not compute the probabilities (that is, the conditional
mean). In principle, the estimated index function could come from any estimator, but from a probit
or other parametric model, this would be superfluous.
The regression function computed is

z j zi
i =1 yi h K
N 1

h
F (z j ) = ., j = 1,...,M, i = 1,..., number of observations.
1 z j zi
i =1
N
K
h h

The function is computed for a specified set of values zj, j = 1,...,M. Note that each value requires a
sum over the full sample of n values. The primary component of the computation is the kernel
function, K[.].
N11: Semiparametric and Nonparametric Models for Binary Choice N-164

Eight alternatives are provided:

1. Epanechnikov: K[z] = .75(1 - .2z2) / Sqr(5) if |z| <= 5, 0 else,

2. Normal: K[z] = (z) (normal density),
3. Logit: K[z] = (z)[1-(z)] (default),
4. Uniform: K[z] = .5 if |z| < 1, 0 1 else,
5. Beta: Z[z] = (1-z)(1+z)/24 if |z| < 1, 0 1 else,
6. Cosine: K[z] = 1 + cos(2z) if |z| < .5, 0 else,
7. Triangle: K[z] = 1 - |z|, if |z| <= 1, 0 else.
8. Parzen: K[z] = 4/3 - 8z2 + 8|z|3 if |z| <= .5, 8(1-|z|)3 else.

The other essential part of the computation is the smoothing (bandwidth) parameter, h. Large values
of h stabilize the function, but tend to flatten it and reduce the resolution. Small values of h produce
greater detail, but also cause the estimator to become less stable.
The basic command is

NPREG ; Lhs = the dependent variable

; Rhs = the variable $

With no other options specified, the routine uses the logit kernel function, and uses a bandwidth
equal to
h = .9Q/n0.2 where Q = min(std.dev., range/1.5)

You may specify the kernel function to be used with

; Kernel = one of the names of the eight types of kernels listed above.

The bandwidth may be specified with

; Smooth = the bandwidth parameter.

There is no theory for choosing the right smoothing parameter, . Large values will cause
the estimated function to flatten at the average value of yi. Values close to zero will cause the
function to pass through the points zi,yi and to become computationally unstable elsewhere. A choice
might be made on the basis of the CVMSPE. (See Wong (1983) for discussion.) A value that
minimizes CVMSPE() may work well in practice. Since CVMSPE is a saved result, you could
compute this for a number of values of then retrieve the set of values to find the optimal one.
The default number of points specified is 100, with zj = a partition of the range of the
variable. You may specify the number of points, up to 200 with

; Pts = number of points to compute and plot.

The range of values plotted is the equally spaced grid from min(x)-h to max(x)+h, with the number
of points specified.
N11: Semiparametric and Nonparametric Models for Binary Choice N-165

N11.4.1 Output from NPREG

Output from KERNEL is a set of points for an estimated function, several descriptive
statistics, and a plot of the estimated regression function. The added specification

; List
displays the specific results, zi for the sample observations and the associated estimated regression
functions. These values are also placed in a two column matrix named kernel after estimation of the
function.
The cross validation mean squared prediction error (CVMSPE) is a goodness of fit measure.
Each observation, i is excluded in turn from the sample. Using the reduced sample, the regression
function is reestimated at the point zi in order to provide a point prediction for yi. The average
squared prediction error defines the CVMSPE. The calculation is defined by

1 x xi
j i
yjK j
Fi* ( z ) =
h h
1 x j xi
j i
K
h h

Then, CVMSPE(h) = (1/n) i [yi - Fi*(xi)]2.

N11.4.2 Application
The following estimates the parameters of a regression function using MSCORE, then uses
NPREG to plot the regression function.
REJECT ; _groupti > 1 $
NAMELIST ; x = one,age,hhninc,hhkids,educ,married $
MSCORE ; Lhs = doctor ; Rhs =x $
CREATE ; xb = x'b$
NPREG ; Lhs = doctor ; Rhs = xb $
-----------------------------------------------------------------------------
Maximum Score Estimates of Linear Quantile
Regression Model from Binary Response Data
Quantile .500 Number of Parameters = 6
Observations input = 1525 Maximum Iterations = 500
End Game Iterations = 100 Bootstrap Estimates = 20
Check Ties? No
Save bootstraps? No
Start values from MSCORE (normalized)
Normal exit after 100 iterations.
Score functions: Naive At theta(0) Maximum
Raw .26033 .26033 .27738
Normalized .63016 .63016 .63869
Estimated MSEs from 20 bootstrap samples
(Nonconvergence in 0 cases)
Angular deviation (radians) of bootstraps from estimate
Mean square = 1.027841 Mean absolute = .979001
Standard errors below are based on bootstrap mean squared
deviations. These and the t-ratios are only approximations.
N11: Semiparametric and Nonparametric Models for Binary Choice N-166

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
DOCTOR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
Constant| .42253 .63272 .67 .5043 -.81758 1.66263
AGE| .01146 .03120 .37 .7134 -.04969 .07261
HHNINC| -.20766 .45880 -.45 .6508 -1.10689 .69157
HHKIDS| -.82224 .65955 -1.25 .2125 -2.11494 .47045
EDUC| .01446 .07191 .20 .8406 -.12648 .15541
MARRIED| .31926 .35336 .90 .3663 -.37331 1.01183
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
+---------------------------------------------------------+
|Predictions for Binary Choice Model. Predicted value is |
|1 when beta*x is greater than one, zero otherwise. |
|Note, column or row total percentages may not sum to |
|100% because of rounding. Percentages are of full sample.|
+------+---------------------------------+----------------+
|Actual| Predicted Value | |
|Value | 0 1 | Total Actual |
+------+----------------+----------------+----------------+
| 0 | 23 ( 1.5%)| 541 ( 35.5%)| 564 ( 37.0%)|
| 1 | 10 ( .7%)| 951 ( 62.4%)| 961 ( 63.0%)|
+------+----------------+----------------+----------------+
|Total | 33 ( 2.2%)| 1492 ( 97.8%)| 1525 (100.0%)|
+------+----------------+----------------+----------------+
+---------------------------------------------------------+
|Crosstab for Binary Choice Model. Predicted probability |
|vs. actual outcome. Entry = Sum[Y(i,j)*Prob(i,m)] 0,1. |
|Note, column or row total percentages may not sum to |
|100% because of rounding. Percentages are of full sample.|
+------+---------------------------------+----------------+
|Actual| Predicted Probability | |
|Value | Prob(y=0) Prob(y=1) | Total Actual |
+------+----------------+----------------+----------------+
| y=0 | 564 ( 37.0%)| 0 ( .0%)| 564 ( 37.0%)|
| y=1 | 961 ( 63.0%)| 0 ( .0%)| 961 ( 63.0%)|
+------+----------------+----------------+----------------+
|Total | 1525 (100.0%)| 0 ( .0%)| 1525 (100.0%)|
+------+----------------+----------------+----------------+
+---------------------------------------+
| Nonparametric Regression for DOCTOR |
| Observations = 1525 |
| Points plotted = 1525 |
| Bandwidth = .090121 |
| Statistics for abscissa values---- |
| Mean = .854823 |
| Standard Deviation = .433746 |
| Minimum = -.167791 |
| Maximum = 1.662874 |
| ---------------------------------- |
| Kernel Function = Logistic |
| Cross val. M.S.E. = .231635 |
| Results matrix = KERNEL |
+---------------------------------------+
N11: Semiparametric and Nonparametric Models for Binary Choice N-167

Figure N11.1 Nonparametric Regression

N12: Bivariate and Multivariate Probit and Partial Observability Models N-168

N12: Bivariate and Multivariate Probit and

Partial Observability Models
N12.1 Introduction
The basic formulation of the models in this chapter is the bivariate probit model:

zi1 = 1xi1 + i1, yi1 = 1 if zi1 > 0, yi1 = 0 otherwise,

zi2 = 2xi2 + i2, yi2 = 1 if zi2 > 0, yi2 = 0 otherwise,
[i1,i2] ~ bivariate normal (BVN) [0,0,1,1,], -1 < < 1,
individual observations on y1 and y2 are available for all i.

(This model is also available for grouped (proportions) data. See Section N12.2.2.) The model given
above would be estimated using a complete sample on [y1, y2, x1, x2] where y1 and y2 are binary
variables and xij are sets of regressors. This chapter will describe estimation of this model and
several variants:

The disturbances in one or both equations may be heteroscedastic.

The observation mechanism may be such that yi1 is not observed when yi2 equals zero.

The observation mechanism may be such that only the product of yi1 and yi2 is observed.
That is, we only observe the compound outcomes both variables equal one or one or
both equal zero.

The basic model is extended to as many as 20 equations as a multivariate probit model.

NOTE: It is not necessary for there to be different variables in the two (or more) equations. The
Rh1 and Rh2 lists may be identical if your model specifies that. There is no issue of identifiability or
of estimability of the model the variable lists are unrestricted. This is not a question of
identification by functional form. The analogous case is the SUR model which is also identified even
if the variables in the two equations are the same.

Some extensions to a simultaneous equations model are easily programmed.

The bivariate probit and partial observability models are extended to the random
parameters modeling framework for panel data.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-169

N12.2 Estimating the Bivariate Probit Model

The two equations can each be estimated consistently by individual single equation probit
methods (see Chapter E27). However, this is inefficient and incomplete in that it ignores the
correlation between the disturbances. Moreover, the correlation coefficient itself might be of interest.
The comparison is analogous to that between OLS and GLS in the multivariate regression model. The
model is estimated in NLOGIT using full information maximum likelihood. The essential command is

BIVARIATE PROBIT ; Lhs = y1,y2

(or just BIVARIATE) ; Rh1 = right hand side for equation 1
; Rh2 = right hand side for equation 2 $

N12.2.1 Options for the Bivariate Probit Model

Restrictions may be imposed both between and within equations by using
; Rst = list of specifications...
and ; CML: linear restrictions

You might, for example, force the coefficients in the two equations to be equal as follows:

NAMELIST ; x = ... $
CALC ; k = Col(x) $
BIVARIATE ; Lhs = y1,y2 ; Rh1 = x ; Rh2 = x ; Rst = k_b, k_b, r $

(The model is identified with the same variables in the two equations.)

NOTE: You should not use the name rho for in your ; Rst specification; rho is the reserved name
for the scalar containing the most recently estimated value of in whatever model estimated it. If it
has not been estimated recently, it is zero. Either way, when ; Rst contains the name rho, this is
equivalent to fixing at the value then contained in the scalar rho. That is, rho is a value, not a
model parameter name such as b1. On the contrary, however, you might wish specifically to use rho
in your ; Rst specification. For example, to trace the maximized log likelihood over values of , you
might base the study on a command set that includes

PROCEDURE
BIVARIATE ; .... ; Rst = ..., rho $
...
ENDPROC
EXECUTE ; rho = 0.0, .90, .10 $

This would estimate the bivariate probit model 10 times, with fixed at 0, .1, .2, ..., .9. Presumably,
as part of the procedure, you would be capturing the values of logl and storing them for a later listing
or perhaps a plot of the values against the values of rho.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-170

If you use the constraints option, the parameter specification includes . As such, you can
use this method to fix to a particular value. This is a model for a voting choice and use of private
schools:
vote = f1(one,income,property_taxes)
private = f2(one,income,years,teacher).

Suppose it were desired to make the income coefficient the same in the two equations and, in a
second model, fix rho at 0.4. The commands could be

BIVARIATE ; Lhs = tax,priv

; Rh1 = one,inc,ptax ; Rh2 = one,inc,yrs,tch
; Rst = b10,bi,b12,b20,bi,b22,b23,r $

and BIVARIATE ; Lhs = tax,priv

; Rh1 = one,inc,ptax ; Rh2 = one,inc,yrs,tch
; Rst = b10,bi,b12,b20,bi,b22,b23,0.4 $

Choice Based Sampling

Any of the bivariate probit models may be estimated with choice based sampling. The feature
is requested with
; Wts = the appropriate weighting variable
; Choice Based

For this model, your weighting variable will take four values, for the four cells (0,0), (0,1), (1,0), and
(1,1);
wij = population proportion / sample proportion, i,j = 0,1.

The particular value corresponds to the outcome that actually occurs. You must provide the values.
You can obtain sample proportions you need if you do not already have them by computing a
crosstab for the two Lhs variables:

CROSSTAB ; Lhs = y1 ; Rhs = y2 $

The table proportions are exactly the proportions you will need. To use this estimator, it is assumed
that you know the population proportions.

Robust Covariance Matrix with Correction for Clustering

The standard errors for all bivariate probit models may be corrected for clustering in the
sample. Full details on the computation are given in Chapter R10, so we give only the final result
here. Assume that the data set is partitioned into G clusters of related observations (like a panel).
After estimation, let V be the estimated asymptotic covariance matrix which ignores the clustering.
Let gij denote the first derivatives of the log likelihood with respect to all model parameters for
observation (individual) i in cluster j.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-171

Then, the corrected asymptotic covariance matrix is

G G
Est.Asy.Var = V i 1 =
G 1
= (
ni
j 1
g ij =)(
ni
j 1 )

g ij V

You specify the clusters with

; Cluster = either the fixed number of individuals in a group or the name of a variable
which identifies the group membership

Any identifier which is common to all members in a cluster and different from other clusters may be
used. The controls for stratified and clustered data may be used as well. These are as follows:

; Cluster = the number of observations in a cluster (fixed) or the name of a

stratification variable which gives the cluster an identification. This
is the setup that is described above.
; Stratum = the number of observations in a stratum (fixed) or the name of a
stratification variable which gives the stratum an identification
; Wts = the name of the usual weighting variable for model estimation if
weights are desired. This defines wics. This is the usual weighting
setup that has been used in all previous versions of LIMDEP and
NLOGIT.
; FPC = the name of a variable which gives the number of clusters in the
stratum. This number will be the same for all observations in a
stratum repeated for all clusters in the stratum. If this number is
the same for all strata, then just give the number.
; Huber Use this switch to request hs. If omitted, hs = 1 is used.
; DFC Use this switch to request the use of d given above. If omitted,
d = 1 is used.

Note, these corrections will generally lead to larger standard errors compared to the uncorrected results.

N12.2.2 Proportions Data

Like other discrete choice models, this one may be fit with proportions data. Since this is a
bivariate model, you must provide the full set of four proportions variables, in the order

; Lhs = p00, p01, p10, p11.

(You may use your own names). Proportions must be strictly between zero and one, and the four
variables must add to 1.0.

NOTE: When you fit the model using proportions data, there is no cross tabulation of fitted and
actual values produced, and no fitted values or residuals are computed.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-172

N12.2.3 Heteroscedasticity
All bivariate probit specifications, including the basic two equation model, the sample
selection model (Section N12.4), and the Meng and Schmidt partial observability model (Section
N12.7), may be fit with a multiplicative heteroscedasticity specification. The model is the same as
the univariate probit model;

i ~ N{0, [exp( izi)]2 }, i = 1 and/or 2.

Either or both equations may be specified in this fashion. Use

; Hf1 = list of variables if you wish to modify the first equation

; Hf2 = list of variables if you wish to modify the second equation

NOTE: Do not include one in either list. The model will become inestimable.

The model is unchanged otherwise, and the full set of options given earlier remains available. To
give starting values with this modification, supply the following values in the order given:

= [ 1, 2,1,2,].

As before, all starting values are optional, and if you do provide the slopes, the starting value for is
still optional. The internal starting values for the variance parameters are zero for both equations.
(This produces the original homoscedastic model.)

N12.2.4 Specification Tests

Wald, LM, and LR tests related to the slope parameters would follow the usual patterns
discussed in previous chapters. One might be interested in testing hypotheses about the correlation
coefficient. The Wald test for the hypothesis that equals zero is part of the standard output for the
model see the results below which include a t statistic for this hypothesis. Likelihood ratio and
LM tests can be carried out as shown below:
The following routine will test the specification of the bivariate probit model against the null
hypothesis that two separate univariate probits apply. The test of the hypothesis that equals zero is
sufficient for this. The first group of commands computes and saves the univariate probit
coefficients and log likelihoods.

NAMELIST ; x1 = ... Rhs for the first equation

; x2 = ... Rhs for the second equation $
PROBIT ; Lhs = y1 ; Rhs = x1 $
MATRIX ; b1 = b $
CALC ; l1 = logl $
PROBIT ; Lhs = y2 ; Rhs = x2 $
MATRIX ; b2 = b $
CALC ; l2 = logl $
N12: Bivariate and Multivariate Probit and Partial Observability Models N-173

To carry out the likelihood ratio test, we now fit the bivariate model, which is the unrestricted one.
The restricted model, with = 0, is the two univariate models. The restricted log likelihood is the
sum of the two univariate values. The CALC command carries out the test. The BIVARIATE
command also produces a t statistic in the displayed output for the hypothesis that = 0. To
automate the test, we can also use the automatically retained values rho and varrho. The second
CALC command carries out this test.

BIVARIATE ; Lhs = y1,y2 ; Rh1 = x1 ; Rh2 = x2 $

CALC ; lrtest = 2*(l1 + l2 - logl)
; pvalue = 1 - Chi(lrtest,1) $
CALC ; waldtest = rho^2 / varrho
; pvalue = 1 - Chi(waldtest,1) $

The Lagrange multiplier test is also simple to carry out using the built in procedure, as we have
already estimated the restricted model. The test is carried out with the model command that specifies
the starting values from the restricted model and restricts the maximum iterations to zero.

NAMELIST ; x1 = ... Rhs for the first equation

; x2 = ... Rhs for the second equation $
PROBIT ; Lhs = y1 ; Rhs = x1 $
MATRIX ; b1 = b $
PROBIT ; Lhs = y2 ; Rhs = x2 $
MATRIX ; b2 = b $
BIVARIATE ; Lhs = y1,y2 ; Rh1 = x1 ; Rh2 = x2
; Start = b1,b2,0 ; Maxit = 0 $

You can test the heteroscedasticity assumption by any of the three classical tests as well.
The LM test will be the simplest since it does not require estimation of the model with
heteroscedasticity. You can carry out the LM test as follows:

NAMELIST ; x1 = ... ; x2 = ... ; z1 = ... ; z2 = ... $

BIVARIATE ; Lhs = ... ; Rh1 = x1 ; Rh2 = x2 $
CALC ; h1 = Col(z1) ; h2 = Col(z2)
; k1 = Col(x1) ; k2 = Col(x2) ; k12 = k1+k2 $
MATRIX ; b1_b2 = b(1:k12) $
BIVARIATE ; Lhs = ...
; Rh1 = x1 ; Rh2 = x2 ? specify the two probit equations
; Hf1 = z1 ; Hf2 = z2 ? variables in the two variances
; Start = b1_b2, h1_0, h2_0, rho
; Maxit = 0 $

In this instance, the starting value for rho is the value that was estimated by the first model, which is
retained as a scalar value.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-174

N12.2.5 Model Results for the Bivariate Probit Model

The initial output for the bivariate probit models consists of the ordinary least squares results
if you request them with

; OLS

Final output includes the log likelihood value and the usual statistical results for the parameter
estimates.
The last output, requested with

; Summary

is a joint frequency table for four cells, with actual and predicted values shown. The predicted
outcome is the cell with the largest probability. Cell probabilities are computed using

Pi00 = 1 - Pi11 - Pi10 - Pi01 Pi01 = [ 2xi2] - Pi11

Pi10 = [ 1xi1] - Pi11 Pi11 = 2 [ 1xi1, 2xi1, ]

A table which assesses the success of the model in predicting the two variables is presented as well.
An example appears below. The predictions and residuals are a bit different from the usual setup
(because this is a two equation model):

; Keep = name to retain the predicted y1

; Res = name to retain the predicted y2
; Prob = name to retain the probability for observed y1, y2 outcome
; Density = fitted bivariate normal density for observed outcome

Matrix results kept in the work areas automatically are b and varb. An extra matrix named
b_bprobt is also created. This is a two column matrix that collects the coefficients in the two
equations in a parameter matrix. The number of rows is the larger of the number of variables in x1
and x2. The coefficients are placed at the tops of the respective columns with the shorter column
padded with zeros.

NOTE: There is no correspondence between the coefficients in any particular row of b_bprobt. For
example, in the second row, the coefficient in the first column is that on the second variable in x1
and the coefficient in the second column is that on the second variable in x2. These may or may not
be the same.

The results saved by the binary choice models are:

Matrices: b = estimate of ( 1, 2,)

varb = asymptotic covariance matrix

Scalars: kreg = number of parameters in model

nreg = number of observations
logl = log likelihood function
N12: Bivariate and Multivariate Probit and Partial Observability Models N-175

Variables: logl_obs = individual contribution to log likelihood

Last Model: b1_variables, b2_variables, c1_variables, c1_variables, r12

Last Function: Prob(y1 = 1,y2 = 1|x1,x2) = 2(b1x1,b2x2,r)

The saved scalars are nreg, kreg, logl, rho, varrho. The Last Model labels are b_variables and
b2_variables. If the heteroscedasticity specification is used, the additional coefficients are
c1_variables and c2_variables. To extract a vector that contains only the slopes, and not the
correlation, use

MATRIX ; {kb = kreg-1} ; b1b2 = b(1:kb) $

To extract the two parameter vectors separately, after defining the namelists, you can use

MATRIX ; {k1 = Col(x1) ; k12 = k1+1 ; kb = kreg-1}

; b1 = b(1:k1) ; b2 = b(k12:kb) $

You may use other names for the matrices. (Note that the MATRIX commands contain embedded
CALC commands contained in {}.) If the model specifies heteroscedasticity, similar constructions
can be used to extract the three or four parts of b.

N12.2.6 Partial Effects

Because it is a two equation model, it is unclear what should be an appropriate marginal
effect in the bivariate probit model. (This is one of our frequently asked questions, as users are
often uncertain about what it is that they are looking for when they seek the partial effects in the
model effect of what? on what?) The literature is not necessarily helpful in this regard. The one
published result in the econometrics literature, Christofides, Stengos and Swidinsky (1997), plus an
error correction in a later issue, focuses on the joint probability of the two outcome variables
equaling one which is not a conditional mean. The probability might be of interest. It can be
examined with the PARTIAL EFFECTS program. An example appears below. The marginal
means in the model are the univariate probabilities that the two variables equal one. These are also
not necessarily interesting, but, in any event, they can be computed using the univariate models.
NLOGIT analyzes the conditional mean function

E[y1 | y2 = 1, x1, x2] = Prob[y1 = 1,y2 = 1| x1,x2,) / Prob[y2 = 1|x1].

This is the function analyzed in the bivariate probit marginal effects processor. The bivariate probit
estimator in NLOGIT allows either or both of the latent regressions to be heteroscedastic. The
reported effects for this model include the decomposition of the marginal effect into all four terms,
the regression part and the variance part, in each of the two latent models.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-176

The computations of the following marginal effects in the bivariate probit model are
included as an option with the estimator. There are two models, the base case of y1,y2 a pair of
correlated probit models, and y1|y2 = 1, the bivariate probit with sample selection. (See Section
N12.4 below.) The conditional mean computed for these two models would be identical,

E[y1|y2 = 1] = 2 [w1, w2 , ] / ( w2 )

where 2 is the bivariate normal CDF and is the univariate normal CDF. This model allows
multiplicative heteroscedasticity in either or both equations, so

w1 = 1x1 / exp(1z1)

and likewise for w2. In the homoscedastic model, 1 and/or 2 is a zero vector. Four full sets of
marginal effects are reported, for x1, x2, z1, and z2. Note that the last two may be zero. The four
vectors may also have variables in common. For any variable which appears in more than one of the
parts, the marginal effect is the sum of the individual terms. A table is reported which displays these
total effects for every variable which appears in the model, along with estimated standard errors and
the usual statistical output. Formulas for the parts of these marginal effects are given below with the
technical details. For further details, see Greene (2011).
Note that you can get marginal effects for y2|y1 just by respecifying the model with y1 and y2
reversed (y2 now appears first) in the Lhs list of the command. You can also trick NLOGIT into
giving you marginal effects for y1|y2 = 0 (instead of y2 = 1) by computing z1 = 1-y1 and z2 = 1-y2, and
fitting the same bivariate probit model but with Lhs = z1,z2. You must now reverse the signs of the
marginal effects (and all slope coefficients) that are reported.
The example below was produced by a sampling experiment: Note that the model specifies
heteroscedasticity in the second equation though, in fact, there is none.

CALC ; Ran(12345) $
SAMPLE ; 1-500 $
CREATE ; u1 = Rnn(0,1) ; u2 = u1 + Rnn(0,1)
; z = Rnu(.2,.4) ; x1 = Rnn(0,1) ; x2 = Rnn(0,1)
; x3 = Rnn(0,1) ; y1 = (x1 + x2 + u1) > 0 ; y2 = (x1 + x3 + u2) > 0 $
BIVARIATE ; Lhs = y1,y2
; Rh1 = one,x1,x2 ; Rh2 = one,x1,x3
; Hf2 = z ; Partial Effects $

The first set of results is the model coefficients.

-----------------------------------------------------------------------------
FIML Estimates of Bivariate Probit Model
Dependent variable Y1Y2
Log likelihood function -416.31350
Estimation based on N = 500, K = 8
Inf.Cr.AIC = 848.627 AIC/N = 1.697
Disturbance model is multiplicative het.
Var. Parms follow 6 slope estimates.
For e(2), 1 estimates follow X3
-----------------------------------------------------------------------------
N12: Bivariate and Multivariate Probit and Partial Observability Models N-177

--------+--------------------------------------------------------------------
Y1| Standard Prob. 95% Confidence
Y2| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index equation for Y1
Constant| -.04292 .07362 -.58 .5599 -.18721 .10137
X1| 1.09235*** .08571 12.74 .0000 .92435 1.26035
X2| 1.06802*** .08946 11.94 .0000 .89268 1.24337
|Index equation for Y2
Constant| .01017 .06432 .16 .8744 -.11590 .13623
X1| .82908** .37815 2.19 .0283 .08792 1.57024
X3| .70123** .30512 2.30 .0215 .10321 1.29925
|Variance equation for Y2
Z| -.05575 1.45449 -.04 .9694 -2.90651 2.79500
|Disturbance correlation
RHO(1,2)| .66721*** .07731 8.63 .0000 .51568 .81874
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

This is the decomposition of the marginal effects for the four possible contributors to the effect.
+------------------------------------------------------+
| Partial Effects for Ey1|y2=1 |
+----------+---------------------+---------------------+
| | Regression Function | Heteroscedasticity |
| +---------------------+---------------------+
| | Direct | Indirect | Direct | Indirect |
| Variable | Efct x1 | Efct x2 | Efct h1 | Efct h2 |
+----------+----------+----------+----------+----------+
| X1 | .48383 | -.17370 | .00000 | .00000 |
| X2 | .47305 | .00000 | .00000 | .00000 |
| X3 | .00000 | -.14691 | .00000 | .00000 |
| Z | .00000 | .00000 | .00000 | .00092 |
+----------+----------+----------+----------+----------+

A table of the specific effects is produced for each contributor to the marginal effects. This first
table gives the total effects. The values here are the row total in the table above.
-----------------------------------------------------------------------------
Partial derivatives of E[y1|y2=1] with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Effect shown is total of all parts above.
Estimate of E[y1|y2=1] = .661053
Observations used for means are All Obs.
Total effects reported = direct+indirect.
--------+--------------------------------------------------------------------
Y1| Partial Standard Prob. 95% Confidence
Y2| Effect Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
X1| .31013*** .04356 7.12 .0000 .22476 .39550
X2| .47305*** .04338 10.91 .0000 .38804 .55807
X3| -.14691*** .02853 -5.15 .0000 -.20283 -.09099
Z| .00092 .02404 .04 .9694 -.04620 .04804
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N12: Bivariate and Multivariate Probit and Partial Observability Models N-178

The direct effects are the marginal effects of the variables (x1 and z1) that appear in the first equation.
-----------------------------------------------------------------------------
Partial derivatives of E[y1|y2=1] with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Effect shown is total of all parts above.
Estimate of E[y1|y2=1] = .435447
Observations used for means are All Obs.
These are the direct marginal effects.
--------+--------------------------------------------------------------------
TAX| Partial Standard Prob. 95% Confidence
PRIV| Effect Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
INC| .67814*** .24487 2.77 .0056 .19820 1.15807
PTAX| -.83030** .38146 -2.18 .0295 -1.57794 -.08266
YRS| 0.0 .....(Fixed Parameter).....
--------+--------------------------------------------------------------------
-----------------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
Fixed parameter ... is constrained to equal the value or
had a nonpositive st.error because of an earlier problem.
-----------------------------------------------------------------------------

The indirect effects are the effects of the variables that appear in the other (second) equation.

The marginal effects processor in the bivariate probit model detects when a regressor is a
dummy variable. In this case, the marginal effect is computed using differences, not derivatives.
The model results will contain a specific description. To illustrate this computation, we revisit the
German health care data. A description appears in Chapter E2. Here, we analyze the two health care
utilization variables, doctor = 1(docvis > 0) and hospital = 1(hospvis > 0) in a bivariate probit model.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-179

The model command is

SAMPLE ; All $
CREATE ; doctor = docvis > 0 ; hospital = hospvis > 0 $
BIVARIATE ; Lhs = doctor,hospital
; Rh1 = one,age,educ,hhninc,hhkids
; Rh2 = one,age,hhninc,hhkids
; Partial Effects $

The variable hhkids is a binary variable for whether there are children in the household. The
estimation results are as follows. This is similar to the preceding example. The final table contains
the result for the binary variable. In fact, the explicit treatment of the binary variable results in very
little change in the estimate.
-----------------------------------------------------------------------------
FIML Estimates of Bivariate Probit Model
Dependent variable DOCHOS
Log likelihood function -25552.65886
Estimation based on N = 27326, K = 10
Inf.Cr.AIC =51125.318 AIC/N = 1.871
--------+--------------------------------------------------------------------
DOCTOR| Standard Prob. 95% Confidence
HOSPITAL| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index equation for DOCTOR
Constant| .13653** .05618 2.43 .0151 .02642 .24663
AGE| .01353*** .00076 17.84 .0000 .01205 .01502
EDUC| -.02675*** .00345 -7.75 .0000 -.03352 -.01998
HHNINC| -.10245** .04541 -2.26 .0241 -.19144 -.01345
HHKIDS| -.12299*** .01670 -7.37 .0000 -.15571 -.09027
|Index equation for HOSPITAL
Constant| -1.54988*** .05325 -29.10 .0000 -1.65426 -1.44551
AGE| .00510*** .00100 5.08 .0000 .00313 .00707
HHNINC| -.05514 .05510 -1.00 .3169 -.16314 .05285
HHKIDS| -.02682 .02392 -1.12 .2622 -.07371 .02006
|Disturbance correlation
RHO(1,2)| .30251*** .01381 21.91 .0000 .27545 .32958
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

+--------------------------------+
| Partial Effects for Ey1|y2=1 |
+----------+----------+----------+
| | Direct | Indirect |
| Variable | Efct x1 | Efct x2 |
+----------+----------+----------+
| AGE | .00367 | -.00036 |
| EDUC | -.00726 | .00000 |
| HHNINC | -.02779 | .00385 |
| HHKIDS | -.03336 | .00187 |
+----------+----------+----------+
N12: Bivariate and Multivariate Probit and Partial Observability Models N-180

-----------------------------------------------------------------------------
Partial derivatives of E[y1|y2=1] with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Effect shown is total of all parts above.
Estimate of E[y1|y2=1] = .822131
Observations used for means are All Obs.
Total effects reported = direct+indirect.
--------+--------------------------------------------------------------------
DOCTOR| Partial Standard Prob. 95% Confidence
HOSPITAL| Effect Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00332*** .00023 14.39 .0000 .00286 .00377
EDUC| -.00726*** .00096 -7.58 .0000 -.00913 -.00538
HHNINC| -.02394* .01225 -1.95 .0507 -.04796 .00008
HHKIDS| -.03149*** .00471 -6.69 .0000 -.04072 -.02226
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
Partial derivatives of E[y1|y2=1] with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Effect shown is total of all parts above.
Estimate of E[y1|y2=1] = .822131
Observations used for means are All Obs.
These are the direct marginal effects.
--------+--------------------------------------------------------------------
DOCTOR| Partial Standard Prob. 95% Confidence
HOSPITAL| Effect Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| .00367*** .00022 16.44 .0000 .00323 .00411
EDUC| -.00726*** .00096 -7.58 .0000 -.00913 -.00538
HHNINC| -.02779** .01232 -2.25 .0241 -.05195 -.00364
HHKIDS| -.03336*** .00460 -7.26 .0000 -.04237 -.02436
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
Partial derivatives of E[y1|y2=1] with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Effect shown is total of all parts above.
Estimate of E[y1|y2=1] = .822131
Observations used for means are All Obs.
These are the indirect marginal effects.
--------+--------------------------------------------------------------------
DOCTOR| Partial Standard Prob. 95% Confidence
E[y1|x,z| Effect Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
AGE| -.00036*** .7075D-04 -5.03 .0000 -.00049 -.00022
EDUC| 0.0 .....(Fixed Parameter).....
HHNINC| .00385 .00385 1.00 .3167 -.00369 .01140
HHKIDS| .00187 .00167 1.12 .2620 -.00140 .00515
--------+--------------------------------------------------------------------
Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
Fixed parameter ... is constrained to equal the value or
had a nonpositive st.error because of an earlier problem.
-----------------------------------------------------------------------------
N12: Bivariate and Multivariate Probit and Partial Observability Models N-181

+-----------------------------------------------------------+
| Analysis of dummy variables in the model. The effects are |
| computed using E[y1|y2=1,d=1] - E[y1|y2=1,d=0] where d is |
| the variable. Variances use the delta method. The effect |
| accounts for all appearances of the variable in the model.|
+-----------------------------------------------------------+
|Variable Effect Standard error t ratio |
+-----------------------------------------------------------+
HHKIDS -.031829 .004804 -6.625

N12.3 Tetrachoric Correlation

The tetrachoric correlation is a measure of the correlation between two binary variables. The
familiar Pearson, product moment correlation is inappropriate as it is used for continuous variables.
The tetrachoric correlation coefficient is equivalent to the correlation coefficient in the following
bivariate probit model:

y1* = + 1, y1 = 1(y1* > 0)

y2* = + 2, y2 = 1(y2* > 0)
(1,2) ~ N2[(0,0),(1,1,)]

The applicable literature contains a number of approaches to estimation of this correlation

coefficient, some a bit ad hoc. We proceed directly to the implied maximum likelihood estimator.
You can fit this model with

BIVARIATE ; Lhs = y1,y2 ; Rh1 = one ; Rh2 = one $

The reported estimate of is the desired estimate. NLOGIT notices if your model does not contain
any covariates in the equation, and notes in the output that the estimator is a tetrachoric correlation.
The results below based on the German health care data show an example.

-----------------------------------------------------------------------------
FIML Estimation of Tetrachoric Correlation
Dependent variable DOCHOS
Log likelihood function -25898.27183
Estimation based on N = 27326, K = 3
Inf.Cr.AIC =51802.544 AIC/N = 1.896
--------+--------------------------------------------------------------------
DOCTOR| Standard Prob. 95% Confidence
HOSPITAL| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Estimated alpha for P[DOCTOR =1] = F(alpha)
Constant| .32949*** .00773 42.61 .0000 .31433 .34465
|Estimated alpha for P[HOSPITAL=1] = F(alpha)
Constant| -1.35540*** .01074 -126.15 .0000 -1.37646 -1.33434
|Tetrachoric Correlation between DOCTOR and HOSPITAL
RHO(1,2)| .31106*** .01357 22.92 .0000 .28446 .33766
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N12: Bivariate and Multivariate Probit and Partial Observability Models N-182

The preceding suggests an interpretation for the bivariate probit model; the correlation coefficient
reported is the conditional (on the independent variables) tetrachoric correlation.
The computation in the preceding can be generalized to a set of M binary variables, y1,...,yM.
The tetrachoric correlation matrix would be the MM matrix, R, whose off diagonal elements are the
mn coefficients described immediately above. There are several ways to do this computation, again,
as suggested by a literature that contains recipes. Once again, the maximum likelihood estimator
turns out to be a useful device.
A direct approach would involve expanding the latent model to

y1* = + 1, y1 = 1(y1* > 0)

y2* = + 2, y2 = 1(y2* > 0)
...
yM* = + M, yM = 1(yM* > 0)
(1,2,...,M) ~ NM[0,R]

The appropriate estimator would be NLOGITs multivariate probit estimator, MPROBIT, which can
handle up to M = 20. The correlation matrix produced by this procedure is precisely the full
information MLE of the tetrachoric correlation matrix. However, for any M larger than two, this
requires use of the GHK simulator to maximize the simulated log likelihood, and is extremely slow.
The received estimators of this model estimate the correlations pairwise, as shown earlier. For this
purpose, the FIML estimator is unnecessary. The matrix can be obtained using bivariate probit
estimates. The following procedure would be useable:

NAMELIST ; y = y1,y2,...,ym $
CALC ; m = Col(y) $
MATRIX ; r = Iden(m) $
PROCEDURE $
DO FOR ; 20 ; i = 2,m $
CALC ; i1 = i - 1 $
DO FOR ; 10 ; j = 1,i1 $
BIVARIATE ; Lhs = y:i, y:j ; Rh1 = one ; Rh2 = one $
MATRIX ; r(i,j) = rho $
MATRIX ; r(j,i) = rho $
ENDDO ; 10 $
ENDDO ; 20 $
ENDPROC $
EXECUTE ; Quietly $

A final note, the preceding approach is not fully efficient. Each bivariate probit estimates (m,n)
which means that m is estimated more than once when m > 1. A minimum distance estimator could
be used to reconcile these after all the bivariate probit estimates are computed. But, since the means
are nuisance parameters in this model, this seems unlikely to prove worth the effort.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-183

N12.4 Bivariate Probit Model with Sample Selection

In the bivariate probit setting, data on y1 might be observed only when y2 equals one. For
example, in modeling loan defaults with a sample of applicants, default will only occur among
applicants who are granted loans. Thus, in a bivariate probit model for the two outcomes, the
observed default data are nonrandomly selected from the set of applicants. The model is

zi1 = xi1 + i1, yi1 = sgn(zi1),

zi2 = xi2 + i2, yi2 = sgn(zi2),
i1,i2 ~ BVN(0,0,1,1,),
(yi1,xi1) is observed only when yi2 = 1.

This is a type of sample selectivity model. The estimator was proposed by Wynand and van Praag
(1981). An extensive application which uses choice based sampling as well is Boyes, Hoffman, and
Low (1989). (See also Greene (1992 and 2011).) The sample selection model is obtained by adding

; Selection (or just ; Sel)

to the BIVARIATE PROBIT command. All other options and specifications are the same as
before. Except for the diagnostic table which indicates that this model has been chosen, the results
for the selection model are the same as for the basic model.

N12.5 Simultaneity in the Binary Variables

A simultaneous equations sort of model would appear as

zi1 = 1xi1 + 1yi2 + i1, yi1 = 1 if zi1 > 0, yi1 = 0 otherwise,

zi2 = 2xi2 + 1yi1 + i2, yi2 = 1 if zi2 > 0, yi2 = 0 otherwise,
[i1,i2] ~ bivariate normal (BVN) [0,0,1,1,], -1 < < 1,
individual observations on y1 and y2 are available for all i.

It would follow from the construction that

Prob[y1 = 1, y2 = 1] = 2 ( 1x1+ 1y2, 2x2+ 2y1, ]

and likewise for the other cells, where y1 and y2 are two binary variables. Unfortunately, the model as
stated is not internally consistent, and is inestimable. Ultimately, it is not identifiable. As a practical
matter, you can verify this by attempting to devise a way to simulate a sample of observations that
conforms exactly to the assumptions of the model. In this case, there is none because there is no linear
reduced form for this model. (The approach suggested by Maddala (1983) is not consistent.) NLOGIT
will detect this condition and decline to attempt to do the estimation. For example:

BIVARIATE PROBIT ; Lhs = y1,y2 ; Rh1 = one,x1,x3,y2 ; Rh2 = one,x2,x3,y1 $

produces a diagnostic,
Error 809: Fully simultaneous BVP model is not identified
N12: Bivariate and Multivariate Probit and Partial Observability Models N-184

NOTE: Unlike the case in linear simultaneous equations models, nonidentifiability does not prevent
estimation in this model. (2SLS estimates cannot be computed when there are too few instrumental
variables, which is the signature of nonidentifiability in a linear context.) With the fully
simultaneous bivariate probit model, it is possible to maximize what purports to be a log likelihood
function numbers will be produced that might even look reasonable. However, as noted, the model
itself is nonsensical it lacks internal coherency.

N12.6 Recursive Bivariate Probit Model

A slight modification of the model in the previous section is identified and used in many
recent applications. Consider the model for the probability of the event y1 = 0/1 and y2 = 0/1
assuming 2 = 0.
Prob[y1 = 1, y2 = 1 | x1 , x2 ] = 2 ( 1x1 + 1, 2x2, )
Prob[y1 = 1, y2 = 0 | x1 , x2 ] = 2 ( 1x1, - 2x2, -)
Prob[y1 = 0, y2 = 1 | x1 , x2 ] = 2 (- 1x1 + 1, 2x2, -)
Prob[y1 = 0, y2 = 0 | x1 , x2 ] = 2 (- 1x1, - 2x2, )

This is a recursive simultaneous equations model. Surprisingly enough, it can be estimated by full
information maximum likelihood ignoring the simultaneity in the system;

BIVARIATE ; Lhs = y1, y2

; Rh1 = x1,y2 ; Rh2 = x2 $

(A proof of this result is suggested in Maddala (1983, p. 123) and pursued in Greene (1998).) An
application of the result to the gender economics study is given in Greene (1998). Some extensions
are presented in Greene (2003, 2011).
This model presents the same ambiguity in the conditional mean function and marginal
effects that were noted earlier in the bivariate probit model. The conditional mean for y1 is

E[y1 | y2 = 1, x1, x2] = 2 (1x1 + 1, 2x2, ) / ( 2x2)

for which derivatives were given earlier. Given the form of this result, we can identify direct and
indirect effects in the conditional mean:

E[ y1 | y 2 =1, x1 , x 2 ] g1
= 1 = direct effects
x1 ( ' x 2 )

= E[ y1 | y2 1, x1 , x 2 ] g 2 (x1 , x 2 , )( z2 )
= 2 2 = indirect effects
x 2 (x 2 ) [(x 2 )]
2

The unconditional mean function is

E[y1 | x1, x2] = ( 2x2) E[y1 | y2 = 1, x1, x2] + [1-( 2x2)] E[y1 | y2 = 0, x1, x2]
= 2 ( 1x1 + 1, 2x2, ) + 2 ( 1x1, - 2x2, -)
N12: Bivariate and Multivariate Probit and Partial Observability Models N-185

Derivatives for marginal effects can be derived using the results given earlier. Analysis appears in
Greene (1998). The decomposition is done automatically when you specify a recursive bivariate
probit model one in which the second Lhs variable appears in the Rhs of the first equation.
The following demonstrates this by extending the model. Note the appearance of priv on the
Rhs of the first equation, x1.

NAMELIST ; y = tax, priv

; x1 = one,inc,ptax,priv ; x2 = one,inc,yrs,ptax $
BIVARIATE ; Lhs = tax,priv ; Rh1 = x1 ; Rh2 = x2 ; Partial Effects $

-----------------------------------------------------------------------------
FIML - Recursive Bivariate Probit Model
Dependent variable PRITAX
Log likelihood function -74.21179
Estimation based on N = 80, K = 9
Inf.Cr.AIC = 166.424 AIC/N = 2.080
--------+--------------------------------------------------------------------
PRIV| Standard Prob. 95% Confidence
TAX| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index equation for PRIV
Constant| -2.81454 5.51612 -.51 .6099 -13.62594 7.99687
INC| .16264 .76312 .21 .8312 -1.33304 1.65832
YRS| -.03484 .04247 -.82 .4120 -.11808 .04840
PTAX| .04605 .98275 .05 .9626 -1.88011 1.97220
|Index equation for TAX
Constant| -.68059 4.05341 -.17 .8667 -8.62513 7.26394
INC| 1.22768 .81424 1.51 .1316 -.36820 2.82356
PTAX| -1.63160 .99598 -1.64 .1014 -3.58368 .32047
PRIV| .98178 .95912 1.02 .3060 -.89807 2.86162
|Disturbance correlation
RHO(1,2)| -.83119 .57072 -1.46 .1453 -1.94977 .28740
--------+--------------------------------------------------------------------

---------------------------------------------------------------
Decomposition of Partial Effects for Recursive Bivariate Probit
Model is PRIV = F(x1b1), TAX = F(x2b2+c*PRIV )
Conditional mean function is E[TAX |x1,x2] =
Phi2(x1b1,x2b2+gamma,rho) + Phi2(-x1b1,x2b2,-rho)
Partial effects for continuous variables are derivatives.
Partial effects for dummy variables (*) are first differences.
Direct effect is wrt x2, indirect is wrt x1, total is the sum.
---------------------------------------------------------------
Variable Direct Effect Indirect Effect Total Effect
---------+---------------+-----------------+-------------------
INC | .4787001 .0169062 .4956064
PTAX | -.6362002 .0047864 -.6314138
YRS | .0000000 -.0036217 -.0036217
---------+-----------------------------------------------------

The decomposition of the partial effects accounts for the direct and indirect influences. Note that
there is no partial effect given for priv because this variable is endogenous. It does not vary
partially.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-186

N12.7 Panel Data Bivariate Probit Models

The four bivariate probit models, bivariate probit, bivariate probit with selection, Poiriers
partial observability and Abowds partial observability model have all been extended to the random
parameters form of the panel data models. (The fixed effects and latent class models are not
available.) Use of the random parameters formulation is described in detail in Chapter R24. We will
only sketch the extension here. The commands for the models are as follows, where [ ... ] indicates
an optional part of the specification:

BIVARIATE ; Lhs = y1, y2 ? Bivariate probit

; Rh1 = Rhs for equation 1
; Rh2 = Rhs for equation 2
[ ; Selection ] ? Partial observability

or PROBIT ; Lhs = y ? Probit model

; Rh1 = Rhs for equation 1
; Rh2 = Rhs for equation 2 ? Partial observability (Poirier)
[ ; Selection ] ? Abowd and Farber

Then, ; RPM [ = list for heterogeneity in the mean ]

; Pds = panel specification ? Optional if cross section
[ ; Pts = number of replications ]
[ ; Halton and other controls for the estimation ]
; Fcn = designation of random parameters $

For the random parameters specification, use

; name ( distribution ) distribution = n, u, t, l, c for the first equation

or ; name [ distribution ] for the second equation.

Note that random parameters in the second equation are designated by square brackets rather than
parentheses. This is necessary because the same variables can appear in both equations. Two other
specifications should be useful

; Cor allows the random parameters to be correlated.

; AR1 allows the random terms to evolve according to an AR(1) process
rather than be time invariant.

The two equation random parameters save the matrices b and varb and the scalar logl after
estimation. No other variables, partial effects, etc. are provided internally to the command. But, you
can use the estimation results directly in the SIMULATION, PARTIAL EFFECTS commands,
and so on. An example appears after the results of the simulation below.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-187

Application

To demonstrate this model, we will fit a true random effects model for a bivariate probit
outcome. Each equation has its own random effect, and the two are correlated. The model structure is

zit1 = 1xit1 + it1 + ui1, yit1 = 1 if zit1 > 0, yit1 = 0 otherwise,

zit2 = 2xit2 + it2 + ui2, yit2 = 1 if zit2 > 0, yit2 = 0 otherwise,
[it1,it2] ~ Bivariate normal (BVN) [0,0,1,1,], -1 < < 1,
[ui1,ui2] ~ Bivariate normal (BVN) [0,0,1,1,], -1 < < 1,

Individual observations on y1 and y2 are available for all i. Note, in the structure, the idiosyncratic itj
creates the bivariate probit model, whereas the time invariant common effects, uij create the random
effects (random constants) model. Thus, there are two sources of correlation across the equations, the
correlation between the unique disturbances, , and the correlation between the time invariant
disturbances, . The data are generated artificially according to the assumptions of the model.

CALC ; Ran(12345) $
SAMPLE ; 1-200 $
CREATE ; x1 = Rnn(0,1) ; x2 = Rnn(0,1) ; x3 = Rnn(0,1) $
MATRIX ; u1i = Rndm(20) ; u2i = .5* Rndm(20) + .5* u1i $
CREATE ; i = Trn(10,0) ; u1 = u1i(i) ; u2 = u2i(i) $
CREATE ; e1 = Rnn(0,1) ; e2 = .7*Rnn(0,1) + .3*e1 $
CREATE ; y1 = (x1+e1 + u1) > 0
; y2 = (x2+x3+e2+u2) > 0 ; y12 = y1*y2 $
BIVARIATE ; Lhs = y1,y2 ; Rh1 = one,x1 ; Rh2 = one,x2,x3
; RPM ; Pds = 10 ; Pts = 25 ; Cor ; Halton
; Fcn = one(n), one[n] $
PROBIT ; Lhs = y12 ; Rh1 = one,x1 ; Rh2 = one,x2,x3
; RPM ; Pds = 10 ; Pts = 25 ; Cor ; Halton
; Fcn = one(n), one[n] ; Selection $
PROBIT ; Lhs = y12 ; Rh1 = one,x1 ; Rh2 = one,x2,x3
; RPM ; Pds = 10 ; Pts = 25 ; Cor ; Halton
; Fcn = one(n), one[n] $

Note that by construction, most of the cross equation correlation comes from the random effects, not
the disturbances. The second model is the Abowd/Farber version of the partial observability model.
The Poirier model is not estimable for this setup. It is easy to see why. The correlations in the Poirier
model are overspecified. Indeed, with ; Cor for the random effects, the Poirier model specifies two
separate sources of cross equation correlation. This is a weakly identified model. The implication can
be seen in the results below, where the estimator failed to converge for the probit model, and at the exit,
the estimate of was nearly -1.0. This is the signature of a weakly identified (or unidentified) model.
N12: Bivariate and Multivariate Probit and Partial Observability Models N-188

These are the estimates of the Meng and Schmidt model.

-----------------------------------------------------------------------------
Probit Regression Start Values for Y1
Dependent variable Y1
Log likelihood function -114.32973
--------+--------------------------------------------------------------------
Y1| Standard Prob. 95% Confidence
Y2| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
X1| .65214*** .10287 6.34 .0000 .45052 .85375
Constant| -.12214 .09617 -1.27 .2041 -.31062 .06634
--------+--------------------------------------------------------------------
Probit Regression Start Values for Y2
Dependent variable Y2
Log likelihood function -83.99189
--------+--------------------------------------------------------------------
Y1| Standard Prob. 95% Confidence
Y2| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
X2| .96584*** .14838 6.51 .0000 .67503 1.25665
X3| 1.00421*** .14562 6.90 .0000 .71880 1.28961
Constant| .17104 .11176 1.53 .1259 -.04801 .39009
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Random Coefficients BivProbt Model
Dependent variable Y1
Log likelihood function -163.43468
Estimation based on N = 200, K = 9
Inf.Cr.AIC = 344.869 AIC/N = 1.724
Sample is 10 pds and 20 individuals
Bivariate Probit model
Simulation based on 25 Halton draws
--------+--------------------------------------------------------------------
Y1| Standard Prob. 95% Confidence
Y2| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Nonrandom parameters
X1_1| 1.08374*** .19408 5.58 .0000 .70335 1.46412
X2_2| 1.18264*** .22213 5.32 .0000 .74727 1.61800
X3_2| 1.18893*** .18946 6.28 .0000 .81758 1.56027
|Means for random parameters
ONE_1| -.05021 .12427 -.40 .6862 -.29377 .19335
ONE_2| .27827* .15481 1.80 .0723 -.02514 .58169
|Diagonal elements of Cholesky matrix
ONE_1| 1.08131*** .17778 6.08 .0000 .73288 1.42975
ONE_2| .42491*** .15811 2.69 .0072 .11503 .73480
|Below diagonal elements of Cholesky matrix
lONE_ONE| -.45867** .17845 -2.57 .0102 -.80842 -.10892
|Unconditional cross equation correlation
lONE_ONE| -.17471 .17798 -.98 .3263 -.52355 .17413
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
N12: Bivariate and Multivariate Probit and Partial Observability Models N-189

Implied covariance matrix of random parameters

Var_Beta| 1 2
--------+----------------------------
1| 1.16924 -.495965
2| -.495965 .390927

Implied standard deviations of random parameters

S.D_Beta| 1
--------+--------------
1| 1.08131
2| .625242

Implied correlation matrix of random parameters

Cor_Beta| 1 2
--------+----------------------------
1| 1.00000 -.733586
2| -.733586 1.00000

These are the estimates of the Abowd and Farber model.

-----------------------------------------------------------------------------
Probit Regression Start Values for Y12
Dependent variable Y12
Log likelihood function -103.81770
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y12| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
X1| .52842*** .10360 5.10 .0000 .32537 .73147
Constant| -.66498*** .10303 -6.45 .0000 -.86692 -.46304
--------+--------------------------------------------------------------------
Probit Regression Start Values for Y12
Dependent variable Y12
Log likelihood function -102.69669
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y12| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
X2| .50336*** .11606 4.34 .0000 .27588 .73084
X3| .38430*** .11126 3.45 .0006 .16622 .60237
Constant| -.64606*** .10368 -6.23 .0000 -.84927 -.44286
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Random Coefficients PrshlObs Model
Dependent variable Y12
Log likelihood function -72.83435
Restricted log likelihood -102.69669
Chi squared [ 3 d.f.] 59.72467
Significance level .00000
McFadden Pseudo R-squared .2907819
Estimation based on N = 200, K = 8
Inf.Cr.AIC = 161.669 AIC/N = .808
Sample is 10 pds and 20 individuals
Partial observability probit model
Simulation based on 25 Halton draws
N12: Bivariate and Multivariate Probit and Partial Observability Models N-190

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y12| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Nonrandom parameters
X1_1| 1.09511*** .23019 4.76 .0000 .64394 1.54629
X2_2| 2.26279*** .79573 2.84 .0045 .70319 3.82239
X3_2| 1.90015*** .70892 2.68 .0074 .51070 3.28960
|Means for random parameters
ONE_1| .09219 .22240 .41 .6785 -.34370 .52809
ONE_2| -.06872 .36077 -.19 .8489 -.77581 .63837
|Diagonal elements of Cholesky matrix
ONE_1| .59436** .23215 2.56 .0105 .13935 1.04937
ONE_2| 1.98257*** .73799 2.69 .0072 .53614 3.42900
|Below diagonal elements of Cholesky matrix
lONE_ONE| -.91612** .41168 -2.23 .0261 -1.72299 -.10925
|Unconditional cross equation correlation
lONE_ONE| 0.0 .....(Fixed Parameter).....
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
Fixed parameter ... is constrained to equal the value or
had a nonpositive st.error because of an earlier problem.
-----------------------------------------------------------------------------

Implied covariance matrix of random parameters

Var_Beta| 1 2
--------+----------------------------
1| .353265 -.544507
2| -.544507 4.76987

Implied standard deviations of random parameters

S.D_Beta| 1
--------+--------------
1| .594361
2| 2.18400

Implied correlation matrix of random parameters

Cor_Beta| 1 2
--------+----------------------------
1| 1.00000 -.419469
2| -.419469 1.00000

These are the estimates of the Poirier model.

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y12| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
X2| .50336*** .11606 4.34 .0000 .27588 .73084
X3| .38430*** .11126 3.45 .0006 .16622 .60237
Constant| -.64606*** .10368 -6.23 .0000 -.84927 -.44286
--------+--------------------------------------------------------------------

-----------------------------------------------------------------------------
Random Coefficients PrshlObs Model
Dependent variable Y12
Log likelihood function -70.16147
Sample is 10 pds and 20 individuals
Partial observability probit model
Simulation based on 25 Halton draws
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y12| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Nonrandom parameters
X1_1| .95923*** .21311 4.50 .0000 .54154 1.37692
X2_2| 1.02185*** .28212 3.62 .0003 .46890 1.57480
X3_2| .77643*** .23096 3.36 .0008 .32376 1.22910
|Means for random parameters
ONE_1| .41477 .32108 1.29 .1964 -.21454 1.04407
ONE_2| .08625 .31520 .27 .7844 -.53153 .70402
|Diagonal elements of Cholesky matrix
ONE_1| .42395 .28240 1.50 .1333 -.12955 .97744
ONE_2| .98957*** .29127 3.40 .0007 .41869 1.56044
|Below diagonal elements of Cholesky matrix
lONE_ONE| -.62399** .31020 -2.01 .0443 -1.23197 -.01601
|Unconditional cross equation correlation
lONE_ONE| -.99693*** .01079 -92.41 .0000 -1.01808 -.97579
--------+--------------------------------------------------------------------

Implied covariance matrix of random parameters

Var_Beta| 1 2
--------+----------------------------
1| .179731 -.264539
2| -.264539 1.36861
Implied standard deviations of random parameters
S.D_Beta| 1
--------+--------------
1| .423947
2| 1.16988
Implied correlation matrix of random parameters
Cor_Beta| 1 2
--------+----------------------------
1| 1.00000 -.533382
2| -.533382 1.00000
N12: Bivariate and Multivariate Probit and Partial Observability Models N-192

N12.8 Simulation and Partial Effects

This is the model estimated at the beginning of the previous section.

y1* = a1 + b11 x1 + u1 + e1
y2* = a2 + b22 x2 + b23 x3 + u2 + e2.

The random effects, u1 and u2, are time invariant the same value appears in each of the 10 periods
of the data. The model command is

BIVARIATE ; Lhs = y1,y2

; Rh1 = one,x1 ; Rh2 = one,x2,x3
; RPM ; Pds = 10 ; Pts = 25 ; Cor ; Halton
; Fcn = one(n), one[n] $

-----------------------------------------------------------------------------
Random Coefficients BivProbt Model
Bivariate Probit model
Simulation based on 25 Halton draws
--------+--------------------------------------------------------------------
Y1| Standard Prob. 95% Confidence
Y2| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|