
Job Choice Design Analysis

The document describes the experimental design and construction of choice sets for a study in Tanzania. It discusses how the full factorial combination of attributes and levels (1024 possible job profiles) was reduced using experimental design methods to generate 32 manageable choice sets for the study. The design was D-efficient and used 7 job attributes modeled as dummy variables (except salary, which was continuous). The choice sets balanced variations in attribute levels to allow inferences about all possible profiles while limiting the number of choices participants received.

2.3 Experimental design and construction of choice sets

2.3.1  Design
Having agreed on attributes and levels, the researcher defines the choice sets, which present
hypothetical jobs (or job profiles) formed by combining the attributes and levels. Often, the
combinations derived from the full set of attributes and levels (the full factorial) result in too
many choice sets to present to individuals. In this study, for example, the full factorial is
4^3 × 2^4 = 1,024 possible job profiles (3 attributes at 4 levels and 4 attributes at 2 levels).
This implies (1,024 × 1,023)/2 = 523,776 possible pairwise choice sets. As seen in section 1,
experimental design methods are commonly used to reduce the number of choice sets to a manageable
level, while still allowing the researcher to infer preferences for all profiles.
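
The size of the full factorial can be checked with a few lines of code. The sketch below is only an illustration in Python (the study itself used Stata and a purpose-written design program); the dictionary name `levels` is an assumption, while the level counts come from table 2.2.

```python
from math import comb, prod

# Number of levels per attribute, as listed in table 2.2.
levels = {
    "salary": 4, "edu": 4, "loc": 4,
    "equipdrugs": 2, "work": 2, "housing": 2, "infra": 2,
}

# Full factorial: every combination of one level per attribute.
n_profiles = prod(levels.values())

# Pairwise choice sets: unordered pairs of two distinct profiles.
n_pairs = comb(n_profiles, 2)

print(n_profiles, n_pairs)
```

Adding an attribute or a level multiplies `n_profiles`, which is why a fractional design quickly becomes necessary.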

In addition, the researcher must consider, at the design stage, the specification of the utility
function to be estimated, taking account of potential interaction terms and the choice between
labeled and generic experiments. Interaction terms were explored but not found to be significant,
so a generic, main-effects design is discussed here. Table 2.2 shows that 6 of the 7 attributes are
modeled as dummy variables, while salary is modeled as continuous in the regression analyses.
It also shows the regression coding labels for the variables.

Table 2.2  Attributes, regression coding, levels, and modeling

Attribute                          Regression label   Level                 Modeling

Salary and allowances              salary             T Sh 650,000/month    Continuous
                                                      T Sh 500,000/month
                                                      T Sh 350,000/month
                                                      T Sh 200,000/month

Education opportunities/           edu                edu_2                 Dummy variable
possibility of upgrading                              edu_4
qualifications                                        edu_6
                                                      edu_0

Location                           loc                loc_dsm               Dummy variable
                                                      loc_reg
                                                      loc_dis
                                                      loc_3hour

Availability of equipment          equipdrugs         equipdrugs_s          Dummy variable
and drugs                                             equipdrugs_i

Workload                           work               work_normal           Dummy variable
                                                      work_heavy

Housing                            housing            housing_yes           Dummy variable
                                                      housing_no

Infrastructure                     infra              infra_good            Dummy variable
                                                      infra_bad

Having defined the functional form of the utility function to be estimated, the researcher must
then employ experimental design methods to derive the choice sets. As shown in section 1, both
orthogonal and D-efficient designs have been employed to date. Here a D-efficient design was
developed, with no a priori assumptions made about the parameters. The design was generated by
a computer program written by one of the researchers involved. Not all researchers conducting
a DCE have the skills to write an experimental design program; catalogues, software,
and experts can help them generate such designs (see section 1 and the Uganda case study).

The experimental design applied in this study generated 32 choices (table 2.3).

Table 2.3  Experimental design for Tanzania study

choiceset alt salary edu_0 edu_6 edu_4 edu_2 loc_3hour loc_dis loc_reg loc_dsm housing_no housing_yes work_heavy work_normal equipdrugs_i equipdrugs_s infra_bad infra_good

1 1 3 0 0 0 1 1 0 0 0 1 0 0 1 0 1 1 0

1 2 3 0 1 0 0 0 0 0 1 0 1 1 0 1 0 0 1

2 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 0 1 0

2 2 2 1 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1

3 1 1 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 1

3 2 3 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 0

4 1 2 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0

4 2 1 1 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1

5 1 2 1 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0

5 2 1 0 0 1 0 0 0 1 0 0 1 0 1 0 1 0 1

6 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0

6 2 3 0 1 0 0 0 0 0 1 1 0 0 1 0 1 0 1

7 1 3 0 0 1 0 0 0 0 1 0 1 1 0 0 1 0 1

7 2 1 0 0 1 0 0 1 0 0 0 1 1 0 1 0 1 0

8 1 1 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1

8 2 1 0 0 0 1 0 0 1 0 1 0 1 0 1 0 1 0

9 1 3 0 0 0 1 0 0 1 0 0 1 0 1 1 0 1 0

9 2 0 1 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1

10 1 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1 1 0

10 2 3 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0 1

11 1 3 0 1 0 0 1 0 0 0 1 0 0 1 1 0 1 0

11 2 0 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 1

12 1 0 1 0 0 0 0 0 0 1 0 1 0 1 1 0 1 0

12 2 2 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 1

13 1 1 0 0 1 0 1 0 0 0 0 1 0 1 1 0 1 0

13 2 3 0 0 1 0 0 0 1 0 1 0 1 0 1 0 0 1

14 1 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 1 0

14 2 2 0 0 1 0 0 0 1 0 0 1 0 1 1 0 0 1

15 1 3 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 1

15 2 2 0 0 1 0 0 0 0 1 1 0 0 1 0 1 1 0

16 1 2 0 1 0 0 0 1 0 0 0 1 1 0 0 1 0 1

16 2 1 0 1 0 0 0 0 1 0 0 1 0 1 1 0 1 0

17 1 1 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1

17 2 2 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 0

18 1 2 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 1

18 2 0 0 1 0 0 0 0 0 1 0 1 0 1 0 1 1 0

19 1 1 1 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0

19 2 0 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 1

20 1 2 0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 0

20 2 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1

21 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1

21 2 3 0 0 1 0 0 1 0 0 0 1 1 0 0 1 1 0

22 1 1 0 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1

22 2 1 0 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0

23 1 3 1 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0

23 2 0 0 0 0 1 0 0 1 0 0 1 0 1 1 0 0 1

24 1 2 0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 1

24 2 1 1 0 0 0 1 0 0 0 0 1 0 1 0 1 1 0

25 1 3 0 0 1 0 1 0 0 0 1 0 1 0 0 1 1 0

25 2 2 0 0 0 1 0 1 0 0 1 0 1 0 1 0 0 1

26 1 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 1

26 2 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0 1 0

27 1 0 1 0 0 0 0 0 1 0 0 1 1 0 1 0 1 0

27 2 1 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1

28 1 2 0 1 0 0 1 0 0 0 0 1 0 1 0 1 0 1

28 2 3 0 1 0 0 0 1 0 0 1 0 1 0 1 0 1 0

29 1 1 0 1 0 0 0 1 0 0 1 0 0 1 0 1 1 0

29 2 3 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1

30 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 1

30 2 2 1 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0

31 1 1 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 1

31 2 2 0 1 0 0 0 0 1 0 1 0 0 1 0 1 1 0

32 1 2 0 0 0 1 0 1 0 0 0 1 0 1 0 1 1 0

32 2 2 0 0 0 1 1 0 0 0 1 0 0 1 1 0 0 1
The variables are:
choiceset: indicates the choice set from the DCE questionnaire. There were 32 choice sets.
alt: indicates the alternative within each choice set. Given each choice set had 2 alternatives,
alt takes on the value of 1 or 2.
salary: is the continuous variable salary, which takes the value in the design coding of 0 to 3 where
0 = T Sh 200,000/month, 1 = T Sh 350,000/month, 2 = T Sh 500,000/month and 3 = T Sh 650,000/
month.
edu_0, edu_6, edu_4, edu_2: 4 dummy variables for the education opportunities attribute. Any
given alternative will always take a value of 1 for 1 of the dummies and 0 for all others (since only
1 level of the education attribute will be provided).
loc_3hour, loc_dis, loc_reg, loc_dsm: dummy variable levels for the location attribute
housing_yes, housing_no: dummy variable levels for the housing attribute
work_heavy, work_normal: dummy variable levels for the workload attribute
equipdrugs_i, equipdrugs_s: dummy variable levels for availability of equipment and drugs
infra_bad, infra_good: dummy variable levels for infrastructure attribute

In table 2.3, choice set 1 has two alternatives. Alternative 1 (Job A) is defined as: salary 3 (T Sh
650,000/month); edu_2; loc_3hour; housing_no; work_normal; equipdrugs_s; and infra_bad. Alternative
2 (Job B) has salary 3 (T Sh 650,000/month); edu_6; loc_dsm; housing_yes; work_heavy; equipdrugs_i;
and infra_good. The first choice set in the design matrix, as presented to respondents, is shown in
figure 2.1.
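
To make the coding concrete, the following Python sketch decodes the two rows of choice set 1 back into level labels. It is an illustration only: the helper `decode_row` and the constant names are hypothetical, while the column order and salary coding are taken from table 2.3 and the variable list above.

```python
# Order of the dummy columns in table 2.3 (after choiceset, alt, salary).
DUMMY_COLUMNS = [
    "edu_0", "edu_6", "edu_4", "edu_2",
    "loc_3hour", "loc_dis", "loc_reg", "loc_dsm",
    "housing_no", "housing_yes",
    "work_heavy", "work_normal",
    "equipdrugs_i", "equipdrugs_s",
    "infra_bad", "infra_good",
]

# Design coding of salary (0 to 3) mapped to T Sh per month.
SALARY_TSH = {0: 200_000, 1: 350_000, 2: 500_000, 3: 650_000}


def decode_row(row):
    """Turn one row of table 2.3 into a readable job profile (hypothetical helper)."""
    choiceset, alt, salary_code, *dummies = row
    # Keep the name of every dummy that is switched on in this row.
    active = [name for name, flag in zip(DUMMY_COLUMNS, dummies) if flag == 1]
    return {
        "choiceset": choiceset,
        "alt": alt,
        "salary_tsh": SALARY_TSH[salary_code],
        "levels": active,
    }


# The two rows of choice set 1, copied from table 2.3.
job_a = decode_row([1, 1, 3, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0])
job_b = decode_row([1, 2, 3, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1])

print(job_a)
print(job_b)
```

Each decoded profile should contain exactly one level per dummy-coded attribute, which is a quick sanity check on a design matrix.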

Figure 2.1  Example of a choice set

Job A

Attribute                                    Level
Availability of equipment and drugs          Sufficient
Housing                                      No house is provided
Education opportunities/possibility of       Education offered after 2 years of service
upgrading qualifications
Workload                                     Normal: Nearly enough time to complete duties.
                                             One hour of extra work per day
Infrastructure                               The place has unreliable mobile coverage, no
                                             electricity or water
Salary and allowances                        T Sh 650,000/month
Location                                     A 3-hour or more bus ride from the district
                                             headquarters

Job B

Attribute                                    Level
Availability of equipment and drugs          Insufficient
Housing                                      A decent house is provided
Education opportunities/possibility of       Education offered after 6 years of service
upgrading qualifications
Workload                                     Heavy: Barely enough time to complete duties.
                                             Three hours of extra work per day
Infrastructure                               The place has mobile coverage, electricity
                                             and water
Salary and allowances                        T Sh 650,000/month
Location                                     Dar es Salaam

Considering your current situation, which of the two jobs would you choose?

Job A: [ ]    Job B: [ ]

2.3.2  Checking properties
Section 1.2.2 outlined the characteristics of a good design. While a D-efficient design was devel-
oped, the properties of a good (but not perfect) design were still expected to be present. It is useful
to check the design’s properties. Below, the orthogonality, level balance, and minimum overlap
for the design in table 2.3 are considered. Given there was no a priori information available on
parameter estimates, utility balance was not considered.

Orthogonality
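This criterion requires that the levels of each attribute vary independently of each other. The
correlations for this design, computed with the pwcorr command in Stata, which reports Pearson
product-moment (PPM) pairwise correlation coefficients and their significance, are shown in
table 2.4. Correlations between levels of different attributes are all sufficiently low not to
cause concern; the large negative correlations between levels of the same attribute arise
mechanically from the dummy coding, since exactly one level of an attribute is present in each
profile.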

Table 2.4  Correlation matrix

salary edu_0 edu_6 edu_4 edu_2 loc_3hour loc_dis loc_reg loc_dsm

salary 1

edu_0 -.155 1

edu_6 .025 -.319*** 1

edu_4 .008 -.333*** -.319*** 1

edu_2 .121 -.347*** -.333*** -.347*** 1

loc_3hours .106 0.000 -.064 0.000 .061 1

loc_dis .106 0.000 .021 0.000 -.020 -.333*** 1

loc_reg -.232* -.020** .001 .061 -.041 -.347*** -.347*** 1

loc_dsm .025 .021 .042 -.064 .001 -.319*** -.319*** -.333*** 1

housing_no -.001 .018 -.020 -.054 .054 .090 .090 -.158 -.020

housing_yes .001 -.018 .020 .054 -.054 -.090 -.090 .158 .020

work_hard .072 -.109 -.072 .036 -.073 -.109 .036 .069 .002

work_normal -.072 .109 .072 -.036 .073 .109 -.036 -.069 -.002

equipdrug_n -.127 .036 -.072 .036 -.002 .036 -.036 .140 -.146

equipdrug_s .127 -.036 .072 -.036 .002 -.036 .036 -.140 .146

infra_bad .014 -.072 .037 -.072 -.035 .072 0.000 -.035 -.037

infra_good -.014 .072 -.037 .072 .035 -.072 0.000 .035 .037

*, **, and *** indicate significance at 10%, 5% and 1% level respectively.

Table 2.4  Correlation matrix (continued)

housing_no housing_yes work_hard work_normal equipdrug_n equipdrug_s infra_bad infra_good

housing_no 1

housing_yes -1*** 1

work_hard -.029 .029 1

work_normal .029 -.029 -1 1

equipdrug_n -.029 .029 -.004 .004 1

equipdrug_s .029 -.029 .004 -.004 -1 1

infra_bad -.031 .031 -.125 .125 0.000 0.000 1

infra_good .031 -.031 .125 -.125 0.000 0.000 -1*** 1

*, **, and *** indicate significance at 10%, 5% and 1% level respectively.
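
The orthogonality check above rests on pairwise Pearson correlations between design columns. As a rough illustration of the calculation (not a reproduction of table 2.4), the Python sketch below applies a hand-written `pearson` function to three columns from the first four choice sets of table 2.3 only. Because it uses 8 of the 64 rows, the numbers will differ from those in the table; the function and variable names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Columns for choice sets 1 to 4 (8 rows), copied from table 2.3.
salary      = [3, 3, 0, 2, 1, 3, 2, 1]
housing_no  = [1, 0, 1, 1, 1, 0, 1, 1]
housing_yes = [0, 1, 0, 0, 0, 1, 0, 0]

# Two dummies for the same 2-level attribute are perfectly negatively correlated.
print(pearson(housing_no, housing_yes))

# Correlation between two different attributes is what orthogonality is about.
print(pearson(salary, housing_no))
```

The first value illustrates why the entries of -1 in table 2.4 are not a concern: they are a by-product of listing both dummies of a 2-level attribute.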

Level balance
Level balance requires all levels of each attribute to appear with equal frequency across profiles.
Thus for a 2-level attribute, each level should appear in 50% of the profiles, and for a 4-level
attribute each level should appear in 25% of the profiles. In this case each level of the salary
attribute should appear in 25% of the job profiles; the same applies to each level of the education
and location attributes. For the remaining attributes, each level should appear in 50% of
the job profiles. The level balance of the design applied in this study is shown in table 2.5:
the design has a relatively good, if not perfect, level balance.

Table 2.5  Level balance

Level   Number of appearances   %

650,000 T Sh per month 16 25.0

500,000 T Sh per month 16 25.0

350,000 T Sh per month 17 26.6

200,000 T Sh per month 15 23.4

edu_0 16 25.0

edu_6 15 23.4

edu_4 16 25.0

edu_2 17 26.6

loc_3hour 16 25.0

loc_dis 16 25.0

loc_reg 17 26.6

loc_dsm 15 23.4

housing_no 31 48.4

housing_yes 33 51.6

work_normal 34 53.1

work_heavy 30 46.9


equipdrugs_i 34 53.1

equipdrugs_s 30 46.9

infra_good 32 50.0

infra_bad 32 50.0
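
The percentages in table 2.5 follow directly from the appearance counts, since the design contains 64 profiles (32 choice sets with 2 alternatives each). The Python sketch below recomputes them and also reports the deviation from the ideal share; the counts are copied from table 2.5, while the function name `level_balance` and the data layout are assumptions made for illustration.

```python
# Appearance counts copied from table 2.5.
counts = {
    "salary": {"650,000": 16, "500,000": 16, "350,000": 17, "200,000": 15},
    "edu": {"edu_0": 16, "edu_6": 15, "edu_4": 16, "edu_2": 17},
    "loc": {"loc_3hour": 16, "loc_dis": 16, "loc_reg": 17, "loc_dsm": 15},
    "housing": {"housing_no": 31, "housing_yes": 33},
    "work": {"work_normal": 34, "work_heavy": 30},
    "equipdrugs": {"equipdrugs_i": 34, "equipdrugs_s": 30},
    "infra": {"infra_good": 32, "infra_bad": 32},
}


def level_balance(attribute_counts):
    """Return {level: (share in %, deviation from ideal share in points)}."""
    total = sum(attribute_counts.values())
    ideal = 100 / len(attribute_counts)  # 25% for 4 levels, 50% for 2 levels
    result = {}
    for level, n in attribute_counts.items():
        share = 100 * n / total
        result[level] = (round(share, 1), round(share - ideal, 1))
    return result


for attribute, attribute_counts in counts.items():
    print(attribute, level_balance(attribute_counts))
```

A perfectly balanced design would show a deviation of zero for every level.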

Minimum overlap
This criterion requires that repetition of an attribute level within a choice set be minimized. This
ensures that the experiment provides maximum information on respondents’ trade-offs. If an attribute
takes the same level in both alternatives of a choice set, that choice reveals no information about
preferences for the attribute. As can be seen from the design in table 2.3, there are a few overlaps
of attributes within the choice sets in this study. For example, in the first choice set, presented
in figure 2.1, the salary is the same in job A and job B.
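
Overlap can be detected by comparing the two alternatives of a choice set attribute by attribute. The Python sketch below does this for choice set 1; the levels are those given for Job A and Job B in table 2.3, and the function name `overlapping_attributes` is a hypothetical helper introduced for illustration.

```python
# Levels of the two alternatives in choice set 1 (from table 2.3).
job_a = {
    "salary": 3, "edu": "edu_2", "loc": "loc_3hour", "housing": "housing_no",
    "work": "work_normal", "equipdrugs": "equipdrugs_s", "infra": "infra_bad",
}
job_b = {
    "salary": 3, "edu": "edu_6", "loc": "loc_dsm", "housing": "housing_yes",
    "work": "work_heavy", "equipdrugs": "equipdrugs_i", "infra": "infra_good",
}


def overlapping_attributes(alt_1, alt_2):
    """List the attributes that take the same level in both alternatives."""
    return [attr for attr in alt_1 if alt_1[attr] == alt_2[attr]]


print(overlapping_attributes(job_a, job_b))
```

Running the same check over all 32 choice sets would give the total number of overlaps in the design.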

2.3.3 Supplemental questions
Supplemental questions were asked about demographics, including sex and age as well as previous
experience from rural areas; educational experience, including prior training programs; work expe-
rience, including previous experience of working in rural areas; and motivational issues, including
reasons why respondents became health workers, what they expected of the future, etc. The
questionnaire became relatively long—49 questions prior to the DCE exercise. The questionnaire
(including the DCE) took on average 1.5 hours to complete; students were therefore given a
modest snack and a soda in the middle of the session to keep their energy up. Information col-
lected about sex, rural background, and willingness to help others was included in the subgroup
analysis presented in Kolstad (2011). It is often interesting to collect these kinds of supplemental
data because, as in Kolstad (2011), different types of health workers have different preferences.

2.4 Development of the questionnaire, pretesting, and data collection

2.4.1 Warm-up question


One warm-up exercise was included, where respondents were introduced to the choice situation.
The questionnaire also included a one-page introduction to the task. The researcher explained this
example on the blackboard, stressed the importance of considering all attributes, and answered
questions before the students completed the choices.

2.4.2 Pilot questionnaire


The DCE was tested in a pilot with 30 students at the Kilosa Clinical Officer Training Centre (this
training center did not participate in the main data collection). The respondents completed a
relatively long questionnaire before the DCE exercise and were then interviewed about various
issues concerning their participation. The focus was on three parts of the DCE:
• Formulation of attributes and levels: were the attributes and levels clear and did they have the
right range? Were any important attributes lacking for the choices to be meaningful, or were
any included attributes perceived as not relevant when making choices?
• Was the task understood? Were instructions good enough? Were all attributes traded off for
each other?
• How did the students experience the exercise: were there too many choices to make? Was it
fun, boring, etc?

As a result of the pilot, the formulation of some of the attribute levels was changed to make them
clearer to the respondents and to get the salary levels right in particular. Moreover, some of the
questions respondents were asked before they participated in the DCE were reformulated.

The pilot also indicated that respondents found 32 choices a large number to complete. The 32
were therefore divided into two blocks, with each respondent facing 16 choices (half the respond-
ents were presented with the first 16 choice sets, the other half with the next 16 choice sets).
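
The blocking described above can be sketched as follows. The split into choice sets 1 to 16 and 17 to 32 is from the text; the rule that assigns respondents to blocks is not stated, so the alternating rule in `block_for` is purely an assumption made for illustration.

```python
# The 32 choice sets of the design, split into two blocks of 16.
CHOICE_SETS = list(range(1, 33))
BLOCKS = [CHOICE_SETS[:16], CHOICE_SETS[16:]]


def block_for(personid):
    """Alternate blocks by respondent number (an assumed assignment rule)."""
    return BLOCKS[(personid - 1) % 2]


for personid in (1, 2, 3):
    block = block_for(personid)
    print(personid, block[0], block[-1])
```

Any rule that sends about half of the respondents to each block would serve the same purpose.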

Choices in the pilot questionnaire included an opt-out option. This option was frequently chosen
by respondents. When asked their reasons for choosing the opt-out in the interview, many
respondents commented that they were comparing the option to their “dream job”. Describing
the opt-out was therefore difficult (since respondents had no current job, and it was difficult to
identify a typical clinical officer job), and since all but one of the participants in the pilot said that
they wanted a career in the health system in Tanzania after their studies, it was decided a forced
choice would be the most relevant type of choice for this group. That is, respondents were forced
to choose between Job A and Job B (figure 1.1).

2.4.3  Main data collection


The data were collected during the autumn of 2007 with the help of four local research assistants
traveling in teams of two. As whole classes were visited at a time, it was practical to have two
assistants at each location, but in other contexts (depending on the number of respondents, and
safety issues, etc.), it may well be sufficient to have one researcher or research assistant. The
principal investigator traveled between the teams. Some 320 clinical officer finalists (around 60%
of all clinical officer finalists in Tanzania) from 10 randomly selected schools all over the country
took part in the DCE. All finalists in these schools were invited to participate.

The data were mostly collected during school time, on the school premises. This largely explains
the response rate of around 96%, which is high for a DCE. The finalists were divided into groups
of around 15 and seated in their classrooms or another suitable room. They were given a plenary
introduction to the study, signed consent forms, and were guided through a couple of examples of
choice sets before completing a paper version of the DCE by themselves (often at their desks).

Participation was voluntary, and students were not compensated in any way, but were offered a
soda and a small snack because of the long completion time. In addition to the DCE choices, the
respondents answered a series of questions which covered, among other things, their background,
motivation, beliefs, and attitudes. The questionnaire took on average 1.5 hours to complete,
confirming that it was wise to let each respondent make only 16 choices instead of the total of
32 choices generated by the design. Data collectors spent between one and two days at each
training center.

If the intention is to conduct a DCE on more experienced health workers already working in
different areas of the country, the logistical challenges in reaching the respondents will probably
be greater, because more travel will be required, and so the DCE is likely to be more costly.
However, if conducted correctly, such a DCE will provide important and valuable information on the
preferences of the existing stock of health workers, and the gains may more than outweigh the costs.

Even though relatively thorough testing had been carried out in the pilot, 20 students were ran-
domly held back, 2 at each location, after the completion of the DCE. These students were asked
to explain how they made their choices. This was done to check that the task was understood,
that real trade-offs were being made, and to get a better impression of the reasoning behind the
choices made. The interviews were very reassuring in the sense that all students reported making
trade-offs between the attribute levels and were able to reconstruct and demonstrate the trade-
offs they had made.

2.5 Data input

This section is included because the data matrix generated from DCEs is quite different from that
generated for most questionnaires. One feature common to all DCE datasets is that respondents
answer more than one discrete choice question, resulting in multiple observations for each individual.
Furthermore, choice sets presented to individuals contain two or more alternatives, giving multiple
observations for each choice set.

The number of observations in a dataset depends on the number of respondents, the number
of choice sets per respondent and the number of alternatives in each choice set. For instance,
in the study covered here each choice set has two alternatives (Job A and Job B), so each choice
set contributes two observations to the dataset. Moreover, each respondent is presented with 16
choices. As each choice contributes two observations and each respondent faces 16 choices, there
are 32 observations per respondent (16 choices x 2 observations per choice). A sample of the final
data matrix (an extract from the full dataset) for the case study is in table 2.6. Most variable names
in this table refer to those in table 2.2.
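
The row count of a stacked DCE dataset is a simple product, as the short Python sketch below shows. The function name `n_observations` is an assumption; the inputs (16 choices, 2 alternatives, and the final sample of 296 respondents reported in section 2.6.1) come from the text.

```python
def n_observations(respondents, choices_per_respondent, alternatives_per_choice):
    """Rows in a stacked DCE dataset: one row per alternative per choice per respondent."""
    return respondents * choices_per_respondent * alternatives_per_choice


rows_per_respondent = n_observations(1, 16, 2)
rows_in_final_sample = n_observations(296, 16, 2)

print(rows_per_respondent, rows_in_final_sample)
```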

As with any dataset it is useful to start by ordering the variables in some logical way. One sugges-
tion—followed here—is to present all the variables in a sequence that first describes how the data
are organized (such as respondent identifier, choice set identifier), then present the independent
variables from the experimental design (attribute levels) followed by the dependent variable (what
option respondents chose). Datasets also include other variables relating to the individual, such as
socioeconomic characteristics.

The variables are:


personid: The first variable is an identification variable unique to each respondent. It will be the
same for the first 32 rows, then for the next 32 rows etc.
obsid: Stata requires a variable indicating each unique choice made. This increases successively for
each choice.
alt: the alternative within each choice, where alt=1 represents the first and alt=2 the second
alternative in each choice set (Job A and Job B respectively).
cno: represents the choice number in the DCE questionnaire; as each respondent made 16 choices
cno will range from 1 to 16.
choiceset: since the design consists of 32 different choice sets, there is a variable indicating which
of the 32 choice sets is being observed.

(Two identical choice sets presented to different respondents would thus have the same choiceset
value but different obsid values. For individual 1 the obsid values will range from 1 to 16, for
individual 2 they will range from 17 to 32, and for individual 3 they will range from 33 to 48.
Thus, for choice 3 presented to respondent 1 obsid=3, for choice 3 presented to respondent 2
obsid=19, for choice 3 presented to respondent 3 obsid=35, etc.)
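
The identifier columns can be generated mechanically. The Python sketch below assumes respondents are numbered consecutively from 1 and that each completes all 16 choices; the function names `obsid` and `identifier_rows`, and the `block_start` argument (the first choice set of the respondent's block), are hypothetical.

```python
CHOICES_PER_RESPONDENT = 16
ALTERNATIVES = (1, 2)  # Job A and Job B


def obsid(personid, cno):
    """Unique id of a choice, assuming consecutive respondent numbers starting at 1."""
    return (personid - 1) * CHOICES_PER_RESPONDENT + cno


def identifier_rows(personid, block_start):
    """Build (personid, obsid, alt, cno, choiceset) rows for one respondent."""
    rows = []
    for cno in range(1, CHOICES_PER_RESPONDENT + 1):
        choiceset = block_start + cno - 1
        for alt in ALTERNATIVES:
            rows.append((personid, obsid(personid, cno), alt, cno, choiceset))
    return rows


# Respondent 2 answered the second block (choice sets 17 to 32) in table 2.6.
rows = identifier_rows(personid=2, block_start=17)
print(rows[0], rows[-1], len(rows))
print(obsid(1, 3), obsid(2, 3), obsid(3, 3))
```

The first and last tuples can be compared with the first and last rows for respondent 2 in table 2.6.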

The attributes in this study are a mixture of continuous and categorical dummy variables.
salary: is the salary attribute taking the values in the dataset that correspond to the levels presented
in the questionnaire. Salary is treated as a continuous variable in the regression analysis, and it has
thus been given one column only (unlike the categorical dummy attributes).

All categorical attributes were entered as dummy-coded variables. Here the effect of a level of an
attribute is estimated relative to a base comparator or reference point.
edu_2, edu_4, edu_6 and edu_0: there are 4 levels for the education attribute (education offered
after 2, 4, or 6 years, and no education offered). Dummy variables take the value of 1 if the level
is present in the alternative and 0 otherwise. For instance, the first alternative in table 2.6 offers
education after 2 years of service. Thus, edu_2=1, while edu_4=0, edu_6=0 and edu_0=0.

Table 2.6  Final data matrix

personid obsid alt cno choiceset salary edu_0 edu_6 edu_4 edu_2 loc_3hour loc_dis loc_reg loc_dsm housing_no housing_yes work_heavy work_normal equipdrugs_i equipdrugs_s infra_bad infra_good const choice sex rural background

1 1 1 1 1 650,000 0 0 0 1 1 0 0 0 1 0 0 1 0 1 1 0 0 0 1 0

1 1 2 1 1 650,000 0 1 0 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 1 0

1 2 1 2 2 200,000 0 0 0 1 0 0 0 1 1 0 0 1 1 0 1 0 0 1 1 0

1 2 2 2 2 500,000 1 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0

1 3 1 3 3 350,000 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 1 0 1 1 0

1 3 2 3 3 650,000 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 1 0

1 4 1 4 4 500,000 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0 0 1 1 0

1 4 2 4 4 350,000 1 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 1 0 1 0

1 5 1 5 5 500,000 1 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 1 1 0

1 5 2 5 5 350,000 0 0 1 0 0 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0

1 6 1 6 6 200,000 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 0 0 1 0

1 6 2 6 6 650,000 0 1 0 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 0

1 7 1 7 7 650,000 0 0 1 0 0 0 0 1 0 1 1 0 0 1 0 1 0 1 1 0

1 7 2 7 7 350,000 0 0 1 0 0 1 0 0 0 1 1 0 1 0 1 0 1 0 1 0

1 8 1 8 8 350,000 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 0 1 0

1 8 2 8 8 350,000 0 0 0 1 0 0 1 0 1 0 1 0 1 0 1 0 1 1 1 0

1 9 1 9 9 650,000 0 0 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 1 0

1 9 2 9 9 200,000 1 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 1 0 1 0

1 10 1 10 10 200,000 0 1 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0

1 10 2 10 10 650,000 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0 1 1 0 1 0

1 11 1 11 11 650,000 0 1 0 0 1 0 0 0 1 0 0 1 1 0 1 0 0 1 1 0

1 11 2 11 11 200,000 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 1 1 0 1 0

1 12 1 12 12 200,000 1 0 0 0 0 0 0 1 0 1 0 1 1 0 1 0 0 0 1 0

1 12 2 12 12 500,000 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 1 1 1 1 0

1 13 1 13 13 350,000 0 0 1 0 1 0 0 0 0 1 0 1 1 0 1 0 0 0 1 0

1 13 2 13 13 650,000 0 0 1 0 0 0 1 0 1 0 1 0 1 0 0 1 1 1 1 0

1 14 1 14 14 200,000 0 1 0 0 1 0 0 0 1 0 1 0 1 0 1 0 0 0 1 0

1 14 2 14 14 500,000 0 0 1 0 0 0 1 0 0 1 0 1 1 0 0 1 1 1 1 0

1 15 1 15 15 650,000 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 0

1 15 2 15 15 500,000 0 0 1 0 0 0 0 1 1 0 0 1 0 1 1 0 1 0 1 0

1 16 1 16 16 500,000 0 1 0 0 0 1 0 0 0 1 1 0 0 1 0 1 0 0 1 0

1 16 2 16 16 350,000 0 1 0 0 0 0 1 0 0 1 0 1 1 0 1 0 1 1 1 0

2 17 1 1 17 350,000 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 0 1 1 1

2 17 2 1 17 500,000 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 0 1 0 1 1

2 18 1 2 18 500,000 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 1 0 1 1 1

2 18 2 2 18 200,000 0 1 0 0 0 0 0 1 0 1 0 1 0 1 1 0 1 0 1 1

2 19 1 3 19 350,000 1 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 0 0 1 1

2 19 2 3 19 200,000 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 1 1 1 1 1

2 20 1 4 20 500,000 0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1

2 20 2 4 20 200,000 1 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 1 0 1 1

2 21 1 5 21 200,000 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 0 0 1 1

2 21 2 5 21 650,000 0 0 1 0 0 1 0 0 0 1 1 0 0 1 1 0 1 1 1 1

2 22 1 6 22 350,000 0 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 1

2 22 2 6 22 350,000 0 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 1 1 1 1

2 23 1 7 23 650,000 1 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 0 1 1

2 23 2 7 23 200,000 0 0 0 1 0 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1

2 24 1 8 24 500,000 0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0 1 1 1

2 24 2 8 24 350,000 1 0 0 0 1 0 0 0 0 1 0 1 0 1 1 0 1 0 1 1

2 25 1 9 25 650,000 0 0 1 0 1 0 0 0 1 0 1 0 0 1 1 0 0 1 1 1

2 25 2 9 25 500,000 0 0 0 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 1 1

2 26 1 10 26 200,000 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 1

2 26 2 10 26 200,000 0 0 1 0 0 0 1 0 1 0 0 1 1 0 1 0 1 1 1 1

2 27 1 11 27 200,000 1 0 0 0 0 0 1 0 0 1 1 0 1 0 1 0 0 0 1 1

2 27 2 11 27 350,000 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1

2 28 1 12 28 500,000 0 1 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1

2 28 2 12 28 650,000 0 1 0 0 0 1 0 0 1 0 1 0 1 0 1 0 1 0 1 1

2 29 1 13 29 350,000 0 1 0 0 0 1 0 0 1 0 0 1 0 1 1 0 0 1 1 1

2 29 2 13 29 650,000 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 1 0 1 1

2 30 1 14 30 200,000 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 1 0 0 1 1

2 30 2 14 30 500,000 1 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1

2 31 1 15 31 350,000 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 1 0 0 1 1

2 31 2 15 31 500,000 0 1 0 0 0 0 1 0 1 0 0 1 0 1 1 0 1 1 1 1

2 32 1 16 32 500,000 0 0 0 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1

2 32 2 16 32 500,000 0 0 0 1 1 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1

3 33 1 1 1 650,000 0 0 0 1 1 0 0 0 1 0 0 1 0 1 1 0 0 0 0 1

3 33 2 1 1 650,000 0 1 0 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1

3 34 1 2 2 200,000 0 0 0 1 0 0 0 1 1 0 0 1 1 0 1 0 0 1 0 1

3 34 2 2 2 500,000 1 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 0 1

… … … … … … … … … … … … … … … … … … … … … … … … … …
For the same alternative the location is remote, that is, loc_3hour=1 while loc_dis=0, loc_reg=0 and
loc_dsm=0. Similarly, no housing is offered, so housing_no=1 while housing_yes=0, etc.
const: when using conditional logit the researcher has to include a constant term in the data matrix,
indicating whether a row of data represents Job A or Job B in a choice set. It is designed as a dummy
taking the values 0 and 1. The constant is often included in the model as a test for specification error
(Scott 2001). Further, when dummy variables are included it soaks up the preference for the base
comparator (Bech and Gyrd-Hansen 2005).
choice: is the dependent variable, indicating the respondent's choice of job (Job A or Job B). This
is represented as a dichotomous variable taking the value of 1 for the chosen alternative and 0 for
the one not chosen. From table 2.6 it can be seen that the first respondent chose Job B in the first
choice set (obsid=1, illustrated above in figure 2.1), and Job A in the next choice set (obsid=2).

Alongside DCE responses, information was collected about respondents’ socioeconomic character-
istics, such as sex and rural background. Given that each respondent has more than 1 row in the
dataset, this information is copied on to each row related to an individual in the same manner as
the id variable. From the example presented in table 2.6, respondent number 1 is a female (male
coded 0 and female coded 1) with a nonrural background (nonrural background coded 0 and rural
background coded 1), while respondent 2 is a female with a rural background and respondent 3 is
a male with a rural background.

As socioeconomic characteristics do not vary within a choice, these cannot be added into the regres-
sion model directly. Including interaction terms between respondent characteristics and attributes
allows slope coefficients to differ across subgroups. Such variables could be created by simply mul-
tiplying the variables of interest. For example, if the researcher is interested in whether preferences
for salary vary according to the sex of the respondent, he or she can create a variable, “salary-sex”,
which is simply “salary*sex”. This can then be entered into the regression model.
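
Creating such an interaction term amounts to a row-wise multiplication. The Python sketch below illustrates this on a few rows with values taken from table 2.6; the column name `salary_sex` and the list-of-dictionaries layout are assumptions made for illustration (in Stata the same variable would be created with a single generate statement).

```python
# A few rows with values from table 2.6 (sex: male coded 0, female coded 1).
rows = [
    {"personid": 1, "salary": 650_000, "sex": 1},
    {"personid": 1, "salary": 200_000, "sex": 1},
    {"personid": 3, "salary": 650_000, "sex": 0},
]

# The interaction equals salary for women and 0 for men, so its coefficient
# measures how the salary effect differs between the two groups.
for row in rows:
    row["salary_sex"] = row["salary"] * row["sex"]

for row in rows:
    print(row)
```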

Most of the above can be set up before the data are collected. Often before administering a ques-
tionnaire to the sample, data are simulated for the response variable. The model the researcher
intends to use for estimation is then applied to this simulated data as a check that the data are
correctly coded and that the design allows the estimation of parameters of interest.

In this study the data collected on paper were entered twice (by two persons) in a software package
called EpiData (http://www.epidata.dk/). This package is free to download and is well suited for
this kind of data input, as it allows the person in charge to write small programs in advance, which
substantially reduces incorrect data input. However, it is unimportant which software is used for data
entry as long as the data are entered correctly, since almost any format can be converted into data
files for most software packages.
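Whatever package is used, the benefit of double entry comes from comparing the two passes. The sketch below shows one way this comparison could be done in Python; it is not the EpiData procedure, and the field name "option" and the sample rows are hypothetical.

```python
def compare_entries(first_pass, second_pass, key_fields=("id", "obsid", "option")):
    """List the fields on which two independent data-entry passes disagree.

    Each pass is a list of dicts; rows are matched on key_fields. Only rows
    present in first_pass are checked. Returns a list of
    (key, field, first_value, second_value) tuples.
    """
    index = {tuple(row[k] for k in key_fields): row for row in second_pass}
    discrepancies = []
    for row in first_pass:
        key = tuple(row[k] for k in key_fields)
        other = index.get(key)
        if other is None:
            discrepancies.append((key, "<missing row>", None, None))
            continue
        for field, value in row.items():
            if other.get(field) != value:
                discrepancies.append((key, field, value, other.get(field)))
    return discrepancies


# Hypothetical example: the second person keyed one choice differently.
first_pass = [
    {"id": 1, "obsid": 1, "option": "A", "choice": 0},
    {"id": 1, "obsid": 1, "option": "B", "choice": 1},
    {"id": 1, "obsid": 2, "option": "A", "choice": 1},
]
second_pass = [
    {"id": 1, "obsid": 1, "option": "A", "choice": 0},
    {"id": 1, "obsid": 1, "option": "B", "choice": 1},
    {"id": 1, "obsid": 2, "option": "A", "choice": 0},
]

discrepancies = compare_entries(first_pass, second_pass)
print(discrepancies)
```

Each reported discrepancy would then be resolved by going back to the paper questionnaire.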

2.6 Model estimation and interpretation

2.6.1 Set-up of the basic regression model


Researchers should be aware of the requirements of the statistical software packages they are
using to analyze the data. This section presents useful tips to prepare data for analysis in a com-
monly used software package, Stata.

The final sample used in the analysis comprised 296 respondents, each providing responses to 16
completed choices and resulting in 9472 observations (296 individuals x 16 choices x 2 options for
each choice). Following on from section 1, the probability a respondent will select a specified job
is modeled. The probability of choosing a given job is determined by the indirect utility. Here it is
assumed that this is linear and additive and of the form:
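$$
\begin{aligned}
V ={} & \beta_1\,\text{salary} + \beta_2\,\text{edu\_6} + \beta_3\,\text{edu\_4} + \beta_4\,\text{edu\_2} + \beta_5\,\text{loc\_dis} + \beta_6\,\text{loc\_reg} \\
& + \beta_7\,\text{loc\_dsm} + \beta_8\,\text{housing\_yes} + \beta_9\,\text{work\_normal} + \beta_{10}\,\text{equipdrugs\_s} \\
& + \beta_{11}\,\text{infra\_good} + \beta_{12}\,\text{const} + \varepsilon
\end{aligned}
$$

where V is the utility derived from a given job, ε refers to the error term as described in section 1,
and all other variables are defined above.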

Given the binary choices presented to individuals, either the binary logit model or the conditional
logit model could be used to analyze the data. In Stata researchers can run a logit regression by using
the logit command. However, when the data are presented as in table 2.6, a conditional logit should
be used, since the data are stacked, with each option within a choice on a different row. This yields
exactly the same results as a binary logit estimated on data in which the two options appear on one
row and the attributes are differenced; data in that layout can be analyzed with binary logit (logit)
or random effects binary logit (xtlogit, to allow for multiple observations per respondent).
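The equivalence between the two data layouts can be checked numerically. The sketch below, in Python for illustration, uses two rounded coefficients from table 2.7 and two hypothetical job profiles; it compares the conditional logit probability computed from stacked rows with the binary logit probability computed from differenced attributes.

```python
import math

# Two rounded coefficients from table 2.7; the job profiles are hypothetical.
beta = {"salary": 0.003, "housing_yes": 0.216}


def utility(job):
    """Deterministic utility of one job profile."""
    return sum(beta[k] * job[k] for k in beta)


def p_conditional_logit(job_a, job_b):
    """Stacked layout: one row per option, probability from both utilities."""
    e_a, e_b = math.exp(utility(job_a)), math.exp(utility(job_b))
    return e_a / (e_a + e_b)


def p_binary_logit_differenced(job_a, job_b):
    """One row per choice: attributes differenced (A minus B), then a binary logit."""
    v_diff = sum(beta[k] * (job_a[k] - job_b[k]) for k in beta)
    return 1.0 / (1.0 + math.exp(-v_diff))


job_a = {"salary": 200, "housing_yes": 1}
job_b = {"salary": 350, "housing_yes": 0}

p_stacked = p_conditional_logit(job_a, job_b)
p_differenced = p_binary_logit_differenced(job_a, job_b)
print(p_stacked, p_differenced)
```

Both functions return the same probability of choosing Job A, because dividing numerator and denominator of the conditional logit formula by the exponentiated utility of Job A leaves only the utility difference.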

Given the way the data were set up, the clogit command was used. The exact syntax in Stata is:
clogit choice salary edu_6 edu_4 edu_2 loc_dis loc_reg loc_dsm housing_yes work_normal ///
    equipdrugs_s infra_good const, group(obsid)

where all variables are defined above. The group(obsid) option indicates which rows of data come
from the same choice set.

The regression results and the corresponding WTP measures are in table 2.7.

Table 2.7  Regression results and WTP

| Attributes | Regression labeling | Betas | Coefficients (a) | WTP (b) |
|---|---|---|---|---|
| Salary | salary | β1 | .003*** (.0002) | |
| Education (relative to no education offered) | | | | |
| Education after 6 years of service | edu_6 | β2 | .354*** (.0931) | 110.021 (51.324462 to 168.71692) |
| Education after 4 years of service | edu_4 | β3 | .707*** (.0747) | 219.547 (171.00109 to 268.09271) |
| Education after 2 years of service | edu_2 | β4 | 1.149*** (.0687) | 356.758 (306.88717 to 406.62958) |
| Location (relative to 3 miles + from district HQ) | | | | |
| District headquarters | loc_dis | β5 | .216*** (.0701) | 67.16 (24.040591 to 110.27902) |
| Regional headquarters | loc_reg | β6 | .021 (.0650) | 6.566 (-32.951269 to 46.082906) |
| Dar es Salaam | loc_dsm | β7 | -.308*** (.0771) | -95.657 (-143.49899 to -47.774036) |
| Housing (relative to no decent housing offered) | | | | |
| Decent housing offered | housing_yes | β8 | .216*** (.0493) | 67.171 (36.892623 to 97.449622) |
| Workload (relative to heavy workload) | | | | |
| Normal workload | work_normal | β9 | -.063 (.0482) | -19.506 (-49.291817 to 10.280629) |
| Equipment (relative to insufficient equipment) | | | | |
| Sufficient equipment and drugs | equipdrugs_s | β10 | .413*** (.0433) | 128.145 (99.165972 to 157.12438) |
| Infrastructure (relative to poor infrastructure) | | | | |
| Decent infrastructure | infra_good | β11 | .716*** (.0381) | 222.36 (195.32815 to 249.40972) |
| Constant | const | β12 | -.017 (.0398) | |
| Number of respondents | | | 296 | |
| Number of observations | | | 9472 | |
| Log likelihood | | | -2424.2108 | |
| Pseudo R2 | | | 0.2513 | |

a. Standard errors in parentheses. b. Confidence intervals in parentheses.

* significant at 10% level, ** significant at 5% level, *** significant at 1% level.

When looking at the output of a DCE the first thing the researcher should do is see whether the
attributes are significant, and therefore have an impact on the probability of choosing an alternative.
He or she should consider the sign of the coefficient, where significant. A positive sign implies
that the attribute has a positive impact on the take-up of a given job; a negative coefficient the
opposite.

β2, for instance, shows that having education opportunities after six years of service, rather than
none at all, increases the utility of the job by 0.354. Similarly, β5 shows that if the job is in the
district headquarters, the utility increases by 0.216. Most coefficients in table 2.7 have the expected
signs. All else equal, the respondents prefer a job with higher salaries and the possibility of further
education after 2, 4, and 6 years to no further education, the earlier the better. They prefer a job
where sufficient equipment is provided to one without, and a job that offers decent housing and
infrastructure to one that does not.

The respondents prefer to work in district headquarters rather than regional headquarters or in
a location that is a 3-hour (or longer) bus ride from the district headquarters. The least popular
location is the commercial capital, Dar es Salaam. This may seem surprising but there are several
plausible explanations. Living costs are very high in Dar es Salaam compared to other cities in
Tanzania, but perhaps more important, the likelihood of being in charge of a health facility and of
being able to practice as a clinician is smaller in Dar es Salaam, where most of the formally qualified
medical doctors are based.

The coefficients for the workload attribute and for being located in regional headquarters are
insignificant. There could be two reasons: either the researcher was unable to estimate the coef-
ficients efficiently with the model used, or there is too much heterogeneity in the preferences for
these attributes.

The attributes are measured in different ways: the continuous salary coefficient indicates how
much utility increases with one extra unit of salary, while the other coefficients measure the
change in utility relative to the reference category. They are not, therefore, directly comparable.

2.6.2  Willingness to pay


Within the context of workforce issues, inclusion of a price proxy (such as salary) allows the
researcher to estimate the monetary value of attributes of a job, that is, how much salary a
respondent would be willing to give up to have an improvement in other aspects of the job. As
seen in section 1, this can be estimated as the ratio of the coefficient of interest to the negative
of the coefficient on the cost attribute. Here salary acts as a negative cost, so the negative of the
cost coefficient is simply the salary coefficient, β1, and the ratio is positive for attributes that
respondents value. For example, how much monthly salary respondents are willing to sacrifice to
receive education after six years rather than no education can be estimated as:

$$\mathrm{WTP}(\text{edu\_6}) = \frac{\beta_2}{\beta_1} = 110.021, \qquad \beta_2 = 0.354,\ \beta_1 \approx 0.003$$

Similarly, how much monthly salary respondents are willing to sacrifice for working in the district
headquarters rather than a remote area is given by:

$$\mathrm{WTP}(\text{loc\_dis}) = \frac{\beta_5}{\beta_1} = 67.16, \qquad \beta_5 = 0.216,\ \beta_1 \approx 0.003$$

And how much monthly salary respondents are willing to sacrifice for working at a facility with
sufficient equipment and drugs is given by:

$$\mathrm{WTP}(\text{equipdrugs\_s}) = \frac{\beta_{10}}{\beta_1} = 128.145, \qquad \beta_{10} = 0.413,\ \beta_1 \approx 0.003$$

The WTP values can be easily estimated by hand (with a calculator) as shown above, or in a soft-
ware package such as Excel. The figures in table 2.7 are calculated within Stata and may deviate
somewhat from the results obtained with a calculator, simply because of the number of decimals
included in the coefficients above.
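The hand calculation can be sketched as follows, in Python for illustration. It uses only the coefficients as printed (rounded) in table 2.7, so it gives roughly 118, 72, and 138, compared with 110.021, 67.16, and 128.145 as computed in Stata from the unrounded estimates.

```python
# Rounded coefficients as printed in table 2.7.
beta_salary = 0.003
coefficients = {"edu_6": 0.354, "loc_dis": 0.216, "equipdrugs_s": 0.413}


def wtp(beta_attribute, beta_salary):
    """Salary a respondent would give up for the attribute: ratio of coefficients."""
    return beta_attribute / beta_salary


hand_wtp = {name: wtp(b, beta_salary) for name, b in coefficients.items()}
for name, value in hand_wtp.items():
    print(f"{name}: {value:.1f}")
```

The gap between the two sets of figures arises because the salary coefficient is small, so rounding it to three decimals changes the ratio noticeably.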

The advantage of estimating WTP within Stata is that the program will also estimate the confidence
intervals (reported in parentheses under the WTP estimates).

Hole (2007) describes four approaches to estimating confidence intervals for WTP estimates within
a DCE: the delta, Fieller, Krinsky Robb, and bootstrap methods. He also compares their accuracy
using simulated data. Hole (2007) concludes that the four methods give very similar results when
the model is correctly specified and the cost coefficient is relatively precisely estimated (t-stat >
10 in absolute value). He also found, more generally, that the methods tend to give similar results
(personal communication).

Hole’s wtp command for Stata implements the delta method, the Fieller method, and the Krinsky
Robb (parametric bootstrap) method for models estimated using data from choice experiments.
Delta method confidence intervals can also be calculated using the nlcom command in Stata.
Nonparametric bootstrap confidence intervals can be estimated using Stata’s bootstrap command.
The wtpcikr command in Stata can also be used to generate Krinsky Robb confidence intervals (this
command was designed for use with contingent valuation data). However, these commands only work
after standard logit commands such as logit, random effects logit (xtlogit), and conditional
logit (clogit).
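To make the delta method concrete, the sketch below computes a WTP and an approximate 95 percent confidence interval in Python from the rounded coefficient and standard error of edu_6 and salary in table 2.7. The covariance between the two estimates is not reported in the table, so it is set to zero here purely as an assumption; the resulting interval is therefore illustrative and will not reproduce the interval Stata reports.

```python
import math


def delta_method_wtp(b_attr, se_attr, b_salary, se_salary, cov=0.0, z=1.96):
    """WTP = b_attr / b_salary with a delta-method confidence interval.

    cov is the covariance between the two coefficient estimates. It is not
    reported in table 2.7, so the default of 0.0 is purely an assumption.
    """
    wtp = b_attr / b_salary
    # Gradient of the ratio with respect to (b_attr, b_salary).
    d_attr = 1.0 / b_salary
    d_salary = -b_attr / b_salary ** 2
    variance = (d_attr ** 2 * se_attr ** 2
                + d_salary ** 2 * se_salary ** 2
                + 2.0 * d_attr * d_salary * cov)
    se = math.sqrt(variance)
    return wtp, wtp - z * se, wtp + z * se


# Rounded values from table 2.7 for edu_6 and salary.
point, lower, upper = delta_method_wtp(0.354, 0.0931, 0.003, 0.0002)
print(f"WTP = {point:.1f}, 95% CI ({lower:.1f}, {upper:.1f})")
```

In practice the full variance-covariance matrix of the estimates is available after estimation, and commands such as nlcom use it automatically.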

To estimate WTP measures in Stata, either the nlcom command or the wtp command is entered
immediately after the conditional logit command (clogit).

So, for example, to calculate the willingness to sacrifice salary for education after 6 years of service,
rather than no education opportunities, the nlcom command will be the following (the ratio is not
negated because salary enters utility positively, which gives the positive WTP shown in table 2.7):

nlcom (wtp_edu_6: _b[edu_6]/_b[salary])

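Alternatively, one can use the wtp command. The cost attribute, the salary in this case, will then
have to be defined as a negative, entered in the model in place of salary, and placed in front of the
other attributes, as specified below:

gen msalary = -salary
clogit choice msalary edu_6 edu_4 edu_2 loc_dis loc_reg loc_dsm housing_yes work_normal ///
    equipdrugs_s infra_good const, group(obsid)
wtp msalary edu_6 edu_4 edu_2 loc_dis loc_reg loc_dsm housing_yes work_normal ///
    equipdrugs_s infra_good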

In order to compress the information and make it more intuitive, WTP values may sometimes be
presented graphically. An example of this is provided in the case study from Uganda.

2.6.3  Uptake rate


A useful output when using DCEs to look at recruitment and retention is how the probability of
choosing a given post changes as levels of attributes are changed. One option is to consider the
change in the probability of taking the baseline job (the reference category for all dummies) due
to a change of the level in one of the job attributes. The regression model is the same as the one
used for the WTP estimates.

The logit probability of choosing alternative i among the available alternatives j is given by:

$$P_i = \frac{e^{\beta' x_i}}{\sum_j e^{\beta' x_j}}$$

where x_i is the vector of attribute levels of alternative i and β is the vector of coefficients. Using
this equation, the change in the probability of taking the baseline job because of a change in one of
the job attributes (say, the salary is raised from 200,000 to 350,000 Tanzania shillings (T Sh) per
month) is then, as long as all other attributes remain equal, given by:

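$$
\begin{aligned}
P_{\text{wage}=350} - P_{\text{wage}=200}
&= \frac{e^{\beta_1 \cdot 350}}{e^{\beta_1 \cdot 200} + e^{\beta_1 \cdot 350}} - \frac{e^{\beta_1 \cdot 200}}{e^{\beta_1 \cdot 200} + e^{\beta_1 \cdot 350}} \\
&= \frac{e^{0.003 \cdot 350}}{e^{0.003 \cdot 200} + e^{0.003 \cdot 350}} - \frac{e^{0.003 \cdot 200}}{e^{0.003 \cdot 200} + e^{0.003 \cdot 350}} \\
&= 0.261
\end{aligned}
$$
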
The syntax in Stata (straight after the conditional logit regression) for calculating such a change in
the probability is the following:

nlcom exp(_b[salary]*350)/(exp(_b[salary]*200) + exp(_b[salary]*350)) - ///
    exp(_b[salary]*200)/(exp(_b[salary]*200) + exp(_b[salary]*350))
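The same calculation can be sketched outside Stata. The Python example below is illustrative only and uses the salary coefficient as rounded in table 2.7 (0.003), with salary expressed in thousands of T Sh as in the equation above. With that rounded value the expression evaluates to roughly 0.22; figures computed from unrounded estimates will differ.

```python
import math


def logit_probability(v_chosen, v_all):
    """Probability of one alternative given the utilities of all alternatives."""
    denominator = sum(math.exp(v) for v in v_all)
    return math.exp(v_chosen) / denominator


def change_in_uptake(beta_salary, salary_new, salary_base):
    """P(job at new salary) - P(job at base salary), all other attributes equal."""
    v_new, v_base = beta_salary * salary_new, beta_salary * salary_base
    p_new = logit_probability(v_new, [v_base, v_new])
    p_base = logit_probability(v_base, [v_base, v_new])
    return p_new - p_base


change = change_in_uptake(beta_salary=0.003, salary_new=350, salary_base=200)
print(round(change, 3))
```

Because all other attributes are held equal, their contributions to utility cancel and only the salary term needs to enter the calculation.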
