Personality and Pair Programming
Abstract

The benefits of synergistic collaboration are at the heart of arguments in favor of pair programming. However, empirical studies usually investigate direct effects of various factors on pair programming performance without looking into the details of collaboration. This paper reports on an empirical study that (1) investigated the nature of pair programming collaboration, and (2) subsequently investigated postulated effects of personality on pair programming collaboration. Audio recordings of 44 professional programmer pairs were categorized according to a taxonomy of collaboration. We then measured postulated relationships between the collaboration categories and the personality of the individuals in the pairs. We found evidence that personality generally affects the type of collaboration that occurs in pairs, and that different levels of a given personality trait between two pair members increase the amount of communication-intensive collaboration exhibited by a pair.

1. Introduction

Pair programming involves two programmers collaborating in front of one computer on the same programming task [6, 15, 47]. Several variants of pair programming are possible and practiced. For example, pairs may work for shorter or longer periods of time, partners may rotate, and driver and navigator roles may or may not be adhered to. On the one hand, pair programming inspires a particularly close form of collaboration which might intensify group dynamics, while, on the other hand, short sessions may not allow more inert group dynamics to manifest themselves [17].

In any event, it is of interest to investigate factors that may affect the interaction that occurs in pair programming [1, 19]. These factors include personality, gender, expertise, attitudes, motivation, and preferences. However, since performance, e.g., in terms of time and quality, is often the ultimate criterion variable in software engineering, such factors have mostly been studied in terms of how much they directly influence performance, or in some cases, satisfaction [2, 12, 13, 16, 24, 26, 25, 31, 33, 41, 42, 44, 49]. This means that the nature of collaboration in terms of how pairs interact has mainly been treated in a black-box manner, with a few exceptions [8, 9, 10, 14, 18, 41].

When it comes to personality, the direct impact of personality on performance has been found to be modest in several areas of research [4, 38, 7], including software engineering [25]. However, even though direct effects on performance are disappointing, it is not unreasonable to expect that personality might have a more substantial impact on how pairs collaborate. How pairs collaborate might then be used to predict performance. Introducing pair collaboration as a mediator variable in this manner may then even reveal larger effects of personality on performance, since the effects of personality are filtered through the effects of pair collaboration, see Figure 1.

Figure 1. Pair Collaboration as a mediator variable

This paper focuses on the first part of this relationship, which consists of two issues: (1) the definition of the construct of pair collaboration, and (2) the relationship between personality and pair collaboration; for example, whether extroverts talk more, whether conscientious people have more task-focused conversation, and whether people with low emotional stability have more conflicts in collaboration. For (1), to avoid confounding of constructs, it is important to define the construct of pair collaboration before relating this construct to performance: Good and bad pair collaboration should not merely be defined to be whatever gives good and bad performance. If we were to do that, we would not gain insight into collaboration, and pair collaboration as a mediator variable would add nothing to the model. Therefore, the part of the relationship that concerns the effect of pair
collaboration on pair performance is left for future research.

Audio recordings of 44 pairs solving a change task were analyzed. We developed a taxonomy for classifying pair collaboration from verbal interaction. The taxonomy was intended to capture the nature of collaboration in terms of (a) which subtasks are performed, (b) what kind of interaction occurs, and (c) on which cognitive level collaboration is conducted. We also recorded the extent of collaboration, that is, how much collaboration there is in pair programming. We then conducted an analysis to investigate postulated relationships that personality might have with collaboration.

Section 2 summarizes the study. Section 3 describes our investigation into pair collaboration, Section 4 describes the personality model that was used in this study, Section 5 presents the analysis of the effect of personality on pair collaboration, and Section 6 discusses and concludes.

2. Overview of Study

Our study used data collected during the experiment reported in [2], which compared the performance of professional pair programmers with that of solo programmers. Our study focuses on 44 of the pairs of that experiment. These programmers were recruited from software consultancy companies in Norway and Sweden in 2004/2005. The pairs were formed so that both individuals in a pair had the same level of expertise. The subjects did not know in advance who their partner would be during the study. Within each level of expertise, pairs were assigned randomly to one of two treatments pertaining to task complexity. Note, however, that our present analysis does not include expertise or task complexity.

Each pair participated for one day and their session was divided into four stages. First, the subjects were given a presentation that included an introduction to the concept of pair programming, which focused on the active collaboration in pair programming and which involved a short description of the two roles (driver and navigator). The subjects were told that they could decide for themselves how often and when to switch roles, but that they had to try both roles at least once. After the presentation, the subjects individually performed a training task and a pretest task. Then, the subjects performed the three pair programming tasks T1–T3 as well as a time sink task T4 in pairs. The pair programming tasks were done on two different versions of the system (a coffee machine application) according to the task complexity treatment. The first two tasks, T1 and T2, were simpler warm-up tasks (implementing a coin return button and adding a new drink to the menu). The third task T3 was the main task (adding an ingredients check). To support the logistics of the study, the subjects used a web-based experiment support tool [3] to answer questionnaires, download code and documents, and to upload task solutions. For each task a test case was provided that each subject or pair used to test the solution. Further details, including validity issues, are provided in [2, 25]. All verbal interaction during tasks T1–T4 was audio recorded. At the end, the option was given to complete the Big Five personality test.

3. Collaboration

Collaboration denotes a situation in which both parties contribute new information to a given task. In contrast, cooperation involves splitting the task into subtasks and working on the subtasks separately [9]. Clearly, pair programming is intended as a collaborative task, rather than a cooperative task in this respect.

In order to determine the extent and nature of collaboration, we carried out a content analysis of the audio recordings. The objective of a content analysis is to elicit semantic content from recorded or written material in a systematic manner [32]. In our case, the semantic content of interest was the type of collaboration, and the analysis was done by categorizing passages of speech (the units of analysis) according to a coding scheme. We followed the steps of content analysis as summarized in [40]. The material consisted of the audio recordings from the four pair programming tasks. We content analyzed the third task T3 only. This decision was made upon the assumption that collaboration on this task was most representative of pair programming, since the pairs would have had a better chance to adjust to each other as well as to any problems with the experimental routines, equipment, etc. The pairs’ team process may also have had some time to settle [36, 35] or “jell” [48].

3.1. Sampling Strategy

The first step of a content analysis is to determine the material to be analyzed. Our raw material was audio recordings. We faced a decision as to whether to transcribe the audio recordings to text. As discussed in [34], transcription is extremely time consuming and should be subject to cost-effectiveness considerations. In our case, the focus was on the content of collaborative measures, rather than on what was literally said. Exact transcripts were therefore regarded as less important, and we chose to code directly on the audio files using Transana as a coding tool.1

1 Transana is a trademark of the Wisconsin Center for Education Research, University of Wisconsin-Madison.

3.2. Units of analysis

Units of analysis were defined on two levels. The first level captured the task focus of the pair. A unit of analysis at this level spanned longer discourse sequences evidencing
a particular focus until it was evident that a different focus was present (Section 3.3.1). Depending on the focus in question, a further analysis was conducted where units of analysis were so-called interaction sequences built from statement units. These interaction sequences were typically sequences of two or more alternating utterances between the peers on a certain topic (Section 3.3.2).

3.3. Themes

A content analysis revolves around a system of categories. Each subsystem in such a system is called a theme. Figure 2 shows our six themes. For example, the theme Interaction Pattern categorizes verbal discourse according to categories intended to describe collaboration. In this way, themes represent the constructs of the study. Constructs may correspond to concepts (of a theory), and in that case, the constructs (themes) may be predefined. Alternatively, constructs may emerge from a particular study, in which case themes are constructed in an exploratory manner, as in grounded theory building.

Our goal was to uncover patterns of collaboration, something we expected to be more universal than that which pertains to this particular study. At the outset, then, we wished to use predefined themes which capture the essence of collaboration in pair programming from verbal data. In our review of related work, we found several useful coding schemes, but we also realized that 1) no single scheme captured the aspects that we wanted to capture, 2) several of the schemes did have several aspects but these were not dealt with orthogonally, and 3) coding schemes from other disciplines were also highly relevant.

We therefore used a hybrid approach, in a double sense: First, we combined coding schemes from several studies and from several disciplines. Second, we approached theme building in both a data-independent manner and in an exploratory manner in parallel: One preliminary coding scheme was developed on the basis of existing relevant coding schemes. Another preliminary coding scheme was developed on the basis of samples of our audio recordings. All coding schemes were subsequently systematized in a large table in order to show overlaps and differences between categories. This resulted in a distilled coding scheme, which was, as is customary [37], tested for reliability on 10 percent of the material. The resulting coding scheme is depicted in Figure 2, which we will now explain.

3.3.1 Task Focus

The categorization of Task Focus is intended to capture what the pair programmers are occupied with during a collaborative session. The starting point for defining this theme was a division of the task of pair programming into subtasks. Bryant et al. [9] divided pair programming into subtasks and registered the number of verbal utterances (in which new information was contributed) for each subtask. Our Task Focus is a coarser-grained and extended variant of Bryant et al.’s subtask categorization, and is shown to the left in Figure 2. The categories are defined as follows:

• Off Task: Utterances do not concern anything directly related to solving the task.
• Task Description: Utterances pertain to reading the task description and understanding what is to be done.
• Code Comprehension: Utterances pertain directly to solving the task with a focus on understanding existing code.
• Programming: Utterances pertain directly to solving the task with a focus on developing new code.
• Programming Aloud: Utterances made during active programming but not intended for dialog.
• Programming Silently: Active programming with no verbal utterances, i.e., only audible typing.
• Other Relevant Tasks: Utterances pertain to other tasks that are relevant to solving the main task, but that are not directly focused on code comprehension or programming.
• Compile and Test: Utterances pertain to compiling and testing the produced code.
• Silence: There are no utterances and no typing.
• Unintelligible: The utterances are not intelligible.

We are interested in problem-solving during actual programming. Thus, a further analysis was only conducted when Task Focus was Code Comprehension or Programming (although the other foci are also relevant). We analyzed the amount of actual verbal collaboration when programming in contrast to non-collaborative modes characterized by utterances not intended for collaboration or near-solo programming (hence the distinction between Programming, Programming Aloud, and Programming Silently).

3.3.2 Interaction Sequences

If the Task Focus was Code Comprehension or Programming, a further analysis was conducted in order to determine interaction sequences (Figure 2 middle part). The basic units of an interaction sequence are statement units. A statement unit is a “codable unit of speech (i.e., a word, sentence, or sentences)” [27]. An interaction sequence is then built from one or more statement units.

Hogan et al. [27] assign each statement unit to a Statement Unit Type under main categories Conceptual, Metacognitive, Question-query, and Nonsubstantive (Figure 2 lower part). Statement units of certain types typically begin interaction sequences (e.g., Presents idea, Requests information), while others typically end interaction sequences (e.g., Repeats other, Reacts agrees). Some types may both begin and end an interaction sequence, and statements of other types will typically only be found in the middle of interaction sequences (e.g., Elaborate other).

An interaction sequence begins when Task Focus switches to Comprehension or Programming, or when a preceding interaction sequence ends. An interaction sequence ends when it is immediately replaced by another interaction sequence, or when a Task Focus category other than Comprehension or Programming starts. The difficult part of this is to determine when one interaction sequence ends and another begins. The main guideline for determining the transition from one interaction sequence to the next is a change of topic. However, any change in the nature of discourse also warrants the end of one sequence and the beginning of another. That is, if it is reasonable to divide a piece of discourse into two interaction sequences with two different characteristics, then this warrants the existence of these two interaction sequences instead of a single sequence.

Interaction sequences are characterized according to the five themes Begin Characteristics, Interaction Pattern, End Characteristics, Result, and Cognitive Level. The themes are coarser grained and at a higher level of abstraction than the main categories of the Statement Unit Types. Nevertheless, statement unit types are aids to indicate which theme, and which category in a theme, is appropriate.

3.3.3 Begin Characteristics

This theme characterizes the first statement unit of an interaction sequence, and is designed to capture degrees of assertiveness.

• Question: A passive request for information/clarification.
• Suggestion: An active contribution of an idea, a plan of action, or information; possibly formulated as a question, e.g., a statement unit type of Presents idea or similar.
• Assertion: A statement describing or claiming how things are or how things should be done, e.g., a statement unit type of Presents idea, Presents Information, or similar.
• Imperative: A statement asking or ordering the other peer to do something.

3.3.4 Interaction Patterns

This theme characterizes the interaction sequence as a whole, and is designed to capture the central aspects of collaboration. Our categories are modeled around Hogan et al.’s [27] interaction pattern categories. The categories are defined as follows:

• Consensual: Only one speaker contributes substantive statements (i.e., Conceptual, Metacognitive, Question-queries), while the other speaker responds by (a) agreeing with the statement, passively or neutrally acknowledging the statement, (b) actively accepting what was said and thereby encouraging the speaker to continue, or (c) repeating the preceding statement verbatim. Thus, in consensual sequences one speaker carries the conversation, with the other speaker serving as a minimally verbally active audience. Although consensual sequences may last only a few statement units, sometimes a single speaker may contribute many ideas to the discussion with all of the intervening statements by the other speaker being nonsubstantive.
• Stonewalling: A speaker delays a response or otherwise obstructs any elaboration of the other peer’s input.
• Cross Purpose: The speakers’ statements are cross purpose, that is, each speaker is speaking on separate topics.
• Responsive: Both questions and responses of both peers contribute substantive statements to the discussion. Responsive patterns are often only a few statement units in length. They become longer when several agreements or neutral comments are embedded within the sequence.
[...] audio file that were analyzed by both coders. The agreement scores were in the range of 78%–92%, which is acceptable.

4. Personality

There exist several models of personality with several alternative operationalizations or tests (usually questionnaires) that are administered to measure a person’s personality. A model that in recent years has dominated the academic scene [4] consists of five factors and goes under the name of the Big Five [20, 21]. The Big Five posits that the most important personality differences in people’s lives will become encoded as terms in their natural language, the so-called Lexical Hypothesis [20].

The Big Five model consists of the following five personality factors (traits) [20, 21] (with descriptions from [39]):

Extraversion (Factor 1): Assesses quantity and intensity of interpersonal interaction; activity level; need for stimulation; and capacity for joy.

Agreeableness (Factor 2): Assesses the quality of interpersonal orientation along a continuum from compassion to antagonism in thoughts, feelings, and actions.

Conscientiousness (Factor 3): Assesses degree of organization, persistence, and motivation in goal-directed behavior. Contrasts dependable, fastidious people with those who are lackadaisical and sloppy.

Emotional stability/Neuroticism (Factor 4): Assesses adjustment versus emotional instability. Identifies proneness to psychological distress, unrealistic ideas, excessive urges, and maladaptive coping responses.

Openness to experience (Factor 5): Assesses proactive seeking and appreciation of experience for its own sake; toleration for and exploration of the unfamiliar.

We used the Big Five Factor Markers [28, 23, 22] with 100 indicators, 20 per trait. The 100 indicators are self-assessment questionnaire items on a seven-point Likert scale. For example, one of the 20 items for Extraversion is “I feel comfortable around people”, and one of the 20 items for Agreeableness is “I make people feel at ease”.

For pairs, some notion of Pair Personality must be devised. The most common ways of aggregating personality scores into team scores are to take the mean, the minimum, the maximum, or the variance of the individual scores. These aggregates can be seen as alternative operationalizations of two Pair Personality constructs, namely trait Elevation (with the mean as the canonical measure) and Variability (with the variance, or difference, as the canonical measure) [5, 25, 38, 43].

5. Personality and Collaboration

We postulated the following relationships:

R1 Personality affects the type of Collaboration. This relationship investigates our overall exploratory research question, and concerns the initial usefulness of Collaboration as a mediator variable as depicted in Figure 1.

R2 Variability in Personality increases the amount of communication-intensive Collaboration. This is the postulate most closely based on findings in the available literature. It is primarily based on findings from the Myers-Briggs Type Indicator-based pair programming and collaboration experiment by Sfetsos et al. [41]. The study indicated that there is a significant correlation between mixed personalities and high amounts of communication transactions. Their results indicate that pairs with mixed personalities both communicate better and perform better. Further, Karn and Cowling claim that homogeneous teams (with respect to personality) are not ideal, and that such teams “run into a real danger of falling into the no debate trap” [29]. Williams et al. [49] present a similar finding.

R3 Pairs in which the members have similar levels of Extraversion are less likely to disrupt each other. This relationship surfaced in the ethnographic study in [29]. In one of the groups in the study, none of the members disrupted anyone else at all. The authors explain this by the fact that the group had one clearly dominant member, the group’s only extrovert.

R4 High Extraversion Elevation leads to more communication-intensive Collaboration. This relationship is suggested in [47]. Two extraverts will talk unnecessarily, sometimes about things outside the task, and will thus spend longer time on the task.

R5 High Agreeableness Elevation leads to more Off Task communication. This is an assumption based on the definitions of the personality traits and our collaboration categories. It is reasonable to believe that people scoring high on agreeableness might initiate off task communication such as small talk, since agreeableness indicates a genuine interest in other people’s lives.

R6 High Extraversion Elevation leads to more Metacognitive statements. This is, like R5, based on the definitions of the personality traits and our collaboration categories. Extraverts might be more likely to cope with frustration about tasks by expressing their opinions on and feelings about the task.
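The two Pair Personality constructs from Section 4 can be illustrated with a minimal sketch. It assumes hypothetical trait scores for the two members of a pair, and it takes Elevation as the mean and Variability as the absolute difference of the two scores. The function names and the example scores are illustrative assumptions, not values or code from the study.

```python
from statistics import mean


def pair_elevation(score_a: float, score_b: float) -> float:
    """Elevation: the mean of the two individual trait scores."""
    return mean([score_a, score_b])


def pair_variability(score_a: float, score_b: float) -> float:
    """Variability: absolute difference of the two trait scores (assumed measure)."""
    return abs(score_a - score_b)


# Hypothetical Extraversion scores for the two members of one pair.
extraversion_a, extraversion_b = 38.0, 46.0
elevation = pair_elevation(extraversion_a, extraversion_b)
variability = pair_variability(extraversion_a, extraversion_b)
```

For two scores, the sample standard deviation equals the absolute difference divided by the square root of two, so either choice of Variability measure orders pairs in the same way.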
5.1. Analysis

The next sections describe the analysis of these relationships. Data files were prepared with SAS 9.2, Enterprise Guide 4 and SPSS 16.2 as in [25], together with additional scripts.2

5.1.1 Variables

The starting point of a decision tree analysis is a dependent variable and a set of independent variables. The independent variables in our case were operationalizations of Pair Personality Elevation and Variability (Section 4) in terms of score means and differences. The dependent variables were measures of the various theme categories for Collaboration (Section 3.3). We used two types of measure: The first type was the percentage of time allocated to a category, relative to the total length of T3; for example, the amount of time spent by a pair making utterances that were classified as, say, Task Description under Task Focus (TF-D). The second type of measure was the percentage of clips allocated to a category. For interaction sequence categories, the percentage was calculated relative to the total number of interaction sequence clips for a pair; for example, the percentage of Elaborative interaction patterns relative to the total number of clips classified as interaction sequences for a pair. For other clips, the percentage was calculated relative to the total number of clips for a pair.

5.1.2 Decision Trees

[...]ness toward outliers, the process was set to terminate when partition sizes went below 5.

Decision tree analysis is independent of any assumptions on normality or types of data. Splits nearer the root split more of the observations and signify more general effects than splits further away from the root. Nonlinear effects are reflected by successive and asymmetrical splits (in the sense of producing an unbalanced tree) with respect to the same independent variable. Interaction effects are reflected by asymmetrical splits of different variables.

All Rows: Count 44, Mean 18.866729, Std Dev 13.129374; LogWorth 4.863871, Difference 24.1418
  Child node: Count 39, Mean 16.123345, Std Dev 10.773756; LogWorth 1.1885681, Difference 12.2921
    B5_1_Mean < 41.753304891: Count 6, Mean 5.7223093, Std Dev 3.0167331
    B5_1_Mean >= 41.753304891: Count 33, Mean 18.014443, Std Dev 10.603944
  Child node: Count 5, Mean 40.26512, Std Dev 10.229602

Figure 3. Example of decision tree

In Figure 3, only the top split is a significant split.3 The tree should be read as follows (if focusing on all splits, regardless of significance): The main effect for the percentage of time used on Programming Aloud was in the difference of Emotional Stability (B5_4_StdDev), where less difference leads to more Programming Aloud. For the pairs who had higher differences in Emotional Stability, high scores on Extraversion (B5_1_Mean) lead to more Programming Aloud.

2 SAS, Enterprise Guide, and JMP are trademarks of SAS Institute Inc. SPSS is a trademark of SPSS Inc.

3 [...] of the difference in mean values of the child nodes with regards to the dependent variable. Specifically, LogWorth = −log10(p), where p is the adjusted probability of the observed data under the hypothesis of the means being equal. Thus, p-values less than 0.05 are reflected by LogWorth-values greater than 1.30.
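The LogWorth relation stated in the footnote can be checked with a short sketch. Only the relation LogWorth = −log10(p) comes from the text; the function names below are illustrative assumptions.

```python
import math


def logworth(p: float) -> float:
    """Convert an (adjusted) p-value to LogWorth = -log10(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return -math.log10(p)


def p_from_logworth(lw: float) -> float:
    """Invert the transformation: p = 10 ** (-LogWorth)."""
    return 10.0 ** (-lw)


# The conventional 0.05 significance level corresponds to a LogWorth of about 1.30.
threshold = logworth(0.05)
```

Because the transformation is monotonically decreasing in p, smaller p-values always give larger LogWorth values, which is why a split counts as significant when its LogWorth exceeds the threshold.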
     Support  Findings
R1   Yes      Several personality factors influence collaboration categories significantly
R2   Yes      Personality differences influence several communication-intensive collaboration categories significantly
R3   No       Contradicting evidence
R4   No       No significant impact found
R5   No       Extraversion affects several communication-intensive collaboration categories, and mostly in one direction, but only non-significantly
R6   No       No significant impact found

Table 1. Support for postulated relationships

Personality Measure | Time Allocated to Category: % time, +/-, LogWorth (p) | Times Category Used: % clips, +/-, LogWorth (p)
Extraversion mean | TF-O - 1.43 (0.04) | EC-f - 1.48 (0.03)
 | CL-P + 1.31 (0.05) | EC-d + 1.48 (0.03)
 | IP-n - 1.37 (0.04)
Extraversion diff | TF-Z + 3.94 (0.00) | IP-x + 2.00 (0.01)
 | IP-x + 1.51 (0.03)
Agreeableness mean | IP-s + 2.48 (0.00) | TF-P - 2.07 (0.01)
 | BC-q - 1.87 (0.01) | TF-C + 2.07 (0.01)
 | IP-x + 1.65 (0.02) | Re-r - 2.00 (0.01)
 | Re-r - 1.50 (0.03) | IP-q - 1.97 (0.01)
 | CL-D - 1.47 (0.03) | Re-u + 1.88 (0.01)
 | CL-M - 1.35 (0.04)
Agreeableness diff | TF-X + 1.35 (0.04) | Re-r - 1.47 (0.03)
Conscientiousness mean | TF-D - 2.16 (0.01)
Conscientiousness diff
Emotional Stability mean | TF-Z - 1.39 (0.04)
 | CL-M + 1.59 (0.03)
Emotional Stability diff | TF-PA - 4.86 (0.00)
Openness mean | BC-i + 3.70 (0.00) | BC-i + 4.68 (0.00)
Openness diff | IP-r + 2.11 (0.01) | IP-r + 1.60 (0.03)
Table 2. Significant Personality influences on Collaboration categories: top splits

This focus on top splits describes the most general trends, but in the presence of all independent variables. This procedure will not prevent potential significant top splits from surfacing, since the splits are made on the basis of maximizing significance. This means that the top-most split will always be more significant than the top-most split of the next run, where the variable that was in the previous top split is removed. When the top split is no longer significant, the process can be stopped, since the next top split (after removing yet another variable) would be even less significant. The process does, however, discard significant splits beneath non-significant top splits. This exclusion is justifiable, since significance is a criterion for the model.

In addition to this, the standard complete split trees were investigated with regards to the specific variables that are involved in the relationships. Often, this extra check led to the realization that certain findings had to be revised.

5.2. Results

The relationships R1–R6 were operationalized in terms of the corresponding measures described in Section 5.1.1, and subjected to data analysis. The data analysis only supported R1 and R2, see Table 1. In the following we give a more detailed account of the findings.

R1: Personality affects the type of Collaboration. Table 2 lists all the significant findings when using the procedure described in Section 5.1.3. The “+/-” columns indicate whether the collaboration category is affected positively or negatively by the personality factor. For example, the first row shows a relationship between Extraversion mean and the Other Relevant Tasks (O) category under the Task Focus theme (TF) (Section 3.3.1). The “-” in the third column indicates a negative relationship, i.e., that a high Extraversion mean score for the pair relates to less TF-O. Nearly all personality factors influenced one or more of the Collaboration category measures significantly, both with respect to percentage of time allocated to a category (Table 2 left part), and when measuring percentage of times a category was used (Table 2 right part). Thus, there is evidence in support of R1.

R2: Variability in Personality increases the amount of communication-intensive Collaboration. Communication-intensive Collaboration relates to the categories Off Task (TF-Z), Elaborative (IP-e), Responsive (IP-r), Cross Purpose (IP-x), Disruption (EC-d), and Unresolved (Re-u). These categories either signify substantial mutual verbal involvement from both parties (e.g., Elaborative) or an overflow of verbal initiative (e.g., Disruption).

Several of these collaboration categories were found to be significantly influenced by difference (the operationalization of Variability) in Personality. This supports R2. Some categories, however, contradicted this, but mostly non-significantly. The only significant finding that contradicted R2 can be seen in the left part of Table 2: For pairs with a high Agreeableness difference, the programmers are silent (TF-X) for a longer total time than pairs with more similar Agreeableness scores.

To investigate R2 further, we extended the top split analysis described in Section 5.1.3 with a larger aggregated analysis similar to that in [25]. For this, the complete split trees were investigated, and the order of the splits, as well as whether they were significant or not, was noted. In addition to the above-mentioned communication-intensive categories, we designated a group of silent categories as a contrast. This group consisted of Programming Silently (PS), Silence (TF-X), Nonresponsive (IP-n), Consensual (IP-c), Stonewalling (IP-s), Flow (EC-f), and Resolved (Re-r).

We then performed the complete tree analyses on these two contrasting groups of categories. This analysis can be seen in Table 3. The numbers indicate how early the split occurred (with 1 being the first split of the tree, 2 the second, and so on). A ’*’ indicates a significant split. The +/- in front of the numbers indicates whether the split signifies a positive or a negative relationship. At the bottom of each column, the R^2 of an n-fold cross validation is given, that is, the average R^2 over n=44 predictions where n − 1 observations are used to predict the nth observation. At the
very bottom, the overall R², which indicates the ratio of explained variance, is given.

n-fold R²   .36 .46 .16 .32 .24 .12 .16 -.04 .04 -.04 -.04 .28 .32 .12 .40 .16 .20 -.09 -.14 -.04 -.04 .21 .21
Overall R²  .56 .59 .37 .50 .48 .33 .36 .21 .25 .26 .30 .49 .54 .36 .59 .40 .44 .24 .21 .26 .30 .40 .40

Table 4 ranks each independent variable according to how early associated splits occurred, by the formula max splits + 1 - split number. In our case, max splits was seven, that is, no trees had more than seven splits. Thus, if, say, Agreeableness diff assumed a significant second split in a model, then that split would contribute 7+1-2=6 to the ranking for Agreeableness diff. The ’*’-columns only count significant splits. The first two columns are the results from the communication-intensive categories (C) minus the results from the silent categories (S). This difference indicates the overall influence that the personality trait has on the amount of communication-intensive Collaboration. A positive score indicates that difference in personality traits leads to an increase in communication-intensive Collaboration and would give evidence in favor of R2. A negative score would contradict R2.

Tables 3 and 4 show that there are indeed a number of splits that indicate that variability in personality has an impact on the communication-intensive categories. This is most evident for Extraversion, and it confirms our findings from the first analysis: The amount of communication-intensive Collaboration does increase when the pairs have different personalities (except for differences in emotional stability). This provides support in favor of R2.

R3—Pairs in which the members have similar levels of Extraversion are less likely to disrupt each other. Our analysis does not offer support for R3. Individuals that have similar levels of Extraversion do not disrupt each other more, but there is no significant evidence that they will disrupt each other less either. The partition trees for this analysis are contradictory, with signs of both positive and negative relationships. However, the top split analysis done in connection with R1 does suggest (non-significantly) a positive relationship between Extraversion difference and Disruption. Further investigation of this relationship should be conducted, for example with an expanded sample size.

R4—High Extraversion Elevation leads to more communication-intensive Collaboration. Our analysis does not offer support for R4. Analyses seem to suggest that communication-intensive Collaboration categories are affected by Extraversion, and mostly in one direction, but the results are highly non-significant on all categories. The only significant finding was that extraverts will disrupt each other more often than introverts.

R5—High Agreeableness Elevation leads to more Off-Task communication. Our analysis does not offer support for R5. Agreeableness mean does not significantly influence time used on Off-Task, Metacognitive, or Other Relevant Tasks. (The non-significant results suggest that Agreeableness mean decreases occurrences in all three categories.) Moreover, high Agreeableness mean significantly relates to fewer occurrences of Metacognitive statements. So if anything, highly agreeable people small talk less than their not-so-agreeable counterparts in a pair programming situation. This suggests that agreeable people will use the cognitive level of Metacognitive less.

R6—High Extraversion Elevation leads to more Metacognitive statements. Our analysis does not offer support for R6. The significance of the splits is very low, and the split trees are contradictory. Our analysis gives no reason to posit that Extraversion has any effect on the amount of use of the cognitive level Metacognitive.
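
As a minimal sketch of the ranking scheme described for Table 4, the following Python snippet scores a single split with the formula max splits + 1 - split number and forms the difference between the communication-intensive (C) and silent (S) scores. The function names, the tuple layout used for a split, and the choice to carry the +/- direction of a split as a sign on its contribution are assumptions made for illustration; the text does not spell out how the direction enters the aggregated score.

```python
MAX_SPLITS = 7  # from the text: no tree had more than seven splits


def split_contribution(split_number, sign=1, max_splits=MAX_SPLITS):
    """Score one split as max_splits + 1 - split_number.

    `sign` (+1 or -1) is an assumed way to carry the split's direction.
    """
    if not 1 <= split_number <= max_splits:
        raise ValueError("split_number must be between 1 and max_splits")
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    return sign * (max_splits + 1 - split_number)


def rank_score(splits, significant_only=False):
    """Sum contributions over (split_number, sign, is_significant) tuples.

    With significant_only=True only significant splits are counted,
    mirroring the '*' columns of Table 4.
    """
    return sum(
        split_contribution(number, sign)
        for number, sign, is_significant in splits
        if is_significant or not significant_only
    )


def c_minus_s(c_score, s_score):
    """Communication-intensive score minus silent score."""
    return c_score - s_score


# Worked example from the text: a significant second split contributes 7+1-2.
print(split_contribution(2))   # 6

# Differences for the Extraversion diff row of Table 4.
print(c_minus_s(25, -1))       # 26, the (C-S)* entry
print(c_minus_s(30, 24))       # 6, the C-S entry
```

Under this reading, earlier splits weigh more heavily than later ones, and a positive C minus S value for a trait is what the text treats as evidence in favor of R2.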
                          (C-S)*   C-S    C*     C    S*     S
Extraversion diff            26      6    25    30    -1    24
Agreeableness diff            1     -5     0    12    -1    17
Conscientiousness diff        5      4     5    -2     0    -6
Emotional Stability diff      2    -31    -1    -8    -3    23
Openness diff                 7     40     7    25     0   -15

Table 4. Exploratory analysis for R2 aggregated

5.3. Threats to Validity

The two most important threats to validity of this study are construct validity and the corresponding (inter-rater) measurement reliability. The structure of the collaboration con-
[19] H. Gallis, E. Arisholm, and T. Dybå. An initial framework for research on pair programming. In Proc. 2003 Int’l Symp. Empirical Software Engineering (ISESE’03), pages 132–142, 2003.
[20] L.R. Goldberg. An alternative description of personality: The big-five factor structure. J. Personality and Social Psychology, 59:1216–1229, 1990.
[21] L.R. Goldberg. The structure of phenotypic personality traits. American Psychologist, 48:26–34, 1993.
[22] L.R. Goldberg. A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In I. Mervielde, I. Deary, F.D. Fruyt, and F. Ostendorf, editors, Personality Psychology in Europe, volume 7, pages 7–28. Tilburg University Press, 1999.
[23] L.R. Goldberg, J.A. Johnson, H.W. Eber, R. Hogan, M.C. Ashton, C.R. Cloninger, and H.C. Gough. The international personality item pool and the future of public-domain personality measures. J. Research in Personality, 40:84–96, 2006.
[24] B. Hanks. Student attitudes toward pair programming. In Proc. 11th Annual Conf. Innovation and Technology in Computer Science Education (ITiCSE06), pages 113–117. ACM, 2006.
[25] J.E. Hannay, E. Arisholm, H. Engvik, and D.I.K. Sjøberg. Personality and pair programming. To appear in IEEE Transactions on Software Engineering, 2009.
[26] J.E. Hannay, T. Dybå, E. Arisholm, and D.I.K. Sjøberg. The effectiveness of pair programming: A meta-analysis. To appear in Information & Software Technology, 2009.
[27] K. Hogan, B.K. Nastasi, and M. Pressley. Discourse patterns and collaborative scientific reasoning in peer and teacher-guided discussions. Cognition and Instruction, 17(4):379–432, 2000.
[28] International Personality Item Pool. A scientific collaboratory for the development of advanced measures of personality traits and other individual differences, 2007.
[29] J.S. Karn and A.J. Cowling. A study of the effect of personality on the performance of software engineering teams. In Proc. Fourth Int’l Symp. Empirical Software Engineering (ISESE’05), pages 417–427. ACM, 2005.
[30] J.S. Karn and A.J. Cowling. A follow up study of the effect of personality on the performance of software engineering teams. In Proc. Fifth Int’l Symp. Empirical Software Engineering (ISESE’06), pages 232–241. ACM, 2006.
[31] N. Katira, L. Williams, E. Wiebe, C. Miller, S. Balik, and E. Gehringer. On understanding compatibility of student pair programmers. In Proc. 35th Technical Symp. Computer Science Education (SIGCSE’04), pages 7–11. ACM, 2004.
[32] K. Krippendorff. Content Analysis: An Introduction to its Methodology. Sage, second edition, 2004.
[33] L. Layman. Changing students’ perceptions: An analysis of the supplementary benefits of collaborative software development. In Proc. 19th Conf. Software Engineering Education and Training (CSEET’06). IEEE Computer Society, 2006.
[34] T. Lethbridge, S.E. Sim, and J. Singer. Studying software engineers: Data collection techniques for software field studies. Empirical Software Engineering, 10:311–341, 2005.
[35] L.L. Levesque, J.M. Wilson, and D.R. Wholey. Cognitive divergence and shared mental models in software development project teams. J. Organizational Behavior, 22:135–144, 2001.
[36] J.E. Mathieu, G.F. Goodwin, T.S. Heffner, E. Salas, and J.A. Cannon-Bowers. The influence of shared mental models on team process and performance. J. Applied Psychology, 85(2):273–283, 2000.
[37] P. Mayring. Qualitative content analysis. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research [On-line Journal], 1(2), June 2000. Available at: http://www.qualitative-research.net/fqs-texte/2-00/2-00mayring-e.htm.
[38] M.A.G. Peeters, H.F.J.M. van Tuijl, C.G. Rutte, and I.M.M.J. Reymen. Personality and team performance: A meta-analysis. European J. of Personality, 20:377–396, 2006.
[39] L.A. Pervin and O.P. John. Personality: Theory and Research. John Wiley & Sons, Inc., seventh edition, 1997.
[40] C. Robson. Real World Research. Blackwell Publishing, second edition, 2002.
[41] P. Sfetsos, I. Stamelos, L. Angelis, and I. Deligiannis. Investigating the impact of personality types on communication and collaboration-viability in pair programming—an empirical study. In Proc. Seventh Int’l Conf. Extreme Programming and Agile Processes in Software Engineering (XP 2006), volume 4044 of Lecture Notes in Computer Science, pages 43–52. Springer-Verlag, 2006.
[42] L. Thomas, M. Ratcliffe, and A. Robertson. Code warriors and code-a-phobes: A study in attitude and pair programming. In Proc. 34th Technical Symp. Computer Science Education (SIGCSE’03). ACM, 2003.
[43] A.E.M. Van Vianen and C.K.W. De Dreu. Personality in teams: Its relations to social cohesion, task cohesion, and team performance. European J. Work and Organizational Psychology, 10:97–120, 2001.
[44] K. Visram. Extreme programming: Pair-programmers, team players or future leaders? In Proc. Eighth IASTED Int’l Conf. Software Engineering and Applications, pages 659–664. Acta Press, 2004.
[45] A. von Mayrhauser and S. Lang. A coding scheme to support systematic analysis of software comprehension. IEEE Trans. Software Eng., 25(4):526–540, July/Aug. 1999.
[46] A. von Mayrhauser and A.M. Vans. Industrial experience with an integrated code comprehension model. Software Eng. J., pages 171–182, Sept. 1995.
[47] L. Williams and R.R. Kessler. Pair Programming Illuminated. Addison-Wesley, 2002.
[48] L. Williams, R.R. Kessler, W. Cunningham, and R. Jeffries. Strengthening the case for pair programming. IEEE Software, 17(4):19–25, 2000.
[49] L. Williams, L. Layman, J. Osborne, and N. Katira. Examining the compatibility of student pair programmers. In Proc. AGILE 2006. IEEE Computer Society, 2006.