Using Psychometric Techniques To Improve The Balance Evaluation Systems Test: The Mini-Bestest
ORIGINAL REPORT
Franco Franchignoni, MD1, Fay Horak, PT, PhD2, Marco Godi, PT3, Antonio Nardone, MD3,4
and Andrea Giordano, PhD5
From the 1Unit of Occupational Rehabilitation and Ergonomics, Salvatore Maugeri Foundation, Clinica del Lavoro e
della Riabilitazione, IRCCS, Veruno, Italy, 2Department of Neurology, Oregon Health and Science University,
Portland, OR, USA, 3Department of Physical Medicine and Rehabilitation, Salvatore Maugeri Foundation, Clinica del
Lavoro e della Riabilitazione, IRCCS, Veruno, 4Department of Clinical and Experimental Medicine, University of Eastern
Piedmont, Novara and 5Unit of Bioengineering – Salvatore Maugeri Foundation, Clinica del Lavoro e della
Riabilitazione, IRCCS, Veruno, Italy
Objective: To improve, with the aid of psychometric analysis, the Balance Evaluation Systems Test (BESTest), a tool designed to analyse several postural control systems that may contribute to poor functional balance in adults.
Methods: Performance of the BESTest was examined in a convenience sample of 115 consecutive adult patients with diverse neurological diagnoses and disease severity, referred to rehabilitation for balance disorders. Factor (both exploratory and confirmatory) and Rasch analysis were used to process the data in order to produce a new, reduced and coherent balance measurement tool.
Results: Factor analysis selected 24 out of the 36 original BESTest items likely to represent the unidimensional construct of “dynamic balance”. Rasch analysis was then used to: (i) improve the rating categories, and (ii) delete 10 items (misfitting or showing local dependency). The model consisting of the remaining 14 tasks was verified with confirmatory factor analysis to meet the stringent requirements of modern measurement.
Conclusion: The new 14-item scale (dubbed mini-BESTest) focuses on dynamic balance, can be conducted in 10–15 min, and contains items belonging evenly to 4 of the 6 sections from the original BESTest. Further studies are needed to confirm the usefulness of the mini-BESTest in clinical settings.
Key words: postural balance, outcome assessment, psychometrics.
J Rehabil Med 2010; 42: 323–331
Correspondence address: Franco Franchignoni, Fondazione Salvatore Maugeri, Clinica del Lavoro e della Riabilitazione, IRCCS, Via Revislate 13, IT-28010 Veruno, Italy. E-mail: [email protected]
Submitted August 25, 2009; accepted January 5, 2010
© 2010 The Authors. doi: 10.2340/16501977-0537
Journal Compilation © 2010 Foundation of Rehabilitation Information. ISSN 1650-1977
INTRODUCTION
Assessment of balance and mobility in clinical settings can help to determine both risk of falling (1) and the most suitable measures to reduce postural instability (2–3). Laboratory studies have shown that postural control embraces different subdomains, including stability during quiet stance, postural reactions to external disturbances, anticipatory postural adjustments to perturbations caused by self-initiated movements (e.g. lifting an object), and dynamic balance during gait (4). However, until recently clinical balance tests did not systematically evaluate all these subdomains (5–6).
Recently, a new clinical tool for assessing subdomains underlying balance deficits has been presented: the Balance Evaluation Systems Test (BESTest) (7). The BESTest is a comprehensive balance assessment tool developed to identify the postural control systems underlying poor functional balance, so that treatments can be targeted to the specific balance deficit. Since the BESTest encompasses 4–6 items for each of 6 different balance domains, it takes approximately 35 min to administer, compared with only approximately 15 min for other balance scales (e.g. the Berg Balance Scale; BBS) (8). This is an important shortcoming of the BESTest, limiting its routine use. On the other hand, the main disadvantage of other popular balance scales, including the BBS, is that they do not include important aspects of dynamic balance control, such as the capability to react to postural perturbations, to stand on a compliant or inclined surface, or to walk while performing a cognitive task. All of these features of balance control are known to be important in assessing balance disorders in different types of patients, and reflect balance challenges during activities of daily living (5, 7, 9). Therefore, there is a need for a comprehensive balance assessment tool that can be administered in a short time period.
In developing and validating new clinical instruments there is a growing trend of using Rasch analysis (10). Whereas traditional psychometric approaches focus on an instrument’s total score, item response theory (IRT) models, such as the Rasch measurement models, are founded on the probability that a person will make a particular response according to their level of the underlying latent variable. In this framework, it is possible to evaluate how well an item performs in terms of its relevance or contribution for measuring the underlying construct, the level of the underlying construct targeted by the question, the possible redundancy of the item relative to other items in the scale, and the appropriateness of the response categories (11). For these reasons, Rasch analysis has been recommended as a
complementary method to assess the scaling properties of new clinical instruments, in addition to the traditional psychometric criteria for disability outcomes research (12).
The purpose of this study was to use both classical psychometric techniques and Rasch analysis to evaluate the BESTest, investigating a wide range of measurement requirements (e.g. dimensionality, quality of the rating categories, construct validity, reliability indexes) in order to improve the structure and measurement qualities of the test. Based on this analysis, we present a new, mini-BESTest that focuses on dynamic balance and can be conducted in 10–15 min.
MATERIAL AND METHODS
Patients
A total of 115 patients (53 men and 62 women), mean age 62.7 years (standard deviation (SD) 16), were studied. They represent a convenience sample of patients with balance disorders, recruited with a consecutive sampling method. Patient diagnosis was as follows: 22 hemiparesis (12 right, 10 left), 21 Parkinson’s disease, 15 neuromuscular diseases, 14 hereditary ataxia, 11 multiple sclerosis, 10 unspecific age-related balance disorders, 7 peripheral vestibular disorders, 6 traumatic brain injury, 4 diffuse encephalopathy, 3 cervical myelopathy, and 2 central nervous system (CNS) neoplasm. All subjects were inpatients referred to the Scientific Institute of Veruno for rehabilitation assessment and treatment. Inclusion criteria were: ability to walk with or without a cane; absence of severe cognitive or communication impairments; ability to tolerate the balance tasks without fatigue. Prior to taking part in the study, all participants signed an informed consent that had been approved by the central ethics committee of the “Salvatore Maugeri” Foundation.
Instrument and procedure
The BESTest (7) contains 6 subscales, covering a broad spectrum of performance tasks: (i) biomechanical constraints, (ii) stability limits, (iii) transitions and anticipatory postural adjustments, (iv) postural responses to perturbation, (v) sensory orientation while standing on a compliant or inclined base of support, and (vi) dynamic stability in gait with and without a cognitive task (Table I). The BESTest consists of 27 items, some of which are subdivided into 2–4 sub-items (e.g. for left and right sides) for a total of 36 tasks. Each item is scored on a 4-category ordinal scale from 0 (worst performance) to 3 (best performance). Specific patient and rating instructions, and stopwatch and ruler values are used to improve reliability (see www.bestest.us). Patients were rated by a physical therapist (MG) with 4 years of practice experience in balance assessment, who participated in a one-week training course on the BESTest, at the Balance Disorders Laboratory, Oregon Health & Science University.
Statistical analysis
Unidimensionality, i.e. whether items are measuring a single underlying dimension or several separate dimensions, is one of the key requisites for test analysis and must be verified before applying Rasch models (13). To test the dimensionality of the BESTest, we performed the following statistical steps.
1. A confirmatory factor analysis for categorical data (CFA, LISREL 8.80 software, Scientific Software International, Inc., Lincolnwood, IL 60712, USA) was performed to evaluate the fit of the scale to a unidimensional model. The extent to which the model can be used to reproduce the sample data was determined by examining the following indexes: the non-normed fit index (NNFI, or Tucker-Lewis index), the comparative fit index (CFI), the root mean square error of approximation (RMSEA) and the standardized root mean square residual (SRMR). NNFI and CFI scores range from 0 to 1 with higher values indicating better fit: values greater than 0.95 are indicative of an acceptable model fit. An RMSEA value lower than 0.08 reflects an adequate fit and an RMSEA value equal to or less than 0.05–0.06 suggests a good fit. An SRMR value between 0.05 and 0.10 is reflective of an acceptable fit (14–15).
2. In the event of a poor fit (i.e. multidimensionality is suspected) the following statistical steps were performed sequentially:
a. Horn’s parallel analysis (16) was used to estimate the number of meaningful dimensions in the response matrix: the size of eigenvalues obtained from principal component analysis (PCA) was compared with those obtained from a randomly generated data set of the same size and number of variables. Only factors with eigenvalues exceeding the values obtained from the corresponding random dataset were retained for further investigation. Parallel analysis was conducted using ViSta (17) Parallel Analysis plugin (http://www.mdp.edu.ar/psicologia/vista/).
b. Exploratory factor analysis (EFA, STATA 10.1 software, StataCorp LP, College Station, TX 77845, USA) was performed with a principal factor analysis using the number of factors suggested
Table I. Summary of BESTest items and subsystem categories. The 14 items forming the mini-BESTest for dynamic balance are marked with an asterisk (*). Only the worst performance in items 11 “Stand on one leg” and 18 “Lateral stepping” has to be taken into account for the score. Moreover, the performance in item 27 “Cognitive Get Up and Go” must be compared with that in the baseline item 26.

I Biomechanical constraints
  1. Base of Support
  2. Alignment
  3. Ankle Strength
  4. Hip Strength
  5. Sit on Floor and Stand Up
II Stability limits
  6. a. Lateral Lean L
     b. Lateral Lean R
     c. Sitting Verticality L
     d. Sitting Verticality R
  7. Reach Forward
  8. a. Reach L
     b. Reach R
III Anticipatory-transitions
  9. Sit to Stand *
  10. Rise to Toes *
  11. Stand on One Leg (both right and left) *
  12. Alternate Stair Touch
  13. Standing Arm Raise
IV Postural responses
  14. In-place forward
  15. In-place backward
  16. Stepping forward *
  17. Stepping backward *
  18. Lateral stepping (both right and left) *
V Sensory orientation
  19. a. Stance EO (firm surface) *
      b. Stance EC (firm surface)
      c. Foam EO
      d. Foam EC *
  20. Incline EC *
VI Dynamic gait
  21. Gait Natural
  22. Change Speed *
  23. Head Turns *
  24. Pivot Turns *
  25. Obstacles *
  26. Get Up and Go
  27. Cognitive Get Up and Go *

L: left; R: right; EO: eyes open; EC: eyes closed.
by the parallel analysis. After varimax rotation, the relationships between the test items and retained factors were taken into account. For a solution that is stable and approximates the population pattern, given the sample size, only items with loading > 0.50 were considered as correlated to the factors (18).
c. Item exclusion, based on the EFA results and expert review, was performed leading to a preliminary reduced set of test items.
Following the above analysis and item exclusion, the matrix of item responses of the 24 retained items for each subject underwent Rasch analysis using WINSTEPS software (Linacre JM, WINSTEPS Rasch measurement computer program, version 3.68. Chicago: Winsteps.com; 2009) (19).
As a first step, we investigated whether the rating scale of each BESTest item was used in the expected manner. We evaluated the rating scale categories (partial credit model) using criteria suggested by Linacre (20, 21): (i) at least 10 observations per response option; (ii) even distribution of category use; (iii) monotonic increase in both average measures of persons with a given score/category and thresholds (thresholds, or step calibrations, are the ability levels at which the response to either of 2 adjacent categories is equally likely); (iv) category outfit mean square (MnSq) values less than 2 (see below); and (v) threshold differences larger than 1.4 and lower than 5 logits. We collapsed categories following these guidelines, and compared different collapsing solutions, examining not only the category diagnostics, but also reliability indices. We were guided by the intention to select a solution that maximized statistical indices and clinical meaningfulness.
After this rating scale modification, a new Rasch analysis was performed, including PCA on the standardized residuals to evaluate: (i) the presence of sub-dimensions, as an independent confirmation of the unidimensionality of the scale, and (ii) the local independence of items.
1. “Unidimensionality” assumes that, after removal of the trait that the scale intended to measure (the “Rasch factor”), the residuals will be uncorrelated and normally distributed (i.e. there are no principal components) (19). The following criteria were used to determine whether additional factors were likely to be present in the residuals: (i) a cutoff of 50% of the variance explained by the Rasch factor; and (ii) eigenvalue of the first residual factor smaller than 3 (19).
2. “Local independence” between items indicates that they neither duplicate some feature of each other nor both incorporate some shared dimension. Item couples with a standardized residual correlation > 0.30 were considered as possibly dependent components (22).
Based on examination of the respective item information functions and expert judgement, we progressively eliminated all dependencies, either removing one of the items, or, in the case of dependent items that were related to the same task performed in different directions (e.g. scores assessing right and left sides), collapsing the items into a new one reporting only the worst performance.
Internal validity of the scale was assessed by evaluating the fit of individual test items to determine if the pattern of item difficulty was consistent with the model predictions. We estimated the goodness-of-fit of the observed data to data predicted by the Rasch model (23, 24). Information-weighted (infit) and outlier-sensitive (outfit) mean-square statistics (MnSq) for each item were calculated to test whether there were items that did not fit the model expectancies. Both of these fit statistics are expected to approach 1 if the data fit the model. In accordance with the literature (10), we considered MnSq > 0.7 and < 1.3 as an indicator of acceptable fit.
We also estimated the level of difficulty of each item (“item difficulty”) and the ability of each individual subject, and then we examined the data for floor and ceiling effects. Item difficulty and subject ability are expressed, on a common interval scale, in logit units, a logit being the natural logarithm of the ratio (odds) of mutually exclusive alternatives (e.g. pass vs fail, or higher vs lower response option) (23, 24). Logit-transformed measures represent linear measures. By convention, 0 logit was ascribed to the mean item difficulty. For Rasch analysis, a sample size of more than 100 persons will estimate item difficulty with an alpha of 0.05 within ± 0.5 logits (25).
Reliability was evaluated in terms of “separation” across test items, defined as the ratio of the true spread of the measures to their measurement error (23, 24). Two indexes were calculated: the item separation index and the person separation index, which give an estimate (in standard error units) of the spread or “separation” of items and persons along the measurement construct, respectively. A separation of 2.0 is considered good (24). Related indexes are the reliability of the item separation index and of the person separation index. These provide the degree of confidence that can be placed in the consistency of the estimates. This confidence ranges from 0 to 1, and coefficients > 0.80 and > 0.90 are considered respectively good and excellent (23).
RESULTS
The confirmatory factor analysis (CFA) gave, using all the items in the BESTest, an inadequate fit (NNFI = 0.91, CFI = 0.91, RMSEA = 0.12, SRMR = 0.15). Horn’s Parallel Analysis (PA) revealed 3 factors with empirical eigenvalues exceeding those from the random data. These 3 factors explained 43%, 11% and 8% of the variance, respectively. To investigate the contribution of each item to the scale, we tested the 3-factor model suggested by PA using exploratory factor analysis for ordinal data (EFA) with a principal axis factor extraction method. After varimax rotation, 24 items loaded > 0.50 in the first factor, 4 items (6 a–d) in the second factor, and 3 items (7, 8a and 8b) in the third factor, while items 1–4 and 13 failed to load meaningfully in any factor.
Taking into account these results and expert opinion, 12 items (1–4, 6a–d, 7, 8a–b, 13) were deemed as not belonging to the main trait and therefore were dropped from subsequent analyses. The expert review judged the remaining 24 items to potentially measure a factor likely to represent “dynamic balance” in a variety of functional conditions. These 24 items underwent Rasch analysis.
Rating scale diagnostics showed that the 0–3 level rating categories did not comply with our pre-set criteria for category function. The model best meeting the criteria reduced the rating scale from 4 to 3 levels by combining categories 0 (absent) and 1 (mild) or 1 (mild) and 2 (moderate) (Table II), with different collapsing strategies used across items.
After combining these rating scale categories, 22 out of the 24 items fitted the underlying construct of dynamic balance that the scale was intended to measure (infit and outfit MnSq between 0.7 and 1.3). Item 5 “Sit on floor and stand up” was underfitting (i.e. with unexpectedly high variability) and item 26 “Get up and go” was overfitting (i.e. with an overly predictable pattern), so they were eliminated.
The PCA of standardized residuals showed several high (> 0.30) residual correlations between items.
Based on examination of the respective item information functions and expert judgment, all misfitting items and residual correlations > 0.30 were eliminated one by one, and the Rasch analysis was re-run. Correlated (redundant) items were removed either by deleting one of them, or by maintaining only the worst performance in items 11 and 18, which assessed the same task on both right and left side. At the end of these iterations, only 14 test items remained. This set of items (called the mini-Balance Evaluation Systems Test of dynamic balance; mini-BESTest) (see Table I) underwent further analyses.
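The category collapsing and the worst-side rule described above can be illustrated with a minimal Python sketch. The function names `collapse_score` and `worst_side` and the example patient are hypothetical and introduced only for illustration; the recoding strings (e.g. 0012, 0112) follow the "Rating scale" column of Table II.

```python
def collapse_score(raw_score, recode):
    """Map an original 0-3 BESTest score to its collapsed 0-2 category.

    `recode` is a 4-character string such as "0012" or "0112" (Table II):
    the character at position k is the new category for raw score k.
    """
    if len(recode) != 4 or not recode.isdigit():
        raise ValueError("recode must be a 4-digit string such as '0012'")
    if raw_score not in (0, 1, 2, 3):
        raise ValueError("raw_score must be an integer from 0 to 3")
    return int(recode[raw_score])


def worst_side(left_score, right_score):
    """Keep only the worse (lower) of the left and right scores (items 11 and 18)."""
    return min(left_score, right_score)


# Item 23 "Head turns" uses 0012: raw scores 0 and 1 share the lowest category.
head_turns = [collapse_score(s, "0012") for s in range(4)]
print(head_turns)  # [0, 0, 1, 2]

# Item 11 "Stand on one leg" uses 0112: raw scores 1 and 2 share the middle category.
one_leg = [collapse_score(s, "0112") for s in range(4)]
print(one_leg)  # [0, 1, 1, 2]

# Hypothetical patient scoring 3 on the left leg and 1 on the right leg.
print(collapse_score(worst_side(3, 1), "0112"))  # 1
```

Because each recoding is monotonic, taking the worst side before or after collapsing gives the same result.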
Table II. Mean difficulty estimates for each of the 14 items of the mini-BESTest with standard errors (SE) and infit and outfit mean-square statistics (MnSq). The more difficult the item estimate, the less likely it is for any subject to gain a high score. Alongside each item is its number in the original BESTest (see Table I). The rating scale column shows how the 4 scaling categories were collapsed into 3 categories, e.g. 0012 means that categories 0 and 1 have been collapsed and then the remaining 3 categories have been re-numbered accordingly. L: left; R: right; EO: eyes open; EC: eyes closed.

Item                                             Mean difficulty   SE     Infit MnSq   Outfit MnSq   Rating scale
11 a/b – Stand on L/R leg                         2.43             0.25   0.90         1.07          0112
18 a/b – Postural Stepping L/R                    1.10             0.22   0.84         0.76          0112
23 – Head turns                                   1.00             0.19   0.91         0.83          0012
17 – Postural Stepping backward                   0.93             0.22   0.97         1.08          0112
27 – Cognitive “Get Up and Go” with dual task     0.77             0.24   1.07         1.08          0112
10 – Rise to toes                                 0.65             0.20   0.94         1.11          0012
19 d – Foam Surface EC                            0.54             0.20   1.04         1.12          0112
25 – Obstacles                                    0.10             0.21   0.75         0.73          0112
16 – Postural Stepping forward                   –0.03             0.21   1.14         1.23          0112
20 – Incline EC                                  –0.64             0.21   1.12         1.00          0112
24 – Pivot turns                                 –0.85             0.21   0.99         1.32          0112
22 – Change speed                                –1.00             0.20   0.89         0.78          0112
9 – Sit to stand                                 –1.78             0.24   1.30         1.32          0012
19 a – Stance EO                                 –2.51             0.39   1.12         0.66          0012
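To give a feel for what differences on the logit scale mean, the sketch below converts the gap between a person's ability and an item's difficulty into a probability. It uses the dichotomous Rasch model as a deliberate simplification: the study itself used the partial credit model with 3 categories per item, so these numbers are illustrative only. The function name `success_probability` is hypothetical; the difficulties are taken from Table II and the ability of +0.15 logits is the sample mean reported in the Results.

```python
import math


def success_probability(ability, difficulty):
    """Dichotomous Rasch model: P(success) from ability and difficulty in logits.

    The log-odds of success equal (ability - difficulty), so the probability
    is the logistic function of that difference.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


sample_mean_ability = 0.15  # average person measure reported in the Results

# Mean item difficulties from Table II (hardest, mid-range and easiest item).
items = {
    "11 Stand on one leg": 2.43,
    "25 Obstacles": 0.10,
    "19a Stance EO": -2.51,
}

probabilities = {
    name: success_probability(sample_mean_ability, difficulty)
    for name, difficulty in items.items()
}
for name, p in probabilities.items():
    print(f"{name}: {p:.2f}")
```

Under this simplification, a person of average ability is unlikely to succeed on the hardest item, has roughly even odds on a mid-range item, and is very likely to succeed on the easiest one.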
All of the final 14 items showed good infit and outfit MnSq
values (Table II). The variance explained by the estimated
Rasch measures was 58.8%, whereas only 5.3% of the vari-
ance was explained by the first residual factor (eigenvalue
1.8). Regarding the hierarchic ordering of items, Figs 1 and 2
show, according to the Rasch model, the distribution of subject
ability and item difficulty. Item difficulty showed a fairly even
spread (from the easiest item “Stand with eyes open on a
firm surface” to the most difficult item “Stand on one leg”),
and subject ability presented a normal distribution spanning
from –5 to +4.9 logits, with an average measure = +0.15 (mean
SE 0.59). Only 2 subjects showed extreme maximum scores:
the precision of their ability estimates was quite low, the SE
being approximately 30% of the corresponding measure. No
floor effect was found. Overall, these findings demonstrate an
adequate sample-item distribution. The item difficulty esti-
mates spanned from –4 to +2.5 logits. The reliability indices
of mini-BESTest were as follows: item separation index = 7.35
and item separation reliability = 0.98; Person separation in-
dex = 2.50 and Person separation reliability = 0.86.
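The separation indexes and their reliabilities reported above are linked by a standard Rasch relationship, R = G² / (1 + G²), where G is the separation index. The paper does not state this formula, so the sketch below is offered as an assumption about how the two kinds of index relate; the function name `separation_to_reliability` is hypothetical.

```python
def separation_to_reliability(separation):
    """Convert a Rasch separation index G into a reliability coefficient.

    Uses the standard relationship R = G^2 / (1 + G^2).
    """
    return separation ** 2 / (1.0 + separation ** 2)


item_reliability = separation_to_reliability(7.35)    # item separation index
person_reliability = separation_to_reliability(2.50)  # person separation index

print(round(item_reliability, 2))    # 0.98
print(round(person_reliability, 2))  # 0.86
```

Both values agree with the reliabilities reported for the mini-BESTest (0.98 and 0.86).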
A final CFA confirmed the unidimensionality of the mini-
BESTest, supporting the unidimensional model with the fol-
lowing indexes: NNFI = 0.98, CFI = 0.99, RMSEA = 0.064,
and SRMR = 0.098.
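As a worked check, the fit indexes can be compared with the acceptability thresholds given in the Statistical analysis section (NNFI and CFI > 0.95, RMSEA < 0.08, SRMR < 0.10). The helper `cfa_fit_checks` is a hypothetical name used only for this sketch, which covers the "acceptable" thresholds and not the stricter "good fit" RMSEA criterion.

```python
def cfa_fit_checks(nnfi, cfi, rmsea, srmr):
    """Check each CFA fit index against the acceptability thresholds in the Methods."""
    return {
        "NNFI > 0.95": nnfi > 0.95,
        "CFI > 0.95": cfi > 0.95,
        "RMSEA < 0.08": rmsea < 0.08,
        "SRMR < 0.10": srmr < 0.10,
    }


# Full BESTest (all items) versus the 14-item mini-BESTest, as reported in the Results.
full_bestest = cfa_fit_checks(nnfi=0.91, cfi=0.91, rmsea=0.12, srmr=0.15)
mini_bestest = cfa_fit_checks(nnfi=0.98, cfi=0.99, rmsea=0.064, srmr=0.098)

print(all(full_bestest.values()))  # False
print(all(mini_bestest.values()))  # True
```

The full BESTest misses every threshold, whereas the mini-BESTest meets all four, with SRMR only just under its limit.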
The final version of the mini-BESTest is shown in Ap-
pendix I.
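For readers unfamiliar with Horn's parallel analysis, the dimensionality step described in the Methods, the following is a minimal NumPy sketch. It is not the ViSta plugin used in the study: the use of Pearson correlations, normally distributed random data, the 95th percentile as the comparison value, and the synthetic one-factor data set are all assumptions made for illustration.

```python
import numpy as np


def parallel_analysis(data, n_simulations=200, percentile=95, seed=0):
    """Horn's parallel analysis (rows = persons, columns = items).

    Returns the number of leading components whose observed eigenvalue
    exceeds the chosen percentile of eigenvalues from random data of the
    same shape, plus the observed eigenvalues and the thresholds.
    """
    rng = np.random.default_rng(seed)
    n_persons, n_items = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    random_eigs = np.empty((n_simulations, n_items))
    for i in range(n_simulations):
        noise = rng.standard_normal((n_persons, n_items))
        eigs = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        random_eigs[i] = np.sort(eigs)[::-1]
    thresholds = np.percentile(random_eigs, percentile, axis=0)

    n_retained = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break  # stop at the first component that does not beat random data
        n_retained += 1
    return n_retained, observed, thresholds


# Synthetic example: 115 "persons", 10 "items" driven by one common factor.
demo_rng = np.random.default_rng(1)
factor = demo_rng.standard_normal((115, 1))
demo_data = 0.8 * factor + 0.6 * demo_rng.standard_normal((115, 10))
n_retained_demo, _, _ = parallel_analysis(demo_data)
print(n_retained_demo)
```

With a single strong common factor in the synthetic data, only the first component should exceed its random-data threshold.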
Fig. 1. Subject-ability and item-difficulty maps of the mini-BESTest (n = 115). In both maps, the vertical line represents the measure of the variable, in linear logit units. The left-hand column locates each patient’s ability, from best to worst dynamic balance. The right-hand column locates each item’s relative difficulty for this sample (for each item, the difficulty estimate represents the mean calibration of the threshold parameters according to the partial credit model). From bottom to top, measures indicate better balance for patients and higher difficulty for items. By convention, the average difficulty of items in the test is set at 0 logits (and indicated with M’) and patients with average ability are located at M. L: left; R: right; EO: eyes open; EC: eyes closed.
DISCUSSION
The original BESTest is composed of a comprehensive battery of 36 balance tasks, developed to analyse 6 different postural control systems that may contribute to poor functional balance in adults of any age (7). Thus, it is not surprising that this test failed to meet a unidimensionality assumption (i.e. that a single dimension underlies all item responses), when applied to 115 patients with a wide range of diagnoses and severity of disease.
Our dimensionality assessment extracted from the test battery 24 items assumed to define “dynamic balance”. On these
items we performed an analysis of category and item properties using Rasch psychometric methods, which led to the definition of the 14 most psychometrically useful and practical items: the refined mini-BESTest measures the unidimensional construct of “dynamic balance” without redundant items or significant ceiling/floor effects (26) and takes 10–15 min to administer.
The rating scale diagnostics (21) performed on the 24 items retained after EFA showed that the original 4 levels were redundant (23). This finding was expected, since some BESTest items were borrowed (with modifications) from the BBS and the Dynamic Gait Index. These 2 well-known balance and mobility scales have been shown to include sub-optimal category functioning (27, 28) when strict diagnostic criteria are applied (20). In addition, it has already been demonstrated that the BBS (and other balance scales) show essentially identical psychometric properties, including responsiveness, when used with a 3-category, instead of a 4- or 5-category rating scale (29). Appropriate combination of levels 0–1 or 1–2 eliminated underutilized rating categories, and ensured that each rating category was distinct from the others in representing a distinct balance ability.
After collapsing the categories to 3 distinct levels, the data from the 24-item set were reanalysed to calculate fit statistics and the PCA of the residuals. This analysis enabled us to eliminate 10 misfitting or redundant items without loss of measurement information and with the great advantage of improving test acceptability and feasibility. For the remaining 14 items (the mini-BESTest), we calculated fit statistics, extracted Rasch-modelled parameters of ability and difficulty, and then examined internal validity and test reliability. The average ability of this group of patients was very similar to the mean value of 0 logits (+0.15): this means that the test is well targeted to the sample. Moreover, the person-ability and item-difficulty mapped logit scale showed a broad range for both person-ability and item-difficulty (Fig. 1). The 1.7% of subjects (2/115) with extreme maximum scores, the 2 “×” at the top of the left-hand column in Fig. 1, constituted a minor trend toward a ceiling effect in very highly functioning subjects. No floor effect was found. However, one should interpret the extreme results with caution, since these person measures have the least precision due to the larger errors of measurement. On the other hand, the high item separation reliability indicates that great confidence can be placed in the consistency of item difficulty estimates across future samples.
Content validity of the dynamic mini-BESTest is high, since many items included in the test are part of well-known balance batteries: (i) “Sit to stand” is from the Berg Balance Scale (30) and the Performance-Oriented Mobility Assessment (31); (ii) “Stand on one leg” is from the Ataxia Test Battery (32) and the Berg Balance Scale; (iii) “Stance – eyes open” and “Stance on foam – eyes closed” are from the modified Clinical Test of Sensory Integration of Balance (33, 34); (iv) Gait when balance is challenged by changing speed, head rotations, pivot turns, or stepping over obstacles comes from the Dynamic Gait Index (35); (v) the “Get Up and Go” test (36) and the “Get Up and Go with a simultaneous cognitive task” (37) are stand-alone tests. In the BESTest, Horak et al. (7) made only minor modifications to some of the above original items, in order to increase their challenge and improve their consistency and reliability. Novel items in the mini-BESTest have been adapted from laboratory tests where they were shown to distinguish different types of balance disorders: (i) postural reactions to external perturbations (38); (ii) rise to toes (39); and (iii) stance on an inclined surface with eyes closed (40).
As an additional demonstration of the internal construct validity of the scale, the general hierarchical arrangement found by Rasch analysis (Table II) is consistent with clinical expectations. For example, the maintenance of feet-together stance, eyes open on a firm surface (“Stance EO”) is the easiest task and “Stand on one leg” the most difficult item (28). In fact, “Stance EO” makes few sensory demands and requires low effort, whereas “Stand on one leg” is very challenging because of the narrow base of support and musculoskeletal demands. In addition, the results of Rasch analysis of the mini-BESTest show a hierarchical order of item difficulty: “Gait with horizontal head turns”, “Stand on one leg”, and “Lateral stepping responses” were the most difficult items, whereas “Stance EO” and “Sit to Stand” were the easiest items. The high difficulty of the item “Gait with horizontal head turns” may be attributed to vestibular influences (35) and is in line with the results of the two Rasch studies on the Dynamic Gait Index (28, 41).
The mini-BESTest contains 14 items belonging evenly to 4 of the 6 sections from the original BESTest (Table I): section III “Anticipatory Postural Adjustments” (sit to stand, rise to toes, stand on 1 leg); section IV “Postural Responses” (stepping in 4 different directions); section V “Sensory Orientation” (stance – eyes open; foam surface – eyes closed; incline – eyes closed); and section VI “Balance during Gait” (gait during change speed, head turns, pivot turns, obstacles; cognitive “Get Up and Go” with dual task).
Our factor analysis procedure (42) isolated a number of items, primarily in the first 2 sections of the BESTest, that did not contribute to the dominant trait (dynamic balance), suggesting that parts I “Biomechanical constraints” and II “Stability limits” of the BESTest warrant separate psychometric studies. Biomechanical constraints (such as orthopaedic limitations on the base of foot support, postural alignment and strength) and stability limits (ability to lean to perceived limits of stability and perception of verticality) are also important facets of postural control, but appear to be independent of the construct “dynamic balance”.
This study has several limitations, which restrict the generalization of our results to different groups or settings, and raters. In particular, the selection criteria of our convenience sample (recruited with a consecutive sampling method) may represent a threat to external validity. Our sample was a cross-section of adults drawn from a single rehabilitation facility and with balance disorders of very different origins and severities. Moreover, we used only one rater, but to improve the reliability of results he participated in a one-week training course on the BESTest, held by one of its developers (FBH).
In conclusion, the new mini-BESTest offers a unique, brief clinical rating scale for dynamic balance that has excellent psychometric characteristics. The potential interest of the mini-BESTest in clinical settings is high, but further studies are needed. They should include: (i) analysis of the actual performance of the new 3-level response structure; and (ii) a study of differential item functioning, i.e. the stability of item hierarchy across sub-
based review): report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology 2008; 70: 473–479.
2. Gillespie LD, Robertson MC, Gillespie WJ, Lamb SE, Gates S, Cumming RG, et al. Interventions for preventing falls in older people living in the community. Cochrane Database Syst Rev 2009 Apr 15; 2: CD007146.
3. Horak FB. Clinical assessment of balance disorders. Gait Posture 1997; 6: 76–84.
4. Horak FB, Macpherson JM. Postural orientation and equilibrium. In: Shepard J, Rowell L, editors. Regulation and integration of multiple systems. Handbook of physiology: section 12, exercise. New York: Oxford University Press; 1996, p. 255–292.
5. Horak FB. Postural orientation and equilibrium: what do we need to know about neural control of balance to prevent falls? Age Ageing 2006; 35: ii7–ii11.
6. Pérennou D, Decavel P, Manckoundia P, Penven Y, Mourey F, Launay F, et al. Evaluation of balance in neurologic and geriatric disorders. Ann Readapt Med Phys 2005; 48: 317–335.
7. Horak FB, Wrisley DM, Frank J. The Balance Evaluation Systems Test (BESTest) to differentiate balance deficits. Phys Ther 2009; 89: 484–498.
8. Franchignoni F, Tesio L, Martino MT, Ricupero C. Reliability of four simple, quantitative tests of balance and mobility in healthy elderly females. Aging (Milano) 1998; 10: 26–31.
9. Woollacott M, Shumway-Cook A. Abnormal postural control. In: Motor control - theory and practical applications. Baltimore: Lippincott Williams & Wilkins; 2001, p. 248–270.
10. Tesio L. Measuring behaviours and perceptions: Rasch analysis as a tool for rehabilitation. J Rehabil Med 2003; 35: 105–115.
11. Reeve BB, Fayers P. Applying item response theory modeling for evaluating questionnaire item and scale properties. In: Fayers P, Hays RD, editors. Assessing quality of life in clinical trials: methods of practice. 2nd edn. Oxford, NY: Oxford University Press; 2005, p. 55–73.
12. Andresen EM. Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil 2000; 81 Suppl 2: S15–S20.
13. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. PROMIS Cooperative Group. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care 2007; 45 Suppl 1: S22–S31.
14. Schermelleh-Engel K, Moosbrugger H, Müller H. Evaluating the fit of structural equation models: tests of significance and descriptive goodness-of-fit measures. Methods Psychol Res Online 2003; 8: 23–72.
15. Cook KF, Teal CR, Bjorner JB, Cella D, Chang CH, Crane PK, et al. IRT health outcomes data analysis project: an overview and summary. Qual Life Res 2007; 16 Suppl 1: 121–132.
16. Horn JL. A rationale and test for the number of factors in factor analysis. Psychometrika 1965; 30: 179–185.
17. Young FW. ViSta: the Visual Statistics System, UNC L.L.
samples defined according to potentially relevant clinical criteria; Thurstone Psychometric Laboratory Research Memorandum
(iii) relation of the scores to fall risk and to other clinical tests of 94–1(c), 1996.
18. Guadagnoli E, Velicer WF. Relation of sample size to the stability
balance; and (iv) age-related normative values. of component patterns. Psychol Bull 1988; 103: 265–275.
19. Linacre JM. A user’s guide to WINSTEPS-MINISTEP: Rasch-
model computer programs. Program manual 3.68.0. Chicago, IL:
ACKNOWLEDGEMENT WINSTEPS.com; 2009 [cited 2009 September 24]. Available from:
URL: http://www.winsteps.com/a/winsteps.pdf.
Fay Horak was supported by a Grant from the National Institutes on 20. Linacre JM. Optimizing rating scale category effectiveness. J Appl
Aging AG-06457 (USA). Meas 2002; 3: 85–106.
21. Wolfe EW, Smith EV Jr. Instrument development tools and activi-
ties for measure validation using Rasch models: part II – validation
REFERENCES activities. J Appl Meas 2007; 8: 204–234.
22. Davidson M. Rasch analysis of 24-, 18- and 11-item versions of
1. Thurman DJ, Stevens JA, Rao JK. Practice parameter: assessing the Roland-Morris Disability Questionnaire. Qual Life Res 2009;
patients in a neurology practice for risk of falls (an evidence- 18: 473–481.
23. Bond TG, Fox CM. Applying the Rasch model: fundamental measurement in the human sciences. 2nd edn. Mahwah: Lawrence Erlbaum Associates; 2001.
24. Wright BD, Masters GN. Rating scale analysis. Chicago: Mesa Press; 1982.
25. Linacre JM. Sample size and item calibration stability. Rasch Meas Trans 1994; 7: 328.
26. Hobart JC, Lamping DL, Freeman JA, Langdon DW, McLellan DL, Greenwood RJ, et al. Evidence-based measurement: which disability scale for neurologic rehabilitation? Neurology 2001; 57: 639–644.
27. Kornetti DL, Fritz SL, Chiu YP, Light KE, Velozo CA. Rating scale analysis of the Berg Balance Scale. Arch Phys Med Rehabil 2004; 85: 1128–1135.
28. Chiu YP, Fritz SL, Light KE, Velozo CA. Use of item response analysis to investigate measurement properties and clinical validity of data for the dynamic gait index. Phys Ther 2006; 86: 778–787.
29. Wang CH, Hsueh IP, Sheu CF, Yao G, Hsieh CL. Psychometric properties of 2 simplified 3-level balance scales used for patients with stroke. Phys Ther 2004; 84: 430–438.
30. Berg KO, Wood-Dauphinee SL, Williams JI, Maki B. Measuring balance in the elderly: validation of an instrument. Can J Public Health 1992; 83 Suppl 2: S7–S11.
31. Tinetti ME, Richman D, Powell L. Falls efficacy as a measure of fear of falling. J Gerontol 1990; 45: P239–P243.
32. Graybiel A, Fregly AR. A new quantitative ataxia test battery. Acta Otolaryngol 1966; 6: 292–312.
33. Shumway-Cook A, Horak FB. Assessing the influence of sensory interaction of balance. Suggestion from the field. Phys Ther 1986; 66: 1548–1550.
34. Cohen H, Blatchly CA, Gombash LL. A study of the clinical test of sensory interaction and balance. Phys Ther 1993; 73: 346–351.
35. Whitney SL, Hudak MT, Marchetti GF. The dynamic gait index relates to self-reported fall history in individuals with vestibular dysfunction. J Vestib Res 2000; 10: 99–105.
36. Podsiadlo D, Richardson S. The timed “Up & Go”: a test of basic functional mobility for frail elderly persons. J Am Geriatr Soc 1991; 39: 142–148.
37. Shumway-Cook A, Brauer S, Woollacott M. Predicting the probability for falls in community-dwelling older adults using the Timed Up & Go Test. Phys Ther 2000; 80: 896–903.
38. Henry SM, Fung J, Horak FB. EMG responses to maintain stance during multidirectional surface translations. J Neurophysiol 1998; 80: 1939–1950.
39. Nardone A, Schieppati M. Postural adjustments associated with voluntary contraction of leg muscles in standing man. Exp Brain Res 1988; 69: 469–480.
40. Kluzik J, Horak FB, Peterka RJ. Differences in preferred reference frames for postural orientation shown by after-effects of stance on an inclined surface. Exp Brain Res 2005; 162: 474–489.
41. Marchetti GF, Whitney SL. Construction and validation of the 4-item dynamic gait index. Phys Ther 2006; 86: 1651–1660.
42. Coste J, Bouée S, Ecosse E, Leplège A, Pouchot J. Methodological issues in determining the dimensionality of composite health measures using principal component analysis: case illustration and suggestions for practice. Qual Life Res 2005; 14: 641–654.
INSTRUCTIONS
1. Sit to stand
Examiner Instructions: Note the initiation of the movement, and the use of hands on the arms of the chair or their thighs or thrusts arms forward.
Patient: Cross arms across your chest. Try not to use your hands unless you must. Do not let your legs lean against the back of the chair when you stand. Please stand up now.
2. Rise to toes
Examiner Instructions: Allow the patient to try it twice. Record the best score. (If you suspect that subject is using less than their full height, ask them to rise up while holding the examiners’ hands.) Make sure subjects look at a non-moving target 4–12 feet away.
Patient: Place your feet shoulder width apart. Place your hands on your hips. Try to rise as high as you can onto your toes. I will count out loud to 3 seconds. Try to hold this pose for at least 3 seconds. Look straight ahead. Rise now.
3. Stand on one leg
Examiner Instructions: Allow the patient two attempts and record the best. Record the number of seconds they can hold posture up to a maximum of 30 seconds. Stop timing when subject moves their hand off hips or puts a foot down. Make sure subjects look at a non-moving target 4–12 feet ahead. Repeat other side.
Patient: Look straight ahead. Keep your hands on your hips. Bend one leg behind you. Do not touch your raised leg on your other leg. Stay standing on one leg as long as you can. Look straight ahead. Lift now.
4. Compensatory stepping correction – forward
Examiner Instructions: Stand in front to the side of patient with one hand on each shoulder and ask them to push forward. (Make sure there is room for them to step forward.) Require them to lean until their shoulders and hips are in front of their toes. The test must elicit a step. NOTE: Be prepared to catch patient.
Patient: Stand with your feet shoulder width apart, arms at your sides. Lean forward against my hands beyond your forward limits. When I let go, do whatever is necessary, including taking a step, to avoid a fall.
5. Compensatory stepping correction – backward
Examiner Instructions: Stand in back to the side of the patient with one hand on each scapula and ask them to lean backward. (Make sure there is room for them to step backward.) Require them to lean until their shoulders and hips are in back of their heels. After you feel their body weight in your hands, very suddenly release your support. Test must elicit a step. NOTE: Be prepared to catch patient.
Patient: Stand with your feet shoulder width apart, arms down at your sides. Lean backward against my hands beyond your backward limits. When I let go, do whatever is necessary, including taking a step, to avoid a fall.
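For item 3 (Stand on one leg), the examiner's recording rule (best of the attempts, timed up to a maximum of 30 seconds, repeated on the other side) could be sketched as follows. This is a minimal illustration only, not part of the published instrument: the function names `best_hold_time` and `record_one_leg_stance` are hypothetical, and the sketch does not convert times into item scores, since the scoring criteria are not given in this section.

```python
from typing import Dict, Sequence

# Cap stated in the examiner instructions for item 3 (seconds).
MAX_HOLD_SECONDS = 30.0


def best_hold_time(attempts: Sequence[float], cap: float = MAX_HOLD_SECONDS) -> float:
    """Return the longest hold time across attempts, capped at `cap` seconds.

    Hypothetical helper: each attempt is the time (in seconds) until the
    subject moved a hand off the hips or put a foot down.
    """
    if not attempts:
        raise ValueError("at least one attempt is required")
    if any(t < 0 for t in attempts):
        raise ValueError("hold times must be non-negative")
    return max(min(t, cap) for t in attempts)


def record_one_leg_stance(left: Sequence[float], right: Sequence[float]) -> Dict[str, float]:
    """Record the best capped time for each side (the test is repeated on the other side)."""
    return {"left": best_hold_time(left), "right": best_hold_time(right)}
```

For example, `record_one_leg_stance([4.0, 7.5], [31.0, 2.0])` would record 7.5 seconds for the left side and 30.0 seconds (the cap) for the right side.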
J Rehabil Med 42