COST294-MAUSE Workshop: Meaningful Measures: Valid Useful User Experience Measurement.
June 2008
Classifying and selecting UX and usability measures
Nigel Bevan
Professional Usability Services
12 King Edwards Gardens, London W3 9RG, UK
mail@[Link]
[Link]
ABSTRACT
There are many different types of measures of usability and user experience (UX). The overall goal of usability from a user perspective is to obtain acceptable effectiveness, efficiency and satisfaction (Bevan, 1999, ISO 9241-11). This paper summarises the purposes of measurement (summative or formative), and the measures of usability that can be taken at the user interface level and at the system level. The paper suggests that the concept of usability at the system level can be broadened to include learnability, accessibility and safety, which contribute to the overall user experience. UX can be measured as the user’s satisfaction with achieving pragmatic and hedonic goals, and pleasure.

WHY MEASURE UX/USABILITY?
The most common reasons for measuring usability in product development are to obtain a more complete understanding of users’ needs and to improve the product in order to provide a better user experience.

But it is also important to establish criteria for UX/usability goals at an early stage of design, and to use summative measures to evaluate whether these have been achieved during development.

Summative measures
Summative evaluation can be used to establish a baseline, make comparisons between products, or to assess whether usability requirements have been achieved. For this purpose, the measures need to be sufficiently valid and reliable to enable meaningful conclusions to be drawn from the comparisons. One prerequisite is that the measures are taken from an adequate sample of typical users carrying out representative tasks in a realistic context of use. Any comparative figures should be accompanied by a statistical assessment of whether the results may have been obtained by chance.

For example, the test method for everyday products in ISO 20282-2 points out that obtaining 95% confidence that 80% of users could successfully complete a task would require, for example, 28 out of 30 users tested to be successful. If 4 out of 5 users in a usability test were successful, then even if the testing protocol was perfect there is a 20% chance that the success rate for a large sample of users might only be 51%.

Although summative measures are most commonly obtained from user performance and satisfaction, summative data can also be obtained from hedonic questionnaires (e.g. Hassenzahl et al., 2003; Lavie and Tractinsky, 2004) or from expert evaluation, such as the degree of conformance with usability guidelines (see for example Jokela et al, 2006).

Formative measures
Formative evaluation can be used to identify UX/usability problems, to obtain a better understanding of user needs and to refine requirements. The main data from formative evaluation is qualitative. When formative evaluation is carried out relatively informally with small numbers of users, it does not generate reliable data from user performance and satisfaction.

However, some measures of the product obtained by formative evaluation, either with users or by an expert, such as the number of problems identified, may be useful, although they should be subject to statistical assessment if they are to be interpreted.

In practice, even when the main purpose of an evaluation is summative, it is usual to collect formative information to provide design feedback at the same time.

WHAT MEASURES SHOULD BE USED?
There are two types of UX/usability measures: those that measure the result of using the whole system (usability in use) and measures of the quality of the user interface (interface usability).

SYSTEM USABILITY
ISO 9241-11 (1998) defines usability as:

    the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use

and ISO 9241-171 (2008) defines accessibility as:

    usability of a product, service, environment or facility by people with the widest range of capabilities
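The sample-size figures quoted from ISO 20282-2 under "Summative measures" can be checked with a short calculation. The Python sketch below is one possible reading of those figures, assuming an exact one-sided binomial test; the function names prob_at_least and min_successes are illustrative and are not taken from the standard, which may describe its procedure differently.

```python
from math import comb
from typing import Optional


def prob_at_least(successes: int, n: int, p: float) -> float:
    """P(X >= successes) when X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


def min_successes(n: int, p0: float, confidence: float = 0.95) -> Optional[int]:
    """Smallest number of successes out of n for which a true success
    rate of p0 (or lower) would produce a result at least this good
    with probability no greater than 1 - confidence.
    Assumption: exact one-sided binomial test."""
    alpha = 1 - confidence
    for s in range(n + 1):
        if prob_at_least(s, n, p0) <= alpha:
            return s
    return None  # no outcome with n users is strong enough


# 30 users, target success rate 80%, 95% confidence
print(min_successes(30, 0.80))

# Chance of seeing 4 or more successes out of 5 if the true rate were only 51%
print(round(prob_at_least(4, 5, 0.51), 3))
```

Under these assumptions the first call returns 28, and the second probability is approximately 0.20, which is consistent with the figures quoted in the text.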
These definitions mean that for a product to be usable and accessible, users should be able to use a product or web site to achieve their goals in an acceptable amount of time, and be satisfied with the results. ISO/IEC standards for software quality refer to this broad view of usability as “quality in use”, as it is the user’s overall experience of the quality of the product (Bevan, 1999). This is a black-box view of usability: what is achieved, rather than how.

The new draft ISO standard ISO/IEC CD 25010.2 (2008) proposes a more comprehensive breakdown of quality in use into usability in use (which corresponds to the ISO 9241-11 definition of usability as effectiveness, efficiency and satisfaction); flexibility in use (which is a measure of the extent to which the product is usable in all potential contexts of use, including accessibility); and safety (which is concerned with minimising undesirable consequences):

Quality in use
    Usability in use
        Effectiveness in use
        Productivity in use
        Satisfaction in use
            Likability (satisfaction with pragmatic goals)
            Pleasure (satisfaction with hedonic goals)
            Comfort (physical satisfaction)
            Trust (satisfaction with security)
    Flexibility in use
        Context conformity in use
        Context extendibility in use
        Accessibility in use
    Safety
        Operator health and safety
        Public health and safety
        Environmental harm in use
        Commercial damage in use

Usability in use is similar to the ISO 9241-11 definition of usability:
• Effectiveness: “accuracy and completeness.” Error-free completion of tasks is important in both business and consumer applications.
• Efficiency: “resources expended.” How quickly a user can perform work is critical for business productivity.
• Satisfaction: the extent to which expectations are met. Satisfaction is a success factor for any product with discretionary use; it is also essential for maintaining workforce motivation.

Usability in use also explicitly identifies the need for a product to be usable in the specified contexts of use:
• Context conformity: the extent to which usability in use meets requirements in all the required contexts of use.

Flexibility in use: the extent to which the product is usable in all potential contexts of use:
• Context conformity in use: the degree to which usability in use meets requirements in all the intended contexts of use.
• Context extendibility in use: the degree of usability in use in contexts beyond those initially intended.
• Accessibility in use: the degree of usability in use for users with specified disabilities.

Safety: acceptable levels of risk of harm to people, business, data, software, property or the environment in the intended contexts of use.

Safety is concerned with the potential adverse consequences of not meeting the goals. For instance, in Cockton’s (2008) example of designing a van hire system, from a business perspective, what are the potential consequences of:
• Not offering exactly the type of van preferred by a potential user group?
• The user mistakenly making a booking for the wrong dates or wrong type of vehicle?
• The booking process taking longer than with competitor systems?

For a consumer product or game, what are the potential adverse consequences of a lack of pleasurable emotional reactions or of achievement of other hedonic goals?

SYSTEM USABILITY MEASURES
Usability in use and flexibility in use are measured by effectiveness (task goal completion), efficiency (resources used) and satisfaction. The relative importance of these measures depends on the purpose for which the product is being used (for example in some personal situations, resources may not be important).

Table 1 illustrates how the measures of effectiveness, resources, safety and satisfaction can be selected to measure quality in use from the perspective of different stakeholders.

From an organisational perspective, quality in use and usability in use are about achievement of task goals. But for the end user there are not only pragmatic task-related “do” goals, but also hedonic “be” goals (Carver & Scheier, 1998). For the end user, effectiveness and efficiency are the do goals, and stimulation, identification, evocation and pleasure are the be goals.

Additional derived user performance measures (Bevan, 2006) include:
• Partial goal achievement. In some cases goals may be only partially achieved, producing useful but suboptimal results.
• Relative user efficiency. How long a user takes in comparison with an expert.
• Productivity. Completion rate divided by task time, which gives a classical measure of productivity.

Table 1. Stakeholder perspectives of quality in use

Stakeholder | End User (Usability) | Usage Organisation (Cost-effectiveness) | Technical support (Maintenance)
Goal | Personal goals | Task goals | Support goals
Characteristic:
System effectiveness | User effectiveness | Task effectiveness | Support effectiveness
System resources | Productivity (time) | Cost efficiency (money) | Support cost
Safety | Risk to user (health and safety) | Commercial risk | System failure or corruption
Stakeholder satisfaction | Hedonic and pragmatic satisfaction | Management satisfaction | Support satisfaction

User satisfaction measures
User satisfaction can be measured by the extent to which users have achieved their pragmatic and hedonic goals. ISO/IEC CD 25010.2 suggests the following types of measure:
• Likability: the extent to which the user is satisfied with their perceived achievement of pragmatic goals, including acceptable perceived results of use and consequences of use.
• Pleasure: the extent to which the user is satisfied with their perceived achievement of hedonic goals of stimulation, identification and evocation (Hassenzahl, 2003) and associated emotional responses (Norman’s (2004) visceral category).
• Comfort: the extent to which the user is satisfied with physical comfort.
• Trust: the extent to which the user is satisfied that the product will behave as intended.

Satisfaction is most often measured using a questionnaire. Psychometrically designed questionnaires will give more reliable results than ad hoc questionnaires (Hornbaek, 2006).

Safety and risk measures
There are no simple measures of safety. Historical measures can be obtained for the frequency of health and safety, environmental harm and security failures. A product can be tested in situations that might be expected to increase risks. Alternatively, risks can be estimated in advance.

Evaluation of data from usage of an existing system
Measures of effectiveness, efficiency and satisfaction can also be obtained from usage of an existing system.

Web Metrics
Web-based logs contain potentially useful data that can be used to evaluate usability, such as entrance and exit pages, frequency of particular paths through the site, and the extent to which search is successful (Burton and Walther, 2001). However, it is very difficult to track individual user behaviour (Groves, 2007) without some form of page tagging combined with pop-up questions when the system is being used, so that the results can be related to particular user groups and tasks.

Application Instrumentation
Data points can be built into code that "count" when an event occurs (for example in Microsoft Office (Harris, 2005)). This could be the frequency with which commands are used or the number of times a sequence results in a particular type of error. The data is sent anonymously to the development organization. This real-world data from large populations can help guide future design decisions.

Satisfaction Surveys
Satisfaction questionnaires distributed to a sample of existing users provide an economical way of obtaining feedback on the usability of an existing product or system.

USER INTERFACE USABILITY
The broad quality in use perspective contrasts with the narrower interpretation of usability as the attributes of the user interface that make the product easy to use. This is consistent with one of the views of usability in HCI, for example in Nielsen’s (1993) breakdown where a product can be usable, even if it has no utility (Figure 1).

System acceptability
    Social acceptability
    Practical acceptability
        Cost
        Compatibility
        Reliability
        Usefulness
            Utility
            Usability

Figure 1. Nielsen’s categorisation of usability

User interface usability is a pre-requisite for system usability.
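The derived user performance measures listed under SYSTEM USABILITY MEASURES (relative user efficiency and productivity) reduce to simple arithmetic on task results. The Python sketch below is illustrative only. It assumes that relative user efficiency is expressed as the ratio of expert time to user time, and that productivity is the completion rate divided by the mean task time in minutes; the paper does not fix units or formulas, and the names TaskResult, completion_rate, productivity and relative_user_efficiency are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    completed: bool  # did the user achieve the task goal?
    time_s: float    # time spent on the task, in seconds


def completion_rate(results: List[TaskResult]) -> float:
    """Fraction of attempts in which the task goal was achieved."""
    return sum(r.completed for r in results) / len(results)


def productivity(results: List[TaskResult]) -> float:
    """Completion rate divided by mean task time in minutes.
    Assumption: time is averaged over all attempts, successful or not."""
    mean_minutes = sum(r.time_s for r in results) / len(results) / 60
    return completion_rate(results) / mean_minutes


def relative_user_efficiency(user_time_s: float, expert_time_s: float) -> float:
    """Expert time as a fraction of user time (1.0 = as fast as the expert).
    Assumption: a simple time ratio is used."""
    return expert_time_s / user_time_s


results = [
    TaskResult(True, 60),
    TaskResult(True, 120),
    TaskResult(False, 180),
    TaskResult(True, 120),
]
print(completion_rate(results))            # 0.75
print(productivity(results))               # 0.375 completions per minute
print(relative_user_efficiency(120, 60))   # 0.5
```

Averaging time over successful attempts only would give a different productivity figure, so whichever convention is chosen should be reported alongside the measure.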
Expert-based methods
Expert evaluation relies on the expertise of the evaluator, and may involve walking through user tasks or assessing conformance to UX/usability guidelines or heuristics. Measures that can be obtained from expert evaluation include:
• Number of violations of guidelines or heuristics.
• Number of problems identified.
• Percentage of interface elements conforming to a particular guideline.
• Whether the interface conforms to detailed requirements (for example the number of clicks required to achieve specific goals).

If the measures are sufficiently reliable, they can be used to track usability during development.

Automated evaluation methods
There are some automated tools (such as WebSAT and LIFT) that automatically test for conformance with basic usability and accessibility rules. Although the measures obtained are useful for screening for basic problems, they only test a very limited scope of usability issues (Ivory & Hearst, 2001).

MEASURING UX, USABILITY AND ACCESSIBILITY
Usability is variously interpreted as good user interface design (ISO 9126-1), an easy to use product (e.g. Cockton, 2004), good user performance (e.g. Väänänen-Vainio-Mattila et al, 2008), good user performance and satisfaction (e.g. ISO 9241-11), or good user performance and user experience (e.g. ISO 9241-210).

Accessibility may refer to product capabilities (“technical accessibility”) or a product usable by people with disabilities (e.g. ISO 9241-171).

UX has even more interpretations. ISO CD 9241-210 defines user experience as:

    all aspects of the user’s experience when interacting with the product, service, environment or facility.

This definition can be related to different interpretations of UX:
• UX attributes such as aesthetics, designed into the product to create a good user experience.
• The user’s pragmatic and hedonic UX goals (individual criteria for user experience) (Hassenzahl, 2003).
• The actual user experience when using the product (this is difficult to measure directly).
• The measurable UX consequences of using the product: pleasure, and satisfaction with achieving pragmatic and hedonic goals.

Table 2 shows how measures of system usability and UX are dependent on product attributes that support different aspects of user experience. In Table 2 the columns are the quality characteristics that contribute to the overall user experience, with the associated product attributes needed to achieve these qualities.

Table 2. Factors contributing to system usability and UX

Quality characteristic | UX | Functionality | User interface usability | Learnability | Accessibility | Safety
Product attributes | Aesthetic attributes | Appropriate functions | Good UI design (easy to use) | Learnability attributes | Technical accessibility | Safe and secure design
UX pragmatic do goals | To be effective and efficient
UX hedonic be goals | Stimulation, identification and evocation
UX: actual experience | Visceral | Experience of interaction
Usability (= performance in use measures) | Effectiveness and Productivity in use: effective task completion and efficient use of time | Learnability in use: effective and efficient to learn | Accessibility in use: effective and efficient with disabilities | Safety in use: occurrence of unintended consequences
Measures of UX consequences | Satisfaction in use: satisfaction with achieving pragmatic and hedonic goals
 | Pleasure | Likability and Comfort | Trust
The users’ goals may be pragmatic (to be effective and efficient), and/or hedonic (stimulation, identification and/or evocation).

Although UX is primarily about the actual experience of usage, this is difficult to measure directly. The measurable consequences are the user’s performance, satisfaction with achieving pragmatic and hedonic goals, and pleasure.

User performance and satisfaction are determined by qualities including attractiveness, functionality and interface usability. Other quality characteristics will also be relevant in determining whether the product is learnable, accessible, and safe in use.

Pleasure will be obtained both from achieving goals and as a direct visceral reaction to attractive appearance (Norman, 2004).

WHAT SHOULD BE MEASURED?
In a systems development environment, UX/usability measures need to be prioritised:
1. At a high level, whose stakeholder goals are the main concern (e.g. users, staff or managers)?
2. What aspects of effectiveness, efficiency, satisfaction, flexibility, accessibility and safety are most important for these stakeholders?
3. What are the risks if the goals for effectiveness, efficiency, satisfaction, flexibility, accessibility and safety are not achieved in the intended contexts of use?
4. Which of these UX/system usability measures are important enough to validate using user-based testing and/or questionnaires, and how should the users, tasks and measures be selected?
5. Are baseline measures needed to establish requirements? (Whiteside et al, 1988)
6. Which aspects of interface usability can be measured during development by expert evaluation to help develop a product that achieves the UX/system usability goals for the important stakeholders in the important contexts of use?
7. How can UX/usability be monitored during use?

CONCLUSIONS
Discussion of UX and selection of appropriate UX measures would be simplified if the different perspectives on UX were identified and distinguished. The current interpretations of “UX” are even more diverse than those of “usability”.

This paper proposes a common framework for classifying usability and UX measures, showing how they relate to broader issues of effectiveness, efficiency, satisfaction, accessibility and safety. It is anticipated that the framework could be elaborated to incorporate new conceptual distinctions as they emerge.

Understanding how different aspects of user experience relate to usability, accessibility, and broader conceptions of quality in use will help in the selection of appropriate measures.

REFERENCES
[1] Bevan, N. (1999) Quality in use: meeting user needs for quality. Journal of Systems and Software, 49(1), pp 89-96.
[2] Bevan, N. (2006) Practical issues in usability measurement. Interactions 13(6): 42-43.
[3] Carver, C. S., & Scheier, M. F. (1998) On the self-regulation of behavior. New York: Cambridge University Press.
[4] Burton, M. and Walther, J. (2001) The value of web log data in use-based design and testing. Journal of Computer-Mediated Communication, 6(3). [Link]/vol6/issue3/[Link]
[5] Cockton, G. (2004) From Quality in Use to Value in the World. CHI 2004, April 24–29, 2004, Vienna, Austria.
[6] Cockton, G. (2008a) Putting Value into E-valu-ation. In: Maturing Usability. Quality in Software, Interaction and Value. Law, E. L., Hvannberg, E. T., Cockton, G. (eds). Springer.
[7] Cockton, G. (2008b) What Worth Measuring is. Proceedings of Meaningful Measures: Valid Useful User Experience Measurement (VUUM), Reykjavik, Iceland.
[8] Groves, K. (2007) The limitations of server log files for usability analysis. Boxes and Arrows. [Link]/view/the-limitations-of
[9] Harris, J. (2005) An Office User Interface Blog. [Link] [Link] Retrieved January 2008.
[10] Hassenzahl, M. (2002) The effect of perceived hedonic quality on product appealingness. International Journal of Human-Computer Interaction, 13, 479-497.
[11] Hassenzahl, M. (2003) The thing and I: understanding the relationship between user and product. In Funology: From Usability to Enjoyment, M. Blythe, C. Overbeeke, A.F. Monk and P.C. Wright (Eds), pp. 31–42 (Dordrecht: Kluwer).
[12] Hornbaek, K. (2006) Current practices in measuring usability. Int. J. Human-Computer Studies 64, 79–102.
[13] ISO 9241-11 (1998) Ergonomic requirements for office work with visual display terminals (VDTs) Part 11: Guidance on Usability. ISO.
[14] ISO FDIS 9241-171 (2008) Ergonomics of human-system interaction -- Part 171: Guidance on software accessibility. ISO.
[15] ISO CD 9241-210 (2008) Ergonomics of human-system interaction -- Part 210: Human-centred design process for interactive systems. ISO.
[16] ISO 13407 (1999) Human-centred design processes for interactive systems. ISO.
[17] ISO TS 20282-2 Ease of operation of everyday products -- Part 2: Test method for walk-up-and-use products. ISO.
[18] ISO/IEC 9126-1 (2001) Software engineering - Product quality - Part 1: Quality model. ISO.
[19] ISO/IEC CD 25010.2 (2008) Software engineering – Software product Quality Requirements and Evaluation (SQuaRE) – Quality model.
[20] Ivory, M.Y., Hearst, M.A. (2001) State of the Art in Automating Usability Evaluation of User Interfaces. ACM Computing Surveys, 33,4 (December 2001) 1-47. Accessible at [Link] papers/ue-survey/[Link]
[21] Jokela, T., Koivumaa, J., Pirkola, J., Salminen, P., Kantola, N. (2006) “Methods for quantitative usability requirements: a case study on the development of the user interface of a mobile phone”, Personal and Ubiquitous Computing, 10, 345 –
Nielsen, J. (1993) Usability Engineering. Academic Press.
[22] Norman, D. (2004) Emotional design: Why we love (or hate) everyday things (New York: Basic Books).
[23] Väänänen-Vainio-Mattila, K., Roto, V., Hassenzahl, M. (2008) Towards Practical UX Evaluation Methods. Proceedings of Meaningful Measures: Valid Useful User Experience Measurement (VUUM), Reykjavik, Iceland.
[24] Whiteside, J., Bennett, J., & Holtzblatt, K. (1988) Usability engineering: Our experience and evolution. In M. Helander (Ed.), Handbook of Human-Computer Interaction (1st Ed.) (pp. 791–817). North-Holland.