Data Analytics with SNOMED CT
Table of Contents
1 Executive Summary
2 Introduction
  Background
  Purpose
  Scope
  Audience
  Document Overview
3 Analytics Overview
  3.1 Definition
  3.2 Scope and Purpose
  3.3 Substrates for Analytics
  3.4 Examples of Approaches
4 SNOMED CT Overview
  4.1 Concepts
  4.2 Descriptions
  4.3 Relationships
  4.4 Concept Model
  4.5 Expressions
  4.6 Reference Sets
  4.7 Description Logic Features
  4.8 Benefits of Using SNOMED CT for Analytics
5 Preparing Data for Analytics
  5.1 Natural Language Processing
  5.2 Mapping Other Code Systems to SNOMED CT
6 SNOMED CT Analytic Techniques
  6.1 Subsets
  6.2 Subsumption
  6.3 Using Defining Relationships
  6.4 Description Logic Over Terminology
  6.5 Description Logic Over Terminology and Structure
  6.6 Using Statistical Classifications
7 Task-Oriented Analytics
  7.1 Point of Care Analytics
  7.2 Population-Based Analytics
The Data Analytics with SNOMED CT guide reviews current approaches, tools and techniques for performing
data analytics using SNOMED CT and shares developing practice in this area. It is anticipated that this report
will benefit members, vendors and users of SNOMED CT by promoting a greater awareness of both what has
been achieved, and what can be achieved by using SNOMED CT to enhance analytics services.
This document is a publication of International Health Terminology Standards Development Organisation, trading as SNOMED International.
SNOMED International owns and maintains SNOMED CT®.
Any modification of this document (including without limitation the removal or modification of this notice) is prohibited without the express
written permission of SNOMED International. This document may be subject to updates. Always use the latest version of this document
published by SNOMED International. This can be viewed online and downloaded by following the links on the front page or cover of this
document.
SNOMED®, SNOMED CT® and IHTSDO® are registered trademarks of International Health Terminology Standards Development Organisation.
SNOMED CT® licensing information is available at [Link] For more information about SNOMED International and
SNOMED International Membership, please refer to [Link] or contact us at info@[Link].
1 Executive Summary
SNOMED CT is a clinically validated, semantically rich, controlled terminology designed to enable effective
representation of clinical information. SNOMED CT is widely recognized as the leading global clinical terminology
for use in Electronic Health Records (EHRs). SNOMED CT enables the full benefits of EHRs to be achieved by
supporting both clinical data capture, and the effective retrieval and reuse of clinical information.
The term 'analytics' is used to describe the discovery of meaningful information from healthcare data. Analytics
may be used to describe, predict or improve clinical and business performance, and to recommend action or guide
decision making.
Using SNOMED CT to support analytics services can enable a range of benefits, including:
• Enhancing the care of individual patients by supporting:
    ◦ Retrieval of appropriate information for clinical care
    ◦ Guideline and decision support integration
    ◦ Retrospective searches for patterns requiring follow-up
• Enhancing the care of populations by supporting:
    ◦ Epidemiology monitoring and reporting
    ◦ Research into the causes and management of diseases
    ◦ Identification of patient groups for clinical research or specialized healthcare programs
• Providing cost-effective delivery of care by supporting:
    ◦ Guidelines to minimize risk of costly errors
    ◦ Reducing duplication of investigations and interventions
    ◦ Auditing the delivery of clinical services
    ◦ Planning service delivery based on emerging health trends
SNOMED CT has a number of features that make it uniquely capable of supporting a range of powerful analytics
functions. These features enable clinical records to be queried by:
• Grouping detailed clinical concepts together into broader categories (at various levels of detail);
• Using the formal meaning of the clinical information recorded;
• Testing for membership of predefined subsets of clinical concepts; and
• Using terms from the clinician's local dialect.
SNOMED CT also enables:
• Clinical queries over heterogeneous data (using SNOMED CT as a common reference terminology to which
different code systems can be mapped);
• Analysis of patient records containing no original SNOMED CT content (e.g. free text);
• Powerful logic-based inferencing using Description Logic reasoners;
• Linking clinical concepts recorded in a health record to clinical guidelines and rules for clinical decision
support; and
• Mapping to classifications, such as ICD-9 or ICD-10, to utilize the additional features that these provide.
Analytics tasks that may be enabled or enhanced by the use of SNOMED CT techniques can be considered in
three broad categories:
1. Point-of-care analytics, which benefits individual patients and clinicians. This includes historical summaries,
decision support and reporting.
2. Population-based analytics, which benefits populations. This includes trend analysis, public health
surveillance, pharmacovigilance, care delivery audits and healthcare service planning.
3. Clinical research, which is used to improve clinical assessment and treatment guidelines. This includes
identification of clinical trial candidates, predictive medicine and semantic searching of clinical knowledge.
While the use of SNOMED CT for analytics does not dictate a particular data architecture, there are a few key
options to consider, including:
• Analytics directly over patient records;
• Analytics over data exported to a data warehouse;
2 Introduction
Background
SNOMED CT is a clinically validated, semantically rich, controlled terminology. SNOMED CT comprises
meaning-based concepts, human-readable descriptions and machine-readable definitions. SNOMED CT is used
within electronic health records to support data capture, retrieval, and subsequent reuse for a wide range of
purposes. SNOMED CT is also used to enable or enhance analysis of patient records and other clinical documents
containing no original SNOMED CT content.
SNOMED CT hierarchies and formal concept definitions allow selective information retrieval to support analysis –
from patient-based queries to operational reporting, public health reporting, strategic planning, predictive
medicine and clinical research. As the SNOMED CT encoding of healthcare data increases, so too do the benefits
realized from analytics processes performed over this data.
Purpose
The purpose of this document is to review current approaches, tools and techniques for performing data analytics
using SNOMED CT and to share developing practice in this area. It is anticipated that this report will benefit
members, vendors and users of SNOMED CT by promoting a greater awareness of both what has been achieved,
and what can be achieved by using SNOMED CT to enhance analytics services.
Scope
This document presents different data approaches, tools, terminology techniques, query languages, data
architectures and user interfaces that may be used in performing analytics using SNOMED CT. Analytics services
considered include patient-based queries, operational reporting, the application and audit of evidence-based
medical practice, strategic planning, predictive medicine, public health reporting and clinical research. The benefits
and challenges of these approaches are also presented. The case study summaries describe a selection of SNOMED
CT analytics projects and tools.
This document does not provide an exhaustive list of analytics projects and tools, and does not mandate a specific
approach. The development of clinical case definitions 1 is also outside of the scope of this document.
Audience
The target audience of this document includes:
• Members who wish to learn about current analytics activities in other jurisdictions and inform future
directions;
• Clinicians, informatics specialists and technical staff involved in the planning, management, design or
implementation of clinical record applications or healthcare analytics tools;
• Software vendors, data analysts, epidemiologists and others designing SNOMED CT based solutions.
This document assumes a basic level of understanding of SNOMED CT. For background information it is
recommended that the reader refers to the SNOMED CT Starter Guide.
Document Overview
This document presents an introduction to analytics over data with SNOMED CT content.
Section 1 (Executive Summary) provides a concise summary of the document.
Section 2 (Introduction) introduces the document by explaining the background, purpose, scope, audience and
overview of the document.
Section 3 (Analytics Overview) introduces the topic by presenting a definition of analytics and describing the scope,
purpose and substrates of analytics services which use SNOMED CT.
Section 4 (SNOMED CT Overview) describes the main features of SNOMED CT which may be used to support
analytics over health data, and the specific benefits that using SNOMED CT enables.
Section 5 (Preparing Data for Analytics) describes some approaches used to prepare clinical data for analytics using
SNOMED CT, including mapping and natural language processing.
Section 6 (SNOMED CT Analytic Techniques) presents a range of techniques for using SNOMED CT to perform data
analytics, including using value sets, subsumption, defining relationships and description logic.
Section 7 (Task-Oriented Analytics) looks at how these SNOMED CT based techniques can be used to assist with
specific analytics tasks for point of care analytics, population health monitoring and reporting, and clinical
research.
Section 8 (Data Architectures) presents a number of approaches for architecting analytics services, including
querying directly over patient data, using a data warehouse, querying a virtual medical record and using distributed
storage and processes.
Section 9 (Database Queries) considers the query languages that are needed to perform analytics over the
combination of the patient record and terminology content.
Section 10 (User Interface Design) presents a selection of user interface styles that may be used with SNOMED CT to
support querying and results visualization.
Section 11 (Challenges) discusses some of the challenges which are faced when performing analytics over SNOMED
CT enabled data, including the reliability of patient data, information model/terminology boundary issues, concept
definition issues, versioning and inactive content.
Two appendices to this report present a variety of project case studies and vendor tooling case studies respectively.
These appendices, which are referenced extensively throughout this document, can be found at [Link]
analyticscasestudies .
1 [Link]
3 Analytics Overview
3.1 Definition
The term 'analytics' is used broadly in this document to describe the process of extracting useful information from
healthcare data.
Analytics is the discovery and communication of meaningful patterns in data… Analytics may be applied to
business data to describe, predict and improve business performance. The insights from data are used to
recommend action or to guide decision making. 1
Most analytical processes are driven by database queries. A 'query' is a means of retrieving information from a
database: a machine-readable question presented to the database in a predefined format. Queries are
used to inform or contribute to a human-readable report or produce a machine-actionable response. A human-
readable report may be a list of patients, a graph, historical or projected resource utilization figures, or a summary
dashboard display. Machine-actionable responses may include populating an order for a new laboratory test, based
on the results of a previous test, or placing an order to restock medical devices on a hospital ward.
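As a minimal sketch of a query feeding a human-readable report, the following uses Python's built-in sqlite3 module. The table name, column names and patient identifiers are hypothetical and chosen only for illustration; they do not reflect any real clinical record schema.

```python
import sqlite3

# Hypothetical, minimal patient-record table (an assumption for illustration,
# not a real EHR schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (patient_id TEXT, concept_id INTEGER)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [
        ("P001", 22298006),   # |myocardial infarction|
        ("P002", 399208008),  # |plain chest X-ray|
        ("P003", 22298006),   # |myocardial infarction|
    ],
)


def patients_with(concept_id):
    """Return the sorted list of patients having an entry coded with concept_id."""
    rows = conn.execute(
        "SELECT DISTINCT patient_id FROM entry "
        "WHERE concept_id = ? ORDER BY patient_id",
        (concept_id,),
    )
    return [row[0] for row in rows]


# The query result becomes a simple human-readable report: a list of patients.
report = patients_with(22298006)
print(report)
```

This query matches a single code exactly. It does not yet use any SNOMED CT relationships, so records coded with more specific concepts would not be returned.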
Table 3.3-1: Direct and indirect substrates for SNOMED CT based analytics
Unstructured free text document | Dictated clinical letter | Natural language | None or informal headings
Structured documents with free text fields | Assessment form | Natural language | Standardized headings and fields
Structured documents with free text and post-coded classification (i.e. added by clinical coders after the clinical event) | Discharge summary form with post-coded classification | Classifications (e.g. ICD) | Formal information model (typically simple)
Structured documents with non-SNOMED CT coding (e.g. proprietary, local or other coding system) | Standalone clinical application using departmental codes; Enterprise-wide healthcare system using local dictionaries and pick-lists | Local code system, controlled vocabulary or legacy clinical terminology | Formal information model
Structured documents with SNOMED CT content | Cardiology report; GP event summary | SNOMED CT | Formal information model
'Big data' data store | Data warehouse | Various coding systems | Mixture of both structured and unstructured data
4 SNOMED CT Overview
SNOMED CT is a clinical terminology containing concepts, with unique meanings and formal logic-based
definitions, organized into hierarchies. The clinical content of SNOMED CT includes diagnoses and other clinical
findings, clinical observations, drug products, organisms, specimen types, body structures, and surgical and non-
surgical procedures.
SNOMED CT enables clinical information to be consistently represented at an appropriate level of detail within
electronic health records. The relationships within SNOMED CT then facilitate meaning-based retrieval of this
information at the preferred level of detail for the given query. This provides significant flexibility and facilitates the
integration of data from divergent models of use, such as different user interfaces or databases, into convergent
models of meaning, such as for the representation of data for reporting or statistical analysis purposes. Clinical
systems can thereby query and analyze electronic health record data recorded in different settings, at varying levels
of granularity and across multiple axes. This enables SNOMED CT to support a variety of clinical processes, which
may require either detailed or high-level information - from investigation, to diagnosis and clinical research.
SNOMED CT content is represented using three main types of component:
• Concepts - unique clinical meanings
• Descriptions - human readable terms used to refer to a concept
• Relationships - links between concepts that help to define the meaning of each concept
In addition to these three types of components, SNOMED CT also supports:
• Expressions – a structured combination of one or more concept identifiers used to represent a new clinical
meaning
• Reference sets – a mechanism for representing references to SNOMED CT components for a variety of
purposes, including subsets, aggregation hierarchies, maps and language preferences
In this section we introduce these SNOMED CT features and explain how they may be used to support analytics over
health data. For more detailed information about SNOMED CT features, please refer to the SNOMED CT Starter
Guide and the SNOMED CT Technical Implementation Guide.
We also discuss the specific benefits enabled by using SNOMED CT. For more details about the benefits of SNOMED
CT please refer to Building the Business Case for SNOMED CT.
4.1 Concepts
SNOMED CT concepts represent clinical meanings. Each concept has a permanent concept identifier, which
uniquely identifies the clinical meaning. For example:
• 22298006 |myocardial infarction|
• 160341008 |family history: epilepsy|
• 399208008 |plain chest X-ray|
• 319996000 |simvastatin 10mg tablet|
SNOMED CT's concepts, and their logic-based definitions, allow analytics services to perform meaning-based
queries, rather than purely lexical (or string-matching) searching.
4.2 Descriptions
SNOMED CT descriptions link appropriate human readable terms to concepts. Each concept can have many
descriptions, which represent different synonymous ways of referring to the same clinical meaning. Each
description is written in a specific language, and new descriptions can be created to support a variety of languages.
Like concepts, descriptions also have a permanent unique identifier.
The richness of description content assists the process of searching and finding concepts using user interfaces or
database queries. It may also be used to enhance string-matching in natural language processing applications,
including analytics over multi-lingual data.
4.3 Relationships
SNOMED CT relationships represent an association between two concepts. Relationships are used to logically
define the meaning of a concept in a way that can be processed by a computer. A third concept, called a relationship
type, is used to represent the meaning of the association between the source and destination concepts. There are
different types of relationships available within SNOMED CT.
Subtype relationships, which use the |is a| relationship type, are the most widely used type of relationship. The
SNOMED CT concept hierarchy is constructed from |is a| relationships. For example, the concept 128276007
|cellulitis of foot| has an |is a| relationship to both the concept 118932009 |disorder of foot| and the concept
128045006 |cellulitis|. Subtype relationships are used in many analytics scenarios to aggregate groups of concepts
together, or to perform queries using more abstract (less detailed) concepts that match more specific (or more
detailed) concepts stored in health records.
Attribute relationships contribute to the definition of the source concept by associating it with the value of a
defining characteristic. For example, the concept |viral pneumonia| has a |causative agent| relationship to the
concept |Virus| and a |finding site| relationship to the concept |lung|. Attribute relationships are used in analytics
scenarios in which the meaning of a concept is needed to determine whether a record matches the query criteria.
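The source, relationship type and destination structure described above can be sketched as a list of triples. This is an illustrative simplification, assuming human-readable terms in place of the concept identifiers that real SNOMED CT release data uses; the `Relationship` tuple and both helper functions are hypothetical names.

```python
from collections import namedtuple

# Each relationship links a source concept to a destination concept through a
# relationship type. Terms stand in for concept identifiers here (assumption).
Relationship = namedtuple("Relationship", "source type destination")

RELATIONSHIPS = [
    Relationship("cellulitis of foot", "is a", "disorder of foot"),
    Relationship("cellulitis of foot", "is a", "cellulitis"),
    Relationship("viral pneumonia", "causative agent", "virus"),
    Relationship("viral pneumonia", "finding site", "lung"),
]


def parents(concept):
    """Return the direct supertypes of a concept (its |is a| destinations)."""
    return sorted(
        r.destination
        for r in RELATIONSHIPS
        if r.source == concept and r.type == "is a"
    )


def concepts_with(attribute, value):
    """Return concepts defined with the given attribute relationship and value."""
    return sorted(
        r.source
        for r in RELATIONSHIPS
        if r.type == attribute and r.destination == value
    )


print(parents("cellulitis of foot"))
print(concepts_with("finding site", "lung"))
```

The attribute lookup matches the value exactly. A fuller implementation would typically also accept subtypes of the requested value.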
4.5 Expressions
An expression is a structured combination of one or more concept identifiers used to represent a clinical meaning.
SNOMED CT postcoordinated expressions enable the representation of clinical meanings that cannot be represented
using a single SNOMED CT concept. For example, the following postcoordinated expression represents 'pain in the
left thumb':
• To define language or dialect specific sets of descriptions over which lexical searches can be performed.
Example
Figure 5.1-1 below shows a free text section of a discharge summary that has been processed
with clinical NLP to extract a set of potential SNOMED CT clinical findings and procedures. In order to ensure the
correctness of this automatic encoding, the application should present this list of extracted codes to the user for
confirmation, giving them the opportunity to refine, delete or append codes.
To improve the accuracy of clinical NLP and the value for analytics processes, it is important that the context of
each statement expressed in natural language is clearly identified – for example, past history, suspected and
negation/absence. Figure 5.1-2 shows the same discharge summary narrative as in Figure 5.1-1, but this time
processed with clinical NLP that also extracts the explicit context of each clinical finding and procedure.
When SNOMED CT codes with explicit context are extracted from free text narrative, the resulting clinical meanings
may be captured using SNOMED CT postcoordinated expressions. For example, the following clinical statement:
Endoscopy revealed an acute gastric ulcer but no evidence of gastric bleeding or perforation of the stomach.
can be encoded using the following SNOMED CT expressions with explicit context (see Clinithink case study):
• 243796009 |situation with explicit context| :
{408731000 |temporal context| = 410512000 |current or specified time|,
246090004 |associated finding| = 95529005 |acute gastric ulcer|,
408732007 |subject relationship context| = 410604004 |subject of record|,
408729009 |finding context| = 410515003 |known present|}
• 243796009 |situation with explicit context| :
{408729009 |finding context| = 410516002 |known absent|,
Implementation
NLP Techniques using SNOMED CT
A clinical NLP engine can use SNOMED CT to encode free text narrative in patient records in a number of ways.
Firstly, it can use SNOMED CT descriptions together with techniques such as:
• Stemming: The process of reducing a word to its stem, base or root form – for example "cardiology",
"cardiac" and "cardiologist" may be reduced to the stem "cardi".
• Reordering: The process of reordering the words in a phrase – for example, reordering "hip fracture" to
"fracture hip".
• Word substitution: The process of substituting a word or word phrase with an equivalent word or word
phrase. The SNOMED CT Lexical Resources zip file, available from the SNOMED CT Document Library,
includes an English Word Equivalents table that groups together equivalent words and phrases – for
example, "Renal stone", "Kidney stone", "kidney calculus", "renal calculus" and "nephrolith" are grouped
into the same word block group. This table can be modified or extended with additional word equivalent
groups if required.
• Stop word removal: The process of removing words with limited semantic specificity – for example 'a', 'an',
'and', 'as', 'at', 'be', 'by', 'for', 'of', 'the'. The SNOMED CT Lexical Resources zip file, available from the
SNOMED CT Document Library, includes an Excluded Words table, which suggests some common English
stop words that may be used with SNOMED CT.
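The four techniques listed above can be combined into a single normalization step that is applied to both the free text and the candidate descriptions before they are compared. The following is a minimal sketch under several assumptions: the suffix list is a crude stand-in for a real stemming algorithm, the word-level equivalents are a hypothetical decomposition of the phrase group quoted above (not the contents of the actual Word Equivalents table), and the concept identifiers are placeholders rather than real SNOMED CT identifiers.

```python
import re

# Stop words quoted in the text above.
STOP_WORDS = {"a", "an", "and", "as", "at", "be", "by", "for", "of", "the"}

# Hypothetical word equivalents, each mapped to one canonical form.
EQUIVALENTS = {"renal": "kidney", "calculus": "stone", "nephrolith": "kidney stone"}

# Crude suffix stripping (an assumption; real systems use a proper stemmer).
SUFFIXES = ("ologist", "ology", "ac", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, drop stop words, substitute equivalents, stem and reorder."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    words = [w for w in words if w not in STOP_WORDS]
    substituted = []
    for w in words:
        substituted.extend(EQUIVALENTS.get(w, w).split())
    # Sorting makes the key independent of word order ("reordering").
    return tuple(sorted(stem(w) for w in substituted))


# Placeholder identifiers and terms standing in for SNOMED CT descriptions.
DESCRIPTIONS = [("concept-A", "Kidney stone"), ("concept-B", "Fracture of hip")]
INDEX = {normalize(term): concept for concept, term in DESCRIPTIONS}


def encode(phrase):
    """Return the placeholder concept whose description matches, or None."""
    return INDEX.get(normalize(phrase))


print(encode("renal calculus"), encode("hip fracture"), encode("nephrolith"))
```

Because descriptions and input text pass through the same `normalize` function, imperfections in the stemmer affect both sides equally, although aggressive normalization can also produce false matches that a user should confirm.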
The SNOMED CT concept model can also be used to identify potential connections between related concepts – for
example, the words "left", "hip" and "fracture" used in close proximity may indicate a |fracture| with finding site
|hip| and a laterality of |left|. Similarly, the SNOMED CT concept model may help to identify context that is expressed
within the text – for example, past history, certainty and absence.
Another commonly adopted NLP strategy is to use the location of the free text within the structure of a document to
restrict the possible SNOMED CT code matches. For example, free text entered into a 'Diagnosis' field may restrict
its SNOMED CT encoding to the |disorder| hierarchy, together with other concepts that may be linked to |clinical
findings| via the SNOMED CT concept model.
When NLP techniques are applied to non-English (or dialect-specific) text, translations of relevant SNOMED CT
descriptions may be required. The NLP methods themselves may also need to be adapted to reflect the structure
and style of the language in which the text is written.
Indexing
Another major application for Natural Language Processing technologies is indexing collections of free text
transcripts or documents such that topic specific searches may be run on them, or relevant clinical knowledge
sources may be identified and linked to a given patient's clinical data. The challenge is to return ranked matches
which permit selection of texts with high sensitivity and high specificity (i.e. that relevant documents are rarely
overlooked and that irrelevant documents are rarely returned).
SNOMED CT can be used to support these applications by enabling more powerful searching of free text data stores
than using a purely lexical keyword-based approach. For example, the clinician may request "all documents which
refer to cardiac rhythm disorders". Rather than relying purely on text matching, the search term may be matched
with the concept 698247007 |cardiac arrhythmia (disorder)|, based on its synonym |disorder of heart rhythm|. The
descendants of this concept (e.g. 276796006 |atrial tachycardia|, 49260003 |idioventricular rhythm|, 233917008
|atrioventricular block|) may then be used to search for any code which is a kind of cardiac arrhythmia. Non-|is a|
attribute relationships may also be used in the retrieval process to find associations between the search term and
the indexed concepts, and to calculate the relevance of each free text artefact to determine the order in which they
should be presented to the user.
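The concept-based search described above can be sketched as follows. The |is a| table is a toy fragment that treats the three example concepts as direct children of 698247007 |cardiac arrhythmia|, which is an assumption made for brevity. The document index, the document names and the ranking by number of matching codes are likewise hypothetical stand-ins for a real relevance calculation.

```python
from collections import defaultdict

# Toy |is a| pairs as (child, parent); direct parentage is assumed here.
IS_A = [
    (276796006, 698247007),  # |atrial tachycardia|
    (49260003, 698247007),   # |idioventricular rhythm|
    (233917008, 698247007),  # |atrioventricular block|
]

# Hypothetical index: document name -> concept identifiers extracted from it.
DOCUMENT_INDEX = {
    "letter-1": {276796006},
    "letter-2": {22298006},            # |myocardial infarction|
    "letter-3": {233917008, 49260003},
}


def descendants_or_self(concept_id, is_a_pairs):
    """Collect the concept and all of its subtypes by walking |is a| links."""
    children = defaultdict(set)
    for child, parent in is_a_pairs:
        children[parent].add(child)
    found, stack = {concept_id}, [concept_id]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(concept_id):
    """Return matching documents, those with the most matching codes first."""
    wanted = descendants_or_self(concept_id, IS_A)
    hits = {
        doc: len(codes & wanted)
        for doc, codes in DOCUMENT_INDEX.items()
        if codes & wanted
    }
    return sorted(hits, key=lambda doc: (-hits[doc], doc))


print(search(698247007))
```

A purely lexical search for "cardiac rhythm disorder" would miss all three letters, since none of their coded terms contains that phrase.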
Case Studies
Clinical NLP has been implemented for encoding free text narrative in health records by a number of vendors,
including Caradigm, Cerner, Clinithink and Intelligent Medical Objects.
NLP techniques for indexing and searching have also been implemented by Cerner and Dr Bevan Koopman.
Allscripts' Sunrise InfoButton™ feature uses encoded patient problem lists and medication data elements, together
with SNOMED CT-based indexes provided by third-party medical content providers, to present on-topic information
to the clinician without manual searching.
Designing and authoring maps requires expertise and appropriate resources. Large maps (e.g. tens of thousands of
codes) are typically created and maintained by SNOMED International, National Release Centers, large healthcare
organizations, specialist data suppliers and large system vendors. However, smaller maps may be created and
maintained by smaller system suppliers, hospitals or clinics. Maps must be maintained to ensure that both the
SNOMED CT content and non-SNOMED CT content remain current whenever either code system is updated.
Example
A typical scenario requiring mapping to SNOMED CT is shown below in Figure 5.2-1. In this example, two source
systems (using ICD-9 and ICD-10 respectively) are being integrated into a data warehouse using SNOMED CT as the
common 'reference terminology' for analysis. Once this mapping is done, the same analytic techniques as used on
native SNOMED CT records may be applied (See Section 6 SNOMED CT Analytic Techniques).
Implementation
Mapping Using SNOMED CT
Maps are represented in SNOMED CT's RF2 using a Simple map reference set, a Complex map reference set, or an
Extended map reference set (depending on what additional information is required to support the implementation
of the map). Code mappings are then performed by matching each non-SNOMED CT code in a patient's record with
the 'mapTarget' field of the corresponding row of the map reference set, and using the SNOMED CT code found in
the 'referencedComponentId'.
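The lookup described above can be sketched against a tab-delimited Simple map reference set. The column names follow the RF2 simple map layout, but every row below is invented for illustration: the row identifiers, module and refset identifiers, and the local source codes are placeholders, not real release data.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the RF2 simple map column layout (placeholders only).
SAMPLE_REFSET = (
    "id\teffectiveTime\tactive\tmoduleId\trefsetId\treferencedComponentId\tmapTarget\n"
    "row-1\t20240101\t1\tmodule-x\trefset-x\t22298006\tLOCAL-MI\n"
    "row-2\t20240101\t0\tmodule-x\trefset-x\t399208008\tLOCAL-MI\n"
    "row-3\t20240101\t1\tmodule-x\trefset-x\t399208008\tLOCAL-CXR\n"
)


def load_map(tsv_text):
    """Index active rows by mapTarget (the non-SNOMED CT code)."""
    lookup = defaultdict(list)
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if row["active"] == "1":  # ignore inactivated map rows
            lookup[row["mapTarget"]].append(row["referencedComponentId"])
    return dict(lookup)


def to_snomed(source_code, lookup):
    """Return the SNOMED CT identifiers mapped from a source code, if any."""
    return lookup.get(source_code, [])


lookup = load_map(SAMPLE_REFSET)
print(to_snomed("LOCAL-MI", lookup))
```

The function returns a list because more than one row may share the same 'mapTarget' value. Unmapped source codes return an empty list so that they can be reported rather than silently dropped.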
Case Studies
The UK Terminology Centre's Data Migration Workbench demonstrates some advanced uses of data migration and
mapping products published by the UKTC, including Read Code Version 2 and CTV3 maps to SNOMED CT. A number
of vendor products also map non-SNOMED CT codes to SNOMED CT for use in analytics, including Allscripts'
terminology service, Apelon's Distributed Terminology System, the Cerner Millennium Terminology (CMT) package,
and Epic's electronic patient record systems.
1 Please note that this concept does not exist in the international edition of SNOMED CT, but is shown here as a
hypothetical example of a concept added in a SNOMED CT extension.
6.1 Subsets
One approach to analytics using SNOMED CT is to construct subsets of SNOMED CT identifiers, which are applicable
to a specific clinical purpose, and to test the codes recorded in patient records to check for membership in the
appropriate subset. Subsets of SNOMED CT identifiers may either be defined extensionally or intensionally.
Extensionally defined subsets are those in which each concept is individually enumerated. They are usually
manually constructed and maintained, and can therefore be labor intensive and error prone. For example, one
might construct a subset of kidney disease codes including 36171008 |glomerulonephritis|,
71110009 |hydrocalycosis| and 42399005 |renal failure|.
Intensionally defined subsets are those which are automatically populated (or expanded) based on a machine
processable query. For example, one might construct a subset of kidney disease codes using the results of the query
"<< 90708001 |kidney disease|" (i.e. descendants or self of 90708001 |kidney disease|). The query used to define an
intensional subset may utilize SNOMED CT's hierarchical relationships, attribute values, descriptions, and
membership in other intensionally or extensionally defined subsets. For more information about SNOMED CT query
languages, which may be used to define intensional subsets, please refer to section 9 Database Queries.
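The difference between the two kinds of subset can be sketched in Python. This is an illustration only: the |is a| edges below are a simplifying assumption (the three example concepts are treated as direct children of 90708001 |kidney disease|), and the function name is hypothetical.

```python
# Illustrative |is a| edges (child -> parents). Assumption: the three example
# concepts are direct children of kidney disease; the real hierarchy is deeper.
IS_A = {
    "36171008": ["90708001"],  # glomerulonephritis
    "71110009": ["90708001"],  # hydrocalycosis
    "42399005": ["90708001"],  # renal failure
    "90708001": [],            # kidney disease
}

# Extensional subset: every member is listed by hand.
EXTENSIONAL_SUBSET = {"36171008", "71110009", "42399005"}


def descendants_or_self(focus, is_a):
    """Expand '<< focus' by following the |is a| edges from parent to child."""
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    result, stack = set(), [focus]
    while stack:
        concept = stack.pop()
        if concept not in result:
            result.add(concept)
            stack.extend(children.get(concept, ()))
    return result


# Intensional subset: membership is computed from the query.
INTENSIONAL_SUBSET = descendants_or_self("90708001", IS_A)

recorded_code = "42399005"
print(recorded_code in EXTENSIONAL_SUBSET)  # True
print(recorded_code in INTENSIONAL_SUBSET)  # True
```

If a new subtype of kidney disease were added to the hierarchy, the intensional expansion would pick it up on the next run, whereas the extensional list would need a manual update.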
Example
A subset containing types of 58437007 |tuberculosis of meninges| may be defined extensionally as follows:
Concept ID Description
With the help of the SNOMED CT hierarchy (as shown in Figure 6.1-1), this same subset can be defined intensionally
as:
<< 58437007 |tuberculosis of meninges|
The expansion of an intensional subset defined using this query is the
same as the extensionally defined subset shown above. 1
Using a lexical query, it is also possible to intensionally define a subset of 'tuberculosis of meninges' findings.
However, the results of purely lexical queries are not as reliable. For example, using the query:
<< 404684003 |clinical finding| {{ term = ".tuberculosis.*meninges." }}
the following expansion can be calculated:
Concept ID Description
As can be seen, the results of this lexical query include only 3 of the 4 possible values from the previous subset. In
other cases, lexical queries may incorrectly find concepts which are not appropriate to the subset. It is therefore
recommended that lexical queries are avoided in the definition of intensional subsets. However, they do serve a
useful purpose in identifying candidates for an otherwise manually crafted subset.
Implementation
Defining Subsets in SNOMED CT
Subsets of SNOMED CT may be defined locally as a flat list of concept identifiers, or as an independent query
specification. However, where wider distribution and/or version control is required over these subsets, SNOMED CT
reference sets offer the ideal solution.
Extensional subsets are commonly defined in SNOMED CT using a Simple reference set; however, an Ordered
reference set or Annotation reference set can be used if additional information needs to be recorded for each
member of the subset. Intensional subsets are defined in SNOMED CT using a Query specification reference set. A
Query specification reference set allows a serialized query to define the membership of a subset of SNOMED CT
components. It also specifies the extensional reference set into which the results of executing the query are
generated. Intensional reference sets are preferred in many circumstances as they enable their membership to be
automatically recomputed over new versions of SNOMED CT. Version management of subsets is discussed further in
section 11.4 Versioning.
Subsets can be created using the following methods, either alone or in any combination:
• Manual inclusion, using search and browse methods
• Existing subset, used as a starting point for further manual inclusion and update
• Lexical queries, to identify candidate members, followed by manual verification and update
• Hierarchical queries, to identify descendants of a given concept (e.g. descendants of <73211009 |diabetes
mellitus|)
• Attribute queries, to identify concepts with a specific attribute value (e.g. disorders with a finding site of
80891009 |heart structure|)
• SNOMED CT queries, using the SNOMED CT Expression Constraint or Query languages, which offer additional
query functionality. Please refer to section 9 Database Queries for more details.
Case Studies
A number of vendor products, such as those from Apelon and B2i Healthcare, allow users to create customized extensional and
intensional subsets of SNOMED CT. Other vendor products, such as the Cambio COSMIC® Electronic Patient Record
system, Caradigm's population health solutions, Cerner's data warehousing solution and Epic's decision support
and reporting tools use subsets of SNOMED CT to support their analytics services.
6.2 Subsumption
Determining whether one concept (or expression) is a kind of another concept (or expression) is the fundamental
capability enabled by SNOMED CT. For example, answering the question 'Which patients have an infectious
disease?' involves finding all the patients with any kind of infectious disease (e.g. viral pneumonia, tuberculosis).
Subsumption occurs when one clinical meaning is a subtype of another clinical meaning, and testing for this is
called 'subsumption testing'. If clinical meaning X is a subtype of clinical meaning Y, then Y is said to 'subsume' X
and X is 'subsumed by' Y.
Subsumption testing between concepts is represented using a stated or implied |is a| relationship. For example,
75570004 |viral pneumonia| is a 40733004 |infectious disease| and therefore 40733004 |infectious disease| subsumes
75570004 |viral pneumonia|, and 75570004 |viral pneumonia| is subsumed by 40733004 |infectious disease|.
Subsumption testing between expressions tests to see if the candidate expression (often recorded in a patient
record) is subsumed by a predicate expression (typically part of the query being run across the patient record). For
example:
Candidate expression: 75570004 |viral pneumonia|
Predicate expression: 40733004 |infectious disease|:
363698007 |finding site| = 39607008 |lung structure|
In this case, the candidate expression is subsumed by the predicate expression.
Subsumption testing can be represented in the SNOMED CT Expression Constraint Language using the
'<' (descendantOf) or '<<' (descendantOrSelfOf) operators. For example, the expression constraint:
<< 40733004 |infectious disease|
is satisfied by any expression that is subsumed by 40733004 |infectious disease|.
There are a variety of ways to implement subsumption testing. These are summarized in the Implementation sub-
section below.
Example
A typical example using subsumption would be an audit within a hospital, reviewing all patients with an infectious
disease. In this scenario, the following simple query could be executed to find all the patients whose health record
contains a diagnosis that is subsumed by the concept 40733004 |infectious disease|:
SELECT distinct patientID
FROM health_records
WHERE diagnosis = (<< 40733004 |infectious disease|)
If the health records contained the following data:
patientID date diagnosis
Implementation
Testing Subsumption Between Concepts
Rapid and efficient computation of whether a concept |is a| subtype descendant of another concept is essential for
testing subsumption between expressions. A variety of approaches exist for testing subsumption. When the
candidate and predicate expressions are both precoordinated concepts, subsumption testing can use the published
relationships from the SNOMED CT release files. Approaches for testing subsumption between precoordinated
concepts include:
• Exhaustive testing of subtype relationships
In this approach, every possible sequence of |is a| relationships is recursively tested from the candidate concept
until the predicate concept is reached or until all possible paths have been exhausted.
• Semantic type identifiers and hierarchy flags
In this approach, flags are added to each concept to indicate the set of high-level concept nodes of which that
concept is a subtype. A concept can only subsume concepts that include the same set of high-level concept flags.
This reduces the number of tests that need to be performed to recursively test the subtype relationships.
• Use of proprietary database features
In this approach, a database is used which supports the recursive testing of a chain of hierarchical relationships.
• Branch numbering
In this approach, a depth first tree walk is performed that applies an incremental number to each concept. A second
tree walk then allocates one or more branch number ranges to each concept, which contain the numbers of all of
its descendants.
• Precomputed transitive closure table
In this approach, a comprehensive list of all supertypes of each concept is created by recursively traversing all |is a|
relationships and adding each stated and inferred subtype relationship to a table.
• Using a Description Logic Reasoner
In this approach, a description logic reasoner (e.g. Snorocket, ELK, FaCT++) is used to determine whether one
concept is subsumed by another.
In most environments, the recommended approach is to use either a precomputed transitive closure table or a
description logic reasoner. However, where disk capacity or distribution bandwidth are limiting factors, branch
numbering provides an efficient alternative approach. For more information on these approaches, please refer to
Subtype search scope restriction in the SNOMED CT Terminology Services Guide.
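As an illustration of the precomputed transitive closure approach, the following Python sketch builds a closure table from a handful of |is a| pairs and uses it to answer the infectious disease audit query from the example above. It is a sketch under stated assumptions: the table and column names, the patient rows and the intermediate identifiers 'P1' and 'P2' are hypothetical placeholders, not real SNOMED CT content.

```python
import sqlite3

# Illustrative (child, parent) |is a| pairs. "P1" and "P2" are placeholder
# intermediate concepts, not real SNOMED CT identifiers.
IS_A = [
    ("75570004", "P1"),   # viral pneumonia -> placeholder parent
    ("P1", "40733004"),   # placeholder parent -> infectious disease
    ("22298006", "P2"),   # myocardial infarction -> unrelated placeholder
]


def transitive_closure(is_a_pairs):
    """Return every (subtype, supertype) pair reachable through |is a|."""
    parents = {}
    for child, parent in is_a_pairs:
        parents.setdefault(child, set()).add(parent)
    closure = set()
    for concept in parents:
        seen, stack = set(), list(parents[concept])
        while stack:
            ancestor = stack.pop()
            if ancestor not in seen:
                seen.add(ancestor)
                stack.extend(parents.get(ancestor, ()))
        closure.update((concept, ancestor) for ancestor in seen)
    return closure


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE closure (subtype TEXT, supertype TEXT)")
conn.execute("CREATE TABLE health_records (patientID TEXT, diagnosis TEXT)")
conn.executemany("INSERT INTO closure VALUES (?, ?)",
                 sorted(transitive_closure(IS_A)))
conn.executemany("INSERT INTO health_records VALUES (?, ?)",
                 [("p1", "75570004"), ("p2", "22298006"), ("p3", "40733004")])

# '<< focus' becomes: the focus concept itself, or any subtype in the closure.
QUERY = """
SELECT DISTINCT patientID FROM health_records
WHERE diagnosis = :focus
   OR diagnosis IN (SELECT subtype FROM closure WHERE supertype = :focus)
ORDER BY patientID
"""
patients = [row[0] for row in conn.execute(QUERY, {"focus": "40733004"})]
print(patients)  # ['p1', 'p3']
```

The closure is computed once per terminology release, so each subsumption test at query time reduces to a single indexed lookup, at the cost of storing one row per subtype and supertype pair.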
Case Studies
A number of vendor products use the SNOMED CT hierarchy to support subsumption testing in their analytics
services, including the Cerner Millennium Terminology (CMT) package and Epic's decision support and reporting
tools. Terminology servers that provide the ability to perform subsumption testing include B2i Healthcare's Snow
Owl® terminology server. The UK Terminology Centre's Data Migration Workbench also uses subsumption testing in
its query tool, and its case mix and caseload trends analysis tools.
Example
Figure 6.3-1 illustrates the execution of a query to retrieve a set of findings which have a benign tumor morphology.
The query is executed by finding those concepts with an 'associated morphology' relationship with the value
'benign neoplasm'. In this example, the concepts 'benign tumor of kidney', 'benign neoplasm of bladder' and
'benign tumor of lung' are found to have the required defining relationship value.
In Figure 6.3-2 the same set of concepts is analyzed to identify those which have a
finding site of kidney. In this example, the concepts 'renal cyst', 'benign tumor of kidney' and 'renal abscess' are
found to have the required defining relationship value.
If the queries from Figure 6.3-1 and Figure 6.3-2 are combined, then the query will return those concepts which are
benign tumors of the kidney (see Figure 6.3-3). In this case, the concept 'benign tumor of kidney' is the only concept
found to have the required defining relationship values.
In most cases, these queries would be designed to return concepts with an associated morphology of 'benign
neoplasm' or any subtype of 'benign neoplasm' (e.g. 'angiomyolipoma'), and a finding site of 'kidney' or any subtype
of 'kidney' (e.g. 'papillary duct of kidney', or 'upper pole, left kidney'). This query could be expressed using
the Expression Constraint Language as:
< 404684003 |clinical finding|:
When executed against the January 31st 2015 international edition of SNOMED CT, this query would return the
following 12 concepts:
Implementation
Queries Over Defining Relationships
A query that constrains the defining relationships of matching clinical meanings to specific values can either be
represented informally using a set of attribute value pairs, or represented more formally using a machine
processable language (e.g. the SNOMED CT Expression Constraint Language).
Approaches to implement such a query include:
• Using the distributed relationships
In this approach, the distributed Relationship file is used directly to compare the target value of each defining
relationship with the required attribute value in the query. This approach may be combined with a subsumption
testing approach (e.g. transitive closure table) to enable subtypes of the required attribute value to also be
matched.
• Comparing normal form expressions
In this approach, the query is represented as a predicate expression containing the constrained attribute values,
and the short normal form of this predicate expression is tested for subsumption against each candidate expression
(as per the normal form subsumption test in section 6.2 Subsumption).
• Using a Description Logic Reasoner
In this approach, a description logic reasoner (e.g. Snorocket, ELK, FaCT++) is used to determine whether each
candidate expression is subsumed by the query (represented by a predicate expression).
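The first approach, using the distributed relationships combined with subsumption testing on the attribute values, can be sketched in Python as follows. The data is illustrative only: concepts are keyed by name rather than identifier, the morphology values given for 'renal cyst' and 'renal abscess' are assumptions, and 'angiomyolipoma of kidney' is a hypothetical concept added to show subtype matching.

```python
# Hypothetical defining relationships, keyed by concept name for readability.
DEFINING = {
    "benign tumor of kidney": {"associated morphology": "benign neoplasm",
                               "finding site": "kidney"},
    "benign neoplasm of bladder": {"associated morphology": "benign neoplasm",
                                   "finding site": "bladder"},
    "benign tumor of lung": {"associated morphology": "benign neoplasm",
                             "finding site": "lung"},
    "renal cyst": {"associated morphology": "cyst",
                   "finding site": "kidney"},
    "renal abscess": {"associated morphology": "abscess",
                      "finding site": "kidney"},
    "angiomyolipoma of kidney": {"associated morphology": "angiomyolipoma",
                                 "finding site": "kidney"},
}

# Supertypes of attribute values, e.g. taken from a transitive closure table.
VALUE_ANCESTORS = {"angiomyolipoma": {"benign neoplasm"}}


def value_matches(value, required, ancestors):
    """True if value is the required value or one of its subtypes."""
    return value == required or required in ancestors.get(value, set())


def find_concepts(constraints, defining, ancestors):
    """Return the concepts whose defining relationships meet all constraints."""
    return sorted(
        name for name, attributes in defining.items()
        if all(attribute in attributes
               and value_matches(attributes[attribute], required, ancestors)
               for attribute, required in constraints.items())
    )


matches = find_concepts(
    {"associated morphology": "benign neoplasm", "finding site": "kidney"},
    DEFINING, VALUE_ANCESTORS)
print(matches)  # ['angiomyolipoma of kidney', 'benign tumor of kidney']
```

Passing an empty ancestors table reduces this to exact matching on the attribute value, which corresponds to using the Relationship file without a subsumption step.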
Case Studies
Many organization-wide implementations of SNOMED CT, such as Kaiser Permanente's HealthConnect EHR and the
Danish National Medication Decision Support System, are taking advantage of SNOMED CT's definitional attributes
to support advanced analytics.
A number of vendor products are also supporting analytics over SNOMED CT's defining relationships, including
Apelon's Distributed Terminology System, B2i Healthcare's Snow Owl terminology server, and Cerner's Semantic
Search tool.
Example
For example, if we want to find all disorders that are associated with the organism 80166006 |streptococcus
pyogenes|, we may discover (using the SNOMED CT Relationships file) that there is a direct 'causative agent'
relationship from 302809008 |streptococcus pyogenes infection| to 80166006 |streptococcus pyogenes|. However,
by introducing the following property chain rule:
47429007 |associated with| ο 47429007 |associated with| → 47429007 |associated with|
and noting that 47429007 |associated with| has three subtypes:
255234002 |after|
42752001 |due to|
246075003 |causative agent|
it is possible to discover, using Description Logic, that 81077008 |acute rheumatic arthritis| and 58718002
|rheumatic fever| are also 'associated with' the concept 302809008 |streptococcus pyogenes infection|. Figure
6.4-1 illustrates the relationships that can be discovered using property chaining.
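The effect of the property chain rule can be imitated without a reasoner by a small fixpoint computation, sketched below in Python. This is not how a Description Logic reasoner works internally, and the two relationships involving rheumatic fever and acute rheumatic arthritis are assumptions made for illustration, since the figure is not reproduced here. Only the 'causative agent' relationship is stated in the text above.

```python
ASSOCIATED_WITH = "47429007"
# Subtypes of |associated with|: |after|, |due to|, |causative agent|.
SUBPROPERTIES = {"255234002", "42752001", "246075003"}

# (source, attribute, target) triples. The first two rows are assumed for
# illustration; the third is the direct relationship described in the text.
TRIPLES = [
    ("81077008", "42752001", "58718002"),    # acute rheumatic arthritis, due to, rheumatic fever
    ("58718002", "255234002", "302809008"),  # rheumatic fever, after, S. pyogenes infection
    ("302809008", "246075003", "80166006"),  # S. pyogenes infection, causative agent, S. pyogenes
]


def associated_with_closure(triples):
    """Derive all |associated with| pairs implied by the chain rule."""
    # Step 1: every subtype attribute implies |associated with|.
    edges = {(source, target) for source, attribute, target in triples
             if attribute == ASSOCIATED_WITH or attribute in SUBPROPERTIES}
    # Step 2: apply associated-with o associated-with -> associated-with
    # repeatedly until no new pair is produced.
    changed = True
    while changed:
        changed = False
        for a, b in list(edges):
            for c, d in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))
                    changed = True
    return edges


CLOSURE = associated_with_closure(TRIPLES)
print(("81077008", "80166006") in CLOSURE)  # True
```

Under these assumed relationships, the closure links both rheumatic conditions to the infection and, through it, to the organism.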
Implementation
OWL 2
Using Description Logic techniques to perform analytics over SNOMED CT involves first translating SNOMED CT into
OWL 2 (Web Ontology Language). OWL 2 is an ontology language for the Semantic Web with formally defined
meaning. The SNOMED CT international release comes with a Perl transform script that converts the RF2 files into
OWL XML/RDF, Functional Syntax or KRSS files.
Once generated, the OWL files can then be loaded into a Description Logic Editor (such as Protégé) or used directly
by a terminology service which offers description logic capabilities. The Description Logic Editor or terminology
service then uses DL reasoners (also known as 'classifiers'), such as Snorocket, ELK and FaCT++, to perform
consistency checking and subsumption testing (also known as 'classification') over SNOMED CT. Subsumption
testing can also be performed between two expressions. Semantic query languages, such as SPARQL, can be used
to query over RDF representations of SNOMED CT.
Case Studies
Some commercial terminology servers, such as B2i Healthcare's Snow Owl terminology server, use Description
Logic based techniques to support both classification and querying over SNOMED CT. Kaiser Permanente is
collaborating with Oxford University to investigate ways of performing complex queries efficiently across extremely
large numbers of patient records using scalable parallel processing and description logic reasoners.
Example
Consider for example the two alternative ways of recording family history, as shown in Figure 6.5-1. The green
rectangles represent the logical structure of the information model and the blue rectangles represent the concept
identifiers that are used to populate this information model in the patient record.
The information model on the left uses a heading of 'Family history' to indicate that the named problem refers to a
family history of that problem. The information model on the right uses the terminology value to indicate that the
problem refers to a family history instance.
When querying over data, which may be collected in either format, both the semantics of the information model
and the semantics of the data instances must be considered. One way of achieving this is to use an 'expression
template' to convert all data instances into a Description Logic representation, and use this to reason over the data.
Figure 6.5-2 shows an example of an expression template that could be used to create a SNOMED CT expression for
each of the data instances shown in Figure 6.5-1. Please note that the orange parallelograms represent 'slots' which
are subsequently populated with the value of the named data element (e.g. '$Problem').
When the data instances from Figure 6.5-1 are used to populate the templates from Figure 6.5-2, the following two
expressions are created:
416471007 |family history of clinical finding|:
246090004 |associated finding| = 56265001 |heart disease|,
408732007 |subject relationship context| = 72705000 |mother|,
408731000 |temporal context| = 410511007 |current or past (actual)|,
408729009 |finding context| = 410515003 |known present|
275120007 |family history: cardiac disorder|
These expressions may then be compared using a DL reasoner to discover that the first expression is subsumed by
the second, or queried using a semantic query language to allow the two data representations to be analyzed in a
consistent way.
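The template-filling step can be sketched with Python's standard string templates. The slot name '$Problem' comes from the text above. The '$Relationship' slot, the single-line layout of the expression and the variable names are assumptions, since the actual templates appear only in the figure.

```python
from string import Template

# Template for the model that uses a 'Family history' heading. The
# $Relationship slot is an assumed name for the family member field.
HEADING_MODEL_TEMPLATE = Template(
    "416471007 |family history of clinical finding|: "
    "246090004 |associated finding| = $Problem, "
    "408732007 |subject relationship context| = $Relationship, "
    "408731000 |temporal context| = 410511007 |current or past (actual)|, "
    "408729009 |finding context| = 410515003 |known present|"
)

# Template for the model in which the terminology value already carries the
# family history meaning, so the recorded value is used unchanged.
VALUE_MODEL_TEMPLATE = Template("$Problem")

first_expression = HEADING_MODEL_TEMPLATE.substitute(
    Problem="56265001 |heart disease|",
    Relationship="72705000 |mother|",
)
second_expression = VALUE_MODEL_TEMPLATE.substitute(
    Problem="275120007 |family history: cardiac disorder|",
)

print(first_expression)
print(second_expression)
```

The resulting strings would then be passed to a DL reasoner or terminology service for comparison; that step is outside the scope of this sketch.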
Implementation
OWL 2
Description Logic techniques, such as those described in section 6.4 Description Logic Over Terminology, can be
used to reason over both the terminology and the information model. In addition to translating SNOMED CT to OWL
2, OWL 2 representations of the information model are also created using 'templates' that include 'slots' which are
then filled with the patient record instance values. DL reasoners, such as Snorocket, ELK and FaCT++, and semantic
query languages, such as SPARQL, can then be used over both the terminology and the information model in a
consistent way.
Case Studies
Kaiser Permanente is collaborating with Oxford University to investigate ways of performing complex queries
efficiently across extremely large numbers of patient records using scalable parallel processing and description
logic reasoners. In this project, the analysis is being performed over an OWL-RL representation of the patient data,
which incorporates both the terminology and the structure of the information.
defining relationships between concepts, which further enhances its ability to support flexible and powerful
analytics capabilities.
It is generally recommended that clinical data is recorded using a clinical terminology, such as SNOMED CT, and
then mapped for reporting purposes to one or more classifications, such as ICD. SNOMED International publishes a
map from SNOMED CT to both ICD-9 and ICD-10. This supports epidemiological, statistical and administrative
reporting needs of the member countries and WHO Collaborating Centers. The collaborative work between
SNOMED International and the WHO on the alignment of ICD-11 with SNOMED CT is in progress and promises
tighter integration of the distinct use cases in the future.
Example
The following example illustrates the rows of the Extended Map reference set that supports the mapping from the
SNOMED CT concept 15296000 |sterility| to an appropriate ICD-10 code. The set of map rules associated with each
SNOMED CT concept are grouped together into 'map groups' and then ordered within each map group by a 'map
priority'. The map rule provides a machine readable rule that indicates whether this map should be selected within
its map group, and the map advice provides human readable advice. The correlation indicates the type of match
between the source and the target (e.g. 'exact match' or 'narrow to broad') and the map category indicates the kind
of map being represented.
Referenced component ID; Map target; Map group; Map priority; Map rule; Map advice; Correlation; Map Category Id
Implementation
Mapping to Classifications using SNOMED CT
Maps from SNOMED CT to classifications are generally represented in SNOMED CT's RF2 using a Complex or
Extended map reference set. Mappings are then performed by matching each SNOMED CT code in a patient's record
with the corresponding row of the map reference set, and using the classification code found in the 'mapTarget'
field.
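Where a map uses map groups, priorities and rules, selecting the target involves evaluating the rows of each group in priority order. The Python sketch below illustrates that selection logic under simplifying assumptions: the target codes are placeholders rather than real ICD-10 codes, and the 'rule' field is a hypothetical simplified encoding (a required patient attribute, or None for an 'otherwise' row), not the published map rule syntax.

```python
# Hypothetical rows for one source concept, 15296000 |sterility|.
MAP_ROWS = [
    {"referencedComponentId": "15296000", "mapGroup": 1, "mapPriority": 1,
     "rule": ("sex", "female"), "mapTarget": "TARGET-A"},
    {"referencedComponentId": "15296000", "mapGroup": 1, "mapPriority": 2,
     "rule": ("sex", "male"), "mapTarget": "TARGET-B"},
    {"referencedComponentId": "15296000", "mapGroup": 1, "mapPriority": 3,
     "rule": None, "mapTarget": "TARGET-C"},
]


def select_targets(concept_id, patient, rows):
    """Pick one target per map group: the first row whose rule is satisfied."""
    candidates = [row for row in rows
                  if row["referencedComponentId"] == concept_id]
    targets = {}
    for row in sorted(candidates,
                      key=lambda r: (r["mapGroup"], r["mapPriority"])):
        if row["mapGroup"] in targets:
            continue  # a higher priority row in this group already matched
        rule = row["rule"]
        if rule is None or patient.get(rule[0]) == rule[1]:
            targets[row["mapGroup"]] = row["mapTarget"]
    return [targets[group] for group in sorted(targets)]


print(select_targets("15296000", {"sex": "male"}, MAP_ROWS))  # ['TARGET-B']
print(select_targets("15296000", {}, MAP_ROWS))               # ['TARGET-C']
```

A concept with several map groups would yield one target per group, which is how a single SNOMED CT concept can map to a combination of classification codes.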
Case Studies
The UK Terminology Centre's Data Migration Workbench demonstrates the use of maps from SNOMED CT to ICD-10
International Edition (using the UK maps) and OPCS Classification of Interventions and Procedures (OPCS-4). The
National Library of Medicine (NLM) has also developed a demonstration tool that demonstrates the key
principles of implementing map rules and advice. This tool, called I-MAGIC (Interactive Map-Assisted Generation
of ICD Codes), uses the SNOMED CT to ICD-10 map in a real-time, interactive
manner to generate ICD-10 codes. It simulates a problem list interface in which the user enters problems with
SNOMED CT terms, which are then used to derive ICD-10 codes using the map. A number of vendor products, such
as Cerner Millennium, also use maps from SNOMED CT to ICD-10 to enable statistical analysis.
7 Task-Oriented Analytics
The SNOMED CT analytics techniques described in the previous chapter only become useful when performing a
specific analytics task intended to meet a business need. In this chapter, we consider a range of analytics tasks,
which are either enabled or enhanced by using these SNOMED CT techniques.
The analytics tasks which can benefit from the use of SNOMED CT techniques can be considered in three broad
categories, as shown in Figure 7-1:
• Point of care analytics, which benefits individual patients and clinicians. This includes historical summaries,
decision support and reporting.
• Population-based analytics, which benefits populations. This includes trend analysis, public health
surveillance, pharmacovigilance, care delivery audits and healthcare service planning.
• Clinical research, which is used to improve clinical assessment and treatment guidelines. This includes
identification of clinical trial candidates, predictive medicine and semantic searching of clinical knowledge.
Many of these tasks use business intelligence capabilities, similar to those used in other sectors, such as
manufacturing, retail and transportation. Business intelligence is the provision of historic, current and predictive
views of information. Such services include reporting, online analytical processing (OLAP), data mining, process
mining, complex event processing, benchmarking, text mining, predictive analysis and prescriptive analytics. In
many cases, a data warehouse is used as the platform on which these services are provided (see section 8.2 Data
Warehouse).
The combination of these business intelligence techniques with the capabilities of SNOMED CT creates new
opportunities to improve healthcare delivery.
allergies. This service uses a summary extracted from detailed patient care records held in a variety of disparate
systems. Where the source data is not stored natively in SNOMED CT, it is mapped into SNOMED CT prior to
transmission. Over 40 million people in England (80% of the population) now have a summary care record. This
service now contributes to the safe and efficient assessment and treatment of these people, and has greatly
improved the accuracy and timeliness of medicines reconciliation. 2
1 Note: In some healthcare environments this is a point of care activity, while in others it is not.
Detecting changes of either incidence or prevalence of a particular disease, treatment, procedure or intervention
over time has major utility for population health monitoring, prediction of demand and effective resource
allocation at enterprise and national levels. One challenge encountered when analyzing routinely collected
patient data for trends is distinguishing minor changes in coding style from real changes in disease incidence.
Simply counting the use of individual concept identifiers may be highly misleading. For example, a fall in the use of
the code 22298006 |myocardial infarction| might reflect a shift to using more specific codes (such as 314207007
|non-Q wave myocardial infarction| or 304914007 |acute Q wave myocardial infarction|), rather than a reduction in
the incidence of myocardial infarctions. Use of subsumption testing on SNOMED CT encoded data (see section 6.2
Subsumption) can enable higher level trend analysis to be performed over more specific coded data.
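The myocardial infarction example can be made concrete with a short Python sketch that compares counting the exact code with counting the code and its subtypes. The event data is invented for illustration, and the supertype table assumes that both specific codes are subtypes of 22298006 |myocardial infarction|, as the text implies.

```python
from collections import Counter

# Assumed supertypes of the two more specific codes.
ANCESTORS = {
    "314207007": {"22298006"},  # non-Q wave myocardial infarction
    "304914007": {"22298006"},  # acute Q wave myocardial infarction
}

# Invented (year, recorded code) events, for illustration only.
EVENTS = [
    (2013, "22298006"), (2013, "22298006"), (2013, "314207007"),
    (2014, "22298006"), (2014, "314207007"), (2014, "304914007"),
]


def yearly_counts(events, focus, ancestors, include_subtypes):
    """Count events per year for the focus code, optionally with subtypes."""
    counts = Counter()
    for year, code in events:
        is_subtype = focus in ancestors.get(code, ())
        if code == focus or (include_subtypes and is_subtype):
            counts[year] += 1
    return dict(counts)


exact = yearly_counts(EVENTS, "22298006", ANCESTORS, include_subtypes=False)
subsumed = yearly_counts(EVENTS, "22298006", ANCESTORS, include_subtypes=True)
print(exact)     # {2013: 2, 2014: 1}
print(subsumed)  # {2013: 3, 2014: 3}
```

In this invented data the exact count halves between the two years while the subsumption-based count stays level, which is the pattern that would indicate a change in coding style rather than a change in incidence.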
SNOMED CT's polyhierarchy allows trends to be analyzed from multiple perspectives. However, deciding which
level of aggregation to use for trend analysis can be arbitrary. Novel approaches to this task are emerging as the
demand for trend analysis over SNOMED CT enabled data increases.
The UK Data Migration Workbench (case study 12.1.1 Data Migration Workbench (UK)), for example, includes a trend
module which analyses the frequency with which individual SNOMED CT codes are used in the Electronic Patient
Record (EPR) instance data, looking for those whose recording frequency has changed over the course of the data
collection period. It also includes an Induce module, which performs a more sophisticated analysis of case mix and
caseload trends within a clinical department. Instead of returning the most frequently used individual codes, the
Induce module identifies the most frequently used types of codes. For example, an emergency department may use
roughly 500 different SNOMED CT codes for a laceration in a particular anatomical location. While none of the site-
specific codes may appear in a list of most common codes, the descendants of 312608009 |laceration| may
collectively account for a significant part of the department's workload.
The algorithm used picks aggregation points at defined levels for analysis. The default setting finds roughly 100 sub-
trees within the SNOMED CT hierarchy, where each sub-tree accounts for a more or less constant proportion of all
coded episodes (around 1% of all coded events per sub-tree). The algorithm completes once the set of all codes
within all identified sub-trees collectively accounts for the large majority of the dataset being analyzed. When
applied to real emergency department attendance data, relatively low numbers of presentations (about 0.2%) were
coded as occurring primarily as a result of endocrine disease. As a result, in order to get a big enough grouping of
episodes, the algorithm chooses 362969004 |disorder of endocrine system| as the root of a single sub-tree covering
these reasons for the patient's attendance. By contrast, a very high proportion (9.4%) of presentations relate to
some subtype of 928000 |disorder of musculoskeletal system|. Therefore this part of the caseload is aggregated
under multiple more granular sub-trees, including (separately) burns, abrasions, lacerations, blunt injury, crush
injury and foreign body.
These code aggregations can then be tracked across time to reveal trends in demand, disease incidence or resource
utilization.
7.2.2 Pharmacovigilance
Pharmacovigilance is the collection, detection, assessment, monitoring and prevention of adverse effects with
pharmaceutical products. It is concerned with identifying the hazards associated with pharmaceutical products and
minimizing the risk of any harm that may come to patients. An important part of pharmacovigilance is
postmarketing surveillance, which monitors the safety of a pharmaceutical drug or medical device after it has been
released on the market. Since drugs are approved on the basis of clinical trials, which involve relatively small
numbers of people, postmarketing surveillance plays an important part in further refining, confirming or denying
the safety of a drug in the general population.
Pharmacovigilance uses a number of data sources to assess and monitor the safety of licensed drugs, including
clinical trial data, medical literature, spontaneous reporting databases, prescription events, electronic health
records, and patient registries. Data mining of large volumes of clinical data can be used to highlight potential
safety concerns. However, current mechanisms to analyze this data are often both costly and insensitive.
The availability of large datasets of richly encoded SNOMED CT data within longitudinal healthcare records can
greatly assist pharmacovigilance. Where SNOMED CT is not used natively to capture clinical data, free text narrative
and other code systems may be mapped to SNOMED CT (see sections 5.1 Natural Language Processing and 5.2
Mapping Other Code Systems to SNOMED CT) to support a homogeneous approach to querying across diseases,
signs and symptoms, lab results, medications, devices, procedures, allergies, adverse reactions, body sites and
substances. SNOMED CT's polyhierarchy and defining relationships, which provide links between these domains,
provide a rich source of meaning-based information across which queries can be performed.
Many drug regulatory authorities and pharmaceutical companies currently use the Medical Dictionary for
Regulatory Activities (MedDRA) to classify adverse drug events. MedDRA is an international standard adverse event
classification used from pre-marketing through to post-marketing activities. However, as MedDRA was not designed
to support routine clinical data collection, its penetration into clinical systems is limited. Therefore mapping from
SNOMED CT to MedDRA would enable both styles of analysis and reporting to be performed from the same clinical
data. The UK Medicines and Healthcare products Regulatory Agency (MHRA) is working (with input from the
MedDRA Maintenance and Support Services Organization) to develop a mapping from a subset of SNOMED CT to
MedDRA for this purpose.
Clinical research typically involves the analysis of data from well-defined and homogeneous groups of patients with
a specific disease, at a specific stage, receiving similar treatments and often without significant co-morbidities. The
data may be captured prospectively or retrieved retrospectively.
SNOMED CT helps clinical research activities by assisting in the identification of clinical trial candidates, enabling
the powerful analysis of trial data, supporting predictive medicine, and improving the effectiveness of semantic
search over clinical knowledge.
In this section, we discuss three key aspects of clinical research that can benefit from the use of SNOMED CT:
identification of clinical trial candidates, predictive medicine, and semantic search.
8 Data Architectures
While the use of SNOMED CT for analytics does not dictate a particular data architecture, there are a few key
options to consider. In this section, we describe the major categories of data architecture that may be used to
perform analytics over SNOMED CT enabled patient data, including:
1. Analytics directly over patient records;
2. Analytics over data exported to a data warehouse;
3. Analytics over a Virtual Health Record (VHR);
4. Analytics using distributed storage and processing.
Please note that some of these approaches may be used in combination. For example, data warehouses with large
volumes of data may use distributed storage and processing for enhanced performance, and querying directly over
disparate patient records could be performed using a Virtual Health Record.
Commercial data warehousing solutions that support SNOMED CT include Cambio's COSMIC Intelligence, Cerner's
PowerInsight Data Warehouse (PIDW) and Cerner's Health Facts Data Warehouse.
The process of transforming the logical query into separate physical queries may involve translating:
• The Query Language – from a common query language to the local data store's native query language
• Data Model References – from the common data model to the local data model
• Terminology References – from the standard terminology to the local code system
For example, if the user poses the following SQL query, written in terms of the VHR's common data model, to select
those patients with a diagnosis that is a subtype of 40733004 |infectious disease|:
SELECT patient_id FROM Health_Records
WHERE diagnosis IN (<40733004 |infectious disease|)
This query may be translated into the following three queries for local execution on each data store:
Data Store A:
Patient_record/patient_id[@diagnosis=typeOf(INF)]
Data Store B:
SELECT id FROM EHR NATURAL JOIN DSummary
WHERE discharge_diagnosis IN (descendantsOf (40733004))
Data Store C:
SELECT patient FROM record
WHERE diag IN (<40733004)
Similarly, when the query results are returned by each data store, these need to be transformed and mapped into
the common data model and then combined for presentation to the user.
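The three translations listed above (query language, data model references and terminology references) can be sketched in a few lines of Python. This is a minimal illustration only, not part of any VHR product: the class LocalStore, the function merge_results and the per-store templates are hypothetical names, the templates simply reuse the three local queries shown above, and patient identifiers are assumed to share one namespace across stores.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class LocalStore:
    """One physical data store behind the VHR (all settings are illustrative)."""
    name: str
    query_template: str  # carries the local query language and local data model names
    code_map: Dict[int, str] = field(default_factory=dict)  # SNOMED CT id -> local code

    def translate(self, concept_id: int) -> str:
        """Rewrite 'subtypes of concept_id' as a query this store can execute."""
        # Terminology translation: fall back to the SNOMED CT id when no local code exists.
        local_code = self.code_map.get(concept_id, concept_id)
        return self.query_template.format(code=local_code)


STORES = [
    LocalStore("A", "Patient_record/patient_id[@diagnosis=typeOf({code})]",
               {40733004: "INF"}),
    LocalStore("B", "SELECT id FROM EHR NATURAL JOIN DSummary "
                    "WHERE discharge_diagnosis IN (descendantsOf ({code}))"),
    LocalStore("C", "SELECT patient FROM record WHERE diag IN (<{code})"),
]


def merge_results(results_by_store: Dict[str, List[str]]) -> List[str]:
    """Combine the patient identifiers returned by each store, without duplicates."""
    merged = set()
    for patient_ids in results_by_store.values():
        merged.update(patient_ids)
    return sorted(merged)


if __name__ == "__main__":
    for store in STORES:
        print(store.name, "->", store.translate(40733004))
```

Keeping the local query language and data model names inside one template per store means that adding a data store only requires a new template and, where needed, a code map.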
The VHR approach provides an alternative architecture to a data warehouse for integrating heterogeneous systems.
It is most commonly used when copying clinical data into a data warehouse is not possible (e.g. due to legislative
requirements), or when the currency of the data is imperative. The challenges with this approach lie with the
potential complexity of the transformations required. The implementation of this approach is considered to be a
type of heterogeneous distributed database, as described in Section 8.4 Distributed Storage and Processing.
• Local autonomy of data (e.g. each department or institution controls their own data)
• Distributed query processing can improve performance, as the load can be balanced among the servers
A number of tools are available for the distributed storage and processing of big data, including Apache Hadoop.
Apache Hadoop is an open-source software framework, which splits files into large blocks and distributes these
blocks amongst the nodes in the cluster. To process the data, Hadoop sends code to the nodes that have the
required data, and the nodes then process the data in parallel. Hadoop supports horizontal scaling – that is, as data
grows additional servers can be added to distribute the load across them.
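The split, process-in-parallel and combine pattern described above can be illustrated without a cluster. The sketch below does not use any Hadoop API: threads stand in for cluster nodes, and the function names (split_into_blocks, count_codes, distributed_code_counts) and sample records are assumptions made purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Split the data into fixed-size blocks, one per (simulated) node."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def count_codes(block):
    """Work done locally on one block: count how often each code occurs."""
    return Counter(code for _patient, code in block)


def distributed_code_counts(records, block_size=2, workers=2):
    """Process every block in parallel, then combine the partial counts."""
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_codes, blocks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    sample = [("p1", 140004), ("p2", 181007), ("p3", 140004),
              ("p4", 22253000), ("p5", 140004)]
    print(distributed_code_counts(sample))
```

Because each block is counted independently, adding workers (or, in a real cluster, servers) spreads the load without changing the combine step.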
Many distributed database solutions use NoSQL (Not Only SQL) systems. NoSQL systems are increasingly being
used for big data, as they provide a mechanism for storage and retrieval of data in a variety of structures, including
relational, key-value, graph or document. Oxford University, in collaboration with Kaiser Permanente (case
study 13.1.2 Kaiser Permanente), is using a NoSQL database (RDFox) to investigate how to perform complex
queries efficiently across extremely large numbers of patient records. RDFox is a highly scalable and performant
NoSQL database that is readily distributed across parallel processing units.
9 Database Queries
Practically all analytical processes are driven by database queries. A database query is a machine-readable question
presented to a database in a predefined language.
Unlike other code systems, which either have no hierarchy or a hierarchy that is fully represented within the code
(e.g. H65.9), just retrieving the SNOMED CT codes recorded in a patient record does not fully utilize the analytics
capabilities of SNOMED CT. To get the most benefit from using SNOMED CT in patient records, one must be able to
not only query the records themselves, but also query SNOMED CT.
In this section, we describe how record and terminology queries can work together to perform powerful queries
over SNOMED CT enabled data. In section 10 User Interface Design, we will then consider how user interfaces can be
designed to make these queries more accessible to non-technical users.
defined diseases which have a preferredTerm (in the GB English language reference set) that contains the substring
"heart".
<< 64572001 |disease| {{ definitionStatus = 900000000000073002 |defined|,
preferredTerm = "*heart*", languageRefSet = 900000000000508004 |GB English| }}
B2i's Snow Owl Terminology Server (see case study) supports the execution of SNOMED CT queries using a
precursor to the SNOMED CT Expression Constraint Language (referred to as 'Extended SNOMED CT Compositional
Grammar' or 'ESCG').
<<404684003|Clinical finding| :
246454002|Occurrence| = 255399007|Congenital|,
370135005|Pathological process|=<<263680009|Autoimmune|
[Link] { println "ID: ${[Link]}, ${[Link]}" } //prints the result to the console
Standardized APIs for terminology services are also available. In particular, HL7's Common Terminology Services 2
(CTS 2) provides a standardized API that supports access to terminology servers that may contain a variety of code
systems, including SNOMED CT.
One way of achieving this is to include a list of all possible SNOMED CT codes that are required within the query. For
example, to find the patients with a Respiratory system disorder, one could include every individual code that is a
descendant of 50043002 |disorder of respiratory system| (around 3000 codes) within the patient record query. Using
SQL, this would look like:
SELECT DISTINCT PatientID FROM ProblemList
WHERE Code IN (140004, 181007, 222008, 490008, 517007, 599006, 652005, 663008, etc)
However, this creates a lengthy query that is difficult to both validate and maintain. In some cases, it may also be
too long to be accepted by the query engine.
Another approach would be to use a subset of respiratory system disorders, and load these into a separate table –
for example:
SELECT DISTINCT PatientID FROM ProblemList
WHERE Code IN (SELECT * FROM RespiratorySystemDisorders)
However, it may not be scalable to create a new table for each terminology query that is required.
A third approach would be to use a transitive closure table to test the hierarchical relationship between each
SNOMED CT code and 50043002 |disorder of respiratory system|. For example,
SELECT DISTINCT PatientID FROM ProblemList PL
INNER JOIN SNOMEDTransitiveClosure TC ON [Link] = [Link]
WHERE [Link] = 50043002
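The transitive closure approach can be tried end to end with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the column names SubtypeId and SupertypeId are hypothetical (the source does not name the columns of the transitive closure table), and the rows are illustrative test data rather than a real SNOMED CT release.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny problem list and closure table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ProblemList (PatientID TEXT, Code INTEGER);
        CREATE TABLE SNOMEDTransitiveClosure (SubtypeId INTEGER, SupertypeId INTEGER);
    """)
    conn.executemany("INSERT INTO ProblemList VALUES (?, ?)", [
        ("p1", 140004), ("p1", 181007), ("p2", 181007),
        ("p3", 22253000),  # |pain|, not a respiratory disorder
    ])
    # One row per (descendant, ancestor) pair; assumed column names.
    conn.executemany("INSERT INTO SNOMEDTransitiveClosure VALUES (?, ?)", [
        (140004, 50043002), (181007, 50043002),
    ])
    return conn


def patients_with_subtype_of(conn, supertype_id):
    """Return patients whose recorded code is a descendant of supertype_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT PL.PatientID
        FROM ProblemList PL
        INNER JOIN SNOMEDTransitiveClosure TC ON PL.Code = TC.SubtypeId
        WHERE TC.SupertypeId = ?
        ORDER BY PL.PatientID
        """,
        (supertype_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(patients_with_subtype_of(build_demo_db(), 50043002))
```

The join replaces the long IN list with a single parameter, so the same query text serves any supertype.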
However, to support a more advanced style of query that utilizes the full capabilities of SNOMED CT, SNOMED CT
query languages or API calls must be embedded within the patient record query languages. For example, the
following queries use the SNOMED CT Expression Constraint Language embedded within a SQL query.
SELECT DISTINCT PatientID FROM ProblemList
WHERE Code IN (< 50043002 |disorder of respiratory system| )
SELECT DISTINCT PatientID FROM ProblemList
WHERE Code IN (<< 404684003 |clinical finding|:
363698007 |finding site| = << 39057004 |pulmonary valve structure|,
116676008 |associated morphology| = << 415582006 |stenosis|)
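To show what a terminology service has to do when it resolves a refined constraint such as the second query above, the following toy evaluator checks the focus concept and each attribute value for subsumption. It is a simplified sketch, not an implementation of the Expression Constraint Language: relationship groups are ignored, the concept identifiers from 900001 upwards are made up, and the IS_A and ATTRIBUTES tables are illustrative data only.

```python
FINDING_SITE = 363698007
ASSOCIATED_MORPHOLOGY = 116676008

# child -> parents; identifiers from 900001 upwards are hypothetical
IS_A = {
    900001: {404684003},
    900002: {404684003},
    900003: {404684003},
    900004: {39057004},  # hypothetical subtype of |pulmonary valve structure|
}

# defining attributes of each (hypothetical) clinical finding
ATTRIBUTES = {
    900001: {FINDING_SITE: 39057004, ASSOCIATED_MORPHOLOGY: 415582006},
    900002: {FINDING_SITE: 900004, ASSOCIATED_MORPHOLOGY: 415582006},
    900003: {FINDING_SITE: 900005, ASSOCIATED_MORPHOLOGY: 415582006},  # other site
}


def ancestors_or_self(concept):
    """Collect the concept and everything reachable through is-a links."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subtype_or_self(concept, ancestor):
    """True when concept satisfies '<< ancestor'."""
    return ancestor in ancestors_or_self(concept)


def evaluate(focus, refinements):
    """Return concepts that are << focus and whose attribute values are << each target."""
    matches = []
    for concept, attributes in ATTRIBUTES.items():
        if not is_subtype_or_self(concept, focus):
            continue
        if all(
            attribute in attributes
            and is_subtype_or_self(attributes[attribute], target)
            for attribute, target in refinements.items()
        ):
            matches.append(concept)
    return sorted(matches)


if __name__ == "__main__":
    print(evaluate(404684003, {FINDING_SITE: 39057004,
                               ASSOCIATED_MORPHOLOGY: 415582006}))
```

The resulting list of concept identifiers is what the enclosing SQL IN clause would then be tested against.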
Figure 10.1-1: B2i's Snow Owl interface for authoring Simple Reference Sets
Figure 10.1-2: NHS Data Migration Workbench interface for authoring queries
Figure 10.1-3: B2i's Snow Owl interface for authoring text-based queries
SNOMED CT's rich polyhierarchy provides a vast number of potential 'aggregators' for analytics, and possible views
of SNOMED CT encoded data. This polyhierarchy can be exploited by visual exploratory data analysis tools to
enable the visual inspection of complex datasets.
For example, the NHS have been using the Gephi open-source network analysis and visualization software to
explore SNOMED CT encoded renal datasets.
The first representation (in Figure 10.2-2) shows a projection of all concepts directly coded in the patient data, with
the node size reflecting the frequency of each code. 36689008 |acute pyelonephritis| has a high frequency in the
data and is therefore represented by a big node, while 254915003 |clear cell carcinoma of kidney| has a low
frequency in the data and is therefore represented by a small node.
Using a simple concentration algorithm, which aggregates subsumed concepts up to a given threshold, the
representation in Figure 10.2-3 is achieved. In this representation, the size of the purple nodes reflects the
frequency of each code plus its subtypes, the size of the blue nodes reflects the frequency of each code's subtypes,
and the size of the red nodes reflects the frequency of each code on its own. This enables trends to be visually
detected – for example, 36171008 |glomerulonephritis| and 36689008 |acute pyelonephritis| - even when the
frequency of these concepts themselves is relatively low.
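The three frequencies described above (a code on its own, its subtypes, and both together) can be computed with a simple roll-up over the hierarchy. The sketch below is an assumption-laden illustration: the source does not specify the concentration algorithm or its threshold, so thresholding is omitted, and the root identifier 1 and child identifiers 2 and 3 are placeholders rather than SNOMED CT concepts.

```python
def descendants(concept, children):
    """All subtypes of a concept; each is visited once even in a polyhierarchy."""
    seen = set()
    stack = [concept]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def rollup(direct_counts, children):
    """For every concept, report its own, inherited and combined frequency."""
    concepts = set(direct_counts) | set(children)
    for child_list in children.values():
        concepts.update(child_list)
    result = {}
    for concept in concepts:
        own = direct_counts.get(concept, 0)
        inherited = sum(direct_counts.get(d, 0)
                        for d in descendants(concept, children))
        result[concept] = {"own": own, "subtypes": inherited,
                           "total": own + inherited}
    return result


if __name__ == "__main__":
    # parent -> children; 1, 2 and 3 are placeholder identifiers
    children = {1: [36171008, 36689008], 36171008: [2, 3]}
    direct = {36689008: 40, 36171008: 1, 2: 15, 3: 10}
    print(rollup(direct, children)[36171008])
```

In this toy data the glomerulonephritis code is recorded only once directly, yet its combined frequency is much higher, which is the kind of trend the aggregated view is meant to expose.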
Figure 10.2-2: Gephi representation of renal dataset showing direct code usage
Figure 10.2-3: Gephi representation of renal dataset with direct and inherited code usage
Innovative data visualization and analysis tooling is expected to become much more widespread as the powerful
capabilities of SNOMED CT content are increasingly utilized.
11 Challenges
This section discusses some of the challenges which should be considered when performing analytics over clinical
data. Most of these challenges result from the fundamental nature of health record information, and therefore exist
irrespective of the code system used. Many of these challenges can be mitigated using the unique features of
SNOMED CT. The challenges fall into four broad categories:
• Reliability of patient data
• Terminology / information model boundary issues
• Concept definition issues
• Versioning
SNOMED CT offers significant advantages, compared to other code systems, in both performing powerful clinical
analytics, and in mitigating many of these challenges.
these codes to some degree. Any changes that affect the clinical meaning of the data may have an impact on the
quality of data analytics. SNOMED CT helps to mitigate this by supporting the representation of equivalence maps,
which can be used when the use case requires.
Even when the same information model is used, different systems may populate this model with differing levels of
precoordination. For example, the three clinical systems shown below in Figure 11.2-2 each collect data about a
'suspected lung cancer' diagnosis in a different way. For this reason, when given a common data model (as shown
in Figure 11.2-3), different systems may populate this in different ways. When this occurs, queries must be careful to
consider all possible representations of the data, to ensure that contextual and qualifying information about each
code is correctly interpreted.
SNOMED CT is in the unique position to be able to resolve many of these challenges, using the techniques described
in sections 6.4 Description Logic Over Terminology and 6.5 Description Logic Over Terminology and Structure. For
example, SNOMED CT enables the computation of equivalence and subsumption between alternative
representations of data. For instance, the postcoordinated expression 22253000 |pain| : 363698007 |finding site| =
56459004 |foot| (which can be represented either in a single data element or using two separate data elements for
22253000 |pain| and 56459004 |foot|) can be automatically determined to be equivalent to the precoordinated
concept 47933007 |foot pain| (stored in a single data element).
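The foot pain example can be mimicked with a very small normalization step. This sketch is far weaker than the description logic classification the text refers to: it only compares flat definitions for an exact match, and the DEFINITIONS table (holding the definition of 47933007 |foot pain| as stated above) and the function names are assumptions made for illustration.

```python
FINDING_SITE = 363698007

# precoordinated concept -> (focus concept, defining attribute/value pairs)
DEFINITIONS = {
    47933007: (22253000, frozenset({(FINDING_SITE, 56459004)})),
}


def normalize(expression):
    """Reduce either representation to a comparable (focus, attributes) form.

    An expression is a concept id, or a (focus, {attribute: value}) pair.
    """
    if isinstance(expression, int):
        return DEFINITIONS.get(expression, (expression, frozenset()))
    focus, refinements = expression
    return (focus, frozenset(refinements.items()))


def from_two_elements(finding, site):
    """Build an expression from separate 'finding' and 'site' data elements."""
    return (finding, {FINDING_SITE: site})


def equivalent(left, right):
    """True when both representations normalize to the same form."""
    return normalize(left) == normalize(right)


if __name__ == "__main__":
    print(equivalent(47933007, from_two_elements(22253000, 56459004)))
```

A production implementation would instead classify the expressions with a description logic reasoner, which also handles nested definitions and subsumption rather than only exact equality.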
Some cases exist, however, where SNOMED CT is not currently able to automatically establish equivalence. These
cases primarily relate to concepts for which the SNOMED CT concept model does not yet fully model their meaning.
For example, the two approaches for representing a 'twin pregnancy' shown below (Figure 11.2-4) cannot currently
be computed as equivalent using SNOMED CT.
Figure 11.2-4: Two non-equivalent ways of recording a twin pregnancy using SNOMED CT
The SNOMED CT concept model continues to be extended to support equivalence and subsumption testing within
an increasing number of hierarchies of SNOMED CT.
64572001 |disease| :
246075003 |causative agent| = 113858008 |mycobacterium tuberculosis complex|,
363698007 |finding site| = 39607008 |lung structure|
This expression would not be returned by the following query:
<< 154283005 |pulmonary tuberculosis|
However, the query:
Incomplete Modelling
The SNOMED CT Concept Model continues to evolve to allow more concepts to be fully defined. For example, the
'Observable Entity' and 'Substance' hierarchies each have new concept models being developed, which will allow
these concepts to be more fully defined in future releases of SNOMED CT. When the concept models for these
hierarchies are incorporated, SNOMED CT's expressive power and analytics capabilities will be further expanded.
In those hierarchies for which the concept model has been established for some time (e.g. Clinical finding), ongoing
expansion to SNOMED CT's formal logical definitions continues. However, there still remain some concepts that
do not yet have all possible defining relationships included. This issue will be mitigated over time as more of
SNOMED CT's concepts continue to be modelled.
11.4 Versioning
A new version of the International Edition of SNOMED CT is released twice a year (in January and July). National
extensions mostly follow this cycle (albeit typically with a three-month delay). However, some extensions (notably
those including medication related concepts) are released more frequently.
When a longitudinal health record is populated with clinical data over a number of years, it is quite possible that the
following may occur:
1. SNOMED CT concepts that were active at the time of recording have since been made inactive
2. SNOMED CT concepts that were primitive at the time of recording have since been defined
3. Reference sets that were used to populate pick lists may have changed
4. The SNOMED CT Concept Model that was used to construct expressions may have changed
To mitigate these versioning issues, SNOMED CT provides the following:
1. Each new version of the SNOMED CT International Edition that is released (in Release Format 2, RF2)
includes a set of Delta files (containing all changes to the content since the last release), a set of Snapshot
files (containing the most recent version of every component that has ever been released in SNOMED CT),
and a set of Full files (containing every version of every component that has ever been released in SNOMED
CT). These files allow implementations to either incrementally adapt to new versions of SNOMED CT, or
alternatively load a complete current snapshot of SNOMED CT content (with or without old versions). When
longitudinal clinical records containing inactive concepts are queried, all prior descriptions and
relationships of these inactive concepts can still be queried using these snapshot files. SNOMED CT's RF2
distribution files also record the reason that each inactive component was inactivated, using 'historical
association' reference sets (see [Link].R Historical Association Reference Sets for more details).
2. SNOMED CT is maintained on the principle that every SNOMED CT concept identifier should retain its
semantic integrity over time, even when its logical definition changes. The semantics of a SNOMED CT
concept is established through its Fully Specified Name, and all changes to a concept's defining
relationships are intended to improve the machine-readable processing of these semantics. That said, it is
possible, if required, to determine what the logical definition of a concept was at any prior point in time using
a Full release of SNOMED CT.
3. SNOMED CT's reference sets and their members are all fully versioned in SNOMED CT's RF2. A Snapshot
release of a reference set includes the current version of every row that has ever been released (including
both active and inactive rows). A Full release of a reference set includes every version of every row that has
ever been released. Using this information, it is possible to adapt queries to consider both current and
former members of any given reference set.
4. The SNOMED CT Concept Model changes very rarely. When it does, however, any attributes that are retired
are retained as inactive concepts in the Snapshot and Full releases of SNOMED CT. It is expected that a
complete Machine Readable Concept Model (MRCM) of SNOMED CT will be published in the future, and that
this MRCM will be versioned in a manner that is consistent with other RF2 components.
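The relationship between Full and Snapshot releases described in the list above can be made concrete: the state of a component at any date is the latest Full row whose effectiveTime is not after that date. The sketch below assumes rows are held as dictionaries using RF2 column names (id, effectiveTime, active, definitionStatusId); the concept identifier 900020 and its history are invented test data, and snapshot_as_of is a hypothetical helper name.

```python
PRIMITIVE = "900000000000074008"
DEFINED = "900000000000073002"

# Invented history of one concept, in the shape of a Full release concept file.
FULL_ROWS = [
    {"id": "900020", "effectiveTime": "20100131", "active": "1",
     "definitionStatusId": PRIMITIVE},
    {"id": "900020", "effectiveTime": "20150131", "active": "1",
     "definitionStatusId": DEFINED},
    {"id": "900020", "effectiveTime": "20200131", "active": "0",
     "definitionStatusId": PRIMITIVE},
]


def snapshot_as_of(full_rows, as_of):
    """Return, per component id, the most recent row effective on or before as_of.

    effectiveTime values are YYYYMMDD strings, so string comparison orders them.
    """
    latest = {}
    for row in full_rows:
        if row["effectiveTime"] > as_of:
            continue  # this version had not been released yet
        current = latest.get(row["id"])
        if current is None or row["effectiveTime"] > current["effectiveTime"]:
            latest[row["id"]] = row
    return latest


if __name__ == "__main__":
    for date in ("20120101", "20160101", "20210101"):
        row = snapshot_as_of(FULL_ROWS, date)["900020"]
        print(date, row["active"], row["definitionStatusId"])
```

Running the same function with the date on which a record entry was made recovers whether the concept was active, and whether it was primitive or defined, at the time of recording.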
We welcome additional input to this section and anticipate updates to this report as new information becomes
available.
The UK Terminology Centre Data Migration Workbench (DMWB) is designed to support the NHS Primary Care
Summary Care Record, Primary Care Systems of Choice and Data Migration programs. This tool demonstrates some
of the properties and advanced uses of the data migration and mapping products published by the UKTC and the
terminologies and classifications that they link.
The workbench uses SNOMED CT to perform novel and sophisticated analyses of patient data. It has immediate 'off
the shelf' international utility despite the inclusion of the UK-only terminologies within the standard tool
distribution.
The software contains SNOMED CT, Read Codes Version 2 and CTV3, maps between these, and maps to the ICD-10
International Edition (the UK map is not the same as the SNOMED International one) and to the OPCS Classification of
Interventions and Procedures (OPCS-4). The Workbench modules support:
1. Searching and browsing the hosted code systems;
2. Viewing maps between the hosted code systems;
3. Authoring analytics subsets (i.e. terminology query predicates) and their testing, maintenance and
'translation' between code systems;
4. Electronic Patient Record (EPR) data quality analysis and data repair; and
5. EPR reporting and case mix analysis.
Queries Tool
The Queries Tool offers advanced functionality for authoring, maintaining and testing query code sets (called
'clusters') or subset definitions within any of the supported terminologies or classifications. One major application
is to produce query sets which will return comparable results from patient records encoded with different code
systems. To assist with this, the tool translates subset definitions expressed using one terminology into subset
definitions expressed using another, allowing refinement by manual editing (see Figure 12.1.1-1 and Figure
12.1.1-2).
Figure 12.1.1-2: DMWB Queries Tool showing SNOMED CT asthma subset translated into
ICD-10
Analytics
The workbench data analytics tool runs cluster queries, combined with demographic data, to perform clinically
valuable case finding, case mix and caseload analysis.
The 'Overview' Report module includes:
• Basic demographics (population age, sex, ethnicity);
• Analyses of episodes with a SNOMED CT code;
• Counts by SNOMED CT supercategory;
• List of the 100 most frequently used individual SNOMED CT codes; and
• List of the 15 most common SNOMED CT codes for each age cohort.
The Trends module analyzes the frequency with which individual SNOMED CT codes are used in the EPR instance
data, looking for those whose recording frequency has changed over the course of the data collection period.
The Induce module performs a more sophisticated analysis of case mix and caseload trends within a clinical
department. Instead of returning the most frequently used individual codes, the Induce module attempts to
identify the most frequently used types of codes. For example, an emergency department may use roughly 500
different SNOMED CT codes for a laceration in a particular anatomical location. While none of the site-specific codes
may appear in a list of most common codes, the descendants of 312608009 |laceration| may collectively account for
a significant part of the department's workload.
The Graphs tool performs fundamentally the same query and search operations, but generates graphs based on the
patients or episodes identified, showing e.g. the age:sex distribution of patients in a defined casemix cohort, or the
changing incidence of one or more specified clinical phenomena (e.g. disease presentation, or procedure
performed) by year, quarter, month or day of the week. These graphs can be copied into documents.
Kaiser Permanente (KP) has been involved in the development of SNOMED CT since its inception. Preceding this, KP
collaborated with the College of American Pathologists in the 1990s on the immediate predecessor of SNOMED CT
(SNOMED-RT). Some of the very earliest deployments of SNOMED CT have been within KP electronic patient record
systems.
The terminology services deployed within the KP HealthConnect electronic health record illustrate the practical use
of SNOMED CT as a key reference terminology within a multi-coding system environment. KP is also at the forefront
of realizing new possibilities offered by SNOMED CT using its description logic capabilities.
CMT uses SNOMED CT as a reference terminology, taking advantage of its poly-hierarchy and definitional attributes
to support advanced analytics – for example:
• Identifying patient cohorts with certain conditions for Population Care.
• Identifying subsets for use as "input criteria" for KPHC decision support modules, such as Best Practice
Alerts, Reminders, etc.
• Performing queries such as "find all conditions where |causative agent| is |Aspergillus (organism)|"
• Performing large aggregate queries, such as "find all patients coded with concepts in the cardiovascular
disorders subset"
In September 2010 Kaiser Permanente, IHTSDO and the US Department of Health and Human Services jointly
announced KP's donation of their CMT content and related tooling to SNOMED International. The donation consists
of terminology content (including several CMT subsets) and tools to help create, manage and quality control
terminology.
The National Release Centre of Denmark (National eHealth Authority) produces a SNOMED CT drug extension for
medications. The Danish SNOMED CT drug extension was primed by data extraction, cleansing and conversion of
content from the Danish Medicine Agency Database (DKMDB), which is primarily meant for pricing and stock
handling. The DKMDB was then complemented with SNOMED CT substances and their unique IDs. The Danish
SNOMED CT drug extension includes information such as trade names, substances, dose forms, strengths and units
of measure.
Building upon the Danish drug extension, the National eHealth Authority is working to introduce centralized
decision support (CDS) services for both primary care and hospital prescribing systems.
The CDS server will respond to web service requests from the various electronic medication systems and return
alerts and other prescribing information (see Figure 12.1.3-1).
Allergies Register
© Copyright 2021 International Health Terminology Standards Development Organisation 80
Data Analytics with SNOMED CT
(2021-03-11)
A group of allergy specialists, family practitioners and CDS experts are developing a standard set of information to
be used in a patient drug allergy register. A SNOMED CT subset, from the Drug Allergy (disorder) sub hierarchy in the
Findings hierarchy, is used to document allergies.
Interactions Service
Based on an existing service, with data primarily drawn from peer-reviewed literature, the interaction database
describes 2,500 interactions between different drugs based on their ingredients.
The database contains a short description of all interactions and a recommendation of how the physician can
handle the interaction. The ingredients have been linked to SNOMED CT substances to directly inform the decision
support service.
Alert Filtering
The decision support platform will incorporate an alert filtering service in which physicians can set up their
personal preferences for the displaying of alerts. For example, the dose form hierarchy of SNOMED CT will be used
to enable filtering of unwanted alerts for specific dose forms (such as cutaneous dose forms).
A major application for Natural Language Processing technologies is indexing collections of free text transcripts or
documents such that topic specific searches may be run on them. The challenge is to return ranked matches which
permit selection of texts with high sensitivity and high specificity (i.e. that relevant documents are rarely
overlooked and that irrelevant documents are rarely returned).
Clinical searches may be performed over transcripts or documents that reside in an electronic library, within
medical records, or the Internet. Examples of searches include:
• "Show me articles on this website concerned with inflammatory bowel disease"
• "Does this patient have transcripts in their record suggesting a heart rhythm disturbance?"
Bevan Koopman's PhD thesis explores semantic and statistical approaches to search. The intention is to move
beyond the limitations of plain keyword searching strategies for medical document retrieval. Characterizing these
limitations as the 'semantic gap', Koopman identifies and addresses several issues including:
• Vocabulary mismatch: hypertension vs. high blood pressure
• Granularity mismatch: antipsychotic vs. haloperidol
• Conceptual implication: e.g. from hemodialysis infer kidney failure
• Inferences of similarity e.g. comorbidities (anxiety and depression)
His specific aim was to determine whether graph-based features and the propagation of information over a graph
can provide an inference mechanism to bridge this semantic gap. As part of this work, he assessed the contribution
of using SNOMED CT data within the graphs used to drive inferences.
Documents were parsed and analyzed using Lemur – a highly versatile and customizable open source information
retrieval package developed at the University of Massachusetts. The construction of the graph was done using the
open source LEMON graph library. The graph was serialized using LEMON and stored inside the Lemur index
directory. For the MedTrack corpus, which was found to have a vocabulary size of 36,467 SNOMED CT concepts, the
resulting graph was 4.4MB.
Discussion
The findings of the thesis demonstrated that the graph based retrieval approaches using SNOMED CT derived data
performed better than other approaches on 'hard queries'. A number of additional insights were also revealed.
First, hard queries require inference and easy queries do not. Hard queries tended to be verbose and often
contained multiple dependent aspects to the query (for example, a procedure and a diagnosis concept). Re-ranking
using the Graph Inference model was effective here. Easy queries tended to have a small number of relevant
documents and an unambiguous query concept. For these queries, inference was not required and the
Bag-of-concepts model was most effective. Overall, when valuable domain knowledge was provided by SNOMED CT, the
Graph Inference model was effective, either by returning new relevant documents or by effectively re-ranking
those selected. This again highlights the dependence on the underlying domain knowledge.
Regarding the residual lack of sensitivity of all the IR strategies, Koopman suggests that an ideal ontology for
information retrieval would not only contain definitional but also assertional data – for example "captopril can be
used as a treatment of hypertension", "myocardial infarction [may] cause heart block" and "diabetes mellitus may
lead to renal failure".
Migration to native SNOMED CT electronic patient records is in progress in the United Kingdom National Health
Service (NHS). In order to promote interoperability, usability and activity reporting, the NHS introduced a national
standard set of imaging codes in 2005 – the National Clinical Imaging Procedure code set (NCIP).
While SNOMED CT was the prime candidate for populating the NCIP, many Radiology Information Systems (RIS) and
Picture Archiving and Communication (PAC) systems at the time could not accommodate SNOMED CT 18-digit
concept identifiers or (up to) 255 character descriptions without disruptive and costly software changes. There was
also no consistent way to represent laterality of procedures, and some legacy systems required the creation of
separate orderable items for each laterality – for example 'Plain X-ray left wrist', 'Plain X-ray right wrist', and
'Plain X-ray both wrists'. For these reasons, the NCIP code set was developed based on SNOMED CT, but with the addition of
unique identifiers compatible with legacy systems' character limitations (6 alphabetic characters), up to 40
character human readable descriptions, and additional laterality metadata. For example, 60027007 |Radiography of
wrist| is represented within NCIP as:
SCT ID Laterality_ID Laterality Short_Code Preferred
NCIP short codes are 'meaningful', in that the modality of the procedure is defined by the first character of the code,
and the finding site and laterality are both explicitly represented in the code.
Each hospital submits mandatory data extracts using NCIP from both legacy and SNOMED CT capable RIS. In
addition to details of the imaging procedures, information about the referral source, patient type, demographics
and times of each imaging related event are also collected centrally. The data from all sites is then combined and
multiple reports are extracted. Hospitals can view their activity data via the iView web based reporting tool (see
Figure 12.1.5-1) and compare their activity with other centers.
Analytics on this central platform are wholly SNOMED CT based. SNOMED CT hierarchies support sophisticated
reports – for example, the monthly waiting times for Magnetic Resonance Imaging excluding Cardiac MRI and MRI
guided procedures is specified as:
• Includes hierarchy << 113091000 |Magnetic resonance imaging|
• Excludes hierarchy << 258177008 |Magnetic resonance imaging guidance|
• Excludes hierarchy << 241620005 |Magnetic resonance imaging of heart|
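The include and exclude rules above amount to a set difference over hierarchy expansions. The following sketch illustrates this with a toy hierarchy: the parent-child links, the identifiers 900010 and 900011, the waiting times and the function names are all assumptions for illustration and do not reflect the iView implementation or real SNOMED CT content.

```python
# parent -> children; 900010 and 900011 are hypothetical MRI procedures
CHILDREN = {
    113091000: [241620005, 258177008, 900010],
    241620005: [900011],
}


def descendants_or_self(concept, children):
    """Expand '<< concept' over the given hierarchy."""
    seen = {concept}
    stack = [concept]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def resolve(includes, excludes, children):
    """Codes in any included hierarchy that are not in any excluded hierarchy."""
    selected = set()
    for concept in includes:
        selected |= descendants_or_self(concept, children)
    for concept in excludes:
        selected -= descendants_or_self(concept, children)
    return selected


def mean_wait(records, codes):
    """Average waiting time (days) over records whose code is in the selection."""
    waits = [days for code, days in records if code in codes]
    return sum(waits) / len(waits) if waits else None


if __name__ == "__main__":
    codes = resolve([113091000], [258177008, 241620005], CHILDREN)
    records = [(900010, 20), (900011, 60), (113091000, 30), (258177008, 5)]
    print(sorted(codes), mean_wait(records, codes))
```

Because the selection is defined by hierarchy rather than by an enumerated list, new MRI procedure codes added under the included concept are picked up without editing the report definition.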
Figure 12.1.5-1: Detail of SNOMED CT based report on the NHS iView platform
We welcome additional input to this section and anticipate updates to this report as new information becomes
available.
The 3M Healthcare Data Dictionary (HDD) is a controlled medical vocabulary server. The HDD has been continuously
expanded and maintained for over 15 years, both as a standalone product and embedded within several of 3M's
core products and services. The 3M HDD enables mapping and management of medical terminologies, integration
of content and standardization of healthcare data. The 3M Healthcare Data Dictionary incorporates a selection of
standard healthcare terminologies, including (but not limited to) SNOMED CT, LOINC, RxNorm, ICD-9-CM and
ICD-10-CM.
Concepts in the HDD are grouped and organized using both hierarchical and non-hierarchical relationships. One of
the hierarchical relationships in the HDD is SNOMED CT's 'is a' relationship which allows users to programmatically
use and analyze SNOMED CT concepts captured at various levels of granularity. The analytics capabilities of the
HDD are also extended through the use of other relationship types.
Data Warehousing
The content within the HDD makes a key contribution to analytics in several settings. For example, one large
academic research institution uses the HDD to integrate over 100,000 medication concepts from disparate systems
for comprehensive data assimilation. Many of the medication concepts are mapped to RxNorm codes and linked
through a 'has ingredient' relationship to SNOMED CT codes.
The 3M HDD has a knowledge base and poly-hierarchical structure that defines the relationships between each
clinical drug. Figure 12.2.1-1 shows the relationships that exist for Ramipril including the links to SNOMED CT
content, which can be used to query the data warehouse. The knowledge base allows the hospital's researchers to
customize their searches by various levels of granularity and organize their clinical content into meaningful
relationships.
12.2.2 Allscripts
Allscripts Healthcare Solutions, Inc. (Allscripts) is a provider of clinical, financial, connectivity and information
solutions and related professional services to hospitals, physicians and post-acute organizations. The Company
provides a variety of integrated clinical software applications for hospitals, physician practices and post-acute
organizations. For hospitals and health systems these applications include its Sunrise Enterprise suite of
clinical solutions, consisting of a range of acute care Electronic Health Record (EHR), integrated with financial/
administrative solutions, including performance management and revenue cycle/access management. The
Company's acute care solutions include Emergency Department Information System (EDIS), care management
and discharge management. 1
For more information please visit [Link]
Allscripts released their first version of Vocabulary Management utilizing SNOMED CT in 2005. Since then Allscripts
systems have been able to utilize SNOMED CT for clinical decision support and reporting. Allscripts uses a common
terminology platform for all three electronic health record systems: Sunrise Clinical Manager™, Sunrise Acute Care™
and Sunrise Ambulatory Care™.
When a query is performed over a health record that requires clinical terminology, the terminology service always
returns a SNOMED CT code. If the primary code stored in the health record is not SNOMED CT (e.g. ICD-9 or ICD-10),
then the terminology service performs the mapping to SNOMED CT, saves the SNOMED CT code in the health record
next to the original code (to make future queries more efficient), and returns the SNOMED CT code.
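The map-and-store behavior described above can be sketched as follows. This is an assumption-laden illustration, not Allscripts code: the class name, the record layout and the single map entry are hypothetical, and a real map set may offer several candidate targets per source code.

```python
# Illustrative map entry only; a real service would load a published map set.
ICD10_TO_SNOMED = {"E11": "44054006"}


class TerminologyService:
    """Hypothetical service that always answers with a SNOMED CT code."""

    def __init__(self, icd_to_snomed):
        self.icd_to_snomed = icd_to_snomed
        self.map_calls = 0  # counts how often the map had to be consulted

    def snomed_code(self, entry):
        """Return a SNOMED CT code for a record entry, storing any mapping."""
        if entry["system"] == "SNOMED CT":
            return entry["code"]
        if "snomed" not in entry:
            # First query: map the primary code and save the result next to
            # it. An unmapped code raises KeyError in this sketch.
            self.map_calls += 1
            entry["snomed"] = self.icd_to_snomed[entry["code"]]
        return entry["snomed"]
```

Because the mapped code is written back to the entry, a second query over the same record does not consult the map again.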
Allscripts Sunrise applications are able to link SNOMED CT to all orders, order form pick lists, observations pick lists
and results.
Reporting
The Allscripts Clinical Quality Management (CQM) is an automated chart abstraction and analytics system. CQM is
able to create population sets with SNOMED CT encoded patient records and use these patient sets for reporting.
CQM is a flexible, powerful reporting and analytics system presenting information in a variety of formats ranging
from simple list style reports, to Online Analytical Processing (OLAP) Data Cubes with Pivot reports.
Allscripts Clinical Performance Management (CPM) is a business intelligence solution for monitoring clinical
performance, improving patient outcomes and reducing costs. With prebuilt or customized reporting and
dashboards, healthcare leaders have powerful access to performance information enabling quality improvement
across the health enterprise.
Applications include:
• Alert usage analysis: User-customizable reports drill down into clinical decision support usage data
revealing the reasons and circumstances for bypassed or overridden alerts. Seeing the impact of decision
support enables a sharper focus on patient outcomes and supports the refinement of rule logic.
• Order-set usage analysis: Organizations can evaluate their order set usage patterns of computerized
provider order entry. By observing order set configuration, deployment and use, organizations can revise
them to enhance their effectiveness and usability.
• Clinician utilization analysis: Clinicians can examine the vast array of health issues and clinical observations
for discharged patients to support patient treatment decisions and protocol implementation. In addition,
patient cohorts can be tracked over time to determine if the proper treatments are being delivered.
1 [Link]
12.2.3 Apelon
Apelon is an international informatics company focusing on data standardization and interoperability. Leading
healthcare organizations use Apelon's products and services to better manage terminology assets. Apelon
solutions help healthcare application vendors, biomedical researchers, providers, biotech companies and
government agencies improve the quality, comparability, and accessibility of clinical information. 1
For more information please visit [Link]
SNOMED CT plays a central role in many Apelon products and projects. Apelon tools feature navigation and
visualization tools to support SNOMED CT in a variety of ways. Apelon also undertakes bespoke content
development and consultancy work in healthcare and biomedicine using SNOMED CT.
The Apelon Terminology Development Environment (TDE) software was used by the College of American
Pathologists to build and maintain the SNOMED CT International Edition prior to the formation of IHTSDO. Apelon
software continues to be used by major healthcare organizations and some National Release Centers to maintain
SNOMED CT extensions, maps and subsets.
1 [Link]/
12.2.4 B2i Healthcare
Snow Owl is a clinical terminology platform developed by B2i Healthcare. The Snow Owl technology family is
deployed in over 2,500 locations in 83+ countries worldwide. The Snow Owl® terminology server has been licensed
by SNOMED International to form the basis of SNOMED International Terminology Server.
• Extensive support for expression constraints and semantic queries, including Extended SNOMED CT
Compositional Grammar and Groovy scripts.
• Distributed revision control system supports large teams of authors and reviewers working on independent
branches.
• Full support for SNOMED CT logical definitions (OWL 2 EL) with extended support for extensions using
advanced description logic features (OWL 2 DL) including datatype properties, universal restriction,
disjunction, etc.
• Standard distribution formats (e.g. SNOMED RF2, ICD-10 ClaML, LOINC csv)
• Traditional, white-label (embedded within client product), and source code licenses available.
The Singapore Drug Dictionary (SDD) is the largest SNOMED CT extension, larger than the SNOMED CT International
release itself. To support medication safety initiatives like medication management and adverse drug event
surveillance, the drug ontology makes use of Snow Owl's extended description logic support.
been placed into predefined buckets which don't provide the scale of complexity inherent in the original data. And
multiple information models represent the same semantic meaning in different ways.
Snow Owl Meaningful Query (MQ) allows semantic EHR queries on operational data stores without requiring
predefined structures like data warehouses or the presence of a single unified healthcare information model. The
system is optimized specifically for ad hoc queries of hundreds of millions of electronic health records. It combines
ontological reasoning over the EHRs with more traditional query methods to incorporate demographic and
ancillary data.
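A query that combines ontological reasoning with a conventional demographic filter might look like the sketch below. It is not Snow Owl MQ syntax; the 'is a' edges are placeholder labels rather than real SNOMED CT content, and the record layout and function names are assumptions.

```python
# Placeholder 'is a' edges (child -> parents); not real SNOMED CT content.
IS_A = {
    "type 2 diabetes": ["diabetes mellitus"],
    "type 1 diabetes": ["diabetes mellitus"],
    "diabetes mellitus": ["endocrine disorder"],
    "asthma": ["respiratory disorder"],
}


def is_subsumed_by(concept, ancestor):
    """True if concept equals ancestor or reaches it through 'is a' edges."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return False


def query(records, ancestor, min_age):
    """Semantic criterion (subsumption) combined with a demographic one."""
    return [
        record["id"]
        for record in records
        if record["age"] >= min_age
        and any(is_subsumed_by(d, ancestor) for d in record["diagnoses"])
    ]
```

The subsumption test runs directly over the coded entries of each record, so no predefined grouping of diagnoses has to exist before the query is asked.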
This query interface is being rolled out to all Singapore public hospitals and the national procurement office to
allow search and retrieval of pharmaceuticals contained within the Singapore Drug Dictionary ontology. All lexical
and semantic properties can be searched, including datatype properties and mappings to local code systems,
external terminologies like ATC, and internal procurement codes.
1 [Link]
12.2.5 Cambio
Cambio Healthcare Systems is a market leading Electronic Patient Record (EPR) company headquartered in
Stockholm, Sweden with offices in the UK, Sweden, Denmark and Sri Lanka. Cambio COSMIC® is a patient-
centered integrated EPR system for comprehensive and clinical healthcare solutions with a focus on patient
safety. Cambio COSMIC® offers solutions within all healthcare sectors and is used by over 100,000 clinicians and
healthcare professionals. 1
For more information please visit [Link]
The Cambio COSMIC® Electronic Patient Record system has been under continuous development since 1993.
Cambio has applied innovations within healthcare informatics in areas such as information models, clinical
terminology and formal languages for expressing clinical decision support rules. The COSMIC® EPR combines
openEHR archetypes, SNOMED CT terminology and Guideline Definition Language rules in implementations which
benefit patients, clinical staff and healthcare enterprise management. Using these technologies, their system is
able to incorporate advanced analytics capabilities.
Decision Support
Cambio uses the Guideline Definition Language (GDL) to combine archetypes, terminologies and clinical decision
support rules. GDL provides:
• Bindings between archetype elements and variables in the rules;
• Rule expressions that are easily converted to industry rule engine languages;
• Bindings between local concepts used in the rules and concepts from SNOMED CT.
GDL rules can be used to trigger a variety of system actions, including pre-filling a form, proposing a test or
prescription, or sending a notification to the system user. The criteria for triggering actions from GDL rules may be
based on demographics data, the context of care (e.g. clinic or inpatient), current medications and diagnoses, or
observation values (e.g. lab results).
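The idea of binding local rule concepts to terminology codes and triggering an action from combined criteria can be sketched in Python. This is not GDL syntax and does not reproduce any COSMIC® rule; the binding codes, the potassium threshold, the record layout and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Binding from a local rule concept to terminology codes (placeholder codes).
BINDINGS = {"heart_failure": {"sct:hf-placeholder-1", "sct:hf-placeholder-2"}}


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    action: str


def has_concept(patient, local_concept):
    """True if any recorded diagnosis code is bound to the local concept."""
    return bool(BINDINGS[local_concept] & set(patient["diagnoses"]))


RULES = [
    Rule(
        name="heart failure with raised potassium (invented criteria)",
        condition=lambda p: (
            p["setting"] == "inpatient"                          # context of care
            and has_concept(p, "heart_failure")                  # diagnosis
            and p["labs"].get("potassium_mmol_l", 0.0) > 5.5     # observation
        ),
        action="notify user",
    ),
]


def evaluate(patient, rules=RULES):
    """Return the actions of every rule whose condition holds."""
    return [rule.action for rule in rules if rule.condition(patient)]
```

Keeping the bindings separate from the rule logic means the same rule can be reused when the set of bound codes changes.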
Decision support rules created in COSMIC® are authored using an editor. Figure 12.2.5-1 shows the high level view of
a rule for calculating a complex clinical risk-score (CHA2DS2-VASc Score for stroke risk stratification in atrial
fibrillation).
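For orientation, the published CHA2DS2-VASc point scheme can be written as a short function. This is a plain Python sketch, not the GDL rule shown in the figure; the function and parameter names are assumptions. In a terminology-enabled system the boolean inputs would typically be derived from coded diagnoses in the record.

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 stroke_or_tia, vascular_disease):
    """Sum the CHA2DS2-VASc points for one patient (range 0 to 9)."""
    score = 0
    score += 1 if chf else 0               # C: congestive heart failure
    score += 1 if hypertension else 0      # H: hypertension
    if age >= 75:                          # A2: age 75 or over
        score += 2
    elif age >= 65:                        # A: age 65 to 74
        score += 1
    score += 1 if diabetes else 0          # D: diabetes mellitus
    score += 2 if stroke_or_tia else 0     # S2: prior stroke/TIA/thromboembolism
    score += 1 if vascular_disease else 0  # V: vascular disease
    score += 1 if female else 0            # Sc: sex category (female)
    return score
```

For example, a 70-year-old woman with hypertension and no other risk factors scores 3 (one point each for age band, sex category and hypertension).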
At the more detailed level, criteria may be defined using SNOMED CT concepts and subsets of concepts (as simple
refsets). Figure 12.2.5-2 below shows a section of a decision support rule which identifies patients with heart failure
Figure 12.2.5-2: Excerpt of GDL Rule showing binding to SNOMED CT and ICD
Identification of suitable patients for research studies is a particular challenge to clinicians working in a routine
clinic setting. A clinician may encounter eligible cases very rarely or simply not be familiar with the specific study
selection criteria. To study diseases, their courses and causes, the factors that affect a particular condition,
and the effects of different medications, researchers need trial subjects who meet specific criteria.
1 [Link]/
12.2.6 Caradigm
Caradigm is a population health company dedicated to helping organizations improve care, reduce costs and
manage risk. Caradigm analytics solutions provide insight into patients, populations and performance,
enabling healthcare organizations to understand their clinical and financial risk and identify the actions
needed to address it. Caradigm population health solutions enable teams to deliver the appropriate care to
patients through effective coordination and patient engagement, helping to improve outcomes and financial
results. 1
For more information please visit [Link]
Caradigm is a joint venture between Microsoft and GE Healthcare, which is dedicated to population health
management. Caradigm's cornerstone product is the Intelligence Platform. This platform can connect over 295
types of source systems, including Allscripts, Athenahealth, Cerner, Epic, GE, McKesson and Meditech. Data from
disparate systems within one or more healthcare organizations is collected, normalized and standardized to enable
applications to leverage this data in a unified and consistent way.
Caradigm's solutions provide explorative, comparative, predictive and guided elements aimed at analyzing the
disparate data and driving the insight that is gained into action. Caradigm's three main solution areas are:
1 [Link]
12.2.7 Cerner
Cerner Corporation is a supplier of healthcare information technology solutions, services, devices and
hardware. Cerner solutions optimize processes for healthcare organizations. These solutions are licensed by
approximately 9,300 facilities globally, including more than 2,650 hospitals; 3,750 physician practices 40,000
physicians; 500 ambulatory facilities, such as laboratories, ambulatory centers, cardiac facilities, radiology
clinics and surgery centers; 800 home health facilities; 40 employer sites and 1,600 retail pharmacies. The
Company operates in two segments: domestic and global. The domestic segment includes revenue
contributions and expenditures associated with business activity in the United States. The global segment
includes revenue contributions and expenditures linked to business activity in Argentina, Aruba, Australia,
Austria, Canada, Cayman Islands, Chile, China (Hong Kong), Egypt, England, France, Germany, Guam, India,
Ireland, Italy, Japan, Malaysia, Morocco, Puerto Rico, Qatar, Saudi Arabia, Singapore, Spain, Sweden,
Switzerland and the United Arab Emirates. 1
For more information please visit [Link]
Cerner Corporation's Millennium healthcare system manages terminologies, classifications and other code systems
within a terminology service - the Cerner Millennium Terminology (CMT) package. CMT accommodates and
integrates SNOMED CT International Release data and National extension content - such as concepts, relationships,
descriptions, subsets, maps etc.
At the Cerner Millennium user interface, content can be captured at point of care as SNOMED CT codes. Modules
used with SNOMED CT include: Problems and Diagnoses, Allergies, Procedures, Pharmacy Orders, Radiology Orders
and Cellular Pathology Reports.
Either Cerner or third party clinical encoding software can process SNOMED CT Diagnoses and Procedures captured
in Millennium and suggest ICD-10 and other classification codes to Clinical Coders for activity reporting and billing.
SNOMED CT is also used extensively behind the scenes to support more sophisticated analytic facilities within their
Natural Language Processing (NLP) tools and reporting tools. These Cerner products and services exploit unique
features and content of SNOMED CT to extend the power of these applications.
Data Warehousing
The Cerner product suite includes two data warehousing applications. These applications share much of their
terminology-related technology, including supporting subsumption queries with the CMT Concept Explode/
Transitive Closure facility, which utilizes the SNOMED 'is a' relationships.
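A transitive closure over 'is a' relationships can be precomputed so that a subsumption query becomes a simple lookup. The sketch below shows the general technique only; it is not Cerner's implementation, and the edges are placeholder labels rather than real SNOMED CT content.

```python
# Placeholder 'is a' pairs (child, parent); not real SNOMED CT content.
PAIRS = [
    ("open fracture of femur", "fracture of femur"),
    ("fracture of femur", "traumatic injury"),
    ("burn of hand", "traumatic injury"),
]


def transitive_closure(is_a_pairs):
    """Return every (descendant, ancestor) pair implied by the 'is a' edges."""
    parents = {}
    for child, parent in is_a_pairs:
        parents.setdefault(child, set()).add(parent)

    closure = set()
    for concept in parents:
        stack, seen = list(parents[concept]), set()
        while stack:
            ancestor = stack.pop()
            if ancestor not in seen:
                seen.add(ancestor)
                closure.add((concept, ancestor))
                stack.extend(parents.get(ancestor, ()))
    return closure


def explode(closure, concept):
    """'Explode' a concept into the set of all its descendants."""
    return {descendant for descendant, ancestor in closure if ancestor == concept}
```

Stored as a two-column table, the closure lets a warehouse answer "all diagnoses under this concept" with a single join instead of a recursive walk at query time.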
The PowerInsight® Data Warehouse (PIDW) is an enterprise level data warehouse which updates on a nightly basis
from the live electronic patient record. PIDW services standard operational reporting, mandatory reports (e.g. for
National or State governments and regulatory bodies), and ad hoc queries e.g. individual lists of patients treated as
requested by clinicians for audit.
Health Facts® Reporting supports the pooling of anonymized data from different healthcare organizations.
The Health Facts data warehouse holds information from the electronic records of millions of inpatient, emergency
department, and outpatient visits from participating U.S. health care organizations. (Data are encrypted and
secured to ensure patient confidentiality in compliance with HIPAA privacy regulations.) The reporting facilities
enable the analysis of patient care and process trends within a facility and provide comparisons to other Health
Facts contributors.
The Cerner data warehouse query tools include simple graphical user interfaces which directly create powerful
reports using the SNOMED CT hierarchy content. The screenshot below in Figure 12.2.7-1 shows a report of
attendances with diagnoses which are descendants of the SNOMED CT concept 417746004 |traumatic injury|.
Other Applications
Cerner's Natural Language Processing (NLP) technology interprets the content of clinical notes through a complex
understanding of grammar, syntax, synonymy and phraseology. SNOMED CT's semantic content and concept
model enrich the analysis of the text in several ways, including:
• Concept recognition using synonyms – for example: |heart attack| is a synonym of |myocardial infarction|;
• The hierarchical relationships between concepts – for example: |pneumonia| |is a| |respiratory disease|;
• The identification of context, such as negation, certainty, subject and timing.
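Two of the points above, synonym-based concept recognition and negation context, can be illustrated with a deliberately naive sketch. It is far simpler than a production NLP pipeline and is not Cerner's method; the synonym table, the cue list and the function name are assumptions.

```python
import re

# Placeholder synonym table: surface phrase -> preferred concept term.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "pneumonia": "pneumonia",
}

# Very small set of negation cues, matched as whole words.
NEGATION = re.compile(r"\b(no|denies|without)\b")


def recognize(text):
    """Return (concept, negated) pairs found in each sentence of the text."""
    findings = []
    for sentence in re.split(r"[.;]\s*", text.lower()):
        for phrase, concept in SYNONYMS.items():
            position = sentence.find(phrase)
            if position == -1:
                continue
            # A cue anywhere before the phrase in the same sentence negates it.
            negated = NEGATION.search(sentence[:position]) is not None
            findings.append((concept, negated))
    return findings
```

Both "heart attack" and "myocardial infarction" resolve to the same concept, and a finding preceded by "no" in the same sentence is flagged as negated rather than recorded as present.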
Computer Assisted Coding allows the extraction of appropriate SNOMED CT codes for automating coding and
billing processes.
Chart Search/Semantic Search is a tool that enables clinicians at the point of care to search in real time through a
patient's multiple charts, pathology reports and other documents, for topics such as 'heart disease' and 'diabetes'.
The interface, as shown below in Figure 12.2.7-2, has the look and feel of a World Wide Web search engine.
The searches can be filtered by date, document type etc. However, the power of this approach is extended beyond
conventional search engine indexing by using SNOMED CT. Cerner's tools index SNOMED CT findings (including
diseases and symptoms) and procedures to make searches and queries over these domains faster. Cerner also hand
curates exceptions and associations between related concepts. A Clinical Significance Score is assigned to each
concept to allow documents to be sorted based on the probable relevance of the concepts in the document given
their context of use.
As shown in Figure 12.2.7-3 for example, when given the search term 'heart disease', the 'is a' hierarchies of
SNOMED CT enable recognition and return of documents which reference 'sinus bradycardia' and 'dilated
cardiomyopathy'.
1 [Link]
12.2.8 Clinithink
Clinithink's tools enable analytics and querying over SNOMED CT encoded patient data … The CLiX
CNLP platform transforms clinical narrative into rich structured data for healthcare providers and solution
vendors. CLiX ENRICH, powered by CLiX CNLP to support analytics, converts unstructured clinical data into
actionable data required to help solve today's toughest healthcare business problems. 1
For more information please visit [Link]
Building on the capabilities of its CLiX Clinical Natural Language Processing (CNLP) platform, Clinithink has created
a solution to enable the analysis of healthcare data sourced directly from clinical narrative called CLiX ENRICH. This
technology can be integrated into existing healthcare solutions to process any relevant narrative and encode the
key clinical elements – medications, diagnoses, procedures, symptoms and findings – using SNOMED CT. These
features can then be queried using powerful, user-definable SNOMED CT queries to present structured data in a
form that can be easily consumed by existing BI platforms.
For example, when the narrative text below is typed into diagnosis and observation fields (respectively), Clinithink
is able to provide a list of possible coding options for manual confirmation (as shown in Figure 12.2.8-1 below).
1 [Link]
12.2.9 EMIS
EMIS clinical systems are used by over 5,000 healthcare organizations across the United Kingdom, from Primary
Care and out-of-hours services, to community care and sexual health services. By using the same system,
everyone can access the same information about their patients - no matter where they are treated - making the
prospect of integrated care a reality. With over 25 years' experience of working with the NHS, EMIS are entrusted
with over 40 million patient records. 1
For more information please visit [Link]
Egton Medical Information Systems (EMIS) began in the 1980s in a rural practice in Egton in North Yorkshire, United
Kingdom. The founders, Dr Peter Sowerby and Dr David Stables, wrote the software and adopted the NHS Read
Code system during the 1980s. A series of systems have since been deployed in over half of England's primary care
practices. The latest product (EMIS Web) moved to a data center based architecture, thin client front end and built
in secure web based patient access facilities.
EMIS software looks after the patient records of nearly 40 million people in the UK (30 million using EMIS Web). More
than 2 in 3 of those patients can book appointments and order repeat medications online, and more than 1 in 3 can
view their own medical record.
EMIS Web features significant advances in terminology use with EMIS adopting a phased approach to SNOMED CT.
EMIS Web displays a familiar coding structure based on the construction of a Read Version 2 navigational hierarchy
within SNOMED CT. The principal design objective has been to enable SNOMED CT within the clinical system to
meet specific requirements, including:
• Supporting advanced decision support capabilities;
• Supporting interoperability within healthcare through the sharing of coded data;
• Supporting standards required in NHS General Practice Systems of Choice (e.g. the NHS mandates SNOMED
CT coding within the National Summary Care Record service);
• Broadening the scope of terminology use to support the recording of encounters in disciplines such as
dentistry and community healthcare;
• Supporting the mandatory requirement for the Electronic Prescription Service to natively use the UK
SNOMED CT drug extension (i.e. NHS dictionary of medicines and devices, dm+d).
By using coded structured records and providing access to the specialist domain terminology available in SNOMED
CT, EMIS has been able to extend the user base of EMIS Web by more than 20,000 new NHS users over the last year.
These include practice nurses, community matrons, child health and mental health nurses, palliative care
clinicians, diabetes specialists, physiotherapists and psychologists.
1 [Link]
12.2.10 Epic
Epic makes software for mid-size and large medical groups, hospitals and integrated healthcare organizations –
working with customers that include community hospitals, academic facilities, children's organizations, safety
net providers and multi-hospital systems. Epic's integrated software spans clinical, access and revenue
functions and extends into the home... Epic's integrated analytics and reporting – collectively named Cogito –
delivers current clinical intelligence and business intelligence based on role and workflow… Epic provides a
combination of flexible tools, content, data sources, distribution, training, and process to support decisions
throughout the health system with the best information available. 1
For more information please visit [Link]
Epic's electronic patient record systems use SNOMED CT as a reference terminology through the following
mechanisms:
• Mappings between a subset of Epic's standard data elements and SNOMED CT concepts;
• Mappings between diagnoses imported from other code systems (e.g. those used in Intelligent Medical
Objects' and Health Language's products) and SNOMED CT concepts;
• Mappings from additional data elements to SNOMED CT concepts that can be created by clients using an
External Concept Mapping activity.
These mechanisms provide linking behind the scenes, so when clinicians add a diagnosis to the problem list by
selecting a familiar term, for example, they automatically select the corresponding SNOMED CT concept. This
SNOMED CT encoding creates powerful possibilities within Epic's decision support and reporting facilities.
Decision Support
The Epic system calls its decision support alerts 'Best Practice Advisories'. These customized alerts are
programmed to fire according to predetermined triggers, such as specific chief complaints, vital signs, diagnoses or
medications, either individually or in combination using inclusionary or exclusionary logic. Best Practice Advisories
can thus be used to notify clinicians to tend to important tasks, such as reviewing a patient's allergies, writing
orders, and completing charting. They can also present order sets and links to third party information sources
refined using the clinical context of the patient record being reviewed.
Epic customers can use SNOMED CT's hierarchical structure to group related records, making the setup for clinical
decision support much simpler than would be possible if users had to select records or clinical concepts
individually. For example, an administrator creating Best Practice Advisories for diabetic patients could use
73211009 |diabetes mellitus| within the SNOMED CT hierarchy as one of the criteria instead of identifying every
subtype of diabetes individually.
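The benefit of naming one ancestor concept instead of listing every subtype can be sketched as below. This is not Epic configuration: only 73211009 |diabetes mellitus| comes from the text, while the child labels, the exclusion concept and the function names are hypothetical. The sketch also gives each concept a single parent for brevity, whereas SNOMED CT concepts may have several.

```python
# Placeholder 'is a' edges (child -> parent). Only 73211009 is taken from the
# text; the child identifiers are invented labels.
PARENT = {
    "type-1-diabetes": "73211009",
    "type-2-diabetes": "73211009",
    "type-2-diabetes-with-nephropathy": "type-2-diabetes",
}


def descendants_or_self(concept):
    """Collect the concept plus everything that reaches it via 'is a'."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def advisory_fires(problem_list, include, exclude=frozenset()):
    """Fire when an inclusion concept is present and no exclusion concept is."""
    problems = set(problem_list)
    return bool(problems & include) and not (problems & exclude)


# One criterion covers every current and future subtype in the hierarchy.
DIABETES = descendants_or_self("73211009")
```

If a new subtype is added beneath the ancestor in a later terminology release, recomputing the set picks it up without the advisory itself being edited.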
Reporting
Within Epic's integrated analytics and reporting suite (i.e. Cogito) customers have achieved benefits by using
SNOMED CT's clinical finding hierarchy to aggregate local diagnosis concepts.
For example, the capability has been used by oncologists working with cancer-related ICD codes which are unsuited
to grouping diagnoses by stage. Using the mapped SNOMED CT codes they are able to facilitate the reporting of
staging data by utilizing the SNOMED CT hierarchy.
1 [Link]
12.2.11 First DataBank
The Multilex drug knowledge base is widely used throughout the UK and is integrated into clinical systems
across the whole healthcare community. The Multilex drug terminology holds clinical and commercial
information on more than 75,000 pharmaceutical products and packs and provides active clinical decision
support and referential medicines information for all healthcare professionals. 1
For more information please visit [Link]
Overview
First DataBank (FDB) were in the first wave of suppliers to recognize the potential of SNOMED CT and begin to
integrate support for SNOMED CT into their existing clinical decision support solutions. Their primary use of
SNOMED CT in the patient's electronic health record (EHR) is to detect safety issues arising from certain
combinations of medications, diagnoses and drug adverse reaction histories. In 2006 FDB introduced support for
products and packs encoded using the NHS SNOMED CT UK Drug Extension. In the following year FDB launched new
modules within the Multilex drug knowledge base supporting Drug-Condition Checking and Drug Sensitivity
(Allergy) checking for the SNOMED CT EHR.
System vendors implementing Multilex decision support within SNOMED CT-enabled medical record applications
include CSC (Lorenzo system), Epic and JAC in secondary care, and CSE Servelec (RiO system) in community/mental
health. Currently only pre-coordinated expressions are supported by the live Multilex SNOMED CT based decision
support solutions.
1 [Link]
12.2.12 IMO
IMO produces a medical terminology service for healthcare solutions, allowing over 2,500 hospitals and 350,000
clinicians to focus on patient care. IMO bridges the information gap between clinicians, coders, and patients in the
US and internationally. IMO enables and supports the accurate capture and preservation of clinical intent for clinical
documentation, decision support, reimbursement, reporting, data analysis, research, and health education.
IMO's clinical interface terminology is designed pragmatically to capture clinical intent at point of care. However, it
is also intended to enable and simplify the adoption of standard ontologies by vendor partners.
By choice, the editorial process requires all IMO interface terms to have one or more qualified maps to SNOMED CT.
Clients can then use SNOMED CT to drive reporting, analytics, clinical decision support, and research.
The following examples demonstrate how IMO uses SNOMED CT for analytical purposes:
1. Helping patients find health professionals who have expertise or interest in specific areas of medicine. These
areas include disorders, procedures, devices, medications, patient demographics, and medical specialties.
These areas of expertise or interest include those that are self-reported by clinicians and those documented
in clinical encounters. The search algorithms use hierarchies in SNOMED CT to retrieve and rank search
results.
2. Helping clinicians use patient diagnoses and procedures documented at varying levels of granularity to find
appropriate patient education materials using SNOMED CT is-a hierarchies.
3. Grouping together related clinical concepts in patient records for creating focused patient reports and
driving clinical workflows.
4. Forming subsumption queries for cohort selection within patient data repositories and document libraries.
5. Using natural language processing (NLP) to extract information coded in SNOMED CT from clinical
narratives.
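The grouping of records through interface-term maps (example 3 above) can be sketched as follows. The interface terms, their map targets and the function name are invented placeholders, not IMO content; the point is only that one interface term may carry more than one map target.

```python
from collections import defaultdict

# Hypothetical interface terms, each with one or more map targets
# (placeholder target labels, not real SNOMED CT identifiers).
INTERFACE_MAPS = {
    "sugar diabetes, type 2": ["type 2 diabetes mellitus"],
    "T2DM with kidney disease": ["type 2 diabetes mellitus",
                                 "chronic kidney disease"],
    "broken hip": ["fracture of neck of femur"],
}


def group_by_target(entries):
    """Group (patient, interface term) entries under each mapped target."""
    groups = defaultdict(set)
    for patient, term in entries:
        for target in INTERFACE_MAPS.get(term, []):
            groups[target].add(patient)
    return groups
```

Clinicians keep recording with familiar interface terms, while reports and workflows are keyed on the shared map targets.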
1 [Link]