Data Analytics with SNOMED CT

Publication date: 2021-03-11


Web version link: [Link]
SNOMED CT document library: [Link]

This PDF document was generated from the web version on the publication date shown
above. Any changes made to the web pages since that date will not appear in the PDF.
See the web version of this document for recent updates.

© Copyright 2021 International Health Terminology Standards Development Organisation


Table of Contents
1 Executive Summary
2 Introduction
Background
Purpose
Scope
Audience
Document Overview
3 Analytics Overview
3.1 Definition
3.2 Scope and Purpose
3.3 Substrates for Analytics
3.4 Examples of Approaches
4 SNOMED CT Overview
4.1 Concepts
4.2 Descriptions
4.3 Relationships
4.4 Concept Model
4.5 Expressions
4.6 Reference Sets
4.7 Description Logic Features
4.8 Benefits of Using SNOMED CT for Analytics
5 Preparing Data for Analytics
5.1 Natural Language Processing
5.2 Mapping Other Code Systems to SNOMED CT
6 SNOMED CT Analytic Techniques
6.1 Subsets
6.2 Subsumption
6.3 Using Defining Relationships
6.4 Description Logic Over Terminology
6.5 Description Logic Over Terminology and Structure
6.6 Using Statistical Classifications
7 Task-Oriented Analytics
7.1 Point of Care Analytics
7.2 Population-Based Analytics
7.3 Clinical Research
8 Data Architectures
8.1 Patient Records for Analytics
8.2 Data Warehouse
8.3 Virtual Health Record
8.4 Distributed Storage and Processes
9 Database Queries
9.1 Terminology Queries
9.2 Patient Record Queries
10 User Interface Design
10.1 Query Interface
10.2 Results Visualization
11 Challenges
11.1 Reliability of Patient Data
11.2 Terminology / Information Model Boundary Issues
11.3 Concept Definition Issues
11.4 Versioning
12 Appendix - Analytics Case Studies
12.1 Project Case Studies
12.2 Vendor Case Studies

The Data Analytics with SNOMED CT guide reviews current approaches, tools and techniques for performing
data analytics using SNOMED CT and shares developing practice in this area. It is anticipated that this report
will benefit members, vendors and users of SNOMED CT by promoting a greater awareness of both what has
been achieved, and what can be achieved by using SNOMED CT to enhance analytics services.

Web browsable version: [Link]


SNOMED CT Document Library: [Link]
© Copyright 2021 International Health Terminology Standards Development Organisation, all rights reserved.

This document is a publication of International Health Terminology Standards Development Organisation, trading as SNOMED International.
SNOMED International owns and maintains SNOMED CT®.

Any modification of this document (including without limitation the removal or modification of this notice) is prohibited without the express
written permission of SNOMED International. This document may be subject to updates. Always use the latest version of this document
published by SNOMED International. This can be viewed online and downloaded by following the links on the front page or cover of this
document.

SNOMED®, SNOMED CT® and IHTSDO® are registered trademarks of International Health Terminology Standards Development Organisation.
SNOMED CT® licensing information is available at [Link] For more information about SNOMED International and
SNOMED International Membership, please refer to [Link] or contact us at info@[Link].

1 Executive Summary
SNOMED CT is a clinically validated, semantically rich, controlled terminology designed to enable effective
representation of clinical information. SNOMED CT is widely recognized as the leading global clinical terminology
for use in Electronic Health Records (EHRs). SNOMED CT enables the full benefits of EHRs to be achieved by
supporting both clinical data capture, and the effective retrieval and reuse of clinical information.
The term 'analytics' is used to describe the discovery of meaningful information from healthcare data. Analytics
may be used to describe, predict or improve clinical and business performance, and to recommend action or guide
decision making. [1]
Using SNOMED CT to support analytics services can enable a range of benefits, including:
• Enhancing the care of individual patients by supporting:
  ◦ Retrieval of appropriate information for clinical care
  ◦ Guideline and decision support integration
  ◦ Retrospective searches for patterns requiring follow-up
• Enhancing the care of populations by supporting:
  ◦ Epidemiology monitoring and reporting
  ◦ Research into the causes and management of diseases
  ◦ Identification of patient groups for clinical research or specialized healthcare programs
• Providing cost-effective delivery of care by supporting:
  ◦ Guidelines to minimize risk of costly errors
  ◦ Reducing duplication of investigations and interventions
  ◦ Auditing the delivery of clinical services
  ◦ Planning service delivery based on emerging health trends
SNOMED CT has a number of features that make it uniquely capable of supporting a range of powerful analytics
functions. These features enable clinical records to be queried by:
• Grouping detailed clinical concepts together into broader categories (at various levels of detail);
• Using the formal meaning of the clinical information recorded;
• Testing for membership of predefined subsets of clinical concepts; and
• Using terms from the clinician's local dialect.
SNOMED CT also enables:
• Clinical queries over heterogeneous data (using SNOMED CT as a common reference terminology to which
different code systems can be mapped);
• Analysis of patient records containing no original SNOMED CT content (e.g. free text);
• Powerful logic-based inferencing using Description Logic reasoners;
• Linking clinical concepts recorded in a health record to clinical guidelines and rules for clinical decision
support; and
• Mapping to classifications, such as ICD-9 or ICD-10, to utilize the additional features that these provide.

Analytics tasks, which may be enabled or enhanced by the use of SNOMED CT techniques, can be considered in
three broad categories:
1. Point-of-care analytics, which benefits individual patients and clinicians. This includes historical summaries,
decision support and reporting.
2. Population-based analytics, which benefits populations. This includes trend analysis, public health
surveillance, pharmacovigilance, care delivery audits and healthcare service planning.
3. Clinical research, which is used to improve clinical assessment and treatment guidelines. This includes
identification of clinical trial candidates, predictive medicine and semantic searching of clinical knowledge.
While the use of SNOMED CT for analytics does not dictate a particular data architecture, there are a few key
options to consider, including:
• Analytics directly over patient records;
• Analytics over data exported to a data warehouse;
• Analytics over a Virtual Health Record (VHR);
• Analytics using distributed storage and processing; and
• A combination of the above approaches.
Practically all analytical processes are driven by database queries. To get the most benefit from using SNOMED CT
in patient records, record-based queries and terminology-based queries must work together to perform integrated
queries over SNOMED CT enabled data. To this end, SNOMED International is developing a consistent family of
languages to support a variety of ways in which SNOMED CT is used. Clinical user interfaces can also be designed to
harness the capabilities of SNOMED CT, and to make powerful clinical querying more accessible. Innovative data
visualization and analysis tools are becoming more widespread as the capabilities of SNOMED CT content are
increasingly utilized.
A number of challenges exist when performing analytics over clinical data, irrespective of the code system used.
These include the reliability of patient data, terminology/information model boundary issues, concept definition
issues and versioning. Many of these challenges, however, can be mitigated using the unique features of
SNOMED CT.
A number of software vendors are now realizing the competitive advantage that using SNOMED CT can provide to
unlock the analytics potential of clinical data. Several commercial tools are now available that support analytics
using SNOMED CT, while others are following a roadmap of increasing functionality driven by SNOMED CT.
As the SNOMED CT encoding of healthcare data increases, so too do the benefits realized from analytics
processes performed over this data.

1 Wikipedia, Analytics, 2014, [Link]

2 Introduction
Background
SNOMED CT is a clinically validated, semantically rich, controlled terminology. SNOMED CT comprises
meaning-based concepts, human-readable descriptions and machine-readable definitions. SNOMED CT is used
within electronic health records to support data capture, retrieval, and subsequent reuse for a wide range of
purposes. SNOMED CT is also used to enable or enhance analysis of patient records and other clinical documents
containing no original SNOMED CT content.
SNOMED CT hierarchies and formal concept definitions allow selective information retrieval to support analysis –
from patient-based queries to operational reporting, public health reporting, strategic planning, predictive
medicine and clinical research. As the SNOMED CT encoding of healthcare data increases, so too do the benefits
realized from analytics processes performed over this data.

Purpose
The purpose of this document is to review current approaches, tools and techniques for performing data analytics
using SNOMED CT and to share developing practice in this area. It is anticipated that this report will benefit
members, vendors and users of SNOMED CT by promoting a greater awareness of both what has been achieved,
and what can be achieved by using SNOMED CT to enhance analytics services.

Scope
This document presents different data approaches, tools, terminology techniques, query languages, data
architectures and user interfaces that may be used in performing analytics using SNOMED CT. Analytics services
considered include patient-based queries, operational reporting, the application and audit of evidence-based
medical practice, strategic planning, predictive medicine, public health reporting and clinical research. The benefits
and challenges of these approaches are also presented. The case study summaries describe a selection of SNOMED
CT analytics projects and tools.
This document does not provide an exhaustive list of analytics projects and tools, and does not mandate a specific
approach. The development of clinical case definitions [1] is also outside the scope of this document.

Audience
The target audience of this document includes:
• Members who wish to learn about current analytics activities in other jurisdictions and inform future
directions;
• Clinicians, informatics specialists and technical staff involved in the planning, management, design or
implementation of clinical record applications or healthcare analytics tools;
• Software vendors, data analysts, epidemiologists and others designing SNOMED CT based solutions.
This document assumes a basic level of understanding of SNOMED CT. For background information it is
recommended that the reader refers to the SNOMED CT Starter Guide.

Document Overview
This document presents an introduction to analytics over data with SNOMED CT content.
Section 1 (Executive Summary) provides a concise summary of the document.
Section 2 (Introduction) introduces the document by explaining the background, purpose, scope, audience and
overview of the document.

Section 3 (Analytics Overview) introduces the topic by presenting a definition of analytics and describing the scope,
purpose and substrates of analytics services which use SNOMED CT.
Section 4 (SNOMED CT Overview) describes the main features of SNOMED CT which may be used to support
analytics over health data, and the specific benefits that using SNOMED CT enables.
Section 5 (Preparing Data for Analytics) describes some approaches used to prepare clinical data for analytics using
SNOMED CT, including mapping and natural language processing.
Section 6 (SNOMED CT Analytic Techniques) presents a range of techniques for using SNOMED CT to perform data
analytics, including using value sets, subsumption, defining relationships and description logic.
Section 7 (Task-Oriented Analytics) looks at how these SNOMED CT based techniques can be used to assist with
specific analytics tasks for point of care analytics, population health monitoring and reporting, and clinical
research.
Section 8 (Data Architectures) presents a number of approaches for architecting analytics services, including
querying directly over patient data, using a data warehouse, querying a virtual health record and using distributed
storage and processes.
Section 9 (Database Queries) considers the query languages that are needed to perform analytics over the
combination of the patient record and terminology content.
Section 10 (User Interface Design) presents a selection of user interface styles that may be used with SNOMED CT to
support querying and results visualization.
Section 11 (Challenges) discusses some of the challenges which are faced when performing analytics over SNOMED
CT enabled data, including the reliability of patient data, information model/terminology boundary issues, concept
definition issues, versioning and inactive content.
Two appendices to this report present a variety of project case studies and vendor tooling case studies respectively.
These appendices, which are referenced extensively throughout this document, can be found at [Link]
analyticscasestudies .

1 [Link]

3 Analytics Overview
3.1 Definition
The term 'analytics' is used broadly in this document to describe the process of extracting useful information from
healthcare data.

"Analytics is the discovery and communication of meaningful patterns in data… Analytics may be applied to
business data to describe, predict and improve business performance. The insights from data are used to
recommend action or to guide decision making." [1]

Most analytical processes are driven by database queries. A 'query' is a means of retrieving information from a
database: a machine-readable question presented to the database in a predefined format. Queries are
used to inform or contribute to a human-readable report or to produce a machine-actionable response. A
human-readable report may be a list of patients, a graph, historical or projected resource utilization figures, or a summary
dashboard display. Machine-actionable responses may include populating an order for a new laboratory test, based
on the results of a previous test, or placing an order to restock medical devices on a hospital ward.
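
As a minimal sketch of the two kinds of output described above, the following Python example runs a query against an in-memory SQLite database and derives both a human-readable patient list and a machine-actionable response. The table `clinical_entry`, its columns, the patient identifiers and the `propose_follow_up` helper are assumptions made for illustration; only the concept identifiers are taken from this guide.

```python
import sqlite3

# Hypothetical record store: one row per coded entry in a patient record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clinical_entry (patient_id TEXT, concept_id INTEGER, recorded TEXT)"
)
conn.executemany(
    "INSERT INTO clinical_entry VALUES (?, ?, ?)",
    [
        ("P001", 22298006, "2020-01-15"),   # myocardial infarction
        ("P002", 399208008, "2020-02-03"),  # plain chest X-ray
        ("P003", 22298006, "2020-03-21"),   # myocardial infarction
    ],
)

MYOCARDIAL_INFARCTION = 22298006

# The query: a machine-readable question in a predefined format (here, SQL).
rows = conn.execute(
    "SELECT DISTINCT patient_id FROM clinical_entry "
    "WHERE concept_id = ? ORDER BY patient_id",
    (MYOCARDIAL_INFARCTION,),
).fetchall()
patients = [row[0] for row in rows]

# Human-readable report: a simple list of patients.
report = "Patients with a recorded myocardial infarction: " + ", ".join(patients)
print(report)


def propose_follow_up(patient_id):
    """Return a hypothetical work-list item for a downstream system."""
    return {"patient_id": patient_id, "action": "schedule_cardiology_review"}


# Machine-actionable response: one follow-up task per matching patient.
tasks = [propose_follow_up(p) for p in patients]
```

The same result set feeds both outputs; the only difference is whether it is rendered for a person or handed to another system.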

1 Wikipedia, Analytics, 2014, [Link]

3.2 Scope and Purpose


The full benefits of electronic health records accrue only with the implementation of effective retrieval and reuse of
clinical information. The scope of analysis of health record data may cover:
• An individual patient, across time and/or care providers;
• An individual healthcare worker;
• Patient groups or cohorts, based on demographics, diagnoses, treatments or interventions;
• Enterprise groups, based on teams, wards, clinics, institutions or providers;
• Geographical groups, based on a local area, town, region or country.
Figure 3.2-1 illustrates the three main purposes of analytics with SNOMED CT. These are:
1. Clinical assessment and treatment;
2. Population monitoring; and
3. Research.

Figure 3.2-1: Purposes of analytics with SNOMED CT

SNOMED CT may be used to support analytics that:
• Improves the care of individual patients by enabling:
  ◦ Retrieval of relevant information that better supports clinicians in assessing the condition and needs of a patient
  ◦ Clinical records to be integrated with decision support tools to guide safe, appropriate and effective patient care – for example, allergy checking and potential drug contraindications identified at the point of prescribing
  ◦ Reduction in the duplication of investigations and interventions through the effective retrieval of shared information about the patient
  ◦ Meaning-based sharing of clinical information that is collected by different members of the health care team at different times and places (and potentially in different languages)
  ◦ Identification of patients requiring follow-up or changes to treatment based on updated guidelines
  ◦ Wellness management, for example, using genetic and behavioral risk profiles
  ◦ Context-sensitive presentation of guidelines and care pathways within the user interface
  ◦ Labor-saving decision support systems for clinicians
  ◦ Adaptive pick lists in clinical user interfaces
  ◦ Professional logs and performance tracking for clinicians
  ◦ Work list generation, for example, patients requiring follow-up based on specific criteria
  ◦ Workload profiling and monitoring
• Improves the care of populations by enabling:
  ◦ Epidemiological monitoring and reporting, for example, monitoring of epidemic outbreaks, or hypothesis generation for the causes of diseases
  ◦ Audit of clinical care and service delivery
  ◦ Systems that measure and maximize the delivery of cost-effective treatments and minimize the risk of costly errors
• Supports evidence-based healthcare and clinical knowledge research by enabling:
  ◦ Identification of clinical trial candidates
  ◦ Research into the effectiveness of different approaches to disease management
  ◦ Clinical care delivery planning, for example, determining optimum discharge time
  ◦ Planning for future service delivery provision based on emerging health trends, perceived priorities and changes in clinical understanding

3.3 Substrates for Analytics


Analytics with SNOMED CT may be deployed on a wide range of data sources as summarized in the table below.
These data sources are also known as the 'substrate' of the analytics. Please note that data which is not natively
coded using SNOMED CT may be transformed using one of the techniques described in section 5 Preparing Data for
Analytics. These techniques may be used to transform heterogeneous data recorded using free text or a variety of
code systems into SNOMED CT, which can serve as a common reference terminology for analysis.

Table 3.3-1: Direct and indirect substrates for SNOMED CT based analytics

Analytics Substrate | Examples | Coding | Information Model
Unstructured free text document | Dictated clinical letter; Typed discharge summary letter | Natural language | None or informal headings
Structured documents with free text fields | Assessment form; Discharge summary form | Natural language | Standardized headings and fields
Structured documents with free text and post-coded classification (i.e. added by clinical coders after the clinical event) | Discharge summary form with post-coded classification | Classifications (e.g. ICD) | Formal information model (typically simple)
Structured documents with non-SNOMED CT coding (e.g. proprietary, local or other coding system) | Standalone clinical application using departmental codes; Enterprise-wide healthcare system using local dictionaries and pick-lists; Electronic patient record using regional coding system (such as UK Primary Care systems) | Local code system, controlled vocabulary or legacy clinical terminology | Formal information model
Structured documents with SNOMED CT content | Cardiology report; GP event summary | SNOMED CT | Formal information model
'Big data' data store | Data warehouse; Data store containing a mixture of substrates | Various coding systems | Mixture of both structured and unstructured data

3.4 Examples of Approaches


There are a number of ways in which SNOMED CT can be used in systems to support analytics, including:
• Analyzing free text with clinical Natural Language Processing (NLP) techniques, which use SNOMED CT as a
resource;
• Mapping coded clinical data from SNOMED CT to a classification, to enable analysis using the features of the
classification;
• Querying clinical data using the machine-processable definitions of clinical concepts defined in SNOMED CT;
• Mapping clinical data captured using a variety of code systems into SNOMED CT, to enable analysis over
heterogeneous data using a common reference terminology.
These approaches (and others) are described in more detail in the following chapters.
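
The last approach in the list above can be sketched as follows: records coded in different code systems are normalized to SNOMED CT concept identifiers before counting, so that a single query covers all sources. The local code `MI01`, the code system labels, the patient identifiers and the small map are hypothetical; a real implementation would draw on maintained maps. ICD-10 code I21.9 (acute myocardial infarction, unspecified) appears only as a familiar example of a classification code.

```python
from collections import Counter

# Hypothetical map from (code system, code) to a SNOMED CT concept identifier.
TO_SNOMED = {
    ("local", "MI01"): 22298006,         # local code for myocardial infarction
    ("icd10", "I21.9"): 22298006,        # acute myocardial infarction, unspecified
    ("snomed", "22298006"): 22298006,    # already SNOMED CT coded
    ("snomed", "399208008"): 399208008,  # plain chest X-ray
}


def to_snomed(system, code):
    """Return the SNOMED CT concept id for a source code, or None if unmapped."""
    return TO_SNOMED.get((system, code))


# Heterogeneous records: (patient, code system, code).
records = [
    ("P001", "local", "MI01"),
    ("P002", "icd10", "I21.9"),
    ("P003", "snomed", "22298006"),
    ("P004", "snomed", "399208008"),
    ("P005", "local", "XYZ"),  # no map available
]

counts = Counter()
unmapped = []
for patient, system, code in records:
    concept_id = to_snomed(system, code)
    if concept_id is None:
        unmapped.append((patient, system, code))  # keep for manual review
    else:
        counts[concept_id] += 1

# Number of records that mean myocardial infarction, whatever the source coding.
print(counts[22298006])
```

Keeping unmapped records in a separate list, rather than dropping them, makes gaps in the map visible to whoever maintains it.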

4 SNOMED CT Overview
SNOMED CT is a clinical terminology containing concepts, with unique meanings and formal logic-based
definitions, organized into hierarchies. The clinical content of SNOMED CT includes diagnoses and other clinical
findings, clinical observations, drug products, organisms, specimen types, body structures, and surgical and non-
surgical procedures.
SNOMED CT enables clinical information to be consistently represented at an appropriate level of detail within
electronic health records. The relationships within SNOMED CT then facilitate meaning-based retrieval of this
information at the preferred level of detail for the given query. This provides significant flexibility and facilitates the
integration of data from divergent models of use, such as different user interfaces or databases, into convergent
models of meaning, such as for the representation of data for reporting or statistical analysis purposes. Clinical
systems can thereby query and analyze electronic health record data recorded in different settings, at varying levels
of granularity and across multiple axes. This enables SNOMED CT to support a variety of clinical processes, which
may require either detailed or high-level information - from investigation, to diagnosis and clinical research.
SNOMED CT content is represented using three main types of component:
• Concepts - unique clinical meanings
• Descriptions - human readable terms used to refer to a concept
• Relationships - links between concepts that help to define the meaning of each concept
In addition to these three types of components, SNOMED CT also supports:
• Expressions – a structured combination of one or more concept identifiers used to represent a new clinical
meaning
• Reference sets – a mechanism for representing references to SNOMED CT components for a variety of
purposes, including subsets, aggregation hierarchies, maps and language preferences
In this section we introduce these SNOMED CT features and explain how they may be used to support analytics over
health data. For more detailed information about SNOMED CT features, please refer to the SNOMED CT Starter
Guide and the SNOMED CT Technical Implementation Guide.
We also discuss the specific benefits enabled by using SNOMED CT. For more details about the benefits of SNOMED
CT please refer to Building the Business Case for SNOMED CT.

4.1 Concepts
SNOMED CT concepts represent clinical meanings. Each concept has a permanent concept identifier, which
uniquely identifies the clinical meaning. For example:
• 22298006 |myocardial infarction|
• 160341008 |family history: epilepsy|
• 399208008 |plain chest X-ray|
• 319996000 |simvastatin 10mg tablet|
SNOMED CT's concepts, and their logic-based definitions, allow analytics services to perform meaning-based
queries, rather than purely lexical (or string-matching) searching.

4.2 Descriptions
SNOMED CT descriptions link appropriate human readable terms to concepts. Each concept can have many
descriptions, which represent different synonymous ways of referring to the same clinical meaning. Each
description is written in a specific language, and new descriptions can be created to support a variety of languages.
Like concepts, descriptions also have a permanent unique identifier.
The richness of description content assists the process of searching and finding concepts using user interfaces or
database queries. It may also be used to enhance string-matching in natural language processing applications,
including analytics over multi-lingual data.

4.3 Relationships
SNOMED CT relationships represent an association between two concepts. Relationships are used to logically
define the meaning of a concept in a way that can be processed by a computer. A third concept, called a relationship
type, is used to represent the meaning of the association between the source and destination concepts. There are
different types of relationships available within SNOMED CT.
Subtype relationships, which use the |is a| relationship type, are the most widely used type of relationship. The
SNOMED CT concept hierarchy is constructed from |is a| relationships. For example, the concept 128276007 |
cellulitis of foot| has an |is a| relationship to both the concept 118932009 |disorder of foot| and the concept
128045006 |cellulitis|. Subtype relationships are used in many analytics scenarios to aggregate groups of concepts
together, or to perform queries using more abstract (less detailed) concepts that match more specific (or more
detailed) concepts stored in health records.
Attribute relationships contribute to the definition of the source concept by associating it with the value of a
defining characteristic. For example, the concept |viral pneumonia| has a |causative agent| relationship to the
concept |Virus| and a |finding site| relationship to the concept |lung|. Attribute relationships are used in analytics
scenarios in which the meaning of a concept is needed to determine whether a record matches the query criteria.
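As a rough illustration of how these relationships might be held for analytics, the sketch below stores a few of the relationships quoted above as (source, type, destination) triples and indexes them by source concept. The identifiers used for |is a|, |causative agent|, |finding site|, |viral pneumonia|, |Virus| and |lung structure| are not all quoted in this section and are filled in here as assumptions. The helper names (`index_by_source`, `parents`, `attribute_values`) are hypothetical, and a real system would load these rows from the SNOMED CT release files rather than a hand-written list.

```python
from collections import defaultdict

IS_A = "116680003"             # |is a| (identifier assumed)
FINDING_SITE = "363698007"     # |finding site|
CAUSATIVE_AGENT = "246075003"  # |causative agent| (identifier assumed)

# Hand-picked illustration: (source concept, relationship type, destination concept)
RELATIONSHIPS = [
    ("128276007", IS_A, "118932009"),            # cellulitis of foot is a disorder of foot
    ("128276007", IS_A, "128045006"),            # cellulitis of foot is a cellulitis
    ("75570004", CAUSATIVE_AGENT, "49872002"),   # viral pneumonia, causative agent = virus
    ("75570004", FINDING_SITE, "39607008"),      # viral pneumonia, finding site = lung structure
]


def index_by_source(relationships):
    """Group destination concepts by source concept and relationship type."""
    index = defaultdict(lambda: defaultdict(set))
    for source, rel_type, destination in relationships:
        index[source][rel_type].add(destination)
    return index


INDEX = index_by_source(RELATIONSHIPS)


def attribute_values(concept_id, attribute_id):
    """Return the destinations of one relationship type for a concept."""
    return set(INDEX.get(concept_id, {}).get(attribute_id, set()))


def parents(concept_id):
    """Subtype relationships are simply the |is a| rows for the concept."""
    return attribute_values(concept_id, IS_A)


print(parents("128276007"))
print(attribute_values("75570004", FINDING_SITE))
```

Keeping subtype and attribute relationships in the same structure reflects the fact that both are relationships distinguished only by their relationship type.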

4.4 Concept Model


The rules which define how SNOMED CT concepts may be defined are called the SNOMED CT concept model. The
SNOMED CT concept model defines the permitted attributes and values that may be applied to each kind of
concept. For example, concepts in the |clinical finding| hierarchy are permitted to have a |finding site| relationship,
and the valid values of these relationships must belong to the |anatomical or acquired body structure| hierarchy.
The SNOMED CT concept model provides the foundation for processing the clinical meanings recorded in clinical
records and enables the appropriate use of clinical information for decision support and other analytics services.
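The |finding site| rule described above can be sketched as a simple domain and range check. This is only an illustration under stated assumptions: the `CONCEPT_MODEL` table holds a single hand-written rule, the identifier for |anatomical or acquired body structure| is assumed, and `ANCESTORS_OR_SELF` is a hypothetical precomputed lookup that a real system would derive from the |is a| relationships.

```python
FINDING_SITE = "363698007"      # |finding site|
CLINICAL_FINDING = "404684003"  # |clinical finding|
BODY_STRUCTURE = "442083009"    # |anatomical or acquired body structure| (identifier assumed)

# attribute -> (hierarchy the source must belong to, hierarchy the value must belong to)
CONCEPT_MODEL = {FINDING_SITE: (CLINICAL_FINDING, BODY_STRUCTURE)}

# Hypothetical precomputed lookup: concept -> the concept itself plus its ancestors.
# Intermediate ancestors are omitted to keep the illustration small.
ANCESTORS_OR_SELF = {
    "75570004": {"75570004", CLINICAL_FINDING},  # viral pneumonia
    "39607008": {"39607008", BODY_STRUCTURE},    # lung structure
    "49872002": {"49872002"},                    # virus (not a body structure)
}


def is_permitted(source_id, attribute_id, value_id):
    """Check one attribute relationship against the concept model rule for its attribute."""
    rule = CONCEPT_MODEL.get(attribute_id)
    if rule is None:
        return False  # no rule recorded for this attribute in this sketch
    domain, value_range = rule
    return (domain in ANCESTORS_OR_SELF.get(source_id, set())
            and value_range in ANCESTORS_OR_SELF.get(value_id, set()))


print(is_permitted("75570004", FINDING_SITE, "39607008"))
print(is_permitted("75570004", FINDING_SITE, "49872002"))
```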

4.5 Expressions
An expression is a structured combination of one or more concept identifiers used to represent a clinical meaning.
SNOMED CT postcoordinated expressions enable clinical meanings to be represented, which cannot be represented
using a single SNOMED CT concept. For example, the following postcoordinated expression represents 'pain in the
left thumb':

53057004 |hand pain| :
    363698007 |finding site| = ( 76505004 |thumb structure| :
        272741003 |laterality| = 7771000 |left| )
SNOMED CT postcoordinated expressions allow analytics services to perform meaning-based queries over a more
extensive set of clinical meanings than just individual concepts.
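One way an analytics service might hold such an expression in memory is as a small nested structure that can be rendered back to text. The sketch below is a minimal illustration only: the `Expression` class and its `render` method are hypothetical names, and the rendering covers just the single-focus, ungrouped refinements needed for the 'pain in the left thumb' example, not the full expression syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Expression:
    """A focus concept plus optional (attribute id, attribute term, value) refinements."""
    focus_id: str
    focus_term: str
    refinements: list = field(default_factory=list)

    def render(self) -> str:
        text = f"{self.focus_id} |{self.focus_term}|"
        if not self.refinements:
            return text
        parts = []
        for attr_id, attr_term, value in self.refinements:
            rendered = value.render()
            if value.refinements:
                # Nested refined values are wrapped in parentheses
                rendered = f"({rendered})"
            parts.append(f"{attr_id} |{attr_term}| = {rendered}")
        return f"{text} : " + ", ".join(parts)


left = Expression("7771000", "left")
thumb = Expression("76505004", "thumb structure", [("272741003", "laterality", left)])
pain = Expression("53057004", "hand pain", [("363698007", "finding site", thumb)])

print(pain.render())
```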

4.6 Reference Sets


A reference set (or 'refset') is a mechanism used to refer to a set of SNOMED CT components and to add customized
information to these components. Reference sets can be used for many different purposes, including representing
subsets of concepts, descriptions or relationships, language and dialect preferences, maps to and from other code
systems, ordered lists, navigation hierarchies and aggregation hierarchies. For more information about the different
types of reference sets, please refer to the Reference Set Release Files Specification.
Reference sets are used for a range of analytics purposes, including:
• To represent subsets of SNOMED CT concepts against which query criteria are defined and clinical records are
matched;
• To represent non-standard aggregations of concepts for specific use cases;
• To define maps from other code systems to SNOMED CT so that clinical data can be prepared for analytics to
be performed using SNOMED CT;

• To define language or dialect specific sets of descriptions over which lexical searches can be performed.

4.7 Description Logic Features


SNOMED CT concepts are modelled in such a way that their meaning can be represented using a formal family of
logics called Description Logic (DL). Description logic enables computers to make inferences about the concepts in
SNOMED CT and their meanings, and to classify SNOMED CT using a DL reasoner. Description logic also allows the
formal computation of:
• Subsumption – Testing pairs of expressions to see whether one is a subtype of the other
• Equivalence – Testing pairs of expressions to see whether they have the same logic-based meaning
Subsumption and equivalence are both extremely useful functions when retrieving or querying clinical information.
For example, when retrieving all clinical records related to 73211009 |diabetes mellitus|, it would usually be
necessary to retrieve records referring to any subtype of this concept, such as 23045005 |insulin dependent diabetes
mellitus type 1A|.

4.8 Benefits of Using SNOMED CT for Analytics


In addition to the features already described in this section, SNOMED CT offers a number of further benefits
for the provision of analytics, including:
• SNOMED CT allows clinical data to be recorded at an appropriate level of detail, and then queried at either
the same level or a less detailed level;
• SNOMED CT's broad coverage can enable queries across data captured within different disciplines,
specialties and domain areas;
• SNOMED CT provides a robust versioning mechanism, which helps to manage queries over longitudinal
health records;
• SNOMED CT is international, which enables queries, decision support rules and code system maps to be
shared and reused between countries;
• SNOMED CT includes localization mechanisms, which allow the same query to be applied to data from
different countries, dialects, regions and applications;
• SNOMED International provides maps between SNOMED CT and other international coding systems and
classifications, including LOINC (Logical Observation Identifiers Names and Codes) and ICD (International
Classification of Diseases, both ICD-10 and ICD-9-CM). This enables the additional benefits of these other
specialized standards to be integrated with the use of SNOMED CT.
Using SNOMED CT to support analytics services can also enable the following benefits:
• Enhancing the care of individual patients by supporting:
    • Retrieval of appropriate information for clinical care – e.g. for a clinical dashboard
    • Guideline and decision support integration
    • Retrospective searches for patterns requiring follow-up
• Enhancing the care of populations by supporting:
    • Epidemiology monitoring and reporting
    • Research into the causes and management of diseases
    • Identification of patient groups for clinical research or specialized healthcare programs
• Providing cost-effective delivery of care by supporting:
    • Guidelines to minimize risk of costly errors
    • Reducing duplication of investigations and interventions
    • Auditing the delivery of clinical services
    • Planning service delivery based on emerging health trends

5 Preparing Data for Analytics


As discussed in Section 3.3 Substrates for Analytics, not all electronic health records represent clinical data using
SNOMED CT. However, even when health records use free text or other code systems, it is still possible to use
SNOMED CT for analytics over this data if the data is prepared appropriately. For example, Natural Language
Processing can be used to encode free text data in SNOMED CT, subsequently enabling more sophisticated
analytics to be performed. Similarly, clinical data using other code systems can be mapped into SNOMED CT to
provide similar benefits.
In this section we discuss these alternative ways of preparing clinical data for analytics using SNOMED CT.

5.1 Natural Language Processing


While there is a strong trend towards the direct coding of clinical data, the capture and retention of free text
remains essential to record broader narratives about clinical history, physical examinations, clinical procedures and
investigation results. Wider deployment of medical transcription technologies featuring speech recognition also
adds to the volume of free text in electronic format. Medical literature, clinical guidelines and published clinical
research also remain largely in free text.
Natural Language Processing (NLP) is a linguistic technique that enables a computer program to analyze and
extract meaning from human language. Clinical NLP, using SNOMED CT's concepts, descriptions and relationships,
may be applied to repositories of clinical information to search, index, selectively retrieve and analyze free text.
These techniques can be used to extract SNOMED CT encoded data from free-text patient records, and also support
the retrieval of clinical knowledge documents.
It should be noted that while clinical NLP techniques have increased in sophistication over recent years, it is not
possible to guarantee full accuracy or completeness using a computer-based algorithm. Spelling errors,
grammatical errors, abbreviations, unexpected synonyms, unusual vernacular (i.e. local) phrases, and hidden
contextual information continue to provide challenges that human intelligence is uniquely equipped to handle.

Example
Figure 5.1-1 below shows a free text section of a discharge summary that has been processed
with clinical NLP to extract a set of potential SNOMED CT clinical findings and procedures. To ensure the
correctness of this automatic encoding, the application should present the list of extracted codes to the user for
confirmation, giving them the opportunity to refine, delete or append codes.

Figure 5.1-1: Natural Language Processing encoding SNOMED CT

To improve the accuracy of clinical NLP and the value for analytics processes, it is important that the context of
each statement expressed in natural language is clearly identified – for example, past history, suspected and
negation/absence. Figure 5.1-2 shows the same discharge summary narrative as in Figure 5.1-1, but this time
processed with clinical NLP that also extracts the explicit context of each clinical finding and procedure.

Figure 5.1-2: Natural Language Processing encoding SNOMED CT with context

When SNOMED CT codes with explicit context are extracted from free text narrative, the resulting clinical meanings
may be captured using SNOMED CT postcoordinated expressions. For example, the following clinical statement:
Endoscopy revealed an acute gastric ulcer but no evidence of gastric bleeding or perforation of the stomach.
can be encoded using the following SNOMED CT expressions with explicit context (see Clinithink case study):
• 243796009 |situation with explicit context| :
    {408731000 |temporal context| = 410512000 |current or specified time|,
    246090004 |associated finding| = 95529005 |acute gastric ulcer|,
    408732007 |subject relationship context| = 410604004 |subject of record|,
    408729009 |finding context| = 410515003 |known present|}
• 243796009 |situation with explicit context| :
    {408729009 |finding context| = 410516002 |known absent|,
    246090004 |associated finding| = 61401005 |gastric bleeding|,
    408731000 |temporal context| = 410512000 |current or specified time|,
    408732007 |subject relationship context| = 410604004 |subject of record|}
• 243796009 |situation with explicit context| :
    {408729009 |finding context| = 410516002 |known absent|,
    246090004 |associated finding| = 235674005 |perforation of stomach|,
    408731000 |temporal context| = 410512000 |current or specified time|,
    408732007 |subject relationship context| = 410604004 |subject of record|}
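To see why explicit context matters for analytics, the following sketch holds the three expressions above as plain dictionaries and selects only the findings that are known to be present in the subject of the record. The dictionary layout and the function name `present_findings` are assumptions made for this illustration and are not a SNOMED CT data format.

```python
FINDING_CONTEXT = "408729009"       # |finding context|
ASSOCIATED_FINDING = "246090004"    # |associated finding|
TEMPORAL_CONTEXT = "408731000"      # |temporal context|
SUBJECT_RELATIONSHIP = "408732007"  # |subject relationship context|

KNOWN_PRESENT = "410515003"         # |known present|
KNOWN_ABSENT = "410516002"          # |known absent|
CURRENT_OR_SPECIFIED = "410512000"  # |current or specified time|
SUBJECT_OF_RECORD = "410604004"     # |subject of record|


def situation(finding_id, finding_context):
    """Build one 'situation with explicit context' as an attribute -> value dictionary."""
    return {
        ASSOCIATED_FINDING: finding_id,
        FINDING_CONTEXT: finding_context,
        TEMPORAL_CONTEXT: CURRENT_OR_SPECIFIED,
        SUBJECT_RELATIONSHIP: SUBJECT_OF_RECORD,
    }


# The three statements extracted from the endoscopy narrative
EXTRACTED = [
    situation("95529005", KNOWN_PRESENT),   # acute gastric ulcer
    situation("61401005", KNOWN_ABSENT),    # gastric bleeding
    situation("235674005", KNOWN_ABSENT),   # perforation of stomach
]


def present_findings(situations):
    """Return findings asserted as present for the patient, ignoring negated ones."""
    return [
        s[ASSOCIATED_FINDING]
        for s in situations
        if s[FINDING_CONTEXT] == KNOWN_PRESENT
        and s[SUBJECT_RELATIONSHIP] == SUBJECT_OF_RECORD
    ]


print(present_findings(EXTRACTED))
```

A query that ignored the context attributes would wrongly count this patient as having gastric bleeding and perforation of the stomach.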

Implementation
NLP Techniques using SNOMED CT
A clinical NLP engine can use SNOMED CT to encode free text narrative in patient records in a number of ways.
Firstly, it can use SNOMED CT descriptions together with techniques such as:
• Stemming: The process of reducing a word to its stem, base or root form – for example "cardiology",
"cardiac" and "cardiologist" may be reduced to the stem "cardi".
• Reordering: The process of reordering the words in a phrase – for example, reordering "hip fracture" to
"fracture hip".
• Word substitution: The process of substituting a word or word phrase with an equivalent word or word
phrase. The SNOMED CT Lexical Resources zip file, available from the SNOMED CT Document Library,
includes an English Word Equivalents table that groups together equivalent words and phrases – for
example, "Renal stone", "Kidney stone", "kidney calculus", "renal calculus" and "nephrolith" are grouped
into the same word block group. This table can be modified or extended with additional word equivalent
groups if required.
• Stop word removal: The process of removing words with limited semantic specificity – for example 'a', 'an',
'and', 'as', 'at', 'be', 'by', 'for', 'of', 'the'. The SNOMED CT Lexical Resources zip file, available from the
SNOMED CT Document Library, includes an Excluded Words table, which suggests some common English
stop words that may be used with SNOMED CT.
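Some of the techniques listed above can be sketched in a few lines. The example below applies stop word removal, word substitution and reordering (by sorting the remaining words) so that a free text phrase can be compared with a description. It is a simplified illustration: the stop word list is the one quoted above, the `WORD_EQUIVALENTS` entry is a hypothetical stand-in for the English Word Equivalents table, the single description is taken from an example in this guide, stemming is omitted, and the function names are assumptions.

```python
STOP_WORDS = {"a", "an", "and", "as", "at", "be", "by", "for", "of", "the"}

# Hypothetical stand-in for a word equivalents table: word -> chosen canonical word
WORD_EQUIVALENTS = {"stomach": "gastric"}

# Description term -> concept identifier (one description only, for illustration)
DESCRIPTIONS = {"acute gastric ulcer": "95529005"}


def normalize(phrase):
    """Lowercase, drop stop words, substitute equivalents and sort the words."""
    words = phrase.lower().split()
    words = [w for w in words if w not in STOP_WORDS]
    words = [WORD_EQUIVALENTS.get(w, w) for w in words]
    return tuple(sorted(words))


# Index the descriptions by their normalized form
NORMALIZED_INDEX = {normalize(term): concept for term, concept in DESCRIPTIONS.items()}


def encode(phrase):
    """Return the concept whose description matches the normalized phrase, if any."""
    return NORMALIZED_INDEX.get(normalize(phrase))


print(encode("Acute ulcer of the stomach"))
print(encode("hip fracture"))
```

Sorting the words is a blunt form of reordering that can produce false matches when word order carries meaning, so candidate codes found this way would still need confirmation.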
The SNOMED CT concept model can also be used to identify potential connections between related concepts – for
example, the words "left", "hip" and "fracture" used in close proximity may indicate a |fracture| with finding site |
hip| and a laterality of |left|. Similarly, the SNOMED CT concept model may help to identify context that is expressed
within the text – for example, past history, certainty and absence.
Another commonly adopted NLP strategy is to use the location of the free text within the structure of a document to
restrict the possible SNOMED CT code matches. For example, free text entered into a 'Diagnosis' field may restrict
its SNOMED CT encoding to the |disorder| hierarchy, together with other concepts that may be linked to |clinical
findings| via the SNOMED CT concept model.
When NLP techniques are applied to non-English (or dialect-specific) text, translations of relevant SNOMED CT
descriptions may be required. The NLP methods themselves may also need to be adapted to reflect the structure
and style of the language in which the text is written.

Indexing
Another major application for Natural Language Processing technologies is indexing collections of free text
transcripts or documents such that topic specific searches may be run on them, or relevant clinical knowledge
sources may be identified and linked to a given patient's clinical data. The challenge is to return ranked matches
which permit selection of texts with high sensitivity and high specificity (i.e. that relevant documents are rarely
overlooked and that irrelevant documents are rarely returned).

SNOMED CT can be used to support these applications by enabling more powerful searching of free text data stores
than using a purely lexical keyword-based approach. For example, the clinician may request "all documents which
refer to cardiac rhythm disorders". Rather than relying purely on text matching, the search term may be matched
with the concept 698247007 |cardiac arrhythmia (disorder)|, based on its synonym |disorder of heart rhythm|. The
descendants of this concept (e.g. 276796006 |atrial tachycardia|, 49260003 |idioventricular rhythm|, 233917008 |
atrioventricular block|) may then be used to search for any code which is a kind of cardiac arrhythmia. Non-|is a|
attribute relationships may also be used in the retrieval process to find associations between the search term and
the indexed concepts, and to calculate the relevance of each free text artefact to determine the order in which they
should be presented to the user.

Case Studies
Clinical NLP has been implemented for encoding free text narrative in health records by a number of vendors,
including Caradigm, Cerner, Clinithink and Intelligent Medical Objects.
NLP techniques for indexing and searching have also been implemented by Cerner and Dr Bevan Koopman.
Allscripts' Sunrise InfoButton™ feature uses encoded patient problem lists and medication data elements, together
with SNOMED CT-based indexes provided by third-party medical content providers, to present on-topic information
to the clinician without manual searching.

5.2 Mapping Other Code Systems to SNOMED CT


Mapping data from clinical records encoded using non-SNOMED CT code systems to SNOMED CT for analysis may
be considered when there is a requirement to produce:
• Management information for care service audit or delivery planning
• Statistical information for epidemiology
• Links from clinical records to clinical knowledge resources
• Links between clinical records and decision support tools
• An integrated data warehouse for querying from multiple heterogeneous sources
• Other types of research, reports or surveillance that require SNOMED CT
Two important characteristics of a map, which affect its ability to be used for a particular purpose, are the direction
of the map, and the correlation between the source and target codes. Where the analytics use case requires
SNOMED CT to be used, the direction of the map must be from the non-SNOMED CT codes to SNOMED CT codes. A
map designed to move data from code system A to code system B will serve poorly (if at all) 'in reverse' if it is used
to map from B to A, unless all the links are exact semantic matches.
For analytics purposes where patient safety or data accuracy is important (e.g. point of care clinical decision
support or data integration), it is important that the correlation of the map is an 'exact match' (or equivalence). For
other purposes (e.g. epidemiology or care service delivery planning) it may be acceptable for the SNOMED CT code
to be broader than (or a supertype of) the non-SNOMED CT code. However, broad-to-narrow and narrow-to-broad
maps need to be used with care.
When a non-SNOMED CT code is being mapped into SNOMED CT, and an equivalent precoordinated SNOMED CT
concept does not exist, a number of options are possible, including:
1. Map the code to a broader (supertype) SNOMED CT concept
a. For example, map "DX0162: arthritis of left knee" to "371081002 |arthritis of knee|" with correlation
'narrow to broad'
2. Map the code to a SNOMED CT postcoordinated expression
a. For example, map "DX0162: arthritis of left knee" to "371081002 |arthritis of knee| : 272741003 |
laterality| = 7771000 |left|" with correlation 'exact match'
3. Create a new precoordinated SNOMED CT concept with the same meaning as the code, and map the code to
this new concept
a. For example, map "DX0162: arthritis of left knee" to a new extension concept 729999999100 |Arthritis
of left knee| 1

Designing and authoring maps requires expertise and appropriate resources. Large maps (e.g. tens of thousands of
codes) are typically created and maintained by SNOMED International, National Release Centers, large healthcare
organizations, specialist data suppliers and large system vendors. However, smaller maps may be created and
maintained by smaller system suppliers, hospitals or clinics. Maps must be maintained to ensure that both the
SNOMED CT content and non-SNOMED CT content remain current whenever either code system is updated.

Example
A typical scenario requiring mapping to SNOMED CT is shown below in Figure 5.2-1. In this example, two source
systems (using ICD-9 and ICD-10 respectively) are being integrated into a data warehouse using SNOMED CT as the
common 'reference terminology' for analysis. Once this mapping is done, the same analytic techniques as used on
native SNOMED CT records may be applied (See Section 6 SNOMED CT Analytic Techniques).

Figure 5.2-1: Mapping from ICD classifications to SNOMED CT

Implementation
Mapping Using SNOMED CT
Maps are represented in SNOMED CT's RF2 using a Simple map reference set, a Complex map reference set, or an
Extended map reference set (depending on what additional information is required to support the implementation
of the map). Code mappings are then performed by matching each non-SNOMED CT code in a patient's record with
the 'mapTarget' field of the corresponding row of the map reference set, and using the SNOMED CT code found in
the 'referencedComponentId'.
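The lookup described above might be sketched as follows. The rows are invented for illustration and keep only the three fields the lookup needs. The source code 'DX0162' is the hypothetical code used earlier in this section, 'DX0200' is a further made-up code, and the function names are assumptions. A real implementation would read the rows from the map reference set release file.

```python
from collections import defaultdict

# Invented map reference set rows, reduced to the fields used by the lookup
MAP_ROWS = [
    {"active": "1", "referencedComponentId": "371081002", "mapTarget": "DX0162"},
    {"active": "0", "referencedComponentId": "22298006", "mapTarget": "DX0162"},  # inactive row
    {"active": "1", "referencedComponentId": "22298006", "mapTarget": "DX0200"},
]


def build_lookup(rows):
    """Index active rows by the non-SNOMED CT code held in 'mapTarget'."""
    lookup = defaultdict(list)
    for row in rows:
        if row["active"] == "1":
            lookup[row["mapTarget"]].append(row["referencedComponentId"])
    return lookup


LOOKUP = build_lookup(MAP_ROWS)


def to_snomed(source_code):
    """Return the SNOMED CT identifiers mapped from a source code (possibly none)."""
    return list(LOOKUP.get(source_code, []))


print(to_snomed("DX0162"))
print(to_snomed("ZZ999"))
```

Returning a list rather than a single identifier leaves room for source codes that have no map or more than one map, which the caller then has to handle explicitly.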

Case Studies
The UK Terminology Centre's Data Migration Workbench demonstrates some advanced uses of data migration and
mapping products published by the UKTC, including Read Code Version 2 and CTV3 maps to SNOMED CT. A number
of vendor products also map non-SNOMED CT codes to SNOMED CT for use in analytics, including Allscripts'
terminology service, Apelon's Distributed Terminology System, the Cerner Millennium Terminology (CMT) package,
and Epic's electronic patient record systems.

1 Please note that this concept does not exist in the international edition of SNOMED CT, but is shown here as a
hypothetical example of a concept added in a SNOMED CT extension.

6 SNOMED CT Analytic Techniques


SNOMED CT offers a number of analytics techniques, which are not possible using other coding systems. SNOMED
CT's hierarchical design improves upon the purely lexical query capabilities of free text lists or 'flat' controlled
vocabularies. For example, a purely text-based query for 'kidney disease' will not return the kidney disease
'glomerulonephritis'. Pure mono-hierarchies, however, limit querying to a single grouping of each code. For
example, in a mono-hierarchy, 'tuberculosis of the lung' must be assigned a code that makes it either a kind of
'lung disease' or a kind of 'tuberculosis'; it cannot be both. Using SNOMED CT's polyhierarchy,
'tuberculosis of the lung' can be represented as both a kind of 'lung disease' and a kind of 'tuberculosis'. The
inclusion of other attribute-based defining relationships and the ability to represent SNOMED CT using OWL 2 EL
enables additional Description Logic techniques for classifying and querying SNOMED CT. Extending these
capabilities even further, it is possible to use Description Logic techniques across both the terminology and the
structure of the patient records in which the codes are stored. Finally, in some specific use cases such as billing,
reimbursement and statistics where double counting must be avoided, clinically recorded SNOMED CT codes can
be used to map into more general statistical classifications, such as ICD (International Classification of Diseases).
In this section, we describe how the following analytics techniques can be used to support analytics over SNOMED
CT enabled data. The techniques described include:
• Subsets – for example, find the patients with a diagnosis in the set of 'kidney disease codes'
• Subsumption – for example, find the patients with a diagnosis that is a subtype (or self) of 'kidney disease'
• Using defining relationships – for example, find the patients whose diagnosis has a finding site of 'kidney
structure' (or a subtype of 'kidney structure')
• Description logic over terminology – for example, find the patients whose diagnosis is associated (directly or
indirectly) with the 'Streptococcus pyogenes organism'
• Description logic over terminology and structure – for example, find the patients with a family history of
heart disease (where this may either be recorded as 275120007 |family history: cardiac disorder| or recorded
in a 'Family History' section on a form as 56265001 |heart disease|)
• Using statistical classifications – for example, to meet national reporting guidelines using ICD (International
Classification of Diseases)
In practice, a query language may combine a number of these techniques in the same syntax. With the possible
exception of the last two approaches, these SNOMED CT query techniques should then be embedded within an EHR
query to ensure that the semantic context of the surrounding patient record is taken into account.

6.1 Subsets
One approach to analytics using SNOMED CT is to construct subsets of SNOMED CT identifiers, which are applicable
to a specific clinical purpose, and to test the codes recorded in patient records to check for membership in the
appropriate subset. Subsets of SNOMED CT identifiers may either be defined extensionally or intensionally.
Extensionally defined subsets are those in which each concept is individually enumerated. They are usually
manually constructed and maintained, and can therefore be labor intensive and error prone. For example, one
might construct a subset of kidney disease codes including 36171008 |glomerulonephritis|, 71110009 |
hydrocalycosis| and 42399005 |renal failure|.
Intensionally defined subsets are those which are automatically populated (or expanded) based on a machine
processable query. For example, one might construct a subset of kidney disease codes using the results of the query
"<< 90708001 |kidney disease|" (i.e. descendants or self of 90708001 |kidney disease|). The query used to define an
intensional subset may utilize SNOMED CT's hierarchical relationships, attribute values, descriptions, and
membership in other intensionally or extensionally defined subsets. For more information about SNOMED CT query
languages, which may be used to define intensional subsets, please refer to section 9 Database Queries.

Example
A subset containing types of 58437007 |tuberculosis of meninges| may be defined extensionally as follows:

Concept ID    Description
58437007      tuberculosis of meninges (disorder)
90302003      tuberculosis of cerebral meninges (disorder)
38115001      tuberculosis of spinal meninges (disorder)
11676005      tuberculous leptomeningitis (disorder)

With the help of the SNOMED CT hierarchy (as shown in Figure 6.1-1), this same subset can be defined intensionally
as:
<< 58437007 |tuberculosis of meninges|
The expansion of an intensional subset defined using this query is the same as the extensionally defined subset
shown above. 1
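The expansion of a 'descendants or self' query can be sketched as a traversal of |is a| relationships. In the example below, the child-to-parent links are assumptions made for illustration (each of the three subtypes is treated as a direct child of 58437007 |tuberculosis of meninges|, which may not match the exact shape of the hierarchy in the figure), and the function name `descendants_or_self` is hypothetical.

```python
from collections import defaultdict

# Assumed |is a| links for illustration: (child, parent)
IS_A = [
    ("90302003", "58437007"),  # tuberculosis of cerebral meninges
    ("38115001", "58437007"),  # tuberculosis of spinal meninges
    ("11676005", "58437007"),  # tuberculous leptomeningitis
]

# The manually enumerated (extensional) subset from the table above
EXTENSIONAL_SUBSET = {"58437007", "90302003", "38115001", "11676005"}


def descendants_or_self(concept_id, is_a_pairs):
    """Expand '<< concept_id' by walking |is a| links from parent to child."""
    children = defaultdict(set)
    for child, parent in is_a_pairs:
        children[parent].add(child)
    result = {concept_id}
    to_visit = [concept_id]
    while to_visit:
        current = to_visit.pop()
        for child in children.get(current, ()):
            if child not in result:  # guards against revisiting in a polyhierarchy
                result.add(child)
                to_visit.append(child)
    return result


expansion = descendants_or_self("58437007", IS_A)
print(expansion == EXTENSIONAL_SUBSET)
```

Because the expansion is computed from the relationships, rerunning it against a newer release picks up added or retired subtypes without manual editing.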

Figure 6.1-1: Tuberculosis of meninges concept sub-hierarchy

Using a lexical query, it is also possible to intensionally define a subset of 'tuberculosis of meninges' findings.
However, the results of purely lexical queries are not as reliable. For example, using the query:
<< 404684003 |clinical finding| {{ term = ".*tuberculosis.*meninges.*" }}
the following expansion can be calculated:

Concept ID    Description
58437007      tuberculosis of meninges (disorder)
90302003      tuberculosis of cerebral meninges (disorder)
38115001      tuberculosis of spinal meninges (disorder)

As can be seen, the results of this lexical query include only 3 of the 4 members of the previous subset. In
other cases, lexical queries may incorrectly find concepts which are not appropriate to the subset. It is therefore
recommended that lexical queries are avoided in the definition of intensional subsets. However, they do serve a
useful purpose in identifying candidates for an otherwise manually crafted subset.
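The shortfall of the lexical approach can be reproduced with an ordinary regular expression over the four terms listed earlier. This is only a sketch: the pattern is written in the spirit of the lexical query above, and Python's `re` module stands in for whatever term matching a terminology server would actually perform.

```python
import re

# Terms of the four concepts in the extensionally defined subset
TERMS = {
    "58437007": "tuberculosis of meninges (disorder)",
    "90302003": "tuberculosis of cerebral meninges (disorder)",
    "38115001": "tuberculosis of spinal meninges (disorder)",
    "11676005": "tuberculous leptomeningitis (disorder)",
}

# 'tuberculosis' followed somewhere later by 'meninges'
PATTERN = re.compile(r".*tuberculosis.*meninges.*")

matched = {cid for cid, term in TERMS.items() if PATTERN.fullmatch(term)}
missed = set(TERMS) - matched

print(sorted(matched))
print(sorted(missed))
```

The term 'tuberculous leptomeningitis' contains neither 'tuberculosis' nor 'meninges' as written, so it is missed even though it is a subtype in the hierarchy.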

Implementation
Defining Subsets in SNOMED CT
Subsets of SNOMED CT may be defined locally as a flat list of concept identifiers, or as an independent query
specification. However, where wider distribution and/or version control is required over these subsets, SNOMED CT
reference sets offer the ideal solution.
Extensional subsets are commonly defined in SNOMED CT using a Simple reference set - however an Ordered
reference set or Annotation reference set can be used if additional information needs to be recorded for each
member of the subset. Intensional subsets are defined in SNOMED CT using a Query specification reference set. A
Query specification reference set allows a serialized query to define the membership of a subset of SNOMED CT
components. It also specifies the extensional reference set into which the results of executing the query are
generated. Intensional reference sets are preferred in many circumstances as they enable their membership to be
automatically recomputed over new versions of SNOMED CT. Version management of subsets is discussed further in
section 11.4 Versioning.
Subsets can be created using the following methods, either alone or in any combination:
• Manual inclusion, using search and browse methods
• Existing subset, used as a starting point for further manual inclusion and update
• Lexical queries, to identify candidate members, followed by manual verification and update
• Hierarchical queries, to identify descendants of a given concept (e.g. < 73211009 |diabetes
mellitus|)
• Attribute queries, to identify concepts with a specific attribute value (e.g. disorders with a finding site of
80891009 |heart structure|)
• SNOMED CT queries, using the SNOMED CT Expression Constraint or Query languages, which offer additional
query functionality. Please refer to section 9 Database Queries for more details.

Case Studies
A number of vendor products, such as those from Apelon and B2i Healthcare, allow users to create customized extensional and
intensional subsets of SNOMED CT. Other vendor products, such as the Cambio COSMIC® Electronic Patient Record
system, Caradigm's population health solutions, Cerner's data warehousing solution and Epic's decision support
and reporting tools, use subsets of SNOMED CT to support their analytics services.

1 Expansion derived from SNOMED CT International Edition, dated 20170131.


6.2 Subsumption
Determining whether one concept (or expression) is a kind of another concept (or expression) is the fundamental
capability enabled by SNOMED CT. For example, answering the question 'Which patients have an infectious
disease?' involves finding all the patients with any kind of infectious disease (e.g. viral pneumonia, tuberculosis).
Subsumption occurs when one clinical meaning is a subtype of another clinical meaning, and testing for this is
called 'subsumption testing'. If clinical meaning X is a subtype of clinical meaning Y, then Y is said to 'subsume' X
and X is 'subsumed by' Y.
Subsumption between concepts is represented by a stated or inferred |is a| relationship. For example,
75570004 |viral pneumonia| is a 40733004 |infectious disease| and therefore 40733004 |infectious disease| subsumes
75570004 |viral pneumonia|, and 75570004 |viral pneumonia| is subsumed by 40733004 |infectious disease|.
Subsumption testing between expressions tests to see if the candidate expression (often recorded in a patient
record) is subsumed by a predicate expression (typically part of the query being run across the patient record). For
example:
Candidate expression: 75570004 |viral pneumonia|
Predicate expression: 40733004 |infectious disease|:
363698007 |finding site| = 39607008 |lung structure|
In this case, the candidate expression is subsumed by the predicate expression.
Subsumption testing can be represented in the SNOMED CT Expression Constraint Language using the
'<' (descendantOf) or '<<' (descendantOrSelfOf) operators. For example, the expression constraint:
<< 40733004 |infectious disease|
is satisfied by any expression that is subsumed by 40733004 |infectious disease|.
There are a variety of ways to implement subsumption testing. These are summarized in the Implementation
subsection below.

Example
A typical example using subsumption would be an audit within a hospital, reviewing all patients with an infectious
disease. In this scenario, the following simple query could be executed to find all the patients whose health record
contains a diagnosis that is subsumed by the concept 40733004 |infectious disease|:
SELECT distinct patientID
FROM health_records
WHERE diagnosis = (<< 40733004 |infectious disease|)
If the health records contained the following data:
patientID   date                 diagnosis
634711      16th January 2015    71620000 |fracture of femur|
634711      25th January 2015    415353009 |rotavirus food poisoning|
634711      3rd February 2015    66308002 |fracture of humerus|
158775      7th January 2015     40468003 |hepatitis A|
889125      7th January 2015     75570004 |viral pneumonia|
456872      15th January 2015    22298006 |myocardial infarction|
456872      15th January 2015    195967001 |asthma|

Then this query would return the following list of patients:
• 634711 (because 415353009 |rotavirus food poisoning| is a subtype of 40733004 |infectious disease|)
• 158775 (because 40468003 |hepatitis A| is a subtype of 40733004 |infectious disease|)
• 889125 (because 75570004 |viral pneumonia| is a subtype of 40733004 |infectious disease|)
Note that patient 456872 would not be returned by this query as neither 22298006 |myocardial infarction| nor
195967001 |asthma| is a subtype of 40733004 |infectious disease|.
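
The audit query above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the IS_A table is hand-written and links each diagnosis directly to 40733004 |infectious disease|, whereas in a real SNOMED CT release the path normally runs through intermediate concepts. The function names are hypothetical.

```python
# Toy |is a| table: child concept id -> set of parent concept ids.
# Assumption: direct links for brevity; release data has intermediate concepts.
IS_A = {
    "415353009": {"40733004"},  # rotavirus food poisoning
    "40468003": {"40733004"},   # hepatitis A
    "75570004": {"40733004"},   # viral pneumonia
}

# (patientID, diagnosis concept id) pairs taken from the example table.
HEALTH_RECORDS = [
    ("634711", "71620000"),   # fracture of femur
    ("634711", "415353009"),  # rotavirus food poisoning
    ("634711", "66308002"),   # fracture of humerus
    ("158775", "40468003"),   # hepatitis A
    ("889125", "75570004"),   # viral pneumonia
    ("456872", "22298006"),   # myocardial infarction
    ("456872", "195967001"),  # asthma
]


def is_subsumed_by(candidate, predicate, is_a=IS_A):
    """Return True if candidate equals predicate or is one of its descendants."""
    seen = set()
    stack = [candidate]
    while stack:
        concept = stack.pop()
        if concept == predicate:
            return True
        if concept in seen:
            continue
        seen.add(concept)
        stack.extend(is_a.get(concept, ()))
    return False


def patients_with(predicate, records=HEALTH_RECORDS):
    """Distinct patient ids with at least one diagnosis subsumed by predicate."""
    return sorted({pid for pid, dx in records if is_subsumed_by(dx, predicate)})


print(patients_with("40733004"))  # ['158775', '634711', '889125']
```

The walk from the candidate towards its supertypes corresponds to the '<<' (descendantOrSelfOf) test applied to each recorded diagnosis.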

Implementation
Testing Subsumption Between Concepts
Rapid and efficient computation of whether a concept is a subtype (descendant) of another concept is essential for
testing subsumption between expressions. A variety of approaches exist for testing subsumption. When the
candidate and predicate expressions are both precoordinated concepts, subsumption testing can use the published
relationships from the SNOMED CT release files. Approaches for testing subsumption between precoordinated
concepts include:
• Exhaustive testing of subtype relationships
In this approach, every possible sequence of |is a| relationships is recursively tested from the candidate concept
until the predicate concept is reached or until all possible paths have been exhausted.
• Semantic type identifiers and hierarchy flags
In this approach, flags are added to each concept to indicate the set of high-level concept nodes of which that
concept is a subtype. A concept can only subsume concepts that include the same set of high-level concept flags.
This reduces the number of tests that need to be performed to recursively test the subtype relationships.
• Use of proprietary database features
In this approach, a database is used which supports the recursive testing of a chain of hierarchical relationships.
• Branch numbering
In this approach, a depth-first tree walk is performed that applies an incremental number to each concept. A second
tree walk then allocates one or more branch number ranges to each concept, which contain the numbers of all of
its descendants.
• Precomputed transitive closure table
In this approach, a comprehensive list of all supertypes of each concept is created by recursively traversing all |is a|
relationships and adding each stated and inferred subtype relationship to a table.
• Using a Description Logic Reasoner
In this approach, a description logic reasoner (e.g. Snorocket, ELK, FaCT++) is used to determine whether one
concept is subsumed by another.
In most environments, the recommended approach is to use either a precomputed transitive closure table or a
description logic reasoner. However, where disk capacity or distribution bandwidth are limiting factors, branch
numbering provides an efficient alternative approach. For more information on these approaches, please refer to
Subtype search scope restriction in the SNOMED CT Terminology Services Guide.
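
As a rough illustration of the precomputed transitive closure approach, the Python sketch below builds a map from each concept to all of its supertypes, after which a subsumption test becomes a single set lookup. The concept labels A to D are placeholders rather than SNOMED CT identifiers, the function names are hypothetical, and the input is assumed to be acyclic.

```python
from collections import defaultdict


def transitive_closure(is_a_pairs):
    """Map each concept to the set of all its supertypes (ancestors).

    is_a_pairs: iterable of (child, parent) tuples, assumed to be acyclic.
    """
    parents = defaultdict(set)
    for child, parent in is_a_pairs:
        parents[child].add(parent)

    closure = {}

    def ancestors(concept):
        # Memoised recursion: each concept's ancestor set is computed once.
        if concept not in closure:
            result = set()
            for parent in parents.get(concept, ()):
                result.add(parent)
                result |= ancestors(parent)
            closure[concept] = result
        return closure[concept]

    for concept in list(parents):
        ancestors(concept)
    return closure


# Placeholder polyhierarchy: D has two parents (B and C), both under A.
PAIRS = [("D", "B"), ("D", "C"), ("B", "A"), ("C", "A")]
CLOSURE = transitive_closure(PAIRS)


def subsumes(predicate, candidate, closure=CLOSURE):
    """True if candidate is the predicate itself or one of its descendants."""
    return candidate == predicate or predicate in closure.get(candidate, set())


print(sorted(CLOSURE["D"]))  # ['A', 'B', 'C']
print(subsumes("A", "D"))    # True
```

The trade-off noted above is visible here: the closure stores one entry per concept-ancestor pair, so lookups are fast but the table is much larger than the set of direct |is a| relationships.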


Testing Subsumption Between Expressions


When the candidate expression (in the patient data), the predicate expression (in the query), or both are
postcoordinated, techniques based on description logic are needed to perform subsumption testing.
Approaches for testing subsumption between postcoordinated expressions include:
• Comparing normal form expressions
In this approach, the predicate expression is transformed to short normal form and the candidate expression is
transformed to long normal form. The two normal form expressions are then tested for subsumption by checking
that each focus concept in the predicate expression subsumes at least one focus concept in the candidate
expression, each attribute group in the predicate expression subsumes at least one attribute group in the candidate
expression and each ungrouped attribute in the predicate expression subsumes at least one attribute in the
candidate expression.
• Using a Description Logic Reasoner
In this approach, a description logic reasoner (e.g. Snorocket, ELK, FaCT++) is used to determine whether one
expression is subsumed by another.
Where available, the recommended approach is to use a description logic reasoner to calculate subsumption
between expressions. However, comparing normal form expressions provides an alternative approach when a
reasoner is not available. For more information on these approaches, please refer to Expression Retrieval and
Normal Forms in the SNOMED CT Terminology Services Guide.
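
The structural comparison of normal form expressions described above might be sketched in Python as follows. This is a simplified illustration only: the expressions are hand-written dictionaries rather than output from a real normal form transformation, the candidate's definition is an assumed simplification, concept labels stand in for identifiers, and an ungrouped predicate attribute is allowed to match any candidate attribute (one reading of "at least one attribute in the candidate expression").

```python
# Toy supertype table (concept -> all ancestors); labels are illustrative only.
ANCESTORS = {
    "lung structure": {"body structure"},
}


def concept_subsumes(predicate, candidate):
    return predicate == candidate or predicate in ANCESTORS.get(candidate, set())


def attribute_subsumes(pred_attr, cand_attr):
    """Both the attribute name and its value must subsume their counterparts."""
    (p_name, p_value), (c_name, c_value) = pred_attr, cand_attr
    return concept_subsumes(p_name, c_name) and concept_subsumes(p_value, c_value)


def group_subsumes(pred_group, cand_group):
    return all(any(attribute_subsumes(p, c) for c in cand_group) for p in pred_group)


def expression_subsumes(pred, cand):
    """pred and cand are dicts with 'focus', 'groups' and 'ungrouped' keys."""
    focus_ok = all(
        any(concept_subsumes(p, c) for c in cand["focus"]) for p in pred["focus"]
    )
    groups_ok = all(
        any(group_subsumes(p, c) for c in cand["groups"]) for p in pred["groups"]
    )
    # Assumption: ungrouped predicate attributes may match grouped or ungrouped
    # candidate attributes.
    cand_attrs = cand["ungrouped"] + [a for g in cand["groups"] for a in g]
    ungrouped_ok = all(
        any(attribute_subsumes(p, c) for c in cand_attrs) for p in pred["ungrouped"]
    )
    return focus_ok and groups_ok and ungrouped_ok


# Predicate: infectious disease with a finding site of lung structure.
PREDICATE = {
    "focus": ["infectious disease"],
    "groups": [],
    "ungrouped": [("finding site", "lung structure")],
}

# Candidate: an assumed, simplified normal form for viral pneumonia.
CANDIDATE = {
    "focus": ["infectious disease"],
    "groups": [[("finding site", "lung structure"), ("causative agent", "virus")]],
    "ungrouped": [],
}

print(expression_subsumes(PREDICATE, CANDIDATE))  # True
```

A production implementation would also need to compute the normal forms themselves, which is the harder part of this approach and is not shown here.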

Case Studies
A number of vendor products use the SNOMED CT hierarchy to support subsumption testing in their analytics
services, including the Cerner Millennium Terminology (CMT) package and Epic's decision support and reporting
tools. Terminology servers that provide the ability to perform subsumption testing include B2i Healthcare's Snow
Owl® terminology server. The UK Terminology Centre's Data Migration Workbench also uses subsumption testing in
its query tool, and its case mix and caseload trends analysis tools.

6.3 Using Defining Relationships


SNOMED CT attributes are used to represent a characteristic of the meaning of a concept. There are more than 50
attributes in SNOMED CT, which can each be used as the 'type' of a defining relationship, including:
• 363698007 |finding site|
• 116676008 |associated morphology|
• 246075003 |causative agent|
• 363704007 |procedure site|
• 260686004 |method|
• 272741003 |laterality|
• 127489000 |has active ingredient|
The SNOMED CT Concept Model provides rules about how these attributes can be used. Some database queries use
the rules from the SNOMED CT Concept Model to match concepts based on the value of their defining relationships.

Example
Figure 6.3-1 illustrates the execution of a query to retrieve a set of findings which have a benign tumor morphology.
The query is executed by finding those concepts with an 'associated morphology' relationship with the value
'benign neoplasm'. In this example, the concepts 'benign tumor of kidney', 'benign neoplasm of bladder' and
'benign tumor of lung' are found to have the required defining relationship value.


Figure 6.3-1: Query to retrieve benign neoplasm findings

In Figure 6.3-2 the same set of concepts is analyzed to identify those which have a
finding site of kidney. In this example, the concepts 'renal cyst', 'benign tumor of kidney' and 'renal abscess' are
found to have the required defining relationship value.


Figure 6.3-2: Query to retrieve findings in the kidney

If the queries from Figure 6.3-1 and Figure 6.3-2 are combined, then the query will return those concepts which are
benign tumors of the kidney (see Figure 6.3-3). In this case, the concept 'benign tumor of kidney' is the only concept
found to have the required defining relationship values.


Figure 6.3-3: Query to retrieve benign neoplasms of the kidney

In most cases, these queries would be designed to return concepts with an associated morphology of 'benign
neoplasm' or any subtype of 'benign neoplasm' (e.g. 'angiomyolipoma'), and a finding site of 'kidney' or any subtype
of 'kidney' (e.g. 'papillary duct of kidney', or 'upper pole, left kidney'). This query could be expressed using
the Expression Constraint Language as:
< 404684003 |clinical finding|:
116676008 |associated morphology| = << 3898006 |benign neoplasm| AND
363698007 |finding site| = << 64033007 |kidney structure|

When executed against the January 31st 2015 International Edition of SNOMED CT, this query would return the
following 12 concepts:

Concept ID   Preferred Term
254925008    Benign tumor of renal calyx
254919009    Cortical adenoma of kidney
269489006    Benign tumor of renal parenchyma
254920003    Cystadenoma of kidney
254922006    Oncocytoma of kidney
276866009    Benign tumor of pelviureteric junction
254927000    Benign papilloma of renal pelvis
92319008     Benign neoplasm of renal pelvis
307618001    Juxtaglomerular tumor
254923001    Hemangiopericytoma of kidney
254921004    Angiomyolipoma of kidney
92165001     Benign neoplasm of kidney

Implementation
Queries Over Defining Relationships
A query that constrains the defining relationships of matching clinical meanings to specific values can be
represented either informally, using a set of attribute-value pairs, or more formally, using a machine-processable
language (e.g. the SNOMED CT Expression Constraint Language).
Approaches to implement such a query include:
• Using the distributed relationships
In this approach, the distributed Relationship file is used directly to compare the target value of each defining
relationship with the required attribute value in the query. This approach may be combined with a subsumption
testing approach (e.g. transitive closure table) to enable subtypes of the required attribute value to also be
matched.
• Comparing normal form expressions
In this approach, the query is represented as a predicate expression containing the constrained attribute values,
and the short normal form of this predicate expression is tested for subsumption against each candidate expression
(as per the normal form subsumption test in section 6.2 Subsumption).
• Using a Description Logic Reasoner
In this approach, a description logic reasoner (e.g. Snorocket, ELK, FaCT++) is used to determine whether each
candidate expression is subsumed by the query (represented by a predicate expression).
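
The first approach (using the distributed relationships together with a subsumption lookup) can be sketched in Python as below. The rows mimic the sourceId, typeId and destinationId columns of the Relationship file but use readable labels instead of identifiers. The rows for the concepts shown in Figures 6.3-1 and 6.3-2 follow the text. The 'angiomyolipoma of kidney' rows and the single ANCESTORS entry are assumed modelling, added only to show how subtypes of the required value can be matched.

```python
# Toy defining relationships as (source, type, destination) rows.
RELATIONSHIPS = [
    ("benign tumor of kidney", "associated morphology", "benign neoplasm"),
    ("benign tumor of kidney", "finding site", "kidney"),
    ("benign neoplasm of bladder", "associated morphology", "benign neoplasm"),
    ("benign tumor of lung", "associated morphology", "benign neoplasm"),
    ("renal cyst", "finding site", "kidney"),
    ("renal abscess", "finding site", "kidney"),
    # Assumed rows, for illustration of subtype matching only.
    ("angiomyolipoma of kidney", "associated morphology", "angiomyolipoma"),
    ("angiomyolipoma of kidney", "finding site", "kidney"),
]

# Toy transitive closure: value -> all of its supertypes.
ANCESTORS = {"angiomyolipoma": {"benign neoplasm"}}


def value_matches(required, actual):
    """True if actual is the required value or one of its subtypes."""
    return actual == required or required in ANCESTORS.get(actual, set())


def concepts_with(attribute, required_value, rows=RELATIONSHIPS):
    """Concepts having the attribute with the required value (or a subtype)."""
    return {
        source
        for source, rel_type, destination in rows
        if rel_type == attribute and value_matches(required_value, destination)
    }


benign = concepts_with("associated morphology", "benign neoplasm")
in_kidney = concepts_with("finding site", "kidney")
print(sorted(benign & in_kidney))
# ['angiomyolipoma of kidney', 'benign tumor of kidney']
```

Combining the two attribute queries is a set intersection, which mirrors the AND in the expression constraint shown earlier in this section.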

Case Studies
Many organization-wide implementations of SNOMED CT, such as Kaiser Permanente's HealthConnect EHR and the
Danish National Medication Decision Support System, are taking advantage of SNOMED CT's definitional attributes
to support advanced analytics.
A number of vendor products are also supporting analytics over SNOMED CT's defining relationships, including
Apelon's Distributed Terminology System, B2i Healthcare's Snow Owl terminology server, and Cerner's Semantic
Search tool.

6.4 Description Logic Over Terminology


SNOMED CT's semantics are based on Description Logic (DL). This enables the automation of reasoning across
SNOMED CT, and consequently the implementation of more powerful analytics operations than are possible using
most other approaches. In addition to the subsumption and defining relationship testing described in the previous
sections, DL reasoners and query engines are able to utilize a number of additional logic-based techniques
including:
• Property chaining
A property chain is a rule that allows you to infer the existence of a property from a chain of properties.
For example, "x has parent y" and "y has parent z" imply "x has grandparent z" (which may be written
as "|has parent| ο |has parent| → |has grandparent|"). The current release of SNOMED CT includes the
property chain:
363701004 |direct substance| ο 127489000 |has active ingredient|
→ 363701004 |direct substance|
However, more property chains may be added in local implementations if required.
• Reasoning over concrete values
Some concepts in SNOMED CT (e.g. 374646004 |amoxicillin 500mg tablet|) require numbers or strings to
fully define their meaning. By generating an OWL 2 representation of these concept definitions,
Description Logic can be used to reason over their complete definition (including the concrete values).
• Testing equivalence and subsumption of postcoordinated expressions (without calculating normal forms)
Description Logic enables equivalence and subsumption testing to be performed efficiently, without the
need to manually calculate the normal form of each expression.
• Reasoning over minimum sufficient sets
SNOMED CT definitions include the set of necessary and sufficient conditions that define the given
concept. However, SNOMED CT does not currently distinguish the minimum sets which are sufficient to
define these concepts. For example, the defining relationships of 154283005 |pulmonary tuberculosis|
are:
116680003 |is a| = 64572001 |disease|
246075003 |causative agent| = 113858008 |mycobacterium tuberculosis complex|
116676008 |associated morphology| = 6266001 |granulomatous inflammation|
363698007 |finding site| = 39607008 |lung structure|
However, while the associated morphology of 'granulomatous inflammation' is necessarily present, the
following set of defining relationships is sufficient to infer 154283005 |pulmonary tuberculosis|:
116680003 |is a| = 64572001 |disease|
246075003 |causative agent| = 113858008 |mycobacterium tuberculosis complex|
363698007 |finding site| = 39607008 |lung structure|
Using Description Logic, it is possible to reason using multiple minimum sufficient sets for each concept.

Example
For example, if we want to find all disorders that are associated with the organism 80166006 |streptococcus
pyogenes|, we may discover (using the SNOMED CT Relationships file) that there is a direct 'causative agent'
relationship from 302809008 |streptococcus pyogenes infection| to 80166006 |streptococcus pyogenes|. However,
by introducing the following property chain rule:
47429007 |associated with| ο 47429007 |associated with| → 47429007 |associated with|
and noting that 47429007 |associated with| has three subtypes:
255234002 |after|
42752001 |due to|
246075003 |causative agent|
it is possible to discover, using Description Logic, that 81077008 |acute rheumatic arthritis| and 58718002
|rheumatic fever| are also 'associated with' the concept 302809008 |streptococcus pyogenes infection|. Figure
6.4-1 illustrates the relationships that can be discovered using property chaining.
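
The effect of this property chain can be imitated without a DL reasoner by a small fixpoint computation, sketched below in Python. This is only an approximation of what a reasoner does. The attribute linking the two rheumatic conditions to the infection ('after') is an assumption made for this sketch, since the text states only that they are 'associated with' it, and the function names are hypothetical.

```python
# |associated with| and the three subtypes named in the text.
ASSOCIATED_WITH_SUBTYPES = {"associated with", "after", "due to", "causative agent"}

# Toy relationships (source, attribute, target).
RELATIONSHIPS = [
    ("streptococcus pyogenes infection", "causative agent", "streptococcus pyogenes"),
    # Assumption: 'after' is used here purely for illustration.
    ("rheumatic fever", "after", "streptococcus pyogenes infection"),
    ("acute rheumatic arthritis", "after", "streptococcus pyogenes infection"),
]


def associated_with_closure(rows):
    """Apply |associated with| o |associated with| -> |associated with| to a fixpoint."""
    # Every subtype of |associated with| also counts as |associated with|.
    edges = {(s, t) for s, attribute, t in rows if attribute in ASSOCIATED_WITH_SUBTYPES}
    changed = True
    while changed:
        changed = False
        for source, middle in list(edges):
            for start, target in list(edges):
                if middle == start and (source, target) not in edges:
                    edges.add((source, target))
                    changed = True
    return edges


def associated_sources(target, rows=RELATIONSHIPS):
    """All concepts that are (directly or by chaining) associated with target."""
    return sorted(s for s, t in associated_with_closure(rows) if t == target)


print(associated_sources("streptococcus pyogenes"))
# ['acute rheumatic arthritis', 'rheumatic fever', 'streptococcus pyogenes infection']
```

Without the chain rule, only 'streptococcus pyogenes infection' would be returned, because it is the only concept with a direct relationship to the organism.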


Figure 6.4-1: Property chaining

Implementation
OWL 2
Using Description Logic techniques to perform analytics over SNOMED CT involves first translating SNOMED CT into
OWL 2 (Web Ontology Language). OWL 2 is an ontology language for the Semantic Web with formally defined
meaning. The SNOMED CT international release comes with a Perl transform script that converts the RF2 files into
OWL XML/RDF, Functional Syntax or KRSS files.
Once generated, the OWL files can then be loaded into a Description Logic Editor (such as Protégé) or used directly
by a terminology service which offers description logic capabilities. The Description Logic Editor or terminology
service then uses DL reasoners (also known as 'classifiers'), such as Snorocket, ELK and FaCT++, to perform
consistency checking and subsumption testing (also known as 'classification') over SNOMED CT. Subsumption
testing can also be performed between two expressions. Semantic query languages, such as SPARQL, can be used
to query over RDF representations of SNOMED CT.

Case Studies
Some commercial terminology servers, such as B2i Healthcare's Snow Owl terminology server, use Description
Logic based techniques to support both classification and querying over SNOMED CT. Kaiser Permanente is
collaborating with Oxford University to investigate ways of performing complex queries efficiently across extremely
large numbers of patient records using scalable parallel processing and description logic reasoners.

6.5 Description Logic Over Terminology and Structure


When performing analytics over patient data, an appreciation for the semantics represented in both the
terminology and the information model is required. Different information models can use different amounts of
precoordination in the terminology, and the same semantics can be represented using different information
structures. By using description logic over both the terminology and the information structures, a consistent
representation of the meaning of data can be achieved, irrespective of whether this meaning is captured in the data
values or in the model itself.


Example
Consider for example the two alternative ways of recording family history, as shown in Figure 6.5-1. The green
rectangles represent the logical structure of the information model and the blue rectangles represent the concept
identifiers that are used to populate this information model in the patient record.
The information model on the left uses a heading of 'Family history' to indicate that the named problem refers to a
family history of that problem. The information model on the right uses the terminology value to indicate that the
problem refers to a family history instance.

Figure 6.5-1: Two ways of recording family history

When querying over data that may be collected in either format, both the semantics of the information model
and the semantics of the data instances must be considered. One way of achieving this is to use an 'expression
template' to convert all data instances into a Description Logic representation, and use this to reason over the data.
Figure 6.5-2 shows an example of an expression template that could be used to create a SNOMED CT expression for
each of the data instances shown in Figure 6.5-1. Please note that the orange parallelograms represent 'slots' which
are subsequently populated with the value of the named data element (e.g. '$Problem').


Figure 6.5-2: SNOMED CT expression representation of family history data

When the data instances from Figure 6.5-1 are used to populate the templates from Figure 6.5-2, the following two
expressions are created:
416471007 |family history of clinical finding|:
246090004 |associated finding| = 56265001 |heart disease|,
408732007 |subject relationship context| = 72705000 |mother|,
408731000 |temporal context| = 410511007 |current or past (actual)|,
408729009 |finding context| = 410515003 |known present|

275120007 |family history: cardiac disorder|
These expressions may then be compared using a DL reasoner to discover that the first expression is subsumed by
the second, or queried using a semantic query language to allow the two data representations to be analyzed in a
consistent way.
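
The slot-filling step can be illustrated with Python's standard string.Template class, as sketched below. The templates are reconstructed from the two expressions above rather than copied from Figure 6.5-2. The $Problem slot name appears in the text, while $Relative is a hypothetical slot name chosen for this sketch.

```python
from string import Template

# Template for the information model that uses a 'Family history' heading.
# $Problem appears in the text; $Relative is a hypothetical slot name.
HEADING_MODEL = Template(
    "416471007 |family history of clinical finding|: "
    "246090004 |associated finding| = $Problem, "
    "408732007 |subject relationship context| = $Relative, "
    "408731000 |temporal context| = 410511007 |current or past (actual)|, "
    "408729009 |finding context| = 410515003 |known present|"
)

# Template for the model in which the recorded code already carries the
# family history meaning: the expression is just the recorded value.
VALUE_MODEL = Template("$Problem")

first = HEADING_MODEL.substitute(
    Problem="56265001 |heart disease|",
    Relative="72705000 |mother|",
)
second = VALUE_MODEL.substitute(
    Problem="275120007 |family history: cardiac disorder|",
)

print(first)
print(second)
```

Both outputs are plain SNOMED CT expressions, so data recorded under either model can be handed to the same subsumption or query machinery.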


Implementation
OWL 2
Description Logic techniques, such as those described in section 6.4 Description Logic Over Terminology, can be
used to reason over both the terminology and the information model. In addition to translating SNOMED CT to OWL
2, OWL 2 representations of the information model are also created using 'templates' that include 'slots' which are
then filled with the patient record instance values. DL reasoners, such as Snorocket, ELK and FaCT++, and semantic
query languages, such as SPARQL, can then be used over both the terminology and the information model in a
consistent way.

Case Studies
Kaiser Permanente is collaborating with Oxford University to investigate ways of performing complex queries
efficiently across extremely large numbers of patient records using scalable parallel processing and description
logic reasoners. In this project, the analysis is being performed over an OWL-RL representation of the patient data,
which incorporates both the terminology and the structure of the information.

6.6 Using Statistical Classifications


Clinical terminologies and classifications serve different but complementary purposes and both are an important
part of the healthcare environment. There are therefore some situations in which it is necessary to map SNOMED CT
codes to a classification, such as ICD-9 or ICD-10, for analytics or reporting purposes. The differences between the
two hold the key to their distinct purposes.
A classification is a hierarchical organization of terms that allows aggregation into categories which can be counted
and compared. A statistical classification is mono-hierarchical which means that each code in the hierarchy is
classified underneath a single code in the level above. This avoids codes being counted twice because they are
grouped into two distinct groupings (i.e. double counting), but means that arbitrary decisions must be made as to
where codes are grouped. For example, the ICD-10 code J12 |viral pneumonia, not elsewhere classified| is classified
under "Diseases of the respiratory system", but is not classified under "Certain infectious and parasitic diseases".
Therefore, a query that asks "Is J12 a respiratory disease?" will return "Yes", while a query that asks "Is J12 an
infectious disease?" will return "No".
Unlike clinical terminologies, classifications also explicitly enumerate 'known unknowns' (e.g. 'not otherwise
specified (NOS)' and 'not elsewhere classified (NEC)'); they often use a single code to represent several closely
related but clinically distinct entities (e.g. H65.9 represents 'allergic otitis media', 'exudative otitis media', 'mucoid
otitis media' and others); and they are often presented in the form of coding manuals with rigid, well defined rules
of use. Classifications emphasize coding discipline (rather than expressivity). This is helpful for driving formal billing
and reimbursement. The lower number of codes also makes assigning prices to each code tractable (e.g. using
either ICD-9, ICD-10, or one of the Diagnosis Related Group systems). Classifications are also deployable in low tech
environments, including paper or simple spreadsheet based systems.
Classifications are primarily used for purposes in which terms must be grouped into categories, and double
counting must be avoided. These purposes may include:
• Statistical reporting on major diagnoses, procedures or primary cause of morbidity
• Epidemiological reporting involving counting of disease categories
• Other administrative reporting based on specific WHO reporting requirements
• Billing and reimbursement
In contrast, SNOMED CT is a clinical terminology, in which each concept identifier represents a distinct clinical
meaning. By providing a more detailed level of granularity than classifications, SNOMED CT enables clinicians to
record healthcare information at the clinically appropriate level of detail. Unlike statistical
classifications, SNOMED CT uses a polyhierarchy, in which each concept may be grouped under more than one
supertype, reflecting possible alternate ways of categorizing each clinical meaning. SNOMED CT also provides
defining relationships between concepts, which further enhances its ability to support flexible and powerful
analytics capabilities.
It is generally recommended that clinical data is recorded using a clinical terminology, such as SNOMED CT, and
then mapped for reporting purposes to one or more classifications, such as ICD. SNOMED International publishes a
map from SNOMED CT to both ICD-9 and ICD-10. This supports epidemiological, statistical and administrative
reporting needs of the member countries and WHO Collaborating Centers. The collaborative work between
SNOMED International and the WHO on the alignment of ICD-11 with SNOMED CT is in progress and promises
tighter integration of the distinct use cases in the future.

Example
The following example illustrates the rows of the Extended Map reference set that supports the mapping from the
SNOMED CT concept 15296000 |sterility| to an appropriate ICD-10 code. The set of map rules associated with each
SNOMED CT concept are grouped together into 'map groups' and then ordered within each map group by a 'map
priority'. The map rule provides a machine readable rule that indicates whether this map should be selected within
its map group, and the map advice provides human readable advice. The correlation indicates the type of match
between the source and the target (e.g. 'exact match' or 'narrow to broad') and the map category indicates the kind
of map being represented.
Referenced component ID   Map target                                Map group   Map priority   Map rule                                      Map advice                                                          Correlation     Map category Id
15296000 |sterility|      N97.9 |female infertility, unspecified|   1           1              IFA 10114008 | Female sterility (finding) |   IF FEMALE STERILITY CHOOSE N97.9                                    Not specified   Context dependent
15296000 |sterility|      N46 |male infertility|                    1           2              IFA 49408009 | Male sterility (finding) |     IF MALE STERILITY CHOOSE N46                                        Not specified   Context dependent
15296000 |sterility|                                                1           3              OTHERWISE TRUE                                MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA         Not specified   Context dependent

Implementation
Mapping to Classifications using SNOMED CT
Maps from SNOMED CT to classifications are generally represented in SNOMED CT's RF2 using a Complex or
Extended map reference set. Mappings are then performed by matching each SNOMED CT code in a patient's record
with the corresponding row of the map reference set, and using the classification code found in the 'mapTarget'
field.
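
Where the map contains rules, as in the |sterility| example above, the lookup also has to evaluate them. The Python sketch below shows one possible way to do this under simplifying assumptions: only the rule forms 'TRUE', 'OTHERWISE TRUE' and 'IFA <conceptId> ...' are handled, an IFA rule is tested by exact concept match rather than by subsumption, and the function names are hypothetical.

```python
# Rows of the example map as
# (referencedComponentId, mapGroup, mapPriority, mapRule, mapTarget).
MAP_ROWS = [
    ("15296000", 1, 1, "IFA 10114008 | Female sterility (finding) |", "N97.9"),
    ("15296000", 1, 2, "IFA 49408009 | Male sterility (finding) |", "N46"),
    ("15296000", 1, 3, "OTHERWISE TRUE", ""),
]


def rule_applies(map_rule, patient_concepts):
    """Evaluate a simplified subset of map rules against a patient's concepts."""
    if map_rule in ("TRUE", "OTHERWISE TRUE"):
        return True
    if map_rule.startswith("IFA "):
        concept_id = map_rule.split()[1]
        # Assumption: exact match; a fuller version would test subsumption.
        return concept_id in patient_concepts
    return False


def map_to_icd10(concept_id, patient_concepts, rows=MAP_ROWS):
    """Return one map target per map group: the first row, in priority
    order, whose rule is satisfied. An empty string means no target."""
    targets = {}
    for ref_id, group, priority, rule, target in sorted(rows, key=lambda r: (r[1], r[2])):
        if ref_id != concept_id or group in targets:
            continue
        if rule_applies(rule, patient_concepts):
            targets[group] = target
    return [targets[group] for group in sorted(targets)]


print(map_to_icd10("15296000", {"15296000", "10114008"}))  # ['N97.9']
print(map_to_icd10("15296000", {"15296000", "49408009"}))  # ['N46']
print(map_to_icd10("15296000", {"15296000"}))              # ['']
```

The third call falls through to the 'OTHERWISE TRUE' row, whose empty target corresponds to the advice that the source concept cannot be classified with the available data.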

Case Studies
The UK Terminology Centre's Data Migration Workbench demonstrates the use of maps from SNOMED CT to ICD-10
International Edition (using the UK maps) and OPCS Classification of Interventions and Procedures (OPCS-4). The
National Library of Medicine (NLM) has also developed a demonstration tool that illustrates the key
principles of implementing map rules and advice. This tool, called I-MAGIC 1 (Interactive Map-Assisted Generation
of ICD Codes), uses the SNOMED CT to ICD-10 map in a real-time, interactive
manner to generate ICD-10 codes. It simulates a problem list interface in which the user enters problems with
SNOMED CT terms, which are then used to derive ICD-10 codes using the map. A number of vendor products, such
as Cerner Millennium, also use maps from SNOMED CT to ICD-10 to enable statistical analysis.


1 [Link]


7 Task-Oriented Analytics
The SNOMED CT analytics techniques described in the previous chapter only become useful when performing a
specific analytics task intended to meet a business need. In this chapter, we consider a range of analytics tasks,
which are either enabled or enhanced by using these SNOMED CT techniques.
The analytics tasks which can benefit from the use of SNOMED CT techniques can be considered in three broad
categories, as shown in Figure 7-1:
• Point of care analytics, which benefits individual patients and clinicians. This includes historical summaries,
decision support and reporting.
• Population-based analytics, which benefits populations. This includes trend analysis, public health
surveillance, pharmacovigilance, care delivery audits and healthcare service planning.
• Clinical research, which is used to improve clinical assessment and treatment guidelines. This includes
identification of clinical trial candidates, predictive medicine and semantic searching of clinical knowledge.


Figure 7-1: Analytics tasks that can benefit from SNOMED CT

Many of these tasks use business intelligence capabilities, similar to those used in other sectors, such as
manufacturing, retail and transportation. Business intelligence is the provision of historic, current and predictive
views of information. Such services include reporting, online analytical processing (OLAP), data mining, process
mining, complex event processing, benchmarking, text mining, predictive analysis and prescriptive analytics. In
many cases, a data warehouse is used as the platform on which these services are provided (see section 8.2 Data
Warehouse).
The combination of these business intelligence techniques with the capabilities of SNOMED CT creates new
opportunities to improve healthcare delivery.

7.1 Point of Care Analytics


Point of care analytics encompasses those analytics services that directly benefit individual patients and clinicians,
including historical summaries, decision support and point of care reporting. These analytics tasks typically involve
the summarization and mapping of patient data, and the linking of terminology with clinical knowledge artefacts.

7.1.1 Historical Summaries


One major ambition of healthcare IT is to make effective summaries of a patient's clinical history available to
healthcare providers (especially in emergency situations). Typically a patient's clinical data is scattered across a
number of healthcare institutions using a variety of information models and coding systems. Even within a single
institution patient data may be captured across many episodes of care, many devices, and often many software
systems.
SNOMED CT can help to support the integration of this information by serving as a common reference terminology
into which other code systems can be mapped (see section 5.2 Mapping Other Code Systems to SNOMED CT). It can
also be used to unlock clinical data that was captured by source systems in free text narrative (see section 5.1
Natural Language Processing), and to summarize large volumes of data by grouping codes together into more
general categories (see section 6.2 Subsumption). SNOMED CT can also be used to enable clinicians to filter large
volumes of data to select those records that are relevant to the current care episode – for example identifying all
previous records of a heart attack.
One significant example of this is the UK NHS Summary Care Record (SCR) service, 1 which uses SNOMED CT to
represent a number of types of clinical information including medical history, medications, adverse reactions and
allergies. This service uses a summary extracted from detailed patient care records held in a variety of disparate
systems. Where the source data is not stored natively in SNOMED CT, it is mapped into SNOMED CT prior to
transmission. Over 40 million people in England (80% of the population) now have a summary care record. This
service now contributes to the safe and efficient assessment and treatment of these people, and has greatly
improved the accuracy and timeliness of medicines reconciliation. 2
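The filtering step mentioned in this section (selecting only those records relevant to the current care episode, such as all previous records of a heart attack) can be illustrated with a minimal Python sketch. It assumes a toy is-a table built only from concepts named in this guide; the `IS_A` table, the `HISTORY` records and the function names are illustrative assumptions, and a real system would obtain the hierarchy from a SNOMED CT release or a terminology server.

```python
from typing import Dict, List, Set

# Toy is-a edges (child -> parents). Illustrative only; not a full hierarchy.
IS_A: Dict[str, Set[str]] = {
    "314207007": {"22298006"},  # non-Q wave myocardial infarction
    "304914007": {"22298006"},  # acute Q wave myocardial infarction
}


def is_subsumed_by(code: str, ancestor: str) -> bool:
    """True if code is the ancestor itself or one of its descendants."""
    stack, seen = [code], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return False


def filter_history(records: List[dict], target: str) -> List[dict]:
    """Keep only the records whose code is subsumed by the target concept."""
    return [r for r in records if is_subsumed_by(r["code"], target)]


# Hypothetical patient history, for illustration only.
HISTORY = [
    {"date": "2009-03-02", "code": "304914007"},  # acute Q wave MI
    {"date": "2012-07-19", "code": "195967001"},  # asthma
    {"date": "2015-11-30", "code": "22298006"},   # myocardial infarction
]

heart_attacks = filter_history(HISTORY, "22298006")
```

Here `heart_attacks` contains both the general and the more specific myocardial infarction records, while the asthma record is excluded.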

1 Vendor Introduction to SNOMED CT, 2015, [Link]
2 S. Sachdea, SCR reaches 40m patients, E-Health Insider, 2 July 2014, [Link]/news/29744/scr-reaches-50m-patients.

7.1.2 Clinical Decision Support


Clinical decision support systems (CDSS) are designed to assist clinicians with decision-making tasks at the point
of care. Examples of applications of clinical decision support include:
• Checking conformance with clinical guidelines and protocols
• Guiding clinicians through complex care pathways
• Protecting against errors in prescribing (e.g. drug-drug and allergy-drug contraindication checking)
• Highlighting critical laboratory results
• Displaying clinical knowledge resources, upon request, that are relevant to the given patient's diagnosis,
symptoms, procedures or medications
Most CDSSs consist of three parts:
1. The knowledge base, with rules and guidelines – for example:
   a. IF drug = << 48603004 |warfarin| AND 77386006 |pregnant| THEN alert user
   b. IF drug has active ingredient = << 387494007 |codeine| AND past history of 292055008 |codeine
      adverse reaction| THEN alert user
   c. IF diagnosis = << 195967001 |asthma| THEN display Asthma Management Guidelines
2. The inference engine, which uses the data from the patient record to determine which rules from the
knowledge base should be executed – for example:
   a. When a patient with finding 77386006 |pregnant| is prescribed 375374009 |warfarin sodium 4mg
      tablet|, the inference engine triggers Rule a. above.
   b. When a patient with past history of 292055008 |codeine adverse reaction| is prescribed
      412575004 |aspirin 325mg/codeine 30mg tablet|, the inference engine triggers Rule b. above.
   c. When a patient's primary diagnosis is entered as 195949008 |chronic asthmatic bronchitis|, the
      inference engine triggers Rule c. above.
3. A mechanism to communicate, which allows the system to display alerts or clinical knowledge to the user
Using a combination of SNOMED CT techniques, including mapping, subsets, subsumption and defining
relationships, SNOMED CT helps to support the inference engine in determining the appropriate rules to execute.
For example, Kaiser Permanente's HealthConnect system uses SNOMED CT to support efficient translation of its
business rules into decision support rules. The National Board of E-Health in Denmark is developing a centralized
decision support service based on the Danish SNOMED CT drug extension, which utilizes the hierarchical and
defining relationships of SNOMED CT.
A number of commercial tools also use the capabilities of SNOMED CT to implement Clinical Decision Support. For
example, Cambio's COSMIC tool binds GDL (Guideline Definition Language) rules to SNOMED CT concepts to
support the triggering of appropriate rules. Allscripts' Sunrise InfoButton™ feature provides relevant medical
reference content to clinicians wherever patient care decisions are made, by using SNOMED CT encoded patient
problem lists and medication data to query third-party medical content. The Epic system provides decision support
alerts (called 'Best Practice Advisories'), which are able to use the SNOMED CT hierarchy to help define their criteria.
And First DataBank delivers clinical decision support solutions linked to SNOMED CT, primarily to detect safety
issues arising from certain combinations of medications, diagnoses and drug adverse reaction histories.
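The three-part structure described in this section (knowledge base, inference engine, communication mechanism) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any product named above: the `IS_A` and `HAS_ACTIVE_INGREDIENT` tables contain only the relationships implied by the example rules, and the `Patient` class, rule functions and message strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Toy relationships implied by the example rules; not a full SNOMED CT release.
IS_A: Dict[str, Set[str]] = {
    "375374009": {"48603004"},   # warfarin sodium 4mg tablet is-a warfarin
    "195949008": {"195967001"},  # chronic asthmatic bronchitis is-a asthma
}
HAS_ACTIVE_INGREDIENT: Dict[str, Set[str]] = {
    "412575004": {"387494007"},  # aspirin 325mg/codeine 30mg tablet -> codeine
}


def subsumed_by(code: str, ancestor: str) -> bool:
    """True if code equals ancestor or reaches it via is-a edges ('<<')."""
    stack, seen = [code], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return False


@dataclass
class Patient:
    findings: Set[str] = field(default_factory=set)
    history: Set[str] = field(default_factory=set)
    diagnoses: Set[str] = field(default_factory=set)
    prescribed: Set[str] = field(default_factory=set)


def rule_a(p: Patient) -> bool:
    return "77386006" in p.findings and any(
        subsumed_by(d, "48603004") for d in p.prescribed)


def rule_b(p: Patient) -> bool:
    ingredients = {i for d in p.prescribed
                   for i in HAS_ACTIVE_INGREDIENT.get(d, set())}
    return "292055008" in p.history and any(
        subsumed_by(i, "387494007") for i in ingredients)


def rule_c(p: Patient) -> bool:
    return any(subsumed_by(d, "195967001") for d in p.diagnoses)


# Knowledge base: (condition, message) pairs. Messages are illustrative.
KNOWLEDGE_BASE: List[Tuple[Callable[[Patient], bool], str]] = [
    (rule_a, "Alert: warfarin prescribed during pregnancy"),
    (rule_b, "Alert: codeine prescribed despite past adverse reaction"),
    (rule_c, "Display: Asthma Management Guidelines"),
]


def infer(p: Patient) -> List[str]:
    """Inference engine: return the message of every rule that fires."""
    return [message for condition, message in KNOWLEDGE_BASE if condition(p)]


patient = Patient(findings={"77386006"},
                  diagnoses={"195949008"},
                  prescribed={"375374009"})
messages = infer(patient)
```

Because the rules test subsumption rather than exact code equality, the specific tablet and the specific asthma subtype trigger rules written against the more general concepts.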


7.1.3 Point of Care Reporting


When it comes to reporting needs, the preference of most clinicians is to 'collect once and use many times'.
SNOMED CT enables this goal to be achieved by allowing data to be captured at the appropriate level of detail and
then queried at the same or less detailed level. SNOMED CT supports point of care reporting requirements using any
(or all) of the SNOMED CT analytics techniques described in section 6 SNOMED CT Analytic Techniques, including
subsets, subsumption, defining relationships and description logic. Examples of point of care reporting
requirements may include:
• Helping clinicians remember preventative services (reminders)
• Identifying patients with care gaps and risk factors
• Monitoring patient compliance with prescribed treatments
• Reporting clinical data to registries, such as cancer, stroke, and infectious disease registries
• Billing and reimbursement 1
When supporting a reporting requirement in which double counting must be avoided (such as statistical reporting,
administrative reporting, billing, or reimbursement), SNOMED CT codes can be mapped to statistical classifications
(such as ICD-9 and ICD-10) (see section 6.6 Using Statistical Classifications).
When the source data uses a coding system without the same reporting capabilities as SNOMED CT, or when a
variety of coding systems are used, coded data can be mapped into SNOMED CT to support the reporting
requirements (see section 5.2 Mapping Other Code Systems to SNOMED CT).

1 Note: In some healthcare environments this is a point of care activity, while in others it is not.

7.2 Population-Based Analytics


Population-based analytics encompasses those analytics services that benefit entire populations, including trend
analysis, public health surveillance, pharmacovigilance, care delivery audits and healthcare service planning.
Population-based analytics contributes to public health programs by helping to identify health threats, inform
public policy and manage healthcare resources.
Efficient healthcare delivery and service planning depend on high quality clinical data. Clinical data is typically
scattered across multiple healthcare providers using different clinical systems. Collating this
information for analysis requires both standardized terminologies and common information models. Identifying
relevant and useful facts in large volumes of collated data also requires this data to be accurate, meaningful and
machine processable.
SNOMED CT supports population-based analytics in a number of ways. Firstly, it enables more accurate capture of
clinical data by allowing it to be represented at the appropriate level of clinical detail. Secondly, it supports the
integration of disparate clinical data sources by serving as a reference terminology into which free text and other
code systems can be mapped. And thirdly, it enables more meaningful and powerful queries to be performed over
the data using the descriptions, hierarchies and logic-based definitions of each concept.
Vendor products that provide population health solutions include Caradigm's Intelligence Platform, Allscripts'
Clinical Quality Management and Clinical Performance Management tools, Cerner's PowerInsight® Data Warehouse
and Epic's analytics and reporting suite.
In this section, we discuss three key types of population-based analytics: trend analysis, pharmacovigilance, and
clinical audit.

7.2.1 Trend Analysis


Trend analysis is the practice of collecting information and attempting to spot a pattern, or trend, in the
information. Trend analysis often refers to techniques for extracting an underlying pattern of behavior in a time
series, which would otherwise be partly or nearly completely hidden by noise.


Detecting changes of either incidence or prevalence of a particular disease, treatment, procedure or intervention
over time has major utility for population health monitoring, prediction of demand and effective resource
allocation at enterprise and national levels. One challenge encountered when analyzing routinely collected
patient data for trends is distinguishing minor changes in coding style from real changes in disease incidence.
Simply counting the use of individual concept identifiers may be highly misleading. For example, a fall in the use of
the code 22298006 |myocardial infarction| might reflect a shift to using more specific codes (such as
314207007 |non-Q wave myocardial infarction| or 304914007 |acute Q wave myocardial infarction|), rather than a reduction in
the incidence of myocardial infarctions. Use of subsumption testing on SNOMED CT encoded data (see section 6.2
Subsumption) can enable higher level trend analysis to be performed over more specific coded data.
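The difference between counting an individual code and counting with subsumption can be made concrete with a small Python sketch. The event data below is invented purely for illustration, and the `IS_A` table holds only the two subtype relationships mentioned in the example above.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Toy is-a edges (child -> parents) for the codes named in the text.
IS_A: Dict[str, Set[str]] = {
    "314207007": {"22298006"},  # non-Q wave myocardial infarction
    "304914007": {"22298006"},  # acute Q wave myocardial infarction
}


def subsumed_by(code: str, ancestor: str) -> bool:
    stack, seen = [code], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return False


def yearly_counts(events: List[Tuple[int, str]], target: str,
                  include_descendants: bool) -> Dict[int, int]:
    """Count events per year, by exact code match or by subsumption."""
    counts: Counter = Counter()
    for year, code in events:
        if include_descendants:
            matched = subsumed_by(code, target)
        else:
            matched = code == target
        if matched:
            counts[year] += 1
    return dict(counts)


# Invented data: coders switch to more specific codes in the second year.
EVENTS = [
    (2012, "22298006"), (2012, "22298006"), (2012, "22298006"),
    (2013, "22298006"), (2013, "314207007"), (2013, "304914007"),
]

exact = yearly_counts(EVENTS, "22298006", include_descendants=False)
rolled_up = yearly_counts(EVENTS, "22298006", include_descendants=True)
```

In this invented data, `exact` suggests a fall from three events to one, while `rolled_up` shows three events in both years, so the apparent drop reflects a change in coding style only.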
SNOMED CT's polyhierarchy allows trends to be analyzed from multiple perspectives. However, deciding which
level of aggregation to use for trend analysis can be arbitrary. Novel approaches to this task are emerging as the
demand for trend analysis over SNOMED CT enabled data increases.
The UK Data Migration Workbench (case study 12.1.1 Data Migration Workbench (UK)), for example, includes a trend
module which analyses the frequency with which individual SNOMED CT codes are used in the Electronic Patient
Record (EPR) instance data, looking for those whose recording frequency has changed over the course of the data
collection period. It also includes an Induce module, which performs a more sophisticated analysis of case mix and
caseload trends within a clinical department. Instead of returning the most frequently used individual codes, the
Induce module identifies the most frequently used types of codes. For example, an emergency department may use
roughly 500 different SNOMED CT codes for a laceration in a particular anatomical location. While none of the site-
specific codes may appear in a list of most common codes, the descendants of 312608009 |laceration| may
collectively account for a significant part of the department's workload.
The algorithm used picks aggregation points at defined levels for analysis. The default setting finds roughly 100 sub-
trees within the SNOMED CT hierarchy, where each sub-tree accounts for a more or less constant proportion of all
coded episodes (around 1% of all coded events per sub-tree). The algorithm completes once the set of all codes
within all identified sub-trees collectively accounts for the large majority of the dataset being analyzed. When
applied to real emergency department attendance data, relatively low numbers of presentations (about 0.2%) were
coded as occurring primarily as a result of endocrine disease. As a result, in order to get a big enough grouping of
episodes, the algorithm chooses 362969004 |disorder of endocrine system| as the root of a single sub-tree covering
these reasons for the patient's attendance. By contrast, a very high proportion (9.4%) of presentations relate to
some subtype of 928000 |disorder of musculoskeletal system|. Therefore this part of the caseload is aggregated
under multiple more granular sub-trees, including (separately) burns, abrasions, lacerations, blunt injury, crush
injury and foreign body.
These code aggregations can then be tracked across time to reveal trends in demand, disease incidence or resource
utilization.
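The idea of choosing aggregation points so that small branches stay whole and large branches are split can be sketched as a simple top-down procedure. The Python below is not the Induce module's actual algorithm, which is not specified here; it is a simplified stand-in that assumes a single-parent tree, plain-text labels instead of concept identifiers, invented counts, and a single `max_share` threshold.

```python
from typing import Dict, List

# Hypothetical single-parent tree with plain-text labels (not SNOMED CT ids).
CHILDREN: Dict[str, List[str]] = {
    "root": ["endocrine disorder", "injury"],
    "endocrine disorder": ["diabetes", "thyroid disorder"],
    "injury": ["laceration", "burn"],
    "laceration": ["laceration of hand", "laceration of scalp"],
}

# Invented number of episodes coded directly with each concept (total 100).
DIRECT_COUNTS: Dict[str, int] = {
    "diabetes": 1,
    "thyroid disorder": 1,
    "laceration of hand": 30,
    "laceration of scalp": 28,
    "burn": 40,
}


def subtree_count(node: str) -> int:
    """Episodes coded with this concept or any of its descendants."""
    return DIRECT_COUNTS.get(node, 0) + sum(
        subtree_count(child) for child in CHILDREN.get(node, []))


def pick_aggregation_points(node: str, total: int, max_share: float) -> List[str]:
    """Split a sub-tree only while it holds more than max_share of all episodes.

    Simplification: episodes coded directly on a node that gets split are not
    assigned to any group (the toy data has none on internal nodes).
    """
    count = subtree_count(node)
    if count == 0:
        return []
    children = CHILDREN.get(node, [])
    if count / total <= max_share or not children:
        return [node]
    points: List[str] = []
    for child in children:
        points.extend(pick_aggregation_points(child, total, max_share))
    return points


total_episodes = subtree_count("root")
points = pick_aggregation_points("root", total_episodes, max_share=0.6)
```

With these invented counts the small endocrine branch is kept as one group, while the large injury branch is split into separate laceration and burn groups. A threshold near 1% of all coded events, as described above, would be the analogous setting on real data.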

7.2.2 Pharmacovigilance
Pharmacovigilance is the collection, detection, assessment, monitoring and prevention of adverse effects with
pharmaceutical products. It is concerned with identifying the hazards associated with pharmaceutical products and
minimizing the risk of any harm that may come to patients. An important part of pharmacovigilance is
postmarketing surveillance, which monitors the safety of a pharmaceutical drug or medical device after it has been
released on the market. Since drugs are approved on the basis of clinical trials, which involve relatively small
numbers of people, postmarketing surveillance plays an important part in further refining, confirming or denying
the safety of a drug in the general population.
Pharmacovigilance uses a number of data sources to assess and monitor the safety of licensed drugs, including
clinical trial data, medical literature, spontaneous reporting databases, prescription events, electronic health
records, and patient registries. Data mining of large volumes of clinical data can be used to highlight potential
safety concerns. However, current mechanisms to analyze this data are often both costly and insensitive.
The availability of large datasets of richly encoded SNOMED CT data within longitudinal healthcare records can
greatly assist pharmacovigilance. Where SNOMED CT is not used natively to capture clinical data, free text narrative
and other code systems may be mapped to SNOMED CT (see sections 5.1 Natural Language Processing and 5.2
Mapping Other Code Systems to SNOMED CT) to support a homogeneous approach to querying across diseases,
signs and symptoms, lab results, medications, devices, procedures, allergies, adverse reactions, body sites and
substances. SNOMED CT's polyhierarchy and defining relationships, which provide links between these domains,
offer a rich source of meaning-based information across which queries can be performed.
Many drug regulatory authorities and pharmaceutical companies currently use the Medical Dictionary for
Regulatory Activities (MedDRA) to classify adverse drug events. MedDRA is an international standard adverse event
classification used from pre-marketing through to post-marketing activities. However, as MedDRA was not designed
to support routine clinical data collection, its penetration into clinical systems is limited. Therefore mapping from
SNOMED CT to MedDRA would enable both styles of analysis and reporting to be performed from the same clinical
data. The UK Medicines and Healthcare products Regulatory Agency (MHRA) is working (with input from the
MedDRA Maintenance and Support Services Organization) to develop a mapping from a subset of SNOMED CT to
MedDRA for this purpose.

7.2.3 Clinical Audit


Clinical audit seeks to improve patient care and outcomes through systematic review of care against defined
standards and the implementation of change. It informs care providers and patients where their healthcare services
are doing well and where there could be improvements. The key component of clinical audit is that performance is
reviewed (or audited) to ensure that what should be done is being done and, where it is not, to provide a framework
for making improvements.
Clinical audits can be performed in primary care facilities, individual clinics, hospitals, enterprises or jurisdictions.
Audit can have major beneficial impacts in ensuring the consistent delivery of quality healthcare. The questions
asked in audit are often chosen pragmatically according to local data collection practices. For example:
• What proportion of patients invited to attend cervical screening did so?
• How many patients with ischemic heart disease are receiving appropriate drug treatments?
• Are all patients with diabetes mellitus reviewed within a stated time interval?
Current audit schemes use a combination of reporting against classifications and questions specifically collected
for audit purposes. SNOMED CT can increase the proportion of audit data that is extracted directly from the patient
record, thus reducing the additional burden of collection. It can also give a more accurate picture of, for example,
tertiary centers, whose complex or innovative procedures may otherwise fall into an NOS or NEC category of a
classification and so produce an unrepresentative comparison with other centers.
SNOMED CT is well suited to service the ad hoc requirements that emerge in clinical audit questions, using the
techniques described in section 6 SNOMED CT Analytic Techniques. Using the SNOMED CT codes recorded during
care delivery can reduce the additional burden of data collection specifically for audit purposes. SNOMED CT may
also facilitate more accurate audit results than classifications, by distinguishing between distinct concepts (e.g.
clinical findings or procedures) which may fall into the 'Not Otherwise Specified' or 'Not Elsewhere Classified'
categories in these classifications.
A number of vendor products, such as Cerner's PowerInsight® Data Warehouse (case study 12.2.7 Cerner), are able
to support clinical audit using SNOMED CT enabled analytics tools.

7.3 Clinical Research


Clinical research is a branch of healthcare science that determines the safety and effectiveness of medications,
devices, diagnostic products, and treatment regimens intended for human use. Clinical research may address the
prevention, diagnosis or treatment of a disease, or the relief of its symptoms. In contrast to clinical practice, which
applies established treatment regimes, clinical research collects evidence to extend knowledge and establish the
value of novel treatments and other patient management practices.


Clinical research typically involves the analysis of data from well-defined and homogeneous groups of patients with
a specific disease, at a specific stage, receiving similar treatments and often without significant co-morbidities. The
data may be captured prospectively or retrieved retrospectively.
SNOMED CT helps clinical research activities by assisting in the identification of clinical trial candidates, enabling
the powerful analysis of trial data, supporting predictive medicine, and improving the effectiveness of semantic
search over clinical knowledge.
In this section, we discuss three key aspects to clinical research that can benefit from the use of SNOMED CT:
identification of clinical trial candidates, predictive medicine, and semantic search.

7.3.1 Identification of Clinical Trial Candidates


SNOMED CT can be used to assist the process of identifying clinical trial candidates for recruitment into formal
clinical trials. Subsets of findings, procedures or medications (see section 6.1 Subsets) can be used to filter trial
candidates based on their clinical conditions or treatments. Subsumption techniques (see section 6.2
Subsumption) can be used to identify suitable candidates, irrespective of the level of granularity in which their
clinical data is stored. SNOMED CT defining relationships (see section 6.3 Using Defining Relationships) can be used
in a number of ways – for example, identifying patients with diseases of specific anatomical sites, with certain
morphologies; patients who are taking medications with specific ingredients or dose forms; and patients who have
had procedures on a specific body site.
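Selecting candidates by defining relationships can be sketched as a match on attribute-value pairs. The Python below is an assumption-laden illustration: it uses plain-text labels as stand-ins for concept identifiers, a hand-written `DEFINING` table rather than relationships loaded from a SNOMED CT release, and invented patient data.

```python
from typing import Dict, List, Set

# Hypothetical defining relationships, keyed by plain-text concept labels.
DEFINING: Dict[str, Dict[str, Set[str]]] = {
    "renal cyst": {"finding site": {"kidney structure"},
                   "associated morphology": {"cyst"}},
    "hepatic cyst": {"finding site": {"liver structure"},
                     "associated morphology": {"cyst"}},
    "kidney stone": {"finding site": {"kidney structure"},
                     "associated morphology": {"calculus"}},
}

# Invented patient records: patient id -> recorded concepts.
PATIENTS: Dict[str, Set[str]] = {
    "p1": {"renal cyst"},
    "p2": {"hepatic cyst"},
    "p3": {"kidney stone", "hepatic cyst"},
}


def concept_matches(concept: str, criteria: Dict[str, str]) -> bool:
    """True if one concept satisfies every attribute-value criterion."""
    attributes = DEFINING.get(concept, {})
    return all(value in attributes.get(attribute, set())
               for attribute, value in criteria.items())


def find_candidates(patients: Dict[str, Set[str]],
                    criteria: Dict[str, str]) -> List[str]:
    """Patients with at least one recorded concept meeting all criteria."""
    return sorted(pid for pid, concepts in patients.items()
                  if any(concept_matches(c, criteria) for c in concepts))


kidney_cysts = find_candidates(
    PATIENTS,
    {"finding site": "kidney structure", "associated morphology": "cyst"})
kidney_any = find_candidates(PATIENTS, {"finding site": "kidney structure"})
```

All criteria must hold for the same concept, so a patient with a kidney stone and a separate hepatic cyst is not returned by the kidney cyst query. A fuller implementation would also apply subsumption to the attribute values.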
Commercial tools that can be used to support clinical research include Cerner's clinical research module
(PowerTrials), which offers patient identification functionality (case study 12.2.7 Cerner).

7.3.2 Predictive Medicine


Predictive medicine involves predicting the probability of disease and implementing measures to either prevent the
disease altogether or significantly decrease its impact upon the patient. The outcomes of predictive medicine are
often applied to the care of individual patients, but may also inform the deployment of resources to entire
populations at high risk.
The goal of predictive medicine is to predict the probability of future disease so that healthcare professionals and
the patient themselves can be proactive in implementing lifestyle modifications and increased physician
surveillance, such as regular skin exams, mammograms, or colonoscopies. Predictive medicine changes the
paradigm of medicine from being reactive to being proactive, and has the potential to significantly extend the
duration of health and to decrease the incidence, prevalence and cost of diseases.
Much attention has been focused on the availability of genetic markers of vulnerability to specific illnesses. However,
the accurate capture of phenotypic factors (e.g. height and weight, blood pressure), environmental factors (e.g.
smoking, alcohol consumption) and other lifestyle factors (e.g. exercise, nutrition, quality of life) is not to be
overlooked. For example:
1. Patient is a smoker and has ischemic heart disease → predict excess risk of myocardial infarction
2. Patient has BRCA1 gene and is a 40 year old woman → predict (excess) risk of breast cancer
SNOMED CT can help to support predictive medicine by:
• Helping to identify clinical trial candidates (as described in section 7.3.1 Identification of Clinical Trial
Candidates)
• Helping to analyze clinical data, such as family history, lifestyle and environmental findings, to improve
predictive capabilities (using analytics techniques, as described in section 6 SNOMED CT Analytic
Techniques)
• Providing a link between patient data and risk assessment rules, so that rules can be triggered based on
subsumption of codes recorded in clinical data (see section 6.2 Subsumption). For example, matching
against patient records could be improved by defining the above rules as:
1. Criteria: 77176002 |smoker| AND 414545008 |ischemic heart disease|
Risk: 22298006 |myocardial infarction|
2. Criteria: 412734009 |BRCA1 gene mutation positive|
Risk: 254837009 |breast cancer|

7.3.3 Semantic Search


With an ever increasing volume of medical literature and clinical reports, it is becoming increasingly important to
be able to meaningfully search this information. A major application for Natural Language Processing technologies
is to index collections of free text transcripts or documents such that topic specific searches may be run on them.
The challenge is to move beyond the limitations of plain keyword searching strategies towards more advanced
search techniques, which return ranked matches with high sensitivity and specificity. Clinical searches may be
performed over documents within an electronic library, within medical records, or on the internet. Examples of
searches include:
• "Show me articles on this website concerned with inflammatory bowel disease"
• "Does this patient have transcripts in their record suggesting a heart rhythm disturbance?"
SNOMED CT was used in techniques developed by Koopman to improve search performance by addressing
vocabulary mismatch (using synonyms, e.g. hypertension vs high blood pressure), granularity mismatch (using
hierarchical relationships, e.g. antipsychotic vs haloperidol), conceptual implication (using defining relationships,
e.g. from renal cyst infer kidney) and inferences of similarity (using subset membership, e.g. the comorbidities
anxiety and depression). Koopman also assigned a measure of similarity to each SNOMED CT relationship type, and
used this weighting to determine the relevance of each document.
Some commercial tools also provide semantic search, including Cerner's semantic search tool.
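Two of the mismatches described in this section, vocabulary mismatch and granularity mismatch, can be illustrated with simple query expansion. The Python sketch below is not Koopman's method (which also weights relationship types); it only assumes small hand-written `SYNONYMS` and `NARROWER` tables taken from the examples in the text, plus invented documents.

```python
from typing import Dict, List, Set

# Hand-written tables based on the examples in the text; illustrative only.
SYNONYMS: Dict[str, Set[str]] = {"hypertension": {"high blood pressure"}}
NARROWER: Dict[str, Set[str]] = {"antipsychotic": {"haloperidol"}}


def expand(term: str) -> Set[str]:
    """Return the term plus its synonyms and all narrower terms."""
    terms = {term} | SYNONYMS.get(term, set())
    stack = [term]
    while stack:
        for child in NARROWER.get(stack.pop(), ()):
            if child not in terms:
                terms.add(child)
                stack.append(child)
    return terms


def search(documents: Dict[str, str], term: str) -> List[str]:
    """Ids of documents mentioning the term or any expansion of it."""
    expanded = expand(term.lower())
    return [doc_id for doc_id, text in documents.items()
            if any(t in text.lower() for t in expanded)]


# Invented documents, for illustration only.
DOCS = {
    "d1": "Patient has high blood pressure.",
    "d2": "Started haloperidol 5 mg.",
    "d3": "No relevant history.",
}

hypertension_hits = search(DOCS, "hypertension")
antipsychotic_hits = search(DOCS, "antipsychotic")
```

A plain keyword search for "hypertension" or "antipsychotic" would find none of these documents, whereas the expanded search finds one for each query. Ranking, negation handling and concept recognition are deliberately left out.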


8 Data Architectures
While the use of SNOMED CT for analytics does not dictate a particular data architecture, there are a few key
options to consider. In this section, we describe the major categories of data architecture that may be used to
perform analytics over SNOMED CT enabled patient data, including:
1. Analytics directly over patient records;
2. Analytics over data exported to a data warehouse;
3. Analytics over a Virtual Health Record (VHR);
4. Analytics using distributed storage and processing.
Please note that some of these approaches may be used in combination. For example, data warehouses with large
volumes of data may use distributed storage and processing for enhanced performance, and querying directly over
disparate patient records could be performed using a Virtual Health Record.

8.1 Patient Records for Analytics


Electronic patient record systems typically require high performance, high reliability and no (or limited) downtime.
Any operation that affects these key criteria needs to be kept to an absolute minimum, so as not to disturb the
clinical and documentation activities of busy clinicians.
Many analytics activities require large volumes of data to be processed, which may slow down or even 'lock out'
clinical transactions that are being performed at the same time. For this reason, population-based analytics and
clinical research are typically not performed directly on patient records in their native clinical system. Instead,
analytics directly over 'live' patient records tends to be restricted to point of care analytics activities, such as
historical summaries, clinical decision support and point of care reporting. These analytics activities tend to
demand the most up to date data possible to ensure its accuracy. They also tend to only require data for a single
patient, which can be efficiently accessed using a patient identifier index.
Figure 8.1-1 illustrates a simple architecture in which the data store for patient records is directly used for reporting
and analytics purposes.


Figure 8.1-1: Analytics directly over patient records


8.2 Data Warehouse


A data warehouse is a central data store of integrated data from one or more disparate sources, used for reporting
and data analysis. While operational systems are optimized for the preservation of data integrity and the speed of
recording transactions, data warehouses are optimized for the high performance execution of queries.
The typical Extract-Transform-Load (ETL) based data warehouse uses a staging layer to clean the extracted data and
transform it into a homogeneous structure and standardized terminology. During this process, the techniques from
Preparing Data for Analytics, such as mapping codes to SNOMED CT, can be used to prepare the data for analytics.
The transformed data is then loaded into the data warehouse, and indexed, so that optimized analysis of the data
can begin.
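The staging step described above (cleaning extracted rows and mapping local codes to SNOMED CT before loading) can be sketched as a small transform function. The source system names, local codes and row layout in this Python example are hypothetical; only the two SNOMED CT identifiers come from this guide.

```python
from typing import Dict, List, Tuple

# Hypothetical map: (source system, local code) -> SNOMED CT concept id.
LOCAL_TO_SNOMED: Dict[Tuple[str, str], str] = {
    ("lab_sys", "MI"): "22298006",      # myocardial infarction
    ("clinic_sys", "AST"): "195967001",  # asthma
}


def transform(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Clean and map extracted rows; unmapped rows are set aside for review."""
    loaded, rejected = [], []
    for row in rows:
        # Cleaning: normalize whitespace and case of the local code.
        key = (row["source"], row["code"].strip().upper())
        snomed_code = LOCAL_TO_SNOMED.get(key)
        if snomed_code is None:
            rejected.append(row)
        else:
            loaded.append({
                "patient_id": row["patient_id"],
                "snomed_code": snomed_code,
                "source": row["source"],
            })
    return loaded, rejected


# Invented extract, for illustration only.
ROWS = [
    {"patient_id": "p1", "source": "lab_sys", "code": " mi "},
    {"patient_id": "p2", "source": "clinic_sys", "code": "AST"},
    {"patient_id": "p3", "source": "clinic_sys", "code": "ZZZ"},
]

loaded, rejected = transform(ROWS)
```

Routing unmapped rows to a review list, rather than dropping them silently, keeps gaps in the map visible.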
The benefits of using a data warehouse include:
• Data from multiple heterogeneous sources can be integrated to enable consistent querying over data from
all sources
• The operational clinical system does not suffer performance degradation when running large analytics
queries over historical data
• The data quality can be improved by cleaning the data, and mapping non-SNOMED CT codes to SNOMED CT
• The data can be restructured to optimize query performance
Figure 8.2-1 illustrates an architecture in which the patient record data is extracted from its operational data store
and loaded into a data warehouse for reporting and other analytics purposes.


Figure 8.2-1: Querying using a data warehouse

Commercial data warehousing solutions that support SNOMED CT include Cambio's COSMIC Intelligence, Cerner's
PowerInsight Data Warehouse (PIDW) and Cerner's Health Facts Data Warehouse.

8.3 Virtual Health Record


A Virtual Health Record (VHR) provides a virtual view of heterogeneous data sources, using a common data model.
In contrast to the data warehousing approach in which heterogeneous data is extracted, transformed and stored in
a homogeneous form, the VHR approach does not require clinical data to be extracted from existing data stores.
Instead, logical queries are defined in terms of a common data model and then transformed into a set of physical
queries which can each be executed locally on an individual data store. Figure 8.3-1 illustrates an architecture
which supports querying over a VHR.


Figure 8.3-1: Querying using a Virtual Health Record

The process of transforming the logical query into separate physical queries may involve translating:
• The Query Language – from a common query language to the local data store's native query language
• Data Model References – from the common data model to the local data model
• Terminology References – from the standard terminology to the local code system


For example, if the user poses the following SQL query, written in terms of the VHR's common data model, to select
those patients with a diagnosis that is a subtype of 40733004 |infectious disease|:
SELECT patient_id FROM Health_Records
WHERE diagnosis IN (<40733004 |infectious disease|)
This query may be translated into the following 3 queries for local execution on each data store:
Data Store A:
Patient_record/patient_id[@diagnosis=typeOf(INF)]
Data Store B:
SELECT id FROM EHR NATURAL JOIN DSummary
WHERE discharge_diagnosis IN (descendantsOf(40733004))
Data Store C:
SELECT patient FROM record
WHERE diag IN (<40733004)
Similarly, when the query results are returned by each data store, these need to be transformed and mapped into
the common data model and then combined for presentation to the user.
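The translate, execute and merge cycle described above can be sketched with one adapter function per data store. This Python example is a minimal illustration under several assumptions: the stores are in-memory lists, Data Store A uses a hypothetical local code prefix `INF`, the `CHILD` constant is a placeholder standing in for a real descendant of 40733004, and the field names follow the example queries above.

```python
from typing import Callable, Dict, List, Set

# Placeholder for a real descendant concept id; purely illustrative.
CHILD = "example-descendant-of-40733004"
DESCENDANTS: Dict[str, Set[str]] = {"40733004": {CHILD}}

# Invented in-memory stand-ins for two of the local data stores.
STORE_A = [
    {"patient_id": "A-1", "diagnosis": "INF.2"},
    {"patient_id": "A-2", "diagnosis": "CARD.1"},
]
STORE_C = [
    {"patient": "C-7", "diag": CHILD},
    {"patient": "C-8", "diag": "195967001"},
]

# Terminology translation for Store A: SNOMED CT concept -> local code prefix.
STORE_A_PREFIX: Dict[str, str] = {"40733004": "INF"}


def query_store_a(concept: str) -> List[str]:
    """Translate to Store A's local code system and data model."""
    prefix = STORE_A_PREFIX.get(concept)
    if prefix is None:
        return []
    return [r["patient_id"] for r in STORE_A
            if r["diagnosis"].startswith(prefix)]


def query_store_c(concept: str) -> List[str]:
    """Store C holds SNOMED CT codes, so only the data model differs."""
    wanted = DESCENDANTS.get(concept, set())
    return [r["patient"] for r in STORE_C if r["diag"] in wanted]


ADAPTERS: Dict[str, Callable[[str], List[str]]] = {
    "A": query_store_a,
    "C": query_store_c,
}


def federated_query(concept: str) -> List[dict]:
    """Run the logical query on every store and merge into a common shape."""
    results = []
    for store, run in ADAPTERS.items():
        for local_id in run(concept):
            results.append({"store": store, "patient_id": local_id})
    return results


infectious = federated_query("40733004")
```

Each adapter hides one store's query language, data model and code system, so the caller works only with the common data model. Real adapters would issue the translated physical queries against the remote stores.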
The VHR approach provides an alternative architecture to a data warehouse for integrating heterogeneous systems.
It is most commonly used when copying clinical data into a data warehouse is not possible (e.g. due to legislative
requirements), or when the currency of the data is imperative. The challenges with this approach lie with the
potential complexity of the transformations required. The implementation of this approach is considered to be a
type of heterogeneous distributed database, as described in Section 8.4 Distributed Storage and Processes.

8.4 Distributed Storage and Processes


The increasing volume and variety of data collected by healthcare enterprises is a challenge to traditional relational
database management systems. This increase in data is due both to an increase in computerization of health
records, and to an increase in the capture of data from other sources, such as medical instruments (e.g. biometric
data from home monitoring equipment), imaging data, gene sequencing, administrative information,
environmental data and medical knowledge. The proliferation of large volumes of both structured and
unstructured data sets has led to the popularity of the term 'Big Data' within the healthcare context. Big Data refers
to any collection of data sets that is so large and complex that it becomes difficult to process using traditional
data processing applications.
Accommodating and analyzing this expanding volume of diverse data (i.e. 'Big Data') requires distributed database
technologies. A distributed database is a federation of loosely coupled data stores with separate processing units,
which are controlled by a common distributed database management system. It may be stored in multiple
computers located in the same physical location, or dispersed over a network of interconnected computers.
Distributed databases may be categorized as either:
• Homogeneous – A distributed database with identical software and hardware running on all database
instances.
• Heterogeneous – A distributed database supported by different hardware, operating systems, database
management systems and even data models (e.g. using the VHR strategy described in section 8.3 Virtual
Health Record).
In both cases, however, the database appears through a single interface as if it were a single database.
Distributed databases are used for Big Data analytics for a number of reasons, including:
• Transparency of querying over heterogeneous data stores (as described in section 8.3 Virtual Health Record)
• Increase in the reliability, availability and protection of data due to data replication

• Local autonomy of data (e.g. each department or institution controls their own data)
• Distributed query processing can improve performance, as the load can be balanced among the servers
A number of tools are available for the distributed storage and processing of big data, including Apache Hadoop.
Apache Hadoop is an open-source software framework, which splits files into large blocks and distributes these
blocks amongst the nodes in the cluster. To process the data, Hadoop sends code to the nodes that have the
required data, and the nodes then process the data in parallel. Hadoop supports horizontal scaling – that is, as data
grows additional servers can be added to distribute the load across them.
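The processing pattern described above (send the code to the data, process each block locally, then combine the partial results) can be sketched in plain Python. This is not Hadoop's API: the blocks are ordinary lists held in one process, and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Each inner list stands for a block of coded records held on one node (toy data).
blocks = [
    [36689008, 36171008, 36689008],
    [36689008, 254915003],
    [36171008],
]


def map_block(block):
    """Runs 'on the node': count the codes in one local block."""
    return Counter(block)


def merge_counts(left, right):
    """Combine the partial counts produced by two nodes."""
    return left + right


# On a cluster the map step would run in parallel, one task per block.
partial_counts = [map_block(block) for block in blocks]
total_counts = reduce(merge_counts, partial_counts, Counter())
print(total_counts[36689008])
```

Because each block is counted independently, adding nodes only adds more partial counts to merge, which is the sense in which this style of processing scales horizontally.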
Many distributed database solutions use NoSQL (Not Only SQL) systems. NoSQL systems are increasingly being
used for big data, as they provide a mechanism for storage and retrieval of data in a variety of structures, including
relational, key-value, graph or document. Oxford University, in collaboration with Kaiser Permanente (case
study 13.1.2 Kaiser Permanente), is using a NoSQL database (RDFox) to investigate how to perform complex
queries efficiently across extremely large numbers of patient records. RDFox is a highly scalable and performant
NoSQL database that is readily distributed across parallel processing units.


9 Database Queries
Practically all analytical processes are driven by database queries. A database query is a machine readable question
presented to a database in a predefined language.
Other code systems either have no hierarchy or have a hierarchy that is fully represented within the code
(e.g. H65.9). SNOMED CT is different: just retrieving the SNOMED CT codes recorded in a patient record does not
fully utilize the analytics capabilities of SNOMED CT. To get the most benefit from using SNOMED CT in patient
records, one must be able to not only query the records themselves, but also query SNOMED CT.
In this section, we describe how record and terminology queries can work together to perform powerful queries
over SNOMED CT enabled data. In section 10 User Interface Design, we will then consider how user interfaces can be
designed to make these queries more accessible to non-technical users.

9.1 Terminology Queries

9.1.1 SNOMED CT Languages


SNOMED International is developing a consistent family of computable languages to support a variety of use cases
involving SNOMED CT, including querying and defining intensional subsets. The SNOMED CT family of computable
languages will include:
• Compositional Grammar – for defining SNOMED CT expressions
• Expression Constraint Language – for constraining a set of possible expressions
• Query Language – for querying over SNOMED CT content
• Template Languages – using the other languages with slots that may be filled at a later time
SNOMED CT compositional grammar, which provides a common foundation for all the SNOMED CT computable
languages, was adopted as an international standard in 2010. The first version of the SNOMED CT
expression constraint language was then published in 2014. The SNOMED CT template languages and query language are
currently under development and will be made available in the near future.
Both the SNOMED CT Expression Constraint Language and the SNOMED CT Query Language can be used to define
queries against SNOMED CT content.
The SNOMED CT Expression Constraint Language is a formal language used to represent SNOMED CT Expression
Constraints. A SNOMED CT Expression Constraint is a computable rule that can be used to define a bounded set of
clinical meanings represented by either precoordinated or postcoordinated expressions. SNOMED CT Expression
Constraints allow a set of clinical meanings to be defined using hierarchical relationships, attribute values,
reference set membership, and other features such as cardinality, conjunction, disjunction and exclusion. For
example, the following expression constraint represents the set of clinical findings, which have both a finding site of
'pulmonary valve structure' (or a subtype of 'pulmonary valve structure') and an associated morphology of
'stenosis' (or a subtype of 'stenosis').

<< 404684003 |clinical finding| :
363698007 |finding site| = << 39057004 |pulmonary valve structure|,
116676008 |associated morphology| = << 415582006 |stenosis|
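To make the meaning of this constraint concrete, the following Python sketch evaluates it over a tiny in-memory terminology. It is a simplification under stated assumptions: the concepts 1001 to 1003 and the value 9999 are placeholders rather than real SNOMED CT identifiers, each attribute has a single value, and role groups, cardinality and the other operators of the language are ignored. The function names are hypothetical.

```python
# Toy is-a links (concept -> parents). Ids 1001-1003 are placeholders.
PARENTS = {
    404684003: [],      # clinical finding
    39057004: [],       # pulmonary valve structure
    415582006: [],      # stenosis
    1001: [404684003],  # placeholder: stenosis finding at the pulmonary valve
    1002: [1001],       # placeholder: a more specific subtype of 1001
    1003: [404684003],  # placeholder: stenosis finding at some other site
}
# Toy defining attributes (concept -> {attribute: value}).
ATTRIBUTES = {
    1001: {363698007: 39057004, 116676008: 415582006},
    1002: {363698007: 39057004, 116676008: 415582006},
    1003: {363698007: 9999, 116676008: 415582006},
}


def is_self_or_descendant(concept, ancestor):
    """True if concept is the ancestor itself or one of its subtypes ('<<')."""
    if concept == ancestor:
        return True
    return any(is_self_or_descendant(p, ancestor) for p in PARENTS.get(concept, []))


def evaluate(focus, refinements):
    """Concepts matching '<< focus : attribute = << value, ...'."""
    matches = []
    for concept in PARENTS:
        if not is_self_or_descendant(concept, focus):
            continue
        attrs = ATTRIBUTES.get(concept, {})
        if all(attr in attrs and is_self_or_descendant(attrs[attr], value)
               for attr, value in refinements.items()):
            matches.append(concept)
    return sorted(matches)


result = evaluate(404684003, {363698007: 39057004, 116676008: 415582006})
print(result)
```

Only the two placeholder findings whose site is the pulmonary valve structure satisfy both refinements; the finding at another site is excluded even though its morphology matches.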
The SNOMED CT Query Language is a formal language used to represent SNOMED CT Queries. This language is
based on the same features as the SNOMED CT Expression Constraint Language, with the addition of SNOMED CT
specific filters. These filters allow the author of the query to restrict the results based on the version of SNOMED CT
being used and the value of SNOMED CT's release file fields (e.g. definitionStatus, characteristicType,
languageCode, term and typeId). Additional keywords are also provided (e.g. preferredTerm, fullySpecifiedName)
to simplify the use of common filter combinations. For example, the following SNOMED CT query finds all fully
defined diseases which have a preferredTerm (in the GB English language reference set) that contains the substring
"heart".
<< 64572001 |disease| {{ definitionStatus = 900000000000073002 |defined|,
preferredTerm = "*heart*", languageRefSet = 900000000000508004 |GB English| }}
B2i's Snow Owl Terminology Server (see case study) supports the execution of SNOMED CT queries using a
precursor to the SNOMED CT Expression Constraint Language (referred to as 'Extended SNOMED CT Compositional
Grammar' or 'ESCG').

9.1.2 SNOMED CT APIs


An Application Programming Interface (API) for a SNOMED CT enabled terminology server can be used to request
the execution of SNOMED CT searches and queries. Using a terminology server API, record management systems
are able to effectively access terminology services without re-implementing their functionality in every system.
A number of commercial terminology servers offer proprietary APIs that enable SNOMED CT search and query,
including Dataline's SnAPI solution and B2i's Snow Owl Terminology Server (case study 2.5). An example of a script
which uses B2i's Snow Owl API to execute a SNOMED CT query is shown below:
import [Link]

def escgQuery = """
<<404684003|Clinical finding| :
246454002|Occurrence| = 255399007|Congenital|,
370135005|Pathological process|=<<263680009|Autoimmune|
"""

def escgEvaluator = new EscgEvaluatorService() //initialize a service for evaluating a query
def concepts = [Link](escgQuery) //evaluate the query
[Link] { println "ID: ${[Link]}, ${[Link]}" } //prints the result to the console

Standardized APIs for terminology services are also available. In particular, HL7's Common Terminology Services 2
(CTS 2) provides a standardized API that supports access to terminology servers that may contain a variety of code
systems, including SNOMED CT.

9.2 Patient Record Queries


The query language used to query a set of patient records is usually dependent on the type of database used to
store the patient records. For example:
• Relational databases may be queried using SQL (Structured Query Language)
• Object-oriented databases may be queried using OQL (Object Query Language)
• RDF databases may be queried using SPARQL (SPARQL Protocol And RDF Query Language)
• XML databases may be queried using XQuery (XML Query Language)
• OLAP databases may be queried using MDX (Multidimensional Expressions)
However, some query languages support logical queries that are independent of the application, programming
languages, system environment and storage models - for example, AQL (Archetype Query Language) and EQL (EHR
Query Language). These languages instead focus on queries based on the relevant information models (called
'archetypes').
To get the most benefit from using SNOMED CT in patient records, however, one must be able to not only query the
records themselves, but also query SNOMED CT.


One way of achieving this is to include a list of all possible SNOMED CT codes that are required within the query. For
example, to find the patients with a Respiratory system disorder, one could include every individual code that is a
descendant of 50043002 |disorder of respiratory system| (around 3000 codes) within the patient record query. Using
SQL, this would look like:
SELECT DISTINCT PatientID FROM ProblemList
WHERE Code IN (140004, 181007, 222008, 490008, 517007, 599006, 652005, 663008, etc)
However, this creates a lengthy query that is difficult to both validate and maintain. In some cases, it may also be
too long to be accepted by the query engine.
Another approach would be to use a subset of respiratory system disorders, and load these into a separate table –
for example:
SELECT DISTINCT PatientID FROM ProblemList
WHERE Code IN (SELECT * FROM RespiratorySystemDisorders)
However, it may not be scalable to create a new table for each terminology query that is required.
A third approach would be to use a transitive closure table to test the hierarchical relationship between each
SNOMED CT code and 50043002 |disorder of respiratory system|. For example,
SELECT DISTINCT PatientID FROM ProblemList PL
INNER JOIN SNOMEDTransitiveClosure TC ON [Link] = [Link]
WHERE [Link] = 50043002
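As a minimal sketch of the transitive closure approach, the Python below derives a closure table from a handful of is-a pairs and runs the join in an in-memory SQLite database. The column names SubtypeID and SupertypeID are assumptions made for this sketch, the concept ids 2001 to 2003 and 9000 are placeholders, and the helper function is hypothetical.

```python
import sqlite3

# Toy is-a pairs (child, parent). 50043002 is from the text; the rest are placeholders.
IS_A = [(2001, 50043002), (2002, 2001), (2003, 9000)]


def transitive_closure(is_a_pairs):
    """Return every (subtype, supertype) pair implied by the is-a pairs."""
    parents = {}
    for child, parent in is_a_pairs:
        parents.setdefault(child, set()).add(parent)
    closure = set()
    for child in parents:
        stack = list(parents[child])
        while stack:
            ancestor = stack.pop()
            if (child, ancestor) not in closure:
                closure.add((child, ancestor))
                stack.extend(parents.get(ancestor, ()))
    return closure


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProblemList (PatientID INTEGER, Code INTEGER)")
conn.execute("CREATE TABLE SNOMEDTransitiveClosure (SubtypeID INTEGER, SupertypeID INTEGER)")
conn.executemany("INSERT INTO ProblemList VALUES (?, ?)", [(1, 2002), (2, 2003), (3, 2001)])
conn.executemany("INSERT INTO SNOMEDTransitiveClosure VALUES (?, ?)",
                 sorted(transitive_closure(IS_A)))

rows = conn.execute(
    """SELECT DISTINCT PL.PatientID FROM ProblemList PL
       INNER JOIN SNOMEDTransitiveClosure TC ON PL.Code = TC.SubtypeID
       WHERE TC.SupertypeID = 50043002
       ORDER BY PL.PatientID"""
).fetchall()
patient_ids = [row[0] for row in rows]
print(patient_ids)
```

Patient 1 is found through the indirect link 2002 to 2001 to 50043002, which is the case a table of direct is-a relationships alone would miss. As written, the closure holds strict descendants only, so a record coded with 50043002 itself would not match.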
However, to support a more advanced style of query that utilizes the full capabilities of SNOMED CT, SNOMED CT
query languages or API calls must be embedded within the patient record query languages. For example, the
following queries use the SNOMED CT Expression Constraint Language embedded within a SQL query.
SELECT DISTINCT PatientID FROM ProblemList
WHERE Code IN (< 50043002 |disorder of respiratory system| )

SELECT DISTINCT PatientID FROM ProblemList
WHERE Code IN (<< 404684003 |clinical finding|:
363698007 |finding site| = << 39057004 |pulmonary valve structure|,
116676008 |associated morphology| = << 415582006 |stenosis|)
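One possible way to execute such a query on a database that does not understand expression constraints is to expand the embedded constraint into a literal code list before the SQL is run. The Python sketch below assumes this preprocessing design, which the text does not prescribe. It handles only the simple descendant-of form, the DESCENDANTS dictionary is a hypothetical stand-in for a call to a terminology server, and 2001 and 2002 are placeholder ids.

```python
import re

# Hypothetical stand-in for a terminology server: descendants of each concept.
# 2001 and 2002 are placeholder ids, not real SNOMED CT identifiers.
DESCENDANTS = {50043002: [2001, 2002]}

# Matches a simple '< conceptId |term|' constraint inside 'IN ( ... )'.
EMBEDDED = re.compile(r"IN \(\s*<\s*(\d+)\s*\|[^|]*\|\s*\)")


def expand_embedded_constraint(sql):
    """Replace a simple descendant-of constraint with a literal code list."""
    def substitute(match):
        codes = DESCENDANTS[int(match.group(1))]
        return "IN (" + ", ".join(str(code) for code in codes) + ")"
    return EMBEDDED.sub(substitute, sql)


query = ("SELECT DISTINCT PatientID FROM ProblemList "
         "WHERE Code IN (< 50043002 |disorder of respiratory system| )")
expanded = expand_embedded_constraint(query)
print(expanded)
```

The query author still writes and maintains the short constraint; the long code list exists only at execution time, so it is regenerated whenever the terminology is updated.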


10 User Interface Design


In this section, we consider how user interfaces can be designed to harness the capabilities of SNOMED CT, and to
make clinical querying more accessible to non-technical users. We describe both user interfaces for authoring
queries, as well as user interfaces for viewing query results.

10.1 Query Interface


When querying patient records containing SNOMED CT-enabled data, a variety of interfaces may be adopted to
support the user in authoring queries. In this section we first consider user interfaces for querying SNOMED CT, and
then look at user interfaces for querying SNOMED CT enabled patient records.

Terminology Query Interfaces


When querying clinical data, it may be necessary to first define a subset of SNOMED CT concepts (e.g. disorders or
procedures) that may then be compared against values in a patient record. A number of different options exist for
creating these SNOMED CT subsets, including:
1. Selecting individual SNOMED CT concepts (i.e. extensional definition)
2. Authoring queries directly using a query language (i.e. intensional definition)
3. Authoring queries using a structured form (i.e. a form which generates an intensional definition)

Selecting Individual Concepts


This approach uses a SNOMED CT browser to allow individual SNOMED CT concepts to be searched, selected and
added to a subset. For large subsets this can be time consuming, but it is well suited to smaller
subsets. A number of commercial tools are available which help to perform this task, including Apelon's Distributed
Terminology System and B2i's Snow Owl terminology server. Figure 10.1-1 below illustrates Snow Owl's authoring
interface for Simple reference sets.


Figure 10.1-1: B2i's Snow Owl interface for authoring Simple Reference Sets

Authoring Queries Using a Query Language


Other user interfaces allow a subset to be defined using a text-based query written using a predefined query
language (e.g. SNOMED CT Expression Constraint Language, or SNOMED CT Query Language). These interfaces tend
to be for the more technical user. However, some clinical users may be taught to use these interfaces if required.
Two examples of this style of interface are illustrated in Figure 10.1-2 and Figure 10.1-3. Figure 10.1-2 shows the
NHS Data Migration Workbench query interface, while Figure 10.1-3 shows B2i's Snow Owl query interface.


Figure 10.1-2: NHS Data Migration Workbench interface for authoring queries


Figure 10.1-3: B2i's Snow Owl interface for authoring text-based queries


Authoring Queries Using a Structured Form


A third style of user interface for authoring SNOMED CT subsets uses a structured form. A form-driven query tool
may allow the user to select an operator (e.g. 'memberOf', 'descendantOf'), the concept or subset to which this
operator is applied (e.g. 'Example problem list', 'Disorder'), and then one or more attribute values to limit the set of
concepts returned. (Note: The attribute name may either be selected from a list, or hard coded on the form). Once
the form is completed, a text-based query is automatically constructed from the selected values, and executed
against SNOMED CT. This style of interface can be designed to allow users to exploit the rich semantics of SNOMED
CT, while shielding them from the underlying technical details. Figure 10.1-4 illustrates how a generic form-driven
interface for authoring SNOMED CT queries works. Vendor products which implement form-driven interfaces for
authoring SNOMED CT queries include B2i's Meaningful Query web interface (as shown in Figure 10.1-5).
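The step in which a text-based query is constructed from the selected form values can be sketched as follows. This is a hypothetical illustration rather than the behaviour of any named product: the operator names, the function and its parameters are assumptions, and the output uses the operator symbols of the SNOMED CT Expression Constraint Language.

```python
# Form operator names mapped to expression constraint operator symbols.
OPERATORS = {"descendantOrSelfOf": "<<", "descendantOf": "<", "memberOf": "^"}


def build_constraint(operator, focus, refinements=()):
    """Build an expression constraint string from form selections.

    focus is a (concept_id, term) pair; refinements is a sequence of
    (attribute, value) pairs, each itself a (concept_id, term) pair.
    """
    def ref(concept):
        concept_id, term = concept
        return f"{concept_id} |{term}|"

    text = f"{OPERATORS[operator]} {ref(focus)}"
    if refinements:
        # Attribute values are matched with '<<' so that subtypes are included.
        parts = [f"{ref(attribute)} = << {ref(value)}" for attribute, value in refinements]
        text += " : " + ", ".join(parts)
    return text


form_query = build_constraint(
    "descendantOrSelfOf",
    (404684003, "clinical finding"),
    [((363698007, "finding site"), (39057004, "pulmonary valve structure"))],
)
print(form_query)
```

Because the user only picks values from lists, the generated text is always syntactically valid, which is how a form can shield users from the query language itself.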


Figure 10.1-4: A generic form-driven interface for authoring SNOMED CT queries


Figure 10.1-5: B2i Snow Owl's Meaningful Query web interface

Patient Record Query Interfaces


When SNOMED CT queries are integrated (or embedded) into patient record queries, additional constraints are
often added across demographic data (e.g. age, address) and episode of care data (e.g. healthcare provider, dates).
These data items are often referred to as 'concrete values' and are typically not included in a terminology. A
number of styles of interfaces are used to author patient record queries that include SNOMED CT content, including:
1. Free text semantic search
2. Queries using a predefined language (e.g. SQL, XQL, OQL or AQL)
3. Queries using a structured form (including both SNOMED CT and concrete value criteria)
Figure 10.1-6 shows an example of a search for 'diabetes' using Cerner's Semantic Search tool. This tool enables
clinicians at the point of care to search in real time through a patient's multiple charts, pathology reports and other
documents for topics such as 'heart disease' and 'diabetes', using SNOMED CT's hierarchical and non-hierarchical
relationships.


Figure 10.1-6: User interface of Cerner's Chart Search/Semantic tool

10.2 Results Visualization


When a SNOMED CT enabled query over patient records is executed, the results of this query can be visualized in a
number of ways, including tables, charts, scatter diagrams and colored epidemiology maps. While some of these
results visualization techniques can be used with any coding system, others are able to utilize the unique features
of SNOMED CT in powerful ways.
For example, Figure 10.2-1 shows a report produced by Cerner's data warehouse query tool. This tool uses a simple
graphical interface which directly creates powerful reports using the SNOMED CT hierarchy content. The screenshot
shows a report of attendances with diagnoses which are a descendant of the SNOMED CT concept
417746004 |traumatic injury|.


Figure 10.2-1: Report produced by Cerner's data warehouse query tool

SNOMED CT's rich polyhierarchy provides a vast number of potential 'aggregators' for analytics, and possible views
of SNOMED CT encoded data. This polyhierarchy can be exploited by visual exploratory data analysis tools to
enable the visual inspection of complex datasets.
For example, the NHS have been using the Gephi open-source network analysis and visualization software to
explore SNOMED CT encoded renal datasets.
The first representation (in Figure 10.2-2) shows a projection of all concepts directly coded in the patient data, with
the node size reflecting the frequency of each code. 36689008 |acute pyelonephritis| has a high frequency in the
data and is therefore represented by a big node, while 254915003 |clear cell carcinoma of kidney| has a low
frequency in the data and is therefore represented by a small node.


Using a simple concentration algorithm, which aggregates subsumed concepts up to a given threshold, the
representation in Figure 10.2-3 is achieved. In this representation, the size of the purple nodes reflects the
frequency of each code plus its subtypes, the size of the blue nodes reflects the frequency of each code's subtypes,
and the size of the red nodes reflects the frequency of each code on its own. This enables trends to be visually
detected (for example, for 36171008 |glomerulonephritis| and 36689008 |acute pyelonephritis|) even when the
frequency of these concepts themselves is relatively low.
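The three frequencies described above (a code on its own, its subtypes, and the two combined) can be computed with a short roll-up over the hierarchy. The Python sketch below uses toy counts and a placeholder subtype id (3001); it does not reproduce the NHS concentration algorithm, and the threshold step is omitted because the text does not specify it.

```python
from collections import Counter

# Toy is-a links (concept -> parents). 3001 is a placeholder subtype of glomerulonephritis.
PARENTS = {3001: [36171008], 36171008: [], 36689008: []}
# Toy counts of codes recorded directly in the patient data.
direct = Counter({3001: 4, 36171008: 1, 36689008: 5})


def ancestors(concept):
    """All supertypes of a concept, following every parent link once."""
    found = set()
    stack = list(PARENTS.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return found


# Frequency contributed to each concept by its subtypes.
inherited = Counter()
for concept, count in direct.items():
    for ancestor in ancestors(concept):
        inherited[ancestor] += count

# Frequency of each code plus its subtypes.
combined = {concept: direct[concept] + inherited[concept] for concept in PARENTS}
print(combined[36171008])
```

In this toy data glomerulonephritis is recorded directly only once, yet its combined frequency equals that of acute pyelonephritis once its subtype is included. Collecting ancestors into a set means a record is counted once per ancestor even when the polyhierarchy offers several paths to it.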

Figure 10.2-2: Gephi representation of renal dataset showing direct code usage


Figure 10.2-3: Gephi representation of renal dataset with direct and inherited code usage

Innovative data visualization and analysis tooling is expected to become much more widespread as the powerful
capabilities of SNOMED CT content are increasingly utilized.


11 Challenges
This section discusses some of the challenges which should be considered when performing analytics over clinical
data. Most of these challenges result from the fundamental nature of health record information, and therefore exist
irrespective of the code system used. Many of these challenges can be mitigated using the unique features of
SNOMED CT. The challenges fall into four broad categories:
• Reliability of patient data
• Terminology / information model boundary issues
• Concept definition issues
• Versioning
SNOMED CT offers significant advantages, compared to other code systems, in both performing powerful clinical
analytics, and in mitigating many of these challenges.

11.1 Reliability of Patient Data


High quality data collection is imperative to the quality and accuracy of analytics results, irrespective of the
terminology used. Whether the focus is decision support, business intelligence, research or a mixture of all three,
data quality is critical. High quality information is not the consequence of collecting as much data as possible.
Instead, it is the product of intentionality and process design.
The factors that may impact the quality of patient data include:
• The design of user interfaces used to capture data
Clinical user interfaces should be designed to make it as easy as possible to find the most appropriate code, and as
difficult as possible to enter the wrong code. There are a variety of ways to improve the ease and effectiveness of
data entry using SNOMED CT – such as searching over all synonyms, confirming the selected concept using the
preferred term or fully specified name, ordering value lists effectively using an ordered reference set, searching
using navigation hierarchies, and constraining data entry using subsets (see footnote 1).
These techniques can also help to reduce data entry errors by prohibiting invalid input, helping the user to
understand the correct meaning of the code selected, and ordering value lists in a clinically safe order (e.g. ordering
medications by strength, rather than alphabetically).
• Use of diagnostic criteria to standardize data capture
Diagnostic criteria and their application tend to vary widely according to care setting, patient status and
healthcare professional. The consistent ascertainment and recording of even common diagnoses, such as asthma
and myocardial infarction, is often non-trivial. High quality prospective research studies require that diagnostic
criteria for the condition being studied are understood, rigorously applied and accurately documented. In routine
clinical practice doing this for potentially thousands of diagnoses in dozens of care settings is normally infeasible.
Divergence and inconsistencies in criteria for diagnosis capture can undermine the validity of any conclusions
which may be drawn from analytics. SNOMED CT mitigates this issue by allowing the query author to choose a
reliable aggregating concept from SNOMED CT's extensive content.
• Consistency of data capture with analytics requirements
Pick lists and constraints should be consistent with both clinical data collection needs and analytic requirements
and these should never be in conflict. The presence or absence of particular concepts in value sets within different
applications can cause data collection to be inconsistent. SNOMED CT mitigates this by allowing the query author
to choose a reliable aggregating concept.
• Loss of meaning during data transformations
Clinical data often undergoes a number of structural transformations and code mappings prior to data analytics
being performed, during the process of preparing the data for messaging and/or loading into a data warehouse. In
each of these transformations, care must be taken to ensure that the quality of the process is high, and that there is
no incremental shift in the clinical meaning of the data. For example, mapping local codes to an alternate code
system using non-equivalence maps (e.g. narrow to broad or broad to narrow) will change the clinical meaning of
these codes to some degree. Any changes that affect the clinical meaning of the data may have an impact on the
quality of data analytics. SNOMED CT helps to mitigate this by supporting the representation of equivalence maps,
which can be used when the use case requires.

1 SNOMED CT Search and Data Entry Guide, 2014, [Link]

11.2 Terminology / Information Model Boundary Issues


When performing data analytics over clinical data, it is important to understand the interdependency between the
terminology and the structural information model. For example, it is not sufficient to find a diagnosis of
56265001 |heart disease|, and make the assumption that the patient has heart disease. Instead, the surrounding information
model must be considered to discover whether this is, for example, a confirmed diagnosis for the patient
themselves, a suspected or preliminary diagnosis for the patient, or perhaps a family history of heart disease in the
patient's paternal grandfather. Contextual or qualifying information about a code may appear in a variety of places,
including:
• Within the information model – for example, a section heading titled "Family History"
• In the same coded data element – for example, precoordinated as "394886001 |suspected heart disease|" or
postcoordinated as "56265001 |heart disease| : 408729009 |finding context| = 415684004 |suspected|"
• In a separate coded data element – for example, Diagnosis = 56265001 |heart disease|,
Type = 148006 |preliminary diagnosis|
By understanding where and how this contextual or qualifying information is represented, more appropriate
queries can be created.
When the same semantics may be represented in both the information model and the terminology, there is also a
risk of ambiguity as to how these two representations should be combined. This is clearly demonstrated by models
in which both the information model and the terminology can represent 'negation' or 'absence'. Does the
combination of 'negation' in the information model and 'absence' in the terminology indicate:
• Double negative,
• Redundant restatement of the negative, or
• Additional emphasis of the negative?
It is important in these situations to have clear rules about how the semantics in the information model and the
terminology should be combined.
The challenge often becomes even greater when heterogeneous data sources are integrated. When different
information models represent the same semantics using different combinations of structure versus terminology,
retrieval and reuse may miss similar information. To avoid false negatives or false positives in the query results, the
integration and/or analytics processes must resolve these differences.
For example, in Figure 11.2-1 below, the system on the left uses the 'Family history' structural heading to indicate
that the selected disease is a family history, while the system on the right precoordinates this within the
terminology. When integrating or querying across these data sources, these semantics need to be harmonized to
ensure accurate queries can be performed.
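A minimal sketch of this kind of harmonization is shown below in Python. It reduces both styles of record to a common pair of condition and context before querying. The record layout, the section names and the lookup table are hypothetical, and the concept ids should be checked against the SNOMED CT release in use.

```python
# Hypothetical lookup from precoordinated family-history codes to the underlying condition.
# 160303001 is used here for 'family history of diabetes mellitus' and 73211009 for
# 'diabetes mellitus'; verify both against the SNOMED CT release in use.
FAMILY_HISTORY_CODES = {160303001: 73211009}


def normalize(entry):
    """Return (condition_code, context) regardless of how the source system recorded it."""
    code = entry["code"]
    if code in FAMILY_HISTORY_CODES:
        # Context was precoordinated within the terminology.
        return FAMILY_HISTORY_CODES[code], "family history"
    if entry.get("section") == "Family history":
        # Context was carried by the information model.
        return code, "family history"
    return code, "patient"


left_system = {"section": "Family history", "code": 73211009}
right_system = {"section": "Problems", "code": 160303001}
patient_problem = {"section": "Problems", "code": 73211009}

print(normalize(left_system) == normalize(right_system))
```

After normalization the two family history records compare equal, while a diabetes mellitus code recorded as the patient's own problem remains distinct, which avoids both the false negative and the false positive described above.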


Figure 11.2-1: Two ways of recording family history of diabetes mellitus

Even when the same information model is used, different systems may populate this model with differing levels of
precoordination. For example, the three clinical systems shown below in Figure 11.2-2 each collect data about a
'suspected lung cancer' diagnosis in a different way. For this reason, when given a common data model (as shown
in Figure 11.2-3), different systems may populate this in different ways. When this occurs, queries must be careful to
consider all possible representations of the data, to ensure that contextual and qualifying information about each
code is correctly interpreted.


Figure 11.2-2: Three ways of recording suspected lung cancer


Figure 11.2-3: Three ways of populating a common Problem Diagnosis model

SNOMED CT is in a unique position to be able to resolve many of these challenges, using the techniques described
in sections 6.4 Description Logic Over Terminology and 6.5 Description Logic Over Terminology and Structure. In
particular, SNOMED CT enables the computation of equivalence and subsumption between alternative
representations of data. For example, the postcoordinated expression 22253000 |pain| : 363698007 |finding site| =
56459004 |foot| (which can be represented either in a single data element or using two separate data elements for
22253000 |pain| and 56459004 |foot|) can be automatically determined to be equivalent to the precoordinated
concept 47933007 |foot pain| (stored in a single data element).
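The idea behind this equivalence test can be illustrated with a toy comparison in Python: each representation is expanded into a common form (a focus concept plus a set of attribute and value pairs) and the results are compared. This is only a sketch. A real implementation would use a description logic classifier over the full SNOMED CT definitions, and the single definition below is taken from the example in the text rather than from a release file. The function name is hypothetical.

```python
# Toy definition from the example in the text:
# concept -> (focus concept, set of (attribute, value) pairs).
DEFINITIONS = {
    47933007: (22253000, frozenset({(363698007, 56459004)})),  # foot pain
}


def normal_form(focus, refinements=frozenset()):
    """Expand a defined focus concept so that alternative representations compare equal."""
    if focus in DEFINITIONS:
        base, defined = DEFINITIONS[focus]
        return base, frozenset(defined | refinements)
    return focus, frozenset(refinements)


precoordinated = normal_form(47933007)
postcoordinated = normal_form(22253000, frozenset({(363698007, 56459004)}))
print(precoordinated == postcoordinated)
```

The comparison only succeeds because 47933007 has a definition to expand. A concept with no modeled definition would be left unchanged by normal_form and would therefore not be recognized as equivalent to any postcoordinated form.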
Some cases exist, however, where SNOMED CT is not currently able to automatically establish equivalence. These
cases primarily relate to concepts for which the SNOMED CT concept model does not yet fully model their meaning.
For example, the two approaches for representing a 'twin pregnancy' shown below (Figure 11.2-4) cannot currently
be computed as equivalent using SNOMED CT.


Figure 11.2-4: Two non-equivalent ways of recording a twin pregnancy using SNOMED CT

The SNOMED CT concept model continues to be extended to support equivalence and subsumption testing within
an increasing number of hierarchies of SNOMED CT.

11.3 Concept Definition Issues


While SNOMED CT is the most comprehensive clinical terminology in the world, containing an extensive set of logic-
based definitions which enable a broad range of powerful analytics, some challenges still exist, including:
• Logical versus vernacular
• Minimum sufficient sets
• Incomplete modelling
These challenges are described in more detail in this section.

Logical Versus Vernacular


In some cases, the strict logical meaning of a term may differ somewhat from the local vernacular (or common) use
of that term. For example, the assertions below in SNOMED CT are logically sound but may be counterintuitive to
clinicians:
• |insect bite of nose| is a subtype of |head injury|
• |laceration of radial artery| is a subtype of |cardiovascular disease|.
In examples such as these, the formal logical definitions of the concepts lead to hierarchies that
differ from what some clinicians may expect.

Minimum Sufficient Sets


SNOMED CT definitions include the set of necessary and sufficient conditions that define the given concept.
However, SNOMED CT does not currently distinguish the minimum sets which are sufficient to define these
concepts. For example, the defining relationships of 154283005 |pulmonary tuberculosis| are:
116680003 |is a| = 64572001 |disease|
246075003 |causative agent| = 113858008 |mycobacterium tuberculosis complex|
116676008 |associated morphology| = 6266001 |granulomatous inflammation|
363698007 |finding site| = 39607008 |lung structure|
While the associated morphology of 'granulomatous inflammation' is necessarily present, the following set of
defining relationships are sufficient to infer 154283005 |pulmonary tuberculosis|:
116680003 |is a| = 64572001 |disease|
246075003 |causative agent| = 113858008 |mycobacterium tuberculosis complex|
363698007 |finding site| = 39607008 |lung structure|
As a consequence, suppose the following expression is recorded in a health record:

64572001 |disease| :
246075003 |causative agent| = 113858008 |mycobacterium tuberculosis complex| ,
363698007 |finding site| = 39607008 |lung structure|
This expression would not be returned by the following query:
<< 154283005 |pulmonary tuberculosis|
However, the query:

< 64572001 |disease| :
246075003 |causative agent| = << 113858008 |mycobacterium tuberculosis complex| ,
363698007 |finding site| = << 39607008 |lung structure|
would correctly return both the concept "154283005 |pulmonary tuberculosis|" and the above expression as
required. In this way, the design of appropriate queries can help to mitigate this issue.
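The difference between the two queries can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `PARENTS` and `DEFINING` tables are hand-written from the relationships listed above, each attribute holds a single value, relationship groups are ignored, and the function names are hypothetical. It is not an implementation of the Expression Constraint Language.

```python
DISEASE, PULMONARY_TB = "64572001", "154283005"
CAUSATIVE_AGENT, FINDING_SITE, MORPHOLOGY = "246075003", "363698007", "116676008"
MTB_COMPLEX, LUNG, GRANULOMATOUS = "113858008", "39607008", "6266001"

# Hand-written toy data (assumption): subtype links and defining attributes
PARENTS = {PULMONARY_TB: {DISEASE}}
DEFINING = {
    PULMONARY_TB: {CAUSATIVE_AGENT: MTB_COMPLEX, MORPHOLOGY: GRANULOMATOUS, FINDING_SITE: LUNG},
}


def is_a(code, ancestor):
    """True if code is the ancestor itself or one of its descendants."""
    return code == ancestor or any(is_a(p, ancestor) for p in PARENTS.get(code, ()))


def simple_query(records, concept):
    """'<< concept' tested against the focus concept only; expressions are not classified."""
    return [r for r in records if is_a(r[0], concept)]


def refined_query(records, ancestor, required):
    """'< ancestor : attribute = << value, ...' tested against stored attributes."""
    hits = []
    for focus, refinements in records:
        attributes = {**DEFINING.get(focus, {}), **refinements}
        # A refined expression counts as a proper subtype of its own focus concept
        proper_subtype = is_a(focus, ancestor) and (focus != ancestor or bool(refinements))
        if proper_subtype and all(
            a in attributes and is_a(attributes[a], v) for a, v in required.items()
        ):
            hits.append((focus, refinements))
    return hits


records = [
    (PULMONARY_TB, {}),                                             # precoordinated concept
    (DISEASE, {CAUSATIVE_AGENT: MTB_COMPLEX, FINDING_SITE: LUNG}),  # recorded expression
]
print(len(simple_query(records, PULMONARY_TB)))
print(len(refined_query(records, DISEASE, {CAUSATIVE_AGENT: MTB_COMPLEX, FINDING_SITE: LUNG})))
```

In this sketch the first query finds only the precoordinated record, while the attribute-based query finds both records.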

Incomplete Modelling
The SNOMED CT Concept Model continues to evolve to allow more concepts to be fully defined. For example, the
'Observable Entity' and 'Substance' hierarchies each have new concept models being developed, which will allow
these concepts to be more fully defined in future releases of SNOMED CT. When the concept models for these
hierarchies are incorporated, SNOMED CT's expressive power and analytics capabilities will be further expanded.
In those hierarchies for which the concept model has been established for some time (e.g. Clinical finding), ongoing
expansion to SNOMED CT's formal logical definitions continues. However, there still remain some concepts which
do not yet have all possible defining relationships included. This issue will be mitigated over time as more of
SNOMED CT's concepts continue to be modelled.

11.4 Versioning
A new version of the International Edition of SNOMED CT is released twice a year (in January and July). National
extensions mostly follow this cycle (albeit typically with a three-month delay). However, some extensions (notably
those including medication-related concepts) are released more frequently.
When a longitudinal health record is populated with clinical data over a number of years, it is quite possible that the
following may occur:
1. SNOMED CT concepts that were active at the time of recording have since been made inactive
2. SNOMED CT concepts that were primitive at the time of recording have since been defined
3. Reference sets that were used to populate pick lists may have changed
4. The SNOMED CT Concept Model that was used to construct expressions may have changed
To mitigate these versioning issues, SNOMED CT provides the following:


1. Each new version of the SNOMED CT International Edition that is released (in Release Format 2, RF2)
includes a set of Delta files (containing all changes to the content since the last release), a set of Snapshot
files (containing the most recent version of every component that has ever been released in SNOMED CT),
and a set of Full files (containing every version of every component that has ever been released in SNOMED
CT). These files allow implementations to either incrementally adapt to new versions of SNOMED CT, or
alternatively load a complete current snapshot of SNOMED CT content (with or without old versions). When
longitudinal clinical records containing inactive concepts are queried, all prior descriptions and
relationships of these inactive concepts can still be queried using these snapshot files. SNOMED CT's RF2
distribution files also record the reason that each inactive component was inactivated, using 'historical
association' reference sets (see [Link].R Historical Association Reference Sets for more details).
2. SNOMED CT is maintained on the principle that every SNOMED CT concept identifier should retain its
semantic integrity over time, even when its logical definition changes. The semantics of a SNOMED CT
concept is established through its Fully Specified Name, and all changes to a concept's defining
relationships are intended to improve the machine-readable processing of these semantics. That said, it is
possible if required to determine what the logical definition of a concept was at any prior point in time using
a Full release of SNOMED CT.
3. SNOMED CT's reference sets and their members are all fully versioned in SNOMED CT's RF2. A Snapshot
release of a reference set includes the current version of every row that has ever been released (including
both active and inactive rows). A Full release of a reference set includes every version of every row that has
ever been released. Using this information, it is possible to adapt queries to consider both current and
former members of any given reference set.
4. The SNOMED CT Concept Model changes very rarely. When it does, however, any attributes that are retired
are retained as inactive concepts in the Snapshot and Full releases of SNOMED CT. It is expected that a
complete Machine Readable Concept Model (MRCM) of SNOMED CT will be published in the future, and that
this MRCM will be versioned in a manner that is consistent with other RF2 components.
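As an illustration of how a Full release supports point-in-time queries, the sketch below derives the state of each component as of a chosen date. It assumes rows have already been parsed from an RF2 Full file into objects carrying the RF2 columns `id`, `effectiveTime` (YYYYMMDD) and `active`. The `Row` type, the `snapshot_at` function and the sample identifiers "A" and "B" are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Row(NamedTuple):
    id: str
    effectiveTime: str  # YYYYMMDD, so plain string comparison orders dates correctly
    active: str         # "1" = active, "0" = inactive


def snapshot_at(full_rows: Iterable[Row], as_of: str) -> Dict[str, Row]:
    """Return, per component id, the most recent version released on or before as_of."""
    latest: Dict[str, Row] = {}
    for row in full_rows:
        if row.effectiveTime > as_of:
            continue  # version was released after the date of interest
        current = latest.get(row.id)
        if current is None or row.effectiveTime > current.effectiveTime:
            latest[row.id] = row
    return latest


# Hypothetical sample: component "A" was active in 2015 and inactivated in 2019
full = [
    Row("A", "20150131", "1"),
    Row("A", "20190731", "0"),
    Row("B", "20170131", "1"),
]
print(snapshot_at(full, "20180101")["A"].active)
print(snapshot_at(full, "20210131")["A"].active)
```

The same approach applies to concept, description, relationship and reference set member files, because all of them share these versioning columns.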


12 Appendix - Analytics Case Studies


This appendix presents two sets of case studies that demonstrate data analytics with SNOMED CT:
• Firstly, a number of projects that implement or support analytics using SNOMED CT are described;
• Secondly, a variety of commercial tools which support analytics over SNOMED CT enabled data are
presented.

12.1 Project Case Studies


This section includes brief reviews of a variety of projects which implement or support analytics over SNOMED CT
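enabled data. The projects reviewed are:
• Data Migration Workbench (UK)
• Kaiser Permanente (USA)
• National Medication Decision Support System (Denmark)
• Semantic Search (Australia)
• Radiology Activity (UK)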

We welcome additional input to this section and anticipate updates to this report as new information becomes
available.

12.1.1 Data Migration Workbench (UK)


The workbench is produced by the UK Terminology Centre (UKTC) as a DRAFT 'proof of principle' product. It
demonstrates advanced functionalities, leveraging SNOMED CT as a sophisticated reference terminology
together with the mappings across the current five NHS terminologies or classifications. It is not currently
intended to develop this as a fully supported product. 1

The UK Terminology Centre Data Migration Workbench (DMWB) is designed to support the NHS Primary Care
Summary Care Record, Primary Care Systems of Choice and Data Migration programs. This tool demonstrates some
of the properties and advanced uses of the data migration and mapping products published by the UKTC and the
terminologies and classifications that they link.
The workbench uses SNOMED CT to perform novel and sophisticated analyses of patient data. It has immediate 'off
the shelf' international utility despite the inclusion of the UK-only terminologies within the standard tool
distribution.
The software contains SNOMED CT, Read Codes Version 2 and CTV3, and maps between these. It also contains maps to the
ICD-10 International Edition (the UK map is not the same as the SNOMED International one) and to the OPCS
Classification of Interventions and Procedures (OPCS-4). The Workbench modules support:
1. Searching and browsing the hosted code systems;
2. Viewing maps between the hosted code systems;
3. Authoring analytics subsets (i.e. terminology query predicates) and their testing, maintenance and
'translation' between code systems;
4. Electronic Patient Record (EPR) data quality analysis and data repair; and
5. EPR reporting and case mix analysis.

Queries Tool
The Queries Tool offers advanced functionality for authoring, maintaining and testing query code sets (called
'clusters') or subset definitions within any of the supported terminologies or classifications. One major application
is to produce query sets which will return comparable results from patient records encoded with different code
systems. To assist with this, the tool translates subset definitions expressed using one terminology into subset
definitions expressed using another, allowing refinement by manual editing (see Figure 12.1.1-1 and Figure
12.1.1-2).


Figure 12.1.1-1: DMWB Queries Tool building a SNOMED CT asthma subset


Figure 12.1.1-2: DMWB Queries Tool showing SNOMED CT asthma subset translated into ICD-10

Electronic Patient Record Data


The EPR Data Tool provides an environment for loading and analyzing patient data, either to design query
specifications or as part of data quality and case mix analytics. The analytics functions include detection and
management of coding data quality issues such as:
• Records with inactive SNOMED CT codes;
• Records with codes from inappropriate SNOMED CT hierarchies e.g. a diagnosis recorded using a concept
from the substance hierarchy (e.g. 419442005 |ethyl alcohol|) rather than a code from the disorder hierarchy
(e.g. 25702006 |alcohol intoxication| or 7200002 |alcoholism|).
The tool also enables the rapid repair of such data by substituting inappropriate codes with more appropriate ones.
This service is performed in an offline reporting environment.
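The two data quality checks listed above can be sketched as follows. This is not the DMWB implementation. The lookup tables are hand-written assumptions standing in for a loaded SNOMED CT release, the code `HYPOTHETICAL_RETIRED` is invented for illustration, and the function name is hypothetical.

```python
# Toy lookup tables (assumptions); a real tool would derive these from an RF2 release
ACTIVE = {"419442005": True, "25702006": True, "HYPOTHETICAL_RETIRED": False}
HIERARCHY = {
    "419442005": "substance",            # ethyl alcohol
    "25702006": "disorder",              # alcohol intoxication
    "HYPOTHETICAL_RETIRED": "disorder",
}


def diagnosis_issues(code):
    """List the data quality issues found for a code recorded in a diagnosis field."""
    issues = []
    if not ACTIVE.get(code, False):
        issues.append("inactive or unknown code")
    if HIERARCHY.get(code) != "disorder":
        issues.append("not from the disorder hierarchy")
    return issues


for recorded in ["25702006", "419442005", "HYPOTHETICAL_RETIRED"]:
    print(recorded, diagnosis_issues(recorded))
```

A repair step could then offer replacement codes for each flagged record, for example by following historical associations for inactive codes.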


Analytics
The workbench data analytics tool runs cluster queries, combined with demographic data, to perform clinically
valuable case finding, case mix and caseload analysis.
The 'Overview' Report module includes:
• Basic demographics (population age, sex, ethnicity);
• Analyses of episodes with a SNOMED CT code;
• Counts by SNOMED CT supercategory;
• List of the 100 most frequently used individual SNOMED CT codes; and
• List of the 15 most common SNOMED CT codes for each age cohort.
The Trends module analyzes the frequency with which individual SNOMED CT codes are used in the EPR instance
data, looking for those whose recording frequency has changed over the course of the data collection period.
The Induce module performs a more sophisticated analysis of case mix and caseload trends within a clinical
department. Instead of returning the most frequently used individual codes, the Induce module attempts to
identify the most frequently used types of codes. For example, an emergency department may use roughly 500
different SNOMED CT codes, each representing a laceration at a particular anatomical location. While none of these
site-specific codes may appear in a list of most common codes, the descendants of 312608009 |laceration| may
collectively account for a significant part of the department's workload.
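The idea of counting types of codes rather than individual codes can be illustrated by rolling each recorded code up to its ancestors. This sketch does not reproduce the Induce module's actual algorithm. Apart from 312608009 |laceration|, the codes and the `PARENTS` table are hypothetical placeholders.

```python
from collections import Counter

LACERATION = "312608009"

# Hypothetical site-specific codes with hand-written parent links (assumption)
PARENTS = {
    "HYPOTHETICAL_LACERATION_OF_HAND": [LACERATION],
    "HYPOTHETICAL_LACERATION_OF_SCALP": [LACERATION],
    "HYPOTHETICAL_LACERATION_OF_KNEE": [LACERATION],
}


def ancestors_or_self(code):
    """Collect the code and every ancestor reachable through the parent links."""
    seen, stack = set(), [code]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def rolled_up_counts(recorded_codes):
    """Count each recorded code once against itself and once against each ancestor."""
    totals = Counter()
    for code in recorded_codes:
        totals.update(ancestors_or_self(code))
    return totals


episodes = [
    "HYPOTHETICAL_LACERATION_OF_HAND",
    "HYPOTHETICAL_LACERATION_OF_SCALP",
    "HYPOTHETICAL_LACERATION_OF_KNEE",
    "HYPOTHETICAL_FRACTURE",
    "HYPOTHETICAL_FRACTURE",
]
print(Counter(episodes).most_common(1))       # most frequent individual code
print(rolled_up_counts(episodes)[LACERATION])  # workload attributed to the laceration type
```

Here no single laceration code is the most frequent individual code, yet the laceration type accounts for more episodes than any individual code.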
The Graphs tool performs fundamentally the same query and search operations, but generates graphs based on the
patients or episodes identified, showing e.g. the age:sex distribution of patients in a defined casemix cohort, or the
changing incidence of one or more specified clinical phenomena (e.g. disease presentation, or procedure
performed) by year, quarter, month or day of the week. These graphs can be copied into documents.

1 [Link]

12.1.2 Kaiser Permanente (USA)


Kaiser Permanente HealthConnect®, is a comprehensive electronic health record and one of the largest private
electronic health systems in the world. KP HealthConnect with its integrated model securely connects more
than 611 medical offices and 37 hospitals, linking patients to their health care teams, their personal health
information and the latest medical knowledge. 1
For more information please visit [Link]

Kaiser Permanente (KP) has been involved in the development of SNOMED CT since its inception. Preceding this, KP
collaborated with the College of American Pathologists in the 1990s on the immediate predecessor of SNOMED CT
(SNOMED RT). Some of the very earliest deployments of SNOMED CT have been within KP electronic patient record
systems.
The terminology services deployed within the KP HealthConnect electronic health record illustrate the practical use
of SNOMED CT as a key reference terminology within a multi-coding system environment. KP is also at the forefront
of realizing new possibilities offered by SNOMED CT using its description logic capabilities.

Convergent Medical Terminology


Convergent Medical Terminology (CMT) is KP's Enterprise Terminology System. While the KP HealthConnect EHR
system is built by Epic (see case study 12.2.10 Epic), the CMT is proprietary to Kaiser Permanente. CMT hosts several
components:
• Standard reference terminologies
• End user terminology (e.g. the terms presented to clinicians or patients)
• Administrative codes and classifications (e.g. ICD-9-CM, ICD-10-CM, CPT4, HCPCS)
• Analytics services (querying and decision support)
• Request submission for new terms


CMT uses SNOMED CT as a reference terminology, taking advantage of its poly-hierarchy and definitional attributes
to support advanced analytics – for example:
• Identifying patient cohorts with certain conditions for Population Care.
• Identifying subsets for use as "input criteria" for KPHC decision support modules, such as Best Practice
Alerts, Reminders, etc.
• Performing queries such as "find all conditions where |causative agent| is |Aspergillus (organism)|"
• Performing large aggregate queries, such as "find all patients coded with concepts in the cardiovascular
disorders subset"
In September 2010, Kaiser Permanente, IHTSDO and the US Department of Health and Human Services jointly
announced KP's donation of its CMT content and related tooling to SNOMED International. The donation consists
of terminology content (including several CMT subsets) and tools to help create, manage and quality-control
terminology.

Collaboration with Oxford University


KP, in collaboration with the Information Systems Group at Oxford University, is investigating how to perform
complex queries efficiently across extremely large numbers of patient records. The team at Oxford University has
developed an open source triple store (i.e. 'subject-predicate-object') database called RDFox. RDFox is a highly
scalable and performant 'Not Only SQL' database that is readily distributed across parallel processing units. RDFox
implements the W3C Resource Description Framework (RDF) standard and supports OWL-RL description
logic.
In this collaborative project, clinical data is being represented in OWL-RL as 'entity-role-act' triples. This uses a
logical model (with Entities in Roles participating in Acts) that is similar to HL7 V3's Reference Information Model.
OWL-RL and the Datalog rule language are being used to reason over hundreds of millions of patient data triples. While
SNOMED CT expressions cannot be fully represented in OWL-RL, RDFox performs the preliminary large-scale clinical
data retrieval to return a far smaller record set. This smaller set is then processed using a richer featured but less
performant description logic reasoner supporting SNOMED CT.
Prototype work has been completed using real patient data, including observations for diabetes (coded using
SNOMED CT) and observations of Hemoglobin A1C levels. Datalog instructions and SPARQL queries were used to
calculate Healthcare Effectiveness Data and Information Set quality measures for diabetes management – for
example, both numerators and denominators for the Diabetes HgB A1C report.

1 [Link]

12.1.3 National Medication Decision Support System (Denmark)


Physicians often lack the time to familiarize themselves with the details of particular allergies or other drug
restrictions. Clinical Decision Support (CDS), based on a structured terminology, such as SNOMED CT, can help
physicians get an overview by automatically alerting allergy, interactions and other important information. The
centralized CDS platform based on SNOMED CT controls Allergy, Interactions, Risk Situation Drugs and Max
Dose restrictions with the help of databases developed for these specific purposes. 1

The National Release Centre of Denmark (National eHealth Authority) produces a SNOMED CT drug extension for
medications. The Danish SNOMED CT drug extension was primed by data extraction, cleansing and conversion of
content from the Danish Medicine Agency Database (DKMDB), which is primarily meant for pricing and stock
handling. The DKMDB was then complemented with SNOMED CT substances and their unique IDs. The Danish
SNOMED CT drug extension includes information such as trade names, substances, dose forms, strengths and units
of measure.
Building upon the Danish drug extension, the National eHealth Authority is working to introduce centralized
decision support (CDS) services for both primary care and hospital prescribing systems.
The CDS server will respond to web service requests from the various electronic medication systems and return
alerts and other prescribing information (see Figure 12.1.3-1).


Figure 12.1.3-1: Denmark medication decision support services overview

Allergies Register

A group of allergy specialists, family practitioners and CDS experts are developing a standard set of information to
be used in a patient drug allergy register. A SNOMED CT subset, from the Drug Allergy (disorder) sub-hierarchy in the
Findings hierarchy, is used to document allergies.

Allergy Alert Service


Allergy alerts are enabled by the relationships in SNOMED CT between allergy disorders and substances (via the |
causative agent| attribute), and relationships between drug products and substance concepts (via the |has active
ingredient| attribute).
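The chain of relationships described above can be sketched as a simple check. This is not the Danish service's implementation. The three tables are hand-written assumptions that use plain names in place of SNOMED CT identifiers, and the function names are hypothetical.

```python
# Toy data (assumptions): plain names stand in for SNOMED CT concept identifiers
SUBSTANCE_PARENTS = {"amoxicillin": {"penicillin"}}
CAUSATIVE_AGENT = {"allergy to penicillin": "penicillin"}  # allergy disorder -> substance
HAS_ACTIVE_INGREDIENT = {                                  # drug product -> substances
    "amoxicillin 500 mg capsule": {"amoxicillin"},
    "paracetamol 500 mg tablet": {"paracetamol"},
}


def is_a(substance, ancestor):
    """True if substance is the ancestor itself or one of its subtypes."""
    return substance == ancestor or any(
        is_a(parent, ancestor) for parent in SUBSTANCE_PARENTS.get(substance, ())
    )


def allergy_alerts(recorded_allergies, product):
    """Return the recorded allergies whose causative agent subsumes an ingredient of the product."""
    ingredients = HAS_ACTIVE_INGREDIENT.get(product, set())
    alerts = []
    for allergy in recorded_allergies:
        agent = CAUSATIVE_AGENT.get(allergy)
        if agent is not None and any(is_a(ingredient, agent) for ingredient in ingredients):
            alerts.append(allergy)
    return alerts


print(allergy_alerts(["allergy to penicillin"], "amoxicillin 500 mg capsule"))
print(allergy_alerts(["allergy to penicillin"], "paracetamol 500 mg tablet"))
```

Because the check uses subsumption between substances, an allergy recorded against a broader substance also raises an alert for products containing its more specific subtypes.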

Interactions Service
Based on an existing service, with data primarily drawn from peer-reviewed literature, the interaction database
describes 2,500 interactions between different drugs based on their ingredients.
The database contains a short description of all interactions and a recommendation of how the physician can
handle the interaction. The ingredients have been linked to SNOMED CT substances to directly inform the decision
support service.

Risk Situation Database


The risk situation database contains drugs evaluated by experts as being potentially dangerous in specific
situations. Drug products, ingredients and dose forms are converted to SNOMED CT concepts, which thus
contribute to the decision support service.

Maximum Dose Database


An existing database contains maximum doses for all drugs and recommended doses for patients with impaired
renal function. In the decision support service, ingredients are once again expressed as SNOMED CT substances.

Alert Filtering
The decision support platform will incorporate an alert filtering service in which physicians can set up their
personal preferences for the displaying of alerts. For example, the dose form hierarchy of SNOMED CT will be used
to enable filtering of unwanted alerts for specific dose forms (such as cutaneous dose forms).

1 [Link]

12.1.4 Semantic Search (Australia)


In his thesis, Bevan Koopman presents models for semantic search: Information Retrieval models that elicit the
meaning behind the words found in documents and queries rather than simply matching keywords. This is
achieved by the integration of structured domain knowledge [from SNOMED CT] and data-driven information
retrieval methods... 1

A major application for Natural Language Processing technologies is indexing collections of free text transcripts or
documents such that topic specific searches may be run on them. The challenge is to return ranked matches which
permit selection of texts with high sensitivity and high specificity (i.e. that relevant documents are rarely
overlooked and that irrelevant documents are rarely returned).
Clinical searches may be performed over transcripts or documents that reside in an electronic library, within
medical records, or the Internet. Examples of searches include:
• "Show me articles on this website concerned with inflammatory bowel disease"
• "Does this patient have transcripts in their record suggesting a heart rhythm disturbance?"


Bevan Koopman's PhD thesis explores semantic and statistical approaches to search. The intention is to move
beyond the limitations of plain keyword searching strategies for medical document retrieval. Characterizing these
limitations as the 'semantic gap', Koopman identifies and addresses several issues including:
• Vocabulary mismatch: hypertension vs. high blood pressure
• Granularity mismatch: antipsychotic vs. haloperidol
• Conceptual implication: e.g. from hemodialysis infer kidney failure
• Inferences of similarity e.g. comorbidities (anxiety and depression)
His specific aim was to determine whether graph-based features and the propagation of information over a graph
can provide an inference mechanism to bridge this semantic gap. As part of this work, he assessed the contribution
of using SNOMED CT data within the graphs used to drive inferences.

Materials and Methods


The specific application in the thesis is to find patients who match certain inclusion criteria for recruitment into
clinical trials, based on the analysis of free text transcripts from clinical records.
Queries included:
• Patients with depression on antidepressant medication
• Patients treated for lower extremity chronic wound
• Patients with AIDS who develop pancytopenia
Indexing methods were applied to the TREC MedTrack corpus, a standard collection of electronic texts containing
de-identified reports from multiple hospitals in the United States. It includes nine types of reports: history and
physical examinations, consultation reports, progress notes, discharge summaries, emergency department reports,
operation reports, radiology reports, surgical pathology reports and cardiology reports. The collection as used
contained around 100,000 reports within around 17,000 unique 'visits'.
Graphs have a number of characteristics that align with the requirements of semantic search as inference. The
edges in a graph capture interdependence between concepts – which is identified as one of the semantic gap
problems. Graphs are a common feature of both ontologies and retrieval models. The propagation of information
over a graph (as in the popular PageRank algorithm used in Internet search engines) provides a powerful
means of identifying relevant information items (be they terms, concepts or documents). Ontologies such as
SNOMED CT may also be represented as graphs.
The Graph Inference model developed by Bevan Koopman specifically addresses a number of semantic gap
problems. Regarding vocabulary mismatch, the Graph Inference model utilizes a concept-based representation as
this helps to overcome vocabulary mismatches (i.e. missed synonymy). The Graph Inference model specifically
addresses granularity mismatch by traversing parent-child (i.e. 'is a') relationships.
The semantic gap problem of 'conceptual implication' is where the presence of certain terms in the document implies
the query terms. For example, an organism may imply the presence of a certain disease. Such associations are
captured in SNOMED CT and thus the Graph Inference model can specifically address conceptual implication by
traversing those relationships.
Finally, the semantic gap problem of 'inference of similarity', where the strength of association between two
entities is critical, is specifically addressed by the diffusion factor, which assigns a measure of similarity to each
domain knowledge-based relationship. In the case of SNOMED CT the diffusion factor was derived from SNOMED CT
relationships. It was noted that some relationships contributed to search sensitivity or conversely could lead to
noise (loss of specificity) for the purpose of document retrieval. A weighting was applied (empirically) to each
SNOMED CT relationship type and used as part of the relationship type component of the diffusion factor. For
example, relationship type weightings included:
• |is a| = 1.0
• |active ingredient| = 1.0
• |definitional manifestation| = 0.8
• |associated finding| = 0.6
• |severity| = 0.2
• |laterality| = 0.2
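To show how relationship type weightings might influence the propagation of relevance over a graph, the sketch below multiplies a score by the weight of each edge it crosses. It is an illustrative assumption and not the Graph Inference model's actual formula. The edges and node names are invented, edges are traversed in both directions for simplicity, and only the weights are taken from the list above.

```python
# Relationship type weightings from the list above
WEIGHTS = {
    "is a": 1.0,
    "active ingredient": 1.0,
    "definitional manifestation": 0.8,
    "associated finding": 0.6,
    "severity": 0.2,
    "laterality": 0.2,
}

# Hypothetical edges (assumption): (source, relationship type, target)
EDGES = [
    ("haloperidol", "is a", "antipsychotic"),
    ("haloperidol product", "active ingredient", "haloperidol"),
    ("severe pain", "severity", "severe"),
]


def diffuse(start, max_hops=2):
    """Spread a score of 1.0 from start, scaling it by the weight of every edge crossed."""
    scores = {start: 1.0}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for source, relationship, target in EDGES:
                if node not in (source, target):
                    continue
                neighbour = target if node == source else source
                score = scores[node] * WEIGHTS[relationship]
                if score > scores.get(neighbour, 0.0):  # keep the best score found so far
                    scores[neighbour] = score
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return scores


print(diffuse("antipsychotic"))
print(diffuse("severe", max_hops=1))
```

In this toy, a query for an antipsychotic reaches haloperidol and its product with full weight, while a match that depends only on a |severity| relationship contributes little.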


Documents were parsed and analyzed using Lemur – a highly versatile and customizable open source information
retrieval package developed at the University of Massachusetts. The construction of the graph was done using the
open source LEMON graph library. The graph was serialized using LEMON and stored inside the Lemur index
directory. For the MedTrack corpus, which was found to have a vocabulary size of 36,467 SNOMED CT concepts, the
resulting graph was 4.4MB.

Discussion
The findings of the thesis demonstrated that the graph based retrieval approaches using SNOMED CT derived data
performed better than other approaches on 'hard queries'. A number of additional insights were also revealed.
First, hard queries require inference and easy queries do not. Hard queries tended to be verbose and often
contained multiple dependent aspects to the query (for example, a procedure and a diagnosis concept). Re-ranking
using the Graph Inference model was effective here. Easy queries tended to have a small number of relevant
documents and an unambiguous query concept. For these queries, inference was not required and the Bag-of-concepts
model was most effective. Overall, when valuable domain knowledge was provided by SNOMED CT, the
Graph Inference model was effective — either by returning new relevant documents or by effectively re-ranking
those selected. This again highlights the dependence on the underlying domain knowledge.
Regarding the residual lack of sensitivity of all the IR strategies, Koopman suggests that an ideal ontology for
information retrieval would not only contain definitional but also assertional data – for example "captopril can be
used as a treatment of hypertension", "myocardial infarction [may] cause heart block" and "diabetes mellitus may
lead to renal failure".

1 [Link]

12.1.5 Radiology Activity (UK)


The National Interim Clinical Imaging Procedure Code set is a list of codes and descriptions for the coded and
textual representation of Clinical Imaging Procedures in electronic systems in the NHS. It supports the
consistent and unambiguous representation of imaging procedures in electronic information systems so that
treatment options can be based on a common understanding of what procedures have been performed or are
planned and activity can be directly comparable between all service providers. 1

Migration to native SNOMED CT electronic patient records is in progress in the United Kingdom National Health
Service (NHS). In order to promote interoperability, usability and activity reporting, the NHS introduced a national
standard set of imaging codes in 2005 – the National Clinical Imaging Procedure code set (NCIP).
While SNOMED CT was the prime candidate for populating the NCIP, many Radiology Information Systems (RIS) and
Picture Archiving and Communication (PAC) systems at the time could not accommodate SNOMED CT 18-digit
concept identifiers or (up to) 255 character descriptions without disruptive and costly software changes. There was
also no consistent way to represent laterality of procedures, and some legacy systems required the creation of
separate orderable items for each laterality – for example 'Plain X-ray left wrist', 'Plain X-ray right wrist', and
'Plain X-ray both wrists'. For these reasons, the NCIP code set was developed based on SNOMED CT, but with the
addition of unique identifiers compatible with legacy systems' character limitations (6 alphabetic characters), up to 40
character human readable descriptions, and additional laterality metadata. For example, 60027007 |Radiography of
wrist| is represented within NCIP as:
SCT ID     Laterality_ID   Laterality       Short_Code   Preferred
60027007   51440002        Right and left   XWRIB        XR Wrist Both
60027007   7771000         Left             XWRIL        XR Wrist Lt
60027007   24028007        Right            XWRIR        XR Wrist Rt


NCIP short codes are 'meaningful', in that the modality of the procedure is defined by the first character of the code,
and the finding site and laterality are both explicitly represented in the code.
Each hospital submits mandatory data extracts using NCIP from both legacy and SNOMED CT capable RIS. In
addition to details of the imaging procedures, information about the referral source, patient type, demographics
and times of each imaging related event are also collected centrally. The data from all sites is then combined and
multiple reports are extracted. Hospitals can view their activity data via the iView web based reporting tool (see
Figure 12.1.5-1) and compare their activity with other centers.
Analytics on this central platform are wholly SNOMED CT based. SNOMED CT hierarchies support sophisticated
reports. For example, the monthly report of waiting times for Magnetic Resonance Imaging, excluding cardiac MRI and
MRI-guided procedures, is specified as:
• Includes hierarchy << 113091000 |Magnetic resonance imaging|
• Excludes hierarchy << 258177008 |Magnetic resonance imaging guidance|
• Excludes hierarchy << 241620005 |Magnetic resonance imaging of heart|
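The include and exclude rules above amount to a set difference over hierarchies. The sketch below illustrates this under stated assumptions: the `CHILDREN` table is hand-written, the codes prefixed `HYPOTHETICAL_` are invented, and placing the two excluded concepts beneath 113091000 is assumed for the purpose of the example. In the Expression Constraint Language the same intent could be written with the MINUS operator.

```python
MRI, MRI_GUIDANCE, MRI_HEART = "113091000", "258177008", "241620005"

# Hand-written toy hierarchy (assumption): parent -> children
CHILDREN = {
    MRI: {MRI_GUIDANCE, MRI_HEART, "HYPOTHETICAL_MRI_OF_KNEE"},
    MRI_GUIDANCE: {"HYPOTHETICAL_MRI_GUIDED_BIOPSY"},
}


def descendants_or_self(code):
    """Return the code together with all of its descendants ('<<' semantics)."""
    result, stack = set(), [code]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(CHILDREN.get(current, ()))
    return result


def report_codes(includes, excludes):
    """Union of the included hierarchies minus the union of the excluded hierarchies."""
    included = set().union(*(descendants_or_self(c) for c in includes))
    excluded = set().union(*(descendants_or_self(c) for c in excludes))
    return included - excluded


codes = report_codes(includes=[MRI], excludes=[MRI_GUIDANCE, MRI_HEART])
print(sorted(codes))
```

Imaging events whose procedure code falls within the resulting set would then be counted in the waiting time report.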


Figure 12.1.5-1: Detail of SNOMED CT based report on the NHS iView platform

1 [Link]

12.2 Vendor Case Studies


This section includes brief reviews of a variety of commercial tools which support analytics over SNOMED CT
enabled data. These reviews focus predominantly on those tooling features that are relevant to supporting
analytics services. The vendors who have contributed to this review include:

We welcome additional input to this section and anticipate updates to this report as new information becomes
available.

12.2.1 3M Health Information Systems


3M Health Information Systems provides intelligent tools to help compile and use health information for better
clinical and financial performance. Best known for market-leading coding system and ICD-10 expertise, 3M
Health Information Systems also delivers innovative software and consulting services for clinical
documentation improvement, computer-assisted coding, case mix and quality outcomes reporting, mobile
physician solutions, and a robust healthcare data dictionary and terminology services to support the Electronic
Healthcare Record. 1
For more information please visit [Link]

The 3M Healthcare Data Dictionary (HDD) is a controlled medical vocabulary server. The HDD has been continuously
expanded and maintained for over 15 years, both as a standalone product and embedded within several of 3M's
core products and services. The 3M HDD enables mapping and management of medical terminologies, integration
of content and standardization of healthcare data. The 3M Healthcare Data Dictionary incorporates a selection of
standard healthcare terminologies, including (but not limited to) SNOMED CT, LOINC, RxNorm, ICD-9-CM and
ICD-10-CM.
Concepts in the HDD are grouped and organized using both hierarchical and non-hierarchical relationships. One of
the hierarchical relationships in the HDD is SNOMED CT's 'is a' relationship which allows users to programmatically
use and analyze SNOMED CT concepts captured at various levels of granularity. The analytics capabilities of the
HDD are also extended through the use of other relationship types.

Data Warehousing
The content within the HDD makes a key contribution to analytics in several settings. For example, one large
academic research institution uses the HDD to integrate over 100,000 medication concepts from disparate systems
for comprehensive data assimilation. Many of the medication concepts are mapped to RxNorm codes and linked
through a 'has ingredient' relationship to SNOMED CT codes.
The 3M HDD has a knowledge base and poly-hierarchical structure that defines the relationships between each
clinical drug. Figure 12.2.1-1 shows the relationships that exist for Ramipril including the links to SNOMED CT
content, which can be used to query the data warehouse. The knowledge base allows the hospital's researchers to
customize their searches by various levels of granularity and organize their clinical content into meaningful
relationships.
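The way a 'has ingredient' link lets a researcher query medication data from disparate systems by substance can be sketched as follows. This is an illustrative outline only, not the HDD's data model or API: the local pharmacy codes, the RxNorm-style identifiers, the SNOMED CT-style ingredient identifiers and the order records are all hypothetical placeholders.

```python
# All identifiers below are illustrative placeholders, not real RxNorm or
# SNOMED CT codes, and the table layout is an assumption.

# Local medication codes from two source systems, mapped to clinical drugs.
LOCAL_TO_CLINICAL_DRUG = {
    "PHARM-A-0017": "RXN-RAMIPRIL-5MG-CAP",
    "PHARM-B-9931": "RXN-RAMIPRIL-10MG-TAB",
    "PHARM-A-0042": "RXN-ASPIRIN-75MG-TAB",
}

# 'has ingredient' links from each clinical drug to ingredient concepts.
HAS_INGREDIENT = {
    "RXN-RAMIPRIL-5MG-CAP": {"SCT-RAMIPRIL"},
    "RXN-RAMIPRIL-10MG-TAB": {"SCT-RAMIPRIL"},
    "RXN-ASPIRIN-75MG-TAB": {"SCT-ASPIRIN"},
}

# Warehouse rows: (patient, local medication code as recorded at source).
ORDERS = [
    ("patient-1", "PHARM-A-0017"),
    ("patient-2", "PHARM-B-9931"),
    ("patient-3", "PHARM-A-0042"),
]


def local_codes_for_ingredient(ingredient):
    """Find every local code whose clinical drug contains the ingredient."""
    return {
        local
        for local, drug in LOCAL_TO_CLINICAL_DRUG.items()
        if ingredient in HAS_INGREDIENT.get(drug, set())
    }


def patients_on_ingredient(ingredient, orders):
    """Return patients with at least one order containing the ingredient."""
    codes = local_codes_for_ingredient(ingredient)
    return sorted({patient for patient, local in orders if local in codes})


ramipril_patients = patients_on_ingredient("SCT-RAMIPRIL", ORDERS)
print(ramipril_patients)
```

A single ingredient-level question is answered across both source systems without the researcher needing to know any local code.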


Figure 12.2.1-1: 3M HDD - Application of a knowledge base and hierarchies

The HDD supports researchers in performing data mining by:


• Extracting and mapping clinical metadata using a streamlined, systematic approach;
• Translating diverse clinical terminologies using a coded medical vocabulary;

1 [Link]

12.2.2 Allscripts
Allscripts Healthcare Solutions, Inc. (Allscripts) is a provider of clinical, financial, connectivity and information
solutions and related professional services to hospitals, physicians and post-acute organizations. The Company
provides a variety of integrated clinical software applications for hospitals, physician practices and post-acute
organizations. For hospitals and health systems these applications include its Sunrise Enterprise suite of
clinical solutions, consisting of a range of acute care Electronic Health Record (EHR), integrated with financial/
administrative solutions, including performance management and revenue cycle/access management. The
Company's acute care solutions include Emergency Department Information System (EDIS), care management
and discharge management. 1
For more information please visit [Link]

Allscripts released their first version of Vocabulary Management utilizing SNOMED CT in 2005. Since then Allscripts
systems have been able to utilize SNOMED CT for clinical decision support and reporting. Allscripts uses a common
terminology platform for all three electronic health record systems: Sunrise Clinical Manager™, Sunrise Acute Care™
and Sunrise Ambulatory Care™.
When a query is performed over a health record that requires clinical terminology, the terminology service always
returns a SNOMED CT code. If the primary code stored in the health record is not SNOMED CT (e.g. ICD-9 or ICD-10),
then the terminology service performs the mapping to SNOMED CT, saves the SNOMED CT code in the health record
next to the original code (to make future queries more efficient), and returns the SNOMED CT code.
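The map-then-store behavior described above can be sketched as a small lookup routine. This is a hedged illustration rather than the Allscripts implementation: the class names, the field names and the example codes are all hypothetical, and the mapping table stands in for whatever map the terminology service actually uses.

```python
from dataclasses import dataclass
from typing import Optional


class MappingService:
    """Hypothetical stand-in for a classification-to-SNOMED CT map."""

    def __init__(self, table):
        self.table = table   # {(code system, code): SNOMED CT code}
        self.lookups = 0     # counts how often the map is consulted

    def to_snomed(self, system, code):
        self.lookups += 1
        return self.table[(system, code)]  # KeyError if no map exists


@dataclass
class CodedEntry:
    """One coded item in a health record (assumed structure)."""
    system: str
    code: str
    snomed_code: Optional[str] = None  # stored next to the original code


def snomed_code_for(entry, service):
    """Always return a SNOMED CT code, mapping and saving it if needed."""
    if entry.system == "SNOMED CT":
        return entry.code
    if entry.snomed_code is None:
        # First query: perform the mapping and keep the result in the record.
        entry.snomed_code = service.to_snomed(entry.system, entry.code)
    return entry.snomed_code


# Placeholder codes, not real ICD-10 or SNOMED CT identifiers.
service = MappingService({("ICD-10", "ICD10-EXAMPLE-A"): "SCT-EXAMPLE-1"})
entry = CodedEntry(system="ICD-10", code="ICD10-EXAMPLE-A")

first = snomed_code_for(entry, service)
second = snomed_code_for(entry, service)  # served from the saved value
print(first, second, service.lookups)
```

Storing the mapped code alongside the original means the second and later queries skip the mapping step, which is the efficiency gain the text describes.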
Allscripts Sunrise applications are able to link SNOMED CT to all orders, order form pick lists, observations pick lists
and results.


Point of Care Decision Support


Sunrise Clinical Manager integrates reference content from medical publishers into clinician workflow. The Sunrise
InfoButton™ feature provides clinicians access to relevant medical reference content wherever patient care
decisions are made, without requiring them to log into or visit another site. Sunrise InfoButton uses encoded
patient problem lists and medication data elements to query third-party medical content selected by the clinician.
Several healthcare reference content providers index their content using SNOMED CT. This enables the delivery of
on-topic information without manual searching. Resources with SNOMED CT indexed content include Wolters
Kluwer Clin-eguide, which provides context-specific evidence on particular medical issues and diseases, and
Lexi-Comp, which provides drug and drug-interaction information, diagnosis and disease management, formulary
services, patient-education resources and clinical support tools.

Reporting
Allscripts Clinical Quality Management (CQM) is an automated chart abstraction and analytics system. CQM is
able to create population sets with SNOMED CT encoded patient records and use these patient sets for reporting.
CQM is a flexible reporting and analytics system presenting information in a variety of formats ranging
from simple list-style reports to Online Analytical Processing (OLAP) data cubes with pivot reports.
Allscripts Clinical Performance Management (CPM) is a business intelligence solution for monitoring clinical
performance, improving patient outcomes and reducing costs. With prebuilt or customized reporting and
dashboards, healthcare leaders have powerful access to performance information enabling quality improvement
across the health enterprise.
Applications include:
• Alert usage analysis: User-customizable reports drill down into clinical decision support usage data
revealing the reasons and circumstances for bypassed or overridden alerts. Seeing the impact of decision
support enables a sharper focus on patient outcomes and supports the refinement of rule logic.
• Order-set usage analysis: Organizations can evaluate how order sets are used in computerized
provider order entry. By observing order set configuration, deployment and use, organizations can revise
them to enhance their effectiveness and usability.
• Clinician utilization analysis: Clinicians can examine the vast array of health issues and clinical observations
for discharged patients to support patient treatment decisions and protocol implementation. In addition,
patient cohorts can be tracked over time to determine if the proper treatments are being delivered.

1 [Link]

12.2.3 Apelon
Apelon is an international informatics company focusing on data standardization and interoperability. Leading
healthcare organizations use Apelon's products and services to better manage terminology assets. Apelon
solutions help healthcare application vendors, biomedical researchers, providers, biotech companies and
government agencies improve the quality, comparability, and accessibility of clinical information. 1
For more information please visit [Link]

SNOMED CT plays a central role in many Apelon products and projects. Apelon products include navigation and
visualization tools that support SNOMED CT in a variety of ways. Apelon also undertakes bespoke content
development and consultancy work in healthcare and biomedicine using SNOMED CT.
The Apelon Terminology Development Environment (TDE) software was used by the College of American
Pathologists to build and maintain the SNOMED CT International Edition prior to the formation of IHTSDO. Apelon
software continues to be used by major healthcare organizations and some National Release Centers to maintain
SNOMED CT extensions, maps and subsets.


Apelon Distributed Terminology System


The Apelon Distributed Terminology System (DTS) offers a variety of human and computer interfaces to navigate,
visualize and query SNOMED CT. DTS allows users to create custom extensions to SNOMED CT and perform
incremental description logic classification to ensure that the extensions are consistent with the base version of
SNOMED CT. DTS permits navigation and side-by-side comparison of concepts across multiple SNOMED CT
versions. Features of Apelon DTS supporting analytics and data retrieval include:
• Subsetting: DTS allows users to create customized SNOMED CT subsets using advanced logic techniques.
The user is able to create extensional and intensional value sets of concepts for queries based on both
hierarchical and non-hierarchical relationships.
• Data normalization: DTS supports the matching of text input to standardized terms and concepts via word
order analysis, word stemming, spelling correction and term completion.
• Code translation: DTS supports the mapping of clinical data to standard coding systems such as SNOMED
CT, ICD-9, ICD-10 and CPT.
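The difference between extensional and intensional value sets mentioned under Subsetting can be illustrated with a short sketch. An extensional value set is a fixed list of members. An intensional one is a rule that is evaluated against the terminology, here combining a hierarchical ('is a') criterion with a non-hierarchical attribute criterion. This does not use the DTS API: the concept names, the relationships and the function names are assumptions made for illustration.

```python
# Toy terminology content; names are placeholders, not SNOMED CT concepts.
IS_A = {  # child -> set of parents
    "viral-pneumonia": {"pneumonia"},
    "bacterial-pneumonia": {"pneumonia"},
    "pneumonia": {"lung-disorder"},
    "asthma": {"lung-disorder"},
}
CAUSATIVE_AGENT = {  # a non-hierarchical (attribute) relationship
    "viral-pneumonia": "virus",
    "bacterial-pneumonia": "bacterium",
}

# Extensional value set: members are listed explicitly.
EXTENSIONAL_RESPIRATORY = frozenset({"asthma", "pneumonia"})


def ancestors_or_self(concept):
    """Return the concept and all of its 'is a' ancestors."""
    result, stack = {concept}, [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def intensional_value_set(root, attribute=None, value=None):
    """Evaluate a rule: descendants-or-self of root, optionally filtered
    to concepts whose attribute relationship has the given value."""
    concepts = set(IS_A) | {p for parents in IS_A.values() for p in parents}
    members = {c for c in concepts if root in ancestors_or_self(c)}
    if attribute is not None:
        members = {c for c in members if attribute.get(c) == value}
    return members


all_pneumonia = intensional_value_set("pneumonia")
viral_only = intensional_value_set("pneumonia", CAUSATIVE_AGENT, "virus")
print(sorted(all_pneumonia), sorted(viral_only))
```

Because the intensional definition is a rule, re-evaluating it against a newer terminology version picks up newly added subtypes, whereas the extensional list must be edited by hand.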

Projects Using SNOMED CT


Apelon frequently relies on SNOMED CT as the overarching reference terminology in its consulting work.
Recent projects using SNOMED CT include:
• Work with a major performance measure developer to create a large number of SNOMED CT value sets
representing the inclusion and exclusion criteria for quality measures. SNOMED CT supplies the expressivity
for the detailed distinctions amongst disorders and patient characteristics that is essential for this work.
• Use of SNOMED CT as the "backbone" terminology in a number of mapping projects for Health Information
Exchanges. A small value set of SNOMED CT concepts serves as the "source of truth" for determining
appropriate maps, and then codes from other terminologies are assigned based on whether they are a good
fit with the SNOMED CT concept. This strategy provides a way to capture the precise intent behind the often-
fuzzy language found in clinical documents.
• Use of SNOMED CT to index patient education materials for a major content provider. Documents are
retrieved via an 'Infobutton' request in the EMR based on codes found in the patient record.

1 [Link]/

12.2.4 B2i Healthcare


B2i Healthcare provides tools and services to maximize SNOMED CT's utility. B2i Healthcare Pte Ltd (B2i) is a
boutique software engineering firm specialized in SNOMED CT and healthcare information standards and
exchange. B2i provide products to simplify SNOMED CT adoption and offer software development services to
support healthcare IT needs. 1
For more information please visit [Link]

Snow Owl is a clinical terminology platform developed by B2i Healthcare. The Snow Owl technology family is
deployed in over 2,500 locations in 83+ countries worldwide. The Snow Owl® terminology server has been licensed
by SNOMED International to form the basis of SNOMED International Terminology Server.

Snow Owl Terminology Server


The Snow Owl® terminology server scales from a small kernel embedded in single-user products to n-tier clusters
supporting hundreds of concurrent users. Clients can easily access and query SNOMED CT, LOINC, ATC, ICD-10, and
dozens of additional terminologies via REST or Java APIs. Collaborative distributed authoring is also supported,
including creating and maintaining local code systems, mapping between terminologies, and creating terminology
subsets.
Terminology server features include:


• Extensive support for expression constraints and semantic queries, including Extended SNOMED CT
Compositional Grammar and Groovy scripts.
• Distributed revision control system supports large teams of authors and reviewers working on independent
branches.
• Full support for SNOMED CT logical definitions (OWL 2 EL) with extended support for extensions using
advanced description logic features (OWL 2 DL) including datatype properties, universal restriction,
disjunction, etc.
• Standard distribution formats (e.g. SNOMED RF2, ICD-10 ClaML, LOINC csv)
• Traditional, white-label (embedded within client product), and source code licenses available.
The Singapore Drug Dictionary (SDD) is the biggest SNOMED CT extension - larger than the SNOMED CT International
release itself. To support medication safety initiatives like medication management and adverse drug event
surveillance, the drug ontology makes use of Snow Owl's extended description logic support.

Snow Owl IDE


The Snow Owl IDE (Integrated Development Environment) simplifies developer tasks related to terminology
tooling. The architecture allows customized extensions to integrate tooling needs within a single platform.
The IDE embeds a terminology server and simplifies common terminology maintenance, ETL, and other tasks.
Customized authoring environments support developing a library of queries (SNOMED CT expression constraints)
using the Simple or Extended SNOMED CT Compositional Grammars and Groovy scripting. Files can be exported in a
variety of formats like OWL 2, SNOMED CT RF1 and RF2, ClaML, spreadsheets and text files. Custom formats can also
be created that support direct import and export to proprietary EHR and terminology applications.
Typical vendor deployments: EHR vendors use Snow Owl to create and maintain their local terminologies and
mappings to reference terminologies like SNOMED CT. Snow Owl IDE allows these to be exported in a format consumable
by the proprietary EHR system. The Snow Owl IDE has been built into proprietary tooling combining
information modelling with ontology development.

Snow Owl Collaborative Authoring Platform


Snow Owl's collaborative terminology authoring platform maintains terminology artefacts developed by a team
and supported by business workflows. The platform consists of the terminology server with remote clients
collaborating with independent authoring workflows. The platform integrates with external task management
systems like Bugzilla and JIRA.
Features:
• Full support for creating SNOMED CT extensions, including RF1 and RF2, all subset and mapping RF2
reference set types, modules, and full change history to 2002.
• Support for dozens of terminologies and any local code systems.
• Creation of value sets including mixing and matching codes from different code systems.
• Import existing value sets from the USA National Library of Medicine's Value Set Authority Center.
• Creation of mapping sets between any two terminologies or mapping local code systems to reference
terminologies like SNOMED CT and LOINC.
• Configurable workflow support for authoring use cases like single, dual, and dual independent authoring.
• Support for terminology-specific workflows and editing restrictions.
The Singapore Ministry of Health Holdings uses Snow Owl to maintain their national SNOMED CT extension and
local code systems as well as multi-terminology value sets and mappings used in their National Healthcare Data
Dictionary.

Snow Owl Meaningful Query


The international adoption of SNOMED CT and related healthcare ontologies has provided the logical definitions
that enable a new breed of queries. Unfortunately, it's challenging to run ad hoc queries that make use of the full
semantics of the underlying EHRs. Operational stores have the data, but in a variety of structures that can't act on
the semantic relationships between healthcare terms. Data warehouses can query only aggregated data that has
been placed into predefined buckets which don't provide the scale of complexity inherent in the original data. And
multiple information models represent the same semantic meaning in different ways.
Snow Owl Meaningful Query (MQ) allows semantic EHR queries on operational data stores without requiring
predefined structures like data warehouses or the presence of a single unified healthcare information model. The
system is optimized specifically for ad hoc queries of hundreds of millions of electronic health records. It combines
ontological reasoning over the EHRs with more traditional query methods to incorporate demographic and
ancillary data.
This query interface is being rolled out to all Singapore public hospitals and the national procurement office to
allow search and retrieval of pharmaceuticals contained within the Singapore Drug Dictionary ontology. All lexical
and semantic properties can be searched, including datatype properties and mappings to local code systems,
external terminologies like ATC, and internal procurement codes.

1 [Link]

12.2.5 Cambio
Cambio Healthcare Systems is a market leading Electronic Patient Record (EPR) company headquartered in
Stockholm, Sweden with offices in the UK, Sweden, Denmark and Sri-Lanka. Cambio COSMIC® is a patient-
centered integrated EPR system for comprehensive and clinical healthcare solutions with a focus on patient
safety. Cambio COSMIC® offers solutions within all healthcare sectors and is used by over 100,000 clinicians and
healthcare professionals. 1
For more information please visit [Link]

The Cambio COSMIC® Electronic Patient Record system has been under continuous development since 1993.
Cambio has applied innovations within healthcare informatics in areas such as information models, clinical
terminology and formal languages for expressing clinical decision support rules. The COSMIC® EPR combines
openEHR archetypes, SNOMED CT terminology and Guideline Definition Language rules in implementations which
benefit patients, clinical staff and healthcare enterprise management. Using these technologies, their system is
able to incorporate advanced analytics capabilities.

Decision Support
Cambio uses the Guideline Definition Language (GDL) to combine archetypes, terminologies and clinical decision
support rules. GDL provides:
• Bindings between archetype elements and variables in the rules;
• Rule expressions that are easily converted to industry rule engine languages;
• Bindings between local concepts used in the rules and concepts from SNOMED CT.
GDL rules can be used to trigger a variety of system actions, including pre-filling a form, proposing a test or
prescription, or sending a notification to the system user. The criteria for triggering actions from GDL rules may be
based on demographics data, the context of care (e.g. clinic or inpatient), current medications and diagnoses, or
observation values (e.g. lab results).
Decision support rules created in COSMIC® are authored using an editor. Figure 12.2.5-1 shows the high-level view of
a rule for calculating a complex clinical risk score (the CHA2DS2-VASc score for stroke risk stratification in atrial
fibrillation).


Figure 12.2.5-1: Creating rules for CHA2DS2-VASc calculation
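The arithmetic behind a CHA2DS2-VASc rule can be sketched in a few lines. The sketch below is plain Python rather than GDL, and it does not reproduce the rule shown in the figure. The `Patient` structure and its field names are hypothetical; in a system such as the one described, each flag would presumably be derived from archetype-bound record data, with diagnoses identified through SNOMED CT bindings. The point weights follow the standard published definition of the score.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical input structure; fields are assumed, not GDL bindings."""
    age: int
    female: bool
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    stroke_tia_or_thromboembolism: bool = False
    vascular_disease: bool = False


def cha2ds2_vasc(p):
    """Return the CHA2DS2-VASc score (0 to 9) for one patient."""
    score = 0
    score += 1 if p.heart_failure else 0                  # C
    score += 1 if p.hypertension else 0                   # H
    if p.age >= 75:                                       # A2
        score += 2
    elif p.age >= 65:                                     # A (65 to 74)
        score += 1
    score += 1 if p.diabetes else 0                       # D
    score += 2 if p.stroke_tia_or_thromboembolism else 0  # S2
    score += 1 if p.vascular_disease else 0               # V
    score += 1 if p.female else 0                         # Sc
    return score


example = Patient(age=70, female=True, hypertension=True)
print(cha2ds2_vasc(example))
```

In a decision support setting the resulting score would be one of the values a rule could use to trigger an action, such as proposing a prescription or sending a notification.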

At the more detailed level, criteria may be defined using SNOMED CT concepts and subsets of concepts (as simple
refsets). Figure 12.2.5-2 below shows a section of a decision support rule which identifies patients with heart failure.


Figure 12.2.5-2: Excerpt of GDL Rule showing binding to SNOMED CT and ICD

Identification of suitable patients for research studies is a particular challenge to clinicians working in a routine
clinic setting. A clinician may encounter eligible cases very rarely or simply not be familiar with the specific study
selection criteria. To study the course and causes of diseases, the factors that affect a particular condition,
and the effects of different medications, researchers need trial subjects who meet specific criteria.

Off-Line Reporting and Data Warehousing


COSMIC Intelligence is a data warehouse and reporting application. Analyses and reports that do not require real
time information are produced within this separate analysis system. COSMIC Intelligence is a data store optimized
for queries, retrieval and output of data. Data is periodically retrieved from the 'live' clinical system, transformed
and loaded into the data store.

1 [Link]/

12.2.6 Caradigm

Caradigm is a population health company dedicated to helping organizations improve care, reduce costs and
manage risk. Caradigm analytics solutions provide insight into patients, populations and performance,
enabling healthcare organizations to understand their clinical and financial risk and identify the actions
needed to address it. Caradigm population health solutions enable teams to deliver the appropriate care to
patients through effective coordination and patient engagement, helping to improve outcomes and financial
results. 1
For more information please visit [Link]

Caradigm is a joint venture between Microsoft and GE Healthcare, which is dedicated to population health
management. Caradigm's cornerstone product is the Intelligence Platform. This platform can connect over 295
types of source systems, including Allscripts, Athenahealth, Cerner, Epic, GE, McKesson and Meditech. Data from
disparate systems within one or more healthcare organizations is collected, normalized and standardized to enable
applications to leverage this data in a unified and consistent way.
Caradigm's solutions provide explorative, comparative, predictive and guided elements aimed at analyzing the
disparate data and turning the resulting insight into action. Caradigm's three main solution areas are:


• Healthcare analytics (including clinical, operational and financial analytics);
• Coordination management; and
• Wellness promotion and patient engagement.
Some of Caradigm's customers use SNOMED CT natively in their clinical systems, while others use natural language
and other code systems. In order to aggregate data from disparate sources, it must first be standardized by
mapping into a common code system. Code systems used to standardize the disparate data include SNOMED CT,
ICD-9 and ICD-10. By standardizing the data, users are able to leverage the analytics tools - for example, to
understand trends within different diagnoses, to look at a comprehensive list of everything that has happened to a
patient in a longitudinal patient record, and to support care management by displaying the different diagnoses or
problems of a patient in a consistent manner.
Caradigm currently implements an approach to SNOMED CT based analytics using clinical value sets. These value
sets are developed manually by a team of clinical analysts for topics such as diabetes and heart disease. Clinical
users are then able to create queries in a user friendly interface, which allows them to (for example) define cohorts
built on criteria such as age, gender, specific diseases, conditions, medication or other treatments. These queries
are then converted behind the scenes into SQL statements which are executed against a SQL Server database and
return records containing data in the selected clinical value sets.
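The conversion of a cohort definition into SQL might look roughly like the sketch below. It is not Caradigm's implementation. The text refers to a SQL Server database, whereas this sketch uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere. The table names, column names, value set contents and codes are all hypothetical.

```python
import sqlite3

# Hypothetical clinical value set; the codes are placeholders.
VALUE_SETS = {"diabetes": ["SCT-DM-1", "SCT-DM-2"]}


def build_cohort_query(value_set_codes, min_age=None, gender=None):
    """Turn cohort criteria into a parameterized SQL statement."""
    if not value_set_codes:
        raise ValueError("value set must contain at least one code")
    placeholders = ", ".join("?" for _ in value_set_codes)
    sql = (
        "SELECT DISTINCT p.patient_id FROM patient p "
        "JOIN diagnosis d ON d.patient_id = p.patient_id "
        f"WHERE d.snomed_code IN ({placeholders})"
    )
    params = list(value_set_codes)
    if min_age is not None:
        sql += " AND p.age >= ?"
        params.append(min_age)
    if gender is not None:
        sql += " AND p.gender = ?"
        params.append(gender)
    return sql + " ORDER BY p.patient_id", params


# Small in-memory database with an assumed two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, age INTEGER, gender TEXT);
CREATE TABLE diagnosis (patient_id INTEGER, snomed_code TEXT);
INSERT INTO patient VALUES (1, 70, 'F'), (2, 45, 'M'), (3, 68, 'M');
INSERT INTO diagnosis VALUES (1, 'SCT-DM-1'), (2, 'SCT-DM-2'), (3, 'SCT-OTHER');
""")

sql, params = build_cohort_query(VALUE_SETS["diabetes"], min_age=65)
cohort = [row[0] for row in conn.execute(sql, params)]
print(cohort)
```

Binding the codes as parameters, rather than concatenating them into the statement, keeps the generated SQL safe to run regardless of what the value set contains.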
Caradigm also has Natural Language Processing (NLP) tools, which are able to extract and encode data, such as
problems and medications, from natural language notes within documents such as discharge summaries and
radiology reports.
As part of their strategic roadmap, Caradigm are exploring ways to enhance the capabilities of their tooling
platforms by leveraging the architecture of the terminology sets that they are using for analytics. In particular, they
are planning to start utilizing the hierarchical and non-hierarchical relationships of SNOMED CT to enable more
powerful query capabilities and to extend NLP processing options.

1 [Link]

12.2.7 Cerner
Cerner Corporation is a supplier of healthcare information technology solutions, services, devices and
hardware. Cerner solutions optimize processes for healthcare organizations. These solutions are licensed by
approximately 9,300 facilities globally, including more than 2,650 hospitals; 3,750 physician practices 40,000
physicians; 500 ambulatory facilities, such as laboratories, ambulatory centers, cardiac facilities, radiology
clinics and surgery centers; 800 home health facilities; 40 employer sites and 1,600 retail pharmacies. The
Company operates in two segments: domestic and global. The domestic segment includes revenue
contributions and expenditures associated with business activity in the United States. The global segment
includes revenue contributions and expenditures linked to business activity in Argentina, Aruba, Australia,
Austria, Canada, Cayman Islands, Chile, China (Hong Kong), Egypt, England, France, Germany, Guam, India,
Ireland, Italy, Japan, Malaysia, Morocco, Puerto Rico, Qatar, Saudi Arabia, Singapore, Spain, Sweden,
Switzerland and the United Arab Emirates. 1
For more information please visit [Link]

Cerner Corporation's Millennium healthcare system manages terminologies, classifications and other code systems
within a terminology service - the Cerner Millennium Terminology (CMT) package. CMT accommodates and
integrates SNOMED CT International Release data and National extension content - such as concepts, relationships,
descriptions, subsets, maps etc.
At the Cerner Millennium user interface, content can be captured at point of care as SNOMED CT codes. Modules
used with SNOMED CT include: Problems and Diagnoses, Allergies, Procedures, Pharmacy Orders, Radiology Orders
and Cellular Pathology Reports.
Either Cerner or third party clinical encoding software can process SNOMED CT Diagnoses and Procedures captured
in Millennium and suggest ICD-10 and other classification codes to Clinical Coders for activity reporting and billing.


SNOMED CT is also used extensively behind the scenes to support more sophisticated analytic facilities within their
Natural Language Processing (NLP) tools and reporting tools. These Cerner products and services exploit unique
features and content of SNOMED CT to extend the power of these applications.

Data Warehousing
The Cerner product suite includes two data warehousing applications. These applications share much of their
terminology-related technology, including supporting subsumption queries with the CMT Concept Explode/
Transitive Closure facility, which utilizes the SNOMED CT 'is a' relationships.
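A transitive closure table precomputes every (descendant, ancestor) pair implied by the 'is a' relationships, so that a subsumption query becomes a simple lookup instead of a hierarchy walk at query time. The sketch below shows the general technique under stated assumptions. It is not the CMT facility itself, and the concept names and function names are illustrative placeholders.

```python
def transitive_closure(is_a_pairs):
    """Expand direct (child, parent) pairs into all (descendant, ancestor) pairs."""
    parents = {}
    for child, parent in is_a_pairs:
        parents.setdefault(child, set()).add(parent)

    closure = set()
    for concept in parents:
        seen, stack = set(), list(parents[concept])
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            closure.add((concept, ancestor))
            stack.extend(parents.get(ancestor, ()))
    return closure


def explode(concept, closure):
    """Return the concept and all its descendants using the closure table."""
    return {concept} | {desc for desc, anc in closure if anc == concept}


# Toy hierarchy with placeholder names.
IS_A = [
    ("fracture-of-femur", "fracture"),
    ("fracture", "injury"),
    ("burn", "injury"),
    ("injury", "disorder"),
]

CLOSURE = transitive_closure(IS_A)
injury_codes = explode("injury", CLOSURE)
print(len(CLOSURE), sorted(injury_codes))
```

In a warehouse the closure would typically be stored as a two-column table and joined to the fact table, so that a report on a concept and its descendants needs no recursion.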
The PowerInsight® Data Warehouse (PIDW) is an enterprise level data warehouse which updates on a nightly basis
from the live electronic patient record. PIDW services standard operational reporting, mandatory reports (e.g. for
National or State governments and regulatory bodies), and ad hoc queries e.g. individual lists of patients treated as
requested by clinicians for audit.
Health Facts® Reporting supports the pooling of anonymized data from different healthcare organizations.
The Health Facts data warehouse holds information from the electronic records of millions of inpatient, emergency
department, and outpatient visits from participating U.S. health care organizations. (Data are encrypted and
secured to ensure patient confidentiality in compliance with HIPAA privacy regulations.) The reporting facilities
enable the analysis of patient care and process trends within a facility and provide comparisons to other Health
Facts contributors.
The Cerner data warehouse query tools include simple graphical user interfaces which directly create powerful
reports using the SNOMED CT hierarchy content. The screenshot below in Figure 12.2.7-1 shows a report of
attendances with diagnoses which are a descendant of the SNOMED CT concept 417746004 |traumatic injury|.


Figure 12.2.7-1: Report produced by Cerner's data warehouse query tool

Other Applications
Cerner's Natural Language Processing (NLP) technology interprets the content of clinical notes through a complex
understanding of grammar, syntax, synonymy and phraseology. SNOMED CT's semantic content and concept
model enrich the analysis of the text in several ways, including:
• Concept recognition using synonyms – for example: |heart attack| is a synonym of |myocardial infarction|;
• The hierarchical relationships between concepts – for example: |pneumonia| |is a| |respiratory disease|;
• The identification of context, such as negation, certainty, subject and timing.
Computer Assisted Coding allows the extraction of appropriate SNOMED CT codes for automating coding and
billing processes.
Chart Search/Semantic Search is a tool that enables clinicians at the point of care to search in real time through a
patient's multiple charts, pathology reports and other documents, for topics such as 'heart disease' and 'diabetes'.
The interface, as shown below in Figure 12.2.7-2, has the look and feel of a World Wide Web search engine.


Figure 12.2.7-2: User interface of Cerner's Chart Search/Semantic tool

The searches can be filtered by date, document type etc. However, the power of this approach is extended beyond
conventional search engine indexing by using SNOMED CT. Cerner's tools index SNOMED CT findings (including
diseases and symptoms) and procedures to make searches and queries over these domains faster. Cerner also
hand-curates exceptions and associations between related concepts. A Clinical Significance Score is assigned to each
concept to allow documents to be sorted based on the probable relevance of the concepts in the document given
their context of use.
For example, as shown in Figure 12.2.7-3, when given the search term 'heart disease', the 'is a' hierarchies of
SNOMED CT enable recognition and return of documents which reference 'sinus bradycardia' and 'dilated
cardiomyopathy'.
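The combination of hierarchy-based matching and score-based ordering described above can be outlined in a short sketch. It is not Cerner's algorithm. The single-parent hierarchy is a simplification (SNOMED CT concepts can have several parents), and the document index, the score values and all names are invented for illustration.

```python
# Toy single-parent 'is a' links (child -> parent); placeholder names.
IS_A = {
    "sinus-bradycardia": "heart-disease",
    "dilated-cardiomyopathy": "heart-disease",
    "asthma": "lung-disease",
}

# Hypothetical per-concept significance scores (higher sorts first).
SIGNIFICANCE = {
    "sinus-bradycardia": 0.4,
    "dilated-cardiomyopathy": 0.9,
    "asthma": 0.6,
}

# Hypothetical index: document id -> concepts recognized in the document.
DOCUMENTS = {
    "ecg-report": {"sinus-bradycardia"},
    "echo-report": {"dilated-cardiomyopathy"},
    "clinic-letter": {"asthma"},
}


def subsumed_by(concept, ancestor):
    """True if concept equals ancestor or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def search(query_concept):
    """Return matching document ids, most significant match first."""
    hits = []
    for doc_id, concepts in DOCUMENTS.items():
        matched = [c for c in concepts if subsumed_by(c, query_concept)]
        if matched:
            best = max(SIGNIFICANCE.get(c, 0.0) for c in matched)
            hits.append((best, doc_id))
    return [doc_id for _, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


results = search("heart-disease")
print(results)
```

Neither matching document mentions the words 'heart disease'; both are found because the concepts they contain are subsumed by the query concept.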


Figure 12.2.7-3: Search results from Cerner's Chart Search tool

1 [Link]


12.2.8 Clinithink
Clinithink's tools enable analytics and querying over SNOMED CT encoded patient data … The CLiX
CNLP platform transforms clinical narrative into rich structured data for healthcare providers and solution
vendors. CLiX ENRICH, powered by CLiX CNLP to support analytics, converts unstructured clinical data into
actionable data required to help solve today's toughest healthcare business problems. 1
For more information please visit [Link]

Building on the capabilities of its CLiX Clinical Natural Language Processing (CNLP) platform, Clinithink has created
CLiX ENRICH, a solution that enables the analysis of healthcare data sourced directly from clinical narrative. This
technology can be integrated into existing healthcare solutions to process any relevant narrative and encode the
key clinical elements – medications, diagnoses, procedures, symptoms and findings – using SNOMED CT. The
encoded elements can then be queried using powerful, user-definable SNOMED CT queries to present structured data
in a form that can be easily consumed by existing BI platforms.
For example, when the narrative text below is typed into diagnosis and observation fields (respectively), Clinithink
is able to provide a list of possible coding options for manual confirmation (as shown in Figure 12.2.8-1 below).

Figure 12.2.8-1: Clinithink narrative text and possible coding options

1 [Link]

12.2.9 EMIS
EMIS clinical systems are used by over 5,000 healthcare organizations across the United Kingdom, from Primary
Care and out-of-hours services, to community care and sexual health services. By using the same system,
everyone can access the same information about their patients - no matter where they are treated - making the
prospect of integrated care a reality. With over 25 years experience of working with the NHS, EMIS are entrusted
with over 40 million patient records. 1
For more information please visit [Link]

Egton Medical Information Systems (EMIS) began in the 1980s in a rural practice in Egton in North Yorkshire, United
Kingdom. The founders, Dr Peter Sowerby and Dr David Stables, wrote the software and adopted the NHS Read
Code system during the 1980s. A series of systems have since been deployed in over half of England's primary care
practices. The latest product (EMIS Web) moved to a data-center-based architecture with a thin-client front end and
built-in, secure, web-based patient access facilities.
EMIS software looks after the patient records of nearly 40 million people in the UK (30 million using EMIS Web). More
than 2 in 3 of those patients can book appointments and order repeat medications online, and more than 1 in 3 can
view their own medical record.
EMIS Web features significant advances in terminology use, with EMIS adopting a phased approach to SNOMED CT.
EMIS Web displays a familiar coding structure based on the construction of a Read Version 2 navigational hierarchy
within SNOMED CT. The principal design objective has been to enable SNOMED CT within the clinical system to
meet specific requirements, including:
• Supporting advanced decision support capabilities;
• Supporting interoperability within healthcare through the sharing of coded data;
• Supporting standards required in NHS General Practice Systems of Choice (e.g. the NHS mandates SNOMED
CT coding within the National Summary Care Record service);
• Broadening the scope of terminology use to support the recording of encounters in disciplines such as
dentistry and community healthcare;

• Supporting the mandatory requirement for the Electronic Prescription Service to natively use the UK
SNOMED CT drug extension (i.e. NHS dictionary of medicines and devices, dm+d).
By using coded structured records and providing access to the specialist domain terminology available in SNOMED
CT, EMIS has been able to extend the user base of EMIS Web by more than 20,000 new NHS users over the last year.
These include practice nurses, community matrons, child health and mental health nurses, palliative care
clinicians, diabetes specialists, physiotherapists and psychologists.

1 [Link]

12.2.10 Epic
Epic makes software for mid-size and large medical groups, hospitals and integrated healthcare organizations –
working with customers that include community hospitals, academic facilities, children's organizations, safety
net providers and multi-hospital systems. Epic's integrated software spans clinical, access and revenue
functions and extends into the home... Epic's integrated analytics and reporting – collectively named Cogito –
delivers current clinical intelligence and business intelligence based on role and workflow… Epic provides a
combination of flexible tools, content, data sources, distribution, training, and process to support decisions
throughout the health system with the best information available. 1
For more information please visit [Link]

Epic's electronic patient record systems use SNOMED CT as a reference terminology through the following
mechanisms:
• Mappings between a subset of Epic's standard data elements and SNOMED CT concepts;
• Mappings between diagnoses imported from other code systems (e.g. those used in Intelligent Medical
Objects' and Health Language's products) and SNOMED CT concepts;
• Mappings from additional data elements to SNOMED CT concepts that can be created by clients using an
External Concept Mapping activity.
These mechanisms provide linking behind the scenes, so when clinicians add a diagnosis to the problem list by
selecting a familiar term, for example, they automatically select the corresponding SNOMED CT concept. This
SNOMED CT encoding creates powerful possibilities within Epic's decision support and reporting facilities.

Decision Support
The Epic system calls its decision support alerts 'Best Practice Advisories'. These customized alerts are
programmed to fire according to predetermined triggers, such as specific chief complaints, vital signs, diagnoses or
medications, either individually or in combination using inclusionary or exclusionary logic. Best Practice Advisories
can thus be used to notify clinicians to tend to important tasks, such as reviewing a patient's allergies, writing
orders, and completing charting. They can also present order sets and links to third party information sources
refined using the clinical context of the patient record being reviewed.
Epic customers can use SNOMED CT's hierarchical structure to group related records, making the setup for clinical
decision support much simpler than would be possible if users had to select records or clinical concepts
individually. For example, an administrator creating Best Practice Advisories for diabetic patients could use
73211009 |diabetes mellitus| within the SNOMED CT hierarchy as one of the criteria instead of identifying every
subtype of diabetes individually.
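
The following Python sketch illustrates how a single hierarchy-level criterion can stand in for a list of subtypes, combined with inclusionary and exclusionary logic. It is not Epic's implementation. The function names and the `is_a` table are assumptions, the hierarchy is heavily simplified, and only 73211009 |diabetes mellitus| and 195967001 |asthma| are taken from this document; the two subtype identifiers are supplied for illustration.

```python
# Simplified 'is a' edges (child -> parents), for illustration only.
IS_A = {
    "44054006": ["73211009"],  # type 2 diabetes mellitus -> diabetes mellitus
    "46635009": ["73211009"],  # type 1 diabetes mellitus -> diabetes mellitus
}


def is_subsumed_by(concept, ancestor, is_a):
    """True if concept equals ancestor or reaches it by following 'is a' edges."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return False


def advisory_fires(problem_list, include, exclude, is_a):
    """Fire when any problem matches an inclusion criterion and none matches an exclusion."""
    def matches(criteria):
        return any(
            is_subsumed_by(problem, criterion, is_a)
            for problem in problem_list
            for criterion in criteria
        )
    return matches(include) and not matches(exclude)


# One criterion (73211009) covers every subtype recorded on the problem list.
print(advisory_fires(["44054006"], include=["73211009"], exclude=[], is_a=IS_A))
```

Because the criterion is the parent concept, subtypes added in later terminology releases would be picked up without editing the advisory, provided the hierarchy table is refreshed.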

Reporting
Within Epic's integrated analytics and reporting suite (i.e. Cogito), customers have achieved benefits by using
SNOMED CT's clinical finding hierarchy to aggregate local diagnosis concepts.
For example, oncologists have used this capability when working with cancer-related ICD codes, which are unsuited
to grouping diagnoses by stage. By using the mapped SNOMED CT codes and the SNOMED CT hierarchy, they are able
to report staging data.
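
Aggregating local diagnosis concepts through a hierarchy can be sketched as a roll-up count. The example below is an assumption-laden illustration, not Cogito's behavior: the local codes, the mapping table, the toy hierarchy and the function names are all hypothetical, and term labels stand in for SNOMED CT identifiers. The source does not list the staging concepts used by the oncologists, so diabetes and asthma are used here instead.

```python
from collections import Counter

# Hypothetical local diagnosis codes mapped to (placeholder) SNOMED CT concepts.
LOCAL_TO_SNOMED = {
    "LOCAL-001": "type 2 diabetes mellitus",
    "LOCAL-002": "type 1 diabetes mellitus",
    "LOCAL-003": "moderate asthma",
}

# Toy 'is a' edges (child -> parents).
IS_A = {
    "type 2 diabetes mellitus": ["diabetes mellitus"],
    "type 1 diabetes mellitus": ["diabetes mellitus"],
    "moderate asthma": ["asthma"],
    "diabetes mellitus": ["clinical finding"],
    "asthma": ["clinical finding"],
}


def ancestors_or_self(concept, is_a):
    """Return the concept plus every concept above it in the hierarchy."""
    found, stack = {concept}, [concept]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def count_by_group(local_codes, groupers, local_to_snomed, is_a):
    """Count records under each grouping concept that subsumes their mapped concept."""
    counts = Counter()
    for code in local_codes:
        concept = local_to_snomed.get(code)
        if concept is None:
            continue  # unmapped local codes are skipped in this sketch
        for grouper in groupers & ancestors_or_self(concept, is_a):
            counts[grouper] += 1
    return counts


records = ["LOCAL-001", "LOCAL-001", "LOCAL-002", "LOCAL-003", "LOCAL-999"]
report = count_by_group(records, {"diabetes mellitus", "asthma"}, LOCAL_TO_SNOMED, IS_A)
print(dict(report))
```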

1 [Link]

12.2.11 First Databank

The Multilex drug knowledge base is widely used throughout the UK and is integrated into clinical systems
across the whole healthcare community. The Multilex drug terminology holds clinical and commercial
information on more than 75,000 pharmaceutical products and packs and provides active clinical decision
support and referential medicines information for all healthcare professionals. 1
For more information please visit [Link]

Overview
First DataBank (FDB) were in the first wave of suppliers to recognize the potential of SNOMED CT and begin to
integrate support for SNOMED CT into their existing clinical decision support solutions. Their primary use of
SNOMED CT in the patient's electronic health record (EHR) is to detect safety issues arising from certain
combinations of medications, diagnoses and drug adverse reaction histories. In 2006 FDB introduced support for
products and packs encoded using the NHS SNOMED CT UK Drug Extension. In the following year FDB launched new
modules within the Multilex drug knowledge base supporting Drug-Condition Checking and Drug Sensitivity
(Allergy) checking for the SNOMED CT EHR.
System vendors implementing Multilex decision support within SNOMED CT-enabled medical record applications
include CSC (Lorenzo system), Epic and JAC in secondary care, and CSE Servelec (RiO system) in community/mental
health. Currently, only pre-coordinated expressions are supported by the live Multilex SNOMED CT-based decision
support solutions.

Drug-Condition Contraindication Checks


The contraindications module alerts the clinician when a medication proposed to treat a disorder is incompatible
with another of the patient's disorders or clinical states. For example, a beta blocker such as propranolol might be
prescribed to treat someone with high blood pressure. However, if that patient also has asthma, the drug might
significantly worsen their asthma or trigger a dangerous acute attack.
Thousands of such drug-condition contraindications exist and nearly all medications have at least one. Without
point of care decision support, the clinician must rely on memory or search reference sources for each drug
prescribed. Also there is a risk that a contraindicating condition may be in the record but unknown to the
prescribing clinician.
In a SNOMED CT enabled EHR, both the drugs (e.g. 318353009 |propranolol hydrochloride 40mg tablet|) and the
conditions (e.g. 370219009 |moderate asthma|) are encoded.
Internally FDB maintain their own local ontology representing only those conditions relevant to prescribing
decision support (e.g. asthma, gastric ulcer, heart disease, pregnancy). The items in this ontology are linked to
SNOMED CT codes as required to support this (contraindication checking) use case. These SNOMED CT links range
from the obvious, such as linking 195967001 |asthma| to FDB's 'asthma', to the more subtle, such as linking
447413000 |drainage of amniotic fluid using ultrasound guidance| to FDB's 'pregnancy'.
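
The linking approach described above can be sketched as follows. This is a hedged illustration rather than FDB's actual data model: the table and function names are assumptions, the contraindication entry simply encodes the propranolol and asthma example from the text, and |moderate asthma| is treated as a direct child of |asthma| for simplicity.

```python
# SNOMED CT concept -> local condition, using the two links quoted in the text.
CONDITION_LINKS = {
    "195967001": "asthma",     # |asthma|
    "447413000": "pregnancy",  # |drainage of amniotic fluid using ultrasound guidance|
}

# Simplified 'is a' edge: |moderate asthma| -> |asthma|.
IS_A = {"370219009": ["195967001"]}

# Drug -> local conditions in which it is contraindicated (illustrative entry only).
CONTRAINDICATED_IN = {
    "318353009": {"asthma"},  # |propranolol hydrochloride 40mg tablet|
}


def local_conditions(concept, links, is_a):
    """Collect local conditions linked to the concept or to any of its ancestors."""
    found, stack, seen = set(), [concept], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in links:
            found.add(links[current])
        stack.extend(is_a.get(current, []))
    return found


def contraindication_alerts(drug, patient_concepts):
    """Return local conditions on the record that contraindicate the proposed drug."""
    patient_conditions = set()
    for concept in patient_concepts:
        patient_conditions |= local_conditions(concept, CONDITION_LINKS, IS_A)
    return CONTRAINDICATED_IN.get(drug, set()) & patient_conditions


# A record coded with |moderate asthma| triggers the 'asthma' contraindication.
print(contraindication_alerts("318353009", ["370219009"]))
```

Keeping the contraindication knowledge against a small local ontology means each rule is written once, while the links and the hierarchy determine which recorded codes trigger it.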
FDB reviews the relevant SNOMED CT domains (i.e. |Clinical finding|, |Procedure| and |Situation with explicit
context|) for concepts applicable to drug-condition checking. The FDB linking tool uses the SNOMED CT |is a| hierarchy
and a SNOMED CT-derived transitive closure table to locate and suggest links from the FDB ontology to SNOMED CT
concepts. Other SNOMED CT relationships also help find related concepts via the browser, but discovery is mainly by
clinical knowledge combined with description-based searches assisted by the rich synonym content of SNOMED CT.
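
A transitive closure table stores one row for every (descendant, ancestor) pair, so that subsumption becomes a simple lookup. The sketch below shows one way such a table could be derived and used to suggest candidate links. The function names are assumptions, the hierarchy is a toy, and the parent of |asthma| is a labeled placeholder rather than a real identifier.

```python
# Toy 'is a' edges (child -> parents).
IS_A = {
    "370219009": ["195967001"],                      # |moderate asthma| -> |asthma|
    "195967001": ["respiratory-disorder-placeholder"],
}


def transitive_closure(is_a):
    """Return every (descendant, ancestor) pair implied by the 'is a' edges."""
    rows = set()
    for concept in is_a:
        stack, seen = list(is_a[concept]), set()
        while stack:
            ancestor = stack.pop()
            if ancestor not in seen:
                seen.add(ancestor)
                rows.add((concept, ancestor))
                stack.extend(is_a.get(ancestor, []))
    return rows


def suggest_links(anchor, closure):
    """Suggest all descendants of an already linked concept as candidate links."""
    return sorted(descendant for descendant, ancestor in closure if ancestor == anchor)


closure = transitive_closure(IS_A)
print(suggest_links("195967001", closure))
```

In a relational database the same table is typically queried with a single join, which avoids recursive traversal at query time.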

Drug Sensitivity (Allergy) Checks


The sensitivities module alerts the clinician when a proposed drug is either stated in the patient's
record to have caused a previous adverse reaction, or is similar to a drug that has caused an adverse reaction and
is thus likely to elicit a similar adverse response. For example, a patient allergic to penicillin is likely to react to most
other drugs containing a β-lactam ring in their molecular structures.
In a similar way to how FDB links SNOMED CT conditions to its own internal ontology, SNOMED CT concepts which
suggest allergy or previous adverse reactions to a medication are also linked to an internal FDB ontology for
representing medication ingredients. This ontology is designed specifically to support allergic and adverse reaction
cross-reactivity.
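
The two alert conditions (a direct match and a cross-reactive match) can be sketched as below. This is an illustration under stated assumptions, not FDB's ontology: the ingredient table, the class label and the function name are hypothetical, and ingredient names stand in for coded concepts.

```python
# Hypothetical ingredient -> cross-reactivity class table.
INGREDIENT_CLASS = {
    "benzylpenicillin": "beta-lactam",
    "amoxicillin": "beta-lactam",
    "cefalexin": "beta-lactam",
    "paracetamol": None,  # no cross-reactivity class in this toy table
}


def sensitivity_alert(proposed, recorded_reactions):
    """Return 'direct', 'cross-reactive' or None for a proposed ingredient."""
    if proposed in recorded_reactions:
        return "direct"
    proposed_class = INGREDIENT_CLASS.get(proposed)
    if proposed_class is not None and any(
        INGREDIENT_CLASS.get(ingredient) == proposed_class
        for ingredient in recorded_reactions
    ):
        return "cross-reactive"
    return None


# A recorded reaction to benzylpenicillin flags another beta-lactam.
print(sensitivity_alert("amoxicillin", {"benzylpenicillin"}))
```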

1 [Link]

12.2.12 Intelligent Medical Objects


Intelligent Medical Objects (IMO) develops, manages, and licenses medical terminology and healthcare IT
software applications that allow clinicians to capture their clinical intent at the point-of-care. IMO's
comprehensive medical terminology of physician-friendly terms is mapped to the preferred billing and
reference codes enabling clinicians to use the terms they are familiar with while ensuring improved coding
accuracy. 1
For more information please visit [Link]

IMO produces a medical terminology service for healthcare solutions, allowing over 2,500 hospitals and 350,000
clinicians to focus on patient care. IMO bridges the information gap between clinicians, coders, and patients in the
US and internationally. IMO enables and supports the accurate capture and preservation of clinical intent for clinical
documentation, decision support, reimbursement, reporting, data analysis, research, and health education.
IMO's clinical interface terminology is designed pragmatically to capture clinical intent at point of care. However it
is also intended to enable and simplify the adoption of standard ontologies by vendor partners.
By choice, the editorial process requires all IMO interface terms to have one or more qualified maps to SNOMED CT.
Clients can then use SNOMED CT to drive reporting, analytics, clinical decision support, and research.
The following examples demonstrate how IMO uses SNOMED CT for analytical purposes:
1. Helping patients find health professionals who have expertise or interest in specific areas of medicine. These
areas include disorders, procedures, devices, medications, patient demographics, and medical specialties.
These areas of expertise or interest include those that are self-reported by clinicians and those documented
in clinical encounters. The search algorithms use hierarchies in SNOMED CT to retrieve and rank search
results.
2. Helping clinicians use patient diagnoses and procedures documented at varying levels of granularity to find
appropriate patient education materials using SNOMED CT is-a hierarchies.
3. Grouping together related clinical concepts in patient records for creating focused patient reports and
driving clinical workflows.
4. Forming subsumption queries for cohort selection within patient data repositories and document libraries.
5. Using natural language processing (NLP) to extract information coded in SNOMED CT from clinical
narratives.
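
The first example in the list above mentions using hierarchies to retrieve and rank results. The source does not say how IMO ranks, so the sketch below assumes one plausible scheme: candidates are retrieved if their concept is subsumed by the query concept, and ranked by the number of |is a| steps between them. The hierarchy, clinician names and function names are hypothetical, and term labels stand in for SNOMED CT identifiers.

```python
from collections import deque

# Toy 'is a' edges (child -> parents); intermediate concepts are illustrative.
IS_A = {
    "dilated cardiomyopathy": ["cardiomyopathy"],
    "cardiomyopathy": ["heart disease"],
    "asthma": ["respiratory disease"],
}

# Hypothetical clinician -> area of interest.
INTERESTS = {
    "dr_a": "heart disease",
    "dr_b": "cardiomyopathy",
    "dr_c": "dilated cardiomyopathy",
    "dr_d": "asthma",
}


def is_a_distance(concept, ancestor, is_a):
    """Shortest number of 'is a' steps from concept up to ancestor, or None."""
    queue, seen = deque([(concept, 0)]), {concept}
    while queue:
        current, steps = queue.popleft()
        if current == ancestor:
            return steps
        for parent in is_a.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, steps + 1))
    return None


def rank(query, candidates, is_a):
    """Keep candidates subsumed by the query concept, closest matches first."""
    scored = []
    for name, concept in candidates.items():
        steps = is_a_distance(concept, query, is_a)
        if steps is not None:
            scored.append((steps, name))
    return [name for steps, name in sorted(scored)]


print(rank("heart disease", INTERESTS, IS_A))
```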

1 [Link]
