“Taxonomy Tools and
Tool Evaluation”
SLA Taxonomy Division
Agenda and Presenters
Overview of Taxonomy Tools
• Presented by Heather Hedden
Senior Vocabulary Editor, Gale, A Cengage Company
Author of The Accidental Taxonomist (Information Today, 2016)
Evaluating Taxonomy Tools
• Presented by Marti Heyman
Executive Director, Metadata Strategy and Operations, OCLC
Overview of Taxonomy Tools
Heather Hedden
Senior Vocabulary Editor, Gale, A Cengage Company
What are “taxonomy tools”?
• No authoritative industry list of taxonomy software
• “Taxonomy software” can mean different things
• Thesaurus/ontology management software
• Auto-categorization/auto-classification software
• Mindmapping or concept modeling tools
• Other software with a key taxonomy component
• Web lists of taxonomy tools are miscellaneous or out of date:
• www.taxobank.org/content/thesauri-and-vocabulary-control-thesaurus-software
• www.taxotips.com/resources/tools
• www.searchtools.com/info/classifiers-tools.html
• Market is too small and specialized to be followed by industry analysts
Background
• Taxonomies, thesauri, or ontologies: the distinctions are blurred.
Most software enables the creation of a combination:
taxonomy/thesaurus, thesaurus/ontology, taxonomy/thesaurus/ontology
• Excel suffices for flat term lists (such as for facets), and small hierarchical
taxonomies, but not for the complexities of large taxonomies, thesauri,
ontologies, or multilingual vocabularies.
• Software tools enforce/support standards, but not all the same standards:
thesaurus or records management (ANSI/NISO or ISO), SKOS/RDF, OWL
• Software tools support integration with auto-classification, content
management systems, SharePoint, etc.
Summary of Controlled Vocabulary Types
Controlled vocabulary types, ordered from less to more control and complexity:
• Term List: ambiguity control
• Synonym Ring: ambiguity control, synonym control
• Authority File: ambiguity control, synonym control
• Taxonomy: ambiguity control, (synonym control), hierarchical relationships
• Thesaurus: ambiguity control, synonym control, hierarchical relationships,
associative relationships
• Ontology: ambiguity control, (synonym control), semantic relationships,
classes, (Linked Data)
Types of software for vocabulary management
• Spreadsheet software (Excel)
• Dedicated thesaurus/ontology management software
• Taxonomy creation & editing module of content management,
document management, digital asset management, or collaboration
software (such as SharePoint)
• Taxonomy creation & editing module of auto-classification (automated
indexing) software
• Vertical market software for creating classification structures
• Proprietary programs developed in-house in organizations with large or
core taxonomy management needs
Types of software used by taxonomists
• An internally developed taxonomy/thesaurus management system: 25.5% (36)
• Commercial dedicated thesaurus/taxonomy/ontology management
software: 22.7% (32)
• Commercial software, of which taxonomy management is a feature,
module or component: 12.1% (17)
• Open-source ontology/taxonomy management software: 9.2% (13)
• Other commercial software that is not intended for taxonomies
(such as a word processor, spreadsheet, or database management): 30.5% (43)
Results of author survey for The Accidental Taxonomist, 2nd ed., conducted May 2015
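The survey percentages can be re-derived from the response counts. The sketch below assumes the second number in each row is the count of respondents who chose that option (the five counts sum to 141); the short labels are paraphrases used only for illustration.

```python
# Response counts from the 2015 survey slide; the short labels are paraphrases.
counts = {
    "Internally developed system": 36,
    "Commercial dedicated software": 32,
    "Commercial software with a taxonomy module": 17,
    "Open-source software": 13,
    "Other commercial software (not for taxonomies)": 43,
}

total = sum(counts.values())

# Share of each option, rounded to one decimal place as on the slide.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}% ({counts[label]} of {total})")
```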
Thesaurus/ontology software basic features
• Maintain terms/concepts and their relationships
– As reciprocals
– When renaming, merging, subsuming, or deleting terms
• Support controlled variants/NPTs/synonyms/alternative labels
• Support notes/definitions and other attributes for terms
• Manage categories or classes for terms/concepts
• Manage candidate and approved terms; term creation and update dates
• Generate reports in various display formats
• Export data in format for importing into a content indexing/search/retrieval
system: CSV, Excel, HTML, XML (ZThes, RDF, SKOS, and OWL)
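To make the first feature concrete, here is a minimal Python sketch of how a tool might keep reciprocal relationships consistent when terms are added, renamed, or deleted. The `Thesaurus` class, its method names, and the BT/NT/RT codes as dictionary keys are assumptions for illustration; it is not modeled on any product named in this presentation.

```python
from collections import defaultdict

# Each relationship type maps to its reciprocal type.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self):
        # term -> relationship type -> set of target terms
        self.rels = defaultdict(lambda: defaultdict(set))

    def relate(self, source, rel_type, target):
        """Add a relationship and its reciprocal in one step."""
        self.rels[source][rel_type].add(target)
        self.rels[target][RECIPROCAL[rel_type]].add(source)

    def rename(self, old, new):
        """Rename a term and repair every relationship that points to it."""
        self.rels[new] = self.rels.pop(old)
        for rel_type, targets in self.rels[new].items():
            for target in targets:
                back = self.rels[target][RECIPROCAL[rel_type]]
                back.discard(old)
                back.add(new)

    def delete(self, term):
        """Delete a term and remove the reciprocals held by other terms."""
        for rel_type, targets in self.rels.pop(term).items():
            for target in targets:
                self.rels[target][RECIPROCAL[rel_type]].discard(term)


thesaurus = Thesaurus()
thesaurus.relate("Cats", "BT", "Mammals")    # Cats has broader term Mammals
thesaurus.rename("Cats", "Felines")          # Mammals now lists Felines as NT
print(dict(thesaurus.rels["Mammals"]))
```

Merging and subsuming terms would follow the same pattern: move the relationships to the surviving term, then repair the reciprocals.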
Software feature of enforcing thesaurus standards
Thesaurus standards: ANSI/NISO Z39.19-2005 (R2010) or ISO 25964
• Preferred terms (preferred labels) must be unique; no duplicates
• A nonpreferred term (alternative label) can point to only one preferred
term (concept)
• A pair of terms (concepts) can be either hierarchically (broader/narrower)
or associatively (related) linked to each other, but not both.
• Hierarchical relationship logic extends:
• If A is narrower than B, and
• B is narrower than C,
• then C cannot be narrower than A.
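The four rules above can be expressed as simple checks. The following Python sketch is one hypothetical way to do so; the function name, the input shapes, and the choice to compare preferred terms case-insensitively are assumptions, not requirements quoted from ANSI/NISO Z39.19 or ISO 25964.

```python
def find_violations(preferred, use_refs, broader, related):
    """Return a list of rule violations found in a vocabulary.

    preferred: list of preferred term strings
    use_refs:  list of (nonpreferred, preferred) pairs
    broader:   list of (narrower, broader) pairs
    related:   list of (term_a, term_b) pairs
    """
    problems = []

    # Rule 1: preferred terms must be unique (case-insensitive: an assumption).
    seen = set()
    for term in preferred:
        key = term.casefold()
        if key in seen:
            problems.append(f"duplicate preferred term: {term}")
        seen.add(key)

    # Rule 2: a nonpreferred term may point to only one preferred term.
    targets = {}
    for nonpref, pref in use_refs:
        if targets.setdefault(nonpref, pref) != pref:
            problems.append(f"'{nonpref}' points to more than one preferred term")

    # Rule 3: a pair may be linked hierarchically or associatively, not both.
    hier_pairs = {frozenset(pair) for pair in broader}
    for a, b in related:
        if frozenset((a, b)) in hier_pairs:
            problems.append(f"'{a}' and '{b}' are both hierarchical and related")

    # Rule 4: following broader terms upward must never lead back to the start.
    parents = {}
    for narrower, wider in broader:
        parents.setdefault(narrower, set()).add(wider)

    def reaches(start, goal, visited):
        for parent in parents.get(start, ()):
            if parent == goal:
                return True
            if parent not in visited:
                visited.add(parent)
                if reaches(parent, goal, visited):
                    return True
        return False

    for term in parents:
        if reaches(term, term, set()):
            problems.append(f"hierarchy cycle through '{term}'")

    return problems


# A is narrower than B, B is narrower than C, and C is (wrongly) narrower than A.
print(find_violations(["A", "B", "C"], [], [("A", "B"), ("B", "C"), ("C", "A")], []))
```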
Thesaurus software points of comparison
• interface design (default view) and ease of use
• multiple taxonomy display options
• term searching
• spell-checking
• speed (limited mouse clicks) for repeated term and relationship additions
• single-step new term & relationship creation
• single-step branch (term and narrower terms) moving
• drag & drop relationship adding
• user-defined (customizable) relationships
• user-defined term notes and term attributes
• bilingual or multilingual taxonomy support
• importing and exporting formats
• connectors to content management systems, SharePoint, and enterprise search systems
• auto-categorization add-on module
• support for linked data
Commercial dedicated thesaurus/ontology
management software
Includes:
• MultiTes Pro
• Synaptica KMS, Synaptica Graphite
• Data Harmony Thesaurus Master
• Semaphore
• Mondeca Intelligent Topic Manager (ITM)
• PoolParty
• TopBraid Enterprise Data Governance (EDG) Vocabulary Management
MultiTes Pro
Multisystems (Miami, FL)
www.multites.com
• Single-product independent vendor since 1983.
• Windows single user $295 (multi-user and enterprise packages also available)
• Web/cloud-based option: $4950/year per thesaurus for 20 accounts
• Thesaurus model ANSI/NISO Z39.19 based
• Supports user-defined relationships, classes, and notes; multilingual thesauri
• Imports delimited text. Outputs text, HTML, XML, SKOS/RDF, and CSV
• Add-on products: web development kit, enterprise development kit
• Free 1-month downloadable trial and online video tutorials
[Screenshot: MultiTes Pro]
Synaptica
Synaptica Software LLC (Franktown, CO)
www.synaptica.com
Synaptica KMS (Knowledge Management System) – thesaurus model (since 1995)
Synaptica Graphite – SKOS ontology model on a linked data graph database (since 2018)
• Web browser-access, inside the firewall or hosted.
• Supports user-defined relationships, classes, and notes; multilingual vocabularies
• Features drag-and-drop editing, automatic term mapping
• Imports CSV, Excel, and XML (Zthes, RDF SKOS, RDF OWL); exports the same formats plus HTML and Word.
• Related add-on products: Indexing Management System (IMS), Text Analytics Platform
(TAP), Image Annotation & Indexing, Linked Data Manager, SharePoint connector
• Online video tutorial for editing terms in Synaptica KMS
[Screenshots: Synaptica KMS; Synaptica KMS visualization; Synaptica Graphite]
Data Harmony Thesaurus Master
Access Innovations (Albuquerque, NM)
www.dataharmony.com
• Commercial software (originally used for indexing in-house) offered since 1998
• Multi-platform, Java-based (used on Windows, Mac, Solaris, Linux).
Client software allows remote access. A web-hosted version is also available.
• Thesaurus model ANSI/NISO Z39.19 based
• Available separately or combined with M.A.I. (Machine Aided Indexer) as MAIstro.
• Related products: XIS (XML Intranet System), Inline Tagging, Search Harmony
• API connectors for SharePoint, MarkLogic, OpenText, Oracle, SAP
• Access Innovations also offers taxonomy creation services.
[Screenshots: Data Harmony Thesaurus Master; Data Harmony Thesaurus Master taxonomy visualization]
Semaphore
Smartlogic Semaphore Ltd. (London, UK)
www.smartlogic.com
• Introduced in 2006.
• Supports SKOS, RDF ontology standard, and ISO 25964 thesaurus
standard
• Imports/exports CSV, XML (RDF SKOS, Turtle, N-Triples), SQL databases, and
MultiTes files
• Related products: Classification Server for automated classification;
Ontology Service for a navigation system
• Download free 30-day trial: https://trial.smartlogic.com/S4Trials/
(Sign in with LinkedIn.)
[Screenshots: Smartlogic Semaphore; Semaphore ontology visualization]
Mondeca Intelligent Topic Manager (ITM)
Mondeca S.A. (Paris, France)
https://mondeca.com/itm
• Introduced in 2008
• Supports SKOS vocabularies and OWL-standard ontologies
• Linked data feature
• SharePoint term store connector
• Visualization of hierarchies and relationships
• Exports to Excel, XML, RDF, SKOS, and Topic Maps
PoolParty
Semantic Web Company (Vienna, Austria)
www.poolparty.biz
• Introduced in 2009.
• Built on W3C Semantic Web standards: SKOS, RDF, OWL, SPARQL
• Installed server or web-hosted options
• Can link domain-specific thesauri to Linked Open Data
• Import/export formats: Excel, N3, N-Quads, Trix, Binary-RDF, MultiTes,
RDF/XML, Turtle, N-Triples, RDF/JSON, Trig, JSON-LD, and Zthes
• Add-on modules: Concept Tagging, Linked Data Management, Semantic Search,
Text Mining & Entity Extraction, Classification, Data Analytics & Visualization
• Connectors for SharePoint, Drupal, WordPress, Confluence, Alfresco, FontoXML
• Download free 30-day trial: http://www.poolparty.biz/test-demo/thesaurus-server-entity-extractor
[Screenshots: PoolParty; PoolParty concept visualization]
TopBraid Enterprise Data Governance (EDG)
Vocabulary Management
TopQuadrant Inc. (Raleigh, NC)
www.topquadrant.com/products/topbraid-edg-vocabulary-management/
• Originally released as Enterprise Vocabulary Net (2010); since 2016 offered as
a stand-alone vocabulary management tool (replacing Enterprise Vocabulary
Net) or as a module of TopBraid Enterprise Data Governance
• Browser-based access to a Linux server installation
• Based on a graph database
• Taxonomies in SKOS or SKOS-XL; ontologies based on SHACL or OWL
• Import/export formats Excel/CSV, XML, RDF/OWL
• Automatic creation of crosswalks between two vocabularies
• Video demos at: www.topquadrant.com/knowledge-assets/videos
[Screenshots: TopBraid EDG Vocabulary Management; TopBraid EDG Vocabulary Management visualization]
Free and Open Source Software
• Protégé – Developed by the Center for Biomedical Informatics Research at
Stanford University School of Medicine. https://protege.stanford.edu
– Dedicated ontology software; not so suitable for taxonomies/thesauri
• VocBench – Developed by the Artificial Intelligence Research group at University
of Tor Vergata, Rome http://vocbench.uniroma2.it
– For OWL ontologies, SKOS(/XL) thesauri
– Introduced in 2010 for UN Food & Agriculture Organization’s AGROVOC thesaurus.
– Now funded by the European Commission's ISA2 program. Current version 3.
– Can be installed on a web server or on a single desktop
• TemaTres – Originally developed by the Library and Information Science
program of the University of Buenos Aires https://www.vocabularyserver.com
– Available On-Premise on a web server; Software as a Service (SaaS), or On-Demand
– Version 3.0, November 2017
– Uses SKOS model and supports ISO thesaurus standards
[Screenshot: VocBench]
Thesaurus/Ontology Management Software - Other
• a.k.a. by Synercon – information management software with
taxonomy/thesaurus/ontology component; Australian company
• Coreon – taxonomy/thesaurus + terminology management; German
company
• Lucidea’s STAR/Thesaurus – part of the CuadraSTAR suite (2008 acquisition),
software marketed to libraries, archives, and museums
• Soutron Global – library management system with thesaurus component
• Unilexicon – web hosted open source, but all vocabularies are open, too.
• Wordmap – offered by a consulting company (Earley Information
Science), not their main focus
Software for indexing/tagging content with
controlled vocabulary terms
Different methods:
• Manual indexing
• Automated indexing/auto-categorization
• Machine-aided indexing
Different types:
• Dedicated software, if automated (but not for manual)
• Add-on module to the taxonomy management software (usually for
automated)
• Component of a content management system, if manual (not for
automated)
• Custom-built
Auto-Classification Software
Example dedicated tools for auto-classification/automated indexing
Taxonomy management components, if any, are limited.
(Create the taxonomy in an external management tool.)
• Attivio
• BA Insight
• Concept Searching
• Coveo
• Expert System
• Lexalytics
• Lucidworks
• OpenText
• SAS Text Miner
• Sinequa
Evaluating Taxonomy Tools
Marti Heyman
Executive Director, Metadata Strategy and Operations
OCLC
Setting the Context
• My assumptions:
• You have an enterprise-wide taxonomy program underway
• You have sufficient corporate support to fund at least yourself
• You have sufficient business support that you have subject matter
experts assisting you
• You’ve reached the point where you sense a taxonomy
management package is needed
Setting the Context
• My case for the Spec & Select effort:
• You’ve invested significant time and money in developing the
enterprise-wide controlled vocabularies you have
• You’ve invested significant time and effort gaining the confidence
of management in your judgment
• You’ve invested significant time and effort establishing your
credibility
So, why risk all that for the sake of saving a small amount of time?
Possible Outcomes
• Over kill: Bought more than you needed
• Under kill: Didn’t buy as much as you needed
• Road kill: Conflicts with IT infrastructure and other systems
• Living the Good Life: Bought just what you needed for now with a
little room to grow!
The Process Blueprint
Phase 0: Define your team
Phase 1: Business Requirements
Phase 2: Functional Requirements
Phase 3: Features Scorecard
Phase 4: Vendor short list
Phase 5: Live demonstrations
Phase 6: Analyze scores
Phase 7: Vendor due diligence
Phase 8: Purchase
Phase 9: Implement
Sample business requirements
• Make the enterprise vocabulary accessible and available to a
geographically dispersed set of taxonomy managers
• Make the enterprise vocabulary accessible to all applications
that depend on it
• Ensure scalability of the enterprise vocabulary
• Low operating costs
• Unix environment compliant
Identifying functional requirements
• This can be trickier
• Be sure you have the use case(s) and technical constraints
clearly in mind
• Challenge what is a true “requirement” (i.e. necessity)
Step 1: Map the Process
Identified Business Need → Create Taxonomy → Implement and Manage Taxonomy → Enterprise Taxonomy
Step 2: Drill Down
1.1 Generate list of candidate terms
1.2 Define relationships between terms and create scope notes.
1.3 Enter attributes for all terms
1.4 Review terms for accuracy, completeness and relevancy
Review questions:
• Are there typographical errors?
• Are there terms missing from the list?
• Are there irrelevant terms?
• Are there missing relationships?
• Is the scope note comprehensive?
• Have all term attributes been entered?
YES: Make additions, deletions and modifications as needed.
NO: Change Term state from Candidate to Approved
Capture the full context
• Design Considerations
• Supports multiple languages including foreign characters.
• Compliant with ANSI/NISO Z39.19.
• Business Rules
• A single term may exist in multiple categories.
• Duplicate terms cannot be entered.
• New terms exist in the “Candidate” state. All terms must be
reviewed by a second person before being moved to the
“Approved” state.
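The business rules above translate directly into checks a tool would need to enforce. The Python sketch below is a hypothetical illustration: the class and exception names are invented, "second person" is assumed to mean anyone other than the term's creator, and duplicates are assumed to be detected case-insensitively.

```python
class DuplicateTermError(ValueError):
    """Raised when a term that already exists is entered again."""


class ReviewError(ValueError):
    """Raised when a term's creator tries to approve their own term."""


class Vocabulary:
    def __init__(self):
        self.terms = {}  # case-folded label -> term record

    def add_term(self, label, category, created_by):
        """Enter a new term; it always starts in the Candidate state."""
        key = label.casefold()
        if key in self.terms:
            raise DuplicateTermError(f"term already exists: {label}")
        self.terms[key] = {
            "label": label,
            "categories": {category},
            "state": "Candidate",
            "created_by": created_by,
        }

    def add_category(self, label, category):
        """A single term may exist in multiple categories."""
        self.terms[label.casefold()]["categories"].add(category)

    def approve(self, label, reviewer):
        """Move a term to Approved after review by a second person."""
        record = self.terms[label.casefold()]
        if reviewer == record["created_by"]:
            raise ReviewError("a term must be reviewed by a second person")
        record["state"] = "Approved"
        record["approved_by"] = reviewer


vocab = Vocabulary()
vocab.add_term("Invoices", "Finance", created_by="alice")
vocab.add_category("Invoices", "Records")
vocab.approve("Invoices", reviewer="bob")
print(vocab.terms["invoices"]["state"])
```

Writing the rules down this precisely during requirements gathering makes it easier to confirm in a vendor demonstration that a product actually enforces them.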
Sample functional requirements
1.0 General
1.1 Supports multiple categories.
1.2 Supports multiple languages, including foreign characters
1.3 Compliant with ANSI/NISO Z39.19
2.0 Term Creation/Editing
2.1 General editing - multi term select using ctrl and shift
2.2 General editing - right mouse menu for undo, cut, copy, paste, select all
2.4 General editing - inbuilt spell check software for global spell-check
(automatically runs on “save record” action)
2.5 Prevents duplicate term entry (any level). System should warn that the term
already exists.
4.0 Relationship Creation/Management
4.1 Automatic reciprocal relationship management.
4.2 Ability to create and delete relationship types without developer
assistance. BT/NT, RT, Use/Use for……
4.3 Poly-hierarchies (i.e. ability for a single term to be in more than one
category).
Sample Score Card
Sample Completed Score Card
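The score card slides are images, so their actual layout is not reproduced here. As a hypothetical illustration of how a features scorecard (Phase 3) could feed score analysis (Phase 6), the Python sketch below weights each requirement and sums the weighted ratings per vendor. The weights, the 0 to 3 rating scale, the vendor names, and all ratings are invented for the example.

```python
# Hypothetical importance weights for a few of the sample requirements.
WEIGHTS = {
    "1.3 Compliant with ANSI/NISO Z39.19": 3,
    "2.5 Prevents duplicate term entry": 3,
    "4.1 Automatic reciprocal relationship management": 2,
    "4.3 Poly-hierarchies": 1,
}

# Hypothetical ratings on a 0 (not supported) to 3 (fully supported) scale.
RATINGS = {
    "Vendor A": {
        "1.3 Compliant with ANSI/NISO Z39.19": 3,
        "2.5 Prevents duplicate term entry": 2,
        "4.1 Automatic reciprocal relationship management": 3,
        "4.3 Poly-hierarchies": 1,
    },
    "Vendor B": {
        "1.3 Compliant with ANSI/NISO Z39.19": 2,
        "2.5 Prevents duplicate term entry": 3,
        "4.1 Automatic reciprocal relationship management": 1,
        "4.3 Poly-hierarchies": 3,
    },
}


def weighted_score(ratings, weights):
    """Sum of weight times rating; an unrated requirement counts as 0."""
    return sum(weight * ratings.get(req, 0) for req, weight in weights.items())


def rank(all_ratings, weights):
    """Return (vendor, score) pairs sorted from highest to lowest score."""
    scores = {vendor: weighted_score(r, weights) for vendor, r in all_ratings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for vendor, score in rank(RATINGS, WEIGHTS):
    print(f"{vendor}: {score}")
```

Keeping the weights separate from the ratings lets the team agree on priorities before any vendor is scored, which supports the goal of having data to explain the decision.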
Key Points
• Requirements: to purchase the software that meets your
needs and expectations, you need a strong, clear,
unambiguous definition!
• Process Blueprint: follow a tried and true path to success and
ensure you have the data to explain your decisions
• Ensure stewardship of corporate funds
• Avoid being road kill: spend the time to do the “front-end
analysis” work