User's Guide: About InChI
User's Guide
Last revision date: December 15, 2020
I. OVERVIEW
About InChI
Standard and non-standard InChI
About InChIKey
II. ABOUT InChI PROGRAMS
III. RUNNING InChI PROGRAMS
Command Line Executable inchi-1
InChI Software Library (libinchi)
Graphical Interface Program (winchi-1)
InChI Software Options
IV. CHEMICAL STRUCTURE INPUT
V. Further reading and contacts
This introductory User Guide is addressed to the novice user of InChI whose primary interest is to
learn how to produce InChI/InChIKey identifiers of chemical compounds with the InChI executables
included in the InChI Software distribution. (Alternatively, one may simply use nearly any
chemical drawing program; at the time of writing, they typically have a built-in InChI
generation ability.)
I. OVERVIEW
About InChI
The IUPAC International Chemical Identifier (InChI) provides unique labels for well-defined
chemical substances. These labels are generated by converting an input chemical structure, in the
form of a ‘connection table’, to a unique and predictable series of ASCII characters. They offer a
means for representing chemical compounds in a manner that does not depend on how they
were drawn. Note that they are re-expressions of chemical structures; they are not registry or
registration numbers and do not require access to a database. They were developed primarily as
a means of ‘naming’ a compound in digital media although they are expressed as simple text that
may be manually interpreted. This document describes the operation and output of the present
version of the program that generates this Identifier.
The Identifier is designed to process single, well-defined chemical compounds (which may be
composed of multiple components).
InChI is a project of the International Union of Pure and Applied Chemistry (IUPAC), described at:
http://[Link]/inchi/
The IUPAC body which takes care of the current and future shape of InChI is the “IUPAC InChI
Subcommittee” (IUPAC Division VIII InChI Subcommittee), which reports to IUPAC Division VIII and
also to the IUPAC Committee on Publications and Cheminformatics Data Standards. There are
also InChI Subcommittee working groups made up of additional chemists who are developing
rules for extending the capabilities of InChI. See: [Link]
Historically, the primary development of the InChI algorithm and software took place at NIST (the US
National Institute of Standards and Technology) under the auspices of IUPAC. Since 2009,
the responsibility for InChI technical development and promotion has been in the hands of the
InChI Trust ([Link]), a not-for-profit organization which works in close
contact with IUPAC (and of which IUPAC is a member).
Technical details are given in a separate document, the InChI Technical Manual. The basic
algorithms were taken from the literature, with selection, testing and implementation done
primarily at NIST, and with modifications and additions by IUPAC and the InChI Trust.
Over the several years of its development, many individuals contributed to
InChI at meetings and through correspondence. The chemical rules employed are intended to
represent a consensus view of the concept of chemical identity. The computer program described
in this document applies these algorithms to input structures and generates both the Identifier
and an annotated depiction of the structure.
Derivation of the InChI from an input chemical structure proceeds through three steps:
1) normalization – all input information not needed for structure identification is discarded and
structure information is divided into ‘layers’;
2) canonicalization – each atom is given a label that depends only on its position in the structure;
3) serialization – a string of characters, the Identifier, is generated from the canonical labels.
All ‘chemical’ rules are applied in the first step.
The current version of InChI Identifier is 1; the current stable version of the InChI software is 1.06
which replaces the previous version 1.05.
· Standard InChI organometallic representation does not include bonds to metal for the time
being.
Standard InChI v.1 was introduced in the v. 1.02-standard release of the InChI Software in 2009 (this
software version could generate only standard InChIs).
The present release of InChI Software, v. 1.06, has merged functionality. It allows one to produce
both standard and non-standard InChI strings, as well as their hashed representation (InChIKey).
By default, InChI Software v. 1.06 produces standard InChI (for brevity, stdInChI below). In
particular, the standard identifier is generated when the software is used without any specifically
added options. If some options are specified, and at least one of them qualifies as related to non-
standard InChI (see section ‘InChI Software Options’ below), the program produces non-
stdInChI/InChIKey.
The standard InChI is designated by the prefix “InChI=1S/…” (that is, the letter ‘S’ immediately
follows the Identifier version number, ‘1’; Identifier version numbers should always be whole
numbers).
Non-standard InChI is designated by the prefix “InChI=1/…” (that is, the letter ‘S’ is omitted).
InChIs obtained with the experimental features of the Software (support of polymers; support of
“large” molecules) are designated by the prefix “InChI=1B/…” (‘B’ for beta).
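The prefix rules above can be checked mechanically. The following is a minimal sketch, not part of the InChI Software; the function name classify_inchi is hypothetical, and only the three prefixes described in this section are recognized.

```python
def classify_inchi(inchi: str) -> str:
    """Classify an InChI string by the flag that follows the version number '1'.

    Hypothetical helper for illustration; it inspects only the prefix and
    does not validate the rest of the string.
    """
    prefix = "InChI=1"
    if not inchi.startswith(prefix):
        raise ValueError("not a version 1 InChI string")
    rest = inchi[len(prefix):]
    if rest.startswith("S/"):
        return "standard"       # InChI=1S/...
    if rest.startswith("B/"):
        return "beta"           # InChI=1B/... (experimental features)
    if rest.startswith("/"):
        return "non-standard"   # InChI=1/...
    raise ValueError("unrecognized flag after the version number")


if __name__ == "__main__":
    print(classify_inchi("InChI=1S/CH4/h1H4"))
```

Checking only the prefix keeps the sketch independent of any layer syntax, which is all that the standard/non-standard distinction requires.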
About InChIKey
The InChIKey is a character signature based on a hash code of the InChI string. A hash code is a
fixed length condensed digital representation of a variable length character string. Providing a
hash derived from an InChI string should be helpful in search applications, including Web
searching and chemical structure database indexing; also, this hash may serve as a checksum for
verifying InChI, for example, after transmission over a network.
The InChIKey consists of two blocks. The first block is always the same for the same molecular
skeleton. All isotopic substitutions, changes in stereoconfiguration, tautomerism and protonation
are reflected in the second block.
A standard InChIKey, which is a key produced from a standard InChI, does not account for
tautomerism and may indicate only absolute stereo (or completely ignore stereo). It also does not
account for the original structure’s bonds to metal.
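Because the first block depends only on the molecular skeleton, comparing first blocks is a quick way to group related structures. The sketch below assumes a layout that this guide does not spell out: the first block is the run of 14 uppercase letters before the first hyphen, and everything after that hyphen is treated as the remainder. The function names and the sample keys are hypothetical placeholders, not real InChIKeys.

```python
import re

# Assumed layout: 14 uppercase letters, a hyphen, then the rest of the key.
_KEY_RE = re.compile(r"^([A-Z]{14})-([A-Z-]+)$")


def split_inchikey(key: str) -> tuple[str, str]:
    """Return (first_block, remainder) of an InChIKey-like string."""
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError("string does not look like an InChIKey")
    return match.group(1), match.group(2)


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True when two keys share the first (skeleton) block."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]


if __name__ == "__main__":
    # Placeholder keys for illustration only.
    print(same_skeleton("ABCDEFGHIJKLMN-OPQRSTUVSA-N",
                        "ABCDEFGHIJKLMN-ZYXWVUTSSA-N"))
```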
The two hash blocks of InChIKey are based on a truncated SHA-256 cryptographic hash function.
Note that due to the very essence of hashing, appearance of collisions (the same InChIKey for
different InChIs/structures) is unavoidable in very large collections. A theoretical – optimistic –
estimate of collision resistance is as follows. The probability of a single first block collision in a
database of 1 billion (10^9) compounds is 1.3%. In other words, a single first block collision is expected in
roughly 1 out of 75 databases of 10^9 compounds each. For 10^8 (100 million) compounds in a
database this probability is 0.014%.
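These figures are consistent with a standard birthday-bound estimate. The sketch below assumes that the first block carries 65 bits of the hash (an assumption not stated in this guide) and that hash values are uniformly distributed; the function name is hypothetical.

```python
import math


def first_block_collision_probability(n_compounds: int, hash_bits: int = 65) -> float:
    """Birthday-bound estimate of at least one collision among n_compounds.

    hash_bits=65 is an assumed size of the first InChIKey block.
    Uses p = 1 - exp(-n(n-1) / (2 * 2**hash_bits)).
    """
    space = 2.0 ** hash_bits
    pairs = n_compounds * (n_compounds - 1) / 2.0
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    for n in (10**9, 10**8):
        p = first_block_collision_probability(n)
        print(f"{n:>12,d} compounds: {100 * p:.4f} %")
```

The estimate is optimistic in the same sense as the text: it assumes an ideal hash and says nothing about collisions in the second block.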
A beta-version of the InChIKey was introduced in software v. 1.02-beta (2007). The standard
InChIKey was introduced in v. 1.02-standard release (2009) as an InChIKey computed from the
standard InChI and intended for the principal purpose of a search-engine-style lookup of chemical
information. The present release of InChI Software v. 1.06 has merged functionality. It allows one
to produce both standard and non-standard InChIKey.
II. ABOUT InChI PROGRAMS
This document is accompanied by version 1.06 of the InChI generator executable. This program
runs under 32/64 bit Microsoft Windows ([Link]) and Linux (inchi-1) operating systems. Also
included is [Link], a convenient Windows graphical-interface application.
InChI may also be generated by using the InChI Software Library (application programming interface, API).
This is described later.
The InChI and InChIKey strings generated by inchi-1 are the reference standard which all other software
generating InChI/InChIKey should match.
The principal use of the program is batch processing of multiple structure files, primarily SDF files.
The Windows version can display chemical structures; the Linux version does not display
structures.
Standard redirection may be used to suppress inchi-1 console output.
under Linux:
More advanced example (use advanced v. 1.06 handling of polymers and pseudo atoms, generate
'empty InChI' if error occurs, do not print auxiliary info, write output, log and problem structures
to supplied file names):
It is possible to process multiple input files in a single run. This mode is activated by the inchi-1
command line option /AMI (Windows) or -AMI (Linux; AMI stands for “Allow Multiple Inputs”). In
this mode, all the file names supplied in the command line are considered as the names of
separate input files. For further convenience, the common file name wildcards (“*” and “?”) are
supported.
For example, issuing a command inchi-1 *.mol /AMI (Windows) or inchi-1 *.mol -AMI (Linux)
will instruct the executable to process all the MOL files in the current directory.
Note that omitting the AMI switch means working in conventional single-input mode, which
may result in undesired treatment of wildcards[1].
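When inchi-1 is driven from a script, the platform-dependent option prefix is easy to get wrong. The following minimal sketch only assembles an argument list; the function name build_inchi_command and the default executable name are assumptions, and the option names must be taken from this guide's option tables.

```python
import sys


def build_inchi_command(inputs, options=(), executable="inchi-1", windows=None):
    """Assemble an inchi-1 argument list with the platform-specific prefix.

    Options are given without prefix (e.g. "AMI"); "/" is prepended on
    Windows and "-" elsewhere. Hypothetical helper for illustration.
    """
    if windows is None:
        windows = sys.platform.startswith("win")
    prefix = "/" if windows else "-"
    return [executable, *inputs, *(prefix + opt for opt in options)]


if __name__ == "__main__":
    # The resulting list could be passed to subprocess.run(); the wildcard is
    # left unexpanded here because AMI mode handles wildcards itself.
    print(build_inchi_command(["*.mol"], ["AMI"]))
```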
In AMI mode, the names of the output, log and problem files cannot be individually specified.
Instead, they are formed, for each of the multiple inputs, by appending the suffixes
“.txt”, “.log” and “.prb” to the input file name. However, to partially mimic the behavior of inchi-1 in
conventional single-input mode, three additional command line options are provided (see section
“Availability of InChI Software options”, Table 4). They allow one to redirect the output to stdout and
the log to stderr, as well as to suppress creation of problem files.
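A script that post-processes AMI results needs to predict these file names. The sketch below assumes that each suffix is appended to the complete input file name, including its original extension; this reading and the function name are assumptions and should be checked against actual inchi-1 output.

```python
def ami_output_names(input_name: str) -> dict:
    """Predict AMI-mode file names for one input file.

    Assumes the suffix is appended to the full input name
    (e.g. "a.mol" gives "a.mol.txt"); hypothetical helper.
    """
    return {
        "output": input_name + ".txt",
        "log": input_name + ".log",
        "problems": input_name + ".prb",
    }


if __name__ == "__main__":
    print(ami_output_names("benzene.mol"))
```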
As indicated by tests, processing multiple MOL files in AMI mode may be several times faster
than running inchi-1 separately for each file (the exact speed-up depends on many details; even so,
the processing time is still significantly longer than that for a single SDF file containing the same data).
Since v. 1.05, InChI Software has provided experimental support of InChI/InChIKey for regular single-
strand polymers, as well as experimental support of large molecules containing up to 32767
atoms.
By default, the executable inchi-1 ignores polymer-specific data (which also ensures compatibility
with the behaviour of previous versions); to allow treatment of polymers, one should explicitly use
the command line option Polymers (-Polymers under Linux or /Polymers under
Windows). Analogously, the switch LargeMolecules is necessary to enable processing of molecules
having more than 1024 atoms.
Note that support of polymers is an experimental feature. To emphasize this, InChI/InChIKey for a
polymer uses the ‘B’ flag character (for “Beta”), instead of ‘S’ or ‘N’ for standard/non-standard
InChI. It is expected that this flag will be replaced by the common standard/non-standard
conventions if and when InChI for polymers is finally adopted.
Note also that the treatment of polymers and the appearance of polymer data in the InChI string
changed significantly in the current version 1.06. However, the compatibility option /Polymers105
instructs InChI Software to handle polymers in the legacy v. 1.05 mode (it is planned that this option
will be eliminated in the future, leaving the explicit pseudo-atom approach as the sole mode).
For the full list of enhancements in v. 1.06, please see the items marked "new in v. 1.06" in Table 4 below.
For more details please refer to the InChI v. 1.06 Release Notes and the InChI Technical Manual
accompanying this distribution.
InChI Software Library (libinchi)
For advanced users who may want to create the Identifier in their own software the InChI
Software Library (InChI API) is provided in a separate package. The package contains ‘C’ source
code for inchi‑[Link], ‘C’ source code for the InChI Library that may be compiled into a Dynamic
Link Library (DLL) [Link] under Windows or Shared Object (SO) [Link] under Linux; also,
there are ‘C’ and Python examples of simple applications that read input Molfile and use the InChI
Library to produce Identifiers.
The InChI Library does not display structures and is not able to read chemical structural data from
the input file. It uses specially formatted input binary data and produces three strings: InChI, the
Auxiliary Information, and, if necessary, an error or warning message. There are also procedures
to calculate InChIKey and other service routines. The source code is accompanied by makefiles
tested with gcc under Windows and Linux.
The InChI Library allows one to generate both standard and non-standard InChIs/InChIKeys. For
example, an API function GetINCHI() produces standard InChI by default and non-standard InChI if
some “InChI creation option” is specified in input parameters. However, for compatibility with the
previous v. 1.02-standard (2009) release, the procedures which deal only with stdInChI – for
example, GetStdINCHI() - are retained.
The InChI API calls are documented in the separate “InChI API Reference” document.
Graphical Interface Program (winchi-1)
To start the program, run the file [Link] that was extracted from the zip file.
Generating an InChI begins with the selection of an input structure file. The simplest way is to
drag the input structure file from Windows Explorer directory list into the InChI window.
Structures also may be copied from certain chemical structure editors (ISIS/Draw with “Copy
Mol/Rxnfile to the Clipboard” option or from ACD/ChemSketch) and pasted into the InChI window
(Select Edit -> Paste from winchi-1 menu). The input structure file pathname may be provided as a
command line option when you start winchi-1. Selection of the input structure file may also be
done by first clicking on the ‘Open’ button (top left corner) and then, in the dialog box that
appears, selecting a structure file using the ‘…’ button on the right of the ‘Input Structure File’ field.
You may select any of the sample .mol or .sdf files for initial testing. In this dialog you may also
enter “Text Header for ID”; this will simply add to the InChI header a structure ID if it is present in
an input SDfile (from other input formats the header and ID are extracted automatically).
The result appearing after choosing the file "[Link]" (contained in the examples sub-
directory of the distribution package: "INCHI-1-TEST\samples\UserGuide\[Link]") is
presented in Figure 1.
Figure 1
The main output window is composed of two sections: the upper section (shown in white in
Figure 1) shows structural information graphically and the lower section (shown in gray in Figure 1)
shows text output.
Upper section
The structure is displayed along with labels generated by InChI algorithms. In cases where an SDF
file is input, the first structure shown is the first entry in the input file.
The example shown in Figure 1 is a single component example. If more than one component
(independent structure) is found in the first structure file (such as benzoic acid, sodium salt shown
in Figure 2), each may be separately examined using the “Choose component” ‘combo box’ on the
upper left of the screen, although they are treated as part of a single compound by InChI (Figures
3 and 4).
Figure 2
Figure 3
Figure 4
Figure 5
The buttons under “Display” permit viewing of the input structure and the preprocessed structure
if it differs from the input structure. The buttons under “Options” are the same as in the
“Options” dialog box. “Mobile H Perception” removes the “fixed-H” part of the identifier. Figure 6
shows the same structure with the option “Mobile H Perception” off.
On the InChI Toolbar the rightmost box displays the number of sets of equivalent components.
When equivalent components are found, they may be highlighted by making a selection in the
box. This provides a quick way to determine if two depictions of the same compound are
considered to be the same by InChI algorithms, although the actual InChI generated will represent
the collection of structures as a single compound.
Stereochemical parities of bonds and atoms are also displayed. A question mark symbol indicates
that stereoisomerism is possible, but the configuration has not been specified. Bonds that have
been found to be variable by alternation or movement of mobile H-atoms or charges are shown
by dotted lines. This information is used only for deciding which bonds may exhibit double bond
(Z/E) isomerism. By design, the Identifier does not explicitly represent bond types.
Lower Section
The InChI along with auxiliary data and explanatory information is shown in the lower section of
the output window. Unlike the graphical display, even if more than one disconnected component
is found, all textual results for a single input structure file are shown together. This reflects the
important point that all components of a submitted structure are considered by InChI to be part
of a single compound. Results for different (disconnected) components of a single substance are
separated by semicolons, except for chemical formulas, which, in keeping with common
conventions, are separated by dots.
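The separator convention can be illustrated by splitting an InChI string into per-component pieces. This is a deliberately simplified sketch: the function name is hypothetical, it ignores component multipliers and layers that repeat the same prefix letter, and the sample string (believed to correspond to benzoic acid, sodium salt) is given for illustration only.

```python
def split_components(inchi: str):
    """Split the formula layer on dots and the other layers on semicolons.

    Simplified illustration: returns (formula_parts, layers), where layers
    maps each layer's prefix letter to its per-component pieces.
    """
    layers = inchi.split("/")
    formula_parts = layers[1].split(".")
    other = {layer[0]: layer[1:].split(";") for layer in layers[2:] if layer}
    return formula_parts, other


if __name__ == "__main__":
    # Illustrative two-component string (assumed example, see lead-in).
    sample = "InChI=1S/C7H6O2.Na/c8-7(9)6-4-2-1-3-5-6;/h1-5H,(H,8,9);/q;+1/p-1"
    formula, layers = split_components(sample)
    print(formula)
    print(layers["q"])
```

An empty piece in a layer means that the corresponding component contributes nothing to that layer.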
Options
Pressing the Options Button opens the InChI Options Dialog Box.
Figure 7
The following options are then available (as seen in Figure 7):
· Mobile H Perception – turning this Off will fix all H-atoms (disallow H-migration); this allows the
generation of a fixed-H section of the Identifier (and makes the resulting InChI non-standard).
· Include Stereo (Absolute, Relative, Racemic, From chiral flag) – include stereo layer and choose
its type or exclude all stereo information from the identifier. If the last option is selected then in
presence of a chiral flag stereochemistry is considered absolute, otherwise relative.
For standard InChI the only allowed choice is absolute stereochemistry or omission of all stereo;
other choices make InChI non-standard.
· Different marks for unknown/undefined stereo – turning this option On will result in the use of
two different signs, ‘u’ and ‘?’, for “unknown” and “undefined” stereo. Briefly: “undefined”
means not given while “unknown” means explicitly marked as unknown, e.g., with “wavy” bonds.
By default, this option is turned Off and the two signs are merged to ‘?’ (that is, “unknown” stereo
is treated as “undefined”).
· Both ends of wedge point to stereocenters – by default, this option is turned Off. This means
that a stereo bond depicted by a wedge affects the stereochemistry of only the atom ‘pointed
to’ by the narrow end of that wedge. However, it may be turned On if the user is completely sure
that a stereobond affects both atoms it connects (that is, for 2D structures complying with the
legacy “perspective” stereochemistry drawing style).
· Include Bonds to Metal - turning On will add a layer that includes specific bonding to metals (in
case of salts the bonds between a metal and an acid cannot be reconnected – as seen in Figures
where that choice is “grayed out” and cannot be ticked or checked).
· Annotation Format (Plain Text; XML, None) – choose appropriate format for explanatory
information.
· Ignore Isotopes in Structure Display – this does not change the identifier, it only affects the
structure appearance and the display of sets of equivalent components.
Note that the above options form a subset of a full options set available in the command-line
executable inchi-1 (see section ‘InChI Software Options’ below).
InChI Software Options
The options are available in the graphical program winchi-1, in the command line executable inchi-1 and
through the InChI API. Not all the options are available for all parts of the software; the maximal set
of options is available for the inchi-1 program.
Options affecting generation of InChI are divided into “structure perception” options and “InChI
creation” options.
The perception options are considered drawing style/edit flags which affect the interpretation of the
input structure and are not stored. It is assumed that the user may deliberately use these
options to account for the specific features of structure collections. Hence, perception options
may be used while generating standard InChI without loss of its “standardness”.
Perception options are listed in the following tables. Presented here are command line switches
available (they should be used with the appropriate prefix - i.e., NEWPSOFF should be entered as
/NEWPSOFF under Windows and -NEWPSOFF under Linux).
Table 1. Structure perception options.
There are several options (Table 2) which modify the interpretation of input stereochemical data.
In principle, they also may be considered “structure perception” options. However, as the
standard InChI, by definition, requires the use of absolute stereo (or no stereo at all), these
“perception” options assume generation of non-standard InChI.
SUCF – Use Chiral Flag in MOL/SD file record: if On – use absolute stereo, Off – relative.
Default: use absolute stereo (or another option if requested by the SRel/SRac/SNon switches).
The creation options affect the InChI algorithm, not structure perception. They modify the
defaults which are specified for standard InChI and significantly affect the final appearance (e.g.,
additional InChI layers may appear). Hence, using any of the creation options qualifies the
resulting identifier as non-standard.
Creation options used for generation of a particular non-standard InChI may be appended to the
created identifier, see below.
The standard InChI is always generated if no InChI creation/stereo modification options are
specified. This means:
· include tautomerism (i.e., turn mobile H perception on, exclude “fixed hydrogen atoms” layer)
except for keto-enol and 1,5-tautomerism;
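The rule that any creation or stereo-modification option makes the result non-standard can be sketched as a simple membership test. The option sets below contain only names that appear in this guide and are certainly incomplete; the function name is hypothetical, and experimental options (which yield the ‘B’ flag) are deliberately left out.

```python
# Options named in this guide that lead to non-standard InChI
# (stereo perception modifiers and InChI creation options); incomplete list.
NON_STANDARD_OPTIONS = {"SRel", "SRac", "SUCF", "SUU", "RecMet", "FixedH"}


def produces_standard_inchi(options) -> bool:
    """True if none of the given options is known to make InChI non-standard.

    Options may be passed with or without the "/" or "-" prefix.
    """
    names = {opt.lstrip("/-") for opt in options}
    return not (names & NON_STANDARD_OPTIONS)


if __name__ == "__main__":
    print(produces_standard_inchi(["-DoNotAddH", "-SNon"]))
    print(produces_standard_inchi(["/FixedH"]))
```

Keeping the non-standard names in one set makes it easy to extend the check once the complete option tables are consulted.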
In software v. 1.03, the command-line option SaveOpt was introduced to append saved
InChI creation options to a non-standard InChI string.
Since v. 1.06, this option is deprecated.
It is still retained for compatibility reasons, but no further development is planned. For
SaveOpt details, please consult the InChI Software User Guide of v. 1.05 (2017), available at [Link]
[Link]/downloads/
The next table summarizes the current, v. 1.06, availability of various options in the various parts
of the InChI Software.
Table 4. Availability of InChI Software options (note entries marked "new in v. 1.06").
Options availability: the first three columns show availability in winchi, inchi-1 and API calls.
winchi | inchi-1 | API calls | Option (without / or – prefix) | Explanation
Input
Output
- | Yes | Yes | AuxNone | Do not produce Auxiliary Information
- | Yes | - | NoLabels | Omit structure number, DataHeader and ID from InChI output
- | Yes | Yes | NoInChI | (new in v. 1.06) Do not print InChI string itself
- | Yes | Yes | SaveOpt | (deprecated since v. 1.06) Save custom InChI creation options
- | Yes | - | Tabbed | Separate structure number, InChI, and AuxInfo with tabs
- | Yes | Yes | NoWarnings | (new in v. 1.06) Suppress all warning messages (default: show)
Always | Yes | - | D | Display the structure
- | Yes | Yes | OutputSDF | Convert InChI created with default auxiliary info to an SDfile
- | Yes | Yes | SdfAtomsDT | Output Hydrogen Isotopes to SDfile as Atoms D and T
- | Yes | - | AMIOutStd | Write output to stdout (in AMI mode only)
- | Yes | - | AMIPrbNone | Suppress creation of problem files (in AMI mode only)
Structure perception
- | Yes | Yes | DoNotAddH | Do not add H according to usual valences
Yes | Yes | Yes | SNon | Ignore stereo information in input structures
Stereo perception modifiers (non-standard InChI)
Customizing InChI creation (non-standard InChI, “InChI=1/…”)
Yes | Yes | Yes | SUU | Always include omitted unknown/undefined stereo
Yes | Yes | Yes | RecMet | Include reconnected metals results
Yes | Yes | Yes | FixedH | Include Fixed H layer
Experimental (“InChI=1B/…”)
Always | Yes | Yes | LargeMolecules | Experimental support of molecules up to 32767 atoms
Yes | Yes | Yes | Polymers | Experimental support of simple polymers
Yes | Yes | Yes | Polymers105 | (new in v. 1.06) Experimental support of simple polymers in older v. 1.05 way
Yes | Yes | Yes | NoFrameShift | (new in v. 1.06) Disable polymer CRU frame shift
- | Yes | Yes | SAtZz | (new in v. 1.06) Enable stereo at atoms connected to Zz pseudo atoms (default: disable/ignore)
Generation
Always | Yes | - **) | XHash1 | Generate hash extension (to 256 bits) for 1st block of InChIKey
Always | Yes | - **) | XHash2 | Generate hash extension (to 256 bits) for 2nd block of InChIKey
- | Yes | - | MergeHash | (new in v. 1.06) On output, combine InChIKey with extra hash(es)
Conversion
- | Yes | - | InChI2Struct | Convert standard InChI string(s) into structure(s)
*) W0 means unlimited time. In InChI Library the default is W0, in inchi-1 the default is 60 seconds
(W60).
**) In InChI Library, generation of InChIKey/hash extensions is performed via a separate API call.
If an SDfile is ‘labeled’, the program can supply these labels in its output. If the tag name is ‘Name’
and the data field is ‘2-methylanthracene’, this information would appear in the SDfile as 3 lines
(the last line is blank):
> <Name>
2-methylanthracene

In this case, if the tag ‘Name’ is entered in the ‘Structure ID Header’ field in the input dialog box, ‘2-
methylanthracene’ will appear in the output text.
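Reading such labels from an SDfile record is straightforward. The sketch below assumes the usual SDfile convention, in which a data item starts with a line beginning with ‘>’ that carries the tag name in angle brackets, followed by the data lines and a terminating blank line. The function name is hypothetical and the code is not taken from the InChI Software.

```python
def read_sd_data_items(record_text: str) -> dict:
    """Collect tag -> value pairs from the data items of one SDfile record.

    Assumes each item is a '>' header line with the tag in angle brackets,
    then one or more data lines, then a blank line.
    """
    items = {}
    lines = record_text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            tag = line[line.index("<") + 1 : line.rindex(">")]
            i += 1
            values = []
            # Data lines run until the blank line that closes the item.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            items[tag] = "\n".join(values)
        i += 1
    return items


if __name__ == "__main__":
    record = "> <Name>\n2-methylanthracene\n\n$$$$\n"
    print(read_sd_data_items(record))
```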
A variety of structure files are provided for testing. Individual Molfiles have extension .MOL,
concatenated Molfiles have extension .SDF.
Since v. 1.05, the ability to read and parse large (up to 32767 atoms) input files in Molfile V3000
format was added to the inchi-1 executable and the API procedure MakeINCHIFromMolfileText().
This is necessary for treating large molecules (previous versions supported only V2000 format
limited to not more than 1000 atoms).
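Before submitting a large structure, it can be useful to check which Molfile format a file uses. The sketch below relies on the assumption (from the general Molfile layout, not from this guide) that the fourth line of a Molfile is the counts line and that it carries a "V2000" or "V3000" tag; the function name is hypothetical.

```python
def molfile_version(molfile_text: str) -> str:
    """Return "V2000", "V3000" or "unknown" for a Molfile given as text.

    Assumes a three-line header followed by the counts line, which
    carries the version tag.
    """
    lines = molfile_text.splitlines()
    if len(lines) < 4:
        raise ValueError("a Molfile needs a three-line header and a counts line")
    counts_line = lines[3]
    if "V3000" in counts_line:
        return "V3000"
    if "V2000" in counts_line:
        return "V2000"
    return "unknown"


if __name__ == "__main__":
    # Minimal header plus counts line, for illustration only.
    demo = "demo\n  sketch\n\n  0  0  0     0  0            999 V3000\n"
    print(molfile_version(demo))
```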
In addition, provisional support for extended features of Molfile V3000 was also added, both to
inchi-1 and the InChI Software Library, API. This means that extended data (on haptic
coordination bonds and stereo collections) are read and parsed; however, they are not used
currently (as this requires significant modification of the Identifier itself, not just the Software).
InChI Software has experimental support of regular single-strand polymers. Both structure-
based and source-based representation and encoding of polymers are supported.
The inchi-1 executable supports reading input Molfiles containing polymer description lines. This
support is also built into several API procedures (see the "API Reference" document of this
distribution).
Test Files
A number of Molfiles (.mol) and SDfiles (.sdf) are included with this distribution for illustrative
purposes. In particular, file samples_UserGuide.zip contains MOL files for the examples used in
this document.
V. Further reading and contacts
In addition to this introductory User Guide, a number of materials concerning InChI are currently
available.
In particular, this distribution contains separate documents (PDF files) with the InChI Technical
Manual (InChI_TechMan.pdf) and the InChI API Reference (InChI_API_Reference.pdf).
For a briefer and less technical description, see: Heller, S., McNaught, A., Pletnev, I., Stein, S., and
Tchekhovskoi, D. InChI, the IUPAC international chemical identifier. Journal of Cheminformatics 7 (2015), 23.
DOI: 10.1186/s13321-015-0068-4
For a much briefer description, as well as for background and history, see: Heller, S., McNaught, A.,
Stein, S., Tchekhovskoi, D., and Pletnev, I. InChI - the worldwide chemical structure identifier standard.
Journal of Cheminformatics 5 (2013), 7. DOI: 10.1186/1758-2946-5-7
Contacts:
Richard Kidd,
InChI Trust,
c/o Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, UK CB2 1EZ
richard@[Link]