
User's Guide: About InChI

This document provides an overview and user guide for the IUPAC International Chemical Identifier (InChI) software. It describes how InChI generates unique identifiers for chemical structures by normalizing, canonicalizing, and serializing the structure representation. The software can generate both standard and non-standard InChIs, as well as InChIKeys which are hashed identifiers. The guide explains the differences between standard and non-standard InChIs and how they are designated. It also provides background on the development of InChI and describes the included executable programs for generating InChIs from chemical structure files.

Uploaded by

Maneet Goyal
Copyright
© © All Rights Reserved

IUPAC International Chemical Identifier (InChI)

InChI version 1, Software version 1.06

User's Guide
Last revision date: December 15, 2020

User's Guide
I. OVERVIEW
About InChI
Standard and non-standard InChI
About InChIKey
II. ABOUT InChI PROGRAMS
III. RUNNING InChI PROGRAMS
Command Line Executable inchi-1
InChI Software Library (libinchi)
Graphical Interface Program (winchi-1)
InChI Software Options
IV. CHEMICAL STRUCTURE INPUT
V. Further reading and contacts

This introductory User's Guide is addressed to the novice user of InChI whose primary interest is to
learn how to produce InChI/InChIKey identifiers of chemical compounds with the InChI executables
included in the InChI Software distribution. (Alternatively, one may simply use almost any chemical
drawing program, as at the time of writing these typically have a built-in InChI generation
ability.)

I. OVERVIEW
About InChI
The IUPAC International Chemical Identifier (InChI) provides unique labels for well-defined
chemical substances. These labels are generated by converting an input chemical structure, in the
form of a ‘connection table’, to a unique and predictable series of ASCII characters. They offer a
means for representing chemical compounds in a manner that does not depend on how they
were drawn. Note that they are re-expressions of chemical structures; they are not registry or
registration numbers and do not require access to a database. They were developed primarily as
a means of ‘naming’ a compound in digital media although they are expressed as simple text that
may be manually interpreted. This document describes the operation and output of the present
version of the program that generates this Identifier.

The Identifier is designed to process single, well-defined chemical compounds (which may be
composed of multiple components).

InChI is a project of the International Union of Pure and Applied Chemistry (IUPAC) described at:
http://[Link]/inchi/

The IUPAC body which takes care of the current and future shape of InChI is the “IUPAC InChI
Subcommittee” (IUPAC Division VIII InChI Subcommittee), which reports to IUPAC Division VIII and
also to the IUPAC Committee on Publications and Cheminformatics Data Standards. There are
also InChI Subcommittee working groups made up of additional chemists who are developing
rules for extending the capabilities of InChI. See: [Link]

Historically, the primary development of the InChI algorithm and software took place at NIST (the
US National Institute of Standards and Technology) under the auspices of IUPAC. Since 2009,
the responsibility for InChI technical development and promotion has been in the hands of the
InChI Trust ([Link]), a not-for-profit organization which works in close
contact with IUPAC (and of which IUPAC is a member).

Technical details are given in a separate document, the InChI Technical Manual. The basic
algorithms were taken from the literature, with selection, testing and implementation done
primarily at NIST, and with modifications and additions by IUPAC and the InChI Trust.

Over the several years of its development, many individuals contributed to InChI at meetings
and through correspondence. The chemical rules employed are intended to
represent a consensus view of the concept of chemical identity. The computer program described
in this document applies these algorithms to input structures and generates both the Identifier
and an annotated depiction of the structure.

Derivation of the InChI from an input chemical structure proceeds through three steps:
1) normalization – all input information not needed for structure identification is discarded and
structure information is divided into ‘layers’;
2) canonicalization – each atom is given a label that depends only on its position in the structure;
3) serialization – a string of characters, the Identifier, is generated from the canonical labels.
All ‘chemical’ rules are applied in the first step.
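
The three steps can be illustrated with a deliberately tiny sketch. It is not the InChI algorithm: the function names, the brute-force search over atom orderings and the output format are all assumptions made for illustration, and no chemical rules are applied. It only shows why a canonical labelling makes the resulting string independent of the order in which atoms were drawn.

```python
from itertools import permutations


def normalize(atoms, bonds):
    """Toy 'normalization': keep element symbols and undirected bonds only.

    Real InChI normalization applies chemical rules; none are applied here.
    """
    elements = list(atoms)
    edges = {frozenset(bond) for bond in bonds}
    return elements, edges


def canonicalize(elements, edges):
    """Toy 'canonicalization': try every atom ordering and keep the smallest.

    Brute force is feasible only for a handful of atoms; it stands in for
    the efficient labelling used by real software.
    """
    best = None
    for order in permutations(range(len(elements))):
        new_index = {old: new for new, old in enumerate(order)}
        relabelled_elements = tuple(elements[old] for old in order)
        relabelled_edges = tuple(sorted(
            tuple(sorted(new_index[a] for a in edge)) for edge in edges
        ))
        candidate = (relabelled_elements, relabelled_edges)
        if best is None or candidate < best:
            best = candidate
    return best


def serialize(canonical):
    """Toy 'serialization': write the canonical labels as a string."""
    elements, edges = canonical
    atoms_part = "".join(elements)
    bonds_part = ",".join(f"{a + 1}-{b + 1}" for a, b in edges)
    return f"{atoms_part}/{bonds_part}"


def toy_identifier(atoms, bonds):
    return serialize(canonicalize(*normalize(atoms, bonds)))


# The same C-C-O skeleton entered in two different atom orders.
drawing_1 = toy_identifier(["C", "C", "O"], [(0, 1), (1, 2)])
drawing_2 = toy_identifier(["O", "C", "C"], [(0, 1), (1, 2)])
print(drawing_1 == drawing_2)
```

Both drawings give the same toy string, while a C-O-C skeleton gives a different one.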

The current version of InChI Identifier is 1; the current stable version of the InChI software is 1.06
which replaces the previous version 1.05.

Standard and non-standard InChI


InChI has a layered structure which allows one to represent molecular structure with a desired
level of detail. Accordingly, the InChI Software may generate different InChI strings for the same
molecule, depending on the choice of a multitude of options (e.g., distinguishing or not
distinguishing tautomers). This flexibility, however, may be considered a drawback with respect to
standardization/interoperability. The standard InChI, which is always produced with fixed options,
was defined by the IUPAC InChI Subcommittee in response to these concerns.

The standard InChI was defined to ensure interoperability/compatibility between large
databases/web searching and information exchange. As related to its internal layered structure,
standard InChI, introduced in v.1.02-standard (2009) release of InChI Software, is a subset of
IUPAC International Chemical Identifier v.1. The layered structure of the standard InChI conforms
to the following requirements.

· Standard InChI organometallic representation does not include bonds to metal for the time
being.

· Standard InChI distinguishes between chemical substances at the level of ‘connectivity’,
‘stereochemistry’, and ‘isotopic composition’, where:

§ connectivity means tautomer-invariant valence-bond connectivity (different tautomers have the
same connectivity/hydrogen layer);

§ stereochemistry means configuration of stereogenic atoms and bonds; unknown stereo
designations are treated as undefined;

§ isotopic composition means mass numbers of isotopic atoms (when specified).

Standard InChI v.1 was introduced in the v. 1.02-standard release of the InChI Software in 2009 (this
software version was able to generate only standard InChIs).
The present release of InChI Software, v. 1.06, has merged functionality. It allows one to produce
both standard and non-standard InChI strings, as well as their hashed representation (InChIKey).

By default, InChI Software v. 1.06 produces standard InChI (for brevity, stdInChI below). In
particular, the standard identifier is generated when the software is used without any specifically
added options. If some options are specified, and at least one of them qualifies as related to non-
standard InChI (see section ‘InChI Software Options’ below), the program produces non-
stdInChI/InChIKey.

The standard InChI is designated by the prefix “InChI=1S/…” (that is, the letter ‘S’ immediately
follows the Identifier version number, ‘1’; Identifier version numbers should always be whole
numbers).

Non-standard InChI is designated by the prefix “InChI=1/…” (that is, the letter ‘S’ is omitted).

InChIs obtained with the experimental features of the Software (support of polymers; support of
“large” molecules) are designated by the prefix “InChI=1B/…” (‘B’ for beta).
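
The prefix conventions can be checked mechanically. The following minimal sketch tells the three kinds of identifier apart by their prefix alone; the function name `inchi_kind` and the returned labels are illustrative assumptions and are not part of the InChI Software.

```python
def inchi_kind(inchi: str) -> str:
    """Classify an InChI string by the prefix described in this guide.

    Only the text between "InChI=" and the first "/" is inspected;
    the rest of the string is not validated.
    """
    marker = "InChI="
    if not inchi.startswith(marker):
        raise ValueError("string does not start with 'InChI='")
    head = inchi[len(marker):].split("/", 1)[0]
    kinds = {"1S": "standard", "1": "non-standard", "1B": "beta"}
    if head not in kinds:
        raise ValueError(f"unrecognized version/flag: {head!r}")
    return kinds[head]


print(inchi_kind("InChI=1S/CH4/h1H4"))
print(inchi_kind("InChI=1/CH4/h1H4"))
```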

About InChIKey
The InChIKey is a character signature based on a hash code of the InChI string. A hash code is a
fixed length condensed digital representation of a variable length character string. Providing a
hash derived from an InChI string should be helpful in search applications, including Web
searching and chemical structure database indexing; also, this hash may serve as a checksum for
verifying InChI, for example, after transmission over a network.

The InChIKey consists of two blocks. The first block is always the same for the same molecular
skeleton. All isotopic substitutions, changes in stereoconfiguration, tautomerism and protonation
are reflected in the second block.

A standard InChIKey, which is a key produced from a standard InChI, does not account for
tautomerism and may indicate only absolute stereo (or completely ignore stereo). It also does not
account for the original structure’s bonds to metal.

The two hash blocks of InChIKey are based on a truncated SHA-256 cryptographic hash function.
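
This guide describes the InChIKey only in terms of its hash blocks. The sketch below additionally assumes the commonly published textual layout of the key: 14 letters, a hyphen, 8 letters followed by a flag letter and a version letter, a hyphen, and one final letter. That layout, the function names and the placeholder keys are assumptions for illustration, not taken from this document. The sketch shows how the first block can be compared to test whether two keys share a molecular skeleton.

```python
import re

# Assumed layout (not spelled out in this guide):
# first block (14 letters) - second block (8 letters) + flag + version - 1 letter
_KEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SNB])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split an InChIKey-shaped string into its assumed parts."""
    match = _KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not an InChIKey-shaped string: {key!r}")
    first, second, flag, version, last = match.groups()
    return {
        "first_block": first,
        "second_block": second,
        "flag": flag,          # 'S' standard, 'N' non-standard, 'B' beta
        "version": version,
        "last_char": last,
    }


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True when both keys have the same first block."""
    return split_inchikey(key_a)["first_block"] == split_inchikey(key_b)["first_block"]


# Placeholder keys with the right shape; they do not belong to real compounds.
print(same_skeleton("AAAAAAAAAAAAAA-BBBBBBBBSA-N", "AAAAAAAAAAAAAA-CCCCCCCCNA-N"))
```

Because of possible hash collisions, equal first blocks only suggest, and do not prove, that two skeletons are the same.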

Note that due to the very essence of hashing, collisions (the same InChIKey for
different InChIs/structures) are unavoidable in very large collections. A theoretical, optimistic
estimate of collision resistance is as follows. The probability of a single first block collision in a
database of 1 billion (10^9) compounds is 1.3%. In other words, a single first block collision is
expected in roughly 1 out of 75 databases of 10^9 compounds each. For 10^8 (100 million)
compounds in a database this probability is 0.014%.
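
The quoted figures can be approximately reproduced with the usual "birthday" estimate. The sketch below assumes that the first block behaves like an ideal hash carrying 65 bits; that bit count is an assumption not stated in this guide, and the function name is illustrative.

```python
import math

FIRST_BLOCK_BITS = 65  # assumption: effective size of the first hash block


def collision_probability(n_compounds: int, bits: int = FIRST_BLOCK_BITS) -> float:
    """Birthday estimate of the chance of at least one collision.

    p = 1 - exp(-n(n-1) / (2 * 2**bits)), evaluated with expm1 so that
    very small probabilities are not lost to rounding.
    """
    pairs = n_compounds * (n_compounds - 1) / 2
    return -math.expm1(-pairs / 2 ** bits)


for n in (10 ** 9, 10 ** 8):
    p = collision_probability(n)
    print(f"{n:.0e} compounds: {100 * p:.3f}% (about 1 database in {1 / p:.0f})")
```

Under this assumption the estimate gives about 1.3% for 10^9 compounds and about 0.014% for 10^8 compounds.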

For more details, please refer to the dedicated paper:

I. Pletnev, A. Erin, A. McNaught, K. Blinov, D. Tchekhovskoi, and S. Heller. InChIKey collision
resistance: an experimental testing. Journal of Cheminformatics, 4:39, 2012. DOI:
10.1186/1758-2946-4-39.

A beta-version of the InChIKey was introduced in software v. 1.02-beta (2007). The standard
InChIKey was introduced in v. 1.02-standard release (2009) as an InChIKey computed from the
standard InChI and intended for the principal purpose of a search-engine-style lookup of chemical
information. The present release of InChI Software v. 1.06 has merged functionality. It allows one
to produce both standard and non-standard InChIKey.

 
II. ABOUT InChI PROGRAMS
This document is accompanied by version 1.06 of the InChI generator executable. This program
runs under 32/64 bit Microsoft Windows ([Link]) and Linux (inchi-1) operating systems. Also
included is [Link], a convenient Windows graphical-interface application.

As structure input, the programs currently accept standard SDfiles and Molfiles, as well as their
own output produced when the “Full auxiliary information” option is selected. The SDfile and
Molfile formats are described in “Description of several chemical structure file formats used by
computer programs developed at Molecular Design Limited” by Arthur Dalby, James G. Nourse,
W. Douglas Hounshell, Ann K. I. Gushurst, David L. Grier, Burton A. Leland, and John Laufer,
Journal of Chemical Information and Computer Sciences, 1992; 32(3); pp. 244-255. A more recent
description of V2000 and the latest V3000 formats may be downloaded from
[[Link]borative-science/biovia-draw/[Link]]. Input may originate from individual disk files or
through the Windows clipboard. Since v. 1.05, limited support of V3000 Molfiles is included.

InChI may also be generated by using the Software Library/application programming interface (API).
This is described later.

III. RUNNING InChI PROGRAMS


Command Line Executable inchi-1
The executable program ([Link] under Windows, inchi-1 under Linux) is controlled by command line
arguments.
The full list of available arguments (options) is shown by invoking the program without any
arguments.

InChI and InChIKey strings generated by inchi-1 are the reference standard which all other software
generating InChI/InChIKey should match.

The principal use of the program is batch processing of multiple structure files, primarily SDF files.

The Windows version can display chemical structures; the Linux version does not display
structures.
Standard redirection may be used to suppress inchi-1 console output.

Simple example of generating InChI and InChIKey under Windows:

[Link] /Key [Link]

under Linux:

./inchi-1 -Key [Link]
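
When inchi-1 is driven from a script, the only platform difference in the command line is the option prefix. The following minimal sketch builds an argument list with the right prefix; the helper name `build_inchi_command`, the input file name `structures.sdf` and the placement of options after the file name are illustrative assumptions. The sketch only constructs the command and does not run it.

```python
import sys


def build_inchi_command(executable, input_file, options=(), windows=None):
    """Return an argument list for inchi-1 with platform-specific prefixes.

    Options are given without prefix (e.g. "Key"); "/" is added for
    Windows and "-" for Linux, as described in this guide.
    """
    if windows is None:
        windows = sys.platform.startswith("win")
    prefix = "/" if windows else "-"
    return [executable, input_file] + [prefix + option for option in options]


# Hypothetical paths; substitute the executable shipped with the distribution.
command = build_inchi_command("./inchi-1", "structures.sdf", ["Key"], windows=False)
print(command)
```

The resulting list can be handed to `subprocess.run`, with console output redirected by the caller if desired.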

More advanced example (use advanced v. 1.06 handling of polymers and pseudo atoms, generate
'empty InChI' if error occurs, do not print auxiliary info, write output, log and problem structures
to supplied file names):

[Link] /Key /POLYMERS /FOLDCRU /NPZZ /OUTERRINCHI /AUXNONE [Link] [Link] [Link] [Link]

./inchi-1 -Key -POLYMERS -FOLDCRU -NPZZ -OUTERRINCHI -AUXNONE [Link] [Link] [Link] [Link]
 

It is possible to process multiple input files in a single run. This mode is activated by the inchi-1
command line option /AMI (Windows) or -AMI (Linux; AMI stands for “Allow Multiple Inputs”). In
this mode, all the file names supplied in the command line are considered as the names of
separate input files. For further convenience, the common file name wildcards (“*” and “?”) are
supported.

For example, issuing a command inchi-1 *.mol /AMI (Windows) or inchi-1 *.mol -AMI (Linux)
will instruct the executable to process all the MOL files in the current directory.

Note that omitting the AMI switch selects the conventional single-input mode, which
may result in undesired treatment of wildcards[1].

In AMI mode, the names of output, log and problem files cannot be individually specified.
Instead, they are formed, for each of the multiple inputs, by appending the suffixes
“.txt”, “.log” and “.prb” to the input file name. However, to partially mimic the behavior of inchi-1
in conventional single-input mode, three additional command line options are introduced (see
section “Availability of InChI Software options”, Table 4). They allow one to redirect the output to
stdout, log to stderr, as well as to suppress creation of problem files.
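
The naming rule for AMI mode can be sketched as follows. The sketch assumes that the suffixes are appended to the complete input file name, extension included; that reading, the function name and the sample file names are assumptions for illustration. It predicts which files to look for after a run.

```python
import fnmatch


def ami_output_names(input_name: str) -> dict:
    """Predict the output, log and problem file names for one AMI input."""
    return {
        "output": input_name + ".txt",
        "log": input_name + ".log",
        "problems": input_name + ".prb",
    }


# Hypothetical directory listing; the wildcard matching mimics "nci*.mol".
listing = ["nci1.mol", "nci2.mol", "other.mol", "notes.txt"]
for name in fnmatch.filter(listing, "nci*.mol"):
    print(name, "->", ami_output_names(name))
```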

Examples (Windows, Linux):

inchi-1 nci*.mol /AMI /AMIOutStd /AMIPrbNone /AuxNone /Key

./inchi-1 /home/me/mol/nci/*.mol -AMI -AMILogStd -AMIPrbNone -RecMet -FixedH

As indicated by tests, processing of multiple MOL files in AMI mode may be several times faster
(the exact speed-up depends on many details; in any case, the processing time is still significantly
longer than that for a single SDF file containing the same data).

Since v. 1.05, InChI Software provides experimental support of InChI/InChIKey for regular single-
strand polymers, as well as experimental support of large molecules containing up to 32767
atoms.

By default, the executable inchi-1 ignores polymer-specific data (which also ensures compatibility
with the behaviour of previous versions); to allow treatment of polymers, one should explicitly use
the new command line option Polymers (-Polymers under Linux or /Polymers under
Windows). Analogously, the switch LargeMolecules is necessary to enable processing of molecules
having more than 1024 atoms.

Note that support of polymers is an experimental feature. To emphasize this, InChI/InChIKey for a
polymer uses the ‘B’ flag character (for “Beta”), instead of ‘S’ or ‘N’ for standard/non-standard
InChI. It is expected that this flag will be replaced by the common standard/non-standard
conventions if and when InChI for polymers is finally adopted.

Note also that the treatment of polymers and the appearance of polymer data in the InChI string
have changed significantly in the current version 1.06. However, the compatibility option
/Polymers105 instructs InChI Software to handle polymers in the legacy v. 1.05 mode (it is planned
that this option will be eliminated in the future, leaving the explicit pseudo-atoms approach as
the sole mode).

For the full list of enhancements in v. 1.06, please see items "new in v. 1.06" in Table 4 below.

For more details, please refer to the InChI v. 1.06 Release Notes and the InChI Technical Manual
accompanying this distribution.

InChI Software Library (libinchi)
For advanced users who may want to create the Identifier in their own software, the InChI
Software Library (InChI API) is provided in a separate package. The package contains ‘C’ source
code for inchi‑[Link], ‘C’ source code for the InChI Library that may be compiled into a Dynamic
Link Library (DLL) [Link] under Windows or Shared Object (SO) [Link] under Linux; also,
there are ‘C’ and Python examples of simple applications that read an input Molfile and use the
InChI Library to produce Identifiers.

The InChI Library does not display structures and is not able to read chemical structural data from
the input file. It uses specially formatted input binary data and produces three strings: InChI, the
Auxiliary Information, and, if necessary, an error or warning message. There are also procedures
to calculate InChIKey and other service routines. The source code is accompanied by makefiles
tested with gcc under Windows and Linux.

The InChI Library allows one to generate both standard and non-standard InChIs/InChIKeys. For
example, an API function GetINCHI() produces standard InChI by default and non-standard InChI if
some “InChI creation option” is specified in input parameters. However, for compatibility with the
previous v. 1.02-standard (2009) release, the procedures which deal only with stdInChI – for
example, GetStdINCHI() - are retained.

The InChI API calls are documented in the separate “InChI API Reference” document.

Graphical Interface Program (winchi-1)


Introduction

The Windows graphical program for InChI generation is provided in a ‘zip’ file [Link].

To start the program, run the file [Link] that was extracted from the zip file.

Generating an InChI begins with the selection of an input structure file. The simplest way is to
drag the input structure file from Windows Explorer directory list into the InChI window.
Structures also may be copied from certain chemical structure editors (ISIS/Draw with “Copy
Mol/Rxnfile to the Clipboard” option or from ACD/ChemSketch) and pasted into the InChI window
(Select Edit -> Paste from winchi-1 menu). The input structure file pathname may be provided as a
command line option when you start winchi-1. Selection of the input structure file may also be
done by first clicking on the ‘Open’ button (top left corner) and then, in the dialog box that
appears selecting a structure file using the ‘…’ button on the right of the ‘Input Structure File’ field.
You may select any of the sample .mol or .sdf files for initial testing. In this dialog you may also
enter “Text Header for ID”; this will simply add to the InChI header a structure ID if it is present in
an input SDfile (from other input formats the header and ID are extracted automatically).

The result appearing after choosing file "[Link]" (contained in the examples sub-
directory of the distribution package: "INCHI-1-TEST\samples\UserGuide\[Link]") is
presented in Figure 1.
Figure 1

The main output window is composed of two sections: the upper section (shown in white in
Figure 1) shows structural information graphically and the lower section (shown in gray in Figure 1)
shows text output.

Upper section

The structure is displayed along with labels generated by InChI algorithms. In cases where an SDF
file is input, the first structure shown is the first entry in the input file.
The example shown in Figure 1 is a single component example. If more than one component
(independent structure) is found in the first structure file (such as benzoic acid, sodium salt shown
in Figure 2), each may be separately examined using the “Choose component” ‘combo box’ on the
upper left of the screen, although they are treated as part of a single compound by InChI (Figures
3 and 4).
Figure 2
Figure 3
Figure 4
Figure 5

The buttons under “Display” permit viewing of the input structure and the preprocessed structure
if it differs from the input structure. The buttons under “Options” are the same as in the
“Options” dialog box. “Mobile H Perception” removes the “fixed-H” part of the identifier. Figure 6
shows the same structure with the option “Mobile H Perception” off.

On the InChI Toolbar the rightmost box displays the number of sets of equivalent components.
When equivalent components are found, they may be highlighted by making a selection in the
box. This provides a quick way to determine if two depictions of the same compound are
considered to be the same by InChI algorithms, although the actual InChI generated will represent
the collection of structures as a single compound.

Figure 6. InChI Toolbar


The structure display shows the canonical identification number of each atom along with the non-
stereo equivalence class number assigned to that atom. The canonical number is the unique
number given to an atom and used for ‘serialization’ (creation of the actual InChI). The non-stereo
equivalence class number is a number assigned to each set of equivalent atoms (all atoms having
the same equivalence class number are indistinguishable, ignoring stereochemistry; the
equivalence class number is the smallest canonical identification number in the class of
equivalent atoms). This information is only intended to assist in the understanding of results of
InChI processing and is not directly used in InChI generation except in the processing of
stereochemistry.

Stereochemical parities of bonds and atoms are also displayed. A question mark symbol indicates
that stereoisomerism is possible, but the configuration has not been specified. Bonds that have
been found to be variable by alternation or movement of mobile H-atoms or charges are shown
by dotted lines. This information is used only for deciding which bonds may exhibit double bond
(Z/E) isomerism. By design, the Identifier does not explicitly represent bond types.

Lower Section

The InChI along with auxiliary data and explanatory information is shown in the lower section of
the output window. Unlike the graphical display, even if more than one disconnected component
is found, all textual results for a single input structure file are shown together. This reflects the
important point that all components of a submitted structure are considered by InChI to be part
of a single compound. Results for different (disconnected) components of a single substance are
separated by semicolons, except for chemical formulas, which, in keeping with common
conventions, are separated by dots.
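
The separator convention can be illustrated by splitting an identifier into per-component pieces. This is a simplified sketch: the function name is an assumption, layers are told apart only by their first letter, and features such as repeated layer letters or multiplicity prefixes are not handled. The sample string is meant to resemble the InChI of benzoic acid, sodium salt; treat it as illustrative input rather than reference output.

```python
def split_components(inchi: str) -> dict:
    """Split an InChI string into layers and per-component entries.

    The formula layer is split on dots, all other layers on semicolons.
    """
    layers = inchi.split("/")
    result = {"formula": layers[1].split(".")}
    for layer in layers[2:]:
        letter, body = layer[0], layer[1:]
        result[letter] = body.split(";")
    return result


sample = "InChI=1S/C7H6O2.Na/c8-7(9)6-4-2-1-3-5-6;/h1-5H,(H,8,9);/q;+1/p-1"
for name, parts in split_components(sample).items():
    print(name, parts)
```

An empty entry means that the corresponding component contributes nothing to that layer, as for the single sodium atom in the connectivity layer.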

Options

Pressing the Options Button opens the InChI Options Dialog Box.
Figure 7

The following options are then available (as seen in Figure 7):

· Mobile H Perception – turning Off will fix all H-atoms (disallow H-migration); this allows the
generation of a fixed-H section of the Identifier (and makes the resulting InChI non-standard).

· Include Stereo (Absolute, Relative, Racemic, From chiral flag) – include stereo layer and choose
its type or exclude all stereo information from the identifier. If the last option is selected then in
presence of a chiral flag stereochemistry is considered absolute, otherwise relative.
For standard InChI the only allowed choice is absolute stereochemistry or omission of all stereo;
other choices make InChI non-standard.

· Always include omitted/undefined stereo – by default, InChI does not include
unknown/undefined stereo unless at least one defined stereo is present in the input structure.
Turning this option On results in inclusion of unknown/undefined stereo in all cases.

· Different marks for unknown/undefined stereo – turning this option On will result in usage of
two different signs, ‘u’ and ‘?’, for “unknown” and “undefined” stereo. Briefly: “undefined”
means not given while “unknown” means explicitly marked as unknown, e.g., with “wavy” bonds.
By default, this option is turned Off and the two signs are merged to ‘?’ (that is, “unknown” stereo
is treated as “undefined”).

· Both ends of wedge point to stereocenters – by default, this option is turned Off. This means
that a stereo bond depicted by a wedge affects the stereochemistry of only the atom ‘pointed
to’ by the narrow end of that wedge. However, it may be turned On if the user is completely sure
that a stereobond affects both atoms it connects (that is, for 2D structures complying with the
legacy “perspective” stereochemistry drawing style).

· Include Bonds to Metal - turning On will add a layer that includes specific bonding to metals (in
case of salts the bonds between a metal and an acid cannot be reconnected – as seen in Figures
where that choice is “grayed out” and cannot be ticked or checked).

· Annotation Format (Plain Text; XML, None) – choose appropriate format for explanatory
information.

· Ignore Isotopes in Structure Display – this does not change the identifier, it only affects the
structure appearance and the display of sets of equivalent components.

Note that the above options form a subset of a full options set available in the command-line
executable inchi-1 (see section ‘InChI Software Options’ below).

InChI Software Options


The exact set of InChI Software options has been changing from release to release.

The description below refers to the current v. 1.06.

The options are available in graphical program winchi-1, command line executable inchi-1 and
through InChI API. Not all the options are available for all the parts of software; the maximal set
of options is available for the inchi-1 program.

Options affecting generation of InChI are divided into “structure perception” options and “InChI
creation” options.

The perception options are considered drawing style/edit flags which affect the input structure
interpretation and are not memorized. It is assumed that the user may deliberately use these
options to account for the specific features of structure collections. Hence, perception options
may be used while generating standard InChI without loss of its “standardness”.

Perception options are listed in the following tables. Presented here are command line switches
available (they should be used with the appropriate prefix - i.e., NEWPSOFF should be entered as
/NEWPSOFF under Windows and -NEWPSOFF under Linux).

 
Table 1. Structure perception options.

Structure perception option | Meaning | Default behavior (standard; if no option supplied)
NEWPSOFF | Both ends of a wedge (which indicates stereochemistry) point to stereocenters | Only the narrow end of a wedge points to a stereocenter
DoNotAddH | All hydrogens in input structure are explicit | Add H according to usual valences
SNon | Ignore stereo | Use absolute stereo

There are several options (Table 2) which modify the interpretation of input stereochemical data.
In principle, they also may be considered “structure perception” options. However, as the
standard InChI, by definition, requires the use of absolute stereo (or no stereo at all), these
“perception” options assume generation of non-standard InChI.

Table 2. Stereo interpretation options (lead to generation of non-standard InChI).

Stereo option | Meaning | Default behavior (standard; if no option supplied)
SRel | Use relative stereo | Use absolute stereo
SRac | Use racemic stereo | Use absolute stereo
SUCF | Use Chiral Flag in MOL/SD file record: if On – use absolute stereo, Off – relative | Use absolute stereo (or another option if requested by SRel/SRac/SNon switches)

The creation options affect the InChI algorithm, not structure perception. They modify the
defaults which are specified for standard InChI and significantly affect the final appearance (e.g.,
additional InChI layers may appear). Hence, using any of the creation options qualifies the
resulting identifier as non-standard.

Creation options used for generation of a particular non-standard InChI may be appended to the
created identifier, see below.

InChI creation options are listed in the following table.

Table 3. InChI creation options.

InChI creation option | Meaning | Default behavior (if no option supplied)
SUU | Always indicate unknown/undefined stereo | Does not indicate unknown/undefined stereo unless at least one defined stereocenter is present
SLUUD | Stereo labels for “unknown” and “undefined” are different, ‘u’ and ‘?’, resp. (new option; see explanation) | Stereo labels for “unknown” and “undefined” are the same (‘?’)
RecMet | Include reconnected metals results | Do not include
FixedH | Include Fixed H layer | Do not include
KET | Account for keto-enol tautomerism (experimental extension to InChI v. 1) | Ignore keto-enol tautomerism
15T | Account for 1,5-tautomerism (experimental extension to InChI v. 1) | Ignore 1,5-tautomerism
LargeMolecules | Accept molecules containing more than 1024 (but less than 32767) atoms | Reject molecules containing more than 1024 atoms
Polymers | Accept polymer data in input V2000 Molfiles. | Ignore polymer data in input V2000 Molfiles.
OutErrInChI | Output empty InChI and corresponding InChIKey if error occurs | Output nothing

The standard InChI is always generated if no InChI creation/stereo modification options are
specified. This means:

· include tautomerism (i.e., turn mobile H perception on, exclude “fixed hydrogen atoms” layer)
except for keto-enol and 1,5-tautomerism;

· omit reconnection of bonds to metal atoms;

· only the narrow end of a wedge points to a stereocenter;

· exclude unknown/undefined stereo if no other stereo is present;

· treat stereochemistry as absolute (not relative or racemic).

Conversely, if any of the SUU/SLUUD/RecMet/FixedH/Ket/15T/SRel/SRac/SUCF options are specified in
the command line, the generated InChI will be non-standard.
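
The rule can be written down as a small lookup. In the sketch below the function name and return labels are illustrative; the non-standard set is the list given above, while treating Polymers and LargeMolecules as producing the ‘B’ (beta) designation, and giving that designation precedence, is an assumption based on the description of experimental features earlier in this guide.

```python
# Options named in this guide as leading to non-standard InChI.
NON_STANDARD_OPTIONS = {
    "suu", "sluud", "recmet", "fixedh", "ket", "15t", "srel", "srac", "sucf",
}
# Assumption: experimental features that lead to the 'B' (beta) designation.
BETA_OPTIONS = {"polymers", "largemolecules"}


def expected_kind(options) -> str:
    """Predict the kind of InChI produced for a list of command line options.

    Options may carry a "/" or "-" prefix and are compared without regard
    to case. Perception options such as NEWPSOFF do not change the result.
    """
    names = {option.lstrip("/-").lower() for option in options}
    if names & BETA_OPTIONS:
        return "beta"
    if names & NON_STANDARD_OPTIONS:
        return "non-standard"
    return "standard"


print(expected_kind([]))
print(expected_kind(["-NEWPSOFF", "-SNon"]))
print(expected_kind(["/FixedH", "/RecMet"]))
```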

The command-line option SaveOpt was introduced in software v. 1.03 to append saved
InChI creation options to a non-standard InChI string.
Since v. 1.06, this option is deprecated.
It is still retained for compatibility reasons, but no further development is planned. For the
SaveOpt details, please consult InChI Software User Guide of v. 1.05 (2017) available at [Link]
[Link]/downloads/

The next table summarizes the current, v. 1.06, availability of various options in the various parts
of the InChI Software.

Table 4. Availability of InChI Software options (note entries marked "new in v. 1.06 ).

winchi           | inchi-1 | API calls | Option (without / or – prefix) | Explanation

Input
-                | Yes     | -         | STDIO                          | Use standard input/output streams
-                | Yes     | -         | InpAux                         | Input structures in InChI default aux. info format (for use with STDIO)
Yes              | Yes     | Yes       | SDF:name                       | Read from the input SDfile the ID under the named data header
-                | Yes     | -         | AMI                            | Allow multiple input files
-                | Yes     | -         | START:number                   | Start from SDF record number
-                | Yes     | -         | END:number                     | End at SDF record number
-                | Yes     | -         | RECORD:number                  | Process only SDF record number

Output
-                | Yes     | Yes       | AuxNone                        | Do not produce Auxiliary Information
-                | Yes     | -         | NoLabels                       | Omit structure number, DataHeader and ID from InChI output
-                | Yes     | Yes       | NoInChI                        | (new in v. 1.06) Do not print InChI string itself
-                | Yes     | Yes       | SaveOpt                        | (deprecated since v. 1.06) Save custom InChI creation options
-                | Yes     | -         | Tabbed                         | Separate structure number, InChI, and AuxInfo with tabs
-                | Yes     | Yes       | NoWarnings                     | (new in v. 1.06) Suppress all warning messages (default: show)
-                | Yes     | -         | OutErrInChI                    | On fail, print empty InChI (default: nothing)
Always           | Yes     | -         | D                              | Display the structure
-                | Yes     | Yes       | OutputSDF                      | Convert InChI created with default auxiliary info to a SDfile
-                | Yes     | Yes       | SdfAtomsDT                     | Output Hydrogen Isotopes to SDfile as Atoms D and T
-                | Yes     | -         | AMIOutStd                      | Write output to stdout (in AMI mode only)
-                | Yes     | -         | AMILogStd                      | Write log messages to stderr (in AMI mode only)
-                | Yes     | -         | AMIPrbNone                     | Suppress creation of problem files (in AMI mode only)

Structure perception
Yes              | Yes     | Yes       | NEWPSOFF                       | Both ends of wedge point to stereocenters
-                | Yes     | Yes       | DoNotAddH                      | Do not add H according to usual valences
Yes              | Yes     | Yes       | SNon                           | Ignore stereo information in input structures
-                | Yes     | Yes       | LooseTSACheck                  | (new in v. 1.06) Relax criteria of ambiguous drawing for in-ring tetrahedral stereo

Stereo perception modifiers (non-standard InChI)
Yes              | Yes     | Yes       | SRel                           | Relative stereo
Yes              | Yes     | Yes       | SRac                           | Racemic stereo
Yes              | Yes     | Yes       | SUCF                           | Use Chiral Flag: On means Absolute stereo, Off - Relative

Customizing InChI creation (non-standard InChI, "InChI=1/…")
Yes              | Yes     | Yes       | SUU                            | Always include omitted unknown/undefined stereo
Yes              | Yes     | Yes       | SLUUD                          | Make labels for unknown and undefined stereo different
Yes              | Yes     | Yes       | RecMet                         | Include reconnected metals results
Yes              | Yes     | Yes       | FixedH                         | Include Fixed H layer

Experimental ("InChI=1/B…")
Yes              | Yes     | Yes       | KET                            | Account for keto-enol tautomerism (experimental)
Yes              | Yes     | Yes       | 15T                            | Account for 1,5-tautomerism (experimental)
Always           | Yes     | Yes       | LargeMolecules                 | Experimental support of molecules up to 32767 atoms
Yes              | Yes     | Yes       | Polymers                       | Experimental support of simple polymers
Yes              | Yes     | Yes       | Polymers105                    | (new in v. 1.06) Experimental support of simple polymers in older v. 1.05 way
Yes              | Yes     | Yes       | FoldCRU                        | (new in v. 1.06) Fold polymer CRU if inner repeats occur
Yes              | Yes     | Yes       | NoFrameShift                   | (new in v. 1.06) Disable polymer CRU frame shift
Yes              | Yes     | Yes       | NPZz                           | (new in v. 1.06) Allow non-polymer Zz (pseudo element) atoms
-                | Yes     | Yes       | SAtZz                          | (new in v. 1.06) Enable stereo at atoms connected to Zz pseudo atoms (default: disable/ignore)

Generation
- (always 60)    | Yes *)  | Yes *)    | Wnumber                        | Set time-out per structure to number seconds
- (always 60000) | Yes *)  | Yes *)    | WMnumber                       | (new in v. 1.06) Set time-out per structure to number milliseconds
-                | Yes     | Yes       | WarnOnEmptyStructure           | Warn and produce empty InChI for empty structure
Always           | Yes     | - **)     | Key                            | Generate InChIKey
Always           | Yes     | - **)     | XHash1                         | Generate hash extension (to 256 bits) for 1st block of InChIKey
Always           | Yes     | - **)     | XHash2                         | Generate hash extension (to 256 bits) for 2nd block of InChIKey
-                | Yes     | -         | MergeHash                      | (new in v. 1.06) On output, combine InChIKey with extra hash(es)

Conversion
-                | Yes     | -         | InChI2Struct                   | Convert standard InChI string(s) into structure(s)

*) W0 means unlimited time. In InChI Library the default is W0, in inchi-1 the default is 60 seconds
(W60).

**) In InChI Library, generation of InChIKey/hash extensions is performed via a separate API call.
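
As an illustration of how the option names in Table 4 and the time-out footnote might be turned into an argument list, here is a minimal sketch. The helper names are hypothetical, and the argument order (executable, input file, output file, then prefixed options) and the file names are assumptions that should be checked against the usage text printed by inchi-1 itself.

```python
def timeout_option(seconds: float) -> str:
    """Build a per-structure time-out option.

    Whole seconds map to W<number> (W0 means unlimited time); fractional
    values map to WM<number> in milliseconds (WM is new in v. 1.06).
    """
    if seconds < 0:
        raise ValueError("time-out must not be negative")
    if seconds == int(seconds):
        return f"W{int(seconds)}"
    return f"WM{round(seconds * 1000)}"


def build_args(executable, input_file, output_file, options, prefix="-"):
    """Assemble an argument list; each option gets the '-' or '/' prefix.

    The positional order used here is an assumption, not taken from the guide.
    """
    return [executable, input_file, output_file] + [prefix + o for o in options]


args = build_args("inchi-1", "input.sdf", "output.txt",
                  ["SDF:Name", "AuxNone", timeout_option(0.5)])
print(args)
```

The list form can be handed to subprocess.run() without shell quoting; nothing is executed in the sketch itself.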

IV. CHEMICAL STRUCTURE INPUT


Molfile structures may be submitted either as a single Molfile or as a series of concatenated
Molfiles (an SDfile). A number of programs, some of them freely available, may be used to create
these Molfiles. Information on how to produce and convert

If an input structure contains more than one independent structure, each component is shown
individually in the graphical output section of the program, though this has no effect on the
InChI. Text results are given for all layers and all components. Different components of a single
substance are separated by semicolons in each layer, except for chemical formulas, which, by
convention, are separated by dots.
While structure normalization methods built into the program perceive a range of different
structure drawing conventions, it is possible that other conventions may not be properly
recognized. Examination of the graphical results of InChI processing, especially for equivalent
atom classes and stereo labeling, should reveal such problems.
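
The separator convention described above (dots between component formulas, semicolons between components within a layer) can be illustrated with a small parser. This is a simplified sketch with a hypothetical function name: it does not expand multipliers or handle every layer type, and the example string is the standard InChI of sodium chloride.

```python
def split_inchi(inchi: str):
    """Split an InChI string into (version, formula components, layers).

    Each layer is returned as (prefix letter, list of per-component strings).
    """
    prefix, _, rest = inchi.partition("=")
    if prefix != "InChI" or not rest:
        raise ValueError("not an InChI string")
    segments = rest.split("/")
    version = segments[0]
    # Formulas of different components are separated by dots.
    formula = segments[1].split(".") if len(segments) > 1 else []
    # In the other layers, components are separated by semicolons.
    layers = [(s[0], s[1:].split(";")) for s in segments[2:] if s]
    return version, formula, layers


version, formula, layers = split_inchi("InChI=1S/ClH.Na/h1H;/q;+1/p-1")
print(version)  # 1S
print(formula)  # ['ClH', 'Na']
print(layers)   # [('h', ['1H', '']), ('q', ['', '+1']), ('p', ['-1'])]
```

An empty string in a layer means that the corresponding component contributes nothing to that layer.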

If an SDfile is ‘labeled’, the program can supply these labels in its output. If the tag name is ‘Name’
and the data field is ‘2-methylanthracene’, this information would appear in the SDfile as 3 lines
(the last line is blank):

> <Name>
2-methylanthracene

In this case, if the tag ‘Name’ is entered in the ‘Structure ID Header’ field in the input dialog
box, ‘2-methylanthracene’ will appear in the output text.
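
To make the label format concrete, the following sketch extracts the value of a named data item from SDfile text, where each data item starts with a header line such as "> <Name>" and ends at a blank line. It is an independent illustration, not the code used by the InChI Software; the function name and the sample record are hypothetical.

```python
import re

# A data header line starts with '>' and carries the tag name in angle brackets.
HEADER = re.compile(r"^>.*<([^>]+)>")


def read_sd_field(sdfile_text: str, tag: str):
    """Return the value of data item `tag` for each record (None if absent)."""
    values = []
    for record in sdfile_text.split("$$$$"):  # "$$$$" terminates each record
        if not record.strip():
            continue
        lines = record.splitlines()
        value = None
        for i, line in enumerate(lines):
            match = HEADER.match(line)
            if match and match.group(1) == tag:
                data = []
                for data_line in lines[i + 1:]:
                    if not data_line.strip():  # a blank line ends the data item
                        break
                    data.append(data_line)
                value = "\n".join(data)
                break
        values.append(value)
    return values


sample = (
    "\n  demo\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <Name>\n"
    "2-methylanthracene\n"
    "\n"
    "$$$$\n"
)
print(read_sd_field(sample, "Name"))  # ['2-methylanthracene']
```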

A variety of structure files are provided for testing. Individual Molfiles have extension .MOL,
concatenated Molfiles have extension .SDF.

Since v. 1.05, the inchi-1 executable and the API procedure MakeINCHIFromMolfileText() can read
and parse large (up to 32767 atoms) input files in Molfile V3000 format. This is necessary for
treating large molecules (previous versions supported only the V2000 format, limited to not more
than 1000 atoms).
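
Before submitting a large structure, it can be useful to check which Molfile version it uses and how many atoms it declares. The sketch below reads this from the counts line (V2000) or from the "M  V30 COUNTS" line (V3000). The function name and the sample texts are hypothetical, and the only limit used is the 32767-atom figure quoted above.

```python
MAX_LARGE_MOLECULE_ATOMS = 32767  # upper limit quoted in this guide


def molfile_atom_count(molfile_text: str):
    """Return (format version, declared atom count) for a Molfile."""
    lines = molfile_text.splitlines()
    if len(lines) < 4:
        raise ValueError("a Molfile needs a 3-line header and a counts line")
    counts = lines[3]
    if "V3000" in counts:
        # V3000 keeps the real counts in an 'M  V30 COUNTS na nb ...' line.
        for line in lines[4:]:
            fields = line.split()
            if fields[:3] == ["M", "V30", "COUNTS"]:
                return "V3000", int(fields[3])
        raise ValueError("V3000 Molfile without a COUNTS line")
    # V2000: the first three columns of the counts line hold the atom count.
    return "V2000", int(counts[:3])


v2000 = "\n  demo\n\n  2  1  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
v3000 = (
    "\n  demo\n\n  0  0  0     0  0            999 V3000\n"
    "M  V30 BEGIN CTAB\nM  V30 COUNTS 2 1 0 0 0\nM  V30 END CTAB\nM  END\n"
)
print(molfile_atom_count(v2000))  # ('V2000', 2)
print(molfile_atom_count(v3000))  # ('V3000', 2)
print(molfile_atom_count(v3000)[1] <= MAX_LARGE_MOLECULE_ATOMS)  # True
```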

In addition, provisional support for extended features of Molfile V3000 was added, both to
inchi-1 and to the InChI Software Library API. This means that extended data (on haptic
coordination bonds and stereo collections) are read and parsed; however, they are not currently
used (as this would require significant modification of the Identifier itself, not just the Software).

InChI Software has experimental support for regular single-strand polymers. Both structure-based
and source-based representation and encoding of polymers are supported.

The inchi-1 executable supports reading input Molfiles containing polymer description lines. This
support is also built into several API procedures (see the "API Reference" document of this
distribution).

Test Files

A number of Molfiles (.mol) and SDfiles (.sdf) are included with this distribution for illustrative
purposes. In particular, file samples_UserGuide.zip contains MOL files for the examples used in
this document.

V. FURTHER READING AND CONTACTS

In addition to this introductory User Guide, a number of materials concerning InChI are currently
available.

In particular, this distribution contains separate documents (PDF files): the InChI Technical
Manual (InChI_TechMan.pdf) and the InChI API Reference (InChI_API_Reference.pdf).

For a briefer and less technical description, see:
Heller, S., McNaught, A., Pletnev, I., Stein, S., and Tchekhovskoi, D. InChI, the IUPAC
international chemical identifier. Journal of Cheminformatics 7 (2015), 23.
DOI: 10.1186/s13321-015-0068-4

For a much briefer description, as well as for background and history, see:
Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D., and Pletnev, I. InChI - the worldwide
chemical structure identifier standard. Journal of Cheminformatics 5 (2013), 7.
DOI: 10.1186/1758-2946-5-7

For the InChI FAQ, see: [Link]

Contacts:

Richard Kidd,
InChI Trust,
c/o Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, UK CB2 1EZ
richard@[Link]

Steve Heller (InChI project director, InChI Trust)
steve@[Link]

Igor V. Pletnev (InChI developer)
[Link]@[Link]
