Article in Journal of Cheminformatics · April 2019

DOI: 10.1186/s13321-019-0351-x


Sydow et al. J Cheminform (2019) 11:29
https://doi.org/10.1186/s13321-019-0351-x Journal of Cheminformatics

SOFTWARE Open Access

TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data
Dominique Sydow, Andrea Morger, Maximilian Driller and Andrea Volkamer*

Abstract
Owing to the increase in freely available software and data for cheminformatics and structural bioinformatics, research for computer-aided drug design (CADD) is increasingly built on modular, reproducible, and easy-to-share pipelines. While documentation for such tools is available, there are only a few freely accessible examples that teach the underlying concepts focused on CADD, especially addressing users new to the field. Here, we present TeachOpenCADD, a teaching platform developed by students for students, using open source compound and protein data as well as basic and CADD-related Python packages. We provide interactive Jupyter notebooks for central CADD topics, integrating theoretical background and practical code. TeachOpenCADD is freely available on GitHub: https://github.com/volkamerlab/TeachOpenCADD.
Keywords: Computer-aided drug design, Python, RDKit, Open source, Teaching, Learning, Cheminformatics, Structural bioinformatics

*Correspondence
In Silico Toxicology, Institute of Physiology, Charité – Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany

© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Introduction
Open access resources for cheminformatics and structural bioinformatics, as well as public platforms for code deposition such as GitHub, are increasingly used in research. This combination facilitates and promotes the generation of modular, reproducible, and easy-to-share pipelines for computer-aided drug design (CADD). Comprehensive lists of open resources are reviewed by Pirhadi et al. [1] or presented in the form of the web-based search tool Click2Drug [2], aiming to cover the full CADD pipeline.

While documentation for open access resources is available, freely accessible teaching platforms for concepts and applications in CADD are rare. Available examples fall into two categories. On the one hand, graphical user interface (GUI) based tutorials teach CADD basics, such as the web-based educational Drug Design Workshop [3, 4]. On the other hand, examples of educational coding tutorials are the Java-based Chemistry Development Kit (CDK) [5–9] and the Teach–Discover–Treat (TDT) initiative [10], which launched challenges to develop tutorials, such as a Python-based virtual screening (VS) workflow to identify malaria drugs [11, 12].

Complementing these resources, we developed the TeachOpenCADD platform to provide students and researchers new to CADD and/or programming with step-by-step tutorials suitable for self-study training as well as classroom lessons, covering both ligand- and structure-based approaches. TeachOpenCADD is a novel teaching platform developed by students for students, using open source data and Python packages to tackle various common tasks in cheminformatics and structural bioinformatics. Interactive Jupyter notebooks [13] are presented for central topics, integrating detailed theoretical background and well-documented practical code. Topics build upon one another in the form of a pipeline, which is illustrated using the example of the epidermal growth factor receptor (EGFR) kinase, but can easily be adapted to other query proteins. TeachOpenCADD is publicly available on GitHub and open to contributions from the community: https://github.com/volkamerlab/TeachOpenCADD (current release: https://doi.org/10.5281/zenodo.2600909).

Methods
TeachOpenCADD currently consists of ten talktorials covering central topics in CADD, see Fig. 1. Talktorials are offered as interactive Jupyter notebooks that can be used as tutorials but also for oral presentations, e.g. in student CADD seminars (talk + tutorial = talktorial). They start with a topic motivation and learning goals, continue with the main part composed of theoretical background and practical code, and end with a short discussion and quiz, see Fig. 2.

Open data resources employed are the ChEMBL [14] and PDB [15] databases for compound and protein structure data acquisition, respectively. Open source libraries utilized are RDKit [16] (cheminformatics), the ChEMBL webresource client [17] and PyPDB [18] (ChEMBL and PDB application programming interface access), BioPandas [19] (loading and manipulating molecular structures), and PyMOL [20] (structural data visualization). Additionally, basic Python computing libraries employed include numpy [21, 22] and pandas [23, 24] (high-performance data structures and analysis), scikit-learn [25] (machine learning), as well as matplotlib [26] and seaborn [27] (plotting). Furthermore, the user is instructed how
to work with conda [28], a widely used package, dependency, and environment management tool. A conda yml file is provided to ensure an easy and quick setup of an environment containing all required packages.

Fig. 1 TeachOpenCADD talktorial pipeline. TeachOpenCADD is a teaching platform for open source data and packages, currently offering ten talktorials in the form of Jupyter notebooks on central topics in CADD, ranging from cheminformatics (T1–7) to structural bioinformatics (T8–10). The talktorials are illustrated using the example of EGFR (based on data sets from ChEMBL and PDB queries in November 2018).

Fig. 2 Screenshot of TeachOpenCADD talktorial composition. TeachOpenCADD talktorials are Jupyter notebooks that cover one CADD topic each, composed of (i) a topic motivation, (ii) learning goals, (iii) references to literature, (iv) theoretical background, (v) practical code, (vi) a short discussion, and (vii) a quiz, all in one place. Shown here is a screenshot of parts of talktorial T9 on generating pharmacophores.

The talktorial topics include how to acquire data from ChEMBL (T1), filter compounds for drug-likeness (T2), and identify unwanted substructures (T3). Furthermore, measures for compound similarity are introduced and applied in a VS with the kinase inhibitor gefitinib as query (T4) as well as for compound clustering (T5), including the use of maximum common substructures (T6). Machine learning approaches are employed to build models for predicting active compounds (T7). Lastly, protein-ligand complexes are fetched from the PDB (T8) and used to generate ligand-based ensemble pharmacophores (T9). Geometry-based binding site comparison of proteins that bind the kinase inhibitor imatinib is performed to analyze potential off-targets (T10). In summary, the presented talktorials build a pipeline with two starting points: (i) a query protein to study associated compound data (T1 and T8) and (ii) a query ligand to investigate associated on- and off-targets (T10), see Fig. 1. These talktorials can be studied independently from each other or as a pipeline.

As an example, the talktorial pipeline is used to identify novel EGFR kinase inhibitors. EGFR kinase is a transmembrane protein that activates several signaling cascades to convert extracellular signals into cellular
responses. Dysfunctional signaling of EGFR is associated with diseases such as cancer, making it a frequent target in drug development projects (the reader is referred to a review by Chen et al. [29] for more information on EGFR). Furthermore, the pipeline can easily be adapted to other examples by simply exchanging the query protein (T1 and T8: protein UniProt ID) and query ligand (T10: ligand names in the PDB).

Results
In the following, the content of each talktorial is briefly discussed and summarized in Fig. 1. If not noted otherwise, tasks are conducted with RDKit or basic Python libraries as stated in the Methods section. Note that reported numbers and results are based on data sets from ChEMBL and PDB queries conducted in November 2018.

T1. Data acquisition from ChEMBL. Compound information on structure, bioactivity, and associated targets is organized in databases such as ChEMBL, PubChem [30], or DrugBank [31]. For the query target EGFR (UniProt ID P00533), compound data including molecular structure (SMILES) and bioactivity data are automatically fetched from the ChEMBL database using the ChEMBL webresource client and filtered, e.g. for binding assays and IC50 measurements (6,641 compounds). The data set is formatted and further filtered: e.g. duplicates and entries with missing values are dropped, and only bioactivity values in molar units are kept and converted to pIC50 values (4,771 compounds retained, referred to as data set T1), see Fig. 1.T1.

T2. Molecular filtering: ADME criteria. Not all compounds are suitable starting points for drug development due to undesirable pharmacokinetic properties, which, for instance, negatively affect a drug's absorption, distribution, metabolism, and excretion (ADME). Therefore, such compounds are usually not included in data sets for VS. Data set T1 is filtered by drug-likeness criteria, i.e. Lipinski's rule of five [32], in order to remove less drug-like molecules from the EGFR data set (4,009 compounds retained, referred to as data set T2). This data set is visualized using radar plots illustrating the compounds' ADME-related properties, see Fig. 1.T2, and serves as the starting point for several talktorials discussed in the following.

T3. Molecular filtering: unwanted substructures. Compounds can contain unwanted substructures that may cause mutagenic, reactive, or other unfavorable pharmacokinetic effects [33] or that may lead to non-specific interactions with assays (pan-assay interference compounds, PAINS) [34]. Such unwanted substructures are detected and highlighted in data set T2. This knowledge can be integrated into cheminformatics pipelines either to perform an additional filtering step before screening (1,951 compounds retained) or, more often, to flag potentially problematic compounds, which can then be evaluated manually by medicinal chemists if they are reported as hits after screening, see Fig. 1.T3.

T4. Ligand-based screening: compound similarity. In VS, compounds similar to known ligands of a target under investigation often constitute the starting point for drug development. This approach follows the similar property principle, which states that structurally similar compounds are more likely to exhibit similar biological activities [35, 36] (exceptions are so-called activity cliffs [37]). For computational representation and processing, compound properties can be encoded in the form of bit arrays, so-called molecular fingerprints, e.g. MACCS [38] and Morgan fingerprints [39, 40]. Compound similarity can be assessed by comparison measures such as the Tanimoto and Dice similarity [41]. Using these encoding and comparison methods, VS is conducted as a similarity search: the EGFR inhibitor gefitinib is used as query to find its most similar compounds in data set T2. With the data split into active and inactive compounds based on the chosen pIC50 cutoff of 6.3, screening results are evaluated with enrichment plots, see Fig. 1.T4. In the top 5% of the compounds ranked by similarity, 8.3% of all actives are retrieved; this value is called the enrichment factor at 5% (EF5%). The random and optimal EF5% of this data set are 5.0% and 9.2%, respectively.

T5. Compound clustering. The similar property principle can also be used to identify groups of similar compounds via clustering, in order to pick a set of diverse compounds from these clusters, e.g. for non-redundant experimental testing. In this talktorial, Butina clustering [42] based on the RDKFingerprint [43] is applied to cluster data set T2 at a Tanimoto distance cutoff of 0.2, resulting in 988 clusters, the largest of which consists of 143 compounds, see Fig. 1.T5. Following the example in the TDT pipeline by Riniker et al. [11], a maximum of 1,000 compounds is subsequently picked by selecting the ten most similar compounds per cluster (or 50% of the members of smaller clusters), starting with the largest cluster. Thereby, compound diversity is ensured (representatives of each cluster), while structure-activity relationship (SAR) information is retained (most similar compounds selected from clusters).

T6. Maximum common substructures. In order to visualize shared scaffolds and thereby emphasize the extent and type of chemical similarities or differences within a compound cluster, the maximum common substructure (MCS) [44] can be calculated and highlighted. The MCS for the largest cluster from T5 is calculated using the FMCS algorithm [45], see Fig. 1.T6. Different parameters can be applied, e.g. a threshold to set the percentage of compounds in the set that need to share the same MCS.
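The similarity search in T4 and the Butina clustering in T5 reduce to a few small computations. The following dependency-free Python sketch illustrates them under stated assumptions: fingerprints are modeled as sets of on-bit indices rather than RDKit bit vectors, the function names `tanimoto`, `rank_by_similarity`, `enrichment_at`, and `butina_clusters` are hypothetical, and the toy fingerprints and activity labels are invented. The enrichment function follows the EF definition used in T4 (percentage of all actives retrieved in the top x% of the similarity ranking). The talktorials themselves rely on RDKit for fingerprints, similarity measures, and clustering (RDKit ships a Butina implementation in `rdkit.ML.Cluster.Butina`), so this is an illustration, not their code.

```python
from typing import Dict, List, Sequence, Set

# Assumption for this sketch: a binary fingerprint is the set of its "on" bit indices.
Fingerprint = Set[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared on-bits divided by all on-bits; 0.0 for two empty sets (convention chosen here)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query: Fingerprint, library: Dict[str, Fingerprint]) -> List[str]:
    """Compound names sorted by decreasing Tanimoto similarity to the query (ties keep input order)."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)


def enrichment_at(ranked: Sequence[str], actives: Set[str], top_percent: float = 5.0) -> float:
    """Percentage of all actives retrieved in the top `top_percent` % of the ranked list.

    This mirrors the EF definition used in T4; other sources define the
    enrichment factor as a ratio of hit rates instead.
    """
    n_top = max(1, round(len(ranked) * top_percent / 100))  # always inspect at least one compound
    found = sum(1 for name in ranked[:n_top] if name in actives)
    return 100.0 * found / len(actives)


def butina_clusters(fps: Sequence[Fingerprint], cutoff: float = 0.2) -> List[List[int]]:
    """Butina-style clustering on Tanimoto distance (1 - similarity).

    Neighbors of a compound are all other compounds within `cutoff`. Compounds are
    visited by decreasing neighbor count; each still-unassigned compound becomes a
    cluster centroid and absorbs its still-unassigned neighbors.
    """
    n = len(fps)
    neighbors = [
        [j for j in range(n) if j != i and 1.0 - tanimoto(fps[i], fps[j]) <= cutoff]
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned = [False] * n
    clusters: List[List[int]] = []
    for centroid in order:
        if assigned[centroid]:
            continue
        members = [centroid] + [j for j in neighbors[centroid] if not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(members)
    return clusters


# Toy example with invented fingerprints and activity labels.
query = {1, 2, 3, 4, 5}
library = {"A": {1, 2, 3, 4, 5}, "B": {1, 2, 3, 4, 6}, "C": {1, 2, 3, 7, 8},
           "D": {9, 10, 11}, "E": {1, 9, 10}}
ranked = rank_by_similarity(query, library)             # -> ['A', 'B', 'C', 'E', 'D']
ef = enrichment_at(ranked, {"A", "C"}, top_percent=40)  # top 2 of 5 hold 1 of 2 actives -> 50.0

fps = [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 5, 7},
       {20, 21, 22, 23, 24}, {20, 21, 22, 23, 24, 25}, {40, 41}]
clusters = butina_clusters(fps, cutoff=0.2)             # -> [[0, 1, 2], [3, 4], [5]]
```

With the Tanimoto distance cutoff of 0.2 used in T5, two compounds count as neighbors only if their Tanimoto similarity is at least 0.8, so the toy fingerprints fall into three clusters. Visiting centroids in order of decreasing neighbor count lets dense regions of chemical space claim their members first, which tends to produce the largest clusters early.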
A further MCS option is a restriction to match ring bonds only with other ring bonds.

T7. Ligand-based screening: machine learning. With the continuously increasing amount of available data, machine learning (ML) has gained momentum in drug discovery, especially in ligand-based VS, to predict the activity of novel compounds against a target of interest. The EGFR compound data set is split into active and inactive compounds as described in T4 and used to train ML classifiers based on random forests (RF) [46], support vector machines (SVM) [47], and artificial neural networks (ANN) [48], applying 10-fold cross validation. Models are evaluated using receiver operating characteristic (ROC) curves and mean area under the curve (AUC) values (mean AUC results for RF, SVM, and ANN are 90%, 87%, and 87%, respectively), see Fig. 1.T7. The trained models can be used to classify an unknown screening data set in order to predict novel potential EGFR inhibitors.

T8. Data acquisition from PDB. The PDB holds 3D structural data and meta information on experimentally resolved proteins. Using PyPDB, all EGFR structures are automatically fetched from the PDB (by UniProt ID) and filtered for ligand-bound structures resolved with X-ray crystallography, retaining four EGFR-ligand structures with good structural resolution. Using the Python integration of the molecular visualization tool PyMOL, those structures are subsequently aligned to each other in 3D. Ligands are extracted, see Fig. 1.T8, and saved to be used in T9 for the generation of a ligand-based ensemble pharmacophore.

T9. Ligand-based ensemble pharmacophores. Besides a similarity search (T4) or machine learning classifiers (T7), another approach for ligand-based VS is the use of ligand-based (ensemble) pharmacophore models. They describe the steric and physicochemical properties that a ligand (or a set of ligands) needs in order to bind a target under investigation. Examples of physicochemical properties are so-called donor, acceptor, and hydrophobic pharmacophoric features present in a molecule [49, 50]. For the EGFR ligands selected and aligned in T8, pharmacophoric features are identified for each ligand and subsequently clustered with k-means clustering [51] in order to define an ensemble pharmacophore, see Fig. 1.T9. Such a pharmacophore represents the properties of the set of known EGFR ligands and can be used to search for novel EGFR ligands via VS, as described in an RDKit pharmacophore tutorial by Stiefl [52].

T10. Off-target prediction and binding site comparison. Off-targets are proteins that interact with a drug or (one of) its metabolite(s) without being the designated target, potentially causing unwanted side effects. Off-targets mainly occur because they share similar structural motifs in their binding site with on-targets and are therefore able to bind similar ligands. Computational off-target prediction using binding site comparison is an established approach in early stages of drug development [53, 54]. In T10, structural similarity is assessed, as an illustration, with a basic measure: the geometrical variation between structures, obtained by calculating the root mean square deviation (RMSD) between pairs of aligned structures using PyMOL, either for the whole proteins or focusing on their binding sites. Pairwise RMSD comparison of seven protein structures binding imatinib, a small molecule tyrosine kinase inhibitor for cancer treatment, is able to separate tyrosine kinases (on-targets) from quinone reductase (reported off-target [55]), see Fig. 1.T10.

Conclusion
The presented teaching platform TeachOpenCADD aims to introduce interested students and researchers to the ease and benefit of using open access resources for cheminformatics and structural bioinformatics. Jupyter notebooks (talktorials) offer detailed theoretical background and Python code examples, forming an automated pipeline that saves and reloads results from one topic to another. The pipeline is illustrated using the example of EGFR, but can easily be adapted to other examples by exchanging the input protein and ligand. Beyond their teaching purpose for self-study training and classroom lessons, the talktorials can serve as a starting point for users' project-directed modifications and extensions. TeachOpenCADD intends to continuously expand existing topics and add new ones, and is open to contributions and ideas from the community.

Abbreviations
CADD: computer-aided drug design; GUI: graphical user interface; CDK: Chemistry Development Kit; TDT: Teach–Discover–Treat; VS: virtual screening; EGFR: epidermal growth factor receptor; ADME: absorption, distribution, metabolism, excretion; SAR: structure–activity relationship; MCS: maximum common substructure; ML: machine learning; RF: random forest; SVM: support vector machine; ANN: artificial neural network; ROC: receiver operating characteristic; AUC: area under the curve; RMSD: root mean square deviation; EF: enrichment factor.

Authors' contributions
All authors (DS, AM, MD, and AV) contributed to implementing the platform, finalizing the talktorials, and editing/reviewing the manuscript. DS was responsible for management and major writing, and AV for conceptualization, management, and writing. All authors read and approved the final manuscript.

Acknowledgements
The authors thank the participants of the CADD seminar courses in 2017 and 2018 (joint bioinformatics study program at the Freie Universität Berlin and the Charité) for working on the reported talktorials: Svetlana Leng and Paula Junge (T1), Mathias Wajnberg and Michele Ritschel (T2), Maximilian Driller and Sandra Krüger (T3), Andrea Morger and Franziska Fritz (T4), Gizem Spriewald and Calvinna Caswara (T5), Oliver Nagel (T6), Jacob Gora and Jan Philipp Albrecht (T7), Majid Vafadar and Anja Georgi (T8), Pratik Dhakal and
Florian Gusewski (T9), as well as Angelika Szengel and Marvis Sydow (T10). Additionally, the authors acknowledge Greg Landrum and Boran Adas for their feedback on the talktorials. Finally, the authors express their gratitude to the Freie Universität Berlin for supporting the TeachOpenCADD project (SUPPORT für die Lehre: Förderung innovativer Lehrvorhaben).

Competing interests
The authors declare that they have no competing interests.

Availability and requirements
Project name: TeachOpenCADD. Project home page: https://github.com/volkamerlab/TeachOpenCADD. Operating system(s): Platform independent. Programming language: Python. Other requirements: Databases: ChEMBL and PDB. Python packages: RDKit, ChEMBL webresource client, PyPDB, BioPandas, PyMOL, numpy, pandas, scikit-learn, matplotlib, seaborn, and conda. License: http://creativecommons.org/licenses/by/4.0/. Any restrictions to use by non-academics: Not applicable.

Availability of data and materials
TeachOpenCADD talktorial material is available at https://github.com/volkamerlab/TeachOpenCADD. Compound and protein structure data used for the EGFR example in the talktorials are fetched from the ChEMBL (query by UniProt ID "P00533") and PDB (queries by UniProt ID "P00533" and by the ligand identifiers "STI" and "imatinib") databases.

Funding
The authors receive funding from the Bundesministerium für Bildung und Forschung (AV: Grant Number 031A262C), Deutsche Forschungsgemeinschaft (DFG) (AV and DS: Grant Number 391684253), and the HaVo-Stiftung, Ludwigshafen, Germany (AM). The authors acknowledge support from the German Research Foundation (DFG) and the Open Access Publication Fund of Charité – Universitätsmedizin Berlin.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 19 December 2018 Accepted: 27 March 2019

References
1. Pirhadi S, Sunseri J, Koes DR (2016) Open source molecular modeling. J Mol Graph Modell 69:127–43
2. Swiss Institute of Bioinformatics (2013) Click2Drug website. http://www.click2drug.org/. Accessed 18 Dec 2018
3. Daina A, Blatter MC, Baillie Gerritsen V, Palagi PM, Marek D, Xenarios I, Schwede T, Michielin O, Zoete V (2017) Drug design workshop: a web-based educational tool to introduce computer-aided drug design to the general public. J Chem Educ 94:335–344
4. Swiss Institute of Bioinformatics (2015) Drug Design Workshop website. www.drug-design-workshop.ch. Accessed 18 Dec 2018
5. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci 43:493–500
6. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL (2006) Recent developments of the Chemistry Development Kit (CDK) – an open-source Java library for chemo- and bioinformatics. Curr Pharm Des 12:2111–20
7. May JW, Steinbeck C (2014) Efficient ring perception for the Chemistry Development Kit. J Cheminf 6:3
8. Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Chertó M, Torrance G, Evelo CT, Guha R, Steinbeck C (2017) The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminf 9:33
9. Chemistry Development Kit (2017) Chemistry Development Kit (CDK) website. https://cdk.github.io/. Accessed 18 Dec 2018
10. Jansen JM, Cornell W, Tseng YJ, Amaro RE (2012) Teach–Discover–Treat (TDT): collaborative computational drug discovery for neglected diseases. J Mol Graph Modell 38:360–2
11. Riniker S, Landrum GA, Montanari F, Villalba SD, Maier J, Jansen JM, Walters WP, Shelat AA (2017) Virtual-screening workflow tutorials and prospective results from the Teach–Discover–Treat competition 2014 against malaria. F1000Research 6:1136
12. Riniker S, Landrum GA, Montanari F, Villalba SD, Maier J, Jansen JM, Walters WP, Shelat AA (2017) Tutorial for the Teach–Discover–Treat (TDT) Competition 2014 – Challenge 1: anti-malaria hit finding using classifier-fusion boosted predictive models. https://github.com/sriniker/TDT-tutorial-2014. Accessed 18 Dec 2018
13. Kluyver T, Ragan-Kelley B, Pérez F, Granger B, Bussonnier M, Frederic J, Kelley K, Hamrick J, Grout J, Corlay S, Ivanov P, Avila D, Abdalla S, Willing C, Jupyter Development Team (2016) Jupyter Notebooks – a publishing format for reproducible computational workflows. In: Loizides F, Schmidt B (eds) Positioning and power in academic publishing: players, agents and agendas. IOS Press, Amsterdam, pp 87–90
14. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40:1100–7
15. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235–42
16. RDKit (2018) RDKit: Open-Source Cheminformatics, Version 2018.09.1. http://www.rdkit.org
17. Davies M, Nowotka M, Papadatos G, Dedman N, Gaulton A, Atkinson F, Bellis L, Overington JP (2015) ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Res 43:W612–W620
18. Gilpin W (2015) PyPDB: a Python API for the protein data bank. Bioinformatics 32:159–60
19. Raschka S (2017) BioPandas: working with molecular structures in pandas DataFrames. J Open Source Softw 2:279
20. Schrödinger L (2015) The PyMOL molecular graphics system. Version 1.8
21. Oliphant T (2006) A guide to NumPy. Trelgol Publishing
22. van der Walt S, Colbert SC, Varoquaux G (2011) The NumPy array: a structure for efficient numerical computation. Comput Sci Eng 13(2):22–30
23. McKinney W (2010) Data structures for statistical computing in Python. In: van der Walt S, Millman J (eds) Proceedings of the 9th Python in science conference, pp 51–56
24. McKinney W (2011) pandas: a foundational Python library for data analysis and statistics
25. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
26. Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9:90–95
27. Waskom M (2018) seaborn v0.9.0
28. Continuum Analytics Inc (dba Anaconda Inc) (2017) conda. https://www.anaconda.com. Accessed 18 Dec 2018
29. Chen J, Zeng F, Forrester SJ, Eguchi S, Zhang MZ, Harris RC (2016) Expression and function of the epidermal growth factor receptor in physiology and disease. Physiol Rev 96:1025–1069
30. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH (2016) PubChem substance and compound databases. Nucleic Acids Res 44:D1202–D1213
31. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34:D668–D672
32. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ (1997) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 23:3–25
33. Brenk R, Schipani A, James D, Krasowski A, Gilbert IH, Frearson J, Wyatt PG (2008) Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem 3:435–444
34. Baell JB, Holloway GA (2010) New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem 53:2719–2740
35. Johnson MA, Maggiora GM (1990) Concepts and applications of molecular similarity, 1st edn. Wiley, New York
36. Bender A, Glen RC (2004) Molecular similarity: a key technique in molecular informatics. Org Biomol Chem 2:3204
37. Bajorath J (2017) Representation and identification of activity cliffs. Expert Opin Drug Discov 12:879–883
38. Accelrys Inc, San Diego, CA, USA (2011) MACCS structural keys
39. Morgan HL (1965) The generation of a unique machine description for chemical structures – a technique developed at Chemical Abstracts Service. J Chem Doc 5:107–113
40. Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742–754
41. Maggiora G, Vogt M, Stumpfe D, Bajorath J (2014) Molecular similarity in medicinal chemistry. J Med Chem 57:3186–3204
42. Butina D (1999) Unsupervised data base clustering based on Daylight's fingerprint and Tanimoto similarity: a fast and automated way to cluster small and large data sets. J Chem Inf Comput Sci 39:747–750
43. RDKit (2018) RDKFingerprint. http://rdkit.org/docs/source/rdkit.Chem.rdmolops.html. Accessed 18 Dec 2018
44. Raymond JW, Willett P (2002) Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J Comput-Aided Mol Des 16:521–33
45. Dalke A, Hastings J (2013) FMCS: a novel algorithm for the multiple MCS problem. J Cheminf 5:O6
46. Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition, vol 1. IEEE Comput Soc Press, Los Alamitos, California, pp 278–282
47. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
48. van Gerven M, Bohte S (2017) Editorial: artificial neural networks as models of neural information processing. Front Comput Neurosci 11:114
49. Wermuth CG, Ganellin CR, Lindberg P, Mitscher LA (1998) Glossary of terms used in medicinal chemistry (IUPAC Recommendations 1998). Pure Appl Chem 70:1129–1143
50. Seidel T, Wolber G, Murgueitio MS (2018) Pharmacophore perception and applications. In: Applied chemoinformatics. Wiley, Weinheim, pp 259–282
51. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: 5th Berkeley symposium on mathematical statistics and probability, pp 281–297
52. Stiefl N (2016) 3D pharmacophores in the RDKit. https://github.com/rdkit/UGM_2016/blob/master/Notebooks/Stiefl_RDKitPh4FullPublication.ipynb. Accessed 18 Dec 2018
53. Kellenberger E, Schalon C, Rognan D (2008) How to measure the similarity between protein ligand-binding sites? Curr Comput-Aided Drug Des 4:209–220
54. Ehrt C, Brinkjost T, Koch O (2016) Impact of binding site comparisons on medicinal chemistry and rational molecular design. J Med Chem 59:4121–4151
55. Winger JA, Hantschel O, Superti-Furga G, Kuriyan J (2009) The structure of the leukemia drug imatinib bound to human quinone reductase 2 (NQO2). BMC Struct Biol 9:7