Describe the bug
rdkit.Chem.Pharm2D.Generate.Gen2DFingerprint calls GetUniqueCombinations, which is unnecessarily slow. A much faster implementation, GetUniqueCombinations_new, already exists alongside it in Chem.Pharm2D.Utils.
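The relevant functions are, in order (file paths and line numbers as of commit 6867286):
- rdkit/rdkit/Chem/Pharm2D/Generate.py, line 81: def Gen2DFingerprint(mol, sigFactory, perms=None, dMat=None, bitInfo=None):
- rdkit/rdkit/Chem/Pharm2D/Utils.py, line 324: def GetUniqueCombinations(choices, classes, which=0):
- rdkit/rdkit/Chem/Pharm2D/Utils.py, line 348: def GetUniqueCombinations_new(choices, classes, which=0):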
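The report does not say why GetUniqueCombinations is slow. One plausible cause, offered here as an assumption rather than a reading of the RDKit source, is deduplication by list membership: a check like `new not in res` scans every combination found so far, so total cost grows roughly quadratically with the number of combinations, while collecting sorted tuples in a set makes each check constant time on average. The sketch below contrasts the two strategies using two hypothetical functions, `combos_list_dedup` and `combos_set_dedup`. They are illustrations of the technique, not RDKit's code.

```python
import itertools
import timeit
from typing import Any, List, Sequence, Tuple

Pair = Tuple[Any, Any]


def combos_list_dedup(choices: Sequence[Sequence[Any]],
                      classes: Sequence[Any],
                      which: int = 0) -> List[List[Pair]]:
    """Hypothetical recursive enumeration that deduplicates with a list scan."""
    assert len(choices) == len(classes)
    if which >= len(choices):
        return []
    if which == len(choices) - 1:
        return [[(classes[which], x)] for x in choices[which]]
    res: List[List[Pair]] = []
    tail = combos_list_dedup(choices, classes, which + 1)
    for thing in choices[which]:
        for other in tail:
            # Skip combinations that would use the same choice twice.
            if any(x[1] == thing for x in other):
                continue
            new = sorted([(classes[which], thing)] + other)
            if new not in res:  # linear scan over everything found so far
                res.append(new)
    return res


def combos_set_dedup(choices: Sequence[Sequence[Any]],
                     classes: Sequence[Any]) -> List[List[Pair]]:
    """Hypothetical flat enumeration that deduplicates with a hash set."""
    assert len(choices) == len(classes)
    seen = set()
    for picked in itertools.product(*choices):
        # Skip combinations that would use the same choice twice.
        if len(set(picked)) != len(picked):
            continue
        # Sorting makes permutations within a repeated class identical.
        seen.add(tuple(sorted(zip(classes, picked))))
    return [list(combo) for combo in sorted(seen)]


if __name__ == "__main__":
    # Two slots share class 0, so permutations of the same picks collapse.
    classes = [0, 0, 1]
    choices = [[(i,) for i in range(15)]] * 3
    slow = combos_list_dedup(choices, classes)
    fast = combos_set_dedup(choices, classes)
    assert sorted(slow) == fast
    print(len(fast), "unique combinations")
    print("list dedup:", timeit.timeit(lambda: combos_list_dedup(choices, classes), number=1))
    print("set dedup: ", timeit.timeit(lambda: combos_set_dedup(choices, classes), number=1))
```

Both functions return the same combinations (up to ordering), so the only difference is how duplicates are detected. Growing `range(15)` makes the gap between the two timings widen quickly.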
To Reproduce
The following code currently takes about 75 seconds to run:
from rdkit import Chem
from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D
smi = "CC=CC=C(S1C=C)C(O)C=C1NCCCCCCCCCCCCCCCCCCCCCCCCCCCC=CCCCCCCCCCCCCCCCCCCCCCCCCC=CCCCCCCCCCCCCCCC=CC=CCCC=CC=CCC=CCC=CCCCCCC=CCCC[P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1]"
mol = Chem.MolFromSmiles(smi)
Generate.Gen2DFingerprint(mol, Gobbi_Pharm2D.factory)
Expected behavior
If the GetUniqueCombinations call in Gen2DFingerprint is replaced with GetUniqueCombinations_new, the same code runs in about 1 second.
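Until Gen2DFingerprint itself is changed, the replacement can be tried without editing the installed package by patching the module attribute at runtime. This is a minimal sketch under one assumption: that Generate.py looks the function up as `Utils.GetUniqueCombinations` each time it runs. If it instead imports the name directly, the patch has no effect. The helper names `fast_unique_combinations` and `pharm2d_on_bits` are hypothetical. The sketch also returns the fingerprint's on-bits so the slow and fast paths can be compared; that they agree is something to check, not something the report establishes.

```python
import time
from contextlib import contextmanager
from typing import Iterator, Tuple
from unittest import mock


@contextmanager
def fast_unique_combinations() -> Iterator[None]:
    """Temporarily point Utils.GetUniqueCombinations at GetUniqueCombinations_new.

    Assumption: Generate.Gen2DFingerprint looks the function up as
    Utils.GetUniqueCombinations at call time, so patching the module
    attribute is enough. The original function is restored on exit.
    """
    # Imported lazily so this file can be loaded without RDKit installed.
    from rdkit.Chem.Pharm2D import Utils

    with mock.patch.object(Utils, "GetUniqueCombinations",
                           Utils.GetUniqueCombinations_new):
        yield


def pharm2d_on_bits(smiles: str, fast: bool) -> Tuple[Tuple[int, ...], float]:
    """Return (on-bits, seconds) for one Gobbi 2D pharmacophore fingerprint."""
    from rdkit import Chem
    from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    start = time.perf_counter()
    if fast:
        with fast_unique_combinations():
            fp = Generate.Gen2DFingerprint(mol, Gobbi_Pharm2D.factory)
    else:
        fp = Generate.Gen2DFingerprint(mol, Gobbi_Pharm2D.factory)
    elapsed = time.perf_counter() - start
    return tuple(fp.GetOnBits()), elapsed
```

Calling `pharm2d_on_bits(smi, fast=True)` and `pharm2d_on_bits(smi, fast=False)` with the SMILES from the reproduction script would be expected to show a timing gap like the one reported, with identical on-bits.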
Configuration (please complete the following information):
- RDKit version: 2024.09.4
- OS: OpenSUSE Tumbleweed
- Python version (if relevant): 3.12.8
- Are you using conda? No