
Gen2DFingerprint unnecessarily slow #8207

@haydn-jones


Describe the bug
Chem.Pharm2D.Generate.Gen2DFingerprint calls GetUniqueCombinations, which is unnecessarily slow. There is a much faster implementation called GetUniqueCombinations_new.
Here are direct links to the respective functions in order:

def Gen2DFingerprint(mol, sigFactory, perms=None, dMat=None, bitInfo=None):

def GetUniqueCombinations(choices, classes, which=0):

def GetUniqueCombinations_new(choices, classes, which=0):

To Reproduce
This code currently takes about 75 seconds to run:

from rdkit import Chem
from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D

smi = "CC=CC=C(S1C=C)C(O)C=C1NCCCCCCCCCCCCCCCCCCCCCCCCCCCC=CCCCCCCCCCCCCCCCCCCCCCCCCC=CCCCCCCCCCCCCCCC=CC=CCCC=CC=CCC=CCC=CCCCCCC=CCCC[P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1][P+1]"
mol = Chem.MolFromSmiles(smi)

Generate.Gen2DFingerprint(mol, Gobbi_Pharm2D.factory)

Expected behavior
If the call to GetUniqueCombinations inside Gen2DFingerprint is replaced with GetUniqueCombinations_new, the same code takes about 1 second.
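
One way to try the swap without editing the installed package is to patch the function temporarily and time both runs. The helper names below (`time_call`, `compare_with_patch`) are hypothetical and not part of RDKit. The commented usage assumes that Generate looks up GetUniqueCombinations through the Utils module at call time, which I have not verified here, so treat this as a rough sketch rather than a confirmed recipe.

```python
import time
from unittest import mock


def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_with_patch(owner, name, replacement, fn, *args, **kwargs):
    """Time fn as-is, then again with owner.<name> temporarily replaced.

    Returns ((result, seconds), (patched_result, patched_seconds)).
    The original attribute is restored when the patch context exits.
    """
    baseline = time_call(fn, *args, **kwargs)
    with mock.patch.object(owner, name, replacement):
        patched = time_call(fn, *args, **kwargs)
    return baseline, patched


# Hypothetical usage against RDKit (not run here; assumes Generate calls
# Utils.GetUniqueCombinations via the Utils module, and that `mol` is the
# molecule from the reproduction script):
#
#   from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D, Utils
#   base, patched = compare_with_patch(
#       Utils, "GetUniqueCombinations", Utils.GetUniqueCombinations_new,
#       Generate.Gen2DFingerprint, mol, Gobbi_Pharm2D.factory)
#   print(f"original: {base[1]:.1f}s, patched: {patched[1]:.1f}s")
#   # Sanity check that both runs set the same bits:
#   assert list(base[0].GetOnBits()) == list(patched[0].GetOnBits())
```

Comparing the on bits of the two fingerprints, as in the commented usage, would confirm that the faster function does not change the result for this molecule.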

Configuration (please complete the following information):

  • RDKit version: 2024.09.4
  • OS: OpenSUSE Tumbleweed
  • Python version (if relevant): 3.12.8
  • Are you using conda? No
  • If you are using conda, which channel did you install the rdkit from?
  • If you are not using conda: how did you install the RDKit?
