Skip to content

Use GetUniqueCombinations_new instead of GetUniqueCombinations#8214

Merged
greglandrum merged 5 commits intordkit:masterfrom
bertiewooster:GetUniqueCombinations_new
Feb 6, 2025
Merged

Use GetUniqueCombinations_new instead of GetUniqueCombinations#8214
greglandrum merged 5 commits intordkit:masterfrom
bertiewooster:GetUniqueCombinations_new

Conversation

@bertiewooster
Copy link
Copy Markdown
Contributor

@bertiewooster bertiewooster commented Jan 26, 2025

Reference Issue

Fixes #8207

What does this implement/fix? Explain your changes.

Remove GetUniqueCombinations, use instead GetUniqueCombinations_new for improved speed; remove GetUniqueCombinations_new's unused and uncalled parameter which. Update related tests.

Any other comments?

This PR makes the following choices; alternatives are noted:

  • Removes the old function GetUniqueCombinations to prevent its use in the future; that old function could instead be kept.
  • Keeps the name GetUniqueCombinations_new to indicate that it's the new algorithm; the name of the new function could be changed to GetUniqueCombinations to prevent confusion over why there's a function with the prefix _new and no function without that prefix.

@greglandrum
Copy link
Copy Markdown
Member

@bertiewooster I would suggest replacing GetUniqueCombinations() with GetUniqueCombinations_new() and then adding:

GetUniqueCombinations_new = GetUniqueCombinations

to maintain backwards compatibility

@bertiewooster
Copy link
Copy Markdown
Contributor Author

I applied the tests from GetUniqueCombinations() to GetUniqueCombinations_new() and one test is failing with error Combinations are not in the same order on about half of the platforms. Is it in fact a problem if the combinations come in the wrong order? In general I don't think of combinations as being ordered, but there might be a need for that here. If the order matters, I'll have to investigate further.

Here's the platforms the test fails on:

  • Ubuntu_x64
  • Ubuntu_x64_limitexternaldependencies
  • Ubuntu_x64_py311
  • Windows_VS2022_x64
  • macOS_x64

whereas all tests succeed on the other platforms:

  • Ubuntu_x64_cartridge
  • Ubuntu_x64_java
  • Windows_VS2022_x64_dll
  • Windows_VS2022_x64_swig
  • macOS_x64_java

This makes me wonder if itertools.product gives a different order for choice on some platforms. (The other conceivable possibility is that sorted gives different orders on on some platforms, but I would very much like to think that Python's tests would ensure that's not the case.)

@bertiewooster
Copy link
Copy Markdown
Contributor Author

bertiewooster commented Jan 27, 2025

It turns out that GetUniqueCombinations() and GetUniqueCombinations_new() return the same combinations but in different orders, for example:

old[0] = [(1, (11,)), (1, (12,)), (2, (31,))]
new[0] = [(1, (11,)), (1, (12,)), (2, (31,))]
^--same
old[1] = [(1, (11,)), (1, (12,)), (2, (32,))]
new[1] = [(1, (11,)), (1, (12,)), (2, (32,))]
^--same
old[2] = [(1, (11,)), (1, (13,)), (2, (31,))]
new[2] = [(1, (11,)), (1, (13,)), (2, (31,))]
^--same
old[3] = [(1, (11,)), (1, (13,)), (2, (32,))]
new[3] = [(1, (11,)), (1, (13,)), (2, (32,))]
^--same
old[4] = [(1, (12,)), (1, (13,)), (2, (31,))]
new[4] = [(1, (11,)), (1, (14,)), (2, (31,))]
^--different but old[4] is the same as new[6]
old[5] = [(1, (12,)), (1, (13,)), (2, (32,))]
new[5] = [(1, (11,)), (1, (14,)), (2, (32,))]
^--different but old[5] is the same as new[7]
old[6] = [(1, (11,)), (1, (14,)), (2, (31,))]
new[6] = [(1, (12,)), (1, (13,)), (2, (31,))]
^--different but old[6] is the same as new[4]
old[7] = [(1, (11,)), (1, (14,)), (2, (32,))]
new[7] = [(1, (12,)), (1, (13,)), (2, (32,))]
^--different but old[7] is the same as new[5]
old[8] = [(1, (12,)), (1, (14,)), (2, (31,))]
new[8] = [(1, (12,)), (1, (14,)), (2, (31,))]
^--same
old[9] = [(1, (12,)), (1, (14,)), (2, (32,))]
new[9] = [(1, (12,)), (1, (14,)), (2, (32,))]
^--same
old[10] = [(1, (13,)), (1, (14,)), (2, (31,))]
new[10] = [(1, (13,)), (1, (14,)), (2, (31,))]
^--same
old[11] = [(1, (13,)), (1, (14,)), (2, (32,))]
new[11] = [(1, (13,)), (1, (14,)), (2, (32,))]
^--same

GetUniqueCombinations_new() has the combinations in numerical, whereas GetUniqueCombinations() has some out of order. So unless there's some reason to keep the combinations out of numerical order, I suggest we re-order the expected results of the tests to put them in numerical order. Alternatively, we could relax the test to check that the expected and actual results contain the same combinations, but not assert that their order is the same.
(Edited to strike out first suggestion; relaxing the test to check that the expected and actual results contain the same combinations, but not assert that their order is the same, should make the tests pass even if there are differences in the order of combinations on platforms.)

@bertiewooster
Copy link
Copy Markdown
Contributor Author

bertiewooster commented Jan 27, 2025

@bertiewooster I would suggest replacing GetUniqueCombinations() with GetUniqueCombinations_new() and then adding:

GetUniqueCombinations_new = GetUniqueCombinations

to maintain backwards compatibility

@greglandrum is this what you mean, to redefine GetUniqueCombinations to use the new code, and then "alias" GetUniqueCombinations_new to GetUniqueCombinations in rdkit/Chem/Pharm2D/Utils.py?

def GetUniqueCombinations(choices, classes):
  """  Does the combinatorial explosion of the possible combinations
    of the elements of _choices_.

    """
  assert len(choices) == len(classes)
  combos = set()
  for choice in itertools.product(*choices):
    # If a choice occurs in more than one of the fields, we ignore this case
    if len(set(choice)) != len(choice):
      continue
    combos.add(tuple(sorted((cls, ch) for cls, ch in zip(classes, choice))))
  return [list(combo) for combo in sorted(combos)]

GetUniqueCombinations_new = GetUniqueCombinations

And do you recommend calling GetUniqueCombinations_new or GetUniqueCombinations in places where the code is called, e.g. in Pharm2D and in tests?

@greglandrum
Copy link
Copy Markdown
Member

@greglandrum is this what you mean, to redefine GetUniqueCombinations to use the new code, and then "alias" GetUniqueCombinations_new to GetUniqueCombinations in rdkit/Chem/Pharm2D/Utils.py?

Yes.

And do you recommend calling GetUniqueCombinations_new or GetUniqueCombinations in places where the code is called,

I'd call GetUniqueCombinations() there.

Also, since the order of results is going to change, we should add a backwards incompatibility note in ReleaseNotes.md

…ve unused variable `x`. "Alias" GetUniqueCombinations_new to GetUniqueCombinations for backwards compatibility. Add note about backwards incompatibility for GetUniqueCombinations to ReleaseNotes.md.
…Combinations for order-insensitive comparison
@bertiewooster bertiewooster marked this pull request as ready for review January 28, 2025 18:04
@bertiewooster
Copy link
Copy Markdown
Contributor Author

Only test failure is JavaDiversityPickerTests on Ubuntu_x64_java, which seems unrelated to this PR.

@greglandrum
Copy link
Copy Markdown
Member

Only test failure is JavaDiversityPickerTests on Ubuntu_x64_java, which seems unrelated to this PR.

yeah, that's an intermittent failure that happens... no need for concern

Copy link
Copy Markdown
Member

@greglandrum greglandrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@greglandrum greglandrum added this to the 2024_09_6 milestone Feb 6, 2025
@greglandrum greglandrum merged commit 103b22d into rdkit:master Feb 6, 2025
@bertiewooster bertiewooster deleted the GetUniqueCombinations_new branch February 6, 2025 04:29
greglandrum pushed a commit that referenced this pull request Feb 28, 2025
* Remove GetUniqueCombinations, use instead GetUniqueCombinations_new for improved speed. Update related tests.

* Remove unused and uncalled parameter `which`

* Update tests for GetUniqueCombinations to be in numerical order. Remove unused variable `x`. "Alias" GetUniqueCombinations_new to GetUniqueCombinations for backwards compatibility. Add note about backwards incompatibility for GetUniqueCombinations to ReleaseNotes.md.

* Create _compareCombinationsOrderUnimportant and use for testGetUniqueCombinations for order-insensitive comparison

* Replace call to GetUniqueCombinations_new with GetUniqueCombinations in Gen2DFingerprint for consistency
ZontaNicola pushed a commit to ZontaNicola/rdkit that referenced this pull request Jul 14, 2025
…#8214)

* Remove GetUniqueCombinations, use instead GetUniqueCombinations_new for improved speed. Update related tests.

* Remove unused and uncalled parameter `which`

* Update tests for GetUniqueCombinations to be in numerical order. Remove unused variable `x`. "Alias" GetUniqueCombinations_new to GetUniqueCombinations for backwards compatibility. Add note about backwards incompatibility for GetUniqueCombinations to ReleaseNotes.md.

* Create _compareCombinationsOrderUnimportant and use for testGetUniqueCombinations for order-insensitive comparison

* Replace call to GetUniqueCombinations_new with GetUniqueCombinations in Gen2DFingerprint for consistency
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gen2DFingerprint unnecessarily slow

2 participants