
Add a standard, documented way to reduce a structure to a subset of chains

Open rdk opened this issue 4 years ago • 2 comments

...while setting all the metadata consistently and handling ligands intuitively (i.e. keeping the adjacent/bound ligands in).

The closest approximation of this functionality is new StructureName("4hhb.A").reduce(structure), but:

  1. It is limited to a single chain in implementation and in the name format (this is the main concern),
  2. It is not clear if and how it works with structure files that don't have a pdb id (homology models, AlphaFold models, "bare" pdb files with just ATOM/HETATM records),
  3. This functionality is quite hidden. Adding a more visible method, for example to StructureTools, would go a long way.

Some context for point 3: in version 5.4 there was a method Structure getReducedStructure(Structure s, String chainId) in StructureTools (also limited to a single chain), but it was removed in favor of the StructureIdentifier/StructureName framework. However, this usage is not documented (except in the tests, see TestStructureName.java), and it is not intuitive to look in a class called StructureName when the goal is to manipulate a structure's content.

The implementation of getReducedStructure() in version 5.4 shows that the process is quite involved and easy to get wrong if it is left to the users of the library.

If you decide to add this functionality, I'm happy to help with implementation/testing.

rdk, Dec 09 '21 12:12

Thanks for the feedback, @rdk. I see your point, especially about the lack of functionality for multiple chains.

I think adding such a method to StructureTools would be a good solution. There could be a few signatures:

Structure getReducedStructure(Structure s, List<String> chainIds);

Structure getReducedStructure(Structure s, List<String> chainIds, double thresholdToIncludeBoundNonPolymers);

Structure getReducedStructure(Structure s, List<Integer> entityIds);

Structure getReducedStructure(Structure s, List<Integer> entityIds, double thresholdToIncludeBoundNonPolymers);
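To illustrate what the thresholdToIncludeBoundNonPolymers overloads might do, here is a minimal, self-contained sketch. It is not BioJava code: Atom and Chain are hypothetical stand-ins for the library's types, and treating a non-polymer as "bound" when any of its atoms lies within the threshold distance of a requested chain is an assumption about the intended semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model: Atom and Chain here are NOT BioJava classes.
final class ChainSubsetSketch {

    static final class Atom {
        final double x, y, z; // coordinates in angstroms
        Atom(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
        double distanceTo(Atom o) {
            double dx = x - o.x, dy = y - o.y, dz = z - o.z;
            return Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
    }

    static final class Chain {
        final String id;
        final boolean polymer; // false for ligands and other non-polymers
        final List<Atom> atoms;
        Chain(String id, boolean polymer, List<Atom> atoms) {
            this.id = id; this.polymer = polymer; this.atoms = atoms;
        }
    }

    // Keeps the requested chains, plus any non-polymer chain with at least one
    // atom within `threshold` angstroms of an atom in a requested chain.
    // Assumption: "bound" is approximated purely by this distance cutoff.
    static List<Chain> reduce(List<Chain> chains, Set<String> chainIds, double threshold) {
        List<Chain> requested = new ArrayList<>();
        for (Chain c : chains) {
            if (chainIds.contains(c.id)) requested.add(c);
        }
        List<Chain> kept = new ArrayList<>();
        for (Chain c : chains) { // iterate the input again to preserve its order
            if (chainIds.contains(c.id) || (!c.polymer && isNear(c, requested, threshold))) {
                kept.add(c);
            }
        }
        return kept;
    }

    private static boolean isNear(Chain ligand, List<Chain> requested, double threshold) {
        for (Chain r : requested)
            for (Atom a : r.atoms)
                for (Atom b : ligand.atoms)
                    if (a.distanceTo(b) <= threshold) return true;
        return false;
    }

    public static void main(String[] args) {
        List<Chain> chains = Arrays.asList(
            new Chain("A", true, Arrays.asList(new Atom(0, 0, 0))),
            new Chain("B", true, Arrays.asList(new Atom(50, 0, 0))),
            new Chain("C", false, Arrays.asList(new Atom(3, 0, 0))),   // ligand near A
            new Chain("D", false, Arrays.asList(new Atom(52, 0, 0)))); // ligand near B
        Set<String> wanted = new HashSet<>(Arrays.asList("A"));
        for (Chain c : reduce(chains, wanted, 5.0)) System.out.println(c.id); // prints A, then C
    }
}
```

A real implementation would also have to carry over the metadata (entities, compounds, header information) consistently, which this sketch leaves out entirely.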

josemduarte, Dec 11 '21 00:12

I can implement it and send it as a PR with tests. @josemduarte, is this a contribution you would be interested in? If so, which branch / milestone should I target?

I was wondering if @aalhossary might have some input on that, since he was the one who removed the old StructureTools.getReducedStructure(Structure s, String chainId) method: https://github.com/biojava/biojava/commit/12d348d0ad2a4f252276d8e64cf46ace04fc37b0#diff-f9baf770ea89ce7172f7c688aa80f4277aa52372581b53f207b2d65311a2cbeeL1275

Structure reduction logic is currently implemented in SubstructureIdentifier.reduce(), and it should be possible to call new SubstructureIdentifier("3AA0.A,B").reduce(structure). However, this is not completely clear from the documentation comments, and there are no tests for it.
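To make the multi-chain name format concrete, here is a small, self-contained sketch of how an identifier such as "3AA0.A,B" splits into a structure id and a list of chain ids. IdentifierSketch is a hypothetical helper and not BioJava's parser: it assumes the plain ID.CHAIN[,CHAIN...] shape, ignores residue ranges, and splits on the first dot, so an id that itself contains a dot (for example a file name) would not survive the round trip.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, NOT BioJava's parser: handles only "ID.CHAIN[,CHAIN...]".
final class IdentifierSketch {
    final String structureId;     // e.g. "3AA0"
    final List<String> chainIds;  // empty means "the whole structure"

    IdentifierSketch(String structureId, List<String> chainIds) {
        this.structureId = structureId;
        this.chainIds = Collections.unmodifiableList(new ArrayList<>(chainIds));
    }

    // Splits on the FIRST dot; everything after it is a comma-separated chain list.
    static IdentifierSketch parse(String s) {
        int dot = s.indexOf('.');
        if (dot < 0) {
            return new IdentifierSketch(s, new ArrayList<String>());
        }
        List<String> chains = new ArrayList<>();
        for (String part : s.substring(dot + 1).split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) chains.add(trimmed);
        }
        return new IdentifierSketch(s.substring(0, dot), chains);
    }

    // Rebuilds the textual form, e.g. "3AA0" + ["A", "B"] becomes "3AA0.A,B".
    String format() {
        return chainIds.isEmpty() ? structureId : structureId + "." + String.join(",", chainIds);
    }

    public static void main(String[] args) {
        IdentifierSketch id = parse("3AA0.A,B");
        System.out.println(id.structureId + " -> " + id.chainIds);
    }
}
```

A List<String> chainIds overload could build such a string with format() before delegating, but that still needs some value for the id part, which is the open question for structures without a PDB id.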

To start, I could do the following:

  • making sure that SubstructureIdentifier.reduce() works for this case and writing some tests for it
  • adding StructureTools.getReducedStructure(Structure s, List<String> chainIds) that calls SubstructureIdentifier.reduce()

However, it looks like the implementation of the other 3 methods

Structure getReducedStructure(Structure s, List<String> chainIds, double thresholdToIncludeBoundNonPolymers);
Structure getReducedStructure(Structure s, List<Integer> entityIds);
Structure getReducedStructure(Structure s, List<Integer> entityIds, double thresholdToIncludeBoundNonPolymers);

would only be possible by duplicating and modifying the reduction logic from SubstructureIdentifier.

rdk, Apr 19 '22 08:04