Skip to content

Refactoring fragment.py for better maintainability and style#2360

Closed
JacksonBurns wants to merge 1 commit intoReactionMechanismGenerator:mainfrom
JacksonBurns:afm-refactor
Closed

Refactoring fragment.py for better maintainability and style#2360
JacksonBurns wants to merge 1 commit intoReactionMechanismGenerator:mainfrom
JacksonBurns:afm-refactor

Conversation

@JacksonBurns
Copy link
Copy Markdown
Contributor

Working with @lily90502 to take better advantage of existing infrastructure in RMG to make the new fragment features more maintainable and extensible.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 30, 2023

Codecov Report

Merging #2360 (295ef99) into main (2cdc2ea) will decrease coverage by 0.68%.
The diff coverage is 20.00%.

@@            Coverage Diff             @@
##             main    #2360      +/-   ##
==========================================
- Coverage   48.38%   47.70%   -0.68%     
==========================================
  Files         110      110              
  Lines       30517    30517              
  Branches     7951     7951              
==========================================
- Hits        14765    14558     -207     
- Misses      14211    14443     +232     
+ Partials     1541     1516      -25     
Impacted Files Coverage Δ
rmgpy/molecule/fragment.py 73.63% <20.00%> (ø)
rmgpy/qm/molecule.py 37.55% <0.00%> (-43.27%) ⬇️
rmgpy/qm/mopac.py 26.43% <0.00%> (-41.38%) ⬇️
rmgpy/qm/qmdata.py 52.77% <0.00%> (-27.78%) ⬇️
rmgpy/qm/main.py 51.61% <0.00%> (-13.71%) ⬇️
arkane/encorr/decomp.py 93.50% <0.00%> (-5.20%) ⬇️
arkane/encorr/data.py 93.48% <0.00%> (+0.93%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@JacksonBurns
Copy link
Copy Markdown
Contributor Author

Output of symilar as a starting point for identifying where methods should be inherited:

Command output:
/RMG-Py/rmgpy/molecule$ symilar fragment.py molecule.py

32 similar lines in 2 files
==fragment.py:[264:301]
==molecule.py:[1018:1055]
       @property
       def fingerprint(self):
           """
           Fingerprint used to accelerate graph isomorphism comparisons with
           other molecules. The fingerprint is a short string containing a
           summary of selected information about the molecule. Two fingerprint
           strings matching is a necessary (but not sufficient) condition for
           the associated molecules to be isomorphic.

           Use an expanded molecular formula to also enable sorting.
           """
           if self._fingerprint is None:
               # Include these elements in this order at minimum
               element_dict = {'C': 0, 'H': 0, 'N': 0, 'O': 0, 'S': 0}
               all_elements = sorted(self.get_element_count().items(), key=lambda x: x[0])  # Sort alphabetically
               element_dict.update(all_elements)
               self._fingerprint = ''.join([f'{symbol}{num:0>2}' for symbol, num in element_dict.items()])
           return self._fingerprint

       @fingerprint.setter
       def fingerprint(self, fingerprint):
           self._fingerprint = fingerprint

       @property
       def inchi(self):
           """InChI string for this molecule. Read-only."""
           if self._inchi is None:
               self._inchi = self.to_inchi()
           return self._inchi

       @property
       def smiles(self):
           """SMILES string for this molecule. Read-only."""
           if self._smiles is None:
               self._smiles = self.to_smiles()
           return self._smiles


25 similar lines in 2 files
==fragment.py:[1464:1493]
==molecule.py:[1957:1986]
       def is_linear(self):
           """
           Return :data:`True` if the structure is linear and :data:`False`
           otherwise.
           """

           atom_count = len(self.vertices)

           # Monatomic molecules are definitely nonlinear
           if atom_count == 1:
               return False
           # Diatomic molecules are definitely linear
           elif atom_count == 2:
               return True
           # Cyclic molecules are definitely nonlinear
           elif self.is_cyclic():
               return False

           # True if all bonds are double bonds (e.g. O=C=O)
           all_double_bonds = True
           for atom1 in self.vertices:
               for bond in atom1.edges.values():
                   if not bond.is_double(): all_double_bonds = False
           if all_double_bonds: return True

           # True if alternating single-triple bonds (e.g. H-C#C-H)
           # This test requires explicit hydrogen atoms
           for atom in self.vertices:
               bonds = list(atom.edges.values())

16 similar lines in 2 files
==fragment.py:[1377:1397]
==molecule.py:[2594:2614]
           # Get a set of atom indices for each molecule
           atom_ids = set([atom.id for atom in self.atoms])
           other_ids = set([atom.id for atom in other.atoms])

           if atom_ids == other_ids:
               # If the two molecules have the same indices, then they might be identical
               # Sort the atoms by ID
               atom_list = sorted(self.atoms, key=lambda x: x.id)
               other_list = sorted(other.atoms, key=lambda x: x.id)

               # If matching atom indices gives a valid mapping, then the molecules are fully identical
               mapping = {}
               for atom1, atom2 in zip(atom_list, other_list):
                   mapping[atom1] = atom2

               return self.is_mapping_valid(other, mapping, equivalent=True, strict=strict)
           else:
               # The molecules don't have the same set of indices, so they are not identical
               return False


14 similar lines in 2 files
==fragment.py:[1295:1309]
==molecule.py:[2374:2388]
           except ValueError:
               logging.warning('Unable to check aromaticity by converting to RDKit Mol.')
           else:
               aromatic_rings = []
               aromatic_bonds = []
               for ring0 in rings:
                   aromatic_bonds_in_ring = []
                   # Figure out which atoms and bonds are aromatic and reassign appropriately:
                   for i, atom1 in enumerate(ring0):
                       if not atom1.is_carbon():
                           # all atoms in the ring must be carbon in RMG for our definition of aromatic
                           break
                       for atom2 in ring0[i + 1:]:
                           if self.has_bond(atom1, atom2):

13 similar lines in 2 files
==fragment.py:[1515:1529]
==molecule.py:[2238:2252]
               for i in range(atom.radical_electrons):
                   H = Atom('H', radical_electrons=0, lone_pairs=0, charge=0)
                   bond = Bond(atom, H, 1)
                   self.add_atom(H)
                   self.add_bond(bond)
                   if atom not in added:
                       added[atom] = []
                   added[atom].append([H, bond])
                   atom.decrement_radical()

           # Update the atom types of the saturated structure (not sure why
           # this is necessary, because saturating with H shouldn't be
           # changing atom types, but it doesn't hurt anything and is not
           # very expensive, so will do it anyway)

12 similar lines in 2 files
==fragment.py:[1297:1309]
==molecule.py:[2406:2418]
           else:
               aromatic_rings = []
               aromatic_bonds = []
               for ring0 in rings:
                   aromatic_bonds_in_ring = []
                   # Figure out which atoms and bonds are aromatic and reassign appropriately:
                   for i, atom1 in enumerate(ring0):
                       if not atom1.is_carbon():
                           # all atoms in the ring must be carbon in RMG for our definition of aromatic
                           break
                       for atom2 in ring0[i + 1:]:
                           if self.has_bond(atom1, atom2):

11 similar lines in 2 files
==fragment.py:[610:622]
==molecule.py:[1570:1582]
           for element, count in group.elementCount.items():
               if element not in element_count:
                   return False
               elif element_count[element] < count:
                   return False

           if generate_initial_map:
               keys = []
               atms = []
               initial_map = dict()
               for atom in self.atoms:
                   if atom.label and atom.label != '':

11 similar lines in 2 files
==fragment.py:[207:219]
==molecule.py:[1275:1287]
           # Copy connectivity values and sorting labels
           for i in range(len(self.vertices)):
               v1 = self.vertices[i]
               v2 = other.vertices[i]
               v2.connectivity1 = v1.connectivity1
               v2.connectivity2 = v1.connectivity2
               v2.connectivity3 = v1.connectivity3
               v2.sorting_label = v1.sorting_label
           other.multiplicity = self.multiplicity
           other.reactive = self.reactive
           return other


10 similar lines in 2 files
==fragment.py:[1259:1269]
==molecule.py:[1732:1742]
           # Check if multiplicity is possible
           n_rad = self.get_radical_count()
           multiplicity = self.multiplicity
           if not (n_rad + 1 == multiplicity or n_rad - 1 == multiplicity or
                   n_rad - 3 == multiplicity or n_rad - 5 == multiplicity):
               raise ValueError('Impossible multiplicity for molecule\n{0}\n multiplicity = {1} and number of '
                                'unpaired electrons = {2}'.format(self.to_adjacency_list(), multiplicity, n_rad))
           if raise_charge_exception:
               if self.get_net_charge() != 0:
                   raise ValueError('Non-neutral molecule encountered. '

9 similar lines in 2 files
==fragment.py:[370:380]
==molecule.py:[2212:2222]
       def get_charge_span(self):
           """
           Iterate through the atoms in the structure and calculate the charge span
           on the overall molecule.
           The charge span is a measure of the number of charge separations in a molecule.
           """
           abs_net_charge = abs(self.get_net_charge())
           sum_of_abs_charges = sum([abs(atom.charge) for atom in self.vertices])
           return (sum_of_abs_charges - abs_net_charge) / 2


9 similar lines in 2 files
==fragment.py:[174:183]
==molecule.py:[1665:1674]
       def draw(self, path):
           """
           Generate a pictorial representation of the chemical graph using the
           :mod:`draw` module. Use `path` to specify the file to save
           the generated image to; the image type is automatically determined by
           extension. Valid extensions are ``.png``, ``.svg``, ``.pdf``, and
           ``.ps``; of these, the first is a raster format and the remainder are
           vector formats.
           """

8 similar lines in 2 files
==fragment.py:[1535:1543]
==molecule.py:[2142:2150]
       def is_aryl_radical(self, aromatic_rings=None, save_order=False):
           """
           Return ``True`` if the molecule only contains aryl radicals,
           i.e., radical on an aromatic ring, or ``False`` otherwise.
           If no ``aromatic_rings`` provided, aromatic rings will be searched in-place,
           and this process may involve atom order change by default. Set ``save_order`` to
           ``True`` to force the atom order unchanged.
           """

8 similar lines in 2 files
==fragment.py:[1284:1294]
==molecule.py:[2363:2373]
           from rdkit.Chem.rdchem import BondType
           AROMATIC = BondType.AROMATIC

           if rings is None:
               rings = self.get_relevant_cycles()
               rings = [ring for ring in rings if len(ring) == 6]
           if not rings:
               return [], []

           try:

8 similar lines in 2 files
==fragment.py:[1272:1282]
==molecule.py:[2346:2357]
       def get_aromatic_rings(self, rings=None, save_order=False):
           """
           Returns all aromatic rings as a list of atoms and a list of bonds.

           Identifies rings using `Graph.get_smallest_set_of_smallest_rings()`, then uses RDKit to perceive aromaticity.
           RDKit uses an atom-based pi-electron counting algorithm to check aromaticity based on Huckel's Rule.
           Therefore, this method identifies "true" aromaticity, rather than simply the RMG bond type.

           The method currently restricts aromaticity to six-membered carbon-only rings. This is a limitation imposed
           by RMG, and not by RDKit.


8 similar lines in 2 files
==fragment.py:[575:584]
==molecule.py:[1490:1499]
           # Do the quick isomorphism comparison using the fingerprint
           # Two fingerprint strings matching is a necessary (but not
           # sufficient!) condition for the associated molecules to be isomorphic
           if self.fingerprint != other.fingerprint:
               return False
           # check multiplicity
           if self.multiplicity != other.multiplicity:
               return False


8 similar lines in 2 files
==fragment.py:[351:360]
==molecule.py:[1124:1133]
       def remove_bond(self, bond):
           """
           Remove the bond between atoms `atom1` and `atom2` from the graph.
           Does not remove atoms that no longer have any bonds as a result of
           this removal.
           """
           self._fingerprint = self._inchi = self._smiles = None
           return self.remove_edge(bond)


8 similar lines in 2 files
==fragment.py:[308:317]
==molecule.py:[1115:1124]
       def remove_atom(self, atom):
           """
           Remove `atom` and all bonds associated with it from the graph. Does
           not remove atoms that no longer have any bonds as a result of this
           removal.
           """
           self._fingerprint = self._inchi = self._smiles = None
           return self.remove_vertex(atom)


8 similar lines in 2 files
==fragment.py:[243:252]
==molecule.py:[1433:1442]
           return alist

       def get_all_labeled_atoms(self):
           """
           Return the labeled atoms as a ``dict`` with the keys being the labels
           and the values the atoms themselves. If two or more atoms have the
           same label, the value is converted to a list of these atoms.
           """
           labeled = {}

7 similar lines in 2 files
==fragment.py:[1012:1020]
==molecule.py:[2088:2096]
           """
           if self.symmetry_number == -1:
               self.calculate_symmetry_number()
           return self.symmetry_number

       def calculate_symmetry_number(self):
           """
           Return the symmetry number for the structure. The symmetry number

7 similar lines in 2 files
==fragment.py:[639:647]
==molecule.py:[1598:1606]
                           return True
                   else:
                       return False
               else:
                   if not self.is_mapping_valid(other, initial_map, equivalent=False):
                       return False

           # Do the isomorphism comparison

7 similar lines in 2 files
==fragment.py:[343:351]
==molecule.py:[1067:1075]
       def add_bond(self, bond):
           """
           Add a `bond` to the graph as an edge connecting the two atoms `atom1`
           and `atom2`.
           """
           self._fingerprint = self._inchi = self._smiles = None
           return self.add_edge(bond)


7 similar lines in 2 files
==fragment.py:[198:205]
==molecule.py:[1265:1272]
       def copy(self, deep=False):
           """
           Create a copy of the current graph. If `deep` is ``True``, a deep copy
           is made: copies of the vertices and edges are used in the new graph.
           If `deep` is ``False`` or not specified, a shallow copy is made: the
           original vertices and edges are used in the new graph.
           """

6 similar lines in 2 files
==fragment.py:[1543:1552]
==molecule.py:[2151:2160]
           if aromatic_rings is None:
               aromatic_rings = self.get_aromatic_rings(save_order=save_order)[0]

           total = self.get_radical_count()
           aromatic_atoms = set([atom for atom in itertools.chain.from_iterable(aromatic_rings)])
           aryl = sum([atom.radical_electrons for atom in aromatic_atoms])

           return total == aryl


6 similar lines in 2 files
==fragment.py:[1347:1355]
==molecule.py:[2554:2562]
           Uses entire range of cython's integer values to reduce chance of duplicates
           """

           global atom_id_counter

           for atom in self.atoms:
               atom.id = atom_id_counter
               atom_id_counter += 1

6 similar lines in 2 files
==fragment.py:[1311:1319]
==molecule.py:[2392:2400]
                                   aromatic_bonds_in_ring.append(self.get_bond(atom1, atom2))
                   else:  # didn't break so all atoms are carbon
                       if len(aromatic_bonds_in_ring) == 6:
                           aromatic_rings.append(ring0)
                           aromatic_bonds.append(aromatic_bonds_in_ring)

               return aromatic_rings, aromatic_bonds


6 similar lines in 2 files
==fragment.py:[1251:1259]
==molecule.py:[1723:1730]
           Skips the first line (assuming it's a label) unless `withLabel` is
           ``False``.
           """
           from rmgpy.molecule.adjlist import from_adjacency_list

           self.vertices, self.multiplicity = from_adjacency_list(adjlist, group=False, saturate_h=saturate_h)
           self.update_atomtypes(raise_exception=raise_atomtype_exception)

5 similar lines in 2 files
==fragment.py:[1502:1509]
==molecule.py:[1995:2002]
           else:
               # didn't fail
               return True

           # not returned yet? must be nonlinear
           return False


5 similar lines in 2 files
==fragment.py:[1447:1454]
==molecule.py:[1463:1470]
               if key in element_count:
                   element_count[key] += 1
               else:
                   element_count[key] = 1

           return element_count


5 similar lines in 2 files
==fragment.py:[152:157]
==molecule.py:[932:937]
                               'using InChI and ignoring SMILES.')
           if inchi:
               self.from_inchi(inchi)
               self._inchi = inchi
           elif smiles:
TOTAL lines=4804 duplicates=285 percent=5.93

@JacksonBurns
Copy link
Copy Markdown
Contributor Author

Migrating this to a branch on RMG-Py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant