Conversation
Codecov Report
@@ Coverage Diff @@
## master #1486 +/- ##
==========================================
- Coverage 41.85% 41.53% -0.33%
==========================================
Files 177 176 -1
Lines 29490 29833 +343
Branches 6097 6191 +94
==========================================
+ Hits 12344 12390 +46
- Misses 16266 16534 +268
- Partials 880 909 +29
Continue to review full report at Codecov.
|
f2ee821 to
9df5781
Compare
8bc4271 to
583acfc
Compare
|
Okay, this is ready for review. |
mliu49
left a comment
There was a problem hiding this comment.
I've finished an initial review. I plan on reviewing in more detail once you've removed the whitespace changes from the commits.
rmgpy/kinetics/arrhenius.pyx
Outdated
| while boo: | ||
| boo = False | ||
| try: | ||
| params = curve_fit(kfcn,xdata,ydata,sigma=sigmas,p0=[1.0,1.0,w0/10.0],xtol=xtol,ftol=ftol) |
There was a problem hiding this comment.
This section seems to be indented with two spaces. Also, could you add spaces after the commas and possibly add a line break in the call to curve_fit if it ends up too long after adding spaces.
rmgpy/kinetics/uncertainties.pyx
Outdated
| def __get__(self): | ||
| return self._mu | ||
| def __set__(self, value): | ||
| self._mu = value |
There was a problem hiding this comment.
If you're not doing anything to the value, why do you need to make it a property, rather than a normal attribute?
rmgpy/kinetics/uncertainties.pxd
Outdated
|
|
||
| cdef public double _Tref | ||
| cdef public double _mu | ||
| cdef public double _sigma |
There was a problem hiding this comment.
sigma doesn't seem to exist in the .pyx file.
| # DEALINGS IN THE SOFTWARE. # | ||
| # # | ||
| ############################################################################### | ||
| import numpy as np |
There was a problem hiding this comment.
I think it might be more efficient to cimport numpy.
There was a problem hiding this comment.
Doesn't seem to work here, the import works, but the imported module doesn't have basic things like sqrt or pi.
rmgpy/molecule/graph.pyx
Outdated
|
|
||
| return new | ||
|
|
||
| cpdef _merge(self, Graph other): |
There was a problem hiding this comment.
The point of underscored methods is that they are not intended to be used outside of the module. Naming it this way also does not help convey the difference compared to the main merge and split methods at all.
I think the ideal approach would be to add an argument to merge and split indicating whether or not to copy the graph. If that would be too difficult to implement, then the method names should be more reflective of the difference.
There was a problem hiding this comment.
IIRC, these methods were to get around the fact that subgraph isomorphism did not previously work on disconnected graphs. Since Richard fixed that, do you still need these methods?
There was a problem hiding this comment.
Doesn't seem to be used anymore, so I've removed it.
rmgpy/molecule/group.py
Outdated
| if atom.label == label: return atom | ||
| raise ValueError('No atom in the functional group has the label "{0}".'.format(label)) | ||
| if atom.label == label: | ||
| alist.append(atom) |
There was a problem hiding this comment.
You could use a list comprehension: alist = [atom for atom in self.verticies if atom.label == label]
Also, you should update the docstring to indicate that this will always return a list. It would be nice to also rename the method to getLabeledAtoms, but it's not necessary.
rmgpy/molecule/molecule.py
Outdated
| if atom.label == label: return atom | ||
| raise ValueError('No atom in the molecule has the label "{0}".'.format(label)) | ||
| if atom.label == label: | ||
| alist.append(atom) |
There was a problem hiding this comment.
As with the Group method, this could also be a list comprehension.
rmgpy/molecule/molecule.py
Outdated
| # Do the isomorphism comparison | ||
| result = Graph.findSubgraphIsomorphisms(self, other, initialMap, saveOrder=saveOrder) | ||
| return result | ||
| return Graph.findSubgraphIsomorphisms(self,group,initialMap,saveOrder=saveOrder) |
There was a problem hiding this comment.
You removed spaces after commas 🙁
rmgpy/rmg/main.py
Outdated
| solvent=self.solvent | ||
|
|
||
| if self.kineticsEstimator == 'rate rules': | ||
| autoGeneratedTrees = ['r_recombination'] |
There was a problem hiding this comment.
Could you make autoGenerated an attribute of KineticsFamily (and saved in the groups file) to avoid hard-coding?
87ef826 to
4970bef
Compare
skip map possibilities that map multiple graph atoms to the same subgraph atom
note this does not protect against merging two graphs with atoms that are the same reference
and adapt associated test
using this option makes getReactionMatches more robust
don't check accessibility as all nodes are guaranteed to be accessible in a generated tree and R_Recombination has issues with sample molecule generation
skip tree generation related lines when reading chemkin file add generate tree statement indicators
In some cases the symmetry of a group can cause what in most cases stays a regularization dimension to stop being a regularization dimension This can lead to nodes that are splitable looking like they aren't splitable in terms of regularization dimensions This commit clears the regularization dimensions in this case causing it to recompute them and be able to split the node
…e generation Breaks reactions into batches based on a modified stratified sampling scheme Effectively: The top and bottom outlierFraction of all reactions are always included in the first batch The remaining reactions are ordered by the rate coefficients at T The list of reactions is then split into stratumNum similarly sized intervals batches sample equally from each interval, but randomly within each interval until they reach maxBatchSize reactions A list of lists of reactions containing the batches is returned
…s from the tree before reoptimizing with an additional batch Remove nodes that have less than maxRxnToReoptNode reactions that match and clear the regularization dimensions of their parent This is used to remove smaller easier to optimize and more likely to change nodes before adding a new batch in cascade model generation
When the number of reactions is greater than maxBatchSize tree generation switches to the faster Cascade algorithm maxBatchSize is the maximum number of reactions in a batch of the cascade algorithm outlierFraction is the fraction of reactions that are fastest and slowest that are forced to be included in the first batch of the cascade algorithm stratumNum is the number of strata in the stratified sampling scheme used to construct the Cascade algorithm batches maxRxnsToReoptNode is the maximum number of matching reactions at which the Cascade algorithm will prune the node and reoptimize it in the next batch
During node generation the nodes matching the most reactions are created first this means that a naive parallelization will hand all of the most computationally expensive fits to the same process this commit shuffles them randomly so hard fits are spread evenly between processors and then reorders them after processing
One issue with the Cascade algorithm is that information about the regularization dimensions of nodes that match lots of reactions is prohibitively expensive to compute this commit tests each "guessed" regularization dimension after it is applied to check whether it actually is a regularization dimension, if it isn't that particular regularization is reversed and ignored
|
Ok, I made the whitespace changes and removed the travis and deploy commits. |
mliu49
left a comment
There was a problem hiding this comment.
Will merge after tests pass.

This adds parallelization of tree generation, integration with RMG kinetics estimation and on-the-fly uncertainty calculation.