Skip to content

Parse rotational constants and molecular mass for Gaussian logs#1054

Merged
amandadumi merged 9 commits intocclib:masterfrom
amarkpayne:gaussian_rotcons
Sep 6, 2022
Merged

Parse rotational constants and molecular mass for Gaussian logs#1054
amandadumi merged 9 commits intocclib:masterfrom
amarkpayne:gaussian_rotcons

Conversation

@amarkpayne
Copy link
Copy Markdown
Contributor

Purpose

Currently, cclib has the ability to parse rotational constants and molecular mass from MOPAC log files (I think this functionality was added to cclib long ago by RMG developer Greg Magoon to support his work on QMTP in RMG). This PR extends this ability to Gaussian log files, which is something we use cclib for in RMG.

Testing

I have tested locally that the added code works by testing it on Gaussian log files for various semi-empirical calculations. It is the same suite of tests that we use to test the MOPAC parsing ability already implemented in cclib

Other considerations

This is my first time contributing to cclib, so let me know if I missed any steps in your usual development workflow. I tried to follow the guidelines outlined in your online documentation.

Copy link
Copy Markdown
Member

@shivupa shivupa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good so far. Just a note about the convertor. I will check on adding a test that covers this

@berquist berquist added this to the v1.7.1 milestone Jul 22, 2021
@berquist
Copy link
Copy Markdown
Member

rotcons isn't actually an attribute: it currently only exists as a calculation method. It needs to be added as a proper attribute. We also need to decide on what the standard units will be and potentially add any conversion factors to the converter (though Shiv already hinted at that).

@shivupa
Copy link
Copy Markdown
Member

shivupa commented Jul 22, 2021

Per #256 it seems like we were hoping users would calculate rotational constants from our method

Would this work with the expected use case you had in mind @amarkpayne

@shivupa
Copy link
Copy Markdown
Member

shivupa commented Jul 22, 2021

haha @berquist beat me to it

@berquist
Copy link
Copy Markdown
Member

berquist commented Jul 22, 2021

Well, I could see wanting to parse what's there rather than recomputing it. If you're doing this for a database (anything more than 100 logfiles, just to pick a number), the MOI tensor calculation and diagonalization will probably become significant in terms of time.

@amarkpayne
Copy link
Copy Markdown
Contributor Author

In our use case the user would be parsing hundreds of files, so speed would be preferable here. Let me also look at your method though

@shivupa
Copy link
Copy Markdown
Member

shivupa commented Jul 22, 2021

I agree then it would make sense to make this a parsed attribute.
The steps needed to do that would be to

  1. Add an attribute and description to the ccdata object
  2. Use self.set_attribute('rotcons', parsed_rotcons) to set the rotational constants that you parsed. A similar call for the masses.
  3. A test for the dvb_sp.out would since this parses the line
    dvb_sp.out: Rotational constants (GHZ): 4.6266363 0.6849065 0.5965900
    So we could add a test in this file where the values of the parsed data is tested

Let us know if you get stuck at any point and we can help you

Edit:
And docs on this page
https://github.com/cclib/cclib/blob/master/doc/sphinx/data_notes.rst

@amarkpayne
Copy link
Copy Markdown
Contributor Author

So not to muddy the discussion but I actually was able to try out the Nuclear.rotational_constants method really quick (it wasn't too bad to code up, sorry I was unaware of its existence given that the MOPAC parser does things differently) and it seems to work just fine. In the unit tests we run, we only parse 4 logs, but as far as I can tell calculating these numbers did not significantly increase the time compared to everything else we are doing, and we get the same answers (at least within the precision we set in our unit tests).

TLDR it looks like either approach will work for us, so let me know what you prefer. Also let me know if molmass is fine as is as an attribute

@ghutchis
Copy link
Copy Markdown
Contributor

Sorry to jump in, but it's also going to depend on the size of the molecule. The MOI and diagonalization is negligible for v. small molecules, but if you get to 100s of atoms, times 100s of log files, you'll want to parse it if available. For a database in the group, I'd definitely prefer to parse for formats where available.

@shivupa
Copy link
Copy Markdown
Member

shivupa commented Jul 22, 2021

TLDR it looks like either approach will work for us, so let me know what you prefer. Also let me know if molmass is fine as is as an attribute

It is up to you. In the short term you could proceed by using the Nuclear.rotational_constants method, and we could work on getting this in. If you didn't have the bandwidth to do this, we could take care of putting this in. Now that we know this would be of interest, it makes sense to work on it. Let us know what works for you.

I think we might not need molmass, since we could just sum atommasses (an existing attribute) whenever we need molmass?

Sorry to jump in, but it's also going to depend on the size of the molecule. The MOI and diagonalization is negligible for v. small molecules, but if you get to 100s of atoms, times 100s of log files, you'll want to parse it if available. For a database in the group, I'd definitely prefer to parse for formats where available.

Agreed!

@berquist
Copy link
Copy Markdown
Member

@amarkpayne Poke. If you use utils.convertor to remove the hard-coded constants, then rebase (the test failures are not from your PR), I'd be happy to merge this.

@amandadumi amandadumi self-assigned this Feb 23, 2022
@langner langner modified the milestones: v1.7.2, v1.8 May 17, 2022
@berquist berquist force-pushed the gaussian_rotcons branch from f5932ed to 7d05a13 Compare July 4, 2022 22:07
@berquist
Copy link
Copy Markdown
Member

berquist commented Jul 4, 2022

Since Mark hasn't responded, I've decided to try finishing this off. It's also parsed for MOPAC, but as wavenumbers, so it needs a conversion. Right now that part of MOPAC isn't tested. I figured that should be in a follow-up PR.

@codecov
Copy link
Copy Markdown

codecov bot commented Jul 8, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.86%. Comparing base (592a871) to head (9638d8d).
Report is 1205 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1054   +/-   ##
=======================================
  Coverage   87.85%   87.86%           
=======================================
  Files          69       69           
  Lines       13503    13511    +8     
=======================================
+ Hits        11863    11871    +8     
  Misses       1640     1640           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Member

@amandadumi amandadumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for getting to this and sorry I wasn't able to yet. But this all looks great, just two questions!

Copy link
Copy Markdown
Member

@amandadumi amandadumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! Thank you.

@berquist
Copy link
Copy Markdown
Member

berquist commented Sep 4, 2022

poke

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants