Bug fix to use correct rho value in cal_titau subroutines #1344
rsarthur wants to merge 2 commits into wrf-model:release-v4.2.2
Conversation
In the cal_titau subroutines in module_diffusion_em, rho was not always interpolated to the correct location on the staggered grid.
@rsarthur @dudhia @weiwangncar Just looking at the code quickly, I do not see the trouble, but certainly anytime horizontal differences are involved, communicated boundaries are likely a problem. To debug something like this, I would run a single time step with an MPI code that is built without optimization. Compare the results of 1 processor with 4 processors.
@davegill Thanks for letting me know, I will take a look. Do you know what the sfs_opt and m_opt values were for the test that showed the issue?
@rsarthur The sfs_opt and km_opt settings were different for these two runs. Nothing is ever easy on a supposedly easy PR, especially right before Christmas. Take a look at https://github.com/davegill/SCRIPTS. You want the Namelists/weekly/em_quarter_ss and em_quarter_ss8 directories. Inside, you will find the files namelist.input.04 (em_quarter_ss, 32-bit precision) and namelist.input_06 (em_quarter_ss8, double precision). Make sure you turn OFF optimization for these bit-for-bit tests (clean -a, configure -d, compile em_quarter_ss; for a real*8 build, use configure -d -r8). Let us know what you find.
Do these tests use m_opt? Also, I don't see any red flags in the code changes, as it only uses rho where it was used before.
…On Wed, Dec 23, 2020 at 2:10 PM Dave Gill ***@***.***> wrote:
@rsarthur <https://github.com/rsarthur>
The sfs_opt and km_opt settings were different for these two runs. Nothing
is ever easy on a supposedly easy PR, especially right before Christmas.
Take a look at
https://github.com/davegill/SCRIPTS
You want the Namelists/weekly/em_quarter_ss and em_quarter_ss8 directories.
Inside, you will find the files namelist.input.04 (em_quarter_ss, 32-bit
precision) and namelist.input_06 (em_quarter_ss8, double precision).
Make sure you turn OFF optimization for these bit-for-bit tests (clean -a,
configure -d, compile em_quarter_ss). For a real*8 build, use configure
-d -r8.
Let us know what you find.
@davegill I was able to run the tests you suggested. I compared em_quarter_ss with 1 and 4 processors, and the same for em_quarter_ss8. I am not seeing any differences in the output for either case. I used diffwrf compiled with the respective wrf executable for the ss and ss8 cases. It is possible that HALO_EM_PHYS_DIFFUSION should include rho, but that should have caused problems prior to this specific code change, since (as @dudhia mentioned) rho was already being used in the same way.
@davegill I am not sure why we are getting different results. Maybe it is because I only ran for 1 timestep. Would you recommend that I try again for more than 1 timestep? Or, could you share the diffwrf output for the runs that are giving you issues? Thanks.
@rsarthur The diffs are in your mods (or are exposed by your mods, honestly). If I use the previous repo code that you branched from, I get bitwise identical results between differing numbers of MPI processes. If I use your code, I get bitwise diffs in the 4 corners when comparing differing numbers of MPI processes. I generated the file to plot using only the fields that show a difference. The following diffs are after a single time step (the initial time is identical, as is usually the situation), with 3 vs. 8 MPI tasks. The other fields look similarly structured, though with ncview I had to rescale to get the values viewable for the specific level. The PH diff shows up at the first level up from the surface (ph=0 is a definition at the model surface).
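As an aside for anyone reproducing this kind of check: a minimal NumPy sketch (my own toy illustration, not diffwrf or actual WRF output) of locating bitwise differences between fields from two runs, with the diffs confined to the four corners as described above:

```python
import numpy as np

def diff_locations(a, b):
    """Return grid indices where two fields are not bitwise identical."""
    return np.argwhere(a != b)

# Toy fields: identical everywhere except the four corner points,
# mimicking the corner-only decomposition diffs described above.
ny, nx = 8, 8
f_3tasks = np.zeros((ny, nx))
f_8tasks = f_3tasks.copy()
for j in (0, ny - 1):
    for i in (0, nx - 1):
        f_8tasks[j, i] = 1e-12  # a tiny, but representable, difference
print(diff_locations(f_3tasks, f_8tasks))  # the four corner indices
```

Exact comparison (rather than a tolerance) is the point here: bit-for-bit regression tests treat any representable difference as a failure.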
Here's the namelist for the QSS case (obviously without the .txt extension). The configure option "-d" is usually required to get identical results.
@rsarthur We were "lucky" (for some definition of lucky) to find diffs by changing MPI ranks, so only a single MPI build is required to track this down. Keep me in the loop with questions you have.
Thanks for publishing this PR. I also had it on my list but hadn't gotten around to doing the PR. It's really strange that the check fails, since, as you say, you only use the same values of rho that are already used before. These values are made available by inclusion of HALO_EM_TKE_E and PERIOD_BDY_EM_PHY_BC and by the routine phy_bc in first_rk_step_part2. So I don't know what's causing the problem here.
@davegill I'm on it!
I can imagine that the failing test has to do with PR #1314. There, I added the corner points that were missing before in set_physical_bc3d. But anyway, it's strange that the test did not fail previously.
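To make the corner-point issue concrete, here is a small NumPy sketch of a periodic halo fill. This is my own toy illustration, not set_physical_bc3d itself; the point is that sweeping the second direction over full rows (halo included) is what populates the corner ghost points, and skipping them makes results depend on the decomposition:

```python
import numpy as np

def fill_periodic_halo(f, ng=1):
    """Fill a width-ng periodic halo on a 2-D array whose interior is
    f[ng:-ng, ng:-ng]. The x-direction copies run over FULL rows, so
    they also fill the corner ghost points set up by the y-direction
    copies. Illustration only, not WRF's actual routine."""
    f[:ng, :] = f[-2 * ng:-ng, :]   # south halo from north interior
    f[-ng:, :] = f[ng:2 * ng, :]    # north halo from south interior
    f[:, :ng] = f[:, -2 * ng:-ng]   # west halo (corners filled via full rows)
    f[:, -ng:] = f[:, ng:2 * ng]    # east halo (corners filled via full rows)
    return f

# 3x3 interior inside a 5x5 array; after the fill, each corner ghost
# point equals the diagonally opposite interior corner.
g = np.zeros((5, 5))
g[1:4, 1:4] = np.arange(9.0).reshape(3, 3)
fill_periodic_halo(g)
print(g[0, 0], g[4, 4])  # 8.0 0.0
```

If the x-direction copies ran only over interior rows, the four corner ghost points would stay unset, which is exactly the kind of gap that shows up as corner-only diffs between MPI decompositions.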
@matzegoebel that makes sense. I see that PR #1314 was merged into the 4.2.2 release. @davegill, is it possible to retest this PR in the latest version of the code to see if the issue is fixed? Do you need anything from me to do that?



TYPE: bug fix
KEYWORDS: rho, sfs_opt, m_opt, cal_titau
SOURCE: Robert Arthur and Jeff Mirocha (LLNL)
DESCRIPTION OF CHANGES:
Problem:
In the cal_titau subroutines in module_diffusion_em, rho was not always interpolated to the correct location on the staggered grid. This issue affected the subgrid stress terms when sfs_opt .gt. 1. Additionally, when sfs_opt .eq. 0 and m_opt .eq. 1, the actual subgrid stresses applied in the code were correct, but the output was wrong.
Solution:
A rhoavg variable was added to cal_titau_12_21, cal_titau_13_31, and cal_titau_23_32 and used instead of the cell-centered rho variable.
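As a rough illustration of the rhoavg idea (the indexing below is schematic NumPy, not WRF's actual loops or staggering conventions), averaging cell-centered rho to a stress location staggered in both x and z, as needed for a t13/t31-like term, is a 4-point average of the neighboring cell centers:

```python
import numpy as np

def rho_avg_xz(rho):
    """4-point average of cell-centered rho (indexed k, j, i) to a
    point staggered in both x and z. Schematic of the rhoavg fix,
    not WRF's exact index conventions."""
    return 0.25 * (rho[1:, :, 1:] + rho[1:, :, :-1] +
                   rho[:-1, :, 1:] + rho[:-1, :, :-1])

# Sanity check: for a field linear in i and k, the average reproduces
# the exact value at the staggered midpoint (i+1/2, k+1/2).
nz, ny, nx = 4, 3, 5
k, j, i = np.meshgrid(np.arange(nz), np.arange(ny), np.arange(nx),
                      indexing="ij")
rho = 1.0 * (i + k)
avg = rho_avg_xz(rho)
assert np.allclose(avg[0, 0, 0], 1.0)  # (0 + 0.5) + (0 + 0.5)
```

Using the cell-centered rho directly at these staggered locations is second-order inconsistent with the rest of the stress discretization, which is the error the rhoavg variable corrects.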
ISSUE: This is a follow-on from issue #1214 by @matzegoebel, which addressed the surface stress output from subroutine vertical_diffusion_2 when sfs_opt .gt. 0 or m_opt .eq. 1, but did not address the cal_titau subroutines.
LIST OF MODIFIED FILES: dyn_em/module_diffusion_em.F
TESTS CONDUCTED:
For this relatively simple fix, no tests were conducted beyond confirming that the code compiles.
RELEASE NOTE: Bug fix to use correct rho value in cal_titau subroutines