
Fix HDF5+mpi~fortran #35400

Merged
tldahlgren merged 4 commits into spack:develop from s-sajid-ali:sajid/hdf5_mpi_fortran
Mar 14, 2023

Conversation

@s-sajid-ali
Contributor

Fixes #35377
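A minimal sketch of the kind of guard involved, assuming the fix gates the MPI Fortran compiler wrapper behind the +fortran variant; the function and names below are illustrative stand-ins, not the actual Spack package code:

```python
# Hypothetical sketch of the guard logic: only pass the MPI Fortran wrapper
# to CMake when the fortran variant is enabled, so hdf5+mpi~fortran no
# longer needs a Fortran compiler.
def cmake_args(variants, mpifc="mpif90"):
    """Return CMake defines for a simplified hdf5 spec.

    `variants` is a set like {"mpi", "fortran"}; `mpifc` stands in for
    spec["mpi"].mpifc in a real Spack package recipe.
    """
    args = []
    if "mpi" in variants:
        args.append("-DHDF5_ENABLE_PARALLEL:BOOL=ON")
        # Before the fix this define was effectively unconditional and
        # failed when no Fortran compiler was available.
        if "fortran" in variants:
            args.append(f"-DMPI_Fortran_COMPILER:FILEPATH={mpifc}")
    return args

print(cmake_args({"mpi"}))  # no Fortran wrapper requested
```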

@spackbot-app

spackbot-app bot commented Feb 8, 2023

@byrnHDF can you review this PR?

This PR modifies the following package(s), for which you are listed as a maintainer:

  • hdf5

byrnHDF previously approved these changes Feb 8, 2023
hyoklee previously approved these changes Feb 8, 2023
tldahlgren previously approved these changes Feb 8, 2023
Contributor

@tldahlgren left a comment


LGTM

@tldahlgren enabled auto-merge (squash) February 8, 2023 18:46
lrknox previously approved these changes Feb 9, 2023
@tldahlgren
Contributor

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Feb 10, 2023

I've started that pipeline for you!

@hyoklee
Contributor

hyoklee commented Feb 13, 2023

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Feb 13, 2023

I've started that pipeline for you!

@s-sajid-ali
Contributor Author

Pipeline errors seem to be related to WRF checksums that should have been fixed by #35401.

@s-sajid-ali
Contributor Author

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Feb 14, 2023

I've started that pipeline for you!

@s-sajid-ali
Contributor Author

The latest pipeline still has WRF errors and an unrelated pfunit error.

@spackbot-app

spackbot-app bot commented Feb 16, 2023

I've started that pipeline for you!

@s-sajid-ali
Contributor Author

@tldahlgren: Rebasing onto develop fixed the WRF errors, but now there are new errors pertaining to meson/py-docutils/parmetis/vtk-m/.... While some build errors are labelled as hdf5 build errors, the logs show that they occurred when building cmake (log).

@tldahlgren
Contributor

We're having problems with the pipeline builds. Hopefully this gets resolved soon as the ci/gitlab-ci checks are required.

@s-sajid-ali
Contributor Author

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Feb 22, 2023

I've started that pipeline for you!

@lrknox
Contributor

lrknox commented Feb 22, 2023

> We're having problems with the pipeline builds. Hopefully this gets resolved soon as the ci/gitlab-ci checks are required.

I saw this warning and error in an HDF5 1.14.0 build for radius-aws-aarch64-build:
==> Warning: 'target=graviton2' is deprecated, use 'target=neoverse_n1' instead
==> Error: Unable to copy files (/tmp/root/spack-stage/spack-stage-hdf5-1.14.0-753ppmfvdqa6dcskvcahonprbmjp3ytj/spack-build-out.txt) to artifacts /builds/spack/spack/jobs_scratch_dir/logs due to exception: No such file or directory: '/tmp/root/spack-stage/spack-stage-hdf5-1.14.0-753ppmfvdqa6dcskvcahonprbmjp3ytj/spack-build-out.txt'

The warning message seems to appear in many of the failing builds in the pipeline. I also noticed in some recent builds on one of our local machines that spack install --keep-stage doesn't seem to be putting anything in the stage directory lately. I don't know what should be done to change from graviton2 to neoverse_n1, though.

@s-sajid-ali
Contributor Author

The issue with that particular CI run seems to be a build error for cmake : https://gitlab.spack.io/spack/spack/-/jobs/5732974#L220.

@s-sajid-ali
Contributor Author

Can this be merged now @alalazo / @eugeneswalker ?

@alalazo
Member

alalazo commented Mar 8, 2023

@s-sajid-ali I think this PR is just waiting for CI to finish successfully. Then it'll be auto-merged.

@s-sajid-ali
Contributor Author

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Mar 8, 2023

I've started that pipeline for you!

@alalazo
Member

alalazo commented Mar 8, 2023

@s-sajid-ali Just so you know: by restarting an ongoing CI, you probably gained a longer waiting time 😬 (every pipeline that wasn't finished restarted the same builds from scratch).

@s-sajid-ali
Contributor Author

s-sajid-ali commented Mar 8, 2023

@alalazo Thanks for the information.

The new CI pipeline has errors for ml-linux-x86_64-cpu-pr-build/ml-linux-x86_64-cuda-pr-build, for py-tensorflow, which seem unrelated to this PR. Is this a known issue?

@tldahlgren
Contributor

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Mar 9, 2023

I've started that pipeline for you!

@s-sajid-ali
Contributor Author

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Mar 10, 2023

I've started that pipeline for you!

@s-sajid-ali
Contributor Author

s-sajid-ali commented Mar 10, 2023

Current CI failures are (per this run):

  • e4s-power-pr-build:

      Created fresh repository.
      fatal: unable to access 'https://gitlab.spack.io/spack/spack.git/': Could not resolve host: gitlab.spack.io
      Uploading artifacts for failed job 00:02 
      Uploading artifacts...
      WARNING: jobs_scratch_dir/logs: no matching files   
      WARNING: jobs_scratch_dir/reproduction: no matching files 
      WARNING: jobs_scratch_dir/tests: no matching files 
      WARNING: jobs_scratch_dir/user_data: no matching files 
      ERROR: No files to upload                          
      Cleaning up project directory and file based variables 00:01
      ERROR: Job failed: exit code 1
    
    • warpx/yi62tyc 23.03 [email protected] linux-ubuntu20.04-ppc64le E4S Power
      Reason: Same as above
    • dray/eg4gdf4 0.1.8 [email protected] linux-ubuntu20.04-ppc64le E4S Power
      Reason: Same as above
  • ml-linux-x86_64-cpu-pr-build:

    • py-tensorflow/sbcleax 2.10.1 [email protected] linux-amzn2-x86_64_v3 Machine Learning
      Reason:
    # Configuration: 70d5859fdef852abbc7e778d3e97b9f1bece7a530d593fd25d634060c4c992fa
    # Execution platform: @local_execution_config_platform//:platform
    gcc: internal compiler error: Killed (program cc1plus)
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <http://bugzilla.redhat.com/bugzilla> for instructions.
    Target //tensorflow/tools/pip_package:build_pip_package failed to build
    INFO: Elapsed time: 2342.864s, Critical Path: 334.01s
    INFO: 9650 processes: 1140 internal, 8510 local.
    FAILED: Build did NOT complete successfully
    
  • ml-linux-x86_64-cuda-pr-build:

    • py-tensorflow/7rm7vw4 2.10.1 [email protected] linux-amzn2-x86_64_v3 Machine Learning
      Reason:
    # Configuration: d236e8123202fbb1dc3edda1c1dfc704400c275ba451bbe641d3e897f49f1c21
    # Execution platform: @local_execution_config_platform//:platform
    
     Server terminated abruptly (error code: 14, error message: 'Connection reset by peer', log file: '/tmp/spackq_v06wxb/b79c0385198eb909c6e134607174ccf8/server/jvm.out')
    
    

The py-tensorflow builds are being repeated in the CI after the first set of failures.

@eugeneswalker
Contributor

The power failures were a node issue; it has been fixed now. Re-running the pipeline and they will finish: @spackbot run pipeline

@spackbot-app

spackbot-app bot commented Mar 10, 2023

I've started that pipeline for you!

@s-sajid-ali
Contributor Author

s-sajid-ali commented Mar 10, 2023

Thanks @eugeneswalker !

The following failures with py-tensorflow remain (from the latest run):

  • ml-linux-x86_64-cpu-pr-build:

    • py-tensorflow/sbcleax 2.10.1 [email protected] linux-amzn2-x86_64_v3 Machine Learning
      Reason:
      # Configuration: f8b41c6b368834d97a9badbd57c6bd2ce355775d99d313e59e55944349742ade
      # Execution platform: @local_execution_config_platform//:platform
      gcc: internal compiler error: Killed (program cc1plus) 
      Please submit a full bug report,
      with preprocessed source if appropriate.
      See <http://bugzilla.redhat.com/bugzilla> for instructions.
      Target //tensorflow/tools/pip_package:build_pip_package failed to build
      INFO: Elapsed time: 2220.264s, Critical Path: 707.33s
      INFO: 8251 processes: 1132 internal, 7119 local.
      FAILED: Build did NOT complete successfully
      
  • ml-linux-x86_64-cuda-pr-build:

    • py-tensorflow/7rm7vw4 2.10.1 [email protected] linux-amzn2-x86_64_v3 Machine Learning
      Reason:
      # Configuration: 93b58564f565042f4c7209d19a3417f306b5df5b1d476b814880e5058b3e09e2
      # Execution platform: @local_execution_config_platform//:platform
      gcc: internal compiler error: Killed (program cc1plus)
      Please submit a full bug report,
      with preprocessed source if appropriate.
      See <http://bugzilla.redhat.com/bugzilla> for instructions.
      Target //tensorflow/tools/pip_package:build_pip_package failed to build
      INFO: Elapsed time: 4797.839s, Critical Path: 771.12s
      INFO: 29045 processes: 9590 internal, 19455 local.
      FAILED: Build did NOT complete successfully
      

Pinging py-tensorflow maintainers regarding the CI failures: @adamjstewart / @aweits / @pradyunsg
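The `gcc: internal compiler error: Killed (program cc1plus)` messages above are the classic symptom of the compiler being OOM-killed during a highly parallel build. A hypothetical mitigation (not something this PR changes, and the value 4 is purely illustrative) would be to cap Spack's build parallelism via its `config.yaml`:

```yaml
# Hypothetical Spack config fragment: limit parallel build jobs so
# memory-hungry compiles like py-tensorflow are less likely to be
# OOM-killed on smaller CI runners.
config:
  build_jobs: 4
```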

@kwryankrattiger
Contributor

These should be fixed with #35996

@s-sajid-ali
Contributor Author

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Mar 13, 2023

I'm sorry, gitlab does not have your latest revision yet, I can't run that pipeline for you right now.

One likely possibility is that your PR pipeline has been temporarily deferred, in which case, it is awaiting a develop pipeline, and will be run when that finishes.

Please check the gitlab commit status message to see if more information is available.

Details
pr head: a31d005, gitlab commit parents: ['a5b8066', 'a8bfb0f']

@s-sajid-ali
Contributor Author

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Mar 14, 2023

I've started that pipeline for you!

@tldahlgren merged commit 9a12540 into spack:develop Mar 14, 2023
@s-sajid-ali deleted the sajid/hdf5_mpi_fortran branch March 15, 2023 16:44
cnegre pushed a commit to cnegre/spack that referenced this pull request Mar 20, 2023
* HDF5+mpi~fortran
* fix style
jmcarcell pushed a commit to key4hep/spack that referenced this pull request Apr 13, 2023


Development

Successfully merging this pull request may close these issues.

Installation issue: HDF5+mpi~fortran requires a fortran compiler

9 participants