Description of the problem / feature request:
I'm running some TensorFlow tests using Bazel, but on our multi-core POWER9 system they fail with, e.g.:
ERROR: /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/TensorFlow/tensorflow-r2.4/tensorflow/core/platform/BUILD:1142:11: failed (Exit 1): generate-xml.sh failed: error executing command
I.e. there is no useful error message; Bazel simply failed to execute that script, which comes from the Bazel installation. I verified that the executed command (as printed by bazel -s) runs correctly when run manually, so the script does exist.
I even modified that script in the Bazel sources to print something at the start, but that output never shows up. So it seems the script has not (yet?) been created when Bazel tries to execute it. I therefore suspect a race condition or something similar, but am unable to verify this.
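One way to probe the race-condition hypothesis is to re-run the exact failing action (the generate-xml.sh command from the log below) outside Bazel many times and count the failures. If it never fails standalone, the problem is more likely tied to how Bazel sets up or spawns the action than to the script itself. This is a minimal POSIX shell sketch. The helper name rerun_count is hypothetical and not part of Bazel, and OUTPUT_BASE stands for the --output_base directory passed to bazel.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times, discard its output,
# and print how many of those runs exited with a non-zero status.
rerun_count() {
  rc_n=$1
  shift
  rc_fail=0
  rc_i=0
  while [ "$rc_i" -lt "$rc_n" ]; do
    if ! "$@" >/dev/null 2>&1; then
      rc_fail=$((rc_fail + 1))
    fi
    rc_i=$((rc_i + 1))
  done
  echo "$rc_fail"
}

# Hypothetical usage, mirroring the failing action from the log
# (note that each run overwrites test.xml):
#   cd "$OUTPUT_BASE/execroot/org_tensorflow"
#   rerun_count 100 env - PATH=/usr/bin:/bin \
#     TEST_BINARY=tensorflow/core/platform/platform_strings_test \
#     TEST_NAME=//tensorflow/core/platform:platform_strings_test \
#     TEST_SHARD_INDEX=0 TEST_TOTAL_SHARDS=0 \
#     external/bazel_tools/tools/test/generate-xml.sh \
#     bazel-out/ppc-opt/testlogs/tensorflow/core/platform/platform_strings_test/test.log \
#     bazel-out/ppc-opt/testlogs/tensorflow/core/platform/platform_strings_test/test.xml 0 0
```

A result of 0 out of 100 would suggest the failure only occurs inside Bazel's own execution of the action.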
Any hints, ideas, ...?
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Sorry, the only thing I have is the command I use to test TF:
bazel --output_base=/dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmptspeEg-bazel-tf/output_base --install_base=/dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmptspeEg-bazel-tf/output_base/inst_base --output_user_root=/dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmptspeEg-bazel-tf/output_user_root --host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m test --compilation_mode=opt --config=opt --subcommands --verbose_failures --config=noaws --jobs=64 --copt="-fPIC" --distinct_host_configuration=false --test_output=errors --local_test_jobs=1 --build_tests_only --test_tag_filters='-no_gpu,-no_oss,-oss_serial,-benchmark-test,-no_oss_py37,-v1only' --build_tag_filters='-no_gpu,-no_oss,-oss_serial,-benchmark-test,-no_oss_py37,-v1only' -- //tensorflow/core/... //tensorflow/cc/... //tensorflow/c/... -//tensorflow/core:example_java_proto -//tensorflow/core/example:example_protos_closure
What operating system are you running Bazel on?
RHEL 7.6
What's the output of bazel info release?
release 3.4.1- (@Non-Git)
If bazel info release returns "development version" or "(@Non-Git)", tell us how you built Bazel.
EXTRA_BAZEL_ARGS="--jobs=176 --host_javabase=@local_jdk//:jdk" ./compile.sh
Have you found anything relevant by searching the web?
No
Any other information, logs, or outputs that you want to share?
ERROR: /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/TensorFlow/tensorflow-r2.4/tensorflow/core/platform/BUILD:1142:11: failed (Exit 1): generate-xml.sh failed: error executing command
(cd /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmptspeEg-bazel-tf/output_base/execroot/org_tensorflow && \
exec env - \
PATH=/usr/bin:/bin \
TEST_BINARY=tensorflow/core/platform/platform_strings_test \
TEST_NAME=//tensorflow/core/platform:platform_strings_test \
TEST_SHARD_INDEX=0 \
TEST_TOTAL_SHARDS=0 \
external/bazel_tools/tools/test/generate-xml.sh bazel-out/ppc-opt/testlogs/tensorflow/core/platform/platform_strings_test/test.log bazel-out/ppc-opt/testlogs/tensorflow/core/platform/platform_strings_test/test.xml 0 0)
Execution platform: @local_execution_config_platform//:platform