Add new TensorFlow package#13112
Conversation
|
Thank you all for creating this PR! I gave it a try as soon as I saw it. Now bazel 0.19.0 and the rest of the dependencies build just fine. However, I am getting the following error during the tensorflow build: Wondering if there is a simple fix for it. |
|
I guess the above error was reported before at #11850 |
|
Commented out some lines, and the build process went beyond the above mentioned error: However, a new error is showing up now: |
|
I'm still in the process of doing a test build, so I haven't hit your errors yet (some flaky downloader failed overnight...). There is at least one additional "problematic" hack that I had to introduce in the bazel package. @Sinan81, can you see if you can spin up a patch that circumvents the Spack compiler wrappers for tensorflow (similarly to what I did for bazel)? In general, before merging this PR we need to at least come to some decision on the following points:
I think 1-3 are the more important ones. If someone urgently needs TF, he/she can use this PR, but should be cautioned that weird things are done. |
|
@Sinan81 sorry for the noise of the deleted comment (in case you got an email or such); our "solution" to your problem was a bit less drastic in that we left the option to say "not parallel". But if I understand @alalazo from #11850 correctly, then we should manually set the |
|
Feel free to add me as a maintainer: `maintainers = ['adamjstewart']`. I'm more of a PyTorch person myself, but I'm maintaining all of the other ML/DL packages in Spack, so I don't mind helping out with this one as well. |
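For context, a minimal sketch of how that attribute sits in a Spack package file (the base class here is a stand-in to keep the sketch self-contained, not the real Spack `Package`):

```python
# Stand-in for Spack's Package base class, only to make the sketch runnable.
class Package(object):
    pass


class Tensorflow(Package):
    """Sketch: Spack reads this class attribute to know whom to ping."""
    maintainers = ['adamjstewart']


assert Tensorflow.maintainers == ['adamjstewart']
```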
|
I cleaned up the flake8 issues and the bazel callable. This version builds as-is for me; please check and review. |
|
I confirm that [email protected] and [email protected] build just fine! Thanks @healther. I should mention, though, that [email protected] didn't build with [email protected]_202-b08; rather, it builds with [email protected]_202. |
|
While compiling 2.0.0-alpha0, I am getting the following error: |
|
We never build against openjdk, so it is perfectly possible that it only works with the Oracle one. Maybe @dodo47 knows something about the state of |
|
A month ago or so I managed to build [email protected] with [email protected], but something has changed since then and I can't build with openjdk anymore. It is certainly good enough that we're able to build [email protected] one way or another. |
|
Actually, today I realized that so far I've been building tensorflow with default options, i.e. without cuda. When I try to build |
|
Is there a generic way to go about rewriting the checksums to sha256? @adamjstewart: I think, except for the checksums, this is as good as I'm going to get it on my own. I'd like to get your review, in particular regarding the |
Last time I tried, I only had to add the filter_file changes in the tensorflow package (if I remember correctly) for 2.0 to work, with Bazel 0.19. However, here https://www.tensorflow.org/install/source they state that 2.0 now builds with Bazel 0.26.1. I guess a wrong Bazel version might explain the error message? |
|
So today again I tried compiling |
|
The warning is completely normal (i.e. in all our builds it gets raised) and should not be the reason why it locks up. Unfortunately you would have to modify the bazel invocation to get it to be verbose too, i.e. |
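For reference, a sketch of what a more verbose Bazel invocation could look like (the flags are standard Bazel options; the target name is TF's usual pip-package target, assumed here):

```shell
# --subcommands prints every command Bazel runs; --verbose_failures prints
# the full command line of any action that fails.
bazel build --subcommands --verbose_failures \
    //tensorflow/tools/pip_package:build_pip_package
```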
|
@healther I am wondering if you might want to add the following to the package file for the sake of completeness:

```python
version('2.0.0', sha256='49b5f0495cd681cbcb5296a4476853d4aea19a43bdd9f179c928a977308a0617')
version('1.15.0', sha256='a5d49c00a175a61da7431a9b289747d62339be9cf37600330ad63b611f7f5dc9')
version('1.14.0', sha256='aa2a6a1daafa3af66807cfe0bc77bfe1144a9a53df9a96bab52e3e575b3047ed')

depends_on('[email protected]', type='build', when='@1.14.0')
depends_on('[email protected]:0.26.1', type='build', when='@1.15.0,2.0.0')
```

P.S. still working on compiling |
|
For future reference, I should also mention that I have been playing around with compiling

```python
depends_on('[email protected]:', type=('build', 'run'), when='@1.14.0:')
```
|
|
So, it seems which will later on trigger automatic downloading of nccl@1 from GitHub. Or else we should add an nccl dependency and do Furthermore, the following nccl paths in After the nccl setup, there are a few more inputs needed. So I also added the following to the package file, and now it proceeds to the build step. Aside: I think it might make sense to make compute_capability a multi-valued variant of the package (or is it possible to indicate as many capabilities as we need?). Finally, so far I haven't managed to build with gcc > 5 as suggested by TF developers. |
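For illustration, the extra configure inputs referred to above could look roughly like this (the variable names come from TF's configure script; every path and version below is a made-up example, not taken from this PR):

```shell
# Hypothetical answers to TF's interactive configure, supplied as env vars.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.0                   # example CUDA version
export TF_CUDA_COMPUTE_CAPABILITIES=3.5,7.0   # example capabilities
export TF_NCCL_VERSION=2                      # example NCCL major version
export NCCL_INSTALL_PATH=/path/to/nccl        # example install prefix
```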
|
@Sinan81 I updated the PR with all of your suggestions except for the last one, where I didn't understand the @adamjstewart I suggest rebasing this, adding a edit: For some reason my push failed; will report back when I've fixed that up. |
|
This PR has some conflicts, probably due to the recent switch from MD5 to SHA256. You'll need to update your new versions to SHA256 as well. |
|
In compiling if I manually overwrite GCC_HOST_COMPILER_PATH="/mnt/local/spack/lib/spack/env/gcc/gcc" with P.S. @healther I agree that the package should be merged as soon as possible so that others can start contributing directly. |
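A sketch of the manual override described above (the wrapper path is the one from the comment; the replacement compiler path is just an example):

```shell
# Point TF's configure at the underlying compiler instead of Spack's wrapper.
# Before: GCC_HOST_COMPILER_PATH=/mnt/local/spack/lib/spack/env/gcc/gcc
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc  # example path to the real gcc
```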
|
Try setting |
Still getting same error :( |
|
Finally managed to build
To be specific, I built v1.13.1 with the following versions of dependencies:
|
```python
# only supports compute capabilities >= 3.5
if spec.variants['cuda_arch'].value != 'none':
    capabilities = ','.join('{0:.1f}'.format(
        float(i) / 10.0) for i in spec.variants['cuda_arch'].value)
```
@Sinan81 I think this string formatting will be much more stable so that you don't get weird things like '6.499999999' instead of '6.5'.
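To illustrate the point, a small self-contained sketch: formatting the capability by string slicing never goes through floating-point division at all (this assumes Spack-style values such as '35' or '70'; it is an alternative, not the code in this PR):

```python
def format_capability(arch):
    """Insert a decimal point before the last digit, e.g. '35' -> '3.5'."""
    return '{0}.{1}'.format(arch[:-1], arch[-1])


# No float arithmetic, so no chance of artifacts like '6.499999999'.
capabilities = ','.join(format_capability(a) for a in ['35', '52', '70'])
print(capabilities)  # 3.5,5.2,7.0
```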
What should we put if cuda_arch=none? I'm trying to build on a local cluster. CUDA is installed, but I don't see any NVIDIA GPU on the cluster. I want to test anyway, but I don't know what my CUDA capabilities are.
```python
env.set('TEST_TMPDIR', tmp_path)
# TODO: Is setting this necessary? It breaks `spack build-env`
# because Bash can't find my .bashrc
env.set('HOME', tmp_path)
```
It looks like bazel has a --output_user_root option you can use to change where things are written instead of $HOME.
See what I did to the bazel package for an example of its usage.
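A sketch of that option in use (`--output_user_root` is a Bazel startup option, so it goes before the command; the path and target are examples):

```shell
# Keeps Bazel's output tree out of $HOME without overriding HOME itself.
bazel --output_user_root="$TMPDIR/bazel-root" \
    build //tensorflow/tools/pip_package:build_pip_package
```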
* use bazel commit in spack#13112, and add version 0.24.1, and corresponding cc_env patch
* undo preferred java version by dodo47
* patch for v0.26
* Update install steps
* Add patches for more versions
* Add unit tests
* Update patches for new Spack env vars
* env is already defined, use spackEnv
|
Okay, I successfully installed @Sinan81 @healther: how close do you think we are to merging this PR? I haven't yet managed to test Linux due to some numpy build problems. I would like to make sure that |
|
Last time I tried compiling v2, I got the error mentioned in the ticket you opened above. I should mention, though, that v2.1 always compiled fine. I will give v2 another try today. On a machine with 16 physical CPUs, the build takes 2 hours as of v1.14, while it takes only 0.5 hours up to v1.13. |
|
I decided to enable |
I'm a bit out of the loop on our current CI status. But IIUC, with the last version of this PR that we checked out, TF didn't build. But it is likely that this was Maybe @obreitwi can summarise what the current status of our CI is? |
|
While building v2.0.0 on Linux, I am getting the following error: Time permitting I will look into it today. |
|
After clearing the bazel cache, I am getting the following error: |
|
When I try to build with [email protected] (as suggested in TF docs), the error is |
|
Your first error looks like a Cython thing. You could try adding a build dependency on Your second error looks like a Bazel thing, which is far out of my area of expertise. |
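In Spack terms, that build dependency would be a one-line directive (a sketch; `py-cython` is Spack's Cython package, and leaving the version unconstrained is an assumption):

```python
# Build-only dependency: Cython generates C sources at build time and is
# not needed at runtime.
depends_on('py-cython', type='build')
```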
|
I think I'm going to merge this PR as is, since it seems like we've gotten 2.0 to work on macOS and 1.14/1.13 to work on Linux. I think this is a pretty good start, and we can keep hacking on the package in smaller PRs to get individual versions/environments working. Thanks for the hard work everyone! |
|
@adamjstewart do you have an idea where we could visibly say: "This is an experimental package; it is likely that not everything works. Please let us know and we will add the appropriate " We should probably have something as a general note on the terminal in case of a build fail, but especially for TF (if people use the package from Spack) we will run into loads of different setups... edit: doing a test build either now or over the weekend and will report back |
|
@healther I don’t think there’s any way to do this right now. Hopefully the post on Slack will reach enough people |
The tensorflow package stopped building with the latest upstream merge and will hence be disabled until spack#13112 is merged. Change-Id: I713c2352119e6fd931ede64a2e5b491e7618f9ec
As @healther promised in #9352 here is the current state of our tf-package.
This is essentially a cherry-pick onto the current HEAD and we did exactly one test build which is still running right now (@healther will report whether it worked), so use with caution. For example, it explicitly breaks the air-gapped-ability of spack, as bazel insists on downloading its own stuff.
All versions I added worked at some point, but I didn't retest them. I hope it is helpful :)
Closes #9352
Closes #3244
Closes #2043