
Add package tensorflow [draft] #2043

Closed
muffgaga wants to merge 1 commit into spack:develop from electronicvisions:Add_package_tensorflow


@muffgaga
Contributor

@muffgaga muffgaga commented Oct 18, 2016

Here's a preliminary PR for the tensorflow package (we're still working on the two non-default build variants).

Tensorflow is built using the bazel build tool.
Bazel pulls in dependencies (cf. WORKSPACE file) for the build process;
these dependencies are installed to the same prefix directory as tensorflow (i.e. similar to what python setuptools does).

Should we try to integrate all dependencies into spack first?

* FIXME fix cuda and test gcp build variants before merging
@muffgaga muffgaga force-pushed the Add_package_tensorflow branch from 9915702 to cfd347d Compare October 18, 2016 15:36
@muffgaga muffgaga changed the title Add package tensorflow [RFC] Add package tensorflow [draft] Oct 18, 2016
@adamjstewart
Member

> Bazel pulls in dependencies (cf. WORKSPACE file) for the build process;
> these dependencies are installed to the same prefix directory as tensorflow (i.e. similar to what python setuptools does).
>
> Should we try to integrate all dependencies into spack first?

The way that you need to think about things is that you can't assume Spack will always have internet access. Some clusters are behind firewalls or are completely offline. Spack allows users to create a mirror of fetched tarballs, copy that mirror to a different cluster, and install those packages. So all dependencies need to be added to Spack so that Bazel doesn't need to download anything.

@tgamblin
Member

Also so that Spack can actually control what they're built with. Spack wants to be able to guarantee you that things are built with a consistent compiler stack. If bazel builds a bunch of stuff behind the scenes that Spack doesn't know about, Spack can't do this.

@adamjstewart
Member

> these dependencies are installed to the same prefix directory as tensorflow

@tgamblin Do you think this will be a problem?

@tgamblin
Member

Can Bazel instead be configured to depend on external versions of the dependencies? If so, then no. If not, then yeah we probably have to think about that.

@adamjstewart
Member

Bazel sounds very convenient, but it results in unrepeatable builds if we can't control or record how a package's dependencies are built. I think we'll have to add tensorflow's dependencies to Spack. Feel free to do that in this PR, it doesn't have to be a separate PR.

@muffgaga
Contributor Author

muffgaga commented Oct 19, 2016

[consistent/reproducible build by tracking all deps via spack]

ACK, however in this case (for this tensorflow package):

  • no shared libraries are being built (see below); the downloaded source code is only used in some of the tensorflow compile units => it feels like package-internal code distributed over multiple tar.gz files... (which are downloaded by bazel, yes... see comment below)
  • bazel does not pull HEAD or branches but only specific versions (that's what I see in the log; I still have to verify this in the WORKSPACE files)

Have a look at this output:

# call ldd for each installed file, do some filtering on errors and count occurrences
$ find /path/to/spack/opt/linux-y/gcc-z/tensorflow-XYZ -exec ldd {} \; 2>&1 | egrep -v "not ((a dynamic executable)|(regular file))" | sed 's/(0x[0-9a-f]*)$//' | sort | uniq -c
     16     /lib64/ld-linux-x86-64.so.2 
     16     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 
      1     libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 
     16     libgcc_s.so.1 => /path/to/spack/opt/linux-y/gcc-z/gcc-6.2.0-fhir7awimw3chugjsa25vrgn2xkf3lij/lib64/libgcc_s.so.1 
     16     libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 
     15     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
     16     libstdc++.so.6 => /path/to/spack/opt/linux-y/gcc-z/gcc-6.2.0-fhir7awimw3chugjsa25vrgn2xkf3lij/lib64/libstdc++.so.6 
      1     libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 
     16     linux-vdso.so.1 

This looks quite reasonable for a spack package: the prefix contains e.g. bin/{easy_install,f2py,pbr,tensorflow,wheel} (mostly from py-package dependencies), and everything else is under lib/python2.7/site-packages/:

$ find . -name "*.so"
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/factorization/python/ops/_factorization_ops.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/factorization/python/ops/_clustering_ops.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/layers/python/ops/_sparse_feature_cross_op.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/layers/python/ops/_bucketization_op.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/rnn/python/ops/_lstm_ops.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/tensor_forest/python/ops/_inference_ops.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/tensor_forest/python/ops/_training_ops.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/tensor_forest/data/_data_ops.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/ffmpeg/ffmpeg.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/metrics/python/ops/_set_ops.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/linear_optimizer/python/ops/_sdca_ops.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/quantization/kernels/_quantized_kernels.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/quantization/_quantized_ops.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/python/_pywrap_tensorflow.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/external/protobuf/internal/_api_implementation.so
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/external/protobuf/pyext/_message.so

In short: I think that the tensorflow package is being built in a reproducible/robust way.
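
For readers who want to repeat the ldd tally above without the shell pipeline, here is a minimal Python sketch of the same counting step. It only parses text that ldd has already produced; the function name count_libraries and the two regular expressions are illustrative assumptions, not part of the PR.

```python
import re
from collections import Counter

# Trailing load address printed by ldd, e.g. " (0x00007ffc2d762000)"
ADDRESS_RE = re.compile(r"\s*\(0x[0-9a-f]+\)$")
# Messages ldd prints for files that are not dynamic objects
NOISE_RE = re.compile(r"not (a dynamic executable|regular file)")


def count_libraries(ldd_output):
    """Count how often each library line occurs in concatenated ldd output."""
    counts = Counter()
    for line in ldd_output.splitlines():
        line = line.strip()
        if not line or NOISE_RE.search(line):
            continue  # skip blanks and non-ELF files
        # Drop the load address so identical libraries compare equal
        counts[ADDRESS_RE.sub("", line)] += 1
    return counts
```

Feeding it the combined output of ldd for every file in the install prefix should give the same kind of tally as sort | uniq -c, keyed by library line.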

[external dependencies in bazel]

This would be a nice solution, but unfortunately I didn't find any options for this.
I'm not (yet) a bazel expert though...

[internet access]

ACK, this is a problem.
Could we (ab)use spack to call bazel fetch (some spack-package-fetch-specific hook)?

@tgamblin
Member

tgamblin commented Oct 19, 2016

I guess we could use bazel fetch as part of a resource directive and have that archive the bazel dependencies as resources for the package... that would get them mirrored properly. Is there a way to know what tarballs the bazel build needs a priori?
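
To make the resource idea more concrete, here is a small Python sketch that renders the text of one resource() directive per dependency tarball, so the entries could be pasted into a package.py. This is only an illustration under assumptions: the helper name resource_directive, the destination directory bazel_deps, and the checksum placeholder are hypothetical, and the keyword arguments (name, url, sha256, destination, expand) should be checked against the resource directive of the Spack version in use.

```python
def resource_directive(name, url, sha256, destination="bazel_deps"):
    """Render the source text of one Spack resource() directive.

    'destination' is a hypothetical directory name inside the stage;
    expand=False keeps the tarball as-is so bazel can unpack it itself.
    """
    return (
        "resource(\n"
        "    name={name!r},\n"
        "    url={url!r},\n"
        "    sha256={sha256!r},\n"
        "    destination={destination!r},\n"
        "    expand=False,\n"
        ")"
    ).format(name=name, url=url, sha256=sha256, destination=destination)


if __name__ == "__main__":
    # The checksum below is a placeholder and has to be computed per tarball.
    print(resource_directive(
        "dagre",
        "https://github.com/cpettitt/dagre/archive/v0.7.4.tar.gz",
        "<sha256 of the tarball>",
    ))
```

Declaring the tarballs this way would let spack fetch and spack mirror handle them like any other source archive, which is the property the mirroring argument above depends on.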

@muffgaga
Contributor Author

muffgaga commented Oct 19, 2016

@tgamblin You mean after staging/extracting the tensorflow package?
The WORKSPACE file (and all loaded files, e.g. here tensorflow/workspace.bzl) specifies all dependencies in bazel. Bazel does not support transitive dependencies, so parsing this file (and all loaded ones) is sufficient (cf. docs).
However, simply grepping those files is not sufficient, e.g. in tensorflow/workspace.bzl:

eigen_version = "9e1b48c333aa"
# ...
url = "https://bitbucket.org/eigen/eigen/get/" + eigen_version + ".tar.gz",

Except for the one above, all other dependencies are specified as plain urls:

$ spack stage tensorflow && spack cd tensorflow
$ cat WORKSPACE tensorflow/workspace.bzl | sed -n 's/^\s*url = "\([^"]*\)",\?/\1/p' 
https://github.com/mbostock-bower/d3-bower/archive/v3.5.15.tar.gz
https://github.com/cpettitt/dagre/archive/v0.7.4.tar.gz
https://github.com/components/es6-promise/archive/v2.1.0.tar.gz
https://github.com/polymerelements/font-roboto/archive/v1.0.1.tar.gz
# ... and ~70 more...

If we can use the bazel build tool to extract these urls, we could add a patch to the bazel package providing some --print-urls-of-all-dependencies option for bazel fetch. This seems like a reasonable extension to spack's bazel 😁 .
Otherwise, we could run grep load WORKSPACE to find all loaded files, then loop over them (for file in WORKSPACE $ALL_LOADED_FILES; do grep url $file; done) and apply some very basic variable expansion to the results (maybe better done in python, with some mocking of the typically used bazel classes).
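
As a rough illustration of the "basic variable expansion in python" option, the sketch below collects plain string assignments and resolves url attributes that concatenate literals and variables with "+". It is a minimal sketch, not a Starlark parser: the function name extract_urls and the regular expressions are assumptions, only double-quoted strings and one-line url attributes are handled, and files pulled in via load would still have to be found and passed in separately.

```python
import re

# Plain string assignments, e.g.: eigen_version = "9e1b48c333aa"
ASSIGN_RE = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')
# url attributes, e.g.: url = "https://..." + eigen_version + ".tar.gz",
URL_RE = re.compile(r'^\s*url\s*=\s*(.+?),?\s*$')
# One operand of a "+" concatenation: a quoted literal or a bare variable name
OPERAND_RE = re.compile(r'"([^"]*)"|(\w+)')


def extract_urls(text):
    """Return every url value in text, expanding plain string variables."""
    variables = {}
    urls = []
    for line in text.splitlines():
        url_match = URL_RE.match(line)
        if url_match:
            parts = []
            for literal, name in OPERAND_RE.findall(url_match.group(1)):
                # An unknown variable raises KeyError instead of guessing.
                parts.append(variables[name] if name else literal)
            urls.append("".join(parts))
            continue
        assign_match = ASSIGN_RE.match(line)
        if assign_match:
            variables[assign_match.group(1)] = assign_match.group(2)
    return urls


if __name__ == "__main__":
    sample = '''
eigen_version = "9e1b48c333aa"
url = "https://bitbucket.org/eigen/eigen/get/" + eigen_version + ".tar.gz",
url = "https://github.com/cpettitt/dagre/archive/v0.7.4.tar.gz",
'''
    for url in extract_urls(sample):
        print(url)
```

Failing loudly on an unknown variable is a deliberate choice here: a silently wrong URL would end up in a mirror, whereas a KeyError points at the construct the sketch does not understand yet.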

@muffgaga
Contributor Author

@tgamblin Alternatively we could patch tensorflow's WORKSPACE files to use local files and specify all the urls from above as package resources. This would increase the work to add new versions of the tensorflow package, but maybe it's simpler to do (download once, extract, do some grepping).
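
The patching alternative could look roughly like the following Python sketch, which rewrites literal http(s) url attributes to file:// paths below a local directory and reports which tarball has to be placed where. Several things here are assumptions rather than facts from the PR: the function name localize_urls, the naming scheme for local files, and above all that the bazel version in use accepts file:// URLs in its archive rules. URLs built by concatenation (like the eigen one) are deliberately left untouched and would need separate handling.

```python
import posixpath
import re
from urllib.parse import urlparse

# A url attribute whose value is a single http(s) string literal
URL_LITERAL_RE = re.compile(r'(url\s*=\s*)"(https?://[^"]+)"(?=\s*,)')


def localize_urls(text, mirror_dir):
    """Point literal url attributes at files below mirror_dir.

    Returns (rewritten_text, {original_url: local_path}).
    """
    mapping = {}

    def replace(match):
        url = match.group(2)
        # Join all path components so that two different "v1.0.1.tar.gz"
        # archives do not collide in the local directory.
        parts = urlparse(url).path.strip("/").split("/")
        local = posixpath.join(mirror_dir, "-".join(parts))
        mapping[url] = local
        return '{0}"file://{1}"'.format(match.group(1), local)

    return URL_LITERAL_RE.sub(replace, text), mapping
```

The returned mapping is the list of files that would have to be declared as package resources and staged before bazel runs.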

@alalazo
Member

alalazo commented Dec 5, 2016

@muffgaga What is the status of this PR?

@jayavanth
Contributor

Are the dependencies rpath linked? The dependencies are installed via Bazel.

@muffgaga
Contributor Author

@jayavanth

> Are the dependencies rpath linked? The dependencies are installed via Bazel.

bazel's paradigm is to link statically; I just did a find . -name "*.so" + ldd + chrpath -l and there are no links to the higher-level libraries (like eigen, google protobuf, etc.). However, there are some paths pointing to system-level directories (at least libz seems like a real problem):

./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/contrib/factorization/python/ops/_factorization_ops.so: RPATH=/MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.7/gcc-4.9.2-w4e6fuud2uh5i7x3pjxljpnvmblmvrq3/lib:/MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.7/gcc-4.9.2-w4e6fuud2uh5i7x3pjxljpnvmblmvrq3/lib64:/MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.9.2/tensorflow-0.10.0-vi4yiokxsjw4jaejklhkxtv2lbjz4z43/lib:/MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.9.2/tensorflow-0.10.0-vi4yiokxsjw4jaejklhkxtv2lbjz4z43/lib64
        linux-vdso.so.1 (0x00007ffc2d762000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7c471b2000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7c46f95000)
        libstdc++.so.6 => /MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.7/gcc-4.9.2-w4e6fuud2uh5i7x3pjxljpnvmblmvrq3/lib64/libstdc++.so.6 (0x00007f7c46c8b000)
        libgcc_s.so.1 => /MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.7/gcc-4.9.2-w4e6fuud2uh5i7x3pjxljpnvmblmvrq3/lib64/libgcc_s.so.1 (0x00007f7c46a75000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7c466ca000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f7c4790b000)
# ...
./lib/python2.7/site-packages/tensorflow-0.10.0-py2.7.egg/tensorflow/python/_pywrap_tensorflow.so: RPATH=/MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.7/gcc-4.9.2-w4e6fuud2uh5i7x3pjxljpnvmblmvrq3/lib:/MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.7/gcc-4.9.2-w4e6fuud2uh5i7x3pjxljpnvmblmvrq3/lib64:/MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.9.2/tensorflow-0.10.0-vi4yiokxsjw4jaejklhkxtv2lbjz4z43/lib:/MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.9.2/tensorflow-0.10.0-vi4yiokxsjw4jaejklhkxtv2lbjz4z43/lib64
        linux-vdso.so.1 (0x00007ffc20aba000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2beca81000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2bec780000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2bec565000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2bec348000)
        libstdc++.so.6 => /MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.7/gcc-4.9.2-w4e6fuud2uh5i7x3pjxljpnvmblmvrq3/lib64/libstdc++.so.6 (0x00007f2bec03e000)
        libgcc_s.so.1 => /MY_SPACK/opt/spack/linux-debian7-x86_64/gcc-4.7/gcc-4.9.2-w4e6fuud2uh5i7x3pjxljpnvmblmvrq3/lib64/libgcc_s.so.1 (0x00007f2bebe28000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2beba7d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2bef1c4000)

On our last build (using spack from 2017-01-26) and [email protected] I do not see the dynamic libz dependencies anymore; there's only vdso, pthread, libm, libc, ld-linux, libgcc_s, libstdc++ and libdl → however, this installation uses the system-level compiler, so maybe some of those would point to the spack compiler location on a different installation.
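
The manual check above (which libraries resolve to system directories instead of the Spack tree) can be sketched in Python as follows. This is an illustrative helper only: the name libraries_outside and the regular expression are assumptions, and it works on ldd output that was captured beforehand rather than calling ldd itself.

```python
import re

# Lines of the form "libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x...)"
RESOLVED_RE = re.compile(r"^\s*(\S+) => (\S+)")


def libraries_outside(ldd_output, prefix_root):
    """Map library names to resolved paths that are not below prefix_root."""
    root = prefix_root.rstrip("/") + "/"
    outside = {}
    for line in ldd_output.splitlines():
        match = RESOLVED_RE.match(line)
        if not match:
            continue  # e.g. linux-vdso or the dynamic loader line
        name, path = match.groups()
        # Ignore "not found" entries; only absolute paths are compared.
        if path.startswith("/") and not path.startswith(root):
            outside[name] = path
    return outside
```

Libraries such as libc or libm will always show up in the result on a typical system, so the output is a list to review (libz being the interesting entry in this thread), not an automatic failure.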

@alalazo

> @muffgaga What is the status of this PR?

Current status: I'll provide an update for [email protected]+cuda, but we still do not see an easy fix for the bazel-builds-it-all-on-its-own-into-mostly-static-libraries/binaries problem.
We do use it in production on 3 different archs (usage on Debian wheezy basically ends now, we actively use it on Debian jessie and have a working installation on Scientific Linux 6 (Carbon)).

@alalazo
Member

alalazo commented Feb 24, 2017

@muffgaga We are having the same problems as you installing tensorflow at our site via spack. We noticed though that in the tensorflow git repo they have a cmake directory. Do you know if this is a substitute for bazel?

@healther healther deleted the Add_package_tensorflow branch March 4, 2017 16:40
@healther healther restored the Add_package_tensorflow branch March 4, 2017 16:40
@alalazo
Member

alalazo commented May 7, 2017

Related to #3244

@alalazo alalazo mentioned this pull request May 7, 2017
3 tasks
@alalazo
Member

alalazo commented Sep 16, 2017

@muffgaga I added the 'up-for-grabs' tag as this was inactive for a while. Feel free to complete the PR and remove the tag. I bet you'll save the day for a lot of Spack users.

@jcftang
Member

jcftang commented Sep 16, 2017

The TF guys keep adding packages, which makes this hard to package up.

@ifelsefi
Contributor

ifelsefi commented Oct 10, 2017

Tensorflow now requires bazel >= 0.5.4.

The newer versions of bazel do not install with the existing template. I cannot figure out the problem. I think it has to do with these patches:

    patch('fix_env_handling.patch')
    patch('link.patch')
    patch('cc_configure.patch')

I disabled these patches; however, that produces various other errors.

With them all on:

[root@node117 ~]# spack install bazel %gcc@6.3.0
==> zip is already installed in /pbtech_mounts/softlib001/apps/EL6/spack/opt/spack/linux-rhel6-x86_64/gcc-6.3.0/zip-3.0-o6zj7p7t3a3xdpsvhod7zhyfuu2jitfs
==> jdk is already installed in /pbtech_mounts/softlib001/apps/EL6/spack/opt/spack/linux-rhel6-x86_64/gcc-6.3.0/jdk-8u141-b15-5bog6a7tfbasqegezgox5zvikndldodx
==> Installing bazel
==> Using cached archive: /pbtech_mounts/softlib001/apps/EL6/spack/var/spack/cache/bazel/bazel-0.6.1.zip
==> Already staged bazel-0.6.1-cplr3rjajznaldy5wwjxes7naypp6hud in /pbtech_mounts/softlib001/apps/EL6/spack/var/spack/stage/bazel-0.6.1-cplr3rjajznaldy5wwjxes7naypp6hud
==> Patching failed last time. Restaging.
==> Staging archive: /pbtech_mounts/softlib001/apps/EL6/spack/var/spack/stage/bazel-0.6.1-cplr3rjajznaldy5wwjxes7naypp6hud/bazel-0.6.1-dist.zip
==> Applied patch fix_env_handling.patch
==> Applied patch link.patch
1 out of 1 hunk FAILED -- saving rejects to file tools/cpp/cc_configure.bzl.rej
==> Patch cc_configure.patch failed.
==> Error: ProcessError: Command exited with status 1:
    '/usr/bin/patch' '-s' '-p' '1' '-i' '/pbtech_mounts/softlib001/apps/EL6/spack/var/spack/repos/builtin/packages/bazel/cc_configure.patch' '-d' '.'
==> Error: [Errno 2] No such file or directory: '/pbtech_mounts/softlib001/apps/EL6/spack/var/spack/stage/bazel-0.6.1-cplr3rjajznaldy5wwjxes7naypp6hud/spack-expanded-archive/spack-build.out'

Has anyone tried to install latest bazel?

@tgamblin
Member

I'd really like to see a version of this that doesn't require bazel, e.g. the one in #3244

@muffgaga muffgaga deleted the Add_package_tensorflow branch December 1, 2022 14:17