add easyblock for TensorFlow (REVIEW)#1287
Conversation
|
+1 on the env vars! Seeing strange things with the interactive installer vs running it under eb. |
… MKL light instead
…Flow build command
…nt variables used by configure.py script
…nes, add debugging
|
The env vars based block doesn't hang any more since its no long based on the qanda block 👍 |
damianam
left a comment
There was a problem hiding this comment.
I am not familiar with TF, so I'd suggest to look for a maintainer more acquainted with it for an additional review.
easybuild/easyblocks/t/tensorflow.py
Outdated
|
|
||
| tmpdir = tempfile.mkdtemp(suffix='-bazel-configure') | ||
|
|
||
| # create wrapper for icc to make sure location of license server is available... |
There was a problem hiding this comment.
Explain here that bazel sandboxes the environment and it is then when INTEL_LICENSE_FILE gets lost?
| cuda_root = get_software_root('CUDA') | ||
| cudnn_root = get_software_root('cuDNN') | ||
| jemalloc_root = get_software_root('jemalloc') | ||
| opencl_root = get_software_root('OpenCL') |
There was a problem hiding this comment.
There isn't any OpenCL easyconfig as far as I can see. Besides, the OpenCL runtime for intel processors needs root permissions, so it won't be made available any time soon through EB.
There was a problem hiding this comment.
Well, ok, but I'll leave this in anyway, to avoid hardcoding building without OpenCL support.
At least this way there is an option to built TF with OpenCL support (for example via http://easybuild.readthedocs.io/en/latest/Using_external_modules.html).
easybuild/easyblocks/t/tensorflow.py
Outdated
| cmd.append(self.cfg['buildopts']) | ||
|
|
||
| # pass through environment variables that may specify location of license file for Intel compilers | ||
| for key in []: |
There was a problem hiding this comment.
I have the feeling that this shouldn't be an empty list....
There was a problem hiding this comment.
Woops, yes, I did the for for $INTEL_LICENSE_FILE through this initially, but it turns out that doesn't actually work...
| # using the full Intel MKL doesn't work without additional effort... | ||
| cmd.extend(['--config=mkl']) | ||
|
|
||
| cmd.append('//tensorflow/tools/pip_package:build_pip_package') |
There was a problem hiding this comment.
I am not sure what this does, but I going to guess that the double slash is a typo.
There was a problem hiding this comment.
No, it's not, that's how Bazel works... See https://www.tensorflow.org/install/install_sources#build_the_pip_package .
…means of test, in install step
Added preconfigopts, configopts and a few other things in tensorflow …
|
@damianam @akesandgren I gave this another, hopefully final, review, I think this is good to go in now... |
| extra_vars = { | ||
| # see https://developer.nvidia.com/cuda-gpus | ||
| 'cuda_compute_capabilities': [[], "List of CUDA compute capabilities to build with", CUSTOM], | ||
| 'with_jemalloc': [True, "Make TensorFlow use jemalloc", CUSTOM], |
There was a problem hiding this comment.
Wouldn't it be better to use jemalloc only when it is present as a dependency and remove this with_jemalloc thingie?
There was a problem hiding this comment.
Currently we have no way to let Bazel pick up on self-provided dependencies, but we needed a way to disable it pulling in and building jemalloc itself (which is required on CentOS 6).
This is something to improve on in the future, but I think we'll have to wait for improvements in Bazel itself before this is feasible without heavy patching, it's a huge PITA right now to make that work.
This easyblock is a starting point, definitely not the end of the road...
There was a problem hiding this comment.
Ok, good enough for me then.
Awaiting merge so i can redo the builds...
|
@JackPerdue I guess we can relax the |
|
Looks good enough to go for me. |
|
(re)tested with:
pending: TensorFlow 1.5.0 with |
|
@boegel no blocker here....digging further I see I have similar problems with Bazel. So, for now, I've hacked together a binutils-system using bundle.py which only does: modextravars = {'EBROOTBINUTILS': '/usr'} Seems to work since it gets listed last in module. That causes bazel.py to do a bunch of other stufff like testing for dwp who's absence on RHEL6 didn't show up until building TensorFlow. Unfortunately, that hasn't solved the other problem with Bazel in that it wants to use the system GCC instead of the EB one. Running "strings" on bazel still shows: GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-16) and 4.8.5 is what gets used by Bazel when building TF which breaks it (e.g. trying to use /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/stdint.h). I tried rebuilding Bazel with all gcc RPMs removed but somehow it still picks it up. So to build TF, I still have to remove system gcc (which would not be my first choice). [note, still use Bazel block and config from EB 3.5.1] Anyway.... not a blocker for me... lets get something out there and see who else has problems/fixes. |
|
@wpoely86 Do you see any reason not to merge this? |
|
No, I forgot to click the shiny green button 😉 |
|
🎉 |
With this easyblock, I'm able to build & install TensorFlow 1.4.0 from source using
foss/2017band Python 3.6.3 (cfr. easybuilders/easybuild-easyconfigs#5318).Further testing is required, especially w.r.t. building with Intel compilers on top of Intel MKL & GPU support (i.e., including CUDA & cuDNN as dependencies).
One thing not taking care of now is avoiding that the installation procedure pulls in a whole bunch of dependencies itself, it seems like Bazel doesn't have good support for providing the dependencies via some other way (meaning that a lot of patching is probably required to avoid that they're automagically installed).
Also, we can avoid the need for
run_cmd_qaby setting$TF_*environment variables that are picked up byconfigure, it may be worth while switching to that approach.