Skip to content

add easyblock for TensorFlow (REVIEW)#1287

Merged
wpoely86 merged 32 commits intoeasybuilders:developfrom
boegel:TensorFlow
Feb 19, 2018
Merged

add easyblock for TensorFlow (REVIEW)#1287
wpoely86 merged 32 commits intoeasybuilders:developfrom
boegel:TensorFlow

Conversation

@boegel
Copy link
Copy Markdown
Member

@boegel boegel commented Nov 9, 2017

With this easyblock, I'm able to build & install TensorFlow 1.4.0 from source using foss/2017b and Python 3.6.3 (cfr. easybuilders/easybuild-easyconfigs#5318).

Further testing is required, especially w.r.t. building with Intel compilers on top of Intel MKL & GPU support (i.e., including CUDA & cuDNN as dependencies).

One thing not taking care of now is avoiding that the installation procedure pulls in a whole bunch of dependencies itself, it seems like Bazel doesn't have good support for providing the dependencies via some other way (meaning that a lot of patching is probably required to avoid that they're automagically installed).

Also, we can avoid the need for run_cmd_qa by setting $TF_* environment variables that are picked up by configure, it may be worth while switching to that approach.

@pforai
Copy link
Copy Markdown
Contributor

pforai commented Nov 9, 2017

+1 on the env vars! Seeing strange things with the interactive installer vs running it under eb.

@pforai
Copy link
Copy Markdown
Contributor

pforai commented Nov 29, 2017

The env vars based block doesn't hang any more since its no long based on the qanda block 👍

Copy link
Copy Markdown
Member

@damianam damianam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with TF, so I'd suggest to look for a maintainer more acquainted with it for an additional review.


tmpdir = tempfile.mkdtemp(suffix='-bazel-configure')

# create wrapper for icc to make sure location of license server is available...
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain here that bazel sandboxes the environment and it is then when INTEL_LICENSE_FILE gets lost?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will do

cuda_root = get_software_root('CUDA')
cudnn_root = get_software_root('cuDNN')
jemalloc_root = get_software_root('jemalloc')
opencl_root = get_software_root('OpenCL')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't any OpenCL easyconfig as far as I can see. Besides, the OpenCL runtime for intel processors needs root permissions, so it won't be made available any time soon through EB.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, ok, but I'll leave this in anyway, to avoid hardcoding building without OpenCL support.

At least this way there is an option to built TF with OpenCL support (for example via http://easybuild.readthedocs.io/en/latest/Using_external_modules.html).

cmd.append(self.cfg['buildopts'])

# pass through environment variables that may specify location of license file for Intel compilers
for key in []:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling that this shouldn't be an empty list....

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, yes, I did the for for $INTEL_LICENSE_FILE through this initially, but it turns out that doesn't actually work...

# using the full Intel MKL doesn't work without additional effort...
cmd.extend(['--config=mkl'])

cmd.append('//tensorflow/tools/pip_package:build_pip_package')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what this does, but I going to guess that the double slash is a typo.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@boegel boegel changed the title add easyblock for TensorFlow (WIP) add easyblock for TensorFlow (REVIEW) Dec 10, 2017
Damian Alvarez and others added 4 commits February 8, 2018 18:01
@boegel boegel changed the title add easyblock for TensorFlow (WIP) add easyblock for TensorFlow (REVIEW) Feb 12, 2018
@boegel
Copy link
Copy Markdown
Member Author

boegel commented Feb 12, 2018

@damianam @akesandgren I gave this another, hopefully final, review, I think this is good to go in now...
If the both of you agree (please take a couple of minutes to give the whole thing another view), this is good to merge by one of the @easybuilders/easybuild-easyblocks-maintainers, to be included with the upcoming EasyBuild v3.5.2 (ETA end of Feb'18).

extra_vars = {
# see https://developer.nvidia.com/cuda-gpus
'cuda_compute_capabilities': [[], "List of CUDA compute capabilities to build with", CUSTOM],
'with_jemalloc': [True, "Make TensorFlow use jemalloc", CUSTOM],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to use jemalloc only when it is present as a dependency and remove this with_jemalloc thingie?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we have no way to let Bazel pick up on self-provided dependencies, but we needed a way to disable it pulling in and building jemalloc itself (which is required on CentOS 6).

This is something to improve on in the future, but I think we'll have to wait for improvements in Bazel itself before this is feasible without heavy patching, it's a huge PITA right now to make that work.

This easyblock is a starting point, definitely not the end of the road...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good enough for me then.
Awaiting merge so i can redo the builds...

@boegel
Copy link
Copy Markdown
Member Author

boegel commented Feb 13, 2018

@JackPerdue I guess we can relax the binutils requirement, sure, it's a bit strict right now... It's just that I was making the assumption here that a 'modern' EasyBuild toolchain is used, where we include our own binutils.
Is this a blocker for you?

@akesandgren
Copy link
Copy Markdown
Contributor

Looks good enough to go for me.

@boegel
Copy link
Copy Markdown
Member Author

boegel commented Feb 13, 2018

@JackPerdue
Copy link
Copy Markdown
Contributor

@boegel no blocker here....digging further I see I have similar problems with Bazel. So, for now, I've hacked together a binutils-system using bundle.py which only does:

modextravars = {'EBROOTBINUTILS': '/usr'}

Seems to work since it gets listed last in module. That causes bazel.py to do a bunch of other stufff like testing for dwp who's absence on RHEL6 didn't show up until building TensorFlow. Unfortunately, that hasn't solved the other problem with Bazel in that it wants to use the system GCC instead of the EB one. Running "strings" on bazel still shows:

GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-16)
GCC: (GNU) 6.3.0

and 4.8.5 is what gets used by Bazel when building TF which breaks it (e.g. trying to use /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/stdint.h). I tried rebuilding Bazel with all gcc RPMs removed but somehow it still picks it up. So to build TF, I still have to remove system gcc (which would not be my first choice). [note, still use Bazel block and config from EB 3.5.1]

Anyway.... not a blocker for me... lets get something out there and see who else has problems/fixes.

@boegel
Copy link
Copy Markdown
Member Author

boegel commented Feb 19, 2018

@wpoely86 Do you see any reason not to merge this?

@wpoely86 wpoely86 merged commit b264b72 into easybuilders:develop Feb 19, 2018
@wpoely86
Copy link
Copy Markdown
Member

No, I forgot to click the shiny green button 😉

@boegel
Copy link
Copy Markdown
Member Author

boegel commented Feb 20, 2018

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants