
Adding basic CMake support #728

Closed
ageron wants to merge 7 commits into tensorflow:master from ageron:master

@ageron
Contributor

@ageron ageron commented Jan 8, 2016

This PR adds very basic CMake support to unblock people who can't use Bazel (e.g., Windows users).

The CMake files are all located in the cmake subdirectory. Usage:

cd cmake
mkdir build && cd build
mkdir release && cd release
cmake -DCMAKE_BUILD_TYPE=Release ../..
make

This builds the tutorial C++ training example (along with all dependencies).

Limitations: no GPU support, no Python, no tests, no install targets, tested only on Mac OS X, and the code needs a bit of cleanup. It's a first version to get the ball rolling.

@tensorflow-jenkins
Collaborator

Can one of the admins verify this patch?

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@ageron
Contributor Author

ageron commented Jan 19, 2016

I added my email to my profile, hope this helps the CLA check.

@googlebot

CLAs look good, thanks!

@martinwicke
Member

I would love to have this in contrib/. Can you move the files there? Then we can merge.

@rbharath

Is this being actively worked on? If not, I'd be glad to give it a go. It would be very useful for getting TensorFlow to build on clusters with old versions of glibc and the like.

@tensorflow-jenkins
Collaborator

Can one of the admins verify this patch?

@ageron
Contributor Author

ageron commented Feb 23, 2016

@rbharath Hi, sorry for the delay. I've been caught up in another project over the past few weeks and was hoping to get back to this sooner. Your help is most welcome, thanks a lot! Please feel free to ping me if something in the code is unclear, and I'll try to come back to this within the next couple of weeks.

@rbharath

@ageron: Glad to be useful! I'll give this a shot. I'm no cmake expert, but hopefully I can make enough progress to get somewhere useful.

@rbharath

@ageron I checked out this PR and tried to build TensorFlow on my machine using CMake (Ubuntu 12.04 with glibc 2.15, gcc 4.9.1, CMake 3.5). I got fairly far, but the build fails when building the re2 library:

libbenchmark.a(benchmark.cc.o): In function `_Z8RunBenchPN7testing9BenchmarkEii.part.2':
benchmark.cc:(.text+0x4f): undefined reference to `clock_gettime'
benchmark.cc:(.text+0x149): undefined reference to `clock_gettime'
benchmark.cc:(.text+0x195): undefined reference to `clock_gettime'
benchmark.cc:(.text+0x377): undefined reference to `clock_gettime'
libbenchmark.a(benchmark.cc.o): In function `StopBenchmarkTiming()':
benchmark.cc:(.text+0x47a): undefined reference to `clock_gettime'
libbenchmark.a(benchmark.cc.o):benchmark.cc:(.text+0x4da): more undefined references to     `clock_gettime' follow

The CMake code uses the ExternalProject command to download and build re2 internally. The failure above is due to the missing linker flag '-lrt' (clock_gettime moved from librt into glibc proper in version 2.17, but I have an older glibc locally). The strange thing, though, is that I can clone re2 from GitHub manually and build it outside of CMake, so there's some arcana about the CMake build that's confusing me. Do you have any thoughts?
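To make the link error above concrete, here is a minimal sketch of code that calls clock_gettime, the symbol the re2 benchmark library fails to resolve. The function name MonotonicNanos is hypothetical and only illustrates the dependency; it is not taken from re2. On glibc older than 2.17 any binary using this call needs -lrt at link time.

```cpp
#include <time.h>  // POSIX clock_gettime, struct timespec, CLOCK_MONOTONIC

// Hypothetical helper: returns a monotonic timestamp in nanoseconds,
// or -1 if the clock cannot be read.
// On glibc < 2.17, clock_gettime is defined in librt rather than libc, so a
// binary calling it must be linked with -lrt; otherwise the linker reports
// "undefined reference to `clock_gettime'", as in the log above.
long long MonotonicNanos() {
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
    return -1;
  }
  return static_cast<long long>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

On an old glibc, a program using this helper would be linked with something like "g++ timer.cc -lrt". On newer glibc releases the symbol lives in libc itself, so the flag is typically unnecessary but harmless.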

@ageron
Contributor Author

ageron commented Feb 24, 2016

@rbharath That's really weird, I'm looking into it. Also, I'm moving cmake to tensorflow/contrib/cmake, as requested, and catching up to all the changes that have happened in the last few weeks. I'll try to finish by tomorrow.

@ageron
Contributor Author

ageron commented Feb 25, 2016

@rbharath I moved cmake/ to tensorflow/contrib/cmake/, updated to the latest upstream commit, and fixed a couple of issues. The code builds and runs fine on my Mac OS X machine (tf_tutorials_example_trainer), but when I tried to build it on an Ubuntu VM, it failed with linkage errors similar to yours. Adding the following code just after "find_package(Threads)" in tensorflow/contrib/cmake/CMakeLists.txt fixed the build problem:

IF (UNIX)
    LINK_LIBRARIES(${CMAKE_THREAD_LIBS_INIT} ${CMAKE_DL_LIBS} -lm -lrt)
ENDIF (UNIX)

Now it builds all the way to the end, but I get a segfault when I run tf_tutorials_example_trainer on Ubuntu. I'm investigating why; here's a quick debugging session:
debug_tf.txt

Before building TensorFlow, I installed a few packages:

sudo apt-get install build-essential
sudo apt-get install zlib1g-dev

I also built and installed Protobuf 3 from source (google/protobuf), following the instructions in google/protobuf/cmake/README.md, without gmock.
Since I installed it in a non-standard directory, I added the following options to the cmake command when building TensorFlow:

-DPROTOBUF_PROTOC_EXECUTABLE=[...]/install/bin/protoc
-DPROTOBUF_INCLUDE_DIR=[...]/install/include
-DPROTOBUF_PROTOC_LIBRARY=[...]/install/lib/libprotoc.a
-DPROTOBUF_LIBRARY=[...]/install/lib/libprotobuf.a

@ageron
Contributor Author

ageron commented Feb 25, 2016

@martinwicke Hi Martin, I moved the cmake/ directory to tensorflow/contrib/cmake, and caught up to the latest upstream commit.

@martinwicke
Member

Thanks @ageron! I'm merging this for the greater good, even if it's still broken.

martinwicke added a commit that referenced this pull request Feb 25, 2016
@martinwicke
Member

Merged. Is the segfault the same problem you had in your first version?

@rbharath

@ageron, @martinwicke I managed to get around the linking issues in re2 (the fix suggested above didn't quite work for me, but something close did). However, I'm now running into a make error when compiling tf_core_kernels:

/home/rbharath/tensorflow/tensorflow/core/kernels/matrix_solve_ls_op.cc: In member function ‘void tensorflow::MatrixSolveLsOp<Scalar, SupportsBatchOperationT>::ComputeMatrix(tensorflow::OpKernelContext*, const typename tensorflow::BinaryLinearAlgebraOp<Scalar, SupportsBatchOperationT>::ConstMatrixMap&, const typename tensorflow::BinaryLinearAlgebraOp<Scalar, SupportsBatchOperationT>::ConstMatrixMap&,   typename tensorflow::BinaryLinearAlgebraOp<Scalar, SupportsBatchOperationT>::MatrixMap*)’:
/home/rbharath/tensorflow/tensorflow/core/kernels/matrix_solve_ls_op.cc:111:41: error: ‘Matrix’ is not a class, namespace, or enumeration
           (Scalar(l2_regularizer) * Matrix::Ones(cols, 1)).asDiagonal();
                                     ^
/home/rbharath/tensorflow/tensorflow/core/kernels/matrix_solve_ls_op.cc:131:41: error: ‘Matrix’ is not a class, namespace, or enumeration
           (Scalar(l2_regularizer) * Matrix::Ones(rows, 1)).asDiagonal();
                                     ^
make[2]: ***   [CMakeFiles/tf_core_kernels.dir/home/rbharath/tensorflow/tensorflow/core/kernels/matrix_solve_ls_op.cc.o] Error 1
make[1]: *** [CMakeFiles/tf_core_kernels.dir/all] Error 2
make: *** [all] Error 2

Do you have any thoughts on possible fixes?

@vrv

vrv commented Feb 26, 2016

I think your version of gcc doesn't know how to parse this declaration, which appears higher in the file:

"using typename BinaryLinearAlgebraOp<Scalar, SupportsBatchOperationT>::Matrix;"

My suggestion is to either upgrade your gcc and/or maybe downgrade our code to use less sophisticated C++ features :)
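As a sketch of what "less sophisticated C++ features" could look like here: when a class template derives from a base that depends on a template parameter, names such as Matrix from that base are not found by unqualified lookup, so the derived class must re-export them. The example below uses hypothetical stand-ins (Column, LinearAlgebraBase, SolveOp) rather than the real Eigen and TensorFlow types, and swaps the using-declaration for a typedef, a long-established spelling that older compilers generally accept. This is an assumed workaround for illustration, not the change made in #1309.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an Eigen matrix: a single column with a static
// Ones() factory, loosely mirroring Matrix::Ones(cols, 1).
template <typename Scalar>
struct Column {
  std::vector<Scalar> data;
  static Column Ones(std::size_t rows) {
    return Column{std::vector<Scalar>(rows, Scalar(1))};
  }
};

// Hypothetical base template exposing a nested Matrix type, playing the role
// of BinaryLinearAlgebraOp<Scalar, SupportsBatchOperationT>.
template <typename Scalar>
struct LinearAlgebraBase {
  typedef Column<Scalar> Matrix;
};

// Derived template. Because the base depends on Scalar, Matrix is not visible
// here by unqualified lookup and has to be brought into scope explicitly.
template <typename Scalar>
struct SolveOp : LinearAlgebraBase<Scalar> {
  // Equivalent of the declaration quoted above:
  //   using typename LinearAlgebraBase<Scalar>::Matrix;
  // The typedef below declares the same alias with older syntax:
  typedef typename LinearAlgebraBase<Scalar>::Matrix Matrix;

  // Analogous to Scalar(l2_regularizer) * Matrix::Ones(cols, 1).
  static Matrix ScaledOnes(Scalar l2_regularizer, std::size_t cols) {
    Matrix m = Matrix::Ones(cols);
    for (Scalar& x : m.data) x *= l2_regularizer;
    return m;
  }
};
```

With the alias in place, "Matrix::Ones(...)" inside SolveOp resolves to the base's nested type, which is exactly the lookup that fails in the error log above.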

@rbharath

@vrv Thanks for the pointer!

@martinwicke I've made the changes required to get the CMake build to compile on Ubuntu in PR #1309. Could you take a quick look?

@ageron: Unfortunately, I'm also seeing a segfault in my compiled version. I'll do some digging on my end as well to see if I can figure something out.

@martinwicke
Member

I'm closing this PR (it's merged). Feel free to continue using it as a message board, or move to #1309.

@ageron ageron deleted the master branch September 18, 2016 13:46
darkbuck pushed a commit to darkbuck/tensorflow that referenced this pull request Jan 23, 2020
…pstream-pr-dropout-fp32-fp16

Use CPU to generate random state in dropout

6 participants