ARROW-8147: [C++] add GCS library to ThirdpartyToolchain#11268
ARROW-8147: [C++] add GCS library to ThirdpartyToolchain#11268coryan wants to merge 4 commits intoapache:masterfrom coryan:ARROW-8147-add-google-cloud-cpp-to-ThirdpartyToolchain
Conversation
|
|
|
The "Java JNI" test passes locally with The appveyor failure seems to be a race condition in a test. The test has a In any case, if I am wrong do let me know and I will investigate / change things as needed. |
This adds the GCS library to ThirdpartyToolchain, and enables the library in some of the builds. A future PR will add the library to the package manager configuration files and to any remaining builds where it needs to go.
| -DCMAKE_CXX_STANDARD=11 | ||
| "-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}" | ||
| -DCMAKE_INSTALL_LIBDIR=lib | ||
| "-DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>") |
There was a problem hiding this comment.
Can we use EP_COMMON_CMAKE_ARGS here?
There was a problem hiding this comment.
Done. Also did the same for nlohmann_json (add EP_COMMON_CMAKE_ARGS and sort the arguments), but maybe it has no effect there.
| -DCMAKE_INSTALL_RPATH=$ORIGIN | ||
| -DCMAKE_INSTALL_LIBDIR=lib | ||
| -DGOOGLE_CLOUD_CPP_ENABLE=storage | ||
| -DBUILD_TESTING=OFF) |
There was a problem hiding this comment.
Could you sort this list in alphabetical order?
| if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO") | ||
| find_package(google_cloud_cpp_storage QUIET) | ||
| if(NOT google_cloud_cpp_storage_FOUND) | ||
| build_google_cloud_cpp() | ||
| endif() | ||
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED") | ||
| build_google_cloud_cpp() | ||
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM") | ||
| find_package(google_cloud_cpp_storage REQUIRED) | ||
| endif() |
There was a problem hiding this comment.
Can we use resolve_dependency() here?
| if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO") | |
| find_package(google_cloud_cpp_storage QUIET) | |
| if(NOT google_cloud_cpp_storage_FOUND) | |
| build_google_cloud_cpp() | |
| endif() | |
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED") | |
| build_google_cloud_cpp() | |
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM") | |
| find_package(google_cloud_cpp_storage REQUIRED) | |
| endif() | |
| resolve_dependency(google_cloud_cpp_storage) |
There was a problem hiding this comment.
Oh, sorry. My suggestion was insufficient.
We need to add google_cloud_cpp_storage entry to build_dependency():
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 36c8c33ab..e12abefc5 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -151,6 +151,8 @@ macro(build_dependency DEPENDENCY_NAME)
build_gflags()
elseif("${DEPENDENCY_NAME}" STREQUAL "GLOG")
build_glog()
+ elseif("${DEPENDENCY_NAME}" STREQUAL "google_cloud_cpp_storage")
+ build_google_cloud_cpp_storage()
elseif("${DEPENDENCY_NAME}" STREQUAL "gRPC")
build_grpc()
elseif("${DEPENDENCY_NAME}" STREQUAL "GTest")And we should use google_cloud_cpp_storage_SOURCE and build_google_cloud_cpp_storage for variable/function names.
| -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} | ||
| "-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}" |
There was a problem hiding this comment.
Can we remove them because EP_COMMON_CMAKE_ARGS include them?
| -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} | |
| "-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}" |
| PROPERTIES IMPORTED_LOCATION | ||
| "${GOOGLE_CLOUD_CPP_STATIC_LIBRARY_COMMON}" | ||
| INTERFACE_INCLUDE_DIRECTORIES | ||
| "${GOOGLE_CLOUD_CPP_INSTALL_PREFIX}") |
There was a problem hiding this comment.
Is this a typo?
| "${GOOGLE_CLOUD_CPP_INSTALL_PREFIX}") | |
| "${GOOGLE_CLOUD_CPP_INCLUDE_DIR}") |
| Threads::Threads | ||
| OpenSSL::SSL | ||
| OpenSSL::Crypto) | ||
| add_dependencies(google-cloud-cpp::storage GOOGLE_CLOUD_CPP_ep) |
There was a problem hiding this comment.
Should we use lowercase here?
| add_dependencies(google-cloud-cpp::storage GOOGLE_CLOUD_CPP_ep) | |
| add_dependencies(google-cloud-cpp::storage google_cloud_cpp_ep) |
| if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO") | ||
| find_package(google_cloud_cpp_storage QUIET) | ||
| if(NOT google_cloud_cpp_storage_FOUND) | ||
| build_google_cloud_cpp() | ||
| endif() | ||
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED") | ||
| build_google_cloud_cpp() | ||
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM") | ||
| find_package(google_cloud_cpp_storage REQUIRED) | ||
| endif() |
There was a problem hiding this comment.
Oh, sorry. My suggestion was insufficient.
We need to add google_cloud_cpp_storage entry to build_dependency():
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 36c8c33ab..e12abefc5 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -151,6 +151,8 @@ macro(build_dependency DEPENDENCY_NAME)
build_gflags()
elseif("${DEPENDENCY_NAME}" STREQUAL "GLOG")
build_glog()
+ elseif("${DEPENDENCY_NAME}" STREQUAL "google_cloud_cpp_storage")
+ build_google_cloud_cpp_storage()
elseif("${DEPENDENCY_NAME}" STREQUAL "gRPC")
build_grpc()
elseif("${DEPENDENCY_NAME}" STREQUAL "GTest")And we should use google_cloud_cpp_storage_SOURCE and build_google_cloud_cpp_storage for variable/function names.
| set(CRC32C_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/crc32c_ep-install") | ||
| set(CRC32C_CMAKE_ARGS | ||
| ${EP_COMMON_CMAKE_ARGS} | ||
| -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} |
There was a problem hiding this comment.
| -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} |
There was a problem hiding this comment.
Fixed. FWIW, there are many other uses of this in the file, you may want to clean that up in a separate bug+PR.
| ${EP_COMMON_CMAKE_ARGS} | ||
| -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} | ||
| -DCMAKE_CXX_STANDARD=11 | ||
| "-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}" |
There was a problem hiding this comment.
| "-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}" |
| set(GOOGLE_CLOUD_CPP_CMAKE_ARGS | ||
| ${EP_COMMON_CMAKE_ARGS} | ||
| -DBUILD_TESTING=OFF | ||
| -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} |
There was a problem hiding this comment.
| -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} |
| if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO") | ||
| find_package(google_cloud_cpp_storage QUIET) | ||
| if(NOT google_cloud_cpp_storage_FOUND) | ||
| build_google_cloud_cpp_storage() | ||
| endif() | ||
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED") | ||
| build_google_cloud_cpp_storage() | ||
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM") | ||
| find_package(google_cloud_cpp_storage REQUIRED) | ||
| endif() |
There was a problem hiding this comment.
resolve_dependency() does this.
| if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO") | |
| find_package(google_cloud_cpp_storage QUIET) | |
| if(NOT google_cloud_cpp_storage_FOUND) | |
| build_google_cloud_cpp_storage() | |
| endif() | |
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED") | |
| build_google_cloud_cpp_storage() | |
| elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM") | |
| find_package(google_cloud_cpp_storage REQUIRED) | |
| endif() |
There was a problem hiding this comment.
Neat, thanks. Removed.
| ARROW_WITH_ZSTD=ON \ | ||
| ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-${llvm}/bin/llvm-symbolizer \ | ||
| AWSSDK_SOURCE=BUNDLED \ | ||
| GOOGLE_CLOUD_CPP_SOURCE=BUNDLED \ |
There was a problem hiding this comment.
| GOOGLE_CLOUD_CPP_SOURCE=BUNDLED \ | |
| google_cloud_cpp_storage_SOURCE=BUNDLED \ |
There was a problem hiding this comment.
Done. Note that this could be a nuisance if you ever want to use Bigtable or Spanner as data sources.
There was a problem hiding this comment.
We may need to implement custom resolve_dependency() like feature for them when we also want to use other Google Cloud C++ libraries.
If Google Cloud C++ provides google_cloud_cpp CMake package with storage, bigtable and so on components, we can use google_cloud_cpp_SOURCE and find_package(google_cloud_cpp COMPONENTS storage bigtable ...).
ci/scripts/cpp_build.sh
Outdated
| -DCMAKE_INSTALL_PREFIX=${CMAKE_INSTALL_PREFIX:-${ARROW_HOME}} \ | ||
| -DCMAKE_UNITY_BUILD=${CMAKE_UNITY_BUILD:-OFF} \ | ||
| -Dgflags_SOURCE=${gflags_SOURCE:-} \ | ||
| -DGOOGLE_CLOUD_CPP_SOURCE=${GOOGLE_CLOUD_CPP_SOURCE:-} \ |
There was a problem hiding this comment.
| -DGOOGLE_CLOUD_CPP_SOURCE=${GOOGLE_CLOUD_CPP_SOURCE:-} \ | |
| -Dgoogle_cloud_cpp_storage_SOURCE=${google_cloud_cpp_storage_SOURCE:-} \ |
| -DCRC32C_BUILD_TESTS=OFF | ||
| -DCRC32C_BUILD_BENCHMARKS=OFF | ||
| -DCRC32C_USE_GLOG=OFF) | ||
| set(CRC32C_BUILD_BYPRODUCTS) |
There was a problem hiding this comment.
| set(CRC32C_BUILD_BYPRODUCTS) |
There was a problem hiding this comment.
Ooops, thanks. Removed.
|
Oh, arm64 build on Travis CI is failed: |
| ARROW_WITH_ZSTD=ON \ | ||
| ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-${llvm}/bin/llvm-symbolizer \ | ||
| AWSSDK_SOURCE=BUNDLED \ | ||
| google_cloud_cpp_storage_SOURCE=BUNDLED \ |
There was a problem hiding this comment.
Was GCS_SOURCE too ambiguous? This is quite a long parameter name...
There was a problem hiding this comment.
We use CMake's package name for XXX_SOURCE.
If we don't follow the convention, we can choose GCS_SOURCE.
There was a problem hiding this comment.
Let me know how to proceed, I can do either.
There was a problem hiding this comment.
If it's the CMake name then fair enough.
|
Forgot to update |
That seems to be fixed upstream in crc32c, but the fix is not in any tag or release. Would you be Okay with me pinning to a SHA of this dependency? Can I reproduce these builds myself to verify I have fixed such problems? |
Yes. We sometimes use this approach.
Yes.
BTW, it's strange that Travis CI jobs weren't run for this pull request... |
In #11268 I neglected to add `google_cloud_cpp_storage` to the third-party dependencies. Closes #11320 from coryan/ARROW-14223-add-google-cloud-cpp-to-thirdparty-dependencies Authored-by: Carlos O'Ryan <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
This adds the GCS library to ThirdpartyToolchain, and enables the library in some of the builds. A future PR will add the library to the package manager configuration files and to any remaining builds where it needs to go. Closes apache#11268 from coryan/ARROW-8147-add-google-cloud-cpp-to-ThirdpartyToolchain Authored-by: Carlos O'Ryan <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
In apache#11268 I neglected to add `google_cloud_cpp_storage` to the third-party dependencies. Closes apache#11320 from coryan/ARROW-14223-add-google-cloud-cpp-to-thirdparty-dependencies Authored-by: Carlos O'Ryan <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
This adds the GCS library to ThirdpartyToolchain, and enables the
library in some of the builds. A future PR will add the library to the
package manager configuration files and to any remaining builds where it
needs to go.