This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Commit 1393602

[MXNET-1446] Quantization: intgemm matrix multiply wrappers (#17559)
This pull request adds wrappers for the intgemm matrix multiplication library: https://github.com/kpu/intgemm . A performance comparison with DNNL (aka MKL-DNN) is at kpu/intgemm#59. The library targets the thin matrix sizes seen in neural machine translation inference and was part of the top submission to the efficiency task at the 2018 Workshop on Neural Generation and Translation: https://neural.mt/papers/edinburgh/wnmt_marian_paper.pdf . The goal is to add similar functionality to Sockeye: awslabs/sockeye#771 . Quantized Sockeye runs 2.95x as fast. One problem with the current MXQuantizeSymbol approach is that Sockeye does not have a static graph for everything.

intgemm uses a custom memory layout for the weight matrix to make more memory accesses consecutive, so there are operators to convert weights to that format. The idea is that weights are typically loaded once for inference.

On architectures without VNNI, intgemm uses saturating 16-bit accumulation. This avoids an expensive madd_epi16 instruction on every multiply by exploiting the fact that most neural network parameters are near 0.

Because x86 only offers an unsigned * signed multiply instruction and most people want signed * signed, there are two strategies one can take:

1. Add 128 to the data so it becomes unsigned. This biases the output. DNNL computes the bias on the fly by summing weights, then subtracts it out during the GEMM; intgemm computes the bias in advance, so it can be subtracted from the bias term with no overhead at runtime. A problem with this strategy is that it makes the accumulator bigger, requiring more upcasting with an expensive madd_epi16 instruction.

2. Emulate signed * signed by normalizing the sign bit into the second argument. This requires extra instructions in the hot loop but keeps the accumulator small, so it is less necessary to accumulate into 32-bit integers and madd_epi16 can be avoided.

Both intgemm and DNNL implement strategy 1; intgemm also implements strategy 2. Like DNNL, intgemm selects among SSSE3, AVX2, AVX512BW, and AVX512VNNI backends at runtime via CPUID.
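The arithmetic behind the two signed * signed strategies can be sketched in scalar Python. This is only an illustration of the math under the hood (the function names are hypothetical; the real intgemm kernels operate on SIMD registers, not Python lists):

```python
def dot_shift128(a_row, b_cols):
    """Strategy 1: add 128 to the signed activations so the hot loop only
    needs unsigned * signed products, then subtract a precomputed bias.
    a_row: signed int8 activations; b_cols: weight matrix as list of columns."""
    out = []
    for col in b_cols:
        # Precomputed once per weight matrix (intgemm does this in advance;
        # DNNL computes it on the fly): 128 * sum of the column.
        bias = 128 * sum(col)
        # Hot loop: (a + 128) lies in 0..255, i.e. it is unsigned.
        acc = sum((a + 128) * b for a, b in zip(a_row, col))
        out.append(acc - bias)
    return out


def dot_sign_normalized(a_row, b_cols):
    """Strategy 2: move the sign of the first operand onto the second, so
    the first operand is unsigned (|a| <= 128) and no bias is needed."""
    out = []
    for col in b_cols:
        acc = sum(abs(a) * (b if a >= 0 else -b) for a, b in zip(a_row, col))
        out.append(acc)
    return out
```

Both return the exact signed dot product: strategy 1 pays with a precomputed bias and a larger accumulator, strategy 2 pays with extra sign-normalization work in the hot loop.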
1 parent e2aacce commit 1393602

File tree

11 files changed: +1157 −3 lines

CMakeLists.txt

Lines changed: 26 additions & 0 deletions

```diff
@@ -64,6 +64,7 @@ if(USE_MKL_IF_AVAILABLE AND (NOT APPLE) AND (NOT MSVC) AND (CMAKE_HOST_SYSTEM_PR
 else()
   option(USE_MKLDNN "Build with MKL-DNN support" OFF)
 endif()
+cmake_dependent_option(USE_INTGEMM "Build with x86_64 intgemm library for low-precision multiplication" ON "CMAKE_SYSTEM_PROCESSOR STREQUAL x86_64" OFF)
 if(NOT MSVC)
   option(USE_OPERATOR_TUNING "Enable auto-tuning of operators" ON)
 else()
@@ -278,6 +279,22 @@ if(USE_MKLDNN)
   set_target_properties(dnnl PROPERTIES CXX_CLANG_TIDY "") # don't lint 3rdparty dependency
 endif()

+if(USE_INTGEMM)
+  message(STATUS "Using intgemm")
+  include(FetchContent)
+  FetchContent_Declare(
+    intgemm
+    GIT_REPOSITORY https://github.com/kpu/intgemm.git
+    GIT_TAG 02f671cf537fdbc818cf8111d1d9e557a8650d7a
+  )
+  FetchContent_GetProperties(intgemm)
+  if(NOT intgemm_POPULATED)
+    FetchContent_Populate(intgemm)
+  endif()
+  add_subdirectory(${intgemm_SOURCE_DIR} ${intgemm_BINARY_DIR} EXCLUDE_FROM_ALL)
+  add_definitions(-DMXNET_USE_INTGEMM=1)
+endif()
+
 # Allow Cuda compiles outside of src tree to find things in 'src' and 'include'
 include_directories(${CMAKE_CURRENT_SOURCE_DIR}/include)
 include_directories(${CMAKE_CURRENT_SOURCE_DIR}/src)
@@ -474,6 +491,11 @@ endif()
 FILE(GLOB_RECURSE SOURCE "src/*.cc" "src/*.h" "include/*.h")
 FILE(GLOB_RECURSE CUDA "src/*.cu" "src/*.cuh")

+if(NOT USE_INTGEMM)
+  FILE(GLOB_RECURSE INTGEMM_OPERATOR_SOURCE "src/operator/contrib/intgemm/*.cc" "src/operator/contrib/intgemm/*.h")
+  list(REMOVE_ITEM SOURCE ${INTGEMM_OPERATOR_SOURCE})
+endif()
+
 # add nnvm to source
 FILE(GLOB_RECURSE NNVMSOURCE
   3rdparty/tvm/nnvm/src/c_api/*.cc
@@ -750,6 +772,10 @@ if(USE_MKLDNN)
   ${CMAKE_BINARY_DIR}/3rdparty/mkldnn/include/dnnl_version.h ${CMAKE_SOURCE_DIR}/include/mkldnn/)
 endif()

+if(USE_INTGEMM)
+  target_link_libraries(mxnet PRIVATE intgemm)
+endif()
+
 function(BuildTVMOP)
   # scope the variables in BuildTVM.cmake to avoid conflict
   include(cmake/BuildTVM.cmake)
```

LICENSE

Lines changed: 2 additions & 0 deletions
```diff
@@ -309,6 +309,8 @@
 Licensed MIT © Zeno Rocha
 11. mx-theme - For details, see docs/python_docs/themes/mx-theme/LICENSE
 Copyright (c) 2016 myyasuda
+12. intgemm - Refer to 3rdparty/intgemm/LICENSE
+Copyright (c) 2017--2019 University of Edinburgh, Nikolay Bogoychev, Mateusz Chudyk, Kenneth Heafield, and Microsoft Corporation


 =======================================================================================
```

include/mxnet/base.h

Lines changed: 1 addition & 1 deletion
```diff
@@ -539,7 +539,7 @@ inline std::ostream& operator<<(std::ostream &out, const Context &ctx) {
 #define ADD_FILELINE "\n\nDefined in " __FILE__ ":L" STRINGIZE(__LINE__)


-#if MXNET_USE_MKLDNN == 1
+#if MXNET_USE_MKLDNN == 1 || MXNET_USE_INTGEMM == 1
 constexpr size_t kMKLDNNAlign = 64;
 #endif
```
