【Hackathon 5th No.34】Add bitwise_right_shift / bitwise_right_shift_ / bitwise_left_shift / bitwise_left_shift_ API to Paddle -part#58092
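@xuxinyi389 Could you take a look when you have time? Why does the test pass locally but fail in the CI-PR3 environment? I can't seem to reproduce this bug and don't know where to start. |
The test you wrote does not seem to have actually run in any of the pipelines that currently pass, so this is probably not an environment issue. If your local tests pass, you can submit a local test report. |
I can see that it was not run in PR-CI-Windows-Inference, but in PR-CI-Py3 it should have run and failed. In the last commit I added some prints to output the two inputs, one output, and one reference. I'm not sure why the output in this CI looks so wildly wrong: |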
|
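I previously ran into a static-graph variable leak: the static-graph context was not used properly, so variables with the same name caused the wrong value to be fetched. But I see that your dynamic-graph tests did not pass either. Your tests have not passed on any CI, so this should have nothing to do with the environment. Please locate the error yourself. |
I tried it: with CPUPlace, the results go wrong whenever broadcast is needed. (Locally I had always been running on GPU, where broadcast works. Above I commented out the broadcast cases, and then CI passed.)
Does bitwise broadcast have different implementations on CPU and GPU? In theory every bitwise operator should broadcast automatically. Since the broadcast cases are fine on GPU but fail on CPU, I suspect this is where my problem lies. |
When the call logic is complex, you can locate the line of code with the bug by adding some log output to the C++ source and then setting the GLOG_v level, which makes it easier to pinpoint. |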
|
Please get the coverage test to pass. |
|
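@xuxinyi389 I figured out the earlier problem. On CPU, when x and y are broadcast, whichever has the larger len(shape) is treated as x and the smaller one as y before broadcasting, so when len(x.shape) is smaller than len(y.shape), the original ... Now I've run into another problem: the supported types are ... I then tried, in the C++ layer, to add a bool to the low-level bitwise computation to distinguish arithmetic shift from logical shift, but found that ... Do you have any suggestions? Emmm, this is giving me a bit of a headache (pondering whether arithmetic shift is even necessary, emmmm) |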
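The operand swap described in the comment above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Paddle code: `rank`, `elementwise_compute`, and `shift_left` are hypothetical names, tensors are nested lists, and only a rank-2 operand combined with a rank-1 operand is handled. It assumes, as the comment describes, that the CPU helper always hands the higher-rank operand to the functor first, which is why a non-commutative operation such as a shift needs an "inverse" functor.

```python
def left_shift(a: int, b: int) -> int:
    return a << b


def inverse_left_shift(a: int, b: int) -> int:
    # Receives the operands in swapped order and undoes the swap.
    return b << a


def rank(t) -> int:
    """Number of nested list levels (1 for a flat list, 2 for a list of lists)."""
    return 1 + rank(t[0]) if isinstance(t, list) else 0


def elementwise_compute(x, y, functor):
    """Toy stand-in for a CPU elementwise helper (hypothetical).

    The operand with the higher rank is always passed to the functor first,
    and the rank-1 operand is broadcast across the rows of the rank-2 one.
    """
    big, small = (x, y) if rank(x) >= rank(y) else (y, x)
    return [[functor(b, s) for b, s in zip(row, small)] for row in big]


def shift_left(x, y):
    # Pick the inverse functor when x has the lower rank, so that the
    # result is still x << y after the helper swaps the operands.
    functor = left_shift if rank(x) >= rank(y) else inverse_left_shift
    return elementwise_compute(x, y, functor)


x = [1, 2]                # rank 1
y = [[1, 2], [3, 4]]      # rank 2
print(shift_left(x, y))                       # x << y, computed correctly
print(elementwise_compute(x, y, left_shift))  # silently computes y << x instead
```

With the plain functor the second call computes `y << x` rather than `x << y`, which matches the symptom that only the broadcast cases failed.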
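Could type support be added to the cast API and paddle.where? |
OK, I'll give it a try~ |
I took a look, and it seems Paddle does not support the uint16, uint32, and uint64 dtypes? The type extensions I have seen so far mostly add support for complex and bfloat types, and the unsigned support other contributors added is for the only supported unsigned type (uint8). Is there an example I can refer to for supporting other unsigned types?
>>> paddle.to_tensor([1,2,3], dtype=paddle.int16)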
Tensor(shape=[3], dtype=int16, place=Place(cpu), stop_gradient=True,
[1, 2, 3])
>>> paddle.to_tensor([1,2,3], dtype=paddle.uint16)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'paddle' has no attribute 'uint16'. Did you mean: 'int16'?
>>> paddle.to_tensor([1,2,3], dtype=paddle.uint32)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'paddle' has no attribute 'uint32'. Did you mean: 'int32'?
>>> paddle.to_tensor([1,2,3], dtype=paddle.uint64)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'paddle' has no attribute 'uint64'. Did you mean: 'int64'? |
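If paddle cast supported converting uint to int, precision would be lost whenever a value exceeds the range int can represent, so such a conversion is unreasonable. |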
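The concern about uint-to-int conversion can be made concrete. The following is a minimal sketch using only the Python standard library; it assumes the cast would be a plain reinterpretation of the same 64 bits, which is an assumption for illustration and not a statement about how Paddle's cast behaves. `reinterpret_uint64_as_int64` is a hypothetical helper name.

```python
import struct


def reinterpret_uint64_as_int64(value: int) -> int:
    """Reinterpret the 64 bits of an unsigned value as a signed integer."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


small = 42           # fits in int64, survives the conversion
large = 2**63 + 5    # valid uint64, but outside the int64 range

print(reinterpret_uint64_as_int64(small))
print(reinterpret_uint64_as_int64(large))  # comes back as a negative number
```

Any uint64 value at or above 2**63 changes its numeric value under such a conversion, so a round trip through int64 cannot stand in for real unsigned support.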
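Thanks for the review~ That makes sense. In that case it looks like none of the competing frameworks has an implementation we can refer to. If we want to reuse most of the existing bitwise code, could we simply split arithmetic shift and logical shift apart (since an extra bool parameter does not seem easy to fit into the bitwise framework)? For example, the original
template <typename T>
struct BitwiseRightShiftFunctor {
HOSTDEVICE T operator()(const T a, const T b) const { return a >> b; }
};
here, separately for ... Or is there some other, better way to implement it? Any pointers would be appreciated~ |
They can be implemented separately. You could try adding a layer of abstraction in Python that exposes an interface such as bitwise_right_shift(x, y, is_arithmetic) to users, while splitting the implementation underneath. |
OK, got it. |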
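The suggested layering, one user-facing function with an `is_arithmetic` flag that dispatches to two separate implementations, can be sketched on plain Python integers. This is an illustrative sketch only: the helper names and the `bits` parameter are hypothetical, values are treated as fixed-width two's-complement integers, and the shift count is assumed to satisfy `0 <= y < bits`.

```python
def _to_signed(value: int, bits: int) -> int:
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def _right_shift_arithmetic(x: int, y: int, bits: int) -> int:
    # Python's >> on a negative int already replicates the sign bit.
    return _to_signed(x, bits) >> y


def _right_shift_logic(x: int, y: int, bits: int) -> int:
    # Shift the raw bit pattern so zeros enter from the left,
    # then reinterpret the result as a signed value.
    return _to_signed((x & ((1 << bits) - 1)) >> y, bits)


def bitwise_right_shift(x: int, y: int, is_arithmetic: bool = True, bits: int = 8) -> int:
    """Single entry point that dispatches to one of two implementations."""
    if is_arithmetic:
        return _right_shift_arithmetic(x, y, bits)
    return _right_shift_logic(x, y, bits)


print(bitwise_right_shift(-8, 1, is_arithmetic=True))   # sign bit is kept
print(bitwise_right_shift(-8, 1, is_arithmetic=False))  # zero is shifted in
print(bitwise_right_shift(10, 1, is_arithmetic=False))  # same as arithmetic for non-negative x
```

The two modes only differ for negative inputs, which is consistent with the identical results for positive int64 inputs shown later in this thread.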
|
@xuxinyi389 I noticed that in the coverage CI, the four ... of the CPU kernel ... I then tried adding a log to this macro:
#define DEFINE_BITWISE_KERNEL_WITH_INVERSE(op_type) \
template <typename T, typename Context> \
void Bitwise##op_type##Kernel(const Context& dev_ctx, \
const DenseTensor& x, \
const DenseTensor& y, \
DenseTensor* out) { \
LOG(INFO) << "I am in the .cc kernel!!!!!!"; /* added this log line */ \
funcs::Bitwise##op_type##Functor<T> func; \
funcs::InverseBitwise##op_type##Functor<T> inv_func; \
auto x_dims = x.dims(); \
auto y_dims = y.dims(); \
if (x_dims.size() >= y_dims.size()) { \
funcs::ElementwiseCompute<funcs::Bitwise##op_type##Functor<T>, T>( \
dev_ctx, x, y, func, out); \
} else { \
funcs::ElementwiseCompute<funcs::InverseBitwise##op_type##Functor<T>, \
T>(dev_ctx, x, y, inv_func, out); \
} \
}
Then I hard-coded place to CPU in the setUp of the base unittest class:
class TestBitwiseLeftShiftAPI(unittest.TestCase):
    def setUp(self):
        self.init_input()
        self.place = (
            # paddle.CUDAPlace(0)
            # if paddle.is_compiled_with_cuda()
            # else paddle.CPUPlace()
            paddle.CPUPlace()
        )
In the end the test passes and the CPU version of the kernel is used. Here is the log:
λ xxxy /home/pd/Paddle/build ctest -R test_bitwise_shift -V
UpdateCTestConfiguration from :/home/pd/Paddle/build/DartConfiguration.tcl
UpdateCTestConfiguration from :/home/pd/Paddle/build/DartConfiguration.tcl
Test project /home/pd/Paddle/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 742
Start 742: test_bitwise_shift_op
742: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/home/pd/Paddle/build/python" "/usr/bin/python" "/home/pd/Paddle/tools/test_runner.py" "test_bitwise_shift_op"
742: Test timeout computed to be: 10000000
742: grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
742: W1217 04:08:01.149598 94391 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.3, Runtime API Version: 12.0
742: W1217 04:08:01.150851 94391 gpu_resources.cc:164] device: 0, cuDNN Version: 8.8.
742: I1217 04:08:01.243474 94391 program_interpreter.cc:214] New Executor is Running.
742: I1217 04:08:01.243816 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.250216 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.258257 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.265053 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.272917 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.278879 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.286705 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.292797 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.301602 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.307641 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.315480 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.321446 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.329121 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.335136 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.343986 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.350009 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.357797 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.363785 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.371469 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.380086 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.390872 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.396921 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.404939 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.410903 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.418599 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.424595 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.433780 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.439796 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.447381 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.453291 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.460729 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.466603 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.475275 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.481164 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
1/1 Test #742: test_bitwise_shift_op ............ Passed 5.06 sec
The following tests passed:
test_bitwise_shift_op
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 5.24 sec
In addition, I also installed the whl package and verified it:
λ xxxy /home/pd/Paddle/build python
Python 3.9.18 (main, Aug 25 2023, 13:20:04)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
>>> x = paddle.to_tensor([], dtype=paddle.int64)
>>> x = paddle.to_tensor([10,20,30], dtype=paddle.int64, place=paddle.CPUPlace())
>>> y = paddle.to_tensor([1,2,3], dtype=paddle.int64, place=paddle.CPUPlace())
>>> paddle.bitwise_left_shift(x, y, is_arithmetic=True)
I1217 04:15:22.898773 94440 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
Tensor(shape=[3], dtype=int64, place=Place(cpu), stop_gradient=True,
[20 , 80 , 240])
>>> paddle.bitwise_left_shift(x, y, is_arithmetic=False)
I1217 04:15:30.973914 94440 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
Tensor(shape=[3], dtype=int64, place=Place(cpu), stop_gradient=True,
[20 , 80 , 240])
>>> paddle.bitwise_right_shift(x, y, is_arithmetic=True)
I1217 04:16:38.313091 94440 bitwise_kernel.cc:67] I am in the .cc kernel!!!!!!
Tensor(shape=[3], dtype=int64, place=Place(cpu), stop_gradient=True,
[5, 5, 3])
>>> paddle.bitwise_right_shift(x, y, is_arithmetic=False)
I1217 04:16:45.778152 94440 bitwise_kernel.cc:68] I am in the .cc kernel!!!!!!
Tensor(shape=[3], dtype=int64, place=Place(cpu), stop_gradient=True,
[5, 5, 3])
The glog messages show up here too, so I don't know why the coverage CI says these four macros are not covered...
|
|
@xuxinyi389 For the other ...
template <typename T>
struct BitwiseLeftShiftArithmeticFunctor {
HOSTDEVICE T operator()(const T a, const T b) const {
LOG(INFO) << "This is BitwiseLeftShiftArithmeticFunctor"; // line: 53
return a << b; }
};
template <typename T>
struct InverseBitwiseLeftShiftArithmeticFunctor {
inline HOSTDEVICE T operator()(const T a, const T b) const {
LOG(INFO) << "This is BitwiseLeftShiftArithmeticFunctor";
return b << a; }
};
template <typename T>
struct BitwiseLeftShiftLogicFunctor {
HOSTDEVICE T operator()(const T a, const T b) const {
LOG(INFO) << "This is BitwiseLeftShiftArithmeticFunctor";
return a << b; }
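};
Then I installed the whl package and tested: ... Line 53, that is, the inside of this Functor, is reached, so I don't understand why coverage did not detect it. Please help review, thanks. The Hackathon deadline is the 22nd, and I'd like to push to get this merged in time~ |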
|
(Oh, the coverage CI seems to have suddenly gone from × to √. Please review~)
)

def init_input(self):
    self.x = np.random.randint(1, 20, [2, 3]).astype('uint8')
low=1, high=20: this value range does not seem reasonable. It would be more reasonable to use the lower and upper limits that the dtype can represent.
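One way to follow this suggestion is to derive the sampling range from the dtype itself. The sketch below is plain Python with hypothetical helper names (`int_bounds`, `random_full_range`) and only understands dtype names of the form `intN` / `uintN`; in a NumPy-based test the same bounds are available from `np.iinfo(dtype).min` and `np.iinfo(dtype).max`, keeping in mind that `np.random.randint` treats `high` as exclusive.

```python
import random


def int_bounds(dtype: str) -> tuple:
    """Return (min, max) for integer dtype names such as 'uint8' or 'int16'."""
    signed = not dtype.startswith("u")
    bits = int(dtype[3:] if signed else dtype[4:])
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def random_full_range(dtype: str, count: int, seed: int = 0) -> list:
    """Draw `count` integers uniformly from the full range of `dtype`."""
    low, high = int_bounds(dtype)
    rng = random.Random(seed)
    # random.Random.randint includes both endpoints.
    return [rng.randint(low, high) for _ in range(count)]


for name in ("uint8", "int8", "int16", "int32", "int64"):
    print(name, int_bounds(name))
```

Sampling over the full range exercises sign bits and overflow, which a fixed range such as 1 to 20 never reaches.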
|
|
||
class TestBitwiseLeftShiftAPI_UINT8(TestBitwiseLeftShiftAPI):
    def init_input(self):
        self.x = np.random.randint(1, 20, [2, 3]).astype('uint8')

class TestDygraphInplaceBitwiseRightShift_logic(TestDygraphInplaceLogicAnd):
    def init_data(self):
        self.input_var_numpy = paddle.randint(
            low=0, high=10, shape=[3, 4, 5], dtype="int32"

with self.assertRaises(ValueError):
    self.inplace_api_processing(broadcast_input)

Wherever the final design details differ from the RFC, sync the updates to the RFC.
|
Remember to update all the low/high values accordingly, not just the few places I commented on. |
|
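I see CI has passed. Earlier I found that with a plain shift, the CPU and GPU results could occasionally disagree. Locally, the GPU results matched the np interface, but when I switched to CPU, for left shifts with a very large shift count a modulo was occasionally applied automatically (for example, shifting a uint8 left by 20 bits actually shifted it by 20 % 8 = 4 bits), whereas on GPU the result was always 0, letting it overflow. Also, if the shifted number ...
So, to align with the np interface and keep the CPU and GPU results consistent, the code I added later handles both of these aspects in ... |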
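The two behaviours described above for oversized shift counts can be modelled on plain integers. This is a hedged sketch, not the code added in the PR (which is not shown in this thread): both function names and the `bits` parameter are hypothetical, and the first function only models the "count taken modulo the bit width" effect that was observed on CPU.

```python
def left_shift_wrapping_count(x: int, y: int, bits: int = 8) -> int:
    """Model where the shift count is reduced modulo the bit width."""
    mask = (1 << bits) - 1
    return (x << (y % bits)) & mask


def left_shift_overflow_to_zero(x: int, y: int, bits: int = 8) -> int:
    """Model where shifting by `bits` or more pushes every bit out."""
    if y >= bits:
        return 0
    mask = (1 << bits) - 1
    return (x << y) & mask


# A uint8 value shifted left by 20: the two models disagree.
print(left_shift_wrapping_count(3, 20))    # shifts by 20 % 8 = 4
print(left_shift_overflow_to_zero(3, 20))  # everything is shifted out

# For in-range counts both models agree.
print(left_shift_wrapping_count(3, 2), left_shift_overflow_to_zero(3, 2))
```

Handling the `y >= bits` case explicitly, rather than relying on the raw shift, is one way to make results independent of the device.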
|
LGTM |
- op : bitwise_left_shift_arithmetic
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : ElementwiseInferMeta
  kernel :
    func : bitwise_left_shift_arithmetic
    backend : x
  inplace: (x -> out)

- op : bitwise_left_shift_logic
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : ElementwiseInferMeta
  kernel :
    func : bitwise_left_shift_logic
    backend : x
  inplace: (x -> out)
According to the C++ operator development specifications, the naming and parameters of an operator need to be consistent with the Python API, so that the operator can reuse the Python API's documentation and so that writing C++ code directly against the operator is relatively simple. So the operator needs to be bitwise_left_shift with is_arithmetic in args, and the kernels in paddle/phi/kernels/bitwise_kernel.h should also have is_arithmetic in args. It would be best if the Functors in paddle/phi/kernels/funcs/bitwise_functors.h could be unified, but it's okay if they can't.
- op : bitwise_right_shift_arithmetic
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : ElementwiseInferMeta
  kernel :
    func : bitwise_right_shift_arithmetic
    backend : x
  inplace: (x -> out)

- op : bitwise_right_shift_logic
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : ElementwiseInferMeta
  kernel :
    func : bitwise_right_shift_logic
    backend : x
  inplace: (x -> out)
According to the C++ operator development specifications, the naming and parameters of an operator need to be consistent with the Python API, so that the operator can reuse the Python API's documentation and so that writing C++ code directly against the operator is relatively simple. So the operator needs to be bitwise_right_shift with is_arithmetic in args, and the kernels in paddle/phi/kernels/bitwise_kernel.h should also have is_arithmetic in args. It would be best if the Functors in paddle/phi/kernels/funcs/bitwise_functors.h could be unified, but it's okay if they can't.
PD_REGISTER_KERNEL(bitwise_left_shift_arithmetic,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseLeftShiftArithmeticKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}

PD_REGISTER_KERNEL(bitwise_left_shift_logic,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseLeftShiftLogicKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}

PD_REGISTER_KERNEL(bitwise_right_shift_arithmetic,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseRightShiftArithmeticKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}

PD_REGISTER_KERNEL(bitwise_right_shift_logic,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseRightShiftLogicKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}
After unification, registration can be reduced to bitwise_right_shift and bitwise_left_shift.
if is_arithmetic:
    return _C_ops.bitwise_left_shift_arithmetic(x, y)
else:
    return _C_ops.bitwise_left_shift_logic(x, y)
Once the operator (C++ API) bitwise_left_shift has the is_arithmetic parameter, this if is no longer needed, and the code is simpler and clearer.
    return _C_ops.bitwise_left_shift_arithmetic(x, y)
else:
    return _C_ops.bitwise_left_shift_logic(x, y)
if is_arithmetic:

>>> paddle.bitwise_left_shift(x, y)
Tensor(shape=[2, 4], dtype=int64, place=Place(gpu:0), stop_gradient=True,
[[2 , 8 , 32 , 128],
[64 , 136, 128, 130]])
The example code needs to include the is_arithmetic = False case to help users understand.

>>> paddle.bitwise_right_shift(x, y)
Tensor(shape=[2, 4], dtype=int64, place=Place(gpu:0), stop_gradient=True,
[[5 , 5 , 5 , 5 ],
[4 , 2 , 8 , 32]])
The example code needs to include the is_arithmetic = False case to help users understand.
|
@jeff41404 Thanks for your review and instructions. I refactored the code and put the ... Because of the limitation of ... |
OK, thanks |
Co-authored-by: zachary sun <[email protected]>
cmake for pass in cinn * add dependency in cmake * add dependency in cmake * [PIR]Open uts for PReLU (PaddlePaddle#60645) * [PIR]Open uts for ReLU6 (PaddlePaddle#60650) * [PIR]Open uts for RReLU (PaddlePaddle#60660) * [NPU] fix storage_properties type mismatch with OneDNN and NPU (PaddlePaddle#60566) * fix ttfnet_darknet53_1x_coco in pir mode (PaddlePaddle#60663) * [auto parallel] shard tensor stop gradient support (PaddlePaddle#60699) * [PIR][DynamicShape] Polish some codes (PaddlePaddle#60651) att, polish some codes * [PIR] fix onednn double reg (PaddlePaddle#60720) * fix onednn double reg * 【pir】modify add_n in while use blockarg instead of input value (PaddlePaddle#60668) * test * fix * fix * fix * modify add_n block_arg * modify increment return value * merge * modfiy whiel_op.py --------- Co-authored-by: zhangbo9674 <[email protected]> * [PIR] Open test_case ut (PaddlePaddle#60721) * fix * fix * [PIR] rename data_layout (PaddlePaddle#60678) * rename data_layout * [xpu]: check op is null (PaddlePaddle#60656) * 【Hackathon 5th No.1】 为 Paddle 新增 copysign API (PaddlePaddle#57785) * add copysign op * fix codestyle * codestyle * fix test * fix std bug * merge init * merge init * merge init * add static cast * add std * static cast * static cast * copysignf * static cast to float input * float input * static cast to double input * fix * add inplace test * fix api * fix cast when grad * modify paddle.cast_ to cast_ * remove cast in python api * support fp16 && bf16 * set grad y to zero * fix en doc * support number input * add hostdevice * refactor kernel * fix nan when backward * add broadcast unit test * modify .cu * Update __init__.py * Update __init__.py * for ci test * static float * codestyle * static double * fix broadcast, try coverage * Delete paddle/phi/kernels/funcs/broadcast_function.h * remove unused * Update math.py * Update math.py * fix en doc * add test for output dtype, integer unsupported for now * update * update * fix * fix * add cast for input * 
fix * add pir test * fix doc * fix doc * fix doc * detail doc * adjust for MSVC * fix * Update python/paddle/tensor/math.py Co-authored-by: zachary sun <[email protected]> * Update python/paddle/tensor/math.py Co-authored-by: zachary sun <[email protected]> * fix doc output dtype, fix Equation * codestyle * codestyle * Update math.py --------- Co-authored-by: zachary sun <[email protected]> * rms_norm_infer_spmd (PaddlePaddle#60709) * [PIR]Open more tests for bernoulli and celu (PaddlePaddle#60706) * bernoulli && celu * celu test_error * [PIR]Open uts for scatter_nd_add (PaddlePaddle#60698) * [PIR]Open uts for scatter_nd_add * Fix ut * [PIR]Open uts for sinh (PaddlePaddle#60714) * [PIR]Open uts for Softshrink and Softsign (PaddlePaddle#60716) * [PIR] polish the ir_mapping implimentation. (PaddlePaddle#60675) * [PIR] fix onednn layout transform yaml format (PaddlePaddle#60680) * fix onednn layout transform yaml format * 【CINN】Complete error handler mechanism of dynamic schedule (PaddlePaddle#60718) * complete error handler mechanism of dynamic schedule * fix some output info * fix windows C++17 bug (PaddlePaddle#60736) * [XPU] fc pass and delete pass nodes check (PaddlePaddle#60314) * fix_local_windows_compile (PaddlePaddle#60682) * [PIR] fix onednn dialect name (PaddlePaddle#60665) * fix onednn dialect name * 【pir】add tesnor to array kernel etc (PaddlePaddle#60703) * merge * modfiy kernel * modify net * modify print * Fix defition definition (PaddlePaddle#60679) * cholesky and cholesky_solve tests (PaddlePaddle#60726) * [PIR]Open uts for searchsorted (PaddlePaddle#60700) * [PIR]Open uts for selu (PaddlePaddle#60702) * [PIR]Open uts for selu * Fix ut * [PIR]Open uts for sequence_mask (PaddlePaddle#60704) * [PIR] adjust pir pass log printing (PaddlePaddle#60723) * adjust pir pass log printing * update * update * update * fix compile * Fix Throughtput Throughput (PaddlePaddle#60741) * please last md (PaddlePaddle#60749) * [CINN+PIR]Fix Fetch XShape Variable logic 
(PaddlePaddle#60722) * [PIR][DynamicShape] Remove redundant code for shapeAnalysis and shapedTypeInterface (PaddlePaddle#60744) att, remove redundant code for shapeAnalysis and shapedTypeInterface * 【PIR Dist Op Reg No.1】 reg push_sparse_v2 (PaddlePaddle#60473) * code reg push_sparse_v2 * [Dynamic Shape] Provide operator<< For BroadcastTree (PaddlePaddle#60730) * [PIR] change IR clone to const and support clone operation successors (PaddlePaddle#60752) * support ir clone const and support clone operation successors * refine ir_mapping * refine region clone * [CINN] Refine fully_insert_broadcast_pass (PaddlePaddle#60676) * refine fully_insert_broadcast_pass * fix complie bug * fix complie * fix conflict * [PIR] einsum's inner_cache and xshape set to optional (PaddlePaddle#60748) * einsum's inner_cache and xshape set to intermediate * Update paddle/fluid/pir/dialect/operator/ir/ops.yaml --------- Co-authored-by: kangguangli <[email protected]> * reduce runtime of unit-tests in windows-trt (PaddlePaddle#60731) * modify trt test to deal with Timeout * windows * [Paddle-TRT] upgrade EnqueueV2 to EnqueueV3 (PaddlePaddle#59950) * 【Hackathon 5th No.110】为 Paddle 增强 sparse.matmul API (PaddlePaddle#59890) * Fix rank_relatvie rank_relative (PaddlePaddle#60770) * add graph_key to specific graph's varmap (PaddlePaddle#60567) * add graph_key to specific graph's varmap * fix inpalce case * fix inpalce case * 【Hackathon 5th No.38】为 Paddle 新增 FractionalMaxPool2d / FractionalMaxPool3d API -kernel (PaddlePaddle#59847) * [Init] add fractional max pool kernel and api * [Fix] pooling.cu seed offset * [Change] remove adaptive from fractional max pool * [Change] fractional max 2d gpu pooling.cu grad * [Change] fractional max 2d gpu pooling.cu grad with dim3 * [Change] use UnchangedInferMeta * [Change] test api with uint16 * [Change] wrap test disable_static * [Change] regiester float16/bfloat16 * [Change] remove bfloat16 from cpu kernrl * [Change] test dtypes in cpu and gpu * [Change] 
test_fractional_max_pool3d_2d/3d timeout to 30s * [Fix] resolve conflict * [Change] win32 cannot detect bfloat16 correctly * [Change] force set_device * [Add] test random_u is None * [Change] use kernel_size for overlapping mode * [Change] clean headers * [CodeStyle] pooling * [Change] rename op * [Change] rename func without index * [Prim][PIR] Recover pir bn (PaddlePaddle#60689) * reopen bn prim pir * fix atol * decomp support batch_norm_ * fix test case * fix bug * fix code * [PIR]fc_with_special_op_fuse_pass bug fix (PaddlePaddle#60751) * bug fix update * update * delete all debug message * add code deleted wrong at last commit * delete createAutoMixedPrecisionPass in analysis_predictor.cc --------- Co-authored-by: HongyuJia <[email protected]> Co-authored-by: ooo oo <[email protected]> Co-authored-by: SigureMo <[email protected]> Co-authored-by: zhaoyingli <[email protected]> Co-authored-by: xingmingyyj <[email protected]> Co-authored-by: JYChen <[email protected]> Co-authored-by: Yuang Liu <[email protected]> Co-authored-by: zhangbo9674 <[email protected]> Co-authored-by: YuanRisheng <[email protected]> Co-authored-by: kevin <[email protected]> Co-authored-by: wanghuancoder <[email protected]> Co-authored-by: kangguangli <[email protected]> Co-authored-by: zhangyuqin1998 <[email protected]> Co-authored-by: co63oc <[email protected]> Co-authored-by: NeroLoh <[email protected]> Co-authored-by: 傅剑寒 <[email protected]> Co-authored-by: lzydev <[email protected]> Co-authored-by: tianshuo78520a <[email protected]> Co-authored-by: houj04 <[email protected]> Co-authored-by: Yuanle Liu <[email protected]> Co-authored-by: LiYuRio <[email protected]> Co-authored-by: 张春乔 <[email protected]> Co-authored-by: xiaoguoguo626807 <[email protected]> Co-authored-by: winter-wang <[email protected]> Co-authored-by: BiynXu <[email protected]> Co-authored-by: cyber-pioneer <[email protected]> Co-authored-by: Vigi Zhang <[email protected]> Co-authored-by: zbt78 <[email protected]> 
Co-authored-by: liuzhenhai93 <[email protected]> Co-authored-by: Aurelius84 <[email protected]> Co-authored-by: Bo Zhang <[email protected]> Co-authored-by: Lu Qi <[email protected]> Co-authored-by: LoneRanger <[email protected]> Co-authored-by: freeliuzc <[email protected]> Co-authored-by: YibLiu <[email protected]> Co-authored-by: engineer1109 <[email protected]> Co-authored-by: danleifeng <[email protected]> Co-authored-by: xuxinyi389 <[email protected]> Co-authored-by: MayYouBeProsperous <[email protected]> Co-authored-by: Huihuang Zheng <[email protected]> Co-authored-by: gouzil <[email protected]> Co-authored-by: 6clc <[email protected]> Co-authored-by: Terry <[email protected]> Co-authored-by: winter-wang <[email protected]> Co-authored-by: Wang Xin <[email protected]> Co-authored-by: ming1753 <[email protected]> Co-authored-by: Frank Lin <[email protected]> Co-authored-by: pangengzheng <[email protected]> Co-authored-by: lanxianghit <[email protected]> Co-authored-by: Tian Zheng <[email protected]> Co-authored-by: lijialin03 <[email protected]> Co-authored-by: Wangzheee <[email protected]> Co-authored-by: zhink <[email protected]> Co-authored-by: huangjiyi <[email protected]> Co-authored-by: Chen Zhiyang <[email protected]> Co-authored-by: feifei-111 <[email protected]> Co-authored-by: fsczz <[email protected]> Co-authored-by: Haohongxiang <[email protected]> Co-authored-by: Sonder <[email protected]> Co-authored-by: Liujie0926 <[email protected]> Co-authored-by: WangZhen <[email protected]> Co-authored-by: risemeup1 <[email protected]> Co-authored-by: bukejiyu <[email protected]> Co-authored-by: zhangyikun02 <[email protected]> Co-authored-by: Jianbang Yang <[email protected]> Co-authored-by: enzodechine <[email protected]> Co-authored-by: Zhan Rongrui <[email protected]> Co-authored-by: coco <[email protected]> Co-authored-by: zhaohaixu <[email protected]> Co-authored-by: chen2016013 <[email protected]> Co-authored-by: zyfncg <[email protected]> 
Co-authored-by: Qi Li <[email protected]> Co-authored-by: zhangbo9674 <[email protected]> Co-authored-by: Liuyinfeng <[email protected]> Co-authored-by: zachary sun <[email protected]> Co-authored-by: wendaxiao <[email protected]> Co-authored-by: cyberslack_lee <[email protected]> Co-authored-by: lizexu123 <[email protected]> Co-authored-by: GGBond8488 <[email protected]> Co-authored-by: megemini <[email protected]>






PR types
New features
PR changes
APIs
Description
RFC:
Updated RFC:
Chinese documentation: