add index_put api#52886
Conversation
|
你的PR提交成功,感谢你对开源项目的贡献! |
… less than x.dims
| #include "paddle/phi/kernels/index_put_grad_kernel.h" | ||
| #include <numeric> | ||
| #include "paddle/phi/backends/gpu/gpu_context.h" | ||
| #include "paddle/phi/backends/gpu/gpu_launch_config.h" |
| int64_t offset = 0; | ||
|
|
||
| for (size_t i = 0; i < Rank; ++i) { | ||
| cur_ix = (int64_t(*(indices[i] + idx))); |
| value_grad->dtype(), | ||
| false, | ||
| &value_grad_dims_without1); | ||
| phi::ReshapeInferKernel<Context>( |
There was a problem hiding this comment.
这里的目的是对value_grad做Resize吧?value_grad是输出,直接用value_grad->Resize(...)就行?
There was a problem hiding this comment.
value_grad的size不能调用resize变化的,value_grad的dims会影响到反向梯度的shape,需保持与前向的value的shape一致
There was a problem hiding this comment.
但你ReshapeInferKernel的调用,不也会修改value_grad的shape吗?我的意思是,在L190再调用一次value_grad->Resize,直接再次设置value_grad的shape,也可避免ReshapeInferKernel中的一次memcpy。
There was a problem hiding this comment.
这里的ReshapeInferKernel本身并没有修改value_grad的shape
| template <typename T, size_t Rank> | ||
| void set_zero_kernel(const int64_t N, | ||
| const int64_t** indices, | ||
| phi::Array<int64_t, Rank> stride, |
There was a problem hiding this comment.
CPU Kernel没必要用phi::Array,直接用const std::vector<int64_t>&或const phi::DDim&类型就行,还能避免拷贝。
| } | ||
| } | ||
|
|
||
| template <typename T, typename Context, size_t Rank> |
There was a problem hiding this comment.
CPU Kernel就不要将Rank作为模板了,你单测覆盖率没过正式因为Rank
| #include "paddle/phi/kernels/expand_kernel.h" | ||
| #include "paddle/phi/kernels/nonzero_kernel.h" | ||
| #include "paddle/phi/kernels/reshape_kernel.h" | ||
| #include "paddle/phi/kernels/split_kernel.h" |
|
|
||
| #include <vector> | ||
| #include "paddle/fluid/memory/malloc.h" | ||
| #include "paddle/fluid/memory/memcpy.h" |
There was a problem hiding this comment.
不要include fluid下面的头文件,使用phi目录下的代替
| PROPERTIES TIMEOUT 200) | ||
| set_tests_properties(test_index_select_op PROPERTIES TIMEOUT 120) | ||
| set_tests_properties(test_index_add_op PROPERTIES TIMEOUT 120) | ||
| set_tests_properties(test_index_put_op PROPERTIES TIMEOUT 120) |
There was a problem hiding this comment.
这个不设置的话,CI会超时,和CI的同学确认过了,默认的值的话,很小,貌似不过15s,当时CI报错超过15s直接timeout了,具体是多少不确定
| @@ -0,0 +1,826 @@ | |||
| # Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. | |||
|
|
||
|
|
||
| """ | ||
| assert len(indices) != 0, "indices can't be empty" |
|
Sorry to inform you that 7b71a3a's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually. |
| out->set_layout(x.layout()); | ||
| } | ||
|
|
||
| void IndexPutInferMeta(const MetaTensor& x, |
| #pragma once | ||
|
|
||
| #include <vector> | ||
| #include "paddle/phi/common/place.h" |
| #pragma once | ||
|
|
||
| #include <vector> | ||
| #include "paddle/phi/common/place.h" |
| infer_meta : | ||
| func : IndexPutInferMeta | ||
| kernel : | ||
| func : index_put |
There was a problem hiding this comment.
输入x和indices的数据类型不同,需要指定按照谁的数据类型来选择kernel,关键字为data_type,写法如后面紧跟的index_sample
| const int64_t* pd_indices[7]; | ||
| for (size_t i = 0; i < indices_v.size(); ++i) { | ||
| pd_indices[i] = indices_v[i]->data<int64_t>(); | ||
| } |
There was a problem hiding this comment.
L108 - L111既然后续还会用到,就挪到L98吧,删除L121 - L124的重复代码。
| value_grad->dtype(), | ||
| false, | ||
| &value_grad_dims_without1); | ||
| phi::ReshapeInferKernel<Context>( |
There was a problem hiding this comment.
但你ReshapeInferKernel的调用,不也会修改value_grad的shape吗?我的意思是,在L190再调用一次value_grad->Resize,直接再次设置value_grad的shape,也可避免ReshapeInferKernel中的一次memcpy。
| T* out = dev_ctx.template Alloc<T>(p_res); | ||
| range_kernel<T>(N, out); | ||
| return res; | ||
| } |
There was a problem hiding this comment.
并不是说CPU、GPU Kernel里面重复,而是前向、反向中也有重复。通过模板+宏、或者设置不同的函数名来解决。
| const int64_t** indices, | ||
| const phi::DDim& stride, | ||
| const phi::DDim& shape, | ||
| int64_t isSingleValTensor, |
There was a problem hiding this comment.
isSingleValTensor -> is_single_val_tensor
| DenseTensor* out) { | ||
| auto* x_data = x.data<T>(); | ||
| auto* val_data = value.data<T>(); | ||
| bool isInitialized = out->initialized(); |
There was a problem hiding this comment.
isInitialized -> is_initialized
| #include "paddle/phi/kernels/reshape_kernel.h" | ||
| #include "paddle/phi/kernels/split_kernel.h" | ||
|
|
||
| namespace phi { |
| phi::DenseTensor res_tensor(tensor.dtype()); | ||
| res_tensor.Resize(res_dim); | ||
| ExpandKernel<T, Context>( | ||
| dev_ctx, mid_tensor, IntArray(phi::vectorize(res_dim)), &res_tensor); |
There was a problem hiding this comment.
这里调Reshape和Expand都会产生memcpy,实际上只需要获得相应的dims
There was a problem hiding this comment.
我开始也想过是不是可以直接resize就行,之前在
tmp_indices_v.emplace_back(DenseTensor(phi::DataType::INT64).Resize(phi::make_ddim({nonzero_indices.dims()[0],1})));替换为
tmp_indices_v.emplace_back(DenseTensor(phi::DataType::INT64).Resize(phi::make_ddim({nonzero_indices.dims()[0]})));的时候我尝试过是否通过resize可以减少一些reshape操作
但是在这里我受限的点在于我需要一个能够满足expand关系的src tensor和一个des tensor来操作,但是我并不能修改tensor这个对象,因为它是一个const reference,所以这里两次的拷贝,可能是一个必要的
| int64_t** indices, | ||
| phi::Array<int64_t, Rank> stride, | ||
| phi::Array<int64_t, Rank> shape, | ||
| int64_t isSingleValTensor, |
There was a problem hiding this comment.
isSingleValTensor -> is_single_val_tensor
| T* out = dev_ctx.template Alloc<T>(p_res); \ | ||
| range_kernel<T>(N, out); \ | ||
| return res; \ | ||
| } |
There was a problem hiding this comment.
真是没想到你会把整个函数全部写成一个宏。一个建议的最简单的修改方式如下:
template <typename T>
void range_kernel(int64_t N, T* out) {
...
}
template <typename T, typename Context>
phi::DenseTensor GetRangeTensor(const Context& dev_ctx, int64_t N, phi::DataType dtype) {
...
}
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
template <typename T>
__global__ void range_cuda_kernel(int64_t N, T* out) {
...
}
template <typename T, typename Context>
phi::DenseTensor GetRangeCudaTensor(
const Context& dev_ctx, int64_t N, phi::DataType dtype) {
...
}
#endif|
|
||
| Args: | ||
| x (Tensor) : The Source Tensor. Supported data types are int32, int64, float16, float32, float64, bool. | ||
| indices (Tensor): The tuple of Tensor containing the indices to index. |
There was a problem hiding this comment.
这里应该是List / tuple of Tensor
There was a problem hiding this comment.
这里是tuple of tensor,对齐的torch对应的API的用法
| indices (Tuple of Tensor): The tuple of Tensor containing the indices to index. | ||
| The data type of ``tensor in indices`` must be int32, int64 or bool | ||
| value (Tensor): The tensor used to be assigned to x. | ||
| accummulate (Bool): Whether the elements in values are added to x |
There was a problem hiding this comment.
有默认值的参数需要注明optional,以及default是什么
|
|
||
| def index_put(x, indices, value, accumulate=False, name=None): | ||
| """ | ||
| Outplace version of ``index_put_`` API, the output Tensor will be inplaced with input ``x``. |
There was a problem hiding this comment.
一般是会说 index_put_ 是 index_put 的 inplace 版本,能否反过来说辛苦文档pm确认下 @sunzhongkai588
|
|
||
| Returns: | ||
| Tensor, same dimention and dtype with x. | ||
| Examples: |
There was a problem hiding this comment.
这里需要确认一下官网预览效果,等 PR-CI-Paddle-Doc-Preview 跑完
XieYunshen
left a comment
There was a problem hiding this comment.
LGTM for set_tests_properties(test_index_put_op PROPERTIES TIMEOUT 120)
XieYunshen
left a comment
There was a problem hiding this comment.
LGTM for set_tests_properties(test_index_put_op PROPERTIES TIMEOUT 120)

PR types
New features
PR changes
APIs
Description
This PR add index_put and index_put_ API for Paddle, please refer to PaddlePaddle API doc for details.
(Supplementary Note: Due to some indexing mechanism problems of the Paddle framework, the performance of Paddle's ways to index is much slower than Torch, but the overall reconstruction is a process that takes time, so some advanced indexing with poor performance is firstly extracted for optimization and will expose them in the type of paddle API which are index_put and index_put_ API for users.
Advanced Indexing means using tensor as subscript to index a tensor. Under the functions supported by index_put API, its performance is far better than directly c-order indexing in paddle)