Implement Matrix class to abstract algorithms away from data storage details #54
Description
Currently, the algorithm code is tightly coupled to the memory layout of the underlying data. Adding a Matrix class in between separates the concerns of the different modules, which is good software-engineering practice.

The biggest benefit is simpler code and higher development productivity. It will also make existing and future algorithms easier to understand. As a result, we should see faster development and adoption.

The Matrix class is intended to be a view of a 2D array contained in a Blob. Its main functionality is to provide high-level wrappers around the common operations:
```cpp
using boost::move;

template <typename Dtype>
class Matrix {
 public:
  Matrix();
  explicit Matrix(shared_ptr<Blob<Dtype> > blob);

  // Matrix product; dispatches to the BLAS wrapper.
  Matrix<Dtype> mul(const Matrix<Dtype>& that) {
    Matrix<Dtype> product;
    caffe_gpu_gemm(...);  // details elided in this sketch
    return move(product);
  }
  Matrix<Dtype> add(const Matrix<Dtype>& that);

  // Plus: minus, div, rdiv, sqr, pow, exp, conv, sum, max, min, mean,
  // std, ones, zeros, rand, randn, size, rows, cols, row, col, roi,
  // t/transpose, rot90, ...

 private:
  shared_ptr<Blob<Dtype> > blob_;  // underlying storage
  size_t num_;                     // these three locate the 2D view
  size_t channel_;                 // inside the blob
  size_t offset_;
};
```

So that we can write code like the following snippets.
The convolution:
```cpp
output = image.conv(filter);
```

The fully connected layer:

```cpp
output = weight.mul(input).add(bias);
```

The ReLU activation:

```cpp
activation = input.max(0);
```

The softmax activation:

```cpp
activations = input.exp();
probs = activations.rdiv(activations.sum(dim));
```

As you can see, the API is heavily inspired by MATLAB, which also motivated ArrayFire's C++ interface. But of course the snippets are only rough sketches, and many more details need to be considered. For example, if the performance cost of Boost move operations is too high, they could be replaced by shared_ptr, which would complicate the user code a little. Another question is whether we should pass in a shared_ptr to the result matrix instead of returning it (sketched below). More importantly, the GPU code may differ greatly from the CPU code, depending on how well CUDA plays with the proposed API syntax.
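For illustration, here is a minimal sketch of that out-parameter variant; the signature, the `result` parameter, and the use of boost::shared_ptr are assumptions for this sketch, not an existing API:

```cpp
#include <boost/shared_ptr.hpp>
using boost::shared_ptr;

// Hypothetical out-parameter form of Matrix::mul: the caller allocates
// the result, so no temporary is returned and no move/copy is needed.
template <typename Dtype>
class Matrix {
 public:
  void mul(const Matrix<Dtype>& that,
           const shared_ptr<Matrix<Dtype> >& result) const {
    // caffe_gpu_gemm(...) would write directly into result's blob here.
  }
};
```

The complication alluded to above is that expressions no longer chain: `output = weight.mul(input).add(bias);` turns into separate statements with explicitly allocated temporaries.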
Therefore, this issue's scope is limited to implementing the Matrix classes for both kinds of devices. Porting the algorithms should be deferred to independent issues, and should not begin until benchmark results show no performance gap between the low-level API and the proposed high-level API.
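As a rough illustration of what such a benchmark could look like, here is a minimal timing skeleton; the iteration count and the empty lambda bodies are placeholders, and a real GPU benchmark would need a cudaDeviceSynchronize() before each clock read:

```cpp
#include <chrono>
#include <cstdio>

// Times `iters` repetitions of a callable and returns milliseconds.
template <typename F>
static double time_ms(int iters, F f) {
  auto start = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < iters; ++i) f();
  auto stop = std::chrono::high_resolution_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
  const int iters = 1000;
  // Placeholder bodies: the real benchmark would call caffe_gpu_gemm
  // directly in one lambda and the Matrix wrapper in the other.
  double t_low = time_ms(iters, [] { /* caffe_gpu_gemm(...) */ });
  double t_high = time_ms(iters, [] { /* output = weight.mul(input); */ });
  std::printf("low-level: %.3f ms  high-level: %.3f ms\n", t_low, t_high);
  return 0;
}
```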
Efforts to refine the API and to help implement it are welcome.