Implement Matrix class to abstract algorithms away from data storage details #54

@kloudkl

Description

Currently, the algorithm code is closely coupled to the memory layout of the underlying data. Adding a Matrix class in between would separate the concerns of the different modules, which is good software engineering practice.

The biggest benefit is simpler code and higher development productivity. It will also make existing and future algorithms easier to understand. As a result, development and adoption should accelerate.

The Matrix class is intended to be a view of the 2D array contained in a Blob. Its main purpose is to provide high-level wrappers for the common operations.

using boost::move;

template<typename Dtype>
class Matrix {
 public:
  Matrix();
  explicit Matrix(shared_ptr<Blob<Dtype> > blob);
  Matrix<Dtype> mul(const Matrix<Dtype>& that) {
    Matrix<Dtype> product;
    caffe_gpu_gemm(...);
    return move(product);
  }
  Matrix<Dtype> add(const Matrix<Dtype>& that);
  // Also: minus, div, rdiv, sqr, pow, exp, conv, sum, max, min, mean,
  // std, ones, zeros, rand, randn, size, rows, cols, row, col, roi,
  // t/transpose, rot90, ...
 private:
  shared_ptr<Blob<Dtype> > blob_;
  size_t num_;
  size_t channel_;
  size_t offset_;
};
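To make the view idea concrete, here is a minimal, self-contained sketch of how such a view could map a (row, col) pair onto the Blob's flat storage. It stands in for the real class above: a plain std::vector replaces the Blob, the name MatrixView and the row-major (num, channel, rows, cols) layout are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch: a 2D view over a shared flat buffer. The view
// selects one (num, channel) slice via a precomputed element offset,
// assuming row-major storage, and does not own or copy the data.
template<typename Dtype>
class MatrixView {
 public:
  MatrixView(std::shared_ptr<std::vector<Dtype> > data,
             size_t rows, size_t cols, size_t offset)
      : data_(data), rows_(rows), cols_(cols), offset_(offset) {}
  Dtype& at(size_t r, size_t c) {
    return (*data_)[offset_ + r * cols_ + c];  // row-major indexing
  }
  size_t rows() const { return rows_; }
  size_t cols() const { return cols_; }
 private:
  std::shared_ptr<std::vector<Dtype> > data_;
  size_t rows_, cols_, offset_;
};
```

Writing through the view mutates the shared buffer, so several views of the same Blob stay consistent without any copying.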

With it, we could write code like the following snippets.
The convolution:

output = image.conv(filter);

The fully connected layer:

output = weight.mul(input).add(bias);
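Unrolled on the CPU, weight.mul(input).add(bias) for a single sample is just a matrix-vector product plus a bias. The free function below is a hedged sketch of that semantics only; its name and flat-vector signature are assumptions, and a real implementation would dispatch to caffe_gpu_gemm instead of the inner loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for output = weight.mul(input).add(bias):
// weight is (out_dim x in_dim) in row-major order, input and bias are
// plain vectors. Loops are naive on purpose; only the semantics matter.
std::vector<float> fully_connected(const std::vector<float>& weight,
                                   const std::vector<float>& input,
                                   const std::vector<float>& bias,
                                   size_t out_dim, size_t in_dim) {
  std::vector<float> output(out_dim, 0.0f);
  for (size_t o = 0; o < out_dim; ++o) {
    for (size_t i = 0; i < in_dim; ++i)
      output[o] += weight[o * in_dim + i] * input[i];  // mul
    output[o] += bias[o];                              // add
  }
  return output;
}
```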

The ReLU activation:

activation = input.max(0);

The Softmax activation:

activations = input.exp();
probs = activations.rdiv(activations.sum(dim));
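The ReLU and Softmax snippets above can be exercised end-to-end with a toy value type. The struct below is a sketch under stated assumptions: storage is a flat std::vector<float>, the method names maxv and rdiv are placeholders (not Caffe API), every operation returns a fresh object by value, and sum collapses over all elements rather than a chosen dim.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical toy matrix demonstrating the chained, MATLAB-like API.
struct Mat {
  std::vector<float> v;
  Mat maxv(float t) const {            // elementwise max -> ReLU
    Mat r{v};
    for (float& x : r.v) x = std::max(x, t);
    return r;
  }
  Mat exp() const {                    // elementwise exponential
    Mat r{v};
    for (float& x : r.v) x = std::exp(x);
    return r;
  }
  float sum() const {                  // sum over all elements
    return std::accumulate(v.begin(), v.end(), 0.0f);
  }
  Mat rdiv(float d) const {            // divide each element by d
    Mat r{v};
    for (float& x : r.v) x /= d;
    return r;
  }
};
```

With this, the Softmax snippet reads as probs = input.exp().rdiv(e_sum), and the probabilities sum to one by construction.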

As you can see, the API is highly inspired by MATLAB, which also motivated ArrayFire C++. Of course, the snippets are only rough sketches, and many more details need to be considered. For example, if the performance cost of Boost move operations is too high, they could be replaced by shared_ptr, which would complicate user code a little. Another question is whether we should pass in a shared_ptr to the result matrix instead of returning it. More importantly, the GPU code may differ greatly from the CPU code, depending on whether CUDA plays well with the proposed API syntax.

Therefore, this issue's scope is limited to implementing the Matrix classes for both kinds of devices. Porting the algorithms should be deferred to separate issues until benchmark results show no performance gap between the low-level API and the proposed high-level API.

Efforts to refine the API and help implement it are welcome.
