Implement Matrix class to abstract algorithms away from data storage details #54
Description
Currently, the algorithm code is tightly coupled to the memory layout of the underlying data. Adding a Matrix class in between separates the concerns of the different modules, which is good software-engineering practice.

The biggest benefit is simpler code and higher development productivity. It will also make existing and future algorithms easier to understand. As a result, we should see faster development and adoption.

The Matrix class is intended to be a view of a 2D array contained in a Blob. Its main functionality is to provide high-level wrappers around the common operations:
```cpp
using boost::move;

template <typename Dtype>
class Matrix {
 public:
  Matrix();
  explicit Matrix(shared_ptr<Blob<Dtype> > blob);

  // Matrix product; dispatches to the BLAS wrapper.
  Matrix<Dtype> mul(const Matrix<Dtype>& that) {
    Matrix<Dtype> product;
    caffe_gpu_gemm(...);  // details elided in this sketch
    return move(product);
  }
  Matrix<Dtype> add(const Matrix<Dtype>& that);

  // Plus: minus, div, rdiv, sqr, pow, exp, conv, sum, max, min, mean,
  // std, ones, zeros, rand, randn, size, rows, cols, row, col, roi,
  // t/transpose, rot90, ...

 private:
  shared_ptr<Blob<Dtype> > blob_;  // underlying storage
  size_t num_;                     // these three locate the 2D view
  size_t channel_;                 // inside the blob
  size_t offset_;
};
```

So that we can write code like the following snippets.
The convolution:
```cpp
output = image.conv(filter);
```

The fully connected layer:

```cpp
output = weight.mul(input).add(bias);
```

The ReLU activation:

```cpp
activation = input.max(0);
```

The softmax activation:

```cpp
activations = input.exp();
probs = activations.rdiv(activations.sum(dim));
```

As you can see, the API is heavily inspired by MATLAB, which also motivated ArrayFire's C++ interface. But of course the snippets are only rough sketches, and many more details need to be considered. For example, if the performance cost of Boost move operations is too high, they could be replaced by shared_ptr, which would complicate the user code a little. Another question is whether we should pass in a shared_ptr to the result matrix instead of returning it (sketched below). More importantly, the GPU code may differ greatly from the CPU code, depending on how well CUDA plays with the proposed API syntax.
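For illustration, here is a minimal sketch of that out-parameter variant; the signature, the `result` parameter, and the use of boost::shared_ptr are assumptions for this sketch, not an existing API:

```cpp
#include <boost/shared_ptr.hpp>
using boost::shared_ptr;

// Hypothetical out-parameter form of Matrix::mul: the caller allocates
// the result, so no temporary is returned and no move/copy is needed.
template <typename Dtype>
class Matrix {
 public:
  void mul(const Matrix<Dtype>& that,
           const shared_ptr<Matrix<Dtype> >& result) const {
    // caffe_gpu_gemm(...) would write directly into result's blob here.
  }
};
```

The complication alluded to above is that expressions no longer chain: `output = weight.mul(input).add(bias);` turns into separate statements with explicitly allocated temporaries.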
Therefore, this issue's scope is limited to implementing the Matrix classes for both kinds of devices. Porting the algorithms should be deferred to independent issues, and should not begin until benchmark results show no performance gap between the low-level API and the proposed high-level API.
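As a rough illustration of what such a benchmark could look like, here is a minimal timing skeleton; the iteration count and the empty lambda bodies are placeholders, and a real GPU benchmark would need a cudaDeviceSynchronize() before each clock read:

```cpp
#include <chrono>
#include <cstdio>

// Times `iters` repetitions of a callable and returns milliseconds.
template <typename F>
static double time_ms(int iters, F f) {
  auto start = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < iters; ++i) f();
  auto stop = std::chrono::high_resolution_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
  const int iters = 1000;
  // Placeholder bodies: the real benchmark would call caffe_gpu_gemm
  // directly in one lambda and the Matrix wrapper in the other.
  double t_low = time_ms(iters, [] { /* caffe_gpu_gemm(...) */ });
  double t_high = time_ms(iters, [] { /* output = weight.mul(input); */ });
  std::printf("low-level: %.3f ms  high-level: %.3f ms\n", t_low, t_high);
  return 0;
}
```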
Efforts to refine the API and to help implement it are welcome.