Skip to content

Conversation

@rcurtin
Copy link
Member

@rcurtin rcurtin commented Apr 3, 2024

This PR adds documentation for PCA (principal components analysis). This was a pretty simple method to document, but I had to make some minor changes:

  • Add non-inplace overloads for dimensionality reduction and variance keeping overloads. This is necessary so you can, e.g., pass in a sparse dataset and get back a dense one.

  • Templatize PCA::Apply() so it can work on different matrix types (e.g. arma::fmat). That in turn meant templatizing all of the decomposition strategies... as those get individually documented they will be tested further. As part of this, I refactored MakeAlias(), which existed in two places, so that it now only exists in one place. I also updated the documentation in core.md accordingly.

  • Fix a minor bug in the block Krylov randomized SVD; @zoq can you make sure my changes here are correct? I believe according to the paper that k must be less than or equal to d (see Algorithm 2's Input: section).

  • Limit the width of the sidebar so that it doesn't overlap with the content.

You can click around the documentation here:

Copy link
Member

@shrit shrit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good, I left a couple of comments but even the docs webpage looks great.

Comment on lines +22 to +23
LMNNFunction<MetricType>::LMNNFunction(const arma::mat& datasetIn,
const arma::Row<size_t>& labelsIn,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why not move this to MatType ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll come back later for it, the goal here was just to modify MakeAlias().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, good for me

Comment on lines +25 to 27
const arma::mat& datasetIn,
const arma::Row<size_t>& labelsIn,
MetricType metric) :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same comment here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, it will happen, just not in this PR (likely when I document NCA or LMNN).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is fine then

const MatType& predictors,
const arma::Row<size_t>& responses,
const MatType& predictorsIn,
const arma::Row<size_t>& responsesIn,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is already refactored, LogisticRegression already takes whatever MatType as input. The labels have to be arma::Row<size_t> though.

Comment on lines -1 to -16
/**
* @file make_alias.hpp
* @author Ryan Curtin
*
* Implementation of `MakeAlias()`, a utility function. This is meant to be
* used in `SetWeights()` calls in various layers, to wrap internal weight
* objects as aliases around the given memory pointers.
*
* mlpack is free software; you may redistribute it and/or modify it under the
* terms of the 3-clause BSD license. You should have received a copy of the
* 3-clause BSD license along with mlpack. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
*/
#ifndef MLPACK_METHODS_ANN_MAKE_ALIAS_HPP
#define MLPACK_METHODS_ANN_MAKE_ALIAS_HPP

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where this class has been moved? are you using directly the one that is in the core/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using it in the coot refactoring 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the signature is exactly the same but it is moved to core/ now. I cleaned up a few other overloads. I think everything is the same for your usage though: https://www.ratml.org/misc/mlpack-markdown-doc/user/core.html#aliases

Comment on lines +158 to +159
BaseMatType eigvec;
BaseColType eigVal;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the difference between BaseColType and ColType ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to catch expressions that are passed in. So if I call, e.g., pca.Apply(2 * x, ...), MatType will be inferred as an Armadillo expression (some complex type). So BaseMatType and BaseColType are the "base" types that we can use to store results in. The ensmallen code does similar things at the beginning of every Optimize() function: https://github.com/mlpack/ensmallen/blob/master/include/ensmallen_bits/de/de_impl.hpp#L40

Comment on lines +24 to +25
* Allow PCA to take different matrix types (#3677).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should remember do the same for all the PR we merged

Comment on lines +19 to 29
* Reconstruct `m` as an alias around the memory `newMem`, with size `numRows` x
* `numCols`.
*/
template<typename ElemType>
arma::Cube<ElemType> MakeAlias(arma::Cube<ElemType>& input,
const bool strict = true)
template<typename MatType>
void MakeAlias(MatType& m,
typename MatType::elem_type* newMem,
const size_t numRows,
const size_t numCols,
const bool strict = true,
const typename std::enable_if_t<!IsCube<MatType>::value>* = 0)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay I see it here.

#### See also:
<!-- TODO: add link -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess something for later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I can't add the link to RADICAL until the documentation page for that method is done.

Comment on lines 63 to 64
blockSize = std::min((size_t) data.n_rows,
std::min((size_t) data.n_cols, rank + 10));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the indentation is off?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix looks good to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 787e14b. 👍

Copy link
Member

@zoq zoq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more comments from my side.

@rcurtin rcurtin merged commit 6586ed6 into mlpack:master Apr 9, 2024
@rcurtin rcurtin deleted the pca-doc branch April 9, 2024 12:52
This was referenced May 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants