Skip to content

Conversation

@rcurtin
Copy link
Member

@rcurtin rcurtin commented Jan 13, 2024

I stopped documenting individual mlpack methods so that we could have a framework that could be deployed immediately (or soon!) that we could then add documentation to little by little.

The basic idea is this:

  • index.md will be the landing page for documentation and the entrypoint for users
  • citation.md holds the citation for when people are writing papers
  • matrices.md contains revamped details of how matrices are used in mlpack, and hopefully everyone should read this before they get started
  • load_save.md contains details on all the ways data and models can be loaded and saved with mlpack
  • core.md is a single page of documentation containing details on all core mlpack classes; so far I added math::Range, DiscreteDistribution, GaussianDistribution, LMetric, and GaussianKernel; of course there is a lot more too!

None of the links work yet. I think the next PR I work on will be a build system that will build all of the documentation and also test all the code snippets, and at that time I'll fix all the links. For now the thing to review is just the structure of the pages, the text, and the examples.

All of the code in these examples compiles and runs and works (I tested manually, but like I said above I do want to automate that and think it should not be too hard).

@shrit @zoq @coatless @conradsnicta (and anyone else who is interested!) I'd love to get your feedback on whatever you have time to take a look at. The more eyes, the better 👍

@shrit
Copy link
Member

shrit commented Jan 14, 2024

I will need to look more in depth, I skimmed all these READMEs and they look good. However, I think we can do some improvements over the index page since it is the landing page and it needs to be very eye catching, currently it is a bit boring in some sense, and mainly sending people to another page.

I do believe we can have some a quick starter there and show you can build an ML model in 5 minutes using mlpack or something like that, let us say make the first half page very eye catching so the user would say: Yes this library definitely worth my time.

@coatless
Copy link
Contributor

@shrit I'm not 100% sure that is needed as the "splash" per say is already conveyed via the main homepage. When they visit the reference index, they will likely want to be directed most efficiently to their specialized topic. The "quickstart" example is already featured heavily as one of the first examples in the new index.

I think one of the only notes here is to bring forward the mlpack bindings to other languages to capture the data scientists better. (It'll be further away once more information is added.)

Copy link
Member

@shrit shrit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume all the removed code is replicated or does exist in ensmallen. In all cases I am not expert in it, so it is your call.

@shrit
Copy link
Member

shrit commented Jan 15, 2024

Regarding my above comment here is how I imagine a 5 minutes example:

First a quick installation guide, like the following:

mkdir my_mlpack_example

git clone https://github.com/mlpack/mlpack.git

cp -r mlpack/src/*  my_mlpack_example

cd my_mlpack_example

vim main.cpp

Then copy the following into main.cpp

// This simple program uses the mlpack::neighbor::NeighborSearch object
// to find the nearest neighbor of each point in a dataset using the L1 metric,
// and then print the index of the neighbor and the distance of it to stdout.

#include <mlpack.hpp>

using namespace mlpack;

int main()
{
  // Load the data from data.csv (hard-coded).  Use CLI for simple command-line
  // parameter handling.
  arma::mat data("0.339406815,0.843176636,0.472701471; \
                  0.212587646,0.351174901,0.81056695;  \
                  0.160147626,0.255047893,0.04072469;  \
                  0.564535197,0.943435462,0.597070812"); 
  data = data.t(); 

  // Use templates to specify that we want a NeighborSearch object which uses
  // the Manhattan distance.
  NeighborSearch<NearestNeighborSort, ManhattanDistance> nn(data);

  // Create the object we will store the nearest neighbors in.
  arma::Mat<size_t> neighbors;
  arma::mat distances; // We need to store the distance too.

  // Compute the neighbors.
  nn.Search(1, neighbors, distances);

  // Write each neighbor and distance using Log.
  for (size_t i = 0; i < neighbors.n_elem; ++i)
  {
    std::cout << "Nearest neighbor of point " << i << " is point "
        << neighbors[i] << " and the distance is " << distances[i] << "." << std::endl;
  }

  return 0;
}

We can take a different example, just pulled this one from the website.

Then you compile with the following command:

g++ main.cpp -o main

Then run the app as follows:

./main

Then we can have either a link to a quick start or explain what we have done up there. For example, mlpack.hpp is the only required header to be included, then

@rcurtin I know the above example is not useful but it shows very nice steps on how to include mlpack (pull the header), and compile the program. This is extremely nice and comfortable since 99% percent of people should not run cmake and compile the tests or the CLI bindings because they do not need them. (If this is the case for the other 1 percent, then we can point them to a dedicated page, and even add a developer page for those who like to contribute).

This is also the same process for those who are using Visual Studio which was always a question of how to compile mlpack. For them it is even easier, just pull mlpack headers, copy them to your directory and then click on F5 or whatever the compile button

@coatless I am not too fan of adding additional binding to mlpack, and the reason for this is very simple, it is too much of work to maintain them for a few specific sets of users. I love the CLI binding they are nice and each time we have a new method we are adding one of these. However, when it comes to Python, it is very rare to use this binding, it is better to use scikit learn because it is designed for this purpose. It contains why more methods and is much easier to integrate with any Python app you are working on. Of course when it comes to R and Go it is a different discussion where it is more needed in these two languages. However, I do not think we should spend more effort as maintainers to add new programming languages.

Copy link

@mlpack-bot mlpack-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

@rcurtin
Copy link
Member Author

rcurtin commented Jan 30, 2024

Sorry it took me so long to get back to this.

This is extremely nice and comfortable since 99% percent of people should not run cmake

@shrit I don't want to teach people that this is how to use the library. That doesn't check for dependencies and is a recipe for confused people with compilation problems reporting issues that take our time up to resolve ("can't find ensmallen.hpp", "can't find ", "couldn't find -lopenblas" and on and on and on). We already have a build guide in the README and I'd like to stick to that. (Totally open to changing around how it looks.)

This is also the same process for those who are using Visual Studio which was always a question of how to compile mlpack. For them it is even easier, just pull mlpack headers, copy them to your directory and then click on F5 or whatever the compile button

It's a bit of an aside, but this is definitely not the case: 95% of the work to get mlpack working in Visual Studio is getting the dependencies in place. That means a bunch of headers for Armadillo, ensmallen, and cereal, and then also the compiled OpenBLAS library to link against. There's no way that you can just "copy and go"; that's why we have a handful of Windows compilation tutorials and examples.

I do believe we can have some a quick starter there and show you can build an ML model in 5 minutes using mlpack or something like that, let us say make the first half page very eye catching so the user would say: Yes this library definitely worth my time.

I don't know if the homepage of the docs is the right place to do that. Maybe it is, maybe not, I'm not sure. We already have quickstarts though, and we have to be careful because there are quickstarts for every language:

https://github.com/mlpack/mlpack/blob/master/doc/quickstart/cpp.md

That can be cleaned up separately and made more eye-catching.

I think one of the only notes here is to bring forward the mlpack bindings to other languages to capture the data scientists better. (It'll be further away once more information is added.)

I am not too fan of adding additional binding to mlpack, and the reason for this is very simple, it is too much of work to maintain them for a few specific sets of users.

@coatless @shrit there is a difficult line to balance here: we have two types of users and each of you represent each of those types perfectly (in my opinion 😄): the people who want to use mlpack directly in a small application in C++, and those who would rather use it to accelerate parts of their workflow in other languages (like R). The bindings shouldn't go away or be marginalized, since they were made specifically to address the continual concern from the old days of "I would love to use mlpack and its algorithms but I can't use it directly in C++". At the same time, it's certainly true that to get the most out of mlpack, it should be used from C++, as the bindings necessarily only expose some of the functionality.

I tried to walk this tightrope by adding a Bindings to other languages bullet point to the mlpack basics section at the top, to make them a bit easier to see visually:

https://www.ratml.org/misc/mlpack-markdown-doc/index.html


Overall I think that maybe these points are worth revisiting more carefully at the time when we integrate this with the website and the repository README, and then we can figure out the best way to make things eye-catching. Happy to discuss anything further; since this is already approved, I'll let it sit for a couple days before merging in case there's more discussion that leads to changes. 👍

Copy link
Contributor

@coatless coatless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few quick comments

citation for
[the paper below](https://joss.theoj.org/papers/10.21105/joss.05026) (given in
BibTeX format):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add the typed-out version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a0d81fc. 👍

library with bindings to other languages. It aims to provide fast, lightweight
implementations of both common and cutting-edge machine learning algorithms.

mlpack's lightweight C++ implementation makes it ideal for deployment, and it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mlpack's lightweight C++ implementation makes it ideal for deployment, and it
mlpack's [lightweight C++ implementation](quickstart/cpp.md) makes it ideal for deployment, and it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it in a slightly different place (closer to the binding quickstart links) in 680fee0 ---thanks for the suggestion. 👍

@coatless
Copy link
Contributor

@rcurtin would you be up for adding a link to a video showing "this is how we build MLPACK from scratch!"

Or, adding in a devcontainer.json setup?

@rcurtin
Copy link
Member Author

rcurtin commented Feb 21, 2024

@rcurtin would you be up for adding a link to a video showing "this is how we build MLPACK from scratch!"

@coatless That's a nice idea! I'd be interested in picking more of your brain about what to capture in a screencast like that, then I could record one. Or, if you want to do it, that works too.

Or, adding in a devcontainer.json setup?

Sure, I think that could be nice (although it may still be painful to develop with mlpack on Windows!).

For now I'll go ahead and merge this once the build is green. It has been a busy February but I am finally now clear enough to get back to the mlpack backlog...

@rcurtin rcurtin mentioned this pull request Feb 21, 2024
@rcurtin rcurtin merged commit 7ee0fbb into mlpack:master Feb 22, 2024
@rcurtin rcurtin deleted the main-docs branch February 22, 2024 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants