-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add main mlpack documentation framework #3603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I will need to look more in depth, I skimmed all these READMEs and they look good. However, I think we can do some improvements over the index page since it is the landing page and it needs to be very eye catching, currently it is a bit boring in some sense, and mainly sending people to another page. I do believe we can have some a quick starter there and show you can build an ML model in 5 minutes using mlpack or something like that, let us say make the first half page very eye catching so the user would say: Yes this library definitely worth my time. |
|
@shrit I'm not 100% sure that is needed as the "splash" per say is already conveyed via the main homepage. When they visit the reference index, they will likely want to be directed most efficiently to their specialized topic. The "quickstart" example is already featured heavily as one of the first examples in the new index. I think one of the only notes here is to bring forward the |
shrit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume all the removed code is replicated or does exist in ensmallen. In all cases I am not expert in it, so it is your call.
|
Regarding my above comment here is how I imagine a 5 minutes example: First a quick installation guide, like the following: Then copy the following into // This simple program uses the mlpack::neighbor::NeighborSearch object
// to find the nearest neighbor of each point in a dataset using the L1 metric,
// and then print the index of the neighbor and the distance of it to stdout.
#include <mlpack.hpp>
using namespace mlpack;
int main()
{
// Load the data from data.csv (hard-coded). Use CLI for simple command-line
// parameter handling.
arma::mat data("0.339406815,0.843176636,0.472701471; \
0.212587646,0.351174901,0.81056695; \
0.160147626,0.255047893,0.04072469; \
0.564535197,0.943435462,0.597070812");
data = data.t();
// Use templates to specify that we want a NeighborSearch object which uses
// the Manhattan distance.
NeighborSearch<NearestNeighborSort, ManhattanDistance> nn(data);
// Create the object we will store the nearest neighbors in.
arma::Mat<size_t> neighbors;
arma::mat distances; // We need to store the distance too.
// Compute the neighbors.
nn.Search(1, neighbors, distances);
// Write each neighbor and distance using Log.
for (size_t i = 0; i < neighbors.n_elem; ++i)
{
std::cout << "Nearest neighbor of point " << i << " is point "
<< neighbors[i] << " and the distance is " << distances[i] << "." << std::endl;
}
return 0;
}We can take a different example, just pulled this one from the website. Then you compile with the following command: Then run the app as follows: Then we can have either a link to a quick start or explain what we have done up there. For example, @rcurtin I know the above example is not useful but it shows very nice steps on how to include mlpack (pull the header), and compile the program. This is extremely nice and comfortable since 99% percent of people should not run This is also the same process for those who are using Visual Studio which was always a question of how to compile mlpack. For them it is even easier, just pull mlpack headers, copy them to your directory and then click on F5 or whatever the compile button @coatless I am not too fan of adding additional binding to mlpack, and the reason for this is very simple, it is too much of work to maintain them for a few specific sets of users. I love the CLI binding they are nice and each time we have a new method we are adding one of these. However, when it comes to Python, it is very rare to use this binding, it is better to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second approval provided automatically after 24 hours. 👍
|
Sorry it took me so long to get back to this.
@shrit I don't want to teach people that this is how to use the library. That doesn't check for dependencies and is a recipe for confused people with compilation problems reporting issues that take our time up to resolve ("can't find ensmallen.hpp", "can't find ", "couldn't find -lopenblas" and on and on and on). We already have a build guide in the README and I'd like to stick to that. (Totally open to changing around how it looks.)
It's a bit of an aside, but this is definitely not the case: 95% of the work to get mlpack working in Visual Studio is getting the dependencies in place. That means a bunch of headers for Armadillo, ensmallen, and cereal, and then also the compiled OpenBLAS library to link against. There's no way that you can just "copy and go"; that's why we have a handful of Windows compilation tutorials and examples.
I don't know if the homepage of the docs is the right place to do that. Maybe it is, maybe not, I'm not sure. We already have quickstarts though, and we have to be careful because there are quickstarts for every language: That can be cleaned up separately and made more eye-catching.
@coatless @shrit there is a difficult line to balance here: we have two types of users and each of you represent each of those types perfectly (in my opinion 😄): the people who want to use mlpack directly in a small application in C++, and those who would rather use it to accelerate parts of their workflow in other languages (like R). The bindings shouldn't go away or be marginalized, since they were made specifically to address the continual concern from the old days of "I would love to use mlpack and its algorithms but I can't use it directly in C++". At the same time, it's certainly true that to get the most out of mlpack, it should be used from C++, as the bindings necessarily only expose some of the functionality. I tried to walk this tightrope by adding a Overall I think that maybe these points are worth revisiting more carefully at the time when we integrate this with the website and the repository README, and then we can figure out the best way to make things eye-catching. Happy to discuss anything further; since this is already approved, I'll let it sit for a couple days before merging in case there's more discussion that leads to changes. 👍 |
coatless
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few quick comments
| citation for | ||
| [the paper below](https://joss.theoj.org/papers/10.21105/joss.05026) (given in | ||
| BibTeX format): | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also add the typed-out version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a0d81fc. 👍
| library with bindings to other languages. It aims to provide fast, lightweight | ||
| implementations of both common and cutting-edge machine learning algorithms. | ||
|
|
||
| mlpack's lightweight C++ implementation makes it ideal for deployment, and it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| mlpack's lightweight C++ implementation makes it ideal for deployment, and it | |
| mlpack's [lightweight C++ implementation](quickstart/cpp.md) makes it ideal for deployment, and it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it in a slightly different place (closer to the binding quickstart links) in 680fee0 ---thanks for the suggestion. 👍
|
@rcurtin would you be up for adding a link to a video showing "this is how we build MLPACK from scratch!" Or, adding in a |
Co-authored-by: James J Balamuta <[email protected]>
Co-authored-by: James J Balamuta <[email protected]>
@coatless That's a nice idea! I'd be interested in picking more of your brain about what to capture in a screencast like that, then I could record one. Or, if you want to do it, that works too.
Sure, I think that could be nice (although it may still be painful to develop with mlpack on Windows!). For now I'll go ahead and merge this once the build is green. It has been a busy February but I am finally now clear enough to get back to the mlpack backlog... |
I stopped documenting individual mlpack methods so that we could have a framework that could be deployed immediately (or soon!) that we could then add documentation to little by little.
The basic idea is this:
math::Range,DiscreteDistribution,GaussianDistribution,LMetric, andGaussianKernel; of course there is a lot more too!None of the links work yet. I think the next PR I work on will be a build system that will build all of the documentation and also test all the code snippets, and at that time I'll fix all the links. For now the thing to review is just the structure of the pages, the text, and the examples.
All of the code in these examples compiles and runs and works (I tested manually, but like I said above I do want to automate that and think it should not be too hard).
@shrit @zoq @coatless @conradsnicta (and anyone else who is interested!) I'd love to get your feedback on whatever you have time to take a look at. The more eyes, the better 👍