Skip to content

Conversation

@rcurtin
Copy link
Member

@rcurtin rcurtin commented Nov 23, 2024

@shrit noticed that the mlpack_knn binding would produce strange output when loading data:

$ ./bin/mlpack_knn -r ~/datasets/corel.csv -q ~/datasets/corel.csv -k 3 -v
[INFO ] Using reference data from Loading '/home/ryan/datasets/corel.csv' as CSV data.  Size is 32 x 37749.
[INFO ] '/home/ryan/datasets/corel.csv' (37749x32 matrix).
[INFO ] Building reference tree...
[INFO ] Tree built.
[INFO ] Using query data from Loading '/home/ryan/datasets/corel.csv' as CSV data.  Size is 32 x 37749.
[INFO ] '/home/ryan/datasets/corel.csv' (37749x32 matrix).
[INFO ] Searching for 3 neighbors with dual-tree kd-tree search...

You can see from the output that the first message is actually two messages being printed at the same time.

I dug into this and it is a strange artifact of how the command-line bindings work---the problem does not show up for other binding types. The issue is this:

  • For command-line bindings, we call data::Load() the first time a parameter is accessed (e.g. params.Get<arma::mat>("matrix") or params.GetPrintable<arma::mat>("matrix")), and that call to data::Load() will produce output.
  • But in some bindings, the call to GetPrintable<> is used to print a status message.
  • If GetPrintable<> is called on a matrix parameter before the data has been loaded, then the data::Load() message will be printed while that GetPrintable<> call happens... leading to the duplicate message.

So, I found all bindings where this situation happens, and where possible I changed things around so that GetPrintable<> was not the first call that would trigger the load of a matrix. So long as the first call is something like params.Get<arma::mat>("matrix"), then the load happens separately and there is no duplicate output.

After this PR the output of the program command above is:

[INFO ] Loading '/home/ryan/datasets/corel.csv' as CSV data.  Size is 32 x 37749.
[INFO ] Using reference data from '/home/ryan/datasets/corel.csv' (37749x32 matrix).
[INFO ] Building reference tree...
[INFO ] Tree built.
[INFO ] Loading '/home/ryan/datasets/corel.csv' as CSV data.  Size is 32 x 37749.
[INFO ] Using query data from '/home/ryan/datasets/corel.csv' (37749x32 matrix).
[INFO ] Searching for 3 neighbors with dual-tree kd-tree search...

which is what we would hope for.

@shrit
Copy link
Member

shrit commented Nov 27, 2024

@rcurtin it seems that the std::move is not working correctly on this one

@rcurtin
Copy link
Member Author

rcurtin commented Nov 27, 2024

Thanks for pointing that out---I hadn't gotten back to this PR yet. Should be fixed now, let's see what CI says...

Copy link
Member

@shrit shrit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, let us see if the tests pass, the radical test failure should not be related. I have restarted the CI

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

@shrit shrit merged commit f35711b into mlpack:master Dec 6, 2024
19 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants