Skip to content

Adding correlation distance metric in oneDAL primitives#3059

Merged
richardnorth3 merged 38 commits intouxlfoundation:mainfrom
richardnorth3:dev/correlation-metric
Mar 6, 2025
Merged

Adding correlation distance metric in oneDAL primitives#3059
richardnorth3 merged 38 commits intouxlfoundation:mainfrom
richardnorth3:dev/correlation-metric

Conversation

@richardnorth3
Copy link
Copy Markdown
Contributor

@richardnorth3 richardnorth3 commented Feb 3, 2025

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

@icfaust
Copy link
Copy Markdown
Contributor

icfaust commented Feb 3, 2025

inline sycl::event reduce_by_columns(sycl::queue& q,

https://github.com/uxlfoundation/oneDAL/blob/main/cpp/oneapi/dal/backend/primitives/stat/cov.hpp#L34
should help you in calculating the means for each row (column_wise reduction to generate the sum, then calculate the mean using the second function)

this will leverage the GPUs capabilities

ndview<Float, 2>& out,
const event_vector& deps = {}) const {
const std::int64_t n = u.get_dimension(0);
auto u_sum = ndarray<Float, 1>::empty({ 1 });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up @richardnorth3 ! https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.correlation.html should be a good reference for the means necessary. The mean doesn't need to be a single value, but a vector the size of a single sample with each being the mean value for each feature (i.e. for each row). This will then be a vector subtraction. Let me know if I am misunderstanding things about your implementation. Good progress.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey sorry about that, the definition is the mean of the 1d array, so my recommendations have been a little bit wrong.

@richardnorth3
Copy link
Copy Markdown
Contributor Author

richardnorth3 commented Feb 24, 2025

https://github.com/richardnorth3/oneDAL/blob/9b09822dd84b8eb2ee555e08b4576c8ce7d9c0e2/cpp/oneapi/dal/backend/primitives/distance/correlation_distance_dpc.cpp#L50-L95

After heavily mediating over numerous drafts and codebase reviews, I ultimately came back to your original advice @icfaust after an epiphany. This version of the correlation distance reuses most of the cosine distance code but calculates the deviations first, passes those values to get_inversed_norms(), and proceeds with the same functionality as the cosine distance computation.

@richardnorth3 richardnorth3 added dpc++ Issue/PR related to DPC++ functionality oneAPI Issue/PR related to oneAPI part labels Mar 3, 2025
@Vika-F
Copy link
Copy Markdown
Contributor

Vika-F commented Mar 5, 2025

@Mergifyio rebase

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Mar 5, 2025

rebase

☑️ Nothing to do

Details
  • -conflict [📌 rebase requirement]
  • -closed [📌 rebase requirement]
  • queue-position = -1 [📌 rebase requirement]
  • any of:
    • #commits > 1 [📌 rebase requirement]
    • #commits-behind > 0 [📌 rebase requirement]
    • -linear-history [📌 rebase requirement]

@richardnorth3 richardnorth3 force-pushed the dev/correlation-metric branch from 4658d8f to 7c877d4 Compare March 5, 2025 16:16
Copy link
Copy Markdown
Contributor

@Alexandr-Solovev Alexandr-Solovev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks good to me! Well done! I have a few minor questions and suggestions for now

@Alexandr-Solovev
Copy link
Copy Markdown
Contributor

/intelci: run

Signed-off-by: richard.north.iii <[email protected]>
Signed-off-by: richard.north.iii <[email protected]>
@richardnorth3
Copy link
Copy Markdown
Contributor Author

/intelci: run

@Vika-F
Copy link
Copy Markdown
Contributor

Vika-F commented Mar 6, 2025

/intelci: run

Copy link
Copy Markdown
Contributor

@Alexandr-Solovev Alexandr-Solovev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The private ci in the commit before apply clang format is green! Well done!

Copy link
Copy Markdown
Contributor

@Vika-F Vika-F left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure in latest CI is not related to the changes. LGTM.

@richardnorth3 richardnorth3 merged commit 2c316ac into uxlfoundation:main Mar 6, 2025
17 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dpc++ Issue/PR related to DPC++ functionality oneAPI Issue/PR related to oneAPI part

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants