Add short cut for subsetting along the minor axis by Intron7 · Pull Request #8468 · cupy/cupy

Intron7 · 2024-08-08T12:49:10Z

This addresses #8452

I added a shortcut for subsetting along the minor axis. It is used when idx is sorted, or when X.indices are not sorted to begin with.

asi1024 · 2024-08-09T07:13:38Z

+        idx_is_sorted = cupy.all(idx[:-1] <= idx[1:])
+        if not self.has_sorted_indices or idx_is_sorted:


idx_is_sorted is a cupy.ndarray, so evaluating it in the if statement requires a DtoH synchronization. In line with CuPy's policy, we provide as many features as possible while minimizing the amount of DtoH synchronization.
How about sorting idx without checking it is already sorted?

I feel like sorting the index would create problems for the user, because some people rely on subsetting to sort their array.

How about copying idx?

I'm sorry, but I don't understand why copying would help with the issue. I added this:

idx_is_sorted = cupy.all(idx[:-1] <= idx[1:])
if not self.has_sorted_indices or idx_is_sorted:

because I want the user to get sorted X.indices if X.has_sorted_indices. If we don't care about that, we could leave this out and just run with the shortcut in every case. However, calling X.sort_indices() is more expensive than the old solution (which still has one DtoH that I save).
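The branch discussed here can be sketched on the host in plain Python. This is only an illustration of the decision logic: `is_sorted` and `use_shortcut` are hypothetical names, and `idx` is a Python list rather than a `cupy.ndarray`, so nothing below touches the GPU. In the PR the comparison runs on the device, which is why reading its result in an `if` costs a DtoH synchronization.

```python
def is_sorted(idx):
    """Return True if idx is non-decreasing (same test as idx[:-1] <= idx[1:])."""
    return all(a <= b for a, b in zip(idx, idx[1:]))


def use_shortcut(has_sorted_indices, idx):
    """Mirror the condition in the diff (hypothetical helper, host-side only).

    The shortcut is taken when the input matrix is already unsorted, or when
    idx is sorted so that a sorted input can yield a sorted output.
    """
    return (not has_sorted_indices) or is_sorted(idx)
```

With a sorted input matrix and an unsorted `idx`, `use_shortcut` returns `False`, which corresponds to falling back to the old implementation.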

Hmm, I would prefer not to perform additional computation on the GPU to decide whether to use a shortcut path or not. Could you modify this so that the shortcut is taken only when the branch can be decided by simple calculations on the host side?

How about copying idx?

I'm sorry, but I don't understand why copying would help with the issue.

I think this was in response to your earlier comment

I feel like sorting the index would create problems for the user.

My take is that @asi1024 suggested to copy idx and sort the copy, instead of sorting idx in-place (which could cause problems as you pointed out). After sorting, you don't need to compute idx_is_sorted and copy the boolean back to host, just proceed to use the sorted idx.

Let me clarify the intent and functionality of the current implementation because I feel like we are misaligned:

Unsorted Indices and Canonical Format:
My current implementation explicitly supports subsetting with an unsorted index array. In other words, it correctly returns columns exactly in the order specified by the user, even if the provided indices are unsorted. However, in such cases, the resulting CSX matrix will no longer be in canonical format (sorted column indices for CSR).

Why Sorting Doesn't Always Apply:
Sorting the index will still not help with the creation of a canonical CSX matrix. Even an unsorted index results in the correct order of columns in a CSR matrix, just not in canonical format. I think if a user subsets a canonical matrix, they want a canonical matrix out.

Reason for the Existing Check:
The only reason the current check (idx_is_sorted) exists is to preserve performance in cases where users subset from a canonical CSR matrix using an unsorted index. In that case the old implementation is faster than using my implementation and then calling sort_indices() afterwards.
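The three points above can be illustrated with a small pure-Python sketch. It is not the PR's implementation: `subset_minor` and `is_canonical` are hypothetical helpers that work on plain lists, and "canonical" is taken here to mean strictly increasing column indices within each row.

```python
def subset_minor(indptr, indices, data, idx):
    """Select columns idx (in that order) from a CSR matrix given as lists."""
    # Map each source column to the output positions that request it.
    positions = {}
    for out_col, src_col in enumerate(idx):
        positions.setdefault(src_col, []).append(out_col)

    new_indptr, new_indices, new_data = [0], [], []
    for row in range(len(indptr) - 1):
        # Walk the stored entries of this row in their existing order.
        for k in range(indptr[row], indptr[row + 1]):
            for out_col in positions.get(indices[k], ()):
                new_indices.append(out_col)
                new_data.append(data[k])
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices, new_data


def is_canonical(indptr, indices):
    """True if column indices are strictly increasing within every row."""
    return all(
        indices[k] < indices[k + 1]
        for row in range(len(indptr) - 1)
        for k in range(indptr[row], indptr[row + 1] - 1)
    )


# 2 x 3 matrix [[10, 0, 20], [0, 30, 40]] in canonical CSR form.
indptr, indices, data = [0, 2, 4], [0, 2, 1, 2], [10, 20, 30, 40]

sorted_out = subset_minor(indptr, indices, data, [0, 2])
unsorted_out = subset_minor(indptr, indices, data, [2, 0])
```

Both results describe the requested columns in the requested order, but only `sorted_out` passes `is_canonical`; `unsorted_out` stores the first row's column indices as `[1, 0]`.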

I've been thinking about this, and what Severin said makes sense to me. @asi1024 could you share any implementation opportunity that you see in achieving what you suggested (simple shortcut based only on host logics)?

Intron7 · 2024-08-09T10:14:16Z

Benchmarks on an RTX 3090:

X.shape = (200,000, 27,998)
X_new.shape = (200,000, 19,214)
Old implementation = 500 ms
New implementation = 120 ms

Additionally, the shortcut is significantly more memory efficient. With 400,000 rows, the old implementation results in a memory error, while the new version completes without an issue.

Intron7 · 2025-02-22T22:47:18Z

@leofang As mentioned, this PR is important for the single-cell community: the old implementation is a significant bottleneck and is responsible for many OOM errors due to the CSR to CSC conversion.

leofang · 2025-04-18T06:02:48Z

+        idx_is_sorted = cupy.all(idx[:-1] <= idx[1:])
+        if not self.has_sorted_indices or idx_is_sorted:


I've been thinking about this, and what Severin said makes sense to me. @asi1024 could you share any implementation opportunity that you see in achieving what you suggested (simple shortcut based only on host logics)?

Intron7 · 2025-04-28T14:57:36Z

@leofang @asi1024

After extensive testing and reviewing SciPy’s implementation, I found that my earlier version failed to handle duplicate indices such as [5, 5, 3, 2, 1, 1]. SciPy simply slices the axis without enforcing canonical CSR order, so unsorted input indices produce unsorted output rows. I have now re-implemented SciPy’s approach for CuPy with custom kernels that correctly support duplicates, eliminate the costly post-sort, run significantly faster, and use less memory. Let me know whether you’d prefer me to update the existing pull request with this implementation or create a new one.
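A count-based, two-pass selection that tolerates duplicate and unsorted indices could look roughly like the following plain-Python sketch. It is an assumption about the general shape of such an approach, not the custom kernels from this PR and not SciPy's actual code; `csr_column_index` and its list-based signature are hypothetical.

```python
from itertools import accumulate


def csr_column_index(n_col, indptr, indices, data, idx):
    """Two-pass minor-axis selection for a CSR matrix stored in plain lists."""
    # Pass 1: count how often each source column is requested, then size rows.
    counts = [0] * n_col
    for col in idx:
        counts[col] += 1
    new_indptr = [0]
    for row in range(len(indptr) - 1):
        nnz = sum(counts[indices[k]] for k in range(indptr[row], indptr[row + 1]))
        new_indptr.append(new_indptr[-1] + nnz)

    # order[offsets[c]:offsets[c + 1]] lists the output columns that map to c.
    order = sorted(range(len(idx)), key=lambda p: idx[p])  # stable argsort
    offsets = [0] + list(accumulate(counts))

    # Pass 2: emit one output entry per (stored value, requesting position).
    new_indices, new_data = [], []
    for k, col in enumerate(indices):
        for out_col in order[offsets[col]:offsets[col + 1]]:
            new_indices.append(out_col)
            new_data.append(data[k])
    return new_indptr, new_indices, new_data


# 2 x 6 matrix: row 0 = [0, 11, 12, 13, 0, 15], row 1 = [20, 0, 0, 0, 0, 0].
out = csr_column_index(
    6, [0, 4, 5], [1, 2, 3, 5, 0], [11, 12, 13, 15, 20], [5, 5, 3, 2, 1, 1]
)
```

No sort of the output is needed afterwards: each stored value is written once per position that requests it, so duplicates in `idx` simply produce repeated entries, and the output column indices within a row are left unsorted.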

leofang · 2025-04-28T15:05:59Z

If it is still in-scope for the original intent, feel free to just reuse the same PR. TBH either way is fine.

Intron7 · 2025-04-29T11:48:53Z

@leofang @asi1024 Thank you for your patience with this one. All cupyx.scipy.sparse tests are now passing on my system, and I’ve tried to keep the kernel formatting as close as possible to CuPy's.

leofang · 2025-06-24T22:07:13Z

/test mini

leofang · 2025-06-30T17:04:52Z

@asi1024 would it be possible for you to take another look? All tests have passed. Thanks! 🙂

asi1024 · 2025-07-02T03:03:26Z

+                _scalar.get_typename(self.data.dtype),
+            )
+            fillB = self._fill_B.get_function(ker_name)
+        threads = 32


Different thread counts are specified for calc_Bp_minor and fillB. Is this intentional?

Yes, that is intentional. It follows from how _fill_B and _fill_B_complex use the shared row to update the nnz per row.

asi1024 · 2025-07-02T11:10:01Z

@Intron7 @leofang LGTM! Thank you for your contributions!

Intron7 and others added 3 commits August 7, 2024 17:22

update subsetting

fa0049d

address sorting

a37b4fe

Merge branch 'main' into update-minor-slice

c7a7e7d

kmaehashi assigned asi1024 Aug 9, 2024

kmaehashi added cat:performance Performance in terms of speed or memory consumption prio:medium labels Aug 9, 2024

asi1024 reviewed Aug 9, 2024

Intron7 and others added 4 commits August 14, 2024 09:20

pre-commit fixes

a41c1c1

Merge branch 'main' into update-minor-slice

0100b5d

Merge branch 'main' into update-minor-slice

369bb64

Merge branch 'main' into update-minor-slice

0dfafa3

leofang reviewed Apr 18, 2025

Intron7 and others added 4 commits April 29, 2025 09:21

Merge branch 'main' into update-minor-slice

ac0ca8b

base reimplement

bb32a9b

fix complex

2d83dcf

forget pre-commit

22fc718

Intron7 requested review from asi1024 and leofang April 29, 2025 11:45

Intron7 added 2 commits April 29, 2025 13:53

remove print

d9f2d1c

deduplicate

7489ec7

asi1024 reviewed Jul 2, 2025

asi1024 added this to the v14.0.0a2 milestone Jul 2, 2025

asi1024 approved these changes Jul 2, 2025

asi1024 merged commit 896f4ca into cupy:main Jul 2, 2025
70 checks passed

Intron7 deleted the update-minor-slice branch July 3, 2025 07:23
