[Repo Assist] Add Frame.distinctRowsBy to remove duplicate rows by column values by github-actions[bot] · Pull Request #596 · fslaborg/Deedle

github-actions · 2026-03-09T01:14:59Z

🤖 This PR was created by Repo Assist, an automated AI assistant.

Summary

Adds Frame.distinctRowsBy, a new frame-transformation function that retains only the first row (by index order) for each unique combination of values in the specified columns.

This is analogous to SQL SELECT DISTINCT col1, col2 FROM table, except that every column of the retained row is kept rather than only the key columns. The pattern comes up regularly (see #558).

Example

// Given a frame with duplicate rows:
//   A  B   C
// 0 x  1.0 10
// 1 y  2.0 20
// 2 x  1.0 30   ← duplicate of row 0 on (A, B)
// 3 y  2.0 40   ← duplicate of row 1 on (A, B)

df |> Frame.distinctRowsBy ["A"; "B"]
//   A  B   C
// 0 x  1.0 10
// 1 y  2.0 20

C#:

df.DistinctRowsBy("A", "B")

Implementation

distinctRowsBy is built on top of the existing filterRows primitive. For each row, it computes a key as an obj list of the requested column values. F# lists have structural equality and a matching GetHashCode(), so a HashSet<obj list> correctly deduplicates any combination of standard .NET types with value-based equality (strings, ints, floats, DateTime, etc.).
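The keying idea described above can be sketched without Deedle. This is a minimal illustration, not the code from this PR: the `Row` type (a map from column name to boxed value) and the list-based `distinctRowsBy` are hypothetical stand-ins for a frame and for the filterRows-based implementation.

```fsharp
open System.Collections.Generic

// Hypothetical stand-in for a frame row: column name -> boxed value.
type Row = Map<string, obj>

/// Keeps the first row for each distinct combination of values in `columns`.
let distinctRowsBy (columns: string list) (rows: Row list) : Row list =
    // F# lists compare structurally, so the default comparer of
    // HashSet<obj list> compares keys element by element.
    let seen = HashSet<obj list>()
    rows
    |> List.filter (fun row ->
        let key = columns |> List.map (fun c -> row.[c])
        // HashSet.Add returns false when an equal key is already present,
        // so only the first row for each key passes the filter.
        seen.Add key)

let rows : Row list =
    [ Map.ofList [ "A", box "x"; "B", box 1.0; "C", box 10 ]
      Map.ofList [ "A", box "y"; "B", box 2.0; "C", box 20 ]
      Map.ofList [ "A", box "x"; "B", box 1.0; "C", box 30 ]
      Map.ofList [ "A", box "y"; "B", box 2.0; "C", box 40 ] ]

// Mirrors the example in the PR description: the rows with C = 10 and C = 20 remain.
let result = distinctRowsBy [ "A"; "B" ] rows
```

Because `List.filter` visits rows in order, the first occurrence of each key is the one that survives, which matches the "first row by index order" behaviour stated in the summary.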

A C#-friendly [<Extension>] DistinctRowsBy(frame, [<ParamArray>] columns) overload is added to FrameExtensions.

Test Status

src/Deedle/Deedle.fsproj — builds without errors ✅
tests/Deedle.Tests/Deedle.Tests.fsproj — compiles without errors ✅
Unit tests cannot be executed locally (requires .NET 5, environment has .NET 8+); three new tests cover partial deduplication, no-op, and full deduplication. Please verify via CI.

Trade-offs

Uses obj list key rather than a typed tuple — keeps the implementation simple and avoids arity limits, at the cost of boxing value types per row per call. For large frames with many unique columns, a custom IEqualityComparer could improve performance, but this is sufficient for typical use.
Missing values are treated as null in the key (two rows with missing values in the same columns are considered equal).
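To make the two trade-offs concrete, here is a rough sketch of what a custom IEqualityComparer over boxed key arrays might look like, with missing values mapped to null. Everything here (`keyComparer`, `toKey`, and the use of `obj option` to model a missing value) is hypothetical and not taken from the PR. It still boxes each value and only illustrates the shape of the alternative.

```fsharp
open System
open System.Collections.Generic

// Hypothetical comparer over arrays of boxed key values; not part of the PR.
let keyComparer =
    { new IEqualityComparer<obj[]> with
        member _.Equals(a, b) =
            // Object.Equals(x, y) is null-safe, so two missing (null) slots compare equal.
            a.Length = b.Length
            && Array.forall2 (fun x y -> Object.Equals(x, y)) a b
        member _.GetHashCode(key) =
            // Combine element hashes; a null (missing) element contributes 0.
            key
            |> Array.fold (fun h v -> h * 31 + (if isNull v then 0 else v.GetHashCode())) 17 }

// Assumption: a missing value is modelled as None and becomes null in the key.
let toKey (values: obj option list) : obj[] =
    values |> List.map (Option.defaultValue null) |> Array.ofList

let seen = HashSet<obj[]>(keyComparer)
let first = seen.Add(toKey [ Some (box "x"); None ])   // new key
let second = seen.Add(toKey [ Some (box "x"); None ])  // same value, same missing slot
let third = seen.Add(toKey [ Some (box "y"); None ])   // different value
```

Under these assumptions `first` and `third` are true and `second` is false, so two rows that are missing a value in the same key column count as duplicates, as described in the trade-offs.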

Generated by Repo Assist

To install this agentic workflow, run
gh aw add githubnext/agentics/workflows/repo-assist.md@30f2254f2a7a944da1224df45d181a3f8faefd0d

Adds a new Frame.distinctRowsBy function that retains only the first row for each unique combination of values in the specified columns. This is analogous to SQL 'SELECT DISTINCT col1, col2' — a commonly requested feature (see issue #558). Also adds a C#-friendly DistinctRowsBy extension method that accepts a params array of column keys. Includes three unit tests for: partial deduplication, no-op on already-distinct data, and full deduplication. Co-authored-by: Copilot <[email protected]>

…8e1a5bf1c7b3f41

dsyme · 2026-03-09T13:09:12Z

/repo-assist update the added xmldoc comment to be proper xmldoc style

Use <summary>, <param>, and <category> tags matching the existing xmldoc conventions in FrameExtensions.fs. Co-authored-by: Copilot <[email protected]>

github-actions · 2026-03-09T13:14:00Z

Commit pushed: 1184e56

Generated by Repo Assist

github-actions Bot added the automation and repo-assist labels Mar 9, 2026

github-actions Bot and others added 2 commits March 9, 2026 01:15

ci: trigger checks

0086a59

Merge branch 'master' into repo-assist/improve-frame-distinctrowsby-3…

74a7e3e

…8e1a5bf1c7b3f41

This was referenced Mar 9, 2026

[Repo Assist] Monthly Activity 2026-03 #584

Closed

Suggestion to distinct rows by specified columns #558

Closed

dsyme added 3 commits March 9, 2026 03:23

Merge branch 'master' into repo-assist/improve-frame-distinctrowsby-3…

86ad724

…8e1a5bf1c7b3f41

Merge branch 'master' into repo-assist/improve-frame-distinctrowsby-3…

b8cfed8

…8e1a5bf1c7b3f41

Merge branch 'master' into repo-assist/improve-frame-distinctrowsby-3…

0234907

…8e1a5bf1c7b3f41

Update DistinctRowsBy xmldoc to proper xmldoc style

1184e56

Use <summary>, <param>, and <category> tags matching the existing xmldoc conventions in FrameExtensions.fs. Co-authored-by: Copilot <[email protected]>

ci: trigger checks

6ad1765

dsyme marked this pull request as ready for review March 12, 2026 02:16

dsyme merged commit 9696a61 into master Mar 12, 2026
2 checks passed

dsyme deleted the repo-assist/improve-frame-distinctrowsby-38e1a5bf1c7b3f41 branch March 12, 2026 02:16

dsyme merged 8 commits into master from repo-assist/improve-frame-distinctrowsby-38e1a5bf1c7b3f41
