[CLI] Fix datasets list table rendering #4157

Merged
hanouticelina merged 2 commits into main from codex/fix-datasets-ls-table-rendering
Apr 27, 2026

Contributor

@hanouticelina hanouticelina commented Apr 27, 2026

This PR fixes the hf datasets ls table rendering when dataset descriptions contain newlines.

Root cause: the table formatter converted cell values to strings and then padded/truncated them without normalizing embedded newlines and indentation. Dataset descriptions often include Markdown with leading newlines, tabs, and spacing, so a single DESCRIPTION cell could spill onto multiple terminal lines and shift the remaining columns.
The fix keeps the compact hf datasets ls columns, including DESCRIPTION, and normalizes table cells to one line for human/agent table output before truncation. JSON output still returns the original values.
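The ordering described above (normalize first, then truncate, then pad) can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names `single_line`, `truncate`, and `format_cell`, the regex-based whitespace collapsing, and the width of 35 are all assumptions made for the example.

```python
import re


def single_line(value: object) -> str:
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    return re.sub(r"\s+", " ", str(value)).strip()


def truncate(text: str, width: int) -> str:
    """Cut text to at most `width` characters, ending in '...' when shortened."""
    if len(text) <= width:
        return text
    return text[: width - 3] + "..."


def format_cell(value: object, width: int) -> str:
    # Hypothetical helper: normalize before truncating, so the character
    # budget is spent on visible text rather than on leading newlines or
    # indentation, then pad to the column width.
    return truncate(single_line(value), width).ljust(width)


# A description shaped like the Markdown-heavy ones shown in this PR.
description = "\n\n\n\t\tDataset Card for GSM8K\n\tDataset Summary\n"
print(repr(format_cell(description, 35)))
```

Truncating before normalizing would keep the leading newlines inside the cell, which is the kind of spill shown in the "Before" output.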

Before:

> hf datasets ls --filter benchmark:official
ID                            CREATED_AT DESCRIPTION                         DOWNLOADS GATED LIKES PRIVATE TRENDING_SCORE
----------------------------- ---------- ----------------------------------- --------- ----- ----- ------- --------------
hf-audio/open-asr-leaderboard 2024-06-21 


                ESB Test Sets: Parquet &... 19665           23            13            
SWE-bench/SWE-bench_Verified  2025-04-29 Dataset Summary
SWE-bench Verifi... 101465          44            13            
openai/gsm8k                  2022-04-12 


                Dataset Card for GSM8K
        ... 824734          1276          12            
llamaindex/ParseBench         2026-04-09 


                ParseBench



Quick lin... 17145           73            12            
cais/hle                      2025-01-23 


[!NOTE]
IMPORTANT: Please hel... 49653     auto  787           8             
Idavidrein/gpqa               2023-11-27 


                Dataset Card for GPQA

... 101058    auto  422           6                     

After:

> hf datasets ls --filter benchmark:official
ID                            CREATED_AT DESCRIPTION                         DOWNLOADS GATED LIKES PRIVATE TRENDING_SCORE
----------------------------- ---------- ----------------------------------- --------- ----- ----- ------- --------------
hf-audio/open-asr-leaderboard 2024-06-21 ESB Test Sets: Parquet & Sorted ... 19665           23            13            
SWE-bench/SWE-bench_Verified  2025-04-29 Dataset Summary SWE-bench Verifi... 101465          44            13            
openai/gsm8k                  2022-04-12 Dataset Card for GSM8K Dataset S... 824734          1276          12            
llamaindex/ParseBench         2026-04-09 ParseBench Quick links: [🌐 Websi... 17145           73            12            
cais/hle                      2025-01-23 [!NOTE] IMPORTANT: Please help u... 49653     auto  787           8             
Idavidrein/gpqa               2023-11-27 Dataset Card for GPQA GPQA is a ... 101058    auto  422           6             

Note

Medium Risk
Touches shared CLI table formatting for both human and agent outputs, which could subtly change whitespace in existing command outputs and affect scripts that rely on exact formatting.

Overview
Fixes CLI table rendering when cell values contain newlines/tabs/extra indentation by normalizing all table cell values to a single line before truncation/printing (via new _single_line() used across human and agent table formatters).

Updates hf datasets ls to pass an explicit, ordered header list so the table consistently shows the intended compact columns (including description) instead of relying on auto-detected keys.

Reviewed by Cursor Bugbot for commit a394de2.

@hanouticelina hanouticelina marked this pull request as ready for review April 27, 2026 16:21
@hanouticelina hanouticelina requested a review from Wauplin April 27, 2026 16:21
@bot-ci-comment

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread src/huggingface_hub/cli/datasets.py Outdated
out.table(
    results,
    headers=["id", "created_at", "description", "downloads", "gated", "likes", "private", "trending_score"],
)

Hardcoded headers hide user-requested expand properties

Medium Severity

The datasets list command's hardcoded table headers cause two issues: --expand properties (like tags or siblings) are fetched but not displayed, and unrequested properties appear as empty columns, making output noisy. Previously, headers were auto-detected, showing all relevant data.
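The difference between a fixed header list and auto-detected headers can be illustrated with a small sketch. It is not the detection logic used by `huggingface_hub`: `detect_headers`, the sample rows, and the rule of skipping `None` values are assumptions chosen only to show the effect described in this comment.

```python
def detect_headers(rows: list[dict]) -> list[str]:
    """Collect keys in first-seen order, skipping keys whose value is None."""
    headers: list[str] = []
    for row in rows:
        for key, value in row.items():
            if value is not None and key not in headers:
                headers.append(key)
    return headers


# Hypothetical rows, as if the user had asked to expand `tags`.
rows = [
    {"id": "openai/gsm8k", "downloads": 824734, "tags": ["math"], "gated": None},
    {"id": "cais/hle", "downloads": 49653, "tags": ["benchmark"], "gated": "auto"},
]

# A fixed list drops `tags` and adds a column that no row can fill.
hardcoded = ["id", "created_at", "downloads"]
hidden = [key for key in detect_headers(rows) if key not in hardcoded]
empty = [key for key in hardcoded if all(row.get(key) is None for row in rows)]

print(detect_headers(rows))
print("hidden by hardcoded headers:", hidden)
print("always-empty columns:", empty)
```

With these sample rows, the fixed list hides `tags` and `gated` and leaves `created_at` blank in every row, which mirrors the two issues reported here.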

Contributor

Likely an unintended change @hanouticelina?

Contributor Author

yes, not intended at all 🙈

Contributor

@Wauplin Wauplin left a comment

Makes sense! Approving once the comment is addressed (just remove the line?)

@hanouticelina hanouticelina merged commit 486cbe6 into main Apr 27, 2026
18 of 20 checks passed
@hanouticelina hanouticelina deleted the codex/fix-datasets-ls-table-rendering branch April 27, 2026 17:17
@huggingface-hub-bot
Contributor

This PR has been shipped as part of the v1.13.0 release.
