Add --sample <n> flag for quick data preview with schema #89

Description

@vmvarela

When exploring an unfamiliar dataset, users typically want to see a few rows alongside the inferred schema. Currently this requires two separate invocations: --columns for the schema, then SELECT * FROM t LIMIT n for the rows. A single --sample <n> flag combines both into one invocation designed for exploration.

Example

$ cat sales.csv | sql-pipe --sample 3
# Schema (5 columns, 42,317 rows estimated):
#   id       INTEGER
#   region   TEXT
#   amount   REAL
#   date     TEXT
#   status   TEXT

id,region,amount,date,status
1,North,1250.00,2024-01-15,paid
2,South,875.50,2024-01-16,pending
3,East,2100.75,2024-01-16,paid

Acceptance Criteria

  • --sample <n> prints a schema comment block to stderr, followed by the first n data rows to stdout as CSV. Open question to decide: default to n=10 when the flag is given without a value, or require an explicit value
  • Schema block lists each column name and its inferred type, prefixed with # so it is ignored by downstream CSV parsers
  • --sample implies --header (column names printed as first CSV row)
  • --sample is mutually exclusive with --json; compatible with --delimiter / --tsv
  • Exits after emitting n rows; it does not need to read the entire input, only the rows buffered for type inference
  • Documented in --help, README.md, and docs/sql-pipe.1.scd
  • Tests: correct number of rows output, schema block present on stderr, early exit confirmed
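The output contract in the criteria above can be sketched in Python to make the expected behaviour concrete. This is only an illustration, not sql-pipe's implementation: `emit_sample` and its parameters are hypothetical names, the column types are assumed to be inferred already, and the row-count estimate shown in the example header is left out.

```python
import csv
import sys
from itertools import islice


def emit_sample(header, types, rows, n, out=sys.stdout, err=sys.stderr, delimiter=","):
    """Write a '#'-prefixed schema block to `err` and the first n rows to `out`.

    `rows` may be any iterator; at most n items are pulled from it, so the
    caller can stop reading input as soon as this function returns.
    """
    width = max(len(name) for name in header)
    print(f"# Schema ({len(header)} columns):", file=err)
    for name, typ in zip(header, types):
        print(f"#   {name.ljust(width)}   {typ}", file=err)

    # --sample implies --header: column names go out as the first CSV row.
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)

    emitted = 0
    for row in islice(rows, n):  # islice stops after n items (early exit)
        writer.writerow(row)
        emitted += 1
    return emitted
```

Taking the streams as parameters keeps the three tested properties (row count, schema on stderr, early exit) checkable with in-memory buffers and a counting generator.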

Notes

  • Type inference still reads up to 100 rows (or n, whichever is larger) before emitting output
  • --sample is a read/explore mode, not a query mode — no SQL query argument required
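The inference window from the first note can be illustrated with a small Python sketch. The type names INTEGER, REAL, and TEXT come from the example above; the casting rules, the helper names `infer_type` and `buffer_for_inference`, and the treatment of empty values are assumptions made for illustration and may differ from what sql-pipe actually does.

```python
import csv
import io
from itertools import islice

INFERENCE_ROWS = 100  # window size taken from the note above


def infer_type(values):
    """Return INTEGER, REAL, or TEXT for one column's sampled values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"
    # Try the narrowest type first; fall through when any value fails to parse.
    for name, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "TEXT"


def buffer_for_inference(reader, n):
    """Read the header plus at most max(n, 100) rows and infer column types.

    Assumes the input has at least a header row.
    """
    header = next(reader)
    rows = list(islice(reader, max(n, INFERENCE_ROWS)))
    if rows:
        types = [infer_type(column) for column in zip(*rows)]
    else:
        types = ["TEXT"] * len(header)
    return header, types, rows


sample_csv = "id,amount\n1,1250.00\n2,875.50\n"
header, types, rows = buffer_for_inference(csv.reader(io.StringIO(sample_csv)), n=3)
print(dict(zip(header, types)))  # {'id': 'INTEGER', 'amount': 'REAL'}
```

Because only the buffered rows are read, the first n of them can be emitted afterwards without touching the rest of the input.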

Dependencies

Depends on #85 (--columns flag must exist so --sample can extend it to show sample rows alongside schema)
