collect more statistics about the data

data.table could collect more statistics about the data while processing it. This would enable optimizations, and not only in internal data.table code: users could use the statistics to speed up their own code and design more data-driven functions.
List of measures to collect:
- [x] is sorted: `haskey(x)`
- [x] has index: `!is.null(idx<-attr(attr(x, "index"), idx_name))`
- [x] has NA / `anyNA`
- [x] has NaN
- [x] number of groups (uniqueN): `length(attr(idx, "starts"))`
- [x] size of biggest group: `attr(idx, "maxgrpn")`
- [x] is unique (uniqueN == .N): `attr(idx, "maxgrpn")==1L`
- [x] range (min, max): `x[c(idx[1L], idx[length(idx)])]`
- [x] all NA: `{{hasna}} && length(attr(idx, "starts"))==1L`
- [x] is ascii
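
As a rough illustration of how several of the measures above can be read today, here is a minimal sketch. It assumes the grouping information comes from `data.table:::forderv(..., retGrp = TRUE)`, which is an internal, unexported function whose behavior may change between versions. The table `DT`, its columns `g` and `v`, and the helper variable names are made up for the example.

```r
library(data.table)

# Hypothetical example data: one grouping column and one value column.
DT <- data.table(g = c("b", "a", "b", "c", "a", "b"),
                 v = c(3, 1, NA, 2, 5, 4))

# is sorted / has NA, using exported functions only
is_sorted <- haskey(DT)
has_na    <- anyNA(DT$v)

# has index: indices live as "__<col>" attributes of the "index" attribute
setindex(DT, g)
has_index <- !is.null(attr(attr(DT, "index"), "__g"))

# Internal function (assumption: not part of the public API);
# retGrp = TRUE attaches the "starts" and "maxgrpn" attributes.
idx <- data.table:::forderv(DT$g, retGrp = TRUE)

n_groups  <- length(attr(idx, "starts"))  # number of groups (uniqueN)
max_group <- attr(idx, "maxgrpn")         # size of the biggest group
is_unique <- max_group == 1L              # uniqueN == .N

# An empty ordering vector means the input is already sorted, so fall back
# to the natural row order before taking the first and last positions.
ord <- if (length(idx)) idx else seq_len(nrow(DT))
rng <- DT$g[c(ord[1L], ord[length(ord)])]  # range (min, max)
```

Note that with the default `na.last = FALSE`, missing values sort first, so the range expression would return `NA` as the minimum for a column that contains NAs.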

Optional, as I don't see obvious optimizations coming from these:
- [ ] NA count
- [ ] sd, var


collect more statistics about the data #2879
