
What is "within scope" for the data.table package? #5722

@TysonStanley

As we work towards a new governance document for data.table (#5676), it seems important to consider which features are “within scope” of the package. For instance, it is clear that essentially anything tied to data wrangling, cleaning, reformatting, structuring, and analysis is within scope, whereas plotting (e.g., something like ggplot2) currently is not. Laying this out clearly will help communicate and guide feature requests and contributions. Notably, the README states “data.table provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.” The package also includes features (e.g., fread()) that go beyond a high-performance version of data.frame.
 
To start, here are the features that are already within scope (as described in the README).

  • Data manipulation and analysis
    • reshaping/pivoting
    • aggregation/summarizing
    • subsetting rows
    • all sorts of joining (left/right/full/inner, rolling, etc.)
    • adding/updating/deleting columns
  • Reading/writing of data from/to many file formats
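
For concreteness, here is a minimal sketch of what these in-scope operations can look like in practice. The `sales` and `regions` tables and their column names are invented for illustration and do not come from the README; the sketch only assumes that data.table is installed.

```r
library(data.table)

# Toy data, invented for illustration only (hypothetical tables and columns)
sales <- data.table(
  id     = c(1L, 1L, 2L, 2L, 3L),
  year   = c(2022L, 2023L, 2022L, 2023L, 2023L),
  amount = c(10, 20, 5, 15, 30)
)
regions <- data.table(id = c(1L, 2L, 4L), region = c("east", "west", "north"))

# Subsetting rows
recent <- sales[year == 2023L]

# Aggregation/summarizing by group
totals <- sales[, .(total = sum(amount)), by = id]

# Adding, updating, and deleting a column by reference
sales[, doubled := amount * 2]
sales[year == 2022L, doubled := 0]
sales[, doubled := NULL]

# Joining: keep every row of `sales`, attach `region` where `id` matches
joined <- regions[sales, on = "id"]

# Reshaping/pivoting from long to wide, one column per year
wide <- dcast(sales, id ~ year, value.var = "amount")

# Writing and reading a delimited file
tmp <- tempfile(fileext = ".csv")
fwrite(sales, tmp)
roundtrip <- fread(tmp)
```

Each step maps to one bullet above; rolling joins and other file formats are left out to keep the sketch short.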

Topics we currently believe are out of scope:

  • plotting/graphics (like ggplot2)
  • manipulating data stored on disk (or remote SQL DB) rather than in memory (like sqldf / dbplyr)
  • machine learning / modeling (like mlr3)
  • regular expression builders (like rex and nc packages)

Please add any others that make sense to include or should be discussed. Note that these topics may be relevant for the “Seal of Approval” (#5723).
