
Option to exclude value counts from RabbitInAHat output #29

@mark-velez

Description


Frequency distributions generated by the WhiteRabbit scan report are extremely valuable. Because they can contain PHI, however, an option to avoid duplicating them in the RabbitInAHat output (i.e. the appendix and the spec file) would be very useful and would help keep these data from proliferating. At Columbia we typically delete the appendix manually after generating the ETL document, which is error-prone.

Furthermore, continuing to duplicate these data in the spec file will make the file impractical to use once serialization to a human-readable format (#10) is implemented. We found that data from org.ohdsi.rabbitInAHat.dataModel.valueCounts account for the overwhelming majority of our spec file size. We deal with large datasets, and adopting a human-readable format likely precludes compression, so we would be expected to read and edit a file exceeding 34 MB, whereas without the counts it would be about 500 KB.
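A minimal sketch of how such an option could work, assuming a simplified data model: counts are stripped from a copy of the fields just before the ETL document or spec file is written, so the in-memory session keeps them for mapping work. `Field`, `ValueCount`, `prepareForExport`, and the `includeValueCounts` flag are hypothetical names for illustration only, not RabbitInAHat's actual classes or API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the RabbitInAHat data model.
// These are NOT the real org.ohdsi.rabbitInAHat.dataModel classes.
public class ValueCountExportSketch {

    /** One row of a frequency distribution: a source value and how often it occurs. */
    record ValueCount(String value, int frequency) {}

    /** A source field together with the value counts copied from the scan report. */
    static final class Field {
        final String name;
        final List<ValueCount> valueCounts;

        Field(String name, List<ValueCount> valueCounts) {
            this.name = name;
            this.valueCounts = new ArrayList<>(valueCounts);
        }
    }

    /**
     * Returns copies of the fields for export. When includeValueCounts is false
     * (the hypothetical option), each copy has an empty value-count list.
     * The input list is never modified, so counts stay available in the session.
     */
    static List<Field> prepareForExport(List<Field> fields, boolean includeValueCounts) {
        List<Field> out = new ArrayList<>();
        for (Field f : fields) {
            out.add(new Field(f.name, includeValueCounts ? f.valueCounts : List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Field> fields = List.of(
                new Field("gender_source_value",
                        List.of(new ValueCount("F", 120), new ValueCount("M", 98))));

        // Export without counts, e.g. for the appendix or a human-readable spec file.
        List<Field> exported = prepareForExport(fields, false);
        System.out.println(exported.get(0).name + " value counts: " + exported.get(0).valueCounts.size());
        // prints "gender_source_value value counts: 0"
    }
}
```

Stripping at export time, rather than discarding counts at load time, keeps the distributions visible while mapping and keeps them out of every artifact that leaves the tool.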
