Option to exclude value counts from RabbitInAHat output #29

The frequency distributions generated by the WhiteRabbit scan report are extremely valuable. Because they can contain PHI, however, an option to avoid duplicating them in the RabbitInAHat output (i.e. the appendix and the spec file) would be very useful and would help prevent the proliferation of these data. At Columbia we currently delete the appendix manually after generating the ETL document, which is, of course, error-prone.

Furthermore, continuing to duplicate these data in the spec file will undermine its usefulness once serialization to a human-readable format (#10) is implemented. We found that data from `org.ohdsi.rabbitInAHat.dataModel.valueCounts` account for the overwhelming majority of our spec file's size. Since we work with large datasets, and adopting a human-readable format likely rules out compression, we would be expected to read and edit a file of more than 34MB, whereas without the counts it would be about 500KB.
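To make the request concrete, here is a minimal Java sketch of how such an option might behave. The `Field` class, its `valueCounts` list, and `prepareForExport` are hypothetical stand-ins, not the actual RabbitInAHat data model or API. The idea is simply to build an export copy of the model with the frequency data dropped when the option is off, leaving the in-memory scan report untouched so the counts remain available in the GUI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: these classes do not mirror the real
// org.ohdsi.rabbitInAHat.dataModel types.
public class ValueCountStripper {

    public static class Field {
        public final String name;
        // Each entry is {value, frequency}, as reported by the scan.
        public final List<String[]> valueCounts;

        public Field(String name, List<String[]> valueCounts) {
            this.name = name;
            this.valueCounts = valueCounts;
        }
    }

    // Hypothetical export option: when includeValueCounts is false, each
    // exported field gets an empty list instead of the scan's frequencies.
    // The input fields are never modified.
    public static List<Field> prepareForExport(List<Field> fields, boolean includeValueCounts) {
        List<Field> out = new ArrayList<>();
        for (Field f : fields) {
            List<String[]> counts = includeValueCounts
                    ? new ArrayList<>(f.valueCounts)
                    : new ArrayList<>();
            out.add(new Field(f.name, counts));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> counts = new ArrayList<>();
        counts.add(new String[] {"M", "120"});
        counts.add(new String[] {"F", "98"});
        List<Field> fields = new ArrayList<>();
        fields.add(new Field("gender", counts));

        List<Field> exported = prepareForExport(fields, false);
        // Prints the number of value counts written for the exported field.
        System.out.println(exported.get(0).name + ": " + exported.get(0).valueCounts.size());
    }
}
```

Copying rather than clearing in place means the same session could still produce a full export if a site decides the counts are safe to share.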
