Downloadable Training Data #3

@djl11

It would be very useful to have the training data produced by the generate_data_parallel.py script available to download, for both the pile and packed cases.

I appreciate this may be a large amount of data, and therefore difficult to host, so there is no expectation, of course!
But it would save people from having to run the costly data generation process locally just to experiment with the training.
