perf(DPR-Data-Read): Improve data reading process by voidful · Pull Request #733 · deepset-ai/FARM

voidful · 2021-03-04T16:51:47Z

Refer to issue #730

Improvement:

Support the jsonl format for loading large files.
Downsample the data to reduce memory usage.
Add a passage id when one does not exist.
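
For readers who want a concrete picture of the three points above, here is a minimal sketch. It is not the code from this PR: the helper name `read_dpr_records`, the `seed` parameter, the `passage_id` key, and the `positive_ctxs` / `hard_negative_ctxs` field names are assumptions made for illustration, and only `max_samples` is a setting named in this thread.

```python
import json
import random
from pathlib import Path


def read_dpr_records(path, max_samples=None, seed=42):
    """Hypothetical helper: load DPR-style records from a .json or .jsonl file."""
    path = Path(path)
    if path.suffix == ".jsonl":
        # jsonl: one JSON object per line, parsed line by line.
        with path.open(encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
    else:
        # Plain json: assumed to be a single top-level list of records.
        with path.open(encoding="utf-8") as f:
            records = json.load(f)

    # Downsample early so later processing steps handle fewer records.
    if max_samples is not None and len(records) > max_samples:
        records = random.Random(seed).sample(records, max_samples)

    # Add an id to every passage that lacks one (key name is an assumption).
    next_id = 0
    for record in records:
        for key in ("positive_ctxs", "hard_negative_ctxs"):
            for passage in record.get(key, []):
                if "passage_id" not in passage:
                    passage["passage_id"] = f"generated-{next_id}"
                    next_id += 1
    return records
```

Note that this sketch still parses every line before sampling, so it reduces memory for downstream steps rather than for the read itself. Existing passage ids are left untouched.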


Timoeller

Looks very good already. Thanks for the changes.

Let's keep max_samples.
I made a comment about a failing test.

Although the downsampling now exists twice in our codebase, I vote for keeping it in our TextSimilarityProcessor as well, since we might use it for other datasets. Or do you have any strong preferences?

farm/data_handler/utils.py

Timoeller

Very nice, thanks for the suggested changes.

perf(DPR-Data-Read): Improve data reading process

6a97238

Support jsonl format, downsampling data, add passage id when not exist.

Timoeller reviewed Mar 4, 2021

farm/data_handler/utils.py (resolved)

farm/data_handler/utils.py (outdated, resolved)

fix file suffix check failed & keep max_samples setting

620a1f6

Timoeller self-requested a review March 5, 2021 08:23

Timoeller approved these changes Mar 5, 2021


Timoeller merged commit fc82058 into deepset-ai:master Mar 5, 2021

Timoeller merged 2 commits into deepset-ai:master from voidful:improve-dpr-data-read