Improve file processing performance

This issue improves file processing performance through the following changes:

- Serialize in-process data to temporary storage instead of passing it between processes.
- Set a `batchsize` for queuing data in batches. The current implementation spends much of its time blocking on the queue, one record at a time.
- Remove the output queue size limit. With the new method, the database loader will catch up at the end of processing.
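A minimal sketch of how the three changes could fit together, assuming a Python pipeline (the issue does not show the project's code). Each worker groups parsed records into lists of `batchsize`, pickles each list to a temporary file, and puts only the file path on an unbounded output queue. The loader reads each file back and hands the batch to a database insert. The names `batched`, `spool_batches`, `load_batches`, and `load_fn` are hypothetical. The sketch uses the standard-library `queue.Queue` so it runs in one process; in a multi-process setup, a `multiprocessing.Queue()` created without `maxsize` would play the same role.

```python
import os
import pickle
import queue
import tempfile
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical helpers illustrating the issue's changes; not the project's actual API.
SENTINEL = None  # marks the end of the stream for the loader


def batched(records: Iterable, batchsize: int) -> Iterator[List]:
    """Yield lists of up to `batchsize` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, batchsize))
        if not batch:
            return
        yield batch


def spool_batches(records, batchsize, tmp_dir, out_q):
    """Worker side: pickle each batch to a temp file and queue only its path."""
    for batch in batched(records, batchsize):
        fd, path = tempfile.mkstemp(suffix=".pkl", dir=tmp_dir)
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(batch, fh, protocol=pickle.HIGHEST_PROTOCOL)
        out_q.put(path)  # one small put per batch instead of one per record
    out_q.put(SENTINEL)


def load_batches(out_q, load_fn):
    """Loader side: read each spooled batch, pass it to `load_fn`, delete the file."""
    total = 0
    while True:
        path = out_q.get()
        if path is SENTINEL:
            return total
        with open(path, "rb") as fh:
            batch = pickle.load(fh)
        os.remove(path)
        load_fn(batch)  # e.g. a bulk INSERT in a real loader (assumption)
        total += len(batch)


if __name__ == "__main__":
    # Hypothetical parsed records standing in for PubMed articles.
    records = ({"pmid": i, "title": f"Article {i}"} for i in range(10_500))
    loaded = []
    out_q = queue.Queue()  # maxsize=0 means unbounded, so put() never blocks
    with tempfile.TemporaryDirectory() as tmp_dir:
        spool_batches(records, batchsize=1000, tmp_dir=tmp_dir, out_q=out_q)
        n = load_batches(out_q, loaded.extend)
    print(n, len(loaded))
```

Because only short path strings travel through the unbounded queue, letting the loader fall behind costs little memory; the batches themselves wait on disk. With several parser processes, the loader would need to count one sentinel per worker before stopping.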

On a reasonably modern machine, the full PubMed baseline (37 million articles as of January 2025) can be parsed in 1.5 hours.
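As a rough sanity check, dividing the article count by the wall time gives the implied average throughput. This is simple arithmetic that treats both the 37 million count and the 1.5-hour figure as exact:

$$
\frac{37 \times 10^{6}\ \text{articles}}{1.5\ \text{h} \times 3600\ \text{s/h}} = \frac{37 \times 10^{6}\ \text{articles}}{5400\ \text{s}} \approx 6{,}850\ \text{articles/s}
$$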

Improve file processing performance #59