Improve file processing performance #59
Closed
Description
This issue will improve the file processing performance with the following changes.
- Serialize in-process data to temporary storage instead of passing it between processes.
- Set a `batchsize` for queuing data in batches. The current implementation spends a lot of time blocking on the queue record by record.
- Remove the output queue limit. With the new method, the database loader will catch up at the end of processing.
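
The three changes above can be sketched together as follows. This is a minimal illustration, not the project's actual code: the function names (`batched`, `spill_batch`, `load_batch`, `produce`, `consume`), the use of `pickle`, and the default `batchsize` of 1000 are all assumptions made for the example. It uses `queue.Queue` so it runs in a single process; `multiprocessing.Queue` exposes the same `put`/`get` interface for the multi-process case.

```python
import pickle
import queue
import tempfile
from itertools import islice
from pathlib import Path


def batched(records, batchsize):
    """Yield lists of up to `batchsize` records from any iterable."""
    it = iter(records)
    while batch := list(islice(it, batchsize)):
        yield batch


def spill_batch(batch, tmpdir):
    """Pickle one batch to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(dir=tmpdir, suffix=".pkl", delete=False) as fh:
        pickle.dump(batch, fh, protocol=pickle.HIGHEST_PROTOCOL)
        return fh.name


def load_batch(path):
    """Read a spilled batch back and delete its temporary file."""
    p = Path(path)
    with p.open("rb") as fh:
        batch = pickle.load(fh)
    p.unlink()
    return batch


def produce(records, out_queue, tmpdir, batchsize=1000):
    """Queue one short file path per batch instead of one item per record."""
    for batch in batched(records, batchsize):
        out_queue.put(spill_batch(batch, tmpdir))
    out_queue.put(None)  # sentinel: no more batches


def consume(in_queue):
    """Drain the queue, yielding records in their original order."""
    while (path := in_queue.get()) is not None:
        yield from load_batch(path)
```

With an unbounded queue (`queue.Queue()` with the default `maxsize=0`), `produce` never blocks waiting for the consumer, and only short file paths travel through the queue. Calling `produce(range(2500), q, tmpdir, batchsize=1000)` would enqueue three paths plus the sentinel, and `list(consume(q))` would then return the 2,500 records in order.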
On a reasonably modern machine, the full PubMed baseline (37 million articles as of January 2025) can be parsed in 1.5 hours.