Skip to content

96 receiving potentially large uploads#577

Merged
1 commit merged intomainfrom
96-receiving-potentially-large-uploads
Dec 5, 2023
Merged

96 receiving potentially large uploads#577
1 commit merged intomainfrom
96-receiving-potentially-large-uploads

Conversation

@ghost
Copy link

@ghost ghost commented Nov 22, 2023

No description provided.

@ghost ghost linked an issue Nov 22, 2023 that may be closed by this pull request
@ghost
Copy link
Author

ghost commented Nov 23, 2023

Potentially relevant:
JetBrains/Exposed#871

@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch 3 times, most recently from d79b4c4 to d3608d4 Compare November 27, 2023 16:46
@ghost ghost marked this pull request as ready for review November 27, 2023 16:46
@ghost ghost marked this pull request as draft November 27, 2023 16:55
@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch from d3608d4 to bfcebde Compare November 27, 2023 16:59
@ghost ghost marked this pull request as ready for review November 27, 2023 17:40
@ghost ghost requested review from chaoran-chen and fengelniederhammer and removed request for fengelniederhammer November 27, 2023 17:40
Copy link
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good, but some details probably need improvement.

@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from fd68633 to 816eaef Compare November 28, 2023 15:19
@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch from 816eaef to 35a8db1 Compare November 28, 2023 15:24
@ghost
Copy link
Author

ghost commented Nov 28, 2023

Follow-up tickets

loculus-project/security-issues#23
#604

@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from 6dd476c to e69e83c Compare November 28, 2023 23:05
@ghost
Copy link
Author

ghost commented Nov 28, 2023

supported compressed file formats, tested manually:

@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch from e69e83c to 0f04ee4 Compare November 28, 2023 23:09
Copy link
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor things 👍

@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from be974fc to e2b1de8 Compare November 29, 2023 13:47
@ghost
Copy link
Author

ghost commented Nov 29, 2023

Some performance measure:

  • 1 mio identical sequence entries zstd compressed
  • very small file size + local upload = negligible upload time
  • batch insert into aux tables with 1000 entries (100 seems also fine and can reduce RAM footprint even more, but is probably a bit slower)

Result:
Screenshot from 2023-11-29 15-10-24
(Plateau at the end is pretty much exactly 1GB)

Roughly 4 min (where validating and copying into prod table is ~ 10 sec each)

@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch from 8dd241b to d9e868c Compare November 29, 2023 16:19
@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch from d9e868c to 28dd626 Compare November 30, 2023 09:47
@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from a31e910 to c336612 Compare December 1, 2023 21:50
@ghost ghost added the preview Triggers a deployment to argocd label Dec 1, 2023
@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from e290fe3 to 4f18267 Compare December 4, 2023 17:34
@ghost ghost mentioned this pull request Dec 4, 2023
@ghost ghost requested a review from fengelniederhammer December 4, 2023 17:55
@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch from 4f18267 to fe54e89 Compare December 4, 2023 17:58
@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch from 92b0c67 to 03388c0 Compare December 5, 2023 09:35
@ghost ghost requested a review from fengelniederhammer December 5, 2023 09:44
@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch from 03388c0 to 483ac99 Compare December 5, 2023 12:22
 * introduce two auxiliary tables to efficiently validate and merge metadata and sequence data
 * remove singleton SequenceEntriesTable and replace it with provided and cached `Table`s to facilitate compression of sequence data
 * de-compress sequence strings with custom dictionary when de-serializing
 * support for zstd, gzip, xz, lzma, zip, bzip2
@ghost ghost force-pushed the 96-receiving-potentially-large-uploads branch from 483ac99 to 8c428b1 Compare December 5, 2023 12:43
@ghost ghost merged commit 73c35eb into main Dec 5, 2023
@ghost ghost deleted the 96-receiving-potentially-large-uploads branch December 5, 2023 13:04
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preview Triggers a deployment to argocd

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Receiving (potentially large) uploads

2 participants