
feat: Split data store into chunks #14315

Open
dmgawel wants to merge 20 commits into withastro:main from dmgawel:dmgawel/chunked-data-store


@dmgawel
Contributor

@dmgawel dmgawel commented Sep 5, 2025

Changes

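Manifest format (each collection maps to a list of chunks; each chunk lists the file(s) its serialized string is split across):

{
  "collectionName": [
    ["collectionName.hash1.js", "collectionName.hash2.js"], // 1000 entries, serialized string split across two files
    ["collectionName.hash3.js"] // remaining entries
  ],
  "anotherCollection": [["..."]]
}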
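To make the manifest shape concrete, here is a minimal TypeScript sketch. The `DataStoreManifest` type and the `chunkCount` / `filesForCollection` helpers are hypothetical names for illustration, not the PR's actual code; the sketch only assumes what the example shows: each collection maps to a list of chunks, and each chunk lists one or more files that together hold its serialized string.

```typescript
// Hypothetical type for the manifest: collection name -> chunks -> files.
// The files of one chunk are concatenated, in order, to recover that chunk's string.
type DataStoreManifest = Record<string, string[][]>;

// Number of entry chunks stored for a collection (0 if the collection is unknown).
function chunkCount(manifest: DataStoreManifest, collection: string): number {
  const chunks = manifest[collection];
  return chunks ? chunks.length : 0;
}

// Every file belonging to a collection, in load order.
function filesForCollection(manifest: DataStoreManifest, collection: string): string[] {
  const files: string[] = [];
  for (const chunk of manifest[collection] || []) {
    files.push(...chunk);
  }
  return files;
}

const manifest: DataStoreManifest = {
  collectionName: [
    ["collectionName.hash1.js", "collectionName.hash2.js"],
    ["collectionName.hash3.js"],
  ],
};

console.log(chunkCount(manifest, "collectionName")); // 2
console.log(filesForCollection(manifest, "collectionName").length); // 3
```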

Testing

The content layer tests already have great coverage. Where relevant, I've added test cases that run the suite with the experimental flag enabled as well.

Docs

Needs a new experimental feature page.

/cc @withastro/maintainers-docs

@changeset-bot

changeset-bot bot commented Sep 5, 2025

⚠️ No Changeset found

Latest commit: f2dd103

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


github-actions bot added the pkg: astro label Sep 5, 2025
@codspeed-hq

codspeed-hq bot commented Sep 5, 2025

CodSpeed Performance Report

Merging #14315 will not alter performance

Comparing dmgawel:dmgawel/chunked-data-store (f2dd103) with main (f59581f)[1]

Summary

✅ 6 untouched

Footnotes

  1. No successful run was found on main (3c14936) during the generation of this report, so f59581f was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@ascorbic
Contributor

ascorbic commented Sep 5, 2025

This approach solves the chunk size issue on Cloudflare, but I think it's a bit narrow. I'd rather also try to solve the issue of stringified objects being too large, by serialising the chunks individually and splitting them by number of entries. Once we're doing that, it would probably make sense to split by collection too.
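A minimal sketch of the suggestion above: split a collection's entries into fixed-size chunks and serialise each chunk on its own, so no single stringified object grows with the size of the whole store. `chunkEntries`, `serialiseChunks`, and the `Entry` shape are hypothetical, and `JSON.stringify` stands in for the serialiser the PR actually uses (devalue, per the follow-up comment).

```typescript
// Hypothetical entry shape; the real data store entries carry more fields.
interface Entry {
  id: string;
  data: Record<string, unknown>;
}

// Splits entries into consecutive chunks of at most `size` entries each.
function chunkEntries<T>(entries: T[], size: number): T[][] {
  if (size <= 0) throw new RangeError("chunk size must be positive");
  const chunks: T[][] = [];
  for (let i = 0; i < entries.length; i += size) {
    chunks.push(entries.slice(i, i + size));
  }
  return chunks;
}

// Serialises every chunk independently; each string can then go to its own file.
// Assumption: JSON.stringify is a stand-in for the real serialiser.
function serialiseChunks(entries: Entry[], size: number): string[] {
  return chunkEntries(entries, size).map((chunk) => JSON.stringify(chunk));
}

const entries: Entry[] = [];
for (let i = 0; i < 2500; i++) {
  entries.push({ id: `post-${i}`, data: { title: `Post ${i}` } });
}

const serialised = serialiseChunks(entries, 1000);
console.log(serialised.length); // 3 (1000 + 1000 + 500 entries)
```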

@dmgawel
Contributor Author

dmgawel commented Sep 11, 2025

@ascorbic I've changed the splitting approach to cover more use cases, as you suggested. The store is now split by collection, then into chunks of 1000 entries; each chunk is serialized with devalue, and the resulting string is split further so that no file exceeds 20MB.
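The last step, splitting a serialized chunk so that no file exceeds the size limit, could look roughly like the sketch below. It rests on stated assumptions: "20MB" is read as 20 × 1024 × 1024 bytes, size is measured in UTF-8 bytes, and `splitByBytes` is a hypothetical helper rather than the PR's code. Surrogate pairs are never cut in half, so joining the parts in order restores the original string before it is parsed.

```typescript
// Assumption: "20MB" read as 20 * 1024 * 1024 bytes; the PR's real constant may differ.
const MAX_FILE_BYTES = 20 * 1024 * 1024;

// Splits a serialized chunk into parts of at most maxBytes UTF-8 bytes each.
// Surrogate pairs stay together, so parts.join("") restores the input.
function splitByBytes(s: string, maxBytes: number = MAX_FILE_BYTES): string[] {
  // Any single code point needs at most 4 UTF-8 bytes.
  if (maxBytes < 4) throw new RangeError("maxBytes must be at least 4");
  const parts: string[] = [];
  let start = 0; // index where the current part begins
  let bytes = 0; // UTF-8 size of the current part so far
  let i = 0;
  while (i < s.length) {
    const c = s.charCodeAt(i);
    let width = 3; // rest of the BMP (lone surrogates also encode as 3 bytes)
    let units = 1; // UTF-16 code units consumed
    if (c < 0x80) {
      width = 1;
    } else if (c < 0x800) {
      width = 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        width = 4; // astral code point: a surrogate pair, four UTF-8 bytes
        units = 2;
      }
    }
    if (bytes + width > maxBytes) {
      // Current part is full; start a new one at this code point.
      parts.push(s.slice(start, i));
      start = i;
      bytes = 0;
    }
    bytes += width;
    i += units;
  }
  parts.push(s.slice(start));
  return parts;
}

const serializedChunk = '{"id":"post-1","title":"Hello, chunks"}';
const parts = splitByBytes(serializedChunk, 16);
console.log(parts.join("") === serializedChunk); // true
```

Each part would be written to its own file and listed under the same chunk in the manifest.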

I've made sure to cover all code paths, including the Vite virtual module, which now uses dynamic imports so that the data isn't combined into one big module file during the build.
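One way to keep the chunks out of a single bundled module is to emit one dynamic `import()` per file, as in this sketch. The generated `collections` export, the `renderVirtualModule` function, the `./` path prefix, and the assumption that each part file default-exports a string fragment are all hypothetical; this is not the PR's actual virtual module.

```typescript
// Hypothetical manifest shape: collection name -> chunks -> part files.
type DataStoreManifest = Record<string, string[][]>;

// Builds the source of a virtual module in which every chunk is loaded lazily.
// Assumption: each part file default-exports a string fragment of the chunk.
function renderVirtualModule(manifest: DataStoreManifest): string {
  const lines: string[] = ["export const collections = {"];
  for (const name of Object.keys(manifest)) {
    const loaders = manifest[name].map((files) => {
      // String-literal specifiers let the bundler see each dynamic import.
      const imports = files
        .map((file) => `import(${JSON.stringify("./" + file)})`)
        .join(", ");
      // Load all parts of the chunk in parallel, then concatenate them in order.
      return `() => Promise.all([${imports}]).then((mods) => mods.map((m) => m.default).join(""))`;
    });
    lines.push(`  ${JSON.stringify(name)}: [${loaders.join(", ")}],`);
  }
  lines.push("};");
  return lines.join("\n");
}

const example: DataStoreManifest = {
  collectionName: [
    ["collectionName.hash1.js", "collectionName.hash2.js"],
    ["collectionName.hash3.js"],
  ],
};

console.log(renderVirtualModule(example));
```

Because every specifier is a string literal, bundlers such as Rollup (used by Vite) can emit each imported file as a separate chunk, and nothing is fetched until a collection's loader is actually called.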

Feel free to take another look at the PR.

Unfortunately, I'm stuck on some failing tests, and I'd appreciate your help debugging them 🙏 They seem to be related to changes, invalidations, and restarts. I'm struggling to reproduce the failures manually, so maybe it's some kind of race condition? Or duplicate watching? I'm not sure.

@ascorbic
Contributor

FYI, this will need to be behind an experimental flag for now, so you're going to need to add that to the logic.

@dmgawel dmgawel marked this pull request as ready for review September 17, 2025 08:47
@dmgawel dmgawel requested a review from ascorbic September 17, 2025 08:53
Contributor

@ascorbic ascorbic left a comment


Thanks. I've not tested it yet, but it's looking promising. Have you been able to detect any difference in performance? It might be good to add test cases in the benchmark directory (see benchmark/make-project/markdown-cc2.js). This will need a minor changeset too. See https://contribute.docs.astro.build/docs-for-code-changes/changesets/ and https://contribute.docs.astro.build/docs-for-code-changes/experimental-feature-docs/

* @version 5.14
* @description
*
* When enabled, ...
Contributor


This needs adding

@clong365

Please merge this PR into a new release as soon as possible.
Issue #13360 is a nightmare for projects with 10k+ Markdown files.

@clong365

clong365 commented Dec 6, 2025

Has there been any progress on this pull request? It has been about three months since it was opened, and I wanted to check in on its status.
To speed things up, would it be feasible to split this PR into two separate parts?

  1. Focus on resolving the existing issue: Build fails when data-store.json exceeds a certain size #13360.
  2. Address the new proposal: Splitting content snapshot into chunks: enable deployment to platforms with file size limits (Cloudflare) roadmap #1213.

Separating the issue resolution from the proposal might help avoid unnecessary delays in addressing the core problem.
I appreciate the team's efforts.


Labels

docs pr, pkg: astro


Development

Successfully merging this pull request may close these issues.

Build fails when data-store.json exceeds a certain size

3 participants