-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem? Please describe.
The position table consumes memory and a per-position memory consumption stops us from generating many positions, which is also bad.
Describe the solution you'd like
I don't think we need a position table, if we make clever use of a file table instead.
Instead of a table, we could use a coordinate scheme where we split the position bits into two numbers: one for the file and one for picking out the n-th AST node. However, this gets a bit close to the limits of our bit budget, so we'd end up dropping some positions.
Details
A half split gives us 16 bits to index the files, and if you were to evaluate all of a single Nixpkgs, you're close to maxing out 15 bits. In reality, you wouldn't parse all of those, but you may evaluate multiple nixpkgs, especially with flakes, without inter-flake caching. Worse, all-packages, hackage-packages and node-packages will not have position numbers for most of their ast. A rough estimate with `wc -w` gives, at the high end:41933 pkgs/servers/web-apps/outline/yarn.nix pkgs/servers/web-apps/outline/yarn.nix
47588 pkgs/development/mobile/androidenv/repo.json pkgs/development/mobile/androidenv/repo.json
51079 maintainers/maintainer-list.nix maintainers/maintainer-list.nix
51080 pkgs/servers/monitoring/karma/node-packages.nix pkgs/servers/monitoring/karma/node-packages.nix
57125 pkgs/desktops/gnome/extensions/extensions.json pkgs/desktops/gnome/extensions/extensions.json
98313 pkgs/top-level/perl-packages.nix pkgs/top-level/perl-packages.nix
102093 pkgs/tools/typesetting/tex/texlive/tlpdb.nix pkgs/tools/typesetting/tex/texlive/tlpdb.nix
121178 pkgs/top-level/all-packages.nix pkgs/top-level/all-packages.nix
188951 pkgs/applications/editors/emacs/elisp-packages/recipes-archive-melpa.json pkgs/applications/editors/emacs/elisp-packages/recipes-archive-melpa.json
262215 pkgs/development/lisp-modules/imported.nix pkgs/development/lisp-modules/imported.nix
269585 pkgs/development/lisp-modules-new-obsolete/imported.nix pkgs/development/lisp-modules-new-obsolete/imported.nix
275547 pkgs/development/r-modules/cran-packages.nix pkgs/development/r-modules/cran-packages.nix
323407 pkgs/development/node-packages/node-packages.nix pkgs/development/node-packages/node-packages.nix
1156660 pkgs/development/haskell-modules/hackage-packages.nix pkgs/development/haskell-modules/hackage-packages.nix
Instead, a more "efficient" scheme is to use consecutive numbers.
Functionally, this is concatenating all files and then indexing that. (But of course not implemented like that)
I propose the following:
# pseudocode
typedef int32_t PosIdx; // already exists but newtyped
class EvalState additions {
int32_t posCounter = 1;
std::map<int, Path> filePosRanges;
std::map<Path, Expr *> fileParseCache; // already exists
}
class Expr additions {
int32_t posIdx;
}
parse(file):
// save the start
filePosRanges.insert(posCounter, file)
for each expr, as they are constructed in order:
posIdx = posCounter++
getPos(n):
fileStart = filePosRanges.lower_bound(n)
fileExpr <- find n in the Expr cache; returning null pos if not found
return (find expr with posIdx == n in fileExpr)
I think the numbering will be in post-order.
Possible complications with their solutions
Arbitrary positions: currently you can create a Pos with a truly arbitrary location - an arbitrary character inside an AST node. If we need that, we'd have to reserve a range for those. I think these would be rare.
Serialization of parsing: I don't think we parse concurrently, but if we want to, we could use a pool of file tables and assign a range of positions to each. A parser claims any of the file tables for itself and continues as described before.
Describe alternatives you've considered
A product instead of a sum, as used in the introduced and rejected in the description of the solution; no other ideas so far.
Additional context
- Position table was introduced in reduce the size of Attr from 3 pointers to 2 on 64 bit machines #6218
Priorities
Add 👍 to issues you find important.