Skip to content

Remove the position table, add 32 bits to Expr? #8208

@roberth

Description

@roberth

Is your feature request related to a problem? Please describe.

The position table consumes memory and a per-position memory consumption stops us from generating many positions, which is also bad.

Describe the solution you'd like

I don't think we need a position table, if we make clever use of a file table instead.
Instead of a table, we could use a coordinate scheme where we split the position bits into two numbers: one for the file and one for picking out the n-th AST node. However, this gets a bit close to the limits of our bit budget, so we'd end up dropping some positions.

Details A half split gives us 16 bits to index the files, and if you were to evaluate all of a single Nixpkgs, you're close to maxing out 15 bits. In reality, you wouldn't parse all of those, but you may evaluate multiple nixpkgs, especially with flakes, without inter-flake caching. Worse, all-packages, hackage-packages and node-packages will not have position numbers for most of their ast. A rough estimate with `wc -w` gives, at the high end:
41933 pkgs/servers/web-apps/outline/yarn.nix pkgs/servers/web-apps/outline/yarn.nix
47588 pkgs/development/mobile/androidenv/repo.json pkgs/development/mobile/androidenv/repo.json
51079 maintainers/maintainer-list.nix maintainers/maintainer-list.nix
51080 pkgs/servers/monitoring/karma/node-packages.nix pkgs/servers/monitoring/karma/node-packages.nix
57125 pkgs/desktops/gnome/extensions/extensions.json pkgs/desktops/gnome/extensions/extensions.json
98313 pkgs/top-level/perl-packages.nix pkgs/top-level/perl-packages.nix
102093 pkgs/tools/typesetting/tex/texlive/tlpdb.nix pkgs/tools/typesetting/tex/texlive/tlpdb.nix
121178 pkgs/top-level/all-packages.nix pkgs/top-level/all-packages.nix
188951 pkgs/applications/editors/emacs/elisp-packages/recipes-archive-melpa.json pkgs/applications/editors/emacs/elisp-packages/recipes-archive-melpa.json
262215 pkgs/development/lisp-modules/imported.nix pkgs/development/lisp-modules/imported.nix
269585 pkgs/development/lisp-modules-new-obsolete/imported.nix pkgs/development/lisp-modules-new-obsolete/imported.nix
275547 pkgs/development/r-modules/cran-packages.nix pkgs/development/r-modules/cran-packages.nix
323407 pkgs/development/node-packages/node-packages.nix pkgs/development/node-packages/node-packages.nix
1156660 pkgs/development/haskell-modules/hackage-packages.nix pkgs/development/haskell-modules/hackage-packages.nix

Instead, a more "efficient" scheme is to use consecutive numbers.
Functionally, this is concatenating all files and then indexing that. (But of course not implemented like that)

I propose the following:

# pseudocode

typedef int32_t PosIdx; // already exists but newtyped

class EvalState additions {
  int32_t posCounter = 1;
  std::map<int, Path> filePosRanges;
  std::map<Path, Expr *> fileParseCache; // already exists
}

class Expr additions {
  int32_t posIdx;
}

parse(file):
  // save the start
  filePosRanges.insert(posCounter, file)
  for each expr, as they are constructed in order:
    posIdx = posCounter++

getPos(n):
  fileStart = filePosRanges.lower_bound(n)
  fileExpr <- find n in the Expr cache; returning null pos if not found
  return (find expr with posIdx == n in fileExpr)

I think the numbering will be in post-order.

Possible complications with their solutions

Arbitrary positions: currently you can create a Pos with a truly arbitrary location - an arbitrary character inside an AST node. If we need that, we'd have to reserve a range for those. I think these would be rare.

Serialization of parsing: I don't think we parse concurrently, but if we want to, we could use a pool of file tables and assign a range of positions to each. A parser claims any of the file tables for itself and continues as described before.

Describe alternatives you've considered

A product instead of a sum, as used in the introduced and rejected in the description of the solution; no other ideas so far.

Additional context

Priorities

Add 👍 to issues you find important.

Metadata

Metadata

Assignees

No one assigned

    Labels

    featureFeature request or proposallanguageThe Nix expression language; parser, interpreter, primops, evaluation, etcperformance

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions