Skip to content

Comments

reduce the size of Attr from 3 pointers to 2 on 64 bit machines#6218

Merged
thufschmitt merged 12 commits intoNixOS:masterfrom
pennae:pos-symbol-tables
Apr 22, 2022
Merged

reduce the size of Attr from 3 pointers to 2 on 64 bit machines#6218
thufschmitt merged 12 commits intoNixOS:masterfrom
pennae:pos-symbol-tables

Conversation

@pennae
Copy link
Contributor

@pennae pennae commented Mar 8, 2022

this changes position handling to use a table similar to SymbolTable for storing position information, and changes SymbolTable itself to require less memory per Symbol at the expense of lookup convenience. with both of these tables using 32 bit indices we can pack Attr more tightly, reducing GC heap size and allocations by 10-15%. reduced GC load also gives us an eval performance improvement on the same order for free.

depends on #5865

@dpulls
Copy link

dpulls bot commented Mar 11, 2022

🎉 All dependencies have been resolved !

@pennae
Copy link
Contributor Author

pennae commented Mar 11, 2022

rebased to current master now. :)

@lovesegfault
Copy link
Member

Do you have benchmarks on the eval perf improvement here?

@pennae
Copy link
Contributor Author

pennae commented Mar 11, 2022

@lovesegfault benchmarks results are in the final commit message, copied here for convenience:


before (on memory-friendliness):

  Benchmark 1: nix search --no-eval-cache --offline ../nixpkgs hello
    Time (mean ± σ):      6.960 s ±  0.028 s    [User: 5.832 s, System: 0.897 s]
    Range (min … max):    6.886 s …  7.005 s    20 runs

  Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
    Time (mean ± σ):     328.1 ms ±   1.7 ms    [User: 295.8 ms, System: 32.2 ms]
    Range (min … max):   324.9 ms … 331.2 ms    20 runs

  Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
    Time (mean ± σ):      2.688 s ±  0.029 s    [User: 2.365 s, System: 0.238 s]
    Range (min … max):    2.642 s …  2.742 s    20 runs

after:

  Benchmark 1: nix search --no-eval-cache --offline ../nixpkgs hello
    Time (mean ± σ):      6.902 s ±  0.039 s    [User: 5.844 s, System: 0.783 s]
    Range (min … max):    6.820 s …  6.956 s    20 runs

  Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
    Time (mean ± σ):     330.7 ms ±   2.2 ms    [User: 300.6 ms, System: 30.0 ms]
    Range (min … max):   327.5 ms … 334.5 ms    20 runs

  Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
    Time (mean ± σ):      2.330 s ±  0.027 s    [User: 2.040 s, System: 0.234 s]
    Range (min … max):    2.272 s …  2.383 s    20 runs

SymbolPtr name;
Value * value;
PosPtr pos;
Value * value;
Copy link
Member

@andir andir Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that the order of these was choosen purposely?

And perhaps some test/assertion that ensure that this expectation still holds true in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment describing the reasoning here. not sure an assertion has much value though, do you have a scenario in mind that might interfere here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it has a signitifcant impact I think the trivial issue could be that someone adds another field here without fully understanding the consequences. Also just some refactoring (prior to your comment) might have lead to someone moving them around (for various reasons) without knowing that the layout was chosen deliberately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added an assertion for the expected size with a warning about why it's important.

@pennae
Copy link
Contributor Author

pennae commented Mar 11, 2022

puzzling that the eval-okay-xml test fails on macos, it runs fine here on linux 😕

edit: turns out, use-after-free of parser data in string_view. c++ needs lifetimes 🙄

@pennae
Copy link
Contributor Author

pennae commented Mar 22, 2022

@edolstra this has been getting a lot of merge conflicts recently, does it make sense to keep resolving them or should we wait until this is officially approved?

@tomberek
Copy link
Contributor

NIX_SHOW_STATS trimmed for brevity

before                                             after
{                                                  {
  "envs": {                                          "envs": {
    "bytes": 146264296                                 "bytes": 146264296
  "list": {                                          "list": {
    "bytes": 26839376,                                 "bytes": 26839352,
  "values": {                                        "values": {
    "bytes": 362397240                                 "bytes": 362397456
  "symbols": {                                       "symbols": {
    "bytes": 3618409                                   "bytes": 1382817
  "sets": {                                          "sets": {
    "bytes": 1100289680,                               "bytes": 747502224,
  },                                                 },
  "sizes": {                                         "sizes": {
    "Env": 16,                                         "Env": 16,
    "Value": 24,                                       "Value": 24,
    "Bindings": 16,                                    "Bindings": 16,
    "Attr": 24                                         "Attr": 16
  },                                                 },
  "gc": {                                            "gc": {
    "heapSize": 1561522176,                            "heapSize": 1158864896,
    "totalBytes": 1960805392                           "totalBytes": 1566740096
  }                                                  }

@pennae
Copy link
Contributor Author

pennae commented Mar 29, 2022

rebased again. also renamed PosPtr to PosIdx and SymbolPtr to SymbolIdx since neither of them really act like pointers at all.

@Artturin
Copy link
Member

@Ericson2314
Copy link
Member

I am not up to date on these things and so could not review very way, but perhaps @layus can after #6204. I think it's good to have the people that do performance and diagnostic quality reviewing each other, as those goals in the general case have some tension.

(To be clear, I don't see a sign of diagnostic quality regression in this case.)

@pennae
Copy link
Contributor Author

pennae commented Apr 14, 2022

installer test has been stuck for nine days. nice.

Copy link
Member

@layus layus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real concern here for me, except for the slight readability decrease that often comes with performance tuning.

As this is a huge amount of work, I think a decision should be taken soon on whether we want to continue in this direction or not.

{ $$ = new ExprLambda(CUR_POS, data->symbols.create($1), 0, $3); }
| '{' formals '}' ':' expr_function
{ $$ = new ExprLambda(CUR_POS, data->symbols.create(""), toFormals(*data, $2), $5); }
{ $$ = new ExprLambda(CUR_POS, {}, toFormals(*data, $2), $5); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a comment for future readers. A dummy function with a proper name would help a lot.

Copy link
Contributor Author

@pennae pennae Apr 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

Suggested change
{ $$ = new ExprLambda(CUR_POS, {}, toFormals(*data, $2), $5); }
{ $$ = new ExprLambda(CUR_POS, SymbolIdx::none, toFormals(*data, $2), $5); }

which would be a new constant akin to noPos?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added another constructor to ExprLambda instead because C++ really doesn't like constexpr data members of a type to be declared inside the type, and adding a noSymbol to the surrounding namespace for a single use seemed rather cluttering.

[[gnu::noinline]]
void addErrorTrace(Error & e, const char * s, const std::string & s2) const;
[[gnu::noinline]]
void addErrorTrace(Error & e, const Pos & pos, const char * s, const std::string & s2) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the big pain point for me. Now these hacky functions leak into a much used, public header. I understand the purpose of these functions, but I would really like to find a more DRY alternative to this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also make them private and make all the *Expr classes friends of EvalState, but that feels even more of a hack than this. we might be able to make them inline instead and define them only in eval-state.cpp to make all other uses compile time errors, but that only adds to the hackiness as well 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about going the other way around, and lookup the {Symbol,Position}Idx before passing them to throw*Error ? I My understanding is that the stack usage should not change. If this machinery is indeed about reducing stack space, then it is about the same.

That would avoid having to expose all these dummy and ill-defined functions in the headers, and let them settle when we should find a way to get rid of them.

Obviously it would make the code a bit more verbose, due to the repeated lookups, but that is not a big deal for that code in particular. If we wanted it terse, we should first take care of the try/catch and re-throwing bits, that clutter the eval and primops files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we did that at first, but it does clutters the call sites something fierce. this was mostly a readability choice as well and not something we're particularly attached to or did for runtime metrics. the symbols table is a public member of EvalState though, so we could make that parameter explicit and remove the declarations from the header again. (which then creates another problem: where does the argument go? first, last, somewhere in between?)

{
auto aDrvPath = getAttr(root->state.sDrvPath, true);
auto aDrvPath = getAttr("drvPath", true);
auto drvPath = root->state.store->parseStorePath(aDrvPath->getString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I read correctly that this disables the former optimization of turning common strings into symbols ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a manner of speaking. the optimization doesn't have a great impact here since the symbol is soon turned into a string again and used in an sqlite query, so the extra symbol table lookup we may have to do isn't very expensive in comparison.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I guess the few places where you had to move back from symbols to strings are the ones where you do not have access to a symbol table ? I see hot paths like the one in pirmops.cc still use the state->sFooBar "global" symbols.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do have access to the symbol in this location too (through root->state), in this case it just seemed more readable to not use it. wouldn't be a problem to use it here as well though if it bothers anyone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to understand the code really, so not bothered by anything here. From the diff it looks like a de-optimisation, but it is indeed restricted to a few places. All fine for me.

@roberth
Copy link
Member

roberth commented Apr 18, 2022

Does the memory usage regress when evaluating on 32-bit platforms?

@pennae
Copy link
Contributor Author

pennae commented Apr 18, 2022

haven't tested 32 bit, but it should also decrease. Pos now elides a Symbol (aka pointer) per instances at the cost of two ints and a std::string per file. that should break even at five positions per file on average for 32 bit machines, and if nixpkgs is anything to go by that shouldn't be hard to achieve :)

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the speedup and memory reduction is much appreciated.

The diff is quite massive, so I might have missed something, but apart from the few comments I’ve left, looks good

src/nix/repl.cc Outdated
if (v.type() == nPath || v.type() == nString) {
PathSet context;
auto filename = state->coerceToString(noPos, v, context);
return {state->symbols.create(*filename), 0};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don’t you just create a symbol out of the string, and then convert it back into a string here?

Copy link
Contributor Author

@pennae pennae Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the moment yes, but that's just to keep the visible diff small-ish. a later commit removes the symbol creation here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes sorry 🤦

Comment on lines +59 to +62
// must always be invalid by default, add() replaces this with the actual value.
// subsequent add() calls use this index as a token to quickly check whether the
// current origins.back() can be reused or not.
mutable uint32_t idx = std::numeric_limits<uint32_t>::max();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than setting this after the fact, can’t we just require an Origin to only be created from a PosTable (and so get a valid idx right when initialized) ? Like it’s done for Symbols

Copy link
Contributor Author

@pennae pennae Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, instances of this type double as origin information within the symbol table and as cookie values for users of the symbol table to enable reuse of origin entries. the idea is to have each site that creates positions allocate an Origin with an invalid idx, which the symbol table fills with a valid value when a position is created. each subsequent position creation need only check that the idx of the cookie is the idx of the last origin in the table and reuse that origin if it is, otherwise create a new origin and change the idx of the cookie. it turns out doing it this way is a lot more efficient than having each position creation pass the full file/origin pair and check those.

edit to clarify: multiple Origin cookies are allowed to exist at once, eg when doing nested parsing for some reason. once the second level has created a position the first level cookie is invalid and has to be updated transparently, otherwise very obscure bugs will happen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, didn’t think about nesting the parsing (dunno whether that happens in practice, but it’s definitely safer to assume that it can)

static constexpr unsigned CHUNK_SIZE = 8192;

std::vector<Origin> origins;
std::vector<std::vector<Offset>> offsets;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these nested vectors really provide a performance gain? And if so, could that be moved to its own class so that we decouple the storing logic from the actual caching? I think that would help a lot in making this easier to understand

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and similar for the symbols ofc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they do once you have a lot of them. allocating multiple fixed-size vectors is faster than reallocating a vector that keeps growing and has less overhead once the tables get large, at the expense of an additional indirection during lookup. extracting that to a separate type won't be a problem.

}

ref<AttrCursor> AttrCursor::getAttr(Symbol name, bool forceErrors)
std::shared_ptr<AttrCursor> AttrCursor::getAttr(std::string_view name, bool forceErrors)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this is a std::shared_ptr and not a ref?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't remember, to be honest. it might just be wrong.

@pennae
Copy link
Contributor Author

pennae commented Apr 21, 2022

addressed @thufschmitt review comments :)

pennae added 10 commits April 21, 2022 21:25
we don't *need* symbols here. the only advantage they have over strings is
making call-counting slightly faster, but that's a diagnostic feature and thus
needn't be optimized.

this also fixes a move bug that previously didn't show up: PrimOp structs were
accessed after being moved from, which technically invalidates them. previously
the names remained valid because Symbol copies on move, but strings are
invalidated. we now copy the entire primop struct instead of moving since primop
registration happen once and are not performance-sensitive.
the only use of this function is to determine whether a lambda has a non-set
formal, but this use is arguably better served by Symbol::set and using a
non-Symbol instead of an empty symbol in the parser when no such formal is present.
a future commit will remove the ability to convert the symbol type used in
bindings to strings. since we only have two users we can inline the error check.
only file and line of the returned position were ever used, it wasn't actually
used a position. as such we may as well use a path+int pair for only those two
values and remove a use of Pos that would not work well with a position table.
when we introduce position and symbol tables we'll need to do lookups to turn
indices into those tables into actual positions/symbols. having the error
functions as members of EvalState will avoid a lot of churn for adding lookups
into the tables for each caller.
Pos objects are somewhat wasteful as they duplicate the origin file name and
input type for each object. on files that produce more than one Pos when parsed
this a sizeable waste of memory (one pointer per Pos). the same goes for
ptr<Pos> on 64 bit machines: parsing enough source to require 8 bytes to locate
a position would need at least 8GB of input and 64GB of expression memory. it's
not likely that we'll hit that any time soon, so we can use a uint32_t index to
locate positions instead.
PosTable deduplicates origin information, so using symbols for paths is no
longer necessary. moving away from path Symbols also reduces the usage of
symbols for things that are not keys in attribute sets, which will become
important in the future when we turn symbols into indices as well.
this slightly increases the amount of memory used for any given symbol, but this
increase is more than made up for if the symbol is referenced more than once in
the EvalState that holds it. on average every symbol should be referenced at
least twice (once to introduce a binding, once to use it), so we expect no
increase in memory on average.

symbol tables are limited to 2³² entries like position tables, and similar
arguments apply to why overflow is not likely: 2³² symbols would require as many
string instances (at 24 bytes each) and map entries (at 24 bytes or more each,
assuming that the map holds on average at most one item per bucket as the docs
say). a full symbol table would require at least 192GB of memory just for
symbols, which is well out of reach. (an ofborg eval of nixpks today creates
less than a million symbols!)
with position and symbol tables in place we can now shrink Attr by a full
pointer with some simple field reordering. since Attr is a very hot struct this
has substantial impact on memory use, decreasing GC allocations and heap size by
10-15% each. we also get a ~15% performance improvement due to reduced GC
loading.

pure parsing has taken a hit over the branch base because positions are now
slightly more expensive to create, but overall we get a noticeable improvement.

before (on memory-friendliness):

  Benchmark 1: nix search --no-eval-cache --offline ../nixpkgs hello
    Time (mean ± σ):      6.960 s ±  0.028 s    [User: 5.832 s, System: 0.897 s]
    Range (min … max):    6.886 s …  7.005 s    20 runs

  Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
    Time (mean ± σ):     328.1 ms ±   1.7 ms    [User: 295.8 ms, System: 32.2 ms]
    Range (min … max):   324.9 ms … 331.2 ms    20 runs

  Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
    Time (mean ± σ):      2.688 s ±  0.029 s    [User: 2.365 s, System: 0.238 s]
    Range (min … max):    2.642 s …  2.742 s    20 runs

after:

  Benchmark 1: nix search --no-eval-cache --offline ../nixpkgs hello
    Time (mean ± σ):      6.902 s ±  0.039 s    [User: 5.844 s, System: 0.783 s]
    Range (min … max):    6.820 s …  6.956 s    20 runs

  Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
    Time (mean ± σ):     330.7 ms ±   2.2 ms    [User: 300.6 ms, System: 30.0 ms]
    Range (min … max):   327.5 ms … 334.5 ms    20 runs

  Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
    Time (mean ± σ):      2.330 s ±  0.027 s    [User: 2.040 s, System: 0.234 s]
    Range (min … max):    2.272 s …  2.383 s    20 runs
it's no longer needed now that positions aren't really pointers any
more.
@pennae
Copy link
Contributor Author

pennae commented Apr 21, 2022

and rebased to master once more, for good measure

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed @thufschmitt review comments :)

Thanks :) I’ve pushed a couple of commits to move ChunkedVector to its own header and add some small tests. I’ll merge as soon as the CI is green

@thufschmitt thufschmitt enabled auto-merge April 22, 2022 08:05
@pennae
Copy link
Contributor Author

pennae commented Apr 22, 2022

not having high hopes for CI to be honest, the installer test has been hating this PR for weeks xD

@thufschmitt thufschmitt merged commit c4ffc8e into NixOS:master Apr 22, 2022
@pennae
Copy link
Contributor Author

pennae commented Apr 22, 2022

looks like the auto-merge was a bit over-eager again, the macos build failed:

 nix> In file included from src/libutil/tests/chunked-vector.cc:1:
nix> src/libutil/chunked-vector.hh:3:10: fatal error: 'bits/stdint-uintn.h' file not found
nix> #include <bits/stdint-uintn.h>
nix>          ^~~~~~~~~~~~~~~~~~~~~
nix> 1 error generated.

@thufschmitt
Copy link
Member

Duh, that thing is indeed totally broken 😠

@dtzWill
Copy link
Member

dtzWill commented Apr 22, 2022

Hi, I was wondering if std::deque was considered/would work instead of ChunkedVector? deque is basically a vector of vectors and is useful for same reasons I think ChunkedVector was introduced?

(modulo the forEach helper, but std::for_each will do the trick)

@pennae
Copy link
Contributor Author

pennae commented Apr 22, 2022

@dtzWill was considered but (maybe mistakenly) deemed unsuitable because there are no allocation size guarantees for its chunks. part of the reason for chunking is to keep a constant maximum memory overhead for unused capacity, if we just didn't find the way to get that done with a deque then a deque would work just as well

auto name = cursor->getAttr(state.sName)->getString();
auto outPath = cursor->getAttr("outPath")->getString();
auto outputName = cursor->getAttr("outputName")->getString();
auto name = cursor->getAttr("name")->getString();
Copy link
Member

@edolstra edolstra Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this and similar changes. Not that this is on the hot path, but isn't it worse for performance to replace a lookup of a symbol by a lookup of a string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we changed these for readability. ioo it was preferrable to do it like this instead of as

auto name = cursor->getAttr(state.symbols[state.sName])->getString();

but ultimately both of them are totally valid.

@edolstra
Copy link
Member

Question: we now have two ways to uniquely refer to symbols, namely Symbol and SymbolIdx, which is confusing. Which one should be used in APIs? Given that it's possible to efficiently convert a SymbolIdx to a Symbol but not the other way around, does it even make sense to expose Symbol?

std::vector<Symbol> res;
for (auto & a : parseAttrPath(s))
res.push_back(state.symbols.create(a));
res.emplace_back(a);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this. How does this guarantee that the symbol is unique (i.e. pointer-comparable)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does not, as of this PR Symbol does not confer a uniqueness guarantee, only a "this is a nix identifier" semantics. we can remove Symbol and rename SymbolIdx -> Symbol to restore that guarantee to the name if that's preferrable

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-29/18903/1

@roberth
Copy link
Member

roberth commented Apr 12, 2023

@pennae what do you think of a scheme like this, that replaces the position table by a clever indexing scheme?

It would enable some improvements to error messages; see the github reverse links to the issues that come after it.

@pennae pennae deleted the pos-symbol-tables branch December 31, 2023 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.