
GC collects live objects when using template classes with tuple fields (LDC regression) #5068

@bhavya-sl

Description


We are seeing the GC collect objects that are still reachable when they are stored inside a template class with a variadic tuple field (Ts ts). Under LDC, such classes are allocated with the GC.BlkAttr.NO_SCAN attribute when the tuple field is their only pointer-bearing member. The GC therefore never scans them and treats the objects they reference as unreachable.

As a temporary workaround, we manually clear the NO_SCAN flag after allocation:

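import core.memory : GC;

// EventLoopImpl and the info() logging call come from the surrounding project.
auto EventLoop(Ts...)(Ts ts)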
{
    auto eloop = new EventLoopImpl!(Ts)(ts);

    // BUG introduced in LDC 1.31.0
    version(LDC)
    {
        uint attr = GC.getAttr(cast(void*)eloop);
        if (attr & GC.BlkAttr.NO_SCAN)
        {
            info("Have to remove NO_SCAN attribute from EventLoopImpl due to LDC GC bug");
            GC.clrAttr(cast(void*)eloop, GC.BlkAttr.NO_SCAN);
            assert ((GC.getAttr(cast(void*)eloop) & GC.BlkAttr.NO_SCAN) == 0, "Still has NO_SCAN attribute after clearing it");
        }
    }
    return eloop;
}
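The workaround above can be generalized into a reusable helper for any class type. This is a minimal sketch: the name newScannable is hypothetical and not part of druntime, and Inner/Container are copied from the reproducer further down for illustration. It uses only core.memory.GC.getAttr and GC.clrAttr. When the block does not carry NO_SCAN (for example, on compilers without the bug), it leaves the attributes untouched.

```d
import core.memory : GC;

/// Hypothetical helper: allocates a class instance with `new` and, if the
/// runtime marked the block NO_SCAN, clears that bit so the GC scans it.
T newScannable(T, Args...)(Args args)
    if (is(T == class))
{
    auto obj = new T(args);
    auto p = cast(void*) obj;
    if (GC.getAttr(p) & GC.BlkAttr.NO_SCAN)
        GC.clrAttr(p, GC.BlkAttr.NO_SCAN);
    return obj;
}

// Illustrative classes, same shape as in the reproducer.
final class Inner { int value; this(int v) { value = v; } }
final class Container(Ts...) { bool flag; Ts ts; this(Ts args) { ts = args; } }

void main()
{
    auto c = newScannable!(Container!Inner)(new Inner(42));
    // After the helper runs, the block must be scannable on any compiler.
    assert((GC.getAttr(cast(void*) c) & GC.BlkAttr.NO_SCAN) == 0);
}
```

Like the original workaround, this clears the bit only after the constructor has run. It therefore assumes no collection happens between the allocation and the clrAttr call.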

Below is an AI-assisted analysis of the bug. It includes a reproducible example, an investigation of the frontend and IR changes across versions, and an explanation of how the class flags are derived.

[REG 1.31] Template classes with variadic tuple fields incorrectly allocated as NO_SCAN, causing GC to collect live objects

Summary

Template classes with variadic tuple fields (a field declared as Ts ts, where Ts... is the template parameter) are incorrectly allocated with the NO_SCAN GC attribute. The GC therefore never scans the object's memory. Any objects referenced only through tuple fields become unreachable and are collected, which leaves dangling pointers and causes use-after-free crashes.

The bug is present in every LDC release from 1.31.0 through 1.42.0 (the latest tested), at all optimization levels (-O0, -O2, etc.). DMD 2.110.0 is not affected.

Reproducer

import core.memory : GC;
import core.stdc.stdio : printf, fflush, stdout;

void log(const(char)* msg)
{
    printf("%s\n", msg);
    fflush(stdout);
}

class Inner
{
    int value;
    this(int v) { value = v; }
}

// Template class storing type params as tuple fields — triggers the bug.
final class Container(Ts...)
{
    bool flag;
    Ts ts;
    this(Ts args) { ts = args; }
}

// Equivalent non-template class for comparison — works correctly.
final class ContainerExplicit
{
    bool flag;
    Inner inner;
    this(Inner i) { inner = i; }
}

pragma(inline, false)
Container!(Inner) makeTupleContainer()
{
    return new Container!(Inner)(new Inner(42));
}

pragma(inline, false)
ContainerExplicit makeExplicitContainer()
{
    return new ContainerExplicit(new Inner(42));
}

pragma(inline, false)
void clobberStack()
{
    long[256] junk;
    junk[] = 0xDEADBEEF;
    if (junk[0] == 0) log("unreachable");
}

void main()
{
    log("=== LDC GC Bug Reproducer ===");

    // Compile-time bitmaps are correct in both cases
    enum bitmapTuple = __traits(getPointerBitmap, Container!(Inner));
    printf("Container!(Inner)  compile-time bitmap: size=%lu, bits=0x%lx\n",
        cast(ulong) bitmapTuple[0], cast(ulong) bitmapTuple[1]);

    enum bitmapExplicit = __traits(getPointerBitmap, ContainerExplicit);
    printf("ContainerExplicit  compile-time bitmap: size=%lu, bits=0x%lx\n",
        cast(ulong) bitmapExplicit[0], cast(ulong) bitmapExplicit[1]);
    fflush(stdout);

    auto tupleContainer = makeTupleContainer();
    auto explicitContainer = makeExplicitContainer();

    // Check runtime GC attributes — this is the core bug
    auto tupleInfo = GC.query(cast(void*) tupleContainer);
    auto explicitInfo = GC.query(cast(void*) explicitContainer);

    bool tupleBug = !!(tupleInfo.attr & GC.BlkAttr.NO_SCAN);
    bool explicitBug = !!(explicitInfo.attr & GC.BlkAttr.NO_SCAN);

    if (tupleBug)
        log("BUG:  Container!(Inner) [tuple]    is NO_SCAN at runtime");
    else
        log("OK:   Container!(Inner) [tuple]    is scannable");

    if (explicitBug)
        log("BUG:  ContainerExplicit [explicit] is NO_SCAN at runtime");
    else
        log("OK:   ContainerExplicit [explicit] is scannable");

    // Wipe the stack to remove residual Inner pointers
    clobberStack();

    // Apply GC pressure and collect
    log("Forcing GC collection with memory pressure...");
    foreach (i; 0 .. 10)
    {
        foreach (j; 0 .. 10_000)
            cast(void) new int[64];
        GC.collect();
        GC.minimize();
    }
    log("GC collection done.");

    // Verify — if NO_SCAN, Inner objects may have been collected
    auto innerPtr = cast(void*) tupleContainer.ts[0];
    auto innerPtrExplicit = cast(void*) explicitContainer.inner;

    auto innerInfo = GC.query(innerPtr);
    auto innerInfoExplicit = GC.query(innerPtrExplicit);

    if (innerInfo.base is null)
        log("BUG:  Tuple container's Inner was COLLECTED by GC (dangling pointer!)");
    else
        log("OK:   Tuple container's Inner is still a valid GC allocation");

    if (innerInfoExplicit.base is null)
        log("BUG:  Explicit container's Inner was COLLECTED by GC");
    else
        log("OK:   Explicit container's Inner is still a valid GC allocation");
}

Save the reproducer as ldc_gc_bug_repro.d, then build and run:

ldmd2 -i -of=repro ldc_gc_bug_repro.d && ./repro

Expected output (DMD, and LDC <= 1.30.0)

=== LDC GC Bug Reproducer ===
Container!(Inner)  compile-time bitmap: size=24, bits=0x4
ContainerExplicit  compile-time bitmap: size=24, bits=0x4
OK:   Container!(Inner) [tuple]    is scannable
OK:   ContainerExplicit [explicit] is scannable
Forcing GC collection with memory pressure...
GC collection done.
OK:   Tuple container's Inner is still a valid GC allocation
OK:   Explicit container's Inner is still a valid GC allocation

Actual output (LDC 1.31.0 through 1.42.0)

=== LDC GC Bug Reproducer ===
Container!(Inner)  compile-time bitmap: size=24, bits=0x4
ContainerExplicit  compile-time bitmap: size=24, bits=0x4
BUG:  Container!(Inner) [tuple]    is NO_SCAN at runtime
OK:   ContainerExplicit [explicit] is scannable
Forcing GC collection with memory pressure...
GC collection done.
BUG:  Tuple container's Inner was COLLECTED by GC (dangling pointer!)
OK:   Explicit container's Inner is still a valid GC allocation

Note: the compile-time pointer bitmap (__traits(getPointerBitmap)) is correct in both cases. The bug is in the runtime ClassInfo.m_flags field.

Version bisection

LDC version        DMD frontend      Result
1.29.0             2.099.1           OK
1.30.0             2.100.1           OK
1.31.0-beta1       2.101.2           BUG
1.31.0             2.101.2           BUG
1.32.0 – 1.42.0    2.102 – 2.112     BUG

DMD 2.110.0 is not affected (all optimization levels).

Root cause analysis

The immediate cause

buildClassinfoFlags() in ir/irclass.cpp (line ~316) iterates pc->members and calls hasPointers() on each member. If no member reports having pointers, it sets ClassFlags::noPointers (0x2) in m_flags. At allocation time, _d_newclass in rt/lifetime.d checks ci.m_flags & ClassFlags.noPointers and sets BlkAttr.NO_SCAN, telling the GC to never scan the object.
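The flag and the block attribute can be inspected directly from D code. The sketch below reads typeid(T).m_flags (the TypeInfo_Class field that holds the ClassFlags bits) and compares it with the attribute the GC reports for a freshly allocated instance. The helper names and the Plain/Holder classes are hypothetical and only for illustration. According to the analysis above, on LDC the NO_SCAN bit follows m_flags through _d_newclass. Other compilers or runtime versions may lower `new` differently, so the two printed values are not guaranteed to match everywhere.

```d
import core.memory : GC;
import core.stdc.stdio : printf;

final class Inner { int value; }
final class Plain { int x; }          // no pointer-bearing fields
final class Holder { Inner inner; }   // one class-reference field

/// True if the compiler-emitted ClassInfo claims the class has no pointers.
bool classInfoNoPointers(T)() if (is(T == class))
{
    return (typeid(T).m_flags & TypeInfo_Class.ClassFlags.noPointers) != 0;
}

/// True if the GC block backing `obj` carries the NO_SCAN attribute.
bool blockIsNoScan(Object obj)
{
    return (GC.query(cast(void*) obj).attr & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    printf("Plain:  noPointers=%d NO_SCAN=%d\n",
        cast(int) classInfoNoPointers!Plain(), cast(int) blockIsNoScan(new Plain));
    printf("Holder: noPointers=%d NO_SCAN=%d\n",
        cast(int) classInfoNoPointers!Holder(), cast(int) blockIsNoScan(new Holder));
}
```

Replacing Holder with Container!(Inner) from the reproducer shows the regression in the ClassInfo flag itself, independent of any GC collection.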

What changed between LDC 1.30.0 and 1.31.0

The DMD frontend was upgraded from 2.100.1 to 2.101.2 (commit b7624aa625). In the old frontend, when a variadic tuple field like Ts ts is expanded during semantic analysis into individual fields (__ts_field_0, __ts_field_1, ...), the expanded fields were pushed into sc.scopesym.members:

v1.30.0, dmd/dsymbolsem.d ~line 672:

v.dsymbolSemantic(sc);
if (sc.scopesym)
{
    if (sc.scopesym.members)
        sc.scopesym.members.push(v);  // expanded tuple fields added to members
}

v1.31.0-beta1 — this members.push(v) block was removed:

v.dsymbolSemantic(sc);
Expression e = new VarExp(dsym.loc, v);

The consequence

  • v1.30.0: members = [flag, ts, __ts_field_0, ...], with one expanded field per tuple element. When buildClassinfoFlags iterates members, it reaches __ts_field_0 (of type Inner for Container!(Inner)), whose hasPointers() returns true, so the class is correctly marked as scannable.

  • v1.31.0+: members = [flag, ts]. buildClassinfoFlags only sees the original ts VarDeclaration, whose type is a TypeTuple. TypeTuple does not override hasPointers(); it inherits the base Type.hasPointers(), which returns false, so the class is incorrectly marked noPointers.

Note that the fields array (used by determineFields() and __traits(getPointerBitmap)) is correct in both versions — it properly resolves through v.aliassym to the TupleDeclaration and visits expanded fields. The bug is only in members, which buildClassinfoFlags relies on.
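The difference between walking members and walking fields can be illustrated with a small self-contained model. The types below (Ty, ClassRefTy, TupleTy, Sym) are hypothetical stand-ins, not the frontend's real AST classes. They only mirror the behaviour described above, where the tuple type inherits a hasPointers() that returns false.

```d
import std.algorithm.searching : any;

/// Hypothetical, simplified stand-in for the frontend's Type hierarchy.
class Ty
{
    bool hasPointers() const { return false; } // base default, as described above
}

/// A class reference type: always a pointer.
final class ClassRefTy : Ty
{
    override bool hasPointers() const { return true; }
}

/// A type tuple that, like TypeTuple, does NOT override hasPointers().
final class TupleTy : Ty
{
    Ty[] parts;
    this(Ty[] parts...) { this.parts = parts.dup; }
}

/// A named member or field together with its type.
struct Sym { string name; Ty type; }

/// Mirrors the described check: noPointers is set if no symbol has pointers.
bool noPointers(Sym[] syms)
{
    return !syms.any!(s => s.type.hasPointers());
}

void main()
{
    auto boolTy = new Ty;           // type of `flag`
    auto innerTy = new ClassRefTy;  // type of Inner
    auto tsTy = new TupleTy(innerTy);

    // Symbol lists as described for Container!(Inner).
    Sym[] membersOld = [Sym("flag", boolTy), Sym("ts", tsTy), Sym("__ts_field_0", innerTy)];
    Sym[] membersNew = [Sym("flag", boolTy), Sym("ts", tsTy)];
    Sym[] fields     = [Sym("flag", boolTy), Sym("__ts_field_0", innerTy)];

    assert(!noPointers(membersOld)); // v1.30.0: expanded field is seen, class is scanned
    assert( noPointers(membersNew)); // v1.31.0+: only the tuple is seen, wrongly noPointers
    assert(!noPointers(fields));     // a fields-based walk gives the right answer
}
```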

Confirming via LLVM IR

Comparing the LLVM IR for the Container!(Inner) ClassInfo between the two versions:

  • v1.30.0: m_flags = i32 60 (no noPointers bit)
  • v1.31.0-beta1: m_flags = i32 62 (difference = 2 = ClassFlags.noPointers)

The RTInfo pointer bitmap field is identical and correct in both versions.
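The two m_flags values can be decoded against druntime's TypeInfo_Class.ClassFlags enum. The sketch below lists which named bits are set in each value. decodeClassFlags is a hypothetical helper name, and the exact set of enum members depends on the druntime version in use.

```d
import std.stdio : writeln;

/// Returns the names of the TypeInfo_Class.ClassFlags bits set in `raw`.
string[] decodeClassFlags(uint raw)
{
    string[] names;
    static foreach (name; __traits(allMembers, TypeInfo_Class.ClassFlags))
    {
        if (raw & __traits(getMember, TypeInfo_Class.ClassFlags, name))
            names ~= name;
    }
    return names;
}

void main()
{
    writeln("60: ", decodeClassFlags(60));
    writeln("62: ", decodeClassFlags(62));
    writeln("62 - 60 == noPointers: ", 62 - 60 == TypeInfo_Class.ClassFlags.noPointers);
}
```

In binary, 60 is 0b111100 and 62 is 0b111110. The only bit present in 62 but not in 60 is 0x2, which is noPointers.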

Suggested fix

The most targeted fix would be in buildClassinfoFlags() in ir/irclass.cpp. Options:

  1. Iterate cd->fields instead of cd->members; fields already contains the correctly expanded tuple fields.
  2. Resolve through aliassym for tuple VarDeclarations — when a VarDeclaration has aliassym pointing to a TupleDeclaration, iterate the tuple's expanded members.
  3. Fix TypeTuple.hasPointers() upstream — add an override in TypeTuple that checks whether any constituent type has pointers. This would also fix the issue in the DMD frontend for any other callers.
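Option 3 can be illustrated with a simplified model. Once the tuple type forwards hasPointers() to its constituent types, even a members-based walk that only sees the unexpanded ts symbol reports the class as scannable. Ty, ClassRefTy and TupleTy are hypothetical stand-ins. A real patch would target TypeTuple in the frontend, whose exact member names are not reproduced here.

```d
import std.algorithm.searching : any;

/// Hypothetical stand-in for the frontend's Type base class.
class Ty
{
    bool hasPointers() const { return false; }
}

/// A class reference type: always a pointer.
final class ClassRefTy : Ty
{
    override bool hasPointers() const { return true; }
}

/// Tuple type WITH the proposed override: it has pointers if any part does.
final class TupleTy : Ty
{
    const(Ty)[] parts;
    this(const(Ty)[] parts...) { this.parts = parts.dup; }

    override bool hasPointers() const
    {
        return parts.any!(p => p.hasPointers());
    }
}

void main()
{
    // Member types of Container!(Inner) as seen after 1.31.0: [bool, (Inner)].
    const(Ty)[] memberTypes = [new Ty, new TupleTy(new ClassRefTy)];
    assert(memberTypes.any!(t => t.hasPointers())); // no longer noPointers

    // A tuple of pointer-free types still yields noPointers.
    const(Ty)[] plainTypes = [new Ty, new TupleTy(new Ty, new Ty)];
    assert(!plainTypes.any!(t => t.hasPointers()));
}
```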

Impact

This is a silent memory corruption bug. Objects referenced only through variadic tuple fields in template classes can be collected by the GC while still in use. This leads to dangling pointers, use-after-free crashes, and data corruption — with no compiler warnings or runtime diagnostics.

Any D code compiled with LDC 1.31.0+ is affected if it stores variadic tuple parameters as fields of a template class and the class has no other pointer-bearing members.

Environment

  • LDC versions tested: 1.29.0, 1.30.0, 1.31.0-beta1, 1.31.0, 1.32.0, 1.33.0, 1.34.0, 1.35.0, 1.36.0, 1.37.0, 1.38.0, 1.39.0, 1.40.0, 1.41.0, 1.42.0
  • DMD versions tested: 2.110.0 (not affected)
  • Platform: Linux x86_64 (Amazon Linux 2023)
  • Optimization levels: All (-O0, -O2, etc.) — bug is present regardless of optimization
