GC collects live objects when using template classes with tuple fields (LDC regression) #5068
Description
We are seeing the GC collect objects that are still reachable when they are stored inside a template class with a variadic tuple field (Ts ts). Under LDC, such classes are sometimes allocated with the GC.BlkAttr.NO_SCAN attribute, so the GC never scans them and treats referenced objects as unreachable.
As a temporary workaround, we manually clear the NO_SCAN flag after allocation:
auto EventLoop(Ts...)(Ts ts)
{
auto eloop = new EventLoopImpl!(Ts)(ts);
// BUG introduced in LDC 1.31.0
version(LDC)
{
uint attr = GC.getAttr(cast(void*)eloop);
if (attr & GC.BlkAttr.NO_SCAN)
{
info("Have to remove NO_SCAN attribute from EventLoopImpl due to LDC GC bug");
GC.clrAttr(cast(void*)eloop, GC.BlkAttr.NO_SCAN);
assert ((GC.getAttr(cast(void*)eloop) & GC.BlkAttr.NO_SCAN) == 0, "Still has NO_SCAN attribute after clearing it");
}
}
return eloop;
}
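The same workaround can be factored into a helper that works for any class instance. The following is only a minimal sketch: `ensureScanned` is a hypothetical name that does not appear in our code base, and the unittest forces NO_SCAN with `GC.setAttr` to simulate the miscompiled allocation rather than relying on a particular compiler version. It can be tried with `-unittest -main`.

```d
import core.memory : GC;

/// Hypothetical helper: clears NO_SCAN on the GC block backing a class instance.
/// Returns true if the attribute was set and has been cleared, false otherwise.
bool ensureScanned(T)(T obj) if (is(T == class))
{
    auto p = cast(void*) obj;
    if ((GC.getAttr(p) & GC.BlkAttr.NO_SCAN) == 0)
        return false;               // block is already scanned, nothing to do
    GC.clrAttr(p, GC.BlkAttr.NO_SCAN);
    return true;
}

unittest
{
    static class Holder { Object payload; }

    auto obj = new Holder;
    auto p = cast(void*) obj;

    // Simulate the bad allocation by forcing NO_SCAN on the block.
    GC.setAttr(p, GC.BlkAttr.NO_SCAN);

    assert(ensureScanned(obj));     // attribute was set and is now cleared
    assert(!ensureScanned(obj));    // second call is a no-op
    assert((GC.getAttr(p) & GC.BlkAttr.NO_SCAN) == 0);
}
```

Clearing NO_SCAN on an object that really has no pointers is harmless apart from some extra scanning work, so the helper does not try to decide whether the flag was set legitimately.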
The AI-assisted analysis of the bug is attached below. It includes a reproducible example, an investigation of the frontend and IR changes across versions, and an explanation of how the class flags are derived.
[REG 1.31] Template classes with variadic tuple fields incorrectly allocated as NO_SCAN, causing GC to collect live objects
Summary
Template classes with variadic tuple fields (Ts... ts) are incorrectly allocated with the NO_SCAN GC attribute. This causes the GC to never scan the object's memory, so any objects referenced only through tuple fields become unreachable and get collected — leaving dangling pointers and causing use-after-free crashes.
The bug is present in every LDC release from 1.31.0 through 1.42.0 (latest tested). It does not affect DMD. It occurs at all optimization levels (-O0, -O2, etc.).
Reproducer
import core.memory : GC;
import core.stdc.stdio : printf, fflush, stdout;
void log(const(char)* msg)
{
printf("%s\n", msg);
fflush(stdout);
}
class Inner
{
int value;
this(int v) { value = v; }
}
// Template class storing type params as tuple fields — triggers the bug.
final class Container(Ts...)
{
bool flag;
Ts ts;
this(Ts args) { ts = args; }
}
// Equivalent non-template class for comparison — works correctly.
final class ContainerExplicit
{
bool flag;
Inner inner;
this(Inner i) { inner = i; }
}
pragma(inline, false)
Container!(Inner) makeTupleContainer()
{
return new Container!(Inner)(new Inner(42));
}
pragma(inline, false)
ContainerExplicit makeExplicitContainer()
{
return new ContainerExplicit(new Inner(42));
}
pragma(inline, false)
void clobberStack()
{
long[256] junk;
junk[] = 0xDEADBEEF;
if (junk[0] == 0) log("unreachable");
}
void main()
{
log("=== LDC GC Bug Reproducer ===");
// Compile-time bitmaps are correct in both cases
enum bitmapTuple = __traits(getPointerBitmap, Container!(Inner));
printf("Container!(Inner) compile-time bitmap: size=%lu, bits=0x%lx\n",
cast(ulong) bitmapTuple[0], cast(ulong) bitmapTuple[1]);
enum bitmapExplicit = __traits(getPointerBitmap, ContainerExplicit);
printf("ContainerExplicit compile-time bitmap: size=%lu, bits=0x%lx\n",
cast(ulong) bitmapExplicit[0], cast(ulong) bitmapExplicit[1]);
fflush(stdout);
auto tupleContainer = makeTupleContainer();
auto explicitContainer = makeExplicitContainer();
// Check runtime GC attributes — this is the core bug
auto tupleInfo = GC.query(cast(void*) tupleContainer);
auto explicitInfo = GC.query(cast(void*) explicitContainer);
bool tupleBug = !!(tupleInfo.attr & GC.BlkAttr.NO_SCAN);
bool explicitBug = !!(explicitInfo.attr & GC.BlkAttr.NO_SCAN);
if (tupleBug)
log("BUG: Container!(Inner) [tuple] is NO_SCAN at runtime");
else
log("OK: Container!(Inner) [tuple] is scannable");
if (explicitBug)
log("BUG: ContainerExplicit [explicit] is NO_SCAN at runtime");
else
log("OK: ContainerExplicit [explicit] is scannable");
// Wipe the stack to remove residual Inner pointers
clobberStack();
// Apply GC pressure and collect
log("Forcing GC collection with memory pressure...");
foreach (i; 0 .. 10)
{
foreach (j; 0 .. 10_000)
cast(void) new int[64];
GC.collect();
GC.minimize();
}
log("GC collection done.");
// Verify — if NO_SCAN, Inner objects may have been collected
auto innerPtr = cast(void*) tupleContainer.ts[0];
auto innerPtrExplicit = cast(void*) explicitContainer.inner;
auto innerInfo = GC.query(innerPtr);
auto innerInfoExplicit = GC.query(innerPtrExplicit);
if (innerInfo.base is null)
log("BUG: Tuple container's Inner was COLLECTED by GC (dangling pointer!)");
else
log("OK: Tuple container's Inner is still a valid GC allocation");
if (innerInfoExplicit.base is null)
log("BUG: Explicit container's Inner was COLLECTED by GC");
else
log("OK: Explicit container's Inner is still a valid GC allocation");
}
Build and run:
ldmd2 -i -of=repro ldc_gc_bug_repro.d && ./repro
Expected output (DMD, and LDC <= 1.30.0)
=== LDC GC Bug Reproducer ===
Container!(Inner) compile-time bitmap: size=24, bits=0x4
ContainerExplicit compile-time bitmap: size=24, bits=0x4
OK: Container!(Inner) [tuple] is scannable
OK: ContainerExplicit [explicit] is scannable
Forcing GC collection with memory pressure...
GC collection done.
OK: Tuple container's Inner is still a valid GC allocation
OK: Explicit container's Inner is still a valid GC allocation
Actual output (LDC 1.31.0 through 1.42.0)
=== LDC GC Bug Reproducer ===
Container!(Inner) compile-time bitmap: size=24, bits=0x4
ContainerExplicit compile-time bitmap: size=24, bits=0x4
BUG: Container!(Inner) [tuple] is NO_SCAN at runtime
OK: ContainerExplicit [explicit] is scannable
Forcing GC collection with memory pressure...
GC collection done.
BUG: Tuple container's Inner was COLLECTED by GC (dangling pointer!)
OK: Explicit container's Inner is still a valid GC allocation
Note: the compile-time pointer bitmap (__traits(getPointerBitmap)) is correct in both cases. The bug is in the runtime ClassInfo.m_flags field.
Version bisection
| LDC version | DMD frontend | Result |
|---|---|---|
| 1.29.0 | 2.099.1 | OK |
| 1.30.0 | 2.100.1 | OK |
| 1.31.0-beta1 | 2.101.2 | BUG |
| 1.31.0 | 2.101.2 | BUG |
| 1.32.0 – 1.42.0 | 2.102 – 2.112 | BUG |
DMD 2.110.0 is not affected (all optimization levels).
Root cause analysis
The immediate cause
buildClassinfoFlags() in ir/irclass.cpp (line ~316) iterates pc->members and calls hasPointers() on each member. If no member reports having pointers, it sets ClassFlags::noPointers (0x2) in m_flags. At allocation time, _d_newclass in rt/lifetime.d checks ci.m_flags & ClassFlags.noPointers and sets BlkAttr.NO_SCAN, telling the GC to never scan the object.
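Both ends of this chain can be observed from user code. The sketch below reads the noPointers bit from the compiler-emitted ClassInfo and the NO_SCAN attribute from the GC block. The helper names `classInfoSaysNoPointers` and `blockIsNoScan` and the classes `PlainData` and `Holder` are made up for illustration; `m_flags`, `TypeInfo_Class.ClassFlags.noPointers`, and `GC.getAttr` are the druntime names. It only prints the two values side by side and does not assume they always agree, since that depends on the allocation path the compiler uses.

```d
import core.memory : GC;
import core.stdc.stdio : printf;

/// True if the ClassInfo emitted for T carries ClassFlags.noPointers.
bool classInfoSaysNoPointers(T)() if (is(T == class))
{
    return (typeid(T).m_flags & TypeInfo_Class.ClassFlags.noPointers) != 0;
}

/// True if the GC block backing `obj` is currently marked NO_SCAN.
bool blockIsNoScan(T)(T obj) if (is(T == class))
{
    return (GC.getAttr(cast(void*) obj) & GC.BlkAttr.NO_SCAN) != 0;
}

class Inner { int value; }
final class PlainData { bool flag; int count; }  // no pointer fields
final class Holder { bool flag; Inner inner; }   // one pointer field

unittest
{
    auto plain = new PlainData;
    auto holder = new Holder;

    printf("PlainData: noPointers=%d NO_SCAN=%d\n",
        cast(int) classInfoSaysNoPointers!PlainData, cast(int) blockIsNoScan(plain));
    printf("Holder:    noPointers=%d NO_SCAN=%d\n",
        cast(int) classInfoSaysNoPointers!Holder, cast(int) blockIsNoScan(holder));

    // A class with an explicit pointer field must never be flagged or allocated as pointer-free.
    assert(!classInfoSaysNoPointers!Holder);
    assert(!blockIsNoScan(holder));
}
```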
What changed between LDC 1.30.0 and 1.31.0
The DMD frontend was upgraded from 2.100.1 to 2.101.2 (commit b7624aa625). In the old frontend, when a variadic tuple field like Ts ts is expanded during semantic analysis into individual fields (__ts_field_0, __ts_field_1, ...), the expanded fields were pushed into sc.scopesym.members:
v1.30.0 — dmd/dsymbolsem.d ~line 672:
v.dsymbolSemantic(sc);
if (sc.scopesym)
{
if (sc.scopesym.members)
sc.scopesym.members.push(v); // expanded tuple fields added to members
}
v1.31.0-beta1: this members.push(v) block was removed:
v.dsymbolSemantic(sc);
Expression e = new VarExp(dsym.loc, v);
The consequence
- v1.30.0: members = [flag, ts, __ts_field_0, __ts_field_1]. When buildClassinfoFlags iterates members, it reaches __ts_field_0 (e.g. type Inner) and calls hasPointers(), which returns true, so the class is correctly marked as scannable.
- v1.31.0+: members = [flag, ts]. buildClassinfoFlags only sees the original ts VarDeclaration, whose type is TypeTuple. TypeTuple does not override hasPointers(); it inherits the base Type.hasPointers(), which returns false, so the class is incorrectly marked as noPointers.
Note that the fields array (used by determineFields() and __traits(getPointerBitmap)) is correct in both versions — it properly resolves through v.aliassym to the TupleDeclaration and visits expanded fields. The bug is only in members, which buildClassinfoFlags relies on.
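Because the bitmap and the flag come from different symbol lists, a disagreement between them can be detected from user code. The sketch below is one possible check, not something LDC or druntime provides: `bitmapHasPointers` and `flagsDisagreeWithBitmap` are hypothetical helper names, and the classes are copied from the reproducer. It assumes, as the reproducer output shows, that element 0 of the bitmap is the instance size and the remaining words hold the pointer bits.

```d
class Inner { int value; this(int v) { value = v; } }

final class Container(Ts...)
{
    bool flag;
    Ts ts;
    this(Ts args) { ts = args; }
}

final class ContainerExplicit
{
    bool flag;
    Inner inner;
    this(Inner i) { inner = i; }
}

/// True if the compile-time pointer bitmap of T has any bit set.
bool bitmapHasPointers(T)() if (is(T == class))
{
    // Element 0 is the instance size; the following words are the bitmap itself.
    static immutable bitmap = __traits(getPointerBitmap, T);
    foreach (i; 1 .. bitmap.length)
        if (bitmap[i] != 0)
            return true;
    return false;
}

/// True if the bitmap reports pointers while ClassInfo.m_flags claims there are none.
bool flagsDisagreeWithBitmap(T)() if (is(T == class))
{
    immutable noPointers =
        (typeid(T).m_flags & TypeInfo_Class.ClassFlags.noPointers) != 0;
    return bitmapHasPointers!T && noPointers;
}

unittest
{
    // The bitmap sees the expanded tuple field in every tested version.
    assert(bitmapHasPointers!(Container!Inner));
    // The explicit class is consistent everywhere.
    assert(!flagsDisagreeWithBitmap!ContainerExplicit);
}
```

Calling `flagsDisagreeWithBitmap!(Container!Inner)()` should return true on an affected compiler according to the analysis above; the sketch deliberately does not assert that, so it passes on fixed and unaffected compilers too.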
Confirming via LLVM IR
Comparing the LLVM IR for the Container!(Inner) ClassInfo between the two versions:
- v1.30.0: m_flags = i32 60 (no noPointers bit)
- v1.31.0-beta1: m_flags = i32 62 (difference = 2 = ClassFlags.noPointers)
The RTInfo pointer bitmap field is identical and correct in both versions.
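The 60 versus 62 arithmetic can be checked mechanically against druntime's own flag definition. This is a small compile-time sketch; the constant names `flagsV130` and `flagsV131Beta1` are invented here, and their values are simply the two m_flags numbers quoted above.

```d
alias ClassFlags = TypeInfo_Class.ClassFlags;

// m_flags values quoted from the LLVM IR comparison.
enum uint flagsV130 = 60;       // LDC 1.30.0
enum uint flagsV131Beta1 = 62;  // LDC 1.31.0-beta1

// The two values differ in exactly one bit, and that bit is noPointers (0x2).
static assert((flagsV130 ^ flagsV131Beta1) == ClassFlags.noPointers);
static assert((flagsV130 & ClassFlags.noPointers) == 0);
static assert((flagsV131Beta1 & ClassFlags.noPointers) != 0);
```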
Suggested fix
The most targeted fix would be in buildClassinfoFlags() in ir/irclass.cpp. Options:
- Iterate cd->fields instead of cd->members: fields already has the correctly expanded tuple fields.
- Resolve through aliassym for tuple VarDeclarations: when a VarDeclaration has aliassym pointing to a TupleDeclaration, iterate the tuple's expanded members.
- Fix TypeTuple.hasPointers() upstream: add an override in TypeTuple that checks whether any constituent type has pointers. This would also fix the issue in the DMD frontend for any other callers.
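The rule behind the third option (a tuple has pointers if any of its constituent types does) can be illustrated in user-space D. This is only an analogy to the proposed frontend change, not compiler code: `tupleHasPointers` is a hypothetical name, and `std.traits.hasIndirections` is used as a stand-in for the frontend's hasPointers(), which may differ from it in edge cases.

```d
import std.meta : anySatisfy;
import std.traits : hasIndirections;

/// User-space analogue of the proposed override:
/// true if at least one type in Ts contains an indirection.
enum bool tupleHasPointers(Ts...) = anySatisfy!(hasIndirections, Ts);

class Inner { int value; }

static assert( tupleHasPointers!(Inner));
static assert( tupleHasPointers!(int, Inner, bool));
static assert(!tupleHasPointers!(int, bool));
static assert(!tupleHasPointers!());   // an empty tuple has no pointers
```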
Impact
This is a silent memory corruption bug. Objects referenced only through variadic tuple fields in template classes can be collected by the GC while still in use. This leads to dangling pointers, use-after-free crashes, and data corruption — with no compiler warnings or runtime diagnostics.
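Any D code that stores variadic tuple parameters as fields of a template class is potentially affected when compiled with LDC 1.31.0 through 1.42.0. Following the root cause above, the class is misallocated when no other member of the class or its base classes reports pointers.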
Environment
- LDC versions tested: 1.29.0, 1.30.0, 1.31.0-beta1, 1.31.0, 1.32.0, 1.33.0, 1.34.0, 1.35.0, 1.36.0, 1.37.0, 1.38.0, 1.39.0, 1.40.0, 1.41.0, 1.42.0
- DMD versions tested: 2.110.0 (not affected)
- Platform: Linux x86_64 (Amazon Linux 2023)
- Optimization levels: All (-O0, -O2, etc.); the bug is present regardless of optimization