|
| 1 | +# W-PYTORCH-CM-(ii) — StoreAttr managed-dict tag-flip corruption |
| 2 | + |
| 3 | +**Status:** PARKED behind pure-C JIT roadmap completion (Alex |
| 4 | +2026-04-27T07:12:25Z, supervisor cascade 07:13:18Z; D-1777270945). |
| 5 | +Failing-test sentinel: |
| 6 | +``Lib/test/test_phoenix_jit_storeattr_managed_dict_tag_flip.py`` |
| 7 | +(``@unittest.expectedFailure``). |
| 8 | + |
| 9 | +**Workstream history:** D-1777180692 state-of-knowledge brief + |
| 10 | +``docs/w-pytorch-cm-tooling-note.md`` running investigation log. |
| 11 | +W-PYTORCH-CM was split 2026-04-26T11:37Z into (i) compile-time |
| 12 | +type-confusion (FIXED at push 63 by adding |
| 13 | +``hir_c_primitive_compare_op`` accessor) and (ii) the runtime |
| 14 | +StoreAttr corruption documented here. (ii) is structurally |
| 15 | +INDEPENDENT of (i) per testkeeper valgrind discriminator |
| 16 | +2026-04-26T11:37:15Z (D2 LSB transition still captured post-(i) |
| 17 | +fix). |
| 18 | + |
| 19 | +## Symptom |
| 20 | + |
| 21 | +``` |
| 22 | +$ ./python /tmp/repro_s3.py |
| 23 | +... (50,000-iter bench_pytorch_cm post-force_compile) ... |
| 24 | +Segmentation fault (core dumped) |
| 25 | +``` |
| 26 | + |
| 27 | +Crash is a NULL+0xAB deref inside ``PyDict_SetItem`` reaching |
| 28 | +``Py_TYPE(NULL)->tp_flags`` (offset 0xAB into ``PyTypeObject``). |
| 29 | +Confirmed via ASAN on push 63 (testkeeper 2026-04-26T13:03Z): the |
| 30 | +SEGV is a **downstream consequence** of an LSB-clear at ``obj + |
| 31 | +0x18`` — not a wild write or UAF. |
| 32 | + |
| 33 | +## Mechanism (narrowed; writer un-localized) |
| 34 | + |
| 35 | +CPython's managed-dict encoding stores ``(char*)values_ptr - 1`` in
| 36 | +the slot at ``obj + 0x18``. 8-aligned addresses end in ``0x0`` /
| 37 | +``0x8``, so the encoded form ends in ``0xF`` / ``0x7`` (low 3 bits
| 38 | +``0b111``) when IsValues is set. ``IsDict`` is signalled by LSB == 0.
| 39 | + |
| 40 | +Sequence observed in the repro: |
| 41 | + |
| 42 | +1. ``D2[0]`` snapshot: slot byte 0 = ``0x97`` (correct IsValues |
| 43 | + encoding for ``values_ptr = 0x98``; T2.5 confirmed ``0x98`` is the |
| 44 | + heavily-recycled values chunk). |
| 45 | +2. ``D2[1]`` snapshot at the same ``obj`` address: slot byte 0 = |
| 46 | + ``0x96`` — exactly one bit cleared (LSB). |
| 47 | +3. The IsDict path reads the now-LSB-zero word as a ``PyDictObject*``; |
| 48 | + ``ob_type`` at offset 8 of ``0x96`` is NULL/junk. |
| 49 | +4. ``PyDict_SetItem`` is called with that bogus dict pointer and
| 50 | +   SEGVs reading ``tp_flags`` through the NULL ``ob_type``.
| 51 | + |
| 52 | +**Class-invariant pattern:** byte 0 of ``obj + 0x18`` for ``_NoGrad`` |
| 53 | +instances allocated by ``Tools/benchmark_phoenix.py:bench_pytorch_cm`` |
| 54 | +gets its low bit cleared. Pattern cannot result from any vanilla |
| 55 | +CPython slot write (writes ``0x97`` IsValues, ``0x00`` NULL, or an |
| 56 | +8-aligned dict pointer ending ``0x0`` / ``0x8``). |
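
The encoding and the class-invariant argument above can be modelled with plain integer arithmetic. This is a minimal sketch, not CPython or Phoenix code: `encode_values`, `takes_dict_path`, and `is_legit_slot_word` are hypothetical helper names, and only byte 0 of the slot is modelled, using the `0x98` / `0x97` / `0x96` values from the `D2` snapshots.

```python
def encode_values(values_ptr: int) -> int:
    """Slot word for an 8-aligned values pointer: the pointer minus one."""
    assert values_ptr % 8 == 0, "values pointers are 8-aligned"
    return values_ptr - 1


def takes_dict_path(slot: int) -> bool:
    """The tag check described above: LSB == 0 selects the IsDict path."""
    return slot & 1 == 0


def is_legit_slot_word(slot: int) -> bool:
    """True for the three forms a vanilla write can leave in the slot:
    NULL, an 8-aligned dict pointer, or a values pointer minus one."""
    return slot % 8 == 0 or slot % 8 == 7


before = encode_values(0x98)   # D2[0]: byte 0 of the slot
after = before & ~1            # D2[1]: same byte with the LSB cleared

print(hex(before), takes_dict_path(before), is_legit_slot_word(before))
print(hex(after), takes_dict_path(after), is_legit_slot_word(after))
```

In this model `0x96` passes the tag check for the dict path but is neither 8-aligned nor a values encoding, which is the reason no ordinary slot write accounts for it.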
| 57 | + |
| 58 | +**Source-level audit (Phoenix Python/cinderx + Python/jit):** NO |
| 59 | +direct writes to ``obj + 0x18``. The Phoenix source only READS via |
| 60 | +``_PyObject_DictOrValuesPointer`` (e.g. ``SplitMutator::setAttr`` / |
| 61 | +``getAttr``) using the correct macros. |
| 62 | + |
| 63 | +**JIT-emit caveat (pythia #156 #1):** the source-grep audit covers |
| 64 | +source-level writes only. JIT-emitted machine-code writes (Phoenix |
| 65 | +runtime helpers, JIT-emitted prologues) are NOT testable by source |
| 66 | +grep. The writer for the LSB-clear remains undischarged by the |
| 67 | +audit. |
| 68 | + |
| 69 | +## Hypothesis classes after cheap-tier exhaustion |
| 70 | + |
| 71 | +(2026-04-26T14:21:09Z — discriminator-saturated, GENUINE PAUSE called |
| 72 | +by supervisor). Five candidates were enumerated; two are FALSIFIED
| 73 | +((e) with one residual cheap-tier check unrun); three remain OPEN.
| 74 | + |
| 75 | +| Class | Description | Status | |
| 76 | +|-------|-------------|--------| |
| 77 | +| (a) | Narrow 1-byte writer at ``obj+0x18`` byte 0 (AND-with-~1, sub-1, or direct ``0x96`` store) | OPEN | |
| 78 | +| (b) | Wider write clipping LSB (2/4/8-byte store whose low byte happens to be ``0x96``) | OPEN | |
| 79 | +| (c) | Wild write / UAF coincidentally LSB-aligned at ``obj+0x18`` | FALSIFIED (ASAN on push 63: crash is NULL+0xAB deref, not UAF; LSB-clear is the cause not the corruption itself) | |
| 80 | +| (d) | Two-instance conflation — ``D2[0]`` and ``D2[1]`` are different recycled instances at the same address; no single-instance mutation occurred | OPEN (cannot be discriminated from header bytes — refcnt + type_ptr + first8 identical for fresh ``_NoGrad`` instances; testkeeper 2026-04-26T14:20:44Z) | |
| 81 | +| (e) | Cache-load-side: ``TypeAttrCache`` value slot baked into JIT'd code at compile, racing with cache-slot writer → JIT loads torn value → STORE writes corrupted value to ``obj+0x18`` | FALSIFIED at the per-frame SEGV-site enumeration (3/3 cache slots tested by hardware-watchpoint: TYPE 0xd33020, VALUE 0xd42018, ``cache_`` 0xd5b2a0 all stable post-fill, no runtime writes during workload). RESIDUAL CHEAP-TIER UNRUN: broader objdump-grep across compile-unit cache-load immediates not enumerated. | |
| 82 | + |
| 83 | +## Trigger sensitivity |
| 84 | + |
| 85 | +Bug is **TIMING-SENSITIVE** (D-1777190733, testkeeper 2026-04-26T08:03Z): |
| 86 | +the original LSB=0 trigger DID NOT reproduce when the workload was
| 87 | +wrapped in a Python-level context manager (``__enter__`` /
| 88 | +``__exit__``). The wrapper added ~100µs/iter of Python interpreter
| 89 | +overhead, shifting JIT-call-counter timing relative to the
| 90 | +auto-compile threshold and thereby evading the trigger window.
| 91 | + |
| 92 | +Implication for instrumentation: any printf-class observer that adds
| 93 | +Python-level overhead may also evade the trigger window. Heavy-tier
| 94 | +discriminators (C-side allocate-counter, hardware watchpoint via
| 95 | +``tp_alloc`` hook) are the next observability tier.
| 96 | + |
| 97 | +## Reproducer |
| 98 | + |
| 99 | +``/tmp/repro_s3.py`` (228 bytes, preserved verbatim in |
| 100 | +``Lib/test/test_phoenix_jit_storeattr_managed_dict_tag_flip.py`` as |
| 101 | +the test's HARNESS_SOURCE): |
| 102 | + |
| 103 | +```python |
| 104 | +import sys; sys.path.insert(0, 'Tools') |
| 105 | +import _cinderx, cinderjit |
| 106 | +from benchmark_phoenix import bench_pytorch_cm |
| 107 | +bench_pytorch_cm(5000) # warmup |
| 108 | +cinderjit.force_compile(bench_pytorch_cm) |
| 109 | +bench_pytorch_cm(50000) |
| 110 | +print("S3 OK") |
| 111 | +``` |
| 112 | + |
| 113 | +``bench_pytorch_cm`` is a self-contained |
| 114 | +``Tools/benchmark_phoenix.py`` benchmark exercising nested context |
| 115 | +managers (``_NoGrad`` / ``_Autocast`` / ``_ProfileScope``) — the |
| 116 | +PyTorch-style pattern that prompted the workstream name. No |
| 117 | +``torch`` runtime dependency. |
| 118 | + |
| 119 | +## Heavy-tier instrumentation designs (on disk, un-implemented) |
| 120 | + |
| 121 | +Both ~200 LOC + rebuild; gated on heavy-tier authorization (Alex |
| 122 | +direction OR explicit team auth) per governance D-1777190699. Both |
| 123 | +documented by theologian under the 2026-04-26 stand-down and ready |
| 124 | +for resumption. |
| 125 | + |
| 126 | +### tp_alloc hardware watchpoint |
| 127 | +``docs/w-pytorch-cm-tp-alloc-watchpoint-design.md`` |
| 128 | + |
| 129 | +Hook ``_NoGrad`` ``tp_alloc``; on each allocation set a 1-byte |
| 130 | +hardware watchpoint (DR0-DR3) on ``obj + 0x18`` with write-only |
| 131 | +trigger; SIGTRAP handler captures ``RIP`` + backtrace + register |
| 132 | +dump. Discriminates (a) narrow 1-byte writer vs (b) wider clipping |
| 133 | +write directly from the faulting instruction. (d) two-instance |
| 134 | +conflation manifests as "watchpoint never fires on watched instance |
| 135 | +even though ``D2`` captures the LSB transition on a different |
| 136 | +recycled instance". |
| 137 | + |
| 138 | +### Allocate-counter side-table |
| 139 | +``docs/w-pytorch-cm-allocate-counter-design.md`` |
| 140 | + |
| 141 | +Add a 64-bit monotonic ``alloc_id`` per ``_NoGrad`` instance via a |
| 142 | +hash-table side-table (keyed on ``obj`` pointer; populated at |
| 143 | +``init_inline_values``, looked up at the ``D2`` print site). |
| 144 | +Discriminates (d) instance conflation from (a)/(b) single-instance |
| 145 | +mutation by comparing ``D2[0].alloc_id`` to ``D2[1].alloc_id`` at |
| 146 | +the same ``obj`` address. |
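
The comparison this design relies on can be illustrated with a small Python model. It is only a sketch of the idea: the design itself targets C-side instrumentation, and `on_alloc`, `snapshot`, `classify`, and the address `ADDR` are hypothetical names and values chosen for illustration.

```python
import itertools

_next_alloc_id = itertools.count(1)   # monotonic counter
_alloc_ids: dict[int, int] = {}       # side-table keyed on object address


def on_alloc(addr: int) -> int:
    """Record a fresh alloc_id for the instance now living at addr."""
    alloc_id = next(_next_alloc_id)
    _alloc_ids[addr] = alloc_id
    return alloc_id


def snapshot(addr: int, slot_byte0: int) -> tuple[int, int]:
    """What a D2 print site would capture: (alloc_id, slot byte 0)."""
    return _alloc_ids[addr], slot_byte0


def classify(d2_0: tuple[int, int], d2_1: tuple[int, int]) -> str:
    """Same alloc_id means one instance mutated; different ids mean reuse."""
    if d2_0[0] == d2_1[0]:
        return "single-instance mutation: (a) or (b)"
    return "two-instance conflation: (d)"


ADDR = 0x7F0000001000                  # hypothetical object address

on_alloc(ADDR)
first = snapshot(ADDR, 0x97)
second = snapshot(ADDR, 0x96)          # no re-allocation in between
same_instance = classify(first, second)

on_alloc(ADDR)                         # address recycled for a new instance
third = snapshot(ADDR, 0x96)
recycled = classify(first, third)
```

Keying on the address alone cannot separate the two cases, because a recycled instance lands at the same address; the monotonic id is what separates them.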
| 147 | + |
| 148 | +**Recommended ordering** (per |
| 149 | +``w-pytorch-cm-tp-alloc-watchpoint-design.md`` §"Comparison"): if |
| 150 | +only one design is authorized, run ``tp_alloc`` watchpoint first — |
| 151 | +it directly identifies the writer when (a) or (b) holds. If the |
| 152 | +watchpoint never fires on the watched instance during a confirmed |
| 153 | +``D2`` transition, (d) becomes the load-bearing hypothesis and the |
| 154 | +allocate-counter design is then run. |
| 155 | + |
| 156 | +## Why parked (Alex 2026-04-27T07:12:25Z) |
| 157 | + |
| 158 | +Bug only fires under the contrived ``repro_s3.py`` 50,000-iter |
| 159 | +workload after explicit ``force_compile``. Not seen in: |
| 160 | + |
| 161 | +- The CinderX prod codebase (``cinderx_dev`` oracle PASS; |
| 162 | + D-1775658159 11-day Alex prior-art). |
| 163 | +- The regular Phoenix test suite (480-test x86_64 + 483-test ARM64 |
| 164 | + runs). |
| 165 | +- The 24-benchmark ABBA + per-commit 4-benchmark gate. |
| 166 | + |
| 167 | +The fix-class falsifier (cinderx_dev oracle) passes, so core Cinder
| 168 | +does not exhibit this bug. Per
| 169 | +``feedback_assume_phoenix_regression.md`` the bug is presumed
| 170 | +Phoenix-introduced and warrants a real fix, but Alex's 07:12:25Z
| 171 | +direction sequences it after the pure-C JIT roadmap is complete.
| 172 | + |
| 173 | +## Resumption gate |
| 174 | + |
| 175 | +Before re-engaging the writer hunt: |
| 176 | + |
| 177 | +1. Re-confirm ``Lib/test/test_phoenix_jit_storeattr_managed_dict_tag_flip.py`` |
| 178 | + still ``expectedFailure``s on the current HEAD (subprocess SEGV |
| 179 | + reproducible). |
| 180 | +2. Read ``docs/w-pytorch-cm-tooling-note.md`` for the full |
| 181 | + investigation log including 6 falsified hypotheses, the 3-cycle |
| 182 | + D8 + T2.5 reconciliation, and the |
| 183 | + ``shouldSkipCompilation``-skip-list anti-pattern warning (pythia |
| 184 | + #154 #4). |
| 185 | +3. Choose ``tp_alloc``-watchpoint, allocate-counter, or both per |
| 186 | + the comparison table in |
| 187 | + ``w-pytorch-cm-tp-alloc-watchpoint-design.md``. |
| 188 | +4. Heavy-tier authorization required per governance D-1777190699 + |
| 189 | + D-1777270945 (Alex parking decision; resumption is the trigger to |
| 190 | + re-engage). |
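
For step 1, a subprocess check for a SIGSEGV exit might look like the sketch below. How the sentinel test itself performs this check is not shown in this note, so the helper `died_with_sigsegv` is a hypothetical name, and the child used here is a stand-in that signals itself so the example runs without a Phoenix build (POSIX only).

```python
import signal
import subprocess
import sys


def died_with_sigsegv(argv: list[str]) -> bool:
    """Run argv in a child process; True if it was killed by SIGSEGV."""
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    # On POSIX, a negative returncode is the number of the killing signal.
    return proc.returncode == -signal.SIGSEGV


# Stand-in child that raises SIGSEGV on itself. A real check would pass
# the Phoenix interpreter and the repro script instead.
stand_in = [sys.executable, "-c",
            "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
crashed = died_with_sigsegv(stand_in)
clean = died_with_sigsegv([sys.executable, "-c", "pass"])
```

Running the repro in a child process keeps the crash from taking down the test runner and makes the signal observable from the exit status.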
| 191 | + |
| 192 | +## Anti-pattern (do not adopt) |
| 193 | + |
| 194 | +Per pythia #154 #4 + ``feedback_no_workarounds.md``: the path of |
| 195 | +least resistance after a multi-pivot investigation is appending |
| 196 | +``_NoGrad`` / ``_Autocast`` / context-manager types to Phoenix's |
| 197 | +``shouldSkipCompilation`` skip-list (``pyjit.cpp``). That is a |
| 198 | +WORKAROUND — it preserves the underlying bug class for future |
| 199 | +managed-dict types to re-trigger. Resumption agent must root-cause |
| 200 | +the LSB-clear writer; do NOT extend the skip-list. |