Add 'annotate-usecode' command to import USECODE IR JSON annotations

- Introduced a new command 'annotate-usecode' to import USECODE IR JSON annotation hints as Ghidra comments on compiled anchors.
- Added argument parsing for multiple IR JSON files, comment type selection, and a dry-run option.
- Implemented logic to read annotation records from the provided IR files and set comments on the corresponding addresses in Ghidra.
- Enhanced JSON schema to include response structure for the new command.
MaddoScientisto 2026-03-24 18:14:20 +01:00
commit daa363c3d2
39 changed files with 41450 additions and 871 deletions

@ -366,20 +366,113 @@ The 000c event handler at `000c:9703` is entered via the large cheat-event dispa
Key negative result: no function in the compiled C code directly pushes the value `0x410` into the game's event broadcast path. The immediate `0x410` appears three times in the disassembly, and none of them is an emitter: (a) the `CMP BX,0x410` comparison inside the 000c switch, (b) a multi-event subscription list at `000b:b5cb` (registering to receive the event), and (c) an abort-function error code at `000d:5290` unrelated to the cheat.
The strongest new compiled-side recovery in this pass is the seg109 listener object behind that subscription site. `cheat_event_listener_create` at `000b:b3b1` allocates one listener object and registers the shared cheat/control event bundle (`0x13d`, `0x1b`, `0x443`, `0x142`, `0x141`, `0x143`, `0x23f`, `0x43e`, `0x41f`, `0x417`, `0x431`, `0x411`, `0x410`, `0x441`, `0x421`, `0x22d`) through the seg109 registration helper at `000b:3d2a`. Its paired `cheat_event_listener_handle_event` body at `000b:b62c` is subscriber-side only: for event `0x410` it rewrites the event object's field `+0x6` to local state `0x0e` and falls into the shared `FUN_000b_b7f3` state-processing tail. That listener does not produce event `0x410`; it only reacts after the event has already been emitted elsewhere.
The generic compiled dispatch path is one step tighter now too. The larger `000c:8a62` wrapper first peels off local gated cases, then falls into the generic cheat/control event dispatcher at `000c:8c56`, which reads `event_object->code` from field `+0x6` and switches over values like `0x141`, `0x142`, `0x143`, `0x23f`, `0x410`, `0x431`, `0x441`, and `0x443`. That makes the shared event-object contract explicit: `000c:8c56` consumes the original emitted event id from `+0x6`, while `cheat_event_listener_handle_event` reuses the same `+0x6` field as a local state/subcommand code before entering `FUN_000b_b7f3`.
One extraction-side false lead is now closed too: the `TELEPAD` row in `USECODE/EUSECODE_extracted/class_event_index.tsv` with `raw_code_offset = 0x00000410` is a class-body offset for slot `0x20`, not direct evidence that `TELEPAD` emits gameplay event `0x410`.
The requested USECODE family sweep also tightened the player-trigger side without closing it. Inside `class_event_index.tsv`, `NPCTRIG` is the only requested family that is both explicitly event-bearing at the descriptor level and also has non-empty callable bodies in the current event-slot extraction (`equip` / slot `0x0a` at raw offset `0x0175`, plus one anonymous slot `0x20` body at raw offset `0x0159`). `SPECIAL`, `TRIGPAD`, and `REB_PAD` all have non-empty callable bodies too, but they remain referent/state neighbors rather than direct event carriers: `SPECIAL` shows bodies for `equip`, `enterFastArea`, `leaveFastArea`, and anonymous slots `0x20/0x21`; `TRIGPAD` shows `gotHit`; `REB_PAD` shows `gotHit` and anonymous slots `0x20/0x21`. None of those extracted bodies currently expose a verified `0x410` immediate or decoded event payload.
Conclusion: event 0x410 is generated exclusively by the **interpreted USECODE lane** (centered on `EUSECODE.FLX`), not by any static keyboard-level scan-code path in the compiled binary. The F10 keyboard branch in `seg001_input_keyboard_handler` is a separate `0x44` path gated by `0x6045`, not by `0x410`. Separate follow-up work on the imported `ASYLUM.DLL` shows that DLL exports `ASS_*` audio routines, so it should not be conflated with the immortality toggle path. The in-game trigger is still best modeled as a USECODE item or controller script, consistent with the surrounding string evidence (`000e:6337 "CruHealer"`, `000e:6341 "BatteryCharger"`, `000e:6445 "Controller"`, `000e:64ab "AutoFirer"` — these are USECODE process class names bracketing the Immortality string). The new extractor-side report `USECODE/EUSECODE_extracted/immortality_target_body_scan.md` now scans the strongest current bodies in `EVENT`, `NPCTRIG`, `COR_BOOT`, `REE_BOOT`, `SFXTRIG`, `SPECIAL`, and `TRIGPAD` and finds no inline little-endian `0x0410`, no dword `0x00000410`, and no byte-swapped `0x1004` in any of them. That closes the immediate-emitter hypothesis for those currently exposed bodies and narrows the remaining frontier to data-driven decoding of the monolithic `EVENT` slot `0x0a` body and the compact `NPCTRIG` slot `0x0a` / `0x20` bodies, not to `TRIGPAD`, `SPECIAL`, `REB_PAD`, or `TELEPAD`.
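The body scan described above amounts to searching each extracted body for three byte encodings of the event id. A minimal sketch, assuming each body is available as raw `bytes`; the name `find_0410_immediates` and the result keys are illustrative and are not taken from the extractor:

```python
import struct

# The three encodings the report checks for. Key names are illustrative.
PATTERNS = {
    "u16_le": struct.pack("<H", 0x0410),       # bytes 10 04
    "u32_le": struct.pack("<I", 0x00000410),   # bytes 10 04 00 00
    "u16_swapped": struct.pack(">H", 0x0410),  # bytes 04 10 (0x1004 as a little-endian word)
}


def find_0410_immediates(body: bytes) -> dict[str, list[int]]:
    """Return every offset at which each candidate encoding occurs in body."""
    hits: dict[str, list[int]] = {}
    for name, needle in PATTERNS.items():
        offsets = []
        start = body.find(needle)
        while start != -1:
            offsets.append(start)
            # Advance by one byte so overlapping matches are not skipped.
            start = body.find(needle, start + 1)
        hits[name] = offsets
    return hits
```

Every `u32_le` hit is also a `u16_le` hit at the same offset, so an empty `u16_le` list already rules out the dword form. An empty result only rules out inline immediates; it says nothing about values assembled at runtime or read from data.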
The next extractor pass now pushes that one layer deeper. `USECODE/EUSECODE_extracted/immortality_body_structure.md` shows that `EVENT` slot `0x0a` is structurally a wide generic hub body, not a compact trigger leaf: it carries `90` internal `0x53 0x5c <u16> EVENT` subheaders, `383` local `0x5b` labels, and one wide tail-field set covering `event`, `item`, `source`, `dest`, `door`, `counter`, `counter2`, `link`, `time`, `post1`, `post2`, `floor`, and `flicMan`. By contrast, `NPCTRIG` stays compact and trigger-shaped. Slot `0x0a` has only `5` class-labelled subheaders and a narrow tail-field set (`referent`, `event`, `item`, `item2`), while slot `0x20` has only `1` such subheader and swaps the tail `event` field for `typeNpc` while keeping the same compact `item` / `item2` neighborhood. That is the strongest current player-trigger result: `EVENT` now reads as the generic event hub body, while the likeliest player-facing path is the `NPCTRIG` pair with slot `0x0a` as the compact event-bearing trigger body and slot `0x20` as its nearby typed/setup companion.
The next focused decode pass sharpens that split enough to treat the two `NPCTRIG` bodies differently instead of as one unresolved pair. New report `USECODE/EUSECODE_extracted/immortality_npctrig_clauses.md` fixes the open-header parse and shows that slot `0x0a` starts with `0x5A 0x06 0x5C 0x013E NPCTRIG ... 0x0B 0x11`, then falls into a five-step clause ladder with subheaders at `0x0064/0x0093/0x00c2/0x00f1/0x0120`. Those subheaders sit on a uniform `0x2f` stride, their targets walk backward by the same amount, and each full-width clause carries one `branch_3f_0a`, one `push_24_51`, and one `writeback_57_02`. Slot `0x20` is structurally different: its prolog ends with event-code byte `0x01`, it has only one class-labelled subheader, no `writeback_57_02`, no `push_24_51`, and ten `field_4b_fe_0f` hits clustered around repeated `0x0a 00/05 4b fe 0f ...` windows before the tail field `69:000a -> typeNpc`. That is the strongest current descriptor-side reduction of the search space: slot `0x0a` now reads like the live event-bearing clause ladder, while slot `0x20` reads more like a typed gate or setup/attachment companion body than like a second emitter.
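The uniform-stride claim for the slot `0x0a` subheaders is easy to re-check from the offsets quoted above. A small sketch; the helper name `uniform_stride` is illustrative:

```python
# Subheader offsets quoted in the clause report for NPCTRIG slot 0x0a.
SUBHEADERS = [0x0064, 0x0093, 0x00C2, 0x00F1, 0x0120]


def uniform_stride(offsets):
    """Return the common gap between consecutive offsets, or None if the gaps differ."""
    gaps = {later - earlier for earlier, later in zip(offsets, offsets[1:])}
    return gaps.pop() if len(gaps) == 1 else None
```

Calling `uniform_stride(SUBHEADERS)` returns `0x2f`, matching the stride stated in the report. The same check returns `None` for a body whose subheaders are irregularly spaced.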
The runtime-side bridge is tighter too. The binary already had one exact offset-specialized masked wrapper for slot `0x0a`, `entity_vm_context_try_create_mask_0400_slot0a_with_offset` at `0005:2c35`, and the `000d:21ed -> 000d:22bc` lane is still verified as a slot-backed inline-payload consumer that copies a variable-length byte stream first and then consumes compact metadata bytes plus streamed words. The new body-structure report is consistent with that runtime contract: the surviving `EVENT` / `NPCTRIG` bodies are clause streams with repeated internal subheaders and local labels, not flat literal blobs. That still does **not** prove that `NPCTRIG 0x0a` emits `0x410` directly, but it narrows the best remaining emitter frontier from `EVENT or NPCTRIG` down to `NPCTRIG slot 0x0a` with `NPCTRIG slot 0x20` as the strongest adjacent support body.
The clause report makes that runtime comparison more concrete too. `0005:2c35` is no longer just an abstract "with offset" wrapper: `entity_vm_slot_load_value_plus_offset` at `000d:5572` now proves the extra word is applied additively to the loaded slot value before `000d:21ed` consumes the result. The internal consumer at `000d:21ed -> 000d:22bc` is tighter as well: after copying the inline blob into the context it reads two signed metadata bytes, uses byte A as the lead-word row count, uses byte B as the shared target-list width, performs `A x B` `entity_link` calls, and pushes back only non-`0x0400` words. That makes `NPCTRIG 0x0a` the only surviving compact body with a natural selector family for this lane: it has `5` evenly spaced clause starts at stride `0x2f`, while slot `0x20` has only one clause and no matching writeback/push motif. So the best current working model is no longer "EVENT or NPCTRIG" or even "NPCTRIG 0x0a plus 0x20 as co-equal bodies"; it is specifically "NPCTRIG slot `0x0a` event-bearing clause ladder, with slot `0x20` as a typed companion/setup body feeding or constraining the same family."
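The consumer contract described above (two signed metadata bytes, `A x B` link calls, `0x0400` words filtered out) can be modelled roughly as follows. This is a sketch under stated assumptions, not recovered code: the stream layout (metadata bytes first, then `A` lead words, then `B` target words, all little-endian), the choice of which words are pushed back, and the names `consume_inline_payload` and `link` are hypothetical.

```python
import struct
from typing import Callable

SKIP_WORD = 0x0400  # words with this value are not pushed back


def consume_inline_payload(stream: bytes, link: Callable[[int, int], None]) -> list[int]:
    """Model of the described contract; the byte layout is ASSUMED, not recovered."""
    # Two signed metadata bytes: A = lead-word row count, B = target-list width.
    rows, width = struct.unpack_from("<bb", stream, 0)
    if rows < 0 or width < 0:
        # The binary's behaviour for negative counts is not modelled here.
        raise ValueError("negative counts are not modelled in this sketch")
    # ASSUMPTION: A lead words, then B target words, as little-endian u16.
    words = struct.unpack_from(f"<{rows + width}H", stream, 2)
    leads, targets = words[:rows], words[rows:]
    for lead in leads:  # A x B link calls in total
        for target in targets:
            link(lead, target)
    # ASSUMPTION: the pushed-back words are the streamed words minus 0x0400.
    return [word for word in words if word != SKIP_WORD]
```

Under this model, a five-row body such as `NPCTRIG 0x0a` would produce `5 x B` link calls, which is the sense in which the clause ladder offers a natural selector family for this lane.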
**Secondary handler (000b:b62c):**
`000b:b62c` subscribes to event 0x410 via the registration at `000b:b5cb`. When event 0x410 is received by this handler, it writes state code `0xe` (decimal 14) into the event object's field `+0x6` and passes it to `000b:b7f3` for processing. This is a parallel state-machine path that runs alongside the 000c toggle; likely it drives an associated USECODE process or animation object into state 14.
`cheat_event_listener_handle_event` (`000b:b62c`) receives event 0x410 through the registration installed by `cheat_event_listener_create` at `000b:b3b1`. When event 0x410 arrives, it writes state code `0xe` (decimal 14) into the event object's field `+0x6` and passes it to `000b:b7f3` for processing. This is a parallel state-machine path that runs alongside the 000c toggle; it likely drives an associated USECODE process or animation object into state 14.
| Address | Symbol | Role |
|-------------|-------------------------------|------|
| `0004:c055` | `player_receive_damage_and_dispatch_effects` | Renamed. Contains the `0x604f` immortality gate at `0004:c205`. |
| `000b:b3b1` | `cheat_event_listener_create` | Allocates one seg109 listener object and subscribes it to the shared cheat/control event bundle that includes `0x410`. |
| `000b:b62c` | `cheat_event_listener_handle_event` | Subscriber-side event mapper: rewrites incoming `0x410` to local state `0x0e` before entering the shared listener state machine. |
| `DS:0x604f` | Immortality flag | Set/cleared by event `0x410`. Read only at `0004:c205`. |
| `DS:0x60d2` | Immortality-on notification ptr | Near pointer in DS; resolves to far ptr → "Immortality enabled." display. |
| `DS:0x60ee` | Immortality-off notification ptr | Near pointer in DS; resolves to far ptr → "Immortality disabled." display. |
| `000a:b988` | `video_bios_state_snapshot` | Called after notification display in the 0x410 toggle to refresh screen state. |
### Hidden cheat menu investigation (seg109 UI lane)
New compiled-side evidence shows a real but likely dormant cheat-menu UI path:
| Address | Symbol | Role |
|-------------|-------------------------------------|------|
| `000b:9a86` | `cheat_menu_open_from_current_slot` | Builds a `cheat_event_listener` object, preloads selection from current slot state (`0x659c/0x659e`), pushes it through the sprite tree, and runs a modal draw/update loop. |
| `000b:9c0d` | `cheat_menu_open_modal` | Smaller modal wrapper that directly constructs `cheat_event_listener_create(...)`, traverses it, and returns. |
| `000b:b3b1` | `cheat_event_listener_create` | Constructor for the listener object. Registers event bundle including `0x23f`, `0x410`, `0x411`, `0x441`, etc. |
| `000b:b62c` | `cheat_event_listener_handle_event` | Listener event mapper; event `0x23f` toggles armed/visible state byte `+0x47`; event `0x410` remaps to local state `0x0e` then enters `FUN_000b_b7f3`. |
#### Reachability status in retail binary
- Static constructor callsites for `cheat_event_listener_create` are exactly two locations: `000b:9a9b` and `000b:9c56`.
- Static inbound xrefs to the wrapper entries `000b:9a86` and `000b:9c0d` are currently empty in the recovered code graph.
- The cheat-code matcher `cheat_code_check` (`0007:0d0a`) toggles `0x844/0x6045` and emits event `0x103`; it does **not** call these menu wrappers directly.
- The 000c handler for `0x103` (`000c:99dd`) executes a status/refresh lane and notification path; no direct call to `cheat_event_listener_create` appears there.
Current best read: this menu path is compiled and functional at the object level, but is likely orphaned or hidden in the final gameplay flow (possibly because a debug/dev-only trigger was removed, or because it is reachable only through non-recovered data-driven callback wiring).
#### Retail patch-targeting trail
The practical patch work ended up being mostly about **finding a call site whose runtime context matches the hidden menu wrappers**, not just finding any place that reaches `000a:5276`.
Verified retail anchor points:
| File off | Ghidra | Meaning | Notes |
|----------|--------|---------|-------|
| `0x70d75` | `0007:0d75` | cheat matcher emits event `0x103` | retail bytes = `68 03 01 9A FF FF 00 00 83 C4 02`; NE fixup source = `0007:0d79` -> `seg092:0476` |
| `0x71d68` | fixup entry for `0007:0d79` | seg039 relocation record | exact retail entry: addr_type `0x03`, rel_type `0x00`, chain_off `0x2b79`, target `seg092:0476` |
| `0xc99dd` | `000c:99dd` | later controller-side handler that also executes `push 0x103 / call 000a:5276` | retail fixup source = `000c:99e1` -> `seg092:0476`; this is the first materially safer deferred hook candidate after the direct matcher path failed |
| `0xb9a8d` | `000b:9a8d` | arg setup inside `cheat_menu_open_from_current_slot` | original wrapper uses caller stack words `[BP+8]` and `[BP+6]` plus local armed flag `1` |
| `0xb9c48` | `000b:9c48` | arg setup inside `cheat_menu_open_modal` | original wrapper still feeds caller stack words `[BP+8]` and `[BP+6]` into `cheat_event_listener_create`, but starts with local byte `+0x47 = 0` |
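In the code rows of the table above, the file offset is the Ghidra segment and offset packed into one number (`0007:0d75` pairs with `0x70d75`). A small conversion sketch follows. It assumes that relationship holds only for addresses of this kind: it is inferred from the listed rows, does not apply to the relocation-record row, and is not a general statement about the NE file layout. The function names are illustrative.

```python
def ghidra_to_file_offset(addr: str) -> int:
    """Convert 'ssss:oooo' to the file offset implied by the rows above."""
    seg, off = (int(part, 16) for part in addr.split(":"))
    if not (0 <= seg <= 0xFFFF and 0 <= off <= 0xFFFF):
        raise ValueError(f"not a 16:16 address: {addr}")
    return (seg << 16) | off


def file_offset_to_ghidra(offset: int) -> str:
    """Inverse of ghidra_to_file_offset, formatted like the notes ('0007:0d75')."""
    return f"{offset >> 16:04x}:{offset & 0xFFFF:04x}"
```

For example, `ghidra_to_file_offset("000b:9a8d")` gives `0xb9a8d`, the wrapper argument-setup site listed above.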
What failed and why:
- Direct retarget of `0007:0d79` to `000b:9a86` crashed at startup when the NE relocation table was patched incorrectly as a raw far pointer. That was a file-format problem, not a semantic proof.
- After the patcher was made NE-fixup-aware, direct retarget to `000b:9a86` no longer broke startup, but the game hung when the cheat actually fired. Disassembly shows why: `cheat_menu_open_from_current_slot` consumes caller-supplied words at `[BP+8]` and `[BP+6]`, so the cheat matcher context is the wrong stack shape.
- Retargeting the same early cheat-matcher call to `000b:9c0d` got farther: the mouse pointer appeared, proving the hidden menu/display path was being entered. But it still hung with looping music, which points to **timing/context**, not a bad target address. The modal path appears unsafe when entered directly from the keyboard matcher even after the constructor args are forced to zero.
Current best patch rationale:
- `0007:0d75` is still the right place to intercept the cheat sequence itself because it is the verified success emission site.
- `000c:99dd` is the better candidate for the **actual menu-open call** because it is a later controller/event context, not the raw keyboard matcher frame.
- `000b:9c48` is the right argument-fix companion because it is the constructor-argument site for `cheat_menu_open_modal`, and the direct disassembly shows that this is where the wrapper still pulls caller-dependent words.
Rejected follow-up patch design:
- Site 1 tried changing `0007:0d75` from `push 0x103` to `push 0x42f`, keeping the original event-dispatch helper call intact.
- Site 2 retargeted the `000c:99e1` relocation so the `0x42f` handler's internal `push 0x103 / call 000a:5276` sequence called `cheat_menu_open_modal` instead.
- Site 3 patched `000b:9c48` from `6A 00 FF 76 08 FF 76 06` to `6A 00 6A 00 6A 00 90 90`.
Observed result on retail test build:
- The game no longer failed at startup, and the mouse pointer appeared when the cheat fired, confirming that the hidden modal UI path was being entered.
- But the game then halted with the retail `FILE\FLEX.C, line 83` failure and dropped into the quit/teardown path (`"No pity. No mercy. No remorse."`).
- That is strong evidence that event `0x42f` is the wrong deferred hook context for this experiment even though the retargeted address itself was valid enough to enter the UI path.
Current patch candidate under test:
- Site 1: keep the original `0007:0d75` bytes and retarget only its existing far-call fixup from `seg092:0476` to `000b:9a86` (`cheat_menu_open_from_current_slot`).
- Site 2: patch `000b:9a8d` from `6A 01 FF 76 08 FF 76 06` to `6A 01 6A 00 6A 00 90 90`.
Rationale for the revised wrapper patch:
- Earlier direct-hook attempts proved that inheriting the two caller-frame words at `000b:9a8f/9a92` is unsafe from the cheat matcher context.
- But later decompilation of `cheat_event_listener_create` showed that the leading `push 0x1` at `000b:9a8d` is a distinct mode byte used by the constructor path, so zeroing all three pushed values was too aggressive.
- The current patch therefore preserves the leading `1` and only forces the two ambiguous 16-bit parameters to zero.
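A defensive way to apply the wrapper byte patch is to confirm the retail bytes before writing, so a wrong offset or an already-patched file fails loudly. A minimal sketch; `patch_bytes` is a hypothetical helper, not the project's patcher, and it covers only the in-place byte change at `000b:9a8d`, not the NE fixup retarget of Site 1:

```python
# Byte sequences quoted in the notes for the 000b:9a8d argument setup.
ORIGINAL_9A8D = bytes.fromhex("6A 01 FF 76 08 FF 76 06")
PATCHED_9A8D = bytes.fromhex("6A 01 6A 00 6A 00 90 90")


def patch_bytes(image: bytearray, offset: int, expected: bytes, replacement: bytes) -> None:
    """Overwrite expected with replacement at offset, refusing on any mismatch."""
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the length of the code")
    found = bytes(image[offset:offset + len(expected)])
    if found != expected:
        raise ValueError(f"unexpected bytes at {offset:#x}: {found.hex(' ')}")
    image[offset:offset + len(replacement)] = replacement
```

Usage would look like `patch_bytes(image, 0xB9A8D, ORIGINAL_9A8D, PATCHED_9A8D)` on a `bytearray` of the executable. The replacement keeps the leading `6A 01` and pads with two `90` (NOP) bytes so the instruction stream stays the same length.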
Risk notes:
- These remain behavioral exploration hacks, not correctness fixes.
- The evidence now strongly suggests the hard part is runtime context and event timing, not discovering the retail file offsets.
- If the revised direct `0007:0d79 -> 000b:9a86` path with the narrower `000b:9a8d` wrapper patch still fails, the next step should be a queue/defer design or a trampoline/cave patch rather than another blind event substitution.
### Conservative folklore verification
- "Cheats can be enabled with `-laurie`" is **directly verified**.
@ -389,4 +482,6 @@ Conclusion: event 0x410 is generated exclusively by the **interpreted USECODE la
- "H enables hack mover" is **real at runtime** (strings confirmed), but not found in the static low-level byte dispatch; the activation comes from the USECODE scripting layer.
- "Immortality makes the player invincible" is **partially verified**: damage is divided by 262,144, making HP loss negligible; the hit stagger still plays. The HP system itself is not bypassed.
- "Immortality is toggled with a keyboard combo" is **not supported in compiled C code**: event 0x410 has no static keyboard dispatch path. It is USECODE-triggered.
- `TELEPAD` slot `0x20` in `class_event_index.tsv` is **not** direct `0x410` event evidence; its `0x00000410` value is the extracted class-body offset for that slot.
- Among the requested USECODE families, `NPCTRIG` is the strongest remaining player-trigger candidate because it is explicitly event-bearing and also has extracted callable bodies, while `TRIGPAD`, `SPECIAL`, and `REB_PAD` currently read as neighboring referent/state/controller bodies rather than direct event carriers.
- The hidden five-byte matcher compares bytes from live code at `0007:2833`, and the ordinary keyboard ISR producer does not naturally emit byte values `0x80` and `0xfd` into record byte `+1`.
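For scale, the divisor quoted in the immortality item above is 2^18. The sketch below assumes truncating integer arithmetic on non-negative damage values; whether the binary uses a division or a shift is not stated in these notes, and the names are illustrative.

```python
IMMORTALITY_DIVISOR = 262_144  # equals 1 << 18


def scaled_damage(raw_damage: int) -> int:
    """Damage after the immortality divisor, assuming truncating integer division."""
    if raw_damage < 0:
        raise ValueError("negative damage is not modelled in this sketch")
    return raw_damage // IMMORTALITY_DIVISOR
```

Under that assumption, any raw damage below 262,144 scales to zero, which is consistent with HP loss being negligible while the hit stagger still plays.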

@ -0,0 +1,235 @@
# Pentagram Crusader Reference
## Purpose
This note mines Pentagram's Ultima 8 / Crusader code and bundled docs for evidence that is useful to current Crusader reverse-engineering, especially the USECODE / VM lane.
It complements [docs/scummvm-crusader-reference.md](docs/scummvm-crusader-reference.md). Where Pentagram and ScummVM agree, that usually strengthens provenance, but not always confidence: several of the relevant ScummVM Ultima8 components appear to descend from the same Pentagram-era implementation ideas, so matching behavior between the two should not be treated as fully independent confirmation.
## Highest-Value Findings
1. Pentagram contains direct Crusader USECODE parser and VM support, not just generic U8 notes.
Files: `convert/crusader/ConvertUsecodeCrusader.h`, `usecode/UsecodeFlex.cpp`, `usecode/Usecode.cpp`, `usecode/UCMachine.cpp`, `usecode/remorseintrinsics.h`, `kernel/GUIApp.cpp`.
2. Pentagram's older U8 USECODE documentation is still useful as contrast material because it shows which parts of the object/event model stayed stable and which parts changed in Crusader.
File: `docs/u8usecode.txt`.
3. Pentagram preserves one practical caution that ScummVM does not show as clearly: its Crusader runtime support is incomplete.
Files: `FAQ`, `world/Item.cpp`, `games/RemorseGame.cpp`.
4. Pentagram also records a few engine-format deltas that are useful outside USECODE, including Crusader map coordinate scaling, larger map chunks, and a wider Crusader `typeflag.dat` record.
Files: `world/Map.cpp`, `world/CurrentMap.cpp`, `graphics/TypeFlags.cpp`.
## Direct Pentagram Crusader Evidence
### USECODE class layout and event lookup
`usecode/UsecodeFlex.cpp` matches the broad Crusader model already noted from ScummVM:
- class body object = `classid + 2`
- class names come from object `1` at `name_object + 4 + 13 * classid`
- Crusader class base offset is read from bytes `8..11` of the class object and decremented by `1`
- Crusader event count is computed as `(get_class_base_offset(classid) + 19) / 6`
`usecode/Usecode.cpp` then resolves Crusader event offsets from class data at `20 + 6 * eventid`, using bytes `+2..+5` of each 6-byte row as the code offset.
Implication for current RE:
- Pentagram independently preserves the same `classid + 2` and 6-byte event-row reading used in the ScummVM note.
- The shared `(base + 19) / 6` event-count rule should still be treated carefully in current owner-loaded/raw EUSECODE work, because local binary validation already showed that this shared Pentagram/ScummVM rule is not a clean fit for sampled raw class records.
- In other words, Pentagram is strong provenance for the implementation lineage, but not a reason to override validated binary-side arithmetic.
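The Pentagram-side arithmetic listed in this section can be written out as below. This is a sketch of the rules as summarized here, not a copy of Pentagram's code: little-endian field reads are an assumption, the function names are illustrative, and `event_count` reproduces the shared `(base + 19) / 6` rule that the caveat above says does not cleanly fit sampled raw class records.

```python
import struct


def class_object_index(classid: int) -> int:
    """Flex object that holds the class body."""
    return classid + 2


def class_name_offset(classid: int) -> int:
    """Offset of a class name relative to the start of the name object (object 1)."""
    return 4 + 13 * classid


def class_base_offset(class_data: bytes) -> int:
    """Bytes 8..11 of the class object, decremented by 1 (little-endian ASSUMED)."""
    return struct.unpack_from("<I", class_data, 8)[0] - 1


def event_count(class_data: bytes) -> int:
    """Shared Pentagram/ScummVM rule; not validated against raw EUSECODE records."""
    return (class_base_offset(class_data) + 19) // 6


def event_code_offset(class_data: bytes, eventid: int) -> int:
    """Code offset from bytes +2..+5 of the 6-byte row at 20 + 6 * eventid."""
    return struct.unpack_from("<I", class_data, 20 + 6 * eventid + 2)[0]
```

Keeping each rule in its own function makes it straightforward to swap in the locally validated class-header arithmetic for `event_count` without touching the row lookup.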
### Crusader event-name table
`convert/crusader/ConvertUsecodeCrusader.h` provides a named Crusader event table for `0x00..0x1f`:
- clear names: `look`, `use`, `anim`, `setActivity`, `cachein`, `hit`, `gotHit`, `hatch`, `schedule`, `release`, `combine`, `calledFromAnim`, `enterFastArea`, `leaveFastArea`, `justMoved`, `AvatarStoleSomething`, `animGetHit`
- weak placeholders remain for `0x0a`, `0x0b`, `0x0d`, `0x11`, and `0x15..0x1f`
This is slightly rougher than the current ScummVM note in naming quality, but it is still useful because it shows which ordinals were already considered understood in the older Pentagram work and which ones remained unresolved.
### Crusader call opcode semantics inside the VM
`usecode/UCMachine.cpp` contains one especially useful comment-backed distinction:
- U8 opcode `0x11` calls a function at an explicit class/code offset
- Crusader opcode `0x11` calls function number `yy yy` of class `xx xx`, then translates that number through `get_class_event()`
That matters for current USECODE analysis because it reinforces the reading that Crusader bytecode is event-ordinal-driven in places where U8 was direct-offset-driven.
### Remorse intrinsic runtime table exists, but it is partial and sparse
`kernel/GUIApp.cpp` creates `UCMachine(RemorseIntrinsics, 308)` for Remorse, and `usecode/remorseintrinsics.h` holds that live runtime table.
What is useful:
- it confirms a real Remorse-specific runtime intrinsic table with at least `308` entries
- some entries are already mapped to concrete engine hooks such as frame/shape/status/quality accessors, item creation, movement helpers, egg helpers, and timer-tick access
What is not useful enough yet:
- the table is far sparser and rougher than ScummVM's later Remorse/Regret intrinsic descriptions
- many entries are still `0` or placeholder comments
Practical use:
- treat Pentagram intrinsics as secondary hints or provenance for older naming work
- prefer ScummVM for higher-coverage intrinsic labeling
- prefer raw binary behavior over either table for actual renames
### Version-sensitive global evidence
Pentagram's scratch notes add one useful wrinkle to the global-slot story:
- `docs/scratch/globals/remorse1.01.txt` starts with `global_address 003D`
- `docs/scratch/globals/regret1.01.txt` starts with `global_address 001E`
Cross-reference with ScummVM:
- the existing ScummVM note records Remorse global `0x003c` and Regret global `0x001e`
Safest read:
- Regret lines up cleanly at `0x001e`
- Remorse appears version-sensitive or notation-sensitive between Pentagram artifacts and later ScummVM code (`0x003d` in the Pentagram scratch output for Remorse 1.01 versus `0x003c` in the ScummVM runtime initialization path)
Implication for RE:
- keep Remorse global-slot claims version-tagged when possible
- do not collapse `0x003c` and `0x003d` into one unqualified global statement without checking game/version context
## U8-Specific Documentation That Still Helps
### `docs/u8usecode.txt`
This file is U8-specific, not direct Crusader evidence, but it is still useful in three ways.
First, it documents the older U8 class/object indexing model:
- object `0` = global flag names
- object `1` = usecode function names
- object `2 + shape` = shape-linked usecode body
- object `1026 + npc` = NPC-linked usecode body
Second, it records the classic U8 per-class layout:
- 12-byte header prefix
- 32 event pointers
- code body after that table
Third, it preserves an older event-meaning list for ordinals `0x00..0x1f`.
Why it still matters for Crusader:
- many semantic event labels survive into the Crusader table: `look`, `use`, `anim`, `cachein`, `hit`, `gotHit`, `hatch`, `schedule`, `release`, `combine`, `enterFastArea`, `leaveFastArea`, `AvatarStoleSomething`
- the document makes the Crusader deltas clearer: Crusader moved away from a fixed 32 x 4-byte event-pointer table and instead uses a 6-byte-per-event structure with event-number lookup in the VM
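The layout difference can be made concrete with the offsets each model implies. A sketch, assuming the U8 pointer table starts immediately after the 12-byte header prefix and using the Crusader row rule `20 + 6 * eventid` quoted from `usecode/Usecode.cpp` earlier in this note; all names are illustrative.

```python
U8_HEADER_SIZE = 12
U8_EVENT_SLOTS = 32
U8_POINTER_SIZE = 4
# Code body begins after the fixed pointer table: 12 + 32 * 4 = 140.
U8_CODE_BODY_START = U8_HEADER_SIZE + U8_EVENT_SLOTS * U8_POINTER_SIZE

CRUSADER_ROWS_START = 20
CRUSADER_ROW_SIZE = 6


def u8_event_pointer_offset(eventid: int) -> int:
    """Offset of the 4-byte pointer for eventid in a U8 class (fixed 32 slots)."""
    if not 0 <= eventid < U8_EVENT_SLOTS:
        raise ValueError("U8 classes have exactly 32 event slots")
    return U8_HEADER_SIZE + U8_POINTER_SIZE * eventid


def crusader_event_row_offset(eventid: int) -> int:
    """Offset of the 6-byte event row for eventid in a Crusader class."""
    return CRUSADER_ROWS_START + CRUSADER_ROW_SIZE * eventid
```

For slot `0x0a`, the U8 model points at offset 52 while the Crusader model points at offset 80, so a parser written for one layout will misread the other.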
Recommended use:
- use `u8usecode.txt` as a contrast document for inherited VM concepts and event semantics
- do not use it as direct proof of Crusader container layout or opcode contracts
## Cross-Reference Against The Existing ScummVM Note
### Where Pentagram and ScummVM clearly agree
Both references point to the same core Crusader USECODE model:
- `classid + 2` class lookup
- class names in object `1`
- bytes `8..11` as the class header field used for Crusader code/event addressing
- 6-byte Crusader event rows
- named event ordinals `0x00..0x1f`
- a Crusader-specific VM/global path rather than a straight U8 reuse
This agreement is useful because it shows the model is not a one-off local interpretation.
### Where Pentagram adds something materially useful
Pentagram contributes a few things the ScummVM note did not emphasize as strongly:
- older U8 documentation that makes Crusader structural deltas easier to isolate
- explicit confirmation in `UCMachine.cpp` that Crusader opcode `0x11` is event-number dispatch, not raw offset dispatch
- scratch global dumps that expose version-sensitive Remorse versus Regret behavior
- explicit incompleteness warnings in the project itself, which help calibrate how much authority to assign to runtime behavior
### Where Pentagram should not increase confidence much
For the current header/count dispute in owner-loaded/raw EUSECODE parsing, Pentagram and ScummVM agreeing with each other does not settle the question.
Reason:
- the relevant Pentagram and ScummVM Crusader USECODE code paths are very close in structure
- that makes them best treated as one implementation lineage, not two independent external confirmations
Current rule for RE remains:
- use Pentagram/ScummVM to anchor object indexing, row size, event labels, and VM intent
- keep the local binary-validated class-header arithmetic as the authority when the shared engine code disagrees with sampled Crusader records
## Non-USECODE Engine Findings Worth Keeping
These are lower priority than the USECODE sections, but still useful for future binary-side work.
### Map loading
`world/Map.cpp` shows that Crusader on-disk map records are still read as 16-byte records, but Pentagram doubles `x` and `y` after loading when `GAME_IS_CRUSADER`.
Implication:
- if a raw loader appears to scale map coordinates or if current external-map tooling sees a factor-of-two mismatch, Pentagram provides a concrete engine-side reason to test that path
### Current map chunking
`world/CurrentMap.cpp` sets `mapChunkSize = 1024` for Crusader versus `512` for U8.
Implication:
- this matches the broader cross-project pattern that Crusader is not just U8 data with renamed files; some world/grid assumptions are materially different
### Crusader `typeflag.dat`
`graphics/TypeFlags.cpp` switches Crusader to 9-byte records instead of U8's 8-byte records, with extended family-bit handling and multiple Crusader-only flag placeholders.
Implication:
- Crusader `typeflag.dat` should continue to be treated as its own format family
- any local parser or reverse-engineered structure should not inherit the U8 8-byte layout blindly
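A local parser can make the record width an explicit, game-dependent parameter rather than a constant. A minimal sketch, assuming the file is a flat array of fixed-size records with no header (an assumption to check against the real file); the function name is illustrative and no field layout is implied.

```python
U8_RECORD_SIZE = 8
CRUSADER_RECORD_SIZE = 9


def split_typeflag_records(data: bytes, crusader: bool) -> list[bytes]:
    """Split raw typeflag.dat bytes into fixed-size records for the chosen game."""
    size = CRUSADER_RECORD_SIZE if crusader else U8_RECORD_SIZE
    if len(data) % size != 0:
        # A remainder suggests the wrong game flag or an unmodelled header.
        raise ValueError(f"length {len(data)} is not a multiple of {size}")
    return [data[i:i + size] for i in range(0, len(data), size)]
```

The length check is a cheap guard against silently parsing Crusader data with the U8 width, although a file whose length divides evenly by both 8 and 9 would pass either way.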
## Confidence Limits
Pentagram is valuable, but only in bounded ways.
Direct reasons for caution:
- `FAQ` says Crusader support was a future goal, not a completed feature
- `games/RemorseGame.cpp` is clearly incomplete compared with the ScummVM Crusader startup path
- `world/Item.cpp` explicitly disables all Crusader usecode events except `use()`
So for current Crusader RE, the best weighting is:
- high confidence: parser/disassembler layout clues, event ordinals, VM intent, container/indexing models, file-format deltas
- medium confidence: sparse Remorse intrinsic names and scratch global artifacts
- low confidence: full runtime behavior, startup semantics, and any absence-based conclusion from Pentagram's Crusader execution path
## Most Useful Pentagram Files
- `convert/crusader/ConvertUsecodeCrusader.h`
- `usecode/UsecodeFlex.cpp`
- `usecode/Usecode.cpp`
- `usecode/UCMachine.cpp`
- `docs/u8usecode.txt`
- `docs/scratch/globals/remorse1.01.txt`
- `world/Item.cpp`
- `graphics/TypeFlags.cpp`
- `world/Map.cpp`
- `world/CurrentMap.cpp`
## Practical RE Follow-Ups
1. Keep using Pentagram and ScummVM event names as slot-label hints only, especially for `0x0a`, `0x0b`, `0x11`, and the still-placeholder high ordinals.
2. When documenting Crusader USECODE VM behavior, cite Pentagram's `opcode 0x11 = class/event dispatch` distinction alongside the existing ScummVM reference.
3. Keep local owner-loaded/raw EUSECODE arithmetic authoritative over the shared Pentagram/ScummVM `(base + 19) / 6` rule until a direct main USECODE sample proves otherwise.
4. Tag Remorse global-slot references with version context when using Pentagram scratch outputs.
5. Reuse Pentagram's map/typeflag deltas when a future binary pass returns to world loaders or shape/type metadata.
6. Treat missing behavior in Pentagram's Crusader runtime as non-evidence unless ScummVM or raw binary analysis supports the same absence.

@ -201,9 +201,9 @@ Second sweep through `000c` adjacent helpers — gated thunk wrappers and input/
| `000c:84c3` | `entity_state_set_byte40_at_global_ptr` | Sets byte `[g_active_dispatch_entry_farptr + 0x40] = 1` then calls thunk unconditionally; current evidence treats this as raising the shared active-entry transition/display hold byte rather than toggling an unrelated global |
| `000c:ac55` | `entity_state_fire_if_handle_valid` | Guard: fires thunk dispatch only when `[0x6054] != -1`; no-op otherwise |
| `000c:ac6d` | `entity_state_fire_with_args_if_handle_valid` | 3-arg variant: pushes `[BP+0xe]` (byte), `[BP+0xc]`, `[BP+0xa]`, handle `[0x6054]`, then `CALLF 0000:ffff` |
| `000c:afa5` | `entity_state_check_field49_and_call_vfunc3c` | Checks field `[ptr+0x49]`: 1→reset to 0 return 1; 2→call `vtable[0x3c]` return 0; else thunk dispatch |
| `000c:b153` | `entity_state_animation_done_tick` | Checks `[param_2+0x14+0xa]` animation-complete flag; if zero increments `field49` and calls `entity_state_check_field49_and_call_vfunc3c`; if set calls `vtable[0x3c]` |
| `000c:b199` | `entity_state_input_key_handler` | Full input dispatcher: ESC/x/X → `vtable[0x3c]` (cancel); Left/Right arrows `0x14b/0x148` → prev state; n/N/`0x14d/0x150` → next state; e/E → set `field47=1`; `-` with counter → trigger at 4. Manages `field47` and `field49` |
| `000c:afa5` | `transition_file_family_select_and_refresh` | Local startup/display selector: `field49==-1` normalizes to `0`; `field49==2` dispatches `vtable[0x3c]`; `field49==0/1/4` composes one of three sibling filenames from inherited base `0x6aa:0x6ac` plus stem/suffix buffers `0x621c/0x6223`, `0x621c/0x622d`, or `0x621c/0x6237`, loads the result into object `+0x520`, then runs the shared redraw/palette/input refresh path |
| `000c:b153` | `transition_file_family_advance_on_anim_tick` | Polls `[param_2+0x14+0xa]`; when clear increments `field49` and re-enters `transition_file_family_select_and_refresh`, otherwise exits through `vtable[0x3c]` |
| `000c:b199` | `transition_file_family_input_key_handler` | Local selector key handler: ESC/x/X → `vtable[0x3c]`; Left/Right arrows `0x14b/0x148` → previous file-family state; n/N/`0x14d/0x150` → next state; e/E arms `field47`; `-` after arming counts up to forced state `4`; selector moves drain the event queue and clear `0x8a94/0x8a96/0x8a98` |
| `000c:b2c3` | `stub_noop_000c_b2c3` | Empty stub; returns immediately |
| `000c:b2c8` | `entity_state_dispatch_if_field49_eq4` | Fires thunk only when `[ptr+0x49]==4` |
| `000c:b349` | `entity_state_dispatch_if_far_ptr_nonzero_a` | Fires thunk if far-pointer args non-zero |
@ -211,8 +211,8 @@ Second sweep through `000c` adjacent helpers — gated thunk wrappers and input/
| `000c:b3d8` | `entity_state_dispatch_if_far_ptr_nonzero_b` | Same null-guard pattern as `b349`, variant b |
**Patterns confirmed:**
- `field49` = state-sequence index; 0=reset, 2=vtable callback, 4=triggered end
- `field47` = keystroke-combo counter
- `field49` = local transition file-family selector state in this startup/display family; `0/1/4` choose sibling filenames under shared base `0x6aa:0x6ac` plus stem `0x621c`, `2` dispatches `vtable[0x3c]`, and `-1` normalizes back to `0`
- `field47` = keystroke arm/counter for the local `e/E` then `-` path into selector state `4`
- `field3f` = linked data pointer (event/record reference)
- `[0x6054]` = current entity handle; `[0x6828]` = `g_active_dispatch_entry_farptr`, the shared active-dispatch entry owner whose byte `+0x40` is reused across the startup/display lane as a hold/busy token
- Bits in `[ptr+0x5b]`: `0x1=init`, `0x2=active/event`, `0x40=pending dispatch`, `0x100=flag100`, `0x180=skip-all mask`
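The bit assignments for `[ptr+0x5b]` listed above can be turned into a small decoder for annotating dumped values. This is a minimal sketch, not recovered code: the function name and return shape are hypothetical, the field is assumed to be read as a 16-bit word, and how the binary actually tests the `0x180` skip-all mask (any bit versus all bits) is not established here, so the sketch only reports which bits fall under it.

```python
# Bit names taken from the "Patterns confirmed" list; everything else is illustrative.
FLAG_NAMES = {
    0x001: "init",
    0x002: "active/event",
    0x040: "pending dispatch",
    0x100: "flag100",
}
SKIP_ALL_MASK = 0x180  # covers 0x100 plus the still-unnamed bit 0x080


def describe_flags_5b(word):
    """Split a 16-bit value of [ptr+0x5b] into named bits, mask bits, and leftovers."""
    named_mask = sum(FLAG_NAMES)  # keys are distinct single bits, so sum == OR
    return {
        "names": [name for bit, name in FLAG_NAMES.items() if word & bit],
        "skip_all_bits": word & SKIP_ALL_MASK,
        "unnamed_bits": word & ~named_mask & 0xFFFF,
    }


if __name__ == "__main__":
    print(describe_flags_5b(0x0043))
```

Reporting unnamed bits separately keeps the decoder honest when a dump contains bits this pass has not yet labeled.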
@ -259,15 +259,17 @@ Globals used: `[0x6312]`=start index, `[0x6314]`=count, `[0x630e]`=palette src p
- `entity_vm_slot_index_from_entity` (`000d:45c5`) computes one slot index from a gameplay entity by branching on seg021 class/type helpers and then adding one of the current runtime base offsets `0x8c7c/0x8c7e/0x8c80`
- `entity_vm_context_try_create_masked_for_entity` (`000d:463a`) uses that slot index to test one owner-side mask entry before it creates a context, which is the strongest current bridge from gameplay entities into this VM lane
- `entity_vm_context_create_from_slot_index` (`000d:46ec`) allocates one `0x6714` context object, seeds its `+0xd6/+0xd8` lane through `entity_vm_slot_load_value_plus_offset`, initializes the local mini-VM state, and can prepend caller data into the backward-growing buffer at `+0x102`
- `entity_vm_opcode_sequence_run` (`000d:ebe3`) is now named conservatively in Ghidra: it seeds the stage chain from object `+0xfe`, runs `000d:177c -> 000d:1acb -> 000d:0988 -> 000d:22bc -> optional 000d:1d4a -> 000d:2104`, then finishes with tracked-handle cleanup plus the `0008:ebe7` gate on object `+0xc0` and byte `+0x4b`
- `entity_vm_context_sync_global_value_and_dispatch` (`000d:48da`) is the current context-side runner/sync point: it marks the context busy at `+0x123`, calls `entity_vm_set_field_da_to_global`, optionally writes the current value through `+0x11b/+0x11d`, and dispatches through the context vtable on success
- `entity_vm_context_save` / `entity_vm_context_load` / `entity_vm_context_destroy` / `entity_vm_context_free_buffer` (`000d:498f`, `000d:4a78`, `000d:4962`, `000d:48b6`) now pin down the lifecycle of this object family rather than leaving the whole `000d:45xx..4exx` island anonymous
- `entity_vm_context_try_create_masked_for_entity` is now better constrained at the return-value level too: after the runtime-disable check at `0x6610` and the owner-side slot-mask test succeed, it reports two distinct success shapes. Immediate-flagged contexts (`+0x16 & 0x0008`) clear the caller output word, while object-backed contexts return the created object's low word. That makes the helper a typed bridge from gameplay entities into VM-backed object results, not only a yes/no mask probe.
- `entity_vm_runtime_owner_resource_create` (`000d:7000`) is now one step tighter too: the embedded seg069/070 helper is file-backed rather than abstract. Construction starts with `dos_file_handle_init` (`0009:1c00`), then uses helper vtable slot `+0x04` as the size query that drives the child `+0x10/+0x12` allocation and helper vtable slot `+0x0c` as the table-population callback for the `0x0d`-stride owner table.
- That file-backed helper is now tighter one step deeper as well. The seg070 loops rooted at raw windows `0009:67b6` and `0009:6916` walk helper-owned record arrays at object `+0x10/+0x18`, format per-entry paths through the seg001 string helpers (`0003:e4d3` / `0003:e590`), then open, read, and close each file through `file_handle_alloc_init_and_open` (`0009:1c3a`), `dos_file_seek` (`0009:2034`), and `dos_file_close` (`0009:1e61`). The paired `+0x18` entries are consumed as 16-bit ids passed into those path-format loops beside the far-pointer path table at `+0x10`; no object-1 or `classid + 2` arithmetic appears there, so the safest current read is slot-local file ids rather than exposed original class/object indices. That is strong evidence that `000d:7000` seeds the owner table from an indexed external file set rather than by copying one monolithic in-memory descriptor blob.
- A final loader-side tightening from the current pass is that `0009:67b6` and `0009:6916` now read as twin entry walkers rather than one isolated path-format callback. Both windows iterate the helper-owned count at `+0x14`, index the far-pointer path table at `+0x10` and paired 16-bit id table at `+0x18`, check the source path through `0003:e669`, build formatted paths with distinct local format strings (`DS:3f2d` vs `DS:3f40`), and then reach the same file open/read/close lane. The remaining open question is not whether they are file-backed, but whether they represent two file families, two record templates, or two load phases inside the same helper class.
- A final loader-side tightening from the current pass is that `0009:67b6` and `0009:6916` now read as paired file-family walkers rather than one isolated path-format callback. Both windows iterate the helper-owned count at `+0x14`, index the far-pointer path table at `+0x10` and paired 16-bit id table at `+0x18`, check the source path through `0003:e669`, build formatted paths with distinct local format strings (`DS:3f2d` vs `DS:3f40`), and then reach the same file open/read/close lane. Each loop also writes into its own independently allocated output far buffer before the shared trailer runs, so the best current reading is two parallel file families or record banks loaded by the same helper rather than two phases over one shared buffer. The remaining open question is the exact per-family record schema and higher-level resource role, not whether the helper is file-backed.
- The caller-side bootstrap for that helper is now anchored too: `entity_vm_runtime_init_from_path_if_configured` (`000d:44df`) first checks the configured byte/string global at `0x65a`, builds a path through seg072 helper `0009:3600` using globals `0x6d6:0x6d8` plus `0x65a`, validates that path through `000a:500a`, then calls `entity_vm_runtime_create(0,0,path)`. This is the first verified source-argument path for `entity_vm_runtime_owner_resource_create`, and it strongly suggests the owner/resource table is loaded from an external configured file rather than from a purely in-memory descriptor blob.
- Seg072 helper `0009:3600` is now classified more tightly as a rotating slash-aware path composer rather than a generic buffer advance helper. Its prologue cycles through five `0x50`-byte temp buffers, and its inner cases append optional string parts while inserting `\` only when adjacent path components need a separator. That narrows the two globals used by `000d:44df`: `0x65a` behaves as the configured relative runtime-owner filename/path component, while `0x6d6:0x6d8` behaves as the mutable base/resource-root path buffer that gets joined with `0x65a` before `000a:500a` validation.
- The two still-xref-dark wrappers `0005:2c35` and `0005:2c68` are also narrower now. Their signed extra word does not participate in owner-mask selection inside `entity_vm_context_try_create_masked_for_entity`; it is forwarded into `entity_vm_context_create_from_slot_index`, stored in context field `+0x34`, and passed on to `entity_vm_slot_load_value_plus_offset`. The best current reading is therefore `offset-specialized masked context creation`, not a separate direct selector lane.
- Ghidra now records that signed-offset contract directly in the wrapper names too: `0005:2c35` = `entity_vm_context_try_create_mask_0400_slot0a_with_offset` and `0005:2c68` = `entity_vm_context_try_create_mask_0800_slot0b_with_offset`. That still stops short of real caller-role recovery, but it removes the last ambiguity about whether the extra stack word is semantically live.
- The first opcode-level behavior split inside that runtime is now visible in the `000d:0988` family:
- one branch calls `entity_vm_referent_chain_append_unique_from`, which looks like an attach/union operation on the current referent payload chain
- the `0x1a/0x1b` branch instead calls `entity_vm_referent_chain_remove_matching_from`, which looks like the inverse operation and makes the opcode family materially closer to a graph-editing script VM than a flat event list
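The seg072 path-composer reading above (five rotating `0x50`-byte temp buffers, with `\` inserted only where adjacent components need a separator) can be modeled as follows. This is a behavioral sketch under stated assumptions, not a transcription of `0009:3600`: the class name, the truncation to `0x4f` characters, and the example path strings are all hypothetical, and the real helper's handling of a part that both follows and begins with `\` has not been verified.

```python
class RotatingPathComposer:
    """Toy model of a slash-aware composer that cycles through fixed temp buffers."""

    BUFFER_COUNT = 5     # five temp buffers, per the recovered prologue
    BUFFER_SIZE = 0x50   # bytes per buffer; one byte assumed reserved for the NUL

    def __init__(self):
        self._buffers = [""] * self.BUFFER_COUNT
        self._next = 0

    def compose(self, *parts):
        out = ""
        for part in parts:
            if not part:          # optional parts may be absent
                continue
            # Insert a separator only when neither side already supplies one.
            if out and not out.endswith("\\") and not part.startswith("\\"):
                out += "\\"
            out += part
        out = out[: self.BUFFER_SIZE - 1]   # assumed truncation, not verified
        slot = self._next
        self._buffers[slot] = out
        self._next = (slot + 1) % self.BUFFER_COUNT
        return slot, out


if __name__ == "__main__":
    composer = RotatingPathComposer()
    # Hypothetical stand-ins for the 0x6d6:0x6d8 base path and the 0x65a component.
    print(composer.compose("C:\\GAME", "OWNER.DAT"))
```

The rotation matters for callers: a returned buffer stays valid only until four further compose calls have been made, which is one plausible reason `000d:44df` validates the path immediately after building it.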
@ -307,11 +309,14 @@ Globals used: `[0x6312]`=start index, `[0x6314]`=count, `[0x630e]`=palette src p
Conservative interpretation after this pass:
- The `000d:21ed -> 000d:22bc` lane is strongly supported as a slot-backed payload to entity-link closure path, where two byte-sized metadata fields shape the matrix walk and word entries are link/entity ids.
- The `000d:21ed -> 000d:22bc` lane is strongly supported as a slot-backed payload to entity-link closure path, where two signed byte-sized metadata fields shape an exact `A x B` matrix walk: byte A is the lead-word row count, byte B is the shared target-list width, and the word entries passed to `entity_link` are runtime link/entity ids rather than descriptor selectors.
- Descriptor-family alignment is therefore stronger with generic active event ecosystems (`EVENT`/`NPCTRIG`/`*_BOOT`/`SFXTRIG`) than with `SURCAM*` callback holders, because no direct `eventTrigger`-specific discriminator is read in this lane.
- Direct descriptor-id attribution is still rejected for now: no code evidence ties the consumed bytes/words here to explicit EUSECODE class indices or to a hard `JELYHACK`/`SURCAM*` switch.
- The new extractor-side structure pass tightens the descriptor-side fit inside that generic active-event ecosystem. `USECODE/EUSECODE_extracted/immortality_body_structure.md` shows `EVENT` slot `0x0a` as a broad hub clause stream with `90` internal `0x53 0x5c <u16> EVENT` subheaders and the widest field trailer, while `NPCTRIG` slot `0x0a` stays compact at `5` subheaders and a narrow `referent/event/item/item2` tail. That does not prove a direct class-id bridge into `000d:21ed -> 000d:22bc`, but it does make `NPCTRIG slot 0x0a` the strongest remaining compact descriptor-side candidate for the offset-specialized slot-`0x0a` runtime wrapper `entity_vm_context_try_create_mask_0400_slot0a_with_offset` (`0005:2c35`) instead of the older undifferentiated `EVENT or NPCTRIG` frontier.
- The next focused extractor pass sharpens that fit again. `USECODE/EUSECODE_extracted/immortality_npctrig_clauses.md` now shows `NPCTRIG` slot `0x0a` as a fixed-width five-clause ladder: subheaders at `0x0064/0x0093/0x00c2/0x00f1/0x0120`, uniform `0x2f` stride, backward-walking targets, and one `branch_3f_0a` + `push_24_51` + `writeback_57_02` triple in each full clause. The new runtime-fit section also matters: `000d:5572` proves the extra word from `0005:2c35` is additive (`entity_vm_slot_load_value(...) + offset`), so slot `0x0a` now exposes the only surviving compact five-row selector family that plausibly matches byte A in `000d:21ed`, while slot `0x20` remains a one-clause typeNpc-heavy body with no comparable writeback/push motif or stride family.
- The downstream-use follow-up weakens that direct selector fit. Instruction windows at `000d:47ef..47f3` show `entity_vm_context_create_from_slot_index` storing slot index `SI` at `+0x32` and the dynamic additive word `DI` at `+0x34`, but the live sequencer lane `000d:21ed -> 000d:22bc` never rereads either field: after the create call it only touches the copied blob at `+0x102`, the seeded byte lane at `+0xd6/+0xd8`, and the caller stream at `+0xcc/+0xce`. The persistent uses of `+0x34` are instead the object save/load path: `000d:49e9..4a27` serializes `+0x10c` then `+0x34`, and `000d:4c2d..4c4d` reloads `(+0x32,+0x34)` through `entity_vm_slot_load_value_plus_offset` before storing the returned pair at `+0x10c/+0x10e`. The safest current read is therefore `persisted source offset feeding a later slot-value reload`, not `direct clause selector consumed by the matrix stage`, which weakens the `NPCTRIG slot 0x0a` alignment unless the derived reload value itself can still be tied back to that ladder.
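The fixed-width ladder claim for `NPCTRIG` slot `0x0a` is easy to check arithmetically. The sketch below uses only the subheader offsets quoted above and the backward targets quoted later in this document (`0x00db/0x00ac/0x007d/0x004e/0x001f`); the helper names are hypothetical, and the final function only illustrates what an additive offset would have to be to land on clause `k`, without claiming any caller actually supplies it.

```python
SUBHEADERS = [0x0064, 0x0093, 0x00C2, 0x00F1, 0x0120]
BACKWARD_TARGETS = [0x00DB, 0x00AC, 0x007D, 0x004E, 0x001F]
STRIDE = 0x2F


def strides(values):
    """Differences between consecutive offsets."""
    return [b - a for a, b in zip(values, values[1:])]


def clause_start(k, base=SUBHEADERS[0], stride=STRIDE):
    """Offset of clause k if the ladder is base + k * stride."""
    return base + k * stride


if __name__ == "__main__":
    print([hex(s) for s in strides(SUBHEADERS)])        # forward ladder
    print([hex(s) for s in strides(BACKWARD_TARGETS)])  # backward-walking targets
    print([hex(clause_start(k)) for k in range(5)])
```

Both sequences step by exactly `0x2f`, one ascending and one descending, which is what makes the ladder a candidate five-row family in the first place.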
### FUN_000d_ebe3 opcode-to-payload-shape matrix (sequencer-local)
### entity_vm_opcode_sequence_run opcode-to-payload-shape matrix (sequencer-local)
| Sequencer stage | Code anchors | Opcode / lane status | Payload shape class | Verified behavior |
|---|---|---|---|---|
@ -327,8 +332,9 @@ Conservative interpretation after this pass:
What is now hard evidence in code:
- `000d:0988` compares one opcode-local word at `[BP-0x32]` against concrete values `0x19`, `0x1a`, and `0x1b` (`000d:099b`, `000d:09a1`, `000d:0a07`, `000d:0a0d`).
- `FUN_000d_ebe3` calls `000d:177c -> 000d:1acb -> 000d:0988 -> 000d:22bc -> optional 000d:1d4a -> 000d:2104` (`000d:ebf5`, `000d:ec09`, `000d:ec1d`, `000d:ec31`, `000d:ec48`, `000d:ec54`).
- `entity_vm_opcode_sequence_run` (`000d:ebe3`) calls `000d:177c -> 000d:1acb -> 000d:0988 -> 000d:22bc -> optional 000d:1d4a -> 000d:2104` (`000d:ebf5`, `000d:ec09`, `000d:ec1d`, `000d:ec31`, `000d:ec48`, `000d:ec54`).
- `000d:177c`, `000d:1acb`, and `000d:2104` do not contain their own opcode compares in recovered body ranges; they behave as wrapper stages around the opcode-local family tested in `000d:0988`.
- The entry/exit contract is one step tighter too. `000d:ebe9` seeds the first stage from object field `+0xfe`, while the success tail at `000d:ec62..ec79` runs `tracked_entity_handle_mark_remove_all_if_enabled` and then gates `FUN_0008_ebe7` on object field `+0xc0` plus byte `+0x4b`. So the sequencer is not just an isolated opcode cluster; it also participates in outer runtime cleanup and follow-up dispatch state.
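The stage chain above can be written down as data so that notes and scripts agree on the order. This is a bookkeeping sketch only: the pairing of each call site with a stage assumes the two lists are given in matching order, the condition that enables the optional `000d:1d4a` stage is not recovered, and the function name is hypothetical.

```python
# (stage entry, call site inside entity_vm_opcode_sequence_run, optional?)
STAGES = [
    ("000d:177c", "000d:ebf5", False),
    ("000d:1acb", "000d:ec09", False),
    ("000d:0988", "000d:ec1d", False),
    ("000d:22bc", "000d:ec31", False),
    ("000d:1d4a", "000d:ec48", True),   # gating condition still unknown
    ("000d:2104", "000d:ec54", False),
]


def planned_stage_order(run_optional):
    """Return stage entry addresses in call order, with or without the optional stage."""
    return [entry for entry, _site, optional in STAGES if run_optional or not optional]


if __name__ == "__main__":
    print(planned_stage_order(run_optional=False))
    print(planned_stage_order(run_optional=True))
```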
Conservative case identity mapping from this pass:
@ -339,9 +345,9 @@ Conservative case identity mapping from this pass:
Still unresolved after this pass:
- The animation constructor near calls at `000e:283e`, `000e:2931`, and `000e:29e4` land on a separate mis-split `000e:ebe3` region, not on `FUN_000d_ebe3`. They therefore no longer count as direct xref evidence for the `000d` dispatcher.
- The true upstream selector/write path for `[BP-0x32]` in `FUN_000d_ebe3` is still unresolved, and no additional opcode id can yet be assigned uniquely beyond the internal `0x19/0x1a/0x1b` family already proven inside `000d:0988`.
- Repeated MCP-visible instruction and data-use searches still do not produce a real direct caller edge for `FUN_000d_ebe3`, `0005:2c35`, or `0005:2c68`. For now that makes the next defensible route `caller-frame / shared-consumer recovery`, not more recycled raw call searches or the retired `000a:44fd` and `000e:ebe3` hypotheses.
- The animation constructor near calls at `000e:283e`, `000e:2931`, and `000e:29e4` land on a separate mis-split `000e:ebe3` region, not on `entity_vm_opcode_sequence_run`. They therefore no longer count as direct xref evidence for the `000d` dispatcher.
- The true upstream selector/write path for `[BP-0x32]` in `entity_vm_opcode_sequence_run` is still unresolved, and no additional opcode id can yet be assigned uniquely beyond the internal `0x19/0x1a/0x1b` family already proven inside `000d:0988`.
- Repeated MCP-visible instruction and data-use searches still do not produce a real direct caller edge for `entity_vm_opcode_sequence_run`, `0005:2c35`, or `0005:2c68`. For now that makes the next defensible route `caller-frame / shared-consumer recovery`, not more recycled raw call searches or the retired `000a:44fd` and `000e:ebe3` hypotheses.
### First readable VM IR sketch (verified-only)
@ -466,9 +472,52 @@ The next gameplay-side wrapper pass now extends well past the three earlier seed
- Direct callsites are now pinned for the simpler wrappers: `0005:0292 -> 0005:2c06`, `0005:0fee -> 0005:2cd2`, `0005:5946/59e9 -> 0005:2c9b`, and `0007:814e/822e -> 0005:2d01`.
- The two direct `0005:2d30` callers are now role-shaped as well: `0005:5370` reaches slot `0x0f` only after `entity_class_has_flag2000` succeeds and class-word bit `0x8000` is clear, while `0005:6f47` reaches the same gate from the complementary branch where class-word bit `0x2000` is still clear before the caller continues into its larger state/update flow.
- `0005:2c68` is no longer usable as indirect selector evidence. The `0007:e521` and `0007:e73c` instruction windows do push `0x2c68` immediately before `CALLF 000a:44fd`, but decompile now shows that value is the caller-local data pointer `DAT_0000_2c68` passed into a fatal-report helper, not an indirect call to wrapper `0005:2c68`.
- `0005:2c35` and `0005:2c68` therefore both remain unresolved in direct caller/xref evidence, and the real selector work stays centered on the still-xref-dark upstream edge into `FUN_000d_ebe3` rather than the disproven `000a:44fd` hypothesis.
- `0005:2c35` and `0005:2c68` therefore both remain unresolved in direct caller/xref evidence, and the real selector work stays centered on the still-xref-dark upstream edge into `entity_vm_opcode_sequence_run` rather than the disproven `000a:44fd` hypothesis.
- Net effect: the active-event ecosystem fit is reinforced by direct caller behavior and payload shapes, but final slot-to-descriptor ownership still requires real caller-role recovery for the remaining xref-dark entry points.
#### Current batch: masked-context hub and sequencer-internal consumer recovery
- The generic masked VM-context hub is now instruction-verified at `000d:463a`. That body maps the incoming entity through `entity_vm_slot_index_from_entity`, rejects the path when runtime global `0x6610` is active or the owner/resource table at `0x6611 + 0x1315/+0x1317` is absent, tests the per-slot `0x0d`-stride owner mask pair against the caller-supplied high/low mask words, and only then falls into `entity_vm_context_create_from_slot_index` (`000d:46ec`).
- `search_instructions` on `000d:463a` now confirms this hub is not isolated to the `0005` wrapper island. In addition to the known seg021 wrappers, live direct callers now include `0004:f047` (mask `0x8000:0x0007`), `0004:f076` (mask `0x2000:0x0015`), and larger callers at `0006:0bbc` / `0006:10e7`. That is new caller-side evidence for the wider owner-slot taxonomy even though the offset-specialized wrappers `0005:2c35` and `0005:2c68` themselves still have no direct caller edges.
- The xref-dark offset wrappers are now tighter structurally too. Disassembly of `0005:2c35` and `0005:2c68` confirms they do nothing beyond sign-extending one extra word, passing mask pairs `0x0400:0x000a` and `0x0800:0x000b`, forwarding the entity pointer to `000d:463a`, and returning the out-word on success. That keeps their best current reading at `offset-specialized masked context creation`, not a separate selector lane.
- The offset word is now behaviorally tighter too. `entity_vm_slot_load_value_plus_offset` (`000d:5572`) is a straight `entity_vm_slot_load_value(...) + offset` wrapper, so the extra word passed by `0005:2c35` is not a second mask or opaque cookie; it is an additive selector/value adjustment that can plausibly choose one of the evenly spaced slot-`0x0a` clause starts once a real caller is recovered.
- The next caller-path pass tightens why `0005:2c35` stays dark. MCP xrefs now show only three entries into `entity_vm_context_create_from_slot_index` (`000d:46ac` from the generic masked hub, plus direct internal sequencer islands `000d:208b` and `000d:21ed`), while `0005:2c35` itself still has no recovered code or data xrefs. Stack setup at `000d:208b` hardcodes the `000d:5572` additive slot-load parameter to `0`, which does not match the `NPCTRIG` slot-`0x0a` clause starts (`0x0064/0x0093/0x00c2/0x00f1/0x0120`) or backward targets (`0x00db/0x00ac/0x007d/0x004e/0x001f`). The remaining live selector frontier is therefore the still-overlapped `000d:21ed` caller frame, not a normal visible caller of `0005:2c35`.
- The sequencer lane also gained two concrete internal consumer shapes. `000d:208b` is now the instruction-verified `create one slot-backed context and materialize or forward its result` path: it builds a `0x6714` context from the caller stream state, writes immediate-flagged results straight to the out pointer, and otherwise forwards the created object through `entity_vm_opcode_finish`. `000d:21ed` is the matching `prepend inline payload and build entity-link matrix` path: it creates a context, prepends caller-owned bytes into `+0x102`, consumes the seeded `+0xd6/+0xd8` bytes as shape/count metadata, and builds repeated `entity_link` closures from the following streamed ids before the same finish path.
- A new downstream-use pass narrows the extra-word role further. The stored offset field at context `+0x34` is now confirmed as durable object state rather than an immediate sequencer input: `000d:21ed -> 000d:22bc` does not reread it at all, `000d:498f`/`000d:4a78` serialize and reload it, and `000d:4c2d..4c4d` recomputes a slot-backed value from `(+0x32,+0x34)` into `+0x10c/+0x10e`. That shifts the remaining immortality question one step downstream: if `NPCTRIG slot 0x0a` still fits this runtime lane, it is more likely through the value reloaded from the slot-plus-offset pair than through `+0x34` as a direct clause selector.
- The hidden pre-call span in the `000d:21ed` lane is now recovered from direct program-memory bytes as well. Window `000d:2131..21ed` reads the seeded `+0xd6/+0xd8` stream as three successive words followed by two signed bytes: word0 becomes the slot index pushed at `000d:21d4`, word1 and word2 are added at `000d:21d0` before being pushed as the dynamic additive arg at `000d:21d3`, byte3 is forwarded as the setup-data length byte, and byte4 becomes the inline-blob length used for the later prepend copy. That makes the source classification explicit: context `+0x34` is not loaded from the owner table or from the caller object at `+0xd4`; it is a computed sum of two consecutive words inside the seeded stream itself.
- The same recovered window also tightens the upstream source layout feeding `entity_vm_context_setup`. The current caller frame base is `caller + [caller+0xd4]`, where `+0xd4` matches the saved frame offset written by `entity_vm_stack_push_frame` (`000c:f7c7`) rather than a descriptor-local field. From that frame base, `000d:21db..21e0` pushes `[frame+0x0a/+0x0c]` as a far pointer passed into `entity_vm_context_setup`, and `000d:21bd..21c8` separately derives `[frame+0x0e]` as the inline payload tail copied after context creation. So this consumer is now better modeled as one generic VM frame-record shape with two payload sources: a frame-stored far pointer plus byte-sized setup length for the initial `+0xcc` stack seed, followed by an adjacent inline tail blob with its own byte-sized length.
- The next frame-producer pass recovers the closest non-overlapped writer feeding that lane too. Raw bytes at `000c:fbf7..fc47` (`caseD_0`) show a generic frame-record producer reading one signed placement byte from the same seeded `+0xd6/+0xd8` stream, popping a far-pointer dword from the caller stream at `[caller+0xcc/+0xce]`, computing `frame_base = caller + [caller+0xd4]`, and storing the dword at `[frame_base + placement + 0x4/+0x6]`. That means the immediate source far pointer consumed later by `000d:21ed` is already stream-backed rather than owner-row-backed; if the `000d:21ed` record uses this exact producer family for its `[frame+0x0a/+0x0c]` lane, the relevant placement byte is `0x0006`, which is the only value that lands the written dword at `+0x0a/+0x0c` and leaves the inline tail starting at `+0x0e`.
- That stronger runtime shape weakens any claim that `000d:21ed` is already reading a descriptor-family-specific record. `NPCTRIG` slot `0x0a` still remains the best surviving descriptor-side candidate because its five-clause ladder is the only compact body that fits the row-count frontier, but the code evidence now shows the immediate input to `000d:21ed` is a generic frame-local record containing a source far pointer, a seeded slot/additive pair, and an inline tail. The remaining descriptor-side question is therefore one level earlier again: where the caller frame receives its `[frame+0x0a/+0x0c]` far pointer and whether the summed `add_a + add_b` still corresponds to a clause-base/delta pair inside `NPCTRIG` slot `0x0a` rather than to a more generic descriptor-relative offset.
- That changes the `NPCTRIG` cross-check in one important way. `NPCTRIG` slot `0x0a` remains the strongest surviving descriptor-side hypothesis only as an upstream source for a predecoded caller-stream record, because the recovered writer consumes a caller-stream dword plus a seeded placement byte instead of indexing owner rows or descriptor tables directly. `NPCTRIG` slot `0x20` still reads as the typed/setup companion body, but neither slot is now a good fit for the immediate write into `[frame+0x0a/+0x0c]` itself.
- One more layer of the producer path is now instruction-verified too. The setup call at `000d:4788 -> 000c:f844 -> 000c:f6e8` does not seed the new context's `+0xcc/+0xce` caller stream directly from the owner table row. Instead `entity_vm_context_setup` first allocates or reuses the object-local stream buffer at `context+0x36+0xcc`, then copies a caller-supplied setup blob from the parent frame using the far pointer/length arguments passed through `000d:46ec`. The slot/additive record returned by `entity_vm_slot_load_value_plus_offset` becomes the separate seeded `+0xd6/+0xd8` stream, while the owner-table row at `(+0x10/+0x12) + 0x0d*slot + 4` is mirrored to `0x39ca[slot]` and preserved separately in the context state.
- The closest sibling template to `caseD_0` also sharpens the placement-byte reading. `000c:ff9f..000d:000d` reads one signed placement byte and one length byte from the same seeded `+0xd6/+0xd8` stream, then copies `len` bytes from `[frame_base + placement + 0x4]` back onto the caller stream. Together with the recovered `000d:21ed` consumer layout (`[frame+0x0a/+0x0c]` far ptr, `[frame+0x0e..]` inline tail), that makes the strongest current fit a fixed two-slot family for this record shape: `caseD_0` uses placement `0x0006` for the far-pointer dword, and the sibling blob-copier uses placement `0x000a` for the inline tail starting at `frame+0x0e`.
- The producer side of that same record family is now tighter too. Linear raw-byte recovery across `000c:f98b..000d:000d` shows `000c:fc4b..fcbb` as the forward blob producer matching the reverse `000c:ff9f..000d:000d` case: it reads placement and length from the seeded `+0xd6/+0xd8` lane, computes `frame_base = caller + [caller+0xd4]`, and copies `len` bytes from the caller stream at `[caller+0xcc/+0xce]` into `[frame_base + placement + 0x4]`. For the `000d:21ed` record shape, that makes placement `0x000a` the best fit for the inline tail now consumed from `[frame+0x0e..]`.
- The dword lane now has a matching reverse case as well. Raw bytes at `000c:ff1f..ff83` show the same recursive family in the opposite direction: it reads one signed placement byte from the seeded `+0xd6/+0xd8` lane, computes `frame_base = caller + [caller+0xd4]`, loads a dword from `[frame_base + placement + 0x4/+0x6]`, subtracts `4` from `[caller+0xcc]`, and writes that dword back onto the caller stream. In other words, the immediate upstream producer for the `000c:fbf7..fc47` far-pointer write can already be another frame-record copier, not a direct owner-row or descriptor-table lookup.
- That narrows the remaining source classification again. The setup far pointer consumed by `000d:21ed` is now best modeled as a recursively propagated pointer into another VM-side byte buffer or predecoded descriptor workspace, not as the owner/resource row source mirrored separately through `0x39ca`. The owner row still matters for slot-backed state reloads, but the `entity_vm_context_setup` blob pointer itself is traveling through the frame-record family independently of that owner-row mirror.
- That also weakens the full-tuple `NPCTRIG` fit one more notch without killing it. The surviving tuple is now better read as `(slot, add_a, add_b, setup_len, inline_len, placement=0x0006/0x000a)` feeding a generic recursive frame-record contract. `NPCTRIG` slot `0x0a` remains the strongest descriptor-side candidate only as an earlier decoder that could have produced this predecoded record family, while slot `0x20` still reads as the typed/setup companion body. No recovered instruction in the immediate `000c:f98b..000d:000d` family yet ties the setup far pointer directly back to either slot.
- Net effect on source classification: the `000d:21ed`-relevant frame record is still not best modeled as generic VM scratch. Its immediate setup bytes are recursively copied from a parent frame record, and the wider context-build path is still anchored in descriptor-derived VM state (`+0xd6/+0xd8` from `entity_vm_slot_load_value_plus_offset`, owner-row source mirrored via `0x39ca`). What remains open is not whether this lane is scratch-backed, but which earlier decoder materializes the parent-frame far pointer before `000c:fbf7` consumes the next dword.
- After the new reverse-case recovery, that blocker can be stated more tightly: the missing piece is no longer a generic parent-frame materializer somewhere above `000c:fbf7`, but the first non-recursive decoder that originates the far pointer before the `ff1f/ff9f -> fbf7/fc4b -> 000d:21ed` propagation chain repeats it.
- The next pass closes that specific source-classification gap inside the same hidden interpreter body. Raw bytes at `000c:fa2f..fa5b` recover an inner opcode dispatcher that reads one opcode byte from the seeded `+0xd6/+0xd8` lane, bounds-checks it against `0x79`, and jumps through `CS:[0x3d9f + opcode * 2]`. That matters because the same local case family now exposes both the recursive frame-record replay stages and a separate set of direct caller-stream seed cases.
- Those non-recursive seed cases are now concrete. `000c:fd51` writes one inline byte from the `+0xd6/+0xd8` control stream onto the caller stream after decrementing `[caller+0xcc]` by `1`, `000c:fd91` and `000c:fdd1` do the same for inline words, and `000c:fe11..fe59` does it for an inline dword. In the dword case the interpreter advances through four literal bytes in the control stream, subtracts `4` from `[caller+0xcc]`, and writes the literal dword directly onto the caller stream before any frame replay logic runs.
- That makes `000c:fe11` the strongest current first non-recursive origin for the far-pointer lane later consumed by `000c:fbf7..fc47` and then by `000d:21ed`. The immediate setup far pointer is therefore no longer best modeled as coming from the owner/resource row, the mirrored `0x39ca` lane, or a generic VM scratch buffer. Its immediate compiled-side source is an inline dword literal embedded in the interpreter/control stream itself; `000c:ff1f..ff83` and `000c:fbf7..fc47` are replay stages layered on top of that literal-seeding path.
- That retunes the `NPCTRIG` cross-check again without killing it. `NPCTRIG` slot `0x0a` still remains the best upstream descriptor-side candidate because it is still the only compact active-event body that fits the surviving slot/additive shape, and slot `0x20` still reads as the typed/setup companion. But any direct immortality mapping now has to explain how the upstream decoder turns that descriptor family into a literal-bearing VM control stream before `000c:fe11`, not how `000d:21ed` or `000c:fbf7` index descriptor rows directly.
- One more pass tightens the creator/consumer split enough to rule out the owner row as the immediate control-stream builder. Direct instruction recovery at `000d:46ec` shows `entity_vm_context_create_from_slot_index` using the owner-table row `(+0x10/+0x12) + 0x0d*slot + 4` only for the separate `0x39ca[slot]` mirror, while the live `+0xd6/+0xd8` lane passed into `entity_vm_context_setup` still comes from `entity_vm_slot_load_value_plus_offset`. In the recovered `000d:21ed` pre-call span, that seeded lane is consumed as `word slot_index`, `word add_a`, `word add_b`, `byte setup_len`, `byte inline_len`, with `add_a + add_b` forwarded as the dynamic word stored at context `+0x34`.
- The same pass also clarifies the setup-payload contract that feeds the later link-matrix stage. `000d:21ed` passes `[frame+0x0a/+0x0c]` as the setup far pointer into `entity_vm_context_setup`, copies `[frame+0x0e..]` as a separate inline tail, and then `000d:22bc` consumes two signed metadata bytes plus a streamed word matrix to drive repeated `entity_link` calls. The immediate source is therefore `decoded per-slot VM stream + frame replay`, not `owner-row lookup + direct descriptor row`.
- That changes the opcode-family reading around `000c:fa2f` in a useful way even though the exact opcode indices remain unresolved in the current overlapped table view. The hidden dispatcher now has a verified immediate-literal family: `000c:fd51` pushes one inline byte to the caller stream, `000c:fd91` pushes a sign-extended byte as a word, `000c:fdd1` pushes an inline word, and `000c:fe11` pushes an inline dword. Together with the recursive replay cases `000c:ff1f` and `000c:ff9f`, that is enough to classify the upstream builder as a generic literal-bearing interpreter/control stream rather than a direct `NPCTRIG` clause reader.
- The descriptor-side fit therefore weakens from `specific direct NPCTRIG selector` to `broader descriptor-derived VM workspace` while staying narrow enough to keep `NPCTRIG` slot `0x0a` alive as the best upstream candidate. Slot `0x0a` still matches the event-bearing compact body and its five-clause ladder remains the only surviving compact source family with a plausible row-count/additive shape, but slot `0x20` still looks like the typed/setup companion and neither slot is now a good fit for the immediate control-stream seeding logic itself.
- The slot-load miss path now closes the workspace-materialization side of that question. Inside `entity_vm_slot_load_value` (`000d:51fd`), a cache miss triggers `000d:5066`, which first reads a slot header and then a `count * 6 + 0xc0` subentry table through the owner-resource wrapper `000d:714c`. When one subentry is still unloaded, `000d:5305..53d4` allocates a value object through `000d:3800`, then calls `000d:714c` again with the subentry source range and the new object's buffer at `+0x0a/+0x0c`; the function returns that same buffer pointer as the final `DX:AX` result. The immediate `+0xd6/+0xd8` workspace is therefore first materialized as a file-backed slot-value buffer during the slot-load miss path itself, not synthesized later from the owner-row mirror or from generic scratch state.
- The inline-tail source is not as tightly closed yet. The same hidden case family contains several immediate scalar caller-stream seed cases, so the `000d:21ed` tail at `[frame+0x0e..]` can now plausibly be assembled from control-stream literals or from another nearby non-recursive payload case rather than from a direct owner-row read. No instruction recovered in `000c:f98b..000d:000d` performs a matching direct descriptor-row lookup for that tail.
- Net effect from this pass: the missing outer selector into `entity_vm_opcode_sequence_run` is still unresolved, but the lane is no longer just one opaque dispatcher plus dark wrappers. It now has a verified generic masked-context creation hub, wider caller-family anchors for that hub, and two internally differentiated sequencer consumer blocks built directly on `entity_vm_context_create_from_slot_index`.
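
The immediate-literal cases and the `000c:fa2f` jump-table arithmetic described in the bullets above can be modeled in a few lines. This is only a sketch under stated assumptions: `CallerStream`, `push_inline_literal`, and `push_sign_extended_byte` are hypothetical names, the `0x79` bound is treated as inclusive, and the caller stream is modeled as a downward-growing buffer whose cursor stands in for `[caller+0xcc]`.

```python
import struct

JUMP_TABLE_BASE = 0x3D9F  # CS-relative table base quoted in the notes
MAX_OPCODE = 0x79         # bound quoted in the notes; inclusive here by assumption


def jump_table_slot(opcode: int) -> int:
    """Return the CS-relative offset of the word-sized table entry for an opcode."""
    if not 0 <= opcode <= MAX_OPCODE:
        raise ValueError(f"opcode {opcode:#x} is outside the dispatcher range")
    return JUMP_TABLE_BASE + opcode * 2


class CallerStream:
    """Hypothetical stand-in for the caller stream; cursor models [caller+0xcc]."""

    def __init__(self, size: int = 0x100) -> None:
        self.mem = bytearray(size)
        self.cursor = size  # decremented before every write

    def push(self, payload: bytes) -> None:
        if len(payload) > self.cursor:
            raise OverflowError("caller stream exhausted")
        self.cursor -= len(payload)
        self.mem[self.cursor:self.cursor + len(payload)] = payload


def push_inline_literal(control: bytes, pos: int, width: int, stream: CallerStream) -> int:
    """Model 000c:fd51 (width 1), 000c:fdd1 (width 2), and 000c:fe11 (width 4).

    Copies `width` literal bytes from the control stream onto the caller
    stream and returns the advanced control-stream position.
    """
    literal = control[pos:pos + width]
    if len(literal) != width:
        raise ValueError("control stream truncated")
    stream.push(literal)
    return pos + width


def push_sign_extended_byte(control: bytes, pos: int, stream: CallerStream) -> int:
    """Model 000c:fd91: one inline byte pushed as a sign-extended word."""
    (value,) = struct.unpack_from("<b", control, pos)
    stream.push(struct.pack("<h", value))
    return pos + 1
```

In this model the dword case is the only one that can seed a complete far pointer in a single step, which is why `000c:fe11` is the interesting origin for the lane later replayed by `000c:ff1f` and `000c:fbf7`.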
#### Follow-up: four newly surfaced direct `000d:463a` callers
- `0004:f033` (`0x8000:0x0007`) now reads as a generic gameplay-side materialization lane rather than a state-transition helper. When the local seg021 class-nibble query returns `8`, the wrapper bypasses the VM path and returns object word `+0x02` directly from the locally produced object. Otherwise it forwards through `entity_vm_context_try_create_masked_for_entity` and returns the created object's word `+0x02` on success.
- `0004:f05c` (`0x2000:0x0015`) stays on the gameplay-state side too, but with a stronger caller role. The only current direct caller window at `0004:f2b3` reaches it after overlap/proximity tests and entity byte `+0x32` toggling, so the safest reading is still `stateful gameplay materialization lane`, not `descriptor selector`.
- `entity_vm_context_try_create_mask_0008_slot30_with_offset` (`0006:0ba4`) adds the first strong non-`0005` extra-payload lane. It passes mask `0x0008:0x0030` plus one caller word into `000d:463a`; on failure it drops into `0006:0cfa`, which copies class-detail word `+0x02` to `+0x04`, derives a replacement selector from class-detail words `+0x06/+0x08/+0x0a` or the caller value, may clear flag `0x08` through `entity_class_clear_flag8_and_dispatch`, and then continues into the local state-transition/dispatch table. That is concrete evidence that at least one extra-word masked lane is feeding class-state transition materialization rather than a free-standing VM selector root.
- `entity_vm_context_try_create_mask_0010_slot08_with_offset_if_ready` (`0006:108c`) provides the second strong extra-payload lane. It passes mask `0x0010:0x0008` plus one caller word into `000d:463a`, but only after local readiness gates through `0006:ffed` plus the seg021 availability/flag8-clear path. Unlike the earlier looser reading, the helper itself does not fall back to `0006:13b0` or `0006:13e4`; on miss it simply returns `0`. That makes the function a guarded masked-materialization attempt, while the neighboring `0006:13b0/13e4 -> 0006:07c0` class-linked lookups remain adjacent family evidence rather than a direct local fallback inside `0006:108c`.
- Taken together, the new seg004 and seg006 callers strengthen the existing read of the still-dark wrappers `0005:2c35` (`0x0400:0x000a`) and `0005:2c68` (`0x0800:0x000b`). Those wrappers still have no direct caller evidence, but they now sit inside a larger verified subfamily of `extra-word masked materializers` whose known members feed state selectors, class-linked values, or other gameplay-side payload resolution instead of acting as the real upstream selector into `entity_vm_opcode_sequence_run`.
- MCP-native function xrefs now reinforce that stopping point rather than changing it: `entity_vm_context_try_create_masked_for_entity` reports the expected direct callers through `0004:f047`, `0004:f076`, the named `0005` wrapper island, and the two seg006 callsites `0006:0bbc` / `0006:10e7`, while `entity_vm_opcode_sequence_run` plus the dark `0x0400/0x000a` and `0x0800/0x000b` wrappers still surface no direct function-xref callers in the current database. The best next path therefore remains caller-frame recovery or nearby unnamed-function repair, not another generic masked-hub sweep.
| `000c:f844` | `entity_vm_context_setup` | Calls `entity_vm_stack_init_with_data`, then sets `+0xd6..+0xe3` with position/dimension/state params |
| `000c:f600` | `entity_vm_pair_stack_push` | Push (word_a, word_b) onto 31-entry array at `[ptr+0x80]` (count); error if full |
| `000c:f63c` | `entity_vm_pair_stack_pop` | Pop and return word from pair stack; error if empty |
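
The pair-stack rows above describe a small bounded stack. A minimal sketch of that contract follows, assuming a 31-entry capacity with a separate count field. `PairStack` is a hypothetical name. The table only says that pop returns a word and does not say which half of the pair, so this sketch returns the whole pair instead of guessing.

```python
class PairStack:
    """Bounded stack of (word_a, word_b) pairs, modeled on the table rows above."""

    CAPACITY = 31  # entry count quoted in the table

    def __init__(self) -> None:
        self._pairs: list[tuple[int, int]] = []

    @property
    def count(self) -> int:
        """Number of stored pairs; stands in for the count field at +0x80."""
        return len(self._pairs)

    def push(self, word_a: int, word_b: int) -> None:
        """Store one pair, masking both halves to 16 bits; error if full."""
        if self.count >= self.CAPACITY:
            raise OverflowError("pair stack full")
        self._pairs.append((word_a & 0xFFFF, word_b & 0xFFFF))

    def pop(self) -> tuple[int, int]:
        """Remove and return the most recent pair; error if empty."""
        if not self._pairs:
            raise IndexError("pair stack empty")
        return self._pairs.pop()
```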


@ -236,6 +236,9 @@ Current verified caller-side detail:
- The seg127 fade-controller ownership is also one step tighter in the same lane. `transition_preentry_setup_resources` resets `0x630a` at `000c:c855`, `transition_preentry_step_script` now has a verified early gate at `000c:ca25` that yields to the fade controller whenever `0x630a` is active, and `transition_palette_fade_begin` at `000c:cdca` explicitly installs palette source/range/step state into `0x630e..0x6316`, asserts `0x630a`, and kicks one immediate fade tick.
- Fade direction is now pinned to seg126 script-control bytes rather than the outer seg005 wrappers. Inside `transition_preentry_step_script`, control byte `0x5e` reaches `palette_fade_begin_full_down` at `000c:cb06`, while control byte `0x26` reaches `palette_fade_begin_full_up` at `000c:cd1a`; control byte `0x2a` shares the same post-fade bookkeeping path after the full-up call.
- The upstream producer path for the remaining seg126 control bytes is now tighter too. `transition_preentry_setup_resources` composes one path from the mutable base at `0x6aa:0x6ac` plus local name buffers (`0x631c`, `0x6335`) through the seg072 slash-aware path helper `0009:3600`, opens that file through `file_handle_alloc_init_and_open`, allocates a buffer of the returned size, reads the full payload into `0x6301:0x6303`, and seeds `0x62fa/0x62fc/0x62ff/0x6305/0x630a/0x6318` before the loop starts. Current best reading is therefore `file-backed transition script/control buffer`, not locally synthesized opcodes.
- The adjacent seg126 selector lane is now classified tightly enough for conservative renames. `transition_file_family_select_and_refresh` (`000c:afa5`) keys object field `+0x49` through values `0`, `1`, and `4`, composes three sibling filenames from the inherited base `0x6aa:0x6ac` plus shared stem `0x621c` with suffix buffers `0x6223`, `0x622d`, and `0x6237`, loads the chosen file into object `+0x520`, and then runs the same redraw/palette/input refresh path. The same helper uses `field49==2` as a direct `vtable[0x3c]` callback branch and `field49==-1` as a normalize-back-to-zero state.
- The local wrappers around that selector now sharpen the caller model without forcing a stronger UI label. `transition_file_family_advance_on_anim_tick` (`000c:b153`) increments `+0x49` when the polled byte at `[param_2+0x14+0xa]` is clear and then re-enters the selector, while `transition_file_family_input_key_handler` (`000c:b199`) maps Left/Right and `n/N` into previous/next selector steps, uses `e/E` plus repeated `-` to force selector state `4`, and otherwise exits through `vtable[0x3c]`.
- This closes the narrow `+0x49` question as a local three-way file-family selector lane, but it still does not justify a stronger UI label for the paired `0x8c5c/0x8c60` renderer presets or the sibling seg127 fade inputs.
- The remaining `transition_preentry_step_script` opcodes now have stable local mechanics even though the higher-level text semantics are still open. Control byte `0x21` consumes the next script word into `SI` and advances `0x62ff` by two, which makes it the current baseline/start-position loader for later text draws. Control byte `0x40` renders one null-terminated entry from the same script buffer through renderer object `0x8c5c:0x8c5e`, while control byte `0x24` mirrors that behavior through `0x8c60:0x8c62`; both paths measure width through the renderer vtable, draw through seg088 `000a:30d7`, blit through seg080 `0009:943a`, advance `SI` by rendered width plus four, and then scan forward to the next opcode byte. Control byte `0x23` sets local completion byte `0x62fe = 1` and returns, so the outer shell exits on the next loop test instead of iterating further.
- Secondary renderer-factory sampling keeps the `0x8c5c` / `0x8c60` split conservative. Other sampled `000a:9748` xrefs use different adjacent preset pairs such as `0x0d/0x0c` at `0007:df30/df3f` and `0x0c/0x0f` at `0008:47c9/4851`, while no sampled caller reproduced the exact `0x10/0x11` startup pair outside `transition_preentry_setup_resources`. That supports keeping these as paired preset text renderers without forcing a title/body or normal/highlight label.
- The missing seg126 step body at `000c:ca1d` still cannot be split out safely because `create_function_by_address` collides with the existing oversized overlap namespace, so this pass preserved the recovery as a decompiler comment instead of forcing a destructive boundary repair. Current best reading is still that `000c:ca1d..cd34` is the real `transition_preentry_step_script` body and that `000c:cd35` starts the fade-tick helper.
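
The control bytes recovered so far for `transition_preentry_step_script` can be collected into a small tokenizer. This is a rough sketch, not the recovered loop. It assumes operands follow their control byte directly, that the `0x21` word is little-endian, and that text entries run up to the next NUL. The label strings and the `tokenize` helper are hypothetical, and any byte outside the recovered set is reported as unknown instead of being interpreted.

```python
import struct

# Labels are illustrative; only the byte values and broad roles come from the notes.
CONTROL_BYTES = {
    0x21: "load_position_word",     # next script word into SI, cursor += 2
    0x23: "set_completion_flag",    # sets 0x62fe = 1 and returns
    0x24: "draw_text_lane_8c60",    # text through renderer 0x8c60:0x8c62
    0x26: "fade_full_up",           # reaches palette_fade_begin_full_up
    0x2A: "post_fade_bookkeeping",  # shares the bookkeeping path after full-up
    0x40: "draw_text_lane_8c5c",    # text through renderer 0x8c5c:0x8c5e
    0x5E: "fade_full_down",         # reaches palette_fade_begin_full_down
}


def tokenize(script: bytes) -> list[tuple[str, object]]:
    """Split a script buffer into (label, operand) events under the assumptions above."""
    events: list[tuple[str, object]] = []
    pos = 0
    while pos < len(script):
        op = script[pos]
        pos += 1
        label = CONTROL_BYTES.get(op)
        if label is None:
            events.append(("unknown", op))
        elif op == 0x21:
            (word,) = struct.unpack_from("<H", script, pos)
            pos += 2
            events.append((label, word))
        elif op in (0x40, 0x24):
            end = script.index(0, pos)  # text entries are null-terminated
            events.append((label, script[pos:end].decode("ascii", "replace")))
            pos = end + 1
        else:
            events.append((label, None))
            if op == 0x23:
                break  # completion flag: the outer loop exits on its next test
    return events
```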
@ -317,6 +320,14 @@ Current best neutral conclusion from this pass: the shared `g_active_dispatch_en
- The in-scope `0x31a2` readers are now classed cleanly by role. `0004:c24d` and `000c:e4d8` are edge waits; `000c:ca11` is the seg126 modal-break exit; `000c:e546`, `000c:e5c6`, and `000d:c0ee` are cleanup-abort exits; `000d:9304` and `000d:b6b1` are deferred dispatch/state-advance gates.
- Two remaining `0x31a2` reads stay outside that presentation classification set. `0005:453d` is only a plain getter wrapper for the shared depth word, and `0008:5149` is a seg008 internal/accounting-side read that adds the current depth to another local count before tripping a `>= 0x10` capacity flag.
### Current batch: renderer preset contract and seg127 fade-input closure
- `transition_preentry_setup_resources` is now exact on the paired renderer setup path. Instruction window `000c:c659..c6ab` shows that `FUN_000a_9748` is called only with preset ids `0x10` and `0x11`, storing the resulting temporary renderer objects at `0x8c5c:0x8c5e` and `0x8c60:0x8c62`, then immediately drawing the same seed text buffer `DS:0x631a` at `(0x0a,0x0a)` through both. This closes the structural question as `paired preset text lanes` inside one temporary transition presentation path, but still does not justify a stronger title/body or highlight/shadow label.
- The recovered `transition_preentry_step_script` body is also slightly tighter on the two text opcodes. `0x40` and `0x24` both measure their string through renderer vtable slot `+0x0c`, center it inside a `0x280`-wide lane, fetch rendered width through slot `+0x08`, draw through seg088 `000a:30d7`, blit through seg080 `0009:943a`, and advance `SI` by `rendered_width + 4`; only the selected preset lane differs (`0x8c5c` for `0x40`, `0x8c60` for `0x24`).
- The seg127 fade-controller inputs are now exact rather than only role-level. `transition_palette_fade_begin` stores palette source at `0x630e:0x6310`, start index at `0x6312`, count at `0x6314`, step at `0x6316`, brightness at `0x630d`, active flag at `0x630a`, and direction/state at `0x630b`, then immediately ticks the local fade controller. `transition_palette_fade_tick` dispatches `0x630b==1` to `transition_palette_fade_out_step` and `0x630b==2` to `transition_palette_fade_in_step`.
- The two default script-selected fade wrappers are now instruction-verified too. `palette_fade_begin_full_down` at `000c:c616` pushes direction `1`, step `4`, count `0x80`, start `0`, and palette buffer `DS:0x8c64`; `palette_fade_begin_full_up` at `000c:c600` is the same wrapper with direction `2`. Combined with the `0x5e`, `0x26`, and `0x2a` script-byte sites in `transition_preentry_step_script`, this closes the neighboring seg127 fade-input contract for the startup/display lane.
- The late presentation-handoff family is now direct-decompile confirmed rather than only caller-window inferred. `FUN_000d_938c` creates up to two temporary runtime-state palette entries (`kind 0x3c`, then `kind 0x14`), waits for them to clear, redraws, clears `g_active_dispatch_entry_farptr[+0x40]`, and only then dispatches caller vtable `+0x08`; `entity_cleanup_resources_and_dispatch` shows the same late shared-hold clear on the `entity +0x737` branch immediately before the shared `0x2bd8` controller dispatch. That is enough to treat the startup/display major section as materially complete, with only low-impact residual ambiguity around the exact UI label of preset pair `0x10/0x11` and the optional overlap hygiene at `000c:db68`.
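
The fade-input contract listed above is compact enough to restate as data. The sketch below mirrors the recovered field roles and the two default wrappers. `FadeState` and `tick_target` are hypothetical names, and the brightness byte at `0x630d` is left out because its initial value is not stated in these notes.

```python
from dataclasses import dataclass
from typing import Optional

PALETTE_BUFFER_OFFSET = 0x8C64  # DS-relative palette buffer used by both wrappers


@dataclass
class FadeState:
    """Field roles taken from the notes; DS offsets are listed for orientation only."""
    palette_source: int   # 0x630e:0x6310
    start_index: int      # 0x6312
    count: int            # 0x6314
    step: int             # 0x6316
    direction: int        # 0x630b (1 = fade out, 2 = fade in)
    active: bool = True   # 0x630a


def begin_full_down() -> FadeState:
    """Parameters pushed by palette_fade_begin_full_down."""
    return FadeState(PALETTE_BUFFER_OFFSET, start_index=0, count=0x80, step=4, direction=1)


def begin_full_up() -> FadeState:
    """Same wrapper shape with direction 2."""
    return FadeState(PALETTE_BUFFER_OFFSET, start_index=0, count=0x80, step=4, direction=2)


def tick_target(state: FadeState) -> Optional[str]:
    """Name the step routine that transition_palette_fade_tick would dispatch to."""
    if not state.active:
        return None
    return {
        1: "transition_palette_fade_out_step",
        2: "transition_palette_fade_in_step",
    }.get(state.direction)
```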
---
## Follow-up: `0x4588` Object-Role Evidence
@ -330,6 +341,10 @@ The `0x4588` FAR object is a runtime-installed callback/dispatch object that par
- **Teardown:** `000a:4a56` checks a once-flag at `0x4595`, clears `0x4588` when non-null, optionally performs a vtable `+0x0c` callback when `0x4590 != 0x458c`, then calls vtable slot `+0x04` followed by `FUN_0009_0d30()`.
- **Callbacks:** `000a:b9e5`, `000a:ba66`, `000d:9d5e`, and `000d:a3b7` all push a two-word value pair followed by the `0x4588` FAR pointer and call vtable slot `+0x0c`. `entity_conditional_render_dispatch` calls the same vtable slot with a single literal `0x0101` argument.
Current batch note:
- `runtime_callback_object_init_once`, `runtime_callback_object_teardown_once`, and `entity_conditional_render_dispatch` now line up even more strongly as a video or presentation-state callback lane rather than a generic allocator client. The object is installed only after BIOS video-state snapshot, teardown emits a final callback only when recorded mode/state changed, and one live caller uses the literal mode-like pair `0x0101` through the same vtable `+0x0c` slot. That is enough to keep pushing the role toward `presentation/video-state callback broker`, but still not enough for a fully behavioral subsystem rename.
### Payload pairs from payload sync callsites
- `000d:9d5e` → vtable `+0x0c` payload from object fields `+0x12d/+0x12f`
@ -362,11 +377,14 @@ The next ScummVM-guided validation step now confirms that the sampled owner-load
### Header and event-table shape
- The loader-side count field is now tighter too. The first dword in the sampled owner-loaded class header is not the total slot count; `000d:5066` uses it as the extra-slot count beyond a fixed `0x20` base table, which is why the cached table allocation is `extra_count * 6 + 0xc0` and the refcount array is `extra_count * 2 + 0x40`.
- That reading matches the extracted class-family shapes exactly: `EVENT` keeps first dword `0x00000000`, `NPCTRIG` moves to `0x00000001`, and `ROLL_NS` to `0x00000002`, while the already-validated owner-loaded event counts remain `0x20`, `0x21`, and `0x23` respectively.
- The sampled class records do contain a stable 4-byte header field at bytes `8..11`.
- The observed values are small boundaries: `0x00d4`, `0x00da`, and `0x00e6` in the current sample set.
- Treating that dword directly as the first post-event-table offset makes the layout line up cleanly: `(dword_at_8 - 20) / 6` yields valid tables of 32, 33, or 35 slots before inline payload/name data begins.
- The region at `class + 0x14` is therefore now directly confirmed as repeated 6-byte slots with `u16 unknown_word + u32 code_or_payload_field` layout.
- Representative low-slot examples are `JELYHACK` slot `1` = `{word=0x002a, dword=0x00000001}`, `SURCAMNS` slot `1` = `{word=0x0051, dword=0x000000d2}`, `SURCAMEW` slot `1` = `{word=0x00f7, dword=0x000000d2}`, `EVENT` slot `10` = `{word=0x1fd6, dword=0x00000001}`, and `REE_BOOT` slots `10/15/16` = `{0x034b,1}`, `{0x025c,0x034c}`, `{0x003b,0x05a8}`.
- The runtime-side selector arithmetic is now exact as well: the owner-resource callbacks operate on `class_id + 2`, which matches the extracted `object_index` column directly. `EVENT` therefore lands on child `0x363` from class id `0x361`, and `NPCTRIG` on child `0x365` from class id `0x363`.
- The leading event word is still not decoded semantically.
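
The header arithmetic in this list can be checked mechanically. The following sketch assumes little-endian fields, a `0x14`-byte class header, and the 6-byte `u16 + u32` slot layout described above. The function names are illustrative only and are not part of any extractor in this repository.

```python
import struct

HEADER_SIZE = 0x14  # class header bytes ahead of the slot table
SLOT_SIZE = 6       # u16 word + u32 dword per slot


def slot_count_from_header(dword_at_8: int) -> int:
    """Derive the slot count as (dword_at_8 - 20) / 6."""
    span = dword_at_8 - HEADER_SIZE
    if span < 0 or span % SLOT_SIZE:
        raise ValueError(f"boundary {dword_at_8:#x} does not fit the 6-byte slot layout")
    return span // SLOT_SIZE


def cached_table_bytes(extra_count: int) -> int:
    """Cached slot-table allocation: fixed 0x20-slot base plus the extra slots."""
    return extra_count * 6 + 0xC0


def refcount_bytes(extra_count: int) -> int:
    """Refcount array allocation: one word per slot."""
    return extra_count * 2 + 0x40


def child_index(class_id: int) -> int:
    """Owner-resource callbacks operate on class_id + 2."""
    return class_id + 2


def parse_slots(record: bytes) -> list[tuple[int, int]]:
    """Return (word, dword) tuples for every slot in one class record."""
    (boundary,) = struct.unpack_from("<I", record, 8)
    count = slot_count_from_header(boundary)
    return [
        struct.unpack_from("<HI", record, HEADER_SIZE + i * SLOT_SIZE)
        for i in range(count)
    ]
```

With the three observed boundaries `0x00d4`, `0x00da`, and `0x00e6`, `slot_count_from_header` gives 32, 33, and 35 slots respectively.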
### What remains open
@ -374,11 +392,50 @@ The next ScummVM-guided validation step now confirms that the sampled owner-load
- Scanning with the previously noted ScummVM-style `(base_offset + 19) / 6` interpretation overruns into inline payload/name bytes on these owner-loaded records, so the local sample set does not support that exact event-count formula as written.
- The best current arithmetic fit is now tighter: ScummVM's decremented `base_offset` is also used as the live code-stream base in `uc_machine.cpp`, so the local owner-loaded records fit best if bytes `8..11` are the first code-byte offset and event-count derivation is `(base_offset - 19) / 6`, which is exactly equivalent here to `(raw_u32_at_8_11 - 20) / 6`.
- Current `000d` loader evidence does not point to a header rewrite before VM consumption. `entity_vm_runtime_init_from_path_if_configured` (`000d:44df`) only builds the external path and creates the runtime, `entity_vm_runtime_create` (`000d:4c99`) only installs the helper returned by `000d:7000`, `entity_vm_runtime_owner_resource_create` (`000d:7000`) only allocates the child owner table and fills it through helper vtable `+0x0c`, and `entity_vm_context_create_from_slot_index` (`000d:46ec`) directly reads slot-backed source data from that owner table. No local step is yet verified as rewriting the sampled class headers.
- The slot-value miss path is now exact enough to align against the extractor rather than only against motifs. `entity_vm_slot_load_value` (`000d:51fd`) does not build the returned workspace out of owner-row fields or late interpreter scratch: on a miss it uses `000d:5066` plus the same owner-resource wrapper `000d:714c` to read a `0x14`-byte class header, then a cached `6 * (0x20 + extra_count)` subentry table, and finally the selected subentry's byte range straight into a newly allocated value-object buffer at `+0x0a/+0x0c`.
- The final body read at `000d:53b4` now matches the extracted row arithmetic exactly. The 6-byte row contributes `word body_len` plus `dword raw_code_offset`, the class header contributes `dword code_base`, and the reader fetches `body_len` bytes from `code_base + raw_code_offset - 1` through `code_base + raw_code_offset + body_len - 2`.
- That gives a direct owner-loaded fit for the two surviving `NPCTRIG` bodies. For class `NPCTRIG` (`class_id = 0x363`, `object_index = 0x365`), slot `0x0a` uses `{len = 0x0175, raw_code_offset = 0x00000001, code_base = 0x00da}` and therefore materializes range `0x00da..0x024e` (`373` bytes), while slot `0x20` uses `{len = 0x0159, raw_code_offset = 0x00000176, code_base = 0x00da}` and therefore materializes range `0x024f..0x03a7` (`345` bytes). `EVENT` slot `0x0a` fits the same runtime arithmetic with `{len = 0x1fd6, raw_code_offset = 0x00000001, code_base = 0x00d4}` -> `0x00d4..0x20a9`.
- Because `000d:5066/51fd/53b4` now line up with the extracted class headers and event rows byte-for-byte, the remaining immortality blocker is no longer header math or slot-number translation. The open step is upstream class selection into this now-verified loader path: whether the live slot `0x0a` request really names `NPCTRIG`, `EVENT`, or another descriptor family sharing the same owner-loaded format.
- `entity_vm_runtime_owner_resource_create` (`000d:7000`) still does not expose a direct binary-side class-name lookup or explicit `classid + 2` arithmetic. What it does expose is an indexed file-set loader contract: helper-owned count at `+0x14`, far-pointer table at `+0x10`, paired per-entry word table at `+0x18`, vtable `+0x04` size query, and vtable `+0x0c` materialization of the `0x0d`-stride owner records later consumed by `entity_vm_context_create_from_slot_index`. The current pass also makes the helper shape slightly more concrete: the two raw seg070 windows at `0009:67b6` and `0009:6916` are twin per-entry path/read loops with distinct format strings (`DS:3f2d` and `DS:3f40`) but the same `+0x10/+0x18` indexing and file open/read/close lane, which is better evidence for a multi-table or multi-phase external loader than for direct in-memory descriptor iteration.
- The signed slot-offset lane used by the still-xref-dark wrappers `0005:2c35` / `0005:2c68` is also no longer confined to `entity_vm_context_create_from_slot_index` (`000d:46ec`). Inside `entity_vm_runtime_create`, the pre-entry body at `000d:4c25..4c90` reloads object fields `+0x32/+0x34` through `entity_vm_slot_load_value_plus_offset` (`000d:5572`), stores that returned pair into object fields `+0x10c/+0x10e`, and also caches the owner-source far pointer at `+0x117/+0x119`. The paired save path at `000d:49ec` then serializes `+0x10c` through seg070 `0009:2034`, which makes the slot-plus-offset pair a persisted runtime/dispatch state lane rather than a transient wrapper-only argument.
- The signed slot-offset lane used by the still-xref-dark wrappers `0005:2c35` / `0005:2c68` is also no longer confined to `entity_vm_context_create_from_slot_index` (`000d:46ec`). Ghidra now reflects that contract in the conservative wrapper names `entity_vm_context_try_create_mask_0400_slot0a_with_offset` and `entity_vm_context_try_create_mask_0800_slot0b_with_offset`. Inside `entity_vm_runtime_create`, the pre-entry body at `000d:4c25..4c90` reloads object fields `+0x32/+0x34` through `entity_vm_slot_load_value_plus_offset` (`000d:5572`), stores the reconstructed `DX:AX` pair into object fields `+0x10c/+0x10e`, and also caches the owner-source far pointer at `+0x117/+0x119`. The paired save path at `000d:49ec` is narrower than it first looked: it serializes only the low word at `+0x10c` through seg070 `0009:2034`, while the high word is recomputed on load from the fresh `entity_vm_slot_load_value()` result plus the saved additive word.
- Current disassembly closes the exact low-slot wrapper contracts too. `0005:2c35` sign-extends caller word `[BP+0x0a]`, then calls `entity_vm_context_try_create_masked_for_entity` with slot `0x0a` and packed mask `0x00000400`; `0005:2c68` is the same signed-additive shim for slot `0x0b` and packed mask `0x00000800`. Neither wrapper has a recovered outward code/data xref yet, so the best current provenance remains `extra-word masked materializer family member`, not a gameplay event label.
- The newly recovered post-load consumers of `+0x10c/+0x10e` are weak and do not behave like a recovered event-dispatch selector. Predicate `FUN_0001_a772` returns true only when the pair is exactly `0000:0001`, while normalization block `FUN_0002_1860` checks `segment == 0` and clamps `offset < 0x0080` up to `0x0080`. No recovered downstream comparison or dispatch branch matches the five verified `NPCTRIG` slot `0x0a` clause starts (`0x0064/0x0093/0x00c2/0x00f1/0x0120`) or backward targets (`0x001f/0x004e/0x007d/0x00ac/0x00db`); if anything, the `0x0080` floor cuts across that family instead of confirming it.
- The masked-create hub in front of that lane is now explicit too. Window `000d:463a..46e8` maps one gameplay entity through `entity_vm_slot_index_from_entity`, tests the owner/resource table row mask at `0x6611 -> +0x1315/+0x1317 -> (+0x10/+0x12) + 0x0d*slot`, and only then calls `entity_vm_context_create_from_slot_index`. That matters because the offset-specialized wrappers `0005:2c35` / `0005:2c68` are now instruction-verified as nothing more than sign-extended extra-word shims over this generic masked-context hub, rather than separate selector logic.
- The upstream slot selector is now exact enough to rule out one remaining binary-side shortcut. `entity_vm_slot_index_from_entity` (`000d:45c5`) does not expose a class-family choice like `NPCTRIG` versus `EVENT`; it only chooses one of three generic category spans before the owner row is consulted: `(a)` entity ids `1..255` with class-word bit `0x0002` clear map to `entity_id + base_0x8c7e`, `(b)` class-nibble `4` objects map to `class_byte_0x7e05 + base_0x8c80`, and `(c)` everything else maps to `type_word_0x7df9 + base_0x8c7c`.
- The runtime init path now shows where those bases come from too. After `entity_vm_runtime_create` succeeds, `entity_vm_runtime_init_from_path_if_configured` (`000d:44df`) seeds `0x8c7c/0x8c7e/0x8c80/0x8c82` as cumulative category bases by looping over four word counts at `0x6608..0x660e`. Because the compiled side only sees those category-base spans and the later owner-row mask words, it still does not reveal a direct descriptor-class discriminator before the slot body is loaded.
- One direct non-hub consumer reinforces that read. `FUN_0005_295f` is the only currently recovered caller of `entity_vm_slot_index_from_entity` outside the masked hub; it recomputes the same slot index, directly tests owner-row bit `0x0040`, and then branches into gameplay handling before optionally calling `entity_vm_context_try_create_masked_for_entity` with mask `0x0040:0x0006`. Together with the still-empty xref results for `0005:2c35` and the stable `0005:2c35..2c57` function boundary, the safest current interpretation is that these owner-row words are generic capability masks, not explicit `NPCTRIG` / `EVENT` family tags.
- The next immortality pass separates that owner-row path from the live control-stream path even more sharply. Inside `entity_vm_context_create_from_slot_index` (`000d:46ec`), the owner-table row still feeds only the preserved `0x39ca[slot]` mirror, while the actual `+0xd6/+0xd8` control stream handed to `entity_vm_context_setup` comes from `entity_vm_slot_load_value_plus_offset` and the caller-supplied setup/tail pointers come from the current VM frame record. That makes the immediate builder for the `000d:21ed` lane `slot-backed decoded stream plus frame-local replay`, not `owner-row decode`.
- That is the current hard wall for the immortality frontier. The strongest verified answer remains that `NPCTRIG` slot `0x0a` is the best upstream descriptor-side fit and `EVENT` slot `0x0a` remains the generic-hub baseline, but the binary selector path now bottoms out at category spans plus row-capability bits rather than at a provable class-family discriminator.
- The open descriptor question therefore moves one step earlier again. Current `000d` loader/runtime evidence still supports a descriptor-derived upstream workspace, but not a direct owner-row-to-opcode path for the immortality trigger. The closest verified compiled-side seeding now happens later inside the hidden dispatcher at `000c:fa2f`, where immediate literal cases can push byte/word/dword payloads straight onto the caller stream before the frame replay family re-materializes them into the child frame.
- The seg070 twin-file-family helper is now tighter at the buffer/schema level as well. The paired loops at `0009:67b6` and `0009:6916` do not reuse one ambiguous scratch object: each loop performs its own size query/allocation sequence, builds paths from the same `+0x10/+0x18/+0x14` table trio with its own format string (`DS:3f2d` versus `DS:3f40`), feeds a dedicated temporary far buffer through the shared `file_handle_alloc_init_and_open` / `dos_file_seek` / `dos_file_close` trailer, and then frees that loop-local buffer before returning. Current safest read is therefore `two distinct temporary file-family materialization passes inside one owner-resource helper`, not one callback shard reused for both families.
- Additional `0x39ca` consumers are now classified more cleanly. Beyond the already-known static seeds at `000d:7299 -> DS:67f2` and `000d:761c -> DS:6872`, the constructor-like windows at `000d:929a` and `000d:963c` seed rows `DS:68ec` and `DS:68f5` respectively before enabling local timer/dispatch behavior. Those writes behave like dispatch-entry-local static seed rows, not owner-table mirrors. Separately, `FUN_000d_938c` reads temporary dispatch-entry fields `+0x32/+0x34` at `000d:9449..9468` and `000d:9547..9566` only as a wait/poll condition on the scratch-palette (`kind 0x3c`) and current-palette (`kind 0x14`) entries it creates, which further separates active dispatch-entry state from the owner-backed `0x39ca[slot] = {source_off, source_seg}` rows written by `000d:46ec`.
- Safe event-label correlation remains intentionally narrow after this pass. The sampled low slot ids are now concrete, but none of them yet have a verified binary-side behavior match strong enough to promote a ScummVM label like `look`, `use`, or `cachein`.
### Current batch: higher-slot masked wrapper ladder (`0x10..0x14`)
- The gameplay-side masked-wrapper island now extends one verified step past the older `0x0f` frontier. Raw call setup around `0005:3115..322d` shows five higher-slot entries feeding `entity_vm_context_try_create_masked_for_entity` with slot ids `0x10`, `0x11`, `0x12`, `0x13`, and `0x14`.
- The slot `0x10` lane is not yet a clean standalone function object, but the containing body at `0005:3115..3129` is exact enough to classify its call shape: it pushes a zero extra word, slot `0x10`, packed mask `0x00010000`, and the live entity pointer before the far call to `000d:463a`. The preceding guard at `0005:30f2..3113` restricts that path to one class-nibble-`4` lane.
- Four neighboring helpers are now renamed directly in Ghidra from stable function objects:
- `0005:313e` = `entity_vm_context_try_create_mask_00020000_slot11_with_offset`
- `0005:3171` = `entity_vm_context_try_create_mask_00040000_slot12`
- `0005:31da` = `entity_vm_context_try_create_mask_00080000_slot13_with_offset_if_valid_entity`
- `0005:31a0` = `entity_vm_context_try_create_mask_00100000_slot14_with_offset`
- Their payload shapes are now exact from disassembly, not only inferred from decompile:
- slot `0x11` pushes one caller-supplied extra word (`MOVZX EAX,[BP+0xa] ; PUSH EAX`)
- slot `0x12` pushes a fixed zero extra word
- slot `0x13` pushes one sign-extended caller word after the same `0005:2686` / `0005:ffed` entity-validity gate used by the older slot-`0x01` helper
- slot `0x14` pushes one caller-supplied extra word
- This widens the verified owner-slot taxonomy in a USECODE-relevant way: the binary is no longer only distinguishing compact low-slot wrappers like `0x0a`/`0x0b`; it also separates a higher-slot family with mixed `no extra word` versus `signed extra word` call contracts.
- The first outward callers in this higher-slot family are now explicit too. `entity_vm_context_try_create_mask_00040000_slot12` (`0005:3171`) is called at `0005:1776` and `0005:1945`; both callsites are currently trapped in non-function windows, but they are real direct edges into the slot-`0x12` zero-extra-word lane. By contrast, current MCP xrefs still show no direct outward callers for the slot `0x11`, `0x13`, or `0x14` wrappers and still none for the dark slot `0x0a` / `0x0b` pair.
- The persisted-context side of the same lane is now tighter at the field level. `entity_vm_context_save` (`000d:498f`) serializes `+0x11f`, `+0x121`, the derived low word at `+0x10c`, the additive word at `+0x34`, and the `0x80`-byte local buffer at `+0x36/+0x38`; `entity_vm_context_load` (`000d:4a78`) rebuilds the frame pointers, reloads the saved low word as the additive argument to `entity_vm_slot_load_value_plus_offset`, restores `+0x10c/+0x10e`, and refreshes the owner-linked source pair at `+0x117/+0x119`. That strengthens the current read that persistence preserves `(slot, additive_word, derived_low_word)` after selector choice, not the upstream class-family selector itself.
- The external event-name correlation can now be tightened slightly but still stays hint-level only:
- slot `0x12` having no extra word is compatible with the external `justMoved()` zero-argument event label
- slot `0x13` carrying one extra word is compatible with Pentagram's `AvatarStoleSomething(uword)` signature
- slot `0x11` carrying one extra word is compatible with Pentagram's placeholder `func11(sint16)` signature and with ScummVM's unresolved `cast`-side slot only at the broad `one scalar argument` level
- slot `0x14` currently does **not** fit Pentagram's older zero-argument `animGetHit()` signature, so that ordinal should remain slot-numbered on the binary side for now
- Operational consequence for the current VM lane: there is now stronger binary evidence that the masked-context family is organized around slot ordinals with distinct payload shapes, not only around one low-slot trigger subset. That helps the current round-trip IR because it justifies keeping higher ordinals as slot-stable records with payload-shape metadata even when their event labels remain external hints.
- The sequencer-side consumer model is also now preserved directly in Ghidra. Address `000d:22bc` carries a decompiler comment recording it as a sequencer-internal matrix stage: it reads two signed metadata bytes from `+0xd6/+0xd8`, consumes caller-stream words as entity/link ids, repeatedly calls `0008:7d27`, and only pushes back words without bit `0x0400` before jumping to `entity_vm_opcode_finish`.
### Conservative parser rule from this batch
- For current owner-loaded/raw EUSECODE work, keep bytes `8..11` raw and derive event count only with `(raw_u32_at_8_11 - 20) / 6` when divisibility and object-size bounds checks succeed.
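
A minimal sketch of that rule, assuming the dword at bytes `8..11` is little-endian and that "object-size bounds" means the value must not point past the end of the class object. The function name and both assumptions are illustrative, not taken from the extractor.

```python
import struct
from typing import Optional

HEADER_BYTES = 20    # constant subtracted in the rule above
SLOT_ROW_BYTES = 6   # divisor in the rule above


def conservative_event_count(class_bytes: bytes) -> Optional[int]:
    """Return the derived event count, or None when a check fails."""
    if len(class_bytes) < 12:
        return None
    # Assumption: bytes 8..11 hold a little-endian u32.
    (raw_u32,) = struct.unpack_from("<I", class_bytes, 8)
    table_bytes = raw_u32 - HEADER_BYTES
    if table_bytes < 0 or table_bytes % SLOT_ROW_BYTES != 0:
        return None  # divisibility check failed; keep the raw value only
    # Assumption: the bounds check is "the anchor stays inside the object".
    if raw_u32 > len(class_bytes):
        return None
    return table_bytes // SLOT_ROW_BYTES
```

With a raw value of `0x00da` this gives `(218 - 20) / 6 = 33`, that is `0x21`.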


@ -67,7 +67,13 @@ A small helper cluster in the raw `000e:` area implements a fixed-size CRLF reco
- the trigger/object namespace now clearly includes `JELYHACK`, `NPCTRIG`, `CRUZTRIG`, and `TRIGPAD`
- `JELYHACK` / `JELYH2` sit in a local extraction neighborhood beside `SPECIAL`, `TRIGPAD`, `DATALINK`, `HOFFMAN`, `REE_BOOT`, `SURCAMEW`, and `SFXTRIG`, which looks more like a map/object grouping than random table order
- that neighborhood does not make `JELYHACK` itself event-bearing, but it does place it immediately beside multiple event-capable or trigger-adjacent classes (`REE_BOOT`, `SFXTRIG`, `SURCAMEW.eventTrigger`)
- no extracted chunk has yet been tied directly to event `0x410`
- the requested descriptor-family sweep now sharpens the nearby callable-body picture too: `NPCTRIG` is the only requested family here that is both explicitly event-bearing and non-empty in `class_event_index.tsv` (`equip` at slot `0x0a`, plus anonymous slot `0x20`), while `SPECIAL`, `TRIGPAD`, and `REB_PAD` have callable bodies but still look like state/controller or referent-neighbor records rather than direct event carriers
- the new generated `immortality_target_body_scan.md` / `.tsv` report now scans `EVENT`, `NPCTRIG`, `COR_BOOT`, `REE_BOOT`, `SFXTRIG`, `SPECIAL`, and `TRIGPAD` body windows directly for inline little-endian `0x0410`, dword `0x00000410`, and byte-swapped `0x1004`
- that scan found zero literal hits in every currently targeted body, so no extracted target body is yet tied directly to event `0x410` by immediate-value evidence
- the `TELEPAD` slot-`0x20` row with `raw_code_offset = 0x00000410` in `class_event_index.tsv` is now closed as an offset collision, not proof that `TELEPAD` emits gameplay event `0x410`
- the new body scan also narrows the frontier structurally: `EVENT` remains one monolithic slot-`0x0a` body (`8150` bytes), `NPCTRIG` remains the strongest compact trigger frontier with slot `0x0a` (`373` bytes) plus slot `0x20` (`345` bytes), and `_BOOT` slot pairs (`COR_BOOT`/`REE_BOOT`) stay near-template bodies rather than unique immortality emitters
- `SPECIAL` and `TRIGPAD` are now stronger negative controls too: both still have callable bodies, but the new literal scan found no inline `0x410` evidence there either
- the practical blocker is now narrower: the extractor no longer stops at body offsets only, but it still does not decode emitted payload values or bytecode operands inside the surviving `EVENT` slot-`0x0a` and `NPCTRIG` slot-`0x0a` / `0x20` frontier bodies
- one exact `0x410` collision in compiled code is now explained away: `000e:0953` pushes `0x410` into imported `ASYLUM.27` from the animation audio-subframe path immediately after setting the local audio-completion byte at `+0xef1`. Since `ASYLUM.DLL` is the `ASS_*` audio/media library, treat this as a media ordinal/value collision rather than a second gameplay or USECODE event source.
- the present best reading is that `0x410` is likely carried by data relationships between generic event-capable descriptors (`EVENT`, `NPCTRIG`, `SFXTRIG`, etc.) and map/object references rather than by a plain-text script line
- The `000e:` record parser helpers still matter, but they now appear to cover only the text-oriented subset rather than the entire FLX payload. The strongest concrete caller so far is the raw window at `000e:1b9f..1d49`, where `record_table_parse_buffer` is invoked after setup of fields that match the known animation object layout (`+0x117/+0x11b/+0x11f/+0x123`, `+0xeaf/+0xeb1`, `+0x10f/+0x111`). That makes the currently verified `000e:3639` consumer part of the animation-object lane, not a clean standalone EUSECODE loader.
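
The literal scan described in the bullets above can be sketched as follows. This is an illustration only, not the code behind `immortality_target_body_scan`; the function name and the pattern labels are hypothetical.

```python
def scan_event_literals(body: bytes, value: int = 0x0410) -> dict[str, list[int]]:
    """Return body-relative offsets of each encoding of `value`."""
    patterns = {
        "le_word": value.to_bytes(2, "little"),    # bytes 10 04
        "le_dword": value.to_bytes(4, "little"),   # bytes 10 04 00 00
        "swapped_word": value.to_bytes(2, "big"),  # bytes 04 10, i.e. 0x1004 read little-endian
    }
    hits: dict[str, list[int]] = {}
    for label, pattern in patterns.items():
        offsets = []
        pos = body.find(pattern)
        while pos != -1:
            offsets.append(pos)
            pos = body.find(pattern, pos + 1)  # step by one to allow overlapping matches
        hits[label] = offsets
    return hits
```

Every dword hit is also reported as a word hit at the same offset, so a caller that wants distinct findings should de-duplicate. A hit is only immediate-value evidence; as with the `TELEPAD` offset collision, it still needs context before it counts as an event reference.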
@ -198,6 +204,10 @@ The game uses standard RIFF/IFF:
### Unresolved callee
- `000e:ffb0` remains unresolved: its decompilation is garbled by overlapping instructions at `000f:0085/000f:0086`. Current evidence from the `animation_start` loop suggests this path is the video-side subframe loader paired with `anim_load_audio_frame`.
- The caller-side proof is now explicit enough to preserve that note in Ghidra too: `animation_start` invokes `anim_load_video_frame_wrapper` once per active subframe immediately after `anim_load_audio_frame`, and `anim_load_video_frame_wrapper` is only a thin forwarder to `000e:ffb0`. Until the overlap is repaired, the safest label remains `unresolved video-side subframe loader paired with the resolved audio-frame path`.
- A second caller pass tightens the local model without forcing a repair. `search_instructions` now shows `anim_load_video_frame_wrapper` is also called at `000e:11af` and `000e:1245`, not only from the startup prime loop at `000e:220c`. In both of those additional windows the return value is checked as a success/failure result, which makes `000e:ffb0` look like an active chunk-consume/decode step rather than a passive notifier.
- The strongest new evidence is the neighboring tag gate at `000e:121d..1234`: after `anim_load_audio_frame` runs, the same lane checks the current RIFF chunk tag against `0x62643030` / `0x63643030` (`"00db"` / `"00dc"`) before clearing the local busy flag and continuing. That is the first concrete caller-side clue that `000e:ffb0` is consuming AVI video-frame chunk types rather than some unrelated animation-side bookkeeping path.
- Boundary analysis still reports one overlapped function object `FUN_000e_ffb0 @ 000e:ffb0 body 000e:ffb0 - 000f:00e0`, so the function remains comment-only for now. The useful gain is semantic: the unresolved body is now best described as `video-side subframe loader/decoder for the 00db/00dc chunk lane, paired with anim_load_audio_frame`.
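
As a quick arithmetic check on the tag gate, the two immediates are the ASCII tags read as little-endian dwords. The helper name below is hypothetical.

```python
import struct


def fourcc_as_le_u32(tag: str) -> int:
    """Interpret a four-character chunk tag as a little-endian u32."""
    data = tag.encode("ascii")
    if len(data) != 4:
        raise ValueError("chunk tags are exactly four bytes")
    return struct.unpack("<I", data)[0]


# The two immediates compared at 000e:121d..1234.
assert fourcc_as_le_u32("00db") == 0x62643030
assert fourcc_as_le_u32("00dc") == 0x63643030
```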
### Constructor pattern


@ -0,0 +1,205 @@
# Pentagram-Derived USECODE Parser And Ghidra Path
## Purpose
This note turns the earlier feasibility assessment into a concrete workflow.
The goal is not to make Ghidra decompile Crusader USECODE as if it were x86 immediately. The goal is to build one trustworthy bridge layer first:
- reuse Pentagram's Crusader opcode decoding where it is still valid
- replace Pentagram's older Crusader container/header assumptions with the owner-loaded class and slot model already verified in the binary and extractor
- emit a lossless IR that can drive both human-readable USECODE output and future Ghidra annotations
## What To Reuse From Pentagram
Useful directly:
- the opcode tokenization model from `convert/Convert.h`
- the disassembly-oriented mnemonic layout from `tools/disasm/Disasm.cpp`
- the Crusader event ordinal table from `convert/crusader/ConvertUsecodeCrusader.h`
Useful only as hints:
- intrinsic names and signatures
- old event-name labels for still-unresolved higher ordinals
Not safe to reuse unchanged:
- Pentagram's Crusader header reader
- any assumption that its old `maxOffset` / `externTable` / `fixupTable` structure matches the owner-loaded EUSECODE class bodies now validated in the extractor and DOS binary
- the partial Node-based decompiler path as if it were a general Crusader decompiler
## Verified Local Model To Use Instead
The proof-of-concept parser should be grounded in the existing local artifacts, not in Pentagram's old header logic.
Current authoritative inputs:
- `USECODE/EUSECODE_extracted/class_layout_index.tsv`
- `USECODE/EUSECODE_extracted/class_event_index.tsv`
- `USECODE/EUSECODE_extracted/chunks/`
Current authoritative facts:
- owner-loaded class object index is `class_id + 2`
- class bytes `8..11` provide the code-base anchor already carried in `class_layout_index.tsv`
- slot rows are 6-byte records: `u16 raw_event_entry_word + u32 raw_code_offset`
- slot body windows are already emitted conservatively as `derived_body_start`, `derived_body_end`, and `derived_body_length`
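
A minimal sketch of the row layout above. The field order and the `class_id + 2` rule come from the facts listed; the little-endian packing and the table start at byte 20 are assumptions (the latter inferred from the event-count rule), and all names are illustrative.

```python
import struct
from typing import NamedTuple


class SlotRow(NamedTuple):
    slot: int
    raw_event_entry_word: int  # u16
    raw_code_offset: int       # u32


def owner_object_index(class_id: int) -> int:
    """Owner-loaded class object index is class_id + 2."""
    return class_id + 2


def parse_slot_rows(class_bytes: bytes, count: int, table_start: int = 20) -> list[SlotRow]:
    """Read `count` six-byte rows; table_start=20 is an assumption."""
    rows = []
    for slot in range(count):
        # "<HI" is an unpadded little-endian u16 followed by a u32 (6 bytes).
        word, offset = struct.unpack_from("<HI", class_bytes, table_start + slot * 6)
        rows.append(SlotRow(slot, word, offset))
    return rows
```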
## End-To-End Process
### 1. Start from extracted owner-loaded artifacts
The parser should not reopen `EUSECODE.FLX` directly for the proof of concept. The extractor has already normalized the class and slot selection step.
Inputs:
- one row from `class_layout_index.tsv`
- one row from `class_event_index.tsv`
- the corresponding chunk file under `USECODE/EUSECODE_extracted/chunks/`
### 2. Select one body window conservatively
For a chosen class and slot:
- locate `entry_index`
- confirm `derived_body_start` and `derived_body_end`
- slice the chunk-local body bytes exactly from that range
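
The selection step might look like the sketch below. It assumes the TSV cells parse with `int(text, 0)` (for example `0x00da` or `373`) and treats `derived_body_end` as exclusive, since `0x024f - 0x00da = 373` matches the recorded `NPCTRIG` slot `0x0a` length. The function name is hypothetical.

```python
def slice_body(chunk: bytes, row: dict[str, str]) -> bytes:
    """Slice one slot body out of a chunk using a class_event_index.tsv row."""
    start = int(row["derived_body_start"], 0)
    end = int(row["derived_body_end"], 0)  # treated as exclusive
    if not 0 <= start <= end <= len(chunk):
        raise ValueError(f"body window {start:#x}..{end:#x} falls outside the chunk")
    length_text = row.get("derived_body_length")
    if length_text and int(length_text, 0) != end - start:
        raise ValueError("derived_body_length disagrees with the start/end window")
    return chunk[start:end]
```

Failing loudly on a mismatch keeps the "confirm" step explicit instead of silently trusting one column over another.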
### 3. Decode opcodes with Pentagram-derived operand formats
Use Pentagram's operand-width model as the first parser source of truth.
For the proof of concept, keep decoding conservative:
- parse the op exactly when the operand format is understood
- keep the raw bytes for every parsed op
- stop cleanly on an unknown opcode and preserve the remaining tail bytes
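
The three rules above can be sketched as a small decode loop. The operand-width table here is a two-entry illustrative subset, not Pentagram's table: `0x5a` with one operand byte follows the `5a06` sample in the IR notes, and treating `0x7a` as an operand-less end opcode is an assumption. The `end_reason` values other than `end_opcode` are hypothetical labels.

```python
# Illustrative subset only; a real table would come from Pentagram's operand formats.
OPERAND_BYTES = {0x5A: 1, 0x7A: 0}
MNEMONICS = {0x5A: "init", 0x7A: "end"}
END_OPCODE = 0x7A


def decode_body(body: bytes) -> dict:
    """Decode until the end opcode, an unknown opcode, or a truncated operand."""
    ops = []
    pos = 0
    end_reason = "end_of_body"
    while pos < len(body):
        opcode = body[pos]
        width = OPERAND_BYTES.get(opcode)
        if width is None:
            end_reason = "unknown_opcode"
            break
        if pos + 1 + width > len(body):
            end_reason = "truncated_operand"
            break
        raw = body[pos:pos + 1 + width]
        ops.append({
            "offset": pos,
            "opcode": opcode,
            "mnemonic": MNEMONICS[opcode],
            "raw_bytes": raw.hex(),  # every parsed op keeps its exact bytes
        })
        pos += len(raw)
        if opcode == END_OPCODE:
            end_reason = "end_opcode"
            break
    # Whatever was not consumed is preserved verbatim, never dropped.
    return {"end_reason": end_reason, "ops": ops, "unknown_trailing_bytes": body[pos:].hex()}
```

Concatenating every `raw_bytes` value with `unknown_trailing_bytes` reproduces the input body, which is the property that keeps the output lossless.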
### 4. Emit canonical IR v1
The parser output should be one machine-friendly object that includes:
- source artifact metadata
- class metadata
- slot/event metadata
- exact op list with raw bytes
- annotation hints for compiled-side VM anchors
### 5. Feed Ghidra with annotations, not with fake code yet
The first Ghidra-side use should be comments, bookmarks, and cross-reference notes on the compiled VM functions.
Do not try to map the bytecode into a full processor module first.
## Proof-Of-Concept Parser
Tool path:
- `tools/poc_crusader_usecode_parser.py`
Current scope:
- uses the extracted TSV and chunk artifacts already in the repo
- disassembles one selected class/slot body at a time
- emits canonical IR JSON
- optionally emits a readable text listing beside the JSON
Current deliberate limits:
- no full intrinsic name table yet
- no synthetic control-flow graph yet
- no recompilation path yet
- no Ghidra importer yet
That keeps the parser useful without pretending the VM is fully solved.
## Canonical Ghidra Annotation Import Path
The first importer should consume the parser IR and create only three kinds of output.
### 1. Bookmarks
Use bookmarks for class/slot-level evidence that should not be hidden inside comments.
Good first bookmark payloads:
- `NPCTRIG slot 0x0A body parsed by POC tool`
- `EVENT slot 0x0A body parsed by POC tool`
- `slot 0x13 payload-shape hint = signed_word`
### 2. Plate or decompiler comments on compiled anchors
Use comments on the compiled runtime functions that already consume or materialize the USECODE bodies.
Best current anchors:
- `000d:51fd` = slot value load path
- `000d:5572` = slot value plus additive word
- `000d:46ec` = context create from slot index
- `000d:22bc` = decoded matrix/pushback consumer
- `000d:ebe3` = opcode sequence runner
Comment payload should stay short and evidence-heavy, for example:
`POC USECODE body anchor: NPCTRIG slot 0x0A -> body 0x00DA..0x024F, raw word 0x013E, payload shape unresolved, parsed via tools/poc_crusader_usecode_parser.py`
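
A minimal sketch of a formatter that reproduces the example payload above from IR fields. The function name and parameter names are hypothetical; only the output format is taken from the example.

```python
from typing import Optional


def format_anchor_comment(
    class_name: str,
    slot: int,
    body_start: int,
    body_end: int,
    raw_word: int,
    payload_shape: Optional[str] = None,
    tool: str = "tools/poc_crusader_usecode_parser.py",
) -> str:
    """Build one short, evidence-heavy comment line for a compiled anchor."""
    shape = payload_shape or "unresolved"
    return (
        f"POC USECODE body anchor: {class_name} slot 0x{slot:02X} -> "
        f"body 0x{body_start:04X}..0x{body_end:04X}, raw word 0x{raw_word:04X}, "
        f"payload shape {shape}, parsed via {tool}"
    )


print(format_anchor_comment("NPCTRIG", 0x0A, 0x00DA, 0x024F, 0x013E))
```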
### 3. Optional comment bundles per runtime family
If a later importer wants to annotate more than one function at once, keep it grouped by runtime family instead of by class name.
Examples:
- `slot-backed-owner-loaded-body`
- `slot-plus-offset-value-reload`
- `sequencer-matrix-consumer`
- `literal-replay-interpreter-upstream`
## Why Not A Ghidra Processor Yet
The missing pieces are still too important:
- full opcode semantics are incomplete
- stack and return discipline are incomplete
- the relation between owner-loaded body bytes and the later `000c:fa2f` literal/replay lane is still not closed end-to-end
- the upstream selector into `entity_vm_opcode_sequence_run` is still unresolved
So the right order is:
1. parser
2. IR
3. annotation import
4. only then reconsider a language module
## User Workflow
Run the proof-of-concept parser from the repo root.
Example:
```powershell
c:/Users/Maddo/.PYENV/PYENV-WIN/versions/3.14.3/python.exe tools/poc_crusader_usecode_parser.py --class NPCTRIG --slot 0x0A --emit-text
```
Recommended first targets:
1. `NPCTRIG` slot `0x0A`
2. `NPCTRIG` slot `0x20`
3. `EVENT` slot `0x0A`
4. one `_BOOT` slot `0x10` body as a short repeated-template control sample
What to look for in the output:
- exact raw body window
- whether the body terminates cleanly at opcode `0x7A`
- body-local call targets and global-address ops
- repeated structural motifs that can be carried back into the VM notes
- anchor hints for the compiled runtime functions
## Next Extensions
1. Add the full Crusader intrinsic-name table from Pentagram as hint-only metadata.
2. Emit repeated-body family diffs directly from the parser instead of only from the extractor reports.
3. Add a small importer that converts `annotation_hints` into Ghidra comments and bookmarks.
4. Extend the IR with control-flow edges only after branch/jump confidence is high enough.
5. Tie parser output back to the current slot/additive runtime tuples used in the compiled VM lane.


@ -359,6 +359,135 @@ class:
confidence: authoritative-bytes, hinted-label
```
## IR v1 Parser Schema
The next tooling step changes the role of this document slightly. IR v0 was a note-level target for reversible human-readable output. IR v1 is the canonical machine-facing schema for the Pentagram-derived proof-of-concept parser and any future Ghidra annotation bridge.
The design constraints are now explicit:
- keep every authoritative owner-loaded byte visible
- keep slot identity separate from semantic name hints
- keep runtime-facing metadata visible even when the body decompiler cannot yet explain it
- preserve enough structure to emit Ghidra comments and bookmarks later without reparsing prose notes
### Top-level IR object
```yaml
schema_version: crusader-usecode-ir-v1-poc
source:
  flex_path: USECODE/EUSECODE.FLX
  extracted_root: USECODE/EUSECODE_extracted
  chunk_file: USECODE/EUSECODE_extracted/chunks/chunk_191_table_1BA8_off_04C347_len_0003A8.bin
class:
  entry_index: 191
  object_index: 0x365
  class_id: 0x363
  class_name: NPCTRIG
  raw_code_base_u32: 0x00da
  code_base_minus_one: 0x00d9
  conservative_event_count: 0x21
event:
  slot: 0x0a
  event_name_hint: equip
  raw_event_entry_word: 0x013e
  raw_code_offset: 0x00000001
  derived_body_start: 0x00da
  derived_body_end: 0x024f
  derived_body_length: 373
  repeated_template_status: ""
body:
  end_reason: end_opcode
  raw_body_sha1: <digest>
  unknown_trailing_bytes: ""
ops:
  - offset: 0x0000
    absolute_body_offset: 0x00da
    opcode: 0x5a
    mnemonic: init
    raw_bytes: 5a06
    operands:
      local_bytes: 0x06
  - offset: 0x0011
    absolute_body_offset: 0x00eb
    opcode: 0x40
    mnemonic: push_local_dword
    raw_bytes: 40064c02
    operands:
      bp_offset: 0x06
annotation_hints:
  runtime_family: slot-backed-owner-loaded-body
  compiled_anchors:
    - 000d:51fd
    - 000d:5572
    - 000d:46ec
    - 000d:ebe3
```
### Required fields
`source` keeps the specific extracted artifact path so the parser output can always be checked against the raw chunk bytes.
`class` keeps the owner-loaded identity and header math already validated in the binary.
`event` keeps the exact six-byte row, split into its authoritative fields, together with the derived body window.
`body` records how far the parser got and whether any bytes remain undecoded or trailing.
`ops` is intentionally lossless. Each decoded op keeps:
- body-relative offset
- absolute chunk-local offset
- raw opcode byte
- mnemonic
- exact raw bytes for the whole op
- parsed operands as typed fields
`annotation_hints` is the bridge to Ghidra. It is not a source-language feature. It exists so a later importer can attach the right comments and bookmarks to the compiled VM/runtime addresses without trying to infer them from free text.
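
A consumer could sanity-check these fields before trusting an IR file. The sketch below assumes the JSON has already been loaded with numeric fields as integers and that `ops` sits at the top level beside `body`; both are assumptions about the emitted shape, and `check_ir` is a hypothetical name.

```python
REQUIRED_KEYS = ("source", "class", "event", "body", "ops", "annotation_hints")


def check_ir(ir: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks consistent."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in ir]
    if problems:
        return problems
    body_start = ir["event"]["derived_body_start"]
    for index, op in enumerate(ir["ops"]):
        raw = bytes.fromhex(op["raw_bytes"])
        # The first raw byte must be the opcode the record claims.
        if not raw or raw[0] != op["opcode"]:
            problems.append(f"op {index}: raw_bytes do not start with the opcode")
        # The absolute offset is the body-relative offset rebased onto the body start.
        if op["absolute_body_offset"] != body_start + op["offset"]:
            problems.append(f"op {index}: absolute_body_offset mismatch")
    return problems
```

For the sample record, the second op passes because `0x00da + 0x0011 = 0x00eb`.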
### Opcode result policy
The parser should use four result classes only:
- `decoded_op`: normal parsed opcode with structured operands
- `unknown_opcode`: one-byte opcode not yet modeled; stop or fall back conservatively
- `raw_tail`: remaining undecoded bytes after a stop condition
- `debug_blob`: symbol/debug tail such as `0x5c`-anchored metadata
That keeps the IR trustworthy even before the whole Crusader VM is modeled.
### Call-site hint policy
For `call` and `spawn`-family ops, the parser may attach:
- `target_class_id`
- `target_event_slot`
- `target_event_name_hint`
It should not attach a stronger semantic claim than that. The body parser is class/event aware, but not yet authoritative about gameplay meaning.
### Annotation-hint schema
The Ghidra bridge should consume only small, stable items:
```yaml
annotation_hints:
  runtime_family: slot-backed-owner-loaded-body
  payload_shape_hint: signed_word
  compiled_anchors:
    - address: 000d:51fd
      role: slot_value_loader
    - address: 000d:5572
      role: slot_value_plus_offset
    - address: 000d:46ec
      role: context_create_from_slot
    - address: 000d:ebe3
      role: opcode_sequence_run
    - address: 000d:22bc
      role: matrix_pushback_stage
```
This is deliberately smaller than a full import format. It keeps the parser reusable even if the first Ghidra-side importer is only a comment/bookmark script.
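
One way a first importer could consume this schema is to turn the hints into a dry-run plan of `(address, comment)` pairs and leave the Ghidra write to a separate step. The sketch below makes no Ghidra calls; the function names and the comment layout are hypothetical. It accepts both the bare-string anchors shown in the top-level example and the `address`/`role` form shown here.

```python
def parse_seg_off(text: str) -> tuple[int, int]:
    """Parse a 'ssss:oooo' hex address; raises ValueError when malformed."""
    segment, offset = text.split(":")
    return int(segment, 16), int(offset, 16)


def plan_comments(hints: dict, body_label: str) -> list[tuple[str, str]]:
    """Build a dry-run list of (address, comment) pairs from annotation_hints."""
    family = hints.get("runtime_family", "unknown-family")
    plan = []
    for anchor in hints.get("compiled_anchors", []):
        if isinstance(anchor, str):
            address, role = anchor, "unspecified"
        else:
            address, role = anchor["address"], anchor.get("role", "unspecified")
        parse_seg_off(address)  # validate before anything is planned for this anchor
        plan.append((address, f"[{family}] {role}: {body_label}"))
    return plan
```

Keeping the plan as plain data means the same output can be printed for review or handed to whatever comment-setting call the importer ends up using.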
That is already a real decompilation output. It keeps the exact slot id, the exact six-byte row contents, and the exact class-header facts, while refusing to pretend that `use` is already a proven semantic name for this class.
Here is the same style for one active event-bearing attachment class in the same island:
@ -543,6 +672,43 @@ vm_effect_possible:
That operator block is authoritative as a recovered VM vocabulary, but only ecosystem-level when attached to one specific descriptor family.
### Binary-side slot and payload-shape evidence to preserve in IR
The current VM pass also adds one useful binary-side rule for the higher event ordinals: the compiled wrapper family distinguishes slot identity from payload shape, and that distinction should survive in any round-trip IR even when the human label stays unresolved.
Verified current ladder around `0005:3115..31da`:
- slot `0x10`: guarded callsite only, zero extra word, packed mask `0x00010000`
- slot `0x11`: named wrapper `entity_vm_context_try_create_mask_00020000_slot11_with_offset`, one caller-supplied extra word
- slot `0x12`: named wrapper `entity_vm_context_try_create_mask_00040000_slot12`, zero extra word
- slot `0x13`: named wrapper `entity_vm_context_try_create_mask_00080000_slot13_with_offset_if_valid_entity`, one sign-extended extra word after an entity-validity gate
- slot `0x14`: named wrapper `entity_vm_context_try_create_mask_00100000_slot14_with_offset`, one caller-supplied extra word
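
The ladder can be restated as data. In the sketch below the masks and slot ids are copied from the list above, the shape labels reuse the `none` / `word` / `signed_word` vocabulary, and the table name is hypothetical. The check at the end records an observation that holds for these five entries only: each packed mask equals `1 << slot`. It is not claimed as a verified rule for other slots.

```python
# slot id -> (packed mask, payload shape label)
SLOT_LADDER = {
    0x10: (0x00010000, "none"),
    0x11: (0x00020000, "word"),
    0x12: (0x00040000, "none"),
    0x13: (0x00080000, "signed_word"),
    0x14: (0x00100000, "word"),
}


def mask_is_slot_bit(slot: int, mask: int) -> bool:
    """True when the packed mask is exactly the single bit numbered by the slot."""
    return mask == 1 << slot


# Holds for the five recovered wrappers; untested beyond them.
assert all(mask_is_slot_bit(slot, mask) for slot, (mask, _shape) in SLOT_LADDER.items())
```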
Why this matters for the IR:
- It is direct binary evidence that some higher Crusader slot ordinals are already grouped by argument shape before any descriptor-family mapping is proven.
- That means the IR should preserve `slot_id` plus `payload_shape` independently instead of collapsing everything into one guessed event-name table.
- It also gives a bounded way to cross-check external event signatures without over-trusting them: slot `0x12` fits a zero-arg event shape, slot `0x13` fits a one-word event shape, and slot `0x14` currently conflicts with Pentagram's older zero-arg `animGetHit()` note.
Practical annotation rule to adopt now:
- keep higher-slot labels binary-stable as `slot 0x10` .. `slot 0x14` unless local behavior closes the label
- attach external event names only as hints
- attach one small `payload_shape_hint` field such as `none`, `word`, or `signed_word`
Minimal hinted example:
```yaml
slot_record:
  slot_id: 0x13
  event_name_hint: avatarStoleSomething
  payload_shape_hint: signed_word
  binary_anchor: 0005:31da
  wrapper_name: entity_vm_context_try_create_mask_00080000_slot13_with_offset_if_valid_entity
```
The same pass also hardens one existing IR operator boundary: the `000d:22bc` stage is now comment-backed in Ghidra as a matrix/pushback consumer over decoded workspace bytes, not a direct descriptor-row reader. The current safe attachment point is therefore still `decoded VM workspace -> link-matrix stage`, not `NPCTRIG row -> direct entity-link emission`.
## Conservative Parser Rule To Adopt Now
For the current owner-loaded EUSECODE and round-trip IR work, the safest reversible rule is: