Add detailed class event processing and family comparison tools

- Enhance `extract_eusecode_flx.py` to derive class event rows with additional metadata including derived body windows and repeated template statuses.
- Introduce `usecode_family_compare.py` for comparing event families, analyzing commonalities in event bodies, and generating reports on identical groups and differences.
- Implement new data structures for managing class event rows and family artifact specifications.
- Update output formats to include derived body information and repeated family regression checks.
- Ensure robust validation of repeated family expectations against actual extracted data.
This commit is contained in:
MaddoScientisto 2026-03-22 23:24:46 +01:00
commit 4d3c8cd81b
23 changed files with 15033 additions and 14221 deletions

View file

@ -198,7 +198,7 @@ Second sweep through `000c` adjacent helpers — gated thunk wrappers and input/
| `000c:9f74` | `entity_state_flag100_check_and_dispatch` | Init latch guard at `[0x6053]`; clears `[0x8c55]` on first call; checks `[ptr+0x5b]` bits `0x100` and `0x40`; three-path thunk dispatch |
| `000c:a1ad` | `entity_state_clear_flag40_and_dispatch` | Skips if `[ptr+0x5b]` has `0x180` bits set; if `0x40` set, clears it and calls far ptr at `[0x5e82/0x5e84]`; then dispatches twice with args `(0x0b,0x10,0x1,0x0)` (record/state-key pattern) |
| `000c:a74e` | `entity_state_dispatch_if_flag_bit2` | Tests `[ptr+0x5b]` bit `0x2`; if set pushes extra arg + ptr and dispatches via thunk |
| `000c:84c3` | `entity_state_set_byte40_at_global_ptr` | Sets byte `[DAT_0000_6828 ptr + 0x40] = 1` then calls thunk unconditionally; enables global entity flag |
| `000c:84c3` | `entity_state_set_byte40_at_global_ptr` | Sets byte `[g_active_dispatch_entry_farptr + 0x40] = 1` then calls thunk unconditionally; current evidence treats this as raising the shared active-entry transition/display hold byte rather than toggling an unrelated global |
| `000c:ac55` | `entity_state_fire_if_handle_valid` | Guard: fires thunk dispatch only when `[0x6054] != -1`; no-op otherwise |
| `000c:ac6d` | `entity_state_fire_with_args_if_handle_valid` | 3-arg variant: pushes `[BP+0xe]` (byte), `[BP+0xc]`, `[BP+0xa]`, handle `[0x6054]`, then `CALLF 0000:ffff` |
| `000c:afa5` | `entity_state_check_field49_and_call_vfunc3c` | Checks field `[ptr+0x49]`: 1→reset to 0 return 1; 2→call `vtable[0x3c]` return 0; else thunk dispatch |
@ -214,7 +214,7 @@ Second sweep through `000c` adjacent helpers — gated thunk wrappers and input/
- `field49` = state-sequence index; 0=reset, 2=vtable callback, 4=triggered end
- `field47` = keystroke-combo counter
- `field3f` = linked data pointer (event/record reference)
- `[0x6054]` = current entity handle; `[0x6828]` = another global entity far pointer
- `[0x6054]` = current entity handle; `[0x6828]` = `g_active_dispatch_entry_farptr`, the shared active-dispatch entry owner whose byte `+0x40` is reused across the startup/display lane as a hold/busy token
- Bits in `[ptr+0x5b]`: `0x1=init`, `0x2=active/event`, `0x40=pending dispatch`, `0x100=flag100`, `0x180=skip-all mask`
---
@ -263,7 +263,8 @@ Globals used: `[0x6312]`=start index, `[0x6314]`=count, `[0x630e]`=palette src p
- `entity_vm_context_save` / `entity_vm_context_load` / `entity_vm_context_destroy` / `entity_vm_context_free_buffer` (`000d:498f`, `000d:4a78`, `000d:4962`, `000d:48b6`) now pin down the lifecycle of this object family rather than leaving the whole `000d:45xx..4exx` island anonymous
- `entity_vm_context_try_create_masked_for_entity` is now better constrained at the return-value level too: after the runtime-disable check at `0x6610` and the owner-side slot-mask test succeed, it reports two distinct success shapes. Immediate-flagged contexts (`+0x16 & 0x0008`) clear the caller output word, while object-backed contexts return the created object's low word. That makes the helper a typed bridge from gameplay entities into VM-backed object results, not only a yes/no mask probe.
- `entity_vm_runtime_owner_resource_create` (`000d:7000`) is now one step tighter too: the embedded seg069/070 helper is file-backed rather than abstract. Construction starts with `dos_file_handle_init` (`0009:1c00`), then uses helper vtable slot `+0x04` as the size query that drives the child `+0x10/+0x12` allocation and helper vtable slot `+0x0c` as the table-population callback for the `0x0d`-stride owner table.
- That file-backed helper is now tighter one step deeper as well. The seg070 loops rooted at raw windows `0009:67b6` and `0009:6916` walk helper-owned record arrays at object `+0x10/+0x18`, format per-entry paths through the seg001 string helpers (`0003:e4d3` / `0003:e590`), then open, read, and close each file through `file_handle_alloc_init_and_open` (`0009:1c3a`), `dos_file_seek` (`0009:2034`), and `dos_file_close` (`0009:1e61`). That is strong evidence that `000d:7000` seeds the owner table from an indexed external file set rather than by copying one monolithic in-memory descriptor blob.
- That file-backed helper is now tighter one step deeper as well. The seg070 loops rooted at raw windows `0009:67b6` and `0009:6916` walk helper-owned record arrays at object `+0x10/+0x18`, format per-entry paths through the seg001 string helpers (`0003:e4d3` / `0003:e590`), then open, read, and close each file through `file_handle_alloc_init_and_open` (`0009:1c3a`), `dos_file_seek` (`0009:2034`), and `dos_file_close` (`0009:1e61`). The paired `+0x18` entries are consumed as 16-bit ids passed into those path-format loops beside the far-pointer path table at `+0x10`; no object-1 or `classid + 2` arithmetic appears there, so the safest current read is slot-local file ids rather than exposed original class/object indices. That is strong evidence that `000d:7000` seeds the owner table from an indexed external file set rather than by copying one monolithic in-memory descriptor blob.
- A final loader-side tightening from the current pass is that `0009:67b6` and `0009:6916` now read as twin entry walkers rather than one isolated path-format callback. Both windows iterate the helper-owned count at `+0x14`, index the far-pointer path table at `+0x10` and paired 16-bit id table at `+0x18`, check the source path through `0003:e669`, build formatted paths with distinct local format strings (`DS:3f2d` vs `DS:3f40`), and then reach the same file open/read/close lane. The remaining open question is not whether they are file-backed, but whether they represent two file families, two record templates, or two load phases inside the same helper class.
- The caller-side bootstrap for that helper is now anchored too: `entity_vm_runtime_init_from_path_if_configured` (`000d:44df`) first checks the configured byte/string global at `0x65a`, builds a path through seg072 helper `0009:3600` using globals `0x6d6:0x6d8` plus `0x65a`, validates that path through `000a:500a`, then calls `entity_vm_runtime_create(0,0,path)`. This is the first verified source-argument path for `entity_vm_runtime_owner_resource_create`, and it strongly suggests the owner/resource table is loaded from an external configured file rather than from a purely in-memory descriptor blob.
- Seg072 helper `0009:3600` is now classified more tightly as a rotating slash-aware path composer rather than a generic buffer advance helper. Its prologue cycles through five `0x50`-byte temp buffers, and its inner cases append optional string parts while inserting `\` only when adjacent path components need a separator. That narrows the two globals used by `000d:44df`: `0x65a` behaves as the configured relative runtime-owner filename/path component, while `0x6d6:0x6d8` behaves as the mutable base/resource-root path buffer that gets joined with `0x65a` before `000a:500a` validation.
- The two still-xref-dark wrappers `0005:2c35` and `0005:2c68` are also narrower now. Their signed extra word does not participate in owner-mask selection inside `entity_vm_context_try_create_masked_for_entity`; it is forwarded into `entity_vm_context_create_from_slot_index`, stored in context field `+0x34`, and passed on to `entity_vm_slot_load_value_plus_offset`. The best current reading is therefore `offset-specialized masked context creation`, not a separate direct selector lane.
@ -289,7 +290,8 @@ Globals used: `[0x6312]`=start index, `[0x6314]`=count, `[0x630e]`=palette src p
- `entity_vm_state_copy` (`000c:f772`) copies that same `+0xcc..+0xd2` stream/base quartet verbatim when one mini-VM object is cloned.
- Upstream of the setup helper, `000d:46ec` derives the source payload from the runtime owner table behind `0x6611 -> +0x1315/+0x1317`: with slot index `SI`, it walks owner table `*(owner+0x10/+0x12) + 0x0d*SI + 4`, passes that far pointer into `000c:f844`, and mirrors the resulting per-slot source into `0x39ca[slot]`.
- This sharpens the current JELYHACK-side model rather than overturning it: the code-side producer recovered in this batch is still a generic slot-backed VM source object keyed by gameplay-entity slot selection and owner-side mask bits, not a direct hard-coded descriptor-class switch on `JELYHACK` or `JELYH2`. Combined with the extractor evidence that `JELYHACK` / `JELYH2` remain referent-only while `REE_BOOT` / `SFXTRIG` keep active `event` tags and `SURCAMEW` keeps `eventTrigger`, the better fit is still `referent anchor -> slot-backed payload chain -> neighboring event-bearing attachment`.
- The `0x39ca` mirror question is narrower now too. Fresh windows at `0008:709c/70cb`, `0008:7309/7338`, and `0008:85f9/8617` show only global base-pointer save/restore and allocation/zeroing of the `0x39ca:0x39cc` table itself. The only verified per-slot row writer in this lane remains `entity_vm_context_create_from_slot_index` (`000d:46ec`), which writes `0x39ca[context_slot] = {source_off, source_seg}` after it derives the slot-backed payload source.
- The `0x39ca` mirror question is now split more cleanly. Fresh windows at `0008:709c/70cb`, `0008:7309/7338`, and `0008:85f9/8617` still show only global base-pointer save/restore and allocation/zeroing of the `0x39ca:0x39cc` table itself, but two additional per-slot row writers are now verified in `000d`: `FUN_000d_7299` writes static source `DS:67f2` to `0x39ca[obj+2]` after creating a `0x44`-byte object, and `active_dispatch_entry_create_default` (`000d:761c`) writes static source `DS:6872` to `0x39ca[obj+2]` for the default active dispatch entry. `entity_vm_context_create_from_slot_index` (`000d:46ec`) remains the only confirmed owner-table-derived writer, but it is no longer the only concrete row writer overall.
- The current pass narrows that split one step further. `entity_vm_context_create_from_slot_index` (`000d:46ec`) still derives its row from runtime owner table `(+0x10/+0x12) + 0x0d*slot + 4` before mirroring it into `0x39ca[slot]`, while `000d:7299` and `000d:761c` never touch the owner table at all in the verified windows. Instead they allocate local dispatch-entry-style objects, derive the row index from object field `+0x2`, and seed `0x39ca[row]` from fixed static sources `DS:67f2` and `DS:6872`. The safest current interpretation is therefore `owner-backed VM source mirror` versus `dispatch-entry-local static seed rows`, not three competing writers to the same semantic lane.
- One exact numeric collision is now ruled out as unrelated noise rather than a second VM source: `000e:0953` in the animation/audio lane pushes literal `0x410` into imported `ASYLUM.27` immediately after setting the local audio-completion byte at `+0xef1`. Because `ASYLUM.DLL` is the `ASS_*` audio/media library, this does not weaken the attribution of gameplay event `0x410` to the `000d` VM/USECODE lane.
- Current best JELYHACK reading after this pass: `JELYHACK` itself still looks like a referent-only map/object descriptor, but that no longer makes it inert. A referent-only record can still matter by supplying the referent id that populates the VM referent registry, while neighboring classes such as `REE_BOOT`, `SURCAMEW`, and `SFXTRIG` supply the event-bearing logic attached to the same local object island.
@ -337,8 +339,9 @@ Conservative case identity mapping from this pass:
Still unresolved after this pass:
- Direct CALL xrefs into `FUN_000d_ebe3` are now confirmed from `animation_ctor_variant_a/b/c` at `000e:283e`, `000e:2931`, and `000e:29e4`, so the entry is no longer globally xref-dark.
- Those constructor callsites still do not expose a new concrete wrapper-level opcode number or the direct write/read path for `[BP-0x32]`; no additional opcode id can yet be assigned uniquely beyond the internal `0x19/0x1a/0x1b` family already proven inside `000d:0988`.
- The animation constructor near calls at `000e:283e`, `000e:2931`, and `000e:29e4` land on a separate mis-split `000e:ebe3` region, not on `FUN_000d_ebe3`. They therefore no longer count as direct xref evidence for the `000d` dispatcher.
- The true upstream selector/write path for `[BP-0x32]` in `FUN_000d_ebe3` is still unresolved, and no additional opcode id can yet be assigned uniquely beyond the internal `0x19/0x1a/0x1b` family already proven inside `000d:0988`.
- Repeated MCP-visible instruction and data-use searches still do not produce a real direct caller edge for `FUN_000d_ebe3`, `0005:2c35`, or `0005:2c68`. For now that makes the next defensible route `caller-frame / shared-consumer recovery`, not more recycled raw call searches or the retired `000a:44fd` and `000e:ebe3` hypotheses.
### First readable VM IR sketch (verified-only)
@ -461,6 +464,7 @@ The next gameplay-side wrapper pass now extends well past the three earlier seed
#### Concrete caller/xref addendum from the next pass
- Direct callsites are now pinned for the simpler wrappers: `0005:0292 -> 0005:2c06`, `0005:0fee -> 0005:2cd2`, `0005:5946/59e9 -> 0005:2c9b`, and `0007:814e/822e -> 0005:2d01`.
- The two direct `0005:2d30` callers are now role-shaped as well: `0005:5370` reaches slot `0x0f` only after `entity_class_has_flag2000` succeeds and class-word bit `0x8000` is clear, while `0005:6f47` reaches the same gate from the complementary branch where class-word bit `0x2000` is still clear before the caller continues into its larger state/update flow.
- `0005:2c68` is no longer usable as indirect selector evidence. The `0007:e521` and `0007:e73c` instruction windows do push `0x2c68` immediately before `CALLF 000a:44fd`, but decompile now shows that value is the caller-local data pointer `DAT_0000_2c68` passed into a fatal-report helper, not an indirect call to wrapper `0005:2c68`.
- `0005:2c35` and `0005:2c68` therefore both remain unresolved in direct caller/xref evidence, and the real selector work stays centered on the still-xref-dark upstream edge into `FUN_000d_ebe3` rather than the disproven `000a:44fd` hypothesis.
- Net effect: the active-event ecosystem fit is reinforced by direct caller behavior and payload shapes, but final slot-to-descriptor ownership still requires real caller-role recovery for the remaining xref-dark entry points.