Add Crusader-specific USECODE data and documentation

- Introduced new file `vm_mask_ladder.tsv` containing detailed mappings for Crusader USECODE VM masks and their associated descriptors.
- Added comprehensive documentation in `scummvm-crusader-reference.md` outlining the structure, findings, and implications for reverse-engineering the Crusader engine within ScummVM.
- Created `usecode-roundtrip-ir.md` to document the plan for converting Crusader USECODE bytes into a human-readable format, detailing the container layout, event names, and intrinsic tables.
- Implemented a PowerShell script `temp_usecode_sample.ps1` for extracting and analyzing USECODE data from the Crusader FLX files, providing insights into class and event structures.
MaddoScientisto 2026-03-22 17:26:39 +01:00
commit de42fd1ea1
42 changed files with 21970 additions and 1522 deletions


@ -8,7 +8,7 @@ Content extracted from `crusader_decompilation_notes.md`. Named via systematic a
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 1 | `000a:44fd` | `seg091_func_00fd` | 331 | Recovered boundary. Shares init flag `0x44a4` with `runtime_init_or_abort`; thunk-heavy non-returning wrapper. |
| 1 | `000a:44fd` | `fatal_error_report_fmt_a_and_exit` | 331 | Reentrancy-guarded fatal report helper. Prints the shared banner at `0x44a5`, formats template `0x44cc` with caller words, then exits; earlier `0005:2c68` selector speculation is now rejected. |
| 2 | `0003:ac7e` | `mem_alloc` | 272 | Allocation wrapper → seg082:0000 (`0009:a200`) |
| 3 | `0008:dbec` | `entity_word_list_destroy` | 238 | Frees entity word-list buffer. |
| 4 | `0003:a751` | `mem_free` | 207 | Free wrapper → seg082:007a (`0009:a27a` = `mem_free_checked`) |
@ -22,7 +22,7 @@ Content extracted from `crusader_decompilation_notes.md`. Named via systematic a
| 12 | `0008:bb8c` | `entity_check_flag_0x4000` | 115 | Short-circuits if flag `0x4000` set at `+0x16` |
| 13 | `0008:cda7` | `entity_free_both_word_lists` | 115 | Frees word lists at entity+`0x1e` and `+0x28` if optional pointers at `+0x24/+0x26` and `+0x2e/+0x30` non-null. Both call `entity_word_list_free_existing`. |
| 14 | `0004:26d2` | `nop_void_stub_b` | 111 | Empty function, returns void |
| 15 | `000a:45fe` | `runtime_init_or_abort` | 108 | Reentrancy-guarded init. Flag at `0x44a4`; flushes via `FUN_000a_4a56`, then calls `crt_exit_wrapper(1)`. Hidden code gap `0x4616-0x4643`. |
| 15 | `000a:45fe` | `fatal_error_report_fmt_c_and_exit` | 108 | Sibling fatal report helper. Uses the same `0x44a4` guard and banner string, formats static template `0x4506` with caller words, then exits. |
| 16 | `0004:3324` | `nop_return_zero` | 95 | Returns 0 |
| 17 | `0009:c563` | `event_queue_push` | 82 | Circular buffer enqueue. Ring index (`+0xe`) masked `0x3f`, slot masked `0xfff8`. Writes event type word + data byte pair. |
| 18 | `0005:c448` | `list_remove_and_free` | 74 | Unlinks node from linked list via `FUN_0005_c495`, optionally calls `mem_free` if bit 0 of flags set |
@ -105,13 +105,13 @@ Content extracted from `crusader_decompilation_notes.md`. Named via systematic a
| Address | NE Segment | Callers | Notes |
|---------|-----------|---------|-------|
| `000a:44fd` | seg091:00fd | 331 | Recovered as `seg091_func_00fd`; thunk-heavy init wrapper sharing flag `0x44a4`. |
| `000a:44fd` | seg091:00fd | 331 | Fatal report helper now identified; remaining gap is the exact human-readable template text at `0x44cc`/`0x44a5`, not control flow. |
| `000b:2e00` | seg109:0000 | 74 | Start of segment 109. |
| `0007:5a00` | seg043:0000 | 64 | Start of segment 43. Earlier seg001 `debris_spawn` port was rejected; still needs manual function creation and direct analysis. |
| `000a:48ff` | seg091:04ff | 55 | Recovered as `rng_next_modulo`; bounded wrapper around seg091 RNG state advance. |
| `0003:a880` | seg005:0880 | 49 | In CRT segment near `far_memcpy`. |
| `0003:ad75` | seg005:0d75 | 43 | In CRT segment near `mem_alloc`. |
| `000a:454d` | seg091:014d | 32 | Recovered as `seg091_func_014d`; init/context helper using the `0x45a6` cookie/context global. |
| `000a:454d` | seg091:014d | 32 | Buffer-normalizing fatal report sibling. Copies/clears context through the `0x45a6` global, formats template `0x44e7`, then exits. |
**seg043 reconciliation:**
- The earlier standalone seg001 port hypothesis in this subrange was wrong.


@ -245,8 +245,8 @@ Globals used: `[0x6312]`=start index, `[0x6314]`=count, `[0x630e]`=palette src p
- `entity_vm_set_value_from_slot_plus_offset` (`000c:f95f`) now provides a concrete bridge from the `000c` mini-VM cluster into the `000d` event/countdown lane:
- it calls `FUN_000d_5572(*(word *)0x6611, *(word *)0x6613, param_3, param_4, 0, 0)`
- then stores the returned far pair into target object fields `+0xd6/+0xd8`
- `entity_vm_slot_load_value_plus_offset` (`000d:5572`) is a thin wrapper over `entity_vm_slot_load_value` (`000d:51fd`), and `entity_vm_slot_load_value` contains a verified `PUSH 0x410` path at `000d:5290` before calling the unresolved seg091 event/abort lane at `000a:44fd`.
- This is not enough yet to say that `entity_vm_set_value_from_slot_plus_offset` is the immortality trigger, but it does show that the `000c` mini-VM / record-player cluster can hand work directly into a `000d` helper that emits event `0x410`.
- `entity_vm_slot_load_value_plus_offset` (`000d:5572`) is a thin wrapper over `entity_vm_slot_load_value` (`000d:51fd`), but the previously suspicious `PUSH 0x410` path at `000d:5290` is now reclassified: it pushes `0x410`, `DS`, and `0x6616` into the seg091 fatal-report helper at `000a:44fd`, so this is an error/assert path rather than a live gameplay event dispatch.
- This rules out the earlier hypothesized compiled-code immortality bridge from `000c:f95f` into `000d:51fd`. The only verified bridge that remains is the data/value handoff into the context `+0xd6/+0xd8` lane, not a direct event `0x410` producer.
- Supporting renamed helpers in the same lane now include:
- `entity_vm_slot_find_or_select` (`000d:4e7c`): scans 0x26-byte slot records, returns a matching slot id when present, and tracks one fallback slot for reuse/eviction
- `entity_vm_slot_decrement_use_count` (`000d:558d`): decrements one slot-use counter and traps on underflow
@ -262,6 +262,11 @@ Globals used: `[0x6312]`=start index, `[0x6314]`=count, `[0x630e]`=palette src p
- `entity_vm_context_sync_global_value_and_dispatch` (`000d:48da`) is the current context-side runner/sync point: it marks the context busy at `+0x123`, calls `entity_vm_set_field_da_to_global`, optionally writes the current value through `+0x11b/+0x11d`, and dispatches through the context vtable on success
- `entity_vm_context_save` / `entity_vm_context_load` / `entity_vm_context_destroy` / `entity_vm_context_free_buffer` (`000d:498f`, `000d:4a78`, `000d:4962`, `000d:48b6`) now pin down the lifecycle of this object family rather than leaving the whole `000d:45xx..4exx` island anonymous
- `entity_vm_context_try_create_masked_for_entity` is now better constrained at the return-value level too: after the runtime-disable check at `0x6610` and the owner-side slot-mask test succeed, it reports two distinct success shapes. Immediate-flagged contexts (`+0x16 & 0x0008`) clear the caller output word, while object-backed contexts return the created object's low word. That makes the helper a typed bridge from gameplay entities into VM-backed object results, not only a yes/no mask probe.
- `entity_vm_runtime_owner_resource_create` (`000d:7000`) is now better constrained as well: the embedded seg069/070 helper is file-backed rather than abstract. Construction starts with `dos_file_handle_init` (`0009:1c00`), then uses helper vtable slot `+0x04` as the size query that drives the child `+0x10/+0x12` allocation and helper vtable slot `+0x0c` as the table-population callback for the `0x0d`-stride owner table.
- That file-backed helper is also resolved one level deeper. The seg070 loops rooted at raw windows `0009:67b6` and `0009:6916` walk helper-owned record arrays at object `+0x10/+0x18`, format per-entry paths through the seg001 string helpers (`0003:e4d3` / `0003:e590`), then open, read, and close each file through `file_handle_alloc_init_and_open` (`0009:1c3a`), `dos_file_seek` (`0009:2034`), and `dos_file_close` (`0009:1e61`). That is strong evidence that `000d:7000` seeds the owner table from an indexed external file set rather than by copying one monolithic in-memory descriptor blob.
- The caller-side bootstrap for that helper is now anchored too: `entity_vm_runtime_init_from_path_if_configured` (`000d:44df`) first checks the configured byte/string global at `0x65a`, builds a path through seg072 helper `0009:3600` using globals `0x6d6:0x6d8` plus `0x65a`, validates that path through `000a:500a`, then calls `entity_vm_runtime_create(0,0,path)`. This is the first verified source-argument path for `entity_vm_runtime_owner_resource_create`, and it strongly suggests the owner/resource table is loaded from an external configured file rather than from a purely in-memory descriptor blob.
- Seg072 helper `0009:3600` is now classified more tightly as a rotating slash-aware path composer rather than a generic buffer advance helper. Its prologue cycles through five `0x50`-byte temp buffers, and its inner cases append optional string parts while inserting `\` only when adjacent path components need a separator. That narrows the two globals used by `000d:44df`: `0x65a` behaves as the configured relative runtime-owner filename/path component, while `0x6d6:0x6d8` behaves as the mutable base/resource-root path buffer that gets joined with `0x65a` before `000a:500a` validation.
- The two still-xref-dark wrappers `0005:2c35` and `0005:2c68` are also narrower now. Their signed extra word does not participate in owner-mask selection inside `entity_vm_context_try_create_masked_for_entity`; it is forwarded into `entity_vm_context_create_from_slot_index`, stored in context field `+0x34`, and passed on to `entity_vm_slot_load_value_plus_offset`. The best current reading is therefore `offset-specialized masked context creation`, not a separate direct selector lane.
- The first opcode-level behavior split inside that runtime is now visible in the `000d:0988` family:
- one branch calls `entity_vm_referent_chain_append_unique_from`, which looks like an attach/union operation on the current referent payload chain
- the `0x1a/0x1b` branch instead calls `entity_vm_referent_chain_remove_matching_from`, which looks like the inverse operation and makes the opcode family materially closer to a graph-editing script VM than a flat event list
@ -284,8 +289,182 @@ Globals used: `[0x6312]`=start index, `[0x6314]`=count, `[0x630e]`=palette src p
- `entity_vm_state_copy` (`000c:f772`) copies that same `+0xcc..+0xd2` stream/base quartet verbatim when one mini-VM object is cloned.
- Upstream of the setup helper, `000d:46ec` derives the source payload from the runtime owner table behind `0x6611 -> +0x1315/+0x1317`: with slot index `SI`, it walks owner table `*(owner+0x10/+0x12) + 0x0d*SI + 4`, passes that far pointer into `000c:f844`, and mirrors the resulting per-slot source into `0x39ca[slot]`.
- This sharpens the current JELYHACK-side model rather than overturning it: the code-side producer recovered in this batch is still a generic slot-backed VM source object keyed by gameplay-entity slot selection and owner-side mask bits, not a direct hard-coded descriptor-class switch on `JELYHACK` or `JELYH2`. Combined with the extractor evidence that `JELYHACK` / `JELYH2` remain referent-only while `REE_BOOT` / `SFXTRIG` keep active `event` tags and `SURCAMEW` keeps `eventTrigger`, the better fit is still `referent anchor -> slot-backed payload chain -> neighboring event-bearing attachment`.
- The `0x39ca` mirror question is narrower now too. Fresh windows at `0008:709c/70cb`, `0008:7309/7338`, and `0008:85f9/8617` show only global base-pointer save/restore and allocation/zeroing of the `0x39ca:0x39cc` table itself. The only verified per-slot row writer in this lane remains `entity_vm_context_create_from_slot_index` (`000d:46ec`), which writes `0x39ca[context_slot] = {source_off, source_seg}` after it derives the slot-backed payload source.
- One exact numeric collision is now ruled out as unrelated noise rather than a second VM source: `000e:0953` in the animation/audio lane pushes literal `0x410` into imported `ASYLUM.27` immediately after setting the local audio-completion byte at `+0xef1`. Because `ASYLUM.DLL` is the `ASS_*` audio/media library, this does not weaken the attribution of gameplay event `0x410` to the `000d` VM/USECODE lane.
- Current best JELYHACK reading after this pass: `JELYHACK` itself still looks like a referent-only map/object descriptor, but that no longer makes it inert. A referent-only record can still matter by supplying the referent id that populates the VM referent registry, while neighboring classes such as `REE_BOOT`, `SURCAMEW`, and `SFXTRIG` supply the event-bearing logic attached to the same local object island.
### 000d:21ed/22bc id-correlation table (runtime lane vs descriptor families)
| Runtime element | Code anchors | Observed width/shape | Correlation status |
|---|---|---|---|
| Metadata byte A | `000d:22d2` after context from `000d:46ec` | 1-byte signed (`CBW`), used as first loop dimension/count input | Not a descriptor id. Behaves as compact shape/count metadata for matrix construction. |
| Metadata byte B | `000d:22ee` | 1-byte signed (`CBW`), paired with byte A and summed to derive loop bounds | Not a descriptor id. Same shape/count role as byte A. |
| Streamed words feeding matrix | `000d:2324`, `000d:2372`, `000d:237b -> 0008:7d27` | 16-bit words consumed from caller stream and passed to `entity_link` | Best fit: runtime entity/link ids, not descriptor-class selectors. |
| Matrix output writeback filter | `000d:23da..2421` | tests `0x0400`; only non-`0x0400` words are pushed back | Matches `entity_word_list` style link-flag semantics, not event opcode tagging. |
| Source stream provenance | `000d:4732..4751`, `000d:47a3..47d4` | source pointer = owner table `(+0x10/+0x12) + 0x0d*slot + 4`; mirrored to `0x39ca[slot]` | Slot-indexed runtime source table, generic across gameplay entity lanes. |
Conservative interpretation after this pass:
- The `000d:21ed -> 000d:22bc` lane is strongly supported as a slot-backed payload to entity-link closure path, where two byte-sized metadata fields shape the matrix walk and word entries are link/entity ids.
- Descriptor-family alignment is therefore stronger with generic active event ecosystems (`EVENT`/`NPCTRIG`/`*_BOOT`/`SFXTRIG`) than with `SURCAM*` callback holders, because no direct `eventTrigger`-specific discriminator is read in this lane.
- Direct descriptor-id attribution is still rejected for now: no code evidence ties the consumed bytes/words here to explicit EUSECODE class indices or to a hard `JELYHACK`/`SURCAM*` switch.
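The slot-indexed source address and the writeback filter from the table above can be sketched as follows. This is a minimal Python model, not recovered code: the function names are hypothetical, the 16-bit wraparound is an assumption about far-pointer offset arithmetic, and reading the `0x0400` test as a bit test (rather than an equality compare) is also an assumption.

```python
OWNER_RECORD_STRIDE = 0x0D   # owner records are 0x0d bytes apart
SOURCE_FIELD_OFFSET = 4      # payload source starts 4 bytes into a record
LINK_FLAG_0400 = 0x0400      # marker tested at 000d:23da..2421


def slot_source_offset(table_offset, slot):
    """Offset half of the per-slot source pointer (hypothetical helper)."""
    # Assumption: offset arithmetic wraps at 16 bits within the table segment.
    raw = table_offset + OWNER_RECORD_STRIDE * slot + SOURCE_FIELD_OFFSET
    return raw & 0xFFFF


def words_pushed_back(words):
    """Keep only the words that do not carry the 0x0400 marker."""
    # Assumption: the test is a bit test, not an equality compare.
    return [w for w in words if not (w & LINK_FLAG_0400)]
```

For example, with a table offset of `0x1000`, slot `3` resolves to `0x1000 + 0x27 + 4 = 0x102b`.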
### FUN_000d_ebe3 opcode-to-payload-shape matrix (sequencer-local)
| Sequencer stage | Code anchors | Opcode / lane status | Payload shape class | Verified behavior |
|---|---|---|---|---|
| `000d:0988` (`entity_vm_opcode_mutate_referent_chain`) | `000d:ec1d`, `000d:0988` body | Known `0x18..0x1b` family | Inline/indirect chain payloads | `0x18/0x19` append-unique and `0x1a/0x1b` remove-matching over referent chains, with indirect-vs-inline mode split and shared epilogue. |
| `000d:177c` | `000d:ebf5`, `000d:178b..17aa` | Numeric opcode unresolved in this dispatcher lane | Word scalar (frame-local -> stream) | Does not read `+0xd6/+0xd8`; subtracts `2` from `[context+0xcc]` and pushes one frame-local word (`BP-0x1c6`) onto the stream stack. |
| `000d:1acb` | `000d:ec09`, `000d:1acb..1b22` | Numeric opcode unresolved in this dispatcher lane | Word-pair/list consumer + boolean output | Reads one 32-bit pair from stream (`[context+0xcc]`, then `+4`), compares against `AX:DX`, and pushes a 16-bit predicate result back to stream. |
| `000d:21ed -> 000d:22bc` | `000d:21ed`, `000d:22d2`, `000d:22ee`, `000d:2324..237b`, `000d:23da..2421` | Caller block + internal stage | Mixed: byte metadata + word id matrix | Consumes two signed bytes from seeded `+0xd6/+0xd8` as shape/count metadata, then consumes streamed words as entity/link ids for `entity_link`; only non-`0x0400` words are pushed back. |
| `000d:1d4a` | `000d:ec48`, `000d:1d4a` | Conditional substage when `[obj+0xba]==0` | Control/sentinel (no payload shape proven) | Current body is `INT3`-only (boundary suspect); treated as a control gate/trap island, not a verified payload transformer. |
| `000d:2104` | `000d:ec54`, `000d:2104..212b` | Numeric opcode unresolved in this dispatcher lane | Mixed scalar/handle return | Writes result to caller out-ptr: path A stores frame-local dword (`BP+0xfdaa/fdac`), path B stores object word (`[obj+2]`) with high word cleared; then returns via opcode epilogue. |
### Pass-4 dispatcher lane update (opcode selector evidence)
What is now hard evidence in code:
- `000d:0988` compares one opcode-local word at `[BP-0x32]` against concrete values `0x19`, `0x1a`, and `0x1b` (`000d:099b`, `000d:09a1`, `000d:0a07`, `000d:0a0d`).
- `FUN_000d_ebe3` calls `000d:177c -> 000d:1acb -> 000d:0988 -> 000d:22bc -> optional 000d:1d4a -> 000d:2104` (`000d:ebf5`, `000d:ec09`, `000d:ec1d`, `000d:ec31`, `000d:ec48`, `000d:ec54`).
- `000d:177c`, `000d:1acb`, and `000d:2104` do not contain their own opcode compares in recovered body ranges; they behave as wrapper stages around the opcode-local family tested in `000d:0988`.
Conservative case identity mapping from this pass:
- `000d:177c` = pre-mutate stack push stage for the same `[BP-0x32]` family.
- `000d:1acb` = comparator stage (stream dword pair -> boolean word) for that family.
- `000d:0988` = concrete opcode discriminator for `0x19/0x1a/0x1b` (with `0x18` still implied by sibling path behavior).
- `000d:2104` = family finalizer writing mixed immediate/object output to caller out-ptr.
Still unresolved after this pass:
- Direct CALL xrefs into `FUN_000d_ebe3` are now confirmed from `animation_ctor_variant_a/b/c` at `000e:283e`, `000e:2931`, and `000e:29e4`, so the entry is no longer globally xref-dark.
- Those constructor callsites still do not expose a new concrete wrapper-level opcode number or the direct write/read path for `[BP-0x32]`; no additional opcode id can yet be assigned uniquely beyond the internal `0x19/0x1a/0x1b` family already proven inside `000d:0988`.
### First readable VM IR sketch (verified-only)
From direct decompile/disassembly in `000d:0988`, `000d:208b`, `000d:21ed`, `000d:22bc`, and `0008:7d27`, the current script-readable IR shape is:
- `APPEND_UNIQUE_INLINE` (`opcode 0x18`, implied sibling in `000d:0988`)
- `APPEND_UNIQUE_INDIRECT` (`opcode 0x19`)
- `REMOVE_MATCHING_INDIRECT` (`opcode 0x1a`)
- `REMOVE_MATCHING_INLINE` (`opcode 0x1b`)
- `MATERIALIZE_OR_FORWARD_VALUE` (`000d:208b` path after `entity_vm_context_create_from_slot_index`)
- `PUSH_FRAME_WORD_LITERAL` (`000d:177c`: pushes one frame-local word to stream stack)
- `COMPARE_STREAM_DWORD_AND_PUSH_BOOL` (`000d:1acb`: consumes one stream dword pair and pushes predicate word)
- `PREPEND_INLINE_PAYLOAD` (`000d:21ed`: subtracts from context `+0x102` then copies caller bytes)
- `BUILD_ENTITY_LINK_MATRIX` (`000d:22bc`: two streamed dimension bytes, streamed id table, repeated `entity_link` calls)
- `FINALIZE_MIXED_VALUE_TO_OUTPTR` (`000d:2104`: emits either immediate frame dword or object-word-derived value)
- `EMIT_OR_PUSHBACK_RESULT` (`000d:22bc` tail: values without `0x0400` marker are pushed back to caller stream before `entity_vm_opcode_finish`)
Minimal pseudocode-style sketch:
`referent = active_referent_id()`
`chain = referent.payload_chain`
`chain = mutate(chain, opcode_0x18_to_0x1b, payload_mode)`
`value = materialize_or_forward(context_from_slot(stream_state))`
`if opcode_lane == inline_payload: value = prepend_inline_payload_and_build_link_matrix(stream_ids)`
`emit(value)`
This remains consistent with descriptor-side evidence: referent-only anchors (`JELYHACK`/`JELYH2`) can still drive behavior once neighboring event-capable descriptors attach payload/event semantics to the same referent island.
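The `0x18..0x1b` chain-mutation family from the sketch above can be modeled in a few lines of Python. This is an illustrative model only: representing a referent chain as a list of integer ids is an assumption, the helper names are hypothetical, and the inline/indirect mode is reported but not resolved because the indirect lookup itself is not yet recovered.

```python
# Opcode -> (operation, payload mode), as listed in the IR sketch.
# 0x18 is still only implied by sibling-path behavior in 000d:0988.
CHAIN_OPCODES = {
    0x18: ("append_unique", "inline"),
    0x19: ("append_unique", "indirect"),
    0x1A: ("remove_matching", "indirect"),
    0x1B: ("remove_matching", "inline"),
}


def append_unique(chain, payload):
    """Return chain plus every payload id not already present (order kept)."""
    result = list(chain)
    for item in payload:
        if item not in result:
            result.append(item)
    return result


def remove_matching(chain, payload):
    """Return chain without any id that also appears in payload."""
    return [item for item in chain if item not in payload]


def mutate_chain(chain, opcode, payload):
    """Apply one opcode of the family; returns (new_chain, payload_mode)."""
    operation, mode = CHAIN_OPCODES[opcode]
    if operation == "append_unique":
        return append_unique(chain, payload), mode
    return remove_matching(chain, payload), mode
```

Under this model `0x18/0x19` and `0x1a/0x1b` are inverse pairs, which is what makes the family read like graph editing rather than a flat event list.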
### First readable pseudo-script renderings (verified-only)
`entity_vm_context_create_from_slot_index` adds one more readable anchor for this IR: after it seeds the embedded mini-VM from the runtime owner table at `0x6611 -> +0x1315/+0x1317 -> (+0x10/+0x12) + 0x0d*slot + 4`, it also writes the same far source pair into the per-slot mirror row addressed through `0x39ca[context_slot]`. That keeps the current readable model honest: the mirror is part of context creation for slot-backed VM state, not yet a proven standalone descriptor-dispatch cache.
The best verified human-readable form right now is therefore a small family of templates rather than a one-record-equals-one-opcode script dump.
Readable template A: referent anchor with event-bearing attachment (JELYHACK island)
```text
anchor JELYHACK(referent)
anchor JELYH2(referent)
attach REE_BOOT(event, counter, item)
attach SFXTRIG(event)
optional_callback SURCAMEW(eventTrigger, link, code, screen, cameraEgg, trueRef, therma)
vm_effect:
chain = APPEND_UNIQUE_INLINE(...) or APPEND_UNIQUE_INDIRECT(...)
chain = REMOVE_MATCHING_INLINE(...) or REMOVE_MATCHING_INDIRECT(...)
value = MATERIALIZE_OR_FORWARD_VALUE(slot_backed_context)
if inline_payload_present:
payload = PREPEND_INLINE_PAYLOAD(caller_blob)
links = BUILD_ENTITY_LINK_MATRIX(shape_a, shape_b, entity_ids)
FINALIZE_MIXED_VALUE_TO_OUTPTR(value)
```
Why this is the current best readable rendering:
- `JELYHACK` and `JELYH2` remain referent-only sibling descriptors with identical first-16-word header shape in `jelyhack_descriptor_compare.tsv`.
- The nearest event-bearing neighbors in `jelyhack_island_graph.md` are `REE_BOOT` (`event`), `SURCAMEW` (`eventTrigger`), and `SFXTRIG` (`event`), so the readable unit is better modeled as `anchor + attachment` than as a self-contained `JELYHACK` event record.
- The runtime side already supports exactly that shape: one referent anchor can own mutable payload chains, and the `000d:21ed -> 000d:22bc` path can expand one inline payload into an entity-link closure before `entity_vm_opcode_finish` commits the result.
Readable template B: active event hub with trigger-side neighbors (EVENT island)
```text
neighbor ROLL_NS(referent, item, item2, riderList, time, total, counter, oldz, cargo, zCheck, zMax)
attach COR_BOOT(event, counter, item)
attach EVENT(event, item, source, dest, door, link, time, counter, counter2, post1, post2, floor, flicMan)
attach NPCTRIG(event, item, item2, typeNpc)
neighbor CRUZTRIG(referent, item, elev)
neighbor NPC_ONLY(referent, item, link)
neighbor VMAIL(referent, textFile)
vm_effect:
select referent-bearing neighborhood
mutate referent payload chain via opcode 0x18..0x1b family
materialize slot-backed value or inline payload
if payload carries shape/count bytes:
build entity-link closure matrix from streamed ids
emit event-bearing result through shared opcode epilogue
```
Why this second template matters:
- `event_island_graph.md` and `event_descriptor_compare.tsv` show a compact three-node event-bearing core (`COR_BOOT`, `EVENT`, `NPCTRIG`) embedded inside referent/link/text neighbors, which matches the same `anchor/neighbor + attachment` runtime model seen around `JELYHACK`.
- `EVENT` is structurally richer than the `_BOOT` and `NPCTRIG` satellites, so it reads better as a hub descriptor whose fields parameterize the same VM-side payload-chain and link-matrix machinery rather than as a flat peer row.
- This is the first point where the binary descriptor artifacts and the `000d` VM IR can be rendered together as a readable pseudo-script target without claiming a direct descriptor-id switch that the code still does not prove.
### Wrapper mask-family expansion around `0005:2867-2d30`
The next gameplay-side wrapper pass now extends well past the three earlier seed wrappers and shows one coherent local mask ladder around `entity_vm_context_try_create_masked_for_entity`.
#### Verified wrapper ladder
| Address | Mask pair | Extra pushed value | Verified caller / guard notes |
|---------|-----------|--------------------|-------------------------------|
| `0005:27a4` | `0x0001:0000` | none | Existing seed. Called from `000c:a09e` on the entity `+0x5b` bit-`0x0004` branch. |
| `0005:2867` | `0x0002:0001` | none | Calls `FUN_0005_2686` first, so the local entity id must be `1..255` when that gate matters. If seg030 helper `FUN_0005_ffed` reports true, the wrapper only continues when `entity_class_get_flag8(local_id)` is true or `local_id == 1`. Called at `000c:8b5b`, `000c:8be2`, `000c:8d59`, `000c:8dec`, `000c:9536`, `000c:95ed`, `000c:9868`, and `000c:a007`; the `000c:8b5b` / `000c:a007` callers then store the returned word into entity field `+0x39` before `entity_state_tick_dispatch`. |
| `0005:2918` | `0x0020:0005` | `CONCAT22(param_4,param_3)` | Sole current caller is `0006:43e5`, reached only when caller object word `+0x3c == 0x20b`; it passes caller fields `+0x36/+0x38` as one extra dword before the out pointer. |
| `0005:2ae2` | `0x0004:0002` | none | Sole current caller is `0008:023d` inside a dispatch-style loop body. |
| `0005:2c06` | `0x0200:0009` | none | Adjacent simple wrapper in the same local family. |
| `0005:2c35` | `0x0400:000a` | sign-extended word argument | Adjacent simple wrapper; assembly pushes one extra sign-extended word before the out pointer. |
| `0005:2c68` | `0x0800:000b` | sign-extended word argument | Same pattern as `0005:2c35`, with one extra sign-extended word operand. |
| `0005:2c9b` | `0x0010:0004` | none | Global gate wrapper: returns early unless `0x1056 != 0`. |
| `0005:2cd2` | `0x1000:000c` | none | Adjacent simple wrapper in the same family. |
| `0005:2d01` | `0x4000:000e` | none | Adjacent simple wrapper in the same family. |
| `0005:2d30` | `0x8000:000f` | none | Larger gameplay gate. Sets entity class-word bit `0x2000` via `FUN_0005_2745(entity, class_word | 0x2000)`, checks class-record bits through `FUN_0005_32a8` / `FUN_0005_32d2` (byte `+0` or `+6` bit `0x10` in the `0x7e46` class table), rejects some seg030 classes unless ids `0x576/0x596/0x59c/0x58f` match, branches on `FUN_0005_11c4` class nibble values `4`, `7`, and `8`, may emit dispatch entry `0x0f16` / event type `0x20f` through `FUN_0004_f08b`, and only then attempts the masked VM context. Current direct callers are `0005:5370` and `0005:6f47`. |
#### Shared preconditions and what they imply
- This island is firmly gameplay-side, not a descriptor-id switch. The wrappers consume live entity/object far pointers, use the runtime slot mapper at `000d:45c5`, and gate on entity-id range, entity class word bits, class-record bytes from `0x7e46`, and state bytes such as entity `+0x5b`, `+0x32`, and `+0x39`.
- The local ladder is not random. The mask pairs now cover `0x0001:0000`, `0x0002:0001`, `0x0004:0002`, `0x0010:0004`, `0x0020:0005`, `0x0200:0009`, `0x0400:000a`, `0x0800:000b`, `0x1000:000c`, `0x4000:000e`, and `0x8000:000f`, which reads like one sparse owner-side slot taxonomy rather than one-off wrappers.
- `0005:2918`, `0005:2c35`, and `0005:2c68` are especially useful because they push extra payload words before the out pointer. That shape fits the current VM model of `slot-selected context + caller-provided payload data` more naturally than a pure referent-anchor lookup.
- `0005:2d30` is the strongest new caller-side anchor. Its branch structure is about class/state gating, dispatch-entry emission, and gameplay-object cleanup/state changes before the masked VM call, which is a better behavioral match for active-event or trigger-bearing descriptors than for a passive referent anchor.
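The regularity of the ladder can be checked mechanically. The sketch below copies the mask pairs from the wrapper table and tests the hypothesis that each pair is `mask:index` with `mask == 1 << index`. That reading of the pair layout is an interpretation of the numbers, not something the code has been shown to assert, and the helper names are hypothetical.

```python
# wrapper address -> (mask, index), copied from the verified wrapper ladder
MASK_PAIRS = {
    0x27A4: (0x0001, 0x0),
    0x2867: (0x0002, 0x1),
    0x2AE2: (0x0004, 0x2),
    0x2C9B: (0x0010, 0x4),
    0x2918: (0x0020, 0x5),
    0x2C06: (0x0200, 0x9),
    0x2C35: (0x0400, 0xA),
    0x2C68: (0x0800, 0xB),
    0x2CD2: (0x1000, 0xC),
    0x2D01: (0x4000, 0xE),
    0x2D30: (0x8000, 0xF),
}


def is_single_bit_pair(mask, index):
    """True when the mask is exactly the bit selected by the index."""
    return mask == 1 << index


def missing_indices(pairs, width=16):
    """Bit indices in 0..width-1 that no known wrapper covers."""
    seen = {index for _, index in pairs.values()}
    return [i for i in range(width) if i not in seen]


all_consistent = all(is_single_bit_pair(m, i) for m, i in MASK_PAIRS.values())
gaps = missing_indices(MASK_PAIRS)
```

Every listed pair satisfies the single-bit relation, and the uncovered indices are `3`, `6`, `7`, `8`, and `13`. Whether wrappers for those slots exist elsewhere in the binary is not established here.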
#### Current attribution after the wrapper pass
- The wrapper family now fits the readable active-event template better than the narrow `JELYHACK` referent-anchor template. The callers are dominated by gameplay state checks, class-table gating, dispatch-entry emission, and object-state writes; that is closer to `EVENT` / `NPCTRIG` / `_BOOT` style active-event ecosystems than to a record whose only verified descriptor-side field is `referent`.
- This does not overturn the existing JELYHACK model. `JELYHACK` / `JELYH2` still fit best as referent anchors that can feed the VM referent registry, while neighboring event-bearing descriptors can attach behavior to the same island.
- The direct descriptor bridge is still unproven. No code path in this wrapper family reads an explicit EUSECODE class id or a `69:0A00 event` versus `24:0A02 eventTrigger` tag, so the result stays at ecosystem-level correlation rather than a hard descriptor-class rename.
#### Concrete caller/xref addendum from the next pass
- Direct callsites are now pinned for the simpler wrappers: `0005:0292 -> 0005:2c06`, `0005:0fee -> 0005:2cd2`, `0005:5946/59e9 -> 0005:2c9b`, and `0007:814e/822e -> 0005:2d01`.
- `0005:2c68` is no longer usable as indirect selector evidence. The `0007:e521` and `0007:e73c` instruction windows do push `0x2c68` immediately before `CALLF 000a:44fd`, but decompile now shows that value is the caller-local data pointer `DAT_0000_2c68` passed into a fatal-report helper, not an indirect call to wrapper `0005:2c68`.
- `0005:2c35` and `0005:2c68` therefore both remain unresolved in direct caller/xref evidence, and the real selector work stays centered on the still-xref-dark upstream edge into `FUN_000d_ebe3` rather than the disproven `000a:44fd` hypothesis.
- Net effect: the active-event ecosystem fit is reinforced by direct caller behavior and payload shapes, but final slot-to-descriptor ownership still requires real caller-role recovery for the remaining xref-dark entry points.
| `000c:f844` | `entity_vm_context_setup` | Calls `entity_vm_stack_init_with_data`, then sets `+0xd6..+0xe3` with position/dimension/state params |
| `000c:f600` | `entity_vm_pair_stack_push` | Push (word_a, word_b) onto 31-entry array at `[ptr+0x80]` (count); error if full |
| `000c:f63c` | `entity_vm_pair_stack_pop` | Pop and return word from pair stack; error if empty |


@ -255,3 +255,38 @@ The `0x4588` FAR object is a runtime-installed callback/dispatch object that par
| `0x45a6` | clock/cookie global used by `assert_buffer_valid` |
| `0x39ca` | dispatch callback-table pointer |
| `0x6828` | `g_active_dispatch_entry_farptr` |
---
## Follow-up: VM Owner/Resource Loader and Owner-Loaded Class Validation
The next ScummVM-guided validation step now confirms that the sampled owner-loaded EUSECODE classes are compatible with the ScummVM indexing model even though one header detail remains open.
### Sampled class-record findings
- Using the extracted chunks plus the live raw path `000d:44df -> 000d:4c99 -> 000d:7000`, the large chunk at table offset `0x88` behaves as object `1`.
- For representative class bodies, deriving `object_index = (table_offset - 0x80) / 8`, then `class_id = object_index - 2`, and then reading object `1` at `4 + 13 * class_id` yields the expected names: `EVENT`, `NPCTRIG`, `SURCAMNS`, `JELYHACK`, `REE_BOOT`, `SURCAMEW`, and `SFXTRIG`.
- This is the first direct local confirmation that the owner-loaded records match the ScummVM `object 1` name-table plus `classid + 2` body lookup at the indexing level.
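The indexing arithmetic above can be written out as a short Python sketch. The constants come straight from the findings. The function names are hypothetical, and treating each 13-byte name record as a NUL-terminated ASCII string is an assumption about the field layout.

```python
TABLE_BASE = 0x80        # first table entry offset
TABLE_ENTRY_SIZE = 8     # bytes per table entry
NAME_TABLE_START = 4     # names begin 4 bytes into object 1
NAME_STRIDE = 13         # bytes per name record (0x0d)


def object_index(table_offset):
    """Object index for one table entry, e.g. 0x88 -> object 1."""
    delta = table_offset - TABLE_BASE
    if delta < 0 or delta % TABLE_ENTRY_SIZE:
        raise ValueError(f"not a table entry offset: {table_offset:#x}")
    return delta // TABLE_ENTRY_SIZE


def class_id(table_offset):
    """Class bodies start at object 2, so class_id = object_index - 2."""
    return object_index(table_offset) - 2


def class_name_offset(cid):
    """Byte offset of a class name inside object 1."""
    return NAME_TABLE_START + NAME_STRIDE * cid


def read_class_name(object1, cid):
    """Assumption: the name is NUL-terminated ASCII within its record."""
    start = class_name_offset(cid)
    field = bytes(object1[start:start + NAME_STRIDE])
    return field.split(b"\0", 1)[0].decode("ascii")
```

For example, `object_index(0x88)` gives `1` (the name table itself), and a class body at table offset `0x90` maps to class id `0`, whose name sits at offset `4` in object `1`.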
### Header and event-table shape
- The sampled class records do contain a stable 4-byte header field at bytes `8..11`.
- The observed values are small boundaries: `0x00d4`, `0x00da`, and `0x00e6` in the current sample set.
- Treating that dword directly as the first post-event-table offset makes the layout line up cleanly: `(dword_at_8 - 20) / 6` yields valid tables of 32, 33, or 35 slots before inline payload/name data begins.
- The region at `class + 0x14` is therefore now directly confirmed as repeated 6-byte slots with `u16 unknown_word + u32 code_or_payload_field` layout.
- Representative low-slot examples are `JELYHACK` slot `1` = `{word=0x002a, dword=0x00000001}`, `SURCAMNS` slot `1` = `{word=0x0051, dword=0x000000d2}`, `SURCAMEW` slot `1` = `{word=0x00f7, dword=0x000000d2}`, `EVENT` slot `10` = `{word=0x1fd6, dword=0x00000001}`, and `REE_BOOT` slots `10/15/16` = `{0x034b,1}`, `{0x025c,0x034c}`, `{0x003b,0x05a8}`.
- The leading event word is still not decoded semantically.
### What remains open
- Scanning with the previously noted ScummVM-style `(base_offset + 19) / 6` interpretation overruns into inline payload/name bytes on these owner-loaded records, so the local sample set does not support that exact event-count formula as written.
- The best current arithmetic fit is now tighter: ScummVM's decremented `base_offset` is also used as the live code-stream base in `uc_machine.cpp`, so the local owner-loaded records fit best if bytes `8..11` are the first code-byte offset and event-count derivation is `(base_offset - 19) / 6`, which is exactly equivalent here to `(raw_u32_at_8_11 - 20) / 6`.
- Current `000d` loader evidence does not point to a header rewrite before VM consumption. `entity_vm_runtime_init_from_path_if_configured` (`000d:44df`) only builds the external path and creates the runtime, `entity_vm_runtime_create` (`000d:4c99`) only installs the helper returned by `000d:7000`, `entity_vm_runtime_owner_resource_create` (`000d:7000`) only allocates the child owner table and fills it through helper vtable `+0x0c`, and `entity_vm_context_create_from_slot_index` (`000d:46ec`) directly reads slot-backed source data from that owner table. No local step is yet verified as rewriting the sampled class headers.
- `entity_vm_runtime_owner_resource_create` (`000d:7000`) still does not expose a direct binary-side class-name lookup or explicit `classid + 2` arithmetic. What it does expose is an indexed file-set loader contract: helper-owned count at `+0x14`, far-pointer table at `+0x10`, paired per-entry word table at `+0x18`, vtable `+0x04` size query, and vtable `+0x0c` materialization of the `0x0d`-stride owner records later consumed by `entity_vm_context_create_from_slot_index`.
- Safe event-label correlation remains intentionally narrow after this pass. The sampled low slot ids are now concrete, but none of them yet have a verified binary-side behavior match strong enough to promote a ScummVM label like `look`, `use`, or `cachein`.
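
A short worked comparison makes the formula mismatch concrete. The sketch below assumes, as the sampled records suggest, that the raw dword at bytes `8..11` marks the first byte after the event table. The function names are hypothetical, and the "ScummVM-style" formula is taken as recorded in these notes rather than re-verified against the ScummVM source.

```python
HEADER_SIZE = 20  # event table starts at class + 0x14
SLOT_SIZE = 6     # u16 leading word + u32 code/payload field


def slots_scummvm_style(raw_u32: int) -> int:
    """(base_offset + 19) / 6 with base_offset = raw - 1."""
    return ((raw_u32 - 1) + 19) // SLOT_SIZE


def slots_local_fit(raw_u32: int) -> int:
    """(base_offset - 19) / 6, i.e. (raw - 20) / 6."""
    return ((raw_u32 - 1) - 19) // SLOT_SIZE


def table_end(slot_count: int) -> int:
    """Offset of the first byte after a table of slot_count rows."""
    return HEADER_SIZE + SLOT_SIZE * slot_count


if __name__ == "__main__":
    for raw in (0x00D4, 0x00DA, 0x00E6):
        for label, fn in (("scummvm", slots_scummvm_style),
                          ("local", slots_local_fit)):
            n = fn(raw)
            overrun = table_end(n) - raw
            print(f"raw={raw:#06x} {label}: slots={n} "
                  f"end={table_end(n):#06x} overrun={overrun}")
```

For `0x00d4` the `+ 19` form gives 38 slots ending at offset 248, which is 36 bytes past the raw boundary at 212, while the `- 19` form gives 32 slots ending exactly at 212.
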
### Conservative parser rule from this batch
- For current owner-loaded/raw EUSECODE work, keep bytes `8..11` raw and derive event count only with `(raw_u32_at_8_11 - 20) / 6` when divisibility and object-size bounds checks succeed.
- Keep the decremented `code_base_minus_one = raw_u32_at_8_11 - 1` as a separate code-addressing field rather than collapsing it into the event-count rule.
- Preserve the 6-byte event rows and their leading word verbatim until the event-entry word semantics are verified.
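
The rule above might look like the following in a parser. This is a sketch under stated assumptions, not the code in `tools/extract_eusecode_flx.py`: all names are hypothetical, fields are assumed little-endian, and the object-size check is the simplest plausible reading of "object-size bounds checks".

```python
import struct
from typing import List, NamedTuple, Optional


class EventRow(NamedTuple):
    slot: int
    leading_word: int   # semantics unverified; kept verbatim
    code_dword: int     # raw code-or-payload field


class ClassLayout(NamedTuple):
    raw_u32_at_8_11: int
    code_base_minus_one: int   # separate code-addressing field
    rows: List[EventRow]


def parse_class_layout(obj: bytes) -> Optional[ClassLayout]:
    """Return the conservative layout, or None if any check fails."""
    if len(obj) < 20:
        return None
    (raw,) = struct.unpack_from("<I", obj, 8)  # assumed little-endian
    span = raw - 20
    # Divisibility and object-size bounds checks from the rule above.
    if span < 0 or span % 6 != 0 or raw > len(obj):
        return None
    rows = [
        EventRow(i, *struct.unpack_from("<HI", obj, 20 + 6 * i))
        for i in range(span // 6)
    ]
    return ClassLayout(raw, raw - 1, rows)
```

Returning `None` instead of guessing keeps records that fail the checks out of the derived tables, so they can be inspected by hand rather than silently misparsed.
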

@ -42,6 +42,7 @@ A small helper cluster in the raw `000e:` area implements a fixed-size CRLF reco
- `table_end = 0x6090`, which matches the first non-zero payload offset
- `403` non-zero entries in the current file
- `tools/extract_eusecode_flx.py` now parses the full validated table and emits all `403` non-zero entries under `USECODE/EUSECODE_extracted/`, including `entry_index.tsv`, `descriptor_index.tsv`, `descriptor_neighborhoods.tsv`, `summary.json`, per-chunk `.bin`, and `.strings.txt` sidecars.
- The extractor now also carries the conservative owner-loaded class rule directly into machine-readable outputs: `class_layout_index.tsv` records `object_index`, `class_id`, the raw bytes-`8..11` field, derived `code_base_minus_one`, and `conservative_event_count`, while `class_event_index.tsv` expands parsed classes into raw 6-byte event rows with slot numbers, ScummVM event-name hints for `0x00..0x1f`, unresolved leading words, and raw code-offset dwords.
- The generated reports now expose lightweight descriptor summaries (`primary_label`, `field_names`, `field_tags`) so the object lane can be searched by field grammar instead of only by raw names.
- The extracted data now separates into at least two lanes:
- text-heavy records that fit the `000e:` CRLF parser model, such as `DATALINK` mission/objective text and `TEXTFIL1` message banks
@ -92,7 +93,7 @@ A small helper cluster in the raw `000e:` area implements a fixed-size CRLF reco
- opcode `0x1a` = remove matching indirect/string-like payload entries from the referent chain
- opcode `0x1b` = remove matching inline/fixed-size payload entries from the referent chain
- the same helper body also implies the missing sibling `0x18` as the inline/fixed-size append-unique form, because only `0x19/0x1a` set the indirect compare flag while only `0x1a/0x1b` take the removal path
- The first concrete `000c` to `000d` bridge inside that lane remains `entity_vm_set_value_from_slot_plus_offset` at `000c:f95f`: it calls `entity_vm_slot_load_value_plus_offset`, stores its return pair into object fields `+0xd6/+0xd8`, and sits immediately beside other `entity_vm_*` helpers in the `000c:f6b8..f9d9` mini-VM cluster. On the `000d` side, `entity_vm_slot_load_value_plus_offset` wraps `entity_vm_slot_load_value`, and `entity_vm_slot_load_value` contains a concrete `PUSH 0x410` event-emission path at `000d:5290`.
- The first concrete `000c` to `000d` bridge inside that lane remains `entity_vm_set_value_from_slot_plus_offset` at `000c:f95f`: it calls `entity_vm_slot_load_value_plus_offset`, stores its return pair into object fields `+0xd6/+0xd8`, and sits immediately beside other `entity_vm_*` helpers in the `000c:f6b8..f9d9` mini-VM cluster. On the `000d` side, `entity_vm_slot_load_value_plus_offset` wraps `entity_vm_slot_load_value`, but the old `PUSH 0x410` suspicion at `000d:5290` is now rejected: that site reaches the seg091 fatal-report helper family at `000a:44fd`, not live gameplay dispatch.
- The two main `000d` caller blocks beneath that bridge now have a first stable byte/value reading too:
- internal block `000d:208b` is the simple materialize-or-forward path: it creates one VM context from the caller's stream state, checks the returned object flags, and either writes the returned value pair straight to the caller output slot or forwards the created object's low word through the shared opcode epilogue
- internal block `000d:21ed` is the inline-payload path: it creates the same VM context, prepends the caller-owned blob into the backward-growing context buffer at `+0x102`, then consumes two bytes from the seeded `+0xd6/+0xd8` lane as small shape/count metadata before building an `entity_link` closure matrix from the following caller-stream words and pushing back the non-`0x0400` results
@ -121,6 +122,8 @@ A small helper cluster in the raw `000e:` area implements a fixed-size CRLF reco
- `environmental-event`: `FLAMEBOX`, `NOSTRIL`, `STEAMBOX`
- `callback-eventtrigger`: `SURCAMNS`, `SURCAMEW`
- That split matters because it is the first extractor-backed distinction between active event carriers and callback-only trigger holders. The `69:0A00 -> event` classes now look like the active event-bearing core of the descriptor system, while the surveillance classes with `24:0A02 -> eventTrigger` are better treated as callback/attachment endpoints rather than peer event hubs.
- The extractor now emits a stronger script-facing bridge artifact too: `runtime_descriptor_family_rankings.md` / `.tsv` rank those descriptor families against the verified runtime lanes instead of only listing neighborhoods. Current best fit is `EVENT` as the strongest active-event payload lane, `_BOOT` cores and `NPCTRIG` as strong satellites, `SFXTRIG` / environmental classes as moderate active-event fits, `JELYHACK` / `JELYH2` as the dedicated referent-anchor lane, and `SURCAM*` as structurally distinct callback/attachment holders.
- That ranking is anchored by the current owner-loader evidence as well as the descriptor grammar: `000d:44df -> 000d:4c99 -> 000d:7000` supplies the slot-backed source, and raw seg070 windows `0009:67b6` / `0009:6916` now show the embedded helper walking object `+0x10/+0x18` tables, formatting per-entry paths, and open/read/close-loading files before the `0x0d`-stride owner records are materialized.
- The next focused pass tightened the `_BOOT` lane too. `boot_family_compare.tsv` now shows that all five `_BOOT` event cores (`AND_BOOT`, `BRO_BOOT`, `COR_BOOT`, `VAR_BOOT`, `REE_BOOT`) share the same header skeleton and the same compact field shape (`referent,event,counter,item`). The meaningful differences are payload size and local neighborhood, not descriptor schema.
- The new `boot_frontier_graph.md` makes the best early `_BOOT` frontier explicit: `AND_BOOT` and `BRO_BOOT` sit in one compact referent-heavy neighborhood (`OFFWORK`, `GUARD`, `GDOOR_N`, `GDOOR_E`, `BIGCAN`, `CRUMORPH`, `GUARDSQ`, `CARD_NS`, `CARD_EW`, `EWALLEW`/`EWALLNS`) and also point directly at each other as adjacent event-bearing siblings. So the present best reading is a reusable boot-event core template instantiated in several different local object islands, not a set of unrelated boot scripts.
- The environmental hazard lane is now similarly constrained. `environmental_family_compare.tsv` shows that `FLAMEBOX` and `STEAMBOX` are close structural siblings with the same active-event backbone (`referent,event,<hazard>,<hazard2>,direction,count`) and matching `24:0A02 / 24:FC02 / 24:FE02` object-link pattern, while `NOSTRIL` is a smaller fire-specific variant that keeps the active `event` plus dual fire references and count fields but drops the direction/newType side.
@ -188,12 +191,18 @@ All three constructor variants (`000e:2777`, `000e:2860`, `000e:2969`) follow th
1. Call `FUN_000e_e935` (allocator — produces garbled 11KB decompile, not renamed)
2. Set fields `+0xb4` through `+0xc2` on the result
3. Call `000d:ebe3` (multi-step chain initializer: calls `177c`, `1acb`, `0988`, `22bc`, `1d4a`, `2104` in sequence)
3. Call `000d:ebe3` directly (confirmed CALL sites at `000e:283e`, `000e:2931`, `000e:29e4`; multi-step chain initializer: calls `177c`, `1acb`, `0988`, `22bc`, `1d4a`, `2104` in sequence)
4. Call `assert_alive_sentinel` (assertion: checks `+0xd4 != -1`)
5. Call `func_0x000eec83`
The chain at `000d:ebe3` steps through VM opcode handlers (`000d:177c`, `000d:1acb`, `000d:0988`) that operate on a bytecode VM object with stack pointer at `+0xcc` (decremented by 2 per push) and segment base at `+0xce`.
The constructor-side field setup before that sequencer is now slightly tighter too:
- variants A and B both set `+0xc0 = 1` before the direct `000d:ebe3` call and derive `+0xc2` from `DS:0x604e`
- variant C instead sets `+0xc0 = 0`, `+0xc2 = 1`, and `+0x4c = 0x000d` before the same sequencer call
- these direct xrefs make `000d:ebe3` a constructor-side animation sequencer rather than a globally xref-dark dispatcher, but they still do not expose any new wrapper-level opcode number beyond the internal `0x19/0x1a/0x1b` family already proven inside `000d:0988`
### Constructor variant renames
| Address | Name |

@ -0,0 +1,428 @@
# ScummVM Crusader Reference
## Purpose
This note catalogs the Crusader-specific code inside ScummVM's Ultima 8 engine so it can be used as a planning aid for Crusader reverse-engineering work.
Primary source tree: `K:\misc\scummvm\engines\ultima\ultima8`
Important limitation: this is a high-level reimplementation, not a symbol map for the original DOS binaries. It is most useful for:
- identifying original data files and container formats
- naming likely subsystem boundaries
- understanding USECODE VM and event structure
- spotting Remorse versus Regret divergences
- finding concrete file-format footholds for parsers and validators
It is not sufficient on its own for direct raw-function renaming.
## Highest-Value Findings
1. ScummVM keeps a Crusader-specific USECODE description layer with named event ids and large intrinsic signature tables.
Files: `usecode/uc_machine.cpp`, `usecode/usecode_flex.cpp`, `convert/crusader/convert_usecode_crusader.h`, `convert/crusader/convert_usecode_regret.h`, `usecode/remorse_intrinsics.h`, `usecode/regret_intrinsics.h`.
2. ScummVM has explicit parsers for the core Crusader container families used by gameplay assets: FLEX archives, raw archives, USECODE containers, shapes, sound archives, speech archives, save files, and movie subtitle files.
Files: `filesys/flex_file.cpp`, `filesys/archive.cpp`, `filesys/raw_archive.cpp`, `usecode/usecode_flex.cpp`, `audio/sound_flex.cpp`, `audio/speech_flex.cpp`, `filesys/savegame.cpp`, `gumps/movie_gump.cpp`.
3. Crusader-specific gameplay metadata is loaded centrally from a predictable file set.
File: `games/game_data.cpp`.
This is the best ScummVM-side inventory of original asset families to compare against current RE notes.
4. World and item loading diverge for Crusader in a few concrete ways that likely reflect real original-engine differences.
Files: `world/map.cpp`, `world/current_map.cpp`, `world/item_factory.cpp`, `gfx/shape_info.cpp`, `world/weapon_info.h`, `world/world.cpp`, `world/egg.cpp`.
5. Crusader UI, media, and player-control code is separated into clear game-specific files.
Files: `gumps/cru_*.cpp`, `world/actors/cru_avatar_mover_process.cpp`, `audio/cru_music_process.cpp`, `games/start_crusader_process.cpp`, `games/cru_game.cpp`.
## Detection, Boot, and Game Split
### `metaengine.cpp`
- ScummVM treats Ultima 8 and Crusader as one engine family but gives Crusader its own control map.
- The Crusader keymap is a useful external reference for action vocabulary: weapon cycling, inventory cycling, medikit, energy cube, bomb detonation, search/select item, use selection, grab item, attack, center camera on player, jump/roll/crouch, sidesteps, rolls, and crouch toggle.
- `querySaveMetaInfos()` uses `SavegameReader`, which is the entry point for ScummVM-side Crusader save metadata.
### `ultima8.cpp`
- Engine startup registers Crusader-specific process loaders such as `CruAvatarMoverProcess`, `CruPathfinderProcess`, and `CruMusicProcess`.
- `initializePath()` explicitly adds a `data` subdirectory for at least one Regret variant.
### `games/cru_game.cpp`
- `loadFiles()` loads Crusader palettes from `static/gamepal.pal`, `cred.pal`, `diff.pal`, `misc.pal`, `misc2.pal`, and optionally `star.pal`.
- `loadFiles()` then calls `GameData::loadRemorseData()`, which is the central Crusader asset-loader in ScummVM.
- `startGame()` creates the main actor with shape `1`, reserves object ids `384..511`, initializes HP and energy-like stats from `NPCDat`, and switches to map `0`.
- `playIntroMovie()` uses `T01` and `T02` for Remorse, `origin` and `ANIM01` for Regret, and warns that `FLICS` and `SOUND` directories must be copied from the CD.
### `games/start_crusader_process.cpp`
- Startup sequence is explicit: intro movie 1, intro movie 2, difficulty menu, then live game setup.
- ScummVM creates the Crusader HUD gumps (`CruStatusGump`, `CruPickupAreaGump`) before normal play begins.
- It seeds inventory with shape `0x4d4` (`datalink`) and `0x598` (`smiley`), sets shield type, teleports the actor through map `1`, egg `0x1e`, and applies a Regret-specific combat-ready start state.
- This file is a good checklist for early-game object ids, item shapes, and startup-only side effects.
## Core Asset Loading
### `games/game_data.cpp`
`GameData::loadRemorseData()` is the single best source-file summary of original Crusader asset families known to ScummVM.
Loaded files and why they matter:
- `static/fixed.dat`: fixed-object archive for world/map loading.
- `usecode/<lang>usecode.flx`: main USECODE container.
- `static/shapes.flx`: main shape archive, loaded with Crusader-specific shape format.
- `remorseweapons.ini` or `regretweapons.ini`: ScummVM-maintained weapon metadata overlays.
- `remorsegame.ini`: ScummVM-maintained game config overlay.
- `static/typeflag.dat`: per-shape type flags.
- `static/anim.dat`: animation metadata.
- `static/wpnovlay.dat`: weapon overlay metadata.
- `static/glob.flx`: glob data loaded into `MapGlob` objects.
- `static/fonts.flx`: font archive.
- `static/mouse.shp`: cursor shapes.
- `static/gumps.flx`: UI art.
- `static/dtable.flx`: NPC data table (`NPCDat`).
- `static/damage.flx`: damage data consumed by main shape logic.
- `sound/sound.flx`: sound archive.
- `sound/<lang><shape>.flx`: speech per shape, loaded lazily by `getSpeechFlex()`.
Implication for RE:
- This gives a concrete file-driven decomposition of the engine: world placement, usecode, shape/type metadata, overlay metadata, NPC tables, damage rules, UI art, sound, and speech are all separated.
- `dtable.flx`, `damage.flx`, `glob.flx`, and `wpnovlay.dat` should be treated as high-value parser targets if they are not already covered in local tooling.
## Container and File-Format Evidence
### `filesys/flex_file.cpp`
- FLEX detection looks for a padded header region filled with `0x1A`.
- Metadata reader uses:
- table offset `0x80`
- entry count at file offset `0x54`
- 8-byte table entries of `<offset, size>`
- ScummVM rejects counts above `4095` and notes that the largest observed Crusader/U8 FLEX has `3074` entries.
Implication for RE:
- This strongly matches the currently validated EUSECODE/FLEX structure already recovered locally.
- It also gives a second independent implementation to compare against any local extractor edge cases.
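
For cross-checking a local extractor against the layout described above, a minimal table reader could look like this. It is a sketch, not ScummVM's code: the function name is hypothetical, the count and the `<offset, size>` fields are assumed to be little-endian 32-bit values, and the `0x1A` padding check is omitted because the notes do not give its exact extent.

```python
import struct
from typing import List, Tuple

FLEX_COUNT_OFFSET = 0x54
FLEX_TABLE_OFFSET = 0x80
FLEX_ENTRY_SIZE = 8
FLEX_MAX_ENTRIES = 4095  # ScummVM's rejection threshold per the notes


def read_flex_table(data: bytes) -> List[Tuple[int, int]]:
    """Return the (offset, size) pairs from a FLEX entry table."""
    if len(data) < FLEX_TABLE_OFFSET:
        raise ValueError("file too small for a FLEX header")
    # Assumption: entry count is a little-endian u32.
    (count,) = struct.unpack_from("<I", data, FLEX_COUNT_OFFSET)
    if count > FLEX_MAX_ENTRIES:
        raise ValueError(f"implausible entry count: {count}")
    if FLEX_TABLE_OFFSET + FLEX_ENTRY_SIZE * count > len(data):
        raise ValueError("entry table runs past end of file")
    return [
        struct.unpack_from("<II", data, FLEX_TABLE_OFFSET + FLEX_ENTRY_SIZE * i)
        for i in range(count)
    ]
```

Entries with a zero offset and size are returned as-is, so a caller can count the non-zero entries separately and compare against the local extractor's totals.
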
### `filesys/archive.cpp` and `filesys/raw_archive.cpp`
- `Archive` layers multiple `FlexFile` sources and resolves objects from newest source to oldest source.
- `RawArchive` caches raw object bytes and exposes them as memory streams.
Implication for RE:
- If any Crusader resources use overlay-style replacement behavior, ScummVM already models that archive precedence.
- This is worth checking before assuming a single-file source of truth for a given object id.
### `usecode/usecode_flex.cpp`
- USECODE classes are addressed as `classid + 2` inside the archive.
- Class names are read from object `1` at `name_object + 4 + 13 * classid`.
- For Crusader, class base offset is read from bytes `8..11` of the class object and decremented by `1`.
- Crusader event count is computed as `(get_class_base_offset(classid) + 19) / 6`.
Implication for RE:
- This is directly relevant to current USECODE work. It provides ScummVM's concrete interpretation of the Crusader class header layout and event-table sizing.
- If local EUSECODE or USECODE parsing still has uncertainties around header size, entry table layout, or event count, this file is the first external cross-check to apply.
## USECODE VM, Events, and Intrinsics
### `usecode/uc_machine.cpp`
- Crusader uses a `ByteSet(0x1000)` global-state store, unlike the U8 `BitSet` path.
- Remorse initializes global `0x003c` to avatar number `1`; Regret initializes global `0x001e`.
- The VM selects `ConvertUsecodeCrusader` for Remorse and `ConvertUsecodeRegret` for Regret.
Implication for RE:
- This is concrete evidence that the Crusader VM/global model diverges from U8 enough that it should not be treated as a drop-in match.
- The initialized global slots are worth comparing against already-known runtime globals in the raw executable.
### `convert/crusader/convert_usecode_crusader.h`
- ScummVM ships a named Crusader event table for event ids `0x00..0x1f`.
- Named events include `look`, `use`, `anim`, `setActivity`, `cachein`, `hit`, `gotHit`, `hatch`, `schedule`, `release`, `equip`, `unequip`, `combine`, `calledFromAnim`, `enterFastArea`, `leaveFastArea`, `avatarStoleSomething`, `animGetHit`, and `unhatch`.
- The same file also includes a large 512-entry intrinsic signature table with many behavior comments extracted from prior Pentagram reverse-engineering.
### `convert/crusader/convert_usecode_regret.h`
- Regret reuses the Crusader event-name table but has a different intrinsic numbering/signature map.
### `usecode/remorse_intrinsics.h` and `usecode/regret_intrinsics.h`
- These provide the live intrinsic dispatch tables used by the engine.
- High-value entries for current RE include weapon firing, status/quality accessors, object creation/destruction, camera moves, palette fades, movie playback, teleport-to-egg, keycard clearing, damage reception, and Crusader-specific audio calls.
High-value USECODE bridge examples from ScummVM's tables:
- `Item::I_fireWeapon`
- `AudioProcess::I_playSFXCru`
- `AudioProcess::I_playAmbientSFXCru`
- `StatusGump::I_hideStatusGump` / `I_showStatusGump`
- `MovieGump::I_playMovieOverlay`
- `World::I_setControlledNPCNum`
- `MainActor::I_clrKeycards`
- `PaletteFaderProcess` fade/jump helpers
- `Egg::I_getEggId`, `I_getEggXRange`, `I_setEggXRange`
Implication for RE:
- These files are an immediate planning aid for USECODE annotation. Even where names are approximate, they constrain argument counts, broad behavior, and event purpose.
- `convert_usecode_crusader.h` is especially valuable because it records many comments of the form "based on disasm" or "same coff as", which likely came from earlier source-level Crusader RE.
## Shapes, Type Flags, Weapons, and Item Families
### `convert/crusader/convert_shape_crusader.cpp`
- ScummVM declares two Crusader-specific shape layouts: `CrusaderShapeFormat` and `Crusader2DShapeFormat`.
- The main 3D-ish shape format uses:
- 6-byte header
- 8-byte frame header
- 28-byte secondary frame header
- explicit width/height/xoff/yoff fields
- The 2D shape format uses a 20-byte secondary frame header.
Implication for RE:
- This is the quickest external reference for main-world versus UI/mouse/gump shape decoding.
### `gfx/shape_info.cpp`
- Crusader type flags are decoded with a different bit layout than U8.
- ScummVM treats Crusader type-flag space as extending to at least bit `71`, with several still-marked unknown ranges.
Implication for RE:
- Any local typeflag decoder should treat Crusader as its own layout, not as the U8 layout with extra cases.
### `world/weapon_info.h`
- Crusader-specific weapon fields include `_sound`, `_reloadSound`, `_ammoType`, `_ammoShape`, `_displayGumpShape`, `_displayGumpFrame`, `_small`, `_clipSize`, `_energyUse`, `_field8`, and `_shotDelay`.
Implication for RE:
- This header is a good target schema for interpreting weapon-related tables and shape metadata in the original data.
- `_field8` is still uncertain in ScummVM, which is a useful warning not to over-claim its meaning in the raw game.
### `world/item_factory.cpp`
- Crusader item families include `SF_CRUWEAPON`, `SF_CRUAMMO`, `SF_CRUBOMB`, and `SF_CRUINVITEM`.
- Item construction applies Crusader-only defaults:
- damage points from shape damage info
- weapon clip size copied into initial quality
- ammo and bomb quality initialized to `1`
Implication for RE:
- This ties together shape family, shape damage info, weapon tables, and runtime item state.
- The quality field is confirmed as overloaded for ammo/clip counts and inventory stack-like quantities.
## World, Maps, Eggs, and Cache-In Behavior
### `world/map.cpp`
- Fixed and nonfixed map objects are read as 16-byte records.
- ScummVM reads each record as:
- `x` = uint16
- `y` = uint16
- `z` = uint8
- `shape` = uint16
- `frame` = uint8
- `flags` = uint16
- `quality` = uint16
- `npcNum` = uint8
- `mapNum` = uint8
- `next` = uint16
- It then applies `World_FromUsecodeXY(x, y)` before constructing items.
- Container nesting is not read from a separate structure: the on-disk `x` field is temporarily treated as container depth while reading hierarchical contents.
Implication for RE:
- This is one of the most concrete format descriptions in the ScummVM codebase.
- It is directly useful for validating fixed/nonfixed parsers and for checking whether any currently unnamed raw loader functions correspond to this record layout.
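
The record layout listed above packs into exactly 16 bytes, which a validator can express as a single `struct` format. This is a sketch with hypothetical names; it assumes little-endian fields in the listed order and deliberately leaves out the `World_FromUsecodeXY` coordinate transform and the container-depth reuse of `x`.

```python
import struct
from typing import Iterator, NamedTuple

# x, y, z, shape, frame, flags, quality, npcNum, mapNum, next
MAP_RECORD = struct.Struct("<HHBHBHHBBH")
assert MAP_RECORD.size == 16  # matches the documented record size


class MapRecord(NamedTuple):
    x: int
    y: int
    z: int
    shape: int
    frame: int
    flags: int
    quality: int
    npc_num: int
    map_num: int
    next_id: int  # the on-disk `next` field, renamed to avoid the builtin


def iter_map_records(data: bytes) -> Iterator[MapRecord]:
    """Yield raw records; any trailing partial record is ignored."""
    usable = len(data) - len(data) % MAP_RECORD.size
    for off in range(0, usable, MAP_RECORD.size):
        yield MapRecord(*MAP_RECORD.unpack_from(data, off))
```

Keeping the records raw at this stage makes it easy to diff them against a local fixed/nonfixed parser before any coordinate conversion is applied.
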
### `world/current_map.cpp`
- Crusader uses `_mapChunkSize = 1024`; U8 uses `512`.
- When loading a map, ScummVM always calls cache-in events in Crusader (`callCacheIn = (_currentMap != nullptr || GAME_IS_CRUSADER)`).
- It also explicitly calls actor cache-in events for Crusader after actor scheduling.
Implication for RE:
- Cache-in behavior appears more aggressive or more semantically important in Crusader than in U8.
- This may help explain some map-enter or object-activation behavior currently attributed to general dispatch code.
### `world/egg.cpp`
- Crusader supports `unhatch()` as a real egg event path; U8 does not.
- Eggs store a `_hatched` state and expose `get/set egg x/y range` plus `get/set egg id` intrinsics.
Implication for RE:
- `unhatch` is a strong clue for interpreting Crusader trigger/reset behavior.
### `world/world.cpp`
- Crusader save/load stores extra world fields beyond the shared baseline:
- alert active
- difficulty
- controlled NPC number
- Vargas shield value
- `setAlertActiveRemorse()` and `setAlertActiveRegret()` search for concrete shape ids and mutate frames/shapes to update world-state visuals.
- `setGameDifficulty()` contains a Remorse-specific BA-40 ammo patch that modifies weapon metadata at runtime.
Implication for RE:
- Alert-state and difficulty are not just UI globals; ScummVM models them as world-affecting state with concrete shape mutations.
## UI, Interaction, and Player-Control Code
### `gumps/cru_status_gump.cpp`
- Crusader HUD is composed from five child gumps: weapon, ammo, inventory, health, and energy.
### `gumps/cru_weapon_gump.cpp`, `cru_ammo_gump.cpp`, `cru_inventory_gump.cpp`
- HUD display is driven by weapon metadata fields such as `_displayGumpShape`, `_displayGumpFrame`, `_ammoShape`, and live `quality` values.
- `CruAmmoGump` confirms that the displayed bullet count is the current weapon's `quality` value and that reserve clips are counted from the first inventory item matching `ammoShape`.
- `CruInventoryGump` renders the active inventory item through the weapon-info display fields and shows quantity when `quality > 1`.
Implication for RE:
- These files are a good external model for active-weapon, ammo-reserve, and active-inventory state fields.
### `gumps/game_map_gump.cpp`
- Double-click `use` range is `512` in Crusader versus `128` in the shared path.
### `world/actors/cru_avatar_mover_process.cpp`
- Crusader movement logic is explicitly different from U8 and models combat movement, one-shot moves, short jump, crouch, sidesteps, rolls, rebel-base special cases, and combat-angle smoothing.
Implication for RE:
- This file is a practical behavioral checklist when classifying input/combat locomotion code in the raw executable.
## Audio, Speech, and Movies
### `audio/sound_flex.cpp`
- Crusader `sound.flx` differs from U8:
- object `0` contains an index whose entries start with a leading `0x00` or `0xFF`, then 3 bytes of extra data, then a null-terminated sound name
- `ASFX` entries are interpreted as 32-byte header plus raw 11025 Hz sample data
- Non-`ASFX` entries fall back to Sonarc decoding.
Implication for RE:
- This is one of the strongest container-format anchors in the ScummVM codebase.
- If local tooling still treats Crusader audio as opaque FLEX payloads, this file should drive the next parser pass.
### `audio/speech_flex.cpp`
- Speech FLEX object `0` is parsed as a sequence of null-terminated phrases.
- Playback lookup is phrase-prefix based: ScummVM normalizes text and searches phrase table entries to map text to sound samples.
Implication for RE:
- Speech archives are not just sample banks; they embed text phrase indices.
- This can help tie dialog strings back to per-shape voice resources.
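
As a rough model of the phrase table described above, object `0` can be split on NUL bytes and searched by prefix. This is only a sketch: the function names, the Latin-1 decoding, the whitespace/case normalization, and the direction of the prefix test (spoken text starts with a table phrase) are all assumptions, not confirmed ScummVM behavior.

```python
from typing import List, Optional


def parse_phrases(obj0: bytes) -> List[str]:
    """Split object 0 into NUL-terminated phrases, preserving order."""
    parts = obj0.split(b"\x00")
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the final terminator
    return [p.decode("latin-1") for p in parts]


def _normalize(text: str) -> str:
    # Hypothetical normalization: collapse whitespace, ignore case.
    return " ".join(text.split()).lower()


def find_phrase_index(phrases: List[str], text: str) -> Optional[int]:
    """Return the index of the first phrase that prefixes `text`."""
    norm = _normalize(text)
    for i, phrase in enumerate(phrases):
        key = _normalize(phrase)
        if key and norm.startswith(key):
            return i
    return None
```

Interior empty entries are kept so that phrase indices stay aligned with whatever per-phrase sample numbering the archive uses.
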
### `audio/cru_music_process.cpp`
- Remorse and Regret have separate track name tables.
- Regret track `0x45` means "use the current map's default track" via a hardcoded map-to-track table.
- Remorse track `16` cycles through `M16A`, `M16B`, and `M16C`.
- Music is loaded from `sound/<track>.amf`.
Implication for RE:
- This is useful for identifying music-selection logic and map-to-music linkage in the original executable.
### `gumps/movie_gump.cpp`
- Crusader movie playback uses AVI files under `flics/`.
- Subtitle loading accepts either `.txt` or `.iff` sidecar files.
- ScummVM normalizes certain movie names because USECODE references `mva1`, `mva3a`, `mva5a`, etc., while files on disk may be `mva01`, `mva03a`, `mva05a`.
Implication for RE:
- This is a concrete example of ScummVM compensating for original asset-name/usecode mismatches.
- The subtitle `.iff` fallback is a useful clue for unexplained IFF-like resources.
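
The name mismatch described above can be modeled as zero-padding a single-digit movie number. The sketch below is a hypothetical generalization from the three examples in this note (`mva1`, `mva3a`, `mva5a`); it is not ScummVM's actual rule, which may special-case individual names.

```python
import re

# "mva" + exactly one digit + optional single-letter suffix.
_SINGLE_DIGIT_MVA = re.compile(r"^(mva)(\d)([a-z]?)$", re.IGNORECASE)


def normalize_movie_name(name: str) -> str:
    """Zero-pad single-digit `mvaN` names; leave everything else alone."""
    m = _SINGLE_DIGIT_MVA.match(name)
    if not m:
        return name
    prefix, digit, suffix = m.groups()
    return f"{prefix}0{digit}{suffix}"
```

Names that are already padded, such as `mva01`, and unrelated names, such as `T01`, pass through unchanged, so the function is safe to apply before every file lookup.
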
## Save/Load Format
### `filesys/savegame.cpp`
- ScummVM supports two save formats:
- native `VMU8` saves with versioned file-entry archive payloads
- older Pentagram zip-based saves
- Native saves use a 12-byte file name field and per-entry size/data blocks.
Implication for RE:
- This is mostly relevant to ScummVM compatibility, not original DOS save format recovery.
- It still matters because ScummVM serializes engine state explicitly enough to reveal which runtime fields it considers necessary for Crusader continuity.
## Best Files For Immediate RE Follow-Up
If time is limited, the most valuable ScummVM files to mine first are:
1. `games/game_data.cpp`
Why: best single inventory of Crusader data files and subsystems.
2. `usecode/usecode_flex.cpp`
Why: concrete Crusader USECODE class header and event-count interpretation.
3. `convert/crusader/convert_usecode_crusader.h`
Why: named event ids plus a large intrinsic-signature table with comments.
4. `audio/sound_flex.cpp`
Why: concrete Crusader sound archive interpretation.
5. `world/map.cpp`
Why: concrete fixed/nonfixed map record layout and container nesting behavior.
6. `world/weapon_info.h` and `world/item_factory.cpp`
Why: practical schema for weapon/ammo/inventory metadata.
7. `gumps/movie_gump.cpp`
Why: movie filename normalization and subtitle sidecar handling.
8. `world/current_map.cpp` and `world/world.cpp`
Why: Crusader-only cache-in, alert-state, difficulty, and map chunk differences.
## Suggested RE Uses In This Repo
### USECODE parsing
- Compare local USECODE/EUSECODE container assumptions against `usecode/usecode_flex.cpp`.
- Import ScummVM's event-name table as a conservative annotation source for event ids `0x00..0x1f`.
- Use `convert_usecode_crusader.h` and `remorse_intrinsics.h` as a cross-check for intrinsic numbering, argument counts, and broad semantics.
- Compare Remorse versus Regret intrinsic numbering before assuming one numbering scheme is universal.
### Data-format work
- Validate local FLEX readers against `filesys/flex_file.cpp`.
- Prioritize parsers for `dtable.flx`, `damage.flx`, `glob.flx`, and `wpnovlay.dat` because ScummVM treats them as core runtime inputs.
- Split shape decoding between Crusader main shapes and 2D/gump shapes using `convert_shape_crusader.cpp`.
- Treat `sound.flx` and speech FLEX files as structured formats, not opaque blob stores.
### Raw executable classification
- Use ScummVM's subsystem boundaries to guide search targets for:
- cache-in and unhatch event paths
- alert-state world mutations
- map chunking and area search behavior
- weapon clip/ammo/energy metadata consumers
- movie name normalization and subtitle loading
- Regret map-to-track music selection
## Conservative Takeaways
- ScummVM does not directly solve raw-symbol naming, but it materially sharpens the planning surface for Crusader RE.
- The most actionable ScummVM contributions are format schemas, event/intrinsic vocabularies, and subsystem boundaries.
- For current repo priorities, the strongest leverage is on USECODE parsing, data-file parser expansion, and validation of world/object metadata structures.

@ -0,0 +1,272 @@
# USECODE Round-Trip IR Plan
## Purpose
This note records the current evidence-backed path from Crusader USECODE bytes to a human-readable, editable, and recompilable script form.
It is intentionally conservative. ScummVM gives strong external anchors for the container layout, class/event numbering, and intrinsic naming, but it is not a symbol map for the DOS binary and it is not a ready-made round-trip compiler.
## Externally Anchored Pieces
### Container and class layout
ScummVM now gives a concrete second implementation for the Crusader USECODE class layout:
- `usecode/usecode_flex.cpp` treats each class body as archive object `classid + 2`.
- Class names come from archive object `1` at `name_object + 4 + 13 * classid`.
- For Crusader, the class base offset is read from class bytes `8..11` and then decremented by `1`.
- Crusader event count is computed as `(base_offset + 19) / 6`.
- `usecode/usecode.cpp` resolves event `N` from class data at `20 + 6 * N`, with the code offset stored in bytes `+2..+5` of each 6-byte event record.
Combined with the already validated FLEX container notes, the current externally anchored container model is:
- FLEX entry count at `0x54`
- FLEX table at `0x80`
- USECODE class object index = `classid + 2`
- Crusader class header contains a four-byte base-offset field at bytes `8..11`
- Crusader event table entries are 6 bytes each, with a known dword code offset and a still-unknown leading word
ScummVM also makes one implementation choice explicit that matters for the current mismatch: `uc_machine.cpp` uses `get_class_base_offset()` as the execution-stream base for Crusader class code, not only as metadata for event counting. That means the `obj[8..11] - 1` value is part of the live code-addressing model in ScummVM, not just a comment-level interpretation.
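The ScummVM-side lookups described above can be sketched as follows. This is a minimal illustration of the arithmetic as recorded in this note, not a copy of ScummVM code; the function names are hypothetical, and little-endian field encoding is assumed (as is usual for DOS-era data).

```python
import struct

CLASS_EVENT_TABLE_START = 20  # event records begin at class byte 20
EVENT_RECORD_SIZE = 6         # u16 leading word + u32 code offset


def class_object_index(class_id: int) -> int:
    """Archive object index holding the class body (classid + 2)."""
    return class_id + 2


def class_base_offset(class_body: bytes) -> int:
    """Dword at class bytes 8..11, decremented by 1 (assumed little-endian)."""
    (raw,) = struct.unpack_from("<I", class_body, 8)
    return raw - 1


def event_record(class_body: bytes, event: int) -> tuple:
    """Return (leading_word, code_offset) for event N, read at 20 + 6 * N."""
    pos = CLASS_EVENT_TABLE_START + EVENT_RECORD_SIZE * event
    return struct.unpack_from("<HI", class_body, pos)
```

The leading word is returned untouched because its meaning is still unknown; callers should store it rather than interpret it.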
### Binary-side validation against owner-loaded classes
The first direct local validation pass against sampled owner-loaded EUSECODE class records now splits the ScummVM model into two parts: one part is confirmed, and one part still needs reconciliation.
Confirmed on sampled records (`EVENT`, `NPCTRIG`, `SURCAMNS`, `JELYHACK`, `REE_BOOT`, `SURCAMEW`, `SFXTRIG`):
- The extracted chunk at table offset `0x88` behaves like object `1` for class names.
- For each sampled class body, deriving `object_index = (table_offset - 0x80) / 8`, then `class_id = object_index - 2`, and then reading 13 bytes from object `1` at `4 + 13 * class_id` yields the expected class name.
- The class bodies do have a stable 4-byte header field at bytes `8..11`.
- The region at `class + 20` is a real 6-byte event-slot table with `u16 unknown_word + u32 code_or_payload_field` layout.
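The name-lookup chain used for that validation can be sketched as below. The helper names are hypothetical, and two details are assumptions rather than findings from this note: that each 13-byte name slot is NUL-padded, and that the names decode as ASCII.

```python
FLEX_TABLE_START = 0x80   # start of the FLEX entry table
FLEX_ENTRY_SIZE = 8       # bytes per table entry, per the derivation above
NAME_TABLE_START = 4      # first name slot inside object 1
NAME_SLOT_SIZE = 13       # bytes per class-name slot


def object_index_from_table_offset(table_offset: int) -> int:
    """Map a FLEX table entry offset to its archive object index."""
    rel = table_offset - FLEX_TABLE_START
    if rel < 0 or rel % FLEX_ENTRY_SIZE != 0:
        raise ValueError(f"misaligned table offset 0x{table_offset:x}")
    return rel // FLEX_ENTRY_SIZE


def class_id_from_table_offset(table_offset: int) -> int:
    """Class bodies live at object index classid + 2."""
    return object_index_from_table_offset(table_offset) - 2


def class_name(name_object: bytes, class_id: int) -> str:
    """Read the 13-byte slot at 4 + 13 * class_id from object 1."""
    start = NAME_TABLE_START + NAME_SLOT_SIZE * class_id
    slot = name_object[start:start + NAME_SLOT_SIZE]
    if class_id < 0 or len(slot) < NAME_SLOT_SIZE:
        raise ValueError(f"class id {class_id} is outside the name table")
    # Assumption: slots are NUL-padded ASCII.
    return slot.split(b"\0", 1)[0].decode("ascii", errors="replace")
```

With these definitions, table offset `0x88` maps to object index `1`, which is the name object itself.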
Broader family spot-checks now keep the same local structure on the owner-loaded side. In addition to the first validated set, the nearby `_BOOT` and environmental event families (`AND_BOOT`, `BRO_BOOT`, `COR_BOOT`, `VAR_BOOT`, `FLAMEBOX`, `NOSTRIL`, `STEAMBOX`) continue to fit the same `table_offset -> object_index -> class_id` progression with a stable bytes-`8..11` dword and a 6-byte table at `+20`. No contradictory sample has appeared in the local EUSECODE set.
Not yet reconciled with ScummVM's current formula note:
- In the sampled owner-loaded records, the raw dword at bytes `8..11` is `0x00d4`, `0x00da`, or `0x00e6`.
- Treating that dword directly as the first post-event-table offset makes the layout line up cleanly: `(dword_at_8 - 20) / 6` gives 32, 33, or 35 valid slots in the samples.
- Scanning instead with the previously noted ScummVM-style `(base_offset + 19) / 6` interpretation overruns into inline payload and class-name bytes in the same samples.
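The two readings can be compared directly on the three sampled dword values. The sketch below assumes integer division in both formulas and takes `(base_offset + 19) / 6` as recorded in this note, not as re-verified against ScummVM source.

```python
SAMPLED_RAW_DWORDS = (0x00D4, 0x00DA, 0x00E6)  # bytes 8..11 in the sampled records


def slots_local(raw_dword: int) -> int:
    """Slot count if the raw dword is the first post-event-table offset."""
    return (raw_dword - 20) // 6


def slots_scummvm_style(raw_dword: int) -> int:
    """Slot count under the noted (base_offset + 19) / 6 reading."""
    base_offset = raw_dword - 1  # bytes 8..11 decremented by 1
    return (base_offset + 19) // 6


comparison = [
    (raw, slots_local(raw), slots_scummvm_style(raw)) for raw in SAMPLED_RAW_DWORDS
]
for raw, local, scummvm in comparison:
    print(f"raw=0x{raw:04x} local={local} scummvm_style={scummvm}")
```

The local reading yields 32, 33, and 35 slots, while the ScummVM-style reading yields 38, 39, and 41, so it scans six slots (36 bytes) past the end of the table in each sample.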
Current best explanation:
- The mismatch is now best explained as a ScummVM interpretation/detail issue, not as a proven loader-side rewrite.
- The same ScummVM code path that decrements bytes `8..11` by `1` also uses that decremented value as the code-stream base. On the local owner-loaded records, this fits naturally if the raw dword is the first code-byte offset and event-table dword offsets are 1-based relative to `code_base_minus_one`.
- Under that reading, the sampled event-count rule becomes `(code_base_minus_one - 19) / 6`, which is exactly equivalent to `(raw_u32_at_8_11 - 20) / 6` and matches the validated `32/33/35` slot counts.
- The `000d` loader/runtime path (`000d:44df -> 000d:4c99 -> 000d:7000 -> 000d:46ec`) currently shows indexed file loading and slot-table materialization, but no verified per-class header rewrite before the VM consumes owner-backed records.
Current safe conclusion:
- The owner-loaded class records are compatible with `object 1` names, `classid + 2` body lookup, a header field at bytes `8..11`, and 6-byte event records at `+20`.
- The exact meaning of the bytes-`8..11` field is now narrower: on the local owner-loaded records it is best read as the first code-byte offset, with ScummVM's decremented `base_offset` acting as a `code_base_minus_one` anchor for 1-based event code offsets.
- The leading word of each 6-byte event entry remains unresolved.
### VM/runtime model
ScummVM also anchors several VM behaviors that line up with the current raw-binary work:
- `usecode/uc_machine.cpp` uses `ByteSet(0x1000)` for Crusader globals rather than the U8 bitset path.
- Remorse initializes global `0x003c` to avatar number `1`; Regret initializes `0x001e`.
- Opcode `0x11` is class/event dispatch in Crusader: the bytecode operand is an event number that is translated through `get_class_event()` before execution.
That makes the current local reading stronger: the `000d` runtime lane looks like a Crusader-specific object/event VM that should be interpreted against Crusader event ordinals, not against U8 assumptions.
### Event names
`convert/crusader/convert_usecode_crusader.h` gives a named event table for ids `0x00..0x1f`:
- Strongly usable names: `look`, `use`, `anim`, `setActivity`, `cachein`, `hit`, `gotHit`, `hatch`, `schedule`, `release`, `equip`, `unequip`, `combine`, `calledFromAnim`, `enterFastArea`, `leaveFastArea`, `cast`, `justMoved`, `avatarStoleSomething`, `animGetHit`, `unhatch`
- Weak placeholders remain for `0x0d` and `0x16..0x1f` (`func0D`, `func16`..`func1F`)
This is enough to annotate event ordinals safely, but not enough to rename raw binary handlers unless local behavior matches.
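A parser-side annotation table can be built from the list above. This sketch assumes the names are listed in id order with `func0D` filling the gap at `0x0d`; that ordering should be checked against `convert_usecode_crusader.h` before the table is relied on. The function names are hypothetical.

```python
def build_event_name_hints() -> dict:
    """Map event ids 0x00..0x1f to ScummVM-derived name hints."""
    names = [
        # 0x00..0x0c
        "look", "use", "anim", "setActivity", "cachein", "hit", "gotHit",
        "hatch", "schedule", "release", "equip", "unequip", "combine",
        # 0x0d: weak placeholder
        "func0D",
        # 0x0e..0x15
        "calledFromAnim", "enterFastArea", "leaveFastArea", "cast",
        "justMoved", "avatarStoleSomething", "animGetHit", "unhatch",
    ]
    # 0x16..0x1f: weak placeholders
    names += [f"func{i:02X}" for i in range(0x16, 0x20)]
    return dict(enumerate(names))


def is_placeholder(name: str) -> bool:
    """Placeholder names carry no semantic information beyond the ordinal."""
    return name.startswith("func")


EVENT_NAME_HINTS = build_event_name_hints()
```

Keeping placeholders in the same table lets reports print a label for every slot while still flagging which labels are only ordinals in disguise.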
### Intrinsic tables
ScummVM provides two distinct kinds of intrinsic evidence:
- `convert/crusader/convert_usecode_crusader.h` and `convert_usecode_regret.h` provide ordinal-to-signature/name tables used for readable conversion.
- `usecode/remorse_intrinsics.h` and `usecode/regret_intrinsics.h` provide the live runtime dispatch tables.
The safe reading is:
- Remorse and Regret share the Crusader event-name table.
- Remorse and Regret do not share a single intrinsic numbering/signature map.
- Intrinsic names are strong hints for arity and broad subsystem identity, but they are still not direct rename authority for the DOS binary.
## Safe Reuse Rules
### Safe to import now
- Event names as labels for event ids `0x00..0x1f` in parsers, reports, and note files.
- Intrinsic ordinal names as `name_hint` or `signature_hint` metadata when the ordinal and argument-byte pattern match.
- High-level subsystem labels such as palette fade, camera, movie, audio, item/actor accessors, and weapon fire when they match existing binary evidence.
- Slot numbers from sampled owner-loaded classes even when the event name is still only a hint.
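The intrinsic-hint rule (attach a name only when both the ordinal and the argument-byte count match, and never across games) might look like the sketch below. The table layout and function name are hypothetical, and the single entry is the illustrative `0x001e` case from this note's IR example, filed under `remorse` purely for demonstration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntrinsicHint:
    name_hint: str
    signature_hint: str
    arg_bytes: int


# Hypothetical per-game tables; Remorse and Regret are kept separate on purpose.
INTRINSIC_HINTS = {
    "remorse": {
        0x001E: IntrinsicHint(
            "Item::I_fireWeapon",
            "Item::I_fireWeapon(Item *, x, y, z, byte, int, byte)",
            0x10,
        ),
    },
    "regret": {},
}


def lookup_intrinsic_hint(
    game: str, ordinal: int, observed_arg_bytes: int
) -> Optional[IntrinsicHint]:
    """Return a hint only if ordinal and argument-byte count both match."""
    hint = INTRINSIC_HINTS.get(game, {}).get(ordinal)
    if hint is None or hint.arg_bytes != observed_arg_bytes:
        return None
    return hint
```

Returning `None` on an argument-byte mismatch keeps a wrong-game or wrong-version table from silently labeling an unrelated call.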
### Not safe to claim yet
- Direct raw-function renames based only on ScummVM event or intrinsic names.
- Remorse intrinsic numbering from Regret tables, or vice versa.
- Specific descriptor-family to slot-mask mappings that are not yet proven on the binary side.
- Meanings for the unknown leading word in the 6-byte Crusader event table entries.
- That the ScummVM `get_class_event_count()` formula applies unchanged to the sampled owner-loaded EUSECODE records.
## IR Requirements For Round-Tripping
The first script IR should preserve exact recompilation inputs before it tries to look pretty.
### Unit of decompilation
The IR should be organized as:
1. USECODE archive
2. class
3. event slot
4. instruction stream
That matches the externally anchored class/event layout and avoids baking in any still-unproven descriptor-to-runtime assumptions.
### Required top-level records
Each class record should preserve:
- `class_id`
- `class_object_index` (`classid + 2`)
- `name_slot_offset` (`4 + 13 * classid` within object `1`)
- `class_name`
- `raw_header_prefix`
- `raw_code_base_u32`
- `code_base_minus_one`
- `event_count`
- `raw_event_table_bytes`
Each event record should preserve:
- `event_id`
- `event_name_hint`
- `raw_event_entry_word`
- `code_offset`
- `raw_body_bytes`
- `decoded_ops`
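One possible in-memory form of these records is sketched below. The field names follow the lists above; computing the derived values as properties, instead of storing them, is a design assumption of this sketch, made so that derived fields cannot drift from the raw bytes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventRecord:
    event_id: int
    event_name_hint: Optional[str]   # annotation only, never authoritative
    raw_event_entry_word: int        # unknown leading word, kept verbatim
    code_offset: int
    raw_body_bytes: bytes
    decoded_ops: List[dict] = field(default_factory=list)


@dataclass
class ClassRecord:
    class_id: int
    class_name: str
    raw_header_prefix: bytes         # header bytes not yet decoded
    raw_code_base_u32: int           # dword at class bytes 8..11
    raw_event_table_bytes: bytes
    events: List[EventRecord] = field(default_factory=list)

    @property
    def class_object_index(self) -> int:
        return self.class_id + 2

    @property
    def name_slot_offset(self) -> int:
        return 4 + 13 * self.class_id

    @property
    def code_base_minus_one(self) -> int:
        return self.raw_code_base_u32 - 1

    @property
    def event_count(self) -> int:
        # Assumption: the table consists of whole 6-byte records.
        return len(self.raw_event_table_bytes) // 6
```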
## IR v0 Shape
The IR should separate authoritative fields from friendly hints.
```yaml
class:
  class_id: 0x00be
  class_name: EVENT
  class_object_index: 0x00c0
  raw_code_base_u32: 0x0138
  code_base_minus_one: 0x0137
  raw_header_prefix: <bytes>
  events:
    - event_id: 0x04
      event_name_hint: cachein
      raw_event_entry_word: 0x????
      code_offset: 0x00001234
      ops:
        - op: intrinsic_call
          intrinsic_ordinal: 0x001e
          name_hint: Item::I_fireWeapon
          signature_hint: Item::I_fireWeapon(Item *, x, y, z, byte, int, byte)
          arg_bytes: 0x10
        - op: vm_chain_mutation
          vm_ir: APPEND_UNIQUE_INDIRECT
          opcode_hint: 0x19
        - op: unknown_raw
          bytes: <exact original bytes>
```
### Why this shape
- `event_name_hint` is useful for humans but does not replace the event id.
- `name_hint` and `signature_hint` are useful for intrinsics but do not replace the ordinal.
- `unknown_raw` gives a lossless fallback for still-unmapped opcodes or operand forms.
- `raw_event_entry_word` keeps the compiler from losing bytes whose meaning is not yet settled.
## Operation Families Worth Lifting First
The current binary-side evidence supports lifting a small reversible operator set first:
- `intrinsic_call`
- `class_event_call`
- `append_unique_inline`
- `append_unique_indirect`
- `remove_matching_inline`
- `remove_matching_indirect`
- `materialize_or_forward_value`
- `prepend_inline_payload`
- `build_entity_link_matrix`
- `emit_or_pushback_result`
- `push_frame_word_literal`
- `compare_stream_dword_and_push_bool`
- `unknown_raw`
This is enough to represent the verified `000d:0988`, `000d:177c`, `000d:1acb`, `000d:208b`, `000d:21ed`, and `000d:22bc` families without pretending the whole VM is solved.
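One way to keep this operator set reversible before any encoder exists is to have every lifted op carry its exact source bytes. The sketch below assumes that design; the class and function names are hypothetical, and no real opcode encodings are implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class OpFamily(Enum):
    INTRINSIC_CALL = "intrinsic_call"
    CLASS_EVENT_CALL = "class_event_call"
    APPEND_UNIQUE_INLINE = "append_unique_inline"
    APPEND_UNIQUE_INDIRECT = "append_unique_indirect"
    REMOVE_MATCHING_INLINE = "remove_matching_inline"
    REMOVE_MATCHING_INDIRECT = "remove_matching_indirect"
    MATERIALIZE_OR_FORWARD_VALUE = "materialize_or_forward_value"
    PREPEND_INLINE_PAYLOAD = "prepend_inline_payload"
    BUILD_ENTITY_LINK_MATRIX = "build_entity_link_matrix"
    EMIT_OR_PUSHBACK_RESULT = "emit_or_pushback_result"
    PUSH_FRAME_WORD_LITERAL = "push_frame_word_literal"
    COMPARE_STREAM_DWORD_AND_PUSH_BOOL = "compare_stream_dword_and_push_bool"
    UNKNOWN_RAW = "unknown_raw"


@dataclass(frozen=True)
class LiftedOp:
    family: OpFamily
    raw: bytes  # exact original encoding, kept even for decoded ops


def reemit(ops: Iterable[LiftedOp]) -> bytes:
    """Rebuild the instruction stream by concatenating preserved bytes."""
    return b"".join(op.raw for op in ops)
```

Under this scheme an unedited event body re-emits byte-for-byte, and `UNKNOWN_RAW` differs from decoded ops only in having no structured fields attached.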
## Metadata That Must Survive Recompilation
The compiler side will need more than pretty script text. At minimum it must preserve:
- Original class ordering and sparse class ids
- Original class-name table slotting
- Raw class header bytes not yet semantically decoded
- Raw bytes `8..11` even when a derived `code_base_minus_one` is also stored
- Raw 6-byte event records, including the unknown leading word
- Exact event order within each class
- Exact code offsets or enough relocation data to rebuild them deterministically
- Intrinsic ordinals and argument-byte counts
- Width/sign information for immediates
- Inline versus indirect payload form
- String payload encoding and terminators
- Any unknown opcode byte sequences verbatim
If any of those are dropped, a source-level editor can still be readable, but it will stop being a trustworthy recompilation format.
## Practical Naming Policy
For near-term local RE and tooling:
- Use ScummVM event names as annotation labels for event slots.
- Store intrinsic names as hints attached to ordinals.
- Keep binary-facing renames driven by raw evidence, not by ScummVM alone.
- Treat `EVENT`, `_BOOT`, and `NPCTRIG` as the strongest current active-event families.
- Treat `JELYHACK` and `JELYH2` as referent-anchor classes, not standalone event records.
- Treat `SURCAMNS` and `SURCAMEW` as callback/eventTrigger holders, not proven active-event cores.
## Conservative Parser Rule To Adopt Now
For the current owner-loaded EUSECODE and round-trip IR work, the safest reversible rule is:
- Preserve the raw four-byte header field at bytes `8..11` as authoritative.
- Derive `code_base_minus_one = raw_u32_at_8_11 - 1` for code-addressing only.
- Derive `event_count = (raw_u32_at_8_11 - 20) / 6` only when `raw_u32_at_8_11 - 20` is non-negative and divisible by `6`, and the resulting table end stays within the class object size.
- Treat each event entry as `u16 raw_event_entry_word + u32 raw_code_offset` at `class + 20 + 6 * slot`.
- Treat the event code offset as raw/opaque unless and until the code-addressing interpretation is needed; when needed, interpret it relative to `code_base_minus_one` so that offset `1` lands on the first code byte.
- If the divisibility or bounds checks fail, keep the class opaque and preserve raw bytes rather than forcing a guessed event count.
- `tools/extract_eusecode_flx.py` now implements this rule directly for the current owner-loaded EUSECODE work and emits `class_layout_index.tsv` plus `class_event_index.tsv` so raw header/event rows can be consumed by later IR tooling without re-deriving the arithmetic from prose.
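
The rule above can be sketched in a few lines. This is an independent illustration, not the code in `tools/extract_eusecode_flx.py`; the function names and dictionary keys are hypothetical, and little-endian fields are assumed.

```python
import struct
from typing import Optional

HEADER_SIZE = 20        # event table starts at class + 20
EVENT_RECORD_SIZE = 6   # u16 raw_event_entry_word + u32 raw_code_offset


def parse_class_layout(obj: bytes) -> Optional[dict]:
    """Apply the conservative rule; return None to keep the class opaque."""
    if len(obj) < HEADER_SIZE:
        return None
    (raw_u32,) = struct.unpack_from("<I", obj, 8)
    table_bytes = raw_u32 - HEADER_SIZE
    if table_bytes < 0 or table_bytes % EVENT_RECORD_SIZE != 0:
        return None
    if raw_u32 > len(obj):  # table end must stay inside the class object
        return None
    events = []
    for slot in range(table_bytes // EVENT_RECORD_SIZE):
        word, code_offset = struct.unpack_from(
            "<HI", obj, HEADER_SIZE + EVENT_RECORD_SIZE * slot
        )
        events.append(
            {"slot": slot, "raw_event_entry_word": word, "raw_code_offset": code_offset}
        )
    return {
        "raw_u32_at_8_11": raw_u32,
        "code_base_minus_one": raw_u32 - 1,
        "event_count": len(events),
        "events": events,
    }


def code_position(layout: dict, raw_code_offset: int) -> int:
    """Resolve a 1-based event code offset against code_base_minus_one."""
    return layout["code_base_minus_one"] + raw_code_offset
```

For a class whose bytes `8..11` hold `0x00d4`, this yields 32 slots, and `code_position(layout, 1)` returns `0xd4`, the first code byte.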
## Remaining Binary-Side Gaps
The main blockers for a real round-trip compiler are still on the binary side:
- The meaning of the first two bytes in each 6-byte Crusader event record is still unverified.
- The exact provenance of ScummVM's current `get_class_event_count()` arithmetic is still unverified; current local evidence says the owner-loaded/raw records fit `raw_u32_at_8_11 = first_code_byte_offset`, while the ScummVM count formula as noted here adds `19` to the decremented base where the local layout requires subtracting `19`.
- The upstream writer for selector local `[BP-0x32]` in the `000d:ebe3` sequencer is still unresolved.
- The full control-flow opcode set and branch encoding are not yet recovered.
- The exact on-disk source format behind `entity_vm_runtime_owner_resource_create` is still not identified.
- No direct descriptor-family to slot-mask mapping is proven yet.
- Callback/eventTrigger descriptors still do not have a callback-specific opcode family.
## Best Current Path
The strongest present path to a usable compiler/decompiler is:
1. Parse the archive, class-name table, and class/event containers following ScummVM's layout model.
2. Keep the class/object indexing and event-entry layout from ScummVM, but use the conservative local event-count rule above for owner-loaded/raw class parsing until a main USECODE sample proves otherwise.
3. Decompile only the proven operator families into structured IR.
4. Preserve unknown bytes verbatim in place.
5. Attach ScummVM event and intrinsic names as hints, not as truth.
6. Recompile by rebuilding the original class header and event table layout first, then re-emitting decoded and opaque ops together.
That gets to a reversible editor sooner than waiting for a full semantic VM recovery.