USECODE Round-Trip IR Plan
Purpose
This note records the current evidence-backed path from Crusader USECODE bytes to a human-readable, editable, and recompilable script form.
It is intentionally conservative. ScummVM gives strong external anchors for the container layout, class/event numbering, and intrinsic naming, but it is not a symbol map for the DOS binary and it is not a ready-made round-trip compiler.
Externally Anchored Pieces
Container and class layout
ScummVM now gives a concrete second implementation for the Crusader USECODE class layout:
- usecode/usecode_flex.cpp treats each class body as archive object classid + 2.
- Class names come from archive object 1 at name_object + 4 + 13 * classid.
- For Crusader, the class base offset is read from class bytes 8..11 and then decremented by 1.
- Crusader event count is computed as (base_offset + 19) / 6.
- usecode/usecode.cpp resolves event N from class data at 20 + 6 * N, with the code offset stored in bytes +2..+5 of each 6-byte event record.
Combined with the already validated FLEX container notes, the current externally anchored container model is:
- FLEX entry count at 0x54
- FLEX table at 0x80
- USECODE class object index = classid + 2
- Crusader class header contains a four-byte base-offset field at bytes 8..11
- Crusader event table entries are 6 bytes each, with a known dword code offset and a still-unknown leading word
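The container model above can be sketched as a small reader. This is a minimal illustration, not the extractor itself: the function names are hypothetical, little-endian byte order is assumed, and the demo bytes are synthetic rather than taken from a real chunk.

```python
import struct

EVENT_TABLE_OFFSET = 20   # event records start at class + 20
EVENT_RECORD_SIZE = 6     # u16 leading word + u32 code offset


def read_code_base(class_bytes: bytes) -> int:
    """Return the raw u32 stored at class bytes 8..11 (little-endian assumed)."""
    return struct.unpack_from("<I", class_bytes, 8)[0]


def read_event_record(class_bytes: bytes, event_id: int) -> tuple:
    """Return (leading_word, code_offset) for event N at 20 + 6 * N."""
    offset = EVENT_TABLE_OFFSET + EVENT_RECORD_SIZE * event_id
    return struct.unpack_from("<HI", class_bytes, offset)


# Synthetic class header: 20 header bytes followed by 32 empty event rows.
demo = bytearray(20 + 6 * 32)
struct.pack_into("<I", demo, 8, 0x00D4)
struct.pack_into("<HI", demo, 20 + 6 * 0x10, 0x003B, 0x000005A8)
demo = bytes(demo)

print(hex(read_code_base(demo)))
print(read_event_record(demo, 0x10))
```

The leading word is returned untouched because its meaning is still unresolved; the reader only splits the row, it does not interpret it.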
ScummVM also makes one implementation choice explicit that matters for the current mismatch: uc_machine.cpp uses get_class_base_offset() as the execution-stream base for Crusader class code, not only as metadata for event counting. That means the obj[8..11] - 1 value is part of the live code-addressing model in ScummVM, not just a comment-level interpretation.
Binary-side validation against owner-loaded classes
The first direct local validation pass against sampled owner-loaded EUSECODE class records now splits the ScummVM model into two parts: one part is confirmed, and one part still needs reconciliation.
Confirmed on sampled records (EVENT, NPCTRIG, SURCAMNS, JELYHACK, REE_BOOT, SURCAMEW, SFXTRIG):
- The extracted chunk at table offset 0x88 behaves like object 1 for class names.
- For each sampled class body, deriving object_index = (table_offset - 0x80) / 8, then class_id = object_index - 2, and then reading 13 bytes from object 1 at 4 + 13 * class_id yields the expected class name.
- The class bodies do have a stable 4-byte header field at bytes 8..11.
- The region at class + 20 is a real 6-byte event-slot table with u16 unknown_word + u32 code_or_payload_field layout.
Broader family spot-checks now keep the same local structure on the owner-loaded side. In addition to the first validated set, the nearby _BOOT and environmental event families (AND_BOOT, BRO_BOOT, COR_BOOT, VAR_BOOT, FLAMEBOX, NOSTRIL, STEAMBOX) continue to fit the same table_offset -> object_index -> class_id progression with a stable bytes-8..11 dword and a 6-byte table at +20. No contradictory sample has appeared in the local EUSECODE set.
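The table_offset -> object_index -> class_id progression can be sketched as follows. The helper names are hypothetical, the name object here is synthetic, and NUL padding inside the 13-byte name slot is an assumption rather than something this note has verified.

```python
FLEX_TABLE_OFFSET = 0x80   # start of the FLEX table
FLEX_ENTRY_SIZE = 8        # bytes per table entry, per the derivation above
NAME_RECORD_SIZE = 13      # bytes per class-name slot in object 1


def class_id_from_table_offset(table_offset: int) -> int:
    """object_index = (table_offset - 0x80) / 8, class_id = object_index - 2."""
    object_index = (table_offset - FLEX_TABLE_OFFSET) // FLEX_ENTRY_SIZE
    return object_index - 2


def class_name(name_object: bytes, class_id: int) -> str:
    """Read the 13-byte name slot at 4 + 13 * class_id (NUL padding assumed)."""
    start = 4 + NAME_RECORD_SIZE * class_id
    raw = name_object[start:start + NAME_RECORD_SIZE]
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


# Synthetic object 1 with a single populated slot for class id 0x363.
npctrig_id = class_id_from_table_offset(0x1BA8)
names = bytearray(4 + NAME_RECORD_SIZE * (npctrig_id + 1))
slot_start = 4 + NAME_RECORD_SIZE * npctrig_id
names[slot_start:slot_start + 7] = b"NPCTRIG"

print(hex(npctrig_id), class_name(bytes(names), npctrig_id))
```

The table offset 0x1BA8 is the one embedded in the NPCTRIG chunk file name quoted later in this note, which is why it resolves to class id 0x363.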
Not yet reconciled with ScummVM's current formula note:
- In the sampled owner-loaded records, the raw dword at bytes 8..11 is 0x00d4, 0x00da, or 0x00e6.
- Treating that dword directly as the first post-event-table offset makes the layout line up cleanly: (dword_at_8 - 20) / 6 gives 32, 33, or 35 valid slots in the samples.
- Scanning instead with the previously noted ScummVM-style (base_offset + 19) / 6 interpretation overruns into inline payload and class-name bytes in the same samples.
Current best explanation:
- The mismatch is now best explained as a ScummVM interpretation/detail issue, not as a proven loader-side rewrite.
- The same ScummVM code path that decrements bytes 8..11 by 1 also uses that decremented value as the code-stream base. On the local owner-loaded records, this fits naturally if the raw dword is the first code-byte offset and event-table dword offsets are 1-based relative to code_base_minus_one.
- Under that reading, the sampled event-count rule becomes (code_base_minus_one - 19) / 6, which is exactly equivalent to (raw_u32_at_8_11 - 20) / 6 and matches the validated 32/33/35 slot counts.
- The 000d loader/runtime path (000d:44df -> 000d:4c99 -> 000d:7000 -> 000d:46ec) currently shows indexed file loading and slot-table materialization, but no verified per-class header rewrite before the VM consumes owner-backed records.
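The two event-count readings can be compared numerically on the three sampled dword values. This is only arithmetic over the formulas quoted above; the function names are hypothetical, and the ScummVM-style helper applies the noted formula to the decremented base as this note describes it, not as a claim about ScummVM's exact behavior.

```python
def local_event_count(raw_u32_at_8: int) -> int:
    """Local reading: the raw dword is the first code-byte offset."""
    return (raw_u32_at_8 - 20) // 6


def scummvm_style_event_count(raw_u32_at_8: int) -> int:
    """Previously noted formula applied to the decremented base offset."""
    base_offset = raw_u32_at_8 - 1
    return (base_offset + 19) // 6


for raw in (0x00D4, 0x00DA, 0x00E6):
    code_base_minus_one = raw - 1
    # The two spellings of the local rule are the same expression.
    assert (code_base_minus_one - 19) // 6 == local_event_count(raw)
    print(hex(raw), local_event_count(raw), scummvm_style_event_count(raw))
```

On these samples the second column is 32, 33, 35 and the third is 38, 39, 41, so the ScummVM-style count walks six rows past the end of the table, which is consistent with the overrun into payload bytes described above.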
Current safe conclusion:
- The owner-loaded class records are compatible with object 1 names, classid + 2 body lookup, a header field at bytes 8..11, and 6-byte event records at +20.
- The exact meaning of the bytes-8..11 field is now narrower: on the local owner-loaded records it is best read as the first code-byte offset, with ScummVM's decremented base_offset acting as a code_base_minus_one anchor for 1-based event code offsets.
- The leading word of each 6-byte event entry remains unresolved.
VM/runtime model
ScummVM also anchors several VM behaviors that line up with the current raw-binary work:
- usecode/uc_machine.cpp uses ByteSet(0x1000) for Crusader globals rather than the U8 bitset path.
- Remorse initializes global 0x003c to avatar number 1; Regret initializes 0x001e.
- Opcode 0x11 is class/event dispatch in Crusader: the bytecode operand is an event number that is translated through get_class_event() before execution.
That makes the current local reading stronger: the 000d runtime lane looks like a Crusader-specific object/event VM that should be interpreted against Crusader event ordinals, not against U8 assumptions.
Event names
convert/crusader/convert_usecode_crusader.h gives a named event table for ids 0x00..0x1f:
- Strongly usable names: look, use, anim, setActivity, cachein, hit, gotHit, hatch, schedule, release, equip, unequip, combine, calledFromAnim, enterFastArea, leaveFastArea, cast, justMoved, avatarStoleSomething, animGetHit, unhatch
- Weak placeholders remain for 0x0d and 0x16..0x1f (func0D, func16..func1F)
This is enough to annotate event ordinals safely, but not enough to rename raw binary handlers unless local behavior matches.
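A hint table built from that list might look like the sketch below. It assumes the strong names are listed in id order with 0x0d skipped, which agrees with the hints used elsewhere in this note (0x01 use, 0x04 cachein, 0x0a equip, 0x0f enterFastArea, 0x10 leaveFastArea); the EventHint type and helper names are hypothetical.

```python
from typing import NamedTuple, Optional

_STRONG_NAMES = [
    "look", "use", "anim", "setActivity", "cachein", "hit", "gotHit",
    "hatch", "schedule", "release", "equip", "unequip", "combine",
    "calledFromAnim", "enterFastArea", "leaveFastArea", "cast",
    "justMoved", "avatarStoleSomething", "animGetHit", "unhatch",
]


class EventHint(NamedTuple):
    name: str
    strong: bool   # False for funcNN placeholders


def build_event_hints() -> dict:
    """Assign names to ids 0x00..0x1f, leaving 0x0d and 0x16..0x1f weak."""
    hints = {}
    names = iter(_STRONG_NAMES)
    for event_id in range(0x20):
        if event_id == 0x0D or event_id >= 0x16:
            hints[event_id] = EventHint(f"func{event_id:02X}", strong=False)
        else:
            hints[event_id] = EventHint(next(names), strong=True)
    return hints


EVENT_HINTS = build_event_hints()


def event_name_hint(slot: int) -> Optional[EventHint]:
    """Return a hint for slots 0x00..0x1f, or None for higher slots."""
    return EVENT_HINTS.get(slot)


print(event_name_hint(0x0A), event_name_hint(0x0D), event_name_hint(0x20))
```

Returning None for slots such as 0x20 keeps high slots numeric, and the strong flag lets reports distinguish usable names from placeholders without promoting either to a rename.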
Intrinsic tables
ScummVM provides two distinct kinds of intrinsic evidence:
- convert/crusader/convert_usecode_crusader.h and convert_usecode_regret.h provide ordinal-to-signature/name tables used for readable conversion.
- usecode/remorse_intrinsics.h and usecode/regret_intrinsics.h provide the live runtime dispatch tables.
The safe reading is:
- Remorse and Regret share the Crusader event-name table.
- Remorse and Regret do not share a single intrinsic numbering/signature map.
- Intrinsic names are strong hints for arity and broad subsystem identity, but they are still not direct rename authority for the DOS binary.
Safe Reuse Rules
Safe to import now
- Event names as labels for event ids 0x00..0x1f in parsers, reports, and note files.
- Intrinsic ordinal names as name_hint or signature_hint metadata when the ordinal and argument-byte pattern match.
- High-level subsystem labels such as palette fade, camera, movie, audio, item/actor accessors, and weapon fire when they match existing binary evidence.
- Slot numbers from sampled owner-loaded classes even when the event name is still only a hint.
Not safe to claim yet
- Direct raw-function renames based only on ScummVM event or intrinsic names.
- Remorse intrinsic numbering from Regret tables, or vice versa.
- Specific descriptor-family to slot-mask mappings that are not yet proven on the binary side.
- Meanings for the unknown leading word in the 6-byte Crusader event table entries.
- That the ScummVM get_class_event_count() formula applies unchanged to the sampled owner-loaded EUSECODE records.
IR Requirements For Round-Tripping
The first script IR should preserve exact recompilation inputs before it tries to look pretty.
Current Parser Views
The current proof-of-concept parser now emits four complementary views for a single class/slot body:
- JSON IR: the authoritative machine-facing output for tooling and any future assembler.
- Flat text listing: a byte-faithful decode with offsets, raw bytes, and trailer sections.
- Script view: a more readable block-labeled decompilation with locals, labels, and stack-VM statements.
- Pseudocode view: a higher-level decompilation that tries to collapse common compare ladders and stack expressions into programming-language-like control flow.
The script and pseudocode views are intentionally descriptive rather than authoritative. They are meant to help read bodies like NPCTRIG 0x0A or EVENT 0x0A without losing the exact JSON IR that a round-trip compiler will need.
Deferred Readability Follow-Ups
Keep these parser-facing readability tasks for later while the current focus stays on broad pseudocode export and class-family understanding:
- Replace unresolved class_XXXX_slot_YY call labels with behavior-backed names where the compiled/runtime evidence is strong enough.
- Replace placeholder argument names such as arg_06 with semantic names inferred from stable usage patterns.
- Detect more control-flow shapes beyond compare ladders, especially simple loops and early-return guards.
- Collapse common spawn/setup idioms into more domain-specific statements when the stack pattern is consistent.
- Run the pseudocode renderer across larger families like EVENT, _BOOT, and SURCAM* and tighten the heuristics where they still leak VM structure.
- Add small behavior-level comments only where they help explain gameplay meaning rather than VM mechanics.
Unit of decompilation
The IR should be organized as:
- USECODE archive
  - class
    - event slot
      - instruction stream
That matches the externally anchored class/event layout and avoids baking in any still-unproven descriptor-to-runtime assumptions.
Required top-level records
Each class record should preserve:
- class_id
- class_object_index (classid + 2)
- name_slot_offset (4 + 13 * classid within object 1)
- class_name
- raw_header_prefix
- raw_code_base_u32
- code_base_minus_one
- event_count
- raw_event_table_bytes
Each event record should preserve:
- event_id
- event_name_hint
- raw_event_entry_word
- code_offset
- raw_body_bytes
- decoded_ops
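One way to hold these records in tooling is a pair of dataclasses that store only raw inputs and derive the rest. This is a sketch under stated assumptions: the type names are hypothetical, little-endian packing is assumed, and event_count uses the local (raw - 20) / 6 reading rather than the ScummVM formula.

```python
import struct
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventRecord:
    event_id: int                          # authoritative slot number
    raw_event_entry_word: int              # unresolved leading u16, kept verbatim
    code_offset: int                       # raw u32 from the event row
    raw_body_bytes: bytes = b""
    event_name_hint: Optional[str] = None  # external hint only
    decoded_ops: list = field(default_factory=list)

    def raw_event_entry(self) -> bytes:
        """Re-emit the original 6-byte row (little-endian assumed)."""
        return struct.pack("<HI", self.raw_event_entry_word, self.code_offset)


@dataclass
class ClassRecord:
    class_id: int
    class_name: str
    raw_header_prefix: bytes
    raw_code_base_u32: int
    raw_event_table_bytes: bytes
    events: list = field(default_factory=list)

    @property
    def class_object_index(self) -> int:
        return self.class_id + 2

    @property
    def name_slot_offset(self) -> int:
        return 4 + 13 * self.class_id

    @property
    def code_base_minus_one(self) -> int:
        return self.raw_code_base_u32 - 1

    @property
    def event_count(self) -> int:
        return (self.raw_code_base_u32 - 20) // 6


jely = ClassRecord(0x04D3, "JELYHACK", b"", 0x00D4, b"")
jely.events.append(EventRecord(0x01, 0x002A, 0x00000001, event_name_hint="use"))
print(hex(jely.class_object_index), jely.event_count)
```

Deriving class_object_index, code_base_minus_one, and event_count from stored raw values means a derived field can never silently disagree with the bytes it came from; a serializer can still write both forms out.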
IR v0 Shape
The IR should separate authoritative fields from friendly hints.
class:
  class_id: 0x00be
  class_name: EVENT
  class_object_index: 0x00c0
  raw_code_base_u32: 0x0138
  code_base_minus_one: 0x0137
  raw_header_prefix: <bytes>
  events:
    - event_id: 0x04
      event_name_hint: cachein
      raw_event_entry_word: 0x????
      code_offset: 0x00001234
      ops:
        - op: intrinsic_call
          intrinsic_ordinal: 0x001e
          name_hint: Item::I_fireWeapon
          signature_hint: Item::I_fireWeapon(Item *, x, y, z, byte, int, byte)
          arg_bytes: 0x10
        - op: vm_chain_mutation
          vm_ir: APPEND_UNIQUE_INDIRECT
          opcode_hint: 0x19
        - op: unknown_raw
          bytes: <exact original bytes>
Why this shape
- event_name_hint is useful for humans but does not replace the event id.
- name_hint and signature_hint are useful for intrinsics but do not replace the ordinal.
- unknown_raw gives a lossless fallback for still-unmapped opcodes or operand forms.
- raw_event_entry_word keeps the compiler from losing bytes whose meaning is not yet settled.
Operation Families Worth Lifting First
The current binary-side evidence supports lifting a small reversible operator set first:
- intrinsic_call
- class_event_call
- append_unique_inline
- append_unique_indirect
- remove_matching_inline
- remove_matching_indirect
- materialize_or_forward_value
- prepend_inline_payload
- build_entity_link_matrix
- emit_or_pushback_result
- push_frame_word_literal
- compare_stream_dword_and_push_bool
- unknown_raw
This is enough to represent the verified 000d:0988, 000d:177c, 000d:1acb, 000d:208b, 000d:21ed, and 000d:22bc families without pretending the whole VM is solved.
Metadata That Must Survive Recompilation
The compiler side will need more than pretty script text. At minimum it must preserve:
- Original class ordering and sparse class ids
- Original class-name table slotting
- Raw class header bytes not yet semantically decoded
- Raw bytes 8..11 even when a derived code_base_minus_one is also stored
- Raw 6-byte event records, including the unknown leading word
- Exact event order within each class
- Exact code offsets or enough relocation data to rebuild them deterministically
- Intrinsic ordinals and argument-byte counts
- Width/sign information for immediates
- Inline versus indirect payload form
- String payload encoding and terminators
- Post-ret debug/local symbol trailers, including the local count byte and each per-local metadata row
- Any unknown opcode byte sequences verbatim
If any of those are dropped, a source-level editor can still be readable, but it will stop being a trustworthy recompilation format.
Practical Naming Policy
For near-term local RE and tooling:
- Use ScummVM event names as annotation labels for event slots.
- Store intrinsic names as hints attached to ordinals.
- Keep binary-facing renames driven by raw evidence, not by ScummVM alone.
- Treat EVENT, _BOOT, and NPCTRIG as the strongest current active-event families.
- Treat JELYHACK and JELYH2 as referent-anchor classes, not standalone event records.
- Treat SURCAMNS and SURCAMEW as callback/eventTrigger holders, not proven active-event cores.
Repeated Slot Patterns Safe To Reuse Now
The latest pass over class_layout_index.tsv and class_event_index.tsv adds a small set of repeatable slot patterns that are safe enough to carry into decompiler output.
What is authoritative here:
- whether a class has a non-zero slot entry at a given slot id
- the raw u16 event word for that slot
- the raw u32 code offset for that slot
- repeated slot-set structure across several classes
What is still hint-level only:
- the ScummVM event-name labels for slots
0x00..0x1f - any mapping from one repeated slot directly to one recovered
000dopcode family - any claim that one repeated slot family is already tied to one exact gameplay subsystem in the DOS binary
Current small safe candidate sets:
| Family | Classes | Non-zero slots | Safe implication |
|---|---|---|---|
| referent-anchor twin | JELYHACK, JELYH2 | 0x01 only | these are structurally anchor-only classes, not active event hubs |
| boot-event-core | AND_BOOT, BRO_BOOT, COR_BOOT, REE_BOOT, VAR_BOOT | 0x0A, 0x0F, 0x10 | one reusable three-slot active-event core template |
| callback-eventtrigger | SURCAMNS, SURCAMEW | 0x01, 0x0A, 0x20, 0x21, 0x22 | one shared callback-oriented multi-slot template |
| environmental-event | FLAMEBOX, NOSTRIL, STEAMBOX | 0x0A, 0x20, 0x21 | one shared hazard/event template with two extra high slots |
| broad active-event lane | EVENT, SFXTRIG, and several non-island classes | 0x0A only | slot 0x0A is widespread enough to treat as a real repeated event slot, but too broad to over-specialize |
Concrete repeated evidence worth preserving in IR:
- JELYHACK and JELYH2 both carry only slot 0x01 with the exact same row: raw_event_entry_word = 0x002A, raw_code_offset = 0x00000001.
- The five _BOOT cores all share slot 0x10 with the exact same raw_event_entry_word = 0x003B, while the raw_code_offset varies by class (0x0000045c on COR_BOOT, 0x0000048b on AND_BOOT, 0x00000522 on BRO_BOOT, 0x000004df on VAR_BOOT, 0x000005a8 on REE_BOOT). That is a good example of repeated structure without identical bodies.
- SURCAMNS and SURCAMEW share the same five-slot layout and the same low/high anchor rows (0x0A = 0x00D1/0x00000001, 0x22 = 0x01A3/...), but differ in the middle high-slot offsets. That looks like one shared callback template with instance-specific bodies, not two unrelated classes.
- FLAMEBOX, NOSTRIL, and STEAMBOX all share one 0x0A low slot plus two extra high slots 0x20 and 0x21. Their exact words differ, so the safe reading is shared layout, not identical compiled behavior.
- EVENT and SFXTRIG both participate in the wide 0x0A lane, but that family is broad enough that the slot number is more trustworthy than the ScummVM name hint.
Byte-Level Body Comparison Rules And Results
The next step after repeated row mining is to derive the chunk-local body window for each non-zero slot and compare the actual bytes instead of only the 6-byte event-table row.
Current conservative body-window rule:
- body_start = code_base_minus_one + raw_code_offset
- body_end = code_base_minus_one + next_non_zero_raw_code_offset in the same class, or chunk EOF when there is no later non-zero slot
- this keeps the representation reversible because it is computed only from preserved header and event-table fields plus the raw chunk bytes
This rule is now carried directly by the extractor outputs instead of living only in notes:
- USECODE/EUSECODE_extracted/class_event_index.tsv now emits derived_body_start, derived_body_end, derived_body_length, and conservative repeated_template_status columns per slot row.
- USECODE/EUSECODE_extracted/boot_family_decompile.md/.tsv, callback_family_decompile.md/.tsv, and environmental_family_decompile.md/.tsv now provide concrete generated per-class decompile artifacts for the _BOOT, SURCAM*, and environmental repeated-family lanes, each grounded in emitted output rather than prose-only examples.
- USECODE/EUSECODE_extracted/repeated_family_regressions.tsv now records and enforces the current repeated-family slot sets plus the verified raw-row and derived body-window fields for JELYHACK/JELYH2, _BOOT, SURCAMNS/SURCAMEW, and FLAMEBOX/NOSTRIL/STEAMBOX so extractor changes fail fast if those verified baselines drift.
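The conservative body-window rule can be sketched as below. This is not the extractor's code: the function name is hypothetical, and taking "next" to mean the next larger raw code offset is an assumption, since the rule as written does not say whether slot order or offset order applies. The REE_BOOT rows used here have the same order under both readings.

```python
def derive_body_windows(code_base_minus_one: int,
                        slot_offsets: dict,
                        chunk_len: int) -> dict:
    """Map slot id -> (body_start, body_end) from non-zero raw code offsets."""
    # Assumption: "next" is the next larger raw code offset in the class.
    ordered = sorted(slot_offsets.items(), key=lambda item: item[1])
    windows = {}
    for index, (slot, raw_code_offset) in enumerate(ordered):
        start = code_base_minus_one + raw_code_offset
        if index + 1 < len(ordered):
            end = code_base_minus_one + ordered[index + 1][1]
        else:
            end = chunk_len  # no later non-zero slot: run to chunk EOF
        windows[slot] = (start, end)
    return windows


# REE_BOOT rows as quoted in this note; 0x06B6 is taken as the chunk length
# because the last slot's derived_body_end is 0x06B6.
ree_boot = derive_body_windows(
    code_base_minus_one=0x00D3,
    slot_offsets={0x0A: 0x00000001, 0x0F: 0x0000034C, 0x10: 0x000005A8},
    chunk_len=0x06B6,
)
for slot, (start, end) in sorted(ree_boot.items()):
    print(f"slot 0x{slot:02X}: 0x{start:04X}..0x{end:04X} ({end - start} bytes)")
```

For these inputs the lengths come out as 843, 604, and 59 bytes, matching the REE_BOOT figures given elsewhere in this note.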
What this confirms on the current repeated families:
- JELYHACK and JELYH2 slot 0x01 are exact row twins but not exact body twins. Both bodies are 42 bytes long, both start at 0x00d4, both keep raw_event_entry_word = 0x002A, and both share a 10-byte prefix plus a 17-byte suffix. The first differences are at body offsets 10, 11, 12, 24, which is consistent with one reused mini-template carrying class-local literals rather than one identical compiled body.
- _BOOT slot 0x10 is the cleanest repeated-body example. All five classes have a 59-byte body, all share the same row word 0x003B, all share the same first 5 bytes and the same last 17 bytes, and none are byte-identical across the family. This is strong evidence for one shared short-template tail with class-local identifiers or immediates in the middle.
- _BOOT slots 0x0A and 0x0F show the same pattern at larger sizes. Slot 0x0A bodies range from 551 to 843 bytes and share only a 3-byte prefix but a 39-byte suffix; slot 0x0F bodies range from 564 to 604 bytes and share a 3-byte prefix plus a 38-byte suffix. These are repeated family bodies, but not clones.
- SURCAMNS and SURCAMEW high slots 0x20 and 0x22 also behave like near-templates, not clones. Slot 0x20 is 698 bytes in both classes with an 11-byte common prefix and an 84-byte common suffix. Slot 0x22 is 419 bytes in both classes with an 11-byte common prefix and a 53-byte common suffix.
- SURCAM slot 0x21 is the strongest within-family divergence in this batch. SURCAMNS uses row word 0x0709 and a body length of 1801, while SURCAMEW uses row word 0x0655 and a body length of 1621. They still share a 20-byte suffix, so this is best read as one callback-family slot with materially different instance bodies rather than a parsing mistake.
The practical IR consequence is important: repeated-family status should be recorded separately from byte-identity status. A human-readable decompile should be able to say “same family slot template” without falsely implying “same body bytes.”
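A byte-level comparison of family bodies can be sketched with common prefix/suffix lengths plus a digest check. The function names and the three status labels are hypothetical, and the two bodies below are synthetic stand-ins rather than extracted chunk bytes.

```python
import hashlib


def common_prefix_len(bodies: list) -> int:
    """Count leading byte positions that are identical in every body."""
    count = 0
    for column in zip(*bodies):
        if len(set(column)) != 1:
            break
        count += 1
    return count


def common_suffix_len(bodies: list) -> int:
    """Count identical trailing bytes, never overlapping the common prefix."""
    limit = min(len(body) for body in bodies) - common_prefix_len(bodies)
    count = 0
    for column in zip(*(reversed(body) for body in bodies)):
        if count >= limit or len(set(column)) != 1:
            break
        count += 1
    return count


def body_identity_status(bodies: list) -> str:
    """Classify byte identity separately from any family/template label."""
    digests = {hashlib.sha1(body).hexdigest() for body in bodies}
    if len(digests) == 1:
        return "exact-twin"
    if common_prefix_len(bodies) or common_suffix_len(bodies):
        return "near-template"
    return "different"


body_a = bytes([0x5A, 0x06]) + b"AAAA" + bytes([0x50, 0x7A])
body_b = bytes([0x5A, 0x06]) + b"BBBB" + bytes([0x50, 0x7A])
print(common_prefix_len([body_a, body_b]),
      common_suffix_len([body_a, body_b]),
      body_identity_status([body_a, body_b]))
```

Keeping body_identity_status independent of repeated_template_status is the point of the split: the family label comes from slot structure, while this status comes only from the bytes.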
What A Decompiled Script Looks Like Today
The most honest present-day decompilation is not a polished source language. It is a reversible descriptor-plus-event-table rendering with optional VM-op vocabulary attached where the 000d lane is already verified.
Level 0: Raw event row plus derived body window
This is the minimal human-usable row form. It preserves the original six-byte event entry, explains how the body window is derived, and records whether the slot looks like an exact twin, a near-template, or a unique body.
class_name: REE_BOOT
slot: 0x10
event_name_hint_scummvm: leaveFastArea
raw_event_entry_word: 0x003b
raw_code_offset: 0x000005a8
code_base_minus_one: 0x00d3
derived_body_start: 0x067b
derived_body_end: 0x06b6
derived_body_length: 59
repeated_template_status: boot-event-core/shared-slot-0x10
body_identity_status: non-identical; shared 5-byte prefix and 17-byte suffix across all five _BOOT bodies
body_sha1: 577c61e9c4c6...
Field meaning, using only what is currently verified:
- class_name: authoritative class label from object 1 in the owner-loaded class table
- slot: authoritative numeric slot id from the event table; this is safer than any guessed semantic name
- event_name_hint_scummvm: external label for slots 0x00..0x1f; useful for orientation, not yet verified as the local class-specific meaning
- raw_event_entry_word: the unresolved leading u16 from the 6-byte event record; authoritative bytes, unresolved semantics
- raw_code_offset: the authoritative row u32; currently best read as a 1-based offset relative to code_base_minus_one
- code_base_minus_one: derived from bytes 8..11 in the class header using the current conservative rule
- derived_body_start and derived_body_end: computed chunk-local byte window for the slot body; useful for diffing and future recompilation, and now emitted directly in the extractor outputs
- repeated_template_status: whether the row participates in a repeated family pattern such as JELY anchor twin, _BOOT event core, or SURCAM callback template
- body_identity_status: whether the extracted body bytes are exact twins, near-templates, or materially different within that family
- body_sha1: stable digest for exact identity checks without pretending the digest itself has semantic meaning
Level 1: Lossless event-table IR
This is the form that is closest to a future round-trip compiler.
class:
  entry_index: 0x0115
  class_id: 0x04d3
  class_name: JELYHACK
  class_object_index: 0x04d5
  raw_code_base_u32: 0x00d4
  code_base_minus_one: 0x00d3
  conservative_event_count: 32
  descriptor_fields:
    - referent
  events:
    - slot: 0x01
      event_name_hint_scummvm: use
      raw_event_entry_word: 0x002a
      raw_code_offset: 0x00000001
      derived_body_start: 0x00d4
      derived_body_end: 0x00fe
      derived_body_length: 42
      repeated_template_status: referent-anchor-twin/shared-slot-0x01
      body_identity_status: near-template-with-JELYH2
      confidence: authoritative-bytes, hinted-label
IR v1 Parser Schema
The next tooling step changes the role of this document slightly. IR v0 was a note-level target for reversible human-readable output. IR v1 is the canonical machine-facing schema for the Pentagram-derived proof-of-concept parser and any future Ghidra annotation bridge.
The design constraints are now explicit:
- keep every authoritative owner-loaded byte visible
- keep slot identity separate from semantic name hints
- keep runtime-facing metadata visible even when the body decompiler cannot yet explain it
- preserve enough structure to emit Ghidra comments and bookmarks later without reparsing prose notes
Top-level IR object
schema_version: crusader-usecode-ir-v1-poc
source:
  flex_path: USECODE/EUSECODE.FLX
  extracted_root: USECODE/EUSECODE_extracted
  chunk_file: USECODE/EUSECODE_extracted/chunks/chunk_191_table_1BA8_off_04C347_len_0003A8.bin
class:
  entry_index: 191
  object_index: 0x365
  class_id: 0x363
  class_name: NPCTRIG
  raw_code_base_u32: 0x00da
  code_base_minus_one: 0x00d9
  conservative_event_count: 0x21
event:
  slot: 0x0a
  event_name_hint: equip
  raw_event_entry_word: 0x013e
  raw_code_offset: 0x00000001
  derived_body_start: 0x00da
  derived_body_end: 0x024f
  derived_body_length: 373
  repeated_template_status: ""
body:
  end_reason: debug_symbols_then_end
  raw_body_sha1: <digest>
  unknown_trailing_bytes: ""
  debug_symbol_offset: 0x0143
  debug_symbol_count: 5
  debug_symbols:
    - index: 0x00
      type_id: 0x69
      bp_repr: [BP+00h]
      name: referent
    - index: 0x01
      type_id: 0x69
      bp_repr: [BP+0Ah]
      name: event
  ops:
    - offset: 0x0000
      absolute_body_offset: 0x00da
      opcode: 0x5a
      mnemonic: init
      raw_bytes: 5a06
      operands:
        local_bytes: 0x06
    - offset: 0x0011
      absolute_body_offset: 0x00eb
      opcode: 0x40
      mnemonic: push_local_dword
      raw_bytes: 40064c02
      operands:
        bp_offset: 0x06
annotation_hints:
  runtime_family: slot-backed-owner-loaded-body
  compiled_anchors:
    - 000d:46ec
    - 000d:0988
    - 000d:208b
    - 000d:21ed
    - 000d:22bc
    - 000d:2104
    - 000d:ebe3
Required fields
source keeps the specific extracted artifact path so the parser output can always be checked against the raw chunk bytes.
class keeps the owner-loaded identity and header math already validated in the binary.
event keeps the exact six-byte row meaningfully split into authoritative fields plus the derived body window.
body records how far the parser got, whether the body terminated at a real 0x7a end marker, and whether a post-ret local/debug trailer was parsed instead of being misclassified as stray opcodes.
ops is intentionally lossless. Each decoded op keeps:
- body-relative offset
- absolute chunk-local offset
- raw opcode byte
- mnemonic
- exact raw bytes for the whole op
- parsed operands as typed fields
debug_symbols preserves the owner-loaded post-ret local metadata block. Current evidence from crusader-disasm and the live extracted chunks shows that many bodies end as: executable ops -> ret -> local/debug symbol rows -> 0x7a end. Those rows are not executable bytecode and should survive round-trip as structured metadata rather than raw tail bytes.
annotation_hints is the bridge to Ghidra. It is not a source-language feature. It exists so a later importer can attach the right comments and bookmarks to the compiled VM/runtime addresses without trying to infer them from free text.
Opcode result policy
The parser should use four result classes only:
- decoded_op: normal parsed opcode with structured operands
- unknown_opcode: one-byte opcode not yet modeled; stop or fall back conservatively
- raw_tail: remaining undecoded bytes after a stop condition
- debug_blob: post-ret local/debug trailer ending in 0x7a
That keeps the IR trustworthy even before the whole Crusader VM is modeled.
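The four result classes can be illustrated with a toy decoder whose output always reassembles to the input bytes. This is a sketch, not the proof-of-concept parser: the opcode table is deliberately tiny, 0x5a as init with one operand byte follows the sample op in this note, 0x50 as ret is an assumption borrowed from Pentagram-style opcode tables, and treating everything after the first ret as the trailer is a simplification.

```python
# Partial opcode table: opcode -> (mnemonic, operand byte count).
# 0x50 as ret is an assumption, not something verified in this note.
OPCODES = {0x5A: ("init", 1), 0x50: ("ret", 0)}
END_MARKER = 0x7A


def decode_body(body: bytes) -> list:
    """Split a slot body into the four result classes, keeping every byte."""
    results = []
    pos = 0
    while pos < len(body):
        spec = OPCODES.get(body[pos])
        if spec is None:
            results.append({"kind": "unknown_opcode", "offset": pos,
                            "raw_bytes": body[pos:pos + 1]})
            pos += 1
            break  # stop conservatively after the first unmodeled opcode
        mnemonic, operand_len = spec
        end = pos + 1 + operand_len
        if end > len(body):
            break  # truncated operand: leave the rest as raw_tail
        results.append({"kind": "decoded_op", "offset": pos,
                        "opcode": body[pos], "mnemonic": mnemonic,
                        "raw_bytes": body[pos:end]})
        pos = end
        if mnemonic == "ret" and pos < len(body) and body[-1] == END_MARKER:
            results.append({"kind": "debug_blob", "offset": pos,
                            "raw_bytes": body[pos:]})
            pos = len(body)
    if pos < len(body):
        results.append({"kind": "raw_tail", "offset": pos,
                        "raw_bytes": body[pos:]})
    return results


def reassemble(results: list) -> bytes:
    """Concatenate raw bytes in order; must equal the original body."""
    return b"".join(result["raw_bytes"] for result in results)


sample = bytes([0x5A, 0x06, 0x50, 0x01, 0x69, 0x7A])
decoded = decode_body(sample)
print([result["kind"] for result in decoded], reassemble(decoded) == sample)
```

The reassemble check is the useful part: whatever the decoder fails to understand still lands in unknown_opcode, raw_tail, or debug_blob, so no byte is dropped.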
Call-site hint policy
For call and spawn-family ops, the parser may attach:
- target_class_id
- target_event_slot
- target_event_name_hint
It should not attach a stronger semantic claim than that. The body parser is class/event aware, but not yet authoritative about gameplay meaning.
Annotation-hint schema
The Ghidra bridge should consume only small, stable items:
annotation_hints:
  runtime_family: slot-backed-owner-loaded-body
  payload_shape_hint: signed_word
  compiled_anchors:
    - address: 000d:46ec
      role: context_create_from_slot
    - address: 000d:0988
      role: referent_chain_mutator
    - address: 000d:208b
      role: materialize_or_forward_value
    - address: 000d:21ed
      role: prepend_inline_payload
    - address: 000d:22bc
      role: matrix_pushback_stage
    - address: 000d:2104
      role: finalize_to_outptr
    - address: 000d:ebe3
      role: opcode_sequence_run
  runtime_stage_hints:
    - stage_address: 000d:0988
      ir_name: APPEND_UNIQUE_INDIRECT
This is deliberately smaller than a full import format. It keeps the parser reusable even if the first Ghidra-side importer is only a comment/bookmark script.
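A comment/bookmark importer could start by flattening the hint block into (address, text) rows. The sketch below stops there and calls no Ghidra API; the function name and the comment text format are hypothetical.

```python
def comment_rows(annotation_hints: dict) -> list:
    """Flatten annotation hints into (address, comment text) pairs."""
    family = annotation_hints.get("runtime_family", "")
    stage_names = {
        stage["stage_address"]: stage["ir_name"]
        for stage in annotation_hints.get("runtime_stage_hints", [])
    }
    rows = []
    for anchor in annotation_hints.get("compiled_anchors", []):
        text = f"usecode role={anchor['role']} family={family}"
        ir_name = stage_names.get(anchor["address"])
        if ir_name:
            text += f" ir={ir_name}"
        rows.append((anchor["address"], text))
    return rows


hints = {
    "runtime_family": "slot-backed-owner-loaded-body",
    "compiled_anchors": [
        {"address": "000d:46ec", "role": "context_create_from_slot"},
        {"address": "000d:0988", "role": "referent_chain_mutator"},
    ],
    "runtime_stage_hints": [
        {"stage_address": "000d:0988", "ir_name": "APPEND_UNIQUE_INDIRECT"},
    ],
}
for address, text in comment_rows(hints):
    print(address, text)
```

Because the rows are plain strings keyed by address, the same output could feed a comment script, a bookmark script, or a TSV report without changing the parser.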
The Level 1 JELYHACK record shown earlier is already a real decompilation output. It keeps the exact slot id, the exact six-byte row contents, and the exact class-header facts, while refusing to pretend that use is already a proven semantic name for this class.
Here is the same style for one active event-bearing attachment class in the same island:
class:
  entry_index: 0x011b
  class_id: 0x04db
  class_name: REE_BOOT
  class_object_index: 0x04dd
  raw_code_base_u32: 0x00d4
  code_base_minus_one: 0x00d3
  conservative_event_count: 32
  descriptor_fields:
    - referent
    - event
    - counter
    - item
  events:
    - slot: 0x0a
      event_name_hint_scummvm: equip
      raw_event_entry_word: 0x034b
      raw_code_offset: 0x00000001
      derived_body_start: 0x00d4
      derived_body_end: 0x041f
      derived_body_length: 843
      repeated_template_status: boot-event-core/shared-slot-0x0a
      body_identity_status: same-family-body-not-identical
      confidence: authoritative-bytes, hinted-label
    - slot: 0x0f
      event_name_hint_scummvm: enterFastArea
      raw_event_entry_word: 0x025c
      raw_code_offset: 0x0000034c
      derived_body_start: 0x041f
      derived_body_end: 0x067b
      derived_body_length: 604
      repeated_template_status: boot-event-core/shared-slot-0x0f
      body_identity_status: same-family-body-not-identical
      confidence: authoritative-bytes, hinted-label
    - slot: 0x10
      event_name_hint_scummvm: leaveFastArea
      raw_event_entry_word: 0x003b
      raw_code_offset: 0x000005a8
      derived_body_start: 0x067b
      derived_body_end: 0x06b6
      derived_body_length: 59
      repeated_template_status: boot-event-core/shared-slot-0x10
      body_identity_status: same-family-body-not-identical
      confidence: authoritative-bytes, hinted-label
And here is one callback-style multi-slot class, which shows why the high slots should stay numeric for now:
class:
  entry_index: 0x011c
  class_id: 0x04de
  class_name: SURCAMEW
  class_object_index: 0x04e0
  raw_code_base_u32: 0x00e6
  code_base_minus_one: 0x00e5
  conservative_event_count: 35
  descriptor_fields:
    - referent
    - textFile
    - monit
    - valueBox
    - passcode
    - link
    - code
    - screen
    - cameraEgg
    - trueRef
    - therma
    - eventTrigger
    - foundGun
  events:
    - slot: 0x01
      event_name_hint_scummvm: use
      raw_event_entry_word: 0x00f7
      raw_code_offset: 0x000000d2
    - slot: 0x0a
      event_name_hint_scummvm: equip
      raw_event_entry_word: 0x00d1
      raw_code_offset: 0x00000001
    - slot: 0x20
      event_name_hint_scummvm: null
      raw_event_entry_word: 0x02ba
      raw_code_offset: 0x000001c9
      derived_body_start: 0x02ae
      derived_body_end: 0x0568
      derived_body_length: 698
      repeated_template_status: callback-eventtrigger/shared-slot-0x20
      body_identity_status: same-family-body-not-identical
    - slot: 0x21
      event_name_hint_scummvm: null
      raw_event_entry_word: 0x0655
      raw_code_offset: 0x00000483
      derived_body_start: 0x0568
      derived_body_end: 0x0bbd
      derived_body_length: 1621
      repeated_template_status: callback-eventtrigger/shared-slot-0x21
      body_identity_status: same-family-body-not-identical
    - slot: 0x22
      event_name_hint_scummvm: null
      raw_event_entry_word: 0x01a3
      raw_code_offset: 0x00000ad8
      derived_body_start: 0x0bbd
      derived_body_end: 0x0d60
      derived_body_length: 419
      repeated_template_status: callback-eventtrigger/shared-slot-0x22
      body_identity_status: same-family-body-not-identical
The extra derived fields are worth keeping because they answer the immediate human question that the bare event table does not: not only “which slots exist,” but also “how much body belongs to each slot” and “whether this body is a true clone or only a same-family variant.”
Level 2: Friendly but still reversible hinted form
This is the highest-level script shape that is justified right now.
anchor JELYHACK(referent)
  # authoritative event rows for the anchor itself
  slot 0x01 hint=use? raw_word=0x002A code_off=0x00000001 body=0x00D4..0x00FE family=JELY-anchor identity=near-template-with-JELYH2

# nearby attachment classes from the same local island
attach REE_BOOT(referent,event,counter,item)
  slot 0x0A hint=equip? raw_word=0x034B code_off=0x00000001 body=0x00D4..0x041F family=_BOOT-core identity=shared-template-not-clone
  slot 0x0F hint=enterFastArea? raw_word=0x025C code_off=0x0000034C body=0x041F..0x067B family=_BOOT-core identity=shared-template-not-clone
  slot 0x10 hint=leaveFastArea? raw_word=0x003B code_off=0x000005A8 body=0x067B..0x06B6 family=_BOOT-core identity=shared-template-not-clone

callback SURCAMEW(referent,textFile,monit,valueBox,passcode,link,code,screen,cameraEgg,trueRef,therma,eventTrigger,foundGun)
  slot 0x01 hint=use? raw_word=0x00F7 code_off=0x000000D2 body=0x01B7..0x02AE
  slot 0x0A hint=equip? raw_word=0x00D1 code_off=0x00000001 body=0x00E6..0x02AE
  slot 0x20 raw_word=0x02BA code_off=0x000001C9 body=0x02AE..0x0568 family=SURCAM-callback identity=shared-template-not-clone
  slot 0x21 raw_word=0x0655 code_off=0x00000483 body=0x0568..0x0BBD family=SURCAM-callback identity=shared-template-with-stronger-divergence
  slot 0x22 raw_word=0x01A3 code_off=0x00000AD8 body=0x0BBD..0x0D60 family=SURCAM-callback identity=shared-template-not-clone

attach SFXTRIG(referent,event)
  slot 0x0A hint=equip? raw_word=0x00B8 code_off=0x00000001
This is decompiled enough to read, diff, and later recompile because it preserves:
- the original class identity
- the exact non-zero event rows
- the derived chunk-local body window for each row
- which names are authoritative fields versus external hints
- which nearby descriptors appear to be anchors, active event attachments, or callback attachments
- whether a repeated family slot is an exact twin or only a structurally similar body
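A renderer for the Level 2 slot lines can be sketched as a single formatting function. The function name and keyword arguments are hypothetical; the output layout simply mirrors the slot lines shown in this section, with the trailing question mark marking a name as an external hint.

```python
def render_slot_line(slot: int, raw_word: int, code_off: int,
                     hint: str = None, body: tuple = None,
                     family: str = None, identity: str = None) -> str:
    """Format one slot row; optional fields are omitted when unknown."""
    parts = [f"slot 0x{slot:02X}"]
    if hint:
        parts.append(f"hint={hint}?")  # '?' marks an external, unverified label
    parts.append(f"raw_word=0x{raw_word:04X}")
    parts.append(f"code_off=0x{code_off:08X}")
    if body:
        parts.append(f"body=0x{body[0]:04X}..0x{body[1]:04X}")
    if family:
        parts.append(f"family={family}")
    if identity:
        parts.append(f"identity={identity}")
    return " ".join(parts)


print(render_slot_line(0x10, 0x003B, 0x000005A8, hint="leaveFastArea",
                       body=(0x067B, 0x06B6), family="_BOOT-core",
                       identity="shared-template-not-clone"))
print(render_slot_line(0x20, 0x02BA, 0x000001C9, body=(0x02AE, 0x0568),
                       family="SURCAM-callback",
                       identity="shared-template-not-clone"))
```

The authoritative fields (slot, raw_word, code_off) are always printed, while hints and derived fields are optional, so a line never implies more certainty than the record carries.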
Level 2.5: Human annotation layer
The last layer is prose, not syntax. It should explain the honest current reading of each field so a modder can see what is safe to edit and what still needs caution.
- Class name is authoritative at the container level: it comes from the owner-loaded class-name table and is not a guess.
- Slot id is authoritative at the event-table level: this is the safest event identifier currently available.
- Event-name hint is external: use it as orientation only when the slot is inside 0x00..0x1f and the local behavior has not yet been reverified in binary.
- Raw event word is authoritative but semantically unresolved: it must survive round-trip intact.
- Raw code offset is authoritative and operational: combined with code_base_minus_one, it tells us where the slot body starts in the chunk.
- Body-window length is derived but useful: it tells a human whether a slot is a tiny stub-like record or a large body that deserves its own diff or annotation block.
- Repeated-template status is about family structure, not byte identity: a _BOOT slot can be “the same template role” without being byte-equal across classes.
- Body-identity status answers the concrete modding question “am I looking at a clone, a parameterized variant, or a different body that only occupies the same family slot?”
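The body-start and body-window arithmetic can be checked against the REE_BOOT rows from the Level 2 sample. This is a sketch only: the value `0x00D3` for `code_base_minus_one` is inferred here from those rows (listed body start minus raw code offset), not read from a header dump, and the helper names are hypothetical.

```python
# Inferred from the REE_BOOT sample rows, not taken from a header dump.
CODE_BASE_MINUS_ONE = 0x00D3

# (slot, raw code_off, listed body start, listed body end) from the sample.
REE_BOOT_ROWS = [
    (0x0A, 0x00000001, 0x00D4, 0x041F),
    (0x0F, 0x0000034C, 0x041F, 0x067B),
    (0x10, 0x000005A8, 0x067B, 0x06B6),
]


def body_start(code_base_minus_one: int, raw_code_offset: int) -> int:
    """Chunk-local start of a slot body under the current addressing model."""
    return code_base_minus_one + raw_code_offset


def window_length(start: int, end: int) -> int:
    """Derived body-window length in bytes (end is exclusive)."""
    return end - start


for slot, code_off, start, end in REE_BOOT_ROWS:
    derived = body_start(CODE_BASE_MINUS_ONE, code_off)
    print(f"slot 0x{slot:02X}: start=0x{derived:04X} "
          f"matches_listed={derived == start} "
          f"length=0x{window_length(start, end):04X}")
```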
Level 3: Where the current VM IR can be attached
For classes in the active-event ecosystems (EVENT, _BOOT, NPCTRIG, SFXTRIG, and the environmental family), the current 000d work is strong enough to attach the known operator vocabulary without pretending one exact class-to-opcode decode already exists.
vm_effect_possible:
  APPEND_UNIQUE_INLINE
  APPEND_UNIQUE_INDIRECT
  REMOVE_MATCHING_INDIRECT
  REMOVE_MATCHING_INLINE
  MATERIALIZE_OR_FORWARD_VALUE
  PREPEND_INLINE_PAYLOAD
  BUILD_ENTITY_LINK_MATRIX
  EMIT_OR_PUSHBACK_RESULT
  FINALIZE_MIXED_VALUE_TO_OUTPTR
That operator block is authoritative as a recovered VM vocabulary, but only ecosystem-level when attached to one specific descriptor family.
Binary-side slot and payload-shape evidence to preserve in IR
The current VM pass also adds one useful binary-side rule for the higher event ordinals: the compiled wrapper family distinguishes slot identity from payload shape, and that distinction should survive in any round-trip IR even when the human label stays unresolved.
Verified current ladder around 0005:3115..31da:
- slot 0x10: guarded callsite only, zero extra word, packed mask 0x00010000
- slot 0x11: named wrapper entity_vm_context_try_create_mask_00020000_slot11_with_offset, one caller-supplied extra word
- slot 0x12: named wrapper entity_vm_context_try_create_mask_00040000_slot12, zero extra word
- slot 0x13: named wrapper entity_vm_context_try_create_mask_00080000_slot13_with_offset_if_valid_entity, one sign-extended extra word after an entity-validity gate
- slot 0x14: named wrapper entity_vm_context_try_create_mask_00100000_slot14_with_offset, one caller-supplied extra word
Why this matters for the IR:
- It is direct binary evidence that some higher Crusader slot ordinals are already grouped by argument shape before any descriptor-family mapping is proven.
- That means the IR should preserve slot_id plus payload_shape independently instead of collapsing everything into one guessed event-name table.
- It also gives a bounded way to cross-check external event signatures without over-trusting them: slot 0x12 fits a zero-arg event shape, slot 0x13 fits a one-word event shape, and slot 0x14 currently conflicts with Pentagram's older zero-arg animGetHit() note.
Practical annotation rule to adopt now:
- keep higher-slot labels binary-stable as slot 0x10..slot 0x14 unless local behavior closes the label
- attach external event names only as hints
- attach one small payload_shape_hint field such as none, word, or signed_word
Minimal hinted example:
slot_record:
  slot_id: 0x13
  event_name_hint: avatarStoleSomething
  payload_shape_hint: signed_word
  binary_anchor: 0005:31da
  wrapper_name: entity_vm_context_try_create_mask_00080000_slot13_with_offset_if_valid_entity
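One way to carry that record in tooling is a small typed structure that keeps `slot_id` and `payload_shape_hint` independent and treats every name as optional. The sketch below is an assumption-laden illustration: `SlotRecord` and `packed_mask` are hypothetical names, and the `1 << slot_id` relation is only an observation that happens to fit the five masks listed in the ladder above, not a proven rule for other slots.

```python
from dataclasses import dataclass
from typing import Optional

PAYLOAD_SHAPES = ("none", "word", "signed_word")


@dataclass(frozen=True)
class SlotRecord:
    slot_id: int
    payload_shape_hint: str
    event_name_hint: Optional[str] = None   # external hint only
    binary_anchor: Optional[str] = None
    wrapper_name: Optional[str] = None

    def __post_init__(self) -> None:
        if self.payload_shape_hint not in PAYLOAD_SHAPES:
            raise ValueError(f"unknown payload shape: {self.payload_shape_hint}")

    @property
    def label(self) -> str:
        """Binary-stable label that does not depend on any event-name hint."""
        return f"slot 0x{self.slot_id:02X}"


def packed_mask(slot_id: int) -> int:
    """Observed fit for slots 0x10..0x14 only; unverified elsewhere."""
    return 1 << slot_id


# Payload shapes and masks as listed in the verified ladder.
LADDER = {
    0x10: ("none", 0x00010000),
    0x11: ("word", 0x00020000),
    0x12: ("none", 0x00040000),
    0x13: ("signed_word", 0x00080000),
    0x14: ("word", 0x00100000),
}

records = [SlotRecord(slot, shape) for slot, (shape, _) in LADDER.items()]
for rec in records:
    print(rec.label, rec.payload_shape_hint, hex(packed_mask(rec.slot_id)))
```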
The same pass also hardens one existing IR operator boundary: the 000d:22bc stage is now comment-backed in Ghidra as a matrix/pushback consumer over decoded workspace bytes, not a direct descriptor-row reader. The current safe attachment point is therefore still decoded VM workspace -> link-matrix stage, not NPCTRIG row -> direct entity-link emission.
Conservative Parser Rule To Adopt Now
For the current owner-loaded EUSECODE and round-trip IR work, the safest reversible rule is:
- Preserve the raw four-byte header field at bytes 8..11 as authoritative.
- Derive code_base_minus_one = raw_u32_at_8_11 - 1 for code-addressing only.
- Derive event_count = (raw_u32_at_8_11 - 20) / 6 only when that value is non-negative, divisible by 6, and the resulting table end stays within the class object size.
- Treat each event entry as u16 raw_event_entry_word + u32 raw_code_offset at class + 20 + 6 * slot.
- Treat the event code offset as raw/opaque unless and until the code-addressing interpretation is needed; when needed, interpret it relative to code_base_minus_one so that offset 1 lands on the first code byte.
- If the divisibility or bounds checks fail, keep the class opaque and preserve raw bytes rather than forcing a guessed event count.
tools/extract_eusecode_flx.py now implements this rule directly for the current owner-loaded EUSECODE work and emits class_layout_index.tsv plus class_event_index.tsv so raw header/event rows can be consumed by later IR tooling without re-deriving the arithmetic from prose.
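The rule can be sketched in a few lines. This is not the code in tools/extract_eusecode_flx.py; it is an independent illustration that assumes little-endian fields (reasonable for a DOS-era x86 target, but stated here as an assumption) and uses hypothetical names (`ClassLayout`, `parse_class_layout`).

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

EVENT_TABLE_START = 20  # offset of the first event record
EVENT_ENTRY_SIZE = 6    # u16 raw_event_entry_word + u32 raw_code_offset


@dataclass
class ClassLayout:
    raw_u32_at_8_11: int            # authoritative, always preserved
    code_base_minus_one: int        # derived, for code addressing only
    event_count: Optional[int] = None   # None means "keep the class opaque"
    events: List[Tuple[int, int]] = field(default_factory=list)


def parse_class_layout(obj: bytes) -> ClassLayout:
    """Apply the conservative rule to one class object."""
    if len(obj) < 12:
        raise ValueError("class object too short to hold bytes 8..11")
    (raw,) = struct.unpack_from("<I", obj, 8)
    layout = ClassLayout(raw, raw - 1)
    table_bytes = raw - EVENT_TABLE_START
    # Non-negative, divisible by 6, and the table end (== raw) inside the object.
    if table_bytes < 0 or table_bytes % EVENT_ENTRY_SIZE or raw > len(obj):
        return layout
    layout.event_count = table_bytes // EVENT_ENTRY_SIZE
    for slot in range(layout.event_count):
        offset = EVENT_TABLE_START + EVENT_ENTRY_SIZE * slot
        layout.events.append(struct.unpack_from("<HI", obj, offset))
    return layout


# Synthetic two-slot class object, built only to exercise the checks.
sample = bytearray(20)
struct.pack_into("<I", sample, 8, 32)
sample += struct.pack("<HI", 0x002A, 1) + struct.pack("<HI", 0, 0) + b"\x00" * 4
print(parse_class_layout(bytes(sample)))
```

When any check fails, the sketch returns the raw header value with `event_count` left as `None`, which mirrors the "keep the class opaque" fallback rather than guessing a count.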
Remaining Binary-Side Gaps
The main blockers for a real round-trip compiler are still on the binary side:
- The meaning of the first two bytes in each 6-byte Crusader event record is still unverified.
- The exact provenance of ScummVM's current get_class_event_count() arithmetic is still unverified; current local evidence says the owner-loaded/raw records fit raw_u32_at_8_11 = first_code_byte_offset, while the ScummVM count formula appears sign-shifted relative to that layout.
- The upstream writer for selector local [BP-0x32] in the 000d:ebe3 sequencer is still unresolved.
- The full control-flow opcode set and branch encoding are not yet recovered.
- The exact on-disk source format behind entity_vm_runtime_owner_resource_create is still not identified.
- No direct descriptor-family to slot-mask mapping is proven yet.
- Callback/eventTrigger descriptors still do not have a callback-specific opcode family.
Best Current Path
The strongest present path to a usable compiler/decompiler is:
- Parse class objects and event records using ScummVM's class/object indexing and event-entry layout.
- Use the conservative local event-count rule above for owner-loaded/raw class parsing until a main USECODE sample proves otherwise.
- Decompile only the proven operator families into structured IR.
- Preserve unknown bytes verbatim in place.
- Attach ScummVM event and intrinsic names as hints, not as truth.
- Recompile by rebuilding the original class header and event table layout first, then re-emitting decoded and opaque ops together.
That gets to a reversible editor sooner than waiting for a full semantic VM recovery.
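The re-emit step can be sketched as follows, assuming little-endian fields, a 20-byte header region, and the local event-count rule. `Op` and `emit_class` are hypothetical names, and the byte values attached to the named operator in the example are placeholders, not a recovered encoding.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Op:
    raw: bytes                   # original bytes, always preserved
    name: Optional[str] = None   # set only for proven operator families


def emit_class(header: bytes, rows: List[Tuple[int, int]], ops: List[Op]) -> bytes:
    """Rebuild header and event table first, then decoded and opaque ops."""
    if len(header) != 20:
        raise ValueError("expected the 20 bytes that precede the event table")
    (raw,) = struct.unpack_from("<I", header, 8)
    if raw != 20 + 6 * len(rows):
        raise ValueError("header field at 8..11 does not match the event table size")
    table = b"".join(struct.pack("<HI", word, code_off) for word, code_off in rows)
    body = b"".join(op.raw for op in ops)
    return header + table + body


# Synthetic example: one event row, one named op, one opaque run.
header = bytearray(20)
struct.pack_into("<I", header, 8, 26)
rows = [(0x002A, 0x00000001)]
ops = [Op(b"\x01\x02", name="APPEND_UNIQUE_INLINE"), Op(b"\xff\xfe")]
image = emit_class(bytes(header), rows, ops)
print(len(image), image[26:].hex())
```

Because every `Op` keeps its original bytes, an unedited class re-emits byte for byte, and only ops that a modder actually changes need a real encoder.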
Recent Research (2026-03-26)
- Root Cause:: The structuring pass left forward/back-edge loops and counted-loop headers detached in fallback output, which produced unstructured pseudocode for some bodies (notably BART slot 0x0F).
- Renderer Fixes:: Added a conservative loop-lifting helper and a restricted infinite-loop lift in the partial fallback renderer to fold loops into structured blocks where safe. See the modified renderer at tools/poc_crusader_usecode_parser.py.
- Validator Added:: A lightweight pseudocode syntax/label validator was added to detect brace mismatches and missing goto/label targets before exporting pseudocode.
- Tests:: Added and adjusted unit tests in tools/tests/test_usecode_structuring.py to guard loop-lifting behavior and fallback conservatism.
- Corpus Validation:: Ran a corpus-wide render+validator pass over 977 decoded bodies; result: TOTAL_BODIES=977, FAILURES=0 (no syntax/label failures).
- Real-World Output:: Regenerated the BART pseudocode file: USECODE/EUSECODE_extracted/pseudocode/BART/slot_0F_enterFastArea.txt now shows an outer while(true) with nested structured branches and counted loops instead of detached labels.
- Scope & Safety:: Fully-structured renderer remains conservative; the loop-lifting helper is reused where safe. The outer infinite-loop lift was narrowed to partial fallback after tests revealed regressions when it was too broad.
- Remaining Semantic Gap:: Expression/comparison operand polarity still needs correction (some counted-loop conditions show inverted comparisons). Next work: fix operand ordering in the expression builder so loop headers reflect correct comparison direction.
- Next Steps:: (1) Implement compare-direction fix in the expression builder and add small semantic regression tests, (2) re-run unit tests and a corpus-wide render+validate sweep, (3) regenerate affected pseudocode files for inspection.
- Files of Interest:: tools/poc_crusader_usecode_parser.py, tools/tests/test_usecode_structuring.py, USECODE/EUSECODE_extracted/pseudocode/BART/slot_0F_enterFastArea.txt.
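A validator of the kind described above can be quite small. The sketch below is not the validator in tools/poc_crusader_usecode_parser.py; it assumes labels appear alone on a line as `name:` and jumps as `goto name;`, and it does not try to skip braces inside comments or strings. `validate_pseudocode` is a hypothetical name.

```python
import re
from typing import List, Tuple

_LABEL_RE = re.compile(r"^\s*([A-Za-z_]\w*):\s*$")
_GOTO_RE = re.compile(r"\bgoto\s+([A-Za-z_]\w*)\s*;")


def validate_pseudocode(text: str) -> List[str]:
    """Return a list of brace and goto/label problems (empty means clean)."""
    errors: List[str] = []
    depth = 0
    labels = set()
    gotos: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), 1):
        label = _LABEL_RE.match(line)
        if label:
            labels.add(label.group(1))
        gotos += [(lineno, target) for target in _GOTO_RE.findall(line)]
        for ch in line:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:
                    errors.append(f"line {lineno}: unmatched closing brace")
                    depth = 0
    if depth > 0:
        errors.append(f"{depth} unclosed opening brace(s)")
    for lineno, target in gotos:
        if target not in labels:
            errors.append(f"line {lineno}: goto target '{target}' has no label")
    return errors


clean = "while (true) {\n  if (x) {\n    goto block_0454;\n  }\n}\nblock_0454:\n  return;\n"
broken = "if (x) {\n  goto block_9999;\n"
print(validate_pseudocode(clean))
print(validate_pseudocode(broken))
```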
Recent Renderer Work (2026-03-31)
- Opcode Status:: The map renderer was already loading the recovered JP opcode table from usecode_opcodes.txt; no additional opcode-name integration was required in this pass.
- VM Semantics Fix:: The JS renderer in src/lib/usecode-decompiler.js now follows the Pentagram/ScummVM VM for two core cases: opcode 0x24 cmp is equality, not inequality, and opcode 0x51 IF is a relative branch on false, not on true.
- Readability Impact:: False branches are now emitted with the negated high-level condition, so the existing structurer can recover counted loops as while (counter <= limit) instead of the previously inverted while (counter > limit) pattern.
- Regression Coverage:: Added a focused renderer-side regression script at scripts/test-usecode-structuring.mjs to guard one equality-based selector case and one counted-loop case.
- Next Steps:: Rebuild a fresh renderer usecode cache and inspect representative families like BART, _BOOT, and EVENT for any remaining cases where other compare producers still leak VM-oriented phrasing.
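The branch-on-false reading can be illustrated with a short sketch. It is written in Python for consistency with the other examples in this note rather than in the renderer's JavaScript, the function names are hypothetical, and it assumes the loop header's 0x51 jump targets the loop exit.

```python
NEGATED = {"==": "!=", "!=": "==", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def branch_on_false_edges(lhs: str, op: str, rhs: str) -> dict:
    """Conditions on each outgoing edge of a branch-on-false opcode.

    The jump is taken when the computed condition is false, so the taken
    edge carries the negation and the fall-through edge carries the
    condition exactly as computed.
    """
    if op not in NEGATED:
        raise ValueError(f"unsupported comparison: {op}")
    return {
        "fallthrough": f"{lhs} {op} {rhs}",
        "taken": f"{lhs} {NEGATED[op]} {rhs}",
    }


def render_counted_loop_header(lhs: str, op: str, rhs: str) -> str:
    """Loop body is the fall-through path; the taken edge leaves the loop."""
    edges = branch_on_false_edges(lhs, op, rhs)
    return f"while ({edges['fallthrough']})"


print(branch_on_false_edges("counter", "<=", "limit"))
print(render_counted_loop_header("counter", "<=", "limit"))
```

Reading the same opcode as branch-on-true would swap the two edges, which is how the inverted `while (counter > limit)` form arises.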
Recent Renderer Work (2026-04-01)
- Root Cause Closed:: TRIGGER.slot_20 was still falling back to labeled blocks in the map-viewer JS renderer because the full structurer did not preserve outer exit labels across nested regions, did not lift raw foreach_list ... -> exit iterator loops, and only recognized bare one-line return; blocks as return exits.
- Renderer Fixes:: src/lib/usecode-decompiler.js now propagates enclosing exit labels through nested structured regions, lifts raw foreach_list/foreach_slist loops into structured while (true) bodies, and treats comment-prefixed cleanup-plus-return blocks such as /* free_local_list */ + return; as real return targets for control-flow recovery.
- Readability Impact:: The Remorse cache file TRIGGER/slot_20_slot_20.txt now renders as one structured function: the initial phase/setup lane is straight-line if/else, the middle search fan-out is structured nested conditionals, the nearby 0x04B1 scan is a real for item in nearby_items(...) loop, and the follow-up low-priority trigger worklist is a structured fixed-point while (1) loop rather than detached block_XXXX labels.
- Regression Coverage:: scripts/test-usecode-structuring.mjs now covers three additional generic structuring cases and one real-data regression that decodes STATIC/EUSECODE.FLX, rebuilds the live TRIGGER.slot_20 IR, and asserts that the rendered pseudocode no longer falls back to block labels or goto block_... jumps.
- Binary / Ghidra Impact:: This pass tightened renderer-side control-flow recovery only. It did not add a new compiled-side VM decode, so no new Ghidra rename or comment was applied in CRUSADER.EXE during this batch.
- Additional Root Cause Closed:: BLASTPAC.slot_01 still kept loose blocks after the earlier trigger pass because the full structurer treated goto edges that jumped exactly to the current region end label as unstructured rather than as normal join exits. That blocked both the nearby_items(shape=0x053A, origin=global[0x003C]) selector loop body and the later target/crouch join chain from collapsing.
- Additional VM Evidence:: ScummVM's Crusader VM remains the strongest external semantics anchor for this lane: uc_machine.cpp shows opcode 0x51 as a relative branch on false, opcode 0x73 as loopnext pushing a loop-valid flag and freeing the temporary list when exhausted, and opcodes 0x75/0x76 as real foreach iterators that keep the loop frame live until completion and then pop it before jumping to the exit target.
- Additional Renderer Fix:: src/lib/usecode-decompiler.js now also treats jumps to the current structured-region end label as exits, which lets selector-loop bodies and nested join-heavy if/else regions close cleanly without falling back to raw block labels.
- Additional Readability Impact:: The Remorse cache file BLASTPAC/slot_01_use.txt now renders as straight structured pseudocode: the shape 0x053A search is a real for item in nearby_items(...) loop, the inner retry lane stays a structured counted loop, and the later target/InCrouch path is one nested if/else tree rather than detached block_0415, block_046e, block_05c5, and block_061d islands.
- Additional Regression Coverage:: scripts/test-usecode-structuring.mjs now adds one focused synthetic region-end goto regression plus one real-data BLASTPAC.slot_01 regression, and the current focused suite passes after regenerating the cache.
- Current Binary / Ghidra State:: The compiled-side anchor is still the existing 000d:ebe3 sequencer note, and this batch still did not recover a new compiled opcode handler. A matching live decompiler comment was added at 000d:ebe3 to record the ScummVM-backed loop/branch contract used by the current BLASTPAC/TRIGGER selector-loop recovery (0x51 false-branch, 0x73 loopnext validity/free behavior, 0x75/0x76 foreach iteration contract).
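The two exit rules from this pass (jumps to the current region's end label, and jumps to a label owned by an enclosing region) can be summarized as a small classifier. This is a simplified Python illustration with hypothetical names, not the logic in src/lib/usecode-decompiler.js, and the way labels are assigned to regions in the example is invented for demonstration.

```python
from typing import Iterable


def classify_goto(target: str, region_end_label: str,
                  enclosing_exit_labels: Iterable[str]) -> str:
    """Classify a goto edge seen while structuring one region.

    'join_exit'    : jumps exactly to the end of the current region
    'outer_exit'   : jumps to an exit label propagated from an outer region
    'unstructured' : anything else, which forces a labeled-block fallback
    """
    if target == region_end_label:
        return "join_exit"
    if target in set(enclosing_exit_labels):
        return "outer_exit"
    return "unstructured"


# Hypothetical region layout that reuses label names quoted in the note.
print(classify_goto("block_046e", "block_046e", ["block_061d"]))
print(classify_goto("block_061d", "block_046e", ["block_061d"]))
print(classify_goto("block_0415", "block_046e", ["block_061d"]))
```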
Recent Renderer Work (2026-04-01, list + selector follow-up)
- List Opcode Evidence Closed:: ScummVM's live Crusader VM in uc_machine.cpp confirms opcode 0x0E builds a new list from count stack values of element_size, and opcode 0x17 concatenates two list ids by appending the top list into the next list and pushing the combined result.
- Renderer Fixes:: src/lib/usecode-decompiler.js now lifts create_list into list literals such as [item] and append_list into list concatenation expressions instead of leaving raw comment placeholders. That closes the common temporary-worklist patterns in bridge/trigger/free scripts where the old output showed /* create_list */ and /* append_list */ immediately before an assignment.
- False-Branch Fix:: The same renderer pass now treats compound boolean expressions conservatively when inverting Crusader's 0x51 false-branch. For simple comparisons it still flips the operator directly, but for composed && / || expressions it now emits a whole-expression negation rather than corrupting the leftmost compare. This fixes the broken BRO_BOOT.slot_0F entry test that previously rendered as global[0x001f] != 2 || global[0x001f] == 3 ... even though the bytecode is a plain OR-chain of equality compares.
- Selector Readability:: Long same-selector equality ladders that share one join target now render as switch (...) blocks when every branch is a simple equality case. The immediate real-data win is BRO_BOOT/slot_0A_equip.txt, whose repeated global[0x001f] == N movie dispatch chain now decompiles as a switch instead of six else if arms.
- BRO_BOOT Structuring Impact:: With the compound-condition fix in place, BRO_BOOT/slot_0F_enterFastArea.txt is expected to collapse into one structured if/else around the two SPANEL scans plus the trailing infinite animation loop, instead of keeping entry: / block_0454-style fallbacks.
- Regression Coverage:: scripts/test-usecode-structuring.mjs now adds synthetic regressions for list-literal lifting, compound false-branch negation, and switch rendering, plus real-data regressions for BRO_BOOT.slot_0A and BRO_BOOT.slot_0F.
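The conservative negation rule and the equality-ladder test can both be sketched over plain condition strings. This is a Python illustration with hypothetical names; the real renderer works on its own expression IR, so the string matching here is only a stand-in, and it assumes conditions are already rendered as text.

```python
import re
from typing import List, Optional, Tuple

NEGATED = {"==": "!=", "!=": "==", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}
# One bare comparison: operand, operator, operand, nothing else.
_SIMPLE = re.compile(r"^\s*(\S+)\s*(==|!=|<=|>=|<|>)\s*(\S+)\s*$")


def negate(expr: str) -> str:
    """Flip a simple comparison; wrap anything compound in !( ... )."""
    if "&&" in expr or "||" in expr:
        return f"!({expr})"
    match = _SIMPLE.match(expr)
    if match:
        lhs, op, rhs = match.groups()
        return f"{lhs} {NEGATED[op]} {rhs}"
    return f"!({expr})"


def equality_ladder(conditions: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Return (selector, case values) if every arm is 'selector == value'."""
    selector = None
    cases: List[str] = []
    for cond in conditions:
        match = _SIMPLE.match(cond)
        if match is None or match.group(2) != "==":
            return None
        if selector is None:
            selector = match.group(1)
        elif match.group(1) != selector:
            return None
        cases.append(match.group(3))
    return (selector, cases) if cases else None


print(negate("global[0x001f] == 2"))
print(negate("global[0x001f] == 2 || global[0x001f] == 3"))
print(equality_ladder(["global[0x001f] == 1", "global[0x001f] == 2"]))
```

Wrapping the whole compound expression keeps the output correct at the cost of a slightly less idiomatic condition, which matches the conservative intent described above.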
Recent Renderer Work (2026-04-02, CHANGER selector close)
- Root Cause Closed:: CHANGER.slot_07 in both Remorse and Regret was still rendering as while (condition) because the JS loop-selector decoder only recognized the older field-match selectors such as nearby_items(shape=...) and the opaque selector_0x42(...) fallback. The CHANGER bodies use a different selector family: loopscr 0x24 plus loopscr 0x4c, with a hardcoded shape whitelist left on the stack, a computed search distance (100 * 32), and the egg item as origin. Because that selector family was not decoded, the renderer could not surface the roof-target scan clearly enough to decompile or visualize.
- Renderer Fix:: src/lib/usecode-decompiler.js now recognizes that stacked-shape whitelist selector and emits readable loops such as for roof in nearby_items(shapes=[...], distance=(100 * 32), origin=arg_06) instead of collapsing back to while (condition).
- Readability Impact:: The cached Remorse and Regret CHANGER.slot_07 pseudocode bodies now expose the actual nearby-roof selector inputs directly: the hardcoded roof-shape whitelist, the recovered 3200-unit range, and the egg-origin scan. That makes the later Item.getQLo(...) == eggId destroy branch legible without a raw-byte fallback pass.
- Editor Impact:: The same selector close justified promoting Regret QLo 8 -> CHANGER from tooltip-only metadata into the map editor overlay. The viewer can now expose the same local roof-target lane for Regret that was already proven for Remorse, using the recovered Regret whitelist and the same 3200-unit scan distance.
- Regression Coverage:: scripts/test-usecode-structuring.mjs now adds one synthetic regression for the loopscr 0x24/0x4c stacked-shape selector and one real-data regression for Regret CHANGER.slot_07, so future renderer changes fail if this selector family falls back to opaque loop output again.
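The shape of that rendered loop header can be sketched as a formatting helper. This is a Python illustration with a hypothetical function name, and the shape ids in the example are made-up placeholders, not the recovered Remorse or Regret whitelists. Only the `100 * 32` distance and the `arg_06` origin come from the notes above.

```python
from typing import Sequence, Tuple


def render_nearby_items_loop(var: str, shapes: Sequence[int],
                             distance_factors: Tuple[int, int],
                             origin: str) -> str:
    """Format a stacked-shape whitelist selector as a readable loop header."""
    shape_list = ", ".join(f"0x{shape:04X}" for shape in shapes)
    a, b = distance_factors
    return (f"for {var} in nearby_items(shapes=[{shape_list}], "
            f"distance=({a} * {b}), origin={origin})")


def scan_distance(distance_factors: Tuple[int, int]) -> int:
    """Evaluate the computed search distance left on the stack."""
    a, b = distance_factors
    return a * b


PLACEHOLDER_SHAPES = [0x0001, 0x0002]  # illustrative only, not real shape ids
print(render_nearby_items_loop("roof", PLACEHOLDER_SHAPES, (100, 32), "arg_06"))
print(scan_distance((100, 32)))
```

Keeping the distance as an unevaluated product in the header preserves what the bytecode actually computes, while the evaluated value is what an overlay or tooltip would show.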