MaddoScientisto daa363c3d2 Add 'annotate-usecode' command to import USECODE IR JSON annotations

- Introduced a new command 'annotate-usecode' to import USECODE IR JSON annotation hints as Ghidra comments on compiled anchors.
- Added argument parsing for multiple IR JSON files, comment type selection, and a dry-run option.
- Implemented logic to read annotation records from the provided IR files and set comments on the corresponding addresses in Ghidra.
- Enhanced JSON schema to include response structure for the new command.

2026-03-24 18:14:20 +01:00


Raw 000e: Parser & RIFF/Animation Clusters

Content extracted from crusader_decompilation_notes.md. Covers the 000e: segment parser helper cluster and the RIFF/AVI animation streaming subsystem.

Raw 000e Parser Helper Cluster

A small helper cluster in the raw 000e: area implements a fixed-size CRLF record parser/table builder, likely used by startup/config or script-ish text data.

Newly renamed helpers

Address	Name
`000e:345e`	`record_table_init`
`000e:34cc`	`record_table_destroy`
`000e:35c6`	`record_table_release_buffer`
`000e:35ef`	`record_table_next_slot`
`000e:3639`	`record_table_parse_buffer`
`000e:3798`	`record_parser_read_line`
`000e:38a0`	`record_parser_seek_next_marker`
`000e:38f8`	`record_parser_find_marker`
`000e:39cc`	`record_parser_dispatch_at_directive`

Behavior notes

- record_table_init clears the table header and zeroes 300 words of inline storage.
- record_table_parse_buffer walks a CRLF-separated text buffer, captures each line, splits around a marker helper path, and stores parsed entry state into 0x0c-byte records.
- record_parser_read_line advances to the next CRLF-delimited line, rejects lines that start with @ or with non-identifier punctuation, and terminates the line in-place with 0.
- record_parser_seek_next_marker updates the parser's current marker cursor at +0x18/+0x1a by calling record_parser_find_marker; returns 1 if another marker was found, 0 at end-of-data.
- record_parser_find_marker scans forward until an @ marker or end-of-data; optionally consumes the remaining length from the parser state.
- record_parser_dispatch_at_directive returns 0 unless the current substring begins with @; in the @ case, it advances by 7 bytes and dispatches through a FAR thunk (0000:ffff).
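
The line filter and marker scan described above can be modeled in a few lines of Python. This is a minimal sketch, not a transcription of the compiled helpers: the function names are hypothetical, "identifier start" is assumed to mean an ASCII letter, digit, or underscore, and skipping empty lines is also an assumption.

```python
def iter_record_lines(buf: bytes):
    """Yield the lines a record_parser_read_line-style filter would accept."""
    for raw in buf.split(b"\r\n"):
        if not raw:
            continue  # assumption: empty lines carry no record
        first = raw[:1]
        if first == b"@":
            continue  # '@' lines belong to the directive path, not the record path
        # assumption: an identifier starts with an ASCII letter, digit, or '_'
        if not (first.isalnum() or first == b"_"):
            continue
        yield raw


def find_marker(buf: bytes, start: int = 0) -> int:
    """Return the index of the next '@' at or after start, or len(buf) at end-of-data."""
    pos = buf.find(b"@", start)
    return len(buf) if pos < 0 else pos
```

The generator form replaces the in-place 0 termination of the original, which has no natural equivalent on immutable Python bytes.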

EUSECODE.FLX extraction notes

USECODE/EUSECODE.FLX does not look like a loadable code image or plain text script. It is now validated as an indexed binary container.
Current table model:
- entry count at file offset 0x54
- entry table at 0x80
- 8-byte records: <u32 data_offset, u32 declared_size>
- entry_count = 3074
- table_end = 0x6090, which matches the first non-zero payload offset
- 403 non-zero entries in the current file
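
The table model above is small enough to sketch directly. The following is an illustrative reader, not the code in tools/extract_eusecode_flx.py: the function name is hypothetical, the entry count is assumed to be a little-endian u32, and an entry is assumed to count as non-zero when either field is non-zero. The arithmetic does agree with the notes: 0x80 + 3074 * 8 = 0x6090.

```python
import struct

COUNT_OFFSET = 0x54   # entry count location (from the table model)
TABLE_OFFSET = 0x80   # first 8-byte record
RECORD_SIZE = 8       # <u32 data_offset, u32 declared_size>


def parse_flx_table(data: bytes):
    """Return (entry_count, table_end, non_zero_entries) for an FLX-style index."""
    # assumption: the count is stored as a little-endian u32
    (count,) = struct.unpack_from("<I", data, COUNT_OFFSET)
    table_end = TABLE_OFFSET + count * RECORD_SIZE
    if table_end > len(data):
        raise ValueError("entry table runs past end of file")
    entries = []
    for index in range(count):
        offset, size = struct.unpack_from("<II", data, TABLE_OFFSET + index * RECORD_SIZE)
        if offset or size:  # assumption: either field non-zero marks a live entry
            entries.append((index, offset, size))
    return count, table_end, entries
```

A cheap sanity check on a real file is that table_end equals the smallest non-zero data_offset, which is the same consistency test the notes use to validate the model.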
tools/extract_eusecode_flx.py now parses the full validated table and emits all 403 non-zero entries under USECODE/EUSECODE_extracted/, including entry_index.tsv, descriptor_index.tsv, descriptor_neighborhoods.tsv, summary.json, per-chunk .bin, and .strings.txt sidecars.
The extractor now also carries the conservative owner-loaded class rule directly into machine-readable outputs: class_layout_index.tsv records object_index, class_id, the raw bytes-8..11 field, derived code_base_minus_one, and conservative_event_count, while class_event_index.tsv expands parsed classes into raw 6-byte event rows with slot numbers, ScummVM event-name hints for 0x00..0x1f, unresolved leading words, raw code-offset dwords, derived body-window columns, and conservative repeated-template status tags for the verified repeated families.
The extractor now emits one concrete generated per-class decompile artifact for the cleanest repeated lane too: boot_family_decompile.md / .tsv render the five _BOOT classes slot-by-slot with raw row bytes, derived body windows, repeated-template status, and stable body digests.
The generated reports now expose lightweight descriptor summaries (primary_label, field_names, field_tags) so the object lane can be searched by field grammar instead of only by raw names.
The extracted data now separates into at least two lanes:
- text-heavy records that fit the 000e: CRLF parser model, such as DATALINK mission/objective text and TEXTFIL1 message banks
- binary object/behavior descriptors whose sidecars expose object names and field names, such as EVENT, NPCTRIG, CRUZTRIG, TRIGPAD, JELYHACK, JELYH2, SPECIAL, SURCAMNS, and SURCAMEW
The descriptor lane also shows a repeatable tagged field trailer rather than raw trailing strings only. Current spot-checks show patterns like 69 xx 00 <name> and 24 xx 02 <name> immediately before field names in NPCTRIG, CRUZTRIG, TRIGPAD, SPECIAL, and SFXTRIG. This is strong evidence that the field names belong to compact per-field metadata records, not accidental string leakage.
The strongest currently stable tag readings are:
- 69:0000 -> referent
- 69:0A00 -> event on event-capable classes such as EVENT, NPCTRIG, COR_BOOT, REE_BOOT, SFXTRIG, FLAMEBOX, NOSTRIL, VAR_BOOT, and STEAMBOX
- 24:FE02 / 24:FC02 / 24:FA02 on object-reference-like fields such as item, elev, door, source, dest, monster1, deadGuy, and related referent-style links
- 24:0A02 -> eventTrigger on SURCAMNS / SURCAMEW
The tag report is not a full type system yet, but it is already enough to separate scalar/event slots from pointer-like object links in many descriptor classes.
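
As a rough illustration of that scalar-versus-link split, the tag notation used here (for example 69:0A00) can be parsed into three bytes and bucketed on the first and last byte. This is only a sketch under assumptions: the helper names and the bucket labels are hypothetical, and the meaning of the middle byte is left open.

```python
def parse_tag(text: str) -> bytes:
    """Turn the notes' 'KK:MMSS' notation (e.g. '69:0A00') into three raw bytes."""
    kind, rest = text.split(":")
    return bytes.fromhex(kind + rest)


def classify_field_tag(tag: bytes) -> str:
    """Bucket a 3-byte field tag by its leading and trailing byte."""
    kind, _middle, suffix = tag  # the middle byte's meaning is not settled
    if kind == 0x69 and suffix == 0x00:
        return "scalar-or-event"   # e.g. 69:0000 referent, 69:0A00 event
    if kind == 0x24 and suffix == 0x02:
        return "object-link"       # e.g. 24:FE02 / 24:FC02 / 24:FA02
    return "unknown"
```

Note that 24:0A02 (eventTrigger on SURCAMNS / SURCAMEW) lands in the object-link bucket under this rule, so the bucket alone does not distinguish callback holders from plain referent-style links.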
Confirmed descriptor examples from the full index:
- EVENT: referent,event,item,source,dest,door,counter,counter2,link,time,post1,post2,floor,flicMan
- NPCTRIG: referent,event,item,item2,typeNpc
- CRUZTRIG: referent,item,elev
- TRIGPAD: referent,item,elev
- JELYHACK: referent
- JELYH2: referent
- SURCAMNS / SURCAMEW: referent,textFile,monit,valueBox,passcode,link,code,screen,cameraEgg,trueRef,therma,eventTrigger,foundGun
Current immortality-lane status inside EUSECODE:
- the trigger/object namespace now clearly includes JELYHACK, NPCTRIG, CRUZTRIG, and TRIGPAD
- JELYHACK / JELYH2 sit in a local extraction neighborhood beside SPECIAL, TRIGPAD, DATALINK, HOFFMAN, REE_BOOT, SURCAMEW, and SFXTRIG, which looks more like a map/object grouping than random table order
- that neighborhood does not make JELYHACK itself event-bearing, but it does place it immediately beside multiple event-capable or trigger-adjacent classes (REE_BOOT, SFXTRIG, SURCAMEW.eventTrigger)
- the requested descriptor-family sweep now sharpens the nearby callable-body picture too: NPCTRIG is the only requested family here that is both explicitly event-bearing and non-empty in class_event_index.tsv (equip at slot 0x0a, plus anonymous slot 0x20), while SPECIAL, TRIGPAD, and REB_PAD have callable bodies but still look like state/controller or referent-neighbor records rather than direct event carriers
- the new generated immortality_target_body_scan.md / .tsv report now scans EVENT, NPCTRIG, COR_BOOT, REE_BOOT, SFXTRIG, SPECIAL, and TRIGPAD body windows directly for inline little-endian 0x0410, dword 0x00000410, and byte-swapped 0x1004
- that scan found zero literal hits in every currently targeted body, so no extracted target body is yet tied directly to event 0x410 by immediate-value evidence
- the TELEPAD slot-0x20 row with raw_code_offset = 0x00000410 in class_event_index.tsv is now closed as an offset collision, not proof that TELEPAD emits gameplay event 0x410
- the new body scan also narrows the frontier structurally: EVENT remains one monolithic slot-0x0a body (8150 bytes), NPCTRIG remains the strongest compact trigger frontier with slot 0x0a (373 bytes) plus slot 0x20 (345 bytes), and _BOOT slot pairs (COR_BOOT/REE_BOOT) stay near-template bodies rather than unique immortality emitters
- SPECIAL and TRIGPAD are now stronger negative controls too: both still have callable bodies, but the new literal scan found no inline 0x410 evidence there either
- the practical blocker is now narrower: the extractor no longer stops at body offsets only, but it still does not decode emitted payload values or bytecode operands inside the surviving EVENT slot-0x0a and NPCTRIG slot-0x0a / 0x20 frontier bodies
- one exact 0x410 collision in compiled code is now explained away: 000e:0953 pushes 0x410 into imported ASYLUM.27 from the animation audio-subframe path immediately after setting the local audio-completion byte at +0xef1. Since ASYLUM.DLL is the ASS_* audio/media library, treat this as a media ordinal/value collision rather than a second gameplay or USECODE event source.
- the present best reading is that 0x410 is likely carried by data relationships between generic event-capable descriptors (EVENT, NPCTRIG, SFXTRIG, etc.) and map/object references rather than by a plain-text script line
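
The three literal forms named in the body scan above reduce to three byte patterns. The sketch below shows one way such a scan could be written; it is not the extractor's implementation, and the pattern labels and function name are hypothetical.

```python
# Byte patterns for the three literal forms of 0x410 named in the scan.
PATTERNS = {
    "le_word_0x0410": bytes.fromhex("1004"),           # 0x0410 stored little-endian
    "le_dword_0x00000410": bytes.fromhex("10040000"),  # 0x00000410 stored little-endian
    "swapped_word_0x1004": bytes.fromhex("0410"),      # 0x1004 stored little-endian
}


def scan_literals(body: bytes) -> dict:
    """Return {pattern_name: [offsets]} for every (possibly overlapping) hit in body."""
    hits = {}
    for name, pattern in PATTERNS.items():
        offsets = []
        pos = body.find(pattern)
        while pos >= 0:
            offsets.append(pos)
            pos = body.find(pattern, pos + 1)
        hits[name] = offsets
    return hits
```

A raw byte scan like this cannot tell an operand from a coincidental byte pair, so a zero-hit result is stronger evidence than a hit would be.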
The 000e: record parser helpers still matter, but they now appear to cover only the text-oriented subset rather than the entire FLX payload. The strongest concrete caller so far is the raw window at 000e:1b9f..1d49, where record_table_parse_buffer is invoked after setup of fields that match the known animation object layout (+0x117/+0x11b/+0x11f/+0x123, +0xeaf/+0xeb1, +0x10f/+0x111). That makes the currently verified 000e:3639 consumer part of the animation-object lane, not a clean standalone EUSECODE loader.
This shifts the current working model: treat record_table_parse_buffer as a text/metadata helper used by at least one animation/resource object, while the EUSECODE binary descriptor lane is more likely consumed by the 000d VM/object interpreter path.
That 000d path is now materially less anonymous:
- the global runtime object at 0x6611 is now named entity_vm_runtime_create / entity_vm_runtime_init_slots / entity_vm_runtime_release_slots / entity_vm_runtime_destroy
- it owns the 0x80-entry slot table and a retained owner/resource object at +0x1315/+0x1317
- entity_vm_slot_index_from_entity and entity_vm_context_try_create_masked_for_entity show that gameplay entities are filtered through one owner-side slot-mask table before a context is created
- entity_vm_context_try_create_masked_for_entity is now better constrained too: after the owner-side mask check succeeds, an immediate-flagged context result clears the caller output word while an object-backed result returns the created object's low word
- entity_vm_context_create_from_slot_index then seeds one 0x6714 context from entity_vm_slot_load_value_plus_offset, while the large callers at 000d:208b and 000d:21ed continue by reading bytecode-like data from the seeded +0xd6/+0xd8 lane
The context lane now also has a separate referent-registry subsystem:
- entity_vm_set_field_da_to_global writes the current referent id to 0x8c94 from context field +0xda and then enters the still-misaligned 000c:3350 body
- entity_vm_referent_registry_init / entity_vm_referent_registry_destroy / entity_vm_referent_registry_alloc / entity_vm_referent_registry_release_by_id / entity_vm_referent_registry_free_node show that 0x8c8c/0x8c8e/0x8c90/0x8c94 implement one free-list-backed registry keyed by that current referent id
- this is the first solid runtime mechanism showing how referent-only descriptors can still drive script state even when the actual event field lives in a separate neighboring descriptor
- the registry now also has a named chain container layer: entity_vm_referent_chain_copy, entity_vm_referent_chain_append_unique_from, entity_vm_referent_chain_contains_entry, entity_vm_referent_chain_get_entry_data_at, and entity_vm_referent_chain_get_indirect_data show that one referent can own copied/deduplicated payload chains with either inline fixed-size payloads or indirect string-like payloads
That chain layer is now less one-sided than before:
- entity_vm_referent_chain_remove_matching_from (000d:6a9a) removes entries from one chain when they match a second chain, using either inline compare or indirect string compare depending on the chain type byte
- entity_vm_referent_chain_set_entry_data_at (000d:6cf6) updates the payload of the Nth chain entry in place, freeing old indirect payload storage first when needed
- entity_vm_opcode_finish (000d:3350) is now identified as the common opcode epilogue that writes 0x8c94 from the current frame result and unwinds the temporary slot-array state before returning the opcode result
That makes the emerging human-readable script model less ad hoc. A plausible future IR is now: referent anchor -> payload chain(s) -> event-bearing attachment(s) rather than a flat list of isolated descriptor rows.
The opcode side now reinforces that IR too: at least one handler family around 000d:0988 can either append unique payload entries or remove matching ones before returning through the same epilogue, which is a better fit for a graph-editing/object-attachment VM than for a pure linear trigger list.
That 000d:0988 family is now classified more tightly at the opcode-id level:
- opcode 0x19 = append unique indirect/string-like payload entries into the referent chain
- opcode 0x1a = remove matching indirect/string-like payload entries from the referent chain
- opcode 0x1b = remove matching inline/fixed-size payload entries from the referent chain
- the same helper body also implies the missing sibling 0x18 as the inline/fixed-size append-unique form, because only 0x19/0x1a set the indirect compare flag while only 0x1a/0x1b take the removal path
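
The opcode classification above can be written as a two-flag table. The model below is a sketch only: Python lists stand in for referent chains, plain equality stands in for both the inline and the indirect string compare, and the 0x18 row is the inferred sibling rather than an observed handler.

```python
# opcode -> (indirect_compare, remove); 0x18 is inferred, not observed
CHAIN_OPCODES = {
    0x18: (False, False),  # inline append-unique (implied sibling)
    0x19: (True, False),   # indirect append-unique
    0x1A: (True, True),    # indirect remove-matching
    0x1B: (False, True),   # inline remove-matching
}


def apply_chain_opcode(opcode: int, chain: list, operand: list) -> list:
    """Return a new chain after applying one append-unique or remove-matching opcode."""
    _indirect, remove = CHAIN_OPCODES[opcode]
    if remove:
        # drop every chain entry that also appears in the operand chain
        return [entry for entry in chain if entry not in operand]
    result = list(chain)
    for entry in operand:
        if entry not in result:  # append-unique: skip duplicates
            result.append(entry)
    return result
```

In this model the indirect flag changes only how two entries would be compared, which matches the description of one shared helper body serving all four opcodes.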
The first concrete 000c to 000d bridge inside that lane remains entity_vm_set_value_from_slot_plus_offset at 000c:f95f: it calls entity_vm_slot_load_value_plus_offset, stores its return pair into object fields +0xd6/+0xd8, and sits immediately beside other entity_vm_* helpers in the 000c:f6b8..f9d9 mini-VM cluster. On the 000d side, entity_vm_slot_load_value_plus_offset wraps entity_vm_slot_load_value, but the old PUSH 0x410 suspicion at 000d:5290 is now rejected: that site reaches the seg091 fatal-report helper family at 000a:44fd, not live gameplay dispatch.
The two main 000d caller blocks beneath that bridge now have a first stable byte/value reading too:
- internal block 000d:208b is the simple materialize-or-forward path: it creates one VM context from the caller's stream state, checks the returned object flags, and either writes the returned value pair straight to the caller output slot or forwards the created object's low word through the shared opcode epilogue
- internal block 000d:21ed is the inline-payload path: it creates the same VM context, prepends the caller-owned blob into the backward-growing context buffer at +0x102, then consumes two bytes from the seeded +0xd6/+0xd8 lane as small shape/count metadata before building an entity_link closure matrix from the following caller-stream words and pushing back the non-0x0400 results
- that is the first concrete evidence that the +0xd6/+0xd8 lane is not only carrying immediate event/value ids; it also carries compact metadata bytes that parameterize larger inline payloads copied from the caller stream
Current JELYHACK implication: because JELYHACK and JELYH2 still expose only referent, the most defensible model is now that they provide map/object identity into the referent-registry lane, while one adjacent event-capable record (REE_BOOT, SURCAMEW.eventTrigger, SFXTRIG.event, or another nearby generic EVENT/NPCTRIG) carries the actual event semantics that can eventually reach 0x410.
The immediate runtime-owner writer is now pinned down one step further too. entity_vm_runtime_create (000d:4c99) is the only verified writer of runtime +0x1315/+0x1317, and it does so by calling newly recovered entity_vm_runtime_owner_resource_create (000d:7000). That helper does not simply copy a caller-supplied owner table: it constructs one embedded seg069/070 helper object, queries the needed table size through vtable +0x04, allocates child +0x10/+0x12, then fills the 0x0d-stride per-slot producer records through vtable +0x0c. The paired release path is entity_vm_runtime_owner_resource_destroy (000d:70fd).
That narrows the owner/resource classification safely but still stops short of speculative source-format naming. The embedded helper goes through the same seg069/070 object lifecycle used by other file/resource-style helpers (0009:1c00 init, 0009:1800 destroy), so the most defensible current description is still runtime owner/resource helper rather than USECODE file loader or a descriptor-specific name.
The first gameplay-side mask families around entity_vm_context_try_create_masked_for_entity are also now explicit from instruction evidence:
- local wrapper 0004:f033 passes slot mask 0x8000:0007
- FUN_0004_f05c passes slot mask 0x2000:0015 and is reached from 0004:f2b3 after overlap/proximity checks plus entity byte +0x32 state toggling
- FUN_0005_27a4 passes slot mask 0x0001:0000 and is reached from the 000c:a09e entity +0x5b bit-0x0004 branch
Those masks are enough to prove that the runtime is exposing multiple gameplay-side materialization lanes into the same owner/resource table, but they are not yet enough to tie one lane specifically to the JELYHACK/JELYH2 anchor pair instead of the neighboring event-bearing descriptors (REE_BOOT, SURCAMEW, SFXTRIG, or another local trigger record).
The extractor now emits a first graph-oriented view of that claim too: referent_anchor_event_graph.tsv groups referent-bearing rows with nearby event-bearing neighbors, and jelyhack_island_graph.md renders the JELYHACK / JELYH2 island as edges to local descriptors. On the current data, the strongest event-bearing neighbors in that island are REE_BOOT (event), SURCAMEW (eventTrigger), and SFXTRIG (event).
The new focused comparison report (jelyhack_descriptor_compare.tsv) makes one more structural point explicit: JELYHACK and JELYH2 have identical first 16 header words and the same lone referent field tag, while differing only in the label string and one small trailing wx[...] literal. That strengthens the reading that they are sibling referent-anchor classes rather than separate event-bearing behavior records.
The same comparison also helps separate anchor classes from event-bearing neighbors: REE_BOOT, SURCAMEW, and SFXTRIG all carry materially richer header/state patterns than JELYHACK / JELYH2, which is consistent with them holding actual trigger or attachment semantics beside the anchor-only classes.
The 000d:21ed callee chain is now tighter too. The nested call at 0008:7d27 is entity_link, which appends one entity id into another entity's word-list and, unless bit 0x0400 is set, also updates the reciprocal pair-link slots. So the 22bc..2433 opcode block is best understood as building a bidirectional entity-link closure matrix from streamed entity ids, not merely copying opaque words around.
Ghidra now carries that interpretation as a conservative disassembly comment at 000d:22bc, but not yet as a symbol rename, because the surrounding 000d:208b/21ed/22bc region is still mis-split into artificial function bodies.
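
A toy model of the entity_link behavior described above may help when reading the 22bc..2433 block. Everything here is an assumption beyond the two stated facts (append into the target's word-list, and skip the reciprocal update when bit 0x0400 is set): the dictionary representation, the flags parameter, and treating the reciprocal pair-link slots as a reverse list entry are all hypothetical.

```python
ONE_WAY_BIT = 0x0400  # where this bit is carried at runtime is an assumption


def entity_link(links: dict, target: int, entity_id: int, flags: int = 0) -> None:
    """Append entity_id to target's link list; add the reverse link unless one-way."""
    links.setdefault(target, []).append(entity_id)
    if not flags & ONE_WAY_BIT:
        # modeled reciprocal update: the linked entity also records the target
        links.setdefault(entity_id, []).append(target)


def build_link_matrix(entity_ids: list) -> dict:
    """Link every ordered pair once, giving a bidirectional closure over the ids."""
    links = {}
    for i, a in enumerate(entity_ids):
        for b in entity_ids[i + 1:]:
            entity_link(links, a, b)
    return links
```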
The new EVENT-focused reports (event_island_graph.md, event_descriptor_compare.tsv) broaden the descriptor-side picture beyond the JELYHACK anchor case. The strongest second island is the compact local cluster at indices 186..195, where COR_BOOT, EVENT, and NPCTRIG all expose explicit 69:0A00 -> event tags while ROLL_NS, CRUZTRIG, NPC_ONLY, and VMAIL stay on the referent/link/text side.
That cluster looks structurally different from JELYHACK in a useful way: EVENT is the large hub payload (0x20AA) carrying source, dest, door, link, time, counter, post1, post2, floor, and flicMan, while COR_BOOT and NPCTRIG are smaller event-bearing satellites and the surrounding records (ROLL_NS, CRUZTRIG, NPC_ONLY, VMAIL) look like attached state/trigger/object descriptors rather than alternate event cores.
The first compare pass on that island is already informative. COR_BOOT, EVENT, CRUZTRIG, NPC_ONLY, and VMAIL share the same leading 0x00000000 dword class shape, NPCTRIG moves to a nearby 0x00000001 shape, and ROLL_NS is the obvious outlier with first dword 0x00000002 plus rider/time/cargo fields. So the present best reading is one three-node event-bearing core embedded inside a wider referent-neighbor island, not one flat run of equivalent trigger records.
The extractor now also emits a global event-family pass (event_family_index.tsv, event_family_summary.md), which turns the local island findings into a wider descriptor taxonomy. Current validated families are:
- event-hub: EVENT
- boot-event-core: AND_BOOT, BRO_BOOT, COR_BOOT, VAR_BOOT, REE_BOOT
- npc-trigger: NPCTRIG
- minimal-event-core: SFXTRIG
- environmental-event: FLAMEBOX, NOSTRIL, STEAMBOX
- callback-eventtrigger: SURCAMNS, SURCAMEW
That split matters because it is the first extractor-backed distinction between active event carriers and callback-only trigger holders. The 69:0A00 -> event classes now look like the active event-bearing core of the descriptor system, while the surveillance classes with 24:0A02 -> eventTrigger are better treated as callback/attachment endpoints rather than peer event hubs.
The extractor now emits a stronger script-facing bridge artifact too: runtime_descriptor_family_rankings.md / .tsv rank those descriptor families against the verified runtime lanes instead of only listing neighborhoods. Current best fit is EVENT as the strongest active-event payload lane, _BOOT cores and NPCTRIG as strong satellites, SFXTRIG / environmental classes as moderate active-event fits, JELYHACK / JELYH2 as the dedicated referent-anchor lane, and SURCAM* as structurally distinct callback/attachment holders.
That ranking is anchored by the current owner-loader evidence as well as the descriptor grammar: 000d:44df -> 000d:4c99 -> 000d:7000 supplies the slot-backed source, and raw seg070 windows 0009:67b6 / 0009:6916 now show the embedded helper walking object +0x10/+0x18 tables, formatting per-entry paths, and loading files through an open/read/close sequence before the 0x0d-stride owner records are materialized.
The next focused pass tightened the _BOOT lane too. boot_family_compare.tsv now shows that all five _BOOT event cores (AND_BOOT, BRO_BOOT, COR_BOOT, VAR_BOOT, REE_BOOT) share the same header skeleton and the same compact field shape (referent,event,counter,item). The meaningful differences are payload size and local neighborhood, not descriptor schema.
The new boot_frontier_graph.md makes the best early _BOOT frontier explicit: AND_BOOT and BRO_BOOT sit in one compact referent-heavy neighborhood (OFFWORK, GUARD, GDOOR_N, GDOOR_E, BIGCAN, CRUMORPH, GUARDSQ, CARD_NS, CARD_EW, EWALLEW/EWALLNS) and also point directly at each other as adjacent event-bearing siblings. So the present best reading is a reusable boot-event core template instantiated in several different local object islands, not a set of unrelated boot scripts.
The environmental hazard lane is now similarly constrained. environmental_family_compare.tsv shows that FLAMEBOX and STEAMBOX are close structural siblings with the same active-event backbone (referent,event,<hazard>,<hazard2>,direction,count) and matching 24:0A02 / 24:FC02 / 24:FE02 object-link pattern, while NOSTRIL is a smaller fire-specific variant that keeps the active event plus dual fire references and count fields but drops the direction/newType side.
Their neighborhoods are different enough to matter: environmental_event_graph.md shows FLAMEBOX embedded among vent/door/bridge/copy records, NOSTRIL among flame/pad/desk/blaster/keypad records, and STEAMBOX among bounce/hover/fade/steam/flame box records. So this looks like one hazard-event descriptor family reused across distinct local object islands rather than one single environmental mega-cluster.
The callback lane is tighter too. callback_trigger_compare.tsv confirms that SURCAMNS and SURCAMEW are effectively the same callback-trigger template: identical field set (referent,textFile,monit,valueBox,passcode,link,code,screen,cameraEgg,trueRef,therma,eventTrigger,foundGun) and identical tag grammar except for the therma slot offset (24:F102 vs 24:F602). That keeps the eventTrigger split credible as a true callback/attachment lane rather than only a spelling variation on active event carriers.
Mining the new class_layout_index.tsv / class_event_index.tsv outputs now gives a first small safe set of repeated non-zero slot patterns:
- JELYHACK and JELYH2 are exact referent-anchor twins at the event-table level too: both have only slot 0x01 non-zero, with the same row 0x002A / 0x00000001.
- The five _BOOT event cores (AND_BOOT, BRO_BOOT, COR_BOOT, REE_BOOT, VAR_BOOT) all share the same three-slot pattern 0x0A / 0x0F / 0x10. The clearest exact repeated row is slot 0x10, where all five use raw_event_entry_word = 0x003B with class-specific code offsets.
- SURCAMNS and SURCAMEW share one exact five-slot callback pattern 0x01 / 0x0A / 0x20 / 0x21 / 0x22, including the same 0x0A = 0x00D1 / 0x00000001 anchor row and the same 0x22 event-table word 0x01A3.
- FLAMEBOX, NOSTRIL, and STEAMBOX share one environmental-event pattern 0x0A / 0x20 / 0x21, which is enough to treat the higher slots as real repeated structure even though the exact row values differ by class.
- EVENT and SFXTRIG both participate in the wide 0x0A lane, but that lane is broad enough that the slot number is currently more trustworthy than the ScummVM label attached to it.
The next body-window pass now confirms that repeated slot rows are usually near-templates rather than clones. Using body_start = code_base_minus_one + raw_code_offset and the next non-zero slot offset or chunk EOF as the body end:
- JELYHACK and JELYH2 slot 0x01 are both 42 bytes long with a shared 10-byte prefix and 17-byte suffix, but are not byte-identical.
- _BOOT slot 0x10 is a clean short-template lane: all five bodies are exactly 59 bytes long, share the same first 5 bytes and last 17 bytes, but each has a distinct digest.
- _BOOT slots 0x0A and 0x0F are larger variants of the same pattern: shared suffix-heavy structure, class-local middles, no exact clones.
- SURCAMNS and SURCAMEW slots 0x20 and 0x22 are same-length near-templates (698 and 419 bytes respectively), while slot 0x21 diverges more strongly (1801 vs 1621 bytes) even though it still keeps a common tail.
That makes the current best human-readable script model more precise: preserve repeated-family status and exact row bytes, but record byte-identity as a separate property so “same slot template” does not get mistaken for “same compiled body.”
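
The body-window rule and the prefix/suffix comparison can be sketched as follows. This is not the extractor's code: the function names are hypothetical, a zero raw_code_offset is assumed to mean an empty slot, and "next non-zero slot offset" is assumed to mean next in offset order rather than next in slot-number order.

```python
def body_windows(code_base_minus_one: int, slot_offsets: dict, chunk_size: int) -> dict:
    """Map slot -> (body_start, body_end) using the next body start or chunk EOF as the end."""
    starts = sorted(
        (code_base_minus_one + offset, slot)
        for slot, offset in slot_offsets.items()
        if offset  # assumption: a zero raw_code_offset means the slot is empty
    )
    windows = {}
    for i, (start, slot) in enumerate(starts):
        end = starts[i + 1][0] if i + 1 < len(starts) else chunk_size
        windows[slot] = (start, end)
    return windows


def shared_prefix_suffix(a: bytes, b: bytes) -> tuple:
    """Return (common_prefix_len, common_suffix_len) without letting the two overlap."""
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, suffix
```

Keeping the digest, the shared prefix length, and the shared suffix length as separate columns is what lets a report say "same template, different body" without collapsing the two properties.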
That pattern pass materially improves what a decompiled USECODE script can look like right now. The honest current form is not a pretty source language; it is a reversible descriptor-plus-event-table rendering with raw slot ids, raw event-entry words, raw code offsets, and optional ScummVM labels marked as hints only. The concrete examples now live in docs/usecode-roundtrip-ir.md and are grounded in readable_script_ir.md, readable_descriptor_templates.md, and runtime_descriptor_family_rankings.md.
The first runtime-side follow-through on those descriptor gains is now a little tighter too. Instruction search around 000d:ebe3 confirms one fixed sequenced VM/opcode driver body, not just a vague constructor helper: it calls 000d:177c, 000d:1acb, 000d:0988, the internal 000d:22bc link-matrix block, then 000d:1d4a and 000d:2104 in order. The key negative result is just as useful: 000d:ec31 is only the internal CALL 000d:22bc site inside that body, not a standalone function entry.
Ghidra now carries that as a conservative disassembly comment at 000d:ebe3. That is still short of a safe rename, but it does promote the lane from “suspected constructor chain” to “verified ordered opcode/handler sequence,” which is the clearest current bridge from the descriptor-side event families back into the 000d VM/object runtime.

Raw 000e RIFF/Animation Cluster

The 000e: segment contains a RIFF/AVI streaming animation subsystem.

Animation object field map

Field offsets relative to the object base pointer:

Offset	Field
`+0xb0`	active/valid flag
`+0xb4`–`+0xc2`	constructor-initialized flags
`+0xd4`	alive sentinel (must be `-1` for "alive")
`+0xe4`	paused flag (`0` = running)
`+0xeaf`/`+0xeb1`	far pointer to current RIFF chunk
`+0xedb`	animation frame stack depth counter (max 9)
`+0xee1`	frame data from current chunk `+4`
`+0xeef`	current subframe index
`+0x1b3`	subframe count
`+0xef1`	audio completion flag
`+0x11b`	ring buffer write pointer
`+0x11f`	ring buffer read pointer
`+0x117`	ring buffer base
`+0x123`	ring buffer end (capacity boundary)
`+0x102`	resource pointer
`+0xde`	entry index (multiplied by `0x30` to reach per-entry data at `+0x1c7`)
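
Two of the relationships in the field map are plain arithmetic and can be written down as a sketch. The constant and function names are hypothetical, pointers are treated as flat integers rather than far pointers, and the free-space convention (no reserved gap byte, read == write meaning empty) is an assumption, since the notes only say that free space is computed.

```python
# Offsets taken from the field map; names are illustrative only.
RING_BASE, RING_WRITE, RING_READ, RING_END = 0x117, 0x11B, 0x11F, 0x123
ENTRY_INDEX, ENTRY_DATA, ENTRY_STRIDE = 0xDE, 0x1C7, 0x30


def per_entry_offset(entry_index: int) -> int:
    """Object-relative offset of the per-entry data block for a given entry index."""
    return ENTRY_DATA + entry_index * ENTRY_STRIDE


def ring_free_space(base: int, end: int, read: int, write: int) -> int:
    """Free bytes in a ring spanning [base, end), assuming read == write means empty."""
    capacity = end - base
    used = (write - read) % capacity  # handles the wrapped case where write < read
    return capacity - used
```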

RIFF format notes

The game uses the standard RIFF container (the little-endian IFF derivative that AVI is built on):

- LIST magic: 0x5453494c = "LIST"
- RIFF magic: 0x46464952 = "RIFF"
- "movi" FourCC subchunk for animation frames
- Audio frames tagged "01wb" (0x62773130)
- Video frames handled through a separate path
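
The magic values above are just the four ASCII bytes of each tag read as a little-endian dword, which is easy to confirm. The helper name below is hypothetical.

```python
import struct


def fourcc(tag: str) -> int:
    """Return the little-endian dword that a 4-character RIFF tag compares against."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FourCC must be exactly four ASCII characters")
    return struct.unpack("<I", raw)[0]


# Values as they would appear as immediates in the disassembly.
LIST_MAGIC = fourcc("LIST")
RIFF_MAGIC = fourcc("RIFF")
MOVI_TYPE = fourcc("movi")
AUDIO_TAG = fourcc("01wb")
```

This is handy in the other direction too: an unexplained 32-bit immediate in a compare can be unpacked with struct.pack("<I", value) to see whether it spells a chunk tag.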

Newly renamed functions

Address	Name	Evidence
`000e:2a28`	`riff_find_chunk_by_type`	Walks RIFF LIST/RIFF chunk list; compares each node's FourCC at `+8` vs `param_2`; returns pointer to matching chunk or NULL
`000e:2104`	`animation_start`	Finds `"movi"` chunk via `riff_find_chunk_by_type`, inits ring buffer ptrs at `+0x11b` from `+0x117 + duration`, calls `animation_advance_frame`, loops `anim_load_audio_frame` and a second frame-loader thunk path per subframe
`000e:12f4`	`animation_advance_frame`	Fixed-point `0x1000` timer arithmetic; checks `+0xe4` (paused), advances ring buffer `+0x11b/+0x11f/+0x117/+0x123`; calls advance thunk
`000e:103f`	`animation_tick`	Guard wrapper: checks `param_1+0xd4 != -1`, then calls `animation_advance_frame(param_1, 0)`
`000e:06f7`	`anim_load_audio_frame`	Checks chunk tag == `0x62773130` (`"01wb"` = audio stream 1); computes ring buffer free space; copies chunk payload via `0x0000:ffff` thunk; increments subframe index at `+0xeef`; resets at subframe count `+0x1b3`
`000e:053d`	`anim_load_video_frame_wrapper`	Called once per subframe in `animation_start` immediately after `anim_load_audio_frame`; thin wrapper that forwards to `000e:ffb0`
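
For orientation, the lookup that riff_find_chunk_by_type performs can be sketched against the standard RIFF layout (4-byte tag, 4-byte little-endian size, then a 4-byte type at +8 for RIFF and LIST nodes). This is an illustrative walker over an in-memory buffer, not a transcription of the compiled routine; the function name and the choice to descend only into the outer RIFF node are assumptions.

```python
import struct


def find_list_by_type(buf: bytes, list_type: bytes, start: int = 0):
    """Return the offset of the first RIFF/LIST node whose type at +8 matches, else None."""
    pos = start
    while pos + 8 <= len(buf):
        tag = buf[pos:pos + 4]
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        if tag in (b"RIFF", b"LIST") and buf[pos + 8:pos + 12] == list_type:
            return pos
        if tag == b"RIFF":
            pos += 12  # step inside the outer container to reach its children
        else:
            pos += 8 + size + (size & 1)  # skip the chunk; payloads are padded to even length
    return None
```

With the offset of the "movi" list in hand, the frame chunks start 12 bytes later, which is where a loop like the one in animation_start would begin consuming "01wb" and video chunks.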

Unresolved callee

000e:ffb0 remains unresolved (its decompile is garbled because of overlapping instructions at 000f:0085/000f:0086). Current evidence from the animation_start loop suggests this path is the video-side subframe loader paired with anim_load_audio_frame.
The caller-side proof is now explicit enough to preserve that note in Ghidra too: animation_start invokes anim_load_video_frame_wrapper once per active subframe immediately after anim_load_audio_frame, and anim_load_video_frame_wrapper is only a thin forwarder to 000e:ffb0. Until the overlap is repaired, the safest label remains unresolved video-side subframe loader paired with the resolved audio-frame path.
A second caller pass tightens the local model without forcing a repair. search_instructions now shows anim_load_video_frame_wrapper is also called at 000e:11af and 000e:1245, not only from the startup prime loop at 000e:220c. In both of those additional windows the return value is checked as a success/failure result, which makes 000e:ffb0 look like an active chunk-consume/decode step rather than a passive notifier.
The strongest new evidence is the neighboring tag gate at 000e:121d..1234: after anim_load_audio_frame runs, the same lane checks the current RIFF chunk tag against 0x62643030 / 0x63643030 ("00db" / "00dc") before clearing the local busy flag and continuing. That is the first concrete caller-side clue that 000e:ffb0 is consuming AVI video-frame chunk types rather than some unrelated animation-side bookkeeping path.
Boundary analysis still reports one overlapped function object FUN_000e_ffb0 @ 000e:ffb0 body 000e:ffb0 - 000f:00e0, so the function remains comment-only for now. The useful gain is semantic: the unresolved body is now best described as video-side subframe loader/decoder for the 00db/00dc chunk lane, paired with anim_load_audio_frame.
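
The tag gates discussed in this section amount to a three-way split on the current chunk tag. The sketch below restates that split with the dword values quoted above; the function name and lane labels are hypothetical, and it says nothing about what 000e:ffb0 does with a video chunk once it has one.

```python
AUDIO_TAG = 0x62773130                 # "01wb"
VIDEO_TAGS = (0x62643030, 0x63643030)  # "00db", "00dc"


def chunk_lane(tag: int) -> str:
    """Name the lane a RIFF chunk tag would be routed to in the current model."""
    if tag == AUDIO_TAG:
        return "audio"   # resolved path: anim_load_audio_frame
    if tag in VIDEO_TAGS:
        return "video"   # unresolved path behind anim_load_video_frame_wrapper
    return "other"
```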

Constructor pattern

All three constructor variants (000e:2777, 000e:2860, 000e:2969) follow the same layout:

- Call FUN_000e_e935 (allocator; produces a garbled 11KB decompile, not renamed)
- Set fields +0xb4 through +0xc2 on the result
- Call near target 000e:ebe3 directly (confirmed CALL sites at 000e:283e, 000e:2931, 000e:29e4; this is a separate mis-split 000e region, not FUN_000d_ebe3)
- Call assert_alive_sentinel (assertion: checks +0xd4 != -1)
- Call func_0x000eec83

The old assumption that these constructor calls fed the 000d VM sequencer is now retired. Raw instruction search shows the direct near calls land on 000e:ebe3, whose current body is still mis-split/garbled and cannot yet be tied to the 000d:177c / 000d:1acb / 000d:0988 / 000d:22bc / 000d:2104 chain.

The constructor-side field setup before that 000e:ebe3 near call is now slightly tighter too:

- Variants A and B both set +0xc0 = 1 before the direct 000e:ebe3 call and derive +0xc2 from DS:0x604e.
- Variant C instead sets +0xc0 = 0, +0xc2 = 1, and +0x4c = 0x000d before the same near-call lane.
- This remains useful for the animation subsystem, but it no longer counts as upstream xref evidence for FUN_000d_ebe3; the true selector/write path into the 000d dispatcher is still unresolved.

Constructor variant renames

Address	Name
`000e:223d`	`assert_alive_sentinel`
`000e:2777`	`animation_ctor_variant_a`
`000e:2860`	`animation_ctor_variant_b`
`000e:2969`	`animation_ctor_variant_c`
