Crusader_Decomp/plan-mid.md
MaddoScientisto 4d3c8cd81b Add detailed class event processing and family comparison tools
- Enhance `extract_eusecode_flx.py` to derive class event rows with additional metadata including derived body windows and repeated template statuses.
- Introduce `usecode_family_compare.py` for comparing event families, analyzing commonalities in event bodies, and generating reports on identical groups and differences.
- Implement new data structures for managing class event rows and family artifact specifications.
- Update output formats to include derived body information and repeated family regression checks.
- Ensure robust validation of repeated family expectations against actual extracted data.
2026-03-22 23:24:46 +01:00

Crusader Decompilation Mid-Project Plan

Purpose

This file is the live mid-project tracker for the Crusader decompilation effort.

Keep it focused on:

  1. current verified state,
  2. active blockers,
  3. next resume work,
  4. and the remaining path to a reasonably complete decompilation.

Detailed completed analysis belongs in the files under docs/, not in this plan.

Progress Snapshot

  • Overall useful decompilation progress: about 42%
  • Reasonable uncertainty band: about 36% to 45%
  • Top 100 far-call target coverage: about 80%
  • Segment spread with meaningful analysis: about 22% to 28%
  • Tooling maturity for continued work: about 75%
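
The plan does not say how the far-call coverage figure is derived. As a minimal sketch, assuming it is the share of the most-called far-call targets that already have meaningful analysis, it could be computed as below. The function name, the input shapes, and the sample addresses are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter

def top_target_coverage(call_targets, analyzed, top_n=100):
    """Fraction of the top_n most-called targets that appear in `analyzed`.

    call_targets: iterable of target labels, one entry per observed far call.
    analyzed: set of target labels considered covered (assumed input).
    """
    ranked = [target for target, _ in Counter(call_targets).most_common(top_n)]
    if not ranked:
        return 0.0
    return sum(1 for target in ranked if target in analyzed) / len(ranked)

# Hypothetical call list: four distinct targets with 5, 3, 2, and 1 calls.
calls = ["t_a"] * 5 + ["t_b"] * 3 + ["t_c"] * 2 + ["t_d"]
covered = {"t_a", "t_b", "t_d"}
coverage = top_target_coverage(calls, covered, top_n=4)
print(f"{coverage:.0%}")
```

Ranking by call count before measuring coverage keeps the figure weighted toward hot targets, which matches the plan's stated preference for closing hot-target gaps first.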

Why The Estimate Stays Here

  • Recent work materially improved semantic confidence inside the startup/display, cache/allocator, callback-object, and USECODE/VM lanes.
  • The startup/display lane now has a verified owner split: g_active_dispatch_entry_farptr[+0x40] is a borrowed shared presentation hold token, while the seg108 0x4f38 bit-0x40 lane stays local to the sprite/object stack.
  • The seg126 control stream is now tighter at the producer side too. The traced setup path still supports a shared-base-path file selector feeding a full external script/control buffer. The 0x6aa:0x6ac base now reads as an inherited external/default path buffer rather than a stronger in-code producer. The in-scope 0x31a2 transition/presentation readers are now classified by role.
  • That work reduced ambiguity inside already-active clusters more than it expanded whole-program breadth, so the headline only moves modestly.
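
The owner split above hinges on two things that both carry the number 0x40 but are not the same kind of state: a byte at offset +0x40 inside the active dispatch entry, and bit 0x40 inside the seg108 value at 0x4f38. The sketch below only illustrates that distinction. The helper names are hypothetical, and treating the 0x4f38 location as a plain integer flag value is an assumption.

```python
HOLD_BYTE_OFFSET = 0x40  # byte offset inside the dispatch-entry object (from the plan)
LOCAL_FLAG_MASK = 0x40   # bit mask inside the seg108 value at 0x4f38 (from the plan)

def read_shared_hold_token(entry_bytes: bytes) -> int:
    """Return the byte stored at +0x40 of a dispatch-entry image (hypothetical helper)."""
    return entry_bytes[HOLD_BYTE_OFFSET]

def local_hold_bit_set(flag_value: int) -> bool:
    """Return True if bit 0x40 is set in the local flag value (hypothetical helper)."""
    return bool(flag_value & LOCAL_FLAG_MASK)

# Hypothetical images: an 0x50-byte entry with the hold byte set, and two flag values.
entry = bytearray(0x50)
entry[HOLD_BYTE_OFFSET] = 1
print(read_shared_hold_token(bytes(entry)), local_hold_bit_set(0x41), local_hold_bit_set(0x01))
```

Keeping the two accessors separate in any tooling mirrors the plan's conclusion that the two lanes should not be merged under one name.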

Current Verified State

Primary Tracking Assets

  • crusader_segment_coverage_ledger.csv now exists for all 145 NE segments and should remain the primary coverage tracker.
  • crusader_decompilation_notes.md is now only an index; detailed evidence lives in docs/.
  • The raw full-EXE porting workflow is stable for the verified seg001 and seg021 mappings.

Strong Or Stable Areas

  • seg001 gameplay/input/projectile work is deep enough to support verified raw-name ports.
  • raw 0007 rendering/camera/tile-visibility work is structurally strong.
  • 0008 dispatch-entry helpers and 000c state-machine helpers have broad partial coverage.
  • 000a/000d tracked-handle, cache, allocator, dispatch-entry, and startup/display support lanes now have a coherent partial map.
  • 000e parser and animation subsystems have a real partial map.
  • The USECODE/VM owner/resource/runtime lane now has a workable partial model plus supporting extraction/reporting tooling.

Recently Closed Or No Longer Live

  • ASYLUM.24 is resolved as _ASS_StopAllSFX; it is no longer an open plan item.
  • The cheat/input side lane is complete enough to leave the live queue.
  • The segment coverage ledger is no longer a missing artifact; only refinement remains.
  • The startup/display lane now has named outer shells (startup_display_transition_prepare, startup_display_transition_driver) plus named seg126/127/136/137/138 helper families. The remaining work is higher-level ownership and state semantics, not basic structural recovery.
  • The top startup/display ownership question is now narrowed: active_dispatch_entry_create_default owns g_active_dispatch_entry_farptr, while seg049/seg126/seg138 helpers only borrow or clear the shared byte +0x40; the seg108 0x4f38 lane is separate local sprite/object state.
  • The shared seg126 base-path question is effectively closed: literal-address search still shows no store into 0x6aa:0x6ac, seg004 only mutates the pointed buffer while separately assigning sibling root 0x6ae:0x6b0, and the startup/display family continues to treat 0x6aa:0x6ac as an inherited mutable external/default base path.
  • The in-scope 0x31a2 transition/presentation reader pass is complete: the remaining reads in this lane now split into edge wait, modal break, deferred dispatch/state advance, and cleanup-abort roles.
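
The literal-address search mentioned for 0x6aa:0x6ac can be approximated with a byte-pattern scan for 16-bit x86 store instructions that use a direct address. The sketch below is not the project's actual search tool. It assumes 0x6aa and 0x6ac are two adjacent word slots, checks only the common MOV store encodings, and will report false positives when the same bytes occur inside operands or data.

```python
def find_literal_stores(code: bytes, offset: int):
    """Return positions of MOV stores that name `offset` as a direct 16-bit address.

    Encodings checked (real-mode / 16-bit addressing):
      A2/A3 lo hi            MOV [moffs], AL/AX
      88/89/8C modrm lo hi   MOV r/m, reg or Sreg, with ModRM mod=00 r/m=110
      C6/C7 modrm lo hi ...  MOV r/m, imm, with ModRM mod=00 r/m=110
    Segment-override prefixes and non-MOV writers are ignored in this sketch.
    """
    lo, hi = offset & 0xFF, (offset >> 8) & 0xFF
    hits = []
    for i in range(len(code) - 2):
        op = code[i]
        if op in (0xA2, 0xA3):
            if code[i + 1] == lo and code[i + 2] == hi:
                hits.append(i)
        elif op in (0x88, 0x89, 0x8C, 0xC6, 0xC7) and i + 3 < len(code):
            modrm = code[i + 1]
            # mod=00 and r/m=110 selects a bare disp16 operand.
            if modrm & 0xC7 == 0x06 and code[i + 2] == lo and code[i + 3] == hi:
                hits.append(i)
    return hits

# Hypothetical byte stream: nop; mov [0x06aa], ax; mov word [0x06ac], 0x1234; mov ax, [0x06aa]
sample = bytes([0x90, 0xA3, 0xAA, 0x06, 0xC7, 0x06, 0xAC, 0x06, 0x34, 0x12,
                0x8B, 0x06, 0xAA, 0x06])
print(find_literal_stores(sample, 0x06AA), find_literal_stores(sample, 0x06AC))
```

An empty result from a scan like this supports, but does not prove, the absence of an in-code producer, since indirect stores through a register or pointer are invisible to it. That is consistent with the plan treating the buffer as inherited rather than claiming no writer exists.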

Live Blockers

  1. The startup/display transition lane still lacks exact higher-level owner/state labels across seg005, seg049, seg108, seg126, seg127, seg136, seg137, and seg138, even though the shared g_active_dispatch_entry_farptr[+0x40] hold token is now separated from the seg108-local 0x4f38 bit-0x40 lane.
  2. The oversized overlap rooted at 000c:db68 still blocks safe recovery of the real transition_preentry_step_script body.
  3. The 0x4588 callback object is better constrained but still not behaviorally classified enough for a confident subsystem rename.
  4. The USECODE/VM sequencer still lacks the real upstream selector/caller path into FUN_000d_ebe3, and wrappers 0005:2c35 / 0005:2c68 remain caller-dark.
  5. High-value missing or weak function objects still exist in hot ranges such as 000b:2e00, 0007:5a00, and 000e:ffb0.
  6. Non-CALLF far-pointer relocations and weakly covered resource/data loaders remain real second-pass blockers, even though they are not the first thing to attack.

Current Focus

  1. Finish the startup/display transition lane while it is still producing direct executable coverage.
  2. Continue the USECODE/VM lane only where it yields concrete caller, selector, or loader evidence rather than repeated direct-xref dead ends.
  3. Refine the coverage ledger from already-verified notes before broadening into fresh segment sweeps.
  4. Use boundary repair only on active blockers with clear payoff.

Next Resume Point

  1. Continue the adjacent seg126 startup/display clarification from the local three-way file-family selector at 000c:afa5..b152 and nearby seg049 0x2bd8 dispatch sites, but only where it sharpens the validated presentation-handoff model without speculative renames.
  2. Repair the 000c:db68 overlap only if needed to split transition_preentry_step_script into its own clean function object and preserve the already-verified 000c:ca1d..cd4f body in Ghidra.
  3. Classify the exact UI role of the paired 0x8c5c / 0x8c60 renderer presets, the +0x49 selector states, and the neighboring seg127 fade inputs only where the caller evidence stays inside the same startup/display family.
  4. Recover the real upstream caller/selector path into FUN_000d_ebe3 from persisted context/save/load or shared-consumer paths instead of repeating exhausted direct xref hunts.
  5. Recover real caller roles for 0005:2c35 and 0005:2c68, now that both are narrowed to signed slot-offset wrappers feeding the VM context lane.
  6. Clarify whether the seg070 twin loops at 0009:67b6 and 0009:6916 represent two file families, two table formats, or two loader phases of the same helper behind entity_vm_runtime_owner_resource_create.
  7. Promote additional ledger rows where the current docs already justify Foothold, Partial, or Deep.
  8. Revisit 000e:ffb0 and other high-value overlap targets only after the current startup/display and VM lanes stop yielding near-term wins.
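
The resume items use two address spellings: a single location such as 000c:db68 and a span such as 000c:afa5..b152. A small parser like the one below could normalize both for scripts that cross-reference the plan against the ledger. The function name is hypothetical, and whether the end offset is inclusive is left to the caller because the plan does not say.

```python
import re

_ADDRESS = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{4})(?:\.\.([0-9a-fA-F]{4}))?$"
)

def parse_address_range(text: str):
    """Parse 'ssss:oooo' or 'ssss:oooo..oooo' into (segment, start, end)."""
    match = _ADDRESS.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized address: {text!r}")
    segment = int(match.group(1), 16)
    start = int(match.group(2), 16)
    # A single location is returned as a span whose end equals its start.
    end = int(match.group(3), 16) if match.group(3) else start
    if end < start:
        raise ValueError(f"range ends before it starts: {text!r}")
    return segment, start, end

print(parse_address_range("000c:afa5..b152"))
print(parse_address_range("000c:db68"))
```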

Remaining Work To Reach A Reasonably Complete Decompilation State

1. Coverage And Tracker Completion

  • Promote the existing 145-row ledger from a seeded first pass into a trustworthy executable-wide coverage dashboard.
  • Sweep untouched segments cluster-by-cluster instead of one-off function hunting, using adjacency and call relationships.
  • Convert more segments from None to Foothold / Partial where current notes already support it.
  • Close the largest remaining hot-target gaps so the far-call ranking list stays representative of real coverage.
  • Keep the plan, docs, and ledger synchronized after each verified batch.
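
To check whether the ledger is moving from a seeded first pass toward a usable dashboard, a per-level count is a reasonable first view. The sketch below assumes a `status` column holding one of the four levels named in this plan (None, Foothold, Partial, Deep). The column names and the sample rows are hypothetical, not the real layout of crusader_segment_coverage_ledger.csv.

```python
import csv
import io
from collections import Counter

LEVELS = ("None", "Foothold", "Partial", "Deep")

def summarize_ledger(csv_text: str, status_column: str = "status"):
    """Count ledger rows per coverage level; empty cells count as 'None'."""
    counts = Counter({level: 0 for level in LEVELS})
    for row in csv.DictReader(io.StringIO(csv_text)):
        status = (row[status_column] or "").strip() or "None"
        if status not in LEVELS:
            raise ValueError(f"unknown coverage level: {status!r}")
        counts[status] += 1
    return dict(counts)

# Hypothetical five-row ledger excerpt with placeholder segment ids.
sample_ledger = (
    "segment,status\n"
    "segA,Deep\n"
    "segB,Partial\n"
    "segC,Partial\n"
    "segD,Foothold\n"
    "segE,None\n"
)
summary = summarize_ledger(sample_ledger)
print(summary)
```

Rejecting unknown levels instead of silently bucketing them keeps typos in the ledger from inflating the None count.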

2. Startup/Display And Presentation Lane

  • Finish semantic ownership across seg005, seg049, seg108, seg126, seg127, seg136, seg137, and seg138.
  • Resolve the remaining role of the shared active-dispatch hold token versus local per-entry hold bytes.
  • Recover the higher-level meaning of the file-backed seg126 control stream without speculating beyond verified byte behaviors.
  • Classify the exact UI role of the paired 0x8c5c / 0x8c60 text-renderer lane if stronger caller evidence appears.
  • Finish the fade-controller producer path so seg127 fade inputs are tied to higher-level transition states, not only local opcodes.
  • Classify FUN_000d_938c, transition_preentry_release_resources, and entity_cleanup_resources_and_dispatch by role once their shared-hold semantics are fully separated.
  • Remove the remaining overlap blockers in this lane, with 000c:db68 first.

3. VM / USECODE / Scripting Lane

  • Recover the upstream selector into FUN_000d_ebe3 and map payload-shape handlers to real opcode dispatch.
  • Recover real caller roles for the caller-dark wrappers 0005:2c35 and 0005:2c68, which are currently narrowed to signed slot-offset wrappers feeding the VM context lane.
  • Keep separating owner-table-backed 0x39ca rows from static dispatch-entry seed rows.
  • Finish classifying the seg069/070 helper behind entity_vm_runtime_owner_resource_create.
  • Broaden owner-loaded class/event validation beyond the first strong sample families.
  • Keep event-label mapping conservative: only promote ScummVM event names where binary behavior and slot reuse agree.
  • Mature the reversible script IR until it can represent raw headers, event rows, payload forms, and unresolved opcodes without information loss.
  • Continue extracting readable descriptor-family artifacts, but treat them as evidence aids rather than rename authority.
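
The "without information loss" requirement for the script IR can be read as a round-trip property: decoding and re-encoding must reproduce the input bytes exactly, including opcodes that have no name yet. The sketch below demonstrates that property on a toy stream of (opcode, length, payload) records. This toy encoding and the opcode names are invented for illustration and are not the Crusader USECODE format or the IR described in docs/usecode-roundtrip-ir.md.

```python
from dataclasses import dataclass

KNOWN_OPCODES = {0x01: "push", 0x02: "call"}  # hypothetical names

@dataclass(frozen=True)
class Op:
    opcode: int
    payload: bytes

    @property
    def name(self) -> str:
        # Unresolved opcodes keep a stable placeholder instead of being dropped.
        return KNOWN_OPCODES.get(self.opcode, f"unk_{self.opcode:02x}")

def decode(data: bytes):
    """Split a toy stream of [opcode][length][payload...] records into Op objects."""
    ops, i = [], 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated record header")
        opcode, size = data[i], data[i + 1]
        payload = data[i + 2 : i + 2 + size]
        if len(payload) != size:
            raise ValueError("truncated payload")
        ops.append(Op(opcode, payload))
        i += 2 + size
    return ops

def encode(ops) -> bytes:
    """Re-emit the exact bytes for a list of Op objects."""
    return b"".join(bytes([op.opcode, len(op.payload)]) + op.payload for op in ops)

raw = bytes([0x01, 2, 0x34, 0x12, 0x7F, 1, 0xAA, 0x02, 0])
ops = decode(raw)
print([op.name for op in ops], encode(ops) == raw)
```

Storing the raw opcode and payload on every node, and deriving the readable name from them, is what keeps the unknown record reversible. A regression check of the form `encode(decode(x)) == x` over every extracted body is one way to enforce that.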

4. Cache / Allocator / Callback-Object Lane

  • Finish classifying the object rooted at 0x4588 so the allocator finalize path and callback emissions can receive behaviorally meaningful names.
  • Tighten the role of allocator_phase_finalize_pass only where it intersects callback-object semantics or active runtime users.
  • Separate generic cache-manager mechanics from game-specific client behavior wherever caller evidence supports it.
  • Clarify remaining object-role names around tracked handles, dispatch-entry lifecycle helpers, and palette-backed state builders.
  • Keep _ASS_StopAllSFX and the resolved audio-import lane closed; do not treat either as an open blocker again.

5. Rendering, Palette, Animation, And UI Support Lanes

  • Finish the remaining caller-side semantics for raw 0007 rendering helpers, seg049 controller dispatch, seg108 sprite/object helpers, and seg137/138 palette state builders.
  • Revisit 000e:ffb0 and adjacent 000e video/animation overlap only when it blocks active analysis or offers a strong isolated win.
  • Expand the palette/VGA helper family only where it clarifies higher-level behavior rather than duplicating low-level helper names.
  • Keep validating startup/display assumptions against raw 0007/0008/000d caller behavior instead of renaming isolated helpers in a vacuum.

6. Boundary Repair And Function Hygiene

  • Create or repair missing function objects in the highest-traffic unresolved ranges first.
  • Fix only overlaps that block live lanes or high-caller targets.
  • Preserve conservative naming for repaired functions until direct caller or data evidence justifies promotion.
  • Continue rejecting disproven ports or stale hypotheses instead of preserving them in live work queues.
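
Overlap candidates such as the oversized body rooted at 000c:db68 can be listed mechanically from an export of function extents before any repair in Ghidra. The sketch below is a plain interval sweep. The input tuple shape, the exclusive end convention, and the sample extents are assumptions, and it does not use the Ghidra API.

```python
def find_overlaps(functions):
    """Return (earlier, later) name pairs whose [start, end) extents overlap.

    functions: iterable of (name, start, end) with `end` exclusive (assumed).
    """
    ordered = sorted(functions, key=lambda f: (f[1], f[2]))
    overlaps = []
    active = []  # functions whose extent has not ended before the current start
    for name, start, end in ordered:
        active = [f for f in active if f[2] > start]
        overlaps.extend((other[0], name) for other in active)
        active.append((name, start, end))
    return overlaps

# Hypothetical extents: one oversized body swallowing a smaller one, then a clean neighbor.
extents = [
    ("oversized_body", 0x1000, 0x2000),
    ("hidden_inner", 0x1400, 0x1500),
    ("clean_neighbor", 0x2000, 0x2100),
]
print(find_overlaps(extents))
```

Sorting the report by caller count afterwards would fit the rule above of fixing only overlaps that block live lanes or high-caller targets.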

7. Data, Imports, And Resource-Format Coverage

  • Work through the deferred non-CALLF far-pointer relocations when they become necessary for object/table recovery.
  • Expand coverage of weakly mapped resource/data loaders such as FLEX-derived descriptors, tables, caches, and per-shape data files.
  • Cross-check current data-structure assumptions against external references like ScummVM only as supporting evidence, not as rename authority.
  • Keep external import identities synchronized with verified import-table evidence.

8. Completion Criteria

A reasonably complete decompilation state should mean:

  • most actively used subsystems are behaviorally named rather than only structurally named,
  • the major live blockers (000c:db68, 000e:ffb0, hot missing function objects, dark VM selector path, 0x4588 object role) are either resolved or reduced to low-impact residuals,
  • the far-call hot list has very few meaningful unknowns left,
  • the ledger gives a credible whole-program view rather than a sparse seed set,
  • and the remaining gaps are mostly long-tail cleanup, low-traffic helpers, or data polish instead of core architecture uncertainty.

Priority Order

  1. Startup/display transition lane
  2. VM / USECODE selector and loader lane
  3. Coverage-ledger refinement from already-verified notes
  4. High-value overlap repair (000c:db68, then 000e:ffb0 when justified)
  5. 0x4588 callback-object classification
  6. Broader segment sweeps and second-pass data/relocation work

Evidence Anchors

Primary files backing this plan state:

  • crusader_segment_coverage_ledger.csv
  • crusader_decompilation_notes.md
  • docs/overview.md
  • docs/raw-porting-progress.md
  • docs/raw-0008-000c.md
  • docs/raw-000a-000d.md
  • docs/raw-000e.md
  • docs/far-call-targets.md
  • docs/usecode-roundtrip-ir.md
  • docs/scummvm-crusader-reference.md

Update Rule

Update this file when one of the following happens:

  • the headline estimate changes materially,
  • a live blocker is resolved,
  • a subsystem moves from structural to behavioral understanding,
  • a segment cluster is promoted materially in the ledger,
  • or the next resume point changes enough that the current handoff would mislead the next pass.

Keep the file short. Move detailed completed analysis into the appropriate file under docs/ and leave only the current state, blockers, and forward path here.