- Enhance `extract_eusecode_flx.py` to derive class event rows with additional metadata including derived body windows and repeated template statuses.
- Introduce `usecode_family_compare.py` for comparing event families, analyzing commonalities in event bodies, and generating reports on identical groups and differences.
- Implement new data structures for managing class event rows and family artifact specifications.
- Update output formats to include derived body information and repeated family regression checks.
- Ensure robust validation of repeated family expectations against actual extracted data.
Crusader Decompilation Mid-Project Plan
Purpose
This file is the live mid-project tracker for the Crusader decompilation effort.
Keep it focused on:
- current verified state,
- active blockers,
- next resume work,
- and the remaining path to a reasonably complete decompilation.
Detailed completed analysis belongs in the files under docs/, not in this plan.
Progress Snapshot
- Overall useful decompilation progress: about 42%
- Reasonable uncertainty band: about 36% to 45%
- Top 100 far-call target coverage: about 80%
- Segment spread with meaningful analysis: about 22% to 28%
- Tooling maturity for continued work: about 75%
Why The Estimate Stays Here
- Recent work materially improved semantic confidence inside the startup/display, cache/allocator, callback-object, and USECODE/VM lanes.
- The startup/display lane now has a verified owner split: `g_active_dispatch_entry_farptr[+0x40]` is a borrowed shared presentation hold token, while the seg108 `0x4f38` bit-`0x40` lane stays local to the sprite/object stack.
- The seg126 control stream is now tighter at the producer side too: the traced setup path still supports a shared-base-path file selector feeding a full external script/control buffer, the `0x6aa:0x6ac` base now reads as an inherited external/default path buffer rather than a stronger in-code producer, and the in-scope `0x31a2` transition/presentation readers are now classified by role.
- That work reduced ambiguity inside already-active clusters more than it expanded whole-program breadth, so the headline only moves modestly.
Current Verified State
Primary Tracking Assets
- `crusader_segment_coverage_ledger.csv` now exists for all 145 NE segments and should remain the primary coverage tracker.
- `crusader_decompilation_notes.md` is now only an index; detailed evidence lives in `docs/`.
- The raw full-EXE porting workflow is stable for the verified seg001 and seg021 mappings.
Strong Or Stable Areas
- seg001 gameplay/input/projectile work is deep enough to support verified raw-name ports.
- raw 0007 rendering/camera/tile-visibility work is structurally strong.
- 0008 dispatch-entry helpers and 000c state-machine helpers have broad partial coverage.
- 000a/000d tracked-handle, cache, allocator, dispatch-entry, and startup/display support lanes now have a coherent partial map.
- 000e parser and animation subsystems have a real partial map.
- The USECODE/VM owner/resource/runtime lane now has a workable partial model plus supporting extraction/reporting tooling.
Recently Closed Or No Longer Live
- `ASYLUM.24` is resolved as `_ASS_StopAllSFX`; it is no longer an open plan item.
- The cheat/input side lane is complete enough to leave the live queue.
- The segment coverage ledger is no longer a missing artifact; only refinement remains.
- The startup/display lane now has named outer shells (`startup_display_transition_prepare`, `startup_display_transition_driver`) plus named seg126/127/136/137/138 helper families. The remaining work is higher-level ownership and state semantics, not basic structural recovery.
- The top startup/display ownership question is now narrowed: `active_dispatch_entry_create_default` owns `g_active_dispatch_entry_farptr`, while seg049/seg126/seg138 helpers only borrow or clear the shared byte `+0x40`; the seg108 `0x4f38` lane is separate local sprite/object state.
- The shared seg126 base-path question is effectively closed: literal-address search still shows no store into `0x6aa:0x6ac`, seg004 only mutates the pointed buffer while separately assigning sibling root `0x6ae:0x6b0`, and the startup/display family continues to treat `0x6aa:0x6ac` as an inherited mutable external/default base path.
- The in-scope `0x31a2` transition/presentation reader pass is complete: the remaining reads in this lane now split into edge wait, modal break, deferred dispatch/state advance, and cleanup-abort roles.
Live Blockers
- The startup/display transition lane still lacks exact higher-level owner/state labels across seg005, seg049, seg108, seg126, seg127, seg136, seg137, and seg138, even though the shared `g_active_dispatch_entry_farptr[+0x40]` hold token is now separated from the seg108-local `0x4f38` bit-`0x40` lane.
- The oversized overlap rooted at `000c:db68` still blocks safe recovery of the real `transition_preentry_step_script` body.
- The `0x4588` callback object is better constrained but still not behaviorally classified enough for a confident subsystem rename.
- The USECODE/VM sequencer still lacks the real upstream selector/caller path into `FUN_000d_ebe3`, and wrappers `0005:2c35`/`0005:2c68` remain caller-dark.
- High-value missing or weak function objects still exist in hot ranges such as `000b:2e00`, `0007:5a00`, and `000e:ffb0`.
- Non-CALLF far-pointer relocations and weakly covered resource/data loaders remain real second-pass blockers, even though they are not the first thing to attack.
Current Focus
- Finish the startup/display transition lane while it is still producing direct executable coverage.
- Continue the USECODE/VM lane only where it yields concrete caller, selector, or loader evidence rather than repeated direct-xref dead ends.
- Refine the coverage ledger from already-verified notes before broadening into fresh segment sweeps.
- Use boundary repair only on active blockers with clear payoff.
Next Resume Point
- Continue the adjacent seg126 startup/display clarification from the local three-way file-family selector at `000c:afa5..b152` and nearby seg049 `0x2bd8` dispatch sites, but only where it sharpens the validated presentation-handoff model without speculative renames.
- Repair the `000c:db68` overlap only if needed to split `transition_preentry_step_script` into its own clean function object and preserve the already-verified `000c:ca1d..cd4f` body in Ghidra.
- Classify the exact UI role of the paired `0x8c5c`/`0x8c60` renderer presets, the `+0x49` selector states, and the neighboring seg127 fade inputs only where the caller evidence stays inside the same startup/display family.
- Recover the real upstream caller/selector path into `FUN_000d_ebe3` from persisted context/save/load or shared-consumer paths instead of repeating exhausted direct xref hunts.
- Recover real caller roles for `0005:2c35` and `0005:2c68`, now that both are narrowed to signed slot-offset wrappers feeding the VM context lane.
- Clarify whether the seg070 twin loops at `0009:67b6` and `0009:6916` represent two file families, two table formats, or two loader phases of the same helper behind `entity_vm_runtime_owner_resource_create`.
- Promote additional ledger rows where the current docs already justify `Foothold`, `Partial`, or `Deep`.
- Revisit `000e:ffb0` and other high-value overlap targets only after the current startup/display and VM lanes stop yielding near-term wins.
Remaining Work To Reach A Reasonably Complete Decompilation State
1. Coverage And Tracker Completion
- Promote the existing 145-row ledger from a seeded first pass into a trustworthy executable-wide coverage dashboard.
- Sweep untouched segments cluster-by-cluster instead of one-off function hunting, using adjacency and call relationships.
- Convert more segments from `None` to `Foothold`/`Partial` where current notes already support it.
- Close the largest remaining hot-target gaps so the far-call ranking list stays representative of real coverage.
- Keep the plan, docs, and ledger synchronized after each verified batch.
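One way to keep the ledger and the headline numbers in step is to tally rows per coverage level after each batch. The following is a minimal sketch only: the `coverage` column name and the sample rows are assumptions made for illustration, not the real header or contents of `crusader_segment_coverage_ledger.csv`.

```python
import csv
import io
from collections import Counter

# Coverage levels named in this plan, ordered from weakest to strongest.
LEVELS = ["None", "Foothold", "Partial", "Deep"]


def tally_levels(csv_text, level_column="coverage"):
    """Count ledger rows per coverage level.

    `level_column` is a hypothetical column name; adjust it to the
    real ledger header. Empty cells are counted as "None".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row[level_column].strip() or "None") for row in reader)
    return {level: counts.get(level, 0) for level in LEVELS}


def touched_fraction(counts):
    """Fraction of tallied rows that sit above the None level."""
    total = sum(counts.values())
    return 0.0 if total == 0 else (total - counts["None"]) / total


# Made-up sample rows for illustration; not real ledger data.
SAMPLE = """segment,coverage
seg001,Deep
seg021,Partial
seg049,Foothold
seg050,None
"""

counts = tally_levels(SAMPLE)
print(counts)
print(touched_fraction(counts))
```

A tally like this only reports what the ledger already says; promotions between levels still need to be justified from the notes under docs/.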
2. Startup/Display And Presentation Lane
- Finish semantic ownership across seg005, seg049, seg108, seg126, seg127, seg136, seg137, and seg138.
- Resolve the remaining role of the shared active-dispatch hold token versus local per-entry hold bytes.
- Recover the higher-level meaning of the file-backed seg126 control stream without speculating beyond verified byte behaviors.
- Classify the exact UI role of the paired `0x8c5c`/`0x8c60` text-renderer lane if stronger caller evidence appears.
- Finish the fade-controller producer path so seg127 fade inputs are tied to higher-level transition states, not only local opcodes.
- Classify `FUN_000d_938c`, `transition_preentry_release_resources`, and `entity_cleanup_resources_and_dispatch` by role once their shared-hold semantics are fully separated.
- Remove the remaining overlap blockers in this lane, with `000c:db68` first.
3. VM / USECODE / Scripting Lane
- Recover the upstream selector into `FUN_000d_ebe3` and map payload-shape handlers to real opcode dispatch.
- Recover real caller roles for the dark mask wrappers `0005:2c35` and `0005:2c68`.
- Keep separating owner-table-backed `0x39ca` rows from static dispatch-entry seed rows.
- Finish classifying the seg069/070 helper behind `entity_vm_runtime_owner_resource_create`.
- Broaden owner-loaded class/event validation beyond the first strong sample families.
- Keep event-label mapping conservative: only promote ScummVM event names where binary behavior and slot reuse agree.
- Mature the reversible script IR until it can represent raw headers, event rows, payload forms, and unresolved opcodes without information loss.
- Continue extracting readable descriptor-family artifacts, but treat them as evidence aids rather than rename authority.
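The "without information loss" requirement on the script IR can be illustrated with a small round-trip sketch. This is not the project's IR: the opcode table, mnemonics, and operand widths below are placeholders invented for the example. The one idea it shows is that every IR node keeps its exact source bytes, and anything that cannot be decoded is carried as a raw node, so re-encoding always reproduces the input.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical opcode table: opcode byte -> (mnemonic, operand byte count).
# Placeholder entries only; not the real Crusader USECODE opcode set.
KNOWN_OPCODES: Dict[int, Tuple[str, int]] = {
    0x01: ("OP_A", 1),
    0x02: ("OP_B", 2),
}


@dataclass(frozen=True)
class Instr:
    mnemonic: str  # "RAW" marks bytes that could not be decoded
    raw: bytes     # exact source bytes, always kept for re-encoding


def decode(body: bytes) -> List[Instr]:
    """Decode known opcodes; keep everything else as one RAW node."""
    out: List[Instr] = []
    i = 0
    while i < len(body):
        entry = KNOWN_OPCODES.get(body[i])
        if entry is None or i + 1 + entry[1] > len(body):
            # Unknown opcode or truncated operand: its length is not
            # known, so the rest of the body is preserved verbatim.
            out.append(Instr("RAW", body[i:]))
            break
        name, operand_len = entry
        out.append(Instr(name, body[i:i + 1 + operand_len]))
        i += 1 + operand_len
    return out


def encode(instrs: List[Instr]) -> bytes:
    """Re-emit the original bytes from the IR."""
    return b"".join(ins.raw for ins in instrs)


sample = bytes([0x01, 0x10, 0x02, 0x20, 0x30, 0xFF, 0x01])
ir = decode(sample)
print([ins.mnemonic for ins in ir])
assert encode(ir) == sample  # round trip is byte-exact
```

Stopping at the first unresolved opcode is a deliberately conservative choice in this sketch: guessing a length in order to resynchronize would risk mislabeling later bytes.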
4. Cache / Allocator / Callback-Object Lane
- Finish classifying the object rooted at `0x4588` so the allocator finalize path and callback emissions can receive behaviorally meaningful names.
- Tighten the role of `allocator_phase_finalize_pass` only where it intersects callback-object semantics or active runtime users.
- Separate generic cache-manager mechanics from game-specific client behavior wherever caller evidence supports it.
- Clarify remaining object-role names around tracked handles, dispatch-entry lifecycle helpers, and palette-backed state builders.
- Keep `_ASS_StopAllSFX` and the resolved audio-import lane closed; do not treat it as an open blocker again.
5. Rendering, Palette, Animation, And UI Support Lanes
- Finish the remaining caller-side semantics for raw 0007 rendering helpers, seg049 controller dispatch, seg108 sprite/object helpers, and seg137/138 palette state builders.
- Revisit `000e:ffb0` and adjacent 000e video/animation overlap only when it blocks active analysis or offers a strong isolated win.
- Expand the palette/VGA helper family only where it clarifies higher-level behavior rather than duplicating low-level helper names.
- Keep validating startup/display assumptions against raw 0007/0008/000d caller behavior instead of renaming isolated helpers in a vacuum.
6. Boundary Repair And Function Hygiene
- Create or repair missing function objects in the highest-traffic unresolved ranges first.
- Fix only overlaps that block live lanes or high-caller targets.
- Preserve conservative naming for repaired functions until direct caller or data evidence justifies promotion.
- Continue rejecting disproven ports or stale hypotheses instead of preserving them in live work queues.
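To decide which overlaps are worth repairing, it can help to list overlapping function ranges mechanically before looking at callers. The sketch below assumes function ranges have been exported as `segment:start..end` hex strings, which is a hypothetical export format, and the sample ranges and names are invented; they are not real addresses from the Ghidra project.

```python
from typing import List, NamedTuple, Tuple


class FuncRange(NamedTuple):
    name: str
    segment: int
    start: int  # inclusive offset
    end: int    # inclusive offset


def parse_range(name: str, text: str) -> FuncRange:
    """Parse 'SSSS:start..end' (all hex) into a FuncRange."""
    seg_text, span = text.split(":")
    start_text, end_text = span.split("..")
    return FuncRange(name, int(seg_text, 16), int(start_text, 16), int(end_text, 16))


def find_overlaps(funcs: List[FuncRange]) -> List[Tuple[FuncRange, FuncRange]]:
    """Return every pair of ranges in the same segment that share bytes."""
    ordered = sorted(funcs, key=lambda f: (f.segment, f.start, f.end))
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Sorted order means later ranges start at or after a.start,
            # so once one starts past a.end nothing later can overlap a.
            if b.segment != a.segment or b.start > a.end:
                break
            overlaps.append((a, b))
    return overlaps


# Invented sample ranges for illustration only.
funcs = [
    parse_range("func_a", "000c:1000..10ff"),
    parse_range("func_b", "000c:10f0..1200"),
    parse_range("func_c", "000c:2000..20ff"),
    parse_range("func_d", "000d:1000..10ff"),
]

pairs = [(a.name, b.name) for a, b in find_overlaps(funcs)]
print(pairs)
```

The output is only a candidate list; whether a given overlap blocks a live lane or a high-caller target still has to be judged against the current blockers.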
7. Data, Imports, And Resource-Format Coverage
- Work through the deferred non-CALLF far-pointer relocations when they become necessary for object/table recovery.
- Expand coverage of weakly mapped resource/data loaders such as FLEX-derived descriptors, tables, caches, and per-shape data files.
- Cross-check current data-structure assumptions against external references like ScummVM only as supporting evidence, not as rename authority.
- Keep external import identities synchronized with verified import-table evidence.
8. Completion Criteria
A reasonably complete decompilation state should mean:
- most actively used subsystems are behaviorally named rather than only structurally named,
- the major live blockers (`000c:db68`, `000e:ffb0`, hot missing function objects, dark VM selector path, `0x4588` object role) are either resolved or reduced to low-impact residuals,
- the far-call hot list has very few meaningful unknowns left,
- the ledger gives a credible whole-program view rather than a sparse seed set,
- and the remaining gaps are mostly long-tail cleanup, low-traffic helpers, or data polish instead of core architecture uncertainty.
Priority Order
- Startup/display transition lane
- VM / USECODE selector and loader lane
- Coverage-ledger refinement from already-verified notes
- High-value overlap repair (`000c:db68`, then `000e:ffb0` when justified)
- `0x4588` callback-object classification
- Broader segment sweeps and second-pass data/relocation work
Evidence Anchors
Primary files backing this plan state:
- `crusader_segment_coverage_ledger.csv`
- `crusader_decompilation_notes.md`
- `docs/overview.md`
- `docs/raw-porting-progress.md`
- `docs/raw-0008-000c.md`
- `docs/raw-000a-000d.md`
- `docs/raw-000e.md`
- `docs/far-call-targets.md`
- `docs/usecode-roundtrip-ir.md`
- `docs/scummvm-crusader-reference.md`
Update Rule
Update this file when one of the following happens:
- the headline estimate changes materially,
- a live blocker is resolved,
- a subsystem moves from structural to behavioral understanding,
- a segment cluster is promoted materially in the ledger,
- or the next resume point changes enough that the current handoff would mislead the next pass.
Keep the file short. Move detailed completed analysis into the appropriate file under docs/ and leave only the current state, blockers, and forward path here.