# Crusader Decompilation Mid-Project Plan
## Purpose
This file is the live mid-project tracker for the Crusader decompilation effort.
Keep it focused on:
1. current verified state,
2. active blockers,
3. next resume work,
4. and the remaining path to a reasonably complete decompilation.
Detailed completed analysis belongs in the files under `docs/`, not in this plan.
## Progress Snapshot
- Overall useful decompilation progress: about 42%
- Reasonable uncertainty band: about 36% to 45%
- Top 100 far-call target coverage: about 80%
- Segment spread with meaningful analysis: about 22% to 28%
- Tooling maturity for continued work: about 75%
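
A minimal sketch of how a top-N far-call coverage figure like the one above could be recomputed. It assumes coverage means "share of the N most-called targets that already have analysis", which is one plausible reading and is not confirmed by this plan. The function name, the input shapes, and the sample targets are all hypothetical.

```python
def top_n_coverage(call_counts: dict[str, int], analyzed: set[str], n: int = 100) -> float:
    """Percentage of the n most-called targets that appear in `analyzed`.

    ASSUMPTION: coverage is counted per target, not weighted by call volume.
    """
    # Rank by descending call count; break ties by name so the result is stable.
    ranked = sorted(call_counts, key=lambda t: (-call_counts[t], t))[:n]
    if not ranked:
        return 0.0
    covered = sum(1 for target in ranked if target in analyzed)
    return 100.0 * covered / len(ranked)


# Hypothetical sample data, not taken from the real far-call ranking.
sample_counts = {"target_a": 50, "target_b": 30, "target_c": 20, "target_d": 5}
sample_analyzed = {"target_a", "target_c", "target_d"}
print(round(top_n_coverage(sample_counts, sample_analyzed, n=3), 1))
```

A call-weighted variant would give a different headline number, so whichever definition is used should be stated next to the figure.
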
### Why The Estimate Stays Here
- Recent work materially improved semantic confidence inside the startup/display, cache/allocator, callback-object, and USECODE/VM lanes.
- The startup/display lane now has a verified owner split: `g_active_dispatch_entry_farptr[+0x40]` is a borrowed shared presentation hold token, while the seg108 `0x4f38` bit-`0x40` lane stays local to the sprite/object stack.
- The seg126 control stream is now tighter on the producer side as well. The traced setup path still supports a shared-base-path file selector feeding a full external script/control buffer; the `0x6aa:0x6ac` base now reads as an inherited external/default path buffer rather than a stronger in-code producer; and the in-scope `0x31a2` transition/presentation readers are now classified by role.
- That work reduced ambiguity inside already-active clusters more than it expanded whole-program breadth, so the headline only moves modestly.
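
The owner split above depends on not confusing two different uses of `0x40`: a byte at offset `+0x40` inside the active dispatch entry, and bit `0x40` inside the seg108 `0x4f38` value. The sketch below only illustrates that distinction. The entry size, the width of the `0x4f38` value, and both helper names are assumptions made for illustration, not recovered facts.

```python
HOLD_BYTE_OFFSET = 0x40  # byte offset inside the dispatch entry (from the plan)
LOCAL_FLAG_MASK = 0x40   # bit mask inside the seg108 0x4f38 value (from the plan)


def shared_hold_token(entry_image: bytes) -> int:
    """Return the byte stored at +0x40 of a dispatch-entry image."""
    return entry_image[HOLD_BYTE_OFFSET]


def local_flag_set(flags_4f38: int) -> bool:
    """Test bit 0x40 of the value read from 0x4f38."""
    return bool(flags_4f38 & LOCAL_FLAG_MASK)


# Hypothetical images: a 0x60-byte entry (size is an assumption) and a flag word.
entry = bytearray(0x60)
entry[HOLD_BYTE_OFFSET] = 1
print(shared_hold_token(bytes(entry)), local_flag_set(0x0040), local_flag_set(0x003F))
```

Keeping the offset and the mask as separately named constants makes it harder for a later pass to merge the two lanes by accident.
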
## Current Verified State
### Primary Tracking Assets
- `crusader_segment_coverage_ledger.csv` now exists for all 145 NE segments and should remain the primary coverage tracker.
- `crusader_decompilation_notes.md` is now only an index; detailed evidence lives in `docs/`.
- The raw full-EXE porting workflow is stable for the verified seg001 and seg021 mappings.
### Strong Or Stable Areas
- seg001 gameplay/input/projectile work is deep enough to support verified raw-name ports.
- raw 0007 rendering/camera/tile-visibility work is structurally strong.
- 0008 dispatch-entry helpers and 000c state-machine helpers have broad partial coverage.
- 000a/000d tracked-handle, cache, allocator, dispatch-entry, and startup/display support lanes now have a coherent partial map.
- 000e parser and animation subsystems have a real partial map.
- The USECODE/VM owner/resource/runtime lane now has a workable partial model plus supporting extraction/reporting tooling.
### Recently Closed Or No Longer Live
- `ASYLUM.24` is resolved as `_ASS_StopAllSFX`; it is no longer an open plan item.
- The cheat/input side lane is complete enough to leave the live queue.
- The segment coverage ledger is no longer a missing artifact; only refinement remains.
- The startup/display lane now has named outer shells (`startup_display_transition_prepare`, `startup_display_transition_driver`) plus named seg126/127/136/137/138 helper families. The remaining work is higher-level ownership and state semantics, not basic structural recovery.
- The top startup/display ownership question is now narrowed: `active_dispatch_entry_create_default` owns `g_active_dispatch_entry_farptr`, while seg049/seg126/seg138 helpers only borrow or clear the shared byte `+0x40`; the seg108 `0x4f38` lane is separate local sprite/object state.
- The shared seg126 base-path question is effectively closed: literal-address search still shows no store into `0x6aa:0x6ac`, seg004 only mutates the pointed buffer while separately assigning sibling root `0x6ae:0x6b0`, and the startup/display family continues to treat `0x6aa:0x6ac` as an inherited mutable external/default base path.
- The in-scope `0x31a2` transition/presentation reader pass is complete: the remaining reads in this lane now split into edge wait, modal break, deferred dispatch/state advance, and cleanup-abort roles.
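
The literal-address search mentioned for `0x6aa:0x6ac` can be approximated with a byte-pattern scan. The sketch below is not the project's actual tooling; `find_direct_stores` is a hypothetical helper that looks for a few common 16-bit x86 word-store encodings with a direct `[disp16]` operand. It ignores byte stores, `pop [mem]`, string instructions, and any indirect write through a register, so an empty result only rules out these specific forms.

```python
def find_direct_stores(code: bytes, offset: int) -> list[tuple[int, str]]:
    """Scan raw code bytes for word stores to the direct address `offset`.

    Pattern matching on raw bytes can produce false positives inside data
    or in the middle of other instructions; treat hits as candidates only.
    """
    disp = offset.to_bytes(2, "little")
    hits = []
    for i, op in enumerate(code):
        # A3 moffs16: MOV [moffs16], AX
        if op == 0xA3 and code[i + 1:i + 3] == disp:
            hits.append((i, "mov [disp16], ax"))
            continue
        if i + 1 >= len(code):
            continue
        modrm = code[i + 1]
        # mod=00 and rm=110 selects a direct 16-bit displacement operand.
        if (modrm & 0xC7) != 0x06 or code[i + 2:i + 4] != disp:
            continue
        if op == 0x89:
            hits.append((i, "mov [disp16], r16"))
        elif op == 0x8C:
            hits.append((i, "mov [disp16], sreg"))
        elif op == 0xC7 and modrm == 0x06:
            hits.append((i, "mov word [disp16], imm16"))
    return hits


# Hypothetical byte sample exercising each pattern; not bytes from the game.
sample = bytes([
    0x90,                                # nop
    0xA3, 0xAA, 0x06,                    # mov [0x06aa], ax
    0x89, 0x16, 0xAC, 0x06,              # mov [0x06ac], dx
    0xC7, 0x06, 0xAA, 0x06, 0x34, 0x12,  # mov word [0x06aa], 0x1234
    0x8C, 0x1E, 0xAC, 0x06,              # mov [0x06ac], ds
])
print(find_direct_stores(sample, 0x06AA))
print(find_direct_stores(sample, 0x06AC))
```

Running the scan once per half of the pair (`0x6aa` and `0x6ac`) keeps offset-word and segment-word stores separate in the output.
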
## Live Blockers
1. The startup/display transition lane still lacks exact higher-level owner/state labels across seg005, seg049, seg108, seg126, seg127, seg136, seg137, and seg138, even though the shared `g_active_dispatch_entry_farptr[+0x40]` hold token is now separated from the seg108-local `0x4f38` bit-`0x40` lane.
2. The oversized overlap rooted at `000c:db68` still blocks safe recovery of the real `transition_preentry_step_script` body.
3. The `0x4588` callback object is better constrained but still not behaviorally classified enough for a confident subsystem rename.
4. The USECODE/VM sequencer still lacks the real upstream selector/caller path into `FUN_000d_ebe3`, and wrappers `0005:2c35` / `0005:2c68` remain caller-dark.
5. High-value missing or weak function objects still exist in hot ranges such as `000b:2e00`, `0007:5a00`, and `000e:ffb0`.
6. Non-CALLF far-pointer relocations and weakly covered resource/data loaders remain real second-pass blockers, even though they are not the first thing to attack.
## Current Focus
1. Finish the startup/display transition lane while it is still producing direct executable coverage.
2. Continue the USECODE/VM lane only where it yields concrete caller, selector, or loader evidence rather than repeated direct-xref dead ends.
3. Refine the coverage ledger from already-verified notes before broadening into fresh segment sweeps.
4. Use boundary repair only on active blockers with clear payoff.
## Next Resume Point
1. Continue the adjacent seg126 startup/display clarification from the local three-way file-family selector at `000c:afa5..b152` and nearby seg049 `0x2bd8` dispatch sites, but only where it sharpens the validated presentation-handoff model without speculative renames.
2. Repair the `000c:db68` overlap only if needed to split `transition_preentry_step_script` into its own clean function object and preserve the already-verified `000c:ca1d..cd4f` body in Ghidra.
3. Classify the exact UI role of the paired `0x8c5c` / `0x8c60` renderer presets, the `+0x49` selector states, and the neighboring seg127 fade inputs only where the caller evidence stays inside the same startup/display family.
4. Recover the real upstream caller/selector path into `FUN_000d_ebe3` from persisted context/save/load or shared-consumer paths instead of repeating exhausted direct xref hunts.
5. Recover real caller roles for `0005:2c35` and `0005:2c68`, now that both are narrowed to signed slot-offset wrappers feeding the VM context lane.
6. Clarify whether the seg070 twin loops at `0009:67b6` and `0009:6916` represent two file families, two table formats, or two loader phases of the same helper behind `entity_vm_runtime_owner_resource_create`.
7. Promote additional ledger rows where the current docs already justify `Foothold`, `Partial`, or `Deep`.
8. Revisit `000e:ffb0` and other high-value overlap targets only after the current startup/display and VM lanes stop yielding near-term wins.
## Remaining Work To Reach A Reasonably Complete Decompilation State
### 1. Coverage And Tracker Completion
- Promote the existing 145-row ledger from a seeded first pass into a trustworthy executable-wide coverage dashboard.
- Sweep untouched segments cluster-by-cluster instead of one-off function hunting, using adjacency and call relationships.
- Convert more segments from `None` to `Foothold` / `Partial` where current notes already support it.
- Close the largest remaining hot-target gaps so the far-call ranking list stays representative of real coverage.
- Keep the plan, docs, and ledger synchronized after each verified batch.
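
As a rough aid for the ledger work above, the sketch below counts rows per coverage level and reports how many segments have left `None`. The column names `segment` and `status` are assumptions about `crusader_segment_coverage_ledger.csv`, and treating `None`, `Foothold`, `Partial`, and `Deep` as the complete ordered set of levels is also an assumption. The sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: these four levels, in this order, are the full status vocabulary.
STATUS_ORDER = ["None", "Foothold", "Partial", "Deep"]


def summarize_ledger(ledger_text: str) -> dict:
    """Count ledger rows per status and compute the share above `None`."""
    rows = list(csv.DictReader(io.StringIO(ledger_text)))
    counts = Counter(row["status"] for row in rows)
    touched = sum(counts[s] for s in STATUS_ORDER[1:])
    return {
        "total": len(rows),
        "counts": {s: counts.get(s, 0) for s in STATUS_ORDER},
        "touched_pct": 100.0 * touched / len(rows) if rows else 0.0,
    }


# Hypothetical ledger excerpt; segment names and statuses are invented.
sample_ledger = """segment,status
seg_a,Deep
seg_b,Partial
seg_c,Foothold
seg_d,None
"""
print(summarize_ledger(sample_ledger))
```

Against the real file, the text would come from `open(...).read()` instead of the inline sample, and the column names would need to match the actual header row.
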
### 2. Startup/Display And Presentation Lane
- Finish semantic ownership across seg005, seg049, seg108, seg126, seg127, seg136, seg137, and seg138.
- Resolve the remaining role of the shared active-dispatch hold token versus local per-entry hold bytes.
- Recover the higher-level meaning of the file-backed seg126 control stream without speculating beyond verified byte behaviors.
- Classify the exact UI role of the paired `0x8c5c` / `0x8c60` text-renderer lane if stronger caller evidence appears.
- Finish the fade-controller producer path so seg127 fade inputs are tied to higher-level transition states, not only local opcodes.
- Classify `FUN_000d_938c`, `transition_preentry_release_resources`, and `entity_cleanup_resources_and_dispatch` by role once their shared-hold semantics are fully separated.
- Remove the remaining overlap blockers in this lane, with `000c:db68` first.
### 3. VM / USECODE / Scripting Lane
- Recover the upstream selector into `FUN_000d_ebe3` and map payload-shape handlers to real opcode dispatch.
- Recover real caller roles for the caller-dark signed slot-offset wrappers `0005:2c35` and `0005:2c68`.
- Keep separating owner-table-backed `0x39ca` rows from static dispatch-entry seed rows.
- Finish classifying the seg069/070 helper behind `entity_vm_runtime_owner_resource_create`.
- Broaden owner-loaded class/event validation beyond the first strong sample families.
- Keep event-label mapping conservative: only promote ScummVM event names where binary behavior and slot reuse agree.
- Mature the reversible script IR until it can represent raw headers, event rows, payload forms, and unresolved opcodes without information loss.
- Continue extracting readable descriptor-family artifacts, but treat them as evidence aids rather than rename authority.
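
The "without information loss" requirement for the script IR can be stated as a round-trip property: encoding the decoded form must reproduce the original bytes, including anything not yet understood. The sketch below shows that property on a toy stream. The opcode table, the `Op` and `Raw` types, and the sample bytes are all hypothetical and do not describe the real USECODE format or the project's IR.

```python
from dataclasses import dataclass

# HYPOTHETICAL opcode table (opcode -> operand byte count), for illustration only.
OPERAND_LEN = {0x01: 0, 0x02: 2}


@dataclass(frozen=True)
class Op:
    opcode: int
    operands: bytes


@dataclass(frozen=True)
class Raw:
    data: bytes  # unresolved bytes, kept verbatim


def decode(stream: bytes) -> list:
    """Decode known opcodes; keep everything from the first unknown byte as Raw."""
    items, pos = [], 0
    while pos < len(stream):
        length = OPERAND_LEN.get(stream[pos])
        if length is None or pos + 1 + length > len(stream):
            # Unknown opcode or truncated operand: the length is not known,
            # so resynchronizing would be a guess. Preserve the rest as-is.
            items.append(Raw(stream[pos:]))
            break
        items.append(Op(stream[pos], stream[pos + 1:pos + 1 + length]))
        pos += 1 + length
    return items


def encode(items: list) -> bytes:
    """Inverse of decode: concatenate each item's original bytes."""
    out = bytearray()
    for item in items:
        if isinstance(item, Op):
            out.append(item.opcode)
            out += item.operands
        else:
            out += item.data
    return bytes(out)


sample_stream = bytes([0x01, 0x02, 0x34, 0x12, 0xEE, 0x01])
decoded = decode(sample_stream)
print(decoded, encode(decoded) == sample_stream)
```

Running the same equality check over every extracted script body would give a cheap regression test each time the opcode table grows.
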
### 4. Cache / Allocator / Callback-Object Lane
- Finish classifying the object rooted at `0x4588` so the allocator finalize path and callback emissions can receive behaviorally meaningful names.
- Tighten the role of `allocator_phase_finalize_pass` only where it intersects callback-object semantics or active runtime users.
- Separate generic cache-manager mechanics from game-specific client behavior wherever caller evidence supports it.
- Clarify remaining object-role names around tracked handles, dispatch-entry lifecycle helpers, and palette-backed state builders.
- Keep `_ASS_StopAllSFX` and the resolved audio-import lane closed; do not treat it as an open blocker again.
### 5. Rendering, Palette, Animation, And UI Support Lanes
- Finish the remaining caller-side semantics for raw 0007 rendering helpers, seg049 controller dispatch, seg108 sprite/object helpers, and seg137/138 palette state builders.
- Revisit `000e:ffb0` and adjacent 000e video/animation overlap only when it blocks active analysis or offers a strong isolated win.
- Expand the palette/VGA helper family only where it clarifies higher-level behavior rather than duplicating low-level helper names.
- Keep validating startup/display assumptions against raw 0007/0008/000d caller behavior instead of renaming isolated helpers in a vacuum.
### 6. Boundary Repair And Function Hygiene
- Create or repair missing function objects in the highest-traffic unresolved ranges first.
- Fix only overlaps that block live lanes or high-caller targets.
- Preserve conservative naming for repaired functions until direct caller or data evidence justifies promotion.
- Continue rejecting disproven ports or stale hypotheses instead of preserving them in live work queues.
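
To help pick which overlaps are worth repairing, the sketch below finds overlapping function ranges inside one segment and orders them by the highest caller count involved. It assumes function ranges and caller counts have already been exported from the disassembler in some form; the `FuncRange` type, the half-open range convention, and the sample values are hypothetical.

```python
from typing import NamedTuple


class FuncRange(NamedTuple):
    name: str
    start: int    # first offset of the function body
    end: int      # one past the last offset (half-open range, an assumption)
    callers: int  # number of known callers


def find_overlaps(funcs: list[FuncRange]) -> list[tuple[str, str, int]]:
    """Return (first, second, max_callers) for each overlapping pair."""
    ordered = sorted(funcs, key=lambda f: (f.start, f.end))
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so nothing later can overlap `a`
            overlaps.append((a.name, b.name, max(a.callers, b.callers)))
    # Highest-traffic overlaps first, matching the "high-caller targets" rule.
    overlaps.sort(key=lambda pair: -pair[2])
    return overlaps


# Hypothetical ranges: an oversized function swallowing a smaller one.
sample_funcs = [
    FuncRange("oversized_func", 0x1000, 0x1800, 2),
    FuncRange("inner_func", 0x1200, 0x1300, 40),
    FuncRange("next_func", 0x1800, 0x1900, 5),
]
print(find_overlaps(sample_funcs))
```

Ranges that merely touch (one ends where the next begins) are not reported, which matches the half-open convention assumed here.
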
### 7. Data, Imports, And Resource-Format Coverage
- Work through the deferred non-CALLF far-pointer relocations when they become necessary for object/table recovery.
- Expand coverage of weakly mapped resource/data loaders such as FLEX-derived descriptors, tables, caches, and per-shape data files.
- Cross-check current data-structure assumptions against external references like ScummVM only as supporting evidence, not as rename authority.
- Keep external import identities synchronized with verified import-table evidence.
### 8. Completion Criteria
A reasonably complete decompilation state should mean:
- most actively used subsystems are behaviorally named rather than only structurally named,
- the major live blockers (`000c:db68`, `000e:ffb0`, hot missing function objects, dark VM selector path, `0x4588` object role) are either resolved or reduced to low-impact residuals,
- the far-call hot list has very few meaningful unknowns left,
- the ledger gives a credible whole-program view rather than a sparse seed set,
- and the remaining gaps are mostly long-tail cleanup, low-traffic helpers, or data polish instead of core architecture uncertainty.
## Priority Order
1. Startup/display transition lane
2. VM / USECODE selector and loader lane
3. Coverage-ledger refinement from already-verified notes
4. High-value overlap repair (`000c:db68`, then `000e:ffb0` when justified)
5. `0x4588` callback-object classification
6. Broader segment sweeps and second-pass data/relocation work
## Evidence Anchors
Primary files backing this plan state:
- `crusader_segment_coverage_ledger.csv`
- `crusader_decompilation_notes.md`
- `docs/overview.md`
- `docs/raw-porting-progress.md`
- `docs/raw-0008-000c.md`
- `docs/raw-000a-000d.md`
- `docs/raw-000e.md`
- `docs/far-call-targets.md`
- `docs/usecode-roundtrip-ir.md`
- `docs/scummvm-crusader-reference.md`
## Update Rule
Update this file when one of the following happens:
- the headline estimate changes materially,
- a live blocker is resolved,
- a subsystem moves from structural to behavioral understanding,
- a segment cluster is promoted materially in the ledger,
- or the next resume point changes enough that the current handoff would mislead the next pass.
Keep the file short. Move detailed completed analysis into the appropriate file under `docs/` and leave only the current state, blockers, and forward path here.