Crusader_Decomp/plan-mid.md
2026-04-05 18:27:09 +02:00

Crusader Decompilation Mid-Project Plan

Purpose

This file is the live mid-project tracker for the Crusader decompilation effort.

Keep it focused on:

  1. current verified state,
  2. active blockers,
  3. next resume work,
  4. and the remaining path to a reasonably complete decompilation.

Detailed completed analysis belongs in the files under docs/, not in this plan.

Progress Snapshot

Latest verified batch: docs/combat-dat.md now closes the shipped combat-tactic data file as a documentation target instead of leaving it as a scratch-note reference. Current best read: all local Remorse/Regret variants share one identical 14-record COMBAT.DAT; the live NE database already has the right tactic/process field anchors (combatDatTacticPtr, combatDatTacticCurOffset, combatDatBlockNo, tacticNo) plus setup helpers; and the shipped opcode subset is decoded into a full human-readable tactic catalog, using direct binary parsing with the ScummVM Crusader attack-process interpreter as a reference model.

  • Overall useful decompilation progress: about 58%
  • Reasonable uncertainty band: about 55% to 63%
  • Top 100 far-call target coverage: about 86%
  • Segment spread with meaningful analysis: about 34% to 40%
  • Tooling maturity for continued work: about 83%

Why The Estimate Moved

  • The NE CRUSADER.EXE database now has materially more named functions, better caller-role coverage, and broader comment-backed provenance than when this tracker was first drafted.
  • The startup/display lane is no longer a top active section. Its outer ownership and control flow are stable enough that it should stay closed unless new caller evidence changes the model.
  • The cheat/debug lane is also much tighter. The jassica16 latch, the broader -laurie gate, the ~ runtime toggle, the F7-family overlays, the F10/Ctrl behavior, and the 0x410 CD-transfer-display branch are now separated well enough that the remaining work in this lane is mostly documentation and cleanup, not architecture recovery.
  • The USECODE/VM lane has moved from broad structure guesses to a partial runtime model: core loader/runtime helpers are named, owner-loaded slot arithmetic is verified against extracted corpora, several masked-create helpers have real contracts, and the major remaining uncertainty is now the upstream selector/caller path rather than the storage format itself.
  • The map-renderer crosswalk lane also removed a lot of lingering shape ambiguity by closing more controller/helper families directly from extracted corpora plus scene evidence.
  • The combat-tactic data lane is also now materially tighter: COMBAT.DAT is no longer just a named-tactic hint source, but a documented bytecode archive with stable per-record names, verified block structure, a decoded shipped opcode subset, and a practical family-level behavior map for the Dumb, Pivot, Advance, Careful, marker-shuttle, and step-out-shoot tactics.

Current Verified State

Primary Tracking Assets

  • crusader_segment_coverage_ledger.csv remains the main executable-wide coverage tracker and should be updated after each verified batch.
  • crusader_decompilation_notes.md is an index, not the place for long-form analysis.
  • CRUSADER.EXE remains the default live Ghidra target.
  • Verified CRUSADER-RAW.EXE work remains a supporting evidence base for ports, naming provenance, and caller/context cross-checks.

Strong Or Stable Areas

  • seg001 gameplay/input/projectile work is stable enough to support verified raw-name ports into live NE work.
  • The raw 0007 rendering/camera/tile-visibility lane has a strong structural map and now acts more as supporting evidence than as a primary unknown.
  • The 0008 dispatch-helper and 000c state/transition lanes have broad partial coverage, including enough caller-side structure to support practical NE naming work.
  • The VM/USECODE lane now has one earlier compiled-side producer anchored, beyond the old direct Item_GetDamaged / StorageDataProcess_Run callers: AreaSearch_CollideMove is verified as a paired 0x20b / 0x20c collision-process producer, and the local seg031 queue helpers are named structurally in the live database.
  • That same collision-storage producer surface is now wider too: current direct callers are all movement/physics/animation-side (Item_LegalMoveToPoint, Item_LegalMoveToPointWithCollisionInfo, gravity, animation, supersprite, and fast-area gravity cleanup), and no verified non-collision producer reaches the 0x236 queue yet.
  • The movement/collision lane is tighter at the helper level too: the step-aware seg029 sweep wrappers, the seg031 release-side queue cleanup pair, and the adjacent seg090 directional cache-offset helper are now named in the live database, so the remaining uncertainty in this lane sits earlier in caller policy rather than in the local helper layer.
  • The startup/display lane is materially closed. Shared dispatch-entry ownership, seg126 file-backed control flow, seg127 fade control, and the surrounding palette/presentation helpers are now understood well enough that they should not stay in the live critical path.
  • The cheat/debug lane is mostly closed at the behavior level. The secret-sequence matcher, broader cheat gates, F7 overlays, F10 modifier path, Ctrl+L location popup, Ctrl+Q = 0x410 CD-transfer-display toggle, -debug, and -laurie are all separated far more cleanly than before.
  • The hidden usecode-debugger lane is now structurally understood as a layered orphaned subsystem: seg109 UI pieces, seg1408 break-state helpers, and the seg1418 interpreter handoff are no longer conflated.
  • The USECODE/VM lane now has a workable compiled-side model around entity_vm_runtime_create, entity_vm_runtime_owner_resource_create, entity_vm_context_create_from_slot_index, the masked-create hub at 000d:463a, the persistence/load helpers, and the owner-loaded slot/value arithmetic.
  • The owner-loaded body/range model is no longer speculative. Class-selection uses class_id + 2, header/subentry math matches extracted corpus output, and concrete body windows for NPCTRIG, EVENT, and related families are now verified.
  • The map-renderer/documentation lane now has a stronger shape/controller crosswalk. Recent closures include CRUMORPH, NPC_ONLY, WATCHNS, WATCHEW, CRYOBOX, CRAZYEW, CRAZYNS, VIDEOBOX, PANELEW, GENERATR, and cross-game DEATHBOX, with viewer-side links kept conservative where actor-side state is still runtime-only.
  • The command-line/startup lane is much tighter across both games: -warp <mission> [x y z], -mapoff, -egg, startup teleporter selection, and the -u EUSECODE root override all now have practical behavior models instead of folklore-level descriptions.
  • The PSX lane is no longer just side inventory. Retail/pre-alpha bundle loading, mission-briefing/passcode structure, and the reduced-content pre-alpha disc now have dedicated notes and enough stable naming to support future targeted passes.
  • The Remorse class-lift preparation lane now has a usable document cluster: overall plan, candidate inventory, endpoint spec, ABI constraints, family notes for EntityDispatchEntry and SpriteNode, a conservative Entity family split, a VM runtime/owner-resource layout note, a compatibility-header draft, and one grouped resume index.
  • The same class-lift prep lane is now more execution-ready: the 0x4588 broker family has its own focused object note, the toolchain story has a dedicated fingerprint-evidence note, and there is now a concrete first-batch class-authoring checklist ready for the first MCP-backed namespace/struct/vtable pass.
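
The -warp <mission> [x y z] form noted in the command-line/startup item above can be illustrated with a small argument-shape sketch. This is not recovered game code: it only models the quoted syntax, and it assumes the mission and coordinates are decimal integers and that the coordinates are either all present or all absent. The names WarpRequest and parse_warp are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class WarpRequest(NamedTuple):
    """Hypothetical container for a parsed -warp switch."""
    mission: int
    position: Optional[Tuple[int, ...]]  # (x, y, z) when given, else None


def parse_warp(argv: List[str]) -> Optional[WarpRequest]:
    """Return the -warp request found in argv, or None if the switch is absent.

    Assumption (not verified against the binary): mission and coordinates
    are decimal integers, and coordinates come as a full x y z triple.
    """
    if "-warp" not in argv:
        return None
    rest = argv[argv.index("-warp") + 1:]

    # Collect up to four leading integer tokens after the switch.
    values: List[int] = []
    for token in rest[:4]:
        try:
            values.append(int(token))
        except ValueError:
            break  # next switch or non-numeric token ends the group

    if len(values) == 4:
        return WarpRequest(values[0], tuple(values[1:]))
    if len(values) == 1:
        return WarpRequest(values[0], None)
    raise ValueError("-warp expects <mission> or <mission> x y z")
```

A sketch like this is mainly useful for note-keeping scripts that replay documented launch lines; the authoritative behavior remains whatever the compiled startup path does.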

Areas That Are No Longer Live Priorities

  • Startup/display transition recovery is no longer a front-line blocker unless overlap repair becomes necessary for adjacent work.
  • The general cheat/debug key matrix no longer needs broad exploratory work.
  • The -debug switch is no longer an open mystery; remaining work there is mostly sink-side cleanup and documentation.
  • The earlier executable-patch experiments around the hidden debugger are documented history, not a current decompilation priority unless new evidence changes the entry model.

Live Blockers

  1. The main remaining VM uncertainty is the real upstream selector/caller path into entity_vm_opcode_sequence_run and adjacent masked-create helpers. One earlier producer is now closed at AreaSearch_CollideMove for the 0x236 collision-storage family, but the owner-loaded class-family chooser and any broader non-collision producers are still upstream-dark.
  2. The dark masked-materializer wrappers still need caller-role recovery, especially the signed-additive slot-0x0a / slot-0x0b pair and the surrounding higher-slot wrapper ladder.
  3. The callback object rooted at 0x4588 still lacks a behaviorally safe subsystem name even though its allocation/finalize neighborhood is better constrained.
  4. A few hot or awkward function ranges still lack clean function objects or good boundaries, especially around 000c:db68, 000e:ffb0, and several caller-dense gaps in 0007, 000b, and 000e.
  5. Weakly covered resource/data-loader families and non-CALLF far-pointer relocations are still a second-pass blocker for some object/table recovery work.
  6. The segment ledger has improved, but it still trails the actual verified state in the notes and Ghidra database. Promoting known segments from documented evidence remains real work, not bookkeeping trivia.

Current Focus

  1. Keep the live NE CRUSADER.EXE lane as the default working surface, using raw/full-EXE and standalone-segment work only as supporting evidence.
  2. Keep the VM/USECODE lane focused on selector recovery, caller-role recovery, and record-shape confirmation rather than repeating storage-format validation that is already closed.
  3. Promote ledger coverage from existing verified notes before broadening into fresh executable-wide sweeps.
  4. Use overlap repair only where it unlocks an active high-payoff lane.
  5. Use the map-renderer/tooling lane to validate shape ids, map placements, and viewer semantics before promoting additional static-object names in Ghidra.

Next Resume Point

  1. Resume from docs/ne-hole-filling-priorities.md and pick one small NE cluster where the old disasm vocabulary, extracted corpus evidence, and live NE callers overlap cleanly.
  2. Stay on the VM lane and move one step earlier than the now-mapped movement/collision helper set around AreaSearch_CollideMove. The local seg029/031/090 helper layer is named, so the next work is the policy/dispatch layer that decides when the legal-move, gravity, animation, or supersprite paths instantiate the local 0x236 collision-storage queue. Also verify whether any non-collision producer feeds the same StorageDataProcess_Create / Run family.
  3. Recover caller roles for the remaining dark signed-additive masked wrappers, especially the slot-0x0a / slot-0x0b pair, and compare them against the now-anchored slot-0x12 caller pattern.
  4. Tighten the higher-slot wrapper ladder around 0005:3115..31da so future event-label promotion depends on compiled caller behavior instead of external tables.
  5. Tighten the seg006 masked-helper caller chains so the local state-selector/value family can be tied to concrete gameplay subsystems.
  6. Classify the paired seg070 loops behind entity_vm_runtime_owner_resource_create, especially which temporary buffers and record schemas each family populates.
  7. Promote additional ledger rows directly from already-verified docs and live comments, especially where segments already deserve Foothold, Partial, or Deep; the new seg029 step-aware sweep batch, seg031 queue-release batch, and seg090 movement-helper batch should be the immediate template.
  8. If the VM lane stalls, revisit 000e:ffb0 from the now-better-constrained video/audio caller windows and try to recover an adjacent non-overlapped helper before attempting broad boundary repair.
  9. Continue the map-renderer cross-check lane by building one conservative shape-id/map-placement crosswalk from shapedata_more_complete.txt, extracted corpora, and authored scene evidence before promoting more trigger-heavy classes in NE.
  10. Keep the PSX pre-alpha lane alive as a secondary target: classify the LoadExec callers, test whether the stale TALK1.XA path is still reachable, and compare the shipped LSET1 bundles against the retail extractor outputs.
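
The blockers and resume items above cite locations in segment:offset form, such as 000c:db68, and as ranges, such as 0005:3115..31da. A minimal sketch of a helper for normalizing those strings in notes or ledger scripts follows. It assumes four-digit hex fields and that the value after ".." is an end offset in the same segment. Segment values are treated as opaque labels and no linear address is computed. FarAddr, parse_far_addr, and contains are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class FarAddr(NamedTuple):
    """A segment:offset location, optionally with an inclusive end offset."""
    segment: int
    start: int
    end: Optional[int]


# Matches "ssss:oooo" or "ssss:oooo..eeee" with four hex digits per field.
_ADDR_RE = re.compile(
    r"([0-9a-fA-F]{4}):([0-9a-fA-F]{4})(?:\.\.([0-9a-fA-F]{4}))?"
)


def parse_far_addr(text: str) -> FarAddr:
    """Parse a plan-style address such as '000c:db68' or '0005:3115..31da'."""
    match = _ADDR_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a segment:offset address: {text!r}")
    segment = int(match.group(1), 16)
    start = int(match.group(2), 16)
    end = int(match.group(3), 16) if match.group(3) else None
    if end is not None and end < start:
        raise ValueError(f"range end precedes start: {text!r}")
    return FarAddr(segment, start, end)


def contains(rng: FarAddr, addr: FarAddr) -> bool:
    """True if addr's start offset lies inside rng (same segment only)."""
    upper = rng.end if rng.end is not None else rng.start
    return addr.segment == rng.segment and rng.start <= addr.start <= upper
```

Because FarAddr is a tuple, a list of parsed addresses sorts by segment first and offset second, which is convenient when grouping gaps by segment.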

Remaining Work To Reach A Reasonably Complete Decompilation State

1. Coverage And Tracker Completion

  • Keep turning the seeded 145-row ledger into a trustworthy whole-program dashboard.
  • Sweep remaining lightly covered segment clusters by adjacency and call relationships rather than one-off function hunting.
  • Keep the plan, the docs, the ledger, and the live Ghidra comments synchronized after each verified batch.
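
One way to see how far the ledger is from a trustworthy dashboard is to tally rows by coverage level after each batch. The sketch below is hedged: the real column layout of crusader_segment_coverage_ledger.csv is not described in this plan, so the column name "coverage" is an assumption and tally_coverage is a hypothetical helper. Only the level names Foothold, Partial, and Deep come from this plan.

```python
import csv
import io
from collections import Counter


def tally_coverage(csv_text: str, level_column: str = "coverage") -> Counter:
    """Count ledger rows per coverage level.

    Assumption: the ledger has a header row and a column named by
    level_column holding values such as Foothold, Partial, or Deep.
    Rows with an empty or missing level are counted under "(blank)".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts: Counter = Counter()
    for row in reader:
        level = (row.get(level_column) or "").strip() or "(blank)"
        counts[level] += 1
    return counts
```

To use it, read the ledger file as text and pass the contents in, adjusting level_column to match the actual header. A rising "(blank)" count relative to verified notes is a quick signal that the ledger is trailing the docs again.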

2. VM / USECODE / Scripting Lane

  • Close the upstream selector/caller path into the sequencer and masked-create families.
  • Finish separating owner-row-backed data from runtime-decoded control streams and dispatch-entry seed records.
  • Expand caller-backed event-label promotion only where binary behavior and slot reuse agree.
  • Keep maturing the tooling bridge from extracted corpora into compiled-side annotation/import workflows.

3. Callback / Allocator / Object-Role Lane

  • Classify the 0x4588 callback object strongly enough for a real subsystem name.
  • Separate generic cache/allocator mechanics from game-specific client behavior where caller evidence supports it.
  • Keep low-level helper names conservative until behavior, not just structure, is clear.

4. Rendering / Animation / UI Support Lanes

  • Keep the rendering/palette/animation lanes focused on caller-side semantics and cleanup, not exploratory renaming in isolation.
  • Revisit 000e:ffb0 and adjacent overlap-heavy video helpers only when the payoff is clear.
  • Use map-renderer evidence and extracted corpora to validate static-object and helper/controller naming before promoting it into live NE work.

5. Data / Resource / Relocation Coverage

  • Tackle deferred non-CALLF far-pointer relocations when they are needed for active table/object recovery.
  • Broaden weakly covered resource/data-loader families where they block real subsystem classification.
  • Keep external references like ScummVM or older disasm corpora as evidence aids, not rename authority.

Priority Order

  1. VM / USECODE selector and caller recovery
  2. Coverage-ledger refinement from already-verified notes
  3. Callback-object classification around 0x4588
  4. High-value boundary repair when it unlocks active work
  5. Broader segment sweeps and second-pass data/relocation work
  6. Secondary map-renderer and PSX follow-up lanes

Evidence Anchors

Primary files backing this plan state:

  • crusader_segment_coverage_ledger.csv
  • crusader_decompilation_notes.md
  • docs/overview.md
  • docs/ne-hole-filling-priorities.md
  • docs/crusader-disasm-reference.md
  • docs/raw-porting-progress.md
  • docs/raw-0008-000c.md
  • docs/raw-000a-000d.md
  • docs/raw-000e.md
  • docs/far-call-targets.md
  • docs/usecode-roundtrip-ir.md

Update Rule

Update this file when one of the following happens:

  • the headline estimate changes materially,
  • a live blocker is resolved,
  • a subsystem moves from structural to behavioral understanding,
  • a segment cluster is promoted materially in the ledger,
  • or the next resume point changes enough that the current handoff would mislead the next pass.

Keep this file short. Move detailed completed analysis into the appropriate file under docs/ and leave only the current state, blockers, and forward path here.