Crusader Decompilation Mid-Project Plan

Purpose

This file is the workspace-facing mid-project tracker for the Crusader decompilation effort. It is intended to answer four questions clearly:

  1. How far along is the project?
  2. What is already solid?
  3. What still blocks broader decompilation?
  4. What should be implemented next?

The estimates below are intentionally conservative. They measure verified behavioral understanding, not just renamed symbols.

Progress Snapshot

Working Progress

Last Confirmed State

  • Priority 0 has started: crusader_segment_coverage_ledger.csv exists and contains a first-pass 145-row ledger.
  • The currently seeded ledger rows are conservative and strongest around seg001, seg004, seg021, seg043, seg080, seg082/083/085, seg091, seg094, and seg095.
  • Priority 1 has started on the cache/backend cluster: the seg082 allocator mechanics are now materially recovered (allocator_head_try_alloc_block, allocator_head_free_block, allocator_free_block_by_ptr), and the 0x4588 path now has named lifecycle helpers (runtime_callback_object_init_once, runtime_callback_object_teardown_once, runtime_callback_object_phase_finalize).
  • The 0x4588 blocker is tighter than before: 000a:b988 boundary repair now includes both callback sync callsites (000a:b9e5 / 000a:ba66) inside one real function body, 000d:9d5e / 000d:a3b7 are confirmed inside entity_cleanup_resources_and_dispatch, and adjacent helpers are now clarified as allocator_head_finalize_sweep (0009:a961), video_bios_state_snapshot (000a:4a1f), and video_mode_set_and_record_state (000a:4972). Concrete subsystem identity is still unresolved.
  • A larger MCP rename batch completed for cleanup callees: palette_buffer_alloc_and_init_256 (0009:7853), file_handle_alloc_init_and_open (0009:1c3a), file_handle_open_with_mode (0009:1d6a), surface_release_internal (0009:8d7b), surface_release_and_maybe_free (0009:8e0a), and sprite_redraw_global_if_active (000d:9231). This reduces entity_cleanup_resources_and_dispatch ambiguity on file/surface/palette teardown paths.
  • The previously missing 000d:7e00 function object is now recovered and named entity_dispatch_entry_init_runtime_state, with paired destructor entity_dispatch_entry_release_runtime_state at 000d:8078. Adjacent missing helpers 0003:a880 and 0003:b8e2 were also recovered, with 0003:b8e2 promoted to far_buffer_alloc_with_mode_flags.
  • Additional helper stabilization now covers seg061/064/076: vga_palette_read (0009:6ec7) is confirmed alongside existing palette write/free paths, timer_entity_enable_wrapper (0008:d3ba) is named, and seg064 one-shot gate helpers around 0x3b72/0x3b73 are documented with conservative comments while keeping speculative naming deferred.
  • Constructor-lane semantics tightened further: entity_set_update_period_and_reschedule (0008:d27e) and palette_buffer_alloc_copy_from_source (0009:7905) are now named, and both 0x4588 callback emit callsites (000d:9d5e, 000d:a3b7) now have explicit payload-pair annotations in disassembly.

Current Focus

  1. Finish Priority 0 refinement by promoting more exact segment rows where notes already support a verified foothold.
  2. Continue the Priority 1 pass by tracing remaining caller-side 0x4588 / 0009:b1c3 object-role evidence now that the 000d:7e00 constructor/destructor path is readable.

Next Resume Point

  1. Update the ledger for any additional exact segment anchors found in the reset/cache or render-path notes.
  2. Continue caller-role classification inside entity_cleanup_resources_and_dispatch (contains both 000d:9d5e and 000d:a3b7) and map how it relates to runtime_callback_object_phase_finalize + allocator_head_finalize_sweep.
  3. Promote additional field-level names inside entity_cleanup_resources_and_dispatch and entity_dispatch_entry_init_runtime_state now that update-period/palette-copy helpers are named.
  4. Classify remaining callback-role semantics for the 0x4588 object (especially vtable +0x08 vs +0x0c intent and phase/event meaning) using the confirmed payload pairs +0x12d/+0x12f and +0x74f/+0x751.
  5. Continue ASYLUM.24 only after the 0x4588 path has no further cheap wins.

Headline Estimate

  • Overall useful decompilation progress: about 25%
  • Reasonable uncertainty band: about 20% to 30%

This is the best single-number estimate for the full game right now.

Supporting Metrics

| Metric | Estimate | Meaning |
| --- | --- | --- |
| Top 100 far-call target coverage | about 80% | Roughly 80 of the top 100 most-called far-call targets have been named or materially classified |
| Whole-program behavioral coverage | about 25% | Verified subsystem and function understanding across the executable |
| Segment spread with meaningful analysis | about 10% to 15% | Segments with more than a trivial foothold or isolated note |
| Tooling maturity for continued work | about 75% | Core repair, lookup, and fallback automation needed for continued progress |

Why These Numbers Differ

  • The hot-target metric is much higher because the project has already focused on the most shared and most-called helpers.
  • The whole-program metric is lower because most of the 145 NE segments still have not had systematic coverage passes.
  • The segment-spread metric is lower still because only a subset of segments have coherent subsystem-level treatment.
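
The gap between the hot-target metric and the whole-program metric follows from how the first one is counted. A minimal sketch of that counting, assuming the patched CALLF sites are available as a flat list of target labels and the named targets as a set (both hypothetical inputs, not existing project files):

```python
from collections import Counter


def top_n_coverage(call_targets, named_targets, n=100):
    """Return the fraction of the n most-called targets that are named."""
    counts = Counter(call_targets)
    top = [target for target, _ in counts.most_common(n)]
    if not top:
        return 0.0
    named = sum(1 for target in top if target in named_targets)
    return named / len(top)


# Synthetic placeholder data for illustration only; not project measurements.
calls = ["target_a"] * 5 + ["target_b"] * 3 + ["target_c"] * 2 + ["target_d"]
named = {"target_a", "target_c"}
print(top_n_coverage(calls, named, n=3))
```

Because the ranking is by call count, a small number of heavily shared helpers can push this figure up quickly while most segments remain untouched.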

What Is Already In Place

Workflow and Tooling

  • Raw full-EXE Ghidra target is established and in active use.
  • Verified raw-import mapping exists for seg001 and seg021.
  • NE relocation parsing has been implemented.
  • Internal literal far-call fixups have been applied to the raw import.
  • PyGhidra fallback tooling exists for create/delete function work and batch scripted edits.
  • Conservative boundary-repair workflow already exists and has been used successfully.
  • Notes are detailed enough to support a formal executable-wide tracker.

Objective Milestones Already Reached

  • 145 NE segments identified from the internal NE header.
  • 8851 internal literal CALLF sites patched to real targets in the raw import.
  • 2841 non-CALLF far-pointer relocations identified and deferred.
  • 119 import callsites annotated.
  • Top 100 far-call target list processed through five tiers, with about 80 named or materially classified.
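
For reference, the segment count comes from the standard NE segment table. The sketch below is not the project's parser. It assumes the caller already knows where the internal NE header starts (`ne_offset` is a hypothetical parameter) and that sector offsets are relative to the start of the supplied image, which may need adjusting for an embedded header.

```python
import struct
from dataclasses import dataclass


@dataclass
class NeSegment:
    index: int        # 1-based NE segment number
    file_offset: int  # byte offset of the segment data, 0 if none is stored
    length: int       # bytes stored in the file
    is_data: bool     # flags bit 0x0001: data segment (otherwise code)
    has_relocs: bool  # flags bit 0x0100: relocation records follow the data


def parse_ne_segments(image: bytes, ne_offset: int) -> list:
    """Read the NE segment table that belongs to the header at ne_offset."""
    if image[ne_offset:ne_offset + 2] != b"NE":
        raise ValueError("no NE signature at the given offset")
    (count,) = struct.unpack_from("<H", image, ne_offset + 0x1C)
    (table_rel,) = struct.unpack_from("<H", image, ne_offset + 0x22)
    (shift,) = struct.unpack_from("<H", image, ne_offset + 0x32)
    shift = shift or 9  # a zero shift count means the default of 9

    segments = []
    for i in range(count):
        entry = ne_offset + table_rel + i * 8
        sector, length, flags, _min_alloc = struct.unpack_from("<4H", image, entry)
        # A stored length of 0 means 64 KiB when the segment has file data.
        stored = 0x10000 if (length == 0 and sector != 0) else length
        segments.append(NeSegment(
            index=i + 1,
            file_offset=sector << shift,
            length=stored,
            is_data=bool(flags & 0x0001),
            has_relocs=bool(flags & 0x0100),
        ))
    return segments
```

The resulting rows map directly onto the Segment, Type, File offset, and Length columns of the coverage ledger.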

Strongly Advanced Areas

Core Gameplay and Entity Work

  • seg001 gameplay, cursor, entity lifecycle, projectile, combat, and AI footholds are strong.
  • A verified seg001 raw-port path is working and already used for multiple projectile helpers.
  • Entity table, class-table, and several global gameplay fields are partially mapped.

Timer, Event, and State Systems

  • seg021 timer and event-dispatch work has meaningful coverage.
  • 000c state-dispatch, cursor-nav, UI-listbox, palette-fade, and mini-VM clusters have footholds.

Rendering and Camera

  • 0007 rendering, draw-list, tile-visibility, and camera work has strong structural coverage.
  • world_to_screen_coords and adjacent geometric helpers are understood well enough to support further caller analysis.

Dispatch and Pair-Sync Helpers

  • 0008 dispatch-entry helper families have multiple verified rename batches.
  • Pair-sync and target-state helper clusters are no longer isolated unknowns.

Cache, Tracked Handles, and Bucket Logic

  • 000a cache manager layer is structurally mapped.
  • 000a tracked-handle table is structurally mapped.
  • 000d tracked bucket / proximity / visibility bucket logic has several meaningful behavioral names.
  • The client/cache distinction is much clearer than before.

Parser and Animation Framework

  • 000e parser cluster has a stable set of verified names.
  • 000e animation framework has a real foothold: chunk lookup, audio load, tick, frame advance, and constructor variants are partly mapped.

Local Repair Successes

  • seg043 overlap repair succeeded and recovered multiple valid function objects.
  • seg091 boundary recovery succeeded and exposed RNG helpers plus local init/context helpers.
  • Recent seg004 reset-path recovery and cache-reset follow-up added a new high-value analysis cluster.

What Still Blocks Broader Coverage

High-Value Classification Gaps

  • The object rooted at 0x4588 is still not classified well enough to safely rename 0009:b1c3.
  • ASYLUM.24 is only known as an import site, not yet a confidently identified routine.
  • Some structural names in the cache/backend/finalize cluster are waiting on object-role confirmation.

Boundary and Decompiler Gaps

  • Some high-caller targets still require conservative boundary repair or follow-up validation.
  • Certain functions still decompile poorly because of overlaps, thunk-heavy paths, or unresolved downstream targets.
  • 000e:ffb0 remains a notable animation/video-side blocker because of overlapping instructions.

Coverage Management Gap

  • A first-pass normalized segment-by-segment coverage ledger now exists for all 145 NE segments.
  • The remaining gap is refinement rather than absence: most segments still need manual promotion from None to Foothold / Partial / Deep as coverage expands.

Deferred Data Work

  • Non-CALLF far-pointer relocations still exist and will matter for deeper object/table recovery.
  • They are no longer the main blocker, but they remain a real second-pass problem.
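
As a reminder of what the deferred records look like, here is a small sketch that decodes one 8-byte NE relocation record following the standard layout. It is not the project's relocation parser, and the returned dictionary keys are illustrative names only.

```python
import struct

TARGET_KINDS = {0: "internal", 1: "import_ordinal", 2: "import_name", 3: "os_fixup"}


def decode_ne_reloc(record: bytes) -> dict:
    """Decode a single 8-byte NE relocation record into a plain dict."""
    if len(record) != 8:
        raise ValueError("an NE relocation record is exactly 8 bytes")
    source_type, flags, source_offset = struct.unpack_from("<BBH", record, 0)
    kind = TARGET_KINDS[flags & 0x03]
    out = {
        "source_type": source_type,      # 3 = 32-bit far pointer, 2 = selector, 5 = offset
        "additive": bool(flags & 0x04),
        "source_offset": source_offset,  # offset of the fixup site in the segment
        "kind": kind,
    }
    if kind == "internal":
        segment, _reserved, target = struct.unpack_from("<BBH", record, 4)
        out["segment"] = segment         # 0xFF marks a movable entry-table reference
        out["target"] = target
    elif kind in ("import_ordinal", "import_name"):
        module_index, value = struct.unpack_from("<HH", record, 4)
        out["module_index"] = module_index
        out["value"] = value             # ordinal, or offset into the imported-names table
    return out


# Synthetic record: far-pointer fixup to an import by ordinal 24 in module 2.
print(decode_ne_reloc(struct.pack("<BBHHH", 3, 1, 0x0010, 2, 24)))
```

Whether a far-pointer fixup is a CALLF operand or a data-table entry cannot be read from the record itself. It depends on what sits at `source_offset`, which is why the non-CALLF group needs a separate pass.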

Current Best Assessment Of Remaining Work

The project has solved most of the architectural uncertainty needed to keep going efficiently. The remaining effort is mainly a scaling problem:

  • expand coverage across many more segments,
  • remove the last high-value boundary blockers,
  • convert structural names into subsystem names when evidence is strong enough,
  • and normalize progress tracking so the whole program can be managed deliberately.

In practical terms, this looks like a true mid-project state rather than an early exploratory state or a late polish state.

Implementation Priorities

Priority 0: Coverage Ledger

First pass completed: an executable-wide coverage ledger now exists for all 145 NE segments in crusader_segment_coverage_ledger.csv.

Next work under Priority 0:

  1. Promote additional segments from None where notes already support a verified foothold.
  2. Normalize raw-address subsystem islands (notably the 000e: parser/animation cluster) back onto exact NE segment rows.
  3. Keep the ledger updated together with crusader_decompilation_notes.md after each verified batch.

Minimum columns:

| Column | Meaning |
| --- | --- |
| Segment | NE segment number |
| Type | Code or data |
| File offset | From the NE segment table |
| Length | Segment length |
| Coverage status | None, Foothold, Partial, Deep |
| Known subsystem | Best current classification |
| Key named functions | Short summary only |
| Blockers | Boundary, import, thunk, overlap, unknown object, etc. |
| Notes source | Notes section or evidence anchor |

Keeping this ledger current matters most because it is what makes the percentage estimates maintainable.
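
A quick consistency check can catch ledger drift after each batch. The sketch below assumes the CSV header uses exactly the column names listed above, which may not match the real crusader_segment_coverage_ledger.csv. `validate_ledger` is a hypothetical helper, not existing tooling.

```python
import csv
import io

REQUIRED = [
    "Segment", "Type", "File offset", "Length", "Coverage status",
    "Known subsystem", "Key named functions", "Blockers", "Notes source",
]
STATUSES = {"None", "Foothold", "Partial", "Deep"}


def validate_ledger(handle, expected_segments=145):
    """Return a list of human-readable problems found in the ledger."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        segment = (row.get("Segment") or "").strip()
        status = (row.get("Coverage status") or "").strip()
        if segment in seen:
            problems.append(f"line {line_no}: duplicate segment {segment!r}")
        seen.add(segment)
        if status.capitalize() not in STATUSES:
            problems.append(f"line {line_no}: unknown coverage status {status!r}")
    if len(seen) != expected_segments:
        problems.append(f"expected {expected_segments} segments, found {len(seen)}")
    return problems


# Usage with the real file would look like:
#   with open("crusader_segment_coverage_ledger.csv", newline="") as f:
#       print(validate_ledger(f))
```

An empty result means every expected segment has exactly one row with a recognized status. It says nothing about whether the status itself is justified.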

Priority 1: Finish The New Cache/Backend Cluster

Work the newest verified reset-path cluster to closure:

  1. Trace more callers of 0009:b06b.
  2. Trace more callers of allocator_head_finalize_sweep (0009:a961, previously FUN_0009_a961).
  3. Classify the object rooted at 0x4588.
  4. Revisit 0009:b1c3 once the object role is clearer.

This is currently the best next analysis target because it closes a live cluster that already has fresh verified work around it.

Priority 2: Resolve ASYLUM.24

Identify what imported routine ASYLUM.24 actually is.

Goal:

  • tighten the description of runtime_cache_reset_sequence,
  • determine whether the import belongs to cache/resource/backend/media initialization,
  • and improve naming confidence around the reset path.

Priority 3: Continue Small-Batch Boundary Repair

Use the existing conservative repair approach for remaining high-value blockers.

Good candidates include:

  • unresolved high-caller function objects,
  • ranges that still steal bytes from adjacent real bodies,
  • and overlaps that block decompilation of already-active subsystems.
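
Candidate ranges of the second and third kind can be listed mechanically before any manual repair. A minimal sketch, assuming function bodies have been exported as `(name, start, end)` tuples with an exclusive end in a single linear address space (a hypothetical export format, not an existing script):

```python
def find_overlaps(functions):
    """Pair each function with an earlier body that still covers its start.

    Only the furthest-reaching earlier body is reported for each function,
    so this is a triage list, not an exhaustive pairwise comparison.
    """
    ordered = sorted(functions, key=lambda f: (f[1], f[2]))
    overlaps = []
    furthest = None  # earlier function with the largest end address so far
    for fn in ordered:
        if furthest is not None and fn[1] < furthest[2]:
            overlaps.append((furthest[0], fn[0]))
        if furthest is None or fn[2] > furthest[2]:
            furthest = fn
    return overlaps


# Synthetic ranges for illustration only.
bodies = [("fn_a", 0x100, 0x180), ("fn_b", 0x170, 0x1c0), ("fn_c", 0x1c0, 0x200)]
print(find_overlaps(bodies))  # [('fn_a', 'fn_b')]
```

Each reported pair still needs manual validation, since an overlap can mean either a body that runs too long or a function object created at a bogus entry point.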

Priority 4: Finish Partial Subsystem Islands Before Expanding Broadly

Recommended order:

  1. seg043 plus connected seg004 reset and dispatch paths
  2. 000e animation/video overlap at 000e:ffb0
  3. 000c UI-listbox, mini-VM, and cursor-nav families
  4. Remaining structural 0007 and 0008 helper cohorts

The goal is to reduce the number of half-understood islands before starting broad segment sweeps.

Priority 5: Broaden Coverage Across The Remaining Executable

Once the ledger has been refined and the current hot cluster is closed, broaden analysis segment by segment.

Preferred method:

  1. Group segments by adjacency and call relationships.
  2. Identify entry points and hot callees first.
  3. Classify globals and tables next.
  4. Promote helper names only when supported by strong evidence.

Use these status values for segment coverage:

| Status | Meaning |
| --- | --- |
| None | No meaningful verified analysis yet |
| Foothold | One or two verified entry points or helper names, but no subsystem picture |
| Partial | Several verified names plus some globals/tables or object fields |
| Deep | Coherent subsystem-level understanding with multiple verified related functions |
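
With these values in the ledger, the segment-spread figure can be recomputed and no longer needs to be estimated. The sketch below assumes that "meaningful analysis" means Partial or Deep. That threshold is an interpretation of the metric's description, not a rule stated elsewhere in the project.

```python
from collections import Counter

MEANINGFUL = ("Partial", "Deep")  # assumption: a Foothold alone does not count


def segment_spread(statuses):
    """Return the fraction of segments whose status counts as meaningful."""
    counts = Counter(statuses)
    total = sum(counts.values())
    meaningful = sum(counts[s] for s in MEANINGFUL)
    return meaningful / total if total else 0.0


# Synthetic statuses for illustration only; not the real ledger contents.
example = ["None"] * 6 + ["Foothold"] * 2 + ["Partial", "Deep"]
print(f"{segment_spread(example):.0%}")  # 20%
```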

Use these status values for subsystem maturity:

| Status | Meaning |
| --- | --- |
| Unknown | Not enough evidence to classify |
| Structural | Behavior is partly mapped but still generic |
| Behavioral | Confident subsystem role is known |
| Stable | Multiple connected functions and data objects support the classification |

Suggested Immediate Work Queue

Queue A: Highest Leverage

  1. Expand the first-pass segment coverage ledger beyond the currently seeded segments.
  2. Trace 0009:b06b, allocator_head_finalize_sweep (0009:a961), and 0009:b1c3.
  3. Identify ASYLUM.24.

Queue B: Repair And Stabilize

  1. Review remaining high-caller gap functions.
  2. Repair any still-blocking overlaps in small batches.
  3. Re-decompile repaired ranges and keep only evidence-backed names.

Queue C: Broaden Carefully

  1. Expand into adjacent segments connected to already-understood clusters.
  2. Avoid speculative naming.
  3. Update the notes and the coverage ledger together after each verified batch.

Concrete Progress Interpretation

If a single number is needed, use 25%.

If a more honest dashboard is acceptable, use all three:

  • about 80% of top-100 hot targets named or materially classified
  • about 25% overall behavioral decompilation progress
  • about 10% to 15% segment spread with meaningful analysis

That combination best reflects the actual state of the project.

Source Anchors

Primary sources for this file:

  • crusader_segment_coverage_ledger.csv
  • crusader_decompilation_notes.md
  • crusader_ne_segments.csv
  • tier4_output.txt
  • tier5_output.txt
  • repo memory progress summary

Next Update Rule

Update this file when one of the following happens:

  • the overall estimate changes materially,
  • a new subsystem reaches behavioral or stable status,
  • a major blocker such as 0x4588, 0009:b1c3, or ASYLUM.24 is resolved,
  • or the segment coverage ledger becomes the new primary progress source.