Add segment coverage ledger and mid-project plan for Crusader decompilation
- Created `crusader_segment_coverage_ledger.csv` to track segment coverage status, types, and known functions.
- Introduced `plan-mid.md` as a mid-project tracker outlining progress, objectives, and implementation priorities for the decompilation effort.
- Added scripts in `pyghidra_plans` to assist with instruction window dumping and reference inspection for the object at `0x4588`.
- Implemented functionality to scan for instruction uses of specific targets related to the decompilation project.

Parent 55b3187469, commit 519af09912: 42 changed files with 2444 additions and 3 deletions.

plan-mid.md (new file, 308 lines)
# Crusader Decompilation Mid-Project Plan

## Purpose

This file is the workspace-facing mid-project tracker for the Crusader decompilation effort.
It is intended to answer four questions clearly:

1. How far along is the project?
2. What is already solid?
3. What still blocks broader decompilation?
4. What should be implemented next?

The estimates below are intentionally conservative. They measure verified behavioral understanding, not just renamed symbols.
## Progress Snapshot

## Working Progress

### Last Confirmed State

- Priority 0 has started: `crusader_segment_coverage_ledger.csv` exists and contains a first-pass 145-row ledger (one row per NE segment).
- The seeded ledger rows are conservative. The strongest entries are seg001, seg004, seg021, seg043, seg080, seg082/083/085, seg091, seg094, and seg095.
- Priority 1 has started on the cache/backend cluster. The seg082 allocator mechanics are now materially recovered (`allocator_head_try_alloc_block`, `allocator_head_free_block`, `allocator_free_block_by_ptr`). The next unresolved clue is that `0x4588` behaves like a runtime-installed callback/dispatch object, used by `entity_conditional_render_dispatch` and by a one-shot teardown path.
- The `0x4588` blocker is now better constrained. Instruction windows dumped outside defined functions confirm a direct install at `000a:493e`, repeated clear paths in seg004, and additional vtable `+0x0c` callbacks from unresolved `000a:` and `000d:` callers. The concrete subsystem name is still unresolved.
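
Further `0x4588` sites can be triaged quickly with a raw byte scan that flags candidate 16-bit x86 instructions that encode the address directly, plus far calls through a `+0x0c` slot. The sketch below is hypothetical and is not the `pyghidra_plans` scripts from this commit. It assumes `0x4588` is a DS-relative data offset and that `code` holds one segment's bytes. Because it does not disassemble, every hit still needs confirmation in Ghidra.

```python
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    offset: int  # approximate start of the matched instruction in `code`
    kind: str


def find_direct_refs(code: bytes, target: int) -> Iterator[Hit]:
    """Flag byte sequences where `target` appears as a 16-bit address operand.

    Heuristic only: there is no disassembly, so data bytes or misaligned
    matches can produce false positives, and prefixes or two-byte opcodes
    shift the real instruction start.
    """
    lo, hi = target & 0xFF, (target >> 8) & 0xFF
    for i in range(1, len(code) - 1):
        if code[i] != lo or code[i + 1] != hi:
            continue
        prev = code[i - 1]
        if (prev & 0xC7) == 0x06 and i >= 2:
            # ModRM with mod=00, r/m=110 is a direct [disp16] memory operand;
            # assume a one-byte opcode precedes it.
            yield Hit(i - 2, "modrm-direct")
        elif 0xA0 <= prev <= 0xA3:
            # MOV AL/AX <-> [moffs16]
            yield Hit(i - 1, "moffs")
        elif 0xB8 <= prev <= 0xBF:
            # MOV r16, imm16 loads the address itself (pointer setup)
            yield Hit(i - 1, "mov-imm16")


def find_far_calls_via_slot(code: bytes, slot: int) -> Iterator[Hit]:
    """Flag CALL FAR [reg+disp8] (opcode FF /3 with mod=01) whose disp8 == slot."""
    for i in range(len(code) - 2):
        if code[i] == 0xFF and (code[i + 1] & 0xF8) == 0x58 and code[i + 2] == slot:
            yield Hit(i, "callf-slot")


sample = bytes([
    0xA1, 0x88, 0x45,                    # mov ax, [0x4588]
    0xC7, 0x06, 0x88, 0x45, 0x00, 0x00,  # mov word [0x4588], 0
    0xBB, 0x88, 0x45,                    # mov bx, 0x4588
    0x26, 0xFF, 0x5F, 0x0C,              # call far es:[bx+0x0c]
])
for hit in [*find_direct_refs(sample, 0x4588), *find_far_calls_via_slot(sample, 0x0C)]:
    print(f"{hit.offset:04x} {hit.kind}")
```

Stores such as the `mov word [0x4588], 0` line are the kind of pattern that would surface install and clear paths. The slot scan cannot tell which object a `[bx+0x0c]` call dispatches through, so its hits only become evidence once the preceding pointer load is checked.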
### Current Focus

1. Finish Priority 0 refinement by promoting more exact segment rows where notes already support a verified foothold.
2. Continue the Priority 1 pass by tracing the remaining caller-side `0x4588` / `0009:b1c3` object-role evidence rather than the already-recovered allocator mechanics.
### Next Resume Point

1. Update the ledger for any additional exact segment anchors found in the reset/cache or render-path notes.
2. Recover or classify the still-unbounded callback callers around `000a:b9e5` / `000a:ba66` and `000d:9d5e` / `000d:a3b7`; they now look like the best remaining cheap wins on the `0x4588` path.
3. Revisit the nearby install/lifecycle gap around `000a:493e` / `000a:4a56` only if those caller windows need a stronger object-owner model.
4. Continue work on `ASYLUM.24` only after the `0x4588` path has no further cheap wins.
### Headline Estimate

- Overall useful decompilation progress: about 25%
- Reasonable uncertainty band: about 20% to 30%

This is the best single-number estimate for the full game right now.
### Supporting Metrics

| Metric | Estimate | Meaning |
|---|---:|---|
| Top 100 far-call target coverage | about 80% | Roughly 80 of the 100 most-called far-call targets have been named or materially classified |
| Whole-program behavioral coverage | about 25% | Verified subsystem and function understanding across the executable |
| Segment spread with meaningful analysis | about 10% to 15% | Share of the 145 NE segments with more than a trivial foothold or isolated note |
| Tooling maturity for continued work | about 75% | Core repair, lookup, and fallback automation needed for continued progress |
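
The top-100 figure can be recomputed mechanically from the patched CALLF sites. The following is a minimal sketch. It assumes call sites are available as `(caller, target)` address pairs and that the set of named or classified targets is kept separately. The input format and `top_n_coverage` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_n_coverage(
    call_sites: Iterable[tuple[str, str]], classified: set[str], n: int = 100
) -> tuple[int, int]:
    """Return (classified, considered) for the n most-called far-call targets.

    Ties at the cutoff keep first-seen order, which Counter.most_common
    preserves for equal counts.
    """
    counts = Counter(target for _caller, target in call_sites)
    top = [target for target, _count in counts.most_common(n)]
    return sum(target in classified for target in top), len(top)


# Made-up addresses, for illustration only.
sites = [
    ("1000:0010", "2000:0100"),
    ("1000:0020", "2000:0100"),
    ("1100:0005", "2000:0200"),
    ("1200:0030", "2000:0300"),
]
done, total = top_n_coverage(sites, classified={"2000:0100", "2000:0200"}, n=2)
print(f"{done}/{total} = {100 * done / total:.0f}%")
```

Ranking by call-site count rather than by distinct callers is a choice of this sketch. It matches "most-called", but a helper called many times from one function would rank high.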
### Why These Numbers Differ

- The hot-target metric is much higher because the project has already focused on the most shared and most-called helpers.
- The whole-program metric is lower because most of the 145 NE segments have not yet had systematic coverage passes.
- The segment-spread metric is lower still because only a subset of segments has coherent subsystem-level treatment.
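
The gap between the hot-target and segment-spread figures falls out directly once statuses live in the ledger. The following is a minimal sketch. It assumes statuses have been loaded into a dict keyed by segment number. It also assumes that only `Partial` and `Deep` count as "meaningful". That threshold is a judgment call, not something the ledger defines, so it is a parameter. The sample distribution is illustrative, not the real ledger contents.

```python
from collections import Counter
from collections.abc import Mapping

# Assumption: which ledger statuses count toward "meaningful analysis".
MEANINGFUL = frozenset({"Partial", "Deep"})


def segment_spread(statuses: Mapping[int, str], meaningful=MEANINGFUL) -> float:
    """Fraction of segments whose coverage status is in `meaningful`."""
    if not statuses:
        return 0.0
    counts = Counter(statuses.values())
    return sum(counts[status] for status in meaningful) / len(statuses)


# Illustrative distribution over 145 segments (made up, not the ledger).
ledger = {seg: "None" for seg in range(1, 146)}
ledger.update({seg: "Foothold" for seg in range(1, 31)})
ledger.update({seg: "Partial" for seg in range(1, 17)})
ledger.update({seg: "Deep" for seg in range(1, 6)})
print(Counter(ledger.values()), f"{segment_spread(ledger):.1%}")
```

Because every segment counts equally, a few deeply analyzed hot segments move this number far less than they move the top-100 metric.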
## What Is Already In Place

### Workflow and Tooling

- Raw full-EXE Ghidra target is established and in active use.
- Verified raw-import mapping exists for seg001 and seg021.
- NE relocation parsing has been implemented.
- Internal literal far-call fixups have been applied to the raw import.
- PyGhidra fallback tooling exists for create/delete function work and batch scripted edits.
- Conservative boundary-repair workflow already exists and has been used successfully.
- Notes are detailed enough to support a formal executable-wide tracker.
### Objective Milestones Already Reached

- 145 NE segments identified from the internal NE header.
- 8851 internal literal CALLF sites patched to real targets in the raw import.
- 2841 non-CALLF far-pointer relocations identified and deferred.
- 119 import callsites annotated.
- Top 100 far-call target list processed through five tiers, with about 80 named or materially classified.
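
The CALLF versus non-CALLF split above can be reproduced from NE relocation data. The following is a minimal sketch, not the project's parser. It assumes the standard NE per-segment relocation layout: a WORD record count followed by 8-byte records. Non-additive fixups are chained through the source locations and terminated by `0xFFFF`. An internal far-pointer fixup is treated as a CALLF site when the byte before the pointer is opcode `0x9A`. All function names are hypothetical.

```python
import struct
from typing import NamedTuple

SRC_FAR_ADDR = 3      # source type: 32-bit segment:offset pointer
TARGET_MASK = 0x03    # low flag bits select the target type
TARGET_INTERNAL = 0   # internal reference
FLAG_ADDITIVE = 0x04  # additive fixups are not chained
CALLF_OPCODE = 0x9A   # CALL ptr16:16


class Reloc(NamedTuple):
    src_type: int
    flags: int
    src_offset: int
    target_a: int  # internal ref: low byte is the segment number (0xFF = movable)
    target_b: int  # internal ref: offset (entry-table ordinal if movable)


def parse_relocs(block: bytes) -> list[Reloc]:
    (count,) = struct.unpack_from("<H", block, 0)
    return [Reloc(*struct.unpack_from("<BBHHH", block, 2 + 8 * i)) for i in range(count)]


def source_sites(seg_data: bytes, r: Reloc) -> list[int]:
    """Offsets patched by one record, following the chain for non-additive fixups."""
    if r.flags & FLAG_ADDITIVE:
        return [r.src_offset]
    sites, seen, off = [], set(), r.src_offset
    while off != 0xFFFF and off not in seen:  # `seen` guards against corrupt loops
        seen.add(off)
        sites.append(off)
        (off,) = struct.unpack_from("<H", seg_data, off)
    return sites


def split_internal_far_refs(seg_data: bytes, relocs: list[Reloc]):
    """Return (callf_sites, other_sites) as lists of (site, (segment, offset))."""
    callf, other = [], []
    for r in relocs:
        if r.src_type != SRC_FAR_ADDR or (r.flags & TARGET_MASK) != TARGET_INTERNAL:
            continue
        target = (r.target_a & 0xFF, r.target_b)
        for site in source_sites(seg_data, r):
            is_callf = site > 0 and seg_data[site - 1] == CALLF_OPCODE
            (callf if is_callf else other).append((site, target))
    return callf, other


seg = bytes([
    0x9A, 0x06, 0x00, 0x00, 0x00,  # callf; pointer offset word chains to site 6
    0x9A, 0xFF, 0xFF, 0x00, 0x00,  # callf; 0xFFFF ends the chain
    0xFF, 0xFF, 0x00, 0x00,        # far pointer in data, not preceded by 0x9A
])
block = struct.pack("<H", 3) + b"".join(struct.pack("<BBHHH", *rec) for rec in [
    (3, 0, 1, 2, 0x1234),   # internal far pointer chain starting at 1
    (3, 0, 10, 3, 0x0010),  # internal far pointer at 10
    (3, 1, 0x20, 1, 24),    # import by ordinal: ignored by this split
])
callf, other = split_internal_far_refs(seg, parse_relocs(block))
print(callf, other)
```

The opcode test is a heuristic. A `0x9A` data byte just before a far pointer would be misclassified, so counts from a sketch like this are a cross-check rather than ground truth.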
## Strongly Advanced Areas

### Core Gameplay and Entity Work

- seg001 gameplay, cursor, entity lifecycle, projectile, combat, and AI footholds are strong.
- A verified seg001 raw-port path is working and already used for multiple projectile helpers.
- Entity table, class-table, and several global gameplay fields are partially mapped.

### Timer, Event, and State Systems

- seg021 timer and event-dispatch work has meaningful coverage.
- 000c state-dispatch, cursor-nav, UI-listbox, palette-fade, and mini-VM clusters have footholds.

### Rendering and Camera

- 0007 rendering, draw-list, tile-visibility, and camera work has strong structural coverage.
- `world_to_screen_coords` and adjacent geometric helpers are understood well enough to support further caller analysis.

### Dispatch and Pair-Sync Helpers

- 0008 dispatch-entry helper families have multiple verified rename batches.
- Pair-sync and target-state helper clusters are no longer isolated unknowns.

### Cache, Tracked Handles, and Bucket Logic

- 000a cache manager layer is structurally mapped.
- 000a tracked-handle table is structurally mapped.
- 000d tracked bucket / proximity / visibility bucket logic has several meaningful behavioral names.
- The client/cache distinction is much clearer than before.

### Parser and Animation Framework

- 000e parser cluster has a stable set of verified names.
- 000e animation framework has a real foothold: chunk lookup, audio load, tick, frame advance, and constructor variants are partly mapped.

### Local Repair Successes

- seg043 overlap repair succeeded and recovered multiple valid function objects.
- seg091 boundary recovery succeeded and exposed RNG helpers plus local init/context helpers.
- Recent seg004 reset-path recovery and cache-reset follow-up added a new high-value analysis cluster.
## What Still Blocks Broader Coverage

### High-Value Classification Gaps

- The object rooted at `0x4588` is still not classified well enough to safely rename `0009:b1c3`.
- `ASYLUM.24` is only known as an import site, not yet a confidently identified routine.
- Some structural names in the cache/backend/finalize cluster are waiting on object-role confirmation.

### Boundary and Decompiler Gaps

- Some high-caller targets still require conservative boundary repair or follow-up validation.
- Certain functions still decompile poorly because of overlaps, thunk-heavy paths, or unresolved downstream targets.
- `000e:ffb0` remains a notable animation/video-side blocker because of overlapping instructions.

### Coverage Management Gap

- A first-pass normalized segment-by-segment coverage ledger now exists for all 145 NE segments.
- The remaining gap is refinement rather than absence: most segments still need manual promotion from `None` to `Foothold` / `Partial` / `Deep` as coverage expands.

### Deferred Data Work

- The 2841 deferred non-CALLF far-pointer relocations are still unprocessed and will matter for deeper object/table recovery.
- They are no longer the main blocker, but they remain a real second-pass problem.
## Current Best Assessment Of Remaining Work

The project has resolved most of the architectural uncertainty that would otherwise slow further work.
The remaining effort is mainly a scaling problem:

- expand coverage across many more segments,
- remove the last high-value boundary blockers,
- convert structural names into subsystem names when evidence is strong enough,
- and normalize progress tracking so the whole program can be managed deliberately.

In practical terms, this looks like a true mid-project state rather than an early exploratory state or a late polish state.
## Implementation Priorities

### Priority 0: Coverage Ledger

First pass completed: an executable-wide coverage ledger now exists for all 145 NE segments in `crusader_segment_coverage_ledger.csv`.

Next work under Priority 0:

1. Promote additional segments from `None` where notes already support a verified foothold.
2. Normalize raw-address subsystem islands (notably the `000e:` parser/animation cluster) back onto exact NE segment rows.
3. Keep the ledger updated together with `crusader_decompilation_notes.md` after each verified batch.

Minimum columns:

| Column | Meaning |
|---|---|
| Segment | NE segment number |
| Type | Code or data |
| File offset | From the NE segment table |
| Length | Segment length |
| Coverage status | `None`, `Foothold`, `Partial`, or `Deep` |
| Known subsystem | Best current classification |
| Key named functions | Short summary only |
| Blockers | Boundary, import, thunk, overlap, unknown object, etc. |
| Notes source | Notes section or evidence anchor |

This ledger is the most important tracking artifact because it makes the percentage estimates maintainable.
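
A small consistency check can run after each ledger update to keep it trustworthy. The following is a minimal sketch. It assumes the CSV headers match the column names in the table above exactly, which the real file may not do. `check_ledger` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumption: CSV headers match the "Minimum columns" table.
REQUIRED = ["Segment", "Type", "File offset", "Length", "Coverage status",
            "Known subsystem", "Key named functions", "Blockers", "Notes source"]
STATUSES = {"None", "Foothold", "Partial", "Deep"}


def check_ledger(text: str, expected_segments: int = 145) -> tuple[Counter, list[str]]:
    """Return (status histogram, list of problems) for ledger CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        return Counter(), ["missing columns: " + ", ".join(missing)]
    counts, problems, seen = Counter(), [], set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Normalize case so "partial" and "Partial" are treated alike.
        status = row["Coverage status"].strip().capitalize()
        if status not in STATUSES:
            problems.append(f"line {line_no}: unknown status {status!r}")
        counts[status] += 1
        segment = row["Segment"].strip()
        if segment in seen:
            problems.append(f"line {line_no}: duplicate segment {segment}")
        seen.add(segment)
    if len(seen) != expected_segments:
        problems.append(f"expected {expected_segments} segments, found {len(seen)}")
    return counts, problems


sample = (
    "Segment,Type,File offset,Length,Coverage status,Known subsystem,"
    "Key named functions,Blockers,Notes source\n"
    "1,Code,0x1000,0x2000,Deep,gameplay,example_fn,,notes\n"
    "2,Code,0x3000,0x0800,partial,timers,,overlap,notes\n"
    "3,Data,0x3800,0x0100,Maybe,,,,\n"
)
counts, problems = check_ledger(sample, expected_segments=3)
print(counts, problems)
```

Against the real file, pass the text of `crusader_segment_coverage_ledger.csv` opened with `newline=""`, as the `csv` module documentation recommends.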
### Priority 1: Finish The New Cache/Backend Cluster

Work the newest verified reset-path cluster to closure:

1. Trace more callers of `0009:b06b`.
2. Trace more callers of `FUN_0009_a961`.
3. Classify the object rooted at `0x4588`.
4. Revisit `0009:b1c3` once the object role is clearer.

This is currently the best next analysis target because it closes a live cluster that already has fresh verified work around it.
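
Tracing callers of `0009:b06b` or `FUN_0009_a961` amounts to a bounded breadth-first walk up the call graph. The following is a minimal sketch. It assumes the edges have already been exported from the Ghidra project as `(caller_function, callee_function)` pairs. The export step is not shown, and the names in the example are placeholders rather than real functions.

```python
from collections import defaultdict, deque
from collections.abc import Iterable


def callers_within(edges: Iterable[tuple[str, str]], root: str, max_depth: int = 2) -> dict[str, int]:
    """Walk up the call graph from `root`; map each caller to its distance in calls."""
    callers_of = defaultdict(set)
    for caller, callee in edges:
        callers_of[callee].add(caller)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for caller in sorted(callers_of[node]):  # sorted for stable output
            if caller not in depth:
                depth[caller] = depth[node] + 1
                queue.append(caller)
    del depth[root]
    return depth


edges = [("A", "root"), ("B", "A"), ("C", "B"), ("D", "root")]
print(callers_within(edges, "root"))
```

Keeping the depth small matches the small-batch style of this plan: each level can be verified before the next one is pulled in.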
### Priority 2: Resolve `ASYLUM.24`

Identify what imported routine `ASYLUM.24` actually is.

Goals:

- tighten the description of `runtime_cache_reset_sequence`,
- determine whether the import belongs to cache/resource/backend/media initialization,
- and improve naming confidence around the reset path.
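
Every reference to the import needs to be found before the routine can be identified. The following is a minimal sketch. It assumes the standard NE layout, in which import-by-ordinal relocation records carry a 1-based module-reference index and an ordinal. In that layout, the module-reference table holds WORD offsets into the length-prefixed imported-names table. The sketch only locates reference sites. For non-additive records, each hit is just the head of a fixup chain. Naming the routine still requires the module that exports ordinal 24. The function names and sample tables are hypothetical.

```python
import struct

TARGET_MASK = 0x03
TARGET_IMPORT_ORDINAL = 1  # flags & 3 == 1: import by ordinal


def module_names(modref_table: bytes, imported_names: bytes, count: int) -> list[str]:
    """Resolve module-reference entries (WORD offsets) to length-prefixed names."""
    names = []
    for i in range(count):
        (off,) = struct.unpack_from("<H", modref_table, 2 * i)
        length = imported_names[off]
        names.append(imported_names[off + 1 : off + 1 + length].decode("ascii"))
    return names


def find_import_ordinal(reloc_block: bytes, names: list[str], module: str, ordinal: int) -> list[int]:
    """Source offsets of import-by-ordinal records for module.ordinal."""
    (count,) = struct.unpack_from("<H", reloc_block, 0)
    hits = []
    for i in range(count):
        _src, flags, src_off, mod_index, ord_num = struct.unpack_from("<BBHHH", reloc_block, 2 + 8 * i)
        if (flags & TARGET_MASK) == TARGET_IMPORT_ORDINAL:
            # Module indices in relocation records are 1-based.
            if names[mod_index - 1] == module and ord_num == ordinal:
                hits.append(src_off)
    return hits


# Sample tables (made up): two modules, three relocation records.
imported = b"\x00\x06ASYLUM\x05OTHER"
modrefs = struct.pack("<HH", 1, 8)
names = module_names(modrefs, imported, 2)
block = struct.pack("<H", 3) + b"".join(
    struct.pack("<BBHHH", *rec)
    for rec in [(3, 1, 0x0040, 1, 24), (3, 1, 0x0080, 2, 24), (3, 0, 0x00C0, 2, 0x1234)]
)
print(names, [hex(off) for off in find_import_ordinal(block, names, "ASYLUM", 24)])
```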
### Priority 3: Continue Small-Batch Boundary Repair

Use the existing conservative repair approach for remaining high-value blockers.

Good candidates include:

- unresolved high-caller function objects,
- ranges that still steal bytes from adjacent real bodies,
- and overlaps that block decompilation of already-active subsystems.
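
Byte-stealing ranges and overlaps can be listed mechanically before each small repair batch. The following is a minimal sweep-line sketch. It assumes function bodies are exported as half-open `[start, end)` offsets within one segment. The `FuncRange` type and the sample names are hypothetical. Hits are review candidates, not automatic repair targets.

```python
from typing import NamedTuple


class FuncRange(NamedTuple):
    name: str
    start: int  # inclusive offset
    end: int    # exclusive offset


def find_overlaps(funcs: list[FuncRange]) -> list[tuple[str, str, int, int]]:
    """Return (earlier, later, overlap_start, overlap_end) for each overlapping pair."""
    overlaps = []
    active: list[FuncRange] = []  # earlier ranges that may still cover later starts
    for f in sorted(funcs, key=lambda r: (r.start, r.end)):
        # Drop ranges that end at or before this one begins.
        active = [a for a in active if a.end > f.start]
        overlaps.extend((a.name, f.name, f.start, min(a.end, f.end)) for a in active)
        active.append(f)
    return overlaps


funcs = [
    FuncRange("A", 0x100, 0x180),
    FuncRange("B", 0x170, 0x200),  # starts inside A: possible stolen bytes
    FuncRange("C", 0x200, 0x240),
    FuncRange("D", 0x210, 0x220),  # nested entirely inside C
]
for earlier, later, lo, hi in find_overlaps(funcs):
    print(f"{earlier} / {later}: 0x{lo:x}-0x{hi:x}")
```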
### Priority 4: Finish Partial Subsystem Islands Before Expanding Broadly

Recommended order:

1. seg043 plus connected seg004 reset and dispatch paths
2. 000e animation/video overlap at `000e:ffb0`
3. 000c UI-listbox, mini-VM, and cursor-nav families
4. Remaining structural 0007 and 0008 helper cohorts

The goal is to reduce the number of half-understood islands before starting broad segment sweeps.
### Priority 5: Broaden Coverage Across The Remaining Executable

Once the first-pass ledger has been refined and the current hot cluster is closed, broaden analysis segment by segment.

Preferred method:

1. Group segments by adjacency and call relationships.
2. Identify entry points and hot callees first.
3. Classify globals and tables next.
4. Promote helper names only when supported by strong evidence.
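
Steps 1 and 2 can be bootstrapped from cross-segment call sites. The following is a minimal sketch. It assumes one `(caller_segment, callee_segment)` pair per call site. It groups segments into connected components of the undirected call graph and orders each group by incoming calls so hot callees come first. Adjacency by segment number is not modeled here. Extra edges between neighboring segments could be added if that proves useful. `segment_groups` and the sample segment numbers are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable


def segment_groups(cross_calls: Iterable[tuple[int, int]]) -> list[list[int]]:
    """Group segments linked by calls; order each group by incoming call count."""
    neighbors = defaultdict(set)
    incoming = Counter()
    for src, dst in cross_calls:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
        if src != dst:
            incoming[dst] += 1
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first traversal of one component
            seg = stack.pop()
            component.append(seg)
            for nxt in neighbors[seg]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component, key=lambda s: (-incoming[s], s)))
    return groups


# Made-up segment numbers, one tuple per call site.
calls = [(10, 11), (12, 11), (10, 12), (50, 51), (50, 51)]
print(segment_groups(calls))
```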
## Recommended Tracking Model

Use these status values for segment coverage:

| Status | Meaning |
|---|---|
| None | No meaningful verified analysis yet |
| Foothold | One or two verified entry points or helper names, but no subsystem picture |
| Partial | Several verified names plus some globals/tables or object fields |
| Deep | Coherent subsystem-level understanding with multiple verified related functions |

Use these status values for subsystem maturity:

| Status | Meaning |
|---|---|
| Unknown | Not enough evidence to classify |
| Structural | Behavior is partly mapped but still generic |
| Behavioral | Confident subsystem role is known |
| Stable | Multiple connected functions and data objects support the classification |
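
Promotion decisions can be kept consistent with a small helper that maps verified evidence onto the segment coverage statuses. The following is a heuristic sketch. The `SegmentEvidence` fields and the threshold of three names for "several" are assumptions. The helper only ever promotes and leaves demotions to a deliberate manual edit.

```python
from dataclasses import dataclass
from enum import IntEnum

SEVERAL = 3  # assumption: how many verified names count as "several"


class Coverage(IntEnum):
    NONE = 0
    FOOTHOLD = 1
    PARTIAL = 2
    DEEP = 3

    def label(self) -> str:
        return self.name.capitalize()  # "None", "Foothold", "Partial", "Deep"


@dataclass(frozen=True)
class SegmentEvidence:
    verified_names: int       # verified entry points or helper names
    mapped_data: bool         # some globals/tables or object fields mapped
    coherent_subsystem: bool  # subsystem-level picture across related functions


def suggest_status(ev: SegmentEvidence) -> Coverage:
    if ev.coherent_subsystem and ev.verified_names >= SEVERAL:
        return Coverage.DEEP
    if ev.mapped_data and ev.verified_names >= SEVERAL:
        return Coverage.PARTIAL
    if ev.verified_names >= 1:
        return Coverage.FOOTHOLD
    return Coverage.NONE


def promote(current: Coverage, ev: SegmentEvidence) -> Coverage:
    """Never demote: a lower suggestion leaves the current status in place."""
    return max(current, suggest_status(ev))


print(promote(Coverage.NONE, SegmentEvidence(2, False, False)).label())
```

A segment with many verified names but no mapped data stays at `Foothold` under these rules. That errs on the conservative side, in line with the rest of the plan.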
## Suggested Immediate Work Queue

### Queue A: Highest Leverage

1. Expand the first-pass segment coverage ledger beyond the currently seeded segments.
2. Trace `0009:b06b`, `FUN_0009_a961`, and `0009:b1c3`.
3. Identify `ASYLUM.24`.

### Queue B: Repair And Stabilize

1. Review remaining high-caller gap functions.
2. Repair any still-blocking overlaps in small batches.
3. Re-decompile repaired ranges and keep only evidence-backed names.

### Queue C: Broaden Carefully

1. Expand into adjacent segments connected to already-understood clusters.
2. Avoid speculative naming.
3. Update the notes and the coverage ledger together after each verified batch.
## Concrete Progress Interpretation

If a single number is needed, use 25%.

If a fuller dashboard is acceptable, report all three figures:

- about 80% of top-100 hot targets processed
- about 25% overall behavioral decompilation progress
- about 10% to 15% segment spread with meaningful analysis

That combination best reflects the actual state of the project.
## Source Anchors

Primary sources for this file:

- `crusader_segment_coverage_ledger.csv`
- `crusader_decompilation_notes.md`
- `crusader_ne_segments.csv`
- `tier4_output.txt`
- `tier5_output.txt`
- repo memory progress summary
## Next Update Rule

Update this file when one of the following happens:

- the overall estimate changes materially,
- a new subsystem reaches behavioral or stable status,
- a major blocker such as `0x4588`, `0009:b1c3`, or `ASYLUM.24` is resolved,
- or the segment coverage ledger becomes the primary progress source.