---
applyTo: "**"
---

# Crusader Ghidra Workflow

## Safety Guardrails

- Never create a git commit on your own.
- Never run a command that may delete files outside a temporary folder unless you first ask the user with `vscode_askQuestions` and get explicit confirmation.
- If a request could remove or overwrite repository files, pause and confirm before proceeding.

- Active target is the NE Ghidra program `CRUSADER.EXE` unless explicitly stated otherwise.
- Use Ghidra MCP tools for analysis, decompilation, renaming, comments, and xref work.
- Treat the verified `CRUSADER-RAW.EXE` work already captured in `docs/` and notes as a cross-reference evidence base for the live `CRUSADER.EXE` session, not as the default active program.
- Avoid speculative renames. Prefer names that are supported by one of these:
  - verified raw mapping from standalone segment work
  - direct string evidence
  - clear call/field behavior in decompiler or disassembly
  - xref relationships to already-named functions
- When porting names from standalone segment extracts or prior raw full-EXE work into `CRUSADER.EXE`, use only verified base mappings and keep the older raw address evidence with the live NE address where practical.

# Verified Raw Mapping Rules

- `seg001` raw base = `0x6E570`
- `seg021` raw base = `0x87170`
- Porting formula: `raw_full_exe_flat = verified_segment_base + standalone_segment_relative_offset`
- `seg001` and `seg021` both contain a keyboard handler; keep the seg001 name as `seg001_input_keyboard_handler` to avoid collision.
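
The porting formula above can be sketched in a few lines of Python. The helper names `port_offset` and `format_flat` are hypothetical, and the `hhhh:llll` rendering (high and low 16 bits of the flat address) is an inference from how the verified ports listed in this file line up with the formula, not documented Ghidra behavior.

```python
# Verified segment bases taken from the rules above.
SEGMENT_BASES = {
    "seg001": 0x6E570,
    "seg021": 0x87170,
}


def port_offset(segment: str, relative_offset: int) -> int:
    """raw_full_exe_flat = verified_segment_base + standalone_segment_relative_offset"""
    return SEGMENT_BASES[segment] + relative_offset


def format_flat(flat: int) -> str:
    """Render a flat address as hhhh:llll (assumed display convention)."""
    return f"{flat >> 16:04x}:{flat & 0xFFFF:04x}"


if __name__ == "__main__":
    flat = port_offset("seg001", 0x0060)
    print(hex(flat), format_flat(flat))
```

With the seg001 base, offset `0x0060` gives flat `0x6E5D0`, which renders as `0006:e5d0`.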

# Working Method

- Prefer a single decompile call first.
- If the decompiler collapses to thunk-heavy output, use one disassembly lookup to confirm the wrapper or parameter setup.
- **When `decompile_function` output is too large** (>~50KB), the result is written to a temp JSON file that `read_file` returns as empty `{}`. Use `disassemble_function` instead — it returns inline assembly directly and is fully navigable for large functions.
- For 16-bit NE decompiler failures such as `Low-level Error: Symbol $$undef... extends beyond the end of the address space`, do not assume the caller's frame is the only culprit. Inspect direct callees for parser-injected hidden `__return_storage_ptr__` parameters or bad pointer-return storage first, especially after prototype edits or function recreation.
- Cross-reference new `CRUSADER.EXE` findings against the old raw notes before promoting a rename or behavioral claim. If the two differ, keep both addresses and explain the mismatch instead of silently preferring one.
- Add a short decompiler comment when a rename is mapped from verified notes so the provenance stays visible in Ghidra.
- Keep `crusader_segment_coverage_ledger.csv` updated after each verified batch whenever a segment can be promoted or reclassified.
- Do not update `plan-mid.md` or `crusader_decompilation_notes.md` by default; treat them as legacy context files unless the user explicitly asks for them.
- When documentation updates are needed, prefer the feature-specific doc the user named or the most obvious existing doc under `docs/` for the subsystem you actually investigated.
- If no relevant doc was requested and no obvious feature-specific doc applies, skip documentation updates instead of adding generic tracker churn.
- Keep `ghidra_mcp_wishlist.md` updated whenever the workflow hits a missing MCP capability and would otherwise tempt a fallback outside MCP.
- Each wishlist entry should be short and concrete: what MCP lacked, what command/script/tool had to replace it, and what a useful MCP endpoint or behavior would look like.
- Record raw-import addresses alongside original segment-relative offsets when porting names.
- **Always use `rename_function_by_address`.** `rename_function` (by name) is broken: it fails with "must have required property 'old_name'". Pass the address in `"function_address": "000c:XXXX"` format.
- For substantive RE batches, end with at least 6 concrete future steps unless the task is fully closed and there are genuinely fewer defensible next actions.
- When a batch analyzes currently unnamed Ghidra functions and the behavior is clear enough, rename them in Ghidra instead of leaving them as positional `FUN_xxxx_xxxx` placeholders.
- **Terminal execution rule:** Always write multi-line Python scripts to a temporary `.py` file and execute that file with the Python interpreter instead of pasting multi-line Python directly into an interactive terminal. This avoids paste/encoding/line-ending issues and ensures the script runs in the expected environment.
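
The terminal execution rule can be illustrated with a minimal sketch: write the script text to a temporary `.py` file, then run that file with the interpreter. The helper name `run_script_from_temp_file` and the file name `batch_script.py` are hypothetical; only standard-library calls are used.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script_from_temp_file(source: str) -> subprocess.CompletedProcess:
    """Write `source` to a temporary .py file and run it with this interpreter."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        script_path = Path(tmp_dir) / "batch_script.py"
        # Fixed encoding and line endings avoid the paste issues the rule warns about.
        with open(script_path, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(source)
        return subprocess.run(
            [sys.executable, str(script_path)],
            capture_output=True,
            text=True,
            check=False,
        )


if __name__ == "__main__":
    script = "total = 0\nfor i in range(4):\n    total += i\nprint(total)\n"
    result = run_script_from_temp_file(script)
    print(result.returncode, result.stdout.strip())
```

The temporary directory is removed when the `with` block exits, so nothing outside a temp folder is created or deleted.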

# Executable Write Safety

- Normal Ghidra database work on `CRUSADER.EXE` remains in scope: renames, comments, prototypes, local-variable/type cleanup, function creation/deletion, and boundary repair are allowed unless the user says otherwise.
- Treat only actual program-byte changes as destructive actions: byte patching, write-back flows that alter loaded memory bytes, or any operation that would make the executable differ from the original program bytes.
- Never run destructive byte-write operations against the main reference executable in the project.
- Only use byte-patching or other byte-diverging executable write flows when the target program is an explicitly writable patch target, normally a program in the `/Writable` folder.
- Treat `CRUSADER.EXE`, `CRUSADER-RAW.EXE`, and other main reference executables as read-only with respect to program bytes unless the user explicitly says otherwise.
- Before running write endpoints such as `patch_bytes_and_reanalyze` or any PyGhidra byte-write script, verify that the selected program is the intended writable copy, not the reference executable.
- If the target program is not clearly a writable patch copy in `/Writable`, stop and ask the user before performing the byte write.
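
The pre-write check described above could look roughly like this. It is a sketch only: the helper names are hypothetical, `CRUSADER_PATCH.EXE` is an invented example name, and the input is assumed to be a Ghidra project pathname string. How that string is obtained from the live session is outside the scope of the example.

```python
from pathlib import PurePosixPath

# Folder named in the rules above; everything else here is illustrative.
WRITABLE_FOLDER = PurePosixPath("/Writable")


def is_writable_patch_target(program_path: str) -> bool:
    """Return True only if the program sits somewhere under /Writable."""
    return WRITABLE_FOLDER in PurePosixPath(program_path).parents


def require_writable_patch_target(program_path: str) -> None:
    """Raise before any byte write if the target is not a writable copy."""
    if not is_writable_patch_target(program_path):
        raise PermissionError(
            f"{program_path} is not under {WRITABLE_FOLDER}; "
            "stop and ask the user before writing bytes"
        )


if __name__ == "__main__":
    for path in ("/Writable/CRUSADER_PATCH.EXE", "/CRUSADER.EXE"):
        print(path, is_writable_patch_target(path))
```

A failed check should lead to a confirmation prompt, not a silent retry against another program.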

# Python-Backed Ghidra Through MCP Only

- Never use the offline/local PyGhidra CLI toolkit from this workspace.
- Do not invoke `tools.pyghidra_crusader`, the local `.venv-pyghidra311` entrypoint, or any project-open workflow that competes with the live GUI lock.
- Treat Python-backed Ghidra capabilities as MCP-only: use live `run_readonly_script(...)`, live write-capable MCP script endpoints, and other MCP operations exposed by the running Ghidra session.
- If MCP lacks a needed Python-backed operation, record that gap in `ghidra_mcp_wishlist.md` instead of falling back to the offline/local toolkit.
- If the workflow needs the user to change Ghidra state for MCP access, use the ask-questions tool with a yes/no confirmation prompt instead of plain text. Ask the user to open the correct Ghidra program or make the correct tab active before MCP work when needed.

# Current Verified Raw-Import Ports

These remain valid cross-reference anchors for `CRUSADER.EXE` work. Keep the old raw-import addresses and original segment-relative offsets in notes/comments when using them to support live NE renames.

- `0006:e5d0` = `cursor_update_hover` from seg001 `0x0060`
- `0008:7377` = `entity_count_by_type_a` from seg021 `0x0207`
- `0007:28ce` = `shot_entity_alloc` from seg001 `0x435e`
- `0007:2a19` = `shot_entity_free` from seg001 `0x44a9`
- `0007:2bc9` = `projectile_init_vector` from seg001 `0x4659`
- `0007:3001` = `entity_fire_weapon` from seg001 `0x4a91`
- `0007:3088` = `fire_weapon_from_cursor` from seg001 `0x4b18`
- `0007:30e8` = `projectile_check_hit` from seg001 `0x4b78`
- `0007:319e` = `projectile_step_update` from seg001 `0x4c2e`
- `0007:3298` = `projectile_trace_ray` from seg001 `0x4d28`
- `0007:371d` = `projectile_update_tick` from seg001 `0x51ad`
- `0007:4009` = `projectile_apply_hit` from seg001 `0x5a99`
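
The anchors above can be sanity-checked by running the porting formula in reverse: subtracting each segment-relative offset from its raw-import address should recover exactly one base per segment. The sketch below assumes the `hhhh:llll` addresses are the high and low 16 bits of the flat raw address; the helper names are hypothetical.

```python
# (raw-import address, segment, segment-relative offset), copied from the list above.
VERIFIED_PORTS = [
    ("0006:e5d0", "seg001", 0x0060),
    ("0008:7377", "seg021", 0x0207),
    ("0007:28ce", "seg001", 0x435E),
    ("0007:2a19", "seg001", 0x44A9),
    ("0007:2bc9", "seg001", 0x4659),
    ("0007:3001", "seg001", 0x4A91),
    ("0007:3088", "seg001", 0x4B18),
    ("0007:30e8", "seg001", 0x4B78),
    ("0007:319e", "seg001", 0x4C2E),
    ("0007:3298", "seg001", 0x4D28),
    ("0007:371d", "seg001", 0x51AD),
    ("0007:4009", "seg001", 0x5A99),
]


def flat_from_display(address: str) -> int:
    """Convert 'hhhh:llll' into a flat address (assumed display convention)."""
    high, low = address.split(":")
    return (int(high, 16) << 16) | int(low, 16)


def implied_bases(ports) -> dict:
    """Group the base implied by each port (flat minus offset) per segment."""
    bases = {}
    for address, segment, offset in ports:
        bases.setdefault(segment, set()).add(flat_from_display(address) - offset)
    return bases


if __name__ == "__main__":
    for segment, bases in sorted(implied_bases(VERIFIED_PORTS).items()):
        print(segment, sorted(hex(base) for base in bases))
```

A segment that reports more than one implied base would point to a transcription error in either the address or the offset.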

# Named 000e: Functions (direct analysis, not segment-ported)

## Parser Cluster (`000e:34xx–38xx`)
- `000e:345e` = `record_table_init`
- `000e:34cc` = `record_table_destroy`
- `000e:35c6` = `record_table_release_buffer`
- `000e:35ef` = `record_table_next_slot`
- `000e:3639` = `record_table_parse_buffer`
- `000e:3798` = `record_parser_read_line`
- `000e:38f8` = `record_parser_find_marker`

## RIFF/Animation Cluster (`000e:03xx–2xxx`)
- `000e:2a28` = `riff_find_chunk_by_type` (RIFF LIST/RIFF walker; FourCC match at chunk+8)
- `000e:2104` = `animation_start` (finds "movi" chunk, inits timing ring buffer, kicks advance)
- `000e:12f4` = `animation_advance_frame` (fixed-point 0x1000 timer stepper, ring buffer update)
- `000e:103f` = `animation_tick` (guard wrapper — checks +0xd4 != -1, calls advance_frame)
- `000e:06f7` = `anim_load_audio_frame` (checks "01wb" chunk tag 0x62773130, copies audio into ring buffer)
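
Two details in the notes above follow from the standard RIFF layout and can be checked in Python: the tag constant `0x62773130` is the little-endian reading of the bytes `01wb`, and a RIFF or LIST chunk stores its type FourCC at offset +8, after the 4-byte ID and 4-byte size. The walker below is a generic sketch with hypothetical helper names. It is not a transcription of `riff_find_chunk_by_type`.

```python
import struct


def fourcc_to_u32(tag: bytes) -> int:
    """Read a 4-byte FourCC as a little-endian 32-bit integer."""
    if len(tag) != 4:
        raise ValueError("FourCC must be exactly 4 bytes")
    return struct.unpack("<I", tag)[0]


def find_list_chunk(data: bytes, list_type: bytes, start: int = 0) -> int:
    """Return the offset of the first RIFF/LIST sibling whose type at +8 matches, or -1."""
    offset = start
    while offset + 12 <= len(data):
        chunk_id = data[offset:offset + 4]
        (size,) = struct.unpack_from("<I", data, offset + 4)
        if chunk_id in (b"RIFF", b"LIST") and data[offset + 8:offset + 12] == list_type:
            return offset
        # Step over the 8-byte header plus payload; payloads are padded to even length.
        offset += 8 + size + (size & 1)
    return -1


if __name__ == "__main__":
    print(hex(fourcc_to_u32(b"01wb")))
```

To search inside a top-level RIFF chunk, start at offset 12 so the walk begins at its first child chunk.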

## Constructor/Assert Helpers (`000e:22xx–29xx`)
- `000e:223d` = `assert_alive_sentinel` (expects +0xd4 == -1; traps on mismatch)
- `000e:2777` = `animation_ctor_variant_a` (alloc + init flags + chained init/assert/finalize)
- `000e:2860` = `animation_ctor_variant_b` (variant A with extra +0x109 init)
- `000e:2969` = `animation_ctor_variant_c` (default static flag profile +0x4c=0xd)

# Documentation Structure

Detailed RE notes live in the `docs/` folder. Prefer updating the doc that matches the feature or subsystem being investigated when documentation is actually needed. `crusader_decompilation_notes.md` and `plan-mid.md` are legacy context files, not default maintenance targets. Unless a doc says otherwise, read raw-focused docs as evidence sources to be cross-checked against the live `CRUSADER.EXE` session.

| File | Topic |
|------|-------|
| `docs/overview.md` | Binary overview, address layout, segment map, next steps |
| `docs/phar-lap-extender.md` | DOS extender functions and string references |
| `docs/ne-segment1.md` | NE Segment 1: entity system, cheat system, full game logic analysis |
| `docs/raw-porting-progress.md` | seg091 RNG, 0x4588 callbacks, 0007 gameplay batches, `snap_entity_to_ground` |
| `docs/raw-000e.md` | 000e parser cluster and RIFF/animation subsystem |
| `docs/raw-0007-rendering.md` | Draw list, scroll/camera, coordinate transforms, tile visibility |
| `docs/raw-0008-000c.md` | 0008 dispatch helpers and 000c state machine |
| `docs/raw-000a-000d.md` | Tracked handles, cache manager, seg082 allocator, palette helpers, seg004/005 startup |
| `docs/far-call-targets.md` | Top-104 far-call targets (Tiers 1–5), supporting functions, analysis gaps |