Crusader_Decomp/.github/instructions/ghidra.instructions.md
Marco 328a8ba30f Add Ghidra coverage agents and update documentation for enhanced function analysis
- Introduced `Ghidra Coverage Batch Director` and `Ghidra Coverage Mini` agents for improved parallel analysis and function coverage in `CRUSADER.EXE`.
- Updated `ghidra.instructions.md` to clarify documentation practices and legacy file handling.
- Added recent verified function coverage updates to `crusader_decompilation_notes.md` and `plan-mid.md` for better tracking of analysis progress.
- Included new binary files for enhanced data handling in the project.
2026-04-15 17:16:53 +02:00

---
applyTo: "**"
---
# Crusader Ghidra Workflow
## Safety Guardrails
- Never create a git commit on your own.
- Never run a command that may delete files outside a temporary folder unless you first ask the user with `vscode_askQuestions` and get explicit confirmation.
- If a request could remove or overwrite repository files, pause and confirm before proceeding.
- Active target is the NE Ghidra program `CRUSADER.EXE` unless explicitly stated otherwise.
- Use Ghidra MCP tools for analysis, decompilation, renaming, comments, and xref work.
- Treat the verified `CRUSADER-RAW.EXE` work already captured in `docs/` and notes as a cross-reference evidence base for the live `CRUSADER.EXE` session, not as the default active program.
- Avoid speculative renames. Prefer names that are supported by one of these:
  - verified raw mapping from standalone segment work
  - direct string evidence
  - clear call/field behavior in decompiler or disassembly
  - xref relationships to already-named functions
- When porting names from standalone segment extracts or prior raw full-EXE work into `CRUSADER.EXE`, use only verified base mappings and keep the older raw address evidence with the live NE address where practical.
# Verified Raw Mapping Rules
- `seg001` raw base = `0x6E570`
- `seg021` raw base = `0x87170`
- Porting formula: `raw_full_exe_flat = verified_segment_base + standalone_segment_relative_offset`
- `seg001` and `seg021` both contain a keyboard handler; keep the seg001 name as `seg001_input_keyboard_handler` to avoid collision.
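
The porting formula above can be sketched as a small helper. This is a minimal illustration, not project tooling: the function names are hypothetical, and the `hhhh:llll` display form (upper 16 bits, colon, lower 16 bits of the flat address) is an assumption that happens to match the verified ports listed in this file.

```python
# Verified raw bases copied from the rules above.
VERIFIED_SEGMENT_BASES = {
    "seg001": 0x6E570,
    "seg021": 0x87170,
}


def raw_full_exe_flat(segment: str, relative_offset: int) -> int:
    """Apply the porting formula; raises KeyError for an unverified segment."""
    return VERIFIED_SEGMENT_BASES[segment] + relative_offset


def format_raw_address(flat: int) -> str:
    """Assumed display form: upper 16 bits, colon, lower 16 bits."""
    return f"{flat >> 16:04x}:{flat & 0xFFFF:04x}"


# seg001 offset 0x0060 ports to flat 0x6E5D0.
print(format_raw_address(raw_full_exe_flat("seg001", 0x0060)))  # prints 0006:e5d0
```

Refusing segments without a verified base (the `KeyError`) mirrors the rule that only verified base mappings may be used for porting.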
# Working Method
- Prefer a single decompile call first.
- If the decompiler collapses to thunk-heavy output, use one disassembly lookup to confirm the wrapper or parameter setup.
- **When `decompile_function` output is too large** (more than roughly 50 KB), the result is written to a temp JSON file that `read_file` returns as empty `{}`. Use `disassemble_function` instead: it returns the assembly inline and stays fully navigable for large functions.
- For 16-bit NE decompiler failures such as `Low-level Error: Symbol $$undef... extends beyond the end of the address space`, do not assume the caller's frame is the only culprit. Inspect direct callees for parser-injected hidden `__return_storage_ptr__` parameters or bad pointer-return storage first, especially after prototype edits or function recreation.
- Cross-reference new `CRUSADER.EXE` findings against the old raw notes before promoting a rename or behavioral claim. If the two differ, keep both addresses and explain the mismatch instead of silently preferring one.
- Add a short decompiler comment when a rename is mapped from verified notes so the provenance stays visible in Ghidra.
- Keep `crusader_segment_coverage_ledger.csv` updated after each verified batch whenever a segment can be promoted or reclassified.
- Do not update `plan-mid.md` or `crusader_decompilation_notes.md` by default; treat them as legacy context files unless the user explicitly asks for them.
- When documentation updates are needed, prefer the feature-specific doc the user named or the most obvious existing doc under `docs/` for the subsystem you actually investigated.
- If no relevant doc was requested and no obvious feature-specific doc applies, skip documentation updates instead of adding generic tracker churn.
- Keep `ghidra_mcp_wishlist.md` updated whenever the workflow hits a missing MCP capability and would otherwise tempt a fallback outside MCP.
- Each wishlist entry should be short and concrete: what MCP lacked, what command/script/tool had to replace it, and what a useful MCP endpoint or behavior would look like.
- Record raw-import addresses alongside original segment-relative offsets when porting names.
- **Always use `rename_function_by_address`.** `rename_function` (by name) is broken: it fails with "must have required property 'old_name'". Pass the address in `"function_address": "000c:XXXX"` format.
- For substantive RE batches, end with at least 6 concrete future steps unless the task is fully closed and there are genuinely fewer defensible next actions.
- When a batch analyzes currently unnamed Ghidra functions and the behavior is clear enough, rename them in Ghidra instead of leaving them as positional `FUN_xxxx_xxxx` placeholders.
- **Terminal execution rule:** Always write multi-line Python scripts to a temporary `.py` file and execute that file with the Python interpreter instead of pasting multi-line Python directly into an interactive terminal. This avoids paste/encoding/line-ending issues and ensures the script runs in the expected environment.
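
The terminal execution rule can be followed with a helper along these lines. It is a sketch under stated assumptions: `run_python_script` is a hypothetical name, and it assumes the interpreter that should run the script is the one running the helper (`sys.executable`). The script file is created in, and removed from, the system temp folder only.

```python
import os
import subprocess
import sys
import tempfile


def run_python_script(source: str) -> subprocess.CompletedProcess:
    """Write `source` to a temporary .py file, run it, then delete that file."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".py", delete=False, encoding="utf-8", newline="\n"
    ) as handle:
        handle.write(source)
        script_path = handle.name
    try:
        # Execute the file instead of pasting lines into an interactive shell.
        return subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            text=True,
            check=False,
        )
    finally:
        os.remove(script_path)  # only ever touches the temp file created above


result = run_python_script("for i in range(3):\n    print(i)\n")
print(result.returncode, result.stdout.split())
```

Writing with an explicit encoding and `newline="\n"` is what sidesteps the paste, encoding, and line-ending problems the rule mentions.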
# Executable Write Safety
- Normal Ghidra database work on `CRUSADER.EXE` remains in scope: renames, comments, prototypes, local-variable/type cleanup, function creation/deletion, and boundary repair are allowed unless the user says otherwise.
- Treat only actual program-byte changes as destructive actions: byte patching, write-back flows that alter loaded memory bytes, or any operation that would make the executable differ from the original program bytes.
- Never run destructive byte-write operations against the main reference executable in the project.
- Only use byte-patching or other byte-diverging executable write flows when the target program is an explicitly writable patch target, normally a program in the `/Writable` folder.
- Treat `CRUSADER.EXE`, `CRUSADER-RAW.EXE`, and other main reference executables as read-only with respect to program bytes unless the user explicitly says otherwise.
- Before running write endpoints such as `patch_bytes_and_reanalyze` or any PyGhidra byte-write script, verify that the selected program is the intended writable copy, not the reference executable.
- If the target program is not clearly a writable patch copy in `/Writable`, stop and ask the user before performing the byte write.
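
The pre-write check described above might look like the following sketch. Everything here is illustrative: `may_write_bytes` and the example program paths are hypothetical, and it assumes project paths are slash-separated with `/Writable` at the project root. A `False` result means "stop and ask the user", not "silently skip".

```python
from pathlib import PurePosixPath

WRITABLE_FOLDER = PurePosixPath("/Writable")


def may_write_bytes(program_path: str) -> bool:
    """Return True only when the program clearly lives under /Writable."""
    path = PurePosixPath(program_path)
    if ".." in path.parts:
        return False  # never trust a path that climbs back out of a folder
    return WRITABLE_FOLDER in path.parents


# Hypothetical project paths, for illustration only.
for candidate in ("/Writable/CRUSADER_PATCH.EXE", "/CRUSADER.EXE", "/CRUSADER-RAW.EXE"):
    action = "byte write allowed" if may_write_bytes(candidate) else "stop and ask the user"
    print(f"{candidate}: {action}")
```

The check is deliberately conservative: anything that is not unambiguously inside the writable folder falls through to the confirmation path.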
# Python-Backed Ghidra Through MCP Only
- Never use the offline/local PyGhidra CLI toolkit from this workspace.
- Do not invoke `tools.pyghidra_crusader`, the local `.venv-pyghidra311` entrypoint, or any project-open workflow that competes with the live GUI lock.
- Treat Python-backed Ghidra capabilities as MCP-only: use live `run_readonly_script(...)`, live write-capable MCP script endpoints, and other MCP operations exposed by the running Ghidra session.
- If MCP lacks a needed Python-backed operation, record that gap in `ghidra_mcp_wishlist.md` instead of falling back to the offline/local toolkit.
- If the workflow needs the user to change Ghidra state for MCP access, use the ask-questions tool with a yes/no confirmation prompt instead of plain text. Ask the user to open the correct Ghidra program or make the correct tab active before MCP work when needed.
# Current Verified Raw-Import Ports
These remain valid cross-reference anchors for `CRUSADER.EXE` work. Keep the old raw-import addresses and original segment-relative offsets in notes/comments when using them to support live NE renames.
- `0006:e5d0` = `cursor_update_hover` from seg001 `0x0060`
- `0008:7377` = `entity_count_by_type_a` from seg021 `0x0207`
- `0007:28ce` = `shot_entity_alloc` from seg001 `0x435e`
- `0007:2a19` = `shot_entity_free` from seg001 `0x44a9`
- `0007:2bc9` = `projectile_init_vector` from seg001 `0x4659`
- `0007:3001` = `entity_fire_weapon` from seg001 `0x4a91`
- `0007:3088` = `fire_weapon_from_cursor` from seg001 `0x4b18`
- `0007:30e8` = `projectile_check_hit` from seg001 `0x4b78`
- `0007:319e` = `projectile_step_update` from seg001 `0x4c2e`
- `0007:3298` = `projectile_trace_ray` from seg001 `0x4d28`
- `0007:371d` = `projectile_update_tick` from seg001 `0x51ad`
- `0007:4009` = `projectile_apply_hit` from seg001 `0x5a99`
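
Entries in this list can be sanity-checked against the verified bases before they are reused as anchors. The sketch below parses lines in the format used above and recomputes each address. The regular expression and `check_entry` are hypothetical helpers, and reading `hhhh:llll` as the upper and lower 16 bits of the flat raw address is an assumption that is consistent with every entry listed here.

```python
import re

BASES = {"seg001": 0x6E570, "seg021": 0x87170}

ENTRY = re.compile(
    r"`(?P<hi>[0-9a-f]{4}):(?P<lo>[0-9a-f]{4})` = `(?P<name>\w+)` "
    r"from (?P<seg>seg\d{3}) `0x(?P<off>[0-9a-f]+)`"
)


def check_entry(line: str) -> bool:
    """True when the listed address equals verified base + segment offset."""
    match = ENTRY.search(line)
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    listed = (int(match["hi"], 16) << 16) | int(match["lo"], 16)
    expected = BASES[match["seg"]] + int(match["off"], 16)
    return listed == expected


entries = [
    "- `0006:e5d0` = `cursor_update_hover` from seg001 `0x0060`",
    "- `0008:7377` = `entity_count_by_type_a` from seg021 `0x0207`",
    "- `0007:4009` = `projectile_apply_hit` from seg001 `0x5a99`",
]
print(all(check_entry(entry) for entry in entries))  # prints True
```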
# Named 000e: Functions (direct analysis — not segment-ported)
## Parser Cluster (`000e:34xx–38xx`)
- `000e:345e` = `record_table_init`
- `000e:34cc` = `record_table_destroy`
- `000e:35c6` = `record_table_release_buffer`
- `000e:35ef` = `record_table_next_slot`
- `000e:3639` = `record_table_parse_buffer`
- `000e:3798` = `record_parser_read_line`
- `000e:38f8` = `record_parser_find_marker`
## RIFF/Animation Cluster (`000e:03xx–2xxx`)
- `000e:2a28` = `riff_find_chunk_by_type` (RIFF LIST/RIFF walker; FourCC match at chunk+8)
- `000e:2104` = `animation_start` (finds "movi" chunk, inits timing ring buffer, kicks advance)
- `000e:12f4` = `animation_advance_frame` (fixed-point 0x1000 timer stepper, ring buffer update)
- `000e:103f` = `animation_tick` (guard wrapper — checks +0xd4 != -1, calls advance_frame)
- `000e:06f7` = `anim_load_audio_frame` (checks "01wb" chunk tag 0x62773130, copies audio into ring buffer)
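
For orientation, the RIFF details in these notes can be reproduced outside Ghidra. The sketch below is not the game's code: it only illustrates the standard RIFF layout (4-byte chunk id, 4-byte little-endian size, and for `RIFF`/`LIST` chunks a 4-byte type at chunk+8) and shows why the "01wb" tag appears as `0x62773130` when read as a little-endian 32-bit value. The helper names and the `DEMO` form type are hypothetical.

```python
import struct


def fourcc_to_u32(tag: bytes) -> int:
    """Interpret a 4-byte chunk tag as a little-endian 32-bit integer."""
    return struct.unpack("<I", tag)[0]


def find_list_by_type(buf: bytes, wanted: bytes, start: int = 0):
    """Walk sibling chunks from `start`; return the offset of the first
    RIFF/LIST chunk whose type field at chunk+8 equals `wanted`, else None."""
    pos = start
    while pos + 8 <= len(buf):
        chunk_id = buf[pos:pos + 4]
        size = struct.unpack_from("<I", buf, pos + 4)[0]
        if chunk_id in (b"RIFF", b"LIST") and buf[pos + 8:pos + 12] == wanted:
            return pos
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    return None


# Tiny synthetic container: RIFF("DEMO") holding LIST("hdrl") and LIST("movi").
hdrl = b"LIST" + struct.pack("<I", 4) + b"hdrl"
movi = b"LIST" + struct.pack("<I", 4) + b"movi"
body = b"DEMO" + hdrl + movi
riff = b"RIFF" + struct.pack("<I", len(body)) + body

print(hex(fourcc_to_u32(b"01wb")))                # prints 0x62773130
print(find_list_by_type(riff, b"movi", start=12))  # prints 24
```

Children of the outer `RIFF` chunk start at offset 12 (after id, size, and form type), which is why the search for `movi` begins there.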
## Constructor/Assert Helpers (`000e:22xx–29xx`)
- `000e:223d` = `assert_alive_sentinel` (expects +0xd4 == -1; traps on mismatch)
- `000e:2777` = `animation_ctor_variant_a` (alloc + init flags + chained init/assert/finalize)
- `000e:2860` = `animation_ctor_variant_b` (variant A with extra +0x109 init)
- `000e:2969` = `animation_ctor_variant_c` (default static flag profile +0x4c=0xd)
# Documentation Structure
Detailed RE notes live in the `docs/` folder. Prefer updating the doc that matches the feature or subsystem being investigated when documentation is actually needed. `crusader_decompilation_notes.md` and `plan-mid.md` are legacy context files, not default maintenance targets. Unless a doc says otherwise, read raw-focused docs as evidence sources to be cross-checked against the live `CRUSADER.EXE` session.
| File | Topic |
|------|-------|
| `docs/overview.md` | Binary overview, address layout, segment map, next steps |
| `docs/phar-lap-extender.md` | DOS extender functions and string references |
| `docs/ne-segment1.md` | NE Segment 1: entity system, cheat system, full game logic analysis |
| `docs/raw-porting-progress.md` | seg091 RNG, 0x4588 callbacks, 0007 gameplay batches, `snap_entity_to_ground` |
| `docs/raw-000e.md` | 000e parser cluster and RIFF/animation subsystem |
| `docs/raw-0007-rendering.md` | Draw list, scroll/camera, coordinate transforms, tile visibility |
| `docs/raw-0008-000c.md` | 0008 dispatch helpers and 000c state machine |
| `docs/raw-000a-000d.md` | Tracked handles, cache manager, seg082 allocator, palette helpers, seg004/005 startup |
| `docs/far-call-targets.md` | Top-104 far-call targets (Tiers 1–5), supporting functions, analysis gaps |