Crusader_Decomp/.github/instructions/ghidra.instructions.md

applyTo: **

Crusader Ghidra Workflow

Safety Guardrails

  • Never create a git commit on your own.

  • Never run a command that may delete files outside a temporary folder unless you first ask the user with vscode_askQuestions and get explicit confirmation.

  • If a request could remove or overwrite repository files, pause and confirm before proceeding.

  • Active target is the NE Ghidra program CRUSADER.EXE unless explicitly stated otherwise.

  • Use Ghidra MCP tools for analysis, decompilation, renaming, comments, and xref work.

  • Treat the verified CRUSADER-RAW.EXE work already captured in docs/ and notes as a cross-reference evidence base for the live CRUSADER.EXE session, not as the default active program.

  • Avoid speculative renames. Prefer names that are supported by one of these:

    • verified raw mapping from standalone segment work
    • direct string evidence
    • clear call/field behavior in decompiler or disassembly
    • xref relationships to already-named functions
  • When porting names from standalone segment extracts or prior raw full-EXE work into CRUSADER.EXE, use only verified base mappings and keep the older raw address evidence with the live NE address where practical.

Verified Raw Mapping Rules

  • seg001 raw base = 0x6E570
  • seg021 raw base = 0x87170
  • Porting formula: raw_full_exe_flat = verified_segment_base + standalone_segment_relative_offset
  • seg001 and seg021 both contain a keyboard handler; keep the seg001 name as seg001_input_keyboard_handler to avoid collision.
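
The porting formula above can be sketched in Python as follows. The helper names are hypothetical, and the `SSSS:OOOO` formatting (upper bits of the flat address, then the low 16 bits) is an assumption inferred from how the ported addresses are written in this file, not a documented Ghidra API.

```python
# Verified segment bases, copied from the rules above.
VERIFIED_SEGMENT_BASES = {
    "seg001": 0x6E570,
    "seg021": 0x87170,
}


def raw_full_exe_flat(segment: str, relative_offset: int) -> int:
    """raw_full_exe_flat = verified_segment_base + standalone_segment_relative_offset."""
    if segment not in VERIFIED_SEGMENT_BASES:
        # Only verified base mappings may be used for porting names.
        raise KeyError(f"no verified base for {segment}")
    return VERIFIED_SEGMENT_BASES[segment] + relative_offset


def format_raw_address(flat: int) -> str:
    """Assumed display form: high bits, colon, low 16 bits."""
    return f"{flat >> 16:04x}:{flat & 0xFFFF:04x}"


print(format_raw_address(raw_full_exe_flat("seg001", 0x0060)))  # 0006:e5d0
print(format_raw_address(raw_full_exe_flat("seg021", 0x0207)))  # 0008:7377
```

Refusing unknown segments keeps the helper from producing addresses for bases that have not been verified.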

Working Method

  • Prefer a single decompile call first.
  • If the decompiler collapses to thunk-heavy output, use one disassembly lookup to confirm the wrapper or parameter setup.
  • When decompile_function output is too large (>~50KB), the result is written to a temp JSON file that read_file returns as empty {}. Use disassemble_function instead — it returns inline assembly directly and is fully navigable for large functions.
  • For 16-bit NE decompiler failures such as Low-level Error: Symbol $$undef... extends beyond the end of the address space, do not assume the caller's frame is the only culprit. Inspect direct callees for parser-injected hidden __return_storage_ptr__ parameters or bad pointer-return storage first, especially after prototype edits or function recreation.
  • Cross-reference new CRUSADER.EXE findings against the old raw notes before promoting a rename or behavioral claim. If the two differ, keep both addresses and explain the mismatch instead of silently preferring one.
  • Add a short decompiler comment when a rename is mapped from verified notes so the provenance stays visible in Ghidra.
  • Keep crusader_decompilation_notes.md updated after each verified batch. That file is now a short index — append new analysis to the appropriate file in docs/ and add a row to the index table if a new file is created.
  • Keep crusader_segment_coverage_ledger.csv updated after each verified batch whenever a segment can be promoted or reclassified.
  • Keep the progress section in plan-mid.md updated after each verified batch so the next pass can resume from the exact stopping point.
  • Keep ghidra_mcp_wishlist.md updated whenever the workflow hits a missing MCP capability and has to fall back to PyGhidra or another local-only path.
  • Each wishlist entry should be short and concrete: what MCP lacked, what command/script/tool had to replace it, and what a useful MCP endpoint or behavior would look like.
  • Record raw-import addresses alongside original segment-relative offsets when porting names.
  • Always use rename_function_by_address. rename_function (by name) is broken and fails with "must have required property 'old_name'". Pass the address in "function_address": "000c:XXXX" format.
  • For substantive RE batches, end with at least 6 concrete future steps unless the task is fully closed and there are genuinely fewer defensible next actions.
  • When a batch analyzes currently unnamed Ghidra functions and the behavior is clear enough, rename them in Ghidra instead of leaving them as positional FUN_xxxx_xxxx placeholders.
  • Terminal execution rule: Always write multi-line Python scripts to a temporary .py file and execute that file with the Python interpreter instead of pasting multi-line Python directly into an interactive terminal. This avoids paste/encoding/line-ending issues and ensures the script runs in the expected environment.
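
A minimal sketch of the terminal execution rule, assuming the script should run under the same interpreter that launches it. The function name and the temporary file name are hypothetical; in this workflow the interpreter would normally be the PyGhidra venv's python.exe rather than `sys.executable`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script_from_temp_file(source: str) -> subprocess.CompletedProcess:
    """Write multi-line Python to a temporary .py file and execute that file."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "batch_script.py"  # hypothetical file name
        # Fixed encoding and line endings avoid the paste problems noted above.
        with open(script, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(source)
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=False,
        )


result = run_script_from_temp_file("for i in range(3):\n    print(i)\n")
print(result.returncode, result.stdout.split())
```

The script lives only inside a temporary directory, so cleanup never touches repository files.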

Executable Write Safety

  • Normal Ghidra database work on CRUSADER.EXE remains in scope: renames, comments, prototypes, local-variable/type cleanup, function creation/deletion, and boundary repair are allowed unless the user says otherwise.
  • Treat only actual program-byte changes as destructive actions: byte patching, write-back flows that alter loaded memory bytes, or any operation that would make the executable differ from the original program bytes.
  • Never run destructive byte-write operations against the main reference executable in the project.
  • Only use byte-patching or other byte-diverging executable write flows when the target program is an explicitly writable patch target, normally a program in the /Writable folder.
  • Treat CRUSADER.EXE, CRUSADER-RAW.EXE, and other main reference executables as read-only with respect to program bytes unless the user explicitly says otherwise.
  • Before running write endpoints such as patch_bytes_and_reanalyze or any PyGhidra byte-write script, verify that the selected program is the intended writable copy, not the reference executable.
  • If the target program is not clearly a writable patch copy in /Writable, stop and ask the user before performing the byte write.
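
The pre-write check above could be sketched like this. It assumes Ghidra project paths are written with forward slashes and that patch copies sit directly under /Writable; the function name and the example program names are hypothetical.

```python
from pathlib import PurePosixPath


def byte_write_allowed(program_path: str) -> bool:
    """Return True only when the program is clearly inside the /Writable folder.

    A False result means: stop and ask the user before any byte write.
    """
    parts = PurePosixPath(program_path).parts
    # Expect ('/', 'Writable', <program name>, ...) for a writable patch copy.
    return len(parts) >= 3 and parts[0] == "/" and parts[1] == "Writable"


for candidate in ("/Writable/CRUSADER_PATCH.EXE", "/CRUSADER.EXE", "/CRUSADER-RAW.EXE"):
    verdict = "ok to patch" if byte_write_allowed(candidate) else "ask the user first"
    print(f"{candidate}: {verdict}")
```

The check is deliberately conservative: anything that is not unambiguously under /Writable falls through to the ask-first path.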

PyGhidra Fallback

  • Use the local PyGhidra toolkit in tools/pyghidra_crusader when MCP is missing an operation such as function creation, deletion, or batched scripted edits.
  • If Ghidra was started with Python enabled, prefer live MCP run_readonly_script(...) for one-off inspection first; drop to the local PyGhidra CLI only when the work needs write access or MCP still lacks the required operation.
  • When PyGhidra is needed because MCP lacks a required operation, append a note to ghidra_mcp_wishlist.md in the same batch if the gap is not already documented.
  • The workspace-local Python environment for this toolkit is .venv-pyghidra311, created from C:\Users\Maddo\.pyenv\pyenv-win\versions\3.11.6\python.exe and installed from the bundled Ghidra 12.0.4 offline packages.
  • Default install dir for the toolkit is I:\Apps\ghidra_12.0.4_PUBLIC.
  • Invoke the toolkit with .\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader ... from the repo root.
  • Rebuild or refresh that environment with powershell -ExecutionPolicy Bypass -File .\tools\pyghidra_crusader\bootstrap_env.ps1 from the repo root when the local PyGhidra packages drift or a Ghidra upgrade lands.
  • Keep PyGhidra batches small too: prefer one focused repair plan or 1-5 direct edits at a time.
  • Write operations require the Ghidra project to open successfully. If Crusader.lock is present because the GUI owns the project, close Ghidra first or operate on a project copy.
  • If the workflow needs the user to change Ghidra state, use the ask-questions tool with a yes/no confirmation prompt instead of plain text. Ask the user to close Ghidra before PyGhidra write commands, and ask the user to open the Ghidra project before MCP server commands. The prompt should briefly describe exactly what to do and instruct the user to answer Yes only after the action is complete.
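
Before a PyGhidra write command, the lock check described above might look like the sketch below. It assumes Crusader.lock sits in the Ghidra project directory next to the project file; the function name and the directory argument are hypothetical.

```python
from pathlib import Path


def project_is_locked(project_dir: str, project_name: str = "Crusader") -> bool:
    """Return True when <project_name>.lock exists, i.e. the GUI likely owns the project."""
    return (Path(project_dir) / f"{project_name}.lock").exists()


def plan_write_action(project_dir: str) -> str:
    """Decide what to do before a PyGhidra write batch."""
    if project_is_locked(project_dir):
        # Do not force the lock; ask the user to close Ghidra or use a project copy.
        return "ask user to close Ghidra"
    return "proceed with write batch"
```

Returning a plan string instead of deleting the lock keeps the decision with the user, in line with the confirmation-prompt rule.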

Current Verified Raw-Import Ports

These remain valid cross-reference anchors for CRUSADER.EXE work. Keep the old raw-import addresses and original segment-relative offsets in notes/comments when using them to support live NE renames.

  • 0006:e5d0 = cursor_update_hover from seg001 0x0060
  • 0008:7377 = entity_count_by_type_a from seg021 0x0207
  • 0007:28ce = shot_entity_alloc from seg001 0x435e
  • 0007:2a19 = shot_entity_free from seg001 0x44a9
  • 0007:2bc9 = projectile_init_vector from seg001 0x4659
  • 0007:3001 = entity_fire_weapon from seg001 0x4a91
  • 0007:3088 = fire_weapon_from_cursor from seg001 0x4b18
  • 0007:30e8 = projectile_check_hit from seg001 0x4b78
  • 0007:319e = projectile_step_update from seg001 0x4c2e
  • 0007:3298 = projectile_trace_ray from seg001 0x4d28
  • 0007:371d = projectile_update_tick from seg001 0x51ad
  • 0007:4009 = projectile_apply_hit from seg001 0x5a99
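
As a consistency check, each entry above can be re-derived from its segment base and offset. The sketch below parses one list line, verifies the arithmetic, and emits a provenance string; the comment format and helper name are hypothetical, and the address split (high bits, then low 16 bits) is inferred from the entries themselves.

```python
import re

BASES = {"seg001": 0x6E570, "seg021": 0x87170}  # verified bases from this file

PORT_LINE = re.compile(
    r"(?P<hi>[0-9a-f]{4}):(?P<lo>[0-9a-f]{4}) = (?P<name>\w+) "
    r"from (?P<seg>seg\d+) 0x(?P<off>[0-9a-f]+)"
)


def check_port(line: str) -> str:
    """Verify one port entry and return a provenance comment for it."""
    m = PORT_LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized port line: {line!r}")
    listed = (int(m["hi"], 16) << 16) | int(m["lo"], 16)
    computed = BASES[m["seg"]] + int(m["off"], 16)
    if listed != computed:
        raise ValueError(f"{m['name']}: listed {listed:#x}, computed {computed:#x}")
    return f"{m['name']}: raw {m['hi']}:{m['lo']} <- {m['seg']}+0x{m['off']}"


print(check_port("0006:e5d0 = cursor_update_hover from seg001 0x0060"))
print(check_port("0008:7377 = entity_count_by_type_a from seg021 0x0207"))
```

A mismatch raises instead of silently preferring one address, so both values stay visible for the notes.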

Named 000e: Functions (direct analysis — not segment-ported)

Parser Cluster (000e:34xx–38xx)

  • 000e:345e = record_table_init
  • 000e:34cc = record_table_destroy
  • 000e:35c6 = record_table_release_buffer
  • 000e:35ef = record_table_next_slot
  • 000e:3639 = record_table_parse_buffer
  • 000e:3798 = record_parser_read_line
  • 000e:38f8 = record_parser_find_marker

RIFF/Animation Cluster (000e:03xx–2xxx)

  • 000e:2a28 = riff_find_chunk_by_type (RIFF LIST/RIFF walker; FourCC match at chunk+8)
  • 000e:2104 = animation_start (finds "movi" chunk, inits timing ring buffer, kicks advance)
  • 000e:12f4 = animation_advance_frame (fixed-point 0x1000 timer stepper, ring buffer update)
  • 000e:103f = animation_tick (guard wrapper — checks +0xd4 != -1, calls advance_frame)
  • 000e:06f7 = anim_load_audio_frame (checks "01wb" chunk tag 0x62773130, copies audio into ring buffer)
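
For orientation, the FourCC constant and the "type at chunk+8" match noted above can be illustrated against the standard RIFF layout (4-byte id, 4-byte little-endian size, then data padded to an even length). This is a generic sketch with hypothetical function names, not the recovered game code.

```python
import struct


def fourcc(tag: str) -> int:
    """Pack a 4-character tag into the little-endian 32-bit value seen in code."""
    return struct.unpack("<I", tag.encode("ascii"))[0]


def find_list_by_type(buf: bytes, want: bytes, start: int = 0, end=None):
    """Return the offset of the first RIFF/LIST chunk whose type at chunk+8 is `want`."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        chunk_id = buf[pos:pos + 4]
        size = struct.unpack_from("<I", buf, pos + 4)[0]
        if chunk_id in (b"RIFF", b"LIST"):
            if buf[pos + 8:pos + 12] == want:
                return pos
            # Container chunk of another type: search its children.
            found = find_list_by_type(buf, want, pos + 12, min(end, pos + 8 + size))
            if found is not None:
                return found
        pos += 8 + size + (size & 1)  # chunks are padded to even length
    return None


# Tiny synthetic file: RIFF("AVI ") { LIST("hdrl"), LIST("movi") { "01wb" } }
audio = b"01wb" + struct.pack("<I", 4) + b"\x01\x02\x03\x04"
movi = b"LIST" + struct.pack("<I", 4 + len(audio)) + b"movi" + audio
hdrl = b"LIST" + struct.pack("<I", 4) + b"hdrl"
body = b"AVI " + hdrl + movi
riff = b"RIFF" + struct.pack("<I", len(body)) + body

print(hex(fourcc("01wb")))               # 0x62773130
print(find_list_by_type(riff, b"movi"))  # 24
```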

Constructor/Assert Helpers (000e:22xx–29xx)

  • 000e:223d = assert_alive_sentinel (expects +0xd4 == -1; traps on mismatch)
  • 000e:2777 = animation_ctor_variant_a (alloc + init flags + chained init/assert/finalize)
  • 000e:2860 = animation_ctor_variant_b (variant A with extra +0x109 init)
  • 000e:2969 = animation_ctor_variant_c (default static flag profile +0x4c=0xd)

Documentation Structure

Detailed RE notes live in the docs/ folder. crusader_decompilation_notes.md is a short index. Unless a doc says otherwise, read raw-focused docs as evidence sources to be cross-checked against the live CRUSADER.EXE session.

| File | Topic |
| --- | --- |
| docs/overview.md | Binary overview, address layout, segment map, next steps |
| docs/phar-lap-extender.md | DOS extender functions and string references |
| docs/ne-segment1.md | NE Segment 1: entity system, cheat system, full game logic analysis |
| docs/raw-porting-progress.md | seg091 RNG, 0x4588 callbacks, 0007 gameplay batches, snap_entity_to_ground |
| docs/raw-000e.md | 000e parser cluster and RIFF/animation subsystem |
| docs/raw-0007-rendering.md | Draw list, scroll/camera, coordinate transforms, tile visibility |
| docs/raw-0008-000c.md | 0008 dispatch helpers and 000c state machine |
| docs/raw-000a-000d.md | Tracked handles, cache manager, seg082 allocator, palette helpers, seg004/005 startup |
| docs/far-call-targets.md | Top-104 far-call targets (Tiers 1–5), supporting functions, analysis gaps |