Crusader_Decomp/.github/instructions/ghidra.instructions.md
2026-04-12 14:45:08 +02:00


applyTo: **

Crusader Ghidra Workflow

Safety Guardrails

  • Never create a git commit on your own.

  • Never run a command that may delete files outside a temporary folder unless you first ask the user with vscode_askQuestions and get explicit confirmation.

  • If a request could remove or overwrite repository files, pause and confirm before proceeding.

  • Active target is the NE Ghidra program CRUSADER.EXE unless explicitly stated otherwise.

  • Use Ghidra MCP tools for analysis, decompilation, renaming, comments, and xref work.

  • Treat the verified CRUSADER-RAW.EXE work already captured in docs/ and notes as a cross-reference evidence base for the live CRUSADER.EXE session, not as the default active program.

  • Avoid speculative renames. Prefer names that are supported by one of these:

    • verified raw mapping from standalone segment work
    • direct string evidence
    • clear call/field behavior in decompiler or disassembly
    • xref relationships to already-named functions
  • When porting names from standalone segment extracts or prior raw full-EXE work into CRUSADER.EXE, use only verified base mappings and keep the older raw address evidence with the live NE address where practical.

Verified Raw Mapping Rules

  • seg001 raw base = 0x6E570
  • seg021 raw base = 0x87170
  • Porting formula: raw_full_exe_flat = verified_segment_base + standalone_segment_relative_offset
  • seg001 and seg021 both contain a keyboard handler; keep the seg001 name as seg001_input_keyboard_handler to avoid collision.
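
The porting formula above can be sketched in Python as follows. The helper names `port_offset` and `format_flat` are hypothetical. The `hhhh:llll` display form (flat address split into its high and low 16-bit halves) is an assumption inferred from the ports listed in this file, not a documented Ghidra rule.

```python
# Verified segment bases copied from the rules above.
SEGMENT_BASES = {"seg001": 0x6E570, "seg021": 0x87170}


def port_offset(segment: str, offset: int) -> int:
    """Return raw_full_exe_flat = verified_segment_base + segment-relative offset."""
    if segment not in SEGMENT_BASES:
        # Refuse to guess: only verified bases may be used for porting.
        raise ValueError(f"no verified raw base for {segment}")
    return SEGMENT_BASES[segment] + offset


def format_flat(flat: int) -> str:
    """Assumed display form: high 16 bits, a colon, then low 16 bits."""
    return f"{flat >> 16:04x}:{flat & 0xFFFF:04x}"


print(format_flat(port_offset("seg001", 0x0060)))  # 0006:e5d0
```

Raising on an unknown segment keeps unverified bases from slipping into a rename by accident.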

Working Method

  • Prefer a single decompile call first.
  • If the decompiler collapses to thunk-heavy output, use one disassembly lookup to confirm the wrapper or parameter setup.
  • When decompile_function output is too large (>~50KB), the result is written to a temp JSON file that read_file returns as empty {}. Use disassemble_function instead — it returns inline assembly directly and is fully navigable for large functions.
  • For 16-bit NE decompiler failures such as Low-level Error: Symbol $$undef... extends beyond the end of the address space, do not assume the caller's frame is the only culprit. Inspect direct callees for parser-injected hidden __return_storage_ptr__ parameters or bad pointer-return storage first, especially after prototype edits or function recreation.
  • Cross-reference new CRUSADER.EXE findings against the old raw notes before promoting a rename or behavioral claim. If the two differ, keep both addresses and explain the mismatch instead of silently preferring one.
  • Add a short decompiler comment when a rename is mapped from verified notes so the provenance stays visible in Ghidra.
  • Keep crusader_decompilation_notes.md updated after each verified batch. That file is now a short index — append new analysis to the appropriate file in docs/ and add a row to the index table if a new file is created.
  • Keep crusader_segment_coverage_ledger.csv updated after each verified batch whenever a segment can be promoted or reclassified.
  • Keep the progress section in plan-mid.md updated after each verified batch so the next pass can resume from the exact stopping point.
  • Keep ghidra_mcp_wishlist.md updated whenever the workflow hits a missing MCP capability and would otherwise tempt a fallback outside MCP.
  • Each wishlist entry should be short and concrete: what MCP lacked, what command/script/tool had to replace it, and what a useful MCP endpoint or behavior would look like.
  • Record raw-import addresses alongside original segment-relative offsets when porting names.
  • Always use rename_function_by_address. rename_function (by name) is broken: it fails with "must have required property 'old_name'". Pass the address in "function_address": "000c:XXXX" format.
  • For substantive RE batches, end with at least 6 concrete future steps unless the task is fully closed and there are genuinely fewer defensible next actions.
  • When a batch analyzes currently unnamed Ghidra functions and the behavior is clear enough, rename them in Ghidra instead of leaving them as positional FUN_xxxx_xxxx placeholders.
  • Terminal execution rule: Always write multi-line Python scripts to a temporary .py file and execute that file with the Python interpreter instead of pasting multi-line Python directly into an interactive terminal. This avoids paste/encoding/line-ending issues and ensures the script runs in the expected environment.
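
A minimal sketch of the terminal execution rule, assuming the standard library is enough for the task. The function name `run_script_from_temp_file` is hypothetical. The script is written to the system temporary folder and removed from there afterwards, which stays within the safety guardrails.

```python
import os
import subprocess
import sys
import tempfile


def run_script_from_temp_file(source: str) -> subprocess.CompletedProcess:
    """Write multi-line Python to a temporary .py file and run that file."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        # Fixed encoding and line endings avoid paste-related surprises.
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(source)
        # Run with the same interpreter that is executing this helper.
        return subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            check=False,
        )
    finally:
        # The file lives in the temp folder, so removing it is safe.
        os.remove(path)


result = run_script_from_temp_file("for i in range(3):\n    print(i)\n")
print(result.returncode, result.stdout.split())
```

Using `sys.executable` is one way to keep the child process in the expected environment; a project-specific interpreter path could be substituted.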

Executable Write Safety

  • Normal Ghidra database work on CRUSADER.EXE remains in scope: renames, comments, prototypes, local-variable/type cleanup, function creation/deletion, and boundary repair are allowed unless the user says otherwise.
  • Treat only actual program-byte changes as destructive actions: byte patching, write-back flows that alter loaded memory bytes, or any operation that would make the executable differ from the original program bytes.
  • Never run destructive byte-write operations against the main reference executable in the project.
  • Only use byte-patching or other byte-diverging executable write flows when the target program is an explicitly writable patch target, normally a program in the /Writable folder.
  • Treat CRUSADER.EXE, CRUSADER-RAW.EXE, and other main reference executables as read-only with respect to program bytes unless the user explicitly says otherwise.
  • Before running write endpoints such as patch_bytes_and_reanalyze or any PyGhidra byte-write script, verify that the selected program is the intended writable copy, not the reference executable.
  • If the target program is not clearly a writable patch copy in /Writable, stop and ask the user before performing the byte write.
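
The pre-write check described above could look like the sketch below. It assumes the selected program is identified by a project path string such as `/Writable/NAME`. That path shape and both helper names are hypothetical, and how the path is obtained from the live session is left open.

```python
from pathlib import PurePosixPath


def is_writable_patch_target(program_path: str) -> bool:
    """True only when the program sits inside the /Writable folder."""
    parts = PurePosixPath(program_path).parts
    if ".." in parts:
        # Reject paths that could climb back out of /Writable.
        return False
    return len(parts) >= 3 and parts[0] == "/" and parts[1] == "Writable"


def require_writable_patch_target(program_path: str) -> None:
    """Stop before a byte write unless the target is clearly a patch copy."""
    if not is_writable_patch_target(program_path):
        raise PermissionError(
            f"{program_path} is not in /Writable; ask the user before writing bytes"
        )


print(is_writable_patch_target("/Writable/CRUSADER_PATCH.EXE"))
print(is_writable_patch_target("/CRUSADER.EXE"))
```

The guard only covers the folder test. Confirming with the user is still required whenever the result is not clearly a writable patch copy.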

Python-Backed Ghidra Through MCP Only

  • Never use the offline/local PyGhidra CLI toolkit from this workspace.
  • Do not invoke tools.pyghidra_crusader, the local .venv-pyghidra311 entrypoint, or any project-open workflow that competes with the live GUI lock.
  • Treat Python-backed Ghidra capabilities as MCP-only: use live run_readonly_script(...), live write-capable MCP script endpoints, and other MCP operations exposed by the running Ghidra session.
  • If MCP lacks a needed Python-backed operation, record that gap in ghidra_mcp_wishlist.md instead of falling back to the offline/local toolkit.
  • If the workflow needs the user to change Ghidra state for MCP access, use the ask-questions tool with a yes/no confirmation prompt instead of plain text. Ask the user to open the correct Ghidra program or make the correct tab active before MCP work when needed.

Current Verified Raw-Import Ports

These remain valid cross-reference anchors for CRUSADER.EXE work. Keep the old raw-import addresses and original segment-relative offsets in notes/comments when using them to support live NE renames.

  • 0006:e5d0 = cursor_update_hover from seg001 0x0060
  • 0008:7377 = entity_count_by_type_a from seg021 0x0207
  • 0007:28ce = shot_entity_alloc from seg001 0x435e
  • 0007:2a19 = shot_entity_free from seg001 0x44a9
  • 0007:2bc9 = projectile_init_vector from seg001 0x4659
  • 0007:3001 = entity_fire_weapon from seg001 0x4a91
  • 0007:3088 = fire_weapon_from_cursor from seg001 0x4b18
  • 0007:30e8 = projectile_check_hit from seg001 0x4b78
  • 0007:319e = projectile_step_update from seg001 0x4c2e
  • 0007:3298 = projectile_trace_ray from seg001 0x4d28
  • 0007:371d = projectile_update_tick from seg001 0x51ad
  • 0007:4009 = projectile_apply_hit from seg001 0x5a99
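
As a consistency check, each entry above can be compared against the verified segment bases. The sketch below parses one entry in the form used by this list and recomputes the address. It assumes the `hhhh:llll` address is the flat raw address split into high and low 16-bit halves, and the names `PORT_LINE` and `check_port` are hypothetical.

```python
import re

# Verified bases from the raw mapping rules.
SEGMENT_BASES = {"seg001": 0x6E570, "seg021": 0x87170}

# Matches entries such as "0006:e5d0 = cursor_update_hover from seg001 0x0060".
PORT_LINE = re.compile(
    r"(?P<hi>[0-9a-f]{4}):(?P<lo>[0-9a-f]{4}) = (?P<name>\w+) "
    r"from (?P<seg>seg\d{3}) 0x(?P<off>[0-9a-fA-F]+)"
)


def check_port(line: str) -> bool:
    """Return True when the listed address equals base + offset."""
    match = PORT_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized port entry: {line!r}")
    listed = (int(match["hi"], 16) << 16) | int(match["lo"], 16)
    expected = SEGMENT_BASES[match["seg"]] + int(match["off"], 16)
    return listed == expected


print(check_port("0007:28ce = shot_entity_alloc from seg001 0x435e"))
```

An entry that fails the check is a prompt to re-verify the mapping before using it as evidence for a live NE rename.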

Named 000e: Functions (direct analysis — not segment-ported)

Parser Cluster (000e:34xx–38xx)

  • 000e:345e = record_table_init
  • 000e:34cc = record_table_destroy
  • 000e:35c6 = record_table_release_buffer
  • 000e:35ef = record_table_next_slot
  • 000e:3639 = record_table_parse_buffer
  • 000e:3798 = record_parser_read_line
  • 000e:38f8 = record_parser_find_marker

RIFF/Animation Cluster (000e:03xx–2xxx)

  • 000e:2a28 = riff_find_chunk_by_type (RIFF LIST/RIFF walker; FourCC match at chunk+8)
  • 000e:2104 = animation_start (finds "movi" chunk, inits timing ring buffer, kicks advance)
  • 000e:12f4 = animation_advance_frame (fixed-point 0x1000 timer stepper, ring buffer update)
  • 000e:103f = animation_tick (guard wrapper — checks +0xd4 != -1, calls advance_frame)
  • 000e:06f7 = anim_load_audio_frame (checks "01wb" chunk tag 0x62773130, copies audio into ring buffer)
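
To illustrate the notes on the chunk tag and the FourCC match at chunk+8, here is a minimal sketch based on the general RIFF layout (4-byte id, 4-byte little-endian size, word-aligned payload). It is not a reconstruction of the game's routines. Both function names are hypothetical, and the walk is flat (no recursion into nested lists).

```python
import struct


def fourcc_to_u32(tag: bytes) -> int:
    """Read a 4-byte tag as a little-endian 32-bit value."""
    (value,) = struct.unpack("<I", tag)
    return value


def find_list_chunk(data: bytes, list_type: bytes, start: int = 12):
    """Return the offset of the first LIST/RIFF chunk whose type at +8 matches.

    start=12 skips the outer RIFF header (id, size, form type).
    Returns None when no chunk matches.
    """
    pos = start
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id in (b"LIST", b"RIFF") and data[pos + 8:pos + 12] == list_type:
            return pos
        # Payloads are padded to an even length.
        pos += 8 + size + (size & 1)
    return None


print(hex(fourcc_to_u32(b"01wb")))  # 0x62773130
```

The printed value shows why the tag "01wb" appears as 0x62773130 when compared as a 32-bit integer on a little-endian machine.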

Constructor/Assert Helpers (000e:22xx–29xx)

  • 000e:223d = assert_alive_sentinel (expects +0xd4 == -1; traps on mismatch)
  • 000e:2777 = animation_ctor_variant_a (alloc + init flags + chained init/assert/finalize)
  • 000e:2860 = animation_ctor_variant_b (variant A with extra +0x109 init)
  • 000e:2969 = animation_ctor_variant_c (default static flag profile +0x4c=0xd)

Documentation Structure

Detailed RE notes live in the docs/ folder. crusader_decompilation_notes.md is a short index. Unless a doc says otherwise, read raw-focused docs as evidence sources to be cross-checked against the live CRUSADER.EXE session.

| File | Topic |
| --- | --- |
| docs/overview.md | Binary overview, address layout, segment map, next steps |
| docs/phar-lap-extender.md | DOS extender functions and string references |
| docs/ne-segment1.md | NE Segment 1: entity system, cheat system, full game logic analysis |
| docs/raw-porting-progress.md | seg091 RNG, 0x4588 callbacks, 0007 gameplay batches, snap_entity_to_ground |
| docs/raw-000e.md | 000e parser cluster and RIFF/animation subsystem |
| docs/raw-0007-rendering.md | Draw list, scroll/camera, coordinate transforms, tile visibility |
| docs/raw-0008-000c.md | 0008 dispatch helpers and 000c state machine |
| docs/raw-000a-000d.md | Tracked handles, cache manager, seg082 allocator, palette helpers, seg004/005 startup |
| docs/far-call-targets.md | Top-104 far-call targets (Tiers 1–5), supporting functions, analysis gaps |