Crusader_Decomp/ghidra_mcp_wishlist.md
2026-04-10 18:14:55 +02:00

Ghidra MCP Wishlist

This file records concrete MCP gaps hit during Crusader workflow passes.

Rules for keeping it useful:

  • Put only unresolved work in Remaining TODOs.
  • Move implemented or source-fixed items to Done / Implemented.
  • Keep each remaining item short: missing capability, fallback, why it matters, proposed behavior, latest status.

Remaining TODOs

Raw Patch Extraction From Live Ghidra Edits

  • Missing capability: export a machine-friendly raw patch plan from the current live program after verified byte edits, including NE relocation target changes.
  • Current fallback: manually translate selector-space Ghidra edits into raw file offsets, helper-window byte blocks, and relocation-record rewrites before re-encoding them in a PowerShell patcher.
  • Why it matters: the Regret hidden-debugger debug menu 2.0 patch was proven in Ghidra first, but the available export processor path returned the original executable bytes instead of the modified image, so the usable deliverable had to become a raw REGRET.EXE patch.
  • Proposed MCP behavior: add an endpoint that emits a structured patch plan for the selected program, including file offsets, original bytes, new bytes, relocation-record deltas, and selector-to-segment metadata for NE executables. A direct export_patched_binary(...) helper would also be useful if it can be proven reliable on the target processor path.
  • Latest status (2026-04-10): the full Regret runtime/helper plus loosecannon trigger patch had to be converted by hand into raw offsets 0xD2840..0xD28DC, 0xD2C94, 0xD2E0C, 0xD2E14, 0xD2E1C, 0x7BB25, 0x7BB15, 0x7BB05, and 0x7BAF5 after the live export path produced an unmodified EXE.
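A structured patch plan of the kind proposed above could look roughly like the following. This is a minimal sketch, not the MCP's actual output format: the PatchEntry type, its field names, and apply_patch_plan are hypothetical, and relocation-record deltas and selector-to-segment metadata are left out for brevity. It shows only the core safety property, which is that every entry carries its original bytes so a patcher can refuse to write over an unexpected image.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PatchEntry:
    """One raw-file edit (hypothetical shape, not an existing MCP schema)."""
    file_offset: int  # raw offset into the executable on disk
    original: bytes   # bytes expected at that offset before patching
    new: bytes        # replacement bytes, same length as original


def apply_patch_plan(image: bytes, plan: list[PatchEntry]) -> bytes:
    """Return a patched copy of image; raise if any entry does not match."""
    out = bytearray(image)
    for entry in plan:
        if len(entry.original) != len(entry.new):
            raise ValueError(f"length mismatch at 0x{entry.file_offset:X}")
        end = entry.file_offset + len(entry.original)
        if entry.file_offset < 0 or end > len(out):
            raise ValueError(f"entry at 0x{entry.file_offset:X} is out of range")
        if bytes(out[entry.file_offset:end]) != entry.original:
            raise ValueError(f"original bytes differ at 0x{entry.file_offset:X}")
        out[entry.file_offset:end] = entry.new
    return bytes(out)


# Toy image and offsets for illustration only; these are not REGRET.EXE bytes.
toy_image = bytes([0x90, 0x90, 0xC3])
toy_plan = [PatchEntry(file_offset=1, original=b"\x90", new=b"\xCC")]
patched = apply_patch_plan(toy_image, toy_plan)
```

Checking the original bytes before writing means the same plan fails loudly when it is pointed at the wrong build or at an already-patched file.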

Explicit Write-Target Enforcement And Reporting

  • Missing capability: reliable enforcement and reporting of the exact target program for write-capable MCP operations.
  • Current fallback: re-read bytes immediately after each write, compare against the intended file on disk, and assume nothing from the returned target_program field when explicit selectors were provided.
  • Why it matters: during the Regret debugger patch session, run_write_script(...) results still reported target_program=REGRET.EXE even when /Writable/REGRET-PATCHED.EXE was passed explicitly, which made it harder to tell whether a write actually landed on the writable clone or silently fell back.
  • Proposed MCP behavior: when explicit selectors are provided, write-capable endpoints should either bind to that exact program and report the resolved full domain-file path, or fail with a structured target-mismatch error instead of proceeding with ambiguous status text.
  • Latest status (2026-04-10): the raw-write fallback only remained trustworthy because the session re-read the patched bytes from the writable target after every operation; the reported target text itself was not sufficient evidence.
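The fail-instead-of-guess behavior proposed above can be sketched on the client side as follows. The resolved_domain_path field, the TargetMismatchError type, and check_write_target are all hypothetical names; the current endpoints are only known to return a target_program field, which this item describes as unreliable.

```python
class TargetMismatchError(Exception):
    """Structured error for a write that resolved to an unexpected program."""

    def __init__(self, requested, resolved):
        super().__init__(
            f"requested {requested!r} but write resolved to {resolved!r}"
        )
        self.requested = requested
        self.resolved = resolved


def check_write_target(requested_path: str, result: dict) -> str:
    """Return the resolved path only if it matches the explicit selector."""
    # 'resolved_domain_path' is an assumed field name for the full domain-file path.
    resolved = result.get("resolved_domain_path")
    if resolved != requested_path:
        raise TargetMismatchError(requested_path, resolved)
    return resolved


# Illustrative result dicts; the second mimics a silent fallback to the original program.
ok = check_write_target(
    "/Writable/REGRET-PATCHED.EXE",
    {"resolved_domain_path": "/Writable/REGRET-PATCHED.EXE"},
)
```

A missing field is treated the same as a mismatch, so an older plugin build that does not report the resolved path cannot pass the check by accident.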

Instruction-Overwrite Patching Without Script Fallbacks

  • Missing capability: patch over existing defined instructions on writable copies without dropping to custom write scripts that clear code units first.
  • Current fallback: use run_write_script(...) to clear the affected code units, write bytes manually, trigger disassembly, and then re-verify the result with readback helpers.
  • Why it matters: patch_bytes_and_reanalyze(...) refused several valid small Regret debugger edits with the error "Memory change conflicts with instruction", even though the intended operation was a normal code-patch workflow on a writable copy.
  • Proposed MCP behavior: add an explicit instruction-overwrite mode to patch_bytes_and_reanalyze(...) for writable targets so the endpoint can clear the conflicting code units, apply the bytes, re-disassemble the region, and report the final instruction text in one machine-friendly result.
  • Latest status (2026-04-10): the active loosecannon fix at 1148:3743 only landed after falling back to a write script that manually cleared the code unit and forced redisassembly.

Apply-Class-Layout Schema Parity

  • Missing capability: reliable minimal-payload use of apply_class_layout(...) during live class-lift batches.
  • Current fallback: create the class namespace and provisional structs directly through live run_write_script(...), then defer any later class-binding or per-method typing work.
  • Why it matters: the first EntityDispatchEntry pilot batch reached the point where the class shell and provisional datatypes existed, but the endpoint still rejected the bind attempt with an undocumented required methods property even though the local workflow only needed class_path, instance_struct, and vtable_struct.
  • Proposed MCP behavior: apply_class_layout(...) should accept a true minimal bind payload when no method list is being applied, or else the bridge/schema should expose the methods field explicitly and treat an omitted list as empty instead of a validation error.
  • Latest status (2026-04-07): live CRUSADER.EXE class-lift pass for Remorse::EntityDispatchEntry succeeded via run_write_script(...), creating the class shell plus /Remorse/EntityDispatchEntryBase and /Remorse/EntityDispatchEntryVtable. The only failed step in that batch was the direct apply_class_layout(...) call, which rejected the payload before any class work ran.
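The "omitted list means empty" rule proposed above can be sketched as a small payload normalizer. This is an assumption about how the bridge could validate the request, not the endpoint's actual schema code; normalize_class_layout_payload is a hypothetical helper, and the exact string form of class_path is a guess based on the names used in this item.

```python
REQUIRED_FIELDS = ("class_path", "instance_struct", "vtable_struct")


def normalize_class_layout_payload(payload: dict) -> dict:
    """Validate the minimal bind fields and default 'methods' to an empty list."""
    missing = [key for key in REQUIRED_FIELDS if not payload.get(key)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    normalized = dict(payload)  # do not mutate the caller's payload
    methods = normalized.get("methods")
    normalized["methods"] = [] if methods is None else list(methods)
    return normalized


# Minimal bind payload using the struct paths from the pilot batch.
minimal = {
    "class_path": "Remorse::EntityDispatchEntry",
    "instance_struct": "/Remorse/EntityDispatchEntryBase",
    "vtable_struct": "/Remorse/EntityDispatchEntryVtable",
}
normalized = normalize_class_layout_payload(minimal)
```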

Class-Lift Typing Live Parity

  • Missing capability: end-to-end live-session parity for storage-aware this typing on 16-bit NE methods whose current storage does not match the default pointer storage the binder would choose.
  • Current fallback: use local PyGhidra with DYNAMIC_STORAGE_ALL_PARAMS, or move methods with set_function_class(...) and defer final this typing/manual prototype cleanup.
  • Why it matters: EntityVmContext lifecycle methods and EntityVmRuntime::Create still need the live MCP path to behave like the verified local PyGhidra repair flow.
  • Proposed MCP behavior: set_function_this_type(...) and apply_class_layout(...) should reliably fall back to dynamic storage in-session for these 16-bit cases, while preserving structured per-method warnings instead of aborting the batch.
  • Latest status (2026-04-06): local PyGhidra confirmed that 1420:0eec, 1420:10b6, 1420:10da, 1420:1162, 1420:118f, and 1420:1278 accept EntityVmContext * this cleanly via DYNAMIC_STORAGE_ALL_PARAMS. The live storage-aware path now also accepts explicit /Remorse/EntityVmRuntime *32, /Remorse/EntityVmOwnerResource *32, /Remorse/EntityVmContext *32, and /Remorse/EntityVmSlotEntry *32 signatures in-session once the exact *32 datatype has first been resolved into the program data-type manager. The remaining live gap is now mostly about deeper mixed-width parameter packs like 1420:0eec CreateFromSlotIndex, not the previously blocked 4-byte object-pointer cases themselves.

Storage-Aware Prototype Live Verification

  • Missing capability: confirmed live-session parity for the newest storage-aware prototype fixes on 16-bit NE repair cases.
  • Current fallback: if the active GUI session is on an older plugin build, reload the plugin; if parity still fails, use local PyGhidra or manually compensate when testing stack offsets / calling conventions.
  • Why it matters: 1000:42e2 and 1420:1499 are the known proof cases for explicit return storage, stack-word parameter modeling, and 16-bit far calling conventions.
  • Proposed MCP behavior: set_function_prototype_storage(...) should accept bare stack: offsets in the same hex-style form used in current workflow notes and should preserve exact calling-convention tokens such as __cdecl16far before falling back to lossy legacy normalization.
  • Latest status (2026-04-06): the reloaded live plugin now reaches the real storage-aware implementation in-session on both proof cases, and explicit AX:DX return storage survives correctly on 1000:42e2 and 1420:1499. The remaining live parity issue is narrower: applies with calling_convention='__cdecl16far' on those proof cases still normalize to plain __cdecl. Direct live run_write_script(...) calls can immediately restore __cdecl16far, which proves the live database accepts the exact convention token. The remaining gap is therefore the endpoint-side normalization/deployment path.
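The "exact token first, lossy normalization second" ordering described for this item can be sketched as below. The LEGACY_NORMALIZATION table and choose_calling_convention are hypothetical; the real endpoint's normalization rules are not documented here. The set of accepted tokens is passed in explicitly to stand in for whatever the live program reports.

```python
# Assumed legacy mapping; only the __cdecl16far -> __cdecl case is known from the notes.
LEGACY_NORMALIZATION = {"__cdecl16far": "__cdecl"}


def choose_calling_convention(requested, accepted):
    """Return (token, lossy): prefer the exact token, fall back to the legacy form."""
    if requested in accepted:
        return requested, False
    fallback = LEGACY_NORMALIZATION.get(requested)
    if fallback is not None and fallback in accepted:
        return fallback, True
    raise ValueError(f"no accepted calling convention for {requested!r}")


exact = choose_calling_convention("__cdecl16far", {"__cdecl", "__cdecl16far"})
lossy = choose_calling_convention("__cdecl16far", {"__cdecl"})
```

Returning the lossy flag lets the caller surface a structured warning when the fallback was used, rather than silently reporting success.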

Live Metadata / Read-Target Verification

  • Missing capability: fully verified live-session parity for selector-aware reads and metadata helpers in mixed-build or partially refreshed GUI sessions.
  • Current fallback: bridge alias retries, explicit-target normalization, and manual project-note cross-checks when a live session still behaves like an older plugin build.
  • Why it matters: Crusader work routinely needs side-by-side reads across /CRUSADER.EXE, /es/CRUSADER.EXE, /Writable/..., and other project entries without changing the active Ghidra tab.
  • Proposed MCP behavior: list_project_programs(...), get_runtime_capabilities(...), get_callers(...), and other selector-aware read helpers should bind reliably to the requested or active target and return structured unsupported-state output instead of raw context failures.
  • Latest status (2026-04-06): the local fork already includes alias fallbacks and Windows path/folder normalization for explicit-target matching. Remaining work is live-session verification after plugin refresh, not additional local source coverage.

Done / Implemented In Local Fork

Transport And Runtime

  • POST endpoints now accept both application/json and application/x-www-form-urlencoded request bodies. Unsupported POST payloads fail early with unsupported-content-type instead of degrading into missing-parameter errors.
  • get_runtime_capabilities() reports readonly/write-script capability state and run_readonly_script(...) returns structured unsupported-state output when Python support is unavailable.
  • run_write_script(...) and alias run_transaction_script(...) are implemented with dry-run support, explicit target selectors, a write-policy denylist, and machine-friendly transaction results.
  • Bridge runtime helpers retry compatible aliases when a mixed-build live session answers with 404 / "No context found for request".
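The alias-retry behavior in the last item can be sketched as follows. This is not the bridge's actual code: call_with_aliases is a hypothetical helper, the send callable stands in for whatever HTTP transport the bridge uses, and the order in which aliases are tried is an assumption.

```python
from typing import Callable, Sequence, Tuple

RETRYABLE_MARKER = "No context found for request"


def call_with_aliases(
    send: Callable[[str], Tuple[int, str]],
    endpoint_names: Sequence[str],
) -> Tuple[str, str]:
    """Try each endpoint name in order; return (name, body) for the first usable reply."""
    last_reply = None
    for name in endpoint_names:
        status, body = send(name)
        if status == 404 or RETRYABLE_MARKER in body:
            last_reply = (status, body)  # older build: route missing, try the next alias
            continue
        return name, body
    raise LookupError(f"no alias answered; last reply was {last_reply!r}")


def fake_send(name: str) -> Tuple[int, str]:
    """Stand-in transport simulating an older plugin that only knows the alias."""
    if name == "project_programs":
        return 200, '{"programs": []}'
    return 404, RETRYABLE_MARKER


used_name, reply_body = call_with_aliases(
    fake_send, ["list_project_programs", "project_programs"]
)
```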

Explicit Targeting And Project Access

  • Explicit write targeting is implemented for edit flows such as apply_program_edit_plan(...) and patch_bytes_and_reanalyze(...) with deterministic save behavior.
  • Selector-aware read/query endpoints now accept project_dir, project_name, folder_path, and program_name and reuse the same target-resolution layer as write flows.
  • Target matching now normalizes Windows path casing and slash style and can infer missing project selectors from the active domain file when appropriate.
  • list_project_programs(...) plus alias project_programs is implemented and returns machine-friendly folder/program inventory.
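The casing and slash normalization mentioned above can be illustrated with a small helper. This is a sketch for project-relative folder/program paths only; normalize_target and same_target are hypothetical names, and the fork's real matching rules (including selector inference from the active domain file) are not reproduced here.

```python
def normalize_target(path: str) -> str:
    """Unify slash style, drop empty segments, and fold case for comparison."""
    unified = path.replace("\\", "/")
    parts = [part for part in unified.split("/") if part]
    return "/" + "/".join(parts).casefold()


def same_target(left: str, right: str) -> bool:
    """Compare two selector paths after normalization."""
    return normalize_target(left) == normalize_target(right)


matches = same_target("\\Writable\\Regret-Patched.exe", "/Writable/REGRET-PATCHED.EXE")
```

The normalized form is meant for matching only; anything reported back to the user should still use the resolved path as stored in the project.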

Analysis, Inspection, And Xrefs

  • Function boundary repair helpers are implemented: create_function_by_address(...), delete_function_by_address(...), and get_function_containing(...).
  • Arbitrary memory/code inspection helpers are implemented: read_region(...), disassemble_region(...), get_instruction_window(...), search_instructions(...), and get_data_uses(...).
  • search_bytes(...) is implemented with ?? wildcards and machine-friendly hit output.
  • Caller/xref recovery is improved via get_callers(...), and get_xrefs_to(...) / get_xrefs_from(...) return typed reference kinds plus containing-function metadata.
  • get_symbol_at(address) now uses direct routes when present and bridge-side legacy fallbacks when the live process is older.
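The ?? wildcard matching offered by search_bytes(...) can be illustrated with a naive scan over an in-memory buffer. This is a sketch of the matching idea only: the space-separated hex pattern syntax and the find_pattern / parse_pattern names are assumptions, and the real endpoint searches program memory rather than a Python bytes object.

```python
from typing import List, Optional


def parse_pattern(pattern: str) -> List[Optional[int]]:
    """Turn 'AA ?? BB' into [0xAA, None, 0xBB]; None marks a wildcard byte."""
    parsed: List[Optional[int]] = []
    for token in pattern.split():
        if token == "??":
            parsed.append(None)
            continue
        value = int(token, 16)
        if not 0 <= value <= 0xFF:
            raise ValueError(f"not a byte: {token!r}")
        parsed.append(value)
    return parsed


def find_pattern(data: bytes, pattern: str) -> List[int]:
    """Return every offset in data where the pattern matches."""
    wanted = parse_pattern(pattern)
    if not wanted:
        return []
    hits = []
    for start in range(len(data) - len(wanted) + 1):
        if all(b is None or data[start + i] == b for i, b in enumerate(wanted)):
            hits.append(start)
    return hits


# Synthetic buffer for illustration; not bytes from any Crusader executable.
sample = bytes.fromhex("9a00004814c39a34124814")
hits = find_pattern(sample, "9A ?? ?? 48 14")
```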

Batch Edits And Comparison Tools

  • Batch helpers are implemented: set_comments(...), set_decompiler_comments(...), rename_functions_by_address(...), and apply_program_edit_plan(...) with dry-run support.
  • Reanalysis helpers are implemented: reanalyze_region(...), patch_bytes_and_reanalyze(...), and analyze_function_boundaries(...).
  • Cross-program comparison helpers are implemented: compare_regions(...), compare_strings(...), and compare_functions(...).
  • port_symbols(...) now ports verified names/comments between programs with provenance text and explicit source/target selectors.

Class / Namespace / OO Recovery

  • Namespace and class authoring helpers are implemented: create_namespace(...), create_class(...), list_namespace_members(...), move_symbol_to_namespace(...), and set_function_class(...).
  • Vtable and struct helpers are implemented: analyze_vtable(...), create_or_update_struct(...), create_or_update_vtable(...), and alias coverage such as build_vtable / set_this_type.
  • set_function_this_type(...) supports storage-strategy hints and apply_class_layout(...) now soft-fails per-method typing with structured warnings instead of aborting the whole batch.

Prototype And Storage Modeling

  • The storage-aware prototype endpoint is implemented as set_function_prototype_storage(...) with alias set_storage_aware_prototype(...).
  • The endpoint accepts declarative return_type, return_storage, ordered parameter lines (name|type|storage), explicit target selectors, varargs, and machine-friendly warnings when hidden __return_storage_ptr__ state is still present.
  • Source-level fixes landed on 2026-04-06 for the two known live correctness bugs:
    • stack: storage is now parsed before generic Ghidra deserialization so workflow-style bare stack offsets are interpreted consistently.
    • exact calling-convention tokens are tried before legacy normalization so 16-bit far conventions such as __cdecl16far are not needlessly collapsed to plain __cdecl when the exact token is accepted.
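The name|type|storage parameter lines and the "parse stack: first" rule can be sketched as below. This is not the endpoint's parser: parse_storage and parse_parameter_line are hypothetical helpers, the tuple result shape is invented for illustration, and treating bare stack offsets as hexadecimal is an assumption based on the "hex-style" wording in the workflow notes.

```python
def parse_storage(text: str):
    """Parse a storage string, handling 'stack:' before any generic fallback."""
    text = text.strip()
    if text.lower().startswith("stack:"):
        raw_offset = text[len("stack:"):].strip()
        # Assumption: bare offsets such as 'stack:6' are hexadecimal, like 'stack:0x6'.
        return ("stack", int(raw_offset, 16))
    # Anything else is passed through for the generic storage parser.
    return ("other", text)


def parse_parameter_line(line: str):
    """Split 'name|type|storage' into (name, type, parsed_storage)."""
    fields = [field.strip() for field in line.split("|", 2)]
    if len(fields) != 3 or not all(fields):
        raise ValueError(f"expected name|type|storage, got {line!r}")
    name, type_name, storage = fields
    return name, type_name, parse_storage(storage)


# Hypothetical parameter line; the name and offset are illustrative only.
parsed = parse_parameter_line("slot_index|int|stack:0x6")
```

Handling the stack: prefix before the generic path means both stack:6 and stack:0x6 resolve to the same offset in this sketch, which is the consistency the first fix above aims for.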

Historical Notes

  • If a future pass hits a new MCP gap, add it under Remaining TODOs and move it to Done / Implemented In Local Fork once the local source and bridge support are both in place.