Crusader_Decomp/ghidra_mcp_wishlist.md
2026-04-10 18:14:55 +02:00

# Ghidra MCP Wishlist
This file records concrete MCP gaps hit during Crusader workflow passes.
Rules for keeping it useful:
- Put only unresolved work in `Remaining TODOs`.
- Move implemented or source-fixed items to `Done / Implemented`.
- Keep each remaining item short: missing capability, fallback, why it matters, proposed behavior, latest status.
## Remaining TODOs
### Raw Patch Extraction From Live Ghidra Edits
- Missing capability: export a machine-friendly raw patch plan from the current live program after verified byte edits, including NE relocation target changes.
- Current fallback: manually translate selector-space Ghidra edits into raw file offsets, helper-window byte blocks, and relocation-record rewrites before re-encoding them in a PowerShell patcher.
- Why it matters: the Regret hidden-debugger `debug menu 2.0` patch was proven in Ghidra first, but the usable deliverable had to become a raw `REGRET.EXE` patch because the available export processor path returned the original executable bytes instead of the modified image.
- Proposed MCP behavior: add an endpoint that emits a structured patch plan for the selected program, including file offsets, original bytes, new bytes, relocation-record deltas, and selector-to-segment metadata for NE executables. A direct `export_patched_binary(...)` helper would also be useful if it can be proven reliable on the target processor path.
- Latest status (2026-04-10): the full Regret runtime/helper plus `loosecannon` trigger patch had to be converted by hand into raw offsets `0xD2840..0xD28DC`, `0xD2C94`, `0xD2E0C`, `0xD2E14`, `0xD2E1C`, `0x7BB25`, `0x7BB15`, `0x7BB05`, and `0x7BAF5` after the live export path produced an unmodified EXE.
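
One possible shape for the structured patch plan proposed above, as a minimal Python sketch. `PatchEntry` and `apply_patch_plan` are hypothetical names, not existing MCP endpoints, and the sketch covers only byte edits at raw file offsets. NE relocation-record deltas and selector-to-segment metadata would need extra fields that are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PatchEntry:
    """One byte edit expressed in raw file offsets (hypothetical schema)."""
    file_offset: int    # raw offset into the executable on disk
    original: bytes     # bytes expected at that offset before patching
    replacement: bytes  # bytes to write; must match the length of `original`


def apply_patch_plan(image: bytes, plan: list[PatchEntry]) -> bytes:
    """Return a patched copy of `image`; reject entries whose original bytes differ."""
    out = bytearray(image)
    for entry in plan:
        if len(entry.original) != len(entry.replacement):
            raise ValueError(f"length mismatch at 0x{entry.file_offset:X}")
        end = entry.file_offset + len(entry.original)
        # Compare against the unpatched image so a stale plan fails loudly.
        if end > len(image) or image[entry.file_offset:end] != entry.original:
            raise ValueError(f"original bytes mismatch at 0x{entry.file_offset:X}")
        out[entry.file_offset:end] = entry.replacement
    return bytes(out)
```

Carrying the original bytes in every entry lets a patcher refuse to run against the wrong executable revision instead of silently corrupting it.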
### Explicit Write-Target Enforcement And Reporting
- Missing capability: reliable enforcement and reporting of the exact target program for write-capable MCP operations.
- Current fallback: re-read bytes immediately after each write, compare against the intended file on disk, and assume nothing from the returned `target_program` field when explicit selectors were provided.
- Why it matters: during the Regret debugger patch session, `run_write_script(...)` results still reported `target_program=REGRET.EXE` even when `/Writable/REGRET-PATCHED.EXE` was passed explicitly, which made it harder to tell whether a write actually landed on the writable clone or silently fell back.
- Proposed MCP behavior: when explicit selectors are provided, write-capable endpoints should either bind to that exact program and report the resolved full domain-file path, or fail with a structured target-mismatch error instead of proceeding with ambiguous status text.
- Latest status (2026-04-10): the raw-write fallback only remained trustworthy because the session re-read the patched bytes from the writable target after every operation; the reported target text itself was not sufficient evidence.
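
The fail-closed behavior proposed above could look roughly like the following sketch. `TargetMismatchError` and `check_write_result` are hypothetical names, and the sketch assumes the endpoint would report the resolved full domain-file path, which is the proposed behavior rather than the current one.

```python
class TargetMismatchError(Exception):
    """Raised when a write reports a different program than the one requested."""

    def __init__(self, requested: str, reported: str):
        super().__init__(f"requested {requested!r} but write reported {reported!r}")
        self.requested = requested
        self.reported = reported


def check_write_result(requested_path: str, reported_target: str,
                       intended: bytes, readback: bytes) -> dict:
    """Fail on an ambiguous target, then confirm the bytes by readback."""
    if reported_target != requested_path:
        raise TargetMismatchError(requested_path, reported_target)
    # The reported target alone is not evidence; the readback comparison is.
    return {"target": reported_target, "verified": readback == intended}
```

With the session values recorded above, a result reporting `REGRET.EXE` for a request against `/Writable/REGRET-PATCHED.EXE` would raise instead of returning ambiguous status text.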
### Instruction-Overwrite Patching Without Script Fallbacks
- Missing capability: patch over existing defined instructions on writable copies without dropping to custom write scripts that clear code units first.
- Current fallback: use `run_write_script(...)` to clear the affected code units, write bytes manually, trigger disassembly, and then re-verify the result with readback helpers.
- Why it matters: `patch_bytes_and_reanalyze(...)` refused several valid small Regret debugger edits with `Memory change conflicts with instruction`, even though the intended operation was a normal code-patch workflow on a writable copy.
- Proposed MCP behavior: add an explicit instruction-overwrite mode to `patch_bytes_and_reanalyze(...)` for writable targets so the endpoint can clear the conflicting code units, apply the bytes, re-disassemble the region, and report the final instruction text in one machine-friendly result.
- Latest status (2026-04-10): the active `loosecannon` fix at `1148:3743` only landed after falling back to a write script that manually cleared the code unit and forced redisassembly.
### Apply-Class-Layout Schema Parity
- Missing capability: reliable minimal-payload use of `apply_class_layout(...)` during live class-lift batches.
- Current fallback: create the class namespace and provisional structs directly through live `run_write_script(...)`, then defer any later class-binding or per-method typing work.
- Why it matters: the first `EntityDispatchEntry` pilot batch reached the point where the class shell and provisional datatypes existed, but the endpoint still rejected the bind attempt with an undocumented required `methods` property even though the local workflow only needed `class_path`, `instance_struct`, and `vtable_struct`.
- Proposed MCP behavior: `apply_class_layout(...)` should accept a true minimal bind payload when no method list is being applied, or else the bridge/schema should expose the `methods` field explicitly and treat an omitted list as empty instead of a validation error.
- Latest status (2026-04-07): live `CRUSADER.EXE` class-lift pass for `Remorse::EntityDispatchEntry` succeeded via `run_write_script(...)`, creating the class shell plus `/Remorse/EntityDispatchEntryBase` and `/Remorse/EntityDispatchEntryVtable`. The only failed step in that batch was the direct `apply_class_layout(...)` call, which rejected the payload before any class work ran.
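
A minimal sketch of the proposed validation rule, where an omitted `methods` list is treated as empty. `normalize_class_layout_payload` is a hypothetical helper, not the bridge's actual validator. The three required field names come from the payload described above.

```python
REQUIRED_FIELDS = ("class_path", "instance_struct", "vtable_struct")


def normalize_class_layout_payload(payload: dict) -> dict:
    """Validate a minimal bind payload and default `methods` to an empty list."""
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    normalized = dict(payload)  # leave the caller's dict untouched
    methods = normalized.get("methods")
    normalized["methods"] = list(methods) if methods is not None else []
    return normalized
```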
### Class-Lift Typing Live Parity
- Missing capability: end-to-end live-session parity for storage-aware `this` typing on 16-bit NE methods whose current storage does not match the default pointer storage the binder would choose.
- Current fallback: use local PyGhidra with `DYNAMIC_STORAGE_ALL_PARAMS`, or move methods with `set_function_class(...)` and defer final `this` typing/manual prototype cleanup.
- Why it matters: `EntityVmContext` lifecycle methods and `EntityVmRuntime::Create` still need the live MCP path to behave like the verified local PyGhidra repair flow.
- Proposed MCP behavior: `set_function_this_type(...)` and `apply_class_layout(...)` should reliably fall back to dynamic storage in-session for these 16-bit cases, while preserving structured per-method warnings instead of aborting the batch.
- Latest status (2026-04-06): local PyGhidra confirmed that `1420:0eec`, `1420:10b6`, `1420:10da`, `1420:1162`, `1420:118f`, and `1420:1278` accept `EntityVmContext * this` cleanly via `DYNAMIC_STORAGE_ALL_PARAMS`. The live storage-aware path now also accepts explicit `/Remorse/EntityVmRuntime *32`, `/Remorse/EntityVmOwnerResource *32`, `/Remorse/EntityVmContext *32`, and `/Remorse/EntityVmSlotEntry *32` signatures in-session once the exact `*32` datatype has first been resolved into the program data-type manager. The remaining live gap is now mostly about deeper mixed-width parameter packs like `1420:0eec CreateFromSlotIndex`, not the previously blocked 4-byte object-pointer cases themselves.
### Storage-Aware Prototype Live Verification
- Missing capability: confirmed live-session parity for the newest storage-aware prototype fixes on 16-bit NE repair cases.
- Current fallback: if the active GUI session is on an older plugin build, reload the plugin; if parity still fails, use local PyGhidra or manually compensate when testing stack offsets / calling conventions.
- Why it matters: `1000:42e2` and `1420:1499` are the known proof cases for explicit return storage, stack-word parameter modeling, and 16-bit far calling conventions.
- Proposed MCP behavior: `set_function_prototype_storage(...)` should accept bare `stack:` offsets in the same hex-style form used in current workflow notes and should preserve exact calling-convention tokens such as `__cdecl16far` before falling back to lossy legacy normalization.
- Latest status (2026-04-06): the reloaded live plugin now reaches the real storage-aware implementation in-session on both proof cases, and explicit `AX:DX` return storage survives correctly on `1000:42e2` and `1420:1499`. The remaining live parity issue is now narrower: `calling_convention='__cdecl16far'` still normalizes those proof-case applies to plain `__cdecl`, but direct live `run_write_script(...)` calls can immediately restore `__cdecl16far`, which proves the live database accepts the exact convention token and leaves the endpoint-side normalization/deployment path as the remaining gap.
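
The exact-token-first rule can be sketched as below. `LEGACY_FALLBACKS` and `resolve_calling_convention` are hypothetical. The real normalization table is not recorded in these notes, so the single mapping shown is an assumption taken from the `__cdecl16far` case above.

```python
# Hypothetical legacy normalization table; only the case from these notes is listed.
LEGACY_FALLBACKS = {"__cdecl16far": "__cdecl"}


def resolve_calling_convention(requested: str, accepted: set) -> tuple:
    """Prefer the exact token; fall back to the legacy name only with a warning."""
    warnings = []
    if requested in accepted:
        return requested, warnings
    fallback = LEGACY_FALLBACKS.get(requested)
    if fallback is not None and fallback in accepted:
        warnings.append(f"{requested} not accepted; normalized to {fallback}")
        return fallback, warnings
    raise ValueError(f"no accepted calling convention for {requested}")
```

Returning the warning alongside the resolved name keeps any lossy normalization visible in the result instead of hiding it.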
### Live Metadata / Read-Target Verification
- Missing capability: fully verified live-session parity for selector-aware reads and metadata helpers in mixed-build or partially refreshed GUI sessions.
- Current fallback: bridge alias retries, explicit-target normalization, and manual project-note cross-checks when a live session still behaves like an older plugin build.
- Why it matters: Crusader work routinely needs side-by-side reads across `/CRUSADER.EXE`, `/es/CRUSADER.EXE`, `/Writable/...`, and other project entries without changing the active Ghidra tab.
- Proposed MCP behavior: `list_project_programs(...)`, `get_runtime_capabilities(...)`, `get_callers(...)`, and other selector-aware read helpers should bind reliably to the requested or active target and return structured unsupported-state output instead of raw context failures.
- Latest status (2026-04-06): the local fork already includes alias fallbacks and Windows path/folder normalization for explicit-target matching. Remaining work is live-session verification after plugin refresh, not additional local source coverage.
## Done / Implemented In Local Fork
### Transport And Runtime
- POST endpoints now accept both `application/json` and `application/x-www-form-urlencoded` request bodies. Unsupported POST payloads fail early with `unsupported-content-type` instead of degrading into missing-parameter errors.
- `get_runtime_capabilities()` reports readonly/write-script capability state and `run_readonly_script(...)` returns structured unsupported-state output when Python support is unavailable.
- `run_write_script(...)` and alias `run_transaction_script(...)` are implemented with dry-run support, explicit target selectors, a write-policy denylist, and machine-friendly transaction results.
- Bridge runtime helpers retry compatible aliases on `404` / `No context found for request` for mixed-build live sessions.
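
The alias retry described in the last bullet might be sketched as follows. `call_with_alias_fallback` and the `send` callable are hypothetical stand-ins for the bridge's transport layer, and the `(status, body)` return shape is an assumption.

```python
NO_CONTEXT = "No context found for request"


def call_with_alias_fallback(send, names, params):
    """Try each endpoint name in order; skip names an older plugin build lacks.

    `send(name, params)` is assumed to return `(status_code, body_text)`.
    """
    last = None
    for name in names:
        status, body = send(name, params)
        if status == 404 or NO_CONTEXT in body:
            last = (name, status, body)
            continue  # the route is missing on this build, so try the next alias
        return name, status, body
    raise LookupError(f"no compatible alias responded; last attempt: {last}")
```

For example, `names` could be `["run_write_script", "run_transaction_script"]`, so that a mixed-build session still resolves to whichever route the live plugin exposes.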
### Explicit Targeting And Project Access
- Explicit write targeting is implemented for edit flows such as `apply_program_edit_plan(...)` and `patch_bytes_and_reanalyze(...)` with deterministic save behavior.
- Selector-aware read/query endpoints now accept `project_dir`, `project_name`, `folder_path`, and `program_name` and reuse the same target-resolution layer as write flows.
- Target matching now normalizes Windows path casing and slash style and can infer missing project selectors from the active domain file when appropriate.
- `list_project_programs(...)` plus alias `project_programs` is implemented and returns machine-friendly folder/program inventory.
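
A rough sketch of the casing and slash normalization mentioned above. `normalize_target` and `same_target` are hypothetical names, and the sketch assumes case-insensitive comparison is acceptable for every path component, which may be broader than what the fork actually does.

```python
def normalize_target(path: str) -> str:
    """Unify slash style, drop empty components, and fold case for comparison."""
    unified = path.replace("\\", "/")
    parts = [part for part in unified.split("/") if part]
    return "/" + "/".join(parts).casefold()


def same_target(left: str, right: str) -> bool:
    """Compare two selectors after normalization."""
    return normalize_target(left) == normalize_target(right)
```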
### Analysis, Inspection, And Xrefs
- Function boundary repair helpers are implemented: `create_function_by_address(...)`, `delete_function_by_address(...)`, and `get_function_containing(...)`.
- Arbitrary memory/code inspection helpers are implemented: `read_region(...)`, `disassemble_region(...)`, `get_instruction_window(...)`, `search_instructions(...)`, and `get_data_uses(...)`.
- `search_bytes(...)` is implemented with `??` wildcards and machine-friendly hit output.
- Caller/xref recovery is improved via `get_callers(...)`, and `get_xrefs_to(...)` / `get_xrefs_from(...)` return typed reference kinds plus containing-function metadata.
- `get_symbol_at(address)` now uses direct routes when present and bridge-side legacy fallbacks when the live process is older.
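
For readers unfamiliar with `??` wildcard patterns, the matching idea can be illustrated with a small standalone sketch. `parse_pattern` and `find_pattern` are hypothetical helpers that work on an in-memory buffer. They are not the endpoint's implementation, and the space-separated two-digit hex token format is an assumption.

```python
from typing import List, Optional


def parse_pattern(text: str) -> List[Optional[int]]:
    """Turn '9a ?? 20' into [0x9A, None, 0x20]; None matches any byte."""
    pattern = []
    for token in text.split():
        if token == "??":
            pattern.append(None)
            continue
        value = int(token, 16)
        if not 0 <= value <= 0xFF:
            raise ValueError(f"token out of byte range: {token}")
        pattern.append(value)
    return pattern


def find_pattern(data: bytes, text: str) -> List[int]:
    """Return every offset in `data` where the wildcard pattern matches."""
    pattern = parse_pattern(text)
    if not pattern:
        return []
    hits = []
    for start in range(len(data) - len(pattern) + 1):
        if all(p is None or data[start + i] == p for i, p in enumerate(pattern)):
            hits.append(start)
    return hits
```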
### Batch Edits And Comparison Tools
- Batch helpers are implemented: `set_comments(...)`, `set_decompiler_comments(...)`, `rename_functions_by_address(...)`, and `apply_program_edit_plan(...)` with dry-run support.
- Reanalysis helpers are implemented: `reanalyze_region(...)`, `patch_bytes_and_reanalyze(...)`, and `analyze_function_boundaries(...)`.
- Cross-program comparison helpers are implemented: `compare_regions(...)`, `compare_strings(...)`, and `compare_functions(...)`.
- `port_symbols(...)` now ports verified names/comments between programs with provenance text and explicit source/target selectors.
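
As an illustration of what a region comparison can report, the sketch below groups differing bytes of two equal-length buffers into contiguous runs. `diff_runs` is a hypothetical helper, and its `(offset, left_bytes, right_bytes)` tuple format is an assumption rather than the output of `compare_regions(...)`.

```python
def diff_runs(left: bytes, right: bytes, base: int = 0) -> list:
    """Return (offset, left_bytes, right_bytes) for each contiguous differing run."""
    if len(left) != len(right):
        raise ValueError("regions must have equal length")
    runs = []
    start = None
    for index, (a, b) in enumerate(zip(left, right)):
        if a != b:
            if start is None:
                start = index  # a new differing run begins here
        elif start is not None:
            runs.append((base + start, left[start:index], right[start:index]))
            start = None
    if start is not None:  # close a run that extends to the end of the region
        runs.append((base + start, left[start:], right[start:]))
    return runs
```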
### Class / Namespace / OO Recovery
- Namespace and class authoring helpers are implemented: `create_namespace(...)`, `create_class(...)`, `list_namespace_members(...)`, `move_symbol_to_namespace(...)`, and `set_function_class(...)`.
- Vtable and struct helpers are implemented: `analyze_vtable(...)`, `create_or_update_struct(...)`, `create_or_update_vtable(...)`, and alias coverage such as `build_vtable` / `set_this_type`.
- `set_function_this_type(...)` supports storage-strategy hints and `apply_class_layout(...)` now soft-fails per-method typing with structured warnings instead of aborting the whole batch.
### Prototype And Storage Modeling
- The storage-aware prototype endpoint is implemented as `set_function_prototype_storage(...)` with alias `set_storage_aware_prototype(...)`.
- The endpoint accepts declarative `return_type`, `return_storage`, ordered parameter lines (`name|type|storage`), explicit target selectors, varargs, and machine-friendly warnings when hidden `__return_storage_ptr__` state is still present.
- Source-level fixes landed on 2026-04-06 for the two known live correctness bugs:
  - `stack:` storage is now parsed before generic Ghidra deserialization so workflow-style bare stack offsets are interpreted consistently.
  - exact calling-convention tokens are tried before legacy normalization so 16-bit far conventions such as `__cdecl16far` are not needlessly collapsed to plain `__cdecl` when the exact token is accepted.
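
The `name|type|storage` parameter-line format can be illustrated with a small parsing sketch. `parse_param_line` is a hypothetical helper. It assumes bare `stack:` offsets are hexadecimal with an optional `0x` prefix and that any other storage text is a colon-separated register list. Neither detail is confirmed by these notes.

```python
def parse_param_line(line: str) -> dict:
    """Split 'name|type|storage' and classify the storage before further handling."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected name|type|storage, got {line!r}")
    name, type_name, storage = parts
    if storage.lower().startswith("stack:"):
        # Assumption: bare offsets are hex, so 'stack:0c' and 'stack:0x0c' agree.
        offset = int(storage[len("stack:"):], 16)
        spec = {"kind": "stack", "offset": offset}
    else:
        spec = {"kind": "register", "registers": storage.split(":")}
    return {"name": name, "type": type_name, "storage": spec}
```

Handling the `stack:` prefix first mirrors the ordering in the fix above: workflow-style offsets are recognized before any generic storage parsing sees them.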
### Historical Notes
- If a future pass hits a new MCP gap, add it under `Remaining TODOs` and move it to `Done / Implemented In Local Fork` once the local source and bridge support are both in place.