PSX Decompilation

MaddoScientisto 2026-04-07 00:15:44 +02:00
commit bbd29b1f10
25 changed files with 1921 additions and 701 deletions

@@ -1,342 +1,83 @@
# Ghidra MCP Wishlist
This file records concrete gaps in the current Ghidra MCP workflow.
Update it whenever a task requires PyGhidra or another local-only fallback because MCP lacks the needed operation.
This file records concrete MCP gaps hit during Crusader workflow passes.
For each new entry, keep the format short:
- Missing capability
- Current fallback
- Why it matters in this repo
- Proposed MCP endpoint or behavior
Rules for keeping it useful:
- Put only unresolved work in `Remaining TODOs`.
- Move implemented or source-fixed items to `Done / Implemented`.
- Keep each remaining item short: missing capability, fallback, why it matters, proposed behavior, latest status.
## Current Wishlist
## Remaining TODOs
### POST Body Contract Gap Hit During Runtime Prototype Repair (2026-04-05)
### Class-Lift Typing Live Parity
- Missing capability: POST endpoints only accept form-urlencoded key/value parameters; direct JSON bodies fail as if required parameters were omitted.
- Current fallback: use bridge helpers or manual form-encoded POSTs when testing endpoints such as `set_function_prototype(...)` directly.
- Why it matters: MCP clients, ad hoc terminal tests, and future automation naturally try JSON first for structured payloads, especially on newer class-lift and prototype endpoints.
- Proposed MCP behavior: accept both `application/x-www-form-urlencoded` and `application/json` on POST endpoints, or return a structured unsupported-content-type error that explicitly says the route only accepts form parameters.
- Status update (2026-04-05): local plugin `parsePostParams(...)` still only splits `key=value&...` bodies and ignores JSON payloads entirely, which is why direct JSON POSTs looked like missing-parameter failures during the `EntityVmRuntime::Create` repair.
- Status update (2026-04-05, local fork): plugin `parsePostParams(...)` now accepts both form-urlencoded bodies and JSON object bodies across POST routes. Unsupported POST bodies now fail early with an explicit `unsupported-content-type` parser error instead of silently degrading into missing-parameter behavior.
- Missing capability: end-to-end live-session parity for storage-aware `this` typing on 16-bit NE methods whose current storage does not match the default pointer storage the binder would choose.
- Current fallback: use local PyGhidra with `DYNAMIC_STORAGE_ALL_PARAMS`, or move methods with `set_function_class(...)` and defer final `this` typing/manual prototype cleanup.
- Why it matters: `EntityVmContext` lifecycle methods and `EntityVmRuntime::Create` still need the live MCP path to behave like the verified local PyGhidra repair flow.
- Proposed MCP behavior: `set_function_this_type(...)` and `apply_class_layout(...)` should reliably fall back to dynamic storage in-session for these 16-bit cases, while preserving structured per-method warnings instead of aborting the batch.
- Latest status (2026-04-06): local PyGhidra confirmed that `1420:0eec`, `1420:10b6`, `1420:10da`, `1420:1162`, `1420:118f`, and `1420:1278` accept `EntityVmContext * this` cleanly via `DYNAMIC_STORAGE_ALL_PARAMS`. The live storage-aware path now also accepts explicit `/Remorse/EntityVmRuntime *32`, `/Remorse/EntityVmOwnerResource *32`, `/Remorse/EntityVmContext *32`, and `/Remorse/EntityVmSlotEntry *32` signatures in-session once the exact `*32` datatype has first been resolved into the program data-type manager. The remaining live gap is now mostly about deeper mixed-width parameter packs like `1420:0eec CreateFromSlotIndex`, not the previously blocked 4-byte object-pointer cases themselves.
### Live PyGhidra Write Gap Hit During Runtime Repair Pass (2026-04-05)
### Storage-Aware Prototype Live Verification
- Missing capability: constrained live PyGhidra write execution through MCP when Ghidra was started with Python enabled.
- Current fallback: keep read-only inspection in live MCP via `run_readonly_script(...)`, but close the GUI and drop back to local project-open PyGhidra for write-side repairs such as custom-storage prototype fixes and datatype edits.
- Why it matters: the runtime class-lift batch had to leave the live session and reopen the project locally just to repair one 16-bit function signature and one allocator-helper callee, even though the live Ghidra instance could already host Python scripts.
- Proposed MCP behavior: add a narrowly scoped live write-script or transaction endpoint family that runs against the active writable program with explicit safety limits, dry-run support where possible, and machine-friendly transaction results.
- Status update (2026-04-05): the local fork can already probe and run live read-only Python when Ghidra starts with PyGhidra enabled, so the remaining gap is write-side exposure and safety policy rather than Python availability itself.
- Status update (2026-04-05, local fork): local plugin and bridge now expose `run_write_script(script_path|script_text, dry_run?)` plus the alias route `run_transaction_script`. The implementation reuses explicit write-target selectors, validates inline or file-backed scripts against a write-policy denylist, wraps execution in a single MCP-managed transaction, reports machine-friendly status/output, and surfaces `write_script_*` capability fields from `get_runtime_capabilities()`. The remaining gap is finer-grained safety policy and live workflow verification, not basic write-side exposure.
- Status update (2026-04-06, VM class-lift pass): direct bridge `run_write_script(...)` still returned `404 No context found for request` against the active `CRUSADER.EXE` GUI session even with explicit target selectors, so the `EntityVmContext` datatype plus the slot-entry/runtime prototype batch still had to fall back to closed-project local PyGhidra. The remaining gap is now active-session context binding for the write-script route, not route availability alone.
- Status update (2026-04-06, local fork hardening): plugin explicit-target binding now normalizes Windows `project_dir` casing/separators, infers missing `project_dir` / `project_name` from the active program when possible, and fills the matching `folder_path` from the active domain file before trying to reopen a target. Bridge `run_write_script(...)` now retries the `run_transaction_script` alias on `404` or `No context found for request`, reducing mixed-build false negatives while live-session verification continues.
- Status update (2026-04-06, live context-typing retry): the trivial dry-run probe for `run_write_script(...)` still returned `404 No context found for request` against the active `CRUSADER.EXE` session both with implicit active-program targeting and with explicit `project_dir` / `project_name` / `folder_path` / `program_name` selectors. The route is still not usable as an in-session fallback for the `EntityVmContext` typing pass.
- Missing capability: confirmed live-session parity for the newest storage-aware prototype fixes on 16-bit NE repair cases.
- Current fallback: if the active GUI session is on an older plugin build, reload the plugin; if parity still fails, use local PyGhidra or manually compensate when testing stack offsets / calling conventions.
- Why it matters: `1000:42e2` and `1420:1499` are the known proof cases for explicit return storage, stack-word parameter modeling, and 16-bit far calling conventions.
- Proposed MCP behavior: `set_function_prototype_storage(...)` should accept bare `stack:` offsets in the same hex-style form used in current workflow notes and should preserve exact calling-convention tokens such as `__cdecl16far` before falling back to lossy legacy normalization.
- Latest status (2026-04-06): the reloaded live plugin now reaches the real storage-aware implementation in-session on both proof cases, and explicit `AX:DX` return storage survives correctly on `1000:42e2` and `1420:1499`. The remaining live parity issue is now narrower: `calling_convention='__cdecl16far'` still normalizes those proof-case applies to plain `__cdecl`, but direct live `run_write_script(...)` calls can immediately restore `__cdecl16far`, which proves the live database accepts the exact convention token and leaves the endpoint-side normalization/deployment path as the remaining gap.
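The "exact token first, lossy normalization second" behavior requested above can be pictured with a minimal Python sketch. It is not the plugin's Java code: `resolve_calling_convention`, `LEGACY_FALLBACKS`, and the `is_accepted` callback are hypothetical names, and the only mapping taken from these notes is `__cdecl16far` to `__cdecl`.

```python
from typing import Callable, Optional, Tuple

# Hypothetical fallback table. The only pair taken from the notes is
# __cdecl16far -> __cdecl; a real table would likely hold more entries.
LEGACY_FALLBACKS = {"__cdecl16far": "__cdecl"}


def resolve_calling_convention(
    requested: str, is_accepted: Callable[[str], bool]
) -> Tuple[str, Optional[str]]:
    """Return (token_to_apply, warning_or_None).

    `is_accepted` stands in for whatever check the live program offers
    for "does this database know this calling-convention name".
    """
    # 1. Prefer the exact token so 16-bit far conventions survive.
    if is_accepted(requested):
        return requested, None
    # 2. Only then fall back to the lossy legacy name, and say so.
    fallback = LEGACY_FALLBACKS.get(requested)
    if fallback is not None and is_accepted(fallback):
        return fallback, f"calling convention {requested!r} normalized to {fallback!r}"
    raise ValueError(f"unsupported calling convention: {requested!r}")
```

Returning the warning alongside the token keeps the lossy case visible to the caller instead of silently rewriting the convention.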
### Class-Lift Typing Gap Hit During VM Runtime Pass (2026-04-05)
### Live Metadata / Read-Target Verification
- Missing capability: a storage-aware class-layout or `this`-typing path for 16-bit NE methods whose current function storage does not match the default pointer storage the binder tries to apply.
- Current fallback: create/update the class namespace and datatype, then move methods individually with `set_function_class(...)` and leave `this` typing/manual prototype cleanup for later.
- Why it matters: the current Remorse class-lift workflow can land ownership cleanly for `EntityVmRuntime`, but `apply_class_layout(...)` failed on the runtime lifecycle cluster with `Failed to apply this type: Storage size does not match data type size: 2` even though the same binder succeeded for `EntityVmOwnerResource`.
- Proposed MCP behavior: let `apply_class_layout(...)` either skip/soft-fail `this` typing per method with structured results, or accept an explicit storage/calling-convention override for `this` so 16-bit segmented/custom-storage methods can still be class-bound and partially typed in one pass.
- Status update (2026-04-05, later MCP-upgrade pass): the upgraded tool surface now allows direct `set_function_class(...)` moves for additional `EntityVmRuntime` helpers and `set_function_this_type(...)` succeeded on `1420:1601 Destroy` when forced to `this_storage=farptr`, but `1420:1499 Create`, `1420:1536 InitSlots`, and `1420:1575 ReleaseSlots` still fail with the same storage-size mismatch, so the gap is narrower but not resolved.
- Status update (2026-04-05, local fork): `set_function_this_type(...)` now treats `this_storage` as a real storage strategy hint instead of always reusing the old first-parameter storage. For existing parameters it tries preserved custom storage first only when the caller asked to preserve/current storage, then falls back to `DYNAMIC_STORAGE_ALL_PARAMS` when the preserved storage is incompatible with the requested `this` type. `apply_class_layout(...)` now records per-method typing failures as structured warnings instead of aborting the entire batch, and bridge method payloads can carry per-method `this_storage` and `calling_convention` overrides.
- Status update (2026-04-06, VM class-lift pass): after landing `/Remorse/EntityVmContext` and the first slot-entry prototype batch, local PyGhidra could collapse `1420:1536 InitSlots` and `1420:1575 ReleaseSlots` to direct `EntityVmRuntime * this`, but `1420:1499 Create` still reintroduced hidden `__return_storage_ptr__` corruption whenever the split-word far runtime pointer was collapsed to a typed `this`. The open gap is now mostly `Create` plus any future 16-bit constructors/factories with the same far-pointer/custom-storage shape.
- Status update (2026-04-06, live context-typing retry): the old `apply_class_layout(...)` dry-run null failure for `/Remorse/EntityVmContext` no longer reproduces, but the real live write path still behaves like the older storage-preserving build. Actual `apply_class_layout(...)` and direct `set_function_this_type(...)` calls on `1420:10b6`, `1420:10da`, `1420:1162`, `1420:118f`, and `1420:1278` all still fail with `Storage size does not match data type size: 2`, so the open gap is now specifically live deployment parity for the dynamic-storage fallback rather than dry-run binder coverage.
- Status update (2026-04-06, local PyGhidra confirmation): after closing the GUI and running the local `tools.pyghidra_crusader` script path, the same context lifecycle entries (`1420:0eec`, `1420:10b6`, `1420:10da`, `1420:1162`, `1420:118f`, `1420:1278`) all accepted `EntityVmContext * this` cleanly via `DYNAMIC_STORAGE_ALL_PARAMS`. That confirms the typing model is valid and the remaining gap is live-session deployment parity, not the class layout itself.
- Missing capability: fully verified live-session parity for selector-aware reads and metadata helpers in mixed-build or partially refreshed GUI sessions.
- Current fallback: bridge alias retries, explicit-target normalization, and manual project-note cross-checks when a live session still behaves like an older plugin build.
- Why it matters: Crusader work routinely needs side-by-side reads across `/CRUSADER.EXE`, `/es/CRUSADER.EXE`, `/Writable/...`, and other project entries without changing the active Ghidra tab.
- Proposed MCP behavior: `list_project_programs(...)`, `get_runtime_capabilities(...)`, `get_callers(...)`, and other selector-aware read helpers should bind reliably to the requested or active target and return structured unsupported-state output instead of raw context failures.
- Latest status (2026-04-06): the local fork already includes alias fallbacks and Windows path/folder normalization for explicit-target matching. Remaining work is live-session verification after plugin refresh, not additional local source coverage.
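As a rough illustration of the explicit-target normalization described above, the following Python sketch compares selector sets while ignoring Windows path casing and slash style. The function names and the dict-based selector shape are assumptions made for illustration; the fork's actual matching lives in the Java plugin and may differ in detail.

```python
import ntpath

SELECTOR_KEYS = ("project_dir", "project_name", "folder_path", "program_name")


def normalize_project_dir(path: str) -> str:
    """Fold Windows casing, slash style, and trailing separators."""
    return ntpath.normcase(ntpath.normpath(path))


def fill_missing_selectors(requested: dict, active: dict) -> dict:
    """Infer absent selectors from the active program (simplified: any
    missing or empty key falls back to the active value)."""
    merged = {key: active.get(key) for key in SELECTOR_KEYS}
    merged.update({k: v for k, v in requested.items() if v})
    return merged


def same_target(requested: dict, active: dict) -> bool:
    """True when the already-open program satisfies the request."""
    wanted = fill_missing_selectors(requested, active)
    for key in SELECTOR_KEYS:
        left, right = wanted.get(key), active.get(key)
        if key == "project_dir" and left and right:
            left, right = normalize_project_dir(left), normalize_project_dir(right)
        if left != right:
            return False
    return True
```

With this shape, a request for `k:/GHIDRA/crusader_decomp/` matches an active project opened from `K:\ghidra\Crusader_Decomp`, while a different `folder_path` such as `/es` still counts as a different target.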
### 16-bit Prototype And Hidden Return-Storage Gap Hit During Runtime Repair (2026-04-05)
## Done / Implemented In Local Fork
- Missing capability: a semantics-preserving prototype/storage endpoint for 16-bit NE functions that can set explicit parameter storage, explicit return storage, and avoid parser-induced hidden `__return_storage_ptr__` rewrites.
- Current fallback: inspect the broken caller plus its direct callees, then use local PyGhidra to normalize callee prototypes and apply custom storage manually.
- Why it matters: `1420:1499 Remorse::EntityVmRuntime::Create` kept throwing `Low-level Error: Symbol $$undef00000006 extends beyond the end of the address space` until the shared allocator helper at `1000:42e2` was repaired from a pointer-return signature that decompiled with a hidden return-storage parameter.
- Proposed MCP behavior: expose a storage-aware prototype/update endpoint that accepts explicit parameter and return storage, plus optionally a decompiler-health check or warning when a candidate prototype would inject hidden return storage into a 16-bit caller chain.
- Status update (2026-04-05): parser-string prototype updates alone were not sufficient here; the stable repair required explicit `AX:DX` return storage on `1000:42e2` and split-stack-word modeling for the runtime far pointer on `1420:1499`.
- Status update (2026-04-05, later MCP-upgrade pass): the new live `run_write_script(...)` path gives MCP a constrained way to perform these repairs inside the active writable session, but there is still no first-class storage-aware prototype endpoint that models explicit return/parameter storage declaratively. This wishlist item remains open.
- Status update (2026-04-06, local fork): local plugin and bridge now expose `set_function_prototype_storage(...)` plus the alias `set_storage_aware_prototype(...)`. The endpoint accepts declarative `return_type`, `return_storage`, and ordered `parameters` lines (`name|type|storage`), supports explicit target selectors, applies custom return/parameter storage in one transaction, and reports a warning when the resulting signature still contains hidden `__return_storage_ptr__` state.
- Status update (2026-04-06, live in-session verification): the checked-in Java source now wires both `/set_function_prototype_storage` and `/set_storage_aware_prototype` to the storage-aware implementation, but the active GUI session still does not match that build. Direct live POSTs to `/set_function_prototype_storage` returned HTTP 200 with the old legacy body `failed: set_function_prototype ... Function prototype is required`, while the alias route `/set_storage_aware_prototype` still returned `404 No context found for request`. So the live session still cannot exercise the new explicit-storage modeling in-session, and this remains a deployment/runtime parity gap rather than a source-level endpoint absence.
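The declarative `name|type|storage` parameter lines mentioned above could be parsed along these lines. This is a Python sketch under assumptions: `ParamSpec`, `parse_parameter_lines`, and `has_hidden_return_storage` are hypothetical helpers, the requirement of exactly three fields is a guess about validation, and the storage strings in the usage note are placeholders rather than confirmed endpoint syntax.

```python
from typing import List, NamedTuple


class ParamSpec(NamedTuple):
    name: str
    type: str
    storage: str


def parse_parameter_lines(text: str) -> List[ParamSpec]:
    """Parse ordered `name|type|storage` lines, one parameter per line."""
    specs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between parameters
        parts = [part.strip() for part in line.split("|")]
        # Assumption: every line carries exactly three non-empty fields.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"line {lineno}: expected name|type|storage, got {raw!r}")
        specs.append(ParamSpec(*parts))
    return specs


def has_hidden_return_storage(signature: str) -> bool:
    """Flag signatures that still carry the hidden return-storage parameter."""
    return "__return_storage_ptr__" in signature
```

For example, `parse_parameter_lines("count|int|stack:4")` yields a single `ParamSpec` with `storage == "stack:4"`; the exact storage grammar the endpoint accepts is not reproduced here.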
### Transport And Runtime
### Live MCP Issues Hit During Spanish Cheat Pass (2026-03-26)
- POST endpoints now accept both `application/json` and `application/x-www-form-urlencoded` request bodies. Unsupported POST payloads fail early with `unsupported-content-type` instead of degrading into missing-parameter errors.
- `get_runtime_capabilities()` reports readonly/write-script capability state and `run_readonly_script(...)` returns structured unsupported-state output when Python support is unavailable.
- `run_write_script(...)` and alias `run_transaction_script(...)` are implemented with dry-run support, explicit target selectors, a write-policy denylist, and machine-friendly transaction results.
- Bridge runtime helpers retry compatible aliases on `404` / `No context found for request` for mixed-build live sessions.
- Missing capability: working `search_bytes(...)` requests against the currently opened program.
- Current fallback: `read_region(...)`, `get_data_uses(...)`, `search_instructions(...)`, and manual/xref-driven narrowing inside `/es/CRUSADER.EXE`.
- Why it matters: the Spanish-cheat question specifically needed a direct full-memory search for the English `jassica16` scan-code table and any plausible replacement sequence.
- Proposed MCP behavior: `search_bytes(...)` should honor the active program context by default and return a machine-friendly empty-hit result when no matches exist, not `HTTP 404 No context found for request`.
### Explicit Targeting And Project Access
- Missing capability: reliable explicit target selection on read/query endpoints in the live server session.
- Current fallback: repo notes plus manual project `.prp` metadata inspection after `read_region(...)` and `get_function_by_address(...)` ignored explicit root-vs-`/es` selectors and still resolved against the active Spanish program.
- Why it matters: this repo routinely needs side-by-side comparisons between `/CRUSADER.EXE`, `/es/CRUSADER.EXE`, `/Writable/...`, and other project entries without changing the active Ghidra tab.
- Proposed MCP behavior: all selector-aware read endpoints should actually bind to the requested `project_dir` / `project_name` / `folder_path` / `program_name`, or return a structured target-resolution failure instead of silently reading the active program.
- Explicit write targeting is implemented for edit flows such as `apply_program_edit_plan(...)` and `patch_bytes_and_reanalyze(...)` with deterministic save behavior.
- Selector-aware read/query endpoints now accept `project_dir`, `project_name`, `folder_path`, and `program_name` and reuse the same target-resolution layer as write flows.
- Target matching now normalizes Windows path casing and slash style and can infer missing project selectors from the active domain file when appropriate.
- `list_project_programs(...)` plus alias `project_programs` is implemented and returns machine-friendly folder/program inventory.
- Missing capability: consistent context handling for project/runtime metadata helpers in the live server session.
- Current fallback: direct `get_project_access_info()` plus workspace-side `.prp` reads after `list_project_programs(...)`, `get_callers(...)`, `compare_functions(...)`, and `get_runtime_capabilities()` returned `404 No context found for request` during an otherwise healthy active-program session.
- Why it matters: these are the exact helper endpoints needed to validate which program is active, enumerate comparison targets, and reason about whether a failure is a real analysis result or an MCP/session problem.
- Proposed MCP behavior: metadata helpers should either work whenever an active program exists or return structured unsupported-state details, not raw 404 context failures.
- Status update (2026-03-26, later Spanish pass): the refreshed live server still returned `404 No context found for request` for `get_runtime_capabilities(...)` and `get_callers(...)` during an active `/es/CRUSADER.EXE` session, so this is still a live deployment or routing problem, not just an earlier-session artifact.
- Status update (2026-04-05, class-lift pass): after reloading the updated plugin, `get_project_access_info(...)` and the new class-lift write routes were reachable in the active `CRUSADER.EXE` session, but `list_project_programs(...)` still returned `404 No context found for request`, so the metadata-helper context issue is not fully resolved.
- Status update (2026-04-05, local bridge hardening): bridge `list_project_programs(...)` now retries the legacy `/project_programs` alias whenever the live server answers with `404` or `No context found for request`, which should smooth mixed-build sessions while the remaining live metadata routing issue is verified after redeploy.
- Status update (2026-04-06, local fork hardening): bridge `get_runtime_capabilities(...)` now retries the `/runtime_capabilities` alias on `404` or `No context found for request`, and plugin explicit-target matching no longer depends on exact Windows path casing or slash style when deciding whether an already-open program satisfies the request. This should reduce false context failures in mixed-build live sessions, though full deployment verification is still pending.
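The alias-retry behavior described in these bridge-hardening notes amounts to a small wrapper. The Python sketch below assumes a hypothetical `request` callable that returns `(status_code, body_text)`; it does not reproduce the real helper in `bridge_mcp_ghidra.py`.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]
NO_CONTEXT = "No context found for request"


def call_with_alias_retry(
    request: Callable[[str, Dict[str, str]], Response],
    route: str,
    alias: str,
    params: Dict[str, str],
) -> Response:
    """Call `route`; retry once on `alias` when the live server looks stale.

    A stale or mixed-build server is recognised by either an HTTP 404 or
    the literal "No context found for request" text in the body.
    """
    status, body = request(route, params)
    if status == 404 or NO_CONTEXT in body:
        return request(alias, params)
    return status, body
```

A call such as `call_with_alias_retry(request, "/list_project_programs", "/project_programs", {})` mirrors the route and alias pair named above. If the alias also fails, the second response is returned unchanged so the caller still sees the raw failure.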
### Analysis, Inspection, And Xrefs
### Open Gaps Found During Hidden Usecode Debugger Patch Batch (2026-03-24)
- Function boundary repair helpers are implemented: `create_function_by_address(...)`, `delete_function_by_address(...)`, and `get_function_containing(...)`.
- Arbitrary memory/code inspection helpers are implemented: `read_region(...)`, `disassemble_region(...)`, `get_instruction_window(...)`, `search_instructions(...)`, and `get_data_uses(...)`.
- `search_bytes(...)` is implemented with `??` wildcards and machine-friendly hit output.
- Caller/xref recovery is improved via `get_callers(...)`, and `get_xrefs_to(...)` / `get_xrefs_from(...)` return typed reference kinds plus containing-function metadata.
- `get_symbol_at(address)` now uses direct routes when present and bridge-side legacy fallbacks when the live process is older.
- Missing capability: write-capable project/program selection for MCP edit operations.
- Current fallback: local PyGhidra `run-script` plus `read-region` against `--project-dir K:\ghidra\Crusader_Decomp --project-name Crusader --folder-path /Writable --program-name CRUSADER-PATCHED.EXE`.
- Why it matters: retail NE patch work in this repo must sometimes modify and save `/Writable/CRUSADER-PATCHED.EXE` with the GUI closed, while current MCP write flows depend on the active Ghidra session/program context.
- Proposed MCP addition: add bridge-exposed target selectors (`project_dir`, `project_name`, `folder_path`, `program_name`) for write endpoints, backed by plugin support to open the requested project file, apply `patch_bytes_and_reanalyze` or edit-plan writes, and save deterministically.
- Status update (2026-03-24): local fork now accepts optional `project_dir`, `project_name`, `folder_path`, and `program_name` selectors on `apply_program_edit_plan` and `patch_bytes_and_reanalyze`; explicit targets are opened through `GhidraProject`, written, saved deterministically, and then released.
- Status update (2026-03-24, follow-up): explicit target resolution now reuses an already-open matching program when possible and otherwise opens a writable domain object directly; MCP no longer opens explicit targets in read-only mode for edit operations.
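The `??` wildcard form of `search_bytes(...)` listed in this section can be illustrated with a small in-memory matcher. This is a Python sketch only: the space-separated hex-pair pattern syntax and the `limit` parameter are assumptions, and the real endpoint searches Ghidra program memory rather than a `bytes` object.

```python
from typing import List, Optional


def parse_pattern(pattern: str) -> List[Optional[int]]:
    """Turn "9a ?? ?? 00 10" into [0x9a, None, None, 0x00, 0x10]."""
    tokens = pattern.split()
    if not tokens:
        raise ValueError("empty pattern")
    parsed: List[Optional[int]] = []
    for token in tokens:
        if token == "??":
            parsed.append(None)  # wildcard: matches any byte value
        elif len(token) == 2:
            parsed.append(int(token, 16))  # raises ValueError on non-hex
        else:
            raise ValueError(f"bad pattern token: {token!r}")
    return parsed


def search_bytes(data: bytes, pattern: str, limit: int = 100) -> List[int]:
    """Return offsets of matches; an empty list means "no hits"."""
    wanted = parse_pattern(pattern)
    hits: List[int] = []
    for start in range(len(data) - len(wanted) + 1):
        window = data[start:start + len(wanted)]
        if all(w is None or w == b for w, b in zip(wanted, window)):
            hits.append(start)
            if len(hits) >= limit:
                break
    return hits
```

Returning an empty list for "no matches" keeps the no-hit case distinguishable from a transport or context failure, which is the behavior the earlier wishlist entry asked for.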
### Batch Edits And Comparison Tools
### Open Gaps Found During Current 0x4588 Pass (2026-03-21)
- Batch helpers are implemented: `set_comments(...)`, `set_decompiler_comments(...)`, `rename_functions_by_address(...)`, and `apply_program_edit_plan(...)` with dry-run support.
- Reanalysis helpers are implemented: `reanalyze_region(...)`, `patch_bytes_and_reanalyze(...)`, and `analyze_function_boundaries(...)`.
- Cross-program comparison helpers are implemented: `compare_regions(...)`, `compare_strings(...)`, and `compare_functions(...)`.
- `port_symbols(...)` now ports verified names/comments between programs with provenance text and explicit source/target selectors.
- Missing capability: usable read-only scripting in the live MCP/Ghidra session.
- Current fallback: terminal-side Python and manual MCP inspection windows after `run_readonly_script` returned `Ghidra was not started with PyGhidra. Python is not available`.
- Why it matters: one-off structure probes and byte-pattern scans are still common during EUSECODE and overlap work, and they are much cleaner as constrained in-process reads than as external heuristics.
- Proposed MCP addition: expose runtime capability state for `run_readonly_script` and either guarantee a working in-process script engine or return a machine-friendly unsupported-state response early.
- Status update (2026-03-24): local fork now exposes `get_runtime_capabilities()` with readonly-script probe state and `run_readonly_script()` returns structured `status`/`reason`/`detail` output early when Python support is unavailable in the live session.
- Status update (2026-03-24, follow-up): `open_current_program_readonly()` is now intentionally disabled and returns an unsupported-state response so MCP does not create accidental read-only program instances in normal workflow.
### Class / Namespace / OO Recovery
- Status update (2026-03-21): the current live plugin process still returns HTTP 404 for direct symbol routes (`/get_symbol_at`, `/symbol_at`) in this chat session, but bridge `get_symbol_at(address)` now avoids raw 404s by falling back to compatible legacy endpoints and returning deterministic symbol-state output (for example `0x844` -> `symbol=<none>`).
- Remaining gap: reload/redeploy the updated plugin build so direct symbol routes are present in the live process; bridge fallback now covers older live builds in the meantime.
- Implemented now:
- `get_xrefs_to(address)` / `get_xrefs_from(address)` with typed ref kinds (`call`, `read`, `write`, `jump`, `other`) plus containing-function metadata.
- `set_function_prototype` now tolerantly retries legacy calling-convention tokens (for example `__cdecl16far`) and returns an accepted template example on parse/apply failure.
- `rename_data(address, new_name)` now renames or creates the primary symbol at any valid address and returns the resolved symbol metadata instead of `Rename data attempted`.
- `get_symbol_at(address)` returns the primary symbol state at an address so label changes can be verified directly without depending on decompiler refresh timing.
- `get_symbol_at(address)` now resolves the active program on the Swing thread, falls back to the visible/open program when the current-program pointer is transiently unavailable, and the bridge retries the compatible `/symbol_at` alias if a stale server route returns `404 No context found for request`.
- bridge `get_symbol_at(address)` now probes additional legacy aliases (`getSymbolAt`, `symbolAt`, `get_symbol`) and, if symbol routes are absent, derives symbol state from legacy endpoints (`get_function_by_address`, paged `data`) so callers receive machine-friendly output instead of a raw 404.
- Local bridge audit (2026-03-21): `get_xrefs_to` / `get_xrefs_from` wrappers are already present in `K:\mcp\GhidraMCP\bridge_mcp_ghidra.py`; if a client still does not surface them, that is a client/tool-refresh issue rather than a missing local-fork endpoint.
- Namespace and class authoring helpers are implemented: `create_namespace(...)`, `create_class(...)`, `list_namespace_members(...)`, `move_symbol_to_namespace(...)`, and `set_function_class(...)`.
- Vtable and struct helpers are implemented: `analyze_vtable(...)`, `create_or_update_struct(...)`, `create_or_update_vtable(...)`, and alias coverage such as `build_vtable` / `set_this_type`.
- `set_function_this_type(...)` supports storage-strategy hints and `apply_class_layout(...)` now soft-fails per-method typing with structured warnings instead of aborting the whole batch.
## Implemented In Local GhidraMCP Fork (2026-03-21)
### Prototype And Storage Modeling
Added endpoints in `K:\mcp\GhidraMCP\src\main\java\com\lauriewired\GhidraMCPPlugin.java` and tools in `K:\mcp\GhidraMCP\bridge_mcp_ghidra.py`:
- The storage-aware prototype endpoint is implemented as `set_function_prototype_storage(...)` with alias `set_storage_aware_prototype(...)`.
- The endpoint accepts declarative `return_type`, `return_storage`, ordered parameter lines (`name|type|storage`), explicit target selectors, varargs, and machine-friendly warnings when hidden `__return_storage_ptr__` state is still present.
- Source-level fixes landed on 2026-04-06 for the two known live correctness bugs:
- `stack:` storage is now parsed before generic Ghidra deserialization so workflow-style bare stack offsets are interpreted consistently.
- exact calling-convention tokens are tried before legacy normalization so 16-bit far conventions such as `__cdecl16far` are not needlessly collapsed to plain `__cdecl` when the exact token is accepted.
- Function boundary repair:
- `create_function_by_address(entry, name, body_start, body_end, comment?)`
- `delete_function_by_address(entry)`
- `get_function_containing(address)`
- Arbitrary code and memory inspection:
- `read_region(start, end)`
- `disassemble_region(start, end)`
- `get_instruction_window(address, before_count, after_count)`
- `search_instructions(query, mode=text|operand|address, limit?)`
- `get_data_uses(address, include_operand_scans=true, limit?)`
- Batch and transactional edits:
- `set_comments(batch)`
- `set_decompiler_comments(batch)`
- `rename_functions_by_address(batch)`
- `apply_program_edit_plan(plan, dry_run=false)`
- Reanalysis and repair helpers:
- `reanalyze_region(start, end)`
- `patch_bytes_and_reanalyze(start, bytes, comment?)`
- `analyze_function_boundaries(start, end)`
- Read-only project access and scripting:
- `get_project_access_info()`
- `get_runtime_capabilities()`
- `open_current_program_readonly(version=-1, make_current=true)`
- `run_readonly_script(script_path|script_text)` with a constrained token denylist policy
### Historical Notes
- Explicit write targeting:
- optional `project_dir`, `project_name`, `folder_path`, `program_name` selectors on `apply_program_edit_plan(...)`
- optional `project_dir`, `project_name`, `folder_path`, `program_name` selectors on `patch_bytes_and_reanalyze(...)`
Batch encoding used by the current bridge:
- `set_comments` and `set_decompiler_comments`: list of `(address, comment)` pairs.
- `rename_functions_by_address`: list of `(address, new_name)` pairs.
- `apply_program_edit_plan`: one action per line with `|` separators, for example:
- `create_function_by_address|000c:1234|name|000c:1234|000c:1260|note`
- `delete_function_by_address|000c:1234`
- `rename_function_by_address|000c:1234|new_name`
- `set_disassembly_comment|000c:1234|comment text`
- `set_decompiler_comment|000c:1234|comment text`
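The line-per-action plan encoding above can be decoded with a short parser. The Python sketch below is an assumption-laden illustration: `parse_edit_plan` and the `ACTION_ARGS` table are hypothetical, argument counts are inferred from the example lines, and letting the final field keep any extra `|` characters is a design choice of this sketch, not documented bridge behavior.

```python
from typing import Dict, List, Tuple

# (min_args, max_args) per action, inferred from the example lines above.
# create_function_by_address takes an optional trailing comment.
ACTION_ARGS: Dict[str, Tuple[int, int]] = {
    "create_function_by_address": (4, 5),
    "delete_function_by_address": (1, 1),
    "rename_function_by_address": (2, 2),
    "set_disassembly_comment": (2, 2),
    "set_decompiler_comment": (2, 2),
}


def parse_edit_plan(plan: str) -> List[Tuple[str, List[str]]]:
    """Parse one `action|arg|...` entry per line into (action, args)."""
    parsed = []
    for lineno, raw in enumerate(plan.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        action = line.split("|", 1)[0]
        if action not in ACTION_ARGS:
            raise ValueError(f"line {lineno}: unknown action {action!r}")
        min_args, max_args = ACTION_ARGS[action]
        # Cap the split so the last field may itself contain "|".
        args = line.split("|", max_args)[1:]
        if len(args) < min_args:
            raise ValueError(f"line {lineno}: {action} needs {min_args} args")
        parsed.append((action, args))
    return parsed
```

Validating every line before sending the plan lets a malformed entry fail on the client side, ahead of any dry run or transaction.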
Notes on read-only coverage:
- `open_current_program_readonly` opens a read-only program object for the currently loaded domain file.
- Project-switch/open-by-path is still not implemented; MCP still operates on the active Ghidra GUI project context.
### Function boundary repair
- Missing capability: create a function at an explicit entry with an explicit body start/end.
- Current fallback: local PyGhidra `create-function` and JSON repair plans.
- Why it matters: boundary repair is a recurring part of this project, especially for overlapped or truncated raw functions.
- Proposed MCP addition: `create_function_by_address(entry, name, body_start, body_end, comment?)`.
- Missing capability: delete an incorrect auto-created function.
- Current fallback: local PyGhidra `delete-function`.
- Why it matters: bad auto-analysis often blocks decompilation of adjacent real functions.
- Proposed MCP addition: `delete_function_by_address(entry)`.
- Missing capability: get the function containing an arbitrary address.
- Current fallback: local PyGhidra `get-function-containing`.
- Why it matters: no-function windows and overlap investigations depend on quickly mapping instruction hits back to owning functions.
- Proposed MCP addition: `get_function_containing(address)`.
### Arbitrary code and memory inspection
- Missing capability: read raw bytes from an arbitrary address range in program memory.
- Current fallback: local PyGhidra `read-region`.
- Why it matters: some important sites are real code bytes that are not yet part of any function object.
- Proposed MCP addition: `read_region(start, end)` returning bytes and a compact hex view.
- Missing capability: dump nearby instructions around an arbitrary address even when no function exists there.
- Current fallback: custom read-only PyGhidra scripts such as `pyghidra_plans/dump_instruction_windows.py`.
- Why it matters: the `0x4588` investigation depended on inspecting instruction windows in no-function regions.
- Proposed MCP addition: `disassemble_region(start, end)` or `get_instruction_window(address, before_count, after_count)`.
- Missing capability: scan all instructions for a literal operand or address token.
- Current fallback: custom PyGhidra scripts such as `scan_4588_instruction_uses.py`.
- Why it matters: normal xref APIs can miss useful operand-text hits in partially analyzed regions.
- Proposed MCP addition: `search_instructions(query, mode=text|operand|address, limit?)`.
- Missing capability: robust data-address xrefs that include operand-based uses even when the reference manager has none.
- Current fallback: instruction-text scans and manual disassembly windows.
- Why it matters: globals like `0x4588` can be heavily used before formal references exist in the database.
- Proposed MCP addition: `get_data_uses(address, include_operand_scans=true)`.
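
The operand-text fallback described above amounts to a token scan over disassembly text. This is a rough sketch under stated assumptions: the listing is already available as `(address, instruction_text)` pairs, `scan_operand_text` is a hypothetical name, and the sample addresses and instructions are invented for illustration.

```python
import re
from typing import Iterable, List, Tuple


def scan_operand_text(
    listing: Iterable[Tuple[str, str]], token: str, limit: int = 100
) -> List[Tuple[str, str]]:
    """Return (address, text) pairs whose instruction text contains the token."""
    # Require non-alphanumeric boundaries so "0x4588" does not also match "0x45880".
    pattern = re.compile(
        r"(?<![0-9A-Za-z_])" + re.escape(token) + r"(?![0-9A-Za-z_])",
        re.IGNORECASE,
    )
    hits = []
    for address, text in listing:
        if pattern.search(text):
            hits.append((address, text))
            if len(hits) >= limit:
                break
    return hits


sample_listing = [
    ("000c:1000", "MOV AX,[0x4588]"),
    ("000c:1003", "MOV BX,0x45880"),
    ("000c:1006", "CMP word ptr [0x4588],0x0"),
]
print(scan_operand_text(sample_listing, "0x4588"))
```

A text scan like this finds uses that have no formal reference yet, at the cost of false positives when the same literal is an unrelated immediate.
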
### Batch and transactional edits
- Missing capability: apply a small transactional edit plan containing function removals, function creations, renames, and comments.
- Current fallback: local PyGhidra `apply-plan` with JSON.
- Why it matters: boundary repair work is safer when a verified batch can be replayed atomically.
- Proposed MCP addition: `apply_program_edit_plan(plan)` with dry-run support.
- Missing capability: batch comment creation for a verified address set.
- Current fallback: repeated single-address comment calls or PyGhidra plan files.
- Why it matters: reverse-engineering batches often produce several related evidence comments at once.
- Proposed MCP addition: `set_comments(batch)` and `set_decompiler_comments(batch)`.
- Missing capability: batch rename-by-address for a small verified set.
- Current fallback: repeated `rename_function_by_address` calls or local plan files.
- Why it matters: verified raw-import ports often land in short, evidence-backed batches.
- Proposed MCP addition: `rename_functions_by_address(batch)`.
### Reanalysis and repair helpers
- Missing capability: re-disassemble or reanalyze a small address range after patching bytes or changing function boundaries.
- Current fallback: local scripted repair passes.
- Why it matters: the far-call fixup workflow and boundary recovery both depend on deterministic reanalysis of touched ranges.
- Proposed MCP addition: `reanalyze_region(start, end, options?)`.
- Missing capability: patch a small byte range and immediately re-disassemble affected instructions.
- Current fallback: local PyGhidra repair scripts.
- Why it matters: the NE far-call fixup pass was a major workflow improvement and is exactly the sort of task MCP should eventually support.
- Proposed MCP addition: `patch_bytes_and_reanalyze(start, bytes, comment?)`.
- Missing capability: detect likely bad function overlaps or candidate function starts in a small range.
- Current fallback: manual repair plus custom PyGhidra probing.
- Why it matters: overlap repair is one of the main reasons the workflow still has to leave MCP.
- Proposed MCP addition: `analyze_function_boundaries(start, end)` returning overlap warnings and candidate entries.
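
The overlap-warning half of that proposal can be illustrated without Ghidra. A minimal sketch, assuming each function is reduced to a `(name, start, end_inclusive)` tuple of flat integer offsets; the tuple shape and `find_overlaps` are hypothetical, not the fork's output format.

```python
from typing import List, Tuple

Function = Tuple[str, int, int]  # (name, start, end_inclusive)


def find_overlaps(functions: List[Function]) -> List[Tuple[str, str]]:
    """Return name pairs whose address ranges share at least one byte."""
    ordered = sorted(functions, key=lambda f: (f[1], f[2]))
    overlaps = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b > end_a:
                break  # sorted by start, so no later range can overlap name_a
            overlaps.append((name_a, name_b))
    return overlaps


sample = [("a", 0x100, 0x120), ("b", 0x118, 0x130), ("c", 0x140, 0x150)]
print(find_overlaps(sample))
```
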
### Read-only project access and scripting
- Missing capability: open a locked project read-only or query a specified project clone directly from MCP.
- Current fallback: local PyGhidra against an unlocked temporary project clone.
- Why it matters: the GUI often owns the main project while read-only inspection still needs to continue.
- Proposed MCP addition: read-only project selection/open options for all analysis endpoints.
- Missing capability: run a small read-only script for one-off inspections that do not justify a permanent MCP endpoint yet.
- Current fallback: local PyGhidra `run-script --read-only`.
- Why it matters: several repo workflows start as one-off analysis helpers before they prove worth productizing.
- Proposed MCP addition: a constrained `run_readonly_script(script_text|script_path)` endpoint with explicit safety limits.
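
One way to read "explicit safety limits" is a token-level denylist check on the script text before it runs. The sketch below assumes the script is Python source and uses the standard `tokenize` module; the denied names are illustrative only, since the fork's actual policy is not shown in these notes.

```python
import io
import tokenize
from typing import FrozenSet, List

# Hypothetical denylist: names of calls that would modify or save a program.
DENIED_NAMES: FrozenSet[str] = frozenset(
    {"createFunction", "removeFunction", "setBytes", "startTransaction", "save"}
)


def denied_tokens(script_text: str, denied: FrozenSet[str] = DENIED_NAMES) -> List[str]:
    """Return denied identifiers used in the script, in first-seen order."""
    found: List[str] = []
    reader = io.StringIO(script_text).readline
    for tok in tokenize.generate_tokens(reader):
        # Only NAME tokens count, so words inside comments or strings are ignored.
        if tok.type == tokenize.NAME and tok.string in denied and tok.string not in found:
            found.append(tok.string)
    return found


print(denied_tokens("memory.setBytes(addr, data)\n# save\n"))
```

A denylist of this kind is a guard against accidental writes, not a sandbox; a determined script can reach the same calls through `getattr` or string evaluation.
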
### Migrated entries from `ghidra-mcp_wishlist.md`
Short, concrete gaps hit during live Crusader work. Each entry records what MCP lacked, what fallback was needed, and what a useful MCP feature should look like.
## Open Gaps (migrated)
### Byte-pattern search across program memory
- Status: implemented in local fork (2026-03-26)
- Missing MCP capability: search raw bytes or byte patterns across the current program's mapped segments / address spaces.
- Fallback used: manual `read_region` sweeps plus local Python over the MCP HTTP bridge to scan live Spanish `CRUSADER.EXE` memory for the `jassica16` scan-code table.
- Useful MCP feature:
  - `search_bytes(pattern, start?, end?, segment_filter?, max_hits?)`
  - accepts hex byte patterns with optional wildcards
  - returns exact hit addresses plus nearby hex context
- Why it matters: this would have closed the Spanish cheat-sequence question directly inside MCP instead of forcing ad hoc local scripting.
- Status update (2026-03-26): local fork now exposes `search_bytes(pattern, start?, end?, segment_filter?, max_hits?)` in both the Java plugin and Python bridge; it accepts `??` wildcards, scans mapped memory blocks, and returns machine-friendly hit lines with block names and nearby hex context.
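
The `??` wildcard matching described above can be sketched over a plain byte buffer. This is an illustration only, not the fork's implementation: `find_pattern` is a hypothetical name, the pattern is assumed to be space-separated hex byte tokens, and the sample bytes are invented.

```python
from typing import List, Optional


def parse_pattern(pattern: str) -> List[Optional[int]]:
    """Turn '24 ?? 1f' into [0x24, None, 0x1f]; None marks a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in pattern.split()]


def find_pattern(data: bytes, pattern: str, base: int = 0, max_hits: int = 16) -> List[int]:
    """Return addresses (base + offset) where the pattern matches."""
    wanted = parse_pattern(pattern)
    hits: List[int] = []
    if not wanted:
        return hits
    for i in range(len(data) - len(wanted) + 1):
        if all(w is None or data[i + j] == w for j, w in enumerate(wanted)):
            hits.append(base + i)
            if len(hits) >= max_hits:
                break
    return hits


block = bytes.fromhex("90241e1f0024301f")
print([hex(a) for a in find_pattern(block, "24 ?? 1f", base=0x1000)])
```
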
### Reliable caller/xref recovery for local call sites
- Status: implemented in local fork (2026-03-26)
- Missing MCP capability: reliable function-call xrefs for near/local calls inside the active program.
- Fallback used: manual `search_instructions` and instruction-window inspection because `get_function_xrefs` did not surface some obvious local call sites in the Spanish keyboard/helper cluster.
- Useful MCP feature:
  - improve `get_function_xrefs` so it includes near calls, far calls, tail-call-style jumps, and thunk references consistently
  - or add `get_callers(address_or_name, include_near=true, include_far=true, include_jumps=true)`
- Why it matters: tracing helper chains around hidden key-sequence code is slower and less reliable when local callers have to be reconstructed by text search.
- Status update (2026-03-26): local fork now exposes `get_callers(target, include_near=true, include_far=true, include_jumps=true, limit?)`, combining reference-manager hits with instruction-flow scans so local near-call sites show up even when plain xrefs are incomplete; `get_function_xrefs` now reuses the same caller recovery path.
### Cross-program reads inside the same Ghidra project
- Status: implemented in local fork (2026-03-26)
- Missing MCP capability: read/query another program or assembly in the same project without switching the active program first.
- Fallback used: indirect comparison against repo notes, workspace-side files, and ad hoc local scripts instead of querying `/CRUSADER.EXE`, `/es/CRUSADER.EXE`, `/Writable/...`, or other domain files side by side through MCP.
- Useful MCP feature:
  - allow explicit target selectors on all read/query endpoints, not only write endpoints
  - example: `read_region(start, end, project_dir?, project_name?, folder_path?, program_name?)`
  - same for strings, functions, xrefs, data uses, decompile, disassemble, symbol lookup, and segment listing
- Why it matters: live localized-build comparisons and writable-copy verification should not require changing the active Ghidra tab just to inspect another program.
- Status update (2026-03-26): read/query endpoints in the local fork now accept optional explicit target selectors (`project_dir`, `project_name`, `folder_path`, `program_name`) and reuse the same target-resolution layer as write flows; this now covers method/class listings, segments, imports/exports, namespaces, data items, function lookup/listing, decompile/disassembly, symbol lookup, regions, instruction scans, strings, xrefs, and data-use queries.
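
From a client's point of view, the optional selectors are just extra request parameters. The sketch below assumes read endpoints take their parameters as a URL query string and that the plugin listens at `http://127.0.0.1:8080`; both are assumptions, as is the helper name `build_read_url`.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# The four selector names come from the notes above.
SELECTOR_KEYS = ("project_dir", "project_name", "folder_path", "program_name")


def build_read_url(
    endpoint: str,
    params: Dict[str, str],
    selectors: Optional[Dict[str, Optional[str]]] = None,
    base_url: str = "http://127.0.0.1:8080",  # assumed local address
) -> str:
    """Build a query URL, adding only the target selectors that were supplied."""
    query = dict(params)
    for key, value in (selectors or {}).items():
        if key not in SELECTOR_KEYS:
            raise KeyError(f"unknown target selector: {key}")
        if value is not None:
            query[key] = value
    return f"{base_url}/{endpoint}?{urlencode(query)}"


url = build_read_url(
    "read_region",
    {"start": "000c:1234", "end": "000c:1260"},
    {"folder_path": "/es", "program_name": "CRUSADER.EXE"},
)
print(url)
```

Omitting all selectors leaves the query unchanged, which matches the described behavior of falling back to the active program.
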
### Cross-project / cross-program compare tooling
- Status: implemented in local fork (2026-03-26)
- Missing MCP capability: first-class compare operations between two programs in the same project or across projects.
- Fallback used: manual note-to-note comparison, address math, and repeated per-program queries.
- Useful MCP feature:
  - `compare_regions(left_program, left_range, right_program, right_range, mode=bytes|words|disasm|strings)`
  - `compare_strings(left_program, right_program, filter?)`
  - `compare_functions(left_program, left_addr_or_name, right_program, right_addr_or_name, mode=signature|disasm|decompile|xrefs)`
  - machine-readable output with address pairs, similarity score, and differing bytes/instructions/strings
- Why it matters: this would make English vs Spanish / Remorse vs Regret / raw vs live NE comparisons much faster and less error-prone.
- Status update (2026-03-26): local fork now exposes `compare_regions(...)`, `compare_strings(...)`, and `compare_functions(...)` with left/right explicit target selectors; outputs are machine-friendly and include comparison mode, similarity score, and capped difference samples for byte/word, disassembly, string, signature, decompile, and xref views.
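
For the byte mode, a result with a similarity score and capped difference samples could look roughly like this. The scoring rule (matching positions divided by the longer length) and the result keys are assumptions for illustration; the notes do not define how the fork computes its score.

```python
from typing import Dict, List, Optional, Tuple


def compare_bytes(left: bytes, right: bytes, max_samples: int = 8) -> Dict[str, object]:
    """Compare two regions position by position and keep a capped diff sample."""
    length = max(len(left), len(right))
    samples: List[Tuple[int, Optional[int], Optional[int]]] = []
    mismatches = 0
    for i in range(length):
        # A missing byte (regions of unequal length) counts as a mismatch.
        l_byte = left[i] if i < len(left) else None
        r_byte = right[i] if i < len(right) else None
        if l_byte != r_byte:
            mismatches += 1
            if len(samples) < max_samples:
                samples.append((i, l_byte, r_byte))
    similarity = 1.0 if length == 0 else (length - mismatches) / length
    return {
        "mode": "bytes",
        "similarity": similarity,
        "mismatches": mismatches,
        "samples": samples,
    }


print(compare_bytes(b"\x01\x02\x03\x04", b"\x01\xff\x03\x04"))
```
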
### Port renames/comments/symbol facts between programs
- Status: implemented in local fork (2026-03-26)
- Missing MCP capability: apply verified names/comments from one program to another program with explicit provenance instead of re-entering them one by one.
- Fallback used: manual rename/comment batches plus external notes to carry mapping provenance.
- Useful MCP feature:
  - `port_symbols(source_program, target_program, mappings, apply=names|comments|both, provenance_comment_template?)`
  - support direct address maps, segment-relative maps, and user-supplied CSV/JSON mapping tables
  - dry-run mode showing collisions and ambiguous targets
- Why it matters: porting verified English or raw-import findings into Spanish or live NE targets is a recurring workflow.
- Status update (2026-03-26): local fork now exposes `port_symbols(mappings, apply=names|comments|both, provenance_comment_template?, dry_run?)` with `source_*` and `target_*` selectors; the bridge accepts a verified list of source/target address pairs and the plugin ports names plus PRE/EOL comments with optional provenance text and explicit-target save support.
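
A dry run over a mapping list mainly has to separate safe entries from collisions and ambiguous targets. This sketch assumes mappings are `(source_address, target_address, name)` tuples and that the target's existing names are known as an address-to-name dict; the classification rules and sample addresses are illustrative, not the fork's.

```python
from collections import Counter
from typing import Dict, List, Tuple

Mapping = Tuple[str, str, str]  # (source_address, target_address, name)


def dry_run_port(mappings: List[Mapping], target_names: Dict[str, str]) -> Dict[str, List[Mapping]]:
    """Classify each mapping as apply, collision, or ambiguous without editing anything."""
    target_counts = Counter(target for _, target, _ in mappings)
    name_owner = {name: address for address, name in target_names.items()}
    report: Dict[str, List[Mapping]] = {"apply": [], "collisions": [], "ambiguous": []}
    for source, target, name in mappings:
        if target_counts[target] > 1:
            # Several sources claim the same target address.
            report["ambiguous"].append((source, target, name))
        elif name in name_owner and name_owner[name] != target:
            # The name already belongs to a different target address.
            report["collisions"].append((source, target, name))
        else:
            report["apply"].append((source, target, name))
    return report


sample_mappings = [
    ("000c:1000", "000d:2000", "Foo"),
    ("000c:1010", "000d:2010", "Bar"),
    ("000c:1020", "000d:2030", "Baz"),
    ("000c:1030", "000d:2030", "Qux"),
]
print(dry_run_port(sample_mappings, {"000d:9000": "Bar"}))
```
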
### Project inventory / browse endpoint
- Status: implemented in local fork (2026-03-26)
- Missing MCP capability: list project folders and available programs through MCP.
- Fallback used: repo-side assumptions and local tooling; the current MCP read tools expose only the active program cleanly.
- Useful MCP feature:
  - `list_project_programs(project_dir?, project_name?, folder_path?, recursive=true)`
  - returns folder path, program name, read-only/writable/versioned state, and whether it is currently open
- Why it matters: comparing or porting across programs is awkward without a discoverable inventory of assemblies already in the Ghidra project.
- Status update (2026-03-26): local fork now exposes `list_project_programs(project_dir?, project_name?, folder_path?, recursive=true)` plus a `project_programs` alias; it walks project folders and returns machine-friendly program inventory lines with folder path, program name, content type, read-only/versioned flags, and current-open state.
### Class / namespace authoring for C++ lifting
- Missing MCP capability: create and manage Ghidra class or namespace symbols, then move existing functions under those owners as methods.
- Current fallback: manual Ghidra GUI edits in the Symbol Tree or one-off local scripts outside the normal MCP workflow.
- Why it matters: the Remorse binary already shows repeated ctor/dtor patterns, stable vtable roots, and class-like object families, but the current MCP workflow can only rename flat functions. That blocks a disciplined shift from procedural naming toward grouped C++-style ownership.
- Proposed MCP addition:
  - `create_namespace(name, parent_path?, kind=namespace|class)`
  - `move_symbol_to_namespace(symbol_address_or_name, namespace_path, new_name?)`
  - `set_function_class(function_address, class_path, method_name?, this_param_name?, calling_convention?)`
  - machine-friendly responses that include the final symbol path and any rename collisions.
- Status update (2026-04-05): local fork now exposes `create_namespace(...)`, `list_namespace_members(...)`, `move_symbol_to_namespace(...)`, and `set_function_class(...)` in both the Java plugin and Python bridge. The implementation supports explicit target selectors, dry-run moves, collision policies (`fail|keep_existing|rename_incoming`), and compatibility aliases (`create_class`, `move_function_to_class`).
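
The three collision policies can be read as a small decision function. The semantics below are an interpretation of the policy names, and the numeric-suffix renaming scheme is hypothetical; the notes do not say how the fork renames an incoming symbol.

```python
from typing import Optional, Set


def resolve_name(incoming: str, existing: Set[str], policy: str = "fail") -> Optional[str]:
    """Return the name to apply in the target namespace, or None to skip the move."""
    if incoming not in existing:
        return incoming
    if policy == "fail":
        raise ValueError(f"name collision: {incoming}")
    if policy == "keep_existing":
        return None  # leave the existing symbol alone and skip the incoming one
    if policy == "rename_incoming":
        suffix = 1
        while f"{incoming}_{suffix}" in existing:
            suffix += 1
        return f"{incoming}_{suffix}"  # assumed suffix scheme
    raise ValueError(f"unknown collision policy: {policy}")


print(resolve_name("Create", {"Create", "Create_1"}, "rename_incoming"))
```
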
### Vtable / OO recovery helpers for class reconstruction
- Missing MCP capability: first-class helpers for identifying vtables, attaching function slots to candidate classes, and materializing class/instance layouts from evidence-backed data.
- Current fallback: manual note collation from decompiler/disassembly output plus ad hoc datatype work in the GUI.
- Why it matters: the repo already has enough evidence to start lifting major families into C++ classes, but a recompilable source path needs more than renamed functions. It needs reproducible vtable maps, `this`-pointer typing, field layouts, inheritance guesses, and explicit provenance for each class model.
- Proposed MCP addition:
  - `analyze_vtable(address, slot_count?, namespace_path?)`
  - `create_or_update_struct(name, size?, fields)`
  - `set_function_this_type(function_address, struct_name, this_storage=stack|register|farptr)`
  - `apply_class_layout(class_path, instance_struct, vtable_struct?, methods)`
  - optional dry-run output showing inferred slots, unresolved targets, and conflicting field/size evidence.
- Status update (2026-04-05): local fork now exposes `analyze_vtable(...)`, `create_or_update_struct(...)`, `create_or_update_vtable(...)`, `set_function_this_type(...)`, and `apply_class_layout(...)` in both the Java plugin and Python bridge. Struct and vtable authoring accept line-encoded field/slot batches from the bridge, `set_function_this_type(...)` updates the first parameter to a typed `this` pointer while preserving storage when possible, and `apply_class_layout(...)` batches namespace moves plus `this` typing with dry-run support. Compatibility aliases now also cover `set_this_type` and `build_vtable`.
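
As a rough picture of what slot recovery involves, the sketch below decodes a raw vtable region into slot targets. It assumes each slot is a 16-bit far pointer stored as little-endian offset followed by segment, which is a guess based on the far-pointer and NE references in these notes, not a confirmed layout; `decode_far_slots` and the sample bytes are hypothetical.

```python
import struct
from typing import List, Tuple


def decode_far_slots(raw: bytes, slot_count: int) -> List[Tuple[int, str]]:
    """Decode up to slot_count 4-byte far pointers as (index, 'segment:offset')."""
    usable = min(slot_count, len(raw) // 4)  # ignore a trailing partial slot
    slots = []
    for index in range(usable):
        offset, segment = struct.unpack_from("<HH", raw, index * 4)
        slots.append((index, f"{segment:04x}:{offset:04x}"))
    return slots


table = bytes.fromhex("34120c00" "60120c00")
print(decode_far_slots(table, slot_count=2))
```
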
- If a future pass hits a new MCP gap, add it under `Remaining TODOs` and move it to `Done / Implemented` once the local source and bridge support are both in place.