Ghidra MCP Wishlist

This file records concrete gaps in the current Ghidra MCP workflow. Update it whenever a task requires PyGhidra or another local-only fallback because MCP lacks the needed operation.

For each new entry, keep the format short:

  • Missing capability
  • Current fallback
  • Why it matters in this repo
  • Proposed MCP endpoint or behavior

Current Wishlist

POST Body Contract Gap Hit During Runtime Prototype Repair (2026-04-05)

  • Missing capability: POST endpoints only accept form-urlencoded key/value parameters; direct JSON bodies fail as if required parameters were omitted.
  • Current fallback: use bridge helpers or manual form-encoded POSTs when testing endpoints such as set_function_prototype(...) directly.
  • Why it matters: MCP clients, ad hoc terminal tests, and future automation naturally try JSON first for structured payloads, especially on newer class-lift and prototype endpoints.
  • Proposed MCP behavior: accept both application/x-www-form-urlencoded and application/json on POST endpoints, or return a structured unsupported-content-type error that explicitly says the route only accepts form parameters.
  • Status update (2026-04-05): local plugin parsePostParams(...) still only splits key=value&... bodies and ignores JSON payloads entirely, which is why direct JSON POSTs looked like missing-parameter failures during the EntityVmRuntime::Create repair.
  • Status update (2026-04-05, local fork): plugin parsePostParams(...) now accepts both form-urlencoded bodies and JSON object bodies across POST routes. Unsupported POST bodies now fail early with an explicit unsupported-content-type parser error instead of silently degrading into missing-parameter behavior.
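
The dual-format parsing described above can be sketched in Python. This is an illustration of the contract only, not the plugin's Java parsePostParams(...) code: the function name parse_post_params, the UnsupportedContentType exception, and the choice to flatten non-string JSON values into strings are all assumptions made for the example.

```python
import json
from urllib.parse import parse_qsl


class UnsupportedContentType(ValueError):
    """Raised when a POST body is neither form-encoded nor a JSON object."""


def parse_post_params(content_type, body):
    """Return POST parameters as a flat dict of strings.

    Accepts application/x-www-form-urlencoded and application/json bodies;
    anything else fails early instead of looking like missing parameters.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    text = body.decode("utf-8")
    if media_type == "application/x-www-form-urlencoded":
        return dict(parse_qsl(text, keep_blank_values=True))
    if media_type == "application/json":
        payload = json.loads(text)
        if not isinstance(payload, dict):
            raise UnsupportedContentType("JSON body must be an object")
        # Assumption: handlers expect string values, so non-string JSON
        # values are re-serialized rather than passed through as-is.
        return {k: v if isinstance(v, str) else json.dumps(v)
                for k, v in payload.items()}
    raise UnsupportedContentType(
        "unsupported content type: " + (media_type or "<none>"))
```

With this shape, a form body and a JSON object carrying the same keys produce the same parameter dict, and a text/plain body raises immediately.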

Live PyGhidra Write Gap Hit During Runtime Repair Pass (2026-04-05)

  • Missing capability: constrained live PyGhidra write execution through MCP when Ghidra was started with Python enabled.
  • Current fallback: keep read-only inspection in live MCP via run_readonly_script(...), but close the GUI and drop back to local project-open PyGhidra for write-side repairs such as custom-storage prototype fixes and datatype edits.
  • Why it matters: the runtime class-lift batch had to leave the live session and reopen the project locally just to repair one 16-bit function signature and one allocator-helper callee, even though the live Ghidra instance could already host Python scripts.
  • Proposed MCP behavior: add a narrowly scoped live write-script or transaction endpoint family that runs against the active writable program with explicit safety limits, dry-run support where possible, and machine-friendly transaction results.
  • Status update (2026-04-05): the local fork can already probe and run live read-only Python when Ghidra starts with PyGhidra enabled, so the remaining gap is write-side exposure and safety policy rather than Python availability itself.
  • Status update (2026-04-05, local fork): local plugin and bridge now expose run_write_script(script_path|script_text, dry_run?) plus the alias route run_transaction_script. The implementation reuses explicit write-target selectors, validates inline or file-backed scripts against a write-policy denylist, wraps execution in a single MCP-managed transaction, reports machine-friendly status/output, and surfaces write_script_* capability fields from get_runtime_capabilities(). The remaining gap is finer-grained safety policy and live workflow verification, not basic write-side exposure.
  • Status update (2026-04-06, VM class-lift pass): direct bridge run_write_script(...) still returned 404 No context found for request against the active CRUSADER.EXE GUI session even with explicit target selectors, so the EntityVmContext datatype plus the slot-entry/runtime prototype batch still had to fall back to closed-project local PyGhidra. The remaining gap is now active-session context binding for the write-script route, not route availability alone.
  • Status update (2026-04-06, local fork hardening): plugin explicit-target binding now normalizes Windows project_dir casing/separators, infers missing project_dir / project_name from the active program when possible, and fills the matching folder_path from the active domain file before trying to reopen a target. Bridge run_write_script(...) now retries the run_transaction_script alias on 404 or No context found for request, reducing mixed-build false negatives while live-session verification continues.
  • Status update (2026-04-06, live context-typing retry): the trivial dry-run probe for run_write_script(...) still returned 404 No context found for request against the active CRUSADER.EXE session both with implicit active-program targeting and with explicit project_dir / project_name / folder_path / program_name selectors. The route is still not usable as an in-session fallback for the EntityVmContext typing pass.
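
The alias retry described in the 2026-04-06 hardening note can be sketched as follows. This is a minimal illustration, not the bridge's actual code: the transport callable, its (route, params) -> (status, body) signature, and the result dict shape are hypothetical. Only the route names and the 404 / No context found for request trigger come from the notes above.

```python
CONTEXT_ERROR = "No context found for request"


def post_with_alias_retry(transport, routes, params):
    """Try each route in order; fall through only on 404 or missing context.

    `transport` is a hypothetical callable (route, params) -> (status, body).
    Any other reply, success or failure, is returned immediately so real
    errors are not masked by the retry.
    """
    attempts = []
    status, body = None, ""
    for route in routes:
        status, body = transport(route, params)
        attempts.append((route, status))
        if status != 404 and CONTEXT_ERROR not in body:
            return {"ok": True, "route": route, "status": status,
                    "body": body, "attempts": attempts}
    # Every route answered with a context failure (or no routes were given).
    return {"ok": False, "route": None, "status": status,
            "body": body, "attempts": attempts}
```

Recording every attempt keeps the mixed-build case visible: a caller can tell whether the primary route or only the alias answered, which is the evidence the status updates above rely on.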

Class-Lift Typing Gap Hit During VM Runtime Pass (2026-04-05)

  • Missing capability: a storage-aware class-layout or this-typing path for 16-bit NE methods whose current function storage does not match the default pointer storage the binder tries to apply.
  • Current fallback: create/update the class namespace and datatype, then move methods individually with set_function_class(...) and leave this-pointer typing and manual prototype cleanup for later.
  • Why it matters: the current Remorse class-lift workflow can land ownership cleanly for EntityVmRuntime, but apply_class_layout(...) failed on the runtime lifecycle cluster with "Failed to apply this type: Storage size does not match data type size: 2" even though the same binder succeeded for EntityVmOwnerResource.
  • Proposed MCP behavior: let apply_class_layout(...) either skip or soft-fail this-pointer typing per method with structured results, or accept an explicit storage/calling-convention override for the this parameter so 16-bit segmented/custom-storage methods can still be class-bound and partially typed in one pass.
  • Status update (2026-04-05, later MCP-upgrade pass): the upgraded tool surface now allows direct set_function_class(...) moves for additional EntityVmRuntime helpers and set_function_this_type(...) succeeded on 1420:1601 Destroy when forced to this_storage=farptr, but 1420:1499 Create, 1420:1536 InitSlots, and 1420:1575 ReleaseSlots still fail with the same storage-size mismatch, so the gap is narrower but not resolved.
  • Status update (2026-04-05, local fork): set_function_this_type(...) now treats this_storage as a real storage strategy hint instead of always reusing the old first-parameter storage. For existing parameters it tries preserved custom storage first only when the caller asked to preserve/current storage, then falls back to DYNAMIC_STORAGE_ALL_PARAMS when the preserved storage is incompatible with the requested this type. apply_class_layout(...) now records per-method typing failures as structured warnings instead of aborting the entire batch, and bridge method payloads can carry per-method this_storage and calling_convention overrides.
  • Status update (2026-04-06, VM class-lift pass): after landing /Remorse/EntityVmContext and the first slot-entry prototype batch, local PyGhidra could collapse 1420:1536 InitSlots and 1420:1575 ReleaseSlots to direct EntityVmRuntime * this, but 1420:1499 Create still reintroduced hidden __return_storage_ptr__ corruption whenever the split-word far runtime pointer was collapsed to a typed this. The open gap is now mostly Create plus any future 16-bit constructors/factories with the same far-pointer/custom-storage shape.
  • Status update (2026-04-06, live context-typing retry): the old apply_class_layout(...) dry-run null failure for /Remorse/EntityVmContext no longer reproduces, but the real live write path still behaves like the older storage-preserving build. Actual apply_class_layout(...) and direct set_function_this_type(...) calls on 1420:10b6, 1420:10da, 1420:1162, 1420:118f, and 1420:1278 all still fail with Storage size does not match data type size: 2, so the open gap is now specifically live deployment parity for the dynamic-storage fallback rather than dry-run binder coverage.
  • Status update (2026-04-06, local PyGhidra confirmation): after closing the GUI and running the local tools.pyghidra_crusader script path, the same context lifecycle entries (1420:0eec, 1420:10b6, 1420:10da, 1420:1162, 1420:118f, 1420:1278) all accepted EntityVmContext * this cleanly via DYNAMIC_STORAGE_ALL_PARAMS. That confirms the typing model is valid and the remaining gap is live-session deployment parity, not the class layout itself.

16-bit Prototype And Hidden Return-Storage Gap Hit During Runtime Repair (2026-04-05)

  • Missing capability: a semantics-preserving prototype/storage endpoint for 16-bit NE functions that can set explicit parameter storage, explicit return storage, and avoid parser-induced hidden __return_storage_ptr__ rewrites.
  • Current fallback: inspect the broken caller plus its direct callees, then use local PyGhidra to normalize callee prototypes and apply custom storage manually.
  • Why it matters: 1420:1499 Remorse::EntityVmRuntime::Create kept throwing "Low-level Error: Symbol $$undef00000006 extends beyond the end of the address space" until the shared allocator helper at 1000:42e2 was repaired from a pointer-return signature that decompiled with a hidden return-storage parameter.
  • Proposed MCP behavior: expose a storage-aware prototype/update endpoint that accepts explicit parameter and return storage, plus optionally a decompiler-health check or warning when a candidate prototype would inject hidden return storage into a 16-bit caller chain.
  • Status update (2026-04-05): parser-string prototype updates alone were not sufficient here; the stable repair required explicit AX:DX return storage on 1000:42e2 and split-stack-word modeling for the runtime far pointer on 1420:1499.
  • Status update (2026-04-05, later MCP-upgrade pass): the new live run_write_script(...) path gives MCP a constrained way to perform these repairs inside the active writable session, but there is still no first-class storage-aware prototype endpoint that models explicit return/parameter storage declaratively. This wishlist item remains open.
  • Status update (2026-04-06, local fork): local plugin and bridge now expose set_function_prototype_storage(...) plus the alias set_storage_aware_prototype(...). The endpoint accepts declarative return_type, return_storage, and ordered parameters lines (name|type|storage), supports explicit target selectors, applies custom return/parameter storage in one transaction, and reports a warning when the resulting signature still contains hidden __return_storage_ptr__ state.
  • Status update (2026-04-06, live in-session verification): the checked-in Java source now wires both /set_function_prototype_storage and /set_storage_aware_prototype to the storage-aware implementation, but the active GUI session still does not match that build. Direct live POSTs to /set_function_prototype_storage returned HTTP 200 with the legacy body "failed: set_function_prototype ... Function prototype is required", while the alias route /set_storage_aware_prototype still returned 404 No context found for request. The live session therefore still cannot exercise the new explicit-storage modeling, so this remains a deployment/runtime parity gap rather than a source-level endpoint absence.
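
The ordered name|type|storage parameter lines could be assembled on the client side roughly as below. This is a sketch under stated assumptions: ParamSpec, encode_parameters, build_request, and the function_address key are hypothetical names, and the exact storage-string syntax the endpoint accepts is not specified in these notes. Only the return_type / return_storage / parameters fields and the __return_storage_ptr__ marker come from the entry above.

```python
from typing import NamedTuple

HIDDEN_RETURN_MARKER = "__return_storage_ptr__"


class ParamSpec(NamedTuple):
    name: str
    type: str
    storage: str


def encode_parameters(params):
    """Encode ordered parameters as one name|type|storage line each."""
    lines = []
    for spec in params:
        for field in spec:
            # A separator or newline inside a field would corrupt the line.
            if not field or "|" in field or "\n" in field:
                raise ValueError("invalid field in parameter spec: %r" % (spec,))
        lines.append("|".join(spec))
    return "\n".join(lines)


def build_request(address, return_type, return_storage, params):
    """Build the POST fields; `function_address` is a hypothetical key name."""
    return {
        "function_address": address,
        "return_type": return_type,
        "return_storage": return_storage,
        "parameters": encode_parameters(params),
    }


def has_hidden_return_storage(signature):
    """True when a resulting signature still shows hidden return storage."""
    return HIDDEN_RETURN_MARKER in signature
```

Rejecting fields that contain the separator before sending keeps a malformed line from being misread as extra parameters, and the marker check mirrors the warning the endpoint is described as reporting.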

Live MCP Issues Hit During Spanish Cheat Pass (2026-03-26)

  • Missing capability: working search_bytes(...) requests against the currently opened program.

  • Current fallback: read_region(...), get_data_uses(...), search_instructions(...), and manual/xref-driven narrowing inside /es/CRUSADER.EXE.

  • Why it matters: the Spanish-cheat question specifically needed a direct full-memory search for the English jassica16 scan-code table and any plausible replacement sequence.

  • Proposed MCP behavior: search_bytes(...) should honor the active program context by default and return a machine-friendly empty-hit result when no matches exist, not HTTP 404 No context found for request.

  • Missing capability: reliable explicit target selection on read/query endpoints in the live server session.

  • Current fallback: repo notes plus manual project .prp metadata inspection after read_region(...) and get_function_by_address(...) ignored explicit root-vs-/es selectors and still resolved against the active Spanish program.

  • Why it matters: this repo routinely needs side-by-side comparisons between /CRUSADER.EXE, /es/CRUSADER.EXE, /Writable/..., and other project entries without changing the active Ghidra tab.

  • Proposed MCP behavior: all selector-aware read endpoints should actually bind to the requested project_dir / project_name / folder_path / program_name, or return a structured target-resolution failure instead of silently reading the active program.

  • Missing capability: consistent context handling for project/runtime metadata helpers in the live server session.

  • Current fallback: direct get_project_access_info() plus workspace-side .prp reads after list_project_programs(...), get_callers(...), compare_functions(...), and get_runtime_capabilities() returned 404 No context found for request during an otherwise healthy active-program session.

  • Why it matters: these are the exact helper endpoints needed to validate which program is active, enumerate comparison targets, and reason about whether a failure is a real analysis result or an MCP/session problem.

  • Proposed MCP behavior: metadata helpers should either work whenever an active program exists or return structured unsupported-state details, not raw 404 context failures.

  • Status update (2026-03-26, later Spanish pass): the refreshed live server still returned 404 No context found for request for get_runtime_capabilities(...) and get_callers(...) during an active /es/CRUSADER.EXE session, so this is still a live deployment or routing problem, not just an earlier-session artifact.

  • Status update (2026-04-05, class-lift pass): after reloading the updated plugin, get_project_access_info(...) and the new class-lift write routes were reachable in the active CRUSADER.EXE session, but list_project_programs(...) still returned 404 No context found for request, so the metadata-helper context issue is not fully resolved.

  • Status update (2026-04-05, local bridge hardening): bridge list_project_programs(...) now retries the legacy /project_programs alias whenever the live server answers with 404 or No context found for request, which should smooth mixed-build sessions while the remaining live metadata routing issue is verified after redeploy.

  • Status update (2026-04-06, local fork hardening): bridge get_runtime_capabilities(...) now retries the /runtime_capabilities alias on 404 or No context found for request, and plugin explicit-target matching no longer depends on exact Windows path casing or slash style when deciding whether an already-open program satisfies the request. This should reduce false context failures in mixed-build live sessions, though full deployment verification is still pending.
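
The casing- and separator-insensitive matching mentioned in the last update can be illustrated with Python's ntpath module. This is a sketch of the idea, not the plugin's Java logic: normalize_project_dir and selectors_match are hypothetical names, and treating an omitted selector as "matches anything" is an assumption.

```python
import ntpath

SELECTOR_KEYS = ("project_dir", "project_name", "folder_path", "program_name")


def normalize_project_dir(path):
    """Fold Windows path casing, slash style, and trailing separators."""
    return ntpath.normcase(ntpath.normpath(path))


def selectors_match(requested, open_program):
    """Decide whether an already-open program satisfies explicit selectors.

    Assumption: selectors the caller omitted (or left empty) are wildcards,
    and only project_dir is compared case-insensitively.
    """
    for key in SELECTOR_KEYS:
        wanted = requested.get(key)
        if not wanted:
            continue
        actual = open_program.get(key, "")
        if key == "project_dir":
            if normalize_project_dir(wanted) != normalize_project_dir(actual):
                return False
        elif wanted != actual:
            return False
    return True
```

Under this normalization, K:/ghidra/Crusader_Decomp/ and k:\GHIDRA\crusader_decomp resolve to the same project directory, while a different folder_path or program_name still fails the match.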

Open Gaps Found During Hidden Usecode Debugger Patch Batch (2026-03-24)

  • Missing capability: write-capable project/program selection for MCP edit operations.
  • Current fallback: local PyGhidra run-script plus read-region against --project-dir K:\ghidra\Crusader_Decomp --project-name Crusader --folder-path /Writable --program-name CRUSADER-PATCHED.EXE.
  • Why it matters: retail NE patch work in this repo must sometimes modify and save /Writable/CRUSADER-PATCHED.EXE with the GUI closed, while current MCP write flows depend on the active Ghidra session/program context.
  • Proposed MCP addition: add bridge-exposed target selectors (project_dir, project_name, folder_path, program_name) for write endpoints, backed by plugin support to open the requested project file, apply patch_bytes_and_reanalyze or edit-plan writes, and save deterministically.
  • Status update (2026-03-24): local fork now accepts optional project_dir, project_name, folder_path, and program_name selectors on apply_program_edit_plan and patch_bytes_and_reanalyze; explicit targets are opened through GhidraProject, written, saved deterministically, and then released.
  • Status update (2026-03-24, follow-up): explicit target resolution now reuses an already-open matching program when possible and otherwise opens a writable domain object directly; MCP no longer opens explicit targets in read-only mode for edit operations.

Open Gaps Found During Current 0x4588 Pass (2026-03-21)

  • Missing capability: usable read-only scripting in the live MCP/Ghidra session.

  • Current fallback: terminal-side Python and manual MCP inspection windows after run_readonly_script returned "Ghidra was not started with PyGhidra. Python is not available."

  • Why it matters: one-off structure probes and byte-pattern scans are still common during EUSECODE and overlap work, and they are much cleaner as constrained in-process reads than as external heuristics.

  • Proposed MCP addition: expose runtime capability state for run_readonly_script and either guarantee a working in-process script engine or return a machine-friendly unsupported-state response early.

  • Status update (2026-03-24): local fork now exposes get_runtime_capabilities() with readonly-script probe state and run_readonly_script() returns structured status/reason/detail output early when Python support is unavailable in the live session.

  • Status update (2026-03-24, follow-up): open_current_program_readonly() is now intentionally disabled and returns an unsupported-state response so MCP does not create accidental read-only program instances in normal workflow.

  • Status update (2026-03-21): the current live plugin process still returns HTTP 404 for direct symbol routes (/get_symbol_at, /symbol_at) in this chat session, but bridge get_symbol_at(address) now avoids raw 404s by falling back to compatible legacy endpoints and returning deterministic symbol-state output (for example 0x844 -> symbol=<none>).

  • Remaining gap: reload/redeploy the updated plugin build so direct symbol routes are present in the live process; bridge fallback now covers older live builds in the meantime.

  • Implemented now:

    • get_xrefs_to(address) / get_xrefs_from(address) with typed ref kinds (call, read, write, jump, other) plus containing-function metadata.
    • set_function_prototype now retries tolerantly for legacy calling-convention tokens (for example __cdecl16far) and returns an accepted template example on parse/apply failure.
    • rename_data(address, new_name) now renames or creates the primary symbol at any valid address and returns the resolved symbol metadata instead of the generic "Rename data attempted" message.
    • get_symbol_at(address) returns the primary symbol state at an address so label changes can be verified directly without depending on decompiler refresh timing.
    • get_symbol_at(address) now resolves the active program on the Swing thread, falls back to the visible/open program when the current-program pointer is transiently unavailable, and the bridge retries the compatible /symbol_at alias if a stale server route returns 404 No context found for request.
    • bridge get_symbol_at(address) now probes additional legacy aliases (getSymbolAt, symbolAt, get_symbol) and, if symbol routes are absent, derives symbol state from legacy endpoints (get_function_by_address, paged data) so callers receive machine-friendly output instead of a raw 404.
  • Local bridge audit (2026-03-21): get_xrefs_to / get_xrefs_from wrappers are already present in K:\mcp\GhidraMCP\bridge_mcp_ghidra.py; if a client still does not surface them, that is a client/tool-refresh issue rather than a missing local-fork endpoint.

Implemented In Local GhidraMCP Fork (2026-03-21)

Added endpoints in K:\mcp\GhidraMCP\src\main\java\com\lauriewired\GhidraMCPPlugin.java and tools in K:\mcp\GhidraMCP\bridge_mcp_ghidra.py:

  • Function boundary repair:

    • create_function_by_address(entry, name, body_start, body_end, comment?)
    • delete_function_by_address(entry)
    • get_function_containing(address)
  • Arbitrary code and memory inspection:

    • read_region(start, end)
    • disassemble_region(start, end)
    • get_instruction_window(address, before_count, after_count)
    • search_instructions(query, mode=text|operand|address, limit?)
    • get_data_uses(address, include_operand_scans=true, limit?)
  • Batch and transactional edits:

    • set_comments(batch)
    • set_decompiler_comments(batch)
    • rename_functions_by_address(batch)
    • apply_program_edit_plan(plan, dry_run=false)
  • Reanalysis and repair helpers:

    • reanalyze_region(start, end)
    • patch_bytes_and_reanalyze(start, bytes, comment?)
    • analyze_function_boundaries(start, end)
  • Read-only project access and scripting:

    • get_project_access_info()
    • get_runtime_capabilities()
    • open_current_program_readonly(version=-1, make_current=true)
    • run_readonly_script(script_path|script_text) with a constrained token denylist policy
  • Explicit write targeting:

    • optional project_dir, project_name, folder_path, program_name selectors on apply_program_edit_plan(...)
    • optional project_dir, project_name, folder_path, program_name selectors on patch_bytes_and_reanalyze(...)

Batch encoding used by the current bridge:

  • set_comments and set_decompiler_comments: list of (address, comment) pairs.
  • rename_functions_by_address: list of (address, new_name) pairs.
  • apply_program_edit_plan: one action per line with | separators, for example:
    • create_function_by_address|000c:1234|name|000c:1234|000c:1260|note
    • delete_function_by_address|000c:1234
    • rename_function_by_address|000c:1234|new_name
    • set_disassembly_comment|000c:1234|comment text
    • set_decompiler_comment|000c:1234|comment text
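
A plan in this line format can be sanity-checked before it is sent. The validator below is a hedged sketch: the per-action field counts are inferred from the example lines above, treating the trailing create-function note as optional is an assumption, and validate_plan is a hypothetical helper rather than part of the bridge.

```python
# (min, max) field counts after the action name, inferred from the examples.
EXPECTED_FIELDS = {
    "create_function_by_address": (4, 5),  # assumption: the note is optional
    "delete_function_by_address": (1, 1),
    "rename_function_by_address": (2, 2),
    "set_disassembly_comment": (2, 2),
    "set_decompiler_comment": (2, 2),
}


def validate_plan(plan_text):
    """Return a list of human-readable problems; empty means well-formed."""
    errors = []
    for number, line in enumerate(plan_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between actions
        action, *fields = line.split("|")
        bounds = EXPECTED_FIELDS.get(action)
        if bounds is None:
            errors.append("line %d: unknown action %r" % (number, action))
            continue
        low, high = bounds
        if not low <= len(fields) <= high:
            errors.append("line %d: %s expects %d-%d fields, got %d"
                          % (number, action, low, high, len(fields)))
        elif any(not field for field in fields):
            errors.append("line %d: %s has an empty field" % (number, action))
    return errors
```

Because | is the separator, a comment that itself contains | would be reported as a field-count error here; whether the plugin tolerates that case is not covered by these notes.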

Notes on read-only coverage:

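  • open_current_program_readonly originally opened a read-only program object for the currently loaded domain file; as of the 2026-03-24 follow-up it is intentionally disabled and returns an unsupported-state response.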
  • Project-switch/open-by-path is still not implemented; MCP still operates on the active Ghidra GUI project context.

Function boundary repair

  • Missing capability: create a function at an explicit entry with an explicit body start/end.

  • Current fallback: local PyGhidra create-function and JSON repair plans.

  • Why it matters: boundary repair is a recurring part of this project, especially for overlapped or truncated raw functions.

  • Proposed MCP addition: create_function_by_address(entry, name, body_start, body_end, comment?).

  • Missing capability: delete an incorrect auto-created function.

  • Current fallback: local PyGhidra delete-function.

  • Why it matters: bad auto-analysis often blocks decompilation of adjacent real functions.

  • Proposed MCP addition: delete_function_by_address(entry).

  • Missing capability: get the function containing an arbitrary address.

  • Current fallback: local PyGhidra get-function-containing.

  • Why it matters: no-function windows and overlap investigations depend on quickly mapping instruction hits back to owning functions.

  • Proposed MCP addition: get_function_containing(address).

Arbitrary code and memory inspection

  • Missing capability: read raw bytes from an arbitrary address range in program memory.

  • Current fallback: local PyGhidra read-region.

  • Why it matters: some important sites are real code bytes that are not yet part of any function object.

  • Proposed MCP addition: read_region(start, end) returning bytes and a compact hex view.

  • Missing capability: dump nearby instructions around an arbitrary address even when no function exists there.

  • Current fallback: custom read-only PyGhidra scripts such as pyghidra_plans/dump_instruction_windows.py.

  • Why it matters: the 0x4588 investigation depended on inspecting instruction windows in no-function regions.

  • Proposed MCP addition: disassemble_region(start, end) or get_instruction_window(address, before_count, after_count).

  • Missing capability: scan all instructions for a literal operand or address token.

  • Current fallback: custom PyGhidra scripts such as scan_4588_instruction_uses.py.

  • Why it matters: normal xref APIs can miss useful operand-text hits in partially analyzed regions.

  • Proposed MCP addition: search_instructions(query, mode=text|operand|address, limit?).

  • Missing capability: robust data-address xrefs that include operand-based uses even when the reference manager has none.

  • Current fallback: instruction-text scans and manual disassembly windows.

  • Why it matters: globals like 0x4588 can be heavily used before formal references exist in the database.

  • Proposed MCP addition: get_data_uses(address, include_operand_scans=true).

Batch and transactional edits

  • Missing capability: apply a small transactional edit plan containing function removals, function creations, renames, and comments.

  • Current fallback: local PyGhidra apply-plan with JSON.

  • Why it matters: boundary repair work is safer when a verified batch can be replayed atomically.

  • Proposed MCP addition: apply_program_edit_plan(plan) with dry-run support.

  • Missing capability: batch comment creation for a verified address set.

  • Current fallback: repeated single-address comment calls or PyGhidra plan files.

  • Why it matters: reverse-engineering batches often produce several related evidence comments at once.

  • Proposed MCP addition: set_comments(batch) and set_decompiler_comments(batch).

  • Missing capability: batch rename-by-address for a small verified set.

  • Current fallback: repeated rename_function_by_address calls or local plan files.

  • Why it matters: verified raw-import ports often land in short, evidence-backed batches.

  • Proposed MCP addition: rename_functions_by_address(batch).

Reanalysis and repair helpers

  • Missing capability: re-disassemble or reanalyze a small address range after patching bytes or changing function boundaries.

  • Current fallback: local scripted repair passes.

  • Why it matters: the far-call fixup workflow and boundary recovery both depend on deterministic reanalysis of touched ranges.

  • Proposed MCP addition: reanalyze_region(start, end, options?).

  • Missing capability: patch a small byte range and immediately re-disassemble affected instructions.

  • Current fallback: local PyGhidra repair scripts.

  • Why it matters: the NE far-call fixup pass was a major workflow improvement and is exactly the sort of task MCP should eventually support.

  • Proposed MCP addition: patch_bytes_and_reanalyze(start, bytes, comment?).

  • Missing capability: detect likely bad function overlaps or candidate function starts in a small range.

  • Current fallback: manual repair plus custom PyGhidra probing.

  • Why it matters: overlap repair is one of the main reasons the workflow still has to leave MCP.

  • Proposed MCP addition: analyze_function_boundaries(start, end) returning overlap warnings and candidate entries.

Read-only project access and scripting

  • Missing capability: open a locked project read-only or query a specified project clone directly from MCP.

  • Current fallback: local PyGhidra against an unlocked temporary project clone.

  • Why it matters: the GUI often owns the main project while read-only inspection still needs to continue.

  • Proposed MCP addition: read-only project selection/open options for all analysis endpoints.

  • Missing capability: run a small read-only script for one-off inspections that do not justify a permanent MCP endpoint yet.

  • Current fallback: local PyGhidra run-script --read-only.

  • Why it matters: several repo workflows start as one-off analysis helpers before they prove worth productizing.

  • Proposed MCP addition: a constrained run_readonly_script(script_text|script_path) endpoint with explicit safety limits.

Migrated entries from ghidra-mcp_wishlist.md

Short, concrete gaps hit during live Crusader work. Each entry records what MCP lacked, what fallback was needed, and what a useful MCP feature should look like.

Open Gaps (migrated)

Byte-pattern search across program memory

  • Status: implemented in local fork (2026-03-26)
  • Missing MCP capability: search raw bytes or byte patterns across the current program's mapped segments / address spaces.
  • Fallback used: manual read_region sweeps plus local Python over the MCP HTTP bridge to scan live Spanish CRUSADER.EXE memory for the jassica16 scan-code table.
  • Useful MCP feature:
    • search_bytes(pattern, start?, end?, segment_filter?, max_hits?)
    • accepts hex byte patterns with optional wildcards
    • returns exact hit addresses plus nearby hex context
  • Why it matters: this would have closed the Spanish cheat-sequence question directly inside MCP instead of forcing ad hoc local scripting.
  • Status update (2026-03-26): local fork now exposes search_bytes(pattern, start?, end?, segment_filter?, max_hits?) in both the Java plugin and Python bridge; it accepts ?? wildcards, scans mapped memory blocks, and returns machine-friendly hit lines with block names and nearby hex context.
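
The ?? wildcard matching can be sketched over a single in-memory block. This is an illustration of the matching technique only, not the plugin's implementation: the regex-based approach, the base parameter, and reporting overlapping hits are assumptions made for the example.

```python
import re


def compile_pattern(pattern):
    """Turn a spaced hex pattern such as '24 1e ?? 1f' into a bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # wildcard: any single byte
        elif len(token) == 2:
            # bytes.fromhex raises ValueError for non-hex tokens.
            parts.append(re.escape(bytes.fromhex(token)))
        else:
            raise ValueError("expected two hex digits or ??, got %r" % token)
    if not parts:
        raise ValueError("empty pattern")
    # DOTALL so the wildcard also matches the 0x0a byte.
    return re.compile(b"".join(parts), re.DOTALL)


def search_bytes(data, pattern, base=0, max_hits=100):
    """Return hit addresses (base + offset), including overlapping hits."""
    regex = compile_pattern(pattern)
    hits = []
    position = 0
    while len(hits) < max_hits:
        match = regex.search(data, position)
        if match is None:
            break
        hits.append(base + match.start())
        position = match.start() + 1  # step one byte to allow overlaps
    return hits
```

A pattern with no matches returns an empty list rather than raising, which is the machine-friendly empty-hit behavior requested elsewhere in this file.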

Reliable caller/xref recovery for local call sites

  • Status: implemented in local fork (2026-03-26)
  • Missing MCP capability: reliable function-call xrefs for near/local calls inside the active program.
  • Fallback used: manual search_instructions and instruction-window inspection because get_function_xrefs did not surface some obvious local call sites in the Spanish keyboard/helper cluster.
  • Useful MCP feature:
    • improve get_function_xrefs so it includes near calls, far calls, tail-call-style jumps, and thunk references consistently
    • or add get_callers(address_or_name, include_near=true, include_far=true, include_jumps=true)
  • Why it matters: tracing helper chains around hidden key-sequence code is slower and less reliable when local callers have to be reconstructed by text search.
  • Status update (2026-03-26): local fork now exposes get_callers(target, include_near=true, include_far=true, include_jumps=true, limit?), combining reference-manager hits with instruction-flow scans so local near-call sites show up even when plain xrefs are incomplete; get_function_xrefs now reuses the same caller recovery path.

Cross-program reads inside the same Ghidra project

  • Status: implemented in local fork (2026-03-26)
  • Missing MCP capability: read/query another program or assembly in the same project without switching the active program first.
  • Fallback used: indirect comparison against repo notes, workspace-side files, and ad hoc local scripts instead of querying /CRUSADER.EXE, /es/CRUSADER.EXE, /Writable/..., or other domain files side by side through MCP.
  • Useful MCP feature:
    • allow explicit target selectors on all read/query endpoints, not only write endpoints
    • example: read_region(start, end, project_dir?, project_name?, folder_path?, program_name?)
    • same for strings, functions, xrefs, data uses, decompile, disassemble, symbol lookup, and segment listing
  • Why it matters: live localized-build comparisons and writable-copy verification should not require changing the active Ghidra tab just to inspect another program.
  • Status update (2026-03-26): read/query endpoints in the local fork now accept optional explicit target selectors (project_dir, project_name, folder_path, program_name) and reuse the same target-resolution layer as write flows; this now covers method/class listings, segments, imports/exports, namespaces, data items, function lookup/listing, decompile/disassembly, symbol lookup, regions, instruction scans, strings, xrefs, and data-use queries.

Cross-project / cross-program compare tooling

  • Status: implemented in local fork (2026-03-26)
  • Missing MCP capability: first-class compare operations between two programs in the same project or across projects.
  • Fallback used: manual note-to-note comparison, address math, and repeated per-program queries.
  • Useful MCP feature:
    • compare_regions(left_program, left_range, right_program, right_range, mode=bytes|words|disasm|strings)
    • compare_strings(left_program, right_program, filter?)
    • compare_functions(left_program, left_addr_or_name, right_program, right_addr_or_name, mode=signature|disasm|decompile|xrefs)
    • machine-readable output with address pairs, similarity score, and differing bytes/instructions/strings
  • Why it matters: this would make English vs Spanish / Remorse vs Regret / raw vs live NE comparisons much faster and less error-prone.
  • Status update (2026-03-26): local fork now exposes compare_regions(...), compare_strings(...), and compare_functions(...) with left/right explicit target selectors; outputs are machine-friendly and include comparison mode, similarity score, and capped difference samples for byte/word, disassembly, string, signature, decompile, and xref views.
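
For the bytes mode, a similarity score with capped difference samples could be computed along these lines. This is a sketch under assumptions: compare_bytes, its result keys, and the positional (non-aligned) comparison are illustrative choices, not the fork's documented output format.

```python
def compare_bytes(left, right, left_base=0, right_base=0, max_samples=8):
    """Compare two byte ranges position by position.

    Returns the fraction of matching positions plus at most `max_samples`
    differing positions. A position present on only one side counts as a
    difference and is reported with None for the missing byte.
    """
    length = max(len(left), len(right))
    diff_count = 0
    samples = []
    for i in range(length):
        l_byte = left[i] if i < len(left) else None
        r_byte = right[i] if i < len(right) else None
        if l_byte != r_byte:
            diff_count += 1
            if len(samples) < max_samples:
                samples.append({
                    "left_addr": left_base + i,
                    "right_addr": right_base + i,
                    "left": l_byte,
                    "right": r_byte,
                })

    similarity = 1.0 if length == 0 else (length - diff_count) / length
    return {
        "mode": "bytes",
        "similarity": similarity,
        "diff_count": diff_count,
        "samples": samples,
    }
```

A positional comparison is cheap but treats a single inserted byte as a difference at every later offset. Localized builds with shifted code would probably want an alignment step before comparing.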

Port renames/comments/symbol facts between programs

  • Status: implemented in local fork (2026-03-26)
  • Missing MCP capability: apply verified names/comments from one program to another program with explicit provenance instead of re-entering them one by one.
  • Fallback used: manual rename/comment batches plus external notes to carry mapping provenance.
  • Useful MCP feature:
    • port_symbols(source_program, target_program, mappings, apply=names|comments|both, provenance_comment_template?)
    • support direct address maps, segment-relative maps, and user-supplied CSV/JSON mapping tables
    • dry-run mode showing collisions and ambiguous targets
  • Why it matters: porting verified English or raw-import findings into Spanish or live NE targets is a recurring workflow.
  • Status update (2026-03-26): local fork now exposes port_symbols(mappings, apply=names|comments|both, provenance_comment_template?, dry_run?) with source_* and target_* selectors; the bridge accepts a verified list of source/target address pairs and the plugin ports names plus PRE/EOL comments with optional provenance text and explicit-target save support.
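
The dry-run idea (report collisions and ambiguous targets before writing anything) can be sketched as below. plan_port, the status strings, and the template placeholders are hypothetical; the entry only specifies that mappings are verified source/target address pairs and that a provenance comment template is optional.

```python
from collections import Counter


def plan_port(mappings, source_names, target_names,
              provenance_template="ported from {source_addr:#x}"):
    """Classify each (source_addr, target_addr) pair without applying it.

    source_names / target_names map addresses to existing symbol names.
    """
    target_counts = Counter(dst for _, dst in mappings)
    plan = []
    for src, dst in mappings:
        name = source_names.get(src)
        existing = target_names.get(dst)
        if name is None:
            status = "skip_no_source_name"
        elif target_counts[dst] > 1:
            # Two source addresses claim the same target address.
            status = "ambiguous_target"
        elif existing is not None and existing != name:
            status = "collision"
        elif existing == name:
            status = "unchanged"
        else:
            status = "apply"
        comment = None
        if status == "apply":
            comment = provenance_template.format(source_addr=src, target_addr=dst)
        plan.append({
            "source_addr": src,
            "target_addr": dst,
            "name": name,
            "existing": existing,
            "status": status,
            "comment": comment,
        })
    return plan
```

Only rows with status "apply" would be written in a real run. Everything else is surfaced for review, which keeps the provenance trail honest.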

Project inventory / browse endpoint

  • Status: implemented in local fork (2026-03-26)
  • Missing MCP capability: list project folders and available programs through MCP.
  • Fallback used: repo-side assumptions and local tooling; the current MCP read tools expose only the active program cleanly.
  • Useful MCP feature:
    • list_project_programs(project_dir?, project_name?, folder_path?, recursive=true)
    • returns folder path, program name, read-only/writable/versioned state, and whether it is currently open
  • Why it matters: comparing or porting across programs is awkward without a discoverable inventory of assemblies already in the Ghidra project.
  • Status update (2026-03-26): local fork now exposes list_project_programs(project_dir?, project_name?, folder_path?, recursive=true) plus a project_programs alias; it walks project folders and returns machine-friendly program inventory lines with folder path, program name, content type, read-only/versioned flags, and current-open state.
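
The folder_path and recursive parameters suggest filtering semantics roughly like the sketch below. The record shape and the filter_inventory helper are assumptions for illustration; the real endpoint walks Ghidra project folders rather than filtering a flat list.

```python
def filter_inventory(records, folder_path="/", recursive=True):
    """Select inventory records under `folder_path`.

    Each record is assumed to be a dict with at least a "folder" key
    (for example "/es") and a "name" key (for example "CRUSADER.EXE").
    With recursive=False only programs directly in the folder are returned.
    """
    root = folder_path.rstrip("/") or "/"
    prefix = "/" if root == "/" else root + "/"
    selected = []
    for rec in records:
        folder = rec["folder"].rstrip("/") or "/"
        if folder == root:
            selected.append(rec)
        elif recursive and folder.startswith(prefix):
            selected.append(rec)
    return selected
```

Matching on root plus a trailing slash avoids treating a sibling folder such as /esx as a child of /es.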

Class / namespace authoring for C++ lifting

  • Missing MCP capability: create and manage Ghidra class or namespace symbols, then move existing functions under those owners as methods.
  • Current fallback: manual Ghidra GUI edits in the Symbol Tree or one-off local scripts outside the normal MCP workflow.
  • Why it matters: the Remorse binary already shows repeated ctor/dtor patterns, stable vtable roots, and class-like object families, but the current MCP workflow can only rename flat functions. That blocks a disciplined shift from procedural naming toward grouped C++-style ownership.
  • Proposed MCP addition:
    • create_namespace(name, parent_path?, kind=namespace|class)
    • move_symbol_to_namespace(symbol_address_or_name, namespace_path, new_name?)
    • set_function_class(function_address, class_path, method_name?, this_param_name?, calling_convention?)
    • machine-friendly responses that include the final symbol path and any rename collisions
  • Status update (2026-04-05): local fork now exposes create_namespace(...), list_namespace_members(...), move_symbol_to_namespace(...), and set_function_class(...) in both the Java plugin and Python bridge. The implementation supports explicit target selectors, dry-run moves, collision policies (fail|keep_existing|rename_incoming), and compatibility aliases (create_class, move_function_to_class).
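
The three collision policies named above (fail, keep_existing, rename_incoming) could behave roughly as in this sketch. resolve_move and the numeric-suffix renaming scheme are assumptions; the entry does not say how the fork picks a new name for the incoming symbol.

```python
def resolve_move(existing_names, incoming_name, policy="fail"):
    """Decide what happens when a symbol moves into a namespace.

    existing_names: names already present in the destination namespace.
    Returns (action, final_name), where action is "moved", "skipped",
    or "renamed".
    """
    if incoming_name not in existing_names:
        return ("moved", incoming_name)

    if policy == "fail":
        raise ValueError(f"name collision: {incoming_name}")
    if policy == "keep_existing":
        # Leave the destination untouched and skip the incoming symbol.
        return ("skipped", incoming_name)
    if policy == "rename_incoming":
        # Assumed scheme: append the first free numeric suffix.
        suffix = 1
        while f"{incoming_name}_{suffix}" in existing_names:
            suffix += 1
        return ("renamed", f"{incoming_name}_{suffix}")

    raise ValueError(f"unknown collision policy: {policy}")
```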

Vtable / OO recovery helpers for class reconstruction

  • Missing MCP capability: first-class helpers for identifying vtables, attaching function slots to candidate classes, and materializing class/instance layouts from evidence-backed data.
  • Current fallback: manual note collation from decompiler/disassembly output plus ad hoc datatype work in the GUI.
  • Why it matters: the repo already has enough evidence to start lifting major families into C++ classes, but a recompilable source path needs more than renamed functions. It needs reproducible vtable maps, this-pointer typing, field layouts, inheritance guesses, and explicit provenance for each class model.
  • Proposed MCP addition:
    • analyze_vtable(address, slot_count?, namespace_path?)
    • create_or_update_struct(name, size?, fields)
    • set_function_this_type(function_address, struct_name, this_storage=stack|register|farptr)
    • apply_class_layout(class_path, instance_struct, vtable_struct?, methods)
    • optional dry-run output showing inferred slots, unresolved targets, and conflicting field/size evidence.
  • Status update (2026-04-05): local fork now exposes analyze_vtable(...), create_or_update_struct(...), create_or_update_vtable(...), set_function_this_type(...), and apply_class_layout(...) in both the Java plugin and Python bridge. Struct and vtable authoring accept line-encoded field/slot batches from the bridge, set_function_this_type(...) updates the first parameter to a typed this pointer while preserving storage when possible, and apply_class_layout(...) batches namespace moves plus this-pointer typing, with dry-run support. Compatibility aliases now also cover set_this_type and build_vtable.
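
To make the slot-inference idea concrete, here is a minimal sketch of decoding candidate vtable slots from raw bytes. It assumes 4-byte little-endian far pointers stored as offset then segment, which matches 16-bit x86 far pointers but is not confirmed as the layout analyze_vtable uses. decode_far_vtable and its output keys are hypothetical.

```python
import struct


def decode_far_vtable(data, slot_count, base_offset=0):
    """Decode up to `slot_count` far-pointer slots from `data`.

    Each slot is read as two little-endian 16-bit words: offset, then
    segment. A null pointer is reported as unresolved. Decoding stops
    early if the buffer ends before slot_count slots are read.
    """
    slots = []
    for index in range(slot_count):
        pos = base_offset + index * 4
        if pos + 4 > len(data):
            break
        offset, segment = struct.unpack_from("<HH", data, pos)
        slots.append({
            "slot": index,
            "segment": segment,
            "offset": offset,
            "resolved": (segment, offset) != (0, 0),
        })
    return slots
```

A real pass would also check that each target lands on a known function entry. That check is what separates a vtable from an unrelated table of far pointers.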