Crusader_Decomp/ghidra_mcp_wishlist.md
2026-04-07 00:15:44 +02:00

Ghidra MCP Wishlist

This file records concrete MCP gaps hit during Crusader workflow passes.

Rules for keeping it useful:

  • Put only unresolved work in Remaining TODOs.
  • Move implemented or source-fixed items to Done / Implemented In Local Fork.
  • Keep each remaining item short: missing capability, fallback, why it matters, proposed behavior, latest status.

Remaining TODOs

Class-Lift Typing Live Parity

  • Missing capability: end-to-end live-session parity for storage-aware this typing on 16-bit NE methods whose current storage does not match the default pointer storage the binder would choose.
  • Current fallback: use local PyGhidra with DYNAMIC_STORAGE_ALL_PARAMS, or move methods with set_function_class(...) and defer final this typing/manual prototype cleanup.
  • Why it matters: EntityVmContext lifecycle methods and EntityVmRuntime::Create still need the live MCP path to behave like the verified local PyGhidra repair flow.
  • Proposed MCP behavior: set_function_this_type(...) and apply_class_layout(...) should reliably fall back to dynamic storage in-session for these 16-bit cases, while preserving structured per-method warnings instead of aborting the batch.
  • Latest status (2026-04-06): local PyGhidra confirmed that 1420:0eec, 1420:10b6, 1420:10da, 1420:1162, 1420:118f, and 1420:1278 accept EntityVmContext * this cleanly via DYNAMIC_STORAGE_ALL_PARAMS. The live storage-aware path now also accepts explicit /Remorse/EntityVmRuntime *32, /Remorse/EntityVmOwnerResource *32, /Remorse/EntityVmContext *32, and /Remorse/EntityVmSlotEntry *32 signatures in-session, provided the exact *32 datatype has already been resolved into the program data-type manager. The remaining live gap is mostly deeper mixed-width parameter packs such as 1420:0eec CreateFromSlotIndex, not the previously blocked 4-byte object-pointer cases.
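
The proposed behavior above (try the default storage, fall back to dynamic storage, and record a per-method warning instead of aborting) can be sketched in plain Python. This is only an illustration of the control flow: apply_this_types, BatchResult, and the two apply_* callables are hypothetical stand-ins, not the fork's actual implementation or a Ghidra API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BatchResult:
    """Machine-friendly outcome of one typing batch (hypothetical shape)."""
    applied: List[str] = field(default_factory=list)
    warnings: List[Dict[str, str]] = field(default_factory=list)


def apply_this_types(
    addresses: List[str],
    this_type: str,
    apply_default: Callable[[str, str], None],
    apply_dynamic: Callable[[str, str], None],
) -> BatchResult:
    """Type each method, preferring default storage, without aborting the batch.

    Both callables are assumed to raise ValueError when the binder rejects
    the requested storage for that method.
    """
    result = BatchResult()
    for addr in addresses:
        try:
            apply_default(addr, this_type)
        except ValueError as err:
            default_error = str(err)
        else:
            result.applied.append(addr)
            continue

        # Default storage was rejected: retry with dynamic storage.
        try:
            apply_dynamic(addr, this_type)
        except ValueError as err:
            result.warnings.append({
                "address": addr,
                "level": "error",
                "message": f"default: {default_error}; dynamic: {err}",
            })
        else:
            result.applied.append(addr)
            result.warnings.append({
                "address": addr,
                "level": "info",
                "message": f"fell back to dynamic storage: {default_error}",
            })
    return result
```

In a real session the two callables would wrap whatever the plugin uses to update the function signature; here they are injected so the soft-fail logic can be exercised without Ghidra.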

Storage-Aware Prototype Live Verification

  • Missing capability: confirmed live-session parity for the newest storage-aware prototype fixes on 16-bit NE repair cases.
  • Current fallback: if the active GUI session is on an older plugin build, reload the plugin; if parity still fails, use local PyGhidra or manually compensate when testing stack offsets / calling conventions.
  • Why it matters: 1000:42e2 and 1420:1499 are the known proof cases for explicit return storage, stack-word parameter modeling, and 16-bit far calling conventions.
  • Proposed MCP behavior: set_function_prototype_storage(...) should accept bare stack: offsets in the same hex-style form used in current workflow notes and should preserve exact calling-convention tokens such as __cdecl16far before falling back to lossy legacy normalization.
  • Latest status (2026-04-06): the reloaded live plugin now reaches the real storage-aware implementation in-session on both proof cases, and explicit AX:DX return storage survives correctly on 1000:42e2 and 1420:1499. The remaining live parity issue is narrower: passing calling_convention='__cdecl16far' through the endpoint still normalizes the convention to plain __cdecl on those proof cases, yet a direct live run_write_script(...) call can immediately restore __cdecl16far. That proves the live database accepts the exact convention token, which leaves the endpoint-side normalization/deployment path as the remaining gap.
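
A minimal sketch of the "exact token first, legacy normalization second" ordering described above. The function name, the LEGACY_NORMALIZATION table, and the accepted set are assumptions made for illustration; the real endpoint's normalization rules are not documented in this file.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical legacy mapping; only the __cdecl16far -> __cdecl collapse is
# mentioned in these notes.
LEGACY_NORMALIZATION = {
    "__cdecl16far": "__cdecl",
}


def choose_calling_convention(
    requested: str, accepted: Iterable[str]
) -> Tuple[str, Optional[str]]:
    """Return (convention_to_apply, warning_or_None).

    `accepted` stands in for the convention names the target program's
    compiler spec reports as valid.
    """
    accepted_set = set(accepted)

    # 1. Prefer the exact token the caller asked for.
    if requested in accepted_set:
        return requested, None

    # 2. Only then fall back to the lossy legacy form, and say so.
    legacy = LEGACY_NORMALIZATION.get(requested)
    if legacy is not None and legacy in accepted_set:
        return legacy, f"normalized {requested!r} to {legacy!r}"

    raise ValueError(f"unsupported calling convention: {requested!r}")
```

Returning the warning alongside the chosen token keeps the lossy path visible to the caller instead of silently collapsing a far convention.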

Live Metadata / Read-Target Verification

  • Missing capability: fully verified live-session parity for selector-aware reads and metadata helpers in mixed-build or partially refreshed GUI sessions.
  • Current fallback: bridge alias retries, explicit-target normalization, and manual project-note cross-checks when a live session still behaves like an older plugin build.
  • Why it matters: Crusader work routinely needs side-by-side reads across /CRUSADER.EXE, /es/CRUSADER.EXE, /Writable/..., and other project entries without changing the active Ghidra tab.
  • Proposed MCP behavior: list_project_programs(...), get_runtime_capabilities(...), get_callers(...), and other selector-aware read helpers should bind reliably to the requested or active target and return structured unsupported-state output instead of raw context failures.
  • Latest status (2026-04-06): the local fork already includes alias fallbacks and Windows path/folder normalization for explicit-target matching. Remaining work is live-session verification after plugin refresh, not additional local source coverage.
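
The alias-retry fallback and the structured unsupported-state output can be sketched together. This is a hedged illustration: call_with_alias_retry and the send callable are hypothetical, and the result shape is an assumption. Only the alias pair list_project_programs / project_programs and the "No context found for request" text come from these notes.

```python
from typing import Any, Callable, Dict, List, Tuple

NO_CONTEXT = "No context found for request"


def call_with_alias_retry(
    send: Callable[[str, Dict[str, Any]], Tuple[int, Any]],
    endpoint_names: List[str],
    params: Dict[str, Any],
) -> Dict[str, Any]:
    """Try each endpoint name in order until one is served by the live plugin.

    `send(name, params)` is assumed to return (http_status, body).
    """
    attempts = []
    for name in endpoint_names:
        status, body = send(name, params)
        attempts.append({"endpoint": name, "status": status})
        # An older plugin build answers unknown routes this way; try the alias.
        if status == 404 or NO_CONTEXT in str(body):
            continue
        return {
            "ok": status == 200,
            "endpoint": name,
            "body": body,
            "attempts": attempts,
        }
    # Every alias was missing: report a structured unsupported state
    # rather than surfacing the raw context failure.
    return {"ok": False, "error": "unsupported", "attempts": attempts}
```

A caller would pass something like ["list_project_programs", "project_programs"] and inspect the "attempts" list to see which build actually answered.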

Done / Implemented In Local Fork

Transport And Runtime

  • POST endpoints now accept both application/json and application/x-www-form-urlencoded request bodies. Unsupported POST payloads fail early with unsupported-content-type instead of degrading into missing-parameter errors.
  • get_runtime_capabilities() reports readonly/write-script capability state and run_readonly_script(...) returns structured unsupported-state output when Python support is unavailable.
  • run_write_script(...) and alias run_transaction_script(...) are implemented with dry-run support, explicit target selectors, a write-policy denylist, and machine-friendly transaction results.
  • Bridge runtime helpers retry compatible aliases when a mixed-build live session answers with 404 or "No context found for request".
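
The dual content-type handling for POST bodies can be illustrated with the standard library alone. This is a sketch under assumptions, not the plugin's code: parse_post_body and UnsupportedContentType are hypothetical names, and the plugin's real parser may treat repeated keys or non-object JSON differently.

```python
import json
from typing import Any, Dict
from urllib.parse import parse_qs


class UnsupportedContentType(ValueError):
    """Raised early so the caller never sees a misleading missing-parameter error."""


def parse_post_body(content_type: str, raw: bytes) -> Dict[str, Any]:
    """Decode a POST body into a flat parameter dict."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "application/json":
        data = json.loads(raw.decode("utf-8"))
        if not isinstance(data, dict):
            raise ValueError("JSON body must be an object")
        return data

    if media_type == "application/x-www-form-urlencoded":
        parsed = parse_qs(raw.decode("utf-8"), keep_blank_values=True)
        # parse_qs returns lists; keep the last value for each key.
        return {key: values[-1] for key, values in parsed.items()}

    raise UnsupportedContentType(f"unsupported-content-type: {media_type}")
```

Both encodings end up in the same dict shape, so endpoint handlers do not need to know which one the client used.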

Explicit Targeting And Project Access

  • Explicit write targeting is implemented for edit flows such as apply_program_edit_plan(...) and patch_bytes_and_reanalyze(...) with deterministic save behavior.
  • Selector-aware read/query endpoints now accept project_dir, project_name, folder_path, and program_name and reuse the same target-resolution layer as write flows.
  • Target matching now normalizes Windows path casing and slash style and can infer missing project selectors from the active domain file when appropriate.
  • list_project_programs(...) plus alias project_programs is implemented and returns machine-friendly folder/program inventory.
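
A small sketch of the casing and slash normalization described above, assuming that case-insensitive comparison is acceptable for every selector. The helper names and the example paths in the comments are hypothetical; the fork's actual matching rules may be stricter.

```python
def normalize_selector_path(path: str) -> str:
    """Fold a Windows-style or project-style path into one comparable form."""
    unified = path.replace("\\", "/")
    # Remember whether this was a rooted project path such as /es/CRUSADER.EXE.
    rooted = unified.startswith("/")
    # Collapse duplicate and trailing slashes.
    parts = [part for part in unified.split("/") if part]
    joined = "/".join(parts).casefold()
    return "/" + joined if rooted else joined


def same_target(requested: str, candidate: str) -> bool:
    """True when two selectors refer to the same path after normalization."""
    return normalize_selector_path(requested) == normalize_selector_path(candidate)
```

With this, a selector written as C:\Ghidra\Projects\ and one written as c:/ghidra/projects compare equal, as do /es/CRUSADER.EXE and \es\crusader.exe.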

Analysis, Inspection, And Xrefs

  • Function boundary repair helpers are implemented: create_function_by_address(...), delete_function_by_address(...), and get_function_containing(...).
  • Arbitrary memory/code inspection helpers are implemented: read_region(...), disassemble_region(...), get_instruction_window(...), search_instructions(...), and get_data_uses(...).
  • search_bytes(...) is implemented with ?? wildcards and machine-friendly hit output.
  • Caller/xref recovery is improved via get_callers(...), and get_xrefs_to(...) / get_xrefs_from(...) return typed reference kinds plus containing-function metadata.
  • get_symbol_at(address) now uses direct routes when present and bridge-side legacy fallbacks when the live process is older.
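
For reference, ?? wildcard matching of the kind search_bytes(...) offers can be sketched over an in-memory buffer. find_pattern below is a local stand-in written for illustration, not the endpoint itself; the space-separated pattern syntax and the hit dictionary layout are assumptions.

```python
from typing import Dict, List, Optional, Union


def parse_pattern(pattern: str) -> List[Optional[int]]:
    """Turn '9a ?? ?? 56 78' into [0x9a, None, None, 0x56, 0x78]."""
    parsed: List[Optional[int]] = []
    for token in pattern.split():
        if token == "??":
            parsed.append(None)  # wildcard: matches any byte
        elif len(token) == 2:
            parsed.append(int(token, 16))
        else:
            raise ValueError(f"bad pattern token: {token!r}")
    return parsed


def find_pattern(
    data: bytes, pattern: str, base: int = 0
) -> List[Dict[str, Union[int, str]]]:
    """Return every offset (plus `base`) where the pattern matches."""
    wanted = parse_pattern(pattern)
    hits: List[Dict[str, Union[int, str]]] = []
    if not wanted:
        return hits
    for start in range(len(data) - len(wanted) + 1):
        if all(b is None or data[start + i] == b for i, b in enumerate(wanted)):
            window = data[start:start + len(wanted)]
            hits.append({"offset": base + start, "bytes": window.hex()})
    return hits
```

Searching for 9a ?? ?? 56 78 this way matches any five-byte run that starts with 9a and ends with 56 78, whatever the two middle bytes are.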

Batch Edits And Comparison Tools

  • Batch helpers are implemented: set_comments(...), set_decompiler_comments(...), rename_functions_by_address(...), and apply_program_edit_plan(...) with dry-run support.
  • Reanalysis helpers are implemented: reanalyze_region(...), patch_bytes_and_reanalyze(...), and analyze_function_boundaries(...).
  • Cross-program comparison helpers are implemented: compare_regions(...), compare_strings(...), and compare_functions(...).
  • port_symbols(...) now ports verified names/comments between programs with provenance text and explicit source/target selectors.
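
The dry-run pattern shared by these batch helpers can be sketched against an in-memory stand-in for the program database. Everything here is hypothetical (the function name, the dict used as a database, and the result fields); it only shows how a plan can be computed first and applied only when dry_run is off.

```python
from typing import Any, Dict


def plan_function_renames(
    names_by_address: Dict[str, str],
    renames: Dict[str, str],
    dry_run: bool = True,
) -> Dict[str, Any]:
    """Compute rename changes and apply them only when dry_run is False.

    `names_by_address` is a plain dict standing in for the program database.
    """
    changes = []
    warnings = []
    for address, new_name in renames.items():
        old_name = names_by_address.get(address)
        if old_name is None:
            warnings.append({"address": address, "message": "no function at address"})
            continue
        if old_name == new_name:
            continue  # nothing to do for this entry
        changes.append({"address": address, "from": old_name, "to": new_name})

    if not dry_run:
        for change in changes:
            names_by_address[change["address"]] = change["to"]

    return {"dry_run": dry_run, "changes": changes, "warnings": warnings}
```

Running once with dry_run=True and reviewing "changes" and "warnings" before repeating with dry_run=False mirrors how the edit-plan helpers are meant to be used.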

Class / Namespace / OO Recovery

  • Namespace and class authoring helpers are implemented: create_namespace(...), create_class(...), list_namespace_members(...), move_symbol_to_namespace(...), and set_function_class(...).
  • Vtable and struct helpers are implemented: analyze_vtable(...), create_or_update_struct(...), create_or_update_vtable(...), and alias coverage such as build_vtable / set_this_type.
  • set_function_this_type(...) supports storage-strategy hints and apply_class_layout(...) now soft-fails per-method typing with structured warnings instead of aborting the whole batch.

Prototype And Storage Modeling

  • The storage-aware prototype endpoint is implemented as set_function_prototype_storage(...) with alias set_storage_aware_prototype(...).
  • The endpoint accepts declarative return_type, return_storage, ordered parameter lines (name|type|storage), explicit target selectors, varargs, and machine-friendly warnings when hidden __return_storage_ptr__ state is still present.
  • Source-level fixes landed on 2026-04-06 for the two known live correctness bugs:
    • stack: storage is now parsed before generic Ghidra deserialization so workflow-style bare stack offsets are interpreted consistently.
    • exact calling-convention tokens are tried before legacy normalization so 16-bit far conventions such as __cdecl16far are not needlessly collapsed to plain __cdecl when the exact token is accepted.
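
To make the name|type|storage parameter lines and the stack: pre-parsing concrete, here is a hedged sketch. It assumes bare stack offsets are hexadecimal (with or without a 0x prefix) and that anything else is a colon-separated register list such as AX:DX; the helper names and the returned dictionaries are illustrative, not the endpoint's real internals.

```python
from typing import Any, Dict


def parse_storage(storage: str) -> Dict[str, Any]:
    """Interpret one storage token before any generic deserialization."""
    text = storage.strip()
    if text.lower().startswith("stack:"):
        # Assumption: bare offsets are hex, so "stack:a" means offset 10.
        offset_text = text.split(":", 1)[1].strip()
        return {"kind": "stack", "offset": int(offset_text, 16)}
    # Otherwise treat the token as one or more registers, e.g. "AX:DX".
    return {"kind": "register", "registers": text.split(":")}


def parse_parameter_line(line: str) -> Dict[str, Any]:
    """Split 'name|type|storage' into its three declared fields."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 3:
        raise ValueError(f"expected name|type|storage, got: {line!r}")
    name, type_name, storage = fields
    return {"name": name, "type": type_name, "storage": parse_storage(storage)}
```

Handling the stack: prefix first is the point of the fix: the token never reaches a generic parser that might read the offset in a different base or reject the bare form.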

Historical Notes

  • If a future pass hits a new MCP gap, add it under Remaining TODOs and move it to Done / Implemented In Local Fork once the local source and bridge support are both in place.