# GhidraMCP Class-Lifting Endpoint Spec

## Purpose

This note drafts the endpoint surface needed to support the Remorse class-lifting workflow described in `docs/remorse-cpp-decompilation-plan.md` and grounded by `docs/remorse-class-candidate-inventory.md`.

This is not an implementation batch. It is a local design spec. When MCP work resumes, the endpoint set can then be built to match the actual reverse-engineering workflow rather than a generic symbol-edit API.
## Design Goals

The new endpoints should make these workflows cheap and repeatable:

1. create class and namespace containers in Ghidra without touching the GUI
2. move already-renamed flat functions under explicit class ownership
3. build typed instance structs and typed vtables from verified evidence
4. attach `this`-pointer semantics and method signatures to recovered methods
5. preserve ambiguity when evidence is partial instead of forcing speculative class conversions
6. support dry-run review before any bulk symbol or datatype mutation
## Non-Goals

- automatic recovery of class hierarchies from raw heuristics alone
- one-shot `convert whole binary to C++ classes`
- speculative inheritance inference without vtable or field evidence
- silent symbol moves that hide rename collisions or ownership conflicts
## Existing MCP Behavior To Reuse

The local fork already has patterns worth reusing:

- explicit target selectors: `project_dir`, `project_name`, `folder_path`, `program_name`
- dry-run oriented edit-plan behavior
- machine-friendly outputs rather than prose-heavy summaries
- backward-compatible aliases when route names change

Every new class-lifting endpoint should follow the same conventions.
## Core Object Model Assumptions

The class-lifting workflow needs to manipulate four kinds of things explicitly:

1. namespace/class containers in the symbol tree
2. function ownership and method naming
3. datatypes for instance structs and vtables
4. binding metadata between methods, vtable slots, and instance layouts

That means symbol-only endpoints are not enough. Datatype endpoints and method-binding endpoints are part of the minimum viable feature set.
## Proposed Endpoints

### 1. `create_namespace`

Create a namespace or class container.

Parameters:

- `name`: string
- `parent_path`: string, optional
- `kind`: enum `namespace|class`, default `namespace`
- explicit target selectors, optional

Response:

- `status`
- `created`: bool
- `kind`
- `path`
- `symbol_id`, or an equivalent stable identifier if one is available
- `collision`: details of the existing path when creation is skipped or merged

Why it matters:

- lets the workflow create `Entity`, `SpriteNode`, `EntityVmRuntime`, or similar owners before moving methods
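Below is a minimal sketch of the idempotent create path. It assumes `::`-separated paths, the form this note uses for `Remorse::EntityDispatchEntry`. It also uses an in-memory `Node` tree as a hypothetical stand-in for Ghidra's symbol table. This spec does not settle whether missing parents should be created implicitly, so the sketch refuses them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one container in the symbol tree."""
    kind: str  # "namespace" or "class"
    children: dict = field(default_factory=dict)


def create_namespace(root: Node, name: str, parent_path: str = "",
                     kind: str = "namespace") -> dict:
    """Create `name` under `parent_path` ('::'-separated), reusing an existing node."""
    if kind not in ("namespace", "class"):
        return {"status": "error", "error": f"unknown kind {kind!r}"}
    parent = root
    for segment in filter(None, parent_path.split("::")):
        if segment not in parent.children:
            # Assumption: parents are never created implicitly.
            return {"status": "error", "error": f"missing parent {parent_path!r}"}
        parent = parent.children[segment]
    path = f"{parent_path}::{name}" if parent_path else name
    existing = parent.children.get(name)
    if existing is not None:
        # A repeat call is reported rather than treated as an error, so batches stay idempotent.
        return {"status": "ok", "created": False, "kind": existing.kind, "path": path,
                "collision": {"path": path, "existing_kind": existing.kind,
                              "requested_kind": kind}}
    parent.children[name] = Node(kind)
    return {"status": "ok", "created": True, "kind": kind, "path": path,
            "collision": None}
```

A second identical call returns `created: false` and reports the existing kind in `collision`. That behavior is what lets a batch script be rerun safely.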
### 2. `list_namespace_members`

Return members of a namespace or class container in a machine-friendly form.

Parameters:

- `path`: string
- `include_child_namespaces`: bool, default `false`
- `include_functions`: bool, default `true`
- `include_data`: bool, default `true`
- explicit target selectors, optional

Response:

- `status`
- `path`
- `members`: array of `{ kind, name, address?, datatype?, child_count? }`

Why it matters:

- needed for inventory verification and idempotent batch moves
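The filter below illustrates the machine-friendly shape. It is a hypothetical helper, not the fork's code. It applies the three `include_*` flags to raw member records and drops unset optional keys. The sketch also assumes deterministic ordering as a design choice, so that repeated listings can be diffed.

```python
def list_namespace_members(members: list, path: str,
                           include_child_namespaces: bool = False,
                           include_functions: bool = True,
                           include_data: bool = True) -> dict:
    """Filter raw member records into the proposed response shape.

    `members` is a hypothetical list of dicts with "kind" ("namespace",
    "class", "function", or "data"), "name", and optional extra keys.
    """
    wanted = set()
    if include_child_namespaces:
        wanted |= {"namespace", "class"}
    if include_functions:
        wanted.add("function")
    if include_data:
        wanted.add("data")
    out = []
    # Sorting makes repeated listings identical, so before/after snapshots
    # of a batch move can be compared directly.
    for m in sorted(members, key=lambda m: (m["kind"], m["name"])):
        if m["kind"] in wanted:
            # Drop optional keys (address?, datatype?, child_count?) that are unset.
            out.append({k: v for k, v in m.items() if v is not None})
    return {"status": "ok", "path": path, "members": out}
```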
### 3. `move_symbol_to_namespace`

Move a function or data symbol under a namespace/class.

Parameters:

- `symbol_address`: string, optional
- `symbol_name`: string, optional
- `namespace_path`: string
- `new_name`: string, optional
- `conflict_policy`: enum `fail|keep_existing|rename_incoming`, default `fail`
- `dry_run`: bool, default `false`
- explicit target selectors, optional

One of `symbol_address` or `symbol_name` is required.

Response:

- `status`
- `moved`: bool
- `old_path`
- `new_path`
- `collision`: optional structured collision detail

Why it matters:

- this is the basic operation needed to turn flat functions into methods once evidence is verified
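The conflict handling is the part most worth pinning down early. This sketch models the program as a plain `{full_path: address}` dict. Two details are assumptions, not decisions recorded elsewhere: the `_N` suffix used by `rename_incoming`, and the rule that exactly one selector is accepted.

```python
def move_symbol_to_namespace(symbols: dict, namespace_path: str,
                             symbol_address=None, symbol_name=None,
                             new_name=None, conflict_policy="fail",
                             dry_run=False) -> dict:
    """Sketch of the move logic over a hypothetical {full_path: address} map."""
    if conflict_policy not in ("fail", "keep_existing", "rename_incoming"):
        return {"status": "error", "error": f"bad conflict_policy {conflict_policy!r}"}
    # Assumption: exactly one selector is accepted, so lookups are never ambiguous.
    if (symbol_address is None) == (symbol_name is None):
        return {"status": "error", "error": "pass exactly one of symbol_address/symbol_name"}
    if symbol_name is not None:
        matches = [symbol_name] if symbol_name in symbols else []
    else:
        matches = [p for p, a in symbols.items() if a == symbol_address]
    if len(matches) != 1:
        return {"status": "error", "error": f"{len(matches)} matching symbols"}
    old_path = matches[0]
    leaf = new_name or old_path.split("::")[-1]
    new_path = f"{namespace_path}::{leaf}"
    collision = None
    if new_path in symbols and new_path != old_path:
        collision = {"type": "symbol_collision", "path": new_path,
                     "existing": symbols[new_path], "requested": symbols[old_path]}
        if conflict_policy != "rename_incoming":
            status = "conflict" if conflict_policy == "fail" else "skipped"
            return {"status": status, "moved": False, "old_path": old_path,
                    "new_path": new_path, "collision": collision}
        # rename_incoming: hypothetical numeric-suffix scheme (Init -> Init_1, ...).
        n = 1
        while f"{new_path}_{n}" in symbols:
            n += 1
        new_path = f"{new_path}_{n}"
    if not dry_run:
        symbols[new_path] = symbols.pop(old_path)
    return {"status": "ok", "moved": not dry_run, "old_path": old_path,
            "new_path": new_path, "collision": collision}
```

With `dry_run` set, the response still carries the final `new_path`, including any suffix. A reviewer can therefore see the exact rename before anything changes.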
### 4. `set_function_class`

High-level helper that moves a function into a class and applies method-oriented naming and signature metadata in one call.

Parameters:

- `function_address`: string
- `class_path`: string
- `method_name`: string
- `this_param_name`: string, optional, default `this`
- `calling_convention`: string, optional
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Response:

- `status`
- `function_address`
- `old_path`
- `new_path`
- `signature_before`
- `signature_after`

Why it matters:

- reduces the number of separate write operations for the common `move + rename + set this semantics` workflow
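The sketch below shows one way to compute `signature_before` and `signature_after` for a dry run. The function record shape and the `_render` format are hypothetical. The sketch also assumes that the recovered function's first parameter already carries the instance pointer, so that parameter is retyped rather than duplicated.

```python
def _render(return_type: str, convention: str, path: str, params: list) -> str:
    # Pointer types hug the parameter name ("Foo *this"); others get a space.
    args = ", ".join(t + n if t.endswith("*") else f"{t} {n}" for t, n in params)
    return f"{return_type} {convention} {path}({args or 'void'})"


def set_function_class(func: dict, class_path: str, method_name: str,
                       this_param_name: str = "this", calling_convention=None,
                       dry_run: bool = False) -> dict:
    """Move + rename + retype `this` in one step over a hypothetical function record."""
    new_path = f"{class_path}::{method_name}"
    convention = calling_convention or func["convention"]
    this_param = (class_path.split("::")[-1] + " *", this_param_name)
    # Assumption: the first existing parameter already holds the instance
    # pointer, so it is retyped in place instead of prepending a second one.
    new_params = [this_param] + list(func["params"][1:])
    result = {
        "status": "ok",
        "function_address": func["address"],
        "old_path": func["path"],
        "new_path": new_path,
        "signature_before": _render(func["return_type"], func["convention"],
                                    func["path"], func["params"]),
        "signature_after": _render(func["return_type"], convention, new_path, new_params),
    }
    if not dry_run:
        func.update(path=new_path, convention=convention, params=new_params)
    return result
```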
### 5. `create_or_update_struct`

Create or update a structure datatype.

Parameters:

- `name`: string
- `category_path`: string, optional
- `size`: integer, optional
- `packing`: integer, optional
- `fields`: array of field specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Each field spec:

- `offset`: integer
- `name`: string
- `datatype`: string
- `comment`: string, optional
- `confidence`: enum `high|medium|low`, optional

Response:

- `status`
- `datatype_path`
- `created_or_updated`: whether the datatype was created or updated
- `size`
- `field_count`
- `conflicts`: array, optional

Why it matters:

- class lifting without struct authoring cannot produce readable or recompilable source
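Overlapping fields and a too-small declared `size` can be detected before any struct write, without touching Ghidra. This sketch assumes a caller-supplied width table. It also treats every pointer as a 4-byte 16:16 far pointer, which should be revisited per field. `check_struct_fields` is a hypothetical name.

```python
def check_struct_fields(fields: list, sizes: dict, size=None, pointer_size: int = 4):
    """Return (effective_size, conflicts) for a list of field specs.

    `sizes` maps non-pointer datatype names to byte widths. Any datatype
    ending in '*' is treated as `pointer_size` bytes (assumed 4, i.e. a
    16:16 far pointer; near pointers would be 2).
    """
    conflicts = []
    end, owner = 0, None  # first free byte so far, and the field that reaches it
    for f in sorted(fields, key=lambda f: f["offset"]):
        dt = f["datatype"].strip()
        width = pointer_size if dt.endswith("*") else sizes[dt]  # KeyError if unknown
        if f["offset"] < end:
            # Overlap: report it instead of silently overwriting the earlier field.
            conflicts.append({"type": "datatype_collision",
                              "path": f"{f['name']}@{f['offset']}",
                              "existing": owner, "requested": f["name"]})
        if f["offset"] + width > end:
            end, owner = f["offset"] + width, f["name"]
    if size is not None and end > size:
        conflicts.append({"type": "datatype_collision", "path": "size",
                          "existing": size, "requested": end})
    return max(end, size or 0), conflicts
```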
### 6. `create_or_update_vtable`

Create or update a vtable datatype, modeled as a structure of function pointers.

Parameters:

- `name`: string
- `category_path`: string, optional
- `slots`: array of slot specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Each slot spec:

- `offset`: integer
- `name`: string
- `function_address`: string, optional
- `prototype`: string, optional
- `comment`: string, optional

Response:

- `status`
- `datatype_path`
- `slot_count`
- `bound_functions`: array of `{ offset, function_address, name }`

Why it matters:

- this is the missing datatype-side half of stable virtual dispatch recovery
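Slot validation for this endpoint can stay simple if every slot is one far function pointer. Under that assumption (4 bytes per slot), the schema example's `offset: 20` is slot index 5. Any offset that is not a multiple of 4, or that repeats, is reported as a `slot_conflict`. The helper name `build_vtable` is hypothetical.

```python
def build_vtable(name: str, slots: list, category_path: str = "/",
                 pointer_size: int = 4) -> dict:
    """Validate slot specs for a vtable struct and report bound functions (sketch).

    Assumes every slot is one far function pointer of `pointer_size` bytes.
    """
    accepted, conflicts = {}, []
    for s in slots:
        off = s["offset"]
        if off % pointer_size or off in accepted:
            conflicts.append({
                "type": "slot_conflict",
                "path": f"{name}+{off}",
                "existing": accepted[off]["name"] if off in accepted else None,
                "requested": s["name"],
            })
            continue
        accepted[off] = s
    # Only slots with a known target address count as bound.
    bound = [{"offset": off, "function_address": s["function_address"], "name": s["name"]}
             for off, s in sorted(accepted.items()) if s.get("function_address")]
    return {
        "status": "conflict" if conflicts else "ok",
        "datatype_path": f"{category_path.rstrip('/')}/{name}",
        "slot_count": len(accepted),
        "bound_functions": bound,
        "conflicts": conflicts,
    }
```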
### 7. `set_function_this_type`

Apply or update `this`-pointer typing on a function.

Parameters:

- `function_address`: string
- `this_type`: string
- `this_param_name`: string, optional, default `this`
- `this_storage`: enum `stack|register|farptr`, optional
- `calling_convention`: string, optional
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Response:

- `status`
- `function_address`
- `signature_before`
- `signature_after`

Why it matters:

- many decompiler improvements appear only after the instance type is correctly attached to the first argument
### 8. `analyze_vtable`

Read-side helper that inspects a suspected vtable region and emits slot candidates.

Parameters:

- `address`: string
- `slot_count`: integer, optional
- `stop_on_invalid_pointer`: bool, default `true`
- explicit target selectors, optional

Response:

- `status`
- `address`
- `slots`: array of `{ offset, target_address, target_name, is_function, current_owner?, comment? }`
- `warnings`: array, optional

Why it matters:

- this is the minimum analysis helper needed before class authorship is applied at scale
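A read-side decoder for 16-bit far-pointer tables might look like the sketch below. It assumes the table bytes already hold resolved segment:offset values. It uses a `functions` map and a `valid_segments` set as hypothetical stand-ins for program lookups. On x86, a far pointer is stored as the offset word followed by the segment word, both little-endian.

```python
import struct


def analyze_vtable(data: bytes, address: str, functions: dict, valid_segments: set,
                   slot_count=None, stop_on_invalid_pointer: bool = True) -> dict:
    """Decode a run of 16:16 far pointers into slot candidates.

    `functions` maps "ssss:oooo" entry points to names and `valid_segments`
    holds segment numbers known to exist; both stand in for program lookups.
    """
    slots, warnings = [], []
    available = len(data) // 4  # each far pointer occupies 4 bytes
    count = available if slot_count is None else min(slot_count, available)
    for i in range(count):
        # In memory a far pointer is the offset word followed by the segment word.
        off, seg = struct.unpack_from("<HH", data, i * 4)
        target = f"{seg:04x}:{off:04x}"
        if seg not in valid_segments:
            warnings.append(f"+{i * 4}: {target} is not in a known segment")
            if stop_on_invalid_pointer:
                break
        slots.append({"offset": i * 4, "target_address": target,
                      "target_name": functions.get(target),
                      "is_function": target in functions})
    return {"status": "ok", "address": address, "slots": slots, "warnings": warnings}
```

When `stop_on_invalid_pointer` is false, the invalid slot is kept with `is_function: false` and a warning. This preserves ambiguity rather than silently truncating the table.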
### 9. `apply_class_layout`

Bind a class namespace, instance struct, optional vtable struct, and a set of methods in one dry-runnable transaction.

Parameters:

- `class_path`: string
- `instance_struct`: string
- `vtable_struct`: string, optional
- `vtable_address`: string, optional
- `methods`: array of method specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Each method spec:

- `function_address`: string
- `method_name`: string
- `slot_offset`: integer, optional
- `is_virtual`: bool, default `false`
- `this_type`: string, optional
- `comment`: string, optional

Response:

- `status`
- `class_path`
- `applied_methods`
- `applied_structs`
- `warnings`

Why it matters:

- supports one-shot promotion of a verified family from notes into Ghidra, with explicit review first
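The dry-run half of this endpoint is mostly validation. The planner below is a hypothetical sketch. It rejects virtual methods without a slot or vtable, refuses two methods in one slot, and defaults `this_type` to a pointer to the instance struct. It returns the writes it would perform instead of performing them. The `plan` key and the `dry_run` status value are assumptions.

```python
def plan_class_layout(class_path: str, instance_struct: str, methods: list,
                      vtable_struct=None) -> dict:
    """Dry-run planner for apply_class_layout: validate specs, list intended writes."""
    plan, warnings, applied = [], [], []
    seen_addresses, slot_owner = set(), {}
    for m in methods:
        addr = m["function_address"]
        if addr in seen_addresses:
            warnings.append(f"{addr}: listed more than once")
            continue
        seen_addresses.add(addr)
        slot = m.get("slot_offset")
        is_virtual = m.get("is_virtual", False)
        if is_virtual:
            if slot is None or vtable_struct is None:
                warnings.append(f"{addr}: virtual method needs slot_offset and vtable_struct")
                continue
            if slot in slot_owner:
                warnings.append(f"{addr}: slot {slot} already bound to {slot_owner[slot]}")
                continue
            slot_owner[slot] = addr
        # Default `this` to a pointer to the instance struct when not given.
        this_type = m.get("this_type") or f"{instance_struct} *"
        plan.append(("move", addr, f"{class_path}::{m['method_name']}"))
        plan.append(("set_this_type", addr, this_type))
        if is_virtual:
            plan.append(("bind_slot", addr, slot))
        applied.append(addr)
    return {
        "status": "dry_run",
        "class_path": class_path,
        "applied_methods": applied,
        "applied_structs": [instance_struct] + ([vtable_struct] if vtable_struct else []),
        "warnings": warnings,
        "plan": plan,
    }
```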
### 10. `export_class_candidate`

Read-side export helper for documentation and source-generation prep.

Parameters:

- `class_path`: string
- `include_struct_fields`: bool, default `true`
- `include_vtable`: bool, default `true`
- `include_method_signatures`: bool, default `true`
- explicit target selectors, optional

Response:

- machine-friendly JSON-like object containing class metadata, methods, field layouts, and slot maps

Why it matters:

- the local docs and future C++ skeleton emission need a clean export surface, not just screen scraping
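The following emitter illustrates what a clean export surface buys. It is hypothetical and turns an export object, with a field layout and a method list in an assumed shape, into a C++ struct skeleton. Method declarations are placeholders until real signatures are exported.

```python
def emit_skeleton(export: dict) -> str:
    """Render a hypothetical export object as a C++ struct skeleton."""
    name = export["class_path"].split("::")[-1]
    lines = [f"struct {name} {{"]
    for f in sorted(export.get("fields", []), key=lambda f: f["offset"]):
        note = f"// +0x{f['offset']:02x}"
        if f.get("confidence"):
            note += f" confidence={f['confidence']}"
        lines.append(f"    {f['datatype']} {f['name']};  {note}")
    for m in export.get("methods", []):
        # Placeholder declaration: real return/parameter types would come from
        # the exported method signatures.
        prefix = "virtual " if m.get("is_virtual") else ""
        lines.append(f"    {prefix}void {m['method_name']}();  // {m['function_address']}")
    lines.append("};")
    return "\n".join(lines)
```

An export may carry both an explicit `vtable` field and `virtual` methods. In that case the emitter has to pick one representation, because a C++ compiler adds its own hidden vtable pointer for polymorphic classes and emitting both would double-count it.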
## Field Schemas

### Struct field schema

Recommended stable shape:

```json
{
  "offset": 0,
  "name": "vtable",
  "datatype": "EntityVTable *",
  "comment": "Primary vtable pointer",
  "confidence": "high"
}
```

### Method schema

```json
{
  "function_address": "0008:ba00",
  "method_name": "Init",
  "slot_offset": null,
  "is_virtual": false,
  "this_type": "EntityDispatchEntry *",
  "comment": "Base constructor-style init"
}
```

### Vtable slot schema

```json
{
  "offset": 20,
  "name": "OnEventType2",
  "function_address": "000b:3ab2",
  "prototype": "void (__far *OnEventType2)(SpriteNode *, Event *)"
}
```
## Transaction And Safety Rules

All write-capable class-lifting endpoints should support:

- `dry_run`
- explicit target selectors
- structured conflict reporting
- idempotent repeat calls where practical
- no silent overwrite of unrelated symbols or datatype fields

Recommended conflict output shape:

- `type`: `symbol_collision|datatype_collision|slot_conflict|owner_conflict|signature_conflict`
- `path` or `address`
- `existing`
- `requested`
- `resolution_options`
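The recommended conflict shape can be enforced with a small record type. This sketch uses a hypothetical `Conflict` class. It validates `type` against the five listed values and requires exactly one of `path` or `address`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

CONFLICT_TYPES = {"symbol_collision", "datatype_collision", "slot_conflict",
                  "owner_conflict", "signature_conflict"}


@dataclass
class Conflict:
    """One structured conflict record in the recommended output shape."""
    type: str
    existing: object
    requested: object
    path: Optional[str] = None
    address: Optional[str] = None
    resolution_options: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.type not in CONFLICT_TYPES:
            raise ValueError(f"unknown conflict type {self.type!r}")
        # The shape carries a path or an address, never both.
        if (self.path is None) == (self.address is None):
            raise ValueError("set exactly one of path or address")

    def to_json(self) -> str:
        record = asdict(self)
        # Emit only the locator that is in use.
        del record["address" if self.path is not None else "path"]
        return json.dumps(record, sort_keys=True)
```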
## Backward Compatibility And Aliases

Where practical, add aliases instead of replacing older names.

Recommended aliases:

- `create_class` -> `create_namespace(kind=class)`
- `move_function_to_class` -> `set_function_class`
- `set_this_type` -> `set_function_this_type`
- `build_vtable` -> `create_or_update_vtable`

This follows the local fork’s existing pattern of keeping compatibility wrappers when route names evolve.
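Alias dispatch can be a lookup table that rewrites the route name and injects any implied parameters. The sketch below uses `resolve_route` as a hypothetical name. It assumes alias callers use the canonical endpoint's parameter names, which still needs confirming for `move_function_to_class` and `set_this_type`.

```python
ALIASES = {
    # legacy route -> (canonical route, parameters the alias implies)
    "create_class": ("create_namespace", {"kind": "class"}),
    "move_function_to_class": ("set_function_class", {}),
    "set_this_type": ("set_function_this_type", {}),
    "build_vtable": ("create_or_update_vtable", {}),
}


def resolve_route(route: str, params: dict):
    """Map an alias to its canonical endpoint; unknown routes pass through unchanged."""
    canonical, implied = ALIASES.get(route, (route, {}))
    for key, value in implied.items():
        # Refuse contradictory calls such as create_class(kind="namespace").
        if key in params and params[key] != value:
            raise ValueError(f"{route} implies {key}={value!r}, got {params[key]!r}")
    return canonical, {**params, **implied}
```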
## Suggested Implementation Order

If implementation resumes later, the smallest useful sequence is:

1. `create_namespace`
2. `move_symbol_to_namespace`
3. `set_function_this_type`
4. `create_or_update_struct`
5. `analyze_vtable`
6. `create_or_update_vtable`
7. `apply_class_layout`
8. `export_class_candidate`

That order enables immediate manual class work after only the first three or four endpoints, while leaving the richer transactional workflows for later.
## First Real Workflow To Target

The first workflow this API should make easy is the pilot family from the current inventory:

### `EntityDispatchEntryBase` promotion workflow

1. create class namespace `Remorse::EntityDispatchEntry`
2. create instance struct `EntityDispatchEntry`
3. move `0008:ba00`, `0008:bca8`, `0008:bd53`, `0008:bf8e`, `0008:c01d`, `0008:dbec`, and constructor variants under that class as methods
4. attach `this` typing
5. analyze or define vtables `0x3b06`, `0x2d10`, `0x3afe`, `0x3ad2`, `0x3aa6`
6. export the class candidate for repo-side documentation and C++ skeleton generation

If the endpoint surface handles that family cleanly, it is probably sufficient for the rest of the early C++ lifting work.
## Open Questions To Resolve Later

- whether Ghidra class namespaces or plain namespaces produce better decompiler output in this 16-bit NE environment
- how best to encode far-pointer-aware `this` conventions in method signatures
- whether vtable datatypes should be attached to concrete memory addresses automatically or only on explicit request
- whether confidence annotations should live in datatype comments, decompiler comments, or external export metadata
## Summary

The endpoint surface needed here is not large, but it must span both symbol ownership and datatype authorship. If later MCP work only adds `move function into class`, the hardest part of the C++ lift will still be undone.

The minimum viable class-lifting feature set is therefore:

- namespace/class creation
- symbol-to-class moves
- `this` typing
- struct authoring
- vtable analysis/authoring
- one transactional `apply_class_layout` path