# GhidraMCP Class-Lifting Endpoint Spec

## Purpose

This note drafts the endpoint surface needed to support the Remorse class-lifting workflow described in `docs/remorse-cpp-decompilation-plan.md` and grounded in `docs/remorse-class-candidate-inventory.md`.

This is not an implementation batch. It is a local design spec, written so that when MCP work resumes, the endpoints can be built to match the actual reverse-engineering workflow rather than a generic symbol-edit API.

## Design Goals

The new endpoints should make these workflows cheap and repeatable:

1. create class and namespace containers in Ghidra without touching the GUI
2. move already-renamed flat functions under explicit class ownership
3. build typed instance structs and typed vtables from verified evidence
4. attach `this`-pointer semantics and method signatures to recovered methods
5. preserve ambiguity when evidence is partial instead of forcing speculative class conversions
6. support dry-run review before any bulk symbol or datatype mutation

## Non-Goals

- automatic recovery of class hierarchies from raw heuristics alone
- one-shot `convert whole binary to C++ classes`
- speculative inheritance inference without vtable or field evidence
- silent symbol moves that hide rename collisions or ownership conflicts

## Existing MCP Behavior To Reuse

The local fork already has patterns worth reusing:

- explicit target selectors: `project_dir`, `project_name`, `folder_path`, `program_name`
- dry-run-oriented edit-plan behavior
- machine-friendly outputs rather than prose-heavy summaries
- backward-compatible aliases when route names change

Every new class-lifting endpoint should follow the same conventions.
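
Because every endpoint shares the same selector and dry-run conventions, a client could assemble requests through a single helper. This is a minimal sketch under stated assumptions: the `TargetSelectors` and `build_request` names and the `{endpoint, params}` envelope are hypothetical; only the four selector names come from the existing fork.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TargetSelectors:
    """Explicit target selectors reused from the existing fork (all optional)."""
    project_dir: Optional[str] = None
    project_name: Optional[str] = None
    folder_path: Optional[str] = None
    program_name: Optional[str] = None

    def to_params(self) -> dict:
        # Send only the selectors the caller actually set.
        return {k: v for k, v in asdict(self).items() if v is not None}


def build_request(endpoint: str, params: dict,
                  target: Optional[TargetSelectors] = None,
                  dry_run: Optional[bool] = None) -> dict:
    """Hypothetical envelope: endpoint params + selectors + optional dry_run.

    Read-only endpoints pass dry_run=None so the flag is omitted entirely.
    """
    merged = dict(params)
    if target is not None:
        selectors = target.to_params()
        clash = merged.keys() & selectors.keys()
        if clash:
            raise ValueError(f"selector passed twice: {sorted(clash)}")
        merged.update(selectors)
    if dry_run is not None:
        merged["dry_run"] = dry_run
    return {"endpoint": endpoint, "params": merged}
```

Leaving `dry_run` out of read-only requests avoids implying that `list_namespace_members` or `analyze_vtable` can mutate anything.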

## Core Object Model Assumptions

The class-lifting workflow needs to manipulate four kinds of things explicitly:

1. namespace/class containers in the symbol tree
2. function ownership and method naming
3. datatypes for instance structs and vtables
4. binding metadata between methods, vtable slots, and instance layouts

That means symbol-only endpoints are not enough. Datatype endpoints and method-binding endpoints are part of the minimum viable feature set.

## Proposed Endpoints

### 1. `create_namespace`

Create a namespace or class container.

Parameters:

- `name`: string
- `parent_path`: string, optional
- `kind`: enum `namespace|class`, default `namespace`
- explicit target selectors, optional

Response:

- `status`
- `created`: bool
- `kind`
- `path`
- `symbol_id` or an equivalent stable identifier, if available
- `collision`: details of the existing path when creation is skipped or merged

Why it matters:

- lets the workflow create `Entity`, `SpriteNode`, `EntityVmRuntime`, or similar owners before moving methods
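
The `created`/`collision` contract is easiest to pin down against a toy model. This sketch uses a hypothetical in-memory `NamespaceTree` instead of Ghidra's symbol table, joins paths with `::` (the separator used by `Remorse::EntityDispatchEntry` later in this note), and assumes, as one possible policy, that a missing parent is an error rather than something created implicitly.

```python
from typing import Dict, Optional


class NamespaceTree:
    """Toy stand-in for Ghidra's symbol tree; maps "A::B" paths to a kind."""

    KINDS = ("namespace", "class")

    def __init__(self) -> None:
        self.kinds: Dict[str, str] = {}

    def create_namespace(self, name: str, parent_path: Optional[str] = None,
                         kind: str = "namespace") -> dict:
        if kind not in self.KINDS:
            raise ValueError(f"unsupported kind: {kind}")
        if not name or "::" in name:
            raise ValueError("name must be a single, non-empty path component")
        if parent_path and parent_path not in self.kinds:
            # Assumed policy: never create parents implicitly.
            return {"status": "error", "created": False,
                    "error": f"missing parent: {parent_path}"}
        path = f"{parent_path}::{name}" if parent_path else name
        existing = self.kinds.get(path)
        if existing is not None:
            # Repeat call: skip creation and report what is already there.
            return {"status": "ok" if existing == kind else "conflict",
                    "created": False, "kind": existing, "path": path,
                    "collision": {"path": path, "existing_kind": existing,
                                  "requested_kind": kind}}
        self.kinds[path] = kind
        return {"status": "ok", "created": True, "kind": kind, "path": path,
                "collision": None}
```

Treating a same-kind repeat as `status: ok` with `created: false` is what keeps batch scripts re-runnable, while a kind mismatch is surfaced instead of silently merged.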

### 2. `list_namespace_members`

Return members of a namespace or class container in a machine-friendly form.

Parameters:

- `path`: string
- `include_child_namespaces`: bool, default `false`
- `include_functions`: bool, default `true`
- `include_data`: bool, default `true`
- explicit target selectors, optional

Response:

- `status`
- `path`
- `members`: array of `{ kind, name, address?, datatype?, child_count? }`

Why it matters:

- needed for inventory verification and idempotent batch moves
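
A batch script can diff its planned moves against this endpoint's output to stay idempotent. The sketch below assumes member `kind` values of `function`, `data`, `namespace`, and `class`; those strings, the `Member` record, and both helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Member:
    kind: str                      # assumed: "function" | "data" | "namespace" | "class"
    name: str
    address: Optional[str] = None
    datatype: Optional[str] = None
    child_count: Optional[int] = None


def filter_members(members: Iterable[Member],
                   include_child_namespaces: bool = False,
                   include_functions: bool = True,
                   include_data: bool = True) -> List[Member]:
    """Apply the three include_* flags with the spec's defaults."""
    wanted = set()
    if include_child_namespaces:
        wanted |= {"namespace", "class"}
    if include_functions:
        wanted.add("function")
    if include_data:
        wanted.add("data")
    return [m for m in members if m.kind in wanted]


def pending_moves(planned: Iterable[str], members: Iterable[Member]) -> List[str]:
    """Planned function addresses not yet owned by the namespace."""
    present = {m.address for m in members if m.kind == "function"}
    return [addr for addr in planned if addr not in present]
```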

### 3. `move_symbol_to_namespace`

Move a function or data symbol under a namespace/class.

Parameters:

- `symbol_address`: string, optional
- `symbol_name`: string, optional
- one of `symbol_address` or `symbol_name` is required
- `namespace_path`: string
- `new_name`: string, optional
- `conflict_policy`: enum `fail|keep_existing|rename_incoming`, default `fail`
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Response:

- `status`
- `moved`: bool
- `old_path`
- `new_path`
- `collision`: optional structured collision detail

Why it matters:

- this is the basic operation needed to turn flat functions into methods after evidence is verified
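
The three `conflict_policy` values differ only in what happens when the target name is already taken. This minimal sketch computes the response an implementation might return, assuming the caller has already fetched the names present in `namespace_path` (for example via `list_namespace_members`). The `_N` suffix used by `rename_incoming`, the `skipped` and `dry_run` status strings, and `plan_move` itself are hypothetical choices, not settled behavior.

```python
from typing import Optional, Set

POLICIES = ("fail", "keep_existing", "rename_incoming")


def plan_move(symbol_name: str, old_path: str, namespace_path: str,
              existing_names: Set[str], new_name: Optional[str] = None,
              conflict_policy: str = "fail", dry_run: bool = False) -> dict:
    if conflict_policy not in POLICIES:
        raise ValueError(f"unknown conflict_policy: {conflict_policy}")
    target = new_name or symbol_name
    collision = None
    if target in existing_names:
        collision = {"type": "symbol_collision",
                     "path": f"{namespace_path}::{target}",
                     "existing": f"{namespace_path}::{target}",
                     "requested": old_path,
                     "resolution_options": list(POLICIES[1:])}
        if conflict_policy == "fail":
            return {"status": "conflict", "moved": False, "old_path": old_path,
                    "new_path": None, "collision": collision}
        if conflict_policy == "keep_existing":
            return {"status": "skipped", "moved": False, "old_path": old_path,
                    "new_path": None, "collision": collision}
        # rename_incoming: take the first free hypothetical suffix _1, _2, ...
        n = 1
        while f"{target}_{n}" in existing_names:
            n += 1
        target = f"{target}_{n}"
    return {"status": "dry_run" if dry_run else "ok",
            "moved": not dry_run, "old_path": old_path,
            "new_path": f"{namespace_path}::{target}", "collision": collision}
```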

### 4. `set_function_class`

High-level helper to move a function into a class and apply method-oriented naming/signature metadata in one call.

Parameters:

- `function_address`: string
- `class_path`: string
- `method_name`: string
- `this_param_name`: string, optional, default `this`
- `calling_convention`: string, optional
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Response:

- `status`
- `function_address`
- `old_path`
- `new_path`
- `signature_before`
- `signature_after`

Why it matters:

- reduces the number of separate write operations for the common `move + rename + set this semantics` workflow
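
One way to keep this helper consistent with the lower-level endpoints is to have it expand into them. The sketch assumes the composite is exactly `move_symbol_to_namespace` followed by `set_function_this_type`, and that the `this` type is the last class path component plus `*` (matching the pilot, where class `Remorse::EntityDispatchEntry` pairs with struct `EntityDispatchEntry`); a real implementation may need a far pointer type here instead. `expand_set_function_class` is a hypothetical name.

```python
from typing import List, Optional


def expand_set_function_class(function_address: str, class_path: str,
                              method_name: str, this_param_name: str = "this",
                              calling_convention: Optional[str] = None,
                              dry_run: bool = False) -> List[dict]:
    """Return the ordered primitive calls a set_function_class request bundles."""
    this_type = class_path.split("::")[-1] + " *"  # assumed naming convention
    move = {"endpoint": "move_symbol_to_namespace",
            "params": {"symbol_address": function_address,
                       "namespace_path": class_path,
                       "new_name": method_name,
                       "conflict_policy": "fail",
                       "dry_run": dry_run}}
    retype = {"endpoint": "set_function_this_type",
              "params": {"function_address": function_address,
                         "this_type": this_type,
                         "this_param_name": this_param_name,
                         "dry_run": dry_run}}
    if calling_convention is not None:
        retype["params"]["calling_convention"] = calling_convention
    # Move first so a naming collision can abort before the signature is touched.
    return [move, retype]
```

If the executor stops at the first failed step, a `fail` collision leaves the signature untouched, which keeps partially applied state small.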

### 5. `create_or_update_struct`

Create or update a structure datatype.

Parameters:

- `name`: string
- `category_path`: string, optional
- `size`: integer, optional
- `packing`: integer, optional
- `fields`: array of field specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Each field spec:

- `offset`: integer
- `name`: string
- `datatype`: string
- `comment`: string, optional
- `confidence`: enum `high|medium|low`, optional

Response:

- `status`
- `datatype_path`
- `created_or_updated`
- `size`
- `field_count`
- `conflicts`: array, optional

Why it matters:

- class lifting without struct authoring is not enough to produce readable or recompilable source
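
Struct updates need a pre-write check so that field edits never silently overlap or run past `size`. This sketch assumes the caller supplies datatype widths (for example 4 bytes for a far pointer such as `EntityVTable *` in this 16-bit NE target, or 2 if it turns out to be near); `check_fields` and its problem tuples are hypothetical and sit outside the spec's conflict vocabulary.

```python
from typing import Dict, List, Optional, Tuple


def check_fields(fields: List[dict], sizes: Dict[str, int],
                 struct_size: Optional[int] = None) -> List[Tuple[str, ...]]:
    """Report unknown datatypes, overlapping fields, and fields past struct_size."""
    problems: List[Tuple[str, ...]] = []
    furthest_end, furthest_name = 0, None   # field reaching furthest so far
    for f in sorted(fields, key=lambda fld: fld["offset"]):
        width = sizes.get(f["datatype"])
        if width is None:
            problems.append(("unknown_datatype", f["name"]))
            continue
        if furthest_name is not None and f["offset"] < furthest_end:
            problems.append(("overlap", f["name"], furthest_name))
        end = f["offset"] + width
        if struct_size is not None and end > struct_size:
            problems.append(("out_of_bounds", f["name"]))
        if end > furthest_end:
            furthest_end, furthest_name = end, f["name"]
    return problems
```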

### 6. `create_or_update_vtable`

Create or update a vtable datatype as a structure of function pointers.

Parameters:

- `name`: string
- `category_path`: string, optional
- `slots`: array of slot specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Each slot spec:

- `offset`: integer
- `name`: string
- `function_address`: string, optional
- `prototype`: string, optional
- `comment`: string, optional

Response:

- `status`
- `datatype_path`
- `slot_count`
- `bound_functions`: array of `{ offset, function_address, name }`

Why it matters:

- this is the missing datatype-side half of stable virtual dispatch recovery
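
Slot indices and `bound_functions` can both be derived mechanically from the slot specs. This sketch assumes every slot is a 4-byte far function pointer, consistent with the `__far` prototype in the vtable slot schema, so offset 20 is slot index 5. `validate_slots`, `bound_functions`, `slot_index`, and the `misaligned` tag are hypothetical; `slot_conflict` reuses the spec's conflict type.

```python
from typing import Dict, List, Tuple

FAR_PTR_SIZE = 4  # assumption: 16:16 far code pointers in each slot


def slot_index(offset: int, ptr_size: int = FAR_PTR_SIZE) -> int:
    return offset // ptr_size


def validate_slots(slots: List[dict], ptr_size: int = FAR_PTR_SIZE) -> List[Tuple]:
    """Flag offsets that are not pointer-aligned or that reuse an earlier slot."""
    problems: List[Tuple] = []
    seen: Dict[int, str] = {}
    for s in slots:
        off = s["offset"]
        if off % ptr_size:
            problems.append(("misaligned", s["name"], off))
        if off in seen:
            problems.append(("slot_conflict", s["name"], seen[off]))
        else:
            seen[off] = s["name"]
    return problems


def bound_functions(slots: List[dict]) -> List[dict]:
    """Build the response's bound_functions list from slots that name a target."""
    return [{"offset": s["offset"], "function_address": s["function_address"],
             "name": s["name"]}
            for s in sorted(slots, key=lambda s: s["offset"])
            if s.get("function_address")]
```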

### 7. `set_function_this_type`

Apply or update `this`-pointer typing on a function.

Parameters:

- `function_address`: string
- `this_type`: string
- `this_param_name`: string, optional, default `this`
- `this_storage`: enum `stack|register|farptr`, optional
- `calling_convention`: string, optional
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Response:

- `status`
- `function_address`
- `signature_before`
- `signature_after`

Why it matters:

- many decompiler improvements only show up after the instance type is attached to the first argument correctly
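
The `signature_before`/`signature_after` pair is simple to produce if the endpoint retypes the first parameter in place rather than prepending a new one, which matches the point that the instance type belongs on the first argument. The `Signature` model, its rendering, and `apply_this_type` are hypothetical simplifications that ignore `this_storage`; Ghidra's own prototype formatting may differ.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Signature:
    return_type: str
    name: str
    params: Tuple[Tuple[str, str], ...]   # (type, name) pairs
    calling_convention: str = ""

    def render(self) -> str:
        cc = f"{self.calling_convention} " if self.calling_convention else ""
        # Attach the name directly after "*" so pointers read as "T *this".
        args = ", ".join(t + ("" if t.endswith("*") else " ") + n
                         for t, n in self.params) or "void"
        return f"{self.return_type} {cc}{self.name}({args})"


def apply_this_type(sig: Signature, this_type: str, this_param_name: str = "this",
                    calling_convention: Optional[str] = None) -> dict:
    """Retype (or add) the first parameter as the implicit this pointer."""
    new_params = ((this_type, this_param_name),) + sig.params[1:]
    after = replace(sig, params=new_params,
                    calling_convention=calling_convention or sig.calling_convention)
    return {"signature_before": sig.render(), "signature_after": after.render()}
```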

### 8. `analyze_vtable`

Read-side helper that inspects a suspected vtable region and emits slot candidates.

Parameters:

- `address`: string
- `slot_count`: integer, optional
- `stop_on_invalid_pointer`: bool, default `true`
- explicit target selectors, optional

Response:

- `status`
- `address`
- `slots`: array of `{ offset, target_address, target_name, is_function, current_owner?, comment? }`
- `warnings`: array, optional

Why it matters:

- this is the minimum analysis helper needed before class authorship is applied at scale
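
The core of `analyze_vtable` is a scan over consecutive code pointers. This sketch assumes each slot is a little-endian 16:16 far pointer (offset word, then segment word) read from Ghidra's loaded memory image, where NE relocations have already been applied, and that a slot counts as valid only if it lands on a known function entry. The `functions` lookup table, the warning text, and this `analyze_vtable` signature are hypothetical.

```python
import struct
from typing import Dict, Optional


def analyze_vtable(address: str, blob: bytes, functions: Dict[str, str],
                   slot_count: Optional[int] = None,
                   stop_on_invalid_pointer: bool = True) -> dict:
    """Decode far-pointer slots from blob (the bytes starting at address)."""
    available = len(blob) // 4
    warnings = []
    if slot_count is None:
        limit = available
    else:
        limit = min(slot_count, available)
        if slot_count > available:
            warnings.append(f"only {available} slots readable, {slot_count} requested")
    slots = []
    for i in range(limit):
        off, seg = struct.unpack_from("<HH", blob, i * 4)
        target = f"{seg:04x}:{off:04x}"   # same form as "0008:ba00"
        name = functions.get(target)
        if name is None and stop_on_invalid_pointer:
            warnings.append(f"stopped at slot offset {i * 4}: {target} is not a known function")
            break
        slots.append({"offset": i * 4, "target_address": target,
                      "target_name": name, "is_function": name is not None})
    return {"status": "ok", "address": address, "slots": slots, "warnings": warnings}
```

Setting `stop_on_invalid_pointer` to `false` keeps scanning past the first non-function value, which may help when a table is suspected to contain null placeholders, at the cost of possibly reading into adjacent data.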

### 9. `apply_class_layout`

Bind a class namespace, instance struct, optional vtable struct, and a set of methods in one dry-runnable transaction.

Parameters:

- `class_path`: string
- `instance_struct`: string
- `vtable_struct`: string, optional
- `vtable_address`: string, optional
- `methods`: array of method specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Each method spec:

- `function_address`: string
- `method_name`: string
- `slot_offset`: integer, optional
- `is_virtual`: bool, default `false`
- `this_type`: string, optional
- `comment`: string, optional

Response:

- `status`
- `class_path`
- `applied_methods`
- `applied_structs`
- `warnings`

Why it matters:

- supports one-shot promotion of a verified family from notes into Ghidra with explicit review first
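
Because `apply_class_layout` bundles many writes, much of its value is in the dry-run pass. This is a hedged sketch of one possible preflight. The rules (virtual methods need a `slot_offset` and a `vtable_struct`, no slot or address bound twice, `vtable_address` only alongside `vtable_struct`) are assumptions drawn from the parameter list, as are the `needs_review` status, the `preflight_class_layout` name, and the idea that a dry run reports the would-be `applied_*` lists.

```python
from typing import Dict, List, Optional


def preflight_class_layout(class_path: str, instance_struct: str,
                           methods: List[dict],
                           vtable_struct: Optional[str] = None,
                           vtable_address: Optional[str] = None) -> dict:
    warnings: List[str] = []
    if vtable_address and not vtable_struct:
        warnings.append("vtable_address given without vtable_struct")
    slots: Dict[int, str] = {}
    seen_addresses = set()
    for m in methods:
        addr = m["function_address"]
        if addr in seen_addresses:
            warnings.append(f"{addr}: listed more than once")
        seen_addresses.add(addr)
        offset = m.get("slot_offset")
        if m.get("is_virtual", False):
            if vtable_struct is None:
                warnings.append(f"{addr}: virtual method but no vtable_struct")
            if offset is None:
                warnings.append(f"{addr}: virtual method without slot_offset")
            elif offset in slots:
                warnings.append(f"{addr}: slot {offset} already bound to {slots[offset]}")
            else:
                slots[offset] = addr
        elif offset is not None:
            warnings.append(f"{addr}: slot_offset set on a non-virtual method")
    structs = [instance_struct] + ([vtable_struct] if vtable_struct else [])
    return {"status": "needs_review" if warnings else "ok",
            "class_path": class_path,
            "applied_methods": [f"{class_path}::{m['method_name']}" for m in methods],
            "applied_structs": structs,
            "warnings": warnings}
```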

### 10. `export_class_candidate`

Read-side export helper for documentation and source-generation prep.

Parameters:

- `class_path`: string
- `include_struct_fields`: bool, default `true`
- `include_vtable`: bool, default `true`
- `include_method_signatures`: bool, default `true`
- explicit target selectors, optional

Response:

- machine-friendly JSON-like object containing class metadata, methods, field layouts, and slot maps

Why it matters:

- the local docs and future C++ skeleton emission need a clean export surface, not just screen scraping

## Field Schemas

### Struct field schema

Recommended stable shape:

```json
{
  "offset": 0,
  "name": "vtable",
  "datatype": "EntityVTable *",
  "comment": "Primary vtable pointer",
  "confidence": "high"
}
```

### Method schema

```json
{
  "function_address": "0008:ba00",
  "method_name": "Init",
  "slot_offset": null,
  "is_virtual": false,
  "this_type": "EntityDispatchEntry *",
  "comment": "Base constructor-style init"
}
```

### Vtable slot schema

```json
{
  "offset": 20,
  "name": "OnEventType2",
  "function_address": "000b:3ab2",
  "prototype": "void (__far *OnEventType2)(SpriteNode *, Event *)"
}
```
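
Client code can load these shapes strictly, so that a misspelled key or an unknown confidence level fails loudly instead of being dropped. This minimal sketch covers the struct field and vtable slot schemas with dataclasses; `StructField`, `VtableSlot`, and `load` are hypothetical names, and rejecting unknown keys is an assumed policy.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional

CONFIDENCE_LEVELS = {"high", "medium", "low"}


@dataclass(frozen=True)
class StructField:
    offset: int
    name: str
    datatype: str
    comment: Optional[str] = None
    confidence: Optional[str] = None


@dataclass(frozen=True)
class VtableSlot:
    offset: int
    name: str
    function_address: Optional[str] = None
    prototype: Optional[str] = None
    comment: Optional[str] = None


def load(cls, text: str):
    """Parse one JSON object into cls, rejecting unknown keys and bad values."""
    raw = json.loads(text)
    known = {f.name for f in fields(cls)}
    extra = set(raw) - known
    if extra:
        raise ValueError(f"unknown keys for {cls.__name__}: {sorted(extra)}")
    obj = cls(**raw)   # missing required keys raise TypeError here
    if type(obj.offset) is not int or obj.offset < 0:
        raise ValueError("offset must be a non-negative integer")
    if getattr(obj, "confidence", None) not in (None, *CONFIDENCE_LEVELS):
        raise ValueError(f"bad confidence: {obj.confidence}")
    return obj
```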

## Transaction And Safety Rules

All write-capable class-lifting endpoints should support:

- `dry_run`
- explicit target selectors
- structured conflict reporting
- idempotent repeat calls where practical
- no silent overwrite of unrelated symbols or datatype fields

Recommended conflict output shape:

- `type`: `symbol_collision|datatype_collision|slot_conflict|owner_conflict|signature_conflict`
- `path` or `address`
- `existing`
- `requested`
- `resolution_options`
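
A typed record makes the recommended conflict shape hard to emit inconsistently. In this sketch the five `type` values come from the list above, while the `Conflict` class, the rule that at least one of `path` or `address` is present, and omitting an unset locator from the output are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

CONFLICT_TYPES = {"symbol_collision", "datatype_collision", "slot_conflict",
                  "owner_conflict", "signature_conflict"}


@dataclass
class Conflict:
    type: str
    existing: Any
    requested: Any
    resolution_options: List[str] = field(default_factory=list)
    path: Optional[str] = None
    address: Optional[str] = None

    def __post_init__(self) -> None:
        if self.type not in CONFLICT_TYPES:
            raise ValueError(f"unknown conflict type: {self.type}")
        if self.path is None and self.address is None:
            raise ValueError("a conflict needs a path or an address")

    def to_dict(self) -> dict:
        # Emit only the locator(s) that are set, keeping the payload compact.
        out = {"type": self.type, "existing": self.existing,
               "requested": self.requested,
               "resolution_options": list(self.resolution_options)}
        if self.path is not None:
            out["path"] = self.path
        if self.address is not None:
            out["address"] = self.address
        return out
```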

## Backward Compatibility And Aliases

Where practical, add aliases instead of replacing older names.

Recommended aliases:

- `create_class` -> `create_namespace(kind=class)`
- `move_function_to_class` -> `set_function_class`
- `set_this_type` -> `set_function_this_type`
- `build_vtable` -> `create_or_update_vtable`

This follows the local fork’s existing pattern of keeping compatibility wrappers when route names evolve.
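
The alias list maps naturally onto a lookup table consulted before dispatch. This sketch assumes an alias simply forwards its parameters, with `create_class` also pinning `kind=class` and refusing a conflicting `kind`; `ALIASES` and `resolve_route` are hypothetical names.

```python
from typing import Dict, Tuple

# alias -> (canonical endpoint, parameters the alias pins)
ALIASES: Dict[str, Tuple[str, dict]] = {
    "create_class": ("create_namespace", {"kind": "class"}),
    "move_function_to_class": ("set_function_class", {}),
    "set_this_type": ("set_function_this_type", {}),
    "build_vtable": ("create_or_update_vtable", {}),
}


def resolve_route(endpoint: str, params: dict) -> Tuple[str, dict]:
    """Translate an alias to its canonical route; other names pass through."""
    canonical, pinned = ALIASES.get(endpoint, (endpoint, {}))
    clashes = sorted(k for k, v in pinned.items() if k in params and params[k] != v)
    if clashes:
        raise ValueError(f"{endpoint} cannot override {clashes}")
    return canonical, {**params, **pinned}
```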

## Suggested Implementation Order

If implementation resumes later, the smallest useful sequence is:

1. `create_namespace`
2. `move_symbol_to_namespace`
3. `set_function_this_type`
4. `create_or_update_struct`
5. `analyze_vtable`
6. `create_or_update_vtable`
7. `apply_class_layout`
8. `export_class_candidate`

That order enables immediate manual class work after only the first three or four endpoints, while leaving the richer transactional workflows for later.

## First Real Workflow To Target

The first workflow this API should make easy is the pilot family from the current inventory:

### `EntityDispatchEntryBase` promotion workflow

1. create class namespace `Remorse::EntityDispatchEntry`
2. create instance struct `EntityDispatchEntry`
3. move `0008:ba00`, `0008:bca8`, `0008:bd53`, `0008:bf8e`, `0008:c01d`, `0008:dbec`, and constructor variants under that class as methods
4. attach `this` typing
5. analyze or define vtables `0x3b06`, `0x2d10`, `0x3afe`, `0x3ad2`, `0x3aa6`
6. export the class candidate for repo-side documentation and C++ skeleton generation

If the endpoint surface handles that family cleanly, it is probably sufficient for the rest of the early C++ lifting work.

## Open Questions To Resolve Later

- whether Ghidra class namespaces or plain namespaces produce better decompiler output in this 16-bit NE environment
- how best to encode far-pointer-aware `this` conventions in method signatures
- whether vtable datatypes should be attached to concrete memory addresses automatically or only on explicit request
- whether confidence annotations should live in datatype comments, decompiler comments, or external export metadata

## Summary

The endpoint surface needed here is not large, but it does need to span both symbol ownership and datatype authorship. If later MCP work only adds `move function into class`, it will still leave the hardest part of the C++ lift undone.

The minimum viable class-lifting feature set is therefore:

- namespace/class creation
- symbol-to-class moves
- `this` typing
- struct authoring
- vtable analysis/authoring
- one transactional `apply_class_layout` path