# GhidraMCP Class-Lifting Endpoint Spec

## Purpose

This note drafts the endpoint surface needed to support the Remorse class-lifting workflow described in `docs/remorse-cpp-decompilation-plan.md` and grounded by `docs/remorse-class-candidate-inventory.md`.

This is not an implementation batch. It is a local design spec so that when MCP work resumes later, the endpoint set can be built in a way that matches the actual reverse-engineering workflow instead of a generic symbol-edit API.

## Design Goals

The new endpoints should make these workflows cheap and repeatable:

1. create class and namespace containers in Ghidra without touching the GUI
2. move already-renamed flat functions under explicit class ownership
3. build typed instance structs and typed vtables from verified evidence
4. attach `this`-pointer semantics and method signatures to recovered methods
5. preserve ambiguity when evidence is partial instead of forcing speculative class conversions
6. support dry-run review before any bulk symbol or datatype mutation

## Non-Goals

- automatic recovery of class hierarchies from raw heuristics alone
- one-shot `convert whole binary to C++ classes`
- speculative inheritance inference without vtable or field evidence
- silent symbol moves that hide rename collisions or ownership conflicts

## Existing MCP Behavior To Reuse

The local fork already has patterns worth reusing:

- explicit target selectors: `project_dir`, `project_name`, `folder_path`, `program_name`
- dry-run oriented edit-plan behavior
- machine-friendly outputs rather than prose-heavy summaries
- backward-compatible aliases when route names change

Every new class-lifting endpoint should follow the same conventions.

## Core Object Model Assumptions

The class-lifting workflow needs to manipulate four kinds of things explicitly:

1. namespace/class containers in the symbol tree
2. function ownership and method naming
3. datatypes for instance structs and vtables
4. binding metadata between methods, vtable slots, and instance layouts

That means symbol-only endpoints are not enough. Datatype endpoints and method-binding endpoints are part of the minimum viable feature set.

## Proposed Endpoints

### 1. `create_namespace`

Create a namespace or class container.

Parameters:

- `name`: string
- `parent_path`: string, optional
- `kind`: enum `namespace|class`, default `namespace`
- explicit target selectors, optional

Response:

- `status`
- `created`: bool
- `kind`
- `path`
- `symbol_id` or equivalent stable identifier if available
- `collision`: existing path info when create is skipped or merged

Why it matters:

- lets the workflow create `Entity`, `SpriteNode`, `EntityVmRuntime`, or similar owners before moving methods

### 2. `list_namespace_members`

Return members of a namespace or class container in a machine-friendly form.

Parameters:

- `path`: string
- `include_child_namespaces`: bool, default `false`
- `include_functions`: bool, default `true`
- `include_data`: bool, default `true`
- explicit target selectors, optional

Response:

- `status`
- `path`
- `members`: array of `{ kind, name, address?, datatype?, child_count? }`

Why it matters:

- needed for inventory verification and idempotent batch moves

### 3. `move_symbol_to_namespace`

Move a function or data symbol under a namespace/class.

Parameters:

- `symbol_address`: string, optional
- `symbol_name`: string, optional
- one of the above required
- `namespace_path`: string
- `new_name`: string, optional
- `conflict_policy`: enum `fail|keep_existing|rename_incoming`, default `fail`
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Response:

- `status`
- `moved`: bool
- `old_path`
- `new_path`
- `collision`: optional structured collision detail

Why it matters:

- this is the basic operation needed to turn flat functions into methods after evidence is verified

### 4. `set_function_class`

High-level helper to move a function into a class and apply method-oriented naming/signature metadata in one call.

Parameters:

- `function_address`: string
- `class_path`: string
- `method_name`: string
- `this_param_name`: string, optional, default `this`
- `calling_convention`: string, optional
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Response:

- `status`
- `function_address`
- `old_path`
- `new_path`
- `signature_before`
- `signature_after`

Why it matters:

- reduces the number of separate write operations for the common `move + rename + set this semantics` workflow

### 5. `create_or_update_struct`

Create or update a structure datatype.

Parameters:

- `name`: string
- `category_path`: string, optional
- `size`: integer, optional
- `packing`: integer, optional
- `fields`: array of field specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Each field spec:

- `offset`: integer
- `name`: string
- `datatype`: string
- `comment`: string, optional
- `confidence`: enum `high|medium|low`, optional

Response:

- `status`
- `datatype_path`
- `created_or_updated`
- `size`
- `field_count`
- `conflicts`: array, optional

Why it matters:

- class lifting without struct authoring is not enough for readable or recompilable source

### 6. `create_or_update_vtable`

Create a vtable datatype as a structure of function pointers.

Parameters:

- `name`: string
- `category_path`: string, optional
- `slots`: array of slot specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Each slot spec:

- `offset`: integer
- `name`: string
- `function_address`: string, optional
- `prototype`: string, optional
- `comment`: string, optional

Response:

- `status`
- `datatype_path`
- `slot_count`
- `bound_functions`: array of `{ offset, function_address, name }`

Why it matters:

- this is the missing datatype-side half of stable virtual dispatch recovery

### 7. `set_function_this_type`

Apply or update `this`-pointer typing on a function.

Parameters:

- `function_address`: string
- `this_type`: string
- `this_param_name`: string, optional, default `this`
- `this_storage`: enum `stack|register|farptr`, optional
- `calling_convention`: string, optional
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Response:

- `status`
- `function_address`
- `signature_before`
- `signature_after`

Why it matters:

- many decompiler improvements only show up after the instance type is attached to the first argument correctly

### 8. `analyze_vtable`

Read-side helper that inspects a suspected vtable region and emits slot candidates.

Parameters:

- `address`: string
- `slot_count`: integer, optional
- `stop_on_invalid_pointer`: bool, default `true`
- explicit target selectors, optional

Response:

- `status`
- `address`
- `slots`: array of `{ offset, target_address, target_name, is_function, current_owner?, comment? }`
- `warnings`: array, optional

Why it matters:

- this is the minimum analysis helper needed before class authorship is applied at scale

### 9. `apply_class_layout`

Bind a class namespace, instance struct, optional vtable struct, and a set of methods in one dry-runnable transaction.

Parameters:

- `class_path`: string
- `instance_struct`: string
- `vtable_struct`: string, optional
- `vtable_address`: string, optional
- `methods`: array of method specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional

Each method spec:

- `function_address`: string
- `method_name`: string
- `slot_offset`: integer, optional
- `is_virtual`: bool, default `false`
- `this_type`: string, optional
- `comment`: string, optional

Response:

- `status`
- `class_path`
- `applied_methods`
- `applied_structs`
- `warnings`

Why it matters:

- supports one-shot promotion of a verified family from notes into Ghidra with explicit review first

### 10. `export_class_candidate`

Read-side export helper for documentation and source-generation prep.

Parameters:

- `class_path`: string
- `include_struct_fields`: bool, default `true`
- `include_vtable`: bool, default `true`
- `include_method_signatures`: bool, default `true`
- explicit target selectors, optional

Response:

- machine-friendly JSON-like object containing class metadata, methods, field layouts, and slot maps

Why it matters:

- the local docs and future C++ skeleton emission need a clean export surface, not just screen scraping

## Field Schemas

### Struct field schema

Recommended stable shape:

```json
{
  "offset": 0,
  "name": "vtable",
  "datatype": "EntityVTable *",
  "comment": "Primary vtable pointer",
  "confidence": "high"
}
```

### Method schema

```json
{
  "function_address": "0008:ba00",
  "method_name": "Init",
  "slot_offset": null,
  "is_virtual": false,
  "this_type": "EntityDispatchEntry *",
  "comment": "Base constructor-style init"
}
```

### Vtable slot schema

```json
{
  "offset": 20,
  "name": "OnEventType2",
  "function_address": "000b:3ab2",
  "prototype": "void (__far *OnEventType2)(SpriteNode *, Event *)"
}
```

## Transaction And Safety Rules

All write-capable class-lifting endpoints should support:

- `dry_run`
- explicit target selectors
- structured conflict reporting
- idempotent repeat calls where practical
- no silent overwrite of unrelated symbols or datatype fields

Recommended conflict output shape:

- `type`: `symbol_collision|datatype_collision|slot_conflict|owner_conflict|signature_conflict`
- `path` or `address`
- `existing`
- `requested`
- `resolution_options`

## Backward Compatibility And Aliases

Where practical, add aliases instead of replacing older names.
Recommended aliases:

- `create_class` -> `create_namespace(kind=class)`
- `move_function_to_class` -> `set_function_class`
- `set_this_type` -> `set_function_this_type`
- `build_vtable` -> `create_or_update_vtable`

This follows the local fork’s existing pattern of keeping compatibility wrappers when route names evolve.

## Suggested Implementation Order

If implementation resumes later, the smallest useful sequence is:

1. `create_namespace`
2. `move_symbol_to_namespace`
3. `set_function_this_type`
4. `create_or_update_struct`
5. `analyze_vtable`
6. `create_or_update_vtable`
7. `apply_class_layout`
8. `export_class_candidate`

That order enables immediate manual class work after only the first three or four endpoints, while leaving the richer transactional workflows for later.

## First Real Workflow To Target

The first workflow this API should make easy is the pilot family from the current inventory:

### `EntityDispatchEntryBase` promotion workflow

1. create class namespace `Remorse::EntityDispatchEntry`
2. create instance struct `EntityDispatchEntry`
3. move `0008:ba00`, `0008:bca8`, `0008:bd53`, `0008:bf8e`, `0008:c01d`, `0008:dbec`, and constructor variants under that class as methods
4. attach `this` typing
5. analyze or define vtables `0x3b06`, `0x2d10`, `0x3afe`, `0x3ad2`, `0x3aa6`
6. export the class candidate for repo-side documentation and C++ skeleton generation

If the endpoint surface handles that family cleanly, it is probably sufficient for the rest of the early C++ lifting work.
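
The six workflow steps above can be sketched as an ordered request plan. This is only an illustration of how the proposed endpoints might compose, not an implementation: `build_plan` and the `(endpoint, params)` tuple shape are hypothetical, no MCP transport is assumed, and none of these endpoints exist yet. The parameter names come from this spec; the struct field list is left empty because field evidence is not recorded here, and the unnamed constructor variants from step 3 are omitted.

```python
import json

# Values taken from the pilot workflow in this spec.
CLASS_PATH = "Remorse::EntityDispatchEntry"
THIS_TYPE = "EntityDispatchEntry *"
METHOD_ADDRESSES = [
    "0008:ba00", "0008:bca8", "0008:bd53",
    "0008:bf8e", "0008:c01d", "0008:dbec",
]
VTABLE_ADDRESSES = ["0x3b06", "0x2d10", "0x3afe", "0x3ad2", "0x3aa6"]


def build_plan(dry_run=True):
    """Return an ordered list of (endpoint, params) pairs (hypothetical shape)."""
    plan = []

    # Step 1: class container. The draft lists no dry_run parameter
    # for create_namespace, so none is passed here.
    plan.append(("create_namespace", {
        "name": "EntityDispatchEntry",
        "parent_path": "Remorse",
        "kind": "class",
    }))

    # Step 2: instance struct. Fields are intentionally empty in this sketch;
    # real field specs would come from verified evidence.
    plan.append(("create_or_update_struct", {
        "name": "EntityDispatchEntry",
        "fields": [],
        "dry_run": dry_run,
    }))

    # Step 3: move each known function under the class, failing on collisions.
    for address in METHOD_ADDRESSES:
        plan.append(("move_symbol_to_namespace", {
            "symbol_address": address,
            "namespace_path": CLASS_PATH,
            "conflict_policy": "fail",
            "dry_run": dry_run,
        }))

    # Step 4: attach this-pointer typing to the same functions.
    for address in METHOD_ADDRESSES:
        plan.append(("set_function_this_type", {
            "function_address": address,
            "this_type": THIS_TYPE,
            "dry_run": dry_run,
        }))

    # Step 5: read-side vtable inspection (no mutation, so no dry_run).
    for address in VTABLE_ADDRESSES:
        plan.append(("analyze_vtable", {"address": address}))

    # Step 6: export for documentation and skeleton generation.
    plan.append(("export_class_candidate", {"class_path": CLASS_PATH}))
    return plan


if __name__ == "__main__":
    for endpoint, params in build_plan():
        print(endpoint, json.dumps(params, sort_keys=True))
```

Building the whole plan as data before sending anything keeps the dry-run review step explicit: the same list can be replayed with `dry_run=False` once the reported conflicts have been resolved.
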
## Open Questions To Resolve Later

- whether Ghidra class namespaces or plain namespaces produce better decompiler output in this 16-bit NE environment
- how best to encode far-pointer aware `this` conventions in method signatures
- whether vtable datatypes should be attached to concrete memory addresses automatically or only on explicit request
- whether confidence annotations should live in datatype comments, decompiler comments, or external export metadata

## Summary

The endpoint surface needed here is not large, but it does need to span both symbol ownership and datatype authorship.

If later MCP work only adds `move function into class`, it will still leave the hardest part of the C++ lift undone. The minimum viable class-lifting feature set is therefore:

- namespace/class creation
- symbol-to-class moves
- `this` typing
- struct authoring
- vtable analysis/authoring
- one transactional `apply_class_layout` path