# Remorse C++ Decompilation Plan

## Goal

Turn the current evidence-backed Remorse decompilation into understandable, maintainable C++ source that can eventually be rebuilt into a working executable.

The important constraint is that this should be treated as a staged lift, not a direct dump of Ghidra pseudocode into a compiler. The shortest path to a recompilable result is to recover the original object model deliberately: class ownership, instance layouts, vtables, calling conventions, segmented-pointer rules, resource formats, and subsystem boundaries.

## Short Answer: Can Ghidra Be Made More Class-Aware?

Yes, but only partially and mostly through explicit modeling.

Ghidra can already represent a lot of what we need:

- class and namespace symbols in the Symbol Tree
- structs and unions in the Data Type Manager
- vtable data and typed function pointers
- method ownership through namespaces/classes
- `this`-pointer style signatures when the calling convention and object layout are known

What it does not do well here is infer all of that automatically from a 16-bit DOS binary with mixed C/C++ patterns, custom memory conventions, and incomplete original type information. For this project, class recovery has to be evidence-driven.

## Why The Shift Is Justified Now

The current notes already contain repeated object-oriented evidence, not just loose procedural code:

- constructor-style helpers that allocate, stamp a vtable, and zero instance state
- destructor or teardown paths that restore a base vtable and free owned buffers
- stable indirect dispatch through known vtable slots
- controller, entity, sprite-node, VM-context, and resource-helper families with repeatable instance fields
- several class-like clusters that already have better behavioral names than generic `FUN_...` placeholders

That is enough to start building a real C++ object model rather than treating the entire program as flat C with random function pointers.

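
To make the evidence pattern above concrete, here is a minimal sketch of what a constructor that stamps a vtable and a teardown path that restores the base vtable look like when written out by hand. Every name in it (`NodeBase`, `SpriteNode`, `NodeVtable`, the slot names, and the field layout) is hypothetical and chosen only for illustration; none of them are recovered from the binary, and the flat pointers stand in for whatever near/far rules the real code uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical names throughout; nothing here is recovered from the binary.
struct NodeBase;

// Typed vtable: each slot is a named function pointer instead of a raw offset.
struct NodeVtable {
    void (*destroy)(NodeBase* self);
    int  (*update)(NodeBase* self, int ticks);
};

struct NodeBase {
    const NodeVtable* vtable;   // stamped by whichever constructor ran last
};

struct SpriteNode {
    NodeBase      base;         // base first, so the vtable pointer sits at offset 0
    std::uint16_t frame;
    std::uint16_t flags;
    std::uint8_t* pixels;       // owned buffer, freed during teardown
};

void base_destroy(NodeBase*) {}
int  base_update(NodeBase*, int) { return 0; }
const NodeVtable g_base_vtable = { base_destroy, base_update };

int sprite_update(NodeBase* self, int ticks) {
    // Valid because NodeBase is the first member of a standard-layout struct.
    SpriteNode* node = reinterpret_cast<SpriteNode*>(self);
    node->frame = static_cast<std::uint16_t>(node->frame + ticks);
    return node->frame;
}

void sprite_destroy(NodeBase* self) {
    SpriteNode* node = reinterpret_cast<SpriteNode*>(self);
    std::free(node->pixels);             // free owned buffers
    node->pixels = nullptr;
    node->base.vtable = &g_base_vtable;  // restore the base vtable
}

const NodeVtable g_sprite_vtable = { sprite_destroy, sprite_update };

// Constructor-style helper: allocate, zero instance state, stamp the vtable.
SpriteNode* sprite_create(std::size_t pixel_bytes) {
    SpriteNode* node = new SpriteNode();  // value-initialization zeroes every field
    node->base.vtable = &g_sprite_vtable;
    node->pixels = static_cast<std::uint8_t*>(std::calloc(pixel_bytes, 1));
    return node;
}

// Teardown helper: dispatch through the destroy slot, then release the object.
void sprite_release(SpriteNode* node) {
    node->base.vtable->destroy(&node->base);
    delete node;
}
```

When a cluster of decompiled functions matches this shape (a vtable store right after allocation, a second vtable store right before the free), that is the kind of evidence that justifies grouping them under one class rather than leaving them as unrelated helpers.
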
Useful evidence anchors already in the repo include:

- `docs/ne-segment1.md` for entity, projectile, dialog, and sprite-adjacent object lanes
- `docs/raw-0008-000c.md` for constructor families, vtable-backed dispatch entries, VM/runtime helpers, and stateful controller objects
- `docs/raw-000a-000d.md` for loader/resource families, callback brokers, and teardown-heavy object lanes
- `docs/raw-porting-progress.md` for callback-object evidence and cross-segment vtable dispatch patterns
- `docs/far-call-targets.md` for high-frequency ctor/dtor/vtable-slot helpers

## End State

The real target should be defined more tightly than `nice C++`:

1. major gameplay, rendering, UI, VM, and resource subsystems are expressed as named classes with understandable responsibilities
2. instance layouts and ownership rules are explicit enough that decompiled code stops depending on anonymous offset math for routine work
3. virtual dispatch is expressed through named methods or typed vtable tables rather than raw slot offsets
4. the source can be rebuilt with a documented toolchain into a working executable or an equivalent working runtime target
5. the rebuilt result is validated by behavior, not by cosmetic similarity to decompiler output

## Working Assumption About The Rebuild Target

There are two plausible endgames, and the plan should keep them separate from the start:

### Track A: Original-style executable rebuild

Rebuild a DOS executable that preserves the segmented-memory model, calling conventions, packed layouts, and resource/file expectations closely enough to run the original game data.

This is the harder but most direct historical target.
It likely depends on recovering or emulating:

- the original or closest-possible compiler model
- near/far pointer conventions
- packed struct layout and enum sizes
- startup/runtime integration with the Phar Lap environment or an equivalent replacement layer

### Track B: Behaviorally equivalent source port

Rebuild the game logic in modern C++ while preserving data formats and behavior, but not necessarily the original binary ABI.

This is often the faster path to a working recompiled game, but it is a different goal. If the project wants a true executable reconstruction rather than an engine rewrite, Track A has to remain the primary constraint.

For now, the safest planning stance is: recover source in a way that keeps both tracks open for as long as possible.

## Recommended Strategy

### Phase 0: Treat Ghidra As The Truth Database

Use Ghidra as the canonical place where recovered class ownership, vtable slots, field layouts, and method names live.

That means pushing beyond flat rename work into:

- class namespaces for object families
- typed instance structs
- typed vtable structs where the slots are stable enough
- method names that distinguish static helpers from instance methods
- explicit comments recording why a family is believed to be one class and not just one subsystem

### Phase 1: Recover The Object Model Before Chasing Pretty Output

Prioritize families that already have strong OO evidence.

Best early targets:

1. entity families in `seg001` and the raw/live `0007` lanes
2. dispatch-entry / controller objects in `0008` and `000c`
3. sprite-node and UI/menu object families
4. VM runtime, context, owner-resource, and loader helpers
5. callback/resource broker objects around `0x4588`

For each candidate class family, the minimum closure should be:

- candidate class name
- constructor and destructor candidates
- instance size estimate
- confirmed or suspected vtable base
- known slot-to-method map
- field map with confidence levels
- inbound callers that prove object lifetime or ownership

### Phase 2: Separate Methods From Free Functions

Not every helper touching an object should become a class method.

The conversion rule should be conservative:

- make it a method when the object pointer is clearly the owner, the function acts on instance state, and the function participates in the class lifecycle or virtual surface
- keep it free or subsystem-local when it behaves like a pure helper, allocator utility, serializer, or cross-object coordinator

This matters because over-classing weak evidence will make the source look cleaner while actually reducing correctness.

### Phase 3: Build Stable Type Layers

Before broad C++ emission, define a small number of disciplined type layers:

- ABI layer: exact-width integers, near/far pointer wrappers, packed structs, fixed calling-convention macros
- runtime layer: allocators, file/resource handles, callback tables, event records, dispatch entries
- gameplay layer: entities, actors, projectiles, triggers, controller objects, UI nodes
- VM layer: runtime/context/owner-resource classes, opcode streams, slot/value helpers

The source should compile against these types first, even if some methods still contain low-level or ugly code.

### Phase 4: Land Recompilable C++ In Vertical Slices

Do not wait for the whole game to be class-clean before testing compilation.

Instead, move in subsystem slices:

1. one object family
2. its structs and vtable
3. its constructors/destructors
4. a handful of live methods
5. a compile test for that slice

This is the only realistic way to find layout or calling-convention mistakes early.

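
As a hedged illustration of how the Phase 3 ABI layer and the Phase 4 compile test can meet in one file, the sketch below models a far pointer as an offset/segment pair, packs a struct to byte alignment, and pins the layout with `static_assert`. The `DispatchEntry` fields and their offsets are invented placeholders, not a recovered layout; `FarPtr`, `make_far`, and `is_null` are likewise hypothetical helper names. The only fixed fact relied on is that a 16-bit x86 far pointer is stored with the offset word first and the segment (or selector) word second.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)   // byte packing, so offsets match the binary rather than host defaults

// ABI layer: far pointer as stored in memory (offset word, then segment/selector word).
struct FarPtr {
    std::uint16_t offset;
    std::uint16_t segment;
};

// Hypothetical instance layout; field names and offsets are placeholders.
struct DispatchEntry {
    FarPtr        vtable;   // 0x00
    std::uint16_t state;    // 0x04
    std::uint8_t  flags;    // 0x06
    FarPtr        owner;    // 0x07, deliberately unaligned to show that packing is in effect
};

#pragma pack(pop)

// Arguments are given in the conventional segment:offset order.
constexpr FarPtr make_far(std::uint16_t segment, std::uint16_t offset) {
    return FarPtr{offset, segment};
}

constexpr bool is_null(FarPtr p) {
    return p.segment == 0 && p.offset == 0;
}

// Compile test for the slice: a wrong packing or field width fails the build
// instead of surfacing later as a runtime mismatch.
static_assert(sizeof(FarPtr) == 4, "far pointer must be 4 bytes");
static_assert(offsetof(DispatchEntry, state) == 4, "state offset drifted");
static_assert(offsetof(DispatchEntry, flags) == 6, "flags offset drifted");
static_assert(offsetof(DispatchEntry, owner) == 7, "owner offset drifted");
static_assert(sizeof(DispatchEntry) == 11, "instance size drifted");
```

Keeping the assertions next to the struct means each new vertical slice carries its own layout check, and the expected sizes can be copied straight from the instance-size estimates recorded in Ghidra.
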
### Phase 5: Add Runtime Validation Harnesses

A source-level recompile effort will fail if verification is only manual.

Needed validation layers:

- map/resource load smoke tests
- deterministic startup path checks
- function-level trace comparisons for selected hot methods
- data-layout assertions on recovered structs
- script/VM behavior checks where extracted USECODE already gives a second evidence source

### Phase 6: Choose The First Real Rebuild Milestone

The first meaningful source milestone should not be `whole game builds`.

A better first milestone is one of these:

1. compile a library that matches one major subsystem ABI and can run against fixture data
2. rebuild the startup/resource path far enough to load into a title/menu state
3. rebuild one contained gameplay loop such as entity allocation/update/teardown with equivalent traces

## Ghidra/MCP Gaps That Matter For This Plan

The local MCP fork already gives enough read/query power to continue class recovery, but it is still missing key authoring operations for a serious C++ lift:

- create class or namespace symbols through MCP
- move existing functions under class ownership cleanly
- create or update struct and vtable datatypes through MCP
- set `this`-pointer types and method signatures systematically
- analyze a candidate vtable and bind slots to named methods in one operation

Those gaps have been added to `ghidra_mcp_wishlist.md` in this batch.

## First Concrete Work Batches

The most defensible first batches are small and structural.

### Batch 1: Class Inventory Pass

Build a repo-side inventory of the strongest current class candidates:

- class family name
- addresses for ctor/dtor/vtable roots
- known methods
- instance-size estimate
- notes/doc references

### Batch 2: One Fully Modeled Family

Pick one family with low ambiguity and carry it through end to end inside Ghidra and the notes:

- class namespace
- method ownership
- instance struct
- vtable struct
- method-slot table
- short rationale note

Good initial candidates are the `entity_dispatch_entry_*` family, the sprite-node family, or one compact controller object family.

### Batch 3: C++ Skeleton Output

Emit one hand-maintained C++ header/source pair for that family with:

- exact-width field placeholders
- named methods
- comments for unresolved fields or slot semantics
- enough type discipline that the code could later be compiled under a chosen toolchain

### Batch 4: Toolchain Recon

Establish the most credible compile target and constraints early:

- likely original compiler family or nearest substitute
- calling convention spelling
- memory-model requirements
- struct packing behavior
- import/library expectations

Without this, the source can drift into modernized C++ that reads well but cannot realistically rebuild the game.

## What To Avoid

- Do not mass-convert procedural helpers into methods just to make the output look object-oriented.
- Do not let Ghidra pseudocode naming outrun field-layout evidence.
- Do not assume modern C++ ABI rules match the original compiler.
- Do not mix `behaviorally equivalent port` goals with `original-style executable rebuild` claims in the same milestone.
- Do not wait for perfect global understanding before compiling anything.

## Immediate Next Steps

1. add the missing class/namespace and vtable-authoring MCP endpoints to the local fork when ready
2. make a `class candidate inventory` note from the strongest existing families in the current docs
3. choose one family and model it all the way through as a pilot C++ class
4. decide whether the primary rebuild constraint is original-style DOS/NE compatibility or a behaviorally equivalent C++ port
5. define the first compile/test harness before broad source emission starts

## Success Criteria For This Plan

This plan is working if, after a few batches, the project has all of the following:

- at least one real class family fully modeled in Ghidra and mirrored in source
- repeatable rules for when a function becomes a method
- repeatable rules for vtable and field-layout evidence
- a documented compile target with ABI constraints
- a narrow but real compilation/validation loop

If those do not exist, the project is still doing useful reverse engineering, but it has not yet truly shifted into a recompilable C++ decompilation lane.
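
One possible starting point for the narrow validation loop is the function-level trace comparison listed under Phase 5. The sketch below assumes that both the original game (for example through an emulator log) and the rebuilt slice can emit a flat list of call records; the `TraceEvent` fields and the `first_divergence` helper are hypothetical and would need to be adapted to whatever the real trace source provides.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical trace record; the real fields depend on what the trace source can capture.
struct TraceEvent {
    std::uint32_t function_id;  // e.g. an address or an index into a name table
    std::uint16_t arg;
    std::uint16_t result;
};

inline bool same_event(const TraceEvent& a, const TraceEvent& b) {
    return a.function_id == b.function_id && a.arg == b.arg && a.result == b.result;
}

// Returns the index of the first event where the two traces differ,
// or -1 when they are identical. A length mismatch counts as a
// divergence at the end of the shorter trace.
std::ptrdiff_t first_divergence(const std::vector<TraceEvent>& expected,
                                const std::vector<TraceEvent>& actual) {
    const std::size_t common = std::min(expected.size(), actual.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (!same_event(expected[i], actual[i])) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    if (expected.size() != actual.size()) {
        return static_cast<std::ptrdiff_t>(common);
    }
    return -1;
}
```

Reporting only the first divergence keeps the loop cheap to run after every slice lands: the index points at the earliest call whose behavior changed, which is usually where a layout or calling-convention mistake first becomes visible.
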