# Remorse Rebuild ABI Notes ## Purpose This note records the current ABI, memory-model, and toolchain constraints that should shape any future Remorse source reconstruction. The class-lifting notes answer `what the objects probably are`. This note answers `what the rebuilt source must still respect if it aims to become a working executable rather than only readable C++`. ## Current Baseline The live target is not a flat modern Win32 program. Current verified binary facts: - DOS target - 16-bit protected-mode environment - Phar Lap 286 DOS-Extender (`RUN286`) - bound `MZ -> NE` executable - heavy use of inter-segment and external `CALLF` fixups That means the default safe assumption is: - segmented code/data model matters - near/far calls matter - pointer width and calling convention details matter - loader/runtime expectations matter ## Hard Constraints Already Visible In The Binary ### 1. Segmented addressing is real, not presentation noise Evidence: - executable format is `MZ -> NE` - raw import behavior collapses unresolved calls to `0000:ffff` until NE fixups are applied - repaired raw import had thousands of internal literal `CALLF` sites patched to real segment:offset targets - the notes repeatedly distinguish far pointers, segment:offset storage, and per-segment relocation behavior Practical implication: - a rebuild target that ignores far calls and far data pointers too early will drift away from the original executable model ### 2. Function boundaries and external calls are loader-sensitive Evidence: - `CALLF 0000:ffff` is a placeholder used by the NE loader for real inter-segment/external targets - unresolved far thunk behavior in raw import is explicitly not a real dispatcher Practical implication: - source emission must preserve which calls are logically intra-object methods and which ones are ABI-significant far calls or imported runtime/library calls ### 3. Runtime/library layer is not trivial glue Evidence: - large Phar Lap runtime/extender segments remain part of startup and low-level system behavior - CRT wrappers and formatter/runtime helpers are explicitly identified - MetaWare High C formatting/runtime wrappers are present in the notes Practical implication: - the original or near-original compiler/runtime environment matters enough that `just compile with a modern compiler` is not a safe early assumption for an original-style rebuild ### 4. Object layout is tightly coupled to exact field offsets Evidence: - major gameplay and UI families are still being recovered by exact offsets - VM/runtime helpers, dispatch entries, and entity families all depend on stable field positions Practical implication: - class lifting must preserve packed layout discipline and exact-width integer choices from the start ## Current Best Toolchain Read This is still a working model, not a closed historical claim. ### High-confidence environment facts - DOS protected mode under Phar Lap 286 extender - NE executable image - runtime/CRT evidence compatible with MetaWare High C presence in at least part of the binary toolchain story ### What remains open - exact original compiler version - exact memory-model flags used for all modules - exact calling-convention mapping for each object family - exact linker/build recipe needed to reproduce compatible NE output ## Recommended Rebuild Tracks ### Track A: Original-style executable reconstruction If the goal is to rebuild something close to the shipped executable model, the source must preserve: - segmented pointer distinctions - explicit near/far calling boundaries where needed - exact struct packing - compatible CRT/runtime assumptions - executable/resource layout expectations This is the stricter track. ### Track B: Behaviorally equivalent source port If the goal is instead a working engine/game rebuild using the original data with equivalent behavior, then the source can relax some ABI constraints later. But even on this track, the early reverse-engineering output should still preserve ABI facts long enough that the project can make an informed choice instead of accidentally forcing itself into a port. ## Source-Level Rules To Adopt Early Any future generated or handwritten code should default to these constraints: ### Integer widths - use explicit fixed-width integer types everywhere possible - do not use plain `int`, `long`, or compiler-default enum width as semantic types in the first pass ### Layout control - keep a visible packing strategy for recovered structs - record uncertain padding explicitly rather than letting the compiler invent it silently ### Pointer model - keep far-pointer distinctions visible in the type system or wrapper layer - do not immediately collapse all pointers to one flat host pointer type if Track A remains in scope ### Calling conventions - keep calling convention annotations explicit in working notes and emitted skeletons - do not assume one modern host calling convention is an adequate stand-in for every recovered method or helper ### Virtual dispatch - preserve raw slot order in provisional vtable types - do not rename or reorder slots to look cleaner before the mapping is stable ## Candidate ABI Support Layer The first C++ source slices should probably compile against a small compatibility layer rather than raw host C++ alone. Current likely categories: - exact-width integer typedefs - far/near pointer wrappers or placeholder abstractions - packing macros or pragmas - calling-convention macros - segmented address helper types for debugging and trace comparison - imported runtime service shims for file, memory, and platform calls ## Immediate Compiler/Runtime Questions To Close Later These are the most useful next ABI questions for the repo: 1. Which compiler/runtime signatures in the binary most strongly identify the original toolchain family and version? 2. Which current methods clearly require far-call semantics even after class lifting? 3. Which object families can safely be emitted as host-side plain structs first, and which still need explicit segmented-pointer wrappers? 4. What is the narrowest executable milestone that can validate calling conventions and struct layout before whole-program reconstruction is attempted? ## Practical Risk List ### Risk: pretty C++ that cannot rebuild the game Cause: - class lifting done without ABI discipline Mitigation: - keep this note paired with the class-layout notes and require exact-width/packing/calling-convention placeholders in early skeletons ### Risk: false confidence from host compilation success Cause: - code compiles under a modern compiler but no longer matches segmented runtime behavior Mitigation: - define compile success and behavioral/ABI success as separate milestones ### Risk: loss of far-call/import provenance Cause: - unresolved thunk placeholders or loader-patched calls get flattened into generic helper names Mitigation: - preserve call provenance in notes and later exports, especially for methods that only look local after fixup repair ## Recommended Near-Term Documentation Follow-Ups 1. collect all current compiler/runtime fingerprints into one evidence note 2. add an `ABI concerns` section to future class-layout notes when a family uses far pointers or segmented ownership directly 3. draft the first minimal compatibility header for future C++ skeletons once the first class family is selected for source emission ## Current Bottom Line The project is now documented well enough to start class lifting, but not well enough to safely emit `clean modern C++` without guardrails. The safest present rule is: - keep object recovery aggressive - keep ABI assumptions conservative - keep Track A and Track B separate in every future source milestone