7.6 KiB
Remorse Rebuild ABI Notes
Purpose
This note records the current ABI, memory-model, and toolchain constraints that should shape any future Remorse source reconstruction.
The class-lifting notes answer what the objects probably are.
This note answers what the rebuilt source must still respect if it aims to become a working executable rather than only readable C++.
Current Baseline
The live target is not a flat modern Win32 program.
Current verified binary facts:
- DOS target
- 16-bit protected-mode environment
- Phar Lap 286 DOS-Extender (
RUN286) - bound
MZ -> NEexecutable - heavy use of inter-segment and external
CALLFfixups
That means the default safe assumption is:
- segmented code/data model matters
- near/far calls matter
- pointer width and calling convention details matter
- loader/runtime expectations matter
Hard Constraints Already Visible In The Binary
1. Segmented addressing is real, not presentation noise
Evidence:
- executable format is
MZ -> NE - raw import behavior collapses unresolved calls to
0000:ffffuntil NE fixups are applied - repaired raw import had thousands of internal literal
CALLFsites patched to real segment:offset targets - the notes repeatedly distinguish far pointers, segment:offset storage, and per-segment relocation behavior
Practical implication:
- a rebuild target that ignores far calls and far data pointers too early will drift away from the original executable model
2. Function boundaries and external calls are loader-sensitive
Evidence:
CALLF 0000:ffffis a placeholder used by the NE loader for real inter-segment/external targets- unresolved far thunk behavior in raw import is explicitly not a real dispatcher
Practical implication:
- source emission must preserve which calls are logically intra-object methods and which ones are ABI-significant far calls or imported runtime/library calls
3. Runtime/library layer is not trivial glue
Evidence:
- large Phar Lap runtime/extender segments remain part of startup and low-level system behavior
- CRT wrappers and formatter/runtime helpers are explicitly identified
- MetaWare High C formatting/runtime wrappers are present in the notes
Practical implication:
- the original or near-original compiler/runtime environment matters enough that
just compile with a modern compileris not a safe early assumption for an original-style rebuild
4. Object layout is tightly coupled to exact field offsets
Evidence:
- major gameplay and UI families are still being recovered by exact offsets
- VM/runtime helpers, dispatch entries, and entity families all depend on stable field positions
Practical implication:
- class lifting must preserve packed layout discipline and exact-width integer choices from the start
Current Best Toolchain Read
This is still a working model, not a closed historical claim.
High-confidence environment facts
- DOS protected mode under Phar Lap 286 extender
- NE executable image
- runtime/CRT evidence compatible with MetaWare High C presence in at least part of the binary toolchain story
What remains open
- exact original compiler version
- exact memory-model flags used for all modules
- exact calling-convention mapping for each object family
- exact linker/build recipe needed to reproduce compatible NE output
Recommended Rebuild Tracks
Track A: Original-style executable reconstruction
If the goal is to rebuild something close to the shipped executable model, the source must preserve:
- segmented pointer distinctions
- explicit near/far calling boundaries where needed
- exact struct packing
- compatible CRT/runtime assumptions
- executable/resource layout expectations
This is the stricter track.
Track B: Behaviorally equivalent source port
If the goal is instead a working engine/game rebuild using the original data with equivalent behavior, then the source can relax some ABI constraints later.
But even on this track, the early reverse-engineering output should still preserve ABI facts long enough that the project can make an informed choice instead of accidentally forcing itself into a port.
Source-Level Rules To Adopt Early
Any future generated or handwritten code should default to these constraints:
Integer widths
- use explicit fixed-width integer types everywhere possible
- do not use plain
int,long, or compiler-default enum width as semantic types in the first pass
Layout control
- keep a visible packing strategy for recovered structs
- record uncertain padding explicitly rather than letting the compiler invent it silently
Pointer model
- keep far-pointer distinctions visible in the type system or wrapper layer
- do not immediately collapse all pointers to one flat host pointer type if Track A remains in scope
Calling conventions
- keep calling convention annotations explicit in working notes and emitted skeletons
- do not assume one modern host calling convention is an adequate stand-in for every recovered method or helper
Virtual dispatch
- preserve raw slot order in provisional vtable types
- do not rename or reorder slots to look cleaner before the mapping is stable
Candidate ABI Support Layer
The first C++ source slices should probably compile against a small compatibility layer rather than raw host C++ alone.
Current likely categories:
- exact-width integer typedefs
- far/near pointer wrappers or placeholder abstractions
- packing macros or pragmas
- calling-convention macros
- segmented address helper types for debugging and trace comparison
- imported runtime service shims for file, memory, and platform calls
Immediate Compiler/Runtime Questions To Close Later
These are the most useful next ABI questions for the repo:
- Which compiler/runtime signatures in the binary most strongly identify the original toolchain family and version?
- Which current methods clearly require far-call semantics even after class lifting?
- Which object families can safely be emitted as host-side plain structs first, and which still need explicit segmented-pointer wrappers?
- What is the narrowest executable milestone that can validate calling conventions and struct layout before whole-program reconstruction is attempted?
Practical Risk List
Risk: pretty C++ that cannot rebuild the game
Cause:
- class lifting done without ABI discipline
Mitigation:
- keep this note paired with the class-layout notes and require exact-width/packing/calling-convention placeholders in early skeletons
Risk: false confidence from host compilation success
Cause:
- code compiles under a modern compiler but no longer matches segmented runtime behavior
Mitigation:
- define compile success and behavioral/ABI success as separate milestones
Risk: loss of far-call/import provenance
Cause:
- unresolved thunk placeholders or loader-patched calls get flattened into generic helper names
Mitigation:
- preserve call provenance in notes and later exports, especially for methods that only look local after fixup repair
Recommended Near-Term Documentation Follow-Ups
- collect all current compiler/runtime fingerprints into one evidence note
- add an
ABI concernssection to future class-layout notes when a family uses far pointers or segmented ownership directly - draft the first minimal compatibility header for future C++ skeletons once the first class family is selected for source emission
Current Bottom Line
The project is now documented well enough to start class lifting, but not well enough to safely emit clean modern C++ without guardrails.
The safest present rule is:
- keep object recovery aggressive
- keep ABI assumptions conservative
- keep Track A and Track B separate in every future source milestone