# Remorse Rebuild ABI Notes

## Purpose

This note records the current ABI, memory-model, and toolchain constraints that should shape any future Remorse source reconstruction.

The class-lifting notes answer `what the objects probably are`.
This note answers `what the rebuilt source must still respect if it aims to become a working executable rather than only readable C++`.

## Current Baseline

The live target is not a flat modern Win32 program.

Current verified binary facts:

- DOS target
- 16-bit protected-mode environment
- Phar Lap 286 DOS-Extender (`RUN286`)
- bound `MZ -> NE` executable
- heavy use of inter-segment and external `CALLF` fixups

That means the safe default assumptions are:

- segmented code/data model matters
- near/far calls matter
- pointer width and calling convention details matter
- loader/runtime expectations matter

## Hard Constraints Already Visible In The Binary

### 1. Segmented addressing is real, not presentation noise

Evidence:

- executable format is `MZ -> NE`
- raw import behavior collapses unresolved calls to `0000:ffff` until NE fixups are applied
- repaired raw import had thousands of internal literal `CALLF` sites patched to real segment:offset targets
- the notes repeatedly distinguish far pointers, segment:offset storage, and per-segment relocation behavior

Practical implication:

- a rebuild target that ignores far calls and far data pointers too early will drift away from the original executable model

### 2. Function boundaries and external calls are loader-sensitive

Evidence:

- `CALLF 0000:ffff` is a placeholder used by the NE loader for real inter-segment/external targets
- unresolved far thunk behavior in raw import is explicitly not a real dispatcher

Practical implication:

- source emission must preserve which calls are logically intra-object methods and which ones are ABI-significant far calls or imported runtime/library calls

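
One way to carry that distinction through source emission is to attach a provenance tag to every recovered call site. The following is a minimal sketch, not an agreed design: `CallKind`, `FarTarget`, and `classify_far_call` are hypothetical names, and the `fixup_is_import` flag is an assumption standing in for whatever the NE relocation data reports. Only the `0000:ffff` placeholder value comes from the evidence above.

```cpp
#include <cstdint>

// Hypothetical provenance tags for recovered call sites.
enum class CallKind : std::uint8_t {
    NearLocal,        // near call inside one code segment
    FarInternal,      // far call to another segment of the same executable
    FarImported,      // far call the loader resolves to a runtime/library entry
    UnresolvedFixup   // raw `CALLF 0000:ffff` placeholder, fixup not yet applied
};

// Raw operand of a far call as it appears in the unpatched image.
struct FarTarget {
    std::uint16_t segment;
    std::uint16_t offset;
};

// The notes identify 0000:ffff as the loader placeholder value.
constexpr bool is_unresolved_placeholder(FarTarget t) {
    return t.segment == 0x0000 && t.offset == 0xFFFF;
}

// Classify a far call site. `fixup_is_import` is an assumption: it represents
// what the relocation record for this site says about the target.
constexpr CallKind classify_far_call(FarTarget t, bool fixup_is_import) {
    return is_unresolved_placeholder(t) ? CallKind::UnresolvedFixup
         : fixup_is_import              ? CallKind::FarImported
                                        : CallKind::FarInternal;
}
```

Keeping the tag next to the call site means a method that only looks local after fixup repair can still be exported with its original far-call status.
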
### 3. Runtime/library layer is not trivial glue

Evidence:

- large Phar Lap runtime/extender segments remain part of startup and low-level system behavior
- CRT wrappers and formatter/runtime helpers are explicitly identified
- MetaWare High C formatting/runtime wrappers are present in the notes

Practical implication:

- the original or near-original compiler/runtime environment matters enough that `just compile with a modern compiler` is not a safe early assumption for an original-style rebuild

### 4. Object layout is tightly coupled to exact field offsets

Evidence:

- major gameplay and UI families are still being recovered by exact offsets
- VM/runtime helpers, dispatch entries, and entity families all depend on stable field positions

Practical implication:

- class lifting must preserve packed layout discipline and exact-width integer choices from the start

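
Because recovery works from exact offsets, it can help to read fields straight out of a captured object image by offset and compare them with what a lifted struct reports. This is a small sketch under stated assumptions: `read_u16_le`, `read_u32_le`, and `kSampleImage` are hypothetical, and the sample bytes are made up rather than taken from the binary. The little-endian byte order follows from the x86 target.

```cpp
#include <cstddef>
#include <cstdint>

// Read a 16-bit little-endian field at an exact byte offset of a raw object image.
constexpr std::uint16_t read_u16_le(const std::uint8_t* image, std::size_t offset) {
    return static_cast<std::uint16_t>(image[offset] | (image[offset + 1] << 8));
}

// Read a 32-bit little-endian field (for example a stored far pointer).
constexpr std::uint32_t read_u32_le(const std::uint8_t* image, std::size_t offset) {
    return static_cast<std::uint32_t>(read_u16_le(image, offset)) |
           (static_cast<std::uint32_t>(read_u16_le(image, offset + 2)) << 16);
}

// Made-up sample bytes for illustration only.
constexpr std::uint8_t kSampleImage[] = {0x34, 0x12, 0x78, 0x56, 0xBC, 0x9A};
```
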
## Current Best Toolchain Read

This is still a working model, not a closed historical claim.

### High-confidence environment facts

- DOS protected mode under Phar Lap 286 extender
- NE executable image
- runtime/CRT evidence consistent with MetaWare High C having been used for at least part of the build

### What remains open

- exact original compiler version
- exact memory-model flags used for all modules
- exact calling-convention mapping for each object family
- exact linker/build recipe needed to reproduce compatible NE output

## Recommended Rebuild Tracks

### Track A: Original-style executable reconstruction

If the goal is to rebuild something close to the shipped executable model, the source must preserve:

- segmented pointer distinctions
- explicit near/far calling boundaries where needed
- exact struct packing
- compatible CRT/runtime assumptions
- executable/resource layout expectations

This is the stricter track.

### Track B: Behaviorally equivalent source port

If the goal is instead a working engine/game rebuild using the original data with equivalent behavior, then the source can relax some ABI constraints later.

But even on this track, the early reverse-engineering output should still preserve ABI facts long enough that the project can make an informed choice instead of accidentally forcing itself into a port.

## Source-Level Rules To Adopt Early

Any future generated or handwritten code should default to these constraints:

### Integer widths

- use explicit fixed-width integer types everywhere possible
- do not use plain `int`, `long`, or compiler-default enum width as semantic types in the first pass

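
The width rule matters because plain `int` is 16 bits wide under typical 16-bit DOS compilers and 32 bits on common modern hosts. A minimal sketch of what the rule could look like in a shared header follows. The alias names (`u8`, `u16`, and so on) and the `EntityKind` enum with its enumerators are hypothetical placeholders, not recovered names.

```cpp
#include <cstdint>

// Hypothetical aliases giving the rebuild its own vocabulary of exact-width types.
typedef std::uint8_t  u8;
typedef std::uint16_t u16;
typedef std::uint32_t u32;
typedef std::int16_t  s16;
typedef std::int32_t  s32;

// Enums get an explicit underlying type so their storage size is never left to
// the compiler. The enumerators are placeholders, not recovered values.
enum class EntityKind : u16 {
    Unknown0 = 0,
    Unknown1 = 1
};

// Fail the build early if the host disagrees with the assumed widths.
static_assert(sizeof(u8) == 1, "u8 must be 1 byte");
static_assert(sizeof(u16) == 2, "u16 must be 2 bytes");
static_assert(sizeof(u32) == 4, "u32 must be 4 bytes");
static_assert(sizeof(EntityKind) == 2, "EntityKind must occupy one 16-bit word");
```
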
### Layout control

- keep a visible packing strategy for recovered structs
- record uncertain padding explicitly rather than letting the compiler invent it silently

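
As an illustration of both points, the sketch below packs a struct to byte alignment, names every unexplained byte, and checks the offsets at compile time. `RecoveredRecord` and all of its fields and offsets are hypothetical and chosen only to show the pattern. `#pragma pack(push, 1)` is assumed to be available, which holds for GCC, Clang, and MSVC but would need checking for any period compiler.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
// Hypothetical recovered record; field names and offsets are illustrative only.
struct RecoveredRecord {
    std::uint16_t flags;          // +0x00
    std::uint8_t  kind;           // +0x02
    std::uint8_t  unknown_03;     // +0x03 purpose not yet known, kept explicit
    std::uint32_t owner_far_ptr;  // +0x04 raw segment:offset value, not a host pointer
    std::uint8_t  unknown_08[4];  // +0x08 unexplained bytes, kept explicit
    std::uint16_t state;          // +0x0C
};
#pragma pack(pop)

// Compile-time checks tie the source layout to the offsets seen in the binary.
static_assert(offsetof(RecoveredRecord, kind) == 0x02, "kind offset");
static_assert(offsetof(RecoveredRecord, owner_far_ptr) == 0x04, "owner_far_ptr offset");
static_assert(offsetof(RecoveredRecord, state) == 0x0C, "state offset");
static_assert(sizeof(RecoveredRecord) == 0x0E, "record size");
```

If a later finding moves a field, the failing `static_assert` points at the exact offset that changed instead of letting the layout drift silently.
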
### Pointer model

- keep far-pointer distinctions visible in the type system or wrapper layer
- do not immediately collapse all pointers to one flat host pointer type if Track A remains in scope

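
A possible shape for such a wrapper layer is sketched below. `NearPtr` and `FarPtr` are hypothetical templates that carry target-side addresses as plain data and are never dereferenced on the host. The offset-then-segment storage order is the standard x86 far-pointer layout. Because the segment half is a selector in protected mode, the sketch deliberately offers no linear-address conversion.

```cpp
#include <cstdint>

// Hypothetical wrapper for a near pointer: an offset within an implied segment.
template <typename T>
struct NearPtr {
    std::uint16_t offset;
};

// Hypothetical wrapper for a far pointer. T only documents the pointee type.
template <typename T>
struct FarPtr {
    std::uint16_t offset;   // stored first: the offset lives in the low word
    std::uint16_t segment;  // segment value (a selector in protected mode)

    // Pack into the 32-bit form used when a far pointer is stored in memory.
    constexpr std::uint32_t raw() const {
        return (static_cast<std::uint32_t>(segment) << 16) | offset;
    }

    // Rebuild the wrapper from a stored 32-bit value.
    static constexpr FarPtr from_raw(std::uint32_t value) {
        return FarPtr{static_cast<std::uint16_t>(value & 0xFFFFu),
                      static_cast<std::uint16_t>(value >> 16)};
    }
};

// The wrappers must keep the target-side sizes, not the host pointer size.
static_assert(sizeof(NearPtr<int>) == 2, "near pointer is one word");
static_assert(sizeof(FarPtr<int>) == 4, "far pointer is two words");
```

Since `NearPtr<T>` and `FarPtr<T>` are unrelated types, mixing them up is a compile error rather than a silent truncation.
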
### Calling conventions

- keep calling convention annotations explicit in working notes and emitted skeletons
- do not assume one modern host calling convention is an adequate stand-in for every recovered method or helper

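
One low-cost way to keep the annotations explicit is a set of macros that expand to nothing on a host build and can later be mapped to real keywords once the target compiler is known. Everything in this sketch is hypothetical: the macro names (`REMORSE_FAR`, `REMORSE_NEAR`, `REMORSE_CALLCONV_UNKNOWN`), the `REMORSE_TARGET_16BIT` switch, and the `entity_update_stub` function are illustrations, not recovered facts.

```cpp
#include <cstdint>

// Hypothetical annotation macros. On a host build they expand to nothing and
// only document intent in the emitted skeletons.
#if defined(REMORSE_TARGET_16BIT)
  // Deliberately left unmapped: the real keywords depend on the still-open
  // question of which compiler and memory model produced the binary.
  #error "map REMORSE_FAR / REMORSE_NEAR / REMORSE_CALLCONV_UNKNOWN for the target compiler"
#else
  #define REMORSE_FAR
  #define REMORSE_NEAR
  #define REMORSE_CALLCONV_UNKNOWN
#endif

// Illustrative skeleton: the annotations record that this helper was reached
// through a far call and that its convention has not been pinned down yet.
inline std::uint16_t REMORSE_FAR REMORSE_CALLCONV_UNKNOWN
entity_update_stub(std::uint16_t entity_id) {
    return entity_id;  // placeholder body
}
```

Using a dedicated "unknown" macro keeps undecided cases searchable, so they are not mistaken for functions whose convention has been verified.
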
### Virtual dispatch

- preserve raw slot order in provisional vtable types
- do not rename or reorder slots to look cleaner before the mapping is stable

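
A provisional vtable type can follow both rules by storing slots in binary order and naming them by index only. The sketch below assumes each slot is a 4-byte far code address, which this note does not confirm; `RawCodeAddr`, `ProvisionalVtable`, the `kSlotNN` constants, and the slot count are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed slot format: a far code address stored as offset then segment.
struct RawCodeAddr {
    std::uint16_t offset;
    std::uint16_t segment;
};

// Slots are named by raw index only until the mapping is stable.
enum : std::size_t {
    kSlot00 = 0,
    kSlot01 = 1,
    kSlot02 = 2,
    kSlotCount = 3  // illustrative count, not a recovered value
};

// Provisional vtable for one class family, in the order found in the binary.
struct ProvisionalVtable {
    RawCodeAddr slots[kSlotCount];
};

// Byte offset of a slot inside the table, for comparison against disassembly.
constexpr std::size_t slot_byte_offset(std::size_t slot_index) {
    return slot_index * sizeof(RawCodeAddr);
}

static_assert(sizeof(ProvisionalVtable) == kSlotCount * 4, "table size follows slot count");
```

When a slot's purpose is confirmed, a descriptive alias can be added for the same index without moving or renumbering anything.
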
## Candidate ABI Support Layer

The first C++ source slices should probably compile against a small compatibility layer rather than raw host C++ alone.

Current likely categories:

- exact-width integer typedefs
- far/near pointer wrappers or placeholder abstractions
- packing macros or pragmas
- calling-convention macros
- segmented address helper types for debugging and trace comparison
- imported runtime service shims for file, memory, and platform calls

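
For the debugging and trace-comparison category, a helper that prints addresses in the same `ssss:oooo` form the notes already use (as in `0000:ffff`) keeps host-side logs directly comparable with disassembly listings. This is a sketch only; `format_seg_off` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: render a segment:offset pair as lowercase "ssss:oooo".
inline std::string format_seg_off(std::uint16_t segment, std::uint16_t offset) {
    char buffer[10];  // nine characters plus the terminating NUL
    const int written = std::snprintf(buffer, sizeof(buffer), "%04x:%04x",
                                      static_cast<unsigned>(segment),
                                      static_cast<unsigned>(offset));
    assert(written == 9);  // both halves always format to four hex digits
    (void)written;         // avoid an unused-variable warning when NDEBUG is set
    return std::string(buffer);
}
```
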
## Immediate Compiler/Runtime Questions To Close Later

These are the most useful next ABI questions for the repo:

1. Which compiler/runtime signatures in the binary most strongly identify the original toolchain family and version?
2. Which current methods clearly require far-call semantics even after class lifting?
3. Which object families can safely be emitted as host-side plain structs first, and which still need explicit segmented-pointer wrappers?
4. What is the narrowest executable milestone that can validate calling conventions and struct layout before whole-program reconstruction is attempted?

## Practical Risk List

### Risk: pretty C++ that cannot rebuild the game

Cause:

- class lifting done without ABI discipline

Mitigation:

- keep this note paired with the class-layout notes and require exact-width/packing/calling-convention placeholders in early skeletons

### Risk: false confidence from host compilation success

Cause:

- code compiles under a modern compiler but no longer matches segmented runtime behavior

Mitigation:

- define compile success and behavioral/ABI success as separate milestones

### Risk: loss of far-call/import provenance

Cause:

- unresolved thunk placeholders or loader-patched calls get flattened into generic helper names

Mitigation:

- preserve call provenance in notes and later exports, especially for methods that only look local after fixup repair

## Recommended Near-Term Documentation Follow-Ups

1. collect all current compiler/runtime fingerprints into one evidence note
2. add an `ABI concerns` section to future class-layout notes when a family uses far pointers or segmented ownership directly
3. draft the first minimal compatibility header for future C++ skeletons once the first class family is selected for source emission

## Current Bottom Line

The project is now documented well enough to start class lifting, but not well enough to safely emit `clean modern C++` without guardrails.

The safest present rule is:

- keep object recovery aggressive
- keep ABI assumptions conservative
- keep Track A and Track B separate in every future source milestone