Crusader_Decomp/docs/remorse-rebuild-abi-notes.md


2026-04-05 18:27:09 +02:00
# Remorse Rebuild ABI Notes
## Purpose
This note records the current ABI, memory-model, and toolchain constraints that should shape any future Remorse source reconstruction.
The class-lifting notes answer `what the objects probably are`.
This note answers `what the rebuilt source must still respect if it aims to become a working executable rather than only readable C++`.
## Current Baseline
The live target is not a flat modern Win32 program.
Current verified binary facts:
- DOS target
- 16-bit protected-mode environment
- Phar Lap 286 DOS-Extender (`RUN286`)
- bound `MZ -> NE` executable
- heavy use of inter-segment and external `CALLF` fixups
That means the default safe assumption is:
- segmented code/data model matters
- near/far calls matter
- pointer width and calling convention details matter
- loader/runtime expectations matter
## Hard Constraints Already Visible In The Binary
### 1. Segmented addressing is real, not presentation noise
Evidence:
- executable format is `MZ -> NE`
- raw import behavior collapses unresolved calls to `0000:ffff` until NE fixups are applied
- repaired raw import had thousands of internal literal `CALLF` sites patched to real segment:offset targets
- the notes repeatedly distinguish far pointers, segment:offset storage, and per-segment relocation behavior
Practical implication:
- a rebuild target that ignores far calls and far data pointers too early will drift away from the original executable model
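One way to keep segment:offset values visible during recovery is a small value type that never flattens them. The sketch below is illustrative only: `SegOff16`, `pack`, `unpack`, and `same_address` are hypothetical names, not identifiers recovered from the binary. It assumes the usual x86 in-memory layout of a 16:16 far pointer (offset in the low word, segment in the high word).

```cpp
#include <cstdint>

// Hypothetical helper: a 16:16 segmented address kept as two explicit halves.
// In 16-bit protected mode the high half is a selector, not a real-mode
// paragraph, so no linear address is computed here.
struct SegOff16 {
    std::uint16_t segment;  // selector, or NE segment index before fixup
    std::uint16_t offset;   // offset within that segment
};

// Pack into the 32-bit form of a stored far pointer:
// offset in the low word, segment in the high word.
constexpr std::uint32_t pack(SegOff16 a) {
    return (static_cast<std::uint32_t>(a.segment) << 16) | a.offset;
}

// Split a stored 32-bit far pointer back into its two halves.
constexpr SegOff16 unpack(std::uint32_t raw) {
    return SegOff16{static_cast<std::uint16_t>(raw >> 16),
                    static_cast<std::uint16_t>(raw & 0xFFFFu)};
}

// Compare both halves; two different selectors are never treated as equal
// here, even if they might alias the same memory at runtime.
constexpr bool same_address(SegOff16 a, SegOff16 b) {
    return a.segment == b.segment && a.offset == b.offset;
}
```

Keeping the two halves separate means trace output and notes can be compared against disassembly addresses directly, without committing to any host pointer representation.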
### 2. Function boundaries and external calls are loader-sensitive
Evidence:
- `CALLF 0000:ffff` is a placeholder left in the image for the NE loader to patch with the real inter-segment/external target
- the far thunk that unresolved calls appear to share in the raw import is a fixup artifact, not a real dispatcher
Practical implication:
- source emission must preserve which calls are logically intra-object methods and which ones are ABI-significant far calls or imported runtime/library calls
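A minimal sketch of how call provenance could be tagged in emitted skeletons or export tooling. All names here (`CallKind`, `FarTarget`, `classify_far_call`) are hypothetical, and the `is_import` flag is assumed to be supplied by whatever reads the NE relocation records; this block does not parse them.

```cpp
#include <cstdint>

// Hypothetical provenance tags for recovered call sites.
enum class CallKind : std::uint8_t {
    NearIntraSegment,      // plain near CALL inside one segment
    FarInternal,           // CALLF patched to a segment:offset inside the image
    FarImported,           // CALLF resolved to an imported runtime/library entry
    UnresolvedPlaceholder  // CALLF 0000:ffff still awaiting NE fixup
};

struct FarTarget {
    std::uint16_t segment;
    std::uint16_t offset;
};

// True for the literal 0000:ffff target seen before fixups are applied.
constexpr bool is_unresolved_placeholder(FarTarget t) {
    return t.segment == 0x0000 && t.offset == 0xFFFF;
}

// Classify a far call site. is_import is an assumed input taken from the
// relocation data, not something derived from the target bytes.
constexpr CallKind classify_far_call(FarTarget t, bool is_import) {
    return is_unresolved_placeholder(t) ? CallKind::UnresolvedPlaceholder
         : is_import                    ? CallKind::FarImported
                                        : CallKind::FarInternal;
}
```

Carrying a tag like this alongside each call site keeps a loader-patched call from being silently renamed into an ordinary local helper.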
### 3. Runtime/library layer is not trivial glue
Evidence:
- large Phar Lap runtime/extender segments remain part of startup and low-level system behavior
- CRT wrappers and formatter/runtime helpers are explicitly identified
- MetaWare High C formatting/runtime wrappers are present in the notes
Practical implication:
- the original or near-original compiler/runtime environment matters enough that `just compile with a modern compiler` is not a safe early assumption for an original-style rebuild
### 4. Object layout is tightly coupled to exact field offsets
Evidence:
- major gameplay and UI families are still being recovered by exact offsets
- VM/runtime helpers, dispatch entries, and entity families all depend on stable field positions
Practical implication:
- class lifting must preserve packed layout discipline and exact-width integer choices from the start
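As an illustration of layout discipline, a recovered struct can carry its offsets as compile-time checks. The struct and field names below are invented for the example and do not correspond to any recovered Remorse type; byte packing is an assumption made for the sketch, not a verified fact about the original build.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical recovered layout. Byte packing is assumed here so that the
// host compiler cannot insert alignment padding on its own.
#pragma pack(push, 1)
struct EntityHeaderSketch {
    std::uint16_t type_id;    // +0x00
    std::uint8_t  flags;      // +0x02
    std::uint16_t x;          // +0x03 (a host compiler would default to +0x04)
    std::uint32_t owner_far;  // +0x05 raw 16:16 far pointer, kept opaque
};
#pragma pack(pop)

// The offsets observed in disassembly become build-time checks.
static_assert(offsetof(EntityHeaderSketch, type_id) == 0x00, "type_id offset");
static_assert(offsetof(EntityHeaderSketch, flags) == 0x02, "flags offset");
static_assert(offsetof(EntityHeaderSketch, x) == 0x03, "x offset");
static_assert(offsetof(EntityHeaderSketch, owner_far) == 0x05, "owner_far offset");
static_assert(sizeof(EntityHeaderSketch) == 9, "total size");
```

If a later edit changes a field width or reorders members, the build fails instead of the layout drifting unnoticed.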
## Current Best Toolchain Read
This is still a working model, not a closed historical claim.
### High-confidence environment facts
- DOS protected mode under Phar Lap 286 extender
- NE executable image
- runtime/CRT evidence consistent with MetaWare High C being part of the toolchain for at least some of the binary
### What remains open
- exact original compiler version
- exact memory-model flags used for all modules
- exact calling-convention mapping for each object family
- exact linker/build recipe needed to reproduce compatible NE output
## Recommended Rebuild Tracks
### Track A: Original-style executable reconstruction
If the goal is to rebuild something close to the shipped executable model, the source must preserve:
- segmented pointer distinctions
- explicit near/far calling boundaries where needed
- exact struct packing
- compatible CRT/runtime assumptions
- executable/resource layout expectations
This is the stricter track.
### Track B: Behaviorally equivalent source port
If the goal is instead a working engine/game rebuild using the original data with equivalent behavior, then the source can relax some ABI constraints later.
Even on this track, early reverse-engineering output should preserve ABI facts long enough for the project to choose deliberately, rather than drifting into a port by accident.
## Source-Level Rules To Adopt Early
Any future generated or handwritten code should default to these constraints:
### Integer widths
- use explicit fixed-width integer types everywhere possible
- do not use plain `int`, `long`, or compiler-default enum width as semantic types in the first pass
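The reason this matters is that `int` is 16 bits wide on a 16-bit x86 compiler but typically 32 bits on a modern host, so arithmetic that wrapped in the original can silently stop wrapping. A small sketch of the convention, with hypothetical alias and enum names:

```cpp
#include <cstdint>

// Hypothetical short aliases for a compatibility header.
using u8  = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;
using i16 = std::int16_t;

// Enum with an explicit underlying type, so its storage width does not
// depend on compiler defaults. The enumerators are placeholders.
enum class FacingSketch : u8 { Unknown0 = 0, Unknown1 = 1 };

// 16-bit addition with the wraparound made explicit. On a host the operands
// promote to a wider int, so the cast is what restores 16-bit behavior.
constexpr u16 add_u16(u16 a, u16 b) {
    return static_cast<u16>(a + b);
}

static_assert(sizeof(u16) == 2, "u16 must be exactly 2 bytes");
static_assert(sizeof(FacingSketch) == 1, "enum storage must be 1 byte");
```

For example, `add_u16(0xFFFF, 1)` evaluates to `0`, which matches what 16-bit arithmetic would produce.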
### Layout control
- keep a visible packing strategy for recovered structs
- record uncertain padding explicitly rather than letting the compiler invent it silently
### Pointer model
- keep far-pointer distinctions visible in the type system or wrapper layer
- do not immediately collapse all pointers to one flat host pointer type if Track A remains in scope
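One possible shape for such a wrapper layer is a pair of typed templates that record near versus far in the type while staying the same size as the original storage. `FarPtr`, `NearPtr`, `to_far`, and `EntitySketch` are hypothetical names; treating `0000:0000` as the null far pointer is an assumption that has not been checked against the binary.

```cpp
#include <cstdint>

// Typed far pointer placeholder: 4 bytes, offset word first to match the
// in-memory order of a stored 16:16 pointer.
template <typename T>
struct FarPtr {
    std::uint16_t offset;
    std::uint16_t segment;
    // Assumption: 0000:0000 is the null far pointer.
    constexpr bool is_null() const { return segment == 0 && offset == 0; }
};

// Typed near pointer placeholder: 2 bytes, valid only relative to an
// implied segment that the type itself does not record.
template <typename T>
struct NearPtr {
    std::uint16_t offset;
};

// Widening near to far requires naming the segment explicitly.
template <typename T>
constexpr FarPtr<T> to_far(NearPtr<T> p, std::uint16_t segment) {
    return FarPtr<T>{p.offset, segment};
}

struct EntitySketch;  // hypothetical, deliberately left incomplete

static_assert(sizeof(FarPtr<EntitySketch>) == 4, "far pointer storage");
static_assert(sizeof(NearPtr<EntitySketch>) == 2, "near pointer storage");
```

Because the wrappers cannot be dereferenced, any code that wants a host pointer has to go through an explicit conversion, which is where a Track B port would later plug in its own mapping.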
### Calling conventions
- keep calling convention annotations explicit in working notes and emitted skeletons
- do not assume one modern host calling convention is an adequate stand-in for every recovered method or helper
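A minimal sketch of annotation macros that keep the recovered convention visible in source while compiling cleanly on a host. The macro names are hypothetical. On a host build they expand to nothing; the keywords a 16-bit toolchain would need are deliberately not guessed here, since the original compiler's calling-convention mapping is still open.

```cpp
#include <cstdint>

// Hypothetical annotation macros. Host build: purely documentary.
// A 16-bit target build would map these to that compiler's own keywords.
#define REMORSE_FARCALL     /* reached through CALLF in the original */
#define REMORSE_NEARCALL    /* reached through a near CALL */
#define REMORSE_CC_UNKNOWN  /* argument order and cleanup not yet established */

// Example skeleton: the annotations sit where a real convention keyword
// would go, so they survive later refactoring of the body.
constexpr std::uint16_t REMORSE_FARCALL REMORSE_CC_UNKNOWN
sketch_far_helper(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(a ^ b);  // placeholder body
}

constexpr std::uint16_t REMORSE_NEARCALL
sketch_near_helper(std::uint16_t a) {
    return static_cast<std::uint16_t>(a + 1);  // placeholder body
}
```

Searching for `REMORSE_CC_UNKNOWN` then gives a running list of functions whose convention still needs to be pinned down.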
### Virtual dispatch
- preserve raw slot order in provisional vtable types
- do not rename or reorder slots to look cleaner before the mapping is stable
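A provisional vtable type can name slots by raw index and assert their positions, so reordering becomes a build error. This is a sketch under assumptions: the type names are hypothetical, and it assumes each slot holds a 4-byte far code pointer, which should be confirmed per object family.

```cpp
#include <cstddef>
#include <cstdint>

// Raw far code pointer as stored in a table (assumed 4 bytes per slot).
struct RawFarCode {
    std::uint16_t offset;
    std::uint16_t segment;
};

// Slots keep their raw index in the name until the mapping is stable.
struct ProvisionalVtableSketch {
    RawFarCode slot_00;  // purpose unknown
    RawFarCode slot_01;  // purpose unknown
    RawFarCode slot_02;  // purpose unknown
    RawFarCode slot_03;  // purpose unknown
};

// Byte offset of slot N, for comparison against disassembly such as
// an indirect call through [table + N * 4].
constexpr std::size_t slot_byte_offset(std::size_t index) {
    return index * sizeof(RawFarCode);
}

static_assert(offsetof(ProvisionalVtableSketch, slot_02) == slot_byte_offset(2),
              "slot_02 must stay at raw index 2");
static_assert(sizeof(ProvisionalVtableSketch) == slot_byte_offset(4),
              "four slots, no padding");
```

When a slot's purpose is established, a descriptive alias can be added alongside the raw name rather than replacing it.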
## Candidate ABI Support Layer
The first C++ source slices should probably compile against a small compatibility layer rather than raw host C++ alone.
Current likely categories:
- exact-width integer typedefs
- far/near pointer wrappers or placeholder abstractions
- packing macros or pragmas
- calling-convention macros
- segmented address helper types for debugging and trace comparison
- imported runtime service shims for file, memory, and platform calls
## Immediate Compiler/Runtime Questions To Close Later
These are the most useful next ABI questions for the repo:
1. Which compiler/runtime signatures in the binary most strongly identify the original toolchain family and version?
2. Which current methods clearly require far-call semantics even after class lifting?
3. Which object families can safely be emitted as host-side plain structs first, and which still need explicit segmented-pointer wrappers?
4. What is the narrowest executable milestone that can validate calling conventions and struct layout before whole-program reconstruction is attempted?
## Practical Risk List
### Risk: pretty C++ that cannot rebuild the game
Cause:
- class lifting done without ABI discipline
Mitigation:
- keep this note paired with the class-layout notes and require exact-width/packing/calling-convention placeholders in early skeletons
### Risk: false confidence from host compilation success
Cause:
- code compiles under a modern compiler but no longer matches segmented runtime behavior
Mitigation:
- define compile success and behavioral/ABI success as separate milestones
### Risk: loss of far-call/import provenance
Cause:
- unresolved thunk placeholders or loader-patched calls get flattened into generic helper names
Mitigation:
- preserve call provenance in notes and later exports, especially for methods that only look local after fixup repair
## Recommended Near-Term Documentation Follow-Ups
1. collect all current compiler/runtime fingerprints into one evidence note
2. add an `ABI concerns` section to future class-layout notes when a family uses far pointers or segmented ownership directly
3. draft the first minimal compatibility header for future C++ skeletons once the first class family is selected for source emission
## Current Bottom Line
The project is now documented well enough to start class lifting, but not well enough to safely emit `clean modern C++` without guardrails.
The safest present rule is:
- keep object recovery aggressive
- keep ABI assumptions conservative
- keep Track A and Track B separate in every future source milestone