Crusader_Decomp/docs/remorse-rebuild-abi-notes.md
2026-04-05 18:27:09 +02:00

7.6 KiB

Remorse Rebuild ABI Notes

Purpose

This note records the current ABI, memory-model, and toolchain constraints that should shape any future Remorse source reconstruction.

The class-lifting notes answer what the objects probably are. This note answers what the rebuilt source must still respect if it aims to become a working executable rather than only readable C++.

Current Baseline

The live target is not a flat modern Win32 program.

Current verified binary facts:

  • DOS target
  • 16-bit protected-mode environment
  • Phar Lap 286 DOS-Extender (RUN286)
  • bound MZ -> NE executable
  • heavy use of inter-segment and external CALLF fixups

That means the default safe assumption is:

  • segmented code/data model matters
  • near/far calls matter
  • pointer width and calling convention details matter
  • loader/runtime expectations matter

Hard Constraints Already Visible In The Binary

1. Segmented addressing is real, not presentation noise

Evidence:

  • executable format is MZ -> NE
  • raw import behavior collapses unresolved calls to 0000:ffff until NE fixups are applied
  • repaired raw import had thousands of internal literal CALLF sites patched to real segment:offset targets
  • the notes repeatedly distinguish far pointers, segment:offset storage, and per-segment relocation behavior

Practical implication:

  • a rebuild target that ignores far calls and far data pointers too early will drift away from the original executable model

2. Function boundaries and external calls are loader-sensitive

Evidence:

  • CALLF 0000:ffff is a placeholder used by the NE loader for real inter-segment/external targets
  • unresolved far thunk behavior in raw import is explicitly not a real dispatcher

Practical implication:

  • source emission must preserve which calls are logically intra-object methods and which ones are ABI-significant far calls or imported runtime/library calls

3. Runtime/library layer is not trivial glue

Evidence:

  • large Phar Lap runtime/extender segments remain part of startup and low-level system behavior
  • CRT wrappers and formatter/runtime helpers are explicitly identified
  • MetaWare High C formatting/runtime wrappers are present in the notes

Practical implication:

  • the original or near-original compiler/runtime environment matters enough that just compile with a modern compiler is not a safe early assumption for an original-style rebuild

4. Object layout is tightly coupled to exact field offsets

Evidence:

  • major gameplay and UI families are still being recovered by exact offsets
  • VM/runtime helpers, dispatch entries, and entity families all depend on stable field positions

Practical implication:

  • class lifting must preserve packed layout discipline and exact-width integer choices from the start

Current Best Toolchain Read

This is still a working model, not a closed historical claim.

High-confidence environment facts

  • DOS protected mode under Phar Lap 286 extender
  • NE executable image
  • runtime/CRT evidence compatible with MetaWare High C presence in at least part of the binary toolchain story

What remains open

  • exact original compiler version
  • exact memory-model flags used for all modules
  • exact calling-convention mapping for each object family
  • exact linker/build recipe needed to reproduce compatible NE output

Track A: Original-style executable reconstruction

If the goal is to rebuild something close to the shipped executable model, the source must preserve:

  • segmented pointer distinctions
  • explicit near/far calling boundaries where needed
  • exact struct packing
  • compatible CRT/runtime assumptions
  • executable/resource layout expectations

This is the stricter track.

Track B: Behaviorally equivalent source port

If the goal is instead a working engine/game rebuild using the original data with equivalent behavior, then the source can relax some ABI constraints later.

But even on this track, the early reverse-engineering output should still preserve ABI facts long enough that the project can make an informed choice instead of accidentally forcing itself into a port.

Source-Level Rules To Adopt Early

Any future generated or handwritten code should default to these constraints:

Integer widths

  • use explicit fixed-width integer types everywhere possible
  • do not use plain int, long, or compiler-default enum width as semantic types in the first pass

Layout control

  • keep a visible packing strategy for recovered structs
  • record uncertain padding explicitly rather than letting the compiler invent it silently

Pointer model

  • keep far-pointer distinctions visible in the type system or wrapper layer
  • do not immediately collapse all pointers to one flat host pointer type if Track A remains in scope

Calling conventions

  • keep calling convention annotations explicit in working notes and emitted skeletons
  • do not assume one modern host calling convention is an adequate stand-in for every recovered method or helper

Virtual dispatch

  • preserve raw slot order in provisional vtable types
  • do not rename or reorder slots to look cleaner before the mapping is stable

Candidate ABI Support Layer

The first C++ source slices should probably compile against a small compatibility layer rather than raw host C++ alone.

Current likely categories:

  • exact-width integer typedefs
  • far/near pointer wrappers or placeholder abstractions
  • packing macros or pragmas
  • calling-convention macros
  • segmented address helper types for debugging and trace comparison
  • imported runtime service shims for file, memory, and platform calls

Immediate Compiler/Runtime Questions To Close Later

These are the most useful next ABI questions for the repo:

  1. Which compiler/runtime signatures in the binary most strongly identify the original toolchain family and version?
  2. Which current methods clearly require far-call semantics even after class lifting?
  3. Which object families can safely be emitted as host-side plain structs first, and which still need explicit segmented-pointer wrappers?
  4. What is the narrowest executable milestone that can validate calling conventions and struct layout before whole-program reconstruction is attempted?

Practical Risk List

Risk: pretty C++ that cannot rebuild the game

Cause:

  • class lifting done without ABI discipline

Mitigation:

  • keep this note paired with the class-layout notes and require exact-width/packing/calling-convention placeholders in early skeletons

Risk: false confidence from host compilation success

Cause:

  • code compiles under a modern compiler but no longer matches segmented runtime behavior

Mitigation:

  • define compile success and behavioral/ABI success as separate milestones

Risk: loss of far-call/import provenance

Cause:

  • unresolved thunk placeholders or loader-patched calls get flattened into generic helper names

Mitigation:

  • preserve call provenance in notes and later exports, especially for methods that only look local after fixup repair
  1. collect all current compiler/runtime fingerprints into one evidence note
  2. add an ABI concerns section to future class-layout notes when a family uses far pointers or segmented ownership directly
  3. draft the first minimal compatibility header for future C++ skeletons once the first class family is selected for source emission

Current Bottom Line

The project is now documented well enough to start class lifting, but not well enough to safely emit clean modern C++ without guardrails.

The safest present rule is:

  • keep object recovery aggressive
  • keep ABI assumptions conservative
  • keep Track A and Track B separate in every future source milestone