Remorse Toolchain Fingerprint Evidence
Purpose
This note gathers the strongest current compiler, runtime, loader, and ABI fingerprints that matter for eventual source reconstruction.
It exists to answer one narrow question better than the broader ABI note:
- what concrete binary evidence currently supports the working toolchain and executable-model assumptions?
This note should stay paired with docs/remorse-rebuild-abi-notes.md.
High-Confidence Executable Model Evidence
1. Bound MZ -> NE executable
Strong anchors from docs/overview.md:
- outer DOS header is MZ
- e_lfanew = 0x36F70
- internal header at 0x36F70 is NE
- internal NE image describes 145 segments
Why it matters:
- the game is not a flat DOS EXE with incidental overlays
- the executable model already assumes segmented protected-mode program structure
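The anchors above can be re-checked directly against the binary. A minimal sketch, assuming the standard header layouts (e_lfanew as a 32-bit little-endian value at offset 0x3C of the MZ header, segment count as a 16-bit field at offset 0x1C of the NE header); the function name is hypothetical and not part of any existing tooling:

```python
import struct


def read_bound_ne_header(image: bytes) -> dict:
    """Follow e_lfanew from the outer MZ header to the inner NE header."""
    if image[0:2] != b"MZ":
        raise ValueError("outer header is not MZ")
    # e_lfanew: 32-bit little-endian file offset stored at 0x3C in the MZ header.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 2] != b"NE":
        raise ValueError("inner header is not NE")
    # Segment-table entry count: 16-bit field at offset 0x1C of the NE header.
    (segment_count,) = struct.unpack_from("<H", image, e_lfanew + 0x1C)
    return {"e_lfanew": e_lfanew, "segment_count": segment_count}
```

If docs/overview.md is right, running this over the game executable should report e_lfanew 0x36F70 and 145 segments; a mismatch would mean the anchor or the sketch needs revisiting.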
2. Phar Lap 286 DOS extender
Strong anchors from docs/overview.md and docs/phar-lap-extender.md:
- executable is documented as using Phar Lap 286 DOS-Extender (RUN286)
- major code regions are extender/runtime segments rather than game logic segments
- named loader path includes init_dos_extender, load_executable_image, apply_relocations, and child-transfer helpers
Why it matters:
- the startup/runtime environment is part of the program contract, not an afterthought
- Track A reconstruction must preserve this loader/executable-model reality or replace it deliberately
3. Runtime-patched far-call model
Strong anchors from docs/overview.md:
- unresolved inter-segment and external calls appear in raw import as CALLF 0000:ffff
- those are NE loader fixup placeholders, not one dispatcher
- repaired raw import already patched 8851 internal literal far-call sites to real targets
Why it matters:
- far-call provenance is real ABI evidence
- any future source lift has to preserve which edges are ordinary local methods versus loader-significant far calls/imports
Runtime / CRT Fingerprints
1. Phar Lap runtime strings
Strong anchors from docs/phar-lap-extender.md:
- 13fc:0016 = $Id: comhighc.c 1.1 91/08/06...
- 13fc:0048 = $Id: comutils.c 1.1 91/08/06...
- 1760:665c = Copyright (C) 1986-93 Phar Lap Software, Inc.
- 1760:73da = -LDTSIZE 4096 -EXTHIGH D0_0000h -NI 18 -ISTKSIZE 3
Current safest interpretation:
- Phar Lap runtime/source provenance is directly embedded in the binary
- comhighc.c is the strongest current fingerprint tying part of the runtime story to High C-related runtime material
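More fingerprints of this kind can be harvested mechanically. A minimal sketch, assuming the embedded version strings follow the usual RCS "$Id: ... $" keyword form; the function name and the 120-byte length cap are arbitrary choices for illustration:

```python
import re

# Match "$Id: " followed by a run of bytes that stops at NUL or the closing "$".
RCS_ID = re.compile(rb"\$Id: ([^\x00$]{1,120})")


def find_rcs_ids(blob: bytes) -> list[tuple[int, str]]:
    """Return (offset, text) pairs for every RCS-style $Id$ string in a blob."""
    return [
        (m.start(), m.group(1).decode("ascii", "replace").strip())
        for m in RCS_ID.finditer(blob)
    ]
```

Offsets returned here are positions within whatever blob was passed in; mapping them back to SSSS:OOOO addresses depends on how the segment was dumped.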
2. Protected-mode service and memory helpers
Strong anchors from docs/phar-lap-extender.md:
- DPMI/interrupt wrappers in segment 1339
- EMS management in segment 1677
- task switching and child-process execution paths in 10da and 1760
Why it matters:
- the executable depends on a real protected-mode runtime layer with memory and interrupt service expectations
- this makes "modern compiler output that merely compiles" a weak reconstruction milestone by itself
Binary-Structure Fingerprints That Affect Source Emission
1. Segmented address layout is visible throughout analysis
Strong anchors from docs/overview.md:
- raw address model uses SSSS:OOOO
- game code begins only after the Phar Lap loader region
- notes repeatedly distinguish extender segments from NE gameplay segments
Implication:
- source that immediately collapses every pointer and call edge into one flat host model loses verified structure too early
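One way to avoid collapsing too early is to carry addresses as explicit segment:offset pairs in analysis tooling. The sketch below is illustrative only; the SegOff type is hypothetical. It deliberately offers no conversion to a linear address, since in a protected-mode image the segment value cannot be assumed to behave like a real-mode paragraph number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegOff:
    """A 16:16 address kept as two separate fields, never flattened."""
    segment: int
    offset: int

    def __post_init__(self) -> None:
        # Both halves must fit in 16 bits.
        for name in ("segment", "offset"):
            value = getattr(self, name)
            if not 0 <= value <= 0xFFFF:
                raise ValueError(f"{name} out of 16-bit range: {value:#x}")

    @classmethod
    def parse(cls, text: str) -> "SegOff":
        """Parse the SSSS:OOOO notation used throughout the notes."""
        seg, off = text.split(":")
        return cls(int(seg, 16), int(off, 16))

    def __str__(self) -> str:
        return f"{self.segment:04x}:{self.offset:04x}"
```

For example, SegOff.parse("13fc:0016") round-trips back to the same string, so annotations can be stored and compared without losing the segment boundary.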
2. Loader-sensitive call repair already affects function understanding
Strong anchors from docs/overview.md:
- callsites had to be repaired before large parts of the raw import became meaningful
- inter-segment and external targets are encoded through relocation records, not fixed immediate addresses in the raw bytes
Implication:
- future class lifting should preserve import/far-call comments or metadata, especially for methods that only look local after fixup repair
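A possible shape for that metadata is sketched below. Every name here (CallKind, CallEdge, the three kind labels) is hypothetical and not taken from the existing tooling; the point is only that a call edge records how it was resolved, so a method that looks local after repair still carries its fixup history.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CallKind(Enum):
    NEAR_LOCAL = "near-local"                   # ordinary intra-segment call
    FAR_FIXUP_INTERNAL = "far-fixup-internal"   # loader-patched, target inside the image
    FAR_FIXUP_IMPORT = "far-fixup-import"       # loader-patched, external target


@dataclass(frozen=True)
class CallEdge:
    site: str                      # SSSS:OOOO of the call instruction
    kind: CallKind
    target: Optional[str] = None   # SSSS:OOOO of the target, if resolved

    def comment(self) -> str:
        """Render a one-line provenance comment for a lifted method."""
        target = self.target if self.target is not None else "unresolved"
        return f"call@{self.site} kind={self.kind.value} target={target}"
```

Emitting the comment string next to each lifted call keeps the provenance readable in source while the structured record stays available for tooling.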
Working Compiler Story: What Is Safe And What Is Not
Safe now
- Phar Lap 286 protected-mode DOS environment is real
- NE segmented executable model is real
- runtime strings directly reference comhighc.c and comutils.c
- the broad toolchain story includes Phar Lap runtime material and High C-related runtime evidence
Not safe to claim as closed yet
- exact original compiler version for every module
- exact linker flags for the game NE image
- exact near/far defaults and calling-convention flags used by all gameplay modules
- exact rebuild recipe needed for a compatible historical executable
Evidence Strength By Question
Question: does segmented ABI discipline matter?
Answer:
- yes, strongly supported
Why:
- NE format, loader-patched far calls, and segment-separated code organization all point the same way
Question: is a High C-related runtime story real or speculative?
Answer:
- real at the runtime-fingerprint level, still incomplete at the full-build-chain level
Why:
- comhighc.c string evidence is concrete
- full per-module compiler attribution is not yet closed
Question: can Track A and Track B still share the same early source work?
Answer:
- yes, if early source keeps exact widths, packing, far-call provenance, and segmented-pointer placeholders visible
What This Means For Future Real Work
When MCP class tools are ready or when hand-written skeletons start, these fingerprints should drive the rules:
- keep exact-width aliases mandatory
- keep packing explicit
- keep segmented-pointer or far-pointer placeholders available
- keep calling-convention markers visible even when still provisional
- keep far-call/import provenance attached to lifted methods where it matters
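The first three rules can be tested on the analysis side before any C skeleton exists. A minimal sketch using a purely hypothetical record layout (a 16-bit id, an 8-bit flags byte, and a far pointer stored as 16-bit offset followed by 16-bit segment); no recovered structure is claimed to look like this. The "<" prefix in the format string forces little-endian fields with no padding, which is the packed, exact-width behavior the rules ask for.

```python
import struct

# Hypothetical layout: u16 id, u8 flags, far pointer as u16 offset + u16 segment.
# "<" means little-endian, standard sizes, and no alignment padding.
RECORD = struct.Struct("<HBHH")


def unpack_record(raw: bytes) -> dict:
    """Decode one packed record, keeping the far pointer as two separate fields."""
    rec_id, flags, ptr_offset, ptr_segment = RECORD.unpack(raw)
    return {
        "id": rec_id,
        "flags": flags,
        "ptr": (ptr_segment, ptr_offset),  # (segment, offset), not flattened
    }
```

With this layout RECORD.size is 7 bytes; a naturally aligned layout of the same fields would be larger, which is exactly the kind of drift explicit packing is meant to catch.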
Highest-Value Remaining Fingerprint Questions
- collect more direct CRT/helper signatures that distinguish Phar Lap runtime pieces from gameplay-generated code
- identify which recovered object families most clearly cross near/far ownership boundaries
- isolate functions whose call shape strongly suggests non-default calling conventions
- determine the smallest rebuild slice that can test layout and call discipline before whole-program ambitions
Bottom Line
The current toolchain story is strong enough to justify ABI-conservative source emission rules.
The safe working model remains: Phar Lap protected-mode DOS, bound MZ -> NE executable, loader-patched far-call environment, and a real High C-related runtime fingerprint that is informative but not yet the entire historical build recipe.