Crusader_Decomp/docs/remorse-toolchain-fingerprint-evidence.md
2026-04-05 18:27:09 +02:00


# Remorse Toolchain Fingerprint Evidence
## Purpose
This note gathers the strongest current compiler, runtime, loader, and ABI fingerprints that matter for eventual source reconstruction.
It exists to answer one narrow question better than the broader ABI note:
- what concrete binary evidence currently supports the working toolchain and executable-model assumptions?
This note should stay paired with [docs/remorse-rebuild-abi-notes.md](docs/remorse-rebuild-abi-notes.md).
## High-Confidence Executable Model Evidence
### 1. Bound `MZ -> NE` executable
Strong anchors from [docs/overview.md](docs/overview.md):
- outer DOS header is `MZ`
- `e_lfanew = 0x36F70`
- internal header at `0x36F70` is `NE`
- internal NE image describes `145` segments
Why it matters:
- the game is not a flat DOS EXE with incidental overlays
- the executable format itself presupposes a segmented protected-mode program structure
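
The header chain above can be checked mechanically. The following is a minimal sketch, assuming the standard layouts: `e_lfanew` is the little-endian dword at offset `0x3C` of the MZ header, and the NE header stores its segment-table entry count as a word at offset `0x1C`. The function name and the returned dictionary are illustrative and not part of any existing tool in this repository.

```python
import struct


def probe_bound_ne(image: bytes) -> dict:
    """Follow MZ -> e_lfanew -> NE and report the NE segment count."""
    if image[:2] != b"MZ":
        raise ValueError("outer header is not MZ")
    # e_lfanew: 32-bit little-endian file offset of the new-style header.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 2] != b"NE":
        raise ValueError("inner header is not NE")
    # Word at NE+0x1C: number of entries in the NE segment table.
    (segment_count,) = struct.unpack_from("<H", image, e_lfanew + 0x1C)
    return {"e_lfanew": e_lfanew, "segment_count": segment_count}
```

Run against the real executable, the anchors listed above predict `e_lfanew == 0x36F70` and `segment_count == 145`.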
### 2. Phar Lap 286 DOS extender
Strong anchors from [docs/overview.md](docs/overview.md) and [docs/phar-lap-extender.md](docs/phar-lap-extender.md):
- executable is documented as using Phar Lap 286 DOS-Extender (`RUN286`)
- major code regions are extender/runtime segments rather than game logic segments
- named loader path includes `init_dos_extender`, `load_executable_image`, `apply_relocations`, and child-transfer helpers
Why it matters:
- the startup/runtime environment is part of the program contract, not an afterthought
- Track A reconstruction must preserve this loader/executable-model reality or replace it deliberately
### 3. Runtime-patched far-call model
Strong anchors from [docs/overview.md](docs/overview.md):
- unresolved inter-segment and external calls appear in raw import as `CALLF 0000:ffff`
- those are NE loader fixup placeholders, not calls into a single dispatcher
- the repaired raw import has already patched `8851` internal literal far-call sites to their real targets
Why it matters:
- far-call provenance is real ABI evidence
- any future source lift has to preserve which edges are ordinary local methods versus loader-significant far calls/imports
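
A quick way to see how many placeholder sites remain in a raw segment dump is a byte-pattern scan. This is a rough sketch, not the repair pass this project used. It assumes the `CALLF 0000:ffff` notation means segment `0000`, offset `ffff`, encoded as the direct far-call opcode `0x9A` followed by the offset word and then the segment word. A pure pattern scan can also match data bytes, so hits are candidates only.

```python
# CALLF 0000:ffff as raw bytes: opcode 0x9A, offset word 0xFFFF, segment word 0x0000.
PLACEHOLDER = bytes([0x9A, 0xFF, 0xFF, 0x00, 0x00])


def find_unpatched_far_calls(code: bytes) -> list[int]:
    """Return byte offsets of every candidate far-call fixup placeholder."""
    offsets = []
    pos = code.find(PLACEHOLDER)
    while pos != -1:
        offsets.append(pos)
        pos = code.find(PLACEHOLDER, pos + 1)
    return offsets
```

Comparing the count before and after fixup repair gives a cheap sanity check that a repair pass touched the sites it was expected to touch.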
## Runtime / CRT Fingerprints
### 1. Phar Lap runtime strings
Strong anchors from [docs/phar-lap-extender.md](docs/phar-lap-extender.md):
- `13fc:0016` = `$Id: comhighc.c 1.1 91/08/06...`
- `13fc:0048` = `$Id: comutils.c 1.1 91/08/06...`
- `1760:665c` = `Copyright (C) 1986-93 Phar Lap Software, Inc.`
- `1760:73da` = `-LDTSIZE 4096 -EXTHIGH D0_0000h -NI 18 -ISTKSIZE 3`
Current safest interpretation:
- Phar Lap runtime/source provenance is directly embedded in the binary
- `comhighc.c` is the strongest current fingerprint tying part of the runtime story to High C-related runtime material
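
Fingerprints of this kind can be harvested systematically. The sketch below assumes the embedded markers follow the `$Id: <file> <revision> <yy/mm/dd>` shape visible in the two strings above. The regular expression and function name are illustrative, and whatever follows the date in the real strings is left unparsed.

```python
import re

# Matches "$Id: <file> <rev> <yy/mm/dd>"; anything after the date is ignored.
RCS_ID = re.compile(rb"\$Id: ([\w.\-]+) ([\d.]+) (\d\d/\d\d/\d\d)")


def find_rcs_ids(blob: bytes) -> list[tuple[int, str, str, str]]:
    """Return (byte offset, file name, revision, date) for each marker found."""
    hits = []
    for match in RCS_ID.finditer(blob):
        name, rev, date = (g.decode("ascii") for g in match.groups())
        hits.append((match.start(), name, rev, date))
    return hits
```

The offsets returned are positions in whatever byte buffer is scanned; mapping them back to `SSSS:OOOO` addresses depends on how the segment was dumped.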
### 2. Protected-mode service and memory helpers
Strong anchors from [docs/phar-lap-extender.md](docs/phar-lap-extender.md):
- DPMI/interrupt wrappers in segment `1339`
- EMS management in segment `1677`
- task switching and child-process execution paths in `10da` and `1760`
Why it matters:
- the executable depends on a real protected-mode runtime layer with memory and interrupt service expectations
- this makes "modern compiler output that merely compiles" a weak reconstruction milestone on its own
## Binary-Structure Fingerprints That Affect Source Emission
### 1. Segmented address layout is visible throughout analysis
Strong anchors from [docs/overview.md](docs/overview.md):
- raw address model uses `SSSS:OOOO`
- game code begins only after the Phar Lap loader region
- notes repeatedly distinguish extender segments from NE gameplay segments
Implication:
- source that immediately collapses every pointer and call edge into one flat host model loses verified structure too early
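
One way to avoid flattening too early is to carry addresses as an explicit pair. The following is a small sketch with a hypothetical `SegOff` type. It deliberately offers no conversion to a linear address, since this note does not say whether the `SSSS` values are loader-assigned selectors or analysis-tool segment bases.

```python
import re
from typing import NamedTuple

_SEGOFF = re.compile(r"([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})")


class SegOff(NamedTuple):
    """A segmented address kept as two separate 16-bit fields."""
    segment: int
    offset: int

    @classmethod
    def parse(cls, text: str) -> "SegOff":
        match = _SEGOFF.fullmatch(text)
        if match is None:
            raise ValueError(f"expected SSSS:OOOO, got {text!r}")
        return cls(int(match.group(1), 16), int(match.group(2), 16))

    def __str__(self) -> str:
        return f"{self.segment:04x}:{self.offset:04x}"
```

For example, `SegOff.parse("13fc:0016")` keeps `0x13fc` and `0x0016` apart, and `str()` round-trips to the same notation used throughout these notes.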
### 2. Loader-sensitive call repair already affects function understanding
Strong anchors from [docs/overview.md](docs/overview.md):
- callsites had to be repaired before large parts of the raw import became meaningful
- inter-segment and external targets are encoded through relocation records, not fixed immediate addresses in the raw bytes
Implication:
- future class lifting should preserve import/far-call comments or metadata, especially for methods that only look local after fixup repair
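
As a sketch of what such metadata could look like, the record below attaches a provenance tag to each call edge and renders it as a comment for emitted source. Every name and category here is hypothetical. The categories are one possible split suggested by this note, not a recovered taxonomy, and the addresses in any usage are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class CallProvenance(Enum):
    NEAR_LOCAL = "near-local"                  # ordinary intra-segment call
    FAR_FIXUP_INTERNAL = "far-fixup-internal"  # target supplied by an internal relocation
    FAR_FIXUP_IMPORT = "far-fixup-import"      # target supplied by an import relocation


@dataclass(frozen=True)
class CallEdge:
    callsite: str   # SSSS:OOOO of the call instruction
    target: str     # resolved target name or address
    provenance: CallProvenance

    @property
    def loader_significant(self) -> bool:
        """True when the edge only exists because the loader patched it."""
        return self.provenance is not CallProvenance.NEAR_LOCAL

    def as_comment(self) -> str:
        """Render the edge as a C-style comment for lifted source."""
        return f"/* {self.callsite} -> {self.target} [{self.provenance.value}] */"
```

Keeping the tag on the edge rather than on the callee means a method that is called both locally and through a fixup keeps both facts visible.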
## Working Compiler Story: What Is Safe And What Is Not
### Safe now
- Phar Lap 286 protected-mode DOS environment is real
- NE segmented executable model is real
- runtime strings directly reference `comhighc.c` and `comutils.c`
- the broad toolchain story includes Phar Lap runtime material and High C-related runtime evidence
### Not safe to claim as closed yet
- exact original compiler version for every module
- exact linker flags for the game NE image
- exact near/far defaults and calling-convention flags used by all gameplay modules
- exact rebuild recipe needed for a compatible historical executable
## Evidence Strength By Question
### Question: does segmented ABI discipline matter?
Answer:
- yes, strongly supported
Why:
- NE format, loader-patched far calls, and segment-separated code organization all point the same way
### Question: is a High C-related runtime story real or speculative?
Answer:
- real at the runtime-fingerprint level, still incomplete at the full-build-chain level
Why:
- `comhighc.c` string evidence is concrete
- full per-module compiler attribution is not yet closed
### Question: can Track A and Track B still share the same early source work?
Answer:
- yes, if early source keeps exact widths, packing, far-call provenance, and segmented-pointer placeholders visible
## What This Means For Future Real Work
When MCP class tools are ready or when hand-written skeletons start, these fingerprints should drive the rules:
1. keep exact-width aliases mandatory
2. keep packing explicit
3. keep segmented-pointer or far-pointer placeholders available
4. keep calling-convention markers visible even when still provisional
5. keep far-call/import provenance attached to lifted methods where it matters
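
Rules 1 to 3 can be prototyped before any C skeleton exists. The sketch below uses Python's `struct` module with the `<` prefix, which selects little-endian byte order, standard field sizes, and no implicit padding. The record layout itself is invented purely for illustration and does not describe any recovered game structure. The far pointer is stored offset word first, as x86 stores 16:16 pointers in memory.

```python
import struct
from typing import NamedTuple


class FarPtr(NamedTuple):
    """Placeholder for a 16:16 far pointer; fields stay separate on purpose."""
    offset: int
    segment: int


# Hypothetical record: u8 kind, u16 count, far pointer (u16 offset, u16 segment).
# "<" = little-endian, exact widths, no alignment padding.
EXAMPLE_RECORD = struct.Struct("<BHHH")


def unpack_example(raw: bytes) -> tuple[int, int, FarPtr]:
    """Decode one packed example record into (kind, count, far pointer)."""
    kind, count, off, seg = EXAMPLE_RECORD.unpack(raw)
    return kind, count, FarPtr(off, seg)
```

`EXAMPLE_RECORD.size` is 7 bytes here; a layout with natural alignment would pad the same fields to 8, which is exactly the kind of drift that explicit packing is meant to catch.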
## Highest-Value Remaining Fingerprint Questions
1. collect more direct CRT/helper signatures that distinguish Phar Lap runtime pieces from gameplay-generated code
2. identify which recovered object families most clearly cross near/far ownership boundaries
3. isolate functions whose call shape strongly suggests non-default calling conventions
4. determine the smallest rebuild slice that can test layout and call discipline before whole-program ambitions
## Bottom Line
The current toolchain story is strong enough to justify ABI-conservative source emission rules.
The safe working model remains: Phar Lap protected-mode DOS, bound `MZ -> NE` executable, loader-patched far-call environment, and a real High C-related runtime fingerprint that is informative but not yet the entire historical build recipe.