# Remorse Toolchain Fingerprint Evidence

## Purpose

This note gathers the strongest current compiler, runtime, loader, and ABI fingerprints that matter for eventual source reconstruction.

It exists to answer one narrow question better than the broader ABI note:

- what concrete binary evidence currently supports the working toolchain and executable-model assumptions?

This note should stay paired with [docs/remorse-rebuild-abi-notes.md](docs/remorse-rebuild-abi-notes.md).

## High-Confidence Executable Model Evidence

### 1. Bound `MZ -> NE` executable

Strong anchors from [docs/overview.md](docs/overview.md):

- outer DOS header is `MZ`
- `e_lfanew = 0x36F70`
- internal header at `0x36F70` is `NE`
- internal NE image describes `145` segments

Why it matters:

- the game is not a flat DOS EXE with incidental overlays
- the executable model already assumes segmented protected-mode program structure

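
The header chain listed above can be re-checked mechanically. The following is a minimal sketch, not the project's tooling. It assumes the standard MZ layout (`e_lfanew` stored as a little-endian dword at offset `0x3C`) and the standard NE layout (segment count stored as a word at offset `0x1C` of the NE header). The function name `probe_bound_ne` is hypothetical.

```python
import struct


def probe_bound_ne(image: bytes) -> dict:
    """Follow MZ -> NE and report where the NE header sits and its segment count."""
    if image[:2] != b"MZ":
        raise ValueError("outer header is not MZ")
    # e_lfanew: file offset of the inner (new-style) header, at 0x3C in the MZ header.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 2] != b"NE":
        raise ValueError("inner header is not NE")
    # ne_cseg: number of segment table entries, at 0x1C inside the NE header.
    (ne_cseg,) = struct.unpack_from("<H", image, e_lfanew + 0x1C)
    return {"e_lfanew": e_lfanew, "segment_count": ne_cseg}
```

Run against the real executable, a result of `e_lfanew == 0x36F70` and `segment_count == 145` would agree with the anchors above; any other result would mean either the sketch's layout assumptions or the recorded anchors need another look.
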
### 2. Phar Lap 286 DOS extender

Strong anchors from [docs/overview.md](docs/overview.md) and [docs/phar-lap-extender.md](docs/phar-lap-extender.md):

- executable is documented as using Phar Lap 286 DOS-Extender (`RUN286`)
- major code regions are extender/runtime segments rather than game logic segments
- named loader path includes `init_dos_extender`, `load_executable_image`, `apply_relocations`, and child-transfer helpers

Why it matters:

- the startup/runtime environment is part of the program contract, not an afterthought
- Track A reconstruction must preserve this loader/executable-model reality or replace it deliberately

### 3. Runtime-patched far-call model

Strong anchors from [docs/overview.md](docs/overview.md):

- unresolved inter-segment and external calls appear in raw import as `CALLF 0000:ffff`
- those are NE loader fixup placeholders, not calls into a single dispatcher
- the repaired raw import has already patched `8851` internal literal far-call sites to real targets

Why it matters:

- far-call provenance is real ABI evidence
- any future source lift has to preserve which edges are ordinary local methods versus loader-significant far calls/imports

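
To make the placeholder pattern concrete, here is a rough sketch of how candidate sites could be located in one segment's raw bytes. It assumes the placeholder is encoded as the direct far-call opcode `0x9A` followed by offset `0xFFFF` and segment `0x0000` (bytes `9A FF FF 00 00`). A plain byte scan can also match data or the middle of another instruction, so hits are candidates to cross-check against the NE relocation records, not confirmed call sites. The function name is hypothetical.

```python
# Assumed encoding of `CALLF 0000:ffff`: opcode 9A, offset FFFF, segment 0000.
PLACEHOLDER = bytes([0x9A, 0xFF, 0xFF, 0x00, 0x00])


def find_far_call_placeholders(segment_bytes: bytes) -> list[int]:
    """Return offsets within the segment where the placeholder byte pattern occurs."""
    offsets = []
    pos = segment_bytes.find(PLACEHOLDER)
    while pos != -1:
        offsets.append(pos)
        pos = segment_bytes.find(PLACEHOLDER, pos + 1)
    return offsets
```
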
## Runtime / CRT Fingerprints

### 1. Phar Lap runtime strings

Strong anchors from [docs/phar-lap-extender.md](docs/phar-lap-extender.md):

- `13fc:0016` = `$Id: comhighc.c 1.1 91/08/06...`
- `13fc:0048` = `$Id: comutils.c 1.1 91/08/06...`
- `1760:665c` = `Copyright (C) 1986-93 Phar Lap Software, Inc.`
- `1760:73da` = `-LDTSIZE 4096 -EXTHIGH D0_0000h -NI 18 -ISTKSIZE 3`

Current safest interpretation:

- Phar Lap runtime/source provenance is directly embedded in the binary
- `comhighc.c` is the strongest current fingerprint tying part of the runtime story to High C-related runtime material

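
Version-control ident strings of the `$Id: ...` form shown above are easy to sweep for in bulk. The sketch below is one possible way to do that and is not taken from the project. It assumes each string starts with `$Id: ` and runs until the next `$` or NUL byte. The function name is hypothetical, and the reported offsets are relative to whatever blob is passed in, not `SSSS:OOOO` addresses.

```python
import re

# Capture everything after "$Id: " up to the next "$" or NUL terminator.
RCS_ID = re.compile(rb"\$Id: ([^\x00$]+)")


def find_rcs_ids(blob: bytes) -> list[tuple[int, str]]:
    """Return (offset, ident text) pairs for every $Id: string found in the blob."""
    return [
        (m.start(), m.group(1).decode("ascii", "replace").strip())
        for m in RCS_ID.finditer(blob)
    ]
```
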
### 2. Protected-mode service and memory helpers

Strong anchors from [docs/phar-lap-extender.md](docs/phar-lap-extender.md):

- DPMI/interrupt wrappers in segment `1339`
- EMS management in segment `1677`
- task switching and child-process execution paths in `10da` and `1760`

Why it matters:

- the executable depends on a real protected-mode runtime layer with memory and interrupt service expectations
- this makes "modern compiler output that merely compiles" a weak reconstruction milestone by itself

## Binary-Structure Fingerprints That Affect Source Emission

### 1. Segmented address layout is visible throughout analysis

Strong anchors from [docs/overview.md](docs/overview.md):

- raw address model uses `SSSS:OOOO`
- game code begins only after the Phar Lap loader region
- notes repeatedly distinguish extender segments from NE gameplay segments

Implication:

- source that immediately collapses every pointer and call edge into one flat host model loses verified structure too early

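
One way to avoid collapsing addresses too early is to carry the `SSSS:OOOO` pair as a value of its own in analysis tooling. The sketch below is an assumption about how that might look, with a hypothetical `SegOff` type. It deliberately offers no conversion to a linear address, since in a protected-mode image the segment half cannot be assumed to be a real-mode paragraph number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegOff:
    """A segment:offset pair kept as two separate 16-bit values."""
    segment: int
    offset: int

    @classmethod
    def parse(cls, text: str) -> "SegOff":
        # Accepts the SSSS:OOOO hex notation used throughout the notes.
        seg, off = text.split(":")
        return cls(int(seg, 16), int(off, 16))

    def __str__(self) -> str:
        return f"{self.segment:04x}:{self.offset:04x}"
```

For example, `SegOff.parse("13fc:0016")` keeps segment `0x13fc` and offset `0x0016` apart and formats back to the same text.
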
### 2. Loader-sensitive call repair already affects function understanding

Strong anchors from [docs/overview.md](docs/overview.md):

- callsites had to be repaired before large parts of the raw import became meaningful
- inter-segment and external targets are encoded through relocation records, not fixed immediate addresses in the raw bytes

Implication:

- future class lifting should preserve import/far-call comments or metadata, especially for methods that only look local after fixup repair

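
The metadata this implication asks for could be as small as a tagged call-edge record. The following is a sketch under assumptions: the `CallKind` categories, the `CallEdge` fields, and the `repaired_from_placeholder` flag are all hypothetical names chosen for illustration, not an existing schema in the project.

```python
from dataclasses import dataclass
from enum import Enum


class CallKind(Enum):
    LOCAL_NEAR = "local-near"                  # ordinary call inside one segment
    INTERNAL_FAR_FIXUP = "internal-far-fixup"  # inter-segment call resolved by a relocation
    EXTERNAL_IMPORT = "external-import"        # call resolved to an imported target


@dataclass(frozen=True)
class CallEdge:
    caller: str
    callee: str
    kind: CallKind
    # True when the site was a fixup placeholder before call repair.
    repaired_from_placeholder: bool = False


def loader_significant(edges: list[CallEdge]) -> list[CallEdge]:
    """Keep only the edges whose target depended on loader fixups or imports."""
    return [e for e in edges if e.kind is not CallKind.LOCAL_NEAR]
```

Keeping the flag separate from the kind lets a method that only looks local after repair stay distinguishable from one that was local in the raw bytes.
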
## Working Compiler Story: What Is Safe And What Is Not

### Safe now

- Phar Lap 286 protected-mode DOS environment is real
- NE segmented executable model is real
- runtime strings directly reference `comhighc.c` and `comutils.c`
- the broad toolchain story includes Phar Lap runtime material and High C-related runtime evidence

### Not safe to claim as closed yet

- exact original compiler version for every module
- exact linker flags for the game NE image
- exact near/far defaults and calling-convention flags used by all gameplay modules
- exact rebuild recipe needed for a compatible historical executable

## Evidence Strength By Question

### Question: does segmented ABI discipline matter?

Answer:

- yes, strongly supported

Why:

- NE format, loader-patched far calls, and segment-separated code organization all point the same way

### Question: is a High C-related runtime story real or speculative?

Answer:

- real at the runtime-fingerprint level, still incomplete at the full-build-chain level

Why:

- `comhighc.c` string evidence is concrete
- full per-module compiler attribution is not yet closed

### Question: can Track A and Track B still share the same early source work?

Answer:

- yes, if early source keeps exact widths, packing, far-call provenance, and segmented-pointer placeholders visible

## What This Means For Future Real Work

When MCP class tools are ready or when hand-written skeletons start, these fingerprints should drive the rules:

1. keep exact-width aliases mandatory
2. keep packing explicit
3. keep segmented-pointer or far-pointer placeholders available
4. keep calling-convention markers visible even when still provisional
5. keep far-call/import provenance attached to lifted methods where it matters

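
Rules 1 to 3 can be illustrated with a small layout check. This is a sketch only: `ExampleRecord` and its fields are invented for illustration and do not describe any recovered game structure. It assumes little-endian storage and a 16:16 far pointer stored offset first, then segment. The `<` prefix in the `struct` format string selects standard sizes with no implicit padding, which makes both the widths and the packing explicit.

```python
import struct
from dataclasses import dataclass

# u8 kind, far pointer (u16 offset, u16 segment), u16 count: 7 bytes, no padding.
RECORD = struct.Struct("<BHHH")


@dataclass(frozen=True)
class FarPtr16:
    """Placeholder for a 16:16 far pointer; kept as two fields, never flattened."""
    segment: int
    offset: int


@dataclass(frozen=True)
class ExampleRecord:
    kind: int        # u8
    owner: FarPtr16  # far pointer placeholder
    count: int       # u16

    def pack(self) -> bytes:
        return RECORD.pack(self.kind, self.owner.offset, self.owner.segment, self.count)

    @classmethod
    def unpack(cls, raw: bytes) -> "ExampleRecord":
        kind, off, seg, count = RECORD.unpack(raw)
        return cls(kind, FarPtr16(seg, off), count)
```

A host compiler left to its own alignment defaults would typically pad a record like this to 8 bytes, which is exactly the kind of silent layout drift the rules are meant to prevent.
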
## Highest-Value Remaining Fingerprint Questions

1. collect more direct CRT/helper signatures that distinguish Phar Lap runtime pieces from gameplay-generated code
2. identify which recovered object families most clearly cross near/far ownership boundaries
3. isolate functions whose call shape strongly suggests non-default calling conventions
4. determine the smallest rebuild slice that can test layout and call discipline before whole-program ambitions

## Bottom Line

The current toolchain story is strong enough to justify ABI-conservative source emission rules.

The safe working model remains: Phar Lap protected-mode DOS, bound `MZ -> NE` executable, loader-patched far-call environment, and a real High C-related runtime fingerprint that is informative but not yet the entire historical build recipe.