deepened understanding

This commit is contained in:
Maddo 2026-04-05 18:27:09 +02:00
commit 73931629ae
32 changed files with 5011 additions and 259 deletions

484
docs/combat-dat.md Normal file

@ -0,0 +1,484 @@
# COMBAT.DAT
## Scope
This note documents the shipped `COMBAT.DAT` used by the Crusader builds in this workspace.
Verified corpus facts:
- `STATIC/COMBAT.DAT`, `STATIC_1.01/COMBAT.DAT`, `STATIC_DEMO/COMBAT.DAT`, `STATIC_JP/COMBAT.DAT`, `STATIC_REGRET/COMBAT.DAT`, and `STATIC_REGRET_DEMO/COMBAT.DAT` are byte-identical.
- All six files are `1734` bytes and share SHA-256 `c6097a721141a2e66ec6d0cd578427305f28ae9efe8a206bd5cf946ce2075faf`.
- ScummVM's Crusader support code and the live `CRUSADER.EXE` database agree that this file drives NPC ranged-combat tactics through a small bytecode interpreter.
This note uses three evidence sources together:
- direct binary parsing of the shipped `COMBAT.DAT`
- live `CRUSADER.EXE` function/type names already present in the workspace export map
- ScummVM Crusader-source behavior as a readable reference model for the tactic interpreter
## High-Level Findings
- The file is a small FLEX-style archive with `14` populated tactic records.
- Each tactic record has a zero-padded `16`-byte name, followed by four `uint16` block offsets, followed by bytecode.
- In the shipped data, only block `0` and block `1` matter. Block `0` acts like an entry/setup phase. Block `1` acts like the steady-state loop. The third and fourth offset slots are present but only contain near-EOF placeholder values in this file.
- The tactic bytecode is not just a name table. It is real AI scripting: face target, move, pathfind, test line-of-fire, loop, jump, sleep, and switch tactic/block.
- Several tactics are waypoint-driven. ScummVM identifies opcode `0xA6` as a search for nearby shape `0x33A` marker objects by frame number; that matches the naming of `OneToTwo`, `ThreeToFour`, `EggHopper1`, and `123_Shoot`.
- There is one notable model discrepancy to keep in mind: the shipped archive contains a valid entry `0` named `Dumb`, but ScummVM still special-cases tactic number `0` as a generic built-in attack path rather than always interpreting `COMBAT.DAT` entry `0`. Treat that as an open compatibility detail rather than silently preferring one model.
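The `0xA6` marker search can be pictured as a nearest-match filter over nearby items. This is a minimal sketch under stated assumptions:
- `WorldItem` is a hypothetical flattened item view, not an engine type.
- The caller supplies `maxRange`. The real search radius and item query are not documented in this note.
- On an exact tie in distance, the first match wins. This is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical flattened view of nearby world items (not a real engine type).
struct WorldItem {
    uint16_t shape;
    uint16_t frame;
    int32_t x;
    int32_t y;
};

constexpr uint16_t kCombatMarkerShape = 0x33A;

// Index of the nearest shape-0x33A marker whose frame equals `wantedFrame`,
// limited to `maxRange` world units (assumed); std::nullopt if none qualifies.
std::optional<std::size_t> findCombatMarker(const std::vector<WorldItem>& items,
                                            int32_t npcX, int32_t npcY,
                                            uint16_t wantedFrame, int32_t maxRange) {
    assert(maxRange >= 0);
    const int64_t maxDist2 = int64_t(maxRange) * maxRange;
    std::optional<std::size_t> best;
    int64_t bestDist2 = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        const WorldItem& it = items[i];
        if (it.shape != kCombatMarkerShape || it.frame != wantedFrame) continue;
        const int64_t dx = int64_t(it.x) - npcX;
        const int64_t dy = int64_t(it.y) - npcY;
        const int64_t d2 = dx * dx + dy * dy;
        if (d2 > maxDist2) continue;    // outside the assumed search radius
        if (!best || d2 < bestDist2) {  // strict '<' keeps the first of equally near markers
            best = i;
            bestDist2 = d2;
        }
    }
    return best;
}
```

Under this model, `OneToTwo` would query frame `1` in block 0 and frame `2` in block 1. `EggHopper1` would walk frames `0` through `3` in order.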
## Archive Layout
### Outer Container
Observed on-disk structure:
- file size: `0x06c6`
- first populated record starts at `0x0280`
- archive directory begins at `0x0080`
- directory entries are `8` bytes each: `<uint32 offset, uint32 length>`
- the shipped file leaves most directory slots empty and populates only `14` entries
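A directory scan matching the layout above could look like this. It is a sketch with these assumptions:
- The `uint32` fields are little-endian, following the x86 convention.
- A slot is empty when either its offset or its length is zero.
- The caller passes in the number of slots to scan, because this note does not pin down the directory's slot count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FlexEntry {
    std::size_t slot;  // directory slot index
    uint32_t offset;   // absolute file offset of the record
    uint32_t length;   // record length in bytes
};

constexpr std::size_t kDirectoryStart = 0x80;
constexpr std::size_t kDirectoryEntrySize = 8;

// Little-endian uint32 read (assumed byte order).
uint32_t readU32LE(const std::vector<uint8_t>& buf, std::size_t at) {
    assert(at + 4 <= buf.size());
    return uint32_t(buf[at]) | uint32_t(buf[at + 1]) << 8 |
           uint32_t(buf[at + 2]) << 16 | uint32_t(buf[at + 3]) << 24;
}

// Scan `slotCount` <offset, length> pairs starting at 0x80 and keep the populated ones.
std::vector<FlexEntry> readFlexDirectory(const std::vector<uint8_t>& file, std::size_t slotCount) {
    std::vector<FlexEntry> entries;
    for (std::size_t slot = 0; slot < slotCount; ++slot) {
        const std::size_t at = kDirectoryStart + slot * kDirectoryEntrySize;
        if (at + kDirectoryEntrySize > file.size()) break;  // truncated directory
        const uint32_t offset = readU32LE(file, at);
        const uint32_t length = readU32LE(file, at + 4);
        if (offset != 0 && length != 0) entries.push_back({slot, offset, length});
    }
    return entries;
}
```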
Populated record table:
| Index | File offset | Length | Name |
|------:|------------:|-------:|------|
| 0 | `0x0280` | `0x004f` | `Dumb` |
| 1 | `0x02cf` | `0x003f` | `Pivot` |
| 2 | `0x030e` | `0x004f` | `Advance` |
| 3 | `0x035d` | `0x0047` | `Mental` |
| 4 | `0x03a4` | `0x0052` | `Careful` |
| 5 | `0x03f6` | `0x004c` | `OneToTwo` |
| 6 | `0x0442` | `0x004c` | `ThreeToFour` |
| 7 | `0x048e` | `0x0074` | `EggHopper1` |
| 8 | `0x0502` | `0x0056` | `StepOutShootNE` |
| 9 | `0x0558` | `0x005c` | `StepOutShootNW` |
| 10 | `0x05b4` | `0x0046` | `Random_CHAOS` |
| 11 | `0x05fa` | `0x003b` | `Static_Chaos` |
| 12 | `0x0635` | `0x0043` | `Stand_Choas` |
| 13 | `0x0678` | `0x004e` | `123_Shoot` |
The shipped names preserve one misspelling and one casing inconsistency:
- `Stand_Choas` (misspelling of "Chaos")
- `Random_CHAOS` vs. `Static_Chaos` (mixed casing)
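The populated-record table is internally consistent:
- Each record begins exactly where the previous one ends.
- The last record ends at `0x06c6`, which is the full 1734-byte file size. So there is no gap or trailing padding after the first record.

A compile-time check using only the values copied from the table:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct RecordSpan {
    uint16_t offset;
    uint16_t length;
};

// Offsets and lengths copied from the populated-record table (indices 0..13).
constexpr std::array<RecordSpan, 14> kCombatDatRecords{{
    {0x0280, 0x004f}, {0x02cf, 0x003f}, {0x030e, 0x004f}, {0x035d, 0x0047},
    {0x03a4, 0x0052}, {0x03f6, 0x004c}, {0x0442, 0x004c}, {0x048e, 0x0074},
    {0x0502, 0x0056}, {0x0558, 0x005c}, {0x05b4, 0x0046}, {0x05fa, 0x003b},
    {0x0635, 0x0043}, {0x0678, 0x004e},
}};

// True if every record starts exactly where the previous one ends.
constexpr bool recordsAreContiguous() {
    for (std::size_t i = 1; i < kCombatDatRecords.size(); ++i) {
        const RecordSpan& prev = kCombatDatRecords[i - 1];
        if (prev.offset + prev.length != kCombatDatRecords[i].offset) return false;
    }
    return true;
}

constexpr uint32_t endOfLastRecord() {
    return uint32_t(kCombatDatRecords.back().offset) + kCombatDatRecords.back().length;
}

static_assert(recordsAreContiguous(), "records should be packed back to back");
static_assert(endOfLastRecord() == 0x06c6, "last record should end at the 1734-byte file size");
```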
### Inner Record Format
Per record:
| Offset | Size | Meaning |
|-------:|-----:|---------|
| `0x00` | `16` | ASCII tactic name, NUL-padded |
| `0x10` | `2` | block 0 start offset |
| `0x12` | `2` | block 1 start offset |
| `0x14` | `2` | block 2 start offset |
| `0x16` | `2` | block 3 start offset |
| `0x18+` | variable | bytecode stream |
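A header parser for this per-record layout is sketched below, with these assumptions:
- The offset words are little-endian.
- The parser returns the four offsets raw, because this note does not settle which base they are relative to.
- The test bytes are synthetic. They only mirror the common `0x2c`/`0x31`/`0x4d`/`0x4e` pattern described next.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct TacticHeader {
    std::string name;                      // up to 16 ASCII bytes, NUL padding stripped
    std::array<uint16_t, 4> blockOffsets;  // words at +0x10, +0x12, +0x14, +0x16
};

constexpr std::size_t kNameSize = 16;
constexpr std::size_t kHeaderSize = 0x18;

// Parse the fixed header of one tactic record. Little-endian words are assumed.
TacticHeader parseTacticHeader(const std::vector<uint8_t>& rec) {
    assert(rec.size() >= kHeaderSize);
    TacticHeader h{};
    std::size_t len = 0;
    while (len < kNameSize && rec[len] != 0) ++len;  // stop at the first NUL
    h.name.assign(rec.begin(), rec.begin() + static_cast<std::ptrdiff_t>(len));
    for (std::size_t i = 0; i < h.blockOffsets.size(); ++i) {
        const std::size_t at = kNameSize + 2 * i;
        h.blockOffsets[i] = uint16_t(rec[at] | rec[at + 1] << 8);
    }
    return h;
}
```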
Observed block-offset pattern:
- Most tactics use block 0 at `0x002c` and block 1 at `0x0031`.
- The waypoint tactics `OneToTwo`, `ThreeToFour`, and `123_Shoot` have a longer setup block and therefore move block 1 farther forward.
- The last two offset slots usually hold `0x004d` and `0x004e`, even where those values point at the final sentinel bytes rather than at real, independent blocks.
ScummVM's comment in `combat_dat.h` says the format has `10x2-byte offsets`, but both its constructor and the live executable logic only use four offsets. The shipped file matches the implementation, not the stale comment.
## Live Executable Integration
The live `CRUSADER.EXE` export map already carries the important attack-process fields:
- `combatDatTacticPtr` at attack-process offset `0x45`
- `combatDatTacticPtr2` at `0x49`
- `combatDatTacticCurOffset` at `0x4d`
- `combatDatBlockNo` at `0x4f`
- `tacticNo` at `0x51`
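These offsets are consistent with a packed run of two 4-byte far pointers followed by three 16-bit words. A hedged C++ view of just that slice is below. The following are assumptions:
- `FarPtr16` uses the usual x86 in-memory order: offset word, then segment.
- `unknown_00` is opaque padding for the earlier process fields.
- `tacticNo` is 16 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 16-bit x86 far pointer as stored in memory: offset word first, then segment word.
struct FarPtr16 {
    uint16_t offset;
    uint16_t segment;
};

#pragma pack(push, 1)
struct AttackProcessTacticView {
    uint8_t  unknown_00[0x45];          // earlier attack-process fields, not modeled here
    FarPtr16 combatDatTacticPtr;        // +0x45
    FarPtr16 combatDatTacticPtr2;       // +0x49
    uint16_t combatDatTacticCurOffset;  // +0x4d
    uint16_t combatDatBlockNo;          // +0x4f
    uint16_t tacticNo;                  // +0x51 (16-bit width assumed)
};
#pragma pack(pop)

static_assert(offsetof(AttackProcessTacticView, combatDatTacticPtr) == 0x45, "");
static_assert(offsetof(AttackProcessTacticView, combatDatTacticPtr2) == 0x49, "");
static_assert(offsetof(AttackProcessTacticView, combatDatTacticCurOffset) == 0x4d, "");
static_assert(offsetof(AttackProcessTacticView, combatDatBlockNo) == 0x4f, "");
static_assert(offsetof(AttackProcessTacticView, tacticNo) == 0x51, "");
```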
Relevant live helpers:
- `1108:0586` `Attack_SetupForTacticNo`
- `1108:0506` `Attack_SetupForBlockNo`
- `10e8:3572` `NPC_GetNPCTacticNo`
- `10e8:358c` `NPC_SetNPCTacticNo`
Current best read of the runtime flow:
1. NPC state stores a tactic number in `ItemNPCData.field21_0x5c`.
2. `Attack_SetupForTacticNo` validates that the tactic slot has loaded `COMBAT.DAT` data, stores the selected tactic number in the attack process, and copies the far pointer to that archive entry into the process.
3. `Attack_SetupForBlockNo` selects one of the record's four offset words and seeds `combatDatTacticCurOffset` from it.
4. The attack-process main loop reads one opcode byte at a time from the selected block and executes it as AI logic.
ScummVM's Crusader attack-process code adds one more important detail:
- `readNextWordWithData()` treats any immediate operand `>= 33000` as an index into a `10`-entry tactic-local variable array instead of as a literal value.
The shipped `COMBAT.DAT` in this workspace does not appear to use that variable indirection. All operands decoded here are direct literals.
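The operand rule can be sketched as a small resolver. The `33000` threshold and the 10-entry array come from the ScummVM reading above. The following are assumptions:
- The mapping `33000 -> slot 0`.
- The `int16_t` element type.
- Treating a slot past `9` as a bug.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint16_t kTacticVarBase = 33000;   // threshold quoted from ScummVM
using TacticVars = std::array<int16_t, 10>;  // element type is an assumption

// Literal operands pass through; operands >= 33000 select a tactic-local variable.
// Mapping 33000 -> slot 0 is an assumption; slot numbers past 9 are treated as a bug.
int32_t resolveOperand(uint16_t raw, const TacticVars& vars) {
    if (raw < kTacticVarBase) return raw;
    const std::size_t slot = raw - kTacticVarBase;
    assert(slot < vars.size());
    return vars[slot];
}
```

Per the observation above, every operand in the shipped file takes the literal branch.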
## Used Opcode Set
Only this subset is actually used by the shipped tactics:
| Opcode | Mnemonic | Meaning |
|------:|----------|---------|
| `0x84` | `set_target_objid` | Set attack target object id |
| `0x85` | `anim_walk` | Play walk/advance animation |
| `0x88` | `turn_left_90` | Turn `90` degrees left |
| `0x89` | `turn_right_90` | Turn `90` degrees right |
| `0x8a` | `fire_small_if_clear` | Fire if line-of-fire is valid |
| `0x8d` | `pathfind_home` | Pathfind to actor home position |
| `0x8f` | `pathfind_midpoint` | Pathfind to midpoint between actor and target |
| `0x93` | `sleep_scaled` | Sleep for N ticks, scaled by difficulty |
| `0x94` | `loiter` | Run a loiter sub-process for N ticks |
| `0x95` | `face_target` | Turn toward target center |
| `0x9a` | `jump_if_dist_lt_481` | Jump if target distance is under `481` |
| `0x9c` | `jump_if_shot_blocked` | Jump if `fireDistance()` fails |
| `0x9d` | `jump_if_shot_clear` | Jump if `fireDistance()` succeeds |
| `0x9f` | `loop_begin` | Set loop counter and remember current stream position |
| `0xa6` | `pathfind_marker_frame` | Find nearby shape `0x33a` marker by frame and pathfind to it |
| `0xa9` | `face_east` | Turn east |
| `0xaa` | `face_west` | Turn west |
| `0xc0` | `jump` | Unconditional jump to stream offset |
| `0xc1` | `loop_end` | Decrement loop counter and jump back while nonzero |
| `0xff` | `flip_to_block1_restart` | If still in block 0, switch to block 1; then restart current block |
Practical interpretation of `0xff` in shipped data:
- In block `0`, it is a one-way handoff into block `1`.
- In block `1`, it restarts block `1`, so the steady-state loop repeats forever.
That is why most tactics have a very short block `0`: setup once, then live in block `1`.
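The interaction of `loop_begin`/`loop_end` with `0xff` can be checked with a toy interpreter. This is a model, not a decoder:
- Instructions are pre-decoded into a hypothetical `Insn` list, and byte-level operand widths are ignored.
- Only four stand-in opcodes are modeled: `0x8a`, `0x9f`, `0xc1`, `0xff`.
- `Fire` skips the line-of-fire test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for 0x8a, 0x9f, 0xc1 and 0xff; operands are pre-decoded.
enum class Op { Fire, LoopBegin, LoopEnd, FlipRestart };

struct Insn {
    Op op;
    uint16_t arg;  // loop count for LoopBegin, unused otherwise
};

struct ToyTacticVm {
    std::vector<Insn> blocks[2];  // block 0 = setup, block 1 = steady-state loop
    int block = 0;
    std::size_t pc = 0;           // index into the current block
    uint16_t loopCount = 0;
    std::size_t loopPos = 0;
    int shotsFired = 0;

    void step() {
        const Insn& in = blocks[block].at(pc);
        switch (in.op) {
        case Op::Fire:
            ++shotsFired;  // line-of-fire test not modeled
            ++pc;
            break;
        case Op::LoopBegin:
            loopCount = in.arg;
            loopPos = pc + 1;  // remember the position just after loop_begin
            ++pc;
            break;
        case Op::LoopEnd:
            assert(loopCount > 0);
            pc = (--loopCount != 0) ? loopPos : pc + 1;  // jump back while nonzero
            break;
        case Op::FlipRestart:
            if (block == 0) block = 1;  // one-way handoff out of block 0
            pc = 0;                     // restart whichever block is now current
            break;
        }
    }
};
```

With a block 0 of `fire; 0xff` and a block 1 of `loop_begin 3; fire; loop_end; 0xff`, one pass through block 1 fires three times and returns to the start of block 1. Block 0 never runs again.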
## Tactic Catalog
### 0. `Dumb`
Role:
- simplest mobile shooter with midpoint fallback
Decoded behavior:
- block 0: target object `1`, face target, then hand off to block 1
- block 1:
- if a shot is already clear, jump straight into a `3`-iteration fire loop
- otherwise pathfind to the midpoint between NPC and target, face target, and re-test line-of-fire
- if still not clear after midpoint attempts, loiter briefly
- once clear, loop `3` times: face target, fire, sleep `30`
- restart block 1
Human read:
- close some distance toward a firing lane, then burst-fire three times
### 1. `Pivot`
Role:
- stationary pivot-and-shoot burst
Decoded behavior:
- block 0: target object `1`, face target, hand off
- block 1:
- if line-of-fire is clear, enter a `3`-shot loop immediately
- loop body: face target, fire, sleep `30`
- restart
Human read:
- no movement logic beyond facing the target; just turn and shoot in bursts
### 2. `Advance`
Role:
- move forward between bursts if a clear shot is available; midpoint fallback otherwise
Decoded behavior:
- block 0: target object `1`, face target, hand off
- block 1:
- face target
- if shot is blocked, jump to midpoint-reposition logic
- otherwise run a `3`-iteration loop: face target, fire, loop
- sleep `30`
- play walk animation
- restart
- midpoint branch: pathfind midpoint, face target, re-test for clear shot, loiter briefly if still blocked, restart
Human read:
- pressure forward when able to shoot, otherwise drift toward a midpoint lane until a shot opens
### 3. `Mental`
Role:
- shoot if clear; otherwise midpoint hop and shoot
Decoded behavior:
- block 0: target object `1`, face target, hand off
- block 1:
- if shot is blocked, pathfind to midpoint first
- run a `3`-iteration fire loop
- restart
Human read:
- simpler than `Advance`: it does not include the extra walk/sleep pressure cycle, only burst-fire with midpoint correction
### 4. `Careful`
Role:
- range-gated cautious shooter
Decoded behavior:
- block 0: target object `1`, face target, hand off
- block 1:
- if target distance is not under `481`, loiter and restart
- if within `481` and shot is clear, fire once and restart
- if blocked, face target and test again
- if still blocked, pathfind midpoint and test again
- if still blocked, face target and test again
- if still blocked after all of that, loiter briefly and restart
Human read:
- only engages at medium/close range and spends more effort finding a clean shot before moving or firing
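The `Careful` decision chain can be written as a per-pass function. It is a hypothetical model, with these assumptions:
- `shotClear` stands in for the four successive `fireDistance()` tests.
- When a retry comes back clear, the NPC is assumed to fire once. That is an inference from the "fire once" branch, not a decoded fact.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr int kCarefulRange = 481;  // threshold of the 0x9a distance test

// Hypothetical one-pass model of Careful's block 1. shotClear[i] is the result of
// the i-th line-of-fire test; a clear result on any retry is assumed to fire once.
std::vector<std::string> carefulPass(int targetDistance, const std::array<bool, 4>& shotClear) {
    std::vector<std::string> actions;
    if (!(targetDistance < kCarefulRange)) {
        actions.push_back("loiter");
        return actions;
    }
    // Repositioning steps tried between successive tests.
    const std::array<const char*, 3> retries = {"face_target", "pathfind_midpoint", "face_target"};
    for (std::size_t i = 0; i < shotClear.size(); ++i) {
        if (shotClear[i]) {
            actions.push_back("fire");
            return actions;
        }
        if (i < retries.size()) actions.push_back(retries[i]);
    }
    actions.push_back("loiter");  // still blocked after every retry
    return actions;
}
```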
### 5. `OneToTwo`
Role:
- shuttle between marker frame `1` and marker frame `2`
Decoded behavior:
- block 0:
- target object `1`
- pathfind to marker frame `1`
- face target
- hand off
- block 1:
- pathfind to marker frame `2`
- face target
- run a `3`-iteration fire loop while line-of-fire is clear
- return to marker frame `1`
- face target
- sleep `120`
- restart block 1
Human read:
- waypoint-based peek-and-return behavior across two authored combat markers
### 6. `ThreeToFour`
Role:
- same pattern as `OneToTwo`, but between marker frames `3` and `4`
Decoded behavior:
- block 0: target object `1`, face target, pathfind marker `3`, hand off
- block 1:
- pathfind marker `4`
- face target
- if shot is clear, run a `3`-shot burst loop
- return to marker `3`
- face target
- sleep `120`
- restart
Human read:
- authored two-point lateral or cover movement using a second pair of marker frames
### 7. `EggHopper1`
Role:
- multi-marker patrol shooter
Decoded behavior:
- block 0: target object `1`, face target, hand off
- block 1:
- pathfind marker `0`, attempt a `3`-shot loop
- pathfind marker `1`, attempt a `3`-shot loop
- pathfind marker `2`, attempt a `3`-shot loop
- pathfind marker `3`, attempt a `3`-shot loop
- pathfind home
- face target
- restart
Human read:
- sweep across four authored combat markers in order, trying a short firing burst at each stop, then return home
### 8. `StepOutShootNE`
Role:
- east-west step-out gunner, nominally biased to an east-facing start
Decoded behavior:
- block 0: target object `1`, face east, hand off
- block 1:
- walk outward
- face target
- up to `3` short fire attempts with `5`-tick delays
- face west
- walk back
- face target
- another `3` short fire attempts with `5`-tick delays
- if blocked, wait `25` ticks instead of firing through
- face east
- restart
Human read:
- pop out from one side, take a short burst, step back across the lane, burst again, and reset orientation
### 9. `StepOutShootNW`
Role:
- mirror-image of `StepOutShootNE`, nominally biased to a west-facing start
Decoded behavior:
- block 0: target object `1`, face west, hand off
- block 1:
- walk outward from the west-facing side
- face target and attempt short burst fire
- if blocked, delay `25`
- face east and walk across
- repeat the short-burst pattern
- face west and restart
Human read:
- same authored cover-pop logic as the northeast variant, but with opposite home orientation
### 10. `Random_CHAOS`
Role:
- immediate short burst, then aggressive midpoint pressure
Decoded behavior:
- block 0:
- target object `1`
- face target
- run a `2`-shot loop with `30`-tick sleeps
- hand off
- block 1:
- face target
- pathfind midpoint
- face target
- run a `3`-shot loop with `30`-tick sleeps
- restart block 1 forever
Human read:
- start with a short static burst, then keep pressing toward midpoint and firing
### 11. `Static_Chaos`
Role:
- pure stationary burst shooter
Decoded behavior:
- block 0: target object `1`, face target, hand off
- block 1:
- face target
- loop `3` times: fire, loop
- sleep `30`
- restart
Human read:
- no pathfinding at all; just keep facing and firing from the current spot
### 12. `Stand_Choas`
Role:
- stationary turret-like spread pattern
Decoded behavior:
- block 0: target object `1`, hand off
- block 1:
- face target
- if shot is clear, fire once and skip to the post-shot delay
- otherwise sweep in `90`-degree steps and fire at each stop: turn left, turn right (back to the original facing), then turn right again
- sleep `30`
- face target
- restart
Human read:
- a stand-and-sweep pattern for targets that are partially obstructed or moving across the arc
### 13. `123_Shoot`
Role:
- two marker hops followed by midpoint pressure
Decoded behavior:
- block 0:
- target object `1`
- face target
- pathfind marker `0`
- face target
- pathfind marker `1`
- face target
- run a `2`-shot loop
- hand off
- block 1:
- face target
- pathfind midpoint
- run a `3`-shot loop
- restart
Human read:
- staged opening movement across authored markers, then transition into a more ordinary midpoint-pressure gunner
## Pattern Summary
Across the whole file, the tactics cluster into a few clear families:
| Family | Tactics | Shared idea |
|--------|---------|-------------|
| stationary shooters | `Pivot`, `Static_Chaos`, `Stand_Choas` | little or no movement; rely on facing and burst loops |
| midpoint pressers | `Dumb`, `Advance`, `Mental`, `Random_CHAOS`, `123_Shoot` block 1 | move toward a midpoint lane when a shot is blocked |
| cautious/range-gated | `Careful` | only engage inside a distance window and avoid overcommitting |
| authored marker shuttles | `OneToTwo`, `ThreeToFour`, `EggHopper1`, `123_Shoot` block 0 | follow placed map markers keyed by frame number |
| step-out cover shooters | `StepOutShootNE`, `StepOutShootNW` | walk out, burst, walk back, repeat |
This is enough to treat `COMBAT.DAT` as a compact authored AI-script table rather than a loose name list.
## Open Questions
1. Tactic `0` is the main remaining semantic mismatch. The archive contains `Dumb` at index `0`, while ScummVM still routes `_tactic == 0` through `genericAttack()`. The live executable-side helper accepts tactic `0` as a normal `COMBAT.DAT` slot, so this needs a later direct compiled-side pass if exact retail precedence matters.
2. The later two per-record offset slots are structurally present but operationally unimportant in this file. A future pass could still check whether any retail code path ever selects block `2` or `3`.
3. The marker-search opcode `0xA6` is strongly understood from ScummVM and the tactic names, but the live-game name of shape `0x33A` and the exact authored map placement conventions remain better documented on the map-data side than in the live NE database.
## Practical RE Use
For future compiled-side work, the main safe takeaways are:
- `tacticNo` in NPC data is not cosmetic; it selects a real bytecode program.
- block `0` is usually an initialization or reposition phase; block `1` is the stable loop.
- the named tactics are portable data labels because the shipped file is identical across the local Remorse/Regret variants.
- waypoint-driven tactics should be interpreted together with local marker placements, not only from executable code.


@ -0,0 +1,170 @@
# Entity Class Family Split
## Purpose
This note breaks the large seg001 `Entity` lane into a conservative class-family model that can later be promoted in Ghidra or emitted as C++ without pretending that every vtable and helper belongs to one monolithic base class.
The current goal is not full inheritance recovery; it is to identify the safest boundaries between:
- one shared gameplay-entity core layout
- projectile-specific allocation and movement behavior
- debris/corpse variants
- registry/helper surfaces that look adjacent but should not be merged automatically
## Core Shared Entity Object
The strongest current common object is the gameplay entity body documented in [docs/ne-segment1.md](docs/ne-segment1.md).
Stable shared anchors:
- `0007:3f2f entity_spawn`
- `0007:40d4 entity_remove`
- `0007:4552 entity_set_position`
- `0007:4591 entity_try_place`
- `0007:5092 entity_deactivate`
Stable shared fields from the current note set include:
- `+0x00` vtable pointer
- `+0x02` slot index
- `+0x04` entity type
- `+0x19/+0x1a` flags
- `+0x3c` sprite handle
- `+0x45/+0x47/+0x49` world position
- `+0x4f/+0x51/+0x53` base position
- `+0x54/+0x56/+0x58` previous position
This is the safest current candidate for a future `Entity` or `ActorBase` style root.
## Candidate Split
### 1. `Entity` base gameplay family
Best current scope:
- allocation/spawn and placement
- common position, flags, facing, and sprite ownership
- generic remove/deactivate behavior
- registry-facing slot identity
Best current vtable anchor:
- generic/AI entity vtable `0x29aa`
Current caution:
- this family likely includes several behaviorally different actors, but the verified note set still supports one shared base before the split gets more specific.
### 2. `ShotEntity` or projectile-derived family
Strong anchors:
- `0007:28ce shot_entity_alloc`
- `0007:44a9 shot_entity_free`
- `0007:4659 projectile_init_vector`
- `0007:4b78 projectile_check_hit`
- `0007:4c2e projectile_step_update`
- `0007:4d28 projectile_trace_ray`
- `0007:51ad projectile_update_tick`
- `0007:5a99 projectile_apply_hit`
Best current distinct evidence:
- dedicated vtable `0x297e`
- extra projectile ownership/target fields through `+0x6a..+0xbd`
- separate shot sprite handle at `+0x3f`
- dedicated cleanup path in `shot_entity_free`
Current safest interpretation:
- projectile objects are not just one mode of the generic entity vtable; they deserve at least one derived-family model with their own ctor/free/update surface.
### 3. `DebrisEntity` family
Strong anchors:
- `0007:7490 debris_spawn`
- `0007:75ff entity_die`
Best current distinct evidence:
- dedicated debris/fragment vtable `0x2a57`
- corpse/debris-adjacent vtable pair `0x2a1a` and `0x2a33`
- death-spawn path uses separate velocity/facing behavior rather than only the generic entity update lane
Current safest interpretation:
- debris should stay separate from `ShotEntity`; both share movement-style fields but not the same lifecycle intent.
### 4. `CorpseEntity` or actor-remnant family
Strong current evidence:
- vtable pair `0x2a1a` and `0x2a33`
- adjacency to `entity_die` and debris-spawn behavior
Current caution:
- the notes support a corpse/remnant family, but not yet a crisp split between static remains, actor corpse, and debris fragments.
- keep this as a provisional derived branch until a dedicated caller pass closes object lifetime more tightly.
### 5. Adjacent but probably separate: dialog/menu object lane
Anchor:
- `0007:2c92 dialog_spawn`
Why it should stay separate:
- vtable `0x28b5`
- callback registration at `0x39ca`
- behavior is UI/dialog-style rather than ordinary gameplay entity movement or projectile logic
Current safest interpretation:
- this object is near the entity notes because it lives in the same broad segment lane, but it should not be promoted under the core gameplay `Entity` family automatically.
## Registry And Helper Surfaces That Should Not Be Mis-modeled
### Entity registry vtable `0x2969`
Current evidence in [docs/ne-segment1.md](docs/ne-segment1.md) shows `0x2969` stored at `0x39ca + slot*4` as a registry vtable rather than as the entity instance's primary vtable.
That means:
- do not treat `0x2969` as a normal `Entity` virtual table
- keep registry or handle-table behavior separate from per-instance inheritance
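For triage scripts, the vtable anchors in this note can be turned into a provisional lookup that keeps `0x2969` out of the instance families. The family labels are this note's working names, not recovered class names. The pair `0x2a1a`/`0x2a33` is filed under the corpse-like branch, although it is also listed as debris-adjacent.

```cpp
#include <cassert>
#include <cstdint>

// Provisional family labels taken from the vtable anchors in this note.
enum class SegFamily { Entity, ShotEntity, Debris, CorpseLike, DialogMenu, RegistryNotInstance, Unknown };

constexpr SegFamily classifyVtable(uint16_t vtbl) {
    switch (vtbl) {
    case 0x29aa: return SegFamily::Entity;
    case 0x297e: return SegFamily::ShotEntity;
    case 0x2a57: return SegFamily::Debris;
    case 0x2a1a:
    case 0x2a33: return SegFamily::CorpseLike;           // also described as debris-adjacent
    case 0x28b5: return SegFamily::DialogMenu;
    case 0x2969: return SegFamily::RegistryNotInstance;  // registry vtable, not a per-instance one
    default:     return SegFamily::Unknown;
    }
}

// Address of the registry entry for `slot`: 0x39ca + slot * 4.
constexpr uint16_t registrySlotAddress(uint16_t slot) {
    return uint16_t(0x39ca + slot * 4u);
}
```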
### Pure helpers that should remain free functions for now
Examples:
- `snap_entity_to_ground`
- `spawn_entity_checked`
- `map_find_spawn_point`
- `actor_find_in_view`
These may operate on entities or produce entities, but current evidence still reads better as subsystem helpers than as obvious instance methods.
## Recommended Promotion Order
1. model the shared `Entity` base layout first
2. split `ShotEntity` next because its ctor/free/update lane is strongest
3. split debris/corpse branches only after one more caller-side lifetime pass
4. leave dialog/menu object modeling separate from the entity inheritance tree
## Source-Emission Guidance
If this family is emitted to provisional C++ later, the safest first skeleton is:
- one `Entity` base struct/class with the stable common layout
- one `ShotEntity` derived placeholder
- one `DebrisEntity` derived placeholder
- one unresolved `CorpseLikeEntity` placeholder if needed
- separate `DialogMenuObject` class rather than folding it into the gameplay entity tree
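A provisional C++ outline of the emission skeleton is below. It is a naming-and-ownership sketch only:
- The class names come from this note.
- The two members mirror the `+0x02`/`+0x04` fields.
- Compiler-generated vptr placement means it does not reproduce the 16-bit binary layout.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Provisional emission skeleton; not layout-compatible with the 16-bit objects.
struct Entity {                       // generic/AI vtable anchor 0x29aa
    virtual ~Entity() = default;
    uint16_t slotIndex = 0;           // mirrors +0x02
    uint16_t entityType = 0;          // mirrors +0x04
};

struct ShotEntity : Entity {};        // vtable 0x297e; own alloc/free/update lane
struct DebrisEntity : Entity {};      // vtable 0x2a57
struct CorpseLikeEntity : Entity {};  // vtables 0x2a1a/0x2a33, still provisional

// Kept outside the gameplay tree on purpose (vtable 0x28b5).
struct DialogMenuObject {
    virtual ~DialogMenuObject() = default;
};

static_assert(!std::is_base_of<Entity, DialogMenuObject>::value, "dialog stays separate");
```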
## Bottom Line
The current evidence strongly supports a shared gameplay entity core, but it does not support flattening generic actor, projectile, debris, corpse, and dialog/menu behavior into one class.
The right near-term move is `base first, derived families second, adjacent objects separate`.


@ -0,0 +1,244 @@
# EntityDispatchEntry Class Layout
## Purpose
This note is the first focused class-layout working paper for the Remorse C++ lift.
It takes the broad `EntityDispatchEntry*` inventory entry and narrows it into a base/derived object model that can later be pushed into Ghidra as class namespaces, instance structs, vtable structs, and method ownership.
The goal is not to claim a final C++ API. The goal is to lock down the pieces that are already stable enough to support later implementation work.
## Why This Family Goes First
`EntityDispatchEntry` is the strongest current pilot family because it already has:
- a clear constructor-style base init path
- multiple derived constructor variants
- explicit owned state and word-list teardown
- stable field groups with known offsets
- repeated virtual-slot dispatch through known offsets
- strong caller evidence across scheduler, runtime-state, palette, and startup/display lanes
That makes it the best place to prototype the full later workflow:
- class namespace creation
- method ownership
- instance-struct typing
- vtable typing
- base/derived split
- later C++ skeleton emission
## Candidate High-Level Model
Current best working split:
- `EntityDispatchEntryBase`
- `EntityDispatchEntryTimed` or `EntityDispatchEntryPeriodic` for the `0x3aa6` timing/period variant
- `EntityDispatchEntryRuntimeState` for the later `000d:7e00/8078` runtime-state owned-buffer family
This should stay a working model, not a hard rename, until the class work lands in Ghidra.
## Base Constructor Surface
### `0008:ba00` `entity_dispatch_entry_init`
Current best read:
- optional allocate/init path for a `0x32`-byte base object
- stamps base vtable/list-link state using `0x3b06`, `0x2d10`, and `0x3afe`
- zeroes core state fields
- seeds the group/layer byte through `entity_set_group_id`
This is the strongest current candidate for the base constructor-style init method.
### Derived constructor variants
#### `0008:cefb` `entity_dispatch_entry_ctor_vtbl_3ad2`
- allocates if null
- reinitializes through `entity_dispatch_entry_init`
- sets vtable `0x3ad2`
- sets flag `0x100` at `+0x16`
- zeroes extension words `+0x32/+0x34`
#### `0008:d214` `entity_dispatch_entry_ctor_vtbl_3aa6`
- allocates `0x40` bytes if null
- reuses `0008:cefb`
- sets vtable `0x3aa6`
- sets flag `0x200` at `+0x16`
- zeroes fields `+0x38..+0x3e`
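The constructor chain can be modeled on a byte image to see how the variants stack. This is a hypothetical model:
- Allocation-if-null is not modeled.
- The base init only zeroes the image. The `0x3b06`/`0x2d10`/`0x3afe` stamping and group-id seeding are omitted.
- The vtable word is kept beside the image, because its in-object offset is unresolved.

Under these assumptions, a `0x3aa6` object ends up with both `0x100` and `0x200` set in `+0x16`, because it reuses the `0x3ad2` path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte-image model; the vtable's in-object location is unresolved,
// so it is tracked beside the image instead of at a guessed offset.
struct DispatchEntryImage {
    uint16_t vtable = 0;
    std::vector<uint8_t> bytes;

    uint16_t word(std::size_t off) const {
        return uint16_t(bytes.at(off) | bytes.at(off + 1) << 8);
    }
    void setWord(std::size_t off, uint16_t value) {
        bytes.at(off) = uint8_t(value & 0xff);
        bytes.at(off + 1) = uint8_t(value >> 8);
    }
};

constexpr std::size_t kFlags1 = 0x16;

// Stand-in for 0008:ba00: only zeroes the image (stamping and group id not modeled).
void initBase(DispatchEntryImage& e, std::size_t size) {
    assert(size >= 0x32);
    e.bytes.assign(size, 0);
    e.vtable = 0;
}

// 0008:cefb: base init, vtable 0x3ad2, flag 0x100, zero the +0x32/+0x34 words.
void ctorVtbl3ad2(DispatchEntryImage& e, std::size_t size) {
    assert(size >= 0x36);  // must be large enough to hold the word at +0x34
    initBase(e, size);
    e.vtable = 0x3ad2;
    e.setWord(kFlags1, uint16_t(e.word(kFlags1) | 0x100));
    e.setWord(0x32, 0);
    e.setWord(0x34, 0);
}

// 0008:d214: 0x40-byte object built on 0008:cefb, then vtable 0x3aa6 and flag 0x200.
void ctorVtbl3aa6(DispatchEntryImage& e) {
    ctorVtbl3ad2(e, 0x40);
    e.vtable = 0x3aa6;
    e.setWord(kFlags1, uint16_t(e.word(kFlags1) | 0x200));
    for (std::size_t off = 0x38; off <= 0x3e; off += 2) e.setWord(off, 0);
}
```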
#### Related alloc/init helpers
- `0004:ea00 entity_dispatch_entry_alloc_type_0f5e`
- `0004:eb1f entity_dispatch_entry_ctor_0f3a_with_cache_reset`
These look more like subtype-specific factory/create helpers than pure base constructors, but they still belong in the family map.
## Destroy / Release Surface
### Base-owned word-list destruction
#### `0008:dbec` `entity_word_list_destroy`
- resets vtable to `0x2d10`
- frees list storage if present
- optionally frees object when destroy flag bit `1` is set
This is the clearest current destructor-style path on the base object.
### Runtime-state release
#### `000d:8078` `entity_dispatch_entry_release_runtime_state`
- frees paired owned buffers
- updates shared hold/owner propagation through `g_active_dispatch_entry_farptr`
- destroys embedded word-list members
This reads as the release/destructor path for the runtime-state derived family rather than for the whole base type.
## Current Base Layout
This table is a working layout, not a finished header.
| Offset | Current name | Confidence | Current meaning |
|---|---|---|---|
| `+0x00` | `type_or_kind` | Medium | Constructor/factory helpers stamp type words such as `0x0f3a`, `0x0f5e`, or `0x051e` here in some subfamilies. Base-vtable interpretation remains separate. |
| `+0x02` | `slot_index_or_count` | Medium | Used as entry slot/index in several wrappers; also used as count in the base word-list family, so exact role may vary by subtype or overlay. |
| `+0x04` | `source_type` | High | Written by `entity_set_source_type`. |
| `+0x06` | `event_type_or_list_ptr_lo` | Medium | Written by `entity_set_event_type_checked`, but also participates in word-list storage in the list-owning variant. This is likely one of the current overlay collisions to resolve later. |
| `+0x08` | `group_id_byte` | High | Low 5-bit group/layer value managed by `entity_set_group_id`. |
| `+0x0a/+0x0c/+0x0e/+0x10` | `link_or_state_words` | High | Cleared by `entity_dispatch_entry_unlink`; belong to link/extent/target/reset state. |
| `+0x12/+0x14` | `target_farptr` | High | Managed by `entity_flag20_*_target` helpers. |
| `+0x16` | `flags1` | High | Holds bits `0x10`, `0x20`, `0x100`, `0x200`, `0x4000`, and other subtype/state gates. |
| `+0x18` | `flags2` | High | Holds bits `0x40`, `0x80`, `0x100`, `0x400`, `0x1000`; used by unlink, periodic, and refresh paths. |
| `+0x1e/+0x28` | `embedded_dispatch_or_word_list_members` | Medium | Many callsites treat these as subobject or vtable-dispatch bases. Exact split still needs a dedicated subobject note. |
| `+0x24/+0x26`, `+0x2e/+0x30` | `optional_member_ptrs` | Medium | Checked before freeing both embedded word-list members. |
| `+0x32/+0x34` | `extension_words_a` | High | Zeroed by the `0x3ad2` constructor variant; also used by later runtime/VM helper flows. |
| `+0x36/+0x38/+0x3a` | `period_or_schedule_words` | Medium | Written by `entity_set_update_period_and_reschedule`; clearly timing-related in the periodic variant. |
| `+0x3c/+0x3e` | `accumulator_words` | High | Used by `entity_periodic_accumulate_and_dispatch`. |
| `+0x40` | `hold_token` | High | Shared/borrowed hold byte in startup/display and runtime-state families. |
| `+0x41/+0x42/+0x44` | `runtime_state_flags` | High | Initialized by `entity_dispatch_entry_init_runtime_state`. |
| `+0x46/+0x48` | `owned_buffer_a` | High | Runtime-state owned work/palette-like buffer. |
| `+0x4a/+0x4c` | `owned_buffer_b` | High | Second runtime-state owned buffer. |
| `+0x49` | `file_family_selector` | High for the seg126 subtype | Local selector state in startup/display transition family. Likely subtype-specific, not general base meaning. |
| `+0x5b` | `state_flags` | High for the seg126 subtype | State-machine bits in the `000c` startup/display lane. Likely subtype-specific overlay. |
| `+0x520` | `selected_resource` | Medium | Loaded file/resource object in the transition-file-family subtype. |
## Important Layout Caveat
This family is almost certainly not one flat struct with universally stable semantics at every offset. Current evidence already shows subtype overlays:
- base scheduler/dispatch-entry state
- word-list-owning variants
- periodic/timer variants
- startup/display transition variants
- runtime-state/palette-backed variants
So the safest future Ghidra modeling strategy is:
1. create a minimal `EntityDispatchEntryBase`
2. create derived or overlay structs for subtype-specific tails
3. avoid prematurely forcing every offset into one monolithic universal class layout
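A minimal `EntityDispatchEntryBase` covering only the stable fields through `+0x18` is sketched below. It has these assumptions:
- `unknown_09` is a placeholder byte.
- The member names are working names from the layout table.
- `setGroupId` preserves the upper three bits of `+0x08`. The note only establishes that the low 5 bits hold the group/layer value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct EntityDispatchEntryBase {
    uint16_t typeOrKind;         // +0x00 (kind only in some factory-built subtypes)
    uint16_t slotIndexOrCount;   // +0x02
    uint16_t sourceType;         // +0x04
    uint16_t eventTypeOrListLo;  // +0x06 (known overlay collision)
    uint8_t  groupIdByte;        // +0x08, low 5 bits = group/layer
    uint8_t  unknown_09;         // +0x09, not yet identified
    uint16_t linkOrState[4];     // +0x0a/+0x0c/+0x0e/+0x10
    uint16_t targetFarPtr[2];    // +0x12/+0x14
    uint16_t flags1;             // +0x16
    uint16_t flags2;             // +0x18
};
#pragma pack(pop)

static_assert(offsetof(EntityDispatchEntryBase, flags1) == 0x16, "");
static_assert(offsetof(EntityDispatchEntryBase, flags2) == 0x18, "");

// Assumed behavior: replace the low 5 bits and keep the upper 3 bits untouched.
inline void setGroupId(EntityDispatchEntryBase& e, uint8_t group) {
    e.groupIdByte = uint8_t((e.groupIdByte & 0xE0) | (group & 0x1F));
}
```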
## Candidate Method Map
### Strong base methods
| Address | Current function | Candidate method role |
|---|---|---|
| `0008:ba00` | `entity_dispatch_entry_init` | `InitBase()` |
| `0008:bbb6` | `entity_set_source_type` | `SetSourceType()` |
| `0008:bc27` | `entity_set_event_type_checked` | `SetEventTypeChecked()` |
| `0008:bca8` | `entity_set_group_id` | `SetGroupId()` |
| `0008:bd53` | `entity_dispatch_entry_unlink` | `Unlink()` |
| `0008:be05` | `entity_increment_group_id` | `IncrementGroupId()` |
| `0008:c01d` | `entity_refresh_dispatch_state` | `RefreshDispatchState()` |
| `0008:bfb2` | `entity_clear_status_bits_from_flags` | `ClearStatusBitsFromFlags()` |
| `0008:bf8e` | `entity_call_update_vfunc14` | `CallUpdateSlot14()` |
| `0008:beee` | `entity_run_flagged_handlers` | `RunFlaggedHandlers()` |
### Pair/link/target helpers
| Address | Current function | Candidate method role |
|---|---|---|
| `0008:c7f1` | `entity_pair_update_link_slot_a` | `UpdateLinkSlotA()` |
| `0008:c890` | `entity_pair_update_link_slot_b` | `UpdateLinkSlotB()` |
| `0008:c92f` | `entity_pair_sync_a` | `PairSyncA()` |
| `0008:ca18` | `entity_pair_sync_b` | `PairSyncB()` |
| `0008:c9ee` | `entity_pair_mark_and_sync_a` | `MarkAndPairSyncA()` |
| `0008:cad7` | `entity_pair_mark_and_sync_b` | `MarkAndPairSyncB()` |
| `0008:cb2c` | `entity_flag20_clear_and_update_target` | `ClearFlag20AndUpdateTarget()` |
| `0008:cb5c` | `entity_flag20_set_and_init_target` | `SetFlag20AndInitTarget()` |
### Periodic/timed subtype methods
| Address | Current function | Candidate method role |
|---|---|---|
| `0008:cefb` | `entity_dispatch_entry_ctor_vtbl_3ad2` | `ConstructVtable3AD2()` |
| `0008:d214` | `entity_dispatch_entry_ctor_vtbl_3aa6` | `ConstructVtable3AA6()` |
| `0008:d313` | `entity_periodic_accumulate_and_dispatch` | `TickPeriodic()` |
| `0008:d3e6` | `entity_set_flag2000_and_update_active_counters` | `EnableActiveCounters()` |
| `0008:d433` | `entity_clear_flag2000_and_update_active_counters` | `DisableActiveCounters()` |
| `0008:d27e` | `entity_set_update_period_and_reschedule` | `SetUpdatePeriodAndReschedule()` |
### Word-list-owning subtype methods
| Address | Current function | Candidate method role |
|---|---|---|
| `0008:da00` | `entity_word_list_set_0408_terminated` | `SetWordList0408Terminated()` |
| `0008:dba3` | `entity_word_list_free_existing` | `FreeWordList()` |
| `0008:dbec` | `entity_word_list_destroy` | `Destroy()` |
| `0008:dc38` | `entity_word_list_ensure_contains` | `EnsureWordListContains()` |
| `0008:dcab` | `entity_word_list_append_unique` | `AppendUniqueWord()` |
| `0008:ddaf` | `entity_word_list_remove_value` | `RemoveWordValue()` |
| `0008:deea` | `entity_word_list_get_at` | `GetWordAt()` |
| `0008:df1b` | `entity_word_list_set_at` | `SetWordAt()` |
| `0008:dfa1` | `entity_word_list_find_unflagged_by_id10` | `FindUnflaggedWordById10()` |
### Runtime-state subtype methods
| Address | Current function | Candidate method role |
|---|---|---|
| `000d:7e00` | `entity_dispatch_entry_init_runtime_state` | `InitRuntimeState()` |
| `000d:8078` | `entity_dispatch_entry_release_runtime_state` | `ReleaseRuntimeState()` |
## Candidate Virtual Surface
The current evidence does not justify a fully named vtable yet, but some slot use is already real:
- `+0x14` = update callback slot used by `entity_call_update_vfunc14`
- `+0x28` = callback slot used by the periodic and proximity-style dispatch helpers
- embedded subobject/member surfaces at object offsets `+0x1e` and `+0x28` (instance fields, not the vtable slots above) are also dispatched through helper wrappers in `far-call-targets.md`
Recommended future vtable note shape:
| Slot offset | Current best role | Evidence quality |
|---|---|---|
| `+0x14` | update/refresh callback | High |
| `+0x28` | periodic/dispatch callback | High |
| others | unknown/default stubs | Low |
## Safe Future Ghidra Modeling Steps
When manual class work starts, the safest order for this family is:
1. create class namespace `EntityDispatchEntry`
2. move only the strong base methods first
3. create minimal `EntityDispatchEntryBase` struct with the stable fields through `+0x18`
4. create subtype overlay structs for word-list, timed, and runtime-state tails
5. create a small provisional vtable for only the verified slots
Do not start by forcing one complete `0x520`-byte monolithic class.
## Questions To Close Later
- whether `+0x00` should be modeled as a literal `kind` field in all variants or only in some factory-built subtypes
- exact ownership split between the base object and the embedded surfaces at `+0x1e` and `+0x28`
- whether the seg126 startup/display subtype is truly part of the same inheritance family or only shares a lower-level dispatch-entry substrate
- final base-size versus subtype-size boundaries once class namespaces exist in Ghidra
## Immediate Next Documentation Value
The next best companion note after this one is a slot-focused `SpriteNode` virtual table note, because that gives a second family with a cleaner explicit virtual surface and helps calibrate how aggressive the first Ghidra class conversion should be.

@ -0,0 +1,169 @@
# Entity VM Runtime And Owner-Resource Layout
## Purpose
This note gathers the current class-lift-relevant structure for the VM runtime lane into one place.
It focuses on four connected objects:
- `EntityVmRuntime`
- `EntityVmOwnerResource`
- `EntityVmContext`
- the slot/value helpers that connect gameplay entities to owner-loaded VM source data
The goal is not full opcode recovery. The goal is to make later class authoring and C++ skeleton emission faster by freezing the current ownership model.
## High-Level Ownership Model
Current best model from [docs/raw-0008-000c.md](docs/raw-0008-000c.md) and [docs/raw-000a-000d.md](docs/raw-000a-000d.md):
1. startup path resolves a configured EUSECODE root/path
2. `entity_vm_runtime_create` allocates the main runtime body
3. runtime constructor attaches one file-backed helper created by `entity_vm_runtime_owner_resource_create`
4. gameplay entities map to slot indices through `entity_vm_slot_index_from_entity`
5. masked-create helpers test owner-side capability bits and then build per-entity or per-slot `EntityVmContext` objects
6. contexts seed their local stream/value state from owner-loaded source rows and runtime slot caches
## `EntityVmRuntime`
Strong anchors:
- `000d:44df entity_vm_runtime_init_from_path_if_configured`
- `000d:4c99 entity_vm_runtime_create`
- `000d:4d36 entity_vm_runtime_init_slots`
- `000d:4d75 entity_vm_runtime_release_slots`
- `000d:4e01 entity_vm_runtime_destroy`
Current strongest structural claims:
- runtime body is the global owner behind `0x6611/0x6613`
- front region behaves like a `0x80` entry slot table with stride `0x26`
- tail region around `+0x1300..+0x1318` holds runtime budget/default metadata plus the owner-resource helper pointer
- helper attachment lives at `+0x1315/+0x1317`
Current safe class role:
- long-lived VM root object that owns slot state, owner resource, category-base words, and runtime-wide value budgets
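The slot-table and tail claims line up arithmetically: `0x80` entries of `0x26` bytes occupy exactly `0x1300` bytes, so the budget/default tail starts immediately after the last slot. A minimal sketch of that offset math follows; the constant and function names are hypothetical, and reading the helper pointer assumes the usual little-endian far-pointer order (offset word at `+0x1315`, segment word at `+0x1317`).

```python
# Values from the EntityVmRuntime notes; identifier names are hypothetical.
SLOT_COUNT = 0x80        # entries in the front slot table
SLOT_STRIDE = 0x26       # bytes per slot entry
TAIL_START = 0x1300      # start of the budget/default metadata tail
HELPER_PTR_OFF = 0x1315  # owner-resource helper pointer: offset word (assumed)
HELPER_PTR_SEG = 0x1317  # owner-resource helper pointer: segment word (assumed)


def slot_entry_offset(index: int) -> int:
    """Byte offset of slot `index` from the start of the runtime body."""
    if not 0 <= index < SLOT_COUNT:
        raise IndexError(f"slot index {index:#x} outside 0..{SLOT_COUNT - 1:#x}")
    return index * SLOT_STRIDE


def helper_far_ptr(body: bytes) -> tuple[int, int]:
    """Return the attached helper pointer as (segment, offset) from little-endian words."""
    off = int.from_bytes(body[HELPER_PTR_OFF:HELPER_PTR_OFF + 2], "little")
    seg = int.from_bytes(body[HELPER_PTR_SEG:HELPER_PTR_SEG + 2], "little")
    return seg, off


# The slot table ends exactly where the tail region begins: 0x80 * 0x26 == 0x1300.
assert SLOT_COUNT * SLOT_STRIDE == TAIL_START
```

The last slot therefore sits at `+0x12da`, which is a useful cross-check when a decompiled index expression is suspected of walking past the table into tail metadata.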
## `EntityVmOwnerResource`
Strong anchors:
- `000d:7000 entity_vm_runtime_owner_resource_create`
- `000d:70fd entity_vm_runtime_owner_resource_destroy`
Best current helper shape:
- compact file-backed helper object
- helper-owned count at `+0x14`
- far-pointer table at `+0x10`
- paired 16-bit table at `+0x18`
- helper vtable `+0x04` acts as size query
- helper vtable `+0x0c` materializes the `0x0d`-stride owner rows later consumed by contexts
Current safest interpretation:
- this is the most bounded class-lift target in the VM lane
- it looks like a real helper object with a compact stable layout and a clear owner relationship to `EntityVmRuntime`
### seg070 loader contract
The paired loops rooted at raw windows `0009:67b6` and `0009:6916` are the current best evidence that the helper is file-backed rather than a pure in-memory descriptor copier.
Verified behavior already captured in the main notes:
- iterate helper-owned count at `+0x14`
- index path/id tables at `+0x10` and `+0x18`
- build formatted paths with two distinct format strings
- open, seek/read, close, and free loop-local buffers through the DOS/file helper lane
Current caution:
- exact per-family record schema is still open, so the helper should be modeled as a loader/index object first, not as a final descriptor-schema class.
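The loop contract can be sketched as one routine run once per format string. This is a hedged behavioral model, not recovered code: the real format strings are not captured in this note, so the caller passes placeholders; `read_file` stands in for the DOS open/seek/read/close lane; and all names are hypothetical. The row splitter only illustrates the `0x0d` stride that contexts consume, and rejecting trailing bytes is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

OWNER_ROW_STRIDE = 0x0D  # stride of rows materialized through helper vtable +0x0c


@dataclass
class OwnerResourceHelper:
    """Hypothetical model of the seg070 helper; comments give the raw offsets."""
    path_table: list[str]  # stands in for the far-pointer table at +0x10
    count: int             # helper-owned count at +0x14
    id_table: list[int]    # stands in for the paired 16-bit table at +0x18


def run_loader_pass(helper: OwnerResourceHelper, fmt: str, root: str,
                    read_file: Callable[[str], bytes]) -> list[tuple[int, bytes]]:
    """One loader pass: format a path per entry, then read it."""
    if helper.count > min(len(helper.path_table), len(helper.id_table)):
        raise ValueError("count exceeds table length")
    out = []
    for i in range(helper.count):
        ident = helper.id_table[i] & 0xFFFF
        path = fmt.format(root=root, name=helper.path_table[i], id=ident)
        # open + seek/read + close + buffer free collapse into one read_file() call
        out.append((ident, read_file(path)))
    return out


def split_owner_rows(blob: bytes) -> list[bytes]:
    """Cut a materialized blob into 0x0d-byte rows; trailing bytes are rejected here."""
    if len(blob) % OWNER_ROW_STRIDE:
        raise ValueError("blob length is not a multiple of the row stride")
    return [blob[i:i + OWNER_ROW_STRIDE] for i in range(0, len(blob), OWNER_ROW_STRIDE)]
```

Calling `run_loader_pass` once per format string is one plausible reading of the paired loops; the note does not say which loop owns which string.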
## `EntityVmContext`
Strong anchors:
- `000d:463a entity_vm_context_try_create_masked_for_entity`
- `000d:46ec entity_vm_context_create_from_slot_index`
- `000d:48b6 entity_vm_context_free_buffer`
- `000d:48da entity_vm_context_sync_global_value_and_dispatch`
- `000d:4962 entity_vm_context_destroy`
- `000d:498f entity_vm_context_save`
- `000d:4a78 entity_vm_context_load`
Current safe role:
- per-entity or per-slot execution/context object built from runtime slot state plus owner-loaded source data
Current layout claims that matter for class lifting:
- `+0x32` stores slot index
- `+0x34` stores the additive offset word used by the `slot_load_value_plus_offset` lane
- `+0x36` embeds the mini-VM/state object
- `+0xd6/+0xd8` hold the seeded source/control stream lane
- `+0x102` is the backward-growing local payload/buffer lane
- `+0x10c/+0x10e` store a derived low/high pair reused by save/load
- `+0x117/+0x119` cache the owner-linked source pair
- `+0x123` behaves as a busy or active flag in the sync/dispatch path
Current caution:
- context dispatch semantics are still active work, so this object should be modeled around lifecycle and data ownership first, not around final method names for every opcode-facing helper.
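Before authoring a Ghidra struct, the pinned offsets can be sanity-checked as a flat field table. The sketch below is hypothetical: field names are placeholders, every width except the mini-VM blob is an assumption (16-bit words, one-byte flag), and the mini-VM member is sized to the `0xa0`-byte gap before `+0xd6` only as an upper bound.

```python
import struct

# (offset, struct format) for the context fields the note pins down.
# Field names are placeholders; widths are assumptions unless the note states them.
CONTEXT_FIELDS = {
    "slot_index":      (0x032, "<H"),
    "additive_offset": (0x034, "<H"),
    "mini_vm":         (0x036, "<160s"),  # opaque; 0xa0 bytes is only the gap up to +0xd6
    "stream_lo":       (0x0D6, "<H"),
    "stream_hi":       (0x0D8, "<H"),
    "payload_lane":    (0x102, "<H"),     # width unknown; modeled as one word
    "derived_lo":      (0x10C, "<H"),
    "derived_hi":      (0x10E, "<H"),
    "owner_src_a":     (0x117, "<H"),     # odd offset: the layout must be byte-packed
    "owner_src_b":     (0x119, "<H"),
    "busy":            (0x123, "<B"),
}


def check_layout(fields: dict[str, tuple[int, str]]) -> int:
    """Raise on any overlapping fields; return the end of the last pinned byte."""
    spans = sorted((off, off + struct.calcsize(fmt), name)
                   for name, (off, fmt) in fields.items())
    reach, owner = 0, None
    for start, end, name in spans:
        if start < reach:
            raise ValueError(f"{owner} overlaps {name}")
        if end > reach:
            reach, owner = end, name
    return reach


def read_field(buf: bytes, name: str):
    """Unpack one named field from a raw context image (bytes or bytearray)."""
    off, fmt = CONTEXT_FIELDS[name]
    return struct.unpack_from(fmt, buf, off)[0]
```

The odd offsets at `+0x117/+0x119` are why any struct authored from this table needs 1-byte packing; the table also shows the context is at least `0x124` bytes.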
## Gameplay Entity To VM Bridge
### Slot selection
`entity_vm_slot_index_from_entity` (`000d:45c5`) is the key bridge from gameplay entity identity into the VM lane.
Current safest summary:
- it does not choose `NPCTRIG` versus `EVENT` directly
- it maps gameplay entities into category spans using runtime base words such as `0x8c7c/0x8c7e/0x8c80`
- owner-row capability bits and later slot-value materialization do the next stage of filtering
### Masked-create helpers
The masked-create family is already class-lift relevant even before final event labels are known.
What is safe now:
- the hub at `000d:463a` checks runtime-disable state and owner-side mask bits
- low-slot and high-slot wrappers differ by slot id, mask, and whether they pass an extra signed/additive word
- wrappers like slot `0x0a` / `0x0b` are offset-specialized context creators, not separate selector universes
This means future Ghidra or C++ modeling should treat them as helper factories around `EntityVmContext`, not as methods on unrelated gameplay classes.
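A hedged sketch of that factory shape: one generic hub plus thin wrappers that only pin slot, mask, and the additive word. The slot `0x0a` / mask `0x0400` / sign-extended-word combination comes from the `0005:2c35` raw-window note; the function names, the boolean disable flag, and the `VmContext` stand-in are hypothetical, and real context construction does far more than record two fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmContext:
    slot_index: int
    additive_offset: int


def sign_extend16(word: int) -> int:
    """Interpret a raw 16-bit word as a signed value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def try_create_masked(runtime_disabled: bool, owner_bits: int, slot: int,
                      mask: int, additive: int = 0) -> Optional[VmContext]:
    """Generic hub: refuse when the runtime is disabled or the owner row lacks the mask bit."""
    if runtime_disabled:
        return None
    if (owner_bits & mask) == 0:
        return None
    return VmContext(slot_index=slot, additive_offset=additive)


def create_slot_0a(runtime_disabled: bool, owner_bits: int,
                   raw_word: int) -> Optional[VmContext]:
    """Offset-specialized wrapper: fixed slot 0x0a and mask 0x0400, sign-extended word."""
    return try_create_masked(runtime_disabled, owner_bits, 0x0A, 0x0400,
                             sign_extend16(raw_word))
```

Keeping the wrappers this thin is the point: they select a slot and mask inside one `EntityVmContext` universe rather than defining separate selector families.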
## Best Class-Lift Targets In This Lane
1. `EntityVmOwnerResource`
2. `EntityVmRuntime`
3. `EntityVmContext`
Why this order:
- owner-resource helper is compact and structurally bounded
- runtime has clear ownership over the helper and slot table
- context has the richest semantics but also the most unresolved dispatcher behavior
## Source-Emission Guidance
If provisional C++ is emitted later, the safest early skeleton is:
- `EntityVmOwnerResource` with explicit loader/index fields and placeholder virtual/helper methods
- `EntityVmRuntime` with fixed-size slot table, owner pointer, category-base fields, and create/destroy methods
- `EntityVmContext` with exact saved-field placeholders and a distinct embedded mini-VM state member
Avoid in the first skeleton:
- speculative opcode enums presented as final
- collapsing the owner-resource helper into plain runtime fields
- flattening the source/control stream pair into one host-only pointer abstraction if Track A remains active
## Bottom Line
The VM lane now supports a real class model, but it should start with ownership and layout rather than with overconfident script-semantic names.
The most defensible current model is `runtime owns helper and slot state; contexts are short-lived objects built from slot selection plus owner-loaded source rows`.

@ -0,0 +1,439 @@
# GhidraMCP Class-Lifting Endpoint Spec
## Purpose
This note drafts the endpoint surface needed to support the Remorse class-lifting workflow described in `docs/remorse-cpp-decompilation-plan.md` and grounded by `docs/remorse-class-candidate-inventory.md`.
This is not an implementation batch. It is a local design spec so that when MCP work resumes later, the endpoint set can be built in a way that matches the actual reverse-engineering workflow instead of a generic symbol-edit API.
## Design Goals
The new endpoints should make these workflows cheap and repeatable:
1. create class and namespace containers in Ghidra without touching the GUI
2. move already-renamed flat functions under explicit class ownership
3. build typed instance structs and typed vtables from verified evidence
4. attach `this`-pointer semantics and method signatures to recovered methods
5. preserve ambiguity when evidence is partial instead of forcing speculative class conversions
6. support dry-run review before any bulk symbol or datatype mutation
## Non-Goals
- automatic recovery of class hierarchies from raw heuristics alone
- one-shot `convert whole binary to C++ classes`
- speculative inheritance inference without vtable or field evidence
- silent symbol moves that hide rename collisions or ownership conflicts
## Existing MCP Behavior To Reuse
The local fork already has patterns worth reusing:
- explicit target selectors: `project_dir`, `project_name`, `folder_path`, `program_name`
- dry-run oriented edit-plan behavior
- machine-friendly outputs rather than prose-heavy summaries
- backward-compatible aliases when route names change
Every new class-lifting endpoint should follow the same conventions.
## Core Object Model Assumptions
The class-lifting workflow needs to manipulate four kinds of things explicitly:
1. namespace/class containers in the symbol tree
2. function ownership and method naming
3. datatypes for instance structs and vtables
4. binding metadata between methods, vtable slots, and instance layouts
That means symbol-only endpoints are not enough. Datatype endpoints and method-binding endpoints are part of the minimum viable feature set.
## Proposed Endpoints
### 1. `create_namespace`
Create a namespace or class container.
Parameters:
- `name`: string
- `parent_path`: string, optional
- `kind`: enum `namespace|class`, default `namespace`
- explicit target selectors, optional
Response:
- `status`
- `created`: bool
- `kind`
- `path`
- `symbol_id` or equivalent stable identifier if available
- `collision`: existing path info when create is skipped or merged
Why it matters:
- lets the workflow create `Entity`, `SpriteNode`, `EntityVmRuntime`, or similar owners before moving methods
### 2. `list_namespace_members`
Return members of a namespace or class container in a machine-friendly form.
Parameters:
- `path`: string
- `include_child_namespaces`: bool, default `false`
- `include_functions`: bool, default `true`
- `include_data`: bool, default `true`
- explicit target selectors, optional
Response:
- `status`
- `path`
- `members`: array of `{ kind, name, address?, datatype?, child_count? }`
Why it matters:
- needed for inventory verification and idempotent batch moves
### 3. `move_symbol_to_namespace`
Move a function or data symbol under a namespace/class.
Parameters:
- `symbol_address`: string, optional
- `symbol_name`: string, optional
- one of `symbol_address` or `symbol_name` is required
- `namespace_path`: string
- `new_name`: string, optional
- `conflict_policy`: enum `fail|keep_existing|rename_incoming`, default `fail`
- `dry_run`: bool, default `false`
- explicit target selectors, optional
Response:
- `status`
- `moved`: bool
- `old_path`
- `new_path`
- `collision`: optional structured collision detail
Why it matters:
- this is the basic operation needed to turn flat functions into methods after evidence is verified
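A dry-run planner makes the three `conflict_policy` values concrete. In this sketch, the `status` strings, the `_N` suffix used by `rename_incoming`, and the function name are all assumptions; the collision record follows the conflict output shape recommended under Transaction And Safety Rules.

```python
POLICIES = ("fail", "keep_existing", "rename_incoming")


def plan_symbol_move(existing_names: set[str], namespace_path: str, old_path: str,
                     name: str, conflict_policy: str = "fail") -> dict:
    """Compute a move_symbol_to_namespace result without mutating anything."""
    if conflict_policy not in POLICIES:
        raise ValueError(f"unknown conflict_policy {conflict_policy!r}")
    new_path = f"{namespace_path}::{name}"
    if name not in existing_names:
        return {"status": "ok", "moved": True, "old_path": old_path, "new_path": new_path}
    collision = {
        "type": "symbol_collision",
        "path": new_path,
        "existing": new_path,
        "requested": old_path,
        "resolution_options": [p for p in POLICIES if p != "fail"],
    }
    if conflict_policy == "fail":
        return {"status": "error", "moved": False, "old_path": old_path,
                "new_path": None, "collision": collision}
    if conflict_policy == "keep_existing":
        return {"status": "skipped", "moved": False, "old_path": old_path,
                "new_path": None, "collision": collision}
    # rename_incoming: hypothetical numeric-suffix scheme
    n = 1
    while f"{name}_{n}" in existing_names:
        n += 1
    return {"status": "ok", "moved": True, "old_path": old_path,
            "new_path": f"{namespace_path}::{name}_{n}", "collision": collision}
```

Because the planner is pure, the same function can back both `dry_run=true` previews and the real write path, which keeps preview and commit results identical.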
### 4. `set_function_class`
High-level helper to move a function into a class and apply method-oriented naming/signature metadata in one call.
Parameters:
- `function_address`: string
- `class_path`: string
- `method_name`: string
- `this_param_name`: string, optional, default `this`
- `calling_convention`: string, optional
- `dry_run`: bool, default `false`
- explicit target selectors, optional
Response:
- `status`
- `function_address`
- `old_path`
- `new_path`
- `signature_before`
- `signature_after`
Why it matters:
- reduces the number of separate write operations for the common `move + rename + set this semantics` workflow
### 5. `create_or_update_struct`
Create or update a structure datatype.
Parameters:
- `name`: string
- `category_path`: string, optional
- `size`: integer, optional
- `packing`: integer, optional
- `fields`: array of field specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional
Each field spec:
- `offset`: integer
- `name`: string
- `datatype`: string
- `comment`: string, optional
- `confidence`: enum `high|medium|low`, optional
Response:
- `status`
- `datatype_path`
- `created_or_updated`
- `size`
- `field_count`
- `conflicts`: array, optional
Why it matters:
- class lifting without struct authoring is not enough for readable or recompilable source
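One check a `dry_run` call could perform before touching datatypes is an overlap and bounds scan over the field specs. This is a sketch: the size table assumes a 16-bit large-model target with 4-byte far pointers, the checker name is hypothetical, and the conflict records reuse the `datatype_collision` type from the recommended conflict output shape.

```python
from typing import Optional

# Hypothetical size table for a 16-bit large-model target; pointers are assumed far (4 bytes).
DATATYPE_SIZES = {"byte": 1, "char": 1, "word": 2, "short": 2, "dword": 4, "long": 4}


def field_size(datatype: str) -> int:
    if datatype.rstrip().endswith("*"):
        return 4
    return DATATYPE_SIZES[datatype]


def check_struct_fields(fields: list[dict], size: Optional[int] = None) -> list[dict]:
    """Return conflict records for overlapping fields or fields running past `size`."""
    spans = sorted((f["offset"], f["offset"] + field_size(f["datatype"]), f["name"])
                   for f in fields)
    conflicts = []
    reach_end, reach_name = None, None  # furthest-reaching field seen so far
    for start, end, name in spans:
        if reach_end is not None and reach_end > start:
            conflicts.append({"type": "datatype_collision", "path": f"+{start:#x}",
                              "existing": reach_name, "requested": name})
        if reach_end is None or end > reach_end:
            reach_end, reach_name = end, name
    if size is not None and spans:
        start, end, name = max(spans, key=lambda s: s[1])
        if end > size:
            conflicts.append({"type": "datatype_collision", "path": f"+{start:#x}",
                              "existing": f"size {size:#x}", "requested": name})
    return conflicts
```

Tracking the furthest-reaching field, rather than only comparing neighbours, also catches a short field hidden inside a longer one.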
### 6. `create_or_update_vtable`
Create a vtable datatype as a structure of function pointers.
Parameters:
- `name`: string
- `category_path`: string, optional
- `slots`: array of slot specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional
Each slot spec:
- `offset`: integer
- `name`: string
- `function_address`: string, optional
- `prototype`: string, optional
- `comment`: string, optional
Response:
- `status`
- `datatype_path`
- `slot_count`
- `bound_functions`: array of `{ offset, function_address, name }`
Why it matters:
- this is the missing datatype-side half of stable virtual dispatch recovery
### 7. `set_function_this_type`
Apply or update `this`-pointer typing on a function.
Parameters:
- `function_address`: string
- `this_type`: string
- `this_param_name`: string, optional, default `this`
- `this_storage`: enum `stack|register|farptr`, optional
- `calling_convention`: string, optional
- `dry_run`: bool, default `false`
- explicit target selectors, optional
Response:
- `status`
- `function_address`
- `signature_before`
- `signature_after`
Why it matters:
- many decompiler improvements only show up after the instance type is attached to the first argument correctly
### 8. `analyze_vtable`
Read-side helper that inspects a suspected vtable region and emits slot candidates.
Parameters:
- `address`: string
- `slot_count`: integer, optional
- `stop_on_invalid_pointer`: bool, default `true`
- explicit target selectors, optional
Response:
- `status`
- `address`
- `slots`: array of `{ offset, target_address, target_name, is_function, current_owner?, comment? }`
- `warnings`: array, optional
Why it matters:
- this is the minimum analysis helper needed before class authorship is applied at scale
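A minimal model of the read side, assuming each slot is a 4-byte far pointer stored offset word first, then segment word, and that the caller supplies a `lookup(seg, off)` returning a function name or `None`. Resolving segment:offset against the NE image is abstracted away, `current_owner` / `comment` are omitted, and names beyond those in the spec are hypothetical.

```python
from typing import Callable, Optional

SLOT_SIZE = 4  # assumed: each vtable slot is a 16-bit far pointer (offset word, segment word)


def analyze_vtable(image: bytes, start: int,
                   lookup: Callable[[int, int], Optional[str]],
                   slot_count: Optional[int] = None,
                   stop_on_invalid_pointer: bool = True) -> dict:
    """Decode consecutive far pointers at image[start:] into slot candidates."""
    slots, warnings = [], []
    offset = 0
    while start + offset + SLOT_SIZE <= len(image):
        if slot_count is not None and len(slots) >= slot_count:
            break
        pos = start + offset
        off = int.from_bytes(image[pos:pos + 2], "little")
        seg = int.from_bytes(image[pos + 2:pos + 4], "little")
        name = lookup(seg, off)  # None means "no known function at seg:off"
        if name is None and stop_on_invalid_pointer:
            warnings.append(f"stopped at slot +{offset:#x}: {seg:04x}:{off:04x} is not a function")
            break
        slots.append({"offset": offset, "target_address": f"{seg:04x}:{off:04x}",
                      "target_name": name, "is_function": name is not None})
        offset += SLOT_SIZE
    return {"status": "ok", "slots": slots, "warnings": warnings}
```

With `stop_on_invalid_pointer=false` the scan keeps non-function slots and marks them `is_function: false`, which suits tables that interleave real methods with null or stub entries.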
### 9. `apply_class_layout`
Bind a class namespace, instance struct, optional vtable struct, and a set of methods in one dry-runnable transaction.
Parameters:
- `class_path`: string
- `instance_struct`: string
- `vtable_struct`: string, optional
- `vtable_address`: string, optional
- `methods`: array of method specs
- `dry_run`: bool, default `false`
- explicit target selectors, optional
Each method spec:
- `function_address`: string
- `method_name`: string
- `slot_offset`: integer, optional
- `is_virtual`: bool, default `false`
- `this_type`: string, optional
- `comment`: string, optional
Response:
- `status`
- `class_path`
- `applied_methods`
- `applied_structs`
- `warnings`
Why it matters:
- supports one-shot promotion of a verified family from notes into Ghidra with explicit review first
### 10. `export_class_candidate`
Read-side export helper for documentation and source-generation prep.
Parameters:
- `class_path`: string
- `include_struct_fields`: bool, default `true`
- `include_vtable`: bool, default `true`
- `include_method_signatures`: bool, default `true`
- explicit target selectors, optional
Response:
- machine-friendly JSON-like object containing class metadata, methods, field layouts, and slot maps
Why it matters:
- the local docs and future C++ skeleton emission need a clean export surface, not just screen scraping
## Field Schemas
### Struct field schema
Recommended stable shape:
```json
{
"offset": 0,
"name": "vtable",
"datatype": "EntityVTable *",
"comment": "Primary vtable pointer",
"confidence": "high"
}
```
### Method schema
```json
{
"function_address": "0008:ba00",
"method_name": "Init",
"slot_offset": null,
"is_virtual": false,
"this_type": "EntityDispatchEntry *",
"comment": "Base constructor-style init"
}
```
### Vtable slot schema
```json
{
"offset": 20,
"name": "OnEventType2",
"function_address": "000b:3ab2",
"prototype": "void (__far *OnEventType2)(SpriteNode *, Event *)"
}
```
## Transaction And Safety Rules
All write-capable class-lifting endpoints should support:
- `dry_run`
- explicit target selectors
- structured conflict reporting
- idempotent repeat calls where practical
- no silent overwrite of unrelated symbols or datatype fields
Recommended conflict output shape:
- `type`: `symbol_collision|datatype_collision|slot_conflict|owner_conflict|signature_conflict`
- `path` or `address`
- `existing`
- `requested`
- `resolution_options`
## Backward Compatibility And Aliases
Where practical, add aliases instead of replacing older names.
Recommended aliases:
- `create_class` -> `create_namespace(kind=class)`
- `move_function_to_class` -> `set_function_class`
- `set_this_type` -> `set_function_this_type`
- `build_vtable` -> `create_or_update_vtable`
This follows the local fork's existing pattern of keeping compatibility wrappers when route names evolve.
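One way to keep these aliases as thin wrappers is a single lookup table consulted before dispatch. This is a sketch: `resolve_route` and the table are hypothetical, and they assume each alias accepts the same parameters as its canonical route.

```python
# alias -> (canonical route, parameters the alias pins)
ROUTE_ALIASES = {
    "create_class": ("create_namespace", {"kind": "class"}),
    "move_function_to_class": ("set_function_class", {}),
    "set_this_type": ("set_function_this_type", {}),
    "build_vtable": ("create_or_update_vtable", {}),
}


def resolve_route(route: str, params: dict) -> tuple[str, dict]:
    """Map an alias onto its canonical route; non-alias routes pass through unchanged."""
    canonical, pinned = ROUTE_ALIASES.get(route, (route, {}))
    # Pinned values win, so create_class always yields kind=class.
    return canonical, {**params, **pinned}
```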
## Suggested Implementation Order
If implementation resumes later, the smallest useful sequence is:
1. `create_namespace`
2. `move_symbol_to_namespace`
3. `set_function_this_type`
4. `create_or_update_struct`
5. `analyze_vtable`
6. `create_or_update_vtable`
7. `apply_class_layout`
8. `export_class_candidate`
That order enables immediate manual class work after only the first three or four endpoints, while leaving the richer transactional workflows for later.
## First Real Workflow To Target
The first workflow this API should make easy is the pilot family from the current inventory:
### `EntityDispatchEntryBase` promotion workflow
1. create class namespace `Remorse::EntityDispatchEntry`
2. create instance struct `EntityDispatchEntry`
3. move `0008:ba00`, `0008:bca8`, `0008:bd53`, `0008:bf8e`, `0008:c01d`, `0008:dbec`, and constructor variants under that class as methods
4. attach `this` typing
5. analyze or define vtables `0x3b06`, `0x2d10`, `0x3afe`, `0x3ad2`, `0x3aa6`
6. export the class candidate for repo-side documentation and C++ skeleton generation
If the endpoint surface handles that family cleanly, it is probably sufficient for the rest of the early C++ lifting work.
## Open Questions To Resolve Later
- whether Ghidra class namespaces or plain namespaces produce better decompiler output in this 16-bit NE environment
- how best to encode far-pointer aware `this` conventions in method signatures
- whether vtable datatypes should be attached to concrete memory addresses automatically or only on explicit request
- whether confidence annotations should live in datatype comments, decompiler comments, or external export metadata
## Summary
The endpoint surface needed here is not large, but it does need to span both symbol ownership and datatype authorship. If later MCP work only adds `move function into class`, it will still leave the hardest part of the C++ lift undone.
The minimum viable class-lifting feature set is therefore:
- namespace/class creation
- symbol-to-class moves
- `this` typing
- struct authoring
- vtable analysis/authoring
- one transactional `apply_class_layout` path

@ -69,9 +69,17 @@ The tooltip now exposes generalized metadata for editor/helper objects instead o
- Decode more shape-specific field semantics for the still-unresolved editor objects, especially the remaining non-promoted invisible-wall, camera/helper, music-controller, and secret-door-switch families, and keep folding any new results back into the dedicated USECODE-link note.
- Find the No Regret replacement for the Remorse `0x024F` monster-egg workflow instead of assuming the same shape is reused.
## `0x0318` Frame `0`: `CRUMORPH`
- The older `placeholder cube` label is no longer the best behavioral read for `0x0318`. Both extracted corpora now name class `0x0318` as `CRUMORPH`: Remorse `EUSECODE_extracted/class_event_index.tsv` entry `173` and Regret `REGRET_USECODE_extracted/class_event_index.tsv` entry `174` both expose a live `equip` body at slot `0x0A`.
- The two recovered `equip` bodies differ slightly in helper naming, but they agree on the same high-level lane. Both scan nearby family-`6` actors, compare the pad `QLo` against mutable actor field `0x63`, reject dead actors, transfer control to the first live match, wait until control sticks, and then dispatch `TRIGGER.slot_20` lane `0` or `1` depending on whether that controlled actor is still alive.
- Current best read is therefore `control-transfer morph pad`, not decorative cube and not DTABLE-backed NPC spawner. The object's authored low quality byte is the local control key; `npcNum` does not carry the actor target directly, and the actor-side match is not a stable exported scene field.
- Static scene evidence is strongest in Regret, which is why the viewer promotion was first justified there. The decompressed `.cache` scenes repeatedly show nearby same-`QLo` `0x04B1` helpers close enough to expose a cautious local `CRUMORPH -> CMD_LINK` overlay rule.
- The deeper actor-target side remains intentionally unexported. The same actor-key follow-up that covered `NPC_ONLY` still applies here: the compared actor byte is mutable field `0x63`, and recovered `TRIGGER.slot_29` / `slot_2B` lanes can rewrite it after load. That keeps `CRUMORPH -> actor` arrows out of the viewer for now.
- Practical viewer implication: `0x0318` should be labeled `CRUMORPH`, should expose its `QLo` / `QHi` / `mapNum` / `npcNum` / `nextItem` bytes in tooltip metadata, should open `CRUMORPH::equip` from the USECODE action, and should keep only the already-evidenced nearby same-`QLo` `0x04B1` arrows.
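The recovered `equip` lane reduces to a short scan-then-dispatch routine. The sketch below is a behavioral model, not recovered code: `Actor`, `transfer_control`, and the tuple return are hypothetical, the wait-until-control-sticks step is collapsed into one callback, and which of lanes `0` / `1` corresponds to the live outcome is an assumption the notes do not pin down.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ACTOR_FAMILY = 6  # nearby actors scanned by CRUMORPH::equip


@dataclass
class Actor:
    family: int
    field_63: int  # mutable actor field 0x63, the actor-side control key
    alive: bool


def crumorph_equip(pad_qlo: int, nearby: list[Actor],
                   transfer_control: Callable[[Actor], bool]) -> Optional[tuple[str, int]]:
    """Hedged model of the equip lane; returns the TRIGGER dispatch, if any."""
    target = next((a for a in nearby
                   if a.family == ACTOR_FAMILY and a.alive
                   and a.field_63 == (pad_qlo & 0xFF)), None)
    if target is None:
        return None  # assumption: no live match means no dispatch
    # transfer_control() stands in for "hand control over and wait until it sticks";
    # it reports whether the controlled actor is still alive afterwards.
    still_alive = transfer_control(target)
    lane = 0 if still_alive else 1  # assumption: lane 0 = alive, lane 1 = dead
    return ("TRIGGER.slot_20", lane)
```

Because the match key is mutable actor field `0x63`, the model takes the nearby actor list at call time rather than from exported scene data, which is the same reason `CRUMORPH -> actor` arrows stay out of the viewer.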
## Newly Promoted Regret-Only Controllers
- `0x0318` is now promoted as `CRUMORPH`, not a blank placeholder cube. The recovered `equip` body scans nearby NPCs for a shared internal control key derived from the item's `QLo`, temporarily transfers player control to the first live match, and then brackets `TRIGGER.slot_20` with success or failure lanes.
- `0x0366` remains `NPC_ONLY`, but the latest decompressed `.cache` sweep tightens its practical viewer behavior: actor-target arrows are still not justified, while cautious local `NPC_ONLY -> 0x04B1` same-`QLo` arrows are now strong enough to expose.
- `0x04c6` / `0x04de` are now promoted as `WATCHNS` / `WATCHEW`, not generic editor leftovers. Their recovered `slot_20` bodies scan nearby `0x0510` posts by shared `QLo` and then bracket `TRIGGER.slot_20` around a watcher-specific follow-up lane.
- `0x0510` is now better treated as a `SECRET_DOOR_POST` helper target rather than an unresolved standalone controller. The strongest current viewer behavior is a cautious local arrow from `WATCHNS` / `WATCHEW` plus tooltip decoding of its `QLo`/`QHi` bytes.
@ -79,6 +87,31 @@ The tooltip now exposes generalized metadata for editor/helper objects instead o
- `0x0451` / `0x05ae` are now closed as `CRAZYEW` / `CRAZYNS`, small Regret-only hit-driven NPC wake-up relays rather than vague contextual map labels.
- `0x056d` is now closed as `VIDEOBOX`, a gated controller with a direct `equip` body, even though its higher-level gameplay meaning is still less explicit than the watcher and cryobox lanes.
## Shared Trigger Follow-Up: `0x00A2`, `0x03C1`, And `0x04E7`
### `0x04E7` Frame `0`: `DEATHBOX` In Both Games
- The `npc death` icon label now has a clean cross-game closure, not just a Remorse-side guess. Both extracted corpora expose class `0x04E7` as `DEATHBOX`, and both corpora keep the active exported body at slot `0x0A` (`equip` / `func0A`).
- That means the Remorse equivalent is exact rather than approximate: same shape id, same class label, same nearby-`DEATHBOX` scan from `NPCDEATH.slot_20`, and the same practical viewer interpretation as an NPC-death helper/controller keyed by local `QLo`.
- Practical viewer implication: Regret should no longer leave `0x04E7` as an anonymous editor object when the underlying usecode/export evidence already matches Remorse exactly.
### `0x00A2`: `PANELEW`
- Both extracted corpora now close `0x00A2` directly as `PANELEW`, the east-west counterpart to `PANELNS`, not as a generic unnamed wall button.
- Recovered body `PANELEW::use` is small but consistent across both games:
  - if `frame == 0`, it returns immediately
  - otherwise, if the panel's map byte is clear, it dispatches `TRIGGER.slot_20` lane `0` from the panel item itself
- The handler does not need to read a second bespoke target field because the downstream trigger family already uses the panel's local `QLo` as the practical authored link id.
- Practical viewer implication: `0x00A2` should be labeled `PANELEW`, should open `PANELEW::use` from the USECODE action, and should participate in the same cautious nearby same-`QLo` `0x04B1` helper-arrow rule already used for `PANELNS` and other local switch/controller shapes.
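The recovered `use` body is short enough to restate directly. In this hedged sketch, `PanelItem` and the returned tuple are hypothetical stand-ins for the engine's item and dispatch calls, and a clear map byte is assumed to mean a zero byte.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PanelItem:
    frame: int
    map_byte: int  # the panel's map byte; "clear" is modeled as zero
    qlo: int       # local QLo, the practical authored link id


def panelew_use(panel: PanelItem) -> Optional[tuple[str, int, int]]:
    """Hedged restatement of PANELEW::use; returns (trigger slot, lane, link QLo) or None."""
    if panel.frame == 0:
        return None  # frame 0: return immediately
    if panel.map_byte != 0:
        return None  # map byte set: nothing is dispatched
    return ("TRIGGER.slot_20", 0, panel.qlo)  # lane 0, from the panel item itself
```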
### `0x03C1`: `GENERATR`
- The old `generator` hunch is directionally right, but the extracted name is now explicit in both games: class `0x03C1` is `GENERATR`.
- The direct active lane is very small and decisive. `GENERATR::gotHit` does not contain a long custom destruction script; it simply excludes the source item and immediately spawns `TRIGGER.slot_20` lane `0` from that same item.
- Current safest read is therefore `destroyable generator/controller` rather than `free-standing scripted puzzle object`: destroying it is useful because it forwards the object's local trigger key into the standard trigger network.
- There is also a second, narrower set-piece lane in Remorse. Recovered `SATARG::use` explicitly scans nearby `shape=0x03C1` items during its countdown/shutdown sequence and drives them through `ITEM.slot_28` beside the related `0x03BF` bank, which fits authored generator-bank or power-node shutdown scenes rather than a different standalone class meaning.
- Practical viewer implication: `0x03C1` should be labeled `GENERATR`, should open `GENERATR::gotHit` from the USECODE action, and should expose the same cautious nearby same-`QLo` `0x04B1` helper arrows as other trigger-source objects because its recovered destruction lane feeds directly into `TRIGGER`.
## Actor-Key Family Follow-Up
- The latest actor-link follow-up did not justify exporting a stable `NPC_ONLY -> actor` or `CRUMORPH -> actor` overlay from static map/cache data alone.

@ -13,12 +13,14 @@ The implementation uses extracted `class_event_index.tsv` results plus existing
| `MONITNS` (`0x0102`) | `MONITNS::use` (`slot 0x01`) | Existing gameplay notes tie shape `258` / `0x0102` to a live monitor/computer-adjacent use handler, making it a strong non-editor first-view script target. |
| `MONITEW` (`0x0165`) | `MONITEW::use` (`slot 0x01`) | Disasm crosswalks shape `0x0165` to the east-west monitor variant, which keeps the same live computer-adjacent use handler family. |
| `PANELNS` (`0x00A1`) | `PANELNS::use` (`slot 0x01`) | Verified panel-switch wrapper for the same nearby trigger-helper chain. |
| `PANELEW` (`0x00A2`) | `PANELEW::use` (`slot 0x01`) | East-west panel-switch counterpart to `PANELNS`; nonzero frames with clear map state forward the panel's local `QLo` into `TRIGGER.slot_20` lane `0`. |
| `CRUMORPH` (`0x0318`) | `CRUMORPH::equip` (`slot 0x0A`) | Recovered control-transfer pad body scans nearby NPCs for a local-`QLo` control key match, temporarily hands control to the first live hit, and then dispatches `TRIGGER.slot_20` lane `0` or `1`. |
| `NPCTRIG` (`0x0363`) | `NPCTRIG::equip` (`slot 0x0A`) | Crosswalked shape/class match; the compact slot-`0x0A` body is still the strongest active-event frontier for this trigger family. |
| `CRUZTRIG` (`0x0365`) | `CRUZTRIG::gotHit` (`slot 0x06`) | Disasm crosswalks shape `0x0365` to CRUZTRIG, and `gotHit` is the recovered live body for this trigger/helper family. |
| `VMAIL` (`0x0367`) | `VMAIL::slot_0a` (`slot 0x0A`) | Disasm crosswalks shape `0x0367` to VMAIL; slot `0x0A` is the active helper body even though its final semantic label is still weaker than the slot number. |
| `CARD_NS` (`0x031D`) | `CARD_NS::use` (`slot 0x01`) | Thin wrapper into the downstream `SWITCH` / `TRIGGER` path. Regret also exposes `cast`, but `use` remains the stable first inspection point. |
| `SPANEL` (`0x03AA`) | `SPANEL::use` (`slot 0x01`) | Same local `QLo`-keyed switch/controller family as `PANELNS` and `CARD_NS`. |
| `GENERATR` (`0x03C1`) | `GENERATR::gotHit` (`slot 0x06`) | Destroyable generator/controller lane; the recovered body immediately excludes the source item and dispatches `TRIGGER.slot_20` lane `0`, making it the right first inspection point for power-node objects. |
| `FASTSKIL` (`0x0120`) | `FASTSKIL::enterFastArea` (`slot 0x0F`) | Difficulty-gated trigger router, including the verified `QLo`, `QLo + 1`, and `QLo + 2` remap lane. |
| `SKILLBOX` (`0x04E3`) | `SKILLBOX::equip` (`slot 0x0A`) | Corpus-backed skill-gated controller body; this is the active recovered lane, not `enterFastArea`. |
| `CHEST_NS` (`0x054F`) | `CHEST_NS::use` (`slot 0x01`) | The live chest-open handler runs the animation/audio path and the same general FREE-backed content-spawn flow as the east-west chest family. |

@ -61,9 +61,14 @@ Evidence used here:
- The `0006:43c3` lane now shows where the owner-row bit-`0x0040` probe is consumed locally: inside a subtype-`0x20c` dispatch-entry object rather than at a generic descriptor-choice site. That improves caller provenance for `0005:295f`, but it still does not prove which owner-loaded class family seeded the later VM data.
- The second direct caller family is now closed too. Old `0006:c5f0` lands at live call site `1128:0ff0` inside `Item_ReceiveHit`, where the non-NPC item path calls `Item_GetDamaged` with hitter sentinel `0x4000`, packed damage `(damagetype << 8) | damage_lo`, and a local flag-out byte that records the returned owner-row bit-`0x0040` capability before the local destruction / impact follow-up.
- The third direct caller family is now closed too. Old `0007:3584` lands at live call site `1138:1384` inside `SuperSprite_HitAndFinish`, where the non-NPC collision lane probes `Item_GetDamaged` with hitter sentinel `0x4000` and packed damage `(firetype << 8) | damage`; only when that flag-out byte stays clear and the target is not fixed does the lane fall through into the local `Item_ReceiveHit` knockback path.
- The next earlier compiled-side producer is now closed for one real gameplay family. `AreaSearch_CollideMove` at `10e0:123a` queues paired `0x236` storage processes in both its first-collision and linked-list collision lanes: `0x20b` is always created as the local `hit` notifier from moving item to collided item, and the reciprocal `0x20c` process is always created as the `got-hit` notifier from collided item back to the moving item.
- The local queue-helper cluster around that producer is now named in the live NE database too: `10f0:046d` = `storage_process_ref_list_create`, `10f0:0502` = `storage_process_ref_list_append`, and `10f0:06b5` = `storage_process_ref_list_destroy`. Their recovered contract is a counted far-pointer array drained later by `StorageDataProcess_RunAndTerminateProcs`, not a darker allocator or owner-resource helper.
- The producer surface above `AreaSearch_CollideMove` is now wider but still collision-local. Direct callers currently verified in the live session are `Item_LegalMoveToPoint`, `Item_LegalMoveToPointWithCollisionInfo` (`10a0:1841`), `GravityProcess_Run`, `AnimPrimitive_CheckToStartNewAnimation`, `AnimPrimitiveProcess_Run`, `SuperSprite_AdvanceFrame`, and `GravityProcess_FastAreaCleanup` (`1038:11fd`). No non-collision caller currently reaches `StorageDataProcess_Create` or `StorageDataProcess_RunAndTerminateProcs` directly.
- Two more local movement helpers are now named structurally in the live NE database: `10a0:1841` = `Item_LegalMoveToPointWithCollisionInfo`, a legal-move wrapper that preserves blocked/collision outputs around the same `AreaSearch_CollideMove` commit path, and `1138:0ee8` = `SuperSprite_SweepTestAdvance`, the supersprite-side sweep probe that stores the first collision before `SuperSprite_AdvanceFrame` commits the move.
- The surrounding movement and cleanup helper layer is now less anonymous too. `10e0:11c5` = `AreaSearch_SweepShapeBetweenPoints`, `10e0:15b4` = `AreaSearch_SweepItemToPointWithStepUp`, and `10e0:162f` = `AreaSearch_SweepShapeBetweenPointsWithStepUp` now cover the step-aware sweep path beneath the legal-move wrappers, while `10f0:03ff` = `StorageDataProcess_Release` and `10f0:0542` = `storage_process_ref_list_terminate_item_matches` close the local queue-release side. Adjacent seg090 helper `10a0:196f` is now `ItemCache_PushAndPopToDirectionalOffset`.
- `0005:2c35` remains outward-dark in the current NE session: instruction search still shows no recovered code or data xrefs, and its proven local role is still only `sign-extended additive word -> slot 0x0a / mask 0x0400 -> generic masked hub`.
- The live `CRUSADER.EXE` integration batch is now extended for this lane. Comment-backed anchors were already present at `1420:0dc5` (`Item_GetUsecodeClassId`), `1420:0e3a` (`Usecode_ItemCallEvent`), `10a0:2718` (`Item_Hit`), `10a0:275f` (`Item_GetDamaged`), `10f0:02d9` (`StorageDataProcess_Create`), and `10f0:0379` (`StorageDataProcess_Run`), with branch comments at `10f0:03c3` and `10f0:03e5` preserving the verified `0x20c` / `0x20b` split. Live comments now also anchor the remaining direct caller sites at `1128:0ff0` and `1138:1384`, and this pass adds live comments at `10e0:123a`, `10f0:046d`, `10f0:0502`, and `10f0:06b5` and promotes the queue helpers to stable names.
- Result of this pass: all currently recovered direct `0005:295f` caller families are now closed. The compiled-side selector evidence still bottoms out at subtype-gated dispatch or generic gameplay damage consumers plus owner-row capability bits, not a concrete `NPCTRIG` / `EVENT` class-family choice. The current `0x236` storage-process producer surface is also mapped far enough to support a negative result: the queue remains collision-local in the current database and is reached through movement, gravity, animation, and supersprite area-search paths rather than through a broader owner-family selector. The next defensible NE step is therefore either an earlier policy/dispatch layer deciding when those movement lanes call `AreaSearch_CollideMove`, or the first real non-collision producer if one exists elsewhere.
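The producer and queue contract described above can be sketched as a small Python model. It is illustrative only: the class and function names are hypothetical, the order in which the two notifiers are appended is an assumption, both members of a pair are assumed to carry the same damage word, and the mapping of `source_item` / `target_item` onto the recovered `itemno` / `otheritem` fields is deliberately left open.

```python
from dataclasses import dataclass
from typing import List, Optional

HIT_NOTIFIER = 0x20B      # "hit": moving item -> collided item
GOT_HIT_NOTIFIER = 0x20C  # "got-hit": collided item -> moving item


@dataclass
class StorageProcessModel:
    subtype: int
    source_item: int      # assumption: not yet mapped onto itemno/otheritem
    target_item: int
    damage: int
    slot_index: int = -1  # mirrors the slot index recorded in process field +0x3a
    terminated: bool = False


class StorageProcessRefList:
    """Hypothetical model of the counted process-reference array (create/append/terminate)."""

    def __init__(self) -> None:
        self.slots: List[Optional[StorageProcessModel]] = []

    def append(self, proc: StorageProcessModel) -> None:
        # Record where the process landed, like storage_process_ref_list_append.
        proc.slot_index = len(self.slots)
        self.slots.append(proc)

    def terminate_item_matches(self, item: int) -> int:
        """Clear and terminate every queued process naming `item` on either side."""
        cleared = 0
        for i, proc in enumerate(self.slots):
            if proc is not None and item in (proc.source_item, proc.target_item):
                proc.terminated = True
                self.slots[i] = None
                cleared += 1
        return cleared


def queue_collision_notifiers(queue: StorageProcessRefList, moving_item: int,
                              collided_item: int, damage: int) -> None:
    """Queue the paired 0x20b / 0x20c notifiers for one collision (order assumed)."""
    queue.append(StorageProcessModel(HIT_NOTIFIER, moving_item, collided_item, damage))
    queue.append(StorageProcessModel(GOT_HIT_NOTIFIER, collided_item, moving_item, damage))
```

In the terms of the live notes, the first-collision lane would pass the precomputed collision magnitude as `damage`, and the linked-list lane would pass `0`.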
## Priority 2: Rendering / Camera / Tile-Visibility / Watch-Controller Lane


@ -0,0 +1,207 @@
# Presentation Callback Broker Layout
## Purpose
This note isolates the current class-lift-relevant evidence for the callback/broker object rooted at `0x4588`.
The subsystem name is still intentionally conservative. The goal here is to freeze the object model, lifecycle, and vtable-slot surface that are already well evidenced, so later Ghidra class work does not have to rediscover them.
Current working family name:
- `PresentationCallbackBroker`
That name is still provisional, but it fits the current evidence better than a generic allocator or generic render-object label.
## Why This Looks Like A Real Object Family
Current evidence does not support treating `0x4588` as a stray callback function pointer.
What is already stable:
- one installed nullable FAR object pointer at `0x4588`
- explicit once-only install and teardown helpers
- live vtable slots `+0x04`, `+0x08`, and `+0x0c`
- state snapshot globals at `0x458c` and `0x4590`
- once guards at `0x4594` and `0x4595`
- a fallback/auxiliary buffer pointer at `0x45a6`
That is strong enough to treat this as a typed broker/helper object with global lifetime rather than as a raw callback cell.
## Core Lifecycle
### Install
Strong anchors:
- `000a:4913 runtime_callback_object_init_once`
- `000a:4a1f video_bios_state_snapshot`
Current verified behavior from the raw notes:
- checks one-time guard `0x4594`
- snapshots state through `video_bios_state_snapshot`
- stores previous/current state words in `0x458c` and `0x4590`
- installs the incoming FAR object pointer at `0x4588`
- ensures fallback buffer allocation at `0x45a6`
Current safest role:
- object installation is tied to video or presentation state capture, not to generic allocation alone
### Teardown
Strong anchor:
- `000a:4a56 runtime_callback_object_teardown_once`
Current verified behavior:
- checks once-only teardown guard `0x4595`
- clears `0x4588` when an object is present
- emits vtable `+0x0c` callback only when `0x4590 != 0x458c`
- then calls vtable slot `+0x04`
- follows with cleanup through `FUN_0009_0d30()`
Current safest role:
- the broker is not only a notifier; it also owns a release/finalize path and a final conditional dispatch when presentation state changed
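The install/teardown contract above can be modeled as a small state machine. This is a Python sketch under stated assumptions, not recovered code: the snapshot callable stands in for `video_bios_state_snapshot` and is assumed to return the (previous, current) pair, the arguments passed to slot `+0x0c` at teardown are assumed to be the two recorded words, the auxiliary buffer size is unknown, and the cleanup callable stands in for `FUN_0009_0d30()`. Method names on the broker are the provisional role labels from this note.

```python
from typing import Callable, List, Optional, Tuple


class RecordingBroker:
    """Stand-in broker; method names mirror the provisional slot roles."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def release(self) -> None:  # slot +0x04
        self.calls.append(("release",))

    def phase_finalize(self, phase: int) -> None:  # slot +0x08
        self.calls.append(("phase_finalize", phase))

    def emit_state_pair(self, word_a: int, word_b: int) -> None:  # slot +0x0c
        self.calls.append(("emit_state_pair", word_a, word_b))


class BrokerGlobalsModel:
    """Models the 0x4588 cluster and its once-only install/teardown helpers."""

    def __init__(self, snapshot: Callable[[], Tuple[int, int]],
                 cleanup: Callable[[], None]) -> None:
        self.broker: Optional[RecordingBroker] = None  # 0x4588
        self.state_a = 0                 # 0x458c, recorded state word A
        self.state_b = 0                 # 0x4590, recorded state word B
        self.install_guard = False       # 0x4594
        self.teardown_guard = False      # 0x4595
        self.aux_buffer: Optional[bytearray] = None  # 0x45a6
        self._snapshot = snapshot
        self._cleanup = cleanup

    def init_once(self, broker: RecordingBroker) -> None:
        if self.install_guard:
            return
        self.install_guard = True
        self.state_a, self.state_b = self._snapshot()
        self.broker = broker
        if self.aux_buffer is None:
            self.aux_buffer = bytearray()  # placeholder: real size not recovered

    def teardown_once(self) -> None:
        if self.teardown_guard:
            return
        self.teardown_guard = True
        broker, self.broker = self.broker, None  # clear 0x4588 first
        if broker is not None:
            if self.state_b != self.state_a:
                # Assumption: the pair passed here is the two recorded words.
                broker.emit_state_pair(self.state_a, self.state_b)
            broker.release()
        self._cleanup()  # stands in for FUN_0009_0d30(); assumed unconditional
```

The point of the model is the ordering: pointer cleared, conditional `+0x0c`, then `+0x04`, then cleanup, with both guards making repeated calls no-ops.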
### Finalize sweep phase
Strong anchor:
- `0009:b1c3 allocator_phase_finalize_pass`
Verified behavior:
- accepts finalize phase `0` or `1`
- forwards that phase twice through broker vtable slot `+0x08`
- then sweeps allocator heads through `allocator_head_finalize_sweep`
Current safest role:
- slot `+0x08` looks like a finalize or phase-advance callback surface shared with allocator/presentation cleanup, not a normal per-entity method
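The finalize pass reduces to a short dispatch sequence. The sketch below assumes that out-of-range phases are rejected (the notes only say phases `0` and `1` are accepted), that a null broker is skipped, and that `allocator_head_finalize_sweep` is applied to one head at a time; the parameter names are hypothetical.

```python
from typing import Any, Callable, Iterable, Optional


def allocator_phase_finalize_pass(phase: int, broker: Optional[Any],
                                  allocator_heads: Iterable[Any],
                                  head_finalize_sweep: Callable[[Any], None]) -> bool:
    """Model of 0009:b1c3: forward the phase twice through slot +0x08, then sweep heads."""
    if phase not in (0, 1):
        return False  # assumption: real handling of other phases not recovered
    if broker is not None:  # assumption: null guard, as in the teardown path
        for _ in range(2):
            broker.phase_finalize(phase)  # slot +0x08
    for head in allocator_heads:
        head_finalize_sweep(head)
    return True
```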
## Vtable Surface
### Slot `+0x04`
Evidence:
- teardown path calls it as the broker release path
Current working meaning:
- `release` or `shutdown`
### Slot `+0x08`
Evidence:
- `allocator_phase_finalize_pass` forwards phase `0` or `1` twice through this slot
Current working meaning:
- `phase_finalize` or `advance_finalize_phase`
### Slot `+0x0c`
Evidence:
- teardown emits it conditionally when recorded state changed
- `entity_cleanup_resources_and_dispatch` uses it at `000d:9d5e` and `000d:a3b7`
- raw notes also document callers `000a:b9e5` and `000a:ba66`
- one caller uses literal mode-like pair `0x0101`
Current working meaning:
- `emit_state_pair` or `dispatch_state_change`
Current caution:
- payload semantics are still not closed well enough to call this a specific display-mode or palette-event method with confidence
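Under the assumption that each vtable entry is a 4-byte far code pointer stored offset-first (the 4-byte spacing of the verified slots is consistent with that, but it is not proven here), the provisional vtable can be sketched as a packed ctypes structure. Slot `+0x00` is kept as an unnamed placeholder, the table is truncated at the last verified slot, and the Python names are hypothetical role labels, not recovered symbols.

```python
import ctypes
from typing import Tuple


class PresentationCallbackBrokerVtbl(ctypes.LittleEndianStructure):
    """Provisional vtable: one 4-byte far code pointer per slot (assumed)."""
    _pack_ = 1
    _fields_ = [
        ("slot_00_unknown", ctypes.c_uint32),  # +0x00, no evidence yet
        ("release", ctypes.c_uint32),          # +0x04, teardown release path
        ("phase_finalize", ctypes.c_uint32),   # +0x08, finalize phase 0/1
        ("emit_state_pair", ctypes.c_uint32),  # +0x0c, conditional state pair
    ]


def split_far_pointer(raw: int) -> Tuple[int, int]:
    """Split a stored 16:16 far pointer (offset in the low word) into (segment, offset)."""
    return (raw >> 16) & 0xFFFF, raw & 0xFFFF


def slot_index(offset: int, entry_size: int = 4) -> int:
    """Convert a byte offset such as +0x0c into a slot index."""
    if offset % entry_size:
        raise ValueError(f"offset 0x{offset:x} is not slot-aligned")
    return offset // entry_size
```

Keeping the slot names role-based (`release`, `phase_finalize`, `emit_state_pair`) matches the conservative naming above; in a protected-mode build the high word is a selector rather than a real-mode segment.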
## Global State Cluster
| Address | Current role |
|---------|--------------|
| `0x4588` | nullable FAR broker object pointer |
| `0x458c` | recorded state word A |
| `0x4590` | recorded state word B |
| `0x4594` | install-once guard |
| `0x4595` | teardown-once guard |
| `0x45a6` | fallback or auxiliary FAR buffer |
Current safest class-lift read:
- this global cluster behaves like one installed process-wide broker instance plus its remembered state and support buffer
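The address table leaves some bytes unaccounted for. A small gap finder makes that explicit, assuming the two state words are 16-bit, the guards are single bytes (forced by their adjacent addresses), and the two FAR pointers are 4 bytes. If the state words turn out to be 32-bit, the first two gaps disappear, which is exactly the kind of ambiguity the class lift should preserve as placeholders rather than absorb.

```python
from typing import List, Tuple

# (address, assumed size in bytes, role) for the verified broker globals.
BROKER_GLOBALS: List[Tuple[int, int, str]] = [
    (0x4588, 4, "broker_farptr"),
    (0x458C, 2, "state_word_a"),   # assumption: 16-bit
    (0x4590, 2, "state_word_b"),   # assumption: 16-bit
    (0x4594, 1, "install_once_guard"),
    (0x4595, 1, "teardown_once_guard"),
    (0x45A6, 4, "aux_buffer_farptr"),
]


def find_gaps(entries: List[Tuple[int, int, str]]) -> List[Tuple[int, int]]:
    """Return (start, length) for every unclassified byte run between known entries."""
    ordered = sorted(entries)
    gaps = []
    for (addr, size, role), (next_addr, _, next_role) in zip(ordered, ordered[1:]):
        end = addr + size
        if end > next_addr:
            raise ValueError(f"{role} overlaps {next_role}")
        if end < next_addr:
            gaps.append((end, next_addr - end))
    return gaps
```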
## Caller-Side Payload Evidence
The payload side is important even though exact semantics are still open.
Verified pairs already documented:
- `entity_cleanup_resources_and_dispatch` call at `000d:9d5e` uses object fields `+0x12d/+0x12f`
- matching call at `000d:a3b7` uses object fields `+0x74f/+0x751`
- one live caller uses literal `0x0101`
What this supports now:
- slot `+0x0c` is consuming compact two-word state/payload pairs
- those pairs are presentation-adjacent enough to align with the earlier video-state snapshot evidence
What it does not yet support:
- one final semantic label for those two words across every callsite
## Relationship To Other Families
### Presentation and startup/display lane
The broker note belongs next to the startup/display and runtime-state family notes because its callers overlap with that cleanup/palette/presentation lane.
Strong surrounding anchors:
- `entity_cleanup_resources_and_dispatch`
- palette/state handoff work in `FUN_000d_938c`
- active dispatch-entry hold token at `g_active_dispatch_entry_farptr[+0x40]`
Current safest read:
- the broker is presentation-side infrastructure shared by cleanup/finalize paths, not a child class of `EntityDispatchEntry`
### Allocator lane
The allocator interacts with this broker through finalize phases, but current evidence still reads as `allocator client callback` rather than `allocator-owned object`.
That means:
- keep allocator classes and this broker separate in future class modeling
## Class-Lift Guidance
If this family is promoted in Ghidra later, the safest current sequence is:
1. create a conservative owner like `PresentationCallbackBroker`
2. model the global state cluster as supporting globals, not as instance fields until constructor ownership is tighter
3. create a provisional vtable with slots `+0x04`, `+0x08`, and `+0x0c`
4. leave slot names conservative and role-based
5. do not promote this family as the first MCP/class-lift pilot
Why not first:
- lifecycle is strong, but subsystem semantics are still weaker than `EntityDispatchEntry`, `SpriteNode`, or `EntityVmOwnerResource`
## Best Next Evidence To Collect Later
1. close more callers of vtable slot `+0x0c`
2. classify the exact payload meaning of the two-word pairs
3. decide whether `0x45a6` is an owned buffer, fallback object, or adapter scratch lane
4. tighten the constructor/installation provenance in the live NE program, not only the raw notes
## Bottom Line
The `0x4588` family is now documented well enough to be treated as a real object candidate.
The safest current model is: one global presentation-state callback broker with a small live vtable surface, explicit install/teardown, conditional state-pair emission, and allocator-linked finalize phases.


@ -518,6 +518,18 @@ The next gameplay-side wrapper pass now extends well past the three earlier seed
- Taken together, the new seg004 and seg006 callers strengthen the existing read of the still-dark wrappers `0005:2c35` (`0x0400:0x000a`) and `0005:2c68` (`0x0800:0x000b`). Those wrappers still have no direct caller evidence, but they now sit inside a larger verified subfamily of `extra-word masked materializers` whose known members feed state selectors, class-linked values, or other gameplay-side payload resolution instead of acting as the real upstream selector into `entity_vm_opcode_sequence_run`.
- MCP-native function xrefs now reinforce that stopping point rather than changing it: `entity_vm_context_try_create_masked_for_entity` reports the expected direct callers through `0004:f047`, `0004:f076`, the named `0005` wrapper island, and the two seg006 callsites `0006:0bbc` / `0006:10e7`, while `entity_vm_opcode_sequence_run` plus the dark `0x0400/0x000a` and `0x0800/0x000b` wrappers still surface no direct function-xref callers in the current database. The best next path therefore remains caller-frame recovery or nearby unnamed-function repair, not another generic masked-hub sweep.
#### Latest verified NE pass: collision producer and local storage-process queue
- The next earlier compiled-side producer for the already-named `StorageDataProcess_Create` / `StorageDataProcess_Run` pair is now closed in the live `CRUSADER.EXE` session. `AreaSearch_CollideMove` at `10e0:123a` allocates a local queue, then emits paired `0x236` processes in both the first-collision lane and the linked-list collision lane.
- The subtype assignment is now explicit at the caller, not just inferred from `StorageDataProcess_Run`: `0x20b` is the local `hit` notifier from the moving item to the collided item, and the reciprocal `0x20c` process is the `got-hit` notifier from the collided item back to the moving item. The first-collision lane uses the precomputed collision magnitude `local_4` as the damage word; the later linked-list lane uses `0`.
- The same pass also closes the local queue-helper trio in seg031. `10f0:046d` is now `storage_process_ref_list_create`, allocating the small queue header plus a counted far-pointer array; `10f0:0502` is now `storage_process_ref_list_append`, storing one `StorageDataProcess` far pointer and recording the assigned slot index in process field `+0x3a`; and `10f0:06b5` is now `storage_process_ref_list_destroy`, freeing the array and optionally the header object.
- The same live pass also widens the producer surface around that queue without breaking the earlier read. Direct callers into `AreaSearch_CollideMove` are now confirmed as movement/collision heavy: `Item_LegalMoveToPoint`, `Item_LegalMoveToPointWithCollisionInfo`, `GravityProcess_Run`, `AnimPrimitive_CheckToStartNewAnimation`, `AnimPrimitiveProcess_Run`, `SuperSprite_AdvanceFrame`, and `GravityProcess_FastAreaCleanup` (`1038:11fd`).
- Two more structural names now anchor that caller set in the live NE database. `10a0:1841` is `Item_LegalMoveToPointWithCollisionInfo`, the legal-move wrapper variant that preserves blocked/collision outputs around the same area-search commit path, and `1138:0ee8` is `SuperSprite_SweepTestAdvance`, the supersprite-side sweep probe that stores the first collision before `SuperSprite_AdvanceFrame` commits movement.
- The same movement lane is now tighter at the helper level too. `10e0:11c5` is now `AreaSearch_SweepShapeBetweenPoints`, the thin wrapper that seeds the search struct and forwards one shape/path sweep into `AreaSearch_SweepTestPt`; `10e0:15b4` is `AreaSearch_SweepItemToPointWithStepUp`, the item-based bridge from current item position and shape into that sweep path; and `10e0:162f` is `AreaSearch_SweepShapeBetweenPointsWithStepUp`, the step-aware wrapper that retries same-z sweeps with vertical offsets and optional `+8` / `+9` step-up probes before returning the resolved point in `srch->pt`.
- The seg031 queue now has its release-side cleanup pair named as well. `10f0:03ff` is `StorageDataProcess_Release`, a release path that terminates queued peer processes referencing the same item before unlinking both MList hooks, and `10f0:0542` is `storage_process_ref_list_terminate_item_matches`, the counted-array helper that clears matching queue slots and forces termination for processes whose `itemno` or `otheritem` matches the requested item.
- One adjacent seg090 helper is now anchored structurally too: `10a0:196f` is `ItemCache_PushAndPopToDirectionalOffset`, which pushes the current item into the cache and repositions the cache pop target to the current point plus one direction-offset lookup from the local `0x0ffe` / `0x100e` tables.
- This moves the VM/caller frontier one step earlier without overclaiming the selector. The closed producer family is still a gameplay collision queue, not an owner-loaded class-family chooser, and no direct non-collision caller currently reaches `StorageDataProcess_Create` or `StorageDataProcess_RunAndTerminateProcs`. The remaining gap is therefore the earlier policy layer that decides when those movement lanes call `AreaSearch_CollideMove`, or the first non-collision producer if one exists elsewhere.
| `000c:f844` | `entity_vm_context_setup` | Calls `entity_vm_stack_init_with_data`, then sets `+0xd6..+0xe3` with position/dimension/state params |
| `000c:f600` | `entity_vm_pair_stack_push` | Push (word_a, word_b) onto 31-entry array at `[ptr+0x80]` (count); error if full |
| `000c:f63c` | `entity_vm_pair_stack_pop` | Pop and return word from pair stack; error if empty |


@ -0,0 +1,115 @@
# Remorse Class Candidate Inventory
## Purpose
This note is the working inventory for the first C++-oriented class lift in Remorse.
It is intentionally narrower than a full source plan. The goal here is to identify the object families that already have enough verified evidence to justify explicit class modeling in Ghidra later, even before the MCP layer grows class-authoring endpoints.
Until MCP can create namespaces, move methods, and build typed vtable/struct layouts directly, this note should act as the canonical queue for manual class-upgrade work.
## Selection Rules
A family belongs here only if at least some of the following are already true in the current notes:
- a constructor-style allocator/init path exists
- a destructor, teardown, or release path exists
- one or more stable vtable roots or slot dispatches are known
- instance fields have repeatable meanings across multiple methods
- the family has enough caller context to separate instance methods from generic helpers
Families that are only `callback-shaped` or `object-like` but still lack a safe subsystem label can stay here at lower confidence. They are still useful because they tell us where later class tooling must preserve ambiguity.
## Confidence Scale
- High: enough evidence to start class namespaces, instance structs, and method ownership now
- Medium: class-like enough to inventory and maybe type, but higher-level naming or ownership is still partially open
- Low: strong object mechanics, but subsystem naming or field ownership is still too ambiguous for broad lifting
## Inventory
| Candidate family | Confidence | Current best class-level read | Core evidence | Ctor / create evidence | Dtor / release evidence | Vtable evidence | Size / layout evidence | Immediate lift value |
|---|---|---|---|---|---|---|---|---|
| `EntityDispatchEntryBase` | High | Base dispatch-entry object used by scheduler/event and transition/runtime-state families | `entity_dispatch_entry_init`, repeated field writes, list ownership, flag/state helpers | `0008:ba00 entity_dispatch_entry_init` alloc/init path for `0x32` bytes; variant allocators in `0004:ea00`, `0008:cefb`, `0008:d214` | `0008:dbec entity_word_list_destroy`; `000d:8078 entity_dispatch_entry_release_runtime_state` for runtime-state flavor | base/root vtables `0x3b06`, `0x2d10`, `0x3afe`; derived vtables `0x3ad2`, `0x3aa6`; slot dispatch at `+0x14`, `+0x28` | stable fields at `+0x02`, `+0x04`, `+0x06`, `+0x08`, `+0x0a..+0x18`, `+0x32..+0x3e`, plus runtime-state tail | Strongest pilot family for the first full class-model pass |
| `EntityDispatchEntryRuntimeState` | High | Runtime-state / palette-backed derived dispatch entry with owned buffers and hold-state propagation | runtime-state init/release pair, palette-entry emission families, owner-byte propagation through `g_active_dispatch_entry_farptr` | `000d:7e00 entity_dispatch_entry_init_runtime_state`; several `dispatch_entry_create_*_palette_state*` helpers build this family | `000d:8078 entity_dispatch_entry_release_runtime_state` frees owned buffers and updates hold-state propagation | inherits dispatch-entry behavior; runtime-state users dispatch through vtable slot `+0x08` after waits | owned fields `+0x41/+0x42/+0x44`, paired buffers at `+0x46/+0x48` and `+0x4a/+0x4c` | Excellent second step after the base dispatch-entry type |
| `SpriteNode` | High | Tree-based UI/render node with child links, dirty-state propagation, event dispatch, and destructor ownership | recursive tree helpers, event switch dispatch, dirty/update family, destructor | constructor not yet explicitly named in current notes, but object-init helper and repeated sprite-node ownership patterns are strong | `000b:326e sprite_node_destroy` releases child nodes and frees self | event dispatch through vtable slots `+0x14/+0x18/+0x20/+0x24`; many default vtable stubs identified in seg091 | repeated fields at `+0x19/+0x1b`, `+0x21`, `+0x23`, `+0x29`; global focus pointer interaction at `0x4fd0:0x4fd2` | Strong candidate because virtual surface is already visible and bounded |
| `Entity` | High | Base gameplay entity / actor-like object with multiple derived vtables for generic, shot, corpse, and debris behaviors | stable NE entity layout, repeated lifecycle helpers, projectile/debris subfamilies | `0007:3f2f entity_spawn`; `0007:435e shot_entity_alloc`; `0007:7490 debris_spawn`; `0007:2c92 dialog_spawn` is adjacent but likely separate family | `0007:40d4 entity_remove`; `0007:5092 entity_deactivate`; `0007:44a9 shot_entity_free` for projectile flavor | vtables `0x29aa`, `0x297e`, `0x2a1a`, `0x2a33`, `0x2a57`, plus registry vtable `0x2969`; multiple virtual-slot helper wrappers in raw notes | strong instance layout from `+0x00` through `+0xbd` in `ne-segment1.md` | Central long-term class family, but should probably be split into base + derived subfamilies |
| `DialogMenuObject` | Medium | Small UI/dialog object family sharing one vtable and event-notify wrappers | `dialog_spawn`, menu/cursor event notify wrappers, UI update callback | `0007:2c92 dialog_spawn` allocates object and stamps vtable `0x28b5` | no standalone destructor named yet in current notes | vtable `0x28b5`; `cursor_event_notify_*`, `menu_event_notify_*`, and `ui_update_callback` all behave like method wrappers | enough to establish family ownership, but layout is still thin | Good compact pilot if a smaller UI-oriented family is preferred |
| `WatchEntityController` | Medium | Global controller/watch/camera object with explicit virtual dispatch and startup/display involvement | global object at `0x2bd8`, dispatch wrapper, create-global path, startup/display callsites | `0007:ba00 watch_entity_controller_create_global` delegates to `0007:ba45 watch_entity_controller_create`, stamps type `0x2c2b`, stores global object | no direct destructor identified in current notes | repeated dispatch through vtable slots `+0x24`, `+0x2c`, and `+0x30` | global-object ownership clearer than field layout; seed row at `0x2be4` into callback table | Worth inventorying now because it will benefit immediately from namespace/method grouping |
| `EntityVmRuntime` | High | Main VM runtime object that owns owner-resource helper, cached slot/value state, and category-base setup | creation/load path is structurally stable and repeatedly cross-checked against extracted usecode evidence | `000d:44df entity_vm_runtime_init_from_path_if_configured`; `000d:4c99 entity_vm_runtime_create` | destroy path not fully named in the snippets here, but owner-resource destroy is known and runtime state/save-load consumers are well constrained | not a classic gameplay vtable family in the current notes, but method-style ownership and object fields are stable | object size and field zones strongly implied by `+0x10c/+0x10e`, `+0x117/+0x119`, `+0x1315/+0x1317` and related runtime state | Major lift target because VM readability is a blocker for recompilable source |
| `EntityVmOwnerResource` | High | File-backed helper owned by VM runtime that indexes source tables and materializes owner rows | helper object shape and per-entry loader contract are already tight | `000d:7000 entity_vm_runtime_owner_resource_create` allocates helper/object tables | paired destroy helper `000d:70fd entity_vm_runtime_owner_resource_destroy` is documented in related notes | helper method-table uses slots `+0x04` size-query and `+0x0c` materialization callback | helper-owned count `+0x14`, far-pointer table `+0x10`, paired word table `+0x18`, owner rows stride `0x0d` | One of the cleanest non-gameplay object families for typed struct work |
| `EntityVmContext` | Medium | Per-slot/per-entity VM context object built from runtime and owner-resource data | create/setup/load helpers already have clear ownership, but broader dispatch semantics are still active work | `000d:46ec entity_vm_context_create_from_slot_index` and related masked-create wrappers | no single destroy method is highlighted in the current note set used here | context-side dispatch and busy-state updates through virtual or callback-like method surface at least on the context object | stable fields include `+0x32/+0x34`, `+0xd6/+0xd8`, `+0x102`, `+0x10c/+0x10e`, `+0x11b/+0x11d`, `+0x123` | Important for VM readability, but should follow runtime and owner-resource typing |
| `CacheBackendObject` | Medium | Small backend/cache loader object with DOS file-handle state and method table | constructor and callback roles are already explicit | `0009:5600 cache_backend_object_init` allocates `0x20` bytes and seeds method-table state | no explicit destructor named in current note slice | backend callback roles at `+0x34` and `+0x0c` are verified in cache lookup/load path | concrete `0x20`-byte size; fields at `+0x08`, `+0x0c`, `+0x10`, `+0x14`, `+0x16`, `+0x18`, `+0x1c` | Good contained family for early datatype work |
| `PresentationCallbackBroker` | Low | Video/presentation-state callback broker rooted at `0x4588` | init/teardown/callback slot evidence is real, but subsystem naming remains intentionally conservative | `runtime_callback_object_init_once` family is documented, but not all constructor details are fully promoted here | `runtime_callback_object_teardown_once` and finalize path are explicit | vtable slots `+0x04`, `+0x08`, `+0x0c` all have live evidence | global state at `0x4588/0x458c/0x4590/0x4594/0x4595/0x45a6`; payload fields from caller objects at `+0x12d/+0x12f`, `+0x74f/+0x751` | Useful as a typed broker object later, but not a good first namespace/class pilot |
| `UsecodeDebuggerBreakState` | Medium | Dormant debugger-state object retained in retail binary | clear constructor and method expectations, but retail instantiation path is missing | `1408:0000 usecode_debugger_break_state_create` allocates and initializes the object | no direct destroy path highlighted in current notes | breakpoint gate callbacks through object vtable during `1408:0053` flow | internal tables and state exist, but field map is not yet summarized into one layout note | Good archival class candidate even if not a current gameplay priority |
## Recommended Modeling Order
If the goal is to make later class-authoring work fast and low-risk, the best order is:
1. `EntityDispatchEntryBase`
2. `EntityDispatchEntryRuntimeState`
3. `SpriteNode`
4. `EntityVmOwnerResource`
5. `CacheBackendObject`
6. `WatchEntityController`
7. `Entity`
8. `EntityVmRuntime`
9. `EntityVmContext`
10. `PresentationCallbackBroker`
This order prioritizes bounded families with visible constructors, derived variants, or explicit method tables before the larger gameplay and VM surfaces.
## First Pilot Candidates
### Best pilot: `EntityDispatchEntryBase`
Why it is the safest first full class lift:
- constructor variants are already named
- derived vtable variants are known
- multiple field groups have stable semantics
- destroy/release behavior is present
- it directly exercises the exact MCP features later needed: namespace creation, struct typing, method ownership, and vtable slot labeling
### Smallest bounded pilot: `CacheBackendObject`
Why it is attractive:
- explicit `0x20` size
- explicit create path
- explicit method-table callback roles
- smaller ambiguity surface than gameplay entities or the VM runtime
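As a concrete illustration of how bounded this family is, the verified offsets can be laid out as a packed `0x20`-byte ctypes structure. Field widths here are inferred only from the spacing between the verified offsets (for example, `+0x14` and `+0x16` are one word apart), the `field_XX` names are placeholders, and bytes `+0x00..+0x07` stay unnamed because the notes do not assign them. The `+0x34` callback offset lies past the end of a `0x20`-byte instance, so at least that callback role must index a separate method table; the sketch does not decide whether the `+0x0c` callback role refers to the instance field at the same offset.

```python
import ctypes


class CacheBackendObject(ctypes.LittleEndianStructure):
    """0x20-byte instance; widths inferred from offset spacing, names are placeholders."""
    _pack_ = 1
    _fields_ = [
        ("unk_00", ctypes.c_uint8 * 8),  # +0x00..+0x07: not assigned in the notes
        ("field_08", ctypes.c_uint32),
        ("field_0c", ctypes.c_uint32),
        ("field_10", ctypes.c_uint32),
        ("field_14", ctypes.c_uint16),
        ("field_16", ctypes.c_uint16),
        ("field_18", ctypes.c_uint32),
        ("field_1c", ctypes.c_uint32),
    ]


# Must match the 0x20-byte allocation made by cache_backend_object_init.
assert ctypes.sizeof(CacheBackendObject) == 0x20
```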
### Best UI-oriented pilot: `SpriteNode`
Why it is valuable:
- clear virtual event surface
- tree ownership and destructor logic already exist
- likely to benefit quickly from readable class/derived-method naming
## Per-Family Documentation Tasks
For each inventory entry, later class-upgrade work should produce the same minimum artifacts:
1. candidate namespace/class name
2. constructor and destructor list
3. instance-size estimate or known size
4. vtable root(s) and known slots
5. field map grouped by confidence
6. caller ownership notes
7. explicit `keep as free function` list for helpers that should not become methods
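One way to keep those seven artifacts uniform across families is a small record type with a completeness check. This is a hypothetical tracking aid, not existing project tooling; the field names simply mirror the numbered list, and "missing" only means the entry is still empty, not that the evidence is absent from the notes.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class FamilyLiftRecord:
    class_name: str                                                # 1. namespace/class name
    constructors: List[str] = field(default_factory=list)          # 2. ctor list
    destructors: List[str] = field(default_factory=list)           # 2. dtor list
    instance_size: Optional[int] = None                            # 3. None while unknown
    vtable_roots: List[int] = field(default_factory=list)          # 4. vtable roots
    known_slots: Dict[int, str] = field(default_factory=dict)      # 4. slot offset -> role
    fields_by_confidence: Dict[str, Dict[int, str]] = field(default_factory=dict)  # 5.
    caller_ownership_notes: List[str] = field(default_factory=list)                # 6.
    keep_as_free_functions: List[str] = field(default_factory=list)                # 7.

    def missing_artifacts(self) -> List[str]:
        """Names of checklist entries that are still empty."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or (isinstance(value, (list, dict, str)) and not value):
                missing.append(f.name)
        return missing
```

For example, a `CacheBackendObject` record seeded with `0009:5600 cache_backend_object_init` and size `0x20` would still report `destructors` as missing, matching the inventory row.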
## Immediate Follow-Up Notes Worth Writing Later
- dedicated `EntityDispatchEntry` class-layout note with base/derived split
- dedicated `SpriteNode` virtual-slot table note
- dedicated `EntityVmRuntime` / `EntityVmOwnerResource` layout note
- dedicated `Entity` family split note covering base entity, projectile, debris, and corpse variants
## Current Rule Until MCP Catches Up
Do not rely on automatic class listings from the live MCP tools as the class-recovery source of truth. The current output is still noisy and does not reflect the actual game object families well enough for disciplined C++ lifting.
Use this inventory plus the linked evidence notes as the authoritative queue for future class-authoring work.


@ -0,0 +1,94 @@
# Remorse Class-Lift Work Index
## Purpose
This note is the easy-to-find landing page for the current Remorse C++ and class-lifting preparation work.
Use it as the starting point when the project returns to:
- class and namespace authoring inside Ghidra
- vtable and instance-layout promotion
- hand-maintained C++ skeleton emission
- ABI-safe source reconstruction planning
This index does not replace the detailed notes. It groups them into one work order so later implementation can resume quickly.
## Read This First
1. [docs/remorse-cpp-decompilation-plan.md](docs/remorse-cpp-decompilation-plan.md)
2. [docs/remorse-class-candidate-inventory.md](docs/remorse-class-candidate-inventory.md)
3. [docs/remorse-rebuild-abi-notes.md](docs/remorse-rebuild-abi-notes.md)
4. [docs/ghidra-mcp-class-lifting-endpoint-spec.md](docs/ghidra-mcp-class-lifting-endpoint-spec.md)
That set gives the high-level target, the current candidate families, the rebuild constraints, and the future MCP authoring surface.
## Current Note Groups
### 1. Overall Direction
- [docs/remorse-cpp-decompilation-plan.md](docs/remorse-cpp-decompilation-plan.md): staged route from decompiler-style C toward evidence-backed C++.
- [docs/remorse-rebuild-abi-notes.md](docs/remorse-rebuild-abi-notes.md): Track A versus Track B guardrails, segmented-pointer concerns, packing, and calling-convention constraints.
- [docs/remorse-toolchain-fingerprint-evidence.md](docs/remorse-toolchain-fingerprint-evidence.md): focused evidence note for the bound NE/Phar Lap/High-C-related toolchain story that underlies the ABI constraints.
- [docs/remorse-cpp-compatibility-header-draft.md](docs/remorse-cpp-compatibility-header-draft.md): first draft of the compatibility/support layer that future C++ skeletons should target.
### 2. Candidate Inventory And Tooling Surface
- [docs/remorse-class-candidate-inventory.md](docs/remorse-class-candidate-inventory.md): strongest current class candidates and modeling order.
- [docs/ghidra-mcp-class-lifting-endpoint-spec.md](docs/ghidra-mcp-class-lifting-endpoint-spec.md): missing class/vtable/datatype authoring operations for the local MCP fork.
### 3. Family-Specific Layout Notes
- [docs/entity-dispatch-entry-class-layout.md](docs/entity-dispatch-entry-class-layout.md): current `EntityDispatchEntry` base-versus-derived model, release surface, and subtype overlays.
- [docs/sprite-node-class-layout.md](docs/sprite-node-class-layout.md): `SpriteNode` destructor/event surface and candidate virtual-slot map.
- [docs/entity-class-family-split.md](docs/entity-class-family-split.md): conservative split of the large `Entity` lane into base, projectile, debris, corpse/actor, and adjacent non-entity families.
- [docs/entity-vm-runtime-owner-resource-layout.md](docs/entity-vm-runtime-owner-resource-layout.md): current runtime/helper/context ownership model for the VM lane.
- [docs/presentation-callback-broker-layout.md](docs/presentation-callback-broker-layout.md): current object/lifecycle/vtable evidence for the `0x4588` presentation-state callback broker family.
### 4. Execution Checklists
- [docs/remorse-first-class-authoring-checklist.md](docs/remorse-first-class-authoring-checklist.md): concrete first-batch checklist for the initial Ghidra/MCP class-authoring pass, with pilot-family guidance and source-emission gates.
## Recommended Work Order
### Stage 1: Keep The Evidence Model Honest
1. Re-read the plan, ABI note, and candidate inventory.
2. Pick one family with bounded ambiguity.
3. Confirm ctor, dtor, vtable root, and stable field groups before any class ownership changes in Ghidra.
Best current pilot families:
1. `EntityDispatchEntry`
2. `SpriteNode`
3. `EntityVmOwnerResource`
`Entity` remains a top-priority family, but it should be split deliberately rather than promoted as one giant class too early.
### Stage 2: Ghidra Authoring Pass
1. Create class or namespace owners.
2. Move only strongly owned methods first.
3. Create provisional instance structs and vtable structs.
4. Preserve slot order and unresolved fields instead of trying to beautify them.
The future MCP endpoint sequence should follow the spec note rather than ad hoc scripting.
### Stage 3: First C++ Skeleton Slice
1. Emit one header/source pair for the pilot family.
2. Build it against the compatibility layer rather than raw host C++ alone.
3. Keep unresolved offsets as named placeholders instead of collapsing them into speculative semantics.
4. Record which parts are Track A safe versus Track B convenience-only.
## Immediate Follow-Ups Still Open
1. Convert the first family note into a hand-maintained C++ skeleton once the compatibility header is accepted.
2. Implement the MCP class/vtable authoring endpoints only after the workflow and note set above are stable enough to drive them.
3. Add one more dedicated note for the callback/object lane around `0x4588` only if later caller evidence supports a stronger subsystem name than `PresentationCallbackBroker`.
4. Turn the first-class-authoring checklist into a completed execution log once the first real MCP batch lands.
## Bottom Line
The current prep work is now large enough that it should be treated as one coordinated lane rather than scattered notes.
Use this file as the resume point for future class-lift and C++ reconstruction work.

@ -0,0 +1,159 @@
# Remorse C++ Compatibility Header Draft
## Purpose
This note defines the minimum compatibility/support layer that future hand-maintained Remorse C++ skeletons should target.
It is not a final header and it is not a compiler lock-in decision. It is a draft contract so early class-emission work does not silently drift into host-only modern C++ assumptions.
This note should stay paired with [docs/remorse-rebuild-abi-notes.md](docs/remorse-rebuild-abi-notes.md).
## Why This Layer Is Needed
Current rebuild notes already make three constraints clear:
- Track A original-style executable reconstruction remains in scope
- the executable is shaped by segmented pointers and far-call conventions
- host compilation success is not enough if layout, slot order, or call shape drift
That means the first C++ skeletons should compile against a compatibility layer that makes these constraints explicit even before the exact compiler/toolchain decision is closed.
## Minimum Header Categories
### 1. Exact-width integer aliases
Needed because:
- field maps are being recovered by offset
- slot/value and save/load structures use exact-width arithmetic
- sign extension versus zero extension matters repeatedly in the current notes
Current draft surface:
```cpp
#include <cstdint>

using u8 = std::uint8_t;
using s8 = std::int8_t;
using u16 = std::uint16_t;
using s16 = std::int16_t;
using u32 = std::uint32_t;
using s32 = std::int32_t;
```
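The width and extension concerns above can be made concrete with compile-time checks and two widening helpers. This is a minimal host-side sketch. `widen_signed` and `widen_unsigned` are hypothetical names, and the block only shows why the same raw word `0xFFFF` must be read as `-1` or `65535` depending on the recovered field's signedness.

```cpp
#include <cassert>
#include <cstdint>

using u16 = std::uint16_t;
using s16 = std::int16_t;
using u32 = std::uint32_t;
using s32 = std::int32_t;

// The aliases only help if they really have the widths the offset maps assume.
static_assert(sizeof(u16) == 2 && sizeof(s16) == 2, "16-bit aliases must be 2 bytes");
static_assert(sizeof(u32) == 4 && sizeof(s32) == 4, "32-bit aliases must be 4 bytes");

// Hypothetical helper: read a raw 16-bit field as signed (0xFFFF -> -1).
// The u16 -> s16 conversion wraps on all mainstream compilers and is
// guaranteed to since C++20.
inline s32 widen_signed(u16 raw) {
    return static_cast<s32>(static_cast<s16>(raw));
}

// Hypothetical helper: read the same raw word as unsigned (0xFFFF -> 65535).
inline u32 widen_unsigned(u16 raw) {
    return static_cast<u32>(raw);
}
```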
### 2. Packing controls
Needed because:
- object and save-state layouts are currently tracked by exact offsets
- future vtable-bearing structs must not silently pick host-default padding
Current draft surface:
```cpp
#define CR_PACK_PUSH_1
#define CR_PACK_POP
#define CR_PACKED
```
The final spelling can change per compiler. The important part is that early source slices already depend on named packing controls rather than ad hoc pragmas in every file.
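A minimal sketch of how the packing markers might expand on a host compiler, paired with the offset assertions that make drift visible. The `_MSC_VER` split and the pragma spellings are host-compiler assumptions, not the original toolchain's syntax, and `ExamplePackedRecord` is a hypothetical record used only to show the checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using u8 = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;

// Assumed host spellings: MSVC-style __pragma, otherwise the standard _Pragma
// operator carrying the pack(push/pop) pragma that GCC and Clang accept.
#if defined(_MSC_VER)
#define CR_PACK_PUSH_1 __pragma(pack(push, 1))
#define CR_PACK_POP    __pragma(pack(pop))
#else
#define CR_PACK_PUSH_1 _Pragma("pack(push, 1)")
#define CR_PACK_POP    _Pragma("pack(pop)")
#endif

CR_PACK_PUSH_1
// Hypothetical record with deliberately unaligned fields.
struct ExamplePackedRecord {
    u8  kind;   // +0x00
    u16 flags;  // +0x01 (host-default padding would push this to +0x02)
    u32 value;  // +0x03 (host default would be +0x04)
};
CR_PACK_POP

// Layout assertions turn silent padding drift into a compile error.
static_assert(offsetof(ExamplePackedRecord, flags) == 0x01, "flags offset drifted");
static_assert(offsetof(ExamplePackedRecord, value) == 0x03, "value offset drifted");
static_assert(sizeof(ExamplePackedRecord) == 7, "record size drifted");
```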
### 3. Calling-convention markers
Needed because:
- later method and helper promotion will need to distinguish ordinary helpers from call-shape-sensitive surfaces
- Track A may need explicit far/near or compiler-family-specific conventions
Current draft surface:
```cpp
#define CR_CDECL
#define CR_STDCALL
#define CR_THISCALL
#define CR_FARCALL
```
`CR_FARCALL` is intentionally a placeholder today. Its value lies in making far-call-sensitive surfaces visible in source, not in pretending the final compiler spelling is already known.
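One way to keep these markers inert on a host build while still visible in declarations is to default each one to an empty expansion. This is a sketch under the assumption that a target-specific header may later predefine real spellings; `example_far_helper` and `ExampleFarCallback` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using s16 = std::int16_t;

// Host build: each marker expands to nothing unless a target-specific
// compatibility header has already given it a real spelling.
#ifndef CR_CDECL
#define CR_CDECL
#endif
#ifndef CR_STDCALL
#define CR_STDCALL
#endif
#ifndef CR_THISCALL
#define CR_THISCALL
#endif
#ifndef CR_FARCALL
#define CR_FARCALL
#endif

// Hypothetical helper flagged as far-call-sensitive; the markers are inert on
// the host but remain searchable in source.
inline s16 CR_FARCALL CR_CDECL example_far_helper(s16 a, s16 b) {
    return static_cast<s16>(a + b);
}

// A typed callback pointer that carries the same markers in its type.
using ExampleFarCallback = s16 (CR_FARCALL CR_CDECL *)(s16, s16);
```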
### 4. Segmented pointer helper types
Needed because:
- several notes still preserve segment:offset provenance directly
- owner-loaded VM source pairs and some callback/resource lanes should not be flattened too early
Current draft surface:
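```cpp
// Raw segment:offset pair, in the order it occupies in memory.
struct FarPtr16 {
    u16 offset;   // low word
    u16 segment;  // high word: NE segment number or selector, not a real-mode paragraph
};

// Typed variant: T only documents the pointee; the stored layout matches FarPtr16.
template <typename T>
struct FarPtr {
    u16 offset;
    u16 segment;
};
```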
Current rule:
- use these first as evidence-preserving placeholders, not as proof that final emitted code must literally use this exact host representation everywhere.
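For trace comparison and debugging, the placeholder type mostly needs lossless round-tripping and the same `ssss:oooo` notation the notes already use. The sketch below assumes a far pointer stored as an offset word followed by a segment word, which reads as one little-endian 32-bit value with the segment in the high half. It deliberately has no `segment * 16 + offset` conversion, because under the 286 protected-mode extender the high word is a selector or NE segment number, not a real-mode paragraph. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

using u16 = std::uint16_t;
using u32 = std::uint32_t;

struct FarPtr16 {
    u16 offset;   // low word in memory
    u16 segment;  // high word in memory
};

// Rebuild a far pointer from the 32-bit value read little-endian from memory.
inline FarPtr16 far_from_u32(u32 raw) {
    return FarPtr16{static_cast<u16>(raw & 0xFFFFu), static_cast<u16>(raw >> 16)};
}

// Inverse of far_from_u32: segment back into the high word.
inline u32 far_to_u32(FarPtr16 p) {
    return (static_cast<u32>(p.segment) << 16) | p.offset;
}

// Render in the `ssss:oooo` notation used throughout the notes.
inline std::string far_to_string(FarPtr16 p) {
    char buf[10];  // "ssss:oooo" plus the terminating NUL
    std::snprintf(buf, sizeof buf, "%04x:%04x",
                  static_cast<unsigned>(p.segment), static_cast<unsigned>(p.offset));
    return std::string(buf);
}
```

For example, `far_to_string(far_from_u32(0x0008BA00))` yields `0008:ba00`.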
### 5. Vtable and slot-order helpers
Needed because:
- current family notes still treat slot order as evidence, not cosmetics
- later MCP or hand-authored promotion should preserve raw order even when names are provisional
Current draft surface:
```cpp
#define CR_VSLOT(index)
```
This can stay documentation-only at first if needed. The point is to keep raw slot numbering visible in provisional class headers.
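A provisional vtable can keep raw slot order as the primary key and attach role labels only as suffixes. This sketch assumes 4-byte far code pointers per slot, which is an assumption about the original layout rather than a recovered fact; `ExampleVtableRaw` and its `_candidate` labels are hypothetical, and `CR_VSLOT` stays documentation-only as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using u16 = std::uint16_t;

// Raw far code pointer: offset word then segment word.
struct FarPtr16 {
    u16 offset;
    u16 segment;
};

// Documentation-only slot marker: expands to nothing but keeps the raw index greppable.
#define CR_VSLOT(index)

// Hypothetical provisional vtable. Slot order is the primary key; role labels
// are suffixes that can be renamed later without reordering anything.
struct ExampleVtableRaw {
    CR_VSLOT(0) FarPtr16 slot00_unknown;
    CR_VSLOT(1) FarPtr16 slot01_destroy_candidate;
    CR_VSLOT(2) FarPtr16 slot02_unknown;
    CR_VSLOT(3) FarPtr16 slot03_dispatch_event_candidate;
};

// Assuming 4-byte far slots, slot N must sit at byte offset N * 4.
static_assert(sizeof(FarPtr16) == 4, "far slot must stay 4 bytes");
static_assert(offsetof(ExampleVtableRaw, slot03_dispatch_event_candidate) == 3 * 4,
              "slot order drifted");
```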
## First Skeleton Rules
When the first class family is emitted to source, the compatibility header should enforce these rules:
1. every field uses exact-width aliases
2. every packed layout is explicit
3. every unresolved far/segment-sensitive pointer uses a named placeholder type
4. every provisional virtual surface keeps raw slot order visible
5. Track A-only assumptions are marked instead of being silently baked into Track B-style cleanup
## Families Most Likely To Need This Immediately
1. `EntityDispatchEntry`
2. `SpriteNode`
3. `EntityVmRuntime`
4. `EntityVmOwnerResource`
5. `EntityVmContext`
`Entity` will also need it, but its family split is large enough that it should probably not be the first source-emission pilot.
## What This Draft Deliberately Avoids
- picking one final compiler family now
- pretending near/far semantics are already solved
- turning unresolved imported-runtime calls into polished modern interfaces
- flattening segment:offset provenance into generic host pointers too early
## Proposed Next Step
Once the first pilot family is chosen, convert this note into a real minimal header with:
- exact-width aliases
- packing markers
- calling-convention placeholders
- one segmented-pointer placeholder type
- one comment block explaining Track A versus Track B expectations
## Bottom Line
The compatibility header should exist before the first C++ skeleton is emitted, not after.
That keeps early source work honest and preserves the option of an original-style rebuild instead of quietly drifting into a pure host-port codebase.

@ -0,0 +1,256 @@
# Remorse C++ Decompilation Plan
## Goal
Turn the current evidence-backed Remorse decompilation into understandable, maintainable C++ source that can eventually be rebuilt into a working executable.
The important constraint is that this should be treated as a staged lift, not a direct dump of Ghidra pseudocode into a compiler. The shortest path to a recompilable result is to recover the original object model deliberately: class ownership, instance layouts, vtables, calling conventions, segmented-pointer rules, resource formats, and subsystem boundaries.
## Short Answer: Can Ghidra Be Made More Class-Aware?
Yes, but only partially and mostly through explicit modeling.
Ghidra can already represent a lot of what we need:
- class and namespace symbols in the Symbol Tree
- structs and unions in the Data Type Manager
- vtable data and typed function pointers
- method ownership through namespaces/classes
- `this`-pointer style signatures when the calling convention and object layout are known
What it does not do well here is infer all of that automatically from a 16-bit DOS binary with mixed C/C++ patterns, custom memory conventions, and incomplete original type information. For this project, class recovery has to be evidence-driven.
## Why The Shift Is Justified Now
The current notes already contain repeated object-oriented evidence, not just loose procedural code:
- constructor-style helpers that allocate, stamp a vtable, and zero instance state
- destructor or teardown paths that restore a base vtable and free owned buffers
- stable indirect dispatch through known vtable slots
- controller, entity, sprite-node, VM-context, and resource-helper families with repeatable instance fields
- several class-like clusters that already have better behavioral names than generic `FUN_...` placeholders
That is enough to start building a real C++ object model rather than treating the entire program as flat C with random function pointers.
Useful evidence anchors already in the repo include:
- `docs/ne-segment1.md` for entity, projectile, dialog, and sprite-adjacent object lanes
- `docs/raw-0008-000c.md` for constructor families, vtable-backed dispatch entries, VM/runtime helpers, and stateful controller objects
- `docs/raw-000a-000d.md` for loader/resource families, callback brokers, and teardown-heavy object lanes
- `docs/raw-porting-progress.md` for callback-object evidence and cross-segment vtable dispatch patterns
- `docs/far-call-targets.md` for high-frequency ctor/dtor/vtable-slot helpers
## End State
The real target should be defined more tightly than `nice C++`:
1. major gameplay, rendering, UI, VM, and resource subsystems are expressed as named classes with understandable responsibilities
2. instance layouts and ownership rules are explicit enough that decompiled code stops depending on anonymous offset math for routine work
3. virtual dispatch is expressed through named methods or typed vtable tables rather than raw slot offsets
4. the source can be rebuilt with a documented toolchain into a working executable or an equivalent working runtime target
5. the rebuilt result is validated by behavior, not by cosmetic similarity to decompiler output
## Working Assumption About The Rebuild Target
There are two plausible endgames, and the plan should keep them separate from the start:
### Track A: Original-style executable rebuild
Rebuild a DOS executable that preserves the segmented-memory model, calling conventions, packed layouts, and resource/file expectations closely enough to run the original game data.
This is the harder but most direct historical target. It likely depends on recovering or emulating:
- the original or closest-possible compiler model
- near/far pointer conventions
- packed struct layout and enum sizes
- startup/runtime integration with the Phar Lap environment or an equivalent replacement layer
### Track B: Behaviorally equivalent source port
Rebuild the game logic in modern C++ while preserving data formats and behavior, but not necessarily the original binary ABI.
This is often the faster path to a working recompiled game, but it is a different goal. If the project wants a true executable reconstruction rather than an engine rewrite, Track A has to remain the primary constraint.
For now, the safest planning stance is: recover source in a way that keeps both tracks open for as long as possible.
## Recommended Strategy
### Phase 0: Treat Ghidra As The Truth Database
Use Ghidra as the canonical place where recovered class ownership, vtable slots, field layouts, and method names live.
That means pushing beyond flat rename work into:
- class namespaces for object families
- typed instance structs
- typed vtable structs where the slots are stable enough
- method names that distinguish static helpers from instance methods
- explicit comments recording why a family is believed to be one class and not just one subsystem
### Phase 1: Recover The Object Model Before Chasing Pretty Output
Prioritize families that already have strong OO evidence.
Best early targets:
1. entity families in `seg001` and the raw/live `0007` lanes
2. dispatch-entry / controller objects in `0008` and `000c`
3. sprite-node and UI/menu object families
4. VM runtime, context, owner-resource, and loader helpers
5. callback/resource broker objects around `0x4588`
For each candidate class family, the minimum closure should be:
- candidate class name
- constructor and destructor candidates
- instance size estimate
- confirmed or suspected vtable base
- known slot-to-method map
- field map with confidence levels
- inbound callers that prove object lifetime or ownership
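The closure list can double as a mechanical completeness check for inventory rows. A minimal sketch, assuming a hypothetical `ClassCandidate` record; it collapses the slot map and field map to counts, so it only tests whether evidence exists for each item, not whether that evidence is correct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class Confidence { Unknown, Suspected, Confirmed };

// One inventory row; each field mirrors one item of the closure list.
struct ClassCandidate {
    std::string name;
    std::vector<std::string> ctor_candidates;   // addresses such as "0008:ba00"
    std::vector<std::string> dtor_candidates;
    std::uint32_t instance_size_estimate = 0;   // 0 = not yet estimated
    Confidence vtable_base = Confidence::Unknown;
    std::size_t known_slots = 0;                // slots with a mapped method
    std::size_t mapped_fields = 0;              // fields with any confidence level
    std::vector<std::string> lifetime_callers;  // callers proving lifetime/ownership
};

// A family counts as closed only when every closure item has some evidence.
// A suspected vtable base is enough; an unknown one is not.
inline bool has_minimum_closure(const ClassCandidate& c) {
    return !c.name.empty()
        && !c.ctor_candidates.empty()
        && !c.dtor_candidates.empty()
        && c.instance_size_estimate != 0
        && c.vtable_base != Confidence::Unknown
        && c.known_slots != 0
        && c.mapped_fields != 0
        && !c.lifetime_callers.empty();
}
```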
### Phase 2: Separate Methods From Free Functions
Not every helper touching an object should become a class method.
The conversion rule should be conservative:
- make it a method when the object pointer is clearly the owner, the function acts on instance state, and the function participates in the class lifecycle or virtual surface
- keep it free or subsystem-local when it behaves like a pure helper, allocator utility, serializer, or cross-object coordinator
This matters because over-classing weak evidence will make the source look cleaner while actually reducing correctness.
### Phase 3: Build Stable Type Layers
Before broad C++ emission, define a small number of disciplined type layers:
- ABI layer: exact-width integers, near/far pointer wrappers, packed structs, fixed calling-convention macros
- runtime layer: allocators, file/resource handles, callback tables, event records, dispatch entries
- gameplay layer: entities, actors, projectiles, triggers, controller objects, UI nodes
- VM layer: runtime/context/owner-resource classes, opcode streams, slot/value helpers
The source should compile against these types first, even if some methods still contain low-level or ugly code.
### Phase 4: Land Recompilable C++ In Vertical Slices
Do not wait for the whole game to be class-clean before testing compilation.
Instead, move in subsystem slices:
1. one object family
2. its structs and vtable
3. its constructors/destructors
4. a handful of live methods
5. a compile test for that slice
This is the only realistic way to find layout or calling-convention mistakes early.
### Phase 5: Add Runtime Validation Harnesses
A source-level recompile effort will fail if verification is only manual.
Needed validation layers:
- map/resource load smoke tests
- deterministic startup path checks
- function-level trace comparisons for selected hot methods
- data-layout assertions on recovered structs
- script/VM behavior checks where extracted USECODE already gives a second evidence source
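The trace-comparison layer reduces to finding the first point where a reference run and a rebuilt run disagree. This sketch assumes a hypothetical `TraceEvent` record holding the callee `segment:offset` plus an argument digest captured by some harness; how those traces are collected is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u16 = std::uint16_t;

// One recorded call: callee segment:offset plus an opaque argument digest.
struct TraceEvent {
    u16 segment;
    u16 offset;
    std::uint32_t arg_digest;  // hypothetical: hash of arguments captured by the harness

    bool operator==(const TraceEvent& o) const {
        return segment == o.segment && offset == o.offset && arg_digest == o.arg_digest;
    }
};

// Returns the index of the first mismatching event, or -1 if the traces agree.
// A length difference counts as a divergence at the end of the shorter trace.
inline long first_divergence(const std::vector<TraceEvent>& reference,
                             const std::vector<TraceEvent>& rebuilt) {
    const std::size_t n = reference.size() < rebuilt.size() ? reference.size() : rebuilt.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (!(reference[i] == rebuilt[i])) return static_cast<long>(i);
    }
    if (reference.size() != rebuilt.size()) return static_cast<long>(n);
    return -1;
}
```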
### Phase 6: Choose The First Real Rebuild Milestone
The first meaningful source milestone should not be `whole game builds`.
A better first milestone is one of these:
1. compile a library that matches one major subsystem ABI and can run against fixture data
2. rebuild the startup/resource path far enough to load into a title/menu state
3. rebuild one contained gameplay loop such as entity allocation/update/teardown with equivalent traces
## Ghidra/MCP Gaps That Matter For This Plan
The local MCP fork already gives enough read/query power to continue class recovery, but it is still missing key authoring operations for a serious C++ lift:
- create class or namespace symbols through MCP
- move existing functions under class ownership cleanly
- create or update struct and vtable datatypes through MCP
- set `this`-pointer types and method signatures systematically
- analyze a candidate vtable and bind slots to named methods in one operation
Those gaps have been added to `ghidra_mcp_wishlist.md` in this batch.
## First Concrete Work Batches
The most defensible first batches are small and structural.
### Batch 1: Class Inventory Pass
Build a repo-side inventory of the strongest current class candidates:
- class family name
- addresses for ctor/dtor/vtable roots
- known methods
- instance-size estimate
- notes/doc references
### Batch 2: One Fully Modeled Family
Pick one family with low ambiguity and carry it through end to end inside Ghidra and the notes:
- class namespace
- method ownership
- instance struct
- vtable struct
- method-slot table
- short rationale note
Good initial candidates are the `entity_dispatch_entry_*` family, the sprite-node family, or one compact controller object family.
### Batch 3: C++ Skeleton Output
Emit one hand-maintained C++ header/source pair for that family with:
- exact-width field placeholders
- named methods
- comments for unresolved fields or slot semantics
- enough type discipline that the code could later be compiled under a chosen toolchain
### Batch 4: Toolchain Recon
Establish the most credible compile target and constraints early:
- likely original compiler family or nearest substitute
- calling convention spelling
- memory-model requirements
- struct packing behavior
- import/library expectations
Without this, the source can drift into modernized C++ that reads well but cannot realistically rebuild the game.
## What To Avoid
- Do not mass-convert procedural helpers into methods just to make the output look object-oriented.
- Do not let Ghidra pseudocode naming outrun field-layout evidence.
- Do not assume modern C++ ABI rules match the original compiler.
- Do not mix `behaviorally equivalent port` goals with `original-style executable rebuild` claims in the same milestone.
- Do not wait for perfect global understanding before compiling anything.
## Immediate Next Steps
1. add the missing class/namespace and vtable-authoring MCP endpoints to the local fork when ready
2. make a `class candidate inventory` note from the strongest existing families in the current docs
3. choose one family and model it all the way through as a pilot C++ class
4. decide whether the primary rebuild constraint is original-style DOS/NE compatibility or a behaviorally equivalent C++ port
5. define the first compile/test harness before broad source emission starts
## Success Criteria For This Plan
This plan is working if, after a few batches, the project has all of the following:
- at least one real class family fully modeled in Ghidra and mirrored in source
- repeatable rules for when a function becomes a method
- repeatable rules for vtable and field-layout evidence
- a documented compile target with ABI constraints
- a narrow but real compilation/validation loop
If those do not exist, the project is still doing useful reverse engineering, but it has not yet truly shifted into a recompilable C++ decompilation lane.

@ -0,0 +1,139 @@
# Remorse First Class-Authoring Checklist
## Purpose
This note turns the current class-lift preparation into a concrete execution checklist for the first real Ghidra/MCP authoring batch.
It is intentionally operational. The goal is to remove startup cost once the necessary MCP class/vtable/datatype tools exist.
## Recommended First Pilot Order
Current best pilot ranking:
1. `EntityDispatchEntry`
2. `SpriteNode`
3. `EntityVmOwnerResource`
Why this order:
- `EntityDispatchEntry` has strong ctor/release and field-group evidence
- `SpriteNode` has a compact, visible virtual surface
- `EntityVmOwnerResource` is structurally bounded but somewhat more sensitive to the loader schema than the first two
## Batch 0: Preconditions
Before touching class ownership in Ghidra:
1. re-read [docs/remorse-class-lift-index.md](docs/remorse-class-lift-index.md)
2. re-read the target family note
3. confirm the ABI note and compatibility-header draft are still aligned with the intended source track
4. confirm the MCP tool surface actually supports namespace/class creation, datatype authoring, and method ownership changes
Do not start authoring if any of these are still missing.
## Batch 1: Authoring Rules
These rules should hold for the first pilot family regardless of which family is chosen:
1. move only strongly owned methods first
2. preserve raw slot order in provisional vtables
3. keep uncertain fields explicit as padding or unknown words/bytes
4. prefer conservative owner names over polished but speculative subsystem names
5. add provenance comments when a layout or slot label is imported from an earlier note rather than re-derived in-session
## Pilot A: `EntityDispatchEntry`
Use [docs/entity-dispatch-entry-class-layout.md](docs/entity-dispatch-entry-class-layout.md) as the source note.
### Minimum Ghidra/MCP deliverables
1. create owner namespace/class for the base family
2. create provisional instance struct for the base layout
3. create provisional vtable datatype with stable known slots only
4. move base ctor/release methods under that owner
5. keep subtype/overlay-specific methods outside the base owner until the split is tighter
### First methods to move
- base ctor at `0008:ba00`
- release/destroy surface at `0008:dbec`
- runtime-state init/release pair `000d:7e00` / `000d:8078` only if the owner relationship is still judged direct in-session
### Fields that should stay visible immediately
- type and subtype-related words near `+0x04/+0x06/+0x08`
- active/hold/runtime flags around `+0x16/+0x18`
- runtime-state and paired resource lanes around `+0x3c/+0x40`
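The three field groups above can be pinned into a provisional struct whose offsets are checked at compile time, with unresolved regions kept as explicit byte arrays. This is a sketch under stated assumptions: the `+0x00` vtable root, the 16-bit width of the `+0x04` to `+0x18` words, and the 4-byte far-pointer shape of the `+0x3c`/`+0x40` lanes are all guesses layered on the note's "near"/"around" anchors, and every field name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using u8 = std::uint8_t;
using u16 = std::uint16_t;

struct FarPtr16 {
    u16 offset;
    u16 segment;
};

#pragma pack(push, 1)
struct EntityDispatchEntryProvisional {
    FarPtr16 vtable;                  // +0x00 assumed vtable root
    u16 word_04_type_related;         // +0x04
    u16 word_06_subtype_related;      // +0x06
    u16 word_08_subtype_related;      // +0x08
    u8  unk_0a[0x16 - 0x0a];          // +0x0a..+0x15 unresolved
    u16 word_16_flags;                // +0x16 active/hold/runtime flags (grouping provisional)
    u16 word_18_flags;                // +0x18
    u8  unk_1a[0x3c - 0x1a];          // +0x1a..+0x3b unresolved
    FarPtr16 lane_3c_runtime_state;   // +0x3c assumed far pointer
    FarPtr16 lane_40_resource;        // +0x40 assumed far pointer
    // Anything past +0x44 is not modeled in this sketch.
};
#pragma pack(pop)

static_assert(offsetof(EntityDispatchEntryProvisional, word_04_type_related) == 0x04, "+0x04 drifted");
static_assert(offsetof(EntityDispatchEntryProvisional, word_16_flags) == 0x16, "+0x16 drifted");
static_assert(offsetof(EntityDispatchEntryProvisional, lane_3c_runtime_state) == 0x3c, "+0x3c drifted");
```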
### Things to avoid in the first pass
- flattening every derived constructor into one base class
- forcing final names for every slot in the vtable
- pretending all runtime-state fields belong to the same subtype
## Pilot B: `SpriteNode`
Use [docs/sprite-node-class-layout.md](docs/sprite-node-class-layout.md) as the source note.
### Minimum Ghidra/MCP deliverables
1. create owner namespace/class for `SpriteNode`
2. create provisional vtable with only the slot order and best current role labels
3. move destructor and event-dispatch surface under that owner
4. create instance struct with the known layout anchors and visible unknown regions
### First methods to move
- `000b:326e sprite_node_destroy`
- `000b:3ab2 sprite_node_dispatch_event`
- `000b:3380 sprite_node_is_dirty`
- `000b:33a6 sprite_node_mark_dirty`
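A first host-side skeleton can carry the original addresses as comments while keeping unrecovered state visibly unresolved. In this sketch only the method names and addresses come from the list above; the flag word, its position, and the `kDirtyBitPlaceholder` bit are hypothetical placeholders, and the destroy/dispatch pair is left unlifted on purpose.

```cpp
#include <cassert>
#include <cstdint>

using u16 = std::uint16_t;

// Hypothetical host-side skeleton for SpriteNode.
class SpriteNode {
public:
    // 000b:3380 sprite_node_is_dirty
    bool is_dirty() const { return (flags_unresolved_ & kDirtyBitPlaceholder) != 0; }

    // 000b:33a6 sprite_node_mark_dirty
    void mark_dirty() { flags_unresolved_ |= kDirtyBitPlaceholder; }

    // 000b:326e sprite_node_destroy and 000b:3ab2 sprite_node_dispatch_event
    // stay unlifted until their virtual slot positions are confirmed.

private:
    static constexpr u16 kDirtyBitPlaceholder = 0x0001;  // assumption, not recovered
    u16 flags_unresolved_ = 0;                           // real offset not yet recovered
};
```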
### Things to avoid in the first pass
- pushing traversal helpers into the class just because they walk the tree
- overcommitting to child/sibling field semantics beyond the already-noted anchors
## Pilot C: `EntityVmOwnerResource`
Use [docs/entity-vm-runtime-owner-resource-layout.md](docs/entity-vm-runtime-owner-resource-layout.md) as the source note.
### Minimum Ghidra/MCP deliverables
1. create helper class owner
2. create compact helper instance struct
3. create provisional method table for the `+0x04` and `+0x0c` helper callbacks
4. move create/destroy pair under that owner
### Things to avoid in the first pass
- baking in final file-family schema labels
- collapsing the helper into the runtime object as anonymous fields
## Source-Emission Readiness Gate
Do not emit the first C++ skeleton immediately after Ghidra authoring.
Only emit when all of these are true:
1. owner namespace/class exists
2. provisional instance struct exists
3. provisional vtable exists when relevant
4. top 3-5 strongly owned methods are moved
5. unresolved fields remain explicit instead of silently renamed away
6. Track A versus Track B assumptions are recorded for that family
## Success Criteria For The First Real Batch
The first class-authoring batch is successful if:
1. one family becomes visibly easier to navigate in Ghidra
2. method ownership is improved without speculative over-grouping
3. instance layout is more explicit than raw offset math
4. the result can drive one hand-maintained C++ header/source skeleton with the compatibility layer
## Bottom Line
The first class-authoring batch should be judged by structural clarity, not by how polished or object-oriented the output looks.
`EntityDispatchEntry` remains the best first pilot because it offers the highest ratio of stable object evidence to remaining subtype ambiguity.

@ -0,0 +1,212 @@
# Remorse Rebuild ABI Notes
## Purpose
This note records the current ABI, memory-model, and toolchain constraints that should shape any future Remorse source reconstruction.
The class-lifting notes answer `what the objects probably are`.
This note answers `what the rebuilt source must still respect if it aims to become a working executable rather than only readable C++`.
## Current Baseline
The live target is not a flat modern Win32 program.
Current verified binary facts:
- DOS target
- 16-bit protected-mode environment
- Phar Lap 286 DOS-Extender (`RUN286`)
- bound `MZ -> NE` executable
- heavy use of inter-segment and external `CALLF` fixups
That means the default safe assumption is:
- segmented code/data model matters
- near/far calls matter
- pointer width and calling convention details matter
- loader/runtime expectations matter
## Hard Constraints Already Visible In The Binary
### 1. Segmented addressing is real, not presentation noise
Evidence:
- executable format is `MZ -> NE`
- raw import behavior collapses unresolved calls to `0000:ffff` until NE fixups are applied
- repaired raw import had thousands of internal literal `CALLF` sites patched to real segment:offset targets
- the notes repeatedly distinguish far pointers, segment:offset storage, and per-segment relocation behavior
Practical implication:
- a rebuild target that ignores far calls and far data pointers too early will drift away from the original executable model
### 2. Function boundaries and external calls are loader-sensitive
Evidence:
- `CALLF 0000:ffff` is a placeholder used by the NE loader for real inter-segment/external targets
- unresolved far thunk behavior in raw import is explicitly not a real dispatcher
Practical implication:
- source emission must preserve which calls are logically intra-object methods and which ones are ABI-significant far calls or imported runtime/library calls
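Preserving call provenance starts with telling loader placeholders apart from already-resolved targets. This sketch decodes a direct far call (opcode `0x9A`, then a little-endian offset word, then a segment word) and flags the `0000:ffff` pattern described above. `decode_callf` and `CallfKind` are hypothetical names, and the check covers only that literal pattern, not every form an NE fixup site can take.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using u8 = std::uint8_t;
using u16 = std::uint16_t;

enum class CallfKind { NotCallf, LoaderPlaceholder, Resolved };

struct CallfTarget {
    CallfKind kind;
    u16 segment;
    u16 offset;
};

// Decode a 5-byte direct far call and classify its target.
inline CallfTarget decode_callf(const u8* bytes, std::size_t len) {
    if (len < 5 || bytes[0] != 0x9A) return {CallfKind::NotCallf, 0, 0};
    const u16 offset  = static_cast<u16>(bytes[1] | (bytes[2] << 8));
    const u16 segment = static_cast<u16>(bytes[3] | (bytes[4] << 8));
    // 0000:ffff is the unresolved placeholder the NE loader patches at runtime.
    const bool placeholder = (segment == 0x0000 && offset == 0xFFFF);
    return {placeholder ? CallfKind::LoaderPlaceholder : CallfKind::Resolved, segment, offset};
}
```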
### 3. Runtime/library layer is not trivial glue
Evidence:
- large Phar Lap runtime/extender segments remain part of startup and low-level system behavior
- CRT wrappers and formatter/runtime helpers are explicitly identified
- MetaWare High C formatting/runtime wrappers are present in the notes
Practical implication:
- the original or near-original compiler/runtime environment matters enough that `just compile with a modern compiler` is not a safe early assumption for an original-style rebuild
### 4. Object layout is tightly coupled to exact field offsets
Evidence:
- major gameplay and UI families are still being recovered by exact offsets
- VM/runtime helpers, dispatch entries, and entity families all depend on stable field positions
Practical implication:
- class lifting must preserve packed layout discipline and exact-width integer choices from the start
## Current Best Toolchain Read
This is still a working model, not a closed historical claim.
### High-confidence environment facts
- DOS protected mode under Phar Lap 286 extender
- NE executable image
- runtime/CRT evidence compatible with MetaWare High C presence in at least part of the binary toolchain story
### What remains open
- exact original compiler version
- exact memory-model flags used for all modules
- exact calling-convention mapping for each object family
- exact linker/build recipe needed to reproduce compatible NE output
## Recommended Rebuild Tracks
### Track A: Original-style executable reconstruction
If the goal is to rebuild something close to the shipped executable model, the source must preserve:
- segmented pointer distinctions
- explicit near/far calling boundaries where needed
- exact struct packing
- compatible CRT/runtime assumptions
- executable/resource layout expectations
This is the stricter track.
### Track B: Behaviorally equivalent source port
If the goal is instead a working engine/game rebuild using the original data with equivalent behavior, then the source can relax some ABI constraints later.
But even on this track, the early reverse-engineering output should still preserve ABI facts long enough that the project can make an informed choice instead of accidentally forcing itself into a port.
## Source-Level Rules To Adopt Early
Any future generated or handwritten code should default to these constraints:
### Integer widths
- use explicit fixed-width integer types everywhere possible
- do not use plain `int`, `long`, or compiler-default enum width as semantic types in the first pass
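On a host compiler, the enum-width rule can be enforced with an explicit underlying type. This is a C++11 host-side sketch with hypothetical enumerators; whether the original compiler stored enums as 16-bit values is still open, so the `u16` here stands in for whatever width the recovered field actually has.

```cpp
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;
using u16 = std::uint16_t;

// Hypothetical type codes: the enumerators are placeholders; the point is the
// explicit underlying type, so storage width cannot follow compiler defaults.
enum class ExampleTypeCode : u16 {
    Unknown0 = 0x0000,
    Unknown1 = 0x0001,
};

struct ExampleRecordHeader {
    ExampleTypeCode type;  // stored as exactly 16 bits
    u8 raw_flags;
    u8 raw_pad;            // explicit padding rather than compiler-inserted
};

static_assert(sizeof(ExampleTypeCode) == 2, "enum width must match the recovered field");
static_assert(sizeof(ExampleRecordHeader) == 4, "header size drifted");
```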
### Layout control
- keep a visible packing strategy for recovered structs
- record uncertain padding explicitly rather than letting the compiler invent it silently
### Pointer model
- keep far-pointer distinctions visible in the type system or wrapper layer
- do not immediately collapse all pointers to one flat host pointer type if Track A remains in scope
### Calling conventions
- keep calling convention annotations explicit in working notes and emitted skeletons
- do not assume one modern host calling convention is an adequate stand-in for every recovered method or helper
### Virtual dispatch
- preserve raw slot order in provisional vtable types
- do not rename or reorder slots to look cleaner before the mapping is stable
## Candidate ABI Support Layer
The first C++ source slices should probably compile against a small compatibility layer rather than raw host C++ alone.
Current likely categories:
- exact-width integer typedefs
- far/near pointer wrappers or placeholder abstractions
- packing macros or pragmas
- calling-convention macros
- segmented address helper types for debugging and trace comparison
- imported runtime service shims for file, memory, and platform calls
## Immediate Compiler/Runtime Questions To Close Later
These are the most useful next ABI questions for the repo:
1. Which compiler/runtime signatures in the binary most strongly identify the original toolchain family and version?
2. Which current methods clearly require far-call semantics even after class lifting?
3. Which object families can safely be emitted as host-side plain structs first, and which still need explicit segmented-pointer wrappers?
4. What is the narrowest executable milestone that can validate calling conventions and struct layout before whole-program reconstruction is attempted?
## Practical Risk List
### Risk: pretty C++ that cannot rebuild the game
Cause:
- class lifting done without ABI discipline
Mitigation:
- keep this note paired with the class-layout notes and require exact-width/packing/calling-convention placeholders in early skeletons
### Risk: false confidence from host compilation success
Cause:
- code compiles under a modern compiler but no longer matches segmented runtime behavior
Mitigation:
- define compile success and behavioral/ABI success as separate milestones
### Risk: loss of far-call/import provenance
Cause:
- unresolved thunk placeholders or loader-patched calls get flattened into generic helper names
Mitigation:
- preserve call provenance in notes and later exports, especially for methods that only look local after fixup repair
## Recommended Near-Term Documentation Follow-Ups
1. collect all current compiler/runtime fingerprints into one evidence note
2. add an `ABI concerns` section to future class-layout notes when a family uses far pointers or segmented ownership directly
3. draft the first minimal compatibility header for future C++ skeletons once the first class family is selected for source emission
## Current Bottom Line
The project is now documented well enough to start class lifting, but not well enough to safely emit `clean modern C++` without guardrails.
The safest present rule is:
- keep object recovery aggressive
- keep ABI assumptions conservative
- keep Track A and Track B separate in every future source milestone

# Remorse Toolchain Fingerprint Evidence
## Purpose
This note gathers the strongest current compiler, runtime, loader, and ABI fingerprints that matter for eventual source reconstruction.
It exists to answer one narrow question better than the broader ABI note:
- what concrete binary evidence currently supports the working toolchain and executable-model assumptions?
This note should stay paired with [docs/remorse-rebuild-abi-notes.md](docs/remorse-rebuild-abi-notes.md).
## High-Confidence Executable Model Evidence
### 1. Bound `MZ -> NE` executable
Strong anchors from [docs/overview.md](docs/overview.md):
- outer DOS header is `MZ`
- `e_lfanew = 0x36F70`
- internal header at `0x36F70` is `NE`
- internal NE image describes `145` segments
Why it matters:
- the game is not a flat DOS EXE with incidental overlays
- the executable model already assumes segmented protected-mode program structure
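These anchors can be re-checked from the raw bytes. A minimal sketch, assuming the whole executable is already loaded into a byte vector: it reads the standard `e_lfanew` field at MZ offset `0x3C`, checks for the `NE` signature at that offset, and reads the NE segment-count word at header offset `0x1C`. For the shipped binary the notes expect `0x36F70` and `145`; the sketch itself does not re-verify those values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct NeInfo {
    std::uint32_t ne_offset;      // e_lfanew from the outer MZ header
    std::uint16_t segment_count;  // NE header word at +0x1C
};

static std::uint16_t rd16(const std::vector<std::uint8_t>& b, std::size_t at) {
    return static_cast<std::uint16_t>(b[at] | (b[at + 1] << 8));
}

static std::uint32_t rd32(const std::vector<std::uint8_t>& b, std::size_t at) {
    return static_cast<std::uint32_t>(rd16(b, at)) |
           (static_cast<std::uint32_t>(rd16(b, at + 2)) << 16);
}

// Returns the NE header location and segment count of a bound MZ -> NE
// image, or nothing if either signature check fails.
std::optional<NeInfo> probe_mz_ne(const std::vector<std::uint8_t>& b) {
    if (b.size() < 0x40 || b[0] != 'M' || b[1] != 'Z') return std::nullopt;
    const std::uint32_t ne = rd32(b, 0x3C);
    // The NE header is 0x40 bytes; reject offsets that would run past the end.
    if (ne > b.size() || b.size() - ne < 0x40) return std::nullopt;
    if (b[ne] != 'N' || b[ne + 1] != 'E') return std::nullopt;
    return NeInfo{ne, rd16(b, ne + 0x1C)};
}
```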
### 2. Phar Lap 286 DOS extender
Strong anchors from [docs/overview.md](docs/overview.md) and [docs/phar-lap-extender.md](docs/phar-lap-extender.md):
- executable is documented as using Phar Lap 286 DOS-Extender (`RUN286`)
- major code regions are extender/runtime segments rather than game logic segments
- named loader path includes `init_dos_extender`, `load_executable_image`, `apply_relocations`, and child-transfer helpers
Why it matters:
- the startup/runtime environment is part of the program contract, not an afterthought
- Track A reconstruction must preserve this loader/executable-model reality or replace it deliberately
### 3. Runtime-patched far-call model
Strong anchors from [docs/overview.md](docs/overview.md):
- unresolved inter-segment and external calls appear in raw import as `CALLF 0000:ffff`
- these are NE loader fixup placeholders, not calls into a single shared dispatcher
- the repaired raw import has already patched `8851` internal literal far-call sites to their real targets
Why it matters:
- far-call provenance is real ABI evidence
- any future source lift has to preserve which edges are ordinary local methods versus loader-significant far calls/imports
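For a rough count of unrepaired sites in a raw segment dump, one could scan for the `CALL ptr16:16` encoding (opcode `0x9A`, then a little-endian offset word and segment word) carrying the `0000:ffff` placeholder. This is a heuristic sketch only: a byte-pattern scan can hit data or misaligned instruction bytes and can miss sites encoded differently, so the NE relocation records remain the authoritative source for these edges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte encoding of `CALLF 0000:ffff`: opcode 0x9A, offset 0xFFFF, segment 0x0000.
constexpr std::uint8_t kCallfPlaceholder[5] = {0x9A, 0xFF, 0xFF, 0x00, 0x00};

// Returns every offset within `seg` where the placeholder byte pattern starts.
std::vector<std::size_t> find_callf_placeholders(const std::vector<std::uint8_t>& seg) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + 5 <= seg.size(); ++i) {
        bool match = true;
        for (std::size_t k = 0; k < 5; ++k) {
            if (seg[i + k] != kCallfPlaceholder[k]) { match = false; break; }
        }
        if (match) hits.push_back(i);
    }
    return hits;
}
```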
## Runtime / CRT Fingerprints
### 1. Phar Lap runtime strings
Strong anchors from [docs/phar-lap-extender.md](docs/phar-lap-extender.md):
- `13fc:0016` = `$Id: comhighc.c 1.1 91/08/06...`
- `13fc:0048` = `$Id: comutils.c 1.1 91/08/06...`
- `1760:665c` = `Copyright (C) 1986-93 Phar Lap Software, Inc.`
- `1760:73da` = `-LDTSIZE 4096 -EXTHIGH D0_0000h -NI 18 -ISTKSIZE 3`
Current safest interpretation:
- Phar Lap runtime/source provenance is directly embedded in the binary
- `comhighc.c` is the strongest current fingerprint tying part of the runtime story to High C-related runtime material
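RCS-style `$Id:` markers like `comhighc.c` and `comutils.c` are easy to harvest mechanically, which makes a repeatable fingerprint pass cheap. A minimal sketch, assuming the image is in memory as bytes; `find_rcs_ids` is a hypothetical helper, and the stop rule (first non-printable byte or closing `$`) is a simplification of real RCS keyword syntax.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Collects RCS-style "$Id: ..." strings from a raw image. Each hit runs from
// the "$Id:" tag up to the first non-printable byte (often the NUL
// terminator) or the closing '$', with trailing spaces trimmed.
std::vector<std::string> find_rcs_ids(const std::vector<std::uint8_t>& image) {
    static const std::string kTag = "$Id:";
    std::vector<std::string> out;
    for (std::size_t i = 0; i + kTag.size() <= image.size(); ++i) {
        const bool at_tag = std::equal(
            kTag.begin(), kTag.end(), image.begin() + i,
            [](char t, std::uint8_t b) { return static_cast<std::uint8_t>(t) == b; });
        if (!at_tag) continue;

        std::size_t j = i + kTag.size();
        while (j < image.size() && image[j] >= 0x20 && image[j] < 0x7F && image[j] != '$') ++j;

        std::size_t end = j;
        while (end > i + kTag.size() && image[end - 1] == ' ') --end;
        out.emplace_back(image.begin() + i, image.begin() + end);
        i = j;  // resume scanning after this string
    }
    return out;
}
```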
### 2. Protected-mode service and memory helpers
Strong anchors from [docs/phar-lap-extender.md](docs/phar-lap-extender.md):
- DPMI/interrupt wrappers in segment `1339`
- EMS management in segment `1677`
- task switching and child-process execution paths in `10da` and `1760`
Why it matters:
- the executable depends on a real protected-mode runtime layer with memory and interrupt service expectations
- this makes `modern compiler output that merely compiles` a weak reconstruction milestone by itself
## Binary-Structure Fingerprints That Affect Source Emission
### 1. Segmented address layout is visible throughout analysis
Strong anchors from [docs/overview.md](docs/overview.md):
- raw address model uses `SSSS:OOOO`
- game code begins only after the Phar Lap loader region
- notes repeatedly distinguish extender segments from NE gameplay segments
Implication:
- source that immediately collapses every pointer and call edge into one flat host model loses verified structure too early
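A small address type keeps that structure explicit instead of linearizing it. The sketch below parses and prints the `SSSS:OOOO` notation used throughout these notes; `SegAddr` and its helpers are hypothetical names. It intentionally offers no `segment * 16 + offset` conversion: for protected-mode NE code the segment half is best treated as an opaque segment or selector identifier, not a real-mode paragraph.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>

struct SegAddr {
    std::uint16_t segment;  // opaque segment/selector identifier
    std::uint16_t offset;
};

// Parses "SSSS:OOOO" with exactly four hex digits on each side.
std::optional<SegAddr> parse_seg_addr(const std::string& text) {
    if (text.size() != 9 || text[4] != ':') return std::nullopt;
    auto hex4 = [&](std::size_t at) -> std::optional<std::uint16_t> {
        std::uint16_t v = 0;
        for (std::size_t k = at; k < at + 4; ++k) {
            const char c = text[k];
            int d;
            if (c >= '0' && c <= '9') d = c - '0';
            else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
            else return std::nullopt;
            v = static_cast<std::uint16_t>((v << 4) | d);
        }
        return v;
    };
    const auto seg = hex4(0);
    const auto off = hex4(5);
    if (!seg || !off) return std::nullopt;
    return SegAddr{*seg, *off};
}

// Formats back to lowercase "ssss:oooo", matching the notes' style.
std::string format_seg_addr(const SegAddr& a) {
    char buf[10];
    std::snprintf(buf, sizeof buf, "%04x:%04x",
                  static_cast<unsigned>(a.segment), static_cast<unsigned>(a.offset));
    return buf;
}
```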
### 2. Loader-sensitive call repair already affects function understanding
Strong anchors from [docs/overview.md](docs/overview.md):
- callsites had to be repaired before large parts of the raw import became meaningful
- inter-segment and external targets are encoded through relocation records, not fixed immediate addresses in the raw bytes
Implication:
- future class lifting should preserve import/far-call comments or metadata, especially for methods that only look local after fixup repair
## Working Compiler Story: What Is Safe And What Is Not
### Safe now
- Phar Lap 286 protected-mode DOS environment is real
- NE segmented executable model is real
- runtime strings directly reference `comhighc.c` and `comutils.c`
- the broad toolchain story includes Phar Lap runtime material and High C-related runtime evidence
### Not safe to claim as closed yet
- exact original compiler version for every module
- exact linker flags for the game NE image
- exact near/far defaults and calling-convention flags used by all gameplay modules
- exact rebuild recipe needed for a compatible historical executable
## Evidence Strength By Question
### Question: does segmented ABI discipline matter?
Answer:
- yes, strongly supported
Why:
- NE format, loader-patched far calls, and segment-separated code organization all point the same way
### Question: is a High C-related runtime story real or speculative?
Answer:
- real at the runtime-fingerprint level, still incomplete at the full-build-chain level
Why:
- `comhighc.c` string evidence is concrete
- full per-module compiler attribution is not yet closed
### Question: can Track A and Track B still share the same early source work?
Answer:
- yes, if early source keeps exact widths, packing, far-call provenance, and segmented-pointer placeholders visible
## What This Means For Future Real Work
When MCP class tools are ready or when hand-written skeletons start, these fingerprints should drive the rules:
1. keep exact-width aliases mandatory
2. keep packing explicit
3. keep segmented-pointer or far-pointer placeholders available
4. keep calling-convention markers visible even when still provisional
5. keep far-call/import provenance attached to lifted methods where it matters
## Highest-Value Remaining Fingerprint Questions
1. collect more direct CRT/helper signatures that distinguish Phar Lap runtime pieces from gameplay-generated code
2. identify which recovered object families most clearly cross near/far ownership boundaries
3. isolate functions whose call shape strongly suggests non-default calling conventions
4. determine the smallest rebuild slice that can test layout and call discipline before whole-program ambitions
## Bottom Line
The current toolchain story is strong enough to justify ABI-conservative source emission rules.
The safe working model remains: Phar Lap protected-mode DOS, bound `MZ -> NE` executable, loader-patched far-call environment, and a real High C-related runtime fingerprint that is informative but not yet the entire historical build recipe.

### Placeholder cube family
- `0x0251` no longer fits the generic placeholder bucket well. Current best read is `VALUEBOX`, a local data/helper box used by monitors, watcher panels, security displays, and keypads.
- `0x0318` also no longer belongs in the generic placeholder-cube bucket. The older Remorse catalog label `PLACEHOLDER_CUBE` is now weaker than the extracted usecode evidence: both Remorse and Regret name class `0x0318` as `CRUMORPH`, and the recovered `equip` body is a control-transfer pad that compares local `QLo` against mutable actor field `0x63` before bracketing `TRIGGER.slot_20`.
- The catalog/scene pass still lists these placeholder-cube entries:
  - `0x0318` = `PLACEHOLDER_CUBE` (catalog label only; the usecode evidence above now favors `CRUMORPH`)
  - `0x0337` = `PLACEHOLDER_CUBE_BIG`
  - `0x0361` = `PLACEHOLDER_CUBE_RED_BLACK`
- Cached Remorse scenes place these on multiple maps, so they are not catalog-only artifacts.

# SpriteNode Class Layout
## Purpose
This note captures the current class-level read of the `SpriteNode` family so later Ghidra class work can move quickly and conservatively.
Compared with `EntityDispatchEntry`, this family has a cleaner explicit virtual/event-dispatch surface and a more bounded ownership model. That makes it an excellent second pilot for class lifting.
## Current Best Class-Level Read
`SpriteNode` is a tree-based render/UI node family with:
- child-chain ownership
- accumulated position and bounds propagation
- dirty-state tracking
- central event dispatch through a small virtual surface
- destructor ownership over child nodes and global focus state
This already looks much closer to an ordinary C++ object family than many gameplay-side structures.
## Strongest Evidence Anchors
### Destructor
#### `000b:326e` `sprite_node_destroy`
Current best read:
- destructor-style path
- sets vtable ptr to `0x501a`
- clears the global focus pointer at `[0x4fd0:0x4fd2]` if it points to `self`
- releases child nodes
- frees object memory through `mem_free`
This is the strongest current single proof that `SpriteNode` should be lifted as an owned object family.
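A host-side model of that destructor path, as a sketch. `SpriteNode`, `g_focus_node`, and the `children` vector are hypothetical stand-ins for the object, the global at `[0x4fd0:0x4fd2]`, and the child chain at `+0x19/+0x1b`; the real linkage is a far-pointer chain, not a `std::vector`. Typical C++ compilers emit their own vtable store at destructor entry, which plays roughly the role of the explicit `0x501a` store.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct SpriteNode;

// Hypothetical stand-in for the global focus pointer at [0x4fd0:0x4fd2].
SpriteNode* g_focus_node = nullptr;

struct SpriteNode {
    // Stand-in for the child chain; the original links nodes by far pointer.
    std::vector<std::unique_ptr<SpriteNode>> children;

    virtual ~SpriteNode() {
        // Drop global focus if it points at this node, so it cannot dangle.
        if (g_focus_node == this) {
            g_focus_node = nullptr;
        }
        // Release owned children; each child's destructor runs the same checks.
        children.clear();
        // Freeing this object's memory is the deleter's job, like mem_free.
    }
};
```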
### Event dispatch
#### `000b:3ab2` `sprite_node_dispatch_event`
Current best read:
- large event dispatch switch
- checks event types `2/4/8/0x100`
- updates global focus pointer at `[0x4fd0:0x4fd2]`
- dispatches through virtual slots `+0x14`, `+0x18`, `+0x20`, `+0x24`
This is the strongest current proof of a stable virtual method surface.
### Dirty/update family
- `000b:3380 sprite_node_is_dirty`
- `000b:33a6 sprite_node_mark_dirty`
- `000b:40ee sprite_node_update_and_dispatch`
These show that node-state changes and redraw/update dispatch are methods of the same family, not free helpers that happen to touch unrelated data.
### Recursive tree helpers
- `000a:b988 sprite_node_get_or_traverse`
- `000b:358d sprite_tree_accumulate_pos`
- `000b:3a00 sprite_tree_sum_x_offset`
- `000b:3a35 sprite_tree_sum_y_offset`
These are strong evidence for a child-linked tree object with inherited coordinate accumulation.
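A sketch of the accumulation pattern those helpers suggest. It assumes, hypothetically, that `sprite_tree_sum_x_offset` follows the `+0x19/+0x1b` link and adds each node's `local_x_offset`; whether that link points to a parent, child, or sibling is exactly what the `child_or_next_ptr` name leaves open, so the traversal direction is an assumption. `TreeNode` is an illustrative type, not the recovered layout.

```cpp
#include <cassert>
#include <cstdint>

struct TreeNode {
    std::int16_t local_x_offset = 0;  // +0x21 in the candidate layout
    std::int16_t local_y_offset = 0;  // +0x23 in the candidate layout
    const TreeNode* link = nullptr;   // +0x19/+0x1b link (far pointer originally)
};

// Recursive sum over the link chain. Accumulates in int; the original
// 16-bit arithmetic may wrap, which a faithful rebuild would need to match.
int sum_x_offset(const TreeNode* n) {
    return n ? n->local_x_offset + sum_x_offset(n->link) : 0;
}

int sum_y_offset(const TreeNode* n) {
    return n ? n->local_y_offset + sum_y_offset(n->link) : 0;
}
```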
## Candidate Layout
This layout is intentionally conservative.
| Offset | Current name | Confidence | Current meaning |
|---|---|---|---|
| `+0x19/+0x1b` | `child_or_next_ptr` | High | Child-chain pointer pair used by recursive traversal and offset accumulation. |
| `+0x21` | `local_x_offset` | High | Summed by `sprite_tree_sum_x_offset` / accumulate helpers. |
| `+0x23` | `local_y_offset` | High | Summed by `sprite_tree_sum_y_offset` / accumulate helpers. |
| `+0x29` | `dirty_flags` | High | Checked by `sprite_node_is_dirty`; manipulated by mark/update paths. |
| `+0x17e` | `redraw_flag` | Medium | Cleared by `sprite_clear_redraw_flag`; likely a subtype or larger-object field tied to one SpriteNode family variant. |
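The conservative layout can be pinned down as a packed struct with compile-time offset checks, so any later edit that shifts a field fails the build. This is a sketch under stated assumptions: unknown ranges are byte arrays, the two-byte widths of `local_x_offset`/`local_y_offset` are inferred only from their spacing, the one-byte `dirty_flags` width is a guess, and the vtable pointer's location is not yet known. `+0x17e` is deliberately omitted because it most likely belongs to a larger derived object, and `sizeof` stops at the last known field, so it is not the real object size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct SpriteNodeBase {
    std::uint8_t  unknown_00[0x19];   // +0x00..+0x18: unrecovered (vtable ptr location unknown)
    std::uint16_t child_or_next_off;  // +0x19: link pointer, offset word
    std::uint16_t child_or_next_seg;  // +0x1b: link pointer, segment/selector word
    std::uint8_t  unknown_1d[4];      // +0x1d..+0x20: unrecovered
    std::int16_t  local_x_offset;     // +0x21
    std::int16_t  local_y_offset;     // +0x23
    std::uint8_t  unknown_25[4];      // +0x25..+0x28: unrecovered
    std::uint8_t  dirty_flags;        // +0x29: width assumed to be one byte
};
#pragma pack(pop)

static_assert(offsetof(SpriteNodeBase, child_or_next_off) == 0x19, "link offset word");
static_assert(offsetof(SpriteNodeBase, child_or_next_seg) == 0x1b, "link segment word");
static_assert(offsetof(SpriteNodeBase, local_x_offset) == 0x21, "x offset");
static_assert(offsetof(SpriteNodeBase, local_y_offset) == 0x23, "y offset");
static_assert(offsetof(SpriteNodeBase, dirty_flags) == 0x29, "dirty flags");
```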
## Layout Caveat
The current notes likely mix a compact core `SpriteNode` with one or more larger derived UI/display objects. The evidence for `+0x17e` strongly suggests there are bigger family members or wrapper objects in the same virtual ecosystem.
So the safe future modeling strategy is:
- define a small `SpriteNodeBase`
- keep larger UI/display fields in derived or sibling structs until more offsets are closed
## Candidate Method Map
### Strong instance methods
| Address | Current function | Candidate method role |
|---|---|---|
| `000b:326e` | `sprite_node_destroy` | `Destroy()` |
| `000b:3380` | `sprite_node_is_dirty` | `IsDirty()` |
| `000b:33a6` | `sprite_node_mark_dirty` | `MarkDirty()` |
| `000b:3ab2` | `sprite_node_dispatch_event` | `DispatchEvent()` |
| `000b:40ee` | `sprite_node_update_and_dispatch` | `UpdateAndDispatch()` |
| `000a:b988` | `sprite_node_get_or_traverse` | `GetOrTraverse()` |
### Strong family-local helpers that may remain free/static
| Address | Current function | Why it may stay non-method |
|---|---|---|
| `000b:3a00` | `sprite_tree_sum_x_offset` | Pure recursive accumulation helper; method status depends on later decompile readability. |
| `000b:3a35` | `sprite_tree_sum_y_offset` | Same as above. |
| `000b:330c` | `sprite_tree_dispatch_wrapper` | Looks like a pure thunk wrapper rather than a meaningful source-level method. |
| `000b:3362` | `sprite_tree_unwind_check` | Stack-segment guard helper; probably not worth presenting as a class method. |
## Candidate Virtual Slot Map
The currently verified slots are already good enough for a first typed vtable.
| Slot offset | Current best role | Evidence |
|---|---|---|
| `+0x14` | event handler A | `sprite_node_dispatch_event` dispatches here for one event class |
| `+0x18` | event handler B | same dispatcher |
| `+0x20` | event handler C | same dispatcher |
| `+0x24` | event handler D | same dispatcher |
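Turning those byte offsets into slot indices depends on the slot width, which is not yet closed: near (2-byte) and far (4-byte) code pointers give different numbering. The hypothetical helper below makes that assumption explicit. It also assumes slots start at vtable offset `0` with no header words, which is unverified.

```cpp
#include <cassert>
#include <cstdint>

// Slot width in bytes: 2 for near code pointers, 4 for 16:16 far code pointers.
enum class SlotWidth : std::uint8_t { Near = 2, Far = 4 };

// Converts a vtable byte offset (e.g. +0x14) into a zero-based slot index.
// Returns -1 if the offset is not aligned to the assumed slot width.
constexpr int slot_index(std::uint16_t byte_offset, SlotWidth w) {
    return (byte_offset % static_cast<int>(w) == 0)
               ? byte_offset / static_cast<int>(w)
               : -1;
}

static_assert(slot_index(0x14, SlotWidth::Far) == 5, "far reading, handler A");
static_assert(slot_index(0x24, SlotWidth::Far) == 9, "far reading, handler D");
static_assert(slot_index(0x14, SlotWidth::Near) == 10, "near reading, handler A");
```

Under the far reading the four dispatcher slots are indices 5, 6, 8, and 9; under the near reading they are 10, 12, 16, and 18. Either way, offset `+0x1c` sits between verified slots with no dispatcher evidence yet.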
The seg091 default-slot helpers are also useful evidence:
- `000a:7b44`, `000a:7b49`, `000a:7b53`, `000a:7b4e`, `000a:7b78`, `000a:7b7d`, `000a:7b30`, `000a:7b3f`, `000a:7b35`, `000a:7b3a`
- `000a:7b58` returns zero and behaves like a default no-op boolean slot
- `000a:7b5f` is a forwarding trampoline slot
These likely belong to one or more shared/default node vtables and should be preserved as vtable evidence even if they never become pretty source-level methods.
## Ownership And Global State
### Focus/global state
Global focus pointer `[0x4fd0:0x4fd2]` is updated in the dispatch family and cleared in the destructor.
That gives the family a real interaction with global UI focus/state, but the key point for class work is simpler:
- focus ownership is tied to the node family itself
- this is not just an arbitrary free helper changing global UI state
### Child ownership
The destructor and recursive sum/traverse helpers strongly suggest real child ownership or at least managed child linkage.
That means later class modeling should preserve a node/tree mental model rather than flattening everything into stand-alone display items.
## Candidate Ghidra Modeling Plan
When class authoring begins, the safest sequence for this family is:
1. create class namespace `SpriteNode`
2. move `Destroy`, `IsDirty`, `MarkDirty`, `DispatchEvent`, `UpdateAndDispatch`, and `GetOrTraverse` first
3. create minimal `SpriteNodeBase` struct with the stable offsets around `+0x19`, `+0x21`, `+0x23`, and `+0x29`
4. create provisional vtable with slots `+0x14`, `+0x18`, `+0x20`, `+0x24`
5. keep recursive tree helpers outside the class until decompiler output shows they benefit from becoming methods
## Open Questions
- exact root vtable address or addresses for the main SpriteNode family
- whether the `+0x17e` redraw flag belongs to a derived display node rather than the compact base node
- which event-code cases map to which slot semantically beyond the current `A/B/C/D` placeholder naming
- whether `sprite_tree_accumulate_pos` should become a class method, a static helper, or a separate geometry utility
## Immediate Follow-Up Value
The most useful next companion work after this note is not more sprite detail by itself. It is the rebuild-ABI note, because once the first few class families are documented this well, the next real risk is drifting away from the original memory and calling-convention model before any code is emitted.