Stuff
This commit is contained in:
parent
ee33f94b4b
commit
f92d1504fa
547 changed files with 37597 additions and 0 deletions
401
docs/retail-debug-arg.md
Normal file
|
|
@ -0,0 +1,401 @@
|
|||
# Retail `-debug` Argument in `CRUSADER.EXE`
|
||||
|
||||
This note records the current evidence-backed read of the retail `-debug` command-line switch in the live NE `CRUSADER.EXE` database.
|
||||
|
||||
Short version:
|
||||
- retail `CRUSADER.EXE` does recognize `-debug`
|
||||
- the switch is not fully disabled in the parser
|
||||
- it sets a global debug-print threshold and two debug-related globals
|
||||
- one surviving `-debug` feature is now concretely identified as an AVI/video-player timing overlay
|
||||
- current evidence does **not** show it constructing or wiring the hidden seg109/seg1408 usecode debugger state object at `1478:659c/659e`
|
||||
- current best read is `debug-print threshold plus movie-playback timing overlay`, not `the lost hidden debugger bootstrap`
|
||||
|
||||
## Verified Parser Evidence
|
||||
|
||||
In the live NE image, `HandleCommandlineArgs` at `1048:09b9` contains a real branch for `"-debug"` at `1048:0a93`.
|
||||
|
||||
The branch body does all of the following:
|
||||
- writes `0x000a` to `1478:87e0` (`g_debugMsgLevel`)
|
||||
- calls `ConsolePrintf(0x32, 1478:0ad6)`
|
||||
- writes `1` to `1478:0845` (`g_someDebugFlag`)
|
||||
- writes `1` to `1478:0859` (`g_someDebugFlag2`)
|
||||
|
||||
So the parameter has exactly three direct state effects currently recovered:
|
||||
- it changes the global print threshold
|
||||
- it sets one still-unresolved debug latch at `1478:0845`
|
||||
- it enables the seg1468 video timing overlay through `1478:0859`
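
The recovered branch effects can be modeled as a minimal Python sketch. The dictionary keys mirror the DS offsets named above; the function name `apply_debug_switch` and the `printed` list are hypothetical stand-ins for illustration, not recovered symbols.

```python
# Hypothetical model of the recovered `-debug` branch at 1048:0a93.
# Keys are offsets in segment 1478; values are what the branch writes.
G_DEBUG_MSG_LEVEL = 0x87E0   # g_debugMsgLevel
G_SOME_DEBUG_FLAG = 0x0845   # g_someDebugFlag (no downstream reader recovered yet)
G_SOME_DEBUG_FLAG2 = 0x0859  # g_someDebugFlag2 (gates the video timing overlay)


def apply_debug_switch(ds, printed):
    """Apply the recovered side effects in the order the branch body lists them."""
    ds[G_DEBUG_MSG_LEVEL] = 0x000A
    # Stand-in for ConsolePrintf(0x32, 1478:0ad6).
    printed.append((0x32, "Debugging mode ON."))
    ds[G_SOME_DEBUG_FLAG] = 1
    ds[G_SOME_DEBUG_FLAG2] = 1
    return ds


ds, printed = {}, []
apply_debug_switch(ds, printed)
for offset, value in sorted(ds.items()):
    print(f"1478:{offset:04x} = {value}")
```

The model deliberately contains no write to `1478:659c/659e`, matching the current evidence.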
|
||||
|
||||
The printed string at `1478:0ad6` is:
|
||||
|
||||
`Debugging mode ON.`
|
||||
|
||||
That is strong evidence against the narrow claim that retail still recognizes the string but has the feature body disabled. The retail parser still executes real side effects when `-debug` is present.
|
||||
|
||||
Nearby startup strings in the same table also confirm this is the normal command-line switch cluster:
|
||||
- `CRUSADER: No Remorse`
|
||||
- `Cheats are active.`
|
||||
- `You DO need help!`
|
||||
- `Debugging mode ON.`
|
||||
- `Enabling ENHANCED mode. (NOT!)`
|
||||
|
||||
## What Still Reacts To `-debug`
|
||||
|
||||
### 1. Debug message threshold is live
|
||||
|
||||
`1478:87e0` is already symbolized as `g_debugMsgLevel` in the live export.
|
||||
|
||||
Data-use recovery shows it is read by:
|
||||
- `ConsolePrintf`
|
||||
- `DebugPrintAndWaitForInput`
|
||||
- two adjacent positioned debug-print helpers in the same family (`12d0:0391`, `12d0:0442`)
|
||||
|
||||
Current best read:
|
||||
- `-debug` sets the runtime print threshold to `10`
|
||||
- the compare in the wrappers is effectively `if (call_level < g_debugMsgLevel) skip`
|
||||
- so `-debug` does **not** create a new console sink by itself; it only changes which existing callsites are eligible to print
|
||||
- the shared print staging buffer at `1478:45a6` is allocated independently by `12d0:0513`, not by the `-debug` parser branch
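
The gate described above can be written out as a small check. This is a sketch of the compare as currently read (`call_level < g_debugMsgLevel` means skip); `is_eligible` is a hypothetical name, level `5` is a made-up value used only to show the skip case, and the default value of `g_debugMsgLevel` without `-debug` is left out because it has not been pinned down.

```python
def is_eligible(call_level, debug_msg_level):
    """Return True when a callsite at `call_level` passes the threshold gate."""
    # Mirrors the current read: `if (call_level < g_debugMsgLevel) skip`.
    return not (call_level < debug_msg_level)


DEBUG_THRESHOLD = 10  # value written by the `-debug` branch

# 0x32 is the level used by the "Debugging mode ON." ConsolePrintf call;
# 5 is an illustrative below-threshold level, not a recovered one.
for level in (0x32, 5):
    print(f"level {level:#04x}: eligible={is_eligible(level, DEBUG_THRESHOLD)}")
```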
|
||||
|
||||
The low-level sink path is now tighter too:
|
||||
- `ProbablyPrintDebugMessage` at `1000:65cc` formats through the generic `ProbablySomethingDebuggy` / `FUN_1000_67ac` stream pipeline
|
||||
- that helper passes DS:`1478:6c46` as the target stream object
|
||||
- the surrounding DS data at `1478:6c32..1478:6c81` is a four-entry static stdio-style table with handle words `0`, `1`, `2`, and `3`
|
||||
- the `1478:6c46` entry is therefore the handle-`1` stream, i.e. the program's `stdout` slot
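
The table arithmetic behind that identification can be checked with a short calculation. It assumes the `1478:6c32..1478:6c81` range is inclusive, that the four entries are equally sized, and that the handle words `0` through `3` appear in entry order; none of the variable names are recovered symbols.

```python
TABLE_START = 0x6C32
TABLE_END = 0x6C81    # treated as inclusive (assumption)
ENTRY_COUNT = 4
TARGET = 0x6C46       # stream object passed by the print helper

table_bytes = TABLE_END - TABLE_START + 1
entry_size, leftover = divmod(table_bytes, ENTRY_COUNT)
entry_index, offset_in_entry = divmod(TARGET - TABLE_START, entry_size)

print(f"table size: {table_bytes} bytes, entry size: {entry_size} bytes")
print(f"target entry index: {entry_index}, offset inside entry: {offset_in_entry}")
```

Under those assumptions the target lands exactly on the start of the second entry, which is the entry carrying handle word `1`.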
|
||||
|
||||
Practical implication:
|
||||
- the text side of `-debug` is not a hidden second debugger UI and not a newly-created in-memory log sink
|
||||
- it is ordinary formatted DOS standard-output text gated by `g_debugMsgLevel`
|
||||
- the main reason it is easy to miss is that Crusader spends most of its runtime in graphics mode, while many eligible callsites are startup, shutdown, or failure diagnostics
|
||||
|
||||
This matters because it narrows the real effect of the print side:
|
||||
- `-debug` definitely changes print gating
|
||||
- but it does not add a new visible on-screen text channel by itself
|
||||
- any visible text side effect depends on already-existing print callsites and on how DOS `stdout` is surfaced in the running environment
|
||||
|
||||
### 1a. What subsystems use that print gate
|
||||
|
||||
Recovered `ConsolePrintf` callers show that the thresholded text/debug lane is not limited to video. Current caller families include:
|
||||
- command-line handling and startup/shutdown (`HandleCommandlineArgs`, `CheckForLaurieArg`, `Init_Everything`, `Uninitialize`)
|
||||
- config parsing (`LoadConfigFile`)
|
||||
- cache/shape rebuild work (`CacheShapeHand_1070_15a9`)
|
||||
- item/glob spawning failure paths (`ItemGlob_GlobEggHatch`)
|
||||
- drawlist or display initialization (`DList_Init`)
|
||||
- audio init (`Init_ASS`)
|
||||
- joystick init/calibration (`Joystick_Init`, `Joystick_Calibrate`, nearby helper `1400:0c7e`)
|
||||
- teleporter and several `UProcess_*` / `1428:*` runtime helpers
|
||||
|
||||
The rarer blocking helper `DebugPrintAndWaitForInput` is much narrower. Current recovered callers are:
|
||||
- `Dispatch_Init_320_0281`
|
||||
- `Dispatch_1320_103e`
|
||||
- `NewGump_Alloc`
|
||||
|
||||
Those are all failure/debug-stop style paths rather than normal AVI playback logic.
|
||||
|
||||
Practical interpretation:
|
||||
- there **is** a non-video `-debug` lane
|
||||
- but the recovered non-video lane is primarily `thresholded debug/error text`, often in startup or failure handling
|
||||
- it is not currently a second confirmed visible feature on the level of the movie timing dots observed during live testing
|
||||
|
||||
### 1c. Recovered printed strings and format strings
|
||||
|
||||
The print inventory is now tighter than the earlier subsystem-only caller list.
|
||||
|
||||
Recovered call levels so far:
|
||||
- `0x32`
|
||||
- `0xff`
|
||||
|
||||
Because retail `-debug` sets `g_debugMsgLevel = 10`, both recovered levels (`0x32` = 50 and `0xff` = 255) are above the threshold and therefore eligible to print when their code paths execute.
|
||||
|
||||
#### Command-line and startup argument strings
|
||||
|
||||
Recovered directly from `HandleCommandlineArgs` / `CheckForLaurieArg`:
|
||||
- `Cheats are active.`
|
||||
- `You DO need help!`
|
||||
- `Debugging mode ON.`
|
||||
- `Enabling ENHANCED mode. (NOT!)`
|
||||
- `Warping to mission %d.`
|
||||
- `Warping to mission %d @ x:%d y:%d z:%d.`
|
||||
- `Defaulting to skill level %d`
|
||||
- `Map offset = %d`
|
||||
- `Destination Egg = %d`
|
||||
- `Demo mode.`
|
||||
|
||||
These are the cleanest first things to look for in DOSBox logging because they belong to deterministic early startup/argument paths.
|
||||
|
||||
#### Init / config / install-status strings
|
||||
|
||||
Recovered from `Init_Everything` and `LoadConfigFile`:
|
||||
- `Using map patch file.`
|
||||
- `Loading: [`
|
||||
- ` ]`
|
||||
- `.`
|
||||
- `Running with partial installation.`
|
||||
- `Running with full installation.`
|
||||
- `Redirecting mission %d tune to '%s'`
|
||||
|
||||
Interpretation:
|
||||
- some of these are not human-readable status sentences so much as progress-bar fragments and single-character emitters
|
||||
- that means an `INT 21h` log may show very short writes rather than one tidy line of text
|
||||
|
||||
#### Cache / rebuild strings
|
||||
|
||||
Recovered from `CacheShapeHand_1070_15a9`:
|
||||
- `Creating Swap file [`
|
||||
- ` ]`
|
||||
- `] `
|
||||
- `\n\r[`
|
||||
- ` ]\r[`
|
||||
- a single-byte progress marker at `1478:0e1e`
|
||||
|
||||
Interpretation:
|
||||
- this lane appears to print progress scaffolding rather than rich prose
|
||||
- if DOSBox logs writes with counts but not payload text, many tiny startup writes may belong to this cache/swap-file progress path
|
||||
|
||||
#### Runtime / failure diagnostics
|
||||
|
||||
Recovered from narrower runtime and failure paths:
|
||||
- `dl init ` from `DList_Init`
|
||||
- `COULD NOT CREATE GLOB ITEM!` from `ItemGlob_GlobEggHatch`
|
||||
|
||||
These are useful because they are strong “real debug text” fingerprints if they ever show up in a live log.
|
||||
|
||||
#### Audio init marker
|
||||
|
||||
Recovered from `Init_ASS`:
|
||||
- `.`
|
||||
|
||||
Current best read:
|
||||
- this is another minimal progress marker, not a descriptive sentence
|
||||
|
||||
#### Blocking `DebugPrintAndWaitForInput` strings
|
||||
|
||||
Recovered from the three currently known `DebugPrintAndWaitForInput` callers:
|
||||
- `No room for Dispatcher Record/Playback.`
|
||||
- `End of script! (press any key)`
|
||||
- `Out of Memory! [%u]`
|
||||
|
||||
Interpretation:
|
||||
- these are failure/debug-stop strings
|
||||
- they are good fingerprints for recognizing the lane, but they should not be expected during healthy normal gameplay
|
||||
|
||||
#### Shutdown strings
|
||||
|
||||
Recovered from `Uninitialize` and nearby startup/shutdown text:
|
||||
- `CRUSADER: No Remorse`
|
||||
- `No pity. No mercy. No remorse.`
|
||||
|
||||
Current best read:
|
||||
- the first is a normal shutdown-side banner
|
||||
- the second is associated with the Laurie-enabled lane and is not a generic `-debug` print on its own
|
||||
|
||||
### 1b. How to read that text in practice
|
||||
|
||||
Current best evidence-backed read is:
|
||||
- the messages go to the executable's normal DOS `stdout` stream, not to a bespoke debugger console
|
||||
- if `stdout` is left attached to `CON`, text may only be practically visible when Crusader is still in a text-ish startup/shutdown context or when a failure path forces a text-mode-style report
|
||||
- replacing `stdout` with a disk file via shell redirection is now a poor default recommendation, because live user testing showed `CRUSADER.EXE -debug > DEBUG.TXT` surviving startup but crashing after the intro cutscene
|
||||
|
||||
What this does and does not mean:
|
||||
- plain file redirection probably changes the handle-1 stream semantics enough to destabilize later stdio/device logic in this executable or runtime
|
||||
- it will not magically produce extra chatter unless existing callsites actually execute at or above the threshold
|
||||
- it does not by itself surface the hidden seg109/seg1408 debugger UI, because that is a separate control path
|
||||
|
||||
Remaining caution:
|
||||
- this is a strong static conclusion about the sink identity
|
||||
- runtime capture details still depend on the exact DOSBox / shell setup, because the game often runs after switching away from a normal text-console presentation
|
||||
- safest current guidance is to keep handle `1` attached to a character-device-style sink and capture around it, rather than redirecting directly to a regular file
|
||||
|
||||
### 2. `g_someDebugFlag2` is live
|
||||
|
||||
`1478:0859` (`g_someDebugFlag2`) is written by the `-debug` parser branch and read later in segment `1468`.
|
||||
|
||||
Recovered readers:
|
||||
- `VideoPlayer_AdvanceFrameAndHandleSkip` (`1468:2869`)
|
||||
- `VideoPlayer_StreamChunks` (`1468:2af4`)
|
||||
|
||||
Both routines conditionally call `VideoPlayer_DrawDebugTimingOverlay` (`1468:2de9`) when `g_someDebugFlag2 != 0`.
|
||||
|
||||
The current live export already places these functions in the `VideoPlayer_*` neighborhood:
|
||||
- preceding function `1468:283f..2868` is typed as `VideoPlayer_Run(struct Process * p_proc)`
|
||||
- `AVI_1468_2188` is an AVI-header parser that recognizes `AVI ` / `LIST` / `strl` / `strh` / `strf`
|
||||
- `FUN_1468_3904` is a later `movi`-chunk setup/prime path that calls `VideoPlayer_StreamChunks(..., 1)`
|
||||
|
||||
Current best read of this lane:
|
||||
- this is part of a video / presentation / media-processing subsystem
|
||||
- `-debug` leaves behind an extra instrumentation path in that subsystem
|
||||
- the behavior does not currently look like debugger-object creation, breakpoint management, or usecode UI entry
|
||||
|
||||
`VideoPlayer_DrawDebugTimingOverlay` is now specific enough to describe concretely:
|
||||
- it clears a temporary `0x1f4`-byte buffer (`500` bytes)
|
||||
- it computes marker positions from AVI timing fields using a divisor of `6000`
|
||||
- it writes marker bytes `8` and `9` into that temporary line buffer
|
||||
- it copies the resulting `500`-byte line into a video buffer near offset `0x1de * 0x280 + 0x78`
|
||||
- it builds and copies a second `500`-byte line into the next scanline at `+0x280`
|
||||
|
||||
Because the helper writes into two adjacent scanlines of a `0x280`-wide (`640`-pixel) buffer near the bottom of the frame, current best read is:
|
||||
- this is a built-in movie-playback timing overlay or marker strip
|
||||
- it is probably intended to visualize AVI/video timing state while playback is running
|
||||
- it is a practical runtime feature that may be observable during intro/cutscene playback when launched with `-debug`
|
||||
- it is still unrelated to the seg109/seg1408 usecode debugger object model
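
The placement arithmetic can be worked through directly. This sketch only decodes the offsets quoted above; it assumes one byte per pixel, and the `480`-row frame height is an assumption used to show why the rows count as "near the bottom". The marker-position formula itself (the `6000` divisor) is not modeled because its exact inputs are not restated here.

```python
STRIDE = 0x280        # bytes per scanline (640)
LINE_BYTES = 0x1F4    # overlay line length (500)
FIRST_OFFSET = 0x1DE * STRIDE + 0x78
ASSUMED_ROWS = 480    # assumption: 640x480 playback buffer

first_row, first_col = divmod(FIRST_OFFSET, STRIDE)
second_offset = FIRST_OFFSET + STRIDE
second_row, second_col = divmod(second_offset, STRIDE)
last_col = first_col + LINE_BYTES - 1

print(f"first line:  row {first_row}, columns {first_col}..{last_col}")
print(f"second line: row {second_row}, columns {second_col}..{second_col + LINE_BYTES - 1}")
print(f"fits inside one scanline: {last_col < STRIDE}")
print(f"rows from bottom (assumed {ASSUMED_ROWS} rows): {ASSUMED_ROWS - 1 - second_row}")
```

Both lines start at column `120` and end at column `619`, so the strip is inset from both edges and occupies rows `478` and `479`.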
|
||||
|
||||
Runtime confirmation now matches the static read:
|
||||
- the observed moving dots at the bottom of played videos are consistent with this helper's two-line marker overlay
|
||||
- that confirms the strongest user-visible effect of retail `-debug`
|
||||
|
||||
### 3. `g_someDebugFlag` is only weakly understood
|
||||
|
||||
`1478:0845` is symbolized as `g_someDebugFlag` and is also written by the `-debug` branch.
|
||||
|
||||
Current evidence is weaker here:
|
||||
- the parser write is confirmed
|
||||
- no equally clear downstream reader has been recovered yet
|
||||
|
||||
Current safest read:
|
||||
- this is a real surviving `-debug` state cell
|
||||
- it may be vestigial, sparsely used, or hiding inside still-unrecovered data/indirect uses
|
||||
- it should not currently be overinterpreted as a debugger bootstrap flag
|
||||
|
||||
At this point the absence evidence is fairly strong:
|
||||
- direct data-use recovery still finds only the parser write
|
||||
- nearby bytes `1478:0844` and `1478:0846` do have real readers/writers, so this is not just a search blind spot across the whole region
|
||||
- current best read is `orphaned or still-hidden latch`, not `known active second feature`
|
||||
|
||||
## Comprehensive Current Effect Summary
|
||||
|
||||
Based on the current live NE evidence, retail `-debug` enables or changes the following; nothing beyond this has been proven yet:
|
||||
|
||||
### Confirmed direct effects
|
||||
|
||||
1. It prints `Debugging mode ON.` during command-line handling.
|
||||
2. It sets `g_debugMsgLevel` at `1478:87e0` to `10`.
|
||||
3. It sets `g_someDebugFlag` at `1478:0845` to `1`.
|
||||
4. It sets `g_someDebugFlag2` at `1478:0859` to `1`.
|
||||
|
||||
### Confirmed runtime-visible effect
|
||||
|
||||
1. It enables the seg1468 AVI/video-player timing overlay, which draws moving marker dots or traces near the bottom of played videos.
|
||||
|
||||
### Confirmed non-video effect class
|
||||
|
||||
1. It changes eligibility for existing debug/error print wrappers used across startup, config, cache, joystick, audio, item/glob, dispatch, and some runtime process code.
|
||||
2. A smaller subset of those callsites are blocking `print-and-wait` diagnostics used on failure/debug-stop paths in dispatch/gump allocation code.
|
||||
3. The recovered text sink for those wrappers is the program's handle-`1` stdio stream at `1478:6c46`, so the lane is standard-output text rather than a separate debugger-only channel.
|
||||
4. The recovered printable inventory is currently dominated by startup/status/progress strings plus a smaller number of failure-only diagnostics; it is not a rich always-on gameplay console.
|
||||
|
||||
### Things not currently proven as practical effects
|
||||
|
||||
1. A new visible text console or new text window.
|
||||
2. Any hidden usecode debugger bootstrap.
|
||||
3. Any connection to the seg109/seg1408 debugger-state pointer at `1478:659c/659e`.
|
||||
4. Any second confirmed user-visible feature beyond the AVI timing dots.
|
||||
5. Any active downstream behavior for `1478:0845`.
|
||||
|
||||
## What `-debug` Does **Not** Currently Prove
|
||||
|
||||
The hidden retail debugger / unit-inspector work already mapped elsewhere in this repo still centers on:
|
||||
- seg109 UI wrappers such as `usecode_debugger_open_for_current_unit` and `usecode_debugger_open_modal`
|
||||
- seg1408 debugger-state helpers such as `usecode_debugger_break_state_create` and `usecode_debugger_maybe_break_on_current_line`
|
||||
- the global debugger-state far pointer at `1478:659c/659e`
|
||||
|
||||
That `1478:659c/659e` pointer is still read by the known interpreter-side debugger path, including the break callback lane around `1418:04aa..04b5` and the seg109 debugger-opening wrappers.
|
||||
|
||||
What has **not** been shown in the current `-debug` pass:
|
||||
- no recovered write from the `-debug` branch to `1478:659c/659e`
|
||||
- no evidence that `-debug` calls `usecode_debugger_break_state_create`
|
||||
- no evidence that `-debug` enters `usecode_debugger_open_for_current_unit`
|
||||
- no evidence that `-debug` is the same switch as `-laurie`
|
||||
|
||||
So the current answer is:
|
||||
- `-debug` is real
|
||||
- `-debug` still does something
|
||||
- but it is currently **not** the same evidence trail as the hidden usecode debugger bootstrap
|
||||
|
||||
## Relationship To `-laurie`
|
||||
|
||||
The two switches should stay separated.
|
||||
|
||||
Current repo evidence still supports:
|
||||
- `-laurie` / `CheckForLaurieArg(...)` writes the cheat/debugger gate at `1478:0844` (`g_wasLaurieSet` / prior `cheats_enabled` lane)
|
||||
- `-debug` is handled inside the normal command-line option loop in `HandleCommandlineArgs`
|
||||
- `-debug` writes `1478:0845`, `1478:0859`, and `1478:87e0`, not `1478:0844`
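
The separation can be stated as a set comparison over the recovered write targets. The set names are illustrative; the offsets are the ones listed above.

```python
# DS offsets (segment 1478) written by each switch, per current evidence.
DEBUG_WRITES = {0x0845, 0x0859, 0x87E0}
LAURIE_WRITES = {0x0844}

shared = DEBUG_WRITES & LAURIE_WRITES
print("shared write targets:", sorted(hex(o) for o in shared))
print("control paths overlap:", bool(shared))
```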
|
||||
|
||||
That means the old hidden debugger/UI work and the `-debug` switch are adjacent only at a broad `debug features existed` level. They are not currently the same recovered control path.
|
||||
|
||||
## Best Current Conclusion
|
||||
|
||||
The wiki claim is only partly right.
|
||||
|
||||
Accurate part:
|
||||
- there really is a retail `-debug` command-line argument
|
||||
|
||||
Inaccurate or currently unsupported parts:
|
||||
- it is not merely a dead recognized string; the parser branch is still live
|
||||
- the current evidence does not support `secondary monitor debug kernel` specifically
|
||||
- the current evidence does not support `-debug` as the missing bootstrap for the hidden seg109/seg1408 usecode debugger
|
||||
|
||||
Best current evidence-backed replacement claim:
|
||||
|
||||
> Retail `CRUSADER.EXE` still recognizes and executes a live `-debug` branch. That branch prints `Debugging mode ON.`, raises the debug message level, and enables a concrete seg1468 AVI/video-player timing overlay that draws two 500-byte marker traces into adjacent scanlines near the bottom of the playback buffer. However, current evidence does not show it creating the seg1408 debugger-state object at `1478:659c/659e`, so it should not currently be treated as the missing bootstrap for the hidden usecode debugger UI.
|
||||
|
||||
## Claim Check: `E69FB` And The "Secondary Monochrome Monitor" Idea
|
||||
|
||||
An external claim said the potential debug instructions were at flat file offset `E69FB` and might only be visible on a secondary monochrome monitor.
|
||||
|
||||
Current evidence does not support that.
|
||||
|
||||
### `E69FB` mapping
|
||||
|
||||
Using the local NE segment map:
|
||||
- NE segment `144` begins at file offset `0xE3C00`
|
||||
- flat offset `0xE69FB` therefore lands at segment-relative offset `0x2dfb`
|
||||
- in the live NE image that maps to `1478:2dfb`
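
The mapping above is a single subtraction, sketched here for reuse with other flat offsets. The helper name `segment_relative` is hypothetical, and the association between NE segment `144` and selector `1478` is taken from the local segment map rather than derived.

```python
SEGMENT_FILE_START = 0xE3C00   # NE segment 144, per the local segment map
FLAT_OFFSET = 0xE69FB          # offset cited by the external claim


def segment_relative(flat_offset, segment_start):
    """Convert a flat file offset to an offset relative to the segment start."""
    if flat_offset < segment_start:
        raise ValueError("flat offset lies before the segment start")
    return flat_offset - segment_start


rel = segment_relative(FLAT_OFFSET, SEGMENT_FILE_START)
print(f"1478:{rel:04x}")
```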
|
||||
|
||||
The bytes around that location are not executable instructions for a hidden monitor-debug path. They fall inside a data/string cluster:
|
||||
- `KeyboardProcess`
|
||||
- `KEYIO.C`
|
||||
- `PRIORITY.C`
|
||||
- `SystemTimer`
|
||||
- `SYSTIMER.C`
|
||||
- `AccWait`
|
||||
|
||||
`1478:2dfb` itself lands inside the `SYSTIMER.C` string, not inside a code body.
|
||||
|
||||
Current safest read:
|
||||
- the cited flat offset is almost certainly a mistaken pointer into a data/descriptor/source-file-name region
|
||||
- it is not a useful anchor for `-debug` behavior and not evidence for hidden display-specific code by itself
|
||||
|
||||
### Secondary monochrome monitor check
|
||||
|
||||
The current `-debug` evidence points elsewhere:
|
||||
- text/debug output goes through the normal `stdout` sink at `1478:6c46`
|
||||
- the user-visible runtime feature is the seg1468 AVI timing overlay drawn into the main video buffer
|
||||
|
||||
Additional negative evidence from the live program:
|
||||
- no recovered text strings mentioning `mono`, `monochrome`, `hercules`, or `MDA`
|
||||
- no recovered instruction hits referencing obvious monochrome-adapter I/O ports such as `0x3b4`, `0x3b5`, `0x3b8`, or `0x3ba`
|
||||
- no recovered instruction hits referencing the `B000` monochrome text-memory window
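
For readers who want to repeat a rough version of the port check outside the disassembler, a naive byte scan is sketched below. It is not the method used for the results above: it only looks for the 16-bit `mov dx, imm16` encoding (`BA lo hi`), it can hit data bytes by accident, and it will miss ports loaded any other way. The `sample` bytes are synthetic test input, not bytes from `CRUSADER.EXE`.

```python
MONO_PORTS = (0x3B4, 0x3B5, 0x3B8, 0x3BA)


def find_mov_dx_imm16(blob, ports=MONO_PORTS):
    """Return sorted (offset, port) pairs where `mov dx, <port>` bytes appear."""
    hits = []
    for port in ports:
        pattern = bytes([0xBA]) + port.to_bytes(2, "little")
        start = 0
        while True:
            idx = blob.find(pattern, start)
            if idx == -1:
                break
            hits.append((idx, port))
            start = idx + 1
    return sorted(hits)


# Synthetic input: one `mov dx, 0x3b4` and one `mov dx, 0x3d4` (colour CRTC port).
sample = bytes([0x90, 0xBA, 0xB4, 0x03, 0xEE, 0xBA, 0xD4, 0x03])
for offset, port in find_mov_dx_imm16(sample):
    print(f"offset {offset:#x}: mov dx, {port:#x}")
```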
|
||||
|
||||
That does **not** mathematically prove that no historical or stripped monitor-debug code ever existed during development, but it does mean:
|
||||
- the current retail `-debug` evidence does not support the "secondary monochrome monitor" explanation
|
||||
- the current retail implementation is better explained as `stdout`-gated text plus the AVI timing overlay
|
||||
|
||||
## Ghidra Refinements Applied
|
||||
|
||||
The live `CRUSADER.EXE` database now carries this batch's first-pass refinements too:
|
||||
- `1468:2869` -> `VideoPlayer_AdvanceFrameAndHandleSkip`
|
||||
- `1468:2af4` -> `VideoPlayer_StreamChunks`
|
||||
- `1468:2de9` -> `VideoPlayer_DrawDebugTimingOverlay`
|
||||
- parser and global comments at `1048:0a93`, `1478:0845`, `1478:0859`, and `1478:87e0`
|
||||
- overlay-gate comments at `1468:2920` and `1468:2dc8`
|
||||
- debug-print helper comments at `12d0:0391`, `12d0:03ee`, and `12d0:0442`
|
||||
- sink/init comments at `1000:65cc` and `12d0:0513`
|
||||
- stdio-table comments at `1478:6c32` and `1478:6c46`
|
||||
|
||||
## Follow-Up Targets
|
||||
|
||||
If this lane is revisited, the highest-value remaining questions are:
|
||||
- identify a concrete behavioral name for `1478:0845` by finding a real downstream consumer
|
||||
- classify the remaining nearby seg1468 helpers so the AVI/video-player object layout around `+0x117/+0x11b/+0x11f/+0x123` can be named cleanly
|
||||
- re-test the overlay during intro/cutscene playback in DOSBox or another live runtime, so the already-observed moving dots are confirmed under controlled conditions
|
||||
- find meaningful callsites into the seg12d0 positioned debug-print helpers to learn whether `-debug` exposes additional text diagnostics beyond movie timing markers
|
||||
- determine the default runtime value path for `g_debugMsgLevel` more rigorously, since static initialized data alone does not yet explain the full practical print behavior
|
||||
- sample a few representative `ConsolePrintf` / `DebugPrintAndWaitForInput` format strings under live capture so the stdout lane can be characterized with runtime output rather than only static caller families
|
||||
180
docs/usecode-alarmhat-analysis.md
Normal file
|
|
@ -0,0 +1,180 @@
|
|||
# ALARMHAT Analysis
|
||||
|
||||
## Purpose
|
||||
|
||||
This note records the current evidence-backed read of exported `ALARMHAT` pseudocode and what the class most likely means in gameplay.
|
||||
|
||||
The goal is not to force a final rename. The goal is to state what the script definitely does, what it probably does, and where the remaining uncertainty sits.
|
||||
|
||||
## Sources used
|
||||
|
||||
- exported pseudocode: `USECODE/EUSECODE_extracted/pseudocode/ALARMHAT/slot_0A_equip.txt`
|
||||
- class/event index rows for `ALARMHAT`
|
||||
- raw linear disassembly in `K:/ghidra/crusader-disasm/crusader_disasm.txt`
|
||||
- nearby alarm-family comparators:
  - `ALARMBOX`
  - `ALRMTRIG`
  - `ALARM_NS`
  - `ALARM_EW`
|
||||
|
||||
## Short version
|
||||
|
||||
`ALARMHAT` is not a general many-event alarm controller. In the extracted corpus it has one live body: slot `0x0A` (`equip`).
|
||||
|
||||
That body behaves like an alarm-family state controller attached to an item. It does two different things depending on the current frame of the item:
|
||||
|
||||
1. in frame `0`, it searches nearby shape `0x04D0` objects and equips qualifying ones with mode `0x17`
|
||||
2. in non-zero frames, it first requires the item to be on-screen, then performs a nearby actor/family scan, and if that passes it searches nearby shape `0x04D0` objects and equips qualifying ones with mode `0x15`
|
||||
|
||||
The likely gameplay read is: `ALARMHAT` is a local alarm-state driver that flips nearby helper objects or actors into one of two equipment/activation states, with the second state gated by player-visible activity near the item.
|
||||
|
||||
## Structural facts
|
||||
|
||||
From `class_event_index.tsv`:
|
||||
|
||||
- class: `ALARMHAT`
|
||||
- class id: `0x0561`
|
||||
- only decoded non-zero slot: `0x0A equip`
|
||||
- body window: `0x00D4..0x025F`
|
||||
- body length: `395` bytes
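
The window and length can be cross-checked with one subtraction. The listed length equals `end - start`, which suggests the window's upper bound is exclusive (or that the length excludes one terminating byte); that reading is an inference from the numbers, not something the index states.

```python
BODY_START = 0x00D4
BODY_END = 0x025F

half_open_length = BODY_END - BODY_START       # end treated as exclusive
inclusive_length = BODY_END - BODY_START + 1   # end treated as inclusive

print(f"half-open length: {half_open_length}")
print(f"inclusive length: {inclusive_length}")
```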
|
||||
|
||||
From the debug trailer/local names in the exported body:
|
||||
|
||||
- locals currently render as `referent`, `var`, `item`, and `npc`
|
||||
|
||||
Those local names are useful hints, but the behavior matters more than the variable spelling.
|
||||
|
||||
## Direct script behavior
|
||||
|
||||
The body begins with the standard alarm-family setup:
|
||||
|
||||
```text
set_info(0x0211, *(arg_06));
process_exclude();
```
|
||||
|
||||
That matches `ALARMBOX::equip`, which also begins with `set_info(0x0211, *(arg_06)); process_exclude();`.
|
||||
|
||||
After that, the script splits on `Item.getFrame(arg_06)`.
|
||||
|
||||
### Branch 1: frame == 0
|
||||
|
||||
The raw disassembly shows a loop-selector sequence equivalent to:
|
||||
|
||||
- search nearby items
|
||||
- constrain by `item->shape == 0x04D0`
|
||||
- use the current item as the search origin
|
||||
|
||||
For each matching object:
|
||||
|
||||
- call `Item.getFrame(item)`
|
||||
- only continue on frame `0`
|
||||
- call `Item::I_equip(pid, 0x17, item)`
|
||||
- `suspend`
|
||||
|
||||
So the first branch is not sounding an alarm by itself. It is driving nearby shape `0x04D0` objects into a specific equip/activation mode.
|
||||
|
||||
### Branch 2: frame != 0
|
||||
|
||||
This branch only runs if `Item::I_isOnScreen(arg_06)` passes.
|
||||
|
||||
It then performs another nearby search, this time using `item->family` with family value `6`. Inside that scan it uses:
|
||||
|
||||
- `Actor::I_isNPC(...)`
|
||||
- `Item::I_getZ(...)`
|
||||
- a vertical-band test of `origin_z - 10 < candidate_z < origin_z + 10`
|
||||
|
||||
The exact truth sense of the `Actor::I_isNPC` step is still slightly uncertain because we are reading it through the current pseudo-IR condition simplifier, but the script is clearly trying to qualify nearby actor-like entities in the same local vertical band before proceeding.
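
The vertical-band test is simple enough to state exactly. This is a sketch of the comparison as quoted above, with strict inequalities on both sides; `in_vertical_band` is a hypothetical helper name.

```python
Z_BAND = 10


def in_vertical_band(origin_z, candidate_z, band=Z_BAND):
    """True when candidate_z lies strictly within +/- band of origin_z."""
    return origin_z - band < candidate_z < origin_z + band


for candidate in (91, 100, 109, 110):
    print(candidate, in_vertical_band(100, candidate))
```

Because both comparisons are strict, a candidate exactly `10` units above or below the origin does not qualify.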
|
||||
|
||||
If that actor/family gate succeeds, the script runs a second nearby shape search for `item->shape == 0x04D0`, again requiring frame `0` on the found object, then calls:
|
||||
|
||||
```text
Item::I_equip(pid, 0x15, item)
```
|
||||
|
||||
followed by `suspend`.
|
||||
|
||||
So the non-zero-frame branch is the more selective mode: it only arms the nearby `0x04D0` helper objects once a local visibility/actor-proximity condition is satisfied.
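
Both branches can be summarized in a compact Python model. This is a reading aid under stated assumptions, not a decompilation: the `Item` dataclass and all function names are hypothetical, "nearby" is passed in as a plain list, the `set_info` / `process_exclude` prologue and the `suspend` calls are omitted, and the uncertain truth sense of the `Actor::I_isNPC` step is exposed as the `npc_sense` parameter.

```python
from dataclasses import dataclass

SHAPE_TARGET = 0x04D0
FAMILY_SCANNED = 6
MODE_FRAME_ZERO = 0x17
MODE_FRAME_NONZERO = 0x15


@dataclass
class Item:
    """Hypothetical stand-in for the fields the script inspects."""
    shape: int
    frame: int = 0
    family: int = 0
    z: int = 0
    is_npc: bool = False
    on_screen: bool = True


def equip_targets(nearby):
    """Nearby shape-0x04D0 objects that are currently on frame 0."""
    return [it for it in nearby if it.shape == SHAPE_TARGET and it.frame == 0]


def actor_gate(origin, nearby, npc_sense=True):
    """Family-6 scan with the isNPC step and the +/-10 vertical band."""
    return any(
        it.family == FAMILY_SCANNED
        and it.is_npc == npc_sense
        and origin.z - 10 < it.z < origin.z + 10
        for it in nearby
    )


def alarmhat_equip(origin, nearby, npc_sense=True):
    """Return (item, mode) pairs the body would pass to Item::I_equip."""
    if origin.frame == 0:
        return [(it, MODE_FRAME_ZERO) for it in equip_targets(nearby)]
    if not origin.on_screen:
        return []
    if not actor_gate(origin, nearby, npc_sense):
        return []
    return [(it, MODE_FRAME_NONZERO) for it in equip_targets(nearby)]
```

Flipping `npc_sense` to `False` models the alternative reading of the `Actor::I_isNPC` condition without changing the rest of the flow.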
|
||||
|
||||
## What shape `0x04D0` likely is
|
||||
|
||||
The old disassembly corpus labels usecode class `0x04D0` as `MONSTER`.
|
||||
|
||||
That does not prove every shape-id use of `0x04D0` is literally a monster actor object, but it is a strong clue that `ALARMHAT` is not interacting with an arbitrary visual prop. It is probably targeting a helper/actor class in the monster or hostile-response lane.
|
||||
|
||||
This is the strongest reason not to read `ALARMHAT` as just a decorative siren hat sprite. The script is actively scanning for nearby `0x04D0` objects and equipping them.
|
||||
|
||||
## Comparison to the rest of the alarm family
|
||||
|
||||
`ALARMHAT` fits the broader alarm family, but it is not identical to the other alarm classes.
|
||||
|
||||
### `ALARM_NS` and `ALARM_EW`
|
||||
|
||||
These are tiny `enterFastArea` stubs. They mainly stamp info, exclude themselves from processing, and gate on a simple intrinsic.
|
||||
|
||||
Those look like directional/environment trigger markers.
|
||||
|
||||
### `ALRMTRIG`
|
||||
|
||||
This class is a compact trigger/spawner. Its `equip` body branches on map/state and spawns `class_0A18_slot_20(...)` with different mode values.
|
||||
|
||||
That looks more like an alarm-event relay.
|
||||
|
||||
### `ALARMBOX`
|
||||
|
||||
This is the closest comparator.
|
||||
|
||||
`ALARMBOX::equip`:
|
||||
|
||||
- uses the same `set_info(0x0211)` + `process_exclude()` prologue
|
||||
- branches by local state and frame
|
||||
- spawns different helper slots for low/high alarm states
|
||||
- can also spawn `class_0A18_slot_20(...)`
|
||||
|
||||
So `ALARMBOX` reads like a more explicit alarm control box, while `ALARMHAT` reads like a local accessory/controller that pushes nearby helper objects into one of two equip states.
|
||||
|
||||
## Gameplay interpretation
|
||||
|
||||
The safest gameplay-facing read is:
|
||||
|
||||
`ALARMHAT` is an alarm-state accessory or controller item that toggles nearby hostile/helper objects between two alarm-response modes.
|
||||
|
||||
The second mode only activates when the item is visible on screen and when a nearby actor-like entity qualifies within a narrow Z band, which strongly suggests local encounter awareness rather than a map-global trigger.
|
||||
|
||||
In practical gameplay terms, the object likely does something like one of these:
|
||||
|
||||
- wake up or arm nearby hostile responders
|
||||
- switch nearby helper entities between idle and alert states
|
||||
- mark a local alarm point as actively triggered once the player or another actor is nearby
|
||||
|
||||
The name `ALARMHAT` therefore probably reflects object art or designer shorthand rather than a full behavior description. The script behavior is closer to `alarm accessory that equips nearby monster/helper actors into a response mode` than to `play a siren sound` or `flash a light`.
|
||||
|
||||
## Confidence and uncertainty
|
||||
|
||||
High confidence:
|
||||
|
||||
- `ALARMHAT` has one live exported body, slot `0x0A equip`
|
||||
- it uses the same `0x0211` alarm-family setup as `ALARMBOX`
|
||||
- it branches on its own frame
|
||||
- it performs nearby searches on shape `0x04D0`
|
||||
- it calls `Item::I_equip(...)` on qualifying found objects with two different mode values: `0x17` and `0x15`
|
||||
- the second mode is gated by on-screen and nearby actor/family checks
|
||||
|
||||
Moderate confidence:
|
||||
|
||||
- shape `0x04D0` is a monster/helper/hostile-response class rather than an inert prop
|
||||
- the non-zero-frame branch is the alert or escalation state
|
||||
|
||||
Lower confidence / still open:
|
||||
|
||||
- the exact semantic meaning of equip modes `0x17` versus `0x15`
|
||||
- the precise truth sense of the `Actor::I_isNPC`-named intrinsic inside the family-`6` scan
|
||||
- whether `ALARMHAT` is physically a wearable-looking prop, a mounted alarm device, or a small controller object whose art happened to be named `HAT`
|
||||
|
||||
## Current best rename-level takeaway
|
||||
|
||||
Do not rename it yet.
|
||||
|
||||
But the current best mental model is:
|
||||
|
||||
`ALARMHAT = local alarm-state driver that equips nearby MONSTER/helper objects into one of two response modes depending on frame/state and nearby actor visibility.`
|
||||
248
docs/usecode-equipment-system.md
Normal file
|
|
@ -0,0 +1,248 @@
|
|||
# Crusader USECODE Equipment System
|
||||
|
||||
## Purpose
|
||||
|
||||
This note records the current evidence-backed read of Crusader's surviving `equip` / `unequip` system.
|
||||
|
||||
The short answer is: yes, a real inherited equipment-style event system survived into Crusader, but by this point it has been generalized far beyond RPG inventory handling. In Crusader, `equip` and `unequip` are standard item usecode events that many classes reuse for actor setup, turret arming, trap activation, alarm-state changes, and environmental control.
|
||||
|
||||
## Short version
|
||||
|
||||
The tongue-in-cheek reading, that an action game like Crusader still hides an RPG equipment system, is half true and half misleading.
|
||||
|
||||
What is true:
|
||||
|
||||
- Crusader still has first-class `equip` and `unequip` item events.
|
||||
- The live binary has dedicated intrinsics for them.
|
||||
- The extracted USECODE corpus has a large number of slot `0x0A equip` and slot `0x0B unequip` bodies.
|
||||
- Actor-like classes such as `NPC` and `MONSTER` really do implement `equip` bodies.
|
||||
|
||||
What is misleading:
|
||||
|
||||
- this is not yet proof of a fully intact Ultima-style paper-doll equipment subsystem
|
||||
- many `equip` bodies are plainly being used as generalized state-change hooks rather than literal "put item on actor" logic
|
||||
- alarms, sentries, guns, floors, conveyors, cameras, and hazard objects all reuse the same event vocabulary
|
||||
|
||||
Current best read:
|
||||
|
||||
Crusader inherited the Ultima 8 event names and event slots, then repurposed them into a broad object-activation vocabulary. The old RPG-flavored name `equip` survived, but its semantics widened into `apply mode / arm / initialize / attach behavior / activate state`.
|
||||
|
||||
## Compiled-side proof
|
||||
|
||||
The live `CRUSADER.EXE` session already has named intrinsic handlers for these events:
|
||||
|
||||
- `10a0:2a35` = `Item_Equip`
|
||||
- `10a0:2a68` = `Item_Unequip`
|
||||
- nearby sibling `10a0:2b30` = `Item_EnterFastArea`
|
||||
|
||||
The decompiler output for `Item_Equip` is the key proof:
|
||||
|
||||
```c
word __cdecl16far Item_Equip(int *pitemno,uint dest)
{
  res = Usecode_ItemCallEvent(pitemno,0x400,UC_Equip,...);
  if (res != 0) {
    return in_stack_0000fffc;
  }
  return 0;
}
```
|
||||
|
||||
`Item_Unequip` is parallel:
|
||||
|
||||
```c
word __cdecl16far Item_Unequip(int *pitemno,int val)
{
  res = Usecode_ItemCallEvent(pitemno,0x800,UC_Unequip,...);
  if (res != 0) {
    return in_stack_0000fffc;
  }
  return 0;
}
```
|
||||
|
||||
That proves several things immediately:
|
||||
|
||||
1. `equip` and `unequip` are real compiled-side usecode events, not parser inventions.
|
||||
2. They are gated by dedicated owner-row capability masks: `0x400` for equip and `0x800` for unequip.
|
||||
3. The intrinsic does not directly manipulate an inventory table itself. It forwards into a generic item-event dispatcher.
|
||||
|
||||
The central dispatcher decompiles as `Usecode_ItemCallEvent`. A comment in its current Ghidra decompile describes it as a generic owner-row mask gate that performs these steps:
|
||||
|
||||
- compute item usecode class id
|
||||
- consult the owner-row table behind the runtime at `0x6611`
|
||||
- test the supplied capability mask
|
||||
- if the row supports the event, create a usecode process for the requested event
|
||||
|
||||
That is the architectural core of the surviving system: `equip` and `unequip` are class capabilities, not hardwired game-engine special cases.
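As a rough illustration of the gate described above, the following Python sketch models the owner-row lookup as a dictionary of capability bitmasks. Only the two mask values `0x400` and `0x800` come from the decompile; the function name `item_call_event`, the `owner_rows` dictionary, the class ids, and the returned record are hypothetical stand-ins, and the real table layout behind `0x6611` is not modeled.

```python
# Hypothetical sketch of an owner-row mask gate; this is not recovered code.
EQUIP_MASK = 0x400    # mask passed by Item_Equip in the decompile
UNEQUIP_MASK = 0x800  # mask passed by Item_Unequip in the decompile


def item_call_event(owner_rows, class_id, mask, event_name):
    """Return a stand-in 'process' record if the class row supports the event.

    owner_rows is an assumed dict mapping class id -> capability bitmask.
    """
    row_mask = owner_rows.get(class_id, 0)
    if (row_mask & mask) == 0:
        return None  # the class does not advertise this event
    return {"class_id": class_id, "event": event_name}


# Made-up rows: class 1 supports both events, class 2 supports only equip.
owner_rows = {1: EQUIP_MASK | UNEQUIP_MASK, 2: EQUIP_MASK}
print(item_call_event(owner_rows, 1, UNEQUIP_MASK, "unequip"))
print(item_call_event(owner_rows, 2, UNEQUIP_MASK, "unequip"))
```

The point of the sketch is only that the intrinsic never touches an inventory structure: support for an event is a per-class bit, and everything else is left to the class's usecode body.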
|
||||
|
||||
## Cross-check against the engine source lineage
|
||||
|
||||
ScummVM's Ultima8/Crusader code preserves the same event interpretation.
|
||||
|
||||
In `engines/ultima/ultima8/world/item.cpp`:
|
||||
|
||||
- `callUsecodeEvent_equipWithParam(param)` is explicitly documented as event `A`
|
||||
- `callUsecodeEvent_unequipWithParam(param)` is explicitly documented as event `B`
|
||||
|
||||
That source also shows the event neighborhood:
|
||||
|
||||
- event `9` = release
|
||||
- event `A` = equip
|
||||
- event `B` = unequip
|
||||
- event `C` = combine
|
||||
- event `E` = called from anim
|
||||
- event `F` = enter fast area
|
||||
- event `10` = leave fast area
|
||||
- event `11` = cast
|
||||
|
||||
This strongly supports the idea that Crusader inherited an Ultima-style item event vocabulary wholesale and kept using it even after the gameplay moved away from a classic RPG.
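The event neighborhood can be written down as a small enumeration. The sketch below also checks one arithmetic observation: the two masks recorded in the compiled-side section (`0x400` for equip, `0x800` for unequip) equal `1 << slot` for slots `0x0A` and `0x0B`. Extending that relation to the other slots is an untested assumption, and the names `ItemEvent` and `assumed_mask` are hypothetical.

```python
from enum import IntEnum


class ItemEvent(IntEnum):
    """Event slots as listed in the note (hex slot numbers)."""
    RELEASE = 0x09
    EQUIP = 0x0A
    UNEQUIP = 0x0B
    COMBINE = 0x0C
    CALLED_FROM_ANIM = 0x0E
    ENTER_FAST_AREA = 0x0F
    LEAVE_FAST_AREA = 0x10
    CAST = 0x11


def assumed_mask(event):
    """ASSUMPTION: capability mask is one bit per slot (1 << slot)."""
    return 1 << int(event)


# Only these two values are backed by the decompile.
for ev in (ItemEvent.EQUIP, ItemEvent.UNEQUIP):
    print(f"{ev.name}: slot 0x{int(ev):02X} -> mask 0x{assumed_mask(ev):X}")
```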
|
||||
|
||||
## How widespread it is in the extracted corpus
|
||||
|
||||
The current exported pseudocode corpus contains:
|
||||
|
||||
- `77` slot `0x0A equip` entries
|
||||
- `50` slot `0x0B unequip` entries
|
||||
|
||||
That is far too widespread to be accidental or restricted to a tiny one-off subsystem.
|
||||
|
||||
The distribution is also telling:
|
||||
|
||||
- actor/NPC classes: `NPC`, `MONSTER`
|
||||
- turret/weapon classes: `BASEGUN`, `SENTRY`, `GATGUN*`, `WALGUN*`, `GOVGUN*`
|
||||
- alarm/hazard/environment classes: `ALARMBOX`, `ALARMHAT`, `FFFLOOR`, `FLAMEBOX`, `STEAMBOX`
|
||||
- cameras and movers: `CAM_*`, `EYECAM*`, `CONVEY_*`, `ROLL_*`, `HOVER*`
|
||||
|
||||
So the names survived as a general event interface across many object families.
|
||||
|
||||
## What `equip` means in practice
|
||||
|
||||
### 1. General activation or state application
|
||||
|
||||
Many non-actor classes use `equip` exactly the way a modern engine might use `activate`, `arm`, `enable`, or `set mode`.
|
||||
|
||||
Representative example: `BASEGUN::equip`
|
||||
|
||||
```text
set_info(0x0211, *(arg_06));
process_exclude();
spawn class_0A1A_slot_24(pid, arg_0A, ..., arg_06);
suspend;
```
|
||||
|
||||
Parallel example: `SENTRY::equip`
|
||||
|
||||
Its exported pseudocode is the same structural pattern as `BASEGUN::equip`, and `SENTRY::unequip` mirrors the same shutdown logic as `BASEGUN::unequip`.
|
||||
|
||||
That is not wearable-item behavior. It is a clean arm/disarm lifecycle.
|
||||
|
||||
### 2. Alarm-state or environmental mode switching
|
||||
|
||||
`ALARMBOX::equip` and `ALARMHAT::equip` are good examples of the generalized meaning.
|
||||
|
||||
They both begin with:
|
||||
|
||||
```text
set_info(0x0211, *(arg_06));
process_exclude();
```
|
||||
|
||||
Then they branch by frame/state and spawn or equip helper objects in specific response modes.
|
||||
|
||||
`ALARMHAT` is especially illustrative because it calls `Item::I_equip(pid, 0x17, item)` and `Item::I_equip(pid, 0x15, item)` on nearby `shape 0x04D0` helper objects. That is exactly where the older RPG-flavored naming sounds absurdly literal, but the behavior is really `push nearby helper into alarm response mode`.
|
||||
|
||||
### 3. NPC and monster setup
|
||||
|
||||
Actor-facing classes really do implement `equip`, but the current evidence still points to scripted setup and state application rather than visible inventory slot management.
|
||||
|
||||
Representative example: `NPC::equip`
|
||||
|
||||
The exported body does not look like a paper-doll or inventory routine. It looks like a setup dispatcher keyed by `arg_0A`:
|
||||
|
||||
- `arg_0A == 10`: reset, teleport to egg, gather several values, create or legalize the NPC/item context, then spawn `class_0A11_slot_29(...)`
|
||||
- `arg_0A == 30`: read several values with `Intrinsic00DF(...)`
|
||||
- `arg_0A == 31`: suspend
|
||||
- `arg_0A == 1/2/3`: explicit but empty branches
|
||||
|
||||
That reads much more like `apply initialization mode` or `run NPC setup variant` than `equip helmet`.
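One way to picture the dispatcher shape is a plain mode table. The sketch below only restates the branch list above; the table name, the function name, and the fallback string are hypothetical, and the mode numbers are copied as written in the note without deciding whether the exporter printed them in decimal or hex.

```python
# Hypothetical summary table for NPC::equip branches keyed by arg_0A.
# Descriptions paraphrase the exported pseudocode; they are not new findings.
NPC_EQUIP_MODES = {
    10: "reset, teleport to egg, gather values, create/legalize context, "
        "spawn class_0A11_slot_29",
    30: "read several values with Intrinsic00DF",
    31: "suspend",
    1: "explicit but empty branch",
    2: "explicit but empty branch",
    3: "explicit but empty branch",
}


def describe_npc_equip(arg_0a):
    """Return the recovered branch summary, or a marker for unknown modes."""
    return NPC_EQUIP_MODES.get(arg_0a, "no branch recovered")


for mode in (10, 31, 2, 99):
    print(mode, "->", describe_npc_equip(mode))
```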
|
||||
|
||||
Representative example: `MONSTER::equip`
|
||||
|
||||
`MONSTER::equip` checks frame/state, stamps `0x0211`, then spawns `class_0A1E_slot_2D(pid, var, monster1, arg_06)` for a range of mode values. Again, that looks like a dispatcher for monster behavior setup or activation, not a visible inventory pane.
|
||||
|
||||
## What `unequip` means in practice
|
||||
|
||||
`unequip` is the paired shutdown or detachment event.
|
||||
|
||||
Representative example: `BASEGUN::unequip`
|
||||
|
||||
```text
set_info(0x0212, *(arg_06));
process_exclude();
if (Item.getStatus(arg_06) & 4) {
  spawn class_0A1A_slot_27(arg_06);
}
```
|
||||
|
||||
`SENTRY::unequip` is the same pattern.
|
||||
|
||||
That strongly suggests `unequip` has become the general reverse transition for classes that use `equip` as `arm` or `enable`.
|
||||
|
||||
Environmental example: `FFFLOOR::unequip`
|
||||
|
||||
This body does not resemble inventory logic at all. It scans nearby family-`6` objects, filters by overlap, and then spawns an actor-facing helper `class_0A11_slot_2D(retval, *(arg_06), item)` before suspending.
|
||||
|
||||
So by the time we reach Crusader, `unequip` has clearly broadened into `deactivate / detach / reverse state / cleanup side effects`.
|
||||
|
||||
## What survived from the RPG ancestor
|
||||
|
||||
The surviving part is not just the word choice. The system architecture survived too:
|
||||
|
||||
- item classes advertise event capabilities through owner-row mask bits
|
||||
- the engine exposes generic intrinsics that dispatch into class-specific usecode handlers
|
||||
- the event numbers remain organized in an Ultima-style item-event vocabulary
|
||||
- actor and item classes both participate in the same event model
|
||||
|
||||
That is exactly the kind of subsystem that makes sense as an inherited RPG engine layer which later got repurposed for an action game.
|
||||
|
||||
## What we do not have yet
|
||||
|
||||
We do **not** yet have proof of a fully intact classic RPG equipment layout such as:
|
||||
|
||||
- fixed visible paper-doll slots
|
||||
- a clean actor inventory table indexed by helmet/armor/weapon slots
|
||||
- universal actor gear attachment semantics across all NPCs
|
||||
|
||||
The current evidence is stronger for:
|
||||
|
||||
- an inherited event vocabulary with `equip` / `unequip`
|
||||
- class-specific scripted setup and mode application
|
||||
- actor and environmental reuse of the same event names
|
||||
|
||||
So the safe statement is:
|
||||
|
||||
Crusader definitely has a hidden equipment **event system**.
|
||||
|
||||
Crusader may also retain deeper actor-equipment semantics under that layer, but that is not yet proven from the current pass.
|
||||
|
||||
## Current best model
|
||||
|
||||
The most defensible current model is:
|
||||
|
||||
1. Ultima-style item events `A = equip` and `B = unequip` survived into Crusader.
|
||||
2. The compiled engine still dispatches them through generic item-event intrinsics.
|
||||
3. Owner-row capability bits decide which classes support those events.
|
||||
4. USECODE then interprets `equip` and `unequip` per class.
|
||||
5. In Crusader, many classes reinterpret those events as `activate/deactivate`, `arm/disarm`, `initialize/teardown`, or `switch mode`, while actor-facing classes also use them for setup-like behavior.
|
||||
|
||||
## Good follow-up targets
|
||||
|
||||
The best next places to deepen this are:
|
||||
|
||||
1. compare more actor-facing equip bodies (`NPC`, `MONSTER`, `ANDROID`, `AVATAR`) against each other
|
||||
2. find compiled-side consumers of the value passed as the `equip` parameter to see whether it maps to actor slots, modes, or stance/state enums
|
||||
3. recover more names around the helper slots repeatedly spawned from equip bodies, especially the `0A11`, `0A1A`, and `0A1E` families
|
||||
4. check whether any actor-only classes expose both equip and unequip in ways that look like real gear attach/detach rather than generic initialization
|
||||
5. decode more loop-selector idioms so actor/environment equip bodies become less VM-shaped
|
||||
6. cross-check ScummVM Crusader actor/inventory code for surviving slot semantics that may still map to the retail binary
|
||||
140
docs/usecode-jelyhack-analysis.md
Normal file
|
|
@ -0,0 +1,140 @@
|
|||
# JELYHACK USECODE Analysis
|
||||
|
||||
## Scope
|
||||
|
||||
This note focuses on the currently exported pseudocode and byte-level decode for `JELYHACK` class `277` / class id `0x04D3`, especially its only non-zero event body: slot `0x01` (`use`).
|
||||
|
||||
Current generated pseudocode lives at:
|
||||
|
||||
- `USECODE/EUSECODE_extracted/pseudocode/JELYHACK/slot_01_use.txt`
|
||||
- `USECODE/EUSECODE_extracted/pseudocode/JELYH2/slot_01_use.txt`
|
||||
|
||||
## Direct decompilation result
|
||||
|
||||
Current readable decompilation for `JELYHACK::use`:
|
||||
|
||||
```text
function jelyhack_use() /* entry=277 class_id=0x04D3 slot=0x01 */
{
entry:
  set_info(0x0207, *(arg_06));
  process_exclude();
  return;

}
```
|
||||
|
||||
Byte-faithful decode of the same body:
|
||||
|
||||
```text
00D4: 5A init local_bytes=0x0
00D6: 5C symbol_info symbol=JELYHACK
00E2: 0B push_word_immediate value_u16=0x0207
00E5: 40 push_local_dword [BP+06h]
00E7: 4C push_indirect size=0x2
00E9: 77 set_info
00EA: 78 process_exclude
00EB: 5B line_number line_number=0x00DB
00EE: 50 ret
```
|
||||
|
||||
The parser still sees bytes after `ret`, but on the current readable pass they are intentionally elided because control has already returned. Those post-`ret` bytes are identical between `JELYHACK` and `JELYH2`, so they do not currently support any class-specific behavioral claim.
|
||||
|
||||
## What is directly supported
|
||||
|
||||
1. `JELYHACK` is not an active event hub in the same sense as `EVENT`, `NPCTRIG`, or `_BOOT` classes.
|
||||
2. Its only non-zero slot is `0x01` (`use`), with `raw_event_entry_word = 0x002A`, `raw_code_offset = 0x00000001`, and body range `0x00D4..0x00FE` (`42` bytes).
|
||||
3. The actual readable body before return is tiny: one `set_info(0x0207, *(arg_06))`, one `process_exclude()`, then return.
|
||||
4. `JELYH2` is the same script body for practical purposes. The only pre-return differences are the symbol label string and line-number metadata; control flow and active ops are otherwise the same.
|
||||
5. `JELYHACK` still exposes only the `referent` field on the descriptor side, which keeps it in the `referent-anchor` category rather than the event-bearing category.
|
||||
|
||||
## Comparison against the exported pseudocode corpus
|
||||
|
||||
The `JELYHACK::use` body is not unique. Normalizing away only the function header, the exact same readable body currently appears in these seven exports:
|
||||
|
||||
- `AVATAR/slot_01_use.txt`
|
||||
- `GRAVITON/slot_01_use.txt`
|
||||
- `IONIC/slot_01_use.txt`
|
||||
- `JELYHACK/slot_01_use.txt`
|
||||
- `JELYH2/slot_01_use.txt`
|
||||
- `PLASMA/slot_01_use.txt`
|
||||
- `WEA_BOOT/slot_01_use.txt`
|
||||
|
||||
That matters because it argues against a special-purpose `JELYHACK` event implementation. The more defensible reading is that this is one small generic `use` stub reused across several classes.
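The comparison behind that list can be approximated with a few lines of Python. This is a minimal sketch, assuming each export is available as a string and that the only line to ignore is the one starting with `function `; the helper names and the third sample body are made up for illustration, and the real exporter output may need more normalization than this.

```python
def normalize_body(text):
    """Strip whitespace and drop the 'function ...' header line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return "\n".join(ln for ln in lines if not ln.startswith("function "))


def group_identical(exports):
    """Group export names whose normalized bodies match exactly."""
    groups = {}
    for name, text in exports.items():
        groups.setdefault(normalize_body(text), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


STUB = """{
entry:
set_info(0x0207, *(arg_06));
process_exclude();
return;
}"""

exports = {
    "JELYHACK/slot_01_use.txt": "function jelyhack_use()\n" + STUB,
    "JELYH2/slot_01_use.txt": "function jelyh2_use()\n" + STUB,
    # Made-up body that differs in its info id, so it must not group.
    "EXAMPLE/slot_0A_equip.txt": "function example_equip()\n"
                                 + STUB.replace("0x0207", "0x0211"),
}
print(group_identical(exports))
```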
|
||||
|
||||
There is also a second useful comparison: other classes often start with the same `set_info(...); process_exclude();` prologue and then continue into much richer logic. `DATALINK::use` is a good example: it begins with the same opening but then expands into a larger branch-heavy routine. So the `set_info/process_exclude` pair is best treated as a common preamble, not the whole semantic payload of a class family.
|
||||
|
||||
## What `JELYHACK::use` most likely does
|
||||
|
||||
Current safest reading:
|
||||
|
||||
- It performs a small generic `use`-entry setup using info id `0x0207` and the dereferenced word at `arg_06`.
|
||||
- It then marks the current process or handler for exclusion/suppression through `process_exclude()`.
|
||||
- It does not itself implement the richer trigger logic one would expect from an active gameplay event script.
|
||||
|
||||
The unresolved part is the exact gameplay meaning of `set_info(0x0207, *(arg_06))`. The exported corpus shows this exact pattern in multiple unrelated classes, so `0x0207` currently looks more like a shared UI/message/interaction setup code than a JELYHACK-specific action.
|
||||
|
||||
## What it probably does not do
|
||||
|
||||
The current evidence argues against several stronger claims:
|
||||
|
||||
- It is probably not the main script that drives the JELYHACK gameplay behavior by itself.
|
||||
- It is probably not the actual event-bearing payload that reaches the richer runtime opcode lanes recovered around `000d:208b`, `000d:21ed`, and `000d:22bc`.
|
||||
- It is probably not a unique script template specialized only for the JELYHACK object.
|
||||
|
||||
## Relationship to JELYH2 and the surrounding island
|
||||
|
||||
`JELYHACK` and `JELYH2` remain the clearest referent-anchor twins in the extracted USECODE data:
|
||||
|
||||
- same lone live slot `0x01`
|
||||
- same event-table row shape (`0x002A / 0x00000001`)
|
||||
- same `42`-byte body length
|
||||
- same readable `use` body before return
|
||||
- same descriptor-side role: `referent-anchor`
|
||||
|
||||
This fits the broader neighborhood evidence already captured elsewhere:
|
||||
|
||||
- `JELYHACK` / `JELYH2` sit beside event-bearing neighbors such as `REE_BOOT`, `SURCAMEW`, and `SFXTRIG`
|
||||
- those neighbors expose `event` or `eventTrigger` fields and carry materially richer behavior bodies
|
||||
- the current best model is therefore still `referent anchor + neighboring event-bearing attachment`, not `JELYHACK as a standalone active event core`
|
||||
|
||||
## Comparison with nearby event-bearing neighbors
|
||||
|
||||
The generated pseudocode reinforces that split.
|
||||
|
||||
`REE_BOOT` slot `0x0A` and `SFXTRIG` slot `0x0A` diverge immediately from the JELYHACK body:
|
||||
|
||||
- they use `set_info(0x0211, *(arg_06))` instead of `0x0207`
|
||||
- they perform status checks, helper calls, spawns, waits/suspends, and other active logic
|
||||
- they therefore look like genuine event-bearing routines rather than passive anchor stubs
|
||||
|
||||
`DATALINK` slot `0x01` is also instructive in the other direction:
|
||||
|
||||
- it starts with the same `set_info/process_exclude` preamble
|
||||
- but it then continues into a much larger routine
|
||||
- so the shared opening is not enough to classify a body as anchor-only or event-bearing by itself
|
||||
|
||||
## Current best conclusion
|
||||
|
||||
The decompiled `JELYHACK::use` body is important mainly because it supports a negative result cleanly:
|
||||
|
||||
- `JELYHACK` is not hiding a large active event script in its only live slot
|
||||
- its visible `use` handler is a minimal generic stub shared with several other classes
|
||||
- the interesting gameplay semantics around the JELYHACK island are still more likely to live in neighboring event-bearing descriptors attached to the same referent context
|
||||
|
||||
So the current best human-readable model remains:
|
||||
|
||||
```text
anchor JELYHACK(referent)
anchor JELYH2(referent)

use:
  set_info(shared_use_code_0x0207, deref(arg_06))
  process_exclude()
  return

actual island behavior:
  likely carried by neighboring event-bearing attachments such as REE_BOOT / SURCAMEW / SFXTRIG
```
|
||||
|
||||
That is a stronger and cleaner claim than the older vague label `referent anchor`: the exported pseudocode now shows that the anchor really does have code, but the code is a tiny shared interaction stub, not the island's main behavior engine.
|
||||
248
docs/usecode-tool-improvement-plan.md
Normal file
|
|
@ -0,0 +1,248 @@
|
|||
# USECODE Tool Improvement Plan
|
||||
|
||||
## Purpose
|
||||
|
||||
This note turns the earlier tooling comparison into a concrete improvement plan for the local parser/decompiler.
|
||||
|
||||
The intent is not to copy Pentagram or `crusader-disasm` wholesale. The intent is to extract the parts that are genuinely useful for the current workspace toolchain:
|
||||
|
||||
- `tools/poc_crusader_usecode_parser.py`
|
||||
- `tools/export_usecode_pseudocode.py`
|
||||
- the extracted owner-loaded corpus under `USECODE/EUSECODE_extracted/`
|
||||
|
||||
## Short version
|
||||
|
||||
The most useful next upgrades are:
|
||||
|
||||
1. make the decoder tables more authoritative
|
||||
2. decode loop/selector idioms into real structured searches
|
||||
3. improve intrinsic naming and signatures
|
||||
4. distinguish code from trailers more rigorously
|
||||
5. add corpus-level pattern clustering and family annotations
|
||||
6. keep strengthening the runtime bridge back into the retail binary
|
||||
|
||||
## Priority 1: Authoritative opcode metadata
|
||||
|
||||
### What to borrow
|
||||
|
||||
From Pentagram and `crusader-disasm`:
|
||||
|
||||
- stable opcode names
|
||||
- operand-shape knowledge
|
||||
- special handling for records like `SYMBOL_INFO`, `LINE_NUMBER`, `PROCESS_EXCLUDE`, and `END`
|
||||
|
||||
### Why it matters
|
||||
|
||||
The current parser already decodes enough to produce readable pseudocode, but some opcodes are still treated more heuristically than declaratively. That is fine for proof-of-concept output, but it becomes fragile once more control-flow and loop idioms are added.
|
||||
|
||||
### Concrete change
|
||||
|
||||
Move the per-opcode knowledge into a single explicit table describing:
|
||||
|
||||
- mnemonic
|
||||
- stack effect where known
|
||||
- immediate layout
|
||||
- control-flow behavior
|
||||
- whether the opcode is normal code, metadata, or trailer-oriented
|
||||
- whether the opcode participates in loop selector mini-languages
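A minimal sketch of such a table follows, in Python to match the existing tools. The nine opcodes and their mnemonics are the ones visible in the `JELYHACK::use` byte listing, and each length is inferred from the offset gaps in that single listing, so treating them as fixed lengths is an assumption. The `OpInfo` fields, the `kind` labels, and the zero-filled sample bytes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpInfo:
    mnemonic: str
    length: int            # total bytes including the opcode byte (inferred)
    kind: str              # "code" or "metadata" (labels are assumptions)
    ends_flow: bool = False


OPCODES = {
    0x0B: OpInfo("push_word_immediate", 3, "code"),
    0x40: OpInfo("push_local_dword", 2, "code"),
    0x4C: OpInfo("push_indirect", 2, "code"),
    0x50: OpInfo("ret", 1, "code", ends_flow=True),
    0x5A: OpInfo("init", 2, "code"),
    0x5B: OpInfo("line_number", 3, "metadata"),
    0x5C: OpInfo("symbol_info", 12, "metadata"),
    0x77: OpInfo("set_info", 1, "code"),
    0x78: OpInfo("process_exclude", 1, "code"),
}


def walk(body, base):
    """Linear walk driven only by the table; stops after a flow-ending op."""
    offset, out = 0, []
    while offset < len(body):
        info = OPCODES[body[offset]]
        out.append((base + offset, info.mnemonic))
        offset += info.length
        if info.ends_flow:
            break
    return out


def pad(op):
    """Opcode byte followed by zero-filled placeholder operand bytes."""
    return bytes([op]) + bytes(OPCODES[op].length - 1)


order = (0x5A, 0x5C, 0x0B, 0x40, 0x4C, 0x77, 0x78, 0x5B, 0x50)
sample = b"".join(pad(op) for op in order)
for address, mnemonic in walk(sample, 0x00D4):
    print(f"{address:04X}: {mnemonic}")
```

Because the walker consults nothing but the table, a regression test can compare its offsets against the existing text corpus without touching any decode branches.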
|
||||
|
||||
### Expected payoff
|
||||
|
||||
- fewer ad hoc decode branches
|
||||
- easier regression testing against the text corpus
|
||||
- cleaner IR for later restructuring passes
|
||||
|
||||
## Priority 2: Real loop/selector decoding
|
||||
|
||||
### What to borrow
|
||||
|
||||
From the older disassembly corpus:
|
||||
|
||||
- the meaning of `loopscr` tokens such as `end`, `==`, `item->shape`, `item->family`, and typed literal selectors
|
||||
- the visible repeated patterns in alarm-family and trigger-family bodies
|
||||
|
||||
### Why it matters
|
||||
|
||||
Right now the parser preserves loop selector bytes faithfully, but readable pseudocode still shows comments like `loopscr value_u8=0x40` instead of the underlying search semantics.
|
||||
|
||||
That is the main reason scripts like `ALARMHAT` still read as partially machine-shaped even though the overall behavior is already understandable.
|
||||
|
||||
### Concrete change
|
||||
|
||||
Introduce a small loop-selector IR layer so common loop forms render as something closer to:
|
||||
|
||||
```text
for item in nearby_items(shape=0x04D0, origin=arg_06):
```
|
||||
|
||||
or:
|
||||
|
||||
```text
for candidate in nearby_items(family=6, origin=arg_06):
```
|
||||
|
||||
The first target is not full generality. The first target is the set of repeated loop forms already seen in:
|
||||
|
||||
- `NPCTRIG`
|
||||
- `ALARMHAT`
|
||||
- `ALARMBOX`
|
||||
- `ALRMTRIG`
|
||||
- nearby environmental families
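The IR layer for those first loop forms could be as small as one record type with a renderer. The sketch below is an assumption about shape only: the class name `NearbySearch` and its fields are hypothetical, and it covers just the two target renderings quoted above (a `shape` search and a `family` search), not the actual selector byte decoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NearbySearch:
    var: str      # loop variable name in the rendered pseudocode
    field: str    # "shape" or "family"
    value: int    # selector literal
    origin: str   # expression naming the search origin

    def render(self):
        # Shapes are shown as 4-digit hex, families as small decimals,
        # matching the two target forms in this note.
        if self.field == "shape":
            literal = f"0x{self.value:04X}"
        else:
            literal = str(self.value)
        return (f"for {self.var} in "
                f"nearby_items({self.field}={literal}, origin={self.origin}):")


print(NearbySearch("item", "shape", 0x04D0, "arg_06").render())
print(NearbySearch("candidate", "family", 6, "arg_06").render())
```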
|
||||
|
||||
### Expected payoff
|
||||
|
||||
- much better readability for object-searching scripts
|
||||
- better gameplay interpretation of trigger/controller classes
|
||||
- a cleaner path to naming common search idioms
|
||||
|
||||
## Priority 3: Better intrinsic naming and signatures
|
||||
|
||||
### What to borrow
|
||||
|
||||
From Pentagram and `crusader-disasm`:
|
||||
|
||||
- historical intrinsic names
|
||||
- text-mined call arities and stack cleanup behavior
|
||||
- rough prototype guesses from the older corpus tools
|
||||
|
||||
### Why it matters
|
||||
|
||||
Readable pseudocode is bottlenecked less by control flow now and more by anonymous calls like `Intrinsic0007()` or generic placeholders like `class_0A18_slot_20(...)`.
|
||||
|
||||
The older tool lines already contain partial information that can improve this materially, as long as it is treated as hint-quality evidence rather than rename authority.
|
||||
|
||||
### Concrete change
|
||||
|
||||
Build a local intrinsic metadata table with confidence levels:
|
||||
|
||||
- `verified`
|
||||
- `strong hint`
|
||||
- `weak hint`
|
||||
|
||||
Populate it from:
|
||||
|
||||
- Pentagram tables
|
||||
- `usecode_opcodes.txt`
|
||||
- mined `calli`/`add sp` patterns from `crusader_disasm.txt`
|
||||
- current repo notes where compiled-side names are already justified
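A possible shape for that table, sketched in Python: the three confidence levels are the ones listed above, while the record fields, the rendering rule, and the example entries are hypothetical. The rule shown here keeps the anonymous `IntrinsicNNNN` placeholder for anything below `verified` and attaches the candidate name as a comment, which is one way to keep hints from acting as rename authority.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    VERIFIED = "verified"
    STRONG_HINT = "strong hint"
    WEAK_HINT = "weak hint"


@dataclass(frozen=True)
class IntrinsicInfo:
    index: int
    name: str              # candidate name
    confidence: Confidence
    source: str            # where the candidate name came from


def display_name(info):
    """Only verified names replace the placeholder in readable output."""
    placeholder = f"Intrinsic{info.index:04X}"
    if info.confidence is Confidence.VERIFIED:
        return info.name
    return f"{placeholder} /* {info.confidence.value}: {info.name} */"


# Example entries are made up to exercise both paths.
table = [
    IntrinsicInfo(0x00DF, "Example_Guess", Confidence.WEAK_HINT, "placeholder"),
    IntrinsicInfo(0x0007, "Example_Verified", Confidence.VERIFIED, "placeholder"),
]
for entry in table:
    print(display_name(entry))
```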
|
||||
|
||||
### Expected payoff
|
||||
|
||||
- more readable pseudocode
|
||||
- safer future promotion of intrinsic names
|
||||
- less confusion between Remorse-only, Regret-only, and cross-game vocabulary
|
||||
|
||||
## Priority 4: Explicit code-versus-trailer boundaries
|
||||
|
||||
### What to borrow
|
||||
|
||||
From Pentagram's symbol-info/debug-symbol handling:
|
||||
|
||||
- the idea that `0x5C` points into structured trailer data
|
||||
- the practical distinction between executable body and debug/local trailer rows
|
||||
|
||||
### Why it matters
|
||||
|
||||
The JELYHACK pass already showed how important this is. Tiny scripts are easy to misread if post-`ret` metadata gets rendered as live code.
|
||||
|
||||
The current parser now avoids that in readable pseudocode, but the boundary logic should become a first-class part of the IR rather than a readability-only safeguard.
|
||||
|
||||
### Concrete change
|
||||
|
||||
Make trailer parsing explicit in the IR:
|
||||
|
||||
- code extent
|
||||
- trailer extent
|
||||
- debug symbol rows
|
||||
- line-number records
|
||||
- terminal `END`
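As a sketch of what an explicit boundary record might look like, the Python below splits a body at its first `ret`. That rule is an assumption that only holds for straight-line bodies such as `JELYHACK::use`; branching bodies would need real control-flow analysis. The `BodyLayout` type and `split_body` helper are hypothetical, and the numbers in the example come from the JELYHACK listing (`ret` at `0x00EE`, body range ending at `0x00FE`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BodyLayout:
    code_start: int
    code_end: int      # exclusive
    trailer_end: int   # exclusive; equals code_end when there is no trailer

    @property
    def trailer_size(self):
        return self.trailer_end - self.code_end


def split_body(instructions, body_end):
    """instructions: list of (offset, mnemonic, length) in address order.

    ASSUMPTION: the first 'ret' ends the executable extent.
    """
    start = instructions[0][0]
    for offset, mnemonic, length in instructions:
        if mnemonic == "ret":
            return BodyLayout(start, offset + length, body_end)
    return BodyLayout(start, body_end, body_end)


layout = split_body([(0x00D4, "init", 2), (0x00EE, "ret", 1)], 0x00FE)
print(f"code 0x{layout.code_start:04X}..0x{layout.code_end:04X}, "
      f"trailer bytes: {layout.trailer_size}")
```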
|
||||
|
||||
### Expected payoff
|
||||
|
||||
- safer whole-corpus export
|
||||
- better local naming and source-like output
|
||||
- fewer false positives when mining repeated code bodies
|
||||
|
||||
## Priority 5: Corpus-level pattern clustering
|
||||
|
||||
### What to borrow
|
||||
|
||||
From the `crusader-disasm` corpus mindset:
|
||||
|
||||
- treat the full body set as a searchable evidence base, not only as isolated scripts
|
||||
|
||||
### Why it matters
|
||||
|
||||
The JELYHACK result was only obvious after repeated-body comparison showed it was a small shared stub. The same strategy can keep the decompiler honest elsewhere.
|
||||
|
||||
### Concrete change
|
||||
|
||||
Add corpus analysis helpers that cluster or index:
|
||||
|
||||
- exact repeated bodies
|
||||
- normalized repeated bodies
|
||||
- repeated loop-selector templates
|
||||
- repeated spawn/call templates by class and slot
|
||||
|
||||
Those results should feed back into readable annotations like:
|
||||
|
||||
- `shared interaction stub`
|
||||
- `alarm-family controller template`
|
||||
- `common trigger setup pattern`
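A minimal sketch of the normalized-body clustering, in Python: numeric literals and helper class/slot names are masked before grouping, so bodies that differ only in those details land in the same cluster. The regexes, function names, and the `EXAMPLE_TURRET` body are hypothetical; the `BASEGUN` bodies are copied from the equipment note.

```python
import re
from collections import defaultdict

HEX = re.compile(r"0x[0-9A-Fa-f]+")
HELPER = re.compile(r"class_[0-9A-Fa-f]{4}_slot_[0-9A-Fa-f]{2}")


def template_of(body):
    """Mask helper names and hex literals to get a comparable template."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    text = HELPER.sub("class_XXXX_slot_YY", "\n".join(lines))
    return HEX.sub("0xNNNN", text)


def cluster_templates(bodies):
    clusters = defaultdict(list)
    for name, body in bodies.items():
        clusters[template_of(body)].append(name)
    return {tpl: sorted(names) for tpl, names in clusters.items()}


bodies = {
    "BASEGUN::equip": "set_info(0x0211, *(arg_06));\nprocess_exclude();\n"
                      "spawn class_0A1A_slot_24(pid, arg_0A, ..., arg_06);\nsuspend;",
    # Made-up sibling that differs only in the helper class id.
    "EXAMPLE_TURRET::equip": "set_info(0x0211, *(arg_06));\nprocess_exclude();\n"
                             "spawn class_0B00_slot_24(pid, arg_0A, ..., arg_06);\nsuspend;",
    "BASEGUN::unequip": "set_info(0x0212, *(arg_06));\nprocess_exclude();",
}
for names in cluster_templates(bodies).values():
    print(names)
```

Masking every hex literal is deliberately coarse: it would also merge bodies that differ only in `0x0211` versus `0x0212`, so a real pass would probably keep the `set_info` id visible.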
|
||||
|
||||
### Expected payoff
|
||||
|
||||
- faster triage of interesting scripts
|
||||
- better distinction between generic templates and unique gameplay logic
|
||||
- fewer overinterpretations of tiny bodies
|
||||
|
||||
## Priority 6: Stronger runtime bridge and import path
|
||||
|
||||
### What to borrow
|
||||
|
||||
From the local repo workflow rather than directly from Pentagram:
|
||||
|
||||
- the current runtime anchors already recorded in `runtime_vm_ir.tsv`
|
||||
- the Ghidra-side annotation path planned in the USECODE notes
|
||||
|
||||
### Why it matters
|
||||
|
||||
The parser is strongest when its readable output can be tied back to the compiled loader and sequencer. That keeps the decompiler grounded instead of drifting into pure script aesthetics.
|
||||
|
||||
### Concrete change
|
||||
|
||||
Expand the export and annotation path so pseudocode/index output can carry verified runtime anchors where known, especially around:
|
||||
|
||||
- `000d:51fd`
|
||||
- `000d:5572`
|
||||
- `000d:46ec`
|
||||
- `000d:21ed`
|
||||
- `000d:22bc`
|
||||
- `000d:ebe3`
|
||||
|
||||
### Expected payoff
|
||||
|
||||
- easier Ghidra-side correlation
|
||||
- safer promotion of slot/event names
|
||||
- better compiled-to-script navigation
|
||||
|
||||
## Suggested implementation order
|
||||
|
||||
1. stabilize opcode metadata tables
|
||||
2. formalize trailer parsing in IR
|
||||
3. implement first real loop-selector decoder for common `shape` and `family` searches
|
||||
4. add intrinsic metadata with confidence levels
|
||||
5. add corpus clustering/index helpers
|
||||
6. extend runtime-anchor export/import integration
|
||||
|
||||
## What not to do yet
|
||||
|
||||
- Do not chase full round-tripping first. Readability is still the higher-value frontier.
|
||||
- Do not mass-promote intrinsic or event names from Pentagram or the old disasm corpus without current-binary support.
|
||||
- Do not try to solve every loop/selector form before landing the small repeated set that already appears across the alarm and trigger families.
|
||||
|
||||
## Current best next step
|
||||
|
||||
The highest-leverage next step is loop-selector decoding.
|
||||
|
||||
That is the place where the older tools still give us directly reusable structure and where the current readable output most obviously needs another step forward.
|
||||
282
docs/usecode-tooling-comparison.md
Normal file
|
|
@ -0,0 +1,282 @@
|
|||
# USECODE Tooling Comparison
|
||||
|
||||
## Purpose
|
||||
|
||||
This note compares three different USECODE-facing tool lines now in use around the Crusader work:
|
||||
|
||||
1. Pentagram's built-in Crusader usecode converter/disassembler
|
||||
2. the local `crusader-disasm` corpus and helper scripts
|
||||
3. the current workspace parser/decompiler in `tools/poc_crusader_usecode_parser.py`
|
||||
|
||||
The goal is not to rank them abstractly. The goal is to state what each one is actually good at, what assumptions it bakes in, and why the current local parser had to diverge.
|
||||
|
||||
## Short version
|
||||
|
||||
Pentagram is a game-engine-side disassembler/converter with generic Crusader hooks.
|
||||
|
||||
`crusader-disasm` is mostly a generated disassembly corpus plus small maintenance scripts that mine or preserve information from that corpus.
|
||||
|
||||
Our current parser is the first tool in this workspace that is explicitly built around the validated owner-loaded `EUSECODE.FLX` structure recovered from the retail binary and then pushed further into readable pseudocode export.
|
||||
|
||||
## Pentagram: what it does
|
||||
|
||||
The relevant Pentagram pieces are:
|
||||
|
||||
- `convert/crusader/ConvertUsecodeCrusader.h`
|
||||
- `convert/Convert.h`
|
||||
- `tools/disasm/Disasm.cpp`
|
||||
- `usecode/UsecodeFlex.cpp`
|
||||
|
||||
### Pentagram's model
|
||||
|
||||
Pentagram is trying to solve a different problem from our current script. It is not primarily a workspace extraction/decompilation pipeline. It is an engine-aware converter/disassembler that sits on top of Pentagram's own USECODE model.
|
||||
|
||||
Its Crusader-specific logic provides:
|
||||
|
||||
- an event-name table for slots `0x00..0x1f`
|
||||
- an intrinsic-name table
|
||||
- a Crusader header reader
|
||||
- Crusader event-table decoding through `readevents`
|
||||
- Crusader opcode parsing by routing into the generic `readOpGeneric(..., crusader=true)` path
|
||||
|
||||
### What Pentagram assumes
|
||||
|
||||
Pentagram's class/container assumptions come from its own `UsecodeFlex` and converter model:
|
||||
|
||||
- class bodies are addressed as object `classid + 2`
|
||||
- class names come from object `1`
|
||||
- the Crusader base offset comes from bytes `8..11`, then decremented by `1`
|
||||
- event count is derived as `(base_offset + 19) / 6`
|
||||
- disassembly is driven from the converter header and event table, not from our later owner-loaded extractor outputs
|
||||
|
||||
That is close enough to be extremely useful, but it is not the same as the now-validated local owner-loaded reading we use in this repo.
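
As a minimal sketch of the arithmetic in the list above (not Pentagram's actual code): the function names are hypothetical, the little-endian 32-bit read of bytes `8..11` is an assumption, and so is treating the division as integer division.

```python
import struct


def class_body_object_index(class_id: int) -> int:
    """Class bodies are addressed as object `classid + 2`."""
    return class_id + 2


def pentagram_style_class_header(body: bytes) -> dict:
    """Derive base offset and event count as described in the list above."""
    # Assumption: bytes 8..11 hold a little-endian unsigned 32-bit value.
    (raw,) = struct.unpack_from("<I", body, 8)
    base_offset = raw - 1  # "then decremented by 1"
    # Assumption: the division truncates to an integer.
    event_count = (base_offset + 19) // 6
    return {"base_offset": base_offset, "event_count": event_count}
```

For a synthetic body whose bytes `8..11` encode `174`, this yields a base offset of `173` and an event count of `32`. Those inputs are made up for illustration and are not taken from `EUSECODE.FLX`.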
|
||||
|
||||
### What Pentagram outputs well
|
||||
|
||||
Pentagram is strong at:
|
||||
|
||||
- linear opcode disassembly
|
||||
- printing BP/SP-relative references in a readable way
|
||||
- mapping class/slot offsets to event names
|
||||
- following opcode `0x5C` symbol-info records into trailing local/debug symbol data
|
||||
- printing those debug symbols after the code body
|
||||
|
||||
The JELYHACK example is a good illustration. Pentagram's disassembly prints:
|
||||
|
||||
```text
Func_1 (Event 1) JELYHACK::use():
0001: 5A init 00
0003: 5C symbol info offset 001Ch = "JELYHACK"
000F: 0B push 0207h
0012: 40 push dword [BP+06h]
0014: 4C push indirect 02h bytes
0016: 77 set info
0017: 78 process exclude
0018: 5B line number 219 (00DBh)
001B: 50 ret
00: 01 type=69 (i) [BP+00h] (00) 00 referent
002A: 7A end
```
|
||||
|
||||
That is still one of the clearest proofs that the post-`ret` region contains local/debug-style metadata, not active control flow.
|
||||
|
||||
### Where Pentagram stops short for this repo
|
||||
|
||||
Pentagram is not built around our current local needs:
|
||||
|
||||
- it does not consume `class_layout_index.tsv`, `class_event_index.tsv`, or the extracted chunk corpus
|
||||
- it does not expose a workspace-friendly IR
|
||||
- it does not attach our verified runtime anchors from `runtime_vm_ir.tsv`
|
||||
- it does not export batch pseudocode for the whole `EUSECODE` corpus
|
||||
- it still reflects a converter/disassembler view, not a readability-first decompiler view
|
||||
- its Crusader intrinsic table is explicitly mixed with Regret-era knowledge, so it is useful as a hint table, not as a rename authority
|
||||
|
||||
So Pentagram gave us crucial structure and vocabulary, but not the repo-specific decompilation pipeline we needed.
|
||||
|
||||
## crusader-disasm: what it does
|
||||
|
||||
The local `crusader-disasm` tree is different again. It is not one coherent parser in the same way Pentagram is. It is a mixture of:
|
||||
|
||||
- a large generated disassembly corpus in `crusader_disasm.txt`
|
||||
- opcode-name tables such as `usecode_opcodes.txt`
|
||||
- small maintenance scripts such as `parse_crusader_disasm.py` and `update_disasm_comments.py`
|
||||
- handwritten notes and side data gathered over time
|
||||
|
||||
### What `crusader-disasm` is strongest at
|
||||
|
||||
Its biggest strength is that it is already a rich evidence corpus.
|
||||
|
||||
`usecode_opcodes.txt` gives a full opcode-name vocabulary such as:
|
||||
|
||||
- `0x04 ASSIGN_MEMBER_CHAR`
|
||||
- `0x10 NEAR_ROUTINE_CALL`
|
||||
- `0x5C SYMBOL_INFO`
|
||||
- `0x78 PROCESS_EXCLUDE`
|
||||
- `0x7A END`
|
||||
|
||||
That helped verify several names and fill decode gaps in our parser.
|
||||
|
||||
The generated `crusader_disasm.txt` is also valuable because it shows concrete output form, not just names. It proved things like:
|
||||
|
||||
- how `symbol info` is rendered
|
||||
- where local/debug symbol rows appear
|
||||
- what a tiny body like `JELYHACK::use` looks like in a traditional disassembly listing
|
||||
|
||||
### What the helper scripts actually do
|
||||
|
||||
The helper scripts in `crusader-disasm` are narrow and pragmatic.
|
||||
|
||||
`parse_crusader_disasm.py`:
|
||||
|
||||
- scans an already-generated `crusader_disasm.txt`
|
||||
- looks for `calli` lines, nearby `add sp`, and retval pushes
|
||||
- infers rough intrinsic prototypes from the text listing
|
||||
- emits a guessed intrinsic table
|
||||
|
||||
That means it is not parsing `EUSECODE.FLX` directly. It is mining structure from a pre-rendered textual disassembly.
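
The text-mining approach can be sketched roughly as follows. This is not the code of `parse_crusader_disasm.py`: the function name, the regular expressions, the look-ahead window, and the exact shape of the listing lines are all assumptions made for illustration.

```python
import re

# Assumed spellings of the three listing features mentioned above.
CALLI_RE = re.compile(r"\bcalli\b")
ADD_SP_RE = re.compile(r"\badd sp\b")
RETVAL_RE = re.compile(r"\bpush\b.*\bretval\b")


def mine_calli_sites(lines, window=3):
    """Collect each `calli` line plus nearby stack-cleanup and retval hints."""
    sites = []
    for i, line in enumerate(lines):
        if not CALLI_RE.search(line):
            continue
        # Only look a few lines ahead; the window size is a guess.
        nearby = lines[i + 1 : i + 1 + window]
        add_sp = next((l.strip() for l in nearby if ADD_SP_RE.search(l)), None)
        sites.append(
            {
                "line": i,
                "text": line.strip(),
                "add_sp": add_sp,
                "pushes_retval": any(RETVAL_RE.search(l) for l in nearby),
            }
        )
    return sites
```

A real prototype guesser would still have to turn the `add sp` operand into an argument size and the retval push into a return type. The sketch only shows that the input is rendered text, not bytes.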
|
||||
|
||||
`update_disasm_comments.py`:
|
||||
|
||||
- merges comments from an older disassembly into an updated regenerated one
|
||||
- preserves manual annotations when intrinsic names change
|
||||
|
||||
So this is again a maintenance aid around a text corpus, not a first-principles byte parser.
|
||||
|
||||
### Where `crusader-disasm` stops short for this repo
|
||||
|
||||
`crusader-disasm` is excellent evidence, but weak as a live decompilation pipeline:
|
||||
|
||||
- it does not operate on our extracted owner-loaded chunk/index data
|
||||
- it does not produce structured IR
|
||||
- it does not know our validated body windows from `class_event_index.tsv`
|
||||
- it does not emit script/pseudocode views
|
||||
- it does not integrate runtime-anchor hints from the current RE notes
|
||||
- some of its information is annotation-quality and corpus-quality rather than machine-robust parser output

|
||||
|
||||
In practice, `crusader-disasm` has been most useful as a vocabulary/evidence source, not as the final tool we run to generate the readable corpus.
|
||||
|
||||
## Our current parser/decompiler: what it does differently
|
||||
|
||||
The current local toolchain is centered on:
|
||||
|
||||
- `tools/extract_eusecode_flx.py`
|
||||
- `tools/poc_crusader_usecode_parser.py`
|
||||
- `tools/export_usecode_pseudocode.py`
|
||||
|
||||
### 1. It is built around the validated owner-loaded local format
|
||||
|
||||
This is the biggest difference.
|
||||
|
||||
Our parser does not start from Pentagram's generic converter header model or from a pre-rendered disassembly text file. It starts from the extracted local artifacts and the currently validated retail-binary understanding:
|
||||
|
||||
- `class_id + 2` body lookup
|
||||
- bytes `8..11` treated as the first code-byte anchor / `code_base_minus_one` basis
|
||||
- 6-byte event rows at `+20`
|
||||
- derived body ranges emitted into `class_event_index.tsv`
|
||||
- chunk files under `USECODE/EUSECODE_extracted/chunks/`
|
||||
|
||||
That is why it can decompile the actual extracted corpus in a repeatable workspace-local way.
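
A minimal sketch of the row layout named above, assuming the caller already knows how many rows to read. The function and constant names are hypothetical, and the sketch deliberately leaves the six bytes of each row uninterpreted because this note does not spell out the field layout.

```python
EVENT_TABLE_START = 20  # event rows begin at +20 in the class body
EVENT_ROW_SIZE = 6      # each event row is 6 bytes


def slice_event_rows(body: bytes, row_count: int) -> list[bytes]:
    """Return up to `row_count` raw 6-byte event rows from a class body."""
    rows = []
    for slot in range(row_count):
        start = EVENT_TABLE_START + slot * EVENT_ROW_SIZE
        row = body[start : start + EVENT_ROW_SIZE]
        if len(row) < EVENT_ROW_SIZE:
            break  # truncated body: stop instead of returning a short row
        rows.append(row)
    return rows
```

Keeping the rows as raw bytes means any later interpretation of the fields happens in one place and can be checked against `class_event_index.tsv`.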
|
||||
|
||||
### 2. It separates authoritative IR from readable views
|
||||
|
||||
Pentagram and `crusader-disasm` mostly produce one human-facing linear listing.
|
||||
|
||||
Our parser deliberately splits output into layers:
|
||||
|
||||
- JSON IR for machine-facing structure
|
||||
- flat text listing for byte-faithful decode
|
||||
- script view for stack-machine readability
|
||||
- pseudocode view for programming-language-like readability
|
||||
- batch export of that pseudocode corpus into `USECODE/EUSECODE_extracted/pseudocode`
|
||||
|
||||
That separation is what let us make JELYHACK readable without losing the exact bytes and trailer structure.
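
The layering can be illustrated with a small sketch in which one IR record feeds two different views. The `Op` record and its field names are hypothetical and do not reflect the real JSON schema; the two sample rows reuse offsets, opcodes, and text from the JELYHACK listing quoted earlier.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Op:
    offset: int  # offset of the opcode within the body
    opcode: int  # opcode byte
    text: str    # rendered mnemonic and operands


def to_json_ir(ops):
    """Machine-facing view: one JSON object per decoded op."""
    return json.dumps([asdict(op) for op in ops], indent=2)


def to_flat_listing(ops):
    """Human-facing view: one line per op, in the listing style used above."""
    return "\n".join(f"{op.offset:04X}: {op.opcode:02X} {op.text}" for op in ops)


ops = [Op(0x0001, 0x5A, "init 00"), Op(0x001B, 0x50, "ret")]
print(to_flat_listing(ops))
```

Both views are derived from the same records, so a readability change in one view cannot alter what the other reports.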
|
||||
|
||||
### 3. It handles post-`ret` metadata differently
|
||||
|
||||
Pentagram already knew about debug symbols through `0x5C` and `readDbgSymbols()`.
|
||||
|
||||
The important difference is that our parser had to make that logic safe in the extracted-corpus setting:
|
||||
|
||||
- it now detects ret-anchored debug/local trailers explicitly
|
||||
- it avoids mis-decoding those bytes as live opcodes on bodies like `NPCTRIG 0x0A`
|
||||
- it exposes debug symbols in the IR and readable views
|
||||
- it now hides dead post-return junk from the human pseudocode when readability matters more than raw listing fidelity
|
||||
|
||||
So Pentagram gave the structural clue, but our parser had to adapt it to the owner-loaded extracted corpus and to the readability-first output mode.
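
One way to picture the trailer split is sketched below. It is not the parser's actual detection logic. It assumes the trailer start is already known, for example from a symbol-info offset: in the JELYHACK listing that offset is `001Ch`, the byte immediately after the `ret` at `001B`. Whether that relationship holds for every body is an assumption, and the function name is hypothetical.

```python
def split_at_trailer(rows, trailer_offset):
    """Partition decoded rows into live code and post-`ret` trailer rows.

    rows: iterable of (offset, text) tuples from a linear decode.
    trailer_offset: first offset that belongs to the debug/local trailer.
    """
    live = [row for row in rows if row[0] < trailer_offset]
    trailer = [row for row in rows if row[0] >= trailer_offset]
    return live, trailer
```

A pseudocode view would render only the live rows, while the flat listing and the IR would keep both lists so no bytes are lost.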
|
||||
|
||||
### 4. It adds runtime cross-reference hints that the older tools do not
|
||||
|
||||
Our parser attaches the verified runtime bridge information from `runtime_vm_ir.tsv` and related notes, such as:
|
||||
|
||||
- `000d:0988`
|
||||
- `000d:177c`
|
||||
- `000d:1acb`
|
||||
- `000d:208b`
|
||||
- `000d:21ed`
|
||||
- `000d:22bc`
|
||||
- `000d:2104`
|
||||
- `000d:46ec`
|
||||
- `000d:ebe3`
|
||||
|
||||
Neither Pentagram nor `crusader-disasm` does that kind of live, repo-specific runtime correlation.
|
||||
|
||||
### 5. It is aimed at whole-corpus readability, not only opcode fidelity
|
||||
|
||||
This is the most visible practical difference.
|
||||
|
||||
Pentagram and `crusader-disasm` are good at telling you what bytes and opcodes are present.
|
||||
|
||||
Our current script is trying to answer a different question too:
|
||||
|
||||
`What does this class body seem to do, in language a human can scan?`
|
||||
|
||||
That is why the current parser now:
|
||||
|
||||
- names locals where the debug trailer provides them
|
||||
- folds compare ladders into `if / else if`
|
||||
- suppresses dead post-`ret` tail noise in pseudocode
|
||||
- exports the whole decoded corpus into per-class pseudocode files
|
||||
|
||||
That is the main place where our script now goes beyond the older tools.
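
As a rough illustration of the compare-ladder folding mentioned above, the sketch below renders a list of already-recognized arms as an `if / else if` chain. The function name, the arm representation, and the brace-style output syntax are all hypothetical; recognizing the ladder in the opcode stream is the hard part and is not shown.

```python
def render_ladder(subject, arms, default=None):
    """Render (value, body) arms comparing `subject` as an if / else if chain."""
    lines = []
    for index, (value, body) in enumerate(arms):
        keyword = "if" if index == 0 else "else if"
        lines.append(f"{keyword} ({subject} == {value}) {{ {body} }}")
    if default is not None:
        lines.append(f"else {{ {default} }}")
    return "\n".join(lines)
```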
|
||||
|
||||
## What the older tools still do better
|
||||
|
||||
This is not a one-way replacement story.
|
||||
|
||||
Pentagram still does some things better than our current script:
|
||||
|
||||
- a broader, more mature generic opcode conversion framework
|
||||
- a cleaner historical disassembler path for symbol-info and debug-symbol printing
|
||||
- a converter architecture that already knows how to build node-like structures for many ops
|
||||
|
||||
`crusader-disasm` still does some things better too:
|
||||
|
||||
- a richer, long-lived annotation corpus
|
||||
- a larger existing body of older naming/vocabulary experiments
|
||||
- a direct opcode-name table from a distinct extraction route
|
||||
- concrete disassembly output that is sometimes easier to cross-check than a newer heuristic pseudocode layer
|
||||
|
||||
So the best current workflow is still hybrid:
|
||||
|
||||
- use Pentagram for structural/reference behavior
|
||||
- use `crusader-disasm` for opcode vocabulary and corpus evidence
|
||||
- use the local parser for validated owner-loaded extraction, IR, pseudocode, and batch readability export
|
||||
|
||||
## Best current summary
|
||||
|
||||
Pentagram is a converter/disassembler.
|
||||
|
||||
`crusader-disasm` is a disassembly corpus with helper scripts.
|
||||
|
||||
Our script is the first repo-local tool that is explicitly trying to be a readable decompiler over the validated extracted `EUSECODE` corpus.
|
||||
|
||||
That is why the current parser looks less like a classic disassembler and more like a layered RE workbench:
|
||||
|
||||
- extractor-backed local format understanding
|
||||
- structured IR
|
||||
- byte-faithful listing
|
||||
- readability-first script/pseudocode views
|
||||
- batch corpus export
|
||||
- runtime-annotation hints tied to the current Crusader notes
|
||||
|
||||
The tradeoff is that our current script is newer and more heuristic. It is better at producing something a human can read across the whole corpus, but it is not yet as mature or as battle-tested at raw opcode coverage as the older reference tools.
|
||||