Map sorting and usecode

MaddoScientisto 2026-03-26 23:12:38 +01:00
commit af5b77ea13
7 changed files with 1497 additions and 39 deletions


@@ -8,6 +8,8 @@ Recent verified localized-build batch: [docs/spanish-cheat-differences.md](docs/
Recent verified batch: [docs/retail-debug-arg.md](docs/retail-debug-arg.md) now records the live NE proof that retail `CRUSADER.EXE` still recognizes and executes a real `-debug` command-line branch. That branch prints `Debugging mode ON.`, sets `g_debugMsgLevel` at `1478:87e0`, and toggles two debug globals at `1478:0845/0859`. The later sink pass also closes the text-output target more tightly: `ProbablyPrintDebugMessage` formats through the static stdio-style table at `1478:6c32..6c81` and writes to the handle-`1` entry at `1478:6c46`, so the non-video side is ordinary DOS `stdout` gated by the debug threshold, plus the already-confirmed AVI timing overlay. Current best read remains `surviving debug-output / instrumentation switch`, not `the missing bootstrap for the hidden seg109/seg1408 usecode debugger`. The same batch also leaves the earlier `-laurie` and `0x659c/659e` debugger-state conclusions intact: `-debug` is a separate switch and is not currently evidenced as constructing the hidden usecode-debugger break-state object.
Recent tooling batch: [docs/map-rendering.md](docs/map-rendering.md) now starts a dedicated offline map-rendering lane. `tools/render_crusader_map.py` can load `FIXED.DAT`, expand `GLOB.FLX`, decode the required `SHAPES.FLX` entries with Crusader frame headers, apply `GAMEPAL.PAL`, and write a first-pass PNG, with a `--fixed-dat` override so the same pipeline can be pointed at either game's map file. The current renderer is intentionally limited to fixed-map content and a simple deterministic painter rather than the full Pentagram/ScummVM dependency sorter, and the current workspace caveat is that `STATIC_REGRET` still lacks a copied `FIXED.DAT`, so No Regret rendering needs that file supplied explicitly.
Latest doc-reconciliation batch: [docs/ne-segment1.md](docs/ne-segment1.md) now has a combined hidden-debugger component table that explicitly separates the seg109/raw-reference UI wrappers (`000b:9a86`, `000b:9c0d`, `000b:b3b1`, `000b:b62c`, `000b:2882`) from the live seg1408 breakpoint-state helpers (`1408:0000`, `1408:0053`, `1408:00dd`, `1408:029e`, `1408:03b0`, `1408:03f7`, `1408:0419`, `1408:0432`, `1408:0444`) and the interpreter hook at `1418:04aa..04b5`. Current best read remains `two connected layers of one hidden usecode debugger`, not `conflicting address claims for the same function family`.
Follow-up cheat-key correction pass: [docs/ne-segment1.md](docs/ne-segment1.md) now also records a live NE cleanup of several folklore keyboard-cheat claims. `~` is a real runtime cheat-latch toggle at `13e8:203d`, `Ctrl+C` is wrong for this build and should be `Ctrl+L` for the coordinate popup at `13e8:255e`, and the third F7-family overlay really does exist as a separate `Ctrl+F7` path at `13e8:1a20` alongside the other two cheat-gated F7 overlay toggles.
@@ -36,6 +38,7 @@ The same `docs/ne-segment1.md` note now also has the first consolidated cheat/de
| [docs/retail-debug-arg.md](docs/retail-debug-arg.md) | Focused note on the retail `-debug` command-line switch: live parser evidence, exact startup message, surviving globals, segment `1468` instrumentation path, and why it is currently separate from the hidden usecode debugger bootstrap |
| [docs/scummvm-crusader-reference.md](docs/scummvm-crusader-reference.md) | ScummVM Ultima8/Pentagram Crusader integration survey: USECODE/event tables, FLEX/resource formats, world/map loaders, HUD/media, and RE follow-up priorities |
| [docs/pentagram-crusader-reference.md](docs/pentagram-crusader-reference.md) | Pentagram-source Crusader/U8 reference: direct Crusader USECODE parser and VM evidence, U8 usecode docs, runtime-confidence limits, and cross-checks against the ScummVM note |
| [docs/map-rendering.md](docs/map-rendering.md) | Offline map-rendering lane: `FIXED.DAT`/`GLOB.FLX`/`SHAPES.FLX`/`GAMEPAL.PAL` format notes, current Python renderer, supported inputs, and fidelity gaps |
| [docs/usecode-roundtrip-ir.md](docs/usecode-roundtrip-ir.md) | ScummVM-to-binary USECODE cross-walk, owner-loaded class-layout and header/event-count reconciliation, conservative IR v0 plan, and the generated class-event/body-window outputs that now ground reversible `_BOOT`, `SURCAM*`, and environmental family decompile artifacts plus repeated-family regression checks |
| [docs/usecode-pentagram-ghidra-path.md](docs/usecode-pentagram-ghidra-path.md) | Pentagram-derived Crusader USECODE parser plan, proof-of-concept workflow, canonical IR v1 goals, and the Ghidra-side annotation import path |
| [docs/usecode-tooling-comparison.md](docs/usecode-tooling-comparison.md) | Comparison of Pentagram's converter/disassembler, the local `crusader-disasm` corpus/scripts, and the current workspace parser/pseudocode exporter, with emphasis on assumptions, strengths, and repo-specific differences |

docs/map-rendering.md Normal file

@@ -0,0 +1,221 @@
# Crusader Map Rendering Workbench
## Purpose
This note starts a dedicated lane for offline Crusader map extraction and PNG rendering from the shipped data files in this workspace.
Current implementation entry point:
- `tools/render_crusader_map.py`
Current supported data roots:
- `STATIC` for No Remorse
- `STATIC_REGRET` for No Regret
Current asset note:
- `STATIC_REGRET` in this workspace now includes `FIXED.DAT`
- the renderer still accepts `--fixed-dat` so alternate map copies can be tested without changing the rest of the static asset path
The immediate goal is practical and narrow: load a fixed map, expand glob placements, decode the required shapes from `SHAPES.FLX`, apply `GAMEPAL.PAL`, and render a deterministic PNG.
## Source Cross-Checks Used
The first renderer is grounded in the overlapping parts of three sources rather than in ad hoc guesses.
1. Pentagram Crusader shape/map loaders
- `convert/crusader/ConvertShapeCrusader.cpp`
- `graphics/Shape.cpp`
- `graphics/ShapeFrame.cpp`
- `world/Map.cpp`
- `world/MapGlob.cpp`
- `graphics/Palette.cpp`
- `graphics/TypeFlags.cpp`
2. ScummVM Ultima8 Crusader paths
- `gfx/shape_archive.cpp`
- `gfx/type_flags.cpp`
- `world/map.cpp`
- `world/glob_egg.cpp`
- `world/coord_utils.h`
- `world/item_sorter.cpp`
- `world/sort_item.cpp`
3. Local workspace evidence
- `docs/scummvm-crusader-reference.md`
- `docs/pentagram-crusader-reference.md`
- `docs/raw-0007-rendering.md`
- `crusader-disasm/shapedata.txt`
- `crusader-disasm/mapdump/mapdump.py`
## File Formats Used By The First Tool
### `FIXED.DAT`
The map container is treated as a header plus a map table:
- map count at file offset `0x54`
- map table at file offset `0x80`
- each table row is `<u32 offset, u32 size>`
Each map payload is read as packed 16-byte item records:
- `x: u16`
- `y: u16`
- `z: u8`
- `shape: u16`
- `frame: u8`
- `flags: u16`
- `quality: u16`
- `npc_num: u8`
- `map_num: u8`
- `next: u16`
Crusader-specific coordinate adjustment matches the Pentagram and ScummVM runtime loaders:
- world `x = disk_x * 2`
- world `y = disk_y * 2`
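The table and record layout above can be sketched with Python's `struct` module. This is a minimal illustration, not the code in `tools/render_crusader_map.py`: the 32-bit width of the map count, the little-endian byte order, and the names `read_map_items`, `MAP_COUNT_OFFSET`, `MAP_TABLE_OFFSET`, and `ITEM_RECORD` are assumptions made for the example.

```python
import struct

# Offsets from the notes above; the u32 width of the count is an assumption.
MAP_COUNT_OFFSET = 0x54
MAP_TABLE_OFFSET = 0x80
# x, y, z, shape, frame, flags, quality, npc_num, map_num, next (16 bytes)
ITEM_RECORD = struct.Struct("<HHBHBHHBBH")


def read_map_items(data: bytes, map_index: int) -> list[dict[str, int]]:
    """Return the fixed items of one map with Crusader's x/y doubling applied."""
    (map_count,) = struct.unpack_from("<I", data, MAP_COUNT_OFFSET)
    if not 0 <= map_index < map_count:
        raise IndexError(f"map {map_index} outside table of {map_count} rows")
    # Each table row is <u32 offset, u32 size>.
    offset, size = struct.unpack_from("<II", data, MAP_TABLE_OFFSET + 8 * map_index)
    payload = data[offset:offset + size]
    items = []
    # Walk whole 16-byte records only; a trailing partial record is ignored.
    for pos in range(0, len(payload) - ITEM_RECORD.size + 1, ITEM_RECORD.size):
        x, y, z, shape, frame, flags, quality, npc_num, map_num, nxt = (
            ITEM_RECORD.unpack_from(payload, pos)
        )
        items.append({
            "x": x * 2,  # world x = disk_x * 2
            "y": y * 2,  # world y = disk_y * 2
            "z": z,
            "shape": shape,
            "frame": frame,
            "flags": flags,
            "quality": quality,
            "npc_num": npc_num,
            "map_num": map_num,
            "next": nxt,
        })
    return items
```

Keeping the record format in one `struct.Struct` makes the 16-byte size easy to verify (`ITEM_RECORD.size`) before any map is read.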
### `GLOB.FLX`
`GLOB.FLX` is handled as a normal FLEX archive, not as a one-off format.
Each non-empty glob object contains:
- object count: `u16`
- repeated entries of `x:u8 y:u8 z:u8 shape:u16 frame:u8`
Glob expansion matches the Crusader `GlobEgg::enterFastArea()` rule in ScummVM/Pentagram:
- `coordmask = ~0x3ff`
- `coordshift = 2`
- `offset = 2`
- `itemx = (parent_x & coordmask) + (glob_x << 2) + 2`
- `itemy = (parent_y & coordmask) + (glob_y << 2) + 2`
- `itemz = parent_z + glob_z`
The first renderer expands glob contents and skips drawing the source glob egg itself.
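As a hedged sketch of the expansion rule above, the following reads one glob object and places its entries relative to a parent egg. The function name `expand_glob`, the tuple layout of the result, and the little-endian byte order are assumptions made for the example.

```python
import struct

# One glob entry: x:u8 y:u8 z:u8 shape:u16 frame:u8 (6 bytes, no padding).
GLOB_ENTRY = struct.Struct("<BBBHB")
COORD_MASK = ~0x3FF
COORD_SHIFT = 2
COORD_OFFSET = 2


def expand_glob(
    glob_data: bytes, parent_x: int, parent_y: int, parent_z: int
) -> list[tuple[int, int, int, int, int]]:
    """Return (x, y, z, shape, frame) world placements for one glob object."""
    (count,) = struct.unpack_from("<H", glob_data, 0)
    placed = []
    for i in range(count):
        gx, gy, gz, shape, frame = GLOB_ENTRY.unpack_from(
            glob_data, 2 + i * GLOB_ENTRY.size
        )
        # Snap the parent to its 1024-unit cell, then add the scaled local offset.
        item_x = (parent_x & COORD_MASK) + (gx << COORD_SHIFT) + COORD_OFFSET
        item_y = (parent_y & COORD_MASK) + (gy << COORD_SHIFT) + COORD_OFFSET
        item_z = parent_z + gz
        placed.append((item_x, item_y, item_z, shape, frame))
    return placed
```

For example, a parent at `x = 0x1234` snaps to `0x1000`, so a glob entry with `x = 3` lands at `0x1000 + 12 + 2`.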
### `SHAPES.FLX`
World shapes use the Crusader shape layout documented by Pentagram/ScummVM:
- shape header: 6 bytes
- 4 bytes unknown
- 2-byte frame count
- frame header table: 8 bytes per frame
- 3-byte frame offset
- 1 unknown byte
- 4-byte frame length
- frame body header: 28 bytes
- 8 unknown bytes
- 4-byte compression flag
- 4-byte width
- 4-byte height
- 4-byte x offset
- 4-byte y offset
- then `height` 4-byte line offsets
- then per-line RLE data
The current decoder follows the runtime line walker used in `ShapeFrame::getPixelAtPoint()`:
- each line is a series of skip/run pairs
- compressed runs use the low bit to choose literal versus repeated-color mode
- pixels absent from the RLE stream are treated as transparent
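The header layout above can be sketched as follows. This covers header parsing only and deliberately leaves out the RLE line walker. Little-endian byte order, signed x/y offsets, frame offsets measured from the start of the shape, and the names `FrameHeader`, `parse_frame_table`, and `parse_frame_header` are all assumptions made for the example.

```python
import struct
from typing import NamedTuple


class FrameHeader(NamedTuple):
    compression: int
    width: int
    height: int
    xoff: int
    yoff: int
    line_offsets: tuple[int, ...]


def parse_frame_table(shape: bytes) -> list[tuple[int, int]]:
    """Return (frame_offset, frame_length) pairs from the 8-byte frame rows."""
    # Shape header: 4 unknown bytes, then a 2-byte frame count.
    (frame_count,) = struct.unpack_from("<H", shape, 4)
    table = []
    for i in range(frame_count):
        base = 6 + 8 * i
        offset = int.from_bytes(shape[base:base + 3], "little")  # 3-byte offset
        (length,) = struct.unpack_from("<I", shape, base + 4)    # skips 1 unknown byte
        table.append((offset, length))
    return table


def parse_frame_header(shape: bytes, frame_offset: int) -> FrameHeader:
    """Read the 28-byte frame body header and the per-line offset table."""
    # 8 unknown bytes, then compression, width, height, x offset, y offset.
    compression, width, height, xoff, yoff = struct.unpack_from(
        "<8xI4i", shape, frame_offset
    )
    line_offsets = struct.unpack_from(f"<{height}I", shape, frame_offset + 28)
    return FrameHeader(compression, width, height, xoff, yoff, line_offsets)
```

Returning the line offsets alongside the header keeps the later RLE pass independent of how the header was located.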
### `GAMEPAL.PAL`
`GAMEPAL.PAL` is read as 768 bytes of VGA-style palette data.
Each component is promoted from `0..63` to `0..255` using the same scaling used by Pentagram:
- `rgb8 = (rgb6 * 255) / 63`
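A minimal sketch of that scaling, assuming integer division and a flat run of 256 RGB triples. The function name `load_palette` is an assumption made for the example.

```python
def load_palette(raw: bytes) -> list[tuple[int, int, int]]:
    """Convert 768 bytes of 6-bit VGA palette data to 8-bit RGB triples."""
    if len(raw) < 768:
        raise ValueError(f"expected 768 palette bytes, got {len(raw)}")
    palette = []
    for i in range(0, 768, 3):
        r6, g6, b6 = raw[i], raw[i + 1], raw[i + 2]
        # rgb8 = (rgb6 * 255) / 63, using integer division
        palette.append(((r6 * 255) // 63, (g6 * 255) // 63, (b6 * 255) // 63))
    return palette
```

Scaling by `255 / 63` rather than shifting left by two maps the brightest 6-bit value to a full `255` instead of `252`.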
### `TYPEFLAG.DAT`
The renderer currently uses Crusader's 9-byte records to extract:
- family id
- shape footpad dimensions (`x`, `y`, `z`)
- editor flag
This is enough for:
- skipping known egg families in the first pass
- expanding `SF_GLOBEGG`
- documenting future work toward a better sorter
The current tool uses the footpad values for the world boxes in its dependency sorter, but it does not yet reproduce full ItemSorter-equivalent overlap resolution.
## Current Projection And Painting Rules
The renderer anchors each shape at the same world-to-screen bottom point used by the runtime shape painter:
$$
screen_x = \frac{x - y}{4}
$$
$$
screen_y = \frac{x + y}{8} - z
$$
Frame placement then follows the shape-frame offsets used by the runtime sorter:
- unflipped: `left = screen_x - xoff`
- flipped: `left = screen_x + xoff - width`
- `top = screen_y - yoff`
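The projection and placement rules above can be sketched as two small helpers. The names `world_to_screen` and `frame_top_left` are assumptions made for the example, and so is the rounding: Python's floor division rounds toward negative infinity, which may differ from the engine's behavior when `x - y` is negative.

```python
def world_to_screen(x: int, y: int, z: int) -> tuple[int, int]:
    """Project a world position to the screen-space bottom anchor point."""
    screen_x = (x - y) // 4
    screen_y = (x + y) // 8 - z
    return screen_x, screen_y


def frame_top_left(
    screen_x: int,
    screen_y: int,
    xoff: int,
    yoff: int,
    width: int,
    flipped: bool = False,
) -> tuple[int, int]:
    """Return the top-left pixel where a decoded frame should be blitted."""
    if flipped:
        left = screen_x + xoff - width
    else:
        left = screen_x - xoff
    top = screen_y - yoff
    return left, top
```

For instance, an item at world `(4096, 2048, 16)` anchors at screen `(512, 752)`, and a 32-pixel-wide frame with offsets `(10, 20)` is drawn from `(502, 732)` when unflipped.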
The renderer now uses a ScummVM/Pentagram-style dependency graph sorter rather than a plain scalar key.
The current implementation ports the crucial parts of `SortItem` and `ItemSorter`:
- footpad-derived world boxes from `TYPEFLAG.DAT`
- screen-diamond overlap and containment checks
- `below()` ordering rules for flat pieces, tall pieces, roofs, translucent items, and Crusader inventory-item families
- dependency expansion so overlapping items are painted only after everything behind them
This is materially better than the initial `z / x+y` heuristic and is the main path for reducing wall and prop overdraw artifacts, though it still omits some of the engine's more specialized runtime-only cases.
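The dependency-expansion step can be illustrated with a generic depth-first ordering. This is a sketch of the general technique only: `is_behind` is a hypothetical stand-in for the real overlap checks and `below()` rules, and the name `paint_order` and the cycle handling (an item already in progress is simply skipped) are assumptions rather than a description of the ported sorter.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def paint_order(items: Sequence[T], is_behind: Callable[[T, T], bool]) -> list[T]:
    """Order items so each one is painted after everything behind it."""
    n = len(items)
    # depends[i] holds the indices of items that must be painted before item i.
    depends = [
        [j for j in range(n) if j != i and is_behind(items[j], items[i])]
        for i in range(n)
    ]
    state = [0] * n  # 0 = unvisited, 1 = in progress, 2 = painted
    order: list[T] = []

    def visit(i: int) -> None:
        if state[i] != 0:
            return  # already painted, or part of a cycle currently being resolved
        state[i] = 1
        for j in depends[i]:
            visit(j)
        state[i] = 2
        order.append(items[i])

    for i in range(n):
        visit(i)
    return order
```

The pairwise dependency build is quadratic and the recursion depth grows with the longest dependency chain, so a real map would likely need a screen-overlap prefilter and an iterative traversal.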
## Command Examples
Render No Remorse map `0`:
```powershell
c:/Users/Maddo/.PYENV/PYENV-WIN/versions/3.14.3/python.exe tools/render_crusader_map.py --game remorse --map 0 --output out/map0-remorse.png
```
Render No Regret map `0` and emit metadata:
```powershell
c:/Users/Maddo/.PYENV/PYENV-WIN/versions/3.14.3/python.exe tools/render_crusader_map.py --game regret --fixed-dat K:/path/to/REGRET/FIXED.DAT --map 0 --output out/map0-regret.png --metadata out/map0-regret.json
```
Render a bounded world-space region only:
```powershell
c:/Users/Maddo/.PYENV/PYENV-WIN/versions/3.14.3/python.exe tools/render_crusader_map.py --game remorse --map 0 --world-rect 0 0 4096 4096 --output out/map0-quarter.png
```
## Current Deliberate Limits
This tool is a start, not a complete engine clone.
Current gaps:
1. It renders `FIXED.DAT` only. It does not yet merge save-state or `NONFIXED.DAT` style movable items.
2. It expands globs, but it does not yet emulate broader fast-area/runtime-driven materialization behavior.
3. It skips several egg-family placements instead of trying to visualize their hidden runtime helpers.
4. It now implements the core dependency graph sorter, but it still omits experimental occlusion grouping and some runtime-only sprite/highlight cases.
5. It does not yet consume `ANIM.DAT`, `DAMAGE.FLX`, `DTABLE.FLX`, `WPNOVLAY.DAT`, or palette transforms such as `XFORMPAL.DAT`.
6. It uses `GAMEPAL.PAL` directly and does not yet model alternate or transformed palettes.
7. It writes a plain RGBA PNG using only the standard library; there is no zoomed viewer, tile atlas exporter, or sprite manifest yet.
## Immediate Follow-Ups
1. Validate and tune the dependency sorter against representative Remorse and Regret rooms, especially tall wall seams and dense prop clusters.
2. Add optional atlas export for all shapes touched by a chosen map.
3. Add a second path for movable/dynamic content once the relevant Crusader save/runtime files are pinned down for both games.
4. Compare a few rendered regions against known in-game screenshots to tighten projection and ordering errors.
5. Add optional per-item manifest output with `(shape, frame, x, y, z, source)` rows for debugging bad composites.
6. Revisit raw `0007` rendering notes and the live executable only if the current Pentagram/ScummVM overlap model proves insufficient for specific remaining errors.


@@ -791,3 +791,18 @@ The strongest present path to a usable compiler/decompiler is:
6. Recompile by rebuilding the original class header and event table layout first, then re-emitting decoded and opaque ops together.
That gets to a reversible editor sooner than waiting for a full semantic VM recovery.
## **Recent Research (2026-03-26)**
- **Root Cause:** The structuring pass left forward/back-edge loops and counted-loop headers detached in fallback output, which produced unstructured pseudocode for some bodies (notably BART slot 0x0F).
- **Renderer Fixes:** Added a conservative loop-lifting helper and a restricted infinite-loop lift in the partial fallback renderer to fold loops into structured blocks where safe. See the modified renderer at [tools/poc_crusader_usecode_parser.py](tools/poc_crusader_usecode_parser.py).
- **Validator Added:** A lightweight pseudocode syntax/label validator was added to detect brace mismatches and missing goto/label targets before exporting pseudocode.
- **Tests:** Added and adjusted unit tests in [tools/tests/test_usecode_structuring.py](tools/tests/test_usecode_structuring.py) to guard loop-lifting behavior and fallback conservatism.
- **Corpus Validation:** Ran a corpus-wide render+validator pass over 977 decoded bodies; result: `TOTAL_BODIES=977, FAILURES=0` (no syntax/label failures).
- **Real-World Output:** Regenerated the BART pseudocode file: [USECODE/EUSECODE_extracted/pseudocode/BART/slot_0F_enterFastArea.txt](USECODE/EUSECODE_extracted/pseudocode/BART/slot_0F_enterFastArea.txt) now shows an outer `while(true)` with nested structured branches and counted loops instead of detached labels.
- **Scope & Safety:** Fully-structured renderer remains conservative; the loop-lifting helper is reused where safe. The outer infinite-loop lift was narrowed to partial fallback after tests revealed regressions when it was too broad.
- **Remaining Semantic Gap:** Expression/comparison operand polarity still needs correction (some counted-loop conditions show inverted comparisons). Next work: fix operand ordering in the expression builder so loop headers reflect correct comparison direction.
- **Next Steps:** (1) Implement compare-direction fix in the expression builder and add small semantic regression tests, (2) re-run unit tests and a corpus-wide render+validate sweep, (3) regenerate affected pseudocode files for inspection.
- **Files of Interest:** [tools/poc_crusader_usecode_parser.py](tools/poc_crusader_usecode_parser.py), [tools/tests/test_usecode_structuring.py](tools/tests/test_usecode_structuring.py), [USECODE/EUSECODE_extracted/pseudocode/BART/slot_0F_enterFastArea.txt](USECODE/EUSECODE_extracted/pseudocode/BART/slot_0F_enterFastArea.txt).


@@ -49,6 +49,7 @@ Detailed completed analysis belongs in the files under `docs/`, not in this plan
- 000a/000d tracked-handle, cache, allocator, dispatch-entry, and startup/display support lanes now have a coherent partial map.
- 000e parser and animation subsystems have a real partial map.
- The auxiliary local disassembly corpus at `K:/ghidra/crusader-disasm` is now inventoried and integrated as a separate evidence source for shape metadata, static map/object dumps, opcode names, and older Remorse/Regret intrinsic-function vocabularies; its safe-reuse rules and porting implications are captured in `docs/crusader-disasm-reference.md`.
- The workspace now also has a first dedicated offline map-rendering/tooling lane: `tools/render_crusader_map.py` can load a chosen `FIXED.DAT`, expand `GLOB.FLX`, decode required `SHAPES.FLX` frames, apply `GAMEPAL.PAL`, and emit a first-pass PNG from either static set, while `docs/map-rendering.md` captures the current format contracts, the `--fixed-dat` override, and the intentionally limited compositor model.
- The USECODE/VM owner/resource/runtime lane now has a workable partial model, a named sequencer entry, paired external file-family loader evidence, and supporting extraction/reporting tooling.
- The USECODE/VM tooling lane now also has a concrete near-term implementation path: a Pentagram-derived proof-of-concept parser can reuse opcode decoding while swapping in the locally verified owner-loaded class and slot arithmetic, with a hybrid Ghidra comment/bookmark import path instead of a premature custom processor module.
- The USECODE tooling lane now also has a first full readable corpus export: `tools/export_usecode_pseudocode.py` writes `977` current pseudocode bodies into `USECODE/EUSECODE_extracted/pseudocode`, and the first focused read of that corpus now shows `JELYHACK::use` / `JELYH2::use` as tiny shared `set_info(0x0207) -> process_exclude -> return` stubs rather than hidden active event cores.
@@ -162,6 +163,7 @@ Detailed completed analysis belongs in the files under `docs/`, not in this plan
3. Refine the coverage ledger from already-verified notes before broadening into fresh segment sweeps.
4. Use boundary repair only on active blockers with clear payoff, with `000c:db68` now downgraded to optional hygiene unless it blocks adjacent work again.
5. Revisit the `0x4588` callback object only when caller-side evidence is strong enough to support behavioral naming.
6. Use the new offline map-rendering lane to cross-check shape ids, map placements, and visible world composition against `crusader-disasm` shape/map notes before promoting additional rendering- or static-object-related names in `CRUSADER.EXE`.
## Next Resume Point
@@ -172,16 +174,17 @@ Detailed completed analysis belongs in the files under `docs/`, not in this plan
5. Refine the coverage ledger from already-verified notes before broadening into fresh segment sweeps.
6. Use boundary repair only on active blockers with clear payoff, with `000c:db68` now downgraded to optional hygiene unless it blocks adjacent work again.
7. Revisit the `0x4588` callback object only when caller-side evidence is strong enough to support behavioral naming.
8. Exercise `tools/render_crusader_map.py` on a few representative No Remorse and No Regret maps, then tighten the paint order using `TYPEFLAG.DAT` footpads and any mismatches visible against in-game screenshots or `crusader-disasm` map evidence.
8. Recover the real upstream caller/selector path into `entity_vm_opcode_sequence_run`, most likely by finding the first non-recursive `0x6714` context-method caller or vtable dispatch site rather than by repeating raw xref queries that still return no direct edges.
9. Recover real caller roles for `entity_vm_context_try_create_mask_0400_slot0a_with_offset` and `entity_vm_context_try_create_mask_0800_slot0b_with_offset` by treating them as the remaining dark members of the now-verified signed-additive masked-materializer subfamily and comparing them against the newly anchored slot-`0x12` caller pattern.
10. Tighten the newly surfaced higher-slot wrapper ladder around `0005:3115..31da`, especially the two slot-`0x12` caller sites at `0005:1776` / `0005:1945` and the slot-`0x10` guarded callsite, so any future promotion to `leaveFastArea` / `func11|cast` / `justMoved` / `AvatarStoleSomething` / `animGetHit` is driven by binary caller behavior rather than by external tables alone.
11. Tighten the outward caller chains around the renamed seg006 masked helpers `entity_vm_context_try_create_mask_0008_slot30_with_offset` (`0006:0ba4`) and `entity_vm_context_try_create_mask_0010_slot08_with_offset_if_ready` (`0006:108c`) so the local state-selector lane and the adjacent class-linked value family can be tied back to concrete gameplay subsystems rather than only to class-detail fields.
12. Tighten the paired-file-family reading of the seg070 twin loops at `0009:67b6` and `0009:6916` by recovering which temporary buffer and record schema each family populates behind `entity_vm_runtime_owner_resource_create`.
13. Promote additional ledger rows where the current docs already justify `Foothold`, `Partial`, or `Deep`.
14. If the VM lane stalls again, revisit `000e:ffb0` from the now-verified `00db/00dc` caller windows and try to recover an adjacent non-overlapped helper before attempting any boundary repair.
15. If the immortality lane is revisited, stay focused on `NPCTRIG` slot `0x0a` first, with slot `0x20` still treated as the typed/setup companion and `EVENT` only as the generic hub baseline; the three currently recovered direct `0005:295f` caller families are now all closed and comment-backed in the live NE program at `10f0:02d9`, `10f0:0379`, `10f0:03c3`, `10f0:03e5`, `1128:0ff0`, and `1138:1384`, so the next defensible step is an earlier producer that assigns subtype `0x20b/0x20c` into field `+0x3c` or otherwise chooses the owner-loaded class family before these generic damage consumers run.
16. Use the new Pentagram-derived parser proof of concept as the first tooling bridge for raw class/slot bodies: extend opcode coverage conservatively, emit IR v1 artifacts, and only then prototype a Ghidra-side annotation importer against compiled anchors like `000d:51fd`, `000d:5572`, `000d:46ec`, `000d:22bc`, and `000d:ebe3`.
9. Recover the real upstream caller/selector path into `entity_vm_opcode_sequence_run`, most likely by finding the first non-recursive `0x6714` context-method caller or vtable dispatch site rather than by repeating raw xref queries that still return no direct edges.
10. Recover real caller roles for `entity_vm_context_try_create_mask_0400_slot0a_with_offset` and `entity_vm_context_try_create_mask_0800_slot0b_with_offset` by treating them as the remaining dark members of the now-verified signed-additive masked-materializer subfamily and comparing them against the newly anchored slot-`0x12` caller pattern.
11. Tighten the newly surfaced higher-slot wrapper ladder around `0005:3115..31da`, especially the two slot-`0x12` caller sites at `0005:1776` / `0005:1945` and the slot-`0x10` guarded callsite, so any future promotion to `leaveFastArea` / `func11|cast` / `justMoved` / `AvatarStoleSomething` / `animGetHit` is driven by binary caller behavior rather than by external tables alone.
12. Tighten the outward caller chains around the renamed seg006 masked helpers `entity_vm_context_try_create_mask_0008_slot30_with_offset` (`0006:0ba4`) and `entity_vm_context_try_create_mask_0010_slot08_with_offset_if_ready` (`0006:108c`) so the local state-selector lane and the adjacent class-linked value family can be tied back to concrete gameplay subsystems rather than only to class-detail fields.
13. Tighten the paired-file-family reading of the seg070 twin loops at `0009:67b6` and `0009:6916` by recovering which temporary buffer and record schema each family populates behind `entity_vm_runtime_owner_resource_create`.
14. Promote additional ledger rows where the current docs already justify `Foothold`, `Partial`, or `Deep`.
15. If the VM lane stalls again, revisit `000e:ffb0` from the now-verified `00db/00dc` caller windows and try to recover an adjacent non-overlapped helper before attempting any boundary repair.
16. If the immortality lane is revisited, stay focused on `NPCTRIG` slot `0x0a` first, with slot `0x20` still treated as the typed/setup companion and `EVENT` only as the generic hub baseline; the three currently recovered direct `0005:295f` caller families are now all closed and comment-backed in the live NE program at `10f0:02d9`, `10f0:0379`, `10f0:03c3`, `10f0:03e5`, `1128:0ff0`, and `1138:1384`, so the next defensible step is an earlier producer that assigns subtype `0x20b/0x20c` into field `+0x3c` or otherwise chooses the owner-loaded class family before these generic damage consumers run.
17. Use the new Pentagram-derived parser proof of concept as the first tooling bridge for raw class/slot bodies: extend opcode coverage conservatively, emit IR v1 artifacts, and only then prototype a Ghidra-side annotation importer against compiled anchors like `000d:51fd`, `000d:5572`, `000d:46ec`, `000d:22bc`, and `000d:ebe3`.
## Remaining Work To Reach A Reasonably Complete Decompilation State


@@ -2544,6 +2544,110 @@ def render_selector_chain(
    return rendered, label_to_index[join_label]


def render_loop_construct(
    blocks: list[tuple[str, list[str]]],
    label_to_index: dict[str, int],
    index: int,
    end_index: int,
    return_labels: set[str],
    active_regions: set[tuple[int, int, tuple[str, ...]]] | None = None,
    render_cache: dict[tuple[int, int, tuple[str, ...]], tuple[list[str], bool] | None] | None = None,
) -> tuple[list[str], int] | None:
    _, statements = blocks[index]
    if not statements:
        return None
    terminal = parse_terminal_statement(statements[-1])
    if terminal is None or terminal.kind != "if":
        return None
    target_label = terminal.target or ""
    target_index = label_to_index.get(target_label)
    if target_index is None or target_index <= index or target_index > end_index:
        return None
    loop_tail_index = last_nonempty_block_index(blocks, index + 1, target_index)
    if loop_tail_index is None:
        return None
    loop_tail_terminal = parse_terminal_statement(blocks[loop_tail_index][1][-1])
    if loop_tail_terminal is None or loop_tail_terminal.kind != "goto" or loop_tail_terminal.target != blocks[index][0]:
        return None
    loop_body = render_structured_region(
        blocks,
        label_to_index,
        index + 1,
        target_index,
        return_labels,
        {blocks[index][0]},
        active_regions,
        render_cache,
    )
    if loop_body is None:
        return None
    loop_lines, _ = loop_body
    loop_selector = None
    if index > 0 and is_loop_selector_only_block(blocks[index - 1][1]):
        loop_selector = parse_loop_selector_statement(blocks[index - 1][1][0])
    rendered: list[str] = []
    if loop_selector is not None:
        rendered.append(f"for {loop_selector} {{")
    else:
        rendered.append(f"while ({invert_condition_text(terminal.condition or 'condition')}) {{")
    rendered.extend(indent_lines(loop_lines))
    rendered.append("}")
    return rendered, target_index
def render_infinite_loop_construct(
    blocks: list[tuple[str, list[str]]],
    label_to_index: dict[str, int],
    index: int,
    end_index: int,
    return_labels: set[str],
    active_regions: set[tuple[int, int, tuple[str, ...]]] | None = None,
    render_cache: dict[tuple[int, int, tuple[str, ...]], tuple[list[str], bool] | None] | None = None,
) -> tuple[list[str], int] | None:
    if index + 1 >= end_index:
        return None
    loop_label = blocks[index][0]
    loop_tail_index: int | None = None
    for candidate in range(end_index - 1, index, -1):
        statements = blocks[candidate][1]
        if not statements:
            continue
        terminal = parse_terminal_statement(statements[-1])
        if terminal is not None and terminal.kind == "goto" and terminal.target == loop_label:
            loop_tail_index = candidate
            break
    if loop_tail_index is None:
        return None
    loop_body = render_structured_region(
        blocks,
        label_to_index,
        index,
        loop_tail_index + 1,
        return_labels,
        {loop_label},
        active_regions,
        render_cache,
    )
    if loop_body is None:
        return None
    loop_lines, _ = loop_body
    rendered = ["while (true) {"]
    rendered.extend(indent_lines(loop_lines))
    rendered.append("}")
    return rendered, loop_tail_index + 1


def render_structured_region(
    blocks: list[tuple[str, list[str]]],
    label_to_index: dict[str, int],
@@ -2635,34 +2739,20 @@ def render_structured_region(
index = selector_join_index
continue
if target_index <= end_index:
loop_tail_index = last_nonempty_block_index(blocks, index + 1, target_index)
if loop_tail_index is not None:
loop_tail_terminal = parse_terminal_statement(blocks[loop_tail_index][1][-1])
if loop_tail_terminal is not None and loop_tail_terminal.kind == "goto" and loop_tail_terminal.target == blocks[index][0]:
loop_body = render_structured_region(
blocks,
label_to_index,
index + 1,
target_index,
return_labels,
{blocks[index][0]},
active_regions,
render_cache,
)
if loop_body is not None:
loop_lines, _ = loop_body
loop_selector = None
if index > start_index:
loop_selector = parse_loop_selector_statement(blocks[index - 1][1][0]) if is_loop_selector_only_block(blocks[index - 1][1]) else None
if loop_selector is not None:
lines.append(f"for {loop_selector} {{")
else:
lines.append(f"while ({invert_condition_text(terminal.condition or 'condition')}) {{")
lines.extend(indent_lines(loop_lines))
lines.append("}")
index = target_index
continue
loop_construct = render_loop_construct(
blocks,
label_to_index,
index,
end_index,
return_labels,
active_regions,
render_cache,
)
if loop_construct is not None:
loop_lines, loop_join_index = loop_construct
lines.extend(loop_lines)
index = loop_join_index
continue
true_tail_index = last_nonempty_block_index(blocks, index + 1, target_index)
if true_tail_index is not None:
@@ -2817,6 +2907,38 @@ def render_partially_structured_blocks(blocks: list[tuple[str, list[str]]]) -> l
            index = selector_join_index
            continue
        loop_construct = render_loop_construct(
            blocks,
            label_to_index,
            index,
            len(blocks),
            return_labels,
        )
        if loop_construct is not None:
            loop_lines, loop_join_index = loop_construct
            lines.append(f" {label}:")
            for statement in loop_lines:
                lines.append(f" {statement}" if statement else "")
            lines.append("")
            index = loop_join_index
            continue
        infinite_loop_construct = render_infinite_loop_construct(
            blocks,
            label_to_index,
            index,
            len(blocks),
            return_labels,
        )
        if infinite_loop_construct is not None:
            loop_lines, loop_join_index = infinite_loop_construct
            lines.append(f" {label}:")
            for statement in loop_lines:
                lines.append(f" {statement}" if statement else "")
            lines.append("")
            index = loop_join_index
            continue
        lines.append(f" {label}:")
        for statement in statements:
            lines.append(f" {statement}")
@@ -2855,6 +2977,47 @@ def render_pseudocode(ir: dict[str, Any], shape_catalog: ShapeCatalog | None = N
    return apply_shape_catalog_to_pseudocode("\n".join(lines) + "\n", shape_catalog)


def validate_pseudocode_text(text: str) -> list[str]:
    errors: list[str] = []
    label_lines: dict[str, int] = {}
    goto_targets: list[tuple[str, int]] = []
    brace_depth = 0
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        stripped = raw_line.strip()
        if not stripped:
            continue
        if stripped.endswith("{"):
            brace_depth += 1
        if stripped == "}":
            brace_depth -= 1
            if brace_depth < 0:
                errors.append(f"line {line_number}: unexpected closing brace")
                brace_depth = 0
        label_match = re.fullmatch(r"([A-Za-z_][A-Za-z0-9_]*):", stripped)
        if label_match is not None:
            label = label_match.group(1)
            previous_line = label_lines.get(label)
            if previous_line is not None:
                errors.append(f"line {line_number}: duplicate label {label} (first at line {previous_line})")
            else:
                label_lines[label] = line_number
        for match in re.finditer(r"\bgoto ([A-Za-z_][A-Za-z0-9_]*)\s*;", stripped):
            goto_targets.append((match.group(1), line_number))
    if brace_depth != 0:
        errors.append(f"unbalanced braces: final depth {brace_depth}")
    for target, line_number in goto_targets:
        if target not in label_lines:
            errors.append(f"line {line_number}: goto target {target} has no label")
    return errors


def render_text(ir: dict[str, Any]) -> str:
    labels = build_listing_labels(ir)

tools/render_crusader_map.py Normal file

File diff suppressed because it is too large


@@ -9,6 +9,7 @@ from tools.poc_crusader_usecode_parser import (
    render_partially_structured_blocks,
    render_structured_pseudocode,
    try_decode_loop_selector,
    validate_pseudocode_text,
)
@@ -222,6 +223,58 @@ class UsecodeStructuringTests(unittest.TestCase):
        self.assertNotIn("block_0358:", text)
        self.assertNotIn("goto block_0469;", text)

    def test_generic_loop_renders_in_partial_fallback(self) -> None:
        blocks = [
            ("entry", ["goto block_01E2;"]),
            ("block_01E2", ["counter = 0;"]),
            ("block_025C", ["if (counter <= rndNum) goto block_0315;"]),
            ("block_0267", ["counter2 = 1;"]),
            ("block_026E", ["if (counter2 <= 7) goto block_02B6;"]),
            ("block_0276", ["spawn FREE.waitNTimerTicks(pid, 10, 0x00000000);", "suspend;", "counter2 = (1 + counter2);", "goto block_026E;"]),
            ("block_02B6", ["counter = (1 + counter);", "goto block_025C;"]),
            ("block_0315", ["goto block_01E2;"]),
        ]
        rendered = render_partially_structured_blocks(blocks)
        text = "\n".join(rendered)
        self.assertIn("while (true) {", text)
        self.assertIn("while (counter > rndNum) {", text)
        self.assertIn("while (counter2 > 7) {", text)
        self.assertNotIn("block_026E:", text)
        self.assertNotIn("goto block_025C;", text)

    def test_infinite_loop_region_renders_as_while_true(self) -> None:
        blocks = [
            ("entry", ["set_info(0x021B, *(arg_06));"]),
            ("block_01E2", ["suspend;", "FREE.slot_20(100);", "if (retval > 50) goto block_0318;"]),
            ("block_0205", ["FREE.slot_20(pid, 120);", "goto block_046D;"]),
            ("block_0318", ["FREE.slot_20(pid, 60);"]),
            ("block_046D", ["goto block_01E2;"]),
            ("block_0470", ["return;"]),
        ]
        rendered = render_partially_structured_blocks(blocks)
        text = "\n".join(rendered)
        self.assertIn("while (true) {", text)
        self.assertNotIn("goto block_01E2;", text)
        self.assertNotIn("block_046D:", text)

    def test_pseudocode_validator_reports_missing_label(self) -> None:
        errors = validate_pseudocode_text(
            "function sample()\n{\n entry:\n goto missing;\n}\n"
        )
        self.assertEqual(errors, ["line 4: goto target missing has no label"])

    def test_pseudocode_validator_accepts_balanced_text(self) -> None:
        errors = validate_pseudocode_text(
            "function sample()\n{\n entry:\n while (true) {\n goto entry;\n }\n}\n"
        )
        self.assertEqual(errors, [])


if __name__ == "__main__":
    unittest.main()