# PSX Map Exporter Spec

## Goal

`psx-map-exporter` is a standalone Node.js probe for Crusader PSX map extraction. It exists to prove a fresh end-to-end path from raw `LSET*.WDL` input to:

- extracted intermediate sprite assets under `.cache`
- a rendered map PNG under `.output`

This project does not reuse `Crusader-Map-Viewer` code, scene caches, donor mappings, or sidecar summaries as binding inputs. It only consumes raw PSX assets plus the documented executable-backed findings from `docs/psx` and the live Ghidra session.

## Scope

Version `v0` is intentionally narrow. It will:

- read one PSX `LSET*.WDL` file
- parse the documented `0x38`-byte top-level header
- carve the post-audio map/art regions from header-derived boundaries
- parse the loader-sized post-audio sections as a second, higher-value view of the file layout
- extract the dense constructor-placement family from `post_audio_section_00`
- keep the smaller root-dispatch family available as a comparison probe
- render a layered authored probe that can combine constructor placements with the smaller root-dispatch lane
- scan `post_audio_region_04` for type-4/type-5 sprite bundles
- decode bundle frames directly from the raw WDL
- write extracted frame PNGs to `.cache`
- compose a probe map PNG to `.output`

It will not claim full runtime parity yet. Known non-goals for `v0`:

- exact `DAT_800758d8/d0/cc/d4` parity
- exact CLUT reproduction
- full stage-1 dependency-graph ordering
- exact type-to-resource binding for unresolved families
- full `post_audio_region_01` / `post_audio_region_02` semantic decode

## Evidence Constraints

The implementation is grounded in these current facts from the docs and Ghidra:

- `LSET*.WDL` uses a fixed `0x38`-byte top-level header.
- The second dword is the audio/SPU blob size.
- The old region-only carve is not sufficient on its own for visible-object recovery; loader-sized `post_audio_section_00` contains both the small root-dispatch rows and the dense constructor-placement rows.
- The file contains a post-audio area with four high-confidence absolute boundaries that split it into:
  - `post_audio_region_00`
  - `post_audio_region_01`
  - `post_audio_region_02`
  - `post_audio_region_03`
  - `post_audio_region_04`
- The small count-prefixed section-0 root-dispatch rows are real, but they are not the whole map object set.
- The dense constructor-placement records recovered from loader-sized `post_audio_section_00` are currently the best standalone live-object seed source, not a proven final visible-map layer.
- Current strongest standalone layout read: the constructor-placement lane is a count-prefixed `12`-byte substream inside the loader-sized section-0 span rather than a whole-section `24`-byte row grid. For `LSET1/L0.WDL`, the best current candidate has a section-relative header at `0x38`, a record start at `0x3c`, and a reported count of `1182` records.
- The constructor-placement stream can extend slightly past the nominal `post_audio_section_00` slice, so standalone parsing must follow the detected stream count from the section-0 base instead of truncating strictly at the section object boundary.
- `post_audio_region_04` is the strongest current graphics bank candidate.
- The direct `typeWord -> bundle slot` scan-order binding is disproven as a final art rule and is retained only as a diagnostic bundle-family probe.
- The real art/template lane is `DAT_800758d8`, but the executable now shows two distinct late art feeds per WDL pass rather than one monolithic bank:
  - an earlier art-install blob that builds resources and temporarily mirrors them into `DAT_800758d8`
  - a later `8`-byte header-only override blob that restores raw active-header pointers into `DAT_800758d8`
- The later header-only override is the safer standalone parser target: constructors branch on first dword `0x58` and then reuse `DAT_800758c8[type]`, so the final post-load `DAT_800758d8` state is a raw-header lane, not a permanently built-resource lane.
- Type-4/type-5 drawable bundles expose width, height, palette mode/index, frame count, frame table offset, and data offset in the raw bundle header.
- Bundle frame entries use a `20`-byte row with size, relative data offset, width, height, origin x/y, and flags.
- `sprite_rle_decode_rows` uses row-local control bytes:
  - positive: repeat next byte N times
  - negative: copy next `abs(N)` literal bytes
  - zero: end row
- The executable projection basis is:

  $$
  screen_x = y - x
  $$

  $$
  screen_y = 2z - \frac{x + y}{2}
  $$

## Input Model

The exporter accepts either:

- a direct `--wdl` path
- or a `--source` path relative to a PSX disc root

Default disc root for local workspace runs:

- `d:/Ghidra/Crusader-Map-Viewer/map_renderer/STATIC_PSX`

Expected source examples:

- `LSET1/L0.WDL`
- `LSET4/L37.WDL`

## Output Layout

### `.cache`

Per-run cache path:

- `.cache//`

Contents:

- `wdl-summary.json`
- `records.json`
- `bundles.json`
- `frame-manifest.json`
- `active-header-overrides.json`
- `sprites//frame_.png`

The cache is disposable. It exists to preserve intermediate evidence and make re-runs inspectable.

`records.json` now also records constructor-stream detection metadata when available: stream header offset, record start offset, reported count, and the initial structured-prefix run.
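
The constructor-stream metadata described above can be illustrated with a minimal sketch. It assumes the count prefix is a little-endian `u32` occupying the four bytes between the stream header and the first record (consistent with the `0x38` header and `0x3c` record start reported for `LSET1/L0.WDL`, but not confirmed against the loader). The function name `readConstructorStream` and the metadata field names are hypothetical and are not taken from the exporter source.

```javascript
// Sketch only: the u32 count prefix and all names below are assumptions.
const RECORD_SIZE = 12; // six little-endian u16 words per record

function readConstructorStream(bytes, headerOffset) {
  // `bytes` is a Uint8Array (or Node Buffer) that starts at the section-0 base.
  // It should extend past the nominal section slice, because the stream can
  // run slightly beyond the section object boundary.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const reportedCount = view.getUint32(headerOffset, true);
  const recordStartOffset = headerOffset + 4;

  const records = [];
  for (let i = 0; i < reportedCount; i++) {
    const base = recordStartOffset + i * RECORD_SIZE;
    if (base + RECORD_SIZE > bytes.byteLength) break; // ran out of input bytes
    records.push({
      typeWord: view.getUint16(base, true),
      xWord: view.getUint16(base + 2, true),
      yWord: view.getUint16(base + 4, true),
      zWord: view.getUint16(base + 6, true),
      selectorWord: view.getUint16(base + 8, true),
      laneWord: view.getUint16(base + 10, true),
    });
  }

  return {
    streamHeaderOffset: headerOffset,
    recordStartOffset,
    reportedCount,
    records,
  };
}
```

The sketch follows the reported count rather than a section length, and it keeps `reportedCount` separate from `records.length` so that a truncated input stays visible in the metadata.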

The cache also records candidate late `DAT_800758d8` header-only override blobs as a standalone diagnostic. Those candidates are not used as final art binding yet.

`wdl-summary.json` now also emits `sceneInterpretation`, which is an explicit warning-bearing classification of what the current export most likely represents. For constructor-placement exports this should currently read as a constructor-fed live-object seed lane rather than a final visible-world reconstruction.

### `.output`

Per-run final outputs:

- `.output/.png`
- `.output/.json`
- `.output/_.png` for each rendered authored layer when layered mode is active

The JSON stores the final probe scene manifest used to draw the PNG.

The `.output` folder is reset at the start of each export so evaluation only sees artifacts from the current run.

The `.output/.json` manifest inherits `sceneInterpretation` from `wdl-summary.json` so consumers do not need to infer that warning from prose docs alone.

## Record Extraction Rules

`v0` now uses the loader-sized `post_audio_section_00` extraction paths as the primary scene source.

Current interpretation constraint:

- `section0_constructor_placements` should currently be treated as constructor-fed world-object seed records.
- They preserve meaningful layout and projection structure, but current evidence does not support treating them as the complete visible map or static architecture layer.
- If a render shows coherent room layout with globally wrong or repeated art, the exporter is currently visualizing one runtime object lane without the downstream per-type bind/state path and without the separate static-world substrate.

Record extraction rule:

- `auto` / `combined` / `layered` mode merges both authored section-0 families into one layered probe:
  - constructor placements provide the dense live-object seed lane
  - root-dispatch rows provide the smaller comparison and auxiliary authored lane
- `constructors` / `region01` mode first searches the section-0 span for a count-prefixed `12`-byte constructor stream and, when found, treats each record as six little-endian `u16` words:
  - `typeWord`
  - `xWord`
  - `yWord`
  - `zWord`
  - `selectorWord`
  - `laneWord`
- If a count-prefixed constructor stream is not found, the exporter falls back to the older whole-section `24`-byte paired-record scan as a compatibility probe.
- `roots` / `region00` mode keeps the small count-prefixed root-dispatch probe for comparison and negative-evidence checks

Plausibility filter:

- `typeWord` in a conservative visible-family range
- not all coordinate words are zero
- `laneWord` is non-zero and within the current conservative control-word range

This is explicitly a probe schema, not a final loader-faithful schema.

Current negative result:

- Correcting the constructor stream start/count for `LSET1/L0.WDL` only changes the standalone constructor probe slightly (`1130 -> 1135` records, `1090 -> 1095` rendered items) and does not materially change the repeated wrong-art output. Current evidence therefore points to unresolved art/runtime binding as the primary blocker, not a missed constructor-tail decode.

## Art Binding Rule

`v0` uses one explicit diagnostic binding rule:

- `typeWord -> bundle slot index`

That means the sorted bundle list from `post_audio_region_04` is indexed directly by `typeWord` when the slot exists.

This rule is explicitly not claimed as final executable truth. Current docs and Ghidra evidence show the final art path goes through the late `DAT_800758d8` art bank plus downstream state-script/runtime selection. The slot rule remains useful only as a clean standalone negative-evidence probe.
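
As an illustration of the diagnostic slot rule, here is a minimal sketch. It assumes the bundle list is sorted by raw file offset (the spec says only "sorted") and that each scanned bundle carries `offset` and `frameCount` fields. The function name `bindBySlot` and those field names are hypothetical. The frame clamp mirrors the rendering rule of clamping `selectorWord` to the available frames.

```javascript
// Sketch only: sort key, field names, and function name are assumptions.
function bindBySlot(bundles, record) {
  // Sort a copy so the caller's scan-order list is left untouched.
  const sorted = [...bundles].sort((a, b) => a.offset - b.offset);

  // typeWord indexes the sorted list directly; a missing slot means no binding.
  const bundle = sorted[record.typeWord];
  if (!bundle || bundle.frameCount <= 0) return null;

  // Clamp the selector into the range of frames the bundle actually has.
  const frameIndex = Math.min(Math.max(record.selectorWord, 0), bundle.frameCount - 1);

  return { bundleOffset: bundle.offset, frameIndex };
}
```

Returning `null` for a missing slot, instead of substituting a fallback bundle, keeps unbound types visible in the output metadata.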

For the generic family band now dominating `LSET1/L0` failures (`0x003e`, `0x0042`, `0x0044`, `0x0045`, `0x004f`, `0x0059`, `0x005b`), repeated wrong art is now understood as both a binding failure and a semantic-layer failure: the exporter is currently visualizing constructor-fed runtime object seeds as though they were the final visible world.

The chosen bundle and clamped frame index, plus binding-diversity metrics, are preserved in output metadata so failures stay auditable.

There is now one opt-in experimental binding mode for current map-0 research:

- `runtime-map0-masked-proxy`

That mode reads `.cache/runtime-map0-correlation.json`, takes the live `headerWord11` field from the current map-0 type rows, masks it to `0x0fffff`, and remaps a type only when that masked value lands within a small tolerance of a scanned raw bundle offset with matching kind/mode. All non-matching types still fall back to the raw slot rule.

This is still a probe rule, not claimed final executable truth. It exists to turn the new RAM-backed map-0 correlation into a small, auditable extraction improvement without pretending the full late `DAT_800758d8` bank parse is solved.

When debug labels are enabled for a map render, labels now identify unique rendered resources rather than per-instance placements. The stable label key is currently `bundle offset + clamped frame + resolved palette`. Validation atlas sheets still use progressive cell indices.

## Rendering Rule

For each record:

- compute `screenX` and `screenY` from the documented projection basis
- select frame index from `selectorWord`, clamped to available frames
- place sprite top-left at:
  - `screenX - originX`
  - `screenY - originY`

Current draw order is conservative:

- main-visible before special-visible
- then ascending `screenY`
- then ascending `screenX`

This is a probe approximation. The later graph-based stage-1 ordering still belongs to a future pass.
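
The projection, placement, and conservative draw order above can be sketched as follows. This assumes the raw coordinate words are used as `x`, `y`, `z` without scaling and that each item carries a boolean `special` flag for the special-visible class. Both are illustrative assumptions, and the names `project`, `placeSprite`, and `compareDrawOrder` are hypothetical. The sketch uses plain division for the halving term; whether the executable truncates to an integer is not stated here.

```javascript
// Sketch only: unscaled coordinates and the `special` flag are assumptions.
function project({ x, y, z }) {
  return {
    screenX: y - x,
    screenY: 2 * z - (x + y) / 2,
  };
}

function placeSprite(record, frame) {
  const { screenX, screenY } = project({
    x: record.xWord,
    y: record.yWord,
    z: record.zWord,
  });
  return {
    screenX,
    screenY,
    // Top-left corner: the projected anchor minus the frame origin.
    left: screenX - frame.originX,
    top: screenY - frame.originY,
    special: Boolean(record.special),
  };
}

// Main-visible before special-visible, then ascending screenY, then screenX.
function compareDrawOrder(a, b) {
  return (
    Number(a.special) - Number(b.special) ||
    a.screenY - b.screenY ||
    a.screenX - b.screenX
  );
}
```

A placed list can then be ordered with `items.sort(compareDrawOrder)` before compositing.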

The rendered PNG uses a neutral opaque background by default so probe silhouettes are legible without relying on transparency.

## Color Rule

`v0` emits grayscale art from raw pixel indices.

Reason:

- bundle frame decode is already well constrained
- full CLUT parity is not
- grayscale preserves shape/variant evidence without pretending the palette problem is solved

Transparent index `0` stays transparent.

## CLI

Primary command:

```powershell
node src/cli.js --source LSET1/L0.WDL
```

Supported options:

- `--source `
- `--wdl `
- `--disc-root `
- `--binding-mode `
- `--map-source `
- `--out-name `

## Success Criteria

`v0` is successful if it can:

- parse a raw `LSET*.WDL`
- recover the loader-sized section view alongside the region carve
- scan bundles directly from `post_audio_region_04`
- decode at least one frame from raw data
- extract a stable constructor-placement record set from `post_audio_section_00`
- write extracted sprite PNGs into `.cache`
- write a readable diagnostic probe PNG into `.output`

## Planned Follow-Ups

- replace diagnostic slot binding with a direct parser for the late header-only `DAT_800758d8` override stream and bundle match path
- recover the exact raw on-disk encoding of the earlier built-resource art-install blob so the two late art feeds are modeled separately instead of flattened into one guessed bank
- identify and parse the separate static-world or subordinate level substrate that complements the constructor-fed live-object lane, instead of treating section-0 constructor placements as the whole map
- add palette/CLUT reconstruction
- add stage-1 graph ordering recovery
- compare the probe scene against fixed live samples such as `map 104` without reintroducing viewer-side donor assumptions