Crusader_Decomp/psx-map-exporter/docs/spec.md
2026-04-13 16:50:28 +02:00

PSX Map Exporter Spec

Goal

psx-map-exporter is a standalone Node.js probe for Crusader PSX map extraction.

It exists to prove a fresh end-to-end path from raw LSET*.WDL input to:

  • extracted intermediate sprite assets under .cache
  • a rendered map PNG under .output

This project does not reuse Crusader-Map-Viewer code, scene caches, donor mappings, or sidecar summaries as binding inputs. It only consumes raw PSX assets plus the documented executable-backed findings from docs/psx and the live Ghidra session.

Scope

Version v0 is intentionally narrow.

It will:

  • read one PSX LSET*.WDL file
  • parse the documented 0x38-byte top-level header
  • carve the post-audio map/art regions from header-derived boundaries
  • parse the loader-sized post-audio sections as a second, higher-value view of the file layout
  • extract the dense constructor-placement family from post_audio_section_00
  • keep the smaller root-dispatch family available as a comparison probe
  • render a layered authored probe that can combine constructor placements with the smaller root-dispatch lane
  • scan post_audio_region_04 for type-4/type-5 sprite bundles
  • decode bundle frames directly from the raw WDL
  • write extracted frame PNGs to .cache
  • compose a probe map PNG to .output

It will not claim full runtime parity yet.
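The first steps in the list above (header read and region carve) can be sketched as follows. This is a minimal illustration, not the exporter's actual code: the 0x38-byte header size and the meaning of the second dword come from this spec, while little-endian byte order, the function names, and the choice to pass boundaries in as arguments are assumptions. Which header fields supply the boundaries is intentionally left to the caller.

```javascript
// Sketch only. HEADER_SIZE and "second dword = audio/SPU blob size" are from the spec;
// little-endian order and all function names are assumptions for illustration.
const HEADER_SIZE = 0x38;

function readWdlHeader(buf) {
  if (buf.length < HEADER_SIZE) {
    throw new RangeError(`WDL too small for a 0x${HEADER_SIZE.toString(16)}-byte header`);
  }
  const dwords = [];
  for (let off = 0; off < HEADER_SIZE; off += 4) {
    dwords.push(buf.readUInt32LE(off));
  }
  return { dwords, audioSize: dwords[1] };
}

// Split [start, end) at the given absolute boundaries into named regions.
// Four boundaries yield post_audio_region_00 .. post_audio_region_04.
function carveRegions(start, boundaries, end) {
  const edges = [start, ...boundaries, end];
  for (let i = 1; i < edges.length; i += 1) {
    if (edges[i] < edges[i - 1]) throw new RangeError('boundaries must be ascending');
  }
  return edges.slice(0, -1).map((from, i) => ({
    name: `post_audio_region_${String(i).padStart(2, '0')}`,
    start: from,
    end: edges[i + 1],
  }));
}
```

Passing the boundaries explicitly keeps the carve testable on synthetic buffers and avoids baking an unverified header-field mapping into the helper.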

Known non-goals for v0:

  • exact DAT_800758d8/d0/cc/d4 parity
  • exact CLUT reproduction
  • full stage-1 dependency-graph ordering
  • exact type-to-resource binding for unresolved families
  • full post_audio_region_01 / post_audio_region_02 semantic decode

Evidence Constraints

The implementation is grounded in these current facts from the docs and Ghidra:

  • LSET*.WDL uses a fixed 0x38-byte top-level header.
  • The second dword is the audio/SPU blob size.
  • The old region-only carve is not sufficient on its own for visible-object recovery; loader-sized post_audio_section_00 contains both the small root-dispatch rows and the dense constructor-placement rows.
  • The file contains a post-audio area with four high-confidence absolute boundaries that split it into five regions:
    • post_audio_region_00
    • post_audio_region_01
    • post_audio_region_02
    • post_audio_region_03
    • post_audio_region_04
  • The small count-prefixed section-0 root-dispatch rows are real, but they are not the whole map object set.
  • The dense constructor-placement records recovered from loader-sized post_audio_section_00 are currently the best standalone live-object seed source, not a proven final visible-map layer.
  • Strongest current standalone layout reading: the constructor-placement lane is a count-prefixed 12-byte substream inside the loader-sized section-0 span rather than a whole-section 24-byte row grid. For LSET1/L0.WDL, the best current candidate has a section-relative header at 0x38, a record start at 0x3c, and a reported count of 1182 records.
  • The constructor-placement stream can extend slightly past the nominal post_audio_section_00 slice, so standalone parsing must follow the detected stream count from the section-0 base instead of truncating strictly at the section object boundary.
  • post_audio_region_04 is the strongest current graphics bank candidate.
  • The direct typeWord -> bundle slot scan-order binding is disproven as a final art rule and is retained only as a diagnostic bundle-family probe.
  • The real art/template lane is DAT_800758d8, but the executable now shows two distinct late art feeds per WDL pass rather than one monolithic bank:
    • an earlier art-install blob that builds resources and temporarily mirrors them into DAT_800758d8
    • a later 8-byte header-only override blob that restores raw active-header pointers into DAT_800758d8
  • The later header-only override is the safer standalone parser target: constructors branch on first dword 0x58 and then reuse DAT_800758c8[type], so the final post-load DAT_800758d8 state is a raw-header lane, not a permanently built-resource lane.
  • Type-4/type-5 drawable bundles expose width, height, palette mode/index, frame count, frame table offset, and data offset in the raw bundle header.
  • Bundle frame entries use a 20-byte row with size, relative data offset, width, height, origin x/y, and flags.
  • sprite_rle_decode_rows uses row-local control bytes:
    • positive: repeat next byte N times
    • negative: copy next abs(N) literal bytes
    • zero: end row
  • The executable projection basis is:

$$
\begin{aligned}
\mathrm{screen}_x &= y - x \\
\mathrm{screen}_y &= 2z - \frac{x + y}{2}
\end{aligned}
$$
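The row-local control-byte scheme listed above for sprite_rle_decode_rows can be sketched as a single-row decoder. This is an illustrative reading, not a transcription of the executable: the three control cases come from this spec, while the signed 8-bit interpretation of the control byte, the clamping to a known row width, and the name decodeRleRow are assumptions.

```javascript
// Sketch of one RLE row decode under the control-byte rules in the spec:
//   positive N: repeat the next byte N times
//   negative N: copy the next abs(N) literal bytes
//   zero:       end of row
// Assumption: the control byte is a signed 8-bit value.
function decodeRleRow(src, offset, width) {
  const row = new Uint8Array(width); // pixels never written stay at index 0
  let x = 0;
  let pos = offset;
  while (pos < src.length) {
    const control = (src[pos] << 24) >> 24; // sign-extend the byte
    pos += 1;
    if (control === 0) break;
    if (control > 0) {
      const value = src[pos];
      pos += 1;
      for (let i = 0; i < control && x < width; i += 1) row[x++] = value;
    } else {
      const count = -control;
      for (let i = 0; i < count; i += 1) {
        const value = src[pos + i];
        if (x < width && value !== undefined) row[x++] = value;
      }
      pos += count;
    }
  }
  // next is the input offset just past this row's terminator
  return { row, next: pos };
}
```

Returning the next input offset lets a caller decode consecutive rows by feeding each result's next value back in as the following row's offset.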

Input Model

The exporter accepts either:

  • a direct --wdl path
  • or a --source path relative to a PSX disc root

Default disc root for local workspace runs:

  • d:/Ghidra/Crusader-Map-Viewer/map_renderer/STATIC_PSX

Expected source examples:

  • LSET1/L0.WDL
  • LSET4/L37.WDL
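A minimal sketch of the input resolution described above, assuming that --wdl wins when both options are given (the spec does not state a precedence) and using a hypothetical resolveWdlPath helper:

```javascript
const path = require('node:path');

// Default disc root taken from the spec; only meaningful on the local workspace.
const DEFAULT_DISC_ROOT = 'd:/Ghidra/Crusader-Map-Viewer/map_renderer/STATIC_PSX';

// Assumption: a direct --wdl path takes precedence over --source.
function resolveWdlPath({ wdl, source, discRoot = DEFAULT_DISC_ROOT }) {
  if (wdl) return path.resolve(wdl);
  if (source) return path.resolve(discRoot, source);
  throw new Error('expected either --wdl or --source');
}
```

For example, resolveWdlPath({ source: 'LSET1/L0.WDL' }) resolves the source path against the default disc root.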

Output Layout

.cache

Per-run cache path:

  • .cache/<map-stem>/

Contents:

  • wdl-summary.json
  • records.json
  • bundles.json
  • frame-manifest.json
  • active-header-overrides.json
  • sprites/<bundle-offset>/frame_<n>.png

The cache is disposable. It exists to preserve intermediate evidence and make re-runs inspectable.

records.json also stores constructor-stream detection metadata when available: stream header offset, record start offset, reported count, and the initial structured-prefix run.

The cache also records candidate late DAT_800758d8 header-only override blobs as a standalone diagnostic. Those candidates are not used as final art binding yet.

wdl-summary.json now also emits sceneInterpretation, which is an explicit warning-bearing classification of what the current export most likely represents. For constructor-placement exports this should currently read as a constructor-fed live-object seed lane rather than a final visible-world reconstruction.

.output

Per-run final outputs:

  • .output/<map-stem>.png
  • .output/<map-stem>.json
  • .output/<map-stem>_<layer>.png for each rendered authored layer when layered mode is active

The JSON stores the final probe scene manifest used to draw the PNG.

The .output folder is reset at the start of each export so evaluation only sees artifacts from the current run.

The .output/<map-stem>.json manifest inherits sceneInterpretation from wdl-summary.json so consumers do not need to infer that warning from prose docs alone.

Record Extraction Rules

v0 now uses the loader-sized post_audio_section_00 extraction paths as the primary scene source.

Current interpretation constraint:

  • section0_constructor_placements should currently be treated as constructor-fed world-object seed records.
  • They preserve meaningful layout and projection structure, but current evidence does not support treating them as the complete visible map or static architecture layer.
  • If a render shows a coherent room layout with globally wrong or repeated art, that means the exporter is visualizing one runtime object lane without the downstream per-type bind/state path and without the separate static-world substrate.

Record extraction rule:

  • auto / combined / layered mode merges both authored section-0 families into one layered probe:
    • constructor placements provide the dense live-object seed lane
    • root-dispatch rows provide the smaller comparison and auxiliary authored lane
  • constructors / region01 mode first searches the section-0 span for a count-prefixed 12-byte constructor stream and, when found, treats each record as six little-endian u16 words:
    • typeWord
    • xWord
    • yWord
    • zWord
    • selectorWord
    • laneWord
  • If a count-prefixed constructor stream is not found, the exporter falls back to the older whole-section 24-byte paired-record scan as a compatibility probe.
  • roots / region00 mode keeps the small count-prefixed root-dispatch probe for comparison and negative-evidence checks
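The constructor-stream record layout above can be sketched as a reader that starts from an already-detected header offset. It does not implement the search step. The 12-byte record size and the six little-endian u16 field names come from this spec; the 32-bit little-endian count prefix is inferred from the 0x38 header / 0x3c record-start pair reported for LSET1/L0.WDL and should be treated as an assumption, as should the function name.

```javascript
const RECORD_SIZE = 12;
const FIELD_NAMES = ['typeWord', 'xWord', 'yWord', 'zWord', 'selectorWord', 'laneWord'];

// sectionBase: absolute file offset of post_audio_section_00
// headerOffset: section-relative offset of the count prefix (0x38 for LSET1/L0.WDL)
function readConstructorStream(buf, sectionBase, headerOffset) {
  const countPos = sectionBase + headerOffset;
  const reportedCount = buf.readUInt32LE(countPos); // assumption: u32 LE count
  const recordStart = countPos + 4;
  const records = [];
  for (let i = 0; i < reportedCount; i += 1) {
    const pos = recordStart + i * RECORD_SIZE;
    // Follow the stream count and stop only at end of file, because the stream
    // can run slightly past the nominal section-0 slice.
    if (pos + RECORD_SIZE > buf.length) break;
    const record = { offset: pos };
    FIELD_NAMES.forEach((name, w) => {
      record[name] = buf.readUInt16LE(pos + w * 2);
    });
    records.push(record);
  }
  return {
    headerOffset,
    recordStart: recordStart - sectionBase, // section-relative, as in records.json
    reportedCount,
    records,
  };
}
```

The reader bounds itself by the file length rather than the section length, matching the constraint that the stream may extend past the post_audio_section_00 boundary.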

Plausibility filter:

  • typeWord in a conservative visible-family range
  • not all coordinate words are zero
  • laneWord is non-zero and within the current conservative control-word range

This is explicitly a probe schema, not a final loader-faithful schema.
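The three plausibility checks can be sketched as a predicate. The numeric bounds below are placeholders chosen only so the example runs; this spec does not publish the actual "conservative" ranges, so EXAMPLE_LIMITS and the function name are assumptions.

```javascript
// Placeholder bounds, NOT the exporter's real ranges.
const EXAMPLE_LIMITS = { typeMin: 0x0001, typeMax: 0x00ff, laneMax: 0x00ff };

function isPlausibleRecord(rec, limits = EXAMPLE_LIMITS) {
  // typeWord inside the (assumed) visible-family range
  const typeOk = rec.typeWord >= limits.typeMin && rec.typeWord <= limits.typeMax;
  // at least one coordinate word is non-zero
  const coordsOk = rec.xWord !== 0 || rec.yWord !== 0 || rec.zWord !== 0;
  // laneWord is non-zero and inside the (assumed) control-word range
  const laneOk = rec.laneWord !== 0 && rec.laneWord <= limits.laneMax;
  return typeOk && coordsOk && laneOk;
}
```

Keeping the limits in a parameter makes it easy to record which bounds produced a given record count when comparing probe runs.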

Current negative result:

  • Correcting the constructor stream start/count for LSET1/L0.WDL only changes the standalone constructor probe slightly (1130 -> 1135 records, 1090 -> 1095 rendered items) and does not materially change the repeated wrong-art output. Current evidence therefore points to unresolved art/runtime binding as the primary blocker, not a missed constructor-tail decode.

Art Binding Rule

v0 uses one explicit diagnostic binding rule:

  • typeWord -> bundle slot index

That means the sorted bundle list from post_audio_region_04 is indexed directly by typeWord when the slot exists.
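As a sketch of the diagnostic slot rule, assuming the bundle list is sorted by ascending raw offset (the spec says "sorted" without naming the key) and that each bundle object carries a hypothetical offset field:

```javascript
// Diagnostic slot rule: index the sorted bundle list directly by typeWord.
// Assumption: "sorted" means ascending raw bundle offset.
function bindBySlot(bundles, typeWord) {
  const sorted = [...bundles].sort((a, b) => a.offset - b.offset);
  return sorted[typeWord] ?? null; // null when the slot does not exist
}
```

A null result marks a type with no slot, which keeps unbound types visible in output metadata instead of silently reusing another bundle.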

This rule is explicitly not claimed as final executable truth. Current docs and Ghidra evidence show the final art path goes through the late DAT_800758d8 art bank plus downstream state-script/runtime selection. The slot rule remains useful only as a clean standalone negative-evidence probe.

For the generic family band now dominating LSET1/L0 failures (0x003e, 0x0042, 0x0044, 0x0045, 0x004f, 0x0059, 0x005b), repeated wrong art is now understood as both a binding failure and a semantic-layer failure: the exporter is currently visualizing constructor-fed runtime object seeds as though they were the final visible world.

The chosen bundle and clamped frame index, plus binding-diversity metrics, are preserved in output metadata so failures stay auditable.

There is now one opt-in experimental binding mode for current map-0 research:

  • runtime-map0-masked-proxy

That mode reads .cache/runtime-map0-correlation.json, takes the live headerWord11 field from the current map-0 type rows, masks it to 0x0fffff, and remaps a type only when that masked value lands within a small tolerance of a scanned raw bundle offset with matching kind/mode. All non-matching types still fall back to the raw slot rule.
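The masked-proxy match can be sketched as below. The 0x0fffff mask and the kind/mode requirement come from this spec; the tolerance is left as a parameter because its value is not given here, and the row/bundle field names other than headerWord11 are hypothetical. The sketch also assumes the masked value and the bundle offsets are expressed in the same address space.

```javascript
const OFFSET_MASK = 0x0fffff;

// row: one map-0 type row with headerWord11, kind, mode (kind/mode names assumed)
// bundles: scanned raw bundles with offset, kind, mode
// tolerance: maximum allowed byte distance (value not specified by the spec)
function remapByMaskedHeader(row, bundles, tolerance) {
  const masked = row.headerWord11 & OFFSET_MASK;
  let best = null;
  for (const bundle of bundles) {
    if (bundle.kind !== row.kind || bundle.mode !== row.mode) continue;
    const distance = Math.abs(bundle.offset - masked);
    if (distance <= tolerance && (best === null || distance < best.distance)) {
      best = { bundle, distance };
    }
  }
  // null tells the caller to fall back to the raw slot rule
  return best ? best.bundle : null;
}
```

Choosing the nearest candidate within tolerance is one possible tie-break; the spec only requires that the masked value land within a small tolerance of a matching bundle.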

This is still a probe rule, not claimed final executable truth. It exists to turn the new RAM-backed map-0 correlation into a small, auditable extraction improvement without pretending the full late DAT_800758d8 bank parse is solved.

When debug labels are enabled for a map render, labels now identify unique rendered resources rather than per-instance placements. The stable label key is currently bundle offset + clamped frame + resolved palette. Validation atlas sheets still use progressive cell indices.

Rendering Rule

For each record:

  • compute screenX and screenY from the documented projection basis
  • select frame index from selectorWord, clamped to available frames
  • place sprite top-left at:
    • screenX - originX
    • screenY - originY
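The per-record steps above can be sketched as one helper. It assumes the coordinate words are used directly as x, y, and z with no sign conversion or scaling, which this spec does not confirm, and the function name and return shape are hypothetical.

```javascript
// frames: non-empty array of decoded frames with originX / originY
function placeSprite(record, frames) {
  // Assumption: coordinate words feed the projection basis unchanged.
  const { xWord: x, yWord: y, zWord: z } = record;
  const screenX = y - x;
  const screenY = 2 * z - (x + y) / 2;
  // Clamp selectorWord into the available frame range.
  const frameIndex = Math.min(Math.max(record.selectorWord, 0), frames.length - 1);
  const frame = frames[frameIndex];
  return {
    screenX,
    screenY,
    frameIndex,
    left: screenX - frame.originX,
    top: screenY - frame.originY,
  };
}
```

For a record with x = 10, y = 30, z = 5, the basis gives screenX = 20 and screenY = -10 before the frame origin is subtracted.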

Current draw order is conservative:

  • main-visible before special-visible
  • then ascending screenY
  • then ascending screenX

This is a probe approximation. The later graph-based stage-1 ordering still belongs to a future pass.
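The conservative draw order can be expressed as a sort comparator. The layer field and its 'main' / 'special' values are hypothetical names for the main-visible and special-visible groups.

```javascript
// Assumption: each scene item carries layer ('main' | 'special'), screenY, screenX.
const LAYER_RANK = { main: 0, special: 1 };

function compareDrawOrder(a, b) {
  return (
    LAYER_RANK[a.layer] - LAYER_RANK[b.layer] || // main-visible first
    a.screenY - b.screenY ||                     // then ascending screenY
    a.screenX - b.screenX                        // then ascending screenX
  );
}
```

Usage is items.sort(compareDrawOrder); items drawn later in the sorted list overdraw earlier ones.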

The rendered PNG uses a neutral opaque background by default so probe silhouettes are legible without relying on transparency.

Color Rule

v0 emits grayscale art from raw pixel indices.

Reason:

  • bundle frame decode is already well constrained
  • full CLUT parity is not
  • grayscale preserves shape/variant evidence without pretending the palette problem is solved

Transparent index 0 stays transparent.
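A minimal sketch of the grayscale rule, assuming a linear mapping from pixel index to gray level. The spec only states that art is grayscale from raw indices and that index 0 is transparent, so the scaling, the maxIndex parameter, and the function name are assumptions.

```javascript
// indices: decoded pixel indices for one frame, row-major
// maxIndex: largest possible index (255 for 8-bit, 15 for 4-bit data)
function indicesToGrayRgba(indices, maxIndex = 255) {
  const rgba = new Uint8Array(indices.length * 4);
  for (let i = 0; i < indices.length; i += 1) {
    const index = indices[i];
    const gray = Math.round((index / maxIndex) * 255); // assumed linear ramp
    const alpha = index === 0 ? 0 : 255;               // index 0 stays transparent
    rgba.set([gray, gray, gray, alpha], i * 4);
  }
  return rgba;
}
```

The resulting RGBA buffer can be handed to whichever PNG encoder the project uses.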

CLI

Primary command:

node src/cli.js --source LSET1/L0.WDL

Supported options:

  • --source <relative-path>
  • --wdl <absolute-or-relative-file>
  • --disc-root <path>
  • --binding-mode <raw|runtime-map0-masked-proxy>
  • --map-source <auto|combined|layered|constructors|roots|region01|region00>
  • --out-name <stem>

Success Criteria

v0 is successful if it can:

  • parse a raw LSET*.WDL
  • recover the loader-sized section view alongside the region carve
  • scan bundles directly from post_audio_region_04
  • decode at least one frame from raw data
  • extract a stable constructor-placement record set from post_audio_section_00
  • write extracted sprite PNGs into .cache
  • write a readable diagnostic probe PNG into .output

Planned Follow-Ups

  • replace diagnostic slot binding with a direct parser for the late header-only DAT_800758d8 override stream and bundle match path
  • recover the exact raw on-disk encoding of the earlier built-resource art-install blob so the two late art feeds are modeled separately instead of flattened into one guessed bank
  • identify and parse the separate static-world or subordinate level substrate that complements the constructor-fed live-object lane, instead of treating section-0 constructor placements as the whole map
  • add palette/CLUT reconstruction
  • add stage-1 graph ordering recovery
  • compare the probe scene against fixed live samples such as map 104 without reintroducing viewer-side donor assumptions