psx map standalone exporter
parent
a9153546ae
commit
2f243976b6
16 changed files with 3254 additions and 5 deletions
248
psx-map-exporter/docs/implementation-analysis.md
Normal file
|
|
@@ -0,0 +1,248 @@
|
|||
# PSX Map Exporter Implementation Analysis
|
||||
|
||||
## Summary
|
||||
|
||||
The exporter should be treated as a controlled probe, not as a final renderer.
|
||||
|
||||
The key design choice is to keep the whole path raw-file-based and auditable:
|
||||
|
||||
- raw WDL in
|
||||
- explicit carve + record extraction + bundle extraction
|
||||
- cached sprite/frame artifacts out
|
||||
- final composed map PNG out
|
||||
|
||||
That keeps the work independent from the existing viewer and makes every wrong assumption inspectable.
|
||||
|
||||
## Why This Architecture
|
||||
|
||||
The existing PSX work already proved two important negative results:
|
||||
|
||||
- direct raw bundle-order art binding is too weak to count as solved
|
||||
- viewer-side polish is low value until extraction is isolated and testable
|
||||
|
||||
So the new exporter should optimize for:
|
||||
|
||||
- small number of assumptions
|
||||
- easy intermediate inspection
|
||||
- direct correspondence to documented executable behavior where possible
|
||||
|
||||
## Chosen `v0` Path
|
||||
|
||||
### 1. Parse only the parts of the WDL we can justify now
|
||||
|
||||
Implemented directly from docs:
|
||||
|
||||
- `0x38`-byte top-level header
|
||||
- audio-size dword
|
||||
- absolute region boundaries recovered from high offset words in the header
|
||||
|
||||
Not implemented in `v0`:
|
||||
|
||||
- full loader section choreography
|
||||
- detached runtime stream install
|
||||
- inflated runtime-state interpretation
|
||||
|
||||
Those are preserved as future extension points but are not required for the first PNG.
|
||||
|
||||
### 2. Prefer loader-sized `post_audio_section_00` as a layered authored probe
|
||||
|
||||
Why:
|
||||
|
||||
- the old region00-first path is now known to overfit the small root-dispatch family
|
||||
- loader-sized section parsing recovers the dense constructor-placement records from the same first real section, currently modeled as paired 12-byte records inside 24-byte row chunks
|
||||
- the same section also exposes the smaller root-dispatch lane, which is independently renderable offline and now belongs in the default layered probe
|
||||
|
||||
Tradeoff:
|
||||
|
||||
- the art binding is still diagnostic-only for many types
|
||||
- constructor placements are better understood as one runtime object seed layer, not the final visible map or the static world substrate
|
||||
- root-dispatch rows now render as a second authored layer, but they still do not close the runtime-only control, state, and dynamic effect gaps
|
||||
|
||||
This is acceptable for `v0` because the project goal is a fresh, inspectable layered baseline rather than a falsely confident full renderer.
|
||||
|
||||
### 3. Decode art from raw bundles, but keep binding diagnostic
|
||||
|
||||
What is strong already:
|
||||
|
||||
- bundle scan can be constrained by executable-backed header fields
|
||||
- frame decode and row-RLE semantics are pinned
|
||||
|
||||
What is still weak:
|
||||
|
||||
- exact late-`DAT_800758d8` parse and type-to-resource selection path
|
||||
- exact palette path
|
||||
|
||||
So the current standalone probe does the right split:
|
||||
|
||||
- strong part: raw bundle/frame decode
|
||||
- diagnostic part: `typeWord -> bundle slot`
|
||||
|
||||
It also exports candidate late active-header override blobs to cache so the Ghidra-backed `DAT_800758d8` header-only lane can be inspected per run without pretending that binding is already solved.
|
||||
|
||||
The newer conclusion from `LSET1/L0` label failures is narrower than the earlier wording: if one type repeatedly paints a coherent room footprint with obviously wrong art, the exporter is probably visualizing valid world-object seed placement while still missing the separate static-world layer and the downstream executable bind/state path that chooses the final drawable resource.
|
||||
|
||||
Viewer-derived sidecars and donor mappings are no longer acceptable here because they blur exactly the binding problem the exporter is meant to isolate.
|
||||
|
||||
## Module Plan
|
||||
|
||||
### `src/wdl.js`
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- read header words
|
||||
- compute post-audio start
|
||||
- derive regions from absolute boundary values
|
||||
- expose region buffers and summary metadata
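
As a rough illustration of the first three responsibilities, the sketch below reads the audio-size dword and turns a list of absolute boundaries into region spans. It is not the actual `src/wdl.js`. The function names, the `headerSize` parameter, the little-endian dword read, and the rule that the post-audio area starts at header size plus audio size are all assumptions made for this example.

```javascript
// Hypothetical helpers, not the real src/wdl.js API.

// Read the second dword of the header (assumed little-endian) as the audio size.
function readAudioSize(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint32(4, true);
}

// Assumption: the post-audio area begins right after header + audio blob.
function computePostAudioStart(bytes, headerSize) {
  return headerSize + readAudioSize(bytes);
}

// Turn N absolute boundaries into N + 1 contiguous spans that cover
// [postAudioStart, fileLength). Boundaries must already be ascending.
function carveRegions(postAudioStart, boundaries, fileLength) {
  const edges = [postAudioStart, ...boundaries, fileLength];
  const regions = [];
  for (let i = 0; i + 1 < edges.length; i++) {
    if (edges[i + 1] < edges[i]) {
      throw new RangeError(`boundary ${i} is not ascending`);
    }
    regions.push({
      name: `post_audio_region_${String(i).padStart(2, '0')}`,
      start: edges[i],
      end: edges[i + 1],
    });
  }
  return regions;
}
```

Passing the header size in as a parameter keeps the carve logic independent of header internals, which matches the isolation goal stated for this module.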
|
||||
|
||||
Reason to isolate it:
|
||||
|
||||
- the carve is likely to change as more loader details land
|
||||
- record extraction should not depend on header internals
|
||||
|
||||
### `src/bundles.js`
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- scan the graphics bank for plausible kind-4/kind-5 bundles
|
||||
- parse bundle headers and frame entries
|
||||
- decode frame bytes
|
||||
- emit grayscale PNG-ready RGBA buffers
|
||||
|
||||
When the standalone scan yields zero bundles for a map, `src/export-map.js` may hydrate bundle offsets and frame geometry from `out/psx_wdl_disc/.../summary.json` and continue decoding the actual frame bytes from the raw WDL.
|
||||
|
||||
Reason to isolate it:
|
||||
|
||||
- this code is reusable even if the map schema changes
|
||||
- it is the strongest raw-file-backed part of the exporter
|
||||
|
||||
### `src/export-map.js`
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- choose the record source
|
||||
- choose diagnostic art binding
|
||||
- normalize screen bounds
|
||||
- write cache metadata and composed outputs
|
||||
|
||||
This file holds the intentionally weak parts of `v0` so they remain easy to replace.
|
||||
|
||||
### `src/render.js`
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- sprite compositing
|
||||
- sort order approximation
|
||||
- PNG encoding
|
||||
- neutral opaque background for evaluation-friendly probe output
|
||||
|
||||
## Data Contracts
|
||||
|
||||
### Record
|
||||
|
||||
```json
{
  "index": 0,
  "source": "region00",
  "typeWord": 74,
  "xWord": 5635,
  "yWord": 3815,
  "zWord": 0,
  "selectorWord": 1,
  "laneWord": 32,
  "screenX": -1820,
  "screenY": -4725
}
```
|
||||
|
||||
### Bundle
|
||||
|
||||
```json
{
  "offsetInRegion": 58808,
  "absoluteOffset": 534068,
  "kind": 5,
  "mode": 2,
  "paletteIndex": 12,
  "frameCount": 3,
  "dataOffset": 112,
  "frameTableOffset": 52
}
```
|
||||
|
||||
### Scene Item
|
||||
|
||||
```json
{
  "recordIndex": 0,
  "bundleSlot": 74,
  "bundleAbsoluteOffset": 954728,
  "frameIndex": 1,
  "screenX": -1820,
  "screenY": -4725,
  "drawX": -1879,
  "drawY": -4815,
  "width": 96,
  "height": 91,
  "originX": 59,
  "originY": 90
}
```
|
||||
|
||||
## Validation Strategy
|
||||
|
||||
`v0` validation should answer four questions only:
|
||||
|
||||
1. Did the raw WDL parse into the documented regions?
|
||||
2. Did the graphics-bank scanner recover plausible bundles with decoded frames?
|
||||
3. Did the constructor-placement extractor recover plausible section-0 rows from the loader-sized section view?
|
||||
4. Did the compositor produce a non-empty PNG with recognizable art silhouettes on a neutral background?
|
||||
|
||||
This is enough for the first pass.
|
||||
|
||||
## Risks
|
||||
|
||||
### Binding risk
|
||||
|
||||
The diagnostic bundle binding is the weakest part of the pipeline.
|
||||
|
||||
Expected failure modes:
|
||||
|
||||
- correct placement with wrong art family
|
||||
- repeated art across several type families
|
||||
- frame clamping where selector words exceed available bundle frames
|
||||
|
||||
Mitigation:
|
||||
|
||||
- keep the chosen bundle slot, frame clamp count, and bundle-repeat metrics in output metadata
|
||||
|
||||
### Schema risk
|
||||
|
||||
The `region00` record extractor uses a plausibility scan instead of a final loader schema.
|
||||
|
||||
Expected failure modes:
|
||||
|
||||
- false positives in some maps
|
||||
- missing records when the preamble differs
|
||||
|
||||
Mitigation:
|
||||
|
||||
- preserve `recordStartOffset`
|
||||
- make `region01` fallback selectable from CLI
|
||||
|
||||
### Palette risk
|
||||
|
||||
Grayscale is intentionally not faithful to the executable color path.
|
||||
|
||||
Mitigation:
|
||||
|
||||
- keep the grayscale rule explicit
|
||||
- do not mix partial CLUT heuristics into `v0`
|
||||
|
||||
## Immediate Follow-Up Options
|
||||
|
||||
After `v0` works, the next pass should choose one of these:
|
||||
|
||||
1. Replace provisional art binding with a loader-backed type/resource lookup.
|
||||
2. Parse the late `DAT_800758d8` bank directly from the large late graphics area instead of relying on slot order.
|
||||
3. Add executable-backed CLUT reconstruction once the palette path is pinned tightly enough.
|
||||
4. Recover stage-1 graph ordering when sprite placement is stable enough to make sort differences meaningful.
|
||||
256
psx-map-exporter/docs/spec.md
Normal file
|
|
@@ -0,0 +1,256 @@
|
|||
# PSX Map Exporter Spec
|
||||
|
||||
## Goal
|
||||
|
||||
`psx-map-exporter` is a standalone Node.js probe for Crusader PSX map extraction.
|
||||
|
||||
It exists to prove a fresh end-to-end path from raw `LSET*.WDL` input to:
|
||||
|
||||
- extracted intermediate sprite assets under `.cache`
|
||||
- a rendered map PNG under `.output`
|
||||
|
||||
This project does not reuse `Crusader-Map-Viewer` code, scene caches, donor mappings, or sidecar summaries as binding inputs. It only consumes raw PSX assets plus the documented executable-backed findings from `docs/psx` and the live Ghidra session.
|
||||
|
||||
## Scope
|
||||
|
||||
Version `v0` is intentionally narrow.
|
||||
|
||||
It will:
|
||||
|
||||
- read one PSX `LSET*.WDL` file
|
||||
- parse the documented `0x38`-byte top-level header
|
||||
- carve the post-audio map/art regions from header-derived boundaries
|
||||
- parse the loader-sized post-audio sections as a second, higher-value view of the file layout
|
||||
- extract the dense constructor-placement family from `post_audio_section_00`
|
||||
- keep the smaller root-dispatch family available as a comparison probe
|
||||
- render a layered authored probe that can combine constructor placements with the smaller root-dispatch lane
|
||||
- scan `post_audio_region_04` for type-4/type-5 sprite bundles
|
||||
- decode bundle frames directly from the raw WDL
|
||||
- write extracted frame PNGs to `.cache`
|
||||
- compose a probe map PNG to `.output`
|
||||
|
||||
It will not claim full runtime parity yet.
|
||||
|
||||
Known non-goals for `v0`:
|
||||
|
||||
- exact `DAT_800758d8/d0/cc/d4` parity
|
||||
- exact CLUT reproduction
|
||||
- full stage-1 dependency-graph ordering
|
||||
- exact type-to-resource binding for unresolved families
|
||||
- full `post_audio_region_01` / `post_audio_region_02` semantic decode
|
||||
|
||||
## Evidence Constraints
|
||||
|
||||
The implementation is grounded in these current facts from the docs and Ghidra:
|
||||
|
||||
- `LSET*.WDL` uses a fixed `0x38`-byte top-level header.
|
||||
- The second dword is the audio/SPU blob size.
|
||||
- The old region-only carve is not sufficient on its own for visible-object recovery; loader-sized `post_audio_section_00` contains both the small root-dispatch rows and the dense constructor-placement rows.
|
||||
- The file contains a post-audio area with four high-confidence absolute boundaries that split it into five regions:
|
||||
- `post_audio_region_00`
|
||||
- `post_audio_region_01`
|
||||
- `post_audio_region_02`
|
||||
- `post_audio_region_03`
|
||||
- `post_audio_region_04`
|
||||
- The small count-prefixed section-0 root-dispatch rows are real, but they are not the whole map object set.
|
||||
- The dense constructor-placement records recovered from loader-sized `post_audio_section_00` are currently the best standalone live-object seed source, not a proven final visible-map layer.
|
||||
- Current strongest standalone layout read: the constructor-placement lane is a count-prefixed `12`-byte substream inside the loader-sized section-0 span rather than a whole-section `24`-byte row grid. For `LSET1/L0.WDL`, the best current candidate has a section-relative header at `0x38`, a record start at `0x3c`, and a reported count of `1182` records.
|
||||
- The constructor-placement stream can extend slightly past the nominal `post_audio_section_00` slice, so standalone parsing must follow the detected stream count from the section-0 base instead of truncating strictly at the section object boundary.
|
||||
- `post_audio_region_04` is the strongest current graphics bank candidate.
|
||||
- The direct `typeWord -> bundle slot` scan-order binding is disproven as a final art rule and is retained only as a diagnostic bundle-family probe.
|
||||
- The real art/template lane is `DAT_800758d8`, but the executable now shows two distinct late art feeds per WDL pass rather than one monolithic bank:
|
||||
- an earlier art-install blob that builds resources and temporarily mirrors them into `DAT_800758d8`
|
||||
- a later `8`-byte header-only override blob that restores raw active-header pointers into `DAT_800758d8`
|
||||
- The later header-only override is the safer standalone parser target: constructors branch on first dword `0x58` and then reuse `DAT_800758c8[type]`, so the final post-load `DAT_800758d8` state is a raw-header lane, not a permanently built-resource lane.
|
||||
- Type-4/type-5 drawable bundles expose width, height, palette mode/index, frame count, frame table offset, and data offset in the raw bundle header.
|
||||
- Bundle frame entries use a `20`-byte row with size, relative data offset, width, height, origin x/y, and flags.
|
||||
- `sprite_rle_decode_rows` uses row-local control bytes:
|
||||
- positive: repeat next byte N times
|
||||
- negative: copy next `abs(N)` literal bytes
|
||||
- zero: end row
|
||||
- The executable projection basis is:
|
||||
|
||||
$$
screen_x = y - x
$$

$$
screen_y = 2z - \frac{x + y}{2}
$$
|
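
The projection basis listed above can be sketched as a small function. The name `projectToScreen` is hypothetical, and the use of `Math.floor` for odd `x + y` sums is an assumption, since the rounding behavior of the executable is not stated here. The sample values `x = 5635`, `y = 3815`, `z = 0` have an even sum, so the rounding choice does not affect them.

```javascript
// Hypothetical helper: world words -> screen coordinates.
// Assumption: halving rounds toward negative infinity for odd sums.
function projectToScreen(x, y, z) {
  return {
    screenX: y - x,
    screenY: 2 * z - Math.floor((x + y) / 2),
  };
}

const sample = projectToScreen(5635, 3815, 0);
console.log(sample.screenX, sample.screenY); // -1820 -4725
```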
||||
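
The three row-local control-byte rules listed for `sprite_rle_decode_rows` can be illustrated with a minimal single-row decoder. This is a sketch, not the executable's routine: the function name, the return shape, and the treatment of the control byte as a signed 8-bit value are assumptions.

```javascript
// Decode one RLE row starting at `offset`.
// Assumption: control bytes are signed 8-bit values.
function decodeRleRow(src, offset, width) {
  const row = new Uint8Array(width); // unwritten pixels stay at index 0
  let x = 0;
  let pos = offset;
  while (pos < src.length) {
    const control = (src[pos++] << 24) >> 24; // sign-extend to int8
    if (control === 0) break; // zero: end of row
    if (control > 0) {
      // positive: repeat the next byte `control` times
      const value = src[pos++];
      for (let i = 0; i < control && x < width; i++) row[x++] = value;
    } else {
      // negative: copy the next abs(control) literal bytes
      const count = -control;
      for (let i = 0; i < count && pos < src.length; i++) {
        const value = src[pos++];
        if (x < width) row[x++] = value;
      }
    }
  }
  return { row, nextOffset: pos };
}
```

Returning `nextOffset` lets a caller decode consecutive rows without needing a per-row offset table, which is one plausible way to drive a row-local scheme.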
|
||||
## Input Model
|
||||
|
||||
The exporter accepts either:
|
||||
|
||||
- a direct `--wdl` path
|
||||
- or a `--source` path relative to a PSX disc root
|
||||
|
||||
Default disc root for local workspace runs:
|
||||
|
||||
- `d:/Ghidra/Crusader-Map-Viewer/map_renderer/STATIC_PSX`
|
||||
|
||||
Expected source examples:
|
||||
|
||||
- `LSET1/L0.WDL`
|
||||
- `LSET4/L37.WDL`
|
||||
|
||||
## Output Layout
|
||||
|
||||
### `.cache`
|
||||
|
||||
Per-run cache path:
|
||||
|
||||
- `.cache/<map-stem>/`
|
||||
|
||||
Contents:
|
||||
|
||||
- `wdl-summary.json`
|
||||
- `records.json`
|
||||
- `bundles.json`
|
||||
- `frame-manifest.json`
|
||||
- `active-header-overrides.json`
|
||||
- `sprites/<bundle-offset>/frame_<n>.png`
|
||||
|
||||
The cache is disposable. It exists to preserve intermediate evidence and make re-runs inspectable.
|
||||
|
||||
`records.json` now also records constructor-stream detection metadata when available: stream header offset, record start offset, reported count, and the initial structured-prefix run.
|
||||
|
||||
The cache also records candidate late `DAT_800758d8` header-only override blobs as a standalone diagnostic. Those candidates are not used as final art binding yet.
|
||||
|
||||
`wdl-summary.json` now also emits `sceneInterpretation`, which is an explicit warning-bearing classification of what the current export most likely represents. For constructor-placement exports this should currently read as a constructor-fed live-object seed lane rather than a final visible-world reconstruction.
|
||||
|
||||
### `.output`
|
||||
|
||||
Per-run final outputs:
|
||||
|
||||
- `.output/<map-stem>.png`
|
||||
- `.output/<map-stem>.json`
|
||||
- `.output/<map-stem>_<layer>.png` for each rendered authored layer when layered mode is active
|
||||
|
||||
The JSON stores the final probe scene manifest used to draw the PNG.
|
||||
|
||||
The `.output` folder is reset at the start of each export so evaluation only sees artifacts from the current run.
|
||||
|
||||
The `.output/<map-stem>.json` manifest inherits `sceneInterpretation` from `wdl-summary.json` so consumers do not need to infer that warning from prose docs alone.
|
||||
|
||||
## Record Extraction Rules
|
||||
|
||||
`v0` now uses the loader-sized `post_audio_section_00` extraction paths as the primary scene source.
|
||||
|
||||
Current interpretation constraint:
|
||||
|
||||
- `section0_constructor_placements` should currently be treated as constructor-fed world-object seed records.
|
||||
- They preserve meaningful layout and projection structure, but current evidence does not support treating them as the complete visible map or static architecture layer.
|
||||
- If a render shows coherent room layout with globally wrong or repeated art, the exporter is currently visualizing one runtime object lane without the downstream per-type bind/state path and without the separate static-world substrate.
|
||||
|
||||
Record extraction rule:
|
||||
|
||||
- `auto` / `combined` / `layered` mode merges both authored section-0 families into one layered probe:
|
||||
- constructor placements provide the dense live-object seed lane
|
||||
- root-dispatch rows provide the smaller comparison and auxiliary authored lane
|
||||
- `constructors` / `region01` mode first searches the section-0 span for a count-prefixed `12`-byte constructor stream and, when found, treats each record as six little-endian `u16` words:
|
||||
- `typeWord`
|
||||
- `xWord`
|
||||
- `yWord`
|
||||
- `zWord`
|
||||
- `selectorWord`
|
||||
- `laneWord`
|
||||
- If a count-prefixed constructor stream is not found, the exporter falls back to the older whole-section `24`-byte paired-record scan as a compatibility probe.
|
||||
- `roots` / `region00` mode keeps the small count-prefixed root-dispatch probe for comparison and negative-evidence checks
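
A minimal sketch of the `12`-byte record read described above follows. The function name is hypothetical, and the caller is assumed to have already located the record start and count; how the count prefix itself is encoded is not shown here.

```javascript
// Each constructor record is six little-endian u16 words (12 bytes).
const RECORD_SIZE = 12;
const RECORD_FIELDS = [
  'typeWord', 'xWord', 'yWord', 'zWord', 'selectorWord', 'laneWord',
];

// Hypothetical reader: `recordStart` and `count` come from stream detection.
function readConstructorRecords(bytes, recordStart, count) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const records = [];
  for (let index = 0; index < count; index++) {
    const base = recordStart + index * RECORD_SIZE;
    if (base + RECORD_SIZE > bytes.byteLength) break; // stop at end of data
    const record = { index };
    RECORD_FIELDS.forEach((name, i) => {
      record[name] = view.getUint16(base + i * 2, true);
    });
    records.push(record);
  }
  return records;
}
```

The loop stops at the end of the supplied buffer rather than at a section boundary, so a caller can pass a slice that extends past the nominal section when the stream count requires it.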
|
||||
|
||||
Plausibility filter:
|
||||
|
||||
- `typeWord` in a conservative visible-family range
|
||||
- not all coordinate words are zero
|
||||
- `laneWord` is non-zero and within the current conservative control-word range
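
The filter can be sketched as a single predicate. The concrete bounds are not given in this spec, so the `limits` object and its field names (`minType`, `maxType`, `maxLane`) are placeholders supplied by the caller, not documented values.

```javascript
// Hypothetical plausibility check; all bounds are caller-supplied assumptions.
function isPlausibleRecord(record, limits) {
  const typeOk =
    record.typeWord >= limits.minType && record.typeWord <= limits.maxType;
  // Reject records whose coordinate words are all zero.
  const hasCoords =
    record.xWord !== 0 || record.yWord !== 0 || record.zWord !== 0;
  const laneOk = record.laneWord !== 0 && record.laneWord <= limits.maxLane;
  return typeOk && hasCoords && laneOk;
}
```

Keeping the bounds outside the predicate makes it easy to log which limits were in force for a given run alongside the extracted records.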
|
||||
|
||||
This is explicitly a probe schema, not a final loader-faithful schema.
|
||||
|
||||
Current negative result:
|
||||
|
||||
- Correcting the constructor stream start/count for `LSET1/L0.WDL` only changes the standalone constructor probe slightly (`1130 -> 1135` records, `1090 -> 1095` rendered items) and does not materially change the repeated wrong-art output. Current evidence therefore points to unresolved art/runtime binding as the primary blocker, not a missed constructor-tail decode.
|
||||
|
||||
## Art Binding Rule
|
||||
|
||||
`v0` uses one explicit diagnostic binding rule:
|
||||
|
||||
- `typeWord -> bundle slot index`
|
||||
|
||||
That means the sorted bundle list from `post_audio_region_04` is indexed directly by `typeWord` when the slot exists.
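
In code, the diagnostic rule amounts to a plain array lookup. The sketch below assumes the bundle list is sorted by `absoluteOffset`; the sort key and the function name are assumptions for illustration.

```javascript
// Diagnostic-only binding: index the sorted bundle list by typeWord.
// Assumption: "sorted" means ascending absoluteOffset.
function bindBySlot(bundles, typeWord) {
  const sorted = [...bundles].sort(
    (a, b) => a.absoluteOffset - b.absoluteOffset,
  );
  const bundle = sorted[typeWord];
  // Return null when the slot does not exist so the caller can skip the record.
  return bundle ? { bundleSlot: typeWord, bundle } : null;
}
```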
|
||||
|
||||
This rule is explicitly not claimed as final executable truth. Current docs and Ghidra evidence show the final art path goes through the late `DAT_800758d8` art bank plus downstream state-script/runtime selection. The slot rule remains useful only as a clean standalone negative-evidence probe.
|
||||
|
||||
For the generic family band now dominating `LSET1/L0` failures (`0x003e`, `0x0042`, `0x0044`, `0x0045`, `0x004f`, `0x0059`, `0x005b`), repeated wrong art is now understood as both a binding failure and a semantic-layer failure: the exporter is currently visualizing constructor-fed runtime object seeds as though they were the final visible world.
|
||||
|
||||
The chosen bundle and clamped frame index, plus binding-diversity metrics, are preserved in output metadata so failures stay auditable.
|
||||
|
||||
When debug labels are enabled for a map render, labels now identify unique rendered resources rather than per-instance placements. The stable label key is currently `bundle offset + clamped frame + resolved palette`. Validation atlas sheets still use progressive cell indices.
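
One way to build such a key is shown below. The field names (`bundleAbsoluteOffset`, `frameIndex`, `paletteIndex`), the separator, and the sequential numbering are assumptions; only the three key components come from the rule above.

```javascript
// Hypothetical label key: bundle offset + clamped frame + resolved palette.
function labelKey(item) {
  return `${item.bundleAbsoluteOffset}:${item.frameIndex}:${item.paletteIndex}`;
}

// Assign one sequential label per unique rendered resource.
function assignLabels(items) {
  const labels = new Map();
  for (const item of items) {
    const key = labelKey(item);
    if (!labels.has(key)) labels.set(key, labels.size + 1);
  }
  return labels;
}
```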
|
||||
|
||||
## Rendering Rule
|
||||
|
||||
For each record:
|
||||
|
||||
- compute `screenX` and `screenY` from the documented projection basis
|
||||
- select frame index from `selectorWord`, clamped to available frames
|
||||
- place sprite top-left at:
|
||||
- `screenX - originX`
|
||||
- `screenY - originY`
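
The frame clamp and placement steps above can be sketched as follows. The function name and the `clamped` flag are hypothetical; the arithmetic follows the rule as written.

```javascript
// Hypothetical placement helper for one record and its bound bundle frames.
function placeSprite(record, frames) {
  if (frames.length === 0) return null; // nothing to draw
  // Clamp selectorWord into the available frame range.
  const frameIndex = Math.min(
    Math.max(record.selectorWord, 0),
    frames.length - 1,
  );
  const frame = frames[frameIndex];
  return {
    frameIndex,
    clamped: frameIndex !== record.selectorWord,
    drawX: record.screenX - frame.originX,
    drawY: record.screenY - frame.originY,
  };
}
```

With `screenX = -1820`, `screenY = -4725`, and a frame origin of `(59, 90)`, this yields a top-left corner of `(-1879, -4815)`. Reporting `clamped` per item makes it straightforward to count frame clamps in output metadata.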
|
||||
|
||||
Current draw order is conservative:
|
||||
|
||||
- main-visible before special-visible
|
||||
- then ascending `screenY`
|
||||
- then ascending `screenX`
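
This ordering maps directly onto a sort comparator. In the sketch below, the boolean `special` field used to separate main-visible from special-visible items is a hypothetical name; the spec does not say how that class is stored.

```javascript
// Comparator for the conservative probe draw order.
// Assumption: `special` is true for special-visible items.
function compareDrawOrder(a, b) {
  const layerA = a.special ? 1 : 0;
  const layerB = b.special ? 1 : 0;
  return (
    layerA - layerB ||          // main-visible first
    a.screenY - b.screenY ||    // then ascending screenY
    a.screenX - b.screenX       // then ascending screenX
  );
}

// Usage: sceneItems.sort(compareDrawOrder) before compositing.
```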
|
||||
|
||||
This is a probe approximation. The later graph-based stage-1 ordering still belongs to a future pass.
|
||||
|
||||
The rendered PNG uses a neutral opaque background by default so probe silhouettes are legible without relying on transparency.
|
||||
|
||||
## Color Rule
|
||||
|
||||
`v0` emits grayscale art from raw pixel indices.
|
||||
|
||||
Reason:
|
||||
|
||||
- bundle frame decode is already well constrained
|
||||
- full CLUT parity is not
|
||||
- grayscale preserves shape/variant evidence without pretending the palette problem is solved
|
||||
|
||||
Transparent index `0` stays transparent.
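
A minimal sketch of the grayscale rule follows. It assumes 8-bit pixel indices that are used directly as gray levels; any scaling for narrower index modes is not covered, and the function name is hypothetical.

```javascript
// Convert raw pixel indices to RGBA: gray level = index, index 0 = transparent.
// Assumption: indices are 8-bit and need no scaling.
function indicesToGrayscaleRgba(indices) {
  const rgba = new Uint8ClampedArray(indices.length * 4);
  for (let i = 0; i < indices.length; i++) {
    const value = indices[i];
    const base = i * 4;
    rgba[base] = value;
    rgba[base + 1] = value;
    rgba[base + 2] = value;
    rgba[base + 3] = value === 0 ? 0 : 255;
  }
  return rgba;
}
```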
|
||||
|
||||
## CLI
|
||||
|
||||
Primary command:
|
||||
|
||||
```powershell
node src/cli.js --source LSET1/L0.WDL
```
|
||||
|
||||
Supported options:
|
||||
|
||||
- `--source <relative-path>`
|
||||
- `--wdl <absolute-or-relative-file>`
|
||||
- `--disc-root <path>`
|
||||
- `--map-source <auto|combined|layered|constructors|roots|region01|region00>`
|
||||
- `--out-name <stem>`
|
||||
|
||||
## Success Criteria
|
||||
|
||||
`v0` is successful if it can:
|
||||
|
||||
- parse a raw `LSET*.WDL`
|
||||
- recover the loader-sized section view alongside the region carve
|
||||
- scan bundles directly from `post_audio_region_04`
|
||||
- decode at least one frame from raw data
|
||||
- extract a stable constructor-placement record set from `post_audio_section_00`
|
||||
- write extracted sprite PNGs into `.cache`
|
||||
- write a readable diagnostic probe PNG into `.output`
|
||||
|
||||
## Planned Follow-Ups
|
||||
|
||||
- replace diagnostic slot binding with a direct parser for the late header-only `DAT_800758d8` override stream and bundle match path
|
||||
- recover the exact raw on-disk encoding of the earlier built-resource art-install blob so the two late art feeds are modeled separately instead of flattened into one guessed bank
|
||||
- identify and parse the separate static-world or subordinate level substrate that complements the constructor-fed live-object lane, instead of treating section-0 constructor placements as the whole map
|
||||
- add palette/CLUT reconstruction
|
||||
- add stage-1 graph ordering recovery
|
||||
- compare the probe scene against fixed live samples such as `map 104` without reintroducing viewer-side donor assumptions
|
||||