Crusader_Decomp/docs/scummvm-crusader-reference.md
MaddoScientisto de42fd1ea1 Add Crusader-specific USECODE data and documentation
- Introduced new file `vm_mask_ladder.tsv` containing detailed mappings for Crusader USECODE VM masks and their associated descriptors.
- Added comprehensive documentation in `scummvm-crusader-reference.md` outlining the structure, findings, and implications for reverse-engineering the Crusader engine within ScummVM.
- Created `usecode-roundtrip-ir.md` to document the plan for converting Crusader USECODE bytes into a human-readable format, detailing the container layout, event names, and intrinsic tables.
- Implemented a PowerShell script `temp_usecode_sample.ps1` for extracting and analyzing USECODE data from the Crusader FLX files, providing insights into class and event structures.
2026-03-22 17:26:39 +01:00

# ScummVM Crusader Reference
## Purpose
This note catalogs the Crusader-specific code inside ScummVM's Ultima 8 engine so it can be used as a planning aid for Crusader reverse-engineering work.
Primary source tree: `K:\misc\scummvm\engines\ultima\ultima8`
Important limitation: ScummVM is a high-level reimplementation, not a symbol map for the original DOS binaries. It is most useful for:
- identifying original data files and container formats
- naming likely subsystem boundaries
- understanding USECODE VM and event structure
- spotting Remorse versus Regret divergences
- finding concrete file-format footholds for parsers and validators
It is not sufficient on its own for direct raw-function renaming.
## Highest-Value Findings
1. ScummVM keeps a Crusader-specific USECODE description layer with named event ids and large intrinsic signature tables.
Files: `usecode/uc_machine.cpp`, `usecode/usecode_flex.cpp`, `convert/crusader/convert_usecode_crusader.h`, `convert/crusader/convert_usecode_regret.h`, `usecode/remorse_intrinsics.h`, `usecode/regret_intrinsics.h`.
2. ScummVM has explicit parsers for the core Crusader container families used by gameplay assets: FLEX archives, raw archives, USECODE containers, shapes, sound archives, speech archives, save files, and movie subtitle files.
Files: `filesys/flex_file.cpp`, `filesys/archive.cpp`, `filesys/raw_archive.cpp`, `usecode/usecode_flex.cpp`, `audio/sound_flex.cpp`, `audio/speech_flex.cpp`, `filesys/savegame.cpp`, `gumps/movie_gump.cpp`.
3. Crusader-specific gameplay metadata is loaded centrally from a predictable file set.
File: `games/game_data.cpp`.
This is the best ScummVM-side inventory of original asset families to compare against current RE notes.
4. World and item loading diverge for Crusader in a few concrete ways that likely reflect real original-engine differences.
Files: `world/map.cpp`, `world/current_map.cpp`, `world/item_factory.cpp`, `gfx/shape_info.cpp`, `world/weapon_info.h`, `world/world.cpp`, `world/egg.cpp`.
5. Crusader UI, media, and player-control code is separated into clear game-specific files.
Files: `gumps/cru_*.cpp`, `world/actors/cru_avatar_mover_process.cpp`, `audio/cru_music_process.cpp`, `games/start_crusader_process.cpp`, `games/cru_game.cpp`.
## Detection, Boot, and Game Split
### `metaengine.cpp`
- ScummVM treats Ultima 8 and Crusader as one engine family but gives Crusader its own control map.
- The Crusader keymap is a useful external reference for action vocabulary: weapon cycling, inventory cycling, medikit, energy cube, bomb detonation, search/select item, use selection, grab item, attack, center camera on player, jump/roll/crouch, sidesteps, rolls, and crouch toggle.
- `querySaveMetaInfos()` uses `SavegameReader`, which is the entry point for ScummVM-side Crusader save metadata.
### `ultima8.cpp`
- Engine startup registers Crusader-specific process loaders such as `CruAvatarMoverProcess`, `CruPathfinderProcess`, and `CruMusicProcess`.
- `initializePath()` explicitly adds a `data` subdirectory for at least one Regret variant.
### `games/cru_game.cpp`
- `loadFiles()` loads Crusader palettes from `static/gamepal.pal`, `cred.pal`, `diff.pal`, `misc.pal`, `misc2.pal`, and optionally `star.pal`.
- `loadFiles()` then calls `GameData::loadRemorseData()`, which is the central Crusader asset-loader in ScummVM.
- `startGame()` creates the main actor with shape `1`, reserves object ids `384..511`, initializes HP and energy-like stats from `NPCDat`, and switches to map `0`.
- `playIntroMovie()` uses `T01` and `T02` for Remorse, `origin` and `ANIM01` for Regret, and warns that `FLICS` and `SOUND` directories must be copied from the CD.
### `games/start_crusader_process.cpp`
- Startup sequence is explicit: intro movie 1, intro movie 2, difficulty menu, then live game setup.
- ScummVM creates the Crusader HUD gumps (`CruStatusGump`, `CruPickupAreaGump`) before normal play begins.
- It seeds inventory with shapes `0x4d4` (`datalink`) and `0x598` (`smiley`), sets the shield type, teleports the actor to egg `0x1e` on map `1`, and applies a Regret-specific combat-ready start state.
- This file is a good checklist for early-game object ids, item shapes, and startup-only side effects.
## Core Asset Loading
### `games/game_data.cpp`
`GameData::loadRemorseData()` is the single best source-file summary of original Crusader asset families known to ScummVM.
Loaded files and why they matter:
- `static/fixed.dat`: fixed-object archive for world/map loading.
- `usecode/<lang>usecode.flx`: main USECODE container.
- `static/shapes.flx`: main shape archive, loaded with Crusader-specific shape format.
- `remorseweapons.ini` or `regretweapons.ini`: ScummVM-maintained weapon metadata overlays.
- `remorsegame.ini`: ScummVM-maintained game config overlay.
- `static/typeflag.dat`: per-shape type flags.
- `static/anim.dat`: animation metadata.
- `static/wpnovlay.dat`: weapon overlay metadata.
- `static/glob.flx`: glob data loaded into `MapGlob` objects.
- `static/fonts.flx`: font archive.
- `static/mouse.shp`: cursor shapes.
- `static/gumps.flx`: UI art.
- `static/dtable.flx`: NPC data table (`NPCDat`).
- `static/damage.flx`: damage data consumed by main shape logic.
- `sound/sound.flx`: sound archive.
- `sound/<lang><shape>.flx`: speech per shape, loaded lazily by `getSpeechFlex()`.
Implication for RE:
- This gives a concrete file-driven decomposition of the engine: world placement, usecode, shape/type metadata, overlay metadata, NPC tables, damage rules, UI art, sound, and speech are all separated.
- `dtable.flx`, `damage.flx`, `glob.flx`, and `wpnovlay.dat` should be treated as high-value parser targets if they are not already covered in local tooling.
## Container and File-Format Evidence
### `filesys/flex_file.cpp`
- FLEX detection looks for a padded header region filled with `0x1A`.
- Metadata reader uses:
  - table offset `0x80`
  - entry count at file offset `0x54`
  - 8-byte table entries of `<offset, size>`
- ScummVM rejects counts above `4095` and notes that the largest observed Crusader/U8 FLEX has `3074` entries.
Implication for RE:
- This matches the EUSECODE/FLEX structure already recovered and validated locally.
- It also gives a second independent implementation to compare against any local extractor edge cases.
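
The FLEX layout listed in this section can be sketched as a small table reader. This is a minimal Python sketch, not ScummVM's code: the function names are hypothetical, and it assumes the count and the `<offset, size>` fields are little-endian 32-bit integers, which the notes above do not state explicitly. The `0x1A` padding check is left out because the exact padded region is not given here.

```python
import struct

FLEX_COUNT_OFFSET = 0x54   # entry count, per the notes above
FLEX_TABLE_OFFSET = 0x80   # start of the <offset, size> table
FLEX_MAX_ENTRIES = 4095    # ScummVM rejects larger counts


def read_flex_table(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, size) pairs for every entry in a FLEX blob."""
    if len(data) < FLEX_TABLE_OFFSET:
        raise ValueError("too short to be a FLEX file")
    # Assumption: the count is a 32-bit little-endian integer.
    (count,) = struct.unpack_from("<I", data, FLEX_COUNT_OFFSET)
    if count > FLEX_MAX_ENTRIES:
        raise ValueError(f"implausible entry count: {count}")
    if FLEX_TABLE_OFFSET + 8 * count > len(data):
        raise ValueError("entry table runs past end of file")
    # Assumption: each 8-byte entry is two little-endian uint32 values.
    return [
        struct.unpack_from("<II", data, FLEX_TABLE_OFFSET + 8 * i)
        for i in range(count)
    ]


def read_flex_object(data: bytes, index: int) -> bytes:
    """Return the raw bytes of one object; a zero-size slot yields b''."""
    offset, size = read_flex_table(data)[index]
    return data[offset:offset + size]
```

Running a local extractor and this reader over the same file and diffing the resulting tables is a cheap way to surface edge cases such as zero-size slots.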
### `filesys/archive.cpp` and `filesys/raw_archive.cpp`
- `Archive` layers multiple `FlexFile` sources and resolves objects from newest source to oldest source.
- `RawArchive` caches raw object bytes and exposes them as memory streams.
Implication for RE:
- If any Crusader resources use overlay-style replacement behavior, ScummVM already models that archive precedence.
- This is worth checking before assuming a single-file source of truth for a given object id.
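
The newest-to-oldest lookup described above can be modeled in a few lines. This is an illustrative Python sketch only: `resolve_object` is a hypothetical name, each source is modeled as a plain dict of object id to bytes, and treating an empty payload as "not present" so that lookup falls through to an older source is an assumption rather than confirmed ScummVM behavior.

```python
from typing import Optional


def resolve_object(sources: list[dict[int, bytes]], index: int) -> Optional[bytes]:
    """Resolve an object id across layered sources.

    `sources` is ordered oldest first, so the search runs in reverse
    and the newest source that holds the object wins.
    """
    for source in reversed(sources):
        payload = source.get(index)
        # Assumption: an empty payload counts as missing.
        if payload:
            return payload
    return None
```

A local tool that records which source satisfied each id would show quickly whether any Crusader resource actually depends on this precedence.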
### `usecode/usecode_flex.cpp`
- USECODE classes are addressed as `classid + 2` inside the archive.
- Class names are read from object `1` at `name_object + 4 + 13 * classid`.
- For Crusader, class base offset is read from bytes `8..11` of the class object and decremented by `1`.
- Crusader event count is computed as `(get_class_base_offset(classid) + 19) / 6`.
Implication for RE:
- This is directly relevant to current USECODE work. It provides ScummVM's concrete interpretation of the Crusader class header layout and event-table sizing.
- If local EUSECODE or USECODE parsing still has uncertainties around header size, entry table layout, or event count, this file is the first external cross-check to apply.
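
The class-header arithmetic above is compact enough to restate as code. The following Python sketch uses hypothetical helper names and makes two assumptions that the notes do not spell out: bytes `8..11` form a little-endian 32-bit value, and each 13-byte name slot holds a NUL-terminated ASCII string.

```python
import struct


def class_object_index(classid: int) -> int:
    """Archive object index that holds the given USECODE class."""
    return classid + 2


def class_name(name_object: bytes, classid: int) -> str:
    """Read a class name from archive object 1 (the name table)."""
    start = 4 + 13 * classid
    slot = name_object[start:start + 13]
    # Assumption: names are NUL-terminated inside their 13-byte slot.
    return slot.split(b"\0", 1)[0].decode("ascii", errors="replace")


def class_base_offset(class_object: bytes) -> int:
    """Crusader class base offset: bytes 8..11, decremented by 1."""
    if len(class_object) < 12:
        raise ValueError("class object too short for a Crusader header")
    # Assumption: the field is a little-endian uint32.
    (offset,) = struct.unpack_from("<I", class_object, 8)
    return offset - 1


def class_event_count(class_object: bytes) -> int:
    """Crusader event count: (base offset + 19) / 6, integer division."""
    return (class_base_offset(class_object) + 19) // 6
```

For example, a class object whose bytes `8..11` hold `0x42` has a base offset of `0x41` and therefore `(65 + 19) // 6 = 14` events under this formula.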
## USECODE VM, Events, and Intrinsics
### `usecode/uc_machine.cpp`
- Crusader uses a `ByteSet(0x1000)` global-state store, unlike the U8 `BitSet` path.
- Remorse initializes global `0x003c` to avatar number `1`; Regret initializes global `0x001e`.
- The VM selects `ConvertUsecodeCrusader` for Remorse and `ConvertUsecodeRegret` for Regret.
Implication for RE:
- This is concrete evidence that the Crusader VM/global model diverges from U8 enough that it should not be treated as a drop-in match.
- The initialized global slots are worth comparing against already-known runtime globals in the raw executable.
### `convert/crusader/convert_usecode_crusader.h`
- ScummVM ships a named Crusader event table for event ids `0x00..0x1f`.
- Named events include `look`, `use`, `anim`, `setActivity`, `cachein`, `hit`, `gotHit`, `hatch`, `schedule`, `release`, `equip`, `unequip`, `combine`, `calledFromAnim`, `enterFastArea`, `leaveFastArea`, `avatarStoleSomething`, `animGetHit`, and `unhatch`.
- The same file also includes a large 512-entry intrinsic signature table with many behavior comments extracted from prior Pentagram reverse-engineering.
### `convert/crusader/convert_usecode_regret.h`
- Regret reuses the Crusader event-name table but has a different intrinsic numbering/signature map.
### `usecode/remorse_intrinsics.h` and `usecode/regret_intrinsics.h`
- These provide the live intrinsic dispatch tables used by the engine.
- High-value entries for current RE include weapon firing, status/quality accessors, object creation/destruction, camera moves, palette fades, movie playback, teleport-to-egg, keycard clearing, damage reception, and Crusader-specific audio calls.
High-value USECODE bridge examples from ScummVM's tables:
- `Item::I_fireWeapon`
- `AudioProcess::I_playSFXCru`
- `AudioProcess::I_playAmbientSFXCru`
- `StatusGump::I_hideStatusGump` / `I_showStatusGump`
- `MovieGump::I_playMovieOverlay`
- `World::I_setControlledNPCNum`
- `MainActor::I_clrKeycards`
- `PaletteFaderProcess` fade/jump helpers
- `Egg::I_getEggId`, `I_getEggXRange`, `I_setEggXRange`
Implication for RE:
- These files are an immediate planning aid for USECODE annotation. Even where names are approximate, they constrain argument counts, broad behavior, and event purpose.
- `convert_usecode_crusader.h` is especially valuable because it records many comments of the form "based on disasm" or "same coff as", which likely came from earlier disassembly-level Crusader RE.
## Shapes, Type Flags, Weapons, and Item Families
### `convert/crusader/convert_shape_crusader.cpp`
- ScummVM declares two Crusader-specific shape layouts: `CrusaderShapeFormat` and `Crusader2DShapeFormat`.
- The main world shape format uses:
  - 6-byte header
  - 8-byte frame header
  - 28-byte secondary frame header
  - explicit width/height/xoff/yoff fields
- The 2D shape format uses a 20-byte secondary frame header.
Implication for RE:
- This is the quickest external reference for main-world versus UI/mouse/gump shape decoding.
### `gfx/shape_info.cpp`
- Crusader type flags are decoded with a different bit layout than U8.
- ScummVM treats Crusader type-flag space as extending to at least bit `71`, with several ranges still marked unknown.
Implication for RE:
- Any local typeflag decoder should treat Crusader as its own layout, not as the U8 layout with extra cases.
### `world/weapon_info.h`
- Crusader-specific weapon fields include `_sound`, `_reloadSound`, `_ammoType`, `_ammoShape`, `_displayGumpShape`, `_displayGumpFrame`, `_small`, `_clipSize`, `_energyUse`, `_field8`, and `_shotDelay`.
Implication for RE:
- This header is a good target schema for interpreting weapon-related tables and shape metadata in the original data.
- `_field8` is still uncertain in ScummVM, which is a useful warning not to over-claim its meaning in the raw game.
### `world/item_factory.cpp`
- Crusader item families include `SF_CRUWEAPON`, `SF_CRUAMMO`, `SF_CRUBOMB`, and `SF_CRUINVITEM`.
- Item construction applies Crusader-only defaults:
  - damage points from shape damage info
  - weapon clip size copied into initial quality
  - ammo and bomb quality initialized to `1`
Implication for RE:
- This ties together shape family, shape damage info, weapon tables, and runtime item state.
- The quality field is confirmed as overloaded for ammo/clip counts and inventory stack-like quantities.
## World, Maps, Eggs, and Cache-In Behavior
### `world/map.cpp`
- Fixed and nonfixed map objects are read as 16-byte records.
- ScummVM reads each record as:
  - `x` = uint16
  - `y` = uint16
  - `z` = uint8
  - `shape` = uint16
  - `frame` = uint8
  - `flags` = uint16
  - `quality` = uint16
  - `npcNum` = uint8
  - `mapNum` = uint8
  - `next` = uint16
- It then applies `World_FromUsecodeXY(x, y)` before constructing items.
- Container nesting is not read from a separate structure: the on-disk `x` field is temporarily treated as container depth while reading hierarchical contents.
Implication for RE:
- This is one of the most concrete format descriptions in the ScummVM codebase.
- It is directly useful for validating fixed/nonfixed parsers and for checking whether any currently unnamed raw loader functions correspond to this record layout.
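
The 16-byte record listed above maps directly onto a `struct` format string. The sketch below is a hedged Python illustration: `MapRecord` and `parse_map_records` are hypothetical names, little-endian byte order is assumed, and the `World_FromUsecodeXY` transform and the container-depth reuse of `x` are deliberately not applied, so the values returned are the raw on-disk fields.

```python
import struct
from typing import NamedTuple

# x, y, z, shape, frame, flags, quality, npcNum, mapNum, next
# Assumption: little-endian, no padding ("<" disables alignment).
MAP_RECORD = struct.Struct("<HHBHBHHBBH")


class MapRecord(NamedTuple):
    x: int
    y: int
    z: int
    shape: int
    frame: int
    flags: int
    quality: int
    npc_num: int
    map_num: int
    next_field: int  # the on-disk `next` value


def parse_map_records(data: bytes) -> list[MapRecord]:
    """Split a fixed/nonfixed map blob into raw 16-byte records."""
    if len(data) % MAP_RECORD.size:
        raise ValueError("map data is not a whole number of 16-byte records")
    return [MapRecord(*fields) for fields in MAP_RECORD.iter_unpack(data)]
```

Because the field widths sum to exactly 16 bytes, any local parser that needs padding or a different record stride is a sign that its layout differs from ScummVM's reading.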
### `world/current_map.cpp`
- Crusader uses `_mapChunkSize = 1024`; U8 uses `512`.
- When loading a map, ScummVM always calls cache-in events in Crusader (`callCacheIn = (_currentMap != nullptr || GAME_IS_CRUSADER)`).
- It also explicitly calls actor cache-in events for Crusader after actor scheduling.
Implication for RE:
- Cache-in behavior appears more aggressive or more semantically important in Crusader than in U8.
- This may help explain some map-enter or object-activation behavior currently attributed to general dispatch code.
### `world/egg.cpp`
- Crusader supports `unhatch()` as a real egg event path; U8 does not.
- Eggs store a `_hatched` state and expose `get/set egg x/y range` plus `get/set egg id` intrinsics.
Implication for RE:
- `unhatch` is a strong clue for interpreting Crusader trigger/reset behavior.
### `world/world.cpp`
- Crusader save/load stores extra world fields beyond the shared baseline:
  - alert active
  - difficulty
  - controlled NPC number
  - Vargas shield value
- `setAlertActiveRemorse()` and `setAlertActiveRegret()` search for concrete shape ids and mutate frames/shapes to update world-state visuals.
- `setGameDifficulty()` contains a Remorse-specific BA-40 ammo patch that modifies weapon metadata at runtime.
Implication for RE:
- Alert-state and difficulty are not just UI globals; ScummVM models them as world-affecting state with concrete shape mutations.
## UI, Interaction, and Player-Control Code
### `gumps/cru_status_gump.cpp`
- Crusader HUD is composed from five child gumps: weapon, ammo, inventory, health, and energy.
### `gumps/cru_weapon_gump.cpp`, `cru_ammo_gump.cpp`, `cru_inventory_gump.cpp`
- HUD display is driven by weapon metadata fields such as `_displayGumpShape`, `_displayGumpFrame`, `_ammoShape`, and live `quality` values.
- `CruAmmoGump` shows that the bullet count is the current weapon's `quality`, and that reserve clips are counted from the first inventory item matching `ammoShape`.
- `CruInventoryGump` renders the active inventory item through the weapon-info display fields and shows quantity when `quality > 1`.
Implication for RE:
- These files are a good external model for active-weapon, ammo-reserve, and active-inventory state fields.
### `gumps/game_map_gump.cpp`
- Double-click `use` range is `512` in Crusader versus `128` in the shared path.
### `world/actors/cru_avatar_mover_process.cpp`
- Crusader movement logic is explicitly different from U8 and models combat movement, one-shot moves, short jump, crouch, sidesteps, rolls, rebel-base special cases, and combat-angle smoothing.
Implication for RE:
- This file is a practical behavioral checklist when classifying input/combat locomotion code in the raw executable.
## Audio, Speech, and Movies
### `audio/sound_flex.cpp`
- Crusader `sound.flx` differs from U8:
  - object `0` contains an index whose entries start with a leading `0x00` or `0xFF`, then 3 bytes of extra data, then a null-terminated sound name
  - `ASFX` entries are interpreted as 32-byte header plus raw 11025 Hz sample data
- Non-`ASFX` entries fall back to Sonarc decoding.
Implication for RE:
- This is one of the strongest container-format anchors in the ScummVM codebase.
- If local tooling still treats Crusader audio as opaque FLEX payloads, this file should drive the next parser pass.
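
As a first step toward such a parser, the `ASFX` versus Sonarc split can be sketched as follows. This Python example is a rough illustration under stated assumptions: `split_sound_object` is a hypothetical name, the `ASFX` tag is assumed to sit in the first four bytes of the object, and no Sonarc decoding or index parsing is attempted.

```python
ASFX_MAGIC = b"ASFX"
ASFX_HEADER_SIZE = 32      # header length, per the notes above
ASFX_SAMPLE_RATE = 11025   # Hz, per the notes above


def split_sound_object(payload: bytes) -> tuple[str, bytes]:
    """Classify one sound object and return (kind, sample_or_raw_bytes).

    "asfx" objects have their 32-byte header stripped; anything else
    is returned untouched and labeled "sonarc" for a later decoder.
    """
    # Assumption: the ASFX tag is at offset 0 of the object.
    if payload[:4] == ASFX_MAGIC:
        if len(payload) < ASFX_HEADER_SIZE:
            raise ValueError("ASFX object shorter than its header")
        return "asfx", payload[ASFX_HEADER_SIZE:]
    return "sonarc", payload
```

Counting how many objects in `sound.flx` land in each bucket gives a quick sanity check before investing in a Sonarc decoder.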
### `audio/speech_flex.cpp`
- Speech FLEX object `0` is parsed as a sequence of null-terminated phrases.
- Playback lookup is phrase-prefix based: ScummVM normalizes text and searches phrase table entries to map text to sound samples.
Implication for RE:
- Speech archives are not just sample banks; they embed text phrase indices.
- This can help tie dialog strings back to per-shape voice resources.
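
The phrase table and prefix lookup described above can be approximated like this. The Python sketch is illustrative only: the function names are hypothetical, whitespace collapsing stands in for whatever normalization ScummVM actually performs, Latin-1 decoding is an assumption, and the mapping from phrase index to sample object is not modeled.

```python
from typing import Optional


def normalize(text: str) -> str:
    """Collapse runs of whitespace (an assumed stand-in normalization)."""
    return " ".join(text.split())


def parse_phrases(index_object: bytes) -> list[str]:
    """Split speech FLEX object 0 into its null-terminated phrases."""
    pieces = index_object.split(b"\0")
    # The final terminator leaves one empty trailing piece; drop only
    # that one so earlier phrase indices stay stable.
    if pieces and pieces[-1] == b"":
        pieces.pop()
    return [normalize(piece.decode("latin-1")) for piece in pieces]


def find_phrase(phrases: list[str], text: str) -> Optional[int]:
    """Return the index of the first phrase that prefixes `text`."""
    wanted = normalize(text)
    for index, phrase in enumerate(phrases):
        if phrase and wanted.startswith(phrase):
            return index
    return None
```

Dumping the parsed phrases next to the dialog strings found in USECODE is one way to link a text line to its per-shape speech archive.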
### `audio/cru_music_process.cpp`
- Remorse and Regret have separate track name tables.
- Regret track `0x45` means "use the current map's default track" via a hardcoded map-to-track table.
- Remorse track `16` cycles through `M16A`, `M16B`, and `M16C`.
- Music is loaded from `sound/<track>.amf`.
Implication for RE:
- This is useful for identifying music-selection logic and map-to-music linkage in the original executable.
### `gumps/movie_gump.cpp`
- Crusader movie playback uses AVI files under `flics/`.
- Subtitle loading accepts either `.txt` or `.iff` sidecar files.
- ScummVM normalizes certain movie names because USECODE references `mva1`, `mva3a`, `mva5a`, etc., while files on disk may be `mva01`, `mva03a`, `mva05a`.
Implication for RE:
- This is a concrete example of ScummVM compensating for original asset-name/usecode mismatches.
- The subtitle `.iff` fallback is a useful clue for unexplained IFF-like resources.
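
The name mismatch above suggests a simple zero-padding rule. The pattern in this Python sketch is inferred solely from the three example pairs listed in this section (`mva1`, `mva3a`, `mva5a`); it is not taken from ScummVM's source, and `normalize_movie_name` is a hypothetical name.

```python
import re

# Assumption: only "mva" + one digit + optional letter needs padding.
_SHORT_MVA = re.compile(r"^(mva)(\d)([a-z]?)$", re.IGNORECASE)


def normalize_movie_name(name: str) -> str:
    """Zero-pad single-digit `mva` movie names; leave others unchanged."""
    return _SHORT_MVA.sub(r"\g<1>0\2\3", name)
```

Names that already carry two digits, or that do not start with `mva`, pass through untouched, so the helper is safe to apply to every movie reference found in USECODE.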
## Save/Load Format
### `filesys/savegame.cpp`
- ScummVM supports two save formats:
  - native `VMU8` saves with versioned file-entry archive payloads
  - older Pentagram zip-based saves
- Native saves use a 12-byte file name field and per-entry size/data blocks.
Implication for RE:
- This is mostly relevant to ScummVM compatibility, not original DOS save format recovery.
- It still matters because ScummVM serializes engine state explicitly enough to reveal which runtime fields it considers necessary for Crusader continuity.
## Best Files For Immediate RE Follow-Up
If time is limited, the most valuable ScummVM files to mine first are:
1. `games/game_data.cpp`
Why: best single inventory of Crusader data files and subsystems.
2. `usecode/usecode_flex.cpp`
Why: concrete Crusader USECODE class header and event-count interpretation.
3. `convert/crusader/convert_usecode_crusader.h`
Why: named event ids plus a large intrinsic-signature table with comments.
4. `audio/sound_flex.cpp`
Why: concrete Crusader sound archive interpretation.
5. `world/map.cpp`
Why: concrete fixed/nonfixed map record layout and container nesting behavior.
6. `world/weapon_info.h` and `world/item_factory.cpp`
Why: practical schema for weapon/ammo/inventory metadata.
7. `gumps/movie_gump.cpp`
Why: movie filename normalization and subtitle sidecar handling.
8. `world/current_map.cpp` and `world/world.cpp`
Why: Crusader-only cache-in, alert-state, difficulty, and map chunk differences.
## Suggested RE Uses In This Repo
### USECODE parsing
- Compare local USECODE/EUSECODE container assumptions against `usecode/usecode_flex.cpp`.
- Import ScummVM's event-name table as a conservative annotation source for event ids `0x00..0x1f`.
- Use `convert_usecode_crusader.h` and `remorse_intrinsics.h` as a cross-check for intrinsic numbering, argument counts, and broad semantics.
- Compare Remorse versus Regret intrinsic numbering before assuming one numbering scheme is universal.
### Data-format work
- Validate local FLEX readers against `filesys/flex_file.cpp`.
- Prioritize parsers for `dtable.flx`, `damage.flx`, `glob.flx`, and `wpnovlay.dat` because ScummVM treats them as core runtime inputs.
- Split shape decoding between Crusader main shapes and 2D/gump shapes using `convert_shape_crusader.cpp`.
- Treat `sound.flx` and speech FLEX files as structured formats, not opaque blob stores.
### Raw executable classification
- Use ScummVM's subsystem boundaries to guide search targets for:
  - cache-in and unhatch event paths
  - alert-state world mutations
  - map chunking and area search behavior
  - weapon clip/ammo/energy metadata consumers
  - movie name normalization and subtitle loading
  - Regret map-to-track music selection
## Conservative Takeaways
- ScummVM does not directly solve raw-symbol naming, but it materially sharpens the planning surface for Crusader RE.
- The most actionable ScummVM contributions are format schemas, event/intrinsic vocabularies, and subsystem boundaries.
- For current repo priorities, the strongest leverage is on USECODE parsing, data-file parser expansion, and validation of world/object metadata structures.