USECODE Tooling Comparison
Purpose
This note compares three different USECODE-facing tool lines now in use around the Crusader work:
- Pentagram's built-in Crusader usecode converter/disassembler
- the local crusader-disasm corpus and helper scripts
- the current workspace parser/decompiler in tools/poc_crusader_usecode_parser.py
The goal is not to rank them abstractly. The goal is to state what each one is actually good at, what assumptions it bakes in, and why the current local parser had to diverge.
Short version
Pentagram is a game-engine-side disassembler/converter with generic Crusader hooks.
crusader-disasm is mostly a generated disassembly corpus plus small maintenance scripts that mine or preserve information from that corpus.
Our current parser is the first tool in this workspace that is explicitly built around the validated owner-loaded EUSECODE.FLX structure recovered from the retail binary and then pushed further into readable pseudocode export.
Pentagram: what it does
The relevant Pentagram pieces are:
- convert/crusader/ConvertUsecodeCrusader.h
- convert/Convert.h
- tools/disasm/Disasm.cpp
- usecode/UsecodeFlex.cpp
Pentagram's model
Pentagram is trying to solve a different problem from our current script. It is not primarily a workspace extraction/decompilation pipeline. It is an engine-aware converter/disassembler that sits on top of Pentagram's own USECODE model.
Its Crusader-specific logic provides:
- an event-name table for slots 0x00..0x1f
- an intrinsic-name table
- a Crusader header reader
- Crusader event-table decoding through readevents
- Crusader opcode parsing by routing into the generic readOpGeneric(..., crusader=true) path
What Pentagram assumes
Pentagram's class/container assumptions come from its own UsecodeFlex and converter model:
- class bodies are addressed as object classid + 2
- class names come from object 1
- the Crusader base offset comes from bytes 8..11, then decremented by 1
- event count is derived as (base_offset + 19) / 6
- disassembly is driven from the converter header and event table, not from our later owner-loaded extractor outputs
That is close enough to be extremely useful, but it is not the same as the now-validated local owner-loaded reading we use in this repo.
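The header arithmetic listed above can be written out as a minimal sketch. It assumes the field at bytes 8..11 is an unsigned little-endian 32-bit value and that the division truncates; the function names and the sample buffer are hypothetical and are not taken from Pentagram's source.

```python
import struct


def pentagram_base_offset(class_obj: bytes) -> int:
    # Assumption: bytes 8..11 hold an unsigned little-endian 32-bit value.
    (raw,) = struct.unpack_from("<I", class_obj, 8)
    return raw - 1


def pentagram_event_count(class_obj: bytes) -> int:
    # Assumption: the division truncates, as integer division would.
    return (pentagram_base_offset(class_obj) + 19) // 6


# Synthetic 12-byte prefix for illustration only; not a real class object.
sample = bytes(8) + struct.pack("<I", 174)
print(pentagram_base_offset(sample))  # 173
print(pentagram_event_count(sample))  # 32
```

With this synthetic value the derived count matches the 32 event slots (0x00..0x1f) that the event-name table covers.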
What Pentagram outputs well
Pentagram is strong at:
- linear opcode disassembly
- printing BP/SP-relative references in a readable way
- mapping class/slot offsets to event names
- following opcode 0x5C symbol-info records into trailing local/debug symbol data
- printing those debug symbols after the code body
The JELYHACK example is a good illustration. Pentagram's disassembly prints:
Func_1 (Event 1) JELYHACK::use():
0001: 5A init 00
0003: 5C symbol info offset 001Ch = "JELYHACK"
000F: 0B push 0207h
0012: 40 push dword [BP+06h]
0014: 4C push indirect 02h bytes
0016: 77 set info
0017: 78 process exclude
0018: 5B line number 219 (00DBh)
001B: 50 ret
00: 01 type=69 (i) [BP+00h] (00) 00 referent
002A: 7A end
That is still one of the clearest proofs that the post-ret region contains local/debug-style metadata, not active control flow.
Where Pentagram stops short for this repo
Pentagram is not built around our current local needs:
- it does not consume class_layout_index.tsv, class_event_index.tsv, or the extracted chunk corpus
- it does not expose a workspace-friendly IR
- it does not attach our verified runtime anchors from runtime_vm_ir.tsv
- it does not export batch pseudocode for the whole EUSECODE corpus
- it still reflects a converter/disassembler view, not a readability-first decompiler view
- its Crusader intrinsic table is explicitly mixed with Regret-era knowledge and is useful as a hint table, not rename authority
So Pentagram gave us crucial structure and vocabulary, but not the repo-specific decompilation pipeline we needed.
crusader-disasm: what it does
The local crusader-disasm tree is different again. It is not one coherent parser in the same way Pentagram is. It is a mixture of:
- a large generated disassembly corpus in crusader_disasm.txt
- opcode-name tables such as usecode_opcodes.txt
- small maintenance scripts such as parse_crusader_disasm.py and update_disasm_comments.py
- handwritten notes and side data gathered over time
What crusader-disasm is strongest at
Its biggest strength is that it is already a rich evidence corpus.
usecode_opcodes.txt gives a full opcode-name vocabulary such as:
- 0x04 ASSIGN_MEMBER_CHAR
- 0x10 NEAR_ROUTINE_CALL
- 0x5C SYMBOL_INFO
- 0x78 PROCESS_EXCLUDE
- 0x7A END
That helped verify several names and fill decode gaps in our parser.
The generated crusader_disasm.txt is also valuable because it shows concrete output form, not just names. It proved things like:
- how symbol info is rendered
- where local/debug symbol rows appear
- what a tiny body like JELYHACK::use looks like in a traditional disassembly listing
What the helper scripts actually do
The helper scripts in crusader-disasm are narrow and pragmatic.
parse_crusader_disasm.py:
- scans an already-generated crusader_disasm.txt
- looks for calli lines, nearby add sp, and retval pushes
- infers rough intrinsic prototypes from the text listing
- emits a guessed intrinsic table
That means it is not parsing EUSECODE.FLX directly. It is mining structure from a pre-rendered textual disassembly.
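The text-mining approach described above can be illustrated roughly as follows. This is a sketch under loud assumptions: the line format (calli followed by an argument-byte count and a hex intrinsic number), the operand order, the lookahead window, and the sample lines are all invented for illustration. It is not the actual parse_crusader_disasm.py logic, and the real crusader_disasm.txt layout may differ.

```python
import re

# Assumed line shape: "calli <arg bytes, 2 hex digits> <intrinsic, 4 hex digits>h".
CALLI = re.compile(
    r"calli\s+(?P<argbytes>[0-9A-Fa-f]{2})\s+(?P<intrinsic>[0-9A-Fa-f]{4})h"
)


def guess_intrinsics(lines, lookahead=2):
    """Collect rough per-intrinsic facts from a pre-rendered text listing."""
    table = {}
    for i, line in enumerate(lines):
        match = CALLI.search(line)
        if not match:
            continue
        # Look at the next few lines, stopping at the next calli.
        window = []
        for follow in lines[i + 1:i + 1 + lookahead]:
            if CALLI.search(follow):
                break
            window.append(follow)
        entry = table.setdefault(
            int(match["intrinsic"], 16),
            {"arg_bytes": set(), "stack_cleanup": False, "returns_value": False},
        )
        entry["arg_bytes"].add(int(match["argbytes"], 16))
        entry["stack_cleanup"] |= any("add sp" in w for w in window)
        entry["returns_value"] |= any("retval" in w for w in window)
    return table


# Invented sample lines; not copied from crusader_disasm.txt.
listing = [
    "calli 02 0010h",
    "add sp 02h",
    "push retval",
    "calli 04 0011h",
    "add sp 04h",
]
guessed = guess_intrinsics(listing)
print(guessed[0x10]["returns_value"], guessed[0x11]["returns_value"])  # True False
```

The point of the sketch is the limitation the note describes: everything here is inferred from rendered text, so the result is only as reliable as the listing format and the heuristics applied to it.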
update_disasm_comments.py:
- merges comments from an older disassembly into an updated regenerated one
- preserves manual annotations when intrinsic names change
So this is again a maintenance aid around a text corpus, not a first-principles byte parser.
Where crusader-disasm stops short for this repo
crusader-disasm is excellent evidence, but weak as a live decompilation pipeline:
- it does not operate on our extracted owner-loaded chunk/index data
- it does not produce structured IR
- it does not know our validated body windows from class_event_index.tsv
- it does not emit script/pseudocode views
- it does not integrate runtime-anchor hints from the current RE notes
- some of its information is annotation-quality and corpus-quality rather than machine-robust parser output
In practice, crusader-disasm has been most useful as a vocabulary/evidence source, not as the final tool we run to generate the readable corpus.
Our current parser/decompiler: what it does differently
The current local tool line is centered on:
- tools/extract_eusecode_flx.py
- tools/poc_crusader_usecode_parser.py
- tools/export_usecode_pseudocode.py
1. It is built around the validated owner-loaded local format
This is the biggest difference.
Our parser does not start from Pentagram's generic converter header model or from a pre-rendered disassembly text file. It starts from the extracted local artifacts and the currently validated retail-binary understanding:
- class_id + 2 body lookup
- bytes 8..11 treated as the first code-byte anchor / code_base_minus_one basis
- 6-byte event rows at +20
- derived body ranges emitted into class_event_index.tsv
- chunk files under USECODE/EUSECODE_extracted/chunks/
That is why it can decompile the actual extracted corpus in a repeatable workspace-local way.
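Two of the layout facts above, the class_id + 2 lookup and the 6-byte event rows at +20, can be sketched as below. The helper names are hypothetical and are not the functions in tools/poc_crusader_usecode_parser.py. The sketch deliberately returns raw rows without interpreting their fields, and takes the row count as a parameter rather than deriving it.

```python
from typing import List

EVENT_TABLE_START = 20  # event rows begin at +20 in the class object
EVENT_ROW_SIZE = 6      # each event row is 6 bytes


def object_index_for_class(class_id: int) -> int:
    # Class bodies are looked up as object class_id + 2.
    return class_id + 2


def slice_event_rows(class_obj: bytes, row_count: int) -> List[bytes]:
    """Return the raw 6-byte event rows without decoding their fields."""
    rows = []
    for slot in range(row_count):
        start = EVENT_TABLE_START + slot * EVENT_ROW_SIZE
        row = class_obj[start:start + EVENT_ROW_SIZE]
        if len(row) != EVENT_ROW_SIZE:
            raise ValueError(f"event row {slot} runs past the end of the object")
        rows.append(row)
    return rows


# Synthetic object: 20 header bytes followed by two recognisable 6-byte rows.
fake_obj = bytes(20) + bytes(range(1, 7)) + bytes(range(11, 17))
rows = slice_event_rows(fake_obj, 2)
print(object_index_for_class(5))  # 7
print(rows[1].hex())              # 0b0c0d0e0f10
```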
2. It separates authoritative IR from readable views
Pentagram and crusader-disasm mostly produce one human-facing linear listing.
Our parser deliberately splits output into layers:
- JSON IR for machine-facing structure
- flat text listing for byte-faithful decode
- script view for stack-machine readability
- pseudocode view for programming-language-like readability
- batch export of that pseudocode corpus into USECODE/EUSECODE_extracted/pseudocode
That separation is what let us make JELYHACK readable without losing the exact bytes and trailer structure.
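The layering idea can be shown with a small sketch: one IR object, several renderers that never mutate it. The class and field names are hypothetical and do not describe the parser's real schema. The three ops are the single-byte opcodes from the JELYHACK listing quoted earlier.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    offset: int
    raw: bytes
    mnemonic: str


@dataclass
class BodyIR:
    class_name: str
    event_name: str
    ops: List[Op] = field(default_factory=list)


def to_json(ir: BodyIR) -> str:
    """Machine-facing view: bytes are hex-encoded so the IR is JSON-safe."""
    payload = {
        "class": ir.class_name,
        "event": ir.event_name,
        "ops": [
            {"offset": op.offset, "raw": op.raw.hex(), "mnemonic": op.mnemonic}
            for op in ir.ops
        ],
    }
    return json.dumps(payload, indent=2)


def to_listing(ir: BodyIR) -> str:
    """Byte-faithful flat view rendered from the same IR."""
    return "\n".join(
        f"{op.offset:04X}: {op.raw.hex().upper()} {op.mnemonic}" for op in ir.ops
    )


ir = BodyIR("JELYHACK", "use", [
    Op(0x0016, b"\x77", "set info"),
    Op(0x0017, b"\x78", "process exclude"),
    Op(0x001B, b"\x50", "ret"),
])
print(to_listing(ir))
```

Because every view is derived from the IR, a readability change in one renderer cannot silently alter the bytes that another view reports.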
3. It handles post-ret metadata differently
Pentagram already knew about debug symbols through 0x5C and readDbgSymbols().
The important difference is that our parser had to make that logic safe in the extracted-corpus setting:
- it now detects ret-anchored debug/local trailers explicitly
- it avoids mis-decoding those bytes as live opcodes on bodies like NPCTRIG 0x0A
- it exposes debug symbols in the IR and readable views
- it now hides dead post-return junk from the human pseudocode when readability matters more than raw listing fidelity
So Pentagram gave the structural clue, but our parser had to adapt it to the owner-loaded extracted corpus and to the readability-first output mode.
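A minimal sketch of the ret-anchored split, using only the offsets and opcodes visible in the JELYHACK listing above. It assumes a body with a single ret, a one-byte ret opcode, and a trailer that runs up to the end opcode. The function name is hypothetical and the real parser's detection is likely more involved.

```python
RET = 0x50
END = 0x7A


def split_ret_trailer(ops):
    """Split (offset, opcode) pairs into live ops and a trailer byte range.

    Returns (live_ops, (trailer_start, trailer_end)) with an exclusive end,
    or (live_ops, None) when no trailer can be identified.
    """
    ret_index = next(
        (i for i, (_, opcode) in enumerate(ops) if opcode == RET), None
    )
    if ret_index is None:
        return list(ops), None
    live = list(ops[:ret_index + 1])
    trailer_start = ops[ret_index][0] + 1  # assumption: ret is one byte long
    end_offset = next(
        (off for off, opcode in ops[ret_index + 1:] if opcode == END), None
    )
    if end_offset is None or end_offset <= trailer_start:
        return live, None
    return live, (trailer_start, end_offset)


# Offsets and opcodes as printed in the JELYHACK::use listing.
jelyhack = [
    (0x0001, 0x5A), (0x0003, 0x5C), (0x000F, 0x0B), (0x0012, 0x40),
    (0x0014, 0x4C), (0x0016, 0x77), (0x0017, 0x78), (0x0018, 0x5B),
    (0x001B, 0x50), (0x002A, 0x7A),
]
live, trailer = split_ret_trailer(jelyhack)
print(len(live), [hex(x) for x in trailer])  # 9 ['0x1c', '0x2a']
```

The trailer start of 0x1C agrees with the symbol info offset 001Ch printed by opcode 0x5C in that listing, which is the kind of cross-check that keeps trailer bytes from being decoded as live opcodes.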
4. It adds runtime cross-reference hints that the older tools do not
Our parser attaches the verified runtime bridge information from runtime_vm_ir.tsv and related notes, such as:
- 000d:0988
- 000d:177c
- 000d:1acb
- 000d:208b
- 000d:21ed
- 000d:22bc
- 000d:2104
- 000d:46ec
- 000d:ebe3
Neither Pentagram nor crusader-disasm performs that kind of repo-specific runtime correlation.
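As a small illustration of handling these anchors, the sketch below parses the segment:offset strings into integer pairs so they can be sorted or used as lookup keys. The helper name is hypothetical, and the sketch makes no claim about the column layout of runtime_vm_ir.tsv or about how the parser maps anchors to opcodes.

```python
def parse_anchor(text: str):
    """Parse 'ssss:oooo' hex text into a (segment, offset) integer pair."""
    segment, offset = text.split(":")
    return int(segment, 16), int(offset, 16)


anchors = [
    "000d:0988", "000d:177c", "000d:1acb", "000d:208b", "000d:21ed",
    "000d:22bc", "000d:2104", "000d:46ec", "000d:ebe3",
]
parsed = sorted(parse_anchor(a) for a in anchors)
print([f"{seg:04x}:{off:04x}" for seg, off in parsed[:4]])
# ['000d:0988', '000d:177c', '000d:1acb', '000d:208b']
```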
5. It is aimed at whole-corpus readability, not only opcode fidelity
This is the most visible practical difference.
Pentagram and crusader-disasm are good at telling you what bytes and opcodes are present.
Our current script is trying to answer a different question too:
What does this class body seem to do, in language a human can scan?
That is why the current parser now:
- names locals where the debug trailer provides them
- folds compare ladders into if / else if
- suppresses dead post-ret tail noise in pseudocode
- exports the whole decoded corpus into per-class pseudocode files
That is the main place where our script now goes beyond the older tools.
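The compare-ladder folding can be pictured with a rendering-only sketch. It assumes the ladder has already been recovered as an ordered list of (condition, statements) pairs. The function name, the brace syntax, and the sample conditions are hypothetical and do not reflect the exporter's actual output format.

```python
def fold_compare_ladder(branches, else_body=None):
    """Render ordered (condition, statements) pairs as an if / else if chain."""
    if not branches:
        return "\n".join(else_body or [])
    lines = []
    for index, (condition, body) in enumerate(branches):
        if index == 0:
            lines.append(f"if ({condition}) {{")
        else:
            lines.append(f"}} else if ({condition}) {{")
        lines.extend(f"    {stmt}" for stmt in body)
    if else_body is not None:
        lines.append("} else {")
        lines.extend(f"    {stmt}" for stmt in else_body)
    lines.append("}")
    return "\n".join(lines)


# Invented conditions and statements, for illustration only.
text = fold_compare_ladder(
    [("slot == 1", ["a()"]), ("slot == 2", ["b()"])],
    else_body=["c()"],
)
print(text)
```

Keeping this step as a pure renderer over already-decoded structure means the flat listing stays byte-faithful even when the pseudocode view is restructured.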
What the older tools still do better
This is not a one-way replacement story.
Pentagram still does some things better than our current script:
- a broader, more mature generic opcode conversion framework
- a cleaner historical disassembler path for symbol-info and debug-symbol printing
- a converter architecture that already knows how to build node-like structures for many ops
crusader-disasm still does some things better too:
- richer long-lived annotation corpus
- a larger existing body of older naming/vocabulary experiments
- a direct opcode-name table from a distinct extraction route
- concrete disassembly output that is sometimes easier to cross-check than a newer heuristic pseudocode layer
So the best current workflow is still hybrid:
- use Pentagram for structural/reference behavior
- use crusader-disasm for opcode vocabulary and corpus evidence
- use the local parser for validated owner-loaded extraction, IR, pseudocode, and batch readability export
Best current summary
Pentagram is a converter/disassembler.
crusader-disasm is a disassembly corpus with helper scripts.
Our script is the first repo-local tool that is explicitly trying to be a readable decompiler over the validated extracted EUSECODE corpus.
That is why the current parser looks less like a classic disassembler and more like a layered RE workbench:
- extractor-backed local format understanding
- structured IR
- byte-faithful listing
- readability-first script/pseudocode views
- batch corpus export
- runtime-annotation hints tied to the current Crusader notes
The tradeoff is that our current script is newer and more heuristic. It is better at producing something a human can read across the whole corpus, but it is not yet as mature or as battle-tested at raw opcode coverage as the older reference tools.