Compare commits

...

2 commits

Author SHA1 Message Date
a56851f994 Enhance CLI functionality and improve common utilities
- Added new commands to the CLI for dumping regions, renaming functions by address, and setting various types of comments.
- Implemented JSON output formatting for CLI commands.
- Introduced functions for decompiling and disassembling functions, as well as retrieving cross-references.
- Enhanced common utilities with functions for reading memory regions, iterating Java items, and managing function metadata.
- Added suppress_output context manager to hide process output during Ghidra startup.
- Updated existing functions to improve error handling and output formatting.
2026-03-21 09:44:35 +01:00
24d4416003 Add various scripts and JSON plans for Ghidra project
- Introduced `seg043_boundary_repair.json` to manage function boundaries in segment 043.
- Created `read_file.py` for reading and printing file content size.
- Added `resolve_bb4f.py` to resolve specific function call targets.
- Implemented `resolve_top_targets.py` to find resolved NE targets for top-called wrapper functions.
- Added `script_contents.txt` to summarize NE relocation far calls.
- Updated `tier4_ghidra.txt`, `tier4_ghidra_check.txt`, `tier4_output.txt`, and `tier4_result.txt` with function call statistics.
- Created `tier5_errors.txt` for error logging and `tier5_output.txt` for additional function call statistics.
- Established `tools` directory with helper scripts for the Ghidra project, including CLI and common functionalities.
- Implemented command-line interface in `cli.py` for various project operations.
- Added `common.py` for shared functions and configurations across tools.
- Introduced `validate_fixups.py` to validate NE relocation fixups against known addresses.
2026-03-20 23:50:39 +01:00
41 changed files with 146748 additions and 14 deletions


@ -31,6 +31,16 @@ applyTo: "**"
- Record raw-import addresses alongside original segment-relative offsets when porting names.
- **Always use `rename_function_by_address`.** `rename_function` (by name) is broken: it fails with "must have required property 'old_name'". Use the `"function_address": "000c:XXXX"` format.
# PyGhidra Fallback
- Use the local PyGhidra toolkit in `tools/pyghidra_crusader` when MCP is missing an operation such as function creation, deletion, or batched scripted edits.
- The workspace-local Python environment for this toolkit is `.venv-pyghidra311`, created from `C:\Users\Maddo\.pyenv\pyenv-win\versions\3.11.6\python.exe` and installed from the bundled Ghidra 11.3.2 offline packages.
- Default install dir for the toolkit is `I:\Apps\ghidra_11.3.2_PUBLIC`.
- Invoke the toolkit with `.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader ...` from the repo root.
- Keep PyGhidra batches small too: prefer one focused repair plan or 1-5 direct edits at a time.
- Write operations require the Ghidra project to open successfully. If `Crusader.lock` is present because the GUI owns the project, close Ghidra first or operate on a project copy.
- If the workflow needs the user to change Ghidra state, use the ask-questions tool with a yes/no confirmation prompt instead of plain text. Ask the user to close Ghidra before PyGhidra write commands, and ask the user to open the Ghidra project before MCP server commands. The prompt should briefly describe exactly what to do and instruct the user to answer `Yes` only after the action is complete.
# Current Verified Raw-Import Ports
- `0006:e5d0` = `cursor_update_hover` from seg001 `0x0060`


@ -0,0 +1,176 @@
# PyGhidra Ghidra Ops
Use this skill when Ghidra MCP is missing a needed operation and you need native CPython access to the Ghidra API for the local Crusader project.
## Use Cases
- Create or delete functions in `CRUSADER-RAW.EXE`.
- Apply small batched repairs driven by verified addresses.
- Add comments or rename functions by address from a repeatable JSON plan.
- Decompile or disassemble functions without switching back to the MCP server.
- Query function metadata, search by name, and inspect xrefs from the same local CLI.
- Inspect project root files to confirm the program name/path before running edits.
## Workspace Defaults
- Ghidra install dir: `I:\Apps\ghidra_11.3.2_PUBLIC`
- Ghidra project dir: repo root
- Ghidra project name: `Crusader`
- Default program: `CRUSADER-RAW.EXE`
- Local Python env: `.venv-pyghidra311`
- CLI entrypoint: `.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader`
## Constraints
- Stay conservative. Use the same rename and batch-size rules as the main Ghidra workflow.
- Prefer one focused plan or 1-5 direct edits at a time.
- Write operations require the project to be openable for modification. If `Crusader.lock` is present because the GUI owns the project, close Ghidra first or work on a copy.
- Keep `crusader_decompilation_notes.md` updated after verified repair batches.
## Commands
List root project files:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader project-files
```
Delete a bad function object:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader delete-function --entry 0007:5b6f
```
Create a repaired function with an explicit body:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader create-function `
--entry 0007:5a90 `
--name seg043_func_0090 `
--body-start 0007:5a90 `
--body-end 0007:5b79 `
--plate-comment "Recovered from standalone seg043 boundary scan"
```
Rename a function by entry address:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader rename-function --entry 0006:02cc --name entity_class_get_flag20
```
MCP-style read/query commands are also available from the same CLI:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader get-function-by-address --address 000a:48ff
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader get-function-containing --address 000a:4901
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader decompile-function-by-address --address 000a:48ff
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader disassemble-function --address 000a:48ff
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader read-region --start 000a:48ff --end 000a:4912
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader search-functions-by-name --query rng_
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader list-strings --limit 20
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader list-imports --limit 20
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader list-exports --limit 20
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader list-namespaces --limit 20
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader list-segments --limit 20
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader list-data-items --limit 20
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader list-classes --limit 20
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader get-xrefs-to --address 000a:48ff
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader get-function-xrefs --name rng_next_modulo
```
All commands also support structured output for scripting:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader --format json get-function-by-address --address 000a:48ff
```
For ad hoc investigation, prefer `run-script` over multiline `python -c` or pasted PowerShell here-strings. It avoids leaving the shared shell stuck in an unfinished string/block state:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader run-script --script .\pyghidra_plans\inspect_rng.py --read-only
```
Script globals available inside `run-script`:
```python
config
project
program
helpers["get_function"]
helpers["get_function_containing"]
helpers["decompile_function"]
helpers["disassemble_function"]
helpers["get_xrefs_to"]
helpers["get_xrefs_from"]
helpers["read_region_bytes"]
helpers["rename_function"]
helpers["set_comment"]
```
Write-side MCP-style aliases are available too:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader rename-function-by-address --entry 000a:48ff --name rng_next_modulo
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader set-decompiler-comment --address 000a:48ff --text "Returns RNG output modulo the requested bound."
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader set-disassembly-comment --address 000a:48ff --text "Modulo wrapper around rng_advance_state"
```
Apply a small JSON plan:
```json
{
"transaction": "Repair seg043 boundaries",
"remove_functions": [
"0007:5b6f"
],
"create_functions": [
{
"entry": "0007:5a90",
"name": "seg043_func_0090",
"body_start": "0007:5a90",
"body_end": "0007:5b79",
"comment": "Recovered from standalone seg043 boundary scan"
},
{
"entry": "0007:5b7a",
"name": "seg043_func_017a",
"body_start": "0007:5b7a",
"body_end": "0007:5c1b"
},
{
"entry": "0007:5c1c",
"name": "seg043_func_021c",
"body_start": "0007:5c1c",
"body_end": "0007:5c80"
}
],
"comments": [
{
"address": "0007:5b6f",
"text": "Old auto-created split overlaps the earlier seg043:0090..0179 routine.",
"type": "plate"
}
]
}
```
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader apply-plan --plan .\seg043_repair.json
```
Dry-run a plan before touching the project:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader apply-plan --plan .\seg043_repair.json --dry-run
```
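A plan can also be sanity-checked offline before the dry run. The following is a minimal sketch, not part of the toolkit: `to_flat` and `check_plan` are hypothetical helpers, and the sketch assumes the `create_functions` keys shown in the example plan, `SSSS:OOOO` address strings, and an inclusive `body_end`.
```python
def to_flat(text):
    """Convert an 'SSSS:OOOO' string to a flat integer (SSSS * 0x10000 + OOOO)."""
    seg, off = text.split(":")
    return int(seg, 16) * 0x10000 + int(off, 16)


def check_plan(plan):
    """Return a list of problems found in a plan dict (an empty list means none found)."""
    problems = []
    ranges = []
    for item in plan.get("create_functions", []):
        entry = to_flat(item["entry"])
        start = to_flat(item["body_start"])
        end = to_flat(item["body_end"])
        if start > end:
            problems.append(f"{item['name']}: body_start is after body_end")
        if not start <= entry <= end:
            problems.append(f"{item['name']}: entry lies outside its body")
        ranges.append((start, end, item["name"]))
    # Sort by start address, then compare each body with its successor.
    ranges.sort()
    for (_, end1, name1), (start2, _, name2) in zip(ranges, ranges[1:]):
        if start2 <= end1:  # assumption: body_end is inclusive
            problems.append(f"{name1} overlaps {name2}")
    return problems
```
Loading the plan with `json.load` and passing the resulting dict to `check_plan` would flag a body such as `0007:5b6f..` that overlaps an earlier routine ending at `0007:5b79`.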
## Implementation Notes
- Address strings accept raw `SSSS:OOOO` form or plain integers such as `0x75a90`.
- The CLI tries a few root folder path variants when opening the program so it can tolerate minor project path differences.
- Plan files support `remove_functions`, `rename_functions`, `create_functions`, `comments`, and `assert_functions`.
- `set-decompiler-comment` maps to a pre-comment and `set-disassembly-comment` maps to an EOL comment.
- Read/query commands open the program read-only; create/rename/comment/plan commands still require the project to be writable.
- `run-script --read-only` is the safest way to do one-off inspection without getting the shared PowerShell session stuck in a multiline Python string.
- `read-region` now reads bytes one address at a time instead of relying on a bulk `getBytes` path that produced misleading all-zero results in this project under PyGhidra.
- PyGhidra startup now suppresses the noisy local GhidraMCP `Module.manifest` warnings during normal CLI operation.

27
.gitignore vendored

@ -5,9 +5,36 @@ ghidra_*
*.swp
*.lock
*.lock~
.tmp_*
# IDE and OS files
.vscode/
.idea/
.DS_Store
Thumbs.db
# Local Python environments
.venv-pyghidra311/
# Python caches, bytecode, and tooling state
__pycache__/
*.py[cod]
*$py.class
.python-version
.pytest_cache/
.mypy_cache/
.ruff_cache/
.pyre/
.hypothesis/
.tox/
.nox/
.coverage
.coverage.*
htmlcov/
build/
dist/
*.egg-info/
# Local scratch and probe files
.tmp_*.txt
.tmp_*.py


@ -30,6 +30,33 @@
- Naming note:
- `seg001` and `seg021` both contain a keyboard handler; in the full program database, the seg001 copy is named `seg001_input_keyboard_handler` to avoid a symbol collision with seg021 `input_keyboard_handler`.
### Address Space Layout in the Raw Import
Ghidra segment:offset `SSSS:OOOO` = flat address `SSSS * 0x10000 + OOOO`.
| Flat range | Content |
|---|---|
| `0x00000`–`0x36F6F` | Phar Lap 286 DOS extender (outer MZ stub code) |
| `0x36F70` | NE header (145-segment game image begins here in file) |
| `0x6E570`+ | NE game segments at their Phar Lap linear load addresses |
Mapping rule (verified for seg001 and seg021):
```
runtime_flat_base = NE_segment_file_offset + 0x36F70
```
Example: seg004 at file `0x40A00` → runtime `0x77970` → Ghidra `0007:7970`.
Functions at Ghidra `0003:XXXX` / `0004:XXXX` are **Phar Lap extender code** (flat < `0x40000` is below any game segment). Functions at `0006:E570`+ are game NE segments.
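The address arithmetic above can be sketched in Python. This is an illustration only: the function names are hypothetical, and the two constants are taken directly from the notes (`0x36F70` for the NE header offset, `0x6E570` for the seg001 base).
```python
NE_HEADER_OFFSET = 0x36F70  # NE header offset in the EXE, per the mapping rule
SEG001_BASE = 0x6E570       # verified seg001 base in the raw import


def ne_offset_to_flat(ne_segment_file_offset):
    """Apply the mapping rule: runtime flat base = NE segment file offset + 0x36F70."""
    return ne_segment_file_offset + NE_HEADER_OFFSET


def flat_to_ghidra(flat):
    """Format a flat address as Ghidra's SSSS:OOOO (flat = SSSS * 0x10000 + OOOO)."""
    return f"{flat >> 16:04x}:{flat & 0xFFFF:04x}"


def ghidra_to_flat(text):
    """Inverse of flat_to_ghidra for 'SSSS:OOOO' strings."""
    seg, off = text.split(":")
    return int(seg, 16) * 0x10000 + int(off, 16)


# seg004 example from the notes: file 0x40A00 -> runtime 0x77970 -> 0007:7970
print(flat_to_ghidra(ne_offset_to_flat(0x40A00)))
# seg001 offset 0x0060 lands on the ported cursor_update_hover address
print(flat_to_ghidra(SEG001_BASE + 0x0060))
```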
### `0000:ffff` — NE Fixup Placeholder (not a dispatcher)
`unresolved_far_thunk_dispatch` at `0000:ffff` is NOT a runtime function. Every `CALLF 0x0000:ffff` in the binary is a **different** external or inter-segment call patched by the NE loader at runtime. The decompiler body is garbled (it reads NE fixup-chain sentinel data). Decompiler comment added in Ghidra. See individual call sites for per-site behavioral annotations.
Known call-site classifications (by argument pattern):
- `PUSH DS; PUSH imm_ordinal; CALLF` — Phar Lap extender calling a runtime-imported procedure by ordinal
- `PUSH ptr_seg; PUSH ptr_off; CALLF` — inter-NE-segment function call (intra-game far call)
- Multiple typed pushes then CALLF — external C runtime / game subsystem call with normal args
### Latest Raw Full-EXE Porting Progress
- Newly ported and renamed into `CRUSADER-RAW.EXE` from verified `seg001` mapping (`base 0x6E570`):
@ -48,6 +75,23 @@
- `entity_fire_weapon` currently decompiles as a thin wrapper that calls `projectile_init_vector`.
- `fire_weapon_from_cursor` still decompiles poorly in the raw import, but disassembly shows it begins by pushing cursor sprite/state data from the `0x27d6` area, consistent with the existing seg001 notes.
### Raw seg091 Boundary Recovery (init/context + RNG helpers)
- Conservative PyGhidra boundary repair created the missing seg091 functions in `CRUSADER-RAW.EXE`:
- `000a:44fd` = `seg091_func_00fd`, body `000a:44fd-000a:454c`
- `000a:454d` = `seg091_func_014d`, body `000a:454d-000a:45fd`
- `000a:48a0` = `rng_advance_state`, body `000a:48a0-000a:48e2`
- `000a:48ff` = `rng_next_modulo`, body `000a:48ff-000a:4912`
- Additional adjacent helper identified directly in the raw import:
- `000a:48e3` = `rng_set_seed`
- Verified current behavior from the raw import:
- `seg091_func_00fd` shares runtime flag `0x44a4` with `runtime_init_or_abort`; if the flag is clear it sets it and dispatches through an unresolved far thunk, then falls into a second unresolved thunk path that Ghidra currently marks as non-returning.
- `seg091_func_014d` also shares flag `0x44a4`; it checks an optional long argument against the global context/cookie at `0x45a6`, zeroes the pointed byte when the argument is null, then dispatches through an unresolved far thunk. Keep the positional name until caller-side analysis resolves the thunk target and full signature.
- `rng_set_seed` writes the 32-bit RNG seed/state pair at `0x4584:0x4586` and forces the low word odd.
- `rng_advance_state` updates the same 32-bit state with a simple multiply/add step.
- `rng_next_modulo` advances the RNG state and returns the result modulo the requested bound, or `0` when the bound is zero.
- Short decompiler comments were added in Ghidra at all five seg091 entries so the current evidence stays attached to the raw database.
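A rough Python model of the three RNG helpers, under stated assumptions: the multiplier and increment below are placeholders and were NOT recovered from the binary, the returned value is assumed to be the full 32-bit state, and a zero bound is assumed to return before the state advances. Only the odd-low-word seeding, the multiply/add shape, and the zero-bound result come from the notes.
```python
MASK32 = 0xFFFFFFFF
PLACEHOLDER_MULTIPLIER = 5  # placeholder, not the real constant
PLACEHOLDER_INCREMENT = 1   # placeholder, not the real constant


class RngModel:
    """Toy model of rng_set_seed / rng_advance_state / rng_next_modulo."""

    def __init__(self):
        self.state = 1

    def set_seed(self, seed):
        # Store the 32-bit seed and force the low word odd.
        self.state = (seed & MASK32) | 1

    def advance_state(self):
        # Simple multiply/add step on the shared 32-bit state.
        self.state = (self.state * PLACEHOLDER_MULTIPLIER + PLACEHOLDER_INCREMENT) & MASK32
        return self.state

    def next_modulo(self, bound):
        # A zero bound yields 0 (assumed here to skip the advance).
        if bound == 0:
            return 0
        return self.advance_state() % bound
```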
### Raw 0007 Gameplay Helper Batch (entity/tile aux state)
- New conservative gameplay-side helper renames (direct analysis from field writes and call structure):
@ -57,7 +101,9 @@
- Current verified behavior:
- `entity_sync_tile_aux_state` reads entity tile index at `+0x4`, toggles bit `0x04` in tile record `+0x59` based on entity byte `+0x54`, and copies entity word `+0x55` into tile record `+0x0d`.
- `entity_sync_tile_aux_if_linked` only performs the sync when entity link/pointer `+0x50/+0x52` is non-null.
- `entity_mark_dirty_and_sync_tile_aux` calls the linked-sync helper, sets entity flag bit `0x04` at `+0x42`, then enters the existing unresolved thunk path (`0000:ffff`).
- `entity_mark_dirty_and_sync_tile_aux` calls the linked-sync helper, sets entity flag bit `0x04` at `+0x42`, then calls through `0000:ffff` with args `(SS:&tile_index, entity[+0x57])` — annotated at `0007:8666` as `entity_tile_type_notify(tile_index_ptr, type_byte)`.
- New entity field found this pass:
- `entity[+0x57]` (byte) = entity type/class byte (passed to tile-type notification; meaning not yet fully established — adjacent to named fields `+0x54`/`+0x55`)
### Raw 0007 Gameplay Helper Batch (facing/direction)
@ -125,9 +171,25 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
}
```
#### Next RE target (to close remaining uncertainty)
#### Architectural Resolution: `unresolved_far_thunk_dispatch` / `0000:ffff`
- Recover the true callee behind `0000:ffff` for the `0007:224b` call site by relocation/import-table reconstruction or by matching this call path in a cleaner segment-mapped database. That should reveal exact per-slot use of the two dispatch tables and final coordinate math.
**`unresolved_far_thunk_dispatch` is NOT a real dispatcher.** It is the NE binary fixup placeholder.
- In a Phar Lap 286 NE executable, inter-segment and external far calls are stored in the binary as `CALLF 0x0000:ffff` (or similar invalid sentinel values).
- The Phar Lap NE loader patches each of these call sites to the real segment:offset at load time using the per-segment relocation records in the NE file.
- In Ghidra's raw import, those fixups are never applied. Every unresolved far call collapses to the same `0000:ffff` stub, where the decompiler produces garbled output (it's reading fixup-chain data, not real instructions).
- **Each `CALLF 0x0000:ffff` in the binary is a DIFFERENT call with a DIFFERENT actual target.** Identifying the target requires either parsing the NE relocation table or cross-matching with the resolved standalone segment extracts.
Address layout in the raw import (flat_address = `SSSS:OOOO` where flat = `SSSS * 0x10000 + OOOO`):
- `0000:`–`0003:` (flat < `0x40000`) = Phar Lap 286 DOS extender code (the outer MZ stub portion)
- `0006:E570` onwards = NE game segments (seg001+ at their Phar Lap-assigned linear addresses)
- Mapping rule verified: `runtime_flat = NE_segment_file_offset + 0x36F70` (the NE header offset in the EXE)
Decompiler comment added to `0000:ffff` in Ghidra documenting this.
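Parsing the per-segment relocation records could look roughly like the sketch below. It assumes the commonly documented NE layout (a 16-bit record count followed by 8-byte records, with non-additive fixup sites chained through the segment data and terminated by `0xFFFF`); whether this Phar Lap 286 image follows that layout exactly has not been verified here, and all names are hypothetical.
```python
import struct
from dataclasses import dataclass

TARGET_KINDS = ("internal", "import_ordinal", "import_name", "os_fixup")


@dataclass
class Relocation:
    source_type: int    # e.g. 3 = 32-bit far pointer in the documented NE layout
    target_kind: str
    additive: bool
    source_offset: int  # first fixup site in the segment
    target: tuple       # (segment, offset) or (module index, ordinal / name offset)


def parse_relocations(blob):
    """Parse a relocation block: u16 count, then count * 8-byte records."""
    (count,) = struct.unpack_from("<H", blob, 0)
    records = []
    for index in range(count):
        src_type, flags, src_off, t1, t2 = struct.unpack_from("<BBHHH", blob, 2 + index * 8)
        kind = TARGET_KINDS[flags & 0x03]
        # Internal references keep the segment number in the low byte of t1.
        target = (t1 & 0xFF, t2) if kind == "internal" else (t1, t2)
        records.append(Relocation(src_type, kind, bool(flags & 0x04), src_off, target))
    return records


def walk_chain(segment_data, first_offset):
    """Follow a non-additive chain: each site stores the next site's offset, 0xFFFF ends it."""
    sites = []
    offset = first_offset
    while offset != 0xFFFF and len(sites) < 0x10000:  # guard against cyclic chains
        sites.append(offset)
        (offset,) = struct.unpack_from("<H", segment_data, offset)
    return sites
```
Under this layout, the `0xFFFF` end-of-chain marker sits in the offset word of an unpatched far-call operand, which is consistent with the `CALLF 0x0000:ffff` pattern seen in the raw import.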
#### Next RE targets for `snap_entity_to_ground`
- The `0007:224b` thunk call is an intra-NE inter-segment call (calling into a different game segment with ground-aligned coordinate math). Identifying it requires the NE relocation table or matching the disassembly in the standalone extracts.
### Raw 0007 Gameplay Helper Follow-up: AI sweep + checked spawn path
@ -150,8 +212,11 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
- Added disassembly + decompiler comments capturing stable behavior:
- Reads player entity FAR pointer from global `0x2de4`.
- Copies player world position fields (`+0x40`, `+0x42`) into globals `0x27e7` / `0x27e9` (AI focus position cache used by downstream logic).
- Iterates entity IDs from `2` through `255` and dispatches per-entity processing through the shared thunk path.
- This function now has enough recovered semantics to treat it as the frame-level AI sweep dispatcher even though individual thunked callees remain unresolved in the raw import.
- Iterates entity IDs from `2` through `255` and dispatches per-entity processing through two sequential thunked calls per entity.
- New disassembly comments added at both dispatch call sites:
- `0007:101c`: `entity_slot_fetch(SS:&entity_id)` — first call; resolves entity slot/pointer from loop ID
- `0007:1093`: `entity_tick_dispatch(SS:&entity_id, g_0x27c8)` — second call; per-entity AI tick with global `0x27c8` mode/context word
- Global `0x27c8` is now confirmed as the current targeted/current entity handle: `entity_is_type_match` compares against it directly, and both spawn helpers `map_find_spawn_point` / `enemy_spawn_at_position` snapshot it before their thunked core paths.
### Raw 0007 Gameplay Logic: animation / range / command globals
@ -172,14 +237,18 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
- `g_speed_double_flag` (`0x27fd`) — doubles speed_factor to 2 when set (fast game mode).
- Local variables renamed: `speed_factor` (1 or 2) and `advance_steps` (0–4, number of frame advances this tick).
- Entity struct fields confirmed (relative to `entity_ptr` as `int*`):
- `[0x1b]` = frame_min (backward direction counter)
- `[0x1c]` = frame_max
- `[0x1d]` = current_frame
- `[0x1e]` = loop_flag
- `[0x1f]` = reverse_direction_flag
- `+0x3f` (as `char*`) = completion handle/sentinel (`-1` = none, `0x2802` = player entity)
- On frame overflow: if completion handle valid and not player-entity, fires thunked event; calls vtable `[+8]` method.
- Added decompiler comment at function entry explaining all fields and behavior.
- `[0x1b]` (byte `+0x36`) = frame_min (backward direction counter)
- `[0x1c]` (byte `+0x38`) = frame_max
- `[0x1d]` (byte `+0x3a`) = current_frame
- `[0x1e]` (byte `+0x3c`) = loop_flag (0 = animation disabled)
- `[0x1f]` (byte `+0x3e`) = reverse_direction_flag / double-speed flag
- `+0x3f` (word, byte-offset) = completion handle/sentinel (`-1` = none, `0x2802` = player entity)
- `+0x00` (far ptr) = vtable pointer
- New disassembly comments added at all three `CALLF 0x0000:ffff` sites and the vtable indirect call:
- `0007:27dc`: `entity_completion_callback(handle)` — fires when loop wraps; skips player handle
- `0007:27fd`: vtable indirect `entity->vtable[+8](entity, 0, 0)`, the `on_loop_complete` virtual method
- `0007:281e`: `notify_frame_progress(handle, current_frame)` — per-frame notification
- `0007:2851`: `entity_sprite_advance(entity_far_ptr, advance_amount, 0)` — core frame-advance call; advance_amount = `entity[+0x3c] * (steps+1) * speed_factor`
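The index-to-offset relationship and the advance formula can be checked with a short Python sketch. The helper names are hypothetical; the 2-byte element size follows from `entity_ptr` being read as a 16-bit `int*`.
```python
WORD_SIZE = 2  # entity_ptr is indexed as 16-bit ints, so each index spans two bytes


def word_index_to_byte_offset(index):
    """Map an int* index such as [0x1b] to its byte offset (+0x36)."""
    return index * WORD_SIZE


def advance_amount(field_3c, steps, speed_factor):
    """entity[+0x3c] * (steps + 1) * speed_factor, as annotated at 0007:2851."""
    return field_3c * (steps + 1) * speed_factor


for index in (0x1B, 0x1C, 0x1D, 0x1E, 0x1F):
    print(f"[{index:#x}] -> +{word_index_to_byte_offset(index):#x}")
```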
#### `entity_command_dispatch` (`0007:0990`) — partially decompiled
@ -191,10 +260,24 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
- Dispatches entity command through shared thunk; actual command table data not yet resolved.
- No incoming XREFs found in the raw import (likely called via table or vtable dispatch).
#### Enemy spawn helper cluster (`0007:505d`, `0007:5259`, `0007:5275`, `0007:5291`)
- Existing raw names align with prior standalone seg001 notes:
- `0007:505d` = `map_find_spawn_point` (`seg001 + 0x6aed`)
- `0007:5259` = `enemy_spawn_with_target` (`seg001 + 0x6ce9`)
- `0007:5275` = `enemy_spawn_no_target` (`seg001 + 0x6d05`)
- `0007:5291` = `enemy_spawn_at_position` (`seg001 + 0x6d21`)
- Current verified raw-import behavior:
- `enemy_spawn_with_target` is a thin wrapper over `enemy_spawn_at_position(..., target_player_flag = 1)`.
- `enemy_spawn_no_target` is the same wrapper but passes `target_player_flag = 0`.
- `map_find_spawn_point` and `enemy_spawn_at_position` both copy DS:`0x27c8` into locals before entering their unresolved thunk body, matching the standalone notes that treat `0x27c8` as the current targeted/current entity handle.
- Short decompiler comments were added in Ghidra on the raw spawn helpers to preserve this provenance.
#### Global map additions (renamed in Ghidra)
| Address | Name | Evidence |
|---------|------|---------|
| `0x27c8` | `g_current_entity_handle` | Compared directly by `entity_is_type_match`; also captured by `entity_ai_update_loop`, `map_find_spawn_point`, and `enemy_spawn_at_position` as the current targeted/current entity handle |
| `0x2de4` | `g_player_entity_farptr` | FAR ptr to player entity; `+0x40`/`+0x42` are world X/Y |
| `0x27e7` | `g_ai_focus_pos_x` | Set by `entity_ai_update_loop` from player entity `+0x40` |
| `0x27e9` | `g_ai_focus_pos_y` | Set by `entity_ai_update_loop` from player entity `+0x42` |
@ -219,12 +302,14 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
- `000e:35ef` = `record_table_next_slot`
- `000e:3639` = `record_table_parse_buffer`
- `000e:3798` = `record_parser_read_line`
- `000e:38a0` = `record_parser_seek_next_marker`
- `000e:38f8` = `record_parser_find_marker`
- `000e:39cc` = `record_parser_dispatch_at_directive`
- Current behavior read from raw-import decompilation/disassembly:
- `record_table_init` clears the table header and zeroes 300 words of inline storage.
- `record_table_parse_buffer` walks a CRLF-separated text buffer, captures each line, splits around a marker helper path, and stores parsed entry state into 0x0c-byte records.
- `record_parser_read_line` advances to the next CRLF-delimited line, rejects lines that start with `@` or with non-identifier punctuation, and terminates the line in-place with `0`.
- `record_parser_seek_next_marker` updates the parser's current marker cursor at `+0x18/+0x1a` by calling `record_parser_find_marker`; returns 1 if another marker was found, 0 at end-of-data.
- `record_parser_find_marker` scans forward until an `@` marker or end-of-data; optionally consumes the remaining length from the parser state.
- `record_parser_dispatch_at_directive` returns `0` unless the current substring begins with `@`; in the `@` case, it advances by 7 bytes and dispatches through a FAR thunk (`0000:ffff`).
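The marker scan described above can be modelled in a few lines of Python. This is a simplified sketch: it works on a flat `bytes` buffer with an integer cursor, whereas the real routines keep a far-pointer cursor at `+0x18/+0x1a` in the parser state, and the function names and return shapes here are illustrative only.
```python
MARKER = ord("@")


def find_marker(buf, pos):
    """Scan forward from pos until an '@' byte or end of data; return the index reached."""
    while pos < len(buf) and buf[pos] != MARKER:
        pos += 1
    return pos


def seek_next_marker(buf, pos):
    """Return (found, new_pos): found is 1 if another marker exists, 0 at end of data."""
    new_pos = find_marker(buf, pos)
    return (1 if new_pos < len(buf) else 0), new_pos
```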
@ -758,7 +843,30 @@ A scroll/camera management cluster found in the `0007:bxxx–0007:dxxx` range.
| Address | Name | Evidence |
|---------|------|---------|
| `0007:5b6f` | `entity_set_at_target_update_facing` | Sets entity `+0x3a = 1` (arrived flag); calls `entity_set_facing_direction`; clears bit `0x10` from entity type table `0x7e1e[type*0x79+0x59]`; tail-calls thunk to advance state. Called in the entity state machine context. |
| `0007:5b6f` | internal block only *(no function after repair)* | Direct raw-analysis behavior remains useful as a local label: this block sets entity `+0x3a = 1` (arrived flag), calls `entity_set_facing_direction`, clears bit `0x10` from entity type table `0x7e1e[type*0x79+0x59]`, then tail-calls onward. After the PyGhidra boundary repair, `0007:5b6f` is no longer a function entry and should be treated only as an internal control-flow label inside the first repaired seg043 routine. |
### seg043 Standalone Boundary Recovery
- Direct disassembly of `NE_segments/seg043_code_off_75A00_len_336F.bin` shows the first non-zero bytes at offset `0x0090`; offsets `0x0000..0x008f` are all zero in the standalone extract.
- The first three clean 16-bit prologues in seg043 are at:
- `seg043:0090` -> raw `0007:5a90`
- `seg043:017a` -> raw `0007:5b7a`
- `seg043:021c` -> raw `0007:5c1c`
- The first recovered standalone function spans `0x0090..0x0179`, which means raw `0007:5b6f` falls inside the tail of that routine and overlaps the true return at raw `0007:5b79`.
- Repair status: applied in `CRUSADER-RAW.EXE` via the local PyGhidra toolkit. The bad function object at `0007:5b6f` was removed, and three conservative replacement functions were created:
- `0007:5a90` = `seg043_func_0090` with body `0007:5a90..0007:5b79`
- `0007:5b7a` = `entity_set_at_target_update_facing` with body `0007:5b7a..0007:5c1b`
- `0007:5c1c` = `seg043_func_021c` with body `0007:5c1c..0007:5c80`
- Follow-up re-decompilation now supports one real behavioral rename: `0007:5b7a` sets entity `+0x3a` to 1, calls `entity_set_facing_direction`, clears class-detail bit `0x10` at `0x7e1e[type*0x79+0x59]`, then continues into downstream dispatch, so the repaired middle function has been renamed `entity_set_at_target_update_facing`.
- `0007:5a90` now has a stronger structural read from standalone disassembly: it allocates an object when the incoming far pointer is null (literal `0x98`), runs a far setup helper using DS:`0x4b48..0x4b4e` and the second incoming far pointer, writes `0x4c13` at the object base, calls `entity_set_at_target_update_facing` with the third incoming far pointer, then adjusts the nested object at `+0x38` using extents read from the object at `+0x34` before returning the object pointer.
- `0007:5c1c` also has a stronger structural read: it optionally calls a virtual method through `[object->vtable + 0x4c]` when `object+0x44/+0x46` is non-null, passes a local stack word through `entity_class_get_flag20`, then dispatches one or two downstream far helpers using `object+0x48`, gated by a local status byte at `[bp-0xe]`.
- `0007:5a90` and `0007:5c1c` remain intentionally positional because their current decompiles still collapse into unresolved thunk dispatches and do not yet support safe behavioral names.
### Entity Class Flag Helper
| Address | Name | Evidence |
|---------|------|---------|
| `0006:02cc` | `entity_class_get_flag20` | Returns `((class_detail[type*0x79 + 0x59] & 0x20) >> 5)`. Conservative raw-analysis name; bit meaning still unknown, so the helper is named after the observed flag mask rather than a guessed behavior. |
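
The expression in the table can be written as a small Python sketch. The table is modelled as a flat byte buffer, and `CLASS_RECORD_SIZE` and `FLAG_BYTE_OFFSET` are illustrative names for the `0x79` stride and `0x59` offset.
```python
CLASS_RECORD_SIZE = 0x79  # stride of one class-detail record
FLAG_BYTE_OFFSET = 0x59   # flag byte within the record


def entity_class_get_flag20(class_detail, entity_type):
    """Return bit 0x20 of the class flag byte as 0 or 1."""
    flags = class_detail[entity_type * CLASS_RECORD_SIZE + FLAG_BYTE_OFFSET]
    return (flags & 0x20) >> 5
```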
### Animation Start Frame Helper
@ -1213,6 +1321,280 @@ Globals: `[0x63da]` = mouse button state, `[0x63d6]/[0x63d8]` = cursor X/Y, `[0x
| Address | Name | Evidence |
|---------|------|---------|
| `000c:dac1` | `cursor_nav_state_reset` | Zeros all directional/button flags; sets `[+0x32/+0x33]=0xff`, `[+0x47]=0xffff` |
## Top-40 Most-Called Far-Call Targets (NE Fixup Resolution)
Named via systematic analysis of 11,692 NE relocation fixup entries. These are the functions most frequently called through the `CALLF 0x0000:ffff` thunk mechanism.
### Tier 1: Top 20 (73+ callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 1 | `000a:44fd` | `seg091_func_00fd` | 331 | Recovered boundary. Shares init flag `0x44a4` with `runtime_init_or_abort`; thunk-heavy non-returning wrapper. |
| 2 | `0003:ac7e` | `mem_alloc` | 272 | Allocation wrapper → seg082:0000 (`0009:a200`) |
| 3 | `0008:dbec` | `entity_word_list_destroy` | 238 | Already named. Frees entity word-list buffer. |
| 4 | `0003:a751` | `mem_free` | 207 | Free wrapper → seg082:007a (`0009:a27a` = `mem_free_checked`) |
| 5 | `0008:bb4f` | `mem_alloc_far` | 174 | Thin wrapper → `mem_alloc` |
| 6 | `0003:a897` | `far_memcpy` | 165 | REP MOVSW + trailing MOVSB |
| 7 | `0005:088f` | `entity_get_type_word` | 130 | Returns type word from table 0x7df9 indexed by slot |
| 8 | `000b:358d` | `sprite_tree_accumulate_pos` | 122 | Recursively sums X/Y offsets (+0x21/+0x23) through linked child nodes (+0x19/+0x1b), copies 8-byte position block via far_memcpy |
| 9 | `0008:ce3d` | `entity_call_two_vtables` | 118 | Calls vtable[+4] at entity+0x1e and +0x28 |
| 10 | `0004:26cd` | `nop_void_stub` | 118 | Empty function, returns void |
| 11 | `0008:ce00` | `entity_call_two_vtables_base` | 117 | Calls vtable[0] at entity+0x1e and +0x28 |
| 12 | `0008:bb8c` | `entity_check_flag_0x4000` | 115 | Short-circuits if flag 0x4000 set at +0x16 |
| 13 | `0008:cda7` | `entity_free_both_word_lists` | 115 | Frees word lists at entity+0x1e and +0x28 if optional pointers at +0x24/+0x26 and +0x2e/+0x30 non-null. Both call `entity_word_list_free_existing`. |
| 14 | `0004:26d2` | `nop_void_stub_b` | 111 | Empty function, returns void |
| 15 | `000a:45fe` | `runtime_init_or_abort` | 108 | Reentrancy-guarded init. Flag at 0x44a4; flushes via FUN_000a_4a56, then calls `crt_exit_wrapper(1)`. Hidden code gap 0x4616-0x4643. |
| 16 | `0004:3324` | `nop_return_zero` | 95 | Returns 0 |
| 17 | `0009:c563` | `event_queue_push` | 82 | Circular buffer enqueue. Ring index (+0xe) masked 0x3f, slot masked 0xfff8. Writes event type word + data byte pair. |
| 18 | `0005:c448` | `list_remove_and_free` | 74 | Unlinks node from linked list via FUN_0005_c495, optionally calls `mem_free` if bit 0 of flags set |
| 19 | `000b:2e00` | *(no function in Ghidra)* | 74 | Analysis gap at seg109:0000. Needs manual function creation. |
| 20 | `0009:1f12` | `dos_file_lseek` | 73 | DOS LSEEK (INT 21h AH=42h) wrapper with error reporting to 0x867a |
### Tier 2: Ranks 21-40 (56-73 callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 21 | `0009:3600` | `rotating_buffer_advance` | 73 | Advances 5-slot circular counter at 0x3eb6, zeros pointer in table at 0x867c, dispatches via jump table |
| 22 | `0009:943a` | `entity_rect_compare_and_dispatch` | 68 | Compares bounding rectangles of two entities, dispatches based on flag bits 4/2/1 at +0x16 |
| 23 | `0009:1e61` | `dos_file_close` | 65 | DOS file close (INT 21h), error reporting, sets handle to -1 |
| 24 | `0005:e252` | *(unnamed — unclear)* | 65 | Copies 11 words from Phar Lap extender area (FUN_0000_12c6+5), then calls thunk. Interrupt/trampoline setup? |
| 25 | `0003:dbcc` | `crt_format_string` | 64 | MetaWare High C formatting wrapper. Calls FUN_0003_bb92 with runtime format dispatch table. |
| 26 | `0007:5a00` | *(no function in Ghidra)* | 64 | High-traffic raw target at `seg043:0000`. Earlier `debris_spawn` / seg001 mapping was rejected after checking relocation labels. Still needs manual function creation and direct analysis. |
| 27 | `000a:4742` | `assert_buffer_valid` | 63 | Validates handle: asserts param_2 == cookie at 0x45a6 and param_1 < limit at 0x87e0 |
| 28 | `0009:9216` | `entity_conditional_render_dispatch` | 63 | Checks entity flag bits 4 and 1 at +0x16, dispatches to vtable[+0xc] or thunk |
| 29 | `0008:cb2c` | `entity_flag20_clear_and_update_target` | 61 | *(already named)* Clears flag bit 0x20, writes target +0x12/+0x14, calls refresh |
| 30 | `0008:cb5c` | `entity_flag20_set_and_init_target` | 61 | *(already named)* Sets flag bit 0x20, inits target if zero, calls refresh |
| 31 | `0007:7306` | `entity_create_stack_object` | 58 | Allocates 0xCC bytes on stack, inits via `object_init_zero_fields` (0005:c400), calls thunk |
| 32 | `0007:8709` | `entity_mark_dirty_and_sync_tile_aux` | 58 | *(already named)* Syncs tile aux, sets flag bit 0x04 at +0x42 |
| 33 | `0007:87c5` | `entity_set_flag20_from_field42` | 58 | Reads entity+0x42/+0x44, calls `entity_flag20_set_and_init_target` with those values |
| 34 | `0007:8508` | `entity_table_lookup_and_dispatch` | 58 | *(already named)* Searches table at 0x2b46, dispatches via indirect jump |
| 35 | `0007:8920` | `entity_call_vtable_slot0c` | 58 | *(already named)* Calls vtable entry at +0x0c |
| 36 | `000a:b988` | `sprite_node_get_or_traverse` | 57 | If child pointer at +0x19/+0x1b non-null, traverses; otherwise returns leaf value |
| 37 | `0003:a98b` | `crt_signed_div32` | 56 | Entry: adjusts near→far stack, sets CX=0 (signed quotient), jumps to `crt_div32_impl` |
| 38 | `000a:7b44` | `nop_return_void_a` | 56 | Empty function (default vtable slot?) |
| 39 | `000a:7b49` | `nop_return_void_b` | 56 | Empty function (default vtable slot?) |
| 40 | `000a:7b53` | `nop_return_void_c` | 56 | Empty function (default vtable slot?) |
### Supporting Functions Discovered
| Address | Name | Description |
|---------|------|-------------|
| `000b:3a00` | `sprite_tree_sum_x_offset` | Recursive: sums field +0x21 through child chain +0x19/+0x1b |
| `000b:3a35` | `sprite_tree_sum_y_offset` | Recursive: sums field +0x23 through child chain +0x19/+0x1b |
| `0003:a845` | `crt_exit_wrapper` | Calls `crt_exit_impl(param,0,0)` |
| `0003:a7ee` | `crt_exit_impl` | Full C exit: atexit handlers, stdio flush, MetaWare runtime cleanup |
| `0003:a9a8` | `crt_div32_impl` | 32-bit division core. CX flags: bit0=unsigned, bit1=modulo, bit2=negate |
| `0005:c400` | `object_init_zero_fields` | Zeros fields +0x25, +0x29, +0x31, +0x32 of a struct. Returns pointer. |
| `000a:4440` | `joystick_read_axes_and_buttons` | Reads PC game port 0x201. Times axis responses, reads button nibble to 0x44a2 |
| `000b:3380` | `sprite_node_is_dirty` | Checks flags at obj+0x29 & 3 == 1 or 3 → returns bool |
| `000b:33a6` | `sprite_node_mark_dirty` | If not dirty, calls FUN_000b_3965 with mode=3 to invalidate |
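
The dirty test listed for `sprite_node_is_dirty` (low two flag bits equal to 1 or 3) reduces to a single bit check. The sketch below demonstrates that equivalence; it assumes the flags byte at `obj+0x29` has already been read into an integer, and the Python function is a hypothetical model rather than the decompiled code.

```python
def sprite_node_is_dirty(flags: int) -> bool:
    """Model of the described test: (flags & 3) is 1 or 3."""
    return (flags & 3) in (1, 3)

# Both accepted values have bit 0 set and both rejected values (0, 2) do not,
# so the test is the same as checking bit 0 alone.
for flags in range(256):
    assert sprite_node_is_dirty(flags) == bool(flags & 1)
```

If that reading of the decompile is right, bit 1 does not influence the result.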
### Tier 3: Ranks 41-60 (42-56 callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 41 | `000a:7b58` | `nop_return_zero_b` | 56 | Returns 0 (default vtable slot) |
| 42 | `000b:3ab2` | `sprite_node_dispatch_event` | 56 | Large event dispatch: checks event type (2/4/8/0x100), updates global focus ptr at [0x4fd0:4fd2], dispatches via vtable methods [+0x14/+0x18/+0x20/+0x24] by event code. Switch table for 16 event types. |
| 43 | `000a:48ff` | `rng_next_modulo` | 55 | Advances seg091 RNG state and returns the result modulo the requested bound; returns 0 when bound is 0. |
| 44 | `000b:3362` | `sprite_tree_unwind_check` | 55 | Validates SS == param_2 (stack segment guard), then decrements global counter at [0x4fd6] |
| 45 | `000b:40ee` | `sprite_node_update_and_dispatch` | 55 | If `sprite_node_is_dirty` returns false: marks dirty, calcs accumulated bounds via `sprite_tree_get_accumulated_bounds` (3ed8), then dispatches via thunk |
| 46 | `000a:7b5f` | `vtable_stub_trampoline` | 55 | Calls through fixup thunk (forwarder to another function) |
| 47 | `000a:7b78` | `nop_return_void_e` | 55 | Empty function (default vtable slot) |
| 48 | `000a:7b7d` | `nop_return_void_f` | 55 | Empty function (default vtable slot) |
| 49 | `000a:7b4e` | `nop_return_void_d` | 54 | Empty function (default vtable slot) |
| 50 | `000b:330c` | `sprite_tree_dispatch_wrapper` | 52 | Pure thunk wrapper: calls through fixup |
| 51 | `0009:2034` | `dos_file_seek` | 51 | INT 21h AH=42h (LSEEK). Takes file object ptr, extracts handle at obj+4, seeks to offset param. Error reporting to [0x867a]. |
| 52 | `0005:0466` | `entity_resolve_slot_ptr` | 50 | *(already named)* |
| 53 | `0003:a880` | *(no function in Ghidra)* | 49 | Analysis gap in CRT segment |
| 54 | `0006:170c` | `tile_class_get_byte` | 47 | Looks up class data: indexes into table at [0x7e1e] by (*param_1 * 0x79), returns byte at offset +0xc |
| 55 | `000b:4097` | `sprite_dispatch_with_event` | 45 | Pushes event params + global [0x49c2:0x49c4], calls thunk |
| 56 | `0005:02c1` | `entity_is_type_match` | 43 | Compares *param_1 against global at [0x27c8], returns 1 if equal, 0 otherwise |
| 57 | `0003:ad75` | *(no function in Ghidra)* | 43 | Analysis gap in CRT segment |
| 58 | `000a:e709` | `render_dispatch_by_flag` | 43 | Dispatches between two thunk paths based on boolean flag at stack+0x10 |
| 59 | `0003:d0ff` | `crt_sprintf_wrapper` | 42 | Calls FUN_0003_bb92 (format engine) with rearranged params and string constant at 0x67ac |
| 60 | `000b:326e` | `sprite_node_destroy` | 42 | Destructor: sets vtable ptr to 0x501a, clears global [0x4fd0:4fd2] if self, releases child nodes, calls mem_free via thunk |
### Updated Analysis Gaps
`0007:5a00` / `0007:5b6f` reconciliation:
- The earlier standalone seg001 port hypothesis in this subrange was wrong.
- Relocation data places raw `0007:5a00` at `seg043:0000`, and the already-named helper at `0007:5b6f` sits at `seg043:016f`.
- Because of that segment placement, standalone seg001 names such as `debris_spawn` (`0x7490`) and `entity_die` (`0x75ff`) should NOT be ported into this raw range.
- `0007:5b6f` no longer exists as a function after the PyGhidra repair pass. Its old raw-analysis behavior now lines up with the repaired function `0007:5b7a = entity_set_at_target_update_facing`, so `0007:5b6f` should be treated only as an internal control-flow location inside that function.
- Additional resolved call targets inside the missing seg043 block were annotated in Ghidra from relocation data:
  - `0007:5a8a` -> `entity_set_event_type_checked`
  - `0007:5a98` -> `FUN_0008_cc01` (timer-related flag/event helper; tests `+0x16 & 0x2`, sets `+0x16 |= 0x800`, copies event field `+0x06` to `+0x22`, checks `0x1000`, then conditionally dispatches)
  - `0007:5b36` -> `entity_get_type_word`
  - `0007:5b44` -> `saveslot_read_entry_flags`
  - `0007:5bb8` -> `entity_is_type_match`
  - `0007:5c49` -> `entity_class_get_flag20`
  - `0007:5c8b` -> `mem_alloc_far`
- Current boundary state:
  - The seg043 split has now been repaired in Ghidra. Verified temporary functions exist at raw `0007:5a90`, `0007:5b7a`, and `0007:5c1c`.
  - The repaired middle function at `0007:5b7a` has now been promoted from a positional label to `entity_set_at_target_update_facing` based on direct decompile/disassembly behavior.
  - The remaining repaired functions at `0007:5a90` and `0007:5c1c` should keep their positional names until a later pass resolves the thunk-heavy bodies more clearly.
  - The next pass on this region should continue re-decompiling `seg043_func_0090` and `seg043_func_021c`, resolve the still-unknown far thunks they call, and replace the positional names only when their behavior is directly supported.
| Address | NE Segment | Callers | Notes |
|---------|-----------|---------|-------|
| `000a:44fd` | seg091:00fd | 331 | Recovered as `seg091_func_00fd`; thunk-heavy init wrapper sharing flag `0x44a4`. |
| `000b:2e00` | seg109:0000 | 74 | Start of segment 109. |
| `0007:5a00` | seg043:0000 | 64 | Start of segment 43. Earlier seg001 `debris_spawn` port was rejected; still needs manual function creation and direct analysis. |
| `000a:48ff` | seg091:04ff | 55 | Recovered as `rng_next_modulo`; bounded wrapper around seg091 RNG state advance. |
| `0003:a880` | seg005:0880 | 49 | In CRT segment near `far_memcpy`. |
| `0003:ad75` | seg005:0d75 | 43 | In CRT segment near `mem_alloc`. |
| `000a:454d` | seg091:014d | 32 | Recovered as `seg091_func_014d`; init/context helper using the `0x45a6` cookie/context global. |
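
The raw-address and NE-segment columns above are related by plain base subtraction, because the raw import maps file offsets directly onto Ghidra's `seg:off` form (`flat = seg << 16 | off`). A minimal sketch of that conversion follows. The helper names are hypothetical, and the seg005 and seg091 bases are not stated in the table but inferred from its rows (`0003:a880` = `seg005:0880`, `000a:44fd` = `seg091:00fd`).

```python
# Hypothetical helpers; segment bases are taken or inferred from the table rows.
SEG_BASES = {
    5: 0x3A000,    # inferred: 0003:a880 is seg005:0880
    43: 0x75A00,   # 0007:5a00 is seg043:0000
    91: 0xA4400,   # inferred: 000a:44fd is seg091:00fd
    109: 0xB2E00,  # 000b:2e00 is seg109:0000
}

def ghidra_to_flat(addr: str) -> int:
    """Convert a raw-import 'ssss:oooo' address to a flat file offset."""
    seg, off = addr.split(":")
    return (int(seg, 16) << 16) | int(off, 16)

def to_ne_relative(addr: str, seg_num: int) -> str:
    """Express a raw address as an offset inside the given NE segment."""
    off = ghidra_to_flat(addr) - SEG_BASES[seg_num]
    if off < 0:
        raise ValueError(f"{addr} lies before seg{seg_num:03d}")
    return f"seg{seg_num:03d}:{off:04x}"

print(to_ne_relative("0007:5b6f", 43))  # the helper location discussed above
print(to_ne_relative("000a:48ff", 91))  # rng_next_modulo
```

Recording both forms side by side, as the table does, makes it easy to spot a name that was ported from the wrong segment.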
### Tier 4: Ranks 61-80 (29-42 callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 61 | `000b:30a5` | `sprite_tree_forward_wrapper` | 42 | Pure thunk forwarder |
| 62 | `0008:bc27` | `entity_set_event_type_checked` | 41 | *(pre-existing name)* Sets event code at +0x06 with range/timer checks |
| 63 | `0008:d214` | `entity_dispatch_entry_ctor_vtbl_3aa6` | 40 | *(pre-existing name)* Constructor: alloc 0x40, vtbl 3AA6, flag 0x200 |
| 64 | `0005:1565` | `entity_action_by_type_dispatch` | 39 | Checks entity type against whitelist (0x432,0x5a0,0x1fd,0x1fe,0x8f,0x59f,0x2b3,0x2ca), dispatches by flags at [0xc76] and [0x85f] |
| 65 | `0008:4bba` | `channel_slot_enable` | 39 | Sets enable byte=1 in 5-slot table at 0x84ca (slot * 0xd stride) |
| 66 | `0009:6f5a` | `vga_palette_write` | 38 | Writes RGB triplets to VGA DAC (port 0x3C8/0x3C9). Range param_2..param_3 from palette data at *param_1 |
| 67 | `0009:8ef6` | `line_draw_dispatch` | 38 | Compares abs(dx) vs abs(dy) to determine major axis, dispatches to appropriate line draw routine |
| 68 | `000a:7b30` | `nop_return_void_g` | 38 | Empty function (default vtable slot) |
| 69 | `000a:7b3f` | `nop_return_void_h` | 38 | Empty function (default vtable slot) |
| 70 | `0009:6e7f` | `palette_free_if_set` | 35 | Frees existing palette data if ptr non-null, checks alignment |
| 71 | `000a:7b35` | `nop_return_void_i` | 35 | Empty function (default vtable slot) |
| 72 | `0009:c433` | `event_queue_align_index` | 34 | Returns `param_1 & 0xFFF8` — aligns ring index to 8-byte event slot boundary |
| 73 | `0009:2156` | `dos_file_get_size` | 33 | Saves file position, does INT 21h AH=42h AL=02 (seek to end), restores position. Returns file size in DX:AX |
| 74 | `000a:2c41` | `list_iterate_next` | 33 | Linked list iterator: if *out==0 returns first from obj+2; else follows next at ptr+2/+4. Returns bool (has more) |
| 75 | `000a:454d` | `seg091_func_014d` | 32 | Recovered boundary. Shares flag `0x44a4`; checks optional long argument against the `0x45a6` cookie/context global. |
| 76 | `000b:2446` | `sprite_clear_redraw_flag` | 31 | Clears flag at obj+0x17e, then dispatches via thunk |
| 77 | `0005:1238` | `entity_get_class_word` | 30 | Looks up table at [0x7e01] indexed by *param_1 * 2, returns word. Sister of `entity_get_type_word` (which uses [0x7df9]) |
| 78 | `000b:1446` | `display_null_check_dispatch` | 30 | Null-checks far ptr params, dispatches to different thunks based on result |
| 79 | `000d:85da` | `map_object_set_dirty_flag` | 29 | Sets byte at global_obj[0x6828]+0x40 = 1 if global non-null, then calls thunk |
| 80 | `0005:1511` | `entity_destroy_trampoline` | 29 | Pure thunk forwarder to entity destruction |
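
The save, seek-to-end, restore sequence described for `dos_file_get_size` is the usual way to measure a file when only a seek primitive is available. The sketch below is a rough Python analogue that uses `os.lseek` in place of INT 21h AH=42h. It illustrates the pattern only and is not a transcription of the game's routine.

```python
import os
import tempfile

def file_size_via_seek(fd: int) -> int:
    """Return the file size without disturbing the current file position."""
    saved = os.lseek(fd, 0, os.SEEK_CUR)  # remember where we are
    size = os.lseek(fd, 0, os.SEEK_END)   # offset 0 from end: result is the size
    os.lseek(fd, saved, os.SEEK_SET)      # put the position back
    return size

# Demo on a scratch file: 100 bytes long, position parked at offset 10.
with tempfile.TemporaryFile() as tmp:
    tmp.write(b"x" * 100)
    tmp.flush()
    fd = tmp.fileno()
    os.lseek(fd, 10, os.SEEK_SET)
    print(file_size_via_seek(fd), os.lseek(fd, 0, os.SEEK_CUR))
```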
---
## Deep Analysis: Coordinate Transform System
### `world_to_screen_coords` at `0004:e7bd` (NE seg018:07bd)
**Signature:**
```c
void world_to_screen_coords(int world_x, int world_y, int *screen_x, int *screen_y)
```
**Isometric Projection Math:**
```
screen_x = (world_x - world_y) / 2 - camera_x // SAR 1 (signed divide)
screen_y = (world_x + world_y) / 4 - camera_y // SHR 2 (unsigned divide)
```
Camera globals: `g_scroll_offset_x` (DS:0x2bb7), `g_scroll_offset_y` (DS:0x2bb9).
**Assembly detail:**
- `SAR AX, 1` for screen_x: signed arithmetic shift, which preserves the sign for negative (world_x - world_y) differences
- `SHR AX, 2` for screen_y: unsigned logical shift, which is valid because the sum world_x + world_y is expected to be non-negative
- The 2:1 ratio (÷2 for X, ÷4 for Y) produces the classic 2:1 isometric diamond tile shape
**Coordinate axes on screen:**
- World X axis → lower-right on screen (+0.5 screen_x, +0.25 screen_y per world unit)
- World Y axis → lower-left on screen (-0.5 screen_x, +0.25 screen_y per world unit)
- Camera subtraction converts absolute world-space to viewport-relative screen coordinates
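
The projection above can be modelled in a few lines of Python. This is a sketch under stated assumptions: 16-bit register width, `camera_x`/`camera_y` standing in for `g_scroll_offset_x`/`g_scroll_offset_y`, and a tuple return in place of the two output pointers. The helper names are hypothetical.

```python
def to_int16(value: int) -> int:
    """Wrap to a signed 16-bit value, as a 16-bit register would."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def world_to_screen(world_x: int, world_y: int, camera_x: int = 0, camera_y: int = 0):
    # SAR AX,1: arithmetic shift. Python's >> on a negative int is also arithmetic.
    half_diff = to_int16(world_x - world_y) >> 1
    # SHR AX,2: logical shift, modelled on the unsigned 16-bit sum.
    quarter_sum = ((world_x + world_y) & 0xFFFF) >> 2
    return to_int16(half_diff - camera_x), to_int16(quarter_sum - camera_y)

print(world_to_screen(100, 20))  # moving along world X: right and down
print(world_to_screen(20, 100))  # moving along world Y: left and down
print(world_to_screen(1, 2))     # odd negative difference
```

Note that an arithmetic shift rounds toward negative infinity, so a difference of -1 yields -1 rather than the 0 that truncating division would give.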
**Callers (17 across 8 NE segments):**
| Call site | NE Segment | Context |
|-----------|-----------|---------|
| `0004:7d6f` | seg012 | Map/tile rendering |
| `0005:0305` | seg021 | Entity system |
| `0005:432f` | seg021 | Entity placement |
| `0005:4457` | seg021 | Entity placement |
| `0005:6f8f` | seg022 | Entity rendering |
| `0005:7263` | seg022 | Entity rendering |
| `0007:2262` | seg040 | `snap_entity_to_ground` — ground alignment |
| `0007:237d` | seg040 | Ground snap dispatch |
| `0007:cf4e` | seg049 | Entity positioning |
| `0007:d039` | seg049 | Entity positioning |
| `0007:d43f` | seg049 | Entity positioning |
| `0007:d6fe` | seg049 | Entity positioning |
| `0008:3223` | seg053 | Entity-to-screen render setup |
| `0008:32e7` | seg053 | Entity-to-screen render setup |
| `0008:334b` | seg053 | Entity-to-screen render setup |
| `000b:858b` | seg115 | Sprite system |
| `000b:f100` | seg120 | Sprite system |
**Entity struct layout (from seg053 caller at `0008:31f6`):**
```
entity_array_base = far ptr at [DS:0x2cff]
entity_struct_size = 19 bytes (0x13)
entity.world_x = offset +0x0a (word)
entity.world_y = offset +0x0c (word)
```
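
As an illustration of the layout above, the following sketch reads `world_x` and `world_y` for one entity from a byte buffer that stands in for the array behind the far pointer at `[DS:0x2cff]`. Treating both fields as unsigned little-endian words is an assumption, and the function name is hypothetical.

```python
import struct

ENTITY_STRIDE = 0x13  # 19 bytes per entity record
OFF_WORLD_X = 0x0A    # world_x word; world_y follows immediately at +0x0c

def entity_world_pos(entity_array, index: int):
    """Return (world_x, world_y) for the entity at the given array index."""
    base = index * ENTITY_STRIDE
    # Two consecutive little-endian words at +0x0a and +0x0c.
    return struct.unpack_from("<HH", entity_array, base + OFF_WORLD_X)

# Demo: a two-entity buffer with entity 1 placed at (0x1234, 0x0456).
buf = bytearray(ENTITY_STRIDE * 2)
struct.pack_into("<HH", buf, ENTITY_STRIDE + OFF_WORLD_X, 0x1234, 0x0456)
print([hex(v) for v in entity_world_pos(buf, 1)])
```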
### Comparison: Two Coordinate Transform Functions
| Property | `world_to_screen_coords` (0004:e7bd) | `world_to_screen_isometric` (0007:be67) |
|----------|---------------------------------------|----------------------------------------|
| Input type | Fine-grained world units (entity positions) | Coarse tile-grid units (map rendering) |
| screen_x | `(wx - wy) / 2 - cam_x` | `(wx + sx) + (wy + sy) * 2` |
| screen_y | `(wx + wy) / 4 - cam_y` | `(wy + sy) * 2 - (wx + sx)` |
| Camera handling | Subtracted after transform | Added before transform |
| Operations | Division (SAR/SHR) | Multiplication (SHL) |
| Aspect ratio | 2:1 (from /2 : /4) | 2:1 (from 1 : 2 multipliers) |
Both functions implement the same 2:1 isometric projection but at different coordinate scales. `world_to_screen_coords` divides down from fine world units while `world_to_screen_isometric` multiplies up from coarse tile units.
### Adjacent Function: `map_position_equal` at `0004:e784`
Compares two 5-byte `map_position` structs: `{ x:word, y:word, layer:byte }`. Returns 1 (AL) if all three fields match, 0 otherwise. Located immediately before `world_to_screen_coords` in seg018.
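
A minimal sketch of that comparison, assuming the fields are packed little-endian in the listed order with no padding (the `MAP_POSITION` name and the Python function are illustrative only):

```python
import struct

# x:word, y:word, layer:byte, packed without padding (5 bytes).
MAP_POSITION = struct.Struct("<HHB")

def map_position_equal(a: bytes, b: bytes) -> int:
    """Return 1 if x, y and layer all match, else 0 (mirrors the AL result)."""
    return 1 if MAP_POSITION.unpack(a) == MAP_POSITION.unpack(b) else 0

same_tile = MAP_POSITION.pack(10, 20, 1)
other_layer = MAP_POSITION.pack(10, 20, 2)
print(MAP_POSITION.size, map_position_equal(same_tile, same_tile),
      map_position_equal(same_tile, other_layer))
```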
---
### Tier 5: Ranks 81-100 (25-29 callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 81 | `0009:1c00` | `dos_file_handle_init` | 29 | Inits 6-byte file handle struct: dword=0, word+4=0xFFFF (invalid). Aborts on null ptr |
| 82 | `0008:75f3` | `entity_get_ptr` | 29 | *(pre-existing)* Looks up entity far ptr from table at DS:0x39b0, indexed by id*4 |
| 83 | `0006:0208` | `entity_class_get_flag4` | 29 | Returns bit 2 of classinfo byte at [0x7e1e]+*p1*0x79+0x13 → 0 or 1 |
| 84 | `000a:30d7` | `list_node_set_if_context` | 29 | Sets node fields +2/+4 if params match context globals at 0x45a6/0x45a8 |
| 85 | `0009:c45f` | `object_init_and_get_next` | 29 | Calls `object_init_zero_fields` then returns *(result+2) — init+accessor combo |
| 86 | `0004:d7a0` | `object_deref_get_word4` | 28 | Dereferences far ptr chain: returns word at *(*(param_1)+4) |
| 87 | `000a:5276` | `debug_check_flag_45aa` | 28 | If byte at DS:0x45aa non-zero, calls thunk (diagnostic/assert check) |
| 88 | `0003:d94f` | `far_memset` | 28 | Wrapper reordering params for CRT memset impl at 0003:d92b (odd-aligned, word-fill loop) |
| 89 | `000a:7b3a` | `nop_return_void_j` | 28 | Empty function (default vtable slot) |
| 90 | `0008:ca18` | `entity_pair_sync_b` | 27 | *(pre-existing)* Pairwise sync wrapper direction B |
| 91 | `0008:bd20` | `entity_sprite_set_target_pos` | 27 | *(pre-existing)* Sets flag 0x1000, copies player pos to entity +0x0a/+0x0c |
| 92 | `0009:3ceb` | `buffer_release_and_dispatch` | 27 | Frees far ptr at obj+0x3b if set, nulls it; conditionally dispatches on bit 0 |
| 93 | `0005:09b4` | `entity_get_flags_byte` | 27 | Reads byte from [0x7dfd]+id, conditionally extends with classinfo byte at [0x7e1e]+id*0x79+0xf |
| 94 | `0005:0fbb` | `entity_lookup_sprite_word` | 27 | Returns word from [0x7e05]+*p1*2 — sprite/visual index table |
| 95 | `0008:d27e` | `entity_dispatch_trampoline_b` | 26 | Pure forwarder thunk (CALLF thunk only) |
| 96 | `0005:0376` | `entity_resolve_base_type` | 26 | Walks entity class hierarchy (bit 8 in [0x7e01]) via [0x7ded], returns base type from [0x7df1] |
| 97 | `000b:2492` | `sprite_redraw_if_needed` | 26 | If redraw flag at +0x17e is clear, calls update routine + thunk |
| 98 | `0003:e4d3` | `dos_file_open_wrapper` | 26 | Zeros output byte, delegates to file open impl at 0003:bb92 |
| 99 | `0005:033e` | `entity_resolve_base_parent` | 25 | Same hierarchy walk as `entity_resolve_base_type` but returns parent from [0x7ded] |
| 100 | `000a:87fd` | `render_clip_rect_to_viewport` | 25 | Clips 4 rect params to viewport bounds at [0x4014], sets dirty flag at 0x8a16, increments draw counter at 0x4716 |
**Entity Table Pointers (DS-relative, discovered in tier 5):**
| DS Offset | Type | Stride | Purpose |
|-----------|------|--------|---------|
| `0x7dfd` | byte[] | 1 | Entity flags byte (entity_get_flags_byte) |
| `0x7e01` | word[] | 2 | Entity class flags (bit 8 = has parent in hierarchy) |
| `0x7e05` | word[] | 2 | Entity sprite/visual index |
| `0x7ded` | word[] | 2 | Entity parent/hierarchy index |
| `0x7df1` | word[] | 2 | Entity base type word |
| `0x7e1e` | struct[] | 0x79 | Entity class detail records (121 bytes per class) |
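
Several helpers in the tier tables (`entity_class_get_flag4`, `tile_class_get_byte`) share the same addressing into the class detail records: index times `0x79`, plus a field offset. The sketch below models that with a plain byte buffer standing in for the memory the pointer at `DS:0x7e1e` refers to. The Python function shapes and the `type_id` parameter name are assumptions.

```python
CLASSINFO_STRIDE = 0x79  # 121 bytes per class detail record

def classinfo_byte(class_table, type_id: int, field_off: int) -> int:
    """Read one byte from the record for type_id at the given field offset."""
    return class_table[type_id * CLASSINFO_STRIDE + field_off]

def entity_class_get_flag4(class_table, type_id: int) -> int:
    """Bit 2 of the byte at record offset +0x13, returned as 0 or 1."""
    return (classinfo_byte(class_table, type_id, 0x13) >> 2) & 1

def tile_class_get_byte(class_table, type_id: int) -> int:
    """Byte at record offset +0x0c."""
    return classinfo_byte(class_table, type_id, 0x0C)

# Demo table with three records.
table = bytearray(CLASSINFO_STRIDE * 3)
table[2 * CLASSINFO_STRIDE + 0x13] = 0x04
table[1 * CLASSINFO_STRIDE + 0x0C] = 0xAB
print(entity_class_get_flag4(table, 2), hex(tile_class_get_byte(table, 1)))
```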
### Recent Manual Boundary Repairs
Recent high-traffic addresses recovered with manual function creation in Ghidra/PyGhidra:
| Address | NE Segment | Callers | Notes |
|---------|-----------|---------|-------|
| `000a:48ff` | seg091:04ff | 55 | Recovered as `rng_next_modulo`; manual boundary repair narrowed to `000a:48ff-000a:4912`. |
| `000b:2e00` | seg109:0000 | 74 | Start of segment 109. |
| `0007:5a00` | seg043:0000 | 64 | Start of segment 43. Earlier seg001 `debris_spawn` port was rejected; still needs manual function creation and direct analysis. |
| `0009:a200` | seg082:0000 | - | Target of `mem_alloc`. Start of segment 82. |

| Address | Name | Description |
|---------|------|-------------|
| `000c:db68` | `cursor_nav_update_and_dispatch` | Calls `cursor_zone_quadrant_classify`; updates `[+0x37..+0x3a]`; reads `[0x63da]`; switch on direction (08); maps scancodes 0x48/0x50/0x4b/0x4d/0x39 |
| `000c:d3e9` | `cursor_set_ref_and_dispatch` | Null-checks param; sets `*param_1 = &DAT_0000_638e`; calls dispatch |
| `000c:d710` | `cursor_set_ref2_and_dispatch` | Same pattern; sets `*param_1 = &DAT_0000_6346` |

41
disasm_helper.py Normal file

@ -0,0 +1,41 @@
import struct, os, sys
BIN_PATH = r'k:\ghidra\Crusader_Decomp\NE_segments\seg001_code_off_37600_len_8400.bin'
TARGET = 0x265B
# Read a window starting 0x200 bytes before the target offset.
with open(BIN_PATH, 'rb') as f:
    f.seek(TARGET - 0x200)
    data = f.read(0x280)
try:
    import capstone
    md = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_16)
    for ins in md.disasm(data, TARGET - 0x200):
        print(' 0x%04x: %s %s' % (ins.address, ins.mnemonic, ins.op_str))
        if ins.address > TARGET + 0x40:
            break
except ImportError:
    print('capstone not available, trying ndisasm...')
    import subprocess, tempfile
    tmp = os.path.join(os.environ.get('TEMP', '.'), 'seg001_chunk.bin')
    with open(tmp, 'wb') as f2:
        f2.write(data)
    result = subprocess.run(
        ['ndisasm', '-b', '16', '-o', '0x%x' % (TARGET - 0x200), tmp],
        capture_output=True, text=True, timeout=15
    )
    if result.returncode == 0:
        for line in result.stdout.split('\n'):
            try:
                addr = int(line.split()[0], 16)
                if TARGET - 0x200 <= addr <= TARGET + 0x40:
                    print(line)
            except (ValueError, IndexError):
                # Blank lines and non-address lines are skipped.
                pass
    else:
        print('ndisasm failed:', result.stderr)
        # Fallback: hex dump
        offset = TARGET - 0x200
        for i in range(0, len(data), 16):
            hexb = ' '.join('%02x' % b for b in data[i:i+16])
            print('0x%04x: %s' % (offset+i, hexb))

11
get_tier4.py Normal file

@ -0,0 +1,11 @@
from collections import Counter
c = Counter()
with open('ne_reloc_far_calls.tsv') as f:
    next(f)  # skip the header row
    for line in f:
        parts = line.strip().split('\t')
        tgt = parts[1]
        c[tgt] += 1
# Ranks 61-80 (zero-based indices 60..79) of the most-called targets.
for i, (addr, cnt) in enumerate(c.most_common(100)):
    if i >= 60 and i < 80:
        print(f'{i+1:3d} {addr} {cnt}')

11
get_tier5.py Normal file

@ -0,0 +1,11 @@
from collections import Counter
c = Counter()
with open('ne_reloc_far_calls.tsv') as f:
    next(f)  # skip the header row
    for line in f:
        parts = line.strip().split('\t')
        tgt = parts[1]
        c[tgt] += 1
# Ranks 81-100 (zero-based indices 80..99) of the most-called targets.
for i, (addr, cnt) in enumerate(c.most_common(120)):
    if i >= 80 and i < 100:
        print(f'{i+1:3d} {addr} {cnt}')

11693
ne_reloc_far_calls.tsv Normal file

File diff suppressed because it is too large.

120
ne_reloc_far_imports.tsv Normal file
View file

@ -0,0 +1,120 @@
source_ghidra target source_seg source_off_in_seg
0003:761e PHAPI.DOSCREATEDSALIAS seg001 0x001e
0003:76b1 DOSCALLS.38 seg001 0x00b1
0003:76be DOSCALLS.38 seg001 0x00be
0003:7795 DOSCALLS.89 seg001 0x0195
0003:77ab DOSCALLS.89 seg001 0x01ab
0003:f46e DOSCALLS.39 seg001 0x7e6e
0003:f51d DOSCALLS.40 seg001 0x7f1d
0003:f539 DOSCALLS.41 seg001 0x7f39
0003:f561 DOSCALLS.40 seg001 0x7f61
0003:f59c DOSCALLS.42 seg001 0x7f9c
0003:f6c9 DOSCALLS.42 seg001 0x80c9
0003:f851 PHAPI.DOSMAPREALSEG seg001 0x8251
0003:f88d DOSCALLS.39 seg001 0x828d
0003:f896 DOSCALLS.39 seg001 0x8296
0003:f8b3 PHAPI.DOSMAPREALSEG seg001 0x82b3
0003:f943 DOSCALLS.127 seg001 0x8343
0004:17c6 ASYLUM.36 seg004 0x0dc6
0004:17dc ASYLUM.28 seg004 0x0ddc
0004:19cf ASYLUM.45 seg004 0x0fcf
0004:25a5 ASYLUM.24 seg005 0x07a5
0004:6f26 ASYLUM.36 seg011 0x0126
0004:6f2e ASYLUM.28 seg011 0x012e
0004:6f4d ASYLUM.37 seg011 0x014d
0004:6f57 ASYLUM.29 seg011 0x0157
0004:70a2 ASYLUM.37 seg011 0x02a2
0004:70ad ASYLUM.29 seg011 0x02ad
0004:7136 ASYLUM.36 seg011 0x0336
0004:713e ASYLUM.28 seg011 0x033e
0004:715d ASYLUM.37 seg011 0x035d
0004:7167 ASYLUM.29 seg011 0x0367
0004:72af ASYLUM.37 seg011 0x04af
0004:72ba ASYLUM.29 seg011 0x04ba
0006:eba2 ASYLUM.36 seg039 0x09a2
0006:ebb5 ASYLUM.37 seg039 0x09b5
0006:ebc0 ASYLUM.36 seg039 0x09c0
0006:ebd3 ASYLUM.37 seg039 0x09d3
0008:67ee PHAPI._DosRealFarCall seg058 0x03ee
0008:6a7f PHAPI.DOSALLOCREALSEG seg059 0x007f
0008:6aad PHAPI.DOSALLOCREALSEG seg059 0x00ad
0008:6ae8 PHAPI._DosRealIntr seg059 0x00e8
0008:6b2e PHAPI.DOSMAPREALSEG seg059 0x012e
0008:9797 PHAPI.BORISREALINTR seg059 0x2d97
0008:97ac PHAPI.BORISREALINTR seg059 0x2dac
0008:a06b PHAPI._DosRealFarCall seg059 0x366b
0008:ebb2 ASYLUM.34 seg064 0x01b2
0008:ebba ASYLUM.33 seg064 0x01ba
0008:ebff ASYLUM.31 seg064 0x01ff
0008:ec18 ASYLUM.30 seg064 0x0218
0008:ec3c ASYLUM.32 seg064 0x023c
0008:f208 PHAPI.DOSMAPLINSEG seg065 0x0208
0008:f233 PHAPI.DOSMAPLINSEG seg065 0x0233
0008:f2bf PHAPI.DOSMAPLINSEG seg065 0x02bf
0009:080f DOSCALLS.7 seg068 0x000f
0009:0867 PHAPI.DOSALLOCREALSEG seg068 0x0067
0009:0899 PHAPI.DOSALLOCREALSEG seg068 0x0099
0009:08eb PHAPI.DOSALLOCREALSEG seg068 0x00eb
0009:0bc2 DOSCALLS.39 seg068 0x03c2
0009:0bd4 DOSCALLS.7 seg068 0x03d4
0009:0d7a DOSCALLS.39 seg068 0x057a
0009:0d8c DOSCALLS.39 seg068 0x058c
0009:0df3 PHAPI.DOSSETPASSTOPROTVEC seg068 0x05f3
0009:0ea6 PHAPI.DOSSETREALPROTVEC seg068 0x06a6
0009:0f4f PHAPI.DOSSETPROTVEC seg068 0x074f
0009:b363 PHAPI.DOSALLOCREALSEG seg082 0x1163
0009:b389 PHAPI.DOSALLOCREALSEG seg082 0x1189
0009:b40b PHAPI.DOSALLOCLINMEM seg082 0x120b
0009:b47a PHAPI.DOSALLOCLINMEM seg082 0x127a
0009:b491 PHAPI.DOSFREELINMEM seg082 0x1291
0009:b4f6 PHAPI.DOSFREELINMEM seg082 0x12f6
0009:b577 PHAPI.DOSALLOCLINMEM seg082 0x1377
0009:b598 PHAPI.DOSALLOCLINMEM seg082 0x1398
0009:b662 PHAPI.DOSALLOCLINMEM seg082 0x1462
0009:b748 PHAPI.DOSALLOCLINMEM seg082 0x1548
0009:b7b3 PHAPI.DOSALLOCLINMEM seg082 0x15b3
0009:b7d1 PHAPI.DOSFREELINMEM seg082 0x15d1
0009:ba35 DOSCALLS.39 seg082 0x1835
0009:ba50 DOSCALLS.39 seg082 0x1850
0009:ba97 PHAPI.DOSFREELINMEM seg082 0x1897
0009:bb5f PHAPI.DOSGETBIOSSEG seg082 0x195f
0009:bb71 PHAPI.DOSMAPREALSEG seg082 0x1971
0009:bb96 PHAPI.DOSMAPREALSEG seg082 0x1996
0009:bbdc PHAPI.DOSMAPLINSEG seg082 0x19dc
0009:bc32 PHAPI.DOSMAPLINSEG seg082 0x1a32
0009:bc57 PHAPI.DOSMAPLINSEG seg082 0x1a57
0009:bcb1 DOSCALLS.7 seg082 0x1ab1
0009:bdee DOSCALLS.7 seg082 0x1bee
0009:c542 PHAPI.DOSMAPLINSEG seg083 0x0142
000a:5746 ASYLUM.56 seg093 0x0146
000a:57de ASYLUM.58 seg093 0x01de
000a:57ea ASYLUM.37 seg093 0x01ea
000a:57f4 ASYLUM.29 seg093 0x01f4
000a:5801 ASYLUM.49 seg093 0x0201
000a:5810 ASYLUM.47 seg093 0x0210
000a:5817 ASYLUM.46 seg093 0x0217
000a:583e ASYLUM.57 seg093 0x023e
000a:5ed0 ASYLUM.25 seg094 0x00d0
000a:5fde ASYLUM.27 seg094 0x01de
000a:6022 ASYLUM.27 seg094 0x0222
000a:60cd ASYLUM.27 seg094 0x02cd
000a:6113 ASYLUM.25 seg094 0x0313
000a:61fe ASYLUM.25 seg094 0x03fe
000a:62f6 ASYLUM.25 seg094 0x04f6
000a:636f ASYLUM.23 seg094 0x056f
000c:11fd ASYLUM.28 seg122 0x0ffd
000c:120e ASYLUM.36 seg122 0x100e
000c:1521 ASYLUM.45 seg122 0x1321
000c:158d ASYLUM.45 seg122 0x138d
000c:25c1 ASYLUM.47 seg122 0x23c1
000c:25c8 ASYLUM.46 seg122 0x23c8
000c:2621 ASYLUM.29 seg122 0x2421
000c:2671 ASYLUM.29 seg122 0x2471
000c:26b8 ASYLUM.37 seg122 0x24b8
000c:2708 ASYLUM.37 seg122 0x2508
000d:9b3a ASYLUM.25 seg138 0x093a
000d:b1cc ASYLUM.27 seg138 0x1fcc
000e:090c ASYLUM.18 seg142 0x210c
000e:0960 ASYLUM.27 seg142 0x2160
000e:2592 ASYLUM.25 seg142 0x3d92
000e:259c ASYLUM.19 seg142 0x3d9c

132226
ne_reloc_fixups.json Normal file

File diff suppressed because it is too large.

379
ne_reloc_parser.py Normal file
View file

@ -0,0 +1,379 @@
#!/usr/bin/env python3
"""
NE Relocation Table Parser for Crusader: No Remorse
====================================================
Reads the NE header + per-segment relocation entries from CRUSADER.EXE.
Resolves each CALLF 0x0000:FFFF fixup to its real inter-segment target.
Emits a mapping file suitable for Ghidra annotation.
NE binary: CRUSADER.EXE (bound MZ+NE, NE header at 0x36F70)
Raw import: Ghidra loads the whole file as flat RAM.
Ghidra flat address = file_offset (since it's a raw binary import)
Ghidra seg:off = (flat >> 16) : (flat & 0xFFFF)
"""
import struct, sys, os, json
from collections import defaultdict
EXE_PATH = r'k:\ghidra\Crusader_Decomp\CRUSADER.EXE'
NE_HEADER_OFFSET = 0x36F70 # e_lfanew from MZ header
# ── NE relocation entry address-type codes ──
ADDR_LOBYTE = 0
ADDR_SELECTOR = 2
ADDR_FARPTR = 3 # 16:16 far pointer ← this is CALLF target
ADDR_OFFSET = 5
ADDR_48PTR = 11
ADDR_OFFSET32 = 13
# ── NE relocation entry relocation-type codes ──
REL_INTERNAL = 0 # intra-module (segment:offset)
REL_IMPORTORD = 1 # imported by ordinal
REL_IMPORTNAM = 2 # imported by name
REL_OSFIXUP = 3 # OS fixup
ADDR_TYPE_NAMES = {
0: 'lobyte', 2: 'selector', 3: 'far_ptr_16:16',
5: 'offset16', 11: 'ptr_48', 13: 'offset32'
}
REL_TYPE_NAMES = {
0: 'internal', 1: 'import_ordinal', 2: 'import_name', 3: 'osfixup'
}
def read_u8(data, off):
return data[off]
def read_u16(data, off):
return struct.unpack_from('<H', data, off)[0]
def read_u32(data, off):
return struct.unpack_from('<I', data, off)[0]
def parse_ne_header(data, ne_off):
    """Parse key fields from the NE header (offsets are relative to the 'NE' signature)."""
    magic = data[ne_off:ne_off+2]
    assert magic == b'NE', f"Bad NE magic at 0x{ne_off:X}: {magic}"
    hdr = {}
    hdr['linker_ver'] = read_u8(data, ne_off + 0x02)
    hdr['linker_rev'] = read_u8(data, ne_off + 0x03)
    hdr['entry_table_off'] = read_u16(data, ne_off + 0x04) + ne_off
    hdr['entry_table_len'] = read_u16(data, ne_off + 0x06)
    hdr['flags'] = read_u16(data, ne_off + 0x0C)
    hdr['auto_data_seg'] = read_u16(data, ne_off + 0x0E)
    hdr['num_segments'] = read_u16(data, ne_off + 0x1C)
    hdr['num_module_refs'] = read_u16(data, ne_off + 0x1E)
    # Table offsets below are stored relative to the NE header
    hdr['seg_table_off'] = read_u16(data, ne_off + 0x22) + ne_off
    hdr['resource_table_off'] = read_u16(data, ne_off + 0x24) + ne_off
    hdr['resident_name_off'] = read_u16(data, ne_off + 0x26) + ne_off
    hdr['module_ref_off'] = read_u16(data, ne_off + 0x28) + ne_off
    hdr['imported_name_off'] = read_u16(data, ne_off + 0x2A) + ne_off
    # The non-resident name table offset is relative to the start of the file
    hdr['nonresident_name_off'] = read_u32(data, ne_off + 0x2C)
    hdr['moveable_entries'] = read_u16(data, ne_off + 0x30)
    hdr['alignment_shift'] = read_u16(data, ne_off + 0x32)
    hdr['num_resource_segs'] = read_u16(data, ne_off + 0x34)
    hdr['target_os'] = read_u8(data, ne_off + 0x36)
    return hdr
def parse_segment_table(data, hdr):
"""Parse the NE segment table entries (8 bytes each)."""
segments = []
off = hdr['seg_table_off']
shift = hdr['alignment_shift']
for i in range(hdr['num_segments']):
sector_off = read_u16(data, off)
seg_len = read_u16(data, off + 2)
seg_flags = read_u16(data, off + 4)
min_alloc = read_u16(data, off + 6)
file_offset = sector_off << shift if sector_off != 0 else 0
has_reloc = bool(seg_flags & 0x0100)
# Fix zero length = 64K
if seg_len == 0 and sector_off != 0:
seg_len = 0x10000
segments.append({
'index': i + 1, # 1-based segment number
'file_offset': file_offset,
'length': seg_len,
'flags': seg_flags,
'min_alloc': min_alloc,
'has_reloc': has_reloc,
})
off += 8
return segments
def parse_module_refs(data, hdr):
"""Parse the module reference table → imported module names."""
modules = []
mref_off = hdr['module_ref_off']
iname_off = hdr['imported_name_off']
for i in range(hdr['num_module_refs']):
name_off_rel = read_u16(data, mref_off + i * 2)
name_off_abs = iname_off + name_off_rel
name_len = read_u8(data, name_off_abs)
name = data[name_off_abs + 1: name_off_abs + 1 + name_len].decode('ascii', errors='replace')
modules.append(name)
return modules
def parse_relocations(data, seg):
"""Parse relocation entries for a single segment."""
if not seg['has_reloc']:
return []
# Relocation table starts right after the segment data in the file
reloc_off = seg['file_offset'] + seg['length']
num_relocs = read_u16(data, reloc_off)
reloc_off += 2
entries = []
for i in range(num_relocs):
addr_type = read_u8(data, reloc_off)
rel_type = read_u8(data, reloc_off + 1)
chain_off = read_u16(data, reloc_off + 2) # offset within segment where fixup applies
# Additive flag is bit 2 of rel_type
additive = bool(rel_type & 0x04)
rel_type_base = rel_type & 0x03
entry = {
'addr_type': addr_type,
'addr_type_name': ADDR_TYPE_NAMES.get(addr_type, f'unk_{addr_type}'),
'rel_type': rel_type_base,
'rel_type_name': REL_TYPE_NAMES.get(rel_type_base, f'unk_{rel_type_base}'),
'additive': additive,
'seg_offset': chain_off,
'seg_index': seg['index'],
}
if rel_type_base == REL_INTERNAL:
# Internal reference
target_seg = read_u8(data, reloc_off + 4)
reserved = read_u8(data, reloc_off + 5)
target_off = read_u16(data, reloc_off + 6)
if target_seg == 0xFF:
# Moveable segment, target_off is entry table ordinal
entry['target_type'] = 'moveable_entry'
entry['entry_ordinal'] = target_off
else:
entry['target_type'] = 'fixed'
entry['target_seg'] = target_seg # 1-based segment number
entry['target_offset'] = target_off
elif rel_type_base == REL_IMPORTORD:
module_idx = read_u16(data, reloc_off + 4) # 1-based
ordinal = read_u16(data, reloc_off + 6)
entry['target_type'] = 'import_ordinal'
entry['module_index'] = module_idx
entry['ordinal'] = ordinal
elif rel_type_base == REL_IMPORTNAM:
module_idx = read_u16(data, reloc_off + 4) # 1-based
name_off = read_u16(data, reloc_off + 6)
entry['target_type'] = 'import_name'
entry['module_index'] = module_idx
entry['name_offset'] = name_off
elif rel_type_base == REL_OSFIXUP:
fixup_type = read_u16(data, reloc_off + 4)
entry['target_type'] = 'osfixup'
entry['osfixup_type'] = fixup_type
entries.append(entry)
reloc_off += 8
return entries
def follow_reloc_chain(data, seg, first_offset, addr_type):
"""
NE relocations use a chain: the first entry points to an offset in
the segment. At that offset, a word points to the next offset
needing the same fixup. 0xFFFF terminates the chain.
Returns all offsets in the chain.
"""
offsets = []
seg_data_start = seg['file_offset']
seg_len = seg['length']
current = first_offset
visited = set()
while current != 0xFFFF and current < seg_len:
if current in visited:
break # cycle protection
visited.add(current)
offsets.append(current)
# For far_ptr: the call instruction is CALLF seg:off at the offset
# The offset field (first word) at current contains the next chain link
next_ptr_file = seg_data_start + current
if next_ptr_file + 2 > len(data):
break
next_off = read_u16(data, next_ptr_file)
current = next_off
return offsets
def file_offset_to_ghidra(file_off):
"""Convert file offset to Ghidra seg:off address string (raw import)."""
seg = file_off >> 16
off = file_off & 0xFFFF
return f'{seg:04x}:{off:04x}'
def main():
print(f"Reading {EXE_PATH}...")
with open(EXE_PATH, 'rb') as f:
data = f.read()
print(f" File size: {len(data)} bytes (0x{len(data):X})")
# Verify NE header location
# Check MZ header first
assert data[0:2] == b'MZ', "Not an MZ executable"
lfanew = read_u32(data, 0x3C)
print(f" e_lfanew from MZ header: 0x{lfanew:X}")
# Use the known NE offset
ne_off = NE_HEADER_OFFSET
print(f" Using NE header at: 0x{ne_off:X}")
hdr = parse_ne_header(data, ne_off)
print(f" Segments: {hdr['num_segments']}")
print(f" Alignment shift: {hdr['alignment_shift']}")
print(f" Module refs: {hdr['num_module_refs']}")
modules = parse_module_refs(data, hdr)
print(f" Imported modules: {modules}")
segments = parse_segment_table(data, hdr)
# Parse all relocations
all_fixups = [] # list of resolved fixup records
stats = defaultdict(int)
for seg in segments:
relocs = parse_relocations(data, seg)
if not relocs:
continue
for reloc in relocs:
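# Follow the chain to find ALL offsets needing this fixup.
# Additive fixups store an addend at the patch site rather than a chain link, so they are not chained.
chain = [reloc['seg_offset']] if reloc['additive'] else follow_reloc_chain(data, seg, reloc['seg_offset'], reloc['addr_type'])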
for fixup_off in chain:
fixup_file_off = seg['file_offset'] + fixup_off
ghidra_addr = file_offset_to_ghidra(fixup_file_off)
rec = {
'source_seg': seg['index'],
'source_offset_in_seg': fixup_off,
'source_file_offset': fixup_file_off,
'source_ghidra': ghidra_addr,
'addr_type': reloc['addr_type_name'],
'rel_type': reloc['rel_type_name'],
}
if reloc.get('target_type') == 'fixed':
target_seg_idx = reloc['target_seg']
target_off = reloc['target_offset']
target_seg_info = segments[target_seg_idx - 1]
target_file_off = target_seg_info['file_offset'] + target_off
target_ghidra = file_offset_to_ghidra(target_file_off)
rec['target'] = f'seg{target_seg_idx:03d}:{target_off:04x}'
rec['target_ghidra'] = target_ghidra
rec['target_file_offset'] = target_file_off
elif reloc.get('target_type') == 'moveable_entry':
rec['target'] = f'entry_ordinal_{reloc["entry_ordinal"]}'
rec['target_ghidra'] = '?'
elif reloc.get('target_type') == 'import_ordinal':
mod_idx = reloc['module_index']
mod_name = modules[mod_idx - 1] if mod_idx <= len(modules) else f'mod{mod_idx}'
rec['target'] = f'{mod_name}.{reloc["ordinal"]}'
rec['target_ghidra'] = '?'
elif reloc.get('target_type') == 'import_name':
mod_idx = reloc['module_index']
mod_name = modules[mod_idx - 1] if mod_idx <= len(modules) else f'mod{mod_idx}'
# Read the imported name
iname_base = hdr['imported_name_off']
name_off = iname_base + reloc['name_offset']
name_len = read_u8(data, name_off)
name = data[name_off+1:name_off+1+name_len].decode('ascii', errors='replace')
rec['target'] = f'{mod_name}.{name}'
rec['target_ghidra'] = '?'
elif reloc.get('target_type') == 'osfixup':
rec['target'] = f'osfixup_{reloc["osfixup_type"]}'
rec['target_ghidra'] = '?'
else:
rec['target'] = '???'
rec['target_ghidra'] = '?'
all_fixups.append(rec)
stats[reloc['addr_type_name']] += 1
print(f"\n Total resolved fixup points: {len(all_fixups)}")
print(f" By address type: {dict(stats)}")
# Filter to just far_ptr (CALLF) fixups with internal targets — these are the ones
# that decompile as CALLF 0000:ffff in Ghidra
far_calls = [f for f in all_fixups if f['addr_type'] == 'far_ptr_16:16' and f.get('target_ghidra', '?') != '?']
far_imports = [f for f in all_fixups if f['addr_type'] == 'far_ptr_16:16' and f.get('target_ghidra', '?') == '?']
print(f" Far-call internal fixups: {len(far_calls)}")
print(f" Far-call import fixups: {len(far_imports)}")
# Save full results
out_path = os.path.join(os.path.dirname(EXE_PATH), 'ne_reloc_fixups.json')
with open(out_path, 'w') as f:
json.dump(all_fixups, f, indent=2)
print(f"\n Full fixup table written to: {out_path}")
# Save a focused far-call table (TSV) for easy use
tsv_path = os.path.join(os.path.dirname(EXE_PATH), 'ne_reloc_far_calls.tsv')
with open(tsv_path, 'w') as f:
f.write("source_ghidra\ttarget_ghidra\ttarget_label\tsource_seg\tsource_off_in_seg\n")
for rec in sorted(far_calls, key=lambda r: r['source_file_offset']):
f.write(f"{rec['source_ghidra']}\t{rec['target_ghidra']}\t{rec['target']}\t")
f.write(f"seg{rec['source_seg']:03d}\t0x{rec['source_offset_in_seg']:04x}\n")
print(f" Far-call internal TSV: {tsv_path}")
# Also save import far-calls
imp_path = os.path.join(os.path.dirname(EXE_PATH), 'ne_reloc_far_imports.tsv')
with open(imp_path, 'w') as f:
f.write("source_ghidra\ttarget\tsource_seg\tsource_off_in_seg\n")
for rec in sorted(far_imports, key=lambda r: r['source_file_offset']):
f.write(f"{rec['source_ghidra']}\t{rec['target']}\t")
f.write(f"seg{rec['source_seg']:03d}\t0x{rec['source_offset_in_seg']:04x}\n")
print(f" Far-call import TSV: {imp_path}")
# Print a sample of game-segment far calls (seg039=seg001 region in raw, file offset 0x6E200)
print("\n── Sample: seg039 (NE seg 39, game seg001 area) far-call fixups ──")
seg39_calls = [f for f in far_calls if f['source_seg'] == 39]
for rec in sorted(seg39_calls, key=lambda r: r['source_offset_in_seg'])[:30]:
print(f" {rec['source_ghidra']} -> {rec['target_ghidra']} ({rec['target']})")
# Print a sample around the entity_ai_update_loop / entity_animation area
print("\n── Sample: seg059 (NE seg 59, game 0007: area) far-call fixups ──")
seg59_calls = [f for f in far_calls if f['source_seg'] == 59]
for rec in sorted(seg59_calls, key=lambda r: r['source_offset_in_seg'])[:30]:
print(f" {rec['source_ghidra']} -> {rec['target_ghidra']} ({rec['target']})")
if __name__ == '__main__':
main()
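
The parser above reads every relocation as a fixed 8-byte record: address type, flags (relocation type in the low two bits, additive flag in bit 2), source offset, then four bytes whose meaning depends on the relocation type. A minimal standalone sketch of that decoding follows; `decode_reloc` and the sample bytes are hypothetical and only mirror the field layout used in `parse_relocations`.

```python
import struct


def decode_reloc(record: bytes) -> dict:
    """Decode one 8-byte NE relocation record (illustrative helper)."""
    addr_type, flags, src_off = struct.unpack_from("<BBH", record, 0)
    rel_type = flags & 0x03
    entry = {
        "addr_type": addr_type,
        "rel_type": rel_type,
        "additive": bool(flags & 0x04),
        "seg_offset": src_off,
    }
    if rel_type == 0:  # internal reference
        target_seg, _reserved, target_off = struct.unpack_from("<BBH", record, 4)
        if target_seg == 0xFF:
            # Moveable target: the word is an entry-table ordinal
            entry["entry_ordinal"] = target_off
        else:
            entry["target"] = (target_seg, target_off)
    elif rel_type in (1, 2):  # import by ordinal / by name
        module_idx, value = struct.unpack_from("<HH", record, 4)
        entry["module_index"] = module_idx
        entry["ordinal" if rel_type == 1 else "name_offset"] = value
    else:  # OS fixup
        entry["osfixup_type"] = struct.unpack_from("<H", record, 4)[0]
    return entry


# Synthetic records, not taken from CRUSADER.EXE
internal = decode_reloc(bytes([0x03, 0x00, 0x34, 0x12, 0x5B, 0x00, 0xF6, 0x04]))
imported = decode_reloc(bytes([0x03, 0x01, 0x10, 0x00, 0x01, 0x00, 0x19, 0x00]))
print(internal)
print(imported)
```

Decoding the record in one place keeps the type-dependent tail explicit, which makes it easier to spot records whose flags carry bits outside the documented mask.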


@ -0,0 +1,44 @@
{
"transaction": "Repair seg043 boundaries around 0007:5a90",
"remove_functions": [
"0007:5b6f"
],
"create_functions": [
{
"entry": "0007:5a90",
"name": "seg043_func_0090",
"body_start": "0007:5a90",
"body_end": "0007:5b79",
"comment": "Recovered from standalone seg043 boundary scan: true start at seg043:0090, body spans seg043:0090..0179.",
"comment_type": "plate"
},
{
"entry": "0007:5b7a",
"name": "seg043_func_017a",
"body_start": "0007:5b7a",
"body_end": "0007:5c1b",
"comment": "Recovered from standalone seg043 boundary scan: second prologue at seg043:017a, body spans seg043:017a..021b.",
"comment_type": "plate"
},
{
"entry": "0007:5c1c",
"name": "seg043_func_021c",
"body_start": "0007:5c1c",
"body_end": "0007:5c80",
"comment": "Recovered from standalone seg043 boundary scan: third prologue at seg043:021c, body spans seg043:021c..0280.",
"comment_type": "plate"
}
],
"comments": [
{
"address": "0007:5b6f",
"text": "Old auto-created split overlaps the earlier seg043:0090..0179 routine and should not be treated as a real entrypoint.",
"type": "plate"
}
],
"assert_functions": [
"0007:5a90",
"0007:5b7a",
"0007:5c1c"
]
}
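
A plan like the one above is easy to sanity-check before it is applied. The sketch below assumes `body_end` is inclusive (as the plan's comments suggest, since each body ends one byte before the next begins) and verifies that every entry lies inside its own body and that no two bodies overlap. `check_plan` is a hypothetical helper, not part of the project's CLI.

```python
def to_flat(addr: str) -> int:
    """Convert 'ssss:oooo' to a flat raw-import offset."""
    seg, off = addr.split(":")
    return (int(seg, 16) << 16) | int(off, 16)


def check_plan(plan: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    ranges = []
    for fn in plan.get("create_functions", []):
        start, end = to_flat(fn["body_start"]), to_flat(fn["body_end"])
        entry = to_flat(fn["entry"])
        if not start <= entry <= end:
            problems.append(f"{fn['name']}: entry outside body")
        ranges.append((start, end, fn["name"]))
    ranges.sort()
    # After sorting by start, only neighbouring ranges need comparing
    for (_s1, e1, n1), (s2, _e2, n2) in zip(ranges, ranges[1:]):
        if s2 <= e1:
            problems.append(f"{n1} overlaps {n2}")
    return problems


# Function ranges copied from the plan above
PLAN = {
    "create_functions": [
        {"entry": "0007:5a90", "name": "seg043_func_0090",
         "body_start": "0007:5a90", "body_end": "0007:5b79"},
        {"entry": "0007:5b7a", "name": "seg043_func_017a",
         "body_start": "0007:5b7a", "body_end": "0007:5c1b"},
        {"entry": "0007:5c1c", "name": "seg043_func_021c",
         "body_start": "0007:5c1c", "body_end": "0007:5c80"},
    ]
}

print(check_plan(PLAN) or "plan ranges look consistent")
```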

5
read_file.py Normal file

@ -0,0 +1,5 @@
# Use a context manager so the file is closed even if read() raises
with open(r'k:\ghidra\Crusader_Decomp\tier4_ghidra.txt', 'r') as f:
    content = f.read()
print('SIZE=' + str(len(content)))
print(content)

20
resolve_bb4f.py Normal file

@ -0,0 +1,20 @@
"""Resolve 0008:bb58 (FUN_0008_bb4f's inner CALLF operand at +1)"""
import json
with open(r'k:\ghidra\Crusader_Decomp\ne_reloc_fixups.json') as f:
fixups = json.load(f)
by_off = {f['source_file_offset']: f for f in fixups}
# 0008:bb58 CALLF, operand at 0008:bb59 = flat 0x8BB59
flat = 0x8BB59
if flat in by_off:
m = by_off[flat]
print(f"0008:bb58 CALLF -> {m.get('target','?')} (ghidra: {m.get('target_ghidra','?')})")
else:
print(f"NOT FOUND at 0x{flat:X}")
# Try nearby
for d in range(-2, 5):
if flat+d in by_off:
m = by_off[flat+d]
print(f" +{d}: {m.get('target','?')} (ghidra: {m.get('target_ghidra','?')})")

55
resolve_top_targets.py Normal file

@ -0,0 +1,55 @@
"""Find the resolved NE targets for the top-called wrapper functions."""
import json
with open(r'k:\ghidra\Crusader_Decomp\ne_reloc_fixups.json') as f:
fixups = json.load(f)
by_off = {f['source_file_offset']: f for f in fixups}
# Top wrappers: look up what their internal CALLF targets are
wrappers = {
'0003:ac9c': 'FUN_0003_ac7e inner CALLF (272 callers, alloc wrapper)',
'0003:a75a': 'FUN_0003_a751 inner CALLF (207 callers, 2-arg forward)',
'0008:bb4f': 'FUN_0008_bb4f (174 callers)',
}
def g2f(a):
s,o = a.split(':')
return (int(s,16)<<16) + int(o,16)
for addr, desc in wrappers.items():
flat = g2f(addr)
for delta in range(0, 5):
if flat + delta in by_off:
m = by_off[flat + delta]
print(f"{addr} ({desc})")
print(f" -> {m.get('target','?')} (ghidra: {m.get('target_ghidra','?')})")
break
else:
print(f"{addr} ({desc}) — NOT FOUND in fixups")
# Also look up 000a:44fd — it had no function, check if it's data or seg boundary
print()
print(f"Checking 000a:44fd — flat 0x{g2f('000a:44fd'):X}")
print(f" This is file offset 0xA44FD")
# Find which NE segment contains this
import csv
with open(r'k:\ghidra\Crusader_Decomp\crusader_ne_segments.csv') as f:
reader = csv.DictReader(f)
for row in reader:
seg_off = int(row['FileOffset'], 16)
seg_len = int(row['Length'], 16)
if seg_off <= 0xA44FD < seg_off + seg_len:
print(f" In NE segment {row['Segment']}: file 0x{seg_off:X}, len 0x{seg_len:X}")
print(f" Offset within segment: 0x{0xA44FD - seg_off:X}")
break
# Also check what calls 000a:44fd (search for its Ghidra address in call patterns)
print()
seg91_calls = [f for f in fixups if f.get('target_ghidra') == '000a:44fd']
print(f"Calls to 000a:44fd (seg091:00fd): {len(seg91_calls)} total")
# Show first 5 callers
for c in seg91_calls[:5]:
src_flat = c['source_file_offset'] - 1
src_ghidra = f"{src_flat>>16:04x}:{src_flat&0xFFFF:04x}"
print(f" from {src_ghidra} (seg{c['source_seg']:03d}+0x{c['source_offset_in_seg']:04x})")
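
Both lookup scripts key the fixup table by the file offset of the patched operand, not of the instruction. For a far call (opcode `0x9A` followed by a 4-byte pointer) the operand starts one byte after the opcode, which is why the scripts probe a small window of offsets past the instruction address. A minimal sketch of that probe, using a made-up fixup record with placeholder target values:

```python
def ghidra_to_flat(addr: str) -> int:
    """Convert 'ssss:oooo' to a flat raw-import offset."""
    seg, off = addr.split(":")
    return (int(seg, 16) << 16) | int(off, 16)


def find_fixup(by_off: dict, instr_addr: str, max_delta: int = 4):
    """Return (delta, record) for the first fixup at or just after instr_addr, else None."""
    flat = ghidra_to_flat(instr_addr)
    for delta in range(max_delta + 1):
        rec = by_off.get(flat + delta)
        if rec is not None:
            return delta, rec
    return None


# Synthetic record for illustration; the target values are placeholders
fixups = [
    {"source_file_offset": 0x8BB59, "target": "segNNN:XXXX", "target_ghidra": "?"},
]
by_off = {rec["source_file_offset"]: rec for rec in fixups}

hit = find_fixup(by_off, "0008:bb58")
miss = find_fixup(by_off, "0008:0000")
print(hit, miss)
```

Returning the delta alongside the record makes it visible when a match came from somewhere other than the expected `+1` operand position.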

11
script_contents.txt Normal file

@ -0,0 +1,11 @@
from collections import Counter
c = Counter()
with open('ne_reloc_far_calls.tsv') as f:
next(f)
for line in f:
parts = line.strip().split('\t')
tgt = parts[2]
c[tgt] += 1
for i, (addr, cnt) in enumerate(c.most_common(100)):
if i >= 60 and i < 80:
print(f'{i+1:3d} {addr} {cnt}')
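
The snippet above prints ranks 61 to 80 by enumerating the top 100 targets and filtering on the index. The same tiering can be wrapped in a small helper that takes 1-based ranks directly; `ranked_slice` is a hypothetical name and the counts below are synthetic.

```python
from collections import Counter


def ranked_slice(counter: Counter, first_rank: int, last_rank: int):
    """Return (rank, key, count) tuples for 1-based ranks first_rank..last_rank."""
    ranked = counter.most_common(last_rank)
    return [
        (rank, key, count)
        for rank, (key, count) in enumerate(ranked, start=1)
        if rank >= first_rank
    ]


# Synthetic call counts, not taken from ne_reloc_far_calls.tsv
calls = Counter({"a": 5, "b": 4, "c": 3, "d": 2, "e": 1})
tier = ranked_slice(calls, 2, 4)
for rank, key, count in tier:
    print(f"{rank:3d} {key} {count}")
```

With the real TSV, `ranked_slice(c, 61, 80)` would select the same window as the original loop.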

20
tier4_ghidra.txt Normal file

@ -0,0 +1,20 @@
61 000b:30a5 42
62 0008:bc27 41
63 0008:d214 40
64 0005:1565 39
65 0008:4bba 39
66 0009:6f5a 38
67 0009:8ef6 38
68 000a:7b30 38
69 000a:7b3f 38
70 0009:6e7f 35
71 000a:7b35 35
72 0009:c433 34
73 0009:2156 33
74 000a:2c41 33
75 000a:454d 32
76 000b:2446 31
77 0005:1238 30
78 000b:1446 30
79 000d:85da 29
80 0005:1511 29

21
tier4_ghidra_check.txt Normal file

@ -0,0 +1,21 @@
SIZE=380
61 000b:30a5 42
62 0008:bc27 41
63 0008:d214 40
64 0005:1565 39
65 0008:4bba 39
66 0009:6f5a 38
67 0009:8ef6 38
68 000a:7b30 38
69 000a:7b3f 38
70 0009:6e7f 35
71 000a:7b35 35
72 0009:c433 34
73 0009:2156 33
74 000a:2c41 33
75 000a:454d 32
76 000b:2446 31
77 0005:1238 30
78 000b:1446 30
79 000d:85da 29
80 0005:1511 29

20
tier4_output.txt Normal file

@ -0,0 +1,20 @@
61 seg109:02a5 42
62 seg061:0227 41
63 seg061:1814 40
64 seg021:1365 39
65 seg055:09ba 39
66 seg076:015a 38
67 seg080:02f6 38
68 seg096:0530 38
69 seg096:053f 38
70 seg076:007f 35
71 seg096:0535 35
72 seg083:0033 34
73 seg070:0556 33
74 seg087:0441 33
75 seg091:014d 32
76 seg108:0a46 31
77 seg021:1038 30
78 seg107:0046 30
79 seg137:07da 29
80 seg021:1311 29

20
tier4_result.txt Normal file

@ -0,0 +1,20 @@
61 seg109:02a5 42
62 seg061:0227 41
63 seg061:1814 40
64 seg021:1365 39
65 seg055:09ba 39
66 seg076:015a 38
67 seg080:02f6 38
68 seg096:0530 38
69 seg096:053f 38
70 seg076:007f 35
71 seg096:0535 35
72 seg083:0033 34
73 seg070:0556 33
74 seg087:0441 33
75 seg091:014d 32
76 seg108:0a46 31
77 seg021:1038 30
78 seg107:0046 30
79 seg137:07da 29
80 seg021:1311 29

0
tier5_errors.txt Normal file

20
tier5_output.txt Normal file

@ -0,0 +1,20 @@
81 0009:1c00 29
82 0008:75f3 29
83 0006:0208 29
84 000a:30d7 29
85 0009:c45f 29
86 0004:d7a0 28
87 000a:5276 28
88 0003:d94f 28
89 000a:7b3a 28
90 0008:ca18 27
91 0008:bd20 27
92 0009:3ceb 27
93 0005:09b4 27
94 0005:0fbb 27
95 0008:d27e 26
96 0005:0376 26
97 000b:2492 26
98 0003:e4d3 26
99 0005:033e 25
100 000a:87fd 25

1
tools/__init__.py Normal file

@ -0,0 +1 @@
"""Workspace helper packages."""

Binary file not shown.


@ -0,0 +1,5 @@
"""PyGhidra helpers for the Crusader Ghidra project."""
from .cli import main
__all__ = ["main"]


@ -0,0 +1,5 @@
from .cli import main
if __name__ == "__main__":
raise SystemExit(main())


@ -0,0 +1,814 @@
from __future__ import annotations
import argparse
import json
from pathlib import Path
from .common import (
DEFAULT_INSTALL_DIR,
DEFAULT_PROJECT_DIR,
DEFAULT_PROJECT_NAME,
DEFAULT_PROGRAM_NAME,
DEFAULT_FOLDER_PATH,
ProjectConfig,
create_function,
decompile_function,
disassemble_function,
format_function_summary,
get_function,
get_function_containing,
get_functions_by_exact_name,
get_xrefs_from,
get_xrefs_to,
list_classes,
list_data_items,
list_exports,
list_root_files,
list_imports,
list_namespaces,
list_segments,
list_strings,
open_program,
open_project,
read_region_bytes,
remove_function,
rename_function,
run_script_file,
save_program,
search_functions_by_name,
set_comment,
transaction,
)
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="PyGhidra helpers for the Crusader project."
)
parser.add_argument(
"--install-dir",
default=str(DEFAULT_INSTALL_DIR),
help="Ghidra install directory.",
)
parser.add_argument(
"--project-dir",
default=str(DEFAULT_PROJECT_DIR),
help="Directory containing the Ghidra project.",
)
parser.add_argument(
"--project-name",
default=DEFAULT_PROJECT_NAME,
help="Ghidra project name.",
)
parser.add_argument(
"--program-name",
default=DEFAULT_PROGRAM_NAME,
help="Program name inside the project.",
)
parser.add_argument(
"--folder-path",
default=DEFAULT_FOLDER_PATH,
help="Project folder path containing the program.",
)
parser.add_argument(
"--restore-project",
action="store_true",
help="Restore project tool state while opening the project.",
)
parser.add_argument(
"--format",
choices=["text", "json"],
default="text",
help="Output format.",
)
subparsers = parser.add_subparsers(dest="command", required=True)
subparsers.add_parser(
"project-files",
help="List root-level files in the Ghidra project.",
)
dump_parser = subparsers.add_parser(
"dump-region",
help="Dump instructions and resolved call targets for an address range.",
)
dump_parser.add_argument("--start", required=True, help="Start address.")
dump_parser.add_argument("--end", required=True, help="Inclusive end address.")
create_parser = subparsers.add_parser(
"create-function",
help="Create a function at an address with an optional explicit body range.",
)
create_parser.add_argument("--entry", required=True, help="Function entry address.")
create_parser.add_argument("--name", required=True, help="New function name.")
create_parser.add_argument("--body-start", help="Function body start address.")
create_parser.add_argument("--body-end", help="Function body end address.")
create_parser.add_argument(
"--plate-comment",
help="Optional plate comment to set at the entry address after creation.",
)
delete_parser = subparsers.add_parser(
"delete-function",
help="Delete a function at an address.",
)
delete_parser.add_argument("--entry", required=True, help="Function entry address.")
rename_parser = subparsers.add_parser(
"rename-function",
help="Rename an existing function by entry address.",
)
rename_parser.add_argument("--entry", required=True, help="Function entry address.")
rename_parser.add_argument("--name", required=True, help="New function name.")
rename_by_address_parser = subparsers.add_parser(
"rename-function-by-address",
help="Rename an existing function by entry address (MCP-style alias).",
)
rename_by_address_parser.add_argument(
"--entry", required=True, help="Function entry address."
)
rename_by_address_parser.add_argument("--name", required=True, help="New function name.")
comment_parser = subparsers.add_parser(
"set-comment",
help="Set a code-unit comment by address.",
)
comment_parser.add_argument("--address", required=True, help="Comment target address.")
comment_parser.add_argument("--text", required=True, help="Comment text.")
comment_parser.add_argument(
"--type",
choices=["pre", "plate", "eol", "repeatable", "post"],
default="plate",
help="Comment type.",
)
decompiler_comment_parser = subparsers.add_parser(
"set-decompiler-comment",
help="Set a decompiler-visible pre-comment by address.",
)
decompiler_comment_parser.add_argument("--address", required=True, help="Comment target address.")
decompiler_comment_parser.add_argument("--text", required=True, help="Comment text.")
disassembly_comment_parser = subparsers.add_parser(
"set-disassembly-comment",
help="Set a disassembly EOL comment by address.",
)
disassembly_comment_parser.add_argument("--address", required=True, help="Comment target address.")
disassembly_comment_parser.add_argument("--text", required=True, help="Comment text.")
get_function_parser = subparsers.add_parser(
"get-function-by-address",
help="Show function metadata for an exact entry address.",
)
get_function_parser.add_argument("--address", required=True, help="Function entry address.")
get_function_containing_parser = subparsers.add_parser(
"get-function-containing",
help="Show function metadata for the function containing an address.",
)
get_function_containing_parser.add_argument(
"--address", required=True, help="Address inside the desired function body."
)
list_functions_parser = subparsers.add_parser(
"list-functions",
help="List all defined functions.",
)
list_functions_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
list_functions_parser.add_argument("--limit", type=int, default=100, help="Maximum functions to print.")
list_segments_parser = subparsers.add_parser(
"list-segments",
help="List memory segments or blocks.",
)
list_segments_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
list_segments_parser.add_argument("--limit", type=int, default=100, help="Maximum segments to print.")
list_data_items_parser = subparsers.add_parser(
"list-data-items",
help="List defined data items.",
)
list_data_items_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
list_data_items_parser.add_argument("--limit", type=int, default=100, help="Maximum data items to print.")
list_classes_parser = subparsers.add_parser(
"list-classes",
help="List class namespaces.",
)
list_classes_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
list_classes_parser.add_argument("--limit", type=int, default=100, help="Maximum classes to print.")
list_strings_parser = subparsers.add_parser(
"list-strings",
help="List defined strings in the program.",
)
list_strings_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
list_strings_parser.add_argument("--limit", type=int, default=2000, help="Maximum strings to print.")
list_strings_parser.add_argument("--filter", help="Optional substring filter.")
list_imports_parser = subparsers.add_parser(
"list-imports",
help="List imported external symbols.",
)
list_imports_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
list_imports_parser.add_argument("--limit", type=int, default=100, help="Maximum imports to print.")
list_exports_parser = subparsers.add_parser(
"list-exports",
help="List exported entry points and symbols.",
)
list_exports_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
list_exports_parser.add_argument("--limit", type=int, default=100, help="Maximum exports to print.")
list_namespaces_parser = subparsers.add_parser(
"list-namespaces",
help="List non-global namespaces, classes, and libraries.",
)
list_namespaces_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
list_namespaces_parser.add_argument("--limit", type=int, default=100, help="Maximum namespaces to print.")
search_functions_parser = subparsers.add_parser(
"search-functions-by-name",
help="List functions whose names contain a substring.",
)
search_functions_parser.add_argument("--query", required=True, help="Substring to match.")
search_functions_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
search_functions_parser.add_argument("--limit", type=int, default=100, help="Maximum functions to print.")
decompile_name_parser = subparsers.add_parser(
"decompile-function",
help="Decompile an exact-named function.",
)
decompile_name_parser.add_argument("--name", required=True, help="Exact function name.")
decompile_name_parser.add_argument("--timeout", type=int, default=30, help="Decompile timeout in seconds.")
decompile_address_parser = subparsers.add_parser(
"decompile-function-by-address",
help="Decompile a function by entry address.",
)
decompile_address_parser.add_argument("--address", required=True, help="Function entry address.")
decompile_address_parser.add_argument("--timeout", type=int, default=30, help="Decompile timeout in seconds.")
disassemble_parser = subparsers.add_parser(
"disassemble-function",
help="Disassemble a function body by entry address.",
)
disassemble_parser.add_argument("--address", required=True, help="Function entry address.")
read_region_parser = subparsers.add_parser(
"read-region",
help="Dump raw bytes for an inclusive address range.",
)
read_region_parser.add_argument("--start", required=True, help="Start address.")
read_region_parser.add_argument("--end", required=True, help="Inclusive end address.")
run_script_parser = subparsers.add_parser(
"run-script",
help="Execute a Python file with project/program context to avoid interactive shell quoting issues.",
)
run_script_parser.add_argument("--script", required=True, help="Path to the Python script file.")
run_script_parser.add_argument(
"--read-only",
action="store_true",
help="Open the program read-only for script execution.",
)
xrefs_to_parser = subparsers.add_parser(
"get-xrefs-to",
help="List references to an address.",
)
xrefs_to_parser.add_argument("--address", required=True, help="Target address.")
xrefs_to_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
xrefs_to_parser.add_argument("--limit", type=int, default=100, help="Maximum references to print.")
xrefs_from_parser = subparsers.add_parser(
"get-xrefs-from",
help="List references from an address.",
)
xrefs_from_parser.add_argument("--address", required=True, help="Source address.")
xrefs_from_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
xrefs_from_parser.add_argument("--limit", type=int, default=100, help="Maximum references to print.")
function_xrefs_parser = subparsers.add_parser(
"get-function-xrefs",
help="List references to a function entry by exact function name.",
)
function_xrefs_parser.add_argument("--name", required=True, help="Exact function name.")
function_xrefs_parser.add_argument("--offset", type=int, default=0, help="Pagination offset.")
function_xrefs_parser.add_argument("--limit", type=int, default=100, help="Maximum references to print.")
plan_parser = subparsers.add_parser(
"apply-plan",
help="Apply a JSON edit plan containing function and comment operations.",
)
plan_parser.add_argument("--plan", required=True, help="Path to the JSON plan file.")
plan_parser.add_argument(
"--dry-run",
action="store_true",
help="Validate and print the plan without modifying the project.",
)
return parser
def build_config(args: argparse.Namespace) -> ProjectConfig:
return ProjectConfig(
install_dir=Path(args.install_dir),
project_dir=Path(args.project_dir),
project_name=args.project_name,
program_name=args.program_name,
folder_path=args.folder_path,
restore_project=args.restore_project,
)
def _emit(args: argparse.Namespace, payload, text: str | None = None) -> int:
if args.format == "json":
print(json.dumps(payload, indent=2, sort_keys=True))
return 0
if text is not None:
print(text)
return 0
if isinstance(payload, list):
for item in payload:
print(item)
return 0
if isinstance(payload, dict):
print(json.dumps(payload, indent=2, sort_keys=True))
return 0
print(payload)
return 0
def _function_to_dict(function) -> dict[str, str]:
summary_text = format_function_summary(function)
lines = summary_text.splitlines()
body_line = lines[3].split(": ", 1)[1]
body_start, body_end = body_line.split(" - ", 1)
return {
"name": function.getName(),
"signature": lines[1].split(": ", 1)[1],
"entry": str(function.getEntryPoint()),
"body_start": body_start,
"body_end": body_end,
}
def _function_line(function) -> str:
return f"{function.getName()} @ {function.getEntryPoint()}"
def _text_or_empty(lines: list[str], empty_message: str) -> str:
return "\n".join(lines) if lines else empty_message
def command_project_files(config: ProjectConfig, _args: argparse.Namespace) -> int:
project = open_project(config)
try:
names = list_root_files(project)
finally:
project.close()
return _emit(_args, names, "\n".join(names))
def command_dump_region(config: ProjectConfig, args: argparse.Namespace) -> int:
    from .common import to_address
    with open_program(config, read_only=True) as (_project, program):
        listing = program.getListing()
        start = to_address(program, args.start)
        end = to_address(program, args.end)
        # Read through read_region_bytes. A Python bytearray passed to
        # Memory.getBytes() is converted to a temporary Java array, so the
        # bytes read would most likely not be copied back into it.
        data = read_region_bytes(program, args.start, args.end)
        print(f"REGION {args.start}..{args.end} BYTES {data[:32].hex()}")
        instruction = listing.getInstructionAt(start)
        while instruction is not None and instruction.getAddress().compareTo(end) <= 0:
            line = f"{instruction.getAddress()}: {instruction.toString()}"
            if instruction.getFlowType().isCall():
                references = instruction.getReferencesFrom()
                if references:
                    target = references[0].getToAddress()
                    function = program.getFunctionManager().getFunctionAt(target)
                    if function is not None:
                        line += f" -> {function.getName()} @ {target}"
                    else:
                        line += f" -> {target}"
            print(line)
            instruction = instruction.getNext()
    return 0
def command_create_function(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=False) as (project, program):
with transaction(program, f"Create function {args.entry}"):
function = create_function(program, args.entry, args.name, args.body_start, args.body_end)
if args.plate_comment:
set_comment(program, args.entry, args.plate_comment, "plate")
save_program(project, program)
return _emit(
args,
{"status": "ok", "entry": args.entry, "name": function.getName(), "action": "create-function"},
f"created {function.getName()} at {args.entry}",
)
def command_delete_function(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=False) as (project, program):
with transaction(program, f"Delete function {args.entry}"):
removed = remove_function(program, args.entry)
if not removed:
raise RuntimeError(f"no function removed at {args.entry}")
save_program(project, program)
return _emit(
args,
{"status": "ok", "entry": args.entry, "action": "delete-function"},
f"deleted function at {args.entry}",
)
def command_rename_function(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=False) as (project, program):
with transaction(program, f"Rename function {args.entry}"):
function = rename_function(program, args.entry, args.name)
save_program(project, program)
return _emit(
args,
{"status": "ok", "entry": args.entry, "name": function.getName(), "action": "rename-function"},
f"renamed {args.entry} to {function.getName()}",
)
def _set_comment_with_type(config: ProjectConfig, args: argparse.Namespace, address: str, text: str, comment_type: str) -> int:
with open_program(config, read_only=False) as (project, program):
with transaction(program, f"Set comment {address}"):
set_comment(program, address, text, comment_type)
save_program(project, program)
return _emit(
args,
{"status": "ok", "address": address, "type": comment_type, "text": text, "action": "set-comment"},
f"set {comment_type} comment at {address}",
)
def command_set_comment(config: ProjectConfig, args: argparse.Namespace) -> int:
return _set_comment_with_type(config, args, args.address, args.text, args.type)
def command_set_decompiler_comment(config: ProjectConfig, args: argparse.Namespace) -> int:
return _set_comment_with_type(config, args, args.address, args.text, "pre")
def command_set_disassembly_comment(config: ProjectConfig, args: argparse.Namespace) -> int:
return _set_comment_with_type(config, args, args.address, args.text, "eol")
def _require_function_by_address(program, address_text: str):
function = get_function(program, address_text)
if function is None:
raise RuntimeError(f"no function found at {address_text}")
return function
def _require_single_function_by_name(program, name: str):
matches = get_functions_by_exact_name(program, name)
if not matches:
raise RuntimeError(f"no function found with exact name '{name}'")
if len(matches) > 1:
raise RuntimeError(
f"multiple functions match exact name '{name}'; use search-functions-by-name or an address-specific command"
)
return matches[0]
def _print_function_lines(functions) -> None:
for function in functions:
print(f"{function.getName()} @ {function.getEntryPoint()}")
def _print_reference_lines(references: list[dict[str, str | int]]) -> None:
for reference in references:
print(
f"{reference['from']} -> {reference['to']} [{reference['type']}] operand={reference['operand_index']}"
)
def command_get_function_by_address(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
function = _require_function_by_address(program, args.address)
payload = _function_to_dict(function)
text = format_function_summary(function)
return _emit(args, payload, text)
def command_get_function_containing(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
function = get_function_containing(program, args.address)
if function is None:
raise RuntimeError(f"no containing function found at {args.address}")
payload = _function_to_dict(function)
text = format_function_summary(function)
return _emit(args, payload, text)
def command_list_functions(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
functions = search_functions_by_name(program, "", offset=args.offset, limit=args.limit)
payload = [{"name": function.getName(), "entry": str(function.getEntryPoint())} for function in functions]
text = _text_or_empty([_function_line(function) for function in functions], "no functions found")
return _emit(args, payload, text)
def command_search_functions_by_name(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
functions = search_functions_by_name(program, args.query, offset=args.offset, limit=args.limit)
payload = [{"name": function.getName(), "entry": str(function.getEntryPoint())} for function in functions]
text = _text_or_empty([_function_line(function) for function in functions], "no matching functions found")
return _emit(args, payload, text)
def command_list_strings(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
strings = list_strings(program, offset=args.offset, limit=args.limit, filter_text=args.filter)
text = _text_or_empty([f"{entry['address']}: {entry['text']}" for entry in strings], "no strings found")
return _emit(args, strings, text)
def command_list_segments(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
segments = list_segments(program, offset=args.offset, limit=args.limit)
text = _text_or_empty(
[
f"{entry['name']} {entry['start']} - {entry['end']} len={entry['length']}"
f" r={entry['read']} w={entry['write']} x={entry['execute']} init={entry['initialized']}"
for entry in segments
],
"no segments found",
)
return _emit(args, segments, text)
def command_list_data_items(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
items = list_data_items(program, offset=args.offset, limit=args.limit)
text = _text_or_empty(
[
f"{entry['address']} {entry['mnemonic']} len={entry['length']}"
+ (f" value={entry['value']}" if entry['value'] is not None else "")
for entry in items
],
"no data items found",
)
return _emit(args, items, text)
def command_list_classes(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
classes = list_classes(program, offset=args.offset, limit=args.limit)
text = _text_or_empty(
[
f"{entry['name']}" + (f" parent={entry['parent']}" if entry['parent'] else "")
for entry in classes
],
"no classes found",
)
return _emit(args, classes, text)
def command_list_imports(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
imports = list_imports(program, offset=args.offset, limit=args.limit)
text = _text_or_empty([
f"{entry['library']}!{entry['label'] or '<unnamed>'} @ {entry['address'] or '<no address>'}"
for entry in imports
], "no imports found")
return _emit(args, imports, text)
def command_list_exports(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
exports = list_exports(program, offset=args.offset, limit=args.limit)
text = _text_or_empty([
f"{entry['name'] or '<unnamed>'} @ {entry['address']} [{entry['kind']}]"
for entry in exports
], "no exports found")
return _emit(args, exports, text)
def command_list_namespaces(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
namespaces = list_namespaces(program, offset=args.offset, limit=args.limit)
text = _text_or_empty([
f"{entry['name']} [{entry['type']}]" + (f" parent={entry['parent']}" if entry['parent'] else "")
for entry in namespaces
], "no namespaces found")
return _emit(args, namespaces, text)
def command_decompile_function_by_address(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
function = _require_function_by_address(program, args.address)
output = decompile_function(program, function, args.timeout)
return _emit(args, {"address": args.address, "decompiled": output}, output)
def command_decompile_function(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
function = _require_single_function_by_name(program, args.name)
output = decompile_function(program, function, args.timeout)
return _emit(args, {"name": args.name, "decompiled": output}, output)
def command_disassemble_function(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
function = _require_function_by_address(program, args.address)
lines = disassemble_function(program, function)
if not lines:
code_unit = program.getListing().getCodeUnitAt(function.getEntryPoint())
lines = [
f"no instructions found in body {function.getBody().getMinAddress()} - {function.getBody().getMaxAddress()}; entry code unit = {code_unit}"
]
return _emit(args, {"address": args.address, "lines": lines}, "\n".join(lines))
def command_read_region(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
data = read_region_bytes(program, args.start, args.end)
text = f"REGION {args.start}..{args.end} BYTES {data.hex()}"
return _emit(args, {"start": args.start, "end": args.end, "bytes": data.hex()}, text)
def command_get_xrefs_to(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
references = get_xrefs_to(program, args.address, offset=args.offset, limit=args.limit)
text = _text_or_empty([
f"{reference['from']} -> {reference['to']} [{reference['type']}] operand={reference['operand_index']}"
for reference in references
], "no xrefs found")
return _emit(args, references, text)
def command_get_xrefs_from(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
references = get_xrefs_from(program, args.address, offset=args.offset, limit=args.limit)
text = _text_or_empty([
f"{reference['from']} -> {reference['to']} [{reference['type']}] operand={reference['operand_index']}"
for reference in references
], "no xrefs found")
return _emit(args, references, text)
def command_get_function_xrefs(config: ProjectConfig, args: argparse.Namespace) -> int:
with open_program(config, read_only=True) as (_project, program):
function = _require_single_function_by_name(program, args.name)
references = get_xrefs_to(
program,
str(function.getEntryPoint()),
offset=args.offset,
limit=args.limit,
)
text = _text_or_empty([
f"{reference['from']} -> {reference['to']} [{reference['type']}] operand={reference['operand_index']}"
for reference in references
], "no xrefs found")
return _emit(args, references, text)
def command_run_script(config: ProjectConfig, args: argparse.Namespace) -> int:
script_path = Path(args.script).resolve()
if not script_path.is_file():
raise RuntimeError(f"script file not found: {script_path}")
with open_program(config, read_only=args.read_only) as (project, program):
script_globals = {
"config": config,
"project": project,
"program": program,
"helpers": {
"create_function": create_function,
"decompile_function": decompile_function,
"disassemble_function": disassemble_function,
"format_function_summary": format_function_summary,
"get_function": get_function,
"get_function_containing": get_function_containing,
"get_xrefs_from": get_xrefs_from,
"get_xrefs_to": get_xrefs_to,
"read_region_bytes": read_region_bytes,
"rename_function": rename_function,
"set_comment": set_comment,
},
}
run_script_file(script_path, script_globals)
if not args.read_only:
save_program(project, program)
return _emit(args, {"status": "ok", "script": str(script_path)}, f"ran script {script_path}")
def _load_plan(plan_path: str) -> dict:
with open(plan_path, "r", encoding="utf-8") as handle:
return json.load(handle)
def _print_plan(plan: dict) -> None:
print(json.dumps(plan, indent=2, sort_keys=True))
def command_apply_plan(config: ProjectConfig, args: argparse.Namespace) -> int:
    plan = _load_plan(args.plan)
    if args.dry_run:
        # A dry run only echoes the parsed plan; _print_plan emits JSON
        # regardless of args.format.
        _print_plan(plan)
        return 0
    transaction_name = plan.get("transaction", f"Apply plan {args.plan}")
    with open_program(config, read_only=False) as (project, program):
        with transaction(program, transaction_name):
            # Steps run in a fixed order: remove, rename, create, comment, assert.
            for entry in plan.get("remove_functions", []):
                removed = remove_function(program, entry)
                if not removed:
                    raise RuntimeError(f"no function removed at {entry}")
            for entry in plan.get("rename_functions", []):
                rename_function(program, entry["entry"], entry["name"])
            for entry in plan.get("create_functions", []):
                create_function(
                    program,
                    entry["entry"],
                    entry["name"],
                    entry.get("body_start"),
                    entry.get("body_end"),
                )
                if entry.get("comment"):
                    set_comment(
                        program,
                        entry["entry"],
                        entry["comment"],
                        entry.get("comment_type", "plate"),
                    )
            for entry in plan.get("comments", []):
                set_comment(
                    program,
                    entry["address"],
                    entry["text"],
                    entry.get("type", "plate"),
                )
            for entry in plan.get("assert_functions", []):
                if get_function(program, entry) is None:
                    raise RuntimeError(f"expected function missing at {entry}")
        save_program(project, program)
    return _emit(args, {"status": "ok", "plan": args.plan}, f"applied plan {args.plan}")
def main(argv: list[str] | None = None) -> int:
parser = build_parser()
args = parser.parse_args(argv)
config = build_config(args)
command_map = {
"dump-region": command_dump_region,
"project-files": command_project_files,
"create-function": command_create_function,
"delete-function": command_delete_function,
"rename-function": command_rename_function,
"rename-function-by-address": command_rename_function,
"set-comment": command_set_comment,
"set-decompiler-comment": command_set_decompiler_comment,
"set-disassembly-comment": command_set_disassembly_comment,
"get-function-by-address": command_get_function_by_address,
"get-function-containing": command_get_function_containing,
"list-functions": command_list_functions,
"list-segments": command_list_segments,
"list-data-items": command_list_data_items,
"list-classes": command_list_classes,
"list-strings": command_list_strings,
"list-imports": command_list_imports,
"list-exports": command_list_exports,
"list-namespaces": command_list_namespaces,
"search-functions-by-name": command_search_functions_by_name,
"decompile-function": command_decompile_function,
"decompile-function-by-address": command_decompile_function_by_address,
"disassemble-function": command_disassemble_function,
"read-region": command_read_region,
"get-xrefs-to": command_get_xrefs_to,
"get-xrefs-from": command_get_xrefs_from,
"get-function-xrefs": command_get_function_xrefs,
"run-script": command_run_script,
"apply-plan": command_apply_plan,
}
return command_map[args.command](config, args)
if __name__ == "__main__":
raise SystemExit(main())
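
`command_apply_plan` above reads a JSON plan through `plan.get(...)` with the keys `transaction`, `remove_functions`, `rename_functions`, `create_functions`, `comments`, and `assert_functions`. The following is a minimal sketch of building such a plan in Python. Every address and function name in it is a placeholder, and `build_example_plan` and `unknown_keys` are hypothetical helpers that are not part of the `tools` directory.

```python
import json

# Keys that command_apply_plan looks up; any other top-level key is never read.
KNOWN_PLAN_KEYS = {
    "transaction",
    "remove_functions",
    "rename_functions",
    "create_functions",
    "comments",
    "assert_functions",
}


def build_example_plan() -> dict:
    """Return a plan dict. All addresses and names are illustrative placeholders."""
    return {
        "transaction": "Example boundary repair",
        "remove_functions": ["0007:1000"],
        "rename_functions": [{"entry": "0007:2261", "name": "example_renamed"}],
        "create_functions": [
            {
                "entry": "0007:1000",
                "name": "example_created",
                "body_start": "0007:1000",
                "body_end": "0007:10ff",
                # Without "comment_type" the comment is stored as a plate comment.
                "comment": "created by example plan",
            }
        ],
        "comments": [{"address": "0007:1093", "text": "example note", "type": "eol"}],
        "assert_functions": ["0007:1000"],
    }


def unknown_keys(plan: dict) -> set[str]:
    """Top-level keys that command_apply_plan would never look at."""
    return set(plan) - KNOWN_PLAN_KEYS


plan = build_example_plan()
plan_text = json.dumps(plan, indent=2, sort_keys=True)
print(plan_text)
print(sorted(unknown_keys(plan)))
```

Checking for unknown keys before applying a plan can catch a misspelled section name, which `plan.get(..., [])` would otherwise treat as an empty list.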


@ -0,0 +1,547 @@
from __future__ import annotations
from contextlib import contextmanager
from dataclasses import dataclass
from pathlib import Path
import os
import sys
REPO_ROOT = Path(__file__).resolve().parents[2]
DEFAULT_INSTALL_DIR = Path(
os.environ.get("GHIDRA_INSTALL_DIR", r"I:\Apps\ghidra_11.3.2_PUBLIC")
)
DEFAULT_PROJECT_DIR = REPO_ROOT
DEFAULT_PROJECT_NAME = "Crusader"
DEFAULT_PROGRAM_NAME = "CRUSADER-RAW.EXE"
DEFAULT_FOLDER_PATH = "/"
@dataclass(frozen=True)
class ProjectConfig:
install_dir: Path = DEFAULT_INSTALL_DIR
project_dir: Path = DEFAULT_PROJECT_DIR
project_name: str = DEFAULT_PROJECT_NAME
program_name: str = DEFAULT_PROGRAM_NAME
folder_path: str = DEFAULT_FOLDER_PATH
restore_project: bool = False
def ensure_pyghidra_started(install_dir: Path | None = None):
import pyghidra
resolved_dir = Path(install_dir or DEFAULT_INSTALL_DIR)
if not pyghidra.started():
with suppress_process_output():
pyghidra.start(install_dir=resolved_dir)
return pyghidra
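@contextmanager
def suppress_process_output():
    # Redirect file descriptors 1 and 2 rather than sys.stdout/sys.stderr so
    # that output written by the JVM and native code is hidden as well.
    with open(os.devnull, "w", encoding="utf-8") as devnull:
        original_stdout = os.dup(1)
        original_stderr = os.dup(2)
        try:
            sys.stdout.flush()
            sys.stderr.flush()
            os.dup2(devnull.fileno(), 1)
            os.dup2(devnull.fileno(), 2)
            yield
        finally:
            # Restore the saved descriptors, then release the duplicates.
            os.dup2(original_stdout, 1)
            os.dup2(original_stderr, 2)
            os.close(original_stdout)
            os.close(original_stderr)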
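def parse_address_text(address_text: str) -> int:
    # Accept either "SSSS:OOOO" (hex segment and offset, mapped to the flat
    # value (segment << 16) + offset) or a plain integer literal such as
    # "0x7101c" or "462876" (base detected from the prefix).
    text = address_text.strip()
    if ":" in text:
        segment_text, offset_text = text.split(":", 1)
        return (int(segment_text, 16) << 16) + int(offset_text, 16)
    return int(text, 0)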
def to_address(program, address_text: str):
address_space = program.getAddressFactory().getDefaultAddressSpace()
return address_space.getAddress(parse_address_text(address_text))
def format_address(address) -> str:
return str(address)
def iter_java_items(items):
if hasattr(items, "hasNext") and hasattr(items, "next"):
while items.hasNext():
yield items.next()
return
for item in items:
yield item
def format_project_error(config: ProjectConfig, exc: Exception) -> RuntimeError:
lock_path = config.project_dir / f"{config.project_name}.lock"
details = [
f"unable to open project '{config.project_name}' in '{config.project_dir}'",
str(exc),
]
if lock_path.exists():
details.append(
f"project lock present at '{lock_path}'; close Ghidra or work on a project copy for write operations"
)
return RuntimeError("; ".join(details))
def open_project(config: ProjectConfig):
ensure_pyghidra_started(config.install_dir)
from ghidra.base.project import GhidraProject
try:
return GhidraProject.openProject(
str(config.project_dir),
config.project_name,
config.restore_project,
)
except Exception as exc: # pragma: no cover - depends on local Ghidra state
raise format_project_error(config, exc) from exc
def _candidate_folder_paths(folder_path: str) -> list[str]:
candidates = [folder_path]
for fallback in ("/", "\\", ""):
if fallback not in candidates:
candidates.append(fallback)
return candidates
@contextmanager
def open_program(config: ProjectConfig, read_only: bool):
    project = open_project(config)
    program = None
    last_error = None
    try:
        for folder_path in _candidate_folder_paths(config.folder_path):
            try:
                program = project.openProgram(folder_path, config.program_name, read_only)
                break
            except Exception as exc:  # pragma: no cover - depends on local Ghidra state
                last_error = exc
        if program is None:
            raise RuntimeError(
                f"unable to open program '{config.program_name}' from project '{config.project_name}': {last_error}"
            )
        yield project, program
    finally:
        if project is not None:
            if program is not None:
                project.close(program)
            project.close()
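@contextmanager
def transaction(program, description: str):
    transaction_id = program.startTransaction(description)
    commit = False
    try:
        yield
        # Reached only when the with-body finished without raising.
        commit = True
    finally:
        # With commit=False the changes made inside the transaction are rolled back.
        program.endTransaction(transaction_id, commit)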
def list_root_files(project) -> list[str]:
return [domain_file.getName() for domain_file in project.getRootFolder().getFiles()]
def get_function(program, entry_text: str):
return program.getFunctionManager().getFunctionAt(to_address(program, entry_text))
def get_function_containing(program, address_text: str):
return program.getFunctionManager().getFunctionContaining(to_address(program, address_text))
def read_region_bytes(program, start_text: str, end_text: str) -> bytes:
    memory = program.getMemory()
    start = to_address(program, start_text)
    end = to_address(program, end_text)
    # The range is inclusive, so a valid range holds at least one byte.
    size = end.subtract(start) + 1
    if size <= 0:
        raise ValueError(f"invalid address range: {start_text}..{end_text}")
    data = bytearray()
    current = start
    for _ in range(size):
        # Java bytes are signed; mask to get the unsigned value.
        data.append(int(memory.getByte(current)) & 0xFF)
        current = current.next()
    return bytes(data)
def iter_functions(program):
return program.getFunctionManager().getFunctions(True)
def function_signature(function) -> str:
return function.getPrototypeString(True, True)
def function_body_range(function) -> tuple[str, str]:
body = function.getBody()
return format_address(body.getMinAddress()), format_address(body.getMaxAddress())
def format_function_summary(function) -> str:
body_start, body_end = function_body_range(function)
return (
f"Function: {function.getName()} at {format_address(function.getEntryPoint())}\n"
f"Signature: {function_signature(function)}\n"
f"Entry: {format_address(function.getEntryPoint())}\n"
f"Body: {body_start} - {body_end}"
)
def list_segments(program, offset: int = 0, limit: int = 100):
memory = program.getMemory()
matches = []
skipped = 0
for block in memory.getBlocks():
if skipped < offset:
skipped += 1
continue
matches.append(
{
"name": block.getName(),
"start": format_address(block.getStart()),
"end": format_address(block.getEnd()),
"length": int(block.getSize()),
"initialized": bool(block.isInitialized()),
"read": bool(block.isRead()),
"write": bool(block.isWrite()),
"execute": bool(block.isExecute()),
}
)
if len(matches) >= limit:
break
return matches
def list_data_items(program, offset: int = 0, limit: int = 100):
listing = program.getListing()
matches = []
skipped = 0
for data in iter_java_items(listing.getDefinedData(True)):
if skipped < offset:
skipped += 1
continue
value = data.getValue()
matches.append(
{
"address": format_address(data.getAddress()),
"length": int(data.getLength()),
"mnemonic": data.getMnemonicString(),
"value": None if value is None else str(value),
}
)
if len(matches) >= limit:
break
return matches
def list_classes(program, offset: int = 0, limit: int = 100):
    from ghidra.program.model.symbol import SymbolType
    symbol_table = program.getSymbolTable()
    matches = []
    for symbol in iter_java_items(symbol_table.getDefinedSymbols()):
        if symbol.getSymbolType() != SymbolType.CLASS:
            continue
        namespace = symbol.getObject()
        parent = namespace.getParentNamespace() if namespace is not None else None
        matches.append(
            {
                "name": symbol.getName(),
                "parent": None if parent is None or parent.isGlobal() else parent.getName(),
            }
        )
    # Unlike the other list_* helpers, sort everything first and then page with a slice.
    matches.sort(key=lambda entry: (entry["parent"] or "", entry["name"]))
    return matches[offset: offset + limit]
def search_functions_by_name(program, query: str, offset: int = 0, limit: int = 100):
lowered = query.lower()
matches = []
skipped = 0
for function in iter_java_items(iter_functions(program)):
if lowered not in function.getName().lower():
continue
if skipped < offset:
skipped += 1
continue
matches.append(function)
if len(matches) >= limit:
break
return matches
def get_functions_by_exact_name(program, name: str):
matches = []
for function in iter_java_items(iter_functions(program)):
if function.getName() == name:
matches.append(function)
return matches
def create_function(program, entry_text: str, name: str, body_start: str | None, body_end: str | None):
from ghidra.program.model.address import AddressSet
from ghidra.program.model.symbol import SourceType
entry_address = to_address(program, entry_text)
body_start_address = to_address(program, body_start or entry_text)
body_end_address = to_address(program, body_end or entry_text)
body = AddressSet(body_start_address, body_end_address)
return program.getFunctionManager().createFunction(
name,
entry_address,
body,
SourceType.USER_DEFINED,
)
def remove_function(program, entry_text: str) -> bool:
return bool(program.getFunctionManager().removeFunction(to_address(program, entry_text)))
def rename_function(program, entry_text: str, new_name: str):
from ghidra.program.model.symbol import SourceType
function = get_function(program, entry_text)
if function is None:
raise ValueError(f"no function found at {entry_text}")
function.setName(new_name, SourceType.USER_DEFINED)
return function
def decompile_function(program, function, timeout_seconds: int = 30) -> str:
from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor
interface = DecompInterface()
interface.openProgram(program)
try:
result = interface.decompileFunction(function, timeout_seconds, ConsoleTaskMonitor())
if not result.decompileCompleted():
error_message = result.getErrorMessage() or "decompilation did not complete"
raise RuntimeError(error_message)
decompiled = result.getDecompiledFunction()
if decompiled is None:
raise RuntimeError("decompiler returned no function text")
return decompiled.getC()
finally:
interface.dispose()
def disassemble_function(program, function) -> list[str]:
from ghidra.program.model.listing import CodeUnit
listing = program.getListing()
lines = []
for instruction in iter_java_items(listing.getInstructions(function.getBody(), True)):
line = f"{format_address(instruction.getAddress())}: {instruction.toString()}"
if instruction.getFlowType().isCall():
references = instruction.getReferencesFrom()
if references:
target = references[0].getToAddress()
target_function = program.getFunctionManager().getFunctionAt(target)
if target_function is not None:
line += f" -> {target_function.getName()} @ {format_address(target)}"
else:
line += f" -> {format_address(target)}"
comment = instruction.getComment(CodeUnit.EOL_COMMENT)
if comment:
line += f" ; {comment}"
lines.append(line)
return lines
def _reference_dict(reference) -> dict[str, str | int]:
return {
"from": format_address(reference.getFromAddress()),
"to": format_address(reference.getToAddress()),
"type": str(reference.getReferenceType()),
"operand_index": int(reference.getOperandIndex()),
}
def get_xrefs_to(program, address_text: str, offset: int = 0, limit: int = 100) -> list[dict[str, str | int]]:
reference_manager = program.getReferenceManager()
target_address = to_address(program, address_text)
results = []
skipped = 0
for reference in iter_java_items(reference_manager.getReferencesTo(target_address)):
if skipped < offset:
skipped += 1
continue
results.append(_reference_dict(reference))
if len(results) >= limit:
break
return results
def get_xrefs_from(program, address_text: str, offset: int = 0, limit: int = 100) -> list[dict[str, str | int]]:
reference_manager = program.getReferenceManager()
source_address = to_address(program, address_text)
results = []
skipped = 0
for reference in iter_java_items(reference_manager.getReferencesFrom(source_address)):
if skipped < offset:
skipped += 1
continue
results.append(_reference_dict(reference))
if len(results) >= limit:
break
return results
def list_strings(program, offset: int = 0, limit: int = 2000, filter_text: str | None = None):
listing = program.getListing()
matches = []
skipped = 0
lowered_filter = filter_text.lower() if filter_text else None
for data in iter_java_items(listing.getDefinedData(True)):
if not data.hasStringValue():
continue
text = str(data.getValue())
if lowered_filter and lowered_filter not in text.lower():
continue
if skipped < offset:
skipped += 1
continue
matches.append(
{
"address": format_address(data.getAddress()),
"length": int(data.getLength()),
"text": text,
}
)
if len(matches) >= limit:
break
return matches
def list_imports(program, offset: int = 0, limit: int = 100):
external_manager = program.getExternalManager()
matches = []
skipped = 0
for library_name in external_manager.getExternalLibraryNames():
for location in iter_java_items(external_manager.getExternalLocations(library_name)):
if skipped < offset:
skipped += 1
continue
label = location.getLabel()
address = location.getAddress()
matches.append(
{
"library": str(library_name),
"label": str(label) if label is not None else None,
"address": format_address(address) if address is not None else None,
}
)
if len(matches) >= limit:
return matches
return matches
def list_exports(program, offset: int = 0, limit: int = 100):
symbol_table = program.getSymbolTable()
function_manager = program.getFunctionManager()
matches = []
skipped = 0
for address in iter_java_items(symbol_table.getExternalEntryPointIterator()):
if skipped < offset:
skipped += 1
continue
function = function_manager.getFunctionAt(address)
primary_symbol = symbol_table.getPrimarySymbol(address)
matches.append(
{
"address": format_address(address),
"name": function.getName() if function is not None else (primary_symbol.getName() if primary_symbol is not None else None),
"kind": "function" if function is not None else (str(primary_symbol.getSymbolType()) if primary_symbol is not None else "unknown"),
}
)
if len(matches) >= limit:
break
return matches
def list_namespaces(program, offset: int = 0, limit: int = 100):
from ghidra.program.model.symbol import SymbolType
symbol_table = program.getSymbolTable()
matches = []
skipped = 0
for symbol in iter_java_items(symbol_table.getDefinedSymbols()):
symbol_type = symbol.getSymbolType()
if symbol_type not in (SymbolType.NAMESPACE, SymbolType.CLASS, SymbolType.LIBRARY):
continue
namespace = symbol.getObject()
parent = namespace.getParentNamespace() if namespace is not None else None
if parent is not None and parent.isGlobal():
parent_name = None
else:
parent_name = parent.getName() if parent is not None else None
if skipped < offset:
skipped += 1
continue
matches.append(
{
"name": symbol.getName(),
"type": str(symbol_type),
"parent": parent_name,
}
)
if len(matches) >= limit:
break
return matches
def run_script_file(script_path: Path, globals_dict: dict):
script_globals = dict(globals_dict)
script_globals.setdefault("__name__", "__main__")
script_globals.setdefault("__file__", str(script_path))
code = compile(script_path.read_text(encoding="utf-8"), str(script_path), "exec")
exec(code, script_globals, script_globals)
return script_globals
def set_comment(program, address_text: str, comment: str, comment_type: str):
    from ghidra.program.model.listing import CodeUnit
    comment_types = {
        "pre": CodeUnit.PRE_COMMENT,
        "plate": CodeUnit.PLATE_COMMENT,
        "eol": CodeUnit.EOL_COMMENT,
        "repeatable": CodeUnit.REPEATABLE_COMMENT,
        "post": CodeUnit.POST_COMMENT,
    }
    if comment_type not in comment_types:
        raise ValueError(f"unsupported comment type: {comment_type}")
    listing = program.getListing()
    target_address = to_address(program, address_text)
    code_unit = listing.getCodeUnitAt(target_address)
    if code_unit is None:
        # Fallback: attach the text as the function comment. comment_type is
        # not used on this path.
        function = program.getFunctionManager().getFunctionAt(target_address)
        if function is not None:
            function.setComment(comment)
            return
        raise ValueError(f"no code unit or function found at {address_text}")
    code_unit.setComment(comment_types[comment_type], comment)
def save_program(project, program):
project.save(program)
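
`parse_address_text` above accepts either `segment:offset` hex text or a plain integer literal. A standalone sketch of the same arithmetic, runnable without Ghidra, is shown below. `flat_address` and `split_flat_address` are illustrative names that do not exist in `common.py`, and the inverse mapping assumes offsets fit in 16 bits.

```python
def flat_address(address_text: str) -> int:
    """Mirror of parse_address_text: 'SSSS:OOOO' -> (segment << 16) + offset."""
    text = address_text.strip()
    if ":" in text:
        segment_text, offset_text = text.split(":", 1)
        return (int(segment_text, 16) << 16) + int(offset_text, 16)
    # Base 0 lets int() honor prefixes such as 0x, 0o and 0b.
    return int(text, 0)


def split_flat_address(value: int) -> str:
    """Inverse mapping; assumes the offset part fits in 16 bits (sketch assumption)."""
    return f"{value >> 16:04x}:{value & 0xFFFF:04x}"


print(hex(flat_address("0007:101c")))  # 0x7101c
print(flat_address("0x7101c"))         # 462876
print(split_flat_address(0x7101C))     # 0007:101c
```

Note that a plain decimal string such as `"1000"` is parsed as decimal, not hex, because only the `segment:offset` form forces base 16.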

validate_fixups.py Normal file

@ -0,0 +1,50 @@
import json

with open(r'k:\ghidra\Crusader_Decomp\ne_reloc_fixups.json') as handle:
    fixups = json.load(handle)

known_callf_addrs = {
    '0007:101c': 'entity_ai_update_loop call#1 (entity_slot_fetch)',
    '0007:1093': 'entity_ai_update_loop call#2 (entity_tick_dispatch)',
    '0007:2261': 'snap_entity_to_ground call (ground snap thunk)',
    '0007:27dc': 'anim_frame_update call#1 (completion_callback)',
    '0007:281e': 'anim_frame_update call#3 (notify_frame_progress)',
    '0007:2851': 'anim_frame_update call#4 (entity_sprite_advance)',
    '0007:8666': 'entity_sync_tile_aux thunk (tile_type_notify)',
}


def ghidra_to_file(addr_str):
    seg, off = addr_str.split(':')
    return (int(seg, 16) << 16) + int(off, 16)


# Build a lookup dict by source_file_offset for speed
by_offset = {}
for fixup in fixups:
    by_offset[fixup['source_file_offset']] = fixup

for addr, desc in sorted(known_callf_addrs.items()):
    callf_file = ghidra_to_file(addr)
    print(f"\n{addr} = {desc}")
    print(f" CALLF file offset: 0x{callf_file:X}")
    # The NE fixup offset points to where the patched value goes.
    # For CALLF (9A xx xx xx xx), the operand is at addr+1.
    # But the reloc chain offset is relative to segment start.
    # Probe callf_file+0 through callf_file+4, i.e. every byte of the
    # 5-byte instruction, and stop at the first fixup found.
    for delta in range(0, 5):
        test_off = callf_file + delta
        if test_off in by_offset:
            m = by_offset[test_off]
            tgt = m.get('target', '?')
            tgt_g = m.get('target_ghidra', '?')
            print(f" FOUND at +{delta}: file=0x{test_off:X} seg{m['source_seg']:03d}+0x{m['source_offset_in_seg']:04X}")
            print(f" -> {tgt} (ghidra: {tgt_g})")
            break
    else:
        print(" NOT FOUND in range [+0..+4]")
        # Show what segment this falls in
        # (placeholder: collects the per-segment fixups but prints nothing yet)
        for s in range(1, 146):
            entry = [x for x in fixups if x['source_seg'] == s]
            if entry:
                # not efficient but ok for debugging
                pass
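
The search loop in `validate_fixups.py` probes the bytes of a far-call instruction for a matching fixup. The same lookup can be sketched as a function and exercised without the JSON file. `find_fixup_near` is a hypothetical helper, and the two records below are invented sample data that only borrow the field names used by the script.

```python
def ghidra_to_file(addr_str: str) -> int:
    """Same conversion as in the script: 'SSSS:OOOO' -> (segment << 16) + offset."""
    seg, off = addr_str.split(":")
    return (int(seg, 16) << 16) + int(off, 16)


def find_fixup_near(by_offset: dict, base: int, span: int = 5):
    """Return (delta, fixup) for the first fixup at base+0 .. base+span-1, else None."""
    for delta in range(span):
        fixup = by_offset.get(base + delta)
        if fixup is not None:
            return delta, fixup
    return None


# Invented sample records; real entries come from ne_reloc_fixups.json.
sample_fixups = [
    {"source_file_offset": 0x7101D, "source_seg": 7, "source_offset_in_seg": 0x101D, "target": "example_target"},
    {"source_file_offset": 0x72000, "source_seg": 7, "source_offset_in_seg": 0x2000, "target": "other_target"},
]
by_offset = {record["source_file_offset"]: record for record in sample_fixups}

hit = find_fixup_near(by_offset, ghidra_to_file("0007:101c"))
print(hit[0], hit[1]["target"])
print(find_fixup_near(by_offset, ghidra_to_file("0007:1093")))
```

Returning the delta alongside the record keeps the information the script prints in its `FOUND at +N` line, while a `None` result corresponds to the `NOT FOUND` branch.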