Add various scripts and JSON plans for Ghidra project

- Introduced `seg043_boundary_repair.json` to manage function boundaries in segment 043.
- Created `read_file.py` for reading and printing file content size.
- Added `resolve_bb4f.py` to resolve specific function call targets.
- Implemented `resolve_top_targets.py` to find resolved NE targets for top-called wrapper functions.
- Added `script_contents.txt` to summarize NE relocation far calls.
- Updated `tier4_ghidra.txt`, `tier4_ghidra_check.txt`, `tier4_output.txt`, and `tier4_result.txt` with function call statistics.
- Created `tier5_errors.txt` for error logging and `tier5_output.txt` for additional function call statistics.
- Established `tools` directory with helper scripts for the Ghidra project, including CLI and common functionalities.
- Implemented command-line interface in `cli.py` for various project operations.
- Added `common.py` for shared functions and configurations across tools.
- Introduced `validate_fixups.py` to validate NE relocation fixups against known addresses.
MaddoScientisto 2026-03-20 23:50:39 +01:00
commit 24d4416003
36 changed files with 145712 additions and 14 deletions


@ -31,6 +31,15 @@ applyTo: "**"
- Record raw-import addresses alongside original segment-relative offsets when porting names.
- **Always use `rename_function_by_address`**: `rename_function` (by name) is broken and fails with "must have required property 'old_name'". Use the `"function_address": "000c:XXXX"` format.
# PyGhidra Fallback
- Use the local PyGhidra toolkit in `tools/pyghidra_crusader` when MCP is missing an operation such as function creation, deletion, or batched scripted edits.
- The workspace-local Python environment for this toolkit is `.venv-pyghidra311`, created from `C:\Users\Maddo\.pyenv\pyenv-win\versions\3.11.6\python.exe` and installed from the bundled Ghidra 11.3.2 offline packages.
- Default install dir for the toolkit is `I:\Apps\ghidra_11.3.2_PUBLIC`.
- Invoke the toolkit with `.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader ...` from the repo root.
- Keep PyGhidra batches small too: prefer one focused repair plan or 1-5 direct edits at a time.
- Write operations require the Ghidra project to open successfully. If `Crusader.lock` is present because the GUI owns the project, close Ghidra first or operate on a project copy.
# Current Verified Raw-Import Ports
- `0006:e5d0` = `cursor_update_hover` from seg001 `0x0060`


@ -0,0 +1,112 @@
# PyGhidra Ghidra Ops
Use this skill when Ghidra MCP is missing a needed write operation and you need native CPython access to the Ghidra API for the local Crusader project.
## Use Cases
- Create or delete functions in `CRUSADER-RAW.EXE`.
- Apply small batched repairs driven by verified addresses.
- Add comments or rename functions by address from a repeatable JSON plan.
- Inspect project root files to confirm the program name/path before running edits.
## Workspace Defaults
- Ghidra install dir: `I:\Apps\ghidra_11.3.2_PUBLIC`
- Ghidra project dir: repo root
- Ghidra project name: `Crusader`
- Default program: `CRUSADER-RAW.EXE`
- Local Python env: `.venv-pyghidra311`
- CLI entrypoint: `.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader`
## Constraints
- Stay conservative. Use the same rename and batch-size rules as the main Ghidra workflow.
- Prefer one focused plan or 1-5 direct edits at a time.
- Write operations require the project to be openable for modification. If `Crusader.lock` is present because the GUI owns the project, close Ghidra first or work on a copy.
- Keep `crusader_decompilation_notes.md` updated after verified repair batches.
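The lock-file constraint can be checked before any write command. This is a minimal sketch, assuming the lock file sits directly in the project directory; `project_is_locked` is a hypothetical helper and not part of the toolkit's CLI.

```python
from pathlib import Path


def project_is_locked(project_dir: str, project_name: str = "Crusader") -> bool:
    """Return True if `<project_name>.lock` exists in project_dir."""
    return (Path(project_dir) / f"{project_name}.lock").exists()


if __name__ == "__main__":
    # Assumption: run from the repo root, which is also the Ghidra project dir.
    if project_is_locked("."):
        print("Crusader.lock present: close Ghidra or work on a project copy.")
    else:
        print("No lock file found; write operations may proceed.")
```

A present lock file does not prove the GUI is still running, so treat a positive result as a prompt to check rather than a hard failure.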
## Commands
List root project files:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader project-files
```
Delete a bad function object:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader delete-function --entry 0007:5b6f
```
Create a repaired function with an explicit body:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader create-function `
  --entry 0007:5a90 `
  --name seg043_func_0090 `
  --body-start 0007:5a90 `
  --body-end 0007:5b79 `
  --plate-comment "Recovered from standalone seg043 boundary scan"
```
Rename a function by entry address:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader rename-function --entry 0006:02cc --name entity_class_get_flag20
```
Apply a small JSON plan:
```json
{
"transaction": "Repair seg043 boundaries",
"remove_functions": [
"0007:5b6f"
],
"create_functions": [
{
"entry": "0007:5a90",
"name": "seg043_func_0090",
"body_start": "0007:5a90",
"body_end": "0007:5b79",
"comment": "Recovered from standalone seg043 boundary scan"
},
{
"entry": "0007:5b7a",
"name": "seg043_func_017a",
"body_start": "0007:5b7a",
"body_end": "0007:5c1b"
},
{
"entry": "0007:5c1c",
"name": "seg043_func_021c",
"body_start": "0007:5c1c",
"body_end": "0007:5c80"
}
],
"comments": [
{
"address": "0007:5b6f",
"text": "Old auto-created split overlaps the earlier seg043:0090..0179 routine.",
"type": "plate"
}
]
}
```
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader apply-plan --plan .\seg043_repair.json
```
Dry-run a plan before touching the project:
```powershell
.\.venv-pyghidra311\Scripts\python.exe -m tools.pyghidra_crusader apply-plan --plan .\seg043_repair.json --dry-run
```
## Implementation Notes
- Address strings accept raw `SSSS:OOOO` form or plain integers such as `0x75a90`.
- The CLI tries a few root folder path variants when opening the program so it can tolerate minor project path differences.
- Plan files support `remove_functions`, `rename_functions`, `create_functions`, `comments`, and `assert_functions`.
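The address forms and plan keys listed here can be sanity-checked before a plan is handed to the CLI. The sketch below is an illustration only: `parse_address` and `check_plan` are hypothetical names, the key set is taken from the notes plus the `transaction` field in the example plan, and the real toolkit may parse or validate differently.

```python
import json

# Keys named in the notes, plus "transaction" from the example plan.
KNOWN_PLAN_KEYS = {
    "transaction", "remove_functions", "rename_functions",
    "create_functions", "comments", "assert_functions",
}


def parse_address(text: str) -> int:
    """Accept 'SSSS:OOOO' (hex) or a plain integer such as '0x75a90'."""
    if ":" in text:
        seg, off = text.split(":", 1)
        return int(seg, 16) * 0x10000 + int(off, 16)
    return int(text, 0)


def check_plan(plan: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = [f"unknown key: {k}" for k in sorted(set(plan) - KNOWN_PLAN_KEYS)]
    for fn in plan.get("create_functions", []):
        start = parse_address(fn["body_start"])
        end = parse_address(fn["body_end"])
        if not start <= parse_address(fn["entry"]) <= end:
            problems.append(f"{fn['name']}: entry outside body range")
    return problems


if __name__ == "__main__":
    plan = json.loads(
        '{"create_functions": [{"entry": "0007:5a90", "name": "seg043_func_0090",'
        ' "body_start": "0007:5a90", "body_end": "0007:5b79"}]}'
    )
    print(check_plan(plan))
```

Running a check like this alongside `--dry-run` catches typos in key names, which a dry run might silently ignore.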

4
.gitignore vendored

@ -5,9 +5,13 @@ ghidra_*
*.swp
*.lock
*.lock~
.tmp_*
# IDE and OS files
.vscode/
.idea/
.DS_Store
Thumbs.db
# Local Python environments
.venv-pyghidra311/


@ -30,6 +30,33 @@
- Naming note:
- `seg001` and `seg021` both contain a keyboard handler; in the full program database, the seg001 copy is named `seg001_input_keyboard_handler` to avoid a symbol collision with seg021 `input_keyboard_handler`.
### Address Space Layout in the Raw Import
Ghidra segment:offset `SSSS:OOOO` = flat address `SSSS * 0x10000 + OOOO`.
| Flat range | Content |
|---|---|
| `0x00000`–`0x36F6F` | Phar Lap 286 DOS extender (outer MZ stub code) |
| `0x36F70` | NE header (145-segment game image begins here in file) |
| `0x6E570`+ | NE game segments at their Phar Lap linear load addresses |
Mapping rule (verified for seg001 and seg021):
```
runtime_flat_base = NE_segment_file_offset + 0x36F70
```
Example: seg004 at file `0x40A00` → runtime `0x77970` → Ghidra `0007:7970`.
Functions at Ghidra `0003:XXXX` / `0004:XXXX` are **Phar Lap extender code** (flat < `0x40000` is below any game segment). Functions at `0006:E570`+ are game NE segments.
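The mapping rule and the `SSSS:OOOO` formatting can be reproduced in a few lines. This is a small sketch using only the constants stated above; the function names are illustrative.

```python
NE_HEADER_OFFSET = 0x36F70  # NE header offset inside the EXE, per the table above


def flat_to_ghidra(flat: int) -> str:
    """Format a flat address as SSSS:OOOO (flat = SSSS * 0x10000 + OOOO)."""
    return f"{flat >> 16:04x}:{flat & 0xFFFF:04x}"


def ne_file_offset_to_flat(file_offset: int) -> int:
    """Apply the rule runtime_flat_base = NE_segment_file_offset + 0x36F70."""
    return file_offset + NE_HEADER_OFFSET


if __name__ == "__main__":
    flat = ne_file_offset_to_flat(0x40A00)  # seg004 example from the notes
    print(hex(flat), flat_to_ghidra(flat))  # prints "0x77970 0007:7970"
```

The rule has only been verified for seg001 and seg021, so results for other segments should be checked against a known function before names are ported.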
### `0000:ffff` — NE Fixup Placeholder (not a dispatcher)
`unresolved_far_thunk_dispatch` at `0000:ffff` is NOT a runtime function. Every `CALLF 0x0000:ffff` in the binary is a **different** external or inter-segment call patched by the NE loader at runtime. The decompiler body is garbled (it reads NE fixup-chain sentinel data). Decompiler comment added in Ghidra. See individual call sites for per-site behavioral annotations.
Known call-site classifications (by argument pattern):
- `PUSH DS; PUSH imm_ordinal; CALLF` — Phar Lap extender calling a runtime-imported procedure by ordinal
- `PUSH ptr_seg; PUSH ptr_off; CALLF` — inter-NE-segment function call (intra-game far call)
- Multiple typed pushes then CALLF — external C runtime / game subsystem call with normal args
### Latest Raw Full-EXE Porting Progress
- Newly ported and renamed into `CRUSADER-RAW.EXE` from verified `seg001` mapping (`base 0x6E570`):
@ -57,7 +84,9 @@
- Current verified behavior:
- `entity_sync_tile_aux_state` reads entity tile index at `+0x4`, toggles bit `0x04` in tile record `+0x59` based on entity byte `+0x54`, and copies entity word `+0x55` into tile record `+0x0d`.
- `entity_sync_tile_aux_if_linked` only performs the sync when entity link/pointer `+0x50/+0x52` is non-null.
- `entity_mark_dirty_and_sync_tile_aux` calls the linked-sync helper, sets entity flag bit `0x04` at `+0x42`, then enters the existing unresolved thunk path (`0000:ffff`).
- `entity_mark_dirty_and_sync_tile_aux` calls the linked-sync helper, sets entity flag bit `0x04` at `+0x42`, then calls through `0000:ffff` with args `(SS:&tile_index, entity[+0x57])` — annotated at `0007:8666` as `entity_tile_type_notify(tile_index_ptr, type_byte)`.
- New entity field found this pass:
- `entity[+0x57]` (byte) = entity type/class byte (passed to tile-type notification; meaning not yet fully established — adjacent to named fields `+0x54`/`+0x55`)
### Raw 0007 Gameplay Helper Batch (facing/direction)
@ -125,9 +154,25 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
}
```
#### Next RE target (to close remaining uncertainty)
#### Architectural Resolution: `unresolved_far_thunk_dispatch` / `0000:ffff`
- Recover the true callee behind `0000:ffff` for the `0007:224b` call site by relocation/import-table reconstruction or by matching this call path in a cleaner segment-mapped database. That should reveal exact per-slot use of the two dispatch tables and final coordinate math.
**`unresolved_far_thunk_dispatch` is NOT a real dispatcher.** It is the NE binary fixup placeholder.
- In a Phar Lap 286 NE executable, inter-segment and external far calls are stored in the binary as `CALLF 0x0000:ffff` (or similar invalid sentinel values).
- The Phar Lap NE loader patches each of these call sites to the real segment:offset at load time using the per-segment relocation records in the NE file.
- In Ghidra's raw import, those fixups are never applied. Every unresolved far call collapses to the same `0000:ffff` stub, where the decompiler produces garbled output (it's reading fixup-chain data, not real instructions).
- **Each `CALLF 0x0000:ffff` in the binary is a DIFFERENT call with a DIFFERENT actual target.** Identifying the target requires either parsing the NE relocation table or cross-matching with the resolved standalone segment extracts.
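Parsing the relocation table can be sketched as follows. The layout assumed here is the standard NE per-segment relocation block (a 16-bit record count followed by 8-byte records, with non-additive fixups chained through the patched words and terminated by `0xFFFF`); whether the Phar Lap loader follows it exactly has not been confirmed in these notes, and all names are illustrative.

```python
import struct
from typing import NamedTuple

TARGET_KINDS = {0: "internal", 1: "import_ordinal", 2: "import_name", 3: "os_fixup"}


class Reloc(NamedTuple):
    source_type: int  # e.g. 2 = segment, 3 = far pointer, 5 = offset
    kind: str         # decoded from the low two flag bits
    additive: bool    # flag bit 0x04
    chain_start: int  # segment-relative offset of the first patched location
    target: tuple     # (segment, offset) or (module, ordinal / name offset)


def parse_relocs(blob: bytes) -> list[Reloc]:
    """Parse the relocation block assumed to follow a segment's data in the file."""
    (count,) = struct.unpack_from("<H", blob, 0)
    out = []
    for i in range(count):
        src, flags, off, t1, t2 = struct.unpack_from("<BBHHH", blob, 2 + 8 * i)
        kind = TARGET_KINDS[flags & 0x03]
        # Internal references keep the segment number in the low byte of t1.
        target = (t1 & 0xFF, t2) if kind == "internal" else (t1, t2)
        out.append(Reloc(src, kind, bool(flags & 0x04), off, target))
    return out


def walk_chain(segment: bytes, start: int) -> list[int]:
    """Follow a non-additive chain: each patched word links to the next site."""
    sites, off, seen = [], start, set()
    while off != 0xFFFF and off not in seen and off + 2 <= len(segment):
        seen.add(off)
        sites.append(off)
        (off,) = struct.unpack_from("<H", segment, off)
    return sites
```

Under this layout, a `CALLF 0x0000:ffff` operand is simply the last link of a chain, which is consistent with every such call site sharing one placeholder while having a different real target.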
Address layout in the raw import (`SSSS:OOOO` maps to flat address `SSSS * 0x10000 + OOOO`):
- `0000:`–`0003:` (flat < `0x40000`) = Phar Lap 286 DOS extender code (the outer MZ stub portion)
- `0006:E570` onwards = NE game segments (seg001+ at their Phar Lap-assigned linear addresses)
- Mapping rule verified: `runtime_flat = NE_segment_file_offset + 0x36F70` (the NE header offset in the EXE)
Decompiler comment added to `0000:ffff` in Ghidra documenting this.
#### Next RE targets for `snap_entity_to_ground`
- The `0007:224b` thunk call is an intra-NE inter-segment call (calling into a different game segment with ground-aligned coordinate math). Identifying it requires the NE relocation table or matching the disassembly in the standalone extracts.
### Raw 0007 Gameplay Helper Follow-up: AI sweep + checked spawn path
@ -150,8 +195,11 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
- Added disassembly + decompiler comments capturing stable behavior:
- Reads player entity FAR pointer from global `0x2de4`.
- Copies player world position fields (`+0x40`, `+0x42`) into globals `0x27e7` / `0x27e9` (AI focus position cache used by downstream logic).
- Iterates entity IDs from `2` through `255` and dispatches per-entity processing through the shared thunk path.
- This function now has enough recovered semantics to treat it as the frame-level AI sweep dispatcher even though individual thunked callees remain unresolved in the raw import.
- Iterates entity IDs from `2` through `255` and dispatches per-entity processing through two sequential thunked calls per entity.
- New disassembly comments added at both dispatch call sites:
- `0007:101c`: `entity_slot_fetch(SS:&entity_id)` — first call; resolves entity slot/pointer from loop ID
- `0007:1093`: `entity_tick_dispatch(SS:&entity_id, g_0x27c8)` — second call; per-entity AI tick with global `0x27c8` mode/context word
- Global `0x27c8` is now confirmed as the current targeted/current entity handle: `entity_is_type_match` compares against it directly, and both spawn helpers `map_find_spawn_point` / `enemy_spawn_at_position` snapshot it before their thunked core paths.
### Raw 0007 Gameplay Logic: animation / range / command globals
@ -172,14 +220,18 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
- `g_speed_double_flag` (`0x27fd`) — doubles speed_factor to 2 when set (fast game mode).
- Local variables renamed: `speed_factor` (1 or 2) and `advance_steps` (0–4, number of frame advances this tick).
- Entity struct fields confirmed (relative to `entity_ptr` as `int*`):
- `[0x1b]` = frame_min (backward direction counter)
- `[0x1c]` = frame_max
- `[0x1d]` = current_frame
- `[0x1e]` = loop_flag
- `[0x1f]` = reverse_direction_flag
- `+0x3f` (as `char*`) = completion handle/sentinel (`-1` = none, `0x2802` = player entity)
- On frame overflow: if completion handle valid and not player-entity, fires thunked event; calls vtable `[+8]` method.
- Added decompiler comment at function entry explaining all fields and behavior.
- `[0x1b]` (byte `+0x36`) = frame_min (backward direction counter)
- `[0x1c]` (byte `+0x38`) = frame_max
- `[0x1d]` (byte `+0x3a`) = current_frame
- `[0x1e]` (byte `+0x3c`) = loop_flag (0 = animation disabled)
- `[0x1f]` (byte `+0x3e`) = reverse_direction_flag / double-speed flag
- `+0x3f` (word, byte-offset) = completion handle/sentinel (`-1` = none, `0x2802` = player entity)
- `+0x00` (far ptr) = vtable pointer
- New disassembly comments added at all three `CALLF 0x0000:ffff` sites and the vtable indirect call:
- `0007:27dc`: `entity_completion_callback(handle)` — fires when loop wraps; skips player handle
- `0007:27fd`: vtable indirect `entity->vtable[+8](entity, 0, 0)`, the `on_loop_complete` virtual method
- `0007:281e`: `notify_frame_progress(handle, current_frame)` — per-frame notification
- `0007:2851`: `entity_sprite_advance(entity_far_ptr, advance_amount, 0)` — core frame-advance call; advance_amount = `entity[+0x3c] * (steps+1) * speed_factor`
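The index-to-offset correspondence and the advance formula can be worked through directly. This is a small sketch, assuming 2-byte `int` elements (16-bit code); the names are illustrative and the formula is copied from the `0007:2851` annotation.

```python
WORD_SIZE = 2  # assumption: 16-bit code, so an int* index advances two bytes

FIELDS = {  # int* index -> field name, from the notes above
    0x1B: "frame_min",
    0x1C: "frame_max",
    0x1D: "current_frame",
    0x1E: "loop_flag",
    0x1F: "reverse_direction_flag",
}


def byte_offset(word_index: int) -> int:
    """Convert an int* index into a byte offset from entity_ptr."""
    return word_index * WORD_SIZE


def advance_amount(field_3c: int, steps: int, speed_factor: int) -> int:
    """entity[+0x3c] * (steps + 1) * speed_factor, as annotated at 0007:2851."""
    return field_3c * (steps + 1) * speed_factor


if __name__ == "__main__":
    for index, name in FIELDS.items():
        print(f"[{index:#04x}] -> +{byte_offset(index):#04x}  {name}")
```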
#### `entity_command_dispatch` (`0007:0990`) — partially decompiled
@ -191,10 +243,24 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
- Dispatches entity command through shared thunk; actual command table data not yet resolved.
- No incoming XREFs found in the raw import (likely called via table or vtable dispatch).
#### Enemy spawn helper cluster (`0007:505d`, `0007:5259`, `0007:5275`, `0007:5291`)
- Existing raw names align with prior standalone seg001 notes:
- `0007:505d` = `map_find_spawn_point` (`seg001 + 0x6aed`)
- `0007:5259` = `enemy_spawn_with_target` (`seg001 + 0x6ce9`)
- `0007:5275` = `enemy_spawn_no_target` (`seg001 + 0x6d05`)
- `0007:5291` = `enemy_spawn_at_position` (`seg001 + 0x6d21`)
- Current verified raw-import behavior:
- `enemy_spawn_with_target` is a thin wrapper over `enemy_spawn_at_position(..., target_player_flag = 1)`.
- `enemy_spawn_no_target` is the same wrapper but passes `target_player_flag = 0`.
- `map_find_spawn_point` and `enemy_spawn_at_position` both copy DS:`0x27c8` into locals before entering their unresolved thunk body, matching the standalone notes that treat `0x27c8` as the current targeted/current entity handle.
- Short decompiler comments were added in Ghidra on the raw spawn helpers to preserve this provenance.
#### Global map additions (renamed in Ghidra)
| Address | Name | Evidence |
|---------|------|---------|
| `0x27c8` | `g_current_entity_handle` | Compared directly by `entity_is_type_match`; also captured by `entity_ai_update_loop`, `map_find_spawn_point`, and `enemy_spawn_at_position` as the current targeted/current entity handle |
| `0x2de4` | `g_player_entity_farptr` | FAR ptr to player entity; `+0x40`/`+0x42` are world X/Y |
| `0x27e7` | `g_ai_focus_pos_x` | Set by `entity_ai_update_loop` from player entity `+0x40` |
| `0x27e9` | `g_ai_focus_pos_y` | Set by `entity_ai_update_loop` from player entity `+0x42` |
@ -219,12 +285,14 @@ void snap_entity_to_ground(entity_type, spawn_x, spawn_y, spawn_layer) {
- `000e:35ef` = `record_table_next_slot`
- `000e:3639` = `record_table_parse_buffer`
- `000e:3798` = `record_parser_read_line`
- `000e:38a0` = `record_parser_seek_next_marker`
- `000e:38f8` = `record_parser_find_marker`
- `000e:39cc` = `record_parser_dispatch_at_directive`
- Current behavior read from raw-import decompilation/disassembly:
- `record_table_init` clears the table header and zeroes 300 words of inline storage.
- `record_table_parse_buffer` walks a CRLF-separated text buffer, captures each line, splits around a marker helper path, and stores parsed entry state into 0x0c-byte records.
- `record_parser_read_line` advances to the next CRLF-delimited line, rejects lines that start with `@` or with non-identifier punctuation, and terminates the line in-place with `0`.
- `record_parser_seek_next_marker` updates the parser's current marker cursor at `+0x18/+0x1a` by calling `record_parser_find_marker`; returns 1 if another marker was found, 0 at end-of-data.
- `record_parser_find_marker` scans forward until an `@` marker or end-of-data; optionally consumes the remaining length from the parser state.
- `record_parser_dispatch_at_directive` returns `0` unless the current substring begins with `@`; in the `@` case, it advances by 7 bytes and dispatches through a FAR thunk (`0000:ffff`).
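The line and marker handling described above can be modeled on a byte buffer. This is a rough behavioral sketch, not a transcription of the recovered code: it assumes "identifier" means a leading letter, digit, or underscore, and it returns slices rather than terminating lines in place.

```python
def is_record_line(line: bytes) -> bool:
    """Accept lines whose first byte is a letter, digit, or '_' (assumed rule)."""
    return bool(line) and (line[:1].isalnum() or line[:1] == b"_")


def read_lines(buf: bytes):
    """Yield CRLF-delimited lines, skipping '@' and punctuation-led lines."""
    for line in buf.split(b"\r\n"):
        if is_record_line(line):
            yield line


def find_marker(buf: bytes, pos: int = 0) -> int:
    """Return the index of the next '@' at or after pos, or len(buf) at end-of-data."""
    idx = buf.find(b"@", pos)
    return len(buf) if idx < 0 else idx


if __name__ == "__main__":
    sample = b"alpha=1\r\n@include x\r\n;comment\r\n_beta\r\n"
    print(list(read_lines(sample)))
    print(find_marker(sample))
```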
@ -758,7 +826,23 @@ A scroll/camera management cluster found in the `0007:bxxx`–`0007:dxxx` range.
| Address | Name | Evidence |
|---------|------|---------|
| `0007:5b6f` | `entity_set_at_target_update_facing` | Sets entity `+0x3a = 1` (arrived flag); calls `entity_set_facing_direction`; clears bit `0x10` from entity type table `0x7e1e[type*0x79+0x59]`; tail-calls thunk to advance state. Called in the entity state machine context. |
| `0007:5b6f` | `entity_set_at_target_update_facing` *(likely internal block, not true top-level function)* | Direct raw-analysis name from the visible local behavior: sets entity `+0x3a = 1` (arrived flag); calls `entity_set_facing_direction`; clears bit `0x10` from entity type table `0x7e1e[type*0x79+0x59]`; then tail-calls onward. Relocation data places it at `seg043:016f`, and resolved call sites exist immediately before/after it (`5b36`, `5b44`, `5bb9`), so this address is likely an internal labeled block inside the larger missing `0007:5a00` seg043 function rather than a true entrypoint. |
### seg043 Standalone Boundary Recovery
- Direct disassembly of `NE_segments/seg043_code_off_75A00_len_336F.bin` shows the first non-zero bytes at offset `0x0090`; offsets `0x0000..0x008f` are all zero in the standalone extract.
- The first three clean 16-bit prologues in seg043 are at:
- `seg043:0090` -> raw `0007:5a90`
- `seg043:017a` -> raw `0007:5b7a`
- `seg043:021c` -> raw `0007:5c1c`
- The first recovered standalone function spans `0x0090..0x0179`, which means the current raw label at `0007:5b6f` falls inside the tail of that routine and overlaps the true return at raw `0007:5b79`.
- Practical consequence: the missing raw `0007:5a00` seg043 function boundary should not start at segment offset `0x0000`, and the current `0007:5b6f` function object should be treated as a mis-split internal block until Ghidra-side function creation/repair is available.
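The overlap argument can be checked mechanically. The sketch below uses the seg043 base implied by the mappings above (`0007:5a00`) and the body ranges from the example repair plan; the helper names are illustrative.

```python
SEG043_BASE = 0x75A00  # raw 0007:5a00, i.e. seg043:0000

# (start, end) segment offsets, inclusive. The first span is stated in the
# notes; the other two are the body ranges used in the example repair plan.
RECOVERED = [(0x0090, 0x0179), (0x017A, 0x021B), (0x021C, 0x0280)]


def to_raw(seg_offset: int) -> str:
    """Convert a seg043 offset to a raw-import SSSS:OOOO address."""
    flat = SEG043_BASE + seg_offset
    return f"{flat >> 16:04x}:{flat & 0xFFFF:04x}"


def containing_span(seg_offset: int):
    """Return the span holding seg_offset as a non-entry address, else None."""
    for start, end in RECOVERED:
        if start < seg_offset <= end:
            return start, end
    return None


if __name__ == "__main__":
    suspect = 0x016F  # raw 0007:5b6f
    span = containing_span(suspect)
    if span:
        print(f"{to_raw(suspect)} lies inside {to_raw(span[0])}..{to_raw(span[1])}")
```

An address that equals a span's start is a legitimate entry point, so only strictly interior offsets are reported.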
### Entity Class Flag Helper
| Address | Name | Evidence |
|---------|------|---------|
| `0006:02cc` | `entity_class_get_flag20` | Returns `((class_detail[type*0x79 + 0x59] & 0x20) >> 5)`. Conservative raw-analysis name; bit meaning still unknown, so the helper is named after the observed flag mask rather than a guessed behavior. |
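The expression in the table can be modeled on a plain byte buffer. In this sketch `class_detail` stands in for the in-memory class table; how the real code locates that table is not shown here.

```python
CLASS_STRIDE = 0x79  # bytes per class record
FLAGS_OFFSET = 0x59  # flag byte inside each record


def entity_class_get_flag20(class_detail: bytes, type_index: int) -> int:
    """Return bit 5 (mask 0x20) of the class flag byte as 0 or 1."""
    return (class_detail[type_index * CLASS_STRIDE + FLAGS_OFFSET] & 0x20) >> 5


if __name__ == "__main__":
    table = bytearray(CLASS_STRIDE * 3)         # three zeroed records
    table[2 * CLASS_STRIDE + FLAGS_OFFSET] = 0x24  # set bits 0x20 and 0x04 for type 2
    print(entity_class_get_flag20(table, 1), entity_class_get_flag20(table, 2))
```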
### Animation Start Frame Helper
@ -1213,6 +1297,278 @@ Globals: `[0x63da]` = mouse button state, `[0x63d6]/[0x63d8]` = cursor X/Y, `[0x
| Address | Name | Evidence |
|---------|------|---------|
| `000c:dac1` | `cursor_nav_state_reset` | Zeros all directional/button flags; sets `[+0x32/+0x33]=0xff`, `[+0x47]=0xffff` |
## Top-40 Most-Called Far-Call Targets (NE Fixup Resolution)
Named via systematic analysis of 11,692 NE relocation fixup entries. These are the functions most frequently called through the `CALLF 0x0000:ffff` thunk mechanism.
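A ranking like the tables below can be produced by tallying resolved fixup targets. This is a minimal sketch, assuming the fixups are already available as `(call_site, target)` pairs; it is not the repository's `resolve_top_targets.py`, and it counts call sites rather than distinct calling functions.

```python
from collections import Counter


def rank_targets(fixups, top=40):
    """fixups: iterable of (call_site, target) pairs.

    Returns [(target, call_count), ...] sorted by descending count.
    """
    counts = Counter(target for _site, target in fixups)
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical sample data, not real fixup records.
    sample = [("site_a", "target_x"), ("site_b", "target_x"), ("site_c", "target_y")]
    for target, calls in rank_targets(sample):
        print(target, calls)
```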
### Tier 1: Top 20 (73+ callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 1 | `000a:44fd` | *(no function in Ghidra)* | 331 | Analysis gap at seg091:00fd. In comutils.c segment near joystick code. Needs manual function creation. |
| 2 | `0003:ac7e` | `mem_alloc` | 272 | Allocation wrapper → seg082:0000 (`0009:a200`) |
| 3 | `0008:dbec` | `entity_word_list_destroy` | 238 | Already named. Frees entity word-list buffer. |
| 4 | `0003:a751` | `mem_free` | 207 | Free wrapper → seg082:007a (`0009:a27a` = `mem_free_checked`) |
| 5 | `0008:bb4f` | `mem_alloc_far` | 174 | Thin wrapper → `mem_alloc` |
| 6 | `0003:a897` | `far_memcpy` | 165 | REP MOVSW + trailing MOVSB |
| 7 | `0005:088f` | `entity_get_type_word` | 130 | Returns type word from table 0x7df9 indexed by slot |
| 8 | `000b:358d` | `sprite_tree_accumulate_pos` | 122 | Recursively sums X/Y offsets (+0x21/+0x23) through linked child nodes (+0x19/+0x1b), copies 8-byte position block via far_memcpy |
| 9 | `0008:ce3d` | `entity_call_two_vtables` | 118 | Calls vtable[+4] at entity+0x1e and +0x28 |
| 10 | `0004:26cd` | `nop_void_stub` | 118 | Empty function, returns void |
| 11 | `0008:ce00` | `entity_call_two_vtables_base` | 117 | Calls vtable[0] at entity+0x1e and +0x28 |
| 12 | `0008:bb8c` | `entity_check_flag_0x4000` | 115 | Short-circuits if flag 0x4000 set at +0x16 |
| 13 | `0008:cda7` | `entity_free_both_word_lists` | 115 | Frees word lists at entity+0x1e and +0x28 if optional pointers at +0x24/+0x26 and +0x2e/+0x30 non-null. Both call `entity_word_list_free_existing`. |
| 14 | `0004:26d2` | `nop_void_stub_b` | 111 | Empty function, returns void |
| 15 | `000a:45fe` | `runtime_init_or_abort` | 108 | Reentrancy-guarded init. Flag at 0x44a4; flushes via FUN_000a_4a56, then calls `crt_exit_wrapper(1)`. Hidden code gap 0x4616-0x4643. |
| 16 | `0004:3324` | `nop_return_zero` | 95 | Returns 0 |
| 17 | `0009:c563` | `event_queue_push` | 82 | Circular buffer enqueue. Ring index (+0xe) masked 0x3f, slot masked 0xfff8. Writes event type word + data byte pair. |
| 18 | `0005:c448` | `list_remove_and_free` | 74 | Unlinks node from linked list via FUN_0005_c495, optionally calls `mem_free` if bit 0 of flags set |
| 19 | `000b:2e00` | *(no function in Ghidra)* | 74 | Analysis gap at seg109:0000. Needs manual function creation. |
| 20 | `0009:1f12` | `dos_file_lseek` | 73 | DOS LSEEK (INT 21h AH=42h) wrapper with error reporting to 0x867a |
### Tier 2: Ranks 21-40 (56-73 callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 21 | `0009:3600` | `rotating_buffer_advance` | 73 | Advances 5-slot circular counter at 0x3eb6, zeros pointer in table at 0x867c, dispatches via jump table |
| 22 | `0009:943a` | `entity_rect_compare_and_dispatch` | 68 | Compares bounding rectangles of two entities, dispatches based on flag bits 4/2/1 at +0x16 |
| 23 | `0009:1e61` | `dos_file_close` | 65 | DOS file close (INT 21h), error reporting, sets handle to -1 |
| 24 | `0005:e252` | *(unnamed — unclear)* | 65 | Copies 11 words from Phar Lap extender area (FUN_0000_12c6+5), then calls thunk. Interrupt/trampoline setup? |
| 25 | `0003:dbcc` | `crt_format_string` | 64 | MetaWare High C formatting wrapper. Calls FUN_0003_bb92 with runtime format dispatch table. |
| 26 | `0007:5a00` | *(no function in Ghidra)* | 64 | High-traffic raw target at `seg043:0000`. Earlier `debris_spawn` / seg001 mapping was rejected after checking relocation labels. Still needs manual function creation and direct analysis. |
| 27 | `000a:4742` | `assert_buffer_valid` | 63 | Validates handle: asserts param_2 == cookie at 0x45a6 and param_1 < limit at 0x87e0 |
| 28 | `0009:9216` | `entity_conditional_render_dispatch` | 63 | Checks entity flag bits 4 and 1 at +0x16, dispatches to vtable[+0xc] or thunk |
| 29 | `0008:cb2c` | `entity_flag20_clear_and_update_target` | 61 | *(already named)* Clears flag bit 0x20, writes target +0x12/+0x14, calls refresh |
| 30 | `0008:cb5c` | `entity_flag20_set_and_init_target` | 61 | *(already named)* Sets flag bit 0x20, inits target if zero, calls refresh |
| 31 | `0007:7306` | `entity_create_stack_object` | 58 | Allocates 0xCC bytes on stack, inits via `object_init_zero_fields` (0005:c400), calls thunk |
| 32 | `0007:8709` | `entity_mark_dirty_and_sync_tile_aux` | 58 | *(already named)* Syncs tile aux, sets flag bit 0x04 at +0x42 |
| 33 | `0007:87c5` | `entity_set_flag20_from_field42` | 58 | Reads entity+0x42/+0x44, calls `entity_flag20_set_and_init_target` with those values |
| 34 | `0007:8508` | `entity_table_lookup_and_dispatch` | 58 | *(already named)* Searches table at 0x2b46, dispatches via indirect jump |
| 35 | `0007:8920` | `entity_call_vtable_slot0c` | 58 | *(already named)* Calls vtable entry at +0x0c |
| 36 | `000a:b988` | `sprite_node_get_or_traverse` | 57 | If child pointer at +0x19/+0x1b non-null, traverses; otherwise returns leaf value |
| 37 | `0003:a98b` | `crt_signed_div32` | 56 | Entry: adjusts near→far stack, sets CX=0 (signed quotient), jumps to `crt_div32_impl` |
| 38 | `000a:7b44` | `nop_return_void_a` | 56 | Empty function (default vtable slot?) |
| 39 | `000a:7b49` | `nop_return_void_b` | 56 | Empty function (default vtable slot?) |
| 40 | `000a:7b53` | `nop_return_void_c` | 56 | Empty function (default vtable slot?) |
### Supporting Functions Discovered
| Address | Name | Description |
|---------|------|-------------|
| `000b:3a00` | `sprite_tree_sum_x_offset` | Recursive: sums field +0x21 through child chain +0x19/+0x1b |
| `000b:3a35` | `sprite_tree_sum_y_offset` | Recursive: sums field +0x23 through child chain +0x19/+0x1b |
| `0003:a845` | `crt_exit_wrapper` | Calls `crt_exit_impl(param,0,0)` |
| `0003:a7ee` | `crt_exit_impl` | Full C exit: atexit handlers, stdio flush, MetaWare runtime cleanup |
| `0003:a9a8` | `crt_div32_impl` | 32-bit division core. CX flags: bit0=unsigned, bit1=modulo, bit2=negate |
| `0005:c400` | `object_init_zero_fields` | Zeros fields +0x25, +0x29, +0x31, +0x32 of a struct. Returns pointer. |
| `000a:4440` | `joystick_read_axes_and_buttons` | Reads PC game port 0x201. Times axis responses, reads button nibble to 0x44a2 |
| `000b:3380` | `sprite_node_is_dirty` | Checks flags at obj+0x29 & 3 == 1 or 3 → returns bool |
| `000b:33a6` | `sprite_node_mark_dirty` | If not dirty, calls FUN_000b_3965 with mode=3 to invalidate |
### Tier 3: Ranks 41-60 (42-56 callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 41 | `000a:7b58` | `nop_return_zero_b` | 56 | Returns 0 (default vtable slot) |
| 42 | `000b:3ab2` | `sprite_node_dispatch_event` | 56 | Large event dispatch: checks event type (2/4/8/0x100), updates global focus ptr at [0x4fd0:4fd2], dispatches via vtable methods [+0x14/+0x18/+0x20/+0x24] by event code. Switch table for 16 event types. |
| 43 | `000a:48ff` | *(no function in Ghidra)* | 55 | Analysis gap in comutils.c segment |
| 44 | `000b:3362` | `sprite_tree_unwind_check` | 55 | Validates SS == param_2 (stack segment guard), then decrements global counter at [0x4fd6] |
| 45 | `000b:40ee` | `sprite_node_update_and_dispatch` | 55 | If `sprite_node_is_dirty` returns false: marks dirty, calcs accumulated bounds via `sprite_tree_get_accumulated_bounds` (3ed8), then dispatches via thunk |
| 46 | `000a:7b5f` | `vtable_stub_trampoline` | 55 | Calls through fixup thunk (forwarder to another function) |
| 47 | `000a:7b78` | `nop_return_void_e` | 55 | Empty function (default vtable slot) |
| 48 | `000a:7b7d` | `nop_return_void_f` | 55 | Empty function (default vtable slot) |
| 49 | `000a:7b4e` | `nop_return_void_d` | 54 | Empty function (default vtable slot) |
| 50 | `000b:330c` | `sprite_tree_dispatch_wrapper` | 52 | Pure thunk wrapper: calls through fixup |
| 51 | `0009:2034` | `dos_file_seek` | 51 | INT 21h AH=42h (LSEEK). Takes file object ptr, extracts handle at obj+4, seeks to offset param. Error reporting to [0x867a]. |
| 52 | `0005:0466` | `entity_resolve_slot_ptr` | 50 | *(already named)* |
| 53 | `0003:a880` | *(no function in Ghidra)* | 49 | Analysis gap in CRT segment |
| 54 | `0006:170c` | `tile_class_get_byte` | 47 | Looks up class data: indexes into table at [0x7e1e] by (*param_1 * 0x79), returns byte at offset +0xc |
| 55 | `000b:4097` | `sprite_dispatch_with_event` | 45 | Pushes event params + global [0x49c2:0x49c4], calls thunk |
| 56 | `0005:02c1` | `entity_is_type_match` | 43 | Compares *param_1 against global at [0x27c8], returns 1 if equal, 0 otherwise |
| 57 | `0003:ad75` | *(no function in Ghidra)* | 43 | Analysis gap in CRT segment |
| 58 | `000a:e709` | `render_dispatch_by_flag` | 43 | Dispatches between two thunk paths based on boolean flag at stack+0x10 |
| 59 | `0003:d0ff` | `crt_sprintf_wrapper` | 42 | Calls FUN_0003_bb92 (format engine) with rearranged params and string constant at 0x67ac |
| 60 | `000b:326e` | `sprite_node_destroy` | 42 | Destructor: sets vtable ptr to 0x501a, clears global [0x4fd0:4fd2] if self, releases child nodes, calls mem_free via thunk |
### Updated Analysis Gaps
`0007:5a00` / `0007:5b6f` reconciliation:
- The earlier standalone seg001 port hypothesis in this subrange was wrong.
- Relocation data places raw `0007:5a00` at `seg043:0000`, and the already-named helper at `0007:5b6f` sits at `seg043:016f`.
- Because of that segment placement, standalone seg001 names such as `debris_spawn` (`0x7490`) and `entity_die` (`0x75ff`) should NOT be ported into this raw range.
- `0007:5b6f` currently remains `entity_set_at_target_update_facing` from direct raw analysis; its behavioral name is no longer in conflict with the standalone seg001 `entity_die` note.
- Additional resolved call targets inside the missing seg043 block were annotated in Ghidra from relocation data:
- `0007:5a8a` -> `entity_set_event_type_checked`
- `0007:5a98` -> `FUN_0008_cc01` (timer-related flag/event helper; tests `+0x16 & 0x2`, sets `+0x16 |= 0x800`, copies event field `+0x06` to `+0x22`, checks `0x1000`, then conditionally dispatches)
- `0007:5b36` -> `entity_get_type_word`
- `0007:5b44` -> `saveslot_read_entry_flags`
- `0007:5bb8` -> `entity_is_type_match`
- `0007:5c49` -> `entity_class_get_flag20`
- `0007:5c8b` -> `mem_alloc_far`
- Current boundary caveat:
- Ghidra likely split the real seg043 routine incorrectly. `0007:5b6f` has no inbound xrefs, while relocation-resolved calls exist on both sides of it inside the same segment window. Treat the current `0007:5b6f` label as a behavioral anchor for one internal block, not yet as a proven standalone function boundary.
- Standalone seg043 disassembly now strengthens that conclusion: real prologues are at raw `0007:5a90`, `0007:5b7a`, and `0007:5c1c`, so the current `0007:5b6f` boundary demonstrably overlaps an earlier function.
| Address | NE Segment | Callers | Notes |
|---------|-----------|---------|-------|
| `000a:44fd` | seg091:00fd | 331 | #1 most-called target! In comutils.c segment. |
| `000b:2e00` | seg109:0000 | 74 | Start of segment 109. |
| `0007:5a00` | seg043:0000 | 64 | Start of segment 43. Earlier seg001 `debris_spawn` port was rejected; still needs manual function creation and direct analysis. |
| `000a:48ff` | seg091:04ff | 55 | In comutils.c segment near joystick code. |
| `0003:a880` | seg005:0880 | 49 | In CRT segment near `far_memcpy`. |
| `0003:ad75` | seg005:0d75 | 43 | In CRT segment near `mem_alloc`. |
| `000a:454d` | seg091:014d | 32 | In comutils.c segment. |
### Tier 4: Ranks 61-80 (29-42 callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 61 | `000b:30a5` | `sprite_tree_forward_wrapper` | 42 | Pure thunk forwarder |
| 62 | `0008:bc27` | `entity_set_event_type_checked` | 41 | *(pre-existing name)* Sets event code at +0x06 with range/timer checks |
| 63 | `0008:d214` | `entity_dispatch_entry_ctor_vtbl_3aa6` | 40 | *(pre-existing name)* Constructor: alloc 0x40, vtbl 3AA6, flag 0x200 |
| 64 | `0005:1565` | `entity_action_by_type_dispatch` | 39 | Checks entity type against whitelist (0x432,0x5a0,0x1fd,0x1fe,0x8f,0x59f,0x2b3,0x2ca), dispatches by flags at [0xc76] and [0x85f] |
| 65 | `0008:4bba` | `channel_slot_enable` | 39 | Sets enable byte=1 in 5-slot table at 0x84ca (slot * 0xd stride) |
| 66 | `0009:6f5a` | `vga_palette_write` | 38 | Writes RGB triplets to VGA DAC (port 0x3C8/0x3C9). Range param_2..param_3 from palette data at *param_1 |
| 67 | `0009:8ef6` | `line_draw_dispatch` | 38 | Compares abs(dx) vs abs(dy) to determine major axis, dispatches to appropriate line draw routine |
| 68 | `000a:7b30` | `nop_return_void_g` | 38 | Empty function (default vtable slot) |
| 69 | `000a:7b3f` | `nop_return_void_h` | 38 | Empty function (default vtable slot) |
| 70 | `0009:6e7f` | `palette_free_if_set` | 35 | Frees existing palette data if ptr non-null, checks alignment |
| 71 | `000a:7b35` | `nop_return_void_i` | 35 | Empty function (default vtable slot) |
| 72 | `0009:c433` | `event_queue_align_index` | 34 | Returns `param_1 & 0xFFF8` — aligns ring index to 8-byte event slot boundary |
| 73 | `0009:2156` | `dos_file_get_size` | 33 | Saves file position, does INT 21h AH=42h AL=02 (seek to end), restores position. Returns file size in DX:AX |
| 74 | `000a:2c41` | `list_iterate_next` | 33 | Linked list iterator: if *out==0 returns first from obj+2; else follows next at ptr+2/+4. Returns bool (has more) |
| 75 | `000a:454d` | *(no function in Ghidra)* | 32 | Analysis gap in comutils.c segment |
| 76 | `000b:2446` | `sprite_clear_redraw_flag` | 31 | Clears flag at obj+0x17e, then dispatches via thunk |
| 77 | `0005:1238` | `entity_get_class_word` | 30 | Looks up table at [0x7e01] indexed by *param_1 * 2, returns word. Sister of `entity_get_type_word` (which uses [0x7df9]) |
| 78 | `000b:1446` | `display_null_check_dispatch` | 30 | Null-checks far ptr params, dispatches to different thunks based on result |
| 79 | `000d:85da` | `map_object_set_dirty_flag` | 29 | Sets byte at global_obj[0x6828]+0x40 = 1 if global non-null, then calls thunk |
| 80 | `0005:1511` | `entity_destroy_trampoline` | 29 | Pure thunk forwarder to entity destruction |
---
## Deep Analysis: Coordinate Transform System
### `world_to_screen_coords` at `0004:e7bd` (NE seg018:07bd)
**Signature:**
```c
void world_to_screen_coords(int world_x, int world_y, int *screen_x, int *screen_y)
```
**Isometric Projection Math:**
```
screen_x = (world_x - world_y) / 2 - camera_x // SAR 1 (signed divide)
screen_y = (world_x + world_y) / 4 - camera_y // SHR 2 (unsigned divide)
```
Camera globals: `g_scroll_offset_x` (DS:0x2bb7), `g_scroll_offset_y` (DS:0x2bb9).
**Assembly detail:**
- `SAR AX, 1` for screen_x — signed arithmetic shift preserves sign for negative (world_x - world_y) differences
- `SHR AX, 2` for screen_y — unsigned logical shift (sum world_x + world_y is always positive)
- The 2:1 ratio (÷2 for X, ÷4 for Y) produces the classic 2:1 isometric diamond tile shape
**Coordinate axes on screen:**
- World X axis → lower-right on screen (+0.5 screen_x, +0.25 screen_y per world unit)
- World Y axis → lower-left on screen (-0.5 screen_x, +0.25 screen_y per world unit)
- Camera subtraction converts absolute world-space to viewport-relative screen coordinates
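The projection above can be sketched in Python to check the shift behaviour. This is an illustrative model only, not the decompiled routine: the helper `to_i16`, the tuple return value (instead of the `int *` out-parameters), and the explicit `camera_x`/`camera_y` arguments (standing in for `g_scroll_offset_x`/`g_scroll_offset_y`) are assumptions made for the example.

```python
def to_i16(value):
    """Wrap a Python int to a signed 16-bit value, as a 16-bit register would hold it."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def world_to_screen_coords(world_x, world_y, camera_x=0, camera_y=0):
    """Hypothetical model of the 2:1 isometric projection described above."""
    diff = to_i16(world_x - world_y)          # may be negative
    total = (world_x + world_y) & 0xFFFF      # treated as unsigned 16-bit
    # SAR AX,1: Python's >> on a negative int is also an arithmetic shift,
    # so it rounds toward negative infinity exactly like SAR.
    screen_x = to_i16((diff >> 1) - camera_x)
    # SHR AX,2: logical shift of the unsigned 16-bit sum.
    screen_y = to_i16((total >> 2) - camera_y)
    return screen_x, screen_y
```

Note that an arithmetic shift floors odd negative values (`-3 >> 1` is `-2`), which differs from a truncating signed division; the sketch keeps the shift semantics rather than using `//` or `int(x / 2)` by accident.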
**Callers (17 across 8 NE segments):**
| Call site | NE Segment | Context |
|-----------|-----------|---------|
| `0004:7d6f` | seg012 | Map/tile rendering |
| `0005:0305` | seg021 | Entity system |
| `0005:432f` | seg021 | Entity placement |
| `0005:4457` | seg021 | Entity placement |
| `0005:6f8f` | seg022 | Entity rendering |
| `0005:7263` | seg022 | Entity rendering |
| `0007:2262` | seg040 | `snap_entity_to_ground` — ground alignment |
| `0007:237d` | seg040 | Ground snap dispatch |
| `0007:cf4e` | seg049 | Entity positioning |
| `0007:d039` | seg049 | Entity positioning |
| `0007:d43f` | seg049 | Entity positioning |
| `0007:d6fe` | seg049 | Entity positioning |
| `0008:3223` | seg053 | Entity-to-screen render setup |
| `0008:32e7` | seg053 | Entity-to-screen render setup |
| `0008:334b` | seg053 | Entity-to-screen render setup |
| `000b:858b` | seg115 | Sprite system |
| `000b:f100` | seg120 | Sprite system |
**Entity struct layout (from seg053 caller at `0008:31f6`):**
```
entity_array_base = far ptr at [DS:0x2cff]
entity_struct_size = 19 bytes (0x13)
entity.world_x = offset +0x0a (word)
entity.world_y = offset +0x0c (word)
```
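A minimal sketch of reading the two position fields from a dumped copy of the entity array, assuming the array is indexed as `base + index * 0x13` and that both fields are little-endian unsigned words. The function name `read_entity_world_pos` and the idea of working on a `bytes` dump are hypothetical; only the stride and the two offsets come from the notes above.

```python
import struct

ENTITY_STRIDE = 0x13   # 19 bytes per entity (from the seg053 caller)
OFF_WORLD_X = 0x0A     # word
OFF_WORLD_Y = 0x0C     # word, immediately after world_x


def read_entity_world_pos(entity_array, index):
    """Return (world_x, world_y) for entity `index` in a raw array dump.

    Assumes unsigned little-endian words; switch '<HH' to '<hh' if the
    fields turn out to be signed.
    """
    base = index * ENTITY_STRIDE
    return struct.unpack_from('<HH', entity_array, base + OFF_WORLD_X)
```

Because `world_y` sits directly after `world_x`, a single two-word unpack covers both fields.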
### Comparison: Two Coordinate Transform Functions
| Property | `world_to_screen_coords` (0004:e7bd) | `world_to_screen_isometric` (0007:be67) |
|----------|---------------------------------------|----------------------------------------|
| Input type | Fine-grained world units (entity positions) | Coarse tile-grid units (map rendering) |
| screen_x | `(wx - wy) / 2 - cam_x` | `(wx + sx) + (wy + sy) * 2` |
| screen_y | `(wx + wy) / 4 - cam_y` | `(wy + sy) * 2 - (wx + sx)` |
| Camera handling | Subtracted after transform | Added before transform |
| Operations | Division (SAR/SHR) | Multiplication (SHL) |
| Aspect ratio | 2:1 (from /2 : /4) | 2:1 (from 1 : 2 multipliers) |
Both functions implement the same 2:1 isometric projection but at different coordinate scales. `world_to_screen_coords` divides down from fine world units while `world_to_screen_isometric` multiplies up from coarse tile units.
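For comparison, the tile-scale formulas from the right-hand column of the table can be transcribed as below. This is a direct transcription under assumptions: `sx` and `sy` are taken to be the offsets that are "added before transform", 16-bit wraparound is ignored, and the tuple return value is chosen for the example rather than taken from the binary.

```python
def world_to_screen_isometric(wx, wy, sx=0, sy=0):
    """Hypothetical model of the tile-grid transform at 0007:be67."""
    u = wx + sx            # offset applied before the transform
    v = (wy + sy) * 2      # SHL by 1 in the original
    screen_x = u + v
    screen_y = v - u
    return screen_x, screen_y
```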
### Adjacent Function: `map_position_equal` at `0004:e784`
Compares two 5-byte `map_position` structs: `{ x:word, y:word, layer:byte }`. Returns 1 (AL) if all three fields match, 0 otherwise. Located immediately before `world_to_screen_coords` in seg018.
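The comparison can be modelled as follows; a sketch assuming the struct is packed with no padding and that both words are little-endian. The `MAP_POSITION` name and the use of `bytes` inputs are illustrative choices, not taken from the binary.

```python
import struct

# { x:word, y:word, layer:byte } packed into 5 bytes, little-endian
MAP_POSITION = struct.Struct('<HHB')


def map_position_equal(a, b):
    """Return 1 if x, y and layer all match, else 0 (mirrors the AL result)."""
    return 1 if MAP_POSITION.unpack_from(a) == MAP_POSITION.unpack_from(b) else 0
```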
---
### Tier 5: Ranks 81-100 (25-29 callers)
| Rank | Address | Name | Calls | Description |
|------|---------|------|-------|-------------|
| 81 | `0009:1c00` | `dos_file_handle_init` | 29 | Inits 6-byte file handle struct: dword=0, word+4=0xFFFF (invalid). Aborts on null ptr |
| 82 | `0008:75f3` | `entity_get_ptr` | 29 | *(pre-existing)* Looks up entity far ptr from table at DS:0x39b0, indexed by id*4 |
| 83 | `0006:0208` | `entity_class_get_flag4` | 29 | Returns bit 2 of classinfo byte at [0x7e1e]+*p1*0x79+0x13 → 0 or 1 |
| 84 | `000a:30d7` | `list_node_set_if_context` | 29 | Sets node fields +2/+4 if params match context globals at 0x45a6/0x45a8 |
| 85 | `0009:c45f` | `object_init_and_get_next` | 29 | Calls `object_init_zero_fields` then returns *(result+2) — init+accessor combo |
| 86 | `0004:d7a0` | `object_deref_get_word4` | 28 | Dereferences far ptr chain: returns word at *(*(param_1)+4) |
| 87 | `000a:5276` | `debug_check_flag_45aa` | 28 | If byte at DS:0x45aa non-zero, calls thunk (diagnostic/assert check) |
| 88 | `0003:d94f` | `far_memset` | 28 | Wrapper reordering params for CRT memset impl at 0003:d92b (odd-aligned, word-fill loop) |
| 89 | `000a:7b3a` | `nop_return_void_j` | 28 | Empty function (default vtable slot) |
| 90 | `0008:ca18` | `entity_pair_sync_b` | 27 | *(pre-existing)* Pairwise sync wrapper direction B |
| 91 | `0008:bd20` | `entity_sprite_set_target_pos` | 27 | *(pre-existing)* Sets flag 0x1000, copies player pos to entity +0x0a/+0x0c |
| 92 | `0009:3ceb` | `buffer_release_and_dispatch` | 27 | Frees far ptr at obj+0x3b if set, nulls it; conditionally dispatches on bit 0 |
| 93 | `0005:09b4` | `entity_get_flags_byte` | 27 | Reads byte from [0x7dfd]+id, conditionally extends with classinfo byte at [0x7e1e]+id*0x79+0xf |
| 94 | `0005:0fbb` | `entity_lookup_sprite_word` | 27 | Returns word from [0x7e05]+*p1*2 — sprite/visual index table |
| 95 | `0008:d27e` | `entity_dispatch_trampoline_b` | 26 | Pure forwarder thunk (CALLF thunk only) |
| 96 | `0005:0376` | `entity_resolve_base_type` | 26 | Walks entity class hierarchy (bit 8 in [0x7e01]) via [0x7ded], returns base type from [0x7df1] |
| 97 | `000b:2492` | `sprite_redraw_if_needed` | 26 | If redraw flag at +0x17e is clear, calls update routine + thunk |
| 98 | `0003:e4d3` | `dos_file_open_wrapper` | 26 | Zeros output byte, delegates to file open impl at 0003:bb92 |
| 99 | `0005:033e` | `entity_resolve_base_parent` | 25 | Same hierarchy walk as `entity_resolve_base_type` but returns parent from [0x7ded] |
| 100 | `000a:87fd` | `render_clip_rect_to_viewport` | 25 | Clips 4 rect params to viewport bounds at [0x4014], sets dirty flag at 0x8a16, increments draw counter at 0x4716 |
**Entity Table Pointers (DS-relative, discovered in tier 5):**
| DS Offset | Type | Stride | Purpose |
|-----------|------|--------|---------|
| `0x7dfd` | byte[] | 1 | Entity flags byte (entity_get_flags_byte) |
| `0x7e01` | word[] | 2 | Entity class flags (bit 8 = has parent in hierarchy) |
| `0x7e05` | word[] | 2 | Entity sprite/visual index |
| `0x7ded` | word[] | 2 | Entity parent/hierarchy index |
| `0x7df1` | word[] | 2 | Entity base type word |
| `0x7e1e` | struct[] | 0x79 | Entity class detail records (121 bytes per class) |
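The table strides above can be exercised against a memory dump with a short sketch. It assumes the pointers stored at the DS offsets have already been dereferenced into `bytes` objects; the helper `table_word` is hypothetical, while `entity_class_get_flag4` follows the rank 83 description (bit 2 of the byte at record offset `+0x13`).

```python
CLASSINFO_STRIDE = 0x79   # 121 bytes per class record
CLASSINFO_FLAGS_OFF = 0x13


def entity_class_get_flag4(classinfo, class_id):
    """Return bit 2 (mask 0x04) of the class flags byte as 0 or 1."""
    flags = classinfo[class_id * CLASSINFO_STRIDE + CLASSINFO_FLAGS_OFF]
    return (flags >> 2) & 1


def table_word(table, index):
    """Read entry `index` from a word[] table (stride 2, little-endian)."""
    return int.from_bytes(table[index * 2:index * 2 + 2], 'little')
```

The same `table_word` helper would cover the `0x7e01`, `0x7e05`, `0x7ded`, and `0x7df1` tables, since they all share a 2-byte stride.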
### Analysis Gaps (No Function in Ghidra)
These high-traffic addresses need manual function creation in Ghidra (Script Manager or UI):
| Address | NE Segment | Callers | Notes |
|---------|-----------|---------|-------|
| `000a:44fd` | seg091:00fd | 331 | #1 most-called target! In comutils.c segment. |
| `000b:2e00` | seg109:0000 | 74 | Start of segment 109. |
| `0007:5a00` | seg043:0000 | 64 | Start of segment 43. Earlier seg001 `debris_spawn` port was rejected; still needs manual function creation and direct analysis. |
| `0009:a200` | seg082:0000 | - | Target of `mem_alloc`. Start of segment 82. |
| `000c:db68` | `cursor_nav_update_and_dispatch` | Calls `cursor_zone_quadrant_classify`; updates `[+0x37..+0x3a]`; reads `[0x63da]`; switch on direction (08); maps scancodes 0x48/0x50/0x4b/0x4d/0x39 |
| `000c:d3e9` | `cursor_set_ref_and_dispatch` | Null-checks param; sets `*param_1 = &DAT_0000_638e`; calls dispatch |
| `000c:d710` | `cursor_set_ref2_and_dispatch` | Same pattern; sets `*param_1 = &DAT_0000_6346` |

41
disasm_helper.py Normal file

@ -0,0 +1,41 @@
import struct, os, sys

BIN_PATH = r'k:\ghidra\Crusader_Decomp\NE_segments\seg001_code_off_37600_len_8400.bin'
TARGET = 0x265B

# Read a window around TARGET: 0x200 bytes before it, 0x80 bytes after it.
with open(BIN_PATH, 'rb') as f:
    f.seek(TARGET - 0x200)
    data = f.read(0x280)

try:
    import capstone
    md = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_16)
    for ins in md.disasm(data, TARGET - 0x200):
        print(' 0x%04x: %s %s' % (ins.address, ins.mnemonic, ins.op_str))
        if ins.address > TARGET + 0x40:
            break
except ImportError:
    print('capstone not available, trying ndisasm...')
    import subprocess, tempfile
    tmp = os.path.join(os.environ.get('TEMP', '.'), 'seg001_chunk.bin')
    with open(tmp, 'wb') as f2:
        f2.write(data)
    result = subprocess.run(
        ['ndisasm', '-b', '16', '-o', '0x%x' % (TARGET - 0x200), tmp],
        capture_output=True, text=True, timeout=15
    )
    if result.returncode == 0:
        for line in result.stdout.split('\n'):
            try:
                addr = int(line.split()[0], 16)
                if TARGET - 0x200 <= addr <= TARGET + 0x40:
                    print(line)
            except (ValueError, IndexError):
                # Blank or continuation lines have no leading hex address.
                pass
    else:
        print('ndisasm failed:', result.stderr)
        # Fallback: hex dump
        offset = TARGET - 0x200
        for i in range(0, len(data), 16):
            hexb = ' '.join('%02x' % b for b in data[i:i+16])
            print('0x%04x: %s' % (offset+i, hexb))

11
get_tier4.py Normal file

@ -0,0 +1,11 @@
from collections import Counter

c = Counter()
with open('ne_reloc_far_calls.tsv') as f:
    next(f)  # skip the header row
    for line in f:
        parts = line.strip().split('\t')
        tgt = parts[1]  # target_ghidra column
        c[tgt] += 1
for i, (addr, cnt) in enumerate(c.most_common(100)):
    if i >= 60 and i < 80:
        print(f'{i+1:3d} {addr} {cnt}')

11
get_tier5.py Normal file

@ -0,0 +1,11 @@
from collections import Counter

c = Counter()
with open('ne_reloc_far_calls.tsv') as f:
    next(f)  # skip the header row
    for line in f:
        parts = line.strip().split('\t')
        tgt = parts[1]  # target_ghidra column
        c[tgt] += 1
for i, (addr, cnt) in enumerate(c.most_common(120)):
    if i >= 80 and i < 100:
        print(f'{i+1:3d} {addr} {cnt}')

11693
ne_reloc_far_calls.tsv Normal file

File diff suppressed because it is too large Load diff

120
ne_reloc_far_imports.tsv Normal file

@ -0,0 +1,120 @@
source_ghidra target source_seg source_off_in_seg
0003:761e PHAPI.DOSCREATEDSALIAS seg001 0x001e
0003:76b1 DOSCALLS.38 seg001 0x00b1
0003:76be DOSCALLS.38 seg001 0x00be
0003:7795 DOSCALLS.89 seg001 0x0195
0003:77ab DOSCALLS.89 seg001 0x01ab
0003:f46e DOSCALLS.39 seg001 0x7e6e
0003:f51d DOSCALLS.40 seg001 0x7f1d
0003:f539 DOSCALLS.41 seg001 0x7f39
0003:f561 DOSCALLS.40 seg001 0x7f61
0003:f59c DOSCALLS.42 seg001 0x7f9c
0003:f6c9 DOSCALLS.42 seg001 0x80c9
0003:f851 PHAPI.DOSMAPREALSEG seg001 0x8251
0003:f88d DOSCALLS.39 seg001 0x828d
0003:f896 DOSCALLS.39 seg001 0x8296
0003:f8b3 PHAPI.DOSMAPREALSEG seg001 0x82b3
0003:f943 DOSCALLS.127 seg001 0x8343
0004:17c6 ASYLUM.36 seg004 0x0dc6
0004:17dc ASYLUM.28 seg004 0x0ddc
0004:19cf ASYLUM.45 seg004 0x0fcf
0004:25a5 ASYLUM.24 seg005 0x07a5
0004:6f26 ASYLUM.36 seg011 0x0126
0004:6f2e ASYLUM.28 seg011 0x012e
0004:6f4d ASYLUM.37 seg011 0x014d
0004:6f57 ASYLUM.29 seg011 0x0157
0004:70a2 ASYLUM.37 seg011 0x02a2
0004:70ad ASYLUM.29 seg011 0x02ad
0004:7136 ASYLUM.36 seg011 0x0336
0004:713e ASYLUM.28 seg011 0x033e
0004:715d ASYLUM.37 seg011 0x035d
0004:7167 ASYLUM.29 seg011 0x0367
0004:72af ASYLUM.37 seg011 0x04af
0004:72ba ASYLUM.29 seg011 0x04ba
0006:eba2 ASYLUM.36 seg039 0x09a2
0006:ebb5 ASYLUM.37 seg039 0x09b5
0006:ebc0 ASYLUM.36 seg039 0x09c0
0006:ebd3 ASYLUM.37 seg039 0x09d3
0008:67ee PHAPI._DosRealFarCall seg058 0x03ee
0008:6a7f PHAPI.DOSALLOCREALSEG seg059 0x007f
0008:6aad PHAPI.DOSALLOCREALSEG seg059 0x00ad
0008:6ae8 PHAPI._DosRealIntr seg059 0x00e8
0008:6b2e PHAPI.DOSMAPREALSEG seg059 0x012e
0008:9797 PHAPI.BORISREALINTR seg059 0x2d97
0008:97ac PHAPI.BORISREALINTR seg059 0x2dac
0008:a06b PHAPI._DosRealFarCall seg059 0x366b
0008:ebb2 ASYLUM.34 seg064 0x01b2
0008:ebba ASYLUM.33 seg064 0x01ba
0008:ebff ASYLUM.31 seg064 0x01ff
0008:ec18 ASYLUM.30 seg064 0x0218
0008:ec3c ASYLUM.32 seg064 0x023c
0008:f208 PHAPI.DOSMAPLINSEG seg065 0x0208
0008:f233 PHAPI.DOSMAPLINSEG seg065 0x0233
0008:f2bf PHAPI.DOSMAPLINSEG seg065 0x02bf
0009:080f DOSCALLS.7 seg068 0x000f
0009:0867 PHAPI.DOSALLOCREALSEG seg068 0x0067
0009:0899 PHAPI.DOSALLOCREALSEG seg068 0x0099
0009:08eb PHAPI.DOSALLOCREALSEG seg068 0x00eb
0009:0bc2 DOSCALLS.39 seg068 0x03c2
0009:0bd4 DOSCALLS.7 seg068 0x03d4
0009:0d7a DOSCALLS.39 seg068 0x057a
0009:0d8c DOSCALLS.39 seg068 0x058c
0009:0df3 PHAPI.DOSSETPASSTOPROTVEC seg068 0x05f3
0009:0ea6 PHAPI.DOSSETREALPROTVEC seg068 0x06a6
0009:0f4f PHAPI.DOSSETPROTVEC seg068 0x074f
0009:b363 PHAPI.DOSALLOCREALSEG seg082 0x1163
0009:b389 PHAPI.DOSALLOCREALSEG seg082 0x1189
0009:b40b PHAPI.DOSALLOCLINMEM seg082 0x120b
0009:b47a PHAPI.DOSALLOCLINMEM seg082 0x127a
0009:b491 PHAPI.DOSFREELINMEM seg082 0x1291
0009:b4f6 PHAPI.DOSFREELINMEM seg082 0x12f6
0009:b577 PHAPI.DOSALLOCLINMEM seg082 0x1377
0009:b598 PHAPI.DOSALLOCLINMEM seg082 0x1398
0009:b662 PHAPI.DOSALLOCLINMEM seg082 0x1462
0009:b748 PHAPI.DOSALLOCLINMEM seg082 0x1548
0009:b7b3 PHAPI.DOSALLOCLINMEM seg082 0x15b3
0009:b7d1 PHAPI.DOSFREELINMEM seg082 0x15d1
0009:ba35 DOSCALLS.39 seg082 0x1835
0009:ba50 DOSCALLS.39 seg082 0x1850
0009:ba97 PHAPI.DOSFREELINMEM seg082 0x1897
0009:bb5f PHAPI.DOSGETBIOSSEG seg082 0x195f
0009:bb71 PHAPI.DOSMAPREALSEG seg082 0x1971
0009:bb96 PHAPI.DOSMAPREALSEG seg082 0x1996
0009:bbdc PHAPI.DOSMAPLINSEG seg082 0x19dc
0009:bc32 PHAPI.DOSMAPLINSEG seg082 0x1a32
0009:bc57 PHAPI.DOSMAPLINSEG seg082 0x1a57
0009:bcb1 DOSCALLS.7 seg082 0x1ab1
0009:bdee DOSCALLS.7 seg082 0x1bee
0009:c542 PHAPI.DOSMAPLINSEG seg083 0x0142
000a:5746 ASYLUM.56 seg093 0x0146
000a:57de ASYLUM.58 seg093 0x01de
000a:57ea ASYLUM.37 seg093 0x01ea
000a:57f4 ASYLUM.29 seg093 0x01f4
000a:5801 ASYLUM.49 seg093 0x0201
000a:5810 ASYLUM.47 seg093 0x0210
000a:5817 ASYLUM.46 seg093 0x0217
000a:583e ASYLUM.57 seg093 0x023e
000a:5ed0 ASYLUM.25 seg094 0x00d0
000a:5fde ASYLUM.27 seg094 0x01de
000a:6022 ASYLUM.27 seg094 0x0222
000a:60cd ASYLUM.27 seg094 0x02cd
000a:6113 ASYLUM.25 seg094 0x0313
000a:61fe ASYLUM.25 seg094 0x03fe
000a:62f6 ASYLUM.25 seg094 0x04f6
000a:636f ASYLUM.23 seg094 0x056f
000c:11fd ASYLUM.28 seg122 0x0ffd
000c:120e ASYLUM.36 seg122 0x100e
000c:1521 ASYLUM.45 seg122 0x1321
000c:158d ASYLUM.45 seg122 0x138d
000c:25c1 ASYLUM.47 seg122 0x23c1
000c:25c8 ASYLUM.46 seg122 0x23c8
000c:2621 ASYLUM.29 seg122 0x2421
000c:2671 ASYLUM.29 seg122 0x2471
000c:26b8 ASYLUM.37 seg122 0x24b8
000c:2708 ASYLUM.37 seg122 0x2508
000d:9b3a ASYLUM.25 seg138 0x093a
000d:b1cc ASYLUM.27 seg138 0x1fcc
000e:090c ASYLUM.18 seg142 0x210c
000e:0960 ASYLUM.27 seg142 0x2160
000e:2592 ASYLUM.25 seg142 0x3d92
000e:259c ASYLUM.19 seg142 0x3d9c

132226
ne_reloc_fixups.json Normal file

File diff suppressed because it is too large Load diff

379
ne_reloc_parser.py Normal file

@ -0,0 +1,379 @@
#!/usr/bin/env python3
"""
NE Relocation Table Parser for Crusader: No Remorse
====================================================
Reads the NE header + per-segment relocation entries from CRUSADER.EXE.
Resolves each CALLF 0x0000:FFFF fixup to its real inter-segment target.
Emits a mapping file suitable for Ghidra annotation.
NE binary: CRUSADER.EXE (bound MZ+NE, NE header at 0x36F70)
Raw import: Ghidra loads the whole file as flat RAM.
Ghidra flat address = file_offset (since it's a raw binary import)
Ghidra seg:off = (flat >> 16) : (flat & 0xFFFF)
"""
import struct, sys, os, json
from collections import defaultdict
EXE_PATH = r'k:\ghidra\Crusader_Decomp\CRUSADER.EXE'
NE_HEADER_OFFSET = 0x36F70 # e_lfanew from MZ header
# ── NE relocation entry address-type codes ──
ADDR_LOBYTE = 0
ADDR_SELECTOR = 2
ADDR_FARPTR = 3 # 16:16 far pointer ← this is CALLF target
ADDR_OFFSET = 5
ADDR_48PTR = 11
ADDR_OFFSET32 = 13
# ── NE relocation entry relocation-type codes ──
REL_INTERNAL = 0 # intra-module (segment:offset)
REL_IMPORTORD = 1 # imported by ordinal
REL_IMPORTNAM = 2 # imported by name
REL_OSFIXUP = 3 # OS fixup
ADDR_TYPE_NAMES = {
0: 'lobyte', 2: 'selector', 3: 'far_ptr_16:16',
5: 'offset16', 11: 'ptr_48', 13: 'offset32'
}
REL_TYPE_NAMES = {
0: 'internal', 1: 'import_ordinal', 2: 'import_name', 3: 'osfixup'
}
def read_u8(data, off):
    return data[off]

def read_u16(data, off):
    return struct.unpack_from('<H', data, off)[0]

def read_u32(data, off):
    return struct.unpack_from('<I', data, off)[0]
def parse_ne_header(data, ne_off):
    """Parse key fields from the NE header."""
    magic = data[ne_off:ne_off+2]
    assert magic == b'NE', f"Bad NE magic at 0x{ne_off:X}: {magic}"
    hdr = {}
    # All offsets below are relative to the NE signature.
    hdr['linker_ver'] = read_u8(data, ne_off + 2)
    hdr['linker_rev'] = read_u8(data, ne_off + 3)
    hdr['entry_table_off'] = read_u16(data, ne_off + 4) + ne_off
    hdr['entry_table_len'] = read_u16(data, ne_off + 6)
    hdr['flags'] = read_u16(data, ne_off + 12)
    hdr['auto_data_seg'] = read_u16(data, ne_off + 14)
    hdr['num_segments'] = read_u16(data, ne_off + 0x1C)
    hdr['num_module_refs'] = read_u16(data, ne_off + 0x1E)
    hdr['seg_table_off'] = read_u16(data, ne_off + 0x22) + ne_off
    hdr['resource_table_off'] = read_u16(data, ne_off + 36) + ne_off
    hdr['resident_name_off'] = read_u16(data, ne_off + 38) + ne_off
    hdr['module_ref_off'] = read_u16(data, ne_off + 0x28) + ne_off
    hdr['imported_name_off'] = read_u16(data, ne_off + 0x2A) + ne_off
    # Non-resident name table offset is relative to the start of the file.
    hdr['nonresident_name_off'] = read_u32(data, ne_off + 44)
    hdr['moveable_entries'] = read_u16(data, ne_off + 48)
    hdr['alignment_shift'] = read_u16(data, ne_off + 0x32)
    hdr['num_resource_segs'] = read_u16(data, ne_off + 52)
    hdr['target_os'] = read_u8(data, ne_off + 54)
    return hdr
def parse_segment_table(data, hdr):
    """Parse the NE segment table entries (8 bytes each)."""
    segments = []
    off = hdr['seg_table_off']
    shift = hdr['alignment_shift']
    for i in range(hdr['num_segments']):
        sector_off = read_u16(data, off)
        seg_len = read_u16(data, off + 2)
        seg_flags = read_u16(data, off + 4)
        min_alloc = read_u16(data, off + 6)
        file_offset = sector_off << shift if sector_off != 0 else 0
        has_reloc = bool(seg_flags & 0x0100)
        # Fix zero length = 64K
        if seg_len == 0 and sector_off != 0:
            seg_len = 0x10000
        segments.append({
            'index': i + 1,  # 1-based segment number
            'file_offset': file_offset,
            'length': seg_len,
            'flags': seg_flags,
            'min_alloc': min_alloc,
            'has_reloc': has_reloc,
        })
        off += 8
    return segments
def parse_module_refs(data, hdr):
    """Parse the module reference table → imported module names."""
    modules = []
    mref_off = hdr['module_ref_off']
    iname_off = hdr['imported_name_off']
    for i in range(hdr['num_module_refs']):
        name_off_rel = read_u16(data, mref_off + i * 2)
        name_off_abs = iname_off + name_off_rel
        name_len = read_u8(data, name_off_abs)
        name = data[name_off_abs + 1: name_off_abs + 1 + name_len].decode('ascii', errors='replace')
        modules.append(name)
    return modules
def parse_relocations(data, seg):
    """Parse relocation entries for a single segment."""
    if not seg['has_reloc']:
        return []
    # Relocation table starts right after the segment data in the file
    reloc_off = seg['file_offset'] + seg['length']
    num_relocs = read_u16(data, reloc_off)
    reloc_off += 2
    entries = []
    for i in range(num_relocs):
        addr_type = read_u8(data, reloc_off)
        rel_type = read_u8(data, reloc_off + 1)
        chain_off = read_u16(data, reloc_off + 2)  # offset within segment where fixup applies
        # Additive flag is bit 2 of rel_type
        additive = bool(rel_type & 0x04)
        rel_type_base = rel_type & 0x03
        entry = {
            'addr_type': addr_type,
            'addr_type_name': ADDR_TYPE_NAMES.get(addr_type, f'unk_{addr_type}'),
            'rel_type': rel_type_base,
            'rel_type_name': REL_TYPE_NAMES.get(rel_type_base, f'unk_{rel_type_base}'),
            'additive': additive,
            'seg_offset': chain_off,
            'seg_index': seg['index'],
        }
        if rel_type_base == REL_INTERNAL:
            # Internal reference
            target_seg = read_u8(data, reloc_off + 4)
            reserved = read_u8(data, reloc_off + 5)
            target_off = read_u16(data, reloc_off + 6)
            if target_seg == 0xFF:
                # Moveable segment, target_off is entry table ordinal
                entry['target_type'] = 'moveable_entry'
                entry['entry_ordinal'] = target_off
            else:
                entry['target_type'] = 'fixed'
                entry['target_seg'] = target_seg  # 1-based segment number
                entry['target_offset'] = target_off
        elif rel_type_base == REL_IMPORTORD:
            module_idx = read_u16(data, reloc_off + 4)  # 1-based
            ordinal = read_u16(data, reloc_off + 6)
            entry['target_type'] = 'import_ordinal'
            entry['module_index'] = module_idx
            entry['ordinal'] = ordinal
        elif rel_type_base == REL_IMPORTNAM:
            module_idx = read_u16(data, reloc_off + 4)  # 1-based
            name_off = read_u16(data, reloc_off + 6)
            entry['target_type'] = 'import_name'
            entry['module_index'] = module_idx
            entry['name_offset'] = name_off
        elif rel_type_base == REL_OSFIXUP:
            fixup_type = read_u16(data, reloc_off + 4)
            entry['target_type'] = 'osfixup'
            entry['osfixup_type'] = fixup_type
        entries.append(entry)
        reloc_off += 8
    return entries
def follow_reloc_chain(data, seg, first_offset, addr_type):
    """
    NE relocations use a chain: the first entry points to an offset in
    the segment. At that offset, a word points to the next offset
    needing the same fixup. 0xFFFF terminates the chain.
    Returns all offsets in the chain.
    """
    offsets = []
    seg_data_start = seg['file_offset']
    seg_len = seg['length']
    current = first_offset
    visited = set()
    while current != 0xFFFF and current < seg_len:
        if current in visited:
            break  # cycle protection
        visited.add(current)
        offsets.append(current)
        # For far_ptr: the call instruction is CALLF seg:off at the offset
        # The offset field (first word) at current contains the next chain link
        next_ptr_file = seg_data_start + current
        if next_ptr_file + 2 > len(data):
            break
        next_off = read_u16(data, next_ptr_file)
        current = next_off
    return offsets
def file_offset_to_ghidra(file_off):
    """Convert file offset to Ghidra seg:off address string (raw import)."""
    seg = file_off >> 16
    off = file_off & 0xFFFF
    return f'{seg:04x}:{off:04x}'
def main():
print(f"Reading {EXE_PATH}...")
with open(EXE_PATH, 'rb') as f:
data = f.read()
print(f" File size: {len(data)} bytes (0x{len(data):X})")
# Verify NE header location
# Check MZ header first
assert data[0:2] == b'MZ', "Not an MZ executable"
lfanew = read_u32(data, 0x3C)
print(f" e_lfanew from MZ header: 0x{lfanew:X}")
# Use the known NE offset
ne_off = NE_HEADER_OFFSET
print(f" Using NE header at: 0x{ne_off:X}")
hdr = parse_ne_header(data, ne_off)
print(f" Segments: {hdr['num_segments']}")
print(f" Alignment shift: {hdr['alignment_shift']}")
    print(f" Module refs: {hdr['num_module_refs']}")
    modules = parse_module_refs(data, hdr)
    print(f" Imported modules: {modules}")
    segments = parse_segment_table(data, hdr)

    # Parse all relocations
    all_fixups = []  # list of resolved fixup records
    stats = defaultdict(int)
    for seg in segments:
        relocs = parse_relocations(data, seg)
        if not relocs:
            continue
        for reloc in relocs:
            # Follow the chain to find ALL offsets needing this fixup
            chain = follow_reloc_chain(data, seg, reloc['seg_offset'], reloc['addr_type'])
            for fixup_off in chain:
                fixup_file_off = seg['file_offset'] + fixup_off
                ghidra_addr = file_offset_to_ghidra(fixup_file_off)
                rec = {
                    'source_seg': seg['index'],
                    'source_offset_in_seg': fixup_off,
                    'source_file_offset': fixup_file_off,
                    'source_ghidra': ghidra_addr,
                    'addr_type': reloc['addr_type_name'],
                    'rel_type': reloc['rel_type_name'],
                }
                if reloc.get('target_type') == 'fixed':
                    target_seg_idx = reloc['target_seg']
                    target_off = reloc['target_offset']
                    target_seg_info = segments[target_seg_idx - 1]
                    target_file_off = target_seg_info['file_offset'] + target_off
                    target_ghidra = file_offset_to_ghidra(target_file_off)
                    rec['target'] = f'seg{target_seg_idx:03d}:{target_off:04x}'
                    rec['target_ghidra'] = target_ghidra
                    rec['target_file_offset'] = target_file_off
                elif reloc.get('target_type') == 'moveable_entry':
                    rec['target'] = f'entry_ordinal_{reloc["entry_ordinal"]}'
                    rec['target_ghidra'] = '?'
                elif reloc.get('target_type') == 'import_ordinal':
                    mod_idx = reloc['module_index']
                    # Module indices are 1-based; guard against 0 so modules[-1] is never used.
                    mod_name = modules[mod_idx - 1] if 1 <= mod_idx <= len(modules) else f'mod{mod_idx}'
                    rec['target'] = f'{mod_name}.{reloc["ordinal"]}'
                    rec['target_ghidra'] = '?'
                elif reloc.get('target_type') == 'import_name':
                    mod_idx = reloc['module_index']
                    mod_name = modules[mod_idx - 1] if 1 <= mod_idx <= len(modules) else f'mod{mod_idx}'
                    # Read the imported name (length-prefixed string)
                    iname_base = hdr['imported_name_off']
                    name_off = iname_base + reloc['name_offset']
                    name_len = read_u8(data, name_off)
                    name = data[name_off+1:name_off+1+name_len].decode('ascii', errors='replace')
                    rec['target'] = f'{mod_name}.{name}'
                    rec['target_ghidra'] = '?'
                elif reloc.get('target_type') == 'osfixup':
                    rec['target'] = f'osfixup_{reloc["osfixup_type"]}'
                    rec['target_ghidra'] = '?'
                else:
                    rec['target'] = '???'
                    rec['target_ghidra'] = '?'
                all_fixups.append(rec)
                stats[reloc['addr_type_name']] += 1

    print(f"\n Total resolved fixup points: {len(all_fixups)}")
    print(f" By address type: {dict(stats)}")

    # Filter to just far_ptr (CALLF) fixups with internal targets — these are the ones
    # that decompile as CALLF 0000:ffff in Ghidra
    far_calls = [f for f in all_fixups if f['addr_type'] == 'far_ptr_16:16' and f.get('target_ghidra', '?') != '?']
    far_imports = [f for f in all_fixups if f['addr_type'] == 'far_ptr_16:16' and f.get('target_ghidra', '?') == '?']
    print(f" Far-call internal fixups: {len(far_calls)}")
    print(f" Far-call import fixups: {len(far_imports)}")

    # Save full results
    out_path = os.path.join(os.path.dirname(EXE_PATH), 'ne_reloc_fixups.json')
    with open(out_path, 'w') as f:
        json.dump(all_fixups, f, indent=2)
    print(f"\n Full fixup table written to: {out_path}")

    # Save a focused far-call table (TSV) for easy use
    tsv_path = os.path.join(os.path.dirname(EXE_PATH), 'ne_reloc_far_calls.tsv')
    with open(tsv_path, 'w') as f:
        f.write("source_ghidra\ttarget_ghidra\ttarget_label\tsource_seg\tsource_off_in_seg\n")
        for rec in sorted(far_calls, key=lambda r: r['source_file_offset']):
            f.write(f"{rec['source_ghidra']}\t{rec['target_ghidra']}\t{rec['target']}\t")
            f.write(f"seg{rec['source_seg']:03d}\t0x{rec['source_offset_in_seg']:04x}\n")
    print(f" Far-call internal TSV: {tsv_path}")

    # Also save import far-calls
    imp_path = os.path.join(os.path.dirname(EXE_PATH), 'ne_reloc_far_imports.tsv')
    with open(imp_path, 'w') as f:
        f.write("source_ghidra\ttarget\tsource_seg\tsource_off_in_seg\n")
        for rec in sorted(far_imports, key=lambda r: r['source_file_offset']):
            f.write(f"{rec['source_ghidra']}\t{rec['target']}\t")
            f.write(f"seg{rec['source_seg']:03d}\t0x{rec['source_offset_in_seg']:04x}\n")
    print(f" Far-call import TSV: {imp_path}")

    # Print a sample of game-segment far calls (seg039=seg001 region in raw, file offset 0x6E200)
    print("\n── Sample: seg039 (NE seg 39, game seg001 area) far-call fixups ──")
    seg39_calls = [f for f in far_calls if f['source_seg'] == 39]
    for rec in sorted(seg39_calls, key=lambda r: r['source_offset_in_seg'])[:30]:
        print(f" {rec['source_ghidra']} -> {rec['target_ghidra']} ({rec['target']})")

    # Print a sample around the entity_ai_update_loop / entity_animation area
    print("\n── Sample: seg059 (NE seg 59, game 0007: area) far-call fixups ──")
    seg59_calls = [f for f in far_calls if f['source_seg'] == 59]
    for rec in sorted(seg59_calls, key=lambda r: r['source_offset_in_seg'])[:30]:
        print(f" {rec['source_ghidra']} -> {rec['target_ghidra']} ({rec['target']})")


if __name__ == '__main__':
    main()


@ -0,0 +1,44 @@
{
  "transaction": "Repair seg043 boundaries around 0007:5a90",
  "remove_functions": [
    "0007:5b6f"
  ],
  "create_functions": [
    {
      "entry": "0007:5a90",
      "name": "seg043_func_0090",
      "body_start": "0007:5a90",
      "body_end": "0007:5b79",
      "comment": "Recovered from standalone seg043 boundary scan: true start at seg043:0090, body spans seg043:0090..0179.",
      "comment_type": "plate"
    },
    {
      "entry": "0007:5b7a",
      "name": "seg043_func_017a",
      "body_start": "0007:5b7a",
      "body_end": "0007:5c1b",
      "comment": "Recovered from standalone seg043 boundary scan: second prologue at seg043:017a, body spans seg043:017a..021b.",
      "comment_type": "plate"
    },
    {
      "entry": "0007:5c1c",
      "name": "seg043_func_021c",
      "body_start": "0007:5c1c",
      "body_end": "0007:5c80",
      "comment": "Recovered from standalone seg043 boundary scan: third prologue at seg043:021c, body spans seg043:021c..0280.",
      "comment_type": "plate"
    }
  ],
  "comments": [
    {
      "address": "0007:5b6f",
      "text": "Old auto-created split overlaps the earlier seg043:0090..0179 routine and should not be treated as a real entrypoint.",
      "type": "plate"
    }
  ],
  "assert_functions": [
    "0007:5a90",
    "0007:5b7a",
    "0007:5c1c"
  ]
}
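
A plan like the one above can be sanity-checked offline before it is applied. The following is a minimal sketch, not part of the toolkit: `flat` and `check_plan` are hypothetical helper names, and the `seg:off` to `(seg << 16) + off` mapping is assumed from the project's other scripts.

```python
from __future__ import annotations


def flat(addr: str) -> int:
    """Convert 'seg:off' text to a flat address (assumed mapping: seg << 16 + off)."""
    seg, off = addr.split(":")
    return (int(seg, 16) << 16) + int(off, 16)


def check_plan(plan: dict) -> list[str]:
    """Return a list of problems found in the plan's create_functions ranges."""
    problems = []
    ranges = []
    for fn in plan.get("create_functions", []):
        start = flat(fn.get("body_start") or fn["entry"])
        end = flat(fn.get("body_end") or fn["entry"])
        if end < start:
            problems.append(f"{fn['name']}: body_end precedes body_start")
        if not start <= flat(fn["entry"]) <= end:
            problems.append(f"{fn['name']}: entry lies outside the body")
        ranges.append((start, end, fn["name"]))
    # After sorting by start, any overlap shows up between neighbours.
    ranges.sort()
    for (_, end1, name1), (start2, _, name2) in zip(ranges, ranges[1:]):
        if start2 <= end1:
            problems.append(f"{name1} overlaps {name2}")
    return problems


# Body ranges copied from the seg043 plan above.
plan = {
    "create_functions": [
        {"entry": "0007:5a90", "name": "seg043_func_0090",
         "body_start": "0007:5a90", "body_end": "0007:5b79"},
        {"entry": "0007:5b7a", "name": "seg043_func_017a",
         "body_start": "0007:5b7a", "body_end": "0007:5c1b"},
        {"entry": "0007:5c1c", "name": "seg043_func_021c",
         "body_start": "0007:5c1c", "body_end": "0007:5c80"},
    ]
}
print(check_plan(plan) or "plan ranges look consistent")
```

Running a check of this kind before `apply-plan` would catch an overlapping or inverted range without opening the Ghidra project.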

5
read_file.py Normal file

@ -0,0 +1,5 @@
# Print the size of the tier-4 listing, then its content.
with open(r'k:\ghidra\Crusader_Decomp\tier4_ghidra.txt', 'r') as f:
    content = f.read()
print('SIZE=' + str(len(content)))
print(content)

20
resolve_bb4f.py Normal file

@ -0,0 +1,20 @@
"""Resolve 0008:bb58 (FUN_0008_bb4f's inner CALLF operand at +1)"""
import json

with open(r'k:\ghidra\Crusader_Decomp\ne_reloc_fixups.json') as f:
    fixups = json.load(f)

# Index fixups by the file offset of the patched operand.
by_off = {f['source_file_offset']: f for f in fixups}

# 0008:bb58 CALLF, operand at 0008:bb59 = flat 0x8BB59
flat = 0x8BB59
if flat in by_off:
    m = by_off[flat]
    print(f"0008:bb58 CALLF -> {m.get('target','?')} (ghidra: {m.get('target_ghidra','?')})")
else:
    print(f"NOT FOUND at 0x{flat:X}")
    # Try nearby offsets (-2 .. +4) in case the operand position is off
    for d in range(-2, 5):
        if flat+d in by_off:
            m = by_off[flat+d]
            print(f" {d:+d}: {m.get('target','?')} (ghidra: {m.get('target_ghidra','?')})")

55
resolve_top_targets.py Normal file

@ -0,0 +1,55 @@
"""Find the resolved NE targets for the top-called wrapper functions."""
import csv
import json

with open(r'k:\ghidra\Crusader_Decomp\ne_reloc_fixups.json') as f:
    fixups = json.load(f)
by_off = {f['source_file_offset']: f for f in fixups}

# Top wrappers: look up what their internal CALLF targets are
wrappers = {
    '0003:ac9c': 'FUN_0003_ac7e inner CALLF (272 callers, alloc wrapper)',
    '0003:a75a': 'FUN_0003_a751 inner CALLF (207 callers, 2-arg forward)',
    '0008:bb4f': 'FUN_0008_bb4f (174 callers)',
}


def g2f(a):
    """Convert a Ghidra 'seg:off' address string to a flat file offset."""
    s, o = a.split(':')
    return (int(s, 16) << 16) + int(o, 16)


for addr, desc in wrappers.items():
    flat = g2f(addr)
    # Probe +0..+4 from the listed address and report the first fixup found.
    for delta in range(0, 5):
        if flat + delta in by_off:
            m = by_off[flat + delta]
            print(f"{addr} ({desc})")
            print(f" -> {m.get('target','?')} (ghidra: {m.get('target_ghidra','?')})")
            break
    else:
        print(f"{addr} ({desc}) — NOT FOUND in fixups")

# Also look up 000a:44fd — it had no function, check if it's data or seg boundary
target_flat = g2f('000a:44fd')
print()
print(f"Checking 000a:44fd — flat 0x{target_flat:X}")
print(f" This is file offset 0x{target_flat:X}")

# Find which NE segment contains this
with open(r'k:\ghidra\Crusader_Decomp\crusader_ne_segments.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        seg_off = int(row['FileOffset'], 16)
        seg_len = int(row['Length'], 16)
        if seg_off <= target_flat < seg_off + seg_len:
            print(f" In NE segment {row['Segment']}: file 0x{seg_off:X}, len 0x{seg_len:X}")
            print(f" Offset within segment: 0x{target_flat - seg_off:X}")
            break

# Also check what calls 000a:44fd (search for its Ghidra address in call patterns)
print()
seg91_calls = [f for f in fixups if f.get('target_ghidra') == '000a:44fd']
print(f"Calls to 000a:44fd (seg091:00fd): {len(seg91_calls)} total")
# Show first 5 callers
for c in seg91_calls[:5]:
    # The fixup points at the CALLF operand; the instruction starts one byte earlier.
    src_flat = c['source_file_offset'] - 1
    src_ghidra = f"{src_flat>>16:04x}:{src_flat&0xFFFF:04x}"
    print(f" from {src_ghidra} (seg{c['source_seg']:03d}+0x{c['source_offset_in_seg']:04x})")
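
The scripts above repeatedly convert between Ghidra `seg:off` text and flat offsets, and step back one byte from a fixup to reach the `CALLF` opcode. A small round-trip sketch of that arithmetic follows; the helper names `ghidra_to_flat` and `flat_to_ghidra` are illustrative and not taken from the repository.

```python
def ghidra_to_flat(addr: str) -> int:
    """'000a:44fd' -> 0xA44FD, using the (seg << 16) + off layout of the raw import."""
    seg, off = addr.split(":")
    return (int(seg, 16) << 16) + int(off, 16)


def flat_to_ghidra(flat: int) -> str:
    """0xA44FD -> '000a:44fd' (inverse of ghidra_to_flat)."""
    return f"{flat >> 16:04x}:{flat & 0xFFFF:04x}"


# The fixup offset is the CALLF operand; the 0x9A opcode byte sits one byte before it.
operand_flat = 0x8BB59
callf_addr = flat_to_ghidra(operand_flat - 1)

print(hex(ghidra_to_flat("000a:44fd")))
print(callf_addr)
```

Keeping both directions in one place avoids re-deriving the shift and mask in each one-off script.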

11
script_contents.txt Normal file

@ -0,0 +1,11 @@
from collections import Counter

c = Counter()
with open('ne_reloc_far_calls.tsv') as f:
    next(f)  # skip the header row
    for line in f:
        parts = line.strip().split('\t')
        tgt = parts[2]  # target_label column (segNNN:offset)
        c[tgt] += 1
# Print ranks 61..80 of the most frequently called targets.
for i, (addr, cnt) in enumerate(c.most_common(100)):
    if i >= 60 and i < 80:
        print(f'{i+1:3d} {addr} {cnt}')
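
The tier listings that follow are rank windows over this counter. As a hedged illustration, the same windowing can be written as a reusable function; `ranked_slice` is a hypothetical name, and the in-memory TSV below is made-up sample data rather than project output.

```python
from collections import Counter
import io


def ranked_slice(tsv_file, column, first_rank, last_rank):
    """Count values in one TSV column and return (rank, value, count) for a rank window."""
    counts = Counter()
    next(tsv_file)  # skip the header row
    for line in tsv_file:
        parts = line.rstrip("\n").split("\t")
        counts[parts[column]] += 1
    ranked = counts.most_common()
    window = ranked[first_rank - 1:last_rank]
    return [(rank, key, n) for rank, (key, n) in enumerate(window, start=first_rank)]


# Made-up sample: six calls to three targets.
sample = io.StringIO(
    "source\ttarget\n"
    "a\tX\n" "b\tY\n" "c\tX\n" "d\tZ\n" "e\tX\n" "f\tY\n"
)
for rank, target, count in ranked_slice(sample, column=1, first_rank=2, last_rank=3):
    print(f"{rank:3d} {target} {count}")
```

Passing `column=1` or `column=2` would select the Ghidra address or the segment label respectively in a file laid out like `ne_reloc_far_calls.tsv`.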

20
tier4_ghidra.txt Normal file

@ -0,0 +1,20 @@
61 000b:30a5 42
62 0008:bc27 41
63 0008:d214 40
64 0005:1565 39
65 0008:4bba 39
66 0009:6f5a 38
67 0009:8ef6 38
68 000a:7b30 38
69 000a:7b3f 38
70 0009:6e7f 35
71 000a:7b35 35
72 0009:c433 34
73 0009:2156 33
74 000a:2c41 33
75 000a:454d 32
76 000b:2446 31
77 0005:1238 30
78 000b:1446 30
79 000d:85da 29
80 0005:1511 29

21
tier4_ghidra_check.txt Normal file

@ -0,0 +1,21 @@
SIZE=380
61 000b:30a5 42
62 0008:bc27 41
63 0008:d214 40
64 0005:1565 39
65 0008:4bba 39
66 0009:6f5a 38
67 0009:8ef6 38
68 000a:7b30 38
69 000a:7b3f 38
70 0009:6e7f 35
71 000a:7b35 35
72 0009:c433 34
73 0009:2156 33
74 000a:2c41 33
75 000a:454d 32
76 000b:2446 31
77 0005:1238 30
78 000b:1446 30
79 000d:85da 29
80 0005:1511 29

20
tier4_output.txt Normal file

@ -0,0 +1,20 @@
61 seg109:02a5 42
62 seg061:0227 41
63 seg061:1814 40
64 seg021:1365 39
65 seg055:09ba 39
66 seg076:015a 38
67 seg080:02f6 38
68 seg096:0530 38
69 seg096:053f 38
70 seg076:007f 35
71 seg096:0535 35
72 seg083:0033 34
73 seg070:0556 33
74 seg087:0441 33
75 seg091:014d 32
76 seg108:0a46 31
77 seg021:1038 30
78 seg107:0046 30
79 seg137:07da 29
80 seg021:1311 29

20
tier4_result.txt Normal file

@ -0,0 +1,20 @@
61 seg109:02a5 42
62 seg061:0227 41
63 seg061:1814 40
64 seg021:1365 39
65 seg055:09ba 39
66 seg076:015a 38
67 seg080:02f6 38
68 seg096:0530 38
69 seg096:053f 38
70 seg076:007f 35
71 seg096:0535 35
72 seg083:0033 34
73 seg070:0556 33
74 seg087:0441 33
75 seg091:014d 32
76 seg108:0a46 31
77 seg021:1038 30
78 seg107:0046 30
79 seg137:07da 29
80 seg021:1311 29
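
The Ghidra-address ranking and the segment-label ranking above appear to describe the same targets row by row. Assuming that pairing holds, subtracting the in-segment offset from the flat Ghidra address should give one file base per NE segment. The sketch below checks this for a few rows copied from the listings; `implied_bases` is a hypothetical helper, not a repository function.

```python
def flat(addr: str) -> int:
    """'seg:off' text -> flat address, assuming (seg << 16) + off."""
    seg, off = addr.split(":")
    return (int(seg, 16) << 16) + int(off, 16)


def implied_bases(pairs):
    """Map each segment label to the set of file bases implied by its rows."""
    bases = {}
    for ghidra_addr, label in pairs:
        seg_name, off = label.split(":")
        bases.setdefault(seg_name, set()).add(flat(ghidra_addr) - int(off, 16))
    return bases


# Rows paired by rank from the two tier-4 listings (ranks 61-65, 77, 80).
pairs = [
    ("000b:30a5", "seg109:02a5"),
    ("0008:bc27", "seg061:0227"),
    ("0008:d214", "seg061:1814"),
    ("0005:1565", "seg021:1365"),
    ("0005:1238", "seg021:1038"),
    ("0005:1511", "seg021:1311"),
]
for seg_name, found in sorted(implied_bases(pairs).items()):
    status = "consistent" if len(found) == 1 else "CONFLICT"
    print(seg_name, sorted(hex(b) for b in found), status)
```

A segment that yields more than one base would indicate that the two rankings do not pair up as assumed, or that a label was resolved incorrectly.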

0
tier5_errors.txt Normal file

20
tier5_output.txt Normal file

@ -0,0 +1,20 @@
81 0009:1c00 29
82 0008:75f3 29
83 0006:0208 29
84 000a:30d7 29
85 0009:c45f 29
86 0004:d7a0 28
87 000a:5276 28
88 0003:d94f 28
89 000a:7b3a 28
90 0008:ca18 27
91 0008:bd20 27
92 0009:3ceb 27
93 0005:09b4 27
94 0005:0fbb 27
95 0008:d27e 26
96 0005:0376 26
97 000b:2492 26
98 0003:e4d3 26
99 0005:033e 25
100 000a:87fd 25

1
tools/__init__.py Normal file

@ -0,0 +1 @@
"""Workspace helper packages."""

Binary file not shown.


@ -0,0 +1,5 @@
"""PyGhidra helpers for the Crusader Ghidra project."""
from .cli import main
__all__ = ["main"]


@ -0,0 +1,5 @@
from .cli import main

if __name__ == "__main__":
    raise SystemExit(main())


@ -0,0 +1,258 @@
from __future__ import annotations

import argparse
import json
from pathlib import Path

from .common import (
    DEFAULT_INSTALL_DIR,
    DEFAULT_PROJECT_DIR,
    DEFAULT_PROJECT_NAME,
    DEFAULT_PROGRAM_NAME,
    DEFAULT_FOLDER_PATH,
    ProjectConfig,
    create_function,
    get_function,
    list_root_files,
    open_program,
    open_project,
    remove_function,
    rename_function,
    save_program,
    set_comment,
    transaction,
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="PyGhidra helpers for the Crusader project."
    )
    parser.add_argument(
        "--install-dir",
        default=str(DEFAULT_INSTALL_DIR),
        help="Ghidra install directory.",
    )
    parser.add_argument(
        "--project-dir",
        default=str(DEFAULT_PROJECT_DIR),
        help="Directory containing the Ghidra project.",
    )
    parser.add_argument(
        "--project-name",
        default=DEFAULT_PROJECT_NAME,
        help="Ghidra project name.",
    )
    parser.add_argument(
        "--program-name",
        default=DEFAULT_PROGRAM_NAME,
        help="Program name inside the project.",
    )
    parser.add_argument(
        "--folder-path",
        default=DEFAULT_FOLDER_PATH,
        help="Project folder path containing the program.",
    )
    parser.add_argument(
        "--restore-project",
        action="store_true",
        help="Restore project tool state while opening the project.",
    )

    subparsers = parser.add_subparsers(dest="command", required=True)

    subparsers.add_parser(
        "project-files",
        help="List root-level files in the Ghidra project.",
    )

    create_parser = subparsers.add_parser(
        "create-function",
        help="Create a function at an address with an optional explicit body range.",
    )
    create_parser.add_argument("--entry", required=True, help="Function entry address.")
    create_parser.add_argument("--name", required=True, help="New function name.")
    create_parser.add_argument("--body-start", help="Function body start address.")
    create_parser.add_argument("--body-end", help="Function body end address.")
    create_parser.add_argument(
        "--plate-comment",
        help="Optional plate comment to set at the entry address after creation.",
    )

    delete_parser = subparsers.add_parser(
        "delete-function",
        help="Delete a function at an address.",
    )
    delete_parser.add_argument("--entry", required=True, help="Function entry address.")

    rename_parser = subparsers.add_parser(
        "rename-function",
        help="Rename an existing function by entry address.",
    )
    rename_parser.add_argument("--entry", required=True, help="Function entry address.")
    rename_parser.add_argument("--name", required=True, help="New function name.")

    comment_parser = subparsers.add_parser(
        "set-comment",
        help="Set a code-unit comment by address.",
    )
    comment_parser.add_argument("--address", required=True, help="Comment target address.")
    comment_parser.add_argument("--text", required=True, help="Comment text.")
    comment_parser.add_argument(
        "--type",
        choices=["pre", "plate", "eol", "repeatable", "post"],
        default="plate",
        help="Comment type.",
    )

    plan_parser = subparsers.add_parser(
        "apply-plan",
        help="Apply a JSON edit plan containing function and comment operations.",
    )
    plan_parser.add_argument("--plan", required=True, help="Path to the JSON plan file.")
    plan_parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Validate and print the plan without modifying the project.",
    )
    return parser


def build_config(args: argparse.Namespace) -> ProjectConfig:
    return ProjectConfig(
        install_dir=Path(args.install_dir),
        project_dir=Path(args.project_dir),
        project_name=args.project_name,
        program_name=args.program_name,
        folder_path=args.folder_path,
        restore_project=args.restore_project,
    )


def command_project_files(config: ProjectConfig, _args: argparse.Namespace) -> int:
    project = open_project(config)
    try:
        for name in list_root_files(project):
            print(name)
    finally:
        project.close()
    return 0


def command_create_function(config: ProjectConfig, args: argparse.Namespace) -> int:
    with open_program(config, read_only=False) as (project, program):
        with transaction(program, f"Create function {args.entry}"):
            function = create_function(program, args.entry, args.name, args.body_start, args.body_end)
            if args.plate_comment:
                set_comment(program, args.entry, args.plate_comment, "plate")
        save_program(project, program)
        print(f"created {function.getName()} at {args.entry}")
    return 0


def command_delete_function(config: ProjectConfig, args: argparse.Namespace) -> int:
    with open_program(config, read_only=False) as (project, program):
        with transaction(program, f"Delete function {args.entry}"):
            removed = remove_function(program, args.entry)
            if not removed:
                raise RuntimeError(f"no function removed at {args.entry}")
        save_program(project, program)
        print(f"deleted function at {args.entry}")
    return 0


def command_rename_function(config: ProjectConfig, args: argparse.Namespace) -> int:
    with open_program(config, read_only=False) as (project, program):
        with transaction(program, f"Rename function {args.entry}"):
            function = rename_function(program, args.entry, args.name)
        save_program(project, program)
        print(f"renamed {args.entry} to {function.getName()}")
    return 0


def command_set_comment(config: ProjectConfig, args: argparse.Namespace) -> int:
    with open_program(config, read_only=False) as (project, program):
        with transaction(program, f"Set comment {args.address}"):
            set_comment(program, args.address, args.text, args.type)
        save_program(project, program)
        print(f"set {args.type} comment at {args.address}")
    return 0


def _load_plan(plan_path: str) -> dict:
    with open(plan_path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def _print_plan(plan: dict) -> None:
    print(json.dumps(plan, indent=2, sort_keys=True))


def command_apply_plan(config: ProjectConfig, args: argparse.Namespace) -> int:
    plan = _load_plan(args.plan)
    if args.dry_run:
        _print_plan(plan)
        return 0

    transaction_name = plan.get("transaction", f"Apply plan {args.plan}")
    with open_program(config, read_only=False) as (project, program):
        with transaction(program, transaction_name):
            for entry in plan.get("remove_functions", []):
                removed = remove_function(program, entry)
                if not removed:
                    raise RuntimeError(f"no function removed at {entry}")
            for entry in plan.get("rename_functions", []):
                rename_function(program, entry["entry"], entry["name"])
            for entry in plan.get("create_functions", []):
                create_function(
                    program,
                    entry["entry"],
                    entry["name"],
                    entry.get("body_start"),
                    entry.get("body_end"),
                )
                if entry.get("comment"):
                    set_comment(
                        program,
                        entry["entry"],
                        entry["comment"],
                        entry.get("comment_type", "plate"),
                    )
            for entry in plan.get("comments", []):
                set_comment(
                    program,
                    entry["address"],
                    entry["text"],
                    entry.get("type", "plate"),
                )
            for entry in plan.get("assert_functions", []):
                if get_function(program, entry) is None:
                    raise RuntimeError(f"expected function missing at {entry}")
        save_program(project, program)
        print(f"applied plan {args.plan}")
    return 0


def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    config = build_config(args)
    command_map = {
        "project-files": command_project_files,
        "create-function": command_create_function,
        "delete-function": command_delete_function,
        "rename-function": command_rename_function,
        "set-comment": command_set_comment,
        "apply-plan": command_apply_plan,
    }
    return command_map[args.command](config, args)


if __name__ == "__main__":
    raise SystemExit(main())


@ -0,0 +1,181 @@
from __future__ import annotations

from contextlib import contextmanager
from dataclasses import dataclass
from pathlib import Path
import os

REPO_ROOT = Path(__file__).resolve().parents[2]
DEFAULT_INSTALL_DIR = Path(
    os.environ.get("GHIDRA_INSTALL_DIR", r"I:\Apps\ghidra_11.3.2_PUBLIC")
)
DEFAULT_PROJECT_DIR = REPO_ROOT
DEFAULT_PROJECT_NAME = "Crusader"
DEFAULT_PROGRAM_NAME = "CRUSADER-RAW.EXE"
DEFAULT_FOLDER_PATH = "/"


@dataclass(frozen=True)
class ProjectConfig:
    install_dir: Path = DEFAULT_INSTALL_DIR
    project_dir: Path = DEFAULT_PROJECT_DIR
    project_name: str = DEFAULT_PROJECT_NAME
    program_name: str = DEFAULT_PROGRAM_NAME
    folder_path: str = DEFAULT_FOLDER_PATH
    restore_project: bool = False


def ensure_pyghidra_started(install_dir: Path | None = None):
    import pyghidra

    resolved_dir = Path(install_dir or DEFAULT_INSTALL_DIR)
    if not pyghidra.started():
        pyghidra.start(install_dir=resolved_dir)
    return pyghidra


def parse_address_text(address_text: str) -> int:
    text = address_text.strip()
    if ":" in text:
        segment_text, offset_text = text.split(":", 1)
        return (int(segment_text, 16) << 16) + int(offset_text, 16)
    return int(text, 0)


def to_address(program, address_text: str):
    address_space = program.getAddressFactory().getDefaultAddressSpace()
    return address_space.getAddress(parse_address_text(address_text))


def format_project_error(config: ProjectConfig, exc: Exception) -> RuntimeError:
    lock_path = config.project_dir / f"{config.project_name}.lock"
    details = [
        f"unable to open project '{config.project_name}' in '{config.project_dir}'",
        str(exc),
    ]
    if lock_path.exists():
        details.append(
            f"project lock present at '{lock_path}'; close Ghidra or work on a project copy for write operations"
        )
    return RuntimeError("; ".join(details))


def open_project(config: ProjectConfig):
    ensure_pyghidra_started(config.install_dir)
    from ghidra.base.project import GhidraProject

    try:
        return GhidraProject.openProject(
            str(config.project_dir),
            config.project_name,
            config.restore_project,
        )
    except Exception as exc:  # pragma: no cover - depends on local Ghidra state
        raise format_project_error(config, exc) from exc


def _candidate_folder_paths(folder_path: str) -> list[str]:
    candidates = [folder_path]
    for fallback in ("/", "\\", ""):
        if fallback not in candidates:
            candidates.append(fallback)
    return candidates


@contextmanager
def open_program(config: ProjectConfig, read_only: bool):
    project = open_project(config)
    program = None
    last_error = None
    try:
        for folder_path in _candidate_folder_paths(config.folder_path):
            try:
                program = project.openProgram(folder_path, config.program_name, read_only)
                break
            except Exception as exc:  # pragma: no cover - depends on local Ghidra state
                last_error = exc
        if program is None:
            raise RuntimeError(
                f"unable to open program '{config.program_name}' from project '{config.project_name}': {last_error}"
            )
        yield project, program
    finally:
        if project is not None:
            if program is not None:
                project.close(program)
            project.close()


@contextmanager
def transaction(program, description: str):
    transaction_id = program.startTransaction(description)
    commit = False
    try:
        yield
        commit = True
    finally:
        program.endTransaction(transaction_id, commit)


def list_root_files(project) -> list[str]:
    return [domain_file.getName() for domain_file in project.getRootFolder().getFiles()]


def get_function(program, entry_text: str):
    return program.getFunctionManager().getFunctionAt(to_address(program, entry_text))


def create_function(program, entry_text: str, name: str, body_start: str | None, body_end: str | None):
    from ghidra.program.model.address import AddressSet
    from ghidra.program.model.symbol import SourceType

    entry_address = to_address(program, entry_text)
    body_start_address = to_address(program, body_start or entry_text)
    body_end_address = to_address(program, body_end or entry_text)
    body = AddressSet(body_start_address, body_end_address)
    return program.getFunctionManager().createFunction(
        name,
        entry_address,
        body,
        SourceType.USER_DEFINED,
    )


def remove_function(program, entry_text: str) -> bool:
    return bool(program.getFunctionManager().removeFunction(to_address(program, entry_text)))


def rename_function(program, entry_text: str, new_name: str):
    from ghidra.program.model.symbol import SourceType

    function = get_function(program, entry_text)
    if function is None:
        raise ValueError(f"no function found at {entry_text}")
    function.setName(new_name, SourceType.USER_DEFINED)
    return function


def set_comment(program, address_text: str, comment: str, comment_type: str):
    from ghidra.program.model.listing import CodeUnit

    comment_types = {
        "pre": CodeUnit.PRE_COMMENT,
        "plate": CodeUnit.PLATE_COMMENT,
        "eol": CodeUnit.EOL_COMMENT,
        "repeatable": CodeUnit.REPEATABLE_COMMENT,
        "post": CodeUnit.POST_COMMENT,
    }
    if comment_type not in comment_types:
        raise ValueError(f"unsupported comment type: {comment_type}")
    listing = program.getListing()
    code_unit = listing.getCodeUnitAt(to_address(program, address_text))
    if code_unit is None:
        raise ValueError(f"no code unit found at {address_text}")
    code_unit.setComment(comment_types[comment_type], comment)


def save_program(project, program):
    project.save(program)
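
The `transaction` helper above commits only when the wrapped block finishes without raising. The sketch below exercises that behaviour without Ghidra: `FakeProgram` is a hypothetical stand-in that merely records the `startTransaction`/`endTransaction` calls, and the context manager is repeated here so the example is self-contained.

```python
from contextlib import contextmanager


class FakeProgram:
    """Hypothetical stand-in for a Ghidra program; it only logs transaction calls."""

    def __init__(self):
        self.log = []
        self._next_id = 0

    def startTransaction(self, description):
        self._next_id += 1
        self.log.append(("start", self._next_id, description))
        return self._next_id

    def endTransaction(self, transaction_id, commit):
        self.log.append(("end", transaction_id, commit))


@contextmanager
def transaction(program, description):
    transaction_id = program.startTransaction(description)
    commit = False
    try:
        yield
        commit = True  # reached only if the with-body did not raise
    finally:
        program.endTransaction(transaction_id, commit)


program = FakeProgram()
with transaction(program, "ok edit"):
    pass
try:
    with transaction(program, "failing edit"):
        raise ValueError("boom")
except ValueError:
    pass

for event in program.log:
    print(event)
```

In the real toolkit this means a failed step inside `apply-plan` ends the transaction with `commit=False`, so the earlier steps of that plan are not kept.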

50
validate_fixups.py Normal file

@ -0,0 +1,50 @@
import json

with open(r'k:\ghidra\Crusader_Decomp\ne_reloc_fixups.json') as f:
    fixups = json.load(f)

known_callf_addrs = {
    '0007:101c': 'entity_ai_update_loop call#1 (entity_slot_fetch)',
    '0007:1093': 'entity_ai_update_loop call#2 (entity_tick_dispatch)',
    '0007:2261': 'snap_entity_to_ground call (ground snap thunk)',
    '0007:27dc': 'anim_frame_update call#1 (completion_callback)',
    '0007:281e': 'anim_frame_update call#3 (notify_frame_progress)',
    '0007:2851': 'anim_frame_update call#4 (entity_sprite_advance)',
    '0007:8666': 'entity_sync_tile_aux thunk (tile_type_notify)',
}


def ghidra_to_file(addr_str):
    seg, off = addr_str.split(':')
    return (int(seg, 16) << 16) + int(off, 16)


# Build a lookup dict by source_file_offset for speed
by_offset = {}
for f in fixups:
    by_offset[f['source_file_offset']] = f

for addr, desc in sorted(known_callf_addrs.items()):
    callf_file = ghidra_to_file(addr)
    print(f"\n{addr} = {desc}")
    print(f" CALLF file offset: 0x{callf_file:X}")
    # The NE fixup offset points to where the patched value goes.
    # For CALLF (9A xx xx xx xx), the operand is at addr+1.
    # But the reloc chain offset is relative to segment start.
    # Probe callf_file+0 .. callf_file+4 and report the first fixup found.
    for delta in range(0, 5):
        test_off = callf_file + delta
        if test_off in by_offset:
            m = by_offset[test_off]
            tgt = m.get('target', '?')
            tgt_g = m.get('target_ghidra', '?')
            print(f" FOUND at +{delta}: file=0x{test_off:X} seg{m['source_seg']:03d}+0x{m['source_offset_in_seg']:04X}")
            print(f" -> {tgt} (ghidra: {tgt_g})")
            break
    else:
        print(" NOT FOUND in range [+0..+4]")
        # Show what segment this falls in
        # (debugging placeholder: the scan below does not print anything yet)
        for s in range(1, 146):
            entry = [x for x in fixups if x['source_seg'] == s]
            if entry:
                # not efficient but ok for debugging
                pass
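
The probing loop above can be factored into a small function that is testable without the JSON dump. This is a sketch under stated assumptions: `find_fixup_near` is a hypothetical name, and the single fixup record below is made-up illustration data, not a value from `ne_reloc_fixups.json`.

```python
def find_fixup_near(by_offset, callf_offset, max_delta=4):
    """Return (delta, record) for the first fixup at callf_offset+0..+max_delta, else None."""
    for delta in range(max_delta + 1):
        rec = by_offset.get(callf_offset + delta)
        if rec is not None:
            return delta, rec
    return None


# Made-up record: a fixup on the operand byte right after a CALLF opcode at 0x7101C.
by_offset = {
    0x7101D: {"source_file_offset": 0x7101D, "target": "seg000:0000 (illustrative)"},
}

hit = find_fixup_near(by_offset, 0x7101C)
if hit is None:
    print("NOT FOUND")
else:
    delta, rec = hit
    print(f"FOUND at +{delta}: {rec['target']}")
```

For a `CALLF` (opcode `9A` followed by a 4-byte far pointer), a hit at `+1` is the expected case, since the fixup patches the operand rather than the opcode byte.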