Add Ghidra coverage agents and update documentation for enhanced function analysis

- Introduced `Ghidra Coverage Batch Director` and `Ghidra Coverage Mini` agents for improved parallel analysis and function coverage in `CRUSADER.EXE`.
- Updated `ghidra.instructions.md` to clarify documentation practices and legacy file handling.
- Added recent verified function coverage updates to `crusader_decompilation_notes.md` and `plan-mid.md` for better tracking of analysis progress.
- Included new binary files for enhanced data handling in the project.
Marco 2026-04-15 17:16:53 +02:00
commit 328a8ba30f
11 changed files with 282 additions and 3 deletions

@@ -15,6 +15,12 @@ Detailed completed analysis belongs in the files under `docs/`, not in this plan
## Progress Snapshot
Latest verified batch: a broad live MCP `CRUSADER.EXE` continuation wave pushed three caller-first `1000` buffered-I/O bundles, one `1078/1060` ItemCache relink bundle, and one `1348/1360` SpriteNode-NewGump geometry bundle. This wave landed `8` additional evidence-backed names: `1000:5c9d = stream_count_buffered_newlines`, `1000:5d1f = buffered_stream_seek`, `1000:5e7f = fwrite_buffered`, `1078:01c8 = ItemCache_RelinkForwardLink_1078`, `1078:01f9 = ItemCache_RelinkMovedBlockLinks_1078`, `1078:023e = ItemCache_RelinkAroundMovedBlock_1078`, `1360:0b65 = sprite_tree_update_dirty_state`, and `1360:0bb2 = sprite_tree_relink_child_after_head`. The same wave also confirmed or materially sharpened neutral evidence comments on `1000:578a`, `1000:58c8`, `1000:6c73`, `1000:5d9f`, `1000:5f0a`, `1000:5f48`, `1000:5fc0`, `1060:1c70`, `1060:0ecf`, `1348:0b39`, `1348:0c92`, `1348:0d07`, and `1348:0d81`, while tightening the already-named geometry anchors `1360:0b43 = sprite_tree_point_in_bounds` and `1360:0c00 = sprite_tree_sum_x_offset`. From the previous verified baseline of `1095` unnamed functions, these eight additional safe renames move the working coverage floor to `1087` unnamed overall, with touched-selector counts now `1000=148`, `1078=18`, `1190=27`, `1348=29`, `1360=25`, and `11d0=31`. Practical next step remains caller-first closure in `1000`, especially the `Filespec_1238_032e` / `UProcess_1420_062f` / `1000:6c73` chain and the remaining sync helper `1000:5d9f`, with a secondary continuation on the still-comment-only `1348` SpriteNode/NewGump wrappers once their family-level naming boundary is clearer.
Latest verified batch: a second live MCP `CRUSADER.EXE` coverage wave ran as six parallel `Ghidra Decomp Mini` passes with explicit per-bundle quotas. Five mini passes were scoped to `3` target functions each and the sixth was deliberately heavier at `6` functions for capacity calibration. The batch landed verified new names `1000:37ca = vsscanf_engine`, `1000:37de = advance_dest_by_char_size`, `1000:56bd = buffer_normalize_and_refill`, `11d0:15f2 = FindLinearCapableProcessForItemType`, `1078:0046 = DList_InsertAfterHead_1078`, `1078:0098 = DList_UnlinkNode_1078`, `1190:006d = rect_union_inplace`, `1190:00da = global_list_pop_head_1478_2cc3`, `1190:0112 = global_list_push_head_1478_2cc3`, `1348:0023 = spritenode_create_and_invoke_0x50`, `1348:006f = spritenode_invoke_0x50`, `1348:0092 = spritenode_invoke_0x50_alt`, `1360:00c7 = alloc_init_1360_obj`, and `1360:0113 = destroy_1360_obj`, plus neutral evidence comments on `1000:578a`, `11d0:0255`, `11d0:04cd`, `1360:0161`, `1360:017a`, `1360:01c7`, and `1360:0218`. Coverage moved from `3032/1140 unnamed` to `3032/1126 unnamed`. Practical calibration result: `4` functions is now the best default load for a GPT-5.4 mini pass, with `6` acceptable only when the bundle is dominated by small wrappers/search helpers instead of deep table or subsystem reasoning.
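The coverage movement quoted in the batch entries above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `unnamed_after` is hypothetical and not part of the project tooling, and it assumes each verified rename removes exactly one function from the unnamed count.

```python
def unnamed_after(baseline_unnamed: int, renamed_addresses: list[str]) -> int:
    """Return the unnamed-function count after a batch of verified renames.

    Assumption (not from the project tooling): every address in the batch
    was unnamed before the batch and is counted exactly once.
    """
    if len(set(renamed_addresses)) != len(renamed_addresses):
        raise ValueError("duplicate address in batch")
    if len(renamed_addresses) > baseline_unnamed:
        raise ValueError("batch is larger than the unnamed baseline")
    return baseline_unnamed - len(renamed_addresses)


# Addresses named in the six-pass mini wave (copied from the entry above).
SECOND_WAVE = [
    "1000:37ca", "1000:37de", "1000:56bd", "11d0:15f2",
    "1078:0046", "1078:0098", "1190:006d", "1190:00da",
    "1190:0112", "1348:0023", "1348:006f", "1348:0092",
    "1360:00c7", "1360:0113",
]

if __name__ == "__main__":
    # 14 renames against the stated 1140 baseline.
    print(unnamed_after(1140, SECOND_WAVE))
```

With the 14 addresses listed for the six-pass wave, the result matches the stated move from `3032/1140 unnamed` to `3032/1126 unnamed`; the same subtraction applied to the later eight-name wave gives `1095 - 8 = 1087`.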
Latest verified batch: live MCP validation on active `CRUSADER.EXE` succeeded normally (`get_project_access_info`, symbol reads, scripted inspection, and write-capable rename/comment edits), followed by a six-`Ghidra Decomp Mini` coverage sweep plus one direct follow-up patch for selector `10e8`. The batch landed durable evidence-backed names `1000:626f = itoa`, `1000:636e = memmove`, `10e8:00c9 = NPC_SavegameWrite`, `10e8:00f2 = NPC_SavegameRead`, `11d0:2491 = kernel_process_snapshot_writer`, and `11d0:39e6 = read_bios_keyboard_shift_cache`, plus concise decompiler comments in smaller helper selectors `1078`, `1190`, `1348`, and `1360` and in several still-ambiguous `1000` DOS/video/file-I/O wrappers. Practical consequence is that the next NE function-coverage pass should resume caller-first on the skipped `1000` DOS/file-I/O cluster (`37b0..37ff`, `56bd..5825`) and the ambiguous `11d0` dispatch/table-management families instead of reopening the already annotated wrapper lanes.
Latest verified batch: [docs/psx/art-binding-recovery.md](docs/psx/art-binding-recovery.md) now includes a 2026-04-13 focused live `SLUS_002.68` late art-bank corridor pass centered on `wdl_resource_bundle_load_by_index` (`0x80039444`), the header-only write sites `0x8003977c/0x80039a64`, `psx_install_type_art_active_header_and_built_resource` (`0x80045ffc`), `psx_create_image_resource_from_descriptor` (`0x80044434`), and constructor fast paths `0x80024b0c/0x80025004`. Current best read is now exporter-critical and more exact than the older “one late descriptor bank” shorthand: each WDL pass contributes two art-facing late sections, the later `8`-byte header-only override is what leaves raw `0x58`-byte active headers in `DAT_800758d8`, and constructors reuse `DAT_800758c8[type]` when that raw-header signature is present instead of rebuilding. Practical consequence is that standalone parsing should target the late header-only override stream first and treat the earlier built-resource art-install blob as a separate, still-partially-unresolved feed rather than flattening both into one guessed art bank.
Latest verified batch: [docs/psx/map-storage-model.md](docs/psx/map-storage-model.md) now includes a 2026-04-13 live subordinate-section pass on active `SLUS_002.68` centered on `psx_apply_deferred_control_command` and `psx_control_assign_opcode_stream_by_index`. Current best read is now narrower and exporter-relevant: `DAT_80067938` provides constructor-placement-adjacent index data, `DAT_80067838` backs `8`-byte deferred-control row chains consumed by root/live-object mutation helpers, and `DAT_80067840` is an opcode-stream pointer table rather than hidden geometry. Practical consequence is that `post_audio_region_02` should be treated as a mixed resource/control payload zone until smaller typed sub-lanes are split out, not as a presumed flat floor table.