feat: add processor service with Redis-backed job queue

- Introduced a new `processor` service in the Docker Compose setup to handle face matching jobs.
- Configured Redis as a job queue and state management system for processing searches.
- Updated the backend to enqueue jobs and manage user locks using Redis.
- Added environment variables for Redis configuration and runtime paths.
- Created technical design documentation for the processor service outlining architecture, queue model, and search lifecycle.
- Updated package.json and package-lock.json to include dependencies for BullMQ and ioredis in the processor workspace.
- Added sample PKL files for local testing in the `test_pkl` directory.
2026-04-11 17:53:22 +02:00 · 2026-04-11 17:53:22 +02:00 · 81a1ac85af
commit 81a1ac85af
parent d5cdcd3332
20 changed files with 1313 additions and 108 deletions
--- a/faceai/docs/processor-technical-design.md
+++ b/faceai/docs/processor-technical-design.md
@@ -0,0 +1,166 @@
+# FaceAI Processor Technical Design
+
+## Goal
+
+Add an internal processor service that executes `face_matcher` jobs for the public FaceAI site, while preventing duplicate searches per user and keeping all state short-lived and restart-safe.
+
+## Scope Of This Slice
+
+- add Redis-backed queue and job state
+- add a dedicated `processor` workspace and container scaffold
+- replace in-memory search orchestration in the public backend
+- preserve the existing frontend polling and legacy return flow
+- support local PKL testing from `test_pkl/`
+
+This slice does not yet implement production NAS mounting, persistent databases, or a final parser tailored to the real matcher CSV format.
+
+## Runtime Architecture
+
+### Public backend
+
+- owns the authenticated API used by the Vue frontend
+- stores uploaded selfies in a shared runtime volume
+- enqueues jobs into BullMQ
+- keeps per-search state, results, rate limits, and active-user locks in Redis
+- never executes `face_matcher` directly
+
+### Processor
+
+- consumes queue jobs from Redis using BullMQ worker concurrency
+- resolves the race-scoped PKL path for each job
+- executes the Linux `face_matcher` binary
+- parses the CSV result into legacy-compatible `photoId` matches
+- writes final state and result payload back to Redis
+
+### Redis
+
+- queue broker for BullMQ
+- source of truth for active-user locks
+- source of truth for search status and short-lived results
+- source of truth for rate-limit counters
+
+## Queue And Locking Model
+
+- queue name: `faceai-searches`
+- active lock key: `faceai:active-search:user:{legacyUserId}`
+- search record key: `faceai:search:{searchId}`
+- result record key: `faceai:result:{resultId}`
+- rate limit key prefix: `faceai:rate-limit:{legacyUserId}`
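The key layout above could be centralised in a few small helpers so that the backend and the processor never build keys by hand. This is a minimal sketch: the function names are hypothetical, and only the queue name and key formats come from the list above.

```typescript
// Hypothetical helpers; only the string formats are taken from the design doc.
const FACEAI_QUEUE_NAME = "faceai-searches";

// Lock that marks a user as having one search in flight.
function activeSearchLockKey(legacyUserId: string): string {
  return `faceai:active-search:user:${legacyUserId}`;
}

// Per-search status record.
function searchRecordKey(searchId: string): string {
  return `faceai:search:${searchId}`;
}

// Short-lived result payload.
function resultRecordKey(resultId: string): string {
  return `faceai:result:${resultId}`;
}

// Rate-limit counter for a user.
function rateLimitKey(legacyUserId: string): string {
  return `faceai:rate-limit:${legacyUserId}`;
}
```

Keeping the formats in one place means a later rename only touches these functions.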
+
+`POST /api/searches` must acquire the active-user lock before enqueueing. If the lock already exists, the backend returns `409` with error code `ACTIVE_SEARCH_EXISTS`.
+
+The lock is released only when the processor marks the search as terminal: `completed`, `failed`, or `timed_out`.
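The acquire-then-enqueue rule and the release-on-terminal rule can be sketched as follows. The sketch uses an in-memory object as a stand-in for Redis so it runs on its own; a Redis-backed version would presumably rely on an atomic `SET` with the `NX` and `EX` options. All class, function, and type names are hypothetical, and the lock TTL is an assumption rather than something this section specifies.

```typescript
// In-memory stand-in for the Redis lock, for illustration only.
type LockEntry = { searchId: string; expiresAtMs: number };

type CreateSearchOutcome =
  | { ok: true; searchId: string }
  | { ok: false; httpStatus: 409; errorCode: "ACTIVE_SEARCH_EXISTS" };

const TERMINAL_STATUSES = ["completed", "failed", "timed_out"];

class InMemoryActiveSearchLocks {
  private entries: Record<string, LockEntry | undefined> = {};

  private keyFor(userId: string): string {
    return `faceai:active-search:user:${userId}`;
  }

  // Returns true when the lock was taken, false when an unexpired lock exists.
  acquire(userId: string, searchId: string, ttlMs: number, nowMs: number): boolean {
    const key = this.keyFor(userId);
    const existing = this.entries[key];
    if (existing !== undefined && existing.expiresAtMs > nowMs) return false;
    this.entries[key] = { searchId, expiresAtMs: nowMs + ttlMs };
    return true;
  }

  // Releases the lock only if it is still held by the given search.
  release(userId: string, searchId: string): void {
    const key = this.keyFor(userId);
    const existing = this.entries[key];
    if (existing !== undefined && existing.searchId === searchId) {
      delete this.entries[key];
    }
  }
}

// Backend side: take the lock first, reject with 409 if it is already held.
function tryCreateSearch(
  locks: InMemoryActiveSearchLocks,
  userId: string,
  searchId: string,
  ttlMs: number,
  nowMs: number,
): CreateSearchOutcome {
  if (!locks.acquire(userId, searchId, ttlMs, nowMs)) {
    return { ok: false, httpStatus: 409, errorCode: "ACTIVE_SEARCH_EXISTS" };
  }
  // A real backend would store the upload and enqueue the job here.
  return { ok: true, searchId };
}

// Processor side: release only when the new status is terminal.
function onStatusChange(
  locks: InMemoryActiveSearchLocks,
  userId: string,
  searchId: string,
  newStatus: string,
): void {
  if (TERMINAL_STATUSES.indexOf(newStatus) !== -1) {
    locks.release(userId, searchId);
  }
}
```

The TTL acts as a safety net: if a worker dies before it can mark the search terminal, the lock still expires instead of blocking the user indefinitely.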
+
+## Race And PKL Resolution
+
+The canonical race key is the legacy `id_gara`, already exposed as `raceId` in the existing handoff flow.
+
+The processor resolves the PKL path using a race-based directory layout:
+
+```text
+/data/pkl/
+  101/
+    face_encodings.pkl
+  202/
+    face_encodings.pkl
+```
+
+The lookup rule is:
+
+1. try `/data/pkl/{raceId}/face_encodings.pkl`
+2. optionally fall back to `/data/pkl/{raceId}.pkl`
+3. fail the job if neither exists
+
+For local development, `test_pkl/` is mounted into `/data/pkl/test` and the backend can fall back to the first `.pkl` file in that folder when no race-specific file exists yet.
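The lookup rule and the local-development fallback can be sketched as one pure function. To keep the sketch self-contained, the filesystem check is passed in as a predicate and the contents of the test folder as a plain list; `resolvePklPath`, its parameters, and the choice to take the first `.pkl` entry in the order given are all assumptions.

```typescript
type FileExists = (path: string) => boolean;

// Returns the PKL path to use for a race, or null when the job should fail.
function resolvePklPath(
  raceId: string,
  fileExists: FileExists,
  testFolderFiles: string[] = [],
  pklRoot: string = "/data/pkl",
): string | null {
  // Steps 1 and 2: race directory first, then the flat file.
  const candidates = [
    `${pklRoot}/${raceId}/face_encodings.pkl`,
    `${pklRoot}/${raceId}.pkl`,
  ];
  for (const candidate of candidates) {
    if (fileExists(candidate)) return candidate;
  }

  // Local development only: first .pkl file listed for /data/pkl/test.
  for (const fileName of testFolderFiles) {
    if (fileName.length > 4 && fileName.slice(-4) === ".pkl") {
      return `${pklRoot}/test/${fileName}`;
    }
  }

  // Step 3: nothing found, so the caller fails the job.
  return null;
}
```

In production the third argument would simply be left empty, which turns the function into the strict three-step rule.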
+
+## Shared Runtime Storage
+
+Both the public backend and the processor mount the same writable runtime directory:
+
+```text
+/data/runtime/
+  uploads/
+  searches/
+```
+
+- uploaded selfies are written under `uploads/{searchId}/`
+- worker output and logs are written under `searches/{searchId}/`
+- cleanup can safely remove old per-search directories after retention expires
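As a rough illustration of the per-search directory layout and of how a cleanup pass might pick directories to remove, consider the sketch below. The helper names and the `SearchAge` shape are hypothetical, and the retention value is supplied by the caller because this section does not fix one.

```typescript
const RUNTIME_ROOT = "/data/runtime";

// Directory that holds the uploaded selfie for one search.
function uploadDirFor(searchId: string): string {
  return `${RUNTIME_ROOT}/uploads/${searchId}`;
}

// Directory that holds worker output and logs for one search.
function searchDirFor(searchId: string): string {
  return `${RUNTIME_ROOT}/searches/${searchId}`;
}

interface SearchAge {
  searchId: string;
  createdAtMs: number;
}

// Lists the directories whose search is older than the retention period.
function expiredRuntimeDirs(
  searches: SearchAge[],
  nowMs: number,
  retentionMs: number,
): string[] {
  const dirs: string[] = [];
  for (const search of searches) {
    if (nowMs - search.createdAtMs >= retentionMs) {
      dirs.push(uploadDirFor(search.searchId), searchDirFor(search.searchId));
    }
  }
  return dirs;
}
```

Because every file for a search lives under a directory named after its `searchId`, removal is a matter of deleting two directories per expired search.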
+
+## Search Lifecycle
+
+1. frontend uploads a selfie and calls `POST /api/searches`
+2. backend validates session, rate limit, and active-user lock
+3. backend stores the upload and creates a Redis search record with status `queued`
+4. backend enqueues a BullMQ job
+5. processor picks up the job and sets status `processing`
+6. processor runs `face_matcher`
+7. processor parses CSV output into matches
+8. processor stores a result record and marks the search `completed`
+9. frontend polling reads Redis-backed state through `GET /api/searches/:id`
+10. existing redirect flow sends the user back to the legacy filtered page
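The status changes implied by the steps above can be captured in a small transition table. This is a sketch under assumptions: the table is inferred from the lifecycle and from the terminal statuses named in the locking model, and the function names are hypothetical.

```typescript
type SearchStatus = "queued" | "processing" | "completed" | "failed" | "timed_out";

// Assumed transitions: queued -> processing -> one of the terminal statuses.
const ALLOWED_TRANSITIONS: Record<SearchStatus, SearchStatus[]> = {
  queued: ["processing"],
  processing: ["completed", "failed", "timed_out"],
  completed: [],
  failed: [],
  timed_out: [],
};

// A status is terminal when nothing may follow it.
function isTerminalStatus(current: SearchStatus): boolean {
  return ALLOWED_TRANSITIONS[current].length === 0;
}

function canTransition(from: SearchStatus, to: SearchStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws when the change is not allowed.
function applyTransition(from: SearchStatus, to: SearchStatus): SearchStatus {
  if (!canTransition(from, to)) {
    throw new Error(`illegal status transition: ${from} -> ${to}`);
  }
  return to;
}
```

Guarding writes this way keeps a late or duplicated worker update from moving a finished search back to `processing`.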
+
+## Search Record Shape
+
+```json
+{
+  "id": "search_...",
+  "status": "queued",
+  "raceId": "101",
+  "userId": "legacy-user-1",
+  "returnUrl": "https://...",
+  "lang": "it",
+  "selfieName": "selfie.jpg",
+  "selfiePath": "/data/runtime/uploads/search_.../selfie.jpg",
+  "resultId": null,
+  "matchCount": 0,
+  "errorCode": null,
+  "errorMessage": null,
+  "createdAt": 0,
+  "startedAt": null,
+  "completedAt": null
+}
+```
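For reference, the record above could be typed roughly as follows, with a small factory that produces the initial `queued` state. The interface and function names are hypothetical, timestamps are assumed to be epoch milliseconds, and the status union is taken from the statuses named elsewhere in this document.

```typescript
type SearchStatus = "queued" | "processing" | "completed" | "failed" | "timed_out";

interface SearchRecord {
  id: string;
  status: SearchStatus;
  raceId: string;
  userId: string;
  returnUrl: string;
  lang: string;
  selfieName: string;
  selfiePath: string;
  resultId: string | null;
  matchCount: number;
  errorCode: string | null;
  errorMessage: string | null;
  createdAt: number; // assumed: epoch milliseconds
  startedAt: number | null;
  completedAt: number | null;
}

interface NewSearchInput {
  id: string;
  raceId: string;
  userId: string;
  returnUrl: string;
  lang: string;
  selfieName: string;
}

// Builds the record as it would look right after the upload is stored.
function newQueuedSearch(input: NewSearchInput, nowMs: number): SearchRecord {
  return {
    id: input.id,
    status: "queued",
    raceId: input.raceId,
    userId: input.userId,
    returnUrl: input.returnUrl,
    lang: input.lang,
    selfieName: input.selfieName,
    selfiePath: `/data/runtime/uploads/${input.id}/${input.selfieName}`,
    resultId: null,
    matchCount: 0,
    errorCode: null,
    errorMessage: null,
    createdAt: nowMs,
    startedAt: null,
    completedAt: null,
  };
}
```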
+
+## Result Shape
+
+```json
+{
+  "id": "result_...",
+  "raceId": "101",
+  "raceName": "Mezza di Firenze",
+  "userId": "legacy-user-1",
+  "returnUrl": "https://...",
+  "lang": "it",
+  "matches": [
+    {
+      "photoId": "legacy-photo-id",
+      "score": 0.98,
+      "label": "legacy-photo-id"
+    }
+  ],
+  "createdAt": 0
+}
+```
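The `matches` array is produced by parsing the matcher's CSV output. The real column layout is not yet confirmed, so the sketch below assumes a hypothetical header row with `photo_id` and `score` columns and no quoted fields; the function name, the skipping of malformed rows, and the best-first sort are likewise assumptions. Following the example above, `label` simply mirrors `photoId`.

```typescript
interface PhotoMatch {
  photoId: string;
  score: number;
  label: string;
}

// Parses a simple comma-separated file with a header row.
// Assumed columns: "photo_id" and "score" (hypothetical, not confirmed).
function parseMatcherCsv(csv: string): PhotoMatch[] {
  const lines = csv
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) return [];

  const header = lines[0].split(",").map((cell) => cell.trim());
  const idCol = header.indexOf("photo_id");
  const scoreCol = header.indexOf("score");
  if (idCol === -1 || scoreCol === -1) {
    throw new Error("unexpected CSV header: " + lines[0]);
  }

  const matches: PhotoMatch[] = [];
  for (const line of lines.slice(1)) {
    const cells = line.split(",").map((cell) => cell.trim());
    const photoId = cells[idCol];
    const rawScore = cells[scoreCol];
    if (!photoId || !rawScore) continue; // skip rows with missing cells
    const score = Number(rawScore);
    if (!isFinite(score)) continue; // skip rows with a non-numeric score
    matches.push({ photoId, score, label: photoId });
  }

  // Best match first.
  matches.sort((a, b) => b.score - a.score);
  return matches;
}
```

Once the real columns are known, only the two header names and the row validation should need to change.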
+
+## Compose Topology
+
+- `faceai`: public backend plus built frontend
+- `processor`: queue consumer and matcher executor
+- `redis`: queue and short-lived state
+- `legacy-php`: local bridge simulator for end-to-end testing
+
+## Operational Defaults
+
+- worker concurrency: `2`
+- active search retention: `24h`
+- result retention: `24h`
+- rate limit window: `5 requests / 10 minutes / user`
+- worker timeout: `5 minutes`
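These defaults could live in a single constants object, and the rate limit can be illustrated with a fixed-window counter. This is a sketch under assumptions: the document does not say which limiting algorithm is used, the names are hypothetical, and the counter is kept in memory here, whereas the design keeps rate-limit counters in Redis.

```typescript
// Values from the defaults above, expressed in milliseconds where relevant.
const OPERATIONAL_DEFAULTS = {
  workerConcurrency: 2,
  activeSearchRetentionMs: 24 * 60 * 60 * 1000,
  resultRetentionMs: 24 * 60 * 60 * 1000,
  rateLimitMaxRequests: 5,
  rateLimitWindowMs: 10 * 60 * 1000,
  workerTimeoutMs: 5 * 60 * 1000,
};

type RateWindow = { count: number; resetAtMs: number };

// Fixed-window limiter: at most maxRequests per user per window.
class FixedWindowRateLimiter {
  private maxRequests: number;
  private windowMs: number;
  private windows: Record<string, RateWindow | undefined> = {};

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true and counts the request when the user is under the limit.
  allow(userId: string, nowMs: number): boolean {
    const current = this.windows[userId];
    if (current === undefined || nowMs >= current.resetAtMs) {
      // Start a new window with this request as the first one.
      this.windows[userId] = { count: 1, resetAtMs: nowMs + this.windowMs };
      return true;
    }
    if (current.count >= this.maxRequests) return false;
    current.count += 1;
    return true;
  }
}
```

A limiter built with `OPERATIONAL_DEFAULTS.rateLimitMaxRequests` and `OPERATIONAL_DEFAULTS.rateLimitWindowMs` accepts five requests per user and rejects further ones until the ten-minute window resets.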
+
+## Known Follow-Up Work
+
+- confirm the real CSV columns emitted by `face_matcher`
+- verify the Linux binary shared library requirements inside the processor image
+- replace the PKL fallback with a strict NAS-backed race mapping once the final folder layout is agreed
+- add cleanup jobs for expired runtime files