# FaceAI Processor Technical Design

## Goal

Add an internal processor service that executes `face_matcher` jobs for the public FaceAI site, while preventing duplicate searches per user and keeping all state short-lived and restart-safe.

## Scope Of This Slice

- add Redis-backed queue and job state
- add a dedicated `processor` workspace and container scaffold
- replace in-memory search orchestration in the public backend
- preserve the existing frontend polling and legacy return flow
- support local PKL testing from `test_pkl/` mounted with the same directory shape used in hosted deployment

This slice does not yet implement production NAS mounting, persistent databases, or a final parser tailored to the real matcher CSV format.

## Runtime Architecture

### Public backend

- owns the authenticated API used by the Vue frontend
- stores uploaded selfies in a shared runtime volume
- enqueues jobs into BullMQ
- keeps per-search state, results, rate limits, and active-user locks in Redis
- never executes `face_matcher` directly

### Processor

- consumes queue jobs from Redis using BullMQ worker concurrency
- resolves the race-scoped PKL path for each job
- executes the Linux `face_matcher` binary
- parses the CSV result into legacy-compatible `photoId` matches
- writes final state and result payload back to Redis

### Redis

- queue broker for BullMQ
- source of truth for active-user locks
- source of truth for search status and short-lived results
- source of truth for rate-limit counters

## Queue And Locking Model

- queue name: `faceai-searches`
- active lock key: `faceai:active-search:user:{legacyUserId}`
- search record key: `faceai:search:{searchId}`
- result record key: `faceai:result:{resultId}`
- rate limit key prefix: `faceai:rate-limit:{legacyUserId}`

`POST /api/searches` must acquire the active-user lock before enqueueing. If the lock already exists, the backend returns `409` with error code `ACTIVE_SEARCH_EXISTS`.

The lock is released only when the processor marks the search as terminal: `completed`, `failed`, or `timed_out`.
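
The lock rule above can be sketched as follows. This is a minimal illustration, assuming the lock is taken with an atomic set-if-absent operation (in Redis, `SET key value NX EX seconds`). `InMemoryLockStore`, `tryStartSearch`, `releaseIfTerminal`, and the 24h TTL default are hypothetical names and values introduced here, not the project's actual code; a real implementation would call Redis asynchronously.

```typescript
// Key builder following the naming scheme listed above.
const activeLockKey = (legacyUserId: string): string =>
  `faceai:active-search:user:${legacyUserId}`;

const TERMINAL_STATUSES: readonly string[] = ["completed", "failed", "timed_out"];

// Hypothetical in-memory stand-in for Redis `SET key value NX EX seconds`.
class InMemoryLockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  // Sets the key only if it is absent or expired; returns whether it was set.
  setIfAbsent(key: string, value: string, ttlSeconds: number, now: number = Date.now()): boolean {
    const existing = this.entries.get(key);
    if (existing !== undefined && existing.expiresAt > now) return false;
    this.entries.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
    return true;
  }

  // Deletes the key only if it still holds the expected value.
  deleteIfValue(key: string, expected: string): boolean {
    const existing = this.entries.get(key);
    if (existing === undefined || existing.value !== expected) return false;
    this.entries.delete(key);
    return true;
  }
}

type StartResult =
  | { ok: true; searchId: string }
  | { ok: false; status: 409; errorCode: "ACTIVE_SEARCH_EXISTS" };

// Backend side: acquire the lock before enqueueing, or report a conflict.
function tryStartSearch(
  store: InMemoryLockStore,
  legacyUserId: string,
  searchId: string,
  ttlSeconds: number = 24 * 60 * 60, // assumption: mirrors the 24h active search retention
): StartResult {
  const acquired = store.setIfAbsent(activeLockKey(legacyUserId), searchId, ttlSeconds);
  return acquired
    ? { ok: true, searchId }
    : { ok: false, status: 409, errorCode: "ACTIVE_SEARCH_EXISTS" };
}

// Processor side: release the lock only once the search is terminal.
function releaseIfTerminal(
  store: InMemoryLockStore,
  legacyUserId: string,
  searchId: string,
  status: string,
): boolean {
  if (!TERMINAL_STATUSES.includes(status)) return false;
  return store.deleteIfValue(activeLockKey(legacyUserId), searchId);
}
```

Storing the `searchId` as the lock value is a design choice of this sketch: it lets the release step verify that it is removing the lock of the search that actually finished.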

## Race And PKL Resolution

The canonical race key is still the legacy `id_gara`, but the worker no longer guesses the PKL path from `raceId` alone.

The legacy handoff must provide a `raceStorage` object with:

- `year`
- `monthFolder` like `04.APRILE`
- `raceFolder` like `PISA`

The processor resolves the PKL path using this mounted directory layout:

```text
/data/pkl/
  2026/
    04.APRILE/
      PISA/
        face_encodings_20260330_170210.pkl
      LUCCA/
        face_encodings_20260330_170155.pkl
```

The lookup rule is:

1. resolve `/data/pkl/{year}/{monthFolder}/{raceFolder}`
2. list files at that race root
3. take the first `.pkl` file found there, regardless of filename
4. fail the job if the directory does not exist or contains no `.pkl` file

For local development, `test_pkl/` is mounted directly into `/data/pkl` in both the public FaceAI container and the processor container, so the same rule is used in every environment.
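
The lookup rule can be sketched as below. This is an illustrative version, not the processor's actual code: `resolvePklPath`, `ListFiles`, and `listFilesOnDisk` are hypothetical names. Directory listing order is not guaranteed by the filesystem, so this sketch sorts filenames to make "first" deterministic; that sort is an assumption, not something the rule above states.

```typescript
import { existsSync, readdirSync, statSync } from "node:fs";
import { posix } from "node:path";

interface RaceStorage {
  year: string;
  monthFolder: string;
  raceFolder: string;
}

// Returns the names of regular files in `dir`, or null if `dir` is not a directory.
type ListFiles = (dir: string) => string[] | null;

// Default lister backed by the real filesystem.
const listFilesOnDisk: ListFiles = (dir) => {
  if (!existsSync(dir) || !statSync(dir).isDirectory()) return null;
  return readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => entry.name);
};

function resolvePklPath(
  raceStorage: RaceStorage,
  pklRoot: string = "/data/pkl",
  listFiles: ListFiles = listFilesOnDisk,
): string {
  // Step 1: resolve /data/pkl/{year}/{monthFolder}/{raceFolder}
  const raceDir = posix.join(
    pklRoot,
    raceStorage.year,
    raceStorage.monthFolder,
    raceStorage.raceFolder,
  );
  // Step 2: list files at that race root
  const files = listFiles(raceDir);
  if (files === null) throw new Error(`PKL directory not found: ${raceDir}`);
  // Step 3: take the first .pkl file (sorted here for determinism, an assumption)
  const first = files.filter((name) => name.endsWith(".pkl")).sort()[0];
  // Step 4: fail if no .pkl file exists
  if (first === undefined) throw new Error(`No .pkl file in ${raceDir}`);
  return posix.join(raceDir, first);
}
```

The injectable `listFiles` parameter exists only so the rule can be exercised without a mounted volume; calling `resolvePklPath({ year: "2026", monthFolder: "04.APRILE", raceFolder: "PISA" })` uses the real filesystem.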

## Shared Runtime Storage

Both the public backend and the processor mount the same writable runtime directory:

```text
/data/runtime/
  uploads/
  searches/
```

- uploaded selfies are written under `uploads/{searchId}/`
- worker output and logs are written under `searches/{searchId}/`
- cleanup can safely remove old per-search directories after retention expires
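
As a rough sketch of how the per-search layout makes cleanup straightforward, the helper below selects the directories that a cleanup job could remove. Cleanup is not implemented in this slice, so everything here is hypothetical: the function names, the `SearchAge` shape, and the assumption that `createdAt` is an epoch timestamp in milliseconds.

```typescript
import { posix } from "node:path";

const RUNTIME_ROOT = "/data/runtime";

// Per-search directories, following the layout shown above.
const uploadDir = (searchId: string): string =>
  posix.join(RUNTIME_ROOT, "uploads", searchId);
const searchDir = (searchId: string): string =>
  posix.join(RUNTIME_ROOT, "searches", searchId);

// Hypothetical minimal view of a search for cleanup purposes.
interface SearchAge {
  searchId: string;
  createdAt: number; // assumed epoch milliseconds
}

// Returns both directories of every search older than the retention period.
function expiredRuntimeDirs(
  searches: SearchAge[],
  retentionMs: number,
  now: number,
): string[] {
  const dirs: string[] = [];
  for (const search of searches) {
    if (now - search.createdAt > retentionMs) {
      dirs.push(uploadDir(search.searchId), searchDir(search.searchId));
    }
  }
  return dirs;
}
```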

## Search Lifecycle

1. frontend uploads a selfie and calls `POST /api/searches`
2. backend validates the session, checks the rate limit, and acquires the active-user lock
3. backend verifies that the mounted race directory exists and already contains a `.pkl`; if not, it rejects the request before queueing
4. backend stores the upload and creates a Redis search record with status `queued`
5. backend enqueues a BullMQ job
6. processor picks up the job and sets status `processing`
7. processor runs `face_matcher`
8. processor parses CSV output into matches
9. processor stores a result record and marks the search `completed`
10. frontend polling reads Redis-backed state through `GET /api/searches/:id`
11. existing redirect flow sends the user back to the legacy filtered page
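
The status changes in this lifecycle can be modelled as a small state machine. The sketch below is illustrative: the status names come from this document, but the exact set of allowed transitions (in particular, that `failed` and `timed_out` are only reachable from `processing`) is an assumption, and `transition` and `isTerminal` are hypothetical helpers.

```typescript
type SearchStatus = "queued" | "processing" | "completed" | "failed" | "timed_out";

// Assumed transition table: the happy path plus the two failure outcomes.
const ALLOWED_TRANSITIONS: Record<SearchStatus, readonly SearchStatus[]> = {
  queued: ["processing"],
  processing: ["completed", "failed", "timed_out"],
  completed: [],
  failed: [],
  timed_out: [],
};

// A status is terminal when no further transition is allowed.
function isTerminal(status: SearchStatus): boolean {
  return ALLOWED_TRANSITIONS[status].length === 0;
}

// Returns the next status, or throws if the move is not allowed.
function transition(current: SearchStatus, next: SearchStatus): SearchStatus {
  if (!ALLOWED_TRANSITIONS[current].includes(next)) {
    throw new Error(`Illegal transition: ${current} -> ${next}`);
  }
  return next;
}
```

Rejecting illegal moves in one place keeps a late or duplicated worker update from reopening a search that is already terminal.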

## Search Record Shape

```json
{
  "id": "search_...",
  "status": "queued",
  "raceId": "101",
  "raceStorage": {
    "year": "2026",
    "monthFolder": "04.APRILE",
    "raceFolder": "PISA"
  },
  "userId": "legacy-user-1",
  "returnUrl": "https://...",
  "lang": "it",
  "selfieName": "selfie.jpg",
  "selfiePath": "/data/runtime/uploads/search_.../selfie.jpg",
  "resultId": null,
  "matchCount": 0,
  "errorCode": null,
  "errorMessage": null,
  "createdAt": 0,
  "startedAt": null,
  "completedAt": null
}
```
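
For readers working in the TypeScript codebase, the record above might be typed roughly as follows. The field names and the initial values come from the JSON example; the nullable types, the interpretation of timestamps as epoch milliseconds, and the `createQueuedSearchRecord` helper are assumptions made for this sketch.

```typescript
type SearchStatus = "queued" | "processing" | "completed" | "failed" | "timed_out";

interface RaceStorage {
  year: string;
  monthFolder: string;
  raceFolder: string;
}

interface SearchRecord {
  id: string;
  status: SearchStatus;
  raceId: string;
  raceStorage: RaceStorage;
  userId: string;
  returnUrl: string;
  lang: string;
  selfieName: string;
  selfiePath: string;
  resultId: string | null;
  matchCount: number;
  errorCode: string | null;
  errorMessage: string | null;
  createdAt: number; // assumed epoch milliseconds
  startedAt: number | null;
  completedAt: number | null;
}

// Fields the backend knows at request time (hypothetical input shape).
type NewSearchInput = Pick<
  SearchRecord,
  "id" | "raceId" | "raceStorage" | "userId" | "returnUrl" | "lang" | "selfieName"
>;

// Redis key for a search record, following the naming scheme in this document.
const searchKey = (searchId: string): string => `faceai:search:${searchId}`;

// Builds the initial record stored when a search is accepted.
function createQueuedSearchRecord(
  input: NewSearchInput,
  now: number = Date.now(),
): SearchRecord {
  return {
    ...input,
    status: "queued",
    selfiePath: `/data/runtime/uploads/${input.id}/${input.selfieName}`,
    resultId: null,
    matchCount: 0,
    errorCode: null,
    errorMessage: null,
    createdAt: now,
    startedAt: null,
    completedAt: null,
  };
}
```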

## Result Shape

```json
{
  "id": "result_...",
  "raceId": "101",
  "raceName": "Mezza di Firenze",
  "userId": "legacy-user-1",
  "returnUrl": "https://...",
  "lang": "it",
  "matches": [
    {
      "photoId": "legacy-photo-id",
      "score": 0.98,
      "label": "legacy-photo-id"
    }
  ],
  "createdAt": 0
}
```
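
The link between a stored result and its search record (lifecycle step 9) could look like the sketch below. The interfaces mirror the JSON example; `completionFieldsFor` and `SearchCompletionFields` are hypothetical names, and deriving `matchCount` from `matches.length` is an assumption about how the two records stay consistent.

```typescript
interface Match {
  photoId: string;
  score: number;
  label: string;
}

interface SearchResult {
  id: string;
  raceId: string;
  raceName: string;
  userId: string;
  returnUrl: string;
  lang: string;
  matches: Match[];
  createdAt: number; // assumed epoch milliseconds
}

// Subset of search record fields updated when a search completes.
interface SearchCompletionFields {
  status: "completed";
  resultId: string;
  matchCount: number;
  completedAt: number;
}

// Redis key for a result record, following the naming scheme in this document.
const resultKey = (resultId: string): string => `faceai:result:${resultId}`;

// Derives the search record update from the result that was just stored.
function completionFieldsFor(
  result: SearchResult,
  now: number = Date.now(),
): SearchCompletionFields {
  return {
    status: "completed",
    resultId: result.id,
    matchCount: result.matches.length,
    completedAt: now,
  };
}
```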

## Compose Topology

- `faceai`: public backend plus built frontend
- `processor`: queue consumer and matcher executor
- `redis`: queue and short-lived state
- `legacy-php`: local bridge simulator for end-to-end testing

## Operational Defaults

- worker concurrency: `2`
- active search retention: `24h`
- result retention: `24h`
- rate limit: `5 requests / 10 minutes / user`
- worker timeout: `5 minutes`
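
The defaults above can be captured as constants, and the rate limit can be illustrated with a fixed-window counter. This document does not say which rate-limiting algorithm is used, so the fixed-window approach is an assumption, as are the names `OPERATIONAL_DEFAULTS` and `FixedWindowRateLimiter`. The in-memory map stands in for Redis, where a comparable counter is commonly built from `INCR` plus `EXPIRE` on the rate-limit key.

```typescript
const OPERATIONAL_DEFAULTS = {
  workerConcurrency: 2,
  activeSearchRetentionMs: 24 * 60 * 60 * 1000,
  resultRetentionMs: 24 * 60 * 60 * 1000,
  rateLimitMaxRequests: 5,
  rateLimitWindowMs: 10 * 60 * 1000,
  workerTimeoutMs: 5 * 60 * 1000,
} as const;

// Hypothetical in-memory fixed-window limiter keyed per user.
class FixedWindowRateLimiter {
  private windows = new Map<string, { count: number; resetAt: number }>();
  private maxRequests: number;
  private windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true if the request fits in the user's current window.
  allow(legacyUserId: string, now: number = Date.now()): boolean {
    const key = `faceai:rate-limit:${legacyUserId}`;
    const current = this.windows.get(key);
    // Start a fresh window if none exists or the previous one has ended.
    if (current === undefined || current.resetAt <= now) {
      this.windows.set(key, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    if (current.count >= this.maxRequests) return false;
    current.count += 1;
    return true;
  }
}
```

A limiter built with `new FixedWindowRateLimiter(OPERATIONAL_DEFAULTS.rateLimitMaxRequests, OPERATIONAL_DEFAULTS.rateLimitWindowMs)` accepts five requests per user and rejects further ones until the ten-minute window ends.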

## Known Follow-Up Work

- confirm the real CSV columns emitted by `face_matcher`
- verify the Linux binary shared library requirements inside the processor image
- add cleanup jobs for expired runtime files