FaceAI Processor Technical Design

Goal

Add an internal processor service that executes face_matcher jobs for the public FaceAI site, while allowing at most one active search per user and keeping all state short-lived and restart-safe.

Scope Of This Slice

  • add Redis-backed queue and job state
  • add a dedicated processor workspace and container scaffold
  • replace in-memory search orchestration in the public backend
  • preserve the existing frontend polling and legacy return flow
  • support local PKL testing from test_pkl/ mounted with the same directory shape used in hosted deployment

This slice does not yet implement production NAS mounting, persistent databases, or a final parser tailored to the real matcher CSV format.

Runtime Architecture

Public backend

  • owns the authenticated API used by the Vue frontend
  • stores uploaded selfies in a shared runtime volume
  • enqueues jobs into BullMQ
  • keeps per-search state, results, rate limits, and active-user locks in Redis
  • never executes face_matcher directly

Processor

  • consumes queue jobs from Redis using BullMQ worker concurrency
  • resolves the race-scoped PKL path for each job
  • executes the Linux face_matcher binary
  • parses the CSV result into legacy-compatible photoId matches
  • writes final state and result payload back to Redis

Redis

  • queue broker for BullMQ
  • source of truth for active-user locks
  • source of truth for search status and short-lived results
  • source of truth for rate-limit counters

Queue And Locking Model

  • queue name: faceai-searches
  • active lock key: faceai:active-search:user:{legacyUserId}
  • search record key: faceai:search:{searchId}
  • result record key: faceai:result:{resultId}
  • rate limit key prefix: faceai:rate-limit:{legacyUserId}

POST /api/searches must acquire the active-user lock before enqueueing. If the lock already exists, the backend returns HTTP 409 with error code ACTIVE_SEARCH_EXISTS and does not enqueue a job.

The lock is released only when the processor marks the search as terminal: completed, failed, or timed_out.
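
The acquire-then-enqueue rule can be sketched as follows. This is a minimal illustration, not the actual backend code: LockStore, InMemoryLockStore, tryStartSearch, and releaseActiveLock are hypothetical names, the in-memory store stands in for Redis so the sketch runs on its own, and the 24h lock TTL is an assumption borrowed from the active search retention default. Against a real Redis, setIfAbsent would correspond to a SET with the NX and EX options.

```typescript
// Minimal interface covering the two operations the lock needs.
// With Redis these would map to `SET key value NX EX ttl` and `DEL key`.
interface LockStore {
  setIfAbsent(key: string, value: string, ttlSeconds: number): boolean;
  remove(key: string): void;
}

// In-memory stand-in for Redis (hypothetical, for illustration only).
class InMemoryLockStore implements LockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  setIfAbsent(key: string, value: string, ttlSeconds: number): boolean {
    const now = Date.now();
    const current = this.entries.get(key);
    if (current !== undefined && current.expiresAt > now) {
      return false; // lock is held and has not expired
    }
    this.entries.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
    return true;
  }

  remove(key: string): void {
    this.entries.delete(key);
  }
}

// Assumption: the lock TTL mirrors the 24h active search retention.
const ACTIVE_SEARCH_TTL_SECONDS = 24 * 60 * 60;

function activeLockKey(legacyUserId: string): string {
  return `faceai:active-search:user:${legacyUserId}`;
}

type CreateSearchOutcome =
  | { ok: true; searchId: string }
  | { ok: false; status: 409; errorCode: "ACTIVE_SEARCH_EXISTS" };

// Acquire the lock first; enqueue only when acquisition succeeds.
function tryStartSearch(
  store: LockStore,
  legacyUserId: string,
  searchId: string,
  enqueue: (searchId: string) => void,
): CreateSearchOutcome {
  const acquired = store.setIfAbsent(
    activeLockKey(legacyUserId),
    searchId,
    ACTIVE_SEARCH_TTL_SECONDS,
  );
  if (!acquired) {
    return { ok: false, status: 409, errorCode: "ACTIVE_SEARCH_EXISTS" };
  }
  enqueue(searchId);
  return { ok: true, searchId };
}

// Called once the search reaches completed, failed, or timed_out.
function releaseActiveLock(store: LockStore, legacyUserId: string): void {
  store.remove(activeLockKey(legacyUserId));
}
```

Storing the searchId as the lock value is a design choice of this sketch: it lets an operator see which search holds the lock when inspecting the key.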

Race And PKL Resolution

The canonical race key is still the legacy id_gara, but the worker no longer guesses the PKL path from raceId alone.

The legacy handoff must provide a raceStorage object with:

  • year
  • monthFolder like 04.APRILE
  • raceFolder like PISA

The processor resolves the PKL path using this mounted directory layout:

/data/pkl/
  2026/
    04.APRILE/
      PISA/
        face_encodings_20260330_170210.pkl
      LUCCA/
        face_encodings_20260330_170155.pkl

The lookup rule is:

  1. resolve /data/pkl/{year}/{monthFolder}/{raceFolder}
  2. list files at that race root
  3. take the first .pkl file found there, regardless of filename
  4. fail the job if the directory does not exist or contains no .pkl file

For local development, test_pkl/ is mounted directly into /data/pkl in both the public FaceAI container and the processor container, so the same rule is used in every environment.
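
The lookup rule above can be sketched as a small pure function. The names resolvePklPath and ListDir are hypothetical, and the directory lister is injected so the sketch stays self-contained; in Node it could wrap fs.readdirSync and return null when the directory is missing. Sorting the candidates is an assumption added here to make "first" deterministic, since directory listing order is not guaranteed.

```typescript
interface RaceStorage {
  year: string;
  monthFolder: string; // e.g. "04.APRILE"
  raceFolder: string; // e.g. "PISA"
}

// Hypothetical lister: returns entry names, or null if the directory does not exist.
type ListDir = (dir: string) => string[] | null;

const PKL_ROOT = "/data/pkl";

function resolvePklPath(storage: RaceStorage, listDir: ListDir): string {
  // Step 1: resolve the race root from the raceStorage fields.
  const raceDir = `${PKL_ROOT}/${storage.year}/${storage.monthFolder}/${storage.raceFolder}`;

  // Step 2: list files at that race root.
  const entries = listDir(raceDir);
  if (entries === null) {
    throw new Error(`PKL directory not found: ${raceDir}`);
  }

  // Step 3: keep .pkl files regardless of filename.
  // Assumption: sort so the "first" file is the same on every run.
  const pklFiles = entries.filter((name) => name.endsWith(".pkl")).sort();

  // Step 4: fail the job when no .pkl file is present.
  if (pklFiles.length === 0) {
    throw new Error(`No .pkl file found in ${raceDir}`);
  }
  return `${raceDir}/${pklFiles[0]}`;
}
```

Because the lister is a parameter, the same function could back both the backend's pre-queue check and the processor's resolution step.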

Shared Runtime Storage

Both the public backend and the processor mount the same writable runtime directory:

/data/runtime/
  uploads/
  searches/

  • uploaded selfies are written under uploads/{searchId}/
  • worker output and logs are written under searches/{searchId}/
  • cleanup can safely remove old per-search directories after retention expires

Search Lifecycle

  1. frontend uploads a selfie and calls POST /api/searches
  2. backend validates session, rate limit, and active-user lock
  3. backend verifies that the mounted race directory exists and already contains a .pkl; if not, it rejects the request before queueing
  4. backend stores the upload and creates a Redis search record with status queued
  5. backend enqueues a BullMQ job
  6. processor picks up the job and sets status processing
  7. processor runs face_matcher
  8. processor parses CSV output into matches
  9. processor stores a result record and marks the search completed
  10. frontend polling reads Redis-backed state through GET /api/searches/:id
  11. existing redirect flow sends the user back to the legacy filtered page
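
The status changes in this lifecycle can be modelled as a small state machine. The sketch below is illustrative only: the transition table is an assumption inferred from the steps above (queued, then processing, then one of the terminal states), and advance is a hypothetical helper name. It also reports when the active-user lock should be released, which by the locking model happens only on terminal states.

```typescript
type SearchStatus = "queued" | "processing" | "completed" | "failed" | "timed_out";

// Terminal states named in the locking model.
const TERMINAL_STATUSES: ReadonlySet<SearchStatus> = new Set<SearchStatus>([
  "completed",
  "failed",
  "timed_out",
]);

// Assumed transition table; the design does not spell it out explicitly.
const ALLOWED_TRANSITIONS: Record<SearchStatus, SearchStatus[]> = {
  queued: ["processing"],
  processing: ["completed", "failed", "timed_out"],
  completed: [],
  failed: [],
  timed_out: [],
};

interface TransitionResult {
  status: SearchStatus;
  releaseLock: boolean; // true only when the new status is terminal
}

function advance(current: SearchStatus, next: SearchStatus): TransitionResult {
  if (ALLOWED_TRANSITIONS[current].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${current} -> ${next}`);
  }
  return { status: next, releaseLock: TERMINAL_STATUSES.has(next) };
}
```

Rejecting illegal transitions keeps a retried or duplicated job from moving a finished search back to processing.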

Search Record Shape

{
  "id": "search_...",
  "status": "queued",
  "raceId": "101",
  "raceStorage": {
    "year": "2026",
    "monthFolder": "04.APRILE",
    "raceFolder": "PISA"
  },
  "userId": "legacy-user-1",
  "returnUrl": "https://...",
  "lang": "it",
  "selfieName": "selfie.jpg",
  "selfiePath": "/data/runtime/uploads/search_.../selfie.jpg",
  "resultId": null,
  "matchCount": 0,
  "errorCode": null,
  "errorMessage": null,
  "createdAt": 0,
  "startedAt": null,
  "completedAt": null
}
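
For reference, the record above could be typed as follows. This is a sketch rather than the project's actual type definitions: createQueuedSearch and NewSearchInput are hypothetical names, and treating the timestamps as epoch milliseconds is an assumption, since the sample only shows 0 and null.

```typescript
type SearchStatus = "queued" | "processing" | "completed" | "failed" | "timed_out";

interface RaceStorage {
  year: string;
  monthFolder: string;
  raceFolder: string;
}

interface SearchRecord {
  id: string;
  status: SearchStatus;
  raceId: string;
  raceStorage: RaceStorage;
  userId: string;
  returnUrl: string;
  lang: string;
  selfieName: string;
  selfiePath: string;
  resultId: string | null;
  matchCount: number;
  errorCode: string | null;
  errorMessage: string | null;
  createdAt: number; // assumed epoch milliseconds
  startedAt: number | null;
  completedAt: number | null;
}

// Fields the backend knows at request time (hypothetical grouping).
interface NewSearchInput {
  id: string;
  raceId: string;
  raceStorage: RaceStorage;
  userId: string;
  returnUrl: string;
  lang: string;
  selfieName: string;
}

function searchRecordKey(searchId: string): string {
  return `faceai:search:${searchId}`;
}

// Builds the initial record stored in Redis before the job is enqueued.
function createQueuedSearch(input: NewSearchInput, nowMs: number): SearchRecord {
  return {
    ...input,
    status: "queued",
    selfiePath: `/data/runtime/uploads/${input.id}/${input.selfieName}`,
    resultId: null,
    matchCount: 0,
    errorCode: null,
    errorMessage: null,
    createdAt: nowMs,
    startedAt: null,
    completedAt: null,
  };
}
```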

Result Shape

{
  "id": "result_...",
  "raceId": "101",
  "raceName": "Mezza di Firenze",
  "userId": "legacy-user-1",
  "returnUrl": "https://...",
  "lang": "it",
  "matches": [
    {
      "photoId": "legacy-photo-id",
      "score": 0.98,
      "label": "legacy-photo-id"
    }
  ],
  "createdAt": 0
}
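
A possible typing of the result payload, together with a helper that assembles it from already-parsed matcher rows, is sketched below. ScoredPhoto, ResultMeta, and buildResult are hypothetical names; the real CSV columns are still unconfirmed, so the sketch starts after parsing. Defaulting label to photoId simply mirrors the sample above and is an assumption.

```typescript
interface Match {
  photoId: string;
  score: number;
  label: string;
}

interface SearchResult {
  id: string;
  raceId: string;
  raceName: string;
  userId: string;
  returnUrl: string;
  lang: string;
  matches: Match[];
  createdAt: number; // assumed epoch milliseconds
}

// Hypothetical output of the CSV parser, whose real columns are not yet confirmed.
interface ScoredPhoto {
  photoId: string;
  score: number;
}

// Everything in the result except matches and createdAt.
type ResultMeta = Omit<SearchResult, "matches" | "createdAt">;

function resultRecordKey(resultId: string): string {
  return `faceai:result:${resultId}`;
}

// Returns the result payload plus the matchCount to copy into the search record.
function buildResult(
  meta: ResultMeta,
  rows: ScoredPhoto[],
  nowMs: number,
): { result: SearchResult; matchCount: number } {
  const matches: Match[] = rows.map((row) => ({
    photoId: row.photoId,
    score: row.score,
    label: row.photoId, // assumption: label defaults to the photo id, as in the sample
  }));
  return {
    result: { ...meta, matches, createdAt: nowMs },
    matchCount: matches.length,
  };
}
```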

Compose Topology

  • faceai: public backend plus built frontend
  • processor: queue consumer and matcher executor
  • redis: queue and short-lived state
  • legacy-php: local bridge simulator for end-to-end testing

Operational Defaults

  • worker concurrency: 2
  • active search retention: 24h
  • result retention: 24h
  • rate limit: 5 requests per user per 10-minute window
  • worker timeout: 5 minutes
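
These defaults could be captured in one config object, and the rate limit illustrated with a fixed-window counter. Both are sketches under assumptions: the field names and FixedWindowRateLimiter are hypothetical, the design does not state whether the window is fixed or sliding, and the in-memory map stands in for a Redis counter with an expiry.

```typescript
// Operational defaults from this section, with explicit units.
const DEFAULTS = {
  workerConcurrency: 2,
  activeSearchRetentionSeconds: 24 * 60 * 60,
  resultRetentionSeconds: 24 * 60 * 60,
  rateLimitMaxRequests: 5,
  rateLimitWindowMs: 10 * 60 * 1000,
  workerTimeoutMs: 5 * 60 * 1000,
} as const;

// Fixed-window limiter: the first request opens a window, later requests
// in the same window increment a counter until the maximum is reached.
class FixedWindowRateLimiter {
  private counters = new Map<string, { count: number; resetAt: number }>();
  private maxRequests: number;
  private windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  allow(legacyUserId: string, nowMs: number): boolean {
    const key = `faceai:rate-limit:${legacyUserId}`;
    const entry = this.counters.get(key);
    if (entry === undefined || nowMs >= entry.resetAt) {
      // No window yet, or the previous one expired: start a new window.
      this.counters.set(key, { count: 1, resetAt: nowMs + this.windowMs });
      return true;
    }
    if (entry.count >= this.maxRequests) {
      return false; // limit reached inside the current window
    }
    entry.count += 1;
    return true;
  }
}
```

Passing the current time as an argument keeps the limiter deterministic, which makes the window boundary easy to exercise in tests.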

Known Follow-Up Work

  • confirm the real CSV columns emitted by face_matcher
  • verify the Linux binary shared library requirements inside the processor image
  • add cleanup jobs for expired runtime files