AHOY Perception
The end-to-end platform for production computer vision — one integrated system from physical sensors to deployed inference: annotate, version, train, evaluate, deploy and monitor, with the data and the model fully traceable at every step.
API-first · SDK · CLI · Web
Edge · On-prem · Cloud · Air-gap
Full data & model lineage
Vertically-integrated architecture
Why it's different
- One platform, not a toolchain. Label, version, train, serve — no seams where reproducibility leaks out.
- Lineage by construction. Every model traces to the exact data and code that made it; data leakage is structurally impossible.
- Git for datasets. Immutable, fingerprinted, copy-on-write versions you can diff and pull back byte-for-byte.
- Runs on what you have. A resource governor admits work against live GPU/CPU capacity — SaaS or on-prem.
- Deploy by YAML. One config file takes an ONNX model to live multi-stream inference, edge to cloud.
Stack at a glance (diagram panels): the thin glue that is the product; best-in-class permissive components; the same stack, placed where data lives.
AHOY Perception · Solution brief · v1.0 · Confidential — see the companion Technical Architecture & Platform Diagram for detail.
AHOY Perception
End-to-end platform for production computer vision — sensors to inference
Capture
Ingest from cameras, LiDAR, RADAR & recorded files
Annotate
AI-assisted labeling, review & quality control
Version
Immutable, fingerprinted datasets — git for data
Train
Catalog model, fine-tune, or bring your own
Optimize
Evaluate, pick the best, export to ONNX / TensorRT
Deploy & Infer
Serve on edge, on-prem or cloud — heterogeneous compute
Perception Workflow
One API-first platform for the full lifecycle — from raw sensor data to a deployed model — driven from a web studio, SDK, CLI or REST.
Lineage & Trust
Every model traces back to the exact data and code that made it. Reproducibility, access control and compliance across all datasets and models.
Vision Models
State-of-the-art CV architectures plus bring-your-own-model. Detection, classification, pose and segmentation, registered with full lineage.
Data & Compute Core
A content-addressed dataset store with git-like versioning, plus resource-aware training and serving on whatever hardware is present.
Heterogeneous Compute
Deploy on the hardware and clusters you already run. The same stack ships to a Jetson at the edge, a server room, or a cloud region.
From the sensor to the served model
An end-to-end platform and SDK for production computer vision — covering the full journey from physical sensors to deployed inference: annotate, version, train, evaluate, deploy and monitor, with the data and the model fully traceable at every step.
01 Executive summary
Computer-vision teams today stitch together a dozen disconnected tools — one to label, another to store data, a notebook to train, a registry somewhere, and a separate runtime to deploy. The seams are where reproducibility, accuracy and time-to-production are lost.
AHOY Perception collapses that toolchain into a single, vertically-integrated platform. A team ingests data from cameras, LiDAR or RADAR; labels it with AI assistance; freezes it into an immutable, versioned dataset; trains a built-in or custom model on whatever compute is available; evaluates and selects the best candidate; exports it to an optimized runtime format; and deploys it for live inference at the edge, on-prem or in the cloud — all through one API, one SDK and one web studio.
North star. Any model's lineage must trace back to the exact data and the exact code that created it. Every dataset is an immutable, content-addressed snapshot; every training run links a model version to the dataset version and configuration that produced it. Reproducibility is structural, not aspirational.
The guiding build philosophy is to adopt the hard, generic parts and build only the thin glue that is the product: best-in-class permissively-licensed components (annotation, tracking, serving, identity) stitched by an API-first backend, with three things built in-house because off-the-shelf tools do them badly — a git-like data-versioning engine, a resource-aware job governor, and a YAML-driven inference runtime.
02 Architecture at a glance
The platform is organized as five layers. The workflow a user touches sits on top; underneath it, governance makes every action traceable, the model layer supplies the intelligence, the data & compute engine does the heavy lifting, and the infrastructure layer places it all on the hardware you already run.
See the companion Platform Architecture Diagram for the full layered view with per-layer capabilities.
Perception Workflow
The end-to-end lifecycle, exposed identically through Web Studio, Python SDK, CLI and REST.
Lineage & Trust
RBAC, audit, reproducible splits, immutability, tenant isolation, compliance.
Vision Models
Detection, classification, pose, segmentation, plus bring-your-own-model.
Perception Data Engine
Content-addressed versioning, resource governor, sandboxed training, serving.
03 The perception lifecycle
Every capability maps to one stage of a single closed loop. The same steps run whether driven by a point-and-click studio user or a CI pipeline calling the SDK — because both hit the same API.
① Capture — from physical sensors to managed data
Data enters from live cameras (RTSP/RTMP/HLS), recorded video and image files, or other modalities such as LiDAR and RADAR captures. Bytes are streamed into object storage, hashed (SHA-256) and stored exactly once — duplicates across a workspace collapse to a single blob.
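The hash-then-store-once behaviour described above can be sketched as follows. This is a minimal illustration only: `BlobStore`, its `ingest` method and the in-memory dictionary are hypothetical stand-ins for the platform's object storage, not its actual API.

```python
import hashlib
import io


class BlobStore:
    """Toy in-memory stand-in for object storage (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}  # (workspace, sha256 hex digest) -> bytes

    def ingest(self, workspace, stream, chunk_size=1 << 20):
        """Hash the stream chunk by chunk; keep the bytes only if they are new."""
        hasher = hashlib.sha256()
        chunks = []
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
            chunks.append(chunk)
        digest = hasher.hexdigest()
        key = (workspace, digest)
        is_new = key not in self.blobs
        if is_new:
            self.blobs[key] = b"".join(chunks)
        return digest, is_new


store = BlobStore()
first = store.ingest("ws-a", io.BytesIO(b"frame-0001"))
second = store.ingest("ws-a", io.BytesIO(b"frame-0001"))  # same bytes, same workspace
other = store.ingest("ws-b", io.BytesIO(b"frame-0001"))   # same bytes, other workspace
```

In this sketch the second upload returns the same digest and stores nothing, while the upload to a different workspace is kept separately, matching the per-workspace deduplication described in the text.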
② Annotate — label with AI assistance
A rebranded annotation studio handles boxes, masks, keypoints and classes, with optional AI pre-labeling (e.g. SAM / GroundingDINO) to bootstrap labels for a human to correct. Already-labeled data imports directly via COCO or YOLO.
③ Version — git for datasets
The working set is frozen into a numbered, immutable version with a content fingerprint and a deterministic, leakage-safe train/val/test split. Versioning is metadata-only and copy-on-write — a new version that adds 300 images to an existing 1,200 stores only the 300 new blobs.
④ Train — catalog, fine-tune, or bring your own
Pick a built-in architecture, a fine-tuning recipe, or supply your own model code. The job is admitted by the resource governor when a slot is free; untrusted custom code runs sandboxed. Metrics stream live to the studio and the SDK.
⑤ Optimize — evaluate, select, export
Compare runs on per-class metrics, PR curves and confusion matrices; promote the best candidate; export to ONNX (portable) or a TensorRT engine (fastest on NVIDIA), packaged for the inference runtime.
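As a rough illustration of the per-class comparison mentioned above, the sketch below derives precision and recall from a confusion matrix. The function name and the counts are made up for the example; real detection metrics also involve matching predictions to ground truth (for instance by IoU), which is not shown here.

```python
def per_class_precision_recall(confusion, classes):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    result = {}
    for k, name in enumerate(classes):
        true_positive = confusion[k][k]
        predicted_as_k = sum(row[k] for row in confusion)  # column sum
        actually_k = sum(confusion[k])                     # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        result[name] = (precision, recall)
    return result


# Illustrative counts only, not results from any real run.
classes = ["car", "person"]
confusion = [
    [8, 2],  # true "car": 8 predicted car, 2 predicted person
    [1, 9],  # true "person": 1 predicted car, 9 predicted person
]
metrics = per_class_precision_recall(confusion, classes)
```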
⑥ Deploy & Infer — anywhere
The exported model drops into the YAML-driven inference runtime and serves live streams on edge devices, on-prem servers or cloud GPUs. Detections flow out as overlaid video, files, or structured Kafka messages.
04 Layer-by-layer
Application Layer — Perception Workflow
The platform is API-first: one versioned public API is the product, and the Web Studio, Python SDK and CLI are co-equal clients of it. Everything the UI can do is reachable from code.
- Web Studio — point-and-click for labelers, reviewers and managers.
- Python SDK — script the whole loop: create → upload → version → train → stream metrics → export → deploy.
- CLI — a thin wrapper over the SDK for CI/CD and ops.
- REST API — versioned (/v1), OpenAPI-specified, with async-job semantics and webhooks.
Governance Layer — Lineage & Trust
- Data & model lineage — a dual foreign key ties every model to its dataset version and run.
- Reproducible splits & immutability — committed versions never change; splits are deterministic.
- RBAC, SSO/MFA, audit logging — adopted via Keycloak; who-changed-what is recorded.
- Tenant isolation & quotas — strict per-workspace separation, storage quotas, GDPR purge.
Model Layer — Vision Models
- Built-in architectures — detection (YOLO, RT-DETR), classification, pose, instance segmentation.
- NVIDIA TAO — the native train → ONNX → TensorRT → DeepStream path for CV.
- Bring-your-own-model — three tiers: catalog, script-mode, and full custom container.
- Plugin contracts — register a model like a DeepStream plugin; the UI builds its config form automatically.
Perception Data Engine — Data & Compute Core
- Content-addressed store — bytes stored once by hash, deduplicated per workspace.
- Git-for-datasets — immutable versions, diffs, tags, lineage, COCO/YOLO import-export.
- Resource governor & job queue — admits work against live hardware capacity.
- Serving runtimes — Triton for hosted endpoints, DeepStream for streaming inference.
Infrastructure Layer — Heterogeneous Compute
- Targets — Edge (Jetson/IGX), on-prem servers, cloud (AWS/Azure/GCP), and air-gapped sites.
- Compute — GPU / MIG slices, multi-GPU, or CPU-only — the platform detects and adapts.
- Storage — cloud S3 or self-hosted SeaweedFS; PostgreSQL for all metadata.
05 Data versioning engine (built in-house)
This engine is the foundation the rest of the platform builds on, and the part that off-the-shelf tools do not handle well for this use case. It is an independent, hexagonal module (ports & adapters): a pure core that runs standalone on a laptop (SQLite + local disk), embedded in the platform, or as a microservice, with the same code in every mode.
Content-addressed, copy-on-write storage
File bytes are stored once by their sha256 and deduplicated within a workspace. A dataset references content; a version freezes those references. Nothing copies bytes, so versions are cheap metadata.
| Table | Role |
|---|---|
| datasets | Named, workspace-owned containers; soft-delete; unique slug per workspace. |
| files | Content records — one per unique (workspace, sha256). |
| dataset_files | The mutable working area — (dataset, filename) → file. |
| dataset_versions | Immutable snapshots — number, fingerprint, split, parent_id lineage. |
| version_files | The frozen manifest — (filename, sha256, role, split) per version. |
| version_tags | Stable mutable names (production, golden) → one version. |
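To make "versions are cheap metadata" concrete, here is a small in-memory sketch of the working area and frozen manifests. The `Dataset` class and its methods are hypothetical and only loosely mirror the tables above; the real engine persists these rows in SQLite or Postgres.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


class Dataset:
    """Hypothetical in-memory model of the working area and frozen versions."""

    def __init__(self):
        self.blobs = {}     # sha256 -> bytes, each unique content stored once
        self.working = {}   # filename -> sha256, the mutable working area
        self.versions = []  # frozen manifests, one tuple of rows per version

    def add(self, filename, data):
        digest = sha256_hex(data)
        self.blobs.setdefault(digest, data)  # no-op if the bytes already exist
        self.working[filename] = digest

    def commit(self):
        # Freezing copies references (filename, sha256), never the bytes.
        manifest = tuple(sorted(self.working.items()))
        self.versions.append(manifest)
        return len(self.versions)  # 1-based version number


ds = Dataset()
for i in range(1200):
    ds.add(f"img_{i:04d}.jpg", f"pixels-{i}".encode())
v1 = ds.commit()
blobs_after_v1 = len(ds.blobs)

for i in range(1200, 1500):
    ds.add(f"img_{i:04d}.jpg", f"pixels-{i}".encode())
v2 = ds.commit()
new_blobs = len(ds.blobs) - blobs_after_v1
```

Under these assumptions the second version references 1,500 files but adds only the 300 new blobs, which is the copy-on-write behaviour the brief describes.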
Reproducible by construction
- Fingerprint — manifest_hash = sha256 over the sorted manifest rows, computed in Python (codepoint sort, canonical JSON) so it is identical on SQLite and Postgres.
- Leakage-safe split — each file's bucket is decided by hashing its content plus a seed, so duplicate bytes can never straddle train and test, and the split survives renames.
- Lineage — every commit records its parent_id; the chain walks back to origin.
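The fingerprint and split rules above can be sketched in a few lines. The row shape, the JSON settings, the `seed:hash` key format and the 80/10/10 proportions are all assumptions chosen for illustration; the source only states that rows are sorted, serialized canonically and hashed, and that the bucket comes from content plus a seed.

```python
import hashlib
import json


def manifest_hash(rows):
    """sha256 over the sorted manifest rows, serialized as canonical JSON."""
    # Python compares strings by code point, so this sort is database-independent.
    ordered = sorted(rows, key=lambda r: (r["filename"], r["sha256"]))
    canonical = json.dumps(
        ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assign_split(content_sha256, seed, val_pct=10, test_pct=10):
    """Pick a split from the content hash plus a seed; the filename plays no part."""
    digest = hashlib.sha256(f"{seed}:{content_sha256}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


content = hashlib.sha256(b"same pixels").hexdigest()
rows = [
    {"filename": "b.jpg", "sha256": content},
    {"filename": "a.jpg", "sha256": content},  # duplicate bytes under another name
]
fingerprint = manifest_hash(rows)
fingerprint_reordered = manifest_hash(list(reversed(rows)))
split_a = assign_split(rows[0]["sha256"], seed=42)
split_b = assign_split(rows[1]["sha256"], seed=42)
```

Because the split depends only on the content hash and the seed, the two duplicate files land in the same split, and renaming a file does not move it.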
Safe deletion over immutable, deduplicated data
Because versions are immutable and blobs are shared, deletion removes a reference, never naively the bytes. Reference counting plus a grace-period garbage collector reclaims a blob only when zero versions point at it. A lineage guard protects any version a model was trained on, and GDPR erasure is an explicit, audited tombstone.
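A minimal sketch of reference counting with a grace period follows. The `BlobGC` class is hypothetical, timestamps are passed in explicitly to keep the example deterministic, and the lineage guard and GDPR tombstones mentioned above are deliberately left out.

```python
class BlobGC:
    """Hypothetical reference-counting collector with a grace period."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.refcount = {}     # sha256 -> number of versions referencing the blob
        self.orphaned_at = {}  # sha256 -> time at which the count reached zero

    def add_ref(self, sha):
        self.refcount[sha] = self.refcount.get(sha, 0) + 1
        self.orphaned_at.pop(sha, None)  # a new reference rescues an orphan

    def drop_ref(self, sha, now):
        self.refcount[sha] -= 1
        if self.refcount[sha] == 0:
            self.orphaned_at[sha] = now

    def collect(self, now):
        """Reclaim blobs that have had zero references for the whole grace period."""
        reclaimed = sorted(
            sha
            for sha, since in self.orphaned_at.items()
            if now - since >= self.grace_seconds
        )
        for sha in reclaimed:
            del self.orphaned_at[sha]
            del self.refcount[sha]
        return reclaimed


gc = BlobGC(grace_seconds=3600)
gc.add_ref("aaa")
gc.add_ref("aaa")            # two versions share blob "aaa"
gc.add_ref("bbb")
gc.drop_ref("aaa", now=0)    # still referenced by one version
gc.drop_ref("bbb", now=0)    # no references left
early = gc.collect(now=10)   # inside the grace period
late = gc.collect(now=3600)  # grace period elapsed
```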
06 Models & training
For the CV core, NVIDIA TAO Toolkit is the backbone: a clean train → export(.onnx) → TensorRT engine path that drops straight into the inference runtime. Alongside it, MMDetection / RT-DETR and Hugging Face cover Apache/BSD-licensed alternatives.
Three execution tiers for custom models
| Tier | What the user brings | Runs in | Trade-off |
|---|---|---|---|
| 0 · Catalog | A built-in architecture + hyperparameters | Curated images | Lowest effort, safest |
| 1 · Script mode | A training script + a deps list | Curated framework base images | BYO model, recommended default |
| 2 · BYO container | A full Docker image with their code | Their image, sandboxed | Max flexibility, max security surface |
The plugin framework — "define the rules, users add models"
Just as DeepStream lets you drop in a custom plugin by implementing a known interface, the platform defines a small set of extension-point contracts (build / train / evaluate / export). A user implements them to make their own model a first-class, trainable, reproducible citizen — discovered via standard Python entry points, configured from a JSON-Schema manifest that the UI renders into a form automatically.
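One way to picture such a contract is a Python `Protocol` with the four extension points plus a JSON-Schema manifest. Everything below is a hypothetical sketch: the `ModelPlugin` name, the method signatures, the toy plugin and the `form_fields` helper are illustrative, not the platform's actual interface. In a real installation, discovery would typically go through `importlib.metadata.entry_points`, which is omitted here because it requires an installed package.

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class ModelPlugin(Protocol):
    """Hypothetical extension-point contract: build / train / evaluate / export."""

    manifest: Dict[str, Any]  # JSON Schema describing the plugin's config

    def build(self, config): ...
    def train(self, model, dataset_version): ...
    def evaluate(self, model, dataset_version): ...
    def export(self, model, path): ...


class ThresholdModel:
    """Toy plugin that satisfies the contract without doing real training."""

    manifest = {
        "type": "object",
        "properties": {"threshold": {"type": "number", "default": 0.5}},
        "required": ["threshold"],
    }

    def build(self, config):
        return {"threshold": float(config["threshold"])}

    def train(self, model, dataset_version):
        return {**model, "trained_on": dataset_version}

    def evaluate(self, model, dataset_version):
        return {"evaluated_on": dataset_version}

    def export(self, model, path):
        return path  # placeholder: a real plugin would write an artifact here


def form_fields(manifest):
    """Flatten schema properties into (name, type, default) rows for a UI form."""
    return [
        (name, spec.get("type"), spec.get("default"))
        for name, spec in manifest.get("properties", {}).items()
    ]


plugin = ThresholdModel()
model = plugin.build({"threshold": 0.7})
trained = plugin.train(model, "v3")
fields = form_fields(plugin.manifest)
```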
07 Inference & deployment runtime (built in-house)
The Vision SDK replaces dozens of low-level DeepStream config files, GStreamer elements and inference parameters with a single declarative YAML file. You provide an ONNX model and labels; the SDK generates every DeepStream config, links the pipeline, and routes multi-stream output automatically.
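The idea of expanding one declarative file into a full pipeline can be sketched as below. The config is shown as the dictionary a YAML parser would produce; the key names (`model`, `labels`, `sources`, `tracker`, `outputs`), the stage names and the `plan_pipeline` function are assumptions for illustration and do not reflect the Vision SDK's real schema or the DeepStream elements it generates.

```python
def plan_pipeline(config):
    """Expand a declarative config into an ordered list of pipeline stages."""
    missing = [key for key in ("model", "labels", "sources") if key not in config]
    if missing:
        raise ValueError(f"missing required keys: {missing}")

    sources = config["sources"]
    stages = [f"source[{i}]:{uri}" for i, uri in enumerate(sources)]
    stages.append(f"batch:size={len(sources)}")  # one batch slot per stream
    stages.append(f"infer:{config['model']}")
    if config.get("tracker", False):
        stages.append("track")
    for output in config.get("outputs", ["display"]):
        stages.append(f"sink:{output}")
    return stages


# Hypothetical config, as it might look after parsing a YAML file.
config = {
    "model": "detector.onnx",
    "labels": "labels.txt",
    "sources": ["rtsp://camera-1/stream", "file:///data/clip.mp4"],
    "tracker": True,
    "outputs": ["rtsp", "kafka"],
}
plan = plan_pipeline(config)
```

The point of the design is that the user states what to run and where the streams come from, and the runtime derives the wiring; adding a stream changes one list entry rather than several low-level config files.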
Model types
Detector, classifier, pose (17 COCO joints), instance segmentation, and Triton-served variants — chainable (e.g. detect → track → classify).
Architectures
YOLO v5–v13 / v26 & YOLOX, DetectNet_v2, YOLO-pose, YOLO-seg, and any softmax classifier (ResNet, EfficientNet, ViT…).
Inputs
Local files, RTSP, RTMP and HLS streams — single or multi-stream, batched automatically.
Outputs
X11 display overlays, MP4/MKV files, RTSP/RTMP re-streaming, FPS metering, and structured per-frame Kafka detection messages.
Hardware-accelerated encode/decode (GPU by default, CPU fallback) means the same YAML runs on a Jetson at the edge and a multi-GPU server — only the device changes.
08 Reproducibility & lineage
Lineage is the spine that connects the two in-house engines. The data engine fingerprints every dataset version; the training layer records, for every run, which dataset version and which model code produced the artifact.
| Question | How the architecture answers it |
|---|---|
| What exact data trained this model? | A dual FK links the model version to its dataset_version and training run, so the answer is a single join. |
| Can I rebuild last month's dataset? | version.pull() reconstructs the exact files, labels and split byte-for-byte. |
| Did the data change between runs? | diff(a, b) reports added / removed / relabeled, with old and new hashes. |
| Which code produced the model? | The run records its source type and reference (catalog id, script + commit, or image digest). |
| Did test data leak into training? | Content-hash splitting assigns byte-identical files to the same split, so exact duplicates cannot straddle train and test. |
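The diff(a, b) row in the table above can be illustrated with a small sketch. The manifest shape used here (filename mapped to a content hash and a label hash) and the short placeholder hashes are assumptions; the real manifest rows carry more fields.

```python
def diff(a, b):
    """Compare two version manifests: filename -> {"content": sha, "labels": sha}."""
    added = sorted(set(b) - set(a))
    removed = sorted(set(a) - set(b))
    relabeled = [
        {"filename": name, "old": a[name]["labels"], "new": b[name]["labels"]}
        for name in sorted(set(a) & set(b))
        if a[name]["content"] == b[name]["content"]
        and a[name]["labels"] != b[name]["labels"]
    ]
    return {"added": added, "removed": removed, "relabeled": relabeled}


# Placeholder hashes ("c1", "l1", ...) stand in for real sha256 digests.
v1 = {
    "cat.jpg": {"content": "c1", "labels": "l1"},
    "dog.jpg": {"content": "c2", "labels": "l2"},
}
v2 = {
    "cat.jpg": {"content": "c1", "labels": "l1b"},  # same image, new labels
    "fox.jpg": {"content": "c3", "labels": "l3"},   # new file
}
changes = diff(v1, v2)
```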
09 Multi-tenancy & resource governance (built in-house)
What a user is allowed to run depends on live capacity, not hardcoded limits — and this holds identically for hosted SaaS and a customer's on-prem box, where the hardware is unknown in advance. A resource governor probes the host and admits work against real capacity.
- Probe — GPUs present? VRAM free? CPU cores, RAM, disk — continuously, gracefully handling a no-GPU host.
- Request — every job carries a resource request (catalog models fill it in; plugins declare it).
- Admit / queue / reject — fits now → run; could fit later → queue with a visible position; can never fit → reject with a clear reason.
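The probe, request and admit steps above boil down to comparing a job's request against free and total capacity. The sketch below assumes a simple four-field resource model; the `Resources` type, the `admit` function and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    gpus: int
    vram_gb: float
    cpu_cores: int
    ram_gb: float

    def fits_in(self, capacity):
        return (
            self.gpus <= capacity.gpus
            and self.vram_gb <= capacity.vram_gb
            and self.cpu_cores <= capacity.cpu_cores
            and self.ram_gb <= capacity.ram_gb
        )


def admit(request, free, total):
    """Fits now -> run; could fit later -> queue; can never fit -> reject."""
    if request.fits_in(free):
        return "run"
    if request.fits_in(total):
        return "queue"
    return "reject"


# One 24 GB GPU that is currently busy; half the CPU and RAM are free.
total = Resources(gpus=1, vram_gb=24, cpu_cores=16, ram_gb=64)
free = Resources(gpus=0, vram_gb=0, cpu_cores=8, ram_gb=32)

cpu_job = admit(Resources(0, 0, 4, 8), free, total)
gpu_job = admit(Resources(1, 12, 4, 16), free, total)
huge_job = admit(Resources(2, 48, 8, 32), free, total)

# On a CPU-only host, any GPU request can never fit.
cpu_only = Resources(gpus=0, vram_gb=0, cpu_cores=8, ram_gb=32)
gpu_on_cpu_host = admit(Resources(1, 12, 4, 16), cpu_only, cpu_only)
```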
| Detected hardware | Sharing strategy | Concurrency |
|---|---|---|
| No GPU (CPU-only) | CPU/RAM-bound; GPU jobs rejected | by free cores / RAM |
| One consumer GPU | whole-GPU per job; MPS for bursty small jobs | ~1 heavy job |
| Several GPUs | one job per GPU, or multi-GPU jobs | = # GPUs |
| Datacenter GPU (A100/H100) | optional MIG partitioning into isolated slices | up to # slices |
Tenant isolation is structural: every operation resolves through a single workspace-scoped chokepoint, so cross-tenant access is impossible by construction rather than by per-call discipline.
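The chokepoint pattern can be sketched as a scope object that every query must pass through. `WorkspaceScope` and the row layout are hypothetical; the sketch only shows the pattern, in which callers never filter by workspace themselves.

```python
class WorkspaceScope:
    """Hypothetical chokepoint: all reads go through a workspace-bound scope."""

    def __init__(self, rows, workspace_id):
        self._rows = rows
        self._workspace_id = workspace_id

    def list_datasets(self):
        # The workspace filter is applied here, once, rather than in every caller.
        return [r for r in self._rows if r["workspace_id"] == self._workspace_id]

    def get_dataset(self, slug):
        for row in self.list_datasets():
            if row["slug"] == slug:
                return row
        raise KeyError(slug)


rows = [
    {"workspace_id": "ws-a", "slug": "traffic"},
    {"workspace_id": "ws-b", "slug": "traffic"},
    {"workspace_id": "ws-b", "slug": "warehouse"},
]
scope_a = WorkspaceScope(rows, "ws-a")
visible = [r["slug"] for r in scope_a.list_datasets()]
try:
    scope_a.get_dataset("warehouse")  # exists, but only in ws-b
    leaked = True
except KeyError:
    leaked = False
```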
10 Deployment topologies
Jetson / IGX devices
The inference runtime ships to the device; models exported as TensorRT engines for the target GPU. Streams processed locally, detections published upstream.
Single-tenant appliance
The full platform as a self-contained bundle on the customer's LAN. The governor auto-sizes to the box; license-safe component swaps (SeaweedFS, Valkey).
Multi-tenant SaaS
Hosted, with cloud object storage, GPU pools and MIG-aware scheduling; horizontal scale-out via Ray or Kubernetes when a single node is outgrown.
Disconnected sites
For regulated or offline environments — the same bundle with no outbound dependencies, models and data never leaving the perimeter.
11 Technology stack & licensing
| Concern | Choice | Posture | Why |
|---|---|---|---|
| Public API / SDK / CLI | FastAPI · Python SDK · Typer | Build | One versioned API; clients generated from the spec. |
| Identity | Keycloak | Adopt | Apache-2.0; realms = tenants; SSO/MFA free. |
| Annotation | Label Studio CE | Adopt | Apache-2.0; rebrand & resell freely. |
| Data versioning | Custom on Postgres + object store | Build | SaaS needs row-level, API-first snapshots — not git-shaped tools. |
| Tracking + registry | MLflow (headless) | Adopt | Apache-2.0 REST backend; charts rendered in our UI. |
| Queue + sandbox | RQ/Dramatiq + Valkey · gVisor | Build/Adopt | Clean hard-cancel; safe untrusted-code execution. |
| Training (CV) | NVIDIA TAO · MMDet / RT-DETR | Adopt ⚠ | Native ONNX→TRT→DeepStream; verify NVIDIA EULA. |
| Serving | Triton · DeepStream (Vision SDK) | Adopt/Build | BSD-3 Triton; YAML runtime built in-house. |
| Object storage | Cloud S3 / SeaweedFS | Adopt | Avoid MinIO (AGPL) for the appliance. |
12 Delivery roadmap
| Phase | Delivers | Milestone |
|---|---|---|
| 1 · Data engine | Content-addressed store, immutable versions, diff/pull, lineage, SDK + CLI. | Import a folder → commit v1 → pull it back byte-for-byte. |
| 2 · Label & interchange | Rebranded annotation, COCO/YOLO import-export, tags, stats. | Annotate & freeze a labeled dataset via UI and API. |
| 3 · Train & track | Governor, catalog + script-mode training, MLflow, live metrics, registry. | One-call .train() with live metrics, model tied to dataset version. |
| 4 · BYO & serve | gVisor sandbox + BYO-container, ONNX/TRT export, Triton + DeepStream serving, RBAC, audit. | Multi-tenant beta with full bring-your-own-model. |
| 5 · Appliance & scale | On-prem bundle, air-gap, license-safe swaps; Ray/K8s scale-out. | Customer runs the rebranded appliance on their LAN / edge. |