Solution brief · v1.0

AHOY Perception

The end-to-end platform for production computer vision — one integrated system from physical sensors to deployed inference: annotate, version, train, evaluate, deploy and monitor, with the data and the model fully traceable at every step.

Sensors → Inference
API-first · SDK · CLI · Web
Edge · On-prem · Cloud · Air-gap
Full data & model lineage

The perception workflow — one continuous loop

Vertically-integrated architecture

Application

Sensor ingest · auto-labeling · annotation · versioning · training · evaluation · export · active learning — via Web Studio, Python SDK, CLI & REST.

Governance

Data & model lineage · reproducible splits · immutability · RBAC · SSO/MFA · audit · tenant isolation · quotas · GDPR purge.

Models

Detection (YOLO · RT-DETR) · classification · pose · segmentation · NVIDIA TAO · bring-your-own-model · plugin contracts · registry.

Data & Compute

Content-addressed store · git-for-datasets · COCO/YOLO I/O · resource governor · sandboxed training · Triton & DeepStream serving.

Infrastructure

Edge (Jetson/IGX) · on-prem · cloud · air-gap · Kubernetes · GPU/MIG · CPU-only · S3/SeaweedFS · PostgreSQL.

Why it's different

One platform, not a toolchain. Label, version, train, serve — no seams where reproducibility leaks out.
Lineage by construction. Every model traces to the exact data and code that made it; identical content can never land in both train and test, so exact-duplicate leakage is ruled out structurally.
Git for datasets. Immutable, fingerprinted, copy-on-write versions you can diff and pull back byte-for-byte.
Runs on what you have. A resource governor admits work against live GPU/CPU capacity — SaaS or on-prem.
Deploy by YAML. One config file takes an ONNX model to live multi-stream inference, edge to cloud.

Built in-house

The thin glue that is the product:

Data versioning · Resource governor · YAML inference runtime

Adopted, license-clean

Best-in-class permissive components:

Keycloak · Label Studio · MLflow · Triton · TAO

Deploy targets

The same stack, placed where data lives:

Edge · On-prem · Cloud · Air-gap

AHOY Perception · Solution brief · v1.0 · Confidential — see the companion Technical Architecture & Platform Diagram for detail.

The perception workflow

Capture

Ingest from cameras, LiDAR, RADAR & recorded files

Annotate

AI-assisted labeling, review & quality control

Version

Immutable, fingerprinted datasets — git for data

Train

Catalog model, fine-tune, or bring your own

Optimize

Evaluate, pick the best, export to ONNX / TensorRT

Deploy & Infer

Serve on edge, on-prem or cloud — heterogeneous compute

Active-learning loop — low-confidence detections from the field flow back to annotation, compounding accuracy.

Application Layer

Perception Workflow

One API-first platform for the full lifecycle — from raw sensor data to a deployed model — driven from a web studio, SDK, CLI or REST.

Sensor Ingest · Auto-Labeling · Annotation Studio · Dataset Versioning · Training & Tuning · Model Evaluation · Export & Optimize · Active Learning

Web Studio · Python SDK · CLI · REST API

Governance Layer

Lineage & Trust

Every model traces back to the exact data and code that made it. Reproducibility, access control and compliance across all datasets and models.

Data & Model Lineage · Reproducible Splits · Dataset Immutability · RBAC · Auth (SSO / MFA) · Audit Logging · Tenant Isolation · Storage Quotas · GDPR Purge · Experiment Tracking

Model Layer

Vision Models

State-of-the-art CV architectures plus bring-your-own-model. Detection, classification, pose and segmentation, registered with full lineage.

Detection (YOLO / RT-DETR) · Classification · Pose Estimation · Instance Segmentation · NVIDIA TAO · Bring-Your-Own-Model · Plugin Contracts · Model Registry · Hyperparameter Tuning · Model Routing

Perception Data Engine

Data & Compute Core

A content-addressed dataset store with git-like versioning, plus resource-aware training and serving on whatever hardware is present.

Content-Addressed Store · Git-for-Datasets · COCO / YOLO I/O · Dataset Diff & Stats · Resource Governor · Job Queue · Sandboxed Execution · Triton Serving · DeepStream Runtime · MLflow Backend

Infrastructure Layer

Heterogeneous Compute

Deploy on the hardware and clusters you already run. The same stack ships to a Jetson at the edge, a server room, or a cloud region.

Edge (Jetson / IGX) · On-prem · Cloud (AWS / Azure / GCP) · Air-gap · Kubernetes · GPU / MIG · CPU-only · S3 / SeaweedFS · PostgreSQL

AHOY Perception

Technical Architecture · v1.0

From the sensor to the served model

An end-to-end platform and SDK for production computer vision — covering the full journey from physical sensors to deployed inference: annotate, version, train, evaluate, deploy and monitor, with the data and the model fully traceable at every step.

Sensors → Inference, A-to-Z
API-first · SDK · CLI · Web
Edge · On-prem · Cloud
Full data & model lineage

01 Executive summary

One platform from the sensor to the served model

Computer-vision teams today stitch together a dozen disconnected tools — one to label, another to store data, a notebook to train, a registry somewhere, and a separate runtime to deploy. The seams are where reproducibility, accuracy and time-to-production are lost.

AHOY Perception collapses that toolchain into a single, vertically-integrated platform. A team ingests data from cameras, LiDAR or RADAR; labels it with AI assistance; freezes it into an immutable, versioned dataset; trains a built-in or custom model on whatever compute is available; evaluates and selects the best candidate; exports it to an optimized runtime format; and deploys it for live inference at the edge, on-prem or in the cloud — all through one API, one SDK and one web studio.

North star. Any model's lineage must trace back to the exact data and the exact code that created it. Every dataset is an immutable, content-addressed snapshot; every training run links a model version to the dataset version and configuration that produced it. Reproducibility is structural, not aspirational.

The guiding build philosophy is to adopt the hard, generic parts and build only the thin glue that is the product: best-in-class permissively-licensed components (annotation, tracking, serving, identity) stitched by an API-first backend, with three things built in-house because off-the-shelf tools do them badly — a git-like data-versioning engine, a resource-aware job governor, and a YAML-driven inference runtime.

02 Architecture at a glance

A vertically-integrated stack, five layers deep

The platform is organized as five layers. The workflow a user touches sits on top; underneath it, governance makes every action traceable, the model layer supplies the intelligence, the data & compute engine does the heavy lifting, and the infrastructure layer places it all on the hardware you already run.

See the companion Platform Architecture Diagram for the full layered view with per-layer capabilities.

Application

Perception Workflow

The end-to-end lifecycle, exposed identically through Web Studio, Python SDK, CLI and REST.

Governance

Lineage & Trust

RBAC, audit, reproducible splits, immutability, tenant isolation, compliance.

Model

Vision Models

Detection, classification, pose, segmentation, plus bring-your-own-model.

Data & Compute

Perception Data Engine

Content-addressed versioning, resource governor, sandboxed training, serving.

Infrastructure

Heterogeneous Compute

Edge, on-prem, cloud and air-gap targets, on GPU / MIG or CPU-only hardware.

03 The perception lifecycle

Six stages, one continuous loop

Every capability maps to one stage of a single closed loop. The same steps run whether driven by a point-and-click studio user or a CI pipeline calling the SDK — because both hit the same API.

① Capture — from physical sensors to managed data

Data enters from live cameras (RTSP/RTMP/HLS), recorded video and image files, or other modalities such as LiDAR and RADAR captures. Bytes are streamed into object storage, hashed (SHA-256) and stored exactly once — duplicates across a workspace collapse to a single blob.

② Annotate — label with AI assistance

A rebranded annotation studio handles boxes, masks, keypoints and classes, with optional AI pre-labeling (e.g. SAM / GroundingDINO) to bootstrap labels for a human to correct. Already-labeled data imports directly via COCO or YOLO.

③ Version — git for datasets

The working set is frozen into a numbered, immutable version with a content fingerprint and a deterministic, leakage-safe train/val/test split. Versioning is metadata-only and copy-on-write — a new version that adds 300 images to an existing 1,200 stores only the 300 new blobs.

④ Train — catalog, fine-tune, or bring your own

Pick a built-in architecture, a fine-tuning recipe, or supply your own model code. The job is admitted by the resource governor when a slot is free; untrusted custom code runs sandboxed. Metrics stream live to the studio and the SDK.

⑤ Optimize — evaluate, select, export

Compare runs on per-class metrics, PR curves and confusion matrices; promote the best candidate; export to ONNX (portable) or a TensorRT engine (fastest on NVIDIA), packaged for the inference runtime.

⑥ Deploy & Infer — anywhere

The exported model drops into the YAML-driven inference runtime and serves live streams on edge devices, on-prem servers or cloud GPUs. Detections flow out as overlaid video, files, or structured Kafka messages.

The loop closes. Low-confidence detections observed in the field are surfaced back into annotation as the next batch to label — an active-learning flywheel that compounds accuracy with every deployment cycle.
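
One way to picture the selection step is a filter over field detections. This is a sketch under stated assumptions: the record shape (`frame_id`, `confidence`), the least-confidence criterion, the threshold and the budget are all illustrative and are not taken from the platform.

```python
def select_for_labeling(detections, threshold=0.5, budget=100):
    """Pick frames whose least-confident detection falls below the threshold.

    Frames are returned most-uncertain first, capped at `budget`.
    """
    worst = {}
    for det in detections:
        frame, conf = det["frame_id"], det["confidence"]
        worst[frame] = min(conf, worst.get(frame, 1.0))
    uncertain = [(conf, frame) for frame, conf in worst.items() if conf < threshold]
    return [frame for conf, frame in sorted(uncertain)[:budget]]


# Made-up detections from a deployed stream.
detections = [
    {"frame_id": "f1", "confidence": 0.92},
    {"frame_id": "f2", "confidence": 0.31},
    {"frame_id": "f2", "confidence": 0.88},
    {"frame_id": "f3", "confidence": 0.45},
]
next_batch = select_for_labeling(detections, threshold=0.5, budget=10)
print(next_batch)
```

With these sample values the batch contains `f2` and `f3`, ordered by how uncertain the model was.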

04 Layer-by-layer

What lives in each layer

Application Layer — Perception Workflow

The platform is API-first: one versioned public API is the product, and the Web Studio, Python SDK and CLI are co-equal clients of it. Everything the UI can do is reachable from code.

Web Studio — point-and-click for labelers, reviewers and managers.
Python SDK — script the whole loop: create → upload → version → train → stream metrics → export → deploy.
CLI — a thin wrapper over the SDK for CI/CD and ops.
REST API — versioned (/v1), OpenAPI-specified, with async-job semantics and webhooks.

Governance Layer — Lineage & Trust

Data & model lineage — a dual foreign key ties every model to its dataset version and run.
Reproducible splits & immutability — committed versions never change; splits are deterministic.
RBAC, SSO/MFA, audit logging — adopted via Keycloak; who-changed-what is recorded.
Tenant isolation & quotas — strict per-workspace separation, storage quotas, GDPR purge.

Model Layer — Vision Models

Built-in architectures — detection (YOLO, RT-DETR), classification, pose, instance segmentation.
NVIDIA TAO — the native train → ONNX → TensorRT → DeepStream path for CV.
Bring-your-own-model — three tiers: catalog, script-mode, and full custom container.
Plugin contracts — register a model like a DeepStream plugin; the UI builds its config form automatically.

Perception Data Engine — Data & Compute Core

Content-addressed store — bytes stored once by hash, deduplicated per workspace.
Git-for-datasets — immutable versions, diffs, tags, lineage, COCO/YOLO import-export.
Resource governor & job queue — admits work against live hardware capacity.
Serving runtimes — Triton for hosted endpoints, DeepStream for streaming inference.

Infrastructure Layer — Heterogeneous Compute

Targets — Edge (Jetson/IGX), on-prem servers, cloud (AWS/Azure/GCP), and air-gapped sites.
Compute — GPU / MIG slices, multi-GPU, or CPU-only — the platform detects and adapts.
Storage — cloud S3 or self-hosted SeaweedFS; PostgreSQL for all metadata.

05 Data versioning engine · Built in-house

"Git for datasets" — the foundation everything sits on

This is the bottom of the stack and the part that no off-the-shelf tool handles well. It is an independent, hexagonal module (ports & adapters): a pure core that runs standalone on a laptop (SQLite + local disk), embedded in the platform, or as a microservice, with the same code in every mode.

Content-addressed, copy-on-write storage

File bytes are stored once by their sha256 and deduplicated within a workspace. A dataset references content; a version freezes those references. Nothing copies bytes, so versions are cheap metadata.
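
A minimal in-memory sketch of this idea follows. The class and method names (`BlobStore`, `put`, `commit`) are illustrative and not the platform's API; a real store would sit on object storage and a database. The example replays the case of adding 300 images to an existing 1,200.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store: bytes are keyed by their SHA-256 digest."""

    def __init__(self):
        self.blobs = {}      # sha256 hex -> bytes; each unique content stored once
        self.versions = []   # each version is a snapshot {filename: sha256}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # duplicate content is a no-op
        return digest

    def commit(self, working: dict) -> int:
        # A version only records references; no blob bytes are copied.
        self.versions.append(dict(working))
        return len(self.versions)  # 1-based version number


store = BlobStore()
working = {f"img_{i:04d}.jpg": store.put(f"pixels-{i}".encode()) for i in range(1200)}
v1 = store.commit(working)
blobs_after_v1 = len(store.blobs)

for i in range(1200, 1500):
    working[f"img_{i:04d}.jpg"] = store.put(f"pixels-{i}".encode())
working["copy_of_0000.jpg"] = store.put(b"pixels-0")  # duplicate bytes, new name
v2 = store.commit(working)

new_blobs = len(store.blobs) - blobs_after_v1
print(v1, v2, blobs_after_v1, new_blobs)
```

Under these assumptions the second commit adds 300 blobs, and the renamed duplicate adds none.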

Table	Role
`datasets`	Named, workspace-owned containers; soft-delete; unique slug per workspace.
`files`	Content records — one per unique `(workspace, sha256)`.
`dataset_files`	The mutable working area — `(dataset, filename) → file`.
`dataset_versions`	Immutable snapshots — number, fingerprint, split, `parent_id` lineage.
`version_files`	The frozen manifest — `(filename, sha256, role, split)` per version.
`version_tags`	Stable mutable names (`production`, `golden`) → one version.
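
Because each version is a frozen manifest, comparing two versions reduces to comparing two sets of rows. The sketch below assumes a simplified manifest shape in which each file carries a separate `labels_sha256`; that field, the function name `diff_versions` and the placeholder hashes are assumptions made for illustration.

```python
def diff_versions(old, new):
    """Diff two manifests shaped {filename: {"sha256": ..., "labels_sha256": ...}}."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    relabeled = {}
    for name in sorted(old.keys() & new.keys()):
        before, after = old[name], new[name]
        same_image = before["sha256"] == after["sha256"]
        if same_image and before["labels_sha256"] != after["labels_sha256"]:
            # Report the old and new label hashes side by side.
            relabeled[name] = (before["labels_sha256"], after["labels_sha256"])
    return {"added": added, "removed": removed, "relabeled": relabeled}


v1 = {
    "a.jpg": {"sha256": "h-a", "labels_sha256": "l-a1"},
    "b.jpg": {"sha256": "h-b", "labels_sha256": "l-b1"},
}
v2 = {
    "a.jpg": {"sha256": "h-a", "labels_sha256": "l-a2"},  # same image, new labels
    "c.jpg": {"sha256": "h-c", "labels_sha256": "l-c1"},  # new file
}
report = diff_versions(v1, v2)
print(report)
```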

Reproducible by construction

Fingerprint — manifest_hash = sha256 over the sorted manifest rows, computed in Python (codepoint sort, canonical JSON) so it is identical on SQLite and Postgres.
Leakage-safe split — each file's bucket is decided by hashing its content plus a seed, so duplicate bytes can never straddle train and test, and the split survives renames.
Lineage — every commit records its parent_id; the chain walks back to origin.
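
The fingerprint and the split can both be sketched with the standard library. The row fields follow the `version_files` manifest; sorting by filename, the exact JSON form, and the way the seed is mixed into the hash are assumptions, since the document does not spell them out.

```python
import hashlib
import json


def manifest_hash(rows):
    """Fingerprint a manifest: sort rows, serialise canonically, hash with SHA-256."""
    ordered = sorted(rows, key=lambda r: r["filename"])  # str comparison is by code point
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_bucket(sha256_hex, seed, val=0.1, test=0.1):
    """Assign a split from the content hash and a seed, never from the filename."""
    digest = hashlib.sha256(f"{seed}:{sha256_hex}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # deterministic value in [0, 1)
    if u < test:
        return "test"
    if u < test + val:
        return "val"
    return "train"


# Two files with identical bytes but different names.
content = hashlib.sha256(b"same pixels").hexdigest()
rows = [
    {"filename": "b.jpg", "sha256": content, "role": "image"},
    {"filename": "a.jpg", "sha256": content, "role": "image"},
]
for row in rows:
    row["split"] = split_bucket(row["sha256"], seed=42)

same_bucket = rows[0]["split"] == rows[1]["split"]
order_independent = manifest_hash(rows) == manifest_hash(list(reversed(rows)))
print(same_bucket, order_independent)
```

Since the bucket depends only on content and seed, renaming a file or importing a second copy cannot move it to a different split.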

Safe deletion over immutable, deduplicated data

Because versions are immutable and blobs are shared, deleting a dataset or version removes a reference; it never deletes the underlying bytes directly. Reference counting plus a grace-period garbage collector reclaims a blob only when zero versions point at it. A lineage guard protects any version a model was trained on, and GDPR erasure is an explicit, audited tombstone.
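
The interplay of reference counts, grace period and lineage guard can be sketched as follows. The class `VersionGC`, its methods and the use of plain numeric timestamps are hypothetical; the sketch only counts references from versions and leaves out the audited GDPR path.

```python
class VersionGC:
    """Toy garbage collector over shared blobs referenced by immutable versions."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.versions = {}      # version id -> set of blob hashes it references
        self.protected = set()  # version ids a model was trained on
        self.orphaned_at = {}   # blob hash -> time its last reference was removed

    def add_version(self, vid, hashes, trained_on=False):
        self.versions[vid] = set(hashes)
        if trained_on:
            self.protected.add(vid)
        for h in hashes:
            self.orphaned_at.pop(h, None)  # re-referenced blobs are no longer orphans

    def refcount(self, h):
        return sum(h in hashes for hashes in self.versions.values())

    def delete_version(self, vid, now):
        if vid in self.protected:
            raise PermissionError(f"version {vid} is pinned by model lineage")
        for h in self.versions.pop(vid):
            if self.refcount(h) == 0:
                self.orphaned_at[h] = now  # start the grace period

    def sweep(self, now):
        """Return blobs with zero references whose grace period has elapsed."""
        due = sorted(
            h for h, t in self.orphaned_at.items()
            if now - t >= self.grace_seconds and self.refcount(h) == 0
        )
        for h in due:
            del self.orphaned_at[h]
        return due


gc = VersionGC(grace_seconds=3600)
gc.add_version(1, {"aaa", "bbb"}, trained_on=True)
gc.add_version(2, {"bbb", "ccc"})
gc.delete_version(2, now=0)

early = gc.sweep(now=10)     # grace period still running
late = gc.sweep(now=3600)    # only the unshared blob is reclaimed

try:
    gc.delete_version(1, now=0)
    blocked = False
except PermissionError:
    blocked = True
print(early, late, blocked)
```

Blob `bbb` survives because version 1 still references it, and version 1 itself cannot be deleted because a model was trained on it.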

06 Models & training

Built-in intelligence, plus bring-your-own-model

For the CV core, NVIDIA TAO Toolkit is the backbone: a clean train → export(.onnx) → TensorRT engine path that drops straight into the inference runtime. Alongside it, MMDetection / RT-DETR and Hugging Face cover Apache/BSD-licensed alternatives.

Three execution tiers for custom models

Tier	What the user brings	Runs in	Trade-off
0 · Catalog	A built-in architecture + hyperparameters	Curated images	Lowest effort, safest
1 · Script mode	A training script + a deps list	Curated framework base images	BYO model, recommended default
2 · BYO container	A full Docker image with their code	Their image, sandboxed	Max flexibility, max security surface

The plugin framework — "define the rules, users add models"

Just as DeepStream lets you drop in a custom plugin by implementing a known interface, the platform defines a small set of extension-point contracts (build / train / evaluate / export). A user implements them to make their own model a first-class, trainable, reproducible citizen — discovered via standard Python entry points, configured from a JSON-Schema manifest that the UI renders into a form automatically.

Why build untrusted-code execution? Tiers 1–2 run customer code on shared GPUs. The runtime is sandboxed with gVisor + nvproxy (the approach Modal Labs uses for multi-tenant GPU sandboxing), which composes with MIG slicing and adds little overhead for GPU-bound work. Kata/Confidential-Containers is available as an optional VM-grade premium tier.

07 Inference & deployment runtime · Built in-house

From ONNX to live streams with one YAML file

The Vision SDK replaces dozens of low-level DeepStream config files, GStreamer elements and inference parameters with a single declarative YAML file. You provide an ONNX model and labels; the SDK generates every DeepStream config, links the pipeline, and routes multi-stream output automatically.
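
The YAML schema itself is not shown in this document, so the keys below (`model`, `labels`, `sources`, `sinks`, `batch_size`) are purely illustrative. The sketch represents an already-parsed config as a Python dict and expands it into the kind of per-stream plan a generator might derive; it does not emit real DeepStream configuration.

```python
from urllib.parse import urlparse

# HLS playlists are fetched over HTTP(S); bare paths are treated as local files.
SUPPORTED_SCHEMES = {"file", "rtsp", "rtmp", "http", "https"}


def plan_pipeline(config: dict) -> dict:
    """Validate a declarative config and derive one entry per input stream."""
    for key in ("model", "labels", "sources"):
        if not config.get(key):
            raise ValueError(f"missing required key: {key}")
    streams = []
    for index, uri in enumerate(config["sources"]):
        scheme = urlparse(uri).scheme or "file"
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported source: {uri}")
        streams.append({"id": index, "uri": uri, "scheme": scheme})
    return {
        "model": config["model"],
        "labels": config["labels"],
        "batch_size": config.get("batch_size", len(streams)),  # default: one slot per stream
        "streams": streams,
        "sinks": config.get("sinks", ["display"]),
    }


plan = plan_pipeline({
    "model": "detector.onnx",
    "labels": "labels.txt",
    "sources": ["rtsp://cam-1/live", "clips/a.mp4"],
    "sinks": ["kafka"],
})
print(plan["batch_size"], [s["scheme"] for s in plan["streams"]])
```

The point of the design is that the user states what to run and where the streams come from, and everything else is derived.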

Model types

Detector, classifier, pose (17 COCO joints), instance segmentation, and Triton-served variants — chainable (e.g. detect → track → classify).

Architectures

YOLO v5–v13 / v26 & YOLOX, DetectNet_v2, YOLO-pose, YOLO-seg, and any softmax classifier (ResNet, EfficientNet, ViT…).

Inputs

Local files, RTSP, RTMP and HLS streams — single or multi-stream, batched automatically.

Outputs

X11 display overlays, MP4/MKV files, RTSP/RTMP re-streaming, FPS metering, and structured per-frame Kafka detection messages.

Hardware-accelerated encode/decode (GPU by default, CPU fallback) means the same YAML runs on a Jetson at the edge and a multi-GPU server — only the device changes.

08 Reproducibility & lineage

Trace any model back to its exact data and code

Lineage is the spine that connects the two in-house engines. The data engine fingerprints every dataset version; the training layer records, for every run, which dataset version and which model code produced the artifact.

Question	How the architecture answers it
What exact data trained this model?	A dual FK ties the model version to its `dataset_version` and its training run; the answer is a single join.
Can I rebuild last month's dataset?	`version.pull()` reconstructs the exact files, labels and split byte-for-byte.
Did the data change between runs?	`diff(a, b)` reports added / removed / relabeled, with old and new hashes.
Which code produced the model?	The run records its source type and reference (catalog id, script + commit, or image digest).
Did test data leak into training?	Content-hash splitting assigns identical bytes to exactly one split, so exact-duplicate leakage is ruled out by construction.
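
The first and fourth rows can be illustrated with an in-memory SQLite database. Only `dataset_versions` is a table name from this document; `model_versions`, `runs`, the column names and every sample value are made up for the sketch. Following one foreign key answers the data question, and following the second answers the code question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dataset_versions (id INTEGER PRIMARY KEY, number INTEGER, fingerprint TEXT);
CREATE TABLE runs (id INTEGER PRIMARY KEY, source_type TEXT, source_ref TEXT);
CREATE TABLE model_versions (
    id INTEGER PRIMARY KEY,
    name TEXT,
    dataset_version_id INTEGER NOT NULL REFERENCES dataset_versions(id),
    run_id INTEGER NOT NULL REFERENCES runs(id)
);
-- Sample rows, invented for illustration only.
INSERT INTO dataset_versions VALUES (1, 3, 'sha256:ab12');
INSERT INTO runs VALUES (1, 'script', 'train.py@9f3c2e1');
INSERT INTO model_versions VALUES (1, 'detector-v7', 1, 1);
""")

# One join per foreign key: dataset version for the data, run for the code.
row = con.execute(
    """
    SELECT m.name, d.number, d.fingerprint, r.source_type, r.source_ref
    FROM model_versions AS m
    JOIN dataset_versions AS d ON d.id = m.dataset_version_id
    JOIN runs AS r ON r.id = m.run_id
    WHERE m.name = ?
    """,
    ("detector-v7",),
).fetchone()
print(row)
```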

09 Multi-tenancy & resource governance · Built in-house

Admit work against the hardware that is actually present

What a user is allowed to run depends on live capacity, not hardcoded limits — and this holds identically for hosted SaaS and a customer's on-prem box, where the hardware is unknown in advance. A resource governor probes the host and admits work against real capacity.

Probe — GPUs present? VRAM free? CPU cores, RAM, disk — continuously, gracefully handling a no-GPU host.
Request — every job carries a resource request (catalog models fill it in; plugins declare it).
Admit / queue / reject — fits now → run; could fit later → queue with a visible position; can never fit → reject with a clear reason.
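
The three-way decision above can be sketched in a few lines. The `Resources` fields and the `admit` function are illustrative names, and treating VRAM as one aggregate number is a simplification; a real governor would track capacity per GPU and probe it continuously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    gpus: int = 0
    vram_gb: float = 0.0
    cpu_cores: int = 0
    ram_gb: float = 0.0

    def fits_in(self, other: "Resources") -> bool:
        return (
            self.gpus <= other.gpus
            and self.vram_gb <= other.vram_gb
            and self.cpu_cores <= other.cpu_cores
            and self.ram_gb <= other.ram_gb
        )


def admit(request: Resources, free: Resources, total: Resources) -> str:
    """Decide run / queue / reject from the request and the probed capacity."""
    if not request.fits_in(total):
        return "reject"  # can never fit on this host
    if request.fits_in(free):
        return "run"     # fits right now
    return "queue"       # fits once running jobs release capacity


gpu_job = Resources(gpus=1, vram_gb=8, cpu_cores=4, ram_gb=16)

cpu_only_host = Resources(cpu_cores=8, ram_gb=32)
one_gpu_host = Resources(gpus=1, vram_gb=24, cpu_cores=16, ram_gb=64)
one_gpu_busy = Resources(gpus=0, vram_gb=2, cpu_cores=12, ram_gb=48)

on_cpu_only = admit(gpu_job, free=cpu_only_host, total=cpu_only_host)
on_busy_gpu = admit(gpu_job, free=one_gpu_busy, total=one_gpu_host)
on_idle_gpu = admit(gpu_job, free=one_gpu_host, total=one_gpu_host)
print(on_cpu_only, on_busy_gpu, on_idle_gpu)
```

The same GPU job is rejected on a CPU-only host, queued while the single GPU is busy, and run once it is idle.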

Detected hardware	Sharing strategy	Concurrency
No GPU (CPU-only)	CPU/RAM-bound; GPU jobs rejected	by free cores / RAM
One consumer GPU	whole-GPU per job; MPS for bursty small jobs	~1 heavy job
Several GPUs	one job per GPU, or multi-GPU jobs	= # GPUs
Datacenter GPU (A100/H100)	optional MIG partitioning into isolated slices	up to # slices

Tenant isolation is structural: every operation resolves through a single workspace-scoped chokepoint, so cross-tenant access is impossible by construction rather than by per-call discipline.

10 Deployment topologies

The same stack, placed where the data lives

Edge

Jetson / IGX devices

The inference runtime ships to the device, and models are exported as TensorRT engines built for the target GPU. Streams are processed locally and detections are published upstream.

On-prem

Single-tenant appliance

The full platform ships as a self-contained bundle on the customer's LAN. The governor auto-sizes to the box, and components with restrictive licenses are swapped for license-safe alternatives (SeaweedFS, Valkey).

Cloud

Multi-tenant SaaS

Hosted, with cloud object storage, GPU pools and MIG-aware scheduling; horizontal scale-out via Ray or Kubernetes when a single node is outgrown.

Air-gap

Disconnected sites

For regulated or offline environments — the same bundle with no outbound dependencies, models and data never leaving the perimeter.

11 Technology stack & licensing

Adopt the generic parts; build only the product

Concern	Choice	Posture	Why
Public API / SDK / CLI	FastAPI · Python SDK · Typer	Build	One versioned API; clients generated from the spec.
Identity	Keycloak	Adopt	Apache-2.0; realms = tenants; SSO/MFA free.
Annotation	Label Studio CE	Adopt	Apache-2.0; rebrand & resell freely.
Data versioning	Custom on Postgres + object store	Build	SaaS needs row-level, API-first snapshots — not git-shaped tools.
Tracking + registry	MLflow (headless)	Adopt	Apache-2.0 REST backend; charts rendered in our UI.
Queue + sandbox	RQ/Dramatiq + Valkey · gVisor	Build/Adopt	Clean hard-cancel; safe untrusted-code execution.
Training (CV)	NVIDIA TAO · MMDet / RT-DETR	Adopt ⚠	Native ONNX→TRT→DeepStream; verify NVIDIA EULA.
Serving	Triton · DeepStream (Vision SDK)	Adopt/Build	BSD-3 Triton; YAML runtime built in-house.
Object storage	Cloud S3 / SeaweedFS	Adopt	Avoid MinIO (AGPL) for the appliance.

Licensing landmines avoided. ClearML server (SSPL) and Ultralytics YOLO (AGPL) can force source disclosure for a resold product — replaced by RQ/MLflow and MMDetection/RT-DETR respectively. Redis → Valkey, MinIO → SeaweedFS, Grafana kept internal-only. Engineering guidance, not legal advice — counsel should confirm SSPL/AGPL and NVIDIA EULA terms before commercial release.

12 Delivery roadmap

Walking skeleton first, then layer outward

Phase	Delivers	Milestone
1 · Data engine	Content-addressed store, immutable versions, diff/pull, lineage, SDK + CLI.	Import a folder → commit v1 → pull it back byte-for-byte.
2 · Label & interchange	Rebranded annotation, COCO/YOLO import-export, tags, stats.	Annotate & freeze a labeled dataset via UI and API.
3 · Train & track	Governor, catalog + script-mode training, MLflow, live metrics, registry.	One-call `.train()` with live metrics, model tied to dataset version.
4 · BYO & serve	gVisor sandbox + BYO-container, ONNX/TRT export, Triton + DeepStream serving, RBAC, audit.	Multi-tenant beta with full bring-your-own-model.
5 · Appliance & scale	On-prem bundle, air-gap, license-safe swaps; Ray/K8s scale-out.	Customer runs the rebranded appliance on their LAN / edge.