AHOY · Solution brief · v1.0

AHOY Perception

The end-to-end platform for production computer vision — one integrated system from physical sensors to deployed inference: annotate, version, train, evaluate, deploy and monitor, with the data and the model fully traceable at every step.

Sensors → Inference
API-first · SDK · CLI · Web
Edge · On-prem · Cloud · Air-gap
Full data & model lineage
The perception workflow — one continuous loop

Vertically-integrated architecture

Application
Sensor ingest · auto-labeling · annotation · versioning · training · evaluation · export · active learning — via Web Studio, Python SDK, CLI & REST.
Governance
Data & model lineage · reproducible splits · immutability · RBAC · SSO/MFA · audit · tenant isolation · quotas · GDPR purge.
Models
Detection (YOLO · RT-DETR) · classification · pose · segmentation · NVIDIA TAO · bring-your-own-model · plugin contracts · registry.
Data & Compute
Content-addressed store · git-for-datasets · COCO/YOLO I/O · resource governor · sandboxed training · Triton & DeepStream serving.
Infrastructure
Edge (Jetson/IGX) · on-prem · cloud · air-gap · Kubernetes · GPU/MIG · CPU-only · S3/SeaweedFS · PostgreSQL.

Why it's different

  • One platform, not a toolchain. Label, version, train, serve — no seams where reproducibility leaks out.
  • Lineage by construction. Every model traces to the exact data and code that made it; content-hash splitting keeps identical bytes from landing in both train and test.
  • Git for datasets. Immutable, fingerprinted, copy-on-write versions you can diff and pull back byte-for-byte.
  • Runs on what you have. A resource governor admits work against live GPU/CPU capacity — SaaS or on-prem.
  • Deploy by YAML. One config file takes an ONNX model to live multi-stream inference, edge to cloud.
Built in-house

The thin glue that is the product:

Data versioning · Resource governor · YAML inference runtime
Adopted, license-clean

Best-in-class permissive components:

Keycloak · Label Studio · MLflow · Triton · TAO
Deploy targets

The same stack, placed where data lives:

Edge · On-prem · Cloud · Air-gap

AHOY Perception · Solution brief · v1.0 · Confidential — see the companion Technical Architecture & Platform Diagram for detail.

AHOY Perception

End-to-end platform for production computer vision — sensors to inference

Your private environment · Edge · On-prem · Cloud
The perception workflow
01 · Capture: Ingest from cameras, LiDAR, RADAR & recorded files
02 · Annotate: AI-assisted labeling, review & quality control
03 · Version: Immutable, fingerprinted datasets (git for data)
04 · Train: Catalog model, fine-tune, or bring your own
05 · Optimize: Evaluate, pick the best, export to ONNX / TensorRT
06 · Deploy & Infer: Serve on edge, on-prem or cloud with heterogeneous compute

Active-learning loop — low-confidence detections from the field flow back to annotation, compounding accuracy.
Application Layer

Perception Workflow

One API-first platform for the full lifecycle — from raw sensor data to a deployed model — driven from a web studio, SDK, CLI or REST.

Sensor Ingest
Auto-Labeling
Annotation Studio
Dataset Versioning
Training & Tuning
Model Evaluation
Export & Optimize
Active Learning
Web Studio
Python SDK
CLI
REST API
Governance Layer

Lineage & Trust

Every model traces back to the exact data and code that made it. Reproducibility, access control and compliance across all datasets and models.

Data & Model Lineage
Reproducible Splits
Dataset Immutability
RBAC
Auth · SSO / MFA
Audit Logging
Tenant Isolation
Storage Quotas
GDPR Purge
Experiment Tracking
Model Layer

Vision Models

State-of-the-art CV architectures plus bring-your-own-model. Detection, classification, pose and segmentation, registered with full lineage.

Detection · YOLO / RT-DETR
Classification
Pose Estimation
Instance Segmentation
NVIDIA TAO
Bring-Your-Own-Model
Plugin Contracts
Model Registry
Hyperparameter Tuning
Model Routing
Perception Data Engine

Data & Compute Core

A content-addressed dataset store with git-like versioning, plus resource-aware training and serving on whatever hardware is present.

Content-Addressed Store
Git-for-Datasets
COCO / YOLO I/O
Dataset Diff & Stats
Resource Governor
Job Queue
Sandboxed Execution
Triton Serving
DeepStream Runtime
MLflow Backend
Infrastructure Layer

Heterogeneous Compute

Deploy on the hardware and clusters you already run. The same stack ships to a Jetson at the edge, a server room, or a cloud region.

Edge · Jetson / IGX
On-prem
Cloud · AWS / Azure / GCP
Air-gap
Kubernetes
GPU / MIG
CPU-only
S3 / SeaweedFS
PostgreSQL
Workflow · Governance · Models · Data & Compute · Infrastructure
AHOY Perception · Reference Architecture · v1.0
AHOY Perception
Technical Architecture · v1.0

From the sensor to the served model

An end-to-end platform and SDK for production computer vision — covering the full journey from physical sensors to deployed inference: annotate, version, train, evaluate, deploy and monitor, with the data and the model fully traceable at every step.

Sensors → Inference, A-to-Z
API-first · SDK · CLI · Web
Edge · On-prem · Cloud
Full data & model lineage

01 Executive summary

One platform from the sensor to the served model

Computer-vision teams today stitch together a dozen disconnected tools — one to label, another to store data, a notebook to train, a registry somewhere, and a separate runtime to deploy. The seams are where reproducibility, accuracy and time-to-production are lost.

AHOY Perception collapses that toolchain into a single, vertically-integrated platform. A team ingests data from cameras, LiDAR or RADAR; labels it with AI assistance; freezes it into an immutable, versioned dataset; trains a built-in or custom model on whatever compute is available; evaluates and selects the best candidate; exports it to an optimized runtime format; and deploys it for live inference at the edge, on-prem or in the cloud — all through one API, one SDK and one web studio.

North star. Any model's lineage must trace back to the exact data and the exact code that created it. Every dataset is an immutable, content-addressed snapshot; every training run links a model version to the dataset version and configuration that produced it. Reproducibility is structural, not aspirational.

The guiding build philosophy is to adopt the hard, generic parts and build only the thin glue that is the product. Best-in-class, permissively licensed components (annotation, tracking, serving, identity) are stitched together by an API-first backend. Three things are built in-house because off-the-shelf tools do them badly: a git-like data-versioning engine, a resource-aware job governor, and a YAML-driven inference runtime.

02 Architecture at a glance

A vertically-integrated stack, five layers deep

The platform is organized as five layers. The workflow a user touches sits on top; underneath it, governance makes every action traceable, the model layer supplies the intelligence, the data & compute engine does the heavy lifting, and the infrastructure layer places it all on the hardware you already run.

See the companion Platform Architecture Diagram for the full layered view with per-layer capabilities.

Application

Perception Workflow

The end-to-end lifecycle, exposed identically through Web Studio, Python SDK, CLI and REST.

Governance

Lineage & Trust

RBAC, audit, reproducible splits, immutability, tenant isolation, compliance.

Model

Vision Models

Detection, classification, pose, segmentation, plus bring-your-own-model.

Data & Compute

Perception Data Engine

Content-addressed versioning, resource governor, sandboxed training, serving.

Infrastructure

Heterogeneous Compute

Edge, on-prem, cloud and air-gap targets on GPU, MIG or CPU-only hosts.

03 The perception lifecycle

Six stages, one continuous loop

Every capability maps to one stage of a single closed loop. The same steps run whether driven by a point-and-click studio user or a CI pipeline calling the SDK — because both hit the same API.

① Capture — from physical sensors to managed data

Data enters from live cameras (RTSP/RTMP/HLS), recorded video and image files, or other modalities such as LiDAR and RADAR captures. Bytes are streamed into object storage, hashed (SHA-256) and stored exactly once — duplicates across a workspace collapse to a single blob.

② Annotate — label with AI assistance

A rebranded annotation studio handles boxes, masks, keypoints and classes, with optional AI pre-labeling (e.g. SAM / GroundingDINO) to bootstrap labels for a human to correct. Already-labeled data imports directly via COCO or YOLO.

③ Version — git for datasets

The working set is frozen into a numbered, immutable version with a content fingerprint and a deterministic, leakage-safe train/val/test split. Versioning is metadata-only and copy-on-write — a new version that adds 300 images to an existing 1,200 stores only the 300 new blobs.
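
The storage arithmetic in that example can be illustrated with a small in-memory sketch. The `BlobStore` class and `commit` function are hypothetical names, not the platform's API; the sketch only assumes that blobs are keyed by SHA-256 and that a version is a frozen mapping of filename to hash.

```python
import hashlib


class BlobStore:
    """Hypothetical in-memory content-addressed store (illustration only)."""

    def __init__(self):
        self.blobs = {}  # sha256 hex digest -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical bytes map to the same key, so they are stored once.
        self.blobs.setdefault(digest, data)
        return digest


def commit(store: BlobStore, files: dict[str, bytes]) -> dict[str, str]:
    """Freeze a working set into a manifest of filename -> sha256."""
    return {name: store.put(data) for name, data in sorted(files.items())}


store = BlobStore()
working = {f"img_{i:04d}.jpg": f"pixels-{i}".encode() for i in range(1200)}
v1 = commit(store, working)
blobs_after_v1 = len(store.blobs)

# Add 300 new images; the 1,200 existing ones are referenced, not copied.
working.update(
    {f"img_{i:04d}.jpg": f"pixels-{i}".encode() for i in range(1200, 1500)}
)
v2 = commit(store, working)
new_blobs = len(store.blobs) - blobs_after_v1

print(len(v1), len(v2), new_blobs)  # 1200 1500 300
```

The second manifest lists all 1,500 files, yet the store grows by only 300 blobs, which is the sense in which a version is cheap metadata.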

④ Train — catalog, fine-tune, or bring your own

Pick a built-in architecture, a fine-tuning recipe, or supply your own model code. The job is admitted by the resource governor when a slot is free; untrusted custom code runs sandboxed. Metrics stream live to the studio and the SDK.

⑤ Optimize — evaluate, select, export

Compare runs on per-class metrics, PR curves and confusion matrices; promote the best candidate; export to ONNX (portable) or a TensorRT engine (fastest on NVIDIA), packaged for the inference runtime.

⑥ Deploy & Infer — anywhere

The exported model drops into the YAML-driven inference runtime and serves live streams on edge devices, on-prem servers or cloud GPUs. Detections flow out as overlaid video, files, or structured Kafka messages.

The loop closes. Low-confidence detections observed in the field are surfaced back into annotation as the next batch to label — an active-learning flywheel that compounds accuracy with every deployment cycle.
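
One way such a selection step could work is sketched below. The `Detection` record, the confidence band and the labeling budget are all illustrative assumptions; the source does not specify how low-confidence detections are chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    frame_id: str
    label: str
    confidence: float


def select_for_review(detections, low=0.3, high=0.6, budget=2):
    """Pick frames whose least-confident detection falls in [low, high).

    The band and the budget are illustrative assumptions, not platform defaults.
    """
    worst = {}
    for det in detections:
        if low <= det.confidence < high:
            prev = worst.get(det.frame_id)
            if prev is None or det.confidence < prev:
                worst[det.frame_id] = det.confidence
    # Most uncertain frames first, capped at the labeling budget.
    ranked = sorted(worst, key=lambda frame: (worst[frame], frame))
    return ranked[:budget]


field = [
    Detection("f1", "car", 0.95),
    Detection("f2", "person", 0.41),
    Detection("f3", "car", 0.35),
    Detection("f3", "truck", 0.58),
    Detection("f4", "person", 0.55),
    Detection("f5", "car", 0.10),
]
queue = select_for_review(field)
print(queue)  # ['f3', 'f2']
```

Detections below the band are treated here as noise rather than as labeling candidates; whether that is the right policy depends on the deployment.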

04 Layer-by-layer

What lives in each layer

Application Layer — Perception Workflow

The platform is API-first: one versioned public API is the product, and the Web Studio, Python SDK and CLI are co-equal clients of it. Everything the UI can do is reachable from code.

  • Web Studio — point-and-click for labelers, reviewers and managers.
  • Python SDK — script the whole loop: create → upload → version → train → stream metrics → export → deploy.
  • CLI — a thin wrapper over the SDK for CI/CD and ops.
  • REST API — versioned (/v1), OpenAPI-specified, with async-job semantics and webhooks.

Governance Layer — Lineage & Trust

  • Data & model lineage — a dual foreign key ties every model to its dataset version and run.
  • Reproducible splits & immutability — committed versions never change; splits are deterministic.
  • RBAC, SSO/MFA, audit logging — adopted via Keycloak; who-changed-what is recorded.
  • Tenant isolation & quotas — strict per-workspace separation, storage quotas, GDPR purge.
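
The dual foreign key can be pictured with a minimal SQLite sketch. Table and column names beyond `dataset_version` are assumptions made for illustration; the production schema is not given in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE dataset_versions (
        id INTEGER PRIMARY KEY, number INTEGER, fingerprint TEXT
    );
    CREATE TABLE runs (
        id INTEGER PRIMARY KEY, source_type TEXT, source_ref TEXT
    );
    -- The dual foreign key: each model version points at both the
    -- dataset version and the run that produced it.
    CREATE TABLE model_versions (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dataset_version_id INTEGER NOT NULL REFERENCES dataset_versions(id),
        run_id INTEGER NOT NULL REFERENCES runs(id)
    );
    """
)
conn.execute("INSERT INTO dataset_versions VALUES (1, 3, 'abc123')")
conn.execute("INSERT INTO runs VALUES (1, 'script', 'train.py@deadbeef')")
conn.execute("INSERT INTO model_versions VALUES (1, 'detector', 1, 1)")

row = conn.execute(
    """
    SELECT m.name, d.number, d.fingerprint, r.source_type, r.source_ref
    FROM model_versions m
    JOIN dataset_versions d ON d.id = m.dataset_version_id
    JOIN runs r ON r.id = m.run_id
    WHERE m.id = ?
    """,
    (1,),
).fetchone()
print(row)  # ('detector', 3, 'abc123', 'script', 'train.py@deadbeef')
```

Because both columns are `NOT NULL` foreign keys, a model version cannot be recorded without its data and code provenance.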

Model Layer — Vision Models

  • Built-in architectures — detection (YOLO, RT-DETR), classification, pose, instance segmentation.
  • NVIDIA TAO — the native train → ONNX → TensorRT → DeepStream path for CV.
  • Bring-your-own-model — three tiers: catalog, script-mode, and full custom container.
  • Plugin contracts — register a model like a DeepStream plugin; the UI builds its config form automatically.

Perception Data Engine — Data & Compute Core

  • Content-addressed store — bytes stored once by hash, deduplicated per workspace.
  • Git-for-datasets — immutable versions, diffs, tags, lineage, COCO/YOLO import-export.
  • Resource governor & job queue — admits work against live hardware capacity.
  • Serving runtimes — Triton for hosted endpoints, DeepStream for streaming inference.

Infrastructure Layer — Heterogeneous Compute

  • Targets — Edge (Jetson/IGX), on-prem servers, cloud (AWS/Azure/GCP), and air-gapped sites.
  • Compute — GPU / MIG slices, multi-GPU, or CPU-only — the platform detects and adapts.
  • Storage — cloud S3 or self-hosted SeaweedFS; PostgreSQL for all metadata.

05 Data versioning engine · Built in-house

"Git for datasets" — the foundation everything sits on

This is the bottom of the stack and the part no off-the-shelf tool gives you the right way. It is an independent, hexagonal module (ports & adapters): a pure core that runs standalone on a laptop (SQLite + local disk), embedded in the platform, or as a microservice — the same code in every mode.

Content-addressed, copy-on-write storage

File bytes are stored once by their sha256 and deduplicated within a workspace. A dataset references content; a version freezes those references. Nothing copies bytes, so versions are cheap metadata.

| Table | Role |
|---|---|
| datasets | Named, workspace-owned containers; soft-delete; unique slug per workspace. |
| files | Content records: one per unique (workspace, sha256). |
| dataset_files | The mutable working area: (dataset, filename) → file. |
| dataset_versions | Immutable snapshots: number, fingerprint, split, parent_id lineage. |
| version_files | The frozen manifest: (filename, sha256, role, split) per version. |
| version_tags | Stable mutable names (production, golden) → one version. |
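
As a rough illustration, the six tables could be declared in SQLite as follows. Only the table names and the listed fields come from the table above; the column types, key choices and the `intern_file` helper are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE datasets (
    id INTEGER PRIMARY KEY,
    workspace TEXT NOT NULL,
    slug TEXT NOT NULL,
    deleted_at TEXT,                       -- soft-delete marker
    UNIQUE (workspace, slug)
);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    workspace TEXT NOT NULL,
    sha256 TEXT NOT NULL,
    UNIQUE (workspace, sha256)             -- one record per unique content
);
CREATE TABLE dataset_files (               -- mutable working area
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    filename TEXT NOT NULL,
    file_id INTEGER NOT NULL REFERENCES files(id),
    PRIMARY KEY (dataset_id, filename)
);
CREATE TABLE dataset_versions (            -- immutable snapshots
    id INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    number INTEGER NOT NULL,
    fingerprint TEXT NOT NULL,
    parent_id INTEGER REFERENCES dataset_versions(id),
    UNIQUE (dataset_id, number)
);
CREATE TABLE version_files (               -- frozen manifest
    version_id INTEGER NOT NULL REFERENCES dataset_versions(id),
    filename TEXT NOT NULL,
    sha256 TEXT NOT NULL,
    role TEXT NOT NULL,
    split TEXT NOT NULL,
    PRIMARY KEY (version_id, filename)
);
CREATE TABLE version_tags (                -- mutable name -> one version
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    name TEXT NOT NULL,
    version_id INTEGER NOT NULL REFERENCES dataset_versions(id),
    PRIMARY KEY (dataset_id, name)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def intern_file(workspace: str, sha256: str) -> int:
    """Return the id of the content record, creating it only if missing."""
    conn.execute(
        "INSERT OR IGNORE INTO files (workspace, sha256) VALUES (?, ?)",
        (workspace, sha256),
    )
    return conn.execute(
        "SELECT id FROM files WHERE workspace = ? AND sha256 = ?",
        (workspace, sha256),
    ).fetchone()[0]


first = intern_file("acme", "ab" * 32)
again = intern_file("acme", "ab" * 32)    # same workspace, same bytes
other = intern_file("globex", "ab" * 32)  # same bytes, different workspace
```

The unique constraint on `(workspace, sha256)` is what makes deduplication per-workspace: the same bytes uploaded by two tenants yield two separate content records.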

Reproducible by construction

  • Fingerprint: manifest_hash = sha256 over the sorted manifest rows, computed in Python (codepoint sort, canonical JSON) so it is identical on SQLite and Postgres.
  • Leakage-safe split — each file's bucket is decided by hashing its content plus a seed, so duplicate bytes can never straddle train and test, and the split survives renames.
  • Lineage — every commit records its parent_id; the chain walks back to origin.
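
The fingerprint and split rules above can be sketched in a few lines of standard-library Python. The row layout, the exact serialization settings, the `seed:hash` input format and the 10% val/test fractions are assumptions; the source states only that rows are sorted by codepoint, serialized as canonical JSON, and that the split bucket is derived from content plus a seed.

```python
import hashlib
import json


def manifest_hash(rows):
    """sha256 over manifest rows sorted by codepoint, serialized canonically."""
    # Python compares str values by code point, so this order is portable.
    ordered = sorted(rows, key=lambda r: (r["filename"], r["sha256"]))
    canonical = json.dumps(
        ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assign_split(sha256_hex, seed, val=0.1, test=0.1):
    """Bucket a file by hashing its content hash together with a seed."""
    digest = hashlib.sha256(f"{seed}:{sha256_hex}".encode("utf-8")).digest()
    # Map the first 8 bytes to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test:
        return "test"
    if u < test + val:
        return "val"
    return "train"


content = hashlib.sha256(b"same pixels").hexdigest()
rows = [
    {"filename": "b.jpg", "sha256": content, "role": "image",
     "split": assign_split(content, seed=42)},
    {"filename": "a.jpg", "sha256": content, "role": "image",
     "split": assign_split(content, seed=42)},
]

# Two filenames with identical bytes always land in the same bucket.
same_split = rows[0]["split"] == rows[1]["split"]
# Row order does not affect the fingerprint.
order_independent = manifest_hash(rows) == manifest_hash(list(reversed(rows)))
print(same_split, order_independent)  # True True
```

Because the bucket depends only on the content hash and the seed, renaming a file or re-importing it under a second name cannot move it to a different split.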

Safe deletion over immutable, deduplicated data

Because versions are immutable and blobs are shared, deletion removes a reference, never naively the bytes. Reference counting plus a grace-period garbage collector reclaims a blob only when zero versions point at it. A lineage guard protects any version a model was trained on, and GDPR erasure is an explicit, audited tombstone.
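
A simplified sketch of that deletion policy follows. The `BlobIndex` class and its method names are hypothetical, and time is passed in explicitly to keep the example deterministic; the real collector's bookkeeping is not described in this document.

```python
import time


class BlobIndex:
    """Hypothetical reference-counting index with a grace-period collector."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.refs = {}          # sha256 -> set of version ids referencing it
        self.orphaned_at = {}   # sha256 -> time its last reference was dropped
        self.protected = set()  # version ids a model was trained on

    def add_version(self, version_id, hashes):
        for h in hashes:
            self.refs.setdefault(h, set()).add(version_id)
            self.orphaned_at.pop(h, None)  # re-referenced blobs are live again

    def delete_version(self, version_id, now=None):
        if version_id in self.protected:
            raise PermissionError(f"version {version_id} is pinned by lineage")
        now = time.time() if now is None else now
        for h, owners in self.refs.items():
            owners.discard(version_id)  # drop the reference, never the bytes
            if not owners:
                self.orphaned_at.setdefault(h, now)

    def collect(self, now=None):
        """Reclaim blobs that have had zero references for the grace period."""
        now = time.time() if now is None else now
        doomed = [
            h for h, since in self.orphaned_at.items()
            if not self.refs[h] and now - since >= self.grace_seconds
        ]
        for h in doomed:
            del self.refs[h]
            del self.orphaned_at[h]
        return sorted(doomed)


index = BlobIndex(grace_seconds=60)
index.add_version("v1", ["aaa", "bbb"])
index.add_version("v2", ["bbb", "ccc"])
index.protected.add("v2")           # a model was trained on v2

index.delete_version("v1", now=0)
early = index.collect(now=30)       # still inside the grace period
late = index.collect(now=90)        # grace period elapsed

try:
    index.delete_version("v2", now=100)
    guarded = False
except PermissionError:
    guarded = True

print(early, late, guarded)  # [] ['aaa'] True
```

Blob `bbb` survives the deletion of v1 because v2 still references it, and v2 itself cannot be deleted while the lineage guard holds.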

06 Models & training

Built-in intelligence, plus bring-your-own-model

For the CV core, NVIDIA TAO Toolkit is the backbone: a clean train → export(.onnx) → TensorRT engine path that drops straight into the inference runtime. Alongside it, MMDetection / RT-DETR and Hugging Face cover Apache/BSD-licensed alternatives.

Three execution tiers for custom models

| Tier | What the user brings | Runs in | Trade-off |
|---|---|---|---|
| 0 · Catalog | A built-in architecture + hyperparameters | Curated images | Lowest effort, safest |
| 1 · Script mode | A training script + a deps list | Curated framework base images | BYO model, recommended default |
| 2 · BYO container | A full Docker image with their code | Their image, sandboxed | Max flexibility, max security surface |

The plugin framework — "define the rules, users add models"

Just as DeepStream lets you drop in a custom plugin by implementing a known interface, the platform defines a small set of extension-point contracts (build / train / evaluate / export). A user implements them to make their own model a first-class, trainable, reproducible citizen — discovered via standard Python entry points, configured from a JSON-Schema manifest that the UI renders into a form automatically.

Why build untrusted-code execution? Tiers 1–2 run customer code on shared GPUs. The runtime is sandboxed with gVisor + nvproxy (the approach Modal Labs uses for multi-tenant GPU sandboxing) — composable with MIG slicing and low-overhead for GPU-bound work — with Kata/Confidential-Containers as an optional VM-grade premium tier.

07 Inference & deployment runtime · Built in-house

From ONNX to live streams with one YAML file

The Vision SDK replaces dozens of low-level DeepStream config files, GStreamer elements and inference parameters with a single declarative YAML file. You provide an ONNX model and labels; the SDK generates every DeepStream config, links the pipeline, and routes multi-stream output automatically.
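
The shape of such a declarative file is not shown in this document, so the sketch below uses a plain Python dictionary as a stand-in for the parsed YAML. Every key name, the `classify_source` and `plan_pipeline` helpers, and the one-batch-slot-per-stream rule are assumptions made purely to illustrate how a single config could drive pipeline planning.

```python
from urllib.parse import urlparse

# Stand-in for the parsed YAML file; every key name here is hypothetical.
config = {
    "model": {
        "type": "detector",
        "onnx": "models/detector.onnx",
        "labels": "models/labels.txt",
    },
    "inputs": [
        "rtsp://camera-1.local/stream",
        "rtmp://ingest.local/live/cam2",
        "https://cdn.local/cam3/index.m3u8",
        "clips/sample.mp4",
    ],
    "outputs": [{"kind": "kafka", "topic": "detections"}],
}


def classify_source(uri: str) -> str:
    """Map an input URI to one of the supported source kinds."""
    parsed = urlparse(uri)
    if parsed.scheme in ("rtsp", "rtmp"):
        return parsed.scheme
    if parsed.scheme in ("http", "https") and parsed.path.endswith(".m3u8"):
        return "hls"
    if parsed.scheme in ("", "file"):
        return "file"
    raise ValueError(f"unsupported input: {uri}")


def plan_pipeline(cfg: dict) -> dict:
    """Validate the config and derive settings the user never writes by hand."""
    if not cfg["model"]["onnx"].endswith(".onnx"):
        raise ValueError("model must be an ONNX file")
    kinds = [classify_source(uri) for uri in cfg["inputs"]]
    # Assumed rule: one batch slot per stream.
    return {"batch_size": len(kinds), "source_kinds": kinds}


plan = plan_pipeline(config)
print(plan)
```

The point of the sketch is the division of labor: the user states the model and the streams, and derived values such as the batch size are computed rather than configured.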

Model types

Detector, classifier, pose (17 COCO joints), instance segmentation, and Triton-served variants — chainable (e.g. detect → track → classify).

Architectures

YOLO v5–v13 / v26 & YOLOX, DetectNet_v2, YOLO-pose, YOLO-seg, and any softmax classifier (ResNet, EfficientNet, ViT…).

Inputs

Local files, RTSP, RTMP and HLS streams — single or multi-stream, batched automatically.

Outputs

X11 display overlays, MP4/MKV files, RTSP/RTMP re-streaming, FPS metering, and structured per-frame Kafka detection messages.

Hardware-accelerated encode/decode (GPU by default, CPU fallback) means the same YAML runs on a Jetson at the edge and a multi-GPU server — only the device changes.

08 Reproducibility & lineage

Trace any model back to its exact data and code

Lineage is the spine that connects the two in-house engines. The data engine fingerprints every dataset version; the training layer records, for every run, which dataset version and which model code produced the artifact.

| Question | How the architecture answers it |
|---|---|
| What exact data trained this model? | A dual FK from the model version to its dataset_version: a single join. |
| Can I rebuild last month's dataset? | version.pull() reconstructs the exact files, labels and split byte-for-byte. |
| Did the data change between runs? | diff(a, b) reports added / removed / relabeled, with old and new hashes. |
| Which code produced the model? | The run records its source type and reference (catalog id, script + commit, or image digest). |
| Did test data leak into training? | Content-hash splitting makes leakage structurally impossible. |
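
The three categories that diff(a, b) reports can be illustrated over two in-memory manifests. The manifest shape used here (filename mapped to a pair of image hash and label hash) is an assumption, and files whose image bytes changed are left out because the source does not name a category for them.

```python
def diff(a: dict, b: dict) -> dict:
    """Compare two manifests mapping filename -> (image_sha256, label_sha256)."""
    added = sorted(b.keys() - a.keys())
    removed = sorted(a.keys() - b.keys())
    relabeled = {
        name: {"old": a[name][1], "new": b[name][1]}
        for name in sorted(a.keys() & b.keys())
        # Same image bytes, different label bytes.
        if a[name][0] == b[name][0] and a[name][1] != b[name][1]
    }
    return {"added": added, "removed": removed, "relabeled": relabeled}


# Short strings stand in for real sha256 digests.
v1 = {
    "a.jpg": ("img-a", "lbl-a1"),
    "b.jpg": ("img-b", "lbl-b1"),
    "c.jpg": ("img-c", "lbl-c1"),
}
v2 = {
    "a.jpg": ("img-a", "lbl-a2"),  # label edited
    "c.jpg": ("img-c", "lbl-c1"),  # unchanged
    "d.jpg": ("img-d", "lbl-d1"),  # new file
}

changes = diff(v1, v2)
print(changes)
# {'added': ['d.jpg'], 'removed': ['b.jpg'],
#  'relabeled': {'a.jpg': {'old': 'lbl-a1', 'new': 'lbl-a2'}}}
```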

09 Multi-tenancy & resource governance · Built in-house

Admit work against the hardware that is actually present

What a user is allowed to run depends on live capacity, not hardcoded limits — and this holds identically for hosted SaaS and a customer's on-prem box, where the hardware is unknown in advance. A resource governor probes the host and admits work against real capacity.

  • Probe — GPUs present? VRAM free? CPU cores, RAM, disk — continuously, gracefully handling a no-GPU host.
  • Request — every job carries a resource request (catalog models fill it in; plugins declare it).
  • Admit / queue / reject — fits now → run; could fit later → queue with a visible position; can never fit → reject with a clear reason.

| Detected hardware | Sharing strategy | Concurrency |
|---|---|---|
| No GPU (CPU-only) | CPU/RAM-bound; GPU jobs rejected | by free cores / RAM |
| One consumer GPU | whole-GPU per job; MPS for bursty small jobs | ~1 heavy job |
| Several GPUs | one job per GPU, or multi-GPU jobs | = # GPUs |
| Datacenter GPU (A100/H100) | optional MIG partitioning into isolated slices | up to # slices |
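
The admit / queue / reject decision can be sketched as a comparison of a job's request against total and currently free capacity. The `Resources` fields, the aggregate treatment of VRAM and the example host sizes are assumptions; real probing of GPUs, MPS or MIG slices is out of scope for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    RUN = "run"
    QUEUE = "queue"
    REJECT = "reject"


@dataclass(frozen=True)
class Resources:
    gpus: int
    vram_gb: float   # treated as an aggregate figure for simplicity
    cpu_cores: int
    ram_gb: float

    def covers(self, req: "Resources") -> bool:
        return (
            self.gpus >= req.gpus
            and self.vram_gb >= req.vram_gb
            and self.cpu_cores >= req.cpu_cores
            and self.ram_gb >= req.ram_gb
        )


def admit(request: Resources, total: Resources, free: Resources):
    """Return (decision, reason) for a job's resource request."""
    if not total.covers(request):
        return Decision.REJECT, "request exceeds the host's total capacity"
    if free.covers(request):
        return Decision.RUN, "fits in currently free capacity"
    return Decision.QUEUE, "fits the host but not the capacity free right now"


gpu_job = Resources(gpus=1, vram_gb=8, cpu_cores=4, ram_gb=16)

cpu_only = Resources(gpus=0, vram_gb=0, cpu_cores=16, ram_gb=64)
one_gpu = Resources(gpus=1, vram_gb=24, cpu_cores=16, ram_gb=64)
one_gpu_busy = Resources(gpus=0, vram_gb=2, cpu_cores=12, ram_gb=48)

on_cpu_host = admit(gpu_job, total=cpu_only, free=cpu_only)
on_busy_host = admit(gpu_job, total=one_gpu, free=one_gpu_busy)
on_idle_host = admit(gpu_job, total=one_gpu, free=one_gpu)

for decision, reason in (on_cpu_host, on_busy_host, on_idle_host):
    print(decision.value, "-", reason)
```

The three calls correspond to the first two rows of the table: a GPU job is rejected on a CPU-only host, and on a single-GPU host it either runs or waits behind the job already holding the device.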

Tenant isolation is structural: every operation resolves through a single workspace-scoped chokepoint, so cross-tenant access is impossible by construction rather than by per-call discipline.
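
The chokepoint idea can be shown with a toy in-memory store. `WorkspaceScope` is a hypothetical name, and a shared dictionary stands in for the database; the point is only that callers never supply a workspace id per call, so there is no argument to get wrong.

```python
class WorkspaceScope:
    """Hypothetical chokepoint: every read and write is bound to one workspace."""

    def __init__(self, table: dict, workspace_id: str):
        self._table = table                # shared: (workspace_id, key) -> value
        self._workspace_id = workspace_id  # fixed when the scope is created

    def put(self, key, value):
        self._table[(self._workspace_id, key)] = value

    def get(self, key):
        try:
            return self._table[(self._workspace_id, key)]
        except KeyError:
            raise KeyError(key) from None

    def keys(self):
        return sorted(k for ws, k in self._table if ws == self._workspace_id)


table = {}
acme = WorkspaceScope(table, "acme")
globex = WorkspaceScope(table, "globex")

acme.put("dataset-1", "traffic-cams")

try:
    globex.get("dataset-1")
    leaked = True
except KeyError:
    leaked = False

print(acme.keys(), globex.keys(), leaked)  # ['dataset-1'] [] False
```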

10 Deployment topologies

The same stack, placed where the data lives
Edge

Jetson / IGX devices

The inference runtime ships to the device; models exported as TensorRT engines for the target GPU. Streams processed locally, detections published upstream.

On-prem

Single-tenant appliance

The full platform as a self-contained bundle on the customer's LAN. The governor auto-sizes to the box; license-safe component swaps (SeaweedFS, Valkey).

Cloud

Multi-tenant SaaS

Hosted, with cloud object storage, GPU pools and MIG-aware scheduling; horizontal scale-out via Ray or Kubernetes when a single node is outgrown.

Air-gap

Disconnected sites

For regulated or offline environments — the same bundle with no outbound dependencies, models and data never leaving the perimeter.

11 Technology stack & licensing

Adopt the generic parts; build only the product

| Concern | Choice | Posture | Why |
|---|---|---|---|
| Public API / SDK / CLI | FastAPI · Python SDK · Typer | Build | One versioned API; clients generated from the spec. |
| Identity | Keycloak | Adopt | Apache-2.0; realms = tenants; SSO/MFA free. |
| Annotation | Label Studio CE | Adopt | Apache-2.0; rebrand & resell freely. |
| Data versioning | Custom on Postgres + object store | Build | SaaS needs row-level, API-first snapshots, not git-shaped tools. |
| Tracking + registry | MLflow (headless) | Adopt | Apache-2.0 REST backend; charts rendered in our UI. |
| Queue + sandbox | RQ/Dramatiq + Valkey · gVisor | Build/Adopt | Clean hard-cancel; safe untrusted-code execution. |
| Training (CV) | NVIDIA TAO · MMDet / RT-DETR | Adopt ⚠ | Native ONNX→TRT→DeepStream; verify NVIDIA EULA. |
| Serving | Triton · DeepStream (Vision SDK) | Adopt/Build | BSD-3 Triton; YAML runtime built in-house. |
| Object storage | Cloud S3 / SeaweedFS | Adopt | Avoid MinIO (AGPL) for the appliance. |

Licensing landmines avoided. ClearML server (SSPL) and Ultralytics YOLO (AGPL) can force source disclosure for a resold product — replaced by RQ/MLflow and MMDetection/RT-DETR respectively. Redis → Valkey, MinIO → SeaweedFS, Grafana kept internal-only. Engineering guidance, not legal advice — counsel should confirm SSPL/AGPL and NVIDIA EULA terms before commercial release.

12 Delivery roadmap

Walking skeleton first, then layer outward

| Phase | Delivers | Milestone |
|---|---|---|
| 1 · Data engine | Content-addressed store, immutable versions, diff/pull, lineage, SDK + CLI. | Import a folder → commit v1 → pull it back byte-for-byte. |
| 2 · Label & interchange | Rebranded annotation, COCO/YOLO import-export, tags, stats. | Annotate & freeze a labeled dataset via UI and API. |
| 3 · Train & track | Governor, catalog + script-mode training, MLflow, live metrics, registry. | One-call .train() with live metrics, model tied to dataset version. |
| 4 · BYO & serve | gVisor sandbox + BYO-container, ONNX/TRT export, Triton + DeepStream serving, RBAC, audit. | Multi-tenant beta with full bring-your-own-model. |
| 5 · Appliance & scale | On-prem bundle, air-gap, license-safe swaps; Ray/K8s scale-out. | Customer runs the rebranded appliance on their LAN / edge. |