VidStream: Real-Time Video Inference

Module: ats/vidstream
Status: Production-ready with ONNX inference
Last Updated: 2026-01-10


Overview

VidStream provides WebSocket-based real-time video object detection for QNTX. The browser sends camera frames over a WebSocket to the Go server, which passes them through CGO bindings to a Rust library that runs ONNX Runtime models and returns bounding-box detections.

Purpose

Enable real-time, browser-based video analysis in which each processed frame yields object detections from a YOLO model. VidStream is designed for low-latency frame processing and uses the browser's camera access, so no desktop permissions are required.


Architecture

High-Level Flow

┌──────────────┐  WebSocket   ┌─────────────┐    CGO     ┌────────────────┐
│   Browser    │─────────────▶│ Go Server   │───────────▶│ Rust ONNX      │
│  Camera API  │  JSON frames │ handlers.go │  vidstream │ VideoEngine    │
│  640x480     │              │             │            │ yolo11n.onnx   │
└──────────────┘              └─────────────┘            └────────────────┘
       │                             │                           │
       │                             ▼                           ▼
       │                      ┌──────────────┐           ┌──────────────┐
       └─────────────────────│ Bounding     │◀──────────│ Detections   │
         JSON detections      │ boxes drawn  │           │ [person, 89%]│
                              └──────────────┘           └──────────────┘

Detailed Structure

┌─────────────────────────────────────────────────────────────────┐
│                         Go Application                          │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                 vidstream package                        │   │
│  │  video_cgo.go (build tag: cgo && rustvideo)             │   │
│  │  video_nocgo.go (fallback stub)                         │   │
│  │  types.go (shared types)                                │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                              │ CGO
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Rust Library (libqntx_vidstream)            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐  │
│  │   ffi.rs     │  │  engine.rs   │  │     types.rs         │  │
│  │  C-ABI layer │──│  Processing  │──│  Detection, BBox,    │  │
│  │  Memory mgmt │  │  Pipeline    │  │  Config, Stats       │  │
│  └──────────────┘  └──────────────┘  └──────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Key Design Decisions

| Decision | Rationale |
|---|---|
| CGO over gRPC | Minimize latency for real-time processing |
| Excluded from workspace | CGO libraries have different build lifecycle |
| Pre-allocated buffers | Reduce GC pressure in hot path |
| Build tags | Allow compilation without Rust toolchain |
| Stub inference | Decouple FFI work from ML integration |
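
The "pre-allocated buffers" decision can be illustrated with a minimal Go sketch. The `frameBuf` type and its methods are hypothetical names for illustration, not the package's actual API; the point is only that one buffer sized for a full RGBA frame is reused across frames instead of allocating per frame.

```go
package main

// frameBuf is a hypothetical reusable buffer sized for one RGBA frame.
type frameBuf struct {
	data []byte
}

// newFrameBuf allocates capacity for width*height RGBA pixels once, up front.
func newFrameBuf(width, height int) *frameBuf {
	return &frameBuf{data: make([]byte, 0, width*height*4)}
}

// load copies src into the pre-allocated storage and returns the filled slice.
// It allocates only if src is larger than the capacity reserved at construction.
func (b *frameBuf) load(src []byte) []byte {
	b.data = append(b.data[:0], src...)
	return b.data
}
```

Because `load` truncates and refills the same backing array, the hot path produces no garbage as long as frames keep the expected dimensions.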

Current Implementation

Working Features ✓

| Component | Status | Details |
|---|---|---|
| ONNX inference | ✓ | YOLOv11n model (10MB), 80ms latency |
| WebSocket transport | ✓ | 10MB message limit for 4.4MB JSON frames |
| Browser camera access | ✓ | MediaDevices API, 640x480 @ 62 FPS |
| Frame throttling | ✓ | 5 FPS inference to prevent overload |
| Object detection | ✓ | Person, tie, cup, etc. (COCO classes) |
| Bounding box rendering | ✓ | Green boxes with confidence labels |
| Async engine init | ✓ | Non-blocking 2-5s model load |
| Detection stats | ✓ | FPS, latency, detection count |

Known Issues

| Issue | Impact | Workaround |
|---|---|---|
| JSON frame encoding | 4.4MB per frame (inefficient) | TODO: Switch to binary WebSocket frames (would reduce to 1.2MB) |
| WebSocket message limit | Required increasing from 1MB → 10MB | See server/client.go:36 TODO comment |
| Test timing issues | Test flakes due to async init | Skip slow tests in make test |
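
The gap between the JSON and binary payload sizes follows from simple arithmetic, sketched below in Go. The helper names are illustrative, and the sketch assumes a compact JSON array with no whitespace. A 640x480 RGBA frame is 640 × 480 × 4 = 1,228,800 raw bytes, which matches the 1.2MB binary estimate; as a JSON array each byte costs one to three digits plus a comma, which is where the multi-megabyte figure comes from.

```go
package main

import "strconv"

// rawFrameBytes returns the size of an uncompressed RGBA frame (4 bytes per pixel).
func rawFrameBytes(width, height int) int {
	return width * height * 4
}

// jsonArrayBytes returns the length of the compact JSON array encoding of
// pixels, e.g. "[255,128,64]": two brackets, one comma between each pair of
// values, and one to three decimal digits per value.
func jsonArrayBytes(pixels []byte) int {
	if len(pixels) == 0 {
		return 2 // "[]"
	}
	n := 2 + len(pixels) - 1
	for _, p := range pixels {
		n += len(strconv.Itoa(int(p)))
	}
	return n
}
```

For the worst case of an all-255 frame, the same formula gives 3 × 1,228,800 digits plus 1,228,799 commas plus 2 brackets, or roughly 4.9MB, so the observed 4.4MB sits within the expected range.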

File Reference

Rust (src/)

lib.rs          - Crate entry, re-exports public types
types.rs        - BoundingBox, Detection, ProcessingStats, Config
engine.rs       - VideoEngine with process_frame pipeline
ffi.rs          - C-compatible FFI functions and types

Go (vidstream/)

types.go        - Shared types (no build tags)
video_cgo.go    - CGO implementation (build tag: cgo && rustvideo)
video_nocgo.go  - Stub returning ErrNotAvailable (build tag: !cgo || !rustvideo)
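
The stub file's role can be sketched as follows. Only `ErrNotAvailable` and the build constraint come from the listing above; the `VideoEngine` placeholder, the `NewVideoEngine` constructor name, and the error message text are assumptions made for illustration.

```go
package main

// The real stub would begin with the constraint "!cgo || !rustvideo" as a
// go:build line placed before the package clause.

import "errors"

// ErrNotAvailable is returned by every stub entry point.
// The message text here is an assumption.
var ErrNotAvailable = errors.New("vidstream: built without cgo and rustvideo")

// VideoEngine is a placeholder so callers compile without the Rust library.
type VideoEngine struct{}

// NewVideoEngine is a hypothetical constructor name; in the stub it always fails.
func NewVideoEngine(modelPath string) (*VideoEngine, error) {
	return nil, ErrNotAvailable
}
```

Callers can then check for `ErrNotAvailable` and disable the feature at runtime instead of failing to compile when the Rust toolchain is absent.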

Headers (include/)

video_engine.h  - C API declarations for CGO

Build Instructions

Rust Library with ONNX

cd ats/vidstream

# Build with ONNX support (REQUIRED for inference)
cargo build --release --features onnx

# Library output: target/release/libqntx_vidstream.{so,dylib,dll}

Go Application

# Build with rustvideo tag (enables CGO)
CGO_ENABLED=1 go build -tags rustvideo ./cmd/qntx

# Runtime library path (macOS)
export DYLD_LIBRARY_PATH=$PWD/ats/vidstream/target/release:$DYLD_LIBRARY_PATH

# Runtime library path (Linux)
export LD_LIBRARY_PATH=$PWD/ats/vidstream/target/release:$LD_LIBRARY_PATH

Quick Start

# 1. Build Rust library
cd ats/vidstream && cargo build --release --features onnx && cd ../..

# 2. Build the Go binary with the rustvideo tag
CGO_ENABLED=1 go build -tags rustvideo ./cmd/qntx

# 3. Start QNTX server (macOS shown; on Linux set LD_LIBRARY_PATH instead)
DYLD_LIBRARY_PATH=$PWD/ats/vidstream/target/release ./qntx

# 4. Open browser to http://localhost:3030
# 5. Click VidStream button (⮀ icon)
# 6. Click "Initialize ONNX" → "Start Camera"
# 7. See detections rendered as green bounding boxes

Architecture Details

WebSocket Message Flow

Client → Server (vidstream_init):

{
  "type": "vidstream_init",
  "model_path": "ats/vidstream/models/yolo11n.onnx",
  "confidence_threshold": 0.5,
  "nms_threshold": 0.45
}

Server → Client (vidstream_init_success):

{
  "type": "vidstream_init_success",
  "width": 640,
  "height": 480,
  "ready": true
}
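
A server-side handler needs to read the `type` field before it knows which payload to decode. The Go sketch below shows one way to do that for `vidstream_init`; the struct and function names are hypothetical, and only the JSON field names are taken from the messages above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope captures only the discriminator shared by all messages.
type envelope struct {
	Type string `json:"type"`
}

// initRequest mirrors the fields of the vidstream_init message.
type initRequest struct {
	ModelPath           string  `json:"model_path"`
	ConfidenceThreshold float64 `json:"confidence_threshold"`
	NMSThreshold        float64 `json:"nms_threshold"`
}

// parseInit checks the message type first, then decodes the init payload.
func parseInit(raw []byte) (initRequest, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return initRequest{}, err
	}
	if env.Type != "vidstream_init" {
		return initRequest{}, fmt.Errorf("unexpected message type %q", env.Type)
	}
	var req initRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return initRequest{}, err
	}
	return req, nil
}
```

Decoding twice is the simplest approach for small control messages; for the large frame messages a handler would likely want to avoid parsing the payload more than once.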

Client → Server (vidstream_frame) - 5 FPS:

{
  "type": "vidstream_frame",
  "frame_data": [255, 128, 64, ...],  // RGBA bytes as JSON array (4.4MB)
  "width": 640,
  "height": 480,
  "format": "rgba8"
}
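
Before handing a frame to the engine, it is worth checking that the buffer length agrees with the declared dimensions. The following Go sketch assumes `rgba8` means four bytes per pixel, as the comment in the message suggests; `validateFrame` is a hypothetical helper, not a function from the codebase.

```go
package main

import "fmt"

// validateFrame checks that data holds exactly width*height RGBA pixels.
func validateFrame(data []byte, width, height int, format string) error {
	if format != "rgba8" {
		return fmt.Errorf("unsupported format %q", format)
	}
	if width <= 0 || height <= 0 {
		return fmt.Errorf("invalid dimensions %dx%d", width, height)
	}
	want := width * height * 4 // 4 bytes per pixel: R, G, B, A
	if len(data) != want {
		return fmt.Errorf("frame_data has %d bytes, want %d", len(data), want)
	}
	return nil
}
```

Rejecting malformed frames on the Go side keeps an out-of-bounds read from ever reaching the FFI boundary.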

Server → Client (vidstream_detections):

{
  "type": "vidstream_detections",
  "detections": [
    {
      "ClassID": 0,
      "Label": "person",
      "Confidence": 0.89,
      "BBox": { "X": 109, "Y": 46, "Width": 525, "Height": 433 }
    }
  ],
  "stats": {
    "inference_us": 79915,
    "total_us": 80605,
    "detections_final": 1
  }
}
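
The detections payload maps naturally onto Go structs. The sketch below is inferred from the example message only: the capitalized keys (`ClassID`, `BBox`, ...) are what `encoding/json` emits for untagged exported fields, while the numeric types (`float64` for box coordinates, `int64` for microsecond counters) are assumptions.

```go
package main

import "encoding/json"

// BBox uses float64 so both integer and fractional coordinates decode.
type BBox struct {
	X, Y, Width, Height float64
}

// Detection has no JSON tags, matching the capitalized keys in the message.
type Detection struct {
	ClassID    int
	Label      string
	Confidence float64
	BBox       BBox
}

// Stats carries per-frame timings in microseconds.
type Stats struct {
	InferenceUS     int64 `json:"inference_us"`
	TotalUS         int64 `json:"total_us"`
	DetectionsFinal int   `json:"detections_final"`
}

// DetectionsMessage is the full vidstream_detections payload.
type DetectionsMessage struct {
	Type       string      `json:"type"`
	Detections []Detection `json:"detections"`
	Stats      Stats       `json:"stats"`
}

// decodeDetections parses a vidstream_detections message.
func decodeDetections(raw []byte) (DetectionsMessage, error) {
	var msg DetectionsMessage
	err := json.Unmarshal(raw, &msg)
	return msg, err
}
```

In the example message, `total_us` of 80605 corresponds to about 80.6ms end to end, of which 79.9ms is inference.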

Performance Characteristics

| Metric | Value | Notes |
|---|---|---|
| Model size | 10MB | YOLOv11n (nano variant) |
| Engine init time | 2-5s | One-time cost on first load |
| Inference latency | 60-80ms | Per frame (CPU-only) |
| Frame rate | 5 FPS | Throttled to prevent WebSocket overload |
| Payload size | 4.4MB | JSON encoding (inefficient, see TODO) |
| WebSocket limit | 10MB | Allows ~2 frames buffered |

Future Optimizations

Binary WebSocket Frames (TODO):

GPU Inference (when needed):

Object Tracking (future):


Related Files

| File | Purpose |
|---|---|
| server/vidstream.go | WebSocket message handlers (init, frame) |
| server/client.go | WebSocket limits and logging |
| web/ts/vidstream-window.ts | Frontend camera, rendering, throttling |
| ats/vidstream/src/engine.rs | Rust ONNX inference pipeline |
| ats/vidstream/src/ffi.rs | CGO interface layer |

API Reference