The operating layer for on-device AI

On-device AI that scales.

One API for every backend. Models that ship over the air. Runs anywhere with a chip — and falls back to the cloud when it can't.

Swift · run on-device
iOS · macOS · watchOS · tvOS · visionOS
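// Load smollm2-360m from the Xybrid model registry
let model = try XybridModelLoader.fromRegistry(modelId: "smollm2-360m").load()

// Wrap the prompt in a text envelope and run it
let result = try model.run(envelope: Envelope.text(text: "Explain quantum computing"))

print(result.text)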
● 112 tok/s · on-device
Text generation · smollm2-360m
Run a small language model on-device with cloud fallback.
Apache-2.0 · v0.1.0 · day-one SOTA

Built for leading open-weight models

−80% · inference cost vs. cloud-only
14 ms · p50 latency on-device (360m)
Day 0 · support for new SOTA models
7 · first-class SDK languages
How it routes

The router picks where each call runs.

Per-request decision based on model size, available memory, battery, and your latency budget. On-device by default; falls back to the cloud when the device can't serve the request.

Diagram: chat() (text in), transcribe() (audio in) and embed() (vectors) all pass through route(), which sends each call On-device (default), Hybrid (mixed) or to the Cloud (fallback).

On-device (default): Phone, laptop, console. No network round-trip, zero spend.

Hybrid (mixed): Embeddings on-device; long-context generation in the cloud.

Cloud (fallback): Used when the device can't serve the request. Same API, same envelope.
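A minimal Swift sketch of the per-request decision described above, assuming a simple rule order: the model must fit in free memory, the battery must not be low (unless charging), and on-device latency must fit the caller's budget. `DeviceState`, `RequestProfile`, `Route`, `pickRoute`, `routePipeline` and the 20% battery threshold are hypothetical illustrations, not the Xybrid router's API or policy. Hybrid is modeled as a pipeline whose stages are routed independently.

```swift
// Hypothetical sketch: none of these types are part of the Xybrid SDK.
enum Route: Equatable {
    case onDevice
    case cloud
}

struct DeviceState {
    let availableMemoryMB: Int
    let batteryLevel: Double   // 0.0 (empty) to 1.0 (full)
    let isCharging: Bool
}

struct RequestProfile {
    let modelMemoryMB: Int         // memory the model needs once loaded
    let estimatedOnDeviceMs: Int   // expected on-device latency for this call
    let latencyBudgetMs: Int       // the caller's latency budget
}

/// Prefers on-device; returns .cloud only when a rule says the device can't serve the call.
func pickRoute(_ request: RequestProfile,
               on device: DeviceState,
               minBattery: Double = 0.2) -> Route {
    // 1. The model must fit in the memory that is currently free.
    guard request.modelMemoryMB <= device.availableMemoryMB else { return .cloud }
    // 2. Avoid draining a low battery unless the device is charging.
    if device.batteryLevel < minBattery && !device.isCharging { return .cloud }
    // 3. On-device execution must fit the caller's latency budget.
    guard request.estimatedOnDeviceMs <= request.latencyBudgetMs else { return .cloud }
    return .onDevice
}

/// Hybrid: each stage of a pipeline is routed on its own.
func routePipeline(_ stages: [(name: String, profile: RequestProfile)],
                   on device: DeviceState) -> [(name: String, route: Route)] {
    stages.map { (name: $0.name, route: pickRoute($0.profile, on: device)) }
}

// Example: a small embedding model stays local; a large generation model does not fit.
let phone = DeviceState(availableMemoryMB: 1_500, batteryLevel: 0.65, isCharging: false)
let plan = routePipeline([
    (name: "embed", profile: RequestProfile(modelMemoryMB: 120, estimatedOnDeviceMs: 8, latencyBudgetMs: 50)),
    (name: "generate", profile: RequestProfile(modelMemoryMB: 4_000, estimatedOnDeviceMs: 900, latencyBudgetMs: 300)),
])
for step in plan {
    print("\(step.name): \(step.route)")  // "embed: onDevice", then "generate: cloud"
}
```

Keeping each rule as an early return makes the fallback reason easy to log per request, which is the kind of signal a route-mix view would aggregate.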

No compromise

Everything you get from cloud. None of the bills.

One runtime that ships natively to every device, with cloud as a soft fallback — billed only when the device truly couldn't handle it.
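To make "billed only when the device couldn't handle it" concrete, a back-of-envelope Swift sketch: cloud-only pays for every token, while fallback billing pays only for tokens that actually ran in the cloud. The `CallRecord` type, the 3-in-10 fallback rate and the $0.50 per million tokens price are made-up numbers for illustration, not Xybrid's pricing or measured data.

```swift
// Hypothetical illustration; the price, rates and record type are assumptions.
struct CallRecord {
    let tokens: Int
    let ranInCloud: Bool   // true only when the call fell back to the cloud
}

func cost(ofTokens tokens: Int, pricePerMillion: Double = 0.50) -> Double {
    Double(tokens) / 1_000_000 * pricePerMillion
}

/// Cloud-only: every token is billed.
func cloudOnlyCost(_ calls: [CallRecord]) -> Double {
    cost(ofTokens: calls.reduce(0) { $0 + $1.tokens })
}

/// Fallback billing: only tokens served in the cloud are billed.
func fallbackCost(_ calls: [CallRecord]) -> Double {
    cost(ofTokens: calls.filter(\.ranInCloud).reduce(0) { $0 + $1.tokens })
}

// 10 calls of 1,000 tokens each; 3 of them fell back to the cloud.
let calls = (0..<10).map { CallRecord(tokens: 1_000, ranInCloud: $0 < 3) }
let saved = 1 - fallbackCost(calls) / cloudOnlyCost(calls)
print("Saved \(Int((saved * 100).rounded()))% vs. cloud-only")  // "Saved 70% vs. cloud-only"
```

With a flat per-token price, the saving reduces to the share of tokens served on-device; the price itself cancels out.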

Capability · Cloud only · On-prem · Full on-device · Xybrid (recommended)
Runs offline
Zero per-token cost
Same SDK across platforms
Telemetry + cost analytics
OTA model updates
Vendor lock-in · high · low
Falls back when device can't
Platform

Integrate once. Ship forever.

Six primitives that make on-device AI shippable. Use the runtime alone, or pair it with the platform to operate models at fleet scale.

Every backend, one API

Pick your SDK target: iOS, Android, Flutter, Unity, Linux/Edge. The same envelope runs everywhere.

// runtime
Seamless device ↔ cloud

Route per request based on model size, battery, and latency budget. Fall back to cloud automatically when needed.

// router
OTA model updates

Ship new models without touching code. Canary, region-scoped or fleet-wide. Roll back in one click.

// platform
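One common way to implement canary and fleet-wide rollouts is deterministic bucketing: hash each device ID into a bucket from 0 to 99 and serve the new model only to devices below the rollout percentage. The Swift sketch below assumes that technique plus a region filter; `Rollout`, `bucket(for:)` and the model IDs are hypothetical, not the Xybrid platform's API. It uses 64-bit FNV-1a instead of Swift's `Hasher`, because `Hasher` is randomly seeded per process and would reshuffle devices on every launch.

```swift
// Hypothetical sketch of a staged model rollout; not the Xybrid API.
struct Rollout {
    let candidateModel: String
    let stableModel: String
    let percent: Int              // share of the fleet (0...100) that gets the candidate
    let regions: Set<String>?     // nil means every region

    func model(forDevice deviceID: String, region: String) -> String {
        // Region-scoped rollouts: devices outside the scope keep the stable model.
        if let regions = regions, !regions.contains(region) { return stableModel }
        return bucket(for: deviceID) < percent ? candidateModel : stableModel
    }
}

/// Stable 0...99 bucket from a 64-bit FNV-1a hash of the device ID.
func bucket(for deviceID: String) -> Int {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325        // FNV-1a offset basis
    for byte in deviceID.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3        // FNV-1a prime, wrapping multiply
    }
    return Int(hash % 100)
}

let canary = Rollout(candidateModel: "model-v2", stableModel: "model-v1",
                     percent: 5, regions: ["eu-west"])
print(canary.model(forDevice: "device-123", region: "us-east"))  // "model-v1": region out of scope
```

Because a device's bucket never changes, every device in a 5% canary is also in a later 25% rollout, and rolling back amounts to setting `percent` to 0.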
Evals across runtimes

Model-aware harness with prompt libraries. Compare quality across chips, OS versions, and quantization.

// platform
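A cross-runtime eval can be as simple as running one prompt library against each runtime configuration and comparing scores. The Swift sketch below assumes exact-match scoring and injects the model call as a closure so it can be stubbed; `RuntimeConfig`, `EvalCase`, `evaluate`, the chip names and the stub answers are hypothetical, not the harness's API or real results.

```swift
// Hypothetical sketch of a cross-runtime eval; not the Xybrid harness API.
struct RuntimeConfig: Hashable {
    let chip: String
    let osVersion: String
    let quantization: String
}

struct EvalCase {
    let prompt: String
    let expected: String
}

/// Fraction of cases whose output matches exactly, per runtime configuration.
func evaluate(_ cases: [EvalCase],
              on configs: [RuntimeConfig],
              run: (RuntimeConfig, String) -> String) -> [RuntimeConfig: Double] {
    var scores: [RuntimeConfig: Double] = [:]
    for config in configs {
        let passed = cases.filter { run(config, $0.prompt) == $0.expected }.count
        scores[config] = cases.isEmpty ? 0 : Double(passed) / Double(cases.count)
    }
    return scores
}

let cases = [EvalCase(prompt: "2+2", expected: "4"),
             EvalCase(prompt: "capital of France", expected: "Paris")]
let fp16 = RuntimeConfig(chip: "A17", osVersion: "iOS 17", quantization: "fp16")
let q4 = RuntimeConfig(chip: "A17", osVersion: "iOS 17", quantization: "q4")

// Stub runner standing in for a real model: the q4 build misses the second case.
let scores = evaluate(cases, on: [fp16, q4]) { config, prompt in
    if prompt == "2+2" { return "4" }
    return config.quantization == "q4" ? "Lyon" : "Paris"
}
print(scores[fp16]!, scores[q4]!)  // 1.0 0.5
```

Exact match is the bluntest possible scorer; a real harness would likely swap in task-specific metrics while keeping the same per-config comparison.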
Per-model optimization

Same intent, tuned per backend. The same prompt runs cheaper on every chip — automatically.

// platform
Private by default

Data stays on-device. Telemetry is opt-in and aggregated. SOC 2 in progress.

// runtime
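"Opt-in and aggregated" could look like this on the device: raw latencies stay local, and only a summary (count, p50, p95) is produced, and only when the user has opted in. The Swift sketch assumes nearest-rank percentiles; `TelemetrySummary`, `percentile` and `summarize` are hypothetical names, not Xybrid's telemetry API, and the sample latencies are made up.

```swift
// Hypothetical sketch of opt-in, aggregated telemetry; not the Xybrid API.
struct TelemetrySummary: Equatable {
    let count: Int
    let p50Ms: Double
    let p95Ms: Double
}

/// Nearest-rank percentile of an already sorted, non-empty array (p in 0...100).
func percentile(_ sorted: [Double], _ p: Double) -> Double {
    let rank = Int((p / 100 * Double(sorted.count)).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

/// Returns nil unless the user opted in; only the summary is ever returned.
func summarize(latenciesMs: [Double], optedIn: Bool) -> TelemetrySummary? {
    guard optedIn, !latenciesMs.isEmpty else { return nil }
    let sorted = latenciesMs.sorted()
    return TelemetrySummary(count: sorted.count,
                            p50Ms: percentile(sorted, 50),
                            p95Ms: percentile(sorted, 95))
}

let samples: [Double] = [12, 14, 13, 15, 90, 14, 11, 16, 14, 13]
print(summarize(latenciesMs: samples, optedIn: false) == nil)  // true
if let summary = summarize(latenciesMs: samples, optedIn: true) {
    print(summary.p50Ms, summary.p95Ms)  // 14.0 90.0
}
```

Note how one slow call dominates p95 while leaving p50 untouched, which is why dashboards usually report both.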

One SDK. Every platform.

Write your AI pipeline once and deploy it natively across mobile, desktop, and game engines — with hardware acceleration on every target.

Console

Operate fleets without leaving a tab.

One UI for telemetry, devices, models, registry, keys, and settings — built for operators running on-device AI at fleet scale.

console.xybrid.dev / telemetry

Telemetry

Last 24h · 1,284 traces · 3 routes
p50 latency: 14 ms (on-device)
p95 latency: 92 ms (fallback)
Route mix: 78% on-device · 14% hybrid · 8% cloud
Cost saved: $1,847 vs. cloud-only
Throughput chart (req/min)
Trace req_8f4a · 96 ms total · success
Spans: request, guardrails.in, router.pick, embed, on-device.run, decode, guardrails.out, telemetry.flush
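The trace lists the stages of one request. A minimal Swift sketch of how such a trace could be recorded, assuming the stages under the root `request` span run one after another: each stage is timed as a span, and the total is the sum of span durations. `Span`, `Trace` and `measure` are hypothetical names, not Xybrid's tracing API; only the stage names come from the trace above.

```swift
import Dispatch

// Hypothetical sketch of per-stage trace spans; not the Xybrid API.
struct Span {
    let name: String
    let durationMs: Double
}

final class Trace {
    let requestID: String
    private(set) var spans: [Span] = []

    init(requestID: String) { self.requestID = requestID }

    /// Runs one stage, records how long it took, and returns its result.
    func measure<T>(_ name: String, _ body: () throws -> T) rethrows -> T {
        let start = DispatchTime.now().uptimeNanoseconds
        defer {
            let elapsed = DispatchTime.now().uptimeNanoseconds - start
            spans.append(Span(name: name, durationMs: Double(elapsed) / 1_000_000))
        }
        return try body()
    }

    /// Stages are assumed sequential, so the total is the sum of span durations.
    var totalMs: Double { spans.reduce(0) { $0 + $1.durationMs } }
}

let trace = Trace(requestID: "req_example")
let stages = ["guardrails.in", "router.pick", "embed", "on-device.run",
              "decode", "guardrails.out", "telemetry.flush"]
for stage in stages {
    trace.measure(stage) { /* the stage's work would run here */ }
}
print(trace.spans.map(\.name) == stages)  // true
```

Recording the span in `defer` means a stage that throws is still timed before the error propagates.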
Open source

The runtime is open source.

Read the code. Run it locally. Fork the harness. The platform is the optional layer; the runtime is yours forever.

xybrid-ai/xybrid
Apache-2.0 · no copyleft
Questions

Common questions.

Anything that fits in your device's memory budget runs locally. The router decides per-request based on model size, available memory, battery state and your latency budget. If a request can't be served on-device, it falls back to a hosted endpoint — same response shape, same SDK call.
v0.1.0 · free to prototype

Integrate once. Ship anywhere.

Get an API key in 30 seconds. The SDK is open source — you're never blocked on us.

xybrid-ai/xybrid · Apache-2.0