The operating layer for on-device AI

On-device AI
that scales.

One API for every backend. Models that ship over the air. Runs anywhere with a chip — and falls back to the cloud when it can't.

Get an API key · View on GitHub

Swift · run on-device

iOS · macOS · watchOS · tvOS · visionOS

let model = try XybridModelLoader.fromRegistry(modelId: "smollm2-360m").load()

let result = try model.run(envelope: Envelope.text(text: "Explain quantum computing"))

print(result.text)

● 112 tok/s · on-device

Text generation · smollm2-360m

Run a small language model on-device with cloud fallback.

Apache-2.0 · v0.1.0 · day-one SOTA

Built for leading open-weight models

−80%

inference cost vs. cloud-only

14ms

p50 latency on-device (360m)

Day 0

support for new SOTA models

first-class SDK languages

How it routes

The router picks where each call runs.

Per-request decision based on model size, available memory, battery, and your latency budget. On-device by default; falls back to the cloud when the device can't serve the request.
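To make the per-request decision concrete, here is a minimal sketch of how such a router could weigh those four inputs. Every name in it (`Route`, `DeviceState`, `RequestProfile`, `pickRoute`) and both thresholds are assumptions made for illustration; none of this is taken from the Xybrid SDK.

```swift
// Hypothetical types for illustration only; not the Xybrid SDK's API.
enum Route {
    case onDevice, hybrid, cloud
}

struct DeviceState {
    var freeMemoryMB: Int
    var batteryLevel: Double   // 0.0 (empty) to 1.0 (full)
    var isCharging: Bool
}

struct RequestProfile {
    var modelSizeMB: Int
    var estimatedOnDeviceLatencyMs: Int
    var latencyBudgetMs: Int
    var canSplit: Bool         // true if part of the pipeline can stay local
}

func pickRoute(for request: RequestProfile, on device: DeviceState) -> Route {
    // Assumed thresholds; a real router would tune or learn these.
    let memoryHeadroom = 1.25  // keep 25% spare memory beyond the model weights
    let minimumBattery = 0.15  // avoid heavy local work below 15% battery

    let fitsInMemory = Double(request.modelSizeMB) * memoryHeadroom <= Double(device.freeMemoryMB)
    let hasPower = device.isCharging || device.batteryLevel >= minimumBattery
    let meetsBudget = request.estimatedOnDeviceLatencyMs <= request.latencyBudgetMs

    // On-device is the default whenever every constraint is satisfied.
    if fitsInMemory && hasPower && meetsBudget { return .onDevice }
    // Otherwise keep the cheap stages local if the pipeline allows it.
    if request.canSplit && hasPower { return .hybrid }
    // Last resort: send the whole request to the cloud.
    return .cloud
}

let phone = DeviceState(freeMemoryMB: 2048, batteryLevel: 0.8, isCharging: false)
let smallModel = RequestProfile(modelSizeMB: 400, estimatedOnDeviceLatencyMs: 14,
                                latencyBudgetMs: 100, canSplit: false)
print(pickRoute(for: smallModel, on: phone))
```

The ordering of the checks encodes the stated policy: on-device first, cloud only when nothing local is viable.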

[Diagram: chat() (text in), transcribe() (audio in), and embed() (vectors in) feed route(), which dispatches to On-device (default), Hybrid (mixed), or Cloud (fallback).]

On-device (default): Phone, laptop, console. Zero round-trip, zero spend.

Hybrid (mixed): Embeddings on-device; long-context generation in the cloud.

Cloud (fallback): When the device can't serve the request. Same API, same envelope.

No compromise

Everything you get from cloud. None of the bills.

One runtime that ships natively to every device, with cloud as a soft fallback — billed only when the device truly couldn't handle it.

Capability	Cloud only	On-prem
Runs offline
Zero per-token cost
Same SDK across platforms
Telemetry + cost analytics
OTA model updates
Vendor lock-in	high	low
Falls back when device can't

Platform

Integrate once. Ship forever.

Six primitives that make on-device AI shippable. Use the runtime alone, or pair it with the platform to operate models at fleet scale.

Every backend, one API

Bring your own SDK target — iOS, Android, Flutter, Unity, Linux/Edge. The same envelope runs everywhere.

// runtime

Seamless device ↔ cloud

Route per request based on model size, battery, and latency budget. Fall back to cloud automatically when needed.

// router

OTA model updates

Ship new models without touching code. Roll out as a canary, by region, or fleet-wide. Roll back in one click.
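One common way to implement canary, region-scoped, and fleet-wide rollouts is deterministic bucketing on a device identifier. The sketch below assumes that approach; `RolloutScope`, `Device`, `stableHash`, and `shouldReceive` are hypothetical names, and the page does not say how the platform actually selects devices.

```swift
// Hypothetical rollout model for illustration; not the platform's actual API.
enum RolloutScope {
    case canary(percent: Int)
    case regions(Set<String>)
    case fleetWide
}

struct Device {
    let id: String
    let region: String
}

// FNV-1a (64-bit). Swift's built-in Hasher is seeded per process, so a
// hand-rolled hash is used to keep buckets stable across launches.
func stableHash(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3
    }
    return hash
}

func shouldReceive(modelVersion: String, device: Device, scope: RolloutScope) -> Bool {
    switch scope {
    case .fleetWide:
        return true
    case .regions(let allowed):
        return allowed.contains(device.region)
    case .canary(let percent):
        // Map each (device, version) pair to a bucket in 0..<100.
        // Widening the percentage only ever adds devices to the canary.
        let bucket = stableHash(device.id + ":" + modelVersion) % 100
        return bucket < UInt64(max(0, min(percent, 100)))
    }
}

let device = Device(id: "device-001", region: "eu-west")
print(shouldReceive(modelVersion: "smollm2-360m@2", device: device, scope: .canary(percent: 10)))
```

Including the model version in the hash input means a device that lands in one canary is not automatically in the next one.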

// platform

Evals across runtimes

Model-aware harness with prompt libraries. Compare quality across chips, OS versions, and quantization.

// platform

Per-model optimization

Same intent, tuned per backend. The same prompt runs cheaper on every chip — automatically.

// platform

Private by default

Data stays on-device. Telemetry is opt-in and aggregated. SOC 2 in progress.

// runtime

One SDK. Every platform.

Write your AI pipeline once and deploy it natively across mobile, desktop, and game engines — with hardware acceleration on every target.

Console

Operate fleets without leaving a tab.

One UI for telemetry, devices, models, registry, keys, and settings — built for operators running on-device AI at fleet scale.

console.xybrid.dev / telemetry

Telemetry

Last 24h · 1,284 traces · 3 routes

p50 latency: 14 ms (on-device)

p95 latency: 92 ms (fallback)

Route mix: 78% on-device · 14% hybrid · 8% cloud

Cost saved: $1,847 vs. cloud-only

Throughput req/min

Trace · req_8f4a · 96 ms total · success

request → guardrails.in → router.pick → embed → on-device.run → decode → guardrails.out → telemetry.flush
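For readers unfamiliar with the p50 and p95 figures in the telemetry view, the following sketch shows one standard way to derive them from raw trace latencies, the nearest-rank method. The function name and the sample values are made up for illustration; the console's actual aggregation method is not documented on this page.

```swift
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Returns nil for empty input
// or an out-of-range p.
func percentile(_ p: Double, of samples: [Double]) -> Double? {
    guard !samples.isEmpty, p > 0, p <= 100 else { return nil }
    let sorted = samples.sorted()
    let rank = Int((p / 100 * Double(sorted.count)).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

// Hypothetical per-trace latencies in milliseconds.
let latenciesMs: [Double] = [12, 13, 14, 14, 15, 16, 18, 40, 88, 95]
if let p50 = percentile(50, of: latenciesMs),
   let p95 = percentile(95, of: latenciesMs) {
    print("p50: \(p50) ms, p95: \(p95) ms")
}
```

A wide gap between p50 and p95, as in the view above, usually means a minority of requests took a slower path such as the cloud fallback.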

Open the console ↗

Open source

The runtime is open source.

Read the code. Run it locally. Fork the harness. The platform is the optional layer; the runtime is yours forever.

Read the code · Quickstart ↗

xybrid-ai/xybrid

Apache-2.0 no copyleft

Questions

Common questions.

What runs on-device vs. in the cloud?

Anything that fits in your device's memory budget runs locally. The router decides per request, based on model size, available memory, battery state, and your latency budget. If a request can't be served on-device, it falls back to a hosted endpoint — same response shape, same SDK call.
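The "same response shape, same SDK call" property can be pictured as two backends behind one protocol, with a wrapper that retries on the hosted side when the local one fails. This is a sketch under assumptions: `TextBackend`, `Completion`, `LocalBackend`, `HostedBackend`, and `FallbackBackend` are hypothetical names, and both backends are stubs rather than real inference or network calls.

```swift
// Hypothetical types for illustration; not the Xybrid SDK's API.
struct Completion {
    let text: String
    let servedBy: String
}

enum InferenceError: Error {
    case insufficientMemory
}

protocol TextBackend {
    func complete(_ prompt: String) throws -> Completion
}

// Stub for local inference: rejects prompts above an assumed size limit.
struct LocalBackend: TextBackend {
    let maxPromptCharacters: Int
    func complete(_ prompt: String) throws -> Completion {
        guard prompt.count <= maxPromptCharacters else {
            throw InferenceError.insufficientMemory
        }
        return Completion(text: "local: \(prompt)", servedBy: "on-device")
    }
}

// Stub for a hosted endpoint; a real one would make a network request.
struct HostedBackend: TextBackend {
    func complete(_ prompt: String) throws -> Completion {
        return Completion(text: "hosted: \(prompt)", servedBy: "cloud")
    }
}

// Tries the primary backend first and falls back on any thrown error.
// Callers see the same method and the same Completion type either way.
struct FallbackBackend: TextBackend {
    let primary: TextBackend
    let fallback: TextBackend
    func complete(_ prompt: String) throws -> Completion {
        do {
            return try primary.complete(prompt)
        } catch {
            return try fallback.complete(prompt)
        }
    }
}

let backend = FallbackBackend(primary: LocalBackend(maxPromptCharacters: 32),
                              fallback: HostedBackend())
if let reply = try? backend.complete("Explain quantum computing") {
    print(reply.servedBy)
}
```

Because the wrapper itself conforms to `TextBackend`, calling code does not need to know which side served the request.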

Which models are supported on day one?

What does an OTA model update look like?

Is this just a wrapper around llama.cpp?

How is pricing structured?

Do you handle telemetry / privacy?

v0.1.0 · free to prototype

Integrate once. Ship anywhere.

Get an API key in 30 seconds. The SDK is open source — you're never blocked on us.

Get an API key · View on GitHub

xybrid-ai/xybrid · Apache-2.0
