OpenRelay is in Preview — the API is still evolving and may change without notice.

Provider nodes

Hardware requirements, NVIDIA GPU passthrough, node health checks, and the full lifecycle of your provider hardware.

This page covers what a provider box needs, how GPU passthrough is set up, how to confirm a node is healthy, and how nodes behave over their lifetime. To bring your first node online, start with Providers — onboarding is one command. Everything here is handled for you by that command; it's documented so you know what's happening and how to operate the box afterwards.

Hardware requirements

| Requirement | Detail |
| --- | --- |
| OS | Ubuntu 22.04 or 24.04 LTS, with root / sudo. |
| Network | Outbound HTTPS (443) only. The node dials out, so no inbound ports are required. |
| Tools | curl (preinstalled); the installer pulls jq, tar, unzip, and pciutils itself. |
| CPU virtualization | Intel VT-x / AMD SVM, required to run VMs (/dev/kvm). |
| IOMMU | Intel VT-d / AMD AMD-Vi, required for GPU passthrough nodes. |
| GPU | NVIDIA (or AMD) for GPU capacity. CPU-only nodes are also welcome. |
| RAM | 32 GB minimum, 64 GB+ recommended; each VM needs system RAM. |
| Disk | NVMe with headroom for base VM images. The health check warns below 50 GB free and fails below 20 GB. |
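
The disk thresholds in the table can be checked by hand before onboarding. The sketch below is illustrative only: it is not the code the built-in health check runs, the function names are made up for this example, and which filesystem the real check inspects is not stated here, so the path is left as an argument.

```bash
#!/usr/bin/env bash
# Preflight sketch (hypothetical helper names, not part of the installer).

# disk_verdict FREE_GB -> FAIL below 20 GB, WARN below 50 GB, otherwise PASS
disk_verdict() {
  local free_gb=$1
  if (( free_gb < 20 )); then
    echo FAIL
  elif (( free_gb < 50 )); then
    echo WARN
  else
    echo PASS
  fi
}

# free_disk_gb PATH -> whole GiB available on the filesystem that holds PATH
free_disk_gb() {
  # -P forces one line per filesystem; column 4 is available space in 1K blocks
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}
```

For example, `disk_verdict "$(free_disk_gb /)"` prints the verdict for the root filesystem.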

Two ways to serve GPUs

A GPU node runs customer VMs with direct GPU passthrough (QEMU + VFIO) by default. A container-GPU mode (gVisor + nvproxy) is also available — pass gpu=1 to the bootstrap. See Providers → Options.

NVIDIA GPU passthrough

The one-command install configures passthrough for you — you don't hand-edit GRUB or modprobe files. On a GPU box it:

  1. Detects your NVIDIA/AMD GPUs with lspci (works even with no driver installed).
  2. If they aren't passthrough-ready, enables IOMMU on the kernel command line, binds each GPU (and its IOMMU-group siblings) to vfio-pci, and installs a boot-time service (vectorlay-vfio-bind.service) so the binding survives reboots.
  3. Asks you to reboot and re-run the exact same command. The node is intentionally not enrolled until passthrough actually works — we never advertise a GPU that can't run a VM.
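
To see what the installer is looking at, you can list the GPUs and the kernel driver each one is bound to. This is a minimal sketch, not the installer's own detection logic. It assumes the usual `lspci -nnk` layout (device header line, then indented detail lines) and filters on PCI vendor IDs 10de (NVIDIA) and 1002 (AMD) with a display class code (03xx). The function name `gpu_drivers` is hypothetical.

```bash
#!/usr/bin/env bash
# gpu_drivers: reads `lspci -nnk` output on stdin and prints "<slot> <driver>"
# for each NVIDIA/AMD display-class device; "none" means no driver is bound.
gpu_drivers() {
  awk '
    function emit() {
      if (slot != "") print slot, (driver == "" ? "none" : driver)
    }
    /^[0-9a-fA-F]/ {            # device header lines start in column 0
      emit(); slot = ""; driver = ""
      if ($0 ~ /\[03[0-9a-f][0-9a-f]\]/ && $0 ~ /\[(10de|1002):[0-9a-f]+\]/)
        slot = $1
      next
    }
    /Kernel driver in use:/ { if (slot != "") driver = $NF }
    END { emit() }
  '
}
```

Run it as `lspci -nnk | gpu_drivers`. A passthrough-ready GPU should report `vfio-pci` as its driver.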

If, after a reboot, the GPUs still aren't bound, enable virtualization + IOMMU in your BIOS/UEFI, then reboot and re-run:

  • Intel — enable VT-x and VT-d.
  • AMD — enable SVM and IOMMU (set to Enabled, not Auto).
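
Before going into the firmware menus, it can help to check what the running kernel sees. The sketch below reads standard Linux interfaces (`/proc/cpuinfo` and `/sys/kernel/iommu_groups`). The function names are hypothetical and this is not what the installer runs. Note that a CPU flag only shows the processor supports virtualization; firmware can still have it switched off.

```bash
#!/usr/bin/env bash
# cpu_virt_flag [CPUINFO_FILE] -> "vmx" (Intel VT-x), "svm" (AMD SVM), or "none"
cpu_virt_flag() {
  local cpuinfo=${1:-/proc/cpuinfo}
  if grep -qw vmx "$cpuinfo"; then
    echo vmx
  elif grep -qw svm "$cpuinfo"; then
    echo svm
  else
    echo none
  fi
}

# iommu_group_count [SYSFS_DIR] -> number of IOMMU groups the kernel created.
# Zero means the IOMMU is not active (disabled in firmware or on the kernel command line).
iommu_group_count() {
  local dir=${1:-/sys/kernel/iommu_groups}
  find "$dir" -mindepth 1 -maxdepth 1 2>/dev/null | wc -l | tr -d ' '
}
```

If `iommu_group_count` prints 0 after a reboot, the BIOS/UEFI settings above are the likely cause. `[ -c /dev/kvm ]` is a quick test that CPU virtualization is usable.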

A passthrough GPU leaves the local console

Once a GPU is bound for passthrough it no longer drives a local display. Manage the box over SSH, serial, or IPMI, not a monitor. After rebooting, re-run the command promptly (within about an hour) so your provisioning token is still valid.

Multi-GPU nodes

The unit of allocation is the IOMMU group: GPUs that share a group are always passed to a VM together. The installer binds every NVIDIA/AMD GPU on the host to vfio-pci. For full-node (all-GPU) allocations, NVSwitches are passed through too so NVLink works — this is automatic.
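
Because allocation follows IOMMU groups, it is useful to see which PCI devices share a group on your host. The sketch below walks the standard sysfs layout (`/sys/kernel/iommu_groups/<N>/devices/`). The function name is hypothetical, and the output format is chosen for this example only.

```bash
#!/usr/bin/env bash
# iommu_groups [SYSFS_DIR] -> prints "group <N>: <pci-addr> <pci-addr> ..." per group
iommu_groups() {
  local root=${1:-/sys/kernel/iommu_groups} group dev members
  for group in "$root"/*/; do
    [ -d "${group}devices" ] || continue   # also skips the unexpanded glob when empty
    members=""
    for dev in "${group}devices"/*; do
      [ -e "$dev" ] || continue
      members+=" ${dev##*/}"               # keep only the PCI address
    done
    group=${group%/}
    echo "group ${group##*/}:${members}"
  done
}
```

Two GPUs whose addresses appear on the same line would be handed to a VM together.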

GPU runtime modes

  • Passthrough (QEMU + VFIO) — the default for GPU boxes with no host driver. The VM gets bare-metal GPU performance.
  • Container GPU (gVisor + nvproxy) — pass gpu=1 to the bootstrap. GPUs stay bound to the host NVIDIA driver and are shared via nvproxy; no IOMMU/VFIO needed.

Check node health

After install — or any time — run the built-in doctor:

```bash
sudo vectorlay-node-doctor
```

It prints PASS / FAIL per check and a final verdict:

  • Agents & tunnel: nebula, nomad, and node-agent running; auto-update timer enabled.
  • Virtualization: /dev/kvm present.
  • GPU passthrough: IOMMU active and per-IOMMU-group VFIO readiness, ending with N/M GPU unit(s) ready to rent.
  • Disk: enough free space for VM base images.

A green doctor means the control plane will advertise the node's GPU units.

  • If a unit shows WEDGED, a GPU fell off the bus (the D3cold reset bug) — power-cycle the host (a soft reboot may not clear it).
  • If a unit isn't bound to vfio-pci, run sudo systemctl restart vectorlay-vfio-bind or re-run the bootstrap.
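
If you want to run the doctor from a cron job or monitoring script, one option is to scan its output for the FAIL and WEDGED markers mentioned above. This sketch assumes those words appear literally in the output lines. The doctor's exact output format and exit code are not documented here, so treat the matching as an assumption. `doctor_summary` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# doctor_summary: reads doctor output on stdin, prints "fail=<n> wedged=<n>",
# and returns non-zero if any line contains FAIL or WEDGED.
doctor_summary() {
  local line fails=0 wedged=0
  while IFS= read -r line; do
    case $line in
      *WEDGED*) wedged=$((wedged + 1)) ;;
      *FAIL*)   fails=$((fails + 1)) ;;
    esac
  done
  echo "fail=${fails} wedged=${wedged}"
  (( fails == 0 && wedged == 0 ))
}
```

Usage: `sudo vectorlay-node-doctor 2>&1 | doctor_summary || echo "node needs attention"`.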

Node lifecycle

Two credentials (don't conflate them)

| | Provisioning token (vtk_…) | Durable identity |
| --- | --- | --- |
| What | One-time bootstrap key | Data-plane mTLS cert + durable node token |
| Lifetime | 1 hour, single-use | Lives as long as the node |
| On disk? | No: passed to the installer, used, discarded | Yes, under /etc/vectorlay/ |
| Used for | Enrollment only | Every gateway tunnel + node-agent self-update |

Everything after enrollment runs on the durable identity — the provisioning token expiring an hour later is by design and harmless.

Reboots & power loss

systemd brings nebula, nomad, and node-agent back at boot; they read their credentials from disk and rejoin automatically. On GPU nodes the VFIO bind service re-binds the GPUs before Nomad starts. No action needed.
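
To confirm that the agents did come back after a reboot, you can ask systemd directly. A minimal sketch, assuming the unit names match the service names used elsewhere on this page (nebula, nomad, node-agent). `agents_down` and the `SYSTEMCTL` override are hypothetical and exist only so the function can be exercised without systemd.

```bash
#!/usr/bin/env bash
SYSTEMCTL=${SYSTEMCTL:-systemctl}   # overridable for testing

# agents_down -> prints each agent that is not active; returns 1 if any is down
agents_down() {
  local svc down=0
  for svc in nebula nomad node-agent; do
    if ! "$SYSTEMCTL" is-active --quiet "$svc"; then
      echo "$svc"
      down=1
    fi
  done
  return "$down"
}
```

An empty result with exit status 0 means all three agents are running.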

Automatic updates

The node-agent keeps itself current via a jittered systemd timer (roughly hourly, plus up to 20 minutes of random delay), authenticated with the durable node token. Updates drain in-flight HTTP/SSH sessions first, install atomically, and roll back on failure, so live sessions survive.

  • Opt out per node: enroll with NODE_AGENT_AUTOUPDATE=0, or sudo systemctl disable --now node-agent-update.timer.
  • Force a check now: sudo systemctl start node-agent-update.service (logs: journalctl -u node-agent-update).
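
The jitter exists so that a fleet of nodes does not check for updates at the same instant. The arithmetic can be sketched as below. This illustrates the schedule described above and is not the timer's actual configuration; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# next_update_delay -> seconds until the next check:
# one hour (3600 s) plus a random 0..1200 s (up to 20 minutes) of jitter
next_update_delay() {
  local base=3600 max_jitter=1200
  echo $(( base + RANDOM % (max_jitter + 1) ))
}
```

Every call returns a value between 3600 and 4800 seconds, so checks from different nodes spread across a 20-minute window.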

Recycle / re-enroll

Re-running the onboarding one-liner on an existing box is safe and idempotent: the node keeps its identity (same node ID and overlay IP), its token is rotated, no quota is consumed, and nothing referencing the node (clusters / VMs) breaks. Use it to repair a node or roll it onto new config.

Take a node offline

To temporarily remove a node from the network, stop its agents:

```bash
sudo systemctl stop nomad nebula node-agent
```

Nomad stops scheduling to it and the data-plane tunnel closes (in-flight work drains first). Start the services again — or reboot — to bring it back.

Decommission a node

To retire a node permanently, set its status to removed from the Nodes view in the dashboard, or via the API:

```bash
curl -fsSL -X PATCH -H "Authorization: Bearer $VL_KEY" \
  -H "Content-Type: application/json" -d '{"status":"removed"}' \
  https://api.openrelay.inc/v1/orgs/$VL_ORG/provider/nodes/$NODE_ID
```

This frees the node's workloads and GPU units. A decommission isn't a one-way door: re-running the onboarding one-liner on a removed node auto-recovers it.
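
If you retire nodes from a script, the call above can be wrapped in a small function. This is a sketch built only from the endpoint and payload shown on this page. The helper names, the `API_BASE` variable, and the `CURL` override are hypothetical, and the response body format is not assumed.

```bash
#!/usr/bin/env bash
API_BASE=${API_BASE:-https://api.openrelay.inc/v1}
CURL=${CURL:-curl}   # overridable for testing

# node_url ORG_ID NODE_ID -> URL of the provider node resource
node_url() {
  printf '%s/orgs/%s/provider/nodes/%s\n' "$API_BASE" "$1" "$2"
}

# decommission_node ORG_ID NODE_ID -> sets status to "removed".
# Returns 2 if VL_KEY is unset; otherwise curl's exit status (-f fails on HTTP errors).
decommission_node() {
  local org=$1 node=$2
  if [ -z "${VL_KEY:-}" ]; then
    echo "VL_KEY is not set" >&2
    return 2
  fi
  "$CURL" -fsSL -X PATCH \
    -H "Authorization: Bearer ${VL_KEY}" \
    -H "Content-Type: application/json" \
    -d '{"status":"removed"}' \
    "$(node_url "$org" "$node")"
}
```

Usage: `decommission_node "$VL_ORG" "$NODE_ID"`.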

Troubleshooting

| Symptom | Cause / fix |
| --- | --- |
| 403 NOT_A_PROVIDER | The org isn't an approved provider yet. |
| 401 | Bad, missing, or expired API key / provisioning token. |
| Install prints nothing ("0 logs") | Piping curl -fsSL … into sudo bash hides HTTP errors: on a failed request curl emits no script, so bash runs empty input. Re-run with curl -sS -w 'HTTP %{http_code}\n' … to see the real status. |
| checksum mismatch / 502 ARTIFACT | A stale artifact. Re-run; artifacts are re-fetched and verified each run. |
| GPUs not bound to vfio-pci after reboot | Enable VT-d / IOMMU in BIOS; sudo systemctl restart vectorlay-vfio-bind; reboot and re-run. |
| A GPU unit shows WEDGED | The GPU fell off the bus (D3cold reset bug). Power-cycle the host. |
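
For the "prints nothing" case, one way to avoid running an empty script is to download first, check the HTTP status, and only then execute. A minimal sketch: `fetch_script` and the `CURL` override are hypothetical, and the install URL is whatever your onboarding one-liner uses (it is not reproduced here).

```bash
#!/usr/bin/env bash
CURL=${CURL:-curl}   # overridable for testing

# fetch_script URL OUT_FILE -> downloads URL to OUT_FILE and reports the status on stderr.
# Succeeds only on HTTP 200 with a non-empty body.
fetch_script() {
  local url=$1 out=$2 status
  status=$("$CURL" -sSL -o "$out" -w '%{http_code}' "$url") || return 1
  echo "HTTP ${status}" >&2
  [ "$status" = "200" ] && [ -s "$out" ]
}
```

Usage, with `INSTALL_URL` as a placeholder for your own URL: `fetch_script "$INSTALL_URL" /tmp/install.sh && sudo bash /tmp/install.sh`.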

See Providers for onboarding and Authentication for API keys.
