Documentation
Everything you need to deploy and scale GPU inference workloads.
Get started in seconds
deploy.py
import vectorlay

# Initialize client
client = vectorlay.Client(api_key="vl_xxx")

# Deploy a GPU cluster
cluster = client.clusters.create(
    name="my-inference-cluster",
    gpu_type="h100",
    replicas=3,
    container="my-model:latest",
)

# Get inference endpoint
print(f"Endpoint: {cluster.endpoint}")Quick Start
Quick Start
Deploy your first GPU cluster in under 5 minutes.
API Reference
Complete reference for REST and gRPC APIs.
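
The same operations are available over plain HTTP. Below is a sketch that lists clusters via REST, assuming a https://api.vectorlay.com/v1 base URL and bearer-token auth; consult the reference for the actual routes and response fields.

import requests

# Hypothetical REST call; the base URL, route, and response fields
# are illustrative assumptions.
resp = requests.get(
    "https://api.vectorlay.com/v1/clusters",
    headers={"Authorization": "Bearer vl_xxx"},
    timeout=30,
)
resp.raise_for_status()
for c in resp.json().get("clusters", []):
    print(c["name"], c["status"])
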
Container Guide
Learn how to containerize your ML models for deployment.
Scaling & Autoscaling
Configure automatic scaling based on traffic patterns.
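
As a rough sketch of what an autoscaling configuration could look like through the Python client, building on the client initialized in the quick start above; the autoscaling parameter and its fields (min_replicas, max_replicas, target_gpu_util) are assumptions for illustration, so see the guide for the supported options.

# Hypothetical autoscaling config; the parameter and field names
# below are illustrative assumptions, not the documented API.
cluster = client.clusters.create(
    name="my-inference-cluster",
    gpu_type="h100",
    container="my-model:latest",
    autoscaling={
        "min_replicas": 1,       # scale down to one replica when idle
        "max_replicas": 10,      # cap replica count under burst traffic
        "target_gpu_util": 0.7,  # add replicas above 70% GPU utilization
    },
)
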
Authentication
Set up API keys, tokens, and access control.
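
In practice you'll want to keep keys out of source control. A minimal sketch that loads the key from an environment variable; the VECTORLAY_API_KEY name is an assumption, not a documented convention.

import os

import vectorlay

# Load the API key from the environment instead of hard-coding it.
# The variable name below is an illustrative assumption.
client = vectorlay.Client(api_key=os.environ["VECTORLAY_API_KEY"])
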
Tutorials
Step-by-step guides for common use cases.