Documentation

Everything you need to deploy and scale GPU inference workloads.

Get started in seconds

deploy.py
import vectorlay

# Initialize client
client = vectorlay.Client(api_key="vl_xxx")

# Deploy a GPU cluster
cluster = client.clusters.create(
    name="my-inference-cluster",
    gpu_type="h100",
    replicas=3,
    container="my-model:latest"
)

# Get inference endpoint
print(f"Endpoint: {cluster.endpoint}")
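Once the cluster is live, inference traffic goes directly to `cluster.endpoint`. The request and response schema below is a hypothetical sketch, not a documented vectorlay API — the actual format depends on the container you deployed (`my-model:latest` above). It simply shows one plausible way to POST JSON to the endpoint with the standard library:

```python
import json
import urllib.request

# Hypothetical illustration: the payload shape ({"input": ...}) and the
# Bearer-token header are assumptions, not the documented endpoint contract.
def build_request(endpoint: str, prompt: str) -> urllib.request.Request:
    payload = json.dumps({"input": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Authorization": "Bearer vl_xxx",  # same API key used by the client
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example (uncomment once a cluster is deployed):
# req = build_request(cluster.endpoint, "Hello, world")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Keeping request construction separate from sending makes it easy to adapt the payload to whatever schema your container actually expects.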

