Documentation
Everything you need to deploy and scale GPU inference workloads.
Get started in seconds
deploy.py
import vectorlay

# Initialize client
client = vectorlay.Client(api_key="vl_xxx")

# Deploy a GPU cluster
cluster = client.clusters.create(
    name="my-inference-cluster",
    gpu_type="h100",
    replicas=3,
    container="my-model:latest",
)

# Get inference endpoint
print(f"Endpoint: {cluster.endpoint}")Quick Start
Quick Start
Deploy your first GPU cluster in under 5 minutes.
API Reference
Complete reference for REST and gRPC APIs.
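
The same operations are available over plain HTTP. Below is a sketch that lists clusters via REST, assuming a https://api.vectorlay.com/v1 base URL and bearer-token auth; consult the reference for the actual routes and response fields.

import requests

# Hypothetical REST call; the base URL, route, and response fields
# are illustrative assumptions.
resp = requests.get(
    "https://api.vectorlay.com/v1/clusters",
    headers={"Authorization": "Bearer vl_xxx"},
    timeout=30,
)
resp.raise_for_status()
for c in resp.json().get("clusters", []):
    print(c["name"], c["status"])
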
Container Guide
Learn how to containerize your ML models for deployment.
Scaling & Autoscaling
Configure automatic scaling based on traffic patterns.
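
As a rough sketch of what an autoscaling configuration could look like through the Python client, building on the client initialized in the quick start above; the autoscaling parameter and its fields (min_replicas, max_replicas, target_gpu_util) are assumptions for illustration, so see the guide for the supported options.

# Hypothetical autoscaling config; the parameter and field names
# below are illustrative assumptions, not the documented API.
cluster = client.clusters.create(
    name="my-inference-cluster",
    gpu_type="h100",
    container="my-model:latest",
    autoscaling={
        "min_replicas": 1,       # scale down to one replica when idle
        "max_replicas": 10,      # cap replica count under burst traffic
        "target_gpu_util": 0.7,  # add replicas above 70% GPU utilization
    },
)
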
Authentication
Set up API keys, tokens, and access control.
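
In practice you'll want to keep keys out of source control. A minimal sketch that loads the key from an environment variable; the VECTORLAY_API_KEY name is an assumption, not a documented convention.

import os

import vectorlay

# Load the API key from the environment instead of hard-coding it.
# The variable name below is an illustrative assumption.
client = vectorlay.Client(api_key=os.environ["VECTORLAY_API_KEY"])
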
Tutorials
Step-by-step guides for common use cases.