Developer Documentation
VectorLay API
Deploy and manage GPU inference clusters programmatically. The VectorLay REST API gives you full control over cluster lifecycle, scaling, and GPU availability.
Base URL: https://api.vectorlay.com
Authorization: Bearer vl_...
Content-Type: application/json
Get started
Send requests
Once the cluster status is healthy, your cluster endpoint is ready to receive inference traffic.
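Waiting for the healthy status could look roughly like the sketch below. It assumes the caller supplies `fetch_clusters`, a hypothetical callable that returns the parsed JSON of a GET /v1/clusters call in the shape shown in the List clusters example. The function name, timeout, and polling interval are illustrative choices, not values from the VectorLay documentation.

```python
import time


def wait_until_healthy(fetch_clusters, cluster_id, timeout_s=600.0,
                       interval_s=5.0, sleep=time.sleep):
    """Poll until the given cluster reports status "healthy".

    fetch_clusters is a hypothetical callable returning the parsed
    JSON body of GET /v1/clusters, i.e. {"data": [ {...}, ... ]}.
    Returns the cluster's endpoint URL, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        for cluster in fetch_clusters()["data"]:
            if cluster["id"] == cluster_id and cluster["status"] == "healthy":
                return cluster["endpoint"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster {cluster_id} was not healthy in time")
        # Assumed polling interval; tune it to your own needs.
        sleep(interval_s)
```

Passing the fetch function and `sleep` as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.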
API reference
API overview
All endpoints use JSON request and response bodies. Authenticate every request with your API key in the Authorization header.
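As an illustration of the authentication rule above, here is a minimal Python sketch that builds (without sending) an authenticated JSON request using only the standard library. The helper name `build_request` is hypothetical, and `vl_xxx` is the same placeholder key used in the curl examples on this page.

```python
import json
import urllib.request

API_BASE = "https://api.vectorlay.com"  # base URL shown at the top of this page


def build_request(api_key, method, path, body=None):
    """Build an authenticated request object; nothing is sent here."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # JSON bodies are declared with the Content-Type header.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path, data=data, headers=headers, method=method
    )


req = build_request("vl_xxx", "GET", "/v1/clusters")
```

The resulting object can be passed to `urllib.request.urlopen` when you are ready to send it.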
POST /v1/clusters
Create a cluster
Deploy a containerized workload onto GPU infrastructure. Specify the GPU type, number of replicas, container image, and environment variables. The cluster will be provisioned and assigned a unique endpoint URL.
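The request body for this endpoint could be assembled in Python along these lines. This is a sketch only: `cluster_payload` is a hypothetical helper, and leaving out `env` when no variables are given assumes the field is optional, which this page does not state.

```python
import json


def cluster_payload(name, gpu_type, container, replicas=1, env=None):
    """Return the JSON-serializable body for a create-cluster request."""
    payload = {
        "name": name,
        "gpu_type": gpu_type,
        "replicas": replicas,
        "container": container,
    }
    if env:
        # Assumption: "env" may be omitted when there is nothing to set.
        payload["env"] = dict(env)
    return payload


body = cluster_payload(
    name="my-cluster",
    gpu_type="rtx-4090",
    container="vllm/vllm-openai:latest",
    env={"MODEL": "meta-llama/Llama-3.1-8B-Instruct"},
)
encoded = json.dumps(body)  # string to send as the POST body
```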
name: A human-readable name for the cluster. Must be unique within your organization.
gpu_type: GPU model to use. See /v1/gpus for available types.
replicas: Number of GPU instances to provision. Traffic is load balanced across replicas.
container: Docker image to run. Supports public and private registries.

curl -X POST https://api.vectorlay.com/v1/clusters \
  -H "Authorization: Bearer vl_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-cluster",
    "gpu_type": "rtx-4090",
    "replicas": 1,
    "container": "vllm/vllm-openai:latest",
    "env": {
      "MODEL": "meta-llama/Llama-3.1-8B-Instruct"
    }
  }'

{
  "id": "cl_abc123",
  "name": "my-cluster",
  "status": "provisioning",
  "gpu_type": "rtx-4090",
  "replicas": 1,
  "endpoint": "https://my-cluster-abc123.run.vectorlay.com",
  "created_at": "2026-01-15T10:30:00Z"
}

GET /v1/clusters
List clusters
Retrieve all clusters in your organization. Returns each cluster's current status, GPU type, replica count, and endpoint URL.
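Once the list response is parsed, its fields can be aggregated however you need. The sketch below totals replicas per GPU type. `replicas_by_gpu` is a hypothetical helper, and `SAMPLE` simply mirrors the example response for this endpoint.

```python
import json
from collections import Counter

# Mirrors the example response body for the list endpoint.
SAMPLE = """
{
  "data": [
    {
      "id": "cl_abc123",
      "name": "my-cluster",
      "status": "healthy",
      "gpu_type": "rtx-4090",
      "replicas": 1,
      "endpoint": "https://my-cluster-abc123.run.vectorlay.com"
    }
  ]
}
"""


def replicas_by_gpu(response_text):
    """Sum replica counts per GPU type across all listed clusters."""
    totals = Counter()
    for cluster in json.loads(response_text)["data"]:
        totals[cluster["gpu_type"]] += cluster["replicas"]
    return dict(totals)


summary = replicas_by_gpu(SAMPLE)
```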
curl https://api.vectorlay.com/v1/clusters \
  -H "Authorization: Bearer vl_xxx"

{
  "data": [
    {
      "id": "cl_abc123",
      "name": "my-cluster",
      "status": "healthy",
      "gpu_type": "rtx-4090",
      "replicas": 1,
      "endpoint": "https://my-cluster-abc123.run.vectorlay.com"
    }
  ]
}

Guides
Quick Start
Deploy your first GPU cluster in under 5 minutes.
Authentication
API keys, organization roles, and SSH key setup.
Virtual Machines
Full VMs with dedicated GPU access and SSH connectivity.
Container Guide
Package ML models as containers for deployment.
Scaling
Manual scaling and autoscaling configuration.
Tutorials
Step-by-step guides for common use cases.