Developer Documentation
Deploy and manage GPU inference clusters programmatically. The VectorLay REST API gives you full control over cluster lifecycle, scaling, and GPU availability.
Base URL: https://api.vectorlay.com
Authorization: Bearer vl_...
Content-Type: application/json

Once healthy, your cluster endpoint is ready to receive inference traffic.
API reference
All endpoints use JSON request and response bodies. Authenticate every request with your API key in the Authorization header.
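As a rough illustration of the authentication rule above, the following Python sketch builds an authenticated request object without sending it. It uses only the standard library. The helper name `build_request` and the environment variable `VECTORLAY_API_KEY` are assumptions made for this example and are not part of the VectorLay API.

```python
import os
import urllib.request

BASE_URL = "https://api.vectorlay.com"  # base URL as given in this document


def build_request(path, api_key=None):
    """Return an authenticated GET request object. Nothing is sent here."""
    # VECTORLAY_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("VECTORLAY_API_KEY", "")
    if not key:
        raise ValueError("no API key provided")
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Keeping construction separate from sending makes the headers easy to inspect first.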
POST /v1/clusters
Deploy a containerized workload onto GPU infrastructure. Specify the GPU type, number of replicas, container image, and environment variables. The cluster is then provisioned and assigned a unique endpoint URL.
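For clients written in Python rather than shell, the request for this endpoint might be assembled roughly as follows. This is a minimal sketch using only the standard library. The function name `build_create_cluster_request` is hypothetical, and the body fields mirror the ones used in this document's own example. No validation rules are assumed beyond what the document states.

```python
import json
import urllib.request


def build_create_cluster_request(api_key, name, gpu_type, replicas, container, env=None):
    """Build (but do not send) a POST /v1/clusters request."""
    body = {
        "name": name,
        "gpu_type": gpu_type,
        "replicas": replicas,
        "container": container,
    }
    if env:
        # Environment variables are passed through as a flat JSON object.
        body["env"] = dict(env)
    return urllib.request.Request(
        "https://api.vectorlay.com/v1/clusters",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` and decoding the body with `json.load` should yield the cluster object, including its `status` and `endpoint` fields.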
name: A human-readable name for the cluster. Must be unique within your organization.
gpu_type: GPU model to use. See /v1/gpus for available types.
replicas: Number of GPU instances to provision. Traffic is load balanced across replicas.
container: Docker image to run. Supports public and private registries.

curl -X POST https://api.vectorlay.com/v1/clusters \
  -H "Authorization: Bearer vl_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-cluster",
    "gpu_type": "rtx-4090",
    "replicas": 1,
    "container": "vllm/vllm-openai:latest",
    "env": {
      "MODEL": "meta-llama/Llama-3.1-8B-Instruct"
    }
  }'

{
  "id": "cl_abc123",
  "name": "my-cluster",
  "status": "provisioning",
  "gpu_type": "rtx-4090",
  "replicas": 1,
  "endpoint": "https://my-cluster-abc123.run.vectorlay.com",
  "created_at": "2026-01-15T10:30:00Z"
}

GET /v1/clusters
Retrieve all clusters in your organization. Returns each cluster's current status, GPU type, replica count, and endpoint URL.
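One way a client might use this listing is to pick out the clusters that are ready for traffic. The Python sketch below parses a response body and keeps the clusters whose status is "healthy". The function name `healthy_endpoints` is hypothetical. The sample data reuses this document's example cluster and adds a second entry that is invented for illustration. Only the "provisioning" and "healthy" status values appear in this document, so any other value is simply treated as not ready.

```python
import json


def healthy_endpoints(list_response_text):
    """Map cluster name to endpoint URL for clusters reporting "healthy"."""
    payload = json.loads(list_response_text)
    return {
        cluster["name"]: cluster["endpoint"]
        for cluster in payload.get("data", [])
        if cluster.get("status") == "healthy"
    }


# Sample body: the first entry is from this document, the second is made up.
SAMPLE_RESPONSE = """
{
  "data": [
    {
      "id": "cl_abc123",
      "name": "my-cluster",
      "status": "healthy",
      "gpu_type": "rtx-4090",
      "replicas": 1,
      "endpoint": "https://my-cluster-abc123.run.vectorlay.com"
    },
    {
      "id": "cl_def456",
      "name": "staging",
      "status": "provisioning",
      "gpu_type": "rtx-4090",
      "replicas": 1,
      "endpoint": "https://staging-def456.run.vectorlay.com"
    }
  ]
}
"""

if __name__ == "__main__":
    for name, url in healthy_endpoints(SAMPLE_RESPONSE).items():
        print(f"{name}: {url}")
```

Calling the list endpoint repeatedly and applying this filter is one simple way to wait for a newly created cluster to leave the provisioning state.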
curl https://api.vectorlay.com/v1/clusters \
  -H "Authorization: Bearer vl_xxx"

{
  "data": [
    {
      "id": "cl_abc123",
      "name": "my-cluster",
      "status": "healthy",
      "gpu_type": "rtx-4090",
      "replicas": 1,
      "endpoint": "https://my-cluster-abc123.run.vectorlay.com"
    }
  ]
}

Deploy your first GPU cluster in under 5 minutes.
API keys, organization roles, and SSH key setup.
Full VMs with dedicated GPU access and SSH connectivity.
Package ML models as containers for deployment.
Manual scaling and autoscaling configuration.
Step-by-step guides for common use cases.