API Reference
The VectorLay REST API lets you manage GPU clusters, check availability, and monitor deployments programmatically.
Authentication
All API requests require an API key passed in the Authorization header:
Authorization: Bearer vl_your_api_key
Generate API keys in your dashboard settings. Keys are scoped to your organization.
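As a minimal sketch, the Authorization header could be attached in Python using only the standard library. The helper name `build_request` is hypothetical and not part of any VectorLay SDK. The request is only constructed here, not sent.

```python
import urllib.request

BASE_URL = "https://api.vectorlay.com"  # base URL given in this reference


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return an unsent request carrying the Bearer token (hypothetical helper)."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Placeholder key; substitute a key generated in the dashboard.
req = build_request("/v1/clusters", "vl_your_api_key")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the result to `urllib.request.urlopen` would send it. Reading the key from an environment variable rather than hard-coding it is usually the safer choice.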
Base URL
https://api.vectorlay.com
Endpoints
/v1/clusters - Create a new GPU cluster with the specified configuration.
/v1/clusters - List all clusters in your organization.
/v1/clusters/:id - Get detailed information about a specific cluster.
/v1/clusters/:id - Update cluster configuration (replicas, env vars, etc.).
/v1/clusters/:id - Terminate a cluster and release all resources.
/v1/clusters/:id/status - Get real-time health and status of all cluster replicas.
/v1/gpus - List available GPU types and current pricing.
/v1/gpus/:type/availability - Check real-time availability for a specific GPU type.
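The `:id` and `:type` segments in the paths above are placeholders to be replaced with real values. A minimal sketch of filling them in, with hypothetical helper names and percent-encoding added as a precaution (the reference does not say which characters identifiers may contain):

```python
from urllib.parse import quote

BASE_URL = "https://api.vectorlay.com"  # base URL given in this reference


def cluster_url(cluster_id: str, status: bool = False) -> str:
    """URL for /v1/clusters/:id, or /v1/clusters/:id/status when status=True."""
    url = f"{BASE_URL}/v1/clusters/{quote(cluster_id, safe='')}"
    return url + "/status" if status else url


def gpu_availability_url(gpu_type: str) -> str:
    """URL for /v1/gpus/:type/availability."""
    return f"{BASE_URL}/v1/gpus/{quote(gpu_type, safe='')}/availability"


print(cluster_url("cl_abc123", status=True))
print(gpu_availability_url("rtx-4090"))
```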
Example: Create a cluster
curl -X POST https://api.vectorlay.com/v1/clusters \
  -H "Authorization: Bearer vl_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-cluster",
    "gpu_type": "rtx-4090",
    "replicas": 2,
    "container": "vllm/vllm-openai:latest",
    "env": {
      "MODEL": "meta-llama/Llama-3.1-8B-Instruct"
    }
  }'

{
  "id": "cl_abc123",
  "name": "my-cluster",
  "status": "provisioning",
  "gpu_type": "rtx-4090",
  "replicas": 2,
  "endpoint": "https://my-cluster-abc123.run.vectorlay.com",
  "created_at": "2026-01-15T10:30:00Z"
}
Rate Limits
API requests are rate-limited to 100 requests per minute per API key. Every response includes the following rate limit headers:
X-RateLimit-Limit - Maximum requests per window
X-RateLimit-Remaining - Requests remaining
X-RateLimit-Reset - Window reset time (Unix timestamp)
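A client can use these headers to decide how long to pause before its next request. The following is a minimal sketch, not an official client: the function name is hypothetical, the headers are assumed to arrive as a plain mapping with the exact capitalization shown above, and X-RateLimit-Reset is assumed to be in seconds.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request, or 0.0 if quota remains."""
    # A missing header is treated as "quota available" (an assumption).
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    # Never return a negative wait if the window has already reset.
    return max(0.0, reset_at - current)


exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"}
print(seconds_until_reset(exhausted, now=900.0))
```

The `now` parameter exists only to make the function easy to test. In normal use it can be omitted and the current time is used.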
Error Handling
Errors return standard HTTP status codes with a JSON body:
{
  "error": {
    "code": "invalid_gpu_type",
    "message": "GPU type 'rtx-5080' is not available.",
    "status": 400
  }
}
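One way to surface such an error in client code is to convert the JSON body into an exception. This is a sketch under the assumption that every error body has the `error.code`, `error.message`, and `error.status` fields shown in the example. `VectorLayError` and `parse_error` are hypothetical names, not part of an official SDK.

```python
import json


class VectorLayError(Exception):
    """Hypothetical exception type mirroring the documented error body."""

    def __init__(self, code: str, message: str, status: int):
        super().__init__(f"{status} {code}: {message}")
        self.code = code
        self.message = message
        self.status = status


def parse_error(body: str) -> VectorLayError:
    """Build an exception from a JSON error body shaped like the example above."""
    err = json.loads(body)["error"]
    return VectorLayError(err["code"], err["message"], err["status"])


# Sample body copied from the example in this section.
sample = (
    '{"error": {"code": "invalid_gpu_type", '
    '"message": "GPU type \'rtx-5080\' is not available.", "status": 400}}'
)
exc = parse_error(sample)
print(exc.code, exc.status)
```

Branching on the machine-readable `code` rather than the human-readable `message` is generally the more robust choice, since message wording may change.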