API Reference

The VectorLay REST API lets you manage GPU clusters, check availability, and monitor deployments programmatically.

Authentication

All API requests require an API key passed in the Authorization header:

Authorization: Bearer vl_your_api_key

Generate API keys in your dashboard settings. Keys are scoped to your organization.

Base URL

https://api.vectorlay.com
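
As a minimal sketch, the Authorization header and base URL above could be combined in Python as follows. The helper name build_request is hypothetical and not part of any official VectorLay SDK; it only constructs the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "https://api.vectorlay.com"


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request.

    build_request is a hypothetical helper for illustration only.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # JSON bodies are sent with an explicit content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = build_request("GET", "/v1/clusters", "vl_your_api_key")
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen(req) would send it; the sketch stops short of that so it can run without a real key.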

Endpoints

POST /v1/clusters

Create a new GPU cluster with the specified configuration.

GET /v1/clusters

List all clusters in your organization.

GET /v1/clusters/:id

Get detailed information about a specific cluster.

PATCH /v1/clusters/:id

Update cluster configuration (replicas, env vars, etc.).

DELETE /v1/clusters/:id

Terminate a cluster and release all resources.

GET /v1/clusters/:id/status

Get real-time health and status of all cluster replicas.

GET /v1/gpus

List available GPU types and current pricing.

GET /v1/gpus/:type/availability

Check real-time availability for a specific GPU type.
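
The :id and :type segments in the paths above are placeholders that the caller fills in. One way to keep the endpoint list in code is sketched below; the dictionary keys and the resolve function are hypothetical names chosen for this example, while the methods and paths are the ones listed above.

```python
import re
from urllib.parse import quote

# Method and path template for each documented endpoint.
# The dictionary keys are hypothetical labels chosen for this sketch.
ENDPOINTS = {
    "create_cluster": ("POST", "/v1/clusters"),
    "list_clusters": ("GET", "/v1/clusters"),
    "get_cluster": ("GET", "/v1/clusters/:id"),
    "update_cluster": ("PATCH", "/v1/clusters/:id"),
    "delete_cluster": ("DELETE", "/v1/clusters/:id"),
    "cluster_status": ("GET", "/v1/clusters/:id/status"),
    "list_gpus": ("GET", "/v1/gpus"),
    "gpu_availability": ("GET", "/v1/gpus/:type/availability"),
}


def resolve(name, **params):
    """Return (method, path) with each :placeholder filled from params."""
    method, template = ENDPOINTS[name]

    def fill(match):
        # A missing placeholder value raises KeyError instead of
        # silently producing a path that still contains ":id".
        return quote(str(params[match.group(1)]), safe="")

    return method, re.sub(r":(\w+)", fill, template)


print(resolve("cluster_status", id="cl_abc123"))
print(resolve("gpu_availability", type="rtx-4090"))
```

Percent-encoding each value keeps an unexpected character such as "/" in an identifier from changing which endpoint the path points to.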

Example: Create a cluster

Request
curl -X POST https://api.vectorlay.com/v1/clusters \
  -H "Authorization: Bearer vl_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-cluster",
    "gpu_type": "rtx-4090",
    "replicas": 2,
    "container": "vllm/vllm-openai:latest",
    "env": {
      "MODEL": "meta-llama/Llama-3.1-8B-Instruct"
    }
  }'

Response
{
  "id": "cl_abc123",
  "name": "my-cluster",
  "status": "provisioning",
  "gpu_type": "rtx-4090",
  "replicas": 2,
  "endpoint": "https://my-cluster-abc123.run.vectorlay.com",
  "created_at": "2026-01-15T10:30:00Z"
}
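
The response above reports "status": "provisioning", so a client will usually need to wait before sending traffic to the endpoint. A minimal polling sketch follows, assuming that GET /v1/clusters/:id returns an object with the same status field as the create response (the source does not show that response). wait_until_provisioned and its fetch_cluster parameter are hypothetical names; fetch_cluster stands for any function that performs the GET and returns the decoded JSON.

```python
import time


def wait_until_provisioned(fetch_cluster, cluster_id, interval=5.0,
                           timeout=600.0, sleep=time.sleep,
                           clock=time.monotonic):
    """Poll until the cluster's status is no longer "provisioning".

    fetch_cluster(cluster_id) must return the decoded cluster JSON.
    Assumes that response has a "status" field like the create response.
    """
    deadline = clock() + timeout
    while True:
        cluster = fetch_cluster(cluster_id)
        if cluster["status"] != "provisioning":
            return cluster
        if clock() >= deadline:
            raise TimeoutError(
                f"{cluster_id} still provisioning after {timeout}s"
            )
        sleep(interval)


# Demo with a stubbed fetch function instead of a real HTTP call.
# "running" is a made-up status value; the source only shows "provisioning".
_responses = iter([
    {"id": "cl_abc123", "status": "provisioning"},
    {"id": "cl_abc123", "status": "provisioning"},
    {"id": "cl_abc123", "status": "running"},
])
result = wait_until_provisioned(lambda _id: next(_responses), "cl_abc123",
                                sleep=lambda seconds: None)
print(result["status"])
```

The function returns on any status other than "provisioning", so the caller should still inspect the returned status before treating the cluster as usable.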

Rate Limits

Each API key is limited to 100 requests per minute. Every response includes the following rate limit headers:

  • X-RateLimit-Limit — Maximum requests per window
  • X-RateLimit-Remaining — Requests remaining
  • X-RateLimit-Reset — Window reset time (Unix timestamp)
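
As a hedged illustration of how a client might use these headers, the sketch below computes how long to pause before the next request. It assumes X-RateLimit-Reset is expressed in seconds since the Unix epoch; seconds_until_reset is a hypothetical helper name, and the header values in the demo are invented.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before sending the next request.

    headers: mapping of response header names to string values.
    Assumes X-RateLimit-Reset is in seconds since the Unix epoch.
    """
    if int(headers["X-RateLimit-Remaining"]) > 0:
        return 0.0  # Budget left in the current window: no need to wait.
    now = time.time() if now is None else now
    # Never return a negative wait if the window has already reset.
    return max(0.0, float(headers["X-RateLimit-Reset"]) - now)


# Invented header values for demonstration only.
exhausted = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1768473060",
}
print(seconds_until_reset(exhausted, now=1768473000))
```

A caller would sleep for the returned number of seconds and then retry. A plain dict must use the exact header spelling shown here, since its lookups are case-sensitive.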

Error Handling

Failed requests return a standard HTTP status code and a JSON body that describes the error:

{
  "error": {
    "code": "invalid_gpu_type",
    "message": "GPU type 'rtx-5080' is not available.",
    "status": 400
  }
}
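
A client can turn this body into a typed exception so callers can branch on the error code. The sketch below assumes every error response follows the shape shown above; VectorLayError and parse_error are hypothetical names, not part of an official SDK.

```python
import json


class VectorLayError(Exception):
    """Hypothetical exception type wrapping the documented error body."""

    def __init__(self, code, message, status):
        super().__init__(f"{status} {code}: {message}")
        self.code = code
        self.message = message
        self.status = status


def parse_error(body):
    """Turn a JSON error body (str or bytes) into a VectorLayError."""
    err = json.loads(body)["error"]
    return VectorLayError(err["code"], err["message"], err["status"])


sample = """
{
  "error": {
    "code": "invalid_gpu_type",
    "message": "GPU type 'rtx-5080' is not available.",
    "status": 400
  }
}
"""
error = parse_error(sample)
print(error.code, error.status)
```

With urllib, a non-2xx response raises urllib.error.HTTPError, and the body passed to parse_error can be read from that exception with its read() method.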