API Reference

The VectorLay REST API lets you manage GPU clusters, check availability, and monitor deployments programmatically.

Authentication

All API requests require an API key passed in the Authorization header:

Authorization: Bearer vl_your_api_key

Generate API keys in your dashboard settings. Keys are scoped to your organization.

Base URL

https://api.vectorlay.com
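A minimal Python sketch of combining the base URL with the bearer-token header, using only the standard library. It assumes the key lives in an environment variable named `VECTORLAY_API_KEY` (a hypothetical name chosen for illustration) and only builds the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import os
import urllib.request
from typing import Optional

BASE_URL = "https://api.vectorlay.com"


def build_request(path: str, method: str = "GET", api_key: Optional[str] = None) -> urllib.request.Request:
    """Build an authenticated request for an API path such as "/v1/gpus" (not sent)."""
    # VECTORLAY_API_KEY is a hypothetical variable name; an explicit api_key wins.
    key = api_key if api_key is not None else os.environ["VECTORLAY_API_KEY"]
    req = urllib.request.Request(BASE_URL + path, method=method)
    # Every request carries the key as a bearer token.
    req.add_header("Authorization", f"Bearer {key}")
    return req


req = build_request("/v1/gpus", api_key="vl_your_api_key")
print(req.get_method(), req.full_url)
# Send with: urllib.request.urlopen(req)
```

Keeping the key out of source code (environment variable or secret store) matters here because a key grants access to your whole organization.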

Endpoints

POST    /v1/clusters                   Create a new GPU cluster with the specified configuration.
GET     /v1/clusters                   List all clusters in your organization.
GET     /v1/clusters/:id               Get detailed information about a specific cluster.
PATCH   /v1/clusters/:id               Update cluster configuration (replicas, env vars, etc.).
DELETE  /v1/clusters/:id               Terminate a cluster and release all resources.
GET     /v1/clusters/:id/status        Get real-time health and status of all cluster replicas.
GET     /v1/gpus                       List available GPU types and current pricing.
GET     /v1/gpus/:type/availability    Check real-time availability for a specific GPU type.
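The `:id` and `:type` segments appear to be path parameters, presumably filled with a cluster `id` such as `cl_abc123` and a `gpu_type` value such as `rtx-4090`. One way to resolve them, sketched in Python; the operation names used as dictionary keys (for example `"cluster_status"`) are hypothetical labels for this example, not part of the API.

```python
import re
from typing import Tuple
from urllib.parse import quote

# (method, path template) for each endpoint in the table above.
# The dictionary keys are illustrative names only.
ENDPOINTS = {
    "create_cluster": ("POST", "/v1/clusters"),
    "list_clusters": ("GET", "/v1/clusters"),
    "get_cluster": ("GET", "/v1/clusters/:id"),
    "update_cluster": ("PATCH", "/v1/clusters/:id"),
    "delete_cluster": ("DELETE", "/v1/clusters/:id"),
    "cluster_status": ("GET", "/v1/clusters/:id/status"),
    "list_gpus": ("GET", "/v1/gpus"),
    "gpu_availability": ("GET", "/v1/gpus/:type/availability"),
}


def resolve(operation: str, **params: str) -> Tuple[str, str]:
    """Return (method, path) with each ":name" segment replaced by a URL-encoded value."""
    method, template = ENDPOINTS[operation]

    def fill(match: "re.Match") -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing path parameter: {name}")
        # Encode everything, including "/", so a value cannot alter the path.
        return quote(params[name], safe="")

    return method, re.sub(r":(\w+)", fill, template)


print(resolve("cluster_status", id="cl_abc123"))
print(resolve("gpu_availability", type="rtx-4090"))
```

Raising on a missing parameter avoids silently requesting a literal `/v1/clusters/:id`.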

Example: Create a cluster

Request

curl -X POST https://api.vectorlay.com/v1/clusters \
  -H "Authorization: Bearer vl_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-cluster",
    "gpu_type": "rtx-4090",
    "replicas": 2,
    "container": "vllm/vllm-openai:latest",
    "env": {
      "MODEL": "meta-llama/Llama-3.1-8B-Instruct"
    }
  }'
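The same request expressed with Python's standard library, as a sketch: it builds the POST with the JSON payload from the curl example but does not send it. `vl_xxx` is the placeholder key from that example.

```python
import json
import urllib.request

payload = {
    "name": "my-cluster",
    "gpu_type": "rtx-4090",
    "replicas": 2,
    "container": "vllm/vllm-openai:latest",
    "env": {"MODEL": "meta-llama/Llama-3.1-8B-Instruct"},
}

req = urllib.request.Request(
    "https://api.vectorlay.com/v1/clusters",
    data=json.dumps(payload).encode("utf-8"),  # JSON body, like curl -d
    method="POST",
    headers={
        "Authorization": "Bearer vl_xxx",
        "Content-Type": "application/json",
    },
)

# To actually send it (needs a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     cluster = json.load(resp)
```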

Response

{
  "id": "cl_abc123",
  "name": "my-cluster",
  "status": "provisioning",
  "gpu_type": "rtx-4090",
  "replicas": 2,
  "endpoint": "https://my-cluster-abc123.run.vectorlay.com",
  "created_at": "2026-01-15T10:30:00Z"
}
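A sketch of loading this response into a typed object in Python. The `Cluster` dataclass is hypothetical and mirrors only the fields shown above; any other fields in a real response are ignored. `created_at` is parsed as an ISO 8601 UTC timestamp.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Cluster:
    id: str
    name: str
    status: str
    gpu_type: str
    replicas: int
    endpoint: str
    created_at: datetime


def parse_cluster(body: str) -> Cluster:
    """Map a cluster JSON object onto Cluster, ignoring any fields not listed."""
    data = json.loads(body)
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset.
    created_at = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return Cluster(
        id=data["id"],
        name=data["name"],
        status=data["status"],
        gpu_type=data["gpu_type"],
        replicas=data["replicas"],
        endpoint=data["endpoint"],
        created_at=created_at,
    )


body = """{
  "id": "cl_abc123",
  "name": "my-cluster",
  "status": "provisioning",
  "gpu_type": "rtx-4090",
  "replicas": 2,
  "endpoint": "https://my-cluster-abc123.run.vectorlay.com",
  "created_at": "2026-01-15T10:30:00Z"
}"""
cluster = parse_cluster(body)
print(cluster.status, cluster.endpoint)
```

Because a new cluster starts in `provisioning`, a client would typically keep the `id` and check `GET /v1/clusters/:id/status` before sending traffic to `endpoint`.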

Rate Limits

API requests are rate limited to 100 requests per minute per API key. Rate limit headers are included in all responses:

X-RateLimit-Limit — Maximum requests per window
X-RateLimit-Remaining — Requests remaining
X-RateLimit-Reset — Window reset time (Unix timestamp)
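One way a client might honor these headers, sketched in Python: read `X-RateLimit-Remaining` and `X-RateLimit-Reset` and compute how long to wait before the next request. Treating the reset value as seconds since the Unix epoch follows the description above; returning zero when the headers are absent is an assumption of this sketch.

```python
import time
from typing import Mapping, Optional


def seconds_until_allowed(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return 0 if requests remain in the window, else seconds until it resets."""
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0  # assumption: no headers means nothing to wait for
    if int(remaining) > 0:
        return 0.0
    now = time.time() if now is None else now
    # Reset is a Unix timestamp; never return a negative wait.
    return max(0.0, float(reset) - now)


headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1768473060",
}
print(seconds_until_allowed(headers, now=1768473000.0))  # 60.0
```

With a `urllib` response you can pass `resp.headers` directly and call `time.sleep()` on the result; its header lookup is case-insensitive, whereas a plain `dict` like the one above is not.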

Error Handling

Failed requests return a standard HTTP status code together with a JSON error body:

{
  "error": {
    "code": "invalid_gpu_type",
    "message": "GPU type 'rtx-5080' is not available.",
    "status": 400
  }
}
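A sketch of turning this error body into a Python exception. `VectorLayError` is a hypothetical class name; the fields it exposes (`code`, `message`, `status`) are the ones in the example body. Falling back to the HTTP status when the body is not in this JSON shape (for instance, an HTML page from a proxy) is an assumption about how a client might behave.

```python
import json


class VectorLayError(Exception):
    """Hypothetical exception type carrying the fields of an API error body."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def error_from_response(http_status: int, body: str) -> VectorLayError:
    """Build a VectorLayError from an HTTP status and response body."""
    try:
        err = json.loads(body)["error"]
        return VectorLayError(
            err.get("status", http_status),
            err.get("code", "unknown_error"),
            err.get("message", ""),
        )
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not the documented JSON shape; "unknown_error" is a placeholder
        # code chosen for this sketch, not one the API defines.
        return VectorLayError(http_status, "unknown_error", body)


body = """{
  "error": {
    "code": "invalid_gpu_type",
    "message": "GPU type 'rtx-5080' is not available.",
    "status": 400
  }
}"""
err = error_from_response(400, body)
print(err)  # 400 invalid_gpu_type: GPU type 'rtx-5080' is not available.
```

With `urllib`, a non-2xx response raises `urllib.error.HTTPError`; you could then `raise error_from_response(exc.code, exc.read().decode()) from exc` and branch on `code` rather than parsing `message`.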