Quick Start

Deploy your first GPU inference cluster in under 5 minutes.

1. Create an account

Sign up at vectorlay.com/get-started and create your organization. An API key is generated for you automatically; the examples below use the placeholder vl_xxx, which you should replace with your own key.

2. Deploy a cluster

Use the REST API to deploy your first GPU cluster:

terminal
curl -X POST https://api.vectorlay.com/v1/clusters \
  -H "Authorization: Bearer vl_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-first-cluster",
    "gpu_type": "rtx-4090",
    "replicas": 1,
    "container": "vllm/vllm-openai:latest",
    "env": {
      "MODEL": "meta-llama/Llama-3.1-8B-Instruct",
      "MAX_MODEL_LEN": "4096"
    }
  }'
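If you prefer a scripted client over raw curl, the same request can be made from Python with only the standard library. This is a minimal sketch: the endpoint and field names mirror the curl example above, while the API_KEY value and the helper names are placeholders of our own.

```python
import json
import urllib.request

API_BASE = "https://api.vectorlay.com/v1"  # base URL from the curl example
API_KEY = "vl_xxx"  # placeholder -- substitute your own key

def build_cluster_payload(name, gpu_type="rtx-4090", replicas=1):
    """Assemble the cluster spec shown in the curl example."""
    return {
        "name": name,
        "gpu_type": gpu_type,
        "replicas": replicas,
        "container": "vllm/vllm-openai:latest",
        "env": {
            "MODEL": "meta-llama/Llama-3.1-8B-Instruct",
            "MAX_MODEL_LEN": "4096",
        },
    }

def create_cluster(payload):
    """POST the spec to /clusters and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/clusters",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (performs a live request):
# cluster = create_cluster(build_cluster_payload("my-first-cluster"))
```

Separating payload construction from the HTTP call keeps the spec easy to inspect or log before anything is deployed.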

3. Check cluster status

Poll the status endpoint until your cluster reports healthy. Replace cl_abc123 with the cluster ID from the create response:

terminal
curl https://api.vectorlay.com/v1/clusters/cl_abc123/status \
  -H "Authorization: Bearer vl_xxx"

4. Send a request

Once your cluster is healthy, send inference requests to your endpoint. The vLLM container serves an OpenAI-compatible API, which is why the path below is /v1/chat/completions:

terminal
curl https://your-cluster.run.vectorlay.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
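The same call from Python, again standard library only. The endpoint host is the placeholder from the curl example, and the response parsing assumes the OpenAI-style chat-completions schema that vLLM's OpenAI-compatible server emits; the helper names are our own.

```python
import json
import urllib.request

# Placeholder host from the example above -- use your cluster's real endpoint.
ENDPOINT = "https://your-cluster.run.vectorlay.com/v1/chat/completions"

def chat(prompt, endpoint=ENDPOINT):
    """Send one user message and return the parsed JSON response."""
    body = {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_reply(response):
    """Pull the assistant text out of an OpenAI-style completion response."""
    return response["choices"][0]["message"]["content"]

# Example (performs a live request):
# print(extract_reply(chat("Hello!")))
```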

Next steps