1. Create an account
Sign up at vectorlay.com/get-started and create your organization. You'll get an API key automatically.
2. Deploy a cluster
Use the REST API to deploy your first GPU cluster, replacing vl_xxx with your API key:
terminal
curl -X POST https://api.vectorlay.com/v1/clusters \
  -H "Authorization: Bearer vl_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-first-cluster",
    "gpu_type": "rtx-4090",
    "replicas": 1,
    "container": "vllm/vllm-openai:latest",
    "env": {
      "MODEL": "meta-llama/Llama-3.1-8B-Instruct",
      "MAX_MODEL_LEN": "4096"
    }
  }'
3. Check cluster status
Poll the status endpoint until your cluster reports healthy (cl_abc123 stands in for your cluster ID):
terminal
curl https://api.vectorlay.com/v1/clusters/cl_abc123/status \
  -H "Authorization: Bearer vl_xxx"
4. Send a request
Once your cluster is healthy, send inference requests to your endpoint:
terminal
curl https://your-cluster.run.vectorlay.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
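Step 3 says to poll until the cluster is healthy but does not show a loop. The bash sketch below is one way that polling might look. It rests on assumptions the steps above do not confirm: that the status endpoint returns JSON containing a "status" field whose value becomes "healthy", and that your API key is exported in a variable named VECTORLAY_API_KEY (a hypothetical name chosen for this example). Check the actual response body and adjust the match accordingly.

```shell
#!/usr/bin/env bash
# Sketch only: wait for a cluster to become healthy before sending requests.
# ASSUMPTION: the status response is JSON containing "status": "healthy"
# once the cluster is ready. The real field name and value may differ.
# ASSUMPTION: VECTORLAY_API_KEY is a hypothetical env var holding your API key.

API_BASE="https://api.vectorlay.com/v1"

# Fetch the raw status response for one cluster (same call as step 3).
fetch_status() {
  curl -fsS "$API_BASE/clusters/$1/status" \
    -H "Authorization: Bearer $VECTORLAY_API_KEY"
}

# Poll until the assumed healthy marker appears or attempts run out.
# Usage: wait_for_healthy CLUSTER_ID [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 0 when healthy, 1 if the cluster never reported healthy.
wait_for_healthy() {
  local cluster_id="$1" max_attempts="${2:-60}" delay="${3:-5}"
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # A plain pattern match keeps the sketch dependency-free; a JSON parser
    # such as jq would be more robust if it is available.
    if fetch_status "$cluster_id" |
        grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts: not healthy yet" >&2
    sleep "$delay"
  done
  return 1
}
```

After sourcing this file, something like `wait_for_healthy cl_abc123 && echo "ready"` would block until the cluster is ready (or give up after 60 attempts at 5-second intervals by default), at which point the request in step 4 can be sent.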