Prerequisites
Learn about StarOps Prerequisites
In your AWS account you will need an existing EKS cluster and sufficient quota to provision GPU instances for inference.
EKS
To use StarOps you will need an AWS account with at least one EKS cluster in which StarOps can deploy models, an inference service, and any dependencies for model serving. If your account has more than one EKS cluster, you will be able to designate the target cluster during the deployment process.
Today the inference service is available in your VPC only.
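Before starting a deployment, you can confirm that a suitable cluster exists and is healthy with the AWS CLI. This is a minimal sketch; the region and cluster name below are placeholders, and it assumes the AWS CLI is already configured with credentials for your account:

```shell
# List the EKS clusters available in a region (region is an example value).
aws eks list-clusters --region us-east-1

# Confirm a specific cluster is ACTIVE before deploying
# ("my-cluster" is a placeholder for your cluster name).
aws eks describe-cluster --name my-cluster --region us-east-1 \
  --query 'cluster.status' --output text
```

If the status is anything other than `ACTIVE`, resolve the cluster issue before proceeding.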
GPU Instances
To serve large language models (LLMs) in your AWS account, you will also need to ensure your service quota supports GPU instances. If you have not used GPU instances before, or have a new account, your vCPU quota for GPU instances will generally be set to 0 and you will need to request a quota increase. See below for how to make the request.
GPU Quota Requests
You can request a quota increase using the AWS Management Console or the AWS CLI. Note that quota requests can take several days to process; you can check the status of your request in the AWS console.
Full details on this AWS quota request process can be found here: AWS Quota Increase Guide
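As a sketch of the CLI path, the request is made against the EC2 service's GPU-instance vCPU quota. The quota code below (`L-DB2E81BA`, "Running On-Demand G and VT instances") is an assumption based on AWS's published quota codes; verify it in your console before submitting, and set `--desired-value` to the total vCPUs you calculated:

```shell
# Request a vCPU quota increase for G-family GPU instances.
# Quota code L-DB2E81BA is assumed here -- confirm it in the Service Quotas
# console for your account before running this.
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-DB2E81BA \
  --desired-value 8

# Check the status of your pending request for that quota:
aws service-quotas list-requested-service-quota-change-history-by-quota \
  --service-code ec2 \
  --quota-code L-DB2E81BA
```

The history command reports each request's `Status` (for example `PENDING` or `CASE_CLOSED`), which mirrors what you would see in the console.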
How to calculate your vCPU requirements for your request
When you request your service quota increase you will need to specify the number of vCPUs. To assist with your request, refer to the Supported Models table, which provides a recommended instance type and vCPU count per model.
For example, to run Llama-3.2-3B-Instruct you will need to request 4 vCPUs to support the recommended g6.xlarge instance.
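The arithmetic above can be sketched as a small lookup from instance size to vCPU count. The values below come from AWS's published G6 instance specifications; the model-to-instance pairing itself is whatever the Supported Models table recommends:

```shell
# vCPU counts for common G6 instance sizes (per AWS instance specifications).
vcpus_for_instance() {
  case "$1" in
    g6.xlarge)  echo 4 ;;
    g6.2xlarge) echo 8 ;;
    g6.4xlarge) echo 16 ;;
    *)          echo "unknown instance type: $1" >&2; return 1 ;;
  esac
}

# Example: if the recommended instance for a model is g6.xlarge, the quota
# request needs at least this many vCPUs:
vcpus_for_instance g6.xlarge
```

If you plan to run several models concurrently, sum the vCPU counts of all recommended instances and request that total.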