Run GPU inference and training jobs on serverless infrastructure that scales with you.
Updated Jun 14, 2024 - Shell
Runpod-LLM provides ready-to-use container scripts for running large language models (LLMs) easily on RunPod.
Docker Compose stack for serving a local, OpenAI-compatible LLM (vLLM on Intel XPU) on an Intel Arc Pro B60 GPU — reproducible config with operator and developer docs.
Production-ready Kubernetes cluster with GPU support for deploying scalable LLM inference APIs on AWS. Automated setup with Kubespray, NVIDIA device plugin, and FastAPI-powered language model serving.
Deep learning environment setups