# vllm

Here are 1,459 public repositories matching this topic...

modelscope / FunASR

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

Updated Jun 29, 2026
Python

meta-llama / llama-cookbook

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use the models on various provider services.

python machine-learning ai pytorch llama finetuning llm langchain vllm llama2

Updated May 19, 2026
Jupyter Notebook

Orchestra-Research / AI-Research-SKILLs

Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your Claude Code, Codex, or Gemini agent becomes an AI research agent with full horsepower. Maintained by Orchestra Research.

ai skills gemini codex claude ai-research machine-leanring megatron huggingface gpt-5 vllm grpo claude-code claude-skills

Updated Jun 16, 2026
TeX

LMCache / LMCache

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

fast amd cuda inference pytorch speed rocm kv-cache llm vllm

Updated Jun 29, 2026
Python

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

reinforcement-learning raylib transformers proximal-policy-optimization large-language-models reinforcement-learning-from-human-feedback vllm visual-language-models

Updated Jun 17, 2026
Python

xorbitsai / inference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

Updated Jun 29, 2026
Python

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

kubernetes rust routing-engine omni diffusion vllm llm-inference tensorrt-llm sglang disaggregated-serving

Updated Jun 29, 2026
Rust

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

reinforcement-learning inference rdma disaggregation llm vllm sglang kvcache trt-llm tokenspeed

Updated Jun 29, 2026
C++

kserve / kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Updated Jun 25, 2026
Go

OpenBMB / UltraRAG

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

flask demo ui mcp openai easy gpt embedding vlm multimodal rag sentence-transformers huggingface-transformers llm vllm qwen deepseek

Updated Jun 28, 2026
Python

xlite-dev / Awesome-LLM-Inference

📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

mla vllm llm-inference awesome-llm flash-attention tensorrt-llm paged-attention deepseek flash-attention-3 deepseek-v3 minimax-01 deepseek-r1 flash-mla qwen3

Updated Jun 23, 2026
Python

gpustack / gpustack

A GPU cluster manager for high-performance AI model serving (vLLM, SGLang) and on-demand SSH-accessible GPU instances.

cuda inference openai llama maas rocm ascend llm llm-serving vllm genai llm-inference qwen deepseek sglang distributed-inference high-performance-inference mindie

Updated Jun 29, 2026
Python

katanaml / sparrow

Structured data extraction, instruction calling and agentic workflows with ML, LLM and Vision LLM

computer-vision machinelearning huggingface-transformers documentai llm vllm agentic-ai

Updated Jun 27, 2026
Python

mostlygeek / llama-swap

Reliable model swapping for any local OpenAI/Anthropic-compatible server: llama.cpp, vLLM, etc.

golang openai llama openai-api llamacpp vllm localllm localllama

Updated Jun 29, 2026
Go

vllm-project / semantic-router

System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge

kubernetes rust golang mcp fine-tuning pii-detection mixture-of-models huggingface-transformers bert-classification llm prompt-engineering vllm huggingface-candle ai-gateway semantic-router prompt-guard llmrouter openclaw

Updated Jun 29, 2026
Go

skyzh / tiny-llm

A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

python course serving llm large-language-model vllm qwen qwen2

Updated Jun 13, 2026
Python

PaddlePaddle / FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

inference openai serving ernie llm llm-serving vllm ernie-45 ernie-45-vl

Updated Jun 26, 2026
Python

containers / ramalama

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

ai containers cuda intel hip hacktoberfest inference-server podman llm llamacpp vllm

Updated Jun 29, 2026
Python

lemony-ai / cascadeflow

Cascading runtime for AI agents. Optimize cost, latency, quality, and policy decisions inside the agent loop.

Updated May 16, 2026
Python

vllm-project / vllm-ascend

Community-maintained hardware plugin for vLLM on Ascend

inference transformer model-serving mlops ascend llm llmops llm-serving vllm

Updated Jun 29, 2026
C++
