Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2
Updated Jun 29, 2026 - Rust
eLLM runs LLM inference on CPUs faster than on GPUs
Local LLM inference engine written from scratch in Rust — hand-written AVX-512 assembly kernels, Metal & Vulkan compute shaders. Supports Qwen3, Mistral3, ... Q4/INT8/BF16 quantization.
Pure-Rust, CPU-only OCR engine for Baidu Unlimited-OCR (a DeepSeek-OCR-derived 3B MoE VLM). One fixed model, custom int4/int8 kernels, no ML framework, no GPU. Pre-Phase-0 scaffold.
LLM inference in Rust - Metal & CUDA
Rust-native MoE inference runtime with custom CUDA kernels for Blackwell GPUs. Includes DFlash speculative decoding, multi-tier Engram memory, and entropy-adaptive routing. Targets Qwen3.5-35B-A3B on a single RTX 5060 Ti 16GB.
Run Qwen3.5-122B-A10B on a 16 GB MacBook Air via SSD-streamed MoE expert weights.
SSD-streaming MoE inference engine for consumer hardware. Run 80B parameter models on a 24GB Mac.
PERSPECTIVE v2 — A 1.05 trillion parameter sparse Mixture-of-Experts language model that runs on consumer hardware (4 GB VRAM + 32 GB RAM). Features O(1) perspective decay recurrence, 3D torus manifold routing, native ternary {-1,0,+1} weights, holographic distributed memory, and hard geometric safety constraints. Built in Rust.
Enabling inference of large mixture-of-experts (MoE) models on Apple Silicon using dynamic offloading.
Heterogeneous Compute Cascade (HCC) — distributed 400B-parameter MoE inference across dual AMD Ryzen AI MAX+ 395 'Strix Halo' workstations via USB4.
Frontier AI on the Macs you already own. Treats your SSD as memory and splits models across paired devices, so 18 GB models run on 8 GB Macs. Local, private, OpenAI-compatible — works with Claude Code, Cursor, any agent. Built on iroh + SwiftLM.