mratsim
  • Paris

Organizations

@numforge

Starred repositories

Free open-source community edition of DevCleaner - a fast development cache cleaner for macOS

Nim 5 Updated Jun 13, 2026
Python 365 34 Updated Jun 15, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 1,500 95 Updated Jun 8, 2026

Systematic benchmark study of DeepSeek-V4-Flash inference on 4× NVIDIA RTX PRO 6000 Blackwell (TP=4, FP8 KV, MTP=2, 1M context). Sustained decode matrix + Estonia long-context profile.

3 Updated Jun 21, 2026

Reverse engineering notes. Personal reference only. Everything here is a best-guess reconstruction.

Python 56 13 Updated Jun 23, 2026

A collection of tricks and tools to speed up transformer models

TeX 208 14 Updated May 6, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 14 7 Updated Jun 29, 2026

How good are LLMs at generating Lean code

Python 8 2 Updated May 24, 2026

Nim CPS runtime with http1.1, http2, http3, ws, sse, webtransport, irc, dns and a React-like DSL, http server DSL, and wasm compilation

Nim 9 Updated Apr 25, 2026

Wayland Compositor in Minecraft

Java 2,521 50 Updated Jun 28, 2026

(at least a useful portion of) Temporal Logic of Actions, a.k.a. TLA in Lean 4

Lean 29 Updated Jun 28, 2026

Minim(al/ized) LRU cache

Nim 8 Updated Feb 23, 2026

Dynamic Memory Management for Serving LLMs without PagedAttention

C 498 42 Updated Jun 10, 2026

A SillyTavern fork with Bun as the backend, along with UI/UX improvements

JavaScript 68 10 Updated Jun 29, 2026

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 1,636 288 Updated Jun 30, 2026

A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.

Python 190 14 Updated Jun 29, 2026
Cuda 3 2 Updated Mar 21, 2026

henka is a bindings generator for Nim 👑

Nim 7 1 Updated Jun 2, 2026

An ultra-fast, distributed Safetensors loader

C++ 64 9 Updated Jun 22, 2026

High-performance safetensors model loader

Python 152 31 Updated Jun 29, 2026

Hub for ongoing Qwen inference benchmarks on NVIDIA Blackwell. Indexes all studies, hosts the rolling SOTA leaderboard, points to the toolchain.

Python 2 Updated May 15, 2026

Fast Tokens

Rust 106 14 Updated Jun 23, 2026

Docker images for LLM inference (SGLang + vLLM) on NVIDIA Blackwell GPUs (SM120, CUDA 13.2)

Python 27 4 Updated Jun 29, 2026

A multimodal database engine written in Nim

HTML 20 Updated Jun 18, 2026

Library for Nim.

Nim 16 Updated Jul 8, 2020

Shaders in Nim language

Nim 106 8 Updated Feb 21, 2026

Constraint solving library

Nim 55 7 Updated Jan 20, 2026

Remove large amounts of unwanted applications quickly.

C# 19,982 865 Updated Jun 18, 2026

A modern system cleaner built in Go with a TUI and CLI.

Go 64 4 Updated Jun 2, 2026

A high performance Python graph library implemented in Rust.

Rust 1,713 214 Updated Jun 22, 2026