We make large language models faster on real hardware.
Zapdev-labs builds quantization, inference runtimes, and benchmarks — Rust and Python tooling that cuts memory, raises tokens-per-second, and ships OpenAI-compatible APIs you can run locally.
- Weight & KV-cache quantization
- Local inference (Rust + Python)
- Cross-backend TPS benchmarks
- Low-VRAM and edge deployment
12
Open-source projects
3-bit
KV cache compression
>
TPS vs stock llama.cpp
zapdev-labs / oxidize — bench session a91c
$ turboforge hardware probe
→ Ryzen 7 PRO 6850H · 28GB RAM · llama.cpp + vLLM candidates
$ turboquant bench --bits 3 --target kv-cache
→ FastVQ: 6x KV memory reduction · recall within baseline
$ oxidize serve --model qwen3.5-4b-q4
ok OpenAI /v1 ready · oxidize runtime (not llama-server)
$ python benchmark.py --backends llama_cpp,oxidize,miniforge
→ same GGUF · llama.cpp baseline · oxidize + miniforge ahead on TPS
ok fastest this run: oxidize (higher tok/s than llama.cpp)
$ ▊
Flagship work
Open source for faster LLMs
These are the repos we actively push on GitHub — inference engines, quantizers, runtimes, and the benchmarks that prove the wins.
Install
One command to run models locally
The installer detects your OS and architecture, drops the oxidize binary on your PATH, and falls back to a source build when no prebuilt is published. macOS, Linux, and Windows — x86_64 and arm64.
$ curl -fsSL https://zapdev.link/install.sh | shCargo
From source, any platform
cargo install --git https://github.com/Zapdev-labs/oxidize --bin oxidize --lockedwget
No curl? No problem
wget -qO- https://zapdev.link/install.sh | shPiping to a shell runs code from the network. Read it first: zapdev.link/install.sh
More from the org
Benchmarks, memory, and agents
How we ship performance
From probe to production API
Our repos chain together: measure the machine, compress weights and KV cache, benchmark backends, then serve through OpenAI-compatible endpoints.
- 01
Probe hardware
turboforge hardware probe · miniforge config doctor
- 02
Compress
turboquant / oxidize-quantize · 3-bit KV and weight paths
- 03
Benchmark
ollama-performance-benchmark · tokens/sec per backend
- 04
Serve
oxidize-server · turboforge runtime · miniforge streaming
- 05
Ship
OpenAI-compatible APIs and reproducible manifests
People
Who builds here
The team behind Zapdev-labs — inference, quantization, and the benchmarks that back our claims.
Proof, not promises
Benchmarks you can rerun
ollama-performance-benchmark runs each backend sequentially on the same GGUF and prompt. On our hardware, oxidize and miniforge beat stock llama.cpp tokens-per-second; results land in results.csv. turboforge persists benchmark history on its runtime path.
Reference