We Shipped ML Performance Tools — Then We Had to Shut Down
We shipped oxidize, turboquant, PetaLLM, and the rest of the Zapdev-labs stack — then shut down when our R&D budget could not sustain the work.
By Founder | Jackson Wheeler
I am writing this as the founder of ZapDev, not as a faceless product team. Over the past year we stopped treating inference performance as a nice-to-have and built a stack of real tools — open source on Zapdev-labs. We shipped oxidize, quantization work in turboquant, memory-frugal runners like PetaLLM and Ommi LLM, hardware-aware deployment with TurboForge, and more. Then our R&D budget ran out. We are sunsetting the ZapDev app builder and winding down active development. New project creation is disabled. Existing projects stay in read-only mode until June 21, 2026, so you can export what you need. Thank you to everyone who used ZapDev, cloned our repos, and filed issues that made these tools better.
What We Actually Shipped
Everything below lives in public repos under github.com/Zapdev-labs. These are the ML performance products we released — not placeholders, not a rebrand deck.
oxidize
oxidize is our Rust workspace for local LLM tooling: `oxidize-core` for model loading, quantization, and sampling primitives; `oxidize-cli` for chat, profiling, and model planning; `oxidize-server` as an OpenAI-compatible HTTP API; `oxidize-quantize` for file quantization; and `oxidize-py` Python bindings via PyO3. It was the foundation we wanted for fast, local inference without duct-taping five repos together.
turboquant (FastVQ)
turboquant implements TurboQuant-style vector quantization for model weights and KV caches — published on PyPI as `fastvq`. PolarQuant, QJL, and the combined two-stage pipeline target 3-bit compression, smaller KV footprints, and benchmark suites you can run on real hardware. Miniforge integrates TurboQuant KV cache compression (`turbo3`) for production-style MiniMax runs.
PetaLLM (airllm)
airllm ships as **PetaLLM** on PyPI: layer-wise inference that lets 70B-class models run on a single 4GB GPU, with optional 4-bit/8-bit block-wise compression for roughly 3x speedups when disk I/O is the bottleneck. We pushed support across Llama 3, Mixtral, Qwen, and very large checkpoints — including paths toward 405B-class models on tight VRAM.
Ommi LLM
ommi-llm is another memory-efficient inference engine: one transformer layer at a time, SafeTensor sharding, async prefetch, 4-bit/8-bit compression, and an MCP server plus TUI for local model management. Same mission as PetaLLM — run 70B+ on consumer GPUs — with a different integration surface for agents and terminal workflows.
TurboForge & Miniforge
TurboForge is a hardware-aware AI runtime: OpenAI-compatible serving, manifests and audit logs, planner-driven optimization hints, and backends across `llama.cpp`, vLLM, and a mock path for development. Miniforge targets high-performance MiniMax M2.7 inference on constrained machines — GGUF quantization, TurboQuant KV cache, tool calling, vision, and runtime presets tuned for real RAM limits. miniforge-2 continued that line of work in parallel.
Benchmarks & measurement
ollama-performance-benchmark runs back-to-back local LLM tokens-per-second tests across Ollama, vLLM, llama.cpp, and Miniforge on the same prompt and model — sequential, reproducible, no fake parallel speedups. We used it constantly while tuning the rest of the stack.
Why We Shut Down
We did not shut down because oxidize or turboquant failed in isolation. We shut down because our **R&D budget was really low** — enough to ship and open-source a serious portfolio, not enough to keep full-time engineers on bare-metal calibration, new model architectures every month, and an AI app builder at the same time. Each repo above needs ongoing maintenance: driver stacks change, GGUF families break loaders, quantization recipes need revalidation. We chose depth over another commoditized app generator. We shipped on Zapdev-labs. We ran out of runway to maintain everything. Sunsetting active development is the honest outcome.
What Happens to Your Projects
All existing ZapDev projects remain in read-only mode until June 21, 2026. Export your code and assets anytime before that date. After June 21, project data will be permanently deleted. If you are juggling many repos and need help exporting, email Jacksonwheeler@zapdev.link and we will help manually.
Looking for an Alternative?
If you relied on ZapDev for AI-powered app generation, we recommend LuminaWeb at luminaweb.app as the closest spiritual successor. Multi-framework support, sandboxed execution, and a conversational builder should feel familiar. We have no financial relationship with LuminaWeb — it is simply the best alternative we have found.
Stay in Touch
Artifacts and research notes from ZapDev Labs live on GitHub at github.com/Zapdev-labs. The main ZapDev codebase is available at github.com/Zapdev-labs/zapdev and may be archived in the future. Follow there if you want benchmarks, open-source releases, or postmortems on what we learned. Questions about the transition go to Jacksonwheeler@zapdev.link. I am grateful you were part of this chapter. We built real ML performance tooling on a shoestring — and I am proud of what the team shipped, even though we could not afford to keep going.