Discussion about this post

Alina Khay

Loved this piece—thanks for sharing your insights!

Pawel Jozefiak

25 tps for SVG on a 16.8GB quant is a different machine than what I had a couple weeks ago. I tried something heavier first (70B on a Mac Mini) and the swap filled before VRAM did, almost cooked the SSD.

Wrote it up here for anyone considering the same path: https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026. Qwen 35B-A3B ended up being the sweet spot for me; MoE leaves more headroom than a dense model at the same memory footprint.

Still figuring out the right context-length-vs-tps tradeoff for agent loops though.

