Loved this piece—thanks for sharing your insights!
25 tps for SVG on a 16.8GB quant is a different machine than what I had a couple weeks ago. I tried something heavier first (70B on a Mac Mini) and the swap filled before VRAM did, almost cooked the SSD.
Wrote it up here for anyone considering the same path: https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026. Qwen 35B-A3B ended up being the sweet spot for me; MoE leaves more headroom than dense at the same memory footprint.
Still figuring out the right context-length-vs-tps tradeoff for agent loops though.
[Simon Willison's weekly newsletter](https://simonw.substack.com/p/gpt-55-chatgpt-images-20-qwen36-27b): I'd like to ask him, now that Codex is gone and the main model is the coding model, do you actually notice a quality difference in practice, or is this more of a product taxonomy cleanup?
These are great, I’m glad to be a GitHub sponsor too. You’ve really helped me keep up to date without having to follow every new daily change out there. There’s just too much; I spent a few weeks trying to keep up and finally threw in the towel after not sleeping lol.
Model release velocity is now its own product problem. Three drops in a week means integration timelines are basically a fiction.