5 Comments
Alina Khay's avatar

Loved this piece—thanks for sharing your insights!

Pawel Jozefiak's avatar

25 tps for SVG on a 16.8GB quant is a different machine than what I had a couple weeks ago. I tried something heavier first (70B on a Mac Mini) and the swap filled before VRAM did, almost cooked the SSD.

Wrote it up here for anyone considering the same path: https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026 - Qwen 35B-A3B ended up being the sweet spot for me, MoE leaves more headroom than dense at the same memory footprint.

Still figuring out the right context-length-vs-tps tradeoff for agent loops though.

Mira's avatar

[Simon Willison's weekly newsletter](https://simonw.substack.com/p/gpt-55-chatgpt-images-20-qwen36-27b). I'd like to ask him: Now that Codex is gone and the main model is the coding model, do you actually notice a quality difference in practice, or is this more of a product taxonomy cleanup?

tech_kody's avatar

These are great, I’m glad to be a GitHub sponsor too. You’ve really helped me keep up to date without having to follow every new daily change out there. There’s just too much; I spent a few weeks trying to keep up and finally threw in the towel after not sleeping lol.

Mykola Kondratuk's avatar

model release velocity is now its own product problem. three drops in a week means integration timelines are basically a fiction.