Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac
Plus Visualizing local election results with Datasette, Observable and MapLibre GL
In this newsletter:
Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac
Visualizing local election results with Datasette, Observable and MapLibre GL
Plus 9 links and 4 quotations
Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 2024-11-12
There's a whole lot of buzz around the new Qwen2.5-Coder Series of open source (Apache 2.0 licensed) LLM releases from Alibaba's Qwen research team. On first impression it looks like the buzz is well deserved.
Qwen claim:
Qwen2.5-Coder-32B-Instruct has become the current SOTA open-source code model, matching the coding capabilities of GPT-4o.
That's a big claim for a 32B model that's small enough that it can run on my 64GB MacBook Pro M2. The Qwen published scores look impressive, comparing favorably with GPT-4o and Claude 3.5 Sonnet (October 2024) edition across various code-related benchmarks:
How about benchmarks from other researchers? Paul Gauthier's Aider benchmarks have a great reputation and Paul reports:
The new Qwen 2.5 Coder models did very well on aider's code editing benchmark. The 32B Instruct model scored in between GPT-4o and 3.5 Haiku.
84% Claude 3.5 Sonnet
75% Claude 3.5 Haiku
74% Qwen2.5 Coder 32B
71% GPT-4o
69% Qwen2.5 Coder 14B
58% Qwen2.5 Coder 7B
That was for the Aider "whole edit" benchmark. Qwen2.5 Coder 32B does well on the "diff" benchmark too, tying with GPT-4o (but coming in a little behind Claude 3.5 Haiku).
Given these scores (and the positive buzz on Reddit) I had to try it for myself.
My attempts to run the Qwen/Qwen2.5-Coder-32B-Instruct-GGUF Q8 using llm-gguf were a bit too slow, because I don't have that compiled to use my Mac's GPU at the moment.
But both the Ollama version and the MLX version worked great!
I installed the Ollama version using:
ollama pull qwen2.5-coder:32b
That fetched a 20GB quantized file. I ran a prompt through that using my LLM tool and Sergey Alexandrov's llm-ollama plugin like this:
llm install llm-ollama
llm models # Confirming the new model is present
llm -m qwen2.5-coder:32b 'python function that takes URL to a CSV file and path to a SQLite database, fetches the CSV with the standard library, creates a table with the right columns and inserts the data'
Here's the result. The code worked, but I had to work around a frustrating ssl bug first (which wouldn't have been an issue if I'd allowed the model to use requests or httpx instead of the standard library).
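For reference, here's a minimal hand-written sketch of the kind of function that prompt asks for, using only the standard library - my own illustration, not the model's actual output:

import csv
import io
import sqlite3
import urllib.request

def csv_url_to_sqlite(csv_url, db_path, table_name="data"):
    # Fetch the CSV using only the standard library
    # (this urlopen call is where that ssl certificate issue can bite)
    with urllib.request.urlopen(csv_url) as response:
        text = response.read().decode("utf-8")
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    # Create one TEXT column per CSV header
    columns = ", ".join('"{}" TEXT'.format(h) for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table_name, columns))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table_name, placeholders), reader
        )
    conn.close()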
I also tried running it via MLX, Apple's fast array framework for Apple Silicon, using the mlx-lm library directly, run via uv like this:
uv run --with mlx-lm \
mlx_lm.generate \
--model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
--max-tokens 4000 \
--prompt 'write me a python function that renders a mandelbrot fractal as wide as the current terminal'
That gave me a very satisfying result - when I ran the code it generated in a terminal I got this:
MLX reported the following performance metrics:
Prompt: 49 tokens, 95.691 tokens-per-sec
Generation: 723 tokens, 10.016 tokens-per-sec
Peak memory: 32.685 GB
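The generated code isn't reproduced here, but as a rough illustration (my own sketch, not the model's output), a terminal-width Mandelbrot renderer along those lines might look like this:

import shutil

def mandelbrot(height=40, max_iter=50):
    # Render as wide as the current terminal
    width = shutil.get_terminal_size().columns
    chars = " .:-=+*#%@"
    for row in range(height):
        line = ""
        for col in range(width):
            # Map this character cell to a point in the complex plane
            c = complex(-2.5 + 3.5 * col / width, -1.25 + 2.5 * row / height)
            z, i = 0, 0
            while abs(z) <= 2 and i < max_iter:
                z = z * z + c
                i += 1
            line += chars[i * (len(chars) - 1) // max_iter]
        print(line)

mandelbrot()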
Let's see how it does on the Pelican on a bicycle benchmark.
llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle'
Here's what I got:
Questionable Pelican SVG drawings aside, this is a really promising development. 32GB is just small enough that I can run the model on my Mac without having to quit every other application I'm running, and both the speed and the quality of the results feel genuinely competitive with the current best of the hosted models.
Given that code assistance is probably around 80% of my LLM usage at the moment, this is a meaningfully useful release for how I engage with this class of technology.
Visualizing local election results with Datasette, Observable and MapLibre GL - 2024-11-09
Alex Garcia and I hosted the first Datasette Public Office Hours on Friday - a live-streamed video session where we hacked on a project together and took questions and tips from community members on Discord.
We didn't record this one (surprisingly not a feature that Discord offers) but we hope to do more of these and record them in the future.
This post is a detailed write-up of what we built during the session.
San Mateo County election results
I live in El Granada, a tiny town just north of Half Moon Bay in San Mateo County, California.
Every county appears to handle counting and publishing election results differently. For San Mateo County the results are published on this page, and detailed per-precinct and per-candidate breakdowns are made available as a CSV file.
(I optimistically set up a Git scraper for these results in simonw/scrape-san-mateo-county-election-results-2024 only to learn that the CSV is updated just once a day, not continually as the ballots are counted.)
I'm particularly invested in the results of the Granada Community Services District board member elections. Our little town of El Granada is in "unincorporated San Mateo County" which means we don't have a mayor or any local officials, so the closest we get to hyper-local government is the officials that run our local sewage and parks organization! My partner Natalie ran the candidate forum event (effectively the debate) featuring three of the four candidates running for the two open places on the board.
Let's explore the data for that race using Datasette.
Importing CSV data into Datasette
I ran my part of the demo using Datasette Cloud, the beta of my new hosted Datasette service.
I started by using the pre-configured datasette-import plugin to import the data from the CSV file into a fresh table:
Modifying the schema
The table imported cleanly, but all of the columns from the CSV were still being treated as text. I used the datasette-edit-schema plugin to switch the relevant columns to integers so that we could run sums and sorts against them.
(I also noted that I really should add a "detect column types" feature to that plugin!)
The resulting 15,589 rows represent counts from individual precincts around the county for each of the races and measures on the ballot, with a row per precinct per candidate/choice per race.
Faceting and filtering the table
Since I'm interested in the Granada Community Services District election, I applied a facet on "Contest_title" and then used that to select that specific race.
I applied additional facets on "candidate_name" and "Precinct name".
This looks right to me: we have 7 precincts and 4 candidates for 28 rows in total.
Importing geospatial precinct shapes
Those precinct names are pretty non-descriptive! What does 33001 mean?
To answer that question, I added a new table.
San Mateo County offers precinct maps in the form of 23 PDF files. Our precincts are in the "Unincorporated Coastside" file:
Thankfully the county also makes that data available as geospatial data, hosted using Socrata with an option to export as GeoJSON.
The datasette-import plugin can handle JSON files... and if a JSON file contains a top-level object with a key that is an array of objects, it will import those objects as a table.
Dragging that file into Datasette is enough to import it as a table with a properties JSON column containing the properties and a geometry JSON column with the GeoJSON geometry.
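A GeoJSON export has exactly that shape - a top-level object whose features key is an array of objects. Something like this (abridged, with illustrative values):

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"precinctid": "33001"},
      "geometry": {"type": "Polygon", "coordinates": [...]}
    }
  ]
}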
Here's where another plugin kicks in: datasette-leaflet-geojson looks for columns that contain valid GeoJSON geometries and... draws them on a map!
So now we can see the shape of the individual geometries.
Enriching that data to extract the precinct IDs
The precinctid is present in the data, but it's tucked away in a JSON object in that properties JSON blob. It would be more convenient if it was a top-level column.
Datasette's enrichments feature provides tools for running operations against every row in a table and adding new columns based on the results.
My Datasette Cloud instance was missing the datasette-enrichments-quickjs plugin that would let me run JavaScript code against the data. I used my privileged access on Datasette Cloud to add that plugin to my requirements and restarted the instance to install it.
I used that to run this JavaScript code against every row in the table and saved the output in a new precinct_id column:
function enrich(row) {
return JSON.parse(row.properties).precinctid;
}
This took less than a second to run, adding and populating a new precinct_id column for the table.
Running a join
I demonstrated how to run a join between the election results and the precincts table using the Datasette SQL query editor.
I tried a few different things, but the most interesting query was this one:
select
Precinct_name,
precincts.geometry,
total_ballots,
json_group_object(
candidate_name,
total_votes
) as votes_by_candidate
from
election_results
join precincts on election_results.Precinct_name = precincts.precinct_id
where
Contest_title = "Granada Community Services District Members, Board of Directors"
group by
Precinct_name,
precincts.geometry,
total_ballots;
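json_group_object is a SQLite JSON aggregate that builds a single JSON object from the grouped rows, so each precinct's votes_by_candidate value comes back shaped something like this (candidate names and counts invented for illustration):

{"Candidate A": 312, "Candidate B": 298, "Candidate C": 145, "Candidate D": 87}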
Creating an API token to access the data
I was nearly ready to hand over to Alex for the second half of our demo, where he would use Observable Notebooks to build some custom visualizations on top of the data.
A great pattern for this is to host the data in Datasette and then fetch it into Observable via the Datasette JSON API.
Since Datasette Cloud instances are private by default we would need to create an API token that could do this.
I used this interface (from the datasette-auth-tokens plugin) to create a new token with read-only access to all databases and tables in the instance:
Since we're running a dedicated instance just for Datasette Public Office Hours, there's no reason not to distribute that read-only token in publicly accessible code.
Getting CORS working
Embarrassingly, I had forgotten that we would need CORS headers in order to access the data from an Observable notebook. Thankfully we have another plugin for that: datasette-cors. I installed that quickly and we confirmed that it granted access to the API from Observable as intended.
I handed over to Alex for the next section of the demo.
Working with Datasette in Observable
Alex started by running a SQL query from client-side JavaScript to pull in the joined data for our specific El Granada race:
sql = `
select
Precinct_name,
precincts.geometry,
Split_name,
Reporting_flag,
Update_count,
Pct_Id,
Pct_seq_nbr,
Reg_voters,
Turn_Out,
Contest_Id,
Contest_seq_nbr,
Contest_title,
Contest_party_name,
Selectable_Options,
candidate_id,
candidate_name,
Candidate_Type,
cand_seq_nbr,
Party_Code,
total_ballots,
total_votes,
total_under_votes,
total_over_votes,
[Vote Centers_ballots],
[Vote Centers_votes],
[Vote Centers_under_votes],
[Vote Centers_over_votes],
[Vote by Mail_ballots],
[Vote by Mail_votes],
[Vote by Mail_under_votes],
[Vote by Mail_over_votes]
from
election_results join precincts on election_results.Precinct_name = precincts.precinct_id
where "Contest_title" = "Granada Community Services District Members, Board of Directors"
limit 101;`
And in the next cell:
raw_data = fetch(
`https://datasette-public-office-hours.datasette.cloud/data/-/query.json?_shape=array&sql=${encodeURIComponent(
sql
)}`,
{
headers: {
Authorization: `Bearer ${secret}`
}
}
).then((r) => r.json())
Note the ?_shape=array parameter there, which causes Datasette to output the results directly as a JSON array of objects.
That's all it takes to get the data into Observable. Adding another cell like this confirms that the data is now available:
Inputs.table(raw_data)
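The same JSON API works from anywhere that can make HTTP requests, not just Observable. A rough Python equivalent of that fetch - assuming the requests library and a valid read-only token - would be:

import requests

QUERY_URL = "https://datasette-public-office-hours.datasette.cloud/data/-/query.json"

def run_query(sql, token, params=None):
    # _shape=array returns the rows as a plain JSON array of objects
    response = requests.get(
        QUERY_URL,
        params={"sql": sql, "_shape": "array", **(params or {})},
        headers={"Authorization": "Bearer {}".format(token)},
    )
    response.raise_for_status()
    return response.json()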
Visualizing those with MapLibre GL
There are plenty of good options for visualizing GeoJSON data using JavaScript in an Observable notebook.
Alex started with MapLibre GL, using the excellent OpenFreeMap 3D tiles:
viewof map = {
const container = html`<div style="height:800px;">`;
yield container;
const map = (container.value = new maplibregl.Map({
container,
zoom: 2,
//style: "https://basemaps.cartocdn.com/gl/voyager-gl-style/style.json",
style: "https://tiles.openfreemap.org/styles/liberty",
scrollZoom: true
}));
yield container;
map.on("load", function () {
map.fitBounds(d3.geoBounds(data), { duration: 0 });
map.addSource("precincts", {
type: "geojson",
data: data
});
map.addLayer({
id: "precincts",
type: "fill",
source: "precincts",
paint: {
"fill-opacity": 0.4,
"fill-color": [
"case",
["==", ["get", "ratio"], null], "#000000",
[
"interpolate",
["linear"],
["get", "ratio"],
0.0, "#0000ff",
0.5, "#d3d3d3",
1.0, "#ff0000"
]
]
}
});
map.on("click", "precincts", (e) => {
const { precinct, ratio } = e.features[0].properties;
const description = JSON.stringify({ precinct, ratio });
new maplibregl.Popup()
.setLngLat(e.lngLat)
.setHTML(description)
.addTo(map);
});
});
invalidation.then(() => map.remove());
}
(This is just one of several iterations, I didn't capture detailed notes of every change Alex made to the code.)
Observable Plot
Observable notebooks come pre-loaded with the excellent Observable Plot charting library - Mike Bostock's high-level charting tool built on top of D3.
Alex used that to first render the shapes of the precincts directly, without even needing a tiled basemap:
Plot.plot({
width,
height: 600,
legend: true,
projection: {
type: "conic-conformal",
parallels: [37 + 4 / 60, 38 + 26 / 60],
rotate: [120 + 30 / 60, 0],
domain: data
},
marks: [
Plot.geo(data, {
strokeOpacity: 0.1,
fill: "total_votes",
title: (d) => JSON.stringify(d.properties),
tip: true
})
]
})
The parallels and rotate options there come from the handy veltman/d3-stateplane repo, which lists recommended settings for using the State Plane Coordinate System with projections in D3. Those values are for California Zone 3.
Bringing it all together
For the grand finale, Alex combined everything learned so far to build an interactive map allowing a user to select any of the 110 races on the ballot and see a heatmap of results for any selected candidate and option:
You can try this out in Alex's notebook. Here's the relevant code (Observable cells are divided by // --- comments). Note that Observable notebooks are reactive and allow variables to be referenced out of order.
// Select the contest
viewof contest = Inputs.select(contests, { label: "Choose a contest" })
// ---
// And the candidate
viewof candidate = Inputs.radio(
candidates,
{
label: "Choose a candidate",
value: candidates[0]
}
)
// ---
// Show the map itself
Plot.plot({
width,
height: 600,
legend: true,
color: { scheme: "blues", legend: true },
projection: {
type: "mercator",
domain: data2
},
marks: [
Plot.geo(data2, {
strokeOpacity: 0.1,
fill: "ratio",
tip: true
})
]
})
// ---
data2 = ({
type: "FeatureCollection",
features: raw_data2.map((d) => ({
type: "Feature",
properties: {
precinct: d.Precinct_name,
total_ballots: d.total_ballots,
ratio: JSON.parse(d.votes_by_candidate)[candidate] / d.total_ballots
},
geometry: JSON.parse(d.geometry)
}))
})
// ---
raw_data2 = query(
`select
Precinct_name,
precincts.geometry,
total_ballots,
json_group_object(
candidate_name,
total_votes
) as votes_by_candidate
from
election_results
join precincts on election_results.Precinct_name = precincts.precinct_id
where Contest_title = :contest
group by
Precinct_name,
precincts.geometry,
total_ballots;`,
{ contest }
)
// ---
// Fetch the available contests
contests = query("select distinct Contest_title from election_results").then(
(d) => d.map((d) => d.Contest_title)
)
// ---
// Extract available candidates for selected contest
candidates = Object.keys(
JSON.parse(raw_data2[0].votes_by_candidate)
)
// ---
function query(sql, params = {}) {
return fetch(
`https://datasette-public-office-hours.datasette.cloud/data/-/query.json?${new URLSearchParams(
{ sql, _shape: "array", ...params }
).toString()}`,
{
headers: {
Authorization: `Bearer ${secret}`
}
}
).then((r) => r.json());
}
We'll be doing this again
This was our first time trying something like this and I think it worked really well. We're already thinking about ways to improve it next time:
I want to record these sessions and make them available on YouTube for people who couldn't be there live
It would be fun to mix up the format. I'm particularly keen on getting more people involved in giving demos - maybe having 5-10 minute lightning demo slots so we can see what other people are working on
Keep an eye on this blog or on the Datasette Discord for news about future sessions.
Link 2024-11-08 ChainForge:
I'm still on the hunt for good options for running evaluations against prompts. ChainForge offers an interesting approach, calling itself "an open-source visual programming environment for prompt engineering".
The interface is one of those boxes-and-lines visual programming tools, which reminds me of Yahoo Pipes.
It's open source (from a team at Harvard) and written in Python, which means you can run a local copy instantly via uvx like this:
uvx chainforge serve
You can then configure it with API keys to various providers (OpenAI worked for me, Anthropic models returned JSON parsing errors due to a 500 page from the ChainForge proxy) and start trying it out.
The "Add Node" menu shows the full list of capabilities.
The JavaScript and Python evaluation blocks are particularly interesting: the JavaScript one runs outside of a sandbox using plain eval(), while the Python one still runs in your browser but uses Pyodide in a Web Worker.
Link 2024-11-08 uv 0.5.0:
The first backwards-incompatible (in minor ways) release after 30 releases without a breaking change.
I found out about this release this morning when I filed an issue about a fiddly usability problem I had encountered with the combo of uv and conda... and learned that the exact problem had already been fixed in the brand new version!
Quote 2024-11-09
This is a very friendly and supportive place where you are surrounded by peers - we all want to help each other succeed. The golden rule of this server is:
Don't ever try to impress anyone here with your knowledge! Instead try to impress folks here with your desire to learn, and desire to help others learn.
Link 2024-11-10 Everything I've learned so far about running local LLMs:
Chris Wellons shares detailed notes on his experience running local LLMs on Windows - though most of these tips apply to other operating systems as well.
This is great - there's a ton of detail here and the root recommendations are very solid: use llama-server from llama.cpp and try ~8B models first (Chris likes Llama 3.1 8B Instruct at Q4_K_M as a first model); anything over 10B probably won't run well on a CPU, so you'll need to consider your available GPU VRAM.
This is neat:
Just for fun, I ported llama.cpp to Windows XP and ran a 360M model on a 2008-era laptop. It was magical to load that old laptop with technology that, at the time it was new, would have been worth billions of dollars.
I need to spend more time with Chris's favourite models, Mistral-Nemo-2407 (12B) and Qwen2.5-14B/72B.
Chris also built illume, a Go CLI tool for interacting with models that looks similar to my own LLM project.
Link 2024-11-11 MDN Browser Support Timelines:
I complained on Hacker News today that I wished the MDN browser compatibility tables - like this one for the Web Locks API - included an indication as to when each browser version was released, rather than just the version numbers.
It turns out they do! If you click on each browser version in turn you can see an expanded area showing the browser release date:
There's even an inline help tip telling you about the feature, which I've been studiously ignoring for years.
I want to see all the information at once without having to click through each browser. I had a poke around in the Firefox network tab and found https://bcd.developer.mozilla.org/bcd/api/v0/current/api.Lock.json - a JSON document containing browser support details (with release dates) for that API... and it was served using access-control-allow-origin: * which means I can hit it from my own little client-side applications.
I decided to build something with an autocomplete drop-down interface for selecting the API. That meant I'd need a list of all of the available APIs, and I used GitHub code search to find that in the mdn/browser-compat-data repository, in the api/ directory.
I needed the list of files in that directory for my autocomplete. Since there are just over 1,000 of those the regular GitHub contents API won't return them all, so I switched to the tree API instead.
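As a rough sketch of that approach in Python (the finished tool does this client-side in JavaScript; the main branch name here is an assumption):

import requests

# The git trees API returns every file in the repository in a single response
tree_url = "https://api.github.com/repos/mdn/browser-compat-data/git/trees/main?recursive=1"
tree = requests.get(tree_url).json()["tree"]

# Keep just the JSON files in the api/ directory for the autocomplete list
api_files = [
    item["path"]
    for item in tree
    if item["path"].startswith("api/") and item["path"].endswith(".json")
]
print(len(api_files), "APIs found")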
Here's the finished tool - source code here:
95% of the code was written by LLMs, but I did a whole lot of assembly and iterating to get it to the finished state. Three of the transcripts for that:
Web Locks API Browser Support Timeline in which I paste in the original API JSON and ask it to come up with a timeline visualization for it.
Enhancing API Feature Display with URL Hash where I dumped in a more complex JSON example to get it to show multiple APIs on the same page, and also had it add #fragment bookmarking to the tool
Fetch GitHub API Data Hierarchy where I got it to write me an async JavaScript function for fetching a directory listing from that tree API.
Quote 2024-11-11
As a junior engineer, there's simply no substitute for getting the first 100K lines of code under your belt. The "start over each day" method will help get you to those 100K lines faster.
You might think covering the same ground multiple times isn't as valuable as getting 100K diverse lines of code. I disagree. Solving the same problem repeatedly is actually really beneficial for retaining knowledge of patterns you figure out.
You only need 5K perfect lines to see all the major patterns once. The other 95K lines are repetition to rewire your neurons.
Quote 2024-11-11
That development time acceleration of 4 days down to 20 minutes… that’s equivalent to about 10 years of Moore’s Law cycles. That is, using generative AI like this is equivalent to computers getting 10 years better overnight.
That was a real eye-opening framing for me. AI isn’t magical, it’s not sentient, it’s not the end of the world nor our saviour; we don’t need to endlessly debate “intelligence” or “reasoning.” It’s just that… computers got 10 years better. The iPhone was first released in 2007. Imagine if it had come out in 1997 instead. We wouldn’t even know what to do with it.
Link 2024-11-11 Binary vector embeddings are so cool:
Evan Schwartz:
Vector embeddings by themselves are pretty neat. Binary quantized vector embeddings are extra impressive. In short, they can retain 95+% retrieval accuracy with 32x compression and ~25x retrieval speedup.
It's so unintuitive how well this trick works: take a vector of 1024x4 byte floating point numbers (4096 bytes = 32,768 bits), turn that into an array of single bits for > 0 or <= 0 which reduces it to just 1024 bits or 128 bytes - a 1/32 reduction.
Now you can compare vectors using a simple Hamming distance - a count of the number of bits that differ - and yet still get embedding similarity scores that are only around 10% less accurate than if you had used the much larger floating point numbers.
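Here's a minimal numpy sketch of the trick as described - quantize each dimension down to a single bit, then compare with Hamming distance:

import numpy as np

def binary_quantize(vector):
    # One bit per dimension: 1 if the component is > 0, else 0
    bits = (np.asarray(vector) > 0).astype(np.uint8)
    return np.packbits(bits)  # 1024 floats (4096 bytes) -> 128 bytes

def hamming_distance(a, b):
    # Count the bits that differ between two packed bit arrays
    return int(np.unpackbits(a ^ b).sum())

# Illustrative example with two random 1024-dimensional "embeddings"
rng = np.random.default_rng(42)
e1, e2 = rng.normal(size=1024), rng.normal(size=1024)
q1, q2 = binary_quantize(e1), binary_quantize(e2)
print(q1.nbytes, "bytes each; Hamming distance:", hamming_distance(q1, q2))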
Evan digs into models that this works for, which include OpenAI's text-embedding-3-large and the small but powerful all-MiniLM-L6-v2.
Link 2024-11-11 How I ship projects at big tech companies:
This piece by Sean Goedecke on shipping features at larger tech companies is fantastic.
Why do so many engineers think shipping is easy? I know it sounds extreme, but I think many engineers do not understand what shipping even is inside a large tech company. What does it mean to ship? It does not mean deploying code or even making a feature available to users. Shipping is a social construct within a company. Concretely, that means that a project is shipped when the important people at your company believe it is shipped.
Sean emphasizes communication, building confidence and gaining trust and the importance of deploying previews of the feature (for example using feature flags) as early as possible to get that crucial internal buy-in and feedback from other teams.
I think a lot of engineers hold off on deploys essentially out of fear. If you want to ship, you need to do the exact opposite: you need to deploy as much as you can as early as possible, and you need to do the scariest changes as early as you can possibly do them. Remember that you have the most end-to-end context on the project, which means you should be the least scared of scary changes.
Link 2024-11-12 Ars Live: Our first encounter with manipulative AI:
I'm participating in a live conversation with Benj Edwards on 19th November reminiscing over that incredible time back in February last year when Bing went feral.
Link 2024-11-13 django-plugin-django-debug-toolbar:
Tom Viner built a plugin for my DJP Django plugin system that configures the excellent django-debug-toolbar debugging tool.
You can see everything it sets up for you in this Python code: it configures installed apps, URL patterns and middleware, and sets the INTERNAL_IPS and DEBUG settings.
Here are Tom's running notes as he created the plugin.
Link 2024-11-13 Ollama: Llama 3.2 Vision:
Ollama released version 0.4 last week with support for Meta's first Llama vision model, Llama 3.2.
If you have Ollama installed you can fetch the 11B model (7.9 GB) like this:
ollama pull llama3.2-vision
Or the larger 90B model (55GB download, likely needs ~88GB of RAM) like this:
ollama pull llama3.2-vision:90b
I was delighted to learn that Sukhbinder Singh had already contributed support for LLM attachments to Sergey Alexandrov's llm-ollama plugin, which means the following works once you've pulled the models:
llm install --upgrade llm-ollama
llm -m llama3.2-vision:latest 'describe' \
-a https://static.simonwillison.net/static/2024/pelican.jpg
This image features a brown pelican standing on rocks, facing the camera and positioned to the left of center. The bird's long beak is a light brown color with a darker tip, while its white neck is adorned with gray feathers that continue down to its body. Its legs are also gray.
In the background, out-of-focus boats and water are visible, providing context for the pelican's environment.
That's not a bad description of this image, especially for a 7.9GB model that runs happily on my MacBook Pro.
Quote 2024-11-13
This tutorial exists because of a particular quirk of mine: I love to write tutorials about things as I learn them. This is the backstory of TRPL, of which an ancient draft was "Rust for Rubyists." You only get to look at a problem as a beginner once, and so I think writing this stuff down is interesting. It also helps me clarify what I'm learning to myself.