In this newsletter:
Trying out llama.cpp's new vision support
Plus 11 links and 3 quotations and 1 TIL and 3 notes
Trying out llama.cpp's new vision support - 2025-05-10
This llama.cpp server vision support via libmtmd pull request - via Hacker News - was merged earlier today. The PR finally adds full support for vision models to the excellent llama.cpp project. It's documented on this page, with the more detailed technical notes covered here. Here are my notes on getting it working on a Mac.
llama.cpp models are usually distributed as .gguf files. This project introduces a new variant of those called mmproj, for multimodal projector. libmtmd is the new library for handling these.
You can try it out by compiling llama.cpp from source, but I found another option that works: you can download pre-compiled binaries from the GitHub releases. On macOS there's an extra step to jump through to get these working, which I'll describe below.
Update: it turns out the Homebrew package for llama.cpp turns things around extremely quickly. You can run brew install llama.cpp or brew upgrade llama.cpp and start running the below tools without any extra steps.
I downloaded the llama-b5332-bin-macos-arm64.zip file from this GitHub release and unzipped it, which created a build/bin directory.
That directory contains a bunch of binary executables and a whole lot of .dylib files. macOS wouldn't let me execute these files because they were quarantined. Running this command fixed that for the llama-mtmd-cli and llama-server executables and the .dylib files they needed:
sudo xattr -rd com.apple.quarantine llama-server llama-mtmd-cli *.dylib
Now I can run an interactive terminal LLM session using this command:
./llama-mtmd-cli -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
The first time this runs it fetches four files from unsloth/gemma-3-4b-it-GGUF - a quantized copy of Google DeepMind's Gemma 3 4B - and stores them in ~/Library/Caches/llama.cpp:
2.4G unsloth_gemma-3-4b-it-GGUF_gemma-3-4b-it-UD-Q4_K_XL.gguf
217B unsloth_gemma-3-4b-it-GGUF_gemma-3-4b-it-UD-Q4_K_XL.gguf.json
812M unsloth_gemma-3-4b-it-GGUF_mmproj-F16.gguf
203B unsloth_gemma-3-4b-it-GGUF_mmproj-F16.gguf.json
Once they've downloaded an interactive session starts. You can chat directly with the model or use the /image path-to-image.jpg command to load an image into the context.
I loaded in a pelican photo and prompted "describe it" and got back this:
It’s a striking photograph of a brown pelican standing in shallow water on a sandy beach. Here's a breakdown of what I see:
Subject: The main focus is a large brown pelican. It’s standing with its wings fully extended, creating a dramatic pose.
Pose: The pelican is in a slightly awkward, almost comical stance, with its legs planted firmly in the wet sand and its wings spread wide. It looks like it might be preparing to take off or is simply enjoying the water.
Color & Texture: The pelican is a rich, dark brown with lighter markings on its wings and head. The sand is a muted beige color, and the water reflects the sunlight, creating a shimmering effect. You can see the texture of the sand and the ripples in the water.
Lighting: The lighting appears to be bright and sunny, casting shadows and highlighting the details of the pelican's feathers.
Background: The background is blurred, suggesting a shallow depth of field, which keeps the focus on the pelican. It appears to be a beach scene.
Overall Impression: The image has a dynamic feel due to the pelican’s pose and the movement implied by the water. It’s a well-composed shot that captures a moment of wildlife in its natural habitat.
Do you want me to zoom in on a particular detail, or perhaps tell you more about pelicans?
Not bad for a 3.2GB model running on my laptop!
Running llama-server
Even more fun is the llama-server command. This starts a localhost web server running on port 8080 to serve the model, with both a web UI and an OpenAI-compatible API endpoint.
The command to run it is the same:
./llama-server -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
Now visit http://localhost:8080 in your browser to start interacting with the model:
It miscounted the pelicans in the group photo, but again, this is a tiny 3.2GB model.
With the server running on port 8080 you can also access the OpenAI-compatible API endpoint. Here's how to do that using curl:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Describe a pelicans ideal corporate retreat"}
]
}' | jq
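The endpoint speaks the standard OpenAI chat completions protocol, so the official openai Python client should work pointed at it too. Here's a rough sketch - the API key is a placeholder (llama-server doesn't need one) and the model name is largely ignored since the server only has one model loaded:
# Sketch: use the openai client against llama-server's OpenAI-compatible endpoint
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gemma-3-4b-it",  # placeholder - llama-server serves whatever model it loaded
    messages=[
        {"role": "user", "content": "Describe a pelican's ideal corporate retreat"}
    ],
)
print(response.choices[0].message.content)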
I built a new plugin for LLM just now called llm-llama-server to make interacting with this API more convenient. You can use that like this:
llm install llm-llama-server
llm -m llama-server 'invent a theme park ride for a pelican'
Or for vision models use llama-server-vision:
llm -m llama-server-vision 'describe this image' -a https://static.simonwillison.net/static/2025/pelican-group.jpg
The LLM plugin uses the streaming API, so responses will stream back to you as they are being generated.
Link 2025-05-07 astral-sh/ty:
Astral have been working on this "extremely fast Python type checker and language server, written in Rust" quietly but in-the-open for a while now. Here's the first public alpha release - albeit not yet announced - as ty on PyPI (nice donated two-letter name!).
You can try it out via uvx like this - run the command in a folder full of Python code and see what comes back:
uvx ty check
I got zero errors for my recent, simple condense-json library and a ton of errors for my more mature sqlite-utils library - output here.
It really is fast:
cd /tmp
git clone https://github.com/simonw/sqlite-utils
cd sqlite-utils
time uvx ty check
Reports it running in around a tenth of a second (0.109 total wall time) using multiple CPU cores:
uvx ty check 0.18s user 0.07s system 228% cpu 0.109 total
Running time uvx mypy . in the same folder (both after first ensuring the underlying tools had been cached) took around 7x longer:
uvx mypy . 0.46s user 0.09s system 74% cpu 0.740 total
This isn't a fair comparison yet as ty still isn't feature complete in comparison to mypy.
Link 2025-05-07 llm-prices.com:
I've been maintaining a simple LLM pricing calculator since October last year. I finally decided to split it out to its own domain name (previously it was hosted at tools.simonwillison.net/llm-prices), running on Cloudflare Pages.
The site runs out of my simonw/llm-prices GitHub repository. I ported the history of the old llm-prices.html file using a vibe-coded bash script that I forgot to save anywhere.
I rarely use AI-generated imagery in my own projects, but for this one I found an excellent reason to use GPT-4o image outputs... to generate the favicon! I dropped a screenshot of the site into ChatGPT (o4-mini-high in this case) and asked for the following:
design a bunch of options for favicons for this site in a single image, white background
I liked the top right one, so I cropped it in Pixelmator and made a 32x32 version. Here's what it looks like in my browser:
I added a new feature just now: the state of the calculator is now reflected in the #fragment-hash URL of the page, which means you can link to your previous calculations.
I implemented that feature using the new gemini-2.5-pro-preview-05-06, since that model boasts improved front-end coding abilities. It did a pretty great job - here's how I prompted it:
llm -m gemini-2.5-pro-preview-05-06 -f https://www.llm-prices.com/ -s 'modify this code so that the state of the page is reflected in the fragmenth hash URL - I want to capture the values filling out the form fields and also the current sort order of the table. These should be respected when the page first loads too. Update them using replaceHistory, no need to enable the back button.'
Here's the transcript and the commit updating the tool, plus an example link showing the new feature in action (and calculating the cost for that Gemini 2.5 Pro prompt at 16.8224 cents, after fixing the calculation).
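The arithmetic behind the calculator is nothing fancy. Here's a sketch of the formula it implements - the token counts and prices below are placeholders, not the real numbers from that prompt:
# Sketch of the per-million-token pricing formula llm-prices.com implements
def prompt_cost_cents(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    dollars = (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )
    return dollars * 100

# Placeholder token counts and prices, purely for illustration
print(prompt_cost_cents(100_000, 4_000, 1.25, 10.00))  # roughly 16.5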
Link 2025-05-07 Medium is the new large:
New model release from Mistral - this time closed source/proprietary. Mistral Medium claims strong benchmark scores similar to GPT-4o and Claude 3.7 Sonnet, but is priced at $0.40/million input and $2/million output - about the same price as GPT 4.1 Mini. For comparison, GPT-4o is $2.50/$10 and Claude 3.7 Sonnet is $3/$15.
The model is a vision LLM, accepting both images and text.
More interesting than the price is the deployment model. Mistral Medium may not be open weights but it is very much available for self-hosting:
Mistral Medium 3 can also be deployed on any cloud, including self-hosted environments of four GPUs and above.
Mistral's other announcement today is Le Chat Enterprise. This is a suite of tools that can integrate with your company's internal data and provide "agents" (these look similar to Claude Projects or OpenAI GPTs), again with the option to self-host.
Is there a new open weights model coming soon? This note tucked away at the bottom of the Mistral Medium 3 announcement seems to hint at that:
With the launches of Mistral Small in March and Mistral Medium today, it's no secret that we're working on something 'large' over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we're excited to 'open' up what's to come :)
I released llm-mistral 0.12 adding support for the new model.
Link 2025-05-07 Create and edit images with Gemini 2.0 in preview:
Gemini 2.0 Flash has had image generation capabilities for a while now, and they're now available via the paid Gemini API - at 3.9 cents per generated image.
According to the API documentation you need to use the new gemini-2.0-flash-preview-image-generation model ID and specify {"responseModalities":["TEXT","IMAGE"]} as part of your request.
Here's an example that calls the API using curl (and fetches a Gemini key from the llm keys get store):
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$(llm keys get gemini)" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [
{"text": "Photo of a raccoon in a trash can with a paw-written sign that says I love trash"}
]
}],
"generationConfig":{"responseModalities":["TEXT","IMAGE"]}
}' > /tmp/raccoon.json
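The generated image comes back base64-encoded inside that JSON. Here's a quick sketch for extracting it - this assumes the usual response shape, with the image under candidates[0].content.parts[].inlineData, so adjust the keys if your response differs:
# Sketch: pull the base64-encoded image out of the saved response JSON
import base64, json

with open("/tmp/raccoon.json") as f:
    data = json.load(f)

for part in data["candidates"][0]["content"]["parts"]:
    inline = part.get("inlineData")
    if inline:
        with open("/tmp/raccoon.png", "wb") as out:
            out.write(base64.b64decode(inline["data"]))
        print("Wrote /tmp/raccoon.png", inline.get("mimeType"))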
Here's the response. I got Gemini 2.5 Pro to vibe-code me a new debug tool for visualizing that JSON. If you visit that tool and click the "Load an example" link you'll see the result of the raccoon image visualized:
The other prompt I tried was this one:
Provide a vegetarian recipe for butter chicken but with chickpeas not chicken and include many inline illustrations along the way
The result of that one was a 41MB JSON file(!) containing 28 images - which presumably cost over a dollar since images are 3.9 cents each.
Some of the illustrations it chose for that one were somewhat unexpected:
If you want to see that one you can click the "Load a really big example" link in the debug tool, then wait for your browser to fetch and render the full 41MB JSON file.
The most interesting feature of Gemini (as with GPT-4o images) is the ability to accept images as inputs. I tried that out with this pelican photo like this:
cat > /tmp/request.json << EOF
{
"contents": [{
"parts":[
{"text": "Modify this photo to add an inappropriate hat"},
{
"inline_data": {
"mime_type":"image/jpeg",
"data": "$(base64 -i pelican.jpg)"
}
}
]
}],
"generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
EOF
# Execute the curl command with the JSON file
curl -X POST \
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key='$(llm keys get gemini) \
-H 'Content-Type: application/json' \
-d @/tmp/request.json \
> /tmp/out.json
And now the pelican is wearing a hat:
Link 2025-05-07 Introducing web search on the Anthropic API:
Anthropic's web search (presumably still powered by Brave) is now also available through their API, in the shape of a new web search tool called web_search_20250305.
You can specify a maximum number of uses per prompt and you can also pass a list of disallowed or allowed domains, plus hints as to the user's current location.
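Here's roughly what that looks like with the anthropic Python client - a sketch based on my reading of the docs, so treat the exact parameter names (max_uses, allowed_domains, user_location) as assumptions:
# Sketch: calling Claude with the new web search tool enabled
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's new on sqlite.org this month?"}],
    tools=[{
        "type": "web_search_20250305",
        "name": "web_search",
        "max_uses": 3,                      # cap searches per prompt
        "allowed_domains": ["sqlite.org"],  # restrict which sites it can use
        "user_location": {"type": "approximate", "city": "San Francisco", "country": "US"},
    }],
)
print(response.content)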
Search results are returned in a format that looks similar to the Anthropic Citations API.
It's charged at $10 per 1,000 searches, which is a little more expensive than what the Brave Search API charges ($3 or $5 or $9 per thousand depending on how you're using them).
I couldn't find any details of additional rules surrounding storage or display of search results, which surprised me because both Google Gemini and OpenAI have these for their own API search results.
Link 2025-05-08 llm-gemini 0.19.1:
Bugfix release for my llm-gemini plugin, which was recording the number of output tokens (needed to calculate the price of a response) incorrectly for the Gemini "thinking" models. Those models turn out to return candidatesTokenCount and thoughtsTokenCount as two separate values which need to be added together to get the total billed output token count. Full details in this issue.
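In other words the fix boils down to something like this, using the field names above:
# Sketch: total billed output tokens for Gemini "thinking" models
def billed_output_tokens(usage_metadata):
    return (usage_metadata.get("candidatesTokenCount", 0)
            + usage_metadata.get("thoughtsTokenCount", 0))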
I spotted this potential bug in this response log this morning, and my concerns were confirmed when Paul Gauthier wrote about a similar fix in Aider in Gemini 2.5 Pro Preview 03-25 benchmark cost, where he noted that the $6.32 cost recorded to benchmark Gemini 2.5 Pro Preview 03-25 was incorrect. Since that model is no longer available (despite the date-based model alias persisting) Paul is not able to accurately calculate the new cost, but it's likely a lot more since the Gemini 2.5 Pro Preview 05-06 benchmark cost $37.
I've gone through my gemini tag and attempted to update my previous posts with new calculations - this mostly involved increases in the order of 12.336 cents to 16.316 cents (as seen here).
Quote 2025-05-08
But I’ve also had my own quiet concerns about what [vibe coding] means for early-career developers. So much of how I learned came from chasing bugs in broken tutorials and seeing how all the pieces connected, or didn’t. There was value in that. And maybe I’ve been a little protective of it.
A mentor challenged that. He pointed out that debugging AI generated code is a lot like onboarding into a legacy codebase, making sense of decisions you didn’t make, finding where things break, and learning to trust (or rewrite) what’s already there. That’s the kind of work a lot of developers end up doing anyway.
Quote 2025-05-08
Microservices only pay off when you have real scaling bottlenecks, large teams, or independently evolving domains. Before that? You’re paying the price without getting the benefit: duplicated infra, fragile local setups, and slow iteration.
Link 2025-05-08 Reservoir Sampling:
Yet another outstanding interactive essay by Sam Rose (previously), this time explaining how reservoir sampling can be used to select a "fair" random sample when you don't know how many options there are and don't want to accumulate them before making a selection.
Reservoir sampling is one of my favourite algorithms, and I've been wanting to write about it for years now. It allows you to solve a problem that at first seems impossible, in a way that is both elegant and efficient.
I appreciate that Sam starts the article with "No math notation, I promise." Lots of delightful widgets to interact with here, all of which help build an intuitive understanding of the underlying algorithm.
Sam shows how this algorithm can be applied to the real-world problem of sampling log files when incoming logs threaten to overwhelm a log aggregator.
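For reference, the core algorithm (Algorithm R) fits in a few lines of Python. This is the textbook version, not Sam's code:
import random

def reservoir_sample(iterable, k):
    # Keep the first k items, then give the n-th item a k/n chance of
    # replacing a random slot - every item ends up equally likely to be kept.
    reservoir = []
    for n, item in enumerate(iterable, start=1):
        if n <= k:
            reservoir.append(item)
        else:
            j = random.randint(1, n)
            if j <= k:
                reservoir[j - 1] = item
    return reservoir

print(reservoir_sample(range(1_000_000), 5))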
The dog illustration is commissioned art and the MIT-licensed code is available on GitHub.
Quote 2025-05-08
If Claude is asked to count words, letters, and characters, it thinks step by step before answering the person. It explicitly counts the words, letters, or characters by assigning a number to each. It only answers the person once it has performed this explicit counting step. [...]
If Claude is shown a classic puzzle, before proceeding, it quotes every constraint or premise from the person’s message word for word before inside quotation marks to confirm it’s not dealing with a new variant. [...]
If asked to write poetry, Claude avoids using hackneyed imagery or metaphors or predictable rhyming schemes.
Link 2025-05-08 SQLite CREATE TABLE: The DEFAULT clause:
If your SQLite create table statement includes a line like this:
CREATE TABLE alerts (
-- ...
alert_created_at text default current_timestamp
)
current_timestamp will be replaced with a UTC timestamp in the format 2025-05-08 22:19:33. You can also use current_time for HH:MM:SS and current_date for YYYY-MM-DD, again using UTC.
Posting this here because I hadn't previously noticed that this defaults to UTC, which is a useful detail. It's also a strong vote in favor of YYYY-MM-DD HH:MM:SS as a string format for use with SQLite, which doesn't otherwise provide a formal datetime type.
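Here's a quick way to see that UTC behavior from Python's standard library sqlite3 module:
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE alerts (
        id INTEGER PRIMARY KEY,
        alert_created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
db.execute("INSERT INTO alerts DEFAULT VALUES")
print(db.execute("SELECT alert_created_at FROM alerts").fetchone())
# e.g. ('2025-05-08 22:19:33',) - UTC, not local time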
Link 2025-05-09 Gemini 2.5 Models now support implicit caching:
I just spotted a cacheTokensDetails key in the token usage JSON while running a long chain of prompts against Gemini 2.5 Flash - despite not configuring caching myself:
{"cachedContentTokenCount": 200658, "promptTokensDetails": [{"modality": "TEXT", "tokenCount": 204082}], "cacheTokensDetails": [{"modality": "TEXT", "tokenCount": 200658}], "thoughtsTokenCount": 2326}
I went searching and it turns out Gemini had a massive upgrade to their prompt caching earlier today:
Implicit caching directly passes cache cost savings to developers without the need to create an explicit cache. Now, when you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit. We will dynamically pass cost savings back to you, providing the same 75% token discount. [...]
To make more requests eligible for cache hits, we reduced the minimum request size for 2.5 Flash to 1024 tokens and 2.5 Pro to 2048 tokens.
Previously you needed to both explicitly configure the cache and pay a per-hour charge to keep that cache warm.
This new mechanism is so much more convenient! It imitates how both DeepSeek and OpenAI implement prompt caching, leaving Anthropic as the remaining large provider that requires you to manually configure prompt caching to get it to work.
Gemini's explicit caching mechanism is still available. The documentation says:
Explicit caching is useful in cases where you want to guarantee cost savings, but with some added developer work.
With implicit caching the cost savings aren't possible to predict in advance, especially since the cache timeout within which a prefix will be discounted isn't described and presumably varies based on load and other circumstances outside of the developer's control.
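You can at least see after the fact how much of a prompt was served from the cache. Here's a quick sketch using the key names from the usage block above:
# Sketch: what fraction of the prompt tokens were implicit cache hits?
usage = {
    "cachedContentTokenCount": 200658,
    "promptTokensDetails": [{"modality": "TEXT", "tokenCount": 204082}],
}
prompt_tokens = usage["promptTokensDetails"][0]["tokenCount"]
cached = usage["cachedContentTokenCount"]
print(f"{cached / prompt_tokens:.1%} of prompt tokens hit the cache")  # 98.3%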
Update: DeepMind's Philipp Schmid:
There is no fixed time, but it's should be a few minutes.
Link 2025-05-09 sqlite-utils 4.0a0:
New alpha release of sqlite-utils, my Python library and CLI tool for manipulating SQLite databases.
It's the first 4.0 alpha because there's a (minor) backwards-incompatible change: I've upgraded the .upsert() and .upsert_all() methods to use SQLite's UPSERT mechanism, INSERT INTO ... ON CONFLICT DO UPDATE. Details in this issue.
That feature was added to SQLite in version 3.24.0, released 2018-06-04. I'm pretty cautious about my SQLite version support since the underlying library can be difficult to upgrade, depending on your platform and operating system.
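Here's a sketch of what the change means in practice. The .upsert() call itself stays the same - it's the SQL underneath that switches to UPSERT (the statement in the comment is illustrative, not the exact SQL the library generates):
import sqlite_utils

db = sqlite_utils.Database("example.db")
# First call inserts the row, second call with the same primary key updates it
db["dogs"].upsert({"id": 1, "name": "Cleo", "age": 4}, pk="id")
db["dogs"].upsert({"id": 1, "name": "Cleo", "age": 5}, pk="id")
print(list(db["dogs"].rows))
# Roughly equivalent to:
#   INSERT INTO dogs (id, name, age) VALUES (?, ?, ?)
#     ON CONFLICT (id) DO UPDATE SET name = excluded.name, age = excluded.age;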
I'm going to leave the new alpha to bake for a little while before pushing a stable release. Since this is a major version bump I'm going to take the opportunity to see if there are any other minor API warts that I can clean up at the same time.
Note 2025-05-09
I had some notes in a GitHub issue thread in a private repository that I wanted to export as Markdown. I realized that I could get them using a combination of several recent projects.
Here's what I ran:
export GITHUB_TOKEN="$(llm keys get github)"
llm -f issue:https://github.com/simonw/todos/issues/170 \
-m echo --no-log | jq .prompt -r > notes.md
I have a GitHub personal access token stored in my LLM keys, for use with Anthony Shaw's llm-github-models plugin.
My own llm-fragments-github plugin expects an optional GITHUB_TOKEN environment variable, so I set that first - here's an issue to have it use the github key instead.
With that set, the issue: fragment loader can take a URL to a private GitHub issue thread and load it via the API using the token, then concatenate the comments together as Markdown. Here's the code for that.
Fragments are meant to be used as input to LLMs. I built a llm-echo plugin recently which adds a fake LLM called "echo" which simply echoes its input back out again.
Adding --no-log prevents that junk data from being stored in my LLM log database.
The output is JSON with a "prompt" key for the original prompt. I use jq .prompt to extract that out, then -r to get it as raw text (not a "JSON string").
... and I write the result to notes.md.
Link 2025-05-10 TIL: SQLite triggers:
I've been doing some work with SQLite triggers recently while working on sqlite-chronicle, and I decided I needed a single reference to exactly which triggers are executed for which SQLite actions and what data is available within those triggers.
I wrote this triggers.py script to output as much information about triggers as possible, then wired it into a TIL article using Cog. The Cog-powered source code for the TIL article can be seen here.
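For flavor, here's a minimal trigger example using Python's sqlite3 module - my own toy example, not one from the TIL:
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE audit_log (
        document_id INTEGER,
        action TEXT,
        logged_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- AFTER INSERT triggers can read the just-written row via NEW
    CREATE TRIGGER documents_ai AFTER INSERT ON documents
    BEGIN
        INSERT INTO audit_log (document_id, action) VALUES (new.id, 'insert');
    END;
""")
db.execute("INSERT INTO documents (title) VALUES ('hello')")
print(db.execute("SELECT * FROM audit_log").fetchall())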
Note 2025-05-10 Poker Face season two just started on Peacock (the US streaming service). It's my favorite thing on TV right now. I've started threads on MetaFilter FanFare for episodes one, two and three.
Note 2025-05-11 Achievement unlocked: tap danced in the local community college dance recital.