My LLM CLI tool now supports self-hosted language models via plugins
Plus a podcast episode about ChatGPT Code Interpreter
In this newsletter:
My LLM CLI tool now supports self-hosted language models via plugins
Weeknotes: symbex, LLM prompt templates, a bit of a break
Plus 10 links and 5 quotations and 12 TILs
My LLM CLI tool now supports self-hosted language models via plugins - 2023-07-12
LLM is my command-line utility and Python library for working with large language models such as GPT-4. I just released version 0.5 with a huge new feature: you can now install plugins that add support for additional models to the tool, including models that can run on your own hardware.
Highlights of today's release:
Plugins to add support for 17 openly licensed models from the GPT4All project that can run directly on your device, plus Mosaic's MPT-30B self-hosted model and Google's PaLM 2 (via their API).
This means you can pip install (or brew install) models along with a CLI tool for using them!
A detailed tutorial describing how to build new plugins that add support for additional models.
A documented Python API for running prompts through any model provided by a plugin, plus a way of continuing a conversation across multiple prompts.
How to try it out
First, install LLM. You can install it using Homebrew:
brew install simonw/llm/llm
Or with pip:
pip install llm
Or pipx:
pipx install llm
The default tool can work with OpenAI's models via their API, provided you have an API key. You can see usage instructions for that here.
But let's do something more interesting than that: Let's install a model that can run on our own machine!
We'll use the new llm-gpt4all plugin, which installs models published by the GPT4All project by Nomic AI.
Install the plugin like this:
llm install llm-gpt4all
Now let's run a prompt against a small model. LLM will download the model file the first time you query that model.
We'll start with ggml-vicuna-7b-1, a 4.21GB download which should run if you have at least 8GB of RAM.
To run the prompt, try this:
llm -m ggml-vicuna-7b-1 "The capital of France?"
You'll see a progress bar showing the download of the model, followed by the answer to the prompt, generated a word at a time.
All prompts and responses are automatically logged to a SQLite database. Calling llm logs with a -n 1 argument will show the most recent record:
llm logs -n 1
This outputs something like the following:
[
{
"id": "01h549p8r12ac1980crbr9yhjf",
"model": "ggml-vicuna-7b-1",
"prompt": "The capital of France?",
"system": null,
"prompt_json": null,
"options_json": {},
"response": "Paris is the capital of France.",
"response_json": {
"full_prompt": "### Human: \nThe capital of France?\n### Assistant:\n"
},
"conversation_id": "01h549p8r0abz6ebwd7agmjmgy",
"duration_ms": 9511,
"datetime_utc": "2023-07-12T05:37:44.407233",
"conversation_name": "The capital of France?",
"conversation_model": "ggml-vicuna-7b-1"
}
]
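Since everything lands in a plain SQLite database, you can also explore your history from Python directly. Here's a minimal sketch - the database location and the responses table name are assumptions based on the record shown above; check where the file actually lives on your machine (recent versions of LLM can tell you via llm logs path) and inspect the schema before relying on it:
import sqlite3
from pathlib import Path

# Assumed default location on Linux - on macOS the directory is under
# ~/Library/Application Support/io.datasette.llm/ instead
db_path = Path.home() / ".config" / "io.datasette.llm" / "logs.db"

conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row

# "responses" matches the record structure shown above - verify the
# table name against your actual schema
for row in conn.execute(
    "select model, prompt, response from responses order by id desc limit 5"
):
    print(row["model"], "|", row["prompt"], "->", row["response"])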
You can see a full list of available models by running the llm models list command. The llm-gpt4all plugin adds 17 models to the tool:
llm models list
I've installed all three model plugins, so I see the following:
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
PaLM 2: chat-bison-001 (aliases: palm, palm2)
gpt4all: orca-mini-3b - Orca (Small), 1.80GB download, needs 4GB RAM (installed)
gpt4all: ggml-gpt4all-j-v1 - Groovy, 3.53GB download, needs 8GB RAM (installed)
gpt4all: orca-mini-7b - Orca, 3.53GB download, needs 8GB RAM (installed)
gpt4all: ggml-vicuna-7b-1 - Vicuna, 3.92GB download, needs 8GB RAM (installed)
gpt4all: ggml-mpt-7b-chat - MPT Chat, 4.52GB download, needs 8GB RAM (installed)
gpt4all: ggml-replit-code-v1-3b - Replit, 4.84GB download, needs 4GB RAM (installed)
gpt4all: ggml-vicuna-13b-1 - Vicuna (large), 7.58GB download, needs 16GB RAM (installed)
gpt4all: nous-hermes-13b - Hermes, 7.58GB download, needs 16GB RAM (installed)
gpt4all: ggml-model-gpt4all-falcon-q4_0 - GPT4All Falcon, 3.78GB download, needs 8GB RAM
gpt4all: ggml-wizardLM-7B - Wizard, 3.92GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-base - MPT Base, 4.52GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-instruct - MPT Instruct, 4.52GB download, needs 8GB RAM
gpt4all: orca-mini-13b - Orca (Large), 6.82GB download, needs 16GB RAM
gpt4all: GPT4All-13B-snoozy - Snoozy, 7.58GB download, needs 16GB RAM
gpt4all: ggml-nous-gpt4-vicuna-13b - Nous Vicuna, 7.58GB download, needs 16GB RAM
gpt4all: ggml-stable-vicuna-13B - Stable Vicuna, 7.58GB download, needs 16GB RAM
gpt4all: wizardLM-13B-Uncensored - Wizard Uncensored, 7.58GB download, needs 16GB RAM
Mpt30b: mpt30b (aliases: mpt)
In addition to the gpt4all models I can also run PaLM 2 from Google and mpt30b from Mosaic, as well as the four OpenAI models.
Models have aliases, so in some cases you can run llm -m mpt instead of llm -m mpt30b.
We'll try one more model. Google's PaLM 2 was released a few weeks ago, and can be accessed through their PaLM API.
Obtain an API key for that, and install the llm-palm plugin:
pip install llm-palm
Set your API key like this:
llm keys set palm
Now you can run prompts against it like this:
llm -m palm "Ten absurd names for a pet giraffe"
PaLM replies:
Here are ten absurd names for a pet giraffe:
Stretch
Necky
Long Legs
Tree Top
Tall Boy
High Five
Sky Walker
Cloud Chaser
Star Gazer
Horizon Hopper
I hope you find these names amusing!
This also gets logged to the database - run llm logs -n 1 again to see the log entry.
LLM supports continuing a conversation with more prompts. We can run another prompt through PaLM as part of the same conversation like this:
llm --continue "3 more and make them weirder"
PaLM replies:
Sure, here are three more absurd names for a pet giraffe, even weirder than the first ten:
Giraffey McFierceface
Longneck von Longneck
The Giraffe Whisperer
I hope you find these names even more amusing than the first ten!
Using -c/--continue will continue the most recent conversation. You can also pass a conversation ID (available in the output from llm logs) using --cid ID to reply to an older conversation thread.
Adding a new model
I've tried to make it as easy as possible to add support for additional models through writing plugins. The tutorial Writing a plugin to support a new model is extremely thorough, and includes detailed descriptions of how to start a new plugin, set up a development environment for it, integrate it with a new model and then package it for distribution.
The tutorial uses a Markov chain implementation as an example, possibly the simplest possible form of language model.
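To give a sense of the shape of a model plugin, here's a heavily condensed sketch along the lines of that tutorial - treat the details as illustrative rather than canonical, and see the tutorial itself for the real walkthrough:
import random
import llm

@llm.hookimpl
def register_models(register):
    register(Markov())

class Markov(llm.Model):
    model_id = "markov"

    def execute(self, prompt, stream, response, conversation):
        # Build a word -> possible-next-words table from the prompt itself
        words = prompt.prompt.split()
        transitions = {}
        for a, b in zip(words, words[1:]):
            transitions.setdefault(a, []).append(b)
        # Walk the chain, yielding one token at a time (assumes a
        # non-empty prompt; a real plugin would validate its input)
        word = random.choice(words)
        for _ in range(20):
            yield word + " "
            word = random.choice(transitions.get(word, words))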
The source code of the other existing plugins should help show how to integrate with more complex models:
llm_palm/__init__.py demonstrates how to integrate with a model exposed via Google's API library.
llm/default_plugins/openai_models.py demonstrates integration against the OpenAI APIs.
llm_gpt4all.py shows an integration with the gpt4all Python library.
llm_mpt30b.py demonstrates a direct integration against a model using the ctransformers library.
Using LLM from Python
LLM was originally designed to be used from the command-line, but in version 0.5 I've expanded it to work as a Python library as well.
The documentation for that is here, but here's the short version:
import llm
model = llm.get_model("gpt-3.5-turbo")
model.key = 'YOUR_API_KEY_HERE'
response = model.prompt("Five surprising names for a pet pelican")
for chunk in response:
    print(chunk, end="")
# Or wait for the whole response to be ready:
print(response.text())
Any model that can be installed via a plugin can be accessed in the same way.
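That includes the self-hosted models: assuming the llm-gpt4all plugin from earlier is installed, the same pattern works with no API key at all:
import llm

model = llm.get_model("ggml-vicuna-7b-1")
# The model file is downloaded on first use, just like on the command-line
print(model.prompt("The capital of France?").text())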
The API also supports conversations, where multiple prompts are sent to the model as part of the same persistent context:
import llm
model = llm.get_model("ggml-mpt-7b-chat")
conversation = model.conversation()
r1 = conversation.prompt("Capital of Spain?")
print(r1.text())
r2 = conversation.prompt("What language do they speak there?")
print(r2.text())
What's next?
You can follow ongoing LLM development in the GitHub repository issues.
My next priority is to get OpenAI functions working. I also want to give models from plugins the option to implement a similar mechanism, using the ReAct pattern.
I'll likely do this by implementing the concept of a "chain" of LLM calls, where a single prompt might lead to multiple calls being made to the LLM based on logic that decides if another call is necessary.
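As a purely hypothetical sketch of that idea (this is not LLM's API - the parse_tool_request helper and the prompt format here are made up for illustration):
def run_chain(model, user_prompt, tools, max_steps=5):
    # Keep calling the model until it stops requesting a tool
    prompt = user_prompt
    for _ in range(max_steps):
        reply = model.prompt(prompt).text()
        request = parse_tool_request(reply)  # hypothetical helper
        if request is None:
            return reply  # the model produced a final answer
        result = tools[request.name](request.arguments)
        # Feed the tool output back in as the next prompt
        prompt = f"Tool {request.name} returned: {result}\nContinue."
    return reply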
I'm also planning a web interface. I'm particularly excited about the potential for plugins here - I love the idea of plugins that provide new interfaces for interacting with language models that go beyond the chat interfaces we've mostly seen so far.
Link 2023-07-10 Latent Space: Code Interpreter == GPT 4.5: I presented as part of this Latent Space episode over the weekend, talking about the newly released ChatGPT Code Interpreter mode with swyx, Alex Volkov, Daniel Wilson and more. swyx did a great job editing our Twitter Spaces conversation into a podcast and writing up a detailed executive summary, posted here along with the transcript. If you're curious you can listen to the first 15 minutes to get a great high-level explanation of Code Interpreter, or stick around for the full two hours for all of the details.
Apparently our live conversation had 17,000+ listeners!
Weeknotes: symbex, LLM prompt templates, a bit of a break - 2023-06-27
I had a holiday to the UK for a family wedding anniversary and mostly took the time off... except for building symbex, which became one of those projects that kept on inspiring new features.
I've also been working on some major improvements to my LLM tool for working with language models from the command-line.
symbex
I introduced symbex in symbex: search Python code for functions and classes, then pipe them into a LLM. It's since grown a bunch more features across 12 total releases.
symbex is a tool for searching Python code. The initial goal was to make it quick to find and output the body of a specific Python function or class, such that you could then pipe it to LLM to process it with GPT-3.5 or GPT-4:
symbex find_symbol_nodes \
| llm -m gpt4 --system 'Describe this code succinctly'
Output:
This code defines a function find_symbol_nodes that takes in three arguments: code (string), filename (string), and symbols (iterable of strings). The function parses the given code and searches for AST nodes (Class, Function, AsyncFunction) that match the provided symbols. It returns a list of tuple pairs containing matched nodes and their corresponding class names or None.
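The core trick here is Python's ast module. Here's a rough sketch of the idea - a simplification for illustration, not symbex's actual implementation, and example.py is a placeholder filename:
import ast

def find_symbols(code, symbols):
    # Yield function and class definitions whose names match
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name in symbols:
                yield node

code = open("example.py").read()
for node in find_symbols(code, {"find_symbol_nodes"}):
    # ast.get_source_segment recovers the original source for a node
    print(ast.get_source_segment(code, node))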
When piping to a language model token count is really important - the goal is to provide the shortest amount of text that gives the model enough to produce interesting results.
So... I added a -s/--signatures option which returns just the function or class signature:
symbex find_symbol_nodes -s
Output:
# File: symbex/lib.py Line: 13
def find_symbol_nodes(code: str, filename: str, symbols: Iterable[str]) -> List[Tuple[(AST, Optional[str])]]
Add --docstrings to include the docstring. Add -i/--imports for an import line, and -n/--no-file to suppress that # File comment - so -in combines both of those options:
symbex find_symbol_nodes -s --docstrings -in
# from symbex.lib import find_symbol_nodes
def find_symbol_nodes(code: str, filename: str, symbols: Iterable[str]) -> List[Tuple[(AST, Optional[str])]]
"Returns ast Nodes matching symbols"
Being able to see type annotations and docstrings tells you a lot about the code. This gave me an idea for an extra set of features: filters that could be used to only return symbols that were documented, or undocumented, or included or were missing type signatures:
--async: Filter async functions
--function: Filter functions
--class: Filter classes
--documented: Filter functions with docstrings
--undocumented: Filter functions without docstrings
--typed: Filter functions with type annotations
--untyped: Filter functions without type annotations
--partially-typed: Filter functions with partial type annotations
--fully-typed: Filter functions with full type annotations
So now you can use symbex to get a feel for how well typed or documented your code is:
# See all symbols lacking a docstring:
symbex -s --undocumented
# All functions that are missing type annotations:
symbex -s --function --untyped
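Those type-annotation filters come down to inspecting annotations on the AST nodes. Here's a minimal sketch of the kind of check involved - not symbex's exact logic; real tooling would also special-case self and cls, for instance:
import ast

def is_fully_typed(func):
    # Every argument annotated and a return annotation present
    args = func.args.args + func.args.posonlyargs + func.args.kwonlyargs
    return func.returns is not None and all(
        arg.annotation is not None for arg in args
    )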
The README has comprehensive documentation on everything else the tool can do.
LLM prompt templates
My LLM tool is shaping up in some interesting directions as well.
The big new feature in this release is prompt templates.
A template is a file that looks like this:
system: Summarize this text in the voice of $voice
model: gpt-4
This template can be created by running llm templates edit summarize, which opens a text editor (using the $EDITOR environment variable).
Once saved, you can use it like this:
curl -s 'https://til.simonwillison.net/macos/imovie-slides-and-audio' | \
strip-tags -m | \
llm -t summarize -p voice 'Extremely sarcastic GlaDOS'
Oh, bravo, Simon. You've really outdone yourself. Apparently, the highlight of his day was turning an old talk into a video using iMovie. After a truly heart-stopping struggle with the Ken Burns effect, he finally, and I mean finally, tuned the slide duration to match the audio. And then, hold your applause, he met the enormous challenge of publishing it on YouTube. We were all waiting with bated breath. Oh, but wouldn't it be exciting to note that his estimated 1.03GB video was actually a shockingly smaller size? I can't contain my excitement. He also used Pixelmator for a custom title slide, as YouTube prefers a size of 1280x720px - ground-breaking information, truly.
The idea here is to make it easy to create reusable template snippets, for all sorts of purposes. git diff | llm -t diff could generate a commit message, cat file.py | llm -t explain could explain code, and so on.
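For example, a hypothetical diff template - not one that ships with LLM, you would create it yourself with llm templates edit diff - might look like this:
system: Write a concise git commit message describing this diff
model: gpt-3.5-turbo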
LLM plugins
These are still baking, but this is the feature I'm most excited about. I'm adding plugins to LLM, inspired by plugins in Datasette.
I'm planning the following categories of plugins to start with:
Command plugins. These will allow extra commands to be added to the llm tool - llm search or llm embed or similar.
Template plugins. Imagine being able to install extra prompt templates using llm install name-of-package.
Model plugins. I want LLM to be able to use more than just GPT-3.5 and GPT-4. I have a branch with an example plugin that can call Google's PaLM 2 model via Google Vertex, and I hope to support many other LLM families with additional plugins, including models that can run locally via llama.cpp and similar.
Function plugins. Once I get the new OpenAI functions mechanism working, I'd like to be able to install plugins that make new functions available to be executed by the LLM!
All of this is under active development at the moment. I'll write more about it once I get it working.
Entries these weeks
TIL these weeks
TOML in Python - 2023-06-26
Automatically maintaining Homebrew formulas using GitHub Actions - 2023-06-21
Using ChatGPT Browse to name a Python package - 2023-06-18
Syncing slide images and audio in iMovie - 2023-06-15
Using fs_usage to see what files a process is using - 2023-06-15
Running OpenAI's large context models using llm - 2023-06-13
Consecutive groups in SQL using window functions - 2023-06-08
Link 2023-06-19 Building Search DSLs with Django: Neat tutorial by Dan Lamanna: how to build a GitHub-style search feature - supporting modifiers like "is:open author:danlamanna" - using PyParsing and the Django ORM.
TIL 2023-06-21 Automatically maintaining Homebrew formulas using GitHub Actions:
I previously wrote about Packaging a Python CLI tool for Homebrew. I've now figured out a pattern for automatically updating those formulas over time, using GitHub Actions. …
Quote 2023-06-22
Back then [in 2012], no one was thinking about AI. You just keep uploading your images [to Adobe Stock] and you get your residuals every month and life goes on — then all of a sudden, you find out that they trained their AI on your images and on everybody’s images that they don’t own. And they’re calling it ‘ethical’ AI.
Quote 2023-06-23
Every year, some generation of engineers have to learn the concepts of "there is no silver bullet", "use the right tech for the right problem", "your are not google", "rewriting a codebase every 2 years is not a good business decision", "things cost money".
TIL 2023-06-26 TOML in Python:
I finally got around to fully learning TOML. Some notes, including how to read and write it from Python. …
Link 2023-06-27 Status of Python Versions: Very clear and useful page showing the exact status of different Python versions. 3.7 reaches end of life today (no more security updates), while 3.11 will continue to be supported until October 2027.
TIL 2023-06-29 CLI tools hidden in the Python standard library:
Seth Michael Larson pointed out that the Python gzip module can be used as a CLI tool like this: …
Link 2023-06-29 abacaj/mpt-30B-inference: MPT-30B, released last week, is an extremely capable Apache 2 licensed open source language model. This repo shows how it can be run on a CPU, using the ctransformers Python library based on GGML. Following the instructions in the README got me a working MPT-30B model on my M2 MacBook Pro. The model is a 19GB download and it takes a few seconds to start spitting out tokens, but it works as advertised.
TIL 2023-06-29 Bulk editing status in GitHub Projects:
GitHub Projects has a mechanism for bulk updating the status of items, but it's pretty difficult to figure out how to do it. …
Link 2023-06-30 Databricks Signs Definitive Agreement to Acquire MosaicML, a Leading Generative AI Platform: MosaicML are the team behind MPT-7B and MPT-30B, two of the most impressive openly licensed LLMs. They just got acquired by Databricks for $1.3 billion.
TIL 2023-06-30 A Discord bot to expand issue links to a private GitHub repository:
I have a private Discord channel and a private GitHub repository. …
TIL 2023-06-30 Local wildcard DNS on macOS with dnsmasq:
I wanted to get wildcard DNS running on my Mac laptop, for development purposes. I wanted http://anything.mysite.lan/ to point to my localhost IP address. …
Quote 2023-07-01
Once you've found something you're excessively interested in, the next step is to learn enough about it to get you to one of the frontiers of knowledge. Knowledge expands fractally, and from a distance its edges look smooth, but once you learn enough to get close to one, they turn out to be full of gaps.
TIL 2023-07-02 Custom Jinja template tags with attributes:
I decided to implement a custom Jinja template block tag for my datasette-render-markdown plugin. I wanted the tag to work like this: …
TIL 2023-07-02 Syntax highlighted code examples in Datasette:
I wanted to add syntax highlighting to the new tutorial Data analysis with SQLite and Python. …
Link 2023-07-02 Data analysis with SQLite and Python: I turned my 2hr45m workshop from PyCon into the latest official tutorial on the Datasette website. It includes an extensive handout which should be useful independently of the video itself.
Link 2023-07-04 Stamina: tutorial: Stamina is Hynek's new Python library that implements an opinionated wrapper on top of Tenacity, providing a decorator for easily implementing exponential backoff retries. This tutorial includes a concise, clear explanation as to why this is such an important concept in building distributed systems.
TIL 2023-07-08 Python packages with pyproject.toml and nothing else:
I've been using setuptools and setup.py for my Python packages for a long time: I like that it works without me having to think about installing and learning any additional tools such as Flit or pip-tools or Poetry or Hatch. …
Link 2023-07-08 Tech debt metaphor maximalism: I've long been a fan of the metaphor of technical debt, because it implies that taking on some debt is OK provided you're strategic about how much you take on and how quickly you pay it off. Avery Pennarun provides the definitive guide to thinking about technical debt, including an extremely worthwhile explanation of how financial debt works as well.
Quote 2023-07-09
It feels pretty likely that prompting or chatting with AI agents is going to be a major way that we interact with computers into the future, and whereas there’s not a huge spread in the ability between people who are not super good at tapping on icons on their smartphones and people who are, when it comes to working with AI it seems like we’ll have a high dynamic range. Prompting opens the door for non-technical virtuosos in a way that we haven’t seen with modern computers, outside of maybe Excel.
TIL 2023-07-10 Using OpenAI functions and their Python library for data extraction:
Here's the pattern I figured out for using the openai Python library to extract structured data from text using a single call to the model. …
Link 2023-07-10 Why We Replaced Firecracker with QEMU: Hocus are building a self-hosted alternative to cloud development environment tools like GitPod and Codespaces. They moved away from Firecracker because it's optimized for short-running (AWS Lambda style) functions - which means it never releases allocated RAM or storage volume space back to the host machine unless the container is entirely restarted. It also lacks GPU support.
TIL 2023-07-10 Using git-filter-repo to set commit dates to author dates:
After rebasing a branch with 60+ commits onto main I was disappointed to see that the commit dates on the commits (which are a different thing from the author dates) had all been reset to the same time. This meant the GitHub default UI for commits implied everything had been written at the same moment. …
Quote 2023-07-10
At The Guardian we had a pretty direct way to fix this [the problem of zombie feature flags]: experiments were associated with expiry dates, and if your team's experiments expired the build system simply wouldn't process your jobs without outside intervention. Seems harsh, but I've found with many orgs the only way to fix negative externalities in a shared codebase is a tool that says "you broke your promises, now we break your builds".
Link 2023-07-10 Lima VM - Linux Virtual Machines On macOS: This looks really useful: "brew install lima" to install, then "limactl start default" to start an Ubuntu VM running and "lima" to get a shell. Julia Evans wrote about the tool this morning, and here Adam Gordon Bell includes details on adding a writable directory (by default lima mounts your macOS home directory in read-only mode).
TIL 2023-07-10 Quickly testing code in a different Python version using pyenv:
I had a bug that was only showing up in CI against Python 3.8. …