In this newsletter:
Interesting ideas in Observable Framework
Weeknotes: Getting ready for NICAR
Plus 20 links and 2 quotations and 2 TILs
Interesting ideas in Observable Framework - 2024-03-03
Mike Bostock, Announcing: Observable Framework:
Today we’re launching Observable 2.0 with a bold new vision: an open-source static site generator for building fast, beautiful data apps, dashboards, and reports.
Our mission is to help teams communicate more effectively with data. Effective presentation of data is critical for deep insight, nuanced understanding, and informed decisions. Observable notebooks are great for ephemeral, ad hoc data exploration. But notebooks aren't well-suited for polished dashboards and apps.
Enter Observable Framework.
There are a lot of really interesting ideas in Observable Framework.
A static site generator for data projects and dashboards
At its heart, Observable Framework is a static site generator. You give it a mixture of Markdown and JavaScript (and potentially other languages too) and it compiles them all together into fast loading interactive pages.
It ships with a full featured hot-reloading server, so you can edit those files in your editor, hit save and see the changes reflected instantly in your browser.
Once you're happy with your work you can run a build command to turn it into a set of static files ready to deploy to a server - or you can use the `npm run deploy` command to deploy it directly to Observable's own authenticated sharing platform.
JavaScript in Markdown
The key to the design of Observable Framework is the way it uses JavaScript in Markdown to create interactive documents.
Here's what that looks like:
# This is a document
Markdown content goes here.
This will output 1870:
```js
34 * 55
```
And here's the current date and time, updating constantly:
```js
new Date(now)
```
The same thing as an inline string: ${new Date(now)}
Any Markdown code block tagged `js` will be executed as JavaScript in the user's browser. This is an incredibly powerful abstraction - anything you can do in JavaScript (which these days is effectively anything at all) can now be seamlessly integrated into your document.
In the above example the `now` value is interesting - it's a special variable that provides the current time in milliseconds since the epoch, updating constantly. Because `now` updates constantly, the display value of the cell and that inline expression will update constantly as well.
If you've used Observable Notebooks before this will feel familiar - but notebooks involve code and markdown authored in separate cells. With Framework they are all now part of a single text document.
Aside: when I tried the above example I found that the `${new Date(now)}` inline expression displayed as `Mon Feb 19 2024 20:46:02 GMT-0800 (Pacific Standard Time)` while the `js` block displayed as `2024-02-20T04:46:02.641Z`. That's because inline expressions use the JavaScript default string representation of the object, while the `js` block uses the Observable `display()` function, which has its own rules for how to display different types of objects, visible in inspect/src/inspect.js.
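You can see the difference between those two representations using nothing but standard JavaScript - this sketch shows the default `Date` string coercion (what an inline expression produces) next to the ISO format, independent of Framework's `display()` itself:

```js
// One Date, two representations: the default string coercion (local
// timezone, "Thu Jan 01 1970 ..." style) versus the ISO 8601 string.
const d = new Date(0); // the Unix epoch

const inlineStyle = String(d);    // what an inline ${...} coercion produces
const isoStyle = d.toISOString(); // "1970-01-01T00:00:00.000Z"

console.log(inlineStyle);
console.log(isoStyle);
```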
Everything is still reactive
The best feature of Observable Notebooks is their reactivity - the way cells automatically refresh when other cells they depend on change. This is a big difference to Python's popular Jupyter notebooks, and is the signature feature of marimo, a new Python notebook tool.
Observable Framework retains this feature in its new JavaScript Markdown documents.
This is particularly useful when working with form inputs. You can drop an input onto a page and refer to its value throughout the rest of the document, adding realtime interactivity to documents incredibly easily.
Here's an example. I ported one of my favourite notebooks to Framework, which provides a tool for viewing download statistics for my various Python packages.
The Observable Framework version can be found at https://simonw.github.io/observable-framework-experiments/package-downloads - source code here on GitHub.
This entire thing is just 57 lines of Markdown. Here's the code with additional comments (and presented in a slightly different order - the order of code blocks doesn't matter in Observable thanks to reactivity).
# PyPI download stats for Datasette projects
Showing downloads for **${packageName}**
It starts with a Markdown `<h1>` heading and text that shows the name of the selected package.
```js echo
const packageName = view(Inputs.select(packages, {
value: "sqlite-utils",
label: "Package"
}));
```
This block displays the select widget allowing the user to pick one of the items from the `packages` array (defined later on).
`Inputs.select()` is a built-in method provided by Framework, described in the Observable Inputs documentation.
The `view()` function is new in Observable Framework - it's the thing that enables the reactivity, ensuring that updates to the input selection are acted on by other code blocks in the document.
Because `packageName` is defined with `const` it becomes a variable that is visible to other `js` blocks on the page. It's used by this next block:
```js echo
const data = d3.json(
`https://datasette.io/content/stats.json?_size=max&package=${packageName}&_sort_desc=date&_shape=array`
);
```
Here we are fetching the data that we need for the chart. I'm using `d3.json()` (all of D3 is available in Framework) to fetch the data from a URL that includes the selected package name.
The data is coming from Datasette, using the Datasette JSON API. I have a SQLite table at datasette.io/content/stats that's updated once a day with the latest PyPI package statistics via a convoluted series of GitHub Actions workflows, described previously.
Adding `.json` to that URL returns the JSON, then I ask for rows for that particular package, sorted descending by date and returning the maximum number of rows (1,000) as a JSON array of objects.
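That query string can also be assembled with the standard `URLSearchParams` API, which handles the encoding of each value - a sketch, with the package name hard-coded for illustration:

```js
// Build the same style of Datasette JSON API URL with URLSearchParams,
// which takes care of encoding each parameter value.
const packageName = "sqlite-utils"; // stands in for the selected package

const params = new URLSearchParams({
  _size: "max",
  package: packageName,
  _sort_desc: "date",
  _shape: "array",
});
const url = `https://datasette.io/content/stats.json?${params}`;
```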
Now that we have `data` as a variable we can manipulate it slightly for use with Observable Plot - parsing the SQLite string dates into JavaScript `Date` objects:
```js echo
const data_with_dates = data.map(function(d) {
d.date = d3.timeParse("%Y-%m-%d")(d.date);
return d;
})
```
This code is ready to render as a chart. I'm using Observable Plot - also packaged with Framework:
```js echo
Plot.plot({
y: {
grid: true,
label: `${packageName} PyPI downloads per day`
},
width: width,
marginLeft: 60,
marks: [
Plot.line(data_with_dates, {
x: "date",
y: "downloads",
title: "downloads",
tip: true
})
]
})
```
So we have one cell that lets the user pick the package they want, a cell that fetches that data, a cell that processes it and a cell that renders it as a chart.
There's one more piece of the puzzle: where does that list of packages come from? I fetch that with another API call to Datasette. Here I'm using a SQL query executed against the /content database directly:
```js echo
const packages_sql = "select package from stats group by package order by max(downloads) desc"
```
```js echo
const packages = fetch(
`https://datasette.io/content.json?sql=${encodeURIComponent(
packages_sql
)}&_size=max&_shape=arrayfirst`
).then((r) => r.json());
```
`_shape=arrayfirst` is a shortcut for getting back a JSON array of the first column of the resulting rows.
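To illustrate what that shape does, here's the equivalent transformation in plain JavaScript, applied to some made-up rows:

```js
// What _shape=arrayfirst does, sketched locally: from each row keep just
// the first column's value, yielding a flat array.
const rows = [
  { package: "datasette" },
  { package: "sqlite-utils" },
  { package: "datasette-graphql" },
];
const arrayfirst = rows.map((row) => Object.values(row)[0]);
// arrayfirst is now ["datasette", "sqlite-utils", "datasette-graphql"]
```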
That's all there is to it! It's a pretty tiny amount of code for a full interactive dashboard.
Only include the code that you use
You may have noticed that my dashboard example uses several additional libraries - `Inputs` for the form element, `d3` for the data fetching and `Plot` for the chart rendering.
Observable Framework is smart about these. It implements lazy loading in development mode, so code is only loaded the first time you attempt to use it in a cell.
When you build and deploy your application, Framework automatically loads just the referenced library code from the jsdelivr CDN.
Cache your data at build time
One of the most interesting features of Framework is its Data loader mechanism.
Dashboards built using Framework can load data at runtime from anywhere using `fetch()` requests (or wrappers around them). This is how Observable Notebooks work too, but it leaves the performance of your dashboard at the mercy of whatever backends you are talking to.
Dashboards benefit from fast loading times. Framework encourages a pattern where you build the data for the dashboard at deploy time, bundling it together into static files containing just the subset of the data needed for the dashboard. These can be served lightning fast from the same static hosting as the dashboard code itself.
The design of the data loaders is beautifully simple and powerful. A data loader is a script that can be written in any programming language. At build time, Framework executes that script and saves whatever it outputs to a file.
A data loader can be as simple as the following, saved as `quakes.json.sh`:
curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
When the application is built, that filename tells Framework the destination file (`quakes.json`) and the loader to execute (`.sh`).
This means you can load data from any source using any technology you like, provided it has the ability to output JSON or CSV or some other useful format to standard output.
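Loaders aren't limited to shell - Framework picks an interpreter based on the extension. Here's what a hypothetical JavaScript loader following the same contract might look like, saved as something like top-packages.json.js (both the filename and the payload here are made up for illustration): it writes its output to standard output, which Framework captures as the file contents.

```js
// Hypothetical Framework data loader: everything written to stdout at
// build time becomes the contents of the generated top-packages.json file.
const data = {
  generated: new Date().toISOString(),
  packages: ["datasette", "sqlite-utils"], // made-up payload for the sketch
};
process.stdout.write(JSON.stringify(data));
```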
Comparison to Observable Notebooks
Mike introduced Observable Framework as Observable 2.0. It's worth reviewing how this system compares to the original Observable Notebook platform.
I've been a huge fan of Observable Notebooks for years - 38 blog posts and counting! The most obvious comparison is to Jupyter Notebooks, where they have some key differences:
Observable notebooks use JavaScript, not Python.
The notebook editor itself isn't open source - it's a hosted product provided on observablehq.com. You can export the notebooks as static files and run them anywhere you like, but the editor itself is a proprietary product.
Observable cells are reactive. This is the key difference with Jupyter: any time you change a cell all other cells that depend on that cell are automatically re-evaluated, similar to Excel.
The JavaScript syntax they use isn't quite standard JavaScript - they had to invent a new `viewof` keyword to support their reactivity model.
Editable notebooks are a pretty complex proprietary file format. They don't play well with tools like Git, to the point that Observable ended up implementing their own custom version control and collaboration systems.
Observable Framework reuses many of the ideas (and code) from Observable Notebooks, but with some crucial differences:
Notebooks (really documents) are now single text files - Markdown files with embedded JavaScript blocks. It's all still reactive, but the file format is much simpler and can be edited using any text editor, and checked into Git.
It's all open source. Everything is under an ISC license (OSI approved) and you can run the full editing stack on your own machine.
It's all just standard JavaScript now - no custom syntax.
A change in strategy
Reading the tea leaves a bit, this also looks to me like a strategic change of direction for Observable as a company. Their previous focus was on building great collaboration tools for data science and analytics teams, based around the proprietary Observable Notebook editor.
With Framework they appear to be leaning more into the developer tools space.
On Twitter @observablehq describes itself as "The end-to-end solution for developers who want to build and host dashboards that don’t suck" - the Internet Archive copy from October 3rd 2023 showed "Build data visualizations, dashboards, and data apps that impact your business — faster."
I'm excited to see where this goes. I've limited my usage of Observable Notebooks a little in the past purely due to the proprietary nature of their platform and the limitations placed on free accounts (mainly the lack of free private notebooks), while still having enormous respect for the technology and enthusiastically adopting their open source libraries such as Observable Plot.
Observable Framework addresses basically all of my reservations. It's a fantastic new expression of the ideas that made Observable Notebooks so compelling, and I expect to use it for all sorts of interesting projects in the future.
Weeknotes: Getting ready for NICAR - 2024-02-27
Next week is NICAR 2024 in Baltimore - the annual data journalism conference hosted by Investigative Reporters and Editors. I'm running a workshop on Datasette, and I plan to spend most of my time in the hallway track talking to people about Datasette, Datasette Cloud and how the Datasette ecosystem can best help support their work.
I've been working with Alex Garcia to get Datasette Cloud ready for the conference. We have a few new features that we're putting the final touches on, in addition to ensuring features like Datasette Enrichments and Datasette Comments are in good shape for the event.
Releases
llm-mistral 0.3 - 2024-02-26
LLM plugin providing access to Mistral models using the Mistral API
Mistral released Mistral Large this morning, so I rushed out a new release of my llm-mistral plugin to add support for it.
pipx install llm
llm install llm-mistral --upgrade
llm keys set mistral
# <Paste in your Mistral API key>
llm -m mistral-large 'Prompt goes here'
The plugin now hits the Mistral API endpoint that lists models (via a cache), which means future model releases should be supported automatically without needing a new plugin release.
dclient 0.3 - 2024-02-25
A client CLI utility for Datasette instances
dclient provides a tool for interacting with a remote Datasette instance. You can use it to run queries:
dclient query https://datasette.io/content \
"select * from news limit 3"
You can set aliases for your Datasette instances:
dclient alias add simon https://simon.datasette.cloud/data
And for Datasette 1.0 alpha instances with the write API (as seen on Datasette Cloud) you can insert data into a new or an existing table:
dclient auth add simon
# <Paste in your API token>
dclient insert simon my_new_table data.csv --create
The 0.3 release adds improved support for streaming data into a table. You can run a command like this:
tail -f log.ndjson | dclient insert simon my_table \
--nl - --interval 5 --batch-size 20
The `--interval 5` option is new: it means that records will be written to the API if 5 seconds have passed since the last write. `--batch-size 20` means that records will be written in batches of 20, and will be sent as soon as the batch is full or the interval has passed.
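The flushing rule those two options describe - send when the batch is full, or when the interval has elapsed and records are pending - can be sketched like this (an illustration of the semantics, not dclient's actual implementation; the clock is injectable purely to make the logic testable):

```js
// Flush a batch when it reaches batchSize records, or when intervalMs
// milliseconds have passed since the last flush.
function makeBatcher(batchSize, intervalMs, flush) {
  let batch = [];
  let lastFlush = Date.now();
  return function add(record, now = Date.now()) {
    batch.push(record);
    if (batch.length >= batchSize || now - lastFlush >= intervalMs) {
      flush(batch);
      batch = [];
      lastFlush = now;
    }
  };
}
```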
datasette-events-forward 0.1a1 - 2024-02-20
Forward Datasette analytical events on to another Datasette instance
I wrote about the new Datasette Events mechanism in the 1.0a8 release notes. This new plugin was originally built for Datasette Cloud - it forwards analytical events from an instance to a central analytics instance. Using Datasette Cloud for analytics for Datasette Cloud is a pleasing exercise in dogfooding.
datasette-auth-tokens 0.4a9 - 2024-02-20
Datasette plugin for authenticating access using API tokens
A tiny cosmetic bug fix.
datasette 1.0a11 - 2024-02-19
An open source multi-tool for exploring and publishing data
I'm increasing the frequency of the Datasette 1.0 alphas. This one has a minor permissions fix (the ability to replace a row using the insert API now requires the `update-row` permission) and a small cosmetic fix which I'm really pleased with: the menus displayed by the column action menu now align correctly with their cog icon!
datasette-edit-schema 0.8a0 - 2024-02-18
Datasette plugin for modifying table schemas
This is a pretty significant release: it adds finely-grained permission support such that Datasette's core `create-table`, `alter-table` and `drop-table` permissions are now respected by the plugin.
The `alter-table` permission was introduced in Datasette 1.0a9 a couple of weeks ago.
datasette-unsafe-actor-debug 0.2 - 2024-02-18
Debug plugin that lets you imitate any actor
When testing permissions it's useful to have a really convenient way to sign in to Datasette using different accounts. This plugin provides that, but only if you start Datasette with custom plugin configuration or by using this new 1.0 alpha shortcut setting option:
datasette -s plugins.datasette-unsafe-actor-debug.enabled 1
datasette-studio 0.1a0 - 2024-02-18
Datasette pre-configured with useful plugins. Experimental alpha.
An experiment in bundling plugins. `pipx install datasette-studio` gets you an installation of Datasette under a separate alias - `datasette-studio` - which comes preconfigured with a set of useful plugins.
The really fun thing about this one is that the entire package is defined by a pyproject.toml file, with no additional Python code needed. Here's a truncated copy of that TOML:
[project]
name = "datasette-studio"
version = "0.1a0"
description = "Datasette pre-configured with useful plugins"
requires-python = ">=3.8"
dependencies = [
"datasette>=1.0a10",
"datasette-edit-schema",
"datasette-write-ui",
"datasette-configure-fts",
"datasette-write",
]
[project.entry-points.console_scripts]
datasette-studio = "datasette.cli:cli"
I think it's pretty neat that a full application can be defined like this in terms of 5 dependencies and a custom `console_scripts` entry point.
Datasette Studio is still very experimental, but I think it's pointing in a promising direction.
datasette-enrichments-opencage 0.1.1 - 2024-02-16
Geocoding and reverse geocoding using OpenCage
This resolves a dreaded "database locked" error I was seeing occasionally in Datasette Cloud.
Short version: SQLite, when running in WAL mode, is almost immune to those errors... provided you remember to run all write operations in short, well-defined transactions.
I'd forgotten to do that in this plugin and it was causing problems.
After shipping this release I decided to make it much harder to make this mistake in the future, so I released Datasette 1.0a10, which now automatically wraps calls to `database.execute_write_fn()` in a transaction even if you forget to do so yourself.
Blog entries
My first full blog post of the year to end up on Hacker News, where it sparked a lively conversation with 489 comments!
TILs
Tracking SQLite table history using a JSON audit log - 2024-02-27
Yet another experiment with audit tables in SQLite. This one uses a terrifying nested sequence of `json_patch()` calls to assemble a JSON document describing the change made to the table.
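SQLite's `json_patch()` implements the RFC 7386 JSON Merge Patch algorithm; a minimal JavaScript sketch of those semantics (not the SQLite implementation itself) looks like this:

```js
// RFC 7386 merge patch: objects merge recursively, a null value deletes
// a key, and any non-object patch replaces the target outright.
function mergePatch(target, patch) {
  if (patch === null || typeof patch !== "object" || Array.isArray(patch)) {
    return patch; // non-object patches replace the target
  }
  const result =
    target !== null && typeof target === "object" && !Array.isArray(target)
      ? { ...target }
      : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) delete result[key]; // null means "remove this key"
    else result[key] = mergePatch(result[key], value);
  }
  return result;
}
```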
Val Town is a very neat attempt at solving another of my favourite problems: how to execute user-provided code safely in a sandbox. It turns out to be the perfect mechanism for running simple scheduled functions such as code that reads data and writes it to Datasette Cloud using the write API.
Getting Python MD5 to work with FIPS systems - 2024-02-14
FIPS is the Federal Information Processing Standard, and systems that obey it refuse to run Datasette due to its use of MD5 hash functions. I figured out how to get that to work anyway, since Datasette's MD5 usage is purely cosmetic, not cryptographic.
Running Ethernet over existing coaxial cable - 2024-02-13
This actually showed up on Hacker News without me noticing until a few days later, where many people told me that I should rewire my existing Ethernet cables rather than resorting to more exotic solutions.
Piping from rg to llm to answer questions about code - 2024-02-11
I guess this is another super lightweight form of RAG: you can use the `rg` context options (include X lines before/after each match) to assemble just enough context to get useful answers to questions about code.
Quote 2024-02-21
When I first published the micrograd repo, it got some traction on GitHub but then somewhat stagnated and it didn't seem that people cared much. [...] When I made the video that built it and walked through it, it suddenly almost 100X'd the overall interest and engagement with that exact same piece of code.
[...] you might be leaving somewhere 10-100X of the potential of that exact same piece of work on the table just because you haven't made it sufficiently accessible.
Link 2024-02-22 JavaScript Bloat in 2024:
Depressing review of the state of page bloat in 2024 by Nikita Prokopov. Some of these are pretty shocking: 12MB for a Booking.com search, 9MB for a Google search, 20MB for Gmail(!), 31MB for LinkedIn. No wonder the modern web can feel sludgy even on my M2 MacBook Pro.
Link 2024-02-22 Okay, Color Spaces:
Fantastic interactive explanation of how color spaces work by Eric Portis.
Link 2024-02-23 PGlite:
PostgreSQL compiled for WebAssembly and turned into a very neat JavaScript library. Previous attempts at running PostgreSQL in WASM have worked by bundling a full Linux virtual machine - PGlite just bundles a compiled PostgreSQL itself, which brings the size down to an impressive 3.7MB gzipped.
Link 2024-02-23 Bloom Filters, explained by Sam Rose:
Beautifully designed explanation of bloom filters, complete with interactive demos that illustrate exactly how they work.
Link 2024-02-23 Does Offering ChatGPT a Tip Cause it to Generate Better Text? An Analysis:
Max Woolf: "I have a strong hunch that tipping does in fact work to improve the output quality of LLMs and its conformance to constraints, but it’s very hard to prove objectively. [...] Let’s do a more statistical, data-driven approach to finally resolve the debate."
Link 2024-02-24 How to make self-hosted maps that work everywhere and cost next to nothing:
Chris Amico provides a detailed roundup of the state of web mapping in 2024. It's never been easier to entirely host your own mapping infrastructure, thanks to OpenStreetMap, Overture, MBTiles, PMTiles, Maplibre and a whole ecosystem of other fine open source projects.
I like Protomaps creator Brandon Liu's description of this: "post-scarcity web mapping".
Link 2024-02-24 Upside down table trick with CSS:
I was complaining how hard it is to build a horizontally scrollable table with a scrollbar at the top rather than the bottom and RGBCube on Lobste.rs suggested rotating the container 180 degrees and then the table contents and headers 180 back again... and it totally works! Demo in this CodePen.
Link 2024-02-25 dclient 0.3:
dclient is my CLI utility for working with remote Datasette instances - in particular for authenticating with them and then running both read-only SQL queries and inserting data using the new Datasette write JSON API. I just picked up work on the project again after a six month gap - the insert command can now be used to constantly stream data directly to hosted Datasette instances such as Datasette Cloud.
Link 2024-02-26 Mistral Large:
Mistral Medium only came out two months ago, and now it's followed by Mistral Large. Like Medium, this new model is currently only available via their API. It scores well on benchmarks (though not quite as well as GPT-4) but the really exciting feature is function support, clearly based on OpenAI's own function design.
Functions are now supported via the Mistral API for both Mistral Large and the new Mistral Small, described as follows: "Mistral Small, optimised for latency and cost. Mistral Small outperforms Mixtral 8x7B and has lower latency, which makes it a refined intermediary solution between our open-weight offering and our flagship model."
TIL 2024-02-27 Tracking SQLite table history using a JSON audit log:
I continue to collect ways of tracking the history of a table of data stored in SQLite - see sqlite-history for previous experiments. …
Link 2024-02-27 All you need is Wide Events, not “Metrics, Logs and Traces”:
I've heard great things about Meta's internal observability platform Scuba; here's an explanation from ex-Meta engineer Ivan Burmistrov describing the value it provides and comparing it to the widely used OpenTelemetry stack.
Link 2024-02-27 The Zen of Python, Unix, and LLMs with Simon Willison:
I'm participating in a live online fireside chat with Hugo Bowne-Anderson tomorrow afternoon (3pm Pacific / 6pm Eastern / 11pm GMT) talking about LLMs, Datasette, my open source process, applying the Unix pipes philosophy to LLMs and a whole lot more. It's free to register.
Link 2024-02-28 Testcontainers:
Not sure how I missed this: Testcontainers is a family of testing libraries (for Python, Go, JavaScript, Ruby, Rust and a bunch more) that make it trivial to spin up a service such as PostgreSQL or Redis in a container for the duration of your tests and then spin it back down again.
The Python example code is delightful:
from testcontainers.core.container import DockerContainer
from testcontainers.core.waiting_utils import wait_for_logs

redis = DockerContainer("redis:5.0.3-alpine").with_exposed_ports(6379)
redis.start()
wait_for_logs(redis, "Ready to accept connections")
I much prefer integration-style tests over unit tests, and I like to make sure any of my projects that depend on PostgreSQL or similar can run their tests against a real running instance. I've invested heavily in spinning up Varnish or Elasticsearch ephemeral instances in the past - Testcontainers look like they could save me a lot of time.
The open source project started in 2015, spun off a company called AtomicJar in 2021 and was acquired by Docker in December 2023.
Quote 2024-02-28
For the last few years, Meta has had a team of attorneys dedicated to policing unauthorized forms of scraping and data collection on Meta platforms. The decision not to further pursue these claims seems as close to waving the white flag as you can get against these kinds of companies. But why? [...]
In short, I think Meta cares more about access to large volumes of data and AI than it does about outsiders scraping their public data now. My hunch is that they know that any success in anti-scraping cases can be thrown back at them in their own attempts to build AI training databases and LLMs. And they care more about the latter than the former.
Link 2024-02-29 The Zen of Python, Unix, and LLMs:
Here's the YouTube recording of my 1.5 hour conversation with Hugo Bowne-Anderson yesterday.
I fed a Whisper transcript to Google Gemini Pro 1.5 and asked it for the themes from our conversation, and it said we talked about "Python's success and versatility, the rise and potential of LLMs, data sharing and ethics in the age of LLMs, Unix philosophy and its influence on software development and the future of programming and human-computer interaction".
Link 2024-02-29 GGUF, the long way around:
Vicki Boykis dives deep into the GGUF format used by llama.cpp, after starting with a detailed description of how PyTorch models work and how they are traditionally persisted using Python pickle.
Pickle led to safetensors, a format that avoided the security problems with downloading and running untrusted pickle files.
Llama.cpp introduced GGML, which popularized 16-bit (as opposed to 32-bit) quantization and bundled metadata and tensor data in a single file.
GGUF fixed some design flaws in GGML and is the default format used by Llama.cpp today.
Link 2024-02-29 Datasette 1.0a12:
Another alpha release, this time with a new query_actions() plugin hook, a new design for the table, database and query actions menus, a "does not contain" table filter and a fix for a minor bug with the JavaScript makeColumnActions() plugin mechanism.
Link 2024-03-01 Endatabas:
Endatabas is "an open source immutable database" - also described as "SQL document database with full history".
It uses a variant of SQL which allows you to insert data into tables that don't exist yet (they'll be created automatically) then run standard select queries, joins etc. It maintains a full history of every record and supports the recent SQL standard "FOR SYSTEM_TIME AS OF" clause for retrieving historical records as they existed at a specified time (it defaults to the most recent versions).
It's written in Common Lisp plus a bit of Rust, and includes Docker images for running the server and client libraries in JavaScript and Python. The on-disk storage format is Apache Arrow, the license is AGPL and it's been under development for just over a year.
It's also a document database: you can insert JSON-style nested objects directly into a table, and query them with path expressions like "select users.friends[1] from users where id = 123;"
They have a WebAssembly version and a nice getting started tutorial which you can try out directly in your browser.
Their "Why?" page lists full history, time travel queries, separation of storage from compute, schemaless tables and columnar storage as the five pillars that make up their product. I think it's a really interesting amalgamation of ideas.
Link 2024-03-01 Streaming HTML out of order without JavaScript:
A really interesting new browser capability. If you serve HTML containing a declarative shadow root with a named slot showing placeholder content, then later in the same page stream an element targeting that slot ("Item number 1" in the demo), the slot's placeholder content will be replaced while the page continues to load.
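The demo's markup was stripped from this page; reconstructed from the description, the pattern looks something like this - treat the exact element and slot names as approximations:

```html
<!-- Served early: a declarative shadow root whose named slot shows
     placeholder content until something fills it -->
<div>
  <template shadowrootmode="open">
    <slot name="item-1">Loading…</slot>
  </template>

  <!-- Streamed later in the same response: slotted into the placeholder
       above as soon as it arrives, no JavaScript required -->
  <span slot="item-1">Item number 1</span>
</div>
```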
I tried the demo in the most recent Chrome, Safari and Firefox (and Mobile Safari) and it worked in all of them.
The key feature is shadowrootmode=open, which looks like it was added to Firefox 123 on February 19th 2024 - the other two browsers are listed on caniuse.com as gaining it around March last year.
Link 2024-03-02 The Radio Squirrels of Point Reyes:
Beautiful photo essay by Ann Hermes about the band of volunteer "radio squirrels" keeping maritime morse code radio transmissions alive in the Point Reyes National Seashore.
TIL 2024-03-02 Using packages from JSR with esbuild:
JSR is a brand new package repository for "modern JavaScript and TypeScript", launched on March 1st by the Deno team as a new alternative to npm …
Link 2024-03-03 The One Billion Row Challenge in Go: from 1m45s to 4s in nine solutions:
How fast can you read a billion semicolon delimited (name;float) lines and output a min/max/mean summary for each distinct name - 13GB total?
Ben Hoyt describes his 9 incrementally improved versions written in Go in detail. The key optimizations involved custom hashmaps, optimized line parsing and splitting the work across multiple CPU cores.
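The core aggregation the challenge asks for is simple - the difficulty is all in making it fast. Here's an unoptimized JavaScript sketch just to show the shape of the problem (Ben's solutions are in Go and enormously faster):

```js
// Aggregate min/max/mean per name from "name;value" lines.
function summarize(lines) {
  const stats = new Map();
  for (const line of lines) {
    const sep = line.indexOf(";");
    const name = line.slice(0, sep);
    const value = parseFloat(line.slice(sep + 1));
    let s = stats.get(name);
    if (s === undefined) {
      s = { min: value, max: value, sum: 0, count: 0 };
      stats.set(name, s);
    }
    s.min = Math.min(s.min, value);
    s.max = Math.max(s.max, value);
    s.sum += value;
    s.count += 1;
  }
  // Derive the mean for each name at the end.
  const out = {};
  for (const [name, s] of stats) {
    out[name] = { min: s.min, max: s.max, mean: s.sum / s.count };
  }
  return out;
}
```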
Link 2024-03-03 Who Am I? Conditional Prompt Injection Attacks with Microsoft Copilot:
New prompt injection variant from Johann Rehberger, demonstrated against Microsoft Copilot. If the LLM tool you are interacting with has awareness of the identity of the current user you can create targeted prompt injection attacks which only activate when an exploit makes it into the token context of a specific individual.