Building a tool showing how Gemini Pro can…

Simon Willison

Aug 26, 2024

Plus creating alt text for a bot using GPT-4o, and converting a PDF paper to semantic HTML with Gemini 1.5 Pro

2 Comments

Declan

Aug 26

This llm git commit message command-line tool is fantastic!

Nope

Sep 10

I've been trying to use Gemini's bounding boxes, just as I saw your newsletter! It's a great built-in feature: I was first exploring YOLOv5 but realized I would need to train it, which I don't have the time or experience for. Gemini works pretty well, but I just can't get it to give me the full precision I need. I'm trying to highlight number labels on images so I can automate labeling, but the fine-level coordinates aren't quite there, plus there are occasional wild misses and hallucinations. Before thinking about using a vision model, I couldn't seem to get an adaptive enough OCR solution. Given the "almost right" LLM output, I spent HOURS trying to figure out if I had the grid system wrong, the conversion wrong, or something else. Thanks to your article, I finally had the confidence to say about the issues... "it's not me... it's Google". Also, I was happy to see that my multi-AI-provider approach mirrored what you described, given the providers' differing capabilities. Thank you for your perfect timing and expertise!! :)

Simon Willison’s Newsletter