Thanks so much for this. I tried it out on o3 and it works a treat. For years I've been managing this process with a weird Rube Goldberg workflow of browser instrumentation, HTML parsing and spreadsheet analysis. Obviously the AI is being massively helpful here with CoT and summarisation, but perhaps the biggest breakthrough is finally having algorithmic access to Google searches, something that was always problematic before.
For what it's worth, I've had a couple of recent interactions with o3 where it searched in response to my prompts but then misinterpreted or hallucinated what it found. I was looking at the features of an open source data catalog and at potential implementation approaches based on that feature set. This is obviously a very small sample size, but so far I'm tentatively in the camp that says o3 makes things up more often, even in search-related use cases.
Same happened to me today. The ability to use tools during reasoning has such potential, but something is awry in how they've implemented it. My guess is that they're doing some kind of chunking, so it ends up cherry-picking context-free chunks of page content, or some other kind of lossy compression.
https://open.substack.com/pub/kendallmoon/p/the-quiet-death-of-personalization?r=5a3ym1&utm_medium=ios
I recently wrote a post on how LLMs will upend the economic model of the web. Would love your thoughts.