Discussion about this post

Jim Procter

I find Steve Krause's quote kind of weird.. MCP is really a pattern/protocol.. any API that offers dynamic self-description of its current capabilities does the same (squinting at GraphQL, SOAs etc. here ;) ). Fully agree tho that the fact that the AI client is dynamic is the thing (and prolly way easier than handcrafting an open-ended system as traditional static code).

.. ramble over.. Loved this edition! And the brown pelican :)
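[Editor's note: the "dynamic self-description" pattern Jim mentions can be sketched in a few lines. This is a hypothetical, minimal illustration, not actual MCP code: the `describe`/`discover` functions and the capability payload are invented names, standing in for something like MCP's tool listing or a GraphQL introspection query.]

```python
import json

# Hypothetical server-side capability description, analogous in spirit
# to an MCP tool list or a GraphQL introspection result.
CAPABILITIES = {
    "tools": [
        {"name": "get_weather", "params": ["city"]},
        {"name": "search_docs", "params": ["query"]},
    ]
}

def describe() -> str:
    """Server endpoint: return a JSON self-description of current tools."""
    return json.dumps(CAPABILITIES)

def discover(raw: str) -> list[str]:
    """Client side: learn what the server can do at runtime,
    rather than hardcoding the tool list at build time."""
    return [tool["name"] for tool in json.loads(raw)["tools"]]

tools = discover(describe())
print(tools)  # ['get_weather', 'search_docs']
```

The point Jim is making: the protocol shape is old news; what changed is that the client consuming this description is now an LLM that can decide what to do with previously unseen tools.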

Joe Faith

Thanks Simon, but I'm not sure the new version of the test is going in the right direction.

The original version showed that LLMs can now produce simple SVGs of particular bird species on things that look a bit like bikes. The new version seems to just ask for a slightly more specific bird on a slightly more specific bike. This is certainly computationally harder (especially rendering it in SVG) but not fundamentally different. It's 'just' more scaling.

The missing capability seems to be to represent something that is mechanically and physically plausible, a bike that can actually be propelled and steered. None of the current LLMs seem to do this. We need to see forks and chains! AIUI this causal understanding is the capability gap that lies behind the current focus on building world models (see https://substack.com/home/post/p-179449714).

Maybe that's the direction that the new version of the test should take -- and how we judge the results. Something like "generate an SVG of a pelican riding a mechanically-plausible bike it can propel and steer"?

