Susty Ventures

4o

Created on May 16, 2024 12:00 EDT

This Monday, we all geared up for another round of OpenAI announcing the Next Big Thing. With GPT-4o, I observe that we are entering a new phase of the ChatGPT era, and I predict that the perceived utility of “AI” is at risk unless we adjust our expectations somewhat.

The Good

I will start by saying that I was pretty impressed by what shipped: the o in 4o stands for omni, and this new model has supposedly been rearchitected from the ground up to handle voice, text and image natively, rather than via a pipeline of models translating everything to text, as was necessary before.

This opens the field to all kinds of additional non-verbal inputs, which are expected to show the same emergent gains that ChatGPT astonished us all with back in ’22 through text alone, only now multimodally. One immediate big gain seems to be a fully bidirectional mode, in which the AI can be interrupted right in the middle of speaking. This was always a struggle in the past: one had to phrase a request to a Google Home or Alexa very carefully, then cringe through an answer that was usually off-topic or unhelpful, before being able to retry.

The Bad

So, interacting with these AIs is now more natural-feeling than it has ever been. However, this may be the limit of how far chatty assistants can take us.

Shiny and cool though the 4o demo is (and it has every indication of working as advertised, given OpenAI’s proven aggressive shipping schedule, activity from competitors like Google, and interest from Apple), I worry that collectively we are engineering ourselves into an entirely circular problem.

Real problems are complex

I already had misgivings around the GPT-4V(ision) model used to decipher parking rules that are too complicated to begin with, having been developed in successive layers of human bureaucracy to the point of irreducibility (nothing new here since Parkinson’s Law).

Now, we have a telco demo that cannot possibly scale, either for the business or for the consumer (by my estimate, the entire interaction uses 370 tokens, costing roughly $0.005, shared equally between the two). Not only is it inefficient; the slightest deviation sends the whole thing filling space needlessly as the two models tirelessly generate off one another without apparent purpose.
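As a back-of-envelope check, the cost figure can be sketched from OpenAI’s launch pricing for GPT-4o ($5 per million input tokens, $15 per million output tokens); the even 185/185 input/output split below is my own assumption for illustration, not a measured figure:

```python
# Rough per-interaction cost estimate for a GPT-4o exchange.
# Rates are OpenAI's launch pricing: $5/M input tokens, $15/M output tokens.

def interaction_cost(input_tokens: int, output_tokens: int,
                     in_rate: float = 5.0, out_rate: float = 15.0) -> float:
    """Dollar cost of one exchange, given per-million-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Assume the ~370 tokens split evenly between input and output.
cost = interaction_cost(185, 185)
print(f"${cost:.4f}")  # prints $0.0037
```

Fractions of a cent per exchange sound negligible, but once two models are left generating at each other continuously at consumer scale, those fractions are exactly what stops the economics from working.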

This is becoming more obvious now with highly natural, human-understandable discussions, but at its most ridiculous it amounts to buying two chess computers and connecting them together so you can be left in peace, as in the Le Chat cartoon strip.[1]

We tend to fool ourselves

I see boosterism everywhere attributing a personality to something that has none, and designating this technology as somehow essential. ChatGPT and its ilk are literally designed to print words that resemble a conversation without being one at all: the many books and blog posts that copy out entire discussions with LLMs end up reading like conversations with a clever mirror, one that performs sophisticated operations to render a hyper-realistic image yet ends up painting a very generic picture. There is really little utility in yet another sonnet about dinosaurs after you’ve tried it once or twice.

However, access is good

One redeeming factor in the current situation is that futuristic AI tools that previously required a dedicated setup of compute to train models and run inference are now mostly dematerialised, which means that we can test them straight away and see for ourselves how effective they really are.

And in sustainability? The developers may even one day find a way to make money off AI. The FT reports that the buildout of US data centres will grow energy needs by an additional 30GW by 2030, triple today’s level. The increase in capex likely means a real shift in how Big Tech companies do business, with many downstream effects.

And in healthcare? There are still many processes in healthcare that have not been automated, and I see the bet being that those highly human tasks can slowly be filled in by AI agents, providing more structure at lower human effort, much like the AI Scribe segment already being rolled out, with some promising time-saving reports. But it is worth considering why these processes are the way they are in the first place.

This excitement and expenditure may be worth it if we do things differently as a result. The potential to re-architect parts of our economy is clearly there, but it has to be directed by us humans, rather than by over-reliance on a curated composite of our knowledge to date, which is of limited help in coming up with new solutions and better ways of doing things.

[1] Amusingly, the SEO for Le Chat, the Belgian response to Garfield, is now completely overrun by the letters GPT.