Internet News
We're (Still) Not Giving Data Enough Credit
In his AI Speaker Series presentation at Sutter Hill Ventures, UC Berkeley's Alexei Efros argued that data, not algorithms, drives AI progress in visual computing. Here are my notes from his talk: We're (Still) Not Giving Data Enough Credit.
Large data is necessary but not sufficient. We need to learn to be humble and to give data the credit it deserves. The visual computing field's algorithmic bias has obscured data's fundamental role. Recognizing this reality becomes crucial for evaluating where AI breakthroughs will emerge.
The Role of Data
- Data got little respect in academia until recently: researchers spent years on algorithms, then scrambled for datasets at the last minute
- This mentality hurt us and stifled progress for a long time.
- Scientific Narcissism in AI: we prefer giving credit to human cleverness over data's role
- Human understanding relies heavily on stored experience, not just incoming sensory data.
- People see detailed steam engines in Monet's blurry brushstrokes, but the steam engine is in your head. Each person sees different versions based on childhood experiences
- People easily interpret heavily pixelated footage with brains filling in all the missing pieces
- "Mind is largely an emergent property of data" -Lance Williams
- Three landmark face detection papers achieved similar performance with completely different algorithms: neural networks, naive Bayes, and boosted cascades
- The real breakthrough wasn't algorithmic sophistication. It was realizing we needed negative data (images without faces). But 25 years later, we still credit the fancy algorithm.
- Efros's team demonstrated hole-filling in images using 2 million Flickr images with basic nearest-neighbor lookup. "The stupidest thing and it works."
- Comparing approaches with identical datasets revealed that fancy neural networks performed similarly to simple nearest neighbors.
- The solution was all in the data. Sophisticated algorithms often just perform fast lookup because the lookup contains the problem's solution.
- MIT's Aude Oliva's experiments reveal extraordinary human capacity for remembering natural images.
- But memory works selectively: high recognition rates for natural, meaningful images vs. near-chance performance on random textures.
- We don't have photographic memory. We remember things that are somehow on the manifold of natural experience.
- This suggests human intelligence is profoundly data-driven, but focused on meaningful experiences.
- Psychologist Alison Gopnik reframes AI as cultural technologies. Like printing presses, they collect human knowledge and make it easier to interface with it
- They're not creating truly new things, they're sophisticated interpolation systems.
- "Interpolation in sufficiently high dimensional space is indistinguishable from magic" but the magic sits in the data, not the algorithms
- Perhaps visual and textual spaces are smaller than we imagine, explaining data's effectiveness.
- A PCA model built from just 200 faces could model the whole of humanity's faces. This can be expanded to linear subspaces not just of pixels, but of model weights themselves.
- Startup algorithm: "Is there enough data for this problem?" Text: lots of data, excellent performance. Images: less data, getting there. Video/Robotics: harder data, slower progress
- Current systems are "distillation machines" compressing human data into models.
- True intelligence may require starting from scratch: remove human civilization artifacts and bootstrap from primitive urges: hunger, jealousy, happiness
- "AI is not when a computer can write poetry. AI is when the computer will want to write poetry"
Podcast: Generative AI in the Real World
I recently had the pleasure of speaking with Ben Lorica on O'Reilly's Generative AI in the Real World podcast about how software applications are changing in the age of AI. We discussed a number of topics including:
- The shift from "running code + database" to "URL + model" as the new definition of an application
- How this transition mirrors earlier platform shifts like the web and mobile, where initial applications looked less robust but evolved significantly over time
- How a database system designed for AI agents instead of humans operates
- The "flipped" software development process where AI coding agents allow teams to build working prototypes rapidly first, then design and integrate them into products
- How this impacts design and engineering roles, requiring new skill sets but creating more opportunities for creation
- The importance of taste and human oversight in AI systems
- And more...
You can listen to the podcast Generative AI in the Real World: Luke Wroblewski on When Databases Talk Agent-Speak (29min) on O'Reilly's site. Thanks to all the folks there for the invitation.
Future Product Days: Hidden Forces Driving User Behavior
In her talk Reveal the Hidden Forces Driving User Behavior at Future Product Days, Sarah Thompson shared insights on how to leverage behavioral science to create more effective user experiences. Here are my notes from her talk:
- While AI technology evolves exponentially, the human brain has not had a meaningful update in approximately 40,000 years so we're still designing for the "caveman brain"
- This unchanging human element provides a stable foundation for design that doesn't change with every wave of technology
- Behavioral science matters more than ever because we now have tools that allow us to scale faster than ever
- All decisions are emotional because there is a System 1 (emotional) part of the brain that makes decisions first. This part of the brain lights up 10 seconds before a person is even aware they made a decision
- System 1 thinking is fast, automatic, and helped us survive through gut reactions. It still runs the show today but uses shortcuts and over 180 known cognitive biases to navigate complexity
- Every time someone makes a decision, the emotional brain instantly predicts whether there are more costs or gains to taking action. More costs? Don't do it. More gains? Move forward
- The emotional brain only cares about six intuitive categories of costs and gains: internal (mental, emotional, physical) and external (social, material, temporal)
- Mental: "Thinking is hard" We evolved to conserve mental effort - people drop off with too many choices, stick with defaults. Can the user understand what they need to do immediately?
- Social: "We are wired to belong" We evolved to treat social costs as life or death situations. Does this make users feel safe, seen, or part of a group? Or does it raise embarrassment or exclusion?
- Emotional: "Automatic triggers" Imagery and visuals are the fastest way to set emotional tone. What automatic trigger (positive or negative) might this design bring up for someone?
- Physical: "We're wired to conserve physical effort" Physical gains include tap-to-pay, facial recognition, wearable data collection. Can I remove real or perceived physical effort?
- Material: "Our brains evolved in scarcity" Scarcity tactics like "Bob booked this three minutes ago" drive immediate action. Are we asking people to give something up or are we giving them something in return?
- Temporal: "We crave immediate rewards" Any time people have to wait, we see drop off. Can we give immediate reward or make people feel like they're saving time?
- You can't escape the caveman brain, but you can design for it.
Future Product Days: How to solve the right problem with AI
In his How to solve the right problem with AI presentation at Future Product Days, Dave Crawford shared insights on how to effectively integrate AI into established products without falling into common traps. Here are my notes from his talk:
- Many teams have been given the directive to "go add some AI" to their products. With AI as a technology, it's very easy to fall into the trap of having an AI hammer where every problem looks like an AI nail.
- We need to focus on using AI where it's going to give the most value to users. It's not what we can do with AI, it's what makes sense to do with AI.
- People typically encounter AI through four main interaction types
- Discovery AI: Helps people find, connect, and learn information, often taking the place of search
- Analytical AI: Analyzes data to provide insights, such as detecting cancer from medical scans
- Generative AI: Creates content like images, text, video, and more
- Functional AI: Actually gets stuff done by performing actions directly or interacting with other services
- AI interaction patterns exist on a context spectrum from high user burden to low user burden
- Open Text-Box Chat: Users must provide all context (ChatGPT, Copilot) - high overhead for users
- Sidecar Experience: Has some context about what's happening in the rest of the app, but still requires context switching
- Embedded: Highly contextual AI that appears directly in the flow of user's work
- Background: Agents that perform tasks autonomously without direct user interaction
- Think Simply: Make something that makes sense and provides clear value. Users need to know what to expect from your AI experience
- Think Contextually: Can you make the experience more relevant for people using available context? Customize experiences within the user's workflow
- Think Big: AI can do a lot, so start big and work backwards.
- Mine, Reason, Infer: Make use of the information people give you.
- Think Proactively: What kinds of things can you do for people before they ask?
- Think Responsibly: Consider environmental and cost impacts of using AI.
- We should focus on delivering value first over delightful experiences
- Boring tasks that users find tedious
- Complex activities users currently offload to other services
- Long-winded processes that take too much time
- Frustrating experiences that cause user pain
- Repetitive tasks that could be automated
- Don't solve problems that are already well-solved with simpler solutions
- Not all AI needs to be a chat interface. Sometimes traditional UI is better than AI
- Users' tolerance and forgiveness of AI is really low. It takes around 8 months for a user to want to try an AI product again after a bad experience
- We're now trying to find the right problems to solve rather than finding the right solutions to problems. Build things that solve real problems, not just showcase AI capabilities
Future Product Days: The AI Adoption Gap
In her The AI Adoption Gap: Why Great Features Go Unused talk at Future Product Days in Copenhagen, Kate Moran shared insights on why users don't utilize AI features in digital products. Here are my notes from her talk:
- The best way to understand the people we're creating digital products for is to talk to them and watch them use our products.
- Most people are not looking for AI features nor are they expecting them. People are task-focused, they're just trying to get something done and move on.
- Top three reasons people don't use AI features: they have no reason to use it, they don't see it, they don't know how to use it.
- There are other issues too, like trust in enterprise use cases, but these are the main ones.
- People don't care about the technology, they care about the outcome. AI-powered is not a value-add. Solving someone's problem is a value-add.
- Amazon introduced a shopping assistant that when tested, people really liked because the assistant has a lot of context: what you bought before, what you are looking at now, and more
- However, people could not find this feature and did not know how to use it. The button is labeled "Rufus" and people don't associate that name with something that helps them get answers about their shopping.
- Findability is how well you can locate something you are looking for. Discoverability is finding something you weren't looking for.
- In interfaces that people use a lot (are familiar with), they often miss new features, especially when they are introduced with a single action among many others
- Designers are making basic mistakes that don't have anything to do with AI (naming, icons, presentation)
- People say conversational interfaces are the easiest to use but it's not true. Open text fields feel like search, so people treat them like smarter search instead of using the full capability of AI systems
- People have gotten used to using succinct keywords in text fields instead of providing lots of context to AI models that produce better outcomes
- Smaller-scope AI features like automatic summaries that require no user interaction perform well because they integrate seamlessly into existing workflows
- These adoption challenges are not exclusive to AI but apply to any new feature. As a result, all your existing design skills remain highly valuable for AI features.
Future Product Days: Future of Product Creators
In his talk The Future of Product Creators at Future Product Days in Copenhagen, Tobias Ahlin argued that divergent opinions and debate, not just raw capability, are the missing factors for achieving useful outcomes from AI systems. Here are my notes from his presentation:
- Many people are espousing a future vision in which parallel agents create products and features on demand.
- 2025 marked the year when agentic workflows became part of daily product development. AI agents quantifiably outperform humans on standardized tests: reading, writing, math, coding, and even specialized fields.
- Yet we face the 100 interns problem: managing agents that are individually smarter but "have no idea where they're going"
- Fundamental reasoning gaps: AI can calculate rock-paper-scissors odds while failing to understand it has a built-in disadvantage by going second.
- Fatal mistakes in real-world applications: suggesting toxic glue for pizza, recommending eating rocks for minerals.
- Performance plateau problem: Unlike humans who improve with sustained effort, AI agents plateau after initial success and cannot meaningfully progress even with more time
- Real-world vs. benchmark performance: Research from Monitor shows 63% of AI-generated code fails tests, with 0% working without human intervention
- True reasoning is fundamentally a social function, "optimized for debate and communication, not thinking in isolation"
- Court systems exemplify this: adversarial arguments sharpen and improve each other through conflict
- Individual biases can complement each other when structured through critical scrutiny systems
- Teams naturally create conflicting interests: designers want to do more, developers prefer efficiency, PMs balance scope. This tension drives better outcomes
- AI significantly outperforms humans in creativity tests. In a Cornell study, GPT-4 performed better than 90.6% of humans in idea generation, with AI ideas being seven times more likely to rank in the top 10%
- So the cost of generating ideas is moving towards zero but human capability remains capped by our ability to evaluate and synthesize those ideas
- Current agents primarily help with production but future productivity requires an equal amount of effort in evaluation and synthesis.
- Institutionalized disconfirmation: creating systems where disagreement drives clarity, similar to scientific peer review
- Agents designed to disagree in loops: one agent produces code, another evaluates it, creating feedback systems that can overcome performance plateaus
- True reasoning will come from agents that are designed to disagree in loops rather than simple chain-of-thought approaches
Defining Chat Apps
With each new technology platform shift, what defines a software application changes dramatically. Even though we're still in the midst of the AI shift, there are emergent properties that seem to be shaping what at least a subset of AI applications, let's call them chat apps, might look like going forward.
At a high level, applications are defined by the systems they're discovered and operated in. This frames what capabilities they can utilize, their primary inputs, outputs, and more. That sounds abstract so let's make it concrete. Applications during the PC era were compiled binaries sold as shrink-wrapped software that used local compute and storage, monitors as output, and the mouse and keyboard as input.
These capabilities defined not only their interfaces (GUI) but their abilities as well. The same is true for applications born of the AI era. How they're discovered and where they operate will also define them. And that's particularly true of "chat apps".
So what's a chat app? Most importantly a chat app's compute engine is an AI model which means all the capabilities of the model also become capabilities of the app. If the model can translate from one language to another, the app can. If a model can generate PDF files, the app can. It's worth noting that "model" could be a combination of AI models (with routing), prompts and tools. But to an end user, it would appear as a singular entity like ChatGPT or Claude.
When an application runs in Claude or ChatGPT, it's like running in an OS (Windows during the PC era, iOS during the mobile era). So how do you run a chat app in an AI model like Claude and what happens when you do? Today an application can be added to Claude as a "connector", probably running as a remote Model Context Protocol (MCP) server. The process involves some clicking through forms and dialog boxes but once set up, Claude has the ability to use the application on behalf of the person who added it.
As mentioned above, the capabilities of Claude are now the capabilities of the app. Claude can accept natural language as input, so can the app. When people upload an image to Claude, it understands its content, so does the app. Claude can search and browse the Web, so can the app. The same is true for output. If Claude can turn information into a PDF, so can the app. If Claude can add information to Salesforce, so can the app. You get the idea.
So what's left for the application to do? If the AI model provides input, output, and processing capabilities, what does a chat app do? In its simplest form, a chat app can be a database that stores the information the app uses for input, output, and processing, plus a set of dynamic instructions for the AI model on how to use it.
As always, a tangible example makes this clear. Let's say I want to make a chat app for tracking the concerts I'm attending. Using a service like AgentDB, I can start with an existing file of concerts I'm tracking or ask a model to create one. I then have a remote MCP server backed by a database of concert information and a dynamic template that continually instructs an AI model on how to use it.
When I add that remote MCP link to Claude, I can: upload screenshots of upcoming concerts to track (using Claude's image parsing ability); generate a calendar view of all my concerts (using Claude's coding ability); find additional information about an upcoming show (using Claude's Web search tools); and so on. All of these capabilities of the Claude "model" work with my database plus template, aka my chat app.
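To make the division of labor concrete, here's a minimal sketch of what the chat app itself contributes: a small database and the instructions handed to the model for using it. The names and shapes below are illustrative, not AgentDB's actual API; the point is that everything else, like image parsing, Web search, and code generation, comes from whichever model hosts the conversation.

```typescript
// Minimal sketch of a "chat app": a database plus instructions for the model.
// Names and shapes are illustrative, not AgentDB's actual API.

type Concert = {
  show: string;
  date: string;        // ISO date, e.g. "2025-11-19"
  venue: string;
  city: string;
  ticketPrice?: number;
};

// The app's only state: a small database of concerts.
const concerts: Concert[] = [];

// Instructions (the "template") the server hands to the model so it knows
// how to use the database. The model's own capabilities (image parsing,
// Web search, code generation) need no code here at all.
const template = `
You manage a concert tracker. Each entry needs show, date, venue, and city.
If the user provides a screenshot or link, extract these fields yourself.
Ask for clarification only when a required field is missing.
`;

// The tools the app exposes. In a real deployment these would be registered
// with a remote MCP server; here they're plain functions.
function addConcert(entry: Concert): string {
  concerts.push(entry);
  return `Added ${entry.show} at ${entry.venue} on ${entry.date}.`;
}

function listConcerts(city?: string): Concert[] {
  return city ? concerts.filter((c) => c.city === city) : concerts;
}
```

What's notable is what the sketch doesn't contain: no UI code, no screenshot parser, no search integration. Those arrive with whatever model the connector is added to.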
You can make your own chat apps instantly by using AgentDB to create a remote MCP link and adding it to Claude as a connector. It's not a very intuitive process today but will very likely feel as simple as using a mobile app store in the not too distant future. At which point, chat apps will probably proliferate.
Letting the Machines Learn
Every time I present on AI product design, I'm asked about AI and intellectual property. Specifically: aren't you worried about AI models "stealing" your work? I always answer that if I accused AI models of theft, I'd have to accuse myself as well. Let me explain…
I've spent 30 years writing three books and over two thousand articles on digital product design and strategy. But during those same 30 years? I've consumed exponentially more. Countless books, articles, tweets. Thousands of conversations. Products I've used, solutions I've analyzed. All of it shaped what I know and how I write.
If you asked me to trace the next sentence I type back to its sources, to properly attribute the influences that led to those specific words, I couldn't do it. The synthesis happens at a level I can't fully decompose.
AI models are doing what we do. Reading, viewing, learning, synthesizing. The only difference is scale. They process vastly more information than any human could. When they generate text, they're drawing from that accumulated knowledge. Sound familiar?
So when an AI model produces something influenced by my writings, how is that different from a designer who read my book and applies those principles? I put my books out there for people to buy and learn from. My articles? Free for anyone to read. Why should machines be excluded from that learning opportunity?
"But won't AI companies unfairly profit from training on your content?"
From AI model companies, for $20 per month, I get an assistant that's read more than I ever could, available instantly, capable of helping with everything from code reviews to strategic analysis. That same $20 couldn't buy me two hours of entry-level human assistance.
The benefit I receive from these models, trained on the collective knowledge of millions of contributors, including my microscopic contribution, dwarfs any hypothetical loss from my content being training data. In fact, I'm humbled that my thoughts could even be part of a knowledge base used by billions of people.
So let machines learn, just like humans do. For me, the value I get back from well-trained AI models far exceeds what my contribution puts in.
Unstructured Input in AI Apps Instead of Web Forms
Web forms exist to put information from people into databases. The input fields and formatting rules in online forms are there to make sure the information fits the structure a database needs. But unstructured input in AI-enabled applications means machines, instead of humans, can do this work.
17 years ago, I wrote a book on Web Form Design that started with "Forms suck." Fast forward to today and the sentiment still holds true. No one likes filling in forms but forms remain ubiquitous because they force people to provide information in the way it's stored within the database of an application. You know the drill: First Name, Last Name, Address Line 2, State abbreviation, and so on.
With Web forms, the burden is on people to adapt to databases. Today's AI models, however, can flip this requirement. That is, they allow people to provide information in whatever form they like and use AI to do the work necessary to put that information into the right structure for a database.
How does this work? Instead of a Web form enforcing the database's input requirements, a dynamic context system can handle it. One way of doing this is with AgentDB's templating system, which provides instructions to AI models for reading and writing information to a database.
With AgentDB connected to an AI model (via an MCP server), a person can simply say "add this" and provide an image, PDF, audio, video, you name it. The model will use AgentDB's template to decide what information to extract from this unstructured input and how to format it for the database. In the case where something is missing or incomplete, the model can ask for clarification or use tools (like search) to find possible answers.
In the example above, I upload a screenshot from Instagram announcing a concert and ask the AI model to add it to my concert tracker. The AgentDB template tells the model it needs Show, Date, Venue, City, Time, and Ticket Price for each database entry. So the AI model pulls this information from the unstructured input (screenshot) and, if complete, turns it into the structured format a database needs.
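Here's a small sketch of the structured side of that hand-off. The field names follow the concert-tracker example above, but the validation step and example values are hypothetical, not AgentDB's actual behavior: the template defines what the database needs, and anything the model couldn't extract comes back as a clarification request.

```typescript
// Sketch of the structured target the model extracts into. The check below
// is illustrative; in practice the template guides the model's extraction.

type ConcertEntry = {
  show?: string;
  date?: string;        // ISO date
  venue?: string;
  city?: string;
  time?: string;        // e.g. "19:30"
  ticketPrice?: number;
};

const REQUIRED: (keyof ConcertEntry)[] = [
  "show", "date", "venue", "city", "time", "ticketPrice",
];

// After the model extracts fields from a screenshot, PDF, or audio clip,
// the database side only has to check completeness. Anything missing goes
// back to the model as a clarification request (or a search task).
function checkEntry(entry: ConcertEntry): { ok: boolean; missing: string[] } {
  const missing = REQUIRED.filter((field) => entry[field] === undefined);
  return { ok: missing.length === 0, missing };
}

// Hypothetical partial extraction from a screenshot: still needs three fields.
const extracted: ConcertEntry = { show: "Example Band", venue: "Example Hall", city: "Oakland" };
console.log(checkEntry(extracted)); // { ok: false, missing: ["date", "time", "ticketPrice"] }
```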
Of course, the unstructured input can also be a photo, a link to a Web page, a Word document, a PDF file, or even just audio where you say what you want to add. In each case the combination of AI model and AgentDB will fill in the database for you.
No Web form required. And no form is the best kind of Web Form Design.
World Knowledge Improves AI Apps
Applications built on top of large-scale AI models benefit from the AI model's built-in capabilities without requiring app developers to write additional code. Essentially if the AI model can do it, an application built on top of it can do it as well. To illustrate, let's look at the impact of a model's World knowledge on an app.
For years, software applications consisted of running code and a database. As a result, their capabilities were defined by coded features and what was inside the database. When the running code is replaced by a large language model (LLM), however, the information encoded in the model's weights instantly becomes part of the capabilities of the application.
With AI apps, end users are no longer constrained by the code developers had the time and foresight to write. All the World knowledge (and other capabilities) in an AI model are now part of the application's logic. Since that sounds abstract let's look at a concrete example.
I created an AI app with AgentDB by uploading a database of NBA statistics spanning 77 years and 13.6 million play-by-play records. When I add the MCP link AgentDB makes for me to Anthropic's Claude, I have an application consisting of a database optimized for AI model use, and an AI model (Claude) to use as the application's brain. Here's a video tutorial on how to do this yourself.
In the past a developer would need to write code to render the user interface for an application front-end to this database. That code would determine what kind of questions people could get answers to. Usually this meant a bunch of UI input elements to search and filter games by date, team, player, etc. The NBA's stats page (below) is a great example of this kind of interface.
But no matter how much code developers write, they can't cover all the ways people might want to interact with information about the NBA's 77 years. For instance, a question like "What were the last 5 plays in the Malice in the Palace game?" requires either running code that can translate "Malice in the Palace" to a specific date and game or an extra field in the database for game nicknames.
When a large language model is an application's compute, however, no extra code needs to be written. The association between Malice in the Palace and November 19, 2004 is present in an AI model's weights and it can translate the natural language question into a form the associated database can answer.
An AI model can use its World knowledge to translate people's questions into the kind of multi-step queries needed to answer what seem like simple questions. Consider the example below: "Who was the tallest player drafted in Ant-Man’s NBA draft class?" We need to figure out which player Ant-Man refers to, what year he was drafted, who else was drafted that year, get all their heights, and then compare them. Not a simple query to write by hand but with AI acting as an application's brain... it's quick and easy.
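As a sketch of what that translation might look like, here's the kind of query a model could write against a hypothetical schema (players and draft_picks tables, which won't match the real database's layout). The nickname resolution happens in the model's weights; the rest is ordinary SQL.

```typescript
// Sketch: how an LLM's World knowledge turns a casual question into a
// multi-step query. Table and column names are hypothetical.

// Step 1 happens in the model's weights, not in code:
//   "Ant-Man" -> Anthony Edwards -> 2020 draft class.
const resolvedPlayer = "Anthony Edwards";

// Step 2 is a query the model can then write for the stats database:
const query = `
  SELECT p.name, p.height_inches
  FROM draft_picks d
  JOIN players p ON p.player_id = d.player_id
  WHERE d.draft_year = (
    SELECT d2.draft_year
    FROM draft_picks d2
    JOIN players p2 ON p2.player_id = d2.player_id
    WHERE p2.name = '${resolvedPlayer}'
  )
  ORDER BY p.height_inches DESC
  LIMIT 1;
`;
```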
World knowledge, of course, isn't the only capability built into large language models. There's multi-language support, vision (for image parsing), tool use, and more emerging. All of these are also application capabilities when you build apps on top of AI models.
Chat is: the Future or a Terrible UI
As the proliferation of AI-powered chat interfaces in software continues, people increasingly take one of two sides: chat is the future of all UI, or chat is a terrible UI. Turns out there are reasons to believe both; here's a bunch of them.
Back in 2013, I proposed a variant of Jamie Zawinski's popular Law of Software Envelopment reframed as:
Every mobile app attempts to expand until it includes chat. Those applications which do not are replaced by ones which can.
Today every major mobile app has some form of chat function, whether social network, e-commerce, ride-share, and so on. So chat is already pervasive and thereby familiar, which made it a great interface to usher in the age of AI. But is it AI's final form?
“Chat is the future of software.”
- People already know how to use chat interfaces. This familiarity means people can jump right in and start using powerful AI systems.
- An empty text box is great at capturing user intent: people can simply tell chat apps what they want to get done. “Just look at Google.”
- Natural language allows people to communicate what they want like they would in the real World, no need to learn a UI.
- The best interface is… no interface, an invisible interface, etc.
- Conversational interfaces can shift topics and goals, providing a way to compose information and actions that’s just right for specific needs.
- Voice input means people don’t have to type but can still simply chat with powerful systems.
- Chat user interfaces for AI models are a fundamental shift from forcing humans to learn computers to computers understanding human language.
“Chat is a terrible UI.”
- Chat interfaces face the classic "invisible UI" problem: without clear affordances, people don't know what they can do, nor how to get the best results from them.
- Walls of text are suboptimal for communicating and displaying complex information and relationships, unlike images, tables, charts, and more.
- Scrolling through conversation threads to find and extract relevant information is painful, especially as chat conversations run long.
- Context gets lost in back and forth interactions which slow everything down. Typing everything you want to do is cumbersome.
- Language is a terrible way to describe visual, spatial, and temporal things.
- Voice-based interfaces make it even harder to communicate information better suited to images and user interfaces.
- We’re very early in the evolution of AI-powered software and lots of different and useful interfaces for interacting with AI will emerge.
It's also worth noting that chat isn't the only way to integrate AI in software products and increasingly agent-based applications outperform chat-only solutions. So expect things to keep changing.
Platform Shifts Redefine Apps
With each major technology platform shift, people underestimate how much "what an application is and how it's built" changes. From mainframes to PCs, to Web, to Mobile and now AI, computing platform changes redefined software and created new opportunities and constraints for application design and development.
These shifts not only impacted how applications work but also where they run, what they look like, how they're built, delivered, and experienced by people.
Mainframe era: Applications lived on massive shared computers in climate-controlled rooms, with people typing text-only commands into terminals that were basically windows into a distant brain. All the intelligence sat somewhere else, and you just got text back.
PC era: Software became physical products you'd buy in boxes, install from floppy disks or CDs, and run entirely on your own machine. Suddenly computing power lived under your desk, and applications could use rich graphical interfaces instead of just green text on black screens.
Web era: Applications moved into browsers accessed through URLs, shifting from installed software to services that updated automatically. No more version numbers or install wizards, just type an address and you're using the latest version built out of cross-platform Web standards UI components.
Mobile era: Applications shrank into task-focused apps downloaded from curated stores, designed for fingers not mice, and aware of your location and orientation. Computing became something in your pocket that could make use of the environment around you through cameras, GPS, and on-device sensors.
AI era: Instead of screens and buttons, applications are conversations where AI models understand intent, execute complex tasks, and adapt to context without explicit programming for every scenario. And we're just getting started.
While it's true that AI applications sound a lot like the mainframe applications of old, those apps required exact syntax and returned predetermined responses. AI applications understand natural language and generate solutions on the fly. They don't just process commands, they reason through problems and build UI as needed.
During each of these platform shifts, companies react the same way. They attempt to port the application models they had without thinking through and embracing what's different. Early Web sites were posters and brochures. Early mobile apps were ported Websites. Just like early TV shows were just radio shows with cameras pointed at them.
But at the start of a technology platform shift, how applications will change isn't clear. It takes time for new forms to develop. As they do most companies will end up rebuilding their apps like they did for the Web, mobile, and more. Companies that embrace new capabilities and modes of building early on can gain a foothold and grow. That's why technology shifts are accompanied by a surge of new start-ups. Change is opportunity.
Five Paths to Solving Robotics
In his AI Speaker Series presentation at Sutter Hill Ventures, Google DeepMind's Ted Xiao outlined five worldviews on how to achieve useful, ubiquitous robotics and dug into his team's work integrating frontier models like Gemini directly into robotic systems. Here are my notes from his talk:
We're at a unique moment in robotics where there's no consensus on the path forward. Unlike other AI breakthroughs where approaches quickly consolidated, robotics remains wide open with multiple reasonable paths showing early signs of success. Ted presented five worldviews, each with smart researchers and builders pursuing them with conviction:
Industry Incumbent
These researchers believe general-purpose robotics is the wrong goal. Purpose-built solutions actually work today - from industrial automation to appliances we don't even call robots anymore. When robotics succeeds, we just call them tools. The path forward: directly optimize for specific use cases using decades of control theory and hardware expertise.
Humanoid Company
These researchers see hardware as the primary bottleneck. Once platforms stabilize, researchers excel at extracting performance - drones went from fragile research prototypes to consumer products, quadrupeds became robust commercial platforms. Humanoid form factors matter because the world is built for humans, and human-like robots can better leverage internet-scale human data.
Robot Foundation Model Startup
These researchers focus on robot data and algorithms as the key. Generality is non-negotiable - transformative technologies are general by nature. The core challenge: building an "internet of robotics data" either vertically (solve one domain completely, then expand) or horizontally (achieve robotics' GPT-2 moment first, then improve).
Bitter Lesson Believer
These researchers argue frontier models are the only existence proof of technology that can model internet-scale data with human-level performance. You can't solve robotics without incorporating these "magical artifacts" into the exploration process. Frontier model trends and compute lead robotics by about two years.
AGI Bro
These researchers take the most radical position: just solve AGI and ask it to solve robotics. The Platonic Representation Hypothesis suggests that as AI models improve across domains, their internal representations converge. Perfect language understanding might inherently include physical understanding.
Gemini Robotics
Ted's team at Google DeepMind pursued the Bitter Lesson approach, building robotics capabilities directly into Gemini rather than treating frontier models as black boxes.
Their Gemini Robotics system first enhanced embodied reasoning - teaching the model to understand the physical world better through 2D bounding boxes in cluttered scenes, 3D understanding with depth and orientation, pointing for granular precision, and grasp angles for manipulation. The system then learned low-level control with diverse robot actions, operating at 50Hz control frequency with quarter-second end-to-end latency. This unlocked three key advances:
- Interactivity: The robot responds to dynamic scenes, following objects as they move and adapting to human interference
- Dexterity: Beyond rigid objects, it can fold clothes, wrap headphone wires, and manipulate shoelaces
- Generalization: Handles visual distribution shifts (new lighting, distractors), semantic variations (typos, different languages), and spatial changes (different sized objects requiring different strategies)
When deployed at a conference with completely novel conditions - crowds, different lighting, new table - the system maintained reasonable behavior for arbitrary user requests, showing sparks of that GPT-2 moment where it attempts something sensible regardless of input.
Dark Horses and Emerging Paradigms
Several emerging paradigms could completely upend current approaches:
- Video World Models learning physics without robots through action-conditioned video generation
- Robot-Free Data from simulation or humans with head-mounted cameras
- Thinking Models applying frontier models' reasoning capabilities to robotics
- Locomotion-Manipulation Unity bridging RL-based locomotion with foundation model manipulation
There's no consensus on which path will win. Each approach has reasonable arguments and early signs of success. The lack of agreement isn't a weakness - it's what makes this the most exciting time in robotics history.
Rethinking Applications for AI
With every new technology platform, the concept of an application shifts. Consider the difference between compiled apps during the PC era, online applications during the Web, and app stores during mobile. Now with AI it's happening again.
Before getting into the impact AI is having on applications, it's worth noting we still have downloadable desktop applications, Web applications, mobile app stores and everything in between. Technology platform shifts don't wipe out the past and they also don't happen overnight. So AI-driven changes, while happening fast, are going to be happening for a long time.
The basic components of an application have also stayed consistent for a long time. An application at its highest level is just running code and a database. The database stores the information an application manipulates and the running code allows you to manipulate it through input and output controls (user interface, auth, etc.).
As AI coding agents have gotten more capable, they've increasingly been able to handle more of the running code aspect of an application. Not only can they generate code, they can review it, fix it, and maintain it. So it's not hard to see how AI agents can be a self-sustaining loop.
As AI coding agents take on more and more of the running code aspect of an application, they increasingly need to create, update, and work with databases. Today's databases, however, were made for people to use, not agents. So we built a database system for AI applications called AgentDB designed for agents, not people.
AgentDB allows agents to manifest new databases by just referencing a unique ID, instead of filling out a series of forms like people do when creating a database. It also provides agents with templates that let them start using databases immediately and consistently across use cases. These templates are dynamic, so as agents learn new or better ways to use a database, that information is passed on to all subsequent agent use.
With these two changes, the concept of an application is already shifting. But what if the idea of needing "running code" is also changing? By fronting an AgentDB database and template system with a remote Model Context Protocol (MCP) server: all you need is a URL plus an AI model to have an app.
In this video, I demonstrate uploading a CSV file of a credit card statement to AgentDB. The system creates a database and template, and encapsulates both with a remote MCP server URL that you can add to any AI application that supports remote MCP, like Claude, Cursor, Augment Code, etc. The end result is an instant chat app.
Through natural language instructions, you can read and write data immediately and consistently and ask for any variant of user interface you want. Most credit card websites are painfully limiting but now I can create the specific visualizations, categories, queries, and features I want. No waiting around for the credit card site to implement new code.
You also don't need a CSV file to make an app. Just tell an AI model connected to AgentDB what you want. It can use AgentDB to create a database, populate it, and then ensure anything you add to it includes the right information. Tracking the date, location, and cost of concert tickets? AgentDB will enforce all that info is there and if you add a new bit of data to track, it can update all your records (see video below).
You can try making your own chat app from a database or CSV file at the demo page on AgentDB to get a feel for it. There are definitely some rough edges, especially when trying to add a remote MCP server to some AI applications (in fact, this whole step should go away), but it's still pretty compelling.
As I mentioned at the start, we don't fully know how the AI platform shift will transform applications yet. Clearly, though, there are big changes coming.
Dynamic Context for AI Agents
For AI applications, context is king. So context management, and thereby context engineering, is critical to getting accurate answers to questions, keeping AI agents on task, and more. But context is also hard earned and fragile, which is why we launched templates in AgentDB.
When an AI agent decides it needs to make use of a database, it needs to go through a multi-step process of understanding. It usually takes 3-7 calls before an agent understands enough about a database's structure to accomplish something meaningful with it. That's a lot of time and tokens spent on understanding. Worse still, this discovery tax gets paid repeatedly. Every new agent session starts from zero, relearning the same database semantics that previous agents already figured out.
Templates in AgentDB tackle this by giving AI agents the context they need upfront, rather than forcing them to discover it through trial and error. Templates provide two key pieces of information about a database upfront: a semantic description and structural definition.
The semantic description explains why the database exists and how it should be used. It includes mappings for enumerated values and other domain-specific knowledge. Think of it as the database's user manual written for AI agents. The structural component uses migration schemas to define the database layout. This gives agents immediate understanding of tables, relationships, and data types without needing to query the system architecture.
With AgentDB templates, agent requests like "give me a list of my to-dos" (to-do database) or "create a new opportunity for this customer" (CRM database) work immediately.
Once you've defined a template, it works for any database that follows that pattern. So one template can provide the context an AI agent needs for any number of databases with the same intent, like a to-do list database for every user, to keep with an earlier example.
But static instructions for AI agents only go so far. These are thinking machines after all. So AgentDB templates can evolve with use. For example, a template can be dynamically updated with specific queries that worked well. This creates a feedback loop where templates become more effective over time, learning from real-world usage to provide better guidance to future AI interactions.
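Pulling those pieces together, a template could take a shape like the illustrative one below. This is not AgentDB's actual format, just a sketch of the three ingredients described above: a semantic description, the migration schema, and a slot for queries that worked well in past sessions.

```typescript
// Illustrative shape of a template; not AgentDB's actual format.
// The semantic half is the "user manual", the structural half is the schema.

type Template = {
  // Why the database exists and how to use it, written for an AI agent.
  description: string;
  // Domain-specific mappings, e.g. enumerated status values.
  enums: Record<string, string[]>;
  // Migration statements defining tables, relationships, and data types.
  migrations: string[];
  // Queries that worked well in past sessions; appended over time so the
  // template keeps getting more useful with use.
  learnedQueries: { intent: string; sql: string }[];
};

const todoTemplate: Template = {
  description:
    "A personal to-do list. Each task has a title, a due date, and a status.",
  enums: { status: ["open", "in_progress", "done"] },
  migrations: [
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL, due_date TEXT, status TEXT DEFAULT 'open')",
  ],
  learnedQueries: [
    {
      intent: "list my open to-dos",
      sql: "SELECT title, due_date FROM tasks WHERE status != 'done' ORDER BY due_date",
    },
  ],
};
```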
AgentDB templates are provided to AI agents as an MCP server which also supports raw SQL access. So AI agents can make use of a database effectively right away and still experiment through querying. AgentDB templates are another example of designing software for AI systems rather than humans because they're different "users".
Prompt Building User Interfaces
Perhaps the biggest problem facing AI products today is: people don't know all the things these products can do nor how to get the best results out of them. Not surprising when you consider most AI product interfaces are just empty text fields asking "what do you want to do?". Prompt building user interfaces can help answer that question and more.
We've been exploring ways to help people understand what's possible and how to accomplish it in Bench. Bench is AI for everyday work tasks. As such, it can do a lot: search the Web, browse the Web as you (with a browser extension), generate reports, make PowerPoint, use your email, and many more of the things that make up people's daily work tasks. The problem is... that's a lot.
To give people a better sense of what Bench can do, we started with suggested prompts (aka instructions) that accomplished specific work tasks. To make these as relevant as possible, we added an initial screen to the Bench start experience asking people to specify their primary roles at work: Engineering, Design, Sales, etc. If they did, the suggested prompts would be reflective of the kinds of things they might do at work. For example Sales folks would see suggestions like: research a prospect, prep for a sales meeting, summarize customer feedback, and so on.
The problem with these kinds of high level suggestions is they are exactly that: too high level. Though relevant to a role, they're not relevant to someone's current work tasks. Sales teams are researching prospects but doing it in a way that's specific to the product they're selling and the prospect they're researching. Generic prompt suggestions aren't that useful.
To account for this, we attempted to personalize the role-based suggestions by researching people's companies in the background while they signed up. This additional information allowed us to make suggestions more specific to the industry and company people worked for. This definitely made suggested prompts more specific, but it also made them less useful. Researching someone's company gives you some context but not nearly the amount its employees have. Because of this, personalized suggested prompts felt "off". So we went back to more generic suggestions but made them more atomic.
Instead of encompassing a complete work task, atomic suggestions just focused on part of it: where the information for a work task was coming from (look at my Gmail, search my Notion) and what the output of a work task should be (create a Word Doc, make a chart). These suggestions gave people a better sense of Bench's capabilities. It can read my calendar, it can make Google sheets. Almost immediately, though, it felt like these atomic suggestions should be combine-able.
To enable this, we made a prompt rewriter that would change based on what atomic suggestions people chose. If they picked Use Salesforce and Create Google Doc, the rewriter would merge these into a single instruction that made sense: "Use [variable] from Salesforce to create a Google Doc". This turned the process of writing complex prompts into just clicking suggestions. The way these suggestions were laid out, however, didn't make clear they could be combined like this. They looked and felt like discrete prompts.
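A toy version of that rewriter, using hypothetical names rather than Bench's actual code, shows how little machinery the idea needs: atomic suggestions are either sources ("use") or outputs ("make"), and the rewriter stitches the selected ones into a single instruction with a placeholder for the person to fill in.

```typescript
// Toy prompt rewriter; names are hypothetical, not Bench's actual code.

type Suggestion = { kind: "use" | "make"; label: string };

function rewritePrompt(selected: Suggestion[]): string {
  const sources = selected.filter((s) => s.kind === "use").map((s) => s.label);
  const outputs = selected.filter((s) => s.kind === "make").map((s) => s.label);

  const from = sources.length ? ` from ${sources.join(" and ")}` : "";
  const into = outputs.length ? ` to create a ${outputs.join(" and a ")}` : "";
  return `Use [variable]${from}${into}.`;
}

// Picking "Use Salesforce" and "Create Google Doc" yields:
// "Use [variable] from Salesforce to create a Google Doc."
console.log(rewritePrompt([
  { kind: "use", label: "Salesforce" },
  { kind: "make", label: "Google Doc" },
]));
```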
Enter the task builder. In the latest version of Bench, atomic suggestions have been expanded and laid out more like the building blocks of a prompt. People can select what they want to do, use, make, or any combination of the three. The prompt rewriter then stitches together a machine-written prompt along with some optional input fields people can fill in to provide more details about the work task they want to get done.
This prompt builder UI does a few things for people using Bench. It:
- makes what the product can do clearer
- provides a way to surface new functionality as it's added to the product
- rewrites people's prompts in a way that gets them to better outcomes
- clarifies what people can add to a prompt to make their tasks more effective
While that's a decent amount of good outcomes, design is never done and AI capabilities keep improving. As a result, I'm sure we're not done with Bench's task builder UI, nor with solutions to discoverability and prompting in AI products overall. In other words... more to come.
AI Has Flipped Software Development
For years, it's been faster to create mockups and prototypes of software than to ship it to production. As a result, software design teams could stay "ahead" of engineering. Now AI coding agents make development 10x faster, flipping the traditional software development process on its head.
In my thirty years of working on software, the design teams I was part of were typically operating "out ahead" of our software development counterparts. Unburdened by existing codebases, technical debt, performance, and infrastructure limitations, designers could work quickly in mockups, wireframes, and even prototypes to help envision what we could or should build before time and effort was invested into actually building it.
While some software engineering teams could ship in days, in most (especially larger) organizations, building new features or redesigning apps could take months if not quarters or years. So there was plenty of time for designers to explore and iterate. This was also reflected in the ratio of designers to developers in most companies: an average of one designer for every twenty engineers.
When designs did move to the production engineering phase, there'd (hopefully) be a bunch of back and forth to resolve unanswered questions, new issues that came up, or changing requirements. A lot of this burden fell on engineering as they encountered edge cases, things missing in specs, cross-device capability differences, and more. What it added up to though, was that the process to build and launch something often took longer than the process to design it.
AI coding tools change this dynamic. Across several of our companies, software development teams are now "out ahead" of design. To be more specific, collaborating with AI agents (like Augment Code) allows software developers to move from concept to working code 10x faster. This means new features become code at a fast and furious pace.
When software is coded this way, however, it (currently at least) lacks UX refinement and thoughtful integration into the structure and purpose of a product. This is the work that designers used to do upfront but now need to "clean up" afterward. It's like the development process got flipped around. Designers used to draw up features with mockups and prototypes, then engineers would have to clean them up to ship them. Now engineers can code features so fast that designers are the ones going back and cleaning them up.
So scary time to be a designer? No. Awesome time to be a designer. Instead of waiting for months, you can start playing with working features and ideas within hours. This allows everyone, whether designer or engineer, an opportunity to learn what works and what doesn’t. At its core, rapid iteration improves software, and the build, use/test, learn, repeat loop just flipped; it didn't go away.
In his Designing Perplexity talk at Sutter Hill Ventures, Henry Modisett described this new state as "prototype to productize" rather than "design to build". Sounds right to me.
Designing Software for AI Agents
From making apps and browsing the Web to creating files, today's AI agents can take on an increasing number of computing tasks on their own. But the software underlying these capabilities wasn't made for agents. It was designed and built for people to use. As such there's an opportunity, and perhaps an increasing need, to rethink these systems for agent use.
When building agent-based AI applications, you'll likely butt up against a number of situations where existing software isn't optimized for what thinking machines can do. For instance, Web search. Nearly every agent-based AI application makes use of information on the Web to get things done. But Web Search APIs weren't written with agents in mind.
They provide a limited number of search results and a condensed snippet format that lines up more with how people use Web search interfaces. We get a page of ten blue links and scan them to decide which one to click. But AI agents aren't people. Not only can they make sense of many more search results at once, but their performance usually improves with larger document summaries and contents. People on the other hand, are unlikely to read through all search results before making a decision. So search APIs could certainly be rethought for agents.
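As a sketch of what "rethought for agents" could mean in practice, here are two response shapes side by side. The type and field names are hypothetical, not any particular search API's contract; they just contrast a page of snippets meant for scanning with the larger, fuller payload a model can actually digest.

```typescript
// Hypothetical response shapes; field names are illustrative only.

// Human-oriented: ten blue links with short snippets to scan.
type HumanSearchResult = {
  title: string;
  url: string;
  snippet: string;        // a sentence or two
};

// Agent-oriented: more results per call and fuller content per result,
// since models can take in far more text than a person scanning a page.
type AgentSearchResult = {
  title: string;
  url: string;
  summary: string;        // a multi-paragraph summary
  fullText?: string;      // optionally the extracted page content itself
  publishedAt?: string;
  relevanceScore: number;
};

type AgentSearchResponse = {
  query: string;
  results: AgentSearchResult[];  // dozens of results, not ten
};
```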
Similarly, when agents are developing applications or collecting data, they can make use of databases. But once again databases were designed and built for people to use not AI agents. And once again they can be rethought for agents, which is what we did with our most recent launch: AgentDB.
Agents can (and do) produce 1000x more databases than people every day, so the process of spinning up and managing any database for an agent needs to be as easy and maintenance-free as possible. Most of the databases AI agents create will be short-lived after serving their initial purpose. But some databases will be used again and others still will be used regularly.
With this kind of volume costs can become an issue, so keeping that many databases available needs to be as cost effective as possible. Last but not least, the content of databases needs to work well as context for AI models so agents can use this data as part of their tasks.
AgentDB is a database system designed around these considerations. With AgentDB, creating a database only requires a Universally Unique Identifier (UUID). There's no setup or configuration step. So whenever an AI agent decides it needs a database, it has one simply by creating a UUID. No forms or set-up wizards involved.
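From the agent's side, that design reduces creating a database to minting an identifier. The endpoint shape below is hypothetical, not AgentDB's actual URL format; the sketch is only meant to show that there's no provisioning step between deciding a database is needed and using it.

```typescript
// "Creating" a database is just minting an identifier; the URL shape is
// hypothetical, not AgentDB's actual endpoint format.

import { randomUUID } from "node:crypto";

const databaseId = randomUUID();

// The first read or write against this identifier is enough to bring the
// database into existence; there is no separate setup or configuration call.
const databaseUrl = `https://example-agentdb-host/v1/databases/${databaseId}`;

console.log(`New database reference: ${databaseUrl}`);
```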
Databases in AgentDB are stored as files, not hosted services requiring compute and maintenance. If an AI agent needs to query a database or append to it, it can. But if it never needs to access it again, the database is just a file. That means you're only paying the cost of storage to keep it around. And because AgentDB databases are just files, they scale, meaning they can easily keep up with the scale of AI agents.
To make data within each AgentDB database easily accessible as context for AI models, every AgentDB account is also an MCP server. This makes the data portable across AI applications as long as they support MCP server connections (which most do).
Altogether this example illustrates how even the most fundamental software infrastructure systems, like databases, can be rethought for the age of AI. The AgentDB database system doesn't look like a hosted database as a service solution because it's not designed and built for database admins and back-end developers. It's built for today's thinking machines.
And as agents take on more computing tasks for people, it won't be the only software made with agents as first class users.
