Internet News

Rethinking Applications for AI

LukeW - Sun, 08/17/2025 - 2:00pm

With every new technology platform, the concept of an application shifts. Consider the difference between compiled apps during the PC era, online applications during the Web, and app stores during mobile. Now with AI it's happening again.

Before getting into the impact AI is having on applications, it's worth noting we still have downloadable desktop applications, Web applications, mobile app stores, and everything in between. Technology platform shifts don't wipe out the past and they don't happen overnight. So AI-driven changes, while happening fast, will keep unfolding for a long time.

The basic components of an application have also stayed consistent for a long time. An application at its highest level is just running code and a database. The database stores the information an application manipulates and the running code allows you to manipulate it through input and output controls (user interface, auth, etc.).

As AI coding agents have gotten more capable, they've increasingly been able to handle more of the running code aspect of an application. Not only can they generate code, they can review it, fix it, and maintain it. So it's not hard to see how AI agents can be a self-sustaining loop.

As AI coding agents take on more and more of the running code aspect of an application, they increasingly need to create, update, and work with databases. Today's databases, however, were made for people to use, not agents. So we built a database system for AI applications called AgentDB designed for agents, not people.

AgentDB allows agents to manifest new databases by just referencing a unique ID, instead of filling out a series of forms like people do when creating a database. It also provides agents with templates that let them start using databases immediately and consistently across use cases. These templates are dynamic, so as agents learn new or better ways to use a database, that information is passed on to all subsequent agent use.

With these two changes, the concept of an application is already shifting. But what if the idea of needing "running code" is also changing? By fronting an AgentDB database and template system with a remote Model Context Protocol (MCP) server, all you need is a URL plus an AI model to have an app.

In this video, I demonstrate uploading a CSV file of a credit card statement to AgentDB. The system creates a database and template, and encapsulates both behind a remote MCP server URL that you can add to any AI application that supports remote MCP, like Claude, Cursor, Augment Code, etc. The end result is an instant chat app.
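For a feel of how little glue is involved, here's a minimal sketch of connecting to a remote MCP server from code using the MCP TypeScript SDK. The AgentDB URL and the "query" tool name are hypothetical placeholders, not the actual service's identifiers; a real AI application would hand the discovered tools to its model instead of calling them directly.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Hypothetical remote MCP URL returned after uploading the CSV (placeholder, not a real endpoint).
const serverUrl = new URL("https://example-agentdb-host/mcp/your-database-id");

async function main() {
  // Connect an MCP client to the remote server over streamable HTTP.
  const client = new Client({ name: "statement-chat", version: "0.1.0" });
  await client.connect(new StreamableHTTPClientTransport(serverUrl));

  // Discover what the server exposes; an AI application would pass these tools to the model.
  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name));

  // Call a hypothetical "query" tool; real tool names come from the listing above.
  const result = await client.callTool({
    name: "query",
    arguments: { sql: "SELECT category, SUM(amount) FROM transactions GROUP BY category" },
  });
  console.log(result);
}

main().catch(console.error);
```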

Through natural language instructions, you can read and write data immediately and consistently and ask for any variant of user interface you want. Most credit card websites are painfully limiting but now I can create the specific visualizations, categories, queries, and features I want. No waiting around for the credit card site to implement new code.

You can try making your own chat app from a database or CSV file at the demo page on AgentDB to get a feel for it. There are definitely some rough edges, especially when trying to add a remote MCP server to some AI applications (in fact, this whole step should go away), but it's still pretty compelling.

As I mentioned at the start, we don't fully know how the AI platform shift will transform applications yet. Clearly, though, there are big changes coming.

Dynamic Context for AI Agents

LukeW - Wed, 08/06/2025 - 2:00pm

For AI applications, context is king. So context management, and thereby context engineering, is critical to getting accurate answers to questions, keeping AI agents on task, and more. But context is also hard earned and fragile, which is why we launched templates in AgentDB.

When an AI agent decides it needs to make use of a database, it needs to go through a multi-step process of understanding. It usually takes 3-7 calls before an agent understands enough about a database's structure to accomplish something meaningful with it. That's a lot of time and tokens spent on understanding. Worse still, this discovery tax gets paid repeatedly. Every new agent session starts from zero, relearning the same database semantics that previous agents already figured out.

Templates in AgentDB tackle this by giving AI agents the context they need upfront, rather than forcing them to discover it through trial and error. Templates provide two key pieces of information about a database: a semantic description and a structural definition.

The semantic description explains why the database exists and how it should be used. It includes mappings for enumerated values and other domain-specific knowledge. Think of it as the database's user manual written for AI agents. The structural component uses migration schemas to define the database layout. This gives agents immediate understanding of tables, relationships, and data types without needing to query the system architecture.
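As an illustration only (the actual AgentDB template format isn't documented here), a template pairing a semantic description with a structural definition might look something like this; the field names are assumptions, not AgentDB's real schema:

```typescript
// Hypothetical template shape: a semantic description plus migration-style structure.
interface AgentTemplate {
  name: string;
  semantic: {
    purpose: string; // why the database exists and how it should be used
    enumMappings: Record<string, Record<string, string>>; // domain-specific value mappings
  };
  structure: {
    migrations: string[]; // SQL migrations defining tables, relationships, and data types
  };
}

const todoTemplate: AgentTemplate = {
  name: "todo-list",
  semantic: {
    purpose: "Tracks a single user's to-dos; 'open' items are actionable, 'done' items are archived.",
    enumMappings: { status: { o: "open", d: "done" } },
  },
  structure: {
    migrations: [
      `CREATE TABLE todos (
         id INTEGER PRIMARY KEY,
         title TEXT NOT NULL,
         status TEXT NOT NULL DEFAULT 'o',
         created_at TEXT NOT NULL
       );`,
    ],
  },
};
```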

With AgentDB templates, agent requests like "give me a list of my to-dos" (to-do database) or "create a new opportunity for this customer" (CRM database) work immediately.

Once you've defined a template, it works for any database that follows that pattern. So one template can provide the context an AI agent needs for any number of databases with the same intent, like a to-do list database for every user, to keep with the earlier example.

But static instructions for AI agents only go so far. These are thinking machines after all. So AgentDB templates can evolve with use. For example, a template can be dynamically updated with specific queries that worked well. This creates a feedback loop where templates become more effective over time, learning from real-world usage to provide better guidance to future AI interactions.
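A sketch of that feedback loop under a hypothetical template shape: queries that worked well get appended so later agent sessions start from them rather than rediscovering them.

```typescript
// Hypothetical shape: templates carry a growing list of queries known to work well.
interface ProvenQuery {
  intent: string; // natural-language description of what the query answers
  sql: string;
}

interface DynamicTemplate {
  name: string;
  provenQueries: ProvenQuery[];
}

// Record a successful query so every later agent session starts with it in context.
function recordProvenQuery(template: DynamicTemplate, q: ProvenQuery): void {
  // Skip duplicates to keep the template a compact, high-signal piece of context.
  if (!template.provenQueries.some((p) => p.sql === q.sql)) {
    template.provenQueries.push(q);
  }
}

const todoListTemplate: DynamicTemplate = { name: "todo-list", provenQueries: [] };
recordProvenQuery(todoListTemplate, {
  intent: "list open to-dos, newest first",
  sql: "SELECT title FROM todos WHERE status = 'o' ORDER BY created_at DESC;",
});
```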

AgentDB templates are provided to AI agents as an MCP server which also supports raw SQL access. So AI agents can make use of a database effectively right away and still experiment through querying. AgentDB templates are another example of designing software for AI systems rather than humans because they're different "users".

Prompt Building User Interfaces

LukeW - Sun, 07/27/2025 - 2:00pm

Perhaps the biggest problem facing AI products today is: people don't know all the things these products can do nor how to get the best results out of them. Not surprising when you consider most AI product interfaces are just empty text fields asking "what do you want to do?". Prompt building user interfaces can help answer that question and more.

We've been exploring ways to help people understand what's possible and how to accomplish it in Bench. Bench is AI for everyday work tasks. As such, it can do a lot: search the Web, browse the Web as you (with a browser extension), generate reports, make PowerPoint, use your email, and many more of the things that make up people's daily work tasks. The problem is... that's a lot.

To give people a better sense of what Bench can do, we started with suggested prompts (aka instructions) that accomplished specific work tasks. To make these as relevant as possible, we added an initial screen to the Bench start experience asking people to specify their primary roles at work: Engineering, Design, Sales, etc. If they did, the suggested prompts would be reflective of the kinds of things they might do at work. For example, Sales folks would see suggestions like: research a prospect, prep for a sales meeting, summarize customer feedback, and so on.

The problem with these kinds of high level suggestions is they are exactly that: too high level. Though relevant to a role, they're not relevant to someone's current work tasks. Sales teams are researching prospects but doing it in a way that's specific to the product they're selling and the prospect they're researching. Generic prompt suggestions aren't that useful.

To account for this, we attempted to personalize the role-based suggestions by researching people's companies in the background while they signed up. This additional information allowed us to make suggestions more specific to the industry and company people worked for. This definitely made suggested prompts more specific, but it also made them less useful. Researching someone's company gives you some context but not nearly the amount its employees have. Because of this, personalized suggested prompts felt "off". So we went back to more generic suggestions but made them more atomic.

Instead of encompassing a complete work task, atomic suggestions just focused on part of it: where the information for a work task was coming from (look at my Gmail, search my Notion) and what the output of a work task should be (create a Word Doc, make a chart). These suggestions gave people a better sense of Bench's capabilities. It can read my calendar, it can make Google Sheets. Almost immediately, though, it felt like these atomic suggestions should be combinable.

To enable this, we made a prompt rewriter that would change based on what atomic suggestions people chose. If they picked Use Salesforce and Create Google Doc, the rewriter would merge these into a single instruction that made sense: "Use [variable] from Salesforce to create a Google Doc". This turned the process of writing complex prompts into just clicking suggestions. The way these suggestions were laid out, however, didn't make clear they could be combined like this. They looked and felt like discrete prompts.
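A toy version of that rewriting step, with invented suggestion names and a template string standing in for whatever Bench actually does behind the scenes (in practice a model does the rewriting; this just shows the input/output shape):

```typescript
// Hypothetical atomic suggestions a person might click in the UI.
type Source = "Salesforce" | "Gmail" | "Notion";
type Output = "Google Doc" | "Google Sheet" | "PowerPoint";

// Merge the clicked suggestions into one instruction, leaving a [variable]
// placeholder for the detail the person still needs to fill in.
function rewritePrompt(source?: Source, output?: Output): string {
  if (source && output) return `Use [variable] from ${source} to create a ${output}.`;
  if (source) return `Use [variable] from ${source}.`;
  if (output) return `Create a ${output} from [variable].`;
  return "Describe the task you want to get done.";
}

console.log(rewritePrompt("Salesforce", "Google Doc"));
// "Use [variable] from Salesforce to create a Google Doc."
```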

Enter the task builder. In the latest version of Bench, atomic suggestions have been expanded and laid out more like the building blocks of a prompt. People can select what they want to do, use, make, or any combination of the three. The prompt rewriter then stitches together a machine-written prompt along with some optional input fields people can fill in to provide more details about the work task they want to get done.

This prompt builder UI does a few things for people using Bench. It:

  • makes what the product can do clearer
  • provides a way to surface new functionality as it's added to the product
  • rewrites people's prompts in a way that gets them to better outcomes
  • clarifies what people can add to a prompt to make their tasks more effective

While that's a decent amount of good outcomes, design is never done and AI capabilities keep improving. As a result, I'm sure we're not done with Bench's task builder UI, nor with solutions to discoverability and prompting in AI products overall. In other words... more to come.

AI Has Flipped Software Development

LukeW - Sat, 07/26/2025 - 2:00pm

For years, it's been faster to create mockups and prototypes of software than to ship it to production. As a result, software design teams could stay "ahead" of engineering. Now AI coding agents make development 10x faster, flipping the traditional software development process on its head.

In my thirty years of working on software, the design teams I was part of were typically operating "out ahead" of our software development counterparts. Unburdened by existing codebases, technical debt, performance, and infrastructure limitations, designers could work quickly in mockups, wireframes, and even prototypes to help envision what we could or should build before time and effort was invested into actually building it.

While some software engineering teams could ship in days, in most (especially larger) organizations, building new features or redesigning apps could take months if not quarters or years. So there was plenty of time for designers to explore and iterate. This was also reflected in the ratio of designers to developers in most companies: an average of one designer for every twenty engineers.

When designs did move to the production engineering phase, there'd (hopefully) be a bunch of back and forth to resolve unanswered questions, new issues that came up, or changing requirements. A lot of this burden fell on engineering as they encountered edge cases, things missing in specs, cross-device capability differences, and more. What it added up to though, was that the process to build and launch something often took longer than the process to design it.

AI coding tools change this dynamic. Across several of our companies, software development teams are now "out ahead" of design. To be more specific, collaborating with AI agents (like Augment Code) allows software developers to move from concept to working code 10x faster. This means new features become code at a fast and furious pace.

When software is coded this way, however, it (currently at least) lacks UX refinement and thoughtful integration into the structure and purpose of a product. This is the work that designers used to do upfront but now need to "clean up" afterward. It's like the development process got flipped around. Designers used to draw up features with mockups and prototypes, then engineers would have to clean them up to ship them. Now engineers can code features so fast that designers are the ones going back and cleaning them up.

So scary time to be a designer? No. Awesome time to be a designer. Instead of waiting for months, you can start playing with working features and ideas within hours. This allows everyone, whether designer or engineer, an opportunity to learn what works and what doesn't. At its core, rapid iteration improves software, and the build, use/test, learn, repeat loop just flipped; it didn't go away.

In his Designing Perplexity talk at Sutter Hill Ventures, Henry Modisett described this new state as "prototype to productize" rather than "design to build". Sounds right to me.

Designing Software for AI Agents

LukeW - Sun, 07/20/2025 - 2:00pm

From making apps and browsing the Web to creating files, today's AI agents can take on an increasing number of computing tasks on their own. But the software underlying these capabilities wasn't made for agents. It was designed and built for people to use. As such, there's an opportunity, and perhaps an increasing need, to rethink these systems for agent use.

When building agent-based AI applications, you'll likely butt up against a number of situations where existing software isn't optimized for what thinking machines can do. For instance, Web search. Nearly every agent-based AI application makes use of information on the Web to get things done. But Web Search APIs weren't written with agents in mind.

They provide a limited number of search results and a condensed snippet format that lines up more with how people use Web search interfaces. We get a page of ten blue links and scan them to decide which one to click. But AI agents aren't people. Not only can they make sense of many more search results at once, but their performance usually improves with larger document summaries and contents. People, on the other hand, are unlikely to read through all search results before making a decision. So search APIs could certainly be rethought for agents.
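To make the contrast concrete, here's a hypothetical sketch of what an agent-oriented search interface could return compared to a people-oriented one. None of these names correspond to a real API; the stub just illustrates the shape.

```typescript
// People-oriented search results: ~10 links with short snippets, meant to be scanned.
interface HumanSearchResult {
  title: string;
  url: string;
  snippet: string; // a sentence or two
}

// Agent-oriented search results (hypothetical): far more results per call, each
// carrying enough text that a model can reason over them without extra fetches.
interface AgentSearchResult {
  url: string;
  summary: string;   // paragraph-scale summary
  fullText?: string; // optionally the full document contents
}

// A toy stub standing in for a rethought search API; real providers differ.
async function agentSearch(query: string, maxResults = 100): Promise<AgentSearchResult[]> {
  // In a real system this would call a search backend; here we just return a mock result.
  return [{ url: "https://example.com", summary: `Mock summary for "${query}"` }].slice(0, maxResults);
}

agentSearch("web search APIs for AI agents").then((results) => console.log(results.length));
```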

Similarly, when agents are developing applications or collecting data, they can make use of databases. But once again databases were designed and built for people to use, not AI agents. And once again they can be rethought for agents, which is what we did with our most recent launch: AgentDB.

Agents can (and do) produce 1000x more databases than people every day, so the process of spinning up and managing any database for an agent needs to be as easy and maintenance-free as possible. Most of the databases AI agents create will be short-lived after serving their initial purpose. But some databases will be used again and others still will be used regularly.

With this kind of volume, costs can become an issue, so keeping that many databases available needs to be as cost effective as possible. Last but not least, the content of databases needs to work well as context for AI models so agents can use this data as part of their tasks.

AgentDB is a database system designed around these considerations. With AgentDB, creating a database only requires a Universally Unique Identifier (UUID). There's no setup or configuration step. So whenever an AI agent decides it needs a database, it has one simply by creating a UUID. No forms or set-up wizards involved.
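A minimal sketch of what "a database exists as soon as you mint an ID" could look like from an agent's side. The endpoint path and base URL are placeholders, not AgentDB's actual API; the point is that creating the ID is the only setup step.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical base URL; AgentDB's real endpoints and auth are not shown here.
const AGENTDB_BASE = "https://example-agentdb-host";

// The agent "creates" a database simply by minting a UUID and referencing it.
// No forms, no setup wizard; the first write is what brings it into existence.
async function createAndUseDatabase() {
  const dbId = randomUUID();

  const response = await fetch(`${AGENTDB_BASE}/databases/${dbId}/query`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      sql: "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL);",
    }),
  });

  return { dbId, ok: response.ok };
}

createAndUseDatabase().then(console.log).catch(console.error);
```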

Databases in AgentDB are stored as files, not hosted services requiring compute and maintenance. If an AI agent needs to query a database or append to it, it can. But if it never needs to access it again, the database is just a file. That means you're only paying for the cost of storage to keep it around. And because AgentDB databases are just files, they scale, meaning they can easily keep up with the volume AI agents produce.

To make data within each AgentDB database easily accessible as context for AI models, every AgentDB account is also an MCP server. This makes the data portable across AI applications as long as they support MCP server connections (which most do).

Altogether this example illustrates how even the most fundamental software infrastructure systems, like databases, can be rethought for the age of AI. The AgentDB database system doesn't look like a hosted database as a service solution because it's not designed and built for database admins and back-end developers. It's built for today's thinking machines.

And as agents take on more computing tasks for people, it won't be the only software made with agents as first class users.

Context Management UI in AI Products

LukeW - Tue, 07/08/2025 - 2:00pm

They say context is king and that's certainly true in AI products where the content, tools, and instructions applications provide to AI models shape their behavior and subsequent results. But if context is so critical, how do we allow people to understand and manage it when interacting with AI-driven software?

In AI products, there's a lot of stuff that could be in context (provided to an AI model as part of its instructions) at any given point, but not everything will be in context all the time because AI models have context limits. So when getting results from AI products, people aren't sure if or how much they should trust them. Was the right information used to answer my question? Did the model hallucinate or use the wrong information?

When I launched my personal AI two years ago, context was much simpler than it is today. In Ask LukeW, when people ask a question about digital product design, the system searches through my writings, finds and puts the most relevant bits into context for AI models to use and reference, then cites them in the results people see. This is pretty transparent in the interface: the articles, videos, audio, and PDFs used are shown on the right with citations within each response showing where these files were used the most.
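At a high level, that's a standard retrieve-then-cite flow. The sketch below is a generic illustration of the idea, not the Ask LukeW codebase; the search and model calls are stand-in stubs.

```typescript
// A generic retrieve-then-cite sketch (not the actual Ask LukeW implementation).
interface Chunk {
  sourceTitle: string; // article, video, audio, or PDF the chunk came from
  text: string;
}

// Stubs standing in for a vector search and a model call; a real system would
// use an embedding index and an LLM API here.
async function searchCorpus(question: string, topK: number): Promise<Chunk[]> {
  return [{ sourceTitle: "Example article", text: `(relevant excerpt for: ${question})` }].slice(0, topK);
}
async function askModel(prompt: string): Promise<string> {
  return `(model answer grounded in the provided context; prompt length ${prompt.length})`;
}

async function answerWithCitations(question: string): Promise<string> {
  // 1. Find the most relevant bits of the corpus.
  const chunks = await searchCorpus(question, 8);

  // 2. Put them into context, numbered so the model can cite them.
  const context = chunks.map((c, i) => `[${i + 1}] ${c.sourceTitle}\n${c.text}`).join("\n\n");

  // 3. Ask the model to answer using only that context and to cite sources by number.
  return askModel(
    `Answer the question using only the sources below and cite them like [1].\n\n${context}\n\nQuestion: ${question}`
  );
}

answerWithCitations("How do you design forms for mobile?").then(console.log);
```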

The most complicated things get in Ask LukeW is when someone opens one of these cited articles, videos, or PDFs to view its full contents. In this case, a small "context chip" is added to the question bar to make clear questions can be asked of just this file. In other words, the file is the primary thing in context. If someone wants to ask a question of the whole corpus of my writings and talks again, they can simply click on the X that removes this context constraint and the chip disappears from the question bar. You can try this out yourself here.

Context chips are pretty common in AI products today because they're a relatively easy way to both give people a sense of what's influencing an AI model's replies and to add or remove it. When what's in context expands, however, they don't scale very well. For example, Augment Code uses context chips for retrieval systems, active files, selected text, and more.

Using a context chip to display everything influencing an AI model's response begins to break down when many things (especially different things) are in context. Displaying them all eats up valuable space in the UI and requires that their names or identifiers are truncated to fit. That kind of defeats the purpose of "showing you what's in context". Also, when AI products do automatic context retrieval, like Augment Code's context retrieval engine, does that always show up as a chip? Or should people not worry about it and trust the system is finding and putting the right things into context?

With AI products using agents these issues are compounded because each tool call an agent makes can retrieve context in different ways or multiple times. So showing every bit of context found or created by tools as a context chip quickly breaks down. To account for this in earlier versions of Bench, we showed the context from tools used by agents as it was being created. But this turned out to be a jarring experience as the context would show up then go away when the next tool's context arrived (as you can see in the video).

Since then, we've moved to showing an agent's process of creating something as condensed steps with links to the context in each step. So people can click on any given step to see the context a tool either found or created. But that context isn't being automatically flashed in front of them as it's made. This lets people focus on the output and only dig into the process when they want to understand what led to the output.

This approach becomes even more relevant with agent orchestration. When agents can make use of agents themselves, you end up with nested amounts of context. Told you things were a lot simpler two years ago! In these cases, Bench just shows the collective context combined from multiple tool calls in one link. This allows people to examine what cumulative context was created by sub agents. But importantly this combined context is treated the same way - whether it comes from a single tool or a subagent that uses multiple tools.

While making context understood and manageable feels like the right thing to provide transparency and control, increasingly people seem to focus more on the output of AI products and less on the process that created them. Only when things don't seem "right" do they dig into the kinds of process timelines and context links that Bench provides. So if people become even more confident using AI products, we might see context management UIs with even less presence.

What Do You Want To AI?

LukeW - Sun, 06/29/2025 - 2:00pm

Alongside an increasing sameness of features and user interfaces, AI applications have also converged on their approach to primary calls to action: "What Do You Want To ___?" But is there a better way... especially for more domain specific applications?

Looking across AI products today, most feature an open-ended text field with an equally open-ended call to action:

  • What do you want to know?
  • What can I help with?
  • What do you want to create?
  • What do you want to build?
  • What will you imagine?
  • Ask anything...
  • Ask a question...
  • Ask [AI tool]...

So many questions. I've even turned them into a running joke. When a financial company integrates their AI: "What do you want to bank?" or "What do you want to accountant?" Silly I know, but it illustrates the issue. People often don't know what AI products can do nor how to best instruct/prompt them. Questions just exacerbate the issue.

It may be a small detail but instead of asking, how about instructing? Reve's image creation call to action says: "Describe an image or drop one here...". Bench's AI-powered workspace starts with: "Describe the task you want Bench to do...". Both calls to action are still open ended enough that they can capture the kind of broad intent AI models can handle. But perhaps there's something to having a bit more guidance beyond "What Do You Want To AI?"

More on Generative Publishing

LukeW - Sat, 06/21/2025 - 2:00pm

One of the most common questions people ask my personal AI, Ask LukeW, is "how did you build this?" While I've written a lot about the high level architecture and product design details of the service, I never published a more technical overview. Doing so highlighted enough interesting generative publishing ideas that I decided to share a bit about the process.

First of all, Ask LukeW makes use of the thousands of articles I've written over the years to answer people's questions about digital product design. Yes, that's a lot of writing but it's not enough to capture all the things I've learned over the past 30 years. Which means sometimes people Ask LukeW questions that I can answer but haven't written about.

In the admin system I built for Ask LukeW, I can not only see the questions that don't get answered well but I can also add content to answer them better in the future. Over the last two years, I've added about 500 answers and thereby expanded the corpus Ask LukeW can respond from by a lot. So the next time similar questions get asked, people aren't left without answers.

That process is an interesting part of generative publishing that I've written about before but it's also how I know that people regularly ask how I built Ask LukeW. They want technical details: what frameworks, what models, what services. I never wrote this up because I'm not that technical and several great engineers helped me build Ask LukeW. As a result, I didn't think I'd do a great job detailing the technical aspect of things.

But one day it occurred to me I could use our AI for code company, Augment Code, which has a deep contextual understanding of codebases to help me write up how Ask LukeW works. I opened the codebase in VS Code and asked Augment the questions people asked me: "how does the feature work?" "what is the codebase?" "what is the tech stack?" and got great detailed responses.

Augment, however, doesn't answer questions the way I do. So I took Augment's detailed technical replies and dropped them into another one of our companies, Bench. A while back I had Bench read a lot of my blog posts and create a prompt that writes articles the way I would. I've saved this prompt in Bench's agent library and can apply it anytime I want it to write like I would.

Once Bench had rewritten Augment's technical details of how Ask LukeW works into the way I'd explain them, I took the results and added them as saved answers to the Ask LukeW corpus. Now anytime someone asks these kinds of questions, they get much more detailed technical answers. In fact, this worked so well that I also asked Augment to write up the overall tech stack for my Website and went through the same process.

I for one, found this a really enlightening look at where generative publishing is now. I can see what kinds of information I should be publishing by looking at the questions people ask my personal AI but don't get good answers for. I can use an AI for coding tool to turn code into prose. I can use an agentic workspace to rewrite that prose the way I would because I taught it to write like me. And finally I can feed that content back into my overall corpus so it's available for any similar questions people ask in the future.

That doesn't look like the publishing of old to me. Of course, it's split between multiple tools, requires me to know what each one can do, and has a host of other issues. We're still early but it's exciting.

Common AI Product Issues

LukeW - Thu, 06/19/2025 - 2:00pm

At this point, almost every software domain has launched or explored AI features. Despite the wide range of use cases, most of these implementations have been the same ("let's add a chat panel to our app"). So the problems are the same as well.

Capability Awareness

Open-ended interfaces to AI models have the same problem as every "invisible" interface that came before them. Without a clear set of affordances, people don't know what they can do. The vision of these invisible UIs was always something like "Voice interfaces will work when you can ask them anything". Today it's "AI chat interfaces will work because you can tell them to do anything". Sounds great but...

In reality, even extremely capable systems (like extremely capable people) have limitations. They do some things well, some things ok, and other things poorly. How you ask them to do things also matters as different phrasings yield different results. But without affordances, these guideposts are as invisible as the UI.

I'm pretty certain this is the biggest problem in AI product interfaces today: because large-scale AI models can do so many things (but not all things or all things equally well), most people don't know what they can do nor how to best instruct/prompt them.

Context Awareness

If capability awareness is knowing what an AI product can do, context awareness is knowing how it did it. The fundamental question here is "what information did an AI product use to provide an answer?" But there's lots of potential answers especially as agents can make use of an increasing number and variety of tools. Some examples of what could be in context (considered in an AI model's response):

  • Its own training data? If so, when was the cutoff?
  • The history of your session with the model? If so, going how far back?
  • The history of all your sessions or a user profile? If so, which parts?
  • Specific tools like search or browse? If so, which of their results?
  • Specific connections to other services or accounts? If so...

You get the idea. There's a lot of stuff that could be in context at any given point, but not everything will be in context all the time because models have context limits. So when getting replies people aren't sure if or how much they should trust them. Was the right information used or not (hallucinations)?

Walls of Text

While writing has done an enormous amount to enable communication, it's not the only medium for conveying information and, often, it may not be the best. Despite this, most AI products render the streams of text emitted by AI models as their primary output and they render them in a linear "chat-like" interface. Unsurprisingly, people have a hard time extracting and recalling information by scrolling through long blocks of text.

As the novelty of AI models being able to write text wears off, people increasingly ask for visuals, tables, and other formats like slides and spreadsheets as output instead of just walls of text.

And More...

Yes, there's other issues with AI products. I'm not suggesting this is a complete list but it is reflective of what I'm currently seeing over and over in user testing and across multiple domains. But it's still early for AI products so... more solutions and issues to come.

Agent Management Interface Patterns

LukeW - Sun, 06/08/2025 - 2:00pm

As an increasing number of AI applications evolve to agents doing work for people, agent management becomes a critical part of these products' design. How can people start, steer, and stop multiple agents (and subagents) and stay on top of their results? Here are several approaches we've been building and testing.

Whenever a new technology emerges, user interfaces go through a balancing act between making the new technology approachable through common patterns and embodying what makes it unique. Make things too different and risk not having an onramp that brings people on board smoothly. Make things too familiar and risk limiting the potential of new capabilities within old models and interactions.

"Copy, extend, and finally, discovery of a new form. It takes a while to shed old paradigms." - Scott Jenson

As an example, Apple's VisionOS interface notably made use of many desktop and mobile interaction patterns to smooth the transition to spatial computing. But at the same time, they didn't take full advantage of spatial computing's opportunities by boxing limitless 3D interactions within the windows, icons, menus, and pointers (WIMP) familiar to desktop interfaces.

Hence, the balancing act.

This context helps frame the way we've approached designing agent management interfaces. Are there high level user interface patterns that are both familiar enough for people to intuit how they work and flexible enough to enable effective AI agent management? In an agent-centric AI application like Augment Code for software development or Bench for office productivity, people need to be able to do the following (a minimal data-model sketch follows the list):

  • Start new agents through a combination of instructions and context (files, connections, etc.)
  • Schedule agents to run at certain times or under certain conditions.
  • Scrutinize the work of agents to assess whether or not they're making the right kind of progress.
  • Steer agents when they go off course, require clarification, or uncover something that suggests they should take a different path.
  • Stop agents when they've either done enough or are no longer being effective.
  • See, share, and save the results or processes of agents.
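
Here's a minimal sketch of those capabilities as a TypeScript interface. All of the names are assumptions for illustration, not Augment Code's or Bench's actual APIs.

```typescript
// Illustrative shape of an agent management surface; names are assumptions.

type AgentStatus = "scheduled" | "running" | "needs-input" | "complete" | "stopped";

interface AgentTask {
  id: string;
  instructions: string;
  context: string[];   // files, connections, etc.
  status: AgentStatus;
  result?: string;
}

interface AgentManager {
  start(instructions: string, context?: string[]): Promise<AgentTask>;  // Start
  schedule(instructions: string, runAt: Date): Promise<AgentTask>;      // Schedule
  inspect(id: string): Promise<AgentTask>;                              // Scrutinize
  steer(id: string, clarification: string): Promise<AgentTask>;         // Steer
  stop(id: string): Promise<AgentTask>;                                 // Stop
  results(id: string): Promise<string | undefined>;                     // See, share, save
}
```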

To help people adapt to agent management, we explored how interface patterns like kanban boards, dashboards, inboxes, tasks lists and calendars could fulfill many of these requirements by presenting the state of multiple agents and allowing people to access specific agents when they need to take further action.

Kanban Board

Kanban boards visualize work as cards moving through distinct stages, typically arranged in columns from left to right to represent progress through a workflow. They could be used to organize agents as they transition between scheduled, running, complete, and reviewed states. Or within workflows specific to domains like sales or engineering.

This pattern seems like a straightforward way to give people a sense of the state of multiple agents. But in kanban boards, people also expect to be able to move cards between columns. How would that affect agents? Would they begin a new task defined by the column they're moved to? Would that create a new agent or re-route an existing one?
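
As a purely illustrative sketch, grouping agents into columns is the easy part; what moving a card should actually do is the open question. The shapes below are assumptions, not any product's data model.

```typescript
// Sketch: agent cards grouped into kanban columns by state. Names are illustrative.

type KanbanColumn = "scheduled" | "running" | "complete" | "reviewed";

interface AgentCard {
  id: string;
  title: string;
  column: KanbanColumn;
}

function groupByColumn(cards: AgentCard[]): Record<KanbanColumn, AgentCard[]> {
  const board: Record<KanbanColumn, AgentCard[]> = {
    scheduled: [], running: [], complete: [], reviewed: [],
  };
  for (const card of cards) board[card.column].push(card);
  return board;
}

// The open question from above: should this re-route an agent or spawn a new one?
// For now, it only updates the card's state.
function moveCard(card: AgentCard, to: KanbanColumn): AgentCard {
  return { ...card, column: to };
}
```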

Dashboard

Dashboards pull together multiple data sources into a unified monitoring interface through different visualizations like charts, graphs, and metrics. Unlike a kanban board, there's no workflow implied by the arrangement of the elements in a dashboard so you can pretty much represent agents anywhere and any way you like.

While that seems appealing, especially to those yearning for a "mission control" style interface to manage agents, it can quickly become problematic. When agents can be represented in different ways in different parts of a UI, it's hard to grasp both the big picture and details of what's happening.

Inbox

The inbox pattern organizes items in a chronological stream that requires user action to process. Items are listed from newest to oldest with visual cues like unread counts so people can quickly assess and act on items without losing context. Most of us do so every day in our messaging and email apps so applying the same model to agents seems natural.
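
The data model behind this is familiar too. A minimal sketch, assuming a simple update record per agent (the field names are illustrative):

```typescript
// Sketch: an inbox of agent updates, newest first, with an unread count.

interface AgentUpdate {
  agentId: string;
  summary: string;
  receivedAt: Date;
  read: boolean;
}

function toInbox(updates: AgentUpdate[]) {
  const items = [...updates].sort(
    (a, b) => b.receivedAt.getTime() - a.receivedAt.getTime() // newest first
  );
  const unreadCount = items.filter((u) => !u.read).length;
  return { items, unreadCount };
}
```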

But if you get too much email or too many texts, your inbox can get away from you. So it's not an ideal pattern for applications with a high volume of agents to manage nor for those that require coordination of multiple, potentially inter-dependent agents.

For what it's worth, this is where we iterated to (for now) in Bench. So if you'd like to try this pattern out, fire off a few agents there.

Task List

Task lists present items as discrete, actionable units with clear completion states (usually a checkbox). Their vertical stack format lets people focus on specific tasks while still seeing the bigger picture. Task lists can be highly structured or pretty ad hoc lists of random to-dos.

Indented lists of subtasks can also display parallel agent processes and show the inter-dependencies of agents, but perhaps at the expense of simplicity. In a single linear list, like an inbox, it's much easier to see what's happening than in a hierarchical task list where some subtasks may be collapsed but still relevant.
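
A small sketch of that trade-off: flattening a task tree into the rows a person would actually see makes it obvious how collapsed branches hide activity. The TaskNode shape is an assumption for illustration.

```typescript
// Sketch: a hierarchical task list where collapsed branches hide agent activity.

interface TaskNode {
  id: string;
  title: string;
  done: boolean;
  collapsed?: boolean;  // collapsed subtasks can hide relevant work in progress
  subtasks?: TaskNode[];
}

// Flatten the tree into visible rows, skipping anything inside a collapsed branch.
function visibleRows(node: TaskNode, depth = 0): { title: string; depth: number }[] {
  const rows = [{ title: node.title, depth }];
  if (!node.collapsed) {
    for (const child of node.subtasks ?? []) {
      rows.push(...visibleRows(child, depth + 1));
    }
  }
  return rows;
}
```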

Calendar

Calendar interfaces use a grid structure that maps to our understanding of time, with consistent rows and columns representing dates and times. This allows people to make use of both temporal memory and spatial memory to locate and contextualize items. Calendars also typically provide high level (month) and detailed (day) views of what's happening.

When it comes to scheduling agents, a calendar makes a lot of sense: just add an agent's task the same way you'd add a meeting. It's also helpful for contextually grouping the work of agents with actual meetings. "These tasks were all part of this project's brainstorm meeting." "I ran that task right after our one-on-one meeting." Representing the work of agents on a calendar can be tricky, though, as agents can run for minutes or many hours. And where should event-triggered agents show up on a calendar?
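
A rough sketch of that wrinkle: time-triggered agents map cleanly onto a calendar grid, while event-triggered ones have no slot until they fire. The shapes below are illustrative assumptions, not a real scheduling API.

```typescript
// Sketch: scheduling agents on a calendar, including event-triggered agents
// that don't map to a fixed time slot.

type Trigger =
  | { kind: "time"; start: Date; estimatedMinutes: number } // shows up like a meeting
  | { kind: "event"; event: string };                       // e.g. "new file uploaded"

interface ScheduledAgent {
  id: string;
  instructions: string;
  trigger: Trigger;
}

// Only time-triggered agents land on a given day's grid; event-triggered ones
// need some other placement (an "unscheduled" tray, or the day they actually fire).
function calendarEntries(agents: ScheduledAgent[], day: Date): ScheduledAgent[] {
  return agents.filter(
    (a) =>
      a.trigger.kind === "time" &&
      a.trigger.start.toDateString() === day.toDateString()
  );
}
```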

Coming back to Scott Jenson's quote at the start of this article, it takes a while to shed old paradigms and discover new forms. So it's quite likely that as these interface patterns are adapted to agent management use cases, they'll evolve further and not end up looking much like their current selves. As David Hoang recently suggested, maybe agent management interfaces should learn from patterns found in Real-Time Strategy (RTS) games instead? Interesting...

The Receding Role of AI Chat

LukeW - Sun, 06/01/2025 - 2:00pm

While chat interfaces to AI models aren't going away anytime soon, the increasing capabilities of AI agents are making the concept of chatting back and forth with an AI model to get things done feel archaic.

Let me first clarify that I don't mean open-ended text fields where people declare their intent are going away. As I wrote recently, there will be even more broad input affordances in software, whether for text, image, audio, video, or more. When I say chat AIs, I mean applications whose primary mode of getting things done is a back-and-forth messaging conversation with an AI model: you type something, the model responds, you type something... and on it goes until you get the output you need.

Anyone who's interacted with an application like this knows that the AI model's responses quickly get lost in conversation threads, and producing something from a set of chat replies can be painful. This kind of interface isn't optimal for tasks like authoring a document, writing code, or creating slides. To account for this, some applications now include a canvas or artifact area where the output of the AI model's work can go.

In these layouts, the application usually goes from a single-pane chat layout to a split-pane one: roughly half the UI for input in the form of chat and half for output in the form of a canvas or artifact viewer. In these kinds of applications, we already begin to see the prominence of chat receding as people move between providing input and reviewing, editing, or acting on output.

In this model, however, the onus is still on the user to chat back and forth with a model until it produces their desired output in the artifact or canvas pane. Agents (AI models that make use of tools) change this dynamic. People state their objectives and the AI model(s) plans which tools to use and how to accomplish the task.

Instead of each step being a back-and-forth chat between a person and an AI model, the vast majority, if not all, of the steps are coordinated by the model(s) itself. This again reduces the role of chat. The model(s) takes care of the back and forth and in most cases simply lets people know when it's done so they can review and make use of its output.
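
A minimal sketch of that loop helps show why the back and forth disappears from the UI: the conversation happens inside the loop, and the person only hears about the result. The plan, runTool, and notify functions below are placeholders, not any product's actual API.

```typescript
// Illustrative agent loop; all function parameters are placeholders for whatever
// model, tools, and notification channel an application actually uses.

interface ToolCall { tool: string; input: string }
interface Step { call: ToolCall; output: string }

async function runAgent(
  objective: string,
  plan: (objective: string, steps: Step[]) => Promise<ToolCall | "done">,
  runTool: (call: ToolCall) => Promise<string>,
  notify: (summary: string) => void,
  maxSteps = 50 // safety cap so the loop always terminates
): Promise<Step[]> {
  const steps: Step[] = [];
  // The model plans the next tool call, runs it, and folds the result back in.
  // This loop is the back and forth; no person is involved.
  while (steps.length < maxSteps) {
    const next = await plan(objective, steps);
    if (next === "done") break;
    steps.push({ call: next, output: await runTool(next) });
  }
  // The person only hears about it when the work is finished.
  notify(`Finished "${objective}" in ${steps.length} steps.`);
  return steps;
}
```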

When agents can use multiple tools, call other agents and run in the background, a person's role moves to kicking things off, clarifying things when needed, and making use of the final output. There's a lot less chatting back and forth. As such, the prominence of the chat interface can recede even further. It's there if you want to check the steps an AI took to accomplish your task. But until then it's out of your way so you can focus on the output.

You can see this UI transition in the AI workspace, Bench. The first version was focused on back and forth instructions with models to get things done: a single-pane AI chat UI. Then a split-paned interface put more emphasis on the results of these instructions with half the screen devoted to an output pane. Today Bench runs and coordinates agents in the background. So the primary interaction is kicking off tasks and reviewing results when they're ready.

In this UI, the chat interface is not only reduced to less than a fourth of the screen but also collapsed by default, hiding the model's back-and-forth conversations with itself unless people want to dig into them.

When working with AI models this way, the process of chatting back and forth to create things within a messaging UI feels dated. AI that takes your instructions, figures out how to get things done using tools, multiple models, and changeable plans, and just tells you when it's finished feels a lot more like "the future". Of course I put future in quotes because at the rate AI moves these days the future will be here way sooner than any of us think. So... more UI changes to come!
