Internet News
Smashing Conf: Is Atomic Design Dead?
In his Is Atomic Design Dead? presentation at Smashing Conf New York, Brad Frost discussed the history of design systems and today's situation, especially in light of very capable AI models that can generate code and designs. Here are my notes on his talk.
- Websites started as HTML and CSS. People began to design websites in Photoshop and as the number of Web sites and apps increased, the need for managing a brand and style across multiple platforms became clear. To manage this people turned to frameworks and component libraries which resulted in more frameworks and tools that eventually got integrated into design tools like Figma. It's been an ongoing expansion...
- There's been lots of change over the years but at the highest level, we have design systems and products that use them to enforce brand, consistency, accessibility, and more.
- Compliance to design systems pushes from one side and product needs push from the other. There needs to be a balance but currently the gap between the two is growing. A good balance is achieved through a virtuous cycle between product and systems.
- The atomic design system tried to intentionally define use of atoms, molecules, organisms, templates, and pages to bridge the gap between the end state of a product and a design system.
- As an industry, we went too far in resourcing design systems and making them a standalone thing within a company. They've been isolated.
- Design system makers can't be insular. They need to reach out to product teams and work with them. They need to be helping product teams achieve their goals.
- What if there were one global design system with common reusable components? Isn't that what HTML is for? Yes, but it's insufficient because we're still rebuilding date pickers everywhere.
- Open UI tracks popular design systems and what's in them. It's a start to seeing what global component needs for the Web could look like.
- Many pattern libraries ship with an aesthetic and people need to tweak it. A global design system should be very vanilla so you can style it as much as you want.
- The Web still has an amazing scale of communication and collaboration. We need to rekindle the ideas of the early Web. We need to share and build together to get to a common freely usable design system.
- AI models can help facilitate design system work. Today they do an OK job but in the future, fine-tuned models may create custom components on the fly. They can also translate between one design system and another or translate across programming languages.
- This methodology could help companies translate existing and legacy code to new modern design systems. Likewise sketches or mockups could be quickly translated directly to design system components thereby speeding up processes.
- Combining design system specifications with large language models allows you to steer AI generations more directly toward the right kind of code and components.
- When product experiences are more dynamic (can be built on the fly), can we adapt them to individual preferences and needs? Like custom styles or interactions.
- AI is now part of our design system toolkit and design systems are part of our AI toolkit.
- But the rapid onset of AI also raises higher-level questions about what designers and developers should be doing in the future. We're more than rectangle creators. We think and feel, which differentiates us from just production-level tasks. Use your brains, your intuition, and whole self to solve real problems.
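To make the point about combining design system specifications with large language models a bit more concrete, here's a minimal sketch of what that steering could look like. The component names, props, and prompt wording are all hypothetical (they're not from Brad's talk), and the actual call to a model is left out on purpose.

```python
import json

# Hypothetical design system spec: component names and props are made up for illustration.
DESIGN_SYSTEM = {
    "Button": {"props": {"variant": ["primary", "secondary"], "size": ["sm", "md", "lg"]}},
    "DatePicker": {"props": {"min": "ISO date", "max": "ISO date"}},
}


def build_prompt(request: str, spec: dict) -> str:
    """Prepend the design system spec so a model is steered toward its components."""
    return (
        "Use only the components and props defined in this design system.\n"
        f"Design system spec (JSON):\n{json.dumps(spec, indent=2, sort_keys=True)}\n\n"
        f"Task: {request}\n"
    )


def unknown_components(used: list[str], spec: dict) -> list[str]:
    """Return component names from generated output that the spec does not define."""
    return [name for name in used if name not in spec]
```

The second function is the other half of the loop: once a model returns something, you can check whether it stayed inside the system before a human reviews it.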
Smashing Conf: How to Use AI to Build Accessible Products
In her How to Use AI to Build Accessible Products presentation at Smashing Conf New York, Carie Fisher discussed using AI coding tools to test and suggest fixes for accessibility issues in Web pages. Here are my notes on her talk.
- AI is everywhere. You can use it to write content, code, create images, and more. It impacts how everyone will work.
- But ultimately, AI is just a tool and it might not always be the right one. We need to find the tasks where it has the potential to add value.
- Over 1 billion people on the planet identify as having a disability. Accessible code allows them to access digital experiences and helps companies be compliant with emerging laws requiring accessible Web pages and apps. Businesses also get SEO, brand, and more benefits from accessible code.
- AI tools like GitHub Copilot can find accessibility issues in seconds and do so consistently, especially compared to the manual checks currently being done by humans. AI can also spot patterns across a codebase and suggest solutions.
- Existing AI coding tools like GitHub Copilot are already better than linters for finding accessibility issues.
- AI can suggest and implement code fixes for accessibility issues. It can also be added to CI/CD pipelines to check for accessibility issues at the point of each commit. AI can also serve as an accessibility mentor for developers by providing real-time suggestions.
- More complex accessibility issues, especially those that need user context, may go unfound when just using AI. Sometimes AI output can be incomplete or hallucinate solutions that are not correct. As a result, we can't over-rely on AI alone to solve all accessibility problems. We still need human review today.
- To improve AI accessibility, provide expanded prompts that reference or include specifications. Code reviews can double check accessibility suggestions from AI-based systems. Regularly test and refine your AI-based solutions to improve outcomes.
- Combining AI and human processes and values can help build a culture of accessibility.
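For a sense of the baseline that AI tools are being compared against, here's a small rule-based check of the kind a linter or CI step might run on each commit. It only looks for images with no alt attribute, which is one narrow rule I picked for illustration. It's not an AI check and not something shown in the talk.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects <img> tags that have no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # alt="" is allowed: it marks an image as decorative.
        if tag == "img" and "alt" not in attributes:
            line, _column = self.getpos()
            src = attributes.get("src", "?")
            self.issues.append(f"line {line}: <img src={src!r}> is missing alt")


def check_html(html: str) -> list[str]:
    """Return one message per image that lacks an alt attribute."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.issues
```

A CI job could fail the build whenever check_html returns anything. What a check like this can't do is tell you whether the alt text that is present actually makes sense, which is where review by people (and possibly AI) comes in.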
Ask Luke: Streaming Inline Images
Since launching the Ask Luke feature on this site last year, we've added the ability for the system to respond to questions about product design by citing articles, videos, audio, and PDFs. Now we're introducing the ability to cite the thousands of images I've created over the years and reference them directly in answers.
Significant improvements in AI vision models have given us the ability to quickly and easily describe visual content. I recently outlined how we used this capability to index the content of PDF pages in more depth making individual PDF pages a much better source of content in the Ask Luke corpus.
We applied the same process and pipeline to the thousands of images I've created for articles and presentations over the years. Essentially, each image on my Website gets parsed by a vision model and we add the resulting text-based description to the set of content we can use to answer people's design questions. Here's an example of the kinds of descriptions we're creating. As you can see, the descriptions can get pretty detailed when needed.
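As a rough sketch of that idea: every image gets a text description, and those descriptions become searchable content. In the code below, describe is a stand-in for a vision model call and the keyword-overlap ranking is a deliberately simple placeholder for retrieval. Both are assumptions for illustration, not how the Ask Luke pipeline is actually built.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageDoc:
    url: str
    description: str


def index_images(urls: list[str], describe: Callable[[str], str]) -> list[ImageDoc]:
    """Run each image through describe() and keep the text it returns."""
    return [ImageDoc(url, describe(url)) for url in urls]


def search(docs: list[ImageDoc], question: str, top_k: int = 3) -> list[ImageDoc]:
    """Rank images by how many question words appear in their description."""
    words = set(question.lower().split())
    scored = [(len(words & set(doc.description.lower().split())), doc) for doc in docs]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

The useful property is that once an image has a description, it can be treated like any other piece of text in the corpus.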
If someone asks a question where an image is a key part of the answer, our replies not only return streaming text and citations but inline images as well. In this question asking about Amazon's design changes over the years, multiple images are included directly in the response.
Not only are images displayed where relevant, the answer refers to them and often refers to the contents of the image. In the same Amazon navigation example, the answer refers to the green and white color scheme of the image in addition to its contents.
Now that we've got citations and images streaming inline in Ask Luke responses, perhaps adding inline videos and audio files cued to relevant timestamps might be next? We're already integrating those in the conversational UI so why not... AI is a hell of a drug.
Further Reading
Additional articles about what I've tried and learned by rethinking the design and development of my Website using large-scale AI models.
- New Ways into Web Content: rethinking how to design software with AI
- Integrated Audio Experiences & Memory: enabling specific content experiences
- Expanding Conversational User Interfaces: extending chat user interfaces
- Integrated Video Experiences: adding video experiences to conversational UI
- Integrated PDF Experiences: unique considerations when adding PDF experiences
- Dynamic Preview Cards: improving how generated answers are shared
- Text Generation Differences: testing the impact of new AI models
- PDF Parsing with Vision Models: using AI vision models to extract PDF contents
- Streaming Citations: citing relevant articles, videos, PDFs, etc. in real-time
- Streaming Inline Images: indexing & displaying relevant images in answers
Big thanks to Sidharth Lakshmanan and Sam Breed for the development help.
Scaling Platforms Through Use Cases
New technology companies often have grand ambitions. And for good reasons - ambitious plans help recruit talent, raise capital, and set the bar high. But progress toward these high-level goals relies on identifying and excelling at much lower-level use cases.
It's very common for new technology companies to aspire to be "the platform for... the Internet of things, AI analytics, mobile testing, etc." Being a platform means you capture a lot of use cases or to put it more simply... people use your service for a lot of different things. And more use equals more value.
But vision is not the same as strategy. Vision is about the end goal. It paints a picture of the future state you're aiming for. It’s what you want to achieve. Strategy, on the other hand, is how you get there.
When you use a broad vision as a strategy, you end up having a hard time making decisions and rationalizing a never-ending set of opinions. With a strategy like “we’ll be the platform for the Internet of things”, everyone has an opinion on how things on the Internet should work -which one do we listen to?
Consider instead a specific market for the Internet of things, like home automation, and an even more specific use case for home automation like "controlling the temperature in your house". It's much easier to evaluate decisions about what a good experience for controlling the temperature in your house is than for "we'll be the platform for the Internet of things".
But if you focus on such a narrow use case, how will you ever build a big business? I'm not suggesting abandoning the big vision. Instead, I'm advocating for having a strategy based on solving concrete use cases to get there. Let's look at another example: Yelp.
Today, Yelp is used for recommendations for all kinds of services: skydiving training, auto body shops, tea parlors, and more. But it didn't start that way. Yes, Yelp likely started with the ambitious vision of being a platform for all service recommendations. But it first launched in San Francisco with restaurant reviews. A very specific market and very specific use case.
Why start with restaurants? A good starting use case is the one with the most acute pain. In the context of services, people need to eat three times a day. They get their hair cut once a month and maybe need a plumber once a year. So where should Yelp start? Probably restaurants.
When solving for a specific use case, it's important to build with the bigger vision in mind and not paint yourself into a corner of only being useful for one thing. But you definitely have to be great at solving each use case your platform supports. How else will you convince people to adopt your solution? Once you can demonstrate clear value for a specific use case, you can tackle more (likely adjacent ones).
This way of scaling ensures your solution is actually good at addressing a concrete problem people have, not just an abstract vision. When you hear "What's your platform for? Well... you can use it for pretty much anything." in a sales pitch, that's a warning sign.
When you instead address specific use cases well, you learn what parts of your platform matter the most by identifying patterns and doubling down on them. It's only from solving highly specific use cases that you actually get to a platform that can be broadly used for many different things. That's also why Amazon started by only selling books on the Web.
Ask LukeW: Streaming Citations
The Ask Luke feature on this site uses the thousands of articles, hundreds of PDFs, dozens of videos, and more I've created over the years to answer people's questions about digital product design. Since it launched a year ago, we've been iterating on the core of the Ask Luke system: retrieving relevant content to improve answers.
The most important job of any product interface is making its value clear and accessible to people. Most apps resort to some form of onboarding to accomplish this, but it's exponentially more impactful to experience value than to be told it exists. Likewise it's much more effective to learn through using an interface than through a tutorial explaining it.
These two factors make the seemingly simple job of "getting people to product value" quite difficult. Compounding the issue is the fact that interface solutions that accomplish this often feel simple and obvious -but only after they're uncovered. So getting to an interface that intuitively conveys value and purpose is usually an iterative process.
That's a long-winded introduction, but it's important context for the changes we made to Ask Luke. The purpose and value of this feature is to pull the most relevant bits of my writings, videos, audio, and files together to answer people's questions about digital product design. So we made a bunch of changes to make that even more front and center -to make how Ask Luke works more obvious.
Now as answers to people's questions stream in, we add citations to the relevant articles, videos, PDFs, etc. being used to answer a question in real-time. We also add these citations to the list of sources on the right dynamically instead of all at once before a question is answered.
Before, people were able to select any given source and view it in the Ask Luke conversational UI. With these updates, they are also taken to the relevant part of a source: to the relevant point in a video; to the relevant page in a PDF. Since this is easier to see than read about, here's a quick video demonstrating these changes and hopefully making the value and purpose of Ask Luke a bit more obvious.
Further Reading
- Integrated Audio Experiences & Memory: enabling specific content experiences within a conversational UI
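For the technically curious, here's a minimal sketch of the streaming citation pattern described above: as chunks of an answer arrive, emit a source the first time it gets cited so the source list can grow alongside the text. The [1]-style markers, the event tuples, and the source titles are assumptions for illustration rather than how Ask Luke is implemented, and the sketch assumes a marker never gets split across two chunks.

```python
import re
from typing import Iterable, Iterator

# Assumed marker format: a source number in square brackets, e.g. [2].
CITATION = re.compile(r"\[(\d+)\]")


def stream_with_citations(
    chunks: Iterable[str], sources: dict[int, str]
) -> Iterator[tuple[str, str]]:
    """Yield ("text", chunk) events, plus a ("source", title) event the first
    time each source is cited, so a source list can grow as the answer streams."""
    seen: set[int] = set()
    for chunk in chunks:
        yield ("text", chunk)
        for match in CITATION.finditer(chunk):
            source_id = int(match.group(1))
            if source_id in sources and source_id not in seen:
                seen.add(source_id)
                yield ("source", sources[source_id])
```

A UI consuming these events would append text events to the answer and source events to the list on the right, which is the "dynamically instead of all at once" behavior.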
- Expanding Conversational User Interfaces: extending chat user interfaces to better support AI capabilities
- Integrated Video Experiences: adding video-specific experiences within conversational UI
- Integrated PDF Experiences: unique considerations when adding PDF experiences
- Dynamic Preview Cards: created on the fly to improve sharing answers
- Text Generation Differences: testing the impact of new AI models
- PDF Parsing with Vision Models: using AI vision models to extract content from PDFs
Big thanks to Sidharth Lakshmanan and Sam Breed for the engineering lift on these changes.
Google Glass in an AI World
I often use surfing as a metaphor for new technology. Go too early and you don't catch the wave. Go too late and you don't catch it either. Similarly next generation hardware or software may be too early for its time. I found myself wondering if this was the case for Google Glass and AI.
For those who don't remember, Google Glass was an early augmented reality headset that despite early excitement was ultimately shuttered. I spent time with the developer version of Google Glass in 2013 and, while promising, didn't think it was ready. But the technical capabilities of the device were impressive especially for its time. Glass featured:
- a camera for taking photos and video
- a microphone for accepting voice commands
- a speaker for audio output only you could hear (bone conduction)
- a mini projector to display information and interface controls in the corner of your field of vision
- a trackpad for controlling the interface and voice commands
- a number of sensors for capturing and reacting to device movement, like head gestures
- WiFi and Bluetooth connectivity
What Google Glass didn't have is AI. That is, vision and language models that can parse and react to audio and video from the real world. As I illustrated in a look at early examples of multi-modal personal assistants: faced with a rat's nest of signs, you want to know if it's ok to park your car. A multi-modal assistant could take an image (live camera feed or still photo), a voice command (in natural language), and possibly some additional context (time, location, historical data) as input and assemble a response (or action) that considers all these factors.
Google Glass had a lot of the technical capabilities (except for processing power) to make this possible in a lightweight form factor. Maybe it just missed the AI wave.
iOS 18 Photos: Tab Bar to Single Scroll View
The most significant user interface change from iOS 17 to iOS 18 is the navigation in Apple's Photos app. The ubiquitous tab bar that's become the default navigation model in mobile apps is gone and in its place is one long scrolling page. So how does it work and why?
Most mobile applications have adopted a bottom bar for primary navigation controls. On Android it's called bottom navigation and on iOS, a tab bar, but the purpose is the same: make the top-level sections of an application visible and let people move between them.
And it works. Across multiple studies and experiments, companies found when critical parts of an application are made more visible, usage of them increases. For example, Facebook saw that not only did engagement go up when they moved from a “hamburger” menu to a bottom tab bar in their iOS app, but several other important metrics went up as well. Results like this made use of tab bars grow.
But in iOS 18, Apple removed the tab bar in their Photos app. Whereas the prior version had visible tabs for the top-level sections (Library, For You, Albums, Search), the redesign is just a single scroll view. The features previously found in each tab are now accessed by scrolling up and down vs. switching between tabs. One notable exception is Search which stays anchored at the top of the screen.
In addition to the persistent Search button, there's also a Select action and user profile image that opens a sheet with account settings. As you scroll up into your Photo library a persistent set of View controls appears at the bottom of the screen as well. The Close action scrolls you to the end of your Photo library and reveals a bit of the actions below making the location of features previously found in tabs more clear.
It's certainly a big change and given the effectiveness of tab bars, it's also a change that has people asking why. I have no inside information on Apple's decision-making process here but based on what I've learned about how people use Google Photos, Yahoo! Photos, and Flickr, I can speculate.
- By far the dominant use of a Photo gallery is scrolling to find an image whether to share, view, or just browse.
- Very few people organize their photo libraries and those that do, do it rarely.
- People continue to have poor experiences with searching images, despite lots of improvements, so they default to browsing when trying to find photos.
- Most automatic curation features like those found in For You just get ignored.
All that together can easily get you to the design answer of "the app should just be a scrolling list of all your Photos". Of course there are trade-offs. The top-level sections and their features are much less visible, and thereby less obvious. The people who do make use of features like Albums and Memories now need to scroll to them vs. tapping once. But as iOS 18 rolls out to everyone in the Fall, we'll see if these trade-offs were worth it.
A Visual Approach to Help Pages
As the functionality and scope of Web sites and applications has grown over the years, so has the prevalence of Help pages. Nearly every feature has an explanatory article outlining how to use it and why. But most Help pages are walls of text making them hard to act on. So a few years ago, we tried something different.
First let's look at the status quo. This Help page from Amazon is both pretty typical and by those standards, pretty good. It's specific to one topic, brief, outlines steps clearly, and includes links to help people accomplish their intended task. Companies iterated to these kinds of Help pages because they mostly work and because they're less work.
Keeping Help text up to date and accurate is less labor-intensive than updating images or videos with the same information. But as the old saying goes, a picture is worth a lot of words and there's a reason many people turn to video tutorials to learn how to do things instead of reading about how to do them.
When building Polar several years ago, we wanted a more approachable and fun way of helping people learn how to use our product. And while you might say "the best Help pages are no Help pages -just make your app easy to use" not all Help pages are smearing over usability issues. Some introduce higher level concepts, others outline capabilities, and some serve as marketing for specific features.
So with those goals in mind, we iterated to a simple formula. Each concept or feature gets a Help page that has a title alongside 1-2 sentences and as many sections consisting of a title, 1-2 sentences, plus a graphic as needed.
This approach meant people primarily relied on images (or their alt text if visually impaired) to figure out how to get things done. So we iterated a fair amount on the images to find the right balance of detail and abstraction. Make the UI too realistic and it becomes hard to focus on the relevant elements. Realistic UI images also need updating anytime the actual product UI changes. Conversely, make the image too simplistic and it doesn't provide enough detail for people to actually learn how to do things.
Of course, not all Help topics are well suited to an image but the process of trying to create one often triggers ideas on how to simplify the actual UI or concepts within a product. So it's worth the iteration.
But is a visual approach to Help pages able to scale? Assuming it works, can companies invest the time and effort needed to generate all these images and keep them up to date? Perhaps in a time of image generation AI models, it's increasingly possible through automated or supervised pipelines. Time will tell!
Intent-driven User Interfaces
Increasingly when I see designers defaulting to more UI controls and form elements in software interface designs, I encourage them to consider the implications of intent-driven instructions. Here's why...
For years I've used this image of Adobe Illustrator's user interface evolution to highlight the continuous march of "more features, more UI" that drives nearly every software company's releases. The end result for end users is more functions they don't know about and don't use. Not great.
So what's the alternative? Perhaps something like Christian Cantrell's Photoshop assistant demos. In this series of videos, Christian uses natural language instructions connected to Photoshop's APIs to do things like mask the subject of a series of photos, blur the background in images, create layers and more. All without needing to know how and without clicking a bunch of windows, icons, menus, and pointers (WIMP).
Intent-driven instructions to mask the subject of multiple images in Photoshop:
Intent-driven instructions to blur the backgrounds of multiple images in Photoshop:
Intent-driven instructions to create layers and objects in Photoshop:
While these kinds of interactions won't immediately replace conventional graphical user interface controls, it's pretty clear they enable a new way of controlling software with hundreds of features... just tell it what you want to do.
Distraction Control for the Web
Browsing the Web on your smartphone these days can feel like a gauntlet: accept this cookie consent, close this newsletter promo, avoid this app install banner. This morass of attention-seeking actions makes it hard to focus on content. Enter Apple's Distraction Control feature.
There are more than 7 billion active smartphones on the planet. This is the Web they are getting.
I won't get into how the Web became a minefield of pop-ups, banners, overlays, modals, and other forms of annoyance. For that you can take a look at my Mind the Gap presentation which goes into depth on why and what designers can do about it. But it's pretty clear the average mobile Web experience sucks.
And when things suck, people usually decide to do something about it. In this case, with iOS 18, Apple is giving average folks a chance to fight back with Distraction Control. When turned on, this new feature allows anyone to remove distracting elements on Web pages complete with a satisfying animation.
Newsletter pop-up? Boom, gone. Mobile app banner? Boom. Interstitial ad? Boom. Is it perfect? No. Elements might come back after you remove them if the page is reloaded. Accessing the control takes a few taps. But it's a way for people to fight back against Web clutter and we need more.
The Death of Lorem Ipsum
For years, designers have used Lorem Ipsum text as a placeholder in interface design layouts. But unless you're designing a pseudo-Latin text reader, using actual content provides a much more realistic picture of what a UI design needs to support. Today Large Language Models (LLMs) can provide designers with highly relevant content instantly so Lorem Ipsum can finally die.
It's long been argued (well at least by me in 2019) that using Lorem Ipsum text to mock up application interfaces fails to represent real content, often leading to usability issues and unrealistic designs that don't account for actual text lengths, line breaks, or content hierarchy in a final product. But Lorem Ipsum persisted as a design tool of choice because getting real content was hard.
To get very realistic content, designers would need access to where real content existed or pester engineers or domain experts to collect realistic content for them. It's not hard to see why some of these requests took a while or never got prioritized. And while some teams took the time to build tooling that enabled more realistic content in the design process, Lorem Ipsum was a much easier path for most.
Today, Large Language Models (LLMs) can not only generate sample content but also create highly specific and relevant content for just about any application you're designing. And given these tools are fast, widely available and free, there's no excuse to not use very realistic content in application designs. For example, if you're designing a food delivery app, a few prompts will give you real content, real quick.
So there's no excuses for Lorem Ipsum no more.
A Proliferation of Terms
When working through the early stages of a product design, it's common that labels for objects and actions emerge organically. No one is overly concerned about making these labels consistent (yet). But if this proliferation of terms doesn't get reined in early, both product design and strategy get harder.
Do we call it a library, a folder, a collection, a workspace, a section, a category, a topic? How about a document, page, file, entry, article, worksheet? And... what's the difference? While these kinds of decisions might not be front and center when working out designs for a product or feature, they can impact a lot.
For starters, having clear definitions for concepts helps keep teams on the same page. When engineering works on implementing a new object type, they're aligned with what design is thinking, which is what the sales team is pitching potential customers on. Bringing a product to life is hard enough, why complicate things by using different terms for similar things or vice versa?
Inconsistent terms are obviously also a comprehension issue for the people using our products. "Here it's called a Document, there it's called an Article. Are those the same?" Additionally, undefined terms often lead to miscellaneous bins in our user interfaces. "What's inside Explore?" When the definition of objects and actions isn't clear, what choice do we have but to drop them into vague sounding containers like Discover?
The more a product gets developed (especially by bigger teams) the more things can diverge because people's mental model of what terms mean can vary a lot. So it's really useful to proactively put together a list of the objects and actions that make up an application and draft some simple one-liner definitions for each. These lists almost always kick off useful high-level discussions within teams on what we're building and for who. Being forced to define things requires you to think them through: what is this feature doing and why?
And of course, consistent labels also ease comprehension for users. Once people learn what something means, they'll be able to apply that knowledge elsewhere -instead of having to contend with mystery meat navigation.
Ask LukeW: PDF Parsing with Vision Models
Over the years, I've given more than 300 presentations on design. Most of these have been accompanied by a slide deck to illustrate my points and guide the narrative. But making the content in these decks work well with the Ask Luke conversational interface on this site has been challenging. So now I'm trying a new approach with AI vision models.
To avoid application-specific formats (Keynote, PowerPoint), I've long been making my presentation slides available for download as PDF documents. These files usually consist of 100+ pages and often don't include a lot of text, leaning instead on visuals and charts to communicate information. To illustrate, here are a few of these slides from my Mind the Gap talk.
In an earlier article on how we built the Ask Luke conversational interface, I outlined the issues with extracting useful information from these documents. I wanted the content in these PDFs to be available when answering people's design questions in addition to the blog articles, videos and audio interviews that we were already using.
But even when we got text extraction from PDFs working well, running the process on any given PDF document would create many content embeddings of poor quality (like the one below). These content chunks would then end up influencing the answers we generated in less than helpful ways.
To prevent these from clogging up our limited context (how much content we can work with to create an answer) with useless results, we set up processes to remove low quality content chunks. While that improved things, the content in these presentations was no longer accessible to people asking questions on Ask Luke.
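To give a feel for what removing low quality content chunks can involve, here's a minimal sketch of a filter. The two heuristics (too few words, or mostly numbers and symbols like the leftovers of a chart) and their thresholds are my assumptions for illustration. They're not the rules we actually ran.

```python
def is_low_quality(chunk: str, min_words: int = 8, min_alpha_ratio: float = 0.6) -> bool:
    """Flag chunks that are too short or mostly non-letters (numbers, stray symbols)."""
    text = chunk.strip()
    if len(text.split()) < min_words:
        return True
    letters = sum(ch.isalpha() for ch in text)
    non_space = sum(not ch.isspace() for ch in text)
    return letters / non_space < min_alpha_ratio


def keep_useful(chunks: list[str]) -> list[str]:
    """Drop chunks flagged as low quality, keeping the original order."""
    return [chunk for chunk in chunks if not is_low_quality(chunk)]
```

A filter like this keeps junk out of the context, but it also throws away slides that are mostly visual, which is the trade-off described above.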
So we tried a different approach. Instead of extracting text from each page of a PDF presentation, we ran it through an AI vision model to create a detailed description of the content on the page. In the example below, the previous text extraction method (on the left) gets the content from the slide. The new vision model approach (on the right) though, does a much better job creating useful content for answering questions.
Here's another example illustrating the difference between the PDF text extraction method used before and the vision AI model currently in use. This time instead of a chart, we're generating a useful description of a diagram.
This change is now rolled out across all the PDFs the Ask Luke conversational interface can reference to answer design questions. Gone are useless content chunks and there's a lot more useful content immediately available.
Thanks to Yangguang Li for the dev help on this change.
Ask LukeW: Text Generation Differences
As the number of highly capable large language models (LLMs) released continues to quickly increase, I added the ability to test new models when they become available in the Ask Luke conversational interface on this site.
For context there's a number of places in the Ask Luke pipeline that make use of AI models to transform, clean, embed, retrieve, generate content and more. I put together a short video that explains how this pipeline is constructed and why if you're interested.
Specifically for the content generation step, once the right content is found, ranked, and assembled into a set of instructions, I can select which large language model to send these instructions to. Every model gets the same instructions unless they can support a larger context window. In which case they might get more ranked results than a model with a smaller context size.
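Here's a small sketch of that last point: the same ranked results, but a bigger context window means more of them fit. The four-characters-per-token estimate and the reserved token count are rough assumptions for illustration, not values from the Ask Luke pipeline.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an approximation, not a tokenizer)."""
    return max(1, len(text) // 4)


def select_results(ranked: list[str], context_tokens: int, reserved_tokens: int = 1000) -> list[str]:
    """Take ranked results in order until the model's context budget is used up.

    reserved_tokens is space held back for the instructions and the answer.
    """
    budget = context_tokens - reserved_tokens
    chosen = []
    for text in ranked:
        cost = estimate_tokens(text)
        if cost > budget:
            break
        chosen.append(text)
        budget -= cost
    return chosen
```

Because results are taken in rank order, a model with a smaller context still gets the most relevant content, just less of it.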
Despite the consistent instructions, switching LLMs can have a very big impact on answer generation. I'll leave you to guess which of these two answers is powered by OpenAI's GPT-4 and which one comes from Anthropic's new (this week) Claude 3.5 Sonnet.
Some of you might astutely point out that the instruction set could be altered in specific ways when changing models. Recently, we've found the most advanced LLMs to be more interchangeable than before. But there's still differences in how they generate content as you can clearly see in the example above. Which one is best though... could soon be a matter of personal preference.
Thanks to Yangguang Li and Sam for the dev help on this feature.