Web Standards
Video: Structuring Website Content with AI
To create useful conversational interfaces for specific sets of content like this Website, we can use a variety of AI models to add structure to videos, audio files, and text. In this 2.5 minute video from my How AI Ate My Website talk, I discuss how, and also illustrate that if you can model a behavior, you can probably train a machine to do it at scale.
Transcript
There's more document types than just web pages. Videos, podcasts, PDFs, images, and more. So let's look at some of these object types and see how we can break them down using AI models in a way that can then be reassembled into the Q&A interface we just saw.
For each video file, we first need to turn the audio into written text. For that, we use a speech-to-text AI model. Next, we need to break that transcript down into speakers. For that, we use a diarization model. Finally, a large language model allows us to make a summary, extract keyword topics, and generate a list of questions each video can answer.
We also explored models for identifying objects and faces, but don't use them here. We did, however, put together a custom model for one thing: keyframe selection. There's also a processing step that I'll get to in a bit, but first let's look at this keyframe selection use case.
We needed to pick out good thumbnails for each video to put into the user interface. Rather than manually viewing each video and selecting a specific keyframe for the thumbnail, we grabbed a bunch automatically, then quickly trained a model by providing examples of good results. Show the speaker, eyes open, no stupid grin.
In this case, you can see it nailed the "which Paris girl are you" backdrop, but left a little dumb grin, so not perfect. But this is a quick example of how you can really think about having AI models do a lot of things for you.
If you can model the behavior, you can probably train a machine to do it at scale. In this case, we took an existing model and just fine-tuned it with a smaller number of examples to create a useful thumbnail picker.
In addition to video files, we also have a lot of audio, podcasts, interviews, and so on. Lots of similar AI tasks to video files. But here I wanna discuss the processing step on the right.
There's a lot of cleanup work that goes into making sure our AI generated content is reliable enough to be used in citations and key parts of the product experience. We make sure proper nouns align, aka Luke is Luke. We attach metadata that we have about the files, date, type, location, and break it all down into meaningful chunks that can be then used to assemble our responses.
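To make the pipeline in this transcript more concrete, here's a minimal sketch of the steps described: speech-to-text, diarization, language-model summarization, and chunking with metadata. Every function name and data shape here is an illustrative assumption, not the actual system behind this site.

```python
# A minimal sketch of the pipeline described above, assuming generic callables
# for the speech-to-text, diarization, and language models.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    speaker: str
    start_time: float            # seconds into the source file
    metadata: dict = field(default_factory=dict)

def process_video(video_path: str, speech_to_text, diarizer, llm) -> dict:
    """Break one video into structured pieces that a Q&A interface can reuse."""
    # 1. Speech-to-text: audio track -> raw transcript text.
    transcript = speech_to_text(video_path)

    # 2. Diarization: split the transcript into speaker turns with timestamps.
    turns = diarizer(video_path, transcript)

    # 3. Large language model: summary, keyword topics, and the questions
    #    this video can answer.
    summary = llm(f"Summarize this talk:\n{transcript}")
    topics = llm(f"List keyword topics for this talk:\n{transcript}")
    questions = llm(f"What questions does this talk answer?\n{transcript}")

    # 4. Processing/cleanup: attach metadata (date, type, source) and break the
    #    transcript into chunks that can later be retrieved and cited.
    chunks = [
        Chunk(text=t["text"], speaker=t["speaker"], start_time=t["start"],
              metadata={"source": video_path, "type": "video"})
        for t in turns
    ]
    return {"summary": summary, "topics": topics,
            "questions": questions, "chunks": chunks}
```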
Video: Expanding Conversational Interfaces
In this 4 minute video from my How AI Ate My Website talk, I illustrate how focusing on understanding the problem instead of starting with a solution can guide the design of conversational (AI-powered) interfaces. So they don't all have to look like chatbots.
Transcript
But what if instead we could get closer to the way I'd answer your question in real life? That is, I'd go through all the things I've written or said on the topic, pull them together into a coherent reply, and even cite the sources, so you can go deeper, get more context, or just verify what I said.
In this case, part of my response to this question comes from a video of a presentation just like this one, but called Mind the Gap. If you select that presentation, you're taken to the point in the video where this topic comes up. Note the scrubber under the video player.
The summary, transcript, topics, speaker diarization, and more are all AI generated. More on that later, but essentially, this is what happens when a bunch of AI models effectively eat all the pieces of content that make up my site and spit out a very different interaction model.
Now the first question people have about this is how is this put together? But let's first look at what the experience is, and then dig into how it gets put together. When seeing this, some of you may be thinking, I ask a question, you respond with an answer.
Isn't that just a chatbot? Chatbot patterns are very familiar to all of us, because we spend way too much time in our messaging apps. The most common design layout of these apps is a series of alternating messages. I say something, someone replies, and on it goes. If a message is long, space for it grows in the UI, sometimes even taking up a full screen.
Perhaps unsurprisingly, it turns out this design pattern isn't optimal for iterative conversations with sets of documents, like we're dealing with here. In a recent set of usability studies of LLM-based chat experiences, the Nielsen Norman Group found a bunch of issues with this interaction pattern, in particular with people's need to scroll long conversation threads to find and extract relevant information. As they called out, this behavior "is a significant point of friction, which we observed with all study participants."
To account for this, and a few additional considerations, we made use of a different interaction model, instead of the chatbot pattern. Through a series of design explorations, we iterated to something that looks a little bit more like this.
In this approach, previous question and answer pairs are collapsed, with a visible question and part of its answer. This enables quick scanning to find relevant content, so no more scrolling massive walls of text. Each question and answer pair can be expanded to see the full response, which as we saw earlier can run long due to the kinds of questions being asked.
Here's how things look on a large screen. The most recent question and answer is expanded by default, but you can quickly scan prior questions, find what you need, and then expand those as well. Net-net, this interaction works a little bit more like a FAQ pattern than a chatbot pattern, which kind of makes sense when you think about it. The Q&A process is pretty similar to a help FAQ. Have a question, get an answer.
It's a nice example of how starting with the problem space, not the solution, is useful. I bring this up because too often designers start the design process with something like a competitive audit, where they look at what other companies are doing and, whether intentionally or not, end up copying it, instead of letting the problem space guide the solution.
In this case, starting with understanding the problem versus looking at solutions got us to more of a FAQ thing than a chatbot thing. So now we have an expandable conversational interface that collapses and extends to make finding relevant answers much easier.
AI Models Enable New Capabilities
In the introduction to my How AI Ate My Website talk, I frame AI capabilities as a set of language and vision operations that allows us to rethink how people experience Web sites. AI tasks like text summarization, speech to text, and more can be used to build new interactions with existing content as outlined in this short 3 minute video.
Transcript
How AI Ate My Website. What do most people picture with a title like that, AI eating a website? They might perhaps imagine some scary things, like a giant computer brain eating up web pages on its way to global dominance.
In truth though, most people today probably think of AI as something more like ChatGPT, the popular large language model from OpenAI. These kinds of AI models are trained on huge amounts of data, including sites like mine, which gives them the ability to answer questions such as, Who is Luke? ChatGPT does a pretty good job, so I guess I don't need an intro slide in my presentations anymore.
But it's not just my site that's part of these massive training sets. And since large language models are essentially predicting the next token in a sequence, they can easily predict very likely, but incorrect answers. For instance, it's quite likely a product designer like me went to CMU, but I did not. Even though ChatGPT keeps insisting that I did, in this case, for a master's degree.
No problem though, because of reinforcement learning, many large language models are tuned to please us. So correct them, and they'll comply, or veer off into weird spaces.
Let's zoom out to see this relationship between large language models and websites. A website like mine, including many others, has lots of text. That text gets used as training data for these immense auto-completion machines, like ChatGPT. That's how it gets the ability to create the kinds of responses we just looked at.
This whole idea of training giant machine brains on the totality of published content on the internet can lead people to conjure scary AI narratives.
But thinking in terms of a monolithic AI brain isn't that helpful for understanding AI capabilities and how they can help us. While ChatGPT is an AI model, it's just one kind: a large language model. There are lots of different AI models that can be used for different tasks, like language operations, vision operations, and more.
Some models do more than one task, others are more specialized. What's very different from a few years ago though, is that general purpose models, things that can do a lot of different tasks, are now widely available and effectively free.
We can use these AI models to rethink what's possible when people interact with our websites, to enable experiences that were impossible before, to go from scary AI thing to awesome new capabilities, and hopefully make the web cool again, because right now, sorry, it's not very cool.
Early Glimpses of Really Personal Assistants
Recently I've stumbled into a workflow that's starting to feel like the future of work. More specifically, a future with really personal assistants that accelerate and augment people's singular productivity needs and knowledge.
"The future is already here – it's just not evenly distributed." -William Gibson, 2003Over the past few months, I've been iterating on a feature of this Website that answers people's digital product design questions in natural language using the over 2,000 text articles, 375 presentations, 100 videos, and more that I've authored over the past 28 years. While the project primarily started as testbed for conversational interface design, it's morphed into quite a bit more.
Increasingly, I've started to use the Ask Luke functionality as an assistant that knows my work almost as well as I do, can share it with others, and regularly expands its usefulness. For example, when asked a question on Twitter (ok, X) I can use Ask Luke to instantly formulate an answer and respond with a link to it.
Ask Luke answers use the most relevant parts of my archive of writings, presentations, and more when responding. In this case, the response includes several citations that were used to create the final answer:
- a video that begins at the 56:04 timestamp, where the topic of name fields came up in a Q&A session after my talk
- a PDF of a presentation I gave on mobile checkout, where specific slides outlined the pros and cons of single name fields
- and several articles I wrote that expanded on name fields in Web forms
It's not hard to see how the process of looking across thousands of files, finding the right slides, timestamps in videos, and links to articles would have taken me a lot longer than the ~10 seconds it takes Ask Luke to generate a response. Already a big personal productivity gain.
I've even found that I can mostly take questions as they come to me and produce responses as this recent email example shows. No need to reformat or adjust the question, just paste it in and get the response.
But what about situations where I may have information in my head but haven't written anything on the topic? Or where I need to update what I wrote in light of new information or experiences I've come across? As these situations emerged, we expanded the admin features for Ask Luke to allow me to edit generated answers or write new answers (often through audio dictation).
Any new or edited answer then becomes part of the index used to answer subsequent questions people ask. I can also control how much an edited or new answer should influence a reply and which citations should be prioritized alongside the answer. This grows the content available in Ask Luke and helps older content remain relevant.
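For illustration, here's a minimal sketch of how an edited or newly written answer might join a retrieval index with an influence weight. The real Ask Luke implementation isn't public, so every name and the scoring below are assumptions.

```python
# Illustrative only: a tiny in-memory index where new or edited answers join
# the retrieval pool with an "influence" weight that boosts their score.
from dataclasses import dataclass, field

@dataclass
class Entry:
    text: str
    citations: list = field(default_factory=list)
    weight: float = 1.0   # how strongly this entry should influence replies

class AnswerIndex:
    def __init__(self, embed):
        self.embed = embed      # callable: str -> list[float] embedding vector
        self.items = []         # list of (vector, Entry) pairs

    def add(self, text, citations=None, weight=1.0):
        entry = Entry(text, citations or [], weight)
        self.items.append((self.embed(text), entry))

    def search(self, question, k=3):
        # Score every entry by cosine similarity, boosted by its weight.
        q = self.embed(question)
        scored = [(self._cosine(q, vec) * entry.weight, entry)
                  for vec, entry in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0
```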
Having an assistant that can accept instructions (questions) in the exact form you get them (no rewriting), quickly find relevant content in your digital exhaust (documents, presentations, recordings, etc.), assemble responses the way you would, cite them in detail, and help you grow your personal knowledge base... well it feels like touching the future.
And it's not hard to imagine how similar really personal assistants could benefit people at work, home, and school.
Further Reading
- New Ways into Web Content: how AI enables a different way of interacting with a Web site
- Integrated Audio Experiences: enabling specific content experiences within a conversational UI
- Expanding Conversational User Interfaces: extending chat user interfaces to better support AI capabilities
- Integrated Video Experiences: adding video-specific experiences within conversational UI
- Integrated PDF Experiences: managing PDF challenges within conversational UI
AI Models in Software UI
As more companies work to integrate the capabilities of powerful generative AI language and vision models into new and existing software, high-level interaction patterns are emerging. I've personally found these distinct approaches to AI integration useful for talking with folks about what might work for their specific products and use cases.
In the first approach, the primary interface affordance is an input that directly (for the most part) instructs one or more AI models. In this paradigm, people are authoring prompts that result in text, image, video, etc. generation. These prompts can be sequential, iterative, or unrelated. Marquee examples are OpenAI's ChatGPT interface or Midjourney's use of Discord as an input mechanism. Since there are few, if any, UI affordances to guide people, these systems need to respond to a very wide range of instructions. Otherwise people get frustrated by limitations that are largely hidden from them.
The second approach doesn't include any UI elements for directly controlling the output of AI models. In other words, there are no input fields for prompt construction. Instead, instructions for AI models are created behind the scenes as people go about using application-specific UI elements. People using these systems could be completely unaware that an AI model is responsible for the output they see. This approach is similar to YouTube's use of AI models (more machine learning than generative) for video recommendations.
The third approach is application-specific UI with AI assistance. Here people can construct prompts through a combination of application-specific UI and direct model instructions. These could be additional controls that generate portions of those instructions in the background. Or the ability to directly guide prompt construction through the inclusion or exclusion of content within the application. Examples of this pattern are Microsoft's Copilot suite of products for GitHub, Office, and Windows.
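To make that third pattern concrete, here's a rough sketch of how application UI state and a person's short instruction might be combined into one prompt behind the scenes. The fields, wording, and function name are invented for illustration, not taken from any particular product.

```python
# A sketch of the third pattern: app-specific controls (tone, audience, the
# selected text) contribute to the prompt while the person types only a short
# instruction. All names here are hypothetical.
def build_prompt(user_instruction: str, selected_text: str,
                 tone: str = "neutral", audience: str = "general") -> str:
    """Combine application UI state with the person's direct instruction."""
    parts = [
        f"Rewrite the following text for a {audience} audience in a {tone} tone.",
        f"Text: {selected_text}",
    ]
    if user_instruction:
        parts.append(f"Additional instruction from the user: {user_instruction}")
    return "\n".join(parts)

# Example: a toolbar dropdown sets tone and audience; the person only types "make it shorter".
prompt = build_prompt("make it shorter",
                      "Our Q3 results exceeded expectations across all regions...",
                      tone="friendly", audience="executive")
```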
These entry points for AI assistance don't have to be side panels; they could be overlays, modals, inline menus, and more. What they have in common, however, is that they supplement application-specific UIs instead of completely replacing them.
Actual implementations of any of these patterns are likely to blur the lines between them. For instance, even when the only interface is an input for prompt construction, the system may append or alter people's input behind the scenes to deliver better results. Or an AI assistance layer might primarily serve as an input for controlling the UI of an application instead of working alongside it. Despite that, I've still found these three high-level approaches to be helpful in thinking through where and how AI models are surfaced in software applications.
Until the Right Design Emerges...
Too often, the process of design is cut short. When faced with user needs or product requirements, many designers draft a mockup or wireframe informed by what they've seen or experienced before. But that's actually when the design process starts, not ends.
"Art does not begin with imitation, but with discipline."—Sun Ra, 1956Your first design, while it may seem like a solution, is usually just an early definition of the problem you are trying to solve. This iteration surfaces unanswered questions, puts assumptions to the test, and generally works to establish what you need to learn next.
"Design is the art of gradually applying constraints until only one solution remains."—UnknownEach subsequent iteration is an attempt to better understand what is actually needed to solve the specific problem you're trying to address with your design. The more deeply you understand the problem, the more likely you are to land on an elegant and effective solution. The process of iteration is a constant learning process that gradually reveals the right path forward.
"True simplicity is, well, you just keep on going and going until you get to the point where you go... Yeah, well, of course." —Jonathan Ive, September, 2013When the right approach reveals itself, it feels obvious. But only in retrospect. Design is only obvious in retrospect. It takes iteration and discipline to get there. But when you do get there, it's much easier to explain your design decisions to others. You know why the design is the right one and can frame your rationale in the context of the problem you are trying to solve. This makes presenting designs easier and highlights the strategic impact of designers.
Multi-Modal Personal Assistants: Early Explorations
With growing belief that we're quickly moving to a world of personalized multi-modal software assistants, many companies are working on early glimpses of this potential future. Here's a few ways you can explore bits of what these kinds of interactions might become.
But first, some context. Today's personal multi-modal assistant explorations are largely powered by AI models that can perform a wide variety of language and vision tasks like summarizing text, recognizing objects in images, synthesizing speech, and lots more. These tasks are coupled with access to tools, information, and memory that makes them directly relevant to people's immediate situational needs.
To simplify that, here's a concrete example: faced with a rat's nest of signs, you want to know if it's ok to park your car. A personal multi-modal assistant could take an image (live camera feed or still photo), a voice command (in natural language), and possibly some additional context (time, location, historical data) as input and assemble a response (or action) that considers all these factors.
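As a rough illustration of that parking example, here's a sketch of how such a request might be assembled, with an `assistant` callable standing in for whatever multi-modal model actually handles it; none of these parameter names come from a real API.

```python
# Hypothetical sketch: an image, a transcribed voice question, and situational
# context travel together in one request to a multi-modal assistant.
import datetime

def can_i_park_here(assistant, photo_bytes, voice_question, lat, lon, now=None):
    now = now or datetime.datetime.now()
    context = {
        "time": now.isoformat(),
        "weekday": now.strftime("%A"),
        "location": {"lat": lat, "lon": lon},
    }
    # One request carrying all three inputs: image, natural language, and context.
    return assistant(images=[photo_bytes], text=voice_question, context=context)

# Example call (with some assistant implementation supplied elsewhere):
# answer = can_i_park_here(assistant, open("signs.jpg", "rb").read(),
#                          "Can I park here right now?", 37.77, -122.42)
```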
So where can you try this out? As mentioned, several companies are tackling different parts of the problem. If you squint a bit at the following list, it's hopefully clear how these explorations could add up to a new computing paradigm.
OpenAI's native iOS app can take image and audio input and respond in both text and speech using their most advanced large language model, GPT-4... if you sign up for their $20/month ChatGPT Plus subscription. With an iPhone 15 Pro ($1,000+), you can configure the phone's hardware action button to directly open voice control in OpenAI's app. This essentially gives you an instant assistant button for audio commands. Image input, however, still requires tapping around the app and only works with static images, not a real-time camera feed.
Humane's upcoming AI Pin (preorder $699) handles multiple inputs with a built-in microphone, camera, touch surface, and sensors for light, motion, GPS, and more. It likewise makes use of a network connection ($24/month) and large language models to respond to natural language requests, but instead of making use of your smartphone's screen and speaker for output, it uses its own speaker and laser projection display. Definitely on the "different" end of the hardware and display spectrum.
Rewind's Pendant (preorder for $59) is a wearable that captures what you say and hear in the real world and then transcribes, encrypts, and stores it on your phone. It's mostly focused on the audio input side of a multi-modal personal assistant, but the company's goal is to make use of what the device captures to create a "personalized AI powered by truly everything you’ve seen, said, or heard."
New Computer's Dot app (not yet available) has released some compelling videos of a multi-modal personal assistant that runs on iOS. In particular, the ability to add docs and images that become part of a longer term personal memory.
While I'm sure more explorations and developed products are coming, this list lets you touch parts of the future while it's being sorted out... wrinkles and all.
Always Be Learning
The mindset to “always be learning” is especially crucial in the field of digital product design, where not only is technology continuously evolving, but so are the people we're designing for.
To quote Bruce Sterling, because people are “time bound entities moving from cradle to grave”, their context, expectations, and problems are always changing. So design solutions need to change along with them.
As a result, designers have to keep learning about how our products are being used, abused, or discarded and we need to feed those lessons back into our designs. Good judgement comes from experiences, and experience comes from bad judgements. Therefore, continuous learning is crucial for refining judgement and improving design outcomes.
"There’s the object, the actual product itself, and then there’s all that you learned. What you learned is as tangible as the product itself, but much more valuable, because that’s your future." -Jony Ive, 2014So how can we always be learning? Start with the mindset that you have a lot to learn and sometimes unlearn. Spend your time in environments that encourage deeper problem-understanding and cross-disciplinary collaboration. This means not just designing but prototyping as well. Design to build, build to learn.
Recognize the patterns you encounter along the way and make time to explore them. This extends what you've learned into a more broadly useful set of skills and better prepares you for the next set of things you'll need to learn.
Rapid Iterative Testing and Evaluation (RITE)
Rapid Iterative Testing and Evaluation or RITE is a process I've used while working at Yahoo! and Google to quickly make progress on new product designs and give teams a deeper shared understanding of the problem space they're working on.
RITE is basically a continuous process of designing and building a prototype, testing it with users, and making changes within a short period, typically a few days. The goal is to quickly identify and address issues, and then iterate on the design based on what was learned. This gives teams regular face time with end users and collectively grows their knowledge of the needs, environments, and expectations of their customers.
The way I've typically implemented RITE is every Monday, Tuesday, and Wednesday, we design and build a prototype. Then every Thursday, we bring in people to use the prototype through a series of 3-5 usability tests that the whole team attends. On Friday, we discuss the results of that testing together and decide what to change during the following week. This cycle is repeated week after week. In some cases running for months.
This approach puts customers front and center in the design process and allows for quick adaptation to issues and opportunities each week. The RITE method is also useful because it provides insights not just opinions. In other words, if there's a debate about a design decision, we can simply test it with users that week. This squashes a lot of open-ended discussions that don't result in action because the cost of trying something out is incredibly low. "OK we'll try it."
The cadence of weekly user tests also really aligns teams on common goals as everyone participates in observing problems and opportunities, exploring solutions, and seeing the results of their proposals. Over and over again.
Smashing Conf: Journey in Enterprise UX
In her A Journey in Enterprise UX talk at Smashing Conf Antwerp, Stephanie Walter outlined her learnings doing UX research and design for internal enterprise users.
- Enterprise software is complex to design due to a wide range of use cases and specific requirements. Most of the time it is ugly and hard to use, but it doesn't have to be that way.
- An internal tool can have lots of different user groups. Before you even start research, get familiar with the "as is": the processes, the jargon, and what is currently in place.
- Quantitative data analysis lets you learn what features get used and how much. You can also analyze the content of these features.
- Analyzing content allows you to remove duplicated content and rework the information architecture.
- To get internal users for research, make friends with different departments and get referrals; you'll find people who can help you improve the tools they work with.
- Most enterprise tools are very task oriented: learn how people do these tasks, identify pain points, and identify the content needed.
- User research questions: tell me about..., walk me through the steps, show me how you..., if you have a magic wand what would you change?
- Keep track and document everything. Even if it is out of scope, might be useful in the future.
- People are not used to user-centered design processes, so you might need to dig to find the needs instead of hearing solutions.
- Define priorities: list big pain points and needs, then decide with the team on what is fast track vs. big topics.
- Fast track: content and features that are low stakes and don't need extensive feedback, and therefore can be done quickly.
- For big topics, you need more data: gather existing information, schedule follow-up sessions, iterate on solutions, and do usability testing.
- Observational testing allows you to watch how people work and see where the issues are.
- If users have questions during the session, take notes and save them for the end to not bias testing.
- User diaries allow you to understand usage over a period of time. This helps find where people fall back to previous tools or processes.
- Don't oversimplify interfaces for people who need features to do their job. Progressive disclosure and customization options are useful.
- Content might be there for a reason but you're allowed to question that need.
- People want to work with the data, let them export or copy data to move it in and out of your tools.
- Find the small things that make people's lives easier. There are lots of these opportunities in enterprise tools.
- Users don't care what data goes into what tool, but they care about too many clicks, especially for tasks they do regularly.
- Offer training: some people need and expect it, others won't so make it optional and in multiple formats. Training doesn't mean your UX is bad.
- Training can be used to collect user feedback, you can hear the questions they ask.
- Complex internal organizations can slow things down, be patient. Things don't change overnight.
- Understand what makes people click, and leverage it.
- Don't bring an opinion to a data fight: measure and bring proof. Have unbiased data.
- Enterprise users are starting to demand better tools and experience. Make the process of designing internal tools visible to users so they understand the rationale behind designs.
- Get champions and advocates in your user base.
- Complexity is scary; break it into pieces and tackle small parts at a time. User research helps you connect the pieces.
Smashing Conf: UX Writing with a Point of View
In his Designing a Product with a Point of View talk at Smashing Conf Antwerp, Nick DiLallo described the role of writers in defining a unique product personality and brand.
- With placeholder content, it's hard to evaluate the interface. Words help make products simple and clear but also provide a personality.
- The first step to writing is defining your audience. This helps inform more than words.
- When creating an audience, don't be too broad: "film obsessives" rather than "people who watch movies" helps you make more decisions.
- Another way to focus an audience definition is to add "people who...". The point is to provide focus for designs.
- Say something interesting. Start with a sentence to plant a flag or establish a point of view.
- A lot of companies use words like "fast, simple, or fun". But this sounds like everyone, so it's not interesting.
- Sometimes we define a feature, but instead of "Keep track of your runs," consider "Compete with thousands of runners." These sentences can help guide a lot of design decisions.
- Write out words to describe features and content in your product. This communicates a perspective on what you are doing.
- Think really deeply on what words to use in the interface and why. There's many ways to frame the same action.
- Not all parts of an interface need to be creative; some require conventional labels to be clear, like "add to cart."
- What you include is what you care about. "we think this is important..." What you include communicates a point of view.
- Bigger means more important. What you emphasize communicates what you care about.
- Writers look for opportunities to communicate in an interface. Even tiny moments (like the footer) can say a lot about who you are and how you think.
- You can also overdo it. Be careful about adding brand voice in places that don't need it; maps and calendars, for example, might not need a lot of it.
- It's not just words but the entire interface that communicates with users.
- When you work in UX you have to make hard decisions about how to surface potentially offensive issues: gender, race, nationalities, etc.
- Do what you write. For example, don't pair a "free trial" with a credit card screen. Clear and simple words should not kick complexity down the road.
- Writing can show what's broken with the UX.
Generative Agents
In his AI Speaker Series presentation at Sutter Hill Ventures, Joon Park discussed his work on generative agents, their architecture, and what we might learn from them about human behavior. Here's my notes from his talk:
- Can we create computer generated behavior that simulates human behavior in a compelling way? While this has been very complicated to date, LLMs offer us a new way to tackle the problem.
- The way we behave and communicate is much too vast and too complex for us to be able to create with existing methods.
- Large language models (LLMs) are trained on broad data that reflects our lives, like the traces on our social web, Wikipedia, and more. So these models include a tremendous amount about us, how we live, talk, and behave.
- With the right method, LLMs can be transformed into the core ingredient that has been missing in the past that will enable us to simulate human behavior.
- Generative agents are a new way to simulate human behavior using LLMs. They are complemented with an agent architecture that remembers, reflects, and plans based on constantly growing memories and cascading social dynamics.
- Smallville is a custom-built game world which simulates a small village. 25 generative agents are initiated with a paragraph description of their personality and motivations. No other information is provided to them.
- As individuals, agents set plans, and execute on them. They wake up in the morning, do their routines, and go to work in the sandbox game environment.
- First, an agent basically generates a natural language statement describing their current action. They then translate this into concrete grounded movements that can affect the sandbox game environment.
- They actually influence the state of the objects that are in this world. So a refrigerator can be empty when the agent uses a table to make breakfast.
- They determine whether they want to engage in conversations when they see another agent. And they generate the actual dialogue if they decide to engage.
- Just as agents can form dialogue with each other, a user can engage in a dialogue with these agents by specifying a persona, for instance, a news reporter.
- Users can also alter the state of the agent's environment, control an agent, or actually enter as an outside visitor.
- In this simulation, information diffuses across the community as agents share information with each other and form new relationships.
- In the center of the architecture that powers generative agents is a memory stream that maintains a record of agents' experiences in natural language.
- From the memory stream, records are retrieved as relevant to the agents' cognitive processes. A retrieval function that takes the agent's current situation as input and returns a subset of the memory stream to pass to a LLM, which then generates the final output behavior of the agents.
- Retrieval is a linear combination of the recency, importance, and relevance functions for each piece of memory (see the sketch after these notes).
- The importance function is a prompt that asks the large language model to rate how important the event is. You're basically asking the agent in natural language: this is who you are, how important is this to you?
- The reflection process clusters records of agents' memory into higher-level abstract thoughts called reflections. Once they are synthesized, these reflections are just a type of memory and are stored in the memory stream along with other raw observational memories.
- Over time, this generates trees of reflections where the leaf nodes are the raw observations. As you go higher up the tree, you start to answer some of the core questions about who agents are, what drives them, and what they like.
- While we can generate plausible behavior in response to situations, this might sacrifice the quality for long-term actions. Agents need to plan over a longer time horizon than just now.
- Plans describe a future sequence of actions for the agent and help keep the agent's behavior consistent over time and are generated by a prompt that summarizes the agent and the agent's current status.
- In order to control granularity, plans are generated top-down: first in large chunks, then hourly, then in 1-15 minute increments.
- How do we evaluate whether agents remember, plan, and reflect in a believable manner?
- Ask the agents a series of questions and have human evaluators rank the answers in order to calculate TrueSkill ratings for each condition.
- Found that the core components of our agent architecture, observation, plan, and reflection, each contribute critically to the believability of these agents.
- But agents would sometimes fail to retrieve certain memories and sometimes embellish their memory (with hallucinations).
- And instruction tuning of LLMs also influenced how agents spoke to each other (overly formal or polite).
- Going forward, the promise of generative agents is that we can accurately simulate human behavior.
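For reference, here's a minimal sketch of the retrieval scoring described in the notes above: each memory gets a score that is a weighted sum of recency, importance, and relevance. The weights, decay factor, and data layout are assumptions based on the talk, not an exact reproduction of the system.

```python
# Sketch of generative-agent memory retrieval: score = recency + importance +
# relevance, with assumed weights and an assumed memory-dict layout.
import math

def recency(hours_since_access: float, decay: float = 0.995) -> float:
    # Exponential decay: recently accessed memories score close to 1.0.
    return decay ** hours_since_access

def relevance(query_vec, memory_vec) -> float:
    # Cosine similarity between the query embedding and the memory embedding.
    dot = sum(q * m for q, m in zip(query_vec, memory_vec))
    nq = math.sqrt(sum(q * q for q in query_vec))
    nm = math.sqrt(sum(m * m for m in memory_vec))
    return dot / (nq * nm) if nq and nm else 0.0

def retrieval_score(memory, query_vec,
                    w_recency=1.0, w_importance=1.0, w_relevance=1.0) -> float:
    # `memory` is assumed to carry: hours_since_access, importance (1-10, as
    # rated by an LLM prompt and normalized here), and an embedding vector.
    return (w_recency * recency(memory["hours_since_access"])
            + w_importance * memory["importance"] / 10
            + w_relevance * relevance(query_vec, memory["embedding"]))

def retrieve(memories, query_vec, k=5):
    # Return the top-k memories to hand to the language model.
    return sorted(memories, key=lambda m: retrieval_score(m, query_vec),
                  reverse=True)[:k]
```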
New business wanted
Last week Krijn and I decided to cancel performance.now() 2021. Although it was the right decision it leaves me in financially fairly dire straits. So I’m looking for new jobs and/or donations.
Even though the Corona trends in NL look good, and we could probably have brought 350 people together in November, we cannot be certain: there might be a new flare-up. More serious is the fact that it’s very hard to figure out how to apply the Corona checks the Dutch government requires, especially for non-EU citizens. We couldn’t figure out how UK and US people should be tested, and for us that was the straw that broke the camel’s back. Cancelling the conference relieved us of a lot of stress.
Still, it also relieved me of a lot of money. This is the fourth conference in a row we cannot run, and I have burned through all my reserves. That’s why I thought I’d ask for help.
So ...
Has QuirksMode.org ever saved you a lot of time on a project? Did it advance your career? If so, now would be a great time to make a donation to show your appreciation.
I am trying my hand at CSS coaching. Though I’ve had only a few clients so far, I found that I like it and would like to do it more. As an added bonus, because I’m still writing my CSS for JavaScripters book I currently have most of the CSS layout modules in my head and can explain them straight away — even stacking contexts.
Or if there’s any job you know of that requires a technical documentation writer with a solid knowledge of web technologies and the browser market, drop me a line. I’m interested.
Anyway, thanks for listening.
position: sticky, draft 1
I’m writing the position: sticky part of my book, and since I never worked with sticky before I’m not totally sure if what I’m saying is correct.
This is made worse by the fact that there are no very clear tutorials on sticky. That’s partly because it works pretty intuitively in most cases, and partly because the details can be complicated.
So here’s my draft 1 of position: sticky. There will be something wrong with it; please correct me where needed.
The inset properties are top, right, bottom and left. (I already introduced this terminology earlier in the chapter.)
Introduction
position: sticky is a mix of relative and fixed. A sticky box takes its normal position in the flow, as if it had position: relative, but if that position scrolls out of view the sticky box remains in a position defined by its inset properties, as if it had position: fixed. A sticky box never escapes its container, though. If the container's start or end scrolls past, the sticky box abandons its fixed position and sticks to the top or the bottom of its container.
It is typically used to make sure that headers remain in view no matter how the user scrolls. It is also useful for tables on narrow screens: you can keep headers or the leftmost table cells in view while the user scrolls.
Scroll box and container
A sticky box needs a scroll box: a box that is able to scroll. By default this is the browser window — or, more correctly, the layout viewport — but you can define another scroll box by setting overflow on the desired element. The sticky box takes the first ancestor that could scroll as its scroll box and calculates all its coordinates relative to it.
A sticky box needs at least one inset property. These properties contain vital instructions, and if the sticky box doesn’t receive them it doesn’t know what to do.
A sticky box may also have a container: a regular HTML element that contains the sticky box. The sticky box will never be positioned outside this container, which thus serves as a constraint.
The first example shows this set-up. The sticky <h2> is in a perfectly normal <div>, its container, and that container is in a <section> that is the scroll box because it has overflow: auto. The sticky box has an inset property to provide instructions. The relevant styles are:
```css
section.scroll-container {
    border: 1px solid black;
    width: 300px;
    height: 300px;
    overflow: auto;
    padding: 1em;
}

div.container {
    border: 1px solid black;
    padding: 1em;
}

section.scroll-container h2 {
    position: sticky;
    top: 0;
}
```

The rules
(Live example: a sticky header inside a container, followed by content outside the container.)
Now let’s see exactly what’s going on.
A sticky box never escapes its containing box. If it cannot obey the rules that follow without escaping from its container, it instead remains at the edge. Scroll down until the container disappears to see this in action.
A sticky box starts in its natural position in the flow, as if it has position: relative. It thus participates in the default flow: if it becomes higher it pushes the paragraphs below it downwards, just like any other regular HTML element. Also, the space it takes in the normal flow is kept open, even if it is currently in fixed position. Scroll down a little bit to see this in action: an empty space is kept open for the header.
A sticky box compares two positions: its natural position in the flow and its fixed position according to its inset properties. It does so in the coordinate frame of its scroll box. That is, any given coordinate such as top: 20px, as well as its default coordinates, is resolved against the content box of the scroll box. (In other words, the scroll box’s padding also constrains the sticky box; it will never move up into that padding.)
A sticky box with top takes the higher value of its top and its natural position in the flow, and positions its top border at that value. Scroll down slowly to see this in action: the sticky box starts at its natural position (let’s call it 20px), which is higher than its defined top (0). Thus it rests at its position in the natural flow. Scrolling up a few pixels doesn’t change this, but once its natural position becomes less than 0, the sticky box switches to a fixed layout and stays at that position.
(Live example: the sticky box has bottom: 0.)
It does the same for bottom, but remember that a bottom is calculated relative to the scroll box’s bottom, and not its top. Thus, a larger bottom coordinate means the box is positioned more to the top. Now the sticky box compares its default bottom with the defined bottom and uses the higher value to position its bottom border, just as before.
With left, it uses the higher value of its natural position and its defined left to position its left border; with right, it does the same for its right border, bearing in mind once more that a higher right value positions the box more to the left.
If any of these steps would position the sticky box outside its containing box it takes the position that just barely keeps it within its containing box.
Details
(Live example: a sticky header above a very, very long line of content that stretches up the container quite a bit.)
The four inset properties act independently of one another. For instance, the following box will calculate the position of its top and left edge independently. They can be relative or fixed, depending on how the user scrolls.
```css
p.testbox {
    position: sticky;
    top: 0;
    left: 0;
}
```

(Live example: a sticky box with top: 0 and left: 0.)
(Live example: the sticky box has top: 0; bottom: 0.)
Setting both a top and a bottom, or both a left and a right, gives the sticky box a bandwidth to move in. It will always attempt to obey all the rules described above. So the following box will vary between 0 from the top of the screen to 0 from the bottom, taking its default position in the flow between these two positions.
```css
p.testbox {
    position: sticky;
    top: 0;
    bottom: 0;
}
```

No container
(Live example: a sticky header directly inside the scroll box, with no separate container.)
So far we put the sticky box in a container separate from the scroll box. But that’s not necessary. You can also make the scroll box itself the container if you wish. The sticky element is still positioned with respect to the scroll box (which is now also its container) and everything works fine.
Several containers
(Live example: a sticky header nested several containers deep inside the scroll box.)
Or the sticky item can be several containers removed from its scroll box. That’s fine as well; the positions are still calculated relative to the scroll box, and the sticky box will never leave its innermost container.
Changing the scroll box
(Live example: the container has overflow: auto.)
One feature that catches many people (including me) unaware is giving the container an overflow: auto or hidden. All of a sudden it seems the sticky header doesn’t work any more.
What’s going on here? An overflow value of auto, hidden, or scroll makes an element into a scroll box. So now the sticky box’s scroll box is no longer the outer element, but the inner one, since that is now the closest ancestor that is able to scroll.
The sticky box appears to be static, but it isn’t. The crux here is that the scroll box could scroll, thanks to its overflow value, but doesn’t actually do so because we didn’t give it a height, and therefore it stretches up to accommodate all of its contents.
Thus we have a non-scrolling scroll box, and that is the root cause of our problems.
As before, the sticky box calculates its position by comparing its natural position relative to its scroll box with the one given by its inset properties. Point is: the sticky box doesn’t scroll relative to its scroll box, so its position always remains the same. Where in earlier examples the position of the sticky element relative to the scroll box changed when we scrolled, it no longer does so, because the scroll box doesn’t scroll. Thus there is no reason for it to switch to fixed positioning, and it stays where it is relative to its scroll box.
The fact that the scroll box itself scrolls upward is irrelevant; this doesn’t influence the sticky box in the slightest.
(Live example: the inner scroll box has been given a height and now scrolls.)
One solution is to give the new scroll box a height that is too little for its contents. Now the scroll box generates a scrollbar and becomes a scrolling scroll box. When we scroll it, the position of the sticky box relative to its scroll box changes once more, and it switches from fixed to relative or vice versa as required.
Minor items
Finally, a few minor items:
- It is no longer necessary to use position: -webkit-sticky. All modern browsers support regular position: sticky. (But if you need to cater to a few older browsers, retaining the double syntax doesn’t hurt.)
- Chrome (Mac) does weird things to the borders of the sticky items in these examples. I don’t know what’s going on and am not going to investigate.
Breaking the web forward
Safari is holding back the web. It is the new IE, after all. In contrast, Chrome is pushing the web forward so hard that it’s starting to break. Meanwhile web developers do nothing except moan and complain. The only thing left to do is to pick our poison.
Safari is the new IE
Recently there was yet another round of “Safari is the new IE” stories. Once Jeremy’s summary and a short discussion cleared my mind I finally figured out that Safari is not IE, and that Safari’s IE-or-not-IE is not the worst problem the web is facing.
Perry Sun argues that for developers, Safari is crap and outdated, emulating the old IE of fifteen years ago in this respect. He also repeats the theory that Apple is deliberately starving Safari of features in order to protect the app store, and thus its bottom line. We’ll get back to that.
The allegation that Safari is holding back web development by its lack of support for key features is not new, but it’s not true, either. Back fifteen years ago IE held back the web because web developers had to cater to its outdated technology stack. “Best viewed with IE” and all that. But do you ever see a “Best viewed with Safari” notice? No, you don’t. Another browser takes that special place in web developers’ hearts and minds.
Chrome is the new IE, but in reverse
Jorge Arango fears we’re going back to the bad old days with “Best viewed in Chrome.” Chris Krycho reinforces this by pointing out that, even though Chrome is not the standard, it’s treated as such by many web developers.
“Best viewed in Chrome” squares very badly with “Safari is the new IE.” Safari’s sad state does not force web developers to restrict themselves to Safari-supported features, so it does not hold the same position as IE.
So I propose to lay this tired old meme to rest. Safari is not the new IE. If anything it’s the new Netscape 4.
Meanwhile it is Chrome that is the new IE, but in reverse.
Break the web forward
Back in the day, IE was accused of an embrace, extend, and extinguish strategy. After IE6 Microsoft did nothing for ages, assuming it had won the web. Thanks to web developers taking action in their own name for the first (and only) time, IE was updated once more and the web moved forward again.
Google learned from Microsoft’s mistakes and follows a novel embrace, extend, and extinguish strategy by breaking the web and stomping on the bits. Who cares if it breaks as long as we go forward. And to hell with backward compatibility.
Back in 2015 I proposed to stop pushing the web forward, and as expected the Chrome devrels were especially outraged at this idea. It never went anywhere. (Truth to tell: I hadn’t expected it to.)
I still think we should stop pushing the web forward for a while until we figure out where we want to push the web forward to — but as long as Google is in charge that won’t happen. It will only get worse.
On alert
A blog storm broke out over the decision to remove alert(), confirm() and prompt(), first only the cross-origin variants, but eventually all of them. Jeremy and Chris Coyier already summarised the situation, while Rich Harris discusses the uses of the three ancient modals, especially when it comes to learning JavaScript.
With all these articles already written I will only note that, if the three ancient modals are truly as horrendous a security issue as Google says they are it took everyone a bloody long time to figure that out. I mean, they turn 25 this year.
Although it appears Firefox and Safari are on board with at least the cross-origin part of the proposal, there is no doubt that it’s Google that leads the charge.
From Google’s perspective the ancient modals have one crucial flaw quite apart from their security model: they weren’t invented there. That’s why they have to be replaced by — I don’t know what, but it will likely be a very complicated API.
Complex systems and arrogant priests rule the web
Thus the new embrace, extend, and extinguish is breaking backward compatibility in order to make the web more complicated. Nolan Lawson puts it like this:
we end up with convoluted specs like Service Worker that you need a PhD to understand, and yet we still don't have a working <dialog> element.
In addition, Google can be pretty arrogant and condescending, as Chris Ferdinandi points out.
The condescending “did you actually read it, it’s so clear” refrain is patronizing AF. It’s the equivalent of “just” or “simply” in developer documentation.
I read it. I didn’t understand it. That’s why I asked someone whose literal job is communicating with developers about changes Chrome makes to the platform.
This is not isolated to one developer at Chrome. The entire message thread where this change was surfaced is filled with folks begging Chrome not to move forward with this proposal because it will break all-the-things.
If you write documentation or a technical article and nobody understands it, you’ve done a crappy job. I should know; I’ve been writing this stuff for twenty years.
Extend, embrace, extinguish. And use lots of difficult words.
Patience is a virtue
As a reaction to web dev outcry Google temporarily halted the breaking of the web. That sounds great but really isn’t. It’s just a clever tactical move.
I saw this tactic in action before. Back in early 2016 Google tried to break the de-facto standard for the mobile visual viewport that I worked very hard to establish. I wrote a piece that resonated with web developers, whose complaints made Google abandon the plan — temporarily. They tried again in late 2017, and I again wrote an article, but this time around nobody cared and the changes took effect and backward compatibility was broken.
So the three ancient modals still have about 12 to 18 months to live. Somewhere in late 2022 to early 2023 Google will try again, web developers will be silent, and the modals will be gone.
The pursuit of appiness
But why is Google breaking the web forward at such a pace? And why is Apple holding it back?
Safari is kept dumb to protect the app store and thus revenue. In contrast, the Chrome team is pushing very hard to port every single app functionality to the browser. Ages ago I argued we should give up on this, but of course no one listened.
When performing Valley Kremlinology, it is useful to see Google policies as stemming from a conflict between internal pro-web and anti-web factions. We web developers mainly deal with the pro-web faction, the Chrome devrel and browser teams. On the other hand, the Android team is squarely in the anti-web camp.
When seen in this light the pro-web camp’s insistence on copying everything appy makes excellent sense: if they didn’t Chrome would lag behind apps and the Android anti-web camp would gain too much power. While I prefer the pro-web over the anti-web camp, I would even more prefer the web not to be a pawn in an internal Google power struggle. But it has come to that, no doubt about it.
Solutions?
Is there any good solution? Not really.
Jim Nielsen feels that part of the issue is the lack of representation of web developers in the standardization process. That sounds great but is proven not to work.
Three years ago Fronteers and I attempted to get web developers represented and were met with absolute disinterest. Nobody else cared even one shit, and the initiative sank like a stone.
So a hypothetical web dev representative in W3C is not going to work. Also, the organisational work would involve a lot of unpaid labour, and I, for one, am not willing to do it again. Neither is anyone else. So this is not the solution.
And what about Firefox? Well, what about it? Ten years ago it made a disastrous mistake by ignoring the mobile web for way too long, then it attempted an arrogant and uninformed come-back with Firefox OS that failed, and its history from that point on is one long slide into obscurity. That’s what you get with shitty management.
Pick your poison
So Safari is trying to slow the web down. With Google’s move-fast-break-absofuckinglutely-everything axiom in mind, is Safari’s approach so bad?
Regardless of where you feel the web should be on this spectrum between Google and Apple, there is a fundamental difference between the two.
We have the tools and procedures to manage Safari’s disinterest. They’re essentially the same as the ones we deployed against Microsoft back in the day — though a fundamental difference is that Microsoft was willing to talk while Apple remains its old haughty self, and its “devrels” aren’t actually allowed to do devrelly things such as managing relations with web developers. (Don’t blame them, by the way. If something would ever change they’re going to be our most valuable internal allies — just as the IE team was back in the day.)
On the other hand, we have no process for countering Google’s reverse embrace, extend, and extinguish strategy, since a section of web devs will be enthusiastic about whatever the newest API is. Also, Google devrels talk. And talk. And talk. And provide gigs of data that are hard to make sense of. And refer to their proprietary algorithms that “clearly” show X is in the best interest of the web — and don’t ask questions! And make everything so fucking complicated that we eventually give up and give in.
So pick your poison. Shall we push the web forward until it’s broken, or shall we break it by inaction? What will it be? Privately, my money is on Google. So we should say goodbye to the old web while we still can.
Custom properties and @property
You’re reading a failed article. I hoped to write about @property and how it is useful for extending CSS inheritance considerably in many different circumstances. Alas, I failed. @property turns out to be very useful for font sizes, but does not even approach the general applicability I hoped for.
Grandparent-inheriting
It all started when I commented on what I thought was an interesting but theoretical idea by Lea Verou: what if elements could inherit the font size of not their parent, but their grandparent? Something like this:
div.grandparent {
    /* font-size could be anything */
}

div.parent {
    font-size: 0.4em;
}

div.child {
    font-size: [inherit from grandparent in some sort of way];
    font-size: [yes, you could do 2.5em to restore the grandparent's font size];
    font-size: [but that's not inheriting, it's just reversing a calculation];
    font-size: [and it will not work if the parent's font size is also unknown];
}

Lea told me this wasn’t a vague idea, but something that can be done right now. I was quite surprised — and I assume many of my readers are as well — and asked for more information. So she wrote Inherit ancestor font-size, for fun and profit, where she explained how the new Houdini @property can be used to do this.
This was seriously cool. Also, I picked up a few interesting bits about how CSS custom properties and Houdini @property work. I decided to explain these tricky bits in simple terms — mostly because I know that by writing an explanation I myself will understand them better — and to suggest other possibilities for using Lea’s idea.
Alas, that last objective is where I failed. Lea’s idea can only be used for font sizes. That’s an important use case, but I had hoped for more. The reasons why it doesn’t work elsewhere are instructive, though.
Tokens and values
Let’s consider CSS custom properties. What if we store the grandparent’s font size in a custom property and use that in the child?
div.grandparent {
    /* font-size could be anything */
    --myFontSize: 1em;
}

div.parent {
    font-size: 0.4em;
}

div.child {
    font-size: var(--myFontSize); /* hey, that's the grandparent's font size, isn't it? */
}

This does not work. The child will have the same font size as the parent, and ignore the grandparent. In order to understand why, we need to understand how custom properties work. What does this line of CSS do?
--myFontSize: 1em;

It sets a custom property that we can use later. Well duh.
Sure. But what value does this custom property have?
... errr ... 1em?
Nope. The answer is: none. That’s why the code example doesn’t work.
When they are defined, custom properties do not have a value or a type. All you have ordered the browsers to do is store a token in the variable --myFontSize.
This took me a while to wrap my head around, so let’s go a bit deeper. What is a token? Let’s briefly switch to JavaScript to explain.
let myVar = 10;

What’s the value of myVar in this line? I do not mean: what value is stored in the variable myVar, but: what value does the character sequence myVar have in that line of code? And what type?
Well, none. Duh. It’s not a variable or value, it’s just a token that the JavaScript engine interprets as “allow me to access and change a specific variable” whenever you type it.
CSS custom properties also hold such tokens. They do not have any intrinsic meaning. Instead, they acquire meaning when they are interpreted by the CSS engine in a certain context, just as the myVar token is in the JavaScript example.
So the CSS custom property contains the token 1em without any value, without any type, without any meaning — as yet.
You can use pretty much any bunch of characters in a custom property definition. Browsers make no assumptions about their validity or usefulness because they don’t yet know what you want to do with the token. So this, too, is a perfectly fine CSS custom property:
--myEgoTrip: ppk;

Browsers shrug, create the custom property, and store the indicated token. The fact that ppk is invalid in all CSS contexts is irrelevant: we haven’t tried to use it yet.
It’s when you actually use the custom property that values and types are assigned. So let’s use it:
background-color: var(--myEgoTrip);

Now the CSS parser takes the tokens we defined earlier and replaces the custom property with them:
background-color: ppk;

And only NOW are the tokens read and interpreted. In this case that results in an error: ppk is not a valid value for background-color. So the CSS declaration as a whole is invalid and nothing happens — well, technically it gets the unset value, but the net result is the same. The custom property itself is still perfectly valid, though.
The same happens in our original code example:
div.grandparent {
    /* font-size could be anything */
    --myFontSize: 1em; /* just a token; no value, no meaning */
}

div.parent {
    font-size: 0.4em;
}

div.child {
    font-size: var(--myFontSize);
    /* becomes */
    font-size: 1em; /* hey, this is valid CSS! */
    /* Right, you obviously want the font size to be the same as the parent's */
    /* Sure thing, here you go */
}

In div.child the tokens are read and interpreted by the CSS parser. This results in a declaration font-size: 1em;. This is perfectly valid CSS, and the browsers duly note that the font size of this element should be 1em.
font-size: 1em is relative. To what? Well, to the parent’s font size, of course. Duh. That’s how CSS font-size works.
So now the font size of the child becomes the same as its parent’s, and browsers will proudly display the child element’s text in the same font size as the parent element’s while ignoring the grandparent.
This is not what we wanted to achieve, though. We want the grandparent’s font size. Custom properties — by themselves — don’t do what we want. We have to find another solution.
@property
Lea’s article explains that other solution. We have to use the Houdini @property rule.
@property --myFontSize {
    syntax: "<length>";
    initial-value: 0;
    inherits: true;
}

div {
    border: 1px solid;
    padding: 1em;
}

div.grandparent {
    /* font-size could be anything */
    --myFontSize: 1em;
}

div.parent {
    font-size: 0.4em;
}

div.child {
    font-size: var(--myFontSize);
}

Now it works. Wut? Yep — though only in Chrome so far.
(The original post embeds a live example here: a grandparent with font-size: 23px, a parent at font-size: 0.4em, and a child that, in supporting browsers, is rendered at the grandparent’s 23px.)

What black magic is this?
Adding the @property rule changes the custom property --myFontSize from a bunch of tokens without meaning to an actual value. Moreover, this value is calculated in the context it is defined in — the grandfather — so that the 1em value now means 100% of the font size of the grandfather. When we use it in the child it still has this value, and therefore the child gets the same font size as the grandfather, which is exactly what we want to achieve.
(The variable uses a value from the context it’s defined in, and not the context it’s executed in. If, like me, you have a grounding in basic JavaScript you may hear “closures!” in the back of your mind. While they are not the same, and you shouldn’t take this apparent equivalency too far, this notion still helped me understand. Maybe it’ll help you as well.)
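For readers without that JavaScript grounding, here is a minimal closure sketch illustrating the analogy. It has nothing to do with @property itself; it only shows the idea that a value comes from the scope where it is defined, not the scope where it is used:

function grandparent() {
    const fontSize = '23px';     // defined in this scope...
    return function child() {
        return fontSize;         // ...and read from this scope, no matter where child() is called
    };
}

const getGrandparentSize = grandparent();
console.log(getGrandparentSize()); // "23px"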
Unfortunately I do not quite understand what I’m doing here, though I can assure you the @property snippet works in Chrome — and will likely work in the other browsers once they support @property.
Mission completed — just don’t ask me how.
Syntax
You have to get the definition right. You need all three lines in the @property rule. See also the specification and the MDN page.
@property --myFontSize {
    syntax: "<length>";
    initial-value: 0;
    inherits: true;
}

The syntax property tells browsers what kind of property it is and makes parsing it easier. Here is the list of possible values for syntax, and in 99% of the cases one of these values is what you need.
You could also create your own syntax, e.g. syntax: "ppk | <length>"
Now the ppk keyword and any sort of length are allowed as values.
Note that percentages are not lengths — one of the many things I found out during the writing of this article. Still, they are so common that a special value for “length that may be a percentage or may be calculated using percentages” was created:
syntax: "<length-percentage>"Finally, one special case you need to know about is this one:
syntax: "*"MDN calls this a universal selector, but it isn’t, really. Instead, it means “I don’t know what syntax we’re going to use” and it tells browsers not to attempt to interpret the custom property. In our case that would be counterproductive: we definitely want the 1em to be interpreted. So our example doesn’t work with syntax: "*".
initial-value and inherits
An initial-value property is required for any syntax value that is not a *. Here that’s simple: just give it an initial value of 0 — or 16px, or any absolute value. The value doesn’t really matter since we’re going to overrule it anyway. Still, a relative value such as 1em is not allowed: browsers don’t know what the 1em would be relative to and reject it as an initial value.
Finally, inherits: true specifies that the custom property value can be inherited. We definitely want the computed 1em value to be inherited by the child — that’s the entire point of this experiment. So we carefully set this flag to true.
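For completeness, the same registration can be done from JavaScript. This is a minimal sketch assuming the Houdini CSS.registerProperty() call, the JavaScript counterpart of the @property rule; the values simply mirror the CSS example above:

if (window.CSS && 'registerProperty' in CSS) {
    CSS.registerProperty({
        name: '--myFontSize',
        syntax: '<length>',
        inherits: true,        // we want the computed value to reach the child
        initialValue: '0px',   // must be absolute; a relative value such as '1em' is rejected
    });
}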
Other use cases
So far this article merely rehashed parts of Lea’s. Since I’m not in the habit of rehashing other people’s articles my original plan was to add at least one other use case. Alas, I failed, though Lea was kind enough to explain why each of my ideas fails.
Percentage of what?
Could we grandfather-inherit percentage-based margins and paddings? They are relative to the width of the parent of the element you define them on, and I was wondering if it might be useful to send the grandparent’s margin on to the child just like the font size. Something like this:
@property --myMargin {
    syntax: "<length-percentage>";
    initial-value: 0;
    inherits: true;
}

div.grandparent {
    --myMargin: 25%;
    margin-left: var(--myMargin);
}

div.parent {
    font-size: 0.4em;
}

div.child {
    margin-left: var(--myMargin);
    /* should now be 25% of the width of the grandfather's parent */
    /* but isn't */
}

Alas, this does not work. Browsers cannot resolve the 25% in the context of the grandparent, as they did with the 1em, because they don’t know what to do.
The most important trick for using percentages in CSS is to always ask yourself: “percentage of WHAT?”
That’s exactly what browsers do when they encounter this @property definition. 25% of what? The parent’s font size? Or the parent’s width? (This is the correct answer, but browsers have no way of knowing that.) Or maybe the width of the element itself, for use in background-position?
Since browsers cannot figure out what the percentage is relative to they do nothing: the custom property gets the initial value of 0 and the grandfather-inheritance fails.
Colours
Another idea I had was using this trick for the grandfather’s text colour. What if we store currentColor, which always has the value of the element’s text colour, and send it on to the grandchild? Something like this:
@property --myColor {
    syntax: "<color>";
    initial-value: black;
    inherits: true;
}

div.grandparent {
    /* color unknown */
    --myColor: currentColor;
}

div.parent {
    color: red;
}

div.child {
    color: var(--myColor);
    /* should now have the same color as the grandfather */
    /* but doesn't */
}

Alas, this does not work either. When the @property blocks are evaluated, and 1em is calculated, currentColor specifically is not touched because it is used as an initial (default) value for some inherited SVG and CSS properties such as fill. Unfortunately I do not fully understand what’s going on, but Tab says this behaviour is necessary, so it is.
Pity, but such is life. Especially when you’re working with new CSS functionalities.
Conclusion
So I tried to find more possibilities for using Lea’s trick, but failed. Relative units are fairly sparse, especially when you leave percentages out of the equation. em and related units such as rem are the only ones, as far as I can see.
So we’re left with a very useful trick for font sizes. You should use it when you need it (bearing in mind that right now it’s only supported in Chromium-based browsers), but extending it to other declarations is not possible at the moment.
Many thanks to Lea Verou and Tab Atkins for reviewing and correcting an earlier draft of this article.
Let’s talk about money
Let’s talk about money!
Let’s talk about how hard it is to pay small amounts online to people whose work you like and who could really use a bit of income. Let’s talk about how Coil aims to change that.
Taking a subscription to a website is moderately easy, but the person you want to pay must have enabled subscriptions. Besides, do you want to purchase a full subscription in order to read one or two articles per month?
Sending a one-time donation is pretty easy as well, but, again, the site owner must have enabled donations. And even then it just gives them ad-hoc amounts that they cannot depend on.
Then there’s Patreon and Kickstarter and similar systems, but Patreon is essentially a subscription service while Kickstarter is essentially a one-time donation service, except that both keep part of the money you donate.
And then there’s ads ... Do we want small content creators to remain dependent on ads and thus support the entire ad ecosystem? I, personally, would like to get rid of them.
The problem today is that all non-ad-based systems require you to make conscious decisions to support someone — and even if you’re serious about supporting them you may forget to send in a monthly donation or to renew your subscription. It sort-of works, but the user experience can be improved rather dramatically.
That’s where Coil and the Web Monetization Standard come in.
Web Monetization
The idea behind Coil is that you pay for what you consume easily and automatically. It’s not a subscription - you only pay for what you consume. It’s not a one-time donation, either - you always pay when you consume.
Payments occur automatically when you visit a website that is also subscribed to Coil, and the amount you pay to a single site owner depends on the time you spend on the site. Coil does not retain any of your money, either — everything goes to the people you support.
In this series of four articles we’ll take a closer look at the architecture of the current Coil implementation, how to work with it right now, the proposed standard, and what’s going to happen in the future.
Overview
So how does Coil work right now?
Both the payer and the payee need a Coil account to send and receive money. The payee has to add a <meta> tag with a Coil payment pointer to all pages they want to monetize. The payer has to install the Coil extension in their browsers. You can see this extension as a polyfill. In the future web monetization will, I hope, be supported natively in all browsers.
Once that’s done the process works pretty much automatically. The extension searches for the <meta> tag on any site the user visits. If it finds one it starts a payment stream from payer to payee that continues for as long as the payer stays on the site.
The payee can use the JavaScript API to interact with the monetization stream. For instance, they can show extra content to paying users, or keep track of how much a user paid so far. Unfortunately these functionalities require JavaScript, and the hiding of content is fairly easy to work around. Thus it is not yet suited for serious business purposes, especially in web development circles.
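To give an impression of what that looks like in practice, here is a minimal sketch based on the Web Monetization draft; the payment pointer, the element id, and the exact event details are illustrative, so check them against the current spec before relying on them.

// The payee puts something like this on every monetized page
// (the payment pointer is a made-up example):
//   <meta name="monetization" content="$wallet.example.com/ppk">

// The payer's Coil extension exposes document.monetization, an event target
// we can listen to. Below we reveal bonus content once payment starts and
// keep a rough running total of what this visitor has streamed so far.
if (document.monetization) {
    let total = 0;

    document.monetization.addEventListener('monetizationstart', () => {
        // hypothetical element id, for illustration only
        document.getElementById('bonus-content').hidden = false;
    });

    document.monetization.addEventListener('monetizationprogress', (event) => {
        const { amount, assetCode, assetScale } = event.detail;
        total += Number(amount) * Math.pow(10, -assetScale);
        console.log(`Received roughly ${total} ${assetCode} so far`);
    });
}

Since all of this runs client-side, hiding content behind the monetizationstart event is exactly the kind of thing that is easy to work around.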
This is one example of how the current system is still a bit rough around the edges. You’ll find more examples in the subsequent articles. Until browsers support the standard natively and you can determine your visitors’ monetization status server-side, these rough bits will continue to exist. For the moment we will have to work with the system we have.
This article series will discuss all topics we touched on in more detail.
Start now!
For too long we have accepted free content as our birthright, without considering the needs of the people who create it. This becomes even more curious for articles and documentation that are absolutely vital to our work as web developers.
Take a look at this list of currently-monetized web developer sites. Chances are you’ll find a few people whose work you used in the past. Don’t they deserve your direct support?
Free content is not a right, it’s an entitlement. The sooner we internalize this, and start paying independent voices, the better for the web.
The only alternative is that all articles and documentation we depend on will be written by employees of large companies. And employees, no matter how well-meaning, will reflect the priorities and point of view of their employer in the long run.
So start now.
In order to support them you should invest a bit of time once, and US$5 per month from then on. I mean, that’s not too much to ask, is it?
Continue
I wrote this article and its sequels for Coil, and yes, I’m getting paid. Still, I believe in what they are doing, so I won’t just spread marketing drivel. Initially it was unclear to me exactly how Coil works. So I did some digging, and the remaining parts of this series give a detailed description of how Coil actually works in practice.
For now the other three articles will only be available on dev.to. I just published part 2, which gives a high-level overview of how Coil works right now. Part 3 will describe the meta tag and the JavaScript API, and in part 4 we’ll take a look at the future, which includes a formal W3C standard. Those parts will be published next week and the week after that.
