htmx: Simplicity in an Age of Complicated Solutions

Almost four decades ago, Fred Brooks, software engineer and author of The Mythical Man-Month, wrote:

There is inherently no silver bullet.

He was writing about complexity in the building of software, and how no single development will, by itself, make the process go significantly faster. He also classified two kinds of complexity encountered when building software: essential complexity, which is inherent to the problem the software is trying to solve; and accidental complexity, which is complexity introduced by the choices we make in the development process. An example of accidental complexity is code duplication: you don’t want to write the same piece of code twice, because you don’t want to have to make changes in two (or more) places when that piece of code needs changing.

The term ‘silver bullet’ is also used derogatorily to refer to the fact that there is no single technology or piece of software that is universally applicable, or even applicable in more than a handful of situations. The ‘silver bullet’ technology is a dream that lingers stubbornly in the minds of developers; who wouldn’t want to stumble upon that one technology that magically solves all of their problems? However, as Fred Brooks eloquently stated, there is no such thing. The industry seems to struggle to learn this.

Candidates for the ‘silver bullet’ often focus on accidental complexity, trying to make developers’ lives easier, but as another evergreen phrase says, there’s no such thing as a free lunch: everything comes with a cost or a trade-off. In my opinion, the cost being paid nowadays is often that of simplicity; while a particular solution makes it easier to reuse code, it might also make the code base more complicated.

Front-end complexity

Front-end development is rife with examples of this. The engineering culture among many front-end developers seems to be one where simplicity is a dirty word. There are towering levels of abstraction, comically complicated and long build processes, bad performance, horrible tooling, and buggy applications, and everybody seems fine with this.

Animated version of the ‘This is fine’ meme

Let’s look at an example. We want to add search functionality to our site. Using nothing more than HTML5, we can create a form with a text field.

<form method="GET" action="/search">
  <input type="search" name="query" placeholder="Search" />
  <input type="submit" value="Search" />
</form>

Search form in HTML5

This is very basic, but it gets the job done. When you type something in the field and press Enter or click the Search button, your query is sent to the server and the response, hopefully containing some search results, replaces the entire page. The fact that it replaces the entire page can be jarring. When you have a slow or bad connection (which everyone who’s ever used hotel or train WiFi can confirm is still a thing) the page temporarily stops being interactive while blocking resources are being loaded. When it finishes loading, you’ll have lost all state on the page, such as scroll position, data entered in form fields, collapsed or expanded elements, and so on.

Changing only a part of the page to display the search results would result in a much better user experience. Unfortunately, this is not part of HTML. JavaScript can help us out here, surely.

Vanilla JavaScript

Let’s just add a <script> tag with some ‘vanilla’ JavaScript code to make the process asynchronous. Naturally, we’re going to request some JSON from a ‘REST endpoint’. Not GraphQL. That would be over-engineering.

<input type="search" id="searchInput" placeholder="Search...">
<button id="searchButton">Search</button>
<ul id="resultsList"></ul>

<script>
  const resultsList = document.getElementById('resultsList');
  const searchInput = document.getElementById('searchInput');
  const searchButton = document.getElementById('searchButton');

  searchButton.addEventListener('click', handleSearch);
  searchInput.addEventListener('keypress', function(event) {
    if (event.key === 'Enter') {
      handleSearch();
    }
  });

  async function handleSearch() {
    const searchTerm = searchInput.value.trim();
    const encodedSearchTerm = encodeURIComponent(searchTerm);
    const apiUrl = `/search?query=${encodedSearchTerm}`;
    resultsList.innerHTML = ''; // Clear previous results

    const response = await fetch(apiUrl);

    if (!response.ok) {
      throw new Error('Network response was not ok');
    }

    const data = await response.json();

    if (data.length === 0) {
      resultsList.innerHTML = 'No results found.';
    } else {
      data.forEach(result => {
        const listItem = document.createElement('li');
        listItem.textContent = result.title;
        resultsList.appendChild(listItem);
      });
    }
  }
</script>
Yikes. There is so much more here than in the HTML version. A lot of the code is focused on translation: translation between the raw search query and a URL, and translation between the (assumed) JSON response and DOM elements. Look at how much we need to do: grabbing and creating DOM elements, adding event handlers, handling keystrokes…

This is a very limited example with many problems:

  • The representation of search results is so basic it’s not even a link;
  • It works pretty well when you have awesome network conditions (say, between your browser and your local host), but when you’re using hotel WiFi, it would be nice if there was some sort of loading indicator;
  • If you initiate a new request while another one is still in flight, it would be nice if the older one is canceled or at least discarded.

Note that there is an implied contract between this code running in the browser and the code on the server: the shape of the response. The code assumes the response is JSON containing an array of objects, where each object has a property called title. If that is somehow not the case, it will appear to the user as if nothing happened (because there is no error handling).
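To make that implied contract concrete: the code above silently expects a response shaped something like the following (a hypothetical example; the exact field names are assumptions on the client’s part, which is precisely the problem).

```json
[
  { "title": "No Silver Bullet: Essence and Accidents of Software Engineering" },
  { "title": "The Mythical Man-Month" }
]
```

Nothing in the response itself tells the client this is the shape; the knowledge lives only in the JavaScript.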

Dealing with the loading indicator and aborting in-flight requests takes a bit of extra code, and keeping the code that renders the results readable is a stretch. Let’s just pull out all the stops and use React!


import React, { useState, useEffect } from 'react';

function Search() {
  const [searchTerm, setSearchTerm] = useState('');
  const [results, setResults] = useState([]);
  const [loading, setLoading] = useState(false);
  const [abortController, setAbortController] = useState(null);

  const handleSearch = async () => {
    if (abortController) {
      // Abort the previous request if there is one
      abortController.abort();
    }

    if (searchTerm.trim() === '') {
      return;
    }

    const encodedSearchTerm = encodeURIComponent(searchTerm.trim());
    const apiUrl = `/search?q=${encodedSearchTerm}`;

    const controller = new AbortController();
    setAbortController(controller);
    setLoading(true);

    try {
      const response = await fetch(apiUrl, { signal: controller.signal });

      if (controller.signal.aborted) {
        // Request was aborted, so no further action is needed
        return;
      }

      if (!response.ok) {
        throw new Error('Network response was not ok');
      }

      const data = await response.json();
      setResults(data);
    } catch (error) {
      if (error.name !== 'AbortError') {
        // Display an error message only if it's not an abort
        console.error('Fetch error:', error);
      }
    } finally {
      setLoading(false);
    }
  };

  const handleKeyPress = (event) => {
    if (event.key === 'Enter') {
      handleSearch();
    }
  };

  useEffect(() => {
    return () => {
      // Cleanup function to abort the request if the component unmounts
      if (abortController) {
        abortController.abort();
      }
    };
  }, [abortController]);

  return (
    <div>
      <input
        type="search"
        placeholder="Search..."
        value={searchTerm}
        onChange={(e) => setSearchTerm(e.target.value)}
        onKeyPress={handleKeyPress}
      />
      <button onClick={handleSearch}>Search</button>
      {loading && <div>Loading...</div>}
      <ul>
        {results.map((result, index) => (
          <li key={`result-${index}`}><a href={result.href}>{result.title}</a></li>
        ))}
      </ul>
    </div>
  );
}

export default Search;

Look at how massive this is becoming. To be fair, it isn’t even that much bigger than the pure JavaScript version. There’s a bunch of state declaration and handling, some error handling, and there are a few lines of additional logic around aborting requests. I’d like to point out two things, though.

First, we now have a lot more state to manage, including some state that needs to be cleaned up when the component is removed from the DOM (or ‘unmounted’, in React-speak); that’s the useEffect call near the end.

Second, unlike the two earlier solutions, you can’t just plonk this in a file, access it using a <script> tag, and expect it to work. Essentially, you need to transpile this kind of JavaScript file into an actual JavaScript file. To be able to do that you need to set up a React application. This means running an NPM script that sets up a complicated directory structure, which is configured so you can run a command-line application to start a web server, which can then serve your React application. Even after you do a so-called production build, you’re left with an HTML file with placeholders and script tags and a whole bunch of JavaScript files called chunks, which contain tiny parts of your application and the required third-party libraries it needs. All the parts only make sense at run-time, where they hopefully all combine and work.

And for all of its complexity, this version of the search box still has limitations. It doesn’t support incremental search, which is where searching is continually performed while you’re typing, without losing focus or having to click a button. In web applications this is generally not done on every keypress, but only when the user stops typing for a fraction of a second. Another problem is that when you have searched, the URL of the page is not updated. This means it’s not possible to share your search query, bookmark it, or open it in a new browser tab. This flaw is shared by the pure JavaScript version, but not by the simple HTML version because that simply navigates to a different URL.
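That ‘stops typing for a fraction of a second’ behavior is usually implemented with a debounce helper. As a sketch of what either JavaScript version above would need to grow (the function below is illustrative, not part of either version):

```javascript
// Illustrative debounce helper: wraps a function so that it only runs
// once no new calls have arrived for `delayMs` milliseconds.
function debounce(fn, delayMs) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer); // a new call resets the timer
    timer = setTimeout(() => fn.apply(this, args), delayMs);
  };
}

// Hypothetical usage:
// searchInput.addEventListener('input', debounce(handleSearch, 300));
```

Every keystroke resets the timer, so the wrapped function only fires after the user pauses. It’s not much code, but it’s yet another thing to write, wire up, and get subtly wrong.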

While it is certainly possible to overcome all of these issues in either the pure JavaScript or React versions of the code, we all understand the code would become even more complicated. Instead, let’s look at an approach that focuses on the HTML side of things, without piling on more and more JavaScript.

Enter htmx

Htmx is a tiny JavaScript library that lets you add Fetch functionality to HTML elements using nothing more than HTML attributes. You can let a standard <button> element make an HTTP request when it’s clicked, and handle the response, with just a few attributes.

<button hx-post="/transfer/all?to=randomCharity" hx-target="#thanks">
  Empty my bank account
</button>
<div id="thanks"></div>

The hx-post attribute here defines that when that button is clicked, a POST request is made to that URL. The hx-target attribute defines where to display the response as a CSS selector; in this case in an element with an id attribute equal to thanks. So how does htmx know what HTML elements to create from the response? A good question with a simple answer: the response is HTML. That way, your browser does not need to run code to transform the data it receives from a back-end system into DOM elements—it just receives HTML, which it already knows how to deal with.
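The server’s response to that POST could be as small as a fragment like this (a hypothetical response body):

```html
<p>Thank you for your generous donation!</p>
```

htmx simply swaps this fragment into the <div id="thanks"> element; no client-side templating or data-to-DOM translation is needed.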

Hypermedia

Unlike most SPA frameworks, htmx is not trying to run an entire application in the browser, but is instead trying to keep the browser ‘dumb’. It does this by utilizing a decades-old concept called hypermedia. What’s hypermedia again? According to the book Hypermedia Systems, by the authors of htmx:

Hypermedia is a media, for example, a text, that includes non-linear branching from one location in the media to another, via, for example, hyperlinks embedded in the media.

The crucial bit here is embedded in the media: a hypertext (a text that is hypermedia, such as HTML) contains all the information about possible interactions encoded within the text. Think of a search result page in Google a couple of years ago; you knew there were more pages available because there was a ‘next page’ link. There’s no need for you or your browser to understand what the URL of the next page is: that is embedded in the HTML your browser received from Google. If you change your URL structure, you only need to change it in one application. If you add a new feature, say navigating not to the next page but the one five pages over, it is as simple as adding another hyperlink.
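In HTML, such a control is nothing more exotic than a link (the URL here is illustrative):

```html
<a href="/search?q=complexity&p=2">Next page</a>
```

The client doesn’t construct this URL from out-of-band knowledge; it just follows it.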

REST, done poorly

For those of you who have been developing Web APIs for a while, this might ring a few bells. A concept that is still often heard when talking about Web APIs is HATEOAS: Hypermedia As The Engine Of Application State. This is a constraint for RESTful APIs that is derived from a few disparate concepts in Roy Fielding’s dissertation on Representational State Transfer. These concepts are stateless communication and self-descriptive messages, and the idea that clients should be able to navigate and interact with applications using only the information provided by the server, without requiring hard-coded knowledge or out-of-band information. This is precisely the thing that hypermedia solves.

REST, or Representational State Transfer, was originally conceived as an architectural style for distributed applications. Nowadays ‘REST’ usually refers to a JSON API that has an opinion on how URLs are constructed and that uses more verbs than GET and POST. The thing is: JSON, in and of itself, does not facilitate hypermedia controls (elements that allow for this non-linear branching), so most JSON APIs are not, in fact, RESTful.

Efforts have been made to make JSON APIs more RESTful by adding links to documents. One of the most familiar ones is HAL, or Hypertext Application Language. It adds well-known properties, such as _links and _embedded, to JSON documents, and even to elements within a document. Imagine the response to our search operation looking like this:

{
  "_links": {
    "self": "/search?q=complexity&p=4",
    "first": "/search?q=complexity",
    "prev": "/search?q=complexity&p=3",
    "next": "/search?q=complexity&p=5",
    "last": "/search?q=complexity&p=42"
  },
  "_embedded": {
    "results": [
      {
        "_links": {
          "self": "/articles/no-silver-bullet"
        },
        "title": "No Silver Bullet: Essence and Accidents of Software Engineering"
      }
      // ...
    ]
  }
}
This describes how to get to the first page, the next page, and how to navigate to each of the search results. It does not, for example, describe which page you’re on, how many pages there are in total, or how to get to the seventeenth page. Yes, you could add URLs for all of the pages, but now you need hard-coded knowledge in the client to deal with these links. At least first, prev, next, and last are kind of standardized names of links, but to provide links to arbitrary pages you’d need to create a custom scheme, something like page:17. Which needs more hard-coded knowledge for it to be usable.

Another solution would be to add a link ‘template’ that describes how the URL for a particular page is constructed. Something like this:

{
  "_links": {
    "template:page": "/search?q=complexity&p=%d"
  }
}

Now you need hard-coded knowledge in the client telling it to look for this link and to replace the placeholder %d with the page number it wants.
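That hard-coded knowledge would look something like this hypothetical client-side helper:

```javascript
// Illustrative client-side helper. The knowledge that a link named
// "template:page" exists, and that "%d" must be replaced with a page
// number, is hard-coded in the client instead of carried by the hypermedia.
function pageUrl(doc, page) {
  const template = doc._links['template:page'];
  return template.replace('%d', String(page));
}
```

This works, right up until the server renames the link or changes the placeholder syntax, at which point every client breaks.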

And the limitations go on: imagine how you would describe something as trivial as a search form in this fashion. You’d need all kinds of placeholders and magic values that need to somehow mean something to the client application. Truly RESTful JSON APIs are sometimes possible, but they’re very limited and brittle.

REST, done proper

HTML, on the other hand, is true hypermedia, so it’s almost trivial to create a RESTful application using only HTML. HTML has its limitations, as we discussed, and we can overcome most of those using a library like htmx. It’s easy to imagine what our search results would look like in HTML.

<ul>
  <li><a href="/articles/no-silver-bullet">No Silver Bullet: Essence and Accidents of Software Engineering</a></li>
  <!-- ... -->
</ul>
<nav>
  <ul>
    <li><a href="/search?q=complexity">&laquo; First</a></li>
    <li><a href="/search?q=complexity&amp;p=3">3</a></li>
    <li><a href="/search?q=complexity&amp;p=5">5</a></li>
    <li><a href="/search?q=complexity&amp;p=42">Last &raquo;</a></li>
  </ul>
</nav>

When this is rendered by a browser, even without any styling, it is fully functional. The hyperlinks just work, because they are self-descriptive. If we want to add a search box to be able to manually enter the page number, we can do just that, without even getting fancy. We just add the following fragment into the <nav> element:

<form method="get" action="/search">
  <input type="hidden" name="q" value="complexity" />
  <input type="number" name="p" min="1" max="42" />
</form>

How simple! And because of the power of HTML, this just works. You enter a number, press Enter, and it goes. Hidden fields are used to transfer all of the state of the application, and it’s up to the browser to render the correct user interface elements and make them behave appropriately. Authentication is done the same way it is for viewing any page because it is a page. Caching works out of the box, assuming you’ve set your response headers correctly.

The remaining problem is that this still refreshes the entire page, and that causes a loss of state. Let’s remedy that using htmx.

The power of attributes

As we touched on before, htmx enhances HTML by defining new attributes that give each element the power to trigger Fetch requests, which result in elements being swapped in. These attributes let you define, among others:

  • How and where to make the request; this is done by attributes like hx-get, hx-post, hx-put, hx-delete, and so on.
  • What to swap in the page; primarily, this is directed by hx-target. hx-swap lets you define how to swap the content; should the inner HTML of the target be replaced, or the outer HTML (i.e. replace the entire element), or should it be added inside or after the target?
  • When to make the request. The default trigger depends on the element. For example, text boxes and drop-down lists are triggered by their value changing, forms are triggered by the form being submitted, and everything else is triggered by a click. You can use hx-trigger to specify a different event to trigger on, or even multiple ones, and add modifiers, such as delays or only triggering when the value has changed.
  • Which part of the response to use. By default, everything is used, but if you only want to use a specific part of the response, you can use hx-select.
  • Whether to push a URL into the browser’s history, replacing the URL in the address bar. This way your application state is easily bookmarked or shared. hx-push-url will let you push either the request URL or a custom URL into the browser history.

There are a bunch more attributes (though not a ton), but these are the core ones that will solve most problems. So how can we apply this to, for example, a pagination link?

Let’s imagine we don’t change anything about the way our application renders pages. That means /search?q=complexity will return a full HTML page with the first page of results, and /search?q=complexity&p=2 will return a full HTML page with the second page of results. We’ll put the search results into a <section id="results"> element because that gives us an easy target. This is what an individual page link looks like:

  <a hx-get="/search?q=complexity&amp;p=3"
     hx-select="#results"
     hx-target="#results"
     hx-swap="outerHTML"
     hx-push-url="true">3</a>

These five attributes define the following behavior:

  • hx-get: When this link is clicked (or otherwise activated), make a GET request to /search?q=complexity&p=3.
  • hx-select: From the response to that, select a <section> element with the ID results.
  • hx-target: Swap that into the <section> element with the ID results currently on the page.
  • hx-swap: Replace the outer HTML of the target. In other words, replace the entire element.
  • hx-push-url: Push the URL /search?q=complexity&p=3 into the browser history and the address bar.

And that’s it! Now the pagination links replace only the part of the page they affect. We’ve had to write exactly zero JavaScript for it, and our application is still RESTful.

But wait, it gets even better. Htmx supports progressive enhancement. This means that you can start with basic, HTML-only functionality, and only if the browser allows JavaScript and the htmx library has been loaded, will its full functionality be enabled. This also makes it easier for search engines to understand your page structure. It looks like this:

  <a href="/search?q=complexity&amp;p=3"
     hx-boost="true"
     hx-select="#results"
     hx-target="#results"
     hx-swap="outerHTML">3</a>

Instead of hx-get, we have a normal href attribute, and because we’ve defined hx-boost=true on an anchor tag, htmx knows to make a GET request to the URL in the href attribute and to push the URL into the browser history.

Search in htmx

If you’re still here, kudos. Now let’s circle back to the search example we’ve been looking at before and see what that would look like using htmx.

<input type="search"
       name="q"
       placeholder="Search..."
       hx-get="/search"
       hx-target="#searchResults"
       hx-select="#searchResults"
       hx-swap="outerHTML"
       hx-trigger="search, click from:#searchButton"
       hx-indicator="#searchLoading"
       hx-sync="this:replace" />
<button id="searchButton">Search</button>
<div id="searchLoading" class="htmx-indicator">Loading...</div>
<section id="searchResults">
  <!-- put search results here -->
</section>

Hopefully, you understand what the hx-get, hx-target, and hx-select attributes are doing here.

hx-trigger defines when the requests are triggered. In this case, that is on the search event, or when a click event is fired by the element with ID searchButton. Next, hx-indicator names an element that should be shown whenever this element has a request in flight. Finally, hx-sync defines that if a new request is triggered while one is still in flight, the one in flight is replaced by the new one.

This is functionally identical to the React version. However, using htmx, we can also get rid of those limitations I mentioned earlier: incremental search and updating the URL in the address bar.

<input type="search"
       name="q"
       placeholder="Search..."
       hx-get="/search"
       hx-target="#searchResults"
       hx-select="#searchResults"
       hx-swap="outerHTML"
       hx-trigger="search, click from:#searchButton, keyup changed delay:300ms"
       hx-indicator="#searchLoading"
       hx-sync="this:replace"
       hx-push-url="true" />

The only new things are an additional trigger and the hx-push-url attribute. The new trigger, keyup changed delay:300ms, means the request should be triggered on ‘key up’ events, but only when the value of the field has changed, and only when no new key-up events have occurred for 300 milliseconds. This is incremental searching, but without sending a barrage of requests to the application, which is what would happen if you sent a request for every keystroke. Finally, because hx-push-url is set to true, the final URL of the request is set as the new location in the address bar, and the previous address is added to the browser history. If you’ve ever had to implement pushing and popping browser history, you’ll recognize how simple htmx’s solution is.


If you have a front end implemented largely in JavaScript, then the following argument is likely to have reared its head:

Let’s use JavaScript on the back end as well, so we can reuse code, and front-end developers can work on the back end as well.

I loathe this argument, for several reasons.

First of all, JavaScript. JavaScript is not a great language for anything except inside the browser. It never was and it likely never will be, unless it gets a decent standard library and ditches its type coercion and many other broken language features. No amount of developer discipline and linting can truly mitigate this. Unfortunately, we’re kind of stuck with JavaScript in the browser, but let’s not use it anywhere else if we can help it.

Second of all, reusing code between the front end and the back end. It sounds like a good idea, but a web server responding to requests uses a very different application model than code running in a browser responding to user interaction. To reuse the code almost certainly means forcing one of the two into abstractions that don’t make a lot of sense. On top of that, because of the different application models and many more differences, there’s a substantial learning curve between front-end and back-end development.

The more you invest in JavaScript on the front end, the more the pressure of this argument will grow.

Fortunately, when using htmx, you won’t need to engage in this argument, since you’ll hardly write any JavaScript on the front end. You can use any server-side language and framework you like. There’s no pressure to use JavaScript if that isn’t preferred. Is most of your existing code base in Python? Django and Flask both integrate very well with htmx. Do you like Java or .NET? Maybe Ruby or Rust? All of these and more have excellent options to create HTML-based applications, and it’s trivial to add htmx to them. This is referred to by the htmx community as HOWL: Hypermedia On Whatever you’d Like.

Silver bullet?

I hope you’ll agree that htmx results in a simpler solution than a SPA framework, and that is worth a lot. Carson Gross (the developer behind htmx) talks about the Complexity Budget, or in other words: where in your application you allow complexity to exist. For most applications, the front-end should be just that, the front-end. What differentiates your application from others is usually the back-end functionality. Saving the complexity budget for the back-end, rather than spending it in the front-end, makes a lot of sense.

However, as this article started by saying, there is no such thing as a silver bullet. Htmx does not work in all situations. When a lot of your application is graphical (like a mapping or drawing app) or when a large portion of your UI needs to be redrawn in response to user input (like a spreadsheet or word processor), then you need a JavaScript-based solution. When your app needs to be available even when the user is offline, you’ll need a JavaScript-based solution. Fortunately, the vast majority of applications are just text and images and do not suffer from these limitations. For that vast majority, htmx is likely a good solution, and probably a better one than any of the currently popular SPA frameworks.

Have a look at the htmx site, where there is a lot of good documentation, as well as a collection of essays that do a much better job of explaining the merits than this brief article.