Acquia sponsors NPR

As I wrote in my previous post, you might be seeing a lot more of Acquia in the coming weeks. If you listen to NPR, you may have heard our new radio ads.

Like our highway billboards and train station takeover, our NPR campaign is another great opportunity to reach commuters.

NPR is a national non-profit media organization with a network of more than 1,000 affiliated radio stations across the United States — and quite a few use Drupal and Acquia for their sites. It boasts listenership of nearly 30 million, and its airwaves reach nearly 99 percent of Americans.

Our NPR ads are running during the morning and evening commutes. In addition, Acquia ads will be featured on the Marketplace Tech podcast, which is popular among technology decision makers. Between the podcasts and radio ads, the potential reach is 64 million impressions.

We have always believed in doing well by doing good. Sponsoring NPR creates brand awareness for Acquia, but also supports NPR financially. High-quality media organizations are facing incredible challenges today, and underwriting NPR's work is a nice way for Acquia to give back.

Acquia takes over a subway station

If you pass through Kendall Square MBTA station in the Boston area, you'll see a station "takeover" starting this week featuring the Acquia brand.

As with our highway billboards introduced in December, the goal is for more people to learn about Acquia during their commutes. I'm excited about this campaign because, to many, Acquia still feels like a best-kept secret.

The Kendall Square station takeover will introduce Acquia to 272,000 daily commuters in one of the biggest innovation districts in the Boston area – and the home of MIT.

An Acquia poster at Kendall Square station featuring an Acquia employee
Acquia branding on the turnstiles

In addition to posters on every wall of the station, the campaign includes Acquia branding on entry turnstiles, 75 digital live boards, and geo-targeted mobile ads that commuters may see on their phones while waiting for the train. It will be hard not to be introduced to Acquia.

An Acquia poster at Kendall Square station featuring an Acquia employee

What makes this extra special is that all of the ads feature photographs of actual Acquia employees (Acquians, as we call ourselves), which is a nice way to introduce our company to people who may not know us.

Optimizing site performance by "lazy loading" images

Recently, I've been spending some time making performance improvements to my site. In my previous blog post on this topic, I described my progress optimizing the JavaScript and CSS usage on my site, and concluded that image optimization was the next step.

Last summer I published a blog post about my vacation in Acadia National Park. Included in that post are 13 photos with a combined size of about 4 MB.

When I benchmarked that post with https://webpagetest.org, it showed that it took 7.275 seconds (blue vertical line) to render the page.

The graph shows that the browser downloaded all 13 images to render the page. Why would a browser download all images if most of them are below the fold and not shown until a user starts scrolling? It makes very little sense.

As you can see from the graph, downloading all 13 images takes a very long time (purple horizontal bars). No matter how much you optimize your CSS and JavaScript, this particular blog post would remain slow until you optimize how images are loaded.

The WebPageTest waterfall diagram for the Acadia blog post, before optimizing how images are loaded

"Lazy loading" images is one solution to this problem. Lazy loading means that the images aren't loaded until the user scrolls and the images come into the browser's viewport.

You might have seen lazy loading in action on websites like Facebook, Pinterest or Medium. It usually goes like this:

  • You visit a page as you normally would, scrolling through the content.
  • Instead of the actual image, you see a blurry placeholder image.
  • Then, the placeholder image gets swapped out with the final image as quickly as possible.
An animated GIF of a user scrolling a webpage and placeholder images being replaced by the final images

To support lazy loading images on my blog I do three things:

  1. Automatically generate lightweight yet useful placeholder images.
  2. Embed the placeholder images directly in the HTML to speed up performance.
  3. Replace the placeholder images with the real images when they become visible.

Generating lightweight placeholder images

To generate lightweight placeholder images, I implemented a technique used by Facebook: create a tiny image that is a downscaled version of the original image, strip out the image's metadata to optimize its size, and let the browser scale the image back up.

To create lightweight placeholder images, I resized the original images to be 5 pixels wide. Because I have about 10,000 images on my blog, my Drupal-based site automates this for me, but here is how you create one from the command line using ImageMagick's convert tool:

$ convert -resize 5x -strip original.jpg placeholder.jpg
  • -resize 5x resizes the image to be 5 pixels wide while maintaining its aspect ratio.
  • -strip removes all comments and redundant headers in the image. This helps make the image's file size as small as possible.
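
If you'd rather script this step than run it from the command line, a minimal PHP sketch using the GD extension could look like the following. The create_placeholder helper is hypothetical, meant as an illustration rather than the code my site actually runs:

// Hypothetical helper: create a 5 pixel wide placeholder for $source and
// write it to $destination. Because GD re-encodes the image from scratch,
// the original metadata is dropped, much like ImageMagick's -strip.
function create_placeholder($source, $destination) {
  $image = imagecreatefromjpeg($source);

  // A height of -1 tells GD to preserve the aspect ratio.
  $placeholder = imagescale($image, 5, -1);

  imagejpeg($placeholder, $destination);
}

create_placeholder('original.jpg', 'placeholder.jpg');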

The resulting placeholder images are tiny, often just shy of 400 bytes.

Large metal pots with wooden lids in which lobsters are boiled
The original image that we need to generate a placeholder for.
An example placeholder image shows brown and black tones
The generated placeholder, scaled up by a browser from a tiny image that is 5 pixels wide. The size of this placeholder image is only 395 bytes.

Here is another example to illustrate how the colors in the placeholders nicely match the original image:

A sunrise with beautiful reds and black silhouettes
An example placeholder image that shows red and black tones

Even though the placeholder images should only be shown for a fraction of a second, making them resemble the originals is a nice touch, as they hint at what is coming. It also matters because users are very impatient with load times on the web.

Embedding placeholder images directly in HTML

One not-so-well-known feature of the <img> element is that you can embed an image directly into the HTML document using the data URL scheme:

<img src="data:image/jpg;base64,/9j/4AAQSkZJRgABAQEA8ADwAAD/2wB
  DAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQICAQECAQEB
  AgICAgICAgICAQICAgICAgICAgL/2wBDAQEBAQEBAQEBAQECAQEBAgICAgICA
  gICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgL/wA
  ARCAADAAUDAREAAhEBAxEB/8QAFAABAAAAAAAAAAAAAAAAAAAACf/EAB8QAAM
  AAQMFAAAAAAAAAAAAAAECAwcEBQYACAkTMf/EABUBAQEAAAAAAAAAAAAAAAAA
  AAIG/8QAJBEAAQIFAgcAAAAAAAAAAAAAAQURAAIDBDEGFBIVQUVRcYH/2gAMA
  wEAAhEDEQA/AAeyb5HO8o8lSLZd01Jz2nbKoK4yxDVvZqYl7uaV4CWdmZQSSS
  ST86FJBsafEJK15KD05ioNk4G6Yeg0V9bVCmZpXt08sB2hJ8DJ2Tn7H/2Q==" />

Data URLs are composed of four parts: the data: prefix, a media type indicating the type of data (image/jpg), an optional base64 token to indicate that the data is base64 encoded, and the base64 encoded image data itself.

data:[<media type>][;base64],<data>

To base64 encode an image from the command line, use:

$ base64 placeholder.jpg

To base64 encode an image in PHP, use:

$data = base64_encode(file_get_contents('placeholder.jpg'));
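
The complete data URL is then just the data:image/jpg;base64, prefix concatenated with that output. For example:

$data = base64_encode(file_get_contents('placeholder.jpg'));
$url = 'data:image/jpg;base64,' . $data;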

What is the advantage of embedding a base64 encoded image using a data URL? It eliminates HTTP requests as the browser doesn't have to set up new HTTP connections to download the images. Fewer HTTP requests usually means faster page load times.

Replacing placeholder images with real images

Next, I used JavaScript's IntersectionObserver to replace the placeholder image with the actual image when it comes into the browser's viewport. I followed Jeremy Wagner's approach shared on Google Web Fundamentals Guide on lazy loading images — with some adjustments.

It starts with the following HTML markup:

<img class="lazy" src="placeholder.jpg" data-src="original.jpg" />

The three relevant pieces are:

  1. The class="lazy" attribute, which is what you'll select the element with in JavaScript.
  2. The src attribute, which references the placeholder image that will appear when the page first loads. Instead of linking to placeholder.jpg I embed the image data using the data URL technique explained above.
  3. The data-src attribute, which contains the URL of the original image that will replace the placeholder when it enters the viewport.
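
My Drupal site generates this markup automatically. As a standalone sketch, and not my actual Drupal code, a hypothetical lazy_image_tag helper in PHP could produce it like this:

// Hypothetical helper: embed the placeholder as a base64 data URL and
// reference the original image in the data-src attribute.
function lazy_image_tag($placeholder, $original) {
  $data = base64_encode(file_get_contents($placeholder));

  return sprintf(
    '<img class="lazy" src="data:image/jpg;base64,%s" data-src="%s" />',
    $data,
    htmlspecialchars($original, ENT_QUOTES)
  );
}

echo lazy_image_tag('placeholder.jpg', 'original.jpg');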

Next, we use JavaScript's IntersectionObserver to replace the placeholder images with the actual images:

document.addEventListener('DOMContentLoaded', function() {
  var lazyImages = [].slice.call(document.querySelectorAll('img.lazy'));

  if ('IntersectionObserver' in window) {
    let lazyImageObserver = new IntersectionObserver(
      function(entries, observer) {
        entries.forEach(function(entry) {
          if (entry.isIntersecting) {
            let lazyImage = entry.target;
            lazyImage.src = lazyImage.dataset.src;
            lazyImageObserver.unobserve(lazyImage);
          }
        });
      });

    lazyImages.forEach(function(lazyImage) {
      lazyImageObserver.observe(lazyImage);
    });
  }
  else {
    // For browsers that don't support IntersectionObserver yet,
    // load all the images now:
    lazyImages.forEach(function(lazyImage) {
      lazyImage.src = lazyImage.dataset.src;
    });
  }
});

This JavaScript code queries the DOM for all <img> elements with the lazy class. The IntersectionObserver is used to replace the placeholder image with the original image when the img.lazy elements enter the viewport. When IntersectionObserver is not supported, the images are replaced on the DOMContentLoaded event.

By default, the IntersectionObserver's callback is triggered the moment a single pixel of the image enters the browser's viewport. However, using the rootMargin property, you can trigger the image swap before the image enters the viewport. This reduces or eliminates the visual or perceived lag time when swapping a placeholder image for the actual image.

I implemented that on my site as follows:

const config = {
  // If the image gets within 250px of the browser's viewport,
  // start the download:
  rootMargin: '250px 0px',
};

let lazyImageObserver = new IntersectionObserver(..., config);

Lazy loading images drastically improves performance

After making these changes to my site, I did a new https://webpagetest.org benchmark run:

A diagram that shows page load times for dri.es after making the performance improvements

You can clearly see that the page became a lot faster to render:

  • The document is complete after 0.35 seconds (blue vertical line) instead of the original 7.275 seconds.
  • No images are loaded before the document is complete, compared to 13 images being loaded before.
  • After the document is complete, one image (purple horizontal bar) is downloaded. This is triggered by the JavaScript code as the result of one image being above the fold.

Lazy loading images improves web page performance by reducing the number of HTTP requests, and consequently reduces the amount of data that needs to be downloaded to render the initial page.

Is base64 encoding images bad for SEO?

Faster sites have an SEO advantage, as page speed is a ranking factor for search engines. But lazy loading might also be bad for SEO, as search engines have to be able to discover the original images.

To find out, I headed to Google Search Console. Google Search Console has a "URL inspection" feature that allows you to look at a webpage through the eyes of Googlebot.

I tested it out with my Acadia National Park blog post. As you can see in the screenshot, the first photo in the blog post was not loaded. Googlebot doesn't seem to support data URLs for images.

A screenshot that shows Googlebot doesn't render placeholder images that are embedded using data URLs

Is IntersectionObserver bad for SEO?

The fact that Googlebot doesn't appear to support data URLs does not have to be a problem. The real question is whether Googlebot will scroll the page, execute the JavaScript, replace the placeholders with the actual images, and index those. If it does, it doesn't matter that Googlebot doesn't understand data URLs.

To find out, I decided to conduct an experiment. For the experiment, I published a blog post about Matt Mullenweg and me visiting a museum together. The images in that blog post are lazy loaded and can only be discovered by Google if its crawler executes the JavaScript and scrolls the page. If those images show up in Google's index, we know there is no SEO impact.

I published that blog post only yesterday. I'm not sure how long it takes for Google to add new posts and images to its index, but I'll keep an eye on it.

If the images don't show up in Google's index, lazy loading might impact your SEO. My solution would be to selectively disable lazy loading for the most important images only. (Note: even if Google finds the images, there is no guarantee that it will decide to index them — short blog posts and images are often excluded from Google's index.)

Conclusions

Lazy loading images improves web page performance by reducing the number of HTTP requests and data needed to render the initial page.

Ideally, over time, browsers will support lazy loading images natively, and some of the SEO challenges will no longer be an issue. Until then, consider adding support for lazy loading yourself. For my own site, it took about 40 lines of JavaScript code and 20 lines of additional PHP/Drupal code.

I hope that by sharing my experience, more people are encouraged to run their own sites and to optimize their sites' performance.

Two internet entrepreneurs walk into an old publishing house

A month ago, Matt Mullenweg, co-founder of WordPress and founder of Automattic, visited me in Antwerp, Belgium. While I currently live in Boston, I was born and raised in Antwerp, and also started Drupal there.

We spent the morning together walking around Antwerp and visited the Plantin Moretus Museum.

The museum is the old house of Christophe Plantin, where he lived and worked around 1575. At the time, Plantin had the largest printing shop in the world, with 56 employees and 16 printing presses. These presses printed 1,250 sheets per day.

Today, the museum hosts the two oldest printing presses in the world. In addition, the museum has original lead types of fonts such as Garamond and hundreds of ancient manuscripts that tell the story of how writing evolved into the art of printing.

The old house, printing business, presses and lead types are the earliest witnesses of a landmark moment in history: the invention of printing, and by extension, the democratization of publishing, long before our digital age. It was nice to visit that together with Matt as a break from our day-to-day focus on web publishing.

An old printing press at the Plantin Moretus Museum

Dries and Matt in front of the oldest printing presses in the world

An old globe at the Plantin Moretus Museum

Why the EU Copyright Directive is a threat to the Open Web

An image of a copyright sign

After much debate, the EU Copyright Directive is now moving to a final vote in the European Parliament. The directive, if you are not familiar with it, was created to prohibit the spread of copyrighted material on internet platforms and to protect the rights of creators (for example, many musicians have supported this overhaul).

The overall idea behind the directive — compensating creators for their online works — makes sense. However, the implementation and execution of the directive could have a very negative impact on the Open Web. I'm surprised more has not been written about this within the web community.

For example, Article 13 requires for-profit online services to implement copyright filters for user-generated content, which includes comments on blogs, reviews on commerce sites, code on programming sites, and possibly even memes and cat photos on discussion forums. Any for-profit site would need to apply strict copyright filters to content uploaded by its users. If sites fail to correctly filter copyrighted materials, they will be directly liable to rights holders for expensive copyright infringement claims.

While implementing copyright filters may be doable for large organizations, it may not be for smaller ones. Instead, small organizations might decide to stop hosting comments or reviews, or to stop allowing the sharing of code, photos or videos. The only for-profit organizations potentially excluded from these requirements are companies that earn less than €10 million a year and have been in business for less than three years. That exclusion doesn't help much, because many online communities have been around for more than three years and earn less than €10 million a year.

The EU tends to lead the way when it comes to internet legislation. For example, GDPR has proven successful for consumer data protection and has sparked state-by-state legislation in the United States. In theory, the EU Copyright Directive could do the same thing for modern internet copyright law. My fear is that in practice, these copyright filters, if too strict, could discourage the free flow of information and sharing on the Open Web.
