Performance regression tests on every pull request with Lighthouse and CircleCI

Web performance is important for a whole bunch of reasons, notably including user experience and the bottom line. And with our fancy 2019 JavaScript toolchains, it can be easy to accidentally ship something that sacrifices performance, especially for users who aren't on great networks or don't have the latest devices.

Lighthouse is an open-source tool for auditing web pages in terms of performance, accessibility, and more. It's a powerful tool for ensuring that you're delivering a performant experience to users. It's easy to get started using Lighthouse by installing it as a Chrome extension, running it against your page, and then looking over the report:

The report contains details about the individual audits contributing to the top level scores, as well as concrete recommendations for improvement.

And this is all good and chill. But what I really want is to integrate these audits into my CI/CD pipeline in a way that A) encourages people working on my project to embrace performance budgets, without adding too much friction and manual work to everyone's workflow, and B) automatically prevents regressions against those performance budgets.

Regressions undo hard-won achievements quickly and without mercy. It takes time to understand what's busted, hunt down the cause, and clean up the mess. William James wrote the following in 1890 in reference to building new personal habits, but it might as well be about software regressions:

Each lapse is like the letting fall of a ball of string which one is carefully winding up; a single slip undoes more than a great many turns will wind again.

So regression sucks, and as a consequence, preventing regression is good shit:

And that all sounds good, but automating this kind of thing tends to get complex fast because (among other reasons):

  • There are a bunch of pieces involved and it isn't always clear how they relate to each other and who is responsible for what. For example: lighthouse, headless chrome, headed chrome, other browsers (?), containers, puppeteer, chrome-launcher, different CI/CD providers (Travis, CircleCI, Jenkins, etc)
  • There are a bunch of tools in the same neighborhood, with sometimes-overlapping aims. For example: WebPageTest, cypress.io, selenium, phantomJS
  • Building out and testing CI/CD pipelines often involves slow feedback cycles and a relative lack of visibility

This post walks step-by-step through defining performance budgets, then setting up a CI/CD pipeline in CircleCI such that every pull request (PR) against a Github repo runs Lighthouse in a docker container against the PR's code, propagating the scores back to the PR in a comment (along with links to the full reports), and optionally preventing the PR from being merged if the performance budgets aren't met.

We go over testing pages that require authentication, running multiple tests concurrently, smoothing out inconsistencies through multiple runs against each page, and writing custom audits to measure things that Lighthouse doesn't provide out of the box.

While this post focuses on using Lighthouse for performance testing, it also applies to testing accessibility, SEO, and "best practices", which are all part of the standard Lighthouse audits. This example is coupled to CircleCI, but the general idea can be applied with other CI/CD providers.

The app under test

Here is the demo app we'll be running tests against – it is a marketplace for training videos:

Yes! The logo is Papyrus

App details

  • The production deployment lives at kubernetesfordogs.com
  • The staging deployment lives at staging.kubernetesfordogs.com
  • This is a single-page app bootstrapped with create-react-app. It uses AWS amplify to define and deploy the infrastructure in AWS – e.g. Cognito for authentication, S3 for storage, Cloudfront as a CDN – and to build and deploy the app onto that infrastructure. (This app is the answer to the question "what is the fastest way I can throw together a semi-realistic fully static SPA with real authentication but no servers or even server applications to manage, at basically no cost?" One cool thing I discovered is that, were it not for a couple of minor snags such as the slow spinup for a Cloudfront distribution, it would be easy to use what amplify provides out of the box to roll your own equivalent of Netlify's "Deploy Previews", i.e., each PR is deployed into its own environment)
  • The app already has in place a basic CI/CD pipeline defined in CircleCI: for every branch push to Github, the app is deployed to staging. (Oversimplified setup so we can focus on the Lighthouse integration piece)

Performance testing goals

  • Define a high level performance budget such as "a Lighthouse performance score of at least 90."
  • Every time a PR is deployed, automatically run Lighthouse against it.
  • Run a custom performance audit that isn't part of Lighthouse's existing audits: we want to make sure the app's primary JavaScript bundle doesn't exceed a certain threshold (let's say 250kb).
  • Run Lighthouse against the home page as an anonymous (not logged in) user
  • Also, run Lighthouse against the dashboard page as an authenticated user, specifically a user who has a mountain of data that could potentially slow down their experience. The dashboard page requires logging in and looks like this:

  • To deal with inconsistencies (such as a flaky backend service we're interacting with), run Lighthouse against each page multiple times and extract a median or best score for each page.
  • Surface the results of the Lighthouse tests inside the Github pull request as a comment with the scores and a link to each of the full Lighthouse html reports.
  • If we don't meet the performance budgets, set the PR status check to "failed" so that we can prevent merging if desired.

First steps

I quickly discovered lighthouse-ci, a project which offers a number of different ways to get started with Lighthouse in a CI/CD environment. I decided against using it because:

  • They provide an easy option where you basically submit urls and performance budgets to a service they're running. They run Lighthouse against your page, and post the results as a comment on your PR (as lighthousebot, a GH account they own) and update your PR status check accordingly. This seems pretty great for simple use cases, but our project entails a level of custom configuration that makes this option impossible (side note: if this works for your use case, here is a post about setting it up). Next!
  • They also provide a way to run the same service they're running: a backend server that does the actual Lighthouse runs in a docker container, and a frontend server that acts as the glue between the backend, the CI server, and Github, and also provides a UI. This could probably work, but I would prefer not having two long-running server processes for this system; I would like to spin up containers on demand to run Lighthouse because it is less operational overhead, likely to be cheaper, and easier to horizontally scale (so that we can run a bunch of tests at once!).
  • This project (at least the frontend and CLI components) seemed somewhat coupled to Travis CI, and our app is already happily using CircleCI.

I decided to snag the docker container used for their backend server because that seemed like a good starting point for running Lighthouse in a container.

Setting performance budgets

Let's pretend that we've had a giant meeting with all the stakeholders in the world (mainly dogs, presumably) and we effortlessly came to an agreement about our app's performance budgets. Now, we want to define these someplace high level in the project, and later programmatically compare them against the actual Lighthouse scores for each PR. Let's put them in package.json:

  "lighthouse": {
    "requiredScores": {
      "performance": 95,
      "accessibility": 90,
      "best-practices": 80,
      "seo": 90
    }
  }

The pipeline: step-by-step

Here's a high level overview of the pipeline we're going to build:

Will run in CircleCI on every push to Github repo

In CircleCI, each step above is called a job and the sequence of jobs is called a workflow.

This project assumes CircleCI is already integrated with the Github repo, but if it isn't, setting that up is simple:

  • Sign up for a free account and authorize CircleCI to interact with Github
  • Go to "add projects" and add your repo
  • In your repo's root directory add a directory .circleci and inside that, config.yml
  • Turn off email notifications for the project :p
  • Enable Github checks integration for the repo

That's pretty much it!

We'll start with the basic setup and then add custom audits, authentication, and multiple test runs. We will skip the configuration for the first two jobs (installing dependencies and building/deploying the app) to focus on the Lighthouse part, but you can find it all in the Github repo, and you can even jump straight to the heavily-commented .circleci/config.yml file that wires everything together from the top down.

Run Lighthouse

  perfTests:
    docker:
      - image: kporras07/lighthouse-ci

    steps:
      - checkout

      - run:
          name: Run lighthouse against staging deployment

          environment:
            TEST_URL: https://staging.kubernetesfordogs.com

          command: |
            lighthouse $TEST_URL \
              --port=9222 \
              --chrome-flags=\"--headless\" \
              --output-path=/home/chrome/reports/anonymous-"$(echo -n $CIRCLE_SHELL_ENV | md5sum | awk '{print $1}')" \
              --output=json \
              --output=html

      - persist_to_workspace:
          root: /home/chrome
          paths:
            - reports

Here we define a CircleCI job called perfTests.

CircleCI will pull the docker image kporras07/lighthouse-ci from hub.docker.com and execute our steps inside it.

I wanted to use the docker image defined in the lighthouse-ci project, but the Lighthouse team does not publish an official docker image and I didn't feel like adding a step that builds/caches the docker image in my pipeline (I'd rather let CircleCI handle that for me), so I'm using kporras07/lighthouse-ci, which is published by someone unknown to me and theoretically reflects what is defined in the lighthouse-ci Dockerfile, but who knows. An official docker image published by the Lighthouse team would be preferable from a security point of view, but for this project I just want to keep moving, so, yolo.

We have just one step to execute. It defines a human-friendly name to display in the CircleCI UI, puts the url we want to run against into an environment variable called TEST_URL (we'll see why later), and invokes the command lighthouse. (Side note: CircleCI does not preserve docker entrypoints by default, which is perfect for our use case; we want to be dropped off at a shell in the container and provide our own set of commands.)

So we run lighthouse against staging.kubernetesfordogs.com. Noteworthy flags include --output, where we say that we want both json and html reports (the former for analyzing if we're under budget, the latter to show to humans) and --output-path, where we set a location and file prefix for the reports. The file prefix is a function of the value $CIRCLE_SHELL_ENV, which is provided by CircleCI and is unique for each container. This will be useful when we get into doing multiple runs at once (on second thought, this could probably have just been a random string but ok sure).
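
Since we ask for both output formats, each run writes a pair of report files whose names look something like this (the hash here is made up; the real one is the md5 of that container's $CIRCLE_SHELL_ENV):

/home/chrome/reports/anonymous-1a2b3c4d5e6f.report.html
/home/chrome/reports/anonymous-1a2b3c4d5e6f.report.json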

Next is persist_to_workspace. Here we take the reports generated inside the container in /home/chrome/reports and persist them to the workspace under the path /reports. A workspace is a place to store state across jobs within the same workflow. Our next job is going to analyze the reports so we're using the workspace to pass them along.

Analyze scores, update PR

  processResults:
    docker:
      - image: circleci/node:10.15.0

    steps:
      - checkout

      - restore_cache:
          keys:
            - node-v1-{{ checksum "package.json" }}-{{ checksum "yarn.lock" }}

      - attach_workspace:
          at: "."

      - store_artifacts:
          path: reports
          destination: reports

      - run:
          name: Analyze and report desired vs actual lighthouse scores
          command: ./ci-scripts/analyze_scores.js package.json reports

Here we define a CircleCI job called processResults.

CircleCI will pull the docker image circleci/node:10.15.0 from hub.docker.com and execute our steps inside it. This is a pre-built convenience image provided by CircleCI that is a good starting point for doing Node.js stuff plus some additional common tools for CI/CD.

Our steps:

  • checkout: fetch our git repository
  • restore_cache: the first job that we hand-waved over used yarn to install our node modules and then cached them, and now we're fetching those from the cache instead of doing yarn again. The cache key is a concatenated checksum of package.json and a checksum of yarn.lock. This means that if one of those files changes, we'll have to do yarn again. (Why not just yarn.lock? I wanted an easy mechanism to bust the cache while building the pipeline, and this way I can just increment a counter in package.json to do that).
  • attach_workspace: here we attach the workspace where we saved all the reports in the previous step. We mount it in the current directory.
  • store_artifacts: here we upload all the reports as long-term artifacts that will be associated with this job. This means the reports will be in S3 where we can easily view the html reports, and do data analysis or whatever on the json reports.
  • run: here we run a script in our repo, ci-scripts/analyze_scores.js, that reads all the json reports from the Lighthouse runs, compares them to the performance budgets defined in package.json, and decides if we passed or failed (a minimal sketch of such a script follows this list). It prints some output to the console for more context inside the CI dashboard, and then it uses circle-github-bot to post a comment to the PR (really it posts a comment to the commit, and that comment shows up in the PR). I created a separate Github account to post this kind of message, and created a personal access token for that account (with super narrow permissions), then fed the token into circle-github-bot so it could comment on behalf of my new account.
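
Here's a minimal sketch of what the comparison part of such a script might look like. Assumptions: it takes the best score per category across all the runs (per our "median or best" goal) and it skips the Github-comment piece; the actual analyze_scores.js in the repo handles that via circle-github-bot and may differ in its details:

#!/usr/bin/env node
// Hypothetical sketch of ci-scripts/analyze_scores.js (comparison only; PR comment omitted).
// Usage: ./analyze_scores.js package.json reports
const fs = require("fs");
const path = require("path");

const [, , packageJsonPath, reportsDir] = process.argv;

// Read the budgets we defined in package.json.
const { requiredScores } = JSON.parse(
  fs.readFileSync(packageJsonPath, "utf8")
).lighthouse;

// Collect every json report that the lighthouse runs persisted to the workspace.
const reports = fs
  .readdirSync(reportsDir)
  .filter(file => file.endsWith(".json"))
  .map(file => JSON.parse(fs.readFileSync(path.join(reportsDir, file), "utf8")));

// For each budgeted category, take the best score across runs.
// Lighthouse scores are 0-1, so multiply by 100 to compare against our budgets.
// (A more careful version would group the reports by page before aggregating.)
let failed = false;
for (const [category, required] of Object.entries(requiredScores)) {
  const best = Math.max(
    ...reports.map(report =>
      Math.round(((report.categories[category] || {}).score || 0) * 100)
    )
  );
  const pass = best >= required;
  if (!pass) failed = true;
  console.log(`${category}: best ${best} vs required ${required} -> ${pass ? "PASS" : "FAIL"}`);
}

// Exiting nonzero fails the CircleCI job, which fails the PR status check.
process.exit(failed ? 1 : 0);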

Define the workflow

workflows:
  version: 2
  deployToStagingAndTest:
    jobs:
      - nodeModules
      - deployToStaging:
          requires: 
            - nodeModules
      - perfTests:
          requires: 
            - deployToStaging
      - processResults:
          requires: 
            - perfTests

Now we set up a workflow called deployToStagingAndTest that wires together the sequence of jobs.

Improvements

At this point we have a functioning basic setup! But can we do better???

Custom audits

We want to specify a maximum JavaScript bundle size for the main application bundle. Let's augment our new configuration section of package.json to reflect this, and we'll have our audit reference it:

  "lighthouse": {
    "requiredScores": {
      "performance": 95,
      "accessibility": 90,
      "best-practices": 80,
      "seo": 90,
      "bundle-size": 100
    },
    "maxBundleSizeKb": 250,
    "jsBundleRegex": "/static/js/[^(main)].*chunk\\.js"
  }

For scores, we've added a new category called bundle-size and specified that we must get a score of 100. It seemed like these numbers are all 0-100 so we'll basically just do 0 for fail and 100 for pass, and require a pass.

We've additionally specified maxBundleSizeKb of 250 and jsBundleRegex, a (fragile) JavaScript regular expression that will identify our bundle among other scripts.

Lighthouse provides helpful docs on writing custom audits. The first thing we need to write a custom audit is a custom configuration file to pass in when we invoke lighthouse. Let's create custom-config.js:

module.exports = {
  extends: "lighthouse:default",

  audits: ["bundle-size-audit"],

  categories: {
    "bundle-size": {
      title: "JS Bundle Size",
      description: "Can we keep it under the threshold???",
      auditRefs: [
        // When we add more custom audits, `weight` controls how they're averaged together.
        { id: "bundle-size-audit", weight: 1 }
      ]
    }
  }
};

We are extending the default configuration by:

  • Adding a new audit called bundle-size-audit. This tells Lighthouse to look for a file called bundle-size-audit.js that contains the actual audit definition.
  • Adding a new category called bundle-size that is a sibling to performance, accessibility, etc. It will be displayed prominently in the html reports. The category could include many audits but it only includes our new bundle-size-audit. (It probably would make more sense to put our new audit in the existing performance category, but we'll do it this way for illustration).

Next let's add the audit definition, bundle-size-audit.js:


const Audit = require("/usr/local/lib/node_modules/lighthouse").Audit;

class BundleSizeAudit extends Audit {
  static get meta() {
    return {
      id: "bundle-size-audit",
      title: "JS bundle size",
      failureTitle: `JS bundle exceeds your threshold of ${
        process.env.MAX_BUNDLE_SIZE_KB
      }kb`,
      description: "Compares JS bundle sizes with predefined thresholds.",
      requiredArtifacts: ["devtoolsLogs"]
    };
  }

  static async audit(artifacts, context) {
    const devtoolsLogs = artifacts.devtoolsLogs["defaultPass"];
    const networkRecords = await artifacts.requestNetworkRecords(devtoolsLogs);

    const bundleRecord = networkRecords.find(
      record =>
        record.resourceType === "Script" &&
        new RegExp(process.env.JS_BUNDLE_REGEX).test(record.url)
    );

    const belowThreshold =
      bundleRecord.transferSize <= process.env.MAX_BUNDLE_SIZE_KB * 1024;

    return {
      rawValue: (bundleRecord.transferSize / 1024).toFixed(1),
      // Cast true/false to 1/0
      score: Number(belowThreshold),
      displayValue: `${bundleRecord.url} was ${(
        bundleRecord.transferSize / 1024
      ).toFixed(1)}kb`
    };
  }
}

module.exports = BundleSizeAudit;

Here we define metadata about the audit, notably including requiredArtifacts, which states which gatherers the audit depends on. Gatherers gather data. We don't need to write and depend on a custom gatherer, because the devtoolsLogs already contain the data we need. So our audit function looks through network records, finds our JS bundle based on the regular expression in package.json, and compares its size to the limit defined in package.json. We return a score of 1 if it's all gravy, and otherwise 0. Before, I said the scoring scale was 0-100; it's really 0-1 but it's displayed as 0-100.

(What up with require("/usr/local/lib/node_modules/lighthouse"), you ask? This was a cheap and dirty hack to sidestep some permissions issues with npm link and ensure that we require the same lighthouse that is globally installed in the container. "Don't @ me")

Now, when we invoke lighthouse, we need to pass in our custom config file via --config-path, and we also want to read the maxBundleSizeKb and jsBundleRegex values we defined in package.json and pass those through as environment variables so that our custom audit can read them via process.env. Let's update our CircleCI config file accordingly:

- run:
    name: Run lighthouse against staging deployment
    environment:
      TEST_URL: https://staging.kubernetesfordogs.com
    command: |
      MAX_BUNDLE_SIZE_KB="$(node -p 'require("./package.json").lighthouse.maxBundleSizeKb')" \
      JS_BUNDLE_REGEX="$(node -p 'require("./package.json").lighthouse.jsBundleRegex')" \
      lighthouse $TEST_URL \
        --config-path=./lighthouse-config/custom-config.js \
        --port=9222 \
        --chrome-flags=\"--headless\" \
        --output-path=/home/chrome/reports/anonymous-"$(echo -n $CIRCLE_SHELL_ENV | md5sum | awk '{print $1}')" \
        --output=json \
        --output=html

Authentication

To test a page as a logged-in user we need to first do some setup. It's possible when invoking lighthouse to pass in an --extra-headers flag, which means that if we could do some setup work to grab an authentication token and then pass it in as a cookie, or some other header, that could work. But we're using AWS Cognito for authentication, along with its browser JS SDK, and it's not quite as simple as getting a token and passing it along as a header, as far as I can tell.

There are a handful of values that we need to put in localStorage. It seems like the easiest way to deal with this would be driving a browser to the login form, programmatically entering our credentials, waiting for that request to succeed, and then, with those values now in localStorage, continuing on with the Lighthouse test.

If you don't like hacks, put on your seatbelt!

We are going to write a custom gatherer that doesn't gather anything – we will abuse its lifecycle hooks to let puppeteer connect to the browser, do the login, and then peace out, all just before the performance test happens. LOL.

const Gatherer = require('/usr/local/lib/node_modules/lighthouse').Gatherer;
const puppeteer = require('puppeteer');

// https://github.com/GoogleChrome/lighthouse/issues/3837#issuecomment-345876572
class Authenticate extends Gatherer {
  async beforePass(options) {

    const ws = await options.driver.wsEndpoint();

    const browser = await puppeteer.connect({
      browserWSEndpoint: ws,
    });

    const page = await browser.newPage();
    await page.goto(process.env.TEST_URL);

    await page.click('input[name=username]');
    await page.keyboard.type(process.env.DOG_USER);

    await page.click('input[name=password]');
    await page.keyboard.type(process.env.DOG_PASSWORD);

    // lmao
    await page.click('span[class^=Section__sectionFooterPrimaryContent] button');

    // this means the login succeeded
    await page.waitForSelector('.dashboard');

    browser.disconnect();
    return {};
  }
}

module.exports = Authenticate;

We also need to create a custom audit that depends on this gatherer (see requiredArtifacts); otherwise, the gatherer won't be included in the run:

const Audit = require("/usr/local/lib/node_modules/lighthouse").Audit;
// HACK! the "Authenticate" audit only exists to pull in its gatherer;
// otherwise no audits depend on the gatherer and it is not run.

class AuthenticateAudit extends Audit {
  static get meta() {
    return {
      id: "authenticate-audit",
      title: "Authenticate",
      failureTitle: "Did not authenticate (?)",
      description: "HACK! defining this just to depend on the Authenticate gatherer, so it runs.",

      requiredArtifacts: ["Authenticate"]
    };
  }

  static async audit(artifacts, context) {
    return {
      rawValue: 420,
      score: 1, // audit scores are really 0-1 (displayed as 0-100)
      displayValue: 'Ok',
    };
  }
}

module.exports = AuthenticateAudit;

And then, we augment our config file so that it includes the audit:

 audits: ["bundle-size-audit", "authenticate-audit"],

Back in the CircleCI config.yml we update the url environment variable to point to the authenticated page:

TEST_URL: https://staging.kubernetesfordogs.com/dashboard

Then we set the DOG_USER and DOG_PASSWORD environment variables in the CircleCI console, and it actually works! It logs in and then performs the tests.

There are other ways we could have done this, like by using chrome-launcher to launch the browser and then passing it to puppeteer and then passing it to lighthouse (or something) but it seemed like we'd have to switch to invoking lighthouse via Node API instead of the CLI and I wanted to be able to stick with the CLI invocation.

We can now create a job called perfTestsAuthenticated which incorporates all these changes, and we'll keep the original perfTests as it was before.
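
Roughly, the new job might look like the following sketch (a sketch only, assuming the same image and report handling as perfTests; details like making puppeteer available inside the container for the custom gatherer are elided, and the authenticated- filename prefix is just my choice to keep the reports distinct):

  perfTestsAuthenticated:
    docker:
      - image: kporras07/lighthouse-ci

    steps:
      - checkout

      - run:
          name: Run lighthouse against staging dashboard as a logged-in user

          # DOG_USER and DOG_PASSWORD are read from the environment variables
          # set in the CircleCI console.
          environment:
            TEST_URL: https://staging.kubernetesfordogs.com/dashboard

          command: |
            MAX_BUNDLE_SIZE_KB="$(node -p 'require("./package.json").lighthouse.maxBundleSizeKb')" \
            JS_BUNDLE_REGEX="$(node -p 'require("./package.json").lighthouse.jsBundleRegex')" \
            lighthouse $TEST_URL \
              --config-path=./lighthouse-config/custom-config.js \
              --port=9222 \
              --chrome-flags=\"--headless\" \
              --output-path=/home/chrome/reports/authenticated-"$(echo -n $CIRCLE_SHELL_ENV | md5sum | awk '{print $1}')" \
              --output=json \
              --output=html

      - persist_to_workspace:
          root: /home/chrome
          paths:
            - reports

We update the workflow like so: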

workflows:
  version: 2
  deployToStagingAndTest:
    jobs:
      - nodeModules
      - deployToStaging:
          requires: 
            - nodeModules
      - perfTests:
          requires: 
            - deployToStaging
      - perfTestsAuthenticated:
          requires: 
            - deployToStaging
      - processResults:
          requires: 
            - perfTests
            - perfTestsAuthenticated

Multiple test runs

The cool thing is that to run the same job multiple times concurrently (i.e., do more than one test run for each url at the same time), all we have to do is set a parallelism key in the job to a number greater than the default of 1. So let's bump that up for both the perfTests and perfTestsAuthenticated jobs:

  perfTests:
    parallelism: 2

This plus the updated workflow means that we are running four tests (two test runs against each of two pages), all at the same time! Nice.

The whole flow

So now whenever we open a PR against this repo, the PR status checks go yellow while the workflow runs:

We can optionally click the "Checks" tab to get more information right inside Github:

And we can optionally see more detail on the CircleCI side:

When the tests are finished running, we'll get a comment back on our PR, and if we didn't meet our performance budget (as is the case for this PR) the PR check will have failed (which, with just one additional click in the repo settings, we could use to prevent this PR from being merged):

And finally, we can click the links in the PR comment to view detailed reports for each of the runs:

Discussion

We now have a pretty decent working example. Some things that are out of scope here but could be interesting next steps:

  • We could schedule a similar workflow to run daily against production and then use the json reports to visualize our Lighthouse scores over time. This could be a cool way to see medium- and long-term progress (a sketch of what that scheduling might look like follows this list).
  • We could optimize the pipeline for better speed, flexibility, etc. A good first step might be improving our usage of the tools CircleCI provides for persisting data in workflows (e.g. use the workspace to persist node modules across sequential jobs in the same workflow, rather than using the cache).
  • We are currently using a free plan on CircleCI, and since our project is open-source, we can run four containers at any given time, each having access to 2 vCPUs and 4GB RAM. That seems to be plenty of resources for this project, but if there was some reason that it wasn't enough – maybe we decide we want to test a whole bunch of pages at once – we could look into keeping the same overall structure but migrating the actual Lighthouse container execution somewhere else that's elastic and lets us pay for actual usage.
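
On the scheduling idea: CircleCI workflows support cron triggers, so a second workflow could sit alongside deployToStagingAndTest under the same workflows: key. Here's a hypothetical sketch (nothing below exists in the repo; perfTestsProduction stands in for a variant of perfTests pointed at the production URL, and master is assumed to be the default branch):

  nightlyProductionAudit:
    triggers:
      - schedule:
          cron: "0 8 * * *" # every day at 08:00 UTC
          filters:
            branches:
              only:
                - master
    jobs:
      - nodeModules
      - perfTestsProduction:
          requires:
            - nodeModules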

My overall experience with Lighthouse is that it is an awesome piece of software. Props to the folks working on it! There are a couple downsides to be aware of if you're considering it:

  • It is not currently a great tool if your use case is "multi-step flows" where you want to test performance at every step, for example, add something to the cart, test performance, navigate to the checkout, test performance, etc. My understanding is that this is because lighthouse currently insists on controlling the navigation to a given page and during those navigations, state is generally lost – i.e., you can't be the one navigating around and maintaining state and periodically telling lighthouse "ok measure stuff now". It seems like the workaround we used to do an authentication step (a custom gatherer) could work for some cases with more steps, but it's unclear to me if that would always work – and it seems like it'd be pretty messy even if it did work.
  • It currently doesn't drive any browsers besides Chrome

In the past I've reached for WebPageTest private instances to do this kind of thing, and that is a really neat tool as well, but for cases where I can get away with it (not needing additional browsers, etc) I'll probably prefer Lighthouse going forward because it was easier to work with and customize, it's simpler to run on demand vs as one or more long-running processes, and getting the tests around accessibility, SEO, etc, for free is a sweet benefit.

Also, CircleCI pretty much rules.

If you enjoyed this post, check out my next post, which uses AWS Lambda to perform thousands of Lighthouse runs at the same time for the use case of testing a large site.
