Cypress on AWS Lambda: part 1

One of the things that sucks about end-to-end tests is that they're slow to run, and this slowness compounds as teams and test suites grow.

I want to run Cypress on AWS Lambda in the context of a CI/CD pipeline, because this would open the door to parallelizing the crap out of my end-to-end tests, severely limiting that compounding slowness.

But there are PROBLEMS.

At the time of this writing, Cypress can't drive a headless browser (though they're working on it). Cypress requires a display driver, which the Lambda execution environment does not provide. Cypress can drive Electron in "headless" mode but that also requires a display driver – under the covers it's still non-headless Chromium.

So, probably ill-advisedly, I want to drive a headed browser on AWS Lambda, and it seems like getting Xvfb, a virtual display server, into the Lambda function's execution environment might be a feasible, if rube-goldbergian, way to do it.

Part 1 of this series will focus on getting a single Cypress test to run on Lambda.

It can also be read as a guide (/cautionary tale?) about how to run something on AWS Lambda that you probably shouldn't.

In this post, I've intentionally preserved the issues I ran into, to show how I worked around or through them. If you don't want to deal with a long, meandering post and/or you want to skip right to running Cypress on Lambda, see the GitHub repo.

Prior art

A few resources that were indispensable in figuring out how to do this – all of them come up again later in the post:

  - docker-lambda, which replicates the AWS Lambda environment in local docker images
  - the nightmare-lambda-tutorial (in particular nightmare-on-amazon-linux.md)
  - Marco Lüthy's post about running headless Chrome on Lambda

High level recipe

  1. Create a local environment that's as similar as possible to AWS Lambda, so we can iterate with a much tighter feedback loop.
  2. Figure out what it takes to get Cypress running there.
  3. Extract whatever dependencies we just added and get them into the actual AWS Lambda execution environment.

Step 1: create a Lambda-like local environment

We'll use docker-lambda, a set of docker images that replicate the live AWS Lambda environment. This project rules!

For each supported Lambda runtime, such as nodejs8.10, docker-lambda provides two docker images: one for compiling dependencies (e.g. lambci/lambda:build-nodejs8.10) and one for invoking your function (e.g. lambci/lambda:nodejs8.10). The latter contains more of the restrictions of the actual Lambda execution environment.

We'll use 'em both.

Step 2: get Cypress running in Lambda-like local environment

Test setup

Before even trying to run Cypress locally inside one of the docker-lambda containers, I want to get a basic setup with a single e2e test running on my local machine (OS X).

The site we'll write an end-to-end test against is Prison Data, a project I did a few years ago that visualizes global incarceration rates:

Based on data from prisonstudies.org

First, we install Cypress and add a config file that it expects:

$ npm init
$ npm install cypress --save
$ echo "{}" > cypress.json

Then we create cypress/integration/sample_spec.js, a test that visits the home page and validates that the string "Prison Data" exists:

$ mkdir -p cypress/integration
$ cat > cypress/integration/sample_spec.js <<EOF
describe("My First Test", function() {
  it("Visits the Prison Data project", function() {
    cy.visit("https://stuartsan.github.io/prisonstudies");

    cy.contains("Prison Data");
  });
});
EOF

And now we run Cypress:

$ ./node_modules/.bin/cypress run

And it works!

Running in the Lambda build container

The next step is getting the exact same thing running inside a lambci/lambda:build-nodejs8.10 container. I'll create my own docker image based on that one, and try to reproduce in it what we've just done locally:

FROM lambci/lambda:build-nodejs8.10

WORKDIR /app

COPY package.json .
COPY package-lock.json .
RUN npm install

COPY cypress.json .
COPY cypress ./cypress

CMD npx cypress run

We can build this image:

$ docker build . -t cypress-lambda

And run the container:

$ docker run --rm cypress-lambda

But it will not be happy:

[21:11:03] → Your system is missing the dependency: XVFB

At this point, I'm going to hop into a shell in the container to poke around:

$ docker run -it cypress-lambda bash
bash-4.2$

Ok so, Cypress told us we need Xvfb, let's try installing it:

bash-4.2$ yum -y install Xvfb

And running Cypress again:

bash-4.2$ ./node_modules/.bin/cypress run

Crap! We are missing shared libraries. We can figure out specifically which ones by using ldd:

bash-4.2$ ldd /root/.cache/Cypress/3.1.5/Cypress/Cypress | grep 'not found'
        libgtk-x11-2.0.so.0 => not found
        libgdk-x11-2.0.so.0 => not found
        libpangocairo-1.0.so.0 => not found
        libatk-1.0.so.0 => not found
        libgdk_pixbuf-2.0.so.0 => not found
        libpango-1.0.so.0 => not found
        libXcursor.so.1 => not found
        libXrandr.so.2 => not found
        libXss.so.1 => not found
        libgconf-2.so.4 => not found

At this point I refer the reader to nightmare-on-amazon-linux.md, which details the process of stepping through and installing everything needed to run Nightmare on Amazon Linux.

Both Nightmare and Cypress drive Electron, and as far as I can tell that's what most of these dependencies are really for. I had to make a couple of tweaks, but ultimately I copied their approach pretty much wholesale.

So at this point, our Dockerfile has been augmented to install a couple binaries and a bunch of libraries, and compile a few things from source:

FROM lambci/lambda:build-nodejs8.10

WORKDIR /app

RUN yum -y install wget

# eltool.sh taken from
# https://gist.github.com/dimkir/f4afde77366ff041b66d2252b45a13db
COPY eltool.sh .
RUN ./eltool.sh dev-tools
RUN ./eltool.sh dist-deps
RUN ./eltool.sh centos-deps
RUN ./eltool.sh gconf-compile gconf-install
RUN ./eltool.sh pixbuf-compile pixbuf-install
RUN ./eltool.sh gtk-compile
RUN ./eltool.sh gtk-install
RUN ./eltool.sh xvfb-install

# provides libasound
RUN yum install -y alsa-lib*

COPY package.json .
COPY package-lock.json .
RUN npm install

COPY cypress.json .
COPY cypress ./cypress

COPY link.sh .
RUN ./link.sh

CMD npx cypress run

Also worth noting is link.sh, which creates hard links to a few libraries in the same directory as the Cypress binary, because apparently Electron (which is inside the Cypress binary) expects that:

#!/bin/bash

cd ~/.cache/Cypress/3.1.5/Cypress/
ln -PL /usr/local/lib/libgconf-2.so.4
ln -PL /usr/local/lib/libgtk-x11-2.0.so.0
ln -PL /usr/local/lib/libgdk-x11-2.0.so.0
ln -PL /usr/local/lib/libgdk_pixbuf-2.0.so.0

And now, we can rebuild the image and run it again:

$ docker build . -t cypress-lambda
$ docker run --rm cypress-lambda

And it actually works!!!1

Running in the Lambda "run" container

The next step is getting it to run inside lambci/lambda:nodejs8.10 – which is more representative of the actual Lambda execution environment – within a nodejs Lambda function handler.

That container is invoked like this:

docker run --rm -v "$PWD":/var/task lambci/lambda:nodejs8.10

That mounts our current working directory into the container at /var/task – which is where our deployment package will end up in Lambda – and, by default, looks for an index.js exporting a function called handler.
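
Just to sanity-check that wiring before Cypress gets involved, the smallest possible handler would be something like this (a throwaway sketch, not part of the final function):

// index.js – minimal handler, just to confirm the docker-lambda wiring
exports.handler = function(event, context) {
  console.log("received event:", JSON.stringify(event));
  context.done(null, { ok: true });
};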

So we need to get the binaries we installed in the build image (Xvfb and Cypress), and all the libraries they depend on, and extract them onto the host (my laptop). Within the build image, I'm going to move everything we know we need into a new directory called lib – we'll do this in a script called pack-lib.sh:

#!/bin/bash

mkdir lib

# copy Xvfb and Cypress binaries' shared library dependencies into lib
ldd /usr/bin/Xvfb \
  | cut -d' ' -f 3 | tr -d '\r' \
  | xargs -I{} cp -R -L {} ./lib/
ldd /root/.cache/Cypress/3.1.5/Cypress/Cypress \
  | cut -d' ' -f 3 | tr -d '\r' \
  | xargs -I{} cp -R -L {} ./lib/

# more dependencies we know we need
cp -L -R /usr/share/X11/xkb ./lib/
cp -L -R /root/.cache/Cypress/3.1.5/Cypress/* ./lib/

And run it toward the end of the Dockerfile:

COPY pack-lib.sh .
RUN ./pack-lib.sh

Now, we can rebuild the image, and back on the host, we can grab all that stuff, plus node_modules and Xvfb:

docker run --name cypress-lambda cypress-lambda sleep 1
docker cp -L cypress-lambda:/app/lib .
docker cp -L cypress-lambda:/app/node_modules .
docker cp -L cypress-lambda:/usr/bin/Xvfb .

And we need to write a handler function, index.js:

const Xvfb = require("./xvfb.js");
const cypress = require("cypress");

process.env.CYPRESS_RUN_BINARY = "/var/task/lib/Cypress";

var xvfb = new Xvfb({
  xvfb_executable: "./Xvfb",
  dry_run: false
});


exports.handler = function(event, context) {
  xvfb.start((err, xvfbProcess) => {
    if (err) return context.done(err);

    function done(err, result) {
      xvfb.stop(err => context.done(err, result));
    }

    cypress
      .run({
        spec: "cypress/integration/sample_spec.js",
        env: {
          DEBUG: "cypress:*",
        },
      })
      .then(results => {
        console.log(results);
        done(null, results);
      })
      .catch(err => {
        console.error(err);
        done(err);
      });
  });
};

This handler depends on ./xvfb.js, another thing I copied from the nightmare-lambda-tutorial. I won't go into the details of that module except to say that it provides a wrapper we can use to ensure that Xvfb is running when our Lambda function is invoked. This module invokes the Xvfb binary in a new child process and sets LD_LIBRARY_PATH to point to wherever we have placed the shared libraries that Xvfb depends on.
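
I won't reproduce the module here, but the gist of such a wrapper is roughly this (a simplified sketch, not the actual xvfb.js – the display number, screen geometry, and library path are illustrative):

// Rough sketch of an Xvfb wrapper (simplified; not the real xvfb.js)
const { spawn } = require("child_process");

function startXvfb(opts, callback) {
  const display = opts.display || ":99";
  const child = spawn(opts.xvfb_executable, [display, "-screen", "0", "1280x1024x24"], {
    // point the dynamic linker at the shared libraries we packed alongside Xvfb
    env: Object.assign({}, process.env, { LD_LIBRARY_PATH: "/var/task/lib" }),
    stdio: "inherit"
  });
  child.on("error", err => callback(err));
  // crude readiness check: give the server a beat to come up, then export
  // DISPLAY so Electron knows where to connect
  setTimeout(() => {
    process.env.DISPLAY = display;
    callback(null, child);
  }, 500);
}

module.exports = { startXvfb };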

So now let's try invoking it:

docker run --rm -v "$PWD":/var/task lambci/lambda:nodejs8.10

Annnnnd it fails:

2019-03-15T02:02:53.504Z	d58c144a-51a0-1a1d-12c2-32bcaecb11d4	[02:02:53]  Verifying Cypress can run /var/task/lib [started]
sh: /usr/bin/xkbcomp: No such file or directory
sh: /usr/bin/xkbcomp: No such file or directory
XKB: Failed to compile keymap
Keyboard initialization failed. This could be a missing or incorrect setup of xkeyboard-config.
(EE)
Fatal server error:
(EE) Failed to activate core devices.(EE)
2019-03-15T02:02:55.214Z	d58c144a-51a0-1a1d-12c2-32bcaecb11d4	[02:02:55]  Verifying Cypress can run /var/task/lib [failed]
2019-03-15T02:02:55.214Z	d58c144a-51a0-1a1d-12c2-32bcaecb11d4	[02:02:55] → Cypress failed to start.

/usr/bin/xkbcomp...what's that all about? It seems like Xvfb assumes /usr/bin/xkbcomp exists, but it doesn't – and there's no way for us to put it there, because in Lambda only /tmp is writable, and our deployment package ends up in /var/task.

Maybe we can use strings to examine the Xvfb binary and learn more:

$ strings Xvfb | grep xkbcomp
XKB: Could not invoke xkbcomp
"%s%sxkbcomp" -w %d %s -xkm "%s" -em1 %s -emp %s -eml %s "%s%s.xkm"
"Errors from xkbcomp are not fatal to the X server"
"The XKEYBOARD keymap compiler (xkbcomp) reports:"
XKB: Could not invoke xkbcomp: not enough memory

So xkbcomp is apparently a keymap compiler. And the format string thing ("%s%sxkbcomp...") seems like it's building up the path to /usr/bin/xkbcomp.

Googling around a bit, I ended up here, which suggested that the path is hardwired, and that maybe we could patch the binary to avoid needing to edit the source and compile Xvfb ourselves.

So that's what we're doing now. Everything is fine.

That format string is generating a command, which invokes xkbcomp, which compiles a keymap. We'll compile our own keymap in advance and hack that format string so that it INSTEAD generates a command that copies our precompiled keymap into place.

Compile the keymap

We'll add a new script, xkb-compile.sh, to run in the build image:

#!/bin/bash

cat <<EOF > default.xkb 
xkb_keymap "default" {
  xkb_keycodes             { include "evdev+aliases(qwerty)" };
  xkb_types                { include "complete" };
  xkb_compatibility        { include "complete" };
  xkb_symbols              { include "pc+us+inet(evdev)" };
  xkb_geometry             { include "pc(pc105)" };
};
EOF

xkbcomp -xkm default.xkb 

And that will compile default.xkm. We'll also update the pack-lib.sh script to put the keymap in with everything else we need to extract from the build container:

cp /app/default.xkm ./lib/

Patch the binary

If we target that format string like this, we can see its offset in the binary (1546856):

bash-4.2$ strings -t d /usr/bin/Xvfb | grep xkbcomp | grep xkm
1546856 "%s%sxkbcomp" -w %d %s -xkm "%s" -em1 %s -emp %s -eml %s "%s%s.xkm"

Then we can use this fact, and the dd command, to replace that string with something else:

bash-4.2$ echo -n 'R="%X%X%d%X%X%X%X%X%X" /bin/cp /var/task/lib/default.xkm /tmp/%s.xkm' \
  | dd bs=1 of=/usr/bin/Xvfb seek=1546856 conv=notrunc

We're creating a variable R just to throw away the first nine interpolated values. Also note that our string happens to match the length of the string it's replacing; if it were shorter we'd need to pad it with some whitespace. Now, when Xvfb is like "time to compile a keymap!" instead of doing that, our command will be executed, copying our pre-compiled keymap into /tmp.

Boom (?)

So, now we have patch.sh:

#!/bin/bash

# https://unix.stackexchange.com/a/315172
# you gotta see it to believe it
echo 'patching Xvfb binary, yolo...'
position=$(strings -t d /usr/bin/Xvfb | grep xkbcomp | grep xkm | cut -d' ' -f1)
# this string needs to match the length of the string it's replacing
echo -n 'R="%X%X%d%X%X%X%X%X%X" /bin/cp /var/task/lib/default.xkm /tmp/%s.xkm' \
  | dd bs=1 of=/usr/bin/Xvfb seek="$position" conv=notrunc

And we update the Dockerfile to run both new scripts in the build image:

COPY xkb-compile.sh .
RUN ./xkb-compile.sh

COPY patch.sh .
RUN ./patch.sh

And once again build and schlep the stuff from the build container onto the host, and run:

docker run --rm -v "$PWD":/var/task lambci/lambda:nodejs8.10

And now, Cypress starts but hangs forever at Verifying Cypress can run. I found this issue and tried setting DEBUG for more informative logs, but nothing changed.

At this point I'm too deep to back out, so I start hacking at the Cypress source in node_modules/cypress/lib/exec/run.js. I found this line:

return verify.start().then(run);

And cut out the verification step altogether, replacing it with:

return run();

Now, I get the real error that was apparently being swallowed in the verification step:

A JavaScript error occurred in the main process
Uncaught Exception:
Error: Failed to get 'appData' path
    at Object.<anonymous> (/var/task/lib/resources/electron.asar/browser/init.js:149:39)

It turns out Electron, like other browsers, has a per-user application data directory, which defaults to ~/.config on Linux. That's a problem because Electron wants to write to that directory, and we can only write to /tmp. We can override the location with XDG_CONFIG_HOME, though, so we'll do that in index.js:

process.env.XDG_CONFIG_HOME = "/tmp";

And revert the Cypress source change and run it again, and it works!!!

Step 3: Extract dependencies, get 'em into Lambda

We've already figured out how to extract the dependencies from the build container. Now, we just need to package and deploy it all to Lambda.

But there is a big problem:

$ du -sh lib
531M    lib

All the dependencies we packed into lib take up 531MB. A Lambda function's max deployment package size (unzipped) is 250MB, and we can only write to /tmp, which is capped at 512MB.

How big is lib compressed?

$ tar czf lib.tar.gz lib/
$ du -sh lib.tar.gz
144M    lib.tar.gz

Ok. We'll take a two-pronged approach to this problem:

  1. Trim whatever we can in lib to get it under 512MB
  2. Compress lib and jam it into the deployment package, then at runtime extract it to /tmp.

I found several node_modules directories in lib and ran node-prune on a few of the larger ones, which brought lib down to about 490MB. Further pruning would be better but it's good enough for now!

We add this to pack-lib.sh in the build image:

curl -sfL https://install.goreleaser.com/github.com/tj/node-prune.sh \
  | bash -s -- -b /usr/local/bin
node-prune lib/resources/app/packages/server
node-prune lib/resources/app/packages/https-proxy
node-prune lib/resources/app/packages/electron

And after packing the lib, back in the Dockerfile, tar and gzip it up:

RUN GZIP=-9 tar cvzf lib.tar.gz ./lib

We augment our handler in index.js so each cold start extracts the tarball to /tmp/lib before running Cypress:

let libExtracted = false;

exports.handler = function(event, context) {
  if (!libExtracted) {
    child_process.execSync("rm -rf /tmp/* && tar xzf lib.tar.gz -C /tmp", {
      stdio: "inherit"
    });
    libExtracted = true;
  }

And any of our previous references to /var/task/lib need to be changed to /tmp/lib (for the binary patch thing, this means padding the new string with whitespace so it maintains the same length!).
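
In index.js, for example, the Cypress binary path becomes:

process.env.CYPRESS_RUN_BINARY = "/tmp/lib/Cypress";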

I've been invoking the handler function locally like this:

docker run --rm -v "$PWD":/var/task lambci/lambda:nodejs8.10

The directory where my deployment package is mounted, /var/task, is writable within the container, which has kept me from hitting problems that I'll hit in the actual Lambda execution environment, where /var/task is read-only.

So really, I should've been tacking :ro onto the volume mount, to make it read-only and hit problems sooner:

docker run --rm -v "$PWD":/var/task:ro lambci/lambda:nodejs8.10

And adding that leads us to our next failure that we'll also hit on AWS:

Error reading from: `/var/task/cypress.json`
`Error: EROFS: read-only file system, access '/var/task'`
{"failures":1,"message":"Could not find Cypress test run results"}

Cypress is trying to write to /var/task/cypress (and maybe even trying to open /var/task/cypress.json in read/write mode?) because it sees /var/task as the project root.

I arrived at a very lazy solution of moving all the cypress stuff to /tmp at handler runtime:

  if (!libExtracted) {
    child_process.execSync("rm -rf /tmp/* && tar xzf lib.tar.gz -C /tmp", {
      stdio: "inherit"
    });
    child_process.execSync("cp /var/task/cypress.json /tmp/cypress.json", {
      stdio: "inherit"
    });
    child_process.execSync("cp -R /var/task/cypress /tmp", {
      stdio: "inherit"
    });
    libExtracted = true;
  }

And telling Cypress where to find everything:

    cypress
      .run({
        spec: "/tmp/cypress/integration/sample_spec.js",
        env: {
          DEBUG: "cypress:*",
        },
        project: "/tmp"
      })

And that resolved that.

Send it

We're finally ready to deploy it and see what happens. I moved all the existing files into a directory called lambda, and created lambda.tf alongside it, which defines everything we need in Terraform:

provider "aws" {
  region                  = "us-west-2"
  shared_credentials_file = "~/.aws/credentials"
  profile                 = "default"
}

resource "aws_s3_bucket" "cypress_lambda" {
  bucket = "cypress-lambda"
  acl    = "private"
}

data "archive_file" "lambda" {
  type        = "zip"
  source_dir  = "lambda/"
  output_path = "lambda.zip"
}

resource "aws_s3_bucket_object" "lambda" {
  bucket = "${aws_s3_bucket.cypress_lambda.id}"

  key    = "lambda.zip"
  source = "${data.archive_file.lambda.output_path}"
  etag   = "${md5(file("lambda.zip"))}"
}

resource "aws_lambda_function" "cypress_runner" {
  function_name = "cypress_runner"
  s3_bucket     = "${aws_s3_bucket.cypress_lambda.id}"
  s3_key        = "${aws_s3_bucket_object.lambda.key}"
  role          = "${aws_iam_role.lambda.arn}"
  handler       = "index.handler"
  runtime       = "nodejs8.10"
  memory_size   = 3008
  source_code_hash = "${md5(file("${data.archive_file.lambda.output_path}"))}"
  timeout       = 90 
}

resource "aws_iam_role" "lambda" {
  name = "lambda"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "lambda" {
  name = "lambda_init"
  role = "${aws_iam_role.lambda.id}"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*",
      "Effect": "Allow"
    }
  ]
}
EOF
}

In addition to defining the Lambda function, this zips up lambda/ into lambda.zip, our deployment package, and uploads it to S3 (rather than uploading the zip directly to Lambda, which it's too large for). It also sets up an IAM role with the minimal permissions the function needs to write its logs to CloudWatch.

After doing terraform init and terraform apply, I navigate to the AWS console and trigger the function there.

Meanwhile, in CloudWatch: as you can imagine, it didn't work. Cypress timed out waiting for the browser to connect.

A few hours later I was doing something else and I remembered Marco Lüthy's post about running headless Chrome on Lambda. Something about the critical importance of /dev/shm???

We're running Chromium 59, via Electron, via Cypress. As I understand it, that version of Chromium made the hardcoded assumption of being able to write to /dev/shm and there was no way to disable it. Marco changed the Chromium source and compiled it to work around this. That doesn't seem like it's going to work here.

Let's see if we can find any hardcoded references to /dev/shm in Electron:

$ strings lib/Cypress | grep 'dev\/shm'
/dev/shm/
/dev/shm
/dev/shm.  Try 'sudo chmod 1777 /dev/shm' to fix.

Well, there ya go. We're already out here patching binaries. Let's just change the first two to /tmp/shm and see what happens???

Everything. Is. Fine.

We'll update patch.sh so that in the build image, now we also patch the Cypress binary to substitute in /tmp/shm:

position=$(strings -t d /app/lib/Cypress | grep '/dev\/shm\/' | cut -d' ' -f1)
echo -n '/tmp/shm/' | dd bs=1 of=/app/lib/Cypress seek="$position" conv=notrunc

position=$(strings -t d /app/lib/Cypress | grep '/dev\/shm' -m 1 | cut -d' ' -f1)
echo -n '/tmp/shm' | dd bs=1 of=/app/lib/Cypress seek="$position" conv=notrunc

And add one final setup statement in the handler:

child_process.execSync("mkdir /tmp/shm", { stdio: "inherit" });
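
For context, here's roughly where that slots into the handler – inside the same cold-start guard as the tarball extraction (the -p flag is a small tweak so a warm container that already has /tmp/shm doesn't error):

  if (!libExtracted) {
    child_process.execSync("rm -rf /tmp/* && tar xzf lib.tar.gz -C /tmp", {
      stdio: "inherit"
    });
    child_process.execSync("mkdir -p /tmp/shm", { stdio: "inherit" });
    // ...copy cypress.json and the cypress directory to /tmp as before...
    libExtracted = true;
  }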

And rebuild the image, and schlep the deps, and terraform apply to send the latest to AWS, and trigger the function again.

And miraculously, it works:

Cha-ching!

A cold start runs in about 15 seconds total, and max memory used is 1289 MB – way less than I provisioned (3008 MB). The "headed" Electron variant also runs fine.

Conclusion

I have no idea how reliable this is; at this point my only claim is that it is possible to run Cypress on Lambda today.  

As soon as Cypress supports headless Chromium this will become unnecessary. But I hope it's a useful look into what's involved in getting something tricky running on Lambda.

You can find the GitHub repo here.

Part two will build upon this to parallelize a Cypress end-to-end test suite. Subscribe to my newsletter to be notified when it comes out!
