Earlier we wrote about stress testing with Blazemeter, where you could learn how to crash your site without worrying about the infrastructure. So why did I even bother to write this post about the do-it-yourself approach? We have a complex frontend app, where it would be nearly impossible to faithfully simulate all the network activity over a long period of time. We wanted to use a browser-based testing framework, namely WebdriverIO with some custom Node.js packages, on Blazemeter, and it proved quicker to manage the infrastructure ourselves and have full control of the environment. What happened in the end? Using a public cloud provider (in our case, Linode), we programmatically launched the needed number of machines temporarily, provisioned them with the proper stack, and executed the WebdriverIO test. With Ansible, the Linode CLI and WebdriverIO, the whole process is repeatable and scalable. Let's see how!
Infrastructure phase
Any decent cloud provider has an interface to provision and manage cloud machines from code. Given this, if you need an arbitrary number of computers to launch the test, you can have them for 1-2 hours (100 endpoints for the price of a coffee, how does that sound?).
There are many options to dynamically and programmatically create virtual machines for the sake of stress testing. Ansible offers dynamic inventory; however, the cloud provider of our choice wasn't included in the latest stable version of Ansible (2.7) at the time of this post. Also, the solution below keeps the infrastructure phase independent: any kind of provisioning (pure shell scripts, for instance) is possible with minimal adaptation.
Let's follow the steps in the installation guide of the Linode CLI. The key is to have the configuration file at ~/.linode-cli with the credentials and the machine defaults. Afterwards you can create a machine with a one-liner:
linode-cli linodes create --image "linode/ubuntu18.04" --region eu-central --authorized_keys "$(cat ~/.ssh/id_rsa.pub)" --root_pass "$(date +%s | sha256sum | base64 | head -c 32 ; echo)" --group "stress-test"
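The one-liner above leans on the defaults stored in that configuration file. As a rough sketch (the token is a placeholder, and the type slug is just an example, not necessarily what we used):

```ini
# ~/.linode-cli -- minimal illustration; fill in your own values.
[DEFAULT]
token = <your-personal-access-token>
region = eu-central
type = g6-standard-1
image = linode/ubuntu18.04
```

Anything set here can still be overridden per invocation, as the create command above does.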
Given the specified public key, password-less login will be possible. However, this is far from enough to start provisioning: booting takes time and the SSH server is not available immediately. Our situation is also special in that after the stress test, we would like to drop the instances immediately, together with the test execution, to minimize costs.
Waiting for the machines to boot is a slightly longer snippet, as the CSV output is robustly parsable:
## Wait for boot, to be able to SSH in.
while linode-cli linodes list --group=stress-test --text --delimiter ";" --format 'status' --no-headers | grep -v running
do
  sleep 2
done
However, the SSH connection is likely not yet possible; let's wait for the port to be open:
for IP in $(linode-cli linodes list --group=stress-test --text --delimiter ";" --format 'ipv4' --no-headers);
do
  while ! nc -z "$IP" 22 < /dev/null > /dev/null 2>&1; do
    sleep 1
  done
done
You may realize that this overlaps with the wait for booting. The benefit of separating the two is that it allows more sophisticated error handling and reporting.
Afterwards, deleting all machines in our group is trivial:
for ID in $(linode-cli linodes list --group=stress-test --text --delimiter ";" --format 'id' --no-headers);
do
  linode-cli linodes delete "$ID"
done
So after packing everything into one script, with the Ansible invocation in the middle, we end up with stress-test.sh:
#!/bin/bash

LINODE_GROUP="stress-test"
NUMBER_OF_VISITORS="$1"
NUM_RE='^[0-9]+$'

if ! [[ $NUMBER_OF_VISITORS =~ $NUM_RE ]]; then
  echo "error: Not a number: $NUMBER_OF_VISITORS" >&2; exit 1
fi

if (( NUMBER_OF_VISITORS > 100 )); then
  echo "warning: Are you sure that you want to create $NUMBER_OF_VISITORS linodes?" >&2; exit 1
fi

echo "Reset the inventory file."
cat /dev/null > hosts

echo "Create the needed linodes, populate the inventory file."
for i in $(seq "$NUMBER_OF_VISITORS");
do
  linode-cli linodes create --image "linode/ubuntu18.04" --region eu-central --authorized_keys "$(cat ~/.ssh/id_rsa.pub)" --root_pass "$(date +%s | sha256sum | base64 | head -c 32 ; echo)" --group "$LINODE_GROUP" --text --delimiter ";"
done

## Wait for boot.
while linode-cli linodes list --group="$LINODE_GROUP" --text --delimiter ";" --format 'status' --no-headers | grep -v running
do
  sleep 2
done

## Wait for the SSH port.
for IP in $(linode-cli linodes list --group="$LINODE_GROUP" --text --delimiter ";" --format 'ipv4' --no-headers);
do
  while ! nc -z "$IP" 22 < /dev/null > /dev/null 2>&1; do
    sleep 1
  done
  ### Collect the IP for the Ansible hosts file.
  echo "$IP" >> hosts
done

echo "The SSH servers became available."

echo "Execute the playbook."
ansible-playbook -e 'ansible_python_interpreter=/usr/bin/python3' -T 300 -i hosts main.yml

echo "Cleanup the created linodes."
for ID in $(linode-cli linodes list --group="$LINODE_GROUP" --text --delimiter ";" --format 'id' --no-headers);
do
  linode-cli linodes delete "$ID"
done
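One caveat: the cleanup loop is only reached if the script runs to the end. If it is interrupted (Ctrl+C, a network error, a failing playbook with set -e), the linodes keep running and billing. A trap makes the teardown unconditional; a sketch that could be placed near the top of stress-test.sh:

```shell
# Delete every linode in the group, no matter how the script exits.
LINODE_GROUP="stress-test"

cleanup() {
  for ID in $(linode-cli linodes list --group="$LINODE_GROUP" --text --delimiter ";" --format 'id' --no-headers); do
    linode-cli linodes delete "$ID"
  done
}

# Run cleanup on any exit path: normal end, error, or interrupt.
trap cleanup EXIT
```

With this in place, the explicit cleanup loop at the end of the script becomes redundant.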
Provisioning phase
As written earlier, Ansible is just one option, albeit a popular one, to provision machines. For such a test, even a bunch of shell commands would be sufficient to set up the stack. However, once you have tasted working with infrastructure in a declarative way, it becomes the first choice.
If this is your first experience with Ansible, check out the official documentation. In a nutshell, we just declare in YAML how the machine(s) should look, and what packages it should have.
In my opinion, a simple playbook like the one below is readable and understandable as-is, without any prior knowledge. So our main.yml is the following:
- name: WDIO-based stress test
  hosts: all
  remote_user: root

  tasks:
    - name: Update and upgrade apt packages
      become: true
      apt:
        upgrade: yes
        update_cache: yes
        cache_valid_time: 86400

    - name: WDIO and Chrome dependencies
      package:
        name: "{{ item }}"
        state: present
      with_items:
        - unzip
        - nodejs
        - npm
        - libxss1
        - libappindicator1
        - libindicator7
        - openjdk-8-jre

    - name: Download Chrome
      get_url:
        url: "https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb"
        dest: "/tmp/chrome.deb"

    - name: Install Chrome
      shell: "apt install -y /tmp/chrome.deb"

    - name: Get Chromedriver
      get_url:
        url: "https://chromedriver.storage.googleapis.com/73.0.3683.20/chromedriver_linux64.zip"
        dest: "/tmp/chromedriver.zip"

    - name: Extract Chromedriver
      unarchive:
        remote_src: yes
        src: "/tmp/chromedriver.zip"
        dest: "/tmp"

    - name: Start Chromedriver
      # Redirect all file descriptors so the shell module does not hang
      # waiting for the backgrounded process.
      shell: "nohup /tmp/chromedriver </dev/null >/dev/null 2>&1 &"

    - name: Sync the source code of the WDIO test
      copy:
        src: "wdio"
        dest: "/root/"

    - name: Install WDIO
      shell: "cd /root/wdio && npm install"

    - name: Start date
      debug:
        var: ansible_date_time.iso8601

    - name: Execute
      shell: "cd /root/wdio && ./node_modules/.bin/wdio wdio.conf.js --spec specs/stream.js"

    - name: End date
      debug:
        var: ansible_date_time.iso8601
We install the dependencies for Chrome, Chrome itself, WDIO, and then we can execute the test. For this simple case, that’s enough. As I referred to earlier:
ansible-playbook -e 'ansible_python_interpreter=/usr/bin/python3' -T 300 -i hosts main.yml
What's the benefit over shell scripting? For this particular use-case, mostly that Ansible makes sure everything happens in parallel and that we have sufficient error handling and reporting.
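One thing worth knowing about that parallelism: by default Ansible runs only 5 hosts at a time, so for a 100-machine test the fork count should be raised, for example in an ansible.cfg next to the playbook (disabling host key checking also saves an interactive prompt for the freshly created machines):

```ini
[defaults]
forks = 100
host_key_checking = False
```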
Test phase
We love tests. Our starter kit has WebdriverIO tests (among many other types of tests), so we picked it to stress test the full stack. If you are familiar with JavaScript or Node.js, the test code will be easy to grasp:
const assert = require('assert');

describe('podcasts', () => {
  it('should be streamable', () => {
    browser.url('/');
    $('.contact .btn').click();
    browser.url('/team');
    const menu = $('.header.menu .fa-bars');
    menu.waitForDisplayed();
    menu.click();
    $('a=Jobs').click();
    menu.waitForDisplayed();
    menu.click();
    $('a=Podcast').click();
    $('#mep_0 .mejs__controls').waitForDisplayed();
    $('#mep_0 .mejs__play button').click();
    $('span=00:05').waitForDisplayed();
  });
});
This is our spec file, which is the essence, alongside the configuration.
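The configuration itself is not shown here, but a minimal wdio.conf.js for this kind of headless run could look roughly like this (a sketch: the baseUrl and the timeout values are illustrative placeholders, not our actual values):

```javascript
// A minimal sketch of a wdio.conf.js; values are illustrative.
const config = {
  runner: 'local',
  baseUrl: 'https://example.com',  // the site under test (placeholder)
  specs: ['./specs/stream.js'],
  capabilities: [{
    browserName: 'chrome',
    'goog:chromeOptions': {
      // Headless Chrome, since the cloud machines have no display.
      args: ['--headless', '--no-sandbox', '--disable-gpu'],
    },
  }],
  framework: 'mocha',
  waitforTimeout: 30000,  // generous waits for a server under load
};

exports.config = config;
```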
Could we do it with a bunch of requests in JMeter or Gatling? Almost. The icing on the cake is where we stress test the streaming of the podcast: we simulate a user who listens to the podcast for 10 seconds. For any frontend-heavy app, realistic stress testing requires a real browser, and WDIO provides us exactly that.
Test execution phase
After making the shell script executable (chmod 750 stress-test.sh), we can execute the test either:
- with one visitor from one virtual machine:
./stress-test.sh 1
- with 100 visitors from 100 virtual machines for each:
./stress-test.sh 100
with the same simplicity. However, for very large-scale tests, you should think about some bottlenecks, such as the capacity of the datacenter on the testing side. It might make sense to randomly pick a datacenter for each testing machine.
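Picking a random region per machine is a small change to the create loop; a bash sketch (the region slugs are an example set, check the CLI for the current list):

```shell
# Pick a random region for each machine to spread the traffic sources.
REGIONS=(eu-central us-east ap-northeast eu-west)
REGION=${REGIONS[$RANDOM % ${#REGIONS[@]}]}
echo "creating linode in $REGION"
```

The chosen value would then be passed as --region "$REGION" to linode-cli linodes create.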
The test execution consists of two main parts: bootstrapping the environment and executing the test itself. If bootstrapping the environment takes up too large a share of the total runtime, one strategy is to prepare a Docker image, and instead of creating the environment again and again, just use the image. In that case, it's a good idea to check out a container-specific hosting solution instead of standalone virtual machines.
Would you like to try it out now? Just do a git clone https://github.com/Gizra/diy-stress-test.git!
Result analysis
For such a distributed DIY test, analyzing the results can be challenging. For instance, how would you measure requests per second for a browser-based test like WebdriverIO?
For our case, the analysis happens on the other side. Almost all the hosting solutions we encounter support New Relic, which can help a lot with such an analysis. Our test was DIY, but the result handling was outsourced. As a bonus, it helps to track down bottlenecks too, so a similar solution for your hosting platform can be applied as well.
However, what if you'd like to gather the results together after such a distributed test execution? Without going into detail, you may study the fetch module of Ansible, which lets you collect a result log from all the test servers and store it locally in a central place.
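Such a task could look like the sketch below (the log path is hypothetical; adjust it to wherever your WDIO run writes its output):

```yaml
- name: Fetch the test log from every machine
  fetch:
    src: /root/wdio/wdio.log  # hypothetical log location
    dest: "results/{{ inventory_hostname }}/"
```

Each host's log ends up in its own directory under results/, ready for local aggregation.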
Conclusion
It was a great experience: after we faced some difficulties with a hosted stress test platform, in the end we were able to recreate a solution from scratch without much more development time. If your application also needs special, unusual tools for stress testing, you might consider this approach. All the chosen components, such as Linode, WebdriverIO or Ansible, are easily replaceable with your favorite alternatives. Geographically distributed stress testing, fully realistic website visitors with heavy frontend logic, low-cost stress testing: it seems now you're covered!