Chapter 20

Testing and deployment

Testing: it’s one of those potentially dull and yet highly contentious topics. I’ll try to avoid the dull, but there may be some contention brewing on the horizon.

Most programmers work on software that others have created and that others will develop after them. Sharing the code creates a community in which the team of developers live and interact. Preserving code quality not only has a direct impact on how well a developer can do their job; it is also a moral obligation to their colleagues in the community.

Add to this the responsibility to deliver quality to the paying customer, the direct relationship between app uptime and revenue, plus the difficulty of testing in a complex web ecosystem, and it’s easy to see why web developers have a love/hate relationship with testing.

What to test

Web app tests fall into five main categories:

  1. Functional testing
  2. Compatibility testing
  3. Performance testing
  4. Security testing
  5. Usability testing

All these tests add up to a lot of time, but do you really need them all? As usual, it depends. Let common sense prevail. If your app is used in a hospital to prescribe medication doses, or it’s a critical financial component in a large enterprise, don’t skimp on the tests. On the other hand, I suspect that most of you are building a spanking new web app that doesn’t impact human safety or hundreds of jobs. The critical thing to remember for new ideas is that your app will probably change.

If you’re following this book’s advice to build a minimum viable product, you will do the bare minimum necessary to get something out the door as quickly as possible to test the waters. Once you know more about your market, you can refine your app and try again, each time inching closer to a product that your customers want.

Focus your tests around the MVP process and apply the same minimal approach. Test what you need to ensure that your product is given a fair chance in the market, but don’t worry about scaling to 100,000 customers or about the long-term testability of your code: it is likely to change significantly in the first few iterations. Your biggest problems right now are identifying desirable features and getting people to use the app.

img-20_1

Some types of test require greater investment than others before they pay off.

Some tests yield improvements quickly (the left graph) whereas others require more investment before they start to pay off (the right graph). Be lazy and target the quick wins for your MVP. Invest in medium-term tests when your app finds a foothold in the market, and as your codebase undergoes less change between revisions.

Functional testing

The lazy version of functional testing is more accurately called system testing, which confirms that the app works as a whole but doesn’t validate individual functions of the code.

First, you create a simple test plan listing the primary and secondary app features and paths to test before each release. Next, you or your team should personally use the app to run through the test plan and check for major problems. Finally, ask some friends or beta testers to use the app and report any problems.

That’s it. Any bugs that aren’t discovered through regular activity can probably be disregarded for the time being. It’s not comprehensive or re-usable, but for an MVP app with minimal features it should cover the basics. When your app gains traction and the codebase begins to settle between iterations, it’s time to progress to a medium-term investment in functional testing: automated unit and interface tests that enable efficient regression tests of changes between versions – don’t worry, I’ll decipher what these mean in the next section.

Unit tests

A unit test is a piece of code designed to verify the correctness of an individual function (or unit) of the codebase. It does this through one or more assertions: statements of conditions and their expected results.

To test a function that calculates the season for a given date and location, a unit test assertion could be, “I expect the answer ‘summer’ for 16 July in New York”. A developer normally creates multiple test assertions for non-trivial functions, grouped into a single test case. In the previous example, additional assertions in the test case should check a variety of locations, dates and expected seasons.
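
Here's what that test case might look like in PHPUnit, assuming a hypothetical season() function that accepts a date and a location name:

    use PHPUnit\Framework\TestCase;

    class SeasonTest extends TestCase
    {
        // Each assertion pairs a condition with its expected result.
        public function testNorthernHemisphereSummer()
        {
            $this->assertSame('summer', season('2011-07-16', 'New York'));
        }

        public function testSouthernHemisphereWinter()
        {
            // The same date should yield the opposite season south of the equator.
            $this->assertSame('winter', season('2011-07-16', 'Sydney'));
        }
    }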

Not all functions are as straightforward to test. Many rely on data from a database or interactions with other pieces of code, which makes them difficult to test in isolation. Solutions exist for these scenarios, such as mock objects and dependency injection, but they create a steep initial learning curve for those new to unit testing.
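
A brief sketch of dependency injection may help: the hypothetical calculator below receives its tax-rate source through its constructor, so a test can substitute a fixed stub for the real database-backed implementation.

    interface TaxRateProvider
    {
        public function rateFor(string $region): float;
    }

    class ShippingCalculator
    {
        private TaxRateProvider $rates;

        // The dependency is supplied from outside rather than created
        // internally, so tests can swap in a stub for the database object.
        public function __construct(TaxRateProvider $rates)
        {
            $this->rates = $rates;
        }

        public function totalFor(float $price, string $region): float
        {
            return $price * (1 + $this->rates->rateFor($region));
        }
    }

    // In a test, a fixed-rate stub stands in for the database:
    $stub = new class implements TaxRateProvider {
        public function rateFor(string $region): float { return 0.2; }
    };
    $calculator = new ShippingCalculator($stub);
    assert(abs($calculator->totalFor(100.0, 'UK') - 120.0) < 0.0001);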

Unit testing also takes some investment before it pays off. Even a simple web app can contain hundreds of individual functions. Not all require a unit test, but even if the initial tests are focused solely on functions that contain critical logic, a significant number may be required to catch all of the important bugs.

The effort does eventually pay off, however. Commonly stated benefits of unit testing include:

  1. Bugs are caught early, while they are still cheap to fix.
  2. Code can be refactored with confidence, because regressions are flagged immediately.
  3. The tests double as living documentation of how each function is expected to behave.
  4. Writing testable code encourages a modular, loosely coupled design.

Some developers find unit testing so beneficial that they make it the initial scaffolding from which the app is developed. Under this test-driven development approach, each test is written prior to the functional code, to define the expectations of the function. When it is first run, the test should fail. The developer then writes the minimum amount of code necessary to satisfy the test. Confidence in the passing test then enables the developer to iteratively refactor and improve the code.

Whether or not you decide to adopt test-driven development, hundreds of frameworks are available to ease your implementation of automated unit testing. The Wikipedia list1 is a great place to start.

Automated interface testing

Web app logic is shifting from the server to the browser. As your app progresses from a simple MVP to a more mature product, the interface code will become more elaborate, and manual tests more cumbersome.

Automated interface tests share the same assertion principle as unit tests: conditions are set and the results are checked for validity. In the case of interface tests, the conditions are established through a number of virtual mouse clicks, form interactions and keystrokes that simulate a user’s interaction with the app. The result is usually confirmed by checking a page element for a word, such as a success message following a form submission.

In terms of automated interface test frameworks, a sole developer or small team of developers who are intimately familiar with the interface code may prefer the strictly code-oriented approach of a tool such as Watir2. For larger teams or apps with particularly dynamic interfaces, the graphical test recording of Selenium3 may be better suited. Both tools support automated tests on multiple platforms and browsers.

Once you’ve started automated interface testing it’s easy to get sucked in, as you try to cover every permutation of user journey and data input. For the sake of practicality, it’s best to ignore business logic in the early days: tests for valid shipping and tax values are better served in unit tests. Instead, create tests for critical workflows like user registration and user login – verify paths through the interface rather than value-based logic.
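
As a sketch, a login workflow test using Selenium's WebDriver interface through the php-webdriver bindings might look like this (the URL, element IDs and expected text are all hypothetical):

    use Facebook\WebDriver\Remote\DesiredCapabilities;
    use Facebook\WebDriver\Remote\RemoteWebDriver;
    use Facebook\WebDriver\WebDriverBy;

    // Connect to a running Selenium server.
    $driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::firefox());

    // Simulate the user's path: load the page, fill in the form, submit.
    $driver->get('http://localhost/login');
    $driver->findElement(WebDriverBy::id('email'))->sendKeys('test@example.com');
    $driver->findElement(WebDriverBy::id('password'))->sendKeys('secret');
    $driver->findElement(WebDriverBy::id('login-button'))->click();

    // Verify the path, not business logic: a welcome message should appear.
    $message = $driver->findElement(WebDriverBy::cssSelector('.welcome'))->getText();
    assert(strpos($message, 'Welcome') !== false);

    $driver->quit();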

Compatibility testing

Print designers and television companies enjoy a luxury unknown to web developers: the limits of their media. We must contend with the rapid proliferation of devices and software with differing capabilities, and must squeeze as much compatibility as we can out of our apps to attract and retain the largest possible audience.


The main causes of web app incompatibility are:

  1. Differences in rendering engines, JavaScript engines and standards support between browsers and browser versions.
  2. Variations in operating systems, hardware and screen sizes.
  3. Differences in user capabilities, including accessibility needs.

That seems like an awful lot to think about, but the extent to which the technical compatibility factors affect the success of your app will depend on your target market.

img-20_2

Markets with high and low technical sophistication tend to exhibit less diversity in web software and devices

If your app is targeted at web developers or other technology-savvy users (AKA geeks), you can presume that a majority will have recent web browser versions and sophisticated hardware. Alternatively, for an app designed for a traditional financial enterprise, you may be able to assume a majority of users with Microsoft operating systems and Internet Explorer. Only a mass-market app (Google, Facebook and the rest) must consider the widest possible variation in hardware, software and user capabilities from the get-go.

If you don’t trust your gut audience stereotypes to refine the range of your compatibility tests, use analytics data from your teaser page or MVP advertising campaign. Build a quantified profile of your target market and aim to provide compatibility for at least 90% of the users based on the largest share of browser/operating system/version permutations.
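
The calculation itself is simple: sort the permutations by share and accumulate until the 90% target is reached. A minimal sketch with invented analytics figures:

    // Hypothetical visitor shares from your analytics data, largest first.
    $shares = [
        'Windows / Chrome'  => 0.34,
        'Windows / Firefox' => 0.27,
        'Windows / IE8'     => 0.16,
        'Mac / Safari'      => 0.09,
        'Windows / IE7'     => 0.06,
        'Linux / Firefox'   => 0.04,
        'Mac / Chrome'      => 0.04,
    ];

    $cumulative  = 0.0;
    $testTargets = [];
    foreach ($shares as $platform => $share) {
        $testTargets[] = $platform;
        $cumulative   += $share;
        if ($cumulative >= 0.9) {
            break; // Enough permutations to cover 90% of users.
        }
    }
    print_r($testTargets); // The minimum set of platforms to test.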

Worst case: if you can’t assume or acquire any personalised statistics, test compatibility for browsers listed in the Yahoo! A-grade browser support chart6.

It’s important to realise that not all market segments offer equal value. To use a sweeping generalisation as an example, you may find that Mac Safari users constitute a slightly smaller share of your visitors than Windows Firefox users, but they convert to paid customers at twice the rate. Measure conversion as early as possible in your app lifecycle (you can start with users who sign up for email alerts on the teaser page) and prioritise compatibility tests accordingly.

You can overcome cross-browser inconsistencies if you take advantage of mature front-end JavaScript libraries, CSS frameworks and a CSS reset style sheet. Some additional problems may be resolved by validating your HTML and CSS; use the W3C online validator7 or install a browser validation plug-in to detect and correct mistakes.

To achieve accurate cross-browser compatibility you’ll need regular access to a variety of browsers, versions and operating systems. You may find that online services like Browsershots8 or Spoon9 suit your needs, but to regularly test dynamic web interfaces, nothing beats having a fast local install of the browser, either as a native installation or in a local virtual environment, such as VMware10 or Parallels11. Microsoft handily makes virtual images of IE6, 7 and 8 available12. If you opt for virtualisation, upgrade your computer’s memory to appreciably improve performance.


Aiming for compatibility on all four major browsers, many developers favour a particular testing order:

  1. Google Chrome. Excellent modern standards support, efficient rendering, a speedy JavaScript engine and integrated debugging tools make Chrome a great baseline for standards-based browser compatibility.
  2. Safari. As the internal WebKit rendering engine is shared with Chrome, achieving Safari compatibility should be a straightforward second step.
  3. Firefox. Firefox can feel a little slower in the regular tweak-and-refresh cycle, but a high adherence to standards and mature debugging tools ease the move from Chrome and Safari to Firefox.
  4. Internet Explorer. Finally, once the standards-based browsers have been satisfied without too many tweaks to the code, it’s time to slog through the browser-specific workarounds for the several popular but often standards-averse versions of Internet Explorer.

Like cross-browser tests, accessibility tests are offered by a number of free online web services, (WAVE13, for example). Automatic tests can only detect a subset of the full spectrum of accessibility issues, but many of the better services behave as guided evaluations that walk you through the manual tests. For apps in development that aren’t live on the web, or for a faster test-fix-test workflow, you may prefer to use a browser plug-in, like the Firefox Accessibility Extension14.


Some of the most important accessibility issues15 to test include:

  1. All interactive elements can be reached and operated with the keyboard alone.
  2. Images carry meaningful alternative text.
  3. Text has sufficient colour contrast against its background.
  4. Form fields are properly labelled.
  5. Headings follow a logical structure for screen reader navigation.

Performance testing

Web app responsiveness can be evaluated through performance tests, load tests and stress tests. For the sake of practicality, you may want to consider starting with simple performance tests and hold off on the more exhaustive load and stress tests until you’ve gained some customers. Luckily for us, scalable cloud hosting platforms enable us to be slightly lazy about performance optimisation.

Performance tests

Performance tests measure typical response times for the app: how long do the key pages and actions take to load for a single user? This is an easy but essential test. You’ll first need to configure your database server and web application server with profilers to capture timing information, for example with Microsoft SQL Server Profiler (SQL Server), MySQL Slow Query Log (MySQL), dotTrace16 (.NET), or XDebug17 with Webgrind18 (PHP). You can then use the app, visiting the most important pages and performing the most common actions. The resultant profiler data will highlight major bottlenecks in the code, such as badly constructed SQL queries or inefficient functions. The Yahoo! YSlow19 profiler highlights similar problems in front-end code.
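
For example, the MySQL slow query log and the XDebug profiler are both switched on through configuration. The directive names below apply to MySQL 5 and XDebug 2; check the documentation for your own versions:

    ; php.ini – XDebug 2 profiler (output is readable in Webgrind)
    xdebug.profiler_enable = 1
    xdebug.profiler_output_dir = /tmp/profiler

    # my.cnf – log any query that takes longer than one second
    slow_query_log      = 1
    slow_query_log_file = /var/log/mysql/slow.log
    long_query_time     = 1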

Of course, this isn’t an accurate indication of the final production performance, with only one or two people accessing the app on a local development server, but it’s valuable nonetheless. You can eliminate the most serious obstructions and establish a baseline response time, which can be used to configure a timed performance test as an automated unit or interface test.
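
A crude version of that automated check might look like the following PHPUnit sketch, assuming a hypothetical key page and a 500ms baseline:

    use PHPUnit\Framework\TestCase;

    class ResponseTimeTest extends TestCase
    {
        public function testDashboardLoadsWithinBaseline()
        {
            $start = microtime(true);
            file_get_contents('http://localhost/dashboard'); // Hypothetical key page.
            $elapsed = microtime(true) - $start;

            // Fail the test if the baseline response time regresses.
            $this->assertLessThan(0.5, $elapsed, 'Dashboard exceeded the 500ms baseline');
        }
    }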

img-20_3

Yahoo! YSlow

Load tests

Load tests simulate the expected load on the app by automatically creating virtual users with concurrent requests to the app. Load is normally incremented up to the maximum expected value to identify the point at which the application becomes unresponsive.

For example, if an app is expected to serve 100 users simultaneously, the load test might begin at 10 users, each of whom makes 500 requests to the app. The performance will be measured and recorded before increasing to 20 users, who each make 500 requests, and so forth.

New bottlenecks may appear in your web app profiling results – a need for caching, better use of file locking, and other issues that didn’t surface in the simpler single-user performance test. Additionally, load tests can identify hard limits or problems with server resources, like memory and disk space, so be sure to also profile your web server with something like top20 (Linux) or Performance Monitor21 (Windows).

Because load tests evaluate the server environment as well as the web app code, they should be run against the live production server(s) or representative development server(s) with similar configurations. As such, the response times will more accurately reflect what the user will experience. Which brings us to an interesting question: what is an acceptable response time?

It all depends on the value of the action. A user is more prepared to wait for a complex financial calculation that could save them hundreds of dollars than to wait for the second page of a news item to load. All things considered, you should aim to keep response times to less than one second22 to avoid interrupting the user’s flow.

Free load testing software packages include ApacheBench23, Siege24, httperf25, and the more graphical JMeter26 and The Grinder27.
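
For example, one step of the load ramp described earlier – 10 concurrent users each making 500 requests – translates directly into an ApacheBench invocation:

    # 10 concurrent users x 500 requests each = 5,000 requests in total
    ab -n 5000 -c 10 http://www.example.com/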

Stress tests

A stress test evaluates the graceful recovery of an app when placed under abnormal conditions. To apply a stress test, deliberately remove resources from the environment or overwhelm the application while it is in use:

  1. Stop the database server part-way through requests.
  2. Disconnect the network, or a third-party web service that the app depends on.
  3. Fill the available disk space or exhaust server memory.
  4. Flood the app with far more concurrent requests than it was designed to handle.

When the resources are reinstated the application should recover and serve visitors normally. More importantly, the forced fail should not cause any detrimental data corruption or data loss, which may include:

  1. Payments that are charged but never recorded, or vice versa.
  2. Database records that are only partially written.
  3. Caches left in an inconsistent state.

Because a stress test tackles infrequent edge cases, it is another task that you can choose to defer until your app begins to see some success, even if you do have to deal with the occasional consequence, such as manual refunds and cache re-builds.

Security testing

The discouraging reality is that it’s impossible to be fully secure from all the varieties of attacks that can be launched against your app. To get the best security coverage, it is vital to test the security of your app at the lowest level possible. A single poor cryptography choice in the code may expose dozens of vulnerabilities in the user interface.

There really is no substitute for your team having sufficient knowledge of secure development practices (see chapter 18) at the start of the project. “An ounce of prevention is worth a pound of cure”, as the internet assures me Benjamin Franklin once said, albeit about firefighting rather than web app security.

The next best thing is to instigate manual code reviews. If you’re the sole developer, put time aside to review the code with the intention of checking only for potential security vulnerabilities. If you’re working in a team, schedule regular team or peer reviews of the code security and ensure that all developers are aware of the common attack vectors: unescaped input, unescaped output, weak cryptography, overly trusted cookies and HTTP headers, and so on.
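
The first two of those vectors have well-established defences in PHP. A minimal sketch, using a PDO prepared statement for input and htmlspecialchars() for output (the connection details are placeholders):

    // Unescaped input: never interpolate user data into SQL. A prepared
    // statement keeps the data separate from the query structure.
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $stmt = $pdo->prepare('SELECT name FROM users WHERE email = ?');
    $stmt->execute([$_POST['email']]);
    $user = $stmt->fetch();

    // Unescaped output: encode anything user-supplied before rendering,
    // so injected markup is displayed as text rather than executed.
    echo htmlspecialchars($user['name'], ENT_QUOTES, 'UTF-8');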

If your app handles particularly sensitive information – financial, health or personal – you should consider paying for a security audit by an accredited security consultant as soon as you can afford to. Web app security changes by the week and you almost certainly don’t have the time to dedicate to the issue.

You should frequently run automated penetration tests. They’re no silver bullet, but they are useful for identifying obvious vulnerabilities. Many attackers are amateurs who rely on similar automated security test software to launch indiscriminate scattergun attacks against thousands of websites. By running the software first, you’re guarding against all but the most targeted of attacks against your app. Skipfish28, ratproxy29 and the joyously named Burp Intruder30 are three such tools that can be used in conjunction with data from the attack pattern database fuzzdb31.
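
As an illustration, a basic Skipfish scan crawls the app and writes an HTML report to an output directory. Flags vary between versions, so check skipfish -h before running:

    # Crawl and test the app, writing the report to ./sf-report
    skipfish -o sf-report http://localhost/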

For a fuller understanding of web app security testing, put some time aside to read through the comprehensive OWASP Testing Guide32.

Usability testing

See chapter 15 for an in-depth look at usability testing.

Deployment

In the early stages of your web app’s development you’ll probably manually copy files from your local development computer to your online server. While you have no customers and a simple single server, it’s more valuable to devote your time to iterations of your MVP features than to a sophisticated deployment process.

As the customer base, technical complexity and hosting requirements of your app grow, the inefficiency and fragility of manual SFTP sessions will quickly become apparent.

Automated deployment isn’t only about enforcing the quality of code upgrades and reducing downtime. A solid deployment tool gives you the confidence to push changes to your users more easily, speed up the feedback loop on new features, isolate bugs more quickly, iterate faster and build a profitable product sooner.

The simplest form of automated deployment will script the replication of changed files from your local environment to the live server. You can do this through your version control software33 or rsync34, but eventually you’ll run into problems with choreographing file and database changes, differences in local/live configuration, and any number of other technical intricacies.

A better solution assumes that the app deployment has three parts: the local preparation (or build), the transfer of files, and the post-upload remote configuration.

A build is normally associated with the compilation of code into executable files, but the term can also apply to popular interpreted web languages that don’t require compilation, such as PHP or Ruby. For the purposes of deployment, the release build process normally includes:

  1. Running the automated test suite, and aborting the build on any failure.
  2. Minifying and concatenating CSS and JavaScript files.
  3. Removing development-only configuration, debug code and verbose logging.
  4. Tagging the release in version control and stamping it with a version number.

With the build prepared, the files can be transferred to one or more live servers with rsync or a similar utility. The files for each release should be copied to a new time-stamped directory, not directly over the live files.

In the final stage of the deployment, the equivalent of an install script is run on the server to switch from the current release to the newly uploaded release:

  1. Apply any database schema changes or migrations.
  2. Update the configuration for the live environment.
  3. Clear (or pre-warm) caches.
  4. Point the web server at the new release directory, typically by updating a symlink.
  5. Restart any long-running processes or services.

img-20_4

Use a symlink to quickly move your website between different versions
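
A minimal shell sketch of the transfer-and-switch stages follows; the host and paths are placeholders, and a dedicated tool such as Capistrano layers migrations, rollback and error handling on top of this:

    #!/bin/sh
    RELEASE=$(date +%Y%m%d%H%M%S)

    # Copy the build into a new time-stamped directory, not over the live files.
    rsync -az ./build/ deploy@example.com:/var/www/releases/$RELEASE/

    # Point the 'current' symlink at the new release; the web server's
    # document root always refers to /var/www/current.
    ssh deploy@example.com "ln -sfn /var/www/releases/$RELEASE /var/www/current"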

Automated build and deployment tools are readily available in most popular web languages: Jenkins36 and CruiseControl37 (Java), Phing38 (PHP) and Capistrano39 (Ruby) are among those frequently used. Note that, familiarity aside, there’s no reason why your deployment tool has to be written in the same language as your web app. Just because your app is in PHP, it doesn’t mean you should rule out the excellent Jenkins or Capistrano tools from your process.

For an extra layer of confidence in your release process, you should incorporate an intermediate deployment to a staging server, where you test database migrations on a copy of the live database and perform manual acceptance tests. If your team uses your app internally, you can even make the staging server the primary version of the app that you use, so that you ‘dogfood’ the new candidate version for a couple of days before pushing it out to the live server.

Updates

How often you release an update will depend on a number of factors: how rapidly you develop features; how much manual testing is required; how much time you have available for testing; and how easy or automated the deployment process is.

You should schedule releases so that you don’t build up a long backlog of changes. Each change adds risk to the release, and feedback is easier to measure when new features are released independently. Some companies like Etsy make dozens of releases a day40, but this continuous deployment approach relies on a serious investment in deployment automation and comprehensive automated test coverage. A more reasonable schedule, as adopted by Facebook and others, is to aim for a release once a week.

Summary

Tests and deployment options come in many shapes and sizes; start with critical checks to your core features and gradually expand your test infrastructure as your app features stabilise and your user base grows.

