Published by Bill Siggelkow - January 8, 2019

Succeed with continuous delivery

Transform your development process and get an advantage over your competition.


Continuous Delivery (CD) of your software application can transform your development process and give you a significant advantage over your competition.

What is Continuous Delivery?

Jez Humble, co-author of Continuous Delivery, succinctly defines the term as “a set of principles and practices to reduce the cost, time and risk of delivering incremental changes to users.” Two related terms are “continuous deployment” and “continuous integration” (CI).

Continuous deployment is continuous delivery with the final release step automated: every change that passes the pipeline is deployed to production without manual intervention. You can’t have continuous deployment without continuous delivery.

Continuous integration is the practice of merging developer code branches into a shared mainline frequently (typically several times a day). CI also implies building and testing this shared code base on each merge. Continuous integration is one of the practices that support continuous delivery.

Let’s look at some of the techniques and strategies that we have employed at Calendly that have brought us success with our Continuous Delivery process.

Enable continuous delivery with sound development principles

To be successful with CD, developers need to be able to code with confidence. You don’t want to be the one who breaks the build, let alone the one who introduces a bug to production.

[Image courtesy of Nicolas Sanguinetti]

We can code and deliver with more confidence when we are able to keep changes small to reduce risk and impact to our production environment.

The delivery process itself needs to be something that we don’t need to think about (redefine) every time we merge code. Whether automated or not, we want a consistent, repeatable process that spans the entire development lifecycle. The more of that process we can automate, the lower the likelihood of human error.

Since we will deliver frequent changes to production—a reasonable frequency for a small application could easily be once an hour or so—we need to minimize service interruptions on deploy.

All of this requires us to control the deployment environments at a fine level of granularity, so that a changeset delivered to production can be enabled or disabled, monitored, and, if necessary, rolled back with ease.

Review and test so you can code with confidence

As a developer, I want to be assured that the code I write is high quality and faithfully provides the desired end-user experience. I want my code to be maintainable and consistent with internal as well as industry-accepted practices. I want to know that my code is free from side effects that might cause other parts of the application to break.

Peer review all code changes

At Calendly, all code changesets are managed via GitHub pull requests. A pull request (PR) cannot be merged into the mainline code branch until the PR has been reviewed and approved. We require approvals by development, quality assurance and product management. This multi-layered approach ensures that no single point of failure exists in the review process.

Developer reviews focus on aspects such as code structure, architecture and consistency. Every code change should have corresponding changes to automated tests. It’s a red flag if one sees changes to application code but no changes to test code. Developers are encouraged to verify functionality as well. Quality assurance will review the code branch running in a live QA environment to ensure that the code meets the functional requirements and does not cause regression in other parts of the application. Product management has the final say, verifying that the code meets product needs.

Use automated testing

Our automated test suite makes use of both “unit tests” and “feature tests.” The former test the code by isolating the “unit” of functionality and ensuring that the unit implements the expected behavior. The “unit” will vary depending on the code changes—it could represent a model class, a service-level object or something in between.
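
For illustration, here is a minimal sketch of what a unit test might look like in RSpec (a common choice for Rails applications; the ZoomLink model and its embed_text method are hypothetical, not our actual code):

# spec/models/zoom_link_spec.rb -- hypothetical model, shown only to
# illustrate the shape of a unit test: isolate one unit, assert one behavior
require "rails_helper"

RSpec.describe ZoomLink do
  describe "#embed_text" do
    it "includes the join URL in the event description text" do
      link = ZoomLink.new(join_url: "https://zoom.us/j/123456789")
      expect(link.embed_text).to include("https://zoom.us/j/123456789")
    end
  end
end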

Our “feature tests” are generally end-to-end browser tests that exercise the various use-case scenarios of the functionality. These feature tests are based on the acceptance criteria in the “user stories” (more on that in a bit).
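
As a sketch, a feature test for one such scenario might look like the following, using Capybara-style RSpec (the page content, the sign_in helper and the user factory are all hypothetical):

# spec/features/zoom_integration_spec.rb -- a hypothetical scenario
# derived from a user story's acceptance criteria
require "rails_helper"

RSpec.feature "Zoom integration page" do
  scenario "a signed-in user can find the Zoom integration" do
    sign_in create(:user)   # assumes a sign-in helper and FactoryBot
    visit "/integrations"
    click_link "Zoom"
    expect(page).to have_content("Automatically include Zoom meeting details")
  end
end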

Use manual testing when needed

We strive to use automated testing as much as possible. However, in some cases it’s more cost effective for a human to perform certain types of tests than to automate them. At Calendly, every pull request goes through manual testing. The manual testing is highly focused on the functionality provided by the code change. While quality assurance analysts will naturally perform some regression testing when testing a code change, we generally rely on our automated test suite as our first line of defense against regression bugs.

Manual testing gives us a platform for focused regression testing. For example, the Calendly application integrates with end users’ external calendar providers like Google, Office 365 and iCloud. A code change in this area of the application for one provider may be made at a level of abstraction that affects the other providers. When a developer creates a pull request, they include specific instructions to the QA analysts to focus their testing, ensuring that the enhancement for Google calendar integration does not break integrations with the other providers. Manual testing is made easier when we keep the impact of the changes small.

Ensure feature completeness with “test fest”

At Calendly, a typical feature implementation is the result of multiple pull requests. Once all pull requests have been delivered for a given feature, a final manual testing session is run to ensure that the feature, as a whole, meets all requirements. We refer to this as “test fest.” It’s the last step prior to rolling out a feature to production.

During “test fest,” each user story that comprises the epic is verified against multiple browsers, operating systems, devices and user types. Here’s an example of how we track and manage test fest:

[Image: Test Fest plan]

Make the process easier by keeping changes small

Success in continuous delivery starts with planning. We can only minimize risk on a code deployment if we can keep the impact and scope of a code change small.

At Calendly, we maintain a one-to-one relationship between a user story and a pull request. One user story will result in one, and only one, pull request. Therefore, to keep the code change small, the user story itself needs to be small and focused.

For example, let’s say we have a feature request like the following:

As a user,
I want an integration with Zoom,
So that meetings booked with me have Zoom video conferencing links generated automatically.

We would actually consider this story an “epic”—meaning it should be broken down into smaller stories and tasks.

  1. The user needs a page that tells them about the integration with Zoom.
  2. The user needs a way to establish a connection with their Zoom account via OAuth.
  3. The Zoom account information needs to be persisted so that it can be used to create video conferencing links when a meeting is booked.

In the planning process, developers work with product managers to ensure that user stories are sized appropriately. The benefits of keeping stories small include:

  • Code review is much easier and higher quality (wouldn’t you rather review six files than sixty?).
  • Testing is easier because, normally, less of the application is affected by the change.

But if we only deliver small code changes, how can we provide our end users with significant features that make a positive, game-changing impact on their user experience? These would seem to be conflicting goals between product development and strategic planning. We manage this dichotomy through fine-grained control of our deployment environments.

Control your deployment environments

Because our stories and associated code changes are small, we use feature flags to group the related code together into a set that can be selectively enabled at runtime.

Use feature flags to enable features at runtime

A feature flag controls the availability of a given feature to a single user. The feature flag is used to gate whether or not the user knows about the feature and has access to use it. In code, developers use explicit conditional checks, based on whether a given feature flag is rolled out to a user, to determine what the application does. While this may seem like a highly intrusive way of coding, in practice the feature flag only needs to guard the code that makes the feature visually available to a user.
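
The conditional check itself can be as simple as the following sketch (the FeatureFlag helper and its API are hypothetical, used here only to illustrate the pattern):

# app/helpers/integrations_helper.rb -- hypothetical helper; the pattern
# is an explicit check guarding only the user-visible entry point
module IntegrationsHelper
  def show_zoom_integration?(user)
    FeatureFlag.enabled?(:zoom_feature, user.organization)
  end
end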

For example, let’s say I’m adding a new feature to my application that allows a user to integrate their Calendly account with Zoom. In the planning process, we can create a separate story for the hyperlink on the application that allows the user to enable the integration. Our first story will be to create a feature flag specifically for this.

Then, in the follow-on story, we will only need to use that feature flag on the webpage that displays the hyperlink.

{{#withOrgFeatureFlag 'zoom_feature'}}
  %a(href="/integrations/zoom")
    %h4 Zoom
    Automatically include Zoom meeting details in Calendly events.
{{/withOrgFeatureFlag}}

All of the other user stories for backend data models and services, even other web pages, will not need this conditional check—they will just need to be written in such a way that they can be tested by QA (with the feature flag turned on in QA environments). On production, we can have incomplete features without the risk of users seeing them.

There are development costs associated with feature flags. By definition, our codebase must support multiple paths through the application, based on whether or not a feature flag is enabled. Likewise, our automated tests now also need to support twice as many use cases—the use case with the feature enabled and the use case when the feature is disabled.
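
In practice, that often means a pair of specs per guarded behavior, something like this sketch (again using the hypothetical FeatureFlag helper from above, plus FactoryBot):

# Hypothetical request specs covering both flag states
require "rails_helper"

RSpec.describe "Integrations page", type: :request do
  let(:organization) { create(:organization) }  # assumes FactoryBot

  it "shows the Zoom link when the flag is enabled" do
    FeatureFlag.enable(:zoom_feature, organization)  # hypothetical API
    get "/integrations"
    expect(response.body).to include("Zoom")
  end

  it "hides the Zoom link when the flag is disabled" do
    get "/integrations"
    expect(response.body).not_to include("Zoom")
  end
end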

Over time, this can lead to a buildup of code cruft and technical debt. We manage this by strictly treating feature flags as temporary settings. Once a given feature is out in production, rolled out to 100% and vetted as complete, we deliver code that removes the feature flag (and the corresponding conditional logic in code and tests). We essentially clean up the cruft and pay down the tech debt.

Test changesets using ephemeral review environments

Whenever a developer creates a pull request from a code branch, a fully functioning application environment automatically spins up, using the code on that branch.

[Image: GitHub PR deployment]

We use Heroku as our application platform, and these ephemeral instances are provided by Heroku and known as review apps. This review environment provides a platform for functional review by development, QA and product management. We have built a custom Slack command to automate some common tasks on review apps.
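
Heroku review apps are driven by an app.json file at the root of the repository. A minimal sketch might look like the following (the add-on, environment variable and postdeploy script are illustrative, not our actual configuration):

{
  "name": "my-app",
  "env": {
    "RAILS_ENV": { "value": "production" }
  },
  "addons": ["heroku-postgresql:hobby-dev"],
  "scripts": {
    "postdeploy": "bundle exec rails db:schema:load db:seed"
  }
}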

After review, the developer merges the pull request into the master branch and the review instance is spun down, freeing resources for new pull requests. The merge to master triggers a build and a test run on our continuous integration server. Upon a successful run there, the code is automatically deployed to production. Immediately after deployment, developers and QA smoke test the code change on production as a final check. We also actively monitor our logs and error reports after delivery.

Minimize service interruptions

Because we deploy multiple times a day, our users cannot tolerate any downtime of our application or significant impacts to performance. We employ a number of strategies that help us maintain 100% uptime and avoid degradation of service.

Maintain zero downtime

Heroku provides a service known as preboot. Preboot spins up and begins sending traffic to new instances (dynos) of the application while old instances are still running and processing requests. Once all requests for the old instances have been processed and the new instances are up and fulfilling requests, the old instances are shut down. This ensures that no requests get dropped on deployment.
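
Preboot is not enabled by default; it can be turned on per application with the Heroku CLI (the app name here is a placeholder):

heroku features:enable preboot --app my-app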

But you must be careful with this approach—for a short period of time, the old application code will be running at the same time as the new application code. This can be a problem—for instance—with database changes.

For example, let’s say that we want to remove a database column that is no longer needed. We will write a database migration to remove the column, and then remove usage of that column from application code. The database is a shared resource, and database migrations must run before the application code that relies on the database change is deployed. If we were to make the database change first, old application instances could break during the deployment.

[Image: backwards compatibility issue]

Therefore, we separate this type of change into multiple deployments:

  1. First we deploy a code change that excludes the column from the object-relational mapper we use (in our case, Rails ActiveRecord). The new application code behaves as if the column is gone, but the old application code continues to function since the column still exists in the database.
  2. Then we deploy the database migration to physically remove the column from the database. At this point, no application code depends on the column and it can safely be removed (see the sketch below).
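
Here is a minimal sketch of the two deployments in Rails (the model, table and column names are hypothetical):

# Deployment 1: app/models/meeting.rb -- ActiveRecord now behaves as if
# the column is gone, while old dynos can still read it during overlap
class Meeting < ApplicationRecord
  self.ignored_columns = %w[legacy_location]
end

# Deployment 2: the migration that physically drops the column
class RemoveLegacyLocationFromMeetings < ActiveRecord::Migration[5.2]
  def change
    remove_column :meetings, :legacy_location, :string
  end
end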

There can be other situations that call for this type of phased approach. For example, let’s say you are changing from one external email service provider to another. You need to be prepared to handle the case where the old email provider is still handling requests until all of the new application code that uses the new provider is spun up. This may require you to keep both services enabled for a short period of time. It’s not uncommon for us to have small code changesets to manage these interim periods. The benefits of zero downtime outweigh the cost of this added complexity.
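
A sketch of how such an interim period might be managed in code (the provider clients and the flag are hypothetical):

# Both providers stay configured during the transition; a flag decides
# which one handles new sends until the rollout completes.
class EmailDispatcher
  def deliver(message)
    if FeatureFlag.enabled?(:new_email_provider)
      NewProviderClient.new.send_message(message)   # hypothetical client
    else
      LegacyProviderClient.new.send_message(message)
    end
  end
end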

There are other strategies for maintaining zero downtime, such as the use of load balancers and so-called A/B switches. Look to your PaaS (platform-as-a-service) provider or local DevOps team to see what features are available for zero-downtime deployment.

Plan for failure

Despite our best efforts to ensure that a production deployment delivers the desired feature without introducing bugs, things can and will go wrong. The best approach is to develop processes and strategies for handling this failure. We use a process of post-deployment testing and monitoring.

  1. Developers are required to smoke test their change immediately after deployment.
  2. For the first hour and a half after deployment, the developer monitors the logs and error reports.
  3. If an error results from the code change, the developer along with QA and product management determine the best approach for resolving the problem.

We have a separate process for managing production bugs. As a group, we determine the severity of the error, using a classification system based on impact:

  • Severity 1: Issues that significantly impact the business or most of our customers.
    • Example: Complete site outage.
    • Example: Customers unable to book meetings.
  • Severity 2: Issues that impact a significant portion of our users in their critical paths.
    • Example: Users unable to log in via the username/password flow.
  • Severity 3: Issues that impact a small subset of users in their critical path or larger groups of users outside of the critical paths.
    • Example: Cannot schedule a meeting with Event Types that have PayPal enabled.
  • Severity 4: Issues that reflect poorly on Calendly or confuse our users, but do not impact our customers’ ability to use the product.
    • Example: Signup form has incorrect field label.
    • Example: Unable to view pricing information on marketing page when using an iPad.
  • Severity 5: Issues that do not have a significant impact on the business or our users. This is the default severity assigned to newly opened production bugs.
    • Example: Outlook Desktop users see “something_went_wrong” instead of “Something Went Wrong” (translated message) if there is an issue.
    • Example: Event Type page shows as a single column (the mobile layout) on desktop.

Once the severity has been determined, we agree on a strategy for handling the error. For example, a Severity 1 issue may call for an immediate rollback of the change. On the other hand, a Severity 5 issue may be something that can be tolerated in production for a time. In this case, we may decide to create a user story for the bug that can then be prioritized separately.

Make it yours

Your development process and product management needs are unique and will impact your approach to continuous delivery. Take the time to analyze what your goals are and be prepared to adjust the process as you go along.

Have you implemented or plan to implement continuous delivery in your organization? Comment below and let us know how it’s working for you.