Why We Broke Our Philosophical Vows to Bring You CircleCI 2.0

Written by z00b | Published 2017/08/03
Tech Story Tags: continuous-integration | philosophy | devops | software-development | circleci


A few weeks ago, we released CircleCI 2.0. This was a tremendous effort, involving every person at CircleCI by the time it reached General Availability. It was exactly the kind of effort that we, as a CI/CD company, try to avoid.

We fundamentally changed the guts of our product, and there’s no way to do that without it being terrifying. It took six months to get this in front of the first customer, and another nine to get to GA. We can’t overstate how relieved we are to have reached this milestone, because it means we can actually start delivering code in small chunks again.

So why would we, as a company that literally has ‘CI’ in its name, spend so much time crafting an actual release? Doesn’t that go against everything we believe?

Yes! But we had to do it because we absolutely believe in continuous delivery.

Green Fields Are Scary

For many of us who watched it happen, Joel Spolsky’s words still ring true: rewriting the code from scratch is a Thing You Should Never Do. Tossing years of work aside and halting development to build a Better Version is a terrible, terrible decision: that highly attractive green field and its pastoral purity are grown on buried radioactive waste. That’s just not how we do things.

We’re used to a frequent cadence, finishing small features at a steady, rapid clip. Major releases scare us even more than they scare the average software company. So swapping out the core of our entire platform was doubly dangerous: there was the inherently risky business of trying to recreate our product, Except Better, AND we had to work in the dark for months without the guiding light of constant validation.

It’s like asking a TV production team to stop making episodes and shoot an entire movie: you don’t get a pilot for validation, you don’t get to measure each episode, and you sure as hell don’t get to rewrite story arcs on the fly. You get one shot and a big budget, and you’d better not screw it up.

But it’s cool. Just make a blockbuster.

An Inflection Point

Our infrastructure’s efficiency was headed towards a local maximum. We could see the future of our own needs and our customers’, but we couldn’t see an incremental path towards that future. The global maximum was on a different hill and we needed to make a leap.

We racked our brains, searching for a way to gradually update our platform, but we came up short. Eventually, we reached a grim conclusion: we’d have to embark on a treacherous journey — a complete, non-continuous replatforming of our infrastructure.

Fundamental architecture changes don’t happen incrementally, we realized. Shit.

This path’s inherent danger forced us to take a hard look at our principles: of all of the benefits of CI/CD, what could we keep, and what would we have to toss out? How could we continue to make progress in other areas of the product while rewriting the core?

We could have been stubborn. But instead we asked ourselves these questions, breaking down our values and rebuilding them as we rebuilt CircleCI. We charged into this project because we knew what would be on the other side: better continuous delivery.

And we believe in that philosophy so utterly that it made sense to temporarily embrace an opposing worldview. By doing that, we were able to keep our commitment to continuous delivery. Ultimately, we had to make a giant leap so we could take small steps again, but towards a higher goal.

CircleCI: Our Continuous Mission

That being said, we made the most of the principles we could keep: we spiked, we time-boxed, and when we failed, we failed fast. We built tooling so we could run our own builds in parallel on 1.0 and 2.0, getting feedback early without disrupting our delivery. The new code for 2.0 was built solely on 2.0 so that issues wouldn’t fester. We got a closed alpha into the hands of a few customers as soon as we possibly could. All in the name of learning.

And we learned. There were design flaws and architecture flaws and decisions that just didn’t even make any sense. But that’s exactly what we needed to know and we are so thankful to have customers who dove in and helped us figure it out.

Once we’d ironed out the early kinks, we moved to a closed beta. The two major differences between the beta and the alpha were that (1) people could request access to the new platform, and (2) we’d be testing operations at scale. Scaling was its own set of learnings, but each step helped get us to a better product faster. Eventually, we opened the floodgates with an open beta, which ended when we released CircleCI 2.0.

Throughout the process, we also did our best to constrain the scope. We found appropriate seams to insert a modified build engine while minimizing required changes elsewhere. We also exposed the new flow via branch-level configuration, so we could solicit real feedback without interrupting the day-to-day software delivery of our customers.

So we didn’t completely abandon our philosophy; we borrowed everything we could from it to get through an otherwise uncomfortable and risky delivery period. But we couldn’t wait to get back into our regular routine of reacting to customer needs and quickly shipping code.

Admittedly, it was tempting to continue tinkering with the product forever, caught in a vortex of endless perfectionism. But we knew we needed to get back to our core philosophy: continuously delivering value to our customers. So we committed to CircleCI 2.0 as our default platform and we’re already building new things on top of it.

In fact, without CircleCI 2.0, we couldn’t have brought you Workflows, the first major feature to use the new architecture.

To Explore Strange New Workflows

The old CircleCI was opinionated. We told you how to work: build, test, deploy, in that order. Those phases were hammered into our configuration, and you had to live with them, whether your organization liked it or not. If a test failed, you’d have to go back to square one and run the whole thing again.

But CircleCI 2.0 allowed us to break these lockstep phases into loosely coupled jobs, which let us build Workflows. Now you don’t have to recompile your code every time a test fails; just re-run your build from the last failed job. Even better, each job can have an associated branch and parallelism level. Throw resources at the slower tests, run tests on some branches and only deploy master… the world is truly your configurable oyster.
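To make the idea of loosely coupled jobs a little more concrete, here is a minimal sketch in Python of how a workflow could be modeled as a dependency graph of jobs, each with an optional branch filter and a parallelism level, plus a "re-run from failed" step that only repeats failed jobs and whatever depends on them. This is a hypothetical model, not CircleCI’s implementation or its configuration format (real CircleCI 2.0 workflows are declared in YAML in `.circleci/config.yml`). The names `Job`, `plan`, `rerun_from_failed`, and the job names are illustrative assumptions, as is the rule that a job is skipped when one of its upstream jobs is filtered out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Job:
    """Hypothetical job model: a name, upstream jobs, a branch filter, parallelism."""
    name: str
    requires: list[str] = field(default_factory=list)
    branches: set[str] | None = None  # None means "run on every branch"
    parallelism: int = 1  # how many containers to spread this job across


def plan(jobs: dict[str, Job], branch: str) -> list[str]:
    """Return the jobs that apply to `branch`, in dependency order."""
    graph = {name: set(job.requires) for name, job in jobs.items()}
    selected: list[str] = []
    for name in TopologicalSorter(graph).static_order():
        job = jobs[name]
        if job.branches is not None and branch not in job.branches:
            continue  # filtered out on this branch
        # Assumption: a job only runs if all of its upstream jobs run too.
        if all(dep in selected for dep in job.requires):
            selected.append(name)
    return selected


def rerun_from_failed(jobs: dict[str, Job], branch: str,
                      results: dict[str, bool]) -> list[str]:
    """Re-run failed (or never-run) jobs and everything downstream of them.

    Jobs that passed and have no re-run upstream are skipped, so a failing
    test does not force the build job to run again.
    """
    to_run: list[str] = []
    for name in plan(jobs, branch):
        upstream_rerun = any(dep in to_run for dep in jobs[name].requires)
        if not results.get(name, False) or upstream_rerun:
            to_run.append(name)
    return to_run


jobs = {
    "build": Job("build"),
    "unit_test": Job("unit_test", requires=["build"], parallelism=4),
    "integration_test": Job("integration_test", requires=["build"], parallelism=8),
    "deploy": Job("deploy", requires=["unit_test", "integration_test"],
                  branches={"master"}),
}
```

With these hypothetical jobs, `plan(jobs, "feature-x")` leaves out `deploy`, while on `master` it runs last. If only `integration_test` failed on `master`, `rerun_from_failed` returns `["integration_test", "deploy"]` and leaves the passing `build` and `unit_test` jobs alone. Modeling the workflow as a graph rather than fixed phases is what makes both per-job filtering and partial re-runs fall out naturally.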

We’re not telling you how to do your jobs anymore — you’re telling us how to do your jobs.

TL;DR

We’re proud of what we’ve achieved: builds on CircleCI 2.0 are blazing fast, great Docker support means it’s better aligned with how our customers work, and we’ve put way more control in the hands of you, the developer. But we don’t think of this “release” as the end of this story; this is just the prologue.

Now that we have this solid foundation, we’re stoked to build on top of it. Workflows is only the first of a series of capabilities that we’ll be building with CircleCI 2.0. Whether you’ve been with us for years or are just joining us now, we’re excited to take you with us.

So, stop waiting for great things to build — build great things.

Originally published at circleci.com on July 27, 2017.


Published by HackerNoon on 2017/08/03