One vs. many — Why we moved from multiple git repos to a monorepo and how we set it up

How managing source code became transparent

This blog post is part of a series where I share our migration from monolithic applications (each with its own source repository) deployed on AWS to a distributed services architecture (with all source code hosted in a monorepo) deployed on Google Cloud Platform.

Part 1: “A monorepo, GitHub Flow and automation FTW”
Part 2 (this post): “One vs. many — Why we moved from multiple git repos to a monorepo and how we set it up”
Part 3: “A (mostly) automated release process”
Part 4: “Our approach to software development consistency”
Part 5 (coming soon): “Debug microservices locally”

Multiple repositories means multiple everything

Let’s list some of the things we need to manage with a repository:

Dependencies
Test configuration
Pull request templates
Pull requests / labels
ESLint
Prettier
Deployment and release scripts

For some things, such as managing dependencies, services like Greenkeeper may help. However, if a dependency releases a new major version, you have to manually apply that to all repositories and run the tests.

It became clear that none of us enjoyed any of these maintenance tasks and that we would rather spend the time making our market research chatbots more valuable to our customers.

Dependencies

Lerna

Our code is mainly written in JavaScript, which led us to look at Lerna.

Lerna is a tool for managing JavaScript projects with multiple packages.

We decided to take this one step further. Instead of managing our npm packages only, we configured Lerna to also manage our services, which live in the same monorepo.

Our monorepo directory structure is as follows:

.
├── lerna.json
├── package.json
├── packages
└── services

The lerna.json file is straightforward:

{
  "lerna": "2.4.0",
  "npmClient": "yarn",
  "useWorkspaces": true,
  "packages": ["packages/*", "services/*"],
  "version": "independent"
}

The "useWorkspaces" option is explained under "Yarn Workspaces" below.

With this configuration, our services can depend on packages and Lerna takes care of symlinking them. For example, we can run yarn add package-z within the services/service-a directory and Lerna symlinks package-z properly. No more dealing with yarn link.

NPM scoped packages

To Lerna, packages/* and services/* are considered packages. Most lerna commands support the --scope flag, but that only works if you follow a strict naming convention for your name properties in the package.json files.

We decided to separate packages from services by using different scopes. Since packages/* get published to npm, they use the company default scope (e.g. @my-company). Services in services/* use a @my-company-services scope. Packages and services are further prefixed with web-* or svr-* to distinguish between different types of packages and services.
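To make the convention concrete, here is a minimal sketch of how a directory could map to a scoped name. The scopedName helper is hypothetical and not part of Lerna; the scopes are the example values from above, and the iso- prefix is the one used for isomorphic packages later in this post.

```javascript
// Hypothetical helper illustrating the naming convention; not part of Lerna.
const SCOPES = {
  packages: '@my-company', // published to npm
  services: '@my-company-services' // services that live in the monorepo
}
const PREFIXES = ['web-', 'svr-', 'iso-']

function scopedName (relativeDir) {
  const [root, dir] = relativeDir.split('/')
  const scope = SCOPES[root]
  if (!scope) {
    throw new Error(`Unknown top-level directory: ${root}`)
  }
  if (!dir || !PREFIXES.some(prefix => dir.startsWith(prefix))) {
    throw new Error(`"${relativeDir}" lacks a web-, svr- or iso- prefix`)
  }
  // This is the value that would go into the "name" property of package.json.
  return `${scope}/${dir}`
}

console.log(scopedName('packages/web-package'))
console.log(scopedName('services/svr-service'))
```

A check like this could run in CI to catch a package whose name does not follow the convention before a --scope filter silently skips it.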

Yarn Workspaces

Lerna is great at managing inter-dependencies and running npm scripts or even arbitrary commands across all packages or subsets thereof.

However, each package and service gets its own node_modules folder by default. That is a lot of duplication…

The fine folks who give us Yarn released “Workspaces” and kindly enough blogged how to use it with Lerna: https://yarnpkg.com/blog/2017/08/02/introducing-workspaces/

Besides the "useWorkspaces": true in the lerna.json, you also have to add "workspaces": ["packages/*", "services/*"] to your root package.json file. That’s it.
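Since the same globs now live in two files, they can drift apart. The following is a minimal sketch of a sanity check, assuming both files use the plain array form shown above. The globsInSync function is hypothetical, and the object literals stand in for the parsed lerna.json and root package.json. Note that Yarn only enables workspaces when the root package.json is marked as private.

```javascript
// Stand-ins for the parsed lerna.json and root package.json.
const lernaConfig = {
  npmClient: 'yarn',
  useWorkspaces: true,
  packages: ['packages/*', 'services/*'],
  version: 'independent'
}

const rootPackage = {
  private: true, // Yarn requires a private root package for workspaces
  workspaces: ['packages/*', 'services/*']
}

// Hypothetical check: both files must list the same globs, in any order.
function globsInSync (lerna, pkg) {
  const fromLerna = [...(lerna.packages || [])].sort()
  const fromYarn = [...(pkg.workspaces || [])].sort()
  return (
    fromLerna.length === fromYarn.length &&
    fromLerna.every((glob, index) => glob === fromYarn[index])
  )
}

console.log(globsInSync(lernaConfig, rootPackage))
```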

Now when you run yarn and lerna bootstrap, your root node_modules folder contains nearly all the npm packages you will ever need. This saves both time and disk space. The following shows the difference between not using Yarn Workspaces and using them in our monorepo. The stats are based on 20 packages managed by Lerna, run on a 2016 MacBook Pro.

Without Yarn Workspaces

+-----------------+--------+
| Command         | Time   |
+-----------------+--------+
| yarn install    | 13.23s |
| lerna bootstrap | 72.33s |
+-----------------+--------+

This adds 96,112 files, totalling 666.4 MB, to disk.

With Yarn Workspaces

+-----------------+--------+
| Command         | Time   |
+-----------------+--------+
| yarn install    | 17.26s |
| lerna bootstrap | 3.85s  |
+-----------------+--------+

This adds 32,008 files, totalling 267.1 MB, to disk.

Conclusion

Waiting an extra 4 seconds to install the root packages is worth the savings we get with lerna bootstrap. With a bit of caching on the continuous integration server, things look even better, but I’m getting ahead of myself.
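For anyone who wants the totals, the arithmetic behind that trade-off can be sketched as follows. The figures are copied from the two measurements above; the helper names are only for illustration.

```javascript
// Figures copied from the two measurements above.
const withoutWorkspaces = { install: 13.23, bootstrap: 72.33, files: 96112, megabytes: 666.4 }
const withWorkspaces = { install: 17.26, bootstrap: 3.85, files: 32008, megabytes: 267.1 }

// Total wall-clock time of "yarn install" followed by "lerna bootstrap".
const totalSeconds = run => run.install + run.bootstrap

// Percentage saved when going from "before" to "after".
const reduction = (before, after) => (1 - after / before) * 100

console.log(`Total time: ${totalSeconds(withoutWorkspaces).toFixed(2)}s vs ${totalSeconds(withWorkspaces).toFixed(2)}s`)
console.log(`Files: ${reduction(withoutWorkspaces.files, withWorkspaces.files).toFixed(1)}% fewer`)
console.log(`Disk: ${reduction(withoutWorkspaces.megabytes, withWorkspaces.megabytes).toFixed(1)}% smaller`)
```

In other words, the combined install and bootstrap drops from roughly 86 seconds to roughly 21 seconds, with about two thirds fewer files on disk.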

Test Configuration

We use Jest, but decided to let Lerna manage the test runner instances. (FYI, Jest comes with a multi-project-runner that may be useful in your use case.)

In our case, we like the --scope flag Lerna provides to run commands in certain directories only. More importantly, we have a variety of packages and services: some run in Node.js, others in the browser, and some are isomorphic.

To accommodate that, we have the following Jest configuration setup:

.
├── jest.config.js
├── packages
│   ├── iso-package
│   │   └── jest.config.js
│   ├── svr-package
│   │   └── jest.config.js
│   └── web-package
│       └── jest.config.js
├── services
│   ├── svr-service
│   │   └── jest.config.js
│   └── web-service
│       └── jest.config.js
└── tests-setup
    ├── polyfill.js
    └── setup.js

The root-level jest.config.js contains the base Jest configuration we apply across all packages and services. It looks something like this:

// jest.config.js
module.exports = {
  collectCoverageFrom: ['**/*.js'],
  resetMocks: true,
  verbose: true
}

Web packages and services

A web-* package or service uses the following jest.config.js within its root directory:

// packages/web-*/jest.config.js or services/web-*/jest.config.js
const jestBase = require('../../jest.config.js')

module.exports = {
  ...jestBase,
  coverageThreshold: {
    global: {
      statements: 100,
      branches: 100,
      functions: 100,
      lines: 100
    }
  },
  browser: true,
  setupFiles: [
    '<rootDir>/../../tests-setup/polyfill.js',
    '<rootDir>/../../tests-setup/setup.js'
  ]
}

Node.js / isomorphic packages and services

An iso-* or svr-* package or service uses the following jest.config.js within its root directory:

const jestBase = require('../../jest.config.js')

module.exports = {
  ...jestBase,
  coverageThreshold: {
    global: {
      statements: 100,
      branches: 100,
      functions: 100,
      lines: 100
    }
  },
  testEnvironment: 'node'
}

Notice how we configure the coverageThreshold on a per package / service level? This allows individual teams to set their own thresholds. Managing that per package / service is significantly simpler than at the monorepo root level.
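As an illustration, a package that has not reached full coverage yet could set lower numbers without affecting anyone else. This is only a sketch: the threshold values are made up, and jestBase is inlined here as a stand-in for require('../../jest.config.js') so the example is self-contained.

```javascript
// Stand-in for require('../../jest.config.js') so the sketch is self-contained.
const jestBase = {
  collectCoverageFrom: ['**/*.js'],
  resetMocks: true,
  verbose: true
}

// Hypothetical package-level config with its own, lower thresholds.
const packageConfig = {
  ...jestBase, // shared settings from the monorepo root
  coverageThreshold: {
    global: {
      statements: 80,
      branches: 70,
      functions: 80,
      lines: 80
    }
  },
  testEnvironment: 'node'
}

module.exports = packageConfig
```

Because the base configuration carries no thresholds of its own, each package's jest.config.js is the single place where its numbers are defined.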

Test Execution

The root package.json file contains a "test": "lerna exec yarn test" script. Each package and service has its own test script that simply invokes Jest: "test": "jest". The same pattern applies to test:coverage.

We can now use Lerna’s flags to do all sorts of nice things:

Run tests for all services: yarn test --scope @my-company-services/*.
Run test coverage for all web packages: yarn test:coverage --scope @my-company/web-*.
Run tests for the @my-company/iso-package package and all packages it depends on: yarn test --scope @my-company/iso-package --include-filtered-dependencies.
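To show why the naming convention matters for these filters, here is a simplified sketch of how a --scope glob selects package names. It is not Lerna's implementation: the matchesScope function is a hypothetical stand-in that only understands *, and the names are the example packages and services from the directory tree above.

```javascript
// Simplified stand-in for glob matching: only "*" is supported,
// and it never matches across a "/" (so scopes stay separate).
function matchesScope (name, glob) {
  const pattern = glob
    .split('*')
    .map(part => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) // escape regex characters
    .join('[^/]*')
  return new RegExp(`^${pattern}$`).test(name)
}

const names = [
  '@my-company/iso-package',
  '@my-company/svr-package',
  '@my-company/web-package',
  '@my-company-services/svr-service',
  '@my-company-services/web-service'
]

const select = glob => names.filter(name => matchesScope(name, glob))

console.log(select('@my-company-services/*'))
console.log(select('@my-company/web-*'))
```

Depending on your shell, it can be safer to quote the glob (for example --scope "@my-company/web-*") so the shell does not try to expand it before Lerna sees it.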

Why use **lerna exec** to execute an npm script when **lerna run** does exactly that?

From what we encountered, lerna run swallows the output of the npm scripts. With the --stream flag, we get the output but it’s neither formatted nor does it have coloured console output.

Conclusion

While I could imagine Jest’s multi-project-runner being more performant than our solution, we like Lerna’s powerful flags and decided to forgo Jest’s approach. This may very well change as more and more tests get added to the monorepo. (Happy to chat about that if anyone has some thoughts.)

ESLint & Prettier

No special consideration was necessary. Simply add your config files to the repository root and it works as expected.
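As a rough illustration, a root-level .eslintrc.js could look like the following. The specific settings are assumptions for the sake of the example and will differ per team.

```javascript
// .eslintrc.js at the repository root (illustrative settings only)
const eslintConfig = {
  root: true, // stop ESLint from looking for configs above the monorepo
  extends: ['eslint:recommended'],
  parserOptions: {
    ecmaVersion: 2018 // allows object spread, as used in the Jest configs
  },
  env: {
    node: true,
    browser: true,
    es6: true,
    jest: true
  }
}

module.exports = eslintConfig
```

ESLint looks for configuration files in parent directories, so files under packages/* and services/* pick up the root configuration without any per-package setup.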

Pull requests, templates & labels

The pull request template is configured once in the .github/PULL_REQUEST_TEMPLATE.md file. It applies across all packages and services.

Compared to multiple repositories, managing pull requests in a monorepo requires a bit more thinking. At the time of this writing, we have not yet decided how we will deal with that. A few notes from initial discussions include:

Use labels with unique colours per package / service (we’ll run out of distinguishable colours quickly though)
Use green labels for new features, red for bugs. Create green and red labels per package / service and add the package / service name as the label’s name.

Conclusion

The benefits of a monorepo were immediately apparent to the team. Prior to that, we used yarn link to deal with a small SDK we use to integrate with the backend API. It works if you’re careful and don’t use Docker for local development as we do. Even then, it is still a mental burden on each individual developer who works on the SDK.

Getting everything configured took time, and I am not going to sugarcoat that. Thanks to an amazing and curious team who showed patience throughout that transition period, we’re now in a place to spend more time building software rather than maintaining source repositories. Thank you!