Mono-repo Vs. Multi-repo Vs. Hybrid: What’s the Right Approach?

Written by aviyo | Published 2020/06/26

TL;DR: The benefits of using a monorepo vs. multi-repos are tremendous, but you need to be aware of the difficulties that surface when projects start to scale up. Facebook built its own custom filesystem (based on FUSE) and source control (fbsource, a customized Mercurial) in order to address the scale-up issues. We decided to break the monolith and move to microservices. If projects were dependent on each other, the coupling became only API contracts.

I still remember my first day at Outbrain. As part of the Bootcamp (our training program), we were required to clone the code from a repository called the trunk, one monolithic repo that contained our entire codebase. It took at least half a day to clone and build the whole source tree. Over the next year or two in which my team worked with the monorepo, we simply suffered: cloning the repo was time-consuming; slow build and release times frustrated us; flaky tests and bad commits affected the whole engineering organization; and let’s not even mention the IntelliJ indexing time, which easily left us enough time to run down for a chatty coffee break.
But wait a second. Google, Facebook, Twitter, and other big companies all use mono-repos, right? Why shouldn’t you do the same? The benefits must be tremendous, no?
After our experience at Outbrain, I can say that yes, they are, but you need to be aware of the difficulties that surface when projects start to scale up.
Large companies that use a monorepo have developed sophisticated tools and spent tremendous amounts of money and time to make it work. For instance, Facebook built its own custom filesystem (based on FUSE) and source control (fbsource, a customized Mercurial) in order to address the scale-up issues.
I want to share with you our thoughts on mono-repo vs multi-repo and the hybrid approach that we eventually adopted at Outbrain.
The mono-repo
For the type of high-scale project we were working on, the mono-repo hampered our development process. For a good article that breaks down the theoretical foundation behind the problems inherent in mono-repos, check out Monorepos: Please don’t!, in which Matt Klein runs through the complications you encounter when using this approach without first building the supporting tools, which require extensive human and monetary resources.
Does this mean we should never use a monorepo? No.
Monorepos provide a centralized place to manage dependencies, which makes upgrading libraries easier; they allow for greater collaboration and code sharing; and they need just one CI.
Nevertheless, the monorepo slowed our development process, affecting the company’s total velocity. As a result, we decided to break the monolith and move to microservices. Where projects depended on each other, the only coupling left between them was their API contracts. At this point, although we were still in a monorepo, the contracts allowed us to transition to multi-repo. That is, since each team’s code now depended only on API contracts, we were able to move each team’s code to its own repository, making the teams fully autonomous.
How the multi-repo was structured
Each repository contained two main modules:
  • Libraries: This module contained all the libraries that we released to the Artifactory so that other teams could use them.
  • Services: This module contained all our deployable services and internal libraries.
Were all our problems solved? No.
Like mono-repo, multi-repo had both advantages and disadvantages. On the one hand, decoupling gave the teams full autonomy: each repository contained its own dependencies, managed by its own team. As a result, the CI was faster, and the flexibility empowered teams to maximize their velocity. On the other hand, because teams still shared their libraries and APIs, this independence led to conflicts among different versions of the same dependency.
The transition to multi-repo required all teams to manage their dependencies by themselves, which reduced visibility and control of the dependencies across repositories. Unfortunately, this gave rise to dependency conflicts: conflicting versions of third-party libraries, as well as our own, surfaced in production as ‘NoSuchMethodError’ or ‘ClassNotFoundException’.
In order to regain visibility and control, we decided to write an in-house tool: the bumper tool.
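To make the kind of conflict described above concrete, here is a minimal sketch in Python of how one might spot the same artifact pinned to different versions in different repositories. The repository names, artifact coordinates, and the find_version_conflicts helper are all hypothetical and used only for illustration; this is not how our tooling was actually implemented.

```python
from collections import defaultdict


def find_version_conflicts(repo_dependencies):
    """Return {artifact: {version: [repos]}} for artifacts pinned to more than one version.

    `repo_dependencies` maps a repository name to {artifact: version}.
    """
    versions_by_artifact = defaultdict(lambda: defaultdict(list))
    for repo, dependencies in repo_dependencies.items():
        for artifact, version in dependencies.items():
            versions_by_artifact[artifact][version].append(repo)

    # Keep only artifacts that appear with at least two distinct versions.
    return {
        artifact: {version: sorted(repos) for version, repos in versions.items()}
        for artifact, versions in versions_by_artifact.items()
        if len(versions) > 1
    }


# Hypothetical input: two service repos that disagree on one shared library.
repo_dependencies = {
    "team-a-services": {"com.google.guava:guava": "19.0", "org.slf4j:slf4j-api": "1.7.30"},
    "team-b-services": {"com.google.guava:guava": "28.2-jre", "org.slf4j:slf4j-api": "1.7.30"},
}

conflicts = find_version_conflicts(repo_dependencies)
for artifact, versions in conflicts.items():
    print(artifact, versions)
```

When two such services end up sharing a classpath (for example, through a common library), only one of the versions is loaded, which is the situation in which errors like the ones above tend to appear.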
The bumper tool
Imagine a mechanism that can create a dedicated branch, update 100+ repos, and trigger a build in a matter of minutes. People generally don’t remember where dependencies are used in their repository, so we wanted an automation tool to update dependency versions (both third-party and teams’ own libraries) across multiple repositories. The bumper tool automatically sends pull requests with the relevant version updates to repository owners.
The bumper works in two phases:
  1. Scan Phase: The bumper periodically scans repositories and collects information such as repository location, dependency info, and pom.xml location.
  2. Update Phase: The bumper bumps one or more specific dependencies and sends pull requests to repo owners.
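The scan phase can be sketched roughly as follows. This is an illustrative Python sketch, not the bumper’s actual code: the scan_pom function, the record layout, and the sample POM are assumptions. It relies only on the standard Maven POM namespace and Python’s built-in XML parser, and it records which property (if any) manages each dependency’s version.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical pom.xml content, kept tiny for illustration.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <guava.version>19.0</guava.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>${guava.version}</version>
    </dependency>
  </dependencies>
</project>
""".strip()


def scan_pom(repo, pom_path, pom_text):
    """Collect the properties and dependencies declared in one pom.xml."""
    root = ET.fromstring(pom_text)

    # Property tags carry the namespace as "{uri}name"; keep only the name.
    properties = {
        element.tag.split("}", 1)[-1]: (element.text or "").strip()
        for element in root.findall("m:properties/*", POM_NS)
    }

    dependencies = []
    for dep in root.iterfind(".//m:dependency", POM_NS):
        version = dep.findtext("m:version", default="", namespaces=POM_NS).strip()
        # A version written as "${name}" is managed by the property "name".
        managed_by = version[2:-1] if version.startswith("${") and version.endswith("}") else None
        dependencies.append({
            "groupId": dep.findtext("m:groupId", default="", namespaces=POM_NS).strip(),
            "artifactId": dep.findtext("m:artifactId", default="", namespaces=POM_NS).strip(),
            "managed_by": managed_by,
            "version": properties.get(managed_by, version) if managed_by else version,
        })

    return {"repo": repo, "pom_path": pom_path, "properties": properties, "dependencies": dependencies}


record = scan_pom("team-a-services", "pom.xml", SAMPLE_POM)
print(record)
```

Storing the managing property alongside each dependency is what lets a later update step know which line of which pom.xml to change.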
The tool can be used for internal dependencies (Team A exposes its API by releasing its library to the Artifactory, and other teams consume this API as a versioned dependency) as well as external ones (in our case, third-party Maven dependencies).
Update versions via Slack!
The bumper tool is actually just a Slack bot. To update versions, we simply execute commands in Slack. The same dependency (artifact) may be managed under different property names in different repositories, which is why we start with a query.
Then we bump the properties related to specific artifacts in specific pom.xml files.
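The query-then-bump flow can be illustrated with a short Python sketch. Everything here is hypothetical (the scan_index layout, the query_properties and bump_property helpers, the repository names, and the property names); the real bumper is driven from Slack and opens pull requests, which this sketch does not attempt to do.

```python
import re


def query_properties(scan_index, artifact):
    """Return {repo: property_name} for repos that manage `artifact` via a property."""
    return {repo: props[artifact] for repo, props in scan_index.items() if artifact in props}


def bump_property(pom_text, property_name, new_version):
    """Replace the value of <property_name>...</property_name>, leaving the rest untouched."""
    tag = re.escape(property_name)
    pattern = re.compile(r"(<{0}>)[^<]*(</{0}>)".format(tag))
    # A function replacement avoids misreading digits in the version as group references.
    new_text, count = pattern.subn(lambda m: m.group(1) + new_version + m.group(2), pom_text)
    if count == 0:
        raise ValueError("property %r not found" % property_name)
    return new_text


# Hypothetical scan results: the same artifact is managed under different property names.
scan_index = {
    "team-a-services": {"com.google.guava:guava": "guava.version"},
    "team-b-services": {"com.google.guava:guava": "guava.ver"},
    "team-c-services": {"org.slf4j:slf4j-api": "slf4j.version"},
}
poms = {
    "team-a-services": "<properties><guava.version>19.0</guava.version></properties>",
    "team-b-services": "<properties><guava.ver>19.0</guava.ver></properties>",
    "team-c-services": "<properties><slf4j.version>1.7.30</slf4j.version></properties>",
}

# Query first, then bump only the repositories that matched.
targets = query_properties(scan_index, "com.google.guava:guava")
bumped = {repo: bump_property(poms[repo], prop, "28.2-jre") for repo, prop in targets.items()}
for repo, text in bumped.items():
    print(repo, text)
```

Editing the file as text, rather than parsing and re-serializing the XML, keeps the resulting diff limited to the changed version, which is convenient when the change is reviewed as a pull request.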
Done! The bumper creates and sends pull requests to relevant repositories, and notifies all owners whose dependencies were changed.
Each bump produces a list of all its pull requests, which can be found in a nice visual dashboard that tracks the status of the pull requests.
The bumper tool helped us automatically bump versions across repositories, but it didn’t save us from the dependency hell described earlier.
Hybrid-repo
With hybrid-repo, we have one repo that is responsible for keeping the internal shared libraries and APIs between teams. To maintain its high compilation speed and keep the repo lean, we make sure not to include slow-to-compile libraries, such as libraries written in Scala, or libraries that have long-running tests.
One of the advantages of hybrid-repos is that they reduce dependency conflicts, since we have one repo that manages all shared dependencies. Now repositories with services have to upgrade only one dependency instead of a couple of versions from multiple teams, and we can still use the bumper tool to bump (external/internal) versions, which makes version alignment easier.
However, we need to make sure the repo stays at a manageable size and doesn’t keep growing.
What is the right approach?
There is no one-size-fits-all solution. The answer lies in your team’s collaborative analysis of the project requirements and available resources. All possible approaches need to be considered, and you need to tailor your decision to the specific requirements and needs of your organization, just as we did at Outbrain.
Strike a balance - find your hybrid!

Written by aviyo | I’m an application software engineer at Outbrain, with a passion for new technology.
Published by HackerNoon on 2020/06/26