4 Critical Steps To Build A Large Catalog Of Connectors Remarkably Well


by John Lafleur, November 5th, 2020

Too Long; Didn't Read

The art of building a large catalog of connectors is thinking in onion layers. We’re building an open-source data integration platform at Airbyte. We haven’t fully built our manufacturing plant, but engineers can already add one new connector every day. Building an integration that leverages Singer takes us less than 3 hours, and our goal is to bring it down to less than 10 minutes. We have the same ambition for every other family of integrations, with tools that handle most of them with no code or very little code.

The art of building a large catalog of connectors is thinking in onion layers.

We’re building an open-source data integration platform at Airbyte. We launched our MVP about a month ago, and we were thrilled by the amount of feedback and support we got from the community. We even got our first big pull request from a contributor this week (2,000+ lines of code). But during this whole month, we didn’t release any new connectors. You might wonder why we didn’t build on that momentum. If people were excited about our MVP even though it had only 6 connectors, you might think we should have added connectors as fast as possible. We didn’t do that, for two very important and differentiating reasons.

First, we were defining exactly what the best data protocol would be if we wanted to solve data integration once and for all, for all companies. You can learn more about our specification here. Even though it’s not final yet, it will give you a glimpse of our vision for the future.

Second, and just as important, we were building a real manufacturing plant for data integration connectors. Our team led data integration at LiveRamp, which has more than 1,000 data ingestion connectors and more than 1,000 distribution connectors. So we have experience abstracting what can be abstracted and simplifying the manufacturing of new integrations (very often without code). We haven’t fully built our manufacturing plant, but engineers can already add one new connector every day.

This article describes how we built this connector manufacturing plant. 

What you need to think about when building a large number of connectors

When building a large catalog of connectors, there are several things that you need to think through. 

Initial build

This is when you start from a blank page. This step usually requires a little bit of planning since it involves communication with external teams/companies.

The initial build step involves:

  • Access to the source/destination documentation
  • Access to test accounts, test infrastructure, etc.
  • Following a golden path that encodes good practices
  • Using the best language for the task: today, we support both Java and Python, but anyone can add their own language
  • Creating documentation
  • Defining the necessary inputs
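The last item, defining the necessary inputs, is easy to make concrete. The sketch below is a minimal Python illustration. It assumes a connector declares its inputs as a small spec that the platform validates before running anything. `SPEC`, `validate_config`, `with_defaults`, and the field names are hypothetical, not Airbyte's actual specification format.

```python
from typing import Any, Dict, List

# Hypothetical input spec for one source connector. Each entry declares the
# expected Python type and whether the user must provide the value.
SPEC: Dict[str, Dict[str, Any]] = {
    "api_key": {"type": str, "required": True},
    "start_date": {"type": str, "required": True},
    "page_size": {"type": int, "required": False, "default": 100},
}


def validate_config(config: Dict[str, Any], spec: Dict[str, Dict[str, Any]] = SPEC) -> List[str]:
    """Return human-readable errors; an empty list means the config is usable."""
    errors: List[str] = []
    for name, field in spec.items():
        if name not in config:
            if field["required"]:
                errors.append(f"missing required input: {name}")
            continue
        if not isinstance(config[name], field["type"]):
            errors.append(f"{name} must be of type {field['type'].__name__}")
    # Reject inputs the connector does not declare; they usually signal typos.
    for name in config:
        if name not in spec:
            errors.append(f"unknown input: {name}")
    return errors


def with_defaults(config: Dict[str, Any], spec: Dict[str, Dict[str, Any]] = SPEC) -> Dict[str, Any]:
    """Fill in declared defaults for optional inputs the user left out."""
    filled = {name: f["default"] for name, f in spec.items() if "default" in f}
    filled.update(config)
    return filled


print(validate_config({"start_date": 5, "pagesize": 10}))
# ['missing required input: api_key', 'start_date must be of type str', 'unknown input: pagesize']
```

Declaring inputs as data, rather than reading them ad hoc inside each connector, lets the same validation be reused across every connector in the catalog.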

Tests

Tests are essential to make sure that any code or protocol change won’t affect the connectors. They need to run before every merge. 

They also ensure that the connector behaves as you expect. For that, you need to run your connector against the actual production service. For example, if you’re working on the Salesforce connector, you must make sure that Salesforce actually behaves the way you expect. It is not unusual for API or service documentation to diverge from how the service really behaves.

We currently have the foundation of our test framework; it allows developers to focus solely on providing inputs and outputs, and the rest is taken care of by the framework.

These tests give us 90% certainty that the connector is fully functional. If there are edge cases, it is always possible to add more custom tests.
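The sketch below is a minimal Python illustration of a framework in which developers supply only inputs and expected outputs. `ConnectorTestCase`, `run_standard_tests`, the message shape (`stream` plus `data`), and `fake_read` are all hypothetical. A real run would point `read` at the actual production service rather than at a stub.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Message = Dict[str, Any]
ReadFn = Callable[[Dict[str, Any]], Iterable[Message]]


@dataclass
class ConnectorTestCase:
    """Hypothetical test case: the developer only supplies inputs and expectations."""
    name: str
    config: Dict[str, Any]
    expected_streams: List[str]
    min_records: int = 1


def run_standard_tests(read: ReadFn, case: ConnectorTestCase) -> List[str]:
    """Generic checks shared by every connector; returns failure messages."""
    try:
        messages = list(read(case.config))
    except Exception as exc:  # a crash is itself a test failure
        return [f"{case.name}: read raised {exc!r}"]

    failures: List[str] = []
    if len(messages) < case.min_records:
        failures.append(f"{case.name}: expected at least {case.min_records} records, got {len(messages)}")
    seen = {m.get("stream") for m in messages}
    missing = sorted(set(case.expected_streams) - seen)
    if missing:
        failures.append(f"{case.name}: no records for streams {missing}")
    malformed = [m for m in messages if not isinstance(m.get("data"), dict)]
    if malformed:
        failures.append(f"{case.name}: {len(malformed)} records without a data object")
    return failures


def fake_read(config: Dict[str, Any]) -> Iterable[Message]:
    # Stand-in for a real connector; a real test would hit the actual service.
    yield {"stream": "users", "data": {"id": 1}}
    yield {"stream": "users", "data": {"id": 2}}


case = ConnectorTestCase("users-only", {"api_key": "test"}, expected_streams=["users", "orders"])
print(run_standard_tests(fake_read, case))
```

Connector-specific edge cases would then be covered by extra custom tests written alongside these shared checks.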

Liveness & change detection

It is essential to verify that the source or destination is still alive, for monitoring purposes, and that it continues to behave as it was encoded during the initial build phase.

These verifications must be run at a cadence, and any failure needs to be investigated and fixed, leading to the maintenance phase.
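A periodic liveness and change-detection job could look roughly like the Python sketch below. It makes three hypothetical assumptions: each connector exposes a liveness check, each connector can discover its current schema, and the schema captured during the initial build was stored. `diff_schemas` and `scheduled_check` are illustrative names, not part of any existing API.

```python
from typing import Callable, Dict, List

# Hypothetical schema representation: stream name -> {field name: type name}.
Schema = Dict[str, Dict[str, str]]


def diff_schemas(recorded: Schema, current: Schema) -> List[str]:
    """List differences between the schema captured at build time and today's."""
    changes: List[str] = []
    for stream in sorted(recorded.keys() | current.keys()):
        if stream not in current:
            changes.append(f"stream removed: {stream}")
        elif stream not in recorded:
            changes.append(f"stream added: {stream}")
        else:
            old, new = recorded[stream], current[stream]
            for name in sorted(old.keys() | new.keys()):
                if name not in new:
                    changes.append(f"{stream}.{name} removed")
                elif name not in old:
                    changes.append(f"{stream}.{name} added")
                elif old[name] != new[name]:
                    changes.append(f"{stream}.{name} changed {old[name]} -> {new[name]}")
    return changes


def scheduled_check(check_alive: Callable[[], bool], discover: Callable[[], Schema], recorded: Schema) -> List[str]:
    """One run of the periodic verification; every returned issue needs investigating."""
    try:
        if not check_alive():
            return ["source is not reachable"]
        return diff_schemas(recorded, discover())
    except Exception as exc:  # an unexpected error is also worth alerting on
        return [f"check failed: {exc!r}"]


recorded = {"users": {"id": "integer", "email": "string"}}
current = {"users": {"id": "string", "email": "string", "name": "string"}, "orders": {"id": "integer"}}
print(scheduled_check(lambda: True, lambda: current, recorded))
# ['stream added: orders', 'users.id changed integer -> string', 'users.name added']
```

Any non-empty result is the kind of failure that would hand the connector over to the maintenance phase.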

Maintenance

We need to define how we update a connector, push the changes, and propagate them to all running instances of Airbyte.
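The sketch below covers one small piece of that loop: deciding which running instances need a connector update. It is a minimal Python illustration under stated assumptions. It assumes a hypothetical central registry of the latest version of each connector, and instances that report the versions they run. The connector names are made up, and the sketch deliberately leaves out how updates are actually pushed.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.2.10' into (0, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pending_upgrades(latest: Dict[str, str], installed: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (connector, installed version, target version) for each outdated connector."""
    upgrades: List[Tuple[str, str, str]] = []
    for connector, current in sorted(installed.items()):
        target = latest.get(connector)
        # Connectors absent from the registry (e.g. private ones) are left alone.
        if target is not None and parse_version(target) > parse_version(current):
            upgrades.append((connector, current, target))
    return upgrades


# Hypothetical registry and instance data.
latest = {"source-example-crm": "0.10.0", "source-example-billing": "0.1.0"}
installed = {"source-example-crm": "0.9.3", "source-example-billing": "0.1.0", "source-in-house": "1.0.0"}
print(pending_upgrades(latest, installed))
# [('source-example-crm', '0.9.3', '0.10.0')]
```

Versions are compared as integer tuples rather than strings, so `0.10.0` correctly counts as newer than `0.9.3`.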

The art of building connectors is thinking in onion layers

Segmenting cattle code

To draw a parallel with the pets vs. cattle concept that is well known in DevOps and infrastructure, a connector is cattle code: you want to spend as little time on it as possible. Anything you can do now to avoid work in the future is worth doing. This will accelerate your production tremendously.

Abstractions as onion layers

Maximizing high-leverage work leads you to build your architecture with an onion-esque structure:

The center defines the lowest level of the API. Implementing a connector at that level requires a lot of engineering time, but it is your escape hatch for very complex connectors where you need a lot of control.

Then, you build new layers of abstraction that help tackle families of connectors very quickly.
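The onion structure can be sketched in Python as a low-level interface plus one outer layer for a family of connectors. This is a hypothetical illustration, not a description of Airbyte's actual classes. `Source`, `PaginatedApiSource`, `fetch_page`, and the `check`/`discover`/`read` split are assumptions chosen for the example. `FakeUsersSource` stands in for a real API so the code runs offline.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List

Message = Dict[str, Any]


class Source(ABC):
    """Center of the onion (hypothetical): full control, most engineering effort."""

    @abstractmethod
    def check(self, config: Dict[str, Any]) -> bool: ...

    @abstractmethod
    def discover(self, config: Dict[str, Any]) -> List[str]: ...

    @abstractmethod
    def read(self, config: Dict[str, Any]) -> Iterator[Message]: ...


class PaginatedApiSource(Source):
    """Outer layer for one family: paginated APIs. Subclasses only declare
    endpoints and how to fetch a single page; the rest is shared."""

    endpoints: Dict[str, str] = {}  # stream name -> endpoint path

    @abstractmethod
    def fetch_page(self, config: Dict[str, Any], path: str, page: int) -> List[Dict[str, Any]]:
        """Return one page of rows; an empty list means there are no more pages."""

    def check(self, config: Dict[str, Any]) -> bool:
        try:
            self.fetch_page(config, next(iter(self.endpoints.values())), 0)
            return True
        except Exception:  # includes StopIteration when no endpoints are declared
            return False

    def discover(self, config: Dict[str, Any]) -> List[str]:
        return list(self.endpoints)

    def read(self, config: Dict[str, Any]) -> Iterator[Message]:
        for stream, path in self.endpoints.items():
            page = 0
            while rows := self.fetch_page(config, path, page):
                for row in rows:
                    yield {"stream": stream, "data": row}
                page += 1


class FakeUsersSource(PaginatedApiSource):
    """A whole 'connector' in this family: just endpoints and a page fetcher."""

    endpoints = {"users": "/users"}
    PAGES = {"/users": [[{"id": 1}, {"id": 2}], [{"id": 3}]]}  # in-memory stand-in

    def fetch_page(self, config, path, page):
        pages = self.PAGES[path]
        return pages[page] if page < len(pages) else []


print(list(FakeUsersSource().read({})))
```

The connector-specific code in `FakeUsersSource` is only a few lines. Pagination, discovery and message formatting are written once and shared by the whole family, and `Source` stays available as the escape hatch.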

Today, we’ve built one of these abstractions to support existing Singer integrations. Building an integration that leverages Singer takes us less than 3 hours, and our goal is to bring it down to less than 10 minutes.
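A Singer tap writes newline-delimited JSON messages (`SCHEMA`, `RECORD`, `STATE`) to stdout, so an adapter layer mostly has to translate those messages. The Python sketch below makes two assumptions. First, the tap's output lines are already available; in practice they would be read from the tap's process. Second, the output uses a simplified message shape loosely modeled on Airbyte's record and state messages. Treat the output field names and `translate_singer` as assumptions.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def translate_singer(lines: Iterable[str], now_ms: int) -> Iterator[Dict[str, Any]]:
    """Translate Singer tap output lines into simplified platform messages."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        kind = msg.get("type")
        if kind == "RECORD":
            yield {
                "type": "RECORD",
                "record": {"stream": msg["stream"], "data": msg["record"], "emitted_at": now_ms},
            }
        elif kind == "STATE":
            # Singer carries the checkpoint under "value"; pass it through unchanged.
            yield {"type": "STATE", "state": {"data": msg["value"]}}
        # SCHEMA (useful for a discovery step) and other types are skipped in this sketch.


tap_output = [
    '{"type": "SCHEMA", "stream": "users", "schema": {"type": "object"}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "STATE", "value": {"users": 1}}',
]
for message in translate_singer(tap_output, now_ms=0):
    print(message)
```

Because every Singer tap speaks the same message format, one adapter like this covers the whole family. That is what makes a per-integration cost measured in hours, rather than days, plausible.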

We have the same ambition for every other family of sources and destinations.

As we continue to improve our manufacturing plant for connectors, we will build tools that will allow us to handle 95% of integrations with no or very little code.
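At the outermost layer, a connector could be nothing but configuration. The Python sketch below is a hypothetical illustration of that idea, not an existing Airbyte feature. `EXAMPLE_DEFINITION`, `records_at`, `dig`, and `run_declarative` are invented for the example. `get_json` is injected so the code runs without a network; a real implementation would issue HTTP requests and handle auth, pagination and errors.

```python
from typing import Any, Callable, Dict, Iterator, List

# A hypothetical, purely declarative connector definition: no code, just data.
EXAMPLE_DEFINITION: Dict[str, Any] = {
    "name": "example-api",
    "base_url": "https://api.example.com",
    "streams": [
        {"name": "users", "path": "/users", "records_at": ["data", "items"]},
        {"name": "teams", "path": "/teams", "records_at": ["teams"]},
    ],
}


def dig(obj: Any, keys: List[str]) -> Any:
    """Follow a list of keys into a nested JSON response."""
    for key in keys:
        obj = obj[key]
    return obj


def run_declarative(definition: Dict[str, Any], get_json: Callable[[str], Any]) -> Iterator[Dict[str, Any]]:
    """Generic interpreter shared by every declaratively defined connector."""
    for stream in definition["streams"]:
        url = definition["base_url"] + stream["path"]
        for row in dig(get_json(url), stream["records_at"]):
            yield {"stream": stream["name"], "data": row}


# Canned responses standing in for real HTTP calls.
responses = {
    "https://api.example.com/users": {"data": {"items": [{"id": 1}, {"id": 2}]}},
    "https://api.example.com/teams": {"teams": [{"id": "t1"}]},
}
print(list(run_declarative(EXAMPLE_DEFINITION, responses.__getitem__)))
```

Adding a new API of this shape would then mean writing a new definition rather than new code.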

This is how we are going to address the long tail of integrations and how we’re going to make integrations a commodity.

What Airbyte has built up to now

We’ve built the following:

  • The center of the onion
  • The golden path in Java & Python to build new connectors
  • The first version of the integration test framework
  • Connectors: 10 sources (added at a rate of 1 new source per day) and 4 destinations
  • A layer to quickly support Singer integrations

What our ambitions are with this connector manufacturing plant

We want to reach a rate of 5 connectors per day and accelerate even beyond that. 

We also want to provide the community with more tools to build and contribute their own connectors. Ideally, 95% of connectors could be added to Airbyte with no code.

We hope this gives you a better understanding of what we’ve been up to and what our real ambitions are. If you see any ways to improve this architecture, we’re all ears. Don’t hesitate to join our Slack to discuss any questions or suggestions with the team.

Previously published at https://airbyte.io/articles/data-engineering-thoughts/how-to-build-thousands-of-connectors/