While “software is [still actively] eating the world”, it’s also clear that open source is taking over software.

Simply put, open source is a superior approach at building and distributing software because it provides important guaranties around how software can be discovered, tried, operated, collaborated on and packaged. For those reasons, it is not surprising that it has taken over most of the modern data stack: infrastructure, databases, orchestration, data processing, AI/ML and beyond.

Looking back, the main reason why I originally created both Apache Airflow and Apache Superset while I was at Airbnb in 2014–17 is because the vendors…


Batch data processing — historically known as ETL — is extremely challenging. It’s time-consuming, brittle, and often unrewarding. Not only that, it’s hard to operate, evolve, and troubleshoot.

In this post, we’ll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process. This post distills fragments of wisdom accumulated while working at Yahoo, Facebook, Airbnb and Lyft, with the perspective of well over a decade of data warehousing and data engineering experience.

Let’s start with a quick primer/refresher on what functional programming is about, from the functional programming Wikipedia page:

In…


This post follows up on The Rise of the Data Engineer, a recent post that was an attempt at defining data engineering and described how this new role relates to historical and modern roles in the data space.

In this post, I want to expose the challenges and risks that cripple data engineers and enumerates the forces that work against this discipline as it goes through its adolescence.


Introduction

Every once in a while I read a post about the future of tech that resonates with clarity.

A few weeks ago it was The Rise of the Data Engineer by Maxime Beauchemin, a data engineer at Airbnb and creator of their data pipeline framework, Apache Airflow. At Astronomer, Apache Airflow is at the very core of our tech stack: our integration workflows are defined by data pipelines built in Apache Airflow as directed acyclic graphs (DAGs). …


I joined Facebook in 2011 as a business intelligence engineer. By the time I left in 2013, I was a data engineer.

I wasn’t promoted or assigned to this new role. Instead, Facebook came to realize that the work we were doing transcended classic business intelligence. The role we’d created for ourselves was a new discipline entirely.

My team was at forefront of this transformation. We were developing new skills, new ways of doing things, new tools, and — more often than not — turning our backs to traditional methods.

We were pioneers. We were data engineers!

Data Engineering?

Data science as…

Maxime Beauchemin

Founder and CEO at Preset, creator of Apache Superset and Apache Airflow

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store