The Future of Business Intelligence is Open Source

While “software is [still actively] eating the world”, it’s also clear that open source is taking over software.

Simply put, open source is a superior approach to building and distributing software because it provides important guarantees around how software can be discovered, tried, operated, collaborated on and packaged. For those reasons, it is not surprising that it has taken over most of the modern data stack: infrastructure, databases, orchestration, data processing, AI/ML and beyond.

Looking back, the main reason I originally created both Apache Airflow and Apache Superset while I was at Airbnb in 2014–17 is that the vendors in the data space were failing to:

  • Keep up with the pace of innovation in the data ecosystem

As is often the case with open source, the capacity to integrate and to extend was always at the core of how we approached the architecture of those two projects.

Headaches with Tableau

More specifically for Superset, the main driver to start the project was the fact that Tableau (our main data visualization solution at the time) couldn’t connect natively to Apache Druid and Trino / Presto, our data engines of choice that provided the properties and guarantees that we needed to satisfy our data use cases.

With Tableau’s “Live Mode” misbehaving in intricate ways at the time (I won’t get into this!), we were steered towards using Tableau Extracts. Extracts crumbled under the data volumes we had at Airbnb, creating a whole lot of challenges around non-additive metrics (think distinct user counts) and forcing us to intricately pre-compute multiple “grouping sets”, which broke down some of the Tableau paradigms and confused users. Secondarily, we had a limited number of licenses for Tableau, and an order of magnitude more employees wanted or needed access than our contract allowed. That’s without mentioning the fact that for a cloud-native company, Tableau’s Windows-centric approach at the time didn’t work well for the team.

Some of the premises mentioned above have since changed, but the power of open source and the core principles on which it’s built have only grown. In this blog post, I will explain why the future of business intelligence is open source.

Benefits of Open Source

If I could only use a single word to describe why the time is right for organizations to adopt open source BI, the word would be freedom. Flowing from the principle of freedom come a few more concrete superpowers for an organization:

  • the power to customize, extend and integrate

Extend, customize and integrate

Airbnb wanted to integrate in-house tools like Dataportal and Minerva with a dashboarding tool to enable democratization of data within their organization. Because Superset is open source and Airbnb actively contributes to the project, they were able to supercharge Superset with in-house components with relative ease.

On the visualization side, organizations like Nielsen are creating new visualizations and deploying them in their Superset environments. They’re going a step further by empowering their engineers to contribute to the customizability and extensibility of Superset. The Superset platform is now flexible enough that anyone can build their own custom visualization plugins, a benefit that is unmatched in the marketplace.

Within the wider community, many report using the rich REST API that ships with Superset, which gives them programmatic control over all aspects of the platform. Given that pretty much everything that a user can do in Superset can be done through the API, the sky is the limit when it comes to automating processes in and around Superset.
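As a rough illustration of what that programmatic control can look like, here is a minimal Python sketch that only builds (and does not send) two HTTP requests. The endpoint paths `/api/v1/security/login` and `/api/v1/dashboard/`, the payload fields, the base URL and the helper names are assumptions for illustration; check the API documentation served by your own Superset instance before relying on them.

```python
import json
from urllib.request import Request


def build_login_request(base_url: str, username: str, password: str) -> Request:
    """Build (but do not send) a login request.

    The path and payload fields are assumptions, not taken from the article.
    """
    payload = {
        "username": username,
        "password": password,
        "provider": "db",  # assumed: database-backed authentication
        "refresh": True,   # assumed: also ask for a refresh token
    }
    return Request(
        url=base_url.rstrip("/") + "/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_list_dashboards_request(base_url: str, access_token: str) -> Request:
    """Build (but do not send) a request listing dashboards (assumed path)."""
    return Request(
        url=base_url.rstrip("/") + "/api/v1/dashboard/",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# Hypothetical local instance and credentials, for illustration only.
login_request = build_login_request("http://localhost:8088", "admin", "secret")
list_request = build_list_dashboards_request("http://localhost:8088", "my-token")
```

Passing either object to `urllib.request.urlopen` would actually send it. Keeping request construction separate from sending makes this kind of automation easy to test without a running server.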

Around the topic of integration, members of the Superset community have added support for over 30 databases (and growing!) by submitting code and documentation contributions. Because the core contributors bet on the right open source components (SQLAlchemy and Python DB-API 2.0), the Superset community both contributes to and benefits from the wider Python community.
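To give a sense of why that bet pays off, the sketch below shows the shape of the Python DB-API 2.0 interface: a compliant driver exposes a connection, cursors, `execute()`, `description` and `fetchall()`, so one helper can serve many databases. The `run_query` helper is hypothetical and is not Superset's actual code; `sqlite3` stands in here only because it ships with Python.

```python
import sqlite3
from typing import Any, Callable, List, Tuple


def run_query(connect: Callable[[], Any], sql: str) -> Tuple[List[str], List[tuple]]:
    """Run `sql` through any DB-API 2.0 driver; return column names and rows.

    `connect` is a zero-argument callable returning a DB-API connection, so the
    helper itself never needs to know which database it is talking to.
    """
    conn = connect()
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        # DB-API 2.0: `description` is a sequence of 7-item sequences,
        # with the column name in the first position.
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
        return columns, rows
    finally:
        conn.close()


# sqlite3 is used as a stand-in driver; swapping the `connect` callable is
# all it takes to point the same helper at another DB-API 2.0 driver.
columns, rows = run_query(
    lambda: sqlite3.connect(":memory:"),
    "SELECT 1 AS one, 'superset' AS name",
)
```

The design choice worth noting is that the driver is injected rather than imported inside the helper, which mirrors how a shared standard lets a community add databases one driver at a time.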

The power of the community

Open source communities are composed of a diverse group of people who come together over a similar set of needs, and this group is empowered to contribute to the common good. Vendors, on the other hand, tend to focus on their most important customers. Open source is a fundamentally different model that’s much more collaborative and frictionless. Communities, as a result of this fundamentally decentralized model, are very resilient to changes that vendor-led products struggle with. As contributors and organizations come and go, the community lives on!

At the core of the community are the active contributors, who typically operate as a dynamic meritocracy. Network effects attract attention and talent, and communities welcome and offer guidance to newcomers because their goals are aligned. With the rise of platforms like GitHub, software is pretty unique in that engineers and developers from around the world are able to come together and work collaboratively with minimal overhead. Those dynamics are fairly well understood and accepted as a disruptive paradigm shift in how people collaborate to build modern software.

Beyond the software at the core of the project, dynamic communities are contributing in all sorts of ways that provide even more value to the community. Here are some examples:

  • rich and up-to-date documentation

Avoid lock-in

Recently, Atlassian acquired the proprietary BI platform Chartio, started to downsize the Chartio team, and announced their intention to shut down the platform. Chartio customers now have to scramble to find a new home for their analytics assets, which they will have to rebuild.

This isn’t a new phenomenon. Given how mature and dynamic the BI market is, consolidation has been accelerating over the past few years:

  • Tableau was acquired by Salesforce

While consolidation is likely to continue, these concerns don’t arise when your BI platform is open source. If you’re self-hosting, you are essentially immune to vendor lock-in. If you choose to partner with a commercial open source software (COSS) vendor, you should have an array of options, from alternative vendors to hiring expertise in the marketplace, all the way to taking ownership and operating the software on your own.

For example, if you were using the Amazon Managed Workflows for Apache Airflow service to take care of your Airflow needs, and somehow Amazon decided to shut down the service (this is totally hypothetical!), you’d be left with a set of viable options:

  • Select and migrate to another service provider in the space: most likely Astronomer or Google Cloud Composer

Even at Preset, where we offer a cloud-hosted version of Superset, we don’t fork the Superset code and instead run the same Superset that’s available to everyone. In Preset Cloud, you can freely import and export data sources, charts, and dashboards. This is not unique to Preset: COSS vendors understand that “no lock-in!” is core to their value proposition and are incentivized to provide clear guarantees around this.

To conclude

Clearly, open source is extremely disruptive as it provides freedom and a set of guarantees that really matter when it comes to adopting software. Even more clearly, these guarantees fully apply when it comes to business intelligence.

More specifically around business intelligence, Apache Superset has matured to a level where it’s a very compelling choice over proprietary solutions. Since its original creation in 2015 at an Airbnb hackathon, the project has come a very long way. Here are some recent highlights:

  • Reached the major milestone of 1.0 (2021), which represents the project and product reaching adulthood and being fully ready to compete in the BI marketplace. Superset is ready for mass adoption.

All in all, Apache Superset offers a combination of features and guarantees that is unique to open source BI. To learn more, please visit and join our growing community!

superset.apache.org/community

Originally published at https://preset.io.

Founder and CEO at Preset, creator of Apache Superset and Apache Airflow
