Data Observability that Fits Any Data Team’s Structure

In our travels through data teams large and small, we’ve found that they tend to fall into one of three structures, which are well described in this excellent piece by Mikkel Densoe:

Fully centralized: There is a single data team that typically rolls up to a VP of Data or CTO. This team includes data engineers, analysts, analytics engineers, and data scientists under one umbrella. It supports all data operations for the entire organization, including delivering data to the various functions or lines of business.
Fully embedded: Data engineers, data scientists, and analysts are members of different teams throughout the business. Each team supports a different line of business or function (e.g., finance, growth, marketing, or operations). These embedded teams typically have deep familiarity with the business use of the data they support.
Hybrid: There is a centralized team that owns the data infrastructure and provides tooling to help smaller embedded teams within the organization. The central team may purely handle infrastructure and tools, or it may also centralize some of the data model, for example by maintaining some “gold standard” or “core” tables, while still giving embedded teams the freedom to build their own models.

Regardless of team structure, data reliability will become a challenge as the volume of data, the number of use cases, and the size of the organization expand. When implementing a data observability platform to measure and improve data reliability, you’ll need to consider the basic structure of your teams. That structure will dictate the best strategy for rolling out and managing observability over your data, pipelines, and assets like analytics dashboards and machine learning models.

Data observability for centralized teams

Overview

It will be easy for centralized teams to ensure the basics across the entire organization (is data loading on time, is too much or too little data being loaded, and so on). But that’s just the tip of the iceberg when it comes to observability. Beyond the basic freshness and volume monitoring that should be applied to every table in the warehouse, it’s important to also monitor the critical tables that each line of business depends on. That involves deeper monitoring for duplication, completeness, distributions, and business logic, which often relies on domain knowledge.

For centralized teams, it’s often more challenging to gather this domain knowledge and apply the right in-depth monitoring.
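To make the "basics" concrete, here is a minimal sketch of a shallow freshness and volume check. It is not tied to any particular observability product: the `TableStats` fields, the 24-hour freshness window, and the 50% volume tolerance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class TableStats:
    """Hypothetical snapshot of table metadata; field names are assumptions."""
    name: str
    last_loaded_at: datetime
    row_count: int
    expected_rows: int  # e.g. a trailing average of recent loads


def basic_checks(stats: TableStats, now: datetime,
                 max_age: timedelta = timedelta(hours=24),
                 tolerance: float = 0.5) -> List[str]:
    """Return the issues found by shallow freshness and volume checks."""
    issues = []
    # Freshness: is data loading on time?
    if now - stats.last_loaded_at > max_age:
        issues.append("stale")
    # Volume: is too much or too little data being loaded?
    low = stats.expected_rows * (1 - tolerance)
    high = stats.expected_rows * (1 + tolerance)
    if not low <= stats.row_count <= high:
        issues.append("volume")
    return issues
```

Because these checks need only load timestamps and row counts, a central team can apply them to every table without any domain knowledge.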

The challenge

With fully centralized data teams, domain knowledge about the data is often shared with the line of business, and not 100% in the heads of the data team members. Therefore, it can be hard to apply deep monitoring to track business logic-specific information without consulting with line of business owners, slowing down the rollout of data observability.
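As a rough illustration of why deep monitoring needs the line of business, the sketch below shows three kinds of deeper checks over rows held in memory. The function names and the row format are hypothetical; the point is that the duplication and completeness checks are generic, while the business rule has to be supplied by someone who knows the domain.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def duplicate_keys(rows: List[Row], key: str) -> List[Any]:
    """Return key values that appear more than once, in first-seen order."""
    seen, dupes = set(), []
    for row in rows:
        value = row[key]
        if value in seen and value not in dupes:
            dupes.append(value)
        seen.add(value)
    return dupes


def completeness(rows: List[Row], column: str) -> float:
    """Fraction of rows with a non-null value in `column`."""
    if not rows:
        return 1.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def rule_violations(rows: List[Row], rule: Callable[[Row], bool]) -> List[Row]:
    """Rows that fail a business rule supplied by the domain owner."""
    return [row for row in rows if not rule(row)]
```

A rule such as "a refund never exceeds the order amount" is the kind of predicate the data team would typically have to collect from the business owner before passing it to `rule_violations`.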

The advantage

Centralized teams have an easier time getting permissions and access to data and infrastructure, and managing their observability once it’s set up. They tend to get set up faster, face less red tape, and can more easily assign responders to the data reliability issues that they identify, for example by sharing a single on-call rotation.

Your recommended data observability rollout

If you are a centralized data team, first identify the lines of business (LOBs) that most heavily leverage your data. Then, partner with them one at a time, taking time to understand where they’re leveraging data (e.g., by building a list of their key dashboards or ML models), what types of business-specific checks they need monitored, and who should be notified if the observability system identifies problems. Solve their needs before moving to the next line of business; a visible success makes it easier to get stakeholders from other LOBs on board. This approach may feel repetitive at first, but it allows you to polish your team’s data observability process and build trust within the organization over time.
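One way to keep this one-LOB-at-a-time rollout organized is to track each engagement as a small record. The sketch below is only an illustration: the `LobOnboarding` fields, and the use of monthly query counts as a proxy for how heavily an LOB leverages data, are assumptions rather than a prescribed method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LobOnboarding:
    """Hypothetical record of what one line of business needs monitored."""
    name: str
    monthly_queries: int  # assumed proxy for how heavily the LOB uses data
    key_assets: List[str] = field(default_factory=list)       # dashboards, ML models
    business_checks: List[str] = field(default_factory=list)  # domain-specific rules
    notify: List[str] = field(default_factory=list)           # who hears about issues

    def ready(self) -> bool:
        """True once assets, checks, and responders have all been captured."""
        return bool(self.key_assets and self.business_checks and self.notify)


def rollout_order(lobs: List[LobOnboarding]) -> List[str]:
    """LOB names ordered from heaviest to lightest data usage."""
    ranked = sorted(lobs, key=lambda lob: lob.monthly_queries, reverse=True)
    return [lob.name for lob in ranked]
```

Treating `ready()` as the gate for moving on to the next LOB mirrors the advice to solve one group's needs fully before starting the next.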

Data observability for embedded teams

Overview

Embedded teams will have no problem applying in-depth monitoring, as they should already be deeply familiar with their data. In a truly embedded model, there might not be a clear single owner of your data observability tools, so it’s important that the tool be easy to adopt to prevent disparate silos from forming.

Embedded teams will have a harder time with widespread, shallow checks that reach across all parts of the organization. Is data being managed the same way on one team as on another? It’s difficult to tell. Without a single person responsible for that breadth of data quality, the diffusion of responsibility can lead to haphazard levels of observability (and therefore reliability) across the organization’s data, and slow the organization’s ability to trust its data overall.

The challenge

Embedded teams may struggle to deploy uniform operational metrics across the entire data stack. For example, they might struggle to track freshness and volume for an entire pipeline, from raw data to the final dashboard or ML model. We often find that these teams duplicate efforts unnecessarily. It can also be challenging to assign a responder to a data reliability issue, depending on where in the pipeline it occurred. Because there’s no central owner of data reliability and observability, any effort to make data reliable for the organization as a whole can be an uphill battle.

The advantage

Embedded teams have an easier time getting really great observability enabled for their own line of business. Data experts sit directly on the teams they’re impacting, so they have deep expertise in their niche, and can operate the data observability tooling without having to involve another team.

Your recommended data observability rollout

Each embedded team should be responsible for the operational quality of the data they support, but should share tools as much as possible with other embedded teams. The more teams utilize the shared observability tooling, the fewer blind spots will occur simply because embedded teams aren’t communicating clearly. Combining data lineage with SLAs can help teams identify where the hand-offs are when looking at an entire pipeline, and route responsibility for solving outages to the proper team’s responder.
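The idea of combining lineage with SLAs to route responsibility can be sketched as a small graph walk. Everything here is hypothetical: the asset names, the `upstream` and `owner` mappings, and the rule that the responder is the owner of the most upstream asset that breached its SLA. Real lineage tools expose this information in their own formats.

```python
from typing import Dict, List, Set


def ancestors(asset: str, upstream: Dict[str, List[str]]) -> Set[str]:
    """All assets that `asset` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(upstream.get(asset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(upstream.get(node, []))
    return seen


def route(asset: str, breached: Set[str],
          upstream: Dict[str, List[str]],
          owner: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each responsible team to the most upstream breached assets feeding `asset`."""
    candidates = (ancestors(asset, upstream) | {asset}) & breached
    routes: Dict[str, List[str]] = {}
    for node in sorted(candidates):
        # A breached asset is a root cause only if nothing upstream of it breached too.
        if not (ancestors(node, upstream) & breached):
            routes.setdefault(owner[node], []).append(node)
    return routes
```

With a pipeline such as raw events, then sessions, then a funnel table, then a dashboard, a breach that starts in the sessions table would be routed to that table's owner rather than to the team that owns the dashboard where the problem was noticed.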

Data observability for hybrid teams

Overview

In the hybrid model, the central team should own the observability tooling and make it available to its embedded counterparts as a self-service offering. Before doing so, the central team should deploy operational-level checks (e.g., freshness and volume monitoring) for all tables in the data stack, then enable the embedded teams to implement deeper monitoring based on their domain knowledge. If the central team is responsible for ingestion, the transformation framework, and the ETL/ELT orchestrator, central control of these operational checks will help it prevent infrastructure issues from impacting the embedded teams’ pipelines.

Once the centralized team has covered these operational monitoring goals, the observability tool can be opened up to the embedded partners so they can apply deeper monitoring to the specific parts of the data model that they own.
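The layering described above can be sketched as a monitoring plan built in two passes: a central baseline on every table, then team-specific checks on owned tables only. The check names, table names, and the ownership rule are assumptions made for illustration.

```python
from typing import Dict, List

# Baseline checks the central team applies to every table (names are illustrative).
BASELINE: List[str] = ["freshness", "volume"]


def monitoring_plan(owners: Dict[str, str],
                    deep_checks: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Combine central baseline checks with each embedded team's deep checks.

    owners:      table name -> owning team
    deep_checks: team -> {table name -> extra checks that team wants}
    """
    # Pass 1: the central team covers every table with operational checks.
    plan = {table: list(BASELINE) for table in owners}
    # Pass 2: embedded teams layer deeper checks onto the tables they own.
    for team, checks_by_table in deep_checks.items():
        for table, checks in checks_by_table.items():
            if owners.get(table) != team:
                raise ValueError(f"{team} does not own {table}")
            plan[table].extend(checks)
    return plan
```

Rejecting checks on tables a team does not own is one possible design choice; it keeps the division of labor explicit, though some organizations may prefer to allow cross-team checks.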

The challenge

The biggest challenge for hybrid teams is often getting data reliability on the roadmap for both the central team and the embedded teams at the same time. Once both groups agree to staff the observability effort, they should have the easiest time rolling it out, thanks to the excellent division of labor.

The advantage

Hybrid teams have a clear and straightforward division between operational and business-logic observability. When each team can focus on what it cares about, and nothing more, the scope of data quality work is well-defined. Hybrid teams can be speedier and more efficient in achieving their objectives, as long as the central and embedded teams get data observability on their roadmaps around the same time. The central team may need a few weeks to a few months to get started, with the embedded teams following quickly behind.

Your recommended data observability rollout

Roll out data quality tools to the central data team first, and then to just one or two embedded teams at a time. This ensures basic monitoring for ingestion, replication, etc. is in place before the embedded teams spend time enabling deeper monitoring.
