It usually happens when dealing with messy legacy data sources. Every time the business needs to connect a new data source to the warehouse, a developer has to build a bespoke pipeline for it. You write the code, test it, deploy it, and maintain it. Eventually, you look up and realise your architecture is just a massive pile of near-identical notebooks that all do roughly the same thing, but in slightly different ways.
We recently faced this exact scenario with a client in the pension fund administration space. We weren’t working with tidy, modern data feeds. We were dealing with two decades-old legacy systems spanning hundreds, if not thousands, of files. We had proprietary flat files with complex multivalue structures and 30-year-old COBOL files relying on packed decimal encoding.
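To make "packed decimal" concrete: COBOL COMP-3 fields store two binary-coded decimal digits per byte, with the final half-byte carrying the sign. Here is a minimal Python sketch of a decoder, purely illustrative rather than the client's production code:

```python
from decimal import Decimal

def decode_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a COBOL packed-decimal (COMP-3) field.

    Each byte holds two BCD digits; the low nibble of the final byte
    is the sign (0xD = negative, 0xC / 0xF = positive or unsigned).
    """
    digits, sign = [], 1
    for i, byte in enumerate(raw):
        hi, lo = byte >> 4, byte & 0x0F
        if i < len(raw) - 1:
            digits += [hi, lo]
        else:
            digits.append(hi)                 # high nibble of the last byte is still a digit
            sign = -1 if lo == 0x0D else 1    # low nibble is the sign
    value = int("".join(map(str, digits)) or "0")
    return Decimal(sign * value).scaleb(-scale)

# 0x12 0x34 0x5C encodes +12345; with two implied decimal places -> 123.45
assert decode_comp3(bytes([0x12, 0x34, 0x5C]), scale=2) == Decimal("123.45")
```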
Traditionally, this would mean building a completely custom pipeline for every single file. That is a slow, expensive, and deeply frustrating way to scale an enterprise data platform.
So, we decided to fundamentally flip the problem: instead of writing code per source, what if you write config? Adding a new file becomes a metadata exercise, not an engineering one.
To build the backbone of our client's Bronze and Silver ingestion layers, we implemented DLT-Meta.
Built on top of Databricks’ Lakeflow Declarative Pipelines, DLT-Meta is a metadata-driven ingestion framework. The core concept is beautifully simple: instead of writing separate code for every source, you define your source and target metadata once in a configuration file (a Dataflowspec). From there, a single generic pipeline handles the rest.
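To give a flavour of what that metadata looks like, here is a simplified onboarding entry sketched as a Python dict. The field names and paths are illustrative assumptions, not the exact Dataflowspec schema, so check the DLT-Meta documentation for the real keys:

```python
# Illustrative only: one entry per source file, roughly in the spirit of a
# DLT-Meta onboarding record. Field names and paths are assumptions.
member_contributions_spec = {
    "data_flow_id": "pensions_member_contributions",
    "data_flow_group": "bronze_daily",
    "source_format": "cloudFiles",  # Databricks Auto Loader
    "source_details": {
        "source_path": "/Volumes/landing/pensions/member_contributions/",
        "source_schema_path": "/Volumes/config/schemas/member_contributions.ddl",
    },
    "bronze_database": "pensions_bronze",
    "bronze_table": "member_contributions",
    "silver_database": "pensions_silver",
    "silver_table": "member_contributions_clean",
}
```

Adding the next file is another entry like this, not another notebook.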
For a project dealing with thousands of legacy files, this distinction matters a lot. DLT-Meta means onboarding a new file is largely a configuration task, not a development one.
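The pattern underneath is one that Lakeflow Declarative Pipelines makes straightforward: a single generic pipeline that loops over the metadata and declares one table per entry. The sketch below is not DLT-Meta's internal code, just the shape of the idea, reusing the assumed field names and paths from the entry above:

```python
import dlt    # available inside a Databricks Lakeflow Declarative Pipelines run
import json

# Assumption: the onboarding entries live in a JSON file at this (hypothetical) path.
with open("/Volumes/config/onboarding/onboarding.json") as f:
    onboarding_specs = json.load(f)

def create_bronze_table(spec: dict) -> None:
    """Declare one streaming bronze table from a single metadata entry."""
    @dlt.table(name=spec["bronze_table"])
    def bronze():
        # `spark` is provided by the pipeline runtime; the file format would
        # itself be driven by the spec in a real implementation.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .load(spec["source_details"]["source_path"])
        )

# One generic loop instead of one bespoke notebook per source.
for spec in onboarding_specs:
    create_bronze_table(spec)
```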
For data leaders, the commercial impact of this shift is undeniable. It directly affects how quickly the business can get new data into the hands of analysts, and how much it costs to keep the platform running as the data landscape grows.
Is this ready for the enterprise? In short: yes, it is genuinely ready, but you need to go in with your eyes open.
DLT-Meta is a Databricks Labs project, which means it doesn’t come with the formal SLAs of a first-party product. If an internal team tries to deploy this alone and something breaks, they will find themselves relying on the project's GitHub repository for fixes.
This is exactly why enterprise adoption requires an experienced engineering partner. You need a team that can not only implement the framework but provide the ongoing managed support to keep it robust. We are using it in a real production-bound environment with complex legacy sources, and with the right architectural guardrails, the framework holds up brilliantly. If you are already on Databricks, it is worth adopting now.
But the biggest hurdle to getting this working in a messy corporate environment isn’t the software. It’s source system knowledge. DLT-Meta handles the pipeline mechanics brilliantly, but it cannot tell you what your data actually means. On this project, we're dealing with 30-year-old legacy systems where the routing logic for some files is buried in code that only one person fully understands. If you don't have that domain knowledge captured somewhere, the framework is waiting on you, not the other way around. The tech is the easy part.
So, what does the next evolution of this architecture look like?
Our source systems come with a FILE-DESCR file, which describes the structure of every file at runtime: field names, multivalue groupings, the lot.
The next step in this transformation isn't about heavy engineering; it is about pushing automation to its absolute limit. The goal is to build a fully metadata-driven ingestion layer that makes onboarding completely self-service. The framework reads the FILE-DESCR, derives the metadata it needs automatically, and you are done. No manual configuration step, no manual intervention.
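In practice that self-service layer is a thin translation step: parse the FILE-DESCR, emit onboarding entries, and let the generic pipeline pick them up. A hypothetical sketch, since the real FILE-DESCR layout is proprietary and the row format below is an assumption:

```python
# Hypothetical: derive onboarding metadata straight from FILE-DESCR.
# The pipe-delimited row layout assumed here is illustrative, not the real format.

def parse_file_descr(descr_text: str) -> dict[str, list[dict]]:
    """Assume one 'filename|field_name|field_type|multivalue_group' row per field."""
    files: dict[str, list[dict]] = {}
    for line in descr_text.splitlines():
        if not line.strip():
            continue
        filename, field_name, field_type, mv_group = line.split("|")
        files.setdefault(filename, []).append(
            {"name": field_name, "type": field_type, "multivalue_group": mv_group or None}
        )
    return files

def to_onboarding_entry(filename: str) -> dict:
    """Map one described file to a Dataflowspec-style entry (names are assumptions)."""
    name = filename.lower()
    return {
        "data_flow_id": f"pensions_{name}",
        "source_format": "cloudFiles",
        "source_details": {"source_path": f"/Volumes/landing/pensions/{name}/"},
        "bronze_database": "pensions_bronze",
        "bronze_table": name,
    }

with open("FILE-DESCR") as f:
    described_files = parse_file_descr(f.read())

onboarding = [to_onboarding_entry(name) for name in described_files]
```

The field-level detail parsed above would feed schema and multivalue handling in the same pass, so the onboarding entry and the structural metadata come from a single source of truth.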
The underlying framework is already there. The final step is simply partnering with the right experts to fully leverage what your source systems are already telling you.
If your team is spending more time maintaining one-off pipelines than actually delivering value from the data, it is time to change your approach.