Whitepaper

Turning Data Into Answers With Databricks

Organisations sitting on rich cross-domain datasets often struggle to surface that data to the analysts who need it most. Traditional approaches force consumers through ticketing queues, ad-hoc SQL access, or bulk file exports with zero cost visibility.

Aviary is a lightweight, self-service data portal built entirely on the Databricks platform. Powered by Databricks Lakebase (managed PostgreSQL OLTP) for sub-second catalog queries, Databricks Apps for a polished front-end, and Databricks Genie for natural language data access, Aviary lets data consumers browse, filter, query, and export governed datasets with built-in access controls and transparent chargeback pricing, without writing a single line of SQL.

Key outcomes:

  • Full cost transparency via per-row chargeback surfaced at the point of export
  • Zero infrastructure to manage. Lakebase handles connection pooling, OAuth rotation, and scaling automatically
  • Governance built in. Restricted datasets require explicit access requests; open datasets are instantly browsable
  • Natural language querying. Genie lets non-technical consumers ask questions in plain English and receive structured answers, governed by the same access controls as the rest of the marketplace.

Why Aviary?

In large-scale organisations, data is the connective tissue between business units. Yet, for those tasked with turning that data into strategy, the reality is often a collection of expensive, fragmented silos.

Aviary was built on a simple premise: Data consumers don’t want a warehouse; they want answers. By focusing on the specific friction points that stall institutional intelligence, Aviary transforms data from a static liability into a self-service product.

The pain points it solves:

  • The "Gatekeeper" Bottleneck: Traditionally, accessing data requires technical tickets and middleman intervention. This creates a "latency of insight": by the time an analyst receives the data, the window to act has often closed.
  • The Context Gap: Decision-makers often work with "mystery data." Without immediate visibility into lineage and freshness (e.g., When was this last updated?), high-stakes decisions are built on guesswork.
  • The Discovery Paradox: Organisations often own large numbers of datasets that remain unused simply because no one knows they exist. Without a central "shop window," redundant collection thrives while synergy dies.
  • The Hidden Cost of Export: In a cloud-native world, data movement has a price. Most organisations suffer from "bill shock" because the financial impact of an export is rarely made explicit before a user requests access to a dataset.

The Architecture in 60 Seconds

Aviary's architecture is deliberately minimal:

How It Works

1. Metadata-driven catalog

Every dataset in Aviary is registered in a single datasets_metadata table stored in Lakebase. Each row defines:

  • The dataset's name, domain (sector), vendor, and description
  • The underlying table name and timestamp column for freshness tracking
  • A JSON array of filter configurations (dropdowns, sliders, date pickers, free-text search)
  • Chargeback rules specifying whether exports are free or priced per N rows

This means adding a new dataset to the marketplace is a single INSERT statement into the datasets_metadata table in Lakebase. No code changes, no redeployment.
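As a rough illustration of that registration step, the sketch below builds a parameterised INSERT for one catalog row. The column names, the filter JSON shape, and the sample values are all assumptions made for illustration; the source only states which kinds of information each row holds, not the actual schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DatasetRegistration:
    """One row of datasets_metadata. Every field name here is hypothetical."""
    name: str
    sector: str
    vendor: str
    description: str
    table_name: str
    timestamp_column: str                        # used for freshness tracking
    filters: list = field(default_factory=list)  # stored as a JSON array
    price_per_block: str = "0"                   # "0" means exports are free
    rows_per_block: int = 10_000                 # the "N" in "priced per N rows"


COLUMNS = ("name", "sector", "vendor", "description", "table_name",
           "timestamp_column", "filters", "price_per_block", "rows_per_block")


def build_insert(reg: DatasetRegistration):
    """Return (sql, values) using psycopg-style %s placeholders."""
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = f"INSERT INTO datasets_metadata ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    values = (reg.name, reg.sector, reg.vendor, reg.description,
              reg.table_name, reg.timestamp_column, json.dumps(reg.filters),
              reg.price_per_block, reg.rows_per_block)
    return sql, values


# Illustrative registration; none of these values come from a real deployment.
sql, values = build_insert(DatasetRegistration(
    name="Grid Generation",
    sector="Energy",
    vendor="ExampleVendor",
    description="Hourly generation by fuel type",
    table_name="energy_generation",
    timestamp_column="reading_ts",
    filters=[{"column": "fuel_type", "type": "dropdown"},
             {"column": "reading_ts", "type": "date_picker"}],
    price_per_block="1.00",
))
```

The pair returned by build_insert could then be passed to a database cursor's execute method. Keeping values out of the SQL string avoids injection through dataset descriptions or filter definitions.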

2. Multi-dimensional browsing

Consumers land on a home page organised by sector (Energy, Finance, Healthcare, Transportation) and vendor. Each sector card shows a live dataset count. Clicking through reveals dataset cards with:

  • Certification status (Certified / Uncertified)
  • Licence-based access controls that determine whether a dataset is freely browsable or requires approval before any data can be viewed or exported
  • Live row counts and date ranges pulled directly from the data tables
  • Chargeback pricing displayed at the card level, so consumers understand the cost model before they even open a dataset
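One way the live row counts and date ranges on each card might be obtained is a single aggregate query per dataset, driven by the table name and timestamp column held in the metadata row. The sketch below is an assumption about how this could work, not Aviary's actual query. It uses an in-memory SQLite table (hypothetical name and columns) purely so the example runs on its own; against Lakebase the same SQL would go over a PostgreSQL connection.

```python
import re
import sqlite3

# Identifiers cannot be bound as query parameters, so validate them strictly.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def card_stats_sql(table_name: str, timestamp_column: str) -> str:
    """Build the aggregate query behind one dataset card."""
    for ident in (table_name, timestamp_column):
        if not _IDENTIFIER.fullmatch(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    return (f"SELECT COUNT(*), MIN({timestamp_column}), MAX({timestamp_column}) "
            f"FROM {table_name}")


# Stand-in data table; the name and columns are made up for this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE energy_generation (reading_ts TEXT, mwh REAL)")
conn.executemany(
    "INSERT INTO energy_generation VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-03-15", 12.5), ("2024-02-10", 9.1)],
)

row_count, first_ts, last_ts = conn.execute(
    card_stats_sql("energy_generation", "reading_ts")
).fetchone()
```

Because the table and column names come from the catalog rather than from user input, the identifier check is a second line of defence rather than the only one.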

Licence-based access and approval workflows

Not all datasets are equal. Some carry third-party licensing terms, contain commercially sensitive information, or are governed by regulatory constraints.

Aviary classifies datasets into access tiers:

This approval workflow means organisations can onboard sensitive or licensed datasets into the marketplace without exposing them to unauthorised consumers. Once approved, access persists until explicitly revoked, providing a full audit trail of who requested access, when it was granted, and by whom.
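The grant lifecycle described here (request, approval, persistence until revocation, with an audit trail) can be sketched as a small record type. The class and field names below are hypothetical, and the source does not say how Aviary stores grants, so treat this as one possible shape rather than the implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AccessGrant:
    """Audit record: who requested access, when it was granted, and by whom."""
    dataset: str
    requested_by: str
    requested_at: datetime
    approved_by: Optional[str] = None
    approved_at: Optional[datetime] = None
    revoked_at: Optional[datetime] = None

    @property
    def active(self) -> bool:
        # Access persists from approval until it is explicitly revoked.
        return self.approved_at is not None and self.revoked_at is None


def _now() -> datetime:
    return datetime.now(timezone.utc)


def request_access(dataset: str, user: str) -> AccessGrant:
    return AccessGrant(dataset=dataset, requested_by=user, requested_at=_now())


def approve(grant: AccessGrant, approver: str) -> None:
    grant.approved_by = approver
    grant.approved_at = _now()


def revoke(grant: AccessGrant) -> None:
    # Keep the approval fields so the audit trail survives revocation.
    grant.revoked_at = _now()
```

Revocation only stamps a timestamp instead of deleting the record, which is what keeps the history of who approved what available afterwards.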

3. Export with chargeback transparency

When a consumer clicks "Export," Aviary calculates the estimated cost based on the matched row count and the dataset's chargeback rate. A confirmation dialog shows:

  • Total rows to be exported
  • Estimated cost (e.g., "$28.48 for 284K rows at $1.00/10K")
  • An explicit "Accept & Download" action

This makes data consumption costs visible at the point of decision, not buried in a monthly invoice.
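The estimate shown in the dialog could be computed along the following lines. This is a minimal sketch that assumes prorated pricing (a partial block of rows is charged proportionally) and rounding to the nearest cent. The example figure above is consistent with that assumption, but rounding up to whole blocks would be an equally plausible rule. The function and parameter names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_export_cost(row_count: int,
                         price_per_block: Decimal,
                         rows_per_block: int) -> Decimal:
    """Prorated export cost, rounded to the nearest cent."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if rows_per_block <= 0:
        raise ValueError("rows_per_block must be positive")
    cost = Decimal(row_count) * price_per_block / Decimal(rows_per_block)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 284,800 matched rows at $1.00 per 10,000 rows
example = estimate_export_cost(284_800, Decimal("1.00"), 10_000)  # Decimal('28.48')

# A free dataset is simply a price of zero.
free = estimate_export_cost(1_000_000, Decimal("0"), 10_000)      # Decimal('0.00')
```

Decimal is used instead of float so that the amount shown in the confirmation dialog matches what is later billed, to the cent.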

4. Access governance

Restricted datasets show only metadata (name, description, sector). No data preview, no export. Consumers can submit an access request directly from the card, triggering notification to dataset owners.
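The metadata-only rule for restricted datasets amounts to projecting a dataset record down to a fixed set of safe fields before it reaches the front-end. The sketch below shows one way to express that. The dictionary keys and the "restricted" flag are assumptions, since the source does not describe the record format.

```python
# Fields a consumer may see on a restricted dataset they have no grant for.
METADATA_FIELDS = ("name", "description", "sector")


def card_view(dataset: dict, has_access: bool) -> dict:
    """Return the fields and actions the UI may show for one dataset card."""
    if dataset.get("restricted", False) and not has_access:
        card = {key: dataset[key] for key in METADATA_FIELDS}
        card.update(can_preview=False, can_export=False, can_request_access=True)
        return card
    card = dict(dataset)
    card.update(can_preview=True, can_export=True, can_request_access=False)
    return card


# Hypothetical record used only to demonstrate the projection.
record = {
    "name": "Licensed Prices",
    "description": "Vendor-licensed price feed",
    "sector": "Finance",
    "table_name": "licensed_prices",
    "restricted": True,
}
locked = card_view(record, has_access=False)
unlocked = card_view(record, has_access=True)
```

Filtering on the server side, rather than hiding fields in the browser, means an unapproved consumer never receives the table name or any row data.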

The Lakebase Advantage

Why Lakebase instead of querying Delta tables directly?

  • Sub-second responses. Lakebase serves catalog and filtered data queries in under 100 ms via the PostgreSQL wire protocol, making the UI feel instant
  • Connection pooling. psycopg_pool with OAuth token rotation handles concurrent users without connection storms
  • Transactional metadata. Dataset registration, access grants, and chargeback rules benefit from ACID semantics
  • No warehouse spin-up. Consumers don't wait for a SQL warehouse to start; Lakebase is always-on
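The token-rotation half of the pooling bullet can be sketched without a database. The class below caches a short-lived OAuth token and refreshes it shortly before expiry. The class name, the fetch callback, and the lifetime values are all assumptions, and how a token is actually obtained from Databricks is deliberately left out. One way to wire this into psycopg_pool would be a psycopg.Connection subclass whose connect() sets the password from get(), passed to ConnectionPool through its connection_class argument. That wiring is omitted here because it needs a live database.

```python
import threading
import time
from typing import Callable, Optional


class RotatingToken:
    """Caches a short-lived token and refreshes it just before it expires."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float,
                 refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # hypothetical callback that mints a token
        self._ttl = ttl_seconds        # assumed token lifetime
        self._margin = refresh_margin  # refresh this many seconds early
        self._clock = clock
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # The lock ensures only one caller refreshes at a time, so a burst
        # of new pool connections does not trigger a burst of token requests.
        with self._lock:
            now = self._clock()
            if self._token is None or now >= self._expires_at - self._margin:
                self._token = self._fetch()
                self._expires_at = now + self._ttl
            return self._token
```

Each new pooled connection would call get() for its password, so long-running apps keep connecting successfully after the original token has expired.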

The Genie Advantage

Why embed Databricks Genie into a data marketplace?

Aviary's filter-based browsing works well when consumers know what they're looking for. But the most common question in any data team isn't "show me column X filtered by Y." It's: "Do we have data that can answer this question?"

Genie bridges that gap by letting consumers query datasets in plain English, without needing to understand table schemas, filter configurations, or SQL syntax.

  • Natural language access for non-technical users. Operational managers, finance teams, and regulatory analysts can ask "What was our total energy generation in Q3 by fuel type?" and receive a structured answer, without submitting a ticket or learning SQL
  • Governed by the same access controls. Genie respects Aviary's licence-based access tiers. If a user hasn't been approved for a restricted dataset, Genie won't surface its contents, even if the question matches
  • Context-aware across the catalog. Because Genie operates on the same Unity Catalog tables that back the marketplace, it understands table relationships, column semantics, and data freshness. It can route a question to the correct dataset automatically
  • Reduces filter fatigue. For datasets with dozens of filterable dimensions, constructing the right combination of dropdowns and sliders is tedious. A natural language query like "fraud transactions over $500 in the last 30 days" is faster and more intuitive than clicking through four filter panels
  • Accelerates time-to-insight. Instead of: browse catalog, open dataset, configure filters, preview, export, then analyse, consumers can go directly from question to answer in a single interaction

How Genie Fits Into the Aviary Architecture

Genie doesn't replace the structured browsing experience. It complements it. Power users who know exactly which dataset they need can go straight to it via sector/vendor navigation. Everyone else can start with a question and let Genie guide them to the right data.

Results & Impact

Conclusion

Aviary demonstrates that a production-grade data portal doesn't require a massive engineering effort. By using Databricks Lakebase for the data and metadata layer, Databricks Apps for the front-end, and integrated identity management (OAuth) for seamless, password-less authentication, we created a governed, chargeback-aware ecosystem.

The true value of this solution lies in how it changes the daily lives of the people using it:

  • For the Analyst & Consultant: It eliminates the "latency of insight." Instead of spending days waiting on access tickets or manually verifying data freshness, they can browse a verified "shop window," apply filters, and pull exactly what they need in seconds.
  • For the Data Engineer & Owner: It removes the burden of manual fulfilment. By automating governance and access requests, data owners maintain strict control without becoming a bottleneck, while the organisation gains full visibility into consumption costs.

Ultimately, Aviary makes the business case for data democratisation. It ensures that data is no longer a fragmented technical liability locked in silos, but a high-velocity asset that is easy to find, safe to use, and transparently priced.