At Colibri Digital, we recently built a system that goes beyond sentiment. It identifies not only how people feel, but also who said what, about whom, and with what emotional tone and confidence - all in real time.
We call it our Mention & Sentiment Agent - a generalisable framework designed for any business that needs to extract opinions from long-form text: whether it’s public consultations, policy briefings, investor reports, or social media commentary.
Imagine receiving thousands of stakeholder comments, press mentions, or policy feedback entries. You don’t just want to know how people feel - you want to know who said it, which entities they were talking about, and with what emotional tone and confidence.
In most real-world applications, opinion data is ambiguous, unstructured, and hard to evaluate at scale.
Our system leverages GPT-4 via the OpenAI API, orchestrated through a modular backend, to process and return structured, context-aware JSON. Each “chunk” of text is parsed for:
Speaker - who expressed the opinion
Mention - the entity being discussed
Sentiment - positive, negative, or neutral
Emotion - how the opinion was framed
Confidence - how certain the model is in the extraction
Summary - a short recap of what was said
This multi-attribute output enables downstream teams - from analysts to policy leads - to filter, rank, and aggregate insights without combing through raw text.
Take a simple example in which Feargal Sharkey expresses negative sentiment towards two water companies, Severn Trent Water and Thames Water. The extracted JSON would look like the following:
[
  {
    "Speaker": "Feargal Sharkey",
    "Mention": "Severn Trent Water",
    "Sentiment": "Negative",
    "Confidence": 90
  },
  {
    "Speaker": "Feargal Sharkey",
    "Mention": "Thames Water",
    "Sentiment": "Negative",
    "Confidence": 90
  }
]
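As a rough illustration, a per-chunk extraction call might look like the sketch below - the openai Python client usage is standard, but the system prompt and settings are simplified assumptions rather than our production configuration.

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Simplified, illustrative prompt - not the production prompt.
SYSTEM_PROMPT = (
    "Extract every opinion from the text. Return a JSON array of objects with "
    "keys: Speaker, Mention, Sentiment (Positive/Negative/Neutral) and "
    "Confidence (0-100). Return JSON only."
)

def extract_mentions(chunk: str) -> list[dict]:
    """Run one structured extraction call over a single text chunk."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic output helps downstream parsing
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": chunk},
        ],
    )
    return json.loads(response.choices[0].message.content)

In practice the response needs validation - the model is not guaranteed to return well-formed JSON - so retries and schema checks sit between this call and anything downstream.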
We followed a modular design, making each component replaceable or extensible for different industries:
📥 Frontend (Streamlit + Next.js)
Upload reports or connect live sources (e.g. internal document stores, press APIs).
🧩 Backend (Django API + Task Router)
Parses and chunks large documents, assigns prompts per chunk, and orchestrates LLM calls (a sketch of this flow follows the component list).
🤖 LLM Engine (GPT-4 via OpenAI)
Runs structured prompts to extract mentions, sentiment, emotion, and summaries.
🧠 Post-Processing Layer
Normalises results, handles edge cases (like unknown speakers), deduplicates mentions, and aligns output with UX requirements.
📊 Output (Dashboard or JSON)
Live results visualised as mention-sentiment tables, confidence histograms, or mention maps - ready to plug into wider analytics workflows.
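To make the pipeline concrete, here is a minimal sketch of how the backend and post-processing layers could fit together, reusing extract_mentions from the sketch above. The chunk size, overlap, fallback speaker label, and deduplication rule are illustrative assumptions, not our production logic.

def chunk_text(text: str, max_chars: int = 8000, overlap: int = 500) -> list[str]:
    """Naive character-based chunking; a production system would split on
    sentence or paragraph boundaries and respect token limits."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def postprocess(records: list[dict]) -> list[dict]:
    """Illustrative post-processing: apply a speaker fallback and deduplicate,
    keeping the highest-confidence record per (speaker, mention) pair."""
    best: dict[tuple[str, str], dict] = {}
    for rec in records:
        rec.setdefault("Speaker", "Unknown speaker")  # fallback when attribution is unclear
        key = (rec["Speaker"].lower(), rec.get("Mention", "").lower())
        if key not in best or rec.get("Confidence", 0) > best[key].get("Confidence", 0):
            best[key] = rec
    return list(best.values())

def process_document(text: str) -> list[dict]:
    """Chunk a long document, run extraction per chunk, then post-process."""
    records: list[dict] = []
    for chunk in chunk_text(text):
        records.extend(extract_mentions(chunk))  # from the extraction sketch above
    return postprocess(records)

Chunking with a small overlap is what allows documents larger than the model's context window to be processed, as described under Chunk-Based Prompting below.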
Our internal benchmarks (using gold-standard annotations) showed that the system can take a sentence like “The CEO said the merger could pose risks for small investors” and return not just positive or negative sentiment, but who said it, what was said, and how it was framed.
What sets this apart from generic sentiment systems?
✔️ Mention-Level Precision
Supports multi-mention documents, even when sentiment varies across entities.
✔️ Speaker Attribution
Handles ambiguity like “the company spokesperson” or “according to the report” - and applies fallback logic when unclear.
✔️ Chunk-Based Prompting
Allows processing of long documents beyond context window limits.
✔️ Confidence-Weighted Results
Every opinion is scored with confidence - allowing users to filter for high-certainty statements only (see the filtering sketch after this list).
✔️ Domain Agnostic
While originally used for policy and corporate text, it generalises to:
Public consultations and policy feedback
Investor reports and press coverage
Social media commentary
Customer and stakeholder feedback
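As a small illustration of confidence-weighted filtering, a helper like the one below keeps only high-certainty opinions; the function name and the threshold of 80 are assumptions for the sketch, not values from the production system.

def high_confidence(records: list[dict], threshold: int = 80) -> list[dict]:
    """Keep only opinions whose confidence score meets the threshold."""
    return [r for r in records if r.get("Confidence", 0) >= threshold]

# Example: reduce a processed document to high-certainty statements only.
# results = high_confidence(process_document(report_text), threshold=85)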
Final Thoughts
Sentiment alone doesn’t tell the full story.
The future lies in entity-aware, speaker-specific, confidence-weighted opinion extraction, grounded in language, context, and structure. With LLMs like GPT-4 and careful prompt engineering, this is not just possible - it’s production-ready.
If your organisation works with unstructured opinion data - from customer feedback to stakeholder reports - we’d love to talk.