
Beyond Sentiment: Building a Text Intelligence System with GPT-4

We often talk about sentiment analysis as the go-to method for extracting meaning from unstructured text. Positive or negative? Neutral or toxic? But what if that’s only part of the picture?

At Colibri Digital, we recently built a system that goes beyond sentiment. It identifies not only how people feel, but also who said what, about whom, and with what emotional tone and confidence - all in real time.

We call it our Mention & Sentiment Agent - a generalisable framework designed for any business that needs to extract opinions from long-form text: whether it’s public consultations, policy briefings, investor reports, or social media commentary.

The Real Challenge: Opinions Are Messy

Imagine receiving thousands of stakeholder comments, press mentions, or policy feedback entries. You don’t just want to know how people feel - you want to know:

  • Who is speaking?

  • Who or what are they talking about?

  • What tone or attitude do they express?

  • How confident is the system in its interpretation?

In most real-world applications, opinion data is ambiguous, unstructured, and hard to evaluate at scale.

The Solution: A Mention-Level, Entity-Aware Sentiment Engine

Our system leverages GPT-4 via the OpenAI API, orchestrated through a modular backend, to process text and return structured, context-aware JSON. Each “chunk” of text is parsed for:

  • Mentions of entities (e.g. companies, policies, individuals)

  • Speakers (who is expressing the opinion)

  • Sentiment per mention (positive, neutral, negative)

  • Emotion (e.g. concern, anger, optimism)

  • Confidence score (how certain the LLM is)

  • Overall summary (generated from combined chunks)

This multi-attribute output enables downstream teams - from analysts to policy leads - to filter, rank, and aggregate insights without combing through raw text.
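
To make the output shape concrete, here is a minimal sketch of the per-mention schema as a Pydantic model. The field names mirror the attributes above, but the model itself is illustrative rather than our production code:

# Hypothetical per-mention output schema (a sketch, not the production model)
from typing import List
from pydantic import BaseModel, Field

class MentionOpinion(BaseModel):
    speaker: str                                # who is expressing the opinion ("Unknown" if unresolved)
    mention: str                                # the entity being talked about
    sentiment: str                              # "Positive" | "Neutral" | "Negative"
    emotion: str                                # e.g. "concern", "anger", "optimism"
    confidence: float = Field(ge=0.0, le=1.0)   # the LLM's self-reported certainty

class ChunkResult(BaseModel):
    opinions: List[MentionOpinion]
    summary: str                                # overall summary generated from combined chunks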

Visual: Real-World Opinion Extraction

In this simple example, Feargal Sharkey expresses negative sentiment towards two water companies: Severn Trent Water and Thames Water. An example of extracted JSON would be the following:

[
  {
    "Speaker": "Feargal Sharkey",
    "Mention": "Severn Trent Water",
    "Sentiment": "Negative",
    "Confidence": 0.90
  },
  {
    "Speaker": "Feargal Sharkey",
    "Mention": "Thames Water",
    "Sentiment": "Negative",
    "Confidence": 0.90
  }
]

Architecture Overview

We followed a modular design, making each component replaceable or extensible for different industries:

📥 Frontend (Streamlit + Next.js)

Upload reports or connect live sources (e.g. internal document stores, press APIs).
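
As a rough sketch, the upload step in Streamlit looks something like the following. The endpoint name and payload are illustrative assumptions, not the production API:

# Minimal upload sketch (illustrative only; endpoint and payload are assumptions)
import streamlit as st
import requests

uploaded = st.file_uploader("Upload a report", type=["pdf", "txt", "docx"])
if uploaded is not None:
    # Hand the raw file to the backend for parsing, chunking, and LLM analysis
    response = requests.post(
        "http://localhost:8000/api/analyse",   # hypothetical Django endpoint
        files={"document": (uploaded.name, uploaded.getvalue())},
    )
    st.json(response.json())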

🧩 Backend (Django API + Task Router)

Parses and chunks large documents, assigns prompts per chunk, and orchestrates LLM calls.
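
A simplified version of the chunking step, assuming a plain word-based splitter with overlap (production chunking also respects sentence and section boundaries):

# Naive word-based chunker (a sketch; real chunking respects sentences and sections)
from typing import List

def chunk_document(text: str, max_words: int = 800, overlap: int = 100) -> List[str]:
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = start + max_words
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks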

🤖 LLM Engine (GPT-4 via OpenAI)

Runs structured prompts to extract mentions, sentiment, emotion, and summaries.
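
A stripped-down per-chunk call might look like the following; the prompt wording and parameters here are simplified stand-ins for the real structured prompts:

# Illustrative per-chunk extraction call (prompt wording is a simplified stand-in)
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EXTRACTION_PROMPT = (
    "Extract every opinion in the text below. Return a JSON array where each item "
    "has: Speaker, Mention, Sentiment (Positive/Neutral/Negative), Emotion, and "
    "Confidence (0-1). Use \"Unknown\" when the speaker cannot be resolved.\n\nText:\n"
)

def analyse_chunk(chunk: str) -> list:
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic, repeatable extraction
        messages=[{"role": "user", "content": EXTRACTION_PROMPT + chunk}],
    )
    return json.loads(response.choices[0].message.content)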

🧠 Post-Processing Layer

Normalises results, handles edge cases (like unknown speakers), deduplicates mentions, and aligns output with UX requirements.
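
In simplified form, the deduplication and speaker-fallback logic might look like this (the exact normalisation rules shown are illustrative):

# Simplified normalisation and deduplication (rules here are illustrative)
def postprocess(opinions: list) -> list:
    seen, cleaned = set(), []
    for op in opinions:
        speaker = (op.get("Speaker") or "Unknown").strip()   # fallback for unknown speakers
        mention = op.get("Mention", "").strip()
        key = (speaker.lower(), mention.lower())
        if not mention or key in seen:
            continue  # drop empty or duplicate speaker/mention pairs
        seen.add(key)
        cleaned.append({**op, "Speaker": speaker, "Mention": mention})
    return cleaned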

📊 Output (Dashboard or JSON)

Live results visualised as mention-sentiment tables, confidence histograms, or mention maps - ready to plug into wider analytics workflows.
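
Because results come back as flat JSON, they drop straight into standard tooling. For example, a mention-sentiment table can be built in a few lines of pandas, shown here with the example records from above:

# Turning extraction results into a mention-sentiment table (illustrative)
import pandas as pd

results = [
    {"Speaker": "Feargal Sharkey", "Mention": "Severn Trent Water",
     "Sentiment": "Negative", "Confidence": 0.90},
    {"Speaker": "Feargal Sharkey", "Mention": "Thames Water",
     "Sentiment": "Negative", "Confidence": 0.90},
]

df = pd.DataFrame(results)
table = (
    df.groupby(["Mention", "Sentiment"])
      .agg(mentions=("Speaker", "count"), avg_confidence=("Confidence", "mean"))
      .reset_index()
)
print(table)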

System Performance Highlights

Our internal benchmarks (using gold-standard annotations) showed:

  • 90%+ recall for mention identification

  • 90%+ recall for speaker identification (on a fuzzy-matching basis)

  • 80%+ sentiment classification accuracy

  • Support for nuanced statements like:

“The CEO said the merger could pose risks for small investors.”

Not just positive or negative - but who said it, what was said, and how it was framed.

Built for Real-World Use

What sets this apart from generic sentiment systems?

✔️ Mention-Level Precision

Supports multi-mention documents, even when sentiment varies across entities.

✔️ Speaker Attribution

Handles ambiguity like “the company spokesperson” or “according to the report” - and applies fallback logic when unclear.

✔️ Chunk-Based Prompting

Allows processing of long documents beyond context window limits.

✔️ Confidence-Weighted Results

Every opinion is scored with confidence - allowing users to filter for high-certainty statements only.
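
In practice that filter is a one-liner; for example (the 0.8 threshold is an arbitrary illustration):

# Keep only high-certainty opinions (the 0.8 threshold is an arbitrary example)
def filter_by_confidence(opinions: list, threshold: float = 0.8) -> list:
    return [op for op in opinions if op.get("Confidence", 0.0) >= threshold]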

✔️ Domain Agnostic

While originally used for policy and corporate text, it generalises to:

  • ESG and sustainability briefings

  • Healthcare reports

  • Legal or regulatory filings

  • Consumer reviews

Final Thoughts

Sentiment alone doesn’t tell the full story.

The future lies in entity-aware, speaker-specific, confidence-weighted opinion extraction, grounded in language, context, and structure. With LLMs like GPT-4 and careful prompt engineering, this is not just possible - it’s production-ready.

If your organisation works with unstructured opinion data - from customer feedback to stakeholder reports - we’d love to talk.

Speak to an expert