
Beyond Sentiment: Building a Text Intelligence System with GPT-4

We often talk about sentiment analysis as the go-to method for extracting meaning from unstructured text. Positive or negative? Neutral or toxic? But what if that’s only part of the picture?

At Colibri Digital, we recently built a system that goes beyond sentiment. It identifies not only how people feel, but also who said what, about whom, and with what emotional tone and confidence - all in real time.

We call it our Mention & Sentiment Agent - a generalisable framework designed for any business that needs to extract opinions from long-form text: whether it’s public consultations, policy briefings, investor reports, or social media commentary.

The Real Challenge: Opinions Are Messy

Imagine receiving thousands of stakeholder comments, press mentions, or policy feedback entries. You don’t just want to know how people feel - you want to know:

  • Who is speaking?

  • Who or what are they talking about?

  • What tone or attitude do they express?

  • How confident is the system in its interpretation?

In most real-world applications, opinion data is ambiguous, unstructured, and hard to evaluate at scale.

The Solution: A Mention-Level, Entity-Aware Sentiment Engine

Our system leverages GPT-4 via the OpenAI API, orchestrated through a modular backend, to process text and return structured, context-aware JSON. Each “chunk” of text is parsed for:

  • Mentions of entities (e.g. companies, policies, individuals)

  • Speakers (who is expressing the opinion)

  • Sentiment per mention (positive, neutral, negative)

  • Emotion (e.g. concern, anger, optimism)

  • Confidence score (how certain the LLM is)

  • Overall summary (generated from combined chunks)

This multi-attribute output enables downstream teams - from analysts to policy leads - to filter, rank, and aggregate insights without combing through raw text.
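
To make the output shape concrete, here is a minimal sketch of the per-mention schema as a Pydantic model. The field names mirror the attributes above, but the model itself is illustrative rather than our production code:

# Hypothetical per-mention output schema (a sketch, not the production model)
from typing import List
from pydantic import BaseModel, Field

class MentionOpinion(BaseModel):
    speaker: str                                # who is expressing the opinion ("Unknown" if unresolved)
    mention: str                                # the entity being talked about
    sentiment: str                              # "Positive" | "Neutral" | "Negative"
    emotion: str                                # e.g. "concern", "anger", "optimism"
    confidence: float = Field(ge=0.0, le=1.0)   # the LLM's self-reported certainty

class ChunkResult(BaseModel):
    opinions: List[MentionOpinion]
    summary: str                                # overall summary generated from combined chunks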

Visual: Real-World Opinion Extraction

In this simple example, Feargal Sharkey expresses negative sentiment towards two water companies: Severn Trent Water and Thames Water. An example of extracted JSON would be the following:

[
  {
    "Speaker": "Feargal Sharkey",
    "Mention": "Severn Trent Water",
    "Sentiment": "Negative",
    "Confidence": 0.90
  },
  {
    "Speaker": "Feargal Sharkey",
    "Mention": "Thames Water",
    "Sentiment": "Negative",
    "Confidence": 0.90
  }
]

Architecture Overview

We followed a modular design, making each component replaceable or extensible for different industries:

📥 Frontend (Streamlit + Next.js)

Upload reports or connect live sources (e.g. internal document stores, press APIs).
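
As a rough sketch, the upload step in Streamlit looks something like the following. The endpoint name and payload are illustrative assumptions, not the production API:

# Minimal upload sketch (illustrative only; endpoint and payload are assumptions)
import streamlit as st
import requests

uploaded = st.file_uploader("Upload a report", type=["pdf", "txt", "docx"])
if uploaded is not None:
    # Hand the raw file to the backend for parsing, chunking, and LLM analysis
    response = requests.post(
        "http://localhost:8000/api/analyse",   # hypothetical Django endpoint
        files={"document": (uploaded.name, uploaded.getvalue())},
    )
    st.json(response.json())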

🧩 Backend (Django API + Task Router)

Parses and chunks large documents, assigns prompts per chunk, and orchestrates LLM calls.
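
A simplified version of the chunking step, assuming a plain word-based splitter with overlap (production chunking also respects sentence and section boundaries):

# Naive word-based chunker (a sketch; real chunking respects sentences and sections)
from typing import List

def chunk_document(text: str, max_words: int = 800, overlap: int = 100) -> List[str]:
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = start + max_words
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks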

🤖 LLM Engine (GPT-4 via OpenAI)

Runs structured prompts to extract mentions, sentiment, emotion, and summaries.
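
A stripped-down per-chunk call might look like the following; the prompt wording and parameters here are simplified stand-ins for the real structured prompts:

# Illustrative per-chunk extraction call (prompt wording is a simplified stand-in)
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EXTRACTION_PROMPT = (
    "Extract every opinion in the text below. Return a JSON array where each item "
    "has: Speaker, Mention, Sentiment (Positive/Neutral/Negative), Emotion, and "
    "Confidence (0-1). Use \"Unknown\" when the speaker cannot be resolved.\n\nText:\n"
)

def analyse_chunk(chunk: str) -> list:
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic, repeatable extraction
        messages=[{"role": "user", "content": EXTRACTION_PROMPT + chunk}],
    )
    return json.loads(response.choices[0].message.content)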

🧠 Post-Processing Layer

Normalises results, handles edge cases (like unknown speakers), deduplicates mentions, and aligns output with UX requirements.
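
In simplified form, the deduplication and speaker-fallback logic might look like this (the exact normalisation rules shown are illustrative):

# Simplified normalisation and deduplication (rules here are illustrative)
def postprocess(opinions: list) -> list:
    seen, cleaned = set(), []
    for op in opinions:
        speaker = (op.get("Speaker") or "Unknown").strip()   # fallback for unknown speakers
        mention = op.get("Mention", "").strip()
        key = (speaker.lower(), mention.lower())
        if not mention or key in seen:
            continue  # drop empty or duplicate speaker/mention pairs
        seen.add(key)
        cleaned.append({**op, "Speaker": speaker, "Mention": mention})
    return cleaned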

📊 Output (Dashboard or JSON)

Live results visualised as mention-sentiment tables, confidence histograms, or mention maps - ready to plug into wider analytics workflows.
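
Because results come back as flat JSON, they drop straight into standard tooling. For example, a mention-sentiment table can be built in a few lines of pandas, shown here with the example records from above:

# Turning extraction results into a mention-sentiment table (illustrative)
import pandas as pd

results = [
    {"Speaker": "Feargal Sharkey", "Mention": "Severn Trent Water",
     "Sentiment": "Negative", "Confidence": 0.90},
    {"Speaker": "Feargal Sharkey", "Mention": "Thames Water",
     "Sentiment": "Negative", "Confidence": 0.90},
]

df = pd.DataFrame(results)
table = (
    df.groupby(["Mention", "Sentiment"])
      .agg(mentions=("Speaker", "count"), avg_confidence=("Confidence", "mean"))
      .reset_index()
)
print(table)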

System Performance Highlights

Our internal benchmarks (using gold-standard annotations) showed:

  • 90%+ recall for mention identification

  • 90%+ recall for speaker identification (on a fuzzy-matching basis)

  • 80%+ sentiment classification accuracy

  • Support for nuanced statements like:

“The CEO said the merger could pose risks for small investors.”

Not just positive or negative - but who said it, what was said, and how it was framed.

Built for Real-World Use

What sets this apart from generic sentiment systems?

✔️ Mention-Level Precision

Supports multi-mention documents, even when sentiment varies across entities.

✔️ Speaker Attribution

Handles ambiguity like “the company spokesperson” or “according to the report” - and applies fallback logic when unclear.

✔️ Chunk-Based Prompting

Allows processing of long documents beyond context window limits.

✔️ Confidence-Weighted Results

Every opinion is scored with confidence - allowing users to filter for high-certainty statements only.
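
In practice that filter is a one-liner; for example (the 0.8 threshold is an arbitrary illustration):

# Keep only high-certainty opinions (the 0.8 threshold is an arbitrary example)
def filter_by_confidence(opinions: list, threshold: float = 0.8) -> list:
    return [op for op in opinions if op.get("Confidence", 0.0) >= threshold]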

✔️ Domain Agnostic

While originally used for policy and corporate text, it generalises to:

  • ESG and sustainability briefings

  • Healthcare reports

  • Legal or regulatory filings

  • Consumer reviews

Final Thoughts

Sentiment alone doesn’t tell the full story.

The future lies in entity-aware, speaker-specific, confidence-weighted opinion extraction, grounded in language, context, and structure. With LLMs like GPT-4 and careful prompt engineering, this is not just possible - it’s production-ready.

If your organisation works with unstructured opinion data - from customer feedback to stakeholder reports - we’d love to talk.

Speak to an expert