Blockify® Performance Analysis Report for Big Four Consulting Firm

Comparing Blockify® and Standard Chunking Approaches using Data Provided by Big Four Consulting Firm

Report Overview

This report presents a comprehensive analysis of the Blockify® data ingestion, distillation and optimization capabilities to support Big Four Consulting Firm, compared to traditional chunking methods. Using Blockify's distillation approach, the projected aggregate Enterprise Performance improvement for Big Four Consulting Firm is 68.44X. This performance includes the improvements made by Blockify® in enterprise distillation of knowledge, vector accuracy, and data volume reductions for enterprise content lifecycle management.

According to IDC's "Accelerating Efficiency and Driving Down IT Costs Using Data Duplication" study, the average enterprise has between an 8:1 and 22:1 duplication frequency ("Enterprise Performance"). Factoring in an average Enterprise Data Duplication Factor of 15:1 (which accounts for typical data redundancy across multiple documents and systems in an enterprise setting), the aggregate performance improvement of 4.56X based on vector accuracy and data volume reductions for enterprise content lifecycle management is further increased ≈15X to a projected aggregate Enterprise Performance of 68.44X. This highlights the compounded benefits of Blockify® in a larger-scale enterprise environment.

Blockify also improves token efficiency by 3.09X compared to traditional chunking methods, which can translate into cost savings of $738,000 per year, in addition to the accuracy benefits delivered by Blockify.

  • Blockify® Vector Search Accuracy Improvement: 2.29X
  • Blockify® Total Enterprise Performance: 68.44X
  • Blockify® Information Distillation Enterprise Performance: 29.93X
  • Token Efficiency Improvement Factor: 3.09X
  • Cost Savings from Token Efficiency of 1,000,000,000 queries per year: $738,000

For this technical analysis we will use the following technologies to compare the performance of Blockify® to the legacy RAG process:

Technology Component | Naïve Chunking | Blockify®
Dataset | Big Four Consulting Firm Documents: 17 (Totaling 298 Pages) | Big Four Consulting Firm Documents: 17 (Totaling 298 Pages)
Document Parsing | Unstructured.io | Generic Parsing (Interchangeable - supports Unstructured.io, AWS Textract, etc.)
Chunking Method | ≈1,000 Character Chunks (Unstructured.io) | ≈1,000 Character Chunks - Blockify® Basic Character Chunking truncated by sentence punctuation (".","!","?")
Vector Embeddings | OpenAI Embeddings (Azure AI Search) | OpenAI Embeddings
LLM Processing | - | Fine-tuned Blockify® LLM

Note: We encourage customers to select a high quality document parser to feed into Blockify® (such as Unstructured.io or Google Gemini). While most document parsers are great at parsing documents, solutions like Unstructured.io and Azure AI Search still use basic chunking methods after parsing. Blockify solves this weakness and further enhances the quality of the data fed into the LLM in conjunction with the parsing tools.

1. Executive Summary

Blockify is a data optimization tool that takes messy, unstructured text, like hundreds of sales‑meeting transcripts or long proposals, and intelligently optimizes the data into small, easy‑to‑understand "IdeaBlocks." Each IdeaBlock is just a couple of sentences in length that capture one clear idea, plus a built‑in contextualized question and answer.

With this approach, Blockify improves the accuracy of LLMs (Large Language Models) by an average aggregate of 78X, while shrinking the original mountain of text to about 2.5% of its size and keeping (and even improving) the important information.

When Blockify's IdeaBlocks are compared with the usual method of breaking text into equal‑sized chunks, the results are dramatic. Answers pulled from the distilled IdeaBlocks are roughly 40X more accurate, and user searches return the right information with about 52% higher accuracy. In short, Blockify lets you store less data, spend less on computing, and still get better answers, turning huge documents into a concise, high‑quality knowledge base that anyone can search quickly.

Blockify works by processing chunks of text to create structured data from an unstructured data source.

Blockify Process Diagram

2. Blockify Data Samples

Showcased below are examples comparing the difference in data quality between the IdeaBlocks generated by the Blockify process, and the raw chunked text that would be used in a traditional chunk based ingestion process. Each example compares a user query with the corresponding IdeaBlock (left) and traditional text chunk (right).

The vector distance metric shows the relevance between the query and the returned content (lower is better). IdeaBlocks consistently demonstrate higher relevance (lower distance) compared to traditional chunking methods.

 

User Query 1: "What are the cost considerations for Generative AI usage?"
Blockified Result
IdeaBlock ID: 3dd63eab-f555-4c3f-8900-206bec0fb8d7

Costs of Generative AI

What are the cost considerations for Generative AI usage?

The cost of a query or a prompt using Generative AI can be up to ten times that of an index-based query.

1.62X Improvement
Vector Distance: 0.1784
Naïve Chunking Result (Unstructured.io)
Raw Text Chunk ID: 1

Implications of Generative AI for businesses Implications of Generative AI for businesses Section III: Commerce and competition in Generative AI SECTION III Commerce and competition in Generative AI The battle for value capture will be fought on multiple fronts, and each layer of the stack will have its competitive dynamics driven by things like scale, data access, brand, and a captive customer base. However, we see two primary competitor archetypes: pure-play providers operating within a single layer–infrastructure, model, and application - and integrated providers that play in multiple layers. As with incumbent technology, we expect consumer pricing to be simple (e.g., per user, per month) and enterprise pricing to be more complex (e.g., per call, per hour, revenue share). However, pricing simplicity, predictability, and value will be important to scaling within the enterprise beyond early adopters or edge use cases. To begin, the infrastructure layer, which is the most mature of the Generative AI technology stack, is where hyperscalers dominate the market. The business model here is proven: provide scalable compute with transparent, consumption-based pricing. To help make Generative AI workloads "sticky," hyperscalers have entered commitments with model providers to guarantee future workloads, including Azure with OpenAI,17 Google with Anthropic,18 and AWS with Stability.ai,19 alongside their proprietary models.

Vector Baseline (1X)
Vector Distance: 0.2898
User Query 2: "What should our technology strategy encompass?"
Blockified Result
IdeaBlock ID: 2c4c4181-ea32-4f89-809e-f3c57b22c6d7

Technology Strategy

What should our technology strategy encompass?

Our strategy should include comprehensive plans for data engineering and pipelines, MLOps tools, and the recruitment of AI-ready talent.

1.74X Improvement
Vector Distance: 0.2876
Naïve Chunking Result (Unstructured.io)
Raw Text Chunk ID: 2

Leaders should also remember that value can be created by influencing perceptions of the market and investors. Communicating the company’s vision publicly can amplify success, signaling to capital markets and the competitive talent market that an organization is investing in a bold and exciting future.8 If it’s not 9 Becoming an Al-fueled organization important enough to merit such a forceful signal toward change, it’s highly likely that the gravitational pull toward the status quo could dampen outcomes for even the strongest strategy. Remain dynamic: Perpetually iterate your AI strategy Finally, developing an enterprisewide AI strategy that’s set up to fuel a differentiating core business strategy is not a one-and-done exercise. Organizations should develop dynamic ways of assessing their strategy to ensure it remains responsive to ever-changing market and technology developments. As the organization’s core business strategy and AI capabilities mature over time, leaders should continually sharpen their goals, moving beyond staying competitive to increasingly using AI and ML as competitive differentiators.

Vector Baseline (1X)
Vector Distance: 0.5011
User Query 3: "What is the risk of long-term worker displacement?"
Blockified Result
IdeaBlock ID: 71e561ed-d39b-4e90-8b07-afd8fc06c45a

Long-term Worker Displacement Risk

What is the risk of long-term worker displacement?

While currently, high ROI use cases will enhance workflows and productivity, as models improve, there might be job displacement risk without proper upskilling and workforce planning.

3.42X Improvement
Vector Distance: 0.1978
Naïve Chunking Result (Unstructured.io)
Raw Text Chunk ID: 3

field workers. Also, some ER&I companies are starting to explore the use of AI to help them handle extreme weather and other hard-to-predict events. By harnessing the power of AI vision and other advanced AI technologies, companies can monitor and analyze vast amounts of information— including data from field sensors, drone video, and weather radar—with a level of timeliness, accuracy, and thoroughness that humans alone simply cannot achieve. Expanding on the idea of machines helping humans be more efficient and effective, AI's single biggest impact in ER&I could be helping companies address the future workforce gap. The Biden administration's multi-trillion dollar commitment to infrastructure is expected to dramatically increase business activity throughout ER&I, but could also create a significant shortage of workers and expertise. AI can help address this gap by augmenting the work done by humans—doing much of the preparatory analysis and heavy lifting so human workers can focus on activities that require skills and expertise that are uniquely human.

Vector Baseline (1X)
Vector Distance: 0.6770
User Query 4: "Why is it necessary to have a roadmap for verticalized solutions?"
Blockified Result
IdeaBlock ID: 0388728e-5621-4ca8-a5dc-d0689e315b13

Verticalized Solutions Roadmap Importance

Why is it necessary to have a roadmap for verticalized solutions?

Having a roadmap for verticalized solutions drives adoption. This helps internal business cases and customer pricing account for potential costs while creating a clear technology strategy.

4.06X Improvement
Vector Distance: 0.1374
Naïve Chunking Result (Unstructured.io)
Raw Text Chunk ID: 4

In contrast, vertical use cases target industry-specific workflows that require domain knowledge, context, and expertise. For these, foundation models may need to be fine-tuned or may even require new special-purpose models. For instance, Generative AI can be used to create a customized portfolio of securities based on reward descriptions or recommend personalized treatment plans based on a patient's medical history and symptoms. However, achieving performant vertical use cases requires a nuanced understanding of the field. Hardware, for example, Generative AI can design composable blocks of code based on simple prompts, which requires tacit knowledge of efficient coding, methodologies, and an understanding of technical jargon. Enterprise buyers have unique purchase considerations relative to consumers, as model performance (speed, relevance, breadth of sources) is not expected to exclusively drive vendor selection, on early opinions from both advocates and naysayers. Frequently cited criteria to adopt Generative AI are:

Vector Baseline (1X)
Vector Distance: 0.5584
User Query 5: "How should we approach security and risk management?"
Blockified Result
IdeaBlock ID: 8837e6a4-5b6f-42fa-b5de-bda472ced863

Addressing Security and Risk

How should we approach security and risk management?

Companies need to preempt a rapidly evolving regulatory landscape while ensuring data confidentiality. This includes maintaining the confidentiality of data, embeddings, and tuning with inherently 'multi-tenant' models.

1.72X Improvement
Vector Distance: 0.3251
Naïve Chunking Result (Unstructured.io)
Raw Text Chunk ID: 5

Organisations must also ensure that their use of AI is compliant with evolving legislative and regulatory requirements, which was a shared theme among the most common risks identified by senior leaders. While there has been a focus on developing and enacting regulations and legislation across Asia Pacific governments, these existing regulatory requirements are usually a minimum standard for organisations to meet rather than comprehensive best practices. As a result, senior leaders must develop, adopt and enforce organisational trustworthiness standards for AI solutions and systems.6 Addressing AI-related risks is essential: without proper management, these risks could lead to strained customer relationships, regulatory penalties or public backlash. Furthermore, fear of these risks can also deter organisations from using AI. The State of AI Enterprise survey found that three out of the four biggest challenges to developing and using AI tools are risk, regulation and governance issues.7 This highlights the importance of effective AI governance for managing the ethical and operational risks associated with AI and fully leveraging this technology.

Vector Baseline (1X)
Vector Distance: 0.5591

3. Introduction

3.1 Purpose

This white paper is a decision tool for CIOs, data‑platform owners, and AI solution architects who are under pressure to scale Generative‑AI initiatives without compromising accuracy, security, or budget. After reading it you will be able to:

  • Diagnose failure modes in today's "dump‑and‑chunk" RAG pipelines—duplicate data bloat, semantic fragmentation, stale versions, and permission leaks—and quantify the financial, security, and compliance risks they introduce.
  • Compare legacy approaches with Blockify®, a patented ingestion, distillation, and governance stack that turns millions of inconsistent pages into a compact, governed "gold dataset" of IdeaBlocks.
  • Validate headline claims such as ≈78× improvement in LLM RAG accuracy, ≈51% higher vector‑search precision, and ≈40× dataset reduction using an openly documented benchmark and step‑by‑step replication guide provided.
  • Map Blockify's operating model (roles, review cadence, cost envelope, security posture) to your own enterprise architecture and regulatory landscape.
  • Build an internal business case that links improved LLM accuracy to concrete outcomes: higher bid‑win rates, lower risk exposure, faster call‑center resolution, and demonstrable compliance with GDPR, CMMC, EU AI Act, and similar mandates.

This whitepaper supplies both the technical depth and the executive‑level rationale to decide whether Blockify should become the foundation of your production‑grade GenAI stack.

3.2 Definitions

  • "Retrieval-Augmented Generation" (RAG): A method that enhances the generation capabilities of LLMs by retrieving relevant documents from a vast corpus to provide fact-based context to the answers.
  • "Hallucination": An answer generated by an LLM that is not grounded in fact or conflicts with a known truth.
  • "AirgapAI": A lightweight RAG + LLM chat application that operates on Intel® Core™ Ultra or comparable edge devices without internet connectivity.
  • "Blockify®": A patented ingestion, distillation, and governance platform that resolves issues of data quality and duplication. It converts documents into smaller, manageable "IdeaBlocks" to improve the accuracy and efficiency of LLM outputs.
  • "IdeaBlock": The smallest unit of curated knowledge in a data taxonomy associated with an Index, containing a Descriptive Name, Critical Question, Trusted Answer, and rich Metadata Tags and Keywords.

3.3 Industry Context

McKinsey expects GenAI to take over tasks that consume 60%–70% of people's working hours. Gartner predicts that through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data, due to poor data quality, inadequate risk controls, escalating costs, or unclear business value. Blockify directly addresses this prediction by treating data quality as the prerequisite for any GenAI ROI.

https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier#introduction

https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk

4. Blockify has Enterprise Platform Flexibility for Any Architecture

Blockify's patented data processing pipeline is flexible. Interchangeable components, if something other than the Iternal Technologies default is desired, include:

  • Document Parsing: Flexibility to use Unstructured.io, AWS Textract, or other document parsing solutions.
  • Chunking Algorithm: Customizable chunking strategies to optimize for different document types and content structures.
  • Open Source LLMs (Fine tuned): Support for various open source models that can be fine-tuned for specific Blockify® domains.
  • Embeddings Model: Compatible with different embedding models including OpenAI, AWS Bedrock, Mistral, or open-source alternatives.
  • Vector Search: Integration with multiple vector databases such as Azure AI Search, Pinecone, Milvus, and more.

Blockify can integrate into any RAG LLM workflow, including Unstructured.io, AWS Bedrock, Azure AI Search, Google Vertex, and more. Blockify is a data "preprocessing" step between parsing source documents and vectorizing them.


Blockify Process Diagram


Similarly, with other architectures shown in the diagram, Blockify fits seamlessly between the initial document parsing and the vector database or retrieval layer—regardless of the tools and platforms used. For example:

  • Unstructured IO and Azure AI Search Workflow: Parse documents using Unstructured.io to extract the text, and then apply the Blockify process to optimize the dataset. Once optimized with Blockify, the results can be passed into Azure's AI Search for use within the remainder of the pipeline.
  • Gemini + Pinecone Workflow: Documents can first be processed using Gemini's LLM capabilities to extract content, after which Blockify transforms and optimizes the structured data. This enriched data is then sent to Pinecone for vector storage and similarity search, ultimately enhancing the accuracy and relevance of user-facing applications.
  • Amazon Textract + Amazon Bedrock Workflow: When using Amazon Textract to parse and extract text from source documents, Blockify acts as an intermediate preprocessing stage to structure and segment the data efficiently. This makes the subsequent ingestion into Amazon Bedrock's AI-powered services more performant and reliable.

By inserting Blockify as a preprocessing step in these various pipelines—whether with Unstructured.io, Gemini, or Amazon Textract—organizations can ensure that the information entering their vector databases (Azure AI Search, Pinecone, Amazon Bedrock, and others) is properly segmented and optimized. This leads to improved retrieval, faster query times, and ultimately, better answers for end users. Blockify's modular approach allows it to integrate with widely used RAG (Retrieval-Augmented Generation) workflows, boosting the performance and scalability of knowledge-intensive applications.
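
To make the integration point concrete, the following Python-style sketch shows where Blockify sits in a generic ingestion pipeline. The parser, blockify_client, embedder, and vector_db objects are hypothetical placeholders standing in for whichever parsing tool, Blockify deployment, embedding model, and vector database an organization actually uses; this is an illustrative sketch, not a documented API.

```python
# Illustrative sketch only: where Blockify sits in a RAG ingestion pipeline.
# parser, blockify_client, embedder, and vector_db are hypothetical placeholders.

def ingest_corpus(file_paths, parser, blockify_client, embedder, vector_db):
    """Parse -> Blockify (preprocess) -> embed -> index."""
    for path in file_paths:
        raw_text = parser.parse(path)                     # e.g., Unstructured.io, Textract, Gemini
        idea_blocks = blockify_client.process(raw_text)   # distilled, deduplicated IdeaBlocks
        for block in idea_blocks:
            # Embed the name, critical question, and trusted answer together,
            # mirroring the metadata-driven embeddings described in this report.
            text = f"{block['name']}\n{block['critical_question']}\n{block['trusted_answer']}"
            vector = embedder.embed(text)
            vector_db.upsert(id=block["id"], vector=vector, metadata=block)
```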

5. The Problem

5.1 Symptoms Observed in Legacy RAG Deployments

Many companies want to build AI search tools, but their document libraries fight back in several ways. Different versions of the same file sit side‑by‑side, so a query about a product might pull facts from version 15, 16, and even an unreleased 17 all at once. Salespeople often copy an old proposal, tweak a few lines, hit "save as" and upload it, so a three‑year‑old file looks brand‑new and slips past any "use only recent files" filter.

Even if only five percent of a 100,000‑document collection changes every six months, millions of pages would still need checking—far too much for people to manage by hand. Tech teams then slice each document into fixed‑size chunks to fit into a vector database, but the chunks are neither granular nor precise: this approach can split important ideas in half, mix in off‑topic sentences, and create many almost‑identical chunks that crowd out more relevant ones.

When the retrieval system feeds these messy snippets to the language model, the model tries to merge them and can end up inventing specs, prices, or legal clauses that never existed. Meanwhile, most vector stores cannot tag data with fine‑grained permissions like "export‑controlled" or "partner‑only," leaving security holes. All of this adds huge maintenance costs and makes leaders hesitate to roll AI systems into full production.

The following symptoms are consistently reported when enterprises rely on the "dump-and-chunk" approach (fixed-length chunking → vectorize → retrieve) without upstream data governance.

Data-Quality Drift
  • Version Conflicts
    Example: A Computer Infrastructure provider's server specs return v15, v16, and unreleased v17 in a single answer because old proposal PDFs coexist with new product sheets.
  • Stale Content Masquerading as Fresh
    "Save-As" behavior: sales staff copy a three-year-old proposal, tweak a few lines, and upload it. The file carries a "last-modified" date of last week, defeating any date-filter in the retrieval pipeline.
  • Untrackable Change Rate
    Even with a modest 5% change to a 100k-document corpus every six months, millions of pages would require review annually—well beyond human capacity.
Semantic Fragmentation from Naïve Chunking
  • Broken Concepts
    Fixed 1,000-character windows routinely split longer paragraphs with important context such as a product value proposition in half, degrading data quality and cosine similarity between query and chunk.
  • Context Dilution
    Only 25%–40% of the information in a Naïve chunk may pertain to the user's intent; the rest introduces "vector noise," causing irrelevant chunks to score higher than precise ones.
Retrieval Noise & Hallucination Patterns
  • Top-k Pollution
    Because duplicates appear with slightly different embeddings, they crowd out more relevant chunks; k = 3 may return three near-identical, outdated passages instead of the more current accurate one.
  • Model Guess-work
    When conflicting chunks are fed to the LLM, it "hallucinates" a synthesis—often inventing specs, prices, or legal clauses that appear plausible but are unfounded.
Governance & Access-Control Gaps
  • One-Size-Fits-All Index
    Standard vector stores lack robust tags for data permissioning, export-control, clearance, or partner-specific sharing (e.g., restrict who can access "classified" information).
Operational & Cost Burden
  • Human Maintenance on Datasets is Impossible
    Locating and updating "paragraph 47 in document 59 of 1,000" for every product update across every document is infeasible even at pilot scale. In production it is not 1,000 documents but, in reality, 1,000,000 documents or more, which causes stakeholders to freeze further AI rollout due to risk and update-maintenance burdens.

5.2 Impact – Why "Just One Bad Answer" Can Cost Millions (or Lives)

Financial Repercussions
  • Scenario 1 Mega‑Bid Meltdown: During a recent $2 billion infrastructure RFP, an LLM‑powered proposal assistant could mix legacy pricing (FY‑21) with current discount tables (FY‑24). The buyer flagged the inconsistency and disqualified the vendor on compliance grounds—a total write‑off of 18 months of pursuit costs and pipeline revenue.
  • Scenario 2 Warranty & Recall Cascades: An electronics manufacturer published a chatbot‑generated BOM that incorporated an obsolete capacitor. Field failures triggered a $47 million recall and stiff penalties from downstream OEM partners.
  • Scenario 3 Regulatory Fines: Under EU AI Act Article 10, companies must "demonstrate suitable data‑governance practices." Delivering a hallucinated clinical‑trial statistic led to a €5 million fine from the EMA and a forced product‑labelling change.
Operational & Safety Risks
  • "Grounded Fleet" Scenario: A major Defense customer could experience a situation where 4 of the top‑10 answers returned by a legacy RAG system reference an outdated torque value for helicopter rotor bolts. Had the error propagated, every aircraft would have required emergency inspection—potentially stranding troops and costing $X million in downtime.
  • Intelligence Failure: Conflicting country‑code names ("Operation SEA TURTLE" vs. "SEA SHIELD") in separate PDFs confused an analyst's threat‑brief, delaying a security response by X hours.
Strategic & Cultural Damage
  • Erosion of Trust: Once users see a system hallucinate, uptake plummets; employees revert to manual searches, negating AI ROI.
  • Innovation Freeze: Board‑level risk committees often impose a moratorium on GenAI roll‑outs after a single public mistake, stalling digital‑transformation road‑maps for quarters.
  • Brand Hit: Social‑media virality of a bad chatbot answer can wipe out years of marketing investment.

5.3 Root Causes – The Mechanics Behind the Mayhem

Enterprise RAG initiatives fail less because of model quality and more because the underlying data pipeline lets chaos seep in at every stage. Product specs and policy language now change weekly, so even a modest 5% drift rate means one‑third of the knowledge base is obsolete within three years.

The same paragraph lives—slightly edited—in SharePoint, Jira, email, and vendor portals, while "save‑as syndrome" keeps stamping fresh timestamps onto stale proposals. With no single, governed taxonomy, subject‑matter experts can't "fix once, publish everywhere," and versioning scattered across silos makes audits or roll‑backs impossible.

The LLM is forced to guess its way through inconsistencies—amplifying hallucinations and potentially leaking classified text into public answers. Manually patching "paragraph 47 of document 59" across a million‑file corpus would take tens of thousands of labor hours, so errors linger, compound, and eventually freeze further AI roll‑outs.

A preprocessing and governance layer such as Blockify® is required to restore canonical truth, enforce permissions, and deliver context‑complete, drift‑resistant blocks to downstream LLMs.

Listed are some of the major challenges outlined in detail by category.

  • Accelerating Data Drift
    • Product cadence keeps shrinking; SaaS revisions ship weekly, hardware every quarter. Even a "small" 5% drift every six months means that within three years roughly one‑third of a knowledge base is out‑of‑date.
    • Regulatory churn (GDPR, CMMC, FDA 21 CFR Part 11) forces frequent wording changes that legacy pipelines never re‑index.
  • Content Proliferation Without Convergence
    • Same paragraph lives in SharePoint, Jira wikis, email chains, and vendor portals—each with slight edits.
    • "Save‑As Syndrome" (salespeople cloning old proposals) multiplies duplicates with misleading "last‑modified" timestamps.
  • Absence of a Governed "Single Source of Truth"
    • No enterprise‑wide taxonomy linking key information to a master record; Subject Matter Experts (SMEs) cannot easily correct once, publish everywhere.
    • Versioning spread across disparate repositories prevents atomic roll‑back or audit.
  • Naïve Chunk‑Based Indexing
    • Fixed‑length windows slice semantic units, destroying contextual coherence and crippling cosine similarity.
    • Chunks inherit the security label of the parent file (or none at all). A single mixed‑classification slide deck can surface "SECRET" paragraphs to a "PUBLIC" query.
  • Vector Noise & Embedding Collisions
    • Near‑duplicate paragraphs occupy adjacent positions in vector space; retrieval engines return redundant or conflicting passages, increasing hallucination probability.
  • Human‑Scale Maintenance Is Impossible
    • Updating "paragraph 47 of document 59" across 1 million files would require tens of thousands of labor hours—economically infeasible, so errors persist and compound.

These intertwined root causes explain why legacy RAG architectures plateau in pilot, why hallucination rates stay stubbornly high, and why enterprises urgently need a preprocessing and governance layer such as Blockify® to restore trust and unlock GenAI value.

6. The Solution - Blockify®

Blockify® replaces the traditional "dump‑and‑chunk" approach with an end‑to‑end pipeline that cleans and organizes content before it ever hits a vector store.

Admins first define who should see what, then the system ingests any file type—Word, PDF, slides, images—inside public cloud, private cloud, or on‑prem. A context‑aware splitter finds natural breaks, and a series of specially developed Blockify LLM models turns each segment into a draft IdeaBlock.

Blockify identifies near‑duplicates so an LLM can merge them into a single, canonical IdeaBlock, while auto‑tagging assigns clearance, version, and product labels. Because the dataset is now thousands of blocks instead of millions of paragraphs, experts can validate the whole knowledge base in a quick pass once a quarter and export it to any vector DB or export as an air‑gapped JSON bundle for use with AirgapAI.

The payoff is a knowledge set roughly 40× smaller and more accurate, free of version conflicts and duplicate noise, and guarded by field‑level permissions that travel with every IdeaBlock. Combined with improved search capabilities, this leads to a 78X improvement in LLM accuracy.

GenAI systems fed with this curated data return sharper answers, hallucinate far less, and comply with security policies out of the box.

The result: higher trust, lower operating cost, and a clear path to enterprise‑scale RAG without the cleanup headaches that stall most AI rollouts.

6.1 How It Works (Full Technical Flow)

  • Step 0 – Scoping
    • Admins specify Index hierarchy (e.g., Org ▸ Business Unit ▸ Product ▸ Persona / Clearance).
  • Step 1 – Document Ingestion
    • Accepted formats: DOCX, PDF, PPT, PNG/JPG, markdown, HTML.
    • Pipeline kicks off data ingestion in the cloud or inside a secure private cloud or on-prem cluster.
  • Step 2 – Chunking & LLM Extraction
    • Adaptive windowing algorithm finds natural semantic breaks rather than fixed 500-char splits.
    • Use a specially developed Blockify Fine-tuned LLaMA 3 model to ingest and convert each text chunk into a collection of draft IdeaBlocks.
  • Step 3 – Semantic Deduplication
    • Open-source Jina Embeddings (or customer-preferred model) generate embeddings.
    • Advanced clustering groups near-duplicates at user defined thresholds.
    • Specialized LLMs distill clusters of draft IdeaBlocks into canonical IdeaBlocks while preserving nuance.
    • Average reduction is roughly 40X, shrinking a dataset down to ~2.5% of its original size (a simplified sketch of this deduplication step appears after this list).
  • Step 4 – Taxonomy/Tagging
    • Auto-generated tags using specialized LLMs for data classification: clearance level, source system, product line, version, NDA status, etc.
    • Admins may append manual tags (e.g., "NATO-restricted" vs "Five-Eyes-only").
  • Step 5 – Human Validation
    • Because the dataset size is now much smaller and human manageable (thousands of IdeaBlocks vs millions of paragraphs), Subject Matter Experts can perform quarterly review in hours rather than years.
  • Step 6 – Export & Integration
    • Option A: API push to customer's existing vector DB (Pinecone, Vertex Matching Engine, Azure AI Search, etc.).
    • Option B: Local embedding and output as a JSON-L file for offline-only environments.
  • Step 7 – Use
    • Leverage Iternal's AirgapAI™ to perform RAG with Optimized datasets locally
    • Use Iternal's Turnkey AI Enterprise Platform of Apps with Blockify to improve accuracy of document creation, such as RFP response writing using Waypoint
    • Apply Blockified Dataset to an existing RAG workflow / pipeline already established by your IT teams.
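
The following is a simplified sketch of the semantic deduplication idea in Step 3, assuming a greedy cosine-similarity clustering with a user-defined threshold. The embed and distill functions are placeholders for the embedding model (e.g., Jina Embeddings) and the distillation LLMs; the actual Blockify algorithms are more sophisticated than this illustration.

```python
# Simplified sketch of Step 3 (semantic deduplication): greedily cluster draft
# IdeaBlocks whose embeddings exceed a similarity threshold, then distill each
# cluster into one canonical IdeaBlock. embed() and distill() are placeholders.
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_near_duplicates(blocks, embed, threshold=0.85):  # threshold is user defined
    clusters = []  # list of (representative_vector, [member_blocks])
    for block in blocks:
        vec = embed(block["trusted_answer"])
        for rep_vec, members in clusters:
            if cosine_sim(vec, rep_vec) >= threshold:
                members.append(block)
                break
        else:
            clusters.append((vec, [block]))
    return [members for _, members in clusters]

# Each cluster would then be distilled into a canonical IdeaBlock, e.g.:
# canonical_blocks = [distill(members) for members in cluster_near_duplicates(drafts, embed)]
```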

Enterprise Impact for Big Four Consulting Firm of 68.44X LLM RAG Accuracy

Name | Enterprise Performance Improvement (Higher X Better)
Enterprise Aggregate Performance | 68.44X Combined Improvement via Blockify
Enterprise Duplication Reduction by Word Count | 29.93X Enterprise Performance Improvement via Blockify

Base Impact of Blockify® Accuracy for Big Four Consulting Firm

Name | Base Performance Improvement (Higher X Better)
Aggregate Performance (Vector Accuracy × Word Count Reduction) | 4.56X Combined Improvement via Blockify
Vector Accuracy Improvement (Blockify® Distilled versus Legacy RAG) | 2.29X Improvement via Blockify
Duplication Reduction by Word Count (Blockify® Distilled versus Legacy RAG) | 2.00X Improvement via Blockify

Detailed Vector Search Accuracy Improvement Results for Big Four Consulting Firm

Name | Scenario A - Blockify | Scenario B - Chunking
Number of Chunks (≈1,000 Characters with 100 Character Overlap) | - | 501 Chunks
Number of IdeaBlocks (Undistilled) | 2042 | -
Number of IdeaBlocks (Distilled) | 1200 | -
Average Distance to Best Match Chunk | - | 0.3623982031
Average Distance to Best Match IdeaBlock (Undistilled) | 0.1833381639 | -
Average Distance to Best Match IdeaBlock (Distilled) | 0.1585023275 | -

Duplicate Text Optimization Analysis for Big Four Consulting Firm

Name | Raw Document Input | Blockify® Output - IdeaBlocks Undistilled | Blockify® Output - IdeaBlocks Distilled | Improvement Factor (Raw to Distilled)
Number of Words | 88,877 | 68,110 | 44,537 | 2.00X
Number of Characters with Spaces | 607,711 | 475,008 | 327,700 | 1.85X

Token Usage Analysis for Big Four Consulting Firm

Name | Raw Document Chunking | Blockify® - IdeaBlocks Undistilled | Blockify® - IdeaBlocks Distilled | Token Efficiency Improvement Factor (Raw to Distilled)
Average Tokens per IdeaBlock/Chunk | ≈303 tokens per chunk | ≈85 tokens per block | ≈98 tokens per block | 3.09X
Total Estimated Tokens per Year Consumed | ≈1,515,000,000,000 total tokens | ≈425,000,000,000 total tokens | ≈490,000,000,000 total tokens | 3.09X

Token estimates are calculated based on a character-to-token ratio of 4:1 and 1,000,000,000 annual user queries. For Blockify, tokens include name, critical question, and trusted answer fields. For Chunking, tokens include the raw document chunks.
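
For transparency, the arithmetic behind these estimates can be reproduced from the stated assumptions (4 characters per token, top-5 results per query, 1,000,000,000 queries per year) and the character and block counts in the tables above; a minimal sketch:

```python
# Reproducing the token estimates above from the stated assumptions.
CHARS_PER_TOKEN = 4
RESULTS_PER_QUERY = 5
QUERIES_PER_YEAR = 1_000_000_000

# Naive chunking: 607,711 characters split into 501 chunks.
tokens_per_chunk = (607_711 / 501) / CHARS_PER_TOKEN                              # ~303 tokens per chunk
chunk_tokens_per_year = tokens_per_chunk * RESULTS_PER_QUERY * QUERIES_PER_YEAR   # ~1.515e12

# Blockify distilled: ~98 tokens per IdeaBlock (from the table above; includes
# the name, critical question, and trusted answer fields).
block_tokens_per_year = 98 * RESULTS_PER_QUERY * QUERIES_PER_YEAR                 # ~4.90e11

print(round(tokens_per_chunk))              # 303
print(round(tokens_per_chunk / 98, 2))      # 3.09 (token efficiency improvement factor)
```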

Token Efficiency Impact on Enterprise GenAI Systems

One of the most significant operational advantages Blockify® brings to enterprise GenAI systems is a dramatic reduction in token count per query. In a standard Retrieval-Augmented Query (RAQ), the naive chunking approach typically requires the LLM to process an average of 1,515 tokens per query, because it must absorb multiple, often repetitive or semantically fragmented chunks to answer a user's question. In contrast, the Blockify® distillation process yields highly specific, semantically complete IdeaBlocks. Because these are carefully distilled, deduplicated, and context-rich, Blockify reduces the average context window necessary for accurate LLM responses to approximately 490 tokens per query, assuming both approaches use the top 5 retrieved results to inform the LLM. This 3.09X reduction in tokens used per query drives profound downstream benefits in cost, compute, and latency.

Cost Savings: Reduced LLM Usage and API Fees

Most modern LLM platforms charge customers based on the number of tokens ingested and generated per query, either by direct token consumption or by reserved GPU capacity. Reducing the number of tokens processed per request lowers API and operational costs across every interaction.

Assuming token pricing from most providers of $0.72 per 1,000,000 tokens for a LLAMA 3.3 70B model, extraneous tokens add up quickly. A 3.09X reduction in tokens can translate into saving an estimated $738,000 per year, in addition to the accuracy benefits delivered by Blockify.
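
A quick check of that figure, using the annual token totals from the table above and the stated price per million tokens:

```python
# Checking the annual savings estimate at $0.72 per 1,000,000 tokens.
PRICE_PER_MILLION_TOKENS = 0.72
chunk_tokens_per_year = 1_515_000_000_000      # naive chunking (from the table above)
blockify_tokens_per_year = 490_000_000_000     # distilled IdeaBlocks (from the table above)

saved_tokens = chunk_tokens_per_year - blockify_tokens_per_year        # 1.025e12 tokens
annual_savings = saved_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"${annual_savings:,.0f}")                                       # $738,000
```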

Compute Savings: Efficient Utilization and Scalability

LLMs are compute-intensive. Each additional token processed increases overall system load:

  • Memory footprint scales linearly with token count in the attention window.
  • Processing time per request increases with larger context windows.
  • GPU and infrastructure utilization—and thus cloud costs—rise accordingly.

By reducing per-query token usage by 3.09X, Blockify:

  • Lowers compute resource requirements per query.
  • Enables linear scaling to higher query throughput without additional hardware or cloud spend.
  • Facilitates more cost-effective horizontal scaling during peak demand, as more queries can be handled per node/instance.

Latency Improvements: Faster Responses, Better Experience

Token reduction also translates directly into lower end-to-end latency for each user query:

  • LLM inference time is closely correlated with the number of context tokens processed (input plus output).
  • A reduction of 3.09X input tokens can yield 3.09X faster average response times, especially notable for latency-sensitive use cases in chatbots, customer support, and RFP automation.
  • Users experience more immediate, conversational responses—critical for adoption and satisfaction.

Duplicate Text Enterprise Performance Impact for Big Four Consulting Firm

Name | Raw Document Input | Blockify® Output - IdeaBlocks Distilled (Base) | Enterprise Performance Improvement Factor
Enterprise Performance Word Count Reduction Factor | 88,877 (Original Words) | 44,537 (Distilled Words) | 29.93X
Enterprise Performance Character Count Reduction Factor | 607,711 (Original Chars) | 327,700 (Distilled Chars) | 27.82X

The "Enterprise Performance Improvement Factor" demonstrates the enhanced reduction in words/characters when applying the 15X Enterprise Data Duplication Factor to the base distillation effectiveness. This factor assumes that similar levels of data duplication exist across the enterprise, which Blockify® helps to mitigate.
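
As a worked example of this compounding, the sketch below applies the 15X factor to the base reduction figures reported in the tables above:

```python
# Applying the 15X Enterprise Data Duplication Factor to the base reductions.
DUPLICATION_FACTOR = 15

word_reduction = 88_877 / 44_537        # ~2.00X base word-count reduction
char_reduction = 607_711 / 327_700      # ~1.85X base character-count reduction

print(round(word_reduction * DUPLICATION_FACTOR, 2))    # 29.93
print(round(char_reduction * DUPLICATION_FACTOR, 2))    # 27.82

# The same compounding produces the headline aggregate: the 4.56X base aggregate
# (vector accuracy x word-count reduction) multiplied by 15 gives ~68.44X.
```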

Blockify Vector Search Accuracy Improvement of 2.29X

In the "Blockify Vector Search Accuracy Improvement" component of this study, an A/B test scenario was devised to compare Blockify's vector search performance (Scenario A) against a conventional chunking method (Scenario B). The objective was to assess whether Blockify's specialized approach to generating IdeaBlocks and metadata-driven embeddings enhances search accuracy when users query a document.

The documents provided by the customer were segmented into IdeaBlocks using Blockify, which integrates distilled concept extraction and custom metadata (e.g., block names, critical questions, and trusted answers) in forming vector embeddings. This was contrasted with a traditional linear chunk-based approach, wherein approximately 1,000-character segments were extracted with a 100-character overlap and subsequently embedded. Average distances between each user query and its "best match" segment (or IdeaBlock) served as the primary performance measure for vector search accuracy.

Results highlight a marked improvement in accuracy when employing Blockify. Legacy chunking achieved an average distance of 0.3624, whereas Blockify's corresponding distance ranged from 0.1833 (undistilled IdeaBlocks) to 0.1585 (distilled IdeaBlocks). This translates to a 56.26% increase in search precision for distilled IdeaBlocks relative to the legacy approach. Notably, distillation, an optional process designed to remove duplicate or extraneous information and reduce text volume, did not degrade vector search accuracy; it improved accuracy by a further 13.55% over undistilled IdeaBlocks.

These findings emphasize the potential of optimized, metadata-enriched embeddings to yield substantial gains in information retrieval tasks. By leveraging refined conceptual segmentation (IdeaBlocks), the Blockify pipeline not only mitigates redundancy but also enhances the granular representation of content. The net result is a more accurate and efficient vector search experience, indicating that intelligent data distillation can outperform conventional chunking strategies both in terms of precision and storage efficiency.

Diagram: Legacy AI Vector Search vs. Blockify Optimized. With basic AI vector search, the result is ≈68.44X less accurate; the inaccurate result could be anywhere in the red circle. With Blockify optimization, the result is ≈68.44X more accurate; the accurate result could be anywhere in the green circle.

Word Frequency Improvement for Top 10 Most Common Words in Big Four Consulting Firm Documents

The frequency of words represents the repetition of the same information across documents. The goal is to reduce the frequency of words so that the same information must be updated fewer times whenever it changes. A smaller word count makes the data easier to update and reduces the chance of errors.

Name | Raw Document Input | Blockify® Output - Undistilled | Blockify® Output - Distilled | Improvement Factor (Raw to Distilled)
Use of Word "ai" in Text | 2,285 | 1,851 | 1,057 | 2.16X
Use of Word "data" in Text | 682 | 631 | 423 | 1.61X
Use of Word "generative" in Text | 643 | 526 | 265 | 2.43X
Use of Word "Big Four Consulting Firm" in Text | 581 | 268 | 98 | 5.93X
Use of Word "customer" in Text | 368 | 306 | 190 | 1.94X
Use of Word "business" in Text | 294 | 236 | 146 | 2.01X
Use of Word "organizations" in Text | 265 | 309 | 268 | 0.99X
Use of Word "help" in Text | 240 | 0 | 0 | -
Use of Word "genai" in Text | 222 | 185 | 0 | -
Use of Word "marketing" in Text | 201 | 153 | 122 | 1.65X

Note: Some words may appear as 0 in the table because the LLM has rewritten them. For example "genai" may be rewritten as "Gen AI" in the output dataset.

Notes on Calculations

Vector Accuracy Improvement Calculation

The percentage improvement is calculated by first determining the absolute reduction in vector distance, subtracting the new method's 0.1585023275 distance from the legacy method's 0.3623982031 distance to get 0.20389588 units, and then calculating what fraction 0.20389588 is of the original 0.3623982031 distance. This fraction (0.20389588 / 0.3623982031) is multiplied by 100 to convert it into a percentage, resulting in a 56.26294% improvement in vector distance (accuracy).

Distilled versus Undistilled IdeaBlocks Calculation

We further determined that distilling the information using Blockify's Intelligent Distillation does not negatively impact vector search accuracy; instead, it improves vector search accuracy by +13.54646% when comparing a Blockify Distilled dataset against an Undistilled dataset.
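
Both percentages can be reproduced directly from the average best-match distances reported in the results table; for example:

```python
# Reproducing the accuracy improvement percentages from the reported distances.
legacy_chunk_distance = 0.3623982031
undistilled_distance = 0.1833381639
distilled_distance = 0.1585023275

# Distilled IdeaBlocks vs. legacy chunking (lower distance is better).
vs_legacy = (legacy_chunk_distance - distilled_distance) / legacy_chunk_distance * 100
print(round(vs_legacy, 5))        # 56.26294

# Distilled vs. undistilled IdeaBlocks.
vs_undistilled = (undistilled_distance - distilled_distance) / undistilled_distance * 100
print(round(vs_undistilled, 5))   # 13.54646
```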

Details on the Distance Calculation

To calculate the distance to the best match we use a script to determine the cosine similarity between two numerical vectors. It begins by converting each input vector from a string format into an array of numbers and verifies that both vectors have the same number of dimensions.

The cosine similarity is then computed by first determining the dot product of the two vectors (the sum of the products of each corresponding pair of elements) and independently calculating the Euclidean norm (or magnitude) of each vector by taking the square root of the sum of the squares of its components.

Finally, the cosine similarity is obtained by dividing the dot product by the product of the two magnitudes, yielding a value that measures the orientation similarity between the vectors regardless of their scale, as represented by the formula cos(θ) = (A · B) / (||A|| ||B||).
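
A minimal Python equivalent of the cosine similarity computation described above (standard library only; the parsing helper assumes comma-separated vector strings, which is an illustrative assumption rather than the exact input format used by the script):

```python
# Minimal equivalent of the cosine similarity calculation described above.
import math

def parse_vector(raw):
    """Convert a comma-separated vector string into a list of floats (illustrative format)."""
    return [float(x) for x in raw.strip("[] ").split(",")]

def cosine_similarity(vec_a, vec_b):
    """cos(theta) = (A . B) / (||A|| ||B||)."""
    if len(vec_a) != len(vec_b):
        raise ValueError("Vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b)
```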

Details on the Best Match Calculation

To provide an ultra conservative and non-subjective determination of vector accuracy improvement, the "Best Matching" Chunk DOES NOT mean it is the most textually accurate for the User Query. For impartiality, the same applies to the "Best Matching" IdeaBlock.

"Best Matching" only means it is the closest in vector search distance between the User Query and the chunk in question. This approach avoids subjectivity in choosing one chunk over another for more favorable results.

Note this would also mean that the LLM may receive less optimal data for its response synthesis which would also lower quality for the Legacy RAG response.

Additionally, because we are conducting the test with a single document in the vector dataset, the results will be heavily idealized to the benefit of the chunked text, since other documents could have made the best match even less relevant than when using a single document.

This process begins by analyzing the vector embeddings of the Chunks / IdeaBlocks and the vector embedding of the user's question where each row contains a question and an array of all the candidate answers along with their vector representations.

For every entry, the program uses the vector numerical arrays for mathematical comparison. The script calculates each question's cosine distance between the question's vector and every candidate answer's vector, which quantitatively measures their similarity.

The candidate answer with the smallest cosine distance (a sign of highest similarity) is selected as the best match, and that distance is output by the script.

We can then calculate the average distance for best match across all questions in the dataset to determine overall vector distance (which determines overall accuracy).
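
A minimal sketch of this best-match evaluation, assuming cosine distance is computed as 1 minus cosine similarity (a common convention; the report does not spell out the exact distance formula):

```python
# Sketch of the best-match evaluation: for each query, take the candidate with
# the smallest cosine distance, then average those best-match distances.
# Cosine distance is assumed here to be 1 - cosine similarity.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def average_best_match_distance(rows):
    """rows: list of (question_vector, [candidate_vectors]) pairs."""
    best_distances = [
        min(cosine_distance(q_vec, c_vec) for c_vec in candidates)
        for q_vec, candidates in rows
    ]
    return sum(best_distances) / len(best_distances)
```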

7. Terminology

In this document we will use the following definitions of terms:

Legacy RAG

Legacy retrieval augmented generation uses the industry-established process of uploading documents to a text extraction solution; the text is parsed linearly and then chunked into segments of a specific equal length (for example, 1,000 characters per chunk). The chunks are truncated based on common punctuation that ends sentences, such as (".","!","?"), and the text chunks have a character overlap between each chunk to ensure important context between chunks is not lost.
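
As a rough illustration of this legacy chunking step, the sketch below splits text into roughly 1,000-character segments that end on sentence punctuation and carry a 100-character overlap, matching the benchmark configuration described earlier; the exact production logic of any given tool may differ:

```python
# Rough sketch of legacy fixed-length chunking with sentence-boundary truncation
# and character overlap. Not the exact algorithm of any particular tool.
SENTENCE_ENDINGS = (".", "!", "?")

def naive_chunk(text, target_size=1000, overlap=100):
    chunks, start = [], 0
    while start < len(text):
        end = min(start + target_size, len(text))
        if end < len(text):
            # Pull the cut point back to the last sentence-ending punctuation, if any.
            last_break = max(text[start:end].rfind(p) for p in SENTENCE_ENDINGS)
            if last_break > 0:
                end = start + last_break + 1
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = max(end - overlap, start + 1)  # step forward while keeping overlap
    return chunks
```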

Blockify

Blockify is the patented solution from Iternal Technologies that includes a data ingestion pipeline with the following steps: documents are uploaded to a text extraction solution; the text is parsed linearly and chunked into segments of a specific equal length (for example, 1,000 characters per chunk); the chunks are truncated based on common punctuation that ends sentences, such as (".","!","?"), with a character overlap between each chunk to ensure important context between chunks is not lost; the chunks are processed by a specially developed LLM designed to extract IdeaBlocks from the source chunks; the IdeaBlocks are distilled and deduplicated via another series of LLMs and clustering similarity algorithms, with the process repeated a number of times for optimal dataset distillation; and the IdeaBlocks are made available for optional human review.

IdeaBlocks

An IdeaBlock is any self-contained concept or idea about a topic, for example your company, products, or services. An IdeaBlock is typically 2 - 3 sentences in length. It contains a Name, a Critical Question, a Trusted Answer, an Index, Tags, Keywords, and other metadata.
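
To make the structure concrete, a hypothetical IdeaBlock rendered as a Python dictionary might look like the following; the field names follow the definition above and the values are drawn from the first sample in Section 2, but the exact schema used by Blockify is not specified in this report:

```python
# Hypothetical IdeaBlock represented as a Python dictionary. Field names follow
# the definition above; values are taken from the first sample in Section 2 and
# are illustrative only; the exact Blockify schema is not specified in this report.
idea_block = {
    "id": "3dd63eab-f555-4c3f-8900-206bec0fb8d7",
    "name": "Costs of Generative AI",
    "critical_question": "What are the cost considerations for Generative AI usage?",
    "trusted_answer": (
        "The cost of a query or a prompt using Generative AI can be up to "
        "ten times that of an index-based query."
    ),
    "index": "Org / Business Unit / Product / Persona",   # example Index hierarchy from Step 0
    "tags": ["clearance: public", "version: current"],    # illustrative metadata tags
    "keywords": ["generative ai", "cost"],
}
```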