April 22, 2026
George Karapetyan

AI for Product Compliance: A Head of AI’s Lessons from BOM Extraction

AI for product compliance is one of the most challenging applications of modern AI. From BOM extraction to REACH and SCIP compliance, the gap between a working demo and a production-ready system is significant.

This is the first in a series of posts I am writing about bringing AI into practice in sustainability and product compliance. The problems in this space are technically interesting, the stakes are real, and I have not seen enough honest writing about what actually works. I will keep adding to this.


 

Someone comes to you and says:

"I used Claude to extract a Bill of Materials from a supplier file for this regulation. It works. Let's roll it out."

And the demo is convincing. Upload a PDF, the model reads it, out comes something that looks structured. Clean enough. Useful enough. If you lead AI at a company that builds or sells physical products, this feels like exactly the kind of problem modern models should handle.

Here is the uncomfortable thing: they are not wrong. The model does work. That is not the problem.

The problem is what comes next.

Why BOM Extraction Demos Fail in Real Compliance Workflows

The real question is not whether a model can read a supplier file. Of course it can.

The question is whether you can turn that read, done once, on one file, in a conversation, into something the business can actually rely on six months from now. For screening. For analytics. For customer requests. For audits. And for re-running the same logic when regulations change.

That is a very different problem. And it is the one that separates a convincing demo from a system worth shipping.

 

Demo vs. Production Reality of AI

We ran into this directly when building AI into our product compliance workflows at IntegrityNext. The surface issue is familiar to anyone in this space: supplier files do not arrive in a clean, universal format. But the deeper problem is not really about file formats. It is about the data inside them.

A product is not a flat list of parts. It is a hierarchy: assemblies contain components, components are made of materials, and materials contain substances. That hierarchy is almost never clean. The same material appears under slightly different names across suppliers. Part numbers include or omit revision suffixes. One supplier provides a structured assembly tree; another provides a flat list with no indication of what belongs to what. A document discloses substances at the material level but not the component level, or vice versa. Hierarchy is implied, guessed at, or missing entirely.
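
A minimal sketch, in Python, of the two naming problems above: revision suffixes on part numbers and one material appearing under several names. The suffix pattern and the alias table are hypothetical assumptions for illustration; real supplier data will need a curated, much larger mapping.

```python
import re
from typing import Optional

# Hypothetical suffix pattern: a separator, then "rev", then a short revision
# label at the end of the string ("-REV-A", " Rev.2", "_rev3").
_REVISION_SUFFIX = re.compile(r"[\s_\-]+rev[\s_.\-]*[a-z0-9]+$", re.IGNORECASE)

# Hypothetical alias table; in practice this would be curated, not hard-coded.
_MATERIAL_ALIASES = {
    "ss304": "stainless steel 304",
    "aisi 304": "stainless steel 304",
    "stainless steel 304": "stainless steel 304",
}


def normalize_part_number(raw: str) -> str:
    """Trim, drop a trailing revision suffix if present, and uppercase."""
    return _REVISION_SUFFIX.sub("", raw.strip()).upper()


def normalize_material(raw: str) -> Optional[str]:
    """Return the canonical material name, or None when the name is unknown."""
    key = " ".join(raw.lower().split())
    return _MATERIAL_ALIASES.get(key)
```

Returning `None` for an unknown material, instead of guessing the closest match, keeps the unknown visible so it can be reviewed later.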

This is genuinely hard for a human to parse. For AI, it is even harder, because the model has no prior knowledge of what the hierarchy should look like, only what the document happens to say.

This is not an edge case. This is the median supplier file.

What BOM Extraction Really Means for Product Compliance Data

In product compliance, you do not care about a BOM because you enjoy organizing part numbers. You care because you need to understand what a product contains across several levels (finished good, assemblies, components, materials, substances), and you need that understanding to be traceable.

That matters because compliance questions increasingly sit below the surface of "is this product compliant?" They are more like: what is inside this product, which supplier declared what, is there enough information to assess substance-related risk, and can we trace that answer back to actual evidence?

When substance-aware traceability is required, as it is under frameworks like REACH and SCIP, a one-line certificate saying "compliant" is rarely enough. What is needed is structured, article-level evidence that downstream workflows, customers, and auditors can examine.

So, when someone says "BOM extraction," what they often really mean is: how do we convert unstructured supplier evidence into a reusable product compliance data model?

That is the actual job.

Where AI Actually Adds Value in Document-Based Compliance Workflows

The useful role of AI in this workflow is not "do compliance for me."

It is narrower and more valuable than that.

AI helps absorb the variability in messy inputs. In a BOM extraction workflow, that means classifying what kind of document arrived, recovering structure from inconsistent layouts, identifying candidate rows and hierarchy, normalizing language that systems treat as distinct but humans treat as equivalent, and, critically, surfacing ambiguity rather than papering over it.

That last part matters a great deal. A good production system should not try to sound confident about uncertain data. It should know when a field is ambiguous, when a document is incomplete, or when two pieces of evidence conflict.
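
One way to surface conflicting evidence, sketched in Python. The `Evidence` record and its field names are assumptions made for this example, not an existing schema; the point is only that conflicts are returned as data instead of being resolved silently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    name: str     # which field the statement is about, e.g. "lead_pct"
    value: str    # the value as stated in the document
    source: str   # hypothetical provenance label, e.g. "declaration.pdf p.2"


def find_conflicts(evidence: list[Evidence]) -> dict[str, list[Evidence]]:
    """Return the fields for which the sources state more than one value."""
    by_field = defaultdict(list)
    for item in evidence:
        by_field[item.name].append(item)
    return {
        name: items
        for name, items in by_field.items()
        if len({item.value for item in items}) > 1
    }
```

Each conflict carries its sources, so a reviewer can see which documents disagree without reopening every file.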

The model's job is to propose.

The platform's job is to decide what becomes trusted data.

This distinction, propose versus commit, is the design principle that makes the difference between a system people trust and one they quietly route around.

Why a General-Purpose Chatbot Fails in Regulated Compliance Workflows

This is the obvious follow-up question. If the model can extract the BOM, why not let users do it in Claude or ChatGPT?

Because a successful extraction in a conversation is not the same thing as a reliable workflow.

General-purpose assistants are excellent for exploration and for testing what is possible. But regulated workflows need more than an answer. They need control, consistency, and accountability.

In production, you need the same type of file to go through the same process every time. You need extracted fields to map into a fixed internal schema. You need validation rules to behave deterministically. You need exceptions to route for review rather than getting smoothed over. You need outputs tied to document versions and evidence. And you need the whole thing to work in queue-based systems at scale, not as a twenty-message conversation per file. A mid-sized manufacturer might produce anywhere between 100 and 1,000 Bills of Materials. Imagine auditing 1,000 conversations with 20 messages each: that is 20,000 messages to review.
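
A rough sketch, in Python, of what "queue-based, tied to document versions" can mean. Every file goes through the same function, and each result records a hash of the exact bytes processed. The `JobRecord` fields, the status values, and `PIPELINE_VERSION` are hypothetical; the extraction step itself is left out.

```python
import hashlib
import queue
from dataclasses import dataclass

PIPELINE_VERSION = "example-1"  # hypothetical label for the pipeline release


@dataclass(frozen=True)
class JobRecord:
    filename: str
    document_sha256: str   # ties the output to the exact document version
    pipeline_version: str
    status: str


def process(filename: str, content: bytes) -> JobRecord:
    """Run one file through the fixed pipeline (extraction omitted here)."""
    digest = hashlib.sha256(content).hexdigest()
    # Placeholder rule: an empty file cannot be extracted, so it goes to review.
    status = "extracted" if content.strip() else "needs_review"
    return JobRecord(filename, digest, PIPELINE_VERSION, status)


def drain(jobs: queue.Queue) -> list[JobRecord]:
    """Process every queued (filename, content) pair the same way."""
    records = []
    while not jobs.empty():
        filename, content = jobs.get()
        records.append(process(filename, content))
        jobs.task_done()
    return records
```

An auditor then reviews one record per file rather than a conversation transcript, and a changed file produces a different hash.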

The winning pattern is not "replace the workflow with a chatbot." It is "use AI inside a controlled workflow." Probabilistic models where the world is messy. Deterministic software where accountability matters.

A Scalable AI Architecture for BOM Extraction and Compliance Data

Building this for IntegrityNext's use cases taught us that BOM extraction is not one model call. It is a staged flow with clear contracts between stages.

Start with ingestion, not extraction. Before asking a model anything, decide what arrived and how it should be handled. Is this a BOM-like document? Is it a declaration, a supporting certificate, or an irrelevant attachment? What metadata do we already know from the supplier relationship? This layer sounds boring, but it prevents most downstream chaos.
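
The ingestion step can be pictured as a classifier followed by a routing table. In this Python sketch a keyword count stands in for whatever classifier is actually used. The document kinds come from the paragraph above, while the keywords and handler names are hypothetical.

```python
from enum import Enum


class DocumentKind(Enum):
    BOM = "bom"
    DECLARATION = "declaration"
    CERTIFICATE = "certificate"
    IRRELEVANT = "irrelevant"


# Hypothetical keyword hints standing in for a real classifier.
_HINTS = [
    (DocumentKind.BOM, ("bill of materials", "part number", "qty")),
    (DocumentKind.DECLARATION, ("declaration", "svhc", "reach")),
    (DocumentKind.CERTIFICATE, ("certificate", "iso 9001")),
]

# Hypothetical handler names; only BOM-like files reach extraction.
ROUTES = {
    DocumentKind.BOM: "bom_extraction",
    DocumentKind.DECLARATION: "declaration_intake",
    DocumentKind.CERTIFICATE: "attach_as_evidence",
    DocumentKind.IRRELEVANT: "discard_with_log",
}


def classify(text: str) -> DocumentKind:
    """Pick the kind with the most keyword hits; no hits means irrelevant."""
    lowered = text.lower()
    best_kind, best_hits = DocumentKind.IRRELEVANT, 0
    for kind, keywords in _HINTS:
        hits = sum(keyword in lowered for keyword in keywords)
        if hits > best_hits:
            best_kind, best_hits = kind, hits
    return best_kind
```

Whatever does the classifying, the routing table stays explicit and deterministic, so the path a file takes can be explained afterwards.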

Build a stable document representation. The goal before extraction is to normalize heterogeneous inputs into something the rest of the pipeline can treat consistently. That means capturing table regions, row and column structure, section boundaries, and provenance anchors. Without this normalization layer, every downstream step becomes a bespoke workaround for whatever format happened to arrive.
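
As an illustration, a normalized table representation with provenance anchors might look like the Python sketch below. The class names and the page/row/column anchor are assumptions made for the example, and the function expects a non-empty grid whose first row is the header.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Anchor:
    """Provenance anchor: where in the source file a value was read."""
    page: int
    row: int
    column: int


@dataclass(frozen=True)
class Cell:
    text: str
    anchor: Anchor


@dataclass
class TableRegion:
    page: int
    header: list[str]
    rows: list[list[Cell]] = field(default_factory=list)


def table_from_grid(page: int, grid: list[list[str]]) -> TableRegion:
    """Treat the first grid row as the header and anchor every other cell."""
    header, *body = grid
    table = TableRegion(page=page, header=[h.strip() for h in header])
    for r, raw_row in enumerate(body, start=1):
        table.rows.append(
            [Cell(text.strip(), Anchor(page, r, c)) for c, text in enumerate(raw_row)]
        )
    return table
```

Once every cell carries an anchor, later stages can point back to the place in the source file that a value came from, whichever file format it arrived in.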

Extract proposals, not final truth. Now the model does what it is good at: looking at the normalized document and proposing BOM line items, hierarchy, quantities, material-level hints, and possible substance mentions. These land in a workflow as proposals, not as committed database records. That distinction is load-bearing. It is what makes the architecture auditable.
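
A sketch of the propose-versus-commit split in Python. The completeness rule, the confidence score, and the 0.9 threshold are assumptions for illustration; model-reported confidence in particular should be treated with caution, and a real gate would rest on more than one number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Proposal:
    part_number: str
    quantity: Optional[float]
    confidence: float    # hypothetical model-reported score in [0, 1]
    source_anchor: str   # where the value was read, e.g. "p.3 row 12"


@dataclass
class ProposalInbox:
    auto_commit_threshold: float = 0.9   # hypothetical policy value
    committed: list[Proposal] = field(default_factory=list)
    review_queue: list[Proposal] = field(default_factory=list)

    def submit(self, proposal: Proposal) -> str:
        """The platform, not the model, decides what becomes trusted data."""
        complete = proposal.part_number != "" and proposal.quantity is not None
        if complete and proposal.confidence >= self.auto_commit_threshold:
            self.committed.append(proposal)
            return "committed"
        self.review_queue.append(proposal)
        return "review"
```

Nothing the model produces reaches `committed` without passing an explicit rule that can be read, versioned, and changed.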

Bind to a canonical data model. This is where many AI prototypes quietly fall apart. If the output of extraction is "whatever fields the model happened to find," you have a clever parser, not a scalable product. The long-term value comes from a fixed schema, something like an assembly, component, material, and substance hierarchy (with regulation-specific attributes layered on top). Once supplier evidence lands in the right shape, it stops being a document problem and becomes a product intelligence asset.
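
To illustrate schema-first binding, here is a small Python sketch of a fixed hierarchy and a binder that rejects fields outside it. The class and field names are hypothetical and deliberately minimal; they are not IntegrityNext's data model.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Substance:
    name: str
    cas_number: Optional[str] = None
    mass_fraction_pct: Optional[float] = None


@dataclass
class Material:
    name: str
    substances: list[Substance] = field(default_factory=list)


@dataclass
class Component:
    part_number: str
    materials: list[Material] = field(default_factory=list)


@dataclass
class Assembly:
    part_number: str
    components: list[Component] = field(default_factory=list)


def bind_substance(raw: dict) -> Substance:
    """Map a raw extracted record onto Substance, rejecting unknown fields."""
    allowed = {f.name for f in fields(Substance)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"fields outside the schema: {sorted(unknown)}")
    return Substance(**raw)
```

The binder fails loudly on an unexpected field, which is the opposite of storing "whatever fields the model happened to find."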

Put deterministic validation after AI, not instead of it. Required field checks, unit normalization, part matching, duplicate handling, threshold logic: this is where rules belong. The AI layer absorbs variability; the deterministic layer enforces policy.
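
A minimal Python sketch of such a deterministic layer, covering required fields, units, duplicates, and one threshold. The `Line` shape and the issue messages are assumptions. The 0.1% weight-by-weight figure is the REACH concentration threshold for SVHCs in articles, used here only as an example rule.

```python
from dataclasses import dataclass
from typing import Optional

_GRAMS_PER_UNIT = {"mg": 0.001, "g": 1.0, "kg": 1000.0}
SVHC_THRESHOLD_PCT = 0.1  # REACH threshold for SVHCs in articles, % w/w


@dataclass(frozen=True)
class Line:
    part_number: str
    mass: Optional[float]
    unit: Optional[str]
    svhc_pct: Optional[float] = None


def to_grams(mass: float, unit: str) -> float:
    """Convert a mass to grams; raises KeyError for an unknown unit."""
    return mass * _GRAMS_PER_UNIT[unit.strip().lower()]


def validate(lines: list[Line]) -> list[str]:
    """Return human-readable issues; the same input always gives the same list."""
    issues, seen = [], set()
    for line in lines:
        if not line.part_number:
            issues.append("missing part number")
            continue
        if line.part_number in seen:
            issues.append(f"{line.part_number}: duplicate line")
        seen.add(line.part_number)
        if line.mass is None or line.unit is None:
            issues.append(f"{line.part_number}: missing mass or unit")
        elif line.unit.strip().lower() not in _GRAMS_PER_UNIT:
            issues.append(f"{line.part_number}: unknown unit {line.unit!r}")
        if line.svhc_pct is not None and line.svhc_pct > SVHC_THRESHOLD_PCT:
            issues.append(f"{line.part_number}: SVHC above {SVHC_THRESHOLD_PCT}% w/w")
    return issues
```

These rules are plain code, so they can be unit-tested and re-run unchanged on every extraction.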

Keep humans in control, not in the weeds. With the fields already extracted, people can spend their attention on checking, verifying, or overriding values rather than on pure data entry. The goal is to route the human’s attention to judgement rather than execution. This is how you build the logic responsibly.

From BOM Extraction to Reusable Product Compliance Data

From Documents to Compliance Intelligence

 

Too many teams evaluate document AI as if the goal is to get structured output once.

For product compliance, the real win is not "we extracted a BOM from this file." It is "we converted supplier evidence into structured data that the business can now use repeatedly."

That changes everything. You can compare suppliers more systematically. You can build analytics over product composition. You can respond faster to auditors and customers. You can re-run checks when a substance list updates next quarter without going back to the original files. You can support future workflows without asking suppliers for the same data again.
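
Re-running a check after a list update then becomes a set operation over stored data, as in the Python sketch below. The product names and both list versions are made up for the example and are not actual regulatory lists; the CAS numbers are real identifiers for the substances named in the comments.

```python
def screen(products: dict[str, set[str]], restricted: set[str]) -> dict[str, set[str]]:
    """Return, per product, the declared CAS numbers that appear on the list."""
    hits = {}
    for product, cas_numbers in products.items():
        matched = cas_numbers & restricted
        if matched:
            hits[product] = matched
    return hits


# Structured data captured once from supplier evidence (illustrative values).
products = {
    "PUMP-100": {"7439-92-1", "7440-50-8"},  # lead, copper
    "CABLE-20": {"117-81-7"},                # DEHP
    "HOUSING-7": {"80-05-7"},                # bisphenol A
}

list_v1 = {"7439-92-1", "117-81-7"}          # hypothetical earlier list version
list_v2 = list_v1 | {"80-05-7"}              # hypothetical update

before = screen(products, list_v1)
after = screen(products, list_v2)

# Products that were clean under the old list but are flagged under the new one.
newly_flagged = {p: cas for p, cas in after.items() if p not in before}
```

No supplier file is reopened here: the update is applied to the structured data that already exists.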

This is also, not coincidentally, what a product compliance platform needs to deliver at its core. At IntegrityNext, the underlying data model, the way we represent products, components, materials, and substances, and how we connect supplier evidence to that structure, is what makes any of this reusable. The AI layer sits on top of that foundation. Without the foundation, extraction is a one-time trick. With it, every document processed makes the next one easier, and every workflow downstream gets smarter.

This is why schema-first design matters. It is not architecture theater. It is what makes extracted data durable.

Design Principles for AI in Regulated Workflows

If there is one design principle from our experience worth carrying into other regulated workflows, it is this:

Use AI where the world is messy. Use software where the business needs control.

In practice, that means: start with the target data model, not the prompt. Separate propose from commit. Design for repeatability, not just first-pass accuracy. Preserve provenance so users can see where a value came from. Measure downstream usefulness, not extraction cleverness. Build for queue-based operations, not chat-based heroics.

What It Takes to Build AI for Product Compliance at Scale

There is a version of this story where the hard part was picking the right model or writing the right prompt. That is not the version we lived.

The hard part was combining three things that do not usually sit in the same room:

  • The technical depth to build document understanding and data pipelines that hold up under real supplier variability
  • The domain knowledge to know what structure actually matters for substance-level compliance
  • And an existing platform that already understood how manufacturers, suppliers, and regulators relate to each other

None of those three things works without the other two. A technically excellent extraction pipeline that maps into the wrong schema is useless. Deep compliance knowledge without the engineering to operationalize it stays theoretical. And a platform without AI integration becomes a bottleneck as the volume and complexity of supplier data keep growing.

What we are building at IntegrityNext is the combination. That is what makes it worth building, and what makes it work in the field.

How IntegrityNext Can Help

IntegrityNext applies the principles outlined above by embedding AI into a controlled, schema-first compliance platform. Instead of treating BOM extraction as a one-off task, it turns unstructured supplier documents into reusable, traceable compliance data, combining AI flexibility with deterministic workflows.

  • AI-augmented data collection: Extracts, pre-fills, and validates supplier data with evidence links
  • AI-powered supply chain mapping: Reconstructs multi-tier supplier networks using predictive analysis and sourcing logic
  • Smart supplier prioritization: Focuses attention on highest-risk and highest-impact areas
  • Continuous monitoring: Detects regulatory changes and emerging risks in real time
  • Automated remediation workflows: Turns insights into action with triggered follow-ups and controls

The result is not just better extraction, but a system that makes compliance data consistent, auditable, and reusable at scale.
