One of the promises of inline XBRL was simplicity.
If a data point appears on the cover page of a 10-K and it’s tagged, then—at least in theory—it should be easy to extract, compare, and analyze.
Public float is a good example.
The SEC requires filers to report their public float on the cover page, and with tagged cover pages that value is now explicitly labeled and machine-readable. Many researchers and data users understandably rely on the SEC’s structured data feeds to capture it.
But there is a quiet problem.
In a subset of filings, the tagged public float value does not match what the company actually reports on the cover page.
Here’s what that looks like in practice (Packaging Corp of America, 10-K for the fiscal year ended 12/31/2024):
- Reported on the HTML cover page: $16,124,636,098
- Tagged value in the XBRL: 16,124,636,098,000,000
The widely used SEC structured data files carry the same value as the tagged XBRL. To see it yourself, download the February 2025 data folder and look at line 7,553 of the sub.tsv file.
That’s not a rounding issue or a scaling choice. It’s a conversion error—effectively inflating the reported public float by a factor of one million.
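This kind of discrepancy is easy to test for mechanically. The sketch below (plain Python, no SEC-specific tooling; the function name is ours) checks whether two values differ by a clean power of ten, which is the signature of a conversion or scaling error rather than a genuine restatement:

```python
from decimal import Decimal

def scale_discrepancy(reported: Decimal, tagged: Decimal) -> int | None:
    """Return the power of ten separating the tagged value from the
    reported one (6 for a millionfold inflation), or None when the
    two values do not differ by a clean power of ten."""
    if reported == 0 or tagged == 0:
        return None
    ratio = tagged / reported
    exponent = 0
    while ratio > 1 and ratio % 10 == 0:
        ratio //= 10
        exponent += 1
    return exponent if ratio == 1 else None

# The Packaging Corp of America values from above:
print(scale_discrepancy(Decimal("16124636098"),
                        Decimal("16124636098000000")))  # -> 6
```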
What’s especially interesting is that these filings share a common fingerprint. They appear to have been prepared using a specific filing platform, as indicated by metadata embedded directly in the submission document.
```html
<!-- DFIN New ActiveDisclosure (SM) Inline XBRL Document -->
<!-- Copyright (c) 2025 Donnelley Financial Solutions, Inc. -->
```

(The creation date embedded alongside these comments varied across the affected filings, so the problem was not tied to a specific date.)
Based on the above, this doesn’t look like a company-specific mistake. It looks like a systematic transformation issue introduced during the filing conversion process. It’s more complicated than that, though. Based on my analysis, this software was used to file 892 10-Ks in 2025 (I excluded 10-K/As), but the problem seems to have occurred in only about 30 of them.
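If you keep a local copy of the primary filing documents, scanning for that fingerprint is straightforward. The sketch below makes assumptions: the directory layout is hypothetical, and the pattern matches only the stable portion of the comment, since the creation date varied:

```python
import re
from pathlib import Path

# Hypothetical local mirror: one primary 10-K HTML document per filing.
FILINGS_DIR = Path("filings/10-K/2025")

# Match only the stable product string; the creation-date portion of
# the comment varied across the affected submissions.
FINGERPRINT = re.compile(
    r"DFIN New ActiveDisclosure \(SM\) Inline XBRL Document"
)

flagged = [
    doc.name
    for doc in FILINGS_DIR.glob("*.htm")
    if FINGERPRINT.search(doc.read_text(errors="ignore"))
]
print(f"{len(flagged)} filings carry the ActiveDisclosure fingerprint")
```

Keep in mind this only narrows the population. The fingerprint marks everything produced by the software, so a value check like the one above still has to run against each match.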
Why this matters
If you rely on the SEC’s structured data endpoint, you’ll ingest the overstated value without any obvious warning. The number is valid XBRL; it parses cleanly. Unless you validate it against the rendered filing, it looks perfectly legitimate. Why would you question it?
That’s a problem for:
- researchers applying filer-size thresholds in their analysis (see the sketch below),
- analysts filtering by public float,
- and anyone using public float as a screening criterion.
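To make the threshold point concrete: the SEC’s filer-size categories turn on public-float cutoffs of $75 million and $700 million. A toy classification (simplified; the real determination also involves revenue tests and prior filer status) shows how an inflated tag silently changes a filer’s category:

```python
def filer_status(public_float: float) -> str:
    """Classify filer size by the SEC's public-float cutoffs (simplified;
    the actual tests also involve revenue and prior filer status)."""
    if public_float >= 700_000_000:
        return "large accelerated filer"
    if public_float >= 75_000_000:
        return "accelerated filer"
    return "non-accelerated filer"

# A modest filer whose tagged float is inflated a millionfold silently
# jumps categories in any screen built on the tagged value:
print(filer_status(250_000_000))              # accelerated filer
print(filer_status(250_000_000 * 1_000_000))  # large accelerated filer
```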
A simple safeguard
In our own workflows we actually start with the HTML filing itself and then map the tagged XBRL values back to what the company reports on the cover page. This makes discrepancies easy to flag: even in our client-facing databases we report both the original value as it appears in the HTML and the tagged value, as you can see in the next image.
This imposes additional work, but it also gives you a point of reference that is lost with raw XBRL data no matter how you access it. Ingesting just the XBRL gives you no reference against the truth. Inline XBRL makes more data accessible; the HTML provides a clear way of evaluating the results.
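For illustration, a stripped-down version of that reconciliation step might look like this. It assumes you already have the primary HTML document and the tagged dei:EntityPublicFloat value in hand; the regex is deliberately loose, because the exact "aggregate market value ... held by non-affiliates" wording varies from filer to filer:

```python
import re

def cover_page_float(html: str) -> int | None:
    """Pull the dollar amount that follows the public-float language
    on a 10-K cover page. Real cover pages vary, so treat this pattern
    as a starting point rather than a robust extractor."""
    match = re.search(
        r"aggregate market value[^$]{0,500}\$\s*([\d,]+)",
        html,
        re.IGNORECASE,
    )
    return int(match.group(1).replace(",", "")) if match else None

def reconcile(html: str, tagged_value: int) -> bool:
    """True when the tagged dei:EntityPublicFloat matches the cover page."""
    reported = cover_page_float(html)
    return reported is not None and reported == tagged_value
```

When the two values disagree, that filing gets surfaced for review instead of flowing straight into the database.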
Conclusion
When the first XBRL filings were introduced, we were ready—and genuinely excited. The promise was compelling: standardized, machine-readable financial statements that could be reliably constructed directly from tagged data.
That excitement didn’t last long.
As we began validating what we were deriving from the filings, it became clear that the work required was far more demanding than anticipated. In one early effort, we constructed roughly 5,000 income statements from XBRL data and compared them to the original financial statements. Approximately 7% contained problems.
What made this especially frustrating was not the error rate itself but the nature of the errors. We were unable to devise any rule, heuristic, or algorithm that could reliably identify which statements were wrong. The issues were only visible when a constructed statement was compared to the one actually presented in the filing. XBRL alone could not serve as both the starting point and the end point.
Inline XBRL has materially improved this situation. By embedding tagged data directly within the rendered document, it provides the necessary context to test, validate, and reconcile structured data against what filers actually report. That context doesn’t eliminate errors—but it makes them observable.
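That observability is concrete. In inline XBRL, a numeric fact such as dei:EntityPublicFloat is carried in an ix:nonFraction element whose displayed text sits right next to the scale attribute a processor applies to it, so a mis-scaled value can be caught by comparing the two. A sketch using BeautifulSoup (simplified; a full processor would also honor the sign, format, and decimals attributes):

```python
from bs4 import BeautifulSoup

def audit_public_float(inline_xbrl_html: str) -> None:
    """Print the displayed text for each dei:EntityPublicFloat fact next
    to the value a processor would derive from it (text times 10**scale).
    A millionfold gap between the two is the error described above."""
    soup = BeautifulSoup(inline_xbrl_html, "html.parser")
    for fact in soup.find_all(attrs={"name": "dei:EntityPublicFloat"}):
        shown = fact.get_text(strip=True)
        scale = int(fact.get("scale", 0))
        derived = float(shown.replace("$", "").replace(",", "")) * 10 ** scale
        print(f"displayed: {shown}  scale: {scale}  derived: {derived:,.0f}")
```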
The lesson hasn’t changed: structured data creates opportunity, but only when it is paired with validation, traceability, and context. Inline XBRL doesn’t solve data quality problems on its own—but it finally gives us the tools to see them.