There are a number of papers in the accounting literature that have described using directEDGAR to collect inventory write-down data, beginning with Allen, Larson and Sloan's 2012 Journal of Accounting and Economics paper. Historically, this data has been collected by running keyword searches in 10-K filings to identify potentially relevant content and then reviewing the extracted context to determine whether it should be included in the sample.
Another paper that used very similar data is Ming, Markov and Shu's 2023 Accounting Review paper, Motivational Optimism and Short-Term Investment Efficiency. I constructed a search that I think was close to theirs, (DOCTYPE contains(10k)) and inventory w/5 (write* or obsolescence or obsolete or charge* or revaluation or provision*) pre/15 ===, and applied it to the 2025 10-K archive (the === signs force the search to only return results that are followed by at least three digits). The results seemed consistent with what they reported (even though their data collection covered an earlier time period). Here is a screenshot of the search results.
Based on their paper, if I were collecting the same data I would then run a SummaryExtraction to get the context into a CSV file and then scan and code the results.
I then wondered: could we accelerate this using the XBRL databases we have created? One challenge with using the databases is that I would either have to identify the relevant row labels or try to identify the way the data should be tagged. I decided to try the tag route first. Because I could not expect filers to use a constrained set of tags, I assumed that the relevant XBRL tags had to include the word inventory plus at least one of the words from my (their) original search. Here is the search I ran with the Query tool: ((name LIKE '%charge%') OR (name LIKE '%provision%') OR (name LIKE '%obsol%') OR (name LIKE '%write%') OR (name LIKE '%reval%')) AND (name LIKE '%inventory%'). To be clear, I am requiring the tag to contain the word inventory and at least one of the other character sequences.
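The filtering logic above can be sketched against an ordinary SQLite table. This is only an illustration of the LIKE semantics, not the Query tool's actual schema: the table name (facts) and the sample tags other than the real ones shown later in this post are made up.

```python
import sqlite3

# In-memory database standing in for the XBRL tag database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT)")
conn.executemany(
    "INSERT INTO facts VALUES (?)",
    [("ftk:ProvisionForExcessAndObsoleteInventory",),
     ("duot:InventoryWriteoff",),
     ("us-gaap:InventoryNet",),          # has 'inventory' but no write-down root
     ("xyz:RestructuringCharges",)],     # has a root word but no 'inventory'
)

# Same structure as the Query tool search: at least one root word AND 'inventory'.
# SQLite's LIKE is case-insensitive for ASCII, so 'Inventory' matches '%inventory%'.
query = """
SELECT name FROM facts
WHERE ((name LIKE '%charge%') OR (name LIKE '%provision%')
    OR (name LIKE '%obsol%')  OR (name LIKE '%write%')
    OR (name LIKE '%reval%'))
  AND (name LIKE '%inventory%')
"""
matches = [row[0] for row in conn.execute(query)]
print(matches)
```

Only the first two sample tags satisfy both conditions, which mirrors how the compound filter prunes tags that mention inventory alone or a charge alone.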
Here is a screenshot of the results.
I saved the results to review them and compare to the full-text search results. There were 2,298 total observations with 212 unique tags. Not all of the results were relevant. Here is a sample of some of the tags.
aaon:InventoryValuationReservesWriteOffs
iex:CostOfGoodsAndServicesSoldExcludingInventoryStepUpCharges
iex:FairValueInventoryStepUpCharges
cgnx:ExcessAndObsoleteInventoryCharges
icui:WarrantyAndReturnReserveInventoryChargedToOtherAccounts
icui:WarrantyAndReturnReserveInventoryWriteOffs
ftk:ProvisionForExcessAndObsoleteInventory
gti:CostOfGoodsAndServicesSoldExcludingInventoryWriteDown
nvax:ProvisionForExcessAndObsoleteInventory
lctc:ProvisionForInventoryObsolescence
duot:InventoryWriteoff
ngs:InventoryAllowanceAllowanceForObsolescence
ngs:InventoryWriteOffs
ew:DisposalGroupIncludingDiscontinuedOperationInventoryWriteDown
cvgw:ProvisionForAdvancesOnInventoryPurchases
arw:InventoryReversalOfWriteDown
clne:ProvisionForDoubtfulAccountsNotesAndInventory
acls:ProvisionForExcessAndObsoleteInventory
I see a really clean workflow following from this. The first thing I would do is separate the tags, identify the irrelevant ones (e.g., cvgw:ProvisionForAdvancesOnInventoryPurchases), and filter those out. I won't go further because I imagine many of you can already see the next steps. The point is that accessing this data is significantly accelerated when the data is tagged.
The key is to be creative with how you initially filter for relevant tags. If I had to review every tag containing the word INVENTORY (there were 21,985 of them), the review would be overwhelming. But by requiring the word INVENTORY plus at least one of the word roots that describe the possibility of a write-down, I am evaluating a much smaller set of results for relevance.
Of course, the results are not likely to be exhaustive. I found some interesting omissions. Here is a screenshot from Blink Charging’s full-text search results:
This data was not in the results. I searched for it directly using the Query Tool and discovered that the tag was ProvisionForOtherLosses. But I noticed that they did use relevant text in the original row label, as you can see in the screenshot above. I think the next step is to get creative in a similar manner using the orig_row_label field, which is also fully searchable. I ran this search: (orig_row_label LIKE '%inven%') AND ((orig_row_label LIKE '%obsol%') OR (orig_row_label LIKE '%slow%') OR (orig_row_label LIKE '%write%')).
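Since the tag-name search and the row-label search share the same shape, a small helper can generate the LIKE clauses from a list of word roots so the two passes stay in sync. The helper name and this way of assembling the query string are my own sketch, not a Query tool feature.

```python
def like_filter(field: str, roots: list[str]) -> str:
    """Build an OR-joined group of LIKE clauses for one searchable field."""
    clauses = " OR ".join(f"({field} LIKE '%{root}%')" for root in roots)
    return f"({clauses})"

# Roots describing a possible write-down, per the row-label search above.
roots = ["obsol", "slow", "write"]
where = f"(orig_row_label LIKE '%inven%') AND {like_filter('orig_row_label', roots)}"
print(where)
```

Changing the root list in one place then regenerates both the tag-name filter and the row-label filter, which keeps a third round of filtering (if needed) cheap to set up.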
While there is sure to be duplication in the second run, that is easy to handle: you can filter out all of the tags that appeared in the first run.
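The de-duplication is a set-membership test against the first-pass tags. The records below are invented for illustration (the blnk tag is modeled on the Blink Charging example above).

```python
# Tags already captured by the first (tag-name) pass.
first_pass_tags = {
    "ftk:ProvisionForExcessAndObsoleteInventory",
    "duot:InventoryWriteoff",
}

# Hypothetical results of the second (row-label) pass.
second_pass = [
    {"tag": "blnk:ProvisionForOtherLosses", "orig_row_label": "Inventory write-down"},
    {"tag": "duot:InventoryWriteoff", "orig_row_label": "Inventory write-offs"},
]

# Keep only rows whose tag was not already handled in the first pass.
new_hits = [row for row in second_pass if row["tag"] not in first_pass_tags]
print([row["tag"] for row in new_hits])
```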
I am not going to assert that this will provide complete results, but I believe that if a third round of filtering is needed it should be minimal. Finally, if there is still data to be collected, we can fall back to the full-text search after filtering out the CIKs of the filers whose tagged data was useful. Since only the financial statements and notes are tagged, if the filer made the disclosure only in the MD&A without separately reporting the amount in the financials or notes, then a full-text search would be the only way to identify the value.
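That final fallback amounts to excluding, from the full-text hit list, any filer already covered by the tagged data. The CIKs and snippets below are entirely made up.

```python
# CIKs of filers whose write-down data was already captured via XBRL tags.
tagged_ciks = {"0001111111", "0002222222"}

# Hypothetical full-text search hits (filer CIK plus extracted context).
fulltext_hits = [
    {"cik": "0001111111", "context": "...inventory write-down of $3.1 million..."},
    {"cik": "0003333333", "context": "...charge for obsolete inventory..."},
]

# Only filers not already covered need manual review of the full-text context.
remaining = [hit for hit in fulltext_hits if hit["cik"] not in tagged_ciks]
print([hit["cik"] for hit in remaining])
```

This keeps the most labor-intensive step, reading extracted context, confined to the residual filers whose disclosure lives only in the MD&A.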
I want to wrap this up by adding a bit of marketing. The real benefit of our platform comes from the visibility it gives you into the filings. With that initial search I could easily move through the collection of filings to get a sense of my results; I instantly knew whether I needed to expand (or contract) the search. Then, within the same interface, I could immediately access the tagged data. The ability to filter tags without requiring adherence to the taxonomy provided significant benefits. Once I saved the results and looked back at my search results, I was ready to filter on the row labels.