AI Automation Does Not Make the World Better

Is it an age thing – I really dislike all of the ‘AI’ junk that invades my work! How about when you are trying to compose an email and your provider happily types in front of you. Today I am normalizing column heading extracted from tables that report the status of shares to be issued and available in Equity Compensation Plans (often reported in ITEM 12 in the 10-K). Step 1 is tedious because of slight but potentially meaningful differences in column headings even though the actual column headings are specified by CFR 17 SEC 229.201.

I want to map the standard column headings into TO_BE_ISSUED, AVERAGE EXERCISE PRICE and AVAILABLE. So I am carefully reading the original column headings and making the determination. I accidentally hit enter when Excel analyzed and determined that I was typing AVAILABLE and it generously populated the cells below – with nonsense no less. Here are some of the 64 Automatic Mappings that were auto-populated

ORIG_LABEL	NEW_LABEL (from Excel)
'NUMBER OF SECURITIES AVAILABLE FOR FUTURE ISSUANCE UNDER EQUITY COMPENSATION PLAN (EXCLUDING SECURITIES REFLECTED IN COLUMN) (# OF SHARES)	AVAIABLE
'WEIGHTED AVERAGE EXERCISE PRICE OF OUTSTANDING OPTIONS AND RIGHTS 1 EQUITY COMPENSATION PLANS APPROVED BY SHAREHOLDERS:	AVERAGE
'PLAN CATEGORY NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTIONS, WARRANTS AND RIGHTS	CATEGORY
'EQUITY PLAN SUMMARY COLUMN NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTIONS, WARRANTS, AND RIGHTS	COLUMN
'NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTIONS, WARRANTS AND RIGHTS AND THE VESTING OF UNVESTED RESTRICTED STOCK UNITS	EXERCISE
'EQUITY COMPENSATION PLAN INFORMATION NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTIONS WARRANTS AND RIGHTS	INFORMATION
'NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTIONS, WARRANTS AND RIGHTS OR SETTLEMENT OF RESTRICTED STOCK UNITS	ISSUED
'NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OR SETTLEMENT OF OUTSTANDING OPTIONS, WARRANTS AND RIGHTS	OF
'EQUITY COMPENSATION PLAN INFORMATION AT DECEMBER 31, 2024 WEIGHTED-AVERAGE EXERCISE PRICE OF OUTSTANDING OPTIONS, WARRANTS AND RIGHTS	PLAN
'NUMBER OF DEFINED SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTIONS, WARRANTS AND RIGHTS DECEMBER 31, 2024	SECURITIES
NUMBER OF SECURITIES THAT MAY BE ISSUED UPON EXERCISE OR VESTING OF OUTSTANDING OPTIONS, WARRANTS AND RIGHTS AT TARGET	THATMAY
'NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTIONS OR SETTLEMENT OF LTIP PSUS AND RSUS	TO
'NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTION AWARDS, RESTRICTED STOCK UNITS, AND PERFORMANCE STOCK UNITS	TO B ISS
NUMBER OF SECURITIES AVAILABLE FOR FUTURE ISSUANCE UNDER EQUITY COMPENSATION PLANS (EXCLUDING SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING SHARE OPTIONS)	UNDER
NUMBER OF SECURITIES AVAILABLE FOR FUTURE ISSUANCE UNDER EQUITY COMPENSATION PLANS (EXCLUDING SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING SHARE OPTIONS)	UPON

Those of you that are avid readers know that I can substitute their for there. I have never lost sleep over those errors – I only have one brain cell. This interference is frustrating and unproductive.

I am not sure how Excel does this – but I think it could border on industrial espionage. It might be that Excel is using a C library similar to FuzzyWuzzy and the analysis and substitution decisions are happening locally. But what if they are communicating with a thread in the cloud? There are two concerns I have about this.

First, what if I am working with confidential data – does #Microsoft see all of our data and are they able to act on it? Second, I had to correct these errors, am I serving as an uncompensated trainer for Microsoft? The initial predictions were delivered, I corrected them. Did that info get delivered back to Microsoft and is that going to feed the monster? Two years ago I would have thought that possibility was idiotic – I am not so sure now.

I published this originally about an hour ago. It took me twenty minutes to clean up the mess that was caused by the original automation. I am back on track, happily making my mappings and it happened again. I was read for it this time – When Excel stared volunteering to do my work I froze and captured it.

If you can see above – I had typed REM and Excel volunteered AINDING – and look below – if I had not caught that we would have had the ANCEMAINCURITIES – tell me this is not a problem.

What is the truth?

We are desperately trying to finish up a new comprehensive audit fee database. One of the fields is the AUDIT REPORT DATE. But then we run into issues like this one. Here is a screenshot of the audit signature following the audit report included in New Concept Energy’s 10-K for the FYE 12/31/2024. Here is a link to the filing CIK 105744 10-K.

The dissemination date of this 10-K was 3/24/2025. Approximately one year and one week after the signature date. We discovered this while stress testing our data collection strategies. We have a fairly robust regular expression to use to find these dates – but the discrepancy between the dissemination date and the audit report date warranted further research (we don’t know how we make mistakes until we find the mistakes).

So now the question is what to do about this? I wonder if this is a clue (gosh I hate giving away this research idea) but anyway, is this a clue that the “Company’s internal control over financial reporting was ~~effective~~ ineffective as of December 31, 2024.” Note, the quote is from the 10-K but the date issue, while perhaps trivial might be evidence that the company does not have effective controls. I started this paragraph wondering what we should do about this. We have a couple of choices:

Leave as is.
Leave as is and flag.
Make an educated guess and change the date to 3/17/2025.
Leave the field blank.

I think what we need to do is confirm that we correctly pulled the value that is being asserted, audit report date (in this case) – and leave the value if we have collected it correctly but flag the issue.

I teased in the middle of the prior paragraph – it is kind of interesting the number and nature of errors that we find. Could this be evidence of ineffective controls in that the company does not have enough staff to carefully review and evaluate their disclosures?

Another New Dehydrator Feature

As illustrated in the screenshot below – I updated the Dehydrator to include what I have cleverly named the hydrator-{n}-grid-data.csv.

The n represents the count of dehydration artifacts in the directory that was analyzed. This new output reports the results of dehydration on a row by column basis. To keep the focus on the data I have hidden most of the metadata fields in the file – but here is a screenshot of the datavalues:

The RID represents the index of the data rows the data was pulled from. The ROW-LABEL reports the actual content from that row in the first column from the left. The CID represents the column index where 1 represents the first column to the right of the ROW-LABEL column. The value I have blocked above (818019) can be seen in this table which was the source of the data.

While it may look like the ROW-LABEL is just JOHN M. HOLMES, the actual html tells a different story. In this case both the name and the title are marked in the html as data belonging to each of the first three rows.

The entire Dehydration/Rehydration process is designed to allow you to more easily normalize the column headings (and/or row labels) to create a more compact matrix of data when working with hundreds or thousands of snipped tables from the filings. For example, while 3,111 DEF 14A filed in 2024 and so far in 2025 used OPTION AWARDS as a column heading there were 83 other variants used (OPTION-BASED AWARDS, OPTIONS/SAR AWARDS, . . . AWARDED OPTIONS). With the column-list.csv file you are able to identify one label that is to be applied to all of these phrases that carry the same semantic meaning. This is great when you want the entire result set normalized.

If you are not looking to normalize the whole table, instead if you only want particular data values then this feature is designed to help you get faster access to the data you are looking for. In this case, if all you wanted were option like awards then you can easily sort on the relevant words for the COLUMN-LABEL and that data is quickly available in a very compact form. I had two different PhD students who described this issue (not with compensation data) over the summer so I wrote the code to address their particular problems and have finally been able to integrate it into dehydration. This does not add much overhead to the Dehydration process so after agonizing a bit about offering a switch for it I just decided to make it a default feature.

I am going into the weeds next with this. There is another very important reason for this artifact. We have been trying to identify a reliable, but auditable way to reduce the column headings and row labels that are written to the column-list.csv file. We can’t really do this until we have access to all of the column headings and row labels in a particular collection of tables. This output gives us that access. If I am analyzing every single column heading at one time then I can apply another set of rules to reduce them as long as I can also keep track of where each particular column heading was used and if I have a way to share that information with you when complete. Seeing a label in a list of labels is not quite as useful as seeing it in the actual context adjacent to the other labels and data values.

In addition to the creation of this new artifact we did some more tweaking to the parser rules to improve the header and data row separation process. The new version incorporates these additional rules.

directEDGAR

Search, Extraction & Normalization Engine

Month: November 2025

AI Automation Does Not Make the World Better

What is the truth?

Another New Dehydrator Feature