More AI Automation Does Not Make the World Better

Earlier in the week I posted about Excel auto-populating a new mapping based on the column headings found in a collection of tables that report the Equity Compensation Plan Information disclosure required of SEC filers. Today I am not frustrated, but I do want to share my experience, particularly since I know some folks are all into AI.

I am still working on that table, and I am reviewing the labels before we start discussing moving it into production. We are too small to have a human compare the original tables with the mappings, so this step has to be handled with significant care. We have 908 original column headings pulled from about 1,600 original tables. These column headings should generally map into three semantic concepts: the number of shares to be issued, the weighted-average exercise price of any options reflected in that number of shares, and the remaining number of shares available for future issuance. In about 2% of the cases four columns of data are presented, but the overwhelming majority follow the structure set out in the CFR section that mandates this disclosure.

I wanted to do a test to accomplish two goals. First, I wanted an independent review of the mappings. Since I had completed 909 rows, I thought asking Manish to review them would be worse than drudgery. Second, I wanted to evaluate whether I could have saved the approximately four hours the original mapping took. In other words, could I trust this tool for this task? I spent about an hour outlining the problem to a very popular AI tool and then submitted the list of column headings. Below is a screenshot of the original labels as parsed from the disclosures.

I then merged the output from that process with my original mappings to identify any differences. Here is a screenshot of the cases where the AI tool's mapping did not match my original mapping.
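
The post does not say which tool did the merge (the earlier post mentions Excel). As a hypothetical sketch of the same comparison in Python, assuming each mapping is available as a dictionary from column heading to concept label, the disagreements can be listed as below. The function name `find_disagreements` and the sample headings are illustrative, not taken from the actual data.

```python
from __future__ import annotations


def find_disagreements(
    mine: dict[str, str], theirs: dict[str, str]
) -> list[tuple[str, str | None, str | None]]:
    """Return (heading, my_concept, their_concept) for every heading where the mappings differ.

    A heading present in only one mapping is reported with None on the missing side,
    so dropped or extra rows show up alongside genuine label disagreements.
    """
    diffs = []
    # Union of both key sets, sorted so the report order is stable between runs.
    for heading in sorted(mine.keys() | theirs.keys()):
        a, b = mine.get(heading), theirs.get(heading)
        if a != b:
            diffs.append((heading, a, b))
    return diffs


if __name__ == "__main__":
    # Illustrative headings only; the real list has 908 entries.
    mine = {
        "NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTIONS": "TO_BE_ISSUED",
        "NUMBER OF SECURITIES REMAINING AVAILABLE FOR FUTURE ISSUANCE": "REMAINING",
    }
    ai = dict(mine)
    ai["NUMBER OF SECURITIES TO BE ISSUED UPON EXERCISE OF OUTSTANDING OPTIONS"] = "OUTSTANDING"
    for heading, mine_label, ai_label in find_disagreements(mine, ai):
        print(f"{heading}: mine={mine_label} ai={ai_label}")
```

Reporting headings that exist on only one side matters because a tool can silently drop or rewrite rows as well as relabel them.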

This is not horrible, but it is concerning. My instructions described three main concepts: TO_BE_ISSUED, AVERAGE_EXERCISE_PRICE and REMAINING. I did not recall any heading that seemed to map into a concept of OUTSTANDING, and I do not think any of the cases above are real examples where OUTSTANDING is a better choice than TO_BE_ISSUED in this context. The first two do contain the word OUTSTANDING, but if I saw that word used in this context I would argue with an editor about the word choice.

I asked the AI tool why, and its response seemed more like a justification than a valid answer. Here is an excerpt:

Step B — But the phrase also contains both a past and future state

“issued or to be issued”

This is a compound construction:

“issued” → past

“to be issued” → future

In equity-comp language, humans sometimes use:

“issued and outstanding” interchangeably (sloppily)

“issued or issuable” to refer to currently outstanding awards

The diagnostic logic (incorrectly, in hindsight) treated:

“issued OR to be issued”
as a state aggregate rather than a flow category

That aggregate was internally labeled OUTSTANDING — meaning:

“currently existing instruments, whether already issued or issuable”

Maybe an English major would draw the same conclusion; I’ll have to ask one.

Unfortunately, I did not accomplish what I set out to do, which was an evaluation of my original labels. So I took a different tack: I provided my mappings and asked my trusty intern (the AI tool) to evaluate them. According to my intern, my mappings were perfect. I doubted that; any reader of this blog knows I am an expert error maker. So I went through the mappings line by line. It was hugely tedious, but I found two errors: one where I mapped “OF SECURITIES REMAINING AVAILABLE FOR FUTURE ISSUANCE UNDER EQUITY COMPENSATION PLANS (EXCLUDING SECURITIES REFLECTED IN COLUMN )” to “TO_BE_ISSUED”, and one where I mapped a label that communicated information about the exercise price into REMAINING.
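
A cheap automated check can point at errors of exactly those two kinds before a line-by-line pass. The sketch below is hypothetical: the keyword rules in `RULES` are assumptions about typical wording in these headings, not a validated rule set, and a heading that matches no rule is flagged too so that a person looks at it. It complements a human review rather than replacing it.

```python
from __future__ import annotations

# Hypothetical keyword rules, checked in order; the first matching rule wins.
# Price headings are checked first because their wording is the most distinctive.
RULES: list[tuple[tuple[str, ...], str]] = [
    (("PRICE",), "AVERAGE_EXERCISE_PRICE"),
    (("REMAINING AVAILABLE", "FUTURE ISSUANCE"), "REMAINING"),
    (("TO BE ISSUED", "ISSUABLE"), "TO_BE_ISSUED"),
]


def expected_concept(heading: str) -> str | None:
    """Return the concept suggested by the first matching rule, or None if no rule fires."""
    text = " ".join(heading.upper().split())  # normalize case and whitespace
    for keywords, concept in RULES:
        if any(k in text for k in keywords):
            return concept
    return None


def flag_suspects(mapping: dict[str, str]) -> list[tuple[str, str, str | None]]:
    """List (heading, assigned, expected) wherever the assigned concept disagrees with the rules.

    Headings where no rule fires (expected is None) are also listed for manual review.
    """
    suspects = []
    for heading, assigned in mapping.items():
        expected = expected_concept(heading)
        if expected != assigned:
            suspects.append((heading, assigned, expected))
    return suspects
```

Applied to a mapping that contains the two mistakes described above, both rows would be flagged, while a correctly mapped "TO BE ISSUED" heading would pass.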

This is interesting to me because I am trying to figure out the value of somehow integrating AI into our client-facing tools. I understand there is no joy in doing the original mapping and then checking it. Wouldn’t it be amazing to offload that work completely? I just don’t see it yet. I think we may need to streamline (pre-map) the original labels with better cleaning, so that my 908 rows might be reduced to 300 or so. But if I have to stand by the results (in my case, the quality of the data we deliver; in yours, perhaps a novel research finding you want to publish), I am not ready to have any of the existing tools take over data cleaning. They are a huge help in other ways, but when it comes to verifying the accuracy of data, I think we have some way to go.
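
As a rough sketch of what "better cleaning" before pre-mapping might look like, the hypothetical `normalize_heading` below upper-cases each heading, drops short parenthesized markers such as `(1)` or `(A)`, replaces punctuation with spaces, and collapses whitespace, so that trivially different variants share one key. The rules are assumptions; whether they would get 908 headings down to 300 or so would have to be checked against the real list.

```python
from __future__ import annotations

import re
from collections import defaultdict


def normalize_heading(heading: str) -> str:
    """Reduce a raw column heading to a comparison key (illustrative rules only)."""
    text = heading.upper()
    # Drop footnote or column markers such as (1), (2) or (A).
    text = re.sub(r"\((?:\d+|[A-Z])\)", " ", text)
    # Replace hyphens, dollar signs and other punctuation with spaces.
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)
    # Collapse runs of whitespace to single spaces and trim the ends.
    return " ".join(text.split())


def group_headings(headings: list[str]) -> dict[str, list[str]]:
    """Group raw headings by normalized key so each key needs to be mapped only once."""
    groups: dict[str, list[str]] = defaultdict(list)
    for h in headings:
        groups[normalize_heading(h)].append(h)
    return dict(groups)
```

Keeping the raw variants inside each group preserves the trail back to the original tables, so a key's mapping can still be audited against every heading it covers.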

I will say there is probably significant value in letting the tools do the heavy lifting. If you take the time to carefully explain the nature and features of the disclosures and then submit a substantial list, you are likely to save some initial time. But I would also think carefully about how to audit the results.
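
One way to audit AI-produced mappings without rereading every row is to review a reproducible random sample from each assigned concept. This is a sketch under assumptions: `audit_sample` and its `per_concept` and `seed` parameters are hypothetical, and the right sample size depends on how much error you can tolerate.

```python
from __future__ import annotations

import random
from collections import defaultdict


def audit_sample(
    mapping: dict[str, str], per_concept: int = 25, seed: int = 0
) -> dict[str, list[str]]:
    """Draw up to `per_concept` headings from each assigned concept for manual review.

    Sampling within each concept (a stratified sample) means a rarely used label, such as
    an unexpected OUTSTANDING, always surfaces instead of being drowned out by large groups.
    """
    by_concept: dict[str, list[str]] = defaultdict(list)
    for heading, concept in mapping.items():
        by_concept[concept].append(heading)
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn later
    return {
        # Sorting before sampling makes the result independent of input order.
        concept: rng.sample(sorted(headings), min(per_concept, len(headings)))
        for concept, headings in sorted(by_concept.items())
    }
```

Any error found in a sample is a signal to widen the review for that concept rather than a number to extrapolate from.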
