One of the projects we have been working on is enhancing our audit fee data. Frankly the current presentation is lousy and not terribly useful. So we sat down and developed a plan and identified some additional fields we needed to collect to include in the audit fee data as part of improving the value of this data.
One of the fields we need to add is auditor tenure. For those of you not aware, a rule change promulgated by the PCAOB in 2017 required the auditor's tenure (auditor since) to be disclosed, and the disclosure has normally been included in the 10-K or Exhibit 13. Since the disclosure is required, it is much more standardized than the instances of auditor tenure disclosed voluntarily in the Proxy. The most common form is “have served as the Company’s auditor since YYYY”.
I wanted to collect that data myself so I could describe the process to our data team, and I set out to do so using our Search, Extraction and Normalization Platform.
First, I needed to search for the phrase “auditor since”.

As you can see, I found 4,979 instances of that phrase (my universe was all 10-K filings made since 1/1/2019). Next I needed to extract and normalize the phrase and convert any number after the phrase “auditor since” into the data value. I used the ContextNormalization feature, as you can see in this next image:

The Extraction Pattern, translated into English, means: extract the context and, if a number is found following the phrase “auditor since”, place it in the CSV file in the column labeled audit_tenure.
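For readers who want to see the idea outside our platform, here is a minimal Python sketch of the same logic. It is not the ContextNormalization feature itself. The regular expression, the function names `extract_audit_tenure` and `to_csv`, and the assumption that the number is a four-digit year directly after the phrase are all mine, for illustration only.

```python
import csv
import io
import re
from typing import Iterable, Optional

# Assumed pattern (not the platform's): a four-digit year that directly
# follows the phrase "auditor since", ignoring case and extra whitespace.
AUDITOR_SINCE = re.compile(r"auditor\s+since\s+(\d{4})\b", re.IGNORECASE)


def extract_audit_tenure(context: str) -> Optional[int]:
    """Return the year following 'auditor since', or None if none is found."""
    match = AUDITOR_SINCE.search(context)
    return int(match.group(1)) if match else None


def to_csv(contexts: Iterable[str]) -> str:
    """Write each context and its normalized year to CSV text.

    The audit_tenure column is left blank when no year was found, so those
    rows can be reviewed by hand.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["context", "audit_tenure"])
    for context in contexts:
        year = extract_audit_tenure(context)
        writer.writerow([context, "" if year is None else year])
    return buffer.getvalue()
```

A context in the standard form, such as "We have served as the Company's auditor since 2003.", yields 2003. Anything that deviates, such as "auditor since at least 1985", is left blank rather than guessed at.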
So I invested a total of maybe 5 minutes. Let's look at the results:

The context is available for review and the application normalized the context to extract the year value for our use to add to our new audit fee data.
There are some exceptions that had to be manually handled (110/4979). Let's look at those:

As you can see, these registrants deviated from the standard disclosure, so I had to review the context and just key in the year value. I am very comfortable working with our tools and in this context, so it only took me about 20 minutes to review and key those missing values.
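To put the exception count in perspective, the manual review step can be sketched as below. This is a rough illustration, not how our application does it. The helper names `split_for_review` and `exception_rate`, and the assumption that each row is a dict with a possibly empty audit_tenure value, are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_for_review(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows with a normalized year from rows needing manual keying.

    Assumes (for illustration) that a missing or empty 'audit_tenure' value
    means the normalization step found no number after the phrase.
    """
    normalized = [row for row in rows if row.get("audit_tenure")]
    exceptions = [row for row in rows if not row.get("audit_tenure")]
    return normalized, exceptions


def exception_rate(n_exceptions: int, n_total: int) -> float:
    """Share of results that had to be handled by hand."""
    return n_exceptions / n_total


# Using the counts reported above: 110 exceptions out of 4,979 results.
print(f"{exception_rate(110, 4979):.1%}")  # prints 2.2%
```

In other words, roughly 98% of the values came out of the normalization step without my touching them.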
In total I spent roughly 45 minutes to capture this data value. I spent about that much time trying to sort out how to describe the process in this blog post. (When we upload the new audit fee data presentation the tenure field will actually be labeled as SINCE).