October 2020 – directEDGAR

I received an email this morning asking a really interesting question – how can I use directEDGAR to identify the auditor and the location of the auditor’s office in 10-Ks filed before 2000? This data object is not readily available from any source that I am aware of.

Step 1 of the process was to look at some 10-K filings. I used the search (CNAME contains(CONAGRA)) and (DOCTYPE contains(10K*)) simply to review how this disclosure was made in Conagra’s 10-K. I selected Conagra because many of my students have taken internships there, so the name came to mind first. Here is what the disclosure looked like:

So our client is looking to capture the name of the auditor and the location of their office. My immediate thought was that I could search for auditors by name and for the names of the states, and require that the auditor, the state name and a date be in close proximity to one another. If the search was successful I would then extract the context. I had to do some research to identify the audit firms that existed during the period the client is collecting data for. This list is not meant to be exhaustive, but as a starting point I came up with these auditors: (anderson or andersen or ernst or kpmg or waterhouse or pricewaterhousecoopers or coopers or pwc or deloitte or bdo or mcgladrey or grant or baker or crowe). Because the audit reports of interest are dated between 1/1/1995 and 12/31/2001, I need a date search parameter: date(1/1/1995 to 12/31/2001). The magic here is that our index parser recognizes dates in US form. I also need to set the search to focus on 10-K filings as well as Exhibit 13 filings, because the audit report is sometimes included in Exhibit 13 rather than in the body of the 10-K.

Here is the search string I put together for my first stab at collecting this data:

(
date(1/1/1995 to 12/31/2001)

w/6
(anderson or <- yes, there are typos in the auditor signatures

andersen or
ernst or
kpmg or (. . . more auditor names)
)

w/6

(Alabama or
Alaska or
Arizona or (. . . more state/location names)
) <- closing parenthesis for location parameters
) <- closing parenthesis to group date/auditor/location parameters

and

(
(DOCTYPE contains(10k*))
or
(DOCTYPE contains(EX13))

)

There are basically four parts to this search: the date span, the auditor names, the state/location names and the document restrictions. The first three parts need to be grouped together because the proximity parameters apply to them. The document restrictions are there so I don’t find the same content in a consent filing.
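Since the auditor and location lists are long and will keep growing, it can help to assemble the search string from lists rather than typing it by hand. The following is only a sketch: the query syntax is the one shown above, but the function names (or_group, build_search) are made up for illustration and the lists are deliberately partial.

```python
# Partial lists only; extend them with the remaining auditor and location names.
AUDITORS = ["anderson", "andersen", "ernst", "kpmg"]
LOCATIONS = ["Alabama", "Alaska", "Arizona"]


def or_group(terms):
    """Join terms into a parenthesized OR group, e.g. (a or b or c)."""
    return "(" + " or ".join(terms) + ")"


def build_search(start, end, auditors, locations, distance=6):
    """Assemble the four parts: date span, auditors, locations, document types."""
    # Parts 1-3 are grouped so the proximity operators apply to all of them.
    proximity = (
        f"(date({start} to {end}) w/{distance} {or_group(auditors)} "
        f"w/{distance} {or_group(locations)})"
    )
    # Part 4 restricts the search to 10-K and Exhibit 13 documents.
    doc_filter = "((DOCTYPE contains(10k*)) or (DOCTYPE contains(EX13)))"
    return f"{proximity} and {doc_filter}"


query = build_search("1/1/1995", "12/31/2001", AUDITORS, LOCATIONS)
print(query)
```

Changing the distance argument makes it easy to experiment with how tightly the date, auditor and location must sit together.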

I want to keep a fairly tight context for the extraction so I set the Context option span to be 5 words (before to after). In my initial naive pass the search identified 41,567 documents. An example of the extracted context is here:

accepted accounting principles. /s/ Arthur Anderson LLP – —————————— Boise, Idaho. February 2, 1998 <PAGE> UNAUDITED RESULTS OF QUARTERLY

Context Extraction from Date/Auditor Search

Clearly I need to make some improvements. I need to identify other names for auditors. Further, some state names may be abbreviated in some filings, or the auditor may be domiciled in another country, so I need to play with adding state abbreviations, names of countries or large international cities where I expect to find results. But the hard work is done. Now we just need to experiment: identify more auditors and places, and set the right distance between the date/auditor/location parameters.

The context extraction has an identifier for the critical values (name of state and the auditor name) and it has the actual context. I deleted a lot of the columns to focus on the relevant context for this particular example.

CONTEXT	ANDERSON	IDAHO
accepted accounting principles. /s/ Arthur Anderson LLP – —————————— Boise, Idaho. February 2, 1998 <PAGE> UNAUDITED RESULTS OF QUARTERLY	1	1

The next step is to parse out Boise – but this should not be difficult in Excel. It would be even easier in Python – but it is definitely doable in Excel.
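As a rough illustration of the Python route, here is a minimal sketch that pulls the city out of a context string once the state is known from the flag column. The function name extract_city and the rule it applies (up to three capitalized words directly before ", State") are my assumptions for this example, not part of directEDGAR.

```python
import re


def extract_city(context, state):
    """Return the capitalized words directly before ', <state>', or None."""
    pattern = (
        # Up to three capitalized words, e.g. Boise or Salt Lake City.
        r"((?:[A-Z][A-Za-z.'-]*\s+){0,2}[A-Z][A-Za-z.'-]*)"
        # A comma, then the state name matched without regard to case.
        rf",\s*(?i:{re.escape(state)})\b"
    )
    match = re.search(pattern, context)
    return match.group(1) if match else None


# The context from the example above; the dashes are written as escapes.
context = (
    "accepted accounting principles. /s/ Arthur Anderson LLP \u2013 "
    + "\u2014" * 10
    + " Boise, Idaho. February 2, 1998 <PAGE> UNAUDITED RESULTS OF QUARTERLY"
)
print(extract_city(context, "IDAHO"))
```

This rule will grab too much when the auditor name runs straight into the city with nothing in between (for example "LLP Boise, Idaho"), so the results still need a review pass.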

I told the client this was one of the most interesting searches I have performed in some time. As an afterthought: I built the search in Notepad++. In my first draft I had the parentheses wrong, and using Notepad++ made it easy to keep track of the grouping.
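If you do not have an editor that highlights matching parentheses, a few lines of Python can do a similar check on a search string before you run it. This is a generic sketch, and the function name check_parentheses is hypothetical.

```python
def check_parentheses(query):
    """Return None if balanced, else the position of an unmatched parenthesis."""
    open_positions = []
    for index, char in enumerate(query):
        if char == "(":
            open_positions.append(index)
        elif char == ")":
            if not open_positions:
                return index  # closing parenthesis with no opener
            open_positions.pop()
    # Anything left on the stack was opened but never closed.
    return open_positions[0] if open_positions else None


print(check_parentheses("(DOCTYPE contains(10k*)) or (DOCTYPE contains(EX13)"))
```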

One of the critical benefits of our new platform is the ease with which we can distribute/add new filings to the search repository. Historically we have supported research projects when one of our clients has asked for a special directEDGAR repository to be created for a specific project. Of all of the filings available on EDGAR, the type that has been asked for most often is S-1 filings. When we have created these I have been reluctant to push them out to everyone because of the level of coordination it would have required with your IT support. All of the coordination issues go away with our new platform.

A client reached out to me last week and asked if we could make S-1 filings and DRS (Draft Registration Statement) filings available to them. We already had an archive of S-1 filings, so I just needed to update the archive and transfer it to our new platform. The DRS filings were ones I was not familiar with, but they look particularly interesting for research into IPOs as well as governance. A simple description of the DRS filings is that they are S-1s confidentially filed while the registrant is sorting out the IPO process. We are including letters from the registrants in response to SEC comments in the DRS folder with the DOCTYPE tag DRSLETTER. Note: the original SEC comment letters that prompt the response from the registrant have been available in the UPLOAD folder as they have been released. Since the DRS filings have only been available since Q4 2012 there are not that many of them. Both sets of filings are now available on our new platform.

If you are using our new platform, remember that your instance is loaded with your preferences and we maintain your configuration settings in the cloud. Thus you have to update the index library to have access. The process is pretty straightforward and is described in the Help under Indexing; the specific topic is labeled Index Updates.

Once you have updated the index library, the S1MASTER and DRSMASTER indexes are available for use. You can then use our amazing search as well as the full range of Extraction and Normalization tools on these filings. In the image below I ran a search to identify all DRSLETTER (DRSLTR) type documents.


Using the date search capability

Two New Filings Types Now Available