October 2022 – directEDGAR

Earlier this week I explained that we would move our entire archive of ITEM_1A to the platform and index it. That was completed, and soon after I had a message asking whether we could do the same with ITEM_7 (affectionately known as MD&A). That was completed overnight. I did change the index name. Originally I named the Risk Factors archive RISKFACTORS, but when I finished the MD&A and looked at its placement in the index list, I decided I was making this unnecessarily complicated. So the names were changed to ITEM_RISKFACTORS and ITEM_MDA.

And yes, you can use all of the existing features with these indexes. Specifically, if you want to run a CIK-DATE limited search, you need a file with the column headings CIK and MATCHDATE. There can be other columns, but those two must exist.
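A minimal sketch of building such a file with Python's csv module. The CIK values, the dates, the ISO date format and the file name in the usage note are all made up for illustration; check the documentation for the date format the platform expects.

```python
import csv
import io

# Hypothetical CIK / event-date pairs; the same CIK may appear with several dates.
rows = [
    (1234567, "2015-03-31"),
    (1234567, "2018-06-30"),
    (7654321, "2016-12-31"),
]


def build_filter_csv(pairs):
    """Return CSV text with the required CIK and MATCHDATE column headings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CIK", "MATCHDATE"])
    writer.writerows(pairs)
    return buffer.getvalue()


def save_filter_csv(path, pairs):
    """Write the filter file to disk so it can be picked with the file selector."""
    with open(path, "w", newline="") as handle:
        handle.write(build_filter_csv(pairs))


csv_text = build_filter_csv(rows)
```

Calling `save_filter_csv("cik_matchdate.csv", rows)` would write the file. Extra columns can be added after the two required ones.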

Notice in the file above that I have multiple instances of the same CIK with different dates. That is because, for my research example, I want to establish whether an MDA exists in a window around each of those unique dates and contains the search phrase(s) I will use in my search. Once the file is created and available, select the CIK/DATE Filter checkbox and hit Set CIK/Date File to activate the file selector tool.

After you select the file, remember to specify the range and the specific date you want to use. RDATE represents the dissemination date (which is often, but not always, the filed date) and CDATE represents the balance sheet date AS REPORTED in the filing header. Once you have set the parameters, hit the Okay button and enter your search terms or phrases into the search box before you hit the Perform Search button.

Expert tip – if you only want to establish that the document exists, rather than identify those with specific words/phrases, use the search operator XFIRSTWORD (or XLASTWORD if you prefer).

Remember as well that the platform will generate a list of the CIK-DATE pairs that did not match any document in the search. Either the documents did not contain the relevant search terms or, if they did, the MATCHDATE and window you specified did not line up with what is available.
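To make the window idea concrete, here is a rough sketch of the matching logic in Python. It is not the platform's code. It assumes the range is a number of days before and after MATCHDATE, and the sample documents and pairs are invented.

```python
from datetime import date, timedelta


def in_window(doc_date, match_date, days_before, days_after):
    """True when doc_date falls inside the window around match_date (inclusive)."""
    start = match_date - timedelta(days=days_before)
    end = match_date + timedelta(days=days_after)
    return start <= doc_date <= end


def split_pairs(pairs, documents, days_before, days_after, date_field="RDATE"):
    """Split (CIK, MATCHDATE) pairs into those with and without a document in the window."""
    matched, unmatched = [], []
    for cik, match_date in pairs:
        hit = any(
            doc["CIK"] == cik
            and in_window(doc[date_field], match_date, days_before, days_after)
            for doc in documents
        )
        (matched if hit else unmatched).append((cik, match_date))
    return matched, unmatched


# Invented documents that already contain the search phrase.
documents = [
    {"CIK": 1234567, "RDATE": date(2015, 3, 10), "CDATE": date(2014, 12, 31)},
    {"CIK": 7654321, "RDATE": date(2017, 3, 1), "CDATE": date(2016, 12, 31)},
]
pairs = [(1234567, date(2015, 3, 31)), (7654321, date(2016, 6, 30))]

matched, unmatched = split_pairs(pairs, documents, days_before=90, days_after=90)
```

Passing `date_field="CDATE"` would run the same check against the balance sheet date instead of the dissemination date.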

Let me get to the punchline first – and then I will add some detail. If you log into our platform today, use the Zoom button next to the Library listing tool and then scroll down almost to the bottom, you will see the new RISKFACTORS index:

This index has the collection of the ITEM 1A Risk Factors we have previously parsed from 10-K filings. While these are available from our ExtractionPreprocessed feature, and the application has a built-in Custom-Indexing feature, two things happened in the past week that caused me to move these to the application drive.

First, I had an interesting and long discussion with Professor David Lont at Otago University. David has been a directEDGAR user from almost the beginning. Honestly, some of the research goals he has shared with me in the past were critical factors that caused us to figure out how to add the ability to filter searches by CIK and unique date pairs. In our conversation last week David was not wishing for the ability to search the Risk Factors section. Instead, he was sharing a story about one of his junior colleagues' general frustration with using directEDGAR. It was interesting, and painful, to listen to these observations. When I think of hard/easy I am probably always measuring the difference between where we are and what I had to do to collect purchase price allocations, to establish the amount of goodwill recognized in acquisitions, by reading 10-Ks on microfiche. That is probably not the right measuring stick!

Okay, so I had that discussion with David (and it was great to talk to him) and then two days ago a new user at another client school asked about searching within the Risk Factors section of the 10-K. Initially, my thought was to direct him to the sections in the documentation regarding ExtractionPreprocessed and then the section on Custom Indexing. But my conversation with David had been running around in my head and I thought to myself – that is adding unnecessary work to the process. What I needed to do was to just take the time to move a copy of the Risk Factors snips to the application and create an index just for those documents. The ExtractionPreprocessed was developed when we distributed our content to you and left it to your IT experts to update the indexes etc. We knew that there was no way they could be expected to manage the more than one million files that have accumulated over time. So we set up the system for you to pull the files and added the indexing feature so you could build your own index and search the snips.

I get that this might seem convoluted (remember my starting place).

However, there is a caveat. We have some other projects that are fairly significant in scope and I just don’t have the resources right now to do everything I wish I could do with these snips. At the moment we have not injected any metadata into the snips like we normally do with our documents. Any search results will have blank fields for the CNAME, DOCTYPE, FYEND and SIC. However, the search results will have the word count as well as the CIK, RDATE, CDATE and FVAL to provide a way to match back to the original 10-K filings that the content was pulled from. I would actually like to put in the path to the original 10-K as a metadata field, with the word count from the 10-K as another field.
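Until the metadata is injected, one way to fill the blank fields yourself is to join the snip results to a list of 10-K filings on the shared identifiers. The sketch below is only an illustration: it assumes both lists are already loaded as dictionaries keyed by the column names, that CIK, RDATE and CDATE together identify a filing in your data (if they do not, add FVAL to the key), and the sample rows are invented.

```python
def attach_filing_info(snip_rows, filing_rows, fields=("CNAME", "SIC")):
    """Copy the named fields from the matching 10-K row onto each snip row."""
    by_key = {(f["CIK"], f["RDATE"], f["CDATE"]): f for f in filing_rows}
    merged = []
    for snip in snip_rows:
        filing = by_key.get((snip["CIK"], snip["RDATE"], snip["CDATE"]), {})
        row = dict(snip)
        for field in fields:
            # Leave the field blank when no matching filing is found.
            row[field] = filing.get(field, "")
        merged.append(row)
    return merged


# Invented rows standing in for a snip search result and a 10-K search result.
snips = [
    {"CIK": "1234567", "RDATE": "20150310", "CDATE": "20141231", "WORDCOUNT": 5200},
    {"CIK": "7654321", "RDATE": "20170301", "CDATE": "20161231", "WORDCOUNT": 8100},
]
filings = [
    {"CIK": "1234567", "RDATE": "20150310", "CDATE": "20141231",
     "CNAME": "EXAMPLE CO", "SIC": "3570"},
]

merged = attach_filing_info(snips, filings)
```

The second snip has no matching filing row, so its CNAME and SIC stay blank rather than raising an error.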

One immediately nice thing about these is that you can get just the text by completing a search and then hitting the DocumentExtraction TextOnly item from the Extraction item on the menu bar. Thus if you want to use the raw text as an input for some process – that is immediately available.

If I were going to parse these to identify the individual listed items, I would prefer the htm version with the formatting preserved, so I could use the tags to help identify the listed risk factors. As always, the DocumentExtraction feature will dump an original copy of the source file into the directory you specify.
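As a rough illustration of using the tags, the sketch below collects text wrapped in bold, italic or underline tags as candidate risk factor headings, using Python's standard html.parser. It assumes the headings are marked with those tags. Many filings style headings with CSS instead, so treat this as a starting point rather than a finished parser. The sample HTML is invented.

```python
from html.parser import HTMLParser

EMPHASIS_TAGS = {"b", "strong", "i", "em", "u"}


class EmphasisCollector(HTMLParser):
    """Collect runs of text that sit inside emphasis tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # how many emphasis tags are currently open
        self.current = []     # text pieces of the run being collected
        self.candidates = []  # finished runs

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace and keep non-empty runs only.
                text = " ".join("".join(self.current).split())
                if text:
                    self.candidates.append(text)
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def candidate_headings(html_text):
    parser = EmphasisCollector()
    parser.feed(html_text)
    parser.close()
    return parser.candidates


sample = (
    "<p><b>We depend on a small number of customers.</b></p>"
    "<p>Loss of any one of them could reduce revenue.</p>"
    "<p><i><b>Our debt level is high.</b></i></p>"
)
headings = candidate_headings(sample)
```

Here only the two emphasized sentences are collected and the plain paragraph is skipped. On real filings you would likely also want to filter the candidates by length or position.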

FYI – Risk factors were only required after 12/1/2005. We are capturing the ITEM 1A section as it exists, so this can be complicated for those entities that reference another document or another location in the 10-K. We are missing some, and it is part of our workflow to determine if those can be captured. Further, while entity classification has changed across time, there are entity classes that are not obligated to provide separate disclosures about risk factors. Sometimes they simply remove ITEM 1A from their 10-K; other times it is left in and they include language that indicates they are exempt.

Before I sat down to write this post I started the process of moving the MDA archive we have – there will be an MDA index soon (probably by Sunday 10/23/22). I also expect to move over the Business section as well.

Quick Update

Risk Factors Separate Index Available Now