More Errors in SEC Filings

I have been pounding on the XBRL data because we intend to make some subset of it available. One issue that is particularly important to me is improving the way we make metadata available to you to organize your search. I don’t want to get bogged down in a deep discussion about our plans at this moment. I would just like to observe that we for sure want the Document and Entity Information (DEI) at your fingertips so you can better manage your search.

I created two datasets: one has all of the DEI numeric data and the other has all of the text data. We knew some of the numeric values had errors. For example, did you know that eBay once reported that their public float was $31,354,367,947,000,000,000? Anyway, I was playing with the dataset in an SQLite browser tool and decided to test a filter with the ICFR flag set to false and the EntityFilerCategory set to LARGE ACCELERATED FILER. I expected none or just a few. Instead, the database returned 97 entities where the DEI table indicated that the entity was a LARGE ACCELERATED FILER (LAF) and the IcfrAuditorAttestationFlag was set to false. Because LAFs are required to have an audit of their internal controls, I thought this was curious. Here is an image of part of that query.
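If you want to try the same kind of filter outside of a browser tool, here is a minimal sketch in Python using the standard sqlite3 module. The table name (dei), the cik and entity_name columns, and the sample rows are all assumptions made up for illustration; only EntityFilerCategory and IcfrAuditorAttestationFlag come from the DEI data described above, and the actual dataset may store them differently.

```python
import sqlite3

# Hypothetical schema: the real DEI dataset may use different table/column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dei (
        cik INTEGER,
        entity_name TEXT,
        EntityFilerCategory TEXT,
        IcfrAuditorAttestationFlag TEXT
    )
    """
)

# Made-up rows for illustration only; these are not real filers.
rows = [
    (1, "ALPHA CO", "Large Accelerated Filer", "true"),
    (2, "BETA CO", "Large Accelerated Filer", "false"),
    (3, "GAMMA CO", "Non-accelerated Filer", "false"),
    (4, "DELTA CO", "LARGE ACCELERATED FILER", "false"),
]
conn.executemany("INSERT INTO dei VALUES (?, ?, ?, ?)", rows)


def suspicious_laf(connection):
    """Return large accelerated filers whose ICFR attestation flag is false."""
    query = """
        SELECT cik, entity_name
        FROM dei
        WHERE UPPER(EntityFilerCategory) = 'LARGE ACCELERATED FILER'
          AND LOWER(IcfrAuditorAttestationFlag) = 'false'
        ORDER BY cik
    """
    return connection.execute(query).fetchall()


flagged = suspicious_laf(conn)
for cik, name in flagged:
    print(cik, name)
```

The UPPER/LOWER calls are there because as-reported values are not guaranteed to be cased consistently. Every row the filter returns still has to be checked against the actual audit report.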

I of course am hugely curious about that result. Frankly, my first thought was that this was another unexpected issue in the source files that I would have to pore through to sort out. The code to manage this process was really tricky because of filer-specific idiosyncratic choices. Guess what – most of these were coded wrong, either by an employee of the filer making an error or by some hiccup between the filer’s form creation and their EDGAR-izing software. For example, you can see clearly in this audit report that the auditor issued an unqualified opinion on the Company’s internal control over financial reporting.

In some of the cases the auditor concluded that there were one or more material weaknesses in the internal controls, but my read of the EDGAR Filer Manual suggests to me that the flag is meant to indicate whether or not there was an audit of internal controls, not the results of the audit.

My quick scan (using directEDGAR of course) indicates that 91 of the 97 cases were miscoded by the filers. My present thought is that we need to push this data out as-reported. I am going to muck around with it a bit more before we push it into the platform. Once again, because of the updates, once it is available it will be visible in the Database to Query area of the database tool.

I will observe that DraftKings is the result of a SPAC deal, and I think that, based on the timing, etc., of their acquisition of the old DraftKings business, they were exempt from the requirement to have an audit of internal controls for their 2021 financials.

I hope you have a Happy New Year.

Back to Basics

While we spend a significant amount of time trying to develop new features, the truth of the matter is that directEDGAR is still my preferred way to review filings, especially when I have to hand collect something. I am sharing this because in the last three weeks several users have asked an unrelated question, and in the subsequent conversation it turned out that they were also visiting EDGAR to review some filings for data. My sense from these conversations was that they thought the effort of going to EDGAR was less than firing up directEDGAR because their N was relatively small. My rule of thumb has generally been that if I need to look at more than 5 filings of one type I save more time by using directEDGAR versus visiting EDGAR – and frankly with the new front-end on EDGAR I am starting to think my N should be 2.

I have 37 proxy statements filed in 2022 from 18 different filers that I need to review this morning. The short story is I need to establish why we have multiple proxies for these particular filers and how we should code the DC/EC data that was pulled from these filings. I hope this does not sound too arrogant, but I feel like I am pretty adept at navigating EDGAR efficiently. So when I assert that using directEDGAR is much faster than using the EDGAR interface, I am already factoring in my experience.

If I use the EDGAR interface to review these proxies I have to start by keying/pasting in the CIK and remembering to hit the TAB key (rather than enter argh!).

I land on the new landing page and because I need to scan each of the two (or in one case three) proxies I need to hit the Classic version link to load the no-frills listing of filings in date order. From there I need to key in DEF 14A to load the proxies.

Next, I need to click on each filing link to open the landing page for that filing, and then I need to click on the link to the actual filing before I can review it and draw the conclusion I need to draw. Even describing the process is tedious. Type in CIK, hit Classic version link, type DEF 14A, select each relevant link to open and then hit the actual proxy filing link before I can review the filing.

Using directEDGAR has a slightly higher start-up cost because I have to log-in and wait for my session to load but that start-up cost becomes meaningless when we consider the need to look at multiple filings from multiple filers. Here is a video I made that demonstrates the way I would more efficiently access the same collection of documents.

Using directEDGAR to access the same set of filings – it is just more efficient.

While I grant you that you have to log in, that time cost is quickly mitigated by the fact that all of the documents you need to review are available. Remember – the SummaryExtraction tool creates a list of the documents with all of the required metadata in the same order the documents are displayed in the Search Results frame, so you have a ready tool for making the notes you need from your review.

Frankly, with the changes to EDGAR over the last year I try to avoid it even more. I will go to EDGAR to look at one or two filings. Maybe a year ago I might have gone to look at ten or so. With the latest interface I get frustrated when the CIK will not come up in the search box the first time or I can’t get a name match if I am using names.

Historical CIK Mapping

I received a really nice email from Professor Richard Price (Richard is an accounting professor at OU) after yesterday’s post. One of the things we discussed in the exchange is the pain associated with entity/security changes etc. Richard brought up a great example: Disney has had 3 CIKs (their current is 1744489, their immediate past 1001039 and their oldest 29082). We added a feature some time ago that automatically maps whatever CIK you bring to the platform to these others if you specify that you want to include historic CIKs. We did this because some of the tools you use only keep the most current CIK, even if we can reasonably conclude that the new entity is the successor issuer. When Disney transitioned from CIK 1001039 to 1744489 they filed an 8-K12B that plainly states “This Current Report on Form 8-K is being filed for the purpose of establishing Disney as the successor issuer to Old Disney pursuant to Rule 12g-3(a)”. As a refresher on this issue please see the news about our 4.0.5 release (4.0.5 Release Announcement). One of the enhancements in the latest version is that you no longer have to manually update the mapping file – that file now gets updated automatically when you start the application.

Anyway, Richard made an interesting observation about the need to link the CUSIP-CIK mapping to the historical CIK mapping that I understood but am not sure how to implement. Therefore I decided to make the historical CIK mapping more available. There is another new database available on the platform: CIK_MAPPING. This has the content of the json file we use internally to map CIKs. Richard did not say this directly but I inferred from his comments that this will be useful when CUSIPs and CIKs change. At the present time we are not going to try to modify the CUSIP-CIK file to reflect any information in the CIK_MAPPING file. Both are relatively small and by making those available we are giving you the information you need to review and make the choices that are appropriate for your research.
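To make the idea of including historic CIKs concrete, here is a minimal sketch in Python. The JSON layout below (a list of groups, each with a ciks list) is purely an assumption for illustration and is not the actual structure of our internal file or of the CIK_MAPPING database. The Disney CIKs are the ones mentioned above, and 9999999 is a made-up CIK that stands in for a filer with no mapping entry.

```python
import json

# Hypothetical layout: each entry groups the CIKs that belong to one successor chain.
MAPPING_JSON = """
[
  {"name": "DISNEY", "ciks": [1744489, 1001039, 29082]}
]
"""


def build_lookup(mapping_text):
    """Map every CIK in a group to the full list of CIKs in that group."""
    lookup = {}
    for group in json.loads(mapping_text):
        ciks = group["ciks"]
        for cik in ciks:
            lookup[cik] = ciks
    return lookup


def expand_ciks(ciks, lookup):
    """Add the related historic/current CIKs, keeping order and dropping repeats."""
    expanded = []
    seen = set()
    for cik in ciks:
        # A CIK with no mapping entry simply maps to itself.
        for related in lookup.get(cik, [cik]):
            if related not in seen:
                seen.add(related)
                expanded.append(related)
    return expanded


lookup = build_lookup(MAPPING_JSON)
# 9999999 is a made-up CIK with no mapping entry.
print(expand_ciks([1001039, 9999999], lookup))
```

Bringing any one of the three Disney CIKs returns all three, which is the behavior you want when a data source only keeps the most current identifier.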

I hope this is a clear-headed example. Suppose you come to our platform with a list of CUSIPs and included in the list is 38259P508. This is the CUSIP assigned to GOOGLE CLASS A. You are sensitive to the fact that this is complicated, so you try to match on the first six characters in our database, and you find that there is an entry for 38259P706 (GOOGLE CLASS C) and the mapped CIK is 1288776.

You then run some searches using that CIK and you notice that some observations are missing. So you take that CIK, look it up in the CIK_MAPPING database, and discover that it exists and has a related CIK, 1652044.

Well, now this gets interesting – do we have all of the data we need from our original data source (where the CUSIP was the identifier)? We can check because we can use the CIK we just found to see if there is a matching CUSIP.

As you can see there is data, the CUSIP in the image above reflects the CUSIP assigned to ALPHABET INC CLASS C CAPITAL STOCK. So this provides a mechanism/process by which you can be more comfortable about establishing the time series.
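The walk described above (six-character CUSIP match, then the related CIK, then back to the CUSIP file) can be sketched in Python with sqlite3. The table and column names here are hypothetical and are not the platform's actual schema. I have written the Class C entry as the nine-character 38259P706, which is my assumption about the intended value, and the Alphabet CUSIP is a placeholder because the real value only appears in the image.

```python
import sqlite3

# Placeholder only: the actual Alphabet Class C CUSIP is shown in the image, not here.
ALPHABET_C_CUSIP = "PLACEHLDR"

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for the CUSIP-CIK file and CIK_MAPPING.
conn.executescript(
    """
    CREATE TABLE cusip_cik (cusip TEXT, description TEXT, cik INTEGER);
    CREATE TABLE cik_mapping (cik INTEGER, related_cik INTEGER);
    """
)
conn.executemany(
    "INSERT INTO cusip_cik VALUES (?, ?, ?)",
    [
        ("38259P706", "GOOGLE CLASS C", 1288776),
        (ALPHABET_C_CUSIP, "ALPHABET INC CLASS C CAPITAL STOCK", 1652044),
    ],
)
conn.execute("INSERT INTO cik_mapping VALUES (1288776, 1652044)")


def ciks_for_cusip(connection, cusip):
    """Match on the first six characters (the issuer portion) of the CUSIP."""
    rows = connection.execute(
        "SELECT DISTINCT cik FROM cusip_cik WHERE substr(cusip, 1, 6) = ?",
        (cusip[:6],),
    ).fetchall()
    return [row[0] for row in rows]


def related_ciks(connection, cik):
    """Look up related CIKs in either direction of the mapping."""
    rows = connection.execute(
        """
        SELECT related_cik FROM cik_mapping WHERE cik = ?
        UNION
        SELECT cik FROM cik_mapping WHERE related_cik = ?
        """,
        (cik, cik),
    ).fetchall()
    return sorted(row[0] for row in rows)


def cusips_for_cik(connection, cik):
    """Check whether the CUSIP file has any entry for this CIK."""
    return connection.execute(
        "SELECT cusip, description FROM cusip_cik WHERE cik = ? ORDER BY cusip",
        (cik,),
    ).fetchall()


first_ciks = ciks_for_cusip(conn, "38259P508")  # start from GOOGLE CLASS A
successors = related_ciks(conn, first_ciks[0])
print(first_ciks, successors, cusips_for_cik(conn, successors[0]))
```

Because the mapping is checked in both directions, the same functions work whether you start from the older CIK or the newer one.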

I of course understand that you don’t want to work with these one CUSIP/CIK at a time and that is why they are available through the Query tool on the platform. Once again, if you select the database and hit the Execute button without specifying any parameters you can get the entire database available. Use the Save Results button to save as a csv file.

I will observe that these observations are tough to identify. I added a few more today while I was working on this post, and it took three hours to identify the two (really four) new entries. This is not work that can be delegated. Matching CUSIPs to CIKs can be delegated, but it is still tedious and it has to be fit in around our primary work.

Notice that I am not that worried about security type, since our focus is on getting access to SEC identifiers from CUSIPs. I know we have cases where the nine-digit CUSIP in our database relates to a derivative security (Call/Put) but the six-digit prefix maps to the entity. That is what I ultimately care about. It is not perfect but I hope it helps your research.