For those of you here to explore the capabilities of directEDGAR – we have created a Knowledge Base. To access the Knowledge Base – either use the menu icon above or use this link (Knowledge Base). We will be continuing to fill it out.
Almost a month ago I described the release of two new databases as well as the process we were following to move our extensive collection of existing TAXRECON tables into a new format. If you have communicated with me at all I might have told you “I don’t know what I don’t know!” This is proving true once again. I had hoped that, starting from the tables we had pulled from the html filings, the existing row labels and data values would let us identify the XBRL tags and labels with enough certainty to completely reorganize our existing data into the same format as the new database. That is proving to be only partially possible. Manish has been beating his head against the monitor for the last month, and last night we had to decide to scale back our expectations.
To sum it up, when we start with the html table and use the row labels as reported in the html to find the tags used in the XBRL, we simply cannot make matches for all row labels. This is because the label used in the XBRL can differ from the label used in the html. This is true for roughly 20% of the tables. In the tables where we can’t match everything, the number of unmatched row labels ranges from one to five. We can’t match either because the XBRL label is so different from what is in the html or because the label was simply not included in the associated label file. We know this because for some cases (not anywhere near all) we have opened the label file and read each individual label after doing a number of searches to find variations of the expected language. We have tried a number of ways to finesse this problem, including using the data value we found in the html table to find the value in the instance document (the .xml file) and work back through a trail to the tag. This has not worked, for reasons that range from the data value being missing from the xml file to cases where there is no rational way to link the two because of what I am just going to call shoddy construction of the XBRL.
The decision we came to is to just move forward – if we can’t find the label (and thus the tag) then we will only report the label that was in the html and not the tag (the tag field for those particular rows will be blank).
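For readers who want to picture that fallback, here is a minimal sketch. It is not our production code: the function names, the normalization rule, and the sample label and tag are all made up for illustration. It looks up each html row label among the labels found in the XBRL label file and leaves the tag blank when nothing matches.

```python
def normalize(label):
    """Lower-case the label and replace punctuation with spaces so that
    trivial differences (case, commas, extra blanks) do not block a match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in label.lower())
    return " ".join(cleaned.split())


def attach_tags(html_row_labels, xbrl_label_to_tag):
    """Return (html_label, tag) pairs; tag is '' when no XBRL label matches.

    html_row_labels   -- row labels as they appear in the html table
    xbrl_label_to_tag -- hypothetical dict built from the label file
    """
    lookup = {normalize(lbl): tag for lbl, tag in xbrl_label_to_tag.items()}
    return [(lbl, lookup.get(normalize(lbl), "")) for lbl in html_row_labels]


if __name__ == "__main__":
    # Illustrative values only.
    labels = ["State and local income taxes, net", "Other, net"]
    label_file = {"State and Local Income Taxes, Net": "ExampleStateTaxTag"}
    for row in attach_tags(labels, label_file):
        print(row)
```

The first row picks up its tag; the second keeps its html label with an empty tag field, which is the behavior described above.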
With respect to the database we have for the 8-K META (8KMETA) – I have delayed updating it because I wanted to make two changes. First, I wanted to change the dE_PATH field value from the directory that holds all of the files that were parsed from each 8-K to the path for the actual 8-K (or 8-K/A). I wanted to do this so that the path can be used in a search with the DB Document search filter. Second, I wanted to add the path to the 8-K (or 8-K/A) on EDGAR. This is being added because some of our clients have told me that there are cases where they collect data from one platform and then need additional values or some validation of the data, and the SEC hyperlink is useful in those cases. These fields are being added on our testing platform right now. The existing database will be switched out over the weekend and updated through Friday. Another piece of good news for us is that we have automated the update process, as we have done with the DC data, so this will stay current.
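If you want to build an SEC link yourself in the meantime, the sketch below shows one way. It assumes you have the filer's CIK and the filing's accession number and relies on the usual EDGAR Archives folder layout. The function name and the sample values are hypothetical, and the exact form of the path we will store in the new field may differ.

```python
def edgar_index_url(cik, accession):
    """Build the EDGAR filing index URL from a CIK and a dashed accession number.

    Assumes the common Archives layout: the folder name is the accession
    number with the dashes removed, and the index page keeps the dashes.
    """
    folder = accession.replace("-", "")
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{folder}/{accession}-index.htm"
    )


if __name__ == "__main__":
    # Made-up CIK and accession number, for illustration only.
    print(edgar_index_url("0001234567", "0001234567-19-000042"))
```

Passing the CIK through int() drops any leading zeros, which is how the CIK appears in Archives paths.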
I wanted to have some image here on this post so I decided to see how long it would take me to identify all 8-K (and 8-K/A) filings that were filed to report on auditor changes (Item 4.01 – Change in Registrant’s Certifying Accountant). It took less than five seconds.

Don’t anchor on that number because the database is still under construction. If my math is right, there are still another 1,135 to be added, as well as any that have been filed in the last couple of days. (A quick scan of the EDGAR latest filings page suggests another six have been filed recently.) Remember, if you hit the Save Results button above, all of the data from that query will be dumped to a CSV file for you to filter and research to your heart’s content.
Be careful when saving the file after you have opened it, though. In some cases Excel converts the values in the ACCEPTANCE_TIME field to a rounded exponential value, and if you save the file after that conversion you will lose some precision.
The easiest way I know to manage that is to open the csv file, right-click on the Excel column label to select the column and open the context menu, and then select Format Cells . . .
Select the Number format and change the Decimal places to 0.
Once you have done that the number will fully display.
If you continue to save the file as a csv file you will have to work that magic each time you open it. If you save it as an XLSX file the format will be preserved as illustrated in the next (and final) image!
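If you work with the CSV outside of Excel, you can sidestep the issue by never letting the value become a number. The sketch below uses only Python's standard csv module, which reads every field as text. I am assuming here that ACCEPTANCE_TIME is a 14-digit timestamp; the CIK column, the sample values, and the helper names are made up for illustration.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into dicts; csv.DictReader leaves every field as a string."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def looks_rounded(value):
    """True if a value has already been rewritten in exponential form,
    for example '2.01904E+13', which means digits were lost on a prior save."""
    return "E+" in value.upper()


if __name__ == "__main__":
    # Hypothetical sample: one intact value and one damaged by a prior save.
    sample = (
        "CIK,ACCEPTANCE_TIME\n"
        "1234567,20190412160523\n"
        "7654321,2.01904E+13\n"
    )
    for row in read_rows(sample):
        status = "rounded" if looks_rounded(row["ACCEPTANCE_TIME"]) else "intact"
        print(row["CIK"], row["ACCEPTANCE_TIME"], status)
```

A value that has already been saved in exponential form cannot be recovered from the file. You would need to export the results again.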



