Our data team lead by Manish Pokhrel completed addressing the 1,300 some-odd issues we identified in the 2005 – 2020 auditor data and completed the initial processing of 10-K filings from 1998 – 2004. These updates have been integrated into the files available on APPSTREAM in the Extras folder.
I did make a significant change to three of the fields in the data set. In the original sqlite3.db file the dissdate, balance_sheet_date and report_date were entered as integers in the form YYYYMMDD. I changed the form to text with the format “YYYY-MM-DD”. I did this to allow you to more directly create queries using the field as a data object. SQLITE3 allows you to operate on text fields that are formatted as dates in some neat ways. The following statement selects all observations where the balance sheet date ranges from 12/25/2019 to 1/7/2020. This should address those cases where the FYE is generally around 12/31 but the registrant operates on a 52/53 week year.
SELECT * from auditdetails where date(balance_sheet_date) > date("2019-12-25") AND date(balance_sheet_date) < date("2020-01-07")
I want to observe that we are not yet finished with this work and we make no claims as to its perfection (yet) – but I would rather make this available now. If we wait until it is perfect – well it might take another six months. When we do this kind of work it is secondary to our primary responsibilities. However, we now have over 148,000 auditor 10-K filing pairs. Once this is done we can finished we can get a better measure of SINCE for all of the auditors.
THIS IS BIG: You will find another feature in the Extras folder. The SEC did not systematically add an ACCEPTANCE-TIME stamp to all filings until near the end of Q2 in 2002. For example, of over 9,900 8-K filings made in Q4 2001 only 17 had a ACCEPTANCE-TIME stamp in the header.
We have had several requests for systematic access to the ACCEPTANCE-TIME for 8-K filings in particular. I started looking to sort out a way to get a proxy for this measure. After some testing I determined that the time stamp in the live feed to EDGAR subscribers is the best proxy we could generate. Therefore we have scanned the feed archives and parsed out the transmission timestamp for all filings. We have created a database of this information for the 8-K filings to respond to these requests. At the very end of this post is a block that describes the fields in the database.
The database has as primary key the CIK_ACCESSION-NUMBER for the filing. We are including the path to the folder where the filing is stored on directEDGAR, the item codes associated with the filing as well as other fields listed below. While the item codes have changed over time we have always standardized the item codes to the current scheme. For a listing of the item codes and their meaning please review the current Form 8-K. For example an old ITEM 4 is now 4.01. ITEM 7 is now 9.01. Thus, no matter what time period you are searching, if you want ITEM 4 you need to use the current 4.01.
This file can be useful in a number of ways. If for example, you just need 8-K event dates and reason codes – you can modify the query below by deleting ITEM_4_01 = ‘YES’ and the first ‘and’. (The modified query is in the Python file in the folder as well as a query to pull all observations)
Here is an example query to that reads in a file that has a list of CIKs and some other data. The CIKs are then used to identify those in that group that filed an 8-K for an ITEM 4.01 (Changes in Registrant’s Certifying Accountant). I also conditioned the results on meeting a date window and having a value for ACCEPTANCE_TIME
cur.execute("""SELECT * FROM eightKAcceptance
WHERE ITEM_4_01 = 'YES'
and upper(ACCEPTANCE_TIME) != 'NOTAVAIL'
and date(SEC_FILING_DATE) > date('2007-05-01')
and date(SEC_FILING_DATE) < date('2011-07-05')
and CIK in {}""".format(cik_tuple))
The folder will have the CIK input file, the code I used to create the query as well as the output file. Please help us maintain the value of our offering by not sharing these files with anyone. If you are working with colleagues at other universities you can merge only the data you need for your study with their data after you have gotten their agreement that this unique data is available only to be used in the study in which you are participating. Sorry to be such a downer but – well it is a challenge at times.
Note – we are still tinkering with this data – we currently have filing ACCEPTANCE-TIME through 12/17/2021. I think we will probably update again in the first week in January 2022 and then plan on updating at the end of each quarter. However, since we add ACCEPTANCE-TIME to the 8-K filings now as metadata and we keep those updated weekly you should always be able to pull the full history.
As a reminder – please see this video for a description of how to save the zip file/files to your local computer (Accessing Extras Folder). Below is a description of the fields in the database.
"""The db has one table named eightKfilings when the table was created the following fields were included
CIK_ACCESSION PRIMARY key, # The issuer CIK concatenated to the filing
# ACCESSION number by an underscore
dE_PATH TEXT, # This is the path to the folder that the
# documents from this filing would be
# found if returned from a search on the
# platform
CIK TEXT, # The Central Index key of the filer - note
# this is a text value
# but it is not left padded
SIC TEXT, # If reported in the header file of the
# filing - this value is the Standard
# Industrial Code as reported in the header
FYE TEXT, # If reported in the header file of the
# filing - this value is the MD (1231)
# of the end of the most recent fiscal year
CONAME TEXT, # This is the name of the filer associated
# with the filing
# as of the filing date
ACCESSION TEXT, # This is the accession number associated
# with the filing
SEC_FILING_DATE TEXT, # The SEC reported filing date from the
# header file
ACCEPTANCE_TIME TEXT, # If the header file included the
# <ACCEPTANCE-DATETIME> tag - the value
# associated with that tag
# If the header file did not include the tag
# this value was pulled from
# the archive of the EDGAR dissemination
# feed.
ITEM_1_01 TEXT, # For all of the following - the header file
ITEM_1_02 TEXT, # was inspected for the list
ITEM_1_03 TEXT, # of ITEMS - Any ITEM indicated in the
ITEM_1_04 TEXT, # header file is assigned the value YES
ITEM_2_01 TEXT, # If the ITEM code was not listed in the
ITEM_2_02 TEXT, # header file the value is left blank
ITEM_2_03 TEXT, # The SEC expanded the list of reportable
ITEM_2_04 TEXT, # events and changed the numbering
ITEM_2_05 TEXT, # of events with a new rule effective
ITEM_2_06 TEXT, # 8/23/2004
ITEM_3_01 TEXT, # We mapped the ITEM codes from filings made
ITEM_3_02 TEXT, # prior to 8/23/2004 to the
ITEM_3_03 TEXT, # event codes required after 8/23/2004
ITEM_4_01 TEXT, # For instance - prior to 8/23/2004 ITEM 12
ITEM_4_02 TEXT, # was used to identify 8-K filings
ITEM_5_01 TEXT, # that reported on the Results of Operations
ITEM_5_02 TEXT, # and Financial Condition. After 8/23/2004
ITEM_5_02 TEXT, # the new ITEM code became 2.02.
ITEM_5_03 TEXT, # Therefore we recoded any 8-K that was
ITEM_5_04 TEXT, # filed prior to 8/23/2004 with
ITEM_5_05 TEXT, # with an ITEM code of 12 to 2.02.
ITEM_5_06 TEXT, # For a current list of the ITEM codes
ITEM_5_07 TEXT, # please review
ITEM_5_08 TEXT, # https://www.sec.gov/files/form8-k.pdf
ITEM_6_01 TEXT, # USE the current ITEM code in any queries
ITEM_6_02 TEXT, # or searches on the platform.
ITEM_6_03 TEXT,
ITEM_6_04 TEXT,
ITEM_6_05 TEXT,
ITEM_6_06 TEXT,
ITEM_6_1 TEXT,
ITEM_7_01 TEXT,
ITEM_8_01 TEXT,
ITEM_9_01 TEXT
There is a lot of useful information/examples about querying sqlite3 databases available from
https://www.sqlitetutorial.net/
"""