10-K Header Details with Enhanced Coverage of ACCEPTANCE-DATETIME Now Available on Platform

Our previous work to establish a proxy for the timestamp for 8-K filings prompted one of our clients to ask whether it was possible to do the same for 10-K filings. We used the same strategy and were able to add a good proxy for this field to about 97,000 10-K filings where it was not available in the header. With this addition we have 318,304 10-Ks with an ACCEPTANCE-DATETIME value and 7,610 missing a value. While working on this it seemed reasonable to collect as much metadata from the filings as possible. For example, I saw a recent paper that reported that most of the other databases we use for research have only the most recent address. They described having to go to EDGAR to collect this field. There does not seem like a lot of value for you to have to jump through that hoop. Therefore we added most of the fields included in the header or on the INDEX page of the filings.

There is a new compressed folder in the EXTRAS (see here how to access) folder that has a copy of the sqlite3 database (and some code etc). In addition to the information that was reported in the header files we added the DE_PATH – this is not the path to the 10-K but the path to the folder where the 10-K is stored. This will allow you to match search results from directEDGAR to a query from the database with just a little Excel magic on the FILENAME column from a Summary or Context extraction from directEDGAR.

I really thought this was going to take two days once I had sorted out this goal. We had already pulled the ACCEPTANCE-DATETIME measure for all of the filings. We have the headers archived. I thought it was going to be a trivial mapping exercise. It was much more complicated because there are some variations in the fields included in the headers (I learned that there is a field called CONFIRMING-COPY) and then there was a special challenge with the addresses.

As I was looking at the data I was questioning the value of using the codes that the SEC requires filers to use for the STATE field in the BUSINESS-ADDRESS and the MAILING-ADDRESS sections of the header. Who wants to pull this data and then go find the meaning of the value M2 (Jordan) to organize the results for your model? So I decided that we needed to map the values the filers use when preparing their filings to their descriptive values. This was interesting. The only source I could find for the standardization of these fields from the SEC was limited. First, Lexis-Nexis managed the dissemination platform early and I found a page that listed the codes that were to be used (Archived Country Codes). Notice that the list of codes is incomplete – we had to find cases where the code was not listed on the page but included in the header, read the header and assign the code value based on the information we could glean from the filing (and of course Professor Google). The SEC made some changes so that beginning in February 2010 these code were to be used (New State/Country Codes). There is no specific date for the changeover and no easy way to tell so we mapped to 2/1/2010 to the old and subsequent filings were mapped to the new. While there is consistency for the US states the codes for most other jurisdictions changed. For example, West Germany was I8. Under the new regime Germany was assigned the value 2M.

While my primary reason to assemble this is to facilitate your research, a second and still important reason is to make stuff available that might be interesting to muck around with in class or to use for student projects. Therefore I thought it was important to do the STATE field transformation. Note, I did this transformation to both the BA_STATE (Business Address) and the MA_STATE (Mailing Address) fields.

I want to close this by observing we simply organized the data as reported. There is operator (filer) error. I was showing Manish the finished product and thought it would be fun to identify all filers from FINLAND. There were only three observations. However, one of those is an error. China Natural Resources (2000 10-K Link) entered H9 in the STATE field. In prior filings they used K3 (Hong Kong). This of-course brings up the question – Should we correct these? Maybe later, that would be a significantly different project.

I included an html file that has the country codes (both old and new) in the folder. The queryDB.py file includes the list of fields and there are two csv files. One is the input file used in the query that restricts the results to a specific list of CIKs and the other is the output from that query. I have not yet had a chance to play with Pandas so my code still reads the input and writes the output using the CSV module. I thought about modifying Antonis’s code but since I can’t test it – I will leave it to those of you who are used to the Pandas library.

As a final note the data is current as of 12/22/2021. We will add this to our monthly or quarterly update flow once we get through the transition to 2022 in our search platform. I considered waiting until after 12/31 but there are too many tasks that need to be addressed with the beginning of a new calendar year. Thanks for our patience and I hope you find this useful.

Easier Access to 8-K Details Using Pandas

I received a message from Antonis Kartapanis. Antonis is an accounting faculty member at TAMU who uses directEDGAR for some of his research. He wanted to identify which of a set of filers in his sample made 8-K filings based on specific reason codes. He started off with the intention of running a search using our platform but then he wondered if it would not just be easier to pull what he needed from the database we just made available.

Antonis downloaded the database and after scanning the code wondered if it could be simplified by using Pandas rather than the CSV module. While I am aware of Pandas I have not used it as I just haven’t taken the time to test whether it would benefit us in production. Loading a set of CIKs from a file and then writing the results back is much more direct with Pandas as it handles the CSV file interactions ‘under the hood’. Since I did not want to take the time to start poking at Pandas I asked Antonis if he would share the code. He was kind enough to do so. I have pasted it below. Much simpler – it looks like at some point I need to invest in learning Pandas.

# Import relevant libraries
import pandas as pd
import sqlite3
# Connect to the database
conn = sqlite3.connect(r'C:\Path\To\SQLITE\8Kmeta.db')

#The following reads in a CSV file that has a column of CIKs and just one other data column - we want the CIKs

# To use the CIKs in the query we need a tuple object

input_df = pd.read_csv(r'C:\Path\to\CIK_FILE\data_CIK_test.csv')

cik_tuple = tuple(input_df['CIK'].unique()) # No need to list a given CIK twice, so let's keep one instance of each

# SQL query to execute
sql_query = """SELECT *  FROM eightKAcceptance
               WHERE  ITEM_4_01 = 'YES'
               and upper(ACCEPTANCE_TIME) != 'NOTAVAIL'
               and  date(SEC_FILING_DATE) > date('2007-05-01')
               and  date(SEC_FILING_DATE) < date('2011-07-05')  
               and CIK in {}""".format(cik_tuple)

# Execute query and close connection
output_df = pd.read_sql_query(sql=sql_query, con=conn)
conn.close()

# Output Data
output_df.to_csv(r'C:\Path\To\Output\test_8K_new.csv', index = False)

Auditor Database Update and New Database with over 1.8 million ACCEPTANCE-TIME Values for 8-K Filings!

Our data team lead by Manish Pokhrel completed addressing the 1,300 some-odd issues we identified in the 2005 – 2020 auditor data and completed the initial processing of 10-K filings from 1998 – 2004. These updates have been integrated into the files available on APPSTREAM in the Extras folder.

I did make a significant change to three of the fields in the data set. In the original sqlite3.db file the dissdate, balance_sheet_date and report_date were entered as integers in the form YYYYMMDD. I changed the form to text with the format “YYYY-MM-DD”. I did this to allow you to more directly create queries using the field as a data object. SQLITE3 allows you to operate on text fields that are formatted as dates in some neat ways. The following statement selects all observations where the balance sheet date ranges from 12/25/2019 to 1/7/2020. This should address those cases where the FYE is generally around 12/31 but the registrant operates on a 52/53 week year.

SELECT * from auditdetails where date(balance_sheet_date) > date("2019-12-25") AND date(balance_sheet_date) < date("2020-01-07")

I want to observe that we are not yet finished with this work and we make no claims as to its perfection (yet) – but I would rather make this available now. If we wait until it is perfect – well it might take another six months. When we do this kind of work it is secondary to our primary responsibilities. However, we now have over 148,000 auditor 10-K filing pairs. Once this is done we can finished we can get a better measure of SINCE for all of the auditors.

THIS IS BIG: You will find another feature in the Extras folder. The SEC did not systematically add an ACCEPTANCE-TIME stamp to all filings until near the end of Q2 in 2002. For example, of over 9,900 8-K filings made in Q4 2001 only 17 had a ACCEPTANCE-TIME stamp in the header.

We have had several requests for systematic access to the ACCEPTANCE-TIME for 8-K filings in particular. I started looking to sort out a way to get a proxy for this measure. After some testing I determined that the time stamp in the live feed to EDGAR subscribers is the best proxy we could generate. Therefore we have scanned the feed archives and parsed out the transmission timestamp for all filings. We have created a database of this information for the 8-K filings to respond to these requests. At the very end of this post is a block that describes the fields in the database.

The database has as primary key the CIK_ACCESSION-NUMBER for the filing. We are including the path to the folder where the filing is stored on directEDGAR, the item codes associated with the filing as well as other fields listed below. While the item codes have changed over time we have always standardized the item codes to the current scheme. For a listing of the item codes and their meaning please review the current Form 8-K. For example an old ITEM 4 is now 4.01. ITEM 7 is now 9.01. Thus, no matter what time period you are searching, if you want ITEM 4 you need to use the current 4.01.

This file can be useful in a number of ways. If for example, you just need 8-K event dates and reason codes – you can modify the query below by deleting ITEM_4_01 = ‘YES’ and the first ‘and’. (The modified query is in the Python file in the folder as well as a query to pull all observations)

Here is an example query to that reads in a file that has a list of CIKs and some other data. The CIKs are then used to identify those in that group that filed an 8-K for an ITEM 4.01 (Changes in Registrant’s Certifying Accountant). I also conditioned the results on meeting a date window and having a value for ACCEPTANCE_TIME

cur.execute("""SELECT *  FROM eightKAcceptance 
               WHERE  ITEM_4_01 = 'YES' 
               and upper(ACCEPTANCE_TIME) != 'NOTAVAIL'
               and  date(SEC_FILING_DATE) > date('2007-05-01') 
               and  date(SEC_FILING_DATE) < date('2011-07-05')  
               and CIK in {}""".format(cik_tuple))

The folder will have the CIK input file, the code I used to create the query as well as the output file. Please help us maintain the value of our offering by not sharing these files with anyone. If you are working with colleagues at other universities you can merge only the data you need for your study with their data after you have gotten their agreement that this unique data is available only to be used in the study in which you are participating. Sorry to be such a downer but – well it is a challenge at times.

Note – we are still tinkering with this data – we currently have filing ACCEPTANCE-TIME through 12/17/2021. I think we will probably update again in the first week in January 2022 and then plan on updating at the end of each quarter. However, since we add ACCEPTANCE-TIME to the 8-K filings now as metadata and we keep those updated weekly you should always be able to pull the full history.

As a reminder – please see this video for a description of how to save the zip file/files to your local computer (Accessing Extras Folder). Below is a description of the fields in the database.

"""The db has one table named eightKfilings when the table was created the following fields were included
    CIK_ACCESSION PRIMARY key,   # The issuer CIK concatenated to the filing
                                 # ACCESSION number by an underscore
    dE_PATH TEXT,                # This is the path to the folder that the
                                 # documents from this filing would be 
                                 # found if returned from a search on the 
                                 # platform
    CIK TEXT,                    # The Central Index key of the filer - note
                                 # this is a text value 
	                         # but it is not left padded
    SIC TEXT,                    # If reported in the header file of the 
                                 # filing - this value is the Standard
                                 # Industrial Code as reported in the header  
    FYE TEXT,                    # If reported in the header file of the 
                                 # filing - this value is the MD (1231)
                                 # of the end of the most recent fiscal year  
    CONAME TEXT,                 # This is the name of the filer associated 
                                 # with the filing 
                                 # as of the filing date
    ACCESSION TEXT,              # This is the accession number associated 
                                 # with the filing 
    SEC_FILING_DATE TEXT,        # The SEC reported filing date from the
                                 # header file
    ACCEPTANCE_TIME TEXT,        # If the header file included the 
                                 # <ACCEPTANCE-DATETIME> tag - the value 
                                 # associated with that tag
                                 # If the header file did not include the tag 
                                 # this value was pulled from 
                                 # the archive of the EDGAR dissemination 
                                 # feed. 
   ITEM_1_01 TEXT,               # For all of the following - the header file
   ITEM_1_02 TEXT,               # was inspected for the list         
   ITEM_1_03 TEXT,               # of ITEMS - Any ITEM indicated in the
   ITEM_1_04 TEXT,               # header file is assigned the value YES 
   ITEM_2_01 TEXT,               # If the ITEM code was not listed in the
   ITEM_2_02 TEXT,               # header file the value is left blank
   ITEM_2_03 TEXT,               # The SEC expanded the list of reportable 
   ITEM_2_04 TEXT,               # events and changed the numbering 
   ITEM_2_05 TEXT,               # of events with a new rule effective 
   ITEM_2_06 TEXT,               # 8/23/2004
   ITEM_3_01 TEXT,               # We mapped the ITEM codes from filings made 
   ITEM_3_02 TEXT,               # prior to 8/23/2004 to the 
   ITEM_3_03 TEXT,               # event codes required after 8/23/2004
   ITEM_4_01 TEXT,               # For instance - prior to 8/23/2004 ITEM 12
   ITEM_4_02 TEXT,               # was used to identify 8-K filings
   ITEM_5_01 TEXT,               # that reported on the Results of Operations         
   ITEM_5_02 TEXT,               # and Financial Condition.  After 8/23/2004 
   ITEM_5_02 TEXT,               # the new ITEM code became 2.02.  
   ITEM_5_03 TEXT,               # Therefore we recoded any 8-K that was 
   ITEM_5_04 TEXT,               # filed prior to 8/23/2004 with
   ITEM_5_05 TEXT,               # with an ITEM code of 12 to 2.02.
   ITEM_5_06 TEXT,               # For a current list of the ITEM codes 
   ITEM_5_07 TEXT,               # please review 
   ITEM_5_08 TEXT,               # https://www.sec.gov/files/form8-k.pdf
   ITEM_6_01 TEXT,               # USE the current ITEM code in any queries 
   ITEM_6_02 TEXT,               # or searches on the platform.
   ITEM_6_03 TEXT,
   ITEM_6_04 TEXT,
   ITEM_6_05 TEXT,
   ITEM_6_06 TEXT,
   ITEM_6_1 TEXT,
   ITEM_7_01 TEXT,
   ITEM_8_01 TEXT,
   ITEM_9_01 TEXT
There is a lot of useful information/examples about querying sqlite3 databases available from
  https://www.sqlitetutorial.net/

"""