DISSEMINATION TIMES

This is a database that we struggled with regarding the name and the field names. The first iteration of this was named SUBMISSION_TIMES. However, technically these are really DISSEMINATION times as the values are pulled from an archive of the daily feed that the SEC provides to subscribers. The time we report is from this feed and so it represents the time the filing was distributed to subscribers. Thus, the times will differ from the ACCEPTANCE_DATETIME field in some of our other products as that field is expected to represent the DATETIME the filing was accepted by EDGAR.

There are holes in the database The only data available for Q2 2021 is from July 1st to July 9th. I have sent several emails to the SEC about this and have yet to receive a response. We considered adding in the ACCEPTANCE_DATETIME from the Q2 2021 filings in our archive but that leads to another set of problems. The data from April 15, 2022 is missing as well.

I generally prefer to interact with this file using Python rather than the Query Tool as it is just faster.

PATH	"S:/directEDGAR_DATA/DISSEMINATION_TIMES.db"
Table Name	DISSEMINATION_TIME
FIELD	DESCRIPTION
CIK_ACC	CIK and accession number concatenated together
CIK	Central Index Key assigned by the SEC to the filer
ACCESSION	Accession Number assigned by the SEC to the filing
FORM	The SEC Form name for the filing
FILING_DATE	The filing date as reported in the SEC INDEX file that was used to identify the filing
CNAME	Company Name
DISSEMINATION_TIME	Dissemination date time of the filing as reported in the SEC daily summary files of the dissemination feed to the users of their direct feed service. The field has been converted from YYYYMMDDHHMMSS format to YYYY-MM-DDTHH:MM:SS

Because of the complexity of working with this database we provide a code sample that we hopes help make it easier for you to create the code that meets your use case. In this particular example we had a csv file that had around 5,000 unique CIKs. We first ran a query on the 8KMETA database to identify all 8-K filings made by this sample between February 5, 2004 and August 9, 2018. We then merged the results with the DISSEMINATION_TIMES data to match in the dissemination time.

This code will not run unless you provide the input file – the results will be saved in the Temporary Files folder in a directory named batch_outputs.

One of the key reasons to provide this sample is that we really need to use more primitive libraries/tools rather than tools like Pandas. One of the problems with Pandas is it loads the entire databases into memory before running the query. In one of my iterations using Pandas to test this I received the following error message: numpy.core._exceptions._ArrayMemoryError: Unable to allocate 220. GiB . . . I think that is suggesting that we needed 220 GB to run the query using Pandas as the interface.

I modified the code to limit the initial query to the 8Kmeta database to 2,500 CIKs and then using the CIK_ACC found in the first query I limited the batch size for the query to the DISSEMINATION_TIME database to 900 CIKs per query. This took less than three minutes to run. I just had to define my objects more carefully than if I had used Pandas.

Sample Code

import os
from collections import defaultdict
import sqlite3
import csv
import pandas as pd

# Load the CSV file and extract CIK values
csv_path = r"D:\PhotonUser\My Files\Temporary Files\my_sample.csv"
df_csv = pd.read_csv(csv_path)
cik_values = df_csv['CIK'].tolist()

# Connect to the databases
db_path_8kmeta = r"S:/directEDGAR_DATA/8Kmeta.db"
db_path_dissemination = r"S:/directEDGAR_DATA/DISSEMINATION_TIMES.db"
conn_8kmeta = sqlite3.connect(db_path_8kmeta)
conn_8kmeta.row_factory = sqlite3.Row
cur_8kmeta = conn_8kmeta.cursor()





# Define the output directory
output_dir = r"D:\PhotonUser\My Files\Temporary Files\batch_outputs"

# Create the output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Define the batch size
cik_batch_size = 2500
cik_acc_batch_size = 900

# Define the date range
start_date = "2004-02-05"
end_date = "2018-08-09"
fnum = 0
# Process the merging in batches
for i in range(0, len(cik_values), cik_batch_size):
    # Split the CIK values into smaller chunks for the batch
    batch_cik_values = cik_values[i:i + cik_batch_size]

    # Retrieve data from 8Kmeta.db for the batch and specified date range
    query_8kmeta = f"SELECT * FROM EIGHTKMETA WHERE CIK IN ({','.join('?' * len(batch_cik_values))}) AND SEC_FILING_DATE BETWEEN '2004-02-05' AND '2018-08-09' "
    cur_8kmeta.execute(query_8kmeta,batch_cik_values)
    results = cur_8kmeta.fetchall()
    batch_dict = defaultdict(dict)
    for r in results:
        r_d = dict(r)
        batch_dict[r_d['CIK_ACC']] = r_d
    
    
    # Extract CIK_ACC values from the filtered 8Kmeta batch
    cik_acc_values = [k for k in batch_dict.keys()]
    # now query dissemination times based on the CIK_ACC values from 
    # the first query.  Limit batch size to 900
    batch_list = []
    for j in range(0,len(cik_acc_values), cik_acc_batch_size):
        
        batch_cik_acc_values = cik_acc_values[j:j + cik_acc_batch_size]
        conn_dissemination = sqlite3.connect(db_path_dissemination)
        conn_dissemination.row_factory = sqlite3.Row
        cur_dissemination = conn_dissemination.cursor()
        query_dissemination = f"SELECT * FROM DISSEMINATION_TIME WHERE CIK_ACC IN ({','.join('?' * len(batch_cik_acc_values))})"
        cur_dissemination.execute(query_dissemination,batch_cik_acc_values)
        diss_rows = cur_dissemination.fetchall()
        conn_dissemination.close()
        #merge the data from the dissemination database to the data from the 8KMETA
        for r in diss_rows:
            d_row = dict(r)
            cik_acc = d_row['CIK_ACC']
            dest_dict = batch_dict[cik_acc]
            for k in d_row:
                if k in dest_dict:
                     # if a field in  the dissemination had the same name as a field 
                     # in the 8KMETA - rename the field,
                    new_k = k + '_diss'
                    dest_dict[new_k] = d_row[k]
                    continue
                dest_dict[k] = d_row[k]
            batch_list.append(dest_dict)
            y = batch_dict.pop(cik_acc)
    fnum += 1
    header_l = [k for k in batch_list[0].keys()]
    header_dict = dict((ch,ch) for ch in header_l)
    output_filename = f"merged_data_batch_{fnum}.csv"
    output_path = os.path.join(output_dir, output_filename)
    output_file_reference = open(output_path,'w',newline='')
    mywriter = csv.DictWriter(output_file_reference, fieldnames = header_l)
    mywriter.writerow(header_dict)
    mywriter.writerows(batch_list)
    output_file_reference.close()

directEDGAR

Search, Extraction & Normalization Engine

DISSEMINATION TIMES

Sample Code

Related