Example Reporting Errors

A Ph.D. student at one of our client schools struck up an email exchange after reading my last post. They specifically were interested in an assertion I made that some errors in reported totals were likely the result of someone using last year’s filing as the template for this year’s filing. Their email came just as I was working on addressing the total errors for FY 2021 director compensation data.

The problem is not huge, but it is painful to address. Out of approximately 4,400 registrants that we have director compensation data for 2021 so far, there were 150 of that with sum totals. I am about 1/3 of the way through reviewing these and I have only identified two where the error seems likely to be the result of using last year’s filing table as the starting point. Here is a great example:

Director Compensation from 2022 DEF 14A Filed by TESSCO TECHNOLOGIES INC (CIK: 927355)

Notice the missing totals, each of the computed totals is off. Mr. Baitler’s total is off by $5,090 and Mr. Zifferer’s total is off by $38,921. Notice the missing totals for Bryan, Dismore, Martine-Dolecki, McCray and McLean.

Here is an image of the DC table they filed in the proxy submitted in 2021 (2021 Proxy Link):

Director Compensation from 2021 DEF 14A Filed by TESSCO TECHNOLOGIES INC (CIK: 927355)

There are no sum/total errors in the 2021 proxy and – the totals included in the 2022 proxy match the reported totals in the table above. This gives us what we think is a reasonable basis to conclude that the totals reported in the 2022 proxy are the result of a company error. Therefore we are willing to use the sum we compute for the components as the total.

However, we can’t draw any conclusion about the $20,000 difference between the reported total for Ms. Clinton in the table reporting director compensation for 3D Systems Corp (CIK 910638). The table from their 2022 proxy is displayed below. The sum of Ms. Clinton’s compensation components is $224,995. The reported total is $244,995. However, these numbers are not in the 2021 proxy so we can’t attribute this difference to the same issue that is reflected in Tessco’s data.

Director Compensation Table as Included in 3D SYSTEMS CORP’s 2022 DEF 14A

Of course the follow-up question was how prevalent are these errors? While I am not finished reviewing the 2021 data – my guess is that there will be about 100 row observations out of more than 36,000 with a total error greater than $10,000. So that gets to a sum error rate of roughly 0.3%. And the final question I had to address was – how will I know there is an error? Good question, we are not flagging data inconsistencies at the moment. We might later. The best way I can suggest is to test the difference between the total we report (which in the case above would be the total reported by the filer) and the sum of the components.

Preliminary Director Compensation Database Now Available

A Full Listing of Our Preliminary Director Compensation Database – 564,718 Rows

I have just transferred a new database to our platform. It has almost all of the director compensation data in our archives. What is excluded are those observations where we have multiple tables for one CIK year. While there are 30/40 of these cases in most years there are 550 from 2020. These cases mostly occur when the registrant anticipates not being able to file their DEF 14A on time and so they file an amended 10-K with information for Items 10-14. And then later they will file their proxy (DEF 14A). Our intention is to carefully review the duplicate cases to ensure we are distributing the data from the proxy.

There are definitely more quality checks we can run on this data and we will do so. Another reason some data is not available is because of the cases where the sum of the components of the compensation does not equal the total that is reported. We set a cut-off value for a difference of +/- $10,001. We have to review these. Sometimes the error made is clear and easy to identify, for example a transposition error or the registrant used last years table as a model while preparing their disclosure but left last year’s total in one or more rows. In other cases it is a decimal issue. While we do attempt to contact registrants when these totals differ, most do not respond and thus we will release data with a mismatched sum after we have attempted to review the data.

When you previously accessed this data through a request file, the year you specified was the document year. Now the data is available for the data year.

I have written before that detailed director compensation was not a required disclosure until 2006. We do have some limited data from 2004 and 2005. I researched these and saw that most were reported in 10-K filings that were either extremely late or were amended for some financial reporting issue. Somehow some registrants determined they needed to include the Item 9-14 data even when the tabular director compensation data was not required for that particular reporting year.

Personally I am excited about the accessibility we are able to provide with this new interface. To generate the results you see in the first image, I selected the DIRECTOR_COMP database and hit the Execute button. Doing so will generate a complete list of all rows in the database (I would be careful with respect to trying this trick with the FILING_TIMES data.

I wanted to compare the proportion of female directors in 2008 with the proportion in 2020. I ran the following query to first identify the total number of directors in 2008

(YEAR  = 2008)
All Directors for 2008 as Reported in the Director Compensation Table N = 40,413

To check for women I added AND (GENDER = “F”). As you can see this returned 3,854 results

With out getting too precise, less than 10% of directors in this group were women.

We currently have 30,070 directors compensation observations for 2020. As you can see in the image below, the number of women increased to 7,066 ~ 23% of the total population.

Female Directors from 2020 Director Compensation Tables

There is more coming – all of the tabular data we currently have will be ported to this delivery modality. It will take time but it will make it more accessible. And, as I was working on preparing this data for the platform I realized that we need to add the ability to filter by PERSON_CIK. For example, suppose you pull some executive comp and you want to check whether any of your sample are/were directors of other registrants. You should have that ability and so that feature has been added to the list.

I don’t know what I don’t know!

There is so much that we do that I know nothing about when we start. Frankly, with everything else I have been working on I knew (and still know) very little about databases. One of our clients used the new database tool last night and observed that she thought that it took too long to get her results. Of course my initial reaction was defensive. I said, come on – it took a tremendous amount of work to get that data for you – be grateful! This is someone who has been a long time user and she pushed back (she won’t let me give her credit) and said databases can work more efficiently with indexes.

She is exactly right (and maybe that is why she is a full and I am an associate). I read up on indexes, and built a CIK, FILING_DATE and FORM_TYPE index for the FILING_TIMES database. Shazam, queries were easily 10 to 100 times faster (based on my crude time keeping).

For example, it took a little over 21 minutes to identify the SUBMISSION_TIME for all 8-K filings made between 2/2/2010 and 7/16/2015 before indexing. After indexing, it took less than 30 seconds to identify the same set of 449,795 results.

Here is the query I used to identify those filings

(FORM  LIKE '%8-%')  AND (FILING_DATE  BETWEEN '2010-02-02'  AND '2015-07-16') 
Query Results

New Submission Time Database Available

We loaded a new database on the platform today (FILING_TIMES). The database has all of the available submission times from the archive of the filings made available through the SEC Old Loads Archive. Unfortunately, the archive is/was incomplete as of 7/20/22 – the 2021 Q3 directory only includes filings from 7/1 though 7/9. I have already asked the SEC if these will be made available. We did some research into other sources available through EDGAR for compiled filing archives and the sources we have identified so far do not include the submission time. Another limitation is that there are no Old Load Archive entries prior to 1996.

What we did was merged the SEC INDEX Quarterly master.idx files with the submission line from the Old Loads Archive by accession number and used those results to create the database.

The primary key in the database is the concatenation of the CIK and the ACCESSION number separated with an underscore (CIK_ACCESSION). However, you can filter on CIK and query based on filing type. This database is huge and so I would advise caution in running queries that are not limited by CIK and date ranges. It is important to remember that more than one CIK can be associated with each filing. For ownership filings, the subject company CIK will be associated with the filing as well as the reporting person’s CIK. For companies that have subsidiaries that have reporting obligations each 10-K will be filed under the CIK of the parent and each of the subsidiaries.

The database has the following keys/fields:

  1. CIK_ACCESSION
  2. CIK
  3. ACCESSION
  4. FORM (SEC FORM NAME)
  5. FILING_DATE (as reported in the master.idx file)
  6. FILER_NAME
  7. SUBMISSION_TIME

The SUBMISSION_TIME is in YYYYMMDDHHMMSS format. If you wanted to process this as a date time object in Python (assume the date is in the variable SUBMISSION_TIME) date_obj = datetime.strptime(SUBMISSION_TIME, ‘%Y%m%d%H%M%S’).

According to the log there are 21,849,043 entries in the database. When I tried to retrieve all records on my test machine using our application it took approximately 20 minutes. While profiling this it was clear that the constraint is memory, the application had to make extensive use of the Windows Page (aka Swap) File. That is not very exciting. However, when I attempted a more realistic example, I had a list of 1,885 CIKs and set the search to retrieve all filings made by this set from 6/2/2007 to 10/3/2019 (note this date span was arbitrary). With this more realistic example the process took less than two minutes.

All Filings Database

I saved that to a CSV file, but it is important to remember that Excel can only fully open files with a maximum 1,048,576 rows. This file has 2,193,942 rows.

As I am writing this I realize we might need to add some features to our Query panel to make your work easier. I did not think of including a NOT operator – and I wanted to repeat the query and exclude the ownership forms. An important point, you are not limited to the operators on the panel. If you are comfortable with SQLITE query syntax – you should be able to construct a query (limited of course by the available columns and their properties). So for example, while we did not include a NOT operator as a push button (there will be one of the next version) you can construct a query using NOT.

(FILING_DATE  BETWEEN '2007-06-02'  AND '2019-10-03')  AND NOT (FORM  IN ("3","4","5","3/A","4/A","5","A")) 
New Query Excluding Ownership Forms

By excluding ownership forms the final file had only 903,906 rows.

We have more coming!

5.0 Released to Appstream – Finally

5.0 User View

We are two weeks later than I hoped when I last posted here. We are a small company. We had complications with some of the final bits associated with this release and at the same time we also were facing some unusual problems with the infrastructure related to our main platform. I can’t tell you how many nights I have been sitting watching resource monitors and waiting for thread activity exceptions to try to solve some of these problems.

The next time you log into our platform the image above is what you will see. As I noted before, one seemingly minor but important change is that the application will know about all of the document indexes and databases that are available.

One of the more challenging part of the rollout was to preserve your existing preferences and settings. I was quite excited when that little piece was finally solved last week. It was much harder to make that happen than I ever imagined. The setting I care about the most is my search history. The image below is from my personal search history.

Search History

One of the key reasons we wanted this upgrade was to provide simpler and faster access to the data that currently requires a request file and . . . You can see on the menu bar there is a Query Databases item. When you select that the interface will switch over to the database tool, all of the available databases will be listed in the Database to Query panel.

Before I go further, I need to make clear – because of how far behind we are with the application rollout – we are consequently behind in the database development etc. At the moment – please consider the EXEC_COMP database as a practice database. One of our interns is working hard to do the transformations we need to have that database online. I think eventually we will be maintaining a document with the databases and the meaning of the keys.

While I will be prepping some videos to support your use of the tool – I want to walk through a simple problem. Suppose I have a list of CIKs and I want to identify all 8-K filing events that affected those registrants for some window. In the steps below I am presuming you have logged into the system.

Step 1 – create a CSV file that has your list of CIKs (as always the CIKs must be integers) and transfer that file to the session.

Step 2 – Start the application and select the Query Databases menu item.

Step 3 – Select the 8Kmeta database.

Step 4 – Look at the Display Columns panel and adjust to suit your preferences. This panel specifies the columns that will be displayed in the viewer after the query and defines the columns available in the output. My practice is to select ALL.

To query we have to select Criteria Columns. In this case I am going to select SEC_FILING_DATE. The available operators will change based on the nature of the criteria. Since the application recognizes the SEC_FILING_DATE as a date value the application offers Comparison Operators that relate to dates.

Step 5 – Select SEC_FILING_DATE and select Between in the Comparison Operator box. (I selected 8/2/2001 for my From Date and 5/16/2018 for my To Date).

Step 6 – Hit the Add Criteria button.

Step 7 – select the Use CIK checkbox and then use the CIK Filter button to navigate to and select the CSV file that has your CIK list (remember – this file has to be available within your session).

Prepping a Query for all 8-K events filed by a specific list of registrants.

Step 7 – Hit the Execute button to execute the query.

Not Responding simply means the application does not have a listener available to accept your input to the application. This query took me approximately 30 seconds.

My Query Results

Notice – the Save Results button. Hit that to save the results to a CSV file. Saving happens pretty quickly, and when the files has been finished you will see a confirmation message.

Confirmation of Saving Query Results

To transfer the results to your local computer – select the Files icon from the control bar and select the Temporary Files folder:

Accessing the Folders Tool to Download

Use the control to the right of the Size column to select Download.

Download Results

I am very excited about this improvement. We have a lot to do to make the data transition successful but now that the application piece is in place we can shift our focus to that aspect. The automatic update of the available databases alone will be beneficial as I have seen too many cases where users have not known that new filings were available.