I was reading a recent paper in Accounting Horizons that showed up in my Google Scholar feed with a directEDGAR citation. Professors Bright and Kim from Lehigh University and Professor Koo from Kyungpook National University collaborated on CEO outside Board Service and Managerial Short-Termism. They report finding that suggest CEO’s limited outside board service is associated with the CEO seeming to be more willing to focus on a longer performance horizon. Congratulations to all for having their research recognized and published. I have reached out to Professors Bright and Kim they earned some ice-cream from eCreamery and the shirt off my back!
As I was reading the paper I noted that they needed to identify CEOs with outside directorships. They used BoardEx to identify their sample. I read further that they used our platform to collect stock compensation for their CEOs. This got me wondering – since we use the PERSON-CIK as a key identifier when processing compensation data could I create a sample to identify all executives with outside directorships using directEDGAR? I think I can.
Here is one pass on the code to do so – the results are saved to a CSV file. Because I agonize over every little detail there are too many lines in the output file. I decided to match if the officer served as an outside director in the year before, the year of, or the year after the compensation year.
import sqlite3
import csv
# Paths to the databases
exec_db_path = "S:/directEDGAR_DATA/EXECUTIVE_COMP.db"
dir_db_path = "S:/directEDGAR_DATA/DIRECTOR_COMP.db"
# Establish database connections
with sqlite3.connect(exec_db_path) as exec_conn, sqlite3.connect(dir_db_path) as dir_conn:
# Cursor to retrieve distinct years in EXECUTIVE_COMP table
exec_cursor = exec_conn.cursor()
exec_cursor.execute("SELECT DISTINCT YEAR FROM EXECUTIVE_COMP")
years = exec_cursor.fetchall() # List of years in EXECUTIVE_COMP
output = [] # Collect results here
for year_tuple in years:
year = year_tuple[0] # Extract year value
# Fetch all executives for the given year, filtering for numeric PERSON_CIK
exec_cursor.execute("""
SELECT CIK, YEAR, PERSON_TITLE, PERSON_NAME, PERSON_CIK
FROM EXECUTIVE_COMP
WHERE YEAR = ? AND PERSON_CIK NOT LIKE '%AVAI%'
AND CAST(PERSON_CIK AS INTEGER) IS NOT NULL
""", (year,))
exec_records = exec_cursor.fetchall()
# Fetch all director records for YEAR-1, YEAR, and YEAR+1, with numeric PERSON_CIK
for yr in [year - 1, year, year + 1]: #this checks the year before and after
dir_cursor = dir_conn.cursor()
dir_cursor.execute("""
SELECT CIK AS CIK_DIRECTORSHIP, YEAR AS DIRECTORSHIP_YEAR, PERSON_CIK, AGE, SINCE
FROM DIRECTORCOMP
WHERE YEAR = ? AND PERSON_CIK NOT LIKE '%AVA%'
AND CAST(PERSON_CIK AS INTEGER) IS NOT NULL
""", (yr,))
dir_records = dir_cursor.fetchall()
# Build a dictionary for fast lookup of directors by PERSON_CIK
dir_dict = {int(record[2]): record for record in dir_records if record[2].isdigit()}
# Match executives to directors based on PERSON_CIK
for exec_record in exec_records:
exec_cik, exec_year, person_title, person_name, person_cik = exec_record
if person_cik.isdigit(): # Only process if PERSON_CIK is numeric
person_cik = int(person_cik) # Convert to integer for matching
# Check if the executive PERSON_CIK matches a director entry
if person_cik in dir_dict:
dir_cik, dir_year, _, dir_age, dir_since = dir_dict[person_cik]
if dir_cik == exec_cik:
continue
output.append((exec_cik, exec_year, person_title, person_name, person_cik, dir_cik, dir_year,
dir_age, dir_since))
print(year, len(output))
# Display or save the output as needed
print(len(output))
# Specify the output CSV file path
# Column headers for the CSV
headers = ["CIK", "YEAR", "PERSON_TITLE", "PERSON_NAME", "PERSON_CIK", "CIK_DIRECTORSHIP", "DIRECTORSHIP_YEAR",
"DIRECTOR_AGE", "DIRECTOR_SINCE"]
# Write the output list to a CSV file
output_csv_path = r"D:\PhotonUser\My Files\Temporary Files\MATCHED_EXECS.csv"
with open(output_csv_path, mode="w", newline="") as file:
writer = csv.writer(file)
writer.writerow(headers) # Write the headers
writer.writerows(output) # Write the data rows
print(f"Results have been saved to {output_csv_path}")
Tim Cook has been an outside director at Nike (CIK: 320187) since 2005. Here is a cut of the output file (that has more than 100,000 rows) with some of Mr. Cooks service reported:
| CIK | YEAR | PERSON_TITLE | PERSON_NAME | PERSON_CIK | CIK_DIRECTORSHIP | DIRECTORSHIP_YEAR | DIRECTOR_AGE | DIRECTOR_SINCE |
| 320193 | 2011 | CHIEF EXECUTIVE OFFICER | TIMOTHY D. COOK | 1214156 | 320187 | 2011 | 50 | 2005 |
| 320193 | 2012 | CHIEF EXECUTIVE OFFICER | TIMOTHY COOK | 1214156 | 320187 | 2012 | 51 | 2005 |
| 320193 | 2013 | CHIEF EXECUTIVE OFFICER | TIMOTHY COOK | 1214156 | 320187 | 2013 | 52 | 2005 |
| 320193 | 2014 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2014 | 53 | 2005 |
| 320193 | 2015 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2015 | 54 | 2005 |
| 320193 | 2016 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2016 | 55 | 2005 |
| 320193 | 2017 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2017 | 56 | 2005 |
| 320193 | 2018 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2018 | 57 | 2005 |
| 320193 | 2019 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2019 | 58 | 2005 |
| 320193 | 2020 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2020 | 59 | 2005 |
| 320193 | 2021 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2021 | 60 | 2005 |
| 320193 | 2022 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2022 | 61 | 2005 |
| 320193 | 2023 | CHIEF EXECUTIVE OFFICER | TIM COOK | 1214156 | 320187 | 2023 | 62 | 2005 |
Mr. Cook’s PERSON-CIK is 1214156 – so by checking for his CIK in the DC data by year we have him in our output.
Of course – we are not perfect – in the Bright, Kim and Koo paper they focused on CEOs. We have yet to normalize the titles of the officers. Now that we finally have a complete pipeline to make the compensation data more easily available I think I am going to have one of our interns start working on an officer title mapping dictionary. This should not be challenging, especially when considering key positions like the Chief Executive Officer and the Chief Financial Officer.
As I was writing the last paragraph I started wondering – well suppose I wanted that now – how hard would it be to start cleaning that up. Excel has a great feature – remove duplicates and so I copied the titles to a new sheet and ran that function – out of the more than 106,000 total matches there were only 7,471 unique titles
In this reduced set there were 1,521 titles with the phrase CHIEF EXECUTIVE OFFICER. Of course it is not that simple – if you look at the next screenshot you will see that title is part of the title of an officer who is the CEO of a business unit or subsidiary of the reporting company.
Based on this quick review I am convinced of two things. First, if I wanted to recreate their sample and test another hypothesis I feel confident I could do so starting with that matching code from above. The one thing I have not yet accounted for is a merge with say Compustat. Limiting my sample to those firms covered in Compustat would further reduce the number of titles I would have to analyze. I suspect 15-20% of the executive we have from the first pass would not match to companies available in Compustat. Further, others would be excluded once we started controlling for the usual (for example they excluded all financial firms). Frankly, I think if I wanted to do this for a project today I am confident I could identify all of the CEOs from this file in relatively short order – perhaps a couple of hours. And while this may seem daunting – I began this journey by collecting acquisition purchase price allocations from micro-fiche 10-Ks and annual reports. This is nothing!
A final thought as I wrap this up – so initially it might seem a bit tedious to identify the CEOs by analyzing the titles. I was sharing this discovery/directEDGAR extension with one of my colleagues and she reminded me that she was a co-author on another paper 6/7 years ago. They also needed to match data from BoardEx to our compensation data. She described the process I helped her set-up to match by names across the two samples. That process took much more time and was hugely more tedious then it would take to normalize these titles.

