Python Example – Using the File Path to Find Documents and Using Director Compensation Data to Find Committee Assignment Tables in Proxy Filings

I posted some new code to our PythonCode examples. One of the examples shows one approach to finding committee assignment tables. I did not mention it in that example, but it is important to emphasize that I first scanned 150 or so proxy statements to identify the different ways these tables are displayed. I did that by running a quick search for (DOCTYPE (contains DEF14A)). We always have to start data collection by viewing a range of documents to learn how filers express the concept we are trying to find. That review is where the list of words in the code came from. Because this is an example, the list is not meant to be exhaustive; it is a starting point.

The example code also demonstrates how to start with a list of CIKs and find the related documents. We will generally have to tune KEY_WORDS and the other search parameters to find the data we are trying to collect, and to do that we will probably have to inspect the documents where no results were returned. One of the new features we built into the latest iteration of the platform is the ability to match on the file path of a document in the repository.
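The platform feature is point-and-click, but the same idea of selecting documents by CIK from their file paths can be sketched in Python if you already have a list of paths. This is a hypothetical sketch: the function name `paths_for_ciks` is made up, and it assumes the CIK appears as its own folder or as a token in the file name separated by `-`, `_` or `.`, which may not match the actual repository layout.

```python
import re


def paths_for_ciks(paths, ciks):
    """Return the paths that contain one of the given CIKs.

    Hypothetical layout assumption: the CIK is a folder name or a token in
    the file name delimited by '-', '_' or '.'. Leading zeros are ignored,
    so '0000004962' and 4962 are treated as the same CIK.
    """
    wanted = {int(c) for c in ciks}
    selected = []
    for path in paths:
        # Split on both Windows and POSIX separators plus common delimiters.
        tokens = re.split(r"[\\/\-_.]+", path)
        if any(t.isdecimal() and int(t) in wanted for t in tokens):
            selected.append(path)
    return selected
```

Normalizing with `int()` avoids missing a match only because one source pads CIKs to ten digits and the other does not.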

The example also illustrates something I was personally excited to demonstrate. We have a deep archive of director compensation data, and much of it is available in an SQLite database that you can query from code. Because the table that reports committee assignments usually lists the directors' names, I pulled director names by CIK for specific years from the director compensation data and used them to help identify the tables.
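Pulling names by CIK and year from an SQLite file only needs the standard `sqlite3` module. The sketch below is not the schema of our database: the table name `director_comp` and the columns `cik`, `year`, `first_name` and `last_name` are hypothetical placeholders, and `demo_connection` builds a tiny in-memory stand-in with made-up rows purely for illustration.

```python
import sqlite3


def director_names(conn, cik, years):
    """Distinct (first, last) name pairs for one CIK across the given years.

    Table and column names are hypothetical; adjust them to the real schema.
    """
    years = list(years)
    placeholders = ",".join("?" * len(years))  # one '?' per year
    sql = (
        "SELECT DISTINCT first_name, last_name FROM director_comp "
        f"WHERE cik = ? AND year IN ({placeholders}) "
        "ORDER BY last_name, first_name"
    )
    return conn.execute(sql, (cik, *years)).fetchall()


def demo_connection():
    """In-memory database with made-up rows, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE director_comp "
        "(cik INTEGER, year INTEGER, first_name TEXT, last_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO director_comp VALUES (?, ?, ?, ?)",
        [
            (1234, 2021, "Ann", "Smith"),
            (1234, 2022, "Ann", "Smith"),
            (1234, 2022, "Bob", "Jones"),
            (1234, 2019, "Cal", "Lee"),
            (5678, 2022, "Dee", "Park"),
        ],
    )
    return conn
```

Using `?` placeholders rather than string formatting for the values lets `sqlite3` handle quoting; only the number of placeholders is built dynamically.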

Pulling DC Data to get NAMES to then use to Find Tables!

Based on my visual review of the initial documents, every committee assignment table I saw included either the directors' first and last names or just their last names. So a good first step seemed to be finding the tables that contain those names.
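A name test along these lines could look like the sketch below. It assumes you already have the table's text and a list of (first, last) name pairs pulled from the director compensation data; the function name and the `min_hits` threshold are illustrative choices, not settings from the original example. Matching on the last name alone covers both layouts described above (full names and last names only).

```python
import re


def names_in_table(table_text, names, min_hits=2):
    """Return the director last names found in a table's text.

    names: iterable of (first, last) pairs. Whole-word matching keeps
    'Smith' from matching 'Smithfield'. Returns [] when fewer than
    min_hits names are found (min_hits is an illustrative threshold).
    """
    text = table_text.lower()
    hits = [
        last
        for _first, last in names
        if re.search(r"\b" + re.escape(last.lower()) + r"\b", text)
    ]
    return hits if len(hits) >= min_hits else []
```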

I set up the code to save each table it finds so we can review it in the SmartBrowser. Here is a screenshot of one of the tables I wanted to capture:

Awesome Table as Reviewed in the SmartBrowser.

Of course, some of the captured tables were not ones I wanted. That means I need to tinker with my collection of words, or I can simply delete the table using the Delete Current File button. This is always a balancing act, and it may take several iterations to settle on the right set of words to include and to exclude.
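One way to express that balancing act in code is a pair of word lists: words that should appear in a committee assignment table and words that signal the wrong table. This is a hypothetical sketch; `keep_table`, `INCLUDE_WORDS` and `EXCLUDE_WORDS` are made-up names, and the words themselves are only examples to be tuned against real filings.

```python
# Example word lists only; tune them against the documents you review.
INCLUDE_WORDS = ["audit", "nominating", "governance", "chair"]
# The director compensation table also lists director names, so phrases
# from its column headers are plausible exclusions.
EXCLUDE_WORDS = ["fees earned", "stock awards"]


def keep_table(table_text, include_words, exclude_words, min_include=1):
    """Keep a candidate table if it has enough include words and no exclude words."""
    text = table_text.lower()
    if any(word.lower() in text for word in exclude_words):
        return False
    hits = sum(1 for word in include_words if word.lower() in text)
    return hits >= min_include
```

Words such as "compensation" are deliberately left out of the include list here, because they would also match the director compensation table.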

Wrong Table

One test I did not implement was a minimum size for the table to be snipped, and I did not require the names to appear in the same column. Both checks can be added to the code with a bit of poking around.
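Both missing checks can be sketched in a few lines, assuming the table has already been parsed into a list of rows, each a list of cell strings (a hypothetical representation; the original code may store tables differently). The function name and the `min_rows`, `min_cols` and `min_names` thresholds are illustrative.

```python
from collections import Counter


def passes_shape_checks(rows, last_names, min_rows=3, min_cols=2, min_names=2):
    """Check minimum table dimensions and that director names share a column.

    rows: list of rows, each a list of cell strings. Thresholds are illustrative.
    """
    if len(rows) < min_rows or max((len(r) for r in rows), default=0) < min_cols:
        return False
    # Count, per column index, how many cells contain a director last name.
    per_column = Counter()
    for row in rows:
        for col, cell in enumerate(row):
            cell_text = cell.lower()
            if any(name.lower() in cell_text for name in last_names):
                per_column[col] += 1
    # Require the names to cluster in one column rather than be scattered.
    return max(per_column.values(), default=0) >= min_names
```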

If you followed along with the example, the summary CSV file that reports on the results includes a column named DE_PATH and a stop_reason value. In my example, a value in stop_reason means the current iteration of the code could not find a table that met the criteria. Either the table we are looking for does not exist, or it exists in a form we did not expect. The only way to establish which explanation is correct is to inspect those documents.
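A summary file of that shape can be written with the standard `csv` module. In this sketch only DE_PATH and stop_reason come from the example; the CIK and table_file columns, the helper names, and any stop_reason strings you pass in are hypothetical. The key convention is that stop_reason stays empty when a table was saved.

```python
import csv

# DE_PATH and stop_reason follow the example; the other columns are hypothetical.
SUMMARY_FIELDS = ["CIK", "DE_PATH", "table_file", "stop_reason"]


def summary_row(cik, de_path, table_file=None, stop_reason=None):
    """One summary record; stop_reason is left empty when a table was found."""
    return {
        "CIK": cik,
        "DE_PATH": de_path,
        "table_file": table_file or "",
        "stop_reason": stop_reason or "",
    }


def write_summary(csv_path, rows):
    """Write the summary records to csv_path with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=SUMMARY_FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```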

Summary Sorted on stop_reason.

Because I want to inspect those documents, I delete the rows that do not have a stop_reason and save the file. I then start the application, select the index (in my example, PROXY Y2022-Y2022), check the Use DB checkbox, and then click Set DB File.
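If you prefer not to delete the rows by hand, a short script can write a new CSV that keeps only the rows with a stop_reason. This is a sketch with a hypothetical function name; it also refuses to run if the DE_PATH or stop_reason column is missing, since the application needs DE_PATH to locate the documents.

```python
import csv


def keep_unresolved(src, dst):
    """Copy only the rows of src with a non-empty stop_reason into dst.

    Returns the number of rows written. Raises ValueError if the summary
    file lacks the DE_PATH or stop_reason column.
    """
    with open(src, newline="", encoding="utf-8") as fin:
        reader = csv.DictReader(fin)
        fields = reader.fieldnames or []
        if "DE_PATH" not in fields or "stop_reason" not in fields:
            raise ValueError("summary file needs DE_PATH and stop_reason columns")
        # A short row yields None for missing fields, hence the `or ""`.
        rows = [r for r in reader if (r["stop_reason"] or "").strip()]
    with open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Writing to a new file keeps the full summary intact for later comparison.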

Prepping to find specific documents.

Once Use DB is checked and Set DB File is clicked, the application provides an interface to select the file. Remember that the selected CSV file needs the DE_PATH column.

Selecting the File with a Column DE_PATH

We still need to specify a search term. Because I want to scan these documents, I use the search operator XFIRSTWORD.

Results of Using the DE_PATH Column to Find Specific Documents

One thing you will discover is that some filers insert page images into their proxy statements, so there is nothing we can parse with a text/HTML parsing strategy.
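A quick triage step for the unresolved documents is to list the images each one contains, since a document that failed the table search and embeds images is a candidate for the page-image explanation. This is only a heuristic sketch using the standard `html.parser` module; the class and function names are hypothetical, and an image's presence does not prove the committee table is the image, so the document still needs a visual check.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so 'IMG' arrives as 'img'.
        if tag == "img":
            self.sources.append(dict(attrs).get("src") or "")


def image_sources(html):
    """Return the src values of all <img> tags, in document order."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


def flag_image_candidates(docs):
    """docs: mapping of DE_PATH to HTML text for documents with a stop_reason.

    Returns only the documents that contain at least one image.
    """
    return {path: srcs for path, html in docs.items() if (srcs := image_sources(html))}
```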

NRG’s Proxy Page Reporting on Director Committee Assignments – it is an Image!

We will add more code examples. The current example also demonstrates how to accomplish several other tasks, and it includes pointers to resources for learning Python and for finding the disclosure regulations that apply to these filings.
