We started parsing Executive and Director compensation data from filings a number of years ago. We did this because, when I visited our early clients, I found that many were using our tools to collect this data to augment what they had available through an S&P product (which I won’t name). The decision to collect and distribute this data was an easy one because it tied directly into the main reason I started directEDGAR: to reduce the time it took to collect data so our clients could focus on their research, not on data collection.
Being a small company with limited resources, we had to make very cost-focused choices about how to distribute the data. We hit upon the idea of storing the rows from each table as JSON objects in a CIK/YEAR directory, because it was not hard to write Python code to deliver the data from a request file, and the infrastructure to make this happen was not too complicated.
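As a rough illustration of how little infrastructure this kind of delivery needs, here is a minimal sketch. It assumes a hypothetical layout of `root/<CIK>/<YEAR>/*.json`, where each file holds one row object or a list of row objects, and a request file that is a CSV with `CIK` and `YEAR` columns. The function names and both file formats are illustrative assumptions, not directEDGAR’s actual delivery code.

```python
import csv
import json
from pathlib import Path


def load_rows(root: Path, cik: str, year: str) -> list[dict]:
    """Return every row stored under root/<CIK>/<YEAR>/ (hypothetical layout)."""
    folder = root / cik / year
    rows: list[dict] = []
    if not folder.is_dir():
        return rows  # nothing collected for this CIK/YEAR pair
    for path in sorted(folder.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # A file may hold a single row object or a list of row objects.
        rows.extend(data if isinstance(data, list) else [data])
    return rows


def fulfill_request(root: Path, request_file: Path) -> list[dict]:
    """Read CIK,YEAR pairs from a CSV request file and gather the matching rows."""
    results: list[dict] = []
    with request_file.open(newline="", encoding="utf-8") as fh:
        for record in csv.DictReader(fh):
            results.extend(load_rows(root, record["CIK"].strip(), record["YEAR"].strip()))
    return results
```

Because each CIK/YEAR folder is independent, the same lookup could sit behind an API endpoint just as easily as behind a request file.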
Here is what one row of the data looks like as a JSON object:
{
  "CIK": "100885",
  "RDATE": "R20180328",
  "CDATE": "C20180510",
  "FNAME": "F59",
  "TID": "171",
  "NAME-LABEL": "Name and Principal Position",
  "RID": "11",
  "PERSON-NAME": "RHONDA S. FERGUSON",
  "PERSON-TITLE": "EVP CHIEF LEGAL OFFICER",
  "YEAR": "2016",
  "SALARY": "200000",
  "BONUS": "720000",
  "STOCK": "400017",
  "OPTION": "",
  "NQDEFCOMP": "",
  "OTHER": "59746",
  "TOTAL": "1379763",
  "SEC-NAME": "FERGUSON RHONDA S",
  "PERSON-CIK": "1677193",
  "GENDER": "F"
}
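Every value in that row is a string, and an empty string stands for a blank cell. Here is a minimal sketch of turning a row into something you can test, assuming whole-dollar values; the field list comes from the example above, while `parse_row` and `components_match_total` are hypothetical helpers. For the example row, SALARY + BONUS + STOCK + OTHER = 200,000 + 720,000 + 400,017 + 59,746 = 1,379,763, which equals TOTAL. The example row does not show a column for every Summary Compensation Table item (for instance, non-equity incentive plan compensation), so a mismatch in some other row is a flag for review, not proof of an error.

```python
import json

# Dollar-valued fields from the example row; every value arrives as a string.
MONEY_FIELDS = ("SALARY", "BONUS", "STOCK", "OPTION", "NQDEFCOMP", "OTHER", "TOTAL")


def parse_row(text: str) -> dict:
    """Parse one JSON row and turn dollar strings into ints ("" or absent becomes None)."""
    row = json.loads(text)
    for field in MONEY_FIELDS:
        value = str(row.get(field) or "").replace(",", "").strip()
        # Assumes whole-dollar amounts; a value like "12.50" would raise ValueError.
        row[field] = int(value) if value else None
    return row


def components_match_total(row: dict) -> bool:
    """True when the reported components add up to TOTAL (blank components count as 0)."""
    parts = sum(row[f] or 0 for f in MONEY_FIELDS if f != "TOTAL")
    return row["TOTAL"] is not None and parts == row["TOTAL"]
```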
Storing the data this way turned out to be a good choice in many ways, one of which is that it allowed us to push the data immediately to those who want to integrate it into their own offerings; this works well for an API.
A drawback of this choice is that we have never really had an easy way to aggregate all of the data and start poking at it at scale. Each JSON object is stored as a text string, so you have to pull and parse the data before you can test the values in its fields. Our shift to the new platform, and our decision to move all of this data into an SQL-enabled database, has made it easier to start looking at the data at scale. While we are still some way from releasing the full Executive Compensation database on the platform, we are getting closer. As I have been working with it at this intermediate stage, I was curious about the questions I could ask.
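The payoff of moving rows into SQL is that a question becomes one query instead of a crawl through files. Here is a minimal sketch using Python’s built-in `sqlite3` module as a stand-in; the platform’s actual database and schema are not described in this post, so the table name `ec_rows`, the column subset, and the helper functions are assumptions for illustration.

```python
import sqlite3


def build_db(rows: list[dict]) -> sqlite3.Connection:
    """Load JSON-style rows into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ec_rows ("
        "CIK TEXT, YEAR INTEGER, PERSON_NAME TEXT, "
        "PERSON_CIK TEXT, GENDER TEXT, TOTAL INTEGER)"
    )
    records = [
        (
            r.get("CIK"),
            int(r["YEAR"]),
            r.get("PERSON-NAME"),
            r.get("PERSON-CIK") or None,  # blank strings become SQL NULL
            r.get("GENDER") or None,
            int(r["TOTAL"]) if r.get("TOTAL") else None,
        )
        for r in rows
    ]
    conn.executemany("INSERT INTO ec_rows VALUES (?, ?, ?, ?, ?, ?)", records)
    conn.commit()
    return conn


def count_with_total_by_year(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Number of person-years in each YEAR that report a TOTAL value."""
    query = (
        "SELECT YEAR, COUNT(*) FROM ec_rows "
        "WHERE TOTAL IS NOT NULL GROUP BY YEAR ORDER BY YEAR"
    )
    return conn.execute(query).fetchall()
```

A query like `count_with_total_by_year` is the kind of question that was painful with files alone, for example checking how often a TOTAL is reported before and after the disclosure change described below.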

There are over 470K PERSON-YEARs in this data, and it is very comprehensive going back to 2006. A big change in the SEC-mandated disclosure of Executive Compensation (EC) and Director Compensation (DC) data took effect in early 2007 (and thus affected the disclosure of 2006 data). One key difference is that, prior to the new disclosure regime, most registrants did not report the value of securities that were part of the compensation package. Instead, the number of securities (shares, options, and so on) was disclosed in the table or in a footnote, and it was fairly uncommon for a total compensation value to be reported.
The JSON example posted above was pulled from our archive of data filed in 2017; it is one row for Union Pacific Railroad. Here is their EC table filed in 2004. While they report a value for Restricted Stock Awards, they do not report a value for the Options/SARs; the figure in that column is a number of units. Later in the proxy they do provide one estimate of the value of those securities, but not all registrants did this. Also notice that there is no total.

So what is this somewhat bothersome finding? It blew my mind when I decided to ask a simple question: what is the gender distribution in this archive? How many FEMALE-YEARS of data have we collected?

We have 45,188 person-years of women (awkward phrasing, I know).

We have 418,876 person-years of men. Remember that the bulk of this data comes from 2006 to the present.
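Once the rows are in one place, the tally itself is trivial. Here is a minimal sketch with `collections.Counter`, assuming each row is a dict carrying the `GENDER` field shown in the JSON example and that a blank or absent value means GENDER was never coded; the helper names are hypothetical. With the counts above, women account for 45,188 / (45,188 + 418,876) = 45,188 / 464,064, or roughly 9.7%, of the person-years that have a GENDER value.

```python
from collections import Counter


def gender_counts(rows: list[dict]) -> Counter:
    """Tally person-years by GENDER; blank or absent values are grouped as 'MISSING'."""
    return Counter((row.get("GENDER") or "").strip().upper() or "MISSING" for row in rows)


def share(counts: Counter, key: str, exclude: tuple[str, ...] = ("MISSING",)) -> float:
    """Fraction of coded person-years carrying `key`, ignoring the excluded labels."""
    coded = sum(n for label, n in counts.items() if label not in exclude)
    return counts[key] / coded if coded else 0.0


# Using the two totals reported above:
print(round(share(Counter({"F": 45_188, "M": 418_876}), "F"), 3))  # 0.097
```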
I do not mean to be a social commentator; I am not very good at it. However, we are at an inflection point in society. The world is confronted with some very complex problems, and it seems to me that these problems are not going to go away unless we engage everyone who is capable of bringing part of the solution to the table. The gender imbalance in this data bothers me because I have never believed that being male should move me to the front of the line. Yet it is hard to imagine ending up with a distribution like this unless others believed that only men should be able to do certain things.
If you lined up all of the people in the world based on some attribute, I would struggle to imagine any meaningful attribute where the ‘best’ were dominated by men. Of course, I am the guy who was knocked out by a girl in sixth grade (let me tell you, I never crossed her again!). But seriously, my first boss was Mrs. Ittenbach, who owned a Dairy Queen in Odenton, Maryland. She taught me how to mop floors so clean you could eat off of them, and she gave me insights on how to treat people with fairness. There has never been a time in my life when I did not know a woman who was smarter than me. There were very important women in my Ph.D. program, both faculty and fellow students. There have been women leaders at every college where I have worked. Most importantly, in every single class I have ever taught, there have been as many women who are clearly capable of great things as there are men. My neighbor (Dr. E.) and I trade Wordle and Quordle results each morning, and Dr. E. takes great pleasure in crushing me. (If you haven’t figured it out, Dr. E. is a woman, and I am trying to say she is smarter than me.)
As the father of an amazing son, I certainly don’t want him denied opportunities because of his gender. Dr. E. is the mother of two amazing daughters, and I know she feels the same way about them (one of her daughters is an intern for us, so I know how amazing they are). Yet this data makes it hard to deny that there are still likely systemic issues affecting opportunities for an important part of our population.
Those of you who are math geniuses have probably already noticed that the total of F plus M does not equal the total number of observations. We have roughly 9,000 observations without a value for GENDER in our database, because we only started adding GENDER to the compensation data some time after we started collecting it.
We will first have to establish whether there is an engineering solution (how many of these can we code for?), but there will be plenty we can’t resolve that way. The problem is that most of these are going to be from the early years, around the time the SEC mandated ownership reporting through EDGAR (which is how we get the PERSON-CIK), and they will be people who did not remain Named Executive Officers long enough to end up in our PERSON-CIK archive. We made a bad decision early on not to collect GENDER if we did not have a PERSON-CIK. Fortunately, we have been working through the consequences of that decision for a while. It may seem easy, but there are too many names that provide no GENDER clue. We will try an engineering solution first, and then the balance will be parceled out to the interns to work on when they are waiting on another task.
I have had two questions about access to this data in the past week. One person was able to get what they wanted from the old request system, and for the other we ran a query and provided the results. If you have a pressing need to use this data, let me know and we can work something out. We just need more testing before we are ready to push it out to everyone. I will say that we will release it before we have fully addressed the missing GENDER issue; we can and will update it periodically.
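One possible first engineering pass can be sketched as a backfill: any row missing GENDER whose PERSON-CIK appears elsewhere with a coded GENDER inherits that value, and whatever is left over is queued, sorted by YEAR, for manual review. This is a hypothetical illustration of the idea, not the process we will actually run, and by design it does nothing for rows without a PERSON-CIK, which is exactly the group described above.

```python
from collections import defaultdict


def backfill_gender(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Fill blank GENDER values from other rows sharing the same PERSON-CIK.

    Returns (updated_rows, still_missing). A PERSON-CIK is only used when all of
    its coded rows agree, so conflicting records are left for a person to check.
    """
    known: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        cik, gender = row.get("PERSON-CIK"), row.get("GENDER")
        if cik and gender:
            known[cik].add(gender)

    updated, still_missing = [], []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        cik = row.get("PERSON-CIK")
        if not row.get("GENDER") and cik and len(known.get(cik, ())) == 1:
            row["GENDER"] = next(iter(known[cik]))
        if not row.get("GENDER"):
            still_missing.append(row)  # no PERSON-CIK, or no coded match
        updated.append(row)
    still_missing.sort(key=lambda r: r.get("YEAR", ""))
    return updated, still_missing
```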
I would be remiss if I did not address the other problematic diversity issue: racial and ethnic diversity, where many groups of people are even more underrepresented among leaders than women are. The problem is that this data is much harder to access. While NASDAQ has developed new board diversity disclosures, these are being phased in, and the requirements do not specify a mapping from a person to a diversity attribute. Further, they do not require disclosures about the diversity of company leadership. While some companies provide a mapping between board members and various diversity attributes, many rely instead on a schedule like the following:

I used our TableSnipper to pull the available tables based on the presence of the words White, Female, and Asian, and then again using the terms Caucasian and Gender. That seemed to cover the gamut of available tables. A large number of significant public companies do not have any racial or ethnic diversity on their board. I looked at this particular company’s executive leadership, and it reflected the (lack of) diversity of the board, more or less.
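TableSnipper is our own tool, so the snippet below is not its implementation. It is a minimal sketch of the same keyword idea, assuming each candidate table is available as plain text keyed by a hypothetical table id, and that a table qualifies when every term in a search set appears in it as a whole word (case-insensitive). Running the two term sets and taking the union mirrors the two passes described above.

```python
import re

# The two keyword passes described above.
TERM_SETS = [("White", "Female", "Asian"), ("Caucasian", "Gender")]


def matches_all(table_text: str, terms: tuple[str, ...]) -> bool:
    """True if every term appears as a whole word in the table text (case-insensitive)."""
    return all(
        re.search(rf"\b{re.escape(term)}\b", table_text, flags=re.IGNORECASE)
        for term in terms
    )


def find_diversity_tables(tables: dict[str, str], term_sets: list[tuple[str, ...]]) -> set[str]:
    """Return the ids of tables that match at least one term set."""
    return {
        table_id
        for table_id, text in tables.items()
        if any(matches_all(text, terms) for terms in term_sets)
    }
```

Whole-word matching keeps a word like "Whitepaper" from counting as "White"; the trade-off is that variants such as "Whites" would be missed unless added as separate terms.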

I have probably gone off the deep end here. My initial goal was to provide an update on our EC transition. The update is a little nebulous, but the important point is this: if you need access before we are ready to port it out, we can probably work something out. Another important point is that it really is kind of cool to access the data so directly. I did get a bit lost in the lack of FEMALE representation at the highest levels of public companies, and then in the evidence of a lack of racial/ethnic diversity on boards. It is going to be interesting to take advantage of some of these new disclosures to explore and test more granular hypotheses about how diversity adds meaningful value.