Interesting Issue – We need to start thinking about our GENDER indicator.

I was introducing a new intern to one of our internal tools that we use for data processing. We have a number of dashboards that are populated with data when there is a missing value or if the system populates a field with an unexpected value. One of those dashboards is triggered when we are processing compensation data. If a new director or executive has not been included in any prior filing we are not likely to have a GENDER value. In this case the data shows up in a queue for someone to review and code.

When one of the team has to review a filing to identify the right code to use we provide them a link to the document in a dashboard with a place to enter the value they determine is correct. We have developed some proprietary tools to scan the document to identify the use of some specific person titles (Mr, Mrs, Ms) and the names that follow those title words. In addition, the parts of sentences that contain personal pronouns are included in the dashboard with the referent names. If there are multiple titles or pronouns associated with one name (MR. KEALEY is the husband of MRS. KEALEY) these are flagged.
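The title scan described above can be sketched roughly as follows. This is an illustration only, not the proprietary tool: the regular expression, the mapping from title to M/F, and the function names are all assumptions, and matching on surname alone is deliberately crude.

```python
import re
from collections import defaultdict

# Hypothetical pattern: an honorific (mixed or upper case) followed by a
# capitalized surname. The real scanner is proprietary and surely richer.
TITLE_RE = re.compile(r"\b(Mr|Mrs|Ms|MR|MRS|MS)\.?\s+([A-Z][A-Za-z'-]+)")

# Assumed mapping from honorific to the codes used in the GENDER field.
TITLE_TO_GENDER = {"mr": "M", "mrs": "F", "ms": "F"}


def scan_titles(text):
    """Return {SURNAME: set of gender codes implied by the titles found}."""
    found = defaultdict(set)
    for title, surname in TITLE_RE.findall(text):
        found[surname.upper()].add(TITLE_TO_GENDER[title.lower()])
    return dict(found)


def flag_conflicts(found):
    """Surnames linked to more than one gender code need human review."""
    return sorted(name for name, codes in found.items() if len(codes) > 1)


found = scan_titles("MR. KEALEY is the husband of MRS. KEALEY.")
print(flag_conflicts(found))  # ['KEALEY']
```

The KEALEY example is flagged because the same surname appears with both a male and a female title, so a person has to decide which title belongs to the director in question.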

My intern wondered if we were making a mistake by limiting our search to those honorifics/titles. He became even more concerned when I explained the process we followed when we were not able to identify the GENDER using the dashboard. In those cases, when there is no gender-related title or gender-explicit personal pronoun (he, she, her, him, his) associated with a person in a filing (yes, it happens more than you would believe), we Google for a reference or image of the person. We make a determination based on a search result that contains the name of the person and the name of the entity that made the filing. Historically we have thought it made our job easier if one of the search results has a picture of the person (presumably we did not find a picture in the proxy).

If you haven’t sorted out the problem yet, his question was – is this appropriate in a world where people are willing/prefer to identify as something other than the binary classification we use?

We try to add an indicator of GENDER to allow our research clients to test hypotheses on the association between various dimensions of firms and the characteristics of their executives and board. If we don’t sort out how to identify those executives and directors who believe that the binary classification of their personage does not reflect some dimension of their identity we are failing to provide the right data. That was an awkward sentence but it reflects the truth and the problem.

While this is important I am stymied at the moment about how to move forward. I did a search in proxy (DEF 14A) filings made since 2005 for words that I thought would help identify these cases (gay, queer, lesbian, non-binary, LGBTQ). This seemed like a reasonable start to me – however a recent article in the New York Times made me less than confident that I had enough knowledge to create the right search (more on that later).

My first search was limited to filings made from 2005 to 2015. I found only 324 documents from 169 companies. There were only 135 documents from 69 companies where the context indicated that the finding was something other than a Mr. or Mrs. Gay or an address (Gay Avenue, Gay St.). What was even more interesting was that there was no mention of directors with one or more of these attributes. In all of the filings where these words appeared the context was generally a statement about the registrant’s view of human rights and the value they place on a diverse workforce. The only other context for these words was when the words were included in shareholder proposals. I found no language used to indicate a person might prefer a non-binary classification.
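Separating the 324 raw hits from the 135 relevant ones is a context check. A minimal sketch of that kind of filter is below. The search terms come from this post, but the two false-positive rules (an honorific just before the word, a street word just after it) and all of the names in the code are assumptions for illustration. It would not catch a surname that appears without a title.

```python
import re

# Search terms taken from the post; matching is case-insensitive.
TERMS_RE = re.compile(r"\b(gay|queer|lesbian|non-binary|LGBTQ)\b", re.IGNORECASE)

# Hypothetical false-positive contexts: an honorific right before the term
# (a surname) or a street word right after it (an address).
SURNAME_BEFORE = re.compile(r"\b(Mr|Mrs|Ms|Dr)\.?\s+$", re.IGNORECASE)
STREET_AFTER = re.compile(r"^\s+(Avenue|Ave|Street|St|Road|Rd)\b", re.IGNORECASE)


def relevant_hits(text):
    """Return matched terms that are not obviously a surname or an address."""
    hits = []
    for m in TERMS_RE.finditer(text):
        before = text[max(0, m.start() - 20):m.start()]  # short look-behind window
        after = text[m.end():m.end() + 20]               # short look-ahead window
        if SURNAME_BEFORE.search(before) or STREET_AFTER.match(after):
            continue
        hits.append(m.group(0))
    return hits
```

A document with an empty result from `relevant_hits` would drop out of the review set; anything else still needs a person to read the context.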

However, when we move forward to the filings made in 2016 – 2020 some filers started indicating that some of their directors were LGBTQ. Usually this was disclosed in a Diversity matrix. One interesting thing about this disclosure is that it is not consistent across filers, even if they have a common board member. Specifically I have found examples where one board has a category to indicate which members of the board identify as members of the LGBTQ community – another filer with the same board member does not provide any indication about this characteristic of their board members. I will also observe that 2 registrants began providing a diversity matrix in 2019 that includes a non-binary classification related to gender. So far in 2021 (through 3/25) there are 4 registrants that have included this dimension in their diversity matrix. Despite the addition of this dimension to these matrices there is no indication of a director using this classification in any of the examples I could find.

This is something we are going to have to think about. Note – the only information I have found so far is just an indication that some board member identifies as LGBTQ. I have not yet identified information that would indicate a board member would classify themselves as other than F/M.

In an earlier paragraph I observed that I ran a search using words that I consider relevant. There was a really interesting New York Times article in the paper on 3/21 (the article first appeared online on 3/15), “Who Is Making Sure the A.I. Machines Aren’t Racist?” I think I have read this article four times now. It is relevant to this problem – how can I/we authoritatively use language to classify a person who identifies as non-binary? This is kind of tricky. I keep telling myself that our first step is to focus on the personal pronouns used in text that describes the person. We can code for that and bring it up for review whenever there is more than one gender indicator or there are none that we are currently familiar with. The article though reminds me that we can get this wrong and need to be humble about the steps we intend to follow.
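The pronoun rule in the paragraph above (code when the indicators agree, send to review when there is more than one or none we recognize) might look something like this. It is a sketch under assumptions: the pronoun list is the one given earlier in this post, the M/F mapping and the REVIEW marker are hypothetical, and the passage is assumed to have already been tied to a single person.

```python
import re

# Pronouns listed in the post; the mapping to M/F codes is an assumption.
PRONOUNS = {"he": "M", "him": "M", "his": "M", "she": "F", "her": "F"}
WORD_RE = re.compile(r"[A-Za-z]+")

REVIEW = "REVIEW"  # hypothetical marker that sends the case to a person


def code_from_pronouns(passage):
    """Return 'M' or 'F' when every recognized pronoun agrees, else REVIEW."""
    words = (token.lower() for token in WORD_RE.findall(passage))
    codes = {PRONOUNS[word] for word in words if word in PRONOUNS}
    if len(codes) == 1:
        return codes.pop()
    # Zero recognized pronouns, or conflicting ones: do not guess.
    return REVIEW
```

Note that a passage using only pronouns outside the list lands in review rather than being forced into M or F, which is the humble default the article argues for.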

My current plan is to anticipate more nuanced disclosures about this attribute of board members and executives. Once we start finding evidence that is relatively unambiguous about diversity I think I am going to try to reach out and communicate with the filer to confirm our interpretation of their language. If we receive some positive confirmation then we will begin expanding the values used in the GENDER field.

I thought about putting up some example images of the disclosures we have found. I have decided that the people making these disclosures are intentionally making them to readers of the proxy – they are not necessarily making an announcement to the world. Thus I have decided not to provide examples at least until I can have a conversation with one or more of the directors who have made/supported the disclosure choices made by their companies.

A related problem is whether or not we propagate a disclosure choice made by one director to all of the registrants they are associated with. Imagine director FIRST_NAME LAST_NAME is identified as a member of the LGBTQ community and prefers the use of XIE as disclosed or apparent in proxy filings by FIRST_COMPANY in 2021. They are also a director of SECOND_COMPANY in 2018 – 2021 but the proxies for SECOND_COMPANY make no mention of this attribute. In the proxies for SECOND_COMPANY all references to FIRST_NAME LAST_NAME are made using either male or female pronouns. Do we change the GENDER from M/F in the data for SECOND_COMPANY or do we honor the disclosure choices they made in SECOND_COMPANY filings?

Bottom line is – this is now on our radar. I will indicate once we start needing to expand the values we report in the GENDER field. Right now I can’t imagine pointing to specific entities or people or even the documents where we drew the data to make our inferences. However, I am always happy to engage with you and take feedback on the classification decisions we ultimately make.

Missing PERSON-CIK and related metadata

During our normalization processes for Director Compensation we try to match the named directors with existing data to add the PERSON-CIK, SEC-NAME, AGE and SINCE (a measure of tenure). If we cannot match the data using our automatic processes the data table is shunted aside so a person can review and attempt to match manually. If we are not successful after reviewing the filing, prior filings and other source documents then we add some signal to the field to indicate the values we could not identify.
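As a rough illustration of that flow, here is a sketch that matches on a normalized name and sets unmatched rows aside with a marker in each field. Everything in it is hypothetical: the `NAME` key, the `N/A` marker, and the function names are invented for the example, and an exact name match is only a stand-in for the automatic process. As the Certara discussion below shows, a real match requires more than a name.

```python
# Hypothetical marker for a value that could not be identified.
NOT_AVAILABLE = "N/A"
MATCH_FIELDS = ("PERSON-CIK", "SEC-NAME", "AGE", "SINCE")


def normalize(name):
    """Uppercase a name and drop periods and commas so simple variants compare equal."""
    return " ".join(name.upper().replace(".", " ").replace(",", " ").split())


def match_directors(rows, known_people):
    """Fill MATCH_FIELDS from known_people; return (matched, needs_review)."""
    index = {normalize(person["NAME"]): person for person in known_people}
    matched, needs_review = [], []
    for row in rows:
        person = index.get(normalize(row["NAME"]))
        if person is None:
            # No automatic match: mark every field and set the row aside.
            needs_review.append({**row, **{f: NOT_AVAILABLE for f in MATCH_FIELDS}})
        else:
            matched.append({**row, **{f: person[f] for f in MATCH_FIELDS}})
    return matched, needs_review
```

Rows in `needs_review` are the ones a person would then try to resolve from the filing, prior filings and other source documents.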

Certara Inc (CIK 1827090) filed their first 10-K on March 15, 2021. This filing included DC and EC data. We are missing values for some of the people listed in their DC table. Certara filed a draft registration statement in October 2020 and an S-1 in November. We have learned that many pre-IPO directors do not follow the registrant into the public market. In Certara’s case two directors who received compensation in 2020 resigned before the filing of their initial S-1 and so there is limited information about some attributes of the directors other than their names in the S-1 filing and in their initial 10-K. If directors resign before the S-1 filing and their holdings are less than 5% they have no reporting obligation under Section 16. In these cases we try to discover if the individuals had some reporting obligation because of their relationship with another entity. This requires more than just a name match though.

In the case of Certara we can identify data for William E Klitgaard since he has a history on EDGAR. He has served as a director of Syneos Health since 2017. In his biography Syneos makes an explicit mention of his position on Certara’s board: “Mr. Klitgaard currently serves on the board of directors and Audit Committee at Certara, a leading drug development consultancy . . .” This explicit mention will allow us to update the DC data of Certara with Mr. Klitgaard’s CIK.

However, Certara’s other director that resigned before their IPO was Edmundo Muniz. Their S-1 filing indicates that Dr. Muniz was their CEO from 2014 until early 2020. There is not enough additional biographical information about him that would allow us to link him to another SEC registrant. In cases like this we would then search EDGAR for his name and then try to find some additional evidence to link him to the reporting company (Certara). But in this case there are no Munizes in EDGAR whose profile comes close to matching the profile that we would expect for Dr. Muniz. Therefore, in this case we will be able to add his GENDER and his tenure (SINCE). We will not be able to add values for PERSON-CIK, SEC-NAME or AGE.

Periodically we will inspect EDGAR for cases like Dr. Muniz. Specifically we will look for name matches on EDGAR and if we find one we will then review the filings for the registrant to try to determine if the Dr. Muniz associated with that registrant is the same as the Dr. Muniz mentioned in Certara’s filings. It is much easier when they make an explicit mention of the other company (Certara in this case) in the biography of Dr. Muniz. Because he was not a director when Certara was a public company they are not obligated to make mention of that fact. If there is a close name match and the other company operates in a related industry we will use Google to try to determine if the individuals are the same or not. Until we can do that though the fields will continue to indicate that data is not available.

There are cases where a director might be affiliated with one or more public companies for long periods of time but never trigger a Section 16 filing obligation. We see this most often with banks and then foreign entities that are cross-listed into the US. We also see it when the director is the board representative/proxy for some security holder. In that last case the filings usually indicate that all compensation is passed through to the named director’s employer (the security holder). I’ve been meaning to research the legal basis for the limited reporting by foreign executives and board members to learn why they are exempt. I’ll save that for another time.

New MetaData Testing – Available Now

With our filings now hosted in the cloud we can add new metadata to filings without having to ship out 6.0 TB hard drives to our clients. We are running a test right now with the addition of a number of new fields. This initial test is limited to 10-K filings made between 1/1/2016 and 12/31/2020 that are not amended. This test index is now available (10KTAGTEST Y2016-Y2020).

The fields we are adding in this initial test are described in the schedule below:

Field Name | Description
ACCEPTANCE | DATE/TIME of EDGAR system recording the ACCEPTANCE (but not the dissemination) of the filing. This value is in the form YYYYMMDDHHMMSS where hour is based on a 24 hour time representation. You will note that we historically have used the dissemination date so any filings made after YYYYMMDD1730 have an associated RDATE one business day later.
FILERCATEGORY | The entity reported filer category as defined by the SEC (see table below for codes).
COMMONSTOCKSHARESOUTSTANDING | The number of common shares outstanding as of the latest practical date. This is the value reported on the cover sheet and is reported in shares.
COMMONSTOCK_DATE | The date as reported by the registrant that the number of common shares were calculated.
PUBLICFLOAT | The label perhaps is not precise. This is the market value of all shares held by non-affiliates as of the last business day of the registrant’s second quarter.
FLOAT_DATE | This is the date the public float was measured.
ICFRAUDIT | An indication as to whether or not the auditor of the financial statements also issued an opinion on the internal controls over financial reporting. The values are TRUE/FALSE. This flag is initially only available for registrants who adopted the new 10-K form in 2020. While compliance was required for all 10-K filings made after 4/27/2020 we have noticed a number of registrants who did not conform to the requirements. This tag will initially be included only in filings where the registrant met their reporting obligation though we expect to backfill once our testing is complete.
Table Describing New Metadata Tags added to 10-K filings
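To make the relationship between the acceptance timestamp and RDATE concrete, here is a small sketch of the 17:30 rule described in the schedule. It rests on several assumptions that are not stated in this post: RDATE is written as YYYYMMDD, a filing accepted on a weekend rolls to the next business day, and market holidays are ignored. The function names are invented for the example.

```python
from datetime import datetime, time, timedelta

CUTOFF = time(17, 30)  # filings accepted after 17:30 roll to the next business day


def next_business_day(day):
    """Next weekday after `day`; holidays are ignored in this sketch."""
    day += timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def rdate_from_acceptance(acceptance):
    """Derive a dissemination date from a YYYYMMDDHHMMSS acceptance value."""
    stamp = datetime.strptime(acceptance, "%Y%m%d%H%M%S")
    day = stamp.date()
    # Assumption: a weekend acceptance is treated like an after-hours one.
    if stamp.time() > CUTOFF or day.weekday() >= 5:
        day = next_business_day(day)
    return day.strftime("%Y%m%d")


print(rdate_from_acceptance("20210315160000"))  # 20210315 (Monday, before cutoff)
print(rdate_from_acceptance("20210319180000"))  # 20210322 (Friday evening rolls to Monday)
```

A production version would need the SEC holiday calendar; this only shows why the two dates can differ for the same filing.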

The values for FILERCATEGORY and their meaning are:

CODE | MEANING
LAF | Large Accelerated Filer
SRC | Smaller Reporting Company
AF | Accelerated Filer
NAF | Non-Accelerated Filer
SRAF | Smaller Reporting Accelerated Filer
Explanation of Codes used in NEW FILERCATEGORY Tag
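If you work with the extracted values in your own scripts, the code table translates directly into a lookup. This is just a convenience sketch; the dictionary and function names are made up, and the fallback text for an unrecognized code is an assumption.

```python
# Codes and meanings copied from the table above.
FILER_CATEGORIES = {
    "LAF": "Large Accelerated Filer",
    "SRC": "Smaller Reporting Company",
    "AF": "Accelerated Filer",
    "NAF": "Non-Accelerated Filer",
    "SRAF": "Smaller Reporting Accelerated Filer",
}


def describe_filer_category(code):
    """Translate a FILERCATEGORY code; unknown codes get a neutral label."""
    return FILER_CATEGORIES.get(code.strip().upper(), "Unknown code")


print(describe_filer_category("srAF"))  # Smaller Reporting Accelerated Filer
```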

Having these codes embedded in the filings now allows you to use these in your searches when appropriate. For example – suppose you want to find all 10-K filings made by Accelerated Filers who reported that they included an ICFR-Audit from their auditor. To include metadata in a search we use the fields button on the application to select the appropriate field. We need to specify two fields for the search described above. First, we want to select the FILERCATEGORY field and specify AF for Accelerated Filer.

Selecting a field to use in a search.

Once we type in AF for Accelerated Filer and hit the OK button the search pane in the application will populate with (FILERCATEGORY contains(AF)). To also limit the search to those filings that included an ICFR audit indication on the face of their 10-K we need to add the AND operator. Then use the fields button to select the ICFRAUDIT field and enter the word TRUE (valid values for this field are TRUE/FALSE).

Selecting the ICFRAUDIT field to use in a search.


After entering TRUE and then hitting the OK button our search is now (FILERCATEGORY contains(AF)) and (ICFRAUDIT contains(TRUE)). So we are looking for any 10-K filing made by an accelerated filer that indicated their auditor performed an audit of internal controls.
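If you keep a list of searches to rerun, the same expression can be assembled as text and pasted into the search pane. The helper below only reproduces the syntax shown above; the function names are invented, and it does not call or automate the application in any way.

```python
def field_clause(field, value):
    """Build one metadata clause in the form shown in the search pane."""
    return f"({field} contains({value}))"


def all_of(*clauses):
    """Join clauses with the `and` operator, as in the example search."""
    return " and ".join(clauses)


query = all_of(field_clause("FILERCATEGORY", "AF"),
               field_clause("ICFRAUDIT", "TRUE"))
print(query)  # (FILERCATEGORY contains(AF)) and (ICFRAUDIT contains(TRUE))
```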

Whenever you perform a search and create a SummaryExtraction or a ContextExtraction the application always includes all of the available metadata from the filings that were returned in your search. Therefore you don’t have to specify the metadata in the search – all of the available metadata is included as columns in the output file. My search returned 16 documents. I created a SummaryExtraction from this search. If you want to review those results follow this link (Example SummaryExtraction)
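Because the metadata arrives as columns, you can also slice a broader extraction after the fact instead of rerunning the search. The sketch below assumes the output file is a CSV whose header row uses the metadata field names; the file format, the `CIK` column used in the example call, and the function name are assumptions rather than a description of the actual output.

```python
import csv
import io


def rows_where(csv_text, **wanted):
    """Return rows whose metadata columns equal all of the wanted values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if all(row.get(column) == value for column, value in wanted.items())]


# Made-up sample rows standing in for an extraction file.
sample = "CIK,FILERCATEGORY,ICFRAUDIT\n1,AF,TRUE\n2,LAF,TRUE\n3,AF,FALSE\n"
hits = rows_where(sample, FILERCATEGORY="AF", ICFRAUDIT="TRUE")
print(len(hits))  # 1
```

With a real file you would read its contents into `csv_text` first; comparing on exact equality also avoids any ambiguity between codes that share letters, such as AF and LAF.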

Our processor code for the filings has historically added the same metadata to every document that is included in a filing. So all of the exhibits associated with a 10-K filing had the same tags as the 10-K filing. In this initial test we are only tagging the parent document (the 10-K) with the additional metadata. The exhibits will continue to include the existing metadata. For testing purposes the only documents in the 10KTAGTEST index folder are 10-K filings (no amendments or exhibits). The original 10-K and exhibits are still in the usual place. Your feedback will help us determine whether we should add additional details to the individual exhibits.

To access this index from your instance – follow the steps described in the help to update your index libraries (File\Options, Index Library, Generate Index Library). Remember – you don’t have to do any browsing – when the Main Index Options is visible the Main Index Library Folder path should be visible – hit the Generate Library button.

Index Library Selection Tool – once visible – confirm the path exists in the Main Index Library Folder and hit Generate Library

Unfortunately there is a catch. We have 3,300 or so 10-Ks filed in this window for which we cannot yet authoritatively confirm one or another item of metadata. This seems to be mostly related to those cases where the registrant has two or more classes of common stock. Our challenge is matching the name of the common stock class with the shares outstanding. Right now we have identified some cases where we are matching the wrong name to the count. This is a critical issue and one we are still working on. This problem is why we are still running in test mode rather than in production. I am hoping we do not have to resort to manual matching. In those cases we have decided not to add metadata for those particular attributes yet. So you will see cases where the public float is reported but there is no data about the shares outstanding.

Finally – please remember that these values are reported by the registrant. There are bound to be registrant errors. This is a new area for us and our immediate focus is on capturing the values as reported. I have a very significant degree of confidence that when the registrant reports it is an ACCELERATED FILER we have captured that correctly. Whether or not they are is another issue. Also remember that the meaning of some metadata values will change across time. The authoritative source for an explanation of any specific terms/fields is always the SEC’s rule making disclosures.

Your feedback on additional metadata to include will be appreciated. As I was working on this I had a conversation with a client who expressed an interest in having a hyperlink to the document on EDGAR. This is the second time I have heard that this could be beneficial so we will add this field in the next iteration (less than a month I hope). So if you think some other fields would be useful please let us know.