A client recently contacted me about making data from Form N-PX more available. Form N-PX is used by institutional managers to report how they voted their shares on proposals put to shareholders. These reports have been mandated by the SEC since 2004. Until this year they were filed as txt or htm documents with significant variation in the structure of the document. We have periodically looked at them, but since there is always something else to do we have not invested significant resources in normalizing these filings.
This year the SEC required the voting records for each security to be reported in XML, so it became much easier to parse these filings. In addition to revamping the form of the disclosure, the SEC significantly increased the number of filers that are now required to report at least their votes on Say-On-Pay issues. Because of the increase in reporting obligations there were almost 9,500 N-PX filings filed by the deadline (8/30) this year, compared to roughly 4,000 per year in prior years.
If you want to jump to see the Field Guide for this database – use this link to take you to the Knowledge Base (KB LINK)
I have parsed these and loaded them into a new database (npxdata) that is now available on the platform. Remember, I take constructive criticism well. I decided to omit all reporting of securities that do not have a CUSIP. I did this because the database is large and I am not certain of the value of having a compilation of the reports on non-US entities when there is no underlying CUSIP available. I made the point about constructive criticism because we can make the data from these other securities available – I will wait for a prompt and discussion to make sure this will be valuable. The data has been parsed already, so putting it on the platform will not take significant effort. If there is enough interest we may just make both available but separated based on the available identifiers.
Here is a screenshot of the results of a search for reported votes against the Say-On-Pay proposals.
Forgive the repetition here – I am having too much fun exploring this. Here are the results of a search to identify all proposals related to environmentOrClimate:

The initial release of this data relies on the CUSIP. I need to take another step to merge with the CIK. That might take a bit of time, and since CUSIP is available in other databases you might merge with, I decided to do this initial release without the CIK.
Some of the ways the data is displayed/delivered are not very efficient. Here is a screenshot of the original data that CAPTRUST FINANCIAL ADVISORS reported as it is displayed in the rendered html table on EDGAR. This data relates to their votes on the advisory vote to approve executive compensation at the 2024 annual meeting for The 3M Company.
There are 26 entries reporting votes as they relate to different tranches of shares, but for the life of me I cannot sort out why they were reported in this manner. The total number of votes against is 68,847, which matches the number of shares that they report as having voted, but I can’t understand why it was useful in their mind to report those individually. At the present time each of those individual votes has been preserved in our database as a separate line. We include all of the other metadata about this vote for each line. Once I am confident we are not losing any useful data it is likely that we will consolidate all of these into one row. (I will observe that there are no included managers associated with this filing and there is no series information.) It would be interesting to understand the process by which these votes were cast – did they in fact fill out a separate proxy card for each entry because these shares are held in separate accounts?
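If you want to consolidate tranche-level rows yourself in the meantime, here is a minimal sketch against a toy in-memory table. Only voteDescription and voteRecord_howVoted are field names taken from the database; filerName, cusip and sharesVoted are hypothetical column names used for illustration, and the rows are made up. Check the Field Guide for the real names before adapting it.

```python
import sqlite3

# Toy stand-in for the npxdata table. filerName, cusip and sharesVoted are
# HYPOTHETICAL column names; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE npxdata (
        filerName TEXT,
        cusip TEXT,
        voteDescription TEXT,
        voteRecord_howVoted TEXT,
        sharesVoted INTEGER
    )
    """
)
tranches = [
    ("FILER A", "000000000", "Advisory vote on executive compensation", "AGAINST", 1000),
    ("FILER A", "000000000", "Advisory vote on executive compensation", "AGAINST", 2500),
    ("FILER A", "000000000", "Advisory vote on executive compensation", "AGAINST", 1000),
    ("FILER B", "000000000", "Advisory vote on executive compensation", "FOR", 700),
]
conn.executemany("INSERT INTO npxdata VALUES (?, ?, ?, ?, ?)", tranches)

# One row per filer / security / proposal / vote direction, keeping a count
# of the original tranche rows so nothing is silently lost.
consolidated = conn.execute(
    """
    SELECT filerName, cusip, voteDescription, voteRecord_howVoted,
           COUNT(*) AS tranche_count,
           SUM(sharesVoted) AS total_shares
    FROM npxdata
    GROUP BY filerName, cusip, voteDescription, voteRecord_howVoted
    ORDER BY filerName
    """
).fetchall()
conn.close()

for row in consolidated:
    print(row)
```

Keeping tranche_count next to the summed shares makes it easy to see which consolidated rows came from many individual entries.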
One really nice feature of having this data is that it is now trivial to identify specific proposals and the nature of their sponsorship. The SEC had curated a list of vote categories and these are a required disclosure for each item that is reported. Here is the current list (there are 14 items).
N-PX filers are required to assign at least one and as many as 14 codes to each vote that they report. Yes, there are some cases where 14 codes were assigned to one vote issue. From my perspective, this database can be used to identify those filers that had a voting issue that you might want to explore. This makes it easier to identify, for example, all voting issues related to ENVIRONMENT OR CLIMATE. We removed the spaces and converted the text to camel case, so DIRECTOR ELECTIONS is now directorElections. We did this so that each category could become a field with 1/Null values. The screenshot above that has the results of the environmentOrClimate search shows one of these fields in use.
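The conversion described above can be sketched in a few lines. This is an illustration of the idea, not the code used to build the database; the function names are hypothetical, and labels that contain digits or hyphens may need extra handling, so this simple rule is not guaranteed to reproduce every field name.

```python
import re

def to_camel_case(label: str) -> str:
    """Convert a label such as 'DIRECTOR ELECTIONS' to 'directorElections'."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        return ""
    first, rest = words[0].lower(), words[1:]
    return first + "".join(w[0].upper() + w[1:].lower() for w in rest)

def flag_columns(assigned, all_categories):
    """Return {field_name: 1 or None} for one reported vote.

    assigned: category labels the filer attached to this vote.
    all_categories: every label that should become a field.
    """
    assigned_names = {to_camel_case(a) for a in assigned}
    return {
        to_camel_case(c): (1 if to_camel_case(c) in assigned_names else None)
        for c in all_categories
    }

# Only the two labels mentioned in this post are used here.
CATEGORIES = ["DIRECTOR ELECTIONS", "ENVIRONMENT OR CLIMATE"]
flags = flag_columns(["ENVIRONMENT OR CLIMATE"], CATEGORIES)
print(flags)
```

Python's None is stored as Null when a row like this is written to SQLite, which gives the 1/Null pattern.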
The SEC mandated that the N-PX filer report the sponsor of each proposal. The field in the N-PX filing is voteSource, and we preserved that field name. The valid values are ISSUER and SECURITY HOLDER. Thus any proposal that has SECURITY HOLDER as its voteSource was sponsored/initiated by a shareholder.
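Combining voteSource with one of the category fields gives a quick way to isolate shareholder-sponsored proposals on a topic. The sketch below runs against a toy in-memory table with invented rows; the field names are the ones mentioned in this post, and I am assuming the values are stored exactly as SECURITY HOLDER and 1.

```python
import sqlite3

# Toy table with invented rows; field names are those mentioned in the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE npxdata (
        voteDescription TEXT,
        voteSource TEXT,
        environmentOrClimate INTEGER,
        voteRecord_howVoted TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO npxdata VALUES (?, ?, ?, ?)",
    [
        ("Toy climate report proposal", "SECURITY HOLDER", 1, "FOR"),
        ("Toy issuer climate plan", "ISSUER", 1, "FOR"),
        ("Toy director election", "ISSUER", None, "FOR"),
    ],
)

# Shareholder-sponsored proposals that were tagged environmentOrClimate.
query = """
    SELECT voteDescription, voteRecord_howVoted
    FROM npxdata
    WHERE voteSource = ?
      AND environmentOrClimate = 1
"""
shareholder_climate = conn.execute(query, ("SECURITY HOLDER",)).fetchall()
conn.close()

print(shareholder_climate)
```

To try the same query on the platform, connect to the npxdata.db path used in the code sample later in this post instead of ":memory:" and skip the table creation.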
Filers are also supposed to report the number of shares they have control over that are on loan. My phrasing may leave a bit of confusion because control is a somewhat nebulous term in this context. Nonetheless there is a field for SHARES ON LOAN (sharesOnLoan in the db). Institutional investors may lend shares and also direct the votes for those shares, so those cases do exist. Shares may be on loan, but if the investor directed the voting then those votes will be recorded.
One challenge for our initial users is going to be sorting out how to aggregate these by proposal (voteDescription in the database). The SEC’s language is fairly explicit: “If a form of proxy in connection with a matter is subject to rule 14a-4 under the Exchange Act [17 CFR 240.14a-4], the description in Item 1(f) must: (i) use the same language that is on the form of proxy to identify the matter; . . .”
Here is the exact language of a proposal in Apple’s proxy:
I pulled all of the details of votes relating to proposals in Apple’s proxy for their annual meeting that took place on 2/28/2024. My expectation was to find the language EEO Policy Risk Report. However, I found the following variations of that language:
- A shareholder proposal entitled “EEO Policy Risk Report”
- A shareholder proposal entitled EEO Policy Risk Report .
- A shareholder proposal entitled ”EEO Policy Risk Report”
- A shareholder proposal entitled EEO Policy Risk Report .
- Report on Risks of Omitting Viewpoint and Ideological Diversity from EEO Policy
- A SHAREHOLDER PROPOSAL ENTITLED EEO POLICY RISK REPORT". "
Clearly we can clean a lot of this up, and we will. Now that this data is in a database we can view it and start developing rules to help normalize these labels, but that will take some time, thought and extensive testing. The paraphrase/translation used by the investor in the fifth row (highlighted in red) will be especially challenging to normalize at scale. We could not normalize these until we could see them, so that step will happen. I am going to presume that it is more valuable for you to have this data now rather than wait for the arduous process of confirming the edits that we need to make.
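As a first pass at the kind of rule mentioned above, here is a minimal sketch that lower-cases each description, drops quotes and punctuation, and collapses whitespace. It is an illustration only, not the normalization we will ship, and the function name is hypothetical. It collapses the punctuation and case variants of the Apple example but, as expected, does nothing for the paraphrase.

```python
import re

def normalize_description(text: str) -> str:
    """Case-fold, replace anything that is not a-z or 0-9 with a space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.casefold()).strip()

# The six variants listed above, as reported by different filers.
variants = [
    "A shareholder proposal entitled “EEO Policy Risk Report”",
    "A shareholder proposal entitled EEO Policy Risk Report .",
    "A shareholder proposal entitled ”EEO Policy Risk Report”",
    "A shareholder proposal entitled EEO Policy Risk Report .",
    "Report on Risks of Omitting Viewpoint and Ideological Diversity from EEO Policy",
    'A SHAREHOLDER PROPOSAL ENTITLED EEO POLICY RISK REPORT". "',
]

# Group the raw descriptions under their normalized key.
groups = {}
for raw in variants:
    groups.setdefault(normalize_description(raw), []).append(raw)

for key, members in groups.items():
    print(len(members), key)
```

Five of the six variants end up under one key; the paraphrase stays on its own and would need something beyond string cleanup, such as matching on meeting date, CUSIP and proposal order.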
Here is a code sample that you can use to pull all rows from the database where section14ASayOnPayVotes = 1 and voteRecord_howVoted = 'AGAINST':
import sqlite3
import csv

# Define the database path and output CSV file
DB_PATH = r"S:\directEDGAR_DATA\npxdata.db"
output_csv = r"D:\PhotonUser\My Files\Temporary Files\filtered_data.csv"

# Initialize so the finally block is safe even if connect() fails
conn = None
try:
    # Connect to the SQLite database
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()

    # Execute the query to select rows with the specified conditions
    query = """
    SELECT * FROM npxdata
    WHERE section14ASayOnPayVotes = 1
    AND voteRecord_howVoted = 'AGAINST'
    """
    cursor.execute(query)

    # Fetch all matching rows
    rows = cursor.fetchall()

    # Get the column names from the cursor description
    column_names = [description[0] for description in cursor.description]

    # Open the CSV file for writing
    with open(output_csv, mode='w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=column_names)
        writer.writeheader()
        for row in rows:
            try:
                # Create a dictionary mapping column names to row values
                row_dict = dict(zip(column_names, row))
                # Write the row to the CSV file
                writer.writerow(row_dict)
            except Exception as e:
                # Report the raw row and keep going with the next one
                print(f"Error writing row: {row} - {e}")

    print(f"Data successfully written to {output_csv}")
except sqlite3.Error as e:
    print(f"Database error: {e}")
except Exception as e:
    print(f"Error: {e}")
finally:
    # Close the database connection
    if conn:
        conn.close()
For your information – I was in a hurry so I asked ChatGPT for the code above. The prompt I provided was:
1. I have an sqlite database located at "S:\directEDGAR_DATA\npxdata.db"
2. There is a table in the database named npxdata
3. There is a field named section14ASayOnPayVotes that has a value of 1 or Null
4. There is another field named voteRecord_howVoted that has a value of FOR or AGAINST.
5. I want some python code to select all rows from the database where section14ASayOnPayVotes= 1 and voteRecord_howVoted = AGAINST -
6. I want to save them to a csv file named r"D:\PhotonUser\My Files\Temporary Files\filtered_data.csv"
7. I want to use Pythons CSV library - not pandas
8. I do not know the names of the fields so I want the code to find the columns in-order from the database
9. I want the rows to be written using the DictWriter method.
10. I want to write each line individually have some graceful error handling in case there is a problem writing any particular row.
The new form makes access to this data more efficient and enhances the nature of the disclosures. We hope pushing it into the database helps accelerate your research. Our next step is to add the CIK and then we will work on normalizing the voteDescription. Look in the Knowledge Base for a description of all of the fields.



