Finally! (Well, Come Monday)

Lookie here – all of this metadata will be available on directEDGAR beginning on 8/30/2021:

<meta name="SICODE" content="7370">
<meta name="FYEND" content="1231">
<meta name="CNAME" content="ALPHABET INC.">
<meta name="FILINGDATE" content="20200204">
<meta name="ACCEPTANCE" content="20200203210359">
<meta name="ZIP" content="94043">
<meta name="DOCTYPE" content="10K">
<meta name="SECPATH" content="https://www.sec.gov/Archives/edgar/data/1652044/000165204420000008/goog10-k2019.htm">
<meta name="AddressCityOrTown" content="MOUNTAIN VIEW">
<meta name="CurrentReportingStatus" content="YES">
<meta name="SmallBusiness" content="FALSE">
<meta name="WellKnownSeasonedIssuer" content="YES">
<meta name="EmergingGrowthCompany" content="FALSE">
<meta name="FilerCategory" content="LAF">
<meta name="ShellCompany" content="FALSE">
<meta name="AddressStateOrProvince" content="CA">
<meta name="VoluntaryFilers" content="NO">
<meta name="PublicFloat" content="663000000000.0">
<meta name="FloatDate" content="2019-06-28">
<meta name="CommonStockSharesOutstanding_1" content="299895185">
<meta name="ShareDate" content="2020-01-27">
<meta name="SecurityName_1" content="CommonClassA">
<meta name="CommonStockSharesOutstanding_2" content="46411073">
<meta name="SecurityName_2" content="CommonClassB">
<meta name="CommonStockSharesOutstanding_3" content="340979832">
<meta name="SecurityName_3" content="CapitalClassC">
<meta name="AUDITOR" content="ERNST & YOUNG">
<meta name="REPORT_DATE" content="2/3/2020">
<meta name="LOCATION_CITY" content="SAN JOSE">
<meta name="LOCATION_STATE" content="CALIFORNIA">
<meta name="SINCE" content="1999">

There is a lot of noise above (and I don’t think Jimmy Buffett is noise). What is it? It is the new metadata block that was added to the 10-K Alphabet Inc. submitted to the SEC on 2/4/2020. This has taken longer than I had hoped. One reason for the delay is that the number of clients asking how to identify the auditor prompted me to do some research. I learned that the availability of auditor data is spotty prior to the disclosure of audit fees beginning in 2001/2002. I know a number of our clients were using our search tools to identify the auditor, so I decided it was worth the effort to add the information we could about the auditor.
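If you ever want to read this block straight out of a document rather than through the platform, here is a minimal sketch using Python’s standard html.parser module. The class and function names (MetaTagCollector, read_metadata) are my own for illustration and are not part of directEDGAR; the sample string is a short excerpt of the block above.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, not attribute values.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name is not None and content is not None:
            self.fields[name] = content


def read_metadata(html_text):
    """Return the metadata fields found in html_text (hypothetical helper)."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.fields


sample = """
<meta name="CNAME" content="ALPHABET INC.">
<meta name="FILINGDATE" content="20200204">
<meta name="FilerCategory" content="LAF">
"""
fields = read_metadata(sample)
print(fields["CNAME"], fields["FILINGDATE"], fields["FilerCategory"])
```

Every value comes back as a string, so dates and share counts still need converting before you do arithmetic with them.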

I think the most common request before this was either the accession number of the source file (accession.txt) or the path to the documents returned from a search. I will be happy to take feedback if you feel the accession number should be added as a direct piece of metadata. However, in balancing everything, I have initially decided against it: in the SECPATH value the accession number follows the CIK and comes right before the actual file name (000165204420000008), so it is easy enough to parse out of either a SummaryExtraction or ContextExtraction product created by the platform.
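Here is a rough sketch of that parsing step in Python. It assumes the SECPATH value always has the layout shown in the example above (.../edgar/data/CIK/ACCESSION/file name); the function names are mine, for illustration only.

```python
from urllib.parse import urlparse


def split_secpath(secpath):
    """Return (cik, accession) from a SECPATH value.

    Assumes the path layout .../edgar/data/<CIK>/<ACCESSION>/<file name>.
    """
    parts = urlparse(secpath).path.strip("/").split("/")
    data_index = parts.index("data")
    return parts[data_index + 1], parts[data_index + 2]


def dashed_accession(accession):
    """Format an 18-digit accession number in EDGAR's 10-2-6 dashed style."""
    return f"{accession[:10]}-{accession[10:12]}-{accession[12:]}"


secpath = (
    "https://www.sec.gov/Archives/edgar/data/1652044/"
    "000165204420000008/goog10-k2019.htm"
)
cik, accession = split_secpath(secpath)
print(cik, accession, dashed_accession(accession))
```

The dashed form is the one EDGAR uses when it names the full submission text file.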

When available in a document, all of this metadata is accessible if you do a search and then do a SummaryExtraction of the results. Clearly some fields, such as ACCEPTANCE, are not likely to be useful for searching. Some, though, are likely to be very useful for constructing your search – remember that disclosure requirements vary by FilerCategory. For instance, a LARGE ACCELERATED FILER (LAF) has an accelerated filing schedule relative to other filers (its 10-K is due 15 days sooner than an Accelerated Filer’s) and carries the largest disclosure burden of all filers.

To access the new fields, press the “fields” button. The Select Field box will become available, which allows you to populate the Value field in the interface. While you can use as many fields as you’d like in a search, you have to add them one at a time. This image shows the Select Field tool for the new Y2016-Y2020 index:

[Image: Field Listing Full 10K]

The case of the listed fields is an indicator of the source of the field. If the field name is all upper case – we generated the value from some artifact during the processing or outside the system. If the field name is a sequence of words with the first letter in each word capitalized – we captured the field value from the filing in more or less an automated process.
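If you are working with the field names in your own code, that convention is easy to test for. This is just a sketch; the labels "generated" and "captured" are my shorthand for the two sources described above, not values the platform reports.

```python
def field_source(field_name):
    """Classify a field by the case convention of its name.

    All upper case (underscores allowed) -> value generated during processing.
    Anything else (CapitalizedWords)     -> value captured from the filing.
    """
    return "generated" if field_name.isupper() else "captured"


for name in ("FYEND", "REPORT_DATE", "FilerCategory", "CommonStockSharesOutstanding_1"):
    print(name, field_source(name))
```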

We are adding less metadata to the exhibits included with the 10-K. Basically, we will be leaving out everything after the SECPATH value. Here is the metadata that is going to be embedded in the first Exhibit 10 filed with the 10-K referenced above:

<meta name="SICODE" content="7370">
<meta name="FYEND" content="1231">
<meta name="CNAME" content="ALPHABET INC.">
<meta name="FILINGDATE" content="20200204">
<meta name="ACCEPTANCE" content="20200203210359">
<meta name="ZIP" content="94043">
<meta name="DOCTYPE" content="EX10">
<meta name="SECPATH" content="https://www.sec.gov/Archives/edgar/data/1652044/000165204420000008/googexhibit10081.htm">

One consequence of not including the full 10-K field set in the exhibits is that it is not yet possible to directly search for EXHIBIT 10s that were included in the 10-K filings of a LARGE ACCELERATED FILER. Instead, you would first run a search for (DOCTYPE contains(10K)) and (FilerCategory contains (LAF)). This will identify all 10-K filings made by LAFs. Do a summary extraction, save that file and pull the CIKs. Then use that CIK list to do a CIK-filtered search for (DOCTYPE contains(EX10)), and your search will only return EXHIBIT 10s that were included in a 10-K filing made by a LARGE ACCELERATED FILER.
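The “pull the CIKs” step can be scripted. The sketch below assumes the saved summary extraction is a CSV file with a column named CIK; that column name, the file layout and the second sample row are assumptions for illustration, so adjust them to match your actual file.

```python
import csv
import io


def unique_ciks(csv_file, cik_column="CIK"):
    """Return the distinct CIKs in a CSV file object, in first-seen order.

    cik_column is an assumed column name; leading zeros are dropped so that
    0001652044 and 1652044 are treated as the same registrant.
    """
    seen = set()
    ciks = []
    for row in csv.DictReader(csv_file):
        cik = row[cik_column].strip().lstrip("0")
        if cik and cik not in seen:
            seen.add(cik)
            ciks.append(cik)
    return ciks


# Stand-in for a saved summary extraction; the EXAMPLE CO row is hypothetical.
sample = io.StringIO(
    "CIK,CNAME,FilerCategory\n"
    "1652044,ALPHABET INC.,LAF\n"
    "0001652044,ALPHABET INC.,LAF\n"
    "9999999,EXAMPLE CO,LAF\n"
)
cik_list = unique_ciks(sample)
print(cik_list)
```

With a real file you would pass open("your_file.csv", newline="") in place of the StringIO object and then feed the resulting list to the CIK-filtered search.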

The following table contains a list of the metadata labels as well as their definitions. If you are a current client you will receive an email that includes a third column (SOURCE) which describes how the value was determined.

METALABEL | DEFINITION
SICCODE | Standard Industrial Classification Code (as assigned by the SEC)
FYEND | Fiscal Year End for the most recent balance sheet included in the 10-K
CNAME | Company Name
FILINGDATE | The date the filing was submitted to EDGAR
ACCEPTANCE | The date-time stamp associated with the acceptance of the filing by the EDGAR system
DOCTYPE | The registrant is required to classify each document in a filing – this tag indicates the classification of the document assigned by the registrant.
SECPATH | The full path to the filing on EDGAR
VoluntaryFilers | YES/NO to indicate if the registrant is making this filing on a voluntary basis.
SmallBusiness | TRUE/FALSE to indicate if the registrant meets the SEC definition of a Small Business Filer
ShellCompany | TRUE/FALSE to indicate if the registrant meets the SEC definition of a Shell Company
FilerCategory | The registrant’s conclusion regarding its classification per the SEC’s filer category classification definitions
FilerCategory | LAF – Large Accelerated Filer
FilerCategory | AF – Accelerated Filer
FilerCategory | NAF – Non-Accelerated Filer
FilerCategory | SRC – Smaller Reporting Company
FilerCategory | SRAF – Smaller Reporting Accelerated Filer (this is actually not a valid classification but it has been used by a number of registrants – there are 55 10Ks with this self-reported label)
EmergingGrowthCompany | TRUE/FALSE to indicate if the filer meets the definition of an emerging growth company
WellKnownSeasonedIssuer | YES/NO to indicate whether or not the filer meets the definition of a Well Known Seasoned Issuer
CurrentReportingStatus | YES/NO to indicate if the registrant is current on its mandated filing obligations
AddressStateOrProvince | The State or Province of the headquarters of the filer
AddressCityOrTown | The name of the City or Town of the headquarters of the filer
ZIP | The ZIP/POSTAL code component of the address of the filer
PublicFloat | The aggregate market value of the shares of the registrant held by non-affiliates as of the last day of the registrant’s most recent second quarter (if multiple float values are reported we sum them to maintain consistency – we do validation checks to catch the cases where a total is reported as well as the float for each class)
FloatDate | The date used to determine the public float
CommonStockSharesOutstanding_1 | The number of shares reported as outstanding – if there are multiple share types/classes reported this is the first listed.
ShareDate | The measurement date which is the latest practical date (closest to the filing date) of the 10-K
SecurityName_1 | If provided, the name of the security whose common shares outstanding are listed as CommonStockSharesOutstanding_1
CommonStockSharesOutstanding_2 | The number of shares reported as outstanding – if there are multiple share types/classes reported this is the security listed second.
SecurityName_2 | If provided, the name of the security whose common shares outstanding are listed as CommonStockSharesOutstanding_2
CommonStockSharesOutstanding_3 | The number of shares reported as outstanding – if there are multiple share types/classes reported this is the security listed third.
SecurityName_3 | If provided, the name of the security whose common shares outstanding are listed as CommonStockSharesOutstanding_3
IcfrAuditorAttestationFlag | TRUE/FALSE to indicate whether the filing includes an attestation by the/an external auditor on the Internal Controls over Financial Reporting
AUDITOR | The name of the auditor of the most recent financial statements (conforming to the FYEND tag)
REPORT_DATE | The audit report date
LOCATION_CITY | The city location of the auditor
LOCATION_STATE | The state/country of the auditor.
SINCE | The tenure of the auditor, reported as the year the auditor began serving the registrant

This is only the beginning of our work on improving the opportunity to add fields to the filings. Right now the team has auditor details back to 2007 – we will collect all the way to 1994. We have identified a way to add an ACCEPTANCE datetime stamp to the earlier filings. While it is not available in the index or hdr files for most filings prior to about 2002, we have determined how to identify a very good proxy for this value. We will be redoing the 2021 10-K filings in the next week or so to add in the same metadata, and then we will redo the 2010 – 2015 10-K filings. We have also been working on supplementing the self-reported IcfrAuditorAttestationFlag field and are close to being able to add this value to a significant number of filings.

Details matter a lot. We have standardized the auditor names but we have not yet made any attempt to roll back auditor name changes. We cannot consider doing this until all of the auditor data has been collected. If you are an existing client and need the mappings for the auditor name standardization, send me an email. We will not be fully populating all of the SINCE fields until all of the auditor data has been collected. We tried to use some algorithms to add this value when it was not reported but we were not happy with the results. Specifically, there are too many cases of auditors reporting this value as the year they signed an engagement letter with the registrant – not the first year they audited financial statements.

Another “details matter a lot” point: some registrants report more than three classes of stock. One reported nine values for CommonStockSharesOutstanding (indicating 9 classes of securities). But when we analyzed these we discovered that once we moved past 3 the results were fairly dicey with respect to trying to use the reported values in a meaningful way. If you are interested – we identified 41 unique CIKs that reported more than 3 classes of securities. Here is a link to the interactive data presentation for one (Strategic Student Housing & Senior Trust Inc). The only registrant I could identify where I thought this value was useful was Molson Coors (great product!), but adding two additional columns that would only be meaningful for one CIK seemed like too much for a starting point. We can revisit this based on your feedback.
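For the classes we do report, the numbered suffix ties each share count to its security name. Here is a small sketch of pairing them up, using the Alphabet values from the block at the top of this post. The function name and the assumption that you already have the fields in a dict of strings are mine.

```python
import re


def share_classes(fields):
    """Pair each CommonStockSharesOutstanding_N with SecurityName_N.

    Returns a list of (security_name, shares) in class order; security_name
    is None when no matching SecurityName_N field was provided.
    """
    found = []
    for key, value in fields.items():
        match = re.fullmatch(r"CommonStockSharesOutstanding_(\d+)", key)
        if match:
            n = int(match.group(1))
            found.append((n, fields.get(f"SecurityName_{n}"), int(value)))
    found.sort(key=lambda item: item[0])
    return [(name, shares) for _, name, shares in found]


alphabet = {
    "CommonStockSharesOutstanding_1": "299895185",
    "SecurityName_1": "CommonClassA",
    "CommonStockSharesOutstanding_2": "46411073",
    "SecurityName_2": "CommonClassB",
    "CommonStockSharesOutstanding_3": "340979832",
    "SecurityName_3": "CapitalClassC",
}
classes = share_classes(alphabet)
total_shares = sum(shares for _, shares in classes)
print(classes, total_shares)
```

Whether summing across classes is meaningful depends on the registrant, so treat the total as a convenience rather than a reported value.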

We are doing something else that is pretty cool but I think I need to keep it under my hat for the moment. I just don’t want you to think this is it. There is always more!

As an aside one of the things I would still like to do is to push all of this field data into a database – make an isolated copy of the database and then provide you with the tools to access it for a data analytics class. I look at some of the stuff that is being used and I wonder whether or not those databases give students a real sense of the complexity and ‘messiness’ of real data. For example – there is no category SMALLER REPORTING ACCELERATED FILER – we only learn it exists by testing the data and establishing that some filings did not have a valid value for Filer Category. I could imagine creating a database that combines this metadata with other data (such as compensation) to give students a richer experience working with real financial data. Poke me if this interests you.
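To give a flavor of the kind of test I mean, here is a sketch of tallying FilerCategory values and flagging anything outside the valid codes from the table above (LAF, AF, NAF, SRC). The sample values are made up for illustration and are not counts from our data.

```python
from collections import Counter

# Valid codes per the table above; SRAF is self-reported but not valid.
VALID_FILER_CATEGORIES = {"LAF", "AF", "NAF", "SRC"}


def audit_filer_categories(values):
    """Count each reported category and separate out the unexpected ones."""
    counts = Counter(value.strip().upper() for value in values)
    unexpected = {
        category: n
        for category, n in counts.items()
        if category not in VALID_FILER_CATEGORIES
    }
    return counts, unexpected


# Hypothetical values, as they might appear in a column of extracted metadata.
reported = ["LAF", "AF", "laf", "SRAF", "NAF", " SRC "]
counts, unexpected = audit_filer_categories(reported)
print(counts["LAF"], unexpected)
```

Students would discover SRAF the same way we did: by noticing that the unexpected bucket is not empty.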

I will send a direct email to you as soon as this is complete. As of 5:45 PM 8/28 – this is the current status of the indexing operation:
