Search Operators

The following search operators can be used to build the search you need across our archive.

Basic Operators

and

The and operator requires that a document contain the terms or phrases on both sides of the operator, so you need at least two terms or phrases. Note: to find a phrase that itself contains the word and, append a tilde (~) to that word. For instance, audit and finance committee will find all documents that have the word audit and the phrase finance committee. The search audit and~ finance committee will find all documents that have the phrase audit and finance committee.

and not

The not operator must immediately follow the and operator. The search requires that documents have the terms or phrases to the left of and not and none of the terms or phrases to the right of it. To use the word not in a search as a term or in a phrase, append a tilde (~) to the word (not~).

or

The or operator returns documents that meet the criteria on either the left or the right of the operator. As with our other operators, if you want to use the word or as a search term or in a phrase, append a tilde (or~).
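
If you assemble searches in a script, the tilde rule for and, or and not can be applied automatically. Here is a minimal Python sketch; the helper name escape_phrase and the reserved-word list are my own illustration (not part of the platform), and the list covers only the three operator words described above.

```python
# Hypothetical helper, not a platform feature. The reserved-word list
# reflects only the three operator words described above.
RESERVED = {"and", "or", "not"}


def escape_phrase(phrase: str) -> str:
    """Append a tilde to operator words so they are read as literal words."""
    return " ".join(
        word + "~" if word.lower() in RESERVED else word
        for word in phrase.split()
    )


print(escape_phrase("audit and finance committee"))
```

The printed search is audit and~ finance committee, which matches the phrase example given for the and operator.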

pre/n

The pre/n operator returns all documents where the term or phrase on the left appears before the term or phrase on the right, within n word tokens. I use this often when I want to find a specific type of phrase. For example, to find the frequency of audit committee meetings I might create a very narrow search: audit committee pre/3 meetings. With that search I will only see documents in which the phrase audit committee appears no more than three words before the word meetings.

w/n

While the pre/n operator sets order, the w/n operator simply requires that the terms or phrases on the right be within n words (before or after) of the terms or phrases on the left.
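
To make the difference between pre/n and w/n concrete, here is a rough Python sketch of the two tests applied to a single string. It is only an illustration of the logic, not our engine: the tokenizer (lowercase runs of letters and digits) and the way distance is counted (from the last word of the left phrase to the first word of the right phrase) are assumptions, and the function names are hypothetical.

```python
import re


def tokenize(text):
    # Assumption: a token is a lowercase run of letters or digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens, phrase):
    """Return (start, end) token indexes for each occurrence of phrase."""
    words = tokenize(phrase)
    k = len(words)
    return [
        (i, i + k - 1)
        for i in range(len(tokens) - k + 1)
        if tokens[i:i + k] == words
    ]


def pre_n(text, left, right, n):
    """True if left ends no more than n tokens before right starts."""
    tokens = tokenize(text)
    return any(
        0 < right_start - left_end <= n
        for _, left_end in positions(tokens, left)
        for right_start, _ in positions(tokens, right)
    )


def w_n(text, left, right, n):
    """True if the two phrases are within n tokens, in either order."""
    return pre_n(text, left, right, n) or pre_n(text, right, left, n)


sentence = "The audit committee held four meetings during the year."
print(pre_n(sentence, "audit committee", "meetings", 3))
print(pre_n(sentence, "meetings", "audit committee", 3))
print(w_n(sentence, "meetings", "audit committee", 3))
```

The second call fails because the order is reversed, while the w/n version accepts either order.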

More Unique Operators

xfirstword

The xfirstword operator finds the first word in a document. One example of where this can be helpful is when you want to limit contracts (Exhibit 10s) by their nature. A simple application would be to specify xfirstword pre/100 credit agreement. This would require the phrase credit agreement to appear within 100 words of the first word in the document.

xlastword

The xlastword operator finds the last word in a document. Again, this helps anchor your search to a particular location in the document. One user reported that it helped them identify the CFO by searching for Chief Financial Officer pre/100 xlastword when they were searching Exhibit 32s.
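
The anchoring idea behind xfirstword and xlastword can be sketched in a few lines of Python. This is an approximation under my own assumptions: tokens are runs of letters and digits, a phrase that starts on the first word (or ends on the last word) counts as a match, and the function names are hypothetical.

```python
import re


def tokenize(text):
    # Assumption: a token is a lowercase run of letters or digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def contains(window, words):
    k = len(words)
    return any(window[i:i + k] == words for i in range(len(window) - k + 1))


def near_first_word(text, phrase, n):
    """Rough analogue of: xfirstword pre/n phrase."""
    tokens, words = tokenize(text), tokenize(phrase)
    # The phrase may start no later than token index n.
    return contains(tokens[: n + len(words)], words)


def near_last_word(text, phrase, n):
    """Rough analogue of: phrase pre/n xlastword."""
    tokens, words = tokenize(text), tokenize(phrase)
    # The phrase may end no earlier than n tokens before the last token.
    return contains(tokens[-(n + len(words)):], words)


top = "EXHIBIT 10.1 CREDIT AGREEMENT dated as of March 1, 2023"
end = "Date: May 1, 2023 /s/ Jane Doe Jane Doe Chief Financial Officer"
print(near_first_word(top, "credit agreement", 100))
print(near_last_word(end, "Chief Financial Officer", 100))
```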

andany

The andany operator allows you to find, and importantly account for, noisy phrases that might be returned in your search without excluding relevant possible matches. Since the recent cases that come to mind relate to some clever work by one of our clients, I am going to provide a more general case. Suppose you are looking for documents that describe competition. One of the challenges is that you don’t want the phrase less competition. The first option to consider would be the search competition and not less competition. However, the and not condition is really strong: it excludes any document that mentions less competition, even if that document also has other discussions of competition. The search competition andany less competition will instead highlight the cases where less competition is in the same document as competition. Here is an example of a document that would be excluded if you relied on the and not search.

The summary results file will report the instances of competition and the instances of less competition.
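
The trade-off between and not and andany can be illustrated with a small Python sketch that counts both phrases in two made-up documents. The sample text, the function names and the counting rule (each less competition hit is also counted as a competition hit) are assumptions for illustration; the actual summary file may count differently.

```python
import re


def count_phrase(text, phrase):
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def summarize(text):
    total = count_phrase(text, "competition")
    noisy = count_phrase(text, "less competition")
    return {
        "competition": total,
        "less competition": noisy,
        # and not: drop the document if the noisy phrase appears at all.
        "kept_by_and_not": total > 0 and noisy == 0,
        # andany: keep the document and report the noisy hits alongside.
        "kept_by_andany": total > 0,
    }


# Hypothetical documents.
docs = {
    "doc_a": "We face intense competition. Competition from new entrants grew.",
    "doc_b": "Competition is strong in retail, though we see less competition abroad.",
}

for name, text in docs.items():
    print(name, summarize(text))
```

Here doc_b is lost under and not even though it has a relevant discussion of competition, while the andany style keeps it and lets you see the count of the noisy phrase.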

date(begin to end)

Our platform allows you to specify a date range as a search parameter. Type the word date followed immediately by an open parenthesis, a begin date (m/d/yyyy), the word to, an end date in the same format, and a closing parenthesis. For example, I ran this search on all 10-K filings and exhibits filed in 2023: date(2/1/2023 to 3/10/2023). The results looked like this:

We often use the date search to help collect data. Here is a post that describes using dates to help augment our identification of auditors, audit report dates and auditor locations: (Audit Report Dates)
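
If you generate many date searches (for example, one per month), a small helper can build the date(begin to end) text for you. This is a minimal sketch; the function name date_clause is hypothetical, and I am assuming the m/d/yyyy format is written without leading zeros, as in the example above.

```python
from datetime import date


def date_clause(begin: date, end: date) -> str:
    """Build a date(begin to end) clause in m/d/yyyy format."""
    if begin > end:
        raise ValueError("begin date must not be after end date")

    def fmt(d: date) -> str:
        # No leading zeros, matching the 2/1/2023 style shown above.
        return f"{d.month}/{d.day}/{d.year}"

    return f"date({fmt(begin)} to {fmt(end)})"


print(date_clause(date(2023, 2, 1), date(2023, 3, 10)))
```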

Numbers

Since we index numbers, they are searchable. We don’t index punctuation marks or special characters, but I often include them in my search because it seems more natural and makes it easier to validate my search results. For example, if I wanted to identify adoption of ASU 2023-09, Improvements to Income Tax Disclosures, I would search for ASU 2023-09. The results would be the same as searching for ASU 2023 09 because our parser will search for all instances of ASU adjacent to 2023 adjacent to 09. Note that our parser normalizes all spaces (ASUspacespace2023 will be normalized to ASUspace2023).
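
The effect of that normalization can be mimicked in Python. This sketch is my approximation of the behavior described above, not the parser itself: it treats anything that is not a letter or digit as a separator and collapses repeated spaces. It is meant for plain terms only, since operator characters such as ~ and = would also be stripped.

```python
import re


def normalize(term: str) -> str:
    # Assumption: punctuation and special characters act as separators,
    # and any run of separators collapses to a single space.
    return " ".join(re.findall(r"[A-Za-z0-9]+", term))


print(normalize("ASU 2023-09"))
print(normalize("ASU  2023"))
print(normalize("ASU 2023-09") == normalize("ASU 2023 09"))
```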

Numeric Patterns with = Operator

The = sign represents any digit (0-9). If I replaced the 09 in the previous example with ==, my search ASU 2023-== will return all instances of ASU 2023-##.
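
If you want to re-check matches outside the platform, an = pattern can be translated into a regular expression. The sketch below is an assumption-laden illustration: the function name is hypothetical, each = becomes one digit, and punctuation between tokens is treated as a generic separator.

```python
import re


def digit_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a search such as 'ASU 2023-==' into a compiled regex."""
    parts = []
    for token in re.findall(r"[A-Za-z0-9=]+", pattern):
        # Each = stands for exactly one digit; other characters are literal.
        parts.append(
            "".join(r"\d" if ch == "=" else re.escape(ch) for ch in token)
        )
    # Tokens must be adjacent, separated only by non-word characters.
    return re.compile(r"\b" + r"\W+".join(parts) + r"\b", re.IGNORECASE)


rx = digit_pattern_to_regex("ASU 2023-==")
print(rx.search("The Company adopted ASU 2023-07 early.").group(0))
print(rx.search("ASU 2023-9 is not a two-digit match"))
```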

The ability to search for number patterns provides significant benefits when used deliberately for data collection in conjunction with our Context Extraction feature. One example is collecting the CEO pay ratio. While there is no standard requirement for the exact form of this disclosure, registrants tend to disclose it in some fairly common patterns. Here is a screenshot of this search: employee pre/5( ==== 1 or === 1 or == 1 or ==== to 1 or === to 1 or == to 1):

The search I used specified that I wanted to identify all documents that had the word EMPLOYEE five words or fewer before a sequence of four, three, or two digits followed by a space (or a hyphen, a colon, or any other special character) and then the digit 1. The search also covers the case of four, three, or two digits followed by the word to and then the digit 1. I decided to highlight the results from Chevron since there were two matches within a short span of each other. One of the matches is relevant; the other is not as relevant. I am going to do a context extraction with these results (see Basic Extraction Features for a more complete explanation). I am going to set a relatively tight span of 3 words (3 words before the beginning of the search result and then 3 words after the last ‘word’ in the search result). Here is a screenshot of those results:

You can see the noise but you can also see the data that you want to collect. Either creative filtering in Excel or a Python script to operate on the CONTEXT column of the results file will easily help you identify and separate the useful data from the noise.
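
As a starting point for the Python route, here is a minimal sketch that reads a results file and pulls a ratio out of the CONTEXT column. Only the CONTEXT column name comes from the description above; the CIK column, the sample rows and the function name are made up for illustration. The regular expression deliberately keeps only the "to 1" and ":1" forms, since the bare-space form is where most of the noise tends to be.

```python
import csv
import io
import re

# Hypothetical results file; a real one would be read from disk.
SAMPLE = """CIK,CONTEXT
0001,the median employee was 184 to 1 for fiscal
0002,each employee on page 112 1 of the plan
0003,our median employee is approximately 96:1 based on
"""

# Two to four digits, then "to" or a colon, then the digit 1.
RATIO = re.compile(r"\b(\d{2,4})\s*(?:to|:)\s*1\b", re.IGNORECASE)


def extract_ratios(csv_text):
    """Return (CIK, ratio) pairs; ratio is None when no pattern is found."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = RATIO.search(row["CONTEXT"])
        rows.append((row["CIK"], int(match.group(1)) if match else None))
    return rows


for cik, ratio in extract_ratios(SAMPLE):
    print(cik, ratio)
```

Rows that come back as None are the ones to review by hand or drop as noise.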

n~~n

You can also search for numbers by specifying a numeric range. Our indexing process ignores most punctuation marks, including commas. I know past users have used this feature to more precisely identify disclosures in 10-K filings about advertising or research and development costs or the numbers of employees.

To use a numeric range in a search, enter the lower and upper bounds separated by two tildes (~~). Because the indexer ignores commas, decimals, and other symbols, the range 1~~999 should capture those cases that are relevant even if the number is greater than 999. For example, to find disclosures about employee counts, try this search: ((DOCTYPE contains(10K)) and employee w/30 1~~999).
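
One way to picture why 1~~999 can still catch larger numbers is to split a number at its punctuation and test each piece against the range. The Python sketch below does exactly that. It rests on my assumption that a comma or decimal point splits a number into separate digit tokens (so 12,500 is seen as 12 and 500); the function names are hypothetical.

```python
import re


def number_tokens(text):
    # Assumption: commas, decimals and symbols separate digit runs.
    return [int(t) for t in re.findall(r"\d+", text)]


def in_range(text, low, high):
    """Return the digit tokens in text that fall inside low..high."""
    return [n for n in number_tokens(text) if low <= n <= high]


print(in_range("we had approximately 12,500 employees", 1, 999))
print(in_range("we had 850 employees", 1, 999))
```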

Mail – Specifically Email Addresses

To find email addresses use mail() where the content inside the parentheses reflects the email pattern you want to find. I wanted to find all email addresses in proxies filed so far this year so I ran the following search.

mail(*@*.com)

Other patterns will yield different results. For example, mail(*@*.*.com) will return only those cases where the domain name has a dot separating multiple parts.
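
To preview which addresses a pattern such as *@*.*.com would pick up, you can approximate it with Python's fnmatch module. This is a sketch under the assumption that * in mail() behaves like a shell-style wildcard; the function name, the addresses and the broader *@*.com pattern used for contrast are made up for illustration.

```python
from fnmatch import fnmatchcase


def mail_matches(pattern: str, address: str) -> bool:
    # Assumption: * matches any run of characters, as in shell wildcards.
    return fnmatchcase(address.lower(), pattern.lower())


addresses = ["ir@acme.com", "ir@investor.acme.com"]

for address in addresses:
    print(
        address,
        mail_matches("*@*.com", address),
        mail_matches("*@*.*.com", address),
    )
```

Only the address with a multi-part domain passes the second pattern.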

directEDGAR

Search, Extraction & Normalization Engine