Raw SEC filings are distributed as a single SGML file. This function parses that master submission into its component documents, with each document's content lines stored in the list column 'TEXT'.

parse_submission(x, include.binary = T, include.content = T)

Arguments

x

- Input submission to parse. May be one of the following:

URI

URL to a SEC complete submission text file

Text

String with the full submission

File path

Path to local file containing the submission

include.binary

- Default TRUE, determines if the content of binary documents is returned.

include.content

- Default TRUE, determines if the content of documents is returned.

Value

A data frame with one row per document. Note that the metadata columns (TYPE, DESCRIPTION, FILENAME) are supplied by the filer and are subject to little standardization or enforcement.

SEQUENCE

Sequence number of the file

TYPE

The type of document, e.g. 10-K, EX-99, GRAPHIC

DESCRIPTION

The filer-provided description of the document, e.g. 8-K, EXHIBIT 99. May be empty.

FILENAME

The document's filename

TEXT

The text representation of the document. For text-based documents (txt, html) this is the actual file contents. For binary files (graphics, PDFs) this contains the uuencoded contents.
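Because TEXT is a list column, each element is a character vector of lines rather than a single string. The sketch below shows one way to pull a document out of the result and write it to disk. The `docs` data frame here is a hand-built stand-in with hypothetical values, used only so the example runs without network access; it is not real output from parse_submission.

```r
# Hand-built stand-in for the returned data frame (hypothetical values).
docs <- data.frame(
  SEQUENCE = c(1, 2),
  TYPE = c("8-K", "EX-99"),
  DESCRIPTION = c("8-K", "EXHIBIT 99"),
  FILENAME = c("main.htm", "exhibit99.htm"),
  stringsAsFactors = FALSE
)
docs$TEXT <- list(
  c("<html>", "<body>Main document</body>", "</html>"),
  c("<html>", "<body>Exhibit</body>", "</html>")
)

# Select a document by its filer-supplied TYPE.
exhibit <- docs[docs$TYPE == "EX-99", ]

# Each TEXT element is a vector of lines; join them to get one string.
exhibit_html <- paste(exhibit$TEXT[[1]], collapse = "\n")

# Or write the lines straight to a file named after FILENAME.
out_path <- file.path(tempdir(), exhibit$FILENAME[1])
writeLines(exhibit$TEXT[[1]], out_path)
```

Since TYPE and FILENAME come from the filer, matching on exact values such as "EX-99" may miss variants; a pattern match may be more forgiving in practice.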

Details

Most of the time, the information and specific files you need will be available through filing_documents, but there are scenarios where you may want to access the full contents of the master submission:

Old Submissions

Older submissions are not split into component documents by the SEC, so accessing them requires parsing the main filing.

Full Document List

The SEC only lists what it considers the relevant documents, but filings often include many more ancillary files.

Efficient Downloading

If you're fetching several documents per filing across many filings, downloading the single submission file for each filing can be more efficient than requesting every document separately.
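One way to get the single-download benefit is to keep a local copy of each submission file and only fetch it when it is missing. The following is a minimal sketch under that assumption; `cache_submission` and its `fetch` argument are hypothetical helpers written for this illustration and are not part of the package.

```r
# Download a submission file once and reuse the local copy afterwards.
# `fetch` is a replaceable function so the caching logic can be run offline;
# by default it uses utils::download.file.
cache_submission <- function(url, cache_dir = tempdir(),
                             fetch = function(url, destfile) {
                               utils::download.file(url, destfile,
                                                    mode = "wb", quiet = TRUE)
                             }) {
  # Name the local copy after the last path component of the URL.
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    fetch(url, dest)
  }
  dest
}
```

The returned path can then be passed as x to parse_submission, since a path to a local file is one of the accepted inputs.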

NOTE: non-text documents are uuencoded and need a separate decoder to be viewed.
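As an illustration of what such a decoder involves, here is a minimal base-R sketch of classic uudecoding. `uudecode_lines` is a hypothetical helper, not a function from the package, and it assumes the lines passed in still contain the usual `begin <mode> <name>` and `end` markers; whether a given TEXT element retains those markers should be checked against real output.

```r
# Decode a uuencoded document supplied as a character vector of lines.
# Returns a raw vector with the decoded bytes.
uudecode_lines <- function(lines) {
  lines <- sub("\r$", "", lines)  # tolerate CRLF line endings
  start <- grep("^begin [0-7]+ ", lines)
  if (length(start) == 0) stop("no uuencode 'begin' line found")
  start <- start[1]
  end <- grep("^end$", lines)
  end <- end[end > start]
  if (length(end) == 0) stop("no uuencode 'end' line found")
  body <- lines[seq_len(end[1] - start - 1) + start]

  decode_line <- function(line) {
    if (!nzchar(line)) return(integer(0))
    # Each character carries 6 bits: (code - 32) mod 64.
    codes <- (utf8ToInt(line) - 32L) %% 64L
    n <- codes[1]  # first character is the decoded byte count
    if (n == 0L) return(integer(0))
    needed <- 4L * ceiling(n / 3)
    # Pad with zeros in case trailing padding characters were stripped.
    vals <- c(codes[-1], rep(0L, needed))[seq_len(needed)]
    g <- matrix(vals, nrow = 4)
    # Repack every four 6-bit values into three 8-bit bytes.
    bytes <- rbind(
      g[1, ] * 4L + g[2, ] %/% 16L,
      (g[2, ] %% 16L) * 16L + g[3, ] %/% 4L,
      (g[3, ] %% 4L) * 64L + g[4, ]
    )
    as.vector(bytes)[seq_len(n)]
  }

  as.raw(unlist(lapply(body, decode_line)))
}
```

The result can be written out with writeBin(), for example to recover a GRAPHIC document under its FILENAME.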

Examples

# \donttest{
try(
  parse_submission(
    paste0('https://www.sec.gov/Archives/edgar/data/',
           '37996/000003799617000084/0000037996-17-000084.txt')
  )[, c('SEQUENCE', 'TYPE', 'DESCRIPTION', 'FILENAME')]
)
#>   SEQUENCE    TYPE DESCRIPTION                        FILENAME
#> 1        1     8-K         8-K       ceostrategicupdate8-k.htm
#> 2        2   EX-99  EXHIBIT 99    exhibit99ceostrategicupd.htm
#> 3        3 GRAPHIC             exhibit99ceostrategicupd001.jpg
#> 4        4 GRAPHIC             exhibit99ceostrategicupd002.jpg
#> 5        5 GRAPHIC             exhibit99ceostrategicupd003.jpg
#> 6        6 GRAPHIC             exhibit99ceostrategicupd004.jpg
#> 7        7 GRAPHIC             exhibit99ceostrategicupd005.jpg
# }