
// ANNUAL REPORTS

Annual Report Table Extractor

Extract every table from a company annual report into clean, structured rows and columns.

// EXAMPLE INPUT

command
$ extract annual_report_FY24.pdf

// EXAMPLE OUTPUT

output
Document: Annual Report FY24
Pages scanned: 284
Tables detected: 63
  - Balance Sheet           p.118
  - Profit & Loss           p.119
  - Cash Flow               p.121
  - Segment Revenue         p.144
  - Notes 1-42              p.130-218
Output: tables.xlsx (63 sheets) + sources.json

// EXTRACTION LOGIC

Layout-aware table detection runs across each page; multi-page tables are stitched on matching column headers. Header rows, units, currency, and footnote markers are preserved.
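As a rough illustration of the stitching step, the sketch below merges table fragments found on consecutive pages when their column headers match after whitespace and case normalization. The `TableFragment` type, the `normalize_header` and `stitch_fragments` helpers, and the "consecutive page plus equal header" rule are assumptions made for this example, not the extractor's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TableFragment:
    """Hypothetical container for one table piece detected on one page."""
    page: int                                  # PDF page the fragment was found on
    header: tuple                              # column header cells as extracted
    rows: list = field(default_factory=list)   # body rows, each a list of cell strings


def normalize_header(header):
    """Collapse whitespace and case so cosmetic differences do not block a match."""
    return tuple(" ".join(cell.split()).casefold() for cell in header)


def stitch_fragments(fragments):
    """Merge a fragment into the previous table when it sits on the next page
    and carries the same (normalized) column headers; otherwise start a new table."""
    stitched = []
    for frag in sorted(fragments, key=lambda f: f.page):
        prev = stitched[-1] if stitched else None
        continues = (
            prev is not None
            and frag.page == prev["pages"][-1] + 1
            and normalize_header(frag.header) == normalize_header(prev["header"])
        )
        if continues:
            prev["rows"].extend(frag.rows)
            prev["pages"].append(frag.page)
        else:
            stitched.append(
                {"header": tuple(frag.header), "rows": list(frag.rows), "pages": [frag.page]}
            )
    return stitched
```

This sketch only compares a fragment with the most recently started table, so it assumes at most one table continues across any page break; a fuller version would also compare units, as the FAQ describes.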

// SOURCE-LINKED OUTPUT

Every cell in the output Excel workbook carries a source reference (PDF page, table index, row/column coordinates), so any value can be traced back to its exact place in the original document.

anchor (per value)
{ file, page, table_id, row_id, cell_id, label, value, unit, period }
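To make the anchor shape concrete, here is a minimal sketch of how such a record could be modeled and serialized. Only the field names come from the anchor above; the field types, the `Anchor` class, the `to_sources_json` helper, and the list-of-objects layout of `sources.json` are assumptions, and the sample values are placeholders rather than figures from any real report.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Anchor:
    """One source reference per extracted value (field types are assumed)."""
    file: str
    page: int
    table_id: str
    row_id: str
    cell_id: str
    label: str
    value: float
    unit: str
    period: str


def to_sources_json(anchors):
    """Serialize anchors as a JSON array; ensure_ascii=False keeps symbols like ₹ readable."""
    return json.dumps([asdict(a) for a in anchors], ensure_ascii=False, indent=2)


# Placeholder values for illustration only.
example = Anchor(
    file="annual_report_FY24.pdf",
    page=118,
    table_id="t1",
    row_id="r1",
    cell_id="r1c1",
    label="Example line item",
    value=0.0,
    unit="₹ Cr",
    period="FY24",
)
print(to_sources_json([example]))
```

Keeping the anchor immutable (`frozen=True`) is a design choice for this sketch: a source reference should not change once a value has been extracted.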

// FAQ

Does it handle multi-page tables?

Yes. Tables that continue across pages are stitched into a single sheet by matching column headers and units.

Are units and currencies preserved?

Yes. Units and currencies (₹ Cr, ₹ Mn, USD Mn, %, bps) are kept in a dedicated metadata column so they are not lost during normalization.

What output formats are supported?

Excel (.xlsx) with one sheet per table, plus CSV and JSON. A sources.json file maps each cell back to its PDF coordinates.

// EARLY ACCESS

Get early access to the Annual Report Table Extractor

Paper Data is currently in private beta. Request access to start converting your financial documents into source-linked tables.