Document Layout Analysis is a part of Computer Vision indicating the process of identifying and categorizing the regions of interest in a document image, e.g. a scanned page. A reading system requires the segmentation of text zones from non-textual ones and the arrangement in their correct reading order. Detection and labeling of the different zones (or blocks) as text body, illustrations, math symbols, and tables embedded in a document is called geometric layout analysis. But text zones play different logical roles inside the document (titles, captions, footnotes, etc.) and this kind of semantic labeling is the scope of the logical layout analysis.
Document layout analysis is the union of geometric and logical labeling. It is typically performed before a document image is sent to an OCR engine, but it can be used also to detect duplicate copies of the same document in large archives, or to index documents by their structure or pictorial content.
Document layout is formally defined in the international standard ISO 8613-1:1989.
Read more about Document Layout Analysis: Layout Analysis Software
Famous quotes containing the words document and/or analysis:
“... research is never completed ... Around the corner lurks another possibility of interview, another book to read, a courthouse to explore, a document to verify.”
—Catherine Drinker Bowen (18971973)
“Ask anyone committed to Marxist analysis how many angels on the head of a pin, and you will be asked in return to never mind the angels, tell me who controls the production of pins.”
—Joan Didion (b. 1934)