Extraction of Text Regions from Complex Background in Document Images by Multilevel Clustering

Hoai Nam Vu; Tuan Anh Tran; Na In Seop; Soo Hyung Kim

doi:10.2991/ijndc.2016.4.1.2

Volume 4, Issue 1, January 2016, Pages 11 - 21

Authors

Hoai Nam Vu, Tuan Anh Tran, Na In Seop, Soo Hyung Kim

Corresponding Author

Hoai Nam Vu

Available Online 1 January 2016.

DOI: 10.2991/ijndc.2016.4.1.2
Keywords: Multilevel, K-means, Connected Component, Thresholding.
Abstract: Textual data plays an important role in applications such as image database indexing, document understanding, and image-based web searching. The goal of automatic text extraction from real-life document images, without a character recognition module, is to identify the image regions that contain only text. These textual regions can then either be passed to an optical character recognition application or highlighted to draw the user's attention. In this paper we propose a method consisting of three stages: preprocessing, which improves the contrast of the grayscale image; multi-level thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex background; and a heuristic filter and a recursive filter, which localize text within the textual regions. In many of these applications it is not necessary to identify all the text regions, so we emphasize identifying important text regions with relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying textual regions with various illuminations, sizes, and fonts against various types of background.
Copyright: © 2017, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
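
The multi-level thresholding stage named in the abstract and keywords can be illustrated with a minimal sketch: 1-D k-means (Lloyd's algorithm) applied to grayscale pixel intensities, so that each pixel is assigned to one of k intensity levels. This is not the authors' implementation. The function name `kmeans_multilevel`, the choice of k = 3, the evenly spaced initial centers, and the idea that the darkest cluster holds text are all assumptions made for illustration.

```python
import numpy as np


def kmeans_multilevel(gray, k=3, n_iter=50):
    """Cluster pixel intensities into k levels with 1-D k-means.

    Hypothetical helper for illustration; returns a label image
    (0 = darkest cluster) and the k cluster centers.
    """
    pixels = gray.astype(np.float64).ravel()
    # Assumption: deterministic start, centers spread evenly over the range.
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(n_iter):
        # Assign every pixel to its nearest center.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = pixels[labels == j]
            if members.size:  # an empty cluster keeps its old center
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break  # assignments are stable
        centers = new_centers
    return labels.reshape(gray.shape), centers


# Tiny synthetic "image" with dark, mid-gray, and bright pixels.
demo = np.array([[20, 22, 128], [130, 230, 232]], dtype=np.uint8)
demo_labels, demo_centers = kmeans_multilevel(demo, k=3)
# Assumption: text is dark on a lighter background, so cluster 0 is the candidate.
text_candidate_mask = demo_labels == 0
print(demo_centers)
print(text_candidate_mask)
```

In a full pipeline, a mask like `text_candidate_mask` would still need connected-component analysis and filtering to discard graphics and background fragments, which is the role the abstract gives to the heuristic and recursive filters.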

Journal: International Journal of Networked and Distributed Computing
Volume-Issue: 4 - 1
Pages: 11 - 21
Publication Date: 2016/01/01
ISSN (Online): 2211-7946
ISSN (Print): 2211-7938

Cite this article

TY  - JOUR
AU  - Hoai Nam Vu
AU  - Tuan Anh Tran
AU  - Na In Seop
AU  - Soo Hyung Kim
PY  - 2016
DA  - 2016/01/01
TI  - Extraction of Text Regions from Complex Background in Document Images by Multilevel Clustering
JO  - International Journal of Networked and Distributed Computing
SP  - 11
EP  - 21
VL  - 4
IS  - 1
SN  - 2211-7946
UR  - https://doi.org/10.2991/ijndc.2016.4.1.2
DO  - 10.2991/ijndc.2016.4.1.2
ID  - Vu2016
ER  -