To provide a technique that, when OCR is applied to a business document stored in grayscale, removes only the seal impression while preserving the character string information, even where the character string and the seal impression overlap.
A character string that overlaps a seal impression is estimated by matching the character strings found near the seal impression against a database. Specifically, the seal impression region is first removed from the business document input in grayscale. Next, character information that lies near the removed seal impression region and is partly illegible because of that region is extracted as seal impression related information. The attribute of the extracted seal impression related information is then identified, a customer database storing character string candidates that contain customer information is consulted, and the illegible character string overlapping the seal impression region is estimated from the seal impression related information classified by attribute.
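The database-matching step can be illustrated with a minimal Python sketch. It is not the method claimed in the document: the abstract does not say how matching is performed, so the sketch assumes that each character hidden by the seal impression is marked with a `?` placeholder and that the number of hidden characters is known. The names `CUSTOMER_DB`, `classify_attribute`, and `candidates_for`, the sample records, and the toy attribute rule are all hypothetical.

```python
import re

# Hypothetical customer database: attribute -> candidate character strings.
CUSTOMER_DB = {
    "company": ["Sakura Trading Co.", "Sakura Transport Co.", "Hikari Foods Ltd."],
    "address": ["1-2-3 Chiyoda, Tokyo", "4-5-6 Naniwa, Osaka"],
}

UNKNOWN = "?"  # assumed placeholder for one character hidden by the seal impression


def classify_attribute(fragment):
    """Toy attribute rule (an assumption): a leading digit means an address."""
    return "address" if fragment[:1].isdigit() else "company"


def candidates_for(fragment, db=CUSTOMER_DB):
    """Return the database strings consistent with the legible characters."""
    attribute = classify_attribute(fragment)
    # Each placeholder matches exactly one arbitrary character;
    # every legible character must match literally.
    pattern = "".join("." if ch == UNKNOWN else re.escape(ch) for ch in fragment)
    regex = re.compile(pattern, re.DOTALL)
    return [c for c in db[attribute] if regex.fullmatch(c)]


if __name__ == "__main__":
    print(candidates_for("Sakura Tra???? Co."))
    print(candidates_for("1-2-3 Chi????, Tokyo"))
```

Restricting the search to candidates of the identified attribute keeps a company-name fragment from being matched against addresses. A practical system would likely need fuzzy scoring rather than exact positional matching, since OCR rarely knows how many characters a seal impression hides.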
COPYRIGHT: (C)2010,JPO&INPIT
JPH11272804A | 1999-10-08
JP2004280530A | 2004-10-07
JPH01181177A | 1989-07-19
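CSNG199900107002; Junichi Sugiyama (杉山淳一) and 5 others: 'A Method of Utilizing Word Information and Logical Structure in Document Image Understanding', IEICE Technical Report, Vol. 89, No. 389, 1990-01-25, pp. 9-16, The Institute of Electronics, Information and Communication Engineers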
Sekiya Mitsuo
Toshiaki Watanabe
Hidekazu Matsumaru