To provide a text mining device capable of extracting, in real time, characteristic expressions of arbitrary size from the syntactic analysis results of a positive example group and a negative example group, based on an information quantity reference that takes the complexity of trees into account.
A syntax tree input means 1 receives the results of syntactic analysis of a text group, produced by a syntactic analysis technique, and stores them as a positive example syntax tree group 2 or a negative example syntax tree group 3. A partial syntax tree enumeration means 4 enumerates all partial syntax trees of each syntax tree in the positive example syntax tree group 2. An information quantity reference calculation means 5 tabulates, for each enumerated partial syntax tree, its appearance frequency A in the positive example syntax tree group 2 and its appearance frequency B in the negative example syntax tree group 3, and calculates the characteristic degree of each partial syntax tree using an information quantity reference that takes the complexity of trees into account. A result output means 6 assigns the information quantity references calculated by the calculation means 5 to the enumerated partial syntax trees as their characteristic degrees, and outputs the result.
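The flow through means 4, 5 and 6 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the tree representation, the exact definition of a partial syntax tree, how appearance frequency is counted, or the formula of the information quantity reference. Here a tree is a nested tuple `(label, child, ...)`, a partial syntax tree is any connected, order-preserving part of a tree, frequencies A and B count the trees that contain the pattern, and the score is a stand-in: the entropy reduction in bits obtained by splitting the examples on the pattern, minus a hypothetical per-node penalty `penalty` that represents tree complexity. All function names are hypothetical, and the exhaustive enumeration is exponential, so it does not reflect the real-time behavior the device is aiming for.

```python
import math
from itertools import product


def rooted_subtrees(tree):
    """Partial trees that keep the root of `tree` (a nested tuple)."""
    label, children = tree[0], tree[1:]
    # Each child is either dropped (None) or replaced by one of its own
    # rooted partial trees; child order is preserved.
    options = [[None] + list(rooted_subtrees(c)) for c in children]
    return {(label,) + tuple(c for c in choice if c is not None)
            for choice in product(*options)}


def partial_trees(tree):
    """All partial trees rooted at any node of `tree` (assumed definition)."""
    found = set(rooted_subtrees(tree))
    for child in tree[1:]:
        found |= partial_trees(child)
    return found


def node_count(tree):
    return 1 + sum(node_count(c) for c in tree[1:])


def entropy(p):
    """Binary entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def characteristic_degree(a, b, n_pos, n_neg, nodes, penalty=0.5):
    """Stand-in score: entropy reduction minus a complexity penalty."""
    def cost(count, positives):
        # Bits needed to encode the class labels of `count` examples.
        return count * entropy(positives / count) if count else 0.0

    n, m = n_pos + n_neg, a + b
    gain = cost(n, n_pos) - cost(m, a) - cost(n - m, n_pos - a)
    return gain - penalty * nodes


def mine(pos_trees, neg_trees, penalty=0.5):
    """Return (pattern, A, B, score) rows, highest score first."""
    pos_sets = [partial_trees(t) for t in pos_trees]
    neg_sets = [partial_trees(t) for t in neg_trees]
    candidates = set().union(*pos_sets)  # patterns come from positives only
    rows = []
    for pattern in candidates:
        a = sum(pattern in s for s in pos_sets)  # appearance frequency A
        b = sum(pattern in s for s in neg_sets)  # appearance frequency B
        score = characteristic_degree(
            a, b, len(pos_trees), len(neg_trees), node_count(pattern), penalty)
        rows.append((pattern, a, b, score))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    positives = [("S", ("N",), ("V", ("ADJ",)))] * 2
    negatives = [("S", ("N",), ("V",)), ("S", ("N",), ("V", ("ADV",)))]
    for pattern, a, b, score in mine(positives, negatives):
        print(pattern, a, b, round(score, 2))
```

With this stand-in score, a pattern that appears only in positive examples earns a large gain, and among equally discriminative patterns the smaller one ranks higher because of the node penalty. A different information quantity reference could be substituted by replacing `characteristic_degree` alone.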
ONO KAZUHIKO
YAMANISHI KENJI
ARIMURA HIRONORI