To store every utterance of each participant individually, together with the overall progress of a conference, and to make the necessary utterance data easy to retrieve.
Participants a-d of the conference make utterances by using utterance input devices 2-5, which are provided with microphones 16 and cameras 17. The audio data inputted through each utterance input device and the image data of a photograph of the participant's face are recorded individually in a folder of a storage device 13 that corresponds to that utterance input device. An entire utterance data creator 33 creates entire utterance data in which the audio data and image data inputted into the utterance input devices are arranged in time series. An audio/text converter 34 converts the audio of the entire utterance data into text data. A minutes creating section 35 creates minutes data on the basis of the text data and the image data. The data can be read from the storage device 13 both during the conference and after its closure.
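
The time-series arrangement performed by the entire utterance data creator 33 can be illustrated with a minimal sketch. It assumes each per-device folder is represented as a list of records carrying a start time; the `Utterance` fields, the file names, and the function name `create_entire_utterance_data` are hypothetical and do not come from the source, which does not specify a data format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Utterance:
    device_id: int    # utterance input device number (2-5 in the abstract)
    start_s: float    # assumed: seconds elapsed since the conference began
    audio_file: str   # hypothetical name of the recorded audio file
    image_file: str   # hypothetical name of the face image file


def create_entire_utterance_data(
    folders: Dict[int, List[Utterance]]
) -> List[Utterance]:
    """Merge the per-device folders into one list ordered by start time."""
    merged = [u for records in folders.values() for u in records]
    # Ties on start time are broken by device number for a stable order.
    return sorted(merged, key=lambda u: (u.start_s, u.device_id))


# Two per-device folders, keyed by utterance input device number.
folders = {
    2: [
        Utterance(2, 0.0, "a_001.wav", "a_001.jpg"),
        Utterance(2, 12.5, "a_002.wav", "a_002.jpg"),
    ],
    3: [Utterance(3, 5.0, "b_001.wav", "b_001.jpg")],
}

timeline = create_entire_utterance_data(folders)
print([u.audio_file for u in timeline])
# prints ['a_001.wav', 'b_001.wav', 'a_002.wav']
```

Because the per-device folders are left untouched and the merged list is built separately, this sketch keeps both the individual record of each participant and the overall course of the conference available, as the abstract describes.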