To enable a display device to perform warning notification, control of an imaging device, or the like without registering voice data from a sound collection device with an imaging device such as a monitoring camera.
A display device comprises: registration means which creates reference voice information and registers it with a storage server; selection means which selects reference voice information stored in the storage server; acquisition means which acquires the selected reference voice information from the storage server; comparison means which compares collected sound information, output from a sound collection device that collects ambient sound around an imaging device, with the reference voice information acquired by the acquisition means; determination means which determines the similarity between the collected sound information and the reference voice information on the basis of the comparison result obtained by the comparison means; and notification means which notifies a user of a warning according to the determination result obtained by the determination means.
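The chain of means described above (registration, selection, acquisition, comparison, determination, notification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the source: the in-memory `StorageServer` class, the representation of sound as a plain feature vector, the use of cosine similarity as the comparison, the `0.9` threshold, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical in-memory stand-in for the storage server."""
    _store: dict = field(default_factory=dict)

    def register(self, name, features):
        # Registration: store reference voice information under a name.
        self._store[name] = list(features)

    def names(self):
        # Lets the display device list entries so one can be selected.
        return sorted(self._store)

    def fetch(self, name):
        # Acquisition: return the selected reference voice information.
        return self._store[name]


def cosine_similarity(a, b):
    # Comparison (assumed metric): cosine of the angle between two vectors.
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def check_collected_sound(server, selected_name, collected, threshold=0.9):
    """Run acquisition, comparison, determination, and notification once.

    Returns (similar, message); message is None when no warning is raised.
    The threshold value is an arbitrary assumption.
    """
    reference = server.fetch(selected_name)          # acquisition
    score = cosine_similarity(collected, reference)  # comparison
    similar = score >= threshold                     # determination
    message = None
    if similar:                                      # notification
        message = f"WARNING: collected sound resembles '{selected_name}'"
    return similar, message


if __name__ == "__main__":
    server = StorageServer()
    server.register("glass_break", [1.0, 0.0, 1.0])  # registration
    selected = server.names()[0]                     # selection
    print(check_collected_sound(server, selected, [2.0, 0.0, 2.0]))
```

In this sketch the reference information lives only on the storage server and the comparison runs on the display device side, so nothing is registered with the imaging device, which mirrors the stated aim of the abstract.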