How often do we make guesses about a person’s physical traits just based on their voice? If their voice is deep, they must be tall. If their breath is shallow, they must have bad posture. How often are we correct?
Rita Singh, research faculty at the Language Technologies Institute (LTI) in Carnegie Mellon University’s School of Computer Science, has created software to answer these exact questions. With a background in computer voice recognition and artificial intelligence applied to voice forensics, Singh has worked in collaboration with CINA to develop voice recognition software that can assist the Department of Homeland Security in identifying and profiling criminals.
The software runs through multiple interfaces and analyzes voice recordings from a variety of sources, such as hoax calls, wiretaps, telephone conversations, and voice evidence from a multitude of crimes. From a 30-second sound bite, the software is able to discern age, height, weight, ethnicity, and estimated build, and to render a 3D facial reconstruction based on voice alone.
The project started in July 2018. By September, an alpha version of the technology was on display at the World Economic Forum in Tianjin, China, an event dedicated to fostering entrepreneurship and furthering technology across the world. Tested by nearly one thousand people in three days, including state dignitaries, CEOs, CTOs, and royalty, the program’s crowd test was a great success.
Lines formed as people waited to see whether the machine could accurately describe each participant’s physical characteristics. Participants read a 30-second snippet into a microphone and watched their own facial features reconstructed before their eyes. Although not accurate 100% of the time, the technology impressed all those who attended.
The language-independent program was developed to detect false mayday calls that were sending the Coast Guard out to sea on unnecessary rescue missions. The program’s ability to make predictions about heart rate, voice frequency, stress level, and a number of other physiological and psychological factors will help to separate false emergencies from real ones.
In addition to voice recognition, the project will build a database cataloguing sounds, such as the noise of certain boat engines and animal calls, in order to better pinpoint call locations. For the time being, the database will contain largely maritime sounds. However, the goal is to have an exhaustive database that can distinguish a call from the L train in Chicago or the shuffle of Midtown Manhattan based solely on background noise.