In brief: One of the biggest fears about AI's ability to convincingly recreate a person's voice is its potential for scams. The technology has advanced enormously over the last few months and, unsurprisingly, criminals are abusing it, spoofing a voice to convince its owner's family that the person needs financial help.
The Washington Post reports that a Canadian couple in their 70s recently received a phone call from what sounded like their grandson, claiming he was in jail and needed bail money. They withdrew the maximum of 3,000 Canadian dollars from one bank and were about to take out the same amount from another when the manager told them they were being scammed; it turned out another customer had received a similar call and discovered the voice had been faked.
The parents of Benjamin Perkin weren't so lucky. They received a call from a lawyer claiming their son had killed a US diplomat in a car accident, was in jail, and needed money for legal fees. The lawyer then put "Perkin" on the phone, who said he loved them and appreciated the money.
The voice sounded “close enough for my parents to truly believe they did speak with me,” Perkin said. His parents sent $15,449 to the scammer via a bitcoin terminal and have not been able to get it back.
Voice-based scams aren’t new. Federal Trade Commission data reveals that of the 36,000 reports last year of people being swindled by criminals pretending to be friends or family, over 5,100 of these incidents happened over the phone.
Spoofing a person’s voice used to be a complicated and lengthy procedure involving the discovery and collection of hours of audio, and the end result wasn’t always very convincing. Now, however, artificial intelligence tools have made the process so easy that scammers just need a small clip of a person speaking, often gathered from a social media account, to recreate their voice accurately.
An example of this technology is Microsoft's VALL-E tool, which the company announced in January. Building on a technology called EnCodec that Meta announced in October 2022, it works by analyzing a person's voice, breaking the information down into discrete components, and using its training to synthesize how that voice would sound speaking different phrases. Even after hearing just a three-second sample, VALL-E can replicate a speaker's timbre and emotional tone. Check out how eerily convincing it is on this GitHub page.
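The pipeline described above, analyze a voice, compress it into discrete components, then synthesize audio back from those components, can be illustrated with a deliberately simplified sketch. Real systems like EnCodec use learned neural residual vector quantization; the toy uniform scalar quantizer below is only a stand-in to show the same encode → discrete codes → decode structure, and every name in it is hypothetical rather than taken from any real API.

```python
import math

# Illustrative sketch only: a toy stand-in for the "break the voice into
# discrete components, then synthesize from them" idea. Real codecs like
# EnCodec learn their quantizer; this one just maps each audio sample in
# [-1, 1] onto one of `levels` evenly spaced integer codes.

def encode(audio, levels=256):
    """Quantize waveform samples in [-1, 1] into discrete integer codes."""
    return [round((max(-1.0, min(1.0, s)) + 1.0) / 2.0 * (levels - 1))
            for s in audio]

def decode(codes, levels=256):
    """Reconstruct an approximate waveform from the discrete codes."""
    return [c / (levels - 1) * 2.0 - 1.0 for c in codes]

if __name__ == "__main__":
    # A 440 Hz sine tone stands in for a short speech clip.
    wave = [0.5 * math.sin(2 * math.pi * 440 * i / 16000) for i in range(16000)]
    codes = encode(wave)      # the discrete "components"
    recon = decode(codes)     # synthesis back from the codes
    print(max(abs(a - b) for a, b in zip(wave, recon)))
```

A model like VALL-E operates on sequences of such discrete codes rather than raw samples, predicting new code sequences in the style of the three-second prompt; the sketch only shows why a short clip suffices to capture a voice in compact, discrete form.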
ElevenLabs, which offers a generative AI voice tool, recently tweeted that it was seeing "an increasing number of voice cloning misuse cases." The misuse led the company to remove the voice cloning capability from the free version of its VoiceLab program.

"First, we've always had the ability to trace any generated audio clip back to a specific user. We'll now go a step further and release a tool which lets anyone verify whether a particular sample was generated using our technology and report misuse. This will be released next week."

— ElevenLabs (@elevenlabsio) January 31, 2023