Then and now: How AI voice deepfake tools used in cyber fraud can fool you
The Beatles have once again delighted millions of fans around the world by releasing a new song. It was made possible by artificial intelligence (AI), which combined parts of an old recording and improved its audio quality. While fans celebrate the band's new masterpiece, AI also has a darker side: it can be used to create deepfake voices and images.
Thankfully, such deepfakes, and the tools used to make them, are not yet well developed or widespread. Nevertheless, their potential for use in fraud schemes is extremely high, and the technology is not standing still.
What are voice deepfakes capable of?
OpenAI recently demonstrated an Audio API model that can generate human speech and read input text aloud. So far, this OpenAI software comes closest to real human speech.
In the future, such models could also become a new tool in the hands of attackers. The Audio API can speak any specified text, and users can choose which of the available voices reads it. In its current form, the OpenAI model cannot be used to create deepfake voices, but it shows how rapidly voice generation technologies are developing.
Today, there are practically no tools capable of producing a high-quality deepfake voice that is indistinguishable from real human speech. However, more and more tools for generating human voices have been released over the last few months. Previously, users needed basic programming skills to work with them, but they are becoming easier to use. In the near future, we can expect models that combine ease of use with high-quality results.
Fraud using artificial intelligence is still uncommon, but "successful" cases are already known. In mid-October 2023, American venture capitalist Tim Draper warned his Twitter followers that scammers could use his voice in fraud schemes. He explained that requests for money made in his voice were the product of artificial intelligence, which is clearly getting smarter.
How can you protect yourself?
So far, society may not see voice deepfakes as a real cyber threat. There are very few cases of them being used with malicious intent, so protection technologies have been slow to appear.
For now, the best way to protect yourself is to listen carefully to what the caller says to you on the phone. If the audio is poor quality, contains noise, and the voice sounds robotic, that is reason enough not to trust what you hear.
Another good way to test the "humanity" of the person you are talking to is to ask unexpected questions. For example, if the caller turns out to be a voice model, a question about its favorite color will leave it stumped, because fraud victims do not usually ask such things. Even if the attacker manually types in an answer and plays it back, the delay in responding will make it clear that you are being tricked.
Another safe option is to install a reliable and comprehensive security solution. Such solutions cannot detect deepfake voices with 100 percent accuracy. However, they can help users avoid suspicious websites, payments, and malware downloads by protecting browsers and scanning all files on the computer.
“The main advice at the moment is not to exaggerate the threat or try to recognize voice deepfakes where they don’t exist. For now, the available technology is unlikely to be powerful enough to create a voice that a human would not be able to recognize as artificial. Nevertheless, you need to be aware of possible threats and be prepared for advanced deepfake fraud becoming a new reality in the near future,” comments Dmitry Anikin, Senior Data Scientist at Kaspersky.