In early January 2023, Microsoft released a paper, with demo files on GitHub, titled “Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers,” introducing a model called VALL-E.
For those interested, you can find it here.
I will not even try to break down what that title means, but I can tell you what it actually MEANS.
Microsoft has developed a Text-To-Speech (TTS) AI model so powerful that it can basically REPLICATE your voice from only 3 seconds of recorded audio.
You read that correctly!
3 seconds…
And it can replicate tone and emotions too.
VALL-E was trained on 60,000 hours of English speech, which is hundreds of times more training data than existing TTS systems.
The AI has not been released to the public yet.
Frankly, I am not sure if it is wise to do so.
With great power comes great responsibility… and we don’t have any Spider-Man in our lives.
Imagine what someone with bad intentions could do with a powerful voice replicating AI like this.
I am sure many of you have received a “Microsoft support” call from India offering to remove that horrible virus from your Windows PC, even if you use a Mac…
Imagine if the same scammers got access to an AI that can replicate your voice in 3 seconds and then called your non-technical mom or grandmom, pretending to be you…
I think this AI can be a game changer for fraud…
But we don’t want any game changers in fraud, do we?
Heck no!
But fake Microsoft support calls from India or princes from Nigeria are only a few of the dangers off the top of my head.
I am sure there are hundreds of illicit use cases for such a powerful tool.
As much as I am excited about AI, I am also very worried about where it can go.
And I guess most of all, I am worried about AI getting to a point where it is impossible to stop it, impossible to “switch it off”.
I guess that would be AI becoming sentient – becoming self-aware and able to perceive or feel things.
Have we crossed the point of no return?
A quick demonstration of the powers of VALL-E
To show you the power of VALL-E, I created a short 1-minute demonstration using the sample audio files Microsoft provided alongside the aforementioned paper.
Listen to/watch the 1-minute test below:
What do you think about this?
I find it exciting yet highly disturbing.
The thing we must not forget is that the stuff that eventually surfaces is what is “ready” to be shown to the world.
What we never see or hear about are all the other things being tested and which are even more powerful (and possibly dangerous).
Both the USA and the EU are currently working on AI legislation.
But as usual, legislation is lagging.
And what good is legislation in the USA or the EU if some of the crazies over in North Korea, China, or Russia get their hands on the technology, or develop their own?
What about the other tech companies?
So far, we have not really heard a lot from the other tech giants about AI.
Companies like Google have yet to release any “shocking” AI developments to the world.
Yes, they demonstrated one of their AI language tools back in 2018.
To try to underpin my point even more, imagine what developments have been made in AI in the last five years.
The video above is now five years old.
Don’t you think that if they had these powerful capabilities back then, they have something insane cooking in their secret AI lab just about now?
Of course, they do!
But wait!
We have not heard anything from giants like Apple, Facebook, Oracle, IBM, Amazon, GE, or Tesla yet.
They are not sitting still in all this.
As certain as my name is Thomas, these companies are doing AI research en masse.
We know Elon Musk, of Tesla fame, founded Neuralink, which is about to undergo human trials.
Meaning they will start implanting microchips into people’s brains to see if they can “help” them!
But that is a whole other can of worms that I need to dedicate a separate article to.
A FREE weekly newsletter about AI and its applications and implications on business for non-technical business professionals and managers.
Ethics Statement and VALL-E
Alongside the VALL-E paper, Microsoft released an ethics statement raising concerns about misuse. The researchers warn that when the model is released to the “real world,” it should include a protocol to ensure the speaker approves the use of their voice, as well as a model for detecting synthesized speech.
Call me a skeptic, but this may be hard to enforce or even prevent from being hacked.
In the end, all it takes is one bad actor with access to the code, and Armageddon may be upon us…
Thomas Sorheim
I am the creator of the Practical AI newsletter and The Future Handbook website. I write about all things AI and try hard to make it all understandable for non-technical people.