Thursday, May 2, 2024

ElevenLabs debuts a generative AI solution for spoken content using customizable voices on Google Cloud

Imagine a future in which every piece of content out there can be listened to in any language or voice, at the click of a button, with high-quality intonations and pacing.

ElevenLabs, the world’s leading voice AI research organization, is doing just this. Our mission is to make spoken content universally accessible in any language and voice. Our platform offers text-to-speech tools that can generate highly customizable voices that sound authentically human. Our services include creating a clone of your own voice, selecting from a range of pre-programmed synthetic voices, or designing an entirely unique synthetic voice. These can then be used for everything from narrating long-form content like books or news, to giving voices to characters in video games.

A great example of this in action is AI Radio, an autonomous streaming radio station. It uses our Prime Voice AI technology to power a virtual DJ that announces breaking news and weather and introduces music, combined with the capabilities of Super Hi-Fi’s MagicStitch™ production service and ChatGPT.

In late February, we introduced Voice Design for creating unique artificial voices, including making adjustments for qualities such as gender, age, and accent. With our upcoming features, creators will be able to structure large texts, insert longer pauses, regenerate portions of audio to their liking, edit intonations, and assign parts to different speakers.

Our solutions are powered by our own deep learning model that was built to realistically render how people speak, with the results adjusting based on the context of the written material and accounting for implied emotions.

Spoken words to resonate with audiences worldwide

ElevenLabs was founded in early 2022. I met my co-founder, Piotr Dabkowski, in high school in Poland, where we both grew up. 

In Poland, movies frequently use a single-person voiceover. Even for scenes with many actors, audiences only hear one voice speaking all the lines, devoid of emotions or intonations. As you can imagine, it’s a pretty bad experience.

After college, Piotr went to work for Google and I worked with companies like BlackRock and Palantir. Our ongoing friendship, common experience with poor dubbing, and shared interest in new technologies led to the idea for ElevenLabs.

We set out to see how we could better analyze and convey a human voice for applications that go well beyond movie voiceovers. We felt there was an enormous opportunity to work with a broad set of content developers to generate spoken-word audio from texts, automate translations into multiple languages, and more.

We launched our beta product in January 2023, after a year of intense development and testing. From YouTube video creators to podcasters and book publishers, we’re already seeing incredible adoption. Independent authors or small publishers, for example, are now able to convert a printed book into an audiobook at a fraction of the cost and in a very short time.

Amplifying voice technologies with Google Cloud

We’ve tried all the major cloud services and we’ve been most impressed by Google Cloud. We’ve had hundreds of thousands of users on our platform so far in the first quarter of this year and feel comfortable that if we need to do 100 times or 1 million times what we do today, we can rely on the infrastructure to scale seamlessly.

We found that some other cloud providers just weren’t scaling as well. There were more hoops that we needed to jump through. With Google Cloud, it was easy. And that’s what you need when you’re growing fast.

The costs are excellent too. It helps that we enrolled in the Google for Startups Cloud Program and received $100,000 in Google Cloud credits for this year and another $100,000 for next year. That support is invaluable for startups like ours that need to experiment but might be tight on cash. The program helped us deploy our models and start serving customers, even before we started bringing in revenue.

We use a lot of Cloud GPUs to ensure high performance for our models. BigQuery is our enterprise data warehouse, enabling us to analyze our information holistically. Looker helps us with our business intelligence visualizations. We also use Google Analytics for our website traffic.

The tools all integrate well so we can customize metrics for key indicators, such as how many characters of text each customer has used from their subscription quota and how many cloned voices they have saved. We can analyze behaviors on the macro level to understand how people use our products and which voices or vocal characteristics are most popular.
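As a rough illustration of the first metric mentioned above, here is a minimal Python sketch that totals the characters each customer has used and compares the total with a subscription quota. Everything in it is an assumption made for illustration: the `UsageEvent` record, the `quota_usage` helper, the customer names, and the quota figures are hypothetical and do not reflect our actual schema or our BigQuery setup.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One text-to-speech request (hypothetical record shape)."""
    customer_id: str
    characters: int


def quota_usage(events, quotas):
    """Return {customer_id: (characters_used, fraction_of_quota_used)}."""
    used = defaultdict(int)
    for event in events:
        used[event.customer_id] += event.characters

    report = {}
    for customer_id, quota in quotas.items():
        total = used.get(customer_id, 0)
        # Guard against a zero quota to avoid division by zero.
        fraction = total / quota if quota else 0.0
        report[customer_id] = (total, fraction)
    return report


# Hypothetical sample data: three requests and three subscription quotas.
events = [
    UsageEvent("alice", 1200),
    UsageEvent("bob", 300),
    UsageEvent("alice", 800),
]
quotas = {"alice": 10_000, "bob": 1_000, "carol": 5_000}

report = quota_usage(events, quotas)
for customer_id, (total, fraction) in report.items():
    print(f"{customer_id}: {total} characters, {fraction:.0%} of quota")
```

In a data warehouse the same aggregation would more likely be written as a grouped query; the in-memory version is used here only to keep the example self-contained.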

To support users who register for our services, we use Firebase Authentication. It was a very easy identity solution to connect to our application for secure, streamlined user sign-in and onboarding.

We are also big fans of Google Workspace for all of its organizational and productivity benefits. The ecosystem is great and it’s simple for our team members to use.

ElevenLabs in the future

As we look to our future, I’m interested in exploring new use cases, such as for people who are blind or visually impaired. I was recently interviewed by Jonathan Mosen on his Living Blindfully podcast, where we discussed the possibilities for parents who are blind and want to read bedtime stories to their children in their own voice, and for transforming all written content into accessible content in an instant.

As AI technologies evolve, we think it’s important to balance innovations with appropriate safeguards. We’ve had safeguards in place from the start and have rolled out additional safety features since launch, with lots more coming.

We also think it’s important to educate the public on safe ways of using the technology, so that people better understand what is and isn’t appropriate. Of course, we’ll adapt to new standards and regulations as they emerge, such as the Artificial Intelligence Act in the European Union. One of its proposals is for all AI-generated content to be tagged as such. We strongly agree that this should be the case, and we are building tools that make it possible.

We are thankful to have a great team of employees, investors, partners, and advisors working with us. As we expand globally, we are happy to be aligned with Google Cloud.

We already support generating content in English, Spanish, French, German, Polish, Italian and Hindi – but keep an ear out! In the near future, our model will extend to 10 additional languages as we prepare to cover every language out there!

ElevenLabs team members

If you want to learn more about how Google Cloud can help your startup, visit our page here to get more information about our program, and sign up for our communications to get a look at our community activities, digital events, special offers, and more.

Cloud BlogRead More
