
Reducing Voice AI Costs by ~90% With SaladCloud

Modev News · Feb 16, 2024 · Modev Staff Writers · 4 min read

“I was looking at a $15,000 bill on a managed transcription service. Turns out, it only costs $800 running on consumer GPUs” - Founder, Education provider

Salad is offering $5000 in cloud credits for the first 25 VOICE & AI members to book a demo with the Salad team. Qualifying conditions apply. You can request a demo here. Continue reading to learn why you should consider a demo.

Overpaying for GPUs in Voice AI

Generative AI has brought about a boom in Voice AI applications. Toolify.ai lists 518 Voice AI companies and more than 100 voice-based chatbots. In this crowded market, scaling user growth while staying profitable will distinguish the winners.

For many growing AI companies, scaling is hard amid a GPU shortage, while profitability suffers from sky-high GPU prices and larger organizations crowding them out of the market.

But there’s another problem: Data suggests that the Voice AI industry is massively overpaying for managed services and high-end GPUs while under-utilizing consumer GPUs.

For example, transcribing 1 million hours of audio costs almost $1 million on popular managed transcription services. Meanwhile, the cost is around $5,200 running on the lowest-priced consumer GPUs on the market.
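The gap is easy to verify with back-of-the-envelope arithmetic. A quick sketch, where the managed per-minute rate is an assumption chosen to match the ~$1M figure above:

```python
# Back-of-the-envelope comparison for transcribing 1 million hours of audio.
# The managed rate is an assumption implied by the ~$1M total quoted above.
HOURS = 1_000_000
MINUTES = HOURS * 60                      # 60,000,000 audio minutes

managed_rate = 0.016                      # assumed $/audio-minute on a managed service
consumer_total = 5200                     # quoted total on low-cost consumer GPUs

managed_total = MINUTES * managed_rate    # ~$960,000
savings = 1 - consumer_total / managed_total

print(f"Managed service: ${managed_total:,.0f}")
print(f"Consumer GPUs:   ${consumer_total:,.0f}")
print(f"Savings:         {savings:.1%}")
```

At these rates the consumer-GPU route works out to well over 99% cheaper, which is where the "~90%" headline figure becomes conservative.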

Consumer GPUs are the answer to AI/ML inference

It is one of the industry's best-kept secrets: while expensive, high-end GPUs are required to train a foundation model, consumer GPUs (like the RTX/GTX series) are more than adequate to power inference for most use cases. Yet AI companies often continue to serve inference on the same GPUs their models were trained on, or rely on managed transcription/translation services with heavy markups, leaving profits on the table.

Running Voice AI applications like ASR & TTS on consumer GPUs is often 10X cheaper. So where are these consumer GPUs available in the midst of a GPU shortage?

The answer lies in the 400 million consumer GPUs worldwide owned by individuals, most of which sit unused for more than 80% of the day. But how do AI companies take advantage of them?

Enter Salad’s distributed cloud.

What is SaladCloud?

The clever folks at Salad have created the world's largest distributed cloud, with over 2 million individual GPUs and hundreds of businesses on the network. GPU owners download the Salad app and rent out their GPUs when not in use. Salad then integrates those unused GPUs into SaladCloud, where they power your organization's AI inference. It's a bit like Airbnb, but for GPUs.

With 10,000+ GPUs online at any time and prices from $0.02/hour, hundreds of companies like DeepAI, Pareto.io, and one of the top five most-visited AI websites have turned to Salad to lower their cloud costs.

A real-time snapshot of the worldwide locations of Salad’s distributed GPUs over a 30 minute period

SaladCloud is workload agnostic. Organizations can deploy popular models or bring their own in containers. With SOC-2 certification and multi-layer security, many production use cases are a great fit for this distributed cloud.

The organizations that stand to benefit the most from Salad's approach are those serving AI inference at scale - particularly Speech-to-Text, Text-to-Speech, LLM chatbots, image generation, etc.

Let's now look at some of SaladCloud's key use cases.

TRANSCRIPTION: Over 99% Savings On Transcription Using Whisper-Large-v3

Using SaladCloud to power OpenAI's automatic speech recognition model, Whisper Large v3, dramatically reduces audio transcription costs. At an impressive 11,736 minutes transcribed per dollar, running AI transcription on Salad costs 99% less than the next-best option on the market. You can read the full benchmark here.

Minutes per dollar transcribed on SaladCloud compared to other services

A Whisper Large v2 benchmark shows transcription costs at $0.00059 per audio minute on Salad. Transcribing the entirety of the English CommonVoice dataset would cost just $117 on SaladCloud, at an average transcription rate of one hour of audio every 16.47 seconds.
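Working backward from those quoted figures gives a sense of scale. A short sketch (the dataset size is implied by the numbers, not independently sourced):

```python
# Working backward from the quoted Whisper Large v2 benchmark figures.
cost_per_minute = 0.00059        # $/audio-minute on SaladCloud
dataset_cost = 117               # quoted cost for English CommonVoice

minutes = dataset_cost / cost_per_minute
hours = minutes / 60
print(f"Implied dataset size: ~{hours:,.0f} hours of audio")

# At one audio-hour transcribed every 16.47 seconds across the cluster:
wall_clock_hours = hours * 16.47 / 3600
print(f"Time to transcribe it all: ~{wall_clock_hours:.0f} hours")
```

In other words, roughly 3,300 hours of speech transcribed in well under a day of wall-clock time.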

TEXT-TO-SPEECH: 6 Million+ words per dollar with OpenVoice

AI voices are not just popular; they have a real business case (L&D, corporate training, marketing, etc.). Salad's OpenVoice benchmark delivers over 6 million words per dollar with the Text-to-Speech model.

Words per dollar for different consumer GPUs with OpenVoice on SaladCloud for TTS applications

To put that in context, generating audio for the entirety of Mark Twain's "Adventures of Huckleberry Finn" would cost as little as $0.01 on Salad using an RTX 2070.

For a TTS model like Suno AI's Bark, Salad's consumer GPUs deliver 39,000 words per dollar.
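Putting both benchmark rates through the same arithmetic shows how far a dollar goes. A quick sketch (the novel's word count is an approximation, not a figure from the benchmarks):

```python
# Cost to synthesize one novel at the two quoted TTS rates.
# The word count is an approximation; the novel runs roughly 110k words.
BOOK_WORDS = 110_000

openvoice_wpd = 6_000_000   # OpenVoice: 6M+ words per dollar
bark_wpd = 39_000           # Bark: 39,000 words per dollar

print(f"OpenVoice: ${BOOK_WORDS / openvoice_wpd:.3f}")   # under 2 cents
print(f"Bark:      ${BOOK_WORDS / bark_wpd:.2f}")
```

The $0.01 figure quoted above for the RTX 2070 suggests that particular GPU beats the 6M-words-per-dollar fleet average.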

Getting Voice AI companies to profitability

SaladCloud's unique distributed infrastructure lowers cloud costs significantly while increasing scalability.

By and large, SaladCloud's service is a game-changer. Its innovative approach, combined with its user-friendly platform, makes it a prime choice for developers and businesses looking to leverage the power of AI.

At Modev, we're proud of our partnership with Salad and look forward to highlighting more of its disruptive (in a good way) tech. The future of AI inference is distributed and sustainable.

The future of AI inference is SaladCloud (and it's also a whole lot cheaper).

Special offer for Modev/VOICE & AI members

Salad is offering $5000 in cloud credits for the first 25 Voice & AI members to book a demo with the Salad team. Qualifying conditions apply. You can request a demo here.

 

Modev Staff Writers

Modev staff includes a talented group of developers and writers focused on the industry and trends. We include Staff when several contributors join forces to produce an article.