Speechify, ElevenLabs, Hume: Which AI Voice Can Actually Feel Something?

Written by anywhichway | Published 2025/05/08
Tech Story Tags: artificial-intelligence | ai-emotion | ai-text-to-speech-tools | ai-text-to-speech-cost | ai-text-to-speech-tools-review | hume-review | elevenlabs-review | hackernoon-top-story

TL;DR: The article investigates the performance of several text-to-speech engines in terms of both speed and quality when generating emotionally laden speech. They are tested in the domains of poetry, coaching, and meditation.

I have written a number of articles on HackerNoon regarding AI and empathy. In this article I move beyond the primarily text-oriented approach I have taken in the past to address voice output. I also move beyond empathy to the broader space of emotional expression.

The article investigates the performance of several text-to-speech engines (ElevenLabs, Hume, iCendant Speech, Speechify) in terms of both speed and quality when generating emotionally laden speech. Several major brands, such as Amazon, Google, and Microsoft, were not evaluated because either their platforms made accessing the API complex or they did not offer direct emotional nuance, e.g. SSML had to be used (see below). All the evaluated engines are fine, from both a speed and quality perspective, for converting short-form input, i.e. just one sentence where there is not much opportunity for emotional nuance. From a quality perspective they are all also fine for longer-form input where emotional nuance may not be critical, e.g. reading a business document, although for long-form input the time to generate the audio can vary dramatically. However, dramatic differences appear when the engines are faced with the nuanced medium- to long-form formats of poetry, meditation, and coaching.

The article isn’t very long because its focus is audio; hence, you will spend more time listening than reading. Also, by way of disclosure, I have a financial interest in the success of some of the platforms covered, so I won’t be doing any qualitative comparisons. I will leave it up to you, the listener, to decide which engines perform best under which scenarios. I will restrict comparisons to quantitative values such as speed and cost.
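The article does not say how the generation times were captured, so the following is only a minimal sketch of one way to measure them: wrap the vendor call in a wall-clock timer. The `synthesize` callable and `fake_engine` are hypothetical stand-ins, not any vendor's actual API.

```python
import time
from typing import Callable, Tuple


def time_generation(synthesize: Callable[[str], bytes], text: str) -> Tuple[bytes, float]:
    """Call a text-to-speech function and return (audio bytes, elapsed seconds)."""
    start = time.perf_counter()
    audio = synthesize(text)
    elapsed = time.perf_counter() - start
    return audio, elapsed


def fake_engine(text: str) -> bytes:
    """Hypothetical engine: sleeps briefly and returns the text as bytes.

    Replace this with a real vendor call; the sleep only simulates latency.
    """
    time.sleep(0.01)
    return text.encode("utf-8")


if __name__ == "__main__":
    audio, seconds = time_generation(fake_engine, "Team, gather around.")
    print(f"{len(audio)} bytes generated in {seconds:.2f}s")
```

Timing the whole call this way includes network latency, so repeated runs against a real service can differ by several seconds.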

A note about SSML … Although SSML appears quite powerful on its face, particularly when used with the prosody contour attribute, which allows pitch to be changed at specific points within an utterance (alongside separate controls for rate and volume), at the end of the day it does not come close to simulating real human emotion. The micro-prosody required can only be matched by neural processors. However, some SSML does make for better management of pauses, volume, and special pronunciations.
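To make the SSML controls above concrete, here is a minimal sketch that builds an SSML document with slow, soft prosody, a pause after each line, and an optional pitch contour on the last line. The function name `build_ssml` and its defaults are illustrative assumptions, and which attributes an engine actually honors varies by vendor.

```python
import xml.etree.ElementTree as ET


def build_ssml(lines, rate="slow", volume="soft", pause="800ms", final_contour=None):
    """Return an SSML string: one <prosody> per line, each followed by a <break>."""
    speak = ET.Element("speak", {"version": "1.1", "xml:lang": "en-US"})
    for index, line in enumerate(lines):
        attrs = {"rate": rate, "volume": volume}
        # Optionally shape the pitch of the final line with a contour,
        # written as (position-in-text %, pitch change) pairs.
        if final_contour and index == len(lines) - 1:
            attrs["contour"] = final_contour
        prosody = ET.SubElement(speak, "prosody", attrs)
        prosody.text = line
        # A pause after every line, e.g. between breathing counts.
        ET.SubElement(speak, "break", {"time": pause})
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml(
        ["Inhale life", "Hold peace", "Exhale tension"],
        final_contour="(0%,+5%) (100%,-10%)",
    ))
```

Even with markup like this, the engine decides how each word is voiced; the tags only set coarse targets for pace, loudness, pitch, and pauses.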

Also note, iCendant is partnered with Speechify to enhance their capability. Speechify really shines in the editing space. It is relatively new to the real-time API space, and iCendant gives the Speechify platform a lift.

Poetry

Poetry is a hard domain to get right given its varying volume, cadence, and emotion, along with its dependence on line breaks and alternative grammatical structures for emphasis. Follow along with the text as you listen to the audio from each vendor.

After A Great Pain - by Emily Dickinson

After great pain, a formal feeling comes
The Nerves sit ceremonious, like Tombs
The stiff Heart questions 'was it He, that bore,'
And 'Yesterday, or Centuries before'?
The Feet, mechanical, go round
A Wooden way
Of Ground, or Air, or Ought
Regardless grown,
A Quartz contentment, like a stone
This is the Hour of Lead
Remembered, if outlived,
As Freezing persons, recollect the Snow
First Chill then Stupor then the letting go

Platform | Time To Generate | Audio
ElevenLabs | 5s | 🔊
Hume | 13s | 🔊
iCendant Speech | 5s | 🔊
Speechify | 3s | 🔊

Coaching

Coaching is a hard domain to get right given its need for potentially abrupt transitions in emotion, speed, and volume while also taking empathy into account.

Volleyball Pep Talk - Generated by Claude Sonnet 3.5

Team, gather around.
Consider yourself. Those hands have served thousands of sets.
Those legs have jumped for countless blocks.
Those lungs have powered through five-set matches when you thought you couldn't go on.
But volleyball isn't played with hands or legs or lungs alone.
It's played with heart.
Today, we face the toughest opponent of our season.
They're taller. They're ranked higher.
The statistics say you shouldn't win.
But statistics can't measure teamwork.
Remember who we are.
You are not just six individuals on a court - you are a single unit that breathes, moves, and fights as one. When you step onto that court, you carry each other's dreams.
I've watched you grow from a group of talented players into something extraordinary.
In those moments when fatigue sets in, when doubt creeps into your mind, remember why you play.
Remember the joy of a perfect pass, the satisfaction of a clean kill, the electric feeling when you celebrate together.
Play with passion. Play with purpose. Play for each other.

Platform | Time To Generate | Audio
ElevenLabs | 5s | 🔊
Hume | 20s | 🔊
iCendant Speech | 4s | 🔊
Speechify | 3s | 🔊

Meditation

Meditation is a hard domain to get right given its need for cadence, counting, and smooth emotional transitions.

I suggest that while listening to the meditations you focus on breathing from your belly and closing your eyes.

Breathing Meditation - Generated by Claude Sonnet 3.5

We will be doing a calming exercise, breathe deeply through your nose, hold, then exhale making a whooshing sound.
Inhale life 1... 2...
Hold peace 1... 2... 3...
Exhale tension 1... 2... 3... 4...
Repeat...
Inhale... 1... 2... 3...
Hold... 1... 2... 3... 4...
Exhale... 1... 2... 3... 4... 5...
And finally...
Inhale... 1... 2... 3... 4...
Hold... 1... 2... 3... 4... 5...
Exhale... 1... 2... 3... 4... 5... 6...
May your life be full of peace.
May your tensions be taken by the wind whispering in the trees.

Platform | Time To Generate | Audio
ElevenLabs | 5s | 🔊
Hume | 12-20s | 🔊
iCendant Speech | 5s | 🔊
Speechify | 3s | 🔊

Costs And Other Info

Platform | Cost (approximately 1,000 chars per minute) | SSML Available
ElevenLabs | $0 to $1,300 per month including fixed usage; 2,000 to 22,000 minutes (20,000 to 222,000 chars); $0.016 to $0.012 per 1,000 chars | No (limited custom XML)
Hume | $0 to $900 per month including capped usage with overage fees; 10,000 to 10,000,00 characters per month; $0.20 to $0.10 per 1,000 chars overage | No
iCendant Speech | $0.008 per 1,000 chars + $10 per month | No
Speechify | $0.001 per 1,000 chars | Yes (moderate emotion extensions)
Google Neural | $0.016 per 1,000 chars (not evaluated due to limited emotions in extended SSML) | Yes (very limited emotion extensions)
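As a rough worked example of the flat per-character rates above, the sketch below estimates a monthly bill for a given character volume. It covers only the platforms quoted with a simple rate (iCendant Speech, Speechify, Google Neural); the tiered ElevenLabs and Hume plans are left out because their bundles do not reduce to one formula. The function names are illustrative, and the 1,000 characters per minute figure is the approximation from the table header.

```python
def monthly_cost(chars: int, rate_per_1k: float, monthly_fee: float = 0.0) -> float:
    """Estimated monthly cost in dollars: flat fee plus a per-1,000-character rate."""
    return monthly_fee + (chars / 1000) * rate_per_1k


def approx_minutes(chars: int, chars_per_minute: int = 1000) -> float:
    """Approximate audio length, assuming about 1,000 characters per minute."""
    return chars / chars_per_minute


# (rate per 1,000 chars, fixed monthly fee), taken from the table above.
PRICING = {
    "iCendant Speech": (0.008, 10.0),
    "Speechify": (0.001, 0.0),
    "Google Neural": (0.016, 0.0),
}

if __name__ == "__main__":
    chars = 100_000
    print(f"{chars} chars is roughly {approx_minutes(chars):.0f} minutes of audio")
    for platform, (rate, fee) in PRICING.items():
        print(f"{platform}: ${monthly_cost(chars, rate, fee):.2f}")
```

At 100,000 characters a month (roughly 100 minutes of audio), this works out to $10.80 for iCendant Speech, $0.10 for Speechify, and $1.60 for Google Neural.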

Conclusion

Hume seems awfully expensive compared to the other options, particularly given its slower speed.

Ultimately, it is not clear any of the platforms can currently handle all needs, although I see promise at the intersection of all their capabilities. Another 12 months I think …


Vested interest disclosure: the author is a part owner of iCendant Speech.


Written by anywhichway | Working in the clouds around Seattle ... sailing when it's clear.
Published by HackerNoon on 2025/05/08