Amazon Polly - AI Voice Generator

Deploy high-quality, natural-sounding human voices in dozens of languages

What is Amazon Polly?

Amazon Polly is a fully managed service that generates voice on demand, converting any text to an audio stream. It uses deep learning technologies to convert articles, web pages, PDF documents, and other text to speech (TTS). Polly provides dozens of lifelike voices across a broad set of languages so you can build speech-enabled applications that engage and convert. Meet the diverse linguistic, accessibility, and learning needs of users across geographies and markets. Powerful neural networks and generative voice engines work in the background, synthesizing speech for you. Integrate the Amazon Polly API into your existing applications to become voice-ready quickly.

Use cases

Generate speech in dozens of languages

Engage customers with a natural-sounding voice

Create audio for media at a fraction of the cost

Capabilities

Amazon Polly offers a variety of capabilities, including those listed below.

Lifelike voices

Deliver conversational user experiences with consistently fast response times

When requesting Amazon Polly output, you can choose from dozens of lifelike voices and various languages. Each voice is created using native speakers, with voice-to-voice variations even within the same language. Most languages include one or more male and female voices, so you can choose the best fit for your use case.
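
As a rough illustration of choosing a voice, the sketch below filters a voice list by language, gender, and engine. It assumes the list has the shape returned by the `describe_voices` call in the AWS SDK for Python (boto3). The helper names `pick_voice` and `fetch_voices`, and the sample voice IDs, are hypothetical and exist only for this example.

```python
from typing import Iterable, Optional


def pick_voice(voices: Iterable[dict], language_code: str,
               gender: Optional[str] = None,
               engine: Optional[str] = None) -> Optional[str]:
    """Return the Id of the first voice matching the filters, or None."""
    for voice in voices:
        if voice.get("LanguageCode") != language_code:
            continue
        if gender and voice.get("Gender") != gender:
            continue
        if engine and engine not in voice.get("SupportedEngines", []):
            continue
        return voice["Id"]
    return None


def fetch_voices(language_code: str) -> list:
    """Call DescribeVoices. Needs boto3 and AWS credentials; not run here.

    Pagination (NextToken) is ignored to keep the sketch short.
    """
    import boto3  # imported lazily so pick_voice works without the SDK

    polly = boto3.client("polly")
    return polly.describe_voices(LanguageCode=language_code)["Voices"]


# Hypothetical sample data shaped like a DescribeVoices response.
SAMPLE_VOICES = [
    {"Id": "VoiceA", "Gender": "Female", "LanguageCode": "en-US",
     "SupportedEngines": ["neural", "standard"]},
    {"Id": "VoiceB", "Gender": "Male", "LanguageCode": "en-US",
     "SupportedEngines": ["standard"]},
    {"Id": "VoiceC", "Gender": "Female", "LanguageCode": "es-ES",
     "SupportedEngines": ["neural"]},
]
```

With real credentials, `pick_voice(fetch_voices("en-US"), "en-US", gender="Female")` would return one of the actual voice IDs available in your Region.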

Customizable output

Customize and control speech output as needed

Amazon Polly allows you to create custom text-to-speech output that attracts and holds your audience's attention. Use custom lexicons to modify the pronunciation of acronyms, company names, internal terminology, or any other words you choose. Amazon Polly's support for Speech Synthesis Markup Language (SSML) tags also allows you to adjust emphasis, intonation, phrasing, and style. Generate voice AI output that best suits your business.
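
To make the SSML and lexicon ideas concrete, here is a minimal sketch that builds an SSML document and a pronunciation lexicon (PLS) as plain strings. The function names `build_ssml` and `build_lexicon` are hypothetical, and only a small subset of SSML (`<speak>`, `<prosody>`, `<break>`) is shown. Tag support varies by voice engine, so treat this as an assumption to check against the Polly documentation.

```python
from xml.sax.saxutils import escape

# Named speaking rates accepted by the SSML <prosody> tag.
RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def build_ssml(text: str, rate: str = "medium", pause_ms: int = 0) -> str:
    """Wrap text in <speak>/<prosody>, optionally followed by a pause."""
    if rate not in RATES:
        raise ValueError(f"unsupported rate: {rate}")
    body = f'<prosody rate="{rate}">{escape(text)}</prosody>'
    if pause_ms > 0:
        body += f'<break time="{pause_ms}ms"/>'
    return f"<speak>{body}</speak>"


def build_lexicon(aliases: dict, lang: str = "en-US") -> str:
    """Build a PLS lexicon that maps written forms to spoken aliases."""
    lexemes = "".join(
        f"<lexeme><grapheme>{escape(written)}</grapheme>"
        f"<alias>{escape(spoken)}</alias></lexeme>"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<lexicon version="1.0" '
        'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
        f'alphabet="ipa" xml:lang="{lang}">{lexemes}</lexicon>'
    )
```

In boto3, the SSML string would be passed to `synthesize_speech` with `TextType="ssml"`, and the lexicon would be uploaded with `put_lexicon(Name=..., Content=...)` and then referenced through the `LexiconNames` parameter.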

Gen AI power

Access built-in gen AI capabilities at a fraction of the cost

Amazon Polly supports multiple voice engines that you can choose from to convert text to speech. The generative engine deploys a billion-parameter transformer to generate voices in an incremental, streamable manner. This AI voice generator creates synthetic speech that is assertive, emotionally engaged, and highly colloquial, similar to a real human voice.
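
Selecting an engine might look like the sketch below, which assembles the keyword arguments for boto3's `synthesize_speech` call and validates the `Engine` value first. The helpers `build_request` and `synthesize` are hypothetical names, and the voice ID is a placeholder. Not every voice supports every engine, so check voice availability before relying on a particular combination.

```python
# Engine values accepted by the SynthesizeSpeech API's Engine parameter.
ENGINES = {"standard", "neural", "long-form", "generative"}


def build_request(text: str, voice_id: str, engine: str = "neural",
                  output_format: str = "mp3") -> dict:
    """Assemble keyword arguments for synthesize_speech."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    return {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": output_format,
    }


def synthesize(request: dict) -> bytes:
    """Send the request to Polly. Needs boto3 and AWS credentials; not run here."""
    import boto3  # imported lazily so build_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**request)
    return response["AudioStream"].read()
```

Keeping request construction separate from the network call makes it easy to switch engines per request and to test the parameters without calling AWS.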

Control and security

Securely store and redistribute speech in standard formats

Store your text-to-speech output in standard audio files like MP3 and OGG for redistribution, analysis, archiving, or any other use case at no extra cost. Cache your files for faster retrieval if needed. Your content's security, trust, and privacy are AWS’s highest priorities. Amazon Polly does not retain the content of your text submissions.
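
One possible way to store and cache synthesized audio locally is sketched below. It derives a stable file name from the text, voice, and format, then writes the audio stream to disk. The names `cache_path` and `save_audio` and the hashing scheme are assumptions made for this example and are not part of Amazon Polly.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO

# File extensions for the audio OutputFormat values of SynthesizeSpeech.
EXTENSIONS = {"mp3": ".mp3", "ogg_vorbis": ".ogg", "pcm": ".pcm"}


def cache_path(cache_dir, text: str, voice_id: str,
               output_format: str = "mp3") -> Path:
    """Return a deterministic file path for this text/voice/format."""
    key = hashlib.sha256(
        f"{voice_id}|{output_format}|{text}".encode("utf-8")
    ).hexdigest()
    return Path(cache_dir) / f"{key}{EXTENSIONS[output_format]}"


def save_audio(stream: BinaryIO, path: Path) -> int:
    """Write a binary audio stream to path and return the byte count."""
    path.parent.mkdir(parents=True, exist_ok=True)
    data = stream.read()
    path.write_bytes(data)
    return len(data)
```

In practice, `stream` would be the `AudioStream` value from a `synthesize_speech` response. An application could check whether `cache_path(...)` already exists before calling Polly again for the same input.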