Audiobooks: The future of publishing

December 28, 2021

1. The rise of audiobooks

Audiobooks are recordings of text-based books read aloud by a human voice. They originated in America in 1932 to help visually impaired and elderly people. Today, technology has changed the way people take in information, making audiobooks popular and increasingly used by younger listeners.

A global view

Audiobooks have grown quickly in recent years and have seen a breakthrough since 2020. According to a report by Omdia, a UK-based telecoms market research company, audiobook revenue reached USD 4 billion in 2020. It is expected to reach USD 4.8 billion in 2021 and to keep growing strongly over the next few years. Omdia predicts that by 2026, 337 million people worldwide will listen to audiobooks each month.

Chart: global digital audiobook revenue (Omdia survey, 2020)

Chart: number of monthly audiobook listeners worldwide (Omdia survey, 2020)

According to the US Audio Publishers Association, audiobook revenue in the United States reached USD 1.3 billion in 2020. In the first two months of 2021, revenue rose 23.7% to USD 131.6 million. The Guardian (UK) also described 2020 as a boom year for audiobooks.

Audiobooks, a late explosion in Vietnam

Vietnam is no exception to this trend. Its audiobook industry is beginning to show promising signs of growth, supported by several factors: a potential market of over 90 million people, 56% of whom are under 35, the age group with the highest demand for audiobooks (Statista); the second-highest number of smartphone users in Southeast Asia in 2020, with 61.3 million smartphones (Statista); and a sharing culture in which people update information anytime, anywhere.

Despite this strong potential, Vietnam's audiobook market is still significantly smaller than those of Asia and the world as a whole. Businesses in this field remain fragmented, which keeps market value low, and technological limitations make manually produced audiobooks much more expensive than printed books.

2. The application of Text-to-speech (TTS) technology in producing audiobooks

The use of text-to-speech technology is gradually shaping the future of the audiobook industry as it is embraced by “giants” such as Google and Amazon. In March 2021, Google Play Books introduced a teaching assistant program that lets AI read books aloud, turn pages automatically and look up words in a children’s dictionary. In June 2021, Amazon created a similar audiobook tutor for children, and Alexa was recently connected to the free audiobook program of the National Institute of Blind People.

Image: Google Play Books also offers a Vietnamese AI voice. However, this voice does not sound natural and covers only a limited range of regional accents.

A solution for a nascent market

As a developing technology that has made only initial progress in Vietnam, audiobooks still need the right direction to become the next boom in publishing. Text-to-speech (TTS) technology, which automatically reads text aloud, will be the next trend to shape the audiobook industry. Using TTS to produce audiobooks not only saves recording and audio-editing costs but also suits a small, fragmented market like Vietnam’s. In other words, TTS is the lever for Vietnam’s audiobook industry. In the Vietnamese text-to-speech market, FPT.AI Text to Speech is a versatile AI-based solution for converting text into voice, with many features that set it apart from ordinary conversion software for audiobook production.

Pronounce correctly: Using natural language processing and deep learning, the AI voice of FPT.AI TTS can accurately pronounce Vietnamese words, names and numbers up to the millions. It also pauses automatically at commas (,) and periods (.), so the text is read naturally and professionally.
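To make the idea of pausing at commas and periods concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not FPT.AI's implementation: the function name split_into_phrases and the pause lengths are assumptions chosen for the example.

```python
import re

# Illustrative pause lengths in milliseconds. These values are assumptions
# for this sketch, not figures published by FPT.AI.
PAUSE_MS = {",": 200, ".": 500}


def split_into_phrases(text):
    """Split text at commas and periods and pair each phrase with a pause."""
    phrases = []
    # Split only where a comma or period is followed by whitespace, so that
    # digit groups such as "1,000" or "1.5" stay in one piece.
    for chunk in re.split(r"(?<=[,.])\s+", text.strip()):
        if not chunk:
            continue
        mark = chunk[-1] if chunk[-1] in PAUSE_MS else ""
        phrase = chunk.rstrip(",.").strip()
        if phrase:
            phrases.append((phrase, PAUSE_MS.get(mark, 0)))
    return phrases


if __name__ == "__main__":
    sample = "The book costs 1,000 dong. It is short, but good."
    for phrase, pause in split_into_phrases(sample):
        print(f"{phrase!r} -> pause {pause} ms")
```

A production engine would rely on much richer language processing (abbreviations, quotations, question marks), but the sketch shows why punctuation-aware splitting makes synthetic reading sound less rushed.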

Read naturally with a wide range of regional voices: FPT.AI’s TTS technology has been refined to make the AI voice as natural and smooth as a human’s. Instead of the rigid, mechanical voice heard in Google’s product, today’s AI voice can convey emotion vividly, making audiobook content livelier and more engaging. Moreover, having been researched and developed specifically for the Vietnamese market, based on in-depth study of regional accents, FPT.AI TTS now offers 9 voices from all 3 regions of Vietnam (North, Central and South), both male and female, meeting most users’ needs.

Customize easily: To suit the diversity of audiobooks, FPT.AI TTS lets users adjust the voice to fit different content and topics. For example, self-help and reference books usually call for a formal voice at a moderate speed, while children’s spelling books or fairy tales need a slow, melodic voice. Users can change the speed, voice, intonation and other settings in the text-to-speech software. They can even use several voices in the same recording for scenes with dialogue spoken by different characters.
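As a rough illustration of per-genre settings and multi-voice dialogue, the Python sketch below models them as plain data. Everything named here (VoiceSettings, GENRE_PRESETS, assign_voices and the voice identifiers) is hypothetical and chosen for the example; it does not reflect FPT.AI's actual voice names or parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    voice: str    # hypothetical voice identifier
    speed: float  # 1.0 = normal pace; lower values read more slowly


# Hypothetical presets: a formal, moderate voice for self-help books and a
# slower, softer one for fairy tales, as described in the paragraph above.
GENRE_PRESETS = {
    "self_help": VoiceSettings(voice="north_female_formal", speed=1.0),
    "fairy_tale": VoiceSettings(voice="south_female_soft", speed=0.8),
}


def assign_voices(dialogue, cast, narrator):
    """Pair every (speaker, line) with that speaker's settings.

    Speakers missing from `cast` fall back to the narrator's settings.
    """
    return [(cast.get(speaker, narrator), line) for speaker, line in dialogue]


if __name__ == "__main__":
    narrator = GENRE_PRESETS["fairy_tale"]
    cast = {"fox": VoiceSettings(voice="central_male", speed=1.1)}
    script = [("narrator", "Once upon a time."), ("fox", "Hello there!")]
    for settings, line in assign_voices(script, cast, narrator):
        print(settings.voice, settings.speed, line)
```

Each resulting segment could then be sent to a TTS engine with its own settings and the audio clips joined in order.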

Implement flexibly: Users can quickly connect to FPT.AI’s APIs or convert text into common audio file formats such as MP3 and WAV directly on the website through its user interface. The service stays smooth and fast even when the number of visitors suddenly surges.
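For readers wondering what connecting to a TTS API typically involves, here is a minimal Python sketch that only builds an HTTP request and does not send it. The endpoint URL, header name and JSON field names are placeholders assumed for illustration; they are not FPT.AI's documented API, so consult the provider's documentation for the real values.

```python
import json
import urllib.request

# Placeholder endpoint, assumed for illustration only.
TTS_ENDPOINT = "https://tts.example.com/v1/synthesize"
SUPPORTED_FORMATS = {"mp3", "wav"}


def build_tts_request(text, voice, audio_format, api_key):
    """Build (but do not send) a POST request asking a TTS service for audio."""
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format}")
    # Hypothetical payload fields; a real service defines its own schema.
    payload = {"text": text, "voice": voice, "format": audio_format}
    return urllib.request.Request(
        TTS_ENDPOINT,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # hypothetical header name
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_tts_request("Xin chào", "voice_a", "mp3", "YOUR_API_KEY")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the integration easy to test offline; passing the request to urllib.request.urlopen would perform the actual call.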

Today, many online newspapers and audiobook libraries in Vietnam use FPT.AI’s AI-powered TTS technology to convert text into speech. For example, TechInsight, FPT Corporation’s tech newspaper, uses FPT.AI’s TTS technology for audio news, and VnExpress uses FPT.AI’s TTS voice to create news videos.

Flexibility and accuracy are two of the key goals of FPT.AI’s TTS solution. Vietnamese words carry context-dependent meaning, and authors’ styles differ across fields and genres. The solution FPT.AI developed for the Vietnamese market therefore aims to produce not a mechanical voice but expressive, smooth and accurate speech suited to publishing trends in the tech age.

Moreover, the Acesound AI voice, recently upgraded by FPT.AI, has taken Text to Speech technology to the next level, with a natural voice that matches the human voice by up to 90%. Listeners can hardly tell that they are hearing an artificial voice on the website. Experience it right now at:


Audiobooks are developing and will be the future of publishing in Vietnam and around the world. Applying advanced technologies will change the service experience and the “reading culture” of the community. FPT.AI’s Vietnamese Text to Speech will continue to support audiobook businesses with state-of-the-art technology solutions, contributing to the development of the publishing industry 4.0.


Experience other products of #FPT_AI at 

Address: 7th floor, FPT Tower, 10 Pham Van Bach Street, Cau Giay District, Hanoi; 3rd floor, Pijico Tower, 186 Dien Bien Phu Street, Ward 6, District 3, Ho Chi Minh City

☎ Hotline: 1900 638399

Email: [email protected]
