text-to-speech

date: 2020-11-21 excerpt: text-to-speech


`say` on OS X

List the available voices

$ say -v "?"

Example of speaking with the Otoya voice

$ echo "こんにちは世界" | say -v "Otoya"
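The voice listing can be long, so here is a minimal sketch that keeps only the Japanese voices. It assumes each line of `say -v "?"` output looks like `Name   locale   # sample sentence`; the helper names `parse_voices` and `japanese_voices` are made up for this note, and the exact listing format may differ between OS versions.

```python
import shutil
import subprocess


def parse_voices(listing):
    """Parse `say -v "?"` output into (name, locale) pairs.

    Assumes each line has the form "Name   locale   # sample sentence".
    """
    voices = []
    for line in listing.splitlines():
        # Drop the sample sentence that follows the "#".
        head = line.split("#", 1)[0].strip()
        if not head:
            continue
        # The locale is the last whitespace-separated token;
        # everything before it is the voice name (which may contain spaces).
        parts = head.rsplit(None, 1)
        if len(parts) == 2:
            voices.append((parts[0], parts[1]))
    return voices


def japanese_voices(listing):
    """Return the names of voices whose locale starts with "ja"."""
    return [name for name, locale in parse_voices(listing) if locale.startswith("ja")]


if __name__ == "__main__":
    if shutil.which("say"):  # `say` only exists on OS X / macOS
        result = subprocess.run(
            ["say", "-v", "?"], capture_output=True, text=True, check=True
        )
        print(japanese_voices(result.stdout))
```

Any name it prints can be passed to `say -v`, in the same way as `Otoya` above.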

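Speech synthesis with `gtts`

It appears to send queries to a free Google service; the gTTS documentation describes the library as an interface to Google Translate's text-to-speech API.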

gtts

$ python3 -m pip install gTTS

e.g. saving the spoken text in mp3 format

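from gtts import gTTS

# Synthesize a Japanese sentence ("Today I would like to introduce some recommended books.")
tts = gTTS('今回はおすすめの本の紹介を行いたいと思います。', lang='ja')
# Write the result to an mp3 file
tts.save('output.mp3')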
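For longer scripts it can be convenient to synthesize line by line into a single file. The following is only a sketch: it relies on gTTS's `write_to_fp`, which appends mp3 data to an open binary file object, and the function name `write_lines` and its `tts_factory` parameter are hypothetical names introduced here (the latter only so the loop can be tried without network access).

```python
def write_lines(lines, fp, lang="ja", tts_factory=None):
    """Synthesize each non-empty line and append the mp3 data to fp.

    Returns the number of lines that were synthesized.
    """
    if tts_factory is None:
        # Imported lazily; gTTS needs network access when it is actually used.
        from gtts import gTTS
        tts_factory = gTTS
    count = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        tts_factory(text, lang=lang).write_to_fp(fp)
        count += 1
    return count


# Usage (requires gTTS and network access):
# with open("output.mp3", "wb") as f:
#     write_lines(["こんにちは世界", "今回はおすすめの本の紹介を行いたいと思います。"], f)
```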

Google Cloud Text-to-Speech

High-quality models such as WaveNet are available.

Two interfaces are provided: gRPC and a REST API.

google-cloud-texttospeech

$ python3 -m pip install google-cloud-texttospeech

Enabling the API
The Text-to-Speech API must be enabled under "APIs & Services" in GCP, and the GOOGLE_APPLICATION_CREDENTIALS environment variable must be set to the path of the service account key file.
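A missing or mistyped credentials path is a common cause of failures, so a quick check before creating a client can help. This is a minimal sketch using only the standard library; the function name `credentials_problem` is hypothetical, and it only checks that the variable is set and points at an existing file, not that the key is valid.

```python
import os


def credentials_problem(env=None):
    """Return a description of the problem, or None if the setup looks fine."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"key file not found: {path}"
    return None


if __name__ == "__main__":
    problem = credentials_problem()
    print(problem or "credentials look OK")
```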

Over gRPC, code like the following can generate an mp3:

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()
# Text to speak ("Hello, world!")
synthesis_input = texttospeech.SynthesisInput(text="こんにちは世界!")
# Language and voice model to use
voice = texttospeech.VoiceSelectionParams(language_code="ja-JP", name="ja-JP-Wavenet-C")
# Request mp3 output
audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)
response = client.synthesize_speech(input=synthesis_input, voice=voice, audio_config=audio_config)
# response.audio_content holds the raw mp3 bytes
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
print('Audio content written to file "output.mp3"')

`name` specifies which model is used to synthesize the speech; the linked page lists all Google Cloud Text-to-Speech models together with audio samples.
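The available voice names can also be queried from code. The sketch below uses the client's `list_voices` call and then keeps only names containing "Wavenet"; the helper names `wavenet_names` and `list_japanese_voices` are made up for this note, and filtering on the substring "Wavenet" is an assumption based on names such as `ja-JP-Wavenet-C`.

```python
def wavenet_names(voice_names):
    """Keep only voice names that contain "Wavenet", sorted alphabetically."""
    return sorted(name for name in voice_names if "Wavenet" in name)


def list_japanese_voices():
    """Ask the API for all ja-JP voices and return their names.

    Requires google-cloud-texttospeech and valid credentials.
    """
    # Imported lazily so the pure helper above works without the library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.list_voices(language_code="ja-JP")
    return [voice.name for voice in response.voices]
```

With credentials configured, `wavenet_names(list_japanese_voices())` should give candidate values for the `name` parameter.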

open-jtalk

Handy when you want to produce a voice close to the "Yukkuri" voice.

install

$ sudo apt install open-jtalk open-jtalk-mecab-naist-jdic

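install voices

Download the latest zip from the MMDAgent_Example link.

Unpack it and place the voice files somewhere such as /usr/share; the command below assumes /usr/share/hts-voice/mei/.

run example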

$ echo "そんなもうだめです" | open_jtalk -x /var/lib/mecab/dic/open-jtalk/naist-jdic -m /usr/share/hts-voice/mei/mei_happy.htsvoice -ow sample.wav
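The same command can be driven from Python, which is handy when generating many clips. This is a rough sketch assuming the dictionary and voice paths from the command above; `build_command` and `synthesize` are hypothetical helper names, and only the `-x`, `-m`, and `-ow` options already shown are used.

```python
import subprocess

# Paths taken from the shell example; adjust them to your installation.
DIC_DIR = "/var/lib/mecab/dic/open-jtalk/naist-jdic"
VOICE = "/usr/share/hts-voice/mei/mei_happy.htsvoice"


def build_command(out_path, dic_dir=DIC_DIR, voice=VOICE):
    """Build the open_jtalk argument list for writing a wav file."""
    return ["open_jtalk", "-x", dic_dir, "-m", voice, "-ow", out_path]


def synthesize(text, out_path):
    """Feed text to open_jtalk on stdin, like `echo ... | open_jtalk ...`."""
    subprocess.run(build_command(out_path), input=text.encode("utf-8"), check=True)


# Usage (requires open_jtalk and the voice files to be installed):
# synthesize("そんなもうだめです", "sample.wav")
```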
