Generating speech
Marvin can generate speech from text.
What it does
The `speak` function generates audio from text. The `@speech` decorator generates speech from the output of a function.
Example
The easiest way to generate speech is to provide a string:
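A minimal sketch of that call, built on the `marvin.speak` function described above. The wrapper name `say` is hypothetical, and actually running it assumes Marvin is installed and an OpenAI API key is configured.

```python
def say(text: str):
    """Generate speech for `text` and return the resulting audio object."""
    # Imported inside the function so the sketch can be loaded
    # before Marvin is installed or configured.
    import marvin

    return marvin.speak(text)
```

Calling `say("Hello, world!")` returns the generated audio for that exact string.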
How it works
Marvin passes your prompt to the OpenAI speech API, which returns an audio file.
Text is generated verbatim
Unlike the images API, OpenAI's speech API does not modify or revise your input prompt in any way. Whatever text you provide is exactly what will be spoken.
Therefore, you can use the `speak` function to generate speech from any string, or use the `@speech` decorator to generate speech from the string output of any function.
Generating speech
By default, OpenAI generates speech from the text you provide, verbatim. We can use Marvin functions to generate more interesting speech by modifying the prompt before passing it to the speech API. For example, we can use a function to generate a line of dialogue that reflects a specific intent. And because of Marvin's modular design, we can simply add a `@speech` decorator to the function to generate speech from its output.
import marvin


@marvin.speech
@marvin.fn
def ai_say(intent: str) -> str:
    '''
    Given an `intent`, generate a line of dialogue that
    reflects the intent / tone / instruction without repeating
    it verbatim.
    '''


ai_say('hello')
# Hi there! Nice to meet you.
Playing audio
The result of `speak` and `@speech` is an `Audio` object that can be played by calling its `play` method. By default, playback will start as soon as the first bytes of audio are available. See the note on streaming audio for more information.
Streaming audio
By default, Marvin streams audio from the OpenAI API, which means that playback can start as soon as the first bytes of audio are available. This can be useful for long audio files, as it allows you to start listening to the audio before it has finished generating. If you want to wait for the entire audio file to be generated before starting playback, you can pass `stream=False`:
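As a sketch of the non-streaming call, the hypothetical helper below passes `stream=False` to `marvin.speak`, so the complete audio is generated before it is returned. The helper name is illustrative only.

```python
def speak_without_streaming(text: str):
    """Generate the full audio for `text` before returning it."""
    import marvin  # lazy import: the sketch loads even if Marvin is not set up yet

    # stream=False waits for the entire file instead of streaming bytes.
    return marvin.speak(text, stream=False)
```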
Streaming is only supported for the `pcm` (or raw) audio file format, and an error will be raised if you try to generate speech in a different format with `stream=True`. However, you can always save `pcm` audio to a file in a different format after it has been generated.
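One way to do that conversion outside Marvin is to wrap the raw samples in a WAV container with Python's standard `wave` module. The sketch below assumes the PCM data is 24 kHz, 16-bit, mono, which is how OpenAI describes its `pcm` output. The function name `pcm_to_wav` and its default parameters are illustrative, so check them against the audio you actually receive.

```python
import wave


def pcm_to_wav(pcm_bytes: bytes, path: str, sample_rate: int = 24000,
               sample_width: int = 2, channels: int = 1) -> None:
    """Write raw PCM samples to `path` as a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)   # assumed 24 kHz for OpenAI pcm
        wav_file.writeframes(pcm_bytes)
```

If the defaults do not match the real sample rate or width, the file will still be written but will play back at the wrong speed or as noise.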
Saving audio
To save an `Audio` object to a file, you can call its `save` method. Marvin will attempt to infer the correct file format from the file extension you provide. If you want to save the audio in a different format, you can pass a `format` argument to `save`.
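The two ways of saving can be sketched as follows. The helper name `speak_and_save` and its `audio_format` parameter are hypothetical, and the `format=` keyword is taken from the description above rather than from a checked signature.

```python
def speak_and_save(text: str, path: str, audio_format=None):
    """Generate speech for `text` and write it to `path`."""
    import marvin  # lazy import so the sketch loads without Marvin configured

    audio = marvin.speak(text)
    if audio_format is None:
        # Let Marvin infer the format from the file extension.
        audio.save(path)
    else:
        # Override the inferred format explicitly.
        audio.save(path, format=audio_format)
    return audio
```

For instance, `speak_and_save("Hello, world!", "hello_world.mp3")` relies on the `.mp3` extension, while passing `audio_format="wav"` would request WAV regardless of the file name.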
Choosing a voice
Both `speak` and `@speech` accept a `voice` parameter that allows you to choose from a variety of voices. You can preview the available voices in OpenAI's text-to-speech documentation.
The result of the `speak` function and `@speech` decorator is an `Audio` object.
import marvin

audio = marvin.speak("Hello, world!", voice="nova")
audio.save("hello_world.mp3")
Model parameters
You can pass parameters to the underlying API via the `model_kwargs` argument of `speak` and `@speech`. These parameters are passed directly to the OpenAI speech API, so you can use any parameter that API supports.
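As an illustration, the sketch below forwards a `speed` value through `model_kwargs`. `speed` is a parameter of OpenAI's speech endpoint; the helper name `speak_with_speed` and the default of `1.25` are assumptions made for this example.

```python
def speak_with_speed(text: str, speed: float = 1.25):
    """Generate speech for `text` at a custom playback speed."""
    import marvin  # lazy import so the sketch loads without Marvin configured

    # model_kwargs is handed to the speech API as-is, so any key placed
    # here must be a parameter that the API itself accepts.
    return marvin.speak(text, model_kwargs={"speed": speed})
```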
Async support
If you are using Marvin in an async environment, you can use `speak_async` (or decorate an async function with `@speech`) to generate speech asynchronously:
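A minimal sketch of the async form, assuming `marvin.speak_async` takes the same text argument as `speak`. The helper name `speak_many` is hypothetical; it uses `asyncio.gather` to request several clips concurrently.

```python
import asyncio


async def speak_many(texts):
    """Generate speech for each string in `texts` concurrently."""
    import marvin  # lazy import so the sketch loads without Marvin configured

    # gather() runs the requests concurrently and returns the results
    # in the same order as the inputs.
    return await asyncio.gather(*(marvin.speak_async(t) for t in texts))
```

From synchronous code this could be driven with `asyncio.run(speak_many(["Hello", "Goodbye"]))`; inside an existing event loop, `await` it directly.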