Flow API Reference
GET wss://flow.api.speechmatics.com/
Protocol overview
A basic Flow session has the following message exchanges. Each message is sent either by the client or by the service.
Session Start
Once-only at conversation start:
- StartConversation (sent by the client)
- ConversationStarted (sent by the service)
Audio Input/Output and Transcripts
Repeating during the conversation to cover the audio stream from the client and corresponding transcripts:
- AddAudio (client sending audio)
- AudioAdded (server received audio)
- AddTranscript / AddPartialTranscript (server sent transcript)
- AddAudio (server sending audio)
- AudioReceived (client received audio)
TTS Response Management
- ResponseStarted (when TTS begins)
- ResponseCompleted (when TTS finishes normally)
- ResponseInterrupted (when TTS is interrupted)
Function Calling
Exchanged during function calling over the websocket:
- ToolInvoke (when function call is triggered)
- ToolResult (client response to function call)
Session Termination
Once-only at conversation end:
- AudioEnded (client ending session)
- ConversationEnding (agent ending session)
- ConversationEnded (final message before connection close)
Info, Warning and Error messages will be sent as appropriate.
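The exchange described above can be sketched as a small ordering check over a recorded session. This is only an illustration: the message names come from the overview, but grouping them by sender and the `check_session` helper are assumptions made for this example, not part of the Flow API.

```python
# Message names come from the overview above. Grouping them by sender and
# the checks below are illustrative assumptions, not part of the Flow API.
CLIENT_MESSAGES = {
    "StartConversation", "AddAudio", "AudioReceived",
    "AudioEnded", "AddInput", "ToolResult",
}
SERVICE_MESSAGES = {
    "ConversationStarted", "AddAudio", "AudioAdded",
    "AddPartialTranscript", "AddTranscript",
    "ResponseStarted", "ResponseCompleted", "ResponseInterrupted",
    "ToolInvoke", "ConversationEnding", "ConversationEnded",
    "Info", "Warning", "Error",
}


def check_session(trace):
    """Return a list of ordering problems in a recorded session.

    `trace` is a list of (sender, message_name) tuples, where sender is
    "client" or "service". An empty list means no problem was found.
    """
    problems = []
    if not trace or trace[0] != ("client", "StartConversation"):
        problems.append("first message should be StartConversation from the client")
    if not trace or trace[-1] != ("service", "ConversationEnded"):
        problems.append("last message should be ConversationEnded from the service")
    for sender, name in trace:
        allowed = CLIENT_MESSAGES if sender == "client" else SERVICE_MESSAGES
        if name not in allowed:
            problems.append(f"{name} is not expected from the {sender}")
    return problems


example = [
    ("client", "StartConversation"),
    ("service", "ConversationStarted"),
    ("client", "AddAudio"),
    ("service", "AudioAdded"),
    ("service", "AddTranscript"),
    ("service", "ResponseStarted"),
    ("service", "AddAudio"),
    ("client", "AudioReceived"),
    ("service", "ResponseCompleted"),
    ("client", "AudioEnded"),
    ("service", "ConversationEnded"),
]
print(check_session(example))  # prints []
```

Note that AddAudio appears in both sets because audio flows in both directions during a conversation.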
Sent messages
StartConversation
audio_format object (required)
- AudioFormatRaw
- AudioFormatFile
raw
Possible values: [pcm_f32le, pcm_s16le, mulaw]
file
conversation_config object (required)
Required in the StartConversation message in the Flow API. Generated from the Speechmatics Portal. This maps to the supported language, the agent's prompt, LLM, TTS voice, and custom dictionary. These can be customised by creating or modifying agents in the Portal.
template_variables object
tools object[]
A list of tools that the LLM can use during the conversation.
The type of tool to use. At the moment, only function is supported.
Possible values: [function]
function object (required)
The function that the tool will call.
The name of the function that should be called. This name is passed as a field in the ToolInvoke message.
A natural language string that tells the LLM the conditions under which the function must be called.
parameters object
An object containing the properties of the function call which should be collected from the conversation. Each parameter is defined by:
Possible values: [object]
(optional) The list of the function's input parameters that are required.
properties object
Properties of the function parameter object
[property name: string] object
Possible values: [integer, number, string, boolean]
A description of the parameter.
An example value for the parameter.
debug object
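As a rough illustration of the StartConversation schema above, the sketch below assembles a message with one function tool. Only `audio_format`, `conversation_config`, `template_variables`, `tools`, `function`, `parameters` and `properties` are named in this reference. Every other key (`message`, `type`, `encoding`, `sample_rate`, `template_id`, `name`, `description`, `required`, `example`), the nesting of `template_variables`, and the `get_weather` tool itself are assumptions made for the example and should be checked against the live schema.

```python
import json


def build_start_conversation(template_id, tools=None, template_variables=None):
    """Build a StartConversation payload as a plain dict.

    Key names not listed in the reference are assumptions (see comments).
    """
    conversation_config = {"template_id": template_id}  # assumed key name
    if template_variables:
        # Assumed to live inside conversation_config.
        conversation_config["template_variables"] = dict(template_variables)
    message = {
        "message": "StartConversation",  # assumed discriminator key
        "audio_format": {
            "type": "raw",            # "raw" is listed above; the key name is assumed
            "encoding": "pcm_s16le",  # one of pcm_f32le, pcm_s16le, mulaw
            "sample_rate": 16000,     # assumed key and value
        },
        "conversation_config": conversation_config,
    }
    if tools:
        message["tools"] = list(tools)
    return message


# Hypothetical tool definition following the tools schema described above.
weather_tool = {
    "type": "function",  # only "function" is supported
    "function": {
        "name": "get_weather",
        "description": "Call this when the user asks about the weather in a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "Name of the city.",
                    "example": "London",
                },
            },
            "required": ["city"],
        },
    },
}

start = build_start_conversation("my-agent-id", tools=[weather_tool])
print(json.dumps(start, indent=2))
```

The `"my-agent-id"` value is a placeholder; the real identifier is the one generated from the Speechmatics Portal.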
AddAudio
AudioReceived
AudioEnded
AddInput
The information that the LLM must incorporate in the response.
If true, the response will be interrupted by the new input.
If false, the response will continue until it is complete. Defaults to false.
false
If true, the input will be treated as urgent and will be sent to the LLM immediately.
If false, new input will be added to the current prompt and sent to the LLM as part of the next request.
false
ToolResult
The id of the tool invoke.
Possible values: [ok, rejected, failed]
The content of the tool result.
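The AddInput flags and the ToolResult status values above can be illustrated with two small builders. This is a sketch under assumptions: the reference lists the descriptions and the allowed status values, but the key names used here (`message`, `input`, `interrupt_response`, `immediate`, `id`, `status`, `content`) are guesses for illustration only.

```python
TOOL_RESULT_STATUSES = {"ok", "rejected", "failed"}  # values listed under ToolResult


def build_add_input(text, interrupt=False, immediate=False):
    """Build an AddInput payload. All key names are assumptions."""
    return {
        "message": "AddInput",
        "input": text,                          # information the LLM must incorporate
        "interrupt_response": bool(interrupt),  # True: interrupt the current response
        "immediate": bool(immediate),           # True: send to the LLM right away
    }


def build_tool_result(invoke_id, status, content):
    """Build a ToolResult payload, rejecting statuses the reference does not list."""
    if status not in TOOL_RESULT_STATUSES:
        raise ValueError(f"status must be one of {sorted(TOOL_RESULT_STATUSES)}")
    return {
        "message": "ToolResult",
        "id": invoke_id,  # should match the id received in ToolInvoke
        "status": status,
        "content": content,
    }


urgent = build_add_input("The customer's order has shipped.", immediate=True)
result = build_tool_result("abc-123", "ok", "Order 42 shipped on Monday.")
```

Validating the status locally catches typos before the message reaches the service.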
Received messages
ConversationStarted
AddAudio
AudioAdded
AddPartialTranscript
Speechmatics JSON output format version number.
2.1
metadata object (required)
results object[] (required)
Possible values: [word, punctuation]
Possible values: [next, previous, none, both]
alternatives object[]
display object
Possible values: [ltr, rtl]
Possible values: >= 0 and <= 1
Possible values: >= 0 and <= 100
AddTranscript
Speechmatics JSON output format version number.
2.1
metadata object (required)
results object[] (required)
Possible values: [word, punctuation]
Possible values: [next, previous, none, both]
alternatives object[]
display object
Possible values: [ltr, rtl]
Possible values: >= 0 and <= 1
Possible values: >= 0 and <= 100
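To show how the transcript `results` entries might be turned into readable text, here is a minimal sketch. The reference lists the value sets `[word, punctuation]` and `[next, previous, none, both]` and an `alternatives` array, but not the key names, so `type`, `attaches_to` and `content` below are assumptions, as is the rule of taking the first alternative.

```python
def results_to_text(results):
    """Join transcript results into a single string.

    Assumed item shape: {"type": "word" | "punctuation",
                         "attaches_to": "next" | "previous" | "none" | "both",
                         "alternatives": [{"content": "..."}]}
    """
    text = ""
    glue_next = False  # True when the previous token attaches to the next one
    for item in results:
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue  # nothing to print for this result
        content = alternatives[0]["content"]
        if item.get("type") == "punctuation":
            attaches = item.get("attaches_to", "none")
        else:
            attaches = "none"
        attach_previous = attaches in ("previous", "both")
        # Insert a space unless this token or the previous one asks to be glued.
        if text and not attach_previous and not glue_next:
            text += " "
        text += content
        glue_next = attaches in ("next", "both")
    return text


sample = [
    {"type": "word", "alternatives": [{"content": "Hello"}]},
    {"type": "punctuation", "attaches_to": "previous", "alternatives": [{"content": ","}]},
    {"type": "word", "alternatives": [{"content": "world"}]},
    {"type": "punctuation", "attaches_to": "previous", "alternatives": [{"content": "."}]},
]
print(results_to_text(sample))  # prints Hello, world.
```

The same helper can be applied to both AddPartialTranscript and AddTranscript messages, since the two share the same schema in this reference.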
ResponseStarted
The content that is spoken by the agent in the response.
The start time of the spoken response, relative to the start of the session.
ResponseCompleted
The content that is spoken by the agent in the response.
The start time of the spoken response, relative to the start of the session.
The end time of the spoken response, relative to the start of the session.
ResponseInterrupted
The content that is spoken by the agent in the response.
The start time of the spoken response, relative to the start of the session.
The end time of the spoken response, relative to the start of the session.
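A client may want to know whether the agent is currently speaking. The three response messages above are enough to track that, as in the sketch below. The `ResponseTracker` class is hypothetical, and the `message` key used to read the message name is an assumption.

```python
class ResponseTracker:
    """Track whether the agent is speaking, based on response messages."""

    def __init__(self):
        self.speaking = False
        self.completed = 0    # responses that finished normally
        self.interrupted = 0  # responses that were cut short

    def handle(self, message):
        """Update state from one received message and return `speaking`."""
        name = message.get("message")  # assumed key holding the message name
        if name == "ResponseStarted":
            self.speaking = True
        elif name == "ResponseCompleted":
            self.speaking = False
            self.completed += 1
        elif name == "ResponseInterrupted":
            self.speaking = False
            self.interrupted += 1
        return self.speaking


tracker = ResponseTracker()
tracker.handle({"message": "ResponseStarted"})
tracker.handle({"message": "ResponseInterrupted"})
tracker.handle({"message": "ResponseStarted"})
tracker.handle({"message": "ResponseCompleted"})
```

Messages other than the three response types leave the state unchanged, so the tracker can be fed every received message.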
ToolInvoke
The id of the tool invoke.
function object (required)
The name of the tool to invoke.
arguments object (required)
[property name: string] object
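One way to handle a ToolInvoke message is to look up a local handler by name and answer with a ToolResult, as sketched below. The key names (`message`, `id`, `name`, `arguments`, `status`, `content`), the nesting of `arguments` inside `function`, the `get_weather` handler, and the choice to map unknown tools to `rejected` and handler exceptions to `failed` are all assumptions for illustration; the reference only lists the three status values.

```python
def get_weather(city):
    """Hypothetical tool implementation used only for this example."""
    return f"It is sunny in {city}."


HANDLERS = {"get_weather": get_weather}


def handle_tool_invoke(message, handlers=HANDLERS):
    """Run the handler named in a ToolInvoke message and build a ToolResult."""
    function = message.get("function", {})
    name = function.get("name")
    arguments = function.get("arguments", {})  # assumed to be nested in "function"
    reply = {"message": "ToolResult", "id": message.get("id")}
    handler = handlers.get(name)
    if handler is None:
        # No local implementation: decline the call.
        reply.update(status="rejected", content=f"unknown tool: {name}")
        return reply
    try:
        reply.update(status="ok", content=str(handler(**arguments)))
    except Exception as exc:
        # The handler raised, for example because of bad arguments.
        reply.update(status="failed", content=str(exc))
    return reply


invoke = {
    "message": "ToolInvoke",
    "id": "abc-123",
    "function": {"name": "get_weather", "arguments": {"city": "London"}},
}
print(handle_tool_invoke(invoke)["content"])  # prints It is sunny in London.
```

Echoing the `id` from ToolInvoke back in ToolResult lets the service match the result to the call that triggered it.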
Error
Possible values: [invalid_message, invalid_model, invalid_config, invalid_audio_type, not_authorised, insufficient_funds, not_allowed, job_error, data_error, buffer_error, protocol_error, timelimit_exceeded, quota_exceeded, unknown_error]
Warning
Possible values: [duration_limit_exceeded]
Info
Possible values: [recognition_quality, model_redirect, deprecated]
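The value lists above lend themselves to a small lookup that classifies incoming Error, Warning and Info messages and flags values the client does not recognise. This is a sketch: the value sets are copied from this reference, while the `message` and `type` key names and the `classify` helper are assumptions.

```python
KNOWN_TYPES = {
    "Error": {
        "invalid_message", "invalid_model", "invalid_config",
        "invalid_audio_type", "not_authorised", "insufficient_funds",
        "not_allowed", "job_error", "data_error", "buffer_error",
        "protocol_error", "timelimit_exceeded", "quota_exceeded",
        "unknown_error",
    },
    "Warning": {"duration_limit_exceeded"},
    "Info": {"recognition_quality", "model_redirect", "deprecated"},
}


def classify(message):
    """Return (severity, type, recognised) or None for other messages.

    The "message" and "type" key names are assumptions.
    """
    name = message.get("message")
    if name not in KNOWN_TYPES:
        return None
    kind = message.get("type")
    return (name.lower(), kind, kind in KNOWN_TYPES[name])


print(classify({"message": "Error", "type": "quota_exceeded"}))
# prints ('error', 'quota_exceeded', True)
```

Returning a `recognised` flag instead of raising keeps the client tolerant of values added to the service later.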
ConversationEnding
ConversationEnded