The Diatheke API is defined using gRPC and protocol buffers. This section of the documentation is auto-generated from the protobuf file. It describes the data types and functions defined in the spec. The “messages” below correspond to the data structures to be used, and the “service” contains the methods that can be called.
Service that implements the Cobalt Diatheke Dialog Management API.
Method Name | Request Type | Response Type | Description |
---|---|---|---|
Version | Empty | VersionResponse | Returns version information from the server. |
ListModels | Empty | ListModelsResponse | Returns information about the Diatheke models the server can access. |
CreateSession | SessionStart | SessionOutput | Create a new Diatheke session. Also returns a list of actions to take next. |
DeleteSession | TokenData | Empty | Delete the session. Behavior is undefined if the given TokenData is used again after this function is called. |
UpdateSession | SessionInput | SessionOutput | Process input for a session and get an updated session with a list of actions to take next. This is the only method that modifies the Diatheke session state. |
StreamASR | ASRInput | ASRResult | Create an ASR stream. A result is returned when the stream is closed by the client (which forces the ASR to endpoint), or when a transcript becomes available on its own, in which case the stream is closed by the server. The ASR result may be used in the UpdateSession method. If the session has a wakeword enabled, and the client application is using Diatheke and Cubic to handle the wakeword processing, this method will not return a result until the wakeword condition has been satisfied. Utterances without the required wakeword will be discarded and no transcription will be returned. |
StreamTTS | ReplyAction | TTSAudio | Create a TTS stream to receive audio for the given reply. The stream will close when TTS is finished. The client may also close the stream early to cancel the speech synthesis. |
Transcribe | TranscribeInput | TranscribeResult | Create an ASR stream for transcription. Unlike StreamASR, Transcribe does not listen for a wakeword. This method returns a bi-directional stream, and its intended use is for situations where a user may say anything at all, whether it is short or long, and the application wants to save the transcript (e.g., take a note, send a message). The first message sent to the server must be the TranscribeAction, with remaining messages sending audio data. Messages received from the server will include the current best partial transcription until the full transcription is ready. The stream ends when either the client application closes it, a predefined duration of silence (non-speech) occurs, or the end-transcription intent is recognized. |
Data to send to the ASR stream. The first message on the stream must contain the session token; all subsequent messages carry audio data.
Field | Type | Label | Description |
---|---|---|---|
token | TokenData | | Session data, used to determine the correct Cubic model to use for ASR, with other contextual information. |
audio | bytes | | Audio data to transcribe. |
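The message ordering described above can be sketched as a request generator. This is a minimal illustration, not the generated client code: the `TokenData` and `ASRInput` dataclasses are hypothetical stand-ins for the protobuf classes, the helper name `asr_requests` is made up, and treating `token` and `audio` as mutually exclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class TokenData:
    """Stand-in for the generated TokenData message (fields from its table)."""
    data: bytes = b""
    id: str = ""
    metadata: str = ""


@dataclass
class ASRInput:
    """Stand-in for ASRInput; assumes only one of token or audio is set."""
    token: Optional[TokenData] = None
    audio: Optional[bytes] = None


def asr_requests(token: TokenData, chunks: Iterable[bytes]) -> Iterator[ASRInput]:
    """Yield the session token first, then one message per non-empty audio chunk."""
    yield ASRInput(token=token)
    for chunk in chunks:
        if chunk:  # skip empty reads from the audio source
            yield ASRInput(audio=chunk)


requests = list(asr_requests(TokenData(id="session-1"), [b"\x00\x01", b"", b"\x02\x03"]))
```

With a real gRPC client, a generator of this shape would typically be passed as the request iterator of the streaming call.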
The result from the ASR stream, sent after the ASR engine has endpointed or the stream was closed by the client.
Field | Type | Label | Description |
---|---|---|---|
text | string | | The transcription. |
confidence | double | | Confidence estimate between 0 and 1. A higher number represents a higher likelihood of the output being correct. |
timedOut | bool | | True if a timeout was defined for the session’s current input state in the Diatheke model, and the timeout expired before getting a transcription. This timeout refers to the amount of time a user has to verbally respond to Diatheke after the ASR stream has been created, and should not be confused with a network connection timeout. |
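A client may want to distinguish a user-response timeout from an empty or low-confidence transcript before deciding what to do next. The sketch below shows one possible client-side policy; the `ASRResult` dataclass is a stand-in for the generated message, and both the `classify_asr_result` helper and the 0.5 threshold are assumptions, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class ASRResult:
    """Stand-in for the generated ASRResult message."""
    text: str = ""
    confidence: float = 0.0
    timedOut: bool = False


def classify_asr_result(result: ASRResult, min_confidence: float = 0.5) -> str:
    """Label a result; the threshold is an arbitrary, illustrative value."""
    if result.timedOut:
        return "timed_out"  # the user did not respond in time
    if not result.text.strip():
        return "empty"
    if result.confidence < min_confidence:
        return "low_confidence"
    return "accept"


label = classify_asr_result(ASRResult(text="turn on the lights", confidence=0.92))
```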
Specifies an action that the client application should take.
Field | Type | Label | Description |
---|---|---|---|
input | WaitForUserAction | | The user must provide input to Diatheke. |
command | CommandAction | | The client app must execute the specified command. |
reply | ReplyAction | | The client app should provide the reply to the user. |
transcribe | TranscribeAction | | The client app should call the Transcribe method to capture the user’s input. |
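Because each action carries one of four payloads, client code usually dispatches on whichever field is set. The following is a rough sketch using a stand-in `ActionData` dataclass; the `dispatch` helper and the handler table are hypothetical, and it is assumed that exactly one field is populated per action. Generated protobuf classes offer their own way to inspect which field is set, which is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

ACTION_FIELDS = ("input", "command", "reply", "transcribe")


@dataclass
class ActionData:
    """Stand-in for ActionData; assumes exactly one field is populated."""
    input: Optional[Any] = None
    command: Optional[Any] = None
    reply: Optional[Any] = None
    transcribe: Optional[Any] = None


def dispatch(action: ActionData, handlers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Call the handler registered for whichever action field is set."""
    for name in ACTION_FIELDS:
        payload = getattr(action, name)
        if payload is not None:
            return handlers[name](payload)
    raise ValueError("ActionData has no action set")


# Illustrative handlers that just echo the field name and payload.
handlers = {name: (lambda payload, name=name: (name, payload)) for name in ACTION_FIELDS}
handled = dispatch(ActionData(reply="Hello there"), handlers)
```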
This action indicates that the client application should execute a command.
Field | Type | Label | Description |
---|---|---|---|
id | string | | The ID of the command to execute, as defined in the Diatheke model. |
input_parameters | CommandAction.InputParametersEntry | repeated | |
Field | Type | Label | Description |
---|---|---|---|
key | string | | |
value | string | | |
The result of executing a command.
Field | Type | Label | Description |
---|---|---|---|
id | string | | The command ID, as given by the CommandAction. |
out_parameters | CommandResult.OutParametersEntry | repeated | Output from the command expected by the Diatheke model. For example, this could be the result of a data query. |
error | string | | If there was an error during execution, indicate it here with a brief message that will be logged by Diatheke. |
Field | Type | Label | Description |
---|---|---|---|
key | string | | |
value | string | | |
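The relationship between a CommandAction and its CommandResult can be sketched as follows. The dataclasses are stand-ins for the generated messages, the key/value entry tables are assumed to represent string-to-string maps, and `run_command`, the handler registry, and the `get_weather` command are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CommandAction:
    """Stand-in for CommandAction; parameters modeled as a str-to-str map."""
    id: str = ""
    input_parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class CommandResult:
    """Stand-in for CommandResult."""
    id: str = ""
    out_parameters: Dict[str, str] = field(default_factory=dict)
    error: str = ""


def run_command(action: CommandAction,
                handlers: Dict[str, Callable[[Dict[str, str]], Dict[str, str]]]) -> CommandResult:
    """Execute the handler for action.id and report output or a brief error."""
    result = CommandResult(id=action.id)  # echo the command ID back
    handler = handlers.get(action.id)
    if handler is None:
        result.error = f"unknown command: {action.id}"
        return result
    try:
        result.out_parameters = dict(handler(action.input_parameters))
    except Exception as exc:  # report the failure instead of raising
        result.error = str(exc)
    return result


handlers = {"get_weather": lambda params: {"forecast": "sunny in " + params["city"]}}
ok = run_command(CommandAction(id="get_weather", input_parameters={"city": "Oslo"}), handlers)
```

The resulting CommandResult would then be sent back through the `cmd` field of SessionInput.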
This message is empty and has no fields.
A list of models available on the Diatheke server.
Field | Type | Label | Description |
---|---|---|---|
models | ModelInfo | repeated | |
Information about a single Diatheke model.
Field | Type | Label | Description |
---|---|---|---|
id | string | | Diatheke model ID, which is used to create a new session. |
name | string | | Pretty model name, which may be used for display purposes. |
language | string | | Language code of the model. |
asr_sample_rate | uint32 | | The ASR audio sample rate, if ASR is enabled. |
tts_sample_rate | uint32 | | The TTS audio sample rate, if TTS is enabled. |
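The sample rates reported here are typically used to size audio buffers on the client. The sketch below computes a chunk size in bytes from `asr_sample_rate`. It assumes mono 16-bit PCM audio (2 bytes per sample) and treats a rate of 0 as "ASR disabled"; neither is stated by the API, since the actual encoding depends on the server configuration. The `chunk_size_bytes` helper is hypothetical.

```python
def chunk_size_bytes(sample_rate_hz: int, chunk_ms: int = 100, bytes_per_sample: int = 2) -> int:
    """Bytes in one mono audio chunk lasting chunk_ms milliseconds."""
    if sample_rate_hz <= 0:
        # Assumption: a zero rate means the engine is disabled for this model.
        raise ValueError("sample rate must be positive")
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample


size_16k = chunk_size_bytes(16000)  # 100 ms of 16 kHz, 16-bit mono audio
```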
This action indicates that the client application should give the provided text to the user. This action may also be used to synthesize speech with the StreamTTS method.
Field | Type | Label | Description |
---|---|---|---|
text | string | | Text of the reply. |
luna_model | string | | TTS model to use with the StreamTTS method. |
Used by Diatheke to update the session state.
Field | Type | Label | Description |
---|---|---|---|
token | TokenData | | The session token. |
text | TextInput | | Process the user supplied text. |
asr | ASRResult | | Process an ASR result. |
cmd | CommandResult | | Process the result of a completed command. |
story | SetStory | | Change the current session state. |
The result of updating a session.
Field | Type | Label | Description |
---|---|---|---|
token | TokenData | | The updated session token. |
action_list | ActionData | repeated | The list of actions the client should take next, using the session token returned with this result. |
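Since every SessionOutput carries an updated token, a client loop needs to pass the most recently returned token to its next call. The sketch below illustrates that pattern against a hypothetical in-memory `FakeDiatheke` class. It is heavily simplified: tokens and actions are plain strings rather than TokenData and ActionData messages, and the method signatures do not match the real gRPC stubs.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class SessionOutput:
    """Simplified stand-in: string token, actions encoded as plain strings."""
    token: str
    action_list: List[str] = field(default_factory=list)


class FakeDiatheke:
    """Hypothetical in-memory server that rejects stale tokens."""

    def __init__(self) -> None:
        self.turn = 0
        self.deleted = None

    def CreateSession(self, model_id: str) -> SessionOutput:
        return SessionOutput("t0", ["reply:Hello", "input"])

    def UpdateSession(self, token: str, text: str) -> SessionOutput:
        if token != f"t{self.turn}":
            raise ValueError("stale session token")
        self.turn += 1
        return SessionOutput(f"t{self.turn}", [f"reply:You said {text}", "input"])

    def DeleteSession(self, token: str) -> None:
        self.deleted = token


def run_dialog(client, model_id: str, user_turns: Iterable[str]) -> List[str]:
    """Drive a session, always passing back the most recently returned token."""
    replies: List[str] = []
    turns = iter(user_turns)
    out = client.CreateSession(model_id)
    while True:
        next_text = None
        for action in out.action_list:
            if action.startswith("reply:"):
                replies.append(action[len("reply:"):])
            elif action == "input":
                next_text = next(turns, None)
        if next_text is None:  # no more user input: end the session
            client.DeleteSession(out.token)
            return replies
        out = client.UpdateSession(out.token, next_text)


client = FakeDiatheke()
replies = run_dialog(client, "demo-model", ["hi", "bye"])
```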
Used to create a new session.
Field | Type | Label | Description |
---|---|---|---|
model_id | string | | Specifies the Diatheke model ID to use for the session. |
wakeword | string | | Specifies a custom wakeword to use for this session. The wakeword must be enabled in the Diatheke model for this to have any effect. It will override the default wakeword specified in the model. |
Changes the current state of a Diatheke session to run at the specified story.
Field | Type | Label | Description |
---|---|---|---|
story_id | string | | The ID of the story to run, as defined in the Diatheke model. |
parameters | SetStory.ParametersEntry | repeated | A list of parameters to set before running the given story. This will replace any parameters currently defined in the session. |
Field | Type | Label | Description |
---|---|---|---|
key | string | | |
value | string | | |
Contains synthesized speech audio. The specific encoding is defined in the server config file.
Field | Type | Label | Description |
---|---|---|---|
audio | bytes | | |
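A StreamTTS response arrives as a sequence of TTSAudio messages, so a client that wants the complete utterance has to concatenate the chunks in order. The following is a minimal sketch with a stand-in `TTSAudio` dataclass and a hypothetical `collect_tts_audio` helper; any iterable of messages, such as a gRPC response iterator, could play the role of `stream`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TTSAudio:
    """Stand-in for the generated TTSAudio message."""
    audio: bytes = b""


def collect_tts_audio(stream: Iterable[TTSAudio]) -> bytes:
    """Concatenate audio chunks in arrival order until the stream ends."""
    buffer = bytearray()
    for message in stream:
        buffer.extend(message.audio)
    return bytes(buffer)


speech = collect_tts_audio([TTSAudio(b"ab"), TTSAudio(b""), TTSAudio(b"cd")])
```

For low-latency playback, a client would more likely hand each chunk to the audio device as it arrives instead of buffering the whole reply.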
User supplied text to send to Diatheke for processing.
Field | Type | Label | Description |
---|---|---|---|
text | string | | |
A token that represents a single Diatheke session and its current state.
Field | Type | Label | Description |
---|---|---|---|
data | bytes | | |
id | string | | Session ID, useful for correlating logging between a client and the server. |
metadata | string | | Additional data supplied by the client app, which will be logged with other session info by the server. |
This action indicates that the client application should call the Transcribe method to capture the user’s input.
Field | Type | Label | Description |
---|---|---|---|
id | string | | The ID of the transcribe action, which is useful to differentiate separate transcription tasks within a single session. |
cubic_model_id | string | | (Required) The ASR model to use for transcription. |
diatheke_model_id | string | | (Optional) A Diatheke model to use for end-of-stream conditions. If empty, the server will not be able to automatically close the transcribe stream based on conditions defined in the model, such as a non-speech timeout or an “end-transcription” intent. When empty, the stream must be closed by the client application. |
Data to send to the Transcribe stream. The first message on the stream must be a TranscribeAction, followed by audio data.
Field | Type | Label | Description |
---|---|---|---|
action | TranscribeAction | | Action defining the transcribe configuration. |
audio | bytes | | Audio data to transcribe. |
The result from the Transcribe stream. For each utterance processed, several partial (or intermediate) transcriptions are usually sent before the final transcription is ready.
Field | Type | Label | Description |
---|---|---|---|
text | string | | The transcription. |
confidence | double | | Confidence estimate between 0 and 1. A higher number represents a higher likelihood that the transcription is correct. |
is_partial | bool | | True if this is a partial result, in which case the next result will be for the same audio, either repeating or correcting the text in this result. When false, this represents the final transcription for an utterance, which will not change with further audio input. It is sent when the ASR has identified an endpoint. After the final transcription is sent, any additional results sent on the Transcribe stream belong to the next utterance. |
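The partial/final semantics of `is_partial` can be sketched as a small accumulator: each partial result replaces the previous one for the same utterance, and a final result closes the utterance. The `TranscribeResult` dataclass is a stand-in for the generated message, and `collect_utterances` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TranscribeResult:
    """Stand-in for the generated TranscribeResult message."""
    text: str = ""
    confidence: float = 0.0
    is_partial: bool = False


def collect_utterances(results: Iterable[TranscribeResult]) -> Tuple[List[str], str]:
    """Return (final transcripts, latest unfinished partial text)."""
    finals: List[str] = []
    pending = ""
    for result in results:
        if result.is_partial:
            pending = result.text  # supersedes the earlier partial for this audio
        else:
            finals.append(result.text)  # utterance complete; later results start a new one
            pending = ""
    return finals, pending


stream = [
    TranscribeResult("take", 0.4, True),
    TranscribeResult("take a note", 0.7, True),
    TranscribeResult("take a note", 0.9, False),
    TranscribeResult("buy", 0.5, True),
]
finals, pending = collect_utterances(stream)
```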
Lists the version of Diatheke and the engines it uses.
Field | Type | Label | Description |
---|---|---|---|
diatheke | string | | Dialog management engine |
chosun | string | | NLU engine |
cubic | string | | ASR engine |
luna | string | | TTS engine |
This action indicates that Diatheke is expecting user input.
Field | Type | Label | Description |
---|---|---|---|
requires_wake_word | bool | | True if the next user input must begin with a wakeword. |
immediate | bool | | True if the input is required immediately (i.e., in response to a question Diatheke asked the user). When false, the client should be allowed to wait indefinitely for the user to provide input. |
See the protocol buffer documentation for these types.
.proto Type | Notes |
---|---|
Duration | A signed, fixed-length span of time, expressed as a count of seconds and fractions of seconds at nanosecond resolution. |
Empty | Used to indicate a method takes or returns nothing |