Diatheke API Reference

The Diatheke API is defined using gRPC and protocol buffers. This section of the documentation is auto-generated from the protobuf file. It describes the data types and functions defined in the spec. The “messages” below correspond to the data structures to be used, and the “service” contains the methods that can be called.

diatheke.proto

Service: Diatheke

Service that implements the Cobalt Diatheke Dialog Management API.

Method Name | Request Type | Response Type | Description
--- | --- | --- | ---
Version | Empty | VersionResponse | Returns version information from the server.
ListModels | Empty | ListModelsResponse | ListModels returns information about the Diatheke models the server can access.
CreateSession | SessionStart | SessionOutput | Create a new Diatheke session. Also returns a list of actions to take next.
DeleteSession | TokenData | Empty | Delete the session. Behavior is undefined if the given TokenData is used again after this function is called.
UpdateSession | SessionInput | SessionOutput | Process input for a session and get an updated session with a list of actions to take next. This is the only method that modifies the Diatheke session state.
StreamASR | ASRInput | ASRResult | Create an ASR stream. A result is returned when the stream is closed by the client (which forces the ASR to endpoint), or when a transcript becomes available on its own, in which case the stream is closed by the server. The ASR result may be used in the UpdateSession method. If the session has a wakeword enabled, and the client application is using Diatheke and Cubic to handle the wakeword processing, this method will not return a result until the wakeword condition has been satisfied. Utterances without the required wakeword will be discarded and no transcription will be returned.
StreamTTS | ReplyAction | TTSAudio | Create a TTS stream to receive audio for the given reply. The stream will close when TTS is finished. The client may also close the stream early to cancel the speech synthesis.
Transcribe | TranscribeInput | TranscribeResult | Create an ASR stream for transcription. Unlike StreamASR, Transcribe does not listen for a wakeword. This method returns a bi-directional stream, and its intended use is for situations where a user may say anything at all, whether it is short or long, and the application wants to save the transcript (e.g., take a note, send a message). The first message sent to the server must be the TranscribeAction, with remaining messages sending audio data. Messages received from the server will include the current best partial transcription until the full transcription is ready. The stream ends when either the client application closes it, a predefined duration of silence (non-speech) occurs, or the end-transcription intent is recognized.

Message: ASRInput

Data to send to the ASR stream. The first message on the stream must contain the session token; all subsequent messages carry audio data.

Field Type Label Description
token TokenData

Session data, used to determine the correct Cubic model to use for ASR, along with other contextual information.

audio bytes

Audio data to transcribe.
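
The message ordering described for ASRInput can be sketched as a request generator. This is a minimal illustration only: the dataclasses below are hypothetical stand-ins for the generated protobuf classes, not the actual SDK types, and asr_requests is a helper name invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class TokenData:
    """Hypothetical stand-in for the TokenData message."""
    data: bytes = b""
    id: str = ""
    metadata: str = ""


@dataclass
class ASRInput:
    """Hypothetical stand-in for ASRInput: holds a token or audio."""
    token: Optional[TokenData] = None
    audio: Optional[bytes] = None


def asr_requests(token: TokenData, chunks: Iterable[bytes]) -> Iterator[ASRInput]:
    """Yield the session token first, then one message per audio chunk."""
    yield ASRInput(token=token)
    for chunk in chunks:
        if chunk:  # skip empty reads from the audio source
            yield ASRInput(audio=chunk)
```

A generator of this shape is the usual way to feed a client-streaming gRPC call in Python, since the request stream is consumed lazily as audio becomes available.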

Message: ASRResult

The result from the ASR stream, sent after the ASR engine has endpointed or the stream was closed by the client.

Field Type Label Description
text string

The transcription.

confidence double

Confidence estimate between 0 and 1. A higher number represents a higher likelihood of the output being correct.

timedOut bool

True if a timeout was defined for the session’s current input state in the Diatheke model, and the timeout expired before getting a transcription. This timeout refers to the amount of time a user has to verbally respond to Diatheke after the ASR stream has been created, and should not be confused with a network connection timeout.

Message: ActionData

Specifies an action that the client application should take.

Field Type Label Description
input WaitForUserAction

The user must provide input to Diatheke.

command CommandAction

The client app must execute the specified command.

reply ReplyAction

The client app should provide the reply to the user.

transcribe TranscribeAction

The client app should call the Transcribe method to capture the user’s input.
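
A client typically inspects each ActionData to see which action it carries. The sketch below assumes that exactly one of the four fields is populated per ActionData (the reference does not state this explicitly), and uses hypothetical dataclasses in place of the generated protobuf classes; action_kind is a name invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WaitForUserAction:
    requires_wake_word: bool = False
    immediate: bool = False


@dataclass
class CommandAction:
    id: str = ""
    input_parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class ReplyAction:
    text: str = ""
    luna_model: str = ""


@dataclass
class TranscribeAction:
    id: str = ""
    cubic_model_id: str = ""
    diatheke_model_id: str = ""


@dataclass
class ActionData:
    """Hypothetical stand-in; assumes only one field is set at a time."""
    input: Optional[WaitForUserAction] = None
    command: Optional[CommandAction] = None
    reply: Optional[ReplyAction] = None
    transcribe: Optional[TranscribeAction] = None


def action_kind(action: ActionData) -> str:
    """Return the name of the first populated field, or raise if none is set."""
    for name in ("input", "command", "reply", "transcribe"):
        if getattr(action, name) is not None:
            return name
    raise ValueError("ActionData has no action set")
```

The returned name can then be used to route the action to the matching handler in the client application.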

Message: CommandAction

This action indicates that the client application should execute a command.

Field Type Label Description
id string

The ID of the command to execute, as defined in the Diatheke model.

input_parameters CommandAction.InputParametersEntry repeated

Message: CommandAction.InputParametersEntry

Field Type Label Description
key string

value string

Message: CommandResult

The result of executing a command.

Field Type Label Description
id string

The command ID, as given by the CommandAction.

out_parameters CommandResult.OutParametersEntry repeated

Output from the command expected by the Diatheke model. For example, this could be the result of a data query.

error string

If there was an error during execution, indicate it here with a brief message that will be logged by Diatheke.
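
One way a client might turn a CommandAction into a CommandResult is sketched below. The dataclasses are hypothetical stand-ins for the generated classes, the parameter entries are modeled as plain string-to-string dicts (an assumption about how the Entry messages are exposed), and run_command and the handler table are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CommandAction:
    id: str = ""
    input_parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class CommandResult:
    id: str = ""
    out_parameters: Dict[str, str] = field(default_factory=dict)
    error: str = ""


# A handler takes the input parameters and returns the output parameters.
Handler = Callable[[Dict[str, str]], Dict[str, str]]


def run_command(action: CommandAction, handlers: Dict[str, Handler]) -> CommandResult:
    """Execute the handler registered for action.id and report the outcome."""
    result = CommandResult(id=action.id)  # echo the ID from the action
    handler = handlers.get(action.id)
    if handler is None:
        result.error = f"unknown command: {action.id}"
        return result
    try:
        result.out_parameters = handler(action.input_parameters)
    except Exception as exc:  # report any failure as a brief message
        result.error = str(exc)
    return result
```

Catching the exception and filling in the error field, rather than letting it propagate, keeps the dialog moving: the result can still be passed back through UpdateSession so that the failure is logged by Diatheke.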

Message: CommandResult.OutParametersEntry

Field Type Label Description
key string

value string

Message: Empty

This message is empty and has no fields.

Message: ListModelsResponse

A list of models available on the Diatheke server.

Field Type Label Description
models ModelInfo repeated

Message: ModelInfo

Information about a single Diatheke model.

Field Type Label Description
id string

Diatheke model ID, which is used to create a new session.

name string

Human-readable model name, which may be used for display purposes.

language string

Language code of the model.

asr_sample_rate uint32

The ASR audio sample rate, if ASR is enabled.

tts_sample_rate uint32

The TTS audio sample rate, if TTS is enabled.
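
The sample rates reported in ModelInfo are useful when sizing audio buffers. The helper below is a sketch under a loud assumption: it treats the audio as uncompressed 16-bit mono PCM, which this reference does not guarantee (the actual encoding depends on the server configuration). The function name and default chunk duration are illustrative only.

```python
def chunk_size_bytes(sample_rate_hz: int, chunk_ms: int = 100,
                     bytes_per_sample: int = 2) -> int:
    """Return the byte length of chunk_ms of audio.

    Assumes uncompressed mono PCM with bytes_per_sample bytes per sample
    (2 bytes = 16-bit). This is an assumption, not part of the API.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    samples = sample_rate_hz * chunk_ms // 1000  # whole samples per chunk
    return samples * bytes_per_sample
```

For example, with asr_sample_rate set to 16000, a 100 ms chunk under these assumptions is 3200 bytes.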

Message: ReplyAction

This action indicates that the client application should give the provided text to the user. This action may also be used to synthesize speech with the StreamTTS method.

Field Type Label Description
text string

Text of the reply.

luna_model string

TTS model to use with the StreamTTS method.

Message: SessionInput

Used by Diatheke to update the session state.

Field Type Label Description
token TokenData

The session token.

text TextInput

Process the user supplied text.

asr ASRResult

Process an ASR result.

cmd CommandResult

Process the result of a completed command.

story SetStory

Change the current session state.

Message: SessionOutput

The result of updating a session.

Field Type Label Description
token TokenData

The updated session token.

action_list ActionData repeated

The list of actions the client should take next, using the session token returned with this result.
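
Because each SessionOutput carries an updated token, a client needs to thread the most recent token into its next call. The sketch below illustrates that pattern with deliberately simplified, hypothetical types: the token is reduced to raw bytes, actions are reduced to strings, and fake_create and fake_update stand in for the CreateSession and UpdateSession calls. None of these names come from the real SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class SessionOutput:
    """Simplified stand-in: token as bytes, actions as strings."""
    token: bytes
    action_list: List[str] = field(default_factory=list)


def run_turns(create: Callable[[], SessionOutput],
              update: Callable[[bytes, str], SessionOutput],
              inputs: Iterable[str]) -> Tuple[bytes, List[List[str]]]:
    """Send each input using the newest token; collect every action list."""
    out = create()
    seen = [out.action_list]
    for text in inputs:
        out = update(out.token, text)  # always pass the latest token
        seen.append(out.action_list)
    return out.token, seen


def fake_create() -> SessionOutput:
    """Fake CreateSession: starts the session at turn 0."""
    return SessionOutput(token=b"turn-0", action_list=["reply", "input"])


def fake_update(token: bytes, text: str) -> SessionOutput:
    """Fake UpdateSession: bumps the turn counter stored in the token."""
    turn = int(token.decode().split("-")[1]) + 1
    return SessionOutput(token=f"turn-{turn}".encode(),
                         action_list=[f"reply:{text}", "input"])
```

Calling run_turns(fake_create, fake_update, ["hi", "bye"]) walks the fake session through two updates, each one built on the token returned by the previous call.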

Message: SessionStart

Used to create a new session.

Field Type Label Description
model_id string

Specifies the Diatheke model ID to use for the session.

wakeword string

Specifies a custom wakeword to use for this session. The wakeword must be enabled in the Diatheke model for this to have any effect. It will override the default wakeword specified in the model.

Message: SetStory

Changes the current state of a Diatheke session so that it runs the specified story.

Field Type Label Description
story_id string

The ID of the story to run, as defined in the Diatheke model.

parameters SetStory.ParametersEntry repeated

A list of parameters to set before running the given story. This will replace any parameters currently defined in the session.

Message: SetStory.ParametersEntry

Field Type Label Description
key string

value string

Message: TTSAudio

Contains synthesized speech audio. The specific encoding is defined in the server config file.

Field Type Label Description
audio bytes

Message: TextInput

User supplied text to send to Diatheke for processing.

Field Type Label Description
text string

Message: TokenData

A token that represents a single Diatheke session and its current state.

Field Type Label Description
data bytes

id string

Session ID, useful for correlating logging between a client and the server.

metadata string

Additional data supplied by the client app, which will be logged with other session info by the server.

Message: TranscribeAction

This action indicates that the client application should call the Transcribe method to capture the user’s input.

Field Type Label Description
id string

The ID of the transcribe action, which is useful to differentiate separate transcription tasks within a single session.

cubic_model_id string

(Required) The ASR model to use for transcription.

diatheke_model_id string

(Optional) A Diatheke model to use for end-of-stream conditions. If empty, the server will not be able to automatically close the transcribe stream based on conditions defined in the model, such as a non-speech timeout or an “end-transcription” intent. When empty, the stream must be closed by the client application.

Message: TranscribeInput

Data to send to the Transcribe stream. The first message on the stream must be a TranscribeAction, followed by audio data.

Field Type Label Description
action TranscribeAction

Action defining the transcribe configuration.

audio bytes

Audio data to transcribe.

Message: TranscribeResult

The result from the Transcribe stream. For each utterance processed, several partial (or intermediate) transcriptions are usually sent before the final transcription is ready.

Field Type Label Description
text string

The transcription.

confidence double

Confidence estimate between 0 and 1. A higher number represents a higher likelihood that the transcription is correct.

is_partial bool

True if this is a partial result, in which case the next result will be for the same audio, either repeating or correcting the text in this result. When false, this represents the final transcription for an utterance, which will not change with further audio input. It is sent when the ASR has identified an endpoint. After the final transcription is sent, any additional results sent on the Transcribe stream belong to the next utterance.
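
The is_partial semantics can be illustrated with a small consumer that keeps final transcriptions and tracks the latest partial for the utterance in progress. The dataclass is a hypothetical stand-in for the generated TranscribeResult class, and collect is a helper name invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TranscribeResult:
    """Hypothetical stand-in for the TranscribeResult message."""
    text: str = ""
    confidence: float = 0.0
    is_partial: bool = False


def collect(results: Iterable[TranscribeResult]) -> Tuple[List[str], str]:
    """Return (final transcripts, text of any unfinished partial)."""
    finals: List[str] = []
    pending = ""
    for result in results:
        if result.is_partial:
            # A newer partial replaces the previous one for the same audio.
            pending = result.text
        else:
            # A final result closes the utterance; later results start a new one.
            finals.append(result.text)
            pending = ""
    return finals, pending
```

A display-oriented client could show the pending text as it changes and commit it only once the final result for that utterance arrives.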

Message: VersionResponse

Lists the version of Diatheke and the engines it uses.

Field Type Label Description
diatheke string

Dialog management engine

chosun string

NLU engine

cubic string

ASR engine

luna string

TTS engine

Message: WaitForUserAction

This action indicates that Diatheke is expecting user input.

Field Type Label Description
requires_wake_word bool

True if the next user input must begin with a wakeword.

immediate bool

True if the input is required immediately (i.e., in response to a question Diatheke asked the user). When false, the client should be allowed to wait indefinitely for the user to provide input.

Well-Known Types

See the protocol buffer documentation for details on these types.

.proto Type Notes
Duration A signed, fixed-length span of time, expressed as a count of seconds and fractions of seconds at nanosecond resolution.
Empty Used to indicate that a method takes or returns nothing.

Scalar Value Types

.proto Type | Notes | Go Type | Python Type | C++ Type
--- | --- | --- | --- | ---
double | | float64 | float | double
float | | float32 | float | float
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int32
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | int/long | int64
uint32 | Uses variable-length encoding. | uint32 | int/long | uint32
uint64 | Uses variable-length encoding. | uint64 | int/long | uint64
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int32
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long | int64
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | uint32
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | uint64
sfixed32 | Always four bytes. | int32 | int | int32
sfixed64 | Always eight bytes. | int64 | int/long | int64
bool | | bool | bool | bool
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | string
bytes | May contain any arbitrary sequence of bytes. | []byte | str | string