Diatheke API Reference

The Diatheke API is defined using gRPC and protocol buffers. This section of the documentation is auto-generated from the protobuf file. It describes the data types and functions defined in the spec. The “messages” below correspond to the data structures to be used, and the “service” contains the methods that can be called.

diatheke.proto

Service: Diatheke

Service that implements the Cobalt Diatheke Dialog Management API.

Method Name	Request Type	Response Type	Description
Version	Empty	VersionResponse	Returns version information from the server.
ListModels	Empty	ListModelsResponse	ListModels returns information about the Diatheke models the server can access.
CreateSession	SessionStart	SessionOutput	Create a new Diatheke session. Also returns a list of actions to take next.
DeleteSession	TokenData	Empty	Delete the session. Behavior is undefined if the given TokenData is used again after this function is called.
UpdateSession	SessionInput	SessionOutput	Process input for a session and get an updated session with a list of actions to take next. This is the only method that modifies the Diatheke session state.
StreamASR	ASRInput	ASRResult	Create an ASR stream. A result is returned when the stream is closed by the client (which forces the ASR to endpoint), or when a transcript becomes available on its own, in which case the stream is closed by the server. The ASR result may be used in the UpdateSession method. If the session has a wakeword enabled, and the client application is using Diatheke and Cubic to handle the wakeword processing, this method will not return a result until the wakeword condition has been satisfied. Utterances without the required wakeword will be discarded and no transcription will be returned.
StreamTTS	ReplyAction	TTSAudio	Create a TTS stream to receive audio for the given reply. The stream will close when TTS is finished. The client may also close the stream early to cancel the speech synthesis.
Transcribe	TranscribeInput	TranscribeResult	Create an ASR stream for transcription. Unlike StreamASR, Transcribe does not listen for a wakeword. This method returns a bi-directional stream. It is intended for situations where a user may say anything at all, whether short or long, and the application wants to save the transcript (e.g., take a note, send a message). The first message sent to the server must be the TranscribeAction, and all remaining messages carry audio data. Messages received from the server include the current best partial transcription until the full transcription is ready. The stream ends when the client application closes it, when a predefined duration of silence (non-speech) occurs, or when the end-transcription intent is recognized.

Message: ASRInput

Data to send to the ASR stream. The first message on the stream must contain the session token. All subsequent messages carry audio data.

Field	Type	Label	Description
token	TokenData		Session data, used to determine the correct Cubic model to use for ASR, with other contextual information.
audio	bytes		Audio data to transcribe.
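
The message ordering described above can be sketched as a request generator. This is a minimal illustration, not the SDK's API: the `TokenData` and `ASRInput` dataclasses are hypothetical stand-ins for the classes generated from diatheke.proto, `asr_requests` is a made-up helper name, and the sketch assumes each message carries either the token or audio, never both.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class TokenData:
    """Hypothetical stand-in for the generated TokenData message."""
    data: bytes = b""
    id: str = ""
    metadata: str = ""


@dataclass
class ASRInput:
    """Hypothetical stand-in for ASRInput; assumed to hold one field per message."""
    token: Optional[TokenData] = None
    audio: Optional[bytes] = None


def asr_requests(token: TokenData, chunks: Iterable[bytes]) -> Iterator[ASRInput]:
    """Yield the session token first, then one message per audio chunk."""
    yield ASRInput(token=token)
    for chunk in chunks:
        if chunk:  # skip empty reads from the audio source
            yield ASRInput(audio=chunk)
```

With real generated stubs, a generator of this shape would typically be passed to the streaming call as its request iterator.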

Message: ASRResult

The result from the ASR stream, sent after the ASR engine has endpointed or the stream was closed by the client.

Field	Type	Description
text	string	The transcription.
confidence	double	Confidence estimate between 0 and 1. A higher number represents a higher likelihood of the output being correct.
timedOut	bool	True if a timeout was defined for the session’s current input state in the Diatheke model, and the timeout expired before getting a transcription. This timeout refers to the amount of time a user has to verbally respond to Diatheke after the ASR stream has been created, and should not be confused with a network connection timeout.

Message: ActionData

Specifies an action that the client application should take.

Field	Type	Description
input	WaitForUserAction	The user must provide input to Diatheke.
command	CommandAction	The client app must execute the specified command.
reply	ReplyAction	The client app should provide the reply to the user.
transcribe	TranscribeAction	The client app should call the Transcribe method to capture the user’s input.
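
A client usually inspects which of the four fields is set and branches accordingly. The sketch below shows one way to do that, assuming exactly one field is populated per ActionData. The dataclasses are hypothetical stand-ins for the generated messages (CommandAction is trimmed to its `id`), and `describe_action` is an illustrative helper, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitForUserAction:
    requires_wake_word: bool = False
    immediate: bool = False


@dataclass
class CommandAction:
    id: str = ""


@dataclass
class ReplyAction:
    text: str = ""
    luna_model: str = ""


@dataclass
class TranscribeAction:
    id: str = ""
    cubic_model_id: str = ""
    diatheke_model_id: str = ""


@dataclass
class ActionData:
    """Hypothetical stand-in; assumes exactly one field is set."""
    input: Optional[WaitForUserAction] = None
    command: Optional[CommandAction] = None
    reply: Optional[ReplyAction] = None
    transcribe: Optional[TranscribeAction] = None


def describe_action(action: ActionData) -> str:
    """Return a short description of what the client should do next."""
    if action.input is not None:
        return "wait for user input"
    if action.command is not None:
        return f"run command {action.command.id}"
    if action.reply is not None:
        return f"say: {action.reply.text}"
    if action.transcribe is not None:
        return f"transcribe with {action.transcribe.cubic_model_id}"
    raise ValueError("ActionData has no action set")
```

In a real client each branch would call into application code (play audio, execute the command, open a stream) instead of returning a string.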

Message: CommandAction

This action indicates that the client application should execute a command.

Field	Type	Label	Description
id	string		The ID of the command to execute, as defined in the Diatheke model.
input_parameters	CommandAction.InputParametersEntry	repeated

Message: CommandAction.InputParametersEntry

Field	Type	Label	Description
key	string
value	string

Message: CommandResult

The result of executing a command.

Field	Type	Label	Description
id	string		The command ID, as given by the CommandAction.
out_parameters	CommandResult.OutParametersEntry	repeated	Output from the command expected by the Diatheke model. For example, this could be the result of a data query.
error	string		If there was an error during execution, indicate it here with a brief message that will be logged by Diatheke.
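
The relationship between a CommandAction and its CommandResult can be sketched as follows. The dataclasses are hypothetical stand-ins for the generated messages, with the repeated `*Entry` key/value fields modeled as plain dictionaries (an assumption about how the map fields surface in Python). The `run_command` helper and the handler registry are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CommandAction:
    id: str = ""
    input_parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class CommandResult:
    id: str = ""
    out_parameters: Dict[str, str] = field(default_factory=dict)
    error: str = ""


# A handler takes the input parameters and returns the output parameters.
Handler = Callable[[Dict[str, str]], Dict[str, str]]


def run_command(action: CommandAction, handlers: Dict[str, Handler]) -> CommandResult:
    """Execute the handler registered for action.id and package the outcome."""
    result = CommandResult(id=action.id)  # echo the ID from the CommandAction
    handler = handlers.get(action.id)
    if handler is None:
        result.error = f"no handler for command {action.id!r}"
        return result
    try:
        result.out_parameters = handler(dict(action.input_parameters))
    except Exception as exc:
        # Keep the message brief; per the field description it is logged by Diatheke.
        result.error = str(exc)
    return result
```

The returned result would then be sent back through the `cmd` field of SessionInput.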

Message: CommandResult.OutParametersEntry

Field	Type	Label	Description
key	string
value	string

Message: Empty

This message is empty and has no fields.

Message: ListModelsResponse

A list of models available on the Diatheke server.

Field	Type	Label	Description
models	ModelInfo	repeated

Message: ModelInfo

Information about a single Diatheke model.

Field	Type	Description
id	string	Diatheke model ID, which is used to create a new session.
name	string	Pretty model name, which may be used for display purposes.
language	string	Language code of the model.
asr_sample_rate	uint32	The ASR audio sample rate, if ASR is enabled.
tts_sample_rate	uint32	The TTS audio sample rate, if TTS is enabled.

Message: ReplyAction

This action indicates that the client application should give the provided text to the user. This action may also be used to synthesize speech with the StreamTTS method.

Field	Type	Label	Description
text	string		Text of the reply.
luna_model	string		TTS model to use with the StreamTTS method.

Message: SessionInput

Used by Diatheke to update the session state.

Field	Type	Description
token	TokenData	The session token.
text	TextInput	Process the user supplied text.
asr	ASRResult	Process an ASR result.
cmd	CommandResult	Process the result of a completed command.
story	SetStory	Change the current session state.

Message: SessionOutput

The result of updating a session.

Field	Type	Label	Description
token	TokenData		The updated session token.
action_list	ActionData	repeated	The list of actions the client should take next, using the session token returned with this result.
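
Because each SessionOutput carries an updated token, a client should pass the most recent token into the next call. The sketch below illustrates that pattern against a hypothetical in-memory `FakeDiatheke` class, not a real server or the generated stub. The message dataclasses are simplified stand-ins: `text` is a plain string instead of a TextInput, and actions are plain strings instead of ActionData.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class TokenData:
    data: bytes = b""
    id: str = ""


@dataclass
class SessionStart:
    model_id: str = ""


@dataclass
class SessionInput:
    token: TokenData
    text: str = ""  # simplified: the real field is a TextInput message


@dataclass
class SessionOutput:
    token: TokenData
    action_list: List[str] = field(default_factory=list)  # simplified ActionData


class FakeDiatheke:
    """Hypothetical in-memory server that counts turns inside token.data."""

    def CreateSession(self, start: SessionStart) -> SessionOutput:
        return SessionOutput(TokenData(data=b"0", id="session-1"), ["input"])

    def UpdateSession(self, req: SessionInput) -> SessionOutput:
        turn = int(req.token.data.decode()) + 1
        token = TokenData(data=str(turn).encode(), id=req.token.id)
        return SessionOutput(token, [f"reply:{req.text}", "input"])


def run_turns(client, model_id: str, user_texts: Iterable[str]) -> SessionOutput:
    """Create a session, then feed each text in turn with the latest token."""
    out = client.CreateSession(SessionStart(model_id=model_id))
    for text in user_texts:
        # Send the token from the previous output, never an older copy.
        out = client.UpdateSession(SessionInput(token=out.token, text=text))
    return out
```

Reusing `out` as the single source of the token makes it hard to send a stale token by accident.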

Message: SessionStart

Used to create a new session.

Field	Type	Label	Description
model_id	string		Specifies the Diatheke model ID to use for the session.
wakeword	string		Specifies a custom wakeword to use for this session. The wakeword must be enabled in the Diatheke model for this to have any effect. It will override the default wakeword specified in the model.

Message: SetStory

Changes the current state of a Diatheke session so that it runs the specified story.

Field	Type	Label	Description
story_id	string		The ID of the story to run, as defined in the Diatheke model.
parameters	SetStory.ParametersEntry	repeated	A list of parameters to set before running the given story. This will replace any parameters currently defined in the session.

Message: SetStory.ParametersEntry

Field	Type	Label	Description
key	string
value	string

Message: TTSAudio

Contains synthesized speech audio. The specific encoding is defined in the server config file.

Field	Type	Label	Description
audio	bytes

Message: TextInput

User supplied text to send to Diatheke for processing.

Field	Type	Label	Description
text	string

Message: TokenData

A token that represents a single Diatheke session and its current state.

Field	Type	Description
data	bytes
id	string	Session ID, useful for correlating logging between a client and the server.
metadata	string	Additional data supplied by the client app, which will be logged with other session info by the server.

Message: TranscribeAction

This action indicates that the client application should call the Transcribe method to capture the user’s input.

Field	Type	Description
id	string	The ID of the transcribe action, which is useful to differentiate separate transcription tasks within a single session.
cubic_model_id	string	(Required) The ASR model to use for transcription.
diatheke_model_id	string	(Optional) A Diatheke model to use for end-of-stream conditions. If empty, the server will not be able to automatically close the transcribe stream based on conditions defined in the model, such as a non-speech timeout or an “end-transcription” intent. When empty, the stream must be closed by the client application.

Message: TranscribeInput

Data to send to the Transcribe stream. The first message on the stream must be a TranscribeAction, followed by audio data.

Field	Type	Label	Description
action	TranscribeAction		Action defining the transcribe configuration.
audio	bytes		Audio data to transcribe.

Message: TranscribeResult

The result from the Transcribe stream. Usually, several partial (or intermediate) transcriptions will be sent until the final transcription is ready for every utterance processed.

Field	Type	Description
text	string	The transcription.
confidence	double	Confidence estimate between 0 and 1. A higher number represents a higher likelihood that the transcription is correct.
is_partial	bool	True if this is a partial result, in which case the next result will be for the same audio, either repeating or correcting the text in this result. When false, this represents the final transcription for an utterance, which will not change with further audio input. It is sent when the ASR has identified an endpoint. After the final transcription is sent, any additional results sent on the Transcribe stream belong to the next utterance.
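
The partial/final semantics above can be illustrated with a small consumer that keeps only final transcriptions while tracking the latest partial for display. `TranscribeResult` here is a hypothetical dataclass stand-in for the generated message, and `collect_utterances` is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TranscribeResult:
    """Hypothetical stand-in for the generated TranscribeResult message."""
    text: str = ""
    confidence: float = 0.0
    is_partial: bool = False


def collect_utterances(results: Iterable[TranscribeResult]) -> Tuple[List[str], str]:
    """Return (final transcripts, pending partial text) for a result stream."""
    finals: List[str] = []
    pending = ""
    for result in results:
        if result.is_partial:
            # A partial may be repeated or corrected by the next result.
            pending = result.text
        else:
            # A final result closes the utterance; later results start a new one.
            finals.append(result.text)
            pending = ""
    return finals, pending
```

A non-empty pending string at the end means the stream closed before the last utterance was finalized.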

Message: VersionResponse

Lists the version of Diatheke and the engines it uses.

Field	Type	Description
diatheke	string	Dialog management engine
chosun	string	NLU engine
cubic	string	ASR engine
luna	string	TTS engine

Message: WaitForUserAction

This action indicates that Diatheke is expecting user input.

Field	Type	Label	Description
requires_wake_word	bool		True if the next user input must begin with a wake-word.
immediate	bool		True if the input is required immediately (i.e., in response to a question Diatheke asked the user). When false, the client should be allowed to wait indefinitely for the user to provide input.

Well-Known Types

See the protocol buffer documentation for details on these types.

.proto Type	Notes
Duration	A signed, fixed-length span of time, represented as a count of seconds and fractions of seconds at nanosecond resolution.
Empty	Used to indicate that a method takes or returns nothing.

Scalar Value Types

.proto Type	Notes	Go Type	Python Type	C++ Type
double		float64	float	double
float		float32	float	float
int32	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.	int32	int	int32
int64	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.	int64	int/long	int64
uint32	Uses variable-length encoding.	uint32	int/long	uint32
uint64	Uses variable-length encoding.	uint64	int/long	uint64
sint32	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.	int32	int	int32
sint64	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.	int64	int/long	int64
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int	uint32
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	int/long	uint64
sfixed32	Always four bytes.	int32	int	int32
sfixed64	Always eight bytes.	int64	int/long	int64
bool		bool	bool	bool
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string	str/unicode	string
bytes	May contain any arbitrary sequence of bytes.	[]byte	str (Python 2), bytes (Python 3)	string
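
The notes on variable-length encoding can be made concrete with a small sketch of the protobuf wire format's base-128 varint and ZigZag encodings. The function names are illustrative, not taken from any protobuf library. The code encodes only the value itself, not the field tag that precedes it on the wire.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # set the continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value: int) -> bytes:
    """sint32: ZigZag-encode first, then write as a varint."""
    return encode_varint(zigzag32(value))
```

Under this scheme, -1 takes ten bytes as an int32 but a single byte as a sint32, and any value of 2^28 or more needs five varint bytes, which is why fixed32 (always four bytes) becomes the more compact choice in that range.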