Note
This documentation is for v2.x of the API. For documentation about v1.x of the API, please see here.

Diatheke is Cobalt’s dialog management engine. It combines speech technologies and artificial intelligence so that users can interact with computers and mobile devices through audio- and text-based dialogs.

Audio-Based Dialogs

Audio-based dialog management uses multiple speech technologies, including Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Text-to-Speech (TTS). It starts with audio input (human speech) from a calling application, which Diatheke sends to the ASR engine to get a transcription. The transcription is passed to the NLU engine, which converts it into an intent and entities. The intent and entities are then used to perform an action, as defined by the Diatheke model. One such action is sending a reply to the user: the reply text is sent to the TTS engine, which synthesizes audio to send back to the calling application, as shown below.

graph LR;
  subgraph Diatheke
    A[ASR] -->|Transcription| B[NLU]
    B -->|Intents and Entities| C[Dialog Model]
    C -->|Reply| D[TTS]
  end
  E[Calling Application] ==>|Audio Input| A
  D ==>|Synthesized Audio| E
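To make the order of the audio stages concrete, here is a minimal Python sketch of the pipeline in the diagram. It is not Diatheke's SDK or API: `asr`, `nlu`, `dialog_model`, `tts`, `NLUResult`, and `handle_audio` are all hypothetical stand-ins, and each "engine" is a trivial stub (the ASR stub simply decodes bytes as text, the NLU stub uses a keyword rule). Only the data flow (audio → transcription → intent and entities → reply → synthesized audio) reflects the description above.

```python
from dataclasses import dataclass, field


@dataclass
class NLUResult:
    """Hypothetical NLU output: one intent plus named entities."""
    intent: str
    entities: dict[str, str] = field(default_factory=dict)


def asr(audio: bytes) -> str:
    """Hypothetical ASR stub: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def nlu(text: str) -> NLUResult:
    """Hypothetical NLU stub: a keyword rule standing in for a real model."""
    words = text.lower().split()
    if "weather" in words:
        # Use the last word as the city entity, if there is one after "weather".
        city = words[-1] if words[-1] != "weather" else "here"
        return NLUResult("get_weather", {"city": city})
    return NLUResult("unknown")


def dialog_model(result: NLUResult) -> str:
    """Hypothetical dialog model: the action here is choosing a reply."""
    if result.intent == "get_weather":
        return f"Looking up the weather for {result.entities['city']}."
    return "Sorry, I didn't understand."


def tts(reply: str) -> bytes:
    """Hypothetical TTS stub: 'synthesizes' audio by encoding the text."""
    return reply.encode("utf-8")


def handle_audio(audio: bytes) -> bytes:
    """Audio workflow: ASR -> NLU -> dialog model -> TTS."""
    transcription = asr(audio)
    result = nlu(transcription)
    reply = dialog_model(result)
    return tts(reply)
```

For example, `handle_audio(b"what is the weather in paris")` returns `b"Looking up the weather for paris."`. Keeping each stage as a separate function mirrors the diagram: any one engine could be swapped out without touching the others.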

Text-Based Dialogs

Diatheke can also process dialogs without audio input or output. In this case, the calling application sends text to Diatheke, which forwards it directly to the NLU engine. The NLU engine converts the text to an intent and entities, just as it does with the transcription in the audio workflow. The intent and entities are then used to perform an action, such as sending a text reply to the calling application, as shown below.

graph LR;
  subgraph Diatheke
    A[NLU] -->|Intents and Entities| B[Dialog Model]
  end
  C[Calling Application] ==>|Text Input| A
  B ==>|Reply Text| C
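The text workflow drops the ASR and TTS stages, so text goes straight to NLU and the reply comes back as text. The minimal Python sketch below illustrates this under stated assumptions; it is not Diatheke's API. The `nlu` stub, the `ACTIONS` table, the intent names, and `handle_text` are all hypothetical, and the dialog model is modeled as a simple mapping from intent name to an action that builds the reply text.

```python
from collections.abc import Callable

# Hypothetical NLU output: (intent name, entities).
Intent = tuple[str, dict[str, str]]


def nlu(text: str) -> Intent:
    """Hypothetical keyword-based NLU stub standing in for the real engine."""
    lowered = text.lower()
    if lowered.startswith("hello"):
        return "greet", {}
    if "balance" in lowered:
        account = "checking" if "checking" in lowered else "savings"
        return "check_balance", {"account": account}
    return "unknown", {}


# Hypothetical dialog model: each intent maps to an action that builds a reply.
ACTIONS: dict[str, Callable[[dict[str, str]], str]] = {
    "greet": lambda entities: "Hello! How can I help?",
    "check_balance": lambda entities: f"Fetching your {entities['account']} balance.",
}


def handle_text(text: str) -> str:
    """Text workflow: NLU -> dialog model action -> reply text (no ASR or TTS)."""
    intent, entities = nlu(text)
    # Fall back to a default reply for intents the model does not handle.
    action = ACTIONS.get(intent, lambda _: "Sorry, I didn't understand.")
    return action(entities)
```

For example, `handle_text("What is my checking balance?")` returns `"Fetching your checking balance."`. Representing actions as a lookup table keeps the NLU step independent of what the dialog model does with each intent.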