A session represents the state of a dialog between a user and Diatheke. It tracks information relevant to the dialog, storing that information and passing it to the calling application as necessary. When information is no longer needed, it is removed from session memory. The Diatheke SDK encapsulates the session state in a token that the calling application must use when processing user input and executing commands.
How to create a new session.
Defines the actions an application should handle on behalf of a session.
How to update the state of a session using one of the allowed inputs, then respond to the session output.
How to end a session and clean up.
Every session is created using a Diatheke model, which defines the expected paths a dialog may take, including what the user and the application are allowed to do during the dialog. A session is typically created when the calling application wants to start a new dialog from the beginning, with nothing saved in memory. When the session is created, Diatheke also returns a list of actions the application should take. While processing these actions, the application will periodically need to update the session, at which point Diatheke returns additional actions for the application to process. This cycle of processing actions and updating the session continues until the application decides it is done with the session, at which point the session is deleted.
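The create, process, update, and delete cycle described above can be sketched as a generic loop. This is a minimal illustration, not the Diatheke SDK: SessionOutput, run_session, SessionDone, and the handler mapping are hypothetical names. The sketch also assumes that each update returns a fresh token together with a new action list that replaces any actions still pending.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SessionOutput:
    token: str                                   # opaque session state
    actions: List[object] = field(default_factory=list)


# A handler processes one action for the given token. It returns a new
# SessionOutput if it updated the session, or None if no update was needed.
Handler = Callable[[str, object], Optional[SessionOutput]]


class SessionDone(Exception):
    """Raised by a handler when the application is done with the session."""


def run_session(create: Callable[[], SessionOutput],
                handlers: Dict[type, Handler],
                delete: Callable[[str], None]) -> str:
    out = create()
    token, pending = out.token, list(out.actions)
    try:
        while pending:
            action = pending.pop(0)
            update = handlers[type(action)](token, action)
            if update is not None:
                # Assumption: an update's action list replaces any leftovers.
                token, pending = update.token, list(update.actions)
    except SessionDone:
        pass
    finally:
        delete(token)                            # always clean up the session
    return token


# Usage with hypothetical action types and a scripted stand-in for the server.
@dataclass
class WaitForUserAction:
    pass


@dataclass
class ReplyAction:
    text: str


shown: List[str] = []
deleted: List[str] = []
inputs = iter(["hello"])


def on_wait(token: str, action: WaitForUserAction) -> SessionOutput:
    text = next(inputs, None)
    if text is None:
        raise SessionDone()                      # no more input: end the dialog
    # Stand-in for an update call such as Process Text.
    return SessionOutput(token + "+", [ReplyAction("echo: " + text),
                                       WaitForUserAction()])


final_token = run_session(
    create=lambda: SessionOutput("t", [WaitForUserAction()]),
    handlers={WaitForUserAction: on_wait,
              ReplyAction: lambda tok, a: shown.append(a.text)},
    delete=deleted.append,
)
```

Keeping the delete call in a finally clause means the session is cleaned up even if a handler fails partway through the dialog.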
Below are a couple of examples showing how an application may interact with a session to respond to a user’s request to get the weather forecast.
In this example, only voice interactions are defined between the user and the system (i.e., input is speech audio and output is synthesized speech audio). As shown below, the application starts by creating a session, which returns a session token and a WaitForUserAction. To get user input, the application then creates a new ASR Stream and starts streaming audio data (e.g., from a microphone) to Diatheke.
Eventually, Diatheke will return an ASR result that includes a transcript of what the user said (“what’s the weather today”). This result is then used to update the session via the Process ASR Result method, which returns the next actions the application should take. In this case, it returns a CommandAction instructing the application to look up weather data (e.g., from an external weather service such as https://api.weather.gov/).
The relevant weather data is then used to update the session again via the Process Command Result method, which in this example returns two actions the application should take. The first is a ReplyAction, which the application uses to create a new TTS Stream; the audio from that stream is played back to the user. Once playback is complete, the next action (another WaitForUserAction) is handled, and the process continues for the lifetime of the session.
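The voice-only exchange above can be traced as a single turn in code. This is a sketch against a hypothetical in-memory stand-in (FakeVoiceDiatheke, with made-up methods asr_stream, process_asr_result, process_command_result, and tts_stream), not the real Diatheke SDK; the fake "recognizer" and "synthesizer" simply pass text through as bytes, and the command ID get_weather is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class WaitForUserAction:
    pass

@dataclass
class CommandAction:
    command_id: str

@dataclass
class ReplyAction:
    text: str

@dataclass
class ASRResult:
    text: str                    # transcript of what the user said

@dataclass
class SessionOutput:
    token: str
    actions: list


class FakeVoiceDiatheke:
    """Hypothetical stand-in for the server; not the real SDK API."""

    def create_session(self, model: str) -> SessionOutput:
        return SessionOutput("t0", [WaitForUserAction()])

    def asr_stream(self, token: str, audio: Iterable[bytes]) -> ASRResult:
        # Fake recognizer: the "audio" chunks are just UTF-8 text.
        return ASRResult(b"".join(audio).decode("utf-8"))

    def process_asr_result(self, token: str, result: ASRResult) -> SessionOutput:
        return SessionOutput("t1", [CommandAction("get_weather")])

    def process_command_result(self, token: str, data: str) -> SessionOutput:
        return SessionOutput("t2", [ReplyAction(f"Today will be {data}."),
                                    WaitForUserAction()])

    def tts_stream(self, token: str, reply: ReplyAction) -> Iterator[bytes]:
        # Fake synthesizer: the "audio" is the reply text encoded as bytes.
        yield reply.text.encode("utf-8")


def voice_turn(client, mic: Iterable[bytes], lookup: Callable[[str], str],
               speaker: List[bytes]):
    out = client.create_session("weather")               # token + WaitForUserAction
    result = client.asr_stream(out.token, mic)           # stream microphone audio
    out = client.process_asr_result(out.token, result)   # -> CommandAction
    command = out.actions[0]
    out = client.process_command_result(out.token, lookup(command.command_id))
    reply, next_action = out.actions                     # ReplyAction, WaitForUserAction
    for chunk in client.tts_stream(out.token, reply):
        speaker.append(chunk)                            # play back to the user
    # Playback is complete; the caller handles next_action.
    return out.token, result.text, next_action


speaker: List[bytes] = []
token, transcript, next_action = voice_turn(
    FakeVoiceDiatheke(), [b"what's the ", b"weather today"],
    lambda command_id: "sunny", speaker)
```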
In this example, only text interactions are defined between the user and the system. As shown below, the application starts by creating a session, which returns a session token and a WaitForUserAction. In response to this action, the application uses an appropriate method to get input text from the user, such as a command-line prompt or a GUI form.
The text from the user (“what’s the weather today”) is used to update the session via the Process Text method, which returns a CommandAction instructing the application to look up weather data.
The relevant weather data is then used to update the session again via the Process Command Result method, which in this example returns two actions the application should take. The first is a ReplyAction, which contains the text of Diatheke’s reply that the application should display to the user using an appropriate method. The next action (another WaitForUserAction) is then handled, and the process continues for the lifetime of the session.
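The text-only exchange follows the same shape, with a prompt in place of the ASR stream and displayed text in place of TTS playback. As with the voice example, FakeTextDiatheke and its method names are hypothetical stand-ins rather than the SDK API; read and write default to nothing here, so the caller passes in functions such as Python's built-in input and print for a command-line prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WaitForUserAction:
    pass

@dataclass
class CommandAction:
    command_id: str

@dataclass
class ReplyAction:
    text: str

@dataclass
class SessionOutput:
    token: str
    actions: list


class FakeTextDiatheke:
    """Hypothetical stand-in for the server; not the real SDK API."""

    def create_session(self, model: str) -> SessionOutput:
        return SessionOutput("t0", [WaitForUserAction()])

    def process_text(self, token: str, text: str) -> SessionOutput:
        # A real model would interpret the text; this fake always asks for weather.
        return SessionOutput("t1", [CommandAction("get_weather")])

    def process_command_result(self, token: str, data: str) -> SessionOutput:
        return SessionOutput("t2", [ReplyAction(f"Today will be {data}."),
                                    WaitForUserAction()])


def text_turn(client, read: Callable[[str], str], write: Callable[[str], None],
              lookup: Callable[[str], str]):
    out = client.create_session("weather")       # token + WaitForUserAction
    text = read("> ")                            # e.g. a command-line prompt
    out = client.process_text(out.token, text)   # -> CommandAction
    command = out.actions[0]
    out = client.process_command_result(out.token, lookup(command.command_id))
    reply, next_action = out.actions             # ReplyAction, WaitForUserAction
    write(reply.text)                            # display Diatheke's reply
    return out.token, next_action


shown: List[str] = []
token, next_action = text_turn(FakeTextDiatheke(),
                               lambda prompt: "what's the weather today",
                               shown.append, lambda command_id: "sunny")
```

For an interactive run, text_turn(FakeTextDiatheke(), input, print, lambda _: "sunny") reads from the terminal instead.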
Applications are allowed to combine voice and text I/O in whatever ways are optimal for their use cases. For the sake of simplicity, this documentation will not go into great detail about how to do this. The key difference to note for mixed UIs is that the application will likely use both methods (Process Text and Process ASR Result) to update the session, as well as the Set Story method to synchronize the session with the GUI state.
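For a mixed UI, the main extra step is routing each user input to the matching update method. The sketch below covers only that routing; Set Story is left out because its details are not covered here. TypedText, RecordingClient, and update_from_input are hypothetical names, and for brevity the fake update methods return only a new token rather than a token plus actions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class TypedText:
    text: str        # entered in a GUI text field

@dataclass
class ASRResult:
    text: str        # transcript returned by an ASR stream

UserInput = Union[TypedText, ASRResult]


class RecordingClient:
    """Hypothetical client that records which update method was called."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str]] = []

    def process_text(self, token: str, text: str) -> str:
        self.calls.append(("process_text", token, text))
        return token + "'"

    def process_asr_result(self, token: str, result: ASRResult) -> str:
        self.calls.append(("process_asr_result", token, result.text))
        return token + "'"


def update_from_input(client, token: str, user_input: UserInput) -> str:
    """Route one input to the matching update method; return the new token."""
    if isinstance(user_input, ASRResult):
        return client.process_asr_result(token, user_input)
    if isinstance(user_input, TypedText):
        return client.process_text(token, user_input.text)
    raise TypeError(f"unsupported input: {user_input!r}")


client = RecordingClient()
token = update_from_input(client, "t", ASRResult("what's the weather today"))
token = update_from_input(client, token, TypedText("and tomorrow?"))
```

Whichever channel the user chooses, the application keeps threading the latest token into the next update.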