AI Dialogue Management for Enterprise Applications: The Chatty Revolution
Written by Aisera CTO Antonio Nucci
Ph.D. with 25 years of experience in Silicon Valley’s Tech Industry
Aisera Dialogue Management
State-of-the-art virtual agents are now built on the auto-regressive transformer architecture, also known as the Generative Pre-trained Transformer (GPT) architecture. They can create new and original answers and content based on the 45 terabytes of public data their underlying model was trained on (text, images, programming languages, and more).
GPT uses a very large neural network with 175 billion parameters (learned weights, each roughly corresponding to a connection between two logically adjacent neurons) to encode this massive amount of information. The result is an in-depth understanding of language and its nuances, more than any other model of its kind. It can understand natural language queries without prior training on, or knowledge about, the specific user, relying solely on the user’s writing style, the context of the conversation, and the grammar used.
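The phrase “auto-regressive” means the model produces its output one token at a time, feeding each generated token back in as input for the next step. The minimal sketch below illustrates only that loop, using greedy decoding. The bigram score table `BIGRAM_SCORES` is a hypothetical stand-in for a trained transformer, which would condition on the entire prefix rather than just the last token.

```python
from typing import Dict, List

# Hypothetical toy "model": scores for the next token given only the previous token.
# A real GPT model conditions on the whole prefix; this table just drives the loop.
BIGRAM_SCORES: Dict[str, Dict[str, float]] = {
    "<start>": {"how": 0.9, "reset": 0.1},
    "how": {"can": 0.8, "do": 0.2},
    "can": {"i": 1.0},
    "i": {"help": 1.0},
    "help": {"<end>": 1.0},
}


def next_token(prefix: List[str]) -> str:
    """Greedy decoding: pick the highest-scoring next token for the current prefix."""
    candidates = BIGRAM_SCORES.get(prefix[-1], {"<end>": 1.0})
    return max(candidates, key=candidates.get)


def generate(max_tokens: int = 10) -> List[str]:
    """Generate one token at a time, appending each output to the input of the next step."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        token = next_token(tokens)
        if token == "<end>":
            break
        tokens.append(token)
    return tokens[1:]  # drop the start marker


print(" ".join(generate()))  # how can i help
```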
Aisera has deeply embraced GPT across all its architectural components and fine-tuned the pre-trained model with over 1 trillion sentence pairs (composed of domain-specific entities and associated intents) to further deepen its understanding of enterprises in various domains, including IT, HR, Legal & Compliance, Finance & Procurement, Sales & Marketing, and Customer Service.
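The article does not describe the format of these sentence pairs, so the following is only a hypothetical illustration of a labeled example that pairs an utterance with its entities and intent, serialized as JSON Lines. The `TrainingPair` schema and the intent labels are assumptions, not Aisera’s actual data format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class TrainingPair:
    """Hypothetical fine-tuning record: an utterance paired with its labels."""
    utterance: str
    intent: str
    domain: str
    entities: Dict[str, str] = field(default_factory=dict)


def to_jsonl(pairs: List[TrainingPair]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(p), sort_keys=True) for p in pairs)


def from_jsonl(text: str) -> List[TrainingPair]:
    """Parse JSON Lines back into TrainingPair objects, skipping blank lines."""
    return [TrainingPair(**json.loads(line)) for line in text.splitlines() if line.strip()]


pairs = [
    TrainingPair("install zoom on iPhone 14", "install_software", "IT",
                 {"application": "zoom", "device": "iPhone 14"}),
    TrainingPair("do you offer any commuter benefits?", "benefits_inquiry",
                 "Customer Service", {"benefit": "commuter"}),
]
print(to_jsonl(pairs))
```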
Aisera Dialogue Management Unique Capabilities
Aisera’s Dialogue Management is primarily responsible for interpreting and contextualizing conversations between users and the virtual assistant. Beyond recording and tracking the status of each user conversation (across all engagement channels), the dialogue engine interacts with a policy service and a variety of AI models, each specialized for a specific AI task, to analyze the user request in depth (natural language understanding) and auto-generate conversational messages back to the user (natural language generation), whether to ask for clarification or to provide a final resolution to the user’s inquiry.
Below are some of the major features supported by Aisera’s Fine-Tuned Generative AI SQL Dialogue Engine.
Stateful Content Awareness Service
Aisera’s stateful dialogue engine keeps track of all exchanges between the user and the virtual agent within the same session, across multiple domains and engagement channels. By reviewing past interactions, dialogue management can better understand what the user is seeking (intent understanding) and frame new responses in context (answer generation). Consider the IT domain: a user types “install zoom on iPhone 14” and, in the next turn, types “and do the same for Webex”. A stateless context awareness service would treat “and do the same for Webex” as a new request with no memory of what the user asked before, and therefore could not understand the user’s intent. Aisera’s stateful content awareness service, by contrast, can interpret “and do the same for Webex” as related to software installation on iPhone 14 and reformulate the second request as “install Webex on iPhone 14” (Figure 1).
Figure 1. Intelligent Context Carrying (Enterprise Domain: IT)
Next, consider Customer Service, for example serving Benepass, a company that provides pre-tax benefits and perks to employees. If a user types “do you offer any commuter benefits?” followed by “want to enroll now”, the stateful content awareness service can understand that “want to enroll now” refers to the commuter benefits asked about previously, and reformulate the user request as “enroll in commuter benefits”.
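The context carrying in both examples can be sketched as a per-session memory of structured requests: each turn is resolved into a frame (action plus object and target, or topic), and an elliptical follow-up borrows its missing slots from the previous frame. This is a minimal sketch under stated assumptions; the hand-written regular expressions and the `Session`, `resolve`, and `render` names are hypothetical stand-ins for the model-based reformulation the service performs.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical patterns for the two follow-up styles in the examples above.
SAME_FOR = re.compile(r"^\s*(?:and\s+)?do the same for\s+(?P<obj>.+?)\s*$", re.I)
INSTALL = re.compile(r"^\s*install\s+(?P<obj>.+?)\s+on\s+(?P<target>.+?)\s*$", re.I)
ENROLL = re.compile(r"^\s*(?:i\s+)?want to enroll\b", re.I)
BENEFIT_QUESTION = re.compile(r"\b(?P<topic>\w+) benefits\b", re.I)


@dataclass
class Session:
    """Per-session memory: the structured frames of requests resolved so far."""
    history: List[Dict[str, str]] = field(default_factory=list)

    def last(self) -> Optional[Dict[str, str]]:
        return self.history[-1] if self.history else None


def render(frame: Dict[str, str]) -> str:
    """Turn a structured frame back into a standalone request string."""
    if frame["action"] == "install":
        return f"install {frame['obj']} on {frame['target']}"
    if frame["action"] == "enroll":
        return f"enroll in {frame['topic']} benefits"
    if frame["action"] == "ask":
        return f"ask about {frame['topic']} benefits"
    return frame["text"]


def resolve(session: Session, text: str) -> str:
    """Reformulate the user turn using the session history, then remember it."""
    prev = session.last()
    same = SAME_FOR.match(text)
    install = INSTALL.match(text)
    benefit = BENEFIT_QUESTION.search(text)
    if same and prev and prev["action"] == "install":
        # "do the same for X": reuse the previous action and target, swap the object.
        frame = {"action": "install", "obj": same["obj"], "target": prev["target"]}
    elif ENROLL.match(text) and prev and "topic" in prev:
        # "want to enroll": inherit the topic from the previous turn.
        frame = {"action": "enroll", "topic": prev["topic"]}
    elif install:
        frame = {"action": "install", "obj": install["obj"], "target": install["target"]}
    elif benefit:
        frame = {"action": "ask", "topic": benefit["topic"].lower()}
    else:
        # Without usable context the turn is passed through unchanged.
        frame = {"action": "unknown", "text": text}
    session.history.append(frame)
    return render(frame)


it_session = Session()
print(resolve(it_session, "install zoom on iPhone 14"))
print(resolve(it_session, "and do the same for Webex"))  # install Webex on iPhone 14
```

A stateless variant corresponds to calling `resolve` with a fresh `Session` on every turn; “and do the same for Webex” is then passed through unchanged because there is no earlier frame to borrow from.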
Personalized Conversational Experiences and Security Controls
Aisera’s AI SQL Dialogue Engine leverages existing user profile data to streamline and personalize interactions, and uses conversational data to keep user profiles up to date dynamically. The Conversation Engine can refer back to user profile data such as hardware, language preferences, and job title to deliver appropriate and relevant responses. Aisera’s platform can also enforce access controls based on the user’s title, which allows the Conversation Engine to restrict the delivery of certain types of content according to user privileges.
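Title-based content gating and profile-driven personalization can be illustrated with a small sketch. The `UserProfile` fields mirror the examples in the text (hardware, language preference, job title); the `ACCESS_POLICY` table, the titles in it, and the helper functions are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Dict, Set


@dataclass
class UserProfile:
    """Hypothetical profile fields, taken from the examples in the text."""
    name: str
    job_title: str
    language: str
    hardware: str


# Hypothetical policy: job titles allowed to receive each content category ("*" = everyone).
ACCESS_POLICY: Dict[str, Set[str]] = {
    "general": {"*"},
    "payroll_reports": {"Finance Manager", "CFO"},
}


def can_view(profile: UserProfile, category: str) -> bool:
    """Title-based access check; unknown categories are denied by default."""
    allowed = ACCESS_POLICY.get(category, set())
    return "*" in allowed or profile.job_title in allowed


def personalize(profile: UserProfile, answers_by_hardware: Dict[str, str], default: str) -> str:
    """Pick the answer variant that matches the user's registered hardware."""
    return answers_by_hardware.get(profile.hardware, default)


def update_profile(profile: UserProfile, facts: Dict[str, str]) -> None:
    """Write facts stated in the conversation back to known profile fields only."""
    known = {f.name for f in fields(profile)}
    for key, value in facts.items():
        if key in known:
            setattr(profile, key, value)


mark = UserProfile("Mark", "Sales Representative", "en", "MacBook Pro")
print(can_view(mark, "payroll_reports"))  # False
update_profile(mark, {"hardware": "Dell XPS 13"})
print(personalize(mark, {"Dell XPS 13": "Use the Windows installer."}, "See the general guide."))
```

Denying unknown content categories by default keeps a misconfigured category from leaking restricted content.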
Casual Classification Service
The Casual Classification Service assesses whether the user request is CASUAL (“Good morning, my name is Mark. How are you today?”) and, if so, automatically generates an answer back to the user (“Hi, my name is Aisera and I am doing great today. How can I help you?”). This allows the Aisera Virtual Assistant to engage with users in everyday situations and conversations (Figure 2).
Figure 2. Casual Talks for Everyday Conversations.
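As a rough illustration of the routing decision, the sketch below flags a message as casual when it contains small-talk cues and no task vocabulary, and returns a reply; otherwise it returns `None` so the request continues to the rest of the pipeline. The keyword lists are hypothetical stand-ins for a trained classifier, and the fixed replies stand in for the generated answers the service produces.

```python
import re
from typing import Optional

# Hypothetical small-talk cues and task words; stand-ins for a trained classifier.
SMALL_TALK = [r"\b(hi|hello|hey)\b", r"\bgood (morning|afternoon|evening)\b",
              r"\bhow are you\b", r"\bthank(s| you)\b"]
TASK_WORDS = {"install", "reset", "enroll", "connect", "error", "issue", "password"}


def is_casual(text: str) -> bool:
    """Casual means: contains small talk and no obvious task vocabulary."""
    lowered = text.lower()
    has_small_talk = any(re.search(p, lowered) for p in SMALL_TALK)
    words = set(re.findall(r"[a-z']+", lowered))
    return has_small_talk and not (words & TASK_WORDS)


def casual_reply(text: str) -> Optional[str]:
    """Return a small-talk answer, or None so the request continues down the pipeline."""
    if not is_casual(text):
        return None
    lowered = text.lower()
    if re.search(r"\bhow are you\b", lowered):
        return "Hi, my name is Aisera and I am doing great today. How can I help you?"
    if re.search(r"\bthank(s| you)\b", lowered):
        return "You're welcome! Is there anything else I can help you with?"
    return "Hello! How can I help you today?"


print(casual_reply("Good morning, my name is Mark. How are you today?"))
```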
Random/Gibberish Classification Service
The Random/Gibberish Classification Service processes the user request to assess whether it is RANDOM (“1313qfcfd b/”) or GIBBERISH (“gollygoops”) and, if so, automatically generates an answer that acknowledges its lack of understanding (“I’m sorry, but I don’t understand what you are trying to communicate with ‘1313qfcfd b/’. Could you please provide me with more context or rephrase your question or statement? I’m here to help you with any questions or issues you may have.”).
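A simple heuristic version of this check is sketched below, assuming two signals: a low share of letters among non-space characters suggests RANDOM input, and letter-based text containing no known word suggests GIBBERISH. The tiny `VOCABULARY` set and the `MIN_LETTER_SHARE` threshold of 0.6 are hypothetical; with a vocabulary this small many legitimate words would be misclassified, so a real service would rely on a trained model or a full lexicon.

```python
import re
from typing import Optional

# Hypothetical mini-vocabulary; a real service would use a trained model or a full lexicon.
VOCABULARY = {"help", "with", "vpn", "install", "zoom", "on", "my", "password",
              "reset", "the", "i", "need", "to", "login", "issue"}
MIN_LETTER_SHARE = 0.6  # hypothetical threshold


def classify_noise(text: str) -> Optional[str]:
    """Return "RANDOM", "GIBBERISH", or None when the text looks like a real request."""
    stripped = text.strip()
    if not stripped:
        return "RANDOM"
    non_space = [ch for ch in stripped if not ch.isspace()]
    letters = sum(ch.isalpha() for ch in non_space)
    # Mostly digits and symbols: treat as random keyboard input.
    if letters / len(non_space) < MIN_LETTER_SHARE:
        return "RANDOM"
    # Letter-based text with no recognizable word: treat as gibberish.
    words = re.findall(r"[a-z]+", stripped.lower())
    if not any(word in VOCABULARY for word in words):
        return "GIBBERISH"
    return None


def noise_reply(text: str) -> str:
    """Acknowledge the lack of understanding and ask the user to rephrase."""
    return (f"I'm sorry, but I don't understand what you are trying to communicate with "
            f"'{text}'. Could you please provide me with more context or rephrase your "
            "question or statement?")


print(classify_noise("1313qfcfd b/"), classify_noise("gollygoops"))
```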
Intent Extraction & Disambiguation Service
The Intent Extraction Service processes the user request, whether short or long and verbose, and extracts every intent it can find, whether single or multiple. For each extracted intent, it then assesses whether the intent is ambiguous or actionable.
An intent is ambiguous if the virtual agent has only a vague understanding of what the user is asking and therefore cannot act on it precisely. In this case, the Intent Extraction & Disambiguation Service automatically generates an in-context clarification question that first acknowledges the user request and then asks for more details. Examples of ambiguous user requests and the corresponding auto-generated answers are shown in Figure 3.
User Request: “Help with VPN”
Figure 3. Generative AI disambiguates vague user requests with auto-generated, in-context clarification questions.
Conversely, if an intent is actionable, meaning the virtual agent has a good understanding of what the user is asking, the Intent Extraction & Disambiguation Service forwards the request to the fulfillment service for resolution, as shown in Figure 4.
Figure 4. An example of actionable and well-defined intent “Login issue with Global Protect” and associated resolution via answer extraction from Enterprise Private Knowledge articles.
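The extract-then-classify flow can be sketched as follows, assuming a naive split on “and” and sentence punctuation for multi-intent requests and a hypothetical keyword catalog (`INTENT_CATALOG`) that decides whether a fragment is actionable. Fragments that match no catalog entry are treated as ambiguous and receive a templated clarification question that echoes the request; the service described above generates these questions instead.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical catalog: each well-defined intent and the keywords that must all appear.
INTENT_CATALOG: Dict[str, Set[str]] = {
    "Setup Zoom Notifications": {"zoom", "alerts"},
    "Login Issue with Global Protect": {"login", "global", "protect"},
}


@dataclass
class Intent:
    text: str                         # fragment of the user request
    name: Optional[str]               # catalog intent, or None when not recognized
    actionable: bool
    clarification: Optional[str] = None


def split_intents(request: str) -> List[str]:
    """Naive multi-intent split on the word 'and' and sentence punctuation."""
    return [p for p in re.split(r"\s*(?:\band\b|[.;?!])\s*", request) if p]


def classify(fragment: str) -> Intent:
    words = set(re.findall(r"[a-z]+", fragment.lower()))
    for name, required in INTENT_CATALOG.items():
        if required <= words:  # all required keywords present: actionable
            return Intent(fragment, name, actionable=True)
    # Ambiguous: acknowledge the request, then ask for details.
    question = (f"I'd like to help with \"{fragment}\". Could you please share "
                "more details about what you are trying to do?")
    return Intent(fragment, None, actionable=False, clarification=question)


def extract_intents(request: str) -> List[Intent]:
    return [classify(fragment) for fragment in split_intents(request)]


for intent in extract_intents("how to set up alerts in zoom and change my contributions"):
    print(intent.name if intent.actionable else intent.clarification)
```

The naive split would also break a single intent that happens to contain “and” (for example “Procurement and Finance portal”), which is one reason a learned extractor is preferable.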
Domain Type Classification Service
The Domain Type Classification Service is an AI service specialized in processing each actionable intent and determining the specific domain the user request (intent) belongs to. Customers simply define the enterprise scope of their virtual assistant as a collection of enterprise domains that the virtual assistant will be trained on and can therefore resolve effectively.
When a user request is classified as belonging to an in-scope domain, the virtual assistant forwards it to the fulfillment pipeline. Conversely, if the request falls outside the defined scope, the Domain Type Classification Service auto-generates an answer that acknowledges the request but politely reminds the user that it is outside the virtual assistant’s defined scope. Figure 5 shows an example of an out-of-scope user request and how the Domain Type Classification Service responds to the user. Customers can fully customize this experience.
Figure 5. The scope of the Aisera Virtual Assistant has been defined to cover only IT and HR requests. In this example, an out-of-scope user request (sales operations) is correctly classified as out of scope, and the Domain Type Classification Engine automatically acknowledges the request and reminds the user of the topics it has been configured to handle.
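The scope check reduces to a classification followed by a membership test. In the minimal sketch below, the per-domain keyword lexicon (`DOMAIN_KEYWORDS`) is a hypothetical stand-in for the trained classifier, and the scope is passed in as a set of domains, mirroring the IT and HR scope from Figure 5.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical keyword lexicon per domain, standing in for a trained domain classifier.
DOMAIN_KEYWORDS: Dict[str, Set[str]] = {
    "IT": {"vpn", "zoom", "password", "laptop", "install", "login"},
    "HR": {"pto", "benefits", "payroll", "onboarding", "leave"},
    "Sales Operations": {"quota", "pipeline", "forecast", "crm", "territory"},
}


def classify_domain(intent_text: str) -> str:
    """Return the domain with the most keyword hits, or "Unknown" if none match."""
    words = set(re.findall(r"[a-z]+", intent_text.lower()))
    scores = {domain: len(words & kws) for domain, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


def route(intent_text: str, scope: Set[str]) -> Tuple[str, str]:
    """Send in-scope intents to fulfillment; answer out-of-scope ones politely."""
    domain = classify_domain(intent_text)
    if domain in scope:
        return "fulfillment", domain
    reminder = ("Thanks for reaching out. I can only help with topics related to "
                f"{' and '.join(sorted(scope))}, so I can't assist with this request.")
    return "out_of_scope", reminder


print(route("reset my vpn password", {"IT", "HR"}))
print(route("update the sales forecast for my territory", {"IT", "HR"}))
```

An intent whose domain cannot be determined (“Unknown”) is treated as out of scope here; whether the real service does the same is not stated in the text.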
Chaining Together Intent Extraction, Disambiguation & Domain Classification Services
The dialogue management uses a knowledge graph to fact-check customer data, knowledge articles, and similar sources, which helps deliver verified and accurate information. The Intent Extraction Service processes the information provided by the Stateful Content Awareness Service and extracts the intent(s) from the user request, whether single or multiple. Each extracted intent is then processed by the disambiguation model, which checks whether it is ambiguous or actionable. If it is ambiguous, meaning the virtual agent has only a vague understanding of what the user is asking, the disambiguation service automatically generates a clarification question to seek further details from the user.
If the intent is actionable, meaning the virtual agent has a good understanding of what the user is asking, then the intent is forwarded to the domain classification service, which processes the intent to extract the domain it belongs to. If the extracted domain does not belong to the scope of the virtual agent, the domain classification service automatically generates an answer to acknowledge the user request but reminds the user that it can only serve topics related to the scope of the virtual agent. If the domain extracted is within the scope of the virtual agent, then the intent is further processed by the fulfillment pipeline.
For example, suppose the virtual agent has been set up to serve the IT and HR domains for an enterprise; this is known as the scope of the virtual agent. If the user types “how to set up alerts in zoom and change my contributions”, the request is processed and the following two intents are extracted:
- “Intent: Setup Zoom Notifications | Intent Type: Actionable | Intent Domain: IT”
- “Intent: Change My Contributions | Intent Type: Ambiguous | Intent Domain: Unknown”
Clarification Question: “I’m sorry, but I’m not sure what you are referring to when you say ‘change my contributions’. Could you please provide me with more information or context so that I can better understand what you are asking for?”
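The chain described above (extraction, then disambiguation, then domain classification, then fulfillment) can be sketched end to end on the example request. Every service here is a hypothetical stub: intents are recognized from a one-entry keyword table (`KNOWN_INTENTS`), and the knowledge-graph fact-checking step is not modeled. The point of the sketch is the ordering of the checks and the three possible outcomes for each intent.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-ins for the individual services; only the chaining logic matters here.
# Each known intent maps to (required keywords, domain).
KNOWN_INTENTS: Dict[str, Tuple[Set[str], str]] = {
    "Setup Zoom Notifications": ({"zoom", "alerts"}, "IT"),
}
SCOPE: Set[str] = {"IT", "HR"}


@dataclass
class Result:
    intent: str
    intent_type: str              # "Actionable" or "Ambiguous"
    domain: str                   # domain label, or "Unknown"
    next_step: str                # "fulfillment", "clarify", or "out_of_scope"
    message: Optional[str] = None


def extract(request: str) -> List[str]:
    """Intent extraction stub: split on 'and' and sentence punctuation."""
    return [p for p in re.split(r"\s*(?:\band\b|[.;?!])\s*", request) if p]


def process(request: str, scope: Set[str] = SCOPE) -> List[Result]:
    results = []
    for fragment in extract(request):
        words = set(re.findall(r"[a-z]+", fragment.lower()))
        match = next((name for name, (kws, _) in KNOWN_INTENTS.items() if kws <= words), None)
        if match is None:
            # Disambiguation: a vague intent gets a clarification question instead of a domain.
            question = ("I'm sorry, but I'm not sure what you are referring to when you say "
                        f"'{fragment}'. Could you please provide more information or context?")
            results.append(Result(fragment.title(), "Ambiguous", "Unknown", "clarify", question))
            continue
        domain = KNOWN_INTENTS[match][1]  # domain classification stub
        if domain in scope:
            results.append(Result(match, "Actionable", domain, "fulfillment"))
        else:
            reminder = f"I can only help with topics related to {' and '.join(sorted(scope))}."
            results.append(Result(match, "Actionable", domain, "out_of_scope", reminder))
    return results


for r in process("how to set up alerts in zoom and change my contributions"):
    print(f"Intent: {r.intent} | Intent Type: {r.intent_type} | Intent Domain: {r.domain}")
```

Running the loop prints one line per extracted intent, in the same form as the two records listed above.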
Sentiment & Emotions Detection & Empathetic Answer Generation Service
The Sentiment Detection Service processes the user request and categorizes it as negative, positive, or neutral. The Emotion Detection Service provides a deeper understanding of the user’s emotions, such as anger, happiness, or sadness, and of whether the user is engaged or disengaged.
Using this information, the service automatically generates an answer to the user, adjusting its conversational style and tone to align with the user’s feelings. In Figure 6, we show how this service handles the user request “I have been stuck for hours in trying to connect to my corporate network. So much time wasted!”, which is classified as Sentiment: Negative and Emotions: Frustration, Disappointment, Impatience. Sentiment analysis can also trigger direct escalation of a user request to a human agent when appropriate.
Figure 6. The Sentiment & Emotions Detection Service processes the user request and extracts Sentiment: Negative and Emotions: Frustration, Disappointment, Impatience. It automatically adjusts the conversation style and tone to generate an answer that reflects the user’s feelings.
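The sentiment and tone logic can be sketched with small hypothetical lexicons standing in for the trained sentiment and emotion models. The sketch scores sentiment by counting positive and negative words, collects emotion labels from cue words, prefixes a drafted answer with a tone-matched opener, and flags escalation using an illustrative threshold (`escalate_at`); none of these lexicons or values come from the article.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical lexicons; a production service would use trained sentiment and emotion models.
NEGATIVE: Set[str] = {"stuck", "wasted", "broken", "angry", "terrible", "frustrated"}
POSITIVE: Set[str] = {"thanks", "great", "awesome", "love", "perfect"}
EMOTION_CUES: Dict[str, Set[str]] = {
    "Frustration": {"stuck", "wasted", "again"},
    "Impatience": {"hours", "still", "waiting"},
    "Disappointment": {"wasted", "expected"},
}


def analyze(text: str) -> Tuple[str, List[str]]:
    """Lexicon-based sentiment label plus a sorted list of detected emotions."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
    emotions = sorted(e for e, cues in EMOTION_CUES.items() if words & cues)
    return sentiment, emotions


def respond(text: str, answer: str, escalate_at: int = 3) -> Tuple[str, bool]:
    """Adjust the tone of a drafted answer and flag whether to hand off to a human."""
    sentiment, emotions = analyze(text)
    if sentiment == "Negative":
        opener = "I'm sorry you've been having trouble with this. "
    elif sentiment == "Positive":
        opener = "Glad to hear it! "
    else:
        opener = ""
    # Hypothetical escalation rule: strongly negative messages go to a human agent.
    escalate = sentiment == "Negative" and len(emotions) >= escalate_at
    return opener + answer, escalate


message = "I have been stuck for hours in trying to connect to my corporate network. So much time wasted!"
print(analyze(message))
```

On the example message from Figure 6, `analyze` returns a Negative sentiment with Disappointment, Frustration, and Impatience, which also meets the illustrative escalation threshold.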