Rasa-Bot-Project

The files and python code for an Information Retrieval Chatbot made using the Rasa Stack



Documentation : An Information Retrieval Chatbot Using the Rasa Stack

Author
Prathamesh Anil Degwekar

Version
$Revision: 1 $

Parts:

  1. About Chatbots
  2. Working
  3. Information Retrieval Chatbots
  4. Frameworks
  5. Rasa Installation and Usage
  6. The Project Code
  7. Results

About Chatbots

A chatbot is a computer program or an artificial intelligence that conducts a conversation via auditory or textual methods. Such programs are often designed to convincingly simulate how a human would behave as a conversational partner, thereby passing the Turing test. Chatbots are typically used in dialog systems for various practical purposes including customer service or information acquisition. Some chatbots use sophisticated natural language processing systems, but many simpler systems scan for keywords within the input, then pull a reply with the most matching keywords, or the most similar wording pattern, from a database.

The process of creating a chatbot follows a pattern similar to the development of a web page or a mobile app. It can be divided into Design, Building, Analytics and Maintenance.

Design

The chatbot design is the process that defines the interaction between the user and the chatbot. The chatbot designer will define the chatbot personality, the questions that will be asked of the users, and the overall interaction. It can be viewed as a subset of conversational design. To speed up this process, designers can use dedicated chatbot design tools that allow for immediate preview, team collaboration and video export. An important part of chatbot design also centers on user testing, which can be performed following the same principles that guide the user testing of graphical interfaces.

Building

The process of building a chatbot can be divided into two main tasks: understanding the user’s intent and producing the correct answer. The first task involves understanding the user input. To properly understand user input in free-text form, a Natural Language Processing engine can be used. The second task may involve different approaches depending on the type of response the chatbot will generate.

Analytics

The usage of the chatbot can be monitored in order to spot potential flaws or problems. It can also provide useful insights that can improve the final user experience.

Maintenance

To keep chatbots up to speed with changing company products and services, traditional chatbot development platforms require ongoing maintenance. This can take the form of an ongoing service provider or, for larger enterprises, an in-house chatbot training team. To eliminate these costs, some startups are experimenting with Artificial Intelligence to develop self-learning chatbots, particularly in Customer Service applications.

Working

All chatbots, regardless of source and nature, work on similar principles: they all rely on Natural Language Understanding. Once they understand what the user means, they act on it depending on how they are built and what they are made to do. The crux of their operation is intent <link> and entity <link> extraction.

So, What are intents?

Simply put, intents are the intentions of the end-user, conveyed to your bot through their messages. Understanding which intention the user had is equivalent to understanding what question they asked, and that is the whole business of the product.

You can roughly classify your intents into 2 categories:

  1. Casual Intents

    These are the intents used to generate “small talk”. They are the openers or closers of a conversation. Greetings like “hi” and “hello” open or close an exchange, and should direct your bot to respond with a small-talk reply like “Hello, what can I do for you today?” or “Bye, thanks for talking to me!”. Casual intents also comprise affirmative and negative intents for utterances like “Ok” and “yes please”. Having general affirmative and negative intents helps you handle confirmations cleanly. For example, if the bot has just asked the end-user a question, their response (positive or negative) tells you their stance on the matter and makes things clearer for you.

  2. Meaningful Intents

    These are the intents that map directly to the purpose of the bot. They tell you which of the actual functions your bot exists for are needed by the user. The meaningful intents are the more important ones: the small talk is the same for most bots, so these intents are the differentiating factor. They define what your bot actually does.

Hence, most of the time spent on bot design is spent on intent design, because this is the backbone on which any bot stands.

And Entities?

Intents tell you the purpose, or the function, the user requires, but the specifics of said task are given by the entities. Consider an example where the intent is searching for a restaurant: the intent tells you that the user is searching for a restaurant by cuisine, but the entity tells you WHICH cuisine the user is looking for. So the intent may be the same, but different users are looking for different things, and that is captured by the entity.
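
For illustration, here is a hypothetical sketch (in Python; the field names echo common NLU engines but are purely illustrative, not any specific framework’s output) of the structured result such an extraction might return:

# Hypothetical NLU output for the user message:
#   "show me cheap mexican restaurants in rome"
parsed = {
    "text": "show me cheap mexican restaurants in rome",
    "intent": {"name": "restaurant_search", "confidence": 0.92},
    "entities": [
        {"entity": "price", "value": "cheap"},
        {"entity": "cuisine", "value": "mexican"},
        {"entity": "location", "value": "rome"},
    ],
}

# The intent gives the purpose; the entities give the specifics.
assert parsed["intent"]["name"] == "restaurant_search"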

All the different frameworks have their own methods of doing said extractions. Once this information is received, required actions are taken and subsequent results are achieved.

Information Retrieval Chatbots

Most of the chatbots used today serve one specific application: Information Retrieval. This class of bots is designed to provide human-like answers without human intervention. The system tries to understand which question you are asking, or more realistically, which question from its bank is closest to the question you are asking, and answers it accordingly (from pre-stored and/or trained responses). Of course, internet-connected variants exist that search online, query a database, or even run diagnostics, analytics or some other functionality. They are gaining importance by the day and have a great deal of development put into them. There are broadly 3 types of bots:

  1. The age old Rule Based
  2. The modern era Deep Learning Based
  3. And the one that seems to gain the most traction, the Hybrid, which is a combination of the above two.

Hence, the way forward in this project was conferred upon and agreed to be the third.

Frameworks

For this task, there are many frameworks in development that employ different technologies and implementations to accomplish similar (or sometimes the same) tasks. Of these, after considerable research, trial and consideration, the following were deemed fit to fulfill the requirements of this project and product: Wit, LUIS, DialogFlow, and Rasa.

Their positives, negatives, and use cases were weighed carefully, and the following describes the results.

WIT


By their definition:
“Wit.ai makes it easy for developers to build applications and devices that you can talk or text to. Our vision is to empower developers with an open and extensible natural language platform. Wit.ai learns human language from every interaction, and leverages the community: what’s learned is shared across developers.”

Wit was recently bought by Facebook!

For programming with Wit, there are 3 major options: JavaScript, Python, and Ruby. Here, one needs to type out example sentences and mark intents and entities, and with enough examples the system performs well. The only issue is that Wit only provides Natural Language Understanding, not contextual dialogue or conversational options. Hence it was ruled out as a possible alternative.

LUIS


LUIS, short for Language Understanding, is a bot-making service and framework designed by Microsoft. As they describe it: “Language Understanding (LUIS) is a cloud-based service that applies custom machine-learning to a user’s conversational, natural language text to predict overall meaning, and pull out relevant, detailed information.

A client application for LUIS can be any conversational application that communicates with a user in natural language to complete a task. Examples of client applications include social media apps, chatbots, and speech-enabled desktop applications.”


A LUIS app contains a domain-specific natural language model you design, so it was possible to use it for this project. What gives this tool a great advantage is that it can be used hand-in-hand with other Microsoft tools like Bing Spell Check, Azure Search, the Bing Speech API, the Azure Bot Service, etc.

The only issues faced here were that the code had to be in C# and, more importantly, the bot had to be hosted on a Microsoft server and the data handed to Microsoft, which was very much a deal breaker. Not to mention the pricing issues that come along with it.

DialogFlow


As they define it, Dialogflow helps you:

"Give users new ways to interact with your product by building engaging voice and text-based
conversational interfaces, such as voice apps and chatbots, powered by AI. Connect with users
on your website, mobile app, the Google Assistant, Amazon Alexa, Facebook Messenger, and other
popular platforms and devices."

Dialogflow provides a great web UI to create a bot agent and train it with easy-to-create training data. Built on Google’s infrastructure, it has strong machine learning implementations and capabilities, and integrates well with social media. The issue, though, is the same one as with LUIS: the agent and its data live on Google’s servers rather than in-house.

Thus, even though it was a great alternative, it had to be sidelined for Rasa.

Rasa


As they define it, Rasa helps you:

"Build great conversational AI in-house
 Open source machine learning tools for developers and product teams to expand bots
 beyond answering simple questions."

Conversations are rarely just one question and one answer. With Rasa you can build bots that can handle back-and-forth with your customers. This handling of contextual dialogue is done with deep learning instead of hand-crafted rules. It is an open-source and fully customisable solution, even for the enterprise IT landscape, with the latest machine learning research integrated for best results.

The other competent frameworks have data-sharing issues, in that you need to share your data with them. This is where Rasa pulls far ahead. It is an open-source bot-building framework: rather than calling pre-built models hosted on someone else’s server through an API, you train and run everything yourself. This gives you complete control of ALL the components in your chatbot.

Rasa does this in a very innovative way. The Rasa stack consists of two major components: Rasa NLU and Rasa Core.

Rasa NLU is responsible for the Natural Language Understanding of the chatbot. Its main purpose is, given an input sentence, predict the intent of that sentence and extract useful entities from it.

The second component, Rasa Core, takes structured input in the form of intents and entities (which need not come from Rasa NLU) and chooses which action the bot should take, using a probabilistic model (an LSTM).

The Rasa Stack

The cool thing about Rasa is that every part of the stack is fully customizable and easily interchangeable. It is possible, and sometimes recommended, to use Rasa Core or Rasa NLU separately! When using Rasa NLU, you can choose among several backend NLP (Natural Language Processing) libraries. The LSTM neural network which Rasa Core uses for action prediction can easily be exchanged for any other type of neural network, if you know how to implement it in Keras.


Rasa Core

Rasa Core takes in structured input: intents and entities, button clicks, etc., and decides which action your bot should run next. If you want your system to handle free text, you need to also use Rasa NLU or another NLU tool.

The main idea behind Rasa Core is that thinking of conversations as a flowchart doesn’t scale. It’s very hard to reason about all possible conversations explicitly, but it’s very easy to tell, mid-conversation, whether a response is right or wrong. Hence, rather than writing a bunch of if/else statements, a Rasa bot learns from real conversations. A probabilistic model chooses which action to take, and this can be trained very easily.

The advantage of this approach is that the dialogue logic is learned from example conversations rather than hand-crafted, so the bot can cope with conversations that no explicit flowchart anticipated.

Rasa NLU

You can think of Rasa NLU as a set of high-level APIs for building your own language parser using existing NLP and ML libraries. Rasa NLU is written in Python, but you can use it from any language by running Rasa NLU as an HTTP server. If your project is written in Python, you can simply import the relevant classes and get the job done.
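
As a minimal sketch of the “import the relevant classes” route (assuming the rasa_nlu 0.x API of the time; the config file name nlu_config.yml is an assumption, and the training data file is the one introduced later in this document):

from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

# Load the markdown training data (see the NLU Data section below).
training_data = load_data("nlu_data.md")

# Build a trainer from a pipeline configuration file (assumed name),
# e.g. one selecting the spacy + sklearn backend.
trainer = Trainer(config.load("nlu_config.yml"))

# Train an interpreter on the examples.
interpreter = trainer.train(training_data)

# parse() returns a dict with the predicted intent and entities.
print(interpreter.parse("how much do I have on my savings"))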

Rasa Installation and Usage

The installation of Rasa is simple and fast.

Rasa Core

To install it, run:

pip install rasa_core

and that’s it!

Rasa NLU

Installation of NLU depends on which backend you use. I used the “spacy + sklearn” backend, which can be installed as follows:

pip install rasa_nlu[spacy]
python -m spacy download en

And your setup is done.

Usage

The usage depends on a simple setup Python script, which you can easily find in the Rasa Docs or extract from my code <link of code comes here>, and on the definition of 3 files, namely:

domain.yml, stories.md, and nlu_data.md

They define, respectively, the universe your bot lives in, the conversational backbone the bot has to follow, and the data on which it is trained to classify intents and entities.
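
For the dialogue side, the setup script boils down to a few calls. A minimal sketch under those file names, assuming the rasa_core 0.x API of the time (the exact calls vary slightly between versions):

from rasa_core.agent import Agent
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.memoization import MemoizationPolicy

# The agent ties the domain to the dialogue policies.
agent = Agent("domain.yml",
              policies=[MemoizationPolicy(), KerasPolicy()])

# Read the stories and train the action-prediction model.
training_data = agent.load_data("stories.md")
agent.train(training_data)

# Save the trained dialogue model to disk.
agent.persist("models/dialogue")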

Domain

The Domain defines the universe in which your bot operates. It specifies exactly which intents and entities the bot expects, which slots it keeps track of, which actions it can take, and which utterance templates it can reply with.
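
A minimal domain.yml sketch along those lines (the intent, slot and action names here are illustrative, not the project’s own):

intents:
  - greet
  - inform

entities:
  - cuisine

slots:
  cuisine:
    type: text

templates:
  utter_greet:
    - "Hello, what can I do for you today?"

actions:
  - utter_greet
  - action_ask_cuisine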

What are Slots?

Slots are the things you want to keep track of during a conversation. For example, in the messages above you would want to store “Mexican” as a cuisine type. The tracker has a method, tracker.get_slot("cuisine"), which will return “Mexican”.

Actions

Actions are the things your bot can actually do. They are invoked by calling the action’s run() method. For example, an action can send a message back to the user, make an API call, or query a database.
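
A minimal custom action sketch (assuming the rasa_core 0.x API; the action and slot names are illustrative, not the project’s own):

from rasa_core.actions import Action
from rasa_core.events import SlotSet

class ActionAskCuisine(Action):
    def name(self):
        # Must match an action name listed in the domain.
        return "action_ask_cuisine"

    def run(self, dispatcher, tracker, domain):
        # Read a slot, send a message back, and optionally set slots.
        location = tracker.get_slot("location")
        dispatcher.utter_message(
            "What cuisine are you after in {}?".format(location))
        return [SlotSet("asked_cuisine", True)]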

Stories

A training data sample for the dialogue system is called a story. It shows the bot how to act and react to the inputs given by the user:

## story_07715946                     <!-- name of the story - just for debugging -->

* greet
    - action_ask_howcanhelp
* inform{"location": "rome", "price": "cheap"}  <!-- user utterance, in format intent{entities} -->
    - action_on_it
    - action_ask_cuisine
* inform{"cuisine": "spanish"}
    - action_ask_numpeople             <!-- action of the bot to execute -->
* inform{"people": "six"}
    - action_ack_dosearch

This is what we call a story. A story starts with a name preceded by two hashes, e.g. ## story_03248462; the name is arbitrary but useful for debugging. The end of a story is denoted by a newline, and then a new story starts again with ##.

This story tells the bot that the conversation begins with a greet; then, if the first inform happens, it takes action_on_it and then action_ask_cuisine, and so on and so forth.

NLU Data

The data to train the NLU can be given either in JSON or in Markdown. I chose Markdown, so this is what it looks like:

## intent:check_balance
- what is my balance <!-- no entity -->
- how much do I have on my [savings](source_account) <!-- entity "source_account" has value "savings" -->
- how much do I have on my [my savings account](source_account:savings) <!-- synonyms, method 1-->

## intent:greet
- hey
- hello

## synonym:savings   <!-- synonyms, method 2 -->
- pink pig


## regex:zipcode
- [0-9]{5}

Hence, the intent is declared, followed by a couple of examples of said intent, and the model is trained on these examples. The rule of thumb is: the more examples (for each intent), the better the prediction.

Hence, depending on the definition and content of the above three files, your bot will be trained and will perform accordingly.

The Project Code

<Code and Explanation comes here, along with the created block diagrams>

# put python code here

<copy paste of code in master branch, and the image of code and block diagrams of Core and NLU come here>

The crux of the code and the design rests on neural networks, more specifically the LSTM.

So what is an LSTM?

Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in following work. They work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer. LSTMs also have this chain-like structure, but the repeating module is different: instead of a single neural network layer, there are four, interacting in a very special way. More information can be found here<link>.
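
For reference, the four layers are the forget, input and output gates plus a candidate cell state. In the standard notation (\sigma is the logistic sigmoid, \odot element-wise multiplication):

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t \odot \tanh(C_t)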

This sounds great, but is there anything better than this?
Well, better is a relative term, but the GRU cell seems to be beating the LSTM at a lot of its own game.

So, what is a GRU cell?

Explaining the GRU cell without getting overly complicated is a bit difficult, but the key difference from the LSTM is that the GRU combines the LSTM’s forget and input gates into a single “update gate”. It also merges the cell state and hidden state, among other changes. The resulting model is simpler than standard LSTM models and has been growing increasingly popular.
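
The standard GRU equations make the simplification visible: the update gate z_t plays the combined role of the forget and input gates, and there is no separate cell state:

z_t = \sigma(W_z \cdot [h_{t-1}, x_t])
r_t = \sigma(W_r \cdot [h_{t-1}, x_t])
\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t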

Hence I made two different bots with the exact same configuration, the only difference being that one is based on the LSTM cell while the other is based on the GRU cell. So how did the two compare?
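
One way to make this swap is to subclass Rasa Core’s KerasPolicy and override its model_architecture method, replacing the LSTM layer with a GRU. A minimal sketch (the method signature follows the rasa_core 0.x restaurant example and varies between versions; the layer sizes are illustrative, not the project’s exact code):

from keras.layers import Activation, Dense, GRU, Masking
from keras.models import Sequential
from rasa_core.policies.keras_policy import KerasPolicy

class GRUPolicy(KerasPolicy):
    def model_architecture(self, num_features, num_actions, max_history_len):
        # Same shape of network KerasPolicy builds by default,
        # with the LSTM layer swapped out for a GRU.
        n_hidden = 32  # illustrative size
        batch_shape = (None, max_history_len, num_features)

        model = Sequential()
        model.add(Masking(mask_value=-1, batch_input_shape=batch_shape))
        model.add(GRU(n_hidden, dropout=0.2))
        model.add(Dense(num_actions))
        model.add(Activation("softmax"))
        model.compile(loss="categorical_crossentropy",
                      optimizer="rmsprop",
                      metrics=["accuracy"])
        return model

The LSTM bot keeps the stock KerasPolicy, while the GRU bot passes GRUPolicy() to its Agent; everything else stays identical.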

Results