As programmers, many of us are interested in developing for the smart devices we use at home every day. But what if we could also manage a whole business by connecting our skills with these technologies? Voice assistants are popular nowadays because they can efficiently understand our commands, fetch information, give it back to us and easily work with other IoT devices. In this article, I want to show you how quickly we can set up a voice assistant to access data in a database. This is where Ruby comes in.
In the previous article, we showed you how a small Alexa device can improve your business processes and decrease the amount of time needed to get various statistics. Now I want to give you a closer look at exactly how it was built.
Let’s imagine that we have a company and are storing all of our data in a database. We keep orders, invoices, project descriptions, employee data, salaries and much more in order to have fast access to it from different departments.
Before reading further, it might be useful to watch a video explanation of everything (commands, requests and responses) first, and then take a closer look at the implementation.
Preconditions
First of all, we need a database with any type of data: records about existing accounts, sales, invoices, projects and their connections, etc. Most companies store data so they can produce statistics and calculations and work with it in different processes.
Usually, companies already have web applications that work with that data (e.g. administration, visualization, different time and task trackers, etc.).
To be able to access this data with Alexa, we first need to create and set up a new skill on Amazon’s developer website. Amazon provides good documentation on how to do it in just a few steps. We also need to build an interaction model for our skill.
In the example that I used in the video, I created a big interaction model that you can copy here. In this article, we will only use the following intents: GetHelp, GetWorkingEmployeeAmount(AMAZON.DATE date) and SetProject(AMAZON.SearchQuery name, AMAZON.NUMBER budget). You can clone my example.
But how do we get Amazon’s service to send requests to our local server during development?
Ngrok will help us here. Just follow the instructions and put the address that you see on your screen as your Service Endpoint on the Endpoint tab of your skill, like so:
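Assuming the Sinatra app lives in a file such as `server.rb` (the file name here is an assumption; use whatever your app file is called), the tunnel setup looks roughly like this:

```shell
# Start the Sinatra server; by default it listens on port 4567.
ruby server.rb

# In a second terminal, open a public tunnel to that port.
# ngrok prints a forwarding URL (e.g. https://abc123.ngrok.io) —
# paste that URL into the Endpoint tab of your skill.
ngrok http 4567
```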
I wrote a small server using Sinatra that handles the Alexa requests coming from Amazon, tunneled to us by ngrok. Next, I will describe how it works.
Implementation of the Ruby server
Our local server listens for Alexa’s requests on port 4567, Sinatra’s default. Amazon then checks the Endpoint configuration and sends the request to the configured route. In this specific example, it is the / path because I didn’t specify any other.
```ruby
require 'sinatra'
require 'json'

post '/' do
  hash = JSON.parse(request.body.read)
  RequestHandler.get_response(request: hash["request"]).to_json
end
```
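To make it clearer what the route receives, here is a trimmed example of a LaunchRequest body, shaped like Alexa’s request format (the values are illustrative; Amazon sends far more metadata than this):

```ruby
require 'json'

# A trimmed LaunchRequest body with illustrative values.
body = <<~JSON
  {
    "version": "1.0",
    "request": {
      "type": "LaunchRequest",
      "requestId": "amzn1.echo-api.request.example",
      "locale": "en-US"
    }
  }
JSON

hash = JSON.parse(body)
# The route only forwards the "request" sub-hash to RequestHandler:
puts hash["request"]["type"]  # => LaunchRequest
```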
As soon as we get a request, we send its body to the RequestHandler class, which decides what the request means and how our application should respond to it.
```ruby
class RequestHandler
  attr_reader :response

  def initialize(request:)
    @request = request
    set_response
  end

  def self.get_response(request:)
    new(request: request).response
  end

  private

  attr_reader :request

  def set_response
    case request["type"]
    when "LaunchRequest"
      @response = make_default_response_schema("Welcome to your company assistant! How can I help you?")
    when "IntentRequest"
      @response = IntentSelector.get_response(request: request)
    end
  end

  def make_default_response_schema(text)
    {
      "version": "1.0",
      "response": {
        "outputSpeech": {
          "type": "PlainText",
          "text": text
        },
        "shouldEndSession": false
      }
    }
  end
end
```
Let’s turn on our Alexa device and tell her to open APPNAME. The RequestHandler class looks at the ‘type’ attribute and tries to work out whether it is an IntentRequest, a LaunchRequest or something else. If it’s a launch request, as in this case, it greets the user back using the default response schema.
Now let’s talk about more complex requests and ask Alexa for some assistance by saying: “Alexa, please get some help.” Alexa then sends us another request, whose body contains the type IntentRequest.
As soon as our RequestHandler gets this type of request, we call the IntentSelector class, which is responsible for finding intents and their definitions in our application.
```ruby
require 'active_support/core_ext/string'

class IntentSelector < RequestHandler
  private

  def set_response
    @response = find_intent_class.get_response(request: request)
  end

  def find_intent_class
    # Turn the intent name into a class constant; fall back
    # to FallbackIntent when the class isn't defined yet.
    request["intent"]["name"].classify.constantize
  rescue NameError
    FallbackIntent
  end
end
```
As you can see, we look up the request["intent"]["name"] parameter, which Alexa defines by parsing our phrase. In our case, it is the GetHelp intent. The IntentSelector class checks whether we have built a class called GetHelp, and if so, it returns the response defined inside, wrapped in the default response model.
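The lookup itself can be sketched without ActiveSupport, assuming the intent name already matches the class name exactly (the article uses classify/constantize to normalize the name first):

```ruby
# Dependency-free sketch of the intent lookup: resolve a name to a class,
# falling back when no such class has been defined yet.
class FallbackIntent; end
class GetHelp; end

def intent_class_for(name)
  Object.const_get(name)
rescue NameError
  FallbackIntent
end

intent_class_for("GetHelp")      # => GetHelp
intent_class_for("NotBuiltYet")  # => FallbackIntent
```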
```ruby
class GetHelp < BaseIntent
  private

  def set_response
    @response = %Q( I'm your office assistant.
      You can ask me different questions like:
      How many employees do we have?
      Or how many employees are working on date?
      Also, you can set a meeting by saying Set meeting on date. )
  end
end
```
If we haven’t built the intent yet, Alexa will respond with the default response defined in the BaseIntent class:
```ruby
class BaseIntent
  attr_reader :response, :should_end_session, :response_schema

  def initialize(request:)
    set_defaults(request)
    set_response
    set_response_schema
    @response = response_schema
  end

  def self.get_response(request:)
    new(request: request).response
  end

  private

  attr_reader :request

  def set_defaults(request)
    @request = request
    @should_end_session = false
  end

  def set_response
    @response = "Unfortunately, I don't know this one yet. Try to say get help to get additional help."
  end

  def set_response_schema
    @response_schema = default_schema
  end

  def default_schema
    {
      "version": "1.0",
      "response": {
        "outputSpeech": {
          "type": "PlainText",
          "text": response
        },
        "shouldEndSession": @should_end_session
      }
    }
  end
end
```
Amazon also lets us use variables in our communication. “Alexa, get the number of working employees for today” or “Alexa, please tell me how many people are in the office on Monday” will build a query containing the intent name and a date parameter holding the current date or the date of the next Monday, respectively.
IntentSelector resolves the intent class, and in our case the GetWorkingEmployeeAmount intent reads the slot value by name, as defined in the BaseIntent class.
```ruby
class BaseIntent
  def get_slot_value(name)
    request["intent"].fetch("slots", {}).fetch(name, {}).fetch("value", nil)
  end

  def date_argument
    get_slot_value("date")
  end

  # beginning_of_day / end_of_day come from ActiveSupport's DateTime extensions.
  def date_range(date)
    date = DateTime.parse(date)
    date.beginning_of_day..date.end_of_day
  end
end
```
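To see the slot extraction in action, here is the same chain of fetches run against an illustrative IntentRequest fragment (the payload shape follows Alexa’s request format; the values are made up):

```ruby
require 'json'

# Illustrative IntentRequest fragment, shaped like Alexa's request format.
request = JSON.parse(<<~JSON)
  {
    "type": "IntentRequest",
    "intent": {
      "name": "GetWorkingEmployeeAmount",
      "slots": {
        "date": { "name": "date", "value": "2019-03-04" }
      }
    }
  }
JSON

# The same chain of fetches as get_slot_value, with safe defaults
# so a missing slot yields nil instead of raising.
value = request["intent"].fetch("slots", {}).fetch("date", {}).fetch("value", nil)
puts value  # => 2019-03-04

missing = request["intent"].fetch("slots", {}).fetch("budget", {}).fetch("value", nil)
puts missing.inspect  # => nil
```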
After getting this value, you are free to use it in your response setter method:
```ruby
class GetWorkingEmployeeAmount < BaseIntent
  private

  def set_response
    date = date_argument || DateTime.now.to_s
    amount = Account.where(last_checked_date: date_range(date)).count
    label = date_argument || "current date"
    @response = "On #{label} we have #{amount} people checked in the system"
  end
end
```
Furthermore, we can also use multiple variables. By default, Alexa’s API will not understand if you use more than one slot in your sentence, but you can change this behaviour by implementing a dialog structure in the code.
Let’s say we want to set up a project.
Alexa receives the request to create a project and sees that our intent definition requires a few variables. In this case, Alexa sends us a request to start a dialog. It is the same intent request, but with a dialogState variable in the params. This parameter can hold three states: STARTED, IN_PROGRESS and COMPLETED.

When Alexa sends the STARTED state, it tells our application that this will be a dialog and that we need to check each variable’s value on our side. Initially, the variables can be blank, so by default we deny them or leave their confirmation as NONE. We communicate this to Alexa by sending back a Dialog.Delegate directive with an updated setProject intent. The updated intent tells Alexa which intent we are working with; Alexa then checks her list of required slots and asks a question for each slot that has not yet been approved on our side. After asking a question and getting a value, Alexa sends it to us to approve or deny, with dialogState: IN_PROGRESS.
This loop continues until we have approved all the values for the required slots.
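While the dialog is still in progress, the response we send back is a dialog-delegation response rather than a normal speech response. A minimal sketch of that payload, following the shape of Alexa’s Dialog.Delegate directive (the helper name is mine):

```ruby
# Minimal delegate response: tells Alexa to keep eliciting the
# remaining required slots herself via the Dialog.Delegate directive.
def delegate_response(updated_intent = nil)
  directive = { "type" => "Dialog.Delegate" }
  # Optionally pass back an updated intent (e.g. with slot confirmations).
  directive["updatedIntent"] = updated_intent if updated_intent

  {
    "version" => "1.0",
    "response" => {
      "directives" => [directive],
      "shouldEndSession" => false
    }
  }
end

puts delegate_response["response"]["directives"].first["type"]  # => Dialog.Delegate
```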
The whole cycle for the SetProject intent looks like this:
After we send Alexa the last approved value, she sends us an IntentRequest with all the parameters and values, marking them as approved and changing the dialog state to COMPLETED. After this step, our application knows that Alexa has validated all the required fields and sent us their values. We can take these values and work with them however we like: create a project, send invitations, emails or SMSs, etc.
The SetProject class simply looks like this:
```ruby
class SetProject < DialogIntent
  private

  def set_slots
    @slots = ["name", "budget"]
  end

  def proceed_results_and_response
    new_project = Project.create(title: get_slot_value("name"), budget: get_slot_value("budget"))
    @response = "Project #{new_project.title} created. Budget set to #{new_project.budget}"
  end
end
```
In the set_slots method, we define the required fields that Alexa will ask us about; in proceed_results_and_response, we describe what to do with the verified variables after the last step. This method works the same as set_response from the basic intent definition.
As you can see, this class has a parent called DialogIntent, where all the dialog logic and the structure for building responses live. It also calls set_response as the last step, but it can build dialog-type responses and verify the variables. I wrote a simple implementation of DialogIntent. In my code, it approves all values that are not nil or blank, but you can change this behaviour by passing lambdas or patching the generate_slots method inside.
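The approval rule described above (confirm anything non-blank) can be sketched like this. The method name generate_slots comes from the article; the slot hash shape and the NONE/CONFIRMED confirmationStatus values follow Alexa’s dialog interface:

```ruby
# Sketch of the approval rule: confirm any slot that already has a
# non-empty value, and leave the rest as NONE so Alexa elicits them.
def generate_slots(slots)
  slots.transform_values do |slot|
    status = slot["value"].to_s.strip.empty? ? "NONE" : "CONFIRMED"
    slot.merge("confirmationStatus" => status)
  end
end

slots = {
  "name"   => { "name" => "name", "value" => "Apollo" },
  "budget" => { "name" => "budget" }
}

result = generate_slots(slots)
result["name"]["confirmationStatus"]    # => "CONFIRMED"
result["budget"]["confirmationStatus"]  # => "NONE"
```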
The dialog model is different from the normal response model, so while the dialog is still STARTED or IN_PROGRESS, we have to communicate with Alexa using the dialog syntax. Once we receive the COMPLETED state, we send back a normal response using the response syntax.
Conclusion
In this article, I tried to cover all the main parts of building a custom skill and communicating with it using Ruby code as a server. My implementation is quite modest, but it gives you a good base for creating your own complex skills that access different data, perform CRUD actions and use any of the modern gems that Ruby provides (e.g. generating a PDF on command and sending it to a group of workers, pushing code to a branch by command, or connecting to other electronic devices that you can control via different APIs). The list of possible actions and tasks you can complete using Alexa is limited only by your imagination.
GitHub repository of the project:
https://github.com/AKovtunov/alexa_ruby_demo