08. April 2017

What I learned building an api.ai webhook

api.ai lets you build conversational chatbots that interface easily with lots of existing messaging services. The promise is: “define your logic once (no coding!!!), reach users across many platforms”. The trouble is that these bots are rather forgetful: if you haven’t talked to one in the last 10 minutes, it has likely forgotten who you are and what the two of you were talking about last time.

I recently tried to build a persistence layer for such a bot using the “webhook” feature, which triggers an HTTP endpoint that processes the response before it is sent to the client. This allows api.ai to access data stored from past conversations.


Contexts

The core concept of conversation state in api.ai is called contexts: a list of objects that can contain basically anything. An example could look like this:

{
    "contexts": [
        {
            "stage": "onboarding",
            "lifespan": 4
        },
        {
            "source": "facebook",
            "lifespan": 99
        },
        {
            "name": "generic",
            "parameters": {
                "telegram_chat_id": "abc123"
            },
            "lifespan": 1
        }
    ]
}

These contexts get lost after a period of inactivity, so if you want to revive the last state of a conversation, you have to persist its last set of contexts yourself.
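The persistence part can be sketched as a small store keyed by user. This is a minimal in-memory version for illustration; in practice the hash would be replaced by Redis, a SQL table, or similar, and the class and method names here are my own, not anything api.ai prescribes:

```ruby
require 'json'

# Minimal sketch of persisting the last known context list per user.
# The in-memory hash stands in for a real backend (Redis, SQL, ...).
class ContextStore
  def initialize
    @store = {} # user_id => JSON-encoded contexts
  end

  # Save contexts only when api.ai actually sent some, so we never
  # overwrite a known state with an empty list.
  def save(user_id, contexts)
    return if contexts.nil? || contexts.empty?
    @store[user_id] = JSON.generate(contexts)
  end

  # Return the last known contexts, or an empty list for new users.
  def fetch(user_id)
    raw = @store[user_id]
    raw ? JSON.parse(raw) : []
  end
end
```

The guard in `save` matters: api.ai sends an empty contexts list exactly when it has forgotten the user, and that is the one moment we must not clobber the stored state.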

Webhook

api.ai lets you connect your bot to something they call a webhook. A webhook is nothing but a web application that the bot calls before responding to the user. The webhook receives the bot’s response and can alter and augment it according to custom logic, or by adding data from third-party services.

        ----> CHAT MESSAGE ---->         ----> HTTP POST ---->

 | CLIENT |                     | API.AI |                 | WEBHOOK | - ??? ->

        <------------------- CHAT MESSAGE <-------------------

Architecture

A webhook that persists the user’s context is therefore pretty straightforward:

  • whenever api.ai sends a payload with contexts present, identify the user and save those contexts
  • if no contexts are present in the payload, see if we can identify the user and return the last contexts we stored
  • if the user is new, just save their id in the database so we can store contexts for them later
  • the api.ai text responses are simply passed through in almost all cases
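The steps above boil down to one branch on whether the payload still carries contexts. A sketch, assuming a `store` object with `save`/`fetch` methods (my naming) and the api.ai webhook payload shape with `sessionId`, `result.contexts` and `result.fulfillment.speech`:

```ruby
# Decide whether to persist or restore contexts for one webhook call.
# `store` is any persistence layer responding to save/fetch.
def handle_webhook(payload, store)
  user_id  = payload['sessionId']
  result   = payload.fetch('result', {})
  contexts = result.fetch('contexts', [])
  speech   = result.fetch('fulfillment', {}).fetch('speech', '')

  if contexts.any?
    # api.ai still knows the conversation: save the current state
    # and pass the response through unchanged.
    store.save(user_id, contexts)
    { 'speech' => speech, 'contextOut' => contexts }
  else
    # api.ai has forgotten the user: restore the last known state.
    # (For a brand-new user this is simply an empty list.)
    { 'speech' => speech, 'contextOut' => store.fetch(user_id) }
  end
end
```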

You might have noticed that this architecture has one pitfall: once api.ai has forgotten about a user and that user tries to talk to the bot again, api.ai sends an empty contexts list to the webhook. We can fetch the stored contexts at that point, but api.ai cannot post-process this data within the same request. This is why the first response always has to be something like “Whoa, I just woke up, could you say that again?”. When the user then continues the conversation, api.ai is able to pick up the restored contexts and react accordingly.

Latency

The first thing you will notice is that api.ai has pretty strict rules about response times for webhooks. If your webhook does not reply in time (I think the current limit is 5000 ms, which is much shorter than it sounds), api.ai discards the result (even if it arrives later). This means that if you are using some free hosting service, or your webhook relies on a third party, you have to make sure those calls are fast, otherwise you’ll see confused and frustrated users.
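One way to stay under the deadline is to give every slow dependency a budget of its own and fall back to a canned answer when it is exceeded. A sketch using Ruby’s stdlib `Timeout` (the budget values and the fallback text are illustrative):

```ruby
require 'timeout'

# Run a (potentially slow) block with a hard per-call deadline.
# If the block exceeds it, return a fallback value instead of
# letting the whole webhook response time out.
def with_deadline(seconds, fallback)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  fallback
end

# Usage: budget a hypothetical third-party call at 2 seconds,
# leaving headroom inside api.ai's overall limit:
#   with_deadline(2, 'Sorry, try that again later') { third_party_lookup(user) }
```

Note that `Timeout` interrupts the block with an exception, which is not always safe around I/O libraries; per-request timeouts on the HTTP client itself are the more robust variant of the same idea.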

In the beginning I had the webhook application running on a Heroku free plan, but it turned out that the dyno regularly falling asleep resulted in a cascade of timeouts. It wasn’t even usable for development.

It’s also very hard to tell where the api.ai server that calls your webhook is located, so the geographic location of your webhook matters a lot, too.

Sending multiple messages

In the api.ai interface, you can respond to a message with multiple messages. For example, you could send two text bubbles and a picture. api.ai formats this properly for every connected messenger application.

If you want to do this from a webhook, it turns out to be a pretty difficult task, as the response format only knows about a single text field.

I did not implement this behavior; I decided to persist the state only at certain stages (i.e. those where a single text bubble is an acceptable answer). But there are two options for adding this behavior to a webhook:

  • you parse the messages in the payload and manually transform them into the correct format for each messenger you support (basically rendering the promise of “build once” invalid)
  • you use api.ai events and push the follow-up messages to the user at a later point. Enabling push messages unfortunately requires human approval by the messaging service (for spam-fighting reasons), so it’s also quite a stretch
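To give an idea of what the first option costs you, here is a sketch of a per-messenger dispatcher. The output shapes below are placeholders to show the structure of the problem, not the real Facebook or Telegram payload formats:

```ruby
# Illustrative only: split one logical reply into per-messenger
# payloads. Each platform expects a different JSON shape, so every
# supported messenger needs its own branch - the per-platform field
# names below are made up.
def format_messages(messages, platform)
  case platform
  when 'facebook'
    messages.map { |m| { 'message' => { 'text' => m } } }
  when 'telegram'
    messages.map { |m| { 'method' => 'sendMessage', 'text' => m } }
  else
    # unknown platform: fall back to one concatenated text response
    [{ 'speech' => messages.join("\n") }]
  end
end
```

Every new messenger adds another branch, which is exactly how the “build once” promise erodes.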

Rich messages

The same issue arises when you try to use built-in functionality of the messenger application, e.g. buttons to select from a set of options, also known as “rich messages”. api.ai can render these for you when no webhook is involved, but as soon as such a message has to pass through your webhook, you have to start writing custom per-messenger logic. Just like above, it’s probably better to skip these messages and model your bot’s logic differently.

Session identifiers are not what you think they are

Each time api.ai calls your webhook, it sends a sessionId parameter. When I started building the application, it seemed to me that this was a unique value that would always be the same for a given device/messenger combination. So I decided to use this id as the identifier for a user in the database.

After a while we found users that the webhook would inexplicably forget about for a while, only to know about them again a few hours later. After digging through lots of rabbit holes, it turned out that these sessionId values are not stable: you might be assigned abc123 today, def456 tomorrow, and abc123 again the day after.

Luckily, most messengers send service-specific ids that actually are stable, so I started using these to identify a user in the database:

VENDORS = %w(facebook_sender_id telegram_chat_id slack_user_id).freeze

# service specific ids are more reliable than the `sessionIds` that will
# be sent by api.ai
def get_unique_key(payload)
  contexts = payload.fetch('result', {}).fetch('contexts', [])
  generic_contexts = contexts.select { |ctx| ctx['name'] == 'generic' }
  generic_parameters = generic_contexts.fetch(0, {}).fetch('parameters', {})
  vendor_key = VENDORS.find { |vendor| generic_parameters.fetch(vendor, false) }
  generic_parameters[vendor_key] || payload.fetch('sessionId', nil)
end

Would I do it again this way? I don’t know. This “little” thing I intended to build turned out to be a lot more complicated than I thought. I expected an application that saves part of a payload and echoes the old one back, but it ended up containing quite a few pieces of intricate logic. Also, connecting api.ai to a webhook does not live up to the promise of “build once, run anywhere”: you either have to live with some trade-offs or end up implementing per-messenger features.