RiveScript Rewritten in CoffeeScript

Noah Petherbridge
kirsle
Posted by Noah Petherbridge on Wednesday, April 22 2015 @ 12:47:51 PM

I've spent the last few months porting over the JavaScript version of my chatbot scripting language, RiveScript, into CoffeeScript. I mainly did this because I didn't like to maintain raw JavaScript code (especially after I ran a linter on it and had to rearrange my variable declarations because JavaScript's variable scoping rules are stupid), but I also took the opportunity to restructure the code and make it more maintainable in the future.

For some historical background, the first RiveScript implementation was written in Perl, and I designed it to be one big monolithic file (RiveScript.pm), because then I could tell noobs that they can update to a new version of RiveScript in their bots by just dropping in a new file to replace the old one, and avoid overwhelming them with the complexity of doing a CPAN installation (especially if they're generally new to Perl as well, and all they really wanna do is just run a chatbot.)

Source File Restructuring

The Python and JavaScript ports are more-or-less direct ports of the Perl version: I literally read the Perl source from top to bottom and translated it into each respective language. So, there was rivescript.js which was a huge 2,900 line long wall of JavaScript and one of my goals in the refactor was to spread logic out into multiple files, so if I have a bug to fix in the reply matching code, I don't have to scroll through pages and pages of loading/parsing code to get to the relevant part.

The new file structure is like this:

  • rivescript.js - The user-facing API, has all the same public functions as the old one
  • parser.js - A self-contained module that loads RiveScript code into an "abstract syntax tree" (a JSON serializable blob that represents ALL of the parsed RiveScript code in a program-friendly format).
  • sorting.js - The rat's nest that is the implementation behind sortReplies() is contained here.
  • inheritance.js - Functions related to topic inheritance/includes are here.
  • brain.js - The code that actually gets a reply is here.
  • utils.js - All those miscellaneous internal utility functions, like quotemeta() and such.
  • lang/javascript.js - The implementation for JavaScript object macros in your RiveScript code.
  • lang/coffee.js - This is new - you can use CoffeeScript in your object macros (it's not enabled by default, you'll probably have to snipe lang/coffee.js and include it in your own project for now, but the built-in shell.coffee uses it out-of-the-box).

Logic Refactoring

Another thing that was messy in the Perl, Python and JS versions of RiveScript was how it lays out its internal data structures in memory. If you'd ever run the equivalent of Data::Dumper you'd see reply and trigger data appearing in multiple places depending on whether it had a %Previous tag on it or not.

It looked something like this:

$RiveScript = {
    "topics" => {
        # Most reply data is under here, BUT NOT triggers with %Previous!
        "random" => { # (topic names)
            "hello bot" => { # (trigger texts)
                "reply" => { # (replies to the trigger)
                    0 => "Hello, human!",
                    1 => "Hi there!",
                },
                "condition" => { # (conditions)
                    0 => "<get name> != undefined => Hello, <get name>!",
                },
                "redirect" => undef, # (if there's an @redirect)
            },
            # ...
        },
    },
    "thats" => {
        # This is like 'topics' but ONLY for triggers with %Previous
        "random" => { # (topic names like before)
            "who is there" => { # (the %Previous text)
                "*" => { # (trigger text)
                    # then things were like the above
                    "reply" => { 0 => "<sentence> who?" },
                    "condition" => {},
                    "redirect" => undef,
                }
            }
        }
    },

    # And then sorting! Again the 'normal' replies are completely
    # segregated from those with %Previous
    "sorted" => {
        "random" => [ # (topic names)
            "hello bot", # (triggers, sorted)
            "*",
        ],
    },
    "sortsthat" => {
        # This one is simply a trigger list for ones that have %Previous
        "random" => [ # (topic names)
            "*",
        ]
    }
    "sortedthat" => {
        # This one is, in case one %Previous has more than one answer,
        # if we ONLY had the above sort we'd overwrite the first answer
        # with the second.. this one keeps track of all "replies" with
        # the same %Previous
        "random" => {
            "who is there" => [
                "*",
            ]
        }
    },
}

As you can see, it was a mess. There were a ton of different places where reply data was kept, and triggers had to be sorted many different ways for various edge cases. Additionally, keeping track of which topics inherit or include others was kept in a separate data structure from the topics themselves!

In the new refactor of RiveScript-js I eliminated as much duplication as possible. Now, the entirety of the reply base exists under the topics key, and instead of using lots of nested dictionaries, e.g. ->{topic}->{trigger}->{reply} which had the issue of one trigger overwriting the data for another if they both happened to have the same text (e.g. when you have two answers to the same %previous question), the ordering of the triggers is preserved. The data structure ends up looking like this:

{
    "topics": { // main reply data
      "random": { // (topic name)
        "includes": {}, // included topics
        "inherits": {}, // inherited topics
        "triggers": [ // array of triggers
          {
            "trigger": "hello bot",
            "reply": [], // array of replies
            "condition": [], // array of conditions
            "redirect": "",  // @ redirect command
            "previous": null, // % previous command
          },
          ...
        ]
      }
    }
}

Additionally, whenever the code refers to reply data (for example, in the sort buffers and the %previous tree), it refers to a specific index in the singular topic structure for the reply data. This has the other side benefit that, while getting a reply for the user, when a matching trigger is found it already has the pointer to that trigger's responses right away. It doesn't have to look it up from the central topic structure like before (and since the replies are kept on an array, this would be impractical now anyway).

Keeping the replies on an array also naturally takes care of the issue with multiple responses to the same %previous without needing a third sort buffer (we still do need a separate sort buffer for %previous replies themselves, though).

Other Implementations

The Perl and Python versions probably won't get updated anytime too soon to fit this new structure, but any future ports of RiveScript to other languages that I work on will be based off this new CoffeeScript version. I have some vague plans right now to port RiveScript over to Google's Go language, and I wanted to get the refactor out of the way first so I have a new "template" to reference when writing a new port.

So, Why CoffeeScript?

To elaborate more on why I rewrote it in CoffeeScript instead of just doing this restructure in Node-style JavaScript:

  • A while back I started JS-linting the JavaScript codebase and fixing all of the mistakes it pointed out.
  • JavaScript has really weird variable scoping issues, where all variables declared within a function, no matter where it's declared, gets implicitly "hoisted" to the top of the function as though you had declared it up there instead. Sane programming languages have lexical scoping where variables declared inside of a { block } of code (including loop variables, i.e. for (my $i = 0; ...)) die with that block's closing brace, for example Perl does it this way.
  • In numerous places in my code, I would have a long function that does a loop over "something" in more than one place, always doing for (var i = 0; ...). Declaring var i multiple times in the same function is the error, so I had to take all the reused variable names and make this ugly, var i, iend, j, jend, match; at the tops of functions.
  • CoffeeScript has a clean syntax compared to JS (it looks quite like Python), and it automagically handles your variable declarations. You simply use a variable (like Python), and CoffeeScript handles when and where to declare it in the JavaScript output.

As for all the haters that say things like, "ECMAScript 6 makes CoffeeScript obsolete because it adds classes and the arrow operator and everything else into the core JavaScript language": I see no reason at all that CoffeeScript can't one day compile into ECMAScript 6 the way that it does into ES5 right now, so it's not like CoffeeScript is a dying language that will no longer be maintained in the future. In a worst case scenario, I could program a CoffeeScript to ES6 compiler myself if nobody else will. It's not as if I have no experience writing scripting language parsers. ;)

And with GitHub backing it (they use CoffeeScript to program my favorite text editor, Atom), the language isn't going anywhere anytime soon.

Categories:

[ Blog ]

Comments

There are 2 comments on this page.

Avatar
guest
Posted on Thursday, April 23 2015 @ 09:15:42 PM by dcsan.

Awesome! I'm a big coffee fan too!

Did you think about just running js2coffee on your whole codebase?

The parser blob sounds interesting. Is that exposed anywhere to be got at?

I have my own hacked up version of rive, but would like to upgrade to this...

Avatar
kirsle
Posted on Friday, April 24 2015 @ 03:59:31 PM by Noah Petherbridge.

The parser helper is an attribute of the RiveScript object under .parser.

> RiveScript = require("rivescript")
> bot = new RiveScript()
> data = bot.parser.parse("fname.rive", "+ hello bot\n- Hello human.")
> console.log(JSON.stringify(data, null, 2))
{
  "begin": {
    "global": {},
    "var": {},
    "sub": {},
    "person": {},
    "array": {}
  },
  "topics": {
    "random": {
      "includes": {},
      "inherits": {},
      "triggers": [
        {
          "trigger": "hello bot",
          "reply": [
            "Hello human."
          ],
          "condition": [],
          "redirect": null,
          "previous": null
        }
      ]
    }
  },
  "objects": []
}

Docs

The filename param is only used for error reporting (like syntax errors, so it can say "at fname.rive line 5")

Add a Comment

Your name:
Your Email:
Message:
Comments can be formatted with Markdown, and you can use
emoticons in your comment.

If you can see this, don't touch the following fields.



Kirsle
Channels
Creativity
Software
Web Tools
Subdomains
Miscellany
Links


Fan Club