This is Part 1 in a series of blog posts about my adventures programming chatterbots for instant messengers in the early 2000's. In this series of posts, I'll focus on one instant messenger at a time and dive into the interesting quirks and challenges we botmakers faced when programming bots for them.
The order of the posts will roughly start "from the beginning." This is Part One: AOL Instant Messenger.
Today, Fedora 21 was released and I upgraded to it immediately, and decided not to install Skype this time.
Skype for Linux has been poorly maintained, going years between updates sometimes, and who knows what kind of unknown zero-day vulnerabilities are in there. On my previous installation of Fedora, Skype twice showed a weird issue where it replaced some of its icons with Chinese (or Korean, or something) symbols. I posted about it on Reddit and the Skype forum with no responses, as if I'm the only one who's ever seen this. Was I hacked? Maybe.
I took the latest Windows version of Skype and dumped all its icons trying to find these weird symbols but came up empty-handed. And I don't know of any way to pry icons out of the Linux binary of Skype. So... for now I just don't trust it.
In other news, I decided to Google the Skype protocol, and see what the progress is on people attempting to reverse engineer it, to be able to build an open source third-party Skype client (e.g. to have support in Pidgin). The Wikipedia article said something interesting:
On June 20, 2014, Microsoft announced the deprecation of the old Skype protocol. [...] The new Skype protocol - Microsoft Notification Protocol 24 - promises better offline messaging and better messages synchronization across Skype devices.
I wrote previously that the MSN Messenger service was still alive but it looks like it's the future of Skype as well.
The Microsoft Notification Protocol (MSNP) is the protocol used by MSN Messenger/Windows Live Messenger. I'm reasonably familiar with it from back in the day when I used to work on chatbots that signed in to MSN Messenger to accept their add requests and carry on conversations with humans.
MSNP is a plain text, line-delimited protocol similar to SMTP. There is some outdated documentation up through MSNP10 that we referenced in developing an MSN module in Perl. As an example, this is what you'd see going over the network if somebody sent a chat message to a friend:
MSG 4 N 133
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-MMS-IM-Format: FN=Arial; EF=I; CO=0; CS=0; PF=22
Hello! How are you?
The protocol consisted of command lines, typically three letters long (some I remember offhand: NLN
--go online, FLN
--go offline, BRB
--set status to "be right back", RNG
--request a conversation with a contact ("ring"), ANS
--accept a conversation request ("answer"), and MSG
--send a message). The RNG
and ANS
commands were invisible to the user in the official client, but allowed for some interesting behaviors in our chatbots (like immediately sending you a message the moment you open their chat window, before you even begin to type anything!)
It's interesting that Skype "downgraded" to the MSNP though, given that Skype's old protocol was an impenetrable fortress of obfuscation and encryption that nobody's ever managed to reverse engineer. Even third party clients like Trillian that had Skype support were technically using SkypeKit, a developer tool that allowed Skype to be remote controlled, but which kept all the proprietary bits a secret still. On the other hand, it makes sense for them to make Skype conform to Outlook.com and their other services that used MSNP, rather than upgrade all their other services to use the Skype protocol.
The last version of Windows Live Messenger used MSNP19 (or MSNP21 depending who you ask), so the new Skype protocol is just the next version of MSNP.
In Googling MSNP24 I found this site where efforts are already underway at reverse engineering it. This other site has a lot more details on the current status of the Messenger protocol.
It's only a matter of time now until Pidgin can natively support Skype accounts. It will also be fun to program chatbots for Skype. :)
In an older blog post I mentioned getting Alicebot Program V, a Perl implementation of an Alice AIML bot, up and running on a modern version of Perl, but I didn't go into any details on what I actually did to get this to work (which wasn't very nice of me ;) ).
Unfortunately for me I also didn't even write down any notes for myself, so I had to figure it out again, myself, from scratch. Which I decided to do, only this time I wrote down some notes, and published a new, updated version of Program V which you can check out on GitHub -- or for the lazy, get a zipped release of Program V 0.09.
This blog post is about what I needed to do to get Program V up and running, some of which is also outlined in UpdateNotes.md
on the GitHub repo.
Update #2 (12/02/2014): I've updated ProgramV 0.08 and released a new version, 0.09, which should work out-of-the-box on modern Perl. If you're interested in ProgramV, check it out on GitHub or read my blog post about the update.
The original blog post follows.
Numerous years ago (2002 or 03) while I was a newb at programming my own chatterbots in Perl (and a newb at Perl in general), there was this program called Alicebot Program V - an implementation of an Alice AIML chatbot programmed in Perl.
When I moved from RunABot (hosted AIML bots) and Alicebot Program D (a Java AIML bot) to Perl, I had to give up using AIML for my bots' response engines because there weren't any simple Perl solutions for parsing AIML code. There was only Program V, and Program V is a monster! I could never figure out how to get it to actually run, and, while it had a dozen Perl modules with it that deal with the AIML files, these modules are too integrated together to separate and use in another program.
And then Program V's site went down and for several years I couldn't find a copy of Program V anymore to give it another shot.
Since I effectively could not use AIML for my Perl bots, I eventually developed an alternative bot response language called RiveScript, and it's text-based instead of XML-based, so it's super easy to deal with. At this point I don't care much for AIML any more, because my RiveScript is more powerful than it anyway.
The only thing RiveScript is missing though is Alice - the flagship bot personality of AIML. Alice has about 40,000 patterns that it can respond to. Users chatting with Alice won't get bored with the conversation for a long, long time. Alice has a large enough reply set that you'll have to chat with it a lot before you can start predicting how she might reply to your next message.
RiveScript, being (relatively) new (and not yet as popular as AIML), hasn't seen any large projects like Alice. Alice was written by Dr. Wallace, who created AIML; should I, as the creator of RiveScript, create an Alice-sized reply base myself? Ha. I wish I had that kind of free time on my hands.
So for a really long period of time I was trying to create an AIML-to-RiveScript translator, so that Alice's 40,000 responses of AIML code could become 40,000 responses of RiveScript code. I've finally accomplished this with a really good degree of success recently. So mission accomplished.
Now, while searching for something unrelated, I managed to come across a site that hosted a copy of the Program V code. Now that I'm much more awesome at Perl than I was back then, I downloaded it and tried getting it set up. It took a bit of tinkering (it was programmed on Perl 5.6 and some things have changed between then and 5.10) but I got it up and running.
Program V works in two parts: first you run a "build script," which reads and processes the AIML code and builds a kind of database file (really it's just the result of Data::Dumper, dumping out a large Perl data structure). And then you run the actual bot script, which just loads this data structure from disk. This is because the Alice AIML set (40K replies) takes about 3 to 4 minutes to load, but the Perl data structure takes milliseconds to load. So you build it first to save lots of time when actually running it.
I was impressed at how fast the bot could reply, too. Most of its replies were coming back in 8 milliseconds or less. In contrast, when I load Alice's brain in RiveScript... it takes 5 seconds to load all the RiveScript code from disk (much faster than Program V's loading of AIML), and then Alice will usually reply in less than 1 second, but longer than 8 milliseconds. So, I had a look at this data structure that Program V creates.
In the data structure, all the patterns in AIML are put into a hash, and categorized by the first word in the pattern. Here's just a snippit:
The following patterns are represented here: ITS * ITS BORING ITS FUN ITS GOOD * $data = { aiml => { matches => { 'ITS' => [ '* <that> * <topic> * <pos> 17818', 'BORING <that> * <topic> * <pos> 17819', 'FUN <that> * <topic> * <pos> 17820', 'GOOD * <that> * <topic> * <pos> 17821', ], }, }, };So, since all these patterns began with the word "ITS", they're all categorized under the "ITS"... each item in the array begins with the rest of the pattern (after the word ITS), and then there's separators for the "that", "topic", and "pos" (position). In all the examples here, these patterns had no 'that' or 'topic' tags associated with them, so there's only *'s here. The position is an array index.
The templates (responses to these patterns) are all thrown together into a single large array. All those positions listed in the "matches" structure? Those are array indices.
You might need to know a little Perl to see the performance boost here. A good number of patterns in Alice's brain begin with the same word. So when it's time to match a reply from the human, the program can use the first word in your message as a hash key (say you said "It's good to be the king", it would look up the array above based on the word "ITS"). With Alice's brain, there'd be only a few hundred unique first words to patterns. So this is a relatively small hash, and looking up one of these keys such as "ITS" is really fast. Then, each of the arrays here are relatively small, and the program just loops through them to find out if any of them match your message (taking into account the `that`'s and `topic`s too).
When it finds a match, it has an array index of the template for that match. Pulling an array item by index in Perl is even more wicked fast than a hash. So almost instantaneously you can get a response back.
Compared to the Perl module, RiveScript.pm's, data structure, the one used by Program V is much more efficient. In RiveScript.pm everything is arranged in a hierarchy, sorted by: topic, pattern, reply. RiveScript uses arrays in the end to organize the patterns in the most efficient way possible, but when it comes to actually digging out data from this giant hashref structure, it's a little slower than just using an array like Program V.
Still, though, 1 second response times for a brain that contains 40,000 patterns isn't bad. But I might need to think about recoding RiveScript.pm to use more efficient data structures like Program V.
I have a copy of Program V hosted on RiveScript.com here: programv-0.08.tar.
And today is one such day that I tried to SSH home and got a "Connection refused", because my IP had changed. My bot told me what my new address was, and all was good.
Which brings me to my next point: the program I wrote for my bots I named AiRS (Artificial Intelligence: RiveScript). If any of my readers know me from when I used to run AiChaos.com, this bot is sort of like my Juggernaut and Leviathan programs. The bot can run multiple connections (mirrors) to multiple listeners (so far, AIM and HTTP, but I'll be adding MSN Messenger support shortly). Unlike Juggernaut and Leviathan, though, the program uses RiveScript and RiveScript only as the reply engine.
By the time I release the program it will definitely have support for AIM, MSN, IRC, and HTTP. Other listeners? Maybe, maybe not. Jabber is a possibility. It will depend on what existing modules are available, how usable they are, how well they work, etc.
At any rate, you can chat with my AIM bot by sending an IM to AiRS Aiden. So far it just chats, but it can also tell you how the weather is, play mad libs, and do a couple less cool things.
0.0036s
.