Create a Twitter Politician Bot with Markov Chains, Node.js and StdLib

Written by notoriaga | Published 2017/11/08
Tech Story Tags: bots | nodejs | stdlib | javascript | twitter


In the world’s current political climate, propaganda is the name of the game and Twitter is the medium of choice. Automation is king, and if you’re not using Twitter bots to sway the masses, you’re doing it wrong. Here at StdLib, we don’t really have any political motivations, but we sure do enjoy building bots. And, with the launch of StdLib Sourcecode, it’s never been easier for us to share our newest project with you: introducing Jaden Trudeau, the eccentric future Prime Minister of Canada. We’ll teach you all about how we built this wonder of modern engineering, and how you can build your own “Political Terminator” (shoutout to Arnie) in minutes.

With the goal of building a Twitter bot that appeals to the masses, we chose to combine the wisdom of Jaden Smith:

With the wholesomeness of Justin Trudeau:

To create the world’s perfect politician: Jaden Trudeau. More specifically, the goal is to create a bot that occasionally tweets procedurally generated sentences in the style of Jaden Smith and Justin Trudeau. This combination results in wonderful specimens such as:

The tool of choice for this project is a Markov chain. Markov chains have many real-world applications, such as Google’s PageRank algorithm, but none are as important as this one. If you want to skip to the working version of the code, you can check out its API page here. From this page you can try the service yourself, and even mix in other people’s Twitter accounts!

Coming to an election near you — Jaden Trudeau

What’s the deal with Markov Chains?

We describe a Markov chain as follows: We have a set of states, S = {s_1, s_2, …, s_r}. The process starts in one of these states and moves successively from one state to another. Each move is called a step. If the chain is currently in state s_i, then it moves to state s_j at the next step with a probability denoted by p_ij, and this probability does not depend upon which states the chain was in before the current state. [source]

A two state Markov chain [source]

In short, a Markov chain is a mathematical model that transitions from one state to another by throwing out the history of previous states and only examining the present. While that explanation is still a bit abstract, it becomes clearer within the context of generating sentences. Below is an outline for how you might generate text using a Markov chain.

  1. Split a body of text (your corpus) into tokens (words and punctuation).
  2. Build a frequency table. This data structure has a key for every unique token in your corpus. Each key is mapped to a list of all the tokens that follow it, along with how often each one occurs after it. It also helps to add special keys for the start and end of sentences. This ensures that when sampling from the model you can always start and end sentences with appropriate words.
  3. Select a starting point (one of those special start words) and then randomly select a token from the list of tokens that follow the key. The probability that a token is chosen should be proportional to how often it appears after the key. This new token is now the state of the Markov chain. Look up the new token in the frequency table and repeat.
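To make steps 1 and 2 concrete, here is a minimal sketch of building a frequency table from sentences that have already been tokenized. It is not the code the bot uses. The function name `buildFrequencyTable` and the use of a `Map` are assumptions made for illustration; only the `__START` and `__END` sentinel keys come from the example later in this post.

```javascript
// Sentinel keys marking sentence boundaries (names taken from the chain
// example in this post; everything else here is a hypothetical sketch).
const START = '__START';
const END = '__END';

// Build a table mapping each token to a count of the tokens that follow it.
// `sentences` is an array of token arrays, e.g. [['our', 'future'], ...].
function buildFrequencyTable(sentences) {
  const table = new Map();
  for (const tokens of sentences) {
    // Wrap every sentence in the sentinels so the table records
    // which tokens can begin and end a sentence.
    const path = [START, ...tokens, END];
    for (let i = 0; i < path.length - 1; i++) {
      const current = path[i];
      const next = path[i + 1];
      if (!table.has(current)) table.set(current, new Map());
      const followers = table.get(current);
      followers.set(next, (followers.get(next) || 0) + 1);
    }
  }
  return table;
}

// A tiny made-up corpus of three sentences.
const table = buildFrequencyTable([
  ['we', 'are', 'here'],
  ['we', 'are', 'many'],
  ['our', 'future'],
]);

console.log(table.get(START)); // 'we' starts two sentences, 'our' starts one
console.log(table.get('are')); // 'here' and 'many' each follow 'are' once
```

A `Map` is used rather than a plain object so that tokens such as "constructor" cannot collide with built-in object properties.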

Implementation

With a general idea of how to proceed, it’s time to get going. First things first, we need to fetch some tweets. With Twit, that’s no problem.
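As a rough sketch of what fetching might look like (not necessarily how the bot does it), the helper below asks Twitter for a user’s recent tweets and returns their text. The helper name `fetchTweetTexts` and its parameters are hypothetical. It assumes `client` is a Twit instance, whose `get()` method returns a promise resolving to an object with a `data` key when no callback is passed.

```javascript
// `client` is assumed to be a Twit instance created elsewhere, for example:
//   const Twit = require('twit');
//   const client = new Twit({
//     consumer_key, consumer_secret, access_token, access_token_secret,
//   });
// fetchTweetTexts is a hypothetical helper, not part of Twit itself.
function fetchTweetTexts(client, screenName, count = 200) {
  return client
    .get('statuses/user_timeline', {
      screen_name: screenName, // whose timeline to read
      count,                   // number of tweets to request
      include_rts: false,      // skip retweets, we want the user's own words
    })
    .then((result) => result.data.map((tweet) => tweet.text));
}
```

Taking the client as an argument keeps the credentials out of the helper and makes it easy to try with a stand-in object.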

After receiving the tweets, they need to be tokenized. With tweets, this is not an entirely trivial process. Tweets are full of URLs, emojis and ill-formed sentences. We can turn a string representing a tweet into an array of tokens with the code below:

This function takes in a tweet, strips it of URLs and mentions and splits it into words. These arrays can then be fed into the frequency table.
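A minimal tokenizer along those lines might look like the sketch below. The function name `tokenize`, the regular expressions, and the choice to lowercase everything are assumptions made for illustration, so the bot’s actual tokenizer may handle more cases (emojis and punctuation, for example).

```javascript
// Hypothetical tokenizer: strips URLs and @mentions, then splits on whitespace.
function tokenize(tweet) {
  return tweet
    .replace(/https?:\/\/\S+/g, ' ') // drop URLs
    .replace(/@\w+/g, ' ')           // drop @mentions
    .toLowerCase()                   // assumption: treat "Our" and "our" alike
    .split(/\s+/)                    // split on runs of whitespace
    .filter(Boolean);                // discard empty strings at the edges
}

console.log(tokenize('Our future is bright @JustinTrudeau https://t.co/abc123'));
// → [ 'our', 'future', 'is', 'bright' ]
```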

The code to generate the table is a little long for a Medium post, but you can see it here. After the table is generated, entries look like this:

These entries could be traversed in a few ways. At the beginning, there is a 50/50 chance of selecting ‘our’ or ‘we’ as the starting word. Assuming ‘our’ gets chosen, there is a 2/5 chance each that ‘future’ or ‘differences’ gets chosen and a 1/5 chance for ‘relationship’. This process keeps repeating until a chain is created such as:

__START -> our -> future -> office -> __END
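The walk above can be sketched as a weighted random choice repeated until `__END` is reached. The table below is hypothetical: only the counts under `__START` and `our` are chosen to match the probabilities described above, and the remaining entries are made up so the example can run. The names `pickNext` and `generate` are also assumptions, and the optional `random` parameter exists only to make the walk reproducible.

```javascript
// Hypothetical table. The __START and 'our' counts reproduce the 50/50 and
// 2/5, 2/5, 1/5 odds from the text; all other entries are invented.
const exampleTable = {
  __START: { our: 1, we: 1 },
  our: { future: 2, differences: 2, relationship: 1 },
  we: { __END: 1 },
  future: { office: 1 },
  differences: { __END: 1 },
  relationship: { __END: 1 },
  office: { __END: 1 },
};

// Pick a follower with probability proportional to its count.
function pickNext(followers, random = Math.random) {
  const entries = Object.entries(followers);
  const total = entries.reduce((sum, [, count]) => sum + count, 0);
  let threshold = random() * total;
  for (const [token, count] of entries) {
    threshold -= count;
    if (threshold < 0) return token;
  }
  return entries[entries.length - 1][0]; // guard against rounding error
}

// Walk from __START until __END, an unknown state, or the length cap.
function generate(table, random = Math.random, maxTokens = 50) {
  const words = [];
  let state = '__START';
  while (words.length < maxTokens) {
    const followers = table[state];
    if (!followers) break;
    state = pickNext(followers, random);
    if (state === '__END') break;
    words.push(state);
  }
  return words;
}

console.log(generate(exampleTable).join(' '));          // varies from run to run
console.log(generate(exampleTable, () => 0).join(' ')); // → our future office
```

Only the current state is consulted at each step, which is exactly the "throw out the history" property of a Markov chain.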

And that’s pretty much it. If you want to see it in action, follow Jaden Trudeau on Twitter. Of course, there are many tweaks that can be made. For instance, if you want to generate multiple sentences at a time you can add edges from __END to __START, and just make sure that you end with a complete sentence.

Building Your Own

If you’d like to build your own Twitter bot, you can check out the sourcecode here (you can even deploy a bot directly from the browser!). When you navigate to the page you’ll find a button labeled “Create Service from Source”. When clicked you’ll be prompted for some environment variables.

Environment variables for the Twitter bot

The first one is a StdLib library token, which you can leave as is. The other four can be found on your Twitter application management page. Click “Create an App” and fill out the form:

Twitter application management page

After you click “Create”, you’ll find the four keys on the next page. Copy them into the prompt and click “Deploy”.

In order to see the code locally and make changes you need our CLI, which you can get from npm by opening up your terminal and running:

npm install lib.cli -g

And now create a StdLib workspace and get the code you just deployed:

$ mkdir stdlib-workspace
$ cd stdlib-workspace
$ lib init
$ lib get <username>/twitter-bot

By default this bot uses the Markov chain to generate tweets. If you wanted to swap that out for another method, you could open __main__.js and make a small change:

Main function for the Twitter bot

On line 12 there is a call to lib.steve[twitter-markov-chain], which is the Markov chain from earlier. You can play with it directly from the StdLib library docs page. You can create your own function that generates tweets and simply swap it in. Now, to deploy the bot again, you just run:

$ lib release

And that’s it, thanks for reading! Hopefully you were able to learn a little bit about Markov chains, Twitter bots and StdLib. Building a propaganda machine is just one of the many ways you can get started with StdLib. If you have a neat idea you’d like to share, reach out to me directly by e-mail: steven@stdlib.com, or follow me and the StdLib team on Twitter.

As always, we look forward to hearing from you and happy building!

Steve Meyer is a recent graduate of Oberlin College and Software Engineer at StdLib. When he’s not programming you can find him cooking, baking, or playing Breath of the Wild.


Published by HackerNoon on 2017/11/08