Create a Wikipedia Bias Tracker in 4 Steps with StdLib and IBM Watson

Written by notoriaga | Published 2018/01/02

Wikipedia is an invaluable resource when trying to get some quick information on a topic. For more rigorous research, it can provide a place to start and find more sources via the references attached to an article. Since Wikipedia is an open platform, its community subscribes to five pillars of conduct to prevent abuse. One of them is that articles must be written from a ‘neutral point of view’. An important part of writing from a neutral point of view is using an impartial tone:

Even where a topic is presented in terms of facts rather than opinions, inappropriate tone can be introduced through the way in which facts are selected, presented, or organized. Neutral articles are written with a tone that provides an unbiased, accurate, and proportionate representation of all positions included in the article.

The question is: how well does Wikipedia maintain this neutral point of view, and in particular the impartial tone? Using IBM Watson’s Natural Language Understanding API, we can attempt to examine this question. The API lets you look for emotions and sentiment in text. In an ideal world, both the emotion and sentiment scores for a neutral text should be near zero. To check this, you can combine IBM Watson’s API with Wikimedia’s API, glued together with StdLib.

If you haven’t heard of StdLib before, we’re the fastest way to build backend web services and ship real business value. Built on “serverless” architecture, you never have to worry about managing servers or allocating resources for scale. Write a function, deploy, and you’re ready to go! We also have a growing ecosystem of integrations contributed by other developers that are easy to use.

Step 1: Sign Up for StdLib

Getting started with StdLib is easy: head over to our website, choose a username, and hit “Claim Namespace”. Later on, you’ll go over getting our CLI and deploying services. But first, you need to set up the external APIs this application uses.

Step 2: Set Up Watson

Head to the Bluemix website to get started creating an account. After confirming your email, you can go to the Watson page. Click ‘Create Watson Service’ and you’ll be presented with some options.

Pick ‘Natural Language Understanding’. All the default options are fine; just make sure the ‘Lite’ (free) plan is selected!

After clicking ‘Create’ and landing on the new page, you’ll see a tab called ‘Service Credentials’ on the top left. Click ‘New credential’, which gives you a dropdown where you can find the username and password for your service. Take note of these; you will use them in just a bit.

Step 3: Set Up the Wikimedia API

Wikimedia’s API, and thus Wikipedia’s, does not require any authentication. It can, however, be a little confusing to use. So, for convenience, I’ve published a StdLib service that wraps some common operations. For example, if you wanted the top 1000 most viewed articles for a given month and year, you could call the following, using the lib npm package:

Or if you wanted the top 100 most edited articles of last year:

Both of those endpoints return the URLs of the articles, which works well with the service you are about to create.
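For a sense of what the wrapper saves you, here is a minimal sketch that queries Wikimedia’s public REST pageviews API directly for a month’s most viewed English Wikipedia articles. It assumes Node.js 18+ for the global fetch. The function names (topViewedEndpoint, titleToUrl, fetchTopViewedUrls) and the User-Agent string are hypothetical, and this is not the interface of the published StdLib wrapper.

```javascript
// Minimal sketch: list a month's most viewed English Wikipedia articles by
// calling Wikimedia's public REST pageviews API directly (no API key needed).
// Assumes Node.js 18+ for the global fetch. Function names are hypothetical.

const PAGEVIEWS_BASE =
  'https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access';

// Monthly ranking endpoint; the API expects a zero-padded two-digit month.
function topViewedEndpoint(year, month) {
  const mm = String(month).padStart(2, '0');
  return `${PAGEVIEWS_BASE}/${year}/${mm}/all-days`;
}

// Titles come back with underscores (e.g. "Star_Wars:_The_Last_Jedi");
// percent-encode them so they are safe to put in a URL path.
function titleToUrl(title) {
  return `https://en.wikipedia.org/wiki/${encodeURIComponent(title)}`;
}

// Resolve to the URLs of the top `limit` pages (the API ranks up to 1000).
async function fetchTopViewedUrls(year, month, limit = 1000) {
  const res = await fetch(topViewedEndpoint(year, month), {
    // Wikimedia asks clients to identify themselves; this value is a placeholder.
    headers: { 'User-Agent': 'bias-tracker-sketch/0.1 (placeholder contact)' },
  });
  if (!res.ok) {
    throw new Error(`Wikimedia API returned HTTP ${res.status}`);
  }
  const body = await res.json();
  // Shape: { items: [ { articles: [ { article, views, rank }, ... ] } ] }
  return body.items[0].articles
    .slice(0, limit)
    .map((entry) => titleToUrl(entry.article));
}
```

For example, fetchTopViewedUrls(2017, 12) should resolve to up to 1000 URLs for December 2017. Such rankings typically include non-article pages like Main_Page, which you may want to filter out before analysis.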

Step 4: Putting it all together

You’re going to use StdLib to compose the Watson and Wikimedia APIs. To use StdLib, you’ll need the command line tools, available on GitHub. If you don’t have Node.js installed, first download the latest version, which comes with npm, from the Node.js website. With Node installed, you can get the StdLib CLI by opening a terminal and running:

$ npm install lib.cli -g

Now create a workspace and navigate to it with:

$ mkdir stdlib-workspace
$ cd stdlib-workspace
$ lib init

Next, get the code for the Watson service by running:

$ lib create -s @steve/watson

You’ll be prompted to give the new service a name; the rest of this tutorial assumes it’s still watson. Now you can navigate to the new service with:

$ cd <username>/watson

Now open the env.json file found in the root of the directory. It has two fields: copy and paste the username and password from the Watson service credentials you created in Step 2 into their respective places.
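The two env.json values are what the service uses to authenticate with Watson. As an illustration only, here is a sketch of a single Natural Language Understanding request that asks Watson to fetch a page by URL and return document-level sentiment and emotion. It assumes the 2018-era username/password (HTTP Basic) credentials and the legacy gateway URL below, that the env.json fields are exposed as environment variables under the hypothetical names WATSON_USERNAME and WATSON_PASSWORD, and Node.js 18+ for fetch. IBM has since moved to IAM API keys and per-instance service URLs, so check the current Watson documentation before relying on any of this.

```javascript
// Sketch: one Watson Natural Language Understanding "analyze" call per page.
// Assumptions: legacy username/password (HTTP Basic) credentials, the legacy
// gateway URL below, and hypothetical env var names. Node.js 18+ for fetch.

const NLU_URL =
  'https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze?version=2017-02-27';

// Encode "username:password" as an HTTP Basic Authorization header value.
function basicAuthHeader(username, password) {
  return 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64');
}

// Ask Watson to download the page itself and score the whole document
// for sentiment and emotion.
function buildAnalyzeRequest(pageUrl, username, password) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: basicAuthHeader(username, password),
    },
    body: JSON.stringify({ url: pageUrl, features: { sentiment: {}, emotion: {} } }),
  };
}

// Send the request and keep only the document-level scores.
async function analyzePage(
  pageUrl,
  username = process.env.WATSON_USERNAME, // hypothetical name
  password = process.env.WATSON_PASSWORD, // hypothetical name
) {
  const res = await fetch(NLU_URL, buildAnalyzeRequest(pageUrl, username, password));
  if (!res.ok) {
    throw new Error(`Watson returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return {
    sentiment: data.sentiment.document.score, // -1 (negative) to 1 (positive)
    emotions: data.emotion.document.emotion, // { sadness, joy, fear, disgust, anger }
  };
}
```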

Now simply run:

$ lib up dev

With that command, your Watson API is live in a mutable development environment. It exposes just one endpoint, __main__, which takes an array of URLs. For each URL, it gathers sentiment and emotion data from that page and averages the results across all pages. Watson does some cleaning of each page, so you don’t have to worry about ads and the like. If you want to try the service from the command line, you can enter:
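The averaging step described above can be sketched as a plain function. This assumes each per-page analysis has already been reduced to a sentiment score plus the five emotion scores (the same shape as the results shown later in this tutorial), and it takes an unweighted mean; whether the published service weights pages differently is not something this sketch confirms. The names EMOTIONS and averageResults are hypothetical.

```javascript
// Minimal sketch: combine per-page Watson scores into one summary by taking
// an unweighted mean. Each input is assumed to look like
// { sentiment: number, emotions: { sadness, joy, fear, disgust, anger } }.

const EMOTIONS = ['sadness', 'joy', 'fear', 'disgust', 'anger'];

function averageResults(results) {
  if (results.length === 0) {
    throw new Error('averageResults needs at least one page result');
  }
  const summary = { sentiment: 0, emotions: {} };
  for (const name of EMOTIONS) summary.emotions[name] = 0;

  // Sum every score across pages...
  for (const page of results) {
    summary.sentiment += page.sentiment;
    for (const name of EMOTIONS) summary.emotions[name] += page.emotions[name];
  }

  // ...then divide once by the number of pages.
  const n = results.length;
  summary.sentiment /= n;
  for (const name of EMOTIONS) summary.emotions[name] /= n;
  return summary;
}
```

Summing first and dividing once keeps the arithmetic simple; a page that fails analysis would need to be dropped from the input before averaging.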

$ lib <username>.watson '["https://en.wikipedia.org/wiki/IBM"]'

This returns results for just the Wikipedia article about IBM. The service works with any webpage that has text, so you could also try your favorite (or least favorite) news source.
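To analyze many pages rather than one, the endpoint needs a list of URLs. Monthly view rankings typically include non-article pages such as Main_Page and Special:Search, so a sketch of the preparation step might filter those out before converting titles to URLs. The prefix list is a partial, hypothetical choice, and fetchTopTitles and analyzeUrls are placeholder functions standing in for the Wikipedia wrapper and the deployed watson service; runExample (also hypothetical) shows how the two compose.

```javascript
// Minimal sketch: turn a ranked list of page titles into article URLs for
// the watson endpoint, then hand them to an analyzer. fetchTopTitles and
// analyzeUrls are hypothetical placeholders, injected as parameters.

// Partial list of non-article namespaces on English Wikipedia.
const NON_ARTICLE_PREFIXES = [
  'Special:', 'Wikipedia:', 'File:', 'Portal:', 'Help:',
  'Category:', 'Template:', 'User:', 'Talk:',
];

function isArticleTitle(title) {
  return title !== 'Main_Page' &&
    !NON_ARTICLE_PREFIXES.some((prefix) => title.startsWith(prefix));
}

function titleToUrl(title) {
  return `https://en.wikipedia.org/wiki/${encodeURIComponent(title)}`;
}

// Keep only articles, cap how many get analyzed, and build their URLs.
function selectArticleUrls(titles, limit) {
  return titles.filter(isArticleTitle).slice(0, limit).map(titleToUrl);
}

// Compose the two services: rank -> filter -> URLs -> analyze.
async function runExample(fetchTopTitles, analyzeUrls, { year = 2017, month = 12, limit = 1000 } = {}) {
  const titles = await fetchTopTitles(year, month);
  const urls = selectArticleUrls(titles, limit);
  return analyzeUrls(urls);
}
```

Passing the two service calls in as parameters keeps the filtering logic testable without network access; in a real script they would wrap the Wikipedia wrapper and the watson service.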

With this endpoint set up, you can perform some simple analysis. Open a new file called example.js and enter the code below. It uses the Wikipedia wrapper to get the most viewed articles of December 2017, then feeds them into the Watson API.

Running this script with $ node example.js should give you results like:

{ sentiment: -0.037099235591900016,
  emotions:
   { sadness: 0.21031268868868871,
     joy: 0.45655976376376384,
     fear: 0.07352899499499507,
     disgust: 0.0967122352352352,
     anger: 0.07937950750750744 } }

The sentiment score is on a scale from -1 to 1: lower means negative sentiment and higher means positive. The emotions are on a scale from 0 to 1, where a higher number means that emotion is more present. For an unbiased article, all the numbers should ideally be close to 0. For the top 1000 most viewed articles of December 2017, this seems to hold, barring joy, which scores relatively high. Now if you wanted to look at the most edited (and thus likely most contested) articles, you could use:

This returns:

{ sentiment: -0.48160470200000033,
  emotions:
   { sadness: 0.39764493939393936,
     joy: 0.1554383434343435,
     fear: 0.13799211111111104,
     disgust: 0.08601269696969707,
     anger: 0.2993595656565651 } }

These numbers are quite different. Overall, they are much more negative, and anger and sadness are more prominent. This makes sense, considering that articles that are edited frequently are probably on contentious topics.
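One rough way to make this comparison mechanical is to flag any score that strays past a cut-off. In the sketch below, the 0.3 threshold and the name flagScores are arbitrary, hypothetical choices; sentiment is compared by absolute value since it runs from -1 to 1, and the two summaries are the results reported above.

```javascript
// Sketch: flag scores that stray from "neutral" (near 0).
// The default 0.3 cut-off is an arbitrary, hypothetical choice.

function flagScores(summary, threshold = 0.3) {
  const flagged = [];
  // Sentiment runs from -1 to 1, so strong negativity counts too.
  if (Math.abs(summary.sentiment) > threshold) flagged.push('sentiment');
  // Emotions run from 0 to 1; report any that exceed the cut-off.
  for (const [name, score] of Object.entries(summary.emotions)) {
    if (score > threshold) flagged.push(name);
  }
  return flagged;
}

// Results reported in this tutorial.
const mostViewed = {
  sentiment: -0.037099235591900016,
  emotions: { sadness: 0.21031268868868871, joy: 0.45655976376376384,
    fear: 0.07352899499499507, disgust: 0.0967122352352352,
    anger: 0.07937950750750744 },
};
const mostEdited = {
  sentiment: -0.48160470200000033,
  emotions: { sadness: 0.39764493939393936, joy: 0.1554383434343435,
    fear: 0.13799211111111104, disgust: 0.08601269696969707,
    anger: 0.2993595656565651 },
};
```

With this cut-off, flagScores(mostViewed) returns ['joy'] and flagScores(mostEdited) returns ['sentiment', 'sadness']. Anger in the most edited set (about 0.30) narrowly misses, a reminder that a single threshold is a blunt instrument.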

Thank You

And that’s it, thanks for reading! Hopefully you were able to learn a little bit about composing APIs with StdLib. If you have a neat idea you’d like to share, reach out to me directly by e-mail, or follow me and the StdLib team on Twitter.

As always, we look forward to hearing from you and happy building!

Steve Meyer is a recent graduate of Oberlin College and a software engineer at StdLib. When he’s not programming, you can find him cooking, baking, or playing Breath of the Wild.



Published by HackerNoon on 2018/01/02