Create a Go Json Parser: Batteries Included

Written by j.d.livni | Published 2018/11/20

The inspiration for this post came from a project at work. I was building a service that required comparing two JSON objects. The catch was that I needed to be able to replace keys, filter out paths, and apply comparison functions to specific nodes.

Obviously, a standard library comparison function such as reflect.DeepEqual() would not work. 😧

The solution was to build an AST (Abstract Syntax Tree) modeled on the JSON objects. Every node in the tree represents either a string, integer, array, or object.

Working with a tree gives us the flexibility to apply algorithms to the data much more easily.

To build this, we’ll start with the lexer to generate tokens. Then we’ll move on to the parser, which takes the tokens and matches them against the JSON grammar. Finally, we’ll add AST hooks to generate the tree.

The final directory structure:

.
main.go
/lexer
    lexer.go
    lexer_test.go
/token
    token.go
/parser
    parser.go
/ast
    ast.go

If you want to see and run the final results:

cd $GOPATH/src/github.com/Lebonesco
git clone https://github.com/Lebonesco/json_parser.git
cd json_parser
go run main.go ./examples/test.json

Lexer

The lexer’s job is to take in the JSON data and convert it into a stream of tokens. The token types are: INVALID, EOF, COMMA, COLON, LBRACE, RBRACE, LBRACKET, RBRACKET, STRING, and INTEGER.

Note: A lexer is also referred to as a scanner.

Let’s start down below 👇

cd $GOPATH/src/github.com/Lebonesco/json_parser
mkdir token
cd token
touch token.go

You have some freedom in how you want to define your tokens. The more data you add to a token the easier it is to debug.

Note: We’ll be using a rune array, []rune, to store our token literals to allow for Unicode characters.
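Here’s a minimal sketch of what token.go could look like. It isn’t copied from the repository, so treat the exact field names (and the extra Line field I’m adding for debugging) as assumptions:

package token

// Type identifies the kind of token the lexer produced.
type Type string

const (
	INVALID  Type = "INVALID"
	EOF      Type = "EOF"
	COMMA    Type = "COMMA"
	COLON    Type = "COLON"
	LBRACE   Type = "LBRACE"
	RBRACE   Type = "RBRACE"
	LBRACKET Type = "LBRACKET"
	RBRACKET Type = "RBRACKET"
	STRING   Type = "STRING"
	INTEGER  Type = "INTEGER"
)

// Token carries the token type, the literal runes it was built from,
// and optionally the line it appeared on for easier debugging.
type Token struct {
	Type    Type
	Literal []rune
	Line    int
}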

Next, let’s jump into our lexer 👍

mkdir lexer
cd lexer
touch lexer.go
touch lexer_test.go

The lexer will track where we are in the input and handle the necessary character lookahead.

In terms of functionality, it will need to be able to create a new token and peek ahead to the next one.

Note: This scanner doesn’t support Boolean values, but they could easily be added.
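Below is a minimal, self-contained sketch of such a lexer, written against the token package sketched above. The field and method names are my assumptions rather than the repository’s exact API, and it only handles the token types listed earlier (no booleans, escapes, or negative numbers):

package lexer

import (
	"unicode"

	"github.com/Lebonesco/json_parser/token"
)

// Lexer walks the input rune by rune and produces tokens.
type Lexer struct {
	input   []rune
	pos     int  // index of the current rune
	readPos int  // index of the next rune to read
	ch      rune // current rune under examination
}

// New returns a Lexer primed on the first rune of the input.
func New(input string) *Lexer {
	l := &Lexer{input: []rune(input)}
	l.readChar()
	return l
}

// readChar advances the lexer by one rune, setting ch to 0 at EOF.
func (l *Lexer) readChar() {
	if l.readPos >= len(l.input) {
		l.ch = 0
	} else {
		l.ch = l.input[l.readPos]
	}
	l.pos = l.readPos
	l.readPos++
}

// NextToken skips whitespace and returns the next token in the stream.
func (l *Lexer) NextToken() token.Token {
	for unicode.IsSpace(l.ch) {
		l.readChar()
	}
	var tok token.Token
	switch l.ch {
	case ',':
		tok = newToken(token.COMMA, l.ch)
	case ':':
		tok = newToken(token.COLON, l.ch)
	case '{':
		tok = newToken(token.LBRACE, l.ch)
	case '}':
		tok = newToken(token.RBRACE, l.ch)
	case '[':
		tok = newToken(token.LBRACKET, l.ch)
	case ']':
		tok = newToken(token.RBRACKET, l.ch)
	case '"':
		return token.Token{Type: token.STRING, Literal: l.readString()}
	case 0:
		return token.Token{Type: token.EOF}
	default:
		if unicode.IsDigit(l.ch) {
			return token.Token{Type: token.INTEGER, Literal: l.readInteger()}
		}
		tok = newToken(token.INVALID, l.ch)
	}
	l.readChar()
	return tok
}

// readString consumes the runes between a pair of double quotes.
func (l *Lexer) readString() []rune {
	l.readChar() // step past the opening quote
	start := l.pos
	for l.ch != '"' && l.ch != 0 {
		l.readChar()
	}
	lit := l.input[start:l.pos]
	l.readChar() // step past the closing quote
	return lit
}

// readInteger consumes a run of consecutive digits.
func (l *Lexer) readInteger() []rune {
	start := l.pos
	for unicode.IsDigit(l.ch) {
		l.readChar()
	}
	return l.input[start:l.pos]
}

func newToken(t token.Type, ch rune) token.Token {
	return token.Token{Type: t, Literal: []rune{ch}}
}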

Lexer Test

Here we will take in a JSON string and make sure the lexer outputs the correct stream of tokens.
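A test along these lines does the job. This is a sketch written against the lexer above; the repository’s own test table may differ:

package lexer

import (
	"testing"

	"github.com/Lebonesco/json_parser/token"
)

func TestLexer(t *testing.T) {
	input := `{"age": 27, "tags": ["go", "json"]}`

	// Expected token stream, in order.
	tests := []struct {
		wantType    token.Type
		wantLiteral string
	}{
		{token.LBRACE, "{"},
		{token.STRING, "age"},
		{token.COLON, ":"},
		{token.INTEGER, "27"},
		{token.COMMA, ","},
		{token.STRING, "tags"},
		{token.COLON, ":"},
		{token.LBRACKET, "["},
		{token.STRING, "go"},
		{token.COMMA, ","},
		{token.STRING, "json"},
		{token.RBRACKET, "]"},
		{token.RBRACE, "}"},
		{token.EOF, ""},
	}

	l := New(input)
	for i, tt := range tests {
		tok := l.NextToken()
		if tok.Type != tt.wantType {
			t.Fatalf("test %d: wrong type, got %q, want %q", i, tok.Type, tt.wantType)
		}
		if string(tok.Literal) != tt.wantLiteral {
			t.Fatalf("test %d: wrong literal, got %q, want %q", i, string(tok.Literal), tt.wantLiteral)
		}
	}
}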

To run the test:

go test -v
=== RUN   TestLexer
--- PASS: TestLexer (0.00s)
PASS
ok      github.com/Lebonesco/json_parser/lexer 0.433s

You now have a working lexer 🎉 🎉 🎉

Parser

This is the part where we take our token stream and match it against the JSON grammar to produce AST nodes.

If we write out the grammar for JSON in an EBNF-like notation, it looks like the definition below 👇

JSON     : Value
Value    : Object | Array | String | Integer | Bool | Null
Array    : '[' [Value] {',' Value} ']'
Object   : '{' [Property] {',' Property} '}'
Property : String ':' Value

In the above notation, [expression] means the expression is optional (it appears zero or one time) and {expression} means it can appear zero or more times.

There are tools like gocc that will generate a lexer and/or parser for you if you provide the token definitions and grammar. That’s the recommended way to go if you’re dealing with something more complicated.

But since JSON is fairly simple, we’ll do it by hand! 👐

Let’s construct the AST nodes that will represent our final results.

mkdir ast
cd ast
touch ast.go

The nodes are fairly simple. If we cared more about error handling, or needed to track node hashes as in my use case, we could store more data on each node.

Note: Because Go uses composition instead of inheritance, each node type needs to implement the TokenLiteral() method so that it satisfies the Json interface and can be treated as a JSON node.
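As a sketch, assuming a small Json interface and node fields of my own choosing (the repository’s definitions may differ), ast.go could look like this:

package ast

import "github.com/Lebonesco/json_parser/token"

// Json is the interface every AST node satisfies. Because Go composes
// behavior through interfaces rather than inheritance, each node type
// implements TokenLiteral() to qualify as a Json node.
type Json interface {
	TokenLiteral() string
}

// String represents a JSON string value.
type String struct {
	Token token.Token
	Value string
}

func (s *String) TokenLiteral() string { return string(s.Token.Literal) }

// Integer represents a JSON integer value.
type Integer struct {
	Token token.Token
	Value int64
}

func (i *Integer) TokenLiteral() string { return string(i.Token.Literal) }

// Array represents an ordered list of child values.
type Array struct {
	Token    token.Token // the '[' token
	Children []Json
}

func (a *Array) TokenLiteral() string { return string(a.Token.Literal) }

// Property is a single "key": value pair inside an object.
type Property struct {
	Key   string
	Value Json
}

// Object represents a JSON object as a list of properties.
type Object struct {
	Token      token.Token // the '{' token
	Properties []Property
}

func (o *Object) TokenLiteral() string { return string(o.Token.Literal) }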

Now onto the Parser!
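Here’s a sketch of a recursive-descent parser that follows the grammar above, built on top of the lexer and ast packages sketched earlier. The function names and error handling are assumptions, and this version is deliberately lenient about commas to keep it short:

package parser

import (
	"fmt"
	"strconv"

	"github.com/Lebonesco/json_parser/ast"
	"github.com/Lebonesco/json_parser/lexer"
	"github.com/Lebonesco/json_parser/token"
)

// Parser pulls tokens from the lexer and keeps one token of lookahead.
type Parser struct {
	l   *lexer.Lexer
	cur token.Token
}

// New creates a parser and loads the first token.
func New(l *lexer.Lexer) *Parser {
	p := &Parser{l: l}
	p.next()
	return p
}

func (p *Parser) next() { p.cur = p.l.NextToken() }

// Parse implements JSON : Value.
func (p *Parser) Parse() (ast.Json, error) { return p.parseValue() }

// parseValue implements Value : Object | Array | String | Integer.
func (p *Parser) parseValue() (ast.Json, error) {
	switch p.cur.Type {
	case token.LBRACE:
		return p.parseObject()
	case token.LBRACKET:
		return p.parseArray()
	case token.STRING:
		node := &ast.String{Token: p.cur, Value: string(p.cur.Literal)}
		p.next()
		return node, nil
	case token.INTEGER:
		v, err := strconv.ParseInt(string(p.cur.Literal), 10, 64)
		if err != nil {
			return nil, err
		}
		node := &ast.Integer{Token: p.cur, Value: v}
		p.next()
		return node, nil
	}
	return nil, fmt.Errorf("unexpected token %s", p.cur.Type)
}

// parseObject implements Object : '{' [Property] {',' Property} '}'.
func (p *Parser) parseObject() (ast.Json, error) {
	obj := &ast.Object{Token: p.cur}
	p.next() // consume '{'
	for p.cur.Type != token.RBRACE {
		if p.cur.Type != token.STRING {
			return nil, fmt.Errorf("expected string key, got %s", p.cur.Type)
		}
		key := string(p.cur.Literal)
		p.next()
		if p.cur.Type != token.COLON {
			return nil, fmt.Errorf("expected ':' after key %q", key)
		}
		p.next()
		val, err := p.parseValue()
		if err != nil {
			return nil, err
		}
		obj.Properties = append(obj.Properties, ast.Property{Key: key, Value: val})
		if p.cur.Type == token.COMMA {
			p.next() // consume ',' between properties
		}
	}
	p.next() // consume '}'
	return obj, nil
}

// parseArray implements Array : '[' [Value] {',' Value} ']'.
func (p *Parser) parseArray() (ast.Json, error) {
	arr := &ast.Array{Token: p.cur}
	p.next() // consume '['
	for p.cur.Type != token.RBRACKET {
		val, err := p.parseValue()
		if err != nil {
			return nil, err
		}
		arr.Children = append(arr.Children, val)
		if p.cur.Type == token.COMMA {
			p.next() // consume ',' between values
		}
	}
	p.next() // consume ']'
	return arr, nil
}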

Let’s bring it all together and write our driver, main.go.

Note: json.MarshalIndent() is a nice substitute for json.Marshal() when you want prettier JSON output.
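Here’s a sketch of that driver, assuming the packages sketched above (the repository’s main.go may differ): it reads the file named on the command line, lexes and parses it into an AST, and dumps the tree with json.MarshalIndent().

package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
	"os"

	"github.com/Lebonesco/json_parser/lexer"
	"github.com/Lebonesco/json_parser/parser"
)

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: go run main.go <file.json>")
	}

	// Read the JSON file passed on the command line.
	data, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}

	// Lex and parse the input into an AST.
	l := lexer.New(string(data))
	p := parser.New(l)
	tree, err := p.Parse()
	if err != nil {
		log.Fatal(err)
	}

	// MarshalIndent gives a prettier dump of the AST than Marshal.
	out, err := json.MarshalIndent(tree, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}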

To run:

go run main.go ./examples/test.json

All Done 😄

Now that we can generate an AST, it’s easy to add a rolling hash to each node and do all of the other custom operations such as:

  • substitute node values
  • filter out nodes
  • apply special comparison functions to nodes

I’ll leave those extensions to the reader. Feel free to drop links to your work in the comments. I’d love to see what you come up with. 👍

Thank you for taking the time to read this article.

If you found it helpful or interesting please let me know 👏👏👏.

