Digging in Deep: Solving a Real Problem with Haskell Tensor Flow

Written by james_32022 | Published 2017/07/10
Tech Story Tags: machine-learning | artificial-intelligence | python | haskell | software-development


Last week we got acquainted with the core concepts of Tensor Flow. We learned about the differences between constants, placeholders, and variable tensors. Both the Haskell and Python bindings have functions to represent these. The Python version was a bit simpler though. Once we had our tensors, we wrote a program that “learned” a simple linear equation.

This week, we’re going to solve an actual machine learning problem. We’re going to use the Iris data set, which contains measurements of different Iris flowers. Each flower belongs to one of three species. Our program will “learn” a function that chooses the species from the measurements. This function will involve a fully-connected neural network.

Formatting our Input

The first step in pretty much any machine learning problem is data processing. After all, our data doesn’t magically get resolved into Haskell data types. Luckily, Cassava is a great library to help us out. The Iris data set consists of data in .csv files that each have a header line and then a series of records. They look a bit like this:

120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
...

Each line contains one record. A record has four flower measurements and a final label. In this case, we have three species of flowers we are trying to distinguish: Iris Setosa, Iris Versicolor, and Iris Virginica. So the last column contains the numbers 0, 1, and 2, corresponding to these respective classes.

Let’s create a data type representing each record. Then we can parse the file line-by-line. Our IrisRecord type will contain the feature data and the resulting label. This type will act as a bridge between our raw data and the tensor format we’ll need to run our learning algorithm. We’ll derive the “Generic” typeclass for our record type, and use this to get FromRecord. Once our type has an instance for FromRecord, we can parse it with ease. As a note, throughout this article, I’ll be omitting the imports section from the code samples. I’ve included a full list of imports from these files as an appendix at the bottom. We'll also be using the OverloadedLists extension throughout.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedLists #-}

...

data IrisRecord = IrisRecord
  { field1 :: Float
  , field2 :: Float
  , field3 :: Float
  , field4 :: Float
  , label  :: Int64
  }
  deriving (Generic)

instance FromRecord IrisRecord
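
As a quick sanity check (this snippet is my own addition, not part of the original program), we can decode a single line by hand and confirm that the positional parsing does what we expect. It assumes the imports listed in the appendix.

-- A one-off check of the FromRecord instance (illustrative only).
-- Cassava matches the five CSV fields positionally against the record fields.
singleRecordCheck :: Either String (Vector IrisRecord)
singleRecordCheck = decode NoHeader (pack "6.4,2.8,5.6,2.2,2\n")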

Now that we have our type, we’ll write a function, readIrisFromFile, that will read our data in from a CSV file.

readIrisFromFile :: FilePath -> IO (Vector IrisRecord)
readIrisFromFile fp = do
  contents <- readFile fp
  let contentsAsBs = pack contents
  let results = decode HasHeader contentsAsBs :: Either String (Vector IrisRecord)
  case results of
    Left err -> error err
    Right records -> return records

We won’t want to always feed our entire data set into our training system. So given a whole slew of these items, we should be able to pick out a random sample.

sampleSize :: Int
sampleSize = 10

chooseRandomRecords :: Vector IrisRecord -> IO (Vector IrisRecord)
chooseRandomRecords records = do
  let numRecords = Data.Vector.length records
  chosenIndices <- take sampleSize <$> shuffleM [0..(numRecords - 1)]
  return $ fromList $ map (records !) chosenIndices

Once we’ve selected our vector of records to use for each run, we’re still not done. We need to take these records and transform them into the TensorData that we’ll feed into our algorithm. We create items of TensorData by feeding in a shape and then a one-dimensional vector of values. First, we need to know the shapes of our input and output. Both of these depend on the number of rows in the sample. The input will have a column for each of the four features in our set. The output, meanwhile, will have a single column for the label values.

irisFeatures :: Int64
irisFeatures = 4

irisLabels :: Int64
irisLabels = 3

convertRecordsToTensorData :: Vector IrisRecord
                           -> (TensorData Float, TensorData Int64)
convertRecordsToTensorData records = (input, output)
  where
    numRecords = Data.Vector.length records
    input = encodeTensorData
      [fromIntegral numRecords, irisFeatures] (undefined)
    output = encodeTensorData
      [fromIntegral numRecords] (undefined)

Now all we need to do is take the various records and turn them into one-dimensional vectors to encode. Here’s the final function:

convertRecordsToTensorData :: Vector IrisRecord
                           -> (TensorData Float, TensorData Int64)
convertRecordsToTensorData records = (input, output)
  where
    numRecords = Data.Vector.length records
    input = encodeTensorData
      [fromIntegral numRecords, irisFeatures]
      (fromList $ concatMap recordToInputs records)
    output = encodeTensorData
      [fromIntegral numRecords]
      (label <$> records)
    recordToInputs :: IrisRecord -> [Float]
    recordToInputs rec =
      [field1 rec, field2 rec, field3 rec, field4 rec]
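
As an aside, a small helper (my own sketch, not part of the original code) ties the sampling and encoding steps together, and makes the shapes concrete: with the default sampleSize of 10, the input TensorData has shape [10, 4] backed by 40 floats, and the output has shape [10] with 10 labels.

-- Hypothetical convenience wrapper: draw a random sample and encode it.
-- For sampleSize = 10 this yields TensorData of shapes [10, 4] and [10].
sampleTensorData :: Vector IrisRecord -> IO (TensorData Float, TensorData Int64)
sampleTensorData records =
  convertRecordsToTensorData <$> chooseRandomRecords records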

Neural Network Basics

Now that we’ve got that out of the way, we can start writing our model. Remember, we want to perform two different actions with our model. First, we want to be able to take our training input and train the weights. Second, we want to be able to pass in a test data set and determine the error rate. We can represent these two functions as a single Model object. Recall the Session monad, in which we run all our Tensor Flow activities: training will run an action that alters the variables but returns nothing, while the error rate calculation will return a float value.

data Model = Model
  { train :: TensorData Float -- Training input
          -> TensorData Int64 -- Training output
          -> Session ()
  , errorRate :: TensorData Float -- Test input
              -> TensorData Int64 -- Test output
              -> Session Float
  }

Now we’re going to build a fully-connected neural network. We’ll have 4 input units (1 for each of the different features), and then we’ll have 3 output units (1 for each of the classes we’re trying to represent). In the middle, we’ll use a hidden layer consisting of 10 units. This means we’ll need two sets of weights and biases. We’ll write a function that, when given dimensions, will give us the variable tensors for each layer. We want the weight and bias tensors, plus the result tensor of the layer.

buildNNLayer :: Int64 -> Int64 -> Tensor v Float
             -> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input = do
  weights <- truncatedNormal (vector [inputSize, outputSize]) >>=
    initializedVariable
  bias <- truncatedNormal (vector [outputSize]) >>=
    initializedVariable
  let results = (input `matMul` readValue weights) `add`
                readValue bias
  return (weights, bias, results)

We do this in the Build monad, which allows us to construct variables, among other things. We’ll use a truncatedNormal distribution for all our variables to keep things simple. We specify the size of each variable in a vector tensor, and then initialize them. Then we’ll create the resulting tensor by multiplying the input by our weights and adding the bias.
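
To keep the shapes straight, here is a quick dimension check (my own annotation, not from the original article), assuming a batch of n input rows:

-- Dimension check (annotation only, for a batch of n rows):
--   input                          : [n, inputSize]
--   readValue weights              : [inputSize, outputSize]
--   input `matMul` readValue ...   : [n, outputSize]
--   readValue bias                 : [outputSize]  (broadcast over the batch by `add`)
--   results                        : [n, outputSize]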

Constructing our Model

Now we’ll start building our Model object, again within the Build monad. We begin by specifying our input and output placeholders, as well as the number of hidden units. We’ll also use a batchSize of -1 to account for the fact that we want a variable number of input samples.

irisFeatures :: Int64
irisFeatures = 4

irisLabels :: Int64
irisLabels = 3
-- ^^ From above

createModel :: Build Model
createModel = do
  let batchSize = -1 -- Allows variable sized batches
  let numHiddenUnits = 10
  inputs <- placeholder [batchSize, irisFeatures]
  outputs <- placeholder [batchSize]

Then we’ll get the nodes for the two layers of variables, as well as their results. Between the layers, we’ll add a “rectifier” activation function relu:

(hiddenWeights, hiddenBiases, hiddenResults) <-
  buildNNLayer irisFeatures numHiddenUnits inputs
let rectifiedHiddenResults = relu hiddenResults
(finalWeights, finalBiases, finalResults) <-
  buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults

Now we have to get the inferred class for each row of output. This means calling argMax to take the class with the highest probability. We’ll also cast the vector and then render it. These are some Haskell-Tensor-Flow specific terms for getting tensors to the right type. Next, we compare that against our output placeholders to see how many we got correct. Then we’ll make a node for calculating the error rate for this run.

actualOutput <- render $ cast $
  argMax finalResults (scalar (1 :: Int64))
let correctPredictions = equal actualOutput outputs
errorRate_ <- render $ 1 - (reduceMean (cast correctPredictions))

Now we have to actually do the work of training. First, we’ll make oneHot vectors for our expected outputs. This means converting the label 0 into the vector [1,0,0], and so on. We’ll compare these values against our results (before we took the max), and this gives us our loss function. Then we will make a list of the parameters we want to train. The adam optimizer will minimize our loss function while modifying the params.

let outputVectors = oneHot outputs (fromIntegral irisLabels) 1 0
let loss = reduceMean $ fst $
      softmaxCrossEntropyWithLogits finalResults outputVectors
let params = [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
train_ <- minimizeWith adam loss params
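
To illustrate what that oneHot node does (my own example, not from the article): with three classes, a batch of labels becomes one row of on/off values per label.

-- Conceptually, oneHot with depth 3, on-value 1, and off-value 0 maps
--   [0, 2, 1]  ==>  [ [1, 0, 0]
--                   , [0, 0, 1]
--                   , [0, 1, 0] ]
-- which is the per-row label distribution softmaxCrossEntropyWithLogits expects.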

Now we’ve got our errorRate_ and train_ nodes ready. There's one last step: we have to fill in the placeholders and create functions that take in the tensor data. Remember the feed pattern from last week? We use it again here. With that, our model is complete!

return $ Model
  { train = \inputFeed outputFeed ->
      runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        train_
  , errorRate = \inputFeed outputFeed -> unScalar <$>
      runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        errorRate_
  }

Tying it all together

Now we’ll write our main function that will run the session. It will have three stages. In the preparation stage, we’ll load our data and use the build function to get our model. Then we’ll train our model for 1000 steps, each time choosing a random sample and converting those records to TensorData. Every 100 steps, we'll print the current training error. Finally, we’ll determine the resulting error rate on the test data.

runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = runSession $ do
  -- Preparation
  trainingRecords <- liftIO $ readIrisFromFile trainingFile
  testRecords <- liftIO $ readIrisFromFile testingFile
  model <- build createModel

  -- Training
  forM_ ([0..1000] :: [Int]) $ \i -> do
    trainingSample <- liftIO $ chooseRandomRecords trainingRecords
    let (trainingInputs, trainingOutputs) =
          convertRecordsToTensorData trainingSample
    (train model) trainingInputs trainingOutputs
    when (i `mod` 100 == 0) $ do
      err <- (errorRate model) trainingInputs trainingOutputs
      liftIO $ putStrLn $
        "Current training error " ++ show (err * 100)

  liftIO $ putStrLn ""

  -- Testing
  let (testingInputs, testingOutputs) =
        convertRecordsToTensorData testRecords
  testingError <- (errorRate model) testingInputs testingOutputs
  liftIO $ putStrLn $ "test error " ++ show (testingError * 100)

  return ()
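
For completeness, an entry point wiring this together might look like the following. The file paths are my assumption; point them at wherever you've saved the Iris training and test CSV files.

-- Hypothetical main (not from the original article); the paths are placeholders.
main :: IO ()
main = runIris "data/iris_training.csv" "data/iris_test.csv"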

Results

When we actually run all of this, we’ll get output like the following.

Current training error 60.000004
Current training error 30.000002
Current training error 39.999996
Current training error 19.999998
Current training error 10.000002
Current training error 10.000002
Current training error 19.999998
Current training error 19.999998
Current training error 10.000002
Current training error 10.000002
Current training error 0.0

test error 3.333336

Our test sample size was 30, so this means we got 29/30 this time around. Results change from run to run, though (I obviously used the best results I found). Since our sample size is so small, there's a lot of variance (sometimes the error rate is as high as 40%). Generally we’ll want to train for longer and evaluate on a larger test set, so that we get more consistent results, but this is a good start.

Conclusion

In this article we went over the basics of making a neural network using the Haskell Tensor Flow library. We made a fully-connected neural network and fed in real data we parsed using the Cassava library. This network was able to learn a function to classify flowers from the Iris data set. Considering the small amount of data, we got some good results.

Come back next week, where we’ll see how we can add some more summary information to our tensor flow graph. We’ll use the TensorBoard application to view our graph in a visual format.

For more details on installing the Haskell Tensor Flow system, check out our In-Depth Tensor Flow Tutorial. It should walk you through the important steps in running the code on your own machine.

Perhaps you’ve never tried Haskell before at all, and want to see what it’s like. Maybe I’ve convinced you that Haskell is in fact the future of AI. In that case, you should check out our Getting Started Checklist for some tools on starting with the language.

Appendix: All Imports

Documentation for Haskell Tensor Flow is still a major work in progress. So I want to make sure I explicitly list the modules you need to import for all the different functions we used here.

import Control.Monad (forM_, when)
import Control.Monad.IO.Class (liftIO)
import Data.ByteString.Lazy.Char8 (pack)
import Data.Csv (FromRecord, decode, HasHeader(..))
import Data.Int (Int64)
import Data.Vector (Vector, length, fromList, (!))
import GHC.Generics (Generic)
import System.Random.Shuffle (shuffleM)

import TensorFlow.Core (TensorData, Session, Build, render, runWithFeeds, feed, unScalar, build,
                        Tensor, encodeTensorData)
import TensorFlow.Minimize (minimizeWith, adam)
import TensorFlow.Ops (placeholder, truncatedNormal, add, matMul, relu,
                       argMax, scalar, cast, oneHot, reduceMean, softmaxCrossEntropyWithLogits,
                       equal, vector)
import TensorFlow.Session (runSession)
import TensorFlow.Variable (readValue, initializedVariable, Variable)
