PricingNet: modelling the global airline industry with neural networks

Written by SkyscannerEng | Published 2017/11/24

By Kieran McHugh

Here at Skyscanner, we’re always looking out for ways to improve our travellers’ experience through novel applications of machine learning.

Every day, millions of people start planning their next trip on Skyscanner. Each search can generate thousands of prices for travellers to browse and compare. All of these prices pass through Skyscanner’s data platform, where they are ingested, filtered, processed, and ultimately stored. At our scale, the sheer volume of pricing data makes it a real challenge to extract actionable insights.

Background

We’ve all experienced the frustration and disappointment of having to fork out more than expected for a product or service: be it a nasty utility price hike, a heated online auction, or just missing out on a flash sale. Afterwards, we feel ripped off — and we might have to make significant and inconvenient adjustments to financial plans in order to gather the necessary cash.

It’s an unfortunate and regrettable fact that this happens all the time when people are booking flights for upcoming trips. The travel industry is a dynamic and highly competitive marketplace, where prices set by airlines and travel agents are extremely volatile and subject to modification without notice. Sometimes, prices can increase several times per hour.

In fact, the airline industry as a whole is seen as “one of the most sophisticated in its use of dynamic pricing strategies in an attempt to maximise its revenue” [1].

The price of the cheapest flight for any given route can vary wildly over time. [Chart: price changes for a one-way short-haul flight over a 14-day window.]

This situation is hugely frustrating for us, and for our travellers. Every day, we work extremely hard to maximise the recency and accuracy of prices we display on our website. We aren’t notified when prices are going to change, nor do we know the amount by which they will increase or decrease. It’s up to us to second-guess the market and minimise the impact of these fluctuations on our users.

An idea

Having worked with Skyscanner throughout my university studies, I’d been on the lookout for ways to apply my academic experience to Skyscanner’s biggest business challenges. So, when the time came to select a topic for my thesis, I knew that I wanted to partner with Skyscanner to tackle a difficult problem. I had studied the principles of machine learning in depth, and I decided to specialise in Artificial Neural Networks. I developed a specific interest in how neural networks can be applied to regression problems unrelated to computer vision and image recognition.

I wondered whether we could leverage Skyscanner’s large historical datasets to design, build, train and evaluate some simple neural networks which, given some contextual information about a flight, could give us insight into how we can expect the ticket price to rise or fall between now and the departure date.

Why would we want to do this? There are already several websites which advise travellers whether they should ‘buy now’ or ‘wait for a while’. In the latter case, it is hoped that the ticket price will decrease, allowing the traveller to save some cash. However, in the majority of cases, flight ticket prices never go down: they increase monotonically as departure approaches. I therefore reasoned that machine learning models focused on ‘buy/wait’ classification are not very useful for customers.

The best advice we can give to travellers is to purchase tickets as early as possible. Life is rarely that simple, though: it’s often inconvenient to book a flight right away, and it might be necessary to defer a booking. For instance, we often have to wait for payday to come around, or for a (less organised) friend to confirm their attendance so that all the tickets can be booked together. I believe it would be much more valuable if we could inform our travellers…

  • how long they can delay their purchase for before the price starts to increase substantially;
  • by how much they can expect the price to increase in the event that they need to defer their purchase for whatever reason.

So, instead of a simple ‘buy’ or ‘wait’ classification, it would be far preferable to develop a machine learning model whose output is an indicative price. In other words, we give the model the details about our flight (such as the origin, destination, and date of travel) along with a parameter representing the number of days until departure. In theory, as we vary this parameter between 365 days (1 year in advance) and 0 days (the day of the departure), the model should produce different prices.
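
To make this concrete, here is a minimal sketch of how such a model might be queried to build an indicative price curve. The `model` object and the flat input layout are hypothetical stand-ins for illustration, not the real interface:

```python
import numpy as np

def indicative_price_curve(model, flight_features):
    """Predict an indicative price for each lead time from 365 days out to departure day."""
    days = np.arange(365, -1, -1)
    # One input row per lead time, holding the other flight details constant.
    rows = np.array([list(flight_features) + [d] for d in days])
    return days, model.predict(rows)

# Hypothetical usage, with flight details [origin_id, destination_id,
# day_of_week, week_of_year, duration_mins]:
# days, prices = indicative_price_curve(model, [1234, 2468, 5, 27, 95])
```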

When I came up with this proposal, I had no idea how difficult such a model would be to construct, or even whether it was remotely feasible. My whole project was clouded in uncertainty, and there were an overwhelming number of unknowns which made it difficult to know where I should begin.

The training data

Before attempting to break the problem down any further, I concentrated on getting access to as much high-quality training data as possible. I worked with the Skyscanner legal team to set up an academic data-sharing agreement. This agreement granted me unprecedented access to a snapshot of global flight pricing over a 60-day period from September to November 2016.

Specifically, for every search performed on Skyscanner in this period, we recorded the minimum direct price (MDP) that we showed to the traveller. The MDP is the cheapest direct fare (no connections) across all flight times and providers. This amounted to more than 200,000,000 candidate training patterns.
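
As an illustration, an MDP could be derived from raw quote data with a simple pandas aggregation. The column names here are illustrative assumptions, not Skyscanner’s actual schema:

```python
import pandas as pd

quotes = pd.read_csv("quotes.csv")   # one row per (search, provider, flight) quote
direct = quotes[quotes["stops"] == 0]

# Cheapest direct fare per route and departure date for each search,
# taken across all providers and flight times.
mdp = (direct
       .groupby(["origin", "destination", "departure_date", "search_date"])["price"]
       .min()
       .reset_index(name="min_direct_price"))
```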

That’s a lot of data!

Selecting the project scope

With such a huge amount of training data, I felt that I would be setting myself up for failure if I didn’t establish a clear focus for my investigation.

With this in mind, I decided to consider only direct, one-way flights. I thought that this presented a sensible starting point, and that it would provide a basis for future work investigating a more complete model including return flights and connecting flights.

In addition, I made the assumption that users would be willing to travel at any time of day in order to minimise the amount they pay, even if the cheapest ticket was for a flight at a particularly unsocial time.

Big challenges

I found that establishing a focused scope for my research and making some assumptions about traveller behaviour removed a lot of complexity. That being said, there were quite a few outstanding challenges to think about.

  • There are a number of reasons why the airline industry is particularly challenging to model. In a way, it behaves like a stock market: airlines take a huge number of different factors into account when selecting prices, and there are lots of externalities. Revenue management analysts combine the advice of complex software with their own unpredictable intuition to reach a determination as to how much customers should be charged. How could I be sure that any determinism exists in the relationship between fares and the factors that influence them?
  • The factors taken into account by revenue management systems often do not have a meaningful numerical representation, meaning that they cannot be presented directly as input to a neural network. How would I represent categorical entities, such as airports, in a way the network could understand?
  • I aspired to build a single, universal neural network capable of predicting flight prices between arbitrary destinations. As we have seen, the amount of data required to train any universal flight pricing model is very large. At this scale, storing, preprocessing, splitting, and shuffling the data are all nontrivial tasks. How could I efficiently work with so much data in the limited time I had to complete the project?

Selecting an architecture

While I knew that I wanted to apply neural networks to this problem, I still had to choose from an overwhelming array of neural network paradigms and architectures. There were two possible approaches I could have taken.

  • Approach 1: Select a really simple feed-forward neural architecture. While this approach would be low-risk, there was a possibility it wouldn’t be sophisticated enough to provide an accurate model.
  • Approach 2: Go ‘all out’ and try to leverage the power of a more sophisticated architecture such as a Recurrent Neural Network (RNN). RNNs are sometimes better suited to time series data. Going with this approach would be risky owing to the additional complexity involved in constructing and training these networks.

I decided to take Approach 1. Since my project was the first documented attempt to apply neural networks to this problem, it made much more sense to start simple and scale up the complexity later, rather than starting complex and reducing the complexity in the event of failure.

Tools and frameworks

To keep my research efficient, I wanted to minimise the time I spent implementing the networks and managing the data pipeline. I looked into several machine learning frameworks, and eventually settled on TensorFlow. I decided to take advantage of Keras: a machine learning library built on top of TensorFlow that offers powerful, expressive, and production-ready neural network components right out of the box.

I took advantage of Amazon’s Elastic MapReduce to cleanse, preprocess, and reformat the training data to extract the variables I needed. This data was piped directly into Pandas dataframes and fed verbatim to the training utilities provided by Keras. Keras and TensorFlow support handling data in mini-batches: a group of training patterns can be delegated to the GPU and processed in parallel to dramatically speed up the training process.
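
Purely as an illustration of that pipeline, the dataframe-to-Keras handoff might look something like the sketch below; `model`, `df`, and the column names are assumptions for the example:

```python
# Assemble the input matrix and target vector from the preprocessed dataframe.
X = df[["origin_id", "destination_id", "day_of_week",
        "week_of_year", "duration_mins", "days_to_departure"]].values
y = df["min_direct_price"].values

# Keras hands each batch of 256 training patterns to the GPU,
# where they are processed in parallel.
model.fit(X, y, batch_size=256, epochs=10)
```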

Splitting the training data

In order to evaluate our model’s performance reliably, not all of the data can be used for training: part must be reserved to assess performance on unseen situations. Intuitively, we might imagine that an appropriate split could be achieved by sorting the data chronologically, using the ‘early bird’ flight prices for training, and reserving the last-minute prices for evaluation. There are several reasons why I think that this is not necessarily the best approach.

Firstly, this approach only measures the ability of the network to extrapolate (project trends from a set of historical data points into the future), and not to interpolate (fill in missing information between known data points). In the context of modelling flight prices, good performance on both extrapolation and interpolation is critical, since data for less popular routes is often sparse. Secondly, a temporal split could result in a situation where the evaluation data is unrepresentative of general trends. In other words, I would be evaluating each model’s performance solely on its ability to predict what happens in the final few days before departure, rather than across the whole booking window. This is bad.

Splitting the data randomly is often a more reliable method. However, because each price in the data set belongs to a wider time series corresponding to a specific flight, we cannot simply split the data randomly. I proposed that a more effective approach would be to first group training points by the flights they refer to. We then reserve 10% of these flights for evaluation, and use the remaining 90% to train the network. This way, there is no possibility that the training process will divulge any sneak hints about the flights that we intend to use for evaluation.
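
A sketch of this flight-level split using scikit-learn’s GroupShuffleSplit, assuming the dataframe `df` carries a hypothetical `flight_id` column identifying the time series each price belongs to:

```python
from sklearn.model_selection import GroupShuffleSplit

splitter = GroupShuffleSplit(n_splits=1, test_size=0.10, random_state=42)
train_idx, eval_idx = next(splitter.split(df, groups=df["flight_id"]))

train_df, eval_df = df.iloc[train_idx], df.iloc[eval_idx]
# No flight contributes prices to both sets, so evaluation flights are truly unseen.
assert not set(train_df["flight_id"]) & set(eval_df["flight_id"])
```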

Input variables

Having conducted extensive research on the economics of airline pricing, I developed an understanding of what factors are taken into account by revenue management systems when deciding on flight pricing. I identified a subset of these factors to provide a basic set of inputs to the model.

  • Origin airport: a discrete categorical variable describing the airport from which the flight departs.
  • Destination airport: also a discrete categorical variable describing the airport at which the flight lands. The set of possible origin airports is equal to the set of possible destination airports.
  • Day of week: a discrete integer variable between 1 and 7 representing the day of the week on which the flight departs, from Monday to Sunday.
  • Week of year: a discrete integer variable from 1 to 53 describing the week of the year in which the flight departs.
  • Duration: an integer variable, treated as continuous, describing how long the flight is, in minutes.
  • Days remaining until departure: an integer variable, treated as continuous, representing the number of days between the date of the price prediction and the date of departure.
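
For illustration, the three date-derived inputs could be computed with pandas as follows; the raw column names are assumptions, and `.dt.isocalendar()` requires pandas 1.1+:

```python
import pandas as pd

df["departure_date"] = pd.to_datetime(df["departure_date"])
df["search_date"] = pd.to_datetime(df["search_date"])

df["day_of_week"] = df["departure_date"].dt.dayofweek + 1          # 1 = Monday ... 7 = Sunday
df["week_of_year"] = df["departure_date"].dt.isocalendar().week    # 1 to 53
df["days_to_departure"] = (df["departure_date"] - df["search_date"]).dt.days
```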

Airport2Vec

There’s a problem here: neural networks accept continuous numerical data as input, and four of our six proposed inputs are not continuous. I had a few possible solutions to this:

  • Integer IDs: each possible variable value is assigned an integer identifier which is passed to the network. For instance, London Heathrow airport might be assigned an ID of 1234. However, let’s now imagine that Paris Charles de Gaulle is given an ID of 2468. From the network’s perspective, if we multiply London Heathrow by 2, then we get Charles de Gaulle. This makes absolutely no sense. Integer IDs don’t work in this context.
  • One-Hot Encoding: We could instead have a separate binary input for every possible airport. Whenever we want to refer to an airport, we simply set the corresponding input to ‘1’, while the rest are kept at ‘0’. This does solve some of the issues with the spurious relationships induced by integer ID representations, but it causes other issues. Namely, we’d need more than 3000 inputs for origin and destination airports (6000 total) which creates a huge computational resource requirement, and exponentially increases the amount of training data required.
  • Entity Embeddings: Entity embeddings are an exciting new approach to resolving some of the issues associated with one-hot encoding. Before the one-hot inputs are passed to the network, they are first compressed using an intermediate layer of neurons called the embedding layer. This means that our 3000+ dimensional input is mapped down onto an n-dimensional real-valued vector. ‘Similar’ airports will appear close to each other in this new vector space, allowing the network to differentiate between them. Similarity in this context is learned by the network during the normal supervised training process.

I decided to implement the third approach: entity embeddings. Under my proposed mapping, each airport would be represented by a 12-dimensional vector. I refer to this mapping as Airport2Vec.
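
Here is a minimal sketch of the Airport2Vec idea in Keras. An Embedding layer takes integer airport IDs directly (equivalent to multiplying a one-hot vector by a learned weight matrix) and maps each airport to a 12-dimensional vector that is learned during normal supervised training. The vocabulary size, layer names, and the choice of a single shared embedding are my own illustrative assumptions:

```python
from tensorflow.keras import layers

NUM_AIRPORTS = 3000  # assumed vocabulary size

origin_in = layers.Input(shape=(1,), name="origin_id")
dest_in = layers.Input(shape=(1,), name="destination_id")

# One embedding shared by both roles, since the sets of possible
# origin and destination airports are identical.
airport2vec = layers.Embedding(input_dim=NUM_AIRPORTS, output_dim=12,
                               name="airport2vec")

origin_vec = layers.Flatten()(airport2vec(origin_in))
dest_vec = layers.Flatten()(airport2vec(dest_in))
route = layers.Concatenate()([origin_vec, dest_vec])
```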

Network Designs

I decided to implement, train and evaluate four feed-forward neural network designs.

  • LR-1: The first network design was an extremely primitive linear regression model using just two of the inputs (days to departure and duration). I expected the results from this model to be poor — but I’d be able to use them as a benchmark against which I could compare subsequent designs.
  • NN-1: The second network design used the same two inputs as LR-1, but had a more sophisticated topology with 2 hidden layers.
  • NN-2: The penultimate network design introduced airport embeddings — to allow the network to differentiate between pricing strategies on specific routes. I increased the number of hidden layers to 3, and added more neurons per layer to add ‘capacity’ for modelling the additional complexity.
  • PricingNet: The final iteration used all six inputs, with entity embeddings for the four categorical inputs.

By gradually ramping up the complexity of the models, I was able to measure the impact of adding new inputs and adjusting the topology. It also allowed me to perform sanity checks and validate some of my ‘leap of faith’ assumptions early on in the process. All of the networks except LR-1 used Adam optimisation and ReLU activation to keep them ‘future-proof’.
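
Putting the pieces together, a PricingNet-style network might look something like the following in Keras. The layer sizes and the embedding dimensions for day and week are my own guesses for the sake of a runnable sketch, not the thesis’s exact topology:

```python
from tensorflow.keras import layers, Model

def embedded_input(name, vocab_size, dim):
    """Create an integer input plus a flattened entity embedding for it."""
    inp = layers.Input(shape=(1,), name=name)
    vec = layers.Flatten()(layers.Embedding(vocab_size, dim)(inp))
    return inp, vec

origin_in, origin_vec = embedded_input("origin_id", 3000, 12)
dest_in, dest_vec = embedded_input("destination_id", 3000, 12)
dow_in, dow_vec = embedded_input("day_of_week", 8, 3)       # 1..7, index 0 unused
week_in, week_vec = embedded_input("week_of_year", 54, 4)   # 1..53, index 0 unused

duration_in = layers.Input(shape=(1,), name="duration_mins")
days_in = layers.Input(shape=(1,), name="days_to_departure")

x = layers.Concatenate()([origin_vec, dest_vec, dow_vec, week_vec,
                          duration_in, days_in])
for units in (128, 64, 32):  # three ReLU hidden layers
    x = layers.Dense(units, activation="relu")(x)
price = layers.Dense(1, name="price")(x)

model = Model([origin_in, dest_in, dow_in, week_in, duration_in, days_in], price)
model.compile(optimizer="adam", loss="mae")  # Adam optimisation, MAE loss
```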

Results

It was reassuring to see that the performance of my models improved substantially with each iteration. As expected, LR-1 exhibited the poorest performance. The chart below illustrates the linear fit learned by the network for the two input variables I provided: flight duration, and days remaining until departure.

The addition of hidden layers in NN-1 meant that the network was able to fit a non-linear curve to the training data. This resulted in a 40% reduction in Mean Absolute Error (MAE) on unseen evaluation samples. The relationship that the model learned between duration, days to departure, and price also looked much cooler when it was plotted.

At this point, it was clear that the network didn’t have enough information to account for all of the variation in the data. The next logical step for NN-2 was to include information about the origin and destination airports for each flight.

The addition of airport information, compressed using Airport2Vec, led to a further 48% reduction in MAE on unseen evaluation samples versus NN-1. Curiously, NN-2 was able to identify asymmetrical pricing strategies on certain routes. For instance, the plot below shows that it’s generally considerably cheaper to fly one-way from London to New York than the other way around. Different taxes across national jurisdictions could be one of the reasons that this occurs.

When I constructed and trained PricingNet (the final iteration) and incorporated information about the day and week of departure, there was a further 13.5% improvement in MAE versus NN-2. The regression plot below shows PricingNet’s predicted price against the actual price for 10,000 random unseen evaluation samples.

The general y=x shape of this plot was reassuring to see, though it’s clear that there’s still some work to be done to account for the residual variation.
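
A plot of that kind is easy to reproduce; assuming `y_true` and `y_pred` are NumPy arrays of actual and predicted prices over the evaluation set, a sketch might be:

```python
import numpy as np
import matplotlib.pyplot as plt

idx = np.random.choice(len(y_true), size=10_000, replace=False)
plt.scatter(y_true[idx], y_pred[idx], s=2, alpha=0.3)

lims = [0, max(y_true[idx].max(), y_pred[idx].max())]
plt.plot(lims, lims, color="red")  # the ideal y = x line
plt.xlabel("Actual price")
plt.ylabel("Predicted price")
plt.show()
```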

Evaluation and Learnings

The results I obtained suggest that it would not only be feasible to develop a universal neural pricing model, but that it would also be straightforward in comparison to some of the previous work in this area. A simple network was able to account for most of the variance in pricing with just six input variables. I suspect that, had the network been trained on at least two years’ worth of pricing data, the performance improvements displayed by PricingNet would have been even more significant. This is because the network really needs to observe at least two full annual cycles in order to identify recurring seasonal trends.

Although the network performed well, there’s clearly still work to be done to improve the overall accuracy. The remaining residual variation means that it might be misleading to present the network output to travellers without the caveat that it represents a general trend, and not necessarily the exact prices the user can expect to encounter. Providing specific numeric values might give customers the impression that the network, in its current form, is more accurate than it actually is. Until further work can be carried out, it might be wise to instead provide a trend graph with an unlabelled price axis.

I learned throughout this process that it’s very important to apply ‘lean’ principles even when developing machine learning models. This means starting simple, restricting the input variables, and constraining the network topology. It’s then possible to iterate, measuring the performance of each model, and using this to inform the development of subsequent designs.

References

[1] O. Etzioni, R. Tuchinda, C. A. Knoblock and A. Yates, “To buy or not to buy: Mining airfare data to minimize ticket purchase price”, in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2003, pp. 119–128.

SEE the world with us

At Skyscanner, we’re for travellers, by travellers. Through our SEE (Skyscanner Employee Experience) programme, our employees can work up to 30 days a year from any of our 10 global offices, and even spend up to 30 days working from their home country if they’re based in an office outside the country they call home. Of course, there are always chances to travel to the other offices for work trips or conferences too.

Like the sound of this? Look at our current Skyscanner Product Engineering job roles.

Join the team

About the Author

My name is Kieran, and I’m a Software Engineer at Skyscanner London and a recent graduate of the University of York. My team manages Skippy, the service responsible for redirecting millions of daily travellers to airline websites to purchase their tickets. Outside work, I’m a keen pianist and love learning more about technology, business, finance, aviation and the French language.

