How We Managed To Build a 12-Story Stack and Not Go Crazy

Written by Appodeal | Published 2018/10/04
Tech Story Tags: mobile-advertising | programming | data-storage | story-stack | tech-stack

Appodeal has a team of about 100 people working in San Francisco, Moscow, Kirov, Barnaul, Barcelona, and, since June 2018, Minsk. We monetize mobile applications by displaying ads to users. We started with ad mediation, but as the technology stack kept growing, new ad tech products were added along the way.

For those unfamiliar with ad tech, it's the field of technology companies working in advertising. When you tell people you work in mobile advertising, they tend to react skeptically, probably picturing the annoying "play this video" ad. However, this outdated view of ads has little to do with the actual advertising business, which is diverse and growing rapidly. The mobile segment we work in has long gone beyond web ads.

Why on earth integrate ads?

Publishers work hard to create an engaging app, and when they're ready to submit it to an app store, they need to think about how to monetize it. Successful monetization depends on many factors. Publishers can use various models, from in-app purchases to displaying ads, to make sure their app thrives. Displaying ads is one of the best ways to let people use apps for free and to ensure a broader reach.

No doubt, too many ads annoy people and hurt user retention, which is something everyone wants to avoid. That's why publishers always seek to integrate ads smartly, making the most money out of their app while sparing users an unpleasant experience.

How does this work?

As soon as you decide to monetize through ads, it’s important to pay attention to this part of app development and go with the solution that helps you maximize your revenue.

What happens if you sign up with Appodeal? After you register on our website, your app is integrated into our service. This is done via a client SDK that connects the app to the server and interacts with it through an API.

To keep it short and sweet, the interaction is reduced to:

a. Determining the ad to be displayed at the moment

b. Sending information about ads displayed and recording that in statistics
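
For illustration only, here is a minimal Go sketch of those two interactions, assuming hypothetical /v1/ad and /v1/impression endpoints and a demo app key (the real SDK and API are more involved):

```go
// Sketch of the two interactions above: (a) ask which ad to show,
// (b) report the impression so it ends up in statistics.
// Endpoints and the app key are made-up placeholders.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// (a) determine the ad to be displayed at the moment
	resp, err := http.Get("https://api.example.com/v1/ad?app_key=DEMO")
	if err != nil {
		panic(err)
	}
	ad, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Println("ad to display:", string(ad))

	// (b) send information about the displayed ad back to the server
	body := bytes.NewBufferString(`{"ad_id":"123","event":"impression"}`)
	r2, err := http.Post("https://api.example.com/v1/impression", "application/json", body)
	if err != nil {
		panic(err)
	}
	r2.Body.Close()
}
```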

Today Appodeal serves several thousand active apps, displaying 400 to 450 million ads per day with up to 1 billion requests to the ad networks that provide the ads. To keep everything up and running, our servers process around 125K requests per second, i.e. about 10.8 billion requests per day.

What’s at the core?

We use various technologies to provide speed and reliability, as well as agile development and support. At the moment, we code using the following languages:

  • Ruby / Ruby on Rails + React.JS (front-end): Most of the API and the entire web part that our users and employees see
  • GoLang: Processing a wide range of statistical and other data
  • Scala: Real-time processing of requests for interaction with ad exchanges via the RTB protocol (for details, see the final part of this article)
  • Elixir / Phoenix: The more experimental part; several microservices that process some of the statistics and API requests

Why Ruby and Ruby on Rails from the start?

In the mobile ads segment Appodeal competes with industry giants, so we know we need to stay alert at all times and adapt quickly to market changes. Often it feels like changing a car's wheels while driving at 100 km/h. Ruby on Rails allowed us to stay in the race, become firmly established in the market, and lead the segment. We see the following advantages to Rails:

  • A large number of highly qualified developers
  • Great community
  • Many off-the-shelf solutions and libraries
  • Quick implementation of new features and easy modification or removal of old ones

Apparent drawbacks:

  • In general, performance leaves much to be desired. There is also (at the moment) no JIT and no way to parallelize code (aside from JRuby). This can be tolerated to a certain degree, because it is usually the database and the cache that slow things down, as our NewRelic screenshot showed.

  • It is difficult to split a Rails monolith into microservices, because the business logic is closely tied to the data access logic (ActiveRecord)

What data do you store?

We have lots of data. We deal with billions, tens of billions, even hundreds of billions of records. However, since this data is quite diverse, we store it in a variety of ways. Architecture should never be limited to a single supposedly universal solution. First of all, as practice shows, there are virtually no universal high-load solutions: universality comes at the cost of access speed, read speed, or storage space being average at best, or well below that. Secondly, you always need to try new things, experiment, and search for unconventional solutions to the tasks at hand. To sum up:

  • PostgreSQL: We really like Postgres and think it's currently the best OLTP solution for storing data. We keep user, application, ad campaign, and other data there. We use master/slave replication, and we only do backups at Christmas because backups are for the faint-hearted (kidding).
  • VerticaDB: A column-oriented database we use to store billions of statistical records. In short, we used to consider Vertica the best OLAP solution for storing analytics. Its main drawback is its costly license, priced individually.
  • ClickHouse: A column-oriented database we are gradually migrating to from VerticaDB. We think it is currently the best OLAP solution available. It costs nothing, because it's completely free. It's also very fast and reliable. Its main drawback is that you cannot erase or update data (we could write a whole separate article about that, should anyone be interested).
  • Aerospike: It seems to be one of the fastest NoSQL key-value storage solutions. It has some disadvantages, but, in general, we are happy with it. On Aerospike's website you can find a table comparing its performance with other solutions: [When to use Aerospike NoSQL database vs. Redis](https://www.aerospike.com/when-to-use-aerospike-vs-redis/)
  • Redis: Curiously enough, its main advantages are ease of use and a single-threaded architecture that helps avoid race conditions, e.g. when working with ordinary counters (see the sketch right after this list).
  • Couchbase: At some point memcache could no longer handle the load, so we switched to Couchbase, leaving memcache only on some local nodes. We store all our global cache in Couchbase.
  • Druid: We use it for large data arrays when working with RTB exchanges. It actually has a lot in common with ClickHouse, but we haven't consolidated on a single tool so far.
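
As a tiny illustration of the Redis counter point above, here is a sketch using the go-redis client; the key name and address are made up, but the trick is simply that INCR is atomic on the server:

```go
// Single-threaded-counter pattern: Redis executes INCR atomically,
// so concurrent clients never race on a read-modify-write cycle.
// Key name and address are illustrative assumptions.
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Each call increments server-side; no race on the client.
	shown, err := rdb.Incr(ctx, "impressions:app:42").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("impressions so far:", shown)
}
```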

You might think we overcomplicate things, but that's not the case. First of all, Appodeal has several development teams and many sub-projects within the project. Secondly, we are not alone in this: many ad tech companies use a whole stack of different technologies within a single company.

Is that it? How do you monitor this?

No, that's not the full story; we have a lot more interesting stuff going on. For instance, since the data flows are quite large, they have to be queued. We use Kafka for that purpose. It's a great, reliable solution written in Scala that has never let us down so far.

The only requirement for a consumer here is to process the queue faster than it grows. It's a simple and obvious rule, and we mostly use GoLang for this purpose, as in the sketch below. Keep in mind, though, that such a server must have more than enough RAM.
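
A minimal sketch of such a consumer, assuming the segmentio/kafka-go client and placeholder broker, topic, and group names:

```go
// Sketch of a Go consumer that must drain a Kafka topic faster than
// producers fill it; otherwise consumer lag grows without bound.
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka-1:9092"}, // placeholder broker
		Topic:   "impressions",            // placeholder topic
		GroupID: "stats-processor",        // placeholder consumer group
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Printf("read error: %v", err)
			break
		}
		// This step must stay cheaper than the incoming rate.
		process(msg.Value)
	}
}

func process(payload []byte) {
	// placeholder for parsing and handing off to the OLAP pipeline
	_ = payload
}
```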

To keep track of all this stuff, it is necessary to monitor and log literally everything. To help us do that, we use the following solutions:

  • NewRelic: A time-tested solution that integrates very well with Ruby on Rails and with GoLang microservices. Its only drawback is the price, so we don't use it across the board; we mostly try to replace it with manually gathered metrics pushed into Grafana.
  • Zabbix: A good tool for real-time monitoring of everything happening on our servers.
  • Statsd + Grafana: A great combination for gathering our internal metrics, except that we have to set up everything ourselves and "duplicate" NewRelic's out-of-the-box functionality (a minimal example follows this list).
  • Fluentd + ElasticSearch + Kibana: We put pretty much everything into logs, from slow PostgreSQL queries to some of Rails' system messages. An ElasticSearch-based setup with Kibana on top gathers all logs in one place and lets us search through them.
  • Airbrake: An integral part of this process is collecting errors along with their stack traces. We are currently migrating from Airbrake to a free solution to save money.
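
As an illustration of the Statsd part, here is a minimal sketch that pushes a counter over UDP using the plain-text statsd line protocol; the host, port, and metric name are made up:

```go
// Push a counter to statsd over UDP using the "name:value|type" protocol;
// Grafana then reads the aggregated metric from the statsd backend.
package main

import (
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("udp", "statsd.internal:8125") // placeholder host
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// "|c" marks a counter; "|ms" would be a timing, "|g" a gauge.
	fmt.Fprintf(conn, "appodeal.api.requests:%d|c", 1)
}
```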

It's important to understand that well-built monitoring is your eyes and ears. Guesswork is a waste of time; you need to be able to see what's happening on your servers at any given moment. That's why the stability and reliability of your product largely depend on how well you build your metric gathering and visualization systems.

By the way, speaking of reliability, we have several staging servers where we perform trial rollouts and test releases. We keep them steadily under load by sending them a partial copy of actual traffic, and every week we synchronize the databases between production and staging. This gives us a kind of mirror for testing things that can't be verified locally and for pinpointing problems at the load-test level.
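
A rough sketch of the traffic-mirroring idea (not our actual setup): serve every request normally and asynchronously replay a sampled fraction against a staging host; the 5% rate and the staging URL are placeholders:

```go
// Mirror a sampled fraction of production traffic to a staging host.
// The production response is never affected by the mirrored copy.
package main

import (
	"bytes"
	"io"
	"math/rand"
	"net/http"
)

const mirrorRate = 0.05 // replay ~5% of requests to staging (assumption)

func mirrorMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		r.Body = io.NopCloser(bytes.NewReader(body)) // restore body for the real handler
		next.ServeHTTP(w, r)

		if rand.Float64() < mirrorRate {
			go func() {
				// Fire-and-forget copy to the staging environment.
				req, err := http.NewRequest(r.Method, "https://staging.example.com"+r.URL.RequestURI(), bytes.NewReader(body))
				if err != nil {
					return
				}
				req.Header = r.Header.Clone()
				if resp, err := http.DefaultClient.Do(req); err == nil {
					resp.Body.Close()
				}
			}()
		}
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", mirrorMiddleware(mux))
}
```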

Is it really that complicated?

Yes. In Ashlee Vance's book Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future, Jeff Hammerbacher, an early Facebook engineer, is quoted as saying: "The best minds of my generation are thinking about how to make people click ads… That sucks."

Here’s a short description of what Appodeal does:

  • We are integrated with 60+ ad networks and DSPs. We automatically register applications in these networks and tune various parameters so that the networks perform at their best. Not every network has the required APIs, so this is where bots come into play.
  • Each network pays fees for the ads displayed to users. These fees have to be collected, split according to various parameters, and processed, and this happens in a loop. Once again, bots are used here and there for that purpose.
  • To maximize a publisher's revenue, we make ad networks compete by building so-called "waterfalls" out of ad offers. A waterfall is ordered by various criteria (e.g. eCPM, the average price per 1,000 impressions) that we predict in a variety of ways: the higher an ad offer sits in the waterfall, the higher its predicted price. The waterfall is sent to the device as often as needed. As you might have guessed, annoying ads that nobody opens are of no interest to anyone. The possible exception is "branded" banners from Coca-Cola, Pepsi, and other corporations known for their image obsession.
  • Part of this interaction happens via the RTB (Real-Time Bidding) protocol.

Here, so-called bidders bargain in real time for the right to display their ads on a given device. It's an interesting topic worthy of a separate article. Many exchanges, such as Google AdExchange, set strict limits on server response time (for example, 50 ms), which makes performance critical; failure to comply usually results in penalties of thousands of dollars. This is exactly what our core, written in Scala, handles in conjunction with Druid.
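
For a feel of what that time budget means in practice, here is a minimal Go sketch (our actual core is in Scala) of a bid handler that answers within a hard deadline or returns "no bid"; the 40 ms internal budget and the scoreBid helper are illustrative assumptions:

```go
// RTB-style bid handler with a hard response deadline: answer fast or
// return "no bid" rather than risk a late (and penalized) response.
package main

import (
	"context"
	"encoding/json"
	"net/http"
	"time"
)

type BidRequest struct {
	ID string `json:"id"`
}

type BidResponse struct {
	ID    string  `json:"id"`
	Price float64 `json:"price"`
}

func bidHandler(w http.ResponseWriter, r *http.Request) {
	// Leave headroom below the exchange's 50 ms limit for network latency.
	ctx, cancel := context.WithTimeout(r.Context(), 40*time.Millisecond)
	defer cancel()

	var req BidRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}

	result := make(chan BidResponse, 1)
	go func() { result <- scoreBid(req) }() // hypothetical pricing step

	select {
	case resp := <-result:
		json.NewEncoder(w).Encode(resp)
	case <-ctx.Done():
		// Better to return "no bid" than to answer after the deadline.
		w.WriteHeader(http.StatusNoContent)
	}
}

func scoreBid(req BidRequest) BidResponse {
	return BidResponse{ID: req.ID, Price: 0.42} // dummy price
}

func main() {
	http.HandleFunc("/bid", bidHandler)
	http.ListenAndServe(":8080", nil)
}
```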

  • Everyone wants to know what's going on, so our customers (and we ourselves) want to know who has seen which ads, when, and why. For that reason we queue all the data with Kafka, then gradually process it and load it into an OLAP database (ClickHouse), roughly as sketched below. Many people think PostgreSQL could handle this task just as well as the various "hip" solutions, but that is debatable. PostgreSQL is good, but the canonical approach of building indices for fast data access breaks down once the number of fields used for filtering and sorting exceeds 10 and the number of stored records approaches 1 billion: you simply run out of memory to store all those indices, or run into problems updating them. In any case, for analytical queries you won't reach the performance levels demonstrated by column-oriented solutions.
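
Here is a rough sketch of the batching idea behind that pipeline: impressions coming off the queue are buffered and flushed in large batches, since column stores like ClickHouse prefer a few big inserts over many small ones. The flushBatch function is a placeholder rather than a real driver call:

```go
// Buffer events from the queue and flush them in large batches,
// either when the buffer fills up or on a periodic timer.
package main

import (
	"fmt"
	"time"
)

type Impression struct {
	AdID      string
	Country   string
	Timestamp time.Time
}

const batchSize = 10000 // illustrative batch size

func main() {
	buffer := make([]Impression, 0, batchSize)
	ticker := time.NewTicker(5 * time.Second)
	events := make(chan Impression) // fed by the Kafka consumer in practice

	for {
		select {
		case ev := <-events:
			buffer = append(buffer, ev)
			if len(buffer) >= batchSize {
				flushBatch(buffer)
				buffer = buffer[:0]
			}
		case <-ticker.C:
			// Time-based flush so a quiet topic still drains.
			if len(buffer) > 0 {
				flushBatch(buffer)
				buffer = buffer[:0]
			}
		}
	}
}

func flushBatch(rows []Impression) {
	// Placeholder: in practice, a single multi-row INSERT into ClickHouse.
	fmt.Printf("flushing %d rows\n", len(rows))
}
```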

Summary

In this article I tried to briefly describe what we do and how we store and process data. Feel free to share in the comments which stack you're using, and ask questions; we're happy to share our experience with you.
