Our battle with fraud

By our estimates, about half of our advertising product’s traffic was fraudulent in the spring of 2021. We used a third-party tool to filter it, but we had to pay for it and could not influence the “magic” under the solution’s hood.

Taking matters into our own hands, we worked out the details and built our own filtering system, which raised the conversion rate on advertisers’ websites. After disconnecting fraudulent partners (the bot farms), we reduced the share of fraud to 10%.

In this article, we describe the basics of our approach, although we will not disclose all our secrets.

Briefly about the solution

Our campaign manager supports two kinds of push ads: classic push notifications displayed by the operating system, and in-page push, which appears inside an iframe on a page in the browser. The user sees the ad, clicks it, and lands on our backend. That’s where all the fun happens.

Why is it necessary to fight fraud?

We filter out two types of traffic.

  1. Traffic from bots that will never buy anything from the advertiser. We divide bots into two types:
  • Automated scripts built with browser testing tools such as PhantomJS, Selenium, Puppeteer, Playwright, and others. With their help, the browser itself “presses” certain buttons, emulating ad views;
  • JavaScript code embedded in a legitimate page that the user is viewing. The code pretends that the user has loaded and clicked the ad and visited the advertiser’s website.

  2. Traffic in the wrong price range. Push traffic is expensive, and advertisers pay accordingly. Fraudulent sources pass off other, cheaper types of advertising traffic as push to earn more. We filter out such publishers to protect the advertiser’s interests, even though legitimate users can also reach us through this channel. After all, our client should get what they pay for.

Initially, we used a third-party traffic filtering tool to go to market quickly. But for us, it turned out to be the wrong choice for several reasons:

  • the solution runs on its own servers;
  • there is obscure magic under the hood;
  • it adds one more external dependency to load;
  • the tool causes considerable traffic losses;
  • it produced false positives;
  • we ran into bugs that turned filtering off for us.

The main reason, however, was that the third-party tool did not take the specifics of push advertising into account, so it could not handle push-specific fraud cases, including traffic in the wrong price range. It can tell whether the user is a bot or not, and that’s it. We tried more expensive solutions, but they did not suit us either.

As a result, we had to dig into the topic ourselves.

Redirects and Intermediate Pages

After the click, the user is redirected through an intermediate page. On this page, we collect browser settings and other available data to evaluate whether the traffic is good or bad. The intermediate page sends all the collected data to the backend, where we decide whether to show our advertiser’s ad in response to this request. Rejected visitors see a stub page instead. As a result, the advertiser sees a high conversion rate and values the tool for delivering high-quality traffic.
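
The decision flow described above can be sketched as a backend function that runs every enabled trap against the data posted by the intermediate page and picks a redirect target. This is a minimal sketch, not the production implementation: `ClickReport`, `decide`, `no_params_trap`, and both URLs are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical redirect targets.
ADVERTISER_URL = "https://advertiser.example/landing"
STUB_URL = "https://ads.example/stub"


@dataclass
class ClickReport:
    """Hypothetical shape of what the intermediate page posts to the backend."""
    source_id: str
    params: Dict[str, object] = field(default_factory=dict)


# A trap returns True when it considers the click fraudulent.
Trap = Callable[[ClickReport], bool]


def decide(report: ClickReport, traps: List[Trap]) -> str:
    """Return the redirect URL: the stub if any trap fires, else the advertiser."""
    if any(trap(report) for trap in traps):
        return STUB_URL
    return ADVERTISER_URL


def no_params_trap(report: ClickReport) -> bool:
    """Toy trap: the report arrived without any collected parameters."""
    return not report.params
```

Keeping each trap a separate function makes it easy to enable, disable, or experiment with one rule without touching the others.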

We implemented several variants of the intermediate page:

  • The minimal page collects nothing and is not used regularly (0% of clicks are sent to it). We implemented it to understand how the mere fact of an “extra” redirect to an intermediate page affects a user’s transition to the advertiser’s site. It turned out that each redirect loses about 5–7% of clicks.
  • The light page is used most of the time (95% of the traffic goes to it). This version collects only the basic parameters and loses about 16% of users. They may be bots that cannot execute JavaScript, or people on mobile devices with poor connections who don’t wait for the intermediate page to load and render. We accept these losses, since neither group would bring the advertiser any profit.
  • The normal page collects about 140 different parameters. We use it to monitor partner traffic in real time and spot fraud. We also used to run an external fraud detection system on it, which helped us test some of our hypotheses. Collecting so many parameters and running a third-party system are not free in terms of traffic: we lose about 32% of it.

Depending on the task, we redirect traffic between the intermediate pages. Usually, we use Light. When we need to experiment with a new trap, we switch on Normal. In some exceptional cases, we bring back Minimal.
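
Routing clicks between the page variants amounts to a weighted random choice. A minimal sketch, assuming the shares given for the variants: Light gets 95% and Minimal 0%, while assigning the remaining 5% to Normal is an assumption, since the text does not state it. `PAGE_WEIGHTS` and `pick_page` are hypothetical names.

```python
import random
from typing import Dict

# Share of clicks per intermediate page variant. Light (95%) and Minimal (0%)
# follow the text; giving the remaining 5% to Normal is an assumption.
PAGE_WEIGHTS: Dict[str, float] = {"minimal": 0.0, "light": 0.95, "normal": 0.05}


def pick_page(rng: random.Random, weights: Dict[str, float] = PAGE_WEIGHTS) -> str:
    """Choose the intermediate page for one click, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Switching on Normal for an experiment, or bringing back Minimal, then only means changing the weights. Passing in a seeded `random.Random` keeps the assignment reproducible in tests.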

How filtering is carried out

Revealing all the traps would give bot growers a tool to bypass our anti-fraud system. But we can share a few examples that illustrate the approach.

Time trap (Low time to click)
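
The text does not describe this trap beyond its name. A minimal sketch, assuming it compares the delay between showing the ad and the click against a fixed threshold; the one-second value of `MIN_TIME_TO_CLICK` and the function name are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: the text does not say what counts as "too fast".
MIN_TIME_TO_CLICK = timedelta(seconds=1)


def low_time_to_click(shown_at: datetime, clicked_at: datetime,
                      threshold: timedelta = MIN_TIME_TO_CLICK) -> bool:
    """Flag a click that came implausibly soon after the ad was shown.

    A negative delay (click logged before the impression) is flagged as well.
    """
    return clicked_at - shown_at < threshold
```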

High CTR trap

Note that this trap targets a traffic source rather than a specific visitor.
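
Because the trap judges a source rather than a visitor, it works on aggregated counts. A minimal sketch, assuming events arrive as `(source_id, kind)` pairs; the CTR ceiling and the minimum-impressions floor are hypothetical values, not the thresholds actually used.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

MAX_SOURCE_CTR = 0.2     # hypothetical CTR ceiling
MIN_IMPRESSIONS = 1000   # hypothetical: ignore sources with too little data


def suspicious_sources(events: Iterable[Tuple[str, str]],
                       max_ctr: float = MAX_SOURCE_CTR,
                       min_impressions: int = MIN_IMPRESSIONS) -> Set[str]:
    """Return sources whose click-through rate exceeds max_ctr.

    events: (source_id, kind) pairs, where kind is "impression" or "click".
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for source_id, kind in events:
        if kind == "impression":
            impressions[source_id] += 1
        elif kind == "click":
            clicks[source_id] += 1
    return {
        source for source, shown in impressions.items()
        if shown >= min_impressions and clicks[source] / shown > max_ctr
    }
```

The minimum-impressions floor keeps small sources from being flagged on a handful of clicks.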

Window size trap

Curiously, this rule also produced apparent errors: the parameters were 0, yet the user reached the advertiser’s site and performed target actions. Later, we found an explanation: a new generation of bots that periodically visit the advertiser’s website so that ML tools do not classify them as fraud. We need to ban them, but there is a high risk of false positives for legitimate users. Our team has not yet reached a firm opinion on this rule, so we continue to experiment.
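
The window size rule can be sketched as a check on the dimensions the intermediate page reports. The parameter names in `WINDOW_KEYS` are an assumption (the text only says “the parameters are 0”), and a missing value is deliberately not treated as zero.

```python
from typing import Dict

# Hypothetical parameter names; the text only says "the parameters are 0".
WINDOW_KEYS = ("outerWidth", "outerHeight", "innerWidth", "innerHeight")


def zero_window_size(params: Dict[str, object]) -> bool:
    """Flag a click whose reported window dimensions include an explicit zero.

    A missing key is not treated as zero, so pages that did not collect the
    value are left alone.
    """
    return any(params.get(key) == 0 for key in WINDOW_KEYS)
```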

Object Presence Hook

A legitimate user must have a “window.chrome” object in the window (i.e., it must not be undefined). At the same time, many bot drivers hide the browser and, for some reason, remove this object. So we filter traffic if this property is undefined. We mention this hook only as an example: it is now disabled because it started producing a lot of false positives on Firefox.
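
A sketch of the hook as described: the intermediate page would report whether `window.chrome` is defined (the `has_window_chrome` field is hypothetical), and the backend flags the click when it is not. The optional `only_chromium_ua` guard is an illustrative addition showing one way to avoid the Firefox false positives, since Firefox does not define `window.chrome`; it is not the authors’ fix, and it relies on a User-Agent string that partners can modify.

```python
from typing import Dict


def missing_window_chrome(params: Dict[str, object], user_agent: str,
                          only_chromium_ua: bool = False) -> bool:
    """Flag a click when the page reported that window.chrome is undefined.

    params["has_window_chrome"] is a hypothetical boolean sent by the page;
    a missing value is not flagged. With only_chromium_ua=True the check is
    skipped for User-Agents that do not claim to be Chrome/Chromium.
    """
    if only_chromium_ua and "Chrome/" not in user_agent:
        return False
    return params.get("has_window_chrome") is False
```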

Not all rules go into production. For example, we assumed that the ad request, the click, and the icon request should all carry the same User-Agent. We set such a trap but got too many false positives. It turned out that many partners can pass arbitrarily modified User-Agent strings.
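
For reference, the rejected User-Agent rule reduces to a set comparison. A minimal sketch with a hypothetical function name; because partners rewrite User-Agent strings, a check like this fires on legitimate traffic, which is why the rule did not go to production.

```python
def user_agent_mismatch(request_ua: str, click_ua: str, icon_ua: str) -> bool:
    """Flag a click when the ad request, the click, and the icon request did
    not all carry the same User-Agent string (surrounding whitespace ignored)."""
    return len({ua.strip() for ua in (request_ua, click_ua, icon_ua)}) > 1
```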

Filtration quality control

The most straightforward approach is to open ClickHouse and inspect the parameters of the filtered traffic manually.

We can also assess traffic quality from the advertiser’s side by measuring conversion: a user clicks an ad and then performs certain actions on the site. When the target action is completed, the advertiser notifies us.

Our campaign manager has collected enough data to analyze which traffic does not bring conversions. This often indicates bot-generated traffic. Conversion stats help us look for new traps and check how well the old ones work.
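
Conversion postbacks make it possible to rank sources by how rarely their clicks convert. A minimal sketch, assuming clicks and conversions are available as lists of source IDs (for example, exported from ClickHouse); `low_conversion_sources`, `min_clicks`, and `max_rate` are hypothetical names and thresholds. The output is a list of candidates for manual review, not an automatic ban.

```python
from collections import Counter
from typing import Iterable, List


def low_conversion_sources(clicks: Iterable[str], conversions: Iterable[str],
                           min_clicks: int = 500, max_rate: float = 0.001) -> List[str]:
    """Return sources with at least min_clicks clicks whose conversion rate
    is at or below max_rate, sorted by source ID.

    clicks / conversions: one source ID per click or per conversion postback.
    """
    click_counts = Counter(clicks)
    conv_counts = Counter(conversions)  # missing sources count as 0
    return sorted(
        source for source, n in click_counts.items()
        if n >= min_clicks and conv_counts[source] / n <= max_rate
    )
```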

In general, our approach to filtering is fairly lenient: we filter out traffic only if we are 99% sure it is fraud.

Moving forward

To analyze clusters of traffic singled out by specific parameters, we use CatBoost, a fairly effective classifier from Yandex that works well with categorical features.

We feed traffic data into it and get back a list of significant parameters from which we can build new rules. CatBoost can also compute feature importance, which reveals a lot of interesting things, for instance, traffic properties that indicate fraud.

Beyond that, nothing is automated: issues with partners are resolved at the management level. We have had cases of shutting down partners that supplied a lot of junk traffic, especially at the beginning. After this “cleaning” of partners, the share of fraud in traffic dropped from half to 10%.

Recently, we have taken a different path: we began to look at the automation tools themselves. While analyzing their source code and plugins, we found several Puppeteer markers. The problem is that bot tools regularly undergo complete refactoring, and pull requests appear that fix the “holes” we found in the bot growers’ scripts. Then we have to study the new version of the bot tool to understand how to catch it. In the same way, we study new browser versions: they expose new properties that can be used to judge traffic quality.
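
The specific markers mentioned above are not disclosed, so this sketch uses two publicly documented automation signals instead: `navigator.webdriver`, which browsers set to `true` when under automation control (per the WebDriver specification), and the `HeadlessChrome` token that headless Chrome has historically put in its default User-Agent. The `webdriver` field name and the function are hypothetical. Stealth plugins exist precisely to hide such signals, which is the cat-and-mouse cycle described above.

```python
from typing import Dict, List


def automation_markers(params: Dict[str, object], user_agent: str) -> List[str]:
    """Return the names of generic automation signals present in one click.

    params["webdriver"] is a hypothetical field holding navigator.webdriver
    as reported by the intermediate page.
    """
    found: List[str] = []
    if params.get("webdriver") is True:
        found.append("navigator.webdriver")
    if "HeadlessChrome" in user_agent:
        found.append("headless-user-agent")
    return found
```

Returning the names of the signals, rather than a single boolean, makes it visible when a new bot-tool release closes one hole but not another.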

PS. Subscribe to our social networks: Twitter, Telegram, FB to learn about our publications and Maxilect news.




We build IT solutions for the AdTech and FinTech industries. Our clients are SMBs across the globe (including the USA, the EU, and Australia).