Prevent bot traffic from being collected by GA4
If you’ve seen unusual activity tracked in your Google Analytics property, you know how frustrating it is. All your past reports are now marred by these erratic patterns, making it hard to see what’s really happening with web metrics. Your first step is to identify what this bot traffic is. Next, you’ll want to use those characteristics to filter out that traffic in reporting. And if the traffic is continuing to hit your website, setting up a filter so it’s not being logged in your GA4 data is a smart move. This article describes how to do that, leveraging GA4’s “Internal traffic filters” in a way you may not have realized is possible.
In Universal Analytics, we had the ability to filter traffic from a property view based on a variety of dimensions. In GA4 our options for filtering at the property level are a lot more limited. GA4 lets us create ‘Internal traffic filters’ and that’s about it. This feature is poorly named – we can actually use it to filter any traffic based on IP address, internal or otherwise.
What is less obvious and less well-documented is that the actual filtering takes place based on an optional ‘traffic_type’ parameter. This method of filtering traffic requires two steps:
- Add an Internal traffic rule – when you do this you specify an IP address or range of addresses. The rule sets the value of the traffic_type parameter for all incoming events that match the IP address(es).
- Add a Traffic filter – if you set the filter's Type to ‘Internal traffic’, you can label or exclude traffic based on the value of the traffic_type parameter.
But you don’t actually need to do step 1! If you set a value for the traffic_type parameter in your Google Tag, you can create a Traffic filter without creating an Internal traffic rule. This gives you A LOT more power to exclude traffic using Google Tag Manager (GTM). At a high level, this process looks like this:
- Create a traffic_type variable in Tag Manager, using the full capabilities of JavaScript in GTM.
- Add a traffic_type parameter to your GA4 Google Tag that takes the value of your variable.
- Add a Traffic filter in GA4 based on the value you set.
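To make the second step concrete, here is a minimal sketch of what "adding a traffic_type parameter" amounts to. It assumes a plain gtag.js install rather than the GTM interface (in GTM you would instead add a parameter row named traffic_type to the Google Tag and point it at your variable). The helper name buildTagParams and the measurement ID G-XXXXXXX are placeholders of mine, not anything GA4 defines.

```javascript
// Hypothetical helper: builds the parameter object passed to the GA4 Google Tag.
// The function name and "G-XXXXXXX" are placeholders for illustration only.
function buildTagParams(trafficType) {
  var params = {};
  // Attach traffic_type only when there is a value to send.
  // With no value, the hit carries no label and no Traffic filter matches it.
  if (trafficType !== null && trafficType !== undefined && trafficType !== '') {
    params.traffic_type = trafficType;
  }
  return params;
}

// In a gtag.js install, the result would be passed along these lines:
//   gtag('config', 'G-XXXXXXX', buildTagParams('dev'));
console.log(buildTagParams('dev'));  // { traffic_type: 'dev' }
console.log(buildTagParams(null));   // {}
```

Whatever string ends up in traffic_type is the value you would then enter in the GA4 Traffic filter in the third step.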
I use this technique a lot to filter out dev traffic. Developers often work on a version of a site that has a different domain name or URL structure. I create a regex lookup variable in GTM that outputs “dev” or “staging” based on this URL pattern. Then I follow the second and third steps above to exclude the traffic. Note that when the traffic_type variable has a value of null, nothing happens – no harm, no foul.
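The same lookup can be written as a GTM Custom JavaScript variable instead of a regex table. The sketch below shows the logic only; the hostname patterns (dev., localhost, staging.) are assumptions for illustration, so swap in whatever distinguishes your own environments. The names TRAFFIC_TYPE_RULES and trafficTypeFor are hypothetical.

```javascript
// Hypothetical lookup table: patterns are tested against the hostname in order.
// These hostname conventions are assumed; adjust them to match your own sites.
var TRAFFIC_TYPE_RULES = [
  { pattern: /^(dev|localhost)(\.|:|$)/i, value: 'dev' },
  { pattern: /^staging\./i, value: 'staging' }
];

// Returns the traffic_type label for a hostname, or null for normal traffic.
function trafficTypeFor(hostname) {
  for (var i = 0; i < TRAFFIC_TYPE_RULES.length; i++) {
    if (TRAFFIC_TYPE_RULES[i].pattern.test(hostname)) {
      return TRAFFIC_TYPE_RULES[i].value;
    }
  }
  return null; // no match: the hit is left unlabeled
}

// In a GTM Custom JavaScript variable, this would be wrapped roughly as:
//   function() { return trafficTypeFor({{Page Hostname}}); }
console.log(trafficTypeFor('dev.example.com'));      // dev
console.log(trafficTypeFor('staging.example.com'));  // staging
console.log(trafficTypeFor('www.example.com'));      // null
```

Anchoring the patterns at the start of the hostname keeps a production host such as developer.example.com from being labeled as dev by accident.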
I’ve also used this technique to filter out traffic from persistent bots. Identifying bots can be tricky and requires some tenacity. My Hunting for Bots post follows the multi-month process I recently went through to identify and filter a bot that was particularly hard to track down.