Removing analytics clutter from campaign URLs
Long, complex URLs stuffed with query string parameters. We web developers are responsible for a fair few of those, although with the increasing adoption of URL Rewriting they're less visible than once they were.
Against this trend are the URLs which Google Analytics (GA) encourages its users to deploy to track specific campaigns, such as how many visitors arrived via a particular marketing link, or from an RSS reader. I'm sure you've seen them: http://www.example.com/story.html?utm_source=blah&utm_medium=blah&utm_term=blah&utm_content=blah&utm_campaign=blah
Nothing wrong with this, you might say. It may look like junk to ordinary folk, but it's doing necessary work.
Except that, unlike the query string identifiers an app needs to retrieve the correct product page and so on, these campaign parameters aren't an intrinsic part of the URL - they're just there for the benefit of the site owner wanting to measure activity.
Still, no harm done to the visitor... Unless they want to do something with the URL, such as bookmark it or copy and paste it into another application. If they don't want all that "utm" gunk sticking to it, they'll have to clean it off manually.
Wouldn't it be nice if we could save them that bother by cleaning up our own URLs, whilst still allowing GA to do its work?
Using JavaScript to manipulate the URL
It's certainly possible to make changes to the current URL once a page has loaded, and Paul Irish has published a nifty script which appears to do the job. Unfortunately it relies on the HTML5 History API, which you can't use in IE9 and below (which in practice means all IE versions at the time of writing).
For less capable browsers we can use the window.location object. The following will remove the entire query string from the current window's URL:
window.location.search = "";
Trouble is, if you add that line of code to your page's document.ready() handler, you'll find yourself in an infinite loop—because changing the query string using this method causes a full page refresh. Even if we add some conditions so that it fires just once to get rid of the "utm" params, you'll still have an additional page load every time. Not good for the user or the analytics.
However, there is a property of window.location that can be manipulated without a page refresh: the hash, i.e. the portion after the # sign.
window.location.hash = "";
This will leave the trailing # sign which is not ideal, but anything after it will be stripped in any browser without triggering a page reload.
Configuring GA to use hashes instead of query strings
But what use is this? The GA parameters are in the search property/query string, not the hash. True, but remember GA is also JavaScript and it's quite easy to tell it to grab the parameters from the hash using the _setAllowAnchor method in your configuration "snippet".
var _gaq = _gaq || [];
_gaq.push(
['_setAccount','YOUR-GA-KEY']
,['_setAllowAnchor',true]
,['_trackPageview']
);
This allows URLs with a hash instead of a question mark separating the campaign parameters to be tracked:
http://www.example.com/story.html#utm_source=blah&utm_medium=blah&utm_term=blah&utm_content=blah&utm_campaign=blah
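If you already have campaign links in the query string form, converting them by hand is tedious. The following is a minimal sketch of a helper that moves any utm_ parameters from the query string into the hash. The name toHashCampaignUrl is hypothetical, and the choice to place an existing fragment ahead of the utm parameters is an assumption made for illustration rather than anything GA prescribes.

```javascript
// Hypothetical helper: move utm_* query parameters into the hash portion of a URL.
function toHashCampaignUrl(url) {
    // Split off any existing fragment first
    var hashIndex = url.indexOf('#');
    var fragment = hashIndex === -1 ? '' : url.slice(hashIndex + 1);
    var base = hashIndex === -1 ? url : url.slice(0, hashIndex);

    var queryIndex = base.indexOf('?');
    if (queryIndex === -1) { return url; } // no query string, nothing to move

    var path = base.slice(0, queryIndex);
    var params = base.slice(queryIndex + 1).split('&');
    var keep = [];
    var utm = [];
    for (var i = 0; i < params.length; i++) {
        if (params[i] === '') { continue; }
        if (params[i].indexOf('utm_') === 0) {
            utm.push(params[i]);  // campaign parameter: goes to the hash
        } else {
            keep.push(params[i]); // anything else stays in the query string
        }
    }
    if (utm.length === 0) { return url; } // no campaign parameters found

    // Assumption: an existing fragment stays first, followed by the utm parameters
    var hashParts = fragment ? [fragment].concat(utm) : utm;
    return path + (keep.length ? '?' + keep.join('&') : '') + '#' + hashParts.join('&');
}
```

Parameters that don't start with utm_ are left in the query string, so links that rely on them for other purposes keep working.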
A cross-browser solution
With this in place and our inbound campaign links using hashes instead of question marks, we just need to implement a function to detect and strip out any "utm" parameters using the best method the browser supports: the HTML5 window.history object or plain old hash manipulation:
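var removeUtms = function(){
    var l = window.location;
    // Only act when the hash carries campaign parameters
    if( l.hash.indexOf( "utm" ) != -1 ){
        if( window.history.replaceState ){
            // HTML5 History API: drop the hash entirely, keep the path and query string
            history.replaceState({},'', l.pathname + l.search);
        } else {
            // Older browsers: empty the hash (a trailing # remains)
            l.hash = "";
        }
    }
};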
var _gaq = _gaq || [];
_gaq.push(
['_setAccount','YOUR-GA-KEY']
,['_setAllowAnchor',true]
,['_trackPageview']
);
_gaq.push( removeUtms );
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
To be absolutely sure that you don't zap the parameters before GA has had a chance to register them, it's important to call the function using the _gaq.push() method, which will ensure it fires after the page view has been tracked.
Note also that any query string parameters—which might be present for other purposes—won't be touched. Only the hash values are stripped.
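One way to check both branches without opening several browsers is to run the same logic against stand-in objects. This is only a sketch: makeRemoveUtms is a hypothetical wrapper, not part of the script above, and the fake location and history objects are assumptions that mimic just the properties the function reads.

```javascript
// Hypothetical factory: same logic as removeUtms, but with the location and
// history objects passed in so it can be exercised outside a browser.
function makeRemoveUtms(loc, hist) {
    return function () {
        if (loc.hash.indexOf('utm') !== -1) {
            if (hist && hist.replaceState) {
                hist.replaceState({}, '', loc.pathname + loc.search);
            } else {
                loc.hash = '';
            }
        }
    };
}

// Stand-in objects (assumptions for illustration, not real browser objects)
var replaced = [];
var modernLoc = { pathname: '/story.html', search: '?id=5', hash: '#utm_source=rss' };
var modernHist = { replaceState: function (state, title, url) { replaced.push(url); } };
makeRemoveUtms(modernLoc, modernHist)();
// replaced[0] is now '/story.html?id=5': the query string survives, the hash is gone

var legacyLoc = { pathname: '/story.html', search: '?id=5', hash: '#utm_source=rss' };
makeRemoveUtms(legacyLoc, {})();
// legacyLoc.hash is now '' and legacyLoc.search is still '?id=5'
```

In a real page you would pass window.location and window.history, which gives the same behaviour as the removeUtms function shown earlier.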
Anchors
Update October 2013: Ilhan asks a good question in the comments: what if your campaign URL includes a named anchor? In other words you want to link to a specific part of a page by appending the name of an anchor or id, e.g. campaign.html?#bottomofpage. The script above removes everything after the #, including any named anchors.
An anchor must come immediately after the # symbol, before the utm parameters, so we can adapt our function to preserve any non-utm string in that initial position.
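var removeUtms = function(){
    var l = window.location;
    if( l.hash.indexOf( "utm" ) != -1 ){
        // Capture a leading anchor such as "#bottomofpage"; a hash starting with "#utm" yields no match
        var anchor = l.hash.match(/#(?!utm)[^&]+/);
        anchor = anchor ? anchor[0] : '';
        if( !anchor && window.history.replaceState ){
            // No anchor to keep: remove the hash entirely
            history.replaceState({},'', l.pathname + l.search);
        } else {
            // Keep only the anchor (or empty the hash in older browsers)
            l.hash = anchor;
        }
    }
};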
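To see what the regular expression in the function above keeps and discards, it can be tried in isolation. This is a small sketch; extractAnchor is a hypothetical name introduced only for the demonstration, and the sample hashes are made up.

```javascript
// Hypothetical helper wrapping the same regular expression used in removeUtms.
function extractAnchor(hash) {
    // "#" not immediately followed by "utm", then everything up to the first "&"
    var match = hash.match(/#(?!utm)[^&]+/);
    return match ? match[0] : '';
}

var withAnchor = extractAnchor('#bottomofpage&utm_source=blah&utm_medium=blah');
// withAnchor is '#bottomofpage'

var withoutAnchor = extractAnchor('#utm_source=blah&utm_medium=blah');
// withoutAnchor is '' because the hash starts with "#utm"
```

The negative lookahead (?!utm) is what stops a hash beginning with a campaign parameter from being mistaken for an anchor.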