Once in a while, I need to identify so-called dead pages on a website – that is, pages with no pageviews. Usually, this is necessary when migrating a website to a new CMS. In that case, it’s useful to know if there are pages that can be safely deleted/omitted (just remember proper redirects). A similar use case is when cleaning up a website in order to remove unnecessary content. However, since Google Analytics only track pages that are visited, it’s not possible to find dead pages in Google Analytics alone since those pages will not show up in any reports.
But it’s actually still possible to find dead pages in Google Analytics if Analytics is combined with something else. The basic approach is simply to have a complete list of pages on the website. Then compare that list to the logged pages in Google Analytics. All pages that are not found in Google Analytics are dead pages. Depending on your selected time frame of course.
Get all tracked pages from Google Analytics
First of all, we need a complete list of pages being tracked by Analytics. By default, Google Analytics page tracking tracks the page path; i.e. the part of the URL that comes after the domain name. If the complete URL for a page on the site is e.g.
yourdomain.com/directory/page.html, then the page path is
/directory/page.html. In some cases, the page path is overwritten so it might also contain the domain name. In other cases, the page path is edited in ways that might remove or rename parts of the path. This is usually caused by using filters, so before proceeding you should check if you have any such filters, since you need the precise page paths. (but you definitely have a raw data view, right?)
When that’s in order, go ahead and create a new custom report and use these configurations (or download and install the report here):
- Type = Flat Table
- Dimensions = Page
- Metrics = Pageviews
Then save the report and set your desired date range. The span of the date range depends on how far back in time you’ll want to look for dead pages. Is a page dead if it hasn’t received any pageviews for a month? Three months? A year?
Next, make sure to use the Show rows dropdown at the bottom of the table to show all rows. Then go ahead and export the report using the Excel (xlsx) option in the top navigation bar. If you have more than 5,000 pages, you’ll have to make an export for each 5,000 rows and then combine the Excel files manually.
If done right, you should end up with something like this:
Click image to enlarge
Get all page paths from the website
Next up, we’ll need a complete list of page paths from the website. Paths are usually called something else in most standard CMS solutions; in WordPress it’s called a slug or permalink, and in other systems it may just be called “URL”. If your CMS has a built-in function to export a list of all URLs, then great – do that.
But if using a system that can’t do that (e.g. WordPress), then use a crawler software such as Screaming Frog (my favourite website crawler). Start by opening up Screaming Frog and use these configurations (click on Spider in the Configuration menu):
Then enter you domain name (e.g. analytical42.com) in the address bar and Start the crawl. Depending on the number of pages on your website, the crawl can take anything from a few seconds to several hours. But when complete, navigate to the Internal tab, set the Filter to HTML and hit the Export button – and save the file in Excel (xlsx) format:
Before proceeding, it’s necessary to do a little cleanup in the Screaming Frog export. So open up the file, select the entire A column (which should contain the crawled URLs) and do a search and replace (CTRL + H). Search for your domain name including the protocol; e.g. https://analytical42.com and leave the replace field empty. Hit the Replace all button which should then remove your domain name leaving just the page path for each crawled page.
Combine the two files
Now follow these steps to combine the two files and to prepare the data:
- Create a new blank Excel workbook/file with two sheets; name one sheet GA Pages, and the second sheet Live Pages
- Copy all content from the Dataset1 sheet in the Google Analytics export file, and paste it onto the GA Pages sheet starting in cell A1
- Then copy the A column from the Screaming Frog export file to the Live Pages sheet starting i cell A1 as well
- Go to the GA Pages sheet and select the entire A and B columns
- In the Ribbon in Excel, go to Data, and then select Sort in the Sort & Filter group
- Check My data has headers and Sort by Page, Sort on Values and Order A to Z
- Click OK
Next, we’ll write the formula to identify all live pages that have not been tracked by Google Analytics (these should be our dead pages):
- Go to the Live Pages sheet and select the B-cell next to the first URL (this should be B3 if you’ve followed the steps so far)
- Enter the formula
=VLOOKUP(A3,'GA Pages'!A:B,2,FALSE)and hit Enter
- Auto-fill the entire column with that formula by double clicking the square in the bottom right corner of cell B3
And there you have it! This is how you find dead pages in Google Analytics – pretty simple, right? Every page that has not been tracked by Google Analytics, will have a #N/A error in it’s corresponding B cell. If the cell contains a number, this number tells two things: 1) The page has been tracked, and 2) the number of pageviews that have been tracked:
Get the Excel file
You can download the final Excel file here to check it out yourself: find-dead-pages-google-analytics-excel.xlsx.