Utilizing the Wayback Machine for SEO
The Wayback Machine from the non-profit Internet Archive organization is one of the coolest tools that I use on a regular basis. The tool isn’t intended for SEO but you can use it for search engine optimization with great results.
Brief History of the Wayback Machine
The Wayback Machine is a digital archive of the World Wide Web. It was created by the Internet Archive, a non-profit organization based in San Francisco, California, which also archives other media such as books, movies and music. The service lets users view archived versions of web pages over time, and it has been archiving pages since 1996.
Generally speaking, the crawlers revisit sites every few weeks or months and archive a new version if the content has changed. Prominent webpages like the homepages of Amazon.com and Google.com appear to be archived 30 or more times per day on average. The intent is to capture and archive content that would otherwise be lost whenever a site is changed or closed down. Their grand vision is to archive the entire Internet!
Wayback Machine Interface
The tool is incredibly easy to use. Simply enter a URL and you will be presented with a graph and calendar showing when and how often that URL has been archived over time.
History of Amazon.com
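The same "how often was this URL archived" view can be approximated in code. What follows is only a minimal sketch, assuming the Internet Archive's public CDX endpoint at web.archive.org/cdx/search/cdx and its url, output, fl, from and to query parameters. The function names (build_cdx_url, captures_per_year, fetch_captures) are my own and hypothetical, and the network call is defined but never run here.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(page_url, year_from=None, year_to=None):
    """Build a CDX query that lists capture timestamps for page_url."""
    params = {"url": page_url, "output": "json", "fl": "timestamp,statuscode"}
    if year_from is not None:
        params["from"] = str(year_from)
    if year_to is not None:
        params["to"] = str(year_to)
    return CDX_ENDPOINT + "?" + urlencode(params)


def captures_per_year(rows):
    """Count captures per year from CDX JSON rows (first row is the header)."""
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    ts_index = header.index("timestamp")
    # Timestamps look like YYYYMMDDhhmmss, so the first 4 characters are the year.
    counts = Counter(row[ts_index][:4] for row in data)
    return dict(sorted(counts.items()))


def fetch_captures(page_url, year_from=None, year_to=None):
    """Network call (not executed in this sketch): download and decode the rows."""
    with urlopen(build_cdx_url(page_url, year_from, year_to), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body.strip() else []
```

Calling captures_per_year(fetch_captures("amazon.com", 2006, 2015)) would then give a rough per-year capture count, similar in spirit to the graph in the interface. For heavily archived pages the response can be very large, so narrowing the year range is a sensible precaution.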
Wayback Machine in Action
The two screenshots below of Amazon.com’s homepage in 2006 and in 2015 show just how powerful this tool is:
Amazon.com in 2006
Amazon.com in 2015
From the archived webpage you can even attempt to click through to other archived pages or view the page source, which means that the archived page is not simply a static image or screenshot.
Usage in SEO
The Wayback Machine has three major uses in SEO:
- Researching older links that may have disappeared
- Tracking website modifications that resulted in traffic changes
- Using archived pages as evidence
Researching older links that may have disappeared
- Combined with a broken link checker, the Wayback Machine lets you see what a broken link used to point to, which is far more insightful than trying to decipher what www.domainname.com/2014/product/1 was. This allows you, for example, to 301 redirect the URL to the correct page with more confidence. You can also ask why a valuable page or set of pages no longer links to your site, and give the person or organization in control of that website a visual of the original page.
- You can use the Wayback Machine to help revive deleted content. In most cases this content will be backed up in the cloud and/or offline, but I’ve found that quick formatting or wording changes that make it to the HTML version of the page sometimes aren’t made in the corresponding word processing document, so the latest archived version may be the most recent version of the content.
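For a list of broken URLs coming out of a link checker, looking up each one by hand gets tedious. Here is a hedged sketch of how the lookup might be automated, assuming the Internet Archive's availability endpoint at archive.org/wayback/available and the JSON shape it returns (an archived_snapshots object with a closest entry). The helper names are hypothetical, and the lookup function that touches the network is defined but not called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(broken_url, timestamp=None):
    """Build a query for the snapshot closest to timestamp (YYYYMMDD...)."""
    params = {"url": broken_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(broken_url, timestamp=None):
    """Network call (not executed in this sketch)."""
    with urlopen(build_availability_url(broken_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

A result of None simply means no snapshot was reported for that URL, which is worth recording next to the broken link before deciding where to redirect it.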
Tracking website modifications that resulted in traffic changes
- If you or your team were responsible for a site architecture overhaul, the Wayback Machine can be invaluable if you did not take screenshots prior to the redesign. Perhaps there is a correlation between a particular navigation structure and a sudden drop or increase in organic traffic, and you need to recall what the site looked like before the change.
- Combined with a file or document comparison tool, you can compare the code of two archived versions side by side and highlight the differences. Again, perhaps there are major or specific website changes that correspond to changes in organic traffic performance.
- You may also want to see how the site and competitor websites have evolved over time, and judge where you think the marketplace is headed in terms of SEO, marketing and development.
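The side-by-side comparison mentioned above does not need a dedicated tool. As a minimal sketch, Python's standard difflib module can produce a unified diff of two saved page sources. The function name diff_snapshots and the labels are hypothetical, and the two HTML strings are assumed to have been saved from archived pages beforehand.

```python
import difflib


def diff_snapshots(old_html, new_html, old_label="before", new_label="after"):
    """Return unified-diff lines describing how the page source changed."""
    old_lines = old_html.splitlines()
    new_lines = new_html.splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # inputs have no trailing newlines, so add none
        )
    )


# Tiny made-up example: a navigation link changed between two versions.
old_page = "<nav>\n<a href='/books'>Books</a>\n</nav>"
new_page = "<nav>\n<a href='/kindle'>Kindle</a>\n</nav>"
for line in diff_snapshots(old_page, new_page):
    print(line)
```

Lines starting with "-" exist only in the older version and lines starting with "+" only in the newer one, which makes navigation or title changes easy to line up against a traffic timeline. Archived pages may also contain markup added by the archive itself, so some differences can be noise rather than real site changes.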
Using archived pages as evidence
- You can use the date of an archived page as proof that you did not make a change to a website that had, or is perceived to have had, a negative impact, for example because you or your team did not yet have access to the site when the change was made.
- You could also use an archived page as proof of link acquisition for a link that was acquired after the webpage went live.
The above assumes, of course, that the webpages in question are archived. I have come across instances of inner pages not being archived, especially for less popular sites. I have also come across a few sites that block the Wayback Machine altogether in their robots.txt file. Regardless, given that 455 billion web pages are currently archived by the Wayback Machine, especially top-level pages from what I can see, it has plenty to offer as a quality SEO tool.
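When an archived page is used as evidence, the capture date is what matters, and it is embedded in the snapshot URL itself. The sketch below shows one way to pull it out, assuming the usual /web/YYYYMMDDhhmmss/ pattern in snapshot URLs and assuming the timestamp is in UTC. The function names and the cutoff date are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed snapshot URL pattern: .../web/<14-digit timestamp>/<original URL>
SNAPSHOT_RE = re.compile(r"/web/(\d{14})")


def snapshot_time(snapshot_url):
    """Extract the capture time from a Wayback Machine snapshot URL."""
    match = SNAPSHOT_RE.search(snapshot_url)
    if match is None:
        raise ValueError("no 14-digit timestamp found in: " + snapshot_url)
    parsed = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)  # UTC is an assumption


def archived_before(snapshot_url, cutoff):
    """True if the snapshot was captured before the given cutoff datetime."""
    return snapshot_time(snapshot_url) < cutoff
```

For instance, if your team gained access to a site on a given date, archived_before(snapshot_url, that_date) returning True for a snapshot that already shows the problematic change supports the claim that the change predates your involvement.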