Once upon a time there were two nerds at Stanford working on their PhDs. (Now that I think about it, there were probably a lot more than two nerds at Stanford.) Two of the nerds at Stanford were not satisfied with the current options for searching online, so they attempted to develop a better way.
Being long-time academics, they eventually decided to take the way academic papers were organized and apply that to webpages. A quick and fairly objective way to judge the quality of an academic paper is to see how many times other academic papers have cited it. This concept was easy to replicate online because the original purpose of the Internet was to share academic resources between universities. The citations manifested themselves as hyperlinks once they went online. One of the nerds came up with an algorithm for calculating these values on a global scale, and they both lived happily ever after.
Of course, these two nerds were Larry Page and Sergey Brin, the founders of Google, and the algorithm that Larry invented that day was what eventually became PageRank. Long story short, Google ended up becoming a big deal and now the two founders rent an airstrip from NASA so they have somewhere to land their private jets.
Relevance, Speed, and Scalability
Hypothetically, the most relevant search engine would have a team of experts on every subject in the entire world—a staff large enough to read, study, and evaluate every document published on the web so they could return the most accurate results for each query submitted by users.
The fastest search engine, on the other hand, would crawl a new URL the very second it’s published and introduce it into the general index immediately, available to appear in query results only seconds after it goes live.
The challenge for Google and all other engines is to find the balance between those two scenarios: To combine rapid crawling and indexing with a relevance algorithm that can be instantly applied to new content. In other words, they’re trying to build scalable relevance. With very few exceptions, Google is uninterested in hand-removing (or hand-promoting) specific content. Instead, its model is built around identifying characteristics in web content that indicate the content is especially relevant or irrelevant, so that content all across the web with those same characteristics can be similarly promoted or demoted.
This book frequently discusses the benefits of content created with the user in mind. To some hardcore SEOs, Google’s “think about the user” mantra is corny; they’d much prefer to know a secret line of code or server technique that bypasses the intent of creating engaging content.
While it may be corny, Google’s focus on creating relevant, user-focused content really is the key to its algorithm of scalable relevance. Google is constantly trying to find ways to reward content that truly answers users’ questions and ways to minimize or filter out content built for content’s sake. While this book discusses techniques for making your content visible and accessible to engines, remember that this means content constructed with users in mind: innovative, helpful, and designed to serve the query intent of human users. It might be corny, but it’s effective.
That fateful day, the Google Guys capitalized on the mysterious power of links. Although a webmaster can easily manipulate everything (word choice, keyword placement, internal links, and so on) on his or her own website, it is much more difficult to influence inbound links. This natural link profile acts as an extremely good metric for identifying legitimately popular pages.
NOTE: Google’s PageRank was actually named after its creator, Larry Page. Originally, the search engine was called BackRub because of its emphasis on backlinks. The name was later changed to PageRank, a nod both to Larry Page’s last name and to the algorithm’s ability to rank pages.
The original paper describing PageRank and the Google prototype, “The Anatomy of a Large-Scale Hypertextual Web Search Engine” by Sergey Brin and Larry Page, is still available online. If you are interested in reading it, you can find it on Stanford’s website at http://infolab.stanford.edu/~backrub/google.html. It is highly technical, and I have used it on more than one occasion as a sleep aid. It’s worth noting that the original PageRank as described in this paper is only a tiny part of Google’s modern-day search algorithm.
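To make the citation-counting idea concrete, here is a minimal sketch of the original PageRank concept: each page splits its score evenly among the pages it links to, with a damping factor of 0.85 as in the paper. The three-page link graph is made up for demonstration; this is an illustration of the idea, not Google’s production algorithm.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page keeps a small baseline score of (1 - d) / n.
        new_rank = {page: (1 - d) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its score evenly to the pages it links to.
                share = rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += d * share
            else:
                # A page with no outbound links spreads its score evenly.
                for target in pages:
                    new_rank[target] += d * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: "a" links to "b" and "c", and so on.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
ranks = pagerank(graph)
# "c" collects links from both "a" and "b", so it ends up with the
# highest score.
```

Notice that no page can raise its own score directly; its score comes entirely from the pages linking to it, which is exactly why inbound links are harder to manipulate than on-page factors.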
Now wait a second—isn’t this supposed to be a book for advanced SEOs? Then why am I explaining to you the value of links? Relax, there is a method to my madness. Before I am able to explain the more advanced secrets, I need to make sure we are on the same page.
As modern search engines evolved, they began to take into account the link profiles of both a given page and its domain. They found that the relationship between these two indicators was itself a very useful metric for ranking webpages.
Domain and Page Popularity
There are hundreds of factors that help engines decide how to rank a page. In general, those hundreds of factors can be broken into two categories: relevancy and popularity (or “authority”). For the purposes of this demonstration, you will need to completely ignore relevancy for a second. (Kind of like the search engine Ask.com.) Within the category of popularity, there are two primary types: domain popularity and page popularity. Modern search engines rank pages by a combination of these two kinds of popularity metrics, both of which are measurements of link profiles. To rank number one for a given query, you need the highest total amount of popularity on the Internet. (Again, bear with me as we ignore relevancy for this section.)

This becomes very clear once you start looking for patterns in search result pages. Have you ever noticed that popular domains like Wikipedia.org tend to rank for almost everything? That is because they have an enormous amount of domain popularity. But what about a competitor who outranks you for a specific term with a practically unknown domain? That happens when a page has an excess of page popularity.
Figure: Graph showing different combinations of relevancy and popularity metrics that can be used to achieve high rankings.
Although en.wikipedia.org has a lot of domain popularity and get.adobe.com/reader/ has a lot of page popularity, www.awesome.com ranks higher because it has a greater total amount of popularity. This fact, combined with the relevancy metrics discussed later in this chapter, is the essence of Search Engine Optimization. (Shoot! I unveiled it in the first chapter; now what am I going to write about?)
Popularity Top Ten Lists
The top 10 most linked-to domains on the Internet (at the time of writing) are:
- Google.com
- Adobe.com
- Yahoo.com
- Blogspot.com
- Wikipedia.org
- YouTube.com
- W3.org
- Myspace.com
- Wordpress.com
- Microsoft.com
The top 10 most linked-to pages on the Internet (at the time of writing) are:
- http://wordpress.org/
- http://www.google.com/
- http://www.adobe.com/products/acrobat/readstep2.html
- http://www.miibeian.gov.cn/
- http://validator.w3.org/check/referer
- http://www.statcounter.com/
- http://jigsaw.w3.org/css-validator/check/referer
- http://www.phpbb.com/
- http://www.yahoo.com/
- http://del.icio.us/post
Before I summarize I would like to nip the PageRank discussion in the bud. Google releases its PageRank metric through a browser toolbar. This is not the droid you are looking for. That green bar represents only a very small part of the overall search algorithm.
Not only that, but at any given time the TbPR (Toolbar PageRank) value you see may be 60–90 days old or more, and it’s a single-digit representation of what is probably a very long decimal value.
Just because a page has a PageRank of 5 does not mean it will outrank all pages with a PageRank of 4. Keep in mind that major search engines do not want you to reverse engineer their algorithms. As such, publicly releasing a definitive metric for ranking would be idiotic from a business perspective. If there is one thing that Google is not, it’s idiotic.
Google makes scraping (automatically requesting and distributing) its PageRank metric difficult. To get around the limitations, you need to write a program that requests the metric from Google and identifies itself as the Google Toolbar.
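The “identify itself as the Google Toolbar” trick boils down to sending a request with a spoofed User-Agent header. The sketch below shows only that general technique: the URL is a placeholder and the User-Agent string is an assumption, since the real toolbar endpoint also required an undocumented checksum parameter that is not reproduced here.

```python
import urllib.request

# Placeholder URL; the real Toolbar PageRank endpoint required an
# undocumented checksum parameter and is not reproduced here.
url = "http://example.com/"

# Hypothetical toolbar-style User-Agent string, for illustration only.
req = urllib.request.Request(
    url,
    headers={"User-Agent": "Mozilla/4.0 (compatible; GoogleToolbar)"},
)

# urllib.request.urlopen(req) would send the request identifying
# itself with that User-Agent instead of Python's default.
```

Servers that gate a response by client identity generally check only this header, which is why the technique works at all and also why it is trivial for them to change the rules at any time.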
In my opinion, hyperlinks are the most important factor when it comes to ranking webpages, precisely because they are difficult to manipulate. Modern search engines look at link profiles from many different perspectives and use those relationships to determine rank. The takeaway for you is that time spent earning links is time well spent. In the same way that a rising tide lifts all boats, popular domains lift all of their pages. Likewise, popular pages raise the metrics of their domain.
In the next section I want you to take a look into the pesky missing puzzle piece of this chapter: relevancy. I am going to discuss how it interacts with popularity, and I may or may not tell you another fairy tale.
