12 Google Link Analysis Methods That Might Have Changed
In the Google Inside Search blog, Google’s Amit Singhal published a post titled Search quality highlights: 40 changes for February that told us about many changes to how Google ranks pages, including the following:
Link evaluation. We often use characteristics of links to help us figure out the topic of a linked page. We have changed the way in which we evaluate links; in particular, we are turning off a method of link analysis that we used for several years. We often rearchitect or turn off parts of our scoring in order to keep our system maintainable, clean and understandable.
A lot of people were guessing which “method of link analysis” might have been changed, from PageRank being turned off, to anchor text being devalued, to Google ignoring rel=”nofollow” attributes in links, to others. I was asked my opinion by a few people, and mentioned that there were a number of potential approaches that Google might have changed.
I’ve made a list of a dozen possibilities and granted Google patents that describe them, but Google uses link analysis in a lot of ways, and what Google turned off might involve something else entirely, and/or something that might not even be described in a patent.
Here’s my list:
1. Local Inter-connectivity
Search results are ranked normally in response to a query, and then, before they are displayed to searchers, the links between the pages in that smaller subset are explored, and some results may be boosted based upon the links between them.
The book In the Plex mentions that the inventor behind this patent, Krishna Bharat, developed an algorithm similar to HITS that was incorporated into what Google does in 2003, the same year this patent was granted.
This process might be somewhat unnecessary these days, especially if Google is reranking search results based on something like the co-occurrence of terms in a result set, as in phrase-based indexing. – Ranking search results by reranking the results based on local inter-connectivity
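To make the idea concrete, here’s a toy Python sketch of HITS-style reranking computed only over the links among the pages in a result set. The graph, the iteration count, and the scoring details are all invented for illustration, and this is not Google’s actual implementation:

```python
# Toy sketch: rerank a result set using HITS-style hub/authority
# scores computed only over links among the results themselves.
# The pages and link graph below are invented for illustration.

def hits_rerank(results, links, iterations=20):
    """Reorder `results` by authority score within the local graph.

    links: set of (source, target) pairs restricted to `results`.
    """
    auth = {p: 1.0 for p in results}
    hub = {p: 1.0 for p in results}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        auth = {p: sum(hub[s] for s, t in links if t == p) for p in results}
        # Hub: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for s, t in links if s == p) for p in results}
        # Normalize so the scores stay bounded across iterations.
        a_norm = sum(auth.values()) or 1.0
        h_norm = sum(hub.values()) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return sorted(results, key=lambda p: auth[p], reverse=True)

results = ["a", "b", "c"]
links = {("a", "b"), ("c", "b"), ("b", "c")}
print(hits_rerank(results, links))  # "b" gets the most hub support
```

Notice that a result with strong in-links from other results in the same set rises to the top, which is exactly the “local inter-connectivity” boost the patent describes.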
2. Finding Related Sites
If you perform a search that appears to be for a specific site, you might see a list of other pages at the bottom of the search results, with a heading (that’s also a link) that reads “Pages similar to www.example.com”. If you click upon it, you’ll see search results for [related:www.example.com]. The method that determined which pages were related was based upon links pointing at those pages, using a link-based analysis.
Could Google have found a better way of finding related pages? It’s possible, but the pages showing don’t seem to have changed. – Techniques for finding related hyper linked documents using link-based analysis
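One classic link-based way to find related pages is co-citation: two pages are considered related when many of the same pages link to both. This toy sketch, with invented link data, illustrates that general idea rather than the patent’s actual method:

```python
# Toy sketch of link-based "related pages" via co-citation:
# pages linked to by many of the same sources are related.
# The inlink data below is invented for illustration.

from collections import defaultdict

def related_pages(target, inlinks):
    """inlinks maps each page -> the set of pages linking to it."""
    scores = defaultdict(int)
    for page, sources in inlinks.items():
        if page == target:
            continue
        # Count how many linking pages this page shares with the target.
        scores[page] = len(sources & inlinks.get(target, set()))
    return sorted(scores, key=scores.get, reverse=True)

inlinks = {
    "example.com": {"dir1", "dir2", "blog"},
    "rival.com":   {"dir1", "dir2"},
    "other.com":   {"blog"},
}
print(related_pages("example.com", inlinks))
```

Pages sharing more inbound linkers with the target come first, much like the [related:] results ordering.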
3. Adaptive Page Rank
This patent describes a faster approach to calculating PageRank, taking some shortcuts. It can take a while to calculate PageRank, and a method like the one described here could speed that up.
Google has a lot more pages indexed now than they did when the patent behind this approach was filed, and they may still need this shortcut. Then again, their infrastructure has also advanced considerably, and they may not. – Adaptive computation of ranking
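The kind of shortcut involved can be sketched in a few lines: run ordinary PageRank power iteration, but freeze any page whose score has already converged and skip recomputing it. The graph, damping factor, and tolerance below are invented for illustration, not taken from the patent:

```python
# Toy sketch of "adaptive" PageRank: standard power iteration, but
# pages whose scores have converged are frozen and skipped, which
# is the speed-up idea. The tiny graph is invented for illustration.

def adaptive_pagerank(out_links, d=0.85, tol=1e-6, max_iter=100):
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    converged = set()
    for _ in range(max_iter):
        new = {}
        for p in pages:
            if p in converged:
                new[p] = rank[p]          # frozen: no recomputation
                continue
            incoming = sum(rank[q] / len(out_links[q])
                           for q in pages if p in out_links[q])
            new[p] = (1 - d) / n + d * incoming
            if abs(new[p] - rank[p]) < tol:
                converged.add(p)
        rank = new
        if len(converged) == n:
            break
    return rank

graph = {"a": {"b"}, "b": {"c"}, "c": {"a"}}
ranks = adaptive_pagerank(graph)
print(ranks)  # a symmetric cycle: every page settles near 1/3
```

On large graphs most pages converge early, so freezing them saves the bulk of the work in later iterations.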
4. Cross Language Information Retrieval
It might be possible to use the anchor text of a link on a page in one language to understand what the page that link points to, written in another language, is about.
Google has done a lot of work in building statistical machine translation models over the past 5-7 years and that technology might serve them better than an approach like this one. – Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
5. Link Based Clustering
Google has probably clustered similar web pages by looking at other pages that link to pages appearing in search results, and seeing what other pages they link to.
Google might have replaced this clustering approach with one that focuses instead more upon the content and/or the concepts contained on those pages. – Link based clustering of hyperlinked documents
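Here is a toy illustration of link-based clustering: pages whose inbound-link sets overlap enough get merged into the same cluster. The similarity measure, threshold, and data are my own invented stand-ins, not the patent’s method:

```python
# Toy sketch: cluster pages by co-citation overlap. Pages cited by
# sufficiently similar sets of linking pages are merged into one
# cluster (union-find). The inlink data is invented for illustration.

def cluster_by_cocitation(inlinks, threshold=0.5):
    pages = list(inlinks)
    parent = {p: p for p in pages}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    def jaccard(a, b):
        union = inlinks[a] | inlinks[b]
        return len(inlinks[a] & inlinks[b]) / len(union) if union else 0.0

    for i, a in enumerate(pages):
        for b in pages[i + 1:]:
            if jaccard(a, b) >= threshold:
                parent[find(a)] = find(b)   # merge the two clusters

    clusters = {}
    for p in pages:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())

inlinks = {"x": {"s1", "s2"}, "y": {"s1", "s2", "s3"}, "z": {"s9"}}
print(cluster_by_cocitation(inlinks))  # {x, y} together, {z} alone
```

A content-based replacement would swap the inlink sets for term or concept vectors and keep the same clustering skeleton.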
6. Personalized PageRank Scoring
This patent describes determining personalized page scores for web pages based upon links pointing to pages that appear in search results for specific queries, and upon whether the anchor text in those links is related to those query terms.
Google might use a different approach, such as one that may look at large amounts of data about searchers, pages, and queries to calculate a personalized page score for pages. – Personalizing anchor text scores in a search engine
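As a rough illustration, a score like this could boost a page’s base link-based rank according to how many of its inbound anchors share terms with the query. The combination rule below is entirely made up; the patent does not give this formula:

```python
# Toy sketch: a personalized page score where inbound links whose
# anchor text shares terms with the query contribute extra weight.
# The data and the combination rule are invented for illustration.

def personalized_score(page, query, inlink_anchors, base_rank):
    """inlink_anchors: anchor-text strings of links pointing at page."""
    q_terms = set(query.lower().split())
    matching = sum(1 for anchor in inlink_anchors
                   if q_terms & set(anchor.lower().split()))
    # Boost the base link-based rank by the share of query-relevant
    # anchors (a made-up combination, purely for illustration).
    total = len(inlink_anchors) or 1
    return base_rank * (1 + matching / total)

anchors = ["best coffee beans", "coffee roasting guide", "click here"]
print(personalized_score("p", "coffee beans", anchors, base_rank=0.4))
```

Two of the three anchors share a query term, so the page’s base rank of 0.4 gets a proportional boost for this query.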
7. Anchor Text Indexing
Using anchor text for links to determine the relevance of the pages they point towards. It’s quite likely that Google continues to use an approach like this, but in a modified manner that might be influenced by things like phrase-based indexing. – Anchor tag indexing in a web crawler system
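The core mechanism is simple to sketch: index the terms of a link’s anchor text against the target page rather than the page containing the link, so the target can rank for words it never uses itself. The URLs and data below are invented for illustration:

```python
# Toy sketch of anchor-tag indexing: anchor-text terms are credited
# to the *target* page in the inverted index, not the source page.
# The link triples below are invented for illustration.

from collections import defaultdict

def index_anchors(links):
    """links: (source_url, anchor_text, target_url) triples."""
    index = defaultdict(set)          # term -> set of target pages
    for _source, anchor, target in links:
        for term in anchor.lower().split():
            index[term].add(target)
    return index

links = [("a.com", "search engine patents", "seobythesea.com"),
         ("b.com", "patents blog", "seobythesea.com")]
index = index_anchors(links)
print(sorted(index["patents"]))  # ['seobythesea.com']
```

A phrase-based refinement might index whole anchor phrases, or weight terms by how often related phrases co-occur, rather than treating each word independently.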
8. Link Analysis using Historical Data
In 2005, Google published a patent application that describes a wide range of temporal-based factors related to links, such as the appearance and disappearance of links, the increase and decrease of backlinks to documents, weights to links based upon freshness, weights to links based upon the authoritativeness of the linking documents, the age of links, spikes in link growth, and the relatedness of anchor text to the page being pointed to over time.
Google may have used some of the factors described in this patent and may continue to use them, may have replaced some with something else, and may have ignored others entirely. – Information retrieval based on historical data
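To give a flavor of temporal link factors, here is a toy score that decays each backlink by its age and discounts a profile dominated by brand-new links. The half-life, the spike rule, and the discount are all invented; the patent lists factors but gives no such formula:

```python
# Toy sketch: weight each backlink by age, and discount a profile
# where nearly all links just appeared (a crude stand-in for the
# patent's link-growth "spike" factors). Constants are invented.

import math

def historical_link_score(link_ages_days, half_life=365.0):
    """Each link's weight halves every `half_life` days of age."""
    score = sum(math.exp(-math.log(2) * age / half_life)
                for age in link_ages_days)
    # Spike check: if most links appeared within the last week,
    # halve the total rather than rewarding the sudden burst.
    new_links = sum(1 for age in link_ages_days if age < 7)
    if link_ages_days and new_links / len(link_ages_days) > 0.8:
        score *= 0.5
    return score

print(historical_link_score([30, 90, 400]))   # aged, steady profile
```

Depending on the factor, freshness can cut either way; a news query might instead reward new links, which is why the patent frames these as signals rather than fixed rules.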
9. Link Weights based upon Page Segmentation
We’ve known for a few years that Google will give different weights to links based upon the segment of a page where a link is located. It’s quite likely that something like this continues to be used today, but it might have been modified in some manner, such as limiting in some way the amount of value a link might pass along if, for instance, it appears in the footers of multiple pages of a site.
Then again, Google probably has already been doing that. – Document segmentation based on visual gaps
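A segment-weighting scheme like the one I’m describing might look roughly like this sketch, where the per-segment weights and the site-wide cap for repeated footer links are numbers I made up for illustration:

```python
# Toy sketch: discount a link's value by the page segment it sits
# in, and cap the total value of a footer link repeated across many
# pages of a site. All weights here are invented for illustration.

SEGMENT_WEIGHT = {"main": 1.0, "sidebar": 0.5, "footer": 0.1}

def link_value(segment, pages_with_link=1, sitewide_cap=0.2):
    weight = SEGMENT_WEIGHT.get(segment, 0.5)  # default: unknown segment
    # A footer link repeated site-wide passes a capped total value
    # rather than its full value multiplied by the number of pages.
    if segment == "footer" and pages_with_link > 1:
        return min(weight * pages_with_link, sitewide_cap)
    return weight

print(link_value("main"))                         # 1.0
print(link_value("footer", pages_with_link=500))  # capped at 0.2
```

The cap is the key move: five hundred footer repetitions of the same link end up worth little more than one.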
10. Reasonable Surfer Link Features
Google’s Reasonable Surfer model describes a good number of features that might be taken together to determine how much value a link might pass along from a page in relation to other links on that page, and it’s possible that one or more of those values are no longer considered in a way that they might have been in the past. – Ranking documents based on user behavior and/or feature data
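The model’s feature-combination idea can be sketched as a simple additive score clamped to a probability. The specific features and weights below are invented examples of the kinds of signals the patent lists, not values from the patent:

```python
# Toy sketch of a "reasonable surfer" link weight: combine link
# features into a probability-like score that a user would follow
# the link. The features and weights are invented for illustration.

FEATURE_WEIGHTS = {
    "in_main_content": 0.4,
    "large_font": 0.2,
    "anchor_is_commercial": -0.3,   # e.g. "buy cheap ..." anchors
    "above_the_fold": 0.3,
}

def follow_probability(features, base=0.1):
    score = base + sum(FEATURE_WEIGHTS.get(f, 0.0) for f in features)
    return min(max(score, 0.0), 1.0)   # clamp into [0, 1]

print(follow_probability({"in_main_content", "above_the_fold"}))  # prominent editorial link
print(follow_probability({"anchor_is_commercial"}))               # unlikely to be followed
```

Turning off one feature would just mean dropping its term from the sum, which fits the “turned off a method of link analysis” wording without abandoning the model.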
11. Links between Affiliated Sites
Some sites may be deemed to be related, or affiliated, to others in some manner, such as being owned by the same person or people. The value of those links might be diminished because of that relationship, in comparison to other “editorially determined links.”
How that affiliation is calculated might have changed. – Determining quality of linked documents
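An affiliation discount can be sketched very simply: detect a shared ownership signal and shrink the link’s value when source and target share it. The ownership data and the discount factor are invented for illustration; real affiliation signals might include registrant, IP block, or hosting data:

```python
# Toy sketch: discount links exchanged between affiliated sites,
# here defined as sites sharing a registrant. The ownership map and
# the discount factor are invented for illustration.

def affiliated(site_a, site_b, ownership):
    """ownership maps site -> registrant identifier."""
    owner_a = ownership.get(site_a)
    return owner_a is not None and owner_a == ownership.get(site_b)

def link_weight(source, target, ownership, base=1.0, discount=0.1):
    if affiliated(source, target, ownership):
        return base * discount      # affiliated link passes little value
    return base                     # independent, "editorial" link

ownership = {"shop-a.com": "acme", "shop-b.com": "acme", "blog.com": "indie"}
print(link_weight("shop-a.com", "shop-b.com", ownership))  # 0.1
print(link_weight("blog.com", "shop-a.com", ownership))    # 1.0
```

Changing “how that affiliation is calculated” would mean swapping out or reweighting the signals feeding the `affiliated` check while keeping the discount machinery intact.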
12. Propagation of Relevance between Linked Pages
Assigning relevance of one web page to other web pages could be based upon distance of clicks between the pages and/or certain features in the content of anchor text or URLs. For example, if one page links to another with the word “contact” or the word “about”, and the page being linked to includes an address, that address location might be considered relevant to the page doing that linking.
There are a few different parts to this method of having the relevance of one page on a site propagated to other pages on the same site, and one or more of those could have changed if it is in use. – Propagating useful information among related web pages, such as web pages of a website
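The contact-page example above can be sketched directly: when a page links to a contact or about page using those words, and the target contains an address, attach that address’s location to the linking page. The navigation terms, the address pattern, and the data are all invented heuristics for illustration:

```python
# Toy sketch: propagate an address found on a contact/about page
# back to the page linking to it. The nav-term list and the loose
# address pattern are invented heuristics for illustration.

import re

NAV_TERMS = {"contact", "about"}
# Very loose US-style street-address pattern, purely illustrative.
ADDRESS_RE = re.compile(r"\d+\s+\w+\s+(?:St|Ave|Rd|Blvd)\b")

def propagate_location(pages, links):
    """pages: url -> page text; links: (source, anchor_text, target)."""
    relevance = {}
    for source, anchor, target in links:
        if NAV_TERMS & set(anchor.lower().split()):
            match = ADDRESS_RE.search(pages.get(target, ""))
            if match:
                # The address on the contact page is treated as
                # location-relevant to the page that links to it.
                relevance[source] = match.group(0)
    return relevance

pages = {"/contact": "Visit us at 123 Main St, Springfield."}
links = [("/", "contact us", "/contact")]
print(propagate_location(pages, links))  # {'/': '123 Main St'}
```

Each of those moving parts, the trigger terms, the content features detected, and the propagation distance, could be changed independently, which fits how many separate claims the patent contains.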
What “method of link analysis” do you think Google turned off?
97 thoughts on “12 Google Link Analysis Methods That Might Have Changed”
Thanks for the thoughts Bill, but I guess there are going to be a lot of ifs and buts, difficult to determine exactly what has changed. Heard a talk by Graham Hansell of CIM yesterday and he said that it was likely the link valuation change was Google turning off anchor text indexing! Not helpful when speaking to a room full of potential clients.
It’s pretty useful information for webmasters and SEO professionals to keep an eye on the kind of links they are acquiring and how Google recognizes them as good or bad. Thanks for sharing. I hope you will share further details in the future if you identify any fixed pattern in link evaluation.
Thank you Bill for this great overview. It seems that we are up to guessing.
But I don’t see a lot of fluctuation, just in secondary markets.
Paul, does Graham have any proof for his theory? How should they then find out topic relevancy?
If you put yourself in Google’s shoes for a moment then in my opinion (after reading Wil Reynolds recent article on Link spam) the top target is surely the personalisation “of PageRank Scoring for web pages based upon links pointing to pages that appear for specific queries in search results and whether the anchor text in those links are related to those query terms.”
This has to be the deepest consideration because ultimately top relevant web content has provided the best in expert advice; providing a self policed network of authority.
@Matdwright
Google is giving less importance to PR. Even if you can have a link on a reputable site, Google may not follow that link if it “thinks” that following it will not be a good experience for the user.
With the integration of G+ on SERPs, the results will be as personalized as it can get. The target for Google is “a good experience”. Content farms and link building will be banned more and more.
Obviously, we’re all just guessing (unless Amit would like to leave a comment :-P), but given the importance of SPYW, it is extremely unlikely that Google “turned off” Personalized PageRank Scoring (this also has a lot of spam fighting uses) or Link Analysis using Historical Data (especially since the new unified “privacy” policy goes into effect today).
If I had to pick one or two from the above list, my guess would be Local Interconnectivity and/or Cross Language Information Retrieval. The first approach actually amplifies the effects of linkages between pages in a SERP, and if you’re looking for a way to devalue linkages (i.e., “turn off” a method of link analysis), this would be an excellent candidate. As Bill mentions, the second approach seems extremely outdated now that Google has such a powerful language translation system. In fact, I would probably trust Google’s translation of a page significantly more than I would trust the accuracy of anchor text in a different language.
@webbstuff
I think they will devalue the anchor texts and focus on the website theme, surrounding text, and other factors that will give Google a picture of the link instead of the anchor text.
Thanks Bill! As usual it is a pleasure to read your articles. I did not find the clear answer to the question, but I can see that until this moment the anchor text value is intact.