Talk:PageRank

This is the talk page for discussing improvements to the PageRank article.
This is not a forum for general discussion of the article's subject.


Archives: Index, 1

This article is of interest to the following WikiProjects:

WikiProject Computing / Software

(Rated B-class, Mid-importance)

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

B

This article has been rated as B-Class on the project's quality scale.

Mid

This article has been rated as Mid-importance on the project's importance scale.

This article is supported by WikiProject Software (marked as Mid-importance).

WikiProject Internet

(Rated B-class, High-importance)

This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

B

This article has been rated as B-Class on the project's quality scale.

High

This article has been rated as High-importance on the project's importance scale.

WikiProject Google (Rated B-class, High-importance)

This article is within the scope of WikiProject Google, a collaborative effort to improve the coverage of Google and related topics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

B This article has been rated as B-Class on the project's quality scale.

High This article has been rated as High-importance on the project's importance scale.


Alphabet Task Force

(Rated B-class, Mid-importance)

This article is within the scope of the Alphabet Task Force, a collaborative effort to improve the coverage of Alphabet Inc. articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the Alphabet Task Force discussion and see a list of open tasks.

B

This article has been rated as B-Class on the quality scale.

Mid

This article has been rated as Mid-importance on the importance scale.

WikiProject Marketing & Advertising

(Rated B-class, Low-importance)

This article is within the scope of WikiProject Marketing & Advertising, a collaborative effort to improve the coverage of Marketing on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

B

This article has been rated as B-Class on the quality scale.

Low

This article has been rated as Low-importance on the importance scale.

Sources for development of this article may be located at "PageRank" – news · newspapers · books · scholar · JSTOR · free images

This talk page is automatically archived by Lowercase sigmabot III. Any threads with no replies in 30 days may be automatically moved. Sections without timestamps are not archived.

1 Difference between Page Rank and Page Ranking
2 "named after Larry Page[1]"
3 Problem with equations in 'Power Method' section
4 Estimates on how many pages are PageRank 10, 9, 8, 7, 6, 5, 4, 3, 2, 1?
5 Page Rank or Page Score?
6 Can someone check the MATLAB code, I believe it should be Norm(v,1)
7 Intuitive Justification / Random Surfer Model
8 Article's Graphic: What's the Unit of Measure?
9 Article is too technical
10 15/10/2013 Gogole say no with Pagerank Update
11 Removed
12 Unnecessary assumption in derivation of power method?
13 External links modified
14 Minor error in "Damping Factor"
15 Damping factor for biological data
16 Examples are Wrong

Difference between Page Rank and Page Ranking

There seems to be some blurring here around the difference between page rank and page ranking. For instance,

Other factors are also part of the algorithm such as the size of a page, the number of changes and its up-to-dateness, the key texts in headlines and the words of hyperlinked anchor texts.

These are part of Google's ranking algorithm, but not part of the page rank algorithm.

The Google PageRank algorithm produces a numerical value which is used (unless it has been discontinued) as one input to Google's SERP ranking algorithm. — Preceding unsigned comment added by Oli4uk (talk • contribs) 21:34, 29 June 2011 (UTC)

"named after Larry Page[1]"

Tom Anderson has cited this article as a reference for this extremely dubious factoid, but the reference goes to a nonlink at Google. So I don't know whether this is a spoof, an April Fool's joke at Google, or truth. Can anyone support it? It probably should be deleted otherwise. · rodii · 15:28, 18 July 2011 (UTC)

That page disappeared only recently. There's still a copy in Google's cache dated Jul 12. There's a copy at archive.org which we can cite. -- X7q (talk) 15:35, 18 July 2011 (UTC)

I believe it should be "purported" to be named after the patent assigned to Larry Page, but may simply represent the ranking of web "pages" ... to cite a PR statement from Google doesn't seem to align with the realities of what IS - should we drink the Koolaid? — Preceding unsigned comment added by MonteShaffer (talk • contribs) 17:14, 1 March 2013 (UTC)

Problem with equations in 'Power Method' section

Some of the equations in this section are not parsed properly. This seems to be because a pair of curly braces is needed to define the scope of the \widehat operator, but has been omitted. I'm not sure of the correct scope of the operator, so I haven't fixed it. — Preceding unsigned comment added by 109.156.49.34 (talk) 21:20, 6 October 2011 (UTC)

Estimates on how many pages are PageRank 10, 9, 8, 7, 6, 5, 4, 3, 2, 1?

Here are pages that give numbers for PageRank 10 and 9. I would like to know the others, or at least how many pages have PageRank 8, 7, 6, 5, 4, 3, 2 and 1.

PageRank 10 = 12 pages (see the website above)
PageRank 9 = 148 pages (see the website above)
PageRank 8 = 1,816 pages (estimate)
PageRank 7 = 22,330 pages (estimate)
PageRank 6 = 274,664 pages (estimate)
PageRank 5 = 3,378,367 pages (estimate)
PageRank 4 = 41,553,912 pages (estimate)
PageRank 3 = 511,113,116 pages (estimate)
PageRank 2 = 6,286,691,331 pages (estimate)
PageRank 1 = 73,463,463,463 pages (estimate)

I have estimated the counts for PageRank 8 down to 1 from the two known numbers for 10 and 9, but if anybody knows the exact numbers, let me know. MohammedBinAbdullah (talk) 22:02, 10 April 2012 (UTC)

My estimate would be.....

PageRank 10 = 10 pages (estimate)
PageRank 9 = 100 pages (estimate)
PageRank 8 = 1,000 pages (estimate)
PageRank 7 = 10,000 pages (estimate)
PageRank 6 = 100,000 pages (estimate)
PageRank 5 = 1,000,000 pages (estimate)
PageRank 4 = 10,000,000 pages (estimate)
PageRank 3 = 100,000,000 pages (estimate)
PageRank 2 = 1,000,000,000 pages (estimate)
PageRank 1 = 10,000,000,000 pages (estimate)

Doubtcoachdoubtcoach (talk) 03:04, 15 September 2012 (UTC)

My estimate would be.....

PageRank 10 = 12 pages (see the website above)
PageRank 9 = 148 pages (see the website above)
PageRank 8 = 1,500 pages (estimate)
PageRank 7 = 12,330 pages (estimate)
PageRank 6 = 174,664 pages (estimate)
PageRank 5 = 2,378,367 pages (estimate)
PageRank 4 = 31,553,912 pages (estimate)
PageRank 3 = 411,113,116 pages (estimate)
PageRank 2 = 4,286,691,331 pages (estimate)
PageRank 1 = 40,463,463,463 pages (estimate)

113.203.171.207 (talk) 14:11, 26 September 2012 (UTC)

Page Rank or Page Score?

Hi,

I wanted to pose a question. Adding this may make the article complete.

Is it right to call it PageRank or PageScore?

The reason is -

In ranking - 1 is high and 10 is low

In score - 1 is low and 10 is high.

Google page rank of 1 is low and 10 is high. Hence PageScore may be appropriate.

Regards,

Sashidharan

http://in.linkedin.com/in/sashidharan Website: http://sashidharanb.wix.com/businesssolutions — Preceding unsigned comment added by 101.62.26.247 (talk) 13:32, 18 December 2012 (UTC)

This might be sound logic, but "PageRank" is a name which has been firmly established and recognized. Using "PageScore" would do nothing but confuse people reading this page. It's sort of like Coccinellidae, which goes by names such as "ladybird" or "ladybug" -- but in actuality it is neither a bird nor a bug (in the strictest sense). Sometimes names are wrong, but they continue to be used because it is what people already know and use as a reference point. Jefflithe (talk) 01:40, 21 December 2012 (UTC)

I understand that 'PageRank' is an established name. However, the 'Rank' part of it may confuse some users who are particular about 'Ranking', especially the ranking of assets etc. 'PageScore' is used here only for discussion purposes. I don't advise using 'PageScore' unless it is officially renamed. — Preceding unsigned comment added by 101.62.79.37 (talk) 16:50, 28 December 2012 (UTC)

Can someone check the MATLAB code, I believe it should be Norm(v,1)

Hi folks,

I was playing with this code, and couldn't get the PR results to sum to 1.0 (per the discussion at the start of the document). Looking at it, I believe the problem is that the normalization doesn't cause it to sum to 1, whereas setting it to norm(v,1) does.

This might just have been random good luck on my part, but it meets the initial condition that the random distribution of PageRank sums to 1.0. I'm not so sure why it needs to be renormalized every time though; that seems flaky. — Preceding unsigned comment added by 67.160.239.133 (talk) 05:26, 26 December 2012 (UTC)

Follow up - the example itself doesn't sum to one, running on Octave.

The reason I *believe* is at least partly due to the columns not all summing to one. I fixed this in my example code, by ensuring that Node has a SELF-LOOP (ie the diagonal is non zero), and by removing the L2 norm in the loop. Would love someone more knowledgeable than me to explain this though.

ans =

  0.54033
  0.30295
  0.30295
  0.45012
  0.56735  — Preceding unsigned comment added by 67.160.239.133 (talk) 19:58, 26 December 2012 (UTC)

Update -- in looking at the paper Topic Sensitive Pagerank, (Taher H. Haveliwala) the author reports that if a Node has out degree == 0, then instead of just having a self-loop, assign an equal probability to going EVERYWHERE in the graph. — Preceding unsigned comment added by 63.194.72.222 (talk) 18:25, 4 January 2013 (UTC)

Update 2 -- I've successfully resolved various issues in the Matlab code.

1) There is no reason why the PageRank vector should be constantly renormalized by the L2 norm. I haven't found any justification for its use. It makes sense to normalize the initial value by the L1 norm, to ensure that the PageRank distribution sums to 1.0. After this, however, it remains a probability distribution and does not need renormalizing.

2) The way of dealing with sinks is ill-defined at best. In order to reproduce the PageRank values for the original graphic at the top of the page, it is necessary to treat a sink's outlinks as being equally connected to the whole graph, including itself. However, one does not add self-loops for nodes with existing outlinks. The reason is that making the transition equally likely across all nodes ensures that a sink node is treated as a 100% automatic restart.

Wdavies (talk) 16:31, 22 April 2013 (UTC)WDavies

Hi, thanks for pointing this out, you are indeed right. Sink pages have to be represented as if they would link to all pages; and the older Matlab example was wrong in using the L2 norm. I have changed the (broken) Matlab example to a simple Python example and illustrated how to run it (in the external pages section of the page). I think this is much clearer now. I have also removed the "proof that the Matlab algorithm is wrong" discussion in the middle of the page. Others, please feel free to improve on this.

129.169.150.206 (talk) 21:39, 23 August 2014 (UTC)
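
For anyone following this thread later, here is a minimal Python sketch of the two points agreed above: the vector is L1-normalized once at the start and never renormalized, and a sink is treated as linking uniformly to every page, itself included. The function name `pagerank_power` and the 4-page graph are made up for illustration; this is not the code from the article.

```python
import numpy as np

def pagerank_power(adjacency, d=0.85, num_iterations=100):
    """Power iteration; adjacency[i, j] = 1 means page j links to page i."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=0)
    M = np.empty_like(A)
    for j in range(n):
        if out_degree[j] == 0:
            M[:, j] = 1.0 / n              # sink: uniform jump to every page, itself included
        else:
            M[:, j] = A[:, j] / out_degree[j]
    v = np.ones(n)
    v /= np.linalg.norm(v, 1)              # L1 normalization, done once
    for _ in range(num_iterations):
        v = d * (M @ v) + (1.0 - d) / n    # no renormalization inside the loop
    return v

# Hypothetical 4-page graph; page 3 has no outlinks (a sink).
links = np.array([
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
])
ranks = pagerank_power(links)
print(ranks, ranks.sum())
```

Because every column of M sums to 1, each step maps a vector summing to 1 onto another vector summing to 1, which is why no renormalization is needed in the loop.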

Intuitive Justification / Random Surfer Model

I'm not experienced with editing wiki articles yet, so I'd like to share my thoughts about this article here first.

What I miss is a clearer statement of the underlying model (the random surfer model) and an intuitive explanation of the algorithm. This would help readers follow the later details, and the article could refer back to the model, e.g. when introducing the damping factor. In general I think it is good style to present the model behind an algorithm, if one exists.

Therefore I would propose a separate heading before explaining the algorithm in detail.

Good explanations can be found in Brin's works:

The original paper contains a short section "2.5 Random Surfer Model" http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
"The anatomy of a large-scale hypertextual Web search engine" http://ilpubs.stanford.edu:8090/361/1/1998-8.pdf has a section "2.1.2 Intuitive Justification" with an additional paragraph explaining, when websites (should) have a high page rank.

It should be checked who mentioned this model first. I thought it was Brin (he doesn't cite others), but other publications I skimmed refer to Blum: A. Blum, T.H.H. Chan, M.R. Rwebangira, in Proceedings of the Eighth Workshop on Algorithm Engineering and Experiments and the Third Workshop on Analytic Algorithmics and Combinatorics (Society for Industrial and Applied Mathematics, 2006), pp. 238–246

It could also be mentioned, that this model has been established in scientific work: Many publications refer to it introducing variant algorithms.

I like the section about "The intentional surfer model" by the way. ;) — Preceding unsigned comment added by 85.178.89.56 (talk) 11:31, 15 April 2013 (UTC)

Article's Graphic: What's the Unit of Measure?

From the text below the illustration: Mathematical PageRanks for a simple network, expressed as percentages. I've seen this graphic for years, and just today I gave it a hard look for the first time and noticed that it's missing some kind of unit of measure. The percentage has to refer to something. Is it a percentage of PageRank? "Juice"? If the percentages refer to PageRank as the unit of measure, or whatever it may be, the caption ought to state this explicitly. My algebra teacher would have marked this answer wrong on a test, and my chemistry teacher would have held this answer up for ridicule by the entire class. Jonny Quick (talk) 16:27, 15 August 2013 (UTC)

Article is too technical

Why do articles like this always have to be written as though they're going to be read by computer scientists or academics? Most readers of Wikipedia are ordinary people who want to see a simple explanation in layman's terms. At least keep the first paragraph in simple, plain English that anyone can understand (without jargon, formulas, technical definitions, etc.). Then anyone who's interested can carry on reading and get more of a technical explanation after the contents section. 109.55.102.160 (talk) 12:26, 10 September 2013 (UTC)

15/10/2013 Gogole say no with Pagerank Update

Webmasters and SEO people now know that Google has said no to PageRank updates in future. What is going on? Nobody knows why, but you can try checking PageRank, with multi and XML sitemap support, using the link I gave.

Removed

Google uses an automated web spider called Googlebot to actually count links and gather other information on web pages.

It is by no means clear that the counting can be said to be done by Googlebot, and it is not intuitively a spidering operation, more likely a feature of the database to which the spidering software stores its flies. Therefore this needs a citation to be in the article. Clearly what parts of the Google infrastructure are called "Googlebot" is up to Google, however if it extends too far, the description needs to be changed. All the best: Rich Farmbrough, 13:25, 11 August 2014 (UTC).

Unnecessary assumption in derivation of power method?

In the power method section, the first step of the derivation is :

If the matrix \mathcal{M} is a transition probability, i.e., column-stochastic with no columns consisting of just zeros and \mathbf{R} is a probability distribution (i.e., |\mathbf{R}|=1, \mathbf{E}\mathbf{R}=\mathbf{1} where \mathbf{E} is matrix of all ones), Eq. (**) is equivalent to

   \mathbf{R} = \left( d \mathcal{M} + \frac{1-d}{N} \mathbf{E} \right)\mathbf{R} =: \widehat{ \mathcal{M}} \mathbf{R}.       (***)

And so on...

My Question: Is it necessary that \mathcal{M} has this particular property for this step of the derivation to hold? It seems that only the property that R is a probability distribution is required. — Preceding unsigned comment added by Mtjoul (talk • contribs) 22:41, 21 December 2014 (UTC)
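
A sketch of why this step appears to need only the normalization of \mathbf{R}, assuming Eq. (**) has the usual form \mathbf{R} = d \mathcal{M} \mathbf{R} + \frac{1-d}{N} \mathbf{1} (that equation is not quoted in this thread, so this is an assumption). Since \mathbf{E} = \mathbf{1}\mathbf{1}^{T} and the entries of \mathbf{R} sum to one:

```latex
\mathbf{E}\mathbf{R} = \mathbf{1}\,(\mathbf{1}^{T}\mathbf{R}) = \mathbf{1}
\quad\Longrightarrow\quad
d\,\mathcal{M}\mathbf{R} + \frac{1-d}{N}\,\mathbf{1}
= d\,\mathcal{M}\mathbf{R} + \frac{1-d}{N}\,\mathbf{E}\mathbf{R}
= \left( d\,\mathcal{M} + \frac{1-d}{N}\,\mathbf{E} \right)\mathbf{R}
= \widehat{\mathcal{M}}\,\mathbf{R}.
```

Nothing about \mathcal{M} is used here. Column-stochasticity seems to matter only afterwards: it makes each column of \widehat{\mathcal{M}} sum to d + (1-d) = 1, so that applying \widehat{\mathcal{M}} keeps \mathbf{R} a probability distribution.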

External links modified

Hello fellow Wikipedians,

I have just added archive links to one external link on PageRank. Please take a moment to review my edit. You may add {{cbignore}} after the link to keep me from modifying it, if I keep adding bad data, but formatting bugs should be reported instead. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether, but should be used as a last resort. I made the following changes:

Attempted to fix sourcing for http://www.google.com/competition/howgooglesearchworks.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

As of February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete the "External links modified" sections if they want, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{sourcecheck}} (last update: 15 July 2018).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—cyberbot II (Talk to my owner: Online) 05:05, 31 March 2016 (UTC)

What does the link: Our products and services by Google have to do with PageRank? I followed the link but could not find PageRank. If it is somewhere at that URL, then the link should be refined. 5.34.22.174 (talk) 04:41, 5 January 2018 (UTC)

Minor error in "Damping Factor"

In "Damping Factor", after the two formulas, it states, "The difference between them is that the PageRank values in the first formula sum to one, while in the second formula each PageRank is multiplied by N and the sum becomes N." However (unless I'm blind), from the first equation to the second, only the first part has been multiplied by N. (Otherwise you'd have 1 - d + Nd(stuff).) With the second equation given as-is, the sum of page ranks is rather trickier than "N". Given that the equation is already acknowledged to be wrong, it's probably not urgent, but hey. — Preceding unsigned comment added by 98.102.161.228 (talk) 17:37, 22 August 2016 (UTC)
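
One way to check the quoted claim numerically is sketched below, under two assumptions: the two formulas are PR = (1-d)/N + d·(sum of PR(T)/L(T) over inlinks) and PR = (1-d) + d·(same sum), and the graph has no dangling pages. If every PageRank in the first formula is replaced by N times its value, the (1-d)/N term becomes 1-d while the sum keeps its form, because the PR values inside it are already the scaled ones. The 3-page matrix and the helper `iterate` are made up for illustration.

```python
import numpy as np

# Hypothetical 3-page graph with no sinks.
# M[i, j] is the probability of following a link from page j to page i,
# so every column sums to 1.
M = np.array([
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
])
d, N = 0.85, 3

def iterate(constant, start, steps=200):
    """Repeatedly apply PR <- constant + d * M @ PR."""
    pr = start.copy()
    for _ in range(steps):
        pr = constant + d * (M @ pr)
    return pr

pr_unit = iterate((1 - d) / N, np.full(N, 1.0 / N))  # first formula
pr_scaled = iterate(1 - d, np.ones(N))               # second formula
print(pr_unit.sum(), pr_scaled.sum())
```

On this graph the first solution sums to 1, the second to N, and the second is N times the first entry by entry. With dangling pages the sums would differ unless sinks are handled separately.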

Damping factor for biological data

The article mentions 0.31 as the optimal value for d, however nowhere in the cited paper can I directly find 0.31. I can find a reference to an epsilon of 0.3 in Appendix 1.1, but I am not yet convinced this is equivalent to the damping factor d. Can someone clarify? --José Devezas (talk) 08:19, 19 December 2018 (UTC)

Examples are Wrong

It's not clear what the value 0.5 represents.
The sum over i of M_i,j is never 1.
I think the columns and rows have been mixed up. The comments say that M_i,j represents a link from j to i, but try running a clear example, e.g. the following:

Everything links to A, which means A is an important page, and A links to C, thus C is important as well.

   M = np.array([
       [1, 1, 1],  # * -> A
       [0, 0, 0],
       [1, 0, 0],  # A -> C
   ])
   print(pagerank(M, 0.001, 0.85))
   array([[0.61536926],
          [0.28131799],
          [0.10331275]])

B should be last, but it's second. Am I missing something?
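
A possible explanation, offered as a sketch rather than a diagnosis of the article's code: the matrix above is a raw 0/1 link matrix whose first column sums to 2 (A links to itself and to C), so it is not column-stochastic. If each column is first divided by its sum, a plain power iteration gives the expected order. The function `pagerank_sketch` below is a stand-in written for this comment, not the `pagerank` function from the article.

```python
import numpy as np

def pagerank_sketch(M, num_iterations=100, d=0.85):
    """Power iteration on a column-stochastic M; M[i, j] is the weight of link j -> i."""
    n = M.shape[1]
    v = np.full(n, 1.0 / n)
    for _ in range(num_iterations):
        v = d * (M @ v) + (1.0 - d) / n
    return v

# Same links as the example above: rows are targets, columns are sources.
raw = np.array([
    [1, 1, 1],  # * -> A
    [0, 0, 0],  # nothing links to B
    [1, 0, 0],  # A -> C
], dtype=float)

# Every column has at least one link here, so there is no division by zero.
M = raw / raw.sum(axis=0)
ranks = pagerank_sketch(M)
print(dict(zip("ABC", ranks.round(4))))
```

With the columns normalized, A comes first, C second and B last, and the values sum to 1.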
