I also combined materials from chapter 2 of the book

**"A First Course in Applied Mathematics"**by Jorge Rebaza (2012):

## How much is the PageRank algorithm worth?

In 2012:

Google annual revenue = US 50.2 billion

Google total assets = US 93.8 billion

Google market cap = US 248.2 billion

Hong Kong's GDP = US 243.7 billion

So the entire Hong Kong population working for a year, is roughly worth the outstanding stocks of Google. Perhaps learning the PageRank algorithm can give us an estimate of how useful mathematics can be :)

In the graph:

nodes = web pages

links =

The matrices are the

Notice that each row of the matrices sum to 1. These are called

Then the rank of a page is defined by adding up the ranks of pages with out-links pointing to it:

$$ r(P) = \sum_{i} \frac{r(P_i)}{|P_i|} $$## The algorithm

As of 2012 Google is searching 8.9 billion web pages. First, it builds a

**directed graph**($\Gamma$) of how the web pages are**linked**to each other.
Example 1:

$$ A = \left[ \begin{array}{c c c c c} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1/2 & 0 & 0 & 1/2 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{array} \right] $$

Example 2:

$$ A = \left[ \begin{array}{c c c c c c} 0 & 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/3 & 0 & 1/3 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/2 & 1/2 & 0 \end{array} \right] $$

In the graph:

nodes = web pages

links =

**out-going**links from page to pageThe matrices are the

**adjacency matrices**of the graphs. Each row represents the**out-going**links from the page $P_i$. If a page has $n$ out-going links, each link is weighted by $1/n$.**stochastic matrices**.Then the rank of a page is defined by adding up the ranks of pages with out-links pointing to it:

where $r(P_i)$ is the rank of Page $i$, and $|P_i|$ denotes the total number of out-going links from Page $i$. The rationale is that it is a good thing to be pointed to, so the "goodness" of Page $i$ is passed to Page $P$; but it has to be divided by $|P_i|$ because a page that points outwards "promiscuously" should have its influence diluted.

But this ranking has to be calculated

**starting from an initial ranking. The iterative equation has the form:**

*iteratively*$$ r_k(P) = \sum_i \frac{r_{k-1}(P_i)}{|P_i|}, \quad \quad k = 1, 2, ... $$

This means that the "goodness" (ranking) is passed around the graph iteratively until their values at every node

**converge**to stable values.

$$ v^T_k = v^T_{k-1} H $$

where

$$ H_{ij} = \left\{ \begin{array}{l@l} \frac{1}{|P_i|} & \mbox{if } P_i \mbox{ has a link to } P \\ 0 & \mbox{otherwise} \end{array} \right. $$

Notice that matrix H is

*constant*throughout the iteration. That means we are repeatedly multiplying by H:

$$ v = \lim_{k \rightarrow \infty} v_k = \lim_{k \rightarrow \infty} H^k v_0 $$

where $v$ is the

**PageRank vector**.

The above is reminiscent of the way we repeatedly press the "cosine" button on a calculator, and see the results converge. That value is the solution to the equation:

$$ \cos x = x. $$

In general, when we want to solve an equation of the form $ f(x) = x $, we can repeatedly apply $f()$ starting from an initial value $x_0$. The point for which $f(x^*) = x^*$ is called the

**fixed point**, a very useful concept in mathematics.

The above iteration is called the

**power method**, which will converge to the eigenvector with the greatest eigenvalue of the matrix $H$. This is called the

**dominant eigenvector**, which I will now explain...

## Power method

This section explains why the power method converges. Recall the definition of eigenvector and eigenvalue:$$ A x = \lambda x $$

where $x$ = eigenvector, $\lambda$ = eigenvalue.

Assume that the matrix A satisfies these conditions:

- $A$ has a complete set of linearly independent eigenvectors $v_1, ... v_n$
- the corresponding eigenvalues can be ordered:

$ |\lambda_1| > |\lambda_2| \ge ... \ge |\lambda_n| $

(notice the strict > in the first term)

For iteration, choose an arbitrary initial vector $x_0$. Due to the eigenvectors forming a basis, we can express $x_0$ as:

$$ x_0 = c_1 v_1 + ... + c_n v_n $$

Now we multiply the above with matrix A and get:

$$ \begin{array}{l@=l} A x_0 & = c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + ... + c_n \lambda_n v_n \\ A^2 x_0 & = c_1 \lambda_1^2 v_1 + c_2 \lambda_2^2 v_2 + ... + c_n \lambda_n^2 v_n \\ & \vdots \\ A^k x_0 & = c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + ... + c_n \lambda_n^k v_n \end{array} $$

And because $ |\lambda_1| > |\lambda_2| \ge ... \ge |\lambda_n| $, as $k \rightarrow \infty$, the weight of the first term outgrows those of the rest, so the vector $A^k x_0$ moves towards the dominant eigenvector direction.

The

**rate of convergence**is determined by the ratio $\frac{ |\lambda_2| } { |\lambda_1| }$.For the power method to converge, we need to ensure that the matrix is stochastic and

**irreducible**....

## Irreducible matrices

The section explains the necessary condition for the power method to converge; but the measure to force convergence also gave rise to Google's personalized search, as we shall see why.A matrix is called

**reducible**if there exists a

**permutation matrix**$P$ such that:

$$ P^T A P = \left[ \begin{array}{cc} C& D \\ \mathbf{0} & E \end{array} \right] $$

where $\mathbf{0}$ is a $(n-r) \times r$ block matrix with 0's.

Observe that the above definition expresses the

**similarity**between $A$ and a

**block upper triangular matrix**. (Note: 2 matrices $A$ and $B$ are

**similar**if there is a $P$ such that $B = P A P^{-1}$, where P is any invertible matrix. This means that the effect of applying $A$ is the same as applying $B$ between a change of coordinates and back. In the case of permutation matrices, $P^{-1}$ coincides with $P^T$.)

The significance of this definition is that:

matrix A is irreducible $\Longleftrightarrow \Gamma(A) $ is strongly connected

A graph is

**strongly connected**if every node is

**reachable**from any other node.

Example:

where

$$ A = \left[ \begin{array}{c c c c} 0 & 1/3 & 1/3 & 1/ 3 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array} \right] $$

$$ B = \left[ \begin{array}{c c c c} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \end{array} \right] $$

$$ B = P^T A P $$

$$ P = \left[ \begin{array}{c c c c} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{array} \right] $$

The effect of multiplying by $P$ (a

Notice that there is no way in matrix $A$ to go from the nodes 3 & 4 to nodes 1 & 2. Thus matrix $A$ is

The effect of multiplying by $P$ (a

**permutation matrix**), is to exchange some rows, hence the name.Notice that there is no way in matrix $A$ to go from the nodes 3 & 4 to nodes 1 & 2. Thus matrix $A$ is

*reducible*.
To ensure irreducibility (in order for the power method to work), Google forces every page to be reachable from every other page.

Google's solution is to add a

**perturbation**matrix to $H$:
$$ G = \alpha H + (1 - \alpha) E $$

where $G$ is the

**Google matrix**, $\alpha \in (0,1)$, and $E = e \; e^T / n$ where $e$ is a vector or 1's. This makes every page reachable to every page, but later Google modified this to:
$$ E = e \; u^T $$

where $u$ is a probabilistic vector called the

**personalization vector**, which plays a great role in Google's success. More details are explained in Rebaza's book.
Hope this is helpful to you :)

This comment has been removed by a blog administrator.

ReplyDeleteThis comment has been removed by a blog administrator.

ReplyDeleteThis comment has been removed by a blog administrator.

ReplyDelete