## Thursday, January 24, 2013

### Mathematics behind Google's page rank algorithm

Thanks to professor Raymond Chan of CUHK mathematics, who gave an undergrad talk on the mathematics of Google, which I attended by chance.  In this post I'll try to re-present what he said, with some of my own variations.  (Mistakes, if any, are mine.)

I also combined materials from chapter 2 of the book "A First Course in Applied Mathematics" by Jorge Rebaza (2012):

## How much is the PageRank algorithm worth?

In 2012:
Google annual revenue = US 50.2   billion
Google total assets        = US 93.8   billion
Google market cap        = US 248.2 billion
Hong Kong's GDP         = US 243.7 billion

So the entire Hong Kong population working for a year, is roughly worth the outstanding stocks of Google.  Perhaps learning the PageRank algorithm can give us an estimate of how useful mathematics can be :)

## The algorithm

As of 2012 Google is searching 8.9 billion web pages.  First, it builds a directed graph ($\Gamma$) of how the web pages are linked to each other.

Example 1:

$$A = \left[ \begin{array}{c c c c c} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1/2 & 0 & 0 & 1/2 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{array} \right]$$

Example 2:

$$A = \left[ \begin{array}{c c c c c c} 0 & 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/3 & 0 & 1/3 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/2 & 1/2 & 0 \end{array} \right]$$

In the graph:
nodes = web pages

The matrices are the adjacency matrices of the graphs.  Each row represents the out-going links from the page $P_i$.  If a page has $n$ out-going links, each link is weighted by $1/n$.

Notice that each row of the matrices sum to 1.  These are called stochastic matrices.

Then the rank of a page is defined by adding up the ranks of pages with out-links pointing to it:
$$r(P) = \sum_{i} \frac{r(P_i)}{|P_i|}$$
where $r(P_i)$ is the rank of Page $i$, and $|P_i|$ denotes the total number of out-going links from Page $i$.  The rationale is that it is a good thing to be pointed to, so the "goodness" of Page $i$ is passed to Page $P$;  but it has to be divided by $|P_i|$ because a page that points outwards "promiscuously" should have its influence diluted.

But this ranking has to be calculated iteratively starting from an initial ranking.  The iterative equation has the form:
$$r_k(P) = \sum_i \frac{r_{k-1}(P_i)}{|P_i|}, \quad \quad k = 1, 2, ...$$

This means that the "goodness" (ranking) is passed around the graph iteratively until their values at every node converge to stable values.

In vector notation:  let $v_k = [ r_k(P_1) \; r_k(P_2) \; ... \; r_k(P_N) ]^T$.  Then the iterative equation becomes:
$$v^T_k = v^T_{k-1} H$$
where
$$H_{ij} = \left\{ \begin{array}{l@l} \frac{1}{|P_i|} & \mbox{if } P_i \mbox{ has a link to } P \\ 0 & \mbox{otherwise} \end{array} \right.$$

Notice that matrix H is constant throughout the iteration.  That means we are repeatedly multiplying by H:
$$v = \lim_{k \rightarrow \infty} v_k = \lim_{k \rightarrow \infty} H^k v_0$$
where $v$ is the PageRank vector.

The above is reminiscent of the way we repeatedly press the "cosine" button on a calculator, and see the results converge.  That value is the solution to the equation:
$$\cos x = x.$$
In general, when we want to solve an equation of the form $f(x) = x$, we can repeatedly apply $f()$ starting from an initial value $x_0$.  The point for which $f(x^*) = x^*$ is called the fixed point, a very useful concept in mathematics.

The above iteration is called the power method, which will converge to the eigenvector with the greatest eigenvalue of the matrix $H$.  This is called the dominant eigenvector, which I will now explain...

## Power method

This section explains why the power method converges.  Recall the definition of eigenvector and eigenvalue:
$$A x = \lambda x$$
where $x$ = eigenvector, $\lambda$ = eigenvalue.

Assume that the matrix A satisfies these conditions:

• $A$ has a complete set of linearly independent eigenvectors $v_1, ... v_n$
• the corresponding eigenvalues can be ordered:
$|\lambda_1| > |\lambda_2| \ge ... \ge |\lambda_n|$
(notice the strict > in the first term)
For iteration, choose an arbitrary initial vector $x_0$.  Due to the eigenvectors forming a basis, we can express $x_0$ as:
$$x_0 = c_1 v_1 + ... + c_n v_n$$

Now we multiply the above with matrix A and get:
$$\begin{array}{l@=l} A x_0 & = c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + ... + c_n \lambda_n v_n \\ A^2 x_0 & = c_1 \lambda_1^2 v_1 + c_2 \lambda_2^2 v_2 + ... + c_n \lambda_n^2 v_n \\ & \vdots \\ A^k x_0 & = c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + ... + c_n \lambda_n^k v_n \end{array}$$

And because $|\lambda_1| > |\lambda_2| \ge ... \ge |\lambda_n|$, as $k \rightarrow \infty$, the weight of the first term outgrows those of the rest, so the vector $A^k x_0$ moves towards the dominant eigenvector direction.

The rate of convergence is determined by the ratio $\frac{ |\lambda_2| } { |\lambda_1| }$.

For the power method to converge, we need to ensure that the matrix is stochastic and irreducible....

## Irreducible matrices

The section explains the necessary condition for the power method to converge;  but the measure to force convergence also gave rise to Google's personalized search, as we shall see why.

A matrix is called reducible if there exists a permutation matrix $P$ such that:
$$P^T A P = \left[ \begin{array}{cc} C& D \\ \mathbf{0} & E \end{array} \right]$$
where $\mathbf{0}$ is a $(n-r) \times r$ block matrix with 0's.

Observe that the above definition expresses the similarity between $A$ and a block upper triangular matrix.  (Note:  2 matrices $A$ and $B$ are similar if there is a $P$ such that $B = P A P^{-1}$, where P is any invertible matrix.  This means that the effect of applying $A$ is the same as applying $B$ between a change of coordinates and back.  In the case of permutation matrices, $P^{-1}$ coincides with $P^T$.)

The significance of this definition is that:
matrix A is irreducible $\Longleftrightarrow \Gamma(A)$ is strongly connected

A graph is strongly connected if every node is reachable from any other node.

Example:
where
$$A = \left[ \begin{array}{c c c c} 0 & 1/3 & 1/3 & 1/ 3 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array} \right]$$

$$B = \left[ \begin{array}{c c c c} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \end{array} \right]$$

$$B = P^T A P$$

$$P = \left[ \begin{array}{c c c c} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{array} \right]$$

The effect of multiplying by $P$ (a permutation matrix), is to exchange some rows, hence the name.
Notice that there is no way in matrix $A$ to go from the nodes 3 & 4 to nodes 1 & 2.  Thus matrix $A$ is reducible.

To ensure irreducibility (in order for the power method to work), Google forces every page to be reachable from every other page.

Google's solution is to add a perturbation matrix to $H$:
$$G = \alpha H + (1 - \alpha) E$$
where $G$ is the Google matrix, $\alpha \in (0,1)$, and $E = e \; e^T / n$ where $e$ is a vector or 1's.  This makes every page reachable to every page, but later Google modified this to:
$$E = e \; u^T$$
where $u$ is a probabilistic vector called the personalization vector, which plays a great role in Google's success.  More details are explained in Rebaza's book.

Hope this is helpful to you :)