The recursive nature of the algorithm complicates analy-
sis of higher-order partitioning, so we restrict our attention
to the (simpler) case of a single bipartitioning of the graph.
Finding an optimal partition, which minimizes the NCut cri-
terion, is an NP-hard problem [19]. However, [19] shows that
when there is a partition (A, B) of V such that the 2nd small-
est eigenvector x1, of the eigensystem (D − W)x = λDx,
is piecewise constant with respect to a partition (A, B):
x1i = α, i ∈ A, and x1i = β,i ∈ B, β = α, then (A, B)
is the optimal partition—it minimizes the NCut criterion
and λ1 = NCut.
Recent analysis has focused on achieving a more thorough
understanding of the conditions under which x1 will be piece-
wise constant. Meila and Shi [13] outline a set of condi-
tions under which the spectral algorithm will return an ex-
act partitioning, showing that the spectral problem formu-
lated for NCut is equivalent to the eigenvectors/values of
the stochastic matrix P = D−1W. The authors connect
spectral clustering to Markov random walks, showing that
P will have an eigenvector that is piecewise constant w.r.t.
a partition (A1, A2) iff P is block-stochastic w.r.t. (A1, A2).
Here, block-stochastic means that the underlying Markov
random walk can be viewed as a Markov chain with state
space ∆ = (A1, A2) and transition probability matrix R =
[Pss' ]s,s'=1,2, where for s, s = 1, 2 , Σj∈As'
Pij is constant∀i ∈
As, and Pss' = Σj∈As'
Pij for any i ∈ As. This shows that
spectral clustering groups nodes based on the similarity of
their transition probabilities to subsets of the graph.
There has been little analysis of the impact of non-constant
transition probabilities on algorithm performance. Empir-
ical evidence indicates that the algorithm finds good par-
titions even when the transition probabilities are far from
constant. Ideally, we would like to characterize the condi-
tions necessary for optimal performance and bound algo-
rithm performance otherwise. As a first step, we analyze
asymptotic performance for non-constant intra- and inter-
cluster transition probabilities.
If we assume a generative model of the data where a latent
cluster variable (A1, A2), determines the attribute values in-
trinsic to the objects and the relationships among objects,
we can analyze the similarity metric S(i, j), and each entry
in W, as a random variable. Consider the entries of row
i. The entries Wij , Wik are not independent because the
similarity values are both based on node i. However, con-
ditioned on the state of i (e.g., attribute values of i), the
entries are independent random variables since the state of
j is independent of the state of k. As a result, the entries
of row i can be viewed as independent random variables.
With this model we can show that any similarity metric will
produce piecewise constant eigenvectors in the limit.
Theorem: Let ∆ = (A1, A2) be a partition of V . Let
the function S(i, j) define the similarity measure between
vi, vj ∈ V . If, ∀i, j, k, S(i, j) is conditionally independent
of S(i, k) given node i, and E[P11]E[P22] = E[P12]E[P21]
then, P has an eigenvector that will converge to piecewise
constant w.r.t. ∆ as |A1|, |A2|→∞.
We provide the intuition for the proof here and refer the
reader to Appendix A for details. If we view the entries of
W as random variables, the normalized values in P are also
random variables (i.e., the entries in W divided by a row
sum of random variables). The total intra- and inter-cluster
transition probabilities in P (e.g., Σj∈As'
Pij ) then corre-
spond to the ratio of two sums of random variables. Since
the transition probabilities are composed of sums of inde-
pendent random variables, as cluster size → ∞, the intra-
and inter-cluster transition probabilities will converge to the
same value for all nodes in each cluster. Therefore an eigen-
vector of the similarity matrix will converge to piecewise con-
stant w.r.t. (A1, A2), provided the intra- and inter-cluster
means (e.g., E[P11], E[P12]) are distinguishable.
This analysis indicates that all metrics will perform equally
in the limit. We expect however, that finite sample perfor-
mance will vary based on the characteristics of the metrics.
In particular, we expect that performance will be influenced
by the mean and variance of the intra- and inter cluster
transition probabilities. We demonstrate the impact of the
transition probability distributions below, using synthetic
data experiments.