Spectral Clustering with Links and Attributes
Jennifer Neville
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
jneville@cs.umass.edu
Micah Adler
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
micah@cs.umass.edu
David Jensen
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
jensen@cs.umass.edu
ABSTRACT
If relational data contain communities—groups of inter-related
items with similar attribute values—a clustering technique
that considers attribute information and the structure of
relations simultaneously should produce more meaningful
clusters than those produced by considering attributes alone.
We investigate this hypothesis in the context of a spectral
graph partitioning technique, considering a number of hy-
brid similarity metrics that combine both sources of infor-
mation. Through simulation, we find that two of the hybrid
metrics achieve superior performance over a wide range of
data characteristics. We analyze the spectral decomposition
algorithm from a statistical perspective and show that the
successful hybrid metrics exaggerate the separation between
cluster similarity values, at the expense of increased vari-
ance. We cluster several relational datasets using the best
hybrid metric and show that the resulting clusters exhibit
significant community structure, and that they significantly
improve performance in a related classification task.
Categories and Subject Descriptors
I.5.3 [Pattern Recognition]: Clustering
Keywords
Clustering, Relational Learning, Spectral Analysis
1. INTRODUCTION
Spectral clustering techniques, which partition data into
disjoint clusters using the eigenstructure of a similarity
matrix, have been successfully applied in a number of domains,
including image segmentation [19] and document clustering [5].
Finding an optimal partition is in general NP-complete, but the
eigenvectors of the matrix provide information that can be used
to guide an approximate solution. Experimental evidence has
shown that this heuristic approach often works well in practice,
which has prompted further investigation into the properties of
spectral clustering. Recent
findings—facilitated by a long history of work in spectral
graph theory (e.g., [2])—include a connection to random
walks [13] and preliminary performance analysis [10, 16].
In this paper, we investigate methods of adapting spectral
clustering techniques to relational domains.
The goal of this work is to find communities in relational
data represented as an attributed graph G = (V,E,X),
where the nodes V represent objects in the data (e.g., genes),
the edges E represent relations among the objects (e.g., in-
teractions), and the attributes X record data about each ob-
ject (e.g., localization). Community clusters identify groups
of objects that have similar attributes and are also highly
inter-related. For example in genomic data, a group of genes
with similar attributes and many common interactions may
all be involved in a similar function in the cell. The underly-
ing assumption is that there is a latent cluster variable that
influences both the attribute values intrinsic to objects and
the relationships among objects. In particular, objects are
more likely to link to other objects in the same cluster than
objects in other clusters, and pairs of objects within a clus-
ter are more likely to have similar attribute values than pairs
spanning different clusters. A clustering algorithm that ex-
amines both link structure and attributes simultaneously
should be more robust to noise than methods examining
attribute or link information in isolation.
There has been little work applying spectral techniques to
relational domains with a combination of link and attribute
information. Existing techniques use either: (1) a complete
graph where attribute similarity is calculated for all n × n
pairs of objects (e.g., [16]), or (2) a nearest neighbor graph,
where attribute similarity is calculated for n × d pairs of
objects—each object is connected to a fixed number (d) of
other objects determined by spatial locality (e.g., [19]). Our
work differs in that we are trying to incorporate the hetero-
geneous relational structure into the similarity metric.
The similarity metric, used to populate the similarity ma-
trix, provides a means to extend spectral techniques to new
domains. However, the success of spectral clustering tech-
niques depends heavily on the choice of metric. There has
been some research into learning the correct similarity func-
tion from labeled data (e.g., [1]), but for domains where the
correct clustering is unknown, design has been approached
in a relatively ad-hoc manner. This leaves us with little guid-
ance as to how to incorporate link and attribute information
into a metric for relational domains. This work investigates
the design of similarity metrics that incorporate multiple
sources of information and identifies the characteristics that
underlie successful metrics.
Specifically, we analyze the normalized cut (NCut) spec-
tral partitioning algorithm [19] from a statistical perspec-
tive. For the special case of bi-partitioning, we show that as
cluster size → ∞, the spectral decomposition will include an
eigenvector that is piecewise constant, with respect to the
clusters, for any similarity metric where the average intra-
cluster similarity differs from the average inter-cluster sim-
ilarity. If the eigenvector associated with the 2nd smallest
eigenvalue of the similarity matrix is piecewise constant, the
spectral partitioning will be exact [19]. Next, we empirically
evaluate the effect of finite cluster sizes using synthetic data.
We show that: (1) decreasing variance of cluster similari-
ties, and increasing separation of similarities, both improve
the ordering of the eigenvector with respect to the clusters,
and (2) increasing the separation of cluster similarities has
a greater impact on algorithm performance when the NCut
objective function is used. This indicates that a metric that
increases variance in order to better separate the cluster sim-
ilarities will perform better over a wider range of conditions.
Based on these results, we propose a hybrid similarity metric
for relational data that incorporates link and attribute infor-
mation, and we evaluate performance on several relational
datasets. We show that resulting clusters exhibit signifi-
cant community structure and demonstrate significant per-
formance gains when using the resulting clusters in a related
classification task.
2. SPECTRAL CLUSTERING
Spectral clustering originated with graph partitioning tech-
niques that exploit the connection between eigenvectors and
algebraic properties of a graph (e.g., [6, 7]). Recently, Shi
and Malik [19] presented a new clustering algorithm that
uses spectral partitioning to optimize the NCut objective
function. We investigate the application of this algorithm
to relational domains through the use of similarity metrics
that incorporate link and attribute information.
The NCut algorithm of [19] clusters datasets through eigenvalue
decomposition of a similarity matrix. It is a divisive,
hierarchical clustering algorithm that takes a graph
$G = (V, E)$, a set of $k$ attributes $X = \{X^1, \dots, X^k\}$,
where $X^k = \{x^k_i : v_i \in V\}$, and a similarity function
$S$, where $S(i, j)$ defines the similarity between
$v_i, v_j \in V$, and recursively partitions the graph as
follows:
Let $W_{N \times N} = [S(i, j)]$ be the similarity matrix and
let $D$ be an $N \times N$ diagonal matrix with
$d_i = \sum_{j \in V} S(i, j)$. Solve the eigensystem
$(D - W)x = \lambda D x$ for the eigenvector $x_1$ associated
with the 2nd smallest eigenvalue $\lambda_1$. Consider $m$
uniformly spaced values between the minimum and maximum value
in $x_1$. For each such value $t$: bipartition the nodes into
$(A, B)$ such that $A \cap B = \emptyset$, $A \cup B = V$, and
$\forall v_a \in A$, $x_{1a} < t$, and calculate the NCut value
for the partition,
$$\mathrm{NCut}(A, B) = \frac{\sum_{i \in A, j \in B} S(i, j)}{\sum_{i \in A} d_i} + \frac{\sum_{i \in A, j \in B} S(i, j)}{\sum_{j \in B} d_j}.$$
Partition the graph into the $(A, B)$ with minimum NCut. If
$\mathrm{stability}(A, B) \le c$, recursively repartition $A$
and $B$.¹
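As an illustration only, a single bipartitioning step of the procedure above might be sketched in Python as follows. The sketch assumes NumPy, a dense symmetric similarity matrix W with strictly positive row sums, and a dense eigensolver in place of the sparse Lanczos methods. The function names ncut_value and ncut_bipartition are hypothetical, the exact placement of the m thresholds is our assumption, and the stability test and the recursion are omitted.

```python
import numpy as np


def ncut_value(W, in_A):
    """NCut(A, B) for the bipartition given by the boolean mask in_A."""
    d = W.sum(axis=1)                      # d_i = sum_j S(i, j)
    cut = W[in_A][:, ~in_A].sum()          # sum of S(i, j), i in A, j in B
    return cut / d[in_A].sum() + cut / d[~in_A].sum()


def ncut_bipartition(W, m):
    """Return (mask of A, NCut value) for the best of m thresholds on x1."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)          # assumes every d_i > 0
    # (D - W) x = lambda D x is equivalent to the symmetric problem
    # (I - D^{-1/2} W D^{-1/2}) z = lambda z with x = D^{-1/2} z.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    x1 = d_inv_sqrt * eigvecs[:, 1]        # 2nd smallest eigenvalue

    best_mask, best_val = None, np.inf
    # m evenly spaced thresholds strictly between min(x1) and max(x1)
    for t in np.linspace(x1.min(), x1.max(), m + 2)[1:-1]:
        in_A = x1 < t
        if in_A.all() or not in_A.any():   # skip degenerate partitions
            continue
        val = ncut_value(W, in_A)
        if val < best_val:
            best_mask, best_val = in_A, val
    return best_mask, best_val


# Two groups of four nodes: similarity 1.0 within a group, 0.01 across.
W = np.full((8, 8), 0.01)
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
mask, value = ncut_bipartition(W, m=4)
```

On this small example the returned mask should separate the two groups of four nodes, since the eigenvector is piecewise constant with respect to them.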
¹ We use the stability threshold proposed in [19], where the
stability value is the ratio of the minimum and maximum bin
sizes after the values of $x_1$ are binned by value into $m$
bins. All the experiments in this paper used the settings
$m = \log_2(N) + 1$ (rounded to an integer) and $c = 0.06$.
Sensitivity analysis on synthetic data shows $c = 0.06$ to be a
conservative threshold, returning clusters with high precision
but low recall.

It takes $O(n^3)$ operations to solve for all eigenvalues of an
arbitrary eigensystem. However, $O(|E|)$ approximate algorithms
exist [10], and if the weight matrix is sparse, $O(n^{1.4})$
Lanczos algorithms can be used to compute the solution [18].
For this reason, similarity metrics that produce sparse
matrices are preferable.
Our hybrid metrics calculate the similarity between objects
$i$ and $j$ through a weighted combination of attribute and link
information:
$$S(i, j) = \alpha \cdot \frac{1}{k} \sum_k s_k(i, j) + (1 - \alpha) \cdot l,$$
where $s_k(i, j) = 1$ if $x^k_i = x^k_j$ and 0 otherwise, and
$l = 1$ if $e_{ij} \in E$ or $e_{ji} \in E$, and 0 otherwise.
When $\alpha = 1$, we refer to the metric as AttrOnly. When
$\alpha = 0$, we refer to the metric as LinkOnly. These metrics
are included as baselines: one for data clustering techniques
that ignore link information, and the other for graph
partitioning techniques that ignore attribute information.
AttrOnly calculates similarity by counting the number of
attribute values objects $i$ and $j$ have in common (scaled by
$k$ so the maximum similarity is 1). LinkOnly uses the
relational structure as a measure of similarity.
When $\alpha = \frac{k}{k+1}$, we refer to the metric as
LinkAsAttr. This approach is an obvious way to include
relational information: links are incorporated as a match on
the $(k + 1)$th attribute. With no prior domain knowledge, we
have no reason to expect that link structure contains more
information than attribute values. However, link structure is
often central in relational domains. For example, in a graph of
hyperlinked web documents, we expect a link to confer more
information about topic clustering than a match on a single
word for two pages. To better exploit the relational
information, we set $\alpha = \frac{1}{2}$. This metric,
referred to as WtLinkAttr1, combines the link and attribute
information uniformly: high similarity indicates that two
objects are related or have a number of attribute values in
common.
In sparse relational graphs, the expected intra-cluster link
similarity will be less than one, even if the links are
perfectly correlated with cluster membership. In this case, if
the link and attribute information are combined uniformly
(e.g., WtLinkAttr1), or if the attributes are given
proportionally more weight (e.g., LinkAsAttr), noise in the
attributes can drown out a strong link signal. An approach that
gives the link information proportionally more weight (e.g.,
$\alpha < \frac{1}{2}$, since $\alpha$ weights the attribute
term) may achieve better performance. In practice we will not
know how to scale the link information to combine the two
sources of information equally. However, for the synthetic
experiments discussed in Section 4, we know the maximum edge
probability is 0.2, so setting $\alpha = \frac{1}{6}$ (which
weights the link term five times as heavily as the attribute
term) equalizes the attribute and link signals. When
$\alpha = \frac{1}{6}$, we refer to the metric as WtLinkAttr2.
Although we will not know the scaling factor in practice, we
include this metric to test the conjecture that the poor
performance of WtLinkAttr1 is due to the relatively weak link
signal being combined uniformly with the attribute signal.
When $\alpha = l$, we refer to the metric as LinkAsFilter. It
calculates similarity by weighting the existing edges of $G$
with the AttrOnly metric: substituting $\alpha = l$ gives
$S(i, j) = l \cdot \frac{1}{k} \sum_k s_k(i, j)$, because
$(1 - l) \cdot l = 0$ for $l \in \{0, 1\}$. Objects that are not
directly related have a similarity of 0 regardless of their
attribute values. A high similarity score indicates that two
objects are related and have a number of attribute values in
common. This approach incorporates both sources of information
while maintaining the sparsity of the relational data graph so
the algorithm can use efficient eigensolver techniques.
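The six metrics of this section differ only in the value of α, which the following minimal Python sketch makes explicit. The function names hybrid_similarity and alpha_for are hypothetical and introduced only for illustration; attribute vectors are assumed to be equal-length sequences of discrete values, and the link indicator is passed in as a boolean.

```python
def hybrid_similarity(x_i, x_j, linked, alpha):
    """S(i, j) = alpha * (1/k) * sum_k s_k(i, j) + (1 - alpha) * l."""
    k = len(x_i)
    attr = sum(1 for a, b in zip(x_i, x_j) if a == b) / k  # fraction of matches
    link = 1.0 if linked else 0.0                          # l in {0, 1}
    return alpha * attr + (1.0 - alpha) * link


def alpha_for(metric, k, linked):
    """Value of alpha that defines each named metric."""
    table = {
        "AttrOnly": 1.0,
        "LinkOnly": 0.0,
        "LinkAsAttr": k / (k + 1),
        "WtLinkAttr1": 1 / 2,
        "WtLinkAttr2": 1 / 6,   # tied to the maximum edge probability of 0.2
        "LinkAsFilter": 1.0 if linked else 0.0,  # alpha = l
    }
    return table[metric]


x_i = (1, 0, 1, 1, 0)
x_j = (1, 1, 1, 0, 0)           # three of five attribute values match
for name in ("AttrOnly", "LinkOnly", "LinkAsAttr",
             "WtLinkAttr1", "WtLinkAttr2", "LinkAsFilter"):
    for linked in (True, False):
        s = hybrid_similarity(x_i, x_j, linked, alpha_for(name, len(x_i), linked))
        print(f"{name:12s} linked={linked!s:5s} S={s:.3f}")
```

Only LinkOnly and LinkAsFilter return exactly zero for every unlinked pair, which is what keeps their similarity matrices as sparse as the graph itself.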
3. ALGORITHM ANALYSIS
The recursive nature of the algorithm complicates analy-
sis of higher-order partitioning, so we restrict our attention
to the (simpler) case of a single bipartitioning of the graph.
Finding an optimal partition, which minimizes the NCut
criterion, is an NP-hard problem [19]. However, [19] shows that
when there is a partition $(A, B)$ of $V$ such that the
eigenvector $x_1$ associated with the 2nd smallest eigenvalue
of the eigensystem $(D - W)x = \lambda D x$ is piecewise
constant with respect to $(A, B)$, that is, $x_{1i} = \alpha$
for $i \in A$ and $x_{1i} = \beta$ for $i \in B$ with
$\beta \ne \alpha$, then $(A, B)$ is the optimal partition: it
minimizes the NCut criterion and $\lambda_1 = \mathrm{NCut}$.
Recent analysis has focused on achieving a more thorough
understanding of the conditions under which $x_1$ will be
piecewise constant. Meila and Shi [13] outline a set of
conditions under which the spectral algorithm will return an
exact partitioning, showing that the spectral problem
formulated for NCut is equivalent to finding the
eigenvectors/values of the stochastic matrix $P = D^{-1}W$.
The authors connect spectral clustering to Markov random walks,
showing that $P$ will have an eigenvector that is piecewise
constant w.r.t. a partition $(A_1, A_2)$ iff $P$ is
block-stochastic w.r.t. $(A_1, A_2)$. Here, block-stochastic
means that the underlying Markov random walk can be viewed as a
Markov chain with state space $\Delta = (A_1, A_2)$ and
transition probability matrix $R = [P_{ss'}]_{s,s'=1,2}$, where
for $s, s' = 1, 2$, $\sum_{j \in A_{s'}} P_{ij}$ is constant
$\forall i \in A_s$, and $P_{ss'} = \sum_{j \in A_{s'}} P_{ij}$
for any $i \in A_s$. This shows that spectral clustering groups
nodes based on the similarity of their transition probabilities
to subsets of the graph.
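The block-stochastic condition can be checked directly on a small example. The sketch below is an illustration under stated assumptions rather than part of the analysis in [13]: it uses plain Python lists, exactly two blocks, and hypothetical helper names. It verifies that when P is block-stochastic, the second eigenvector of the 2 x 2 matrix R, copied onto the nodes of each block, is an eigenvector of P.

```python
def transition_matrix(W):
    """Row-normalize W to obtain P = D^{-1} W."""
    return [[w / sum(row) for w in row] for row in W]


def block_sums(P, blocks):
    """For each node i, the total transition probability into each block."""
    return [[sum(P[i][j] for j in block) for block in blocks]
            for i in range(len(P))]


def is_block_stochastic(P, blocks, tol=1e-12):
    """True if all nodes of a block share the same block-level sums."""
    sums = block_sums(P, blocks)
    return all(abs(a - b) <= tol
               for block in blocks
               for i in block
               for a, b in zip(sums[i], sums[block[0]]))


def lifted_eigenpair(P, blocks):
    """Eigenvalue a + b - 1 of R = [[a, 1-a], [1-b, b]] and its eigenvector
    (1 - a, -(1 - b)) copied onto the nodes of the two blocks."""
    sums = block_sums(P, blocks)
    a = sums[blocks[0][0]][0]
    b = sums[blocks[1][0]][1]
    x = [0.0] * len(P)
    for i in blocks[0]:
        x[i] = 1.0 - a
    for i in blocks[1]:
        x[i] = -(1.0 - b)
    return a + b - 1.0, x


# Symmetric similarities whose row-normalized form is block-stochastic.
W = [[2, 2, 1, 1],
     [2, 2, 0, 2],
     [1, 0, 1, 0],
     [1, 2, 0, 3]]
blocks = [[0, 1], [2, 3]]
P = transition_matrix(W)
lam, x = lifted_eigenpair(P, blocks)
Px = [sum(p * v for p, v in zip(row, x)) for row in P]
residual = max(abs(px - lam * v) for px, v in zip(Px, x))  # ~0 if P x = lam x
```

Note that the individual entries of P differ within a block here; only the block-level sums need to agree for the piecewise constant eigenvector to exist.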
There has been little analysis of the impact of non-constant
transition probabilities on algorithm performance. Empir-
ical evidence indicates that the algorithm finds good par-
titions even when the transition probabilities are far from
constant. Ideally, we would like to characterize the condi-
tions necessary for optimal performance and bound algo-
rithm performance otherwise. As a first step, we analyze
asymptotic performance for non-constant intra- and inter-
cluster transition probabilities.
If we assume a generative model of the data where a latent
cluster variable (A1, A2), determines the attribute values in-
trinsic to the objects and the relationships among objects,
we can analyze the similarity metric S(i, j), and each entry
in W, as a random variable. Consider the entries of row
i. The entries Wij , Wik are not independent because the
similarity values are both based on node i. However, con-
ditioned on the state of i (e.g., attribute values of i), the
entries are independent random variables since the state of
j is independent of the state of k. As a result, the entries
of row i can be viewed as independent random variables.
With this model we can show that any similarity metric will
produce piecewise constant eigenvectors in the limit.
Theorem: Let $\Delta = (A_1, A_2)$ be a partition of $V$. Let
the function $S(i, j)$ define the similarity measure between
$v_i, v_j \in V$. If, $\forall i, j, k$, $S(i, j)$ is
conditionally independent of $S(i, k)$ given node $i$, and
$E[P_{11}]E[P_{22}] \ne E[P_{12}]E[P_{21}]$, then $P$ has an
eigenvector that will converge to piecewise constant w.r.t.
$\Delta$ as $|A_1|, |A_2| \to \infty$.
We provide the intuition for the proof here and refer the
reader to Appendix A for details. If we view the entries of $W$
as random variables, the normalized values in $P$ are also
random variables (i.e., the entries in $W$ divided by a row sum
of random variables). The total intra- and inter-cluster
transition probabilities in $P$ (e.g.,
$\sum_{j \in A_{s'}} P_{ij}$) then correspond to the ratio of
two sums of random variables. Since the transition
probabilities are composed of sums of independent random
variables, as cluster size $\to \infty$, the intra- and
inter-cluster transition probabilities will converge to the
same value for all nodes in each cluster. Therefore an
eigenvector of $P$ will converge to piecewise constant w.r.t.
$(A_1, A_2)$, provided the intra- and inter-cluster means
(e.g., $E[P_{11}]$, $E[P_{12}]$) are distinguishable.
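The intuition can be illustrated with a small simulation. This is a sketch under simplifying assumptions and is not the proof of Appendix A: similarities are taken to be independent Bernoulli draws, rows are sampled independently of one another (so the symmetry of W is ignored), both clusters have n nodes, and the function name intra_prob_spread is hypothetical.

```python
import random
import statistics


def intra_prob_spread(n, p_in, p_out, seed=0):
    """Mean and standard deviation, across the nodes of one cluster, of the
    total intra-cluster transition probability sum_{j in A_1} P_ij."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n):                       # one row of W per node in A_1
        intra = sum(rng.random() < p_in for _ in range(n))   # entries into A_1
        inter = sum(rng.random() < p_out for _ in range(n))  # entries into A_2
        if intra + inter > 0:                # skip rows with no similarity mass
            probs.append(intra / (intra + inter))
    return statistics.mean(probs), statistics.pstdev(probs)


if __name__ == "__main__":
    for n in (20, 200, 1000):
        mean, spread = intra_prob_spread(n, p_in=0.15, p_out=0.05)
        print(f"n={n:5d}  mean={mean:.3f}  std across nodes={spread:.3f}")
```

As n grows, the spread across nodes should shrink while the mean approaches p_in / (p_in + p_out), so the rows of P become closer to block-stochastic.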
This analysis indicates that all metrics will perform equally
in the limit. We expect however, that finite sample perfor-
mance will vary based on the characteristics of the metrics.
In particular, we expect that performance will be influenced
by the mean and variance of the intra- and inter cluster
transition probabilities. We demonstrate the impact of the
transition probability distributions below, using synthetic
data experiments.
4. SYNTHETIC DATA EXPERIMENTS
In order to identify the situations where we can expect each
of the similarity metrics to perform well, we evaluate al-
gorithm performance on synthetic data sets for which the
correct clustering is known. This facilitates analysis over a
wide range of conditions.
4.1 Synthetic Data
Our synthetic data sets are undirected, connected graphs
(G = (V,E)) where nodes correspond to objects and edges
correspond to relations among objects. Unless otherwise in-
dicated, |V | = 200. A binary label, C = {+, −}, is used
to represent cluster membership; labels are assigned ran-
domly to each object with P(+) = 0.5. Each object has five
binary attributes, where the attribute values are assigned
randomly given the object’s cluster label. Edges are added
to the graph by considering each pair of objects in V in-
dependently, and adding edges randomly given the cluster
labels of the two objects.
The experiments record algorithm performance while varying
both attribute and link association. Within each level of
correlation, all five attributes were generated with the same
probability:
$P_+ = P(A = 1 \mid C = +) \in \{0.50, 0.55, \dots, 0.95, 1.0\}$,
$P_- = P(A = 1 \mid C = -) = 1.0 - P_+$. The symmetry in
attribute parameters simplifies the analysis but is not
necessary for algorithm correctness. Intra-cluster and
inter-cluster links were generated with the following range of
probabilities:
$P^l_{in} = P(e_{ij} \mid C_i = C_j) \in \{0.10, 0.12, \dots, 0.18, 0.20\}$,
$P^l_{out} = P(e_{ij} \mid C_i \ne C_j) = 0.2 - P^l_{in}$. Here
the range of probabilities, and the symmetry, were chosen to
produce a graph with
approximately 10% of the n(n − 1)/2 possible edges. This
level of linkage is comparable to the levels of sparsity we
have observed in real-world relational data sets.
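A minimal sketch of this generation process, using only the Python standard library, is given below. The function name generate_synthetic and its parameter names are hypothetical. The sketch does not enforce that the resulting graph is connected, since the text does not say how connectivity is obtained; that step is left out as an explicit assumption.

```python
import random


def generate_synthetic(n=200, k=5, p_plus=0.8, p_link_in=0.16, seed=0):
    """Return (labels, attributes, edges) for one synthetic data set."""
    rng = random.Random(seed)
    p_minus = 1.0 - p_plus            # P(A = 1 | C = -)
    p_link_out = 0.2 - p_link_in      # P(e_ij | C_i != C_j)

    # Cluster labels with P(+) = 0.5
    labels = ['+' if rng.random() < 0.5 else '-' for _ in range(n)]

    # k binary attributes per object, drawn given the object's label
    attrs = [[1 if rng.random() < (p_plus if c == '+' else p_minus) else 0
              for _ in range(k)]
             for c in labels]

    # Each unordered pair is considered independently (undirected edges)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_link_in if labels[i] == labels[j] else p_link_out
            if rng.random() < p:
                edges.add((i, j))
    return labels, attrs, edges


labels, attrs, edges = generate_synthetic()
density = len(edges) / (len(labels) * (len(labels) - 1) / 2)
print(f"{len(edges)} edges, density {density:.3f}")
```

With roughly balanced clusters the expected density is close to 0.1, because the intra- and inter-cluster link probabilities always sum to 0.2.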
4.2 Metric Performance
We measured the accuracy of the six metrics across the range
of attribute and link probabilities described above. Figure 1
reports the accuracy of the clusterings returned by the simi-
larity metrics, averaged over 100 trials at each setting. Note
that the bottom, foremost corner of each plot represents
completely random link and attribute information, where
no metric should do better than 0.5.
LinkOnly and AttrOnly performance is as expected—they
perform well when the link, or respectively attribute, signal
is moderate to high, but poorly otherwise. The LinkAsAttr
and WtLinkAttr1 results are comparable to AttrOnly. How-
ever, the LinkAsFilter and WtLinkAttr2 metrics achieve
perfect accuracy over a wide range of conditions, with LinkAs-
Filter covering more space than WtLinkAttr2. These met-
rics should yield good results in datasets where either the
links or the attributes are moderately correlated with the
clusters. However, they do not always perform as well as
LinkOnly and AttrOnly. Consider the LinkOnly results when
link correlation is moderate and attribute correlation is low—
both hybrid metrics achieve significantly lower accuracy than
would be achieved considering links in isolation. Similar be-
havior is apparent for the AttrOnly metric, but notice that
the effect is more pronounced in this situation. This indi-
cates that the two metrics rely more heavily on link infor-
mation and illustrates the tradeoff for utilizing both sources
of information—the additional information increases vari-
ance, which will impair performance in some situations, in
exchange for better coverage of the space.
4.3 Performance Analysis
LinkAsFilter and WtLinkAttr2 achieve superior performance
over a wide range of data characteristics, but what is the
mechanism by which this occurs? Following our analysis
in section 3, we hypothesize that metric performance is in-
fluenced by intra- and inter-cluster transition probabilities.
We conjecture that the algorithm will be able to distinguish
clusters, if the distributions of intra- and inter-cluster tran-
sition probabilities are separable, where separation depends
on the mean and variance of the transition probabilities.
Given our data generation parameters, we can calculate the
intra- and inter-cluster mean transition probabilities
analytically. Recall that our data generation process produces
the same distribution for each cluster, and furthermore, we
know that the transition probabilities in $P$ are normalized to
sum to one. This means we can examine
$\mu_{P_{in}} = E[P_{in}]$ from a single set of distributions,
$\mu_{P_{in}}$ and $\mu_{P_{out}}$. When $\mu_{P_{in}} = 1.0$
there is maximal separation between the two clusters;
$\mu_{P_{in}} = 0.5$ corresponds to no separation.
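One way to approximate these means from the generation parameters is sketched below. It is an approximation we introduce for illustration and not necessarily the calculation used for Figure 2: it assumes equal cluster sizes, ignores self-similarity, and replaces the expectation of a ratio by the ratio of expected intra- and inter-cluster similarities. The function names are hypothetical.

```python
ALPHAS = {"AttrOnly": 1.0, "LinkOnly": 0.0,
          "WtLinkAttr1": 1 / 2, "WtLinkAttr2": 1 / 6}


def expected_similarities(metric, p_plus, p_link_in, k=5):
    """Expected S(i, j) for a same-cluster pair and a cross-cluster pair."""
    p_link_out = 0.2 - p_link_in
    # Probability that one binary attribute matches for the two objects
    match_in = p_plus ** 2 + (1 - p_plus) ** 2   # same cluster
    match_out = 2 * p_plus * (1 - p_plus)        # different clusters
    if metric == "LinkAsFilter":
        # Links and attributes are generated independently given the labels
        return p_link_in * match_in, p_link_out * match_out
    alpha = k / (k + 1) if metric == "LinkAsAttr" else ALPHAS[metric]
    return (alpha * match_in + (1 - alpha) * p_link_in,
            alpha * match_out + (1 - alpha) * p_link_out)


def approx_mu_p_in(metric, p_plus, p_link_in, k=5):
    """Approximate intra-cluster mean transition probability."""
    s_in, s_out = expected_similarities(metric, p_plus, p_link_in, k)
    return s_in / (s_in + s_out)


for name in ("AttrOnly", "LinkOnly", "LinkAsAttr",
             "WtLinkAttr1", "WtLinkAttr2", "LinkAsFilter"):
    print(name, round(approx_mu_p_in(name, p_plus=0.8, p_link_in=0.16), 3))
```

Under this approximation the value is 0.5 when neither source carries signal and 1.0 when the metric separates the clusters perfectly, matching the two endpoints described above.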
Figure 2 graphs µPin vs. attribute/link correlations. The
shapes of the graphs are quite similar to the accuracy graphs
in figure 1, indicating a strong relationship between mean
separation and algorithm performance. However, the areas
where we observe perfect performance (i.e., accuracy = 1.0)
do not necessarily correspond to maximum mean separa-
tion (i.e., µPin ≤ 1.0). This illustrates a difference between
the LinkAsFilter and WtLinkAttr2 metrics—µPin is signif-
icantly higher on average for the LinkAsFilter metric.
To examine the effect of $\mu_{P_{in}}$ on algorithm
performance, we analyzed the data from all metrics
concurrently. Figure 3a graphs $\mu_{P_{in}}$ vs. accuracy for
the experiments reported above, combining results from all the
metrics in the same graph. There is a clear relationship
between $\mu_{P_{in}}$ and accuracy (corr = 0.849,
$p \ll 0.05$): accuracy is consistently high for
$\mu_{P_{in}} > 0.675$ and consistently low otherwise. We looked
at the association between $\mu_{P_{in}}$ and the eigenvector
values in $x_1$ using a number of different measures of
eigenvector stability. Only one measure showed a clear
relationship to $\mu_{P_{in}}$: a measure of the quality of the
ordering in the (sorted) eigenvector, which looked at the
sorted eigenvector and recorded the maximum accuracy possible
from the set of $m$ possible partition values considered by the
algorithm.
The linear search for an optimal partition (in the NCut al-
gorithm) should not be adversely affected by degradation of
piecewise constancy unless the degradation also affects the
ordering of objects’ eigenvector values. If the maximum ac-
curacy is low, this indicates disorder in the eigenvector. The
evector ordering measure is graphed against µPin in figure
3b. It shows that decreasing µPin results in a disordering of
the eigenvector values. These results explain the high accu-
racy results—for µPin > 0.675 there is little disorder in the
eigenvector.
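The evector ordering measure described above can be sketched as follows. This is our reading of the description rather than the code used in the experiments: the placement of the m thresholds (evenly spaced strictly between the extreme eigenvector values) is an assumption, accuracy is computed up to a swap of the two cluster names, and the function name evector_ordering is hypothetical.

```python
def evector_ordering(x1, labels, m):
    """Maximum accuracy achievable by any of m thresholds on x1."""
    lo, hi = min(x1), max(x1)
    n = len(x1)
    best = 0.0
    for step in range(1, m + 1):
        t = lo + (hi - lo) * step / (m + 1)      # candidate partition value
        pred = [v < t for v in x1]               # side of the threshold
        agree = sum(p == (c == '+') for p, c in zip(pred, labels))
        acc = max(agree, n - agree) / n          # cluster names are arbitrary
        best = max(best, acc)
    return best


labels = ['+', '+', '+', '-', '-', '-']
ordered = [-0.9, -0.8, -0.7, 0.7, 0.8, 0.9]      # clusters cleanly separated
disordered = [-0.9, 0.8, -0.7, 0.7, -0.8, 0.9]   # two objects out of order
print(evector_ordering(ordered, labels, m=3))
print(evector_ordering(disordered, labels, m=3))
```

A value of 1.0 means some threshold recovers the clusters exactly, so any remaining loss of accuracy must come from which threshold the NCut criterion selects.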
Figure 3c graphs evector ordering vs. accuracy. There is
a strong correlation between evector ordering and accuracy,
but there are also a significant number of trials with very
little disorder that achieve only low accuracy. This effect
is explained by figure 3d, where we graph the precision of
the smallest cluster returned by the algorithm. This shows
that when the eigenvector is ordered correctly but the al-
gorithm only achieves low accuracy, it is because the algo-
rithm prefers to separate a small, but pure, cluster from the
rest of the graph. Why does the algorithm break off small,
high-precision clusters even when the eigenvector ordering
is correct? This is not a spurious effect due to consideration
of only a small number of thresholds (e.g., m values). It
remains consistent even when we set m = N. We discuss
reasons for this effect below.
We have shown that mean separation affects algorithm per-
formance through the ordering of the objects’ eigenvector
values, but how does variance interact with mean sepa-
ration to degrade performance? Figures 4a-b graph the
same variables as figure 3a, but for a set of experiments
with |V | = 500, and |V | = 50. This illustrates the im-
pact of decreased, and increased, variance in the transition
probabilities—increasing variance impairs performance for
all µPin , but decreasing variance only improves performance
for µPin > 0.675. This is contrary to our expectation that
decreased variance would improve performance by increas-
ing the separation between cluster transition probabilities.
However, this effect is due to the NCut optimization, not
the ordering of the eigenvector values. Figure 4c shows a
box plot of evector ordering as a function of sample size, for
the set of trials with µPin < 0.675. Except for the small-
est sample size, where we see higher accuracy due to chance
alone, the mean ordering value is monotonically increasing
with sample size. Figure 4d graphs accuracy results for the
same sample, showing that the algorithm converges to low
accuracies as sample size increases. Minimizing the NCut
criterion causes the algorithm to consistently prefer high
precision over high accuracy when the separation between
intra- and inter-cluster transition probabilities is low (i.e.,
$\mu_{P_{in}} < 0.675$). This indicates that metrics with low
$\mu_{P_{in}}$ should not be combined with the NCut criterion.

Figure 1: Cluster accuracy of metrics on synthetic data: (a) AttrOnly, (b) LinkOnly, (c) LinkAsAttr, (d) WtLinkAttr1, (e) WtLinkAttr2, and (f) LinkAsFilter. Each panel plots Accuracy (0.5 to 1.0) against $P_+$ (0.5 to 1.0) and $P^l_{in}$ (0.10 to 0.20).

Figure 2: Intra-cluster means of metrics for synthetic data: (a) AttrOnly, (b) LinkOnly, (c) LinkAsAttr, (d) WtLinkAttr1, (e) WtLinkAttr2, and (f) LinkAsFilter. Each panel plots the intra-cluster mean (0.5 to 1.0) against $P_+$ (0.5 to 1.0) and $P^l_{in}$ (0.10 to 0.20).

Figure 3: Analysis of intra-cluster mean on algorithm performance: (a) 200 objects, (b) $\mu_{P_{in}}$ vs. ordering, (c) ordering vs. accuracy and (d) precision. Panel axes: (a) intra-cluster mean vs. accuracy, (b) intra-cluster mean vs. evector ordering, (c) evector ordering vs. accuracy, (d) evector ordering vs. precision.

It is now clear that the WtLinkAttr2 and LinkAsFilter metrics
achieve their good performance due to high $\mu_{P_{in}}$, but
what do they trade off for this increased separation? Figure 5a
graphs a box plot of $\mu_{P_{in}}$ for each metric
individually. This is a one-dimensional summary of the data in
figure 2, which again illustrates that $\mu_{P_{in}}$ is
significantly higher for the LinkAsFilter metric on average.
Figure 5b graphs a box plot of the variance of $P_{in}$ for each
metric. This shows that LinkAsFilter trades off higher variance
for increased mean separation. Figure 4c-d graphs the
performance of WtLinkAttr2 and LinkAsFilter for $|V| = 50$.
Compare this to figure 1 to see that performance degradation is
not uniform across metrics. The LinkAsFilter metric is
adversely affected over a wider range of data conditions. This
illustrates the primary distinction between LinkAsFilter and
WtLinkAttr2. The LinkAsFilter metric reduces the amount
of information it uses in order to increase the mean sepa-
ration between the clusters. Because it is filtering the at-
tribute information through the existing edges of the graph,
it throws away both useful and noisy data and increases the
variance of the transition probabilities. If the sample size is
large enough to withstand this increase in variance, then the
metric will produce superior clusterings. However, when the
sample size is low, the filter can do more harm than good.
For example, filtering through the existing edges may dis-
connect a previously connected cluster. In these situations,
it may be best to use the WtLinkAttr2 metric, which suffers
less from increased variance and still performs well over a
wide range of data characteristics. However, since we do not
know how to set $\alpha$ for WtLinkAttr2 in practice, and
because LinkAsFilter offers the opportunity to use efficient
eigensolver techniques, we focus on LinkAsFilter for our
empirical data experiments.

Figure 4: Analysis of intra-cluster variance on algorithm performance: (a) 500 objects, (b) 50 objects, (c) ordering and (d) accuracy for settings with $\mu_{P_{in}} < 0.675$. Panel axes: (a, b) intra-cluster mean vs. accuracy; (c) dataset size (26, 50, 100, 200, 400) vs. evector ordering; (d) dataset size vs. accuracy.

5. EMPIRICAL DATA EXPERIMENTS
The experiments reported below are intended to evaluate
two assertions. The first claim is that the LinkAsFilter clus-
tering approach can be used to find groups of items with
similar attribute values and high inter-connectedness. We
evaluate this claim by comparing the clusters produced by
the LinkAsFilter metric to randomly generated clusters of
the same size, evaluating intra-cluster attribute similarity
and intra-cluster linkage.
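The comparison against random clusters of the same size can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it scores a whole clustering by the fraction of links that stay inside a cluster (the paper reports per-cluster proportions), and it builds the random baseline by shuffling labels across nodes so that cluster sizes are preserved. The function names and the toy graph are hypothetical.

```python
import random

def intra_cluster_link_fraction(edges, labels):
    """Fraction of edges whose two endpoints carry the same cluster label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def random_baseline(edges, labels, trials=10, seed=0):
    """Mean intra-cluster link fraction over random clusterings that keep
    the original cluster sizes (labels are shuffled across nodes)."""
    rng = random.Random(seed)
    nodes = list(labels)
    values = [labels[n] for n in nodes]
    total = 0.0
    for _ in range(trials):
        rng.shuffle(values)
        shuffled = dict(zip(nodes, values))
        total += intra_cluster_link_fraction(edges, shuffled)
    return total / trials

# Toy graph (hypothetical data): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
observed = intra_cluster_link_fraction(edges, labels)
expected = random_baseline(edges, labels)
```

Here `observed` plays the role of the measured linkage and `expected` the role of the random-clustering baseline; ten trials mirrors the ten random clusterings used in the results below.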
The second claim is that the LinkAsFilter clustering approach
finds meaningful clusters. Evaluating clusterings of datasets
for which there is no right answer is a difficult task. One
approach is to present the resulting clusters for user
examination. For this type of subjective evaluation, we include
example cluster members from two real-world datasets. Another,
more objective, approach is to examine cluster utility by
evaluating the cluster labels' ability to improve a related
classification task. We evaluate three approaches (LinkOnly,
AttrOnly, and LinkAsFilter) on a third real-world dataset in
this manner, and show that the LinkAsFilter clusters achieve a
significant improvement in classification accuracy.
5.1 Datasets
We clustered three real-world datasets where attributes exhibit
correlation among linked objects, and the link structure
exhibits clustering. These are the characteristics we expect to
find in datasets that contain communities, and it is in these
situations that we expect our clustering algorithm will perform
well.

[Figure 5 legend: similarity metrics AttrOnly, LinkOnly,
LinkAsAttr, WtLinkAttr1, WtLinkAttr2, LinkAsFilter.]
Figure 5: (a) Intra-cluster mean by metric, (b) intra-cluster
variance by metric, (c) accuracy of WtLinkAttr2 and (d)
LinkAsFilter for 50 objects.
The first data set is drawn from Cora, a database of com-
puter science research papers extracted automatically from
the web using machine learning techniques [12]. We selected
the largest connected component from the set of machine-
learning papers published after 1993. The resulting graph
contains 1,042 papers and 2546 citation links. We clus-
tered the undirected version of this graph. The similarity
metric considered two topic attributes at different levels of
granularity (e.g., {Machine Learning, Neural Networks} and
{Planning, Rule Learning}).
The second data set consists of a set of web pages from
four computer science departments, collected by the WebKB
Project [4]. The web pages have been manually classified
into the categories: course, faculty, staff, student, research
project, or other. The category "other" denotes a page
that is not a home page (e.g., a curriculum vitae linked
from a faculty page or homework description linked from a
course page). The collection contains approximately 4,000
web pages and 8,000 hyperlinks among those pages. We
clustered the largest connected component in these data—a
graph of 1236 pages and 3673 hyperlinks. Again, we used
the undirected version of the graph. The similarity metric
considered two attributes: page category and department.
However, the entire component is from a single department
(Wisconsin) so the department attribute adds no additional
information.
The third data set is a relational data set containing
information about the yeast genome at the gene and the protein
level (www.cs.wisc.edu/~dpage/kddcup2001/). The data set
contains information about 1,243 genes and 1,734 interactions.
We clustered the largest connected component, which consisted
of 814 genes and 1,475 interactions. The similarity metric
considered 13 boolean function attributes. Each gene may have
multiple functions. We evaluated the resulting cluster labels'
ability to predict gene localization. We applied a relational
Bayesian classifier [15] to the entire dataset, using the
cluster labels as an additional attribute, and measured
performance.

Table 1: Cora cluster examples
Cluster 9: Belief revision: A critique; Plausibility measures
and default reasoning; Modeling belief in dynamic systems. Part
I: foundations; Knowledge-Based Framework for Belief Change,
Part II: Revision and Update; Iterated revision and minimal
revision of conditional beliefs; An event-based abductive model
of update; On the logic of iterated belief revision; A unified
model of qualitative belief change: A dynamical systems
perspective; Generalized update: Belief change in dynamic
settings
Cluster 14: In defense of C4.5: Notes on learning one-level
decision trees; Exploring the decision forest: An empirical
investigation of Occam's razor in decision tree induction;
Algorithmic stability and sanity-check bounds for leave-one-out
cross-validation; Bias and the quantification of stability;
Characterizing the generalization performance of model
selection strategies; A new metric-based approach to model
selection; Preventing overfitting of Cross-Validation data;
Further experimental evidence against the utility of Occam's
razor
Cluster 19: An empirical evaluation of bagging and boosting;
On-line portfolio selection using multiplicative updates;
Heterogeneous uncertainty sampling for supervised learning;
Improved boosting algorithms using confidence-rated
predictions; On-line algorithms in machine learning; Training
algorithms for hidden Markov models using entropy based
distance functions; A system for multiclass multi-label text
categorization; Coevolutionary Search Among Adversaries
Cluster 24: Refinement of Bayesian networks by combining
connectionist and symbolic techniques; DistAl: An inter-pattern
distance-based constructive learning algorithm; An Anytime
Approach to Connectionist Theory Refinement: Refining the
Topologies of Knowledge-Based Neural Networks; Creating
advice-taking reinforcement learners; Learning controllers for
industrial robots; Generating accurate and diverse members of a
neural-network ensemble; A Neural Architecture for a High-Speed
Database Query System; Comparing methods for refining
certainty-factor rule-bases
5.2 Results
Clustering the sample of Cora papers produced 71 clusters
varying in size from 1-202 papers, with an average size of
15. We report statistics for the 28 clusters with more than
six papers. Table 1 includes randomly selected titles from
four clusters for subjective evaluation. Although we did not
use title words in the similarity metrics, the clusters show a
surprising uniformity among the titles. This indicates that
research papers can be clustered into meaningful groups us-
ing the citation structure and topic attributes alone.
To evaluate intra-cluster attribute similarity, we averaged
the attribute similarity across all pairs of papers within each
cluster. As a baseline measure we calculated the average
attribute similarity in ten random clusterings. Figure 6a plots
the intra-cluster attribute similarity (dark bars) compared
to the expected averages given random clusterings (light
bars), with the clusters listed in ascending order by size.
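The pairwise averaging described above can be sketched as below. The attribute similarity function is an assumption (fraction of attribute positions that agree), since the metric itself is defined earlier in the paper; the helper names and toy data are hypothetical.

```python
from itertools import combinations

def match_similarity(a, b):
    """Assumed attribute similarity: fraction of positions where values agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_intra_cluster_similarity(attrs, labels):
    """Map each cluster label to the mean similarity over all member pairs."""
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, []).append(node)
    result = {}
    for label, members in clusters.items():
        pairs = list(combinations(members, 2))
        if pairs:  # singleton clusters have no pairs to average
            total = sum(match_similarity(attrs[u], attrs[v]) for u, v in pairs)
            result[label] = total / len(pairs)
    return result

# Toy data (hypothetical): two topic attributes per paper.
attrs = {
    0: ("ML", "NN"),
    1: ("ML", "NN"),
    2: ("ML", "RL"),
    3: ("Planning", "Rule"),
}
labels = {0: "a", 1: "a", 2: "a", 3: "b"}
scores = mean_intra_cluster_similarity(attrs, labels)
```

Running the same function on label assignments shuffled across nodes would give the random-clustering baseline that the dark bars are compared against.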

[Figure 6 panels: (a) mean attribute similarity and (b)
proportion of intra-cluster links, per cluster (1 to 28).]
Figure 6: Evaluation of hybrid clusters in Cora.
Table 2: WebKB cluster examples
Cluster 5:
http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison/CS-TR-89-890;
http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison/CS-TR-90-947;
http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison/CS-TR-95-1283;
http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison/CS-TR-91-1037;
http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison/CS-TR-90-962;
http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison/CS-TR-89-900;
http://www.cs.wisc.edu/~reps/reps.html;
http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison/CS-TR-91-1038
Cluster 9:
http://www.cs.wisc.edu/~bart/537/quizzes/quiz6.html;
http://www.cs.wisc.edu/~bart/cs537.html;
http://www.cs.wisc.edu/~bart/537/quizzes/quiz3.html;
http://www.cs.wisc.edu/~bart/537/quizzes/quiz10.html;
http://www.cs.wisc.edu/~bart/537/quizzes/quiz2.html;
http://www.cs.wisc.edu/~bart/537/programs/program2.html;
http://www.cs.wisc.edu/~bart/537/lecturenotes/titlepage.html;
http://www.cs.wisc.edu/~bart/537/quizzes/quiz9.html
Cluster 11:
http://www.cs.wisc.edu/~cs354-2/cs354/lec.notes/numbers.html;
http://www.cs.wisc.edu/~cs354-2/cs354/lec.notes/data.structures.html;
http://www.cs.wisc.edu/~cs354-2/cs354/solutions/Q2.j.html;
http://www.cs.wisc.edu/~cs354-2/cs354/lec.notes/arch.features.html;
http://www.cs.wisc.edu/~cs354-2/cs354/lec.notes/interrupts.html;
http://www.cs.wisc.edu/~cs354-2/cs354/lec.notes/case.studies.html;
http://www.cs.wisc.edu/~cs354-2/cs354/lec.notes/arith.int.html;
http://www.cs.wisc.edu/~cs354-2/cs354/lec.notes/MAL.html
Cluster 14:
http://www.cs.wisc.edu/condor/research.html;
http://www.cs.wisc.edu/~bart/cs638.html;
http://www.cs.wisc.edu/coral/coral.people.html;
http://www.cs.wisc.edu/~brad/brad.html;
http://www.cs.wisc.edu/~sastry/spring96.html;
http://www.cs.wisc.edu/~ashraf/ashraf.html;
http://maf.wisc.edu/distributed/condor/index.html;
http://www.cs.wisc.edu/~ssl/resume.html
[Figure 7 panels: (a) mean attribute similarity and (b)
proportion of intra-cluster links, per cluster (1 to 15).]
Figure 7: Evaluation of hybrid clusters in WebKB.
Attribute similarity is significantly higher than expected (footnote 2).
Note that the largest cluster (#28) does not exhibit high
linkage or attribute similarity. This cluster may contain the
set of papers that could not be partitioned into smaller clus-
ters (i.e., the papers with no coherent community structure).
Figure 6b shows the actual and expected proportion of intra-
cluster citations. To assess the connectivity of the clusters,
we compared the proportion of intra-cluster linkage (per
cluster) to expected proportions, given ten random clus-
terings. Again, the proportion of intra-cluster citations is
significantly higher than the expected values. This indi-
cates that the clustering technique is finding groups of highly
inter-connected research papers.
Clustering the sample of WebKB pages produced 55 clusters
varying in size from 1-649 pages, with an average size of 22.
We report statistics for the 15 clusters with more than six
pages, listed in ascending order by size. Table 2 includes
randomly selected URLs from four clusters for subjective
evaluation. Recall that the component graph only contains
pages from the University of Wisconsin. The selected clus-
ters appear to group by function—for example, tech reports,
course pages, or research group pages.
Figure 7a plots the intra-cluster averages compared to the
expected averages given random clusterings. Figure 7b shows
the actual and expected proportion of intra-cluster hyper-
links. The proportion of intra-cluster linkage is significantly
higher than expected, but notice that the largest cluster’s
(#15) expected linkage is quite high by random chance.
This may indicate that the largest cluster contains a set
of pages that are too tightly connected to partition. This
clustering does not exhibit significantly higher than expected
attribute similarity. However, we note that the algorithm is
still able to cluster pages into groups that are highly inter-
connected. This indicates that the LinkAsFilter metric may
be robust to irrelevant attribute values.
Clustering the sample of genes produced 88 clusters varying
in size from 1-140 genes, with an average size of 8. We report
statistics for the 14 clusters with more than six genes. Intra-
cluster attribute similarity (figure 8a) and intra-cluster link-
age (figure 8b) are both significantly higher than expected.
These results show that the LinkAsFilter metric can be used
to find groups of genes with similar functions and many com-
mon interactions.
The structure of genomic data offers an opportunity for an
objective evaluation of the clustering results. Clusters of
inter-connected genes with similar associated functions may
indicate a group of genes that are interacting to perform a
particular function in the cell. If this is the case, the cluster
labels should be helpful in predicting gene localization in the
cell. To test this hypothesis, we used the cluster labels to
predict gene localization. We applied a relational Bayesian
classifier (RBC) [15] to the gene data, using the cluster labels
as an additional attribute, and measured change in accuracy.
Figure 8d reports average 10-fold cross-validation accuracies
for RBC models learned using the cluster labels from the
LinkOnly, AttrOnly, and LinkAsFilter metrics. The baseline
RBC model used twelve attributes for prediction, including
gene phenotype and motif, and achieved an average accuracy of
66.3%. The RBC model that included cluster labels from
AttrOnly did not significantly improve accuracy (footnote 3).
The model that included cluster labels from LinkOnly achieved
a significant improvement in accuracy, with an average of
68.4%, indicating that gene interactions alone are helpful
for predicting location. However, the model that included
cluster labels from LinkAsFilter achieved an average accuracy
of 70.2%. This is a significant improvement over both
LinkOnly and the baseline RBC model without cluster labels,
which demonstrates the utility of clustering for communities
using both attribute and link information.

Footnote 2: We assessed significance using two-tailed t-tests,
p < 0.05.

[Figure 8 panels: (a) mean attribute similarity, (b) proportion
of intra-cluster links, and (c) size, per cluster (1 to 14);
(d) accuracy for AttrOnly, LinkOnly, LinkAsFilter, and without
clusters, using cluster labels only and other attributes plus
cluster labels.]
Figure 8: Evaluation of hybrid clusters in Gene.
6. DISCUSSION
This paper presents a hybrid metric for spectral clustering
algorithms that exploits both attribute information and link
structure to improve discovery of communities in relational
data. There has been relatively little work investigating
clustering techniques for relational domains. The work in
this area has focused on either complex generative models
with latent variables [11, 20, 3], or augmented clustering
techniques that use ad-hoc similarity metrics to incorporate
both link and attribute information [14, 9]. Due to the com-
plexity of probabilistic relational models with latent vari-
ables, and the sparsity of relational graphs that enable the
use of efficient eigensolver techniques, we chose to explore
extensions to spectral clustering for relational domains.
The most closely related prior work is that of He, Ding, Zha,
and Simon [9], which uses a spectral graph-partitioning
algorithm to automatically identify topics in sets of retrieved
web pages. This approach uses a similarity measure specifically
designed for high-dimensional text domains with weighted
co-citation links. We differ from this work, and other research
on hybrid spectral algorithms, in our exploration of the
characteristics that underlie successful similarity metrics.

Footnote 3: Again, significance was assessed using two-tailed
t-tests, p < 0.05.
We have set up a framework to evaluate different similarity
metrics quantitatively over a wide range of relational data
sets. Our experiments show that increasing the separation
between total intra-cluster and inter-cluster transition prob-
abilities results in superior performance over a wide range of
data characteristics. One way to increase the separation be-
tween cluster transition probabilities is to drop potentially
noisy information from consideration. Using this approach,
we expect the LinkAsFilter metric will successfully recover
groupings over a wide range of data characteristics.
There are two primary advantages to using the LinkAsFilter
metric. The first advantage is algorithm efficiency—there
are O(E) approximate eigensolver algorithms, and there are
O(n^1.4) exact eigensolver algorithms for sparse matrices that
can exploit the sparse matrix structure produced by the met-
ric. The second advantage is the choice of α = l, which is
independent of data characteristics. We expect the metric
will work well in any dataset exhibiting community struc-
ture, provided there is enough data to withstand the associ-
ated increase in variance. In small datasets, where the size
of the data cannot offset the increase in variance, the appli-
cation of balanced metrics (e.g., WtLinkAttr2) may produce
superior clusterings. In practice however, this approach is
limited by the need to set α to balance the link and attribute
information.
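The sparsity argument can be illustrated with a small sketch. It assumes, as the name and the discussion above suggest, that LinkAsFilter keeps the attribute similarity for linked pairs and treats unlinked pairs as zero; the exact definition is given earlier in the paper and is not reproduced in this section. The attribute similarity (fraction of matching positions), the function names, and the toy data are all hypothetical.

```python
def match_similarity(a, b):
    """Assumed attribute similarity: fraction of attribute positions that agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def link_as_filter(edges, attrs):
    """Sparse similarity map: linked pairs keep their attribute similarity,
    unlinked pairs are simply absent (treated as zero)."""
    sim = {}
    for u, v in edges:
        s = match_similarity(attrs[u], attrs[v])
        if s > 0:
            sim.setdefault(u, {})[v] = s
            sim.setdefault(v, {})[u] = s
    return sim

def transition_rows(sim):
    """Row-normalize the similarities into transition probabilities."""
    rows = {}
    for u, nbrs in sim.items():
        total = sum(nbrs.values())
        rows[u] = {v: s / total for v, s in nbrs.items()}
    return rows

# Toy chain graph (hypothetical data) with two attributes per node.
edges = [(0, 1), (1, 2), (2, 3)]
attrs = {0: ("x", "p"), 1: ("x", "p"), 2: ("x", "q"), 3: ("y", "q")}
sim = link_as_filter(edges, attrs)
P = transition_rows(sim)
nonzeros = sum(len(nbrs) for nbrs in sim.values())
```

Under this assumption the number of stored entries is at most twice the number of links, which is the structure that sparse eigensolvers can exploit.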
With a way to evaluate each setting, an algorithm could
search for the best α. Our analysis indicates that the "best"
settings will maximize the separation between the intra-
cluster and inter-cluster transition probabilities. We con-
jecture that the eigenvector information—more specifically,
the separation between the means of distributions of the
eigenvector values on either side of the cut—can be used to
approximate this information. We report preliminary find-
ings in support of this conjecture.
Figure 9a graphs the correlation between algorithm perfor-
mance and the separation of eigenvector-value distributions.
We clustered over the space of synthetic datasets described
in section 4.1 using 20 different values of α, chosen uni-
formly in the range [0, 1]. We recorded (1) the accuracy of
the clustering, and (2) the distance between the means of the
eigenvector-value distributions on either side of the chosen
cut (after the values were normalized to unit range). Fig-
ure 9b shows performance when we set α by maximizing the
separation between the means of the eigenvector-value dis-
tributions. Comparing this graph to figure 1, we can see that
this technique approaches the performance of the LinkAs-
Filter metric. This is a promising direction to explore for
applications with little data, where the variance will be too
high to apply LinkAsFilter successfully.
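The selection rule for α described above can be sketched as follows. This is an illustration under stated assumptions: the 20 values of α are taken as an evenly spaced grid over [0, 1], and the eigenvector computation and cut selection are passed in as callables (`eigenvector_for` and `cut_for` are hypothetical names) because they depend on the clustering algorithm defined earlier in the paper.

```python
from statistics import fmean

def mean_separation(values, cut):
    """Distance between the means of the values on either side of `cut`,
    after rescaling all values to the unit range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # constant vector: no separation
    scaled = [(v - lo) / (hi - lo) for v in values]
    scaled_cut = (cut - lo) / (hi - lo)
    left = [s for s in scaled if s <= scaled_cut]
    right = [s for s in scaled if s > scaled_cut]
    if not left or not right:
        return 0.0  # the cut leaves one side empty
    return fmean(right) - fmean(left)

def pick_alpha(eigenvector_for, cut_for, num_alphas=20):
    """Return the alpha on an evenly spaced grid in [0, 1] whose eigenvector
    shows the largest separation between the two sides of its cut."""
    grid = [k / (num_alphas - 1) for k in range(num_alphas)]

    def score(alpha):
        values = eigenvector_for(alpha)
        return mean_separation(values, cut_for(values))

    return max(grid, key=score)

# Two tight groups of eigenvector values, cut at the midpoint.
sep = mean_separation([0.0, 0.1, 0.9, 1.0], 0.5)
```

In use, `eigenvector_for(alpha)` would build the weighted similarity matrix for that α and return the eigenvector used for partitioning, and `cut_for(values)` would return the cut point the algorithm selects.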
7. CONCLUSIONS AND FUTURE WORK
We have analyzed the spectral decomposition algorithm from
a statistical perspective and shown that the successful
hybrid metrics use the link and attribute information to
increase the separation between noisy clusters. We have shown
an empirical connection between the distribution of transition
probabilities and algorithm performance, connecting both mean
and variance to cluster accuracy. Future work will compare
this approach to latent-variable relational models and explore
complexity/efficiency tradeoffs between the two techniques.
Furthermore, we will attempt to derive theoretical bounds on
finite-sample performance, and explore alternative
optimization criteria for data with low mean separation, where
the NCut criterion prefers high-precision/low-recall
groupings.

[Figure 9 panels: (a) accuracy versus eigenvector value
separation; (b) accuracy surface.]
Figure 9: Searching for α to use in the metric: (a)
correlation between separation of eigenvector values
and accuracy (corr = 0.71), and (b) cluster accuracy
using α that maximizes separation.
In addition, the WebKB results suggest an alternative clus-
tering task—clustering data that exhibit role equivalence
structure, rather than community structure. Objects that
play the same roles in a graph have similar attributes and
similar link patterns but may not actually link to each other.
For example, faculty pages rarely link to each other but they
consistently link to student and course pages. Current
methods for grouping data in this manner focus primarily on link
information (e.g., [17]). Extending this work to incorporate
attribute information seems an exciting direction to explore.
8. ACKNOWLEDGMENTS
The authors acknowledge helpful comments and discussion
from Alicia Wolfe. This research is supported under an AT&T
Graduate Research Fellowship and by DARPA and AFRL
under contract numbers F30602-00-2-0597 and F30602-01-
2-0566.
9. REFERENCES
[1] F. Bach and M. Jordan. Learning spectral clustering.
In Proceedings of NIPS16, 2003.
[2] F. Chung. Spectral Graph Theory. The American
Mathematical Society, 1997.
[3] D. Cohn and T. Hofmann. The missing link - a
probabilistic model of document content and
hypertext connectivity. Advances in Neural
Information Processing Systems, 10, 2001.
[4] M. Craven, D. DiPasquo, D. Freitag, A. McCallum,
T. Mitchell, K. Nigam, and S. Slattery. Learning to
extract symbolic knowledge from the world wide web.
In Proceedings of the 15th National Conference on
Artificial Intelligence, 1998.
[5] I. Dhillon. Co-clustering documents and words using
bipartite spectral graph partitioning. In Proc. of the
7th ACM International Conf. on Knowledge Discovery
and Data Mining, 2001.
[6] W. Donath and A. Hoffman. Lower bounds for the
partitioning of graphs. IBM Journal of Research and
Development, 17:420–425, 1973.
[7] M. Fiedler. Algebraic connectivity of graphs.
Czechoslovak Math. Jour., 23(98):298–305, 1973.
[8] G. Golub and C. V. Loan. Matrix Computations.
Johns Hopkins University Press, 1983.
[9] X. He, C. Ding, H. Zha, and H. Simon. Automatic
topic identification using webpages clustering. In
Proceedings of the 1st IEEE International Conference
on Data Mining, 2001.
[10] R. Kannan, S. Vempala, and A. Vetta. On clusterings:
Good, bad and spectral. In Proceedings of the 41st
Symposium on the Foundations of Computer Science,
2000.
[11] J. Kubica, A. Moore, J. Schneider, and Y. Yang.
Stochastic link and group detection. In Proceedings of
the 18th National Conference on Artificial
Intelligence, 2002.
[12] A. McCallum, K. Nigam, J. Rennie, and K. Seymore.
A machine learning approach to building
domain-specific search engines. In Proceedings of the
16th International Joint Conference on Artificial
Intelligence, 1999.
[13] M. Meila and J. Shi. A random walks view of spectral
segmentation. In Proceedings of the 8th International
Workshop on Artificial Intelligence and Statistics,
2001.
[14] D. Modha and W. Spangler. Clustering hypertext
with applications to web searching. In Proceedings of
the 11th ACM Conference on Hypertext and
Hypermedia, 2000.
[15] J. Neville, D. Jensen, and B. Gallagher. Simple
estimators for relational Bayesian classifiers. In
Proceedings of the 3rd IEEE International Conference
on Data Mining, 2003.
[16] A. Ng, M. Jordan, and Y. Weiss. On spectral
clustering: Analysis and an algorithm. In NIPS 2001,
2001.
[17] K. Nowicki and T. Snijders. Estimation and prediction
for stochastic blockstructures. Journal of the
American Statistical Association, 96:1077–1087, 2001.
[18] B. Parlett. The Symmetric Eigenvalue Problem.
Prentice-Hall, Inc., 1980.
[19] J. Shi and J. Malik. Normalized cuts and image
segmentation. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 22(8):888–905, 2000.
[20] B. Taskar, E. Segal, and D. Koller. Probabilistic
clustering in relational data. In Proceedings of the 17th
International Joint Conference on Artificial
Intelligence, 2001.

APPENDIX
A. PROOF OF THEOREM
Theorem: Let ∆ = (A1, A2) be a partition of V . Let
the function S(i, j) define the similarity measure between
vi, vj ∈ V . If, ∀i, j, k, S(i, j) is conditionally independent
of S(i, k) given node i, and E[P11]E[P22] ≠ E[P12]E[P21]
then, P has an eigenvector that will converge to piecewise
constant w.r.t. ∆ as |A1|, |A2|→∞.
Proof. In order to simplify the calculations below, we
assume that the two clusters share the same distribution of
intra- and inter- cluster similarity values. The symmetry
in attribute parameters simplifies the analysis but is not
necessary for correctness. Let µin be the mean intra-cluster
similarity for nodes i, j ∈ A1 or i, j ∈ A2. Similarly, let µout
be the mean inter-cluster similarity for nodes i ∈ A1 and
j ∈ A2.
We can represent each entry in W as a random variable.
Consider the entries of row i. The entries Wij , Wik are not
independent because the similarity values are both based
on node i. However, conditioned on the state of i (e.g. at-
tribute values of i), the entries can be viewed as independent
random variables if the state of j is independent of the state
of k. This assumption corresponds to a generative model in
which the objects and links in the graph are conditionally
independent given the object cluster memberships.
We will calculate the expected intra- and inter-cluster
transition probabilities in P as a ratio of sums of random
variables. Let $T^i_{in}$ be the total intra-cluster transition
probability for node i, where $i \in A_k$, $k \in \{1, 2\}$, and
let $|A_k| = n_k$. Similarly, let $T^i_{out}$ be the total
inter-cluster transition probability, and $T^i_{all}$ be the
total transition probability. Then $P^i_{in}$ is the ratio of
$T^i_{in}$ and $T^i_{all}$, and $P^i_{out}$ is the ratio of
$T^i_{out}$ and $T^i_{all}$.

The normalized transition probabilities in P then correspond
to the ratio of two random variables (e.g.,
$T^i_{in}/T^i_{all}$), which can be approximated using a
truncated Taylor series expansion. The expectations for intra-
and inter-cluster normalized transition probabilities are
below. (Analytical derivations, including the variances, are
included in Section A.1.)

\[
E[P^i_{in}] = E[T^i_{in}/T^i_{all}] \approx
\frac{\mu_{T_{in}}}{\mu_{T_{all}}} \cdot
\left[ 1 + \left[ \frac{\sigma_{T_{all}}}{\mu_{T_{all}}} \right]^2
- \frac{\sigma_{T_{in}T_{all}}}{\mu_{T_{in}} \, \mu_{T_{all}}} \right]
\]

\[
E[P^i_{out}] = E[T^i_{out}/T^i_{all}] \approx
\frac{\mu_{T_{out}}}{\mu_{T_{all}}} \cdot
\left[ 1 + \left[ \frac{\sigma_{T_{all}}}{\mu_{T_{all}}} \right]^2
- \frac{\sigma_{T_{out}T_{all}}}{\mu_{T_{out}} \, \mu_{T_{all}}} \right]
\]

where $\sigma_{XY}$ is the covariance of X, Y.
As $n_1, n_2 \to \infty$, it follows directly from the Law of
Large Numbers that the value of $T^i_{in}/T^j_{in} \to 1$ for
$i, j \in A_k$, since $T_{in}$ is a sum of independent random
variables with finite mean and variance. A similar argument
holds for $T_{out}$ and $T_{all}$. Now consider the normalized
transition probabilities for P. If, in the limit, the sums
$T^i_{in}$ (and $T^i_{out}$, $T^i_{all}$) converge to the same
value for all $i \in A_k$, then the normalized sums $P^i_{in}$
will converge to the same value $P_{in}$ for all $i \in A_k$. A
similar argument holds for $P^i_{out}$.
As $n_1, n_2 \to \infty$, we can decompose the matrix P into
$P = \bar{P} + \epsilon E$, where $\bar{P}$ is a matrix with
constant transition probabilities $P_{in}$ and $P_{out}$, and E
is a perturbation matrix with $\|E\|_2 = 1$. Then by matrix
perturbation theory [8]:

\[
(\bar{P} + \epsilon E) \, x_i(\epsilon) = \lambda_i(\epsilon) \, x_i(\epsilon)
\]

where

\[
x_i(\epsilon) = x_i + \epsilon \sum_{j=1, j \neq i}^{n}
\left\{ \frac{y_j^T E x_i}{(\lambda_i - \lambda_j) \, y_j^T x_i} \right\}
+ O(\epsilon^2),
\]

and

\[
\lambda_i(\epsilon) = \lambda_i \pm \frac{\epsilon}{|y_i^T x_i|}
\]

Here $x_i$, $y_i$, and $\lambda_i$ are the right and left
eigenvectors, and the eigenvalues, of $\bar{P}$. As
$n_1, n_2 \to \infty$, $\epsilon \to 0$ and the eigenvectors of
P will converge to the eigenvectors of $\bar{P}$.

Therefore the graph will converge to a Markov chain with
state space ∆ = (A1, A2), and constant transition
probabilities $R_{11} = R_{22} = E[P^i_{in}]$, and
$R_{12} = R_{21} = E[P^i_{out}]$. If $R_{11} \neq R_{12}$, then
R will be non-singular, and by proposition 2 in [13], P will
have a piecewise constant eigenvector w.r.t. ∆.
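The limiting case in the proof can be checked numerically with a short sketch. It builds an idealized block-constant transition matrix (an assumption made for illustration: every node spreads its total intra-cluster probability evenly over its own block and its total inter-cluster probability evenly over the other block, with the two totals summing to one) and verifies that the vector taking one value on A1 and another on A2 is an eigenvector. The names `block_transition_matrix` and `matvec` are hypothetical helpers.

```python
def block_transition_matrix(n1, n2, p_in, p_out):
    """Row-stochastic matrix in which every node spreads probability p_in
    evenly over its own block and p_out evenly over the other block."""
    sizes = (n1, n2)
    block_of = [0] * n1 + [1] * n2
    n = n1 + n2
    return [
        [(p_in if block_of[i] == block_of[j] else p_out) / sizes[block_of[j]]
         for j in range(n)]
        for i in range(n)
    ]

def matvec(matrix, vector):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

n1, n2, p_in, p_out = 3, 5, 0.8, 0.2      # p_in + p_out must equal 1
P_bar = block_transition_matrix(n1, n2, p_in, p_out)
x = [1.0] * n1 + [-1.0] * n2              # piecewise constant w.r.t. (A1, A2)
Px = matvec(P_bar, x)
eigenvalue = p_in - p_out                 # P_bar x = (p_in - p_out) x
```

When `p_in` equals `p_out` the eigenvalue is zero and the two blocks are indistinguishable, which matches the requirement that the intra- and inter-cluster transition probabilities differ.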
A.1 Analytic Derivations
When S(i, j) is conditionally independent of S(i, k) given
the state of node i, the cluster transition probabilities are
simply sums of independent random variables. Using
conditional expectation
($E[h(X, Y)] = E_X\{E[h(X, Y) \mid X]\}$), we can calculate the
expectation for $T^i_{in}$ based on the state of i, which we
refer to as $i_S$:

\[
\begin{aligned}
E[T^i_{in}] &= E\Big[\sum_{j \in A_k} S(i, j)\Big] \\
&= \sum_{i_S} p(i_S) \cdot E\Big[\sum_{j \in A_k} S(i_S, j)\Big] \\
&= \sum_{i_S} p(i_S) \cdot n_k \cdot E[S(i_S, j) \mid j \in A_k] \\
&= n_k \cdot \sum_{i_S} p(i_S) \cdot \sum_{j_S} p(j_S) \cdot S(i_S, j_S) \\
&= n_k \cdot \sum_{i_S} \sum_{j_S} p(i_S) \cdot p(j_S) \cdot S(i_S, j_S) \\
&= n_k \cdot E[S_{in}] \\
&= n_k \cdot \mu_{in}
\end{aligned}
\]

Total inter-cluster and overall means are calculated in a
similar fashion. $E[T^i_{out}] = n_{k'} \cdot \mu_{out}$, and
$E[T^i_{all}] = (n_k \cdot \mu_{in}) + (n_{k'} \cdot \mu_{out})$,
where $n_{k'} = n_i$, $i \neq k$.

The variance of the total intra-cluster similarity is calculated
as follows (footnote 4):

\[
\begin{aligned}
Var[T^i_{in}] &= Var\Big[\sum_{j \in A_k} S(i, j)\Big] \\
&= E_{i_S}\Big\{ Var\Big[\sum_{j \in A_k} S(i_S, j)\Big] \Big\} \\
&= \sum_{i_S} p(i_S) \cdot Var\Big[\sum_{j \in A_k} S(i_S, j)\Big] \\
&= \sum_{i_S} p(i_S) \cdot n_k \cdot Var[S(i_S, j) \mid j \in A_k] \\
&= n_k \cdot \sum_{i_S} \sum_{j_S} p(i_S) \cdot p(j_S) \cdot
   \{ S(i_S, j_S) - E_{i_S}[S(i_S, j_S)] \}^2
\end{aligned}
\]

Total inter-cluster and overall variance are calculated in a
similar fashion:

\[
Var[T^i_{out}] = n_{k'} \cdot \sum_{i_S} p(i_S) \cdot
Var[S(i_S, j) \mid j \in A_{k'}],
\]

and

\[
Var[T^i_{all}] = \sum_{i_S} p(i_S) \,
\{ n_{k'} \cdot Var[S(i_S, j) \mid j \in A_{k'}]
 + n_k \cdot Var[S(i_S, j) \mid j \in A_k] \}.
\]

Footnote 4: The derivation uses the following equivalence:

\[
\begin{aligned}
Var(h(X, Y)) &= E[h(X, Y)^2] - E[h(X, Y)]^2 \\
&= E_X\{E[h(X, Y)^2 \mid X]\} - E_X\{E[h(X, Y) \mid X]^2\} \\
&= E_X\{Var(h(X, Y) \mid X)\}
\end{aligned}
\]
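From these we can calculate the expected transition probabilities
of P using the ratio of two random variables (e.g.,
$T_{in}/T_{all}$). These calculations use an approximation of the
ratio of two random variables, based on a truncated Taylor
series expansion:

\[
\begin{aligned}
E[X/Y] &\approx \frac{\mu_X}{\mu_Y} \cdot \left[1 + \left[\frac{\sigma_Y}{\mu_Y}\right]^2 - \frac{\sigma_{XY}}{\mu_X \mu_Y}\right] \\
\mathrm{Var}(X/Y) &\approx \left[\frac{\mu_X}{\mu_Y}\right]^2 \cdot \left[\left[\frac{\sigma_X}{\mu_X}\right]^2 + \left[\frac{\sigma_Y}{\mu_Y}\right]^2 - 2\,\frac{\sigma_{XY}}{\mu_X \mu_Y}\right]
\end{aligned}
\]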
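To give a feel for the quality of this approximation, the following sketch implements the two formulas and compares them with exact moments on a small discrete distribution. The function names and the toy distribution (X uniform on {1, 3}, Y uniform on {9, 11}, independent) are assumptions made for illustration only.

```python
from fractions import Fraction

def ratio_moments(mu_x, mu_y, var_x, var_y, cov_xy):
    """Truncated-Taylor approximations of E[X/Y] and Var(X/Y)."""
    r = mu_x / mu_y
    rel_cov = cov_xy / (mu_x * mu_y)
    mean = r * (1 + var_y / mu_y ** 2 - rel_cov)
    var = r ** 2 * (var_x / mu_x ** 2 + var_y / mu_y ** 2 - 2 * rel_cov)
    return mean, var

def exact_ratio_moments(joint):
    """Exact E[X/Y] and Var(X/Y) for a finite joint pmf {(x, y): prob}."""
    m1 = sum(pr * Fraction(x, y) for (x, y), pr in joint.items())
    m2 = sum(pr * Fraction(x, y) ** 2 for (x, y), pr in joint.items())
    return m1, m2 - m1 * m1

# Hypothetical joint pmf: X in {1, 3}, Y in {9, 11}, independent and uniform,
# so mu_X = 2, mu_Y = 10, Var(X) = Var(Y) = 1, Cov(X, Y) = 0.
joint = {(x, y): Fraction(1, 4) for x in (1, 3) for y in (9, 11)}

approx_mean, approx_var = ratio_moments(
    Fraction(2), Fraction(10), Fraction(1), Fraction(1), Fraction(0))
exact_mean, exact_var = exact_ratio_moments(joint)

print(float(approx_mean), float(exact_mean))
print(float(approx_var), float(exact_var))
```

A quick sanity check on the formulas: when X and Y are the same variable, the approximations return a mean of 1 and a variance of 0, as they should for the constant ratio X/X.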
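The expectation and variance for intra- and inter-cluster
normalized transition probabilities are as follows:

\[
\begin{aligned}
E[P^i_{in}] = E[T^i_{in}/T^i_{all}] &\approx \frac{\mu_{T_{in}}}{\mu_{T_{all}}} \cdot \left[1 + \left[\frac{\sigma_{T_{all}}}{\mu_{T_{all}}}\right]^2 - \frac{\sigma_{T_{in}T_{all}}}{\mu_{T_{in}} \mu_{T_{all}}}\right] \\
\mathrm{Var}[P^i_{in}] = \mathrm{Var}[T^i_{in}/T^i_{all}] &\approx \left[\frac{\mu_{T_{in}}}{\mu_{T_{all}}}\right]^2 \cdot \left[\left[\frac{\sigma_{T_{in}}}{\mu_{T_{in}}}\right]^2 + \left[\frac{\sigma_{T_{all}}}{\mu_{T_{all}}}\right]^2 - 2\,\frac{\sigma_{T_{in}T_{all}}}{\mu_{T_{in}} \mu_{T_{all}}}\right] \\
E[P^i_{out}] = E[T^i_{out}/T^i_{all}] &\approx \frac{\mu_{T_{out}}}{\mu_{T_{all}}} \cdot \left[1 + \left[\frac{\sigma_{T_{all}}}{\mu_{T_{all}}}\right]^2 - \frac{\sigma_{T_{out}T_{all}}}{\mu_{T_{out}} \mu_{T_{all}}}\right] \\
\mathrm{Var}[P^i_{out}] = \mathrm{Var}[T^i_{out}/T^i_{all}] &\approx \left[\frac{\mu_{T_{out}}}{\mu_{T_{all}}}\right]^2 \cdot \left[\left[\frac{\sigma_{T_{out}}}{\mu_{T_{out}}}\right]^2 + \left[\frac{\sigma_{T_{all}}}{\mu_{T_{all}}}\right]^2 - 2\,\frac{\sigma_{T_{out}T_{all}}}{\mu_{T_{out}} \mu_{T_{all}}}\right]
\end{aligned}
\]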
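where $\sigma_{XY}$ is the covariance of X and Y. For the equations
above, the covariance of $T_{in}$ and $T_{all}$ reduces to the
variance of $T_{in}$, using conditional expectation to eliminate the
covariance of $T_{in}$ and $T_{out}$:

\[
\begin{aligned}
\sigma_{T_{in}T_{all}} &= E[T_{in} T_{all}] - E[T_{in}] \cdot E[T_{all}] \\
&= E[T_{in}(T_{in} + T_{out})] - E[T_{in}] \cdot E[(T_{in} + T_{out})] \\
&= E[T_{in}^2 + T_{in} \cdot T_{out}] - E[T_{in}]^2 - E[T_{in}] \cdot E[T_{out}] \\
&= E[T_{in}^2] + E[T_{in} \cdot T_{out}] - E[T_{in}]^2 - E[T_{in}] \cdot E[T_{out}] \\
&= E[T_{in}^2] - E[T_{in}]^2 + E[T_{in} \cdot T_{out}] - E[T_{in}] \cdot E[T_{out}] \\
&= \mathrm{Var}(T_{in}) + \sum_{i_S} p(i_S) \{E[T_{in} \cdot T_{out} \mid i] - E[T_{in} \mid i] \cdot E[T_{out} \mid i]\} \\
&= \mathrm{Var}(T_{in}) + \sum_{i_S} p(i_S) \cdot 0 \\
&= \mathrm{Var}(T_{in})
\end{aligned}
\]

A similar derivation applies to the covariance of $T_{out}$ and
$T_{all}$.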
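The covariance reduction and its use in the ratio formula can be illustrated numerically. The sketch below assumes hypothetical totals with T_in uniform on {2, 4} and T_out uniform on {1, 3}, drawn independently (the single-state special case of the conditional-independence assumption). It checks that Cov(T_in, T_all) equals Var(T_in) and then compares the truncated-Taylor value of E[P_in] with the exact expectation of T_in/T_all.

```python
from fractions import Fraction

# Hypothetical totals: T_in in {2, 4}, T_out in {1, 3}, independent and uniform.
joint = {(t_in, t_out): Fraction(1, 4) for t_in in (2, 4) for t_out in (1, 3)}

def expect(f):
    """E[f(T_in, T_out)] under the joint pmf above."""
    return sum(pr * f(t_in, t_out) for (t_in, t_out), pr in joint.items())

mu_in = expect(lambda a, b: a)
mu_all = expect(lambda a, b: a + b)
var_in = expect(lambda a, b: a * a) - mu_in ** 2
var_all = expect(lambda a, b: (a + b) ** 2) - mu_all ** 2
cov_in_all = expect(lambda a, b: a * (a + b)) - mu_in * mu_all

print(cov_in_all == var_in)  # True: the covariance reduces to Var(T_in)

# Truncated-Taylor approximation of E[P_in], with Cov(T_in, T_all) = Var(T_in).
approx_p_in = (mu_in / mu_all) * (
    1 + var_all / mu_all ** 2 - var_in / (mu_in * mu_all))
exact_p_in = expect(lambda a, b: Fraction(a, a + b))

print(float(approx_p_in), float(exact_p_in))
```

On this toy input the approximation and the exact value differ only in the third decimal place, but the gap depends on how concentrated T_all is around its mean.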
