Question

We can represent a data set as a collection of object nodes and a collection of attribute nodes, where there is a link between each object and each attribute, and where the weight of that link is the value of the object for that attribute. For sparse data, if the value is 0, the link is omitted. Bipartite clustering attempts to partition this graph into disjoint clusters, where each cluster consists of a set of object nodes and a set of attribute nodes. The objective is to maximize the weight of links between the object and attribute nodes of a cluster, while minimizing the weight of links between object and attribute nodes in different clusters. This type of clustering is also known as co-clustering since the objects and attributes are clustered at the same time.
a)   How is bipartite clustering (co-clustering) different from clustering the sets of objects and attributes separately?
b)   Are there any cases in which these approaches yield the same clusters?

Explanation / Answer

A.

In this section we first recall the standard notion of FCA, as well as the notion of independent sub-contexts, and then give their counterparts in the setting of bipartite graphs, where we interpret them in clustering terms.

2.1 Formal concepts and independent subcontexts

Let $R$ be a binary relation between a set $O$ of objects and a set $P$ of Boolean properties. We write $\mathcal{R} = (O, P, R)$ for the tuple formed by these sets of objects and properties and the binary relation. It is called a formal context [11]. The notation $(x, y) \in R$ means that object $x$ has property $y$. Let $R(x) = \{y \in P \mid (x, y) \in R\}$ be the set of properties of object $x$. Similarly, $R^{-1}(y) = \{x \in O \mid (x, y) \in R\}$ is the set of objects having property $y$.

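Formal concept analysis [11] defines two set operators, here denoted $(\cdot)^{\Delta}$ and $(\cdot)^{-1\Delta}$, called the intent and extent operators respectively, such that $\forall Y \subseteq P$ and $\forall X \subseteq O$:

$X^{\Delta} = \{y \in P \mid \forall x \in X, (x, y) \in R\}$ (1)

$Y^{-1\Delta} = \{x \in O \mid \forall y \in Y, (x, y) \in R\}$ (2)

$X^{\Delta}$ is the set of properties possessed by all objects in $X$. $Y^{-1\Delta}$ is the set of objects having all properties in $Y$. These two operators induce an antitone Galois connection between $2^O$ and $2^P$. This means that the following property holds: $X \subseteq Y^{-1\Delta} \Leftrightarrow Y \subseteq X^{\Delta}$.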
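The intent and extent operators and the Galois property can be checked by brute force on a small example. The following is a minimal sketch, assuming a toy context stored as a dictionary from objects to property sets; the names `CONTEXT`, `intent`, `extent`, and `galois_holds` are illustrative and do not come from the cited work.

```python
from itertools import chain, combinations

# Toy formal context (an assumption for illustration): object -> its properties.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"c", "d"},
}
PROPERTIES = set().union(*CONTEXT.values())


def intent(objects):
    """Properties shared by every object in `objects` (operator (1))."""
    shared = set(PROPERTIES)
    for x in objects:
        shared &= CONTEXT[x]
    return shared


def extent(properties):
    """Objects that have every property in `properties` (operator (2))."""
    wanted = set(properties)
    return {x for x, props in CONTEXT.items() if wanted <= props}


def powerset(items):
    """All subsets of `items`, as tuples."""
    items = sorted(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


def galois_holds():
    """Check: X is inside extent(Y) exactly when Y is inside intent(X)."""
    return all(
        (set(X) <= extent(Y)) == (set(Y) <= intent(X))
        for X in powerset(CONTEXT)
        for Y in powerset(PROPERTIES)
    )
```

On this toy context, `intent({"o1", "o2"})` gives `{"a", "b"}` and `extent({"c"})` gives `{"o2", "o3"}`. The exhaustive check is only practical for very small contexts, since it visits every pair of subsets.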

A pair $(X, Y)$ such that $X^{\Delta} = Y$ and $Y^{-1\Delta} = X$ is called a formal concept [11]. $X$ is its extent and $Y$ its intent. In other words, a formal concept is a pair $(X, Y)$ such that $X$ is the set of objects having all properties in $Y$ and $Y$ is the set of properties shared by all objects in $X$. It can be shown that formal concepts correspond to maximal pairs $(X, Y)$ such that $X \times Y \subseteq R$.
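One simple way to list the formal concepts of a small context is to close every subset of objects: take its intent, then the extent of that intent. The sketch below assumes a toy context and hypothetical helper names; it is exponential in the number of objects and is meant only to illustrate the definition, not to stand in for a dedicated FCA algorithm.

```python
from itertools import combinations

# Toy formal context (an assumption for illustration): object -> its properties.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"c", "d"},
}
PROPERTIES = set().union(*CONTEXT.values())


def intent(objects):
    """Properties shared by every object in `objects`."""
    shared = set(PROPERTIES)
    for x in objects:
        shared &= CONTEXT[x]
    return shared


def extent(properties):
    """Objects that have every property in `properties`."""
    wanted = set(properties)
    return {x for x, props in CONTEXT.items() if wanted <= props}


def formal_concepts():
    """Return all (extent, intent) pairs, each as a pair of frozensets."""
    concepts = set()
    objects = sorted(CONTEXT)
    for r in range(len(objects) + 1):
        for X in combinations(objects, r):
            Y = intent(X)                 # properties common to X
            closed_X = extent(Y)          # every object having all of Y
            concepts.add((frozenset(closed_X), frozenset(Y)))
    return concepts


def is_rectangle(X, Y):
    """True when every object in X has every property in Y."""
    return all(y in CONTEXT[x] for x in X for y in Y)
```

For this toy context the procedure finds six concepts, among them the pair with extent `{"o2", "o3"}` and intent `{"c"}`. Each returned pair passes `is_rectangle`, which matches the description of concepts as maximal rectangles inside the relation.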

A recent parallel between formal concept analysis and possibility theory [8] has emphasized the interest of another remarkable set operator, $(\cdot)^{\Pi}$, and of the two respective duals of these operators. The new operator and the already defined intent operator can be written as follows, $\forall X \subseteq O$:

$X^{\Pi} = \{y \in P \mid R^{-1}(y) \cap X \neq \emptyset\}$ (3)

$X^{\Delta} = \{y \in P \mid R^{-1}(y) \supseteq X\}$ (4)

Note that (4) is equivalent to the definition of operator $(\cdot)^{\Delta}$ in (1). $X^{\Pi}$ is the set of properties that are possessed by at least one object in $X$. $X^{\Delta}$ is the set of properties shared by all objects in $X$.

Operators $(\cdot)^{-1\Pi}$ and $(\cdot)^{-1\Delta}$ are defined similarly on a set $Y$ of properties by substituting $R^{-1}$ for $R$ and by exchanging $O$ and $P$. $Y^{-1\Pi}$ and $Y^{-1\Delta}$ are, respectively, the set of objects having at least one property in $Y$ and the set of objects that have all the properties in $Y$.
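The difference between the "at least one" operators and the "all" operators is easy to see on a small example. This is a minimal sketch under an assumed toy context; the function names are hypothetical labels for the operators described above.

```python
# Toy formal context (an assumption for illustration): object -> its properties.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"c", "d"},
}


def some_property(objects):
    """Properties possessed by at least one object in `objects`."""
    return set().union(*(CONTEXT[x] for x in objects))


def some_object(properties):
    """Objects having at least one property in `properties`."""
    wanted = set(properties)
    return {x for x, props in CONTEXT.items() if props & wanted}


def all_properties(objects):
    """Properties shared by all objects in `objects` (the intent)."""
    shared = set().union(*CONTEXT.values())
    for x in objects:
        shared &= CONTEXT[x]
    return shared
```

For the objects `o1` and `o3`, `some_property` returns all four properties while `all_properties` returns the empty set, because the two objects share nothing.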

This new operator leads to a new connection [9], which corresponds to pairs $(X, Y)$ such that $X^{\Pi} = Y$ and $Y^{-1\Pi} = X$ (while $(\cdot)^{\Delta}$ leads to formal concepts, as already said). Such pairs do not define formal concepts, but independent sub-contexts. Indeed, it has recently been shown [9] that pairs $(X, Y)$ of sets exchanged through the new connection operator are subsets such that

$(X \times Y) \cup (\overline{X} \times \overline{Y}) \supseteq R$,

just as formal concepts correspond to maximal pairs $(X, Y)$ such that $X \times Y \subseteq R$.
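In bipartite-graph terms, a pair that is stable under the "at least one" operators is closed under following links in both directions, so the smallest such pairs are the connected components of the object-property graph. The sketch below grows one such pair from a seed object by alternating the two operators until nothing changes. The context, the function names, and the assumption that every object has at least one property are all illustrative choices.

```python
# Toy context with two unlinked blocks (an assumption for illustration).
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"b"},
    "o3": {"c"},
    "o4": {"c", "d"},
}


def some_property(objects):
    """Properties possessed by at least one object in `objects`."""
    return set().union(*(CONTEXT[x] for x in objects))


def some_object(properties):
    """Objects having at least one property in `properties`."""
    wanted = set(properties)
    return {x for x, props in CONTEXT.items() if props & wanted}


def grow(seed):
    """Smallest stable pair (X, Y) containing the seed object.

    Assumes the seed has at least one property.
    """
    X, Y = {seed}, set()
    while True:
        new_Y = some_property(X)
        new_X = some_object(new_Y)
        if new_X == X and new_Y == Y:
            return X, Y
        X, Y = new_X, new_Y


def independent_sub_contexts():
    """Distinct stable pairs obtained by growing from each object."""
    found = []
    for seed in sorted(CONTEXT):
        pair = grow(seed)
        if pair not in found:
            found.append(pair)
    return found


def covers_relation(X, Y):
    """True when every link lies inside X x Y or inside their complements."""
    return all((x in X) == (y in Y) for x, props in CONTEXT.items() for y in props)
```

On this context the procedure returns two pairs, `({"o1", "o2"}, {"a", "b"})` and `({"o3", "o4"}, {"c", "d"})`, and both satisfy `covers_relation`. Unions of such pairs are also stable, so the pairs grown from single seeds are only the finest ones.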

B. Consider a network of $n$ nodes. Suppose we want to create a mapping or a clustering of these nodes. $c_{ij}$ denotes the number of links (e.g., co-occurrence links, co-citation links, or bibliographic coupling links) between nodes $i$ and $j$ ($c_{ij} = c_{ji} \geq 0$). $s_{ij}$ denotes the association strength of nodes $i$ and $j$ (Van Eck & Waltman, 2009) and is given by

$s_{ij} = \frac{2 m c_{ij}}{c_i c_j}$, (1)

where $c_i$ denotes the total number of links of node $i$ and $m$ denotes the total number of links in the network, that is,

$c_i = \sum_{j \neq i} c_{ij}$ and $m = \frac{1}{2} \sum_{i} c_i$. (2)

In the case of mapping, we need to find for each node $i$ a vector $x_i \in \mathbb{R}^p$ that indicates the location of node $i$ in a $p$-dimensional map (usually $p = 2$). In the case of clustering, we need to find for each node $i$ a positive integer $x_i$ that indicates the cluster to which node $i$ belongs. Our unified approach to mapping and clustering is based on minimizing

$V(x_1, \ldots, x_n) = \sum_{i<j} s_{ij} d_{ij}^2 - \sum_{i<j} d_{ij}$ (3)

with respect to $x_1, \ldots, x_n$. $d_{ij}$ denotes the distance between nodes $i$ and $j$ and is given by

$d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$ (4)

in the case of mapping and by

$d_{ij} = 0$ if $x_i = x_j$, and $d_{ij} = 1/\gamma$ if $x_i \neq x_j$ (5)

in the case of clustering. We refer to the parameter $\gamma$ in (5) as the resolution parameter ($\gamma > 0$). The larger the value of this parameter, the larger the number of clusters that we obtain.

Equation (3) can be interpreted in terms of attractive and repulsive forces between nodes. The first term in (3) represents an attractive force, and the second term represents a repulsive force. The higher the association strength of two nodes, the stronger the attractive force between the nodes. Since the strength of the repulsive force between two nodes does not depend on the association strength of the nodes, the overall effect of the two forces is that nodes with a high association strength are pulled towards each other while nodes with a low association strength are pushed away from each other.

In the case of mapping, it has been shown that the above approach is equivalent to the VOS mapping technique (Van Eck & Waltman, 2007; Van Eck et al., 2010), which is in turn closely related to the well-known technique of multidimensional scaling.
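To make the clustering case concrete, the quantity being minimized can be evaluated directly for a given cluster assignment. The following is a minimal sketch, assuming a small symmetric link-count matrix with no isolated nodes; the matrix `LINKS` and the function names are illustrative, and the code only scores a given clustering rather than searching for the best one.

```python
# Toy symmetric link counts (an assumption): two tight pairs joined by one weak link.
LINKS = [
    [0, 3, 0, 0],
    [3, 0, 1, 0],
    [0, 1, 0, 3],
    [0, 0, 3, 0],
]


def association_strengths(c):
    """s_ij = 2 * m * c_ij / (c_i * c_j); assumes every node has a link."""
    n = len(c)
    totals = [sum(c[i][j] for j in range(n) if j != i) for i in range(n)]
    m = sum(totals) / 2
    return [
        [2 * m * c[i][j] / (totals[i] * totals[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def clustering_quality(c, clusters, gamma=1.0):
    """Value of the minimized function for a cluster assignment.

    Distance is 0 within a cluster and 1 / gamma across clusters.
    """
    s = association_strengths(c)
    n = len(c)
    value = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.0 if clusters[i] == clusters[j] else 1.0 / gamma
            value += s[i][j] * d ** 2 - d   # attraction minus repulsion
    return value
```

With `gamma = 1`, the assignment `[0, 0, 1, 1]` scores -3.125 on this matrix, while `[0, 1, 0, 1]` scores 3.875, so the split that keeps strongly linked nodes together has the lower (better) value. Putting all nodes in a single cluster scores 0.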
