nips nips2001 nips2001-89 knowledge-graph by maker-knowledge-mining

89 nips-2001-Grouping with Bias


Source: pdf

Author: Stella X. Yu, Jianbo Shi

Abstract: With the optimization of pattern discrimination as a goal, graph partitioning approaches often lack the capability to integrate prior knowledge to guide grouping. In this paper, we consider priors from unitary generative models, partially labeled data and spatial attention. These priors are modelled as constraints in the solution space. By imposing a uniformity condition on the constraints, we restrict the feasible space to one of smooth solutions. A subspace projection method is developed to solve this constrained eigenproblem. We demonstrate that simple priors can greatly improve image segmentation results. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 With the optimization of pattern discrimination as a goal, graph partitioning approaches often lack the capability to integrate prior knowledge to guide grouping. [sent-7, score-0.429]

2 In this paper, we consider priors from unitary generative models, partially labeled data and spatial attention. [sent-8, score-0.388]

3 These priors are modelled as constraints in the solution space. [sent-9, score-0.247]

4 By imposing a uniformity condition on the constraints, we restrict the feasible space to one of smooth solutions. [sent-10, score-0.197]

5 A subspace projection method is developed to solve this constrained eigenproblem. We demonstrate that simple priors can greatly improve image segmentation results. [sent-11, score-0.586]

6 In image segmentation, it means finding objects or object segments by clustering pixels and segregating them from the background. [sent-13, score-0.479]

7 In computer vision, we would like image segmentation to correspond directly to object segmentation. [sent-17, score-0.512]

8 In data clustering, if users provide a few examples of clusters, we would like a system to adapt the grouping process to achieve the desired properties. [sent-18, score-0.436]

9 We show in this paper that it is possible to integrate both bottom-up and top-down information in a single grouping process. [sent-20, score-0.436]

10 In the proposed method, the bottom-up grouping process is modelled as a graph partitioning [1, 4, 12, 11, 14, 15] problem, and the top-down knowledge is encoded as constraints on the solution space. [sent-21, score-1.027]

11 Though we consider the normalized cuts criterion in particular, a similar derivation can be developed for other graph partitioning criteria as well. [sent-22, score-0.814]

12 We show that it leads to a constrained eigenvalue problem, where the global optimal solution can be obtained by eigendecomposition. [sent-23, score-0.234]

13 2 Model In graph theoretic methods for grouping, a relational graph G = (V, E, W) is first constructed based on pairwise similarity between two elements. [sent-26, score-0.326]

14 Associated with the graph edge between vertices i and j is the weight W_ij, characterizing the likelihood that they belong to the same group. [sent-27, score-0.302]

15 For image segmentation, pixels are taken as graph nodes, and pairwise pixel similarity can be evaluated based on a number of low level grouping cues. [sent-28, score-1.057]

16 The affinity between two pixels is inversely proportional to the magnitude of the strongest intervening edge [9]. [sent-31, score-0.338]

17 For every marked pixel, we compute its affinity with its neighbours within a radius of 10. [sent-39, score-0.259]

18 The value is determined by a Gaussian function of the maximum magnitude of edges crossing the straight line connecting the two pixels [9]. [sent-40, score-0.281]

19 When there is a strong edge separating the two, the affinity is low. [sent-41, score-0.162]
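
As a rough illustration of the intervening-contour cue just described, the sketch below builds a small dense affinity matrix from a precomputed edge-magnitude map; the function names, the neighbourhood radius, and the Gaussian scale sigma are illustrative assumptions, not values or code taken from the paper.

    import numpy as np

    def pixel_affinity(edge_mag, p, q, sigma=0.1):
        # Sample the edge-magnitude map along the straight line from p to q
        # and take the strongest edge crossed: a strong intervening edge
        # means a low affinity.
        n = max(abs(p[0] - q[0]), abs(p[1] - q[1])) + 1
        rows = np.linspace(p[0], q[0], n).round().astype(int)
        cols = np.linspace(p[1], q[1], n).round().astype(int)
        strongest = edge_mag[rows, cols].max()
        return np.exp(-strongest ** 2 / (2.0 * sigma ** 2))

    def affinity_matrix(edge_mag, radius=10, sigma=0.1):
        # Dense W for a small image: nonzero affinity only for pixel pairs
        # within `radius` of each other.
        h, w = edge_mag.shape
        coords = [(r, c) for r in range(h) for c in range(w)]
        n = len(coords)
        W = np.zeros((n, n))
        for a in range(n):
            for b in range(a, n):
                dr = coords[a][0] - coords[b][0]
                dc = coords[a][1] - coords[b][1]
                if dr * dr + dc * dc <= radius * radius:
                    W[a, b] = W[b, a] = pixel_affinity(edge_mag, coords[a],
                                                       coords[b], sigma)
        return W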

20 It is the second eigenvector from normalized cuts [15] on the affinity matrix. [sent-44, score-0.484]

21 This gives a bipartitioning of the image which corresponds to the best cuts that have maximum within-region coherence and between-region distinction. [sent-48, score-0.432]

22 After an image is transcribed into a graph, image segmentation becomes a vertex partitioning problem. [sent-49, score-0.832]

23 This corresponds to a vertex bipartitioning (V_1, V_2) of graph G, where V = V_1 ∪ V_2 and V_1 ∩ V_2 = ∅. [sent-51, score-0.29]

24 A good segmentation seeks a partitioning such that nodes within partitions are tightly connected and nodes across partitions are loosely connected. [sent-52, score-0.8]

25 For normalized cuts [15], the solution is given by some eigenvector of weight matrix W (Fig. [sent-54, score-0.55]

26 Thresholding on it leads to a discrete segmentation (Fig. [sent-56, score-0.319]
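
A minimal sketch of this unbiased normalized-cuts step, assuming a small dense affinity matrix W (practical implementations use sparse eigensolvers on large images); the second generalized eigenvector is thresholded at its mean to produce the discrete bipartition.

    import numpy as np
    from scipy.linalg import eigh

    def ncut_bipartition(W):
        # Unbiased normalized cuts: solve (D - W) x = lambda D x, take the
        # eigenvector of the second smallest eigenvalue, and threshold it
        # at its mean value to obtain a two-way partition.
        d = W.sum(axis=1)
        D = np.diag(d)
        vals, vecs = eigh(D - W, D)          # ascending eigenvalues
        x = vecs[:, 1]                       # second smallest eigenvector
        labels = (x > x.mean()).astype(int)
        return labels, x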

27 While we will focus on the normalized cuts criterion [15], most of the following discussion applies to other criteria as well. [sent-59, score-0.465]

28 1 Biased grouping as constrained optimization Knowledge other than the image itself can greatly change the segmentation we might obtain based on such low level cues. [sent-61, score-0.976]

29 The sources of priors we consider in this paper are: unitary generative models (Fig. [sent-63, score-0.228]

30 2a), which could arise from sensor models in MRF [5], partial grouping (Fig. [sent-64, score-0.536]

31 We model such prior knowledge in the form of constraints on a valid grouping configuration. [sent-68, score-0.598]

32 In particular, we see that all such prior knowledge defines a partial grouping. a) Bright foreground. [sent-69, score-0.18]

33 In this case, pixels of light (dark) intensities are likely to be the foreground (background). [sent-74, score-0.313]

34 This prior knowledge is helpful not only for identifying the tiger as the foreground, but also for perceiving the river as one piece. [sent-75, score-0.17]

35 How can we incorporate these unitary constraints into a graph that handles only pairwise relationships between pixels? [sent-76, score-0.397]

36 b) Global configuration constraints from partial grouping a priori. [sent-77, score-0.618]

37 In this case, we have manually selected two sets of pixels to be grouped together in the foreground (+) and the background (△). [sent-78, score-0.486]

38 They are distributed across the image and often have distinct local features. [sent-80, score-0.117]

39 How can we force them to be in the same group and further bring similar pixels along and push dissimilar pixels apart? [sent-81, score-0.597]

40 A partial grouping solution indicates which sets of pixels should belong to one partition. [sent-86, score-0.748]

41 Unitary generative models: H_1 and H_2 contain the sets of pixels that satisfy the unitary generative models for the foreground and background, respectively. [sent-90, score-0.666]

42 2a, H_1 (H_2) contains the pixels of the brightest (darkest) intensities. [sent-92, score-0.281]

43 Partial grouping: each H_l, l = 1, ..., n, contains a set of pixels that users specify to belong together. [sent-96, score-0.312]

44 Spatial attention: H_1 = ∅ and H_2 contains pixels randomly selected outside the visual fovea, since we want to maintain maximum discrimination at the fovea while merging pixels far away from the fovea into one group. [sent-98, score-0.625]
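
The sketch below shows one plausible way to assemble the index sets H_1 and H_2 from the three kinds of priors just listed; the thresholds, sample sizes, and argument names are assumptions made for illustration rather than choices documented in the paper.

    import numpy as np

    def priors_to_groups(img, k=20, labeled_fg=None, labeled_bg=None,
                         fovea_center=None, fovea_radius=None, n_outside=80,
                         rng=None):
        # Collect two index sets over the flattened image: H1 (pixels that
        # must end up in one partition together) and H2 (pixels that must
        # end up in the other partition together).
        rng = np.random.default_rng() if rng is None else rng
        flat = img.ravel()
        H1, H2 = [], []
        # Unitary generative model: brightest pixels together, darkest together.
        order = np.argsort(flat)
        H1 += list(order[-k:])
        H2 += list(order[:k])
        # Partial grouping: user-labeled pixel indices, if provided.
        if labeled_fg is not None:
            H1 += list(labeled_fg)
        if labeled_bg is not None:
            H2 += list(labeled_bg)
        # Spatial attention: pixels sampled far from the fovea are merged
        # into one group; no constraint is placed near the fovea itself.
        if fovea_center is not None:
            h, w = img.shape
            rr, cc = np.mgrid[0:h, 0:w]
            dist = np.hypot(rr - fovea_center[0], cc - fovea_center[1]).ravel()
            far = np.flatnonzero(dist > fovea_radius)
            if far.size:
                H2 += list(rng.choice(far, size=min(n_outside, far.size),
                                      replace=False))
        return np.array(H1), np.array(H2)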

45 To formulate the constraints induced on the graph partitioning, we introduce binary group indicators X = [X_1, X_2]. [sent-99, score-0.257]

46 The constrained grouping problem can be formally written as: minimize ε(X_1, X_2) subject to [sent-102, score-0.542]

47 X_t(i) = X_t(j) for all i, j ∈ H_l, l = 1, ..., n, t = 1, 2, and X_t(i) ≠ X_t(j) for all i ∈ H_1, j ∈ H_2, t = 1, 2, where ε(X_1, X_2) is some graph partitioning cost function, such as minimum cuts [6], average cuts [7], or normalized cuts [15]. [sent-104, score-1.13]

48 The first set of constraints can be re-written in matrix form: U^T X = 0, where, e.g. [sent-105, score-0.13]

49 We search for the optimal solution only in the feasible set determined by all the constraints. [sent-108, score-0.179]
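
One standard way to obtain the U^T X = 0 form for the first (must-link) set of constraints is to give each consecutive pair inside a group a column of U with a +1 and a -1 entry; the helper below is an illustrative construction in that spirit, not code from the paper.

    import numpy as np

    def build_constraint_matrix(n_pixels, groups):
        # Encode "must belong together" constraints as U^T x = 0: a group
        # of m pixel indices contributes m - 1 columns, and the column for
        # a pair (i, j) has +1 at i and -1 at j, so any feasible x takes a
        # single value over the whole group.
        cols = []
        for H in groups:
            H = list(H)
            for i, j in zip(H[:-1], H[1:]):
                u = np.zeros(n_pixels)
                u[i], u[j] = 1.0, -1.0
                cols.append(u)
        if not cols:
            return np.zeros((n_pixels, 0))
        return np.column_stack(cols)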

50 2 Conditions on grouping constraints The above formulation can be implemented by the maximum-flow algorithm for the minimum cuts criterion [6, 13, 3], where two special nodes called source and sink are introduced. [sent-110, score-0.976]
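
For concreteness, the sketch below shows how such source/sink seeding is typically wired into a max-flow/min-cut solver (here with networkx, which the paper does not use): labeled pixels receive edges of effectively unlimited capacity to the source or sink, so no minimum cut can pull them away from their assigned side. It illustrates only the mechanism mentioned above, not the paper's normalized-cuts method.

    import networkx as nx

    def seeded_min_cut(W, fg_seeds, bg_seeds):
        # Hard grouping constraints in the minimum-cut formulation: tie the
        # labeled pixels to an artificial source "s" / sink "t" with a
        # capacity so large that a minimum cut never severs those edges.
        n = W.shape[0]
        G = nx.DiGraph()
        for i in range(n):
            for j in range(n):
                if i != j and W[i, j] > 0:
                    G.add_edge(i, j, capacity=float(W[i, j]))
        big = float(W.sum()) + 1.0            # stands in for infinity
        for i in fg_seeds:
            G.add_edge("s", int(i), capacity=big)
            G.add_edge(int(i), "s", capacity=big)
        for j in bg_seeds:
            G.add_edge(int(j), "t", capacity=big)
            G.add_edge("t", int(j), capacity=big)
        cut_value, (src_side, sink_side) = nx.minimum_cut(G, "s", "t")
        return src_side - {"s"}, sink_side - {"t"}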

51 In the context of learning from labeled and unlabeled data, the biased mincuts are linked to minimizing leave-one-out cross validation [2]. [sent-112, score-0.21]

52 In the normalized cuts formulation, this leads to a constrained eigenvalue problem, as will be seen shortly. [sent-113, score-0.336]

53 However, simply forcing a few nodes to be in the same group can produce some undesirable graph partitioning results, as illustrated in Fig. [sent-114, score-0.527]

54 When we assign points from top and bottom clusters to be together, we do not just want one of the groups to lose its labeled point to the other group (Fig. [sent-118, score-0.243]

55 3b), but rather we desire the biased grouping process to explore their neighbouring connections and change the organization to a left-right division accordingly. [sent-119, score-0.573]

56 Figure 3: Undesired grouping caused by simple grouping constraints. [sent-124, score-0.872]

57 a)Data points are distributed in four groups, with a larger spatial gap between top and bottom groups than that between left and right groups. [sent-125, score-0.13]

58 Defining weights based on proximity, we find the top-bottom grouping as the optimal bisection. [sent-126, score-0.481]

59 The desired partitioning should now be the left-right division. [sent-129, score-0.209]

60 However, perturbation on the unconstrained optimal cut can lead to a partitioning that satisfies the constraints while producing the smallest cut cost. [sent-130, score-0.635]

61 The desire to propagate partial grouping information from the constrained nodes is, however, not reflected in the constrained partitioning criterion itself. [sent-131, score-1.071]

62 Often, a slightly perturbed version of the optimal unbiased cut becomes the legitimate optimum. [sent-132, score-0.254]

63 One reason for such a solution being undesirable is that some of the "perturbed" nodes are isolated from their close neighbours. [sent-133, score-0.174]

64 To fix this problem, we introduce the notion of uniformity of a graph partitioning. [sent-134, score-0.251]

65 Intuitively, if two labeled nodes, i and j, have similar connections to their neighbours, we desire a cut to treat them fairly so that if i gets grouped with i's friends, j also gets grouped with j's friends (Fig. [sent-135, score-0.471]

66 This uniformity condition is one way to propagate prior grouping information from labeled nodes to their neighbours. [sent-137, score-0.834]

67 For the normalized cuts criterion, we define the normalized cut of a single node. [sent-138, score-0.622]

68 This value is high for a node isolated from its close neighbours in partitioning X. [sent-142, score-0.319]

69 4, we show that the uniformity condition on the bias helps preserve the smoothness of solutions at every labeled point. [sent-146, score-0.366]

70 b) The simple constraint U^T X = 0 forces any feasible solution to take equal values on the two marked points. [sent-188, score-0.231]

71 e) Segmentation with simple bias not only fails to glue the two side strips into one, but also has two marked points isolated from their neighbours. [sent-193, score-0.323]

72 f) Segmentation with conditioned bias brings the two side strips into one group. [sent-194, score-0.347]

73 P = D^{-1}W is a transition probability matrix for a nonnegative weight matrix W [10]. [sent-200, score-0.131]

74 We define a new variable x = (1 - α)X_1 - αX_2. We can show that for normalized cuts, the biased grouping with the uniformity condition is translated into a constrained eigenvalue problem. [sent-202, score-0.716]

75 Using Lagrange multipliers, we find that the optimal solution x* satisfies: QPx* = λx*, ε(x*) = 1 - λ, where Q is a projector onto the feasible solution space: Q = I - D^{-1}V(V^T D^{-1}V)^{-1}V^T, V = P^T U. [sent-207, score-0.267]

76 Here we assume that the conditioned constraint V is of full rank, so that V^T D^{-1}V is invertible. [sent-208, score-0.168]

77 Since 1 is still the trivial solution, corresponding to the largest eigenvalue of 1, the second leading right eigenvector of the matrix QP is the solution we seek. [sent-209, score-0.236]

78 To summarize, given the weight matrix W and the partial grouping in matrix form U^T x = 0, we do the following to find the optimal bipartitioning: compute the degree matrix D, with D_ii = Σ_j W_ij for all i. [sent-210, score-0.76]
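
Putting the pieces together, a compact sketch of this subspace-projection recipe might look as follows; it assumes a dense W, a full-rank conditioned constraint V, and thresholding of the eigenvector at its mean, and it is an illustration of the procedure summarized above rather than the authors' implementation.

    import numpy as np

    def biased_ncut(W, U):
        # Constrained normalized cuts with the conditioned bias U^T P x = 0:
        # form P = D^{-1} W, the conditioned constraint V = P^T U, the
        # projector Q = I - D^{-1} V (V^T D^{-1} V)^{-1} V^T, and take the
        # second leading right eigenvector of Q P (the leading one is the
        # trivial all-ones vector).
        d = W.sum(axis=1)
        D_inv = np.diag(1.0 / d)
        P = D_inv @ W
        V = P.T @ U
        M = V.T @ D_inv @ V                  # assumed invertible (V full rank)
        Q = np.eye(W.shape[0]) - D_inv @ V @ np.linalg.solve(M, V.T)
        vals, vecs = np.linalg.eig(Q @ P)
        order = np.argsort(-vals.real)
        x = vecs[:, order[1]].real           # second leading right eigenvector
        labels = (x > x.mean()).astype(int)  # threshold at the mean value
        return labels, x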

79 For all the examples, we compute pixel affinity W as in Fig. [sent-218, score-0.174]

80 All the segmentation results are obtained by thresholding the eigenvectors using their mean values. [sent-220, score-0.409]

81 The results without bias, with the simple bias U^T x = 0, and with the conditioned bias U^T P x = 0 are compared in Fig. [sent-221, score-0.352]
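
A toy end-to-end run built from the hypothetical helper sketches above, shown only to make the unbiased-versus-conditioned-bias comparison concrete; none of these names correspond to released code.

    # Tiny synthetic example so the dense solvers stay cheap.
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))
    edge_mag = np.hypot(*np.gradient(img))        # crude edge-magnitude map
    W = affinity_matrix(edge_mag, radius=3, sigma=0.5)
    H1, H2 = priors_to_groups(img, k=3, rng=rng)
    U = build_constraint_matrix(img.size, [H1, H2])
    unbiased_labels, _ = ncut_bipartition(W)      # no bias
    biased_labels, _ = biased_ncut(W, U)          # conditioned bias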

82 Figure 5: Segmentation with bias from unitary generative models. [sent-232, score-0.293]

83 b) We randomly sample the 17 brightest pixels for H_1 (+) and the 48 darkest pixels for H_2 (△), producing 63 constraints in total. [sent-235, score-0.809]

84 The labeled nodes have an uneven influence on grouping. [sent-239, score-0.208]

85 g) and h) show the solution with conditioned bias. [sent-240, score-0.218]

86 It successfully breaks the image into the tiger and the river, matching our general impression of this image. [sent-241, score-0.207]

87 Prior knowledge is particularly useful in supplying long-range grouping information, which is often lacking in data grouping based on low-level cues. [sent-243, score-0.913]

88 With our model, the partial grouping prior can be integrated into the bottom-up grouping framework by seeking the optimal solution in a restricted domain. [sent-244, score-1.178]

89 We show that the uniformity constraint is effective in eliminating spurious solutions resulting from simple perturbations of the optimal unbiased solutions. [sent-245, score-0.236]

90 Figure 6: Segmentation with bias from hand-labeled partial grouping. [sent-252, score-0.211]

91 b) Hand-labeled partial grouping includes 21 pixels for H_1 (+) and 31 pixels for H_2 (△), producing 50 constraints in total. [sent-255, score-1.239]

92 Labeled pixels in cluttered contexts are poor at binding their segments together. [sent-259, score-0.355]

93 g) and h) show the solution with conditioned bias. [sent-260, score-0.218]

94 d) Figure 7: Segmentation with bias from spatial attention. [sent-268, score-0.167]

95 a)We randomly choose 86 pixels far away from the fovea (Fig. [sent-270, score-0.344]

96 d) and e) show the solution with conditioned bias. [sent-275, score-0.218]

97 It ignores the variation in the background scene, which includes not only large pieces of constant intensity, but also many small segments of various intensities. [sent-276, score-0.118]

98 This greatly simplifies the task we face when seeking a segmentation from the continuous eigensolution. [sent-283, score-0.388]

99 All these benefits come at a computational cost that is 10 times that for the original unbiased grouping problem. [sent-285, score-0.478]

100 Learning from labeled and unlabeled data using graph mincuts, 2001. [sent-300, score-0.244]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('grouping', 0.436), ('segmentation', 0.319), ('pixels', 0.281), ('cuts', 0.235), ('ncuts', 0.212), ('partitioning', 0.209), ('graph', 0.14), ('conditioned', 0.13), ('unitary', 0.129), ('cut', 0.12), ('image', 0.117), ('xt', 0.114), ('bias', 0.111), ('foreground', 0.111), ('uniformity', 0.111), ('strips', 0.106), ('affinity', 0.105), ('labeled', 0.104), ('nodes', 0.104), ('partial', 0.1), ('vision', 0.097), ('solution', 0.088), ('desire', 0.084), ('constraints', 0.082), ('bipartitioning', 0.08), ('criteria', 0.077), ('normalized', 0.076), ('vertex', 0.07), ('px', 0.069), ('constrained', 0.069), ('eigenvector', 0.068), ('fovea', 0.063), ('neighbours', 0.063), ('producing', 0.059), ('marked', 0.059), ('hi', 0.058), ('edge', 0.057), ('spatial', 0.056), ('grouped', 0.055), ('hz', 0.055), ('biased', 0.053), ('brightest', 0.053), ('darkest', 0.053), ('friends', 0.053), ('mincuts', 0.053), ('tiger', 0.053), ('generative', 0.053), ('thresholding', 0.05), ('matrix', 0.048), ('perturbed', 0.047), ('isolated', 0.047), ('wlo', 0.046), ('feasible', 0.046), ('pairwise', 0.046), ('priors', 0.046), ('optimal', 0.045), ('intensity', 0.044), ('segments', 0.042), ('unbiased', 0.042), ('sink', 0.042), ('groups', 0.042), ('knowledge', 0.041), ('vt', 0.041), ('eigenvectors', 0.04), ('condition', 0.04), ('prior', 0.039), ('background', 0.039), ('vi', 0.039), ('undesirable', 0.039), ('object', 0.039), ('constraint', 0.038), ('pixel', 0.037), ('pieces', 0.037), ('river', 0.037), ('shi', 0.037), ('computer', 0.037), ('min', 0.037), ('conference', 0.036), ('group', 0.035), ('greatly', 0.035), ('yu', 0.035), ('texture', 0.035), ('robotics', 0.035), ('weight', 0.035), ('seeking', 0.034), ('carnegie', 0.034), ('eigenvalue', 0.032), ('pittsburgh', 0.032), ('qp', 0.032), ('intensities', 0.032), ('binding', 0.032), ('partitions', 0.032), ('wij', 0.032), ('mellon', 0.032), ('ill', 0.032), ('compute', 0.032), ('bottom', 0.032), ('modelled', 0.031), ('belong', 0.031), ('clusters', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 89 nips-2001-Grouping with Bias

Author: Stella X. Yu, Jianbo Shi

Abstract: With the optimization of pattern discrimination as a goal, graph partitioning approaches often lack the capability to integrate prior knowledge to guide grouping. In this paper, we consider priors from unitary generative models, partially labeled data and spatial attention. These priors are modelled as constraints in the solution space. By imposing a uniformity condition on the constraints, we restrict the feasible space to one of smooth solutions. A subspace projection method is developed to solve this constrained eigenproblem. We demonstrate that simple priors can greatly improve image segmentation results. 1

2 0.19578609 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation

Author: Heiko Wersing

Abstract: We present a new approach to the supervised learning of lateral interactions for the competitive layer model (CLM) dynamic feature binding architecture. The method is based on consistency conditions, which were recently shown to characterize the attractor states of this linear threshold recurrent network. For a given set of training examples the learning problem is formulated as a convex quadratic optimization problem in the lateral interaction weights. An efficient dimension reduction of the learning problem can be achieved by using a linear superposition of basis interactions. We show the successful application of the method to a medical image segmentation problem of fluorescence microscope cell images.

3 0.14860475 191 nips-2001-Transform-invariant Image Decomposition with Similarity Templates

Author: Chris Stauffer, Erik Miller, Kinh Tieu

Abstract: Recent work has shown impressive transform-invariant modeling and clustering for sets of images of objects with similar appearance. We seek to expand these capabilities to sets of images of an object class that show considerable variation across individual instances (e.g. pedestrian images) using a representation based on pixel-wise similarities, similarity templates. Because of its invariance to the colors of particular components of an object, this representation enables detection of instances of an object class and enables alignment of those instances. Further, this model implicitly represents the regions of color regularity in the class-specific image set enabling a decomposition of that object class into component regions. 1

4 0.13769585 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

Author: Jens Kohlmorgen, Steven Lemm

Abstract: We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, our method processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changing number of states and an on-line variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incoming data in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream. The usefulness of the algorithm is demonstrated by an application to a switching dynamical system. 1

5 0.12590016 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss

Abstract: Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the data), there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1

6 0.10268387 170 nips-2001-Spectral Kernel Methods for Clustering

7 0.09725415 46 nips-2001-Categorization by Learning and Combining Object Parts

8 0.096185736 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering

9 0.09415397 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding

10 0.089664884 171 nips-2001-Spectral Relaxation for K-means Clustering

11 0.089235209 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

12 0.082780622 144 nips-2001-Partially labeled classification with Markov random walks

13 0.080645099 54 nips-2001-Contextual Modulation of Target Saliency

14 0.079301603 193 nips-2001-Unsupervised Learning of Human Motion Models

15 0.076230913 65 nips-2001-Effective Size of Receptive Fields of Inferior Temporal Visual Cortex Neurons in Natural Scenes

16 0.075074084 190 nips-2001-Thin Junction Trees

17 0.065278105 118 nips-2001-Matching Free Trees with Replicator Equations

18 0.063519128 14 nips-2001-A Neural Oscillator Model of Auditory Selective Attention

19 0.061705422 169 nips-2001-Small-World Phenomena and the Dynamics of Information

20 0.060532451 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.201), (1, -0.017), (2, -0.041), (3, -0.122), (4, 0.004), (5, -0.048), (6, -0.23), (7, -0.012), (8, 0.008), (9, -0.021), (10, 0.075), (11, 0.025), (12, 0.152), (13, 0.171), (14, -0.02), (15, -0.102), (16, -0.026), (17, 0.039), (18, 0.156), (19, -0.066), (20, -0.075), (21, 0.139), (22, -0.025), (23, 0.038), (24, -0.014), (25, -0.058), (26, 0.089), (27, -0.128), (28, -0.053), (29, 0.012), (30, -0.054), (31, 0.04), (32, 0.076), (33, 0.166), (34, 0.064), (35, 0.116), (36, 0.117), (37, 0.18), (38, -0.049), (39, -0.067), (40, -0.041), (41, -0.006), (42, 0.015), (43, 0.094), (44, 0.067), (45, -0.007), (46, -0.106), (47, 0.119), (48, -0.006), (49, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97366452 89 nips-2001-Grouping with Bias

Author: Stella X. Yu, Jianbo Shi

Abstract: With the optimization of pattern discrimination as a goal, graph partitioning approaches often lack the capability to integrate prior knowledge to guide grouping. In this paper, we consider priors from unitary generative models, partially labeled data and spatial attention. These priors are modelled as constraints in the solution space. By imposing a uniformity condition on the constraints, we restrict the feasible space to one of smooth solutions. A subspace projection method is developed to solve this constrained eigenproblem. We demonstrate that simple priors can greatly improve image segmentation results. 1

2 0.69415182 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation

Author: Heiko Wersing

Abstract: We present a new approach to the supervised learning of lateral interactions for the competitive layer model (CLM) dynamic feature binding architecture. The method is based on consistency conditions, which were recently shown to characterize the attractor states of this linear threshold recurrent network. For a given set of training examples the learning problem is formulated as a convex quadratic optimization problem in the lateral interaction weights. An efficient dimension reduction of the learning problem can be achieved by using a linear superposition of basis interactions. We show the successful application of the method to a medical image segmentation problem of fluorescence microscope cell images.

3 0.57875532 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

Author: Jens Kohlmorgen, Steven Lemm

Abstract: We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, our method processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changing number of states and an on-line variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incoming data in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream. The usefulness of the algorithm is demonstrated by an application to a switching dynamical system. 1

4 0.56664395 191 nips-2001-Transform-invariant Image Decomposition with Similarity Templates

Author: Chris Stauffer, Erik Miller, Kinh Tieu

Abstract: Recent work has shown impressive transform-invariant modeling and clustering for sets of images of objects with similar appearance. We seek to expand these capabilities to sets of images of an object class that show considerable variation across individual instances (e.g. pedestrian images) using a representation based on pixel-wise similarities, similarity templates. Because of its invariance to the colors of particular components of an object, this representation enables detection of instances of an object class and enables alignment of those instances. Further, this model implicitly represents the regions of color regularity in the class-specific image set enabling a decomposition of that object class into component regions. 1

5 0.48935464 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering

Author: Mikhail Belkin, Partha Niyogi

Abstract: Drawing on the correspondence between the graph Laplacian, the Laplace-Beltrami operator on a manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for constructing a representation for data sampled from a low dimensional manifold embedded in a higher dimensional space. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering. Several applications are considered. In many areas of artificial intelligence, information retrieval and data mining, one is often confronted with intrinsically low dimensional data lying in a very high dimensional space. For example, gray scale n x n images of a fixed object taken with a moving camera yield data points in R^{n^2}. However, the intrinsic dimensionality of the space of all images of the same object is the number of degrees of freedom of the camera - in fact the space has the natural structure of a manifold embedded in R^{n^2}. While there is a large body of work on dimensionality reduction in general, most existing approaches do not explicitly take into account the structure of the manifold on which the data may possibly reside. Recently, there has been some interest (Tenenbaum et al, 2000; Roweis and Saul, 2000) in the problem of developing low dimensional representations of data in this particular context. In this paper, we present a new algorithm and an accompanying framework of analysis for geometrically motivated dimensionality reduction. The core algorithm is very simple, has a few local computations and one sparse eigenvalue problem. The solution reflects the intrinsic geometric structure of the manifold. The justification comes from the role of the Laplacian operator in providing an optimal embedding. The Laplacian of the graph obtained from the data points may be viewed as an approximation to the Laplace-Beltrami operator defined on the manifold. The embedding maps for the data come from approximations to a natural map that is defined on the entire manifold. The framework of analysis presented here makes this connection explicit. While this connection is known to geometers and specialists in spectral graph theory (for example, see [1, 2]), to the best of our knowledge we do not know of any application to data representation yet. The connection of the Laplacian to the heat kernel enables us to choose the weights of the graph in a principled manner. The locality preserving character of the Laplacian Eigenmap algorithm makes it relatively insensitive to outliers and noise. A byproduct of this is that the algorithm implicitly emphasizes the natural clusters in the data. Connections to spectral clustering algorithms developed in learning and computer vision (see Shi and Malik, 1997) become very clear. Following the discussion of Roweis and Saul (2000), and Tenenbaum et al (2000), we note that the biological perceptual apparatus is confronted with high dimensional stimuli from which it must recover low dimensional structure. One might argue that if the approach to recovering such low-dimensional structure is inherently local, then a natural clustering will emerge and thus might serve as the basis for the development of categories in biological perception. 1 The Algorithm Given k points x_1, ..., x_k in R^l, we construct a weighted graph with k nodes, one for each point, and the set of edges connecting neighboring points to each other. 1. Step 1.
[Constructing the Graph] We put an edge between nodes i and j if x_i and x_j are

6 0.38634589 182 nips-2001-The Fidelity of Local Ordinal Encoding

7 0.38542765 144 nips-2001-Partially labeled classification with Markov random walks

8 0.38243952 193 nips-2001-Unsupervised Learning of Human Motion Models

9 0.37871104 93 nips-2001-Incremental A*

10 0.37680355 14 nips-2001-A Neural Oscillator Model of Auditory Selective Attention

11 0.34996909 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding

12 0.34888357 64 nips-2001-EM-DD: An Improved Multiple-Instance Learning Technique

13 0.34609956 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

14 0.32838535 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

15 0.31094214 169 nips-2001-Small-World Phenomena and the Dynamics of Information

16 0.29260543 54 nips-2001-Contextual Modulation of Target Saliency

17 0.29013115 180 nips-2001-The Concave-Convex Procedure (CCCP)

18 0.28902236 118 nips-2001-Matching Free Trees with Replicator Equations

19 0.2853938 171 nips-2001-Spectral Relaxation for K-means Clustering

20 0.27020124 34 nips-2001-Analog Soft-Pattern-Matching Classifier using Floating-Gate MOS Technology


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.031), (17, 0.059), (19, 0.039), (27, 0.138), (30, 0.084), (38, 0.035), (50, 0.148), (59, 0.029), (69, 0.044), (72, 0.042), (79, 0.057), (83, 0.015), (88, 0.01), (91, 0.175)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94126958 91 nips-2001-Improvisation and Learning

Author: Judy A. Franklin

Abstract: This article presents a 2-phase computational learning model and application. As a demonstration, a system has been built, called CHIME for Computer Human Interacting Musical Entity. In phase 1 of training, recurrent back-propagation trains the machine to reproduce 3 jazz melodies. The recurrent network is expanded and is further trained in phase 2 with a reinforcement learning algorithm and a critique produced by a set of basic rules for jazz improvisation. After each phase CHIME can interactively improvise with a human in real time. 1 Foundations Jazz improvisation is the creation of a jazz melody in real time. Charlie Parker, Dizzy Gillespie, Miles Davis, John Coltrane, Charles Mingus, Thelonious Monk, and Sonny Rollins et al. were the founders of bebop and post bop jazz [9] where drummers, bassists, and pianists keep the beat and maintain harmonic structure. Other players improvise over this structure and even take turns improvising for 4 bars at a time. This is called trading fours. Meanwhile, artificial neural networks have been used in computer music [4, 12]. In particular, the work of (Todd [11]) is the basis for phase 1 of CHIME, a novice machine improvisor that learns to trade fours. Firstly, a recurrent network is trained with back-propagation to play three jazz melodies by Sonny Rollins [1], as described in Section 2. Phase 2 uses actor-critic reinforcement learning and is described in Section 3. This section is on jazz basics. 1.1 Basics: Chords, the ii-V-I Chord Progression and Scales The harmonic structure mentioned above is a series of chords that may be reprated and that are often grouped into standard subsequences. A chord is a group of notes played simultaneously. In the chromatic scale, C-Db-D-Eb-E-F-Gb-G-Ab-A-Bb-B-C, notes are separated by a half step. A flat (b) note is a half step below the original note; a sharp (#) is a half above. Two half steps are a whole step. Two whole steps are a major third. Three half steps are a minor third. A major triad (chord) is the first or tonic note, then the note a major third up, then the note a minor third up. When F is the tonic, F major triad is F-A-C. A minor triad (chord) is the tonic ¡ www.cs.smith.edu/˜jfrankli then a minor third, then a major third. F minor triad is F-Ab-C. The diminished triad is the tonic, then a minor third, then a minor third. F diminished triad is F-Ab-Cb. An augmented triad is the tonic, then a major third, then a major third. The F augmented triad is F-A-Db. A third added to the top of a triad forms a seventh chord. A major triad plus a major third is the major seventh chord. F-A-C-E is the F major seventh chord (Fmaj7). A minor triad plus a minor third is a minor seventh chord. For F it is F-Ab-C-Eb (Fm7). A major triad plus a minor third is a dominant seventh chord. For F it is F-A-C-Eb (F7). These three types of chords are used heavily in jazz harmony. Notice that each note in the chromatic scales can be the tonic note for any of these types of chords. A scale, a subset of the chromatic scale, is characterized by note intervals. Let W be a whole step and H be a half. The chromatic scale is HHHHHHHHHHHH. The major scale or ionian mode is WWHWWWH. F major scale is F-G-A-Bb-C-D-E-F. The notes in a scale are degrees; E is the seventh degree of F major. The first, third, fifth, and seventh notes of a major scale are the major seventh chord. The first, third, fifth, and seventh notes of other modes produce the minor seventh and dominant seventh chords. 
Roman numerals represent scale degrees and their seventh chords. Upper case implies major or dominant seventh and lower case implies minor seventh [9]. The major seventh chord starting at the scale tonic is the I (one) chord. G is the second degree of F major, and G-Bb-D-F is Gm7, the ii chord, with respect to F. The ii-V-I progression is prevalent in jazz [9], and for F it is Gm7-C7-Fmaj7. The minor ii-V-i progression is obtained using diminished and augmented triads, their seventh chords, and the aeolian mode. Seventh chords can be extended by adding major or minor thirds, e.g. Fmaj9, Fmaj11, Fmaj13, Gm9, Gm11, and Gm13. Any extension can be raised or lowered by 1 step [9] to obtain, e.g. Fmaj7#11, C7#9, C7b9, C7#11. Most jazz compositions are either the 12 bar blues or sectional forms (e.g. ABAB, ABAC, or AABA) [8]. The 3 Rollins songs are 12 bar blues. “Blue 7” has a simple blues form. In “Solid” and “Tenor Madness”, Rollins adds bebop variations to the blues form [1]. ii-V-I and VI-II-V-I progressions are added and G7+9 substitutes for the VI and F7+9 for the V (see section 1.2 below); the II-V in the last bar provides the turnaround to the I of the first bar to foster smooth repetition of the form. The result is at left and in Roman numeral notation Bb7 Bb7 Bb7 Bb7 I I I I Eb7 Eb7 Bb7 G7+9 IV IV I VI at right: Cm7 F7 Bb7 G7+9 C7 F7+9 ii V I VI II V 1.2 Scale Substitutions and Rules for Reinforcement Learning First note that the theory and rules derived in this subsection are used in Phase 2, to be described in Section 3. They are presented here since they derive from the jazz basics immediately preceding. One way a novice improvisor can play is to associate one scale with each chord and choose notes from that scale when the chord is presented in the musical score. Therefore, Rule 1 is that an improvisor may choose notes from a “standard” scale associated with a chord. Next, the 4th degree of the scale is often avoided on a major or dominant seventh chord (Rule 3), unless the player can resolve its dissonance. The major 7th is an avoid note on a dominant seventh chord (Rule 4) since a dominant seventh chord and its scale contain the flat 7th, not the major 7th. Rule 2 contains many notes that can be added. A brief rationale is given next. The C7 in Gm7-C7-Fmaj7 may be replaced by a C7#11, a C7+ chord, or a C7b9b5 or C7alt chord [9]. The scales for C7+ and C7#11 make available the raised fourth (flat 5), and flat 6 (flat 13) for improvising. The C7b9b5 and C7alt (C7+9) chords and their scales make available the flat9, raised 9, flat5 and raised 5 [1]. These substitutions provide the notes of Rule 2. These rules (used in phase 2) are stated below, using for reinforcement values very bad (-1.0), bad (-0.5), a little bad (-0.25), ok (0.25), good (0.5), and very good (1.0). The rules are discussed further in Section 4. The Rule Set: 1) Any note in the scale associated with the chord is ok (except as noted in rule 3). 2) On a dominant seventh, hip notes 9, flat9, #9, #11, 13 or flat13 are very good. One hip note 2 times in a row is a little bad. 2 hip notes more than 2 times in a row is a little bad. 3) If the chord is a dominant seventh chord, a natural 4th note is bad. 4) If the chord is a dominant seventh chord, a natural 7th is very bad. 5) A rest is good unless it is held for more than 2 16th notes and then it is very bad. 6) Any note played longer than 1 beat (4 16th notes) is very bad. 7) If two consecutive notes match the human’s, that is good. 
2 CHIME Phase 1 In Phase 1, supervised learning is used to train a recurrent network to reproduce the three Sonny Rollins melodies. 2.1 Network Details and Training The recurrent network’s output units are linear. The hidden units are nonlinear (logistic function). Todd [11] used a Jordan recurrent network [6] for classical melody learning and generation. In CHIME, a Jordan net is also used, with the addition of the chord as input (Figure 1. 24 of the 26 outputs are notes (2 chromatic octaves), the 25th is a rest, and the 26th indicates a new note. The output with the highest value above a threshold is the next note, including the rest output. The new note output indicates if this is a new note, or if it is the same note being held for another time step ( note resolution). ¥£ ¡ ¦¤¢  The 12 chord inputs (12 notes in a chromatic scale), are 1 or 0. A chord is represented as its first, third, fifth, and seventh notes and it “wraps around” within the 12 inputs. E.g., the Fm7 chord F-Ab-C-Eb is represented as C, Eb, F, Ab or 100101001000. One plan input per song enables distinguishing between songs. The 26 context inputs use eligibility traces, giving the hidden units a decaying history of notes played. CHIME (as did Todd) uses teacher forcing [13], wherein the target outputs for the previous step are used as inputs (so erroneous outputs are not used as inputs). Todd used from 8 to 15 hidden units; CHIME uses 50. The learning rate is 0.075 (Todd used 0.05). The eligibility rate is 0.9 (Todd used 0.8). Differences in values perhaps reflect contrasting styles of the songs and available computing power. Todd used 15 output units and assumed a rest when all note units are “turned off.” CHIME uses 24 output note units (2 octaves). Long rests in the Rollins tunes require a dedicated output unit for a rest. Without it, the note outputs learned to turn off all the time. Below are results of four representative experiments. In all experiments, 15,000 presentations of the songs were made. Each song has 192 16th note events. All songs are played at a fixed tempo. Weights are initialized to small random values. The squared error is the average squared error over one complete presentation of the song. “Finessing” the network may improve these values. The songs are easily recognized however, and an exact match could impair the network’s ability to improvise. Figure 2 shows the results for “Solid.” Experiment 1. Song: Blue Seven. Squared error starts at 185, decreases to 2.67. Experiment 2. Song: Tenor Madness. Squared error starts at 218, decreases to 1.05. Experiment 3. Song: Solid. Squared error starts at 184, decreases to 3.77. Experiment 4. Song: All three songs: Squared error starts at 185, decreases to 36. Figure 1: Jordan recurrent net with addition of chord input 2.2 Phase 1 Human Computer Interaction in Real Time In trading fours with the trained network, human note events are brought in via the MIDI interface [7]. Four bars of human notes are recorded then given, one note event at a time to the context inputs (replacing the recurrent inputs). The plan inputs are all 1. The chord inputs follow the “Solid” form. The machine generates its four bars and they are played in real time. Then the human plays again, etc. An accompaniment (drums, bass, and piano), produced by Band-in-a-Box software (PG Music), keeps the beat and provides chords for the human. Figure 3 shows an interaction. The machine’s improvisations are in the second and fourth lines. In bar 5 the flat 9 of the Eb7 appears; the E. 
This note is used on the Eb7 and Bb7 chords by Rollins in “Blue 7”, as a “passing tone.” D is played in bar 5 on the Eb7. D is the natural 7 over Eb7 (with its flat 7) but is a note that Rollins uses heavily in all three songs, and once over the Eb7. It may be a response to the rest and the Bb played by the human in bar 1. D follows both a rest and a Bb in many places in “Tenor Madness” and “Solid.” In bar 6, the long G and the Ab (the third then fourth of Eb7) figure prominently in “Solid.” At the beginning of bar 7 is the 2-note sequence Ab-E that appears in exactly the same place in the song “Blue 7.” The focus of bars 7 and 8 is jumping between the 3rd and 4th of Bb7. At the end of bar 8 the machine plays the flat 9 (Ab) then the flat 3 (Bb), of G7+9. In bars 13-16 the tones are longer, as are the human’s in bars 9-12. The tones are the 5th, the root, the 3rd, the root, the flat 7, the 3rd, the 7th, and the raised fourth. Except for the last 2, these are chord tones. 3 CHIME Phase 2 In Phase 2, the network is expanded and trained by reinforcement learning to improvise according to the rules of Section 1.2 and using its knowledge of the Sonny Rollins songs. 3.1 The Expanded Network Figure 4 shows the phase 2 network. The same inputs plus 26 human inputs brings the total to 68. The weights obtained in phase 1 initialize this network. The plan and chord weights Figure 2: At left “Solid” played by a human; at right the song reproduced by the ANN. are the same. The weights connecting context units to the hidden layer are halved. The same weights, halved, connect the 26 human inputs to the hidden layer. Each output unit gets the 100 hidden units’ outputs as input. The original 50 weights are halved and used as initial values of the two sets of 50 hidden unit weights to the output unit. 3.2 SSR and Critic Algorithms Using actor-critic reinforcement learning ([2, 10, 13]), the actor chooses the next note to play. The critic receives a “raw” reinforcement signal from the critique made by the . A rules of Section 1.2. For output j, the SSR (actor) computes mean Gaussian distribution with mean and standard deviation chooses the output . is generated, the critic modifies and produces . is further modified by a self-scaling algorithm that tracks, via moving average, the maximum and minimum reinforcement and uses them to scale the signal to produce .

same-paper 2 0.93678784 89 nips-2001-Grouping with Bias

Author: Stella X. Yu, Jianbo Shi

Abstract: With the optimization of pattern discrimination as a goal, graph partitioning approaches often lack the capability to integrate prior knowledge to guide grouping. In this paper, we consider priors from unitary generative models, partially labeled data and spatial attention. These priors are modelled as constraints in the solution space. By imposing a uniformity condition on the constraints, we restrict the feasible space to one of smooth solutions. A subspace projection method is developed to solve this constrained eigenproblem. We demonstrate that simple priors can greatly improve image segmentation results. 1

3 0.83865613 13 nips-2001-A Natural Policy Gradient

Author: Sham M. Kakade

Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1

4 0.83199376 86 nips-2001-Grammatical Bigrams

Author: Mark A. Paskin

Abstract: Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EM training algorithm. The model is based upon grammatical bigrams, i.e. , syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacy of the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text. 1

5 0.82756078 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

Author: Jens Kohlmorgen, Steven Lemm

Abstract: We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, our method processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changing number of states and an on-line variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incoming data in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream. The usefulness of the algorithm is demonstrated by an application to a switching dynamical system. 1

6 0.82649672 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

7 0.82257271 157 nips-2001-Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

8 0.81928849 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's

9 0.81927967 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

10 0.81927168 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments

11 0.81872785 161 nips-2001-Reinforcement Learning with Long Short-Term Memory

12 0.81817663 169 nips-2001-Small-World Phenomena and the Dynamics of Information

13 0.81771684 121 nips-2001-Model-Free Least-Squares Policy Iteration

14 0.81766748 27 nips-2001-Activity Driven Adaptive Stochastic Resonance

15 0.816962 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

16 0.81597495 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks

17 0.81309474 57 nips-2001-Correlation Codes in Neuronal Populations

18 0.81294489 36 nips-2001-Approximate Dynamic Programming via Linear Programming

19 0.81214499 190 nips-2001-Thin Junction Trees

20 0.8111394 55 nips-2001-Convergence of Optimistic and Incremental Q-Learning