nips nips2012 nips2012-135 knowledge-graph by maker-knowledge-mining

135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach


Source: pdf

Author: Dijun Luo, Heng Huang, Feiping Nie, Chris H. Ding

Abstract: In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semisupervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due to the cost or the unavailability of ground truth. In this paper, we propose a robust approach with convex optimization to “forge” a graph: given an input graph, we learn a graph with higher quality. Our major concern is that an ideal graph shall satisfy all the following constraints: non-negative, symmetric, low rank, and positive semidefinite. We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain globally optimal solutions with theoretical guarantees. With only one non-sensitive parameter, our method is shown by experimental results to be robust and to achieve higher accuracy in semi-supervised learning and clustering under various settings. As a preprocessing of graphs, our method has a wide range of potential applications in machine learning and data mining.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In many graph-based machine learning and data mining approaches, the quality of the graph is critical. [sent-6, score-0.163]

2 However, in real-world applications, especially in semisupervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due to the cost or the unavailability of ground truth. [sent-7, score-0.293]

3 In this paper, we propose a robust approach with convex optimization to “forge” a graph: given an input graph, we learn a graph with higher quality. [sent-8, score-0.193]

4 Our major concern is that an ideal graph shall satisfy all the following constraints: non-negative, symmetric, low rank, and positive semidefinite. [sent-9, score-0.192]

5 We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain global optimal solutions with theoretical guarantees. [sent-10, score-0.349]

6 With only one non-sensitive parameter, our method is shown by experimental results to be robust and achieve higher accuracy in semi-supervised learning and clustering under various settings. [sent-11, score-0.097]

7 1 Introduction Many machine learning algorithms use graphs as input, such as clustering [16, 14], manifold-based dimensionality reduction [2, 15], and graph-based semi-supervised learning [23, 22]. [sent-13, score-0.13]

8 In these approaches, we are particularly interested in the similarity among objects. [sent-14, score-0.175]

9 However, observed similarity graphs often contain noise which sometimes misleads the learning algorithm, especially in unsupervised and semi-supervised learning. [sent-15, score-0.269]

10 Deriving graphs with high quality becomes attractive in machine learning and data mining research. [sent-16, score-0.077]

11 A robust and stable graph learning algorithm is especially desirable in unsupervised and semisupervised learning, because of the unavailability or high cost of ground truth in real world applications. [sent-17, score-0.315]

12 In this paper, we develop a novel graph learning algorithm based on convex optimization, which leads to robust and competitive results. [sent-18, score-0.185]

13 1 Motivation and Main Problem In this section, the properties of the similarity matrix are revisited from the point of view of normalized cut clustering [19]. [sent-20, score-0.474]

14 Given a symmetric similarity matrix W ∈ Rn×n on n objects, normalized cut solves the following optimization problem [10]. [sent-21, score-0.472]

15 F⊺F = I, F ≥ 0, (2) where F = [f1, f2, · · · , fK], H = [h1, h2, · · · , hK], fk = D^{1/2}hk / ‖D^{1/2}hk‖, 1 ≤ k ≤ K, W̃ = D^{−1/2}WD^{−1/2}, D = diag(d1, d2, · · · , dn), di = Σ_{j=1}^{n} Wij, I is the identity matrix, and K is the number of groups. [sent-26, score-0.092]
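Sentence 15 enters the normalized cut problem mid-equation; a plausible reconstruction of the relaxation it refers to, assembled from the definitions it lists (an assumed form, not a verbatim quote of the paper), is

\max_{F}\ \operatorname{tr}\!\left(F^{\top}\widetilde{W}F\right) \quad \text{s.t.}\quad F^{\top}F = I,\; F \ge 0, \qquad \widetilde{W} = D^{-1/2} W D^{-1/2}. \tag{2}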

16 (2) can be further rewritten as min_F ‖W̃ − FF⊺‖²_F s.t. [sent-28, score-0.081]

17 We learn a surrogate graph which is close but not identical to W̃. [sent-35, score-0.119]

18 Note that the matrix FF⊺ has the following properties: (P1) symmetric, (P2) nonnegative, (P3) low rank, and (P4) positive semidefinite. [sent-37, score-0.122]

19 This suggests a convex graph learning problem: min_G ‖G − W̃‖² s.t. [sent-38, score-0.199]

20 the sum of the singular values [8], and c is a model parameter which controls the rank of G. [sent-42, score-0.155]

21 The constraint of G ≥ 0 is to force the similarity to be naturally non-negative. [sent-43, score-0.23]

22 By intuition, one might impose a low rank constraint of rank(G) ≤ c. [sent-44, score-0.156]

23 Following matrix completion methods [5], the trace constraint in Eq. [sent-46, score-0.156]

24 (5) is a good surrogate for the low rank constraint. [sent-47, score-0.142]
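Putting sentences 18–24 together, the learning problem of Eq. (5) appears to be the following convex program — a reconstruction from the stated constraints P1–P4 and the trace surrogate, not a verbatim quote of the paper:

\min_{G}\ \|G - \widetilde{W}\|_F^2 \quad \text{s.t.}\quad G = G^{\top},\; G \ge 0,\; G \succeq 0,\; \operatorname{tr}(G) \le c. \tag{5}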

25 For notational convenience, the normalized similarity matrix W̃ is denoted as W in the rest of the paper. [sent-48, score-0.269]

26 (5), we are actually seeking a similarity matrix G which satisfies all the properties of a perfect similarity matrix (P1–P4) and which is close to the original input matrix W. [sent-50, score-0.586]

27 (5) and to demonstrate the usefulness of its optimal solution in clustering and semi-supervised learning using both theoretical and empirical evidence. [sent-52, score-0.162]

28 2 Related Work Our method can be viewed as preprocessing for the similarity matrix, and a large number of machine learning and data mining approaches require a similarity matrix (interpreted as a weighted graph) as input. [sent-54, score-0.549]

29 For example, in unsupervised clustering, Normalized Cut [19], Ratio Cut [11], Cheeger Cut [3] have been widely applied in various real world applications. [sent-55, score-0.083]

30 Mixed Membership Block models [1] can also be interpreted as generative models on the similarity matrices among objects. [sent-58, score-0.175]

31 Thus a similarity matrix preprocessing model can be widely applied. [sent-59, score-0.281]

32 A large number of approaches have been developed to learn a similarity matrix with different emphases. [sent-60, score-0.224]

33 Local Linear Embedding (LLE ) [17, 18] and Linear Label Propagation [21] can be viewed as obtaining a similarity matrix using sparse coding. [sent-61, score-0.224]

34 Another way to perform the similarity matrix preprocessing is to take a graph as input and to obtain a refined graph by learning, such as bi-stochastic graph learning [13]. [sent-62, score-0.638]

35 First, von Neumann provided a convergence proof of the successive projection method, guaranteeing convergence to a feasible solution in convex optimization with multiple constraints; this method was employed in the paper by Liu et al. [sent-66, score-0.14]

36 In this paper, we develop a novel optimization algorithm to solve the optimization problem with multiple convex constraints (including the inequality constraints), which is guaranteed to find the global solution. [sent-68, score-0.321]

37 We also develop a useful Lemma to handle the subproblems with trace norm constraint in the main algorithm. [sent-70, score-0.179]

38 Figure 1: A toy example of low rank and positive semidefinite graph learning. [sent-77, score-0.349]

39 2 A Toy Example We first emphasize the usefulness of the positive semidefinite and low rank constraints in the problem of Eq. [sent-87, score-0.291]

40 In this toy example, we also solve the following problem for contrast: min_G ‖G − W‖²_F s.t. [sent-89, score-0.144]

41 G = G⊺ , Ge = e, G ≥ 0, (6) where e = [1, 1, · · · , 1]T and the constraints of positive semidefinite and low rank are removed from Eq. [sent-91, score-0.267]

42 (6) is used in the bi-stochastic graph learning [13]. [sent-94, score-0.119]

43 (5) and (6) for the same input G and compare the solution to see the effect of positive semidefinite and low rank constraints. [sent-96, score-0.215]

44 In the toy example, we first generate a perfect similarity matrix W in which Wij = 1 if data points i, j are in the same group and Wij = 0 otherwise. [sent-97, score-0.318]
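A minimal sketch of how such a toy input could be generated; the additive-noise corruption and all names here are illustrative assumptions, since the extracted sentences do not specify the exact noise model:

import numpy as np

def toy_similarity(group_sizes=(10, 10, 10), noise=0.2, seed=0):
    # Block-diagonal "perfect" similarity: W_ij = 1 within a group, 0 otherwise,
    # plus symmetric uniform noise as an illustrative corruption of the input.
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(group_sizes)), group_sizes)
    W_perfect = (labels[:, None] == labels[None, :]).astype(float)
    E = rng.uniform(0.0, noise, W_perfect.shape)
    W_noisy = np.clip(W_perfect + (E + E.T) / 2.0, 0.0, 1.0)
    return W_perfect, W_noisy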

45 (5) recover the perfect similarity much more accurately than Eq. [sent-104, score-0.213]

46 (6), the low rank and positive semidefinite constraints are ignored and the result deviates from the ground truth. [sent-107, score-0.29]

47 We show the eigenvalue distributions of the solution G in Figure 1(d) for both methods in Eqs. [sent-108, score-0.109]

48 (5) gives low rank and positive semidefinite results, while the solution for Eq. [sent-111, score-0.215]

49 (5) is always non-negative, symmetric, low rank, and positive semidefinite, we call our solution the Non-negative Low-rank Kernel (NLK). [sent-114, score-0.114]

50 Then we have more information to learn a better similarity matrix. [sent-118, score-0.175]

51 (5) by enforcing the similarity to be zero for those pairs of data points in different classes, i.e. [sent-120, score-0.175]

52 (7) We will demonstrate the advantage of these semi-supervision constraints in the experimental section. [sent-126, score-0.093]

53 Our strategy is to introduce two extra copies of the optimization variable, X and Y, to split the constraints into several directly solvable subproblems: min ‖G − W‖²_F, s.t. [sent-131, score-0.219]

54 1 Seeking Global Solutions: A Variant of ALM The augmented Lagrangian multiplier function of Eqs. [sent-146, score-0.133]

55 (9a)–(9d) is Φ(G, X, Y) = ‖G − W‖²_F − ⟨Λ, X − G⟩ + (µ/2)‖G − X‖²_F − ⟨Σ, Y − G⟩ + (µ/2)‖G − Y‖²_F, (10) with constraints of G ≥ 0, X ⪰ 0, and ‖Y‖∗ ≤ c, where Λ, Σ are the Lagrangian multipliers. [sent-147, score-0.093]

56 The ALM method then leads to the following updating steps: G ← arg min_{G≥0} Φ(G, X, Y) (11a); X ← arg min_{X⪰0} Φ(G, X, Y) (11b); Y ← arg min_{‖Y‖∗≤c} Φ(G, X, Y) (11c); Λ ← Λ − µ(G − X) (11d); Σ ← Σ − µ(G − Y) (11e); µ ← γµ, t ← t + 1 (11f). [sent-148, score-0.294]

57 Notice that the symmetric constraint is removed here. [sent-149, score-0.104]

58 We will later show that given a symmetric input W, the output of our algorithm automatically satisfies the symmetric constraints. [sent-150, score-0.098]

59 φi (X) ≤ ci , 1 ≤ i ≤ m, (12) where φi (X) ≤ ci is any constraint on eigenvalues of X, i = 1, 2, · · · , m and m is the number of constraints. [sent-160, score-0.175]

60 By comparing two solutions of X = VSV⊺ and Z = USU⊺ , one should notice (a) that Z satisfies all the constraints of φi (Z) = φi (X) ≤ ci , 1 ≤ i ≤ m in Eq. [sent-170, score-0.18]

61 1 shows an interesting property of the matrix approximation with eigenvalue or singular value constraints: the optimal solution matrix shares the same subspace as the input matrix. [sent-174, score-0.217]
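In other words (a paraphrase of how the lemma is used in Eqs. (19)–(20) below, with notation assumed from the surrounding fragments): if the input matrix has eigendecomposition P = V diag(d) V⊺, the constrained approximation reduces to a problem on the eigenvalues only,

\min_{s}\ \sum_{i=1}^{n}(s_i - d_i)^2 \quad \text{s.t.}\quad \phi_i(s) \le c_i, \qquad X^{*} = V\,\operatorname{diag}(s^{*})\,V^{\top}.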

62 Thus the lemma provides a powerful mathematical tool for the computation of optimization problems with eigenvalue and singular value constraints. [sent-176, score-0.16]

63 (11a) as follows: G ← arg min_{G≥0} ‖(2 + 2µ)G − (2W + µ(X + Y) + Λ + Σ)‖²_F + const = max((2W + µ(X + Y) + Λ + Σ)/(2 + 2µ), 0). [sent-186, score-0.078]

64 (11b), we need to solve the following subproblem: min_X ‖X − P‖²_F, X ⪰ 0, (18) where P = G + Λ/µ. [sent-190, score-0.088]

65 Notice that X ⪰ 0 is a constraint on the eigenvalues of X. [sent-191, score-0.123]

66 (19) can be further rewritten as min Σ_{i=1}^{n} (si − di)², s.t. [sent-199, score-0.093]

67 (20) has a simple closed form solution: si = max(di, 0), i = 1, 2, · · · , n. [sent-203, score-0.083]

68 (11c) can be rewritten as min_Y ‖Y − Q‖²_F, (22) where Q = G + Σ/µ. [sent-207, score-0.081]

69 Since we do not know the true Lagrangian multiplier λ, we cannot directly apply the singular value thresholding technique [4]. [sent-209, score-0.141]

70 We notice that Y is symmetric and the constraint of ‖Y‖∗ ≤ c becomes a constraint on the eigenvalues of Y. [sent-212, score-0.288]

71 And the solution is si = sign(di ) max(|di | − θ, 0). [sent-228, score-0.083]

72 (26) Notice that in each step of the algorithm the solution has a closed form, and that the output G is always symmetric, which indicates that the constraint of G = G⊺ is automatically satisfied in each step. [sent-229, score-0.137]
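A compact NumPy sketch of the ALM-style iteration as it can be read from sentences 56–71. The stopping rule, parameter defaults, and the ℓ1-ball routine used for the trace-norm projection (in the spirit of the Duchi et al. projection that appears in the paper's vocabulary) are assumptions, not the authors' reference implementation:

import numpy as np

def project_l1_ball(d, c):
    # Project a vector d onto {s : sum_i |s_i| <= c}; soft-thresholding with a
    # data-dependent threshold theta, so s_i = sign(d_i) * max(|d_i| - theta, 0).
    if np.sum(np.abs(d)) <= c:
        return d.copy()
    u = np.sort(np.abs(d))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(d) + 1) > (css - c))[0][-1]
    theta = (css[rho] - c) / (rho + 1.0)
    return np.sign(d) * np.maximum(np.abs(d) - theta, 0.0)

def nlk(W, c, gamma=1.05, mu=0.1, n_iter=300):
    # NLK by a variant of ALM: closed-form updates of G, X, Y (steps 11a-11c)
    # followed by multiplier and penalty updates (steps 11d-11f).
    n = W.shape[0]
    G = W.copy()
    X = np.zeros((n, n)); Y = np.zeros((n, n))
    Lam = np.zeros((n, n)); Sig = np.zeros((n, n))
    for _ in range(n_iter):
        # G-step: elementwise nonnegative closed form from the G-subproblem.
        G = np.maximum((2.0 * W + mu * (X + Y) + Lam + Sig) / (2.0 + 2.0 * mu), 0.0)
        # X-step: project P = G + Lam/mu onto the PSD cone (clip negative eigenvalues).
        P = G + Lam / mu
        P = (P + P.T) / 2.0
        d, V = np.linalg.eigh(P)
        X = (V * np.maximum(d, 0.0)) @ V.T
        # Y-step: project Q = G + Sig/mu onto {Y : ||Y||_* <= c}; for symmetric Q
        # this is an l1-ball projection of its eigenvalues.
        Q = G + Sig / mu
        Q = (Q + Q.T) / 2.0
        d, V = np.linalg.eigh(Q)
        Y = (V * project_l1_ball(d, c)) @ V.T
        # Multiplier and penalty updates.
        Lam = Lam - mu * (G - X)
        Sig = Sig - mu * (G - Y)
        mu = gamma * mu
    return G

Each inner step is a projection or an elementwise maximum, so every iteration has a closed form, matching the observation in sentence 72.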

73 The augmented Lagrangian multiplier function is Φ(G, X, Y) = ‖G − W‖²_F − ⟨Λ, X − G⟩ + (µ/2)‖X − G‖²_F − ⟨Σ, Y − G⟩ + (µ/2)‖Y − G‖²_F + (µ/2) Σ_{(i,j)∈T} G²_ij − Σ_{(i,j)∈T} Ωij Gij. (27) This is identical to Eq. [sent-234, score-0.133]

74 (10), except that we added Ω as an additional Lagrangian multiplier for the semi-supervised constraints, i.e. [sent-235, score-0.127]

75 the desired similarity Gij = 0 for (i, j) having different known class labels. [sent-237, score-0.175]

76 Here T = {(i, j) : yi ≠ yj , i, j = 1, 2, · · · , ℓ}. [sent-238, score-0.083]

77 To update G, we set ∂Φ(G, X, Y)/∂G = 0 and obtain Gij ← max((2Wij + µ(Xij + Yij) + Λij + Σij + Ωij)/(2 + 3µ), 0) if yi ≠ yj, and Gij ← max((2Wij + µ(Xij + Yij) + Λij + Σij)/(2 + 2µ), 0) otherwise. [sent-242, score-0.083]

78 (28) For the Lagrangian multiplier Ω, the corresponding update is Ωij ← Ωij − µGij , ∀ yi ≠ yj. [sent-243, score-0.202]
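The semi-supervised variant only changes the elementwise G-step and adds the Ω update; a sketch in the same NumPy style as above, where the boolean mask T_mask (marking labeled pairs with different classes) and the function names are illustrative assumptions:

import numpy as np

def g_step_semi(W, X, Y, Lam, Sig, Om, T_mask, mu):
    # Elementwise update of Eq. (28): pairs with different known labels receive
    # the extra penalty term (denominator 2 + 3*mu) and the multiplier Omega.
    num = 2.0 * W + mu * (X + Y) + Lam + Sig
    G = np.maximum(num / (2.0 + 2.0 * mu), 0.0)
    G_labeled = np.maximum((num + Om) / (2.0 + 3.0 * mu), 0.0)
    G[T_mask] = G_labeled[T_mask]
    return G

def omega_step(Om, G, T_mask, mu):
    # Multiplier update: Omega_ij <- Omega_ij - mu * G_ij for all (i, j) in T.
    Om = Om.copy()
    Om[T_mask] -= mu * G[T_mask]
    return Om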

79 Algorithm 1 NLK Algorithm For Supervised Learning and Semi-supervised Learning Require: Weighted graph W, model parameter c, optimization parameter γ, partial labels y for semi-supervised learning. [sent-247, score-0.165]

80 4 Theoretical Analysis of The Algorithm Since the objective function and all the constraints are convex, we have the following [12] Theorem 3. [sent-258, score-0.093]

81 Algorithm 1 converges to the global solution of Eq. [sent-260, score-0.075]

82 Notice that this conclusion is stronger than that in the related research papers [13] for graph learning. [sent-263, score-0.119]

83 (5)) can be used as preprocessing for any graph-based method. [sent-265, score-0.176]

84 Here we evaluated NLK on several state-of-the-art graph-based learning models, including Normalized Cut (Ncut) [19] for unsupervised learning, and Gaussian Fields and Harmonic Functions (GFHF) and Local and Global Consistency (LGC) learning for semi-supervised learning. [sent-266, score-0.214]

85 We compare the clustering results in terms of both clustering accuracy and normalized mutual information (NMI). [sent-267, score-0.239]

86 1 Experimental Settings For clustering, we compare three similarity matrices: (1) the original Gaussian kernel matrix, wij = exp(−‖xi − xj‖²/2σ²), where σ is set to the average pairwise distance among all the data points. [sent-273, score-0.249]
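A short sketch of this baseline similarity, with σ set to the mean pairwise distance as described; the SciPy helpers are assumptions, and any equivalent distance computation works:

import numpy as np
from scipy.spatial.distance import pdist, squareform

def gaussian_kernel_graph(X):
    # w_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)), sigma = average pairwise distance.
    D = squareform(pdist(X))
    sigma = D[np.triu_indices_from(D, k=1)].mean()
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))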

87 The clustering algorithm of Normalized Cut [19] is applied on the three similarity matrices. [sent-275, score-0.272]

88 Then we have three clustering approaches in total: Normalized Cut (Ncut), BBS+Ncut, and NLK+Ncut. [sent-276, score-0.097]

89 For each clustering method, we try 100 random trials for different clustering initializations. [sent-277, score-0.22]

90 We compare 4 types of similarity matrices: original Gaussian kernel matrix, as discussed before, BBS, NLK, and NLK with semi-supervised constraints (model in Eq. [sent-280, score-0.291]

91 2 Parameter Settings For all the similarity learning approaches (BBS, NLK, and NLK Semi), we set the convergence criteria as follows. [sent-286, score-0.2]

92 3 Experimental Results We show the clustering results in Table 1, where we compare both measurements (accuracy, NMI) over 3 methods on 4 data sets. [sent-341, score-0.097]

93 5 Conclusions and Discussion In this paper, we derive a similarity learning model based on convex optimization. [sent-491, score-0.203]

94 We demonstrate that the low rank and positive semidefinite constraints are natural for the similarity matrix. [sent-492, score-0.267]

95 Furthermore, we develop a new efficient algorithm to obtain the global solution with theoretical guarantees. [sent-493, score-0.113]

96 We also develop optimization techniques that are potentially useful in related optimization problems with eigenvalue or singular value constraints. [sent-494, score-0.299]

97 The presented model is verified in extensive experiments, and the results show that our method enhances the quality of the similarity matrix significantly, in both clustering and semi-supervised learning. [sent-495, score-0.321]

98 Spectral relaxation models and structure analysis for k-way graph clustering and bi-clustering. [sent-560, score-0.216]

99 New spectral methods for ratio cut partitioning and clustering. [sent-565, score-0.136]

100 A globally convergent augmented lagrangian pattern search algorithm for optimization with general constraints and simple bounds. [sent-572, score-0.32]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nlk', 0.645), ('bbs', 0.248), ('ncut', 0.242), ('usu', 0.241), ('similarity', 0.175), ('udu', 0.174), ('gij', 0.144), ('graph', 0.119), ('lagrangian', 0.11), ('cut', 0.108), ('semide', 0.106), ('rank', 0.101), ('binalpha', 0.099), ('gfhf', 0.099), ('vsv', 0.099), ('clustering', 0.097), ('constraints', 0.093), ('lgc', 0.088), ('multiplier', 0.087), ('ff', 0.081), ('ding', 0.076), ('semi', 0.072), ('nmi', 0.072), ('eigenvalues', 0.068), ('ij', 0.065), ('di', 0.064), ('vehicle', 0.064), ('notice', 0.061), ('unsupervised', 0.061), ('updating', 0.06), ('subproblems', 0.057), ('preprocessing', 0.057), ('toy', 0.056), ('yj', 0.055), ('constraint', 0.055), ('alm', 0.054), ('singular', 0.054), ('min', 0.052), ('wij', 0.051), ('diag', 0.05), ('trx', 0.05), ('unavailability', 0.05), ('symmetric', 0.049), ('matrix', 0.049), ('luo', 0.047), ('augmented', 0.046), ('optimization', 0.046), ('normalized', 0.045), ('mining', 0.044), ('nie', 0.044), ('si', 0.042), ('solution', 0.041), ('low', 0.041), ('heng', 0.04), ('semisupervised', 0.04), ('harmonic', 0.039), ('develop', 0.038), ('perfect', 0.038), ('lemma', 0.036), ('neumann', 0.036), ('solve', 0.036), ('green', 0.034), ('hk', 0.034), ('eigenvector', 0.034), ('global', 0.034), ('gt', 0.033), ('graphs', 0.033), ('positive', 0.032), ('yij', 0.031), ('trace', 0.029), ('candes', 0.029), ('rewritten', 0.029), ('sorting', 0.028), ('seeking', 0.028), ('split', 0.028), ('spectral', 0.028), ('yi', 0.028), ('convex', 0.028), ('ge', 0.027), ('ci', 0.026), ('arg', 0.026), ('try', 0.026), ('von', 0.025), ('acm', 0.025), ('convergent', 0.025), ('dn', 0.025), ('usefulness', 0.024), ('duchi', 0.024), ('eigenvalue', 0.024), ('xij', 0.024), ('fields', 0.024), ('fk', 0.024), ('fortunately', 0.023), ('segment', 0.023), ('ground', 0.023), ('completion', 0.023), ('relational', 0.023), ('original', 0.023), ('world', 0.022), ('membership', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

Author: Dijun Luo, Heng Huang, Feiping Nie, Chris H. Ding

Abstract: In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semisupervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due to the cost or the unavailability of ground truth. In this paper, we propose a robust approach with convex optimization to “forge” a graph: given an input graph, we learn a graph with higher quality. Our major concern is that an ideal graph shall satisfy all the following constraints: non-negative, symmetric, low rank, and positive semidefinite. We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain globally optimal solutions with theoretical guarantees. With only one non-sensitive parameter, our method is shown by experimental results to be robust and to achieve higher accuracy in semi-supervised learning and clustering under various settings. As a preprocessing of graphs, our method has a wide range of potential applications in machine learning and data mining.

2 0.1160363 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

3 0.10596167 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

Author: Jinfeng Yi, Rong Jin, Shaili Jain, Tianbao Yang, Anil K. Jain

Abstract: One of the main challenges in data clustering is to define an appropriate similarity measure between two objects. Crowdclustering addresses this challenge by defining the pairwise similarity based on the manual annotations obtained through crowdsourcing. Despite its encouraging results, a key limitation of crowdclustering is that it can only cluster objects when their manual annotations are available. To address this limitation, we propose a new approach for clustering, called semi-crowdsourced clustering that effectively combines the low-level features of objects with the manual annotations of a subset of the objects obtained via crowdsourcing. The key idea is to learn an appropriate similarity measure, based on the low-level features of objects and from the manual annotations of only a small portion of the data to be clustered. One difficulty in learning the pairwise similarity measure is that there is a significant amount of noise and inter-worker variations in the manual annotations obtained via crowdsourcing. We address this difficulty by developing a metric learning algorithm based on the matrix completion method. Our empirical study with two real-world image data sets shows that the proposed algorithm outperforms state-of-the-art distance metric learning algorithms in both clustering accuracy and computational efficiency. 1

4 0.091395482 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

5 0.08109495 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

Author: Cho-jui Hsieh, Arindam Banerjee, Inderjit S. Dhillon, Pradeep K. Ravikumar

Abstract: We consider the composite log-determinant optimization problem, arising from the 1 regularized Gaussian maximum likelihood estimator of a sparse inverse covariance matrix, in a high-dimensional setting with a very large number of variables. Recent work has shown this estimator to have strong statistical guarantees in recovering the true structure of the sparse inverse covariance matrix, or alternatively the underlying graph structure of the corresponding Gaussian Markov Random Field, even in very high-dimensional regimes with a limited number of samples. In this paper, we are concerned with the computational cost in solving the above optimization problem. Our proposed algorithm partitions the problem into smaller sub-problems, and uses the solutions of the sub-problems to build a good approximation for the original problem. Our key idea for the divide step to obtain a sub-problem partition is as follows: we first derive a tractable bound on the quality of the approximate solution obtained from solving the corresponding sub-divided problems. Based on this bound, we propose a clustering algorithm that attempts to minimize this bound, in order to find effective partitions of the variables. For the conquer step, we use the approximate solution, i.e., solution resulting from solving the sub-problems, as an initial point to solve the original problem, and thereby achieve a much faster computational procedure. 1

6 0.078079261 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

7 0.077869035 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

8 0.072564788 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

9 0.068962276 330 nips-2012-Supervised Learning with Similarity Functions

10 0.061187658 247 nips-2012-Nonparametric Reduced Rank Regression

11 0.060109045 180 nips-2012-Learning Mixtures of Tree Graphical Models

12 0.05625125 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

13 0.056129076 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

14 0.055312052 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

15 0.054614302 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

16 0.052957989 291 nips-2012-Reducing statistical time-series problems to binary classification

17 0.051159672 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

18 0.050953548 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

19 0.050907481 338 nips-2012-The Perturbed Variation

20 0.050651267 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.157), (1, 0.052), (2, 0.039), (3, -0.091), (4, 0.031), (5, -0.01), (6, -0.018), (7, -0.019), (8, -0.015), (9, 0.056), (10, 0.037), (11, -0.109), (12, -0.061), (13, 0.016), (14, 0.064), (15, 0.045), (16, 0.019), (17, 0.052), (18, -0.004), (19, -0.048), (20, -0.067), (21, -0.051), (22, -0.03), (23, 0.011), (24, 0.006), (25, 0.043), (26, 0.014), (27, -0.002), (28, -0.019), (29, -0.01), (30, -0.007), (31, -0.006), (32, 0.018), (33, 0.041), (34, -0.002), (35, -0.055), (36, -0.031), (37, 0.037), (38, 0.051), (39, 0.076), (40, -0.057), (41, -0.052), (42, 0.093), (43, 0.014), (44, 0.079), (45, 0.075), (46, 0.021), (47, 0.005), (48, -0.053), (49, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92742813 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

Author: Dijun Luo, Heng Huang, Feiping Nie, Chris H. Ding

Abstract: In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semisupervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due to the cost or the unavailability of ground truth. In this paper, we propose a robust approach with convex optimization to “forge” a graph: given an input graph, we learn a graph with higher quality. Our major concern is that an ideal graph shall satisfy all the following constraints: non-negative, symmetric, low rank, and positive semidefinite. We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain globally optimal solutions with theoretical guarantees. With only one non-sensitive parameter, our method is shown by experimental results to be robust and to achieve higher accuracy in semi-supervised learning and clustering under various settings. As a preprocessing of graphs, our method has a wide range of potential applications in machine learning and data mining.

2 0.80559689 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

3 0.72550231 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

Author: Ehsan Elhamifar, Guillermo Sapiro, René Vidal

Abstract: Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data. 1

4 0.69393849 125 nips-2012-Factoring nonnegative matrices with linear programs

Author: Ben Recht, Christopher Re, Joel Tropp, Victor Bittorf

Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes. 1

5 0.69180739 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

6 0.64698112 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

7 0.64663154 196 nips-2012-Learning with Partially Absorbing Random Walks

8 0.6214208 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

9 0.59710771 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

10 0.59422278 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

11 0.58424962 309 nips-2012-Semi-supervised Eigenvectors for Locally-biased Learning

12 0.58008564 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

13 0.57725936 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

14 0.57341665 338 nips-2012-The Perturbed Variation

15 0.57054979 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

16 0.56967998 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

17 0.56908417 157 nips-2012-Identification of Recurrent Patterns in the Activation of Brain Networks

18 0.55653059 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

19 0.54051948 22 nips-2012-A latent factor model for highly multi-relational data

20 0.53752148 128 nips-2012-Fast Resampling Weighted v-Statistics


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.068), (11, 0.011), (17, 0.294), (38, 0.172), (39, 0.02), (42, 0.017), (54, 0.032), (55, 0.013), (74, 0.049), (76, 0.113), (80, 0.084), (92, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96537971 224 nips-2012-Multi-scale Hyper-time Hardware Emulation of Human Motor Nervous System Based on Spiking Neurons using FPGA

Author: C. M. Niu, Sirish Nandyala, Won J. Sohn, Terence Sanger

Abstract: Our central goal is to quantify the long-term progression of pediatric neurological diseases, such as a typical 10-15 years progression of child dystonia. To this purpose, quantitative models are convincing only if they can provide multi-scale details ranging from neuron spikes to limb biomechanics. The models also need to be evaluated in hyper-time, i.e. significantly faster than real-time, for producing useful predictions. We designed a platform with digital VLSI hardware for multiscale hyper-time emulations of human motor nervous systems. The platform is constructed on a scalable, distributed array of Field Programmable Gate Array (FPGA) devices. All devices operate asynchronously with 1 millisecond time granularity, and the overall system is accelerated to 365x real-time. Each physiological component is implemented using models from well documented studies and can be flexibly modified. Thus the validity of emulation can be easily advised by neurophysiologists and clinicians. For maximizing the speed of emulation, all calculations are implemented in combinational logic instead of clocked iterative circuits. This paper presents the methodology of building FPGA modules emulating a monosynaptic spinal loop. Emulated activities are qualitatively similar to real human data. Also discussed is the rationale of approximating neural circuitry by organizing neurons with sparse interconnections. In conclusion, our platform allows emulating pathological abnormalities such that motor symptoms will emerge and can be analyzed. It compels us to test the origins of childhood motor disorders and predict their long-term progressions. 1 Challenges of studying developmental motor disorders There is currently no quantitative model of how a neuropathological condition, which mainly affects the function of neurons, ends up causing the functional abnormalities identified in clinical examinations. The gap in knowledge is particularly evident for disorders in developing human nervous systems, i.e. childhood neurological diseases. In these cases, the ultimate clinical effect of cellu1 lar injury is compounded by a complex interplay among the child’s injury, development, behavior, experience, plasticity, etc. Qualitative insight has been provided by clinical experiences into the association between particular types of injury and particular types of outcome. Their quantitative linkages, nevertheless, have yet to be created – neither in clinic nor in cellular physiological tests. This discrepancy is significantly more prominent for individual child patients, which makes it very difficult to estimate the efficacy of treatment plans. In order to understand the consequence of injury and discover new treatments, it is necessary to create a modeling toolset with certain design guidelines, such that child neurological diseases can be quantitatively analyzed. Perhaps more than any other organ, the brain necessarily operates on multiple spatial and temporal scales. On the one hand, it is the neurons that perform fundamental computations, but neurons have to interact with large-scale organs (ears, eyes, skeletal muscles, etc.) to achieve global functions. This multi-scale nature worths more attention in injuries, where the overall deficits depend on both the cellular effects of injuries and the propagated consequences. On the other hand, neural processes in developmental diseases usually operate on drastically different time scales, e.g. spinal reflex in milliseconds versus learning in years. 
Thus when studying motor nervous systems, mathematical modeling is convincing only if it can provide multi-scale details, ranging from neuron spikes to limb biomechanics; also the models should be evaluated with time granularity as small as 1 millisecond, meanwhile the evaluation needs to continue trillions of cycles in order to cover years of life. It is particularly challenging to describe the multi-scale nature of human nervous system when modeling childhood movement disorders. Note that for a child who suffered brain injury at birth, the full development of all motor symptoms may easily take more than 10 years. Therefore the millisecondbased model needs to be evaluated significantly faster than real-time, otherwise the model will fail to produce any useful predictions in time. We have implemented realistic models for spiking motoneurons, sensory neurons, neural circuitry, muscle fibers and proprioceptors using VLSI and programmable logic technologies. All models are computed in Field Programmable Gate Array (FPGA) hardware in 365 times real-time. Therefore one year of disease progression can be assessed after one day of emulation. This paper presents the methodology of building the emulation platform. The results demonstrate that our platform is capable of producing physiologically realistic multi-scale signals, which are usually scarce in experiments. Successful emulations enabled by this platform will be used to verify theories of neuropathology. New treatment mechanisms and drug effects can also be emulated before animal experiments or clinical trials. 2 Methodology of multi-scale neural emulation A. Human arm B. Monosynaptic spinal loop C. Inner structure of muscle spindle Gamma Secondary dynamic Gamma output input static Primary input output Bag 1 αMN Bag 2 Chain Figure 1: Illustration of the multi-scale nature of motor nervous system. The motor part of human nervous system is responsible for maintaining body postures and generating voluntary movements. The multi-scale nature of motor nervous system is demonstrated in Fig.1. When the elbow (Fig.1A) is maintaining a posture or performing a movement, a force is established by the involved muscle based on how much spiking excitation the muscle receives from its αmotoneurons (Fig.1B). The α-motoneurons are regulated by a variety of sensory input, part of which comes directly from the proprioceptors in the muscle. As the primary proprioceptor found in skeletal muscles, a muscle spindle is another complex system that has its own microscopic Multiple-InputMultiple-Output structure (Fig.1C). Spindles continuously provide information about the length and lengthening speed of the muscle fiber. A muscle with its regulating motoneurons, sensory neurons and proprioceptors constitutes a monosynaptic spinal loop. This minimalist neurophysiological 2 structure is used as an example for explaining the multi-scale hyper-time emulation in hardware. Additional structures can be added to the backbone set-up using similar methodologies. 2.1 Modularized architecture for multi-scale models Decades of studies on neurophysiology provided an abundance of models characterizing different components of the human motor nervous system. The informational characteristics of physiological components allowed us to model them as functional structures, i.e. each of which converting input signals to certain outputs. 
In particular, within a monosynaptic spinal loop illustrated in Fig.1B, stretching the muscle will elicit a chain of physiological activities in: muscle stretch ⇒ spindle ⇒ sensory neuron ⇒ synapse ⇒ motoneuron ⇒ muscle contraction. The adjacent components must have compatible interfaces, and the interfacing variables must also be physiologically realistic. In our design, each component is mathematically described in Table 1: Table 1: Functional definition of neural models COMPONENT Neuron Synapse Muscle Spindle MATHEMATICAL DEFINITION S(t) = fneuron (I, t) I(t) = fsynapse (S, t) ˙ T (t) = fmuscle (S, L, L, t) ˙ Γdynamic , Γstatic , t) A(t) = fspindle (L, L, all components are modeled as black-box functions that map the inputs to the outputs. The meanings of these mathematical definitions are explained below. This design allows existing physiological models to be easily inserted and switched. In all models the input signals are time-varying, e.g. I = I(t), L = L(t) , etc. The argument of t in input signals are omitted throughout this paper. 2.2 Selection of models for emulation Models were selected in consideration of their computational cost, physiological verisimilitude, and whether it can be adapted to the mathematical form defined in Table 1. Model of Neuron The informational process for a neuron is to take post-synaptic current I as the input, and produce a binary spike train S in the output. The neuron model adopted in the emulation was developed by Izhikevich [1]: = 0.04v 2 + 5v + 140 − u + I = a(bv − u) v u (1) (2) if v = 30 mV, then v ← c, u ← u + d where a, b, c, d are free parameters tuned to achieve certain firing patterns. Membrane potential v directly determines a binary spike train S(t) that S(t) = 1 if v ≥ 30, otherwise S(t) = 0. Note that v in Izhikevich model is in millivolts and time t is in milliseconds. Therefore the coefficients in eq.1 need to be adjusted in correspondence to SI units. Model of Synapse When a pre-synaptic neuron spikes, i.e. S(0) = 1, an excitatory synapse subsequently issues an Excitatory Post-Synaptic Current (EPSC) that drives the post-synaptic neuron. Neural recording of hair cells in rats [2] provided evidence that the time profile of EPSC can be well characterized using the equations below: I(t) = Vm × e t d Vm −τ 0 t − e− τr Vm if t ≥ 0 (3) otherwise The key parameters in a synapse model is the time constants for rising (τr ) and decaying (τd ). In our emulation τr = 0.001 s and τr = 0.003 s. 3 Model of Muscle force and electromyograph (EMG) The primary effect of skeletal muscle is to convert α-motoneuron spikes S into force T , depending ˙ on the muscle’s instantaneous length L and lengthening speed L. We used Hill’s muscle model in the emulation with parameter tuning described in [3]. Another measurable output of muscle is electromyograph (EMG). EMG is the small skin current polarized by motor unit action potential (MUAP) when it travels along muscle fibers. Models exist to describe the typical waveform picked by surface EMG electrodes. In this project we chose to implement the one described in [4]. Model of Proprioceptor Spindle is a sensory organ that provides the main source of proprioceptive information. As can be seen in Fig.1C, a spindle typically produces two afferent outputs (primary Ia and secondary II) ˙ according to its gamma fusimotor drives (Γdynamic and Γstatic ) and muscle states (L and L). There is currently no closed-form models describing spindle functions due to spindle’s significant nonlinearity. 
On representative model that numerically approximates the spindle dynamics was developed by Mileusnic et al. [5]. The model used differential equations to characterize a typical cat soleus spindle. Eqs.4-10 present a subset of this model for one type of spindle fiber (bag1): Γdynamic − x0 /τ Γdynamic + Ω2 bag1 x0 ˙ = x1 ˙ = x2 1 = [TSR − TB − TP R − Γ1 x0 ] M x2 ˙ (4) (5) (6) where TSR TB TP R CSS = KSR (L − x1 − LSR0 ) (7) 0.3 = (B0 + B1 x0 ) · (x1 − R) · CSS · |x2 | = KP R (x1 − LP R0 ) 2 = −1 −1000x2 1+e (8) (9) (10) Eq.8 and 10 suggest that evaluating the spindle model requires multiplication, division as well as more complex arithmetics like polynomials and exponentials. The implementation details are described in Section 3. 2.3 Neuron connectivity with sparse interconnections Although the number of spinal neurons (~1 billion) is significantly less compared to that of cortical neurons (~100 billion), a fully connected spinal network still means approximately 2 trillion synaptic endings [6]. Implementing such a huge number of synapses imposes a major challenge, if not impossible, given limited hardware resource. In this platform we approximated the neural connectivity by sparsely connecting sensory neurons to motoneurons as parallel pathways. We do not attempt to introduce the full connectivity. The rationale is that in a neural control system, the effect of a single neuron can be considered as mapping current state x to change in state x through a band-limited channel. Therefore when a collection of ˙ neurons are firing stochastically, the probability of x depends on both x and the firing behavior s ˙ (s = 1 when spiking, otherwise s = 0) of each neuron, as such: p(x|x, s) = p(x|s = 1)p(s = 1|x) + p(x|s = 0)p(s = 0|x) ˙ ˙ ˙ (11) Eq.11 is a master equation that determines a probability flow on the state. From the Kramers-Moyal expansion we can associate this probability flow with a partial differential equation: ∂ p(x, t) ∂t ∞ − = i=1 ∂ ∂x i D(i) (x)p(x, t) (12) where D(i) (x) is a time-invariant term that modifies the change of probability density based on its i-th gradient. 4 Under certain conditions [7, 8], D(i) (x) for i > 2 all vanish and therefore the probability flow can be described deterministically using a linear operator L: ∂ ∂ ∂ 2 (2) D (x) p(x, t) = Lp(x, t) (13) p(x, t) = − D(1) (x) + ∂t ∂x ∂x2 This means that various Ls can be superimposed to achieve complex system dynamics (illustrated in Fig.2A). B. Equivalent network with sparse interconnections A. Neuron function as superimposed linear operators SN Sensory Input + SN SN SN αMN αMN αMN Motor Output αMN Figure 2: Functions of neuron population can be described as the combination of linear operators (A). Therefore the original neural function can be equivalently produced by sparsely connected neurons formalizing parallel pathways (B). As a consequence, the statistical effect of two fully connected neuron populations is equivalent to ones that are only sparsely connected, as long as the probability flow can be described by the same L. For a movement task, in particular, it is the statistical effect from the neuron ensemble onto skeletal muscles that determines the global behavior. Therefore we argue that it is feasible to approximate the spinal cord connectivity by sparsely interconnecting sensory and motor neurons (Fig.2B). Here a pool of homogenous sensory neurons projects to another pool of homogeneous α-motoneurons. Pseudorandom noise is added to the input of all homogeneous neurons within a population. 
It is worth noting that this approximation significantly reduces the number of synapses that need to be implemented in hardware. 3 Hardware implementation on FPGA We select FPGA as the implementation device due to its inherent parallelism that resembles the nervous system. FPGA is favored over GPU or clustered CPUs because it is relatively easy to network hundreds of nodes under flexible protocols. The platform is distributed on multiple nodes of Xilinx Spartan-6 devices. The interfacing among FPGAs and computers is created using OpalKelly development board XEM6010. The dynamic range of variables is tight in models of Izhikevich neuron, synapse and EMG. This helps maintaining the accuracy of models even when they are evaluated in 32-bit fixed-point arithmetics. The spindle model, in contrast, requires floating-point arithmetics due to its wide dynamic range and complex calculations (see eq.4-10). Hyper-time computations with floating-point numbers are resource consuming and therefore need to be implemented with special attentions. 3.1 Floating-point arithmetics in combinational logic Our arithmetic implementations are compatible with IEEE-754 standard. Typical floating-point arithmetic IP cores are either pipe-lined or based on iterative algorithms such as CORDIC, all of which require clocks to schedule the calculation. In our platform, no clock is provided for model evaluations thus all arithmetics need to be executed in pure combinational logic. Taking advantage of combinational logic allows all model evaluations to be 1) fast, the evaluation time depends entirely on the propagating and settling time of signals, which is on the order of microseconds, and 2) parallel, each model is evaluated on its own circuit without waiting for any other results. Our implementations of adder and multiplier are inspired by the open source project “Free FloatingPoint Madness”, available at http://www.hmc.edu/chips/. Please contact the authors of this paper if the modified code is needed. 5 Fast combinational floating-point division Floating-point division is even more resource demanding than multiplications. We avoided directly implementing the dividing algorithm by approximating it with additions and multiplications. Our approach is inspired by an algorithm described in [9], which provides a good approximation of the inverse square root for any positive number x within one Newton-Raphson iteration: 1 x Q(x) = √ ≈ x(1.5 − · x2 ) 2 x (x > 0) (14) Q(x) can be implemented only using floating-point adders and multipliers. Thereby any division with a positive divisor can be achieved if two blocks of Q(x) are concatenated: a a (15) = √ √ = a · Q(b) · Q(b) (b > 0) b b· b This algorithm has been adjusted to also work with negative divisors (b < 0). Numerical integrators for differential equations Evaluating the instantaneous states of differential equation models require a fixed-step numerical integrator. Backward Euler’s Method was chosen to balance the numerical error and FPGA usage: x ˙ xn+1 = f (x, t) = xn + T f (xn+1 , tn+1 ) (16) (17) where T is the sampling interval. f (x, t) is the derivative function for state variable x. 3.2 Asynchronous spike-based communication between FPGA chips Clock Spike clean count Counter 1 1 2 1 2 3 Figure 3: Timing diagram of asynchronous spike-based communication FPGA nodes are networked by transferring 1-bit binary spikes to each other. Our design allowed the sender and the receiver to operate on independent clocks without having to synchronize. 
The timing diagram of the spike-based communication is shown in Fig.3. The sender issues Spike with a pulse width of 1/(365 × Femu ) second. Each Spike then triggers a counting event on the receiver, meanwhile each Clock first reads the accumulated spike count and subsequently cleans the counter. Note that the phase difference between Spike and Clock is not predictable due to asynchronicity. 3.3 Serialize neuron evaluations within a homogeneous population Different neuron populations are instantiated as standalone circuits. Within in each population, however, homogeneous neurons mentioned in Section 2.3 are evaluated in series in order to optimize FPGA usage. Within each FPGA node all modules operate with a central clock, which is the only source allowed to trigger any updating event. Therefore the maximal number of neurons that can be serialized (Nserial ) is restrained by the following relationship: Ffpga = C × Nserial × 365 × Femu (18) Here Ffpga is the fastest clock rate that a FPGA can operate on; C = 4 is the minimal clock cycles needed for updating each state variable in the on-chip memory; Femu = 1 kHz is the time granularity of emulation (1 millisecond), and 365 × Femu represents 365x real-time. Consider that Xilinx 6 Spartan-6 FPGA devices peaks at 200MHz central clock frequency, the theoretical maximum of neurons that can be serialized is Nserial 200 MHz/(4 × 365 × 1 kHz) ≈ 137 (19) In the current design we choose Nserial = 128. 4 Results: emulated activities of motor nervous system Figure 4 shows the implemented monosynaptic spinal loop in schematics and in operation. Each FPGA node is able to emulate monosynaptic spinal loops consisting of 1,024 sensory and 1,024 motor neurons, i.e. 2,048 neurons in total. The spike-based asynchronous communication is successful between two FPGA nodes. Note that the emulation has to be significantly slowed down for on-line plotting. When the emulation is at full speed (365x real-time) the software front-end is not able to visualize the signals due to limited data throughput. 128 SNs 128 αMNs SN αMN 128 SNs 128 αMNs SN αMN ... 8 parallel pathways 2,048 neurons Figure 4: The neural emulation platform in operation. Left: Neural circuits implemented for each FPGA node including 2,048 neurons. SN = Sensory Neuron; αMN = α-motoneuron. Center: One working FPGA node. Right: Two FPGA nodes networked using asynchronous spiking protocol. The emulation platform successfully created multi-scale information when the muscle is externally stretched (Fig.5A). We also tested if our emulated motor system is able to produce the recruitment order and size principles observed in real physiological data. It has been well known that when a voluntary motor command is sent to the α-motoneuron pool, the motor units are recruited in an order that small ones get recruited first, followed by the big ones [10]. The comparison between our results and real data are shown in Fig.5B, where the top panel shows 20 motor unit activities emulated using our platform, and the bottom panel shows decoded motor unit activities from real human EMG [11]. No qualitative difference was found. 5 Discussion and future work We designed a hardware platform for emulating the multi-scale motor nervous activities in hypertime. We managed to use one node of single Xilinx Spartan-6 FPGA to emulate monosynaptic spinal loops consisting of 2,048 neurons, associated muscles and proprioceptors. The neurons are organized as parallel pathways with sparse interconnections. 
The emulation is successfully accelerated to 365x real-time. The platform can be scaled by networking multiple FPGA nodes, which is enabled by an asynchronous spike-based communication protocol. The emulated monosynaptic spinal loops are capable of producing reflex-like activities in response to muscle stretch. Our results of motor unit recruitment order are compatible with the physiological data collected in real human subjects. There is a question of whether this stochastic system turns out chaotic, especially with accumulated errors from Backward Euler’s integrator. Note that the firing property of a neuron population is usually stable even with explicit noise [8], and spindle inputs are measured from real robots so the integrator errors are corrected at every iteration. To our knowledge, the system is not critically sensitive to the initial conditions or integrator errors. This question, however, is both interesting and important for in-depth investigations in the future. 7 It has been shown [12] that replicating classic types of spinal interneurons (propriospinal, Iaexcitatory, Ia-inhibitory, Renshaw, etc.) is sufficient to produce stabilizing responses and rapid reaching movement in a wrist. Our platform will introduce those interneurons to describe the known spinal circuitry in further details. Physiological models will also be refined as needed. For the purpose of modeling movement behavior or diseases, Izhikevich model is a good balance between verisimilitude and computational cost. Nevertheless when testing drug effects along disease progression, neuron models are expected to cover sufficient molecular details including how neurotransmitters affect various ion channels. With the advancing of programmable semiconductor technology, it is expected to upgrade our neuron model to Hodgkin-Huxley’s. For the muscle models, Hill’s type of model does not fit the muscle properties accurately enough when the muscle is being shortened. Alternative models will be tested. Other studies showed that the functional dexterity of human limbs – especially in the hands – is critically enabled by the tendon configurations and joint geometry [13]. As a result, if our platform is used to understand whether known neurophysiology and biomechanics are sufficient to produce able and pathological movements, it will be necessary to use this platform to control human-like limbs. Since the emulation speed can be flexibly adjusted from arbitrarily slow to 365x real-time, when speeded to exactly 1x real-time the platform will function as a digital controller with 1kHz refresh rate. The main purpose of the emulation is to learn how certain motor disorders progress during childhood development. This first requires the platform to reproduce motor symptoms that are compatible with clinical observations. For example it has been suggested that muscle spasticity in rats is associated with decreased soma size of α-motoneurons [14], which presumably reduced the firing threshold of neurons. Thus when lower firing threshold is introduced to the emulated motoneuron pool, similar EMG patterns as in [15] should be observed. It is also necessary for the symptoms to evolve with neural plasticity. In the current version we presume that the structure of each component remains time invariant. In the future work Spike Timing Dependent Plasticity (STDP) will be introduced such that all components are subject to temporal modifications. B. Verify motor unit recruitment pattern A. 
Figure 5: A) Physiological activity emulated by each model when the muscle is sinusoidally stretched (traces: stretch, spindle Ia, sensory post-synaptic current, motoneurons, muscle force, EMG). B) Comparing the emulated motor unit recruitment order with real experimental data (emulation vs. real data; 1 s scale bar).

Acknowledgments

The authors thank Dr. Gerald Loeb for helping set up the emulation of spindle models. This project is supported by NIH NINDS grant R01NS069214-02.

References

[1] Izhikevich, E. M. Simple model of spiking neurons. IEEE Transactions on Neural Networks 14, 1569–1572 (2003).
[2] Glowatzki, E. & Fuchs, P. A. Transmitter release at the hair cell ribbon synapse. Nature Neuroscience 5, 147–154 (2002).
[3] Shadmehr, R. & Wise, S. P. A mathematical muscle model. In Supplementary documents for “Computational Neurobiology of Reaching and Pointing”, 1–18 (MIT Press, Cambridge, MA, 2005).
[4] Fuglevand, A. J., Winter, D. A. & Patla, A. E. Models of recruitment and rate coding organization in motor-unit pools. Journal of Neurophysiology 70, 2470–2488 (1993).
[5] Mileusnic, M. P., Brown, I. E., Lan, N. & Loeb, G. E. Mathematical models of proprioceptors. I. Control and transduction in the muscle spindle. Journal of Neurophysiology 96, 1772–1788 (2006).
[6] Gelfan, S., Kao, G. & Ruchkin, D. S. The dendritic tree of spinal neurons. The Journal of Comparative Neurology 139, 385–411 (1970).
[7] Sanger, T. D. Neuro-mechanical control using differential stochastic operators. In Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE, 4494–4497 (2010).
[8] Sanger, T. D. Distributed control of uncertain systems using superpositions of linear operators. Neural Computation 23, 1911–1934 (2011).
[9] Lomont, C. Fast inverse square root (2003). URL http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf.
[10] Henneman, E. Relation between size of neurons and their susceptibility to discharge. Science 126, 1345–1347 (1957).
[11] De Luca, C. J. & Hostage, E. C. Relationship between firing rate and recruitment threshold of motoneurons in voluntary isometric contractions. Journal of Neurophysiology 104, 1034–1046 (2010).
[12] Raphael, G., Tsianos, G. A. & Loeb, G. E. Spinal-like regulator facilitates control of a two-degree-of-freedom wrist. The Journal of Neuroscience 30, 9431–9444 (2010).
[13] Valero-Cuevas, F. J. et al. The tendon network of the fingers performs anatomical computation at a macroscopic scale. IEEE Transactions on Bio-Medical Engineering 54, 1161–1166 (2007).
[14] Brashear, A. & Elovic, E. Spasticity: Diagnosis and Management, 1st edn (Demos Medical, 2010).
[15] Levin, M. F. & Feldman, A. G. The role of stretch reflex threshold regulation in normal and impaired motor control. Brain Research 657, 23–30 (1994).

2 0.85415727 283 nips-2012-Putting Bayes to sleep

Author: Dmitry Adamskiy, Manfred K. Warmuth, Wouter M. Koolen

Abstract: We consider sequential prediction algorithms that are given the predictions from a set of models as inputs. If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart). However, what if the favored models in each segment are from a small subset, i.e. the data is likely to be predicted well by models that predicted well before? Curiously, fitting such “sparse composite models” is achieved by mixing in a bit of all the past posteriors. This self-referential updating method is rather peculiar, but it is efficient and gives superior performance on many natural data sets. Also it is important because it introduces a long-term memory: any model that has done well in the past can be recovered quickly. While Bayesian interpretations can be found for mixing in a bit of the initial prior, no Bayesian interpretation is known for mixing in past posteriors. We build atop the “specialist” framework from the online learning literature to give the Mixing Past Posteriors update a proper Bayesian foundation. We apply our method to a well-studied multitask learning problem and obtain a new intriguing efficient update that achieves a significantly better bound.

3 0.83467799 62 nips-2012-Burn-in, bias, and the rationality of anchoring

Author: Falk Lieder, Thomas Griffiths, Noah Goodman


4 0.83467799 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale.

5 0.81211042 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

Author: Yung-kyun Noh, Frank Park, Daniel D. Lee

Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria.

same-paper 6 0.77543128 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

7 0.68211281 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

8 0.67020786 23 nips-2012-A lattice filter model of the visual pathway

9 0.65186948 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

10 0.64389706 190 nips-2012-Learning optimal spike-based representations

11 0.64321208 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

12 0.6405642 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

13 0.63575268 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

14 0.633479 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

15 0.63255668 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

16 0.63238239 345 nips-2012-Topic-Partitioned Multinetwork Embeddings

17 0.63218749 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

18 0.62914258 170 nips-2012-Large Scale Distributed Deep Networks

19 0.62882733 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

20 0.62719506 102 nips-2012-Distributed Non-Stochastic Experts