nips nips2009 nips2009-50 knowledge-graph by maker-knowledge-mining

50 nips-2009-Canonical Time Warping for Alignment of Human Behavior


Source: pdf

Author: Feng Zhou, Fernando De la Torre

Abstract: Alignment of time series is an important problem to solve in many scientific disciplines. In particular, temporal alignment of two or more subjects performing similar activities is a challenging problem due to the large temporal scale difference between human actions as well as the inter/intra subject variability. In this paper we present canonical time warping (CTW), an extension of canonical correlation analysis (CCA) for spatio-temporal alignment of human motion between two subjects. CTW extends previous work on CCA in two ways: (i) it combines CCA with dynamic time warping (DTW), and (ii) it extends CCA by allowing local spatial deformations. We show CTW’s effectiveness in three experiments: alignment of synthetic data, alignment of motion capture data of two subjects performing similar actions, and alignment of similar facial expressions made by two people. Our results demonstrate that CTW provides both visually and qualitatively better alignment than state-of-the-art techniques based on DTW. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Alignment of time series is an important problem to solve in many scientific disciplines. [sent-5, score-0.056]

2 In particular, temporal alignment of two or more subjects performing similar activities is a challenging problem due to the large temporal scale difference between human actions as well as the inter/intra subject variability. [sent-6, score-0.48]

3 In this paper we present canonical time warping (CTW), an extension of canonical correlation analysis (CCA) for spatio-temporal alignment of human motion between two subjects. [sent-7, score-0.951]

4 CTW extends previous work on CCA in two ways: (i) it combines CCA with dynamic time warping (DTW), and (ii) it extends CCA by allowing local spatial deformations. [sent-8, score-0.538]

5 We show CTW’s effectiveness in three experiments: alignment of synthetic data, alignment of motion capture data of two subjects performing similar actions, and alignment of similar facial expressions made by two people. [sent-9, score-0.782]

6 Our results demonstrate that CTW provides both visually and qualitatively better alignment than state-of-the-art techniques based on DTW. [sent-10, score-0.227]

7 1 Introduction Temporal alignment of time series has been an active research topic in many scientific disciplines such as bioinformatics, text analysis, computer graphics, and computer vision. [sent-11, score-0.3]

8 In particular, temporal alignment of human behavior is a fundamental step in many applications such as recognition [1], temporal segmentation [2] and synthesis of human motion [3]. [sent-12, score-0.482]

9 Fig. 1a which shows one subject walking with varying speed and different styles and Fig. [sent-14, score-0.052]

10 1b which shows two subjects reading the same text. [sent-15, score-0.049]

11 Previous work on alignment of human motion has been addressed mostly in the context of recognizing human activities and synthesizing realistic motion. [sent-16, score-0.42]

12 Typically, some models such as hidden Markov models [4, 5, 6], weighted principal component analysis [7], independent component analysis [8, 9] or multi-linear models [10] are learned from training data and in the testing phase the time series is aligned w.r.t. the learned model. [sent-17, score-0.094]

13 In the context of computer vision a key aspect for successful recognition of activities is building view-invariant representations. [sent-21, score-0.033]

14 [1] proposed a view-invariant descriptor for actions making use of the affinity matrix between time instances. [sent-23, score-0.053]

15 Caspi and Irani [11] temporally aligned videos from two closely attached cameras. [sent-24, score-0.038]

16 [12, 13] aligned trajectories of two moving points using constraints from the fundamental matrix. [sent-26, score-0.038]

17 [3] proposed the iterative motion warping, a method that finds a spatio-temporal warping between two instances of motion capture data. [sent-28, score-0.446]

18 In the context of data mining there have been several extensions of DTW [14] to align time series. [sent-29, score-0.149]

19 Keogh and Pazzani [15] used derivatives of the original signal to improve alignment with DTW. [sent-30, score-0.227]

20 [16] proposed continuous profile models, a probabilistic method for simultaneously aligning and normalizing sets of time series. [sent-32, score-0.104]

21 A relatively unexplored problem in behavioral analysis is the alignment between the motion of the body or face in two or more subjects (e. [sent-33, score-0.371]

22 Figure 1: Temporal alignment of human behavior. [sent-37, score-0.37]

23 (a) One person walking in normal pose, slow speed, another viewpoint and exaggerated steps (clockwise). [sent-38, score-0.049]

24 Major challenges to solve human motion alignment problems are: (i) allowing alignment between different sets of multidimensional features (e. [sent-40, score-0.252]

25 g., audio/video), (ii) introducing a feature selection or feature weighting mechanism to compensate for subject variability or irrelevant features, and (iii) execution rate [17]. [sent-42, score-0.087]

26 To solve these problems, this paper proposes canonical time warping (CTW) for accurate spatio-temporal alignment between two behavioral time series. [sent-43, score-0.735]

27 We pose the problem as finding the temporal alignment that maximizes the spatial correlation between two behavioral samples coming from two subjects. [sent-44, score-0.412]

28 To accommodate for subject variability and take into account the difference in the dimensionality of the signals, CTW uses CCA as a measure of spatial alignment. [sent-45, score-0.11]

29 CTW extends DTW by adding a feature weighting mechanism that is able to align signals of different dimensionality. [sent-47, score-0.251]

30 CTW also extends CCA by incorporating time warping and allowing local spatial transformations. [sent-48, score-0.444]

31 Section 2 reviews related work on dynamic time warping and canonical correlation analysis. [sent-50, score-0.541]

32 Section 4 extends CTW to take into account local transformations. [sent-52, score-0.047]

33 2 Previous work This section describes previous work on canonical correlation analysis and dynamic time warping. [sent-54, score-0.241]

34 1 Canonical correlation analysis Canonical correlation analysis (CCA) [18] is a technique to extract common features from a pair of multivariate data. [sent-56, score-0.084]

35 The pair of canonical variates (vx^T X, vy^T Y) is uncorrelated with other canonical variates of lower order. [sent-61, score-0.681]

36 Each successive canonical variate pair achieves the maximum correlation orthogonal to the preceding pairs. [sent-62, score-0.181]

37 Eq. 1 has a closed form solution in terms of a generalized eigenvalue problem. [sent-64, score-0.02]

38 See [19] for a unification of several component analysis methods and a review of numerical techniques to efficiently solve the generalized eigenvalue problems. [sent-65, score-0.02]
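As a concrete sketch of the closed-form solution mentioned above, the first canonical pair can be computed by whitening the two covariance matrices and taking an SVD of the whitened cross-covariance, which is equivalent to the generalized eigenvalue formulation. The function name `cca` and the small ridge `reg` below are our own choices, not the paper's implementation:

```python
import numpy as np

def cca(X, Y):
    """First pair of canonical directions (vx, vy) and the first
    canonical correlation, via SVD of the whitened cross-covariance.
    Rows are features, columns are samples."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    n = X.shape[1]
    reg = 1e-8  # small ridge for numerical stability
    Cxx = Xc @ Xc.T / n + reg * np.eye(X.shape[0])
    Cyy = Yc @ Yc.T / n + reg * np.eye(Y.shape[0])
    Cxy = Xc @ Yc.T / n

    def inv_sqrt(C):
        # symmetric inverse square root, C^{-1/2}
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    vx = Wx @ U[:, 0]
    vy = Wy @ Vt[0, :]
    return vx, vy, s[0]  # s[0] is the first canonical correlation
```

On a pair of 2-D variables sharing one latent signal, the recovered canonical correlation is close to 1 while the unrelated dimensions are ignored.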

39 In computer vision, CCA has been used for matching sets of images in problems such as activity recognition from video [20] and activity correlation from cameras [21]. [sent-66, score-0.042]

40 Bold capital letters denote a matrix X, bold lower-case letters a column vector x. [sent-68, score-0.063]

41 x_ij denotes the scalar in the i-th row and j-th column of the matrix X. [sent-70, score-0.046]

42 1_{m×n}, 0_{m×n} ∈ R^{m×n} are matrices of ones and zeros. [sent-72, score-0.02]

43 [22] proposed an extension of CCA with parameterized warping functions to align protein expressions. [sent-85, score-0.419]

44 The learned warping function is a linear combination of hyperbolic tangent functions with nonnegative coefficients, ensuring monotonicity. [sent-86, score-0.3]

45 Unlike our method, the warping function is unable to deal with feature weighting. [sent-87, score-0.3]
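The monotonicity argument above can be illustrated with a small sketch: a nonnegative combination of hyperbolic tangents of increasing arguments is nondecreasing. The exact parameterization used in [22] may differ; the form below is only an illustrative assumption:

```python
import numpy as np

def tanh_warp(t, a, b, c):
    """Illustrative monotone warping function
    w(t) = sum_k a_k * tanh(b_k * t + c_k), with a_k, b_k >= 0.
    Each term is nondecreasing in t, so the sum is too."""
    t = np.asarray(t, dtype=float)[:, None]
    return np.sum(a * np.tanh(b * t + c), axis=1)

t = np.linspace(0.0, 1.0, 100)
w = tanh_warp(t,
              a=np.array([0.5, 1.0]),   # nonnegative mixture weights
              b=np.array([2.0, 5.0]),   # nonnegative slopes
              c=np.array([0.0, -2.0]))  # offsets (unconstrained)
```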

46 The correspondence matrix P can be parameterized by a pair of path vectors, P = [p^x, p^y]^T ∈ R^{2×m}, in which p^x ∈ {1 : nx}^{m×1} and p^y ∈ {1 : ny}^{m×1} denote the composition of the alignment in frames. [sent-90, score-0.641]

47 For instance, the i-th frame in X and the j-th frame in Y are aligned iff there exists p_t = [p_t^x, p_t^y]^T = [i, j]^T for some t. [sent-91, score-0.378]

48 P has to satisfy three additional constraints: the boundary condition (p_1 ≡ [1, 1]^T and p_m ≡ [nx, ny]^T), continuity (0 ≤ p_t − p_{t−1} ≤ 1) and monotonicity (t_1 ≥ t_2 ⇒ p_{t_1} − p_{t_2} ≥ 0). [sent-92, score-0.179]

49 The policy function, π : {1 : nx} × {1 : ny} → {[1, 0]^T, [0, 1]^T, [1, 1]^T}, defines the deterministic transition between consecutive steps, p_{t+1} = p_t + π(p_t). [sent-94, score-0.4]

50 Once the policy queue is known, the alignment steps can be recursively constructed from the starting point, p1 = [1, 1]T . [sent-95, score-0.29]

51 Fig. 2 shows an example of DTW to align two 1-D time series. [sent-97, score-0.149]
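To make the dynamic program and the path constraints above concrete, here is a minimal NumPy sketch of DTW for two 1-D series. The function name and structure are ours, not the paper's implementation; the backtracking step mirrors the policy-based construction from p_1 = [1, 1]^T:

```python
import numpy as np

def dtw(x, y):
    """Classic DTW between two 1-D sequences.

    Returns the alignment cost and the 1-based warping path, a list
    of (i, j) pairs satisfying the boundary, continuity and
    monotonicity constraints."""
    nx, ny = len(x), len(y)
    D = np.full((nx + 1, ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack from (nx, ny) to (1, 1) to recover the path
    path, (i, j) = [], (nx, ny)
    while (i, j) != (1, 1):
        path.append((i, j))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda s: D[s])
    path.append((1, 1))
    return D[nx, ny], path[::-1]
```

For two series that differ only in execution rate, the recovered path repeats frames where one series lingers and the total cost is zero.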

52 3 Canonical time warping (CTW) This section describes the energy function and optimization strategies for CTW. [sent-98, score-0.358]

53 1 Energy function for CTW In order to have a compact and compressible energy function for CTW, it is important to notice that Eq. [sent-100, score-0.045]

54 2 can be rewritten as: J_dtw(Wx, Wy) = Σ_{i=1}^{nx} Σ_{j=1}^{ny} (w_i^x)^T w_j^y ||x_i − y_j||² = ||X Wx^T − Y Wy^T||_F²,  (4) where Wx ∈ {0, 1}^{m×nx}, Wy ∈ {0, 1}^{m×ny} are binary selection matrices that need to be inferred to align X and Y. [sent-101, score-0.388]

55 In Eq. 4, the matrices Wx and Wy encode the alignment path. [sent-103, score-0.247]

56 For instance, w^x_{t,p_t^x} = w^y_{t,p_t^y} = 1 assigns correspondence between the p_t^x-th frame in X and the p_t^y-th frame in Y. [sent-104, score-0.283]
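The encoding of a path by the binary selection matrices of Eq. 4 can be sketched as follows; `path_to_selection` is a hypothetical helper of ours, and the snippet checks numerically that the Frobenius form equals the sum of per-step squared distances:

```python
import numpy as np

def path_to_selection(path, nx, ny):
    """Build Wx in {0,1}^{m x nx} and Wy in {0,1}^{m x ny} encoding
    an m-step alignment path of 1-based index pairs (p_t^x, p_t^y)."""
    m = len(path)
    Wx = np.zeros((m, nx))
    Wy = np.zeros((m, ny))
    for t, (i, j) in enumerate(path):
        Wx[t, i - 1] = 1.0
        Wy[t, j - 1] = 1.0
    return Wx, Wy

X = np.array([[0.0, 1.0, 2.0]])   # dx = 1, nx = 3
Y = np.array([[0.0, 2.0]])        # dy = 1, ny = 2
path = [(1, 1), (2, 1), (3, 2)]
Wx, Wy = path_to_selection(path, 3, 2)
# Frobenius form of Eq. 4 on this path
J = np.sum((X @ Wx.T - Y @ Wy.T) ** 2)
```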

57 CCA applies a linear transformation to the rows (features), while DTW applies binary transformations to the columns (time). [sent-109, score-0.032]

58 In order to accommodate for differences in style and subject variability, add a feature selection mechanism, and reduce the dimensionality of the signals, CTW adds a linear transformation (Vx^T, Vy^T) (as in CCA) to the least-squares form of DTW (Eq. [sent-110, score-0.078]

59 Moreover, this transformation allows aligning temporal signals with different dimensionality (e. [sent-112, score-0.204]

60 CTW combines DTW and CCA by minimizing: J_ctw(Wx, Wy, Vx, Vy) = ||Vx^T X Wx^T − Vy^T Y Wy^T||_F²,  (5) where Vx ∈ R^{dx×b}, Vy ∈ R^{dy×b}, b ≤ min(dx, dy) parameterize the spatial warping by projecting the sequences into the same coordinate system. [sent-115, score-0.526]

61 Wx and Wy warp the signal in time to achieve optimum temporal alignment. [sent-116, score-0.103]

62 CTW is a direct and clean extension of CCA and DTW to align two signals X and Y in space and time. [sent-120, score-0.161]

63 It extends previous work on CCA by adding temporal alignment and on DTW by allowing a feature selection and dimensionality reduction mechanism for aligning signals of different dimensions. [sent-121, score-0.492]

64 We alternate between solving for Wx , Wy using DTW, and optimally computing the spatial projections using CCA. [sent-124, score-0.059]

65 These steps monotonically decrease Jctw and since the function is bounded below it will converge to a critical point. [sent-125, score-0.019]

66 Alternatively, PCA can be applied independently to each set, and used as the initial estimate of Vx and Vy if dx ≠ dy. [sent-132, score-0.117]

67 In the case of high-dimensional data, the generalized eigenvalue problem is solved by regularizing the covariance matrices by adding a scaled identity matrix. [sent-133, score-0.04]

68 We consider the algorithm to converge when the difference between two consecutive values of Jctw is small. [sent-135, score-0.017]
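The alternation described above can be sketched end-to-end. This compact version is our own simplification, not the authors' code: it uses a whitened-SVD form of CCA with a small ridge `reg`, identity-based initialization of Vx and Vy, and tracks the energy of Eq. 5 on centered projections:

```python
import numpy as np

def dtw_path(X, Y):
    """DTW between multivariate sequences X (b x nx), Y (b x ny);
    returns the 1-based warping path."""
    nx, ny = X.shape[1], Y.shape[1]
    D = np.full((nx + 1, ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            c = np.sum((X[:, i - 1] - Y[:, j - 1]) ** 2)
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, (i, j) = [], (nx, ny)
    while (i, j) != (1, 1):
        path.append((i, j))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda s: D[s])
    path.append((1, 1))
    return path[::-1]

def cca_proj(X, Y, b, reg=1e-6):
    """Top-b canonical projections via whitened SVD (a standard
    closed form for CCA; the ridge reg is our regularizer)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    n = X.shape[1]

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx = inv_sqrt(Xc @ Xc.T / n + reg * np.eye(X.shape[0]))
    Ky = inv_sqrt(Yc @ Yc.T / n + reg * np.eye(Y.shape[0]))
    U, _, Vt = np.linalg.svd(Kx @ (Xc @ Yc.T / n) @ Ky)
    return Kx @ U[:, :b], Ky @ Vt[:b, :].T

def ctw(X, Y, b=1, iters=5):
    """Alternate the temporal step (DTW on projected sequences) and
    the spatial step (CCA on temporally aligned frames)."""
    Vx = np.eye(X.shape[0])[:, :b]
    Vy = np.eye(Y.shape[0])[:, :b]
    costs = []
    for _ in range(iters):
        path = dtw_path(Vx.T @ X, Vy.T @ Y)
        Xa = X[:, [i - 1 for i, _ in path]]   # aligned frames of X
        Ya = Y[:, [j - 1 for _, j in path]]   # aligned frames of Y
        Vx, Vy = cca_proj(Xa, Ya, b)
        Xp = Vx.T @ Xa
        Yp = Vy.T @ Ya
        Xp = Xp - Xp.mean(axis=1, keepdims=True)
        Yp = Yp - Yp.mean(axis=1, keepdims=True)
        costs.append(float(np.sum((Xp - Yp) ** 2)))
    return path, costs
```

On two copies of the same 2-D sequence the path collapses to the diagonal and the energy goes to (numerically) zero.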

69 4 Local canonical time warping (LCTW) In the previous section we have illustrated how CTW can align in space and time two time series of different dimensionality. [sent-136, score-0.657]

70 However, in some cases (e.g., aligning long sequences) a global transformation of the whole time series is not accurate. [sent-139, score-0.162]

71 This section extends CTW by allowing multiple local spatial deformations. [sent-141, score-0.114]

72 1 Energy function for LCTW Let us assume that the spatial transformation for each frame in X and Y can be modeled as a linear combination of kx or ky bases. [sent-143, score-0.305]

73 Let Vx = [V_1^{xT}, · · · , V_{kx}^{xT}]^T ∈ R^{kx dx×b}, Vy = [V_1^{yT}, · · · , V_{ky}^{yT}]^T ∈ R^{ky dy×b} and b ≤ min(kx dx, ky dy). [sent-144, score-0.323]

74 r^x_{ic} denotes the coefficient (or weight) of the c-th basis for the i-th frame of X (similarly for r^y_{jc}). [sent-146, score-0.178]

75 The last two regularization terms, Fx ∈ R^{nx×nx}, Fy ∈ R^{ny×ny}, are first-order differential operators applied to r^x_c ∈ R^{nx×1}, r^y_c ∈ R^{ny×1}, encouraging smooth solutions over time. [sent-152, score-0.078]

76 Observe that Jctw is a special case of Jlctw when kx = ky = 1. [sent-153, score-0.173]
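One possible concrete form of the first-order differential operators used in the smoothness terms is sketched below; the paper only states that Fx and Fy are first-order differential operators, so this exact construction is our assumption:

```python
import numpy as np

def first_order_diff(n):
    """First-order differential operator F in R^{n x n} with
    (F r)_i = r_i - r_{i-1} for i > 1, used to penalize non-smooth
    basis weights over time."""
    F = np.eye(n) - np.eye(n, k=-1)
    F[0, 0] = 0.0  # the first frame has no predecessor
    return F
```

Applying F to a weight vector yields its consecutive differences, so ||F r||^2 is small exactly when the weights vary smoothly.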


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ctw', 0.379), ('vx', 0.369), ('vy', 0.369), ('warping', 0.3), ('cca', 0.266), ('alignment', 0.227), ('dtw', 0.216), ('wy', 0.2), ('wx', 0.178), ('nx', 0.16), ('canonical', 0.122), ('align', 0.119), ('jctw', 0.118), ('pt', 0.111), ('ry', 0.095), ('ky', 0.089), ('kx', 0.084), ('aligning', 0.074), ('motion', 0.073), ('ny', 0.068), ('dy', 0.067), ('py', 0.067), ('rx', 0.062), ('jdtw', 0.059), ('lctw', 0.059), ('ricx', 0.059), ('ywy', 0.059), ('frame', 0.058), ('temporal', 0.056), ('cy', 0.056), ('px', 0.052), ('rnx', 0.052), ('rny', 0.052), ('xwx', 0.052), ('cx', 0.051), ('dx', 0.05), ('extends', 0.047), ('dynamic', 0.047), ('fy', 0.044), ('policy', 0.044), ('spatial', 0.042), ('correlation', 0.042), ('signals', 0.042), ('fx', 0.042), ('idx', 0.039), ('idy', 0.039), ('jlctw', 0.039), ('rdx', 0.039), ('rjcy', 0.039), ('rxx', 0.039), ('ryy', 0.039), ('xwyt', 0.039), ('ydy', 0.039), ('aligned', 0.038), ('human', 0.035), ('variates', 0.034), ('xdx', 0.034), ('activities', 0.033), ('transformation', 0.032), ('ib', 0.032), ('time', 0.03), ('walking', 0.03), ('xt', 0.029), ('energy', 0.028), ('subjects', 0.028), ('behavioral', 0.026), ('series', 0.026), ('allowing', 0.025), ('yt', 0.024), ('th', 0.024), ('accommodate', 0.024), ('actions', 0.023), ('variability', 0.022), ('weighting', 0.022), ('letters', 0.022), ('subject', 0.022), ('ith', 0.022), ('mechanism', 0.021), ('wj', 0.021), ('reading', 0.021), ('robotics', 0.021), ('matrices', 0.02), ('eigenvalue', 0.02), ('pose', 0.019), ('steps', 0.019), ('bold', 0.019), ('graphics', 0.018), ('rt', 0.018), ('optimally', 0.017), ('unexplored', 0.017), ('xpx', 0.017), ('compressible', 0.017), ('disciplines', 0.017), ('synthesizing', 0.017), ('variate', 0.017), ('vectorization', 0.017), ('warp', 0.017), ('xnx', 0.017), ('yyt', 0.017), ('consecutive', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior

2 0.31393927 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

Author: Piyush Rai, Hal Daume

Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction. 1

3 0.15344742 46 nips-2009-Bilinear classifiers for visual recognition

Author: Hamed Pirsiavash, Deva Ramanan, Charless C. Fowlkes

Abstract: We describe an algorithm for learning bilinear SVMs. Bilinear classifiers are a discriminative variant of bilinear models, which capture the dependence of data on multiple factors. Such models are particularly appropriate for visual data that is better represented as a matrix or tensor, rather than a vector. Matrix encodings allow for more natural regularization through rank restriction. For example, a rank-one scanning-window classifier yields a separable filter. Low-rank models have fewer parameters and so are easier to regularize and faster to score at run-time. We learn low-rank models with bilinear classifiers. We also use bilinear classifiers for transfer learning by sharing linear factors between different classification tasks. Bilinear classifiers are trained with biconvex programs. Such programs are optimized with coordinate descent, where each coordinate step requires solving a convex program - in our case, we use a standard off-the-shelf SVM solver. We demonstrate bilinear SVMs on difficult problems of people detection in video sequences and action classification of video sequences, achieving state-of-the-art results in both. 1

4 0.11611633 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

Author: Yusuke Fujiwara, Yoichi Miyawaki, Yukiyasu Kamitani

Abstract: Image representation based on image bases provides a framework for understanding neural representation of visual perception. A recent fMRI study has shown that arbitrary contrast-defined visual images can be reconstructed from fMRI activity patterns using a combination of multi-scale local image bases. In the reconstruction model, the mapping from an fMRI activity pattern to the contrasts of the image bases was learned from measured fMRI responses to visual images. But the shapes of the images bases were fixed, and thus may not be optimal for reconstruction. Here, we propose a method to build a reconstruction model in which image bases are automatically extracted from the measured data. We constructed a probabilistic model that relates the fMRI activity space to the visual image space via a set of latent variables. The mapping from the latent variables to the visual image space can be regarded as a set of image bases. We found that spatially localized, multi-scale image bases were estimated near the fovea, and that the model using the estimated image bases was able to accurately reconstruct novel visual images. The proposed method provides a means to discover a novel functional mapping between stimuli and brain activity patterns.

5 0.085320503 261 nips-2009-fMRI-Based Inter-Subject Cortical Alignment Using Functional Connectivity

Author: Bryan Conroy, Ben Singer, James Haxby, Peter J. Ramadge

Abstract: The inter-subject alignment of functional MRI (fMRI) data is important for improving the statistical power of fMRI group analyses. In contrast to existing anatomically-based methods, we propose a novel multi-subject algorithm that derives a functional correspondence by aligning spatial patterns of functional connectivity across a set of subjects. We test our method on fMRI data collected during a movie viewing experiment. By cross-validating the results of our algorithm, we show that the correspondence successfully generalizes to a secondary movie dataset not used to derive the alignment. 1

6 0.074777782 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

7 0.066367842 88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out

8 0.064645544 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models

9 0.053623799 114 nips-2009-Indian Buffet Processes with Power-law Behavior

10 0.052206557 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

11 0.049228132 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

12 0.043699548 137 nips-2009-Learning transport operators for image manifolds

13 0.039530497 236 nips-2009-Structured output regression for detection with partial truncation

14 0.039343014 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

15 0.036142994 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

16 0.033657365 147 nips-2009-Matrix Completion from Noisy Entries

17 0.033570573 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

18 0.033010382 246 nips-2009-Time-Varying Dynamic Bayesian Networks

19 0.032626815 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

20 0.032528587 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.112), (1, -0.029), (2, 0.005), (3, -0.011), (4, 0.009), (5, 0.032), (6, 0.089), (7, -0.071), (8, 0.019), (9, 0.04), (10, 0.099), (11, 0.021), (12, -0.143), (13, 0.044), (14, -0.203), (15, -0.061), (16, 0.18), (17, 0.164), (18, 0.055), (19, -0.027), (20, 0.115), (21, -0.063), (22, 0.125), (23, -0.204), (24, -0.159), (25, -0.039), (26, 0.237), (27, -0.008), (28, -0.11), (29, -0.183), (30, 0.016), (31, 0.139), (32, -0.069), (33, -0.013), (34, 0.012), (35, 0.129), (36, 0.037), (37, -0.057), (38, 0.064), (39, -0.051), (40, 0.003), (41, -0.021), (42, -0.06), (43, 0.09), (44, -0.061), (45, 0.058), (46, 0.011), (47, 0.132), (48, -0.046), (49, -0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96844685 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior

2 0.77084255 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

3 0.53572321 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

4 0.4296208 46 nips-2009-Bilinear classifiers for visual recognition

5 0.2510426 114 nips-2009-Indian Buffet Processes with Power-law Behavior

Author: Yee W. Teh, Dilan Gorur

Abstract: The Indian buffet process (IBP) is an exchangeable distribution over binary matrices used in Bayesian nonparametric featural models. In this paper we propose a three-parameter generalization of the IBP exhibiting power-law behavior. We achieve this by generalizing the beta process (the de Finetti measure of the IBP) to the stable-beta process and deriving the IBP corresponding to it. We find interesting relationships between the stable-beta process and the Pitman-Yor process (another stochastic process used in Bayesian nonparametric models with interesting power-law properties). We derive a stick-breaking construction for the stable-beta process, and find that our power-law IBP is a good model for word occurrences in document corpora. 1

6 0.2385565 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

7 0.23505522 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

8 0.23219168 88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out

9 0.22916697 261 nips-2009-fMRI-Based Inter-Subject Cortical Alignment Using Functional Connectivity

10 0.19390389 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

11 0.19390333 236 nips-2009-Structured output regression for detection with partial truncation

12 0.19284457 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

13 0.18994479 243 nips-2009-The Ordered Residual Kernel for Robust Motion Subspace Clustering

14 0.18367806 137 nips-2009-Learning transport operators for image manifolds

15 0.17552884 209 nips-2009-Robust Value Function Approximation Using Bilinear Programming

16 0.16741817 152 nips-2009-Measuring model complexity with the prior predictive

17 0.16537699 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models

18 0.16489176 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models

19 0.16444971 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

20 0.15778168 112 nips-2009-Human Rademacher Complexity


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.431), (24, 0.03), (25, 0.032), (35, 0.03), (36, 0.113), (39, 0.025), (57, 0.011), (58, 0.081), (61, 0.015), (71, 0.05), (86, 0.045), (91, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79317516 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior

2 0.58953995 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

Author: Douglas Eck, Yoshua Bengio, Aaron C. Courville

Abstract: The Indian Buffet Process is a Bayesian nonparametric approach that models objects as arising from an infinite number of latent factors. Here we extend the latent factor model framework to two or more unbounded layers of latent factors. From a generative perspective, each layer defines a conditional factorial prior distribution over the binary latent variables of the layer below via a noisy-or mechanism. We explore the properties of the model with two empirical studies, one digit recognition task and one music tag data experiment. 1

3 0.48224801 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li

Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis.
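NDCG, one of the two ranking measures the abstract names, discounts graded relevance by log position and normalizes by the ideal ordering. A small sketch of the standard NDCG@k definition (a common formulation, not code from the paper):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: gain 2^rel - 1, discount log2(position + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given ranking normalized by the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfectly sorted list scores exactly 1.0, so "maximizing the ranking measure" in the abstract means driving this ratio toward 1.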

4 0.34776151 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning

Author: Chonghai Hu, Weike Pan, James T. Kwok

Abstract: Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., ℓ1-regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems.

5 0.3474665 76 nips-2009-Efficient Learning using Forward-Backward Splitting

Author: Yoram Singer, John C. Duchi

Abstract: We describe, analyze, and experiment with a new framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while keeping close proximity to the result of the first phase. This yields a simple yet effective algorithm for both batch penalized risk minimization and online learning. Furthermore, the two phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as ℓ1. We derive concrete and very simple algorithms for minimization of loss functions with ℓ1, ℓ2, ℓ2², and ℓ∞ regularization. We also show how to construct efficient algorithms for mixed-norm ℓ1/ℓq regularization. We further extend the algorithms and give efficient implementations for very high-dimensional data with sparsity. We demonstrate the potential of the proposed framework in experiments with synthetic and natural datasets.
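For ℓ1 regularization, the second phase described in this abstract (stay close to the gradient step while minimizing the regularizer) has a closed form: coordinate-wise soft-thresholding. A minimal sketch of one such forward-backward iteration (a generic formulation in the spirit of the abstract, not the paper's code; `eta` and `lam` are illustrative step-size and regularization parameters):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each coordinate toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward_step(w, grad, eta, lam):
    """One iteration: unconstrained gradient step, then l1 proximal step."""
    v = w - eta * grad(w)                # forward (gradient) phase
    return soft_threshold(v, eta * lam)  # backward (proximal) phase
```

The thresholding sets small coordinates exactly to zero, which is how the two-phase scheme "enables sparse solutions" as the abstract claims.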

6 0.34736902 72 nips-2009-Distribution Matching for Transduction

7 0.34652182 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

8 0.34639353 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

9 0.34566835 27 nips-2009-Adaptive Regularization of Weight Vectors

10 0.34523723 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

11 0.34514281 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

12 0.34445742 129 nips-2009-Learning a Small Mixture of Trees

13 0.343853 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

14 0.34385282 128 nips-2009-Learning Non-Linear Combinations of Kernels

15 0.34253439 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

16 0.34193119 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output

17 0.34180811 260 nips-2009-Zero-shot Learning with Semantic Output Codes

18 0.34081221 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors

19 0.34054846 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

20 0.34037888 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction