nips nips2010 nips2010-255 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nikos Karampatziakis
Abstract: We cast the problem of identifying basic blocks of code in a binary executable as learning a mapping from a byte sequence to a segmentation of the sequence. In general, inference in segmentation models, such as semi-CRFs, can be cubic in the length of the sequence. By taking advantage of the structure of our problem, we derive a linear-time inference algorithm which makes our approach practical, given that even small programs are tens or hundreds of thousands of bytes long. Furthermore, we introduce two loss functions which are appropriate for our problem and show how to use structural SVMs to optimize the learned mapping for these losses. Finally, we present experimental results that demonstrate the advantages of our method against a strong baseline. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We cast the problem of identifying basic blocks of code in a binary executable as learning a mapping from a byte sequence to a segmentation of the sequence. [sent-3, score-0.961]
2 By taking advantage of the structure of our problem, we derive a linear-time inference algorithm which makes our approach practical, given that even small programs are tens or hundreds of thousands of bytes long. [sent-5, score-0.344]
3 1 Introduction In this work, we are interested in the problem of extracting the CPU instructions that comprise a binary executable file. [sent-8, score-0.676]
4 In particular we are motivated by a computer security application, in which we want to detect whether a previously unseen executable contains malicious code. [sent-10, score-0.355]
5 This is a task that computer security experts have to solve many times every day because, in the last few years, the volume of malicious software has increased exponentially (estimated at 50,000 new malicious code samples per day). [sent-11, score-0.366]
6 This happens because the tools themselves are based on heuristics and make many assumptions about the way a binary executable is structured. [sent-13, score-0.321]
7 But why is it hard to find the instructions inside a binary executable? [sent-14, score-0.529]
8 After all, when running a program the CPU always knows which instructions it is executing. [sent-15, score-0.474]
9 The caveat here is that we want to extract the instructions from the executable without running it. [sent-16, score-0.672]
10 On one hand, running the executable will in general reveal little information about all possible instructions in the program, and on the other hand it may be dangerous or even misleading. [sent-17, score-0.647]
11 Another issue that makes this task challenging is that binary executables contain many other things besides the instructions they will execute. [sent-18, score-0.629]
12 Furthermore, the executable does not contain any demarcations of the locations of instructions in the file. [sent-19, score-0.647]
13 Nevertheless, an executable file is organized into sections such as a code section, a section with constants, a section containing global variables, etc. [sent-20, score-0.358]
14 2 Here, we are focusing on Windows executables for the Intel x86 architecture, though everything carries over to any other modern operating system and any other architecture with a complex instruction set. [sent-26, score-0.644]
15 3 Executables that contain debugging information are an exception, but most software is released without it. We refer to all instructions as code and to everything else as data. [sent-27, score-0.648]
16 For example, the compiler may, for performance reasons, prefer to pad a function with up to 3 data bytes so that the next function starts at an address that is a multiple of 4. [sent-28, score-0.33]
17 This table does not contain any instructions, yet it can be stored together with the instructions that make up the function in which the “switch” statement appears. [sent-31, score-0.443]
18 Apart from the compiler, the author of a malicious program can also insert data bytes in the code section of her program. [sent-32, score-0.492]
19 In this sense, words play the role of instructions and letters play the role of bytes. [sent-44, score-0.443]
20 For complex instruction sets like the Intel x86, instructions are composed of a variable number of bytes, as words are composed of a variable number of letters. [sent-45, score-0.93]
21 This analogy motivates tackling our problem as predicting a segmentation of the input sequence into blocks of code and blocks of data. [sent-51, score-0.671]
22 An obvious first approach for this task would be to treat it as a sequence labeling problem and train, for example, a linear chain conditional random field (CRF) [2] to tag each byte in the sequence as being the beginning, inside, or outside of a data block. [sent-52, score-0.4]
23 To learn the parameters of the model, we will use structural SVMs to optimize performance according to loss functions that are appropriate for our task, such as the number of incorrect plus missed CPU instructions induced by the segmentation. [sent-57, score-0.628]
24 2 Heuristic tools for analyzing binary executables Tools for statically analyzing binary executables differ in the details of their workings but they all share the same high level logic, which is called recursive disassembly. [sent-61, score-0.512]
25 5 The tool starts by obtaining the address of the first instruction from a specific location inside the executable. [sent-62, score-0.612]
26 All the disassembled instructions would execute one after the other until we reach an instruction that changes the flow of execution. [sent-73, score-0.961]
27 After the execution of an unconditional jump the next instruction to be executed is at the address specified by the jump’s argument. [sent-75, score-0.627]
28 Other control flow instructions are similar to the unconditional jump. [sent-76, score-0.507]
29 A call saves the address of the next instruction and then jumps. [sent-78, score-0.562]
30 The tool places the arguments of the control flow instructions it encounters on a stack of addresses to visit. [sent-80, score-0.469]
31 If the control flow instruction is a conditional jump or a call, it continues disassembling; otherwise it takes the next address that has not yet been disassembled from the stack and repeats. [sent-81, score-0.622]
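As a rough illustration of this high-level loop, here is a minimal Python sketch of recursive disassembly; the decoder decode_instruction(data, pos), its return values, and the instruction kinds are hypothetical stand-ins for a real x86 decoder, not code from the paper or from any of the tools mentioned above.

def recursive_disassembly(data, entry_point, decode_instruction):
    # decode_instruction(data, pos) is assumed to return (length, kind, target):
    # the instruction length in bytes, its kind ('plain', 'conditional', 'call',
    # 'jump' or 'return'), and the statically known target address, or None when
    # the argument is a register whose value is unknown at analysis time.
    stack = [entry_point]        # saved addresses that still need disassembling
    visited = set()              # positions already decoded as instruction starts
    while stack:
        pos = stack.pop()
        while 0 <= pos < len(data) and pos not in visited:
            visited.add(pos)
            length, kind, target = decode_instruction(data, pos)
            if kind in ('jump', 'conditional', 'call') and target is not None:
                stack.append(target)            # remember the argument for later
            if kind in ('plain', 'conditional', 'call'):
                pos += length                   # keep decoding the fall-through path
            else:                               # unconditional jump or return
                break                           # take the next saved address
    return visited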
32 Even though recursive disassembly seems like a robust way of extracting the instructions from a program, there are many reasons why it can fail [6]. [sent-82, score-0.676]
33 Most importantly, the arguments of the control flow instructions do not have to be constants; they can be registers whose values are generally not available during static analysis. [sent-83, score-0.507]
34 Hence, recursive disassembly can run out of addresses long before all the instructions have been extracted. [sent-84, score-0.676]
35 This means that if the call instruction was spanning positions a, . . . , a + ℓ − 1 of the sequence, [sent-94, score-0.608]
36 upon the function’s return the next instruction will be at position a + ℓ + 3, not at a + ℓ. [sent-97, score-0.595]
37 This will give a completely different decoding of the sequence and is called disassembly desynchronization. [sent-98, score-0.322]
38 To return to a text analogy, recursive disassembly parses the sequence “driverballetterrace” as [driver, ballet, terrace] while the actual parsing, obtained by starting three positions down, is [xxx, verbal, letter, race], where x denotes junk data. [sent-99, score-0.395]
39 3 A structured prediction model In this section we will combine ideas from recursive disassembly and structured prediction to derive an expressive and efficient model for predicting the instructions inside a binary executable. [sent-100, score-0.802]
40 As in recursive disassembly, if we are certain that code begins at position i we can unambiguously disassemble the byte sequence starting from position i until we reach a control flow instruction. [sent-101, score-0.697]
41 The trellis graph has vertices bi that denote the possibility that a code block starts at position i. [sent-103, score-0.637]
42 It also has vertices ej and edges (bi , ej ) which denote that disassembling from position i yields a possible code block that spans positions i, . . . , j. [sent-104, score-0.838]
43 Edges (ej , bj+1 ) and (ej , dj+1 ) encode that the next byte after a code block can either be the beginning of another code block, or a data byte respectively. [sent-109, score-0.977]
44 For data blocks no particular structure is assumed and we just use edges (dj , dj+1 ) and (dj , bj+1 ) to denote that a data byte can be followed either by another data byte or by the beginning of a code block respectively. [sent-110, score-0.996]
45 For example, the sequence in Figure 1 contains three code blocks that span positions 1–7, 8–8, and 10–12. [sent-115, score-0.437]
46 6 Some subsequences will produce errors while decoding to assembly because some bytes may not correspond to any instructions. [sent-117, score-0.351]
47 These could never be valid code blocks because they would crash the program. [sent-118, score-0.332]
48 Below this, we show the actual x86 instructions with position 9 being a data byte. [sent-120, score-0.525]
49 We show both the mnemonic instructions and the bytes they are composed of. [sent-121, score-0.664]
50 Disassembling from position i in the sequence yields an interval [i, j] where j is the position where the first encountered control flow instruction ends. [sent-132, score-0.766]
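To make the trellis concrete, the following is a minimal sketch of a left-to-right dynamic program over it; candidate_block(i), score_block(i, j), and score_data(i) are hypothetical helpers standing in for the unambiguous disassembly from position i and for the learned edge scores, and this is only one way such linear-time inference could be organized, not the paper's implementation.

def best_segmentation(n, candidate_block, score_block, score_data):
    # best[i] is the score of the best segmentation of bytes 0..i-1 and
    # back[i] records how position i was reached, so the path can be recovered.
    NEG = float('-inf')
    best = [NEG] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0.0
    for i in range(n):
        if best[i] == NEG:
            continue
        # edge into d_i: treat byte i as data
        s = best[i] + score_data(i)
        if s > best[i + 1]:
            best[i + 1], back[i + 1] = s, ('data', i)
        # edge (b_i, e_j): treat position i as the start of a code block
        j = candidate_block(i)          # end of the block decoded from i, or None
        if j is not None and j < n:
            s = best[i] + score_block(i, j)
            if s > best[j + 1]:
                best[j + 1], back[j + 1] = s, ('block', i)
    blocks, i = [], n                   # follow back-pointers to recover the blocks
    while i > 0:
        kind, start = back[i]
        if kind == 'block':
            blocks.append((start, i - 1))
        i = start
    return best[n], list(reversed(blocks))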
51 The first loss function, which we call block loss, comes from the observation that the beginnings of the code blocks are necessary and sufficient to describe the segmentation. [sent-141, score-0.587]
52 In the case where the inferred ŷ identifies, say, the second instruction in a block as its start, we would like to penalize this less, since the disassembly is still synchronized, and only missed one instruction. [sent-143, score-0.844]
53 Formally, we let y and ŷ be the sets of positions where the instructions start in the two segmentations and we define the instruction loss ∆I(y, ŷ) to be the cardinality of their symmetric difference. [sent-144, score-1.175]
54 For the instruction loss, the positions of the real instructions are y = {1, 4, 6, 8, 10, 11} while the proposed segmentation has ŷ = {2, 9, 10, 11}. [sent-150, score-1.143]
55 Taking the symmetric difference of these sets, we see that the instruction loss has value 6. [sent-151, score-0.578]
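The computation in this example is just the cardinality of a symmetric difference; a two-line check, using the sets quoted above, is:

y_true = {1, 4, 6, 8, 10, 11}   # positions of the real instructions
y_pred = {2, 9, 10, 11}         # positions proposed by the segmentation
print(len(y_true ^ y_pred))     # symmetric difference -> prints 6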
56 If we simply sum the losses over all sequences, then the losses on longer executables may overshadow the losses on shorter ones. [sent-153, score-0.511]
57 To represent each executable equally in the final measure we can normalize our loss functions; for example, we can define the normalized instruction loss to be ∆NI(y, ŷ) = (|y| + |ŷ| − 2|y ∩ ŷ|) / |y|, and we similarly define a normalized block loss ∆NB. [sent-154, score-1.117]
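As a small sketch, both normalized losses can be written as one function of the two sets of start positions (instruction starts for ∆NI, block beginnings for ∆NB):

def normalized_loss(y_true, y_pred):
    # (|y| + |yhat| - 2 |y & yhat|) / |y|: the symmetric difference between the
    # two sets of start positions, normalized by the size of the correct one.
    return (len(y_true) + len(y_pred) - 2 * len(y_true & y_pred)) / len(y_true)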
58 Given a training set (xi, yi), i = 1, . . . , n, of sequences and segmentations we can learn a vector of parameters w that assigns a high score to segmentation yi and a low score to all other possible segmentations of xi. [sent-160, score-0.334]
59 ∀i ∀ȳ ∈ Yi : w⊤Ψ(xi, yi) − w⊤Ψ(xi, ȳ) ≥ ∆(yi, ȳ) − ξi. The constraints of this optimization problem enforce that the difference in score between the correct segmentation yi and any incorrect segmentation ȳ is at least as large as the loss ∆(yi, ȳ). [sent-163, score-0.448]
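How such constraints are typically turned into parameter updates is sketched below with a stochastic subgradient step on the margin-rescaled hinge; the paper trains a structural SVM, but the specific optimizer, the helper names loss_aug_inference and psi, and the step size are assumptions made for this illustration only.

import numpy as np

def structural_svm_step(w, x, y_true, loss_aug_inference, psi, loss, lr=0.1, reg=1e-4):
    # Subgradient of  reg/2 ||w||^2 + max_ybar [ Delta(y, ybar) + w.psi(x, ybar) ] - w.psi(x, y).
    # With the block and instruction losses, Delta decomposes over trellis edges,
    # so loss-augmented inference can reuse the same linear-time dynamic program.
    y_hat = loss_aug_inference(w, x, y_true)        # most violated segmentation
    margin = np.dot(w, psi(x, y_true)) - np.dot(w, psi(x, y_hat))
    g = reg * w                                     # gradient of the regularizer
    if loss(y_true, y_hat) > margin:                # the constraint is violated
        g = g + psi(x, y_hat) - psi(x, y_true)      # subgradient of the hinge term
    return w - lr * g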
60 If bi ∉ y then bi defines an incorrect code block which spans bytes i, . . . , j [sent-180, score-0.733]
61 and c(bi) = 1 + |{k | bk ∈ y ∧ i < k ≤ j}|, capturing that we will introduce one incorrect block and we will skip all the blocks that begin between positions i and j. [sent-183, score-0.418]
62 For the instruction loss, y is a set of instruction positions. [sent-186, score-0.974]
63 If i ∉ y then bi is the beginning of an incorrect block that spans bytes i, . . . , j [sent-188, score-0.56]
64 and produces instructions in a set of positions ỹi. [sent-191, score-0.572]
65 Let s be the first position in this block that gets synchronized with the correct decoding. [sent-192, score-0.327]
66 The first term captures the number of incorrect instructions produced by treating bi as the start of a code block, while the second term captures the number of missed real instructions. [sent-196, score-0.73]
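A simplified sketch of these per-block loss terms is given below; it follows the definitions above for the block loss, while for the instruction loss it simply restricts the symmetric difference to the candidate block, glossing over the exact synchronization bookkeeping described in the text (the helper names and arguments are illustrative, not the paper's).

def block_loss_term(i, j, true_block_starts):
    # c(b_i): choosing an incorrect block [i, j] costs one wrong block plus
    # every true block that begins strictly after i and no later than j.
    if i in true_block_starts:
        return 0
    skipped = sum(1 for k in true_block_starts if i < k <= j)
    return 1 + skipped

def instruction_loss_term(i, j, decoded_starts, true_starts):
    # Incorrect instructions produced by decoding from i, plus real instructions
    # inside [i, j] that this decoding misses.
    produced = set(decoded_starts)
    real = {p for p in true_starts if i <= p <= j}
    return len(produced - real) + len(real - produced)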
67 This approach could not discover all code blocks, especially when the compiler was automatically inserting code that did not exist in the source, such as the calls to destructors generated by C++ compilers. [sent-204, score-0.38]
68 Since the executables we used were 200 common programs from a typical installation of Windows XP, we believe that the outputs of heuristic tools should have little noise. [sent-206, score-0.317]
69 For each edge in the graph, the byte-level features are extracted from an 11-byte window around the source of the edge (so if the source vertex is at position i, the window spans positions i − 5, . . . , i + 5). [sent-216, score-0.469]
70 The features indicate which bytes and byte pairs appear at each position inside the window. [sent-220, score-0.628]
71 An example feature is “does byte c3 appear in position i − 1?” [sent-221, score-0.318]
72 In x86 architectures, when the previous instruction is a return instruction this feature fires. [sent-223, score-1.0]
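A minimal sketch of such byte-level indicator features; only the 11-byte window and the byte/byte-pair indicators come from the text, while the feature-name scheme is made up for the example.

def byte_window_features(data, i, radius=5):
    # Indicators for which byte value and which pair of consecutive byte values
    # appear at each offset of the window i-5 .. i+5 around the source vertex.
    feats = {}
    for off in range(-radius, radius + 1):
        p = i + off
        if 0 <= p < len(data):
            feats['byte_%02x_at_%+d' % (data[p], off)] = 1.0
            if off < radius and p + 1 < len(data):
                feats['pair_%02x%02x_at_%+d' % (data[p], data[p + 1], off)] = 1.0
    return feats

# e.g. byte_window_features(bytes.fromhex('c389d83bc7'), 1)['byte_c3_at_-1'] == 1.0
# corresponds to the "does byte c3 appear in position i - 1?" feature above.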
73 The block-level features are obtained from histograms of instructions that occur in candidate code blocks. [sent-225, score-0.744]
74 We use two kinds of histograms, one where we abstract the values of the arguments of the instructions but keep their type (register, memory location or constant), and one where we completely discard all information about the arguments. [sent-228, score-0.443]
75 An example of the former type of feature would be “number of times the instruction [add register, register] appears in this block”. [sent-229, score-0.487]
76 An example of the latter type of feature would be “number of times the instruction [mov] appears in this block”. [sent-230, score-0.487]
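A sketch of the two histogram variants, assuming each decoded instruction is available as a mnemonic plus a list of operand types; this representation is an assumption about the disassembler output, not the paper's feature code.

from collections import Counter

def block_histogram_features(instructions):
    # instructions: e.g. [('add', ['register', 'register']), ('mov', ['register', 'memory'])]
    feats = Counter()
    for mnemonic, operand_types in instructions:
        feats['op_%s(%s)' % (mnemonic, ','.join(operand_types))] += 1  # keep operand types
        feats['op_%s' % mnemonic] += 1                                 # discard operands
    return feats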
77 Normalized losses (∆NX) are multiplied by the average number of bytes (L̄), instructions (Ī), or blocks (B̄) to bring all numbers to a similar scale. [sent-270, score-0.91]
78 The SVMhmm baseline tags each byte as being the beginning, inside, or outside of a code block using Viterbi and optimizes the Hamming loss. [sent-273, score-0.6]
79 Running a general segmentation model [4] was impractical since inference depends quadratically on the maximum length of the code blocks, which was 2800 in our data. [sent-274, score-0.364]
80 Table 2 shows the results of our comparison for different loss functions (columns): Hamming loss, instruction loss, block loss, and their normalized counterparts. [sent-277, score-0.731]
81 Results for normalized losses have been multiplied by the average number of bytes (L̄), instructions (Ī), or blocks (B̄) to bring all numbers to a similar scale. [sent-278, score-0.938]
82 Greedy starts decoding from the beginning of the sequence and after decoding a block (bi , ej ) it repeats at position j + 1. [sent-280, score-0.594]
83 It only marks a byte as data if the decoding fails, in which case it starts decoding from the next byte in the sequence. [sent-281, score-0.691]
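A sketch of this greedy baseline, reusing the hypothetical candidate_block helper from the inference sketch above:

def greedy_decode(n, candidate_block):
    # Decode a block, continue right after it, and mark a byte as data only
    # when decoding from it fails.
    blocks, data_bytes, i = [], [], 0
    while i < n:
        j = candidate_block(i)            # end of the block decoded from i, or None
        if j is None:
            data_bytes.append(i)          # decoding failed: byte i is data
            i += 1
        else:
            blocks.append((i, min(j, n - 1)))
            i = j + 1
    return blocks, data_bytes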
84 The results suggest that just treating our task as a simple sequence labeling problem at the level of bytes already goes a long way in terms of Hamming loss and block loss. [sent-282, score-0.572]
85 SVMhmm sometimes predicts as the beginning of a code block a position that leads to a decoding error. [sent-283, score-0.525]
86 Since it is not clear how to compute the instruction loss in this case, we do not report instruction losses for this model. [sent-284, score-1.164]
87 For the Hamming loss, this is expected since the SVMwis models are more expressive and a small block loss or instruction loss implies a small Hamming loss, but not vice versa. [sent-291, score-0.862]
88 For the instruction loss, we believe this occurs for two reasons. [sent-292, score-0.487]
89 First, our data consists of benign programs, and for them learning to identify the code blocks may be enough. [sent-293, score-0.375]
90 Second, it may be harder to learn with the instruction loss since its value depends on how quickly each decoding synchronizes with another (the correct) decoding of the stream, something that is not modeled in the feature map we are using. [sent-294, score-0.762]
91 The end result is that the models trained for block loss also attain the smallest losses for all other loss functions. [sent-295, score-0.458]
92 5 Related work and other applications There are two lines of research which are relevant to this work: one is structured prediction approaches for segmenting sequences and the other is research on static analysis techniques for finding code and data blocks in executables. [sent-296, score-0.339]
93 Previous techniques for identifying code blocks in executables have used no or very little statistical learning. [sent-305, score-0.458]
94 For example, [10] and [11] use recursive disassembly and pattern heuristics similarly to currently used tools such as OllyDbg and IdaPro. [sent-306, score-0.321]
95 In that work, the authors use simple statistical models based on unigram and bigram instruction models in addition to the pattern heuristics. [sent-308, score-0.487]
96 However, these approaches make independent decisions for every candidate code block and they have a less principled way of dealing with equally plausible but overlapping code blocks. [sent-309, score-0.461]
97 One way to approximate this feature would be to count how many candidate code blocks have instructions that jump to or call the current position in the sequence. [sent-315, score-0.937]
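A sketch of this approximation; block_targets, holding the statically known jump/call targets found inside each candidate block, is assumed to be available from the disassembly step and is not something the paper defines explicitly.

def call_target_counts(n, block_targets):
    # counts[p] approximates how many candidate code blocks contain a jump or
    # call whose target is position p.
    counts = [0] * n
    for targets in block_targets:
        for t in set(targets):
            if 0 <= t < n:
                counts[t] += 1
    return counts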
98 6 Conclusions In this work we proposed a code segmentation model, SVMwis, that can help security experts in the static analysis of binary executables. [sent-327, score-0.395]
99 We also want to look into richer features, such as some approximation of call consistency (since the actual constraints give rise to NP-hard inference), so that addresses which are targets of call or jump instructions from a code block do not lie inside data blocks. [sent-332, score-1.017]
100 Finally, we plan to extend our model to allow for joint segmentation and classification of the executable as malicious or not. [sent-333, score-0.424]
wordName wordTfidf (topN-words)
[('instruction', 0.487), ('instructions', 0.443), ('byte', 0.236), ('bytes', 0.221), ('executable', 0.204), ('disassembly', 0.173), ('executables', 0.157), ('code', 0.154), ('block', 0.153), ('blocks', 0.147), ('segmentation', 0.134), ('svmwis', 0.11), ('losses', 0.099), ('decoding', 0.092), ('loss', 0.091), ('malicious', 0.086), ('ej', 0.083), ('position', 0.082), ('positions', 0.079), ('disassembling', 0.079), ('svmhmm', 0.079), ('segmentations', 0.075), ('programs', 0.074), ('jump', 0.069), ('hamming', 0.065), ('eax', 0.063), ('ollydbg', 0.063), ('trellis', 0.063), ('bi', 0.063), ('ow', 0.062), ('recursive', 0.06), ('vertices', 0.059), ('sequence', 0.057), ('inside', 0.057), ('tools', 0.052), ('scheduling', 0.052), ('debugging', 0.051), ('labeling', 0.05), ('yi', 0.05), ('inference', 0.049), ('decodings', 0.047), ('edi', 0.047), ('jnz', 0.047), ('ad', 0.044), ('beginning', 0.044), ('dj', 0.043), ('call', 0.042), ('register', 0.041), ('compiler', 0.041), ('sir', 0.041), ('expressive', 0.04), ('stack', 0.04), ('spans', 0.04), ('security', 0.04), ('incorrect', 0.039), ('static', 0.038), ('unconditional', 0.038), ('assembly', 0.038), ('heuristics', 0.036), ('starts', 0.035), ('heuristic', 0.034), ('costs', 0.033), ('address', 0.033), ('interval', 0.032), ('analogy', 0.032), ('features', 0.032), ('crash', 0.031), ('disassembled', 0.031), ('ebx', 0.031), ('idapro', 0.031), ('mov', 0.031), ('nop', 0.031), ('rebels', 0.031), ('usenix', 0.031), ('calls', 0.031), ('missed', 0.031), ('program', 0.031), ('path', 0.031), ('binary', 0.029), ('inc', 0.029), ('graph', 0.028), ('hmm', 0.028), ('svms', 0.028), ('workings', 0.028), ('crf', 0.027), ('fold', 0.027), ('length', 0.027), ('return', 0.026), ('edges', 0.026), ('control', 0.026), ('joachims', 0.026), ('misses', 0.025), ('saved', 0.025), ('confuse', 0.025), ('cpu', 0.025), ('news', 0.025), ('want', 0.025), ('structural', 0.024), ('ads', 0.024), ('trained', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
Author: Nikos Karampatziakis
Abstract: We cast the problem of identifying basic blocks of code in a binary executable as learning a mapping from a byte sequence to a segmentation of the sequence. In general, inference in segmentation models, such as semi-CRFs, can be cubic in the length of the sequence. By taking advantage of the structure of our problem, we derive a linear-time inference algorithm which makes our approach practical, given that even small programs are tens or hundreds of thousands bytes long. Furthermore, we introduce two loss functions which are appropriate for our problem and show how to use structural SVMs to optimize the learned mapping for these losses. Finally, we present experimental results that demonstrate the advantages of our method against a strong baseline. 1
2 0.067737564 77 nips-2010-Epitome driven 3-D Diffusion Tensor image segmentation: on extracting specific structures
Author: Kamiya Motwani, Nagesh Adluru, Chris Hinrichs, Andrew Alexander, Vikas Singh
Abstract: We study the problem of segmenting specific white matter structures of interest from Diffusion Tensor (DT-MR) images of the human brain. This is an important requirement in many Neuroimaging studies: for instance, to evaluate whether a brain structure exhibits group level differences as a function of disease in a set of images. Typically, interactive expert guided segmentation has been the method of choice for such applications, but this is tedious for large datasets common today. To address this problem, we endow an image segmentation algorithm with “advice” encoding some global characteristics of the region(s) we want to extract. This is accomplished by constructing (using expert-segmented images) an epitome of a specific region – as a histogram over a bag of ‘words’ (e.g., suitable feature descriptors). Now, given such a representation, the problem reduces to segmenting a new brain image with additional constraints that enforce consistency between the segmented foreground and the pre-specified histogram over features. We present combinatorial approximation algorithms to incorporate such domain specific constraints for Markov Random Field (MRF) segmentation. Making use of recent results on image co-segmentation, we derive effective solution strategies for our problem. We provide an analysis of solution quality, and present promising experimental evidence showing that many structures of interest in Neuroscience can be extracted reliably from 3-D brain image volumes using our algorithm. 1
3 0.063088134 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Author: Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-liang Yu
Abstract: Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of “loss clipping” can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems. 1
4 0.057908297 181 nips-2010-Network Flow Algorithms for Structured Sparsity
Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski
Abstract: We consider a class of learning problems that involve a structured sparsityinducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems. 1
5 0.055867326 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
Author: Tamir Hazan, Raquel Urtasun
Abstract: In this paper we propose an approximated structured prediction framework for large scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the log-partition function, known as the soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for the structured prediction problem, using duality, based on a local entropy approximation and derive an efficient messagepassing algorithm that is guaranteed to converge. Unlike existing approaches, this allows us to learn efficiently graphical models with cycles and very large number of parameters. 1
6 0.055803291 228 nips-2010-Reverse Multi-Label Learning
7 0.055621333 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression
8 0.053011864 234 nips-2010-Segmentation as Maximum-Weight Independent Set
9 0.052865218 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering
10 0.052295327 91 nips-2010-Fast detection of multiple change-points shared by many signals using group LARS
11 0.052080721 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
12 0.050334755 282 nips-2010-Variable margin losses for classifier design
13 0.050101008 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
14 0.04894729 257 nips-2010-Structured Determinantal Point Processes
15 0.048237376 118 nips-2010-Implicit Differentiation by Perturbation
16 0.047573622 162 nips-2010-Link Discovery using Graph Feature Tracking
17 0.047532018 167 nips-2010-Mixture of time-warped trajectory models for movement decoding
18 0.046784945 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
19 0.046106633 151 nips-2010-Learning from Candidate Labeling Sets
20 0.04565284 235 nips-2010-Self-Paced Learning for Latent Variable Models
topicId topicWeight
[(0, 0.135), (1, 0.03), (2, -0.003), (3, -0.012), (4, -0.04), (5, -0.037), (6, -0.036), (7, 0.013), (8, 0.039), (9, -0.005), (10, -0.067), (11, -0.04), (12, 0.038), (13, 0.074), (14, 0.006), (15, 0.032), (16, -0.012), (17, 0.001), (18, -0.009), (19, -0.068), (20, 0.023), (21, 0.031), (22, 0.026), (23, 0.012), (24, 0.06), (25, -0.064), (26, -0.001), (27, 0.038), (28, 0.015), (29, 0.071), (30, 0.049), (31, 0.051), (32, -0.01), (33, -0.097), (34, -0.07), (35, 0.026), (36, -0.04), (37, -0.054), (38, 0.047), (39, 0.046), (40, 0.048), (41, 0.034), (42, -0.044), (43, -0.074), (44, -0.007), (45, 0.06), (46, 0.032), (47, -0.078), (48, -0.04), (49, -0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.9053039 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
Author: Nikos Karampatziakis
Abstract: We cast the problem of identifying basic blocks of code in a binary executable as learning a mapping from a byte sequence to a segmentation of the sequence. In general, inference in segmentation models, such as semi-CRFs, can be cubic in the length of the sequence. By taking advantage of the structure of our problem, we derive a linear-time inference algorithm which makes our approach practical, given that even small programs are tens or hundreds of thousands of bytes long. Furthermore, we introduce two loss functions which are appropriate for our problem and show how to use structural SVMs to optimize the learned mapping for these losses. Finally, we present experimental results that demonstrate the advantages of our method against a strong baseline. 1
2 0.62242162 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
Author: Tamir Hazan, Raquel Urtasun
Abstract: In this paper we propose an approximated structured prediction framework for large scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the log-partition function, known as the soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for the structured prediction problem, using duality, based on a local entropy approximation and derive an efficient messagepassing algorithm that is guaranteed to converge. Unlike existing approaches, this allows us to learn efficiently graphical models with cycles and very large number of parameters. 1
3 0.59796387 181 nips-2010-Network Flow Algorithms for Structured Sparsity
Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski
Abstract: We consider a class of learning problems that involve a structured sparsityinducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems. 1
4 0.58551466 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
Author: Ronny Luss, Saharon Rosset, Moni Shahar
Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 105 variables and 107 constraints.
5 0.57285756 234 nips-2010-Segmentation as Maximum-Weight Independent Set
Author: William Brendel, Sinisa Todorovic
Abstract: Given an ensemble of distinct, low-level segmentations of an image, our goal is to identify visually “meaningful” segments in the ensemble. Knowledge about any specific objects and surfaces present in the image is not available. The selection of image regions occupied by objects is formalized as the maximum-weight independent set (MWIS) problem. MWIS is the heaviest subset of mutually non-adjacent nodes of an attributed graph. We construct such a graph from all segments in the ensemble. Then, MWIS selects maximally distinctive segments that together partition the image. A new MWIS algorithm is presented. The algorithm seeks a solution directly in the discrete domain, instead of relaxing MWIS to a continuous problem, as common in previous work. It iteratively finds a candidate discrete solution of the Taylor series expansion of the original MWIS objective function around the previous solution. The algorithm is shown to converge to an optimum. Our empirical evaluation on the benchmark Berkeley segmentation dataset shows that the new algorithm eliminates the need for hand-picking optimal input parameters of the state-of-the-art segmenters, and outperforms their best, manually optimized results.
6 0.52067792 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods
7 0.4801861 29 nips-2010-An Approximate Inference Approach to Temporal Optimization in Optimal Control
8 0.47491866 77 nips-2010-Epitome driven 3-D Diffusion Tensor image segmentation: on extracting specific structures
9 0.46454114 118 nips-2010-Implicit Differentiation by Perturbation
10 0.46051651 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
11 0.45207757 50 nips-2010-Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories
12 0.45143706 269 nips-2010-Throttling Poisson Processes
13 0.45045918 187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization
14 0.44985121 290 nips-2010-t-logistic regression
15 0.44727162 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
16 0.43528566 1 nips-2010-(RF)^2 -- Random Forest Random Field
17 0.43212384 228 nips-2010-Reverse Multi-Label Learning
18 0.42975017 2 nips-2010-A Bayesian Approach to Concept Drift
19 0.42654946 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions
20 0.42623222 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms
topicId topicWeight
[(13, 0.019), (17, 0.018), (27, 0.036), (30, 0.046), (35, 0.011), (45, 0.671), (50, 0.028), (52, 0.012), (60, 0.021), (77, 0.034), (78, 0.014), (90, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99917281 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
Author: Nikos Karampatziakis
Abstract: We cast the problem of identifying basic blocks of code in a binary executable as learning a mapping from a byte sequence to a segmentation of the sequence. In general, inference in segmentation models, such as semi-CRFs, can be cubic in the length of the sequence. By taking advantage of the structure of our problem, we derive a linear-time inference algorithm which makes our approach practical, given that even small programs are tens or hundreds of thousands of bytes long. Furthermore, we introduce two loss functions which are appropriate for our problem and show how to use structural SVMs to optimize the learned mapping for these losses. Finally, we present experimental results that demonstrate the advantages of our method against a strong baseline. 1
2 0.99850756 235 nips-2010-Self-Paced Learning for Latent Variable Models
Author: M. P. Kumar, Benjamin Packer, Daphne Koller
Abstract: Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state of the art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition. 1
3 0.99846035 11 nips-2010-A POMDP Extension with Belief-dependent Rewards
Author: Mauricio Araya, Olivier Buffet, Vincent Thomas, Françcois Charpillet
Abstract: Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly implies reducing the uncertainty on the state. To that end, we introduce ρPOMDPs, an extension of POMDPs where the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, what makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes. 1
4 0.99831086 187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization
Author: Alper Ayvaci, Michalis Raptis, Stefano Soatto
Abstract: We tackle the problem of simultaneously detecting occlusions and estimating optical flow. We show that, under standard assumptions of Lambertian reflection and static illumination, the task can be posed as a convex minimization problem. Therefore, the solution, computed using efficient algorithms, is guaranteed to be globally optimal, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance. 1
5 0.99791217 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
Author: Daniel Lowd, Pedro Domingos
Abstract: Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2 -F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2 -G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines. 1
6 0.99745899 167 nips-2010-Mixture of time-warped trajectory models for movement decoding
7 0.99597913 100 nips-2010-Gaussian Process Preference Elicitation
8 0.99563926 133 nips-2010-Kernel Descriptors for Visual Recognition
9 0.99548185 108 nips-2010-Graph-Valued Regression
10 0.98184025 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
11 0.97755772 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
12 0.97623962 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
13 0.97317111 144 nips-2010-Learning Efficient Markov Networks
14 0.97198808 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering
15 0.97124714 197 nips-2010-Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets
16 0.96944046 118 nips-2010-Implicit Differentiation by Perturbation
17 0.96884298 61 nips-2010-Direct Loss Minimization for Structured Prediction
18 0.96880662 212 nips-2010-Predictive State Temporal Difference Learning
19 0.96711659 94 nips-2010-Feature Set Embedding for Incomplete Data
20 0.96605021 25 nips-2010-Active Learning by Querying Informative and Representative Examples