
171 hunch net-2006-04-09-Progress in Machine Translation

meta infos for this blog

Source: html

Introduction: I just visited ISI where Daniel Marcu and others are working on machine translation. Apparently, machine translation is rapidly improving. A particularly dramatic year was 2002->2003 when systems switched from word-based translation to phrase-based translation. From a (now famous) slide by Charles Wayne at DARPA (which funds much of the work on machine translation) here is some anecdotal evidence: 2002 2003 insistent Wednesday may recurred her trips to Libya tomorrow for flying. Cairo 6-4 ( AFP ) – An official announced today in the Egyptian lines company for flying Tuesday is a company “insistent for flying” may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment. And said the official “the institution sent a speech to Ministry of Foreign Affairs of lifting on Libya air, a situation her recieving replying are so a trip will pull to Libya a morning Wednesday.” E

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I just visited ISI where Daniel Marcu and others are working on machine translation. [sent-1, score-0.081]

2 A particularly dramatic year was 2002->2003 when systems switched from word-based translation to phrase-based translation. [sent-3, score-0.449]

3 From a (now famous) slide by Charles Wayne at DARPA (which funds much of the work on machine translation) here is some anecdotal evidence: 2002 2003 insistent Wednesday may recurred her trips to Libya tomorrow for flying. [sent-4, score-0.552]

4 Cairo 6-4 ( AFP ) – An official announced today in the Egyptian lines company for flying Tuesday is a company “insistent for flying” may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment. [sent-5, score-1.254]

5 And said the official “the institution sent a speech to Ministry of Foreign Affairs of lifting on Libya air, a situation her recieving replying are so a trip will pull to Libya a morning Wednesday. [sent-6, score-0.734]

6 ” Egyptair has tomorrow to Resume Its flight to Libya. [sent-7, score-0.269]

7 Cairo 4-6 (AFP) – said an official at the Egyptian Aviation Company today that the company egyptair may resume as of tomorrow, Wednesday its flight to Libya after the International Security Council resolution to the suspension of the embargo imposed on Libya. [sent-8, score-1.114]

8 “The official said that the company had sent a letter to the Ministry of Foreign Affairs, information on the lifting of the air embargo on Libya, where it had received a response, the firt take off a trip to Libya on Wednesday morning”. [sent-9, score-1.087]

9 The machine translation systems are becoming effective at the “produces mostly understandable although broken output” level. [sent-10, score-0.424]

10 A service might deliver translations of web pages into your native language. [sent-12, score-0.534]

11 When properly integrated into the web browser, it will appear as if every webpage uses your native language (although maybe in a broken-but-understandable way). [sent-14, score-0.282]

12 An instant message service might deliver translations into whichever language you specify allowing communication with more people. [sent-16, score-0.58]

13 At this point, the feasibility of these applications is a matter of engineering and “who pays for it” coordination rather than technology development. [sent-17, score-0.168]

14 There remain significant research challenges in tackling nonstudied language pairs and in improving the existing technology. [sent-18, score-0.081]

15 ) where the machine translation version of a Turing test is passed: humans can not distinguish between a machine translated sentence and a human translated sentence. [sent-20, score-0.711]

16 A key observation here is that machine translation does not require full machine understanding of natural language. [sent-21, score-0.505]

17 The source of machine translation success seems to be a combination of better models (switching to phrase-based translation made a huge leap), application of machine learning technology, and big increases in the quantity of data available. [sent-22, score-0.848]
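
Sentence 17 credits the switch from word-based to phrase-based translation for much of the improvement. The toy sketch below only illustrates the difference in the unit of translation; it is not how the systems described in the post work (those are statistical and also handle reordering). The lexicon entries, the table names and the greedy longest-match strategy are all assumptions made for this illustration.

```python
# Toy lexicons: every entry is invented for illustration only.
WORD_TABLE = {
    "la": "the", "pomme": "apple", "de": "of",
    "terre": "earth", "est": "is", "chaude": "hot",
}
PHRASE_TABLE = {
    ("pomme", "de", "terre"): "potato",
}


def translate_word_based(tokens):
    """Translate each token on its own; unknown tokens pass through."""
    return [WORD_TABLE.get(t, t) for t in tokens]


def translate_phrase_based(tokens, max_len=3):
    """Greedy longest-match lookup in the phrase table, falling back
    to the word table when no phrase starts at the current position."""
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No phrase matched: translate the single word.
            out.append(WORD_TABLE.get(tokens[i], tokens[i]))
            i += 1
    return out


sentence = "la pomme de terre est chaude".split()
print(" ".join(translate_word_based(sentence)))
print(" ".join(translate_phrase_based(sentence)))
```

The word-based pass renders the idiom piece by piece, while the phrase-based pass maps the three-word unit to a single target word, which is the kind of gain the post attributes to the 2002->2003 change.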

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('libya', 0.437), ('translation', 0.343), ('wednesday', 0.222), ('official', 0.194), ('company', 0.187), ('tomorrow', 0.166), ('afp', 0.125), ('cairo', 0.125), ('council', 0.125), ('egyptair', 0.125), ('egyptian', 0.125), ('embargo', 0.125), ('foreign', 0.125), ('insistent', 0.125), ('lifting', 0.125), ('ministry', 0.125), ('translations', 0.125), ('trips', 0.125), ('said', 0.115), ('flying', 0.111), ('instant', 0.111), ('affairs', 0.111), ('resume', 0.111), ('service', 0.111), ('trip', 0.111), ('web', 0.104), ('translated', 0.103), ('morning', 0.103), ('flight', 0.103), ('deliver', 0.097), ('native', 0.097), ('security', 0.092), ('air', 0.089), ('sent', 0.086), ('language', 0.081), ('machine', 0.081), ('today', 0.079), ('imposed', 0.075), ('international', 0.075), ('technology', 0.062), ('whichever', 0.055), ('coordination', 0.055), ('dramatic', 0.055), ('funds', 0.055), ('letter', 0.055), ('trace', 0.055), ('tuesday', 0.055), ('switched', 0.051), ('leap', 0.051), ('pays', 0.051)]
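
The word weights above and the similarity values below come from a tf-idf model whose exact weighting scheme is not stated on this page. As a rough sketch of the general idea, assuming one common variant (term frequency times log inverse document frequency, L2-normalized, compared by cosine similarity), the computation could look like this. The function names and the tiny corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {word: weight} dict per
    document, L2-normalized. Assumed weighting: tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vec = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {word: w / norm for word, w in vec.items()}
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Dot product of two sparse vectors that are already L2-normalized."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


# Hypothetical three-document corpus.
docs = [
    ["libya", "translation", "libya"],
    ["translation", "machine"],
    ["bayes", "net"],
]
vecs = tfidf_vectors(docs)
print(sorted(vecs[0].items(), key=lambda kv: -kv[1]))
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Words that are frequent in one post but rare across the corpus (here "libya") receive the largest weights, which matches the shape of the list above; a post compared with itself scores about 1.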

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 171 hunch net-2006-04-09-Progress in Machine Translation

2 0.072805032 160 hunch net-2006-03-02-Why do people count for learning?

Introduction: This post is about a confusion of mine with respect to many commonly used machine learning algorithms. A simple example where this comes up is Bayes net prediction. A Bayes net is a directed acyclic graph over a set of nodes where each node is associated with a variable and the edges indicate dependence. The joint probability distribution over the variables is given by a set of conditional probabilities. For example, a very simple Bayes net might express: P(A,B,C) = P(A | B,C)P(B)P(C) What I don’t understand is the mechanism commonly used to estimate P(A | B, C). If we let N(A,B,C) be the number of instances of A,B,C then people sometimes form an estimate according to: P’(A | B,C) = N(A,B,C) / N /[N(B)/N * N(C)/N] = N(A,B,C) N /[N(B) N(C)] … in other words, people just estimate P’(A | B,C) according to observed relative frequencies. This is a reasonable technique when you have a large number of samples compared to the size of the space A x B x C, but it (nat
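
The counting estimate quoted above, P’(A | B,C) = N(A,B,C) N / [N(B) N(C)], can be sketched in a few lines. This is only an illustration of the formula as written: the function name and the sample data are hypothetical, and the formula relies on B and C being independent, as the factorization P(A | B,C)P(B)P(C) assumes.

```python
def relative_frequency_estimate(samples, a, b, c):
    """Estimate P'(A=a | B=b, C=c) as N(a,b,c) * N / (N(b) * N(c)).

    samples: list of (A, B, C) tuples. Returns None when b or c was
    never observed, since the estimate is undefined in that case.
    """
    n = len(samples)
    n_abc = sum(1 for s in samples if s == (a, b, c))
    n_b = sum(1 for s in samples if s[1] == b)
    n_c = sum(1 for s in samples if s[2] == c)
    if n_b == 0 or n_c == 0:
        return None
    return n_abc * n / (n_b * n_c)


# Hypothetical data in which B and C are empirically independent:
# each (B, C) combination appears exactly twice.
samples = [
    (1, 0, 0), (0, 0, 0),
    (1, 0, 1), (1, 0, 1),
    (0, 1, 0), (0, 1, 0),
    (1, 1, 1), (0, 1, 1),
]
print(relative_frequency_estimate(samples, a=1, b=1, c=1))
```

On this data the estimate agrees with the direct conditional frequency N(A,B,C) / N(B,C); when B and C are correlated in the sample the two can differ, and the product form can even exceed 1.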

3 0.05799602 95 hunch net-2005-07-14-What Learning Theory might do

Introduction: I wanted to expand on this post and some of the previous problems/research directions about where learning theory might make large strides. Why theory? The essential reason for theory is “intuition extension”. A very good applied learning person can master some particular application domain yielding the best computer algorithms for solving that problem. A very good theory can take the intuitions discovered by this and other applied learning people and extend them to new domains in a relatively automatic fashion. To do this, we take these basic intuitions and try to find a mathematical model that: Explains the basic intuitions. Makes new testable predictions about how to learn. Succeeds in so learning. This is “intuition extension”: taking what we have learned somewhere else and applying it in new domains. It is fundamentally useful to everyone because it increases the level of automation in solving problems. Where next for learning theory? I like the a

4 0.056664135 297 hunch net-2008-04-22-Taking the next step

Introduction: At the last ICML, Tom Dietterich asked me to look into systems for commenting on papers. I’ve been slow getting to this, but it’s relevant now. The essential observation is that we now have many tools for online collaboration, but they are not yet much used in academic research. If we can find the right way to use them, then perhaps great things might happen, with extra kudos to the first conference that manages to really create an online community. Various conferences have been poking at this. For example, UAI has set up a wiki, COLT has started using Joomla, with some dynamic content, and AAAI has been setting up a “student blog”. Similarly, Dinoj Surendran set up a twiki for the Chicago Machine Learning Summer School, which was quite useful for coordinating events and other things. I believe the most important thing is a willingness to experiment. A good place to start seems to be enhancing existing conference websites. For example, the ICML 2007 papers pag

5 0.056026131 239 hunch net-2007-04-18-$50K Spock Challenge

Introduction: Apparently, the company Spock is setting up a $50k entity resolution challenge. $50k is much less than the Netflix challenge, but it’s effectively the same as Netflix until someone reaches 10%. It’s also nice that the Spock challenge has a short duration. The (visible) test set is of size 25k and the training set has size 75k.

6 0.055477329 121 hunch net-2005-10-12-The unrealized potential of the research lab

7 0.053150758 464 hunch net-2012-05-03-Microsoft Research, New York City

8 0.052356929 194 hunch net-2006-07-11-New Models

9 0.051367559 424 hunch net-2011-02-17-What does Watson mean?

10 0.04891992 97 hunch net-2005-07-23-Interesting papers at ACL

11 0.046962984 146 hunch net-2006-01-06-MLTV

12 0.045563377 70 hunch net-2005-05-12-Math on the Web

13 0.043902796 84 hunch net-2005-06-22-Languages of Learning

14 0.043731507 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?

15 0.043362662 94 hunch net-2005-07-13-Text Entailment at AAAI

16 0.042292677 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

17 0.041208878 50 hunch net-2005-04-01-Basic computer science research takes a hit

18 0.04101846 143 hunch net-2005-12-27-Automated Labeling

19 0.040791374 120 hunch net-2005-10-10-Predictive Search is Coming

20 0.040674862 316 hunch net-2008-09-04-Fall ML Conferences

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.089), (1, -0.009), (2, -0.063), (3, 0.035), (4, -0.015), (5, -0.008), (6, -0.009), (7, 0.035), (8, -0.0), (9, -0.038), (10, -0.024), (11, 0.005), (12, 0.0), (13, 0.017), (14, -0.04), (15, 0.002), (16, -0.02), (17, 0.007), (18, 0.024), (19, -0.024), (20, 0.004), (21, -0.032), (22, 0.041), (23, -0.024), (24, -0.008), (25, 0.034), (26, -0.013), (27, -0.012), (28, 0.008), (29, -0.008), (30, 0.024), (31, 0.079), (32, 0.055), (33, -0.0), (34, -0.027), (35, -0.006), (36, -0.007), (37, -0.018), (38, -0.03), (39, -0.049), (40, 0.07), (41, -0.017), (42, -0.012), (43, -0.046), (44, -0.049), (45, 0.045), (46, 0.057), (47, -0.019), (48, 0.081), (49, 0.008)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.92004895 171 hunch net-2006-04-09-Progress in Machine Translation

2 0.53993165 63 hunch net-2005-04-27-DARPA project: LAGR

Introduction: Larry Jackal has set up the LAGR (“Learning Applied to Ground Robotics”) project (and competition) which seems to be quite well designed. Features include: Many participants (8 going on 12?) Standardized hardware. In the DARPA grand challenge contestants entering with motorcycles are at a severe disadvantage to those entering with a Hummer. Similarly, contestants using more powerful sensors can gain huge advantages. Monthly contests, with full feedback (but since the hardware is standardized, only code is shipped). One of the premises of the program is that robust systems are desired. Monthly evaluations at different locations can help measure this and provide data. Attacks a known hard problem. (cross country driving)

3 0.53180438 424 hunch net-2011-02-17-What does Watson mean?

Introduction: Watson convincingly beat the best champion Jeopardy! players. The apparent significance of this varies hugely, depending on your background knowledge about the related machine learning, NLP, and search technology. For a random person, this might seem evidence of serious machine intelligence, while for people working on the system itself, it probably seems like a reasonably good assemblage of existing technologies with several twists to make the entire system work. Above all, I think we should congratulate the people who managed to put together and execute this project—many years of effort by a diverse set of highly skilled people were needed to make this happen. In academia, it’s pretty difficult for one professor to assemble that quantity of talent, and in industry it’s rarely the case that such a capable group has both a worthwhile project and the support needed to pursue something like this for several years before success. Alina invited me to the Jeopardy watching party

4 0.52229464 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations

Introduction: Machine learning algorithms have a much better chance of being widely adopted if they are implemented in some easy-to-use code. There are several important concerns associated with machine learning which stress programming languages on the ease-of-use vs. speed frontier. Speed The rate at which data sources are growing seems to be outstripping the rate at which computational power is growing, so it is important that we be able to eke out every bit of computational power. Garbage collected languages (java, ocaml, perl and python) often have several issues here. Garbage collection often implies that floating point numbers are “boxed”: every float is represented by a pointer to a float. Boxing can cause an order of magnitude slowdown because an extra nonlocalized memory reference is made, and accesses to main memory can be many CPU cycles long. Garbage collection often implies that considerably more memory is used than is necessary. This has a variable effect. I

5 0.51937026 291 hunch net-2008-03-07-Spock Challenge Winners

Introduction: The spock challenge for named entity recognition was won by Berno Stein, Sven Eissen, Tino Rub, Hagen Tonnies, Christof Braeutigam, and Martin Potthast.

6 0.51176614 84 hunch net-2005-06-22-Languages of Learning

7 0.51094681 94 hunch net-2005-07-13-Text Entailment at AAAI

8 0.49170986 178 hunch net-2006-05-08-Big machine learning

9 0.47882855 125 hunch net-2005-10-20-Machine Learning in the News

10 0.47322479 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?

11 0.46755767 239 hunch net-2007-04-18-$50K Spock Challenge

12 0.44554889 389 hunch net-2010-02-26-Yahoo! ML events

13 0.43635324 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design

14 0.433281 276 hunch net-2007-12-10-Learning Track of International Planning Competition

15 0.42807111 120 hunch net-2005-10-10-Predictive Search is Coming

16 0.42628959 237 hunch net-2007-04-02-Contextual Scaling

17 0.42395762 112 hunch net-2005-09-14-The Predictionist Viewpoint

18 0.4217301 162 hunch net-2006-03-09-Use of Notation

19 0.42124122 423 hunch net-2011-02-02-User preferences for search engines

20 0.41622305 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.017), (10, 0.019), (26, 0.579), (27, 0.061), (38, 0.021), (53, 0.026), (55, 0.064), (64, 0.012), (94, 0.052), (95, 0.032)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94302732 171 hunch net-2006-04-09-Progress in Machine Translation

2 0.92562598 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound

Introduction: Textbooks invariably seem to carry the proof that uses Markov’s inequality, moment-generating functions, and Taylor approximations. Here’s an easier way. For , let be the KL divergence between a coin of bias and one of bias : Theorem: Suppose you do independent tosses of a coin of bias . The probability of seeing heads or more, for , is at most . So is the probability of seeing heads or less, for . Remark: By Pinsker’s inequality, . Proof Let’s do the case; the other is identical. Let be the distribution over induced by a coin of bias , and likewise for a coin of bias . Let be the set of all sequences of tosses which contain heads or more. We’d like to show that is unlikely under . Pick any , with say heads. Then: Since for every , we have and we’re done.

3 0.81137753 17 hunch net-2005-02-10-Conferences, Dates, Locations

Introduction: Conference Locate Date COLT Bertinoro, Italy June 27-30 AAAI Pittsburgh, PA, USA July 9-13 UAI Edinburgh, Scotland July 26-29 IJCAI Edinburgh, Scotland July 30 – August 5 ICML Bonn, Germany August 7-11 KDD Chicago, IL, USA August 21-24 The big winner this year is Europe. This is partly a coincidence, and partly due to the general internationalization of science over the last few years. With cuts to basic science in the US and increased hassle for visitors, conferences outside the US become more attractive. Europe and Australia/New Zealand are the immediate winners because they have the science, infrastructure, and English in place. China and India are possible future winners.

4 0.63121516 305 hunch net-2008-06-30-ICML has a comment system

Introduction: Mark Reid has stepped up and created a comment system for ICML papers which Greger Linden has tightly integrated. My understanding is that Mark spent quite a bit of time on the details, and there are some cool features like working latex math mode. This is an excellent chance for the ICML community to experiment with making ICML year-round, so I hope it works out. Please do consider experimenting with it.

5 0.57457316 97 hunch net-2005-07-23-Interesting papers at ACL

Introduction: A recent discussion indicated that one goal of this blog might be to allow people to post comments about recent papers that they liked. I think this could potentially be very useful, especially for those with diverse interests but only finite time to read through conference proceedings. ACL 2005 recently completed, and here are four papers from that conference that I thought were either good or perhaps of interest to a machine learning audience. David Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation. (Best paper award.) This paper takes the standard phrase-based MT model that is popular in our field (basically, translate a sentence by individually translating phrases and reordering them according to a complicated statistical model) and extends it to take into account hierarchy in phrases, so that you can learn things like “X ‘s Y” -> “Y de X” in Chinese, where X and Y are arbitrary phrases. This takes a step toward linguistic syntax for MT, whic

6 0.50813264 25 hunch net-2005-02-20-At One Month

7 0.4003143 43 hunch net-2005-03-18-Binomial Weighting

8 0.19633867 202 hunch net-2006-08-10-Precision is not accuracy

9 0.19286625 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)

10 0.19042151 160 hunch net-2006-03-02-Why do people count for learning?

11 0.18632786 144 hunch net-2005-12-28-Yet more nips thoughts

12 0.18324696 423 hunch net-2011-02-02-User preferences for search engines

13 0.18168209 454 hunch net-2012-01-30-ICML Posters and Scope

14 0.18043107 40 hunch net-2005-03-13-Avoiding Bad Reviewing

15 0.1802198 157 hunch net-2006-02-18-Multiplication of Learned Probabilities is Dangerous

16 0.18007153 437 hunch net-2011-07-10-ICML 2011 and the future

17 0.17837845 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making

18 0.17765164 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers

19 0.17757078 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

20 0.17752105 464 hunch net-2012-05-03-Microsoft Research, New York City