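Searching for Statistical Diagrams: paper presented at the U.S. National Academy of Engineering 2011 U.S. Frontiers of Engineering Symposium (excerpt), English original, 20120027-8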

Searching for Statistical Diagrams
Shirley Zhe Chen, Michael J. Cafarella, and Eytan Adar
University of Michigan
INTRODUCTION
Statistical, or data-driven, diagrams are an important method for communicating complex information. For many technical documents, the diagrams may be readers’ only access to the raw data underlying the documents’ conclusions.
Unfortunately, finding diagrams online is very difficult using current search systems. Standard text-based search will only retrieve the diagrams’ enclosing documents. Web image search engines may retrieve some diagrams, but they generally work by examining the textual content that surrounds images, and so miss many important signals of diagram content (Bhatia et al., 2010; Carberry et al., 2006). Even the text that is present in a diagram has a meaning that depends heavily on its geometric position within the diagram’s frame; a number in the caption means something quite different from the same number in the x-axis scale (Bertin, 1983).
There has been growing commercial interest in making data-driven diagrams more accessible, with data search systems such as SpringerImages (http://www.springerimages.com/) and Zanran (http://www.zanran.com/q/). Although there is a large research literature on search and image-related topics, diagram search per se is largely unexplored.
In this paper we propose a Web search engine exclusively for data-driven diagrams. As with other Web search engines, our system allows the user to enter keywords into a text box in order to obtain a relevance-ranked list of objects. Our system addresses several challenges that are common among different search engines but that require solutions specifically tailored for data-driven diagrams.
Diagram Corpus Extraction
Obtaining the text of a Web document is usually as easy as downloading and parsing an HTML file; in contrast, statistical diagrams require special processing to extract useful information. Three properties make this difficult. First, diagrams are embedded in PDFs with little to distinguish them from surrounding text. Second, the text embedded in a diagram is highly stylized, with meaning that is very sensitive to the text’s precise role. Third, because diagrams are often an integral part of a highly engineered document, they can have extensive “implicit hyperlinks” in the form of figure references from the body of the surrounding text. Our Diagram Extractor component attempts to recover all of the relevant text for a diagram and determine an appropriate semantic label (caption, y-axis label, etc.) for each string.
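To make the idea of a position-dependent semantic label concrete, the following is a minimal sketch that labels a text string purely from where it sits relative to the diagram's frame. It is an illustration only, not the Diagram Extractor itself: the `TextBox` type, the `label_text_box` function, the normalized coordinate convention, the 0.1 and 0.9 thresholds, and the label set beyond "caption" and "y-axis label" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """A text string and the center of its bounding box (hypothetical type).

    Coordinates are normalized to the diagram frame: x runs from 0.0 (left
    edge) to 1.0 (right edge), y from 0.0 (bottom edge) to 1.0 (top edge).
    Values outside that range lie outside the frame.
    """
    text: str
    x: float
    y: float


def _is_number(s: str) -> bool:
    """Return True if the string parses as a number (commas allowed)."""
    try:
        float(s.replace(",", ""))
        return True
    except ValueError:
        return False


def label_text_box(box: TextBox) -> str:
    """Assign an illustrative semantic label from position alone.

    The thresholds are assumptions chosen for this sketch, not values
    taken from the paper.
    """
    numeric = _is_number(box.text)
    if box.y < 0.0:
        return "caption"            # below the frame
    if box.y > 0.9:
        return "title"              # top strip of the frame
    if box.x < 0.1:
        return "y-axis scale" if numeric else "y-axis label"
    if box.y < 0.1:
        return "x-axis scale" if numeric else "x-axis label"
    return "legend"                 # anything left inside the plot area


# The same string receives different labels in different positions.
print(label_text_box(TextBox("2010", x=0.5, y=0.05)))   # x-axis scale
print(label_text_box(TextBox("2010", x=0.5, y=-0.1)))   # caption
```

A fixed rule set like this is brittle; it is shown only because it makes the dependence of meaning on position easy to see.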
Ranking Quality
All search engines must figure out how to score an object’s relevance to a search query, but scoring diagrams for relevance can yield strange and surprising results. We use the metadata extracted from the previous step to obtain search quality that is substantially better than naive methods.
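One way extracted metadata could feed into ranking is to weight a query match by the semantic label of the string it occurs in, rather than treating all of a diagram's text as a single bag of words. The sketch below illustrates that idea under stated assumptions: the `FIELD_WEIGHTS` values, the `score` and `rank` functions, and the dictionary layout of a diagram record are hypothetical and are not DiagramFlyer's actual ranking function.

```python
# Hypothetical per-field weights: a match in a caption counts for more than
# a match in a legend entry. The numbers are illustrative, not tuned.
FIELD_WEIGHTS = {
    "caption": 3.0,
    "title": 3.0,
    "x-axis label": 2.0,
    "y-axis label": 2.0,
    "legend": 1.0,
}


def score(query, fields, weights=FIELD_WEIGHTS):
    """Sum weighted counts of query terms across a diagram's labeled text.

    `fields` maps a semantic label to the text extracted under that label.
    Labels missing from `weights` default to a weight of 1.0.
    """
    terms = query.lower().split()
    total = 0.0
    for label, text in fields.items():
        tokens = text.lower().split()
        matches = sum(tokens.count(term) for term in terms)
        total += weights.get(label, 1.0) * matches
    return total


def rank(query, diagrams):
    """Return the diagrams sorted by descending score for the query."""
    return sorted(diagrams, key=lambda d: score(query, d["fields"]), reverse=True)


diagrams = [
    {"id": "b", "fields": {"legend": "population growth"}},
    {"id": "a", "fields": {"caption": "population by year"}},
]
print([d["id"] for d in rank("population", diagrams)])  # ['a', 'b']
```

With uniform weights the two example diagrams would tie, which is the kind of case where a naive bag-of-words score cannot tell a diagram about the query topic from one that merely mentions it.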
Snippet Generation
Small summaries of the searched-for content, usually called snippets, allow users to quickly scan a large number of results before actually selecting one. Conventional search engines select regions of text from the original documents, while image search engines generally scale down the original image to a small thumbnail. Neither technique can be directly applied to data-driven diagrams.
Obviously, textual techniques will not capture any visual elements. Figure 1 shows that image scaling is also ineffective: although photos and images remain legible at smaller sizes, diagrams quickly become difficult to understand.
This paper describes DiagramFlyer, a search engine for finding data-driven diagrams in Web documents. It addresses each of the above challenges, yielding a search engine that successfully extracts diagram metadata in order to provide both higher-quality ranking and improved diagram “snippets” for fast search result scanning.
The techniques we propose are general and can work across diagrams found throughout the Web. However, in our current testbed we concentrate on diagrams extracted from PDFs that were discovered and downloaded from public Web pages on academic Internet domains. Our resulting corpus contains 153,000 PDFs and 319,000 diagrams. We show that DiagramFlyer obtains a 52% improvement in search quality over naive approaches. Furthermore, we show that DiagramFlyer’s hybrid snippet generator allows users to find results 33% more accurately than with a standard image-driven snippet. We also place DiagramFlyer’s intellectual contributions in a growing body of work on domain-independent information extraction—techniques that enable retrieval of structured data items from unstructured documents, even when the number of topics (or domains) is unbounded.
 