Translating and the Computer 31 Conference
Day One: 19 November 2009
08.30 Registration
09.00 Introduction by Chair: Daniel Grasmick, Lucy Software and Services, Germany
09.05 CHANGE TO THE ADVERTISED PROGRAMME:
MyMemory, creating the world's largest translation memory
Marco Trombetti of TRANSLATED
This paper will present a fast-growing 5 billion word collaborative translation memory. The presentation will focus on how collaborative models and open standards can help to create value for customers, LSPs and translators. Technical challenges and barriers to collaboration will be discussed. Translated will also propose an open standard that will allow CAT producers to connect with large web-based translation memories, and will suggest new business models for the industry. MyMemory is available for beta testing at http://mymemory.translated.net
09.45 Discussion
09.50 CHANGE TO THE ADVERTISED PROGRAMME:
Yours or Mine or What? The simple complexity of copyright of translations
Abraham de Wolf, Lucy Software and Services, Germany
The presentation will offer the audience a basic structure with which copyright issues can be identified and understood. The focus will be on the use of software for translation: under what conditions who owns what, and who else might hold rights of use that exist alongside ownership rights. Copyright aspects of the use of free internet translation machines and collaborative translation platforms will also be looked at. To round things off, some basic recommendations will be given on how to address copyright issues in contracts with customers, freelancers and employees.
10.30 Discussion
10.35 Coffee
11.00 Designing a Collaborative Multilingual Terminology Platform
Alain Désilets, National Research Council of Canada
Terminology Databases are still one of the most commonly used tools by translators, and corpus-based tools like Translation Memories and Statistical Machine Translation with Post Editing have not yet displaced them. One disadvantage of terminology databases, compared to such corpus-based approaches, is that they are costly to create, since they rely on human labour. It has been suggested that the labour cost of building large databases could be distributed across a large number of individuals, through a massive collaboration process like Wikipedia. We refer to this process as Collaborative Multilingual Terminology (CMT). This paper describes how we designed and implemented software for supporting CMT, by combining features from both database management systems and collaborative wiki systems. This system, which we refer to as Tiki-CMT, is built on top of the TikiWiki Content Management System.
11.35 Discussion
11.40 Towards an Effective Tool-kit for translators
Andreas Eisele, Universität des Saarlandes, Germany, and James Hodson, Princeton University, USA
As advances in the field of machine translation (MT) continue to allow for greater distribution of multi-lingual information, an increasing number of web based tool-kits are being developed. However, these systems (such as Google's Translator Tool-kit) have failed to satisfy the translation community due to a fundamental misunderstanding of the way translators work. The aim of this project has been to reduce the time that translators spend on the whole transfer process - by realising that an MT core can provide much more than a raw translation, even at this relatively early stage of its development. This paper evaluates the shortcomings of existing systems and proposes a new integrated online-toolkit that addresses these limitations based on an understanding of the most significant and unnecessary struggles between translators and MT systems.
12.15 Discussion
12.20 Computer Aided Translation Backed by Machine Translation
Ondrej Odchazel and Ondrej Bojar, Charles University Prague
The aim of this paper is to show the development of an online application to support translators (computer-aided translation, CAT) based on modern AJAX technology on the client side and the Moses machine translation (MT) system on the server side. The tool simplifies and accelerates the process of text translation. The paper investigates methods that tightly couple an MT system with an online text editor. The proposed tool differs from other CAT tools in its ability to suggest translations of unseen sentences and/or to present several candidate translations of the complete sentence. The paper will show that, with a good user interface, an MT system can accelerate and simplify the process of translation.
12.55 Discussion
13.00 Lunch and Exhibition
14.15 Introduction by Chair – Olaf-Michael Stefanov, Austria
14.15 Panel Discussion: Crowdsourcing - Chaired by Alain Désilets, National Research Council of Canada and Olaf-Michael Stefanov, Austria, with input from Rajat Gupta, University of Limerick, Ireland
Translation crowdsourcing - the process of having translation work done by a large group of potentially amateur translators - is slowly gaining popularity as a business model for certain corporations. The most highly publicised example of this is Facebook, the popular social networking web site that put localization of its user interface in the hands of end users. In this panel, various experts will touch on many questions that are on the lips of professional translators and translation managers. For example:
- Does it actually save money?
- Does quality suffer, and if so, how to do quality control in this kind of context?
- What technological tools and what new processes are needed to manage a crowd of translators?
- Is crowdsourcing more effective in some circumstances than in others?
- Will this lead to a de-skilling of translation? Will professional translators ultimately lose their jobs to a mob of unskilled drones?
- Are there hybrid models where the "crowd" might consist of a large group of loosely coordinated professional translators?
- Can crowdsourcing, combined with Machine Translation, and yes, traditional translation by professionals, work synergetically to bridge the growing gap between supply and demand for translation?
15.15 Tea break
15.40 Minna no Hon’yaku: A Website for Hosting, Archiving and Promoting Translations
National Institute of Informatics, Japan
This paper introduces the system "Minna no Hon'yaku" ("Translation for everyone"; henceforth MNH). MNH is a web site for aiding online translators, facilitating the development of online translators' communities by hosting translation activities, archiving translations with their originals, and publishing translated information. In MNH, the strategy for archiving translations and originals is simple: translations and their originals are found by manually searching the World Wide Web and are then stored in MNH. In previous research, 160 English and Japanese texts (approximately 1.4 million English words) were collected in this way. The design principle of the hosting service on MNH is threefold: 1) volunteer translators can translate their texts easily by using tools available on MNH; 2) they can publish their translations directly using the MNH platform; 3) translation activities are socialized and translation communities nurtured. The MNH website has been publicly available since April 2009. At the time of abstract submission, more than 550 translators have registered and more than 1,350 translation document pairs have accumulated, which accounts for more than 11,000 translation block pairs. Although similar services have become available, such as Google Translation Toolkit, the quality of the translation-aid functions and hosting functions provided by MNH remains unique.
16.15 Discussion
16.20 Cloud Computing in GILT Ecosystems and Evolution
Sven Christian Andrä, Andrä AG and Jörg Schütz, bioloom group, Germany
With the ever-growing volumes of dynamic content, enterprises are faced with serious difficulties in handling, managing and analysing information in many languages and across different cultural boundaries. Therefore, there is a need to employ more open and collaborative approaches based on recent Web technologies and the concepts of utility computing to allow the existing language ecosystems to successfully evolve to the next generation of technology offerings. One recent innovation driver in this scenario is cloud computing. Cloud computing is the continuous development of a variety of technologies that can alter an enterprise's approach to building, maintaining and leveraging an IT infrastructure. For the language industry and the GILT service communities in particular, however, it serves as a next-generation globalisation enabler. It offers new ways of building, offering and delivering translingual services that will further transmute into transcultural services. This presentation will discuss the various emerging language ecosystems with new collaborative approaches and business models based on cloud computing technology platforms.
16.55 Discussion
17.00 End of Day One, followed by drinks reception to 19.00
Day Two: 20 November
09.00 Introduction by Chair: Chris Pyne, SAP, Germany
09.05 Copyright Issues in Translation Memory Ownership
Ross Smith, PricewaterhouseCoopers, Spain
Over the last two decades the utilisation of terminological databases and translation memory (TM) software has become increasingly popular in the professional translation community. Specialised web sites have recently been created in order to trade in these “TM assets”. A decisive step has therefore been taken, as translation memory and termbank files can now be bought, sold and licensed as individual commodities. Who actually owns these assets? In other words, who holds the intellectual property rights to these resources and what legal protection is available to them? The purpose of this paper is to clarify the basic issues with regard to understanding who owns what, in the light of the studies published to date by specialists on the subject and particularly from the perspective of intellectual property legislation in the European Union and the United States.
09.45 Discussion
09.50 Exploring Translation Memory for Extensibility Across Genres: Implications for Usage and Metrics
Carol Van Ess-Dykema, National Virtual Translation Center, Washington, DC, USA
TM has shown itself in its best light for translating document sets with large amounts of recurring text, either across the breadth of a document (e.g., bibliographic footers on journal articles) or across the document’s history (e.g., unchanged passages in an updated system manual). Less is known about the benefit of TM for translating other genres, such as transcripts of audio interviews or email traffic. Passages in such genres will likely be less recurrent than in system manuals, yet there may be significant potential for TM in these genres nonetheless. The National Virtual Translation Center (NVTC), along with the Naval Research Laboratory (NRL) and the National Institute of Standards and Technology (NIST), has conducted the first in a series of experiments on the usability and effectiveness of TM for non-assessed genres. A variety of different measures have been captured, perhaps most importantly the assessment by professional translators, and indirectly by quality control professionals, of the net benefit of translation memory for the translation process. This paper is a report of the methodology, execution, and analysis of this pilot experiment. Drawing on the actions and assessments of professional users, the experiment measures the potential contribution of a TM system to a series of controlled translation and quality control activities from Russian, Arabic, and Chinese into English.
10.25 Discussion
10.30 Coffee
11.00 Language Technology for Multilingual Information and Document Management
Paul Schmidt and Mahmoud Gindiyeh, University of Saarbrücken, Germany
The paper introduces a multilingual information processing system for expert knowledge (technical engineering) developed in a project funded by the German Ministry of Economy. The innovation mainly lies in the approach to term processing, and in a specific integration of a vector based approach to classification. The system has been developed to be used in the actual workflow. The paper will present all the components in detail. The innovation of the system is that it is based on high quality language technology. The paper will provide figures that measure progress compared to other technology. The technology can be used for all kinds of multilingual retrieval tasks.
11.35 Discussion
11.40 Tiptoeing Towards TBX
Dave Calvert, TransForm, Germany
Affordable strategies for terminological research at a language service provider are extremely constrained. Internet discipline, the use of known research sources such as bilingual corpuses, and a structured approach to using Google are essential. Sites based on user-generated content are useful when approached with caution. Where sourced terminology is not available, a combination of informed creativity and consistent practice has to fill the gap. Terminology records generated during the course of a translation job should be validated before storage. Further constraints on terminology work in an LSP include software and file format compatibility, and access to specific packages. Taking TransForm GmbH as an example, our terminology is stored in separate MultiTerm 5.5 databases for each customer. We send out presegmented Word files with inserted terminology for translation. External translators supply terminology in MS Word glossaries. In-house translators use the intranet-based system. The consequences of this situation include restricted interoperability, a need to run concurrent, incompatible systems, and pressure to upgrade to extremely expensive server-based solutions. We thus face an increasing need for a means of consolidating and maintaining terminology independently of proprietary formats. The emergence of the open standard TBX, which is being adopted by many of the main TM system vendors, has opened the door to fulfilling this need.
12.15 Discussion
12.20 Automatic Indexing and Concordances for Any Language
Neil Rees & Jon Riding, British and Foreign Bible Society, UK
Many translations are prepared for constituencies where the general knowledge of the text may be quite low. Identifying the key pericopes of scripture in an extent of more than 1,000 pages can be challenging. Help for the reader is most commonly given in the form of a concordance which lists the important narratives and other key areas of the text indexed by the key words which most closely represent the themes of the text. Concordances are not restricted to the Bible. Many important literary corpora have concordances to aid study as do some technical and academic publications. Creating such indexing systems can be time consuming and costly, particularly where a document is translated into many languages, all of which require similar indexes to be built. The British & Foreign Bible Society Concordance Builder system lets any Bible Society build a short concordance to a completed translation project. Taking an existing concordance as a model the system glosses the model key words against a new translation. This takes about 3 minutes. The results can be edited by the translation team. The program has been used by Bible Societies all over the world for concordances of different lengths. The easy to use editing interface can be localised for different languages. The presentation will conclude with a demonstration of the Concordance Builder program. Its benefits and limitations will be discussed. Samples of completed concordances produced by this method will be shown.
12.55 Discussion
13.00 Lunch and Exhibition
14.00 Introduction by Chair: Professor Ruslan Mitkov, University of Wolverhampton, UK
14.00 Panel Discussion: Memories are made of this
Chairs: Chris Pyne, SAP, Germany and Reinhard Schäler, University of Limerick, Ireland
Translation Memories (TMs) bring down the cost of translation. Their use leads to better translations, as they enforce consistency. Reusing previous translations shortens translation time. They are the property of the clients who pay for the translations. Most would agree with these statements. Members of this panel are going to shed the cold light of reality on these myths. You will never think of Translation Memories in the same way again!
15.00 Tea
15.20 Moderating Strong Accents
Dr L. Baghai-Ravary, University of Oxford, UK
Modern labour-intensive, communication-based industries, such as call centres, are increasingly outsourced to Asian countries where a dialect of English is widely spoken and the pool of suitable staff is large. Despite the distances involved, this is highly cost-effective, but it is not without its drawbacks. This presentation will examine the differences between a strong non-native accent and southern British English from an engineering and signal processing perspective. Both timing and acoustic factors will be considered. It will then go on to outline those aspects of the speech which are primarily associated with accent, and those which determine "speaker identity". A preliminary system will then be proposed to ameliorate the perceived differences between the accents, so as to aid comprehension by the listener and improve the "naturalness" of their communication.
15.55 Discussion
16.00 Mapping Mechanism for Localisation and Translation
Lamine Aouad, University of Limerick, Ireland
The idea of cloud computing is not new. The cloud delivers storage and processing services, thereby reducing the cost and the computer expertise required. Automated localisation workflows can greatly benefit from the cloud, making translation and localization services more readily available to translators. However, the quality of the overall localisation and translation process is of prime importance and a challenging issue. This paper will present a mechanism that will be able to take into account predefined requirements and quality parameters to efficiently map an abstract localisation workflow to the available cloud services. This paper will also describe general issues in mapping application workflows and present some results in improving application performance through multiple mapping techniques in other areas such as scientific and business workflows.
16.35 Discussion
16.40 Creating Multimedia Localisation Training Materials – The Process and Resources for eCoLoMedia
Alina Secară, University of Leeds, UK
This presentation explores the online resources developed as part of a European collaborative project in the domain of translator training and discusses the impact that the rise of entertainment and cultural industries has had in this field. It presents the results of a needs analysis survey as well as an overview of how some project-specific resources can be used in translator training environments. Demonstrations of materials created and their localized versions in a variety of European languages – Flash clips, video and audio files and a game – will be presented together with scenarios for their integration into vocational translator training.
17.15 Discussion
17.20 Close of Conference
Please note: Aslib reserves the right to make changes to the publicised programme without prior notice.