The MLIA area is highly multidisciplinary, drawing on information retrieval, natural language processing, machine translation and summarization, speech processing, and human-computer interaction. Although research was initially confined mainly to textual data, CLEF has over the years successfully expanded its coverage to other media, such as audio and images, in order to stimulate research into multilingual multimedia retrieval systems.
User studies in the MLIA area have received scant attention from the scientific community, partly because of their cost (much higher than that of running batch experiments) and partly because of the difficulty of establishing evaluation methodologies that are both realistic (performed in real-world scenarios) and scientifically well-grounded (performed under laboratory-controlled conditions).
TrebleCLEF will address the needs of (at least) three types of users with strong interests: a) multilingual system developers; b) business companies with a potential interest in MLIA system software (the potential market for system developers); c) end users with information needs that transcend language barriers.
Test Collection Creation
Many researchers assume that constructing test collections demands great effort and can be afforded only by well-funded organisations or through extensive collaboration among large numbers of researchers. Current evaluation campaigns reinforce this belief. However, this view overlooks the large body of current research on new methodologies that allow test collections to be built more efficiently, together with new measures that work well with such collections. TrebleCLEF aims to identify and collate the latest research on methods for forming test collections quickly and efficiently, and to identify new evaluation methodologies and metrics specifically designed and tuned for use in a multilingual context.
Individual researchers and small groups are usually unable to run large-scale, systematic experiments over a large set of experimental collections and resources in order to improve the understanding of MLIA systems and gain a comprehensive picture of their behaviour across languages. TrebleCLEF will address this gap by promoting and coordinating a series of systematic “grid-experiments” that re-use the valuable resources and experimental collections made available by CLEF, in order to gain more insight into the effectiveness of the various weighting schemes and retrieval techniques with respect to the languages, and to disseminate this knowledge to the relevant application communities.
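As a rough illustration of what a grid of experiments could look like, the sketch below enumerates one run configuration for every combination of language, weighting scheme and stemming option. The factor levels and the names `build_grid`, `LANGUAGES`, `WEIGHTING_SCHEMES` and `STEMMING` are assumptions chosen for the example; the text does not specify which languages, schemes or techniques the grid-experiments cover.

```python
from itertools import product

# Hypothetical factor levels; the project text names no specific values.
LANGUAGES = ["de", "fr", "it"]
WEIGHTING_SCHEMES = ["tf-idf", "bm25"]
STEMMING = [False, True]


def build_grid(languages, schemes, stemming):
    """Return one run configuration per combination of factor levels."""
    return [
        {"language": lang, "weighting": scheme, "stemming": stem}
        for lang, scheme, stem in product(languages, schemes, stemming)
    ]


grid = build_grid(LANGUAGES, WEIGHTING_SCHEMES, STEMMING)
print(len(grid))  # 3 languages x 2 schemes x 2 stemming options
```

Each configuration would then be run against the corresponding collection and scored with the same measure, so that results can be compared across languages while holding the other factors fixed.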
Language Resources for MLIA
TrebleCLEF will support the development of high-priority language resources for Multilingual Information Access in a systematic, standards-driven, collaborative learning context. Priority requirements will be assessed through consultations with language industry and communication players, and a protocol and roadmap will be established for developing a set of language resources for all technologies related to MLIA.