ELSNET's First Roadmap Workshop

Aims of the Workshop

One of the items on ELSNET's agenda for the period 2000-2001 is to develop views on and visions of the longer term future of the field of language and speech technologies and neighbouring areas, also called ELSNET's Road Map for Human Language Technologies. As a first step in this process, ELSNET's Research Task group is organizing a brainstorming workshop with a number of prominent researchers and developers from our community.

The title of the workshop is: "How will language and speech technology be used in the information world of 2010? Research challenges and infrastructure needs for the next ten years"

The workshop was organised by Niels Ole Bernsen and Steven Krauwer.

The main result of the workshop is the First Roadmap Report (edited by Ole Bernsen, in PDF), which is mostly focused on speech-related technologies.



November 23-24, 2000

Report by J.M.Pardo


This report is a summary of what happened as seen by the author. It is not an attempt to cover the course of the workshop in full detail (it is not a minutes document), so there are inevitably some omissions, which are the sole responsibility of the author.

The workshop was held in the pleasant and quiet Hotel Savoy by the sea in the city of Katwijk aan Zee (The Netherlands). The location was a very good choice for the workshop because its relaxed environment gave rise to fruitful discussions and the emergence of new ideas. It was a kind of group spiritual retreat.


Paul Heisterkamp (Daimler-Chrysler Research Ulm)
Arjan van Hessen (COMSYS / U. Twente Utrecht)
Pierre Isabelle (XEROX Grenoble)
José Pardo (ELSNET / Univ.Politecnica Madrid)
Oliviero Stock (IRST Trento)
Hans Uszkoreit (DFKI Saarbruecken)
Antonio Zampolli (ELSNET / U. Pisa)
Niels Ole Bernsen (ELSNET / NISLab Odense)
Steven Krauwer (ELSNET / U. Utrecht)


The workshop was organised as follows. First, every person present gave a "short" presentation of their position paper; most of the papers were available in advance. Then, after all the presentations, a proposal for the continuation of the work was made and approved.

Although in theory every presentation had half an hour, the discussions started from the very beginning and covered not only the position paper of the presenter but also the objective of the workshop in general, so the time actually taken by each presentation varied a lot depending on how much discussion took place. The time left for the final discussion was only about 2 hours.


  1. Write a document on the vision and views of the ELSNET community about how the area of Speech and Language will develop in the next 10 years. The document has a double audience: the researchers of the community and decision makers. It is the kind of exercise that the ELSNET community also did in the early nineties, and which had an important impact on the subsequent development of the field. Most of the considerations and predictions of that early document turned out to be adopted by the community.
  2. Define the procedure for the continuation of this exercise.
  3. Contribute to the WP3 work package of the ELSNET contract.

Presentations and discussions

The workshop started with the presentation of Paul Heisterkamp (see his paper). The presentation was very lively and prompted a lot of discussion, together with nicely included visual jokes that helped communication between the attendees. I will stress only some points that I wrote down.

The first point was about the concept of the roadmap itself. Some talked about a terrain map instead of a roadmap. Some talked about a prediction map instead of a planning map. A planning map can be drawn up for a time span of five years or less, where some input from the planner (in terms of investment of money and effort) can be considered to influence the roadmap.

The difference between a roadmap and a terrain map is this: with a roadmap we drive along a single road with milestones on it, whereas a terrain map shows mountains and valleys (i.e. different events that may eventually happen) and different roads that can be taken to the same (or a different) destination. I think that after the discussion it was clear that our exercise was a predicting roadmap rather than a planning roadmap, and a roadmap (i.e. a single dimension in time) rather than a terrain map (a two-dimensional map with alternative, perhaps mutually exclusive, roads). But the roadmap is going to be full of geographical accidents that will shape the road itself.

Some other concepts were discussed; one was evaluation itself. Evaluation can be applied to a technology, to the users (field evaluation) and to the business model. One of the main problems is how to predict the performance of a system on a task different from the one on which it was evaluated (this is what the application provider wants to know in advance). Another important point was the fact that a good or new technology is not necessarily the best one to use in a new application (see the example of how to solve the problem of saturation of telephone lines caused by many simultaneous calls after a "CALL NOW" ad on TV following an event or contest: the first cheap solution is not automatic speech recognition but a call scheduler that keeps the calls on the line for some time until they are answered by a human operator).

Another concept that came up was the innovativeness or strength of Europe compared to the USA and Japan. It would be good to stress the stronger points of European technology in order to have some chance of good returns on investment. One of the strengths of Europe compared to the USA and Japan is assumed to be the handling of mixed-initiative dialogue (e.g. Philips and Daimler-Chrysler; Nuance systems are very much system driven).

Another interesting point mentioned was the cost problem. A technology may not be applied because of its cost. In the car environment memory is very limited because of cost, and for this reason TTS with large memory requirements is not used. Rule-based TTS is not used either, because of its low quality.

Another subject of discussion was that of the local maximum (i.e. a technology such as HMMs can lead to a local maximum, and moving away from it will inevitably lead to lower performance until a new breakthrough beats the local maximum with a new maximum higher than the previous one).

Oliviero Stock gave an interesting presentation, also continuously interrupted, about his view of the future. One of the areas that will grow in the future is automatic extraction of information. Information extraction is a need. Another area is educational entertainment: language-based interaction for educational games. Life-like agents will be developed more and more. In many systems, output will be more important than input (the system knows who you are, where you are and what you want without asking you, and it will give you personalized information). Medical information will use language technology more and more. "The disappearing computer": in ten years the computer (laptop) will disappear in its current form and will be embedded in objects (e.g. a wooden table will embed a screen and a microphone, and no keyboard will be around); your files will be stored on a remote server and will be retrieved to your screen when asked for, without the need to archive all your files locally on a laptop.

Call centers will be half automatic: messages will originate in one form (a long text, or one language) and will be translated into a different form (a summary of it, or both a summary and a different language). More traditional applications will also go on.

On the research side, statistical models will continue and will diverge more and more from linguistic theory. Linguistic theory is also moving. (In my view, the statistical model is just another linguistic model.) The way technology is evaluated should be reconsidered, e.g. in terms of modularity, the possibility of merging with other techniques, and the potential to lead to new results, although this latter kind of evaluation is very difficult to carry out since it is very subjective. Text planning models will be important for rhetorical speaking (or writing), automatic generation of humor, etc. The field of analyzing and generating persuasion in language will also be an important topic. Another topic is group communication (between several humans and several machines), in contrast to one-to-one communication. Oliviero also thinks that language processing will come back to artificial intelligence. Niels added that it will also come back to cognitive science.

Arjan van Hessen said that current TTS is only good for 30 seconds of speech; beyond that it becomes boring. Speech processors will be moved from the server to the devices themselves.

Hans Uszkoreit presented a roadmap as a mixture of predictions and plans, but it was more of the planning roadmap type mentioned above (i.e. the planner has a role as an actor in making the plan happen). He believes that future communication will be both between human and machine and between human and the so-called "infostructure". The infostructure contains the collective memory. He also presented the concept of a knowledge base, into which the information has to be transformed. Information itself is like raw data without organisation. Knowledge is organized information that is immediately accessible, densely connected and suitable for inferences. The aim will be to give information extraction a structure, instead of retrieving a long list of data that has to be processed by the user.

Steven Krauwer predicts that in 10 years the language barrier will no longer be an obstacle for us. Then the question arose of whether translation will happen or not. Pierre said that "something that doesn't pass the Turing test should not be called translation". Others held that the goal is not to make a translator similar to a human translator (that will never happen, given all the considerations that a human takes into account when making a translation) but to achieve both translation tools and translator tools. An interesting fact is that less than 3% of translation today is done by machines or with the help of machines.

José Pardo presented his view (see the paper), making a low-risk, long-term prediction over 50 years. Even so, many people in the audience objected that some of the topics will need 300 or more years to come about. His talk was reduced to 15 minutes (35 of elapsed time including discussions) because of the short time remaining and because many of the topics had already been discussed.

Niels Ole Bernsen presented a very good, well-organized paper (also distributed in advance of the meeting) that covers most of the points and could be the basis for the long report document (see below about the short report document and the long report document).


Page generated 02-01-2014 by Steven Krauwer