For some time I have been watching the growth of the vocabularies
used on a number of Teletext pages, mostly carrying traffic and
weather information. The languages are Dutch, English, Danish,
Swedish and German.
Some statistics are collected on a daily basis:
- A matrix containing along the diagonal the number of
different word forms used on each page (not lemmatized, not
corrected for spelling), and for each pair of pages the size of
the intersection (lower left) and the union (upper right) of
their vocabularies.
- For each of the pages, the most recently added new words.
- For all pages together, a (semi-)graphical presentation of
the growth of the word form vocabulary over time.
- For every page and topic, a full list of word forms (not
corrected for spelling errors) and their frequencies. By
clicking on a column header you can sort the list
alphabetically or by frequency.
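The statistics listed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the program that actually produces these pages: a "word form" is assumed to be a maximal run of letters, lowercased, and the names `word_forms`, `vocab_matrix` and `VocabularyTracker` are hypothetical. Each page's vocabulary is treated as a set of word forms; the matrix puts the vocabulary sizes on the diagonal, pairwise intersection sizes below it and pairwise union sizes above it.

```python
import re
from collections import Counter

# Assumption: a "word form" is a maximal run of Unicode letters, lowercased.
# As on the pages described here, no lemmatization or spelling correction
# is applied.
WORD_RE = re.compile(r"[^\W\d_]+")


def word_forms(text):
    """Return the word forms in a page's text, in order of occurrence."""
    return [w.lower() for w in WORD_RE.findall(text)]


def vocab_matrix(vocabs):
    """Square matrix over pages: diagonal = vocabulary size,
    lower left (row > column) = intersection size,
    upper right (row < column) = union size."""
    names = list(vocabs)
    n = len(names)
    m = [[0] * n for _ in range(n)]
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            va, vb = vocabs[a], vocabs[b]
            if i == j:
                m[i][j] = len(va)
            elif i > j:
                m[i][j] = len(va & vb)
            else:
                m[i][j] = len(va | vb)
    return names, m


class VocabularyTracker:
    """Hypothetical helper that accumulates word forms per page
    across daily snapshots."""

    def __init__(self):
        self.freq = {}    # page -> Counter of word forms
        self.added = {}   # page -> list of (day, word), in order first seen
        self.growth = []  # (day, distinct word forms over all pages so far)

    def add_day(self, day, pages):
        """pages: mapping from page name to the text captured that day."""
        for page, text in pages.items():
            counts = self.freq.setdefault(page, Counter())
            new = self.added.setdefault(page, [])
            for w in word_forms(text):
                if w not in counts:
                    new.append((day, w))  # first time this page uses w
                counts[w] += 1
        all_words = set()
        for counts in self.freq.values():
            all_words.update(counts)
        self.growth.append((day, len(all_words)))

    def vocabularies(self):
        """Page name -> set of word forms seen so far."""
        return {p: set(c) for p, c in self.freq.items()}

    def recent_new_words(self, page, k=10):
        """The k most recently added word forms for a page."""
        return [w for _, w in self.added.get(page, [])[-k:]]

    def frequency_list(self, page, by="frequency"):
        """(word, count) pairs, sorted by descending frequency
        (ties alphabetical) or purely alphabetically."""
        items = list(self.freq.get(page, Counter()).items())
        if by == "alphabetical":
            return sorted(items)
        return sorted(items, key=lambda kv: (-kv[1], kv[0]))
```

For example, after feeding two daily snapshots with `add_day`, calling `vocab_matrix(tracker.vocabularies())` returns the page names together with the matrix, and `frequency_list(page, by="alphabetical")` corresponds to sorting the frequency list on the other column.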
Here are some examples (the active ones are updated every day):
- [inactive] Dutch NOS Teletekst pages 751, 752, 753 and 754
with information for train passengers. Statistics, word frequency list
- [inactive] Dutch NOS Teletekst page 755
with information for bus passengers. Statistics, word frequency list
- [inactive] Dutch NOS Teletekst pages 762 and 763
with information for airline passengers. Statistics, word frequency list
- [inactive] Dutch NOS Teletekst pages 730, 731, 738 and 739
(road traffic information). Statistics, word frequency list
- [inactive] Dutch NOS Teletekst pages 504 and 505
(stock market reports). Statistics, word frequency list
- [inactive] Dutch NOS Teletekst pages 703, 706 and 707
(weather forecasts). Statistics, word frequency list
Data collection from the NOS Teletekst pages above stopped
on April 1, 2005, because automatic data collection from
the NOS site is no longer allowed.
- Belgian (Flemish) TV1 Teletekst pages 301 and 302
(weather forecast). Statistics, word frequency list
- Danish Tekst-TV pages 402 and 403
(weather forecasts). Statistics, word frequency list
- Swedish SVT Text page 401
(weather forecast). Statistics, word frequency list
- German ARD Text-TV pages 171, 172 and 173
(weather forecasts). Statistics, word frequency list
- [inactive] Irish AERTEL Teletext pages 161, 162 and 163
(weather forecasts). Statistics, word frequency list
Data collection stopped on January 30, 2007, because the pages
are no longer available in text format.
Recording started at different dates, the earliest in 1994;
the recordings not marked [inactive] are still ongoing.