The days of the printed dictionary are over. Wolfgang Klein on the advantages of the digital lexicon, forgotten words, and the problem of youth language.

Interview by Kathrin Heinrich

The “Center for Digital Lexicography of the German Language” (ZDL) is to become the largest dictionary of the German language: a comprehensive digital system that allows readers interactive use and research. Wolfgang Klein is a linguist and head of the new ZDL in Berlin, as he was of its predecessor project, the “Digital Dictionary of the German Language” (DWDS). In this interview he explains how new words are found and why some never make it into the dictionary.

SZ: Mr. Klein, how many words do you use?

Wolfgang Klein: Present-day German has approximately five million words. They are also really in use, as our data from the period between 1995 and 2005 show. An individual speaker has an active vocabulary of only several thousand words, but of course understands many more.

And the Center for Digital Lexicography lists these five million words?

The goal is to describe the entire German vocabulary very carefully, in all possible aspects. The methods are quite different from those of the classic printed dictionary; it is rather a digital lexical system. We are initially planning 200,000 full articles. In part we can draw on older sources in the DWDS, which we are now revising.

What is the difference between your system and the Duden, which is also available online?

It can cover far more words, and it has more features. Clickable statistics show curves, for example, of when in history a word appeared. But above all, the user can jump directly to a text source and look things up for themselves.

And how do you find the words?

We work with various digital collections of texts bundled by theme, called corpora. For the historical part, we have a selection of about 3,000 texts in total, spread over the period between 1600 and 2000. This corpus includes Faust and the Critique of Pure Reason, but also cookbooks, psychological works, and newspapers. For the contemporary part, we work with many different corpora, but mainly with newspapers. Some of them we can make accessible to the reader in full text. We also use social media; there are blog and chat corpora. Technically that is not a problem, but legally it is difficult, a grey zone. We have a staff member for web crawling who searches for material that may be downloaded.

How quickly can you map current changes in the language?

Our text collections always lag a little behind, by about a year and a half. The main reason is that newspapers sometimes make corrections that we need to take into account.

Wolfgang Klein is head of the “Center for Digital Lexicography of the German Language” in Berlin.

(photo: WK; BBAW)

Presumably you do not have to work your way from A to Z, as with a printed dictionary?

No, we proceed rather according to the importance of words. That is determined by the frequency of a word, by whether it occurs in different types of text, and by whether it is a simple word: simple words are more important than compounds. The depth of description can vary according to a tiered system. Sometimes just a few details are needed, and the grammar can be automated.

What counts as a simple word for you?

Actually, “mouse” would be a good example. It is a small, grey creature. But for about 30 years it has had another meaning that also needs to be described, namely the computer mouse.

“We are trying to trace the change in meaning”

How do you capture such changes?

In Göttingen we have a unit of our own that describes individual word histories. There, words are analyzed in context; it is almost a bit like storytelling. Take a word such as “immigration,” for example: we ask how it arose, how it behaves, when it becomes more frequent or is replaced by a different word. So we are trying to trace the change in meaning. For five million words, that is feasible only to a limited extent.

How do you deal with words that have been removed from the language, as in the dictionary clean-up after the Nazi era?

Yes, that is a political problem. We thought long and hard when we selected the corpus for the 20th century. It includes the texts from the Nazi period. This was not an easy decision, but they are simply part of the development of the German language. You have to be able to study the texts, after all. Some of them carry a note, but we have no intention of telling people what to do. Rather, they should be able to inform themselves as comprehensively and reliably as possible and think for themselves.

Do words fall into oblivion?

One such word from the Grimms’ dictionary is “dalest”. It died out a very long time ago, and even today nobody knows what it means. The curious thing is that Wilhelm Grimm edited the entry himself and was very unsure. He did not know himself what it means.

Is it common for a word to disappear?

Only rarely. It is not forbidden; it simply falls out of use. There are also words like “weiland”. This is really a very old-fashioned word, which means “at one time”. You would think Goethe might have used it, but even in his time it was already out of date. Yet in the newer text collections, for instance in Die Zeit, it is used when you want to give a sentence an old-fashioned flair. The most beautiful example, which I found a few years ago, read “as weiland Sarah Palin”, because she is already weiland (laughs).

What about current words that are found not in newspapers but in schoolyards?

It is difficult to get hold of authentic youth language. If you put young people in front of a microphone, then of course they do not talk the way they normally talk. Every year a lexicon of youth language appears, but these words are mostly invented. Youth language in Germany is not uniform; rather, it is an age-typical form of colloquial language. It shows itself, for example, in obscene words, but also in typical expressions such as “Ey”.

Could it be that fashionable words are never recorded because they fly under your radar?

Yes, absolutely. Of course, we are dependent on our data. A large gap, which grieves me very much, is spoken language. Our main sources are newspapers, simply because they have gigantic amounts of material, and in digital form. You do not need to scan it. But how would you get at everyday spoken language? We have some corpora, but you cannot use them directly in their spoken form; you have to transcribe them laboriously.

Are there still words that surprise even you?

My favorite German word for some time now has been the “jail gap”. Because nobody in the world will believe you. Imagine a Japanese person having to say that! And: “interest chicken”.

What is an interest chicken?

In the past, farmers had to deliver their tithes in kind: so-and-so many bushels of wheat, cords of wood, or head of cattle. This was collected very harshly. Gradually it changed to money, but symbolically a living creature still had to be paid, namely the interest chicken.