Electronic corpora are large, systematically organized collections of naturally occurring texts. For language learners, open-access corpora are a rich source of examples of how a target language is used. Many contemporary corpora are equipped with user-friendly search and visualization tools that learners can use, under a teacher’s guidance or on their own, as supplementary tools for language learning. One such resource for German is Das Digitale Wörterbuch der Deutschen Sprache (DWDS). Despite its name (roughly, the Digital Dictionary of the German Language), the DWDS is far more than a dictionary. It is an open-access suite of electronic resources on word usage in historical and contemporary German (Wortauskunftssystem zur deutschen Sprache in Geschichte und Gegenwart) that is regularly updated by the Berlin-Brandenburg Academy of Sciences and Humanities in Germany.
The DWDS is organized as several subcorpora, each comprising thousands of texts in different genres. Its core corpus (Kernkorpus) is balanced by time and by text type: each decade of 20th- and 21st-century German is represented by approximately 100,000 words divided equally among four text types: fiction, nonfiction (e.g., guides and manuals), science, and journalism. Additionally, the DWDS includes subcorpora of historical texts that go as far back as the 15th century, several newspaper corpora, and a number of specialized corpora, including transcribed oral interviews, film subtitles, and blogs. A user can search each of these subcorpora for specific words, word combinations, or word classes (e.g., nouns, adjectives, main or auxiliary verbs). The search results are presented as concordances, that is, stacked lines of examples with the search words bolded. Concordances can be displayed in three views: sentence, extended context (three sentences), or KWIC (Key Word in Context: truncated sentences with the search word centered). Each example also contains a link to the bibliographic citation and, if available on the internet, to the full text from which the example was taken.
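To make the KWIC view concrete, here is a minimal Python sketch of how such a display can be built from a list of sentences. It is an illustration only, not the DWDS implementation: the function name `kwic`, the `width` parameter, the square brackets standing in for bold type, and the two sample sentences are all assumptions made for this example.

```python
import re

def kwic(sentences, keyword, width=30):
    """Build KWIC lines: the keyword is centered and the context on
    each side is truncated to at most `width` characters."""
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            left = sentence[:match.start()][-width:]   # last `width` chars before the hit
            right = sentence[match.end():][:width]     # first `width` chars after the hit
            # Right-align the left context so all keywords line up in one column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample sentences (not taken from the DWDS corpora).
sample = [
    "Die neue Aufgabe ist eine große Herausforderung für das ganze Team.",
    "Sie nahm die Herausforderung ohne Zögern an.",
]

for line in kwic(sample, "Herausforderung", width=25):
    print(line)
```

Because the left context is padded to a fixed width, the search word appears in the same column on every line, which is what makes the stacked KWIC format easy to scan.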
By setting specific search filters, the user can also obtain statistical information on a word’s frequency in different genres or at different points in time, compare its usage with that of other words, and much more. Search results are displayed in a variety of formats, many of them graphic and intuitive: tables, word clouds, and time graphs.
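The idea behind a frequency-over-time display can be sketched in a few lines of Python. The example below counts a word per decade and normalizes the count to occurrences per million tokens, a common way to make decades of different sizes comparable. It is a simplified sketch under stated assumptions, not a description of how the DWDS computes its statistics: the function name `per_million_by_decade`, the naive tokenizer, and the tiny invented sample are all hypothetical.

```python
import re
from collections import defaultdict

def per_million_by_decade(docs, word):
    """docs is an iterable of (year, text) pairs. Returns a dict that maps
    each decade to the word's frequency per million tokens."""
    hits = defaultdict(int)     # occurrences of the word per decade
    totals = defaultdict(int)   # total number of tokens per decade
    target = word.lower()
    for year, text in docs:
        decade = year // 10 * 10
        # Naive tokenization: runs of word characters, lowercased.
        tokens = re.findall(r"\w+", text.lower())
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(target)
    return {
        decade: hits[decade] / totals[decade] * 1_000_000
        for decade in sorted(totals)
        if totals[decade] > 0
    }

# Invented toy data, for illustration only.
docs = [
    (1905, "Die Herausforderung war groß"),
    (1908, "Das Wetter war schön"),
    (1923, "Eine Herausforderung und noch eine Herausforderung"),
]

for decade, freq in per_million_by_decade(docs, "Herausforderung").items():
    print(decade, round(freq))
```

Plotting the resulting values against the decades would give a simple time graph of the kind described above.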
With the help of these open and free resources, one can explore how different words and constructions have been used by speakers of German in their writing and speech. This guide is oriented toward English-speaking users new to using language corpora and, as such, presents only a few selected basic search functions. For extended search functions, interested readers are referred to the specialized DWDS user manual in German (Korpussuche).
DWDS homepage (Startseite)
The DWDS homepage (dwds.de) presents an overview of its three main components: electronic dictionaries (Wörterbücher), text corpora (Textkorpora), and statistical analyses (Statistische Auswertungen). The homepage also features the Article of the Day (Artikel des Tages) and the newest additions to the DWDS (Neueste Artikel). At the bottom of the homepage, there are links to the description of the DWDS research project and its resources, as well as contact and copyright information.
Word information (Wortinformation)
Let’s start by exploring the word Herausforderung (‘challenge’). Type it in the search box at the top of the DWDS homepage and click on the search icon. This will take you to the word information page (Wortinformation). At the top of this page, lexicographic information from the DWDS electronic dictionaries is presented:
- The untitled first section contains some grammatical information: the word’s part of speech (a noun in our example), its inflected forms (genitive and plural), syllabification, and word formation (our noun is derived from the verb herausfordern with the suffix -ung). You can also click on the speaker icon next to Aussprache to hear how the word is pronounced.
- The Bedeutung (meaning) section contains a definition and/or a few usage examples.
- The Etymologie (etymology) section explains the word’s origin and history.
- The Thesaurus section lists the word’s common synonyms.
- Additionally, to the right of these sections, there are links to information about the search word from two older dictionaries that have been recently digitized.
While the dictionary-based sections (Wörterbücher) described above are very useful resources, they will not be addressed further in this user guide. Instead, we will focus on the corpus-based resources, which present the user with the output of automatic searches through the texts archived in the DWDS. These resources are the word profile (Wortprofil), the word frequency timeline (Wortverlaufskurve), and corpus examples (Belege in Korpora).