Corpus Examples

Corpus Examples (Belege in Korpora)

Belege in Korpora is located under the time graphs and the links to two older dictionaries on the right side of the word information page. It lists absolute frequencies of the search word in different DWDS corpora: the core Kernkorpus with texts in four written genres (fiction, nonfiction, science, and journalism), newspaper corpora, web corpora (blogs), and special corpora (film subtitles, transcribed speeches, etc.).

It is important to keep in mind that these corpora differ in size and in whether they are still growing, so direct comparisons of absolute word frequencies across corpora are not very meaningful. For example, the Kernkorpus of the 20th century comprises about 100 million words (10 million words for each decade) and is fixed; the core corpus of the 21st century, by contrast, has only one decade of searchable data and is still being populated. Based on the absolute frequencies alone (889 and 452 at the time of our last search), we therefore cannot conclude that Herausforderung was more frequent in the 20th century than it is in the 21st. Indeed, the opposite is true: as we saw in the Wortprofil tutorial, the frequency of Herausforderung has been growing rapidly since the turn of the century.
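One common way to make counts from differently sized corpora comparable is to convert them to occurrences per million words. The sketch below is only an illustration. The count of 889 and the size of about 100 million words come from the text above, but the size used for the 21st-century corpus is a hypothetical placeholder (one decade at the 20th-century rate of 10 million words), not a figure reported by DWDS.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Relative frequency: occurrences per one million corpus words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count / corpus_size * 1_000_000


# Figures from the text: 889 occurrences in roughly 100 million words.
kern20 = per_million(889, 100_000_000)

# HYPOTHETICAL corpus size, for illustration only; the real corpus is
# still being populated and its size is not given in this tutorial.
ASSUMED_KERN21_SIZE = 10_000_000
kern21 = per_million(452, ASSUMED_KERN21_SIZE)

print(f"20th century: {kern20:.2f} per million words")
print(f"21st century: {kern21:.2f} per million words (assumed size)")
```

Under this assumed size, the smaller absolute count corresponds to the higher relative frequency, which is the point of the caution above.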

We can, however, compare absolute frequencies of different words in the same corpus. For example, if we search DWDS for Problem and look under Belege in Korpora, we can find out that Herausforderung is less frequent in the 20th century core corpus (889 occurrences) than Problem (25794 occurrences).

Word use examples in corpora

To see actual examples of word use, enter your search term and then click on the title of a particular corpus. For example, if you enter a search for Herausforderung and then click on DWDS-Kernkorpus (1900–1999), you get access to 767 sentences containing Herausforderung. You will note that there are actually 889 total occurrences in the corpus, but the remaining 122 sentences cannot be viewed because of copyright restrictions; those occurrences are still counted for statistical calculations. The bibliographic reference to the original text is listed above each example.

Herausforderung in 20th century corpus

The default view lists 50 sentences per page in reverse chronological order (1999 to 1900). The user can scroll through additional examples by using the numbered navigation bar right above the example sentences. The default settings can be changed in the advanced search functions at the top of the page: the time window (Start / Ende), the number of examples per page (Anzahl Treffer pro Seite), and the sorting order (Sortierung). The search can also be shifted to a different corpus by selecting the new corpus from the list in the right-hand menu and, in the case of the Kernkorpus, restricted to certain genres (Textklassen).

Selected sentences can also be exported from DWDS to different file formats (Treffer exportieren). Finally, the amount of context for each example can be modified using the Anzeige radio buttons. The default voll view shows the search word in the context of the sentence in which it appears. The maximal view adds one sentence before and one sentence after the sentence containing the search word. The KWIC view provides concordances that cut the example sentences to one line each and list them with the search word bolded and centered. While the two more extended views are better for reading the example sentences for meaning, the concordance view lends itself to analyzing typical word usage patterns. For example, we can easily see in the KWIC view that Herausforderung appears more frequently in the singular than it does in the plural form (Herausforderungen).
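The idea behind a KWIC (keyword in context) display can be sketched in a few lines of Python. This is a simplified illustration, not the way DWDS builds its concordances. The function name kwic, the width parameter, and the three sample sentences are all invented for this example.

```python
import re


def kwic(sentences, keyword_pattern, width=30):
    """Return one concordance line per match, with the keyword aligned."""
    pattern = re.compile(keyword_pattern)
    lines = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            # Keep at most `width` characters of context on each side.
            left = sentence[:match.start()][-width:]
            right = sentence[match.end():][:width]
            lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines


# Invented sample sentences, not taken from any DWDS corpus.
SAMPLE = [
    "Das ist eine große Herausforderung für uns alle.",
    "Wir stehen vor neuen Herausforderungen in der Forschung.",
    "Sie nahm die Herausforderung ohne Zögern an.",
]

lines = kwic(SAMPLE, r"\bHerausforderung(en)?\b")
for line in lines:
    print(line)

# Because the keyword sits in the same column on every line,
# singular and plural forms are easy to tally.
singular = sum("[Herausforderung]" in line for line in lines)
plural = sum("[Herausforderungen]" in line for line in lines)
print(f"singular: {singular}, plural: {plural}")
```

Aligning the keyword in one column is what makes patterns such as the singular/plural split visible at a glance.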

KWIC view of Herausforderung


Corpus searches beyond single words

All of the search examples described above focused on a single word such as Herausforderung, but one can also search for collocations such as Herausforderung meistern. However, simply typing a word or a collocation in the search line retrieves all inflected word forms, such as both singular and plural forms for all nouns or different conjugated forms of a verb. To go beyond such searches (called lemma searches), specific symbols are needed.

  • To search for a specific word form, one has to type the @ symbol first (e.g., searching for @Herausforderung will exclude results with the plural form Herausforderungen).
  • Conversely, to search for part of a word, one needs to do a wildcard search using the * symbol (e.g., searches for Haus* will retrieve such words as Haustier, Haushalt, and Hausfrau, and searches for *haus will bring up such words as Elternhaus, Krankenhaus, and Treppenhaus).
  • There are many more search options listed on the Korpussuche page, which can be reached by clicking the question mark just to the right of the search icon on any Korpusbelege page.
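The difference between the three query types in the list above can be illustrated with a toy emulation in Python. This is a sketch under stated assumptions, not DWDS's actual search engine: the search function and the small TOKENS index are invented for this example, and applying the wildcard to word forms (rather than lemmas) is a simplification.

```python
from fnmatch import fnmatchcase

# Toy index of (word form, lemma) pairs, invented for illustration.
TOKENS = [
    ("Herausforderung", "Herausforderung"),
    ("Herausforderungen", "Herausforderung"),
    ("Haustier", "Haustier"),
    ("Haushalt", "Haushalt"),
    ("Krankenhaus", "Krankenhaus"),
    ("Krankenhäuser", "Krankenhaus"),
]


def search(query: str) -> list[str]:
    """Return the word forms that a query matches in the toy index."""
    if query.startswith("@"):
        # Exact form search: only the literal word form matches.
        return [form for form, _ in TOKENS if form == query[1:]]
    if "*" in query:
        # Wildcard search: * stands for any run of characters.
        return [form for form, _ in TOKENS if fnmatchcase(form, query)]
    # Plain query: lemma search, so all inflected forms are returned.
    return [form for form, lemma in TOKENS if lemma == query]


print(search("Herausforderung"))   # lemma search
print(search("@Herausforderung"))  # exact word form
print(search("Haus*"))             # prefix wildcard
print(search("*haus"))             # suffix wildcard
```

In this toy index, the plain query returns both the singular and the plural form, while the @ query returns only the singular.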

Notably, DWDS can be searched not only for specific words or for parts of words, but also for linguistic categories such as parts of speech (nouns, adjectives, prepositions, and so on). The part-of-speech search symbol is $p= followed by the abbreviated name of the word category. For example, typing $p=ADV will retrieve all sentences containing adverbs, and $p=ART will retrieve all sentences containing definite and indefinite articles from the respective corpus. The list of part-of-speech abbreviations (Liste der abfragbaren Wortarten) can be found on the Korpussuche page.
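What a part-of-speech query retrieves can be sketched with a small tagged corpus in Python. The example below is a toy illustration only: the function sentences_with_pos and the three sentences are invented, and the tags other than ADV and ART (NN, VVFIN, ADJD, PPER) are assumed here for the sake of the example; the authoritative list is the one on the Korpussuche page.

```python
# Toy corpus: each sentence is a list of (token, part-of-speech tag) pairs.
# Sentences and tag assignments are invented for illustration.
CORPUS = [
    [("Die", "ART"), ("Aufgabe", "NN"), ("bleibt", "VVFIN"), ("schwierig", "ADJD")],
    [("Sie", "PPER"), ("kommt", "VVFIN"), ("heute", "ADV")],
    [("Heute", "ADV"), ("beginnt", "VVFIN"), ("ein", "ART"), ("Kurs", "NN")],
]


def sentences_with_pos(corpus, query):
    """Return every sentence containing a token with the queried tag."""
    prefix = "$p="
    if not query.startswith(prefix):
        raise ValueError("query must look like $p=TAG")
    tag = query[len(prefix):]
    return [
        " ".join(token for token, _ in sentence)
        for sentence in corpus
        if any(pos == tag for _, pos in sentence)
    ]


print(sentences_with_pos(CORPUS, "$p=ADV"))
print(sentences_with_pos(CORPUS, "$p=ART"))
```

Note that the query selects whole sentences: a sentence is returned as soon as any one of its tokens carries the requested tag.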

Let’s try a few more examples.