<h1>MERLIN Corpus | Resources for research and practice related to foreign language learning</h1>
<div id="merlin-info" style="width:690px;">
<p>MERLIN is an error-annotated written <strong>learner corpus for German, Italian and Czech</strong>. It was created within the MERLIN <a href="#" onclick="document.forms['about'].submit();">project</a> (2012-2014). The texts in MERLIN were taken from standardized language tests and have been related to the Common European Framework of Reference for Languages (Council of Europe 2001, 2020) in a methodologically rigorous way. This platform makes all corpus texts available with their ratings. It shows possible <a href="C_teacher.php" target="_blank">usage scenarios</a> in teaching practice and in research, and describes the structure and design of the <a href="C_mcorpus.php" target="_blank">corpus</a> and of the <a href="C_annotation.php" target="_blank">annotations</a>. Users can search the corpus with the integrated web-based search engine <a href="https://merlin-platform.eu/annis" target="_blank">ANNIS</a>.</p>
<p>MERLIN provides access to 2,286 texts written by learners of <b>Czech</b>, <b>Italian</b> and <b>German</b>.</p>
<p>The learner texts stem from standardized language tests and have been reliably related to the CEFR levels. <a href="C_mcorpus.php" class="reference">read more</a></p>
<h2>1 Download MERLIN texts and resources</h2>
<p>You can download the whole corpus (2,286 texts) in the following file formats:</p>
<ul>
<li><a href="https://clarin.eurac.edu/repository/xmlui/bitstream/handle/20.500.12124/6/merlin-text-v1.1.zip?sequence=3&isAllowed=y" class="a.reference">TXT files</a><a href="https://clarin.eurac.edu/repository/xmlui/bitstream/handle/20.500.12124/6/merlin-text-v1.1.zip?sequence=3&isAllowed=y" dir="ltr"><img src="img/icon_txt.png" alt="txt" width="13" height="16"/></a> including the target hypothesis and metadata such as age, gender, mother tongue, task, and rating</li>
<li><a href="https://gitlab.inf.unibz.it/commul/merlin-platform/merlin-exmaralda/tags/v1.1" dir="ltr" class="a.reference">Transcription files in the EXMARaLDA format</a></li>
<li>in the <a href="https://clarin.eurac.edu/repository/xmlui/bitstream/handle/20.500.12124/6/merlin-paula-v1.1.zip?sequence=6&isAllowed=y" class="a.reference">PAULA</a> and <a href="https://clarin.eurac.edu/repository/xmlui/bitstream/handle/20.500.12124/6/merlin-annis-v1.1.zip?sequence=7&isAllowed=y" class="a.reference">ANNIS</a> formats</li>
</ul>
<p>In addition, the following corpus-related overviews are available:</p>
<ul>
<li>an overview of texts (IDs) and assigned <a href="https://clarin.eurac.edu/repository/xmlui/bitstream/handle/20.500.12124/6/merlin-metadata-v1.1.zip?sequence=4&isAllowed=y" class="a.reference">metadata</a> in *.xlsx format</li>
<li><a href="https://clarin.eurac.edu/repository/xmlui/bitstream/handle/20.500.12124/6/merlin-tasks-v1.1.zip?sequence=5&isAllowed=y" target="_blank" class="a.reference" dir="ltr">Tasks</a><a href="https://clarin.eurac.edu/repository/xmlui/bitstream/handle/20.500.12124/6/merlin-tasks-v1.1.zip?sequence=5&isAllowed=y" dir="ltr"><img src="img/document-pdf.png" alt="pdf" width="16" height="16"/></a> on which the target language tests (L2 tests) are based</li>
<li>the <a href="https://clarin.eurac.edu/repository/xmlui/bitstream/handle/20.500.12124/6/merlin-docs-v1.1.zip?sequence=2&isAllowed=y" target="_blank" class="a.reference">complete documentation</a><a href="https://clarin.eurac.edu/repository/xmlui/bitstream/handle/20.500.12124/6/merlin-tasks-v1.1.zip?sequence=5&isAllowed=y" dir="ltr"><img src="img/document-pdf.png" alt="pdf" width="16" height="16"/></a> of the transcription, rating, and annotation process</li>
</ul>
<h2 dir="ltr">2 Display and filter MERLIN texts</h2>
<p>The MERLIN texts are TXT files that you can open in a standard text editor. Descriptive file names make it easy to filter the files by metadata. In addition, you can use the <a href="https://merlin-platform.eu/annis/" target="_blank"><strong id="docs-internal-guid-df89826f-7fff-b367-0ef2-2aa618ff671a">ANNIS search tool</strong></a> to sort texts and display them in the document browser.</p>
<div id="anchor1"></div>
<h4><a href="#anchor1" onclick="toggle('#content1','#img1')"><img src="img/toggle-expand.png" alt="toggle-expand" id="img1"/></a> Open texts with the file manager</h4>
<div id="content1" class="content">
<p>After downloading, unpack (extract) the archive and open the texts in your native file manager, e.g. Windows File Explorer. Choose <em><strong>↘ meta-ltext</strong></em> for learner texts (L2 texts) with metadata or <em><strong>↘ meta_ltext_TH</strong></em> for L2 texts with target hypothesis.</p>
</div>
<div id="anchor3"></div>
<h4><a href="#anchor3" onclick="toggle('#content3','#img3')"><img src="img/toggle-expand.png" alt="toggle-expand" id="img3"/></a> Filter texts with the file manager</h4>
<div id="content3" class="content">
<p>Use the search box of your native file manager (in Windows File Explorer, it is to the right of the address bar) to filter the file list by the following features (metadata):</p>
<ul>
<li>overall rating of the text (CEFR level), e.g. <em><strong>B1</strong></em></li>
<li>task on which the L2 test is based, e.g. <strong><em>visit-letter</em></strong></li>
<li>mother tongue (L1) of the learner, e.g. <strong><em>Russian</em></strong></li>
</ul>
<p>For example, to find all texts with the overall CEFR rating B1 written by learners with Russian as their mother tongue, enter <em><strong>B1 Russian</strong></em>.<br/>
The following <strong>L1s</strong> occur in the corpus: <em>Arabic, Czech, English, Chinese, French, German, Hungarian, Italian, Polish, Portuguese, Russian, Slovak, Spanish, Turkish</em>.</p>
<p dir="ltr">On the <a href="C_mcorpus.php" target="_blank">MERLIN Corpus</a> page you will find an overview of all tasks, including the abbreviations used in the file names.</p>
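<p>The same kind of filtering can be scripted. The following is a minimal Python sketch, assuming that the metadata values (e.g. <em>B1</em>, <em>Russian</em>) appear literally in the file names, as the file manager search described here implies. The function name <code>filter_texts</code> and the folder path in the usage note are hypothetical. Like the search box, it keeps only files whose names contain every search term, ignoring case.</p>

```python
from pathlib import Path


def filter_texts(folder: str, *terms: str) -> list[str]:
    """Return the names of .txt files in `folder` whose file names
    contain every search term (case-insensitive substring match)."""
    wanted = [term.lower() for term in terms]
    return sorted(
        path.name
        for path in Path(folder).glob("*.txt")
        # Assumption: metadata such as the CEFR level or the L1
        # is encoded as plain text in the file name.
        if all(term in path.name.lower() for term in wanted)
    )
```

<p>For example, <code>filter_texts("path/to/extracted/folder", "B1", "Russian")</code> would list the texts rated B1 that were written by learners with Russian as L1. A plain substring match can also hit a term inside a longer word, so the results are worth checking against the task abbreviations listed on the MERLIN Corpus page.</p>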
</div>
<div id="anchor2"></div>
<h4><a href="#anchor2" onclick="toggle('#content2','#img2')"><img src="img/toggle-expand.png" alt="toggle-expand" id="img2"/></a> Open texts in ANNIS</h4>
<div id="content2" class="content">
<p>Open the <a href="https://merlin-platform.eu/annis/" target="_blank">ANNIS search interface</a>, go to <em><strong>Corpus List</strong></em> and select the corpus you want to display (i.e. the target language). Click on the <em><strong>↘ </strong></em><strong>document icon</strong> [1]. A list of all MERLIN texts in the chosen language opens in the panel on the right. Click on <em><strong>↘ Full text</strong></em> [2] next to a text to open it and on "<strong>i</strong>" [3] to display the assigned metadata.</p>
<h4 dir="ltr"><a href="#anchor4" onclick="toggle('#content4','#img4')"><img src="img/toggle-expand.png" alt="toggle-expand" id="img4"/></a> Sort texts in ANNIS</h4>
<div id="content4" class="content">
<p>Select a corpus (according to the target language) in the <a href="https://merlin-platform.eu/annis/" target="_blank">ANNIS search interface</a> under <em><strong>↘ Corpus List</strong></em> and click on the <em><strong>↘ </strong></em><strong>document icon</strong>. A list of all MERLIN texts in the chosen language opens in the panel on the right.</p>
<p dir="ltr">By clicking on <em><strong>↘ _rating_fair_cefr</strong></em> you can quickly sort the texts by CEFR level (overall rating).</p>
<p dir="ltr">If you start a search for learner language features directly in ANNIS, you can also filter texts by metadata such as the learner's L1, age or the assigned task. More on this in the next section.</p>
</div>
<h2>3 Search the MERLIN corpus</h2>
<p>You can search the MERLIN corpus for lexical, grammatical and other features as well as for words, lemmas, or tagged parts of speech. The results give you examples of learner language (L2) in context. To provide the search functionality, the MERLIN platform uses the visualization and search architecture of ANNIS, which can display multi-layer annotations such as those of the MERLIN corpus.</p>
<input class="bt" type="button" value="Search MERLIN in ANNIS" onclick="window.location.href='https://merlin-platform.eu/annis/'"/>
</form>
<p></p>
<div id="merlin-info" style="width:690px;">
<h3 dir="ltr">Example searches</h3>
<ul>
<li>DE <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=dG9rX2xlbW1hPSJHcnXDnyI&_c=TUVSTElOX0dlcm1hbg&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg" target="_blank">Realisations of forms of the word 'Gruß' in L2 texts</a></li>
<li>DE <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=dG9rX2xlbW1hPSJncsO8w59lbiIgJiBFQV9jYXRlZ29yeT0vT18uKi8gJiAjMSBfb18gIzI&_c=TUVSTElOX0dlcm1hbg&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg" target="_blank">Orthographical errors related to the word 'grüßen'</a></li>
<li>DE <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=dG9rX2xlbW1hPSJmYWhyZW4iICYgRUFfY2F0ZWdvcnk9L0dfVmVyYl9jb21wbC8gJiAjMSBfb18gIzI&_c=TUVSTElOX0dlcm1hbg&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg" target="_blank">Examples of use for the word 'fahren' in complex predicates</a> (e.g. after modal verbs)</li>
<li>DE <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=dG9rX2xlbW1hPSJ3YXJ0ZW4iICYgRUFfY2F0ZWdvcnk9L0dfLiovICYgIzEgX29fICMy&_c=TUVSTElOX0dlcm1hbg&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg" target="_blank">Grammatical errors related to all forms of 'warten'</a></li>
<li>CZ <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=bGVhcm5lcj0ibmEiICYgdG9rX3Bvcz0vUi4qLyAmIHRva19wb3M9L04uKi8gJiBFQV9jYXRlZ29yeT0vR19Nb3JwaG9sX1dyb25nLyAmICMxIF89XyAjMiAmICMzIF9vXyAjNCAmICMxIC4yLDIgIzMK&_c=TUVSTElOX0N6ZWNo&cl=5&cr=5&s=0&l=10" target="_blank">Case errors with Czech nouns after the preposition 'na'</a></li>
<li>CZ <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=R19Nb3JwaG9sX1dyb25nX3R5cGU9ImNhc2UiIA&_c=TUVSTElOX0N6ZWNo&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg" target="_blank">Case errors in texts of German learners of Czech</a></li>
<li>CZ <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=dG9rX2xlbW1hPSJtw610IiAmIGxlYXJuZXI9InLDoWQiICYgIzEgLjEsNCAjMg&_c=TUVSTElOX0N6ZWNo&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg" target="_blank">Use of the structure 'mít rád'</a></li>
<li>IT <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=R19WZXJiX3R5cGU9Im1kIg&_c=TUVSTElOX0l0YWxpYW4&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg" target="_blank">Mood errors in texts of learners of Italian</a></li>
</ul>
<p>Using the metadata, you can limit queries to a specific sub-corpus, for example:</p>
<ul>
<li>DE <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=R19Nb3JwaG9sX1dyb25nX3R5cGU9ImNhc2UiICYgbWV0YTo6X3JhdGluZ19mYWlyX2NlZnI9IkIyIg&_c=TUVSTElOX0dlcm1hbg&cl=5&cr=5&s=0&l=10">Case errors in texts of learners at B2 level</a> (fair rating)</li>
<li>CZ <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=dG9rX3Bvcz0vVi4qLyAmIEdfVmVyYl90eXBlPSJhc3AiICYgIzEgX29fICMyICYgbWV0YTo6X2F1dGhvcl9MMT0iR2VybWFuIiAmIG1ldGE6Ol9yYXRpbmdfZmFpcl9jZWZyPSJCMSI&_c=TUVSTElOX0N6ZWNo&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg">Aspect errors of learners with German L1 at B1 level</a> (fair rating)</li>
<li>IT <strong>↘</strong> <a href="https://merlin-platform.eu/annis/#_q=R19WZXJiX3R5cGU9Im1kIiAmIG1ldGE6Ol9yYXRpbmdfZmFpcl9jZWZyPSJCMSI&_c=TUVSTElOX0l0YWxpYW4&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg">Mood errors in texts of learners of Italian at B1 level</a> (fair rating)</li>
</ul>
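<p>The example links above carry the search itself in the part of the URL after <code>#</code>. Judging from these links, the <code>_q</code> (query) and <code>_c</code> (corpus) values appear to be Base64-encoded text. This is an observation from the links on this page, not a documented interface. A minimal Python sketch under that assumption, with the hypothetical helper name <code>decode_annis_link</code>, makes the underlying query readable:</p>

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_annis_link(url: str) -> dict:
    """Decode the query (_q) and corpus (_c) values found in the
    fragment of an ANNIS link. Assumes both are Base64-encoded UTF-8."""
    fragment = urlsplit(url).fragment      # everything after '#'
    params = parse_qs(fragment)            # e.g. {'_q': [...], '_c': [...]}
    decoded = {}
    for key in ("_q", "_c"):
        if key in params:
            raw = params[key][0]
            # The links omit the trailing '=' padding, so restore it.
            padded = raw + "=" * (-len(raw) % 4)
            decoded[key] = base64.urlsafe_b64decode(padded).decode("utf-8")
    return decoded
```

<p>Applied to the first example link ('Gruß'), this returns the query <code>tok_lemma="Gruß"</code> and the corpus name <code>MERLIN_German</code>. Reading the decoded queries is a convenient way to see how the example searches are built before adapting them in the ANNIS search box.</p>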
</div><p></p>
<p><img src="img/hint_bulb.png" alt="hint bulb"/><span class="StilSmall"> The <a href="https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/corpus-tools/annis-tutorials/gui-tutorial" target="_blank" class="a.reference">video tutorial</a> by HU Berlin provides a general introduction to the ANNIS user interface (in German). You can also refer to the ANNIS help section under <em><strong>↘ Help/<a href="https://merlin-platform.eu/annis/#_q=dG9rX2xlbW1hPSJtw610IiAmIGxlYXJuZXI9InLDoWQiICYgIzEgLjEsNCAjMg&_c=TUVSTElOX0N6ZWNo&cl=5&cr=5&s=0&l=10&_seg=bGVhcm5lcg" target="_blank">Tutorial</a></strong></em>. For explanations of the annotation layers, please go to <a href="#" onclick="document.forms['glossary'].submit();" class="a.reference"><?php echo $trans['help_search'][$_SESSION['lang']]; ?></a></span>.</p>