diff --git a/php/en/archive-check/disclaimer.php b/php/en/archive-check/disclaimer.php deleted file mode 100644 index 5ead87f2380e5e3c0e1fd5780179d55d72ccee88..0000000000000000000000000000000000000000 --- a/php/en/archive-check/disclaimer.php +++ /dev/null @@ -1,57 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<form name="simple" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_simplesearch.php"></form> -<form name="advanced" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_advancedsearch.php"></form> -<form name="documents" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_documentsearch.php"></form> -<form name="statistics" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_statistics.php"></form> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>Disclaimer</h1> -<p> </p> -<p><img src="style/EU_flag_LLP_EN-01.png" alt="LLP-Logo" width="200" height="77"></p> -<p>This project has been funded with support from the European Commission. This website reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.</p> -<p> </p> -<h2>Project coordination:</h2> -<p><strong>Technische Universität Dresden</strong><br> - Fakultät Sprach-, Literatur- und Kulturwissenschaften<br> - Institut für Romanistik<br> - Prof. Lieber, Dr. Katrin Wisniewski<br> - 01062 Dresden<br> - Tel: +49 (0) 351 463-33216<br> - Fax: +49 (0) 351 463-37702</p> -<p> </p> -<h2>Responsible for website content and maintenance:</h2> -<p><strong>Universität Tübingen</strong><br> - Seminar für Sprachwissenschaft<br> - Abt. Theoretische Computerlinguistik<br> - Prof. 
Meurers<br> - 72074 Tübingen<br> - Tel: +49 (0) 7071 2973963<br> - Fax: +49 (0) 7071 295213</p> -<p> </p> -<p><strong>European Academy Bozen</strong><br> - Viale Druso, 1 / Drususallee 1<br> - 39100 Bolzano / Bozen - Italy<br> - Andrea Abel, Verena Lyding<br> - Tel: +39 0471 055 055<br> - Fax: +39 0471 055 099</p> -<p><br> -</p> -<h2>Credits</h2> -<p>The following icons used on the MERLIN platform are licensed under the Creative Commons-License:</p> -<p><img src="style/icon_info_alt.png" width="16" height="16"> by: <a href="http://www.famfamfam.com/about/" target="_blank">Mark James</a> | licensed under Creative Commons (CC BY 2.5), colour modified</p> -<p><img src="style/icon_help.png" alt="icon help" width="16" height="16"> by: <a href="http://icons8.com/" target="_blank"> Visual Pharm</a> | licensed under Creative Commons (CC BY-ND 3.0), colours inverted</p> - -<p>We further used:</p> -<p><img src="style/x_close.png" alt="icon help" width="16" height="16"> by: <a href="https://www.iconfinder.com/Zuczkowski" target="_blank"> Zuczkowski media design solutions</a> | free for commercial use, background color changed</p> -<p><a href="http://itsmeara.com/jquery/atooltip/#demos" target="blank">aToolTip</a> by Ara Abcarians, Copyright © 2009, licensed under Creative Commons (CC BY 3.0)</p> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/archive-check/research-sec.php b/php/en/archive-check/research-sec.php deleted file mode 100644 index 0430bf28ff203a1655b2aeb514c9dbeaac354d5a..0000000000000000000000000000000000000000 --- a/php/en/archive-check/research-sec.php +++ /dev/null @@ -1,433 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<form name="simple" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_simplesearch.php"></form> -<form name="advanced" action="../php/index.php" method="post"><input type="hidden" name="curscr" 
value="F_advancedsearch.php"></form> -<form name="documents" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_documentsearch.php"></form> -<form name="statistics" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_statistics.php"></form> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>MERLIN for research</h1> -<h2>1. Linking the MERLIN texts to the CEFR</h2> -<div id="anchor11"></div> -<h3><a name="reratings"></a>1.1 Re-ratings</h3> - <a href="#anchor11" onClick="toggle('#content11','#img11')"><img id="img11" src="style/toggle-expand.png"></a> -<div id="content11" class="content"> -<p>The MERLIN texts are the writings sections of CEFR-related, standardized high-quality tests from telc (Frankfurt/Main, Italian and German tests, <a href="http://www.telc.net/" target="_blank" class="reference">homepage</a>) and ÚJOP (Prague, Czech tests, <a href="http://ujop.cuni.cz/" target="_blank" class="reference">homepage</a>). These institutions are ALTE-audited (<a href="http://www.alte.org" target="_blank" class="reference">ALTE-homepage</a>). The <a href="#" onclick="document.forms['mcorpus'].submit();" class="reference">tasks</a> were in use until 2013 and are now freely available on the platform. However, to have explicit and direct information about the CEFR profiles of the written productions themselves (and not only of the tests as a whole), for MERLIN all texts were re-rated independently by two professional raters per language. -The reliability of the re-ratings was examined with the help of Classical Test Theory and a Multi-Facet Rasch analysis. The latter is a probabilistic statistical procedure often used in language testing which allows for a correction of rating tendencies (e.g., leniency/harshness) and makes it possible to arrive at a fair average rating for each text. 
The intra-rater and inter-rater reliability was generally very high in MERLIN, with some exceptions for Italian. Therefore, the whole re-rating process was repeated for Italian, resulting in a satisfactory rating quality. -In MERLIN, the fair average is calculated based on a holistic scale (see <a href="#instruments" class="reference">1.2 rating instruments</a>). If you compile your own corpus based on CEFR levels, the levels are also based on the fair average ratings (» <em><strong>Define a subcorpus » Overall CEFR rating</strong></em>). -If you are interested in more details regarding the quality of the ratings and the difficulty of the individual rating criteria, please consult the <a href="#" onclick="document.forms['download'].submit();" class="reference">technical report</a>. </p> - -<p> </p> -</div> -<div id="anchor12"></div> -<h3><a name="instruments"></a>1.2 Rating instruments </h3> - <a href="#anchor12" onClick="toggle('#content12','#img12')"><img id="img12" src="style/toggle-expand.png"></a> -<div id="content12" class="content"> -<p>Two rating instruments were used: An assessor-oriented version (Alderson 1991) of the holistic scale (page 2 of the <a href="download.html#corpus" target="_blank" class="reference">MERLIN rating grid</a>) for "General Linguistic Range" (Chapter 5, CEFR) was accompanied by an analytical rating grid (page 3 of the <a href="download.html#corpus" target="_blank" class="reference">MERLIN rating grid</a>) that is closely connected to Table 3 of the CEFR (CoE 2001). This table was of great importance in the process of scaling the CEFR descriptors (North 2005, 2000). The MERLIN version includes six rating criteria (vocabulary range | vocabulary control | grammatical accuracy | coherence & cohesion | orthography | sociolinguistic appropriateness). These criteria stem from scales in Chapter 5 of the CEFR, which specifies aspects of communicative L2 competence. 
For the construction of the grid, descriptors of these scales were modified in an assessor-oriented way. Plus-levels (A2+, B1+) were excluded as the CEFR does not specify descriptors for these levels for all rating criteria. The rating instruments were piloted before their implementation in the MERLIN project.</p> -</div> -<p> </p> - -<h2><a name="dataprep"></a>2. Preparing the data</h2> -<div id="anchor21"></div> - <h3>2.1 Transcriptions</h3> - <a href="#anchor21" onClick="toggle('#content21','#img21')"><img id="img21" src="style/toggle-expand.png"></a> -<div id="content21" class="content"> -<p>The hand-written original learner texts were transcribed in an xml-based editor (xml mind©) inside the testing institutions (telc and ÚJOP). The transcribers followed <a href="download.html#corpus" class="reference">transcription guidelines</a> (available only in German) and the reliability of the transcripts was checked, initially for a sample of 5% of the texts per CEFR level. As many transcription errors were detected, in the end almost all texts had to undergo a revision stage.<br> -The transcription guidelines included tags (inline annotation) for basic textual features such as unreadable or ambiguous stretches of language, foreign language words, emoticons, images, paragraphs, copied words from the rubrics, or greeting formulae. The anonymization (names, places) was part of the transcription process and was carried through based on the guidelines.</p> -<div> - <div> </div> -</div> -</div> -<div id="anchor22"></div> -<h3>2.2 Tools & formats</h3> -<a href="#anchor22" onClick="toggle('#content22','#img22')"><img id="img22" src="style/toggle-expand.png"></a> -<div id="content22" class="content"> -<p>Once the transcriptions were available, all data was converted to PAULA (<a href="purl.org/net/paula" target="_blank" class="reference">purl.org/net/paula</a>), a standoff XML format designed as an exchange format for linguistic annotation. 
-Further manual annotations were carried through with two tools: MMAX2 (<a href="mmax2.net" target="_blank" class="reference">mmax2.net</a>) and the Falko Excel Add-in (<a href="purl.org/net/falko" target="_blank" class="reference">purl.org/net/falko</a>). MMAX2 is a text annotation tool that allows multi-layered annotation. It was used for the annotation of learner language features (see <a href="#" onclick="document.forms['annotation'].submit();" class="reference">2.3.1</a>). The Falko Add-in was used for annotating both target hypothesis 1 and 2 (» <em> <strong>for more details on the annotation of target hypotheses with the Falko Add-in see</strong></em> <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau und Annotationen_v2.01" target="_blank" class="reference">Falko-Handbuch</a>). -Automatic annotation made use of the UIMA framework (<a href="uima.apache.org" target="_blank" class="reference">uima.apache.org</a>). UIMA allows a modular integration of a wide range of NLP tools such as part-of-speech taggers and parsers. For the advanced search functions, the open source web-browser based search and visualization architecture ANNIS (<a href="purl.org/net/annis" target="_blank" class="reference">purl.org/net/annis</a>) is used in the MERLIN interface (<form name="help-annis" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="help-annis.php"></form><a href="#" onclick="document.forms['help-annis'].submit();" class="reference">see explanations on search output in ANNIS</a>). </p> -<p> </p> -</div> -<div id="anchor231"></div> -<p> </p> -<h3><a name="annotations"></a>2.3 Annotations</h3> -<p>A short introduction to the structure of the MERLIN annotations is provided <a href="#" onclick="document.forms['annotation'].submit();" class="reference">here</a>. 
Below, you will find more detailed information on the individual annotation layers available for the whole corpus and for the smaller core corpus, as well as notes on quality control aspects.</p> -<blockquote> - <h4>2.3.1 Manual annotations available for the whole corpus <a href="#anchor231" onClick="toggle('#content231','#img231')"><img id="img231" src="style/toggle-expand.png"></a></h4> -</blockquote> -<div id="content231" class="content"> - <p><img src="style/annotations_GRAPHIC-layer_en1.png" width="534" height="195" alt="EA1"></p> - <p> </p> - <h5>Minimal target hypotheses / target hypotheses 1 (TH1)</h5> - <p>All annotation is necessarily based on human interpretation of what the person who produced the text might have had in mind. It is important to make this interpretation explicit so that MERLIN users can understand the annotations better. Therefore, the MERLIN corpus contains rule-based target hypotheses that suggest a corrected version of the learner texts. <br> - In the main phase of annotation, an orthographically and grammatically correct version of the learner text was created (target hypotheses 1, TH1) for the whole corpus. The annotator was allowed to make as few interventions as possible. The table below shows a simple example (for a definition of the tiers, please refer to the <a href="#" onclick="document.forms['help-annis'].submit();" class="reference">explanations of the search output</a>):</p> - <p><img src="style/TH1_example1.png"></p> - <p>The following example by the same learner shows that in TH1, errors from other linguistic areas were ignored. There are content-related and technical reasons for this.</p> - <p><img src="style/TH1_example2.png"></p> - <p>While the orthographical (capitalization error, word boundary error, missing hyphen) and grammatical (missing article) errors are corrected in the TH1 (termed ‘ZH1’ here), the lexically erroneous form *Reisespass (instead of “Reisepass”) was not replaced by another lexeme. 
Phenomena like this are annotated in the <a href="#corecorpus" target="_blank" class="reference">MERLIN core corpus</a> (for definitions of the errors see <a href="download.html#annotations" target="_blank" class="reference">MERLIN annotation scheme</a>).</p> - <p>The team followed the target hypotheses rules developed for the <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko" target="_blank" class="reference">Falko corpus</a> and adapted them to the project needs where necessary (cf. Reznicek/Lüdeling et al. 2012; see <a href="#" onclick="document.forms['download'].submit();" class="reference">annotation structure guidelines and Documentation of annotation decisions</a>). In some cases, annotators agreed upon annotation rules on a very fine-grained level. For example, it was decided that in German, the final double <ss> instead of standard German spelling <ß> was not changed in texts in which it might be possible that the learner consistently used the Swiss spelling, which does not use the <ß>. For single decisions that you might be interested in, please consult <a href="#" onclick="document.forms['download'].submit();" class="reference">the Documentation of annotation decisions</a>.</p> - <p>TH1 were compiled for the whole MERLIN corpus. The TH1 were written in Excel with the help of the Falko Add-in. The TH1 was piloted before the actual annotation took place.</p> - <p> If you want to display the TH1 on the MERLIN platform, go to » <strong><em>Advanced search. </em></strong> To get explanations about the output you get there, read more <a href="#" onclick="document.forms['help-annis'].submit();" class="reference">here</a>. 
You can also display TH1 for whole texts in the search results of <em><strong>» Define a subcorpus</strong></em>.</p> - <p> </p> - <table border="0" cellspacing="1" cellpadding="0"> - <tr> - <td valign="top"><img src="style/aim-icon.png" width="30" height="30" alt="go"></td> - <td width="720" bgcolor="#CCCCCC"><p>Useful links & downloads with regard to TH1:<br> - <a href="#" onclick="document.forms['download'].submit();" class="reference">MERLIN annotation manual</a><br> - <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau%20und%20Annotationen_v2.01" target="_blank" class="reference">Das Falko-Handbuch. Korpusaufbau und Annotationen. Version 2.01. HU Berlin</a> (Falko guidelines)<br> - <a href="#" onclick="document.forms['download'].submit();" class="reference">Documentation of annotation decisions</a></p></td> - </tr> - </table> - <p> </p> - <h5><strong><a name="ea1"></a></strong>Manual annotation of grammatical and orthographical learner language features – error annotation 1 (EA1)</h5> -<p>Building on the target hypotheses 1, all MERLIN texts were annotated with grammatical and orthographical language features from various sources (error annotation 1 – EA1). 
You can find a complete list of the features (“tags”) with examples <a href="#" onclick="document.forms['annotation'].submit();" class="reference">here</a>, while the <a href="#" onclick="document.forms['download'].submit();" class="reference">annotation scheme </a> gives you full access to the definitions of each learner language feature and additional examples.</p> -<p>The MERLIN annotation tags for <strong>EA1 and EA2</strong> were derived from …</p> -<ol> - <li> - <p><strong>CEFR scales</strong>: some tags were chosen to support research about the empirical validity of the CEFR scales underlying the <a href="docs/MERLIN_Rating-Grid.pdf">MERLIN analytical rating grid </a><img src="style/document-pdf.png" width="16" height="16"> (chapter 5 of the CEFR, CoE 2001). They can help to control whether the predictions of selected CEFR descriptors correspond to learner behaviour, e.g.: intelligibility, use of idioms, content jumps (<a href="#scale-valid" class="reference">see 3.2 MERLIN for scale validation</a>).  </p> - </li> - <li> - <p>issues in current <strong>SLA research</strong>, e.g. grammatical aspects such as verb valency, word order, negation, or lexical aspects, e.g. the use of formulaic sequences (<a href="#bib" class="reference">references</a>)</p> - </li> - <li> - <p>features reported to the MERLIN team by <strong>testers, teachers and teacher trainers</strong> in a questionnaire study and in expert interviews as being relevant for assessing language mastery at certain levels, e.g. the verbal aspect in Italian and Czech </p> - </li> - <li> - <p><strong>textbook and language test analyses </strong>revealed further recurrent topics some of which were included in the MERLIN annotation scheme, e.g. German modal verbs</p> - </li> - <li> - <p><strong>learner text analyses</strong> carried out in a random sample of MERLIN texts (5% per test level/language), e.g. 
use of articles and clitics</p> - </li> - </ol> -<p> </p> -<p>The annotation scheme specifies to which group(s) the single learner language features belong.</p> -<p>Furthermore, most error-related MERLIN tags (EA1 & EA2) incorporate the widely used <strong>‘target language modification’</strong> dimension (cf. Díaz-Negrillo/Fernández-Domínguez 2006). This dimension specifies the type of error: an element might have been omitted, changed, added, repositioned, merged with, or split from another element). You can find details about this in the <a href="docs/AS_part1.pdf" target="_blank" class="reference">annotation scheme</a> <img src="style/document-pdf.png" width="16" height="16">. </p> -<p>You can search for the annotated learner language features in the » <strong><em>Advanced search,</em></strong> or you can extract lists of features relevant for a specific linguistic field or a specific CEFR level here <strong><em>» Statistics.</em></strong> -</p> -<p> </p> -<table border="0" cellspacing="1" cellpadding="0"> - <tr> - <td valign="top"><img src="style/aim-icon.png" width="30" height="30" alt="go"></td> - <td width="720" bgcolor="#CCCCCC"><p>Further links:<br> - <a href="#" onclick="document.forms['help-annis'].submit();" class="reference">advanced search output explanation</a><br> - <a href="docs/AS_part1.pdf" target="_blank" class="reference">annotation scheme</a> <img src="style/document-pdf.png" alt="" width="16" height="16">. 
<br> - <a href="#bib" class="reference">references</a><br> - <a href="#" onclick="document.forms['annotation'].submit();" class="reference">list with learner language features and examples</a></p></td> - </tr> -</table> -<p> </p> -<div> - <div> </div> -</div> -</div> -<div id="anchor232"></div> -<blockquote> - <h4><a name="corecorpus"></a>2.3.2 Manual annotations in the MERLIN core corpus <a href="#anchor232" onClick="toggle('#content232','#img232')"><img id="img232" src="style/toggle-expand.png"></a></h4> -</blockquote> -<div id="content232" class="content"> - <p> </p> - <h5>The structure of the MERLIN core corpus</h5> - <p>For a small pilot sample (the <strong>MERLIN core corpus</strong>), in addition to grammar and orthography more linguistic dimensions are taken into consideration. The <strong>MERLIN core corpus</strong> consists of texts that received <a href="#reratings" class="reference">fair averages</a> of either A2 or B2. Thus, two groups of learners with a clearly distinct level of proficiency can be compared. It is important to notice that the ratings the learners received do not necessarily correspond to the CEFR level of the test they decided to take. You can distinguish between these dimensions here <em><strong> » Define a subcorpus </strong></em>(“CEFR level of test” and “Overall CEFR rating”).</p> - <p>Many outperformed the targeted CEFR levels, while others’ performances were rated lower than the learners would have expected. An extreme case is Italian, where only two texts actually received a B2 level, while many more students took B2 tests. Here, the MERLIN core corpus incorporates the 100 texts that were placed highest on the Rasch logit scale (<a href="#" onclick="document.forms['download'].submit();" class="reference">technical report</a>). 
</p> - <p><img src="style/annotations_GRAPHIC-layer_en2.png" width="529" height="200"></p> - <p> </p> - <h5>Core corpus: extended target hypotheses / target hypotheses 2 (TH2)</h5> - <p> Target hypotheses 2 aim to create an acceptable version of the learner text. This process is more subjective and makes reliable decisions harder, which is why it was kept separate from target hypotheses 1, as in the Falko project, with which MERLIN cooperated closely. The aim of TH2 is to capture the <strong>acceptability</strong> of the learner text (not, as with TH1, its correctness). TH2 are therefore an extension of TH1. To this end, the learner text was still modified only minimally, while its reconstruction comes close to what a native speaker's utterance would look like. This reconstruction concerns semantic and lexical aspects, pragmatics, and sociolinguistics. Unlike in TH1, phenomena that span sentence boundaries and that are determined by the context are modified, too.</p> - <p>You can search for the TH2 in the <em><strong> » Simple search </strong></em>and in the <em><strong>» Advanced search</strong></em>.</p> - <p> </p> - <h5>Core corpus: annotations of sociolinguistic, pragmatic, lexical, and other learner language features (error annotation 2, EA2)</h5> - <p>For a part of the MERLIN core corpus, many tags from various linguistic perspectives were added to the grammatical and orthographical learner language features annotated in the main stage of the project. These tags stem from the same sources as the EA1 annotations (<a href="#ea1" class="reference">see 2.3.1</a>). 
</p> - <p>Detailed information about the individual tags is available in the <a href="docs/AS_part1.pdf" target="_blank" class="reference">annotation scheme</a> <img src="style/document-pdf.png" width="16" height="16">. The tags include, for example, the speech act REQUEST, the use of language with an inappropriate level of formality, the use of structures that pertain to spoken language variants, and reference problems. You can get an overview of the annotated features and find examples <a href="#" onclick="document.forms['annotation'].submit();" class="reference">in this table</a>.</p> - <p>Again, the MERLIN tags incorporate the widely used ‘target language modification’ dimension (cf. Díaz-Negrillo/Fernández-Domínguez 2006), which yields information about the type of the learner language feature (an element might have been omitted, changed, added, repositioned, merged with, or split from another element). </p> - <p>You can find these learner language features in the <em><strong>»</strong></em> <strong><em>Advanced search</em></strong>. You can compile a list of these features for a particular linguistic area or a specific CEFR level here <em><strong>»</strong></em> <strong><em>Statistics. </em></strong></p> - -<p> </p> -</div> -<div id="anchor233"></div> -<blockquote> - <h4>2.3.3 Quality control aspects of the annotation process <a href="#anchor233" onClick="toggle('#content233','#img233')"><img id="img233" src="style/toggle-expand.png"></a></h4> -</blockquote> -<div id="content233" class="content"> - <p>It was important to make sure that the annotations in the MERLIN corpus are as <strong>consistent</strong> as possible, even if a certain degree of subjectivity is unavoidable. To this end, the MERLIN project took a number of measures:</p> - <p> First of all, all instruments (TH1 & TH2 rules, annotation scheme for EA1 and EA2) were <strong>piloted</strong> before their implementation. 
This made it possible to detect potentially problematic aspects, which could then be corrected before the annotations started.</p> - <p> Secondly, all annotations are based on <strong>guidelines</strong> (<a href="#" onclick="document.forms['download'].submit();" class="reference">annotation manual</a>, <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau und Annotationen_v2.01" target="_blank" class="reference">Falko-Handbuch</a>). The guidelines were enriched by <strong>fine-grained decisions</strong> on individual aspects of annotation (<a href="#" onclick="document.forms['download'].submit();" class="reference">documentation of annotation decisions</a>). </p> - <p> A third measure to control the quality of annotations is their <strong>documentation</strong>. Many decisions had to be taken about which tag to apply to which phenomenon, and consistency among the three project languages had to be ensured. The most important discussions among the annotators are documented in the <a href="#" onclick="document.forms['download'].submit();" class="reference">documentation of annotation decisions</a>. In the <a href="#" onclick="document.forms['download'].submit();" class="reference">annotation scheme</a>, the ‘related tags’ sections mirror some of the extensive discussion processes. </p> - <p>Last but not least, the reliability of the annotations was also checked more formally. <strong>Reliability</strong> of annotations was checked for 5% of the texts on each test level for target hypotheses (1 & 2) and error annotation (1 & 2). Different methods were applied: </p> - <blockquote> - <p> In a <strong>qualitative</strong> approach, half of the files were annotated independently by the coders and then discussed jointly with the aim of arriving at a <strong>consensus</strong>. Because annotation was done level by level, this took place before the annotation of each level started. 
The texts served as a reference throughout the annotation process. </p> - <blockquote> - <p> The second half of the files checked for reliability was annotated by all coders without their knowledge. This <strong>quantitative</strong>, <strong>double-blind procedure</strong> makes it possible to check intra-coder reliability (the consistency of one and the same annotator) and inter-coder reliability (the degree of agreement between different annotators). </p> - <p> </p> - <h5>Consistency and interference of annotation layers </h5> - <div> - <div> </div> - </div> - </blockquote> - </blockquote> - <p>From a technical perspective, it was complex to integrate and harmonize the different annotation formats in MERLIN without losing information or introducing inaccuracies. <br> - At the same time, on a content level, contradictions between the different annotation levels (TH1-EA1-TH2-EA2) were to be avoided.<br> - TH1 and EA1 are closely connected. If the learner text is changed on TH1, there ought to be a tag on EA1 that makes the learner language feature explicit in detail. There are a few exceptions to this rule, which are documented in the <a href="#" onclick="document.forms['download'].submit();" class="reference">documentation of annotation decisions</a>. <br> - Also, all EA2 annotations are reflected in TH2. The opposite, however, is not necessarily true: There might be TH2 modifications that are needed to arrive at an acceptable version of the learner text and that are not part of the <a href="#" onclick="document.forms['download'].submit();" class="reference">MERLIN annotation scheme</a>. The MERLIN team might not have included a phenomenon if it was not considered relevant and/or feasible. 
</p> -</div> -<p> </p> -<div id="anchor234"></div> -<blockquote> - <h4>2.3.4 Automatic annotations in MERLIN <a href="#anchor234" onClick="toggle('#content234','#img234')"><img id="img234" src="style/toggle-expand.png"></a></h4> -</blockquote> -<div id="content234" class="content"></p> - -<p>In MERLIN, a combination of automatic and manual [link] annotation - procedures was used in order to prepare learner texts for integration into - the platform. We have applied existing automatic annotation tools - developed for the target languages in order to expand the range of - available linguistic annotation beyond what would have been possible with - time-consuming and expensive manual annotation. However, it is important - to keep in mind that automatic annotation is particularly challenging for - learner language, since learner language often deviates considerably from - the target language across all levels of linguistic analysis, from -spelling to semantics.</p> -<p> </p> -<h5>The following tools were used for all three MERLIN languages:</h5> -<p>Texts were tokenized using the <a href="http://alias-i.com/lingpipe/docs/api/com/aliasi/tokenizer/IndoEuropeanTokenizerFactory.htm" target="_blank" class="reference">tokenizer for Indo-European - languages</a> from LingPipe and the resulting tokenization was then corrected by hand. <br> -Sentences were annotated with the <a href="https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.sentdetect" target="_blank" class="reference">OpenNLP sentence -segmenter</a>.<br> -Repetitions were identified using the <a href="https://code.google.com/p/saphre" target="_blank" class="reference">Saphre -library</a> on the basis of the automatic part-of-speech and lemma annotation described below.</p> -<p> </p> -<h5>Language-Specific Tools</h5> -<p>MERLIN contains part-of-speech tags (tok_pos), lemmas (tok_lemma), and - dependency parses (dependencies) for all three languages. 
Additional - part-of-speech tags, lemmas, and morphological analyses from alternate - tools are included where available. Details about the annotation tools -and annotation schemes are provided for each language individually below.</p> -<p> </p> -<table border="0" cellspacing="2" cellpadding="0"> - <tr> - <td rowspan="4" valign="top" bgcolor="#CCCCCC"><p>CZECH</p></td> - <td width="720" bgcolor="#CCCCCC"><p>Part-of-speech tags and lemmas (tok_pos and tok_lemma):</p></td> - </tr> - <tr> - <td width="720"><p><a href="http://ufal.mff.cuni.cz/morphodita" target="_blank" class="reference">MorphoDiTa</a> was used to annotate POS - tags and lemmas according to the <a href="http://ufal.mff.cuni.cz/pdt/Corpora/PDT_1.0/References/mman.html" target="_blank" class="reference">Prague Dependency Treebank - guidelines</a>. - There are 12 basic POS tags (seen in the first character of each tag) and - more than 4000 possible detailed morphosyntactic tags in the full tag set.</p></td> - </tr> - <tr> - <td width="720" bgcolor="#CCCCCC"><p>Dependency parses:</p></td> - </tr> - <tr> - <td width="720"><p>The <a href="https://code.google.com/p/mate-tools/wiki/ParserAndModels" target="_blank" class="reference">joint tagger and parser</a> from Bernd - Bohnet et al. (2013) was trained on data from the <a href="http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/a-layer/html/index.html" target="_blank" class="reference">Prague Dependency - Treebank</a>. 
The parser also provides basic POS tags (tok_pos_bohnet) and morphological - analyses (tok_morph_bohnet).</p></td> - </tr> - <tr> - <td rowspan="6" valign="top" bgcolor="#CCCCCC"><p>GERMAN</p></td> - <td width="720" bgcolor="#CCCCCC"><p>Part-of-speech tags and lemmas (tok_pos and tok_lemma):</p></td> - </tr> - <tr> - <td width="720"><p><a href="http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/" target="_blank" class="reference">TreeTagger</a> was - used to annotate POS tags and lemmas using the <a href="http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/stts_guide.pdf" target="_blank" class="reference">Stuttgart-Tübingen tag - set</a>, which contains 54 tags.</p></td> - </tr> - <tr> - <td width="720" bgcolor="#CCCCCC"><p>Dependency parses:</p></td> - </tr> - <tr> - <td width="720"><p>The <a href="https://code.google.com/p/mate-tools/wiki/ParserAndModels" target="_blank" class="reference">joint tagger and - parser</a> from - Bernd Bohnet et al. (2013) was trained on a dependency conversion of the - <a href="http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.en.html" target="_blank" class="reference">Tiger - Treebank </a> with additional data from the <a href="http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/smor.en.html" target="_blank" class="reference">SMOR morphological - analyzer</a>. <br> - Bernd Bohnet kindly provided a version of the German parsing model - customized for the MERLIN data. 
The parser also provides basic POS tags - (tok_pos_bohnet), lemmas (tok_lemma_bohnet), and morphological analyses (tok_morph_bohnet).</p></td> - </tr> - <tr> - <td width="720" bgcolor="#CCCCCC"><p> T-units (tunit and complextunit):</p></td> - </tr> - <tr> - <td width="720"><p>T-units and complex t-units were identified using the algorithms presented - in Julia Hancke's 2013 master's thesis "Automatic Prediction of CEFR - Proficiency Levels Based on Linguistic Features of Learner Language", - which relies on automatic parses produced by the <a href="http://nlp.stanford.edu/software/lex-parser.shtml" target="_blank" class="reference">Stanford - parser</a>. The parses - are not presented in the MERLIN corpus, but the POS tags from the Stanford - parser, which uses the same German tag set as TreeTagger (STTS), are shown - for reference in tok_pos_stanford.</p></td> - </tr> - <tr> - <td rowspan="4" valign="top" bgcolor="#CCCCCC"><p>ITALIAN</p></td> - <td width="720" bgcolor="#CCCCCC"><p> Part-of-speech tags and lemmas (tok_pos and tok_lemma):</p></td> - </tr> - <tr> - <td width="720"><p><a href="http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/" target="_blank" class="reference">TreeTagger</a> was - used to annotate POS tags and lemmas. The <a href="http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt" target="_blank" class="reference">POS tag - set</a> developed by Achim Stein contains 38 tags.</p></td> - </tr> - <tr> - <td width="720" bgcolor="#CCCCCC"><p>Dependency parses:</p></td> - </tr> - <tr> - <td width="720"><p>The <a href="https://code.google.com/p/mate-tools/wiki/ParserAndModels" target="_blank" class="reference">joint tagger and - parser</a> from - Bernd Bohnet et al. (2013) was trained with data from the <a href="http://medialab.di.unipi.it/wiki/ISDT" target="_blank" class="reference">Italian - Stanford Dependency Treebank</a>. 
- Additional POS tags and morphological analysis provided by the parser are - included as tok_pos_bohnet and tok_morph_bohnet.</p></td> - </tr> -</table> -<p> </p> -</div> -<p> </p> -<div id="anchor3"></div> -<h2>3. Using MERLIN for research purposes <a href="#anchor3" onClick="toggle('#content3','#img3')"><img id="img3" src="style/toggle-expand.png"></a></h2> -<div id="content3" class="content"> -<p>The main aim of MERLIN is not research-oriented: the platform was developed for practitioners who need empirical illustrations of rated CEFR levels for Czech, Italian, and German. Recently, an increasing number of initiatives (like <a href="http://www.slate.eu.org/" target="_blank" class="reference">SLATE</a>) have started to collect authentic learner language rated according to CEFR levels. Some of them pertain to the <em>Reference Level Descriptions</em> (RLD) initiative, i.e. a specification of the CEFR levels for individual languages (the most prominent example is the <a href="http://www.englishprofile.org/" target="_blank" class="reference">English Profile Project</a>; other projects are ASK for Norwegian, Carlsen 2013, or the Profilo della lingua italiana, Spinelli/Parizzi 2010). The Council of Europe encourages the development of RLDs (CoE 2005, see <a href="http://www.coe.int/t/dg4/linguistic/cadre1_en.asp" title="CoE website for RLD" target="_blank" class="reference">CoE website for Reference Level Descriptions</a>).<br> -From corpora like these, features that characterize CEFR levels (sometimes called “criterial features”, Hawkins/Filipović 2012) can be extracted. This process helps to deepen the understanding of what CEFR-related ratings mean and to put the use of the CEFR on firmer, empirical grounds. MERLIN contributes to the empirically-based exploration of the CEFR for German, Italian, and Czech. 
It differs from most existing initiatives in that all data, including full texts, test tasks and annotations, are fully and freely available online.<br> -Apart from this major practical aim, MERLIN is relevant for research purposes from various perspectives: </p> -<p> </p> - -<div id="anchor31"></div> -<h3><a name="scale-valid"></a>3.1 Validating CEFR scales with MERLIN</h3> - <a href="#anchor31" onClick="toggle('#content31','#img31')"><img id="img31" src="style/toggle-expand.png"></a> -<div id="content31" class="content"> -<p>The Council of Europe's effort to scale the CEFR descriptors (CoE 2001; North 2000; Schneider/North 2000) has led to immense improvements in standardization and transparency in language learning, teaching, and testing. Important decisions about language learners' lives are taken with reference to the CEFR levels. In many ways, it seems as if the scales have acquired a life of their own; often, they are over-estimated, misunderstood and applied in ways for which they were not intended (North 2000). -One crucial aspect that is still insufficiently understood is the empirical validity of the CEFR scales (Fulcher 2004; Hulstijn 2007): If scales are used to describe or rate learner language, they must reflect what learners actually do (Alderson 1991). -In spite of this, to date there is almost no research that examines the power of the CEFR descriptors to capture the language learners actually produce (Wisniewski 2014). MERLIN makes it possible to directly analyze the relationship between selected CEFR descriptors (such as "circumlocutions" or "content jumps", which were operationalized and annotated; see <a href="docs/AS_part1.pdf" target="_blank" class="reference">MERLIN annotation scheme</a>) and learner language without having to rely on ratings. 
</p> -</div> -<p> </p> - -<div id="anchor32"></div> -<h3>3.2 MERLIN and second language acquisition studies</h3> - <a href="#anchor32" onClick="toggle('#content32','#img32')"><img id="img32" src="style/toggle-expand.png"></a> - <div id="content32" class="content"> -<p>Many studies from the area of second language acquisition (SLA) refer to proficiency levels when describing the development and the variation of learner language. However, in many cases the proficiency classification is not yet based on procedures that comply with the strict standards that need to be met from the perspective of research-based, high-quality language testing (see for example AERA/APA/NCME; ALTE 2001; Bachman/Palmer 1996; <a href="http://www.ealta.eu.org/documents/archive/guidelines/English.pdf" target="_blank" class="reference">EALTA code of practice</a>). There is a particular lack of strict testing procedures and easily accessible empirical data for languages other than English when it comes to CEFR-based proficiency classifications. -Although MERLIN is small in size, its reliable relationship to the CEFR makes it a precious resource for future SLA studies. Also, it can be used for triangulating and validating data for many existing studies. -</p> -</div> -<p> </p> - -<div id="anchor33"></div> -<h3>3.3 MERLIN to advance NLP of learner language</h3> - <a href="#anchor33" onClick="toggle('#content33','#img33')"><img id="img33" src="style/toggle-expand.png"></a> -<div id="content33" class="content"> -<p>The MERLIN corpus provides valuable data for the development and evaluation of natural language processing tools for learner language (Meurers 2012). The corpus and its meta-information on learners and ratings readily support research on automatic native language identification, enabling such research to go beyond the current English learner focus. In a similar vein, the corpus has already been used for research on automatic proficiency classification for German (Hancke 2013). 
The MERLIN corpus also provides richly annotated learner data for the development and adaptation of NLP tools and applications that assist language learners in improving their vocabulary usage, coherence, spelling and grammatical accuracy. </p> -</div> -<p> </p> - -<div id="Pub"></div> -<h2><a name="bib"></a>References <a href="#Pub" onClick="toggle('#contentPub','#imgPub')"><img id="imgPub" src="style/toggle-expand.png"></a></h2> -<div id="contentPub" class="content"> -<p>[ALTE 2001] = ALTE Working Group on the Code of Practice: <em>Principles of Good Practice for ALTE Examinations. </em>Revised Draft. <a href="http://www.testdaf.de/institut/pdf/ALTE/ALTE_good_practice.pdf" target="_blank" class="reference">http://www.testdaf.de/institut/pdf/ALTE/ALTE_good_practice.pdf</a>, October 2013.<br> -[Consiglio d'Europa 2004a] = Trim, J./North, B./Coste, D.: <em>Quadro comune europeo di riferimento per le lingue: apprendimento, insegnamento, valutazione</em>. La Nuova Italia: Oxford.- A cura del Consiglio d'Europa.<br> -[Council of Europe 1975] = Van Ek, J. A.: <em>The Threshold Level in a European unit/credit system for modern language learning by adults</em>. Strasbourg: Council of Europe. <br> -[Council of Europe 1994a] = North, B.: <em>Scales of language proficiency: a survey of some existing systems.</em> Strasbourg: Council of Europe, CC-Lang (94) 24. <br> -[Council of Europe 1994b [1981]] = Galli de' Paratesi, N.: <em>Livello Soglia per l'insegnamento dell'italiano come lingua straniera. </em>Strasbourg: Edizioni del Consiglio d'Europa.<br> -[Council of Europe 1999 [1980]] = Baldegger, M./Müller, M./Schneider, G. (1999): <em>Kontaktschwelle Deutsch als Fremdsprache.</em> 4. Auflage. Berlin u.a.: Langenscheidt.- Herausgegeben vom Europarat.<br> -[Council of Europe 2001] = Trim, J./North, B./Coste, D.: <em>Common European Framework of Reference for Languages: Learning, teaching, assessment</em>. -Edited by the Council of Europe. 
Online document: <a href="http://www.coe.int/lang" target="_blank" class="reference">www.coe.int/lang</a>, October 2013.<br> -[Europarat 2001] = Trim, J./North, B./Coste, D.: <em>Gemeinsamer europäischer Referenzrahmen für Sprachen: lernen, lehren, beurteilen</em>. Berlin u.a.: Langenscheidt.- Herausgegeben vom Europarat, online document: <a href="http://www.goethe.de/z/50/commeuro/i7.htm" target="_blank" class="reference">http://www.goethe.de/z/50/commeuro/i7.htm</a>, October 2013.<br> -[Europarat 2004b] = Takala, S./Kaftandjieva, F./Verhelst, N./Banerjee, J./Eckes, T./van der Schoot, F.: <em>Reference Supplement to the Preliminary Pilot Version of the Manual for Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment</em>.- Edited by the Council of Europe, online document: www.coe.int/lang, October 2013.<br> -[Europarat 2009 [2003]] = North, B./Figueras, N./Takala, S./Van Avermaet, P./Verhelst, N.: <em>Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment</em>. <em>Manual</em>. <em>Preliminary Pilot Version</em>.- Edited by the Council of Europe, online document: www.coe.int/lang, October 2013.<br> -Alderson, J.C. (2007): The CEFR and the need for more research. In: <em>The Modern Language Journal </em>91, 658-662. <br> -Alderson, J. C./Figueras, N./Kuijper, H./Nold, G./Takala, S./Tardieu, C. (2006): Analysing Tests of Reading and Listening in Relation to the Common European Framework of Reference: The Experience of the Dutch CEFR Construct Project. In: <em>Language Assessment Quarterly </em>3(1), 3-30.<br> -AERA/APA/NCME (1999): <em>Standards for educational and psychological testing.</em> Washington: AERA.<br> -Alderson, J.C. (1991): Bands and scores. In: Alderson, J.C./North, B. (eds.): <em>Language testing in the 1990s</em>. London: British Council/Macmillan, 71-86.<br> -Arnaud, P. J. L. 
(1984): The lexical richness of L2 written productions and the validity of vocabulary tests. In: Culhane, T./Klein-Braley, C./Stevenson, D. K. (eds.): <em>Practice and Problems in Language </em><br> -Arras, U. (2010): Subjektive Theorien als Faktor bei der Beurteilung fremdsprachlicher Kompetenzen. In: Berndt, A./Kleppin, K. (eds.): <em>Sprachlehrforschung: Theorie und Empirie - Festschrift für Rüdiger Grotjahn</em>. Frankfurt: Lang, 169-179.<br> -Bachman, L.F. (2004): Statistical analyses for language assessment. Cambridge: CUP 2004.<br> -Bachmann, T. (2002): <em>Kohäsion und Kohärenz: Indikatoren für Schreibentwicklung: Zum Aufbau kohärenzstiftender Strukturen in instruktiven Texten von Kindern und Jugendlichen.</em> Innsbruck: Studienverlag. <br> -Bausch, K.-R./Christ, H./Königs, F.G./Krumm, H.-J. (eds.) (2003): <em>Der Gemeinsame Europäische Referenzrahmen für Sprachen in der Diskussion. Arbeitspapiere der 15. Frühjahrskonferenz zur Erforschung des Fremdsprachenunterrichts.</em> Tübingen: Narr.<br> -Bardovi-Harlig, K. (2009): Conventional Expressions as a Pragmalinguistic Resource: Recognition and Productions of Conventional Expressions in L2 Pragmatics. In: <em>Language Learning </em>59 (4), 755-795. <br> -Bestgen, Y./Granger, S. (2011): Categorising spelling errors to assess L2 writing. In: <em>International Journal of Continuing Engineering Education and Life Long Learning,</em> 21 (2), 235-252.<br> -Bond, T. G./Fox, C. M. (2007): Applying the Rasch model: Fundamental measurement in human sciences. Mahwah, NJ: Lawrence Erlbaum.<br> -Bulté, B./Housen, A. (2012): Defining and operationalising L2 complexity. In: Housen, A./Kuiken, F./Vedder, I. (eds.): <em>Dimensions of L2 Performance and Proficiency: Complexity, Accuracy and Fluency in SLA</em>. Amsterdam: Benjamins, 21-46.<br> -Burger, H. (2007): <em>Phraseologie. Eine Einführung am Beispiel des Deutschen</em>. (3. Aufl.). Berlin: Erich Schmidt Verlag.<br> -Carlsen, C. (ed.) 2013. <em>Norsk Profil. 
</em><em>Det felles europeiske rammeverket spesifisert for norsk. Et første steg</em>. Oslo: Novus. <br> -Carlsen, C. (2010): Discourse connectives across CEFR levels: A corpus-based study. In: Bartning, I./Martin, M./Vedder, I. (eds.): <em>Communicative Proficiency and Linguistic Development: intersections between SLA and language testing research</em> (Eurosla). 191-210. purl.org/net/Carlsen-10.pdf<br> -Christ, O. (1994). A modular and flexible architecture for an integrated corpus query system. <em>arXiv preprint cmp-lg/9408005</em>.<br> -Corder, S. P. (1993 [1973]): <em>Introducing Applied Linguistics</em>. Harmondsworth: Pelican.<br> -Dallapiazza, R.M./von Jan, E./Schönherr, T. (1998) (eds.): Tangram: <em>Deutsch als Fremdsprache. Kurs- und Arbeitsbuch 1 A</em>. Munich: Hueber.<br> -Daller, H./van Hout, R./Treffers-Daller, J. (2003): Lexical richness in spontaneous speech of bilinguals. In: <em>Applied Linguistics </em>24, 197-222.<br> -Dewaele, J.-M. (2004): Individual differences in the use of colloquial vocabulary. The effects of sociobiographical and psychological factors. In: Bogaards, P./Laufer, B. (eds.): Vocabulary in a second language. Amsterdam: John Benjamins, 127-154.<br> -Díaz-Negrillo, A./Fernández-Domínguez, J. (2006): Error-coding systems for learner corpora. In: <em>RESLA</em> 19, 83-102.<br> -Eckes, T. (2008): Rater types in writing performance assessments: A classification approach to rater variability. In: <em>Language Testing 25 </em>(2) 155-185.<br> -Eckes, T. (2009): <em>Reference Supplement to the Manual for Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Section H: Many-Facet Rasch Measurement</em>. (<a href="http://www.coe.int/t/dg4/linguistic/manuel1_en.asp" target="_blank" class="reference">http://www.coe.int/t/dg4/linguistic/manuel1_en.asp</a>, January 2014.)<br> -Eisenberg, P. (2007): Sprachliches Wissen im <em>Wörterbuch der Zweifelsfälle</em>. 
Über die Rekonstruktion einer Gebrauchsnorm. In: <em>Aptum. Zeitschrift für Sprachkritik und Sprachkultur</em> 3/2007: 209-228.<br> -Ellis, R. (1994): <em>The study of Second Language Acquisition</em>. Oxford: Oxford University Press.<br> -Fulcher, G. (2004): Deluded by Artifices? The Common European Framework and Harmonization. In: <em>Language Assessment Quarterly</em> 1 (4), 253-266.<br> -Fulcher, G./Davidson, F. (2007): <em>Language Testing and Assessment. </em>London/New York: Routledge.<br> -Gould, S.J. (1996): <em>The mismeasure of man</em>. London: Penguin.<br> -Glaznieks A./Nicolas L./Stemle E./Lyding V./Abel A. (2012): Establishing a Standardised Procedure for Building Learner Corpora. In:<em> Apples - Journal of Applied Language Studies. Special Issue: Proceedings of LLLC2012</em>.<br> -Granger, S. (2003): Error-tagged learner corpora and CALL: a promising synergy. In: <em>CALICO Journal</em> 20 (3). Special issues on error analysis and error correction in computer-assisted language learning, 465-480.<br> -Granger, S. (2008): Learner corpora. In: Lüdeling, A. / Kytö, M. (eds.): <em>Corpus linguistics: an international handbook</em> (Handbooks of linguistics and communication science; 29.1_ 29.2). Berlin - New York: de Gruyter. 259-275.<br> -Granger, S. (2002): A Bird's-eye view of learner corpus research. In: Granger, S./Hung, J./ Petch-Tyson, St (eds.): <em>Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching</em>. Amsterdam: John Benjamins, 3-33.<br> -Halliday, M. A. K. /Hasan, R. (1989): <em>Language, context and text: a social semiotic perspective. </em>Oxford: Oxford University Press.<br> -Hancke, J. <em>Automatic Prediction of CEFR Proficiency Levels Based on Linguistic Features of Learner Language</em>. Master's thesis, Universität Tübingen, April 2013<br> -Hancke J./Meurers D./Vajjala S. (2012): Readability Classification for German using lexical, syntactic, and morphological features<em>. 
</em>In: <em>Proceedings of the 24th International Conference on Computational Linguistics (COLING)</em>, 1063-1080.<br> -Hancke, J. (2013): <em>Automatic Prediction of CEFR Proficiency Levels Based on Linguistic Features of Learner Language</em>. Master's thesis, University of Tübingen.<br> -Hasil, J./Hájková, E./Hasilová, H. (2007): <em>Brána jazyka českého otevřená</em>. Prague: Karolinum.<br> -Hawkey, R./Barker, F. (2004): Developing a Common Scale for the Assessment of Writing. In: Assessing Writing 9, 122-159.<br> -Hawkins, J. A./Filipović, L. (2012): <em>Criterial features in L2 English: Specifying the reference levels of the Common European Framework</em>. Cambridge: CUP.<br> -Housen, A./Kuiken, F. (2009): Complexity, Accuracy, and Fluency in Second Language Acquisition. In: <em>Applied Linguistics</em> 30 (4), 461-473.<br> -Hulstijn, J. H. (2007): The shaky ground beneath the CEFR: Quantitative and qualitative dimensions of language proficiency. In: <em>The Modern Language Journal</em> 91, 663-667.<br> -Hulstijn, J. H./Alderson, C./Schoonen, R. (2010): Developmental stages in second-language acquisition and levels of second-language proficiency: Are there links between them? In: Bartning, I./Martin, M./Vedder, I. (eds.): <em>Communicative Proficiency and Linguistic Development: intersections between SLA and language testing research</em>. Eurosla Monograph Series. (<a href="http://eurosla.org/monographs/EM01/EM01home.html" target="_blank" class="reference">http://eurosla.org/monographs/EM01/EM01home.html</a>)<br> -Johns, T. (1988): Whence and whither classroom concordancing? In: Bongaarts, T./de Haan, P./Lobbe, S./Wekker, H. (eds.): Computer Applications in Language Learning. Dordrecht: Foris, 9-33.<br> -Johns, T. (1997): Contexts: The Background, Development and Trialling of a Concordance-based CALL Program. In: Wichmann, Anne/Fligelstone, Steven/McEnery, Tony/Knowles, Gerry (eds.) 
(1997), <em>Teaching and Language Corpora.</em> London: Longman, 100-115. <br> -Laufer, B./Nation, P. (1995): Vocabulary size and use: lexical richness in L2 written production. In: <em>Applied Linguistics </em>16, 307-322.<br> -Little, D. (2007): The Common European Framework of Reference for Languages: Perspectives on the Making of Supranational Languages Education Policy. In: <em>The Modern Language Journal</em> 91, 645-655.<br> -Lu, X. (2011): A corpus-based evaluation of syntactic complexity measures as indices of College-level ESL writers' language development. In: <em>TESOL Quarterly</em> 45 (1) 36-62.<br> -Lu, X. (2010): Automatic analysis of syntactic complexity in second language writing. In: <em>International Journal of Corpus Linguistics</em> 15 (4), 474-496.<br> -Lüdeling, A. (2008): Mehrdeutigkeiten und Kategorisierung: Probleme bei der Annotation von Lernerkorpora. In: Walter, M./Grommes, P. (eds.): <em>Fortgeschrittene Lernervarietäten: Korpuslinguistik und Zweitsprachenerwerbsforschung. </em>Tübingen: Niemeyer, 119-140.<br> -Lüdeling, A./Walter, M./Kroymann, E./Adolphs, P. (2005): Multi-level Error Annotation in Learner Corpora. In: Hunston, S./Danielsson, P. (eds.): <em>Proceedings from the Corpus Linguistics Conference Series</em> (Corpus Linguistics 2005, Birmingham, 14-15 July 2005). (<a href="http://www.corpus.bham.ac.uk/PCLC" target="_blank" class="reference">http://www.corpus.bham.ac.uk/PCLC</a>) <br> -Malvern, D./Richards, B./Chipere, N./Durán, P. (2008): <em>Lexical Diversity and Language Development. Quantification and Assessment. </em>New York: Palgrave Macmillan.<br> -Mellor, A. (2011): Essay Length, Lexical Diversity and Automatic Essay Scoring. In: <em>Memoirs of the Osaka Institute of Technology</em>, Series B Vol. 55, No. 2 (2011), 1-14.<br> -Meurers, D. (2012): Natural Language Processing and Language Learning. <em>Encyclopedia of Applied Linguistics</em>. Blackwell. purl.org/dm/papers/meurers-11.html<br> -Mezzadri, M. (2000). 
<em>Rete! Book 1</em>. Perugia: Guerra Edizioni.<br> -Müller, Ch./Strube M. (2006): Multi-Level Annotation of Linguistic Data with MMAX2. In: S. Braun, K. Kohn, J. Mukherjee (Eds.): Corpus Technology and Language Pedagogy. New Resources, New Tools, New Methods. Frankfurt: Peter Lang, 197-214.<br> -Nation, P. (2001): <em>Learning vocabulary in another language</em>. Cambridge: Cambridge University Press.<br> -Nation, P. (2007): Fundamental issues in modelling and assessing vocabulary knowledge. In: Daller, H./ Milton, J./Treffers-Daller, J. (eds.): <em>Modelling and Assessing Vocabulary Knowledge</em>. Cambridge: Cambridge University Press.<br> -Nesselhauf, N. (2005): <em>Collocations in a Learner Corpus</em>. Amsterdam: John Benjamins.<br> -North, B. (2000): <em>The Development of a Common Framework Scale of Language Proficiency. </em>Oxford: Peter Lang.<br> -O'Loughlin, K. (1995): Lexical density in candidate output on direct and semi-direct versions of an oral proficiency test. In: <em>Language Testing </em>12 (2), 217-237. <br> -Ortega, L. (2003): Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. In: <em>Applied Linguistics</em> 24 (4), 492-518.<br> -Paquot, M./Granger, S. (2012): Formulaic language in Learner Corpora. In: <em>Annual Review of Applied Linguistics </em>32, 130-149.<br> -Pollitt, A./Murray, N.L. (1996): What raters really pay attention to. In: Milanovic, M./Saville, N. (eds.): <em>Performance testing, cognition and assessment; Selected papers from the 15th Language Testing Research Colloquium.</em> Cambridge: Cambridge University Press, 74-91.<br> -Read, J./Nation, P. (2004): Measurement of formulaic sequences. In: Schmitt, N. (ed.): <em>Formulaic sequences: Acquisition, processing and use. </em>Amsterdam: John Benjamins, 23-35.<br> -Read, J. (2000): <em>Assessing vocabulary</em>. 
Cambridge: Cambridge University Press.<br> -Reznicek, M./Lüdeling, A./Krummes, C./Schwantuschke, F./Walter, M./Schmidt, K./Hirschmann, H./Andreas, T. (2012): <em>Das Falko-Handbuch. Korpusaufbau und Annotationen</em>. Version 2.01. HU Berlin (<a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau%20und%20Annotationen_v2.01" target="_blank" class="reference">http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau%20und%20Annotationen_v2.01</a>)<br> -Reznicek, M./Lüdeling, A./Hirschmann, H. (in press): Competing Target Hypotheses in the Falko Corpus. A Flexible Multi-Layer Corpus Architecture. In: Díaz-Negrillo, A./Ballier, N./Thompson, P. (eds.): <em>Automatic Treatment and Analysis of Learner Corpus Data</em>. Amsterdam: John Benjamins (Series Studies in Corpus Linguistics).<br> -Rimrott, A./Heift, T. (2008): Evaluating automatic detection of misspellings in German. In: <em>Language Learning & Technology</em> 11 (3), 73-92.<br> -Römer, U. (2010): Using general and specialized corpora in English language teaching: past, present and future. In: Campoy-Cubillo, M. et al. (eds.): Corpus-based approaches to English Language Teaching. London: Continuum, 18-38.<br> -Römer, Ute. 2008. 7. Corpora and language teaching. In: Lüdeling, Anke & Merja Kytö (eds.). <em>Corpus Linguistics. An International Handbook (volume 1)</em>. [<a href="http://www.degruyter.com/cont/glob/neutralReiEn.cfm?rc=16647" target="_blank" class="reference">HSK series</a>] Berlin: Mouton de Gruyter. 112-130 <br> -Römer, U. (2006): Pedagogical applications of corpora: some reflections on the current scope and a wish list for future developments. In: Zeitschrift für Anglistik und Amerikanistik 54 (2) 121-134.<br> -Schmitt, N./Carter, N. (2004): Formulaic sequences in action: An Introduction. In: Schmitt, N. 
(ed.): <em>Formulaic sequences: Acquisition, processing and use. </em>Amsterdam: John Benjamins, 1-21.<br> -Schneider, J. G. (2013): Sprachliche ‚Fehler‘ aus sprachwissenschaftlicher Sicht. In: <em>Sprachreport</em> 1-2/2013, 30-37.<br> -Spinelli, B./Parizzi, F. (eds.) (2010): <em>Profilo della lingua italiana</em>. Firenze: La Nuova Italia.<br> -Stede, M. (2007): Korpusgestützte Textanalyse. Grundzüge der Ebenen-orientierten Textlinguistik. Tübingen: Narr.<br> -Trosborg, A. (1995): <em>Interlanguage Requests and Apologies. </em>Berlin: de Gruyter.<br> -Vajjala, S./Meurers, D. (2012): On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition. In: Tetreault, J./Burstein, J./ Leacock, C. (eds.): <em>Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications (BEA7) at NAACL-HLT</em>. Montreal, Canada: Association for Computational Linguistics, 163-173.<br> -Vaughan, C. (1991): Holistic assessment: What goes on in the rater's mind? In: Hamp-Lyons L. (ed.): <em>Assessing Second Language Writing in Academic Contexts. </em>Norwood: Ablex, 111-125.<br> -Wisniewski, K. (2013): The empirical validity of the CEFR fluency scale: the A2 level description. In: Galaczi, E.D./Weir, C.J. (eds.): <em>Exploring Language Frameworks: Proceedings of the ALTE Krakow Conference</em>. Cambridge: Cambridge University Press, 253-272. Studies in Language Testing.<br> -Wisniewski, K. (2014):<em> Die Validität der Skalen des Gemeinsamen europäischen Referenzrahmens für Sprachen. Eine empirische Untersuchung der Flüssigkeits- und Wortschatzskalen des GeRS am Beispiel des Italienischen und des Deutschen</em>. Frankfurt: Peter Lang. Language Testing and Evaluation Series, 33.<br> -Wisniewski, K./Schöne, K./Nicolas, L./Vettori, C./ Boyd, A./Meurers, D./ Abel, A./Hana, J. (2013): MERLIN: An online trilingual learner corpus empirically grounding the European Reference Levels in authentic learner data. 
In: <em>ICT for Language Learning, Conference Proceedings 2013</em>. Libreriauniversitaria.it Edizioni. (<a href="http://conference.pixel-online.net/ICT4LL2013/common/download/Paper_pdf/322-CEF03-FP-Wisniewski-ICT2013.pdf" target="_blank" class="reference">http://conference.pixel-online.net/ICT4LL2013/common/download/Paper_pdf/322-CEF03-FP-Wisniewski-ICT2013.pdf</a>) <br> -Wisniewski, K. / Abel, A. (2012): Die Sprachkompetenzerhebung: Theorie, Methoden, Qualitätssicherung. In: Abel, A. / Vettori, C. / Wisniewski, K. (eds.): <em>Gli studenti altoatesini e la seconda lingua: indagine linguistica e psicosociale. / Die Südtiroler SchülerInnen und die Zweitsprache: eine linguistische und sozialpsychologische Untersuchung</em>. Volume 1 - Band 1. Bolzano - Bozen: Eurac. 13-64 (<a href="http://www.eurac.edu/en/research/publications/PublicationDetails.aspx?pubId=0100156&type=Q" target="_blank" class="reference">http://www.eurac.edu/en/research/publications/PublicationDetails.aspx?pubId=0100156&type=Q</a>)<br> -Wolfe-Quintero, K./Inagaki, S./ Kim, H.-Y. (1998): <em>Second Language Development in Writing: Measures of Fluency, Accuracy & Complexity</em>. Honolulu: Second Language Teaching & Curriculum Center, University of Hawaii at Manoa.<br> -Yang, W./Sun, Y. (2012): The use of cohesive devices in argumentative writing by Chinese EFL learners at different proficiency levels. In: <em>Linguistics and Education</em>, 23 (1), 31-48. <br> -Wray, A. (2002): <em>Formulaic Language and the Lexicon</em>. Cambridge: Cambridge University Press.<br> -Zeldes, A./Ritz J./Lüdeling A. et al. (2009): <em>Annis: A search tool for multi-layer annotated corpora. In Proceedings of Corpus Linguistics</em>, July 20-23. Liverpool. (<a href="http://ucrel.lancs.ac.uk/publications/cl2009/" target="_blank" class="reference">http://ucrel.lancs.ac.uk/publications/cl2009/</a>) <br> -Zipser, F./Romary, L./al. (2010). A model oriented approach to the mapping of annotation formats using standards. 
In: <em>Workshop on Language Resource and Language Technology Standards, LREC 2010</em>.</p> -<p> </p> -</div> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/archive-check/start - Copy.php b/php/en/archive-check/start - Copy.php deleted file mode 100644 index e76027c5e336274edf5c8388cb1c95cc0c4473f1..0000000000000000000000000000000000000000 --- a/php/en/archive-check/start - Copy.php +++ /dev/null @@ -1,16 +0,0 @@ -<div id="content-menu3" style="min-height:200px"> - -<div id="merlin-info" style="width:390px; height:110px;"> -<h3>The MERLIN corpus</h3> -<p>MERLIN provides access to 2.286 texts written by learners of <b>Czech</b>, <b>Italian</b> and <b>German</b>.</p> -<p>The learner texts stem from standardized language tests and they have been reliably related to the CEFR levels. <a href="http://commul.eurac.edu/dev/merlin/php/C_mcorpus.php" class="reference"> read more</a></p> -</div> - -<div id="merlin-info" style="width:280x; height:110px;"> -<h3>Use MERLIN ...</h3> -<p>... to better understand the levels of the Common European Framework of Reference (CEFR). -<a href="http://commul.eurac.edu/dev/merlin/php/C_teacher.php" class="reference"> read more</a></p> -</div> - - -</div> diff --git a/php/en/archive-check/start-old.php b/php/en/archive-check/start-old.php deleted file mode 100644 index bab67d311fec8c99d4286ac9e31c4a7c7aa7a575..0000000000000000000000000000000000000000 --- a/php/en/archive-check/start-old.php +++ /dev/null @@ -1,56 +0,0 @@ -<div id="content-menu3" style="min-height:500px"> - -<div id="merlin-info" style="float:none; width:684px"> -<h3>The MERLIN project</h3> -<p>The MERLIN corpus project provides access to empirical learner language for those working with the Common European Framework of Reference for Languages (CEFR). The MERLIN platform allows CEFR users to explore authentic written learner productions for Czech, German, and Italian. 
The learner texts stem from standardized language tests and are reliably related to the CEFR levels. -<a href="#" onclick="document.forms['about'].submit();">>> read more</a> -</p> -</div> - -<div id="merlin-info" style="width:340px"> -<h3>What can I use MERLIN for?</h3> -<p>MERLIN offers you support for teaching, learning, or testing Czech, German, and Italian. For example, you can ... -<p><ul> -<li>find example texts for a specific CEFR level and bring them to the classroom -<a href="#" onclick="document.forms['documents'].submit();">>> document search</a></li> -<li>search for a word in learner texts and explore how learners use it -<a href="#" onclick="document.forms['simple'].submit();">>> simple search</a></li> -<li>create a sub-corpus to explore errors that are typical or frequent with learners with the same L1 or at the same age -<a href="#" onclick="document.forms['documents'].submit();">>> document search</a></li> -<li>search for examples of learner language features, e.g. grammatical or orthographical errors, on a specific CEFR level -<a href="#" onclick="document.forms['advanced'].submit();">>> advanced search</a></li> -<li>find examples of errors related to a specific word (e.g. 
valency errors with the verb "warten") -<a href="#" onclick="document.forms['advanced'].submit();">>> see example queries in the advanced search</a></li> -<li>compile lists of frequent errors in texts that you defined in your sub-corpus -<a href="#" onclick="document.forms['feature'].submit();">>> learner language features</a></li> -</ul></p> -<p><a href="#" onclick="document.forms['teacher'].submit();">>> read more</a> -</div> - -<div id="merlin-info" style="width:305px"> -<h3>MERLIN search and export functions</h3> -<p>You can search for: -<ul> -<li>occurrences of a word in learner texts -<a href="#" onclick="document.forms['simple'].submit();">>> simple search</a></li> -<li>full texts using learner- and test-specific metadata for filtering -<a href="#" onclick="document.forms['documents'].submit();">>> document search</a></li> -<li>learner language features, e.g. grammatical or orthographical errors, in the whole corpus or in a sub-corpus -<a href="#" onclick="document.forms['advanced'].submit();">>> advanced search</a></li> -</ul> -</p> -<p><a href="#" onclick="document.forms['help'].submit();">>> Please visit our tutorial "How to search in MERLIN"</a> -<br> -<p>You can compile and export: -<ul> -<li>a sub-corpus using learner- and test-specific metadata -<a href="#" onclick="document.forms['documents'].submit();">>> document search</a></li> -<li>results of your search for words and learner language features in their context -<a href="#" onclick="document.forms['advanced'].submit();">>> advanced search</a></li> -<li>feature lists for the whole corpus or a sub-corpus -<a href="#" onclick="document.forms['feature'].submit();">>> learner language features</a></li> -</ul> -</p> - -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/about.php b/php/en/old-04-12-14/about.php deleted file mode 100644 index 860c2dee1e520d771e06187f3691082e641234d2..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/about.php +++ /dev/null @@ 
-1,34 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar2.php'); - -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1><strong>MERLIN - M</strong>ultilingual Platform for <strong>E</strong>uropean <strong>R</strong>eference <strong>L</strong>evels: <strong>In</strong>terlanguage Exploration in Context</h1> -<p><br> - The KA2 Languages-financed project MERLIN <strong>started in 2012</strong> with the aim of developing a didactically motivated online platform that enables users of the Common European Framework of Reference for Languages (CEFR) to explore authentic written learner productions for Italian, Czech, and German. </p> -<p> </p> -<p><strong>Background</strong><br> -Since its publication in 2001, the Common European Framework of Reference for Languages (CEFR) has gained a leading role as an instrument of reference for the teaching and certification of languages and for the development of curricula. At the same time, there is growing concern that the CEFR reference levels have not been sufficiently illustrated, leaving practitioners such as teachers, test and curriculum developers, and textbook authors without comprehensive empirical characterizations of the relevant distinctions. This is particularly the case for languages other than English, where supplementary empirical tools are urgently needed.</p> -<p> </p> -<p><strong>Project aims and results </strong><br> - MERLIN addresses this demand for <strong>Czech, German and Italian </strong>and proposes an online platform that enables CEFR users to explore authentic written learner productions which have been related to the CEFR in a methodologically sophisticated way. 
A cross-linguistic and multifunctional web-based interface illustrates A1-C1 level learner texts, highlighting language characteristics relevant from practitioner, research, and intrinsic CEFR perspectives.<br> -The project thus addresses a broad target audience and is relevant to anyone teaching, testing, or learning one of the three target languages in Europe. </p> -<p><br> - <strong>Availability </strong><br> - Once the MERLIN platform is <strong>launched at the end of 2014</strong>, all <strong>MERLIN data will be freely accessible online</strong>. Resources as well as tools created by MERLIN are available under an open-source license. In addition, the project approach and computational architecture are designed to be adaptable to other languages for which CEFR level illustration is needed.</p> -<p><br> - The <a href="#" onclick="document.forms['team'].submit();">MERLIN team</a> is convinced that MERLIN will be a great help for anyone working with the CEFR for Czech, Italian, and German in building a much more solid understanding of the CEFR level system. 
</p> -<p> </p> -<hr> -<p> </p> -<p><strong>The MERLIN project </strong>is funded <strong>until the end of 2014</strong> by the EU Lifelong Learning Programme under project number <em>518989-LLP-1-2011-1-DE-KA2-KA2MP.</em> </p> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/annotation.php b/php/en/old-04-12-14/annotation.php deleted file mode 100644 index 128a760efbe280cf9350df41eea8ddf8f18fb088..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/annotation.php +++ /dev/null @@ -1,250 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>Annotations in the MERLIN corpus </h1> -<p> </p> -<h2 class="example">Background information</h2> -<p> </p> -<div id="anchor1"></div> -<h3>The annotation structure <a href="#anchor1" onClick="toggle('#content1','#img1')"><img id="img1" src="style/toggle-expand.png"></a> </h3> -<p>The MERLIN data have been enriched with a multi-level annotation. ...</p> -<div id="content1" class="content"> -<p> -While most learner language features had to be annotated manually, NLP (Natural Language Processing) was used for <strong>automatic learner language annotations</strong> such as tokenization and lemmatization, part-of-speech tagging or segmentation into sentences or T-units.</p> -<p> </p> -<h3> - <strong>Annotations in the full MERLIN corpus</strong></h3> -<p> - <img src="style/annotations_GRAPHIC-layer_en1.png"/></p> -<p> -The main annotations <strong>available for the whole MERLIN corpus</strong> are target hypotheses <strong>(target hypotheses 1)</strong> and annotations of grammatical and orthographical learner language features <strong>(error annotation 1)</strong>: -</p> -<p> -All annotation is based on human interpretation of what the person who produced the text might have had on his/her mind. 
In a learner text collection (learner corpus), it is important to make this interpretation explicit to make annotations more easily understandable and to avoid problems of reliability. Therefore, the MERLIN team formulated target hypotheses (TH) that are a corrected version of the learner texts. The team followed the rules developed for the <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko" target="_blank" class="reference">FALKO corpus</a> and adapted them to the project needs where necessary (cf. Reznicek/Lüdeling et al. 2012). -</p> -<p> -The "minimal target hypothesis" <strong>(TH1)</strong> is a minimally intervening version of the learner text that is orthographically and grammatically correct, but might contain deviations from what a native speaker would say on other levels (e.g., lexical). TH1 were written for the whole MERLIN corpus.</p> -<p> -Based on these target hypotheses, data were annotated with a wide range of language characteristics - the <strong>learner language features</strong> - originating from various sources (<a href="#" onclick="document.forms['research'].submit();" class="reference">learn more here</a>). These language features are described in detail in the <a href="#" onclick="document.forms['annotation'].submit();" class="annotation">annotation scheme</a>. You can find a list of the features with some examples <a class="reference" href="#anchor4">here</a>. In the MERLIN corpus, learner language features from the fields of <strong>orthography and grammar</strong> are available for the whole database (error annotation 1). -</p> -<p> </p> -<h3> - <strong>Annotations in the core corpus</strong> -</h3> -<p> - <img src="style/annotations_GRAPHIC-layer_en2.png"/> -</p> -<p> -In the explorative, smaller MERLIN core corpus, linguistic aspects regarding vocabulary, pragmatics, sociolinguistic appropriateness are taken into consideration. 
The <strong>core corpus</strong> consists of two groups of texts which received either A2 or B2 ratings. -</p> -<p> -<a href="#" onclick="document.forms['research'].submit();" class="reference">Learn more about the core corpus, in the research section</a>. -</p> -<p> -The core corpus texts were enriched with an <strong>extended target hypothesis (TH2)</strong> that aims at creating an <strong>acceptable</strong> (for a native speaker) version of the original learner text. <strong>TH2</strong> takes into account more language dimensions that often regard context-dependent phenomena. -</p> -<p> -Also, <strong>learner language features</strong> regarding vocabulary, sociolinguistics, pragmatics, and intelligibility are included in the core corpus annotations (error annotation 2). Very often, these phenomena are not errors. These language features are also described in detail in the <a href="#" onclick="document.forms['download'].submit();" class="reference">annotation scheme</a>. -</p> -<p> -You can find more details on the annotation layers and contents <a href="#" onclick="document.forms['research'].submit();" class="reference" >here</a>. -</p> -<p> </p> -</div> -<div id="anchor2"></div> -<h3><a name="annotationscheme"></a>The annotation scheme - Learner language features </h3><a href="#anchor2" onClick="toggle('#content2','#img2')"><img id="img2" src="style/toggle-expand.png"></a> -<p>For the annotation of learner language characteristics, the MERLIN team developed an <strong>annotation scheme. </strong> ...</p> -<div id="content2" class="content"> -<p>The scheme is not merely based on error coding, but also takes into account other linguistic characteristics. 
It thus reflects the view of learner language as an evolving language system in its own right.</p> -<p> - Also, the annotation scheme integrates tags that CEFR users identified as important as well as tags suggested by Second Language Acquisition research, the CEFR scales, and the learner texts themselves. </p> -<p><strong><a href="#" onclick="document.forms['research'].submit();" class="reference">Read more about the source of the annotated features and the methodology </a></strong></p> -<p><a class="reference" href="docs/AS_part1.pdf" target="_new">Click to download the MERLIN annotation scheme.</a> <img src="style/document-pdf.png"/></p> - <p> -The MERLIN annotations followed a strict policy of reliability control. Difficult decisions made during the annotation process are recorded in the project documentation (<a href="#" onclick="document.forms['download'].submit();">see FAQ document</a>). If you have any questions concerning specific annotations, please don't hesitate to get in touch with the <a href="#" onclick="document.forms['contact'].submit();" class="reference">MERLIN team</a>. 
</p> - <p> </p> - -</div> -<p> </p> -<h2 class="example">Examples from the corpus</h2> -<p> </p> - -<div id="anchor4"></div> -<h3><a name="featurelist"></a>List of learner language features with examples <a href="#anchor4" onClick="toggle('#content4','#img4')"><img id="img4" src="style/toggle-expand.png"></a></h3> -<p class="Stil5"> </p> -<div id="content4" class="content"> -<h3><a href="#grammartags">Grammar</a></h3> -<p></p> -<h3><a href="#orthographytags">Orthography</a></h3> -<p> </p> -<table border="1" cellpadding="0" cellspacing="0" bordercolor="99dff9"> - <tr> - <td valign="top" bgcolor="99dff9"><p><strong><a name="grammartags"></a>GRAMMAR TAGS</strong></p></td> - <td valign="top" bgcolor="99dff9"><p><strong>Example</strong><br> - <strong>[] erroneous element {} missing element <> expected element</strong></p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>word order in main clause</strong></p></td> - <td valign="top"><p>*[Vielleicht du könntest mir bei meine Wohnungssuche helfen.]<br> - *[Sollst du Wasser und Bikini mitbringen.] </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>word order in subordinate clause</strong></p></td> - <td valign="top"><p>*[wenn haben Sie Zeit,] dann bitte sagen Sie mir. </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>negation general</strong></p></td> - <td valign="top"><p>*Ich habe [nicht] Zeit. <br> - *Er wird dort arbeiten [nein]. </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>CZE: double negation </strong></p></td> - <td valign="top"><p>*[mám] žádný čas <nemám žádný čas><br> - *nikdo [volal] <nikdo nevolal> </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>verb valency: number of obligatory arguments</strong></p></td> - <td valign="top"><p>*Er hat uns nicht gesagt, ob {er} kommen will. <br> - *Petr vstává v 6 hodin. On nesnídá, protože [on] nemá hlad. 
</p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>agreement (subject and verb)</strong></p></td> - <td valign="top"><p>*Jana [hast] gelesen, *Jana [sind] müde </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>reflexive pronoun</strong></p></td> - <td valign="top"><p>*er [entschuldigt], *Laura und Ferdinand reden [sich]<br> - *smála [si]; *[se] <si> lava ogni mattina </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>CZE: possessive reflexive pronoun</strong></p></td> - <td valign="top"><p>*potřebuju [moji] knihu, * vidím [mého] otce </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>inexistent inflection (nouns, adj, verb)</strong></p></td> - <td valign="top"><p>adjective: *ein [blaus] Himmel <blauer>; [teuerer] <teurer>; [größen] <großen / größeren><br> - noun: *das schöne [Hause], *[euche] [Fahrrade], <br> - verb: *Johannes [trinks] keine Milch. *… meine Rechte und Pflichten zu [weißen]; *Wie ich dir [gesagen] hate... </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>wrong inflection (nouns, pronouns, adj)</strong></p></td> - <td valign="top"><p>case: *… ich suche eine neue Wohnung in [diese] Stadt; <br> - čte romány a chodí na [procházce] </p></td> - </tr> - <tr> - <td valign="top" class="example"><p> </p></td> - <td valign="top"><p>number: *Ich werde zwei [Woche] dort verbringen; </p></td> - </tr> - <tr> - <td valign="top" class="example"><p> </p></td> - <td valign="top"><p>gender: *Ich brauche [eine] [große] Wagen für die Möbel. </p></td> - </tr> - <tr> - <td valign="top" class="example"><p> </p></td> - <td valign="top"><p>ambiguous: *Die Silvesternacht habe ich mit [meiner] [Kinder] verbracht. (number? case?) 
</p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>verb: tense</strong></p></td> - <td valign="top"><p>*gestern wir [kochen] gemeinsam;<br> - *Mi ha domandato se [ho] fretta <Mi ha domandato se avevo fretta> </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>verb: voice</strong></p></td> - <td valign="top"><p>*Peter [wurde gezeigt] mir sein neues Buch, die Stadt [gründete] im Jahre 1234; *studenti [budou napsáni] test </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>verb: mood</strong></p></td> - <td valign="top"><p>*[Jdi] do města?, *er [würde gehen] gestern ins Kino <ist gestern ins Kino gegangen/ging gestern>; *[Stai] bene! </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>verb: aspect (CZE+ITA)</strong></p></td> - <td valign="top"><p>CZE: *celý den [se naučil] <celý den učil><br> - ITA: imperfetto instead of pass.pross.: *sempre pensavo <ho sempre pensato> che voi due </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>verb formation (morphol.)</strong></p></td> - <td valign="top"><p><em>errors in the formation of complex predicates (i.e. analytical verb forms, predicates with modals and copulative predicates).</em></p> - <p>*er wird [lese]; *du musst [kommst]; *Diese zwei Frage richtig {zu} beantworten ist nicht einfach.; *Der Buchladen [hat] in der Stadt, *Die Studentin [ist] kam in die Schule </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>main verb</strong></p></td> - <td valign="top"><p>*… mit großem Interesse habe ich in XY Zeitung Ihre Anzeige {gelesen}; *Ich [nehme] besoche meine Tochter. 
</p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>preposition</strong></p></td> - <td valign="top"><p>*ich warte {auf} deine Antwort; *kannst du [bei] mir helfen?, *Er ist gekommen eine Stunde [vor] </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>article</strong></p></td> - <td valign="top"><p>*habe {die} litauische Staatsangehörigkeit; *[il] mese fa siamo andati; *ich bringe [etwas] Geschänk </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>conjunction</strong></p></td> - <td valign="top"><p>*er füttert den Hund, {der/welcher} nicht ihm gehört; *er half mir [dass] ich aufstehe, *Karl kam [um] [für] helfen </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>ITA: Clitic</strong></p></td> - <td valign="top"><p>a) puoi [chiamarla] <puoi chiamarmi>; ho dimenticato di [scrivere] prima <ho dimenticato di scriverlo prima><br> - *non { c'è } problema </p></td> - </tr> - <tr> - <td valign="top" class="example"><p><strong>part of speech error</strong></p></td> - <td valign="top"><p>*Ich freue mich für unsere [besucht] <Besuch>; *Ich bin sehr flexibel und [Mobilität] <mobil>; *Kannst du mich [Hilfe] <helfen> </p></td> - </tr> -</table> -<p> </p> -<h3><a name="orthographytags"></a>Orthography</h3> -<p> </p> -<table border="1" cellpadding="0" cellspacing="0" bordercolor="99dff9"> - <tr> - <td valign="top" bgcolor="99dff9"><p><strong>ORTHOGRAPHY TAGS</strong></p></td> - <td valign="top" bgcolor="99dff9"><p><strong> </strong><strong>Example</strong></p> - <p><strong>[] erroneous element {} missing element <> expected element</strong></p></td> - </tr> - <tr> - <td valign="bottom" class="example"><p><strong>general grapheme error</strong></p></td> - <td valign="top"><p>*[libe] <liebe>, *[Monart] <Monat>; *[schreipt] <schreibt>; *[experienza] <esperienza>; *[mo] {ma}; *[wie] <wir> </p></td> - </tr> - <tr> - <td valign="bottom" class="example"><p><strong>grapheme transposition</strong></p></td> - <td 
valign="top"><p>*[revelant] <relevant>; *[saulti] <saluti>; *[kraští] <kratší> </p></td> - </tr> - <tr> - <td valign="bottom" class="example"><p><strong>CZE+ITA: diacritical marks</strong></p></td> - <td valign="top"><p>*[kratši] <kratší>; *[e] andata <è>; *[Váčlav] <Václav>; *[ůplný] <úplný>; *[perchè] <perché> </p></td> - </tr> - <tr> - <td valign="bottom" class="example"><p><strong>capitalization </strong></p></td> - <td valign="top"><p>*[sie] waren in Frankreich, [Und] danach in Deutschland. </p></td> - </tr> - <tr> - <td valign="bottom" class="example"><p><strong>word boundary </strong></p></td> - <td valign="top"><p>*[Schlafe zimmer]; *[das selbe]; *[ne čekala] <nečekala>; *[qui ndi]; *[dolesa] <do lesa>; *[Desweiteren] </p></td> - </tr> - <tr> - <td valign="bottom" class="example"><p><strong>abbreviation</strong></p></td> - <td valign="top"><p>*[Sms] <SMS>; *[at.] <atd.> </p></td> - </tr> - <tr> - <td valign="bottom" class="example"><p><strong>punctuation</strong></p></td> - <td valign="top"><p>*[Er kam nicht] aber er hat sich nicht entschuldigt.<br> - *Rom, Paris[,] und Berlin gefallen mir sehr. 
</p></td> - </tr> - <tr> - <td valign="bottom" class="example"><p><strong>GER+ITA: apostrophe</strong></p></td> - <td valign="top"><p>*Das ist [Mama's] Buch.; *d{‘}accordo </p></td> - </tr> -</table> -<p> </p></div> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/contact.php b/php/en/old-04-12-14/contact.php deleted file mode 100644 index 042e0f4c469137fdc3cd60f4d26b379504972552..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/contact.php +++ /dev/null @@ -1,20 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>Contact</h1> -<p> </p> -<p>If you have any comments or queries about the MERLIN project or if you are interested in using the MERLIN corpus for your research, please don’t hesitate to contact us for more details.</p> -<p> </p> -<p>Dr. Katrin Wisniewski: <a href="mailto:katrin.wisniewski@tu-dresden.de">katrin.wisniewski@tu-dresden.de</a> -</p> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/disclaimer.php b/php/en/old-04-12-14/disclaimer.php deleted file mode 100644 index 34f0231580d2b8a872bb03dc1ddfb8e3a69fc07e..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/disclaimer.php +++ /dev/null @@ -1,50 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>Disclaimer</h1> -<p> </p> -<p><img src="style/EU_flag_LLP_EN-01.png" alt="LLP-Logo" width="200" height="77"></p> -<p>This project has been funded with support from the European Commission. 
This website reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.</p> -<p> </p> -<p><strong>Responsible for website content and maintenance:</strong></p> -<p> Technische Universität Dresden<br> - Fakultät Sprach-, Literatur- und Kulturwissenschaften<br> - Institut für Romanistik<br> - Prof. Lieber, Dr. Katrin Wisniewski<br> - 01062 Dresden<br> - Tel: +49 (0) 351 463-33216<br> - Fax: +49 (0) 351 463-37702</p> -<p> </p> -<p><strong>Universität Tübingen</strong><br> - Seminar für Sprachwissenschaft<br> - Abt. Theoretische Computerlinguistik<br> - Prof. Meurers<br> - 72074 Tübingen<br> - Tel: +49 (0) 7071 2973963<br> - Fax: +49 (0) 7071 295213</p> -<p> </p> -<p><strong>European Academy Bozen</strong><br> - Institute for Specialised Communication and Multilingualism<br> - Viale Druso, 1 / Drususallee 1<br> - 39100 Bolzano / Bozen - Italy<br> - Andrea Abel, Verena Lyding<br> - Tel: +39 0471 055 055<br> - Fax: +39 0471 055 099<br> -</p> -<br> -<h2><strong>Credits</strong></h2> -<p>The following icons used on the MERLIN platform are licensed under the Creative Commons-License. 
In detail:</p> -<p><img src="style/toggle-expand.png" width="16" height="16"> and <img src="style/toggle-minus.png" width="16" height="16"> by: <a href="http://iconfindr.com/1qklJzy" target="_blank">Yusuke Kamiyamane</a> | licensed under Creative Commons (CC BY 3.0)</p> -<p><img src="style/icon_info.png" width="16" height="16"> by: <a href="http://www.famfamfam.com/about/" target="_blank">Mark James</a> | licensed under Creative Commons (CC BY 2.5)</p> -<p><img src="style/icon_help.png" width="16" height="16"> by: <a href="http://icons8.com/" target="_blank"> Visual Pharm</a> | licensed under Creative Commons (CC BY-ND 3.0) </p> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/download.php b/php/en/old-04-12-14/download.php deleted file mode 100644 index da2b5cd2b2d7034b6af9622115e7abd2348a6681..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/download.php +++ /dev/null @@ -1,40 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>Download page</h1> -<p><strong>Download MERLIN-related documents.</strong></p> -<p> </p> -<h2>User modelling</h2> -<p><a href="docs/WP4_UserModelling_Part1_Merlin_Content_report.pdf" target="_blank" class="reference">Report on user study </a><img src="style/document-pdf.png" alt="" width="16" height="16"></p> -<p><a href="docs/WP4_UserModelling_Part2_Merlin_Technical_report.pdf" target="_blank" class="reference">Report on user study - technical part</a></p> -<p> </p> -<h2>Corpus: Tests and data-preparation</h2> -<p><a href="docs/MERLIN_Rating-Grid.pdf" target="_blank" class="reference">MERLIN rating grid</a> <img src="style/document-pdf.png" width="16" height="16"></p> -<p><strong>Transcription guidelines</strong> (to be prepared for download)</p> -<p><strong>Complete test tasks </strong>including a task description are available for 
download in the section <a href="#" onclick="document.forms['mcorpus'].submit();" class="reference">MERLIN corpus</a>.</p> -<p><a href="docs/MERLIN_Technical-report.pdf" target="_blank" class="reference">Technical report</a> <img src="style/document-pdf.png" alt="" width="16" height="16">: Report on the reliability and scale functionality of the MERLIN written speech sample ratings, by O. Bärenfänger</p> -<p> </p> -<h2>Annotations: Annotation scheme and annotation process</h2> -<p><a href="docs/AS_part1.pdf" target="_blank" class="reference">MERLIN annotation scheme</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> -<p><strong>Documentation of annotation decisions</strong>: "FAQ-document" (to be prepared for download)</p> -<p><strong>Annotation structure guidelines </strong>(to be prepared for download)</p> -<p> </p> -<h2>Conference abstracts and publications of the MERLIN team</h2> -<p>Katrin Wisniewski. <em>Giving a Voice to the Learner. Using the Multilingual MERLIN Learner Corpus Related to the Common European Framework of Reference for Scale Validation</em>. <a href="http://www.engl.polyu.edu.hk/events/apclc2014/index.html" target="_blank">Second Asia Pacific Corpus Linguistics Conference (APCLC 2014)</a>, Hong Kong, March 7-9, 2014.</p> -<p>Katrin Wisniewski, Karin Schöne, Lionel Nicolas, Chiara Vettori, Adriane Boyd, Detmar Meurers, Andrea Abel, Jirka Hana. <em>MERLIN: An online trilingual learner corpus empirically grounding the European Reference Levels in authentic learner data</em>. <a href="http://conference.pixel-online.net/ICT4LL2012/" target="_blank">ICT for Language Learning 2013</a>, Conference Proceedings. Libreriauniversitaria.it Edizioni, Florence, Italy, November 14-15, 2013.</p> -<p>Julia Hancke and Detmar Meurers. 
<em>Exploring CEFR classification for German based on rich linguistic modeling.</em><a href="http://lcr2013.b.uib.no/files/2013/09/abstracts-book.pdf" target="_blank"> Learner Corpus Research 2013, Book of Abstracts</a> <img src="style/document-pdf.png" alt="" width="16" height="16">. pp. 54-56. September 27-29, 2013. </p> -<p>Andrea Abel, Lionel Nicolas, Jirka Hana, Barbora Čtindlová, Katrin Wisniewski, Claudia Woldt, Detmar Meurers and Serhiy Bykh. <em>A Trilingual Learner Corpus illustrating European Reference Levels</em>. <a href="http://lcr2013.b.uib.no/" target="_blank">LCR 2013 Learner Corpus Research Conference 2013</a>, <a href="http://lcr2013.b.uib.no/files/2013/09/abstracts-book.pdf" target="_blank"> Book of Abstracts</a> <img src="style/document-pdf.png" width="16" height="16">, Bergen, Norway, September 27-29, 2013. </p> -<p>Claudia Woldt, Andrea Abel. <em>Lernertexte zuverlässig bewerten: Die mehrsprachige Plattform für die Europäischen Referenzniveaus MERLIN</em>. <a href="http://www.idt-2013.it" target="_blank">IDT 2013</a>, Bolzano, Italy, July 29 – August 3, 2013.</p> -<p>Katrin Wisniewski. Poster: <em>Illustrating and Researching the Common European Framework Levels with a Multilingual Online Platform</em>. 
<a href="http://www.ltrc2013.or.kr/" target="_blank">The 35th Language Testing Research Colloquium</a>, Seoul, Korea, July 3-5, 2013.</p> -<p> </p> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/help-annis-glossary.php b/php/en/old-04-12-14/help-annis-glossary.php deleted file mode 100644 index 606f88702ecc62179a629d5d4a9e08c572416869..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/help-annis-glossary.php +++ /dev/null @@ -1,202 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>Glossary </h1> - <h2>"Scheme"-elements</h2> - <h3>G_ Grammar</h3> - <p> </p> -<table cellspacing="0" cellpadding="2"> - <col width="131"> - <col width="216"> - <tr> - <td width="131"><p><strong>G_Agr</strong></p></td> - <td width="216"><p>agreement (subject and verb)</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Art</strong></p></td> - <td width="216"><p>Article</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Clit</strong></p></td> - <td width="216"><p>ITA: Clitic</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Conj</strong></p></td> - <td width="216"><p>Conjunction</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Inflect_inexist</strong></p></td> - <td width="216"><p>inexistent inflection (nouns, adj, verb)</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Morphol_wrong</strong></p></td> - <td width="216"><p>wrong inflection (nouns, pronouns, adj)</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Neg_negdoub</strong></p></td> - <td width="216"><p>CZE: double negation</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Neg_neggen</strong></p></td> - <td width="216"><p>negation general</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_POS</strong></p></td> - <td width="216"><p>part of speech error</p></td> - </tr> - <tr> - <td 
width="131"><p><strong>G_Prep</strong></p></td> - <td width="216"><p>Preposition</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Refl_pronrefl</strong></p></td> - <td width="216"><p>reflexive pronoun</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Refl_pronreflposs</strong></p></td> - <td width="216"><p>CZE: possessive reflexive pronoun</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Valency_complnumb</strong></p></td> - <td width="216"><p>verb valency: number of obligatory arguments</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Verb_asp</strong></p></td> - <td width="216"><p>verb: aspect (CZE+ITA)</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Verb_compl</strong></p></td> - <td width="216"><p>verb formation (morphol.)</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Verb_main</strong></p></td> - <td width="216"><p>main verb</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Verb_md</strong></p></td> - <td width="216"><p>verb: mood</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Verb_tns</strong></p></td> - <td width="216"><p>verb: tense</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Verb_vc</strong></p></td> - <td width="216"><p>verb: voice</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Wo_womaincl</strong></p></td> - <td width="216"><p>word order in main clause</p></td> - </tr> - <tr> - <td width="131"><p><strong>G_Wo_wosubcl</strong></p></td> - <td width="216"><p>word order in subordinate clause</p></td> - </tr> -</table> -<h2> </h2> -<h3>O_ Orthography</h3> -<table cellspacing="0" cellpadding="2"> - <col width="131"> - <col width="216"> - <tr> - <td width="131"><p><strong>O_Abbrev</strong></p></td> - <td width="216"><p>abbreviation</p></td> - </tr> - <tr> - <td width="131"><p><strong>O_Apostr</strong></p></td> - <td width="216"><p>GER+ITA: apostrophe</p></td> - </tr> - <tr> - <td width="131"><p><strong>O_Capit</strong></p></td> - <td width="216"><p>capitalization</p></td> - </tr> - <tr> - 
<td width="131"><p><strong>O_Graph_act</strong></p></td> - <td width="216"><p>CZE+ITA: diacritical marks</p></td> - </tr> - <tr> - <td width="131"><p><strong>O_Graph_graphgen</strong></p></td> - <td width="216"><p>general grapheme error</p></td> - </tr> - <tr> - <td width="131"><p><strong>O_Graph_trans</strong></p></td> - <td width="216"><p>grapheme transposition</p></td> - </tr> - <tr> - <td width="131"><p><strong>O_Punct</strong></p></td> - <td width="216"><p>punctuation</p></td> - </tr> - <tr> - <td width="131"><p><strong>O_Wordbd</strong></p></td> - <td width="216"><p>word boundary</p></td> - </tr> -</table> -<p> </p> -<h2>"_type" - elements</h2> -<table cellpadding="2" cellspacing="0"> - <col width="131"> - <col width="216"> - <tr> - <td width="131"><p><strong>add</strong></p></td> - <td width="216"><p>superfluous (added) element</p></td> - </tr> - <tr> - <td><p><strong>ambig</strong></p></td> - <td><p>ambiguous - type of error can't be specified</p></td> - </tr> - <tr> - <td><p><strong>asp</strong></p></td> - <td><p>aspect error</p></td> - </tr> - <tr> - <td><p><strong>ch</strong></p></td> - <td><p> wrong choice of element </p></td> - </tr> - <tr> - <td><p><strong>gend</strong></p></td> - <td><p>wrong gender</p></td> - </tr> - <tr> - <td><p><strong>merge</strong></p></td> - <td><p>elements are wrongly merged</p></td> - </tr> - <tr> - <td><p><strong>md</strong></p></td> - <td><p>wrong mood</p></td> - </tr> - <tr> - <td><p><strong>numb</strong></p></td> - <td><p>wrong number</p></td> - </tr> - <tr> - <td><p><strong>o</strong></p></td> - <td><p>omitted element </p></td> - </tr> - <tr> - <td width="131"><p><strong>pos</strong></p></td> - <td width="216"><p>wrong position</p></td> - </tr> - <tr> - <td><p><strong>split</strong></p></td> - <td><p>elements are wrongly split</p></td> - </tr> - <tr> - <td><p><strong>tns</strong></p></td> - <td><p>wrong tense</p></td> - </tr> - <tr> - <td width="131"><p><strong>vc</strong></p></td> - <td width="216"><p>wrong 
voice</p></td> - </tr> -</table> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/help-annis.php b/php/en/old-04-12-14/help-annis.php deleted file mode 100644 index 6458962846d56b241fa0682234cfd29b4fa51063..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/help-annis.php +++ /dev/null @@ -1,95 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>Explanation of your search output </h1> -<p><strong>Please note: this text has been opened in a new window (new tab of your browser).</strong></p> -<p>Your search result is displayed in ANNIS. ANNIS is open-source software (a search and visualization architecture) that can visualize multi-layered annotations. It enables corpus users to explore the whole set of diverse MERLIN annotations: target hypotheses (TH1, TH2), annotations of learner language features, and automatically assigned annotations (e.g. part of speech, sentences, etc.). <br> -In the search field on the left side (see [1] in the scheme below) you can see the query you started in the MERLIN search interface (advanced search) translated into the ANNIS query language. If you want to change your query, you can return to the MERLIN interface (“back to advanced search”) or modify it using the ANNIS query language. 
</p> -<p><strong><span class="Stil5">» For more information about ANNIS and the query language please visit</span></strong> <a href="http://www.sfb632.uni-potsdam.de/annis/" title="ANNIS home" target="_blank" class="reference">the ANNIS homepage.</a></p> -<p> </p> -<h2><strong>Basic information about your search output in ANNIS</strong></h2> -<h2><img src="style/ANNIS-view-legend.png" width="700" alt="grid view" title="to see the image full-sized right-click on it and choose 'view image'"></h2> -<p> </p> -<table border="0" cellpadding="6" cellspacing="0" bordercolor="#113547"> - <tr> - <td colspan="2" valign="top"><h3><strong>1-4 </strong><strong>Explanation of the basic elements in the interface</strong></h3></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>1</strong></p></td> - <td valign="top"><p>click here to minimize left part of the window (enlarge view of the result)</p></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>2</strong></p></td> - <td valign="top"><p>search field - displays your query in the query language. If you are familiar with the ANNIS query language you can modify your query here. Please use the button “back to advanced search”, if you want to run a new query with the help of MERLIN “advanced search” interface.</p></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>3</strong></p></td> - <td valign="top"><p>options - export results or run a frequency analysis</p></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>4</strong></p></td> - <td valign="top"><p>number of results that match your search - use arrows to switch between pages, (no. of results per page: 10)</p></td> - </tr> - <tr> - <td colspan="2" valign="top"><h3>5 <strong>Explanation of annotation levels (tiers)</strong></h3></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>5a</strong></p></td> - <td valign="top"><p><strong>tier: learner</strong> <br> - represents the original learner sentence. 
If the learner used emoticons or images, this is annotated within this tier, as are unreadable passages.</p></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>5b</strong></p></td> - <td valign="top"><p><strong>tier: ZH1</strong><br> - Target hypothesis 1 (TH1) represents a grammatically and orthographically acceptable version of the learner text. TH1 considers single sentences only and does not account for errors that can be observed only in the context of the surrounding text.</p></td> - </tr> - <tr> - <td width="20" valign="top"><p> </p></td> - <td valign="top"><p><strong>tier: ZH2</strong><br> - Target hypothesis 2 (TH2) represents a sociolinguistically acceptable version of the original learner text. TH2 is available for the core corpus only.</p></td> - </tr> - <tr> - <td width="20" valign="top"><p> </p></td> - <td valign="top"><p><strong>tier: ZH1Diff / ZH2Diff</strong><br> - These levels describe the type of deviation between the target hypothesis (TH1 or TH2) and the learner text (learner), with the elements being:</p> - <p><strong>CHA = changed element | INS = insertion | DEL = deletion of an element | MERGE = merging of two elements | SPLIT = splitting of two elements | MOVS / MOVT = moving element</strong></p> - <p> </p></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>5c</strong></p></td> - <td valign="top"><p><strong>tier: scheme </strong><br> - The “scheme” tiers show the annotated features, e.g. punctuation error (<strong>O_Punct</strong>). The grid allows for a detailed view of the span of the annotation (tag span), e.g. in case of a missing comma the whole clause or sentence including the missing mark is captured. If there is more than one feature in one unit (word, clause, or sentence), multiple tags are assigned. In that case, you will find several “scheme” tiers. Also, tag spans may overlap.<br> - For some types of features, i.e. 
“scheme” elements, the kind of deviation from the target language is specified, too (see 5d).</p> -<p><strong><span class="Stil5">» An explanation of all "scheme”-elements is provided <a href="help-annis-glossary.html" target="_blank">here</a></span></strong>. </p> - <p><strong><span class="Stil5">» Tag spans are defined in <a href="docs/AS_part1.pdf" target="_blank">the annotation scheme</a> <img src="style/document-pdf.png" width="16" height="16">.</span></strong></p></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>5d</strong></p></td> - <td valign="top"><p><strong>tier: xxx_type (e.g.: o_punct_type)</strong><br> - The name of the tier<strong> </strong>(<strong>e.g. o_punct_type</strong>)<strong> </strong>refers to the annotated error specified in the “scheme”-tier (e.g. <strong>O_Punct</strong>).<br> - The “_type”-tier can contain the following elements: </p> - <p><strong>omit = element omitted | add = superfluous element | choose = wrong choice of element | place = wrong position | merge = elements are wrongly merged | split = elements are wrongly split</strong></p> - <p>This tier describes the deviation of the learner language from the target language formally, e.g. 
<strong>o_punct_type = o </strong>means: <em>omitted comma</em>.</p></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>6</strong></p></td> - <td valign="top"><p>Click on the <strong>+</strong> to show additional annotation layers:<br> - <em><strong>automatic</strong></em>: shows lemmas and part of speech tags that have been assigned to the words in the learner text automatically, and also the corresponding sentence and T-units<br> - <em><strong>dependencies</strong></em>: shows dependencies / linguistic relations between the elements of a sentence.</p></td> - </tr> - <tr> - <td width="20" valign="top"><p><strong>7</strong></p></td> - <td valign="top"><p>Click on the<strong> +</strong> to show the full learner text.</p></td> - </tr> -</table> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/help.php b/php/en/old-04-12-14/help.php deleted file mode 100644 index 0d33897af0e927bce1bc31b7e888a6fd62f30f6d..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/help.php +++ /dev/null @@ -1,276 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>HELP </h1> -<form name="help-annis" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="help-annis.php"></form> -<form name="documents" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_documentsearch.php"></form> -<form name="simple" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_simplesearch.php"></form> -<form name="feature" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_featuresearch.php"></form> -<form name="advanced" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="F_advancedsearch.php"></form> -<p> </p> -<h2>Tutorial </h2> -<p>The video tutorial for 
introducing the MERLIN search functions will be available soon.</p> -<h2>FAQ </h2> -<div id="anchor1"></div> -<p><em>What can I use MERLIN for?</em> -<a href="#anchor1" onClick="toggle('#content1','#img1')"><img id="img1" src="style/toggle-expand.png"></a></p> -<div id="content1" class="content"> -<ul type="disc"> - <li> - <p>find sample texts for a specific CEFR-level and bring them to the classroom <a href="#" onclick="document.forms['documents'].submit();"><em><strong>» document search</strong></em><strong></strong></a></p> - </li> -</ul> -<ul> - <li> - <p>have your students work with the<strong><em> >> MERLIN tasks</em></strong> and compare their results to the MERLIN texts <a href="#" onclick="document.forms['documents'].submit();"><em><strong>» document search</strong></em><strong></strong></a></p> - </li> - <li> - <p>search for a word in learner texts and explore how learners use it, and which errors are related to the word <a href="#" onclick="document.forms['simple'].submit();"><strong><em>» simple search</em></strong></a></p> - </li> -</ul> -<ul type="disc"> - <li> - <p>create a sub-set (sub-corpus) of texts <a href="#" onclick="document.forms['documents'].submit();"><em><strong>» document search</strong></em><strong></strong></a> to explore errors that tend to appear with learners: <br> - with the same L1, who achieved the same CEFR level in the MERLIN ratings or of the same age </p></li> - <li> - <p>search for examples of learner language features, e.g. 
grammatical or orthographical errors, in the whole corpus or in your sub-corpus <a href="#" onclick="document.forms['advanced'].submit();">» go to <em><strong>advanced search</strong></em></a> and choose from “<em>Learner language features</em>” </p> - </li> - <li> - <p>find out which errors are frequent in texts that you included in your sub-corpus <a href="#" onclick="document.forms['feature'].submit();">» go to <em><strong>learner language features</strong></em><strong></strong></a></p> - </li> - <li> - <p> find examples of errors in the learners’ texts related to a specific word (e.g. valency errors with the verb “warten”)<a href="#" onclick="document.forms['advanced'].submit();"> » see example queries in the <strong><em>advanced search</em></strong></a></p> - </li> -</ul> -<p> </p> -</div> - -<div id="anchor2"></div> -<p><em>Where can I find ... ? </em> -<a href="#anchor2" onClick="toggle('#content2','#img2')"><img id="img2" src="style/toggle-expand.png"></a></p> -<div id="content2" class="content"> -<ul> - <li> - <p>a list of all <strong>test and task descriptions</strong>? 
<a href="#" onclick="document.forms['mcorpus'].submit();" class="reference"> go to MERLIN corpus</a></p> - </li> - <li> - <p>a list of <strong>learner language features with examples </strong><a href="#" onclick="document.forms['annotation'].submit();" class="reference"> go to MERLIN annotations</a></p> - </li> - <li> - <p>explanations on the <strong>search output </strong><a href="#" onclick="document.forms['help-annis'].submit();" class="reference">here</a></p></li> - <li> - <p>detailed explanation of the annotations, their structure and sources (learner language features) <span class="Stil5"><a href="docs/AS_part1.pdf" target="_blank" class="reference">MERLIN annotation scheme </a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"></span></p> - </li> -</ul> - -<p> </p> -</div> -<div id="anchor3"></div> -<p><em>What can I do with my search results?</em> -<a href="#anchor3" onClick="toggle('#content3','#img3')"><img id="img3" src="style/toggle-expand.png"></a></p> -<div id="content3" class="content"> -<ul> - <li> - <p><strong>Simple search</strong>: access full texts by clicking on the word you looked for (key word)</p> - </li> - <li> - <p><strong>Advanced search</strong>: you can export your query results (word or feature)</p></li> - <li><p><strong>Document search</strong>: you can export full texts (learner text and/or target hypotheses) and metadata</p></li> -</ul> -<p> </p> -</div> - -<div id="anchor4"></div> -<p><em>What is ANNIS? </em> -<a href="#anchor4" onClick="toggle('#content4','#img4')"><img id="img4" src="style/toggle-expand.png"></a></p> -<div id="content4" class="content"> -<p><br> - ANNIS is open-source software (actually a search and visualization architecture) that visualizes multi-level annotations such as those from the MERLIN corpus. 
ANNIS is integrated into the MERLIN interface as it enables corpus users to explore the whole set of diverse MERLIN annotations: target hypotheses (TH1 and TH2), annotations of learner language features, automatically assigned annotations such as sentences and part of speech, etc. <br> - For more information about ANNIS and the ANNIS query language please visit the <span class="reference"><a href="http://www.sfb632.uni-potsdam.de/annis/" target="_blank">ANNIS homepage</a>.</span></p> -<p> </p> - -</div> -<p> </p> -<h2>Glossary </h2> -<h3><a href="#AH">A-H</a> <a href="#IO">I-O</a> <a href="#PZ">P-Z</a></h3> -<p> </p> -<table border="0" cellpadding="4" cellspacing="0" bordercolor="#113547"> - <tr> - <td width="140" valign="top"><h3><a name="AH"></a>A-H</h3></td> - <td valign="top"> </td> - </tr> - <tr> - <td width="140" valign="top"><p>ANNIS</p></td> - <td valign="top"><p>ANNIS is open-source software (actually a search and visualization architecture) that visualizes multi-level annotations such as those from the MERLIN corpus. </p> - <p><strong><em>» more information about ANNIS at the developers' <a href="http://www.sfb632.uni-potsdam.de/" target="_blank">website</a></em></strong></p></td> - </tr> - <tr> - <td width="140" valign="top"><p> agreement error </p></td> - <td valign="top"><p>An agreement error in MERLIN includes wrong grammatical forms in the combinations of subject and verb. 
<a href="docs/AS_part1.pdf" target="_blank" class="reference">see annotation scheme</a></p></td> - </tr> - <tr> - <td width="140" valign="top"><p>annotation</p></td> - <td valign="top"><p>markup of the learner's text; there are several types of annotations in the MERLIN corpus: <strong><em>» metadata</em></strong>, <strong><em>» learner language features</em></strong>, <strong><em>» POS</em></strong> annotations</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>author ID</p></td> - <td valign="top"><p>number that clearly identifies a text in the MERLIN corpus</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>CHA</p></td> - <td valign="top"><p>changed element; annotation describing the type of deviation between the learner text and the target hypothesis (type of target level modification)</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>corpus</p></td> - <td valign="top"><p>collection of texts, in MERLIN: collection of written productions of learners </p> - <p><em><strong>» for details on the texts and their sources see </strong></em><strong><em><a href="#" onclick="document.forms['mcorpus'].submit();">MERLIN Corpus</a></em></strong></p></td> - </tr> - <tr> - <td width="140" valign="top"><p>DEL</p></td> - <td valign="top"><p>Refers to TH1/TH2. 
Deletion of an element; annotation describing the type of deviation between the learner text and the target hypothesis (type of target level modification)</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>EA1</p></td> - <td valign="top"><p>first stage of error annotation that includes grammatical and orthographical errors; <br> - <em><strong>» read more about the </strong></em><strong></strong><a href="#" onclick="document.forms['annotation'].submit();" class="reference"> annotation structure</a></p></td> - </tr> - - <tr> - <td width="140" valign="top"><p>EA2</p></td> - <td valign="top"><p>second stage of error annotation highlighting learner language features that refer to vocabulary, pragmatics, sociolinguistics, and general intelligibility; EA2 annotations are available for the core corpus.</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>Fair CEFR-level</p></td> - <td valign="top"><p>the CEFR-related rating of a learner text corrected for unfair rating tendencies with a statistical procedure <br> - <a href="#MFR" class="reference">see also Multi-Facet Rasch analysis</a></p> - <div> - <div></div> - </div></td> - </tr> - <tr> - <td width="140" valign="top"><p>Formulaic sequence</p></td> - <td valign="top"><p> “[...] a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar” (Wray 2002: 9). <br> - <em><strong>» for examples</strong></em><strong><em>: see </em></strong><a href="docs/AS_part1.pdf" target="_blank" class="reference">annotation scheme</a>, <em><strong>part: Vocabulary </strong></em><strong></strong></p></td> - </tr> - <tr> - <td width="140" valign="top"><p>grapheme</p></td> - <td valign="top"><p>A grapheme is the smallest semantically distinguishing unit in a written language, analogous to the phonemes of spoken languages. 
A grapheme may or may not carry meaning by itself, and may or may not correspond to a single phoneme.</p></td> - </tr> - <tr> - <td width="140" valign="top"><h3><a name="IO"></a>I-O</h3></td> - <td valign="top"> </td> - </tr> - <tr> - <td width="140" valign="top"><p>inexistent inflection</p></td> - <td valign="top"><p>inflected forms of a substantive, article, numeral, pronoun or verb that contain a formal error, i.e. a form that does not exist in the inflectional paradigm<br> - <em><strong>» for examples</strong></em><strong><em>: see </em></strong><a href="docs/AS_part1.pdf" target="_blank" class="reference">annotation scheme</a>, <em><strong>part: Grammar </strong></em><strong></strong></p></td> - </tr> - <tr> - <td width="140" valign="top"><p>INS</p></td> - <td valign="top"><p>Refers to TH1/TH2. Insertion / omitted element; annotation describing the type of deviation between the learner text and the target hypothesis (type of target level modification)</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>KWIC</p></td> - <td valign="top"><p>key word in context; the most common way to present concordance lines; looks like your search output in the <em><strong>simple search</strong></em><strong></strong></p></td> - </tr> - <tr> - <td width="140" valign="top"><p>L1, L2</p></td> - <td valign="top"><p>L1 = mother tongue, L2 = foreign language </p></td> - </tr> - <tr> - <td width="140" valign="top"><p>learner language feature </p></td> - <td valign="top"><p>phenomenon of the learner language; can be an error but also a non-deficit-oriented feature, e.g. the use of an idiom or the realization of an indirect request. 
Learner language features are annotated (EA1, EA2) in MERLIN <br> - <em><strong>» for examples</strong></em><strong><em>: see </em></strong><a href="docs/AS_part1.pdf" target="_blank" class="reference">annotation scheme</a></p></td> - </tr> - <tr> - <td width="140" valign="top"><p>lemma</p></td> - <td valign="top"><p>all inflected forms of a term</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>MERGE</p></td> - <td valign="top"><p>refers to TH1/TH2; merging of two elements; annotation describing the type of deviation between the learner text and the target hypothesis (type of target level modification)</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>metadata</p></td> - <td valign="top"><p>“labels” that are attached to learner texts informing about the author (L1, age, sex) or the text (test institution, rating, etc.)<br> - <strong><em><strong>» </strong>read more about metadata that</em></strong> <strong><em>are available for the MERLIN corpus</em></strong> <em><strong>in <a href="#" onclick="document.forms['mcorpus'].submit();">the MERLIN corpus </a>section</strong></em><strong></strong></p></td> - </tr> - <tr> - <td width="140" valign="top"><p>MOVS/MOVT</p></td> - <td valign="top"><p>refers to TH1/TH2; moved element; annotation describing the type of deviation between the learner text and the target hypothesis (type of target level modification)</p></td> - </tr> - <tr> - <td width="140" valign="top"><p><a name="MFR"></a>Multi-Facet Rasch analysis</p></td> - <td valign="top"><p>probabilistic statistical procedure often used in language testing which allows for a correction of unwished rating tendencies (e.g., leniency/harshness) and makes it possible to arrive at a fair average rating for each text. 
- <br> - <strong><em> </em></strong><em><strong>» </strong></em><strong><em>for details on re-ratings of learner texts for the MERLIN corpus <a href="#" onclick="document.forms['mcorpus'].submit();">go to "MERLIN corpus" </a></em></strong></p></td> - </tr> - <tr> - <td width="140" valign="top"><p>NLP</p></td> - <td valign="top"><p>abbr.: natural language processing</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>POS</p></td> - <td valign="top"><p>abbr.: part of speech (noun, verb, adjective, etc.)</p></td> - </tr> - <tr> - <td width="140" valign="top"><h3><a name="PZ"></a>P-Z</h3></td> - <td valign="top"> </td> - </tr> - <tr> - <td width="140" valign="top"><p>scheme</p></td> - <td valign="top"><p><em>in the output of the advanced search</em>: level that contains information about the annotated learner language feature <a href="#" onclick="document.forms['help-annis'].submit();" class="reference">find here more explanations about the search output</a></p></td> - </tr> - <tr> - <td width="140" valign="top"><p>sentence</p></td> - <td valign="top"><p>A sentence is a group of words delimited with one of the following punctuation marks that signal the end of a sentence: period, question mark, exclamation mark, quotation mark, or ellipsis. 
(Hunt 1965, Tapia 1993)</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>SLA</p></td> - <td valign="top"><p>abbr.: second language acquisition</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>SPLIT</p></td> - <td valign="top"><p>refers to TH1/TH2; splitting of two elements; annotation describing the type of deviation between the learner text and the target hypothesis (type of target level modification)</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>sub-corpus</p></td> - <td valign="top"><p>is a small component of the whole corpus usually compiled by choosing from certain criteria such as learner and test features (metadata)</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>tag</p></td> - <td valign="top"><p>annotation (highlighting) of an error or features in the learner text; tags are assigned to single words, phrases or sentences in the learner text. <br> - <a href="#tagspan" class="reference">see tag span</a></td> - </tr> - <tr> - <td width="140" valign="top"><p><a name="tagspan"></a>tag span</p></td> - <td valign="top"><p>area or part of the learner texts that has been annotated with an error or feature; the tag span can be a single word form, several words or whole phrases and sentences (e.g. in case of punctuation errors). </p></td> - </tr> - <tr> - <td width="140" valign="top"><p><a name="th"></a>target hypothesis (TH1, TH2) </p></td> - <td valign="top"><p>correct reconstruction of a learner’s utterances in the target language based on strict rules</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>verb valency</p></td> - <td valign="top"><p>Verb valency refers to the number of arguments controlled by a verbal predicate. Verb valency includes all obligatory arguments, including the subject of the verb. 
</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>XML</p></td> - <td valign="top"><p>text format (markup language) for creating and encoding documents in a format that is both human-readable and machine-readable</p></td> - </tr> - <tr> - <td width="140" valign="top"><p>ZH1, ZH2</p></td> - <td valign="top"><p>German abbreviations for <a href="#th" class="reference">target hypothesis</a></p></td> - </tr> -</table> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/mcorpus.php b/php/en/old-04-12-14/mcorpus.php deleted file mode 100644 index 6ad19b4df34f7ea4d116137b79b0dae455029721..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/mcorpus.php +++ /dev/null @@ -1,291 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>The MERLIN corpus</h1> -<p> </p> -<p>The MERLIN corpus contains appr. 2,300 texts for learners of Italian, German and Czech that were taken from written examinations of acknowledged test institutions. The exams aim to test knowledge across the levels A1-C1 of the Common European Framework of Reference (CEFR).</p> -<p> </p> -<div id="anchor1"></div> -<h2>Texts and test institutions <a href="#anchor1" onClick="toggle('#content1','#img1')"><img id="img1" src="style/toggle-expand.png"></a></h2> -<p>Standardised texts used in written exams within the Common European Framework of Reference for Languages (CEFR) were extracted for the learner corpus to create texts for written assessments. </p> -<div id="content1" class="content"> -<p>The exam tasks comply with strict international quality standards (e.g., all tasks passed the ALTE audit) and were provided by acknowledged testing institutions – e.g., the <em>Testinstitut TELC</em> in Frankfurt/M. 
for German and Italian and the Research and Test Centre of the Institute of Language and Preparatory Studies at Charles University in Prague for Czech as L2.</p> -<p><a href="researcher.html" class="reference"> more information on the tests</a></p> -<p><a href="researcher.html" target="_blank" class="reference">more details on data preparation for annotations</a></p> -<p> </p> -</div> -<p> </p> -<div id="anchor2"></div> -<h2>The relation to the Framework of Reference - the MERLIN rating grid <a href="#anchor2" onClick="toggle('#content2','#img2')"><img id="img2" src="style/toggle-expand.png"></a></h2> -<p>To ensure an immediate relation to the CEFR, specially trained testers re-rated all exam texts using the MERLIN rating grid that was developed within the project. ...</p> -<div id="content2" class="content"> -<p>The reliability of the ratings was subjected to rigorous statistical verification procedures. As a result, a reliable rating profile is created for each text in the corpus. The profile reflects both a general holistic overall level and the individual rating criteria detailed below:</p> -<p class="example"><strong>general linguistic range | vocabulary range | vocabulary control | grammatical accuracy | coherence | sociolinguistic appropriateness | orthography</strong></p> -<p><a href="docs/MERLIN_Rating-Grid.pdf" target="_blank" class="reference">download the MERLIN rating grid </a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> -<p><a href="researcher.html" target="_blank" class="reference">more information on the re-ratings </a></p> -</div> -<p> </p> -<div id="anchor3"></div> -<h2>Test tasks <a href="#anchor3" onClick="toggle('#content3','#img3')"><img id="img3" src="style/toggle-expand.png"></a></h2> -<p>We provide a comprehensive overview of the test tasks by target language and CEFR level tested. 
...</p> -<div id="content3" class="content"> -<p>The level of the test may differ from the level that the learner text received in the re-ratings.</p> -<p>The tasks are represented using a <a href="https://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0CCIQFjAA&url=https%3A%2F%2Fwww.coe.int%2Ft%2Fdg4%2Flinguistic%2FSource%2FCEFRWritingGridv3_1_analysis.doc&ei=bfX1U9LOJcLRywOs4IE4&usg=AFQjCNHsUTjEbfVMmXl4kVJ0h3H8PFyxwQ&bvm=bv.73231344,d.bGQ" target="_blank" class="reference">grid</a> that was developed for these purposes by ALTE (Association of Language Testers in Europe, <a href="www.alte.org" target="_blank" class="reference">www.alte.org</a>). The grid contains detailed information about the tasks and the specific characteristics of the intended text, e.g., regarding topic, register, domain (author: Olaf Bärenfänger).</p> -<p> </p> -<p><a href="docs/Notes- task-description.pdf" class="reference">General notes on task descriptions</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> -<p> </p> -<h3>German</h3> -<table border="0" cellspacing="0" cellpadding="0"> - <tr> - <td width="54" valign="top"><p><strong>A1</strong> </p></td> - <td width="600" valign="top"><p><a href="docs/INF-EMAIL-Help-request-apartment.pdf" target="_blank">Informal e-mail: ask a friend for help with finding an apartment</a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-EMAIL-Appointment-GER.pdf" target="_blank">Informal e-mail: arrange an appointment with a friend to go swimming together</a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Congratulations.pdf" target="_blank">Informal letter: congratulate to birth of a child</a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> - <p> </p></td> - </tr> - <tr> - <td width="54" valign="top"><p><strong>A2</strong></p></td> - <td width="600" valign="top"><p><a 
href="docs/FORM-LETTER-Housing-office.pdf" target="_blank">Formal letter: to housing office</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Request-pet-sitting.pdf" target="_blank">Informal letter: ask friend to take care of pet</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Offer-a-ticket.pdf" target="_blank">Informal letter: offer a ticket not used to a friend</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> - <p> </p></td> - </tr> - <tr> - <td width="54" valign="top"><p><strong>B1</strong></p></td> - <td width="600" valign="top"><p><a href="docs/INF-LETTER-New-years-wishes.pdf" target="_blank">Informal letter: for New Year to a friend</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Announce-a-visit.pdf" target="_blank">Informal letter: to a friend announcing a visit </a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Congratulations-birthday.pdf" target="_blank">Informal letter: birthday congratulations</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> - <p> </p></td> - </tr> - <tr> - <td width="54" valign="top"><p><strong>B2</strong></p></td> - <td width="600" valign="top"><p><a href="docs/FORM-LETTER-Information-request-Aupair-agency.pdf">Formal letter: ask for information at Au pair Agency</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-LETTER-Complaint-Aupair-Agency.pdf">Formal letter: Au pair writes letter of complaint to Agency </a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-LETTER-Application-internship.pdf">Formal letter: apply for internship in sales department </a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> - <p> </p></td> - </tr> - <tr> - <td width="54" 
valign="top"><p><strong>C1</strong></p></td> - <td width="600" valign="top"><p><a href="docs/FORM-ESSAY-Learning-German.pdf" target="_blank">Essay: why it's of value to learn German </a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-POSTING-traditions-and-assimilation.pdf" target="_blank">Online article: about sticking to one's traditions and "assimilation" in a new environment </a><img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-REPORT-housing-situation.pdf" target="_blank">Report: about the housing situation</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p></td> - </tr> -</table> -<p> </p> -<h3>Italian</h3> -<table border="0" cellspacing="0" cellpadding="0"> - <tr> - <td width="54" valign="top"><p><strong>A1</strong> </p></td> - <td width="600" valign="top"><p><a href="docs/INF-EMAIL-Appointment.pdf" target="_blank">Informal e-mail: reschedule an appointment</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-EMAIL-Help-a-friend.pdf" target="_blank">Informal e-mail: help a friend who is looking for work</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> - <p> </p></td> - </tr> - <tr> - <td width="54" valign="top"><p><strong>A2</strong></p></td> - <td width="600" valign="top"><p><a href="docs/INF-LETTER-See-a-friend.pdf" target="_blank">Informal letter: go see a friend</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Contact-a-friend.pdf" target="_blank">Informal letter: contact a friend after a long time</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Information-on-lang-courses.pdf" target="_blank">Informal letter: inform friends about language course</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> - <p> </p></td> - </tr> - <tr> - <td width="54" 
valign="top"><p><strong>B1</strong></p></td> - <td width="600" valign="top"><p><a href="docs/FORM-LETTER-Information-on-lang-courses.pdf" target="_blank">Formal letter: inquire about a language course</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Cooking-with-teacher.pdf" target="_blank">Informal letter: cook with teacher</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Answer-wedding-invitation.pdf" target="_blank">Informal letter: reply to a wedding invitation</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-LETTER-Help-a-friend-Work.pdf" target="_blank">Informal letter: help a friend who is looking for work after school-leaving exam</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> - <p> </p></td> - </tr> - <tr> - <td width="54" valign="top"><p><strong>B2</strong></p></td> - <td width="600" valign="top"><p><a href="docs/INF-LETTER-Help-with-chats.pdf" target="_blank">Informal letter: help someone who has problems with chats</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-LETTER-Experiences-lang-learning.pdf" target="_blank">Formal letter: describe experiences with language learning</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-LETTER-Complaint-hotel.pdf" target="_blank">Formal letter: complain about a hotel</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-LETTER-Information-request-cooking.pdf" target="_blank">Formal letter: ask for information about International Cooking Evenings</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-LETTER-Information-on-aid-project.pdf" target="_blank">Formal letter: inquire about an aid project</a> <img src="style/document-pdf.png" 
alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-LETTER-Application-internship-company.pdf" target="_blank">Formal letter: apply for internship in company</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-LETTER-Application-internship-fashion-sector.pdf" target="_blank">Formal letter: apply for internship in fashion sector</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p></td> - </tr> -</table> -<p> </p> -<h3>Czech</h3> -<table border="0" cellspacing="0" cellpadding="0"> - <tr> - <td width="54" valign="top"><p><strong>A2</strong> </p></td> - <td width="560" valign="top"><p><a href="docs/INF-EMAIL-Answer-an-invitation.pdf" target="_blank">Informal e-mail: answering a birthday invitation</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-DESCRIPTION-swimming.pdf" target="_blank">Description of a photo: swimming in the sea</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-EMAIL-hotel.pdf" target="_blank">Formal e-mail: write an email to a hotel</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-DESCRIPTION-playground.pdf" target="_blank">Description of a photo: playground</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-DESCRIPTION-situation-at-the-window.pdf" target="_blank">Description of a photo: woman sitting at the window</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> -<p> </p></td> - </tr> - <tr> - <td width="54" valign="top"><p><strong>B1</strong></p></td> - <td width="560" valign="top"><p><a href="docs/INF-EMAIL-Answer-to-a-friend_A.pdf" target="_blank">Informal e-mail: answer to the email of Alena, a friend</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/INF-EMAIL-Answer-to-a-friend_J.pdf" target="_blank">Informal e-mail: 
answer to the email of Jana, a friend</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/FORM-EMAIL-Information-request-Tandem-Agency.pdf" target="_blank">Formal e-mail: Information request, e-mail to a Tandem agency</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p> - <p> </p></td> - </tr> - <tr> - <td width="54" valign="top"><p><strong>B2</strong></p></td> - <td width="560" valign="top"><p><a href="docs/ESSAY-everywhere-well-but.pdf" target="_blank">Essay: Everywhere well but at home the best</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/ESSAY-No-pain-no-gain.pdf" target="_blank">Essay: No pain no gain</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/ESSAY-Friend-in-need.pdf" target="_blank">Essay: A friend in need is a friend indeed</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/ESSAY-More-people-know-more.pdf" target="_blank">Essay: More people know more</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/ESSAY-School-basis-for-life.pdf" target="_blank">Essay: School is the basis of life</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"><br> - <a href="docs/ESSAY-Clothes-make-the-man.pdf" target="_blank">Essay: Clothes make the man</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16"></p></td> - </tr> -</table> -</div> -<p> </p> - -<div id="anchor5"></div> -<h2>Available metadata <a href="#anchor5" onClick="toggle('#content5','#img5')"><img id="img5" src="style/toggle-expand.png"></a></h2> -<div id="content5" class="content"> -<p class="screen-reader-text">Jeder Lernertext ist beschrieben mit: </p> -<ul type="disc"> - <li class="screen-reader-text">Angaben zum Autor: Alter, Geschlecht, Muttersprache (L1) </li> - <li class="screen-reader-text">Angaben zum Prüfungsformat: ID der 
Prüfungsaufgabe, Thema, Register (formell/informell), Textsorte und geprüftes Niveau</li> - <p>Each text in the corpus is described with the following metadata: </p> - <p> </p> -</ul> -<p class="example"><strong>Information about the test author:</strong> <br> - age, sex, mother tongue (L1)</p> -<p class="example"> <strong>Information about the test</strong>: <br> - Task ID, topic, register (formal/informal), genre and level of the test the written production was extracted from</p> -<p class="example"> <strong>Ratings</strong>: </p> -<ol class="example"> - <li> - <p class="example">Overall rating: CEFR level the test received in the re-rating</p> - </li> - <li> - <p class="example"> Fair CEFR level according to single rating criteria: general linguistic range | vocabulary range | vocabulary control | grammatical accuracy | coherence | sociolinguistic appropriateness | orthography</p> - </li> -</ol> -<p> </p> -<p>You can filter all search results using these metadata. In this way you can analyse phenomena of the learner language for certain groups of learners. 
Metadata can be displayed and exported for each text.</p> -</div> -<p> </p> -<div id="anchor6"></div> -<h2>The MERLIN corpus in figures <a href="#anchor6" onClick="toggle('#content6','#img6')"><img id="img6" src="style/toggle-expand.png"></a></h2> -<p> </p> -<div id="content6" class="content"> -<p><strong>Number of texts per CEFR level of the test (test level) compared to the number of texts per CEFR level assigned in the re-rating (fair average)</strong></p> -<p> </p> -<table width="424" border="1" cellpadding="2" cellspacing="0" bordercolor="#035683"> - <tr> - <td width="90" valign="bottom"></td> - <td width="148" colspan="2" valign="bottom"><p><strong>Test Level </strong><strong></strong></p></td> - <td width="186" colspan="2" valign="bottom"><p><strong>Fair Average</strong><strong></strong></p></td> - </tr> - <tr> - <td width="90" rowspan="5"><p><strong>Czech</strong><strong></strong></p></td> - <td width="90" valign="bottom"><p><strong> </strong></p></td> - <td width="57" valign="bottom"><p><strong> </strong></p></td> - <td width="90" valign="bottom"><p><strong>A1</strong></p></td> - <td width="96" valign="bottom"><p>1</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>A2</strong></p></td> - <td width="57" valign="bottom"><p>111</p></td> - <td width="90" valign="bottom"><p><strong>A2</strong></p></td> - <td width="96" valign="bottom"><p>189</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>B1</strong></p></td> - <td width="57" valign="bottom"><p>143</p></td> - <td width="90" valign="bottom"><p><strong>B1</strong></p></td> - <td width="96" valign="bottom"><p>165</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>B2</strong></p></td> - <td width="57" valign="bottom"><p>188</p></td> - <td width="90" valign="bottom"><p><strong>B2</strong></p></td> - <td width="96" valign="bottom"><p>81</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong> </strong></p></td> - <td width="57" valign="bottom"><p> 
</p></td> - <td width="90" valign="bottom"><p><strong>C1</strong></p></td> - <td width="96" valign="bottom"><p>2</p></td> - </tr> - <tr> - <td width="90" rowspan="4"><p><strong>Italian</strong><strong></strong></p></td> - <td width="90" valign="bottom"><p><strong>A1</strong></p></td> - <td width="57" valign="bottom"><p>207</p></td> - <td width="90" valign="bottom"><p><strong>A1</strong></p></td> - <td width="96" valign="bottom"><p>29</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>A2</strong></p></td> - <td width="57" valign="bottom"><p>202</p></td> - <td width="90" valign="bottom"><p><strong>A2</strong></p></td> - <td width="96" valign="bottom"><p>378</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>B1</strong></p></td> - <td width="57" valign="bottom"><p>201</p></td> - <td width="90" valign="bottom"><p><strong>B1</strong></p></td> - <td width="96" valign="bottom"><p>394</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>B2</strong></p></td> - <td width="57" valign="bottom"><p>201</p></td> - <td width="90" valign="bottom"><p><strong>B2</strong><strong></strong></p></td> - <td width="96" valign="bottom"><p>2</p></td> - </tr> - <tr> - <td width="90" rowspan="6"><p><strong>German</strong><strong></strong></p></td> - <td width="90" valign="bottom"><p><strong>A1</strong></p></td> - <td width="57" valign="bottom"><p>206</p></td> - <td width="90" valign="bottom"><p><strong>A1</strong></p></td> - <td width="96" valign="bottom"><p>57</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>A2</strong></p></td> - <td width="57" valign="bottom"><p>209</p></td> - <td width="90" valign="bottom"><p><strong>A2</strong></p></td> - <td width="96" valign="bottom"><p>297</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>B1</strong></p></td> - <td width="57" valign="bottom"><p>210</p></td> - <td width="90" valign="bottom"><p><strong>B1</strong></p></td> - <td width="96" 
valign="bottom"><p>331</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>B2</strong></p></td> - <td width="57" valign="bottom"><p>204</p></td> - <td width="90" valign="bottom"><p><strong>B2</strong></p></td> - <td width="96" valign="bottom"><p>293</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>C1</strong></p></td> - <td width="57" valign="bottom"><p>204</p></td> - <td width="90" valign="bottom"><p><strong>C1</strong></p></td> - <td width="96" valign="bottom"><p>42</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong> </strong></p></td> - <td width="57" valign="bottom"><p><strong> </strong></p></td> - <td width="90" valign="bottom"><p><strong>C2</strong></p></td> - <td width="96" valign="bottom"><p>4</p></td> - </tr> - <tr> - <td width="90" valign="bottom"><p><strong>Total</strong><strong></strong></p></td> - <td width="148" colspan="2" valign="bottom"><p><strong>2286</strong></p></td> - <td width="186" colspan="2" valign="bottom"><p><strong>2265</strong></p></td> - </tr> -</table> -<p> </p> -</div> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/research.php b/php/en/old-04-12-14/research.php deleted file mode 100644 index a32c56347aa3d5146258aa6c96b19de379eb1e0a..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/research.php +++ /dev/null @@ -1,292 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>MERLIN for research</h1> -<h2>1. 
Linking the MERLIN texts to the CEFR</h2> -<div id="anchor11"></div> -<h3><a name="reratings"></a>1.1 Re-ratings</h3> - <a href="#anchor11" onClick="toggle('#content11','#img11')"><img id="img11" src="style/toggle-expand.png"></a> -<div id="content11" class="content"> -<p>The MERLIN texts are the writing sections of CEFR-related, standardized high-quality tests from telc (Frankfurt/Main, Italian and German tests, <a href="http://www.telc.net/" target="_blank" class="reference">homepage</a>) and ÚJOP (Prague, Czech tests, <a href="http://ujop.cuni.cz/" target="_blank" class="reference">homepage</a>). These institutions are ALTE-audited (<a href="http://www.alte.org" target="_blank" class="reference">ALTE-homepage</a>). The <a href="#" onclick="document.forms['mcorpus'].submit();" class="reference">tasks</a> were in use until 2013 and are now freely available on the platform. However, to obtain explicit and direct information about the CEFR profiles of the written productions themselves (and not only of the tests as a whole), all MERLIN texts were re-rated independently by two professional raters per language. -The reliability of the re-ratings was examined with the help of Classical Test Theory and a Multi-Facet Rasch analysis. The latter is a probabilistic statistical procedure often used in language testing which allows for a correction of rating tendencies (e.g., leniency/harshness) and makes it possible to arrive at a fair average rating for each text. The intra-rater and inter-rater reliability was generally very high in MERLIN, with some exceptions for Italian. Therefore, the whole re-rating process was repeated for Italian, resulting in a satisfactory rating quality. -In MERLIN, the fair average is calculated based on a holistic scale (see <a href="#instruments" class="reference">1.2 rating instruments</a>). 
In the document search, if you compile your own corpus based on CEFR levels, these are also based on the fair average ratings (» <em><strong>document search / Rated CEFR level</strong></em>). -If you are interested in more details regarding the quality of the ratings and the difficulty of the single rating criteria, please consult the <a href="#" onclick="document.forms['download'].submit();" class="reference">technical report</a>. </p> - -<p> </p> -</div> -<div id="anchor12"></div> -<h3><a name="instruments"></a>1.2 Rating instruments </h3> - <a href="#anchor12" onClick="toggle('#content12','#img12')"><img id="img12" src="style/toggle-expand.png"></a> -<div id="content12" class="content"> -<p>Two rating instruments were used: An assessor-oriented version (Alderson 1991) of the holistic scale (page 2 of the <a href="docs/MERLIN_Rating-Grid.pdf" target="_blank" class="reference">MERLIN rating grid</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16">) for "General Linguistic Range" (Chapter 5, CEFR) was accompanied by an analytical rating grid (page 3 of the <a href="docs/MERLIN_Rating-Grid.pdf" target="_blank" class="reference">MERLIN rating grid</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16">) that is closely connected to Table 3 of the CEFR (CoE 2001). This table was of great importance in the process of scaling the CEFR descriptors (North 2005, 2000). The MERLIN version includes six rating criteria (vocabulary range | vocabulary control |grammatical accuracy|coherence & cohesion|orthography| sociolinguistic appropriateness). These criteria stem from scales in Chapter 5 of the CEFR that specifies aspects of communicative L2 competence. For the construction of the grid, descriptors of these scales were modified in an assessor-oriented way. Plus-levels (A2+, B1+) were excluded as the CEFR does not specify descriptors for these levels for all rating criteria. 
The rating instruments were piloted before their implementation in the MERLIN project.</p> -</div> -<p> </p> - -<h2>2. Preparing the data</h2> -<div id="anchor21"></div> - <h3>2.1 Transcriptions</h3> - <a href="#anchor21" onClick="toggle('#content21','#img21')"><img id="img21" src="style/toggle-expand.png"></a> -<div id="content21" class="content"> -<p>The hand-written original learner texts were transcribed in an XML-based editor (xml mind©) at the testing institutions (telc/ÚJOP). The transcribers followed <u>transcription guidelines</u> (>>>LINK, available only in German) and the reliability of the transcripts was checked, initially for a sample of 5% of the texts per CEFR level. Because many transcription errors were detected, almost all texts ultimately had to undergo a revision stage.<br> -The transcription guidelines included tags (inline annotation) for basic textual features such as unreadable or ambiguous stretches of language, foreign language words, emoticons, images, paragraphs, words copied from the rubrics, or greeting formulae. The anonymization (names, places) was part of the transcription process and was carried out based on the guidelines.</p> -<div> - <div> </div> -</div> -</div> -<div id="anchor22"></div> -<h3>2.2 Tools & formats</h3> -<a href="#anchor22" onClick="toggle('#content22','#img22')"><img id="img22" src="style/toggle-expand.png"></a> -<div id="content22" class="content"> -<p>Once the transcriptions were available, all data was converted to PAULA (<a href="http://purl.org/net/paula" target="_blank" class="reference">purl.org/net/paula</a>), a standoff XML format designed as an exchange format for linguistic annotation. -Further manual annotations were carried out with two tools: MMAX2 (<a href="http://mmax2.net" target="_blank" class="reference">mmax2.net</a>) and the Falko Excel Add-in (<a href="http://purl.org/net/falko" target="_blank" class="reference">purl.org/net/falko</a>). 
MMAX2 is a text annotation tool that allows multi-layered annotation. It was used for the annotation of learner language features (see <a href="#" onclick="document.forms['annotation'].submit();" class="reference">2.3.1</a>). The Falko Add-in was used for annotating both target hypotheses 1 and 2 (» <em> <strong>for more details on the annotation of target hypotheses with the Falko Add-in see</strong></em> <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau%20und%20Annotationen_v2.01" target="_blank" class="reference">Falko-Handbuch</a>). -Automatic annotation made use of the UIMA framework (<a href="http://uima.apache.org" target="_blank" class="reference">uima.apache.org</a>). UIMA allows a modular integration of a wide range of NLP tools such as part-of-speech taggers and parsers. For the advanced search functions, the open-source, browser-based search and visualization architecture ANNIS (<a href="http://purl.org/net/annis" target="_blank" class="reference">purl.org/net/annis</a>) is used in the MERLIN interface (<form name="help-annis" action="../php/index.php" method="post"><input type="hidden" name="curscr" value="help-annis.php"></form><a href="#" onclick="document.forms['help-annis'].submit();" class="reference">see explanations on search output in ANNIS</a>). </p> -</div> -<div id="anchor231"></div> -<h3><a name="annotations"></a>2.3 Annotations</h3> -<p>A short introduction to the structure of the MERLIN annotations is provided <a href="#" onclick="document.forms['annotation'].submit();" class="reference">here</a>. 
Below, you will find more detailed information on the individual annotation layers that are available for the whole corpus and for the smaller core corpus, as well as notes on quality control.</p> -<blockquote> - <p><strong>2.3.1 Annotations available for the whole corpus</strong> <a href="#anchor231" onClick="toggle('#content231','#img231')"><img id="img231" src="style/toggle-expand.png"></a></p> -</blockquote> -<div id="content231" class="content"> - <p><img src="style/annotations_GRAPHIC-layer_en1.png" width="534" height="195" alt="EA1"></p> - <p> </p> - <h5><strong>Minimal target hypotheses / target hypotheses 1 (TH1)</strong></h5> - <p>All annotation is necessarily based on human interpretation of what the person who produced the text might have had in mind. It is important to make this interpretation explicit so that MERLIN users can understand the annotations better. Therefore, the MERLIN corpus contains rule-based target hypotheses that suggest a corrected version of the learner texts. <br> - In the main phase of annotation, an orthographically and grammatically correct version of the learner text was created (target hypotheses 1, TH1) for the whole corpus. Annotators were to intervene as little as possible. The following table shows a simple example (for a definition of the tiers, please refer to the <a href="#" onclick="document.forms['help-annis'].submit();" class="reference">explanations of the search output</a>):</p> - <p><img src="style/TH1_example1.png" width="597" height="68"></p> - <p>The following example by the same learner shows that in TH1, errors from other linguistic areas were ignored. 
There are content and technical reasons for this.</p> - <p><img src="style/TH1_example2.png" width="596" height="66"></p> - <p>While the orthographical (capitalization error, word boundary error, missing hyphen) and grammatical (missing article) errors are corrected in the TH1 (termed ‘ZH1’ here), the lexically erroneous form *Reisespass (instead of “Reisepass”) was not substituted by another lexeme. Phenomena like this are annotated in the <a href="#corecorpus" target="_blank" class="reference">MERLIN core corpus</a> (for definitions of the errors see <a href="docs/AS_part1.pdf" target="_blank" class="reference">MERLIN annotation scheme</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16">).</p> - <p>The team followed the target hypotheses rules developed for the <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko" target="_blank" class="reference">Falko corpus</a> and adapted them to the project needs where necessary (cf. Reznicek/Lüdeling et al. 2012; » see <a href="#" onclick="document.forms['download'].submit();" class="reference">annotation structure guidelines and FAQ document</a>). In some cases, annotators agreed upon annotation rules on a very fine-grained level. For example, it was decided that in German, the final double <ss> instead of standard German spelling <ß> was not changed in texts in which it might be possible that the learner consistently used the Swiss spelling, which does not use the <ß>. For single decisions that you might be interested in, please consult <a href="#" onclick="document.forms['download'].submit();" class="reference">the FAQ document</a>.</p> - <p>TH1 were compiled for the whole MERLIN corpus. The TH1 were written in Excel with the help of the Falko Add-in. The TH1 was piloted before the actual annotation took place.</p> - <p> If you want to display the TH1 on the MERLIN platform, go to » <strong><em>advanced search. 
</em></strong> To get explanations about the output you get there, read more <a href="#" onclick="document.forms['help-annis'].submit();" class="reference">here</a>. Alternatively, you can use the » <strong><em>document search</em></strong> to view TH1 for whole texts.</p> - <p> </p> - <p><strong>Useful links & downloads with regard to TH1:</strong><br> - <a href="#" onclick="document.forms['download'].submit();" class="reference">MERLIN annotation structure guidelines</a></p> - <p> Falko guidelines: <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau%20und%20Annotationen_v2.01" target="_blank" class="reference">Das Falko-Handbuch. Korpusaufbau und Annotationen. Version 2.01. HU Berlin</a></p> - <p><a href="#" onclick="document.forms['download'].submit();" class="reference">FAQ document</a></p> -<p> </p> -<h5><strong>Annotation of grammatical and orthographical learner language features – error annotation 1 (EA1)</strong></h5> -<p>Building on the target hypotheses 1, all MERLIN texts were annotated with grammatical and orthographical language features from various sources (error annotation 1 – EA1). You can find a complete list of the features (“tags”) with examples <a href="#" onclick="document.forms['annotation'].submit();" class="reference">here</a>, while the <a href="#" onclick="document.forms['download'].submit();" class="reference">annotation scheme </a> gives you full access to the definitions of each learner language feature and additional examples.</p> -<p>The MERLIN annotation tags for <strong>EA1 and EA2</strong> were derived from …</p> -<ol> - <li> - <p><strong>CEFR scales</strong>: some tags were chosen to support research about the empirical validity of the CEFR scales underlying the <a href="docs/MERLIN_Rating-Grid.pdf">MERLIN analytical rating grid </a><img src="style/document-pdf.png" width="16" height="16"> (chapter 5 of the CEFR, CoE 2001). 
They can help to check whether the predictions of selected CEFR descriptors correspond to learner behaviour, e.g.: intelligibility, use of idioms, content jumps (<a href="#scale-valid" class="reference">see 3.2 MERLIN for scale validation</a>).  </p> - </li> - <li> - <p>issues in current <strong>SLA research</strong>, e.g. grammatical aspects such as verb valency, word order, negation, or lexical aspects, e.g. the use of formulaic sequences (<a href="#bib" class="reference"> references</a>)</p> - </li> - <li> - <p>features reported to the MERLIN team by <strong>testers, teachers and teacher trainers</strong> in a questionnaire study and in expert interviews as being relevant for assessing language mastery at certain levels, e.g. the verbal aspect in Italian and Czech </p> - </li> - <li> - <p><strong>textbook and language test analyses</strong>, which revealed further recurrent topics, some of which were included in the MERLIN annotation scheme, e.g. German modal verbs</p> - </li> - <li> - <p><strong>learner text analyses</strong> carried out in a random sample of MERLIN texts (5% per test level/language), e.g. use of articles and clitics</p> - </li> - </ol> -<p> </p> -<p>The annotation scheme specifies to which group(s) the individual learner language features belong.</p> -<p>Furthermore, most error-related MERLIN tags (EA1 & EA2) incorporate the widely used <strong>‘target language modification’</strong> dimension (cf. Díaz-Negrillo/Fernández-Domínguez 2006). This dimension specifies the type of error: an element might have been omitted, changed, added, repositioned, merged with, or split from another element. You can find details about this in the <a href="docs/AS_part1.pdf" target="_blank" class="reference">annotation scheme</a> <img src="style/document-pdf.png" width="16" height="16">. 
</p> -<p>You can search for the annotated learner language features in the » <strong><em>advanced search,</em></strong> or you can extract lists of features relevant for a specific linguistic field or a specific CEFR level here <strong><em>» learner language features.</em></strong> -</p> -<p><strong>Further links: </strong></p> -<p><a href="#" onclick="document.forms['help-annis'].submit();" class="reference">advanced search output explanation</a></p> -<p><a href="docs/AS_part1.pdf" target="_blank" class="reference">annotation scheme</a> <img src="style/document-pdf.png" alt="" width="16" height="16">. </p> -<p><a href="#bib" target="_blank" class="reference">references</a></p> -<p><a href="#" onclick="document.forms['annotation'].submit();" class="reference">list with learner language features and examples</a></p> -<p> </p> - -<p> </p> -<div> - <div> </div> -</div> -</div> -<div id="anchor232"></div> -<blockquote> - <p><a name="corecorpus"></a><strong>2.3.2 Annotations in the MERLIN core corpus</strong> <a href="#anchor232" onClick="toggle('#content232','#img232')"><img id="img232" src="style/toggle-expand.png"></a></p> -</blockquote> -<div id="content232" class="content"> - <h5><strong>The structure of the MERLIN core corpus</strong></h5> - <p>For a small pilot sample (the <strong>MERLIN core corpus</strong>), further linguistic dimensions beyond grammar and orthography are taken into consideration. The <strong>MERLIN core corpus</strong> consists of texts that received <a href="#reratings" class="reference">fair averages</a> of either A2 or B2. Thus, two groups of learners with clearly distinct levels of proficiency can be compared. It is important to note that the ratings the learners received do not necessarily correspond to the CEFR level of the test they decided to take. 
You can distinguish between these dimensions in the <em><strong> » document search </strong></em>(“CEFR level of test” and “Rated CEFR level”).</p> - <p>Many learners outperformed the targeted CEFR levels, while others were rated lower than they would have expected. An extreme case is Italian, where only two texts actually received a B2 level, while many more students took B2 tests. Here, the MERLIN core corpus incorporates the 100 texts that were placed highest on the Rasch logit scale (<a href="#" onclick="document.forms['download'].submit();" class="reference">technical report</a>). </p> - <p><img src="style/annotations_GRAPHIC-layer_en2.png" width="529" height="200"></p> - <p> </p> - <h5><strong>Core corpus: extended target hypotheses/target hypotheses 2 (TH2)  </strong></h5> - <p> Target hypotheses 2 aim at creating an acceptable version of the learner text. This process involves more subjectivity and makes reliable decisions harder, which is why, as in the Falko project (with which MERLIN cooperated closely), it was kept separate from target hypotheses 1. The aim of TH2 is to capture the perspective of <strong>acceptability</strong> of the learner text (not, as for TH1, its correctness). TH2 are therefore an extension of TH1. To this end, the learner text was still modified only minimally, while its reconstruction comes close to what a native speaker's utterance would look like. This reconstruction concerns semantic and lexical aspects, pragmatics, and sociolinguistics. 
Unlike in TH1, phenomena that span sentence boundaries and that are determined by the context are modified, too.</p> - <p>You can search for the TH2 in the <em><strong> » advanced search</strong></em>.</p> - <p> </p> - <h5><strong>Core corpus: annotations of sociolinguistic, pragmatic, lexical, and other learner language features  (error annotation 2, EA2)</strong></h5> - <p>For the MERLIN core corpus, many tags from various linguistic perspectives were added to the grammatical and orthographical learner language features annotated in the main stage of the project. These tags stem from the same sources as the EA1 annotations (see the overview above). </p> - <p>Detailed information about the individual tags (which include, for example, the speech act REQUEST, the use of language with an inappropriate level of formality, the use of structures that pertain to spoken language variants, or reference problems) is given in the <a href="docs/AS_part1.pdf" target="_blank" class="reference">annotation scheme</a> <img src="style/document-pdf.png" width="16" height="16">. You can get an overview of the annotated features and find examples <a href="#" onclick="document.forms['annotation'].submit();" class="reference"><strong><em>in this table</em></strong></a>.</p> - <p>Again, the MERLIN tags incorporate the widely used ‘target language modification’ dimension (cf. Díaz-Negrillo/Fernández-Domínguez 2006), which yields information about the type of the learner language feature (an element might have been omitted, changed, added, repositioned, merged with, or split from another element). </p> - <p>You can find these learner language features in the <em><strong>»</strong></em> <strong><em>advanced search</em></strong>. You can compile a list of these features for a particular linguistic area or a specific CEFR level here <em><strong>»</strong></em> <strong><em>learner language features. 
</em></strong></p> - -<p> </p> -</div> -<div id="anchor233"></div> -<blockquote> - <p><strong>2.3.3 Quality control aspects of the annotation process</strong> <a href="#anchor233" onClick="toggle('#content233','#img233')"><img id="img232" src="style/toggle-expand.png"></a></p> -</blockquote> -<div id="content233" class="content"> - <p>It was important to make sure that the annotations in the MERLIN corpus are as <strong>consistent</strong> as possible, even if a certain degree of subjectivity is unavoidable. To this aim, the MERLIN project carried through a number of measures:</p> - <p> First of all, all instruments (TH 1 & TH2 rules, annotation scheme for EA1 and EA2) were <strong>piloted</strong> before their implementation. This allowed to detect possibly problematic aspects which could be corrected before the annotations started.</p> - <p> Secondly, all annotations are based on <strong>guidelines</strong> (<a href="#" onclick="document.forms['download'].submit();" class="reference">annotation structure guidelines</a>, <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau und Annotationen_v2.01" target="_blank" class="reference">Falko-Handbuch</a>). The guidelines were enriched by <strong>fine-grained decisions</strong> on single aspects of annotation (<a href="#" onclick="document.forms['download'].submit();" class="reference">FAQ document</a>). </p> - <p> A third measure to control the quality of annotations is their <strong>documentation</strong>. Many decisions had to be taken about which tag to apply to what phenomenon, and consistency among the three project languages had to be taken care of. The most important discussions among the annotators are documented in the <a href="#" onclick="document.forms['download'].submit();" class="reference">FAQ document</a>. 
In the <a href="#" onclick="document.forms['download'].submit();" class="reference">annotation scheme</a>, the ‘related tags’ sections mirror some of the extensive discussion processes. </p> - <p>Last but not least, the reliability of the annotations was controlled also a little bit more formally. <strong>Re</strong><strong>liability</strong> of annotations was controlled for 5% of the texts on each test level for target hypotheses (1 & 2) and error annotation (1 & 2). Different methods were applied: </p> - <blockquote> - <p> In a <strong>qualitative</strong> approach, half of the files were annotated independently by the coders to then be commonly discussed with the aim to arrive at a <strong>consensus</strong>. This happened before the annotation (which was done level by level) of the level started. The texts served as a reference throughout the annotation process. </p> - <blockquote> - <p> The second half of the files checked for reliability was annotated by all coders without their knowledge. This <strong>quantitative</strong>, <strong>double-blind procedure</strong> allows to check for intra-coder reliability (the consistency of one and the same annotator) and inter-coder reliability (the degree of agreement between different annotators). </p> - <p> </p> - <h5><strong>Consistency and interference of annotation layers </strong></h5> - <div> - <div> </div> - </div> - </blockquote> - </blockquote> - <p>From a technical perspective, it was complex to integrate and harmonize the different annotation formats in MERLIN without losing information or creating imprecisions. <br> - At the same time, on a content level, contradictions between the different annotation levels (TH1-EA1-TH2-EA2) were to be avoided.<br> - TH1 and EA1 are closely connected. If there is a change of the learner text on TH1, there ought to be a tag on EA1 that makes the learner language feature explicit in detail. 
There are a few exceptions to this rule, which are documented in the "<a href="#" onclick="document.forms['download'].submit();" class="reference">FAQ document</a>". <br> - Also, all EA2 annotations are reflected in TH2. The opposite, however, is not necessarily true: There might be TH2 modifications that are needed to arrive at an acceptable version of the learner text and that are not part of the <a href="#" onclick="document.forms['download'].submit();" class="reference">MERLIN annotation scheme</a>. The MERLIN team may not have included a phenomenon if it was not considered relevant and/or feasible. </p> - <div> - <div> </div> - <div> </div> - </div> -</div> -<p> </p> - -<div id="anchor3"></div> -<h2>3. Using MERLIN for research purposes <a href="#anchor3" onClick="toggle('#content3','#img3')"><img id="img3" src="style/toggle-expand.png"></a></h2> -<p>The main aim of MERLIN is not research-oriented: the platform was developed for practitioners who need empirical illustrations of rated CEFR levels for Czech, Italian, and German. Recently, an increasing number of initiatives (like <a href="http://www.slate.eu.org/" target="_blank" class="reference">SLATE</a>) have started to collect authentic learner language rated according to CEFR levels. Some of them pertain to the <em>Reference Level Descriptions</em> (RLD) initiative, i.e. a specification of the CEFR levels for individual languages (the most prominent example is the <a href="http://www.englishprofile.org/" target="_blank" class="reference">English Profile Project</a>; other projects are ASK for Norwegian, Carlsen 2013, or the Profilo della lingua italiana, Spinelli/Parizzi 2010). 
The Council of Europe encourages the development of RLDs (CoE 2005, see <a href="http://www.coe.int/t/dg4/linguistic/cadre1_en.asp" title="CoE website for RLD" target="_blank" class="reference">CoE website for Reference Level Descriptions</a>).<br> -From corpora like these, features that characterize CEFR levels (sometimes called “criterial features”, Hawkins/Filipović 2012) can be extracted. This process helps to deepen the understanding of what CEFR-related ratings mean and to put the use of the CEFR on firmer empirical grounds. MERLIN contributes to the empirically based exploration of the CEFR for German, Italian, and Czech. It differs from most existing initiatives in that all data, including full texts, test tasks and annotations, are fully and freely available online.<br> -Apart from this major practical aim, MERLIN is relevant for research purposes from various perspectives: </p> -<p> </p> - -<div id="anchor31"></div> -<h3><a name="scale-valid"></a>3.1 Validating CEFR scales with MERLIN</h3> - <a href="#anchor31" onClick="toggle('#content31','#img31')"><img id="img31" src="style/toggle-expand.png"></a> -<div id="content31" class="content"> -<p>The Council of Europe effort of scaling the CEFR descriptors (CoE 2001; North 2000; Schneider/North 2000) has led to immense improvements in standardization and transparency in language learning, teaching, and testing. Important decisions about language learners' lives are taken with reference to the CEFR levels. In many ways, it seems as if the scales have acquired a life of their own; often, they are over-estimated, misunderstood and applied in ways they were not meant for (North 2000). -One crucial aspect that is still insufficiently understood is the empirical validity of the CEFR scales (Fulcher 2004; Hulstijn 2007): If scales are used to describe or rate learner language, they must reflect what learners actually do (Alderson 1991). 
-In spite of this, to date there is almost no research that examines the power of the CEFR descriptors to capture the language learners actually produce (Wisniewski 2014). MERLIN makes it possible to analyze directly the relationship between selected CEFR descriptors (such as "circumlocutions" or "content jumps", which were operationalized and annotated; see <a href="docs/AS_part1.pdf" target="_blank" class="reference">MERLIN annotation scheme</a> <img src="style/document-pdf.png" alt="pdf" width="16" height="16">) and learner language without having to rely on ratings. </p> -</div> -<p> </p> - -<div id="anchor32"></div> -<h3>3.2 MERLIN and second language acquisition studies</h3> - <a href="#anchor32" onClick="toggle('#content32','#img32')"><img id="img32" src="style/toggle-expand.png"></a> - <div id="content32" class="content"> -<p>Many studies from the area of second language acquisition (SLA) refer to proficiency levels when describing the development and the variation of learner language. However, in many cases the proficiency classification is not yet based on procedures that comply with the strict standards that need to be met from the perspective of research-based, high-quality language testing (see for example AERA/APA/NCME; ALTE 2001; Bachman/Palmer 1996; <a href="http://www.ealta.eu.org/documents/archive/guidelines/English.pdf" target="_blank" class="reference">EALTA code of practice</a>). There is a particular lack of strict testing procedures and easily accessible empirical data for languages other than English when it comes to CEFR-based proficiency classifications. -Although MERLIN is small in size, its reliable relationship to the CEFR makes it a valuable resource for future SLA studies. It can also be used for triangulating and validating data from many existing studies. 
-</p> -</div> -<p> </p> - -<div id="anchor33"></div> -<h3>3.3 MERLIN to advance NLP of learner language</h3> - <a href="#anchor33" onClick="toggle('#content33','#img33')"><img id="img33" src="style/toggle-expand.png"></a> -<div id="content33" class="content"> -<p>The MERLIN corpus provides valuable data for the development and evaluation of natural language processing tools for learner language (Meurers 2012). The corpus and its meta-information on learners and ratings readily support research on automatic native language identification, enabling such research to go beyond the current English learner focus. In a similar vein, the corpus has already been used for research on automatic proficiency classification for German (Hancke 2013). The MERLIN corpus also provides richly annotated learner data for the development and adaptation of NLP tools and applications that assist language learners in improving their vocabulary usage, coherence, spelling and grammatical accuracy. </p> -</div> -<p> </p> - -<div id="Pub"></div> -<h2><a name="bib"></a>References <a href="#Pub" onClick="toggle('#contentPub','#imgPub')"><img id="imgPub" src="style/toggle-expand.png"></a></h2> -<div id="contentPub" class="content"> -<p>[CEFR 2001] Council of Europe (2001): The Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.<br> - Alderson, J.C. (2007): The CEFR and the need for more research. In: <em>The Modern Language Journal </em>91, 658-662. <br> - Alderson, J. C./Figueras, N./Kuijper, H./Nold, G./Takala, S./Tardieu, C. (2006): Analysing Tests of Reading and Listening in Relation to the Common European Framework of Reference: The Experience of the Dutch CEFR Construct Project. In: <em>Language Assessment Quarterly </em>3(1), 3-30.<br> - Alderson, J.C. (1991): Bands and scores. In: Alderson, J.C./North, B. (eds.): <em>Language testing in the 1990s</em>. London: British Council/Macmillan, 71-86.<br> - Arnaud, P. J. L. 
(1984): The lexical richness of L2 written productions and the validity of vocabulary tests. In: Culhane, T./Klein-Braley, C./Stevenson, D. K. (eds.): <em>Practice and Problems in Language </em><br> - Arras, U. (2010): Subjektive Theorien als Faktor bei der Beurteilung fremdsprachlicher Kompetenzen. In: Berndt, A./Kleppin, K. (eds.): <em>Sprachlehrforschung: Theorie und Empirie – Festschrift für Rüdiger Grotjahn</em>. Frankfurt: Lang, 169-179.<br> - Bachman, L.F. (2004): Statistical analyses for language assessment. Cambridge: CUP.<br> - Bachmann, T. (2002): <em>Kohäsion und Kohärenz: Indikatoren für Schreibentwicklung: Zum Aufbau kohärenzstiftender Strukturen in instruktiven Texten von Kindern und Jugendlichen.</em> Innsbruck: Studienverlag. <br> - Bausch, K.-R./Christ, H./Königs, F.G./Krumm, H.-J. (eds.) (2003): <em>Der Gemeinsame Europäische Referenzrahmen für Sprachen in der Diskussion. Arbeitspapiere der 15. Frühjahrskonferenz zur Erforschung des Fremdsprachenunterrichts.</em> Tübingen: Narr.<br> - Bardovi-Harlig, K. (2009): Conventional Expressions as a Pragmalinguistic Resource: Recognition and Productions of Conventional Expressions in L2 Pragmatics. In: <em>Language Learning </em>59 (4), 755-795. <br> - Bestgen, Y./Granger, S. (2011): Categorising spelling errors to assess L2 writing. In: <em>International Journal of Continuing Engineering Education and Life Long Learning,</em> 21 (2), 235–252.<br> - Bond, T. G./Fox, C. M. (2007): Applying the Rasch model: Fundamental measurement in human sciences. Mahwah, NJ: Lawrence Erlbaum.<br> - Bulté, B./Housen, A. (2012): Defining and operationalising L2 complexity. In: Housen, A./Kuiken, F./Vedder, I. (eds.): <em>Dimensions of L2 Performance and Proficiency: Complexity, Accuracy and Fluency in SLA</em>. Amsterdam: Benjamins, 21-46.<br> - Burger, H. (2007): <em>Phraseologie. Eine Einführung am Beispiel des Deutschen</em>. (3rd ed.). Berlin: Erich Schmidt Verlag.<br> - Carlsen, C. (ed.) (2013): <em>Norsk Profil. 
</em><em>Det felles europeiske rammeverket spesifisert for norsk. Et første steg</em>. Oslo: Novus. <br> - Carlsen, C. (2010): Discourse connectives across CEFR levels: A corpus-based study. In: Bartning, I./Martin, M./Vedder, I. (eds.): <em>Communicative Proficiency and Linguistic Development: intersections between SLA and language testing research</em> (Eurosla). 191-210. purl.org/net/Carlsen-10.pdf<br> - Christ, O. (1994): A modular and flexible architecture for an integrated corpus query system. <em>arXiv preprint cmp-lg/9408005</em>.<br> - Corder, S. P. (1993 [1973]): <em>Introducing Applied Linguistics</em>. Harmondsworth: Pelican.<br> - Dallapiazza, R.M./von Jan, E./Schönherr, T. (eds.) (1998): <em>Tangram: Deutsch als Fremdsprache. Kurs- und Arbeitsbuch 1 A</em>. Munich: Hueber.<br> - Daller, H./van Hout, R./Treffers-Daller, J. (2003): Lexical richness in spontaneous speech of bilinguals. In: <em>Applied Linguistics </em>24, 197-222.<br> - Dewaele, J.-M. (2004): Individual differences in the use of colloquial vocabulary. The effects of sociobiographical and psychological factors. In: Bogaards, P./Laufer, B. (eds.): Vocabulary in a second language. Amsterdam: John Benjamins, 127-154.<br> - Díaz-Negrillo, A./Fernández-Domínguez, J. (2006): Error-coding systems for learner corpora. In: <em>RESLA</em> 19, 83-102.<br> - Eckes, T. (2008): Rater types in writing performance assessments: A classification approach to rater variability. In: <em>Language Testing</em> 25 (2), 155-185.<br> - Eckes, T. (2009): <em>Reference Supplement to the Manual for Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Section H: Many-Facet Rasch Measurement</em>. (http://www.coe.int/t/dg4/linguistic/manuel1_en.asp, January 2014.)<br> - Eisenberg, P. (2007): Sprachliches Wissen im <em>Wörterbuch der Zweifelsfälle</em>. Über die Rekonstruktion einer Gebrauchsnorm. In: <em>Aptum. 
Zeitschrift für Sprachkritik und Sprachkultur</em> 3/2007: 209-228.<br> - Ellis, R. (1994): <em>The study of Second Language Acquisition</em>. Oxford: Oxford University Press.<br> - Fulcher, G. (2004): Deluded by Artifices? The Common European Framework and Harmonization. In: <em>Language Assessment Quarterly</em> 1 (4), 253-266.<br> - Fulcher, G./Davidson, F. (2007): <em>Language Testing and Assessment. </em>London/New York: Routledge.<br> - Gould, S.J. (1996): <em>The mismeasure of man</em>. London: Penguin.<br> - Glaznieks A./Nicolas L./Stemle E./Lyding V./Abel A. (2012): Establishing a Standardised Procedure for Building Learner Corpora. In:<em> Apples - Journal of Applied Language Studies. Special Issue: Proceedings of LLLC2012</em>.<br> - Granger, S. (2003): Error-tagged learner corpora and CALL: a promising synergy. In: <em>CALICO Journal</em> 20 (3). Special issues on error analysis and error correction in computer-assisted language learning, 465-480.<br> - Granger, S. (2008): Learner corpora. In: Lüdeling, A. / Kytö, M. (eds.): <em>Corpus linguistics: an international handbook</em> (Handbooks of linguistics and communication science; 29.1_ 29.2). Berlin - New York: de Gruyter. 259-275.<br> - Granger, S. (2002): A Bird’s-eye view of learner corpus research. In: Granger S,/Hung, J./ Petch-Tyson, St (eds.): <em>Computer Learner Corpora, Second Language Acquisition and Foreign Language Teachin</em>g. Amsterdam: John Benjamins, 3-33.<br> - Halliday, M. A. K. /Hasan, R. (1989): <em>Language, context and text: a social semiotic perspective. </em>Oxford: Oxford University Press.<br> - Hancke J./Meurers D./Vajjala S. (2012): Readability Classification for German using lexical, syntactic, and morphological features<em>. </em>In: <em>Proceedings of the 24th International Conference on Computational Linguistics (COLING)</em>, 1063-1080.<br> - Hancke, J. (2013): <em>Automatic Prediction of CEFR Proficiency Levels Based on Linguistic Features of Learner Language</em>. 
Master's thesis, University of Tübingen.<br> - Hasil, J./Hájková, E./Hasilová, H. (2007): <em>Brána jazyka českého otevřená</em>. Prague: Karolinum.<br> - Housen, A./Kuiken, F. (2009): Complexity, Accuracy, and Fluency in Second Language Acquisition. In: <em>Applied Linguistics</em> 30 (4), 461-473.<br> - Hulstijn, J. H. (2007): The shaky ground beneath the CEFR: Quantitative and qualitative dimensions of language proficiency. In: <em>The Modern Language Journal</em> 91, 663–667.<br> - Hulstijn, J. H./Alderson, C./Schoonen, R. (2010): Developmental stages in second-language acquisition and levels of second-language proficiency: Are there links between them? In: Bartning, I./Martin, M./Vedder, I. (eds.): <em>Communicative Proficiency and Linguistic Development: intersections between SLA and language testing research</em>. Eurosla Monograph Series. (<a href="http://eurosla.org/monographs/EM01/EM01home.html">http://eurosla.org/monographs/EM01/EM01home.html</a>) <br> - Laufer, B./Nation, P. (1995): Vocabulary size and use: lexical richness in L2 written production. In: <em>Applied Linguistics </em>16, 307-322.<br> - Little, D. (2007): The Common European Framework of Reference for Languages: Perspectives on the Making of Supranational Languages Education Policy. In: <em>The Modern Language Journal</em> 91, 645-655.<br> - Lu, X. (2011): A corpus-based evaluation of syntactic complexity measures as indices of College-level ESL writers’ language development. In: <em>TESOL Quarterly</em> 45 (1), 36-62.<br> - Lu, X. (2010): Automatic analysis of syntactic complexity in second language writing. In: <em>International Journal of Corpus Linguistics</em> 15 (4), 474-496.<br> - Lüdeling, A. (2008): Mehrdeutigkeiten und Kategorisierung: Probleme bei der Annotation von Lernerkorpora. In: Walter, M./Grommes, P. (eds.): <em>Fortgeschrittene Lernervarietäten: Korpuslinguistik und Zweitsprachenerwerbsforschung. 
</em>Tübingen: Niemeyer, 119-140.<br> - Lüdeling, A./Walter, M./Kroymann, E./Adolphs, P. (2005): Multi-level Error Annotation in Learner Corpora. In: Hunston, S./Danielsson, P. (eds.): <em>Proceedings from the Corpus Linguistics Conference Series</em> (Corpus Linguistics 2005, Birmingham, 14-15 July 2005). (<a href="http://www.corpus.bham.ac.uk/PCLC">http://www.corpus.bham.ac.uk/PCLC</a>) <br> - Malvern, D./Richards, B./Chipere, N./Durán, P. (2008): <em>Lexical Diversity and Language Development. Quantification and Assessment. </em>New York: Palgrave Macmillan.<br> - Mellor, A. (2011): Essay Length, Lexical Diversity and Automatic Essay Scoring. In: <em>Memoirs of the Osaka Institute of Technology</em>, Series B Vol. 55, No. 2 (2011), 1-14.<br> - Meurers, D. (2012): Natural Language Processing and Language Learning. <em>Encyclopedia of Applied Linguistics</em>. Blackwell. purl.org/dm/papers/meurers-11.html<br> - Mezzadri, M. (2000): <em>Rete! Book 1</em>. Perugia: Guerra Edizioni.<br> - Müller, Ch./Strube, M. (2006): Multi-Level Annotation of Linguistic Data with MMAX2. In: Braun, S./Kohn, K./Mukherjee, J. (eds.): Corpus Technology and Language Pedagogy. New Resources, New Tools, New Methods. Frankfurt: Peter Lang, 197-214.<br> - Nation, P. (2001): <em>Learning vocabulary in another language</em>. Cambridge: Cambridge University Press.<br> - Nation, P. (2007): Fundamental issues in modelling and assessing vocabulary knowledge. In: Daller, H./ Milton, J./Treffers-Daller, J. (eds.): <em>Modelling and Assessing Vocabulary Knowledge</em>. Cambridge: Cambridge University Press.<br> - Nesselhauf, N. (2005): <em>Collocations in a Learner Corpus</em>. Amsterdam: John Benjamins.<br> - North, B. (2000): <em>The Development of a Common Framework Scale of Language Proficiency. </em>Oxford: Peter Lang.<br> - O’Loughlin, K. (1995): Lexical density in candidate output on direct and semi-direct versions of an oral proficiency test. In: <em>Language Testing </em>12 (2), 217-237. 
<br> - Ortega, L. (2003): Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. In: <em>Applied Linguistics</em> 24 (4), 492-518.<br> - Paquot, M./Granger, S. (2012): Formulaic language in Learner Corpora. In: <em>Annual Review of Applied Linguistics </em>32, 130-149.<br> - Pollitt, A./Murray, N.L. (1996): What raters really pay attention to. In: Milanovic, M./Saville, N. (eds.): <em>Performance testing, cognition and assessment; Selected papers from the 15th Language Testing Research Colloquium.</em> Cambridge: Cambridge University Press, 74-91.<br> - Read, J./Nation, P. (2004): Measurement of formulaic sequences. In: Schmitt, N. (ed.): <em>Formulaic sequences: Acquisition, processing and use. </em>Amsterdam: John Benjamins, 23-35.<br> - Read, J. (2000): <em>Assessing vocabulary</em>. Cambridge: Cambridge University Press.<br> - Reznicek, M./Lüdeling, A./Krummes, C./Schwantuschke, F./Walter, M./Schmidt, K./Hirschmann, H./Andreas,T. (2012): <em>Das Falko-Handbuch. Korpusaufbau und Annotationen</em>. Version 2.01. HU Berlin (<a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau%20und%20Annotationen_v2.01">http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/Falko-Handbuch_Korpusaufbau%20und%20Annotationen_v2.01</a>)<br> - Reznicek, M./Lüdeling, A./Hirschmann, H. (in print): Competing Target Hypotheses in the Falko Corpus. A Flexible Multi-Layer Corpus Architecture. In: Díaz-Negrillo, A./Ballier, N./Thompson, P. (eds.): <em>Automatic Treatment and Analysis of Learner Corpus Data</em>. Amsterdam: John Benjamins (Series Studies in Corpus Linguistics).<br> - Rimrott, A./Heift, T. (2008): Evaluating automatic detection of misspellings in German. In: <em>Language Learning & Technology</em> 11 (3), 73-92.<br> - Schmitt, N./Carter, N. (2004): Formulaic sequences in action: An Introduction. 
In: Schmitt, N. (ed.): <em>Formulaic sequences: Acquisition, processing and use. </em>Amsterdam: John Benjamins, 1-21.<br> - Schneider, J. G. (2013): Sprachliche ‚Fehler‘ aus sprachwissenschaftlicher Sicht. In: <em>Sprachreport</em> 1-2/2013, 30-37.<br> - Spinelli, B./Parizzi, F. (ed.) (2010): <em>Profilo della lingua italiana</em>. Firenze: La Nuova Italia.<br> - Stede, M. (2007): Korpusgestützte Textanalyse. Grundzüge der Ebenen-orientierten Textlinguistik. Tübingen: Narr.<br> - Trosborg, A. (1995): <em>Interlanguage Requests and Apologies. </em>Berlin: de Gruyter.<br> - Vajjala, S./Meurers, D. (2012): On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition. In: Tetreault, J./Burstein, J./ Leacock, C. (eds.): <em>Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications (BEA7) at NAACL-HLT</em>. Montreal, Canada: Association for Computational Linguistics, 163–173.<br> - Vaughan, C. (1991): Holistic assessment: What goes on in the rater’s mind? In: Hamp-Lyons L. (ed.): <em>Assessing Second Language Writing in Academic Contexts. </em>Norwood: Ablex, 111.125.<br> - Wisniewski, K. (2013): The empirical validity of the CEFR fluency scale: the A2 level description. In: Galaczi, E.D./Weir, C.J. (eds.): <em>Exploring Language Frameworks: Proceedings of the ALTE Krakow Conference</em>. Cambridge: Cambridge University Press, 253-272. Studies in Language Testing.<br> - Wisniewski, K. (2014):<em> Die Validität der Skalen des Gemeinsamen europäischen Referenzrahmens für Sprachen. Eine empirische Untersuchung der Flüssigkeits- und Wortschatzskalen des GeRS am Beispiel des Italienischen und des Deutschen</em>. Frankfurt: Peter Lang. Language Testing and Evaluation Series, 33.<br> - Wisniewski, K./Schöne, K./Nicolas, L./Vettori, C./ Boyd, A./Meurers, D./ Abel, A./Hana, J. 
(2013): MERLIN: An online trilingual learner corpus empirically grounding the European Reference Levels in authentic learner data. In: <em>ICT for Language Learning, Conference Proceedings 2013</em>. Libreriauniversitaria.it Edizioni. (<a href="http://conference.pixel-online.net/ICT4LL2013/common/download/Paper_pdf/322-CEF03-FP-Wisniewski-ICT2013.pdf">http://conference.pixel-online.net/ICT4LL2013/common/download/Paper_pdf/322-CEF03-FP-Wisniewski-ICT2013.pdf</a>) <br> - Wisniewski, K. / Abel, A. (2012): Die Sprachkompetenzerhebung: Theorie, Methoden, Qualitätssicherung. In: Abel, A. / Vettori, C. / Wisniewski, K. (eds.): <em>Gli studenti altoatesini e la seconda lingua: indagine linguistica e psicosociale. / Die Südtiroler SchülerInnen und die Zweitsprache: eine linguistische und sozialpsychologische Untersuchung</em>. Volume 1 – Band 1. Bolzano - Bozen: Eurac. 13-64 (<a href="http://www.eurac.edu/en/research/publications/PublicationDetails.aspx?pubId=0100156&type=Q">http://www.eurac.edu/en/research/publications/PublicationDetails.aspx?pubId=0100156&type=Q</a>)<br> - Wolfe-Quintero, K./Inagaki, S./ Kim, H.-Y. (1998): <em>Second Language Development in Writing: Measures of Fluency, Accuracy & Complexity</em>. Honolulu: Second Language Teaching & Curriculum Center, University of Hawaii at Manoa.<br> - Yang, W./Sun, Y. (2012): The use of cohesive devices in argumentative writing by Chinese EFL learners at different proficiency levels. In: <em>Linguistics and Education</em>, 23 (1), 31-48. <br> - Wray, A. (2002): <em>Formulaic Language and the Lexicon</em>. Cambridge: Cambridge University Press.<br> - Zeldes, A./Ritz J./Lüdeling A. et al. (2009): <em>Annis: A search tool for multi-layer annotated corpora. In Proceedings of Corpus Linguistics</em>, July 20-23. Liverpool. (<a href="http://ucrel.lancs.ac.uk/publications/cl2009/">http://ucrel.lancs.ac.uk/publications/cl2009/</a>) <br> - Zipser, F./Romary, L./al. (2010). 
A model oriented approach to the mapping of annotation formats using standards. In: <em>Workshop on Language Resource and Language Technology Standards, LREC 2010</em>.</p> -<p> </p> -</div> -<p> </p> - -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/start-old.php b/php/en/old-04-12-14/start-old.php deleted file mode 100644 index bab67d311fec8c99d4286ac9e31c4a7c7aa7a575..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/start-old.php +++ /dev/null @@ -1,56 +0,0 @@ -<div id="content-menu3" style="min-height:500px"> - -<div id="merlin-info" style="float:none; width:684px"> -<h3>The MERLIN project</h3> -<p>The MERLIN corpus project provides access to empirical learner language for those working with the Common European Framework of Reference for Languages (CEFR). The MERLIN platform allows CEFR users to explore authentic written learner productions for Czech, German, and Italian. The learner texts stem from standardized language tests and are reliably related to the CEFR levels. -<a href="#" onclick="document.forms['about'].submit();">>> read more</a> -</p> -</div> - -<div id="merlin-info" style="width:340px"> -<h3>What can I use MERLIN for?</h3> -<p>MERLIN offers you support for teaching, learning, or testing Czech, German, and Italian. For example, you can ... -<p><ul> -<li>find example texts for a specific CEFR level and bring them to the classroom -<a href="#" onclick="document.forms['documents'].submit();">>> document search</a></li> -<li>search for a word in learner texts and explore how learners use it -<a href="#" onclick="document.forms['simple'].submit();">>> simple search</a></li> -<li>create a sub-corpus to explore errors that are typical or frequent with learners with the same L1 or at the same age -<a href="#" onclick="document.forms['documents'].submit();">>> document search</a></li> -<li>search for examples of learner language features, e.g. 
grammatical or orthographical errors, on a specific CEFR level -<a href="#" onclick="document.forms['advanced'].submit();">>> advanced search</a></li> -<li>find examples of errors related to a specific word (e.g. valency errors with the verb "warten") -<a href="#" onclick="document.forms['advanced'].submit();">>> see example queries in the advanced search</a></li> -<li>compile lists of frequent errors in texts that you defined in your sub-corpus -<a href="#" onclick="document.forms['feature'].submit();">>> learner language features</a></li> -</ul></p> -<p><a href="#" onclick="document.forms['teacher'].submit();">>> read more</a> -</div> - -<div id="merlin-info" style="width:305px"> -<h3>MERLIN search and export functions</h3> -<p>You can search for: -<ul> -<li>occurrences of a word in learner texts -<a href="#" onclick="document.forms['simple'].submit();">>> simple search</a></li> -<li>full texts using learner- and test-specific metadata for filtering -<a href="#" onclick="document.forms['documents'].submit();">>> document search</a></li> -<li>learner language features, e.g. 
grammatical or orthographical errors, in the whole corpus or in a sub-corpus -<a href="#" onclick="document.forms['advanced'].submit();">>> advanced search</a></li> -</ul> -</p> -<p><a href="#" onclick="document.forms['help'].submit();">>> Please visit our tutorial "How to search in MERLIN"</a> -<br> -<p>You can compile and export: -<ul> -<li>a sub-corpus using learner- and test-specific metadata -<a href="#" onclick="document.forms['documents'].submit();">>> document search</a></li> -<li>results of your search for words and learner language features in their context -<a href="#" onclick="document.forms['advanced'].submit();">>> advanced search</a></li> -<li>feature lists for the whole corpus or a sub-corpus -<a href="#" onclick="document.forms['feature'].submit();">>> learner language features</a></li> -</ul> -</p> - -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/start.php b/php/en/old-04-12-14/start.php deleted file mode 100644 index 1856cae872dbcff296390e259531c3a2e9b5958e..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/start.php +++ /dev/null @@ -1,16 +0,0 @@ -<div id="content-menu3" style="min-height:200px"> - -<div id="merlin-info" style="width:390px; height:110px;"> -<h3>The MERLIN corpus</h3> -<p>MERLIN provides access to 2.286 texts written by learners of <b>Czech</b>, <b>Italian</b> and <b>German</b>.</p> -<p>The learner texts stem from standardized language tests and they have been reliably related to the CEFR levels. <a href="#" onclick="document.forms['mcorpus'].submit();">>> read more</a></p> -</div> - -<div id="merlin-info" style="width:280x; height:110px;"> -<h3>Use MERLIN ...</h3> -<p>... to better understand the levels of the Common European Framework of Reference (CEFR). 
-<a href="#" onclick="document.forms['teacher'].submit();">>> read more</a></p> -</div> - - -</div> diff --git a/php/en/old-04-12-14/teacher.php b/php/en/old-04-12-14/teacher.php deleted file mode 100644 index 148a597e1a70b2c65150368059ac2f16fc4fae35..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/teacher.php +++ /dev/null @@ -1,90 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>MERLIN for CEFR-related language learning, teaching, and testing</h1> -<h2>Using MERLIN for language teaching</h2> -<div id="anchor3"></div> -<h3>MERLIN in the language classroom</h3> -<a href="#anchor3" onClick="toggle('#content3', '#img3')"><img id="img3" src="style/toggle-expand.png"></a> -<p><em>Help your students understand CEFR levels: </em> -<div id="content3" class="content"> -<p>You can prepare your sub-corpus of MERLIN texts (e.g., sorted according to CEFR ratings) and bring it to your language classroom. Your learners can discuss strengths and weaknesses of written productions. </p> -<p> </p> -<p><em>Help your students understand their own L2 competence in relation to CEFR levels</em>: Your learners can use the <a href="docs/MERLIN_Rating-Grid.pdf" target="_blank" class="reference">MERLIN rating grid</a> for self-evaluation, they can do one or more <a href="#" onclick="document.forms['mcorpus'].submit();" class="reference">MERLIN tasks</a>, and they can compare their performances to the sub-corpus you prepared. This also helps them understand where they are in their language learning process. This might be more appropriate for learners at B1 and above.</p> -<p> </p> - -<p class="example">To find written tests by learners who performed at a specific CEFR level: <br> -<span class="Stil5">Use the <strong>» Document search</strong> to filter e.g. 
for Italian texts rated B1 and B2 on the topic "describe experiences with language learning"</span></p> -<p> </p> -<p><em>Bring the platform to the classroom:</em> You can also let your (advanced) students look for language phenomena in the MERLIN corpus by themselves in order to familiarize them with the technology and enhance their autonomy in language learning. They could do peer-group error analyses of MERLIN samples, but also of texts of their own. You could have them compare MERLIN data with a native speaker corpus to illustrate differences in language use. </p> -<p> </p> -</div> -<p> </p> -<div id="anchor2"></div> -<h3>“Hands on” for material writers</h3> -<a href="#anchor2" onClick="toggle('#content2', '#img2')"><img id="img2" src="style/toggle-expand.png"></a> -<p><em>Explore crucial aspects of language learning, such as learners' use of collocations, verbal aspect, and mood, and find suitable examples for your own materials</em>.</p> -<div id="content2" class="content"> -<p>You can then use data from the corpus to add usage notes to your materials, e.g. hints on correct use of a structure or suggestions to avoid the overuse or underuse of words or structures.</p> -<p> </p> -<p class="example">To find examples of the incorrect use of a specific structure, e.g. verbal aspect, by native speakers of German: <br> - <span class="Stil5">Create and save a sub-corpus of Czech texts written by authors with L1=German <strong>» document search</strong> <br> - AND make a list of all instances of erroneous use of verbal aspect in Czech for your sub-corpus <strong>» learner language features</strong> -> "Select learner language feature": verb </span><span class="example"></span> -<p class="example">To search for a word in learner texts and explore how learners use it, and which errors are related to it: <span class="Stil5">use <strong> simple search</strong>, type e.g. 
<em>"Wohnung"</em> in the search field and choose "search in target hypothesis".</span> </p> -<p class="example">To explore errors related to a specific lexical item: <br> - <span class="Stil5">Use the <strong>» advanced search</strong> to search for instances of the verb “warten” and choose e.g. “preposition” or “verb valency” from the learner language features. </span></p> - -<p> </p> -<p>Many teaching materials, including the vast majority of textbooks, claim to be related to the CEFR, but they do not make use of authentic learner language data. In addition, learners' proficiency often comes as a profile, so that a learner might be more successful in grammar than, for example, in vocabulary.</p> - -<p><em>Use MERLIN to explore these different aspects of learners' communicative L2 competence</em><em>, e.g. vocabulary range/control, grammatical accuracy, coherence/cohesion, on different CEFR levels</em> and develop your own materials tailored to your students.</p> - -<p class="example">To get an impression of what texts with a CEFR-related rating of these dimensions of language proficiency look like: <span class="Stil5">Create a sub-corpus of texts with vocabulary control/grammatical accuracy/coherence and cohesion rated B2 <strong>» document search</strong></span></p> -<p> </p> -</div> - -<div id="anchor1"></div> -<h3>Syllabus and curriculum development </h3> <a href="#anchor1" onClick="toggle('#content1', '#img1')"><img id="img1" src="style/toggle-expand.png"></a> -<p>Most syllabi, curricula and even national educational standards in Europe refer to the CEFR. ... </p> -<div id="content1" class="content"> -<p>Nevertheless, it is often not well understood what learner language at these levels is like. <br> - MERLIN helps you to concretely identify typical and relevant milestones/errors in learner language with reference to CEFR levels. It can thus support decisions about the selection and progression of syllabus / curriculum contents. 
</p> -<p> </p> -<p class="example">To get a general impression of what B1 texts look like: <span class="Stil5"><em><br> - Create your own corpus of texts extracted from Italian tests rated B1</em></span><em class="Stil5"> <strong>» Document search </strong></em><br> -To find out typical problems learners have on a specific CEFR level: <br> -<span class="Stil5"><em>Compile a list of frequent learner language features, e.g. grammatical errors <strong>»</strong> <strong>Learner language features</strong></em></span></p> -</div> -<p> </p> - -<p> </p> -<div id="anchor4"></div> -<h2>Using MERLIN for language testing <a href="#anchor4" onClick="toggle('#content4', '#img4')"><img id="img4" src="style/toggle-expand.png"></a></h2> -<div id="content4" class="content"> -<p>Most European language tests are (or claim to be) related to the CEFR. While the Council of Europe provides numerous <a href="http://www.coe.int/t/dg4/linguistic/cadre1_en.asp" target="_blank" class="reference">helpful materials</a>, there is not yet much empirical data (i.e. CEFR-related language samples) to support the test development process, especially for languages other than English (for English, see <a href="http://www.englishprofile.org/" target="_blank" class="reference">www.englishprofile.org</a>). </p> - - <p>We believe the MERLIN data help to enhance transparency and quality in test construction. MERLIN is useful for familiarization with the CEFR, and it can be used for benchmarking purposes. 
It can also be used for the empirically based development of assessment materials.</p> -<p> Furthermore, MERLIN data lend themselves to the empirical validation of the CEFR scales (see <a href="#" onclick="document.forms['research'].submit();" class="reference">MERLIN for research</a>) and might be helpful for empirically based rating scale construction.</p> - <p>You can use MERLIN in your institutions to create a common understanding of the CEFR levels and to practice rating written texts.</p> -<p> </p> -<p class="example">To extract a random sample of written tests on a specific task: <br> - <span class="Stil5">Go to <strong>» Document search</strong> and filter for tests on a specific task topic, e.g. "andare a trovare un amico" (Task ID: INF-LETTER-See-a-friend).</span><br> -To align rating behavior among your fellow teachers: <br> -<span class="Stil5">Have the example texts re-rated by your colleagues using the <strong><a href="docs/MERLIN_Rating-Grid.pdf" target="_blank">» MERLIN rating grid</a>. </strong>The results can be discussed in the group and they can be compared to the MERLIN ratings.</span></p> -<p> </p> -<h3>Links </h3> -<p>Council of Europe (2011). <a href="http://www.coe.int/t/dg4/linguistic/Cadre1_en.asp" target="_blank">Common European Framework of Reference for Languages: Learning, Teaching, Assessment</a>. 
Council of Europe.</p> -<p>The English profile: <a href="http://www.englishprofile.org/" target="_blank">www.englishprofile.org</a></p> -<p>Council of Europe materials supporting the use of the CEFR: <a href="http://www.coe.int/t/dg4/linguistic/Cadre1_en.asp " title="http://www.coe.int/t/dg4/linguistic/Cadre1_en.asp " target="_blank">http://www.coe.int/t/dg4/linguistic/Cadre1_en.asp </a></p> - -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file diff --git a/php/en/old-04-12-14/team.php b/php/en/old-04-12-14/team.php deleted file mode 100644 index 8e778e49f2e902cdbf6bb8372c316d315740f13a..0000000000000000000000000000000000000000 --- a/php/en/old-04-12-14/team.php +++ /dev/null @@ -1,60 +0,0 @@ -<div id="main"> -<?php -// side bar -require('F_mainsidebar.php'); -?> -<div id="mainpartwrapper"> - <div id="mainpart3"> - <div id="content-menu3"> -<!--INSERT--> -<h1>The MERLIN team</h1> -<p> </p> -<p>The MERLIN platform has been devised by an international team of linguists, computational linguists, language testers, and language teaching institutions.</p> -<p> </p> -<h2>Project partners </h2> -<p><strong>University of Technology Dresden (coordination)</strong><br> -</p> -<p>Katrin Wisniewski <a href="mailto:Katrin.Wisniewski@tu-dresden.de">Katrin.Wisniewski@tu-dresden.de</a><br> -Maria Lieber, Claudia Woldt, Karin Schöne </p> -<p> </p> -<p><strong>European Academy Bozen</strong><br> -Andrea Abel <a href="mailto:andrea.abel@eurac.edu">andrea.abel@eurac.edu</a><br> -Verena Blaschitz, Verena Lyding, Lionel Nicolas, Chiara Vettori</p> -<p> </p> -<p><strong>Charles University Prague</strong><br> - Kateřina Vodičková <a href="mailto:katerina.vodickova@ujop.cuni.cz">katerina.vodickova@ujop.cuni.cz</a> <br> -Pavel Pečený <a href="mailto:pavel.peceny@ujop.cuni.cz">pavel.peceny@ujop.cuni.cz</a><br> -Jirka Hana, Veronika Čurdová<br> -<br> -<strong>telc Frankfurt/Main</strong><br> -Sybille Plassmann <a href="mailto:info@telc.net">info@telc.net</a><em><br> 
-</em></p> -<p> </p> -<p><strong>Berufsförderungsinstitut Oberösterreich, Linz</strong><br> -Gerhard Zahrer <a href="mailto:Gerhard.Zahrer@bfi-ooe.at">Gerhard.Zahrer@bfi-ooe.at</a></p> -<p>Pia Zaller</p> -<p> </p> -<p><strong>Eberhard Karls University Tübingen</strong><br> -Detmar Meurers <a href="mailto:detmar.meurers@uni-tuebingen.de">detmar.meurers@uni-tuebingen.de</a><br> -Adriane Boyd, Serhiy Bykh, Julia Krivanek</p> -<p> </p> -<h2>Associated partners </h2> -<p><strong>The European Centre for Modern Languages, Graz</strong><br> -<a href="http://www.ecml.at" target="_blank">www.ecml.at</a></p> -<p><br> - <strong>Ministry of Education, Youth and Sports, Prague</strong><br> - <a href="http://www.msmt.cz" target="_blank">www.msmt.cz</a></p> -<p> </p> -<h2>Other partners </h2> - <p>MERLIN is cooperating with the <a href="http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko" target="_blank">FALKO project</a>, run by Anke Lüdeling.</p> - <p> </p> - <h2>Annotators </h2> -<p>A dedicated team of annotators has supported the partnership: Teresa Knittel, Tassja Weber, Emanuele Casani, David Beneš, Rosella Nobile, Radka Julínková, Petra Klimešová, Lenka Žehrová, Ivana Šálená, Blanka Jelínková, Karolina Kofler, Serena Santoriello, Tina Schönfelder, Roberto Malpede, Maria Grazia Sorgonà </p> -<div> - <div> </div> -</div> -<!--INSERT END--> -</div> -</div> -</div> -</div> \ No newline at end of file