%PDF-1.3
%âãÏÓ
2 0 obj
<<
/Length 8904
>>
stream
BT
/TT2 1 Tf
13.92 0 0 13.92 283.679 748.047 Tm
0 g
/GS1 gs
0.0002 Tc
0 Tw
(XCES:)Tj
-11.0345 -1.431 TD
0.0001 Tc
0.0005 Tw
[(An XML-based Encoding Standard fo)26.6(r )56.5(Linguistic Corpo)28.3(r)0.1(a)]TJ
12 0 0 12 165.119 700.047 Tm
0 Tc
0.0004 Tw
(Nancy Ide*, Patrice Bonhomme)Tj
6.96 0 0 6.96 327.839 705.087 Tm
0 Tw
( †)Tj
12 0 0 12 333.119 700.047 Tm
0.0004 Tw
(, Laurent Romary)Tj
6.96 0 0 6.96 426.1226 705.087 Tm
0 Tw
(†)Tj
/TT4 1 Tf
9.84 0 0 9.84 196.079 673.887 Tm
(*)Tj
/TT6 1 Tf
0.6098 0 TD
0.0002 Tw
(Department of Computer Science, Vassar College)Tj
2.2683 -1.2195 TD
0.0004 Tw
(Poughkeepsie, NY 12604-0520 USA)Tj
3.7073 -1.2195 TD
0 Tw
(ide@cs.vassar.edu)Tj
6 0 0 6 247.439 641.967 Tm
(†)Tj
9.84 0 0 9.84 250.559 637.887 Tm
0.0001 Tc
0.0007 Tw
(LORIA \(CNRS, INRIA\))Tj
-1.3171 -1.2195 TD
0 Tc
0.0002 Tw
(Campus Scientifique - BP 239)Tj
-2.0244 -1.2195 TD
0.0005 Tw
(54506 Vandoeuvre-lès-Nancy FRANCE)Tj
1.8537 -1.2195 TD
0 Tw
({bonhomme, romary}@loria.fr)Tj
/TT2 1 Tf
5.0976 -2.1707 TD
0.0002 Tc
(Abstract)Tj
/TT6 1 Tf
8.88 0 0 8.88 53.759 568.287 Tm
0.0088 Tc
0.0876 Tw
(The Corpus Encoding Standard \(CES\) is a part of the EAGLES Guidelines developed by the Expert Advisory Group on Language)Tj
0 -1.1081 TD
0.01 Tc
0.0999 Tw
(Engineering Standards \(EAGLES\) that provides a set of encoding standards for corpus-based work in natural language processing)Tj
T*
0.0057 Tc
0.0568 Tw
(applications. We have instantiated the CES as an XML application called XCES, based on the same data architecture comprised of )Tj
54.4523 0 TD
0 Tc
0 Tw
(a)Tj
-54.4523 -1.1351 TD
0.0036 Tc
0.0358 Tw
(primary encoded text and "standoff" annotation in separate documents. Conversion to XML enables use of some of the more powerfu)Tj
54.6057 0 TD
0 Tc
0 Tw
(l)Tj
-54.6057 -1.1351 TD
0.0065 Tc
0.0645 Tw
(mechanisms provided in the XML framework, including the XSL Transformation Language, XML Schemas, and support for inter-)Tj
0 -1.1081 TD
0.0649 Tw
(resource reference together with an extensive path syntax for pointers. In this paper, we describe the differences between the CE)Tj
52.5636 0 TD
(S and)Tj
-52.5636 -1.1351 TD
0.003 Tc
0.0304 Tw
(XCES DTDs and demonstrate how XML mechanisms can be used to select from and manipulate annotated corpora encoded according)Tj
T*
0.0078 Tc
0.0774 Tw
(to XCES specifications. We also provide a general overview of XML and the XML mechanisms that are most relevant to language)Tj
0 -1.1081 TD
0 Tc
0 Tw
(engineering research and applications.)Tj
/TT2 1 Tf
12 0 0 12 138.479 457.167 Tm
0.0002 Tc
(Introduction)Tj
/TT6 1 Tf
9.84 0 0 9.84 67.919 443.247 Tm
0.0081 Tc
0.081 Tw
(The Corpus Encoding Standard \(CES\) \(Ide, 1998a, b\))Tj
-1.439 -1.0976 TD
0.0171 Tc
0.1716 Tw
(is a part of the EAGLES Guidelines developed by the)Tj
0 -1.1219 TD
0.0408 Tc
0.408 Tw
(Expert Advisory Group on Language Engineering)Tj
T*
0.0269 Tc
0.2685 Tw
(Standards \(EAGLES\). The CES is an application of)Tj
T*
0.0229 Tc
0.229 Tw
(SGML \(ISO 8879:1986, Information Processing--Text)Tj
T*
0.0385 Tc
0.3854 Tw
(and Office Systems--Standard Generalized Markup)Tj
T*
0.0146 Tc
0.1459 Tw
(Language\) compliant with the specifications of the )Tj
/TT8 1 Tf
22.3415 0 TD
0 Tc
0 Tw
(TEI)Tj
-22.3415 -1.0976 TD
0.0071 Tc
0.0713 Tw
(Guidelines for Electronic Text Encoding and Interchange)Tj
/TT6 1 Tf
0 -1.122 TD
0.0028 Tc
0.0283 Tw
(of the Text Encoding Initiative. The CES is designed to be)Tj
T*
0.0075 Tc
0.0747 Tw
(optimally suited for use in language engineering research)Tj
T*
0.0028 Tc
0.0276 Tw
(and applications, in order to serve as a widely accepted set)Tj
T*
0.0146 Tc
0.1458 Tw
(of encoding standards for corpus-based work in natural)Tj
T*
0.0043 Tc
0.0434 Tw
(language processing applications. The standard specifies a)Tj
0 -1.0976 TD
0.0123 Tc
0.123 Tw
(minimal encoding level that corpora must achieve to be)Tj
0 -1.1219 TD
0.0522 Tc
0.5217 Tw
(considered standardized in terms of descriptive)Tj
T*
0.024 Tc
0.2394 Tw
[(representation \(marking of structural and typographic)]TJ
T*
0.0046 Tc
0.0455 Tw
(information\), provides a suite of DTDs for encoding basic)Tj
T*
0.0416 Tc
0.4155 Tw
(document structure and linguistic annotation, and)Tj
T*
0.0121 Tc
0.1214 Tw
(specifies a corresponding data architecture for linguistic)Tj
0 -1.0976 TD
0 Tc
0 Tw
(corpora.)Tj
1.439 -1.122 TD
0.0314 Tc
0.3137 Tw
(The eXtensible Markup Language \(XML\) is the)Tj
-1.439 -1.122 TD
0.0129 Tc
0.1292 Tw
(emerging standard for data representation and exchange)Tj
T*
0.0274 Tc
0.2745 Tw
(on the World Wide Web \(Bray, Paoli, & Sperberg-)Tj
T*
0.0039 Tc
0.0385 Tw
(McQueen, 1998\). Although at its most basic level XML is)Tj
T*
0.031 Tc
0.3094 Tw
[(a document markup language directly derived from)]TJ
0 -1.0976 TD
0.0222 Tc
0.222 Tw
(SGML \(i.e., allowing tagged text \(elements\), element)Tj
0 -1.122 TD
0.021 Tc
0.2104 Tw
(nesting, and element references\), various features and)Tj
T*
0.0093 Tc
0.093 Tw
(extensions of XML make it a far more powerful tool for)Tj
T*
0.0423 Tc
0.4232 Tw
(data representation and access. For example, the)Tj
T*
0.0185 Tc
0.1846 Tw
(eXtensible Style Language \(XSL\) provides a powerful)Tj
T*
0.0037 Tc
0.0371 Tw
(transformation language \(XSLT\) \(Clark, 1999\) that can be)Tj
0 -1.0976 TD
0.038 Tc
0.3803 Tw
(used to convert any XML document into another)Tj
0 -1.122 TD
0.0099 Tc
0.0985 Tw
(document \(either another XML document or a document)Tj
T*
0.0114 Tc
0.1137 Tw
(marked with HTML, etc.\) by selecting, rearranging, and)Tj
T*
0.0062 Tc
0.0623 Tw
(adding information to it, in order to serve any application)Tj
T*
0.0186 Tc
0.1856 Tw
(that relies on part or all of its contents. Also, XML’s)Tj
25.6829 40.561 TD
0.0087 Tc
0.087 Tw
(provision for accessing part or all of multiple DTDs in a)Tj
0 -1.122 TD
0.0121 Tc
0.1206 Tw
(single document provides an elegant means to represent)Tj
T*
0.0065 Tc
0.0648 Tw
(and manipulate documents encoded according to the CES)Tj
T*
0 Tc
0 Tw
(data architecture.)Tj
1.439 -1.1219 TD
0.0079 Tc
0.0794 Tw
(We have instantiated the CES as an XML application)Tj
-1.439 -1.0976 TD
0.009 Tc
0.0894 Tw
[(called XCES)]TJ
6 0 0 6 359.279 406.527 Tm
0 Tc
0 Tw
(1)Tj
9.84 0 0 9.84 362.159 402.447 Tm
0.0116 Tc
0.1163 Tw
(. A primary motivation for this effort is to)Tj
-5.6585 -1.122 TD
0.0347 Tc
0.3472 Tw
(provide a state-of-the-art representation and access)Tj
T*
(framework for the American National Corpus \(see)Tj
T*
0.0082 Tc
0.0824 Tw
(Macleod, Ide, & Grishman, 2000\) as well as to serve the)Tj
T*
0.0251 Tc
0.2511 Tw
(language engineering community as a whole. In this)Tj
T*
0.0105 Tc
0.1051 Tw
(paper, we describe the differences between the CES and)Tj
0 -1.0976 TD
0.0151 Tc
0.1507 Tw
(XCES, and demonstrate how XML mechanisms can be)Tj
0 -1.122 TD
0.0158 Tc
0.1579 Tw
(used to select from, manipulate, and transform corpora)Tj
T*
0.0279 Tc
0.2785 Tw
(encoded according to XCES specifications. We also)Tj
T*
0.0302 Tc
0.3022 Tw
(provide a general overview of XML and the XML)Tj
T*
0.0494 Tc
0.494 Tw
(mechanisms that are most relevant to language)Tj
T*
0 Tc
0 Tw
(engineering research and applications.)Tj
/TT2 1 Tf
12 0 0 12 349.679 258.207 Tm
0.0004 Tw
(XML Conversion of the CES)Tj
/TT6 1 Tf
9.84 0 0 9.84 320.639 244.287 Tm
0.0146 Tc
0.1456 Tw
(Minimally, conversion of the CES to XML requires)Tj
-1.439 -1.122 TD
0 Tc
0 Tw
(the following:)Tj
/TT10 1 Tf
0 -1.0976 TD
(•)Tj
/TT12 1 Tf
0.8353 0 TD
( )Tj
/TT6 1 Tf
0.994 0 TD
0.0358 Tc
0.3578 Tw
(adaptation of the DTDs for XML compliance,)Tj
0 -1.122 TD
0.015 Tc
0.1506 Tw
(principally by eliminating inclusion exceptions and)Tj
T*
0 Tc
0 Tw
(making mixed-content models XML-compliant;)Tj
/TT10 1 Tf
-1.8293 -1.122 TD
(•)Tj
/TT12 1 Tf
0.5257 0 TD
( )Tj
/TT6 1 Tf
1.3036 0 TD
0.0058 Tc
0.0582 Tw
(adaptation of the CES mechanism for inter-document)Tj
0 -1.122 TD
0.0102 Tc
0.1016 Tw
(reference to meet the specifications of XML pointer)Tj
T*
0 Tc
0 Tw
(and linking mechanisms.)Tj
-1.8293 -1.0976 TD
0.0091 Tc
0.0915 Tw
[(However, we further exploit the capabilities of the XML)]TJ
0 -1.122 TD
0 Tc
0.0002 Tw
(framework to accomplish the following:)Tj
/TT10 1 Tf
T*
0 Tw
(•)Tj
/TT12 1 Tf
0.5007 0 TD
( )Tj
/TT6 1 Tf
1.3286 0 TD
0.0037 Tc
0.037 Tw
(validate the CES data architecture, in which linguistic)Tj
0 -1.1219 TD
0.0033 Tc
0.0334 Tw
(annotations are maintained in separate documents that)Tj
T*
0.0365 Tc
0.3647 Tw
(point back to the original, yielding a “hyper-)Tj
T*
0.0235 Tc
0.2345 Tw
(document” composed of the original text and all)Tj
0 -1.0976 TD
0.0171 Tc
0.171 Tw
(annotations. This will enable us to ensure that the)Tj
0 -1.122 TD
0.005 Tc
0.0503 Tw
(architecture and pointing mechanisms are conformant)Tj
/TT4 1 Tf
-1.8293 -1.2195 TD
0 Tc
0 Tw
( )Tj
ET
306.479 69.567 144 0.48 re
f
BT
/TT6 1 Tf
5.28 0 0 5.28 306.479 59.967 Tm
(1)Tj
8.88 0 0 8.88 309.119 56.367 Tm
0.0001 Tc
0.0005 Tw
( http://www.cs.vassar.edu/XCES.)Tj
ET
endstream
endobj
3 0 obj
<<
/ProcSet [/PDF /Text ]
/Font <<
/TT2 4 0 R
/TT4 5 0 R
/TT6 6 0 R
/TT8 7 0 R
/TT10 8 0 R
/TT12 9 0 R
>>
/ExtGState <<
/GS1 10 0 R
>>
>>
endobj
13 0 obj
<<
/Length 12909
>>
stream
BT
/TT6 1 Tf
9.84 0 0 9.84 71.759 762.447 Tm
0 g
/GS1 gs
0.0073 Tc
0.073 Tw
(to the specifications of mechanisms for manipulation)Tj
0 -1.122 TD
0.0117 Tc
0.1173 Tw
(of and access to XML documents, such as the XSL)Tj
T*
0.0052 Tc
0.0519 Tw
(Transformation Language, XQL \(Robie, et al., 1998\),)Tj
T*
0 Tc
0 Tw
(etc.)Tj
/TT10 1 Tf
-1.8293 -1.122 TD
(•)Tj
/TT12 1 Tf
0.4925 0 TD
( )Tj
/TT6 1 Tf
1.3368 0 TD
0.0032 Tc
0.0323 Tw
(exploit XML mechanisms for combining all or part of)Tj
0 -1.0976 TD
0.011 Tc
0.1101 Tw
(documents described by different DTDs, in order to)Tj
0 -1.1219 TD
0.0065 Tc
0.0655 Tw
(create, for example, a new document containing only)Tj
T*
0.022 Tc
0.2199 Tw
(certain types of annotation and structural markup)Tj
T*
0.0252 Tc
0.2518 Tw
(\(e.g., markup for paragraphs, sentences, etc.\). In)Tj
T*
0.0136 Tc
0.136 Tw
(particular, we will rely on XML “namespaces” and)Tj
T*
0.0108 Tc
0.1081 Tw
(the ability to reference DTD fragments to retain the)Tj
0 -1.0976 TD
0.0055 Tc
0.0551 Tw
(integrity \(and validity )Tj
/TT8 1 Tf
9.2683 0 TD
0.0053 Tc
0.0535 Tw
(vis à vis)Tj
/TT6 1 Tf
3.3767 0 TD
0.007 Tc
0.0706 Tw
[( the original DTDs\) of)]TJ
-12.645 -1.1219 TD
0 Tc
0 Tw
(the newly formed documents.)Tj
/TT10 1 Tf
-1.8293 -1.122 TD
(•)Tj
/TT12 1 Tf
1.1113 0 TD
( )Tj
/TT6 1 Tf
0.718 0 TD
0.0601 Tc
0.6006 Tw
(instantiate the DTDs using XML schemas)Tj
0 -1.1219 TD
0.006 Tc
0.0599 Tw
(\(Thompson, et al., 1999\), which, among other things,)Tj
T*
0.0161 Tc
0.1616 Tw
(provide a means to limit element content by type)Tj
T*
0.0205 Tc
0.2051 Tw
(\(e.g., text only, numbers, etc.\). In addition, XML)Tj
0 -1.0976 TD
0.0052 Tc
0.0519 Tw
(schemas enable definition of data types \(for example,)Tj
0 -1.1219 TD
0.0155 Tc
0.1549 Tw
(a “part of speech” type, a “lemma” type, etc.\) and)Tj
T*
0.0088 Tc
0.0881 Tw
(specification of legal values; thus, precise values for)Tj
T*
0.0105 Tc
0.1053 Tw
(tag contents describing linguistic phenomena can be)Tj
T*
0.021 Tc
0.2104 Tw
(defined and validated, reducing the need for)Tj
T*
0 Tc
0 Tw
(extensive manual checking.)Tj
/TT2 1 Tf
10.8 0 0 10.8 53.759 497.247 Tm
0.0006 Tw
[(2.1)-1372.2(Adaptation of the CES DTDs)]TJ
/TT6 1 Tf
9.84 0 0 9.84 67.919 483.327 Tm
0.0221 Tc
0.2214 Tw
(XML has been designed to eliminate some of the)Tj
-1.439 -1.122 TD
0.0087 Tc
0.0866 Tw
(arcane and/or redundant syntax of SGML, in an effort to)Tj
0 -1.0976 TD
0.0446 Tc
0.446 Tw
(streamline SGML and enable easier parsing and)Tj
0 -1.122 TD
0.0068 Tc
0.0679 Tw
(processing. Conversion of the CES DTDs from SGML to)Tj
T*
0.0123 Tc
0.1235 Tw
(XML conformance is relatively trivial, involving only a)Tj
T*
0.0169 Tc
0.1684 Tw
(few minor syntactic changes, none of which affect the)Tj
T*
0 Tc
0 Tw
(definitions of legal element content. For example,)Tj
/TT10 1 Tf
T*
(•)Tj
/TT12 1 Tf
0.4894 0 TD
( )Tj
/TT6 1 Tf
1.3399 0 TD
0.003 Tc
0.03 Tw
(Attributes with types such as NAME and NAMES are)Tj
0 -1.0976 TD
0 Tc
0.0004 Tw
(changed to NMTOKEN and NMTOKENS.)Tj
/TT10 1 Tf
-1.8293 -1.122 TD
0 Tw
(•)Tj
/TT12 1 Tf
0.6615 0 TD
( )Tj
/TT6 1 Tf
1.1678 0 TD
0.0189 Tc
0.1887 Tw
(Default values for attributes must be quoted, e.g.,)Tj
/TT4 1 Tf
8.88 0 0 8.88 71.759 373.407 Tm
0 Tc
0 Tw
(complete \(y|n\) "y".)Tj
/TT10 1 Tf
9.84 0 0 9.84 53.759 362.367 Tm
(•)Tj
/TT12 1 Tf
0.8172 0 TD
( )Tj
/TT6 1 Tf
1.012 0 TD
0.0331 Tc
0.3307 Tw
(In mixed content models \(i.e., elements whose)Tj
0 -1.122 TD
0.0152 Tc
0.1526 Tw
(content descriptions allow free mixture of text and)Tj
T*
0.0095 Tc
0.0952 Tw
(elements\) #PCDATA \(meaning, in effect, text\) must)Tj
0 -1.0976 TD
0.0147 Tc
0.1466 Tw
(be the first in the list of allowed elements; e.g.,)Tj
/TT4 1 Tf
8.88 0 0 8.88 276.719 329.487 Tm
0 Tc
0 Tw
( )Tj
/TT6 1 Tf
9.84 0 0 9.84 283.2805 329.487 Tm
(a)Tj
-21.4961 -1.122 TD
0.0118 Tc
0.1174 Tw
(content model that allows text mixed with elements)Tj
T*
0.0103 Tc
0.1032 Tw
(for numbers and abbreviations would be: )Tj
/TT4 1 Tf
8.88 0 0 8.88 245.759 307.407 Tm
0 Tc
0 Tw
(\(#PCDATA)Tj
-19.5946 -1.2432 TD
(| num | abbr\)*.)Tj
/TT10 1 Tf
9.84 0 0 9.84 53.759 285.327 Tm
(•)Tj
/TT12 1 Tf
0.6173 0 TD
( )Tj
/TT6 1 Tf
1.212 0 TD
0.0149 Tc
0.1485 Tw
(The '&' connector is disallowed in content models;)Tj
0 -1.122 TD
0.0049 Tc
0.0486 Tw
(for the CES this meant simplifying the content model)Tj
0 -1.0976 TD
0.0064 Tc
0.0646 Tw
(for the header element )Tj
/TT4 1 Tf
8.88 0 0 8.88 166.559 263.487 Tm
0 Tc
0 Tw
(respStmt)Tj
/TT6 1 Tf
9.84 0 0 9.84 209.279 263.487 Tm
0.002 Tc
0.0206 Tw
[( to )]TJ
/TT4 1 Tf
8.88 0 0 8.88 222.3372 263.487 Tm
0.0132 Tc
0.1321 Tw
(\(\(respType |)Tj
-16.957 -1.2432 TD
0 Tc
0 Tw
(respName\)+\).)Tj
/TT6 1 Tf
9.84 0 0 9.84 67.919 241.407 Tm
0.0107 Tc
0.1072 Tw
(XML also disallows "inclusions" and "exclusions" in)Tj
-1.439 -1.122 TD
0.0099 Tc
0.0989 Tw
(content models, i.e., specification of elements that either)Tj
/TT8 1 Tf
T*
0.0186 Tc
0 Tw
(may)Tj
/TT6 1 Tf
1.9637 0 TD
0.0023 Tc
0.0228 Tw
[( or )]TJ
/TT8 1 Tf
1.3778 0 TD
0.0042 Tc
0.042 Tw
(must not)Tj
/TT6 1 Tf
3.4925 0 TD
0.0037 Tc
0.0368 Tw
[( appear nested within the defined element)]TJ
-6.834 -1.122 TD
0.0075 Tc
0.0745 Tw
(in the document itself. The CES makes use of exclusions)Tj
0 -1.0976 TD
0.0264 Tc
0.2645 Tw
(to disallow recursive nesting of certain elements; in)Tj
0 -1.122 TD
0.026 Tc
0.2594 Tw
(particular, the elements )Tj
/TT4 1 Tf
8.88 0 0 8.88 162.239 186.447 Tm
0.0252 Tc
0.2523 Tw
(hi, foreign, distinct,)Tj
-12.2162 -1.2432 TD
0.0366 Tc
0 Tw
(mentioned,)Tj
/TT6 1 Tf
9.84 0 0 9.84 110.2949 175.407 Tm
0.0049 Tc
0.0494 Tw
[( and )]TJ
/TT4 1 Tf
8.88 0 0 8.88 130.799 175.407 Tm
0.0206 Tc
0 Tw
(title )Tj
/TT6 1 Tf
9.84 0 0 9.84 165.7026 175.407 Tm
0.0134 Tc
0.1345 Tw
(are not allowed to be nested.)Tj
-11.3764 -1.122 TD
0.014 Tc
0.1398 Tw
(Each of these elements is defined to be a member of a)Tj
T*
0.0471 Tc
0.4712 Tw
(class of "phrase-level" elements \(e.g., )Tj
/TT4 1 Tf
8.88 0 0 8.88 245.759 153.327 Tm
0 Tc
0 Tw
(foreign,)Tj
-21.6216 -1.2432 TD
0.0068 Tc
0.0674 Tw
(mentioned, distinct, title, hi, list, corr,)Tj
0 -1.2162 TD
0.0193 Tc
0.1933 Tw
(gap, reg, ptr, ref)Tj
/TT6 1 Tf
9.84 0 0 9.84 157.919 131.487 Tm
0.0173 Tc
0.1731 Tw
(\) and their content models are)Tj
-10.5854 -1.122 TD
0.0142 Tc
0.1417 Tw
(defined to consist of members of this class inter-mixed)Tj
T*
0.0284 Tc
0.2839 Tw
(with text. SGML DTD syntax provides a shorthand)Tj
T*
0.0044 Tc
0.0445 Tw
(notation for indicating that a given element or elements in)Tj
T*
0.0028 Tc
0.028 Tw
(the content model may not appear, even though it is listed;)Tj
T*
0.0112 Tc
0.1116 Tw
(XML requires an exact listing of allowed elements. The)Tj
0 -1.0976 TD
0.003 Tc
0.0305 Tw
(XCES DTDs therefore had to be modified to explicitly list)Tj
25.6829 70.8293 TD
0.0424 Tc
0.4244 Tw
(allowed elements in the content models for )Tj
/TT4 1 Tf
8.88 0 0 8.88 525.119 762.447 Tm
0 Tc
0 Tw
(hi,)Tj
-24.6216 -1.2432 TD
(foreign, distinct, mentioned,)Tj
/TT6 1 Tf
9.84 0 0 9.84 461.039 751.407 Tm
( and )Tj
/TT4 1 Tf
8.88 0 0 8.88 480.239 751.407 Tm
(title.)Tj
/TT2 1 Tf
10.8 0 0 10.8 306.479 728.3669 Tm
0.0005 Tw
[(2.2)-1372.2(Linking with XPointer and XLink)]TJ
/TT6 1 Tf
9.84 0 0 9.84 320.639 714.447 Tm
0.017 Tc
0.1696 Tw
[(In the CES, primary documents \(encoded using the)]TJ
-1.439 -1.122 TD
0.006 Tc
0.0594 Tw
[(cesDoc DTD\) and annotations \(encoded using the cesAna)]TJ
T*
0.0128 Tc
0.1283 Tw
(and cesAlign DTDs\) are linked using a mechanism that)Tj
T*
0.0093 Tc
0.0931 Tw
(enables identification of the elements and/or text content)Tj
T*
0.0183 Tc
0.1832 Tw
(to be referenced and the document that contains these)Tj
0 -1.0976 TD
0 Tc
0 Tw
(elements.)Tj
1.439 -1.1219 TD
0.0066 Tc
0.0664 Tw
(XML, like SGML, offers a mechanism for identifying)Tj
-1.439 -1.122 TD
0.0253 Tc
0.2527 Tw
(pointer targets known as ID/IDREF. The mechanism)Tj
T*
0.015 Tc
0.1495 Tw
[(works by including an ID attribute specifying a unique)]TJ
T*
0.0053 Tc
0.0532 Tw
(identifier on the element that is the target of the reference)Tj
T*
0.0132 Tc
0.1324 Tw
(\(i.e., the element is "localized"\); an IDREF attribute on)Tj
0 -1.0976 TD
0.0037 Tc
0.037 Tw
(the source of the reference specifies the same identifier, thus)Tj
0 -1.1219 TD
0.006 Tc
0.0604 Tw
(providing a pointer to the target. However, the ID/IDREF)Tj
T*
0.0046 Tc
0.0461 Tw
(mechanism presents certain problems for linking elements)Tj
T*
0.005 Tc
0.0494 Tw
[(in linguistic corpora. First, the ID mechanism can be used)]TJ
T*
0.0045 Tc
0.0449 Tw
(only to point to another )Tj
/TT8 1 Tf
9.9756 0 TD
0 Tc
0 Tw
(tagged)Tj
/TT6 1 Tf
2.7317 0 TD
0.004 Tc
0.0398 Tw
[( element. Therefore, its use)]TJ
-12.7073 -1.122 TD
0.0257 Tc
0.2575 Tw
(demands inserting ID attributes--and tags as well, if)Tj
0 -1.0976 TD
0.0052 Tc
0.0524 Tw
(necessary--on every item that may possibly be a target. In)Tj
0 -1.1219 TD
0.0037 Tc
0.0368 Tw
(linguistic corpora, it is not uncommon to require reference)Tj
T*
0.0044 Tc
0.0442 Tw
(to parts of the text that may not be tagged; for example, if)Tj
T*
0.0431 Tw
(only sentences are tagged, the ID/IDREF mechanism does)Tj
T*
0.0041 Tc
0.0416 Tw
(not enable referring to a specific word within the sentence)Tj
T*
0.0095 Tc
0.0945 Tw
(unless the word itself is tagged and provided with an ID)Tj
0 -1.0976 TD
0.0327 Tc
0.3266 Tw
[(attribute. The addition of tags and IDs can be a)]TJ
0 -1.122 TD
0.0099 Tc
0.0987 Tw
(substantial task, and may be impractical if the document)Tj
T*
0 Tc
0 Tw
(will be modified frequently.)Tj
1.439 -1.122 TD
0.0386 Tc
0.3855 Tw
(Another problem arises from the fact that the)Tj
-1.439 -1.122 TD
0.0116 Tc
0.1163 Tw
(ID/IDREF mechanism allows references only within the)Tj
T*
0.0546 Tc
0.5456 Tw
(same SGML/XML document. Because the data)Tj
0 -1.0976 TD
0.0407 Tc
0.4071 Tw
(architecture of the CES provides for maintaining)Tj
0 -1.122 TD
0.0114 Tc
0.1139 Tw
(annotations and other related information \(e.g., different)Tj
T*
0.0101 Tc
0.1016 Tw
[(versions of the text\) in separate SGML/XML documents)]TJ
T*
0.0359 Tc
0.3592 Tw
(with different DTDs, the ID/IDREF mechanism is)Tj
T*
0 Tc
0.0003 Tw
(inappropriate for our use.)Tj
1.439 -1.122 TD
0.0041 Tc
0.0406 Tw
(To answer these problems, XML provides an extended)Tj
-1.439 -1.0976 TD
0.0394 Tw
[(addressing syntax called the XML Path Language \(XPath\))]TJ
0 -1.122 TD
0.0034 Tc
0.0344 Tw
(\(Clark & DeRose, 1999\), which defines a concise notation)Tj
T*
0.0091 Tc
0.0915 Tw
[(for element localization in the document tree \(as defined)]TJ
T*
0.0153 Tc
0.1533 Tw
(by the nesting of elements in the document itself\). For)Tj
T*
0.0433 Tc
0.4331 Tw
(example, the XPath expression)Tj
/TT4 1 Tf
8.88 0 0 8.88 453.119 285.327 Tm
0.0386 Tc
0.3865 Tw
[( /div/p[2]/s[3])]TJ
/TT6 1 Tf
9.84 0 0 9.84 306.479 274.287 Tm
0.0198 Tc
0.1976 Tw
(specifies the third )Tj
/TT4 1 Tf
8.88 0 0 8.88 388.6133 274.287 Tm
0.1209 Tc
0 Tw
(<s>)Tj
/TT6 1 Tf
9.84 0 0 9.84 406.799 274.287 Tm
0.0233 Tc
0.2331 Tw
[( \(sentence\) element within the)]TJ
-10.1951 -1.0976 TD
0.0084 Tc
0 Tw
(second )Tj
/TT4 1 Tf
8.88 0 0 8.88 337.679 263.487 Tm
0 Tc
(<p>)Tj
/TT6 1 Tf
9.84 0 0 9.84 353.759 263.487 Tm
0.0072 Tc
0.0719 Tw
[( \(paragraph\) element within each )]TJ
/TT4 1 Tf
8.88 0 0 8.88 492.1367 263.487 Tm
0.0266 Tc
0 Tw
(<div>)Tj
/TT6 1 Tf
9.84 0 0 9.84 519.9618 263.487 Tm
0.0061 Tc
0.0616 Tw
[( \(text)]TJ
-21.6954 -1.122 TD
0.0097 Tc
0.0974 Tw
(division\) element; )Tj
/TT4 1 Tf
8.88 0 0 8.88 384.0007 252.447 Tm
0.0327 Tc
0 Tw
(/descendant::p)Tj
/TT6 1 Tf
9.84 0 0 9.84 462.239 252.447 Tm
0.0194 Tc
0.1945 Tw
[( specifies all )]TJ
/TT4 1 Tf
8.88 0 0 8.88 522.4888 252.447 Tm
0.1501 Tc
0 Tw
(<p>)Tj
/TT6 1 Tf
9.84 0 0 9.84 306.479 241.407 Tm
0.024 Tc
0.2398 Tw
(elements in the document. In addition, XPath allows)Tj
T*
0.0107 Tc
0.1066 Tw
(addressing text fragments within a particular element by)Tj
T*
0.0504 Tc
0.5036 Tw
(providing predicates for manipulating chains of)Tj
T*
0 Tc
0.0003 Tw
(characters. For example, the expression)Tj
/TT4 1 Tf
8.88 0 0 8.88 358.559 195.087 Tm
0 Tw
(substring\(/p/s[2]/text\(\),6\))Tj
/TT6 1 Tf
9.84 0 0 9.84 306.479 180.447 Tm
0.0135 Tc
0.1346 Tw
(selects the string "one would expect that the whole sky)Tj
T*
0.0137 Tc
0.1366 Tw
(would be as bright as the sun, even at night." from the)Tj
T*
0 Tc
0 Tw
(following text:)Tj
/TT4 1 Tf
8.88 0 0 8.88 320.639 144.687 Tm
0.0239 Tc
0.2392 Tw
(<p><s>The difficulty is)Tj
0 -1 TD
0.0258 Tc
0.2585 Tw
(that in an infinite static universe)Tj
T*
0.0164 Tc
0.1636 Tw
(nearly every line of sight would end)Tj
0 -1.027 TD
0.0569 Tc
0.5693 Tw
(on the surface of a star.</s><s>Thus one would expect)Tj
0 -1.027 TD
0.0089 Tc
0.0885 Tw
(that the whole sky would be as bright)Tj
T*
0 Tc
0 Tw
(as the sun, even at night.</s></p>)Tj
/TT6 1 Tf
9.84 0 0 9.84 320.639 76.287 Tm
(Similarly, the expression)Tj
/TT4 1 Tf
8.88 0 0 8.88 342.479 63.087 Tm
(substring\(/p/s[2]/text\(\), 10, 12\))Tj
ET
endstream
endobj
14 0 obj
<<
/ProcSet [/PDF /Text ]
/Font <<
/TT2 4 0 R
/TT4 5 0 R
/TT6 6 0 R
/TT8 7 0 R
/TT10 8 0 R
/TT12 9 0 R
>>
/ExtGState <<
/GS1 10 0 R
>>
>>
endobj
16 0 obj
<<
/Length 12047
>>
stream
BT
/TT6 1 Tf
9.84 0 0 9.84 53.759 762.447 Tm
0 g
/GS1 gs
0.0171 Tc
0.1709 Tw
(selects "would expect". Thus the reference is made by)Tj
0 -1.122 TD
0.0179 Tc
0.1794 Tw
(specifying \(1\) the address \(absolute or relative\) of the)Tj
T*
0.008 Tc
0.0803 Tw
(element closest to the substring to be referred to, and \(2\))Tj
T*
0.0434 Tc
0.4338 Tw
(the substring within this element. Another XML)Tj
T*
0.0139 Tc
0.1393 Tw
(mechanism, XPointer \(DeRose, Daniel, & Maler, 1999\))Tj
0 -1.0976 TD
0.0231 Tc
0.231 Tw
(extends XPath syntax to allow addressing points and)Tj
0 -1.1219 TD
0.0153 Tc
0.1532 Tw
(ranges as well as nodes, locating information by string)Tj
T*
0.0231 Tc
0.2307 Tw
(matching, and use of addressing expressions in URI-)Tj
T*
0 Tc
0.0004 Tw
(references as fragment identifiers.)Tj
1.439 -1.1219 TD
0.0102 Tc
0.1024 Tw
(The pointer mechanisms in the SGML version of the)Tj
-1.439 -1.122 TD
0.0307 Tc
0.3066 Tw
(CES are based on HyTime \(ISO, 1992; DeRose &)Tj
0 -1.0976 TD
0.0205 Tc
0.2053 Tw
(Durand, 1994\) and TEI extended pointers \(DeRose &)Tj
0 -1.1219 TD
0.0087 Tc
0.0869 Tw
(Durand, 1995\), the latter of which provided the basis for)Tj
T*
0.0038 Tc
0.0378 Tw
(the development of XPath. The CES reference mechanism)Tj
T*
0.0116 Tc
0.1161 Tw
(for identifying specific strings of characters utilizes two)Tj
T*
0.0067 Tc
0 Tw
(attributes, )Tj
/TT8 1 Tf
4.3659 0 TD
0.0289 Tc
(from)Tj
/TT6 1 Tf
2.0046 0 TD
0.0048 Tc
0.0475 Tw
[( and )]TJ
/TT8 1 Tf
2.0627 0 TD
0.0475 Tc
0 Tw
(to)Tj
/TT6 1 Tf
0.8108 0 TD
0.009 Tc
0.0898 Tw
(, to identify the beginning and end)Tj
-9.2439 -1.122 TD
0.0134 Tc
0.1338 Tw
(points of the string, as well as a third attribute, )Tj
/TT8 1 Tf
20.8644 0 TD
0.0458 Tc
0 Tw
(doc,)Tj
/TT6 1 Tf
1.8769 0 TD
0.0092 Tc
0.0926 Tw
[( to)]TJ
-22.7413 -1.0976 TD
0 Tc
0.0002 Tw
(specify the target document, if necessary; for example:)Tj
/TT4 1 Tf
8.88 0 0 8.88 67.919 562.047 Tm
0 Tw
(
)Tj
/TT6 1 Tf
9.84 0 0 9.84 53.759 547.407 Tm
0.0003 Tw
(This is shorthand for the HyTime/TEI expression:)Tj
/TT4 1 Tf
8.88 0 0 8.88 67.919 533.727 Tm
0 Tw
()Tj
/TT6 1 Tf
9.84 0 0 9.84 53.759 510.447 Tm
0.0069 Tc
0.0685 Tw
(XML's mechanism is more explicit and requires only one)Tj
0 -1.122 TD
0 Tc
0 Tw
(attribute; for example:)Tj
/TT4 1 Tf
8.88 0 0 8.88 60.959 485.727 Tm
()Tj
/TT6 1 Tf
9.84 0 0 9.84 67.919 453.327 Tm
0.0031 Tc
0.0307 Tw
(As this example shows, XML also provides a powerful)Tj
-1.439 -1.122 TD
0.0089 Tc
0.0894 Tw
(mechanism for specifying a link \(uni-directional or more)Tj
0 -1.0976 TD
0.039 Tc
0.3906 Tw
(complex linking structures\) between two or more)Tj
0 -1.122 TD
0.0049 Tc
0.0491 Tw
(resources or portions of resources, called XLink \(DeRose,)Tj
T*
0.0271 Tc
0.2714 Tw
(et al., 2000\). In XCES, this mechanism is used for)Tj
T*
0.0124 Tc
0.1242 Tw
(alignment in cesAlign documents, to link corresponding)Tj
T*
0.0082 Tc
0.0819 Tw
(segments of two or more primary texts. It is also used to)Tj
T*
0.0067 Tc
0.0673 Tw
(link annotation documents to a base document containing)Tj
0 -1.0976 TD
0.037 Tc
0.3696 Tw
(the primary text, as in the example above where)Tj
0 -1.122 TD
0.0685 Tc
0.6853 Tw
(annotation information \(e.g., morpho-syntactic)Tj
T*
0.0137 Tc
0.1365 Tw
(information\) about a specific token \()Tj
/TT4 1 Tf
7.92 0 0 7.92 209.4447 343.407 Tm
0.0487 Tc
0 Tw
()Tj
/TT6 1 Tf
9.84 0 0 9.84 235.1369 343.407 Tm
0.0095 Tc
0.0949 Tw
(\) is linked to)Tj
-18.4327 -1.1219 TD
0.0166 Tc
0.1665 Tw
(the string of characters in the original text to which it)Tj
T*
0 Tc
0 Tw
(applies.)Tj
6 0 0 6 83.999 325.407 Tm
(2)Tj
9.84 0 0 9.84 86.879 321.327 Tm
0.0121 Tc
0.1206 Tw
[( In addition to specifying the target location for)]TJ
-3.3659 -1.122 TD
0.0171 Tc
0.1713 Tw
(information in the same or external documents, XLink)Tj
0 -1.0976 TD
0.0085 Tc
0.0848 Tw
(attributes can be used to specify the role of the link, i.e.,)Tj
0 -1.122 TD
0.0467 Tc
0.4673 Tw
(how the link should be activated \(by hand, or)Tj
T*
0.0157 Tc
0.1568 Tw
(automatically by the browser\) and what to do with the)Tj
T*
0.0204 Tc
0.2045 Tw
(target fragment \(replace it or insert it into the source)Tj
T*
0 Tc
0 Tw
(document\).)Tj
1.439 -1.122 TD
0.0234 Tc
0.2336 Tw
(Two of the CES DTDs use links extensively: the)Tj
-1.439 -1.0976 TD
0.0274 Tc
0.2742 Tw
(cesAna DTD for segmentation and morpho-syntactic)Tj
0 -1.122 TD
0.0037 Tc
0.0374 Tw
(annotation, and the cesAlign DTD for alignments between)Tj
T*
0.0167 Tc
0.1667 Tw
(parallel texts. In these DTDs, the link element has the)Tj
T*
0 Tc
0 Tw
(following attributes:)Tj
/TT10 1 Tf
T*
(•)Tj
/TT12 1 Tf
0.5393 0 TD
( )Tj
/TT8 1 Tf
1.29 0 TD
0.0067 Tc
(doc )Tj
/TT6 1 Tf
1.787 0 TD
0.0083 Tc
0.0831 Tw
(for the address \(URL\) of the target resource. By)Tj
-1.787 -1.122 TD
0.005 Tc
0.0497 Tw
(the definitions in the CES, if )Tj
/TT8 1 Tf
12.1463 0 TD
0.0188 Tc
0 Tw
(doc)Tj
/TT6 1 Tf
1.5003 0 TD
0.0045 Tc
0.0452 Tw
[( is given on a parent)]TJ
-13.6467 -1.0976 TD
0.0252 Tc
0.2525 Tw
(element, it is inherited by all child elements,)Tj
0 -1.122 TD
0.0056 Tc
0.0561 Tw
(thereby avoiding repetition of the attribute. However,)Tj
T*
0.015 Tc
0.1504 Tw
(inheritance is not defined for SGML attributes and)Tj
T*
0 Tc
0.0002 Tw
(therefore not implemented in any SGML parser.)Tj
/TT4 1 Tf
-1.8293 -2.3415 TD
0 Tw
( )Tj
ET
53.759 113.487 144 0.48 re
f
BT
/TT6 1 Tf
5.28 0 0 5.28 53.759 103.887 Tm
(2)Tj
8.88 0 0 8.88 56.399 100.287 Tm
0.0049 Tc
0.0496 Tw
[( Although at present we link only text, the mechanism provides)]TJ
-0.2973 -1.2432 TD
0.0079 Tc
0.0788 Tw
(for linking resources in any medium \(audio, video, etc.\), which)Tj
0 -1.2162 TD
0.0041 Tc
0.041 Tw
(in later versions of XCES will allow for linking speech, external)Tj
0 -1.2432 TD
0.0094 Tc
0.0941 Tw
(images, video, applets, form-processing programs, style sheets,)Tj
T*
0 Tc
0 Tw
(etc.)Tj
/TT10 1 Tf
9.84 0 0 9.84 306.479 762.447 Tm
(•)Tj
/TT12 1 Tf
0.6365 0 TD
( )Tj
/TT8 1 Tf
1.1928 0 TD
(from)Tj
/TT6 1 Tf
1.8892 0 TD
0.0179 Tc
0.1784 Tw
[( for the beginning of the annotated fragment, in)]TJ
-1.8892 -1.122 TD
0 Tc
0.0002 Tw
(terms of the ID on the <s> \(sentence\) tag and a token.)Tj
/TT10 1 Tf
-1.8293 -1.122 TD
0 Tw
(•)Tj
/TT12 1 Tf
0.46 0 TD
( )Tj
/TT8 1 Tf
1.3693 0 TD
(to)Tj
/TT6 1 Tf
0.7805 0 TD
0.0001 Tw
( for the end of the annotated fragment.)Tj
-1.1707 -1.1219 TD
0.0147 Tc
0.1471 Tw
(The conversion of the CES into XML has modified)Tj
-1.439 -1.122 TD
0.0324 Tc
0.3237 Tw
(this architecture. In XML, annotated fragments are)Tj
0 -1.0976 TD
0.0204 Tc
0.2041 Tw
(referenced by the URI \(remote or local\) of the target)Tj
0 -1.1219 TD
0.0132 Tc
0.1319 Tw
(resource, and an extended pointer identifying an element)Tj
T*
0.0281 Tc
0.281 Tw
(and, where necessary, the selected substring of that)Tj
T*
0 Tc
0 Tw
(element's content, as in the following:)Tj
/TT4 1 Tf
8.88 0 0 8.88 320.639 660.687 Tm
()Tj
/TT6 1 Tf
9.84 0 0 9.84 320.639 619.407 Tm
0.0382 Tc
0.3816 Tw
(Annotation resulting from automatic processing)Tj
-1.439 -1.1219 TD
0.0145 Tc
0.145 Tw
(\(marking of sentence boundaries, tokens, links between)Tj
T*
0.0037 Tc
0.0365 Tw
(parallel texts, etc.\) often includes thousands of links to the)Tj
T*
0.025 Tc
0.25 Tw
(same external document. Repetition of the document)Tj
0 -1.0976 TD
0.0094 Tc
0.0942 Tw
(name on, for example, every )Tj
/TT4 1 Tf
7.92 0 0 7.92 428.7315 575.487 Tm
0.0377 Tc
0 Tw
()Tj
/TT6 1 Tf
9.84 0 0 9.84 453.9894 575.487 Tm
0.0089 Tc
0.0887 Tw
[( element in a cesAna)]TJ
-14.9909 -1.1219 TD
0.0304 Tc
0.3042 Tw
[(annotation document would obviously significantly)]TJ
T*
0.0179 Tc
0.1793 Tw
(multiply its size. XML includes an attribute)Tj
/TT4 1 Tf
7.92 0 0 7.92 496.7205 553.407 Tm
0 Tc
0 Tw
( )Tj
/TT8 1 Tf
9.84 0 0 9.84 501.9803 553.407 Tm
0.055 Tc
(xml:base)Tj
/TT6 1 Tf
-19.868 -1.1219 TD
0.0232 Tc
0.2316 Tw
(\(Marsh, 2000\) that builds into XML the inheritance)Tj
T*
0.0112 Tc
0.1118 Tw
(specified for the CES )Tj
/TT8 1 Tf
9.5241 0 TD
0.0095 Tc
0 Tw
(doc )Tj
/TT6 1 Tf
1.8269 0 TD
0.0132 Tc
0.1321 Tw
(attribute. For example, in the)Tj
-11.351 -1.122 TD
0 Tc
0 Tw
(following text:)Tj
/TT4 1 Tf
8.88 0 0 8.88 320.639 506.607 Tm
()Tj
0 -1 TD
( )Tj
0 -1.027 TD
( )Tj
T*
()Tj
/TT6 1 Tf
9.84 0 0 9.84 306.479 411.327 Tm
0.0234 Tc
0.2343 Tw
(the value of the attribute )Tj
/TT8 1 Tf
11.8275 0 TD
0.0812 Tc
0 Tw
(xml:base)Tj
/TT6 1 Tf
4.2594 0 TD
0.0258 Tc
0.2574 Tw
[( specified on the)]TJ
/TT4 1 Tf
7.92 0 0 7.92 306.479 400.287 Tm
0 Tc
0 Tw
()Tj
/TT6 1 Tf
9.84 0 0 9.84 339.839 400.287 Tm
0.0091 Tc
0.0914 Tw
[( element is inherited by the two )]TJ
/TT4 1 Tf
7.92 0 0 7.92 475.6258 400.287 Tm
0.0439 Tc
0 Tw
()Tj
/TT6 1 Tf
9.84 0 0 9.84 501.1296 400.287 Tm
0.0155 Tc
0.1547 Tw
[( elements)]TJ
-19.7816 -1.0976 TD
0.0295 Tc
0.2946 Tw
(that are its children, and therefore need not be re-)Tj
0 -1.122 TD
0.0335 Tc
0.3353 Tw
(specified. The inclusion of )Tj
/TT4 1 Tf
7.92 0 0 7.92 436.3038 378.447 Tm
0.1254 Tc
0 Tw
(xml:base)Tj
/TT6 1 Tf
9.84 0 0 9.84 482.269 378.447 Tm
0.026 Tc
0.2596 Tw
[( in the XML)]TJ
-17.8648 -1.122 TD
0.0192 Tc
0.1925 Tw
(specification ensures that conformant XML processors)Tj
T*
0 Tc
0.0002 Tw
(will handle it \(unlike SGML\).)Tj
/TT2 1 Tf
12 0 0 12 317.039 333.327 Tm
0.0005 Tw
(Manipulating and Extracting from XCES)Tj
7.26 -0.92 TD
0.0002 Tc
0 Tw
(Documents)Tj
/TT6 1 Tf
9.84 0 0 9.84 320.639 308.367 Tm
0.0079 Tc
0.0789 Tw
(The Extensible Style Language \(XSL\) is a part of the)Tj
-1.439 -1.122 TD
0.0063 Tc
0.063 Tw
(XML framework, consisting of two parts: the best known)Tj
T*
0.0156 Tc
0.1563 Tw
(is the XSL formatting or "style sheet" language; and a)Tj
0 -1.0976 TD
0.0195 Tc
0.1948 Tw
(powerful tree-traversal language, XSLT \(Clark, 1999\),)Tj
0 -1.122 TD
0.0218 Tc
0.218 Tw
(that can be used to convert any XML document into)Tj
T*
0.0128 Tc
0.1282 Tw
(another document in any form \(e.g., XML, well-formed)Tj
T*
0.0068 Tc
0.0678 Tw
(HTML, plain text, etc.\). The transformed documents may)Tj
T*
0.0081 Tc
0.0808 Tw
(or may not be intended for rendering data on a computer)Tj
T*
0.0133 Tc
0.1331 Tw
(screen, but may be used simply to move data from one)Tj
0 -1.0976 TD
0.0049 Tc
0.0484 Tw
(computer system or program to another \(e.g., to transduce)Tj
0 -1.122 TD
0 Tc
0.0002 Tw
(between encoding and/or annotation formats, etc.\).)Tj
1.439 -1.122 TD
0.0287 Tc
0.2873 Tw
(XSLT supports the following kinds of document)Tj
-1.439 -1.122 TD
0 Tc
0 Tw
(manipulation:)Tj
/TT10 1 Tf
T*
(•)Tj
/TT12 1 Tf
0.5745 0 TD
( )Tj
/TT6 1 Tf
1.2547 0 TD
0.0106 Tc
0.1065 Tw
(selection of elements or portions of element content)Tj
0 -1.122 TD
0 Tc
0 Tw
(using the XPath syntax;)Tj
/TT10 1 Tf
-1.8293 -1.0976 TD
(•)Tj
/TT12 1 Tf
1.0868 0 TD
( )Tj
/TT6 1 Tf
0.7425 0 TD
0.0485 Tc
0.4848 Tw
(rearrangement or transformation of extracted)Tj
0 -1.122 TD
0.0086 Tc
0.086 Tw
(information \(including not only text content but also)Tj
T*
0 Tc
0 Tw
(element names, etc.\) in the target document;)Tj
/TT10 1 Tf
-1.8293 -1.122 TD
(•)Tj
/TT12 1 Tf
0.46 0 TD
( )Tj
/TT6 1 Tf
1.3693 0 TD
(addition of information in the target document.)Tj
-0.3902 -1.1219 TD
0.0041 Tc
0.0411 Tw
(Thus, a suite of documents representing a base text \(or)Tj
-1.439 -1.122 TD
0.0054 Tc
0.0538 Tw
(texts\) and its annotations can be manipulated to serve any)Tj
0 -1.0976 TD
0.0093 Tc
0.0934 Tw
(application that relies on part or all of its contents. Thus)Tj
ET
endstream
endobj
17 0 obj
<<
/ProcSet [/PDF /Text ]
/Font <<
/TT2 4 0 R
/TT4 5 0 R
/TT6 6 0 R
/TT8 7 0 R
/TT10 8 0 R
/TT12 9 0 R
>>
/ExtGState <<
/GS1 10 0 R
>>
>>
endobj
19 0 obj
<<
/Length 14693
>>
stream
BT
/TT6 1 Tf
9.84 0 0 9.84 53.759 762.447 Tm
0 g
/GS1 gs
0.0058 Tc
0.0576 Tw
(XSLT is likely to have the most to offer for manipulation)Tj
0 -1.122 TD
0 Tc
0.0002 Tw
(of and access to annotated corpora.)Tj
T*
0.0097 Tc
0.0973 Tw
(XSLT is relatively complex and will not be described in)Tj
T*
0.0024 Tc
0.0235 Tw
(detail here.)Tj
6 0 0 6 97.919 733.407 Tm
0 Tc
0 Tw
(3)Tj
9.84 0 0 9.84 100.799 729.327 Tm
0.0033 Tc
0.0332 Tw
[( A short example can provide some idea of the)]TJ
-4.7805 -1.122 TD
0.0444 Tc
0.4439 Tw
(possibilities. Using as input a cesAna document)Tj
0 -1.0976 TD
0.0511 Tc
0.5107 Tw
(containing morpho-syntactic information \(e.g., a)Tj
0 -1.1219 TD
0.0263 Tc
0.2635 Tw
[(document containing the fragment in Figure )36.8(1)]TJ
6 0 0 6 261.119 700.527 Tm
0 Tc
0 Tw
(4)Tj
9.84 0 0 9.84 263.999 696.447 Tm
0.0299 Tc
0.2986 Tw
(\), the)Tj
-21.3659 -1.1219 TD
0.0184 Tc
0.1962 Tw
[(XSLT )18.7(document )18.7(in )22(Figure )37.7(2)18.5( can be used to create an)]TJ
T*
0.0098 Tc
0.098 Tw
(HTML document that displays a text in "word | lemma |)Tj
T*
0.0048 Tc
0.0479 Tw
(pos" form. When the resulting HTML document is loaded)Tj
T*
0 Tc
0.0002 Tw
(into a browser, it will display the following:)Tj
/TT4 1 Tf
8.88 0 0 8.88 75.119 638.607 Tm
0 Tw
(It|it|PPER3 was|be|PAST3 a|a|DINT)Tj
0 -0.973 TD
(bright|bright|ADJE cold|cold|ADJE)Tj
0 -1.027 TD
(day|day|NN…)Tj
/TT6 1 Tf
9.84 0 0 9.84 67.919 606.447 Tm
0.0178 Tc
0.1781 Tw
(The XSLT script in Figure 2 could be modified to)Tj
-1.439 -1.122 TD
0.0047 Tc
0.0465 Tw
(produce output in any desired form, or to produce another)Tj
T*
0.0472 Tc
0.4719 Tw
(XML document containing the merged text and)Tj
T*
0.0155 Tc
0.1553 Tw
(annotation documents. Similarly, XSLT can be used to)Tj
T*
0.0098 Tc
0.0978 Tw
(produce concordances, paired sentences or words from a)Tj
0 -1.0976 TD
0.0144 Tc
0.1437 Tw
(parallel text, or even a web document that displays the)Tj
0 -1.1219 TD
0.0199 Tc
0.1985 Tw
(orthographic representation of a text and provides the)Tj
T*
0.018 Tc
0.1796 Tw
(audio rendition when the word is clicked on, etc. The)Tj
T*
0.0103 Tc
0.1029 Tw
(XCES web page)Tj
6 0 0 6 121.919 522.447 Tm
0 Tc
0 Tw
(5)Tj
9.84 0 0 9.84 124.799 518.367 Tm
0.0096 Tc
0.0956 Tw
[( provides additional examples of XSLT)]TJ
-7.2195 -1.122 TD
0.0105 Tc
0.1046 Tw
(scripts and their output. Also, Ide, Kilgarriff, & Romary)Tj
T*
0.0235 Tc
0.2345 Tw
(\(2000\) describe an XML format for encoding lexical)Tj
0 -1.0976 TD
0.0294 Tc
0.2944 Tw
(information \(primarily drawn from dictionaries\) and)Tj
0 -1.122 TD
0.0204 Tc
0.2045 Tw
(demonstrate how XSLT can be used to implement an)Tj
T*
0 Tc
0 Tw
(inheritance mechanism over the document tree.)Tj
6 0 0 6 239.999 467.487 Tm
(6)Tj
9.84 0 0 9.84 67.919 452.367 Tm
0.0083 Tc
0.083 Tw
(We include in our presentation a demonstration using)Tj
-1.439 -1.122 TD
0.0201 Tc
0.2007 Tw
(the publicly-available XT tool \(Clark, 1999\), showing)Tj
T*
0.0383 Tc
0.3827 Tw
(how combination of and selection from base and)Tj
0 -1.0976 TD
0.0282 Tc
0.2824 Tw
(annotation documents can be accomplished, and the)Tj
0 -1.122 TD
0.0183 Tc
0.1831 Tw
(various presentation and formatting options XSLT can)Tj
T*
0.0063 Tc
0.0627 Tw
(effect. In particular, we demonstrate application of XCES)Tj
T*
0.025 Tc
0.2505 Tw
[(and the use of XT and other XML tools to corpora)]TJ
T*
0.0204 Tc
0.204 Tw
(including extensive morpho-syntactic information, and)Tj
T*
0 Tc
0.0002 Tw
(aligned documents \(potentially either text or speech\).)Tj
ET
55.199 346.287 231.12 0.48 re
f
55.199 336.447 0.48 9.84 re
f
285.839 336.447 0.48 9.84 re
f
BT
/TT4 1 Tf
7.92 0 0 7.92 60.959 329.487 Tm
0 Tw
()Tj
ET
55.199 327.567 0.48 9.12 re
f
285.839 327.567 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 320.367 Tm
()Tj
ET
55.199 300.447 0.48 9.12 re
f
285.839 300.447 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 293.487 Tm
( )Tj
ET
55.199 291.567 0.48 9.12 re
f
285.839 291.567 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 284.367 Tm
( )Tj
ET
55.199 282.447 0.48 9.12 re
f
285.839 282.447 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 275.487 Tm
( )Tj
ET
55.199 255.567 0.48 9.12 re
f
285.839 255.567 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 248.367 Tm
( It)Tj
ET
55.199 246.447 0.48 9.12 re
f
285.839 246.447 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 239.487 Tm
( )Tj
ET
55.199 237.567 0.48 9.12 re
f
285.839 237.567 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 230.367 Tm
( it)Tj
ET
55.199 228.447 0.48 9.12 re
f
285.839 228.447 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 221.487 Tm
( Pp3ns)Tj
ET
55.199 219.567 0.48 9.12 re
f
285.839 219.567 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 212.367 Tm
( PPER3)Tj
ET
55.199 210.447 0.48 9.12 re
f
285.839 210.447 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 203.487 Tm
( )Tj
ET
55.199 201.567 0.48 9.12 re
f
285.839 201.567 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 194.367 Tm
( it)Tj
ET
55.199 192.447 0.48 9.12 re
f
285.839 192.447 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 185.487 Tm
( Pp3ns)Tj
ET
55.199 183.567 0.48 9.12 re
f
285.839 183.567 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 176.367 Tm
( PPER3)Tj
ET
55.199 174.447 0.48 9.12 re
f
285.839 174.447 0.48 9.12 re
f
BT
7.92 0 0 7.92 60.959 167.487 Tm
( )Tj
ET
55.199 147.567 0.48 9.12 re
f
285.839 147.567 0.48 9.12 re
f
BT
9.84 0 0 9.84 53.759 133.407 Tm
( )Tj
ET
53.759 135.567 144 0.48 re
f
BT
/TT6 1 Tf
5.28 0 0 5.28 53.759 125.967 Tm
(3)Tj
8.88 0 0 8.88 56.399 122.367 Tm
0.0002 Tw
( Full documentation is available at http://www.w3.org/TR/xslt.)Tj
5.28 0 0 5.28 53.759 114.927 Tm
0 Tw
(4)Tj
8.88 0 0 8.88 56.399 111.327 Tm
0.0074 Tc
0.0741 Tw
[( Note that this cesAna document contains full segmentation and)]TJ
-0.2973 -1.2432 TD
0.0436 Tc
0.4355 Tw
(annotation information, including full morpho-syntactic)Tj
T*
0.0179 Tc
0.1789 Tw
(specifications for all potential annotations and the results of)Tj
0 -1.2162 TD
0 Tc
0 Tw
(automatic disambiguation.)Tj
5.28 0 0 5.28 53.759 71.007 Tm
(5)Tj
8.88 0 0 8.88 56.399 67.407 Tm
0.0001 Tc
0.0005 Tw
( http://www.cs.vassar.edu/XCES)Tj
5.28 0 0 5.28 53.759 59.967 Tm
0 Tc
0 Tw
(6)Tj
8.88 0 0 8.88 56.399 56.367 Tm
( See also Erjavec )Tj
/TT8 1 Tf
7.1081 0 TD
(et al. )Tj
/TT6 1 Tf
2.2495 0 TD
(in this volume.)Tj
/TT4 1 Tf
7.92 0 0 7.92 313.679 763.887 Tm
( was)Tj
ET
307.919 761.967 0.48 9.12 re
f
538.559 761.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 755.007 Tm
( )Tj
ET
307.919 753.087 0.48 9.12 re
f
538.559 753.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 745.887 Tm
( be)Tj
ET
307.919 743.967 0.48 9.12 re
f
538.559 743.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 737.007 Tm
( Vmis3s)Tj
ET
307.919 735.087 0.48 9.12 re
f
538.559 735.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 727.887 Tm
( PAST3)Tj
ET
307.919 725.967 0.48 9.12 re
f
538.559 725.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 719.007 Tm
( )Tj
ET
307.919 717.087 0.48 9.12 re
f
538.559 717.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 709.887 Tm
( be)Tj
ET
307.919 707.967 0.48 9.12 re
f
538.559 707.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 701.007 Tm
( Vais1s)Tj
ET
307.919 699.087 0.48 9.12 re
f
538.559 699.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 691.887 Tm
( AUX1)Tj
ET
307.919 689.967 0.48 9.12 re
f
538.559 689.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 683.007 Tm
( )Tj
ET
307.919 681.087 0.48 9.12 re
f
538.559 681.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 673.887 Tm
( be)Tj
ET
307.919 671.967 0.48 9.12 re
f
538.559 671.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 665.007 Tm
( Vais3s)Tj
ET
307.919 663.087 0.48 9.12 re
f
538.559 663.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 655.887 Tm
( AUX3)Tj
ET
307.919 653.967 0.48 9.12 re
f
538.559 653.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 647.007 Tm
( )Tj
ET
307.919 645.087 0.48 9.12 re
f
538.559 645.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 637.887 Tm
( be)Tj
ET
307.919 635.967 0.48 9.12 re
f
538.559 635.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 629.007 Tm
( Vmis1s)Tj
ET
307.919 627.087 0.48 9.12 re
f
538.559 627.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 619.887 Tm
( PAST1)Tj
ET
307.919 617.967 0.48 9.12 re
f
538.559 617.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 611.007 Tm
( )Tj
ET
307.919 609.087 0.48 9.12 re
f
538.559 609.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 601.887 Tm
( be)Tj
ET
307.919 599.967 0.48 9.12 re
f
538.559 599.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 593.007 Tm
( Vmis3s)Tj
ET
307.919 591.087 0.48 9.12 re
f
538.559 591.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 583.887 Tm
( PAST3…)Tj
ET
307.919 581.967 0.48 9.12 re
f
538.559 581.967 0.48 9.12 re
f
307.919 571.407 231.12 0.48 re
f
307.919 571.887 0.48 10.08 re
f
538.559 571.887 0.48 10.08 re
f
BT
/TT6 1 Tf
9.84 0 0 9.84 346.559 551.007 Tm
0.0003 Tc
-0.0002 Tw
(Figure 1 : Fragment of a cesAna document)Tj
ET
307.919 521.727 231.12 0.48 re
f
307.919 511.887 0.48 9.84 re
f
538.559 511.887 0.48 9.84 re
f
BT
/TT4 1 Tf
7.92 0 0 7.92 313.679 504.927 Tm
0 Tc
0 Tw
()Tj
ET
307.919 485.007 0.48 9.12 re
f
538.559 485.007 0.48 9.12 re
f
307.919 475.887 0.48 9.12 re
f
538.559 475.887 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 468.927 Tm
()Tj
ET
307.919 467.007 0.48 9.12 re
f
538.559 467.007 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 459.807 Tm
( )Tj
ET
307.919 457.887 0.48 9.12 re
f
538.559 457.887 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 450.927 Tm
( )Tj
ET
307.919 449.007 0.48 9.12 re
f
538.559 449.007 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 441.807 Tm
( )Tj
ET
307.919 439.887 0.48 9.12 re
f
538.559 439.887 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 432.927 Tm
( )Tj
ET
307.919 431.007 0.48 9.12 re
f
538.559 431.007 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 423.807 Tm
( )Tj
ET
307.919 421.887 0.48 9.12 re
f
538.559 421.887 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 414.927 Tm
()Tj
ET
307.919 413.007 0.48 9.12 re
f
538.559 413.007 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 401.967 Tm
()Tj
ET
307.919 395.967 0.48 17.04 re
f
538.559 395.967 0.48 17.04 re
f
BT
7.92 0 0 7.92 313.679 389.007 Tm
( )Tj
ET
307.919 387.087 0.48 9.12 re
f
538.559 387.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 379.887 Tm
( )Tj
ET
307.919 377.967 0.48 9.12 re
f
538.559 377.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 371.007 Tm
( |)Tj
ET
307.919 369.087 0.48 9.12 re
f
538.559 369.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 361.887 Tm
( )Tj
ET
307.919 359.967 0.48 9.12 re
f
538.559 359.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 353.007 Tm
( |)Tj
ET
307.919 351.087 0.48 9.12 re
f
538.559 351.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 343.887 Tm
( )Tj
ET
307.919 341.967 0.48 9.12 re
f
538.559 341.967 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 335.007 Tm
( )Tj
ET
307.919 333.087 0.48 9.12 re
f
538.559 333.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 325.887 Tm
()Tj
ET
307.919 323.967 0.48 9.12 re
f
538.559 323.967 0.48 9.12 re
f
307.919 315.087 0.48 9.12 re
f
538.559 315.087 0.48 9.12 re
f
BT
7.92 0 0 7.92 313.679 307.647 Tm
()Tj
ET
307.919 301.887 0.48 12.96 re
f
538.559 301.887 0.48 12.96 re
f
307.919 291.567 231.12 0.48 re
f
307.919 292.047 0.48 9.84 re
f
538.559 292.047 0.48 9.84 re
f
BT
/TT6 1 Tf
9.84 0 0 9.84 330.479 266.847 Tm
0.0002 Tc
-0.0002 Tw
(Figure 2 : XSLT document to create HTML output)Tj
/TT2 1 Tf
12 0 0 12 385.919 232.767 Tm
0.0001 Tc
0.0006 Tw
(XML Schemas)Tj
/TT6 1 Tf
9.84 0 0 9.84 320.639 218.847 Tm
0.0084 Tc
0.0837 Tw
(The XML Schema definition language \(Thompson, et)Tj
-1.439 -1.122 TD
0.0337 Tc
0 Tw
(al.,)Tj
/TT8 1 Tf
1.3565 0 TD
0 Tc
( )Tj
/TT6 1 Tf
0.2776 0 TD
0.021 Tc
0.2105 Tw
[(2000; Biron & Malhotra, 2000\) enables document)]TJ
-1.6341 -1.0976 TD
0.0157 Tc
0.1574 Tw
(creators to constrain and document the meaning, usage)Tj
0 -1.122 TD
0.0347 Tc
0.3471 Tw
(and relationships of the constituent parts of XML)Tj
T*
0.0201 Tc
0.2005 Tw
[(documents: datatypes, elements and their content, and)]TJ
T*
0.0116 Tc
0.1159 Tw
(attributes and their values. Schemas can also be used to)Tj
T*
0.02 Tc
0.2003 Tw
(provide default values for attributes and elements. As)Tj
T*
0.0093 Tc
0.0928 Tw
(such, XML schemas provide means to define an abstract)Tj
0 -1.0976 TD
0.0106 Tc
0.1062 Tw
[(data model for a class of documents. While duplicating)]TJ
0 -1.122 TD
0.0079 Tc
0.0788 Tw
(\(or making explicit\) some of the capabilities provided by)Tj
T*
0.0169 Tc
0.1693 Tw
(XML DTDs, they significantly extend their power and)Tj
T*
0.0047 Tc
0.0473 Tw
(provide for much tighter validation of document form and)Tj
T*
0 Tc
0 Tw
(content.)Tj
1.439 -1.122 TD
0.0264 Tc
0.2636 Tw
(XML schemas have considerable implications for)Tj
-1.439 -1.0976 TD
0.0272 Tc
0.2725 Tw
(development of XCES and for corpus encoding and)Tj
ET
endstream
endobj
20 0 obj
<<
/ProcSet [/PDF /Text ]
/Font <<
/TT2 4 0 R
/TT4 5 0 R
/TT6 6 0 R
/TT8 7 0 R
>>
/ExtGState <<
/GS1 10 0 R
>>
>>
endobj
22 0 obj
<<
/Length 13820
>>
stream
BT
/TT6 1 Tf
9.84 0 0 9.84 53.759 762.447 Tm
0 g
/GS1 gs
0.0218 Tc
0.218 Tw
(annotation in general. The following lists only a few)Tj
0 -1.122 TD
0.0273 Tc
0.2726 Tw
(possibilities for the application of XML schemas in)Tj
T*
0 Tc
0 Tw
(XCES:)Tj
/TT10 1 Tf
T*
(•)Tj
/TT12 1 Tf
0.9104 0 TD
( )Tj
/TT6 1 Tf
1.9676 0 TD
0.0336 Tc
0.3363 Tw
(different attribute declarations and/or content)Tj
-2.878 -1.122 TD
0.0201 Tc
0.2009 Tw
(models can apply to elements with the same name in)Tj
0 -1.0976 TD
0.0476 Tc
0.4763 Tw
(different contexts. This allows for more tightly)Tj
0 -1.1219 TD
0.0066 Tc
0.0659 Tw
(constrained content models than possible with DTDs. For)Tj
T*
0.0295 Tc
0.2948 Tw
(example, names in headers \(names of authors, etc.,)Tj
T*
0.004 Tc
0.0399 Tw
(consisting of the usual "first name", "last name" elements\))Tj
T*
0.021 Tc
0.2103 Tw
(and names in the text \("named entities"\) should have)Tj
T*
0.0049 Tc
0.0493 Tw
(different content models and attributes in order to provide)Tj
0 -1.0976 TD
0.0123 Tc
0.1232 Tw
(for tight validation of form in each context. In the TEI,)Tj
0 -1.1219 TD
0.0033 Tc
0.0329 Tw
(upon which the CES is based, the element )Tj
/TT4 1 Tf
8.88 0 0 8.88 225.7474 630.447 Tm
0.014 Tc
0 Tw
(<name>)Tj
/TT6 1 Tf
9.84 0 0 9.84 258.4665 630.447 Tm
0.0026 Tc
0.0259 Tw
[( is used)]TJ
-20.8036 -1.122 TD
0.0309 Tc
0.3086 Tw
(in both headers and text, and its content model is)Tj
T*
0.0202 Tc
0.2018 Tw
(necessarily broad enough to encompass the variety of)Tj
T*
0.0085 Tc
0.0849 Tw
[(forms it may have in these contexts. In the CES, header)]TJ
T*
0.0096 Tc
0.0956 Tw
(elements are prefixed with "h." so that names in headers)Tj
0 -1.0976 TD
0.0301 Tc
0.3016 Tw
(are tagged with )Tj
/TT4 1 Tf
8.88 0 0 8.88 130.319 575.487 Tm
0 Tc
0 Tw
(<h.name>)Tj
/TT6 1 Tf
9.84 0 0 9.84 173.039 575.487 Tm
0.0287 Tc
0.2872 Tw
(, whose content model is)Tj
-12.122 -1.1219 TD
0.0051 Tc
0.0513 Tw
(different from that of the )Tj
/TT4 1 Tf
8.88 0 0 8.88 157.919 564.447 Tm
0.0144 Tc
0 Tw
(<name>)Tj
/TT6 1 Tf
9.84 0 0 9.84 190.6583 564.447 Tm
0.0037 Tc
0.0369 Tw
[( element that can appear)]TJ
-13.9125 -1.122 TD
0.021 Tc
0.2094 Tw
[(in the body of the text. This strategy is effectively a)]TJ
T*
0.0165 Tc
0.1652 Tw
("kludge" to overcome the fact that SGML provides no)Tj
T*
0.0472 Tc
0.4724 Tw
(scoping capabilities. XML schemas, building on)Tj
T*
0.0103 Tc
0.1027 Tw
(definitions using XML Namespaces \(Bray, Hollander, &)Tj
0 -1.0976 TD
0.0112 Tc
0.1121 Tw
(Layman, 1999\), solve this problem. Thus in XCES, we)Tj
0 -1.1219 TD
0.028 Tc
0.2801 Tw
(avoid the invention of variant element names while)Tj
T*
0.0177 Tc
0.1766 Tw
(retaining the ability to constrain content and attributes)Tj
T*
0 Tc
0 Tw
(based on context.)Tj
/TT10 1 Tf
T*
(•)Tj
/TT12 1 Tf
0.6049 0 TD
( )Tj
/TT6 1 Tf
2.2732 0 TD
0.0139 Tc
0.1393 Tw
(equivalence classes can be defined for groups of)Tj
-2.878 -1.122 TD
0.0172 Tc
0.1724 Tw
(elements and/or attributes, indicating that they may be)Tj
0 -1.0976 TD
0.0095 Tc
0.0949 Tw
(used in the same ways as defined for a particular named)Tj
0 -1.122 TD
0.0099 Tc
0.0993 Tw
(element \("the exemplar"\). The CES makes extensive use)Tj
T*
0.0241 Tc
0.2411 Tw
(of parameter entities to group together elements that)Tj
T*
0.0178 Tc
0.1778 Tw
(behave identically. For example, phrase-level elements)Tj
T*
0.0221 Tc
0.2212 Tw
(\(i.e., elements that can appear within but not outside)Tj
T*
0.0218 Tc
0.218 Tw
(paragraphs or paragraph-like elements, such as name,)Tj
0 -1.0976 TD
0.0661 Tc
0.6613 Tw
(num, etc.\) are grouped using the parameter)Tj
/TT4 1 Tf
8.88 0 0 8.88 53.759 366.447 Tm
0.0412 Tc
0 Tw
(%phrase.seq)Tj
/TT6 1 Tf
9.84 0 0 9.84 116.4026 366.447 Tm
0.0111 Tc
0.1112 Tw
(, so that all paragraph-level elements can)Tj
-6.3662 -1.122 TD
0.0068 Tc
0.0675 Tw
(include this class in their content models. Again, this is a)Tj
T*
0.0071 Tc
0.0711 Tw
(work-around for the fact that equivalence and inheritance)Tj
T*
0.0238 Tc
0.2381 Tw
(of properties is not expressible in SGML. Similarly,)Tj
T*
0.0028 Tc
0.0277 Tw
(groups of attributes are defined in all CES DTDs, as in the)Tj
0 -1.0976 TD
[(cesAna DTD fragment given in Figure 3)15(.)-0.2( In XCES, this is)]TJ
0 -1.122 TD
0.0003 Tc
-0.0003 Tw
(replaced by the schema in Figure 4.)Tj
/TT10 1 Tf
T*
0 Tc
0 Tw
(•)Tj
/TT12 1 Tf
0.6994 0 TD
( )Tj
/TT6 1 Tf
2.1786 0 TD
0.0218 Tc
0.2175 Tw
(attribute or element values, or combinations of)Tj
-2.878 -1.122 TD
0.0183 Tc
0.1835 Tw
(attribute and element values, can be constrained to be)Tj
T*
0.0455 Tc
0.4546 Tw
(unique. That is, it is possible to indicate in a)Tj
T*
0.0079 Tc
0.0792 Tw
(computational lexicon that only one entry can be defined)Tj
0 -1.0976 TD
0.0061 Tc
0.0606 Tw
(with the value of a given word form as its content \(or the)Tj
0 -1.122 TD
0.0266 Tc
0.2658 Tw
(content of one of its child elements\), that only one)Tj
T*
0.0145 Tc
0.1448 Tw
(paragraph can have an attribute indicating that it is the)Tj
T*
0.0037 Tc
0.0369 Tw
(23rd, or in general that a given key appears only once in a)Tj
T*
0.0346 Tc
0.3463 Tw
(document. Similarly, we can ensure that only one)Tj
T*
0.011 Tc
0.1101 Tw
(disambiguated form is given for each token in a cesAna)Tj
0 -1.0976 TD
0.032 Tc
0.3202 Tw
(document, or only one correspondence for a given)Tj
0 -1.122 TD
0.0034 Tc
0.0341 Tw
(sentence in a cesAlign document. Obviously, this is useful)Tj
T*
0 Tc
0.0002 Tw
(for error detection and prevention.)Tj
/TT10 1 Tf
T*
0 Tw
(•)Tj
/TT12 1 Tf
0.4944 0 TD
( )Tj
/TT6 1 Tf
2.3836 0 TD
0.0033 Tc
0.0326 Tw
(dependencies can be established based on values of)Tj
-2.878 -1.1219 TD
0.0097 Tc
0.0968 Tw
(elements or attributes. This has similar benefits for error)Tj
T*
0.0205 Tc
0.2055 Tw
(detection in creating annotated corpora: nouns can be)Tj
0 -1.0976 TD
0.0052 Tc
0.052 Tw
(prevented from being assigned a tense, tokens whose )Tj
/TT8 1 Tf
22.1526 0 TD
0.0178 Tc
0 Tw
(type)Tj
/TT6 1 Tf
-22.1526 -1.122 TD
0.0255 Tc
0.2555 Tw
[(attribute has the value PUNCT can be specified to)]TJ
T*
0.0296 Tc
0.2962 Tw
(include only )Tj
/TT4 1 Tf
8.88 0 0 8.88 114.7518 91.407 Tm
0.1123 Tc
0 Tw
()Tj
/TT6 1 Tf
9.84 0 0 9.84 152.7097 91.407 Tm
0.0375 Tc
0.3755 Tw
[( elements containing specific)]TJ
-10.056 -1.122 TD
0.0165 Tc
0.1654 Tw
(characters, etc. More generally, annotation labels \(e.g.,)Tj
T*
0.0162 Tc
0.162 Tw
(pos indicators\) used in an annotation document can be)Tj
T*
0.0435 Tc
0.4353 Tw
(specified elsewhere, and element content can be)Tj
25.6829 71.561 TD
0.004 Tc
0.0399 Tw
(constrained to these values only; for example, to constrain)Tj
0 -1.122 TD
0.011 Tc
0.1099 Tw
(the values of the )Tj
/TT4 1 Tf
8.88 0 0 8.88 379.919 751.407 Tm
0.0301 Tc
0 Tw
()Tj
/TT6 1 Tf
9.84 0 0 9.84 407.8989 751.407 Tm
0.0082 Tc
0.0819 Tw
[( element in an XCES annotation)]TJ
-10.3069 -1.122 TD
0.0065 Tc
0.0647 Tw
(document to the EAGLES morphosyntactic specifications)Tj
T*
0.0154 Tc
0.1539 Tw
(\(Monachini & Calzolari, 1996\), the following could be)Tj
T*
0 Tc
0 Tw
(specified:)Tj
/TT4 1 Tf
8.88 0 0 8.88 334.799 696.447 Tm
()Tj
0 -1.2432 TD
( )Tj
T*
( )Tj
T*
( )Tj
T*
( )Tj
0 -1.2162 TD
( )Tj
0 -1.2432 TD
( ...)Tj
T*
( )Tj
T*
( )Tj
ET
300.719 583.407 246.24 0.48 re
f
300.719 571.407 0.48 12 re
f
546.479 571.407 0.48 12 re
f
BT
7.92 0 0 7.92 306.479 564.447 Tm
[()]TJ
ET
300.719 526.527 0.48 9.12 re
f
546.479 526.527 0.48 9.12 re
f
300.719 517.407 0.48 9.12 re
f
546.479 517.407 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 510.447 Tm
()Tj
ET
300.719 481.407 0.48 9.12 re
f
546.479 481.407 0.48 9.12 re
f
300.719 472.527 0.48 9.12 re
f
546.479 472.527 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 465.327 Tm
[()]TJ
ET
300.719 463.407 0.48 9.12 re
f
546.479 463.407 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 456.447 Tm
[()]TJ
ET
300.719 436.527 0.48 9.12 re
f
546.479 436.527 0.48 9.12 re
f
300.719 425.967 246.24 0.48 re
f
300.719 426.447 0.48 10.08 re
f
546.479 426.447 0.48 10.08 re
f
BT
/TT6 1 Tf
9.84 0 0 9.84 325.679 405.327 Tm
0.0002 Tc
-0.0001 Tw
(Figure 3 : cesAna DTD fragment for global attributes)Tj
ET
300.719 391.407 246.24 0.48 re
f
300.719 379.407 0.48 12 re
f
546.479 379.407 0.48 12 re
f
BT
/TT4 1 Tf
7.92 0 0 7.92 306.479 371.007 Tm
0 Tc
0 Tw
()Tj
ET
300.719 368.607 0.48 11.04 re
f
546.479 368.607 0.48 11.04 re
f
BT
7.92 0 0 7.92 306.479 361.407 Tm
( )Tj
ET
300.719 359.487 0.48 9.12 re
f
546.479 359.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 352.527 Tm
( )Tj
ET
300.719 341.487 0.48 9.12 re
f
546.479 341.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 334.527 Tm
( )Tj
ET
300.719 323.487 0.48 9.12 re
f
546.479 323.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 316.527 Tm
( )Tj
ET
300.719 305.487 0.48 9.12 re
f
546.479 305.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 298.527 Tm
( )Tj
ET
300.719 296.607 0.48 9.12 re
f
546.479 296.607 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 289.407 Tm
( )Tj
ET
300.719 287.487 0.48 9.12 re
f
546.479 287.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 280.527 Tm
( )Tj
ET
300.719 269.487 0.48 9.12 re
f
546.479 269.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 262.527 Tm
( )Tj
ET
300.719 251.487 0.48 9.12 re
f
546.479 251.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 244.527 Tm
( )Tj
ET
300.719 242.607 0.48 9.12 re
f
546.479 242.607 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 235.407 Tm
( )Tj
ET
300.719 233.487 0.48 9.12 re
f
546.479 233.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 226.527 Tm
( )Tj
ET
300.719 224.607 0.48 9.12 re
f
546.479 224.607 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 217.407 Tm
( )Tj
ET
300.719 215.487 0.48 9.12 re
f
546.479 215.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 208.527 Tm
( )Tj
ET
300.719 206.607 0.48 9.12 re
f
546.479 206.607 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 199.407 Tm
( )Tj
ET
300.719 197.487 0.48 9.12 re
f
546.479 197.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 190.527 Tm
( )Tj
ET
300.719 188.607 0.48 9.12 re
f
546.479 188.607 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 181.407 Tm
( )Tj
ET
300.719 170.607 0.48 9.12 re
f
546.479 170.607 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 163.407 Tm
( )Tj
ET
300.719 152.607 0.48 9.12 re
f
546.479 152.607 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 145.407 Tm
( )Tj
ET
300.719 143.487 0.48 9.12 re
f
546.479 143.487 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 136.527 Tm
( )Tj
ET
300.719 134.607 0.48 9.12 re
f
546.479 134.607 0.48 9.12 re
f
BT
7.92 0 0 7.92 306.479 127.407 Tm
()Tj
ET
300.719 125.487 0.48 9.12 re
f
546.479 125.487 0.48 9.12 re
f
300.719 113.007 246.24 0.48 re
f
300.719 113.487 0.48 12 re
f
546.479 113.487 0.48 12 re
f
BT
/TT6 1 Tf
9.84 0 0 9.84 341.999 92.367 Tm
0.0002 Tc
-0.0001 Tw
(Figure 4 : XCES schema for global attributes)Tj
ET
endstream
endobj
23 0 obj
<<
/ProcSet [/PDF /Text ]
/Font <<
/TT4 5 0 R
/TT6 6 0 R
/TT8 7 0 R
/TT10 8 0 R
/TT12 9 0 R
>>
/ExtGState <<
/GS1 10 0 R
>>
>>
endobj
25 0 obj
<<
/Length 9106
>>
stream
BT
/TT6 1 Tf
9.84 0 0 9.84 67.919 762.447 Tm
0 g
/GS1 gs
0.0139 Tc
0.1389 Tw
(XML Schemas have been developed for all three of)Tj
-1.439 -1.122 TD
0.0148 Tc
0.1483 Tw
(the current XCES DTDs, and are available through the)Tj
T*
0 Tc
0 Tw
(XCES web site.)Tj
/TT2 1 Tf
12 0 0 12 142.319 717.327 Tm
0.0001 Tc
(Conclusion)Tj
/TT6 1 Tf
9.84 0 0 9.84 67.919 703.407 Tm
0.0074 Tc
0.074 Tw
(The XML framework provides search, extraction, and)Tj
-1.439 -1.1219 TD
0.0066 Tc
0.0664 Tw
(transformation capabilities that answer most, if not all, of)Tj
T*
0.0072 Tc
0.0718 Tw
(the current and foreseen needs for corpus-based language)Tj
T*
0.006 Tc
0.0597 Tw
(engineering. In particular, XML provides mechanisms for)Tj
0 -1.0976 TD
0.0043 Tc
0.0427 Tw
(easily implementing the CES and XCES data architecture,)Tj
0 -1.1219 TD
0.0178 Tc
0.1776 Tw
(which calls for modularization of resources by putting)Tj
T*
0.0044 Tc
0.0442 Tw
(different kinds of annotation, different versions of the text)Tj
T*
0.0176 Tc
0.1761 Tw
(and annotations, etc. in separate, linked documents. In)Tj
T*
0.0481 Tc
0.4815 Tw
(addition, processing tools for the various XML)Tj
T*
0.0275 Tc
0.2755 Tw
(recommendations \(XPath, XPointer, XLink, etc.\) are)Tj
0 -1.0976 TD
0.008 Tc
0.0805 Tw
(generally freely distributed, thus eliminating the need for)Tj
0 -1.1219 TD
0 Tc
0 Tw
(costly and time-consuming tool development.)Tj
1.439 -1.122 TD
0.0237 Tc
0.2365 Tw
(The CES and XCES encoding specifications have)Tj
-1.439 -1.1219 TD
0.0249 Tc
0.2491 Tw
(been developed for and by the language engineering)Tj
T*
0.0034 Tc
0.034 Tw
(community, and their coverage will continue to evolve. At)Tj
T*
0.0118 Tc
0.1183 Tw
(present, XCES provides guidelines for encoding various)Tj
0 -1.0976 TD
0.0079 Tc
0.0787 Tw
(features in written text, morpho-syntactic annotation, and)Tj
0 -1.1219 TD
0.0092 Tc
0.0925 Tw
[(alignment information, all of which are relatively stable)]TJ
T*
0.0069 Tc
0.0694 Tw
(and agreed-upon within the community. We are currently)Tj
T*
0.0294 Tc
0.2942 Tw
(working with several different groups to implement)Tj
T*
0.0143 Tc
0.143 Tw
(encoding guidelines for additional written text features,)Tj
T*
0.0124 Tc
0.1241 Tw
(computational lexicons, discourse and dialogue, and co-)Tj
0 -1.0976 TD
0.0212 Tc
0.2118 Tw
(reference, as well as speech and its various levels of)Tj
0 -1.122 TD
0.0203 Tc
0.203 Tw
(annotation and representation. Our development effort)Tj
T*
0.0053 Tc
0.0535 Tw
(will continue to be based on the principle of collaborative)Tj
T*
0.0262 Tc
0.2616 Tw
(and distributed development in a bottom-up fashion,)Tj
T*
0.0225 Tc
0.2252 Tw
(building up the specifications as need and agreement)Tj
T*
0.0136 Tc
0.1357 Tw
(within the community dictate. We welcome all input to)Tj
0 -1.0976 TD
0 Tc
0 Tw
(the continuing development of the XCES.)Tj
/TT2 1 Tf
12 0 0 12 124.079 361.167 Tm
0.0002 Tc
(Acknowledgments)Tj
/TT6 1 Tf
9.84 0 0 9.84 53.759 347.487 Tm
0.012 Tc
0.12 Tw
(This work was partially funded by the National Science)Tj
0 -1.122 TD
0.0241 Tc
0.2416 Tw
(Foundation and the Centre National de la Recherche)Tj
T*
0.0523 Tc
0.5232 Tw
(Scientifique, under the NSF/CNRS International)Tj
T*
0 Tc
0 Tw
(Collaborative program.)Tj
/TT2 1 Tf
12 0 0 12 143.039 291.327 Tm
(References)Tj
/TT6 1 Tf
9.84 0 0 9.84 53.759 277.407 Tm
0.0167 Tc
0.1665 Tw
(Biron, P. & Malhotra, A., 2000. XML Schema Part 2:)Tj
1 -1.122 TD
0.021 Tc
0.2101 Tw
(Datatypes. W3C Working Draft, 25 February 2000.)Tj
T*
0 Tc
0 Tw
(http://www.w3.org/TR/xmlschema-2/.)Tj
-1 -1.122 TD
0.0097 Tc
0.0971 Tw
(Bray, T., Hollander, D., Layman, A., 1999. Namespaces)Tj
1 -1.0976 TD
0.0858 Tc
0.8577 Tw
(in XML. World Wide Web Consortium)Tj
0 -1.122 TD
0 Tc
0 Tw
(Recommendation, 14 January 1999.)Tj
T*
(http://www.w3.org/TR/REC-xml-names/.)Tj
-1 -1.122 TD
0.0053 Tc
0.0526 Tw
(Bray, T., Paoli, J., Sperberg-McQueen, C.M. \(eds.\), 1998.)Tj
1 -1.122 TD
0.029 Tc
0.2902 Tw
(Extensible Markup Language \(XML\) Version 1.0.)Tj
T*
0 Tc
0 Tw
(W3C Recommendation.)Tj
-1 -1.0976 TD
( http://www.w3.org/TR/1998/REC-xml-19980210.)Tj
0 -1.122 TD
0.0272 Tc
0.2724 Tw
(Clark, J. \(ed.\), 1999. XSL Transformations \(XSLT\).)Tj
1 -1.122 TD
0 Tc
0 Tw
(Version 1.0. W3C Recommendation.)Tj
T*
0.0001 Tc
(http://www.w3.org/TR/xslt.)Tj
-1 -1.1219 TD
0.0206 Tc
0.2064 Tw
(Clark, J. and DeRose, S., 1999. XML Path Language)Tj
1 -1.122 TD
0.0545 Tc
0.5448 Tw
(\(XPath\). Version 1.0. W3C Recommendation.)Tj
0 -1.0976 TD
0 Tc
0 Tw
(http://www.w3.org/TR/xpath.)Tj
-1 -1.122 TD
0.0002 Tw
(Clark, J., 1999. XT Version 19991105.)Tj
1 -1.122 TD
0 Tw
(http://www.jclark.com/xml/xt.html)Tj
24.6829 69.4146 TD
0.017 Tc
0.1698 Tw
(DeRose, S. & Durand, D., 1994. )Tj
/TT8 1 Tf
14.4994 0 TD
0.0379 Tc
0.3788 Tw
(Making HyperMedia)Tj
-13.4994 -1.122 TD
0.0041 Tc
0.0414 Tw
(work: A user's guide to HyTime.)Tj
/TT6 1 Tf
13.6631 0 TD
0.0047 Tc
0.0465 Tw
[( Boston, MA: Kluwer)]TJ
-13.6631 -1.122 TD
0 Tc
0 Tw
(Academic Publishers.)Tj
-1 -1.1219 TD
0.026 Tc
0.2595 Tw
[(DeRose, S. & Durand, D., 1995. The TEI Hypertext)]TJ
1 -1.122 TD
0.0191 Tc
0.1915 Tw
(Guidelines. In N. Ide & J. Véronis \(eds.\), )Tj
/TT8 1 Tf
19.0976 0 TD
0.0172 Tc
0.1718 Tw
(The Text)Tj
-19.0976 -1.0976 TD
0.0528 Tc
0.5281 Tw
(Encoding Initiative: Background and Context)Tj
/TT6 1 Tf
22.6098 0 TD
0 Tc
0 Tw
(.)Tj
-22.6098 -1.1219 TD
0.0004 Tw
(Dordrecht: Kluwer Academic Publishers, 181-90.)Tj
-1 -1.1219 TD
0.0155 Tc
0.1548 Tw
(DeRose, S., Maler, E., Orchard, D., Trafford, B. \(eds.\),)Tj
1 -1.122 TD
0.0056 Tc
0.0564 Tw
(2000. XML Linking Language \(XLink\). W3C Working)Tj
T*
0 Tc
0.0003 Tw
(Draft, 21 February 2000. http://www.w3.org/TR/xlink.)Tj
-1 -1.122 TD
0.0081 Tc
0.0814 Tw
(DeRose, S., Daniel, R., & Maler, E., 1999. XML Pointer)Tj
1 -1.0976 TD
0.0513 Tc
0.5132 Tw
(Language \(XPointer\). W3C Working Draft, 6)Tj
0 -1.1219 TD
0 Tc
0.0003 Tw
(December 1999. http://www.w3.org/TR/xptr.)Tj
-1 -1.122 TD
0.01 Tc
0.0995 Tw
[(Erjavec, T., Evans, R., Ide, N., Kilgarriff, A., 2000. The)]TJ
1 -1.1219 TD
0.0577 Tc
0.5765 Tw
(CONCEDE model for Lexical Databases. In)Tj
/TT8 1 Tf
T*
0.0252 Tc
0.252 Tw
(Proceedings of the Second International Language)Tj
T*
0 Tc
0 Tw
(Resources and Evaluation Conference)Tj
/TT6 1 Tf
15.3659 0 TD
(, this volume.)Tj
-16.3659 -1.0976 TD
0.044 Tc
0.4397 Tw
(Ide, N., 1998a. Encoding Linguistic Corpora. In)Tj
/TT8 1 Tf
1 -1.1219 TD
0.0239 Tc
0.2388 Tw
(Proceedings of the Sixth Workshop on Very Large)Tj
T*
0 Tc
0 Tw
(Corpora)Tj
/TT6 1 Tf
3.4453 0 TD
(, 9-17.)Tj
-4.4453 -1.1219 TD
0.0334 Tc
0.3344 Tw
(Ide, N., 1998b. Corpus Encoding Standard: SGML)Tj
1 -1.1219 TD
0.0406 Tc
0.4061 Tw
(Guidelines for Encoding Linguistic Corpora. In)Tj
/TT8 1 Tf
T*
0.0352 Tc
0.3521 Tw
(Proceedings of the First International Language)Tj
0 -1.0976 TD
0 Tc
0 Tw
(Resources and Evaluation Conference,)Tj
/TT6 1 Tf
15.6098 0 TD
( 463-70.)Tj
-16.6098 -1.1219 TD
0.0033 Tc
0.0335 Tw
(Ide, N., Kilgarriff, A., Romary, L., 2000. A Formal Model)Tj
1 -1.122 TD
0.0075 Tc
0.0748 Tw
(of Dictionary Structure and Content. In )Tj
/TT8 1 Tf
16.7073 0 TD
0.0054 Tc
0.0537 Tw
(Proceedings of)Tj
-16.7073 -1.122 TD
0 Tc
0 Tw
(EURALEX'00)Tj
/TT6 1 Tf
5.5464 0 TD
(, to appear.)Tj
-6.5464 -1.122 TD
0.0113 Tc
0.1135 Tw
[(ISO, 1992. ISO )113.5(10744: 1992. )]TJ
/TT8 1 Tf
12.855 0 TD
0.015 Tc
0.15 Tw
(Information technology --)Tj
-11.855 -1.122 TD
0.069 Tc
0.69 Tw
(Hypermedia/time-based Structuring Language)Tj
0 -1.0976 TD
0 Tc
0 Tw
(\(HyTime\).)Tj
/TT6 1 Tf
4.082 0 TD
0.0005 Tw
( Geneva: ISO.)Tj
-5.082 -1.122 TD
0.0031 Tc
0.0313 Tw
(Macleod, C., Ide, N., Grishman, R., 2000. Progress Report)Tj
1 -1.1219 TD
0.0131 Tc
0.1315 Tw
(on the American National Corpus. In )Tj
/TT8 1 Tf
16.3413 0 TD
0.0219 Tc
0.2192 Tw
(Proceedings of)Tj
-16.3413 -1.122 TD
0.0273 Tc
0.2729 Tw
(the Second International Language Resources and)Tj
T*
0.0076 Tc
0.0757 Tw
(Evaluation Conference)Tj
/TT6 1 Tf
9.4878 0 TD
0.013 Tc
0.1299 Tw
[( \(this volume\). Paris: European)]TJ
-9.4878 -1.122 TD
0 Tc
0.0003 Tw
(Language Resources Association.)Tj
-1 -1.0976 TD
0.0031 Tc
0.0308 Tw
(Marsh, J., 2000. XML Base \(XBase\). W3C Working Draft)Tj
1 -1.122 TD
0 Tc
0.0005 Tw
(21-February-2000. http://www.w3.org/TR/xmlbase.)Tj
-1 -1.122 TD
0.0268 Tc
0.2681 Tw
(Monachini, M. & Calzolari, N., 1996. Synopsis and)Tj
1 -1.1219 TD
0.0097 Tc
0.0974 Tw
[(Comparison of Morphosyntactic Phenomena Encoded)]TJ
T*
0.0161 Tc
0.1614 Tw
[(in Lexicons and Corpora: A Common Proposal and)]TJ
T*
0.0076 Tc
0.0764 Tw
(Applications to European Languages. EAGLES Report)Tj
0 -1.0976 TD
0.0001 Tc
0 Tw
(EAG-CLWG-MORPHSYN/R.)Tj
0 -1.122 TD
0 Tc
(http://www.ilc.pi.cnr.it/EAGLES96/morphsyn/.)Tj
-1 -1.1219 TD
0.0271 Tc
0.2709 Tw
(Robie, J., Lapp, J., Schach, D., 1998. XML Query)Tj
1 -1.122 TD
0.6534 Tc
6.5346 Tw
(Language \(XQL\).)Tj
T*
0 Tc
0 Tw
(http://www.w3.org/TandS/QL/QL98/pp/xql.html)Tj
-1 -1.122 TD
0.0085 Tc
0.0855 Tw
(Thompson, H., Beech, D., Maloney, M., Mendelsohn, N.,)Tj
1 -1.0976 TD
0.0094 Tc
0.0937 Tw
(2000. XML Schema Part 1: Structures. W3C Working)Tj
0 -1.122 TD
0 Tc
0.0003 Tw
(Draft, 25 February 2000.)Tj
T*
0 Tw
(http://www.w3.org/TR/xmlschema-1/.)Tj
ET
endstream
endobj
26 0 obj
<<
/ProcSet [/PDF /Text ]
/Font <<
/TT2 4 0 R
/TT6 6 0 R
/TT8 7 0 R
>>
/ExtGState <<
/GS1 10 0 R
>>
>>
endobj
10 0 obj
<<
/Type /ExtGState
/SA false
/SM 0.02
/OP false
/op false
/OPM 1
/BG2 /Default
/UCR2 /Default
/HT /Default
/TR2 /Default
>>
endobj
27 0 obj
<<
/Type /FontDescriptor
/Ascent 750
/CapHeight 676
/Descent -250
/Flags 262178
/FontBBox [-168 -218 1000 935]
/FontName /Times-Bold
/ItalicAngle 0
/StemV 133
/XHeight 461
/StemH 139
>>
endobj
28 0 obj
<<
/Type /FontDescriptor
/Ascent 832
/CapHeight 570
/Descent -300
/Flags 34
/FontBBox [-13 -274 638 783]
/FontName /CourierNewPSMT
/ItalicAngle 0
/StemV 42
/XHeight 421
>>
endobj
29 0 obj
<<
/Type /FontDescriptor
/Ascent 750
/CapHeight 662
/Descent -250
/Flags 34
/FontBBox [-168 -218 1000 898]
/FontName /Times-Roman
/ItalicAngle 0
/StemV 84
/XHeight 450
/StemH 84
>>
endobj
30 0 obj
<<
/Type /FontDescriptor
/Ascent 750
/CapHeight 653
/Descent -250
/Flags 98
/FontBBox [-169 -217 1010 883]
/FontName /Times-Italic
/ItalicAngle -15
/StemV 76
/XHeight 441
/StemH 76
>>
endobj
31 0 obj
<<
/Type /FontDescriptor
/Ascent 701
/CapHeight 0
/Descent -298
/Flags 32
/FontBBox [-167 -299 1063 827]
/FontName /Symbol
/ItalicAngle 0
/StemV 0
>>
endobj
32 0 obj
<<
/Type /FontDescriptor
/Ascent 905
/CapHeight 718
/Descent -211
/Flags 32
/FontBBox [-222 -210 1000 913]
/FontName /ArialMT
/ItalicAngle 0
/StemV 94
/XHeight 515
>>
endobj
4 0 obj
<<
/Type /Font
/Subtype /TrueType
/FirstChar 32
/LastChar 134
/Widths [250 0 0 0 0 0 0 0 0 0 500 0 250 333 250 0
0 500 500 0 0 0 0 0 0 0 333 0 0 0 0 0
0 722 667 722 722 667 0 0 0 389 0 0 667 944 722 0
611 0 722 556 667 0 0 0 722 0 0 0 0 0 0 0
0 500 556 444 556 444 333 500 556 278 0 556 278 833 556 500
556 0 444 389 333 556 500 722 500 500 0 0 0 0 0 0
0 0 0 0 0 0 500 ]
/Encoding /WinAnsiEncoding
/BaseFont /Times-Bold
/FontDescriptor 27 0 R
>>
endobj
5 0 obj
<<
/Type /Font
/Subtype /TrueType
/FirstChar 32
/LastChar 148
/Widths [600 600 600 600 0 600 0 600 600 600 600 600 600 600 600 600
600 600 600 600 600 600 600 0 0 600 600 600 600 600 600 600
0 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600
600 600 600 600 600 600 600 600 600 600 0 600 600 600 0 0
0 600 600 600 600 600 600 600 600 600 0 600 600 600 600 600
600 600 600 600 600 600 600 600 600 600 0 0 600 0 600 0
0 0 0 0 0 600 0 0 0 0 0 0 0 0 0 0
0 0 0 600 600 ]
/Encoding /WinAnsiEncoding
/BaseFont /CourierNewPSMT
/FontDescriptor 28 0 R
>>
endobj
6 0 obj
<<
/Type /Font
/Subtype /TrueType
/FirstChar 32
/LastChar 233
/Widths [250 0 408 500 0 0 778 180 333 333 0 0 250 333 250 278
500 500 500 500 500 500 500 500 500 500 278 278 564 0 564 0
921 722 667 667 722 611 556 722 722 333 389 722 611 889 722 722
556 722 667 556 611 722 722 944 722 722 0 0 0 0 0 0
0 444 500 444 500 444 333 500 500 278 278 500 278 778 500 500
500 500 333 389 278 500 500 722 500 500 444 480 200 480 0 0
0 0 0 0 0 0 500 0 0 0 0 0 0 0 0 0
0 0 333 444 444 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 444 444 ]
/Encoding /WinAnsiEncoding
/BaseFont /Times-Roman
/FontDescriptor 29 0 R
>>
endobj
7 0 obj
<<
/Type /Font
/Subtype /TrueType
/FirstChar 32
/LastChar 224
/Widths [250 0 0 0 0 0 0 214 333 333 0 0 250 333 250 278
500 0 0 0 0 0 0 0 0 0 333 0 0 0 0 0
0 611 611 667 0 611 611 722 722 333 0 0 556 833 0 0
611 0 611 500 556 722 611 833 611 0 0 0 0 0 0 0
0 500 500 444 500 444 278 500 500 278 0 444 278 722 500 500
500 0 389 389 278 500 444 667 444 444 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
500 ]
/Encoding /WinAnsiEncoding
/BaseFont /Times-Italic
/FontDescriptor 30 0 R
>>
endobj
8 0 obj
<<
/Type /Font
/Subtype /TrueType
/FirstChar 149
/LastChar 149
/Widths [460 ]
/Encoding /WinAnsiEncoding
/BaseFont /Symbol
/FontDescriptor 31 0 R
>>
endobj
9 0 obj
<<
/Type /Font
/Subtype /TrueType
/FirstChar 32
/LastChar 32
/Widths [278 ]
/Encoding /WinAnsiEncoding
/BaseFont /ArialMT
/FontDescriptor 32 0 R
>>
endobj
1 0 obj
<<
/Type /Page
/Parent 11 0 R
/Resources 3 0 R
/Contents 2 0 R
>>
endobj
12 0 obj
<<
/Type /Page
/Parent 11 0 R
/Resources 14 0 R
/Contents 13 0 R
>>
endobj
15 0 obj
<<
/Type /Page
/Parent 11 0 R
/Resources 17 0 R
/Contents 16 0 R
>>
endobj
18 0 obj
<<
/Type /Page
/Parent 11 0 R
/Resources 20 0 R
/Contents 19 0 R
>>
endobj
21 0 obj
<<
/Type /Page
/Parent 11 0 R
/Resources 23 0 R
/Contents 22 0 R
>>
endobj
24 0 obj
<<
/Type /Page
/Parent 11 0 R
/Resources 26 0 R
/Contents 25 0 R
>>
endobj
33 0 obj
<<
/S /D
>>
endobj
34 0 obj
<<
/Nums [0 33 0 R ]
>>
endobj
11 0 obj
<<
/Type /Pages
/Kids [1 0 R 12 0 R 15 0 R 18 0 R 21 0 R 24 0 R]
/Count 6
/MediaBox [0 0 595 842]
>>
endobj
35 0 obj
<<
/CreationDate (D:20050219173319-05'00')
/ModDate (D:20050219173319-05'00')
/Producer (PSNormalizer.framework)
>>
endobj
36 0 obj
<<
/Type /Catalog
/Pages 11 0 R
/PageLabels 34 0 R
>>
endobj
xref
0 37
0000000000 65535 f
0000076737 00000 n
0000000016 00000 n
0000008972 00000 n
0000073946 00000 n
0000074413 00000 n
0000074991 00000 n
0000075743 00000 n
0000076410 00000 n
0000076574 00000 n
0000072669 00000 n
0000077306 00000 n
0000076818 00000 n
0000009125 00000 n
0000022088 00000 n
0000076902 00000 n
0000022242 00000 n
0000034343 00000 n
0000076986 00000 n
0000034497 00000 n
0000049244 00000 n
0000077070 00000 n
0000049374 00000 n
0000063248 00000 n
0000077154 00000 n
0000063391 00000 n
0000072550 00000 n
0000072810 00000 n
0000073012 00000 n
0000073200 00000 n
0000073397 00000 n
0000073597 00000 n
0000073763 00000 n
0000077238 00000 n
0000077266 00000 n
0000077423 00000 n
0000077555 00000 n
trailer
<<
/Size 37
/Root 36 0 R
/Info 35 0 R
/ID []
>>
startxref
77625
%%EOF