The current version defines two types of documents: the global elements
below. The global types are available for reuse through schema type extension or restriction.
The most up-to-date document definition is CHAT, which is also the richest in structure. Ideally,
each group should develop a schema module defining the structure of its specific (class of)
annotations; this schema should then be an assembly of their definitions.
Developed by Romeo Anghelache, from the CHAT specifications, released under
the GNU Public License, 2001. Continuing development by Franklin Chen.
structure of a CHAT document
@Participants; a structure enumerating the participating beings
31 March 1999 is formatted as 1999-03-31
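As an illustration of the date format above, the following minimal Python sketch renders a calendar date in the YYYY-MM-DD form. The helper name `format_date` is hypothetical and is not part of the schema; it only relies on the standard library's ISO 8601 output.

```python
from datetime import date


def format_date(year, month, day):
    """Hypothetical helper: return the date as YYYY-MM-DD (ISO 8601)."""
    return date(year, month, day).isoformat()


# 31 March 1999, the example given above
formatted = format_date(1999, 3, 31)
print(formatted)
```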
this work might be done over an extended interval of time; to express a duration of
1 year, 2 months, 3 days, 10 hours, and 30 minutes, one would write:
P1Y2M3DT10H30M
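The duration notation above can be assembled from its components. The sketch below is one possible way to do that in Python, assuming non-negative integer components; the function name `format_duration` is hypothetical and not part of the schema.

```python
def format_duration(years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Hypothetical helper: build a duration string such as P1Y2M3DT10H30M.

    Assumes every component is a non-negative integer. Zero components
    are omitted, and an all-zero duration is written as PT0S.
    """
    date_part = ""
    if years:
        date_part += f"{years}Y"
    if months:
        date_part += f"{months}M"
    if days:
        date_part += f"{days}D"

    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if seconds:
        time_part += f"{seconds}S"

    if not date_part and not time_part:
        return "PT0S"
    # The "T" separator appears only when a time component is present.
    return "P" + date_part + ("T" + time_part if time_part else "")


# 1 year, 2 months, 3 days, 10 hours, and 30 minutes
duration = format_duration(years=1, months=2, days=3, hours=10, minutes=30)
print(duration)
```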
an AIF document, see http://morph.ldc.upenn.edu/AG/doc/xml/
administrative descriptions, reused from Dublin Core
() in a word
precode at the beginning of an utterance; CHAT [- ...]
unscoped code in the middle of an utterance; CHAT {...}
postcode at the end of an utterance; CHAT [+ ...]
allows semi structured extensions to the current set of annotations:
allows for identification of a user who made this annotation
inlined annotations; the conventional CHAT symbols are listed
too
[!]
[!!]
["]
[?] in CHAT, ( text ) in CA
[/] in CHAT
[//] in CHAT, - in CA
[///] in CHAT
[/?]
[/-]
quicker tempo, no CHAT equivalent, used in CA
slower tempo, no CHAT equivalent, used in CA
higher volume (louder), no CHAT equivalent, used in CA
lower volume, no CHAT equivalent, used in CA
CA-style overlap
fmc
fmc
fmc
fmc
mark overlap scoping
[>]
[<]
[*] or [* text]
,, for %mor
For %mor
non verbal happenings
scoped symbols
the place to add research content
0
0word
0*word
00word
&; phonological fragment
&=; happening, such as sneeze
intended as a feature of a word, see also the CHAT conventional notations
@ap
@b
@c
@cue
@d
@f
@fp
@fs
@g
@i
@inf
@ins
@k
@l
@m
@n
@nv
@o
@p
@pm
@pr
-s
@q
@s
@sc
@sas
@si
@sl
@t
@u
@x
@wp
a nonempty string
temporary hack for list of languages for Lang attribute
syntactic structure
the unit of a %mor line corresponding to a word. This element belongs to a
word element, but if the precise correspondence is not yet established, these elements will
be present at the utterance level (contained in an utterance).
%mor part of speech
omitted, CHAT equivalent is 0
category
subcategory
a %mor or %trn line
two or more alternate morphemic groups; CHAT ^
a group of words in %mor
a single word or a compound word
a construct formed by words linked through clitic or compound markers, e.g.
once+and+for+all
equivalent of CHAT symbol @;
the place to add research content
scoped symbols
an optional suffix
nonfinal tone marker
-,
-_
-'
structure used to let annotations belong to more than one word; it can be
recursive, although this is unnecessary: one can attach more than one annotation to a word, a group
of words, or a whole utterance
a reference to a point/portion of a mute/action signal, e.g. 0
semicolon, clause delimiter [^c];
scoped symbols
the place to add research content
a word
equivalent of CHAT symbol @;
the place to add research content
scoped symbols
an optional suffix
xx
yy
xxx
yyy
www
0
0word
0*word
00word
&; phonological fragment
&=; happening, such as sneeze
utterance initiators or linkers; they indicate how the current
utterance fits with an earlier one. The conventional CHAT symbols are listed
too
+"
+^
+<
+,
++
a pointer to a selection in a video/audio file
frame
second
millisecond
byte
character
+ for mor
word#
=word (English translation)
morphemes
suffix marker, CHAT equivalent is -
suffix fusion marker, CHAT equivalent is &;
morphological category, CHAT equivalent is :, when used after
the stem
the beings along with their characteristics (age, sex...)
stress, blocking etc.
/
//
///
:
^ internal
^ at beginning
*text* in CA
#, pause between words
the place to add research content
the place to add research content
[x number] in CHAT
,
,,
;
:
[c] clause-delimiter;
period, question, exclamation; basic utterance terminator; tone
terminator
+.
+...
+..?
+=.
++.
+!?
+/.
+/?
+//.
+//?
+"/.
+".
a reference to a point/portion of a mute/action signal, e.g. 0
semicolon, clause_delimiter [^c];
scoped symbols
the place to add research content
Phonetic representations of orthographic forms.
Collection of syllables.
Collection of constituents.
Specifies a syllable constituent. The type is one of constituentTypeType.
Each constituent can consist of one or more phones, identified by zero-based index into the
parent phonetic representation.
This type represents the alignment (syllable, or segmental) of two phonetic
representations.
Maps a number from pho1-ref to pho2-ref. The number -1 represents an indel
(insertion-deletion point). The numbers must be taken in context with the alignmentType to
be useful.
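To make the role of -1 concrete, here is a small Python sketch of how such an alignment could be read. It assumes the mapping is held as a list of (index1, index2) pairs into two phone lists; that in-memory representation, the function name `render_alignment`, and the sample phones are illustrative assumptions, not something the schema prescribes.

```python
INDEL = -1  # the value the documentation above reserves for an insertion-deletion point


def render_alignment(phones1, phones2, mapping):
    """Hypothetical helper: pair up phones from two representations.

    `mapping` is assumed to be a list of (index1, index2) pairs of
    zero-based indices; INDEL on either side means there is no
    counterpart, shown here as "-".
    """
    rows = []
    for i, j in mapping:
        left = phones1[i] if i != INDEL else "-"
        right = phones2[j] if j != INDEL else "-"
        rows.append((left, right))
    return rows


# Made-up example: the second representation lacks the final phone.
first = ["k", "a", "t"]
second = ["k", "a"]
mapping = [(0, 0), (1, 1), (2, INDEL)]
aligned = render_alignment(first, second, mapping)
for left, right in aligned:
    print(left, right)
```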
clitic or compound or reduplication markers in wordnet
compound, CHAT +
clitic, CHAT ~
reduplication, CHAT ++
hyphen, CHAT -
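The marker list above can be illustrated with a small tokenizer. The Python sketch below splits a string on the four CHAT markers and labels each piece; it is an assumption-laden illustration (the name `split_wordnet` is hypothetical, and real CHAT transcripts have many more rules than this handles), not a CHAT parser.

```python
import re

# Marker names taken from the list above; "++" must be tried before "+".
MARKERS = {"++": "reduplication", "+": "compound", "~": "clitic", "-": "hyphen"}
_MARKER_RE = re.compile(r"(\+\+|\+|~|-)")


def split_wordnet(text):
    """Hypothetical helper: split text into (kind, value) pairs.

    Markers are labelled with their names from MARKERS; everything
    else is labelled "word".
    """
    # A capturing group makes re.split keep the separators in the result.
    pieces = [p for p in _MARKER_RE.split(text) if p]
    return [(MARKERS[p], p) if p in MARKERS else ("word", p) for p in pieces]


tokens = split_wordnet("once+and+for+all")
words = [value for kind, value in tokens if kind == "word"]
print(words)
```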
clitic separators in morphemics
preclitic, CHAT $
postclitic, CHAT ~
a group of utterances having something in common, usually the
speaker
these are the (legacy) dependent tiers; the %mor line is now the
<morphemics> element
%add
%act
%alt
%cod; general purpose coding
%coh; cohesion tier
%com;[% text]; comments by investigator
%eng
%err; error coding
[%exc ...]
%exp; [= text]
%flo
%fac
%gls
%gpx
%int
%lan
%ort
%par:
%:
%pho:
%pht:
%mod:
[: text]
%def; on the main line, not recommended
%sit
%ssy
%spa
%spe
%tim
arbitrary annotations, intended as an extension
mechanism
%ton
%rom
%sdi
%sch
%sxx
No symbol
d
#
##
###
fmc should change to xs:duration
For use for delimited material. A workaround for lack of overlapping
elements in XML.
Begin delimited material
End delimited material
For use for delimited material. A workaround for lack of overlapping
elements in XML.
Begin delimited material
End delimited material
Begin and end delimited material (degenerate case)
Underline arbitrary content
Italic arbitrary content
Bold arbitrary content
Long feature <TAG material TAG> for Santa Barbara; other
begin/end features
Nonvocal <<TAG material TAG>> for Santa
Barbara