A full definition of the CHAT format.
Developed by Romeo Anghelache, from the CHAT specifications, released under
the GNU Public License, 2001. Continuing development by Franklin Chen.
CHAT manual section on this
topic...
A single CHAT transcript.
CHAT manual section on this
topic...
List of the participants in the transcript along with their
individual attributes. Every utterance in the transcript must be attributed to
exactly one listed participant.
CHAT manual
section on this topic...
Information about a participant
CHAT manual section on this
topic...
Begin a gem (requires matching end of gem).
CHAT manual section
on this topic...
Label for a begin/end gem.
CHAT manual
section on this topic...
End a gem (requires earlier begin of gem).
CHAT manual section
on this topic...
Label for a begin/end gem.
CHAT manual
section on this topic...
Begin a lazy gem. It does not require a matching end gem; its
scope extends to the next lazy gem header, or to the end of the transcript if there is no
further lazy gem header.
CHAT manual section
on this topic...
Label for a lazy gem.
CHAT manual
section on this topic...
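The pairing rule for begin/end gems and the scope rule for lazy gems can be checked mechanically. The following is a minimal sketch, not part of the schema: it assumes the transcript has already been reduced to a list of (kind, label) events, that gems are matched by label, and that the kind names "bg" and "eg" and both function names are purely illustrative.

```python
def check_gems(events):
    """Return error strings for unmatched begin/end gems.

    `events` is a list of (kind, label) tuples in transcript order, where
    kind is "bg" (begin gem) or "eg" (end gem). These names are hypothetical.
    """
    open_labels = []
    errors = []
    for position, (kind, label) in enumerate(events):
        if kind == "bg":
            open_labels.append(label)
        elif kind == "eg":
            if label in open_labels:
                open_labels.remove(label)  # close the earliest open gem with this label
            else:
                errors.append(f"end gem {label!r} at {position} has no earlier begin gem")
    errors.extend(f"begin gem {label!r} is never ended" for label in open_labels)
    return errors


def lazy_gem_scopes(lazy_positions, transcript_length):
    """Map each lazy gem position to the half-open range of positions it covers."""
    bounds = list(lazy_positions) + [transcript_length]
    return [(bounds[i], bounds[i + 1]) for i in range(len(lazy_positions))]
```

Each lazy gem scope ends where the next one starts, and the last one runs to the end of the transcript.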
Version of the XML Schema this transcript was created for.
CHAT manual section on
this topic...
Date of transcription. Note that there can only be one date. A session
spread out over multiple dates must be split into multiple CHAT transcripts.
CHAT manual section
on this topic...
Every transcript must be part of a corpus.
CHAT manual section
on this topic...
The transcript may be associated with at most one media file.
CHAT manual section on
this topic...
The transcript may be associated with at most one media URI.
CHAT manual section
on this topic...
The transcript may be associated with at most one media file.
CHAT manual section on
this topic...
The main languages used in the transcript. (Other languages used only
in specific words do not need to be listed.)
CHAT manual
section on this topic...
Design type
CHAT manual
section on this topic...
Activity type
CHAT manual
section on this topic...
Group type
CHAT manual
section on this topic...
Information about text color mappings for use by the CLAN editor.
CHAT manual
section on this topic...
Information about window size and placement for use by the CLAN editor.
CHAT manual
section on this topic...
The PID for the document.
CHAT manual section
on this topic...
The font to be used for display in the CLAN editor.
CHAT manual section
on this topic...
Key to ensure uniqueness among u elements. These IDs are used externally
for double-blind transcription.
Key to ensure all participants have unique ids.
CHAT manual
section on this topic...
KeyRef to ensure that utterances refer to an actual participant.
CHAT manual section on
this topic...
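The key and keyref constraints amount to two checks: participant ids must be unique, and every utterance must name a listed participant. A minimal sketch of those checks on plain Python lists follows; the function name and the list-based input are assumptions made for illustration, and a schema validator would normally enforce this directly.

```python
from collections import Counter


def check_speakers(participant_ids, utterance_speakers):
    """Return (duplicate participant ids, speakers that are not listed)."""
    counts = Counter(participant_ids)
    duplicates = sorted(pid for pid, n in counts.items() if n > 1)
    known = set(participant_ids)
    unknown = sorted({who for who in utterance_speakers if who not in known})
    return duplicates, unknown
```

Both returned lists are empty for a transcript that satisfies the two constraints.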
Unscoped complex local events in the middle of an utterance.
CHAT manual
section on this topic...
Code that can only occur at the end of an utterance. Its content is currently
arbitrary, although some conventions exist.
CHAT manual section on this
topic...
A comment header.
CHAT manual section on
this topic...
The allowable types of comment headers.
CHAT manual section on
this topic...
Activities.
CHAT manual
section on this topic...
Bck.
CHAT manual section on
this topic...
Date of transcription.
CHAT manual section
on this topic...
Number.
CHAT manual section
on this topic...
Recording quality.
CHAT
manual section on this topic...
Transcription.
CHAT manual
section on this topic...
Types.
CHAT manual
section on this topic...
Blank.
CHAT manual section
on this topic...
Thumbnail.
CHAT manual
section on this topic...
Comment.
CHAT manual
section on this topic...
Location.
CHAT manual
section on this topic...
New episode.
CHAT manual
section on this topic...
Room layout.
CHAT manual
section on this topic...
Situation.
CHAT manual
section on this topic...
Tape location.
CHAT manual
section on this topic...
Time duration.
CHAT manual
section on this topic...
Time start.
CHAT manual
section on this topic...
Transcriber for this transcript.
CHAT manual
section on this topic...
Warning.
CHAT manual
section on this topic...
Page.
CHAT manual section
on this topic...
Tag marker, used in both main line and %mor.
CHAT manual section on
this topic...
MOR manual section on
this topic...
,
CHAT manual section on
this topic...
„
CHAT
manual section on this topic...
MOR manual section on this topic...
‡
CHAT manual section on this topic...
MOR manual section on this topic...
Nonverbal event.
CHAT manual section on
this topic...
0
CHAT manual section on
this topic...
&=; a happening, such as a sneeze
CHAT manual section on
this topic...
A nonempty string.
Allowable media name.
CHAT manual section on
this topic...
A list of languages, using the official ISO codes.
CHAT manual
section on this topic...
%mor unit in one-to-one correspondence with the main line: a single word, a compound word, or a terminator.
MOR manual
section on this topic...
mor preclitic
MOR manual
section on this topic...
mor postclitic
MOR manual
section on this topic...
What language(s) a word is in (if not the one in default scope).
CHAT manual
section on this topic...
Word is to be interpreted in a single language.
CHAT manual section on this topic...
Word is a combination of many languages.
CHAT manual section on this topic...
Word can be interpreted as one of many languages.
CHAT manual section on this topic...
Utterance initiators or linkers; they indicate the way to fit the current
utterance with an earlier one.
CHAT manual section
on this topic...
+"
CHAT
manual section on this topic...
+^
CHAT
manual section on this topic...
+<
CHAT
manual section on this topic...
+,
CHAT
manual section on this topic...
++
CHAT
manual section on this topic...
+≋
CHAT
manual section on this topic...
+≈
CHAT
manual section on this topic...
Media bullet used only before a terminator.
CHAT manual section
on this topic...
A media bullet is allowed at the end of an utterance only after a
terminator.
CHAT manual section
on this topic...
A pointer to a selection in the single video/audio file associated with the
transcript.
CHAT manual section on
this topic...
The start time for the selection.
CHAT manual
section on this topic...
The end time for the selection.
CHAT manual
section on this topic...
The unit of time used for the selection.
CHAT manual
section on this topic...
Whether the CLAN editor should skip upon playback.
CHAT manual
section on this topic...
Type of external media referenced.
CHAT manual section on
this topic...
Audio file.
CHAT manual section on
this topic...
Video file.
CHAT manual section on
this topic...
Media is missing.
CHAT manual section on
this topic...
Media exists, but transcript is unlinked.
CHAT manual section on
this topic...
Media exists, but no transcription
exists yet.
CHAT manual section on
this topic...
The time unit.
frame
second
millisecond
byte
character
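A selection's start and end times are only meaningful together with their unit. The sketch below converts a selection to seconds under stated assumptions: the unit strings are the ones listed above, the frame rate parameter is hypothetical (the text does not say where a frame rate would come from), and byte and character offsets are rejected because they do not measure time.

```python
def selection_seconds(start, end, unit, frames_per_second=None):
    """Convert a (start, end) selection to seconds for time-based units."""
    if end < start:
        raise ValueError("selection ends before it starts")
    if unit == "second":
        divisor = 1
    elif unit == "millisecond":
        divisor = 1000
    elif unit == "frame":
        if frames_per_second is None:
            raise ValueError("a frame rate is needed to convert frames")
        divisor = frames_per_second
    else:
        # "byte" and "character" are offsets into a file, not durations.
        raise ValueError(f"unit {unit!r} does not measure time")
    return start / divisor, end / divisor
```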
Information about a participant
CHAT manual section on this
topic...
Speaker's id.
CHAT manual
section on this topic...
Speaker's role.
CHAT manual section on this
topic...
Speaker's name.
CHAT manual section on
this topic...
Speaker's age, start of range during transcript.
CHAT manual section on
this topic...
Speaker's group.
CHAT manual section on
this topic...
Speaker's sex.
CHAT manual section on
this topic...
Speaker's SES.
CHAT manual section on
this topic...
Speaker's education.
CHAT manual section
on this topic...
Custom field for additional information about speaker.
CHAT manual section
on this topic...
Speaker's birth date.
CHAT manual section on
this topic...
Speaker's list of languages. Redundant, because it duplicates the
transcript's list of languages.
CHAT manual
section on this topic...
Speaker's first language (note that this does not need to be listed in
the languages header).
CHAT manual section on
this topic...
Speaker's birthplace.
CHAT manual
section on this topic...
Terminator for an utterance.
CHAT manual section on
this topic...
Period.
CHAT manual
section on this topic...
Question mark.
CHAT
manual section on this topic...
Exclamation point.
CHAT manual section on this topic...
+.
CHAT manual section on this topic...
+...
CHAT
manual section on this topic...
+..?
CHAT manual section on this topic...
+!?
CHAT manual section on this topic...
+/.
CHAT
manual section on this topic...
+/?
CHAT manual section on this topic...
+//.
CHAT manual section on this topic...
+//?
CHAT manual section on this topic...
+"/.
CHAT manual section on this topic...
+".
CHAT manual section on this topic...
For heritage only
CHAT
manual section on this topic...
≋
CHAT manual section on this topic...
≈
CHAT manual section on this topic...
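Because several terminators share a suffix (for example "+..." also ends in "."), a tool that looks for the terminator of a main line has to try the longest candidates first. A minimal sketch, assuming the main line is available as a plain string with any media bullet already removed; the function name is illustrative and the list holds only the symbols enumerated above.

```python
TERMINATORS = [
    ".", "?", "!", "+.", "+...", "+..?", "+!?", "+/.", "+/?",
    "+//.", "+//?", '+"/.', '+".', "≋", "≈",
]


def split_terminator(line):
    """Return (text before the terminator, terminator), or (text, None)."""
    text = line.rstrip()
    # Longest first, so that "+..." is not mistaken for a plain period.
    for mark in sorted(TERMINATORS, key=len, reverse=True):
        if text.endswith(mark):
            return text[: -len(mark)].rstrip(), mark
    return text, None
```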
Main line terminator with optional %mor information.
CHAT manual section on
this topic...
MOR manual
section on this topic...
Terminator on %mor line, important for %gra.
MOR manual section
on this topic...
Group purely for phonetic annotation purposes. Note that there are no main
line annotations allowed for the group.
CHAT manual
section on this topic...
Group purely for sign annotation purposes. Note that there are no main line annotations allowed for the group. Grouping brackets 〔 and 〕 may be used.
CHAT manual
section on this topic...
Atomic unit on the %sin tier.
CHAT manual section on this topic...
Syllable stress. Either primary or secondary.
Primary stress, Unicode U+02C8
Secondary stress, Unicode U+02CC
CHAT manual section on
this topic...
Phonetic transcriptions of orthographic forms.
CHAT manual section on
this topic...
Phonetic transcription for a word.
CHAT manual section on
this topic...
A phone in the IPA transcription. A phone may consist of one or more
Unicode characters.
Specifies a syllable constituent. The type is one of constituentTypeType.
Each constituent can consist of one or more phones, identified by their zero-based index in the
parent phonetic representation.
The syllable constituent type for this phone.
Each phone is required to have a locally unique id; i.e., sibling ph
elements cannot have the same id.
Used when two ph elements with sctype of 'N' are adjacent.
If hiatus is true, each nucleus is the root of its own syllable.
If hiatus is false, the pair of nuclei are considered a diphthong.
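The effect of the hiatus flag on syllable counting can be illustrated with a short sketch. It assumes each phone is given as a (sctype, hiatus) pair and that the flag is read on the second of two adjacent 'N' phones; the text above does not say which element carries it, so that choice, the function name, and any sctype label other than 'N' are assumptions.

```python
def count_nuclei(phones):
    """Count syllable nuclei in a sequence of (sctype, hiatus) pairs."""
    count = 0
    previous_was_nucleus = False
    for sctype, hiatus in phones:
        if sctype == "N":
            # An 'N' right after another 'N' starts a new syllable only when
            # hiatus is true; otherwise the pair is treated as a diphthong.
            if not previous_was_nucleus or hiatus:
                count += 1
            previous_was_nucleus = True
        else:
            previous_was_nucleus = False
    return count
```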
Valid syllable constituent labels.
Syllable boundary marker ('.')
Syllable stress (i.e., primary or secondary)
Deprecated: use stress attribute of syll_start
instead
Left appendix
Onset
Nucleus
Coda
Right appendix
Onset of an empty headed syllable
Ambisyllabic
Unknown
This type represents the alignment of two phonetic representations.
A single column in the alignment.
model reference
actual reference
Utterance annotation type.
CHAT manual section on
this topic...
%add
CHAT manual
section on this topic...
%act
CHAT manual section
on this topic...
%alt
CHAT manual
section on this topic...
%cod; general purpose coding
CHAT manual section
on this topic...
%coh; cohesion tier
CHAT manual section
on this topic...
%com; comments by investigator
CHAT manual section
on this topic...
%eng
CHAT manual section
on this topic...
%err; error coding
CHAT manual section on
this topic...
%exp; [= text]
CHAT manual
section on this topic...
%flo
CHAT manual section on
this topic...
%fac
CHAT manual
section on this topic...
%gls
CHAT manual section on
this topic...
%gpx
CHAT manual section
on this topic...
%int
CHAT manual
section on this topic...
%ort
CHAT manual
section on this topic...
%par:
CHAT manual
section on this topic...
%def; on the main line, not recommended
CHAT manual
section on this topic...
%sit
CHAT manual
section on this topic...
%spa
CHAT manual
section on this topic...
%tim
CHAT manual section
on this topic...
Arbitrary annotation of the form %xfoo, intended as an extension
mechanism for the user.
CHAT manual
section on this topic...
Type of group annotation on main line.
CHAT manual
section on this topic...
[=? text]
CHAT manual section on this topic...
[% text]
CHAT manual section
on this topic...
[= text]
CHAT manual
section on this topic...
[=! text]
CHAT
manual section on this topic...
Length of a pause, in nonnumeric terms.
CHAT manual section on
this topic...
(.)
CHAT manual
section on this topic...
(..)
CHAT manual
section on this topic...
(...)
CHAT
manual section on this topic...
Pause length, in numeric terms.
CHAT manual section on
this topic...
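A tool reading pauses has to tell the symbolic forms from the numeric one. The sketch below is illustrative only: it treats the three symbolic forms listed above as one, two, or three dots, and it assumes a numeric pause is written as a decimal number of seconds in parentheses, such as (2.5). Other numeric layouts are not handled, and the function name is hypothetical.

```python
import re

_SYMBOLIC = {"(.)": 1, "(..)": 2, "(...)": 3}
_NUMERIC = re.compile(r"\((\d+(?:\.\d*)?)\)")  # assumed form: (seconds)


def parse_pause(token):
    """Return ("symbolic", dots), ("numeric", seconds), or None."""
    if token in _SYMBOLIC:
        return ("symbolic", _SYMBOLIC[token])
    match = _NUMERIC.fullmatch(token)
    if match:
        return ("numeric", float(match.group(1)))
    return None
```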
For use for delimited material. A workaround for lack of overlapping
elements in XML.
Begin delimited material
End delimited material
For use for delimited material. A workaround for lack of overlapping
elements in XML.
Begin delimited material
End delimited material
Begin and end delimited material (degenerate case)
Mark as underlined arbitrary content, for presentation purposes in CLAN.
CHAT manual section on this
topic...
Mark as italicized arbitrary content, for presentation purposes in CLAN.
CHAT manual section on this
topic...
Long event.
CHAT manual section on this
topic...
Nonvocal material.
CHAT manual
section on this topic...
Either begin, end, or simple.
CHAT
manual section on this topic...
CA delimited material with begin/end.
CHAT manual section on
this topic...
CA subwords that must occur inside a word.
CHAT manual section on
this topic...
≠
CHAT manual section on this topic...
∾
CHAT manual section on this topic...
⁑
CHAT manual section on this topic...
⤇
CHAT manual section on this topic...
∙
CHAT manual section on this topic...
⤆
CHAT manual section on this topic...
Ἡ
CHAT manual section on this topic...
↓
CHAT manual section on this topic...
↻
CHAT manual section on this topic...
↑
CHAT manual section on this topic...
ˈ
CHAT manual section on this topic...
ˌ
CHAT manual section on this topic...
♋
CHAT manual section on this topic...
⁎
CHAT manual section on this topic...
∆
CHAT manual section on this topic...
▔
CHAT manual section on this topic...
◉
CHAT manual section on this topic...
▁
CHAT manual section on this topic...
§
CHAT manual section on this topic...
↫
CHAT manual section on this topic...
∮
CHAT manual section on this topic...
∇
CHAT manual section on this topic...
☺
CHAT manual section on this topic...
°
CHAT manual section on this topic...
⁇
CHAT manual section on this topic...
∬
CHAT manual section on this topic...
Ϋ
CHAT manual section on this topic...
Pause at a point in an utterance.
CHAT manual section on
this topic...
Mark a scope for overlaps.
CHAT manual section on this
topic...
Integer label to distinguish among different overlaps over the same text.
CHAT manual section on
this topic...
[>]
CHAT
manual section on this topic...
[<]
CHAT
manual section on this topic...
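The overlap markers and their optional integer label can be read with a small pattern. This is a sketch under the assumption that the label, when present, is written directly after the angle bracket, as in [>2]; the function name is illustrative.

```python
import re

_OVERLAP = re.compile(r"\[([<>])(\d*)\]")


def parse_overlap(marker):
    """Return (direction, index or None) for [>], [<], [>1], [<2], else None."""
    match = _OVERLAP.fullmatch(marker)
    if match is None:
        return None
    direction, index = match.groups()
    return direction, (int(index) if index else None)
```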
CA-style overlap
CHAT manual section on
this topic...
Integer label to distinguish among different overlaps over the same text.
CHAT manual section on
this topic...
Start or end of overlap
CHAT manual section
on this topic...
Start
CHAT manual
section on this topic...
End
CHAT manual
section on this topic...
The first of a set of overlaps.
CHAT manual
section on this topic...
The second (or third, etc.) of a set of overlaps.
CHAT manual
section on this topic...
A word. Note that there are lexical restrictions on what characters are
allowed in the text of a word.
CHAT manual section on this
topic...
word# indicates the word is a separated prefix
CHAT manual
section on this topic...
@z:code user-specified code
CHAT manual
section on this topic...
-s and similar after a form marker
CHAT manual
section on this topic...
Form marker: an attribute for a word.
CHAT manual
section on this topic...
@a
CHAT manual
section on this topic...
@b
CHAT manual
section on this topic...
@c
CHAT
manual section on this topic...
@d
CHAT
manual section on this topic...
@e
CHAT
manual section on this topic...
@f
CHAT manual section on this topic...
@fp
CHAT
manual section on this topic...
@g
CHAT manual section on this topic...
@i
CHAT
manual section on this topic...
@k
CHAT manual
section on this topic...
@l
CHAT manual
section on this topic...
@n
CHAT manual
section on this topic...
@nv
CHAT
manual section on this topic...
@o
CHAT
manual section on this topic...
@p
CHAT manual
section on this topic...
@q
CHAT manual section on this topic...
@sas
CHAT
manual section on this topic...
@si
CHAT manual
section on this topic...
@sl
CHAT
manual section on this topic...
@t
CHAT manual
section on this topic...
@u
CHAT manual
section on this topic...
@x
CHAT manual
section on this topic...
@wp
CHAT manual
section on this topic...
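Splitting a word into its text and its form marker is a common first step for tools. A minimal sketch follows, assuming the marker is whatever follows the last "@" in the token; the set holds only the markers enumerated above, and user codes (@z:code) and trailing suffixes are deliberately left unhandled. The function name is hypothetical.

```python
FORM_MARKERS = {
    "a", "b", "c", "d", "e", "f", "fp", "g", "i", "k", "l", "n", "nv",
    "o", "p", "q", "sas", "si", "sl", "t", "u", "x", "wp",
}


def split_form_marker(token):
    """Return (word text, form marker), or (token, None) if there is none."""
    stem, separator, marker = token.rpartition("@")
    if not separator or marker not in FORM_MARKERS:
        return token, None
    return stem, marker
```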
Optional attribute for a word.
0word
CHAT manual
section on this topic...
&~; nonword
CHAT manual section on this topic...
&-; filler
CHAT manual section on this topic...
&+; incomplete
CHAT manual section on this topic...
Mark as untranscribed non-word.
xxx
CHAT manual section on this topic...
yyy
CHAT
manual section on this topic...
www
CHAT manual section on this topic...
A group of material that is annotated. May be nested, i.e., a group may
contain groups as well as words and other material.
CHAT manual section on this
topic...
[: word1 ...] indicates that a word is replaced by one or more words; [:: word1 ...] additionally indicates that the word is a real word
CHAT manual section on
this topic...
[:: word1 ...] indicates that the word was real and MOR should analyze it
CHAT manual
section on this topic...
A group of words in %mor or %trn or %umor.
CHAT manual
section on this topic...
GRASP data for a single word in %gra or %grt or %ugra
CHAT manual
section on this topic...
Begin or end quoted material; “ and ”
CHAT manual section
on this topic...
Retracing and other markers.
CHAT manual section on this
topic...
[!]
CHAT manual
section on this topic...
[!!]
CHAT manual section on this topic...
[?] in CHAT, ( text ) in CA
CHAT manual
section on this topic...
[/] in CHAT
CHAT manual
section on this topic...
[//] in CHAT, - in CA
CHAT manual
section on this topic...
[///] in CHAT
CHAT
manual section on this topic...
[/?]
CHAT
manual section on this topic...
[/-]
CHAT manual
section on this topic...
[e]
CHAT manual
section on this topic...
[# ...] duration annotation
CHAT manual section
on this topic...
Separator or tone direction marker.
CHAT manual section on
this topic...
;
CHAT manual section
on this topic...
:
CHAT manual section on
this topic...
[c]; clause delimiter
CHAT
manual section on this topic...
⇗
CHAT manual
section on this topic...
↗
CHAT manual
section on this topic...
→
CHAT manual section on
this topic...
↘
CHAT manual
section on this topic...
⇘
CHAT manual
section on this topic...
∞
CHAT manual
section on this topic...
≡
CHAT manual section
on this topic...
Legal speaker ID for identifying utterances.
CHAT manual section on
this topic...
Transcript-scoped option that affects the interpretation of the transcript.
CHAT manual section on
this topic...
Allows CA features and restriction relaxations.
CHAT manual section on
this topic...
Allows CA features and restriction relaxations, but does not automatically open CAFont.
CHAT manual section
on this topic...
Turns off checking of time sequence of bullets.
CHAT manual
section on this topic...
Allows a transcript to be accepted even though it is not properly
parsable yet.
CHAT manual
section on this topic...
Purely for the display purposes of CLAN, in order to have CLAN handle
multiple media bullets in a single utterance.
CHAT manual section
on this topic...
IPA.
CHAT manual section on
this topic...
Mark file as dummy for CLAN CHECK.
CHAT manual section
on this topic...
A reference to a graphics file.
CHAT manual section on this
topic...
&*WHO=word; word spoken by someone else during an utterance.
CHAT manual section
on this topic...
morphemes
MOR manual section on
this topic...
suffix marker, CHAT equivalent is -suffix
MOR
manual section on this topic...
suffix fusion marker, CHAT equivalent is &suffix
MOR manual section on this topic...
morphological category, CHAT equivalent is :suffix
MOR
manual section on this topic...
Morphemic translation: =word
MOR manual section
on this topic...
Morphemic prefix: word#
MOR manual section on
this topic...
%mor part of speech
MOR manual section on this
topic...
%mor POS subcategory
MOR manual
section on this topic...
Morphemic "word": the unit of a %mor line corresponding to a single non-compound word on the
main line.
MOR manual section
on this topic...
Morphemic "compound word" using +
MOR manual
section on this topic...
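The markers described for %mor words (prefix "#", suffix "-", fusion "&", translation "=") suggest a simple way to split a non-compound %mor word into parts. The following is only a sketch: it assumes the layout pos:subcategory|prefix#stem-suffix&fusion=translation, where the "|" separator between part of speech and stem is an assumption not stated in the text above, and it does not handle compounds ("+") or clitics ("~"). The function name and dictionary keys are illustrative.

```python
import re


def parse_mor_word(item):
    """Split one non-compound %mor word into its parts (assumed layout)."""
    pos_part, _, rest = item.partition("|")
    pos, *subcategories = pos_part.split(":")
    rest, _, translation = rest.partition("=")
    *prefixes, body = rest.split("#")
    # Keep the delimiters so each suffix can be labeled by its marker.
    pieces = re.split(r"([-&])", body)
    suffixes = [
        ("fusion" if pieces[i] == "&" else "suffix", pieces[i + 1])
        for i in range(1, len(pieces), 2)
    ]
    return {
        "pos": pos,
        "subcategories": subcategories,
        "prefixes": prefixes,
        "stem": pieces[0],
        "suffixes": suffixes,
        "translation": translation or None,
    }
```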
Unspoken segment in a word, coded in CHAT
by surrounding parentheses.
CHAT manual section on
this topic...
Prosody inside a word: stress, blocking etc.
CHAT manual section on this
topic...
:
CHAT
manual section on this topic...
^ internal
CHAT
manual section on this topic...
Clitic or compound marker inside a word.
CHAT manual section on this
topic...
compound, CHAT +
MOR manual
section on this topic...
clitic, CHAT ~
MOR manual
section on this topic...
Morphological category
MOR manual
section on this topic...
Morphological stem, alphanumeric
MOR manual section
on this topic...
[*] or [* text]
CHAT manual
section on this topic...
Dependent tier: scoped annotation that applies to a whole utterance.
CHAT manual section on
this topic...
Open-ended user-specifiable annotation subtype.
CHAT manual section on
this topic...
Allows for identification of a user who made this annotation. (Not
currently supported in CHAT.)
Inlined dependent tier: scoped annotation that applies to a group.
CHAT manual section on
this topic...
A single utterance, along with all dependent information.
CHAT manual section on
this topic...
The language the entire utterance is in (unless individual words'
languages are overridden explicitly).
CHAT manual
section on this topic...
The speaker of the utterance.
CHAT manual
section on this topic...
A unique ID is provided for each utterance in a transcript, for use by
tools. Note that the text format of CHAT does not currently support this, and CLAN does
not know about it.
CHAT manual section on
this topic...
%wor dependent tier, similar to main tier.
CHAT manual section on
this topic...
The language the entire utterance is in (unless individual words'
languages are overridden explicitly).
CHAT manual
section on this topic...
Language code restricted to exactly three characters, rather than one to eight.
CHAT manual section
on this topic...
Allowable roles.
CHAT manual section on this topic...