Computer-mediated Communication

TEI P5:

Guidelines for Electronic Text Encoding and Interchange

9. Computer-mediated Communication

This chapter describes the TEI encoding mechanisms available for textual data that represents discourse from genres of computer-mediated communication (CMC). It is intended to provide the basic framework needed to encode CMC corpora.

9.1. General Considerations

While the term computer-mediated communication might be used broadly to describe all kinds of communication mediated by digital technologies (such as text on web pages, written exchanges in chats and forums, interactions with artificial intelligence systems, or spoken conversations in internet video meetings), for the purposes of these Guidelines we use the term to apply to forms of communication that share the following features:

they are dialogic;
they are organized as interactional sequences so that each communicative move may determine the context for subsequent moves (typically taken by another interlocutor) and may react to the context created by a prior move;
they are created and displayed using computer technology or human-machine interfaces (such as a keyboard, mouse, speech-to-text conversion software, monitor, or screen) and are transmitted over a computer network (typically the internet).

Such communications may be expressed as posts (cf. 9.3.1. CMC Posts), utterances, onscreen activities, or bodily activities exerted by a virtual avatar.

The following kinds of platforms support CMC:

chats, messengers, or online forums;
social media platforms and applications;
the communication functions of collaborative platforms and projects (e.g. an online learning environment, or a ‘talk’ page);
3D virtual world environments;
other interactive services supported by the internet.

CMC supports multimodal expression combining text, images, and sound. Whereas early CMC systems (e.g. Internet Relay Chat, ‘IRC’ for short, the Usenet ‘newsgroups’, or even the Unix talk system) were completely ASCII-based, most CMC applications now permit combining media formats (e.g. written or spoken language with graphic icons and images) and mixing communication technologies on one platform (e.g. combined use of an audio connection, a chat system, and a 3D interface in which users control a virtual avatar, as in many multiplayer online computer games or in virtual worlds).

9.2. Basic Units of CMC

This section describes the encoding mechanisms for the basic units of CMC and for their combined use to encode CMC data.

We use the term basic CMC unit to refer to a communication produced by an interlocutor to initiate or contribute to an ongoing CMC interaction or joint CMC activity. Contributions to an ongoing interaction are produced to perform a move to develop the interactional sequence, for instance to respond in chats or forum discussions. Contributions to joint CMC activities may not all be directly interactional; some may be part of a collaborative project of the involved individuals. Such collaboration could involve editing activities in a shared text editor or whiteboard in parallel with an ongoing CMC interaction (chat, audio conversation, or audio-video conference) in the same CMC environment in which these editing activities are discussed by the participants.

Basic units of CMC can be described according to three criteria:

the temporal properties of when these contributions are produced by their creators, transmitted via CMC systems, and made accessible for the recipients;
the modality of the unit as a whole, whether verbal or nonverbal;
for verbal units: whether the unit is expressed in the written or spoken mode.

A taxonomy of basic CMC units resulting from these criteria is given in the following figure.

Figure 11. Taxonomy of basic CMC units according to Beißwenger and Lüngen (2020)

The most important distinction in the CMC taxonomy concerns the temporal nature of units exchanged via CMC technologies. The left part of the taxonomy describes units that are performed (by a producer) and perceived (by a recipient) as a continuous stream of behaviour. Units of this type can be performed as

spoken utterances, i.e. stretches of speech which are produced to perform a speaker turn in a conversation,
bodily activity, i.e. nonverbal behaviour (gesture, gaze) produced to perform a speaker turn, either performed by the real body of an interlocutor (e.g. in a video conference) or through the virtual avatar of an interlocutor in a 3D environment,
onscreen activities, i.e. non-bodily expressions that are transmitted to the group of interacting or coworking participants, for instance the editing of content in a shared text editor which can be perceived by the other parties simultaneously (as may be the case in learning or collaboration environments).

The right part of the CMC taxonomy describes units in which the production, transmission, and perception of contributions to CMC interactions are organized in a strictly consecutive order: The content—verbal, nonverbal, or multimodal—of the contribution has to be produced before it can be transmitted through a network and made available on the computer monitor or mobile screen of any other party as a preserved and persistent unit. We term this type of unit a post. Posts occur in two different variants:

as written or multimodal posts, which are produced with an editor form that is designed for the composition of stretches of written text. Most contemporary post-based CMC technologies provide features for the inclusion of graphic and audio-visual content (emoji graphics, images, videos) into posts and even to produce posts without verbal content (which then may consist only of emojis, an image, or a video file). Written and multimodal posts are the standard formats for user contributions in primarily text-based CMC genres and applications such as chat, SMS, WhatsApp, Instagram, Facebook, X (Twitter), online forums, or Wikipedia talk pages.
as audio posts, which are produced using a recording function. In contrast to CMC units of the type utterance which are produced and transmitted simultaneously, audio posts first have to be recorded as a whole and are then transmitted, as audio files, via the internet; the availability of the recording is indicated in the screen protocol by a template-generated, visual post; the recipients can play the recording (repeatedly) by activating the play button displayed in the post on the screen. Examples of CMC applications that implement audio posts are WhatsApp or RocketChat.

Three of the four basic CMC units described above can be represented with models that are described elsewhere in the TEI Guidelines:

CMC unit	Type of corpus data	TEI P5 element
spoken utterance	transcription of speech	u
bodily activity	textual description	kinesic
onscreen activity	textual description	incident

The u, kinesic, and incident elements are not limited to CMC. In CMC corpora they are used to encode transcriptions of spoken turns and textual descriptions of bodily activity and onscreen activity. The CMC unit post, which is specific to CMC, is introduced in 9.3.1. CMC Posts.
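
As a minimal sketch of how these three elements might appear together in a transcribed video-conference session, consider the fragment below. The participant identifiers (#A01 to #A03), the timeline anchors (#t001 to #t003), the type values, and the wording are all hypothetical and are not drawn from any corpus.

```xml
<!-- Hypothetical excerpt: identifiers, type values, and text are illustrative assumptions -->
<div type="session">
  <!-- spoken utterance: transcription of speech -->
  <u who="#A01" start="#t001" end="#t002">Can everyone see the shared document?</u>
  <!-- bodily activity: textual description of nonverbal behaviour -->
  <kinesic who="#A02" type="gesture" start="#t002">
    <desc>nods</desc>
  </kinesic>
  <!-- onscreen activity: textual description of a non-bodily action -->
  <incident who="#A03" type="editing" start="#t003">
    <desc>inserts a heading at the top of the shared document</desc>
  </incident>
</div>
```

In this sketch the utterance carries its transcription as element content, while the two nonverbal units are represented only through a desc, matching the "textual description" column of the table above.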

9.3. Encoding Unique to CMC

This section describes elements, attributes, and models which are unique to CMC and the TEI CMC module.

9.3.1. CMC Posts

While the concept of a post is not unique to computer-mediated communication (ask anyone who has posted a ‘lost cat’ sign in the local market), this chapter concerns itself only with postings within a framework of a CMC system. Thus the element post is unique to the encoding of computer-mediated communication (CMC).

post a written (or spoken) contribution to a CMC interaction which has been composed (or recorded) by its author in its entirety before being transmitted over a network (e.g., the internet) and made available on the monitor or screen of the other parties en bloc.

Posts occur in a broad range of written CMC genres, including (but not limited to) messages in chats and WhatsApp dialogues, tweets in X (Twitter) timelines, comments on Facebook pages, posts in forum threads, and comments or contributions to discussions on Wikipedia talk pages or in the comment sections of weblogs.

Posts can be either written or spoken:

written or multimodal posts: In the majority of CMC technologies posts are composed as stretches of text using a keyboard or speech-to-text conversion software in an entry form on the screen. In many cases the technology allows authors to include or embed graphics (emojis or images), video files, and hyperlinks into their posts.
spoken posts (audio posts): A growing number of CMC technologies, e.g. messenger software such as WhatsApp or RocketChat, allow for an alternative, spoken production of posts by providing a recording function which allows users to record a stretch of spoken language and transmit the resulting audio file to the other parties.

The element post may co-occur with u, kinesic, incident, or other existing TEI elements within a div, or directly within the body, and may contain headings, paragraphs, openers, closers, or salutations.
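
The following sketch illustrates one way a written post with an opener and a closer could sit alongside a spoken utterance in the same div. The identifiers, the type value on div, and the wording are invented for illustration only.

```xml
<!-- Hypothetical excerpt: identifiers and text are illustrative assumptions -->
<div type="session">
  <post modality="written" who="#A01" xml:id="post_ex01">
    <opener>
      <salute>Hi all,</salute>
    </opener>
    <p>the agenda for today is in the shared folder.</p>
    <closer>
      <signed>Kim</signed>
    </closer>
  </post>
  <!-- a spoken turn from the parallel audio channel, encoded with u -->
  <u who="#A02">Thanks, I have it open now.</u>
</div>
```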

The post element is a member of several TEI attribute classes, including att.ascribed, att.canonical, att.datable, att.global, att.timed, and att.typed, and as such may take a variety of attributes.

9.3.2. Attributes Specific to CMC <post>

Three attributes pertain specifically to post:

post a written (or spoken) contribution to a CMC interaction which has been composed (or recorded) by its author in its entirety before being transmitted over a network (e.g., the internet) and made available on the monitor or screen of the other parties en bloc.
modality written or spoken mode. Suggested values include: 1] written; 2] spoken (for audio (or audio-visual) posts)
replyTo indicates to which previous post the current post replies or refers.
att.indentation provides attributes for describing the indentation of a textual element on the source page or object.
indentLevel specifies the level of indentation of an item using a numeric value.
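
As a hedged illustration of indentLevel, the sketch below encodes three contributions to a wiki talk page whose nesting is shown only by indentation in the source. The identifiers and texts are invented, and numbering the unindented post as level 0 is an assumption of this sketch rather than a rule stated here.

```xml
<!-- Hypothetical excerpt: identifiers, text, and the choice of 0 as the
     top level are illustrative assumptions -->
<div type="thread">
  <post modality="written" who="#u1" indentLevel="0" xml:id="talk_ex01">
    <p>Should the lead section mention the 2015 delay?</p>
  </post>
  <post modality="written" who="#u2" indentLevel="1" xml:id="talk_ex02">
    <p>Yes, with a source.</p>
  </post>
  <post modality="written" who="#u1" indentLevel="2" xml:id="talk_ex03">
    <p>Added one.</p>
  </post>
</div>
```

The sketch records only the indentation observed in the source. It does not use replyTo, since the indentation here is a layout feature rather than reply metadata supplied by the platform.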

The type of the content of a post (i.e., whether the content is text, an image, a video clip, etc.) is indicated by the child elements of the post. (E.g., a post might have a child p, or a child figure with a graphic, or a child figure with a media, or some combination thereof.) How that content was created—whether it was recorded speech or not—may be described with the modality attribute. Because spoken language differs significantly from written language, the suggested values only separate the spoken modality from the written modality, which covers all cases other than spoken natural language. The use of modality is recommended but not required.

<post modality="written"
generatedBy="human" synch="#t005" who="#A06"
xml:id="cmc_post09">
<figure type="image" generatedBy="human">
  <desc xml:lang="en">screenshot of the google search for hairdresser "Pasha's Haare'm"
     with the average google rating (4,5 of 5 stars), the address, the phone number, and
     the opening hours.</desc>
</figure>
</post>
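
For comparison, an audio post might be sketched as follows, with modality set to spoken and the recording referenced through a figure containing a media element. The file path, MIME type, identifiers, and description are hypothetical.

```xml
<!-- Hypothetical audio post: url, mimeType, identifiers, and desc are assumptions -->
<post modality="spoken" who="#A03" xml:id="post_ex02">
  <figure type="audio">
    <media mimeType="audio/mpeg" url="media/post_ex02.mp3"/>
    <desc xml:lang="en">voice message of about twenty seconds</desc>
  </figure>
</post>
```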

The replyTo attribute is used to capture information drawn from the original metadata associated with a post that asserts to which previous post the current post is a response, or to which previous post it refers. This metadata is included by many, but not all, CMC environments, when the user executes a formal reply action (e.g., by clicking or tapping a reply button). This attribute should not be used to encode interpreted or inferred reply relations based on linguistic cues or discourse markers.

The replyTo attribute indicates the replied-to or referred-to posts by providing one or more pointers to them. In the following example, reply references in the source indicate that the first post is a reply to an initial post that is not part of the example, the second is a reply to the first, and the third is a reply to the second.

<post type="comment" modality="written"
generatedBy="human" xml:id="cmc_post10" who="#u7"
replyTo="#cmc_post09" when-iso="2015-07-29T21:44">
<p>Es hat den Anschein, als wäre bei BER durchaus große Kompetenz am Bau, allerdings
   nicht in Form von Handwerkern….</p>
<p>http://www.zeit.de/2015/29/imtech-flughafen-berlin-ber-verzoegerung/komplettansicht</p>
</post>
<post type="comment" modality="written"
generatedBy="human" xml:id="cmc_post11" who="#u8"
replyTo="#cmc_post10" when-iso="2015-07-30T19:11">
<p>Nein Nein, an den Handwerkern kann es rein strukturel nicht gelegen haben. Niemand
   lässt seine Handwerker auf der Baustelle derart allein. Zudem gibt es höchstoffizielle
   “Abnahmen” von Bauabschnitten/phasen. Welcher Mangel auch bestanden hatte, er hätte
   Zeitnah auffallen müssen.</p>
<p>Uuups, für Imtek hab ich mal in einer Nachunternehmerfirma gearbeitet. Imtek is
   offenbar ein universeler Bauträger, der alles baut.</p>
</post>
<post type="comment" modality="written"
generatedBy="human" xml:id="cmc_post12" who="#u8"
replyTo="#cmc_post11" when-iso="2015-07-30T19:26">
<p>Stahlkunstruktionen dacht ich mal, was die bauen—oder bauen lassen.</p>
<p>Das ist schon ein übles Ding. Die Ausschreibungenund Angebote sind unauffällig, aber
   wenn Unregelmässigkeiten auftreten (im Bauverlauf) dann gibt es die saftigen
   Rechnungen. Da steht dann der Bauherr da und fragt sich, wie er denn so schnell einen
   fähigen Ersatz herbekommt. Und diese Frage erübrigt sich meist, weil der Markt der
   Baufirmen das nicht hergibt — weil tendenziel 100 % Auslastung. (und noch schlimmer:
   Absprachen) Was auch Folge des Marktdrucks gewesen war.</p>
</post>
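
Because replyTo accepts one or more pointers, a post that the platform metadata links to two earlier posts could be sketched as below. The post itself, its author #u9, its timestamp, and its text are invented for illustration, and separating the pointers with whitespace is assumed here as the usual TEI convention for attributes that take several pointers.

```xml
<!-- Hypothetical post: not part of the source thread; it points at two of
     the posts shown above -->
<post type="comment" modality="written" xml:id="cmc_post13" who="#u9"
      replyTo="#cmc_post10 #cmc_post12" when-iso="2015-07-30T20:02">
  <p>Thanks to both of you for the link and the background.</p>
</post>
```

Such an encoding is appropriate only where the source metadata itself records both targets. A reply relation inferred from the wording of the post would not qualify.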
