A fully formatted PostScript version of this paper is available.
This work was supported by the Cognitive Science Program, Office of Naval Research under Grant No. N00014-94-1-0338, to Illinois Institute of Technology. The content does not reflect the position or policy of the government and no official endorsement should be inferred.
Most text-based ITSs view text generation as a back end to pedagogical planning and implement it in a very simple fashion. In contrast, we view the generation of a response as primarily a text planning problem since all decisions, even pedagogical ones, must eventually be expressed as text. We believe that this approach will produce more varied and higher-quality language, and improve student understanding and retention as a result.
Our main conceptual data structure is the dialogue schema. From studying transcripts of human tutors, we have isolated a small group of dialogue schemata which suffice to teach the concepts in the CIRCSIM-Tutor domain model. As these essential dialogue schemata can be implemented with a small number of semantic primitives, the use of dialogue schemata, along with a backtracking planner and a sufficiently rich lexicon, allows us to generate a large and varied set of dialogues at moderate cost.
Over the last four years, the CIRCSIM-Tutor project has collected over 5000 turns of keyboard-to-keyboard tutoring sessions using similar problems and live tutors in order to model pedagogical and linguistic strategies. Within each physiological stage, we have observed that the tutorial dialogue is divided into segments, one for each incorrect core variable. Within a segment, each attempt to teach the value of a variable ends with the tutor requesting the correct value. If the student gives the correct answer, the segment ends. Otherwise, the attempt fails, causing the remaining goals associated with it to be removed from the agenda, although turns which have already been uttered remain part of the conversation. If an attempt fails, the tutor can make another attempt or give the student the answer.
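The control flow of a segment described above can be sketched as follows. This is only an illustration, not the CIRCSIM-Tutor implementation: the `Goal` tuple, the `run_segment` function, and the exact-match comparison of student answers are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

# Hypothetical goal: (tutor text, expected student answer, or None if the
# tutor is only conveying information and no reply is awaited).
Goal = Tuple[str, Optional[str]]


def run_segment(
    attempts: List[List[Goal]],
    final_answer: str,
    ask_student: Callable[[str], str],
) -> Tuple[List[str], bool]:
    """Teach one variable. Returns (transcript, student_succeeded).

    Assumes the last goal of each attempt requests the value of the variable.
    """
    transcript: List[str] = []
    for goals in attempts:
        agenda = deque(goals)
        failed = False
        while agenda:
            text, expected = agenda.popleft()
            transcript.append("T: " + text)
            if expected is None:
                continue
            answer = ask_student(text)
            transcript.append("S: " + answer)
            if answer != expected:  # exact match is a simplification
                # The attempt fails: its remaining goals leave the agenda,
                # but turns already uttered stay in the transcript.
                agenda.clear()
                failed = True
        if not failed:
            return transcript, True  # correct value given: segment ends
    # Every attempt failed, so the tutor gives the student the answer.
    transcript.append("T: " + final_answer)
    return transcript, False
```

Note that the transcript is only ever appended to, which mirrors the observation that utterances remain part of the conversation even after the attempt that produced them is abandoned.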
Within an attempt, the tutor can nest any of a number of correction mechanisms to lead the student to the correct answer. The choice of mechanisms cannot be expressed as a simple algorithmic process, so it is implemented using a classical planner called the tutorial planner. Each correction mechanism is expressed as a dialogue schema which contains the raw material to enable the tutor to achieve a communicative goal. A dialogue schema contains one or more discourse goals which must be satisfied in the coming turn or turns. The applicability condition for the first turn tells the tutorial planner when the schema applies. Applicability conditions for later turns tell the planner which responses on the part of the student are sufficient to continue the schema. If the student gives a wrong or unexpected response, the planner will search for another schema, such as an error recovery schema, which can be used instead.
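As a rough illustration of how applicability conditions might guide schema choice, consider the sketch below. It is a deliberate simplification: it picks the first applicable schema that has not yet been tried, whereas a classical planner would search and backtrack. The `DialogueSchema` class, the state keys `control` and `last_response`, and the goal labels are hypothetical; only the schema name correct-neural and the notion of an error recovery schema come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, str]  # hypothetical snapshot of the dialogue state


@dataclass
class DialogueSchema:
    name: str
    applies: Callable[[State], bool]  # applicability condition, first turn
    goals: List[str]                  # discourse goals (placeholder labels)


def select_schema(
    schemata: List[DialogueSchema],
    state: State,
    tried=frozenset(),
) -> Optional[DialogueSchema]:
    """Return the first untried schema whose condition holds, else None."""
    for schema in schemata:
        if schema.name not in tried and schema.applies(state):
            return schema
    return None


# Illustrative schema library; the conditions are invented for the example.
SCHEMATA = [
    DialogueSchema(
        "correct-neural",
        lambda s: s.get("control") == "neural"
        and s.get("last_response") != "wrong",
        ["goal-1", "goal-2", "goal-3"],
    ),
    DialogueSchema(
        "error-recovery",
        lambda s: s.get("last_response") == "wrong",
        ["recover-1"],
    ),
]
```

Under these assumptions, `select_schema(SCHEMATA, {"control": "neural"})` picks correct-neural, and the same call with `"last_response": "wrong"` added to the state falls through to the error recovery schema.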
For implementation purposes, an offline process breaks each dialogue schema into dialogue schema sections, one for each turn. Each section, whether derived from the expected path of the schema or from an error recovery schema, is implemented as a plan operator. Plan operators are also known as semantic forms.
The choice of semantic primitive does not constrain the form of the eventual surface structure. For example, T-elicits(x, P(x)) does not require that the resulting text have the surface form of a question, but could use any of the major sentence structures:
Similarly, T-conveys(P(x)) need not be realized only by a declarative sentence:
Although many semantic primitives represent actions, not all of them do. In particular, some semantic forms represent discourse-based concepts such as "therefore."
From the rules defined in the previous section, we can see that S-knows(P(x)) can be expressed in three ways:
In other words, each form in the dialogue schema can be implemented by giving the student some information, asking the student for some information, or going on to the next form. The following instantiation for correct-neural occurs the most frequently in our transcripts:
After lexical and syntactic decisions are made, this option would generate text such as the following, assuming that the student gives the correct answer to the initial question. (The generation of the acknowledgment, i.e., "right", will be described in the following section.)
(1) T: How is TPR controlled?
S: Nervous system.
T: Right. And we're talking about what happens before there are
any neural changes. Now what do you say about TPR?
This text is an example of an interactive explanation, a generalization of the phenomenon which Sanders (1995) calls a "directed line of reasoning" (DLR). If the student finds interactive reasoning too difficult, the tutorial planner may try an explanation which is not interactive. The following common pattern generates an explanation with a followup question.
(2) T: TPR is a neurally controlled variable...Then what value would you
assign to TPR in DR?
Although the human tutors usually prefer to use explicit followup questions, occasionally they terminate an explanation by giving the student the answer. We can use the correct-neural schema to generate this option also, as in the following example:
(3) T: TPR is controlled by the nervous system, and we're talking about
what happens before there are any neural changes. So TPR doesn't
change.
Notice that in (2) the second and third semantic forms have been combined into one sentence, while in (3) the first two forms have been combined.
A final option is to instantiate one of the first two semantic forms as nil, giving rise to a form such as the following:
This option is used to generate a hint for the student. In particular, this option generates a CI-hint in the terminology of Hume et al. (1993) (CI = 'convey information'):
(4) T: Remember that we're talking about what happens before there are
any neural changes. Now what do you say about TPR?
Since each instance of T-conveys and T-elicits can be implemented using any of the syntactic forms suggested at the end of the previous section, a large number of dialogues can be generated. Lexical variation adds to the count. Still other options can be generated by starting with a different discourse schema for teaching this topic. In particular, a different schema is required to generate a PT-hint such as "Think about what controls TPR" (PT = 'point to').
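To give a feel for how quickly the options multiply, the sketch below enumerates the ways a schema's forms could be instantiated, using the three options named earlier: giving information (T-conveys), asking for it (T-elicits), or going on to the next form (nil). The labels `P1` to `P3` are placeholders rather than the actual forms of correct-neural, and the number of syntactic forms per primitive is left as a parameter because it is an assumption here. The count is an upper bound: it ignores any constraint on which combinations make sense as a dialogue, and it leaves out lexical variation.

```python
from itertools import product
from typing import Iterator, List, Tuple

REALIZATIONS = ("T-conveys", "T-elicits", "nil")


def instantiations(forms: List[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield every assignment of a realization to each form of a schema."""
    for choice in product(REALIZATIONS, repeat=len(forms)):
        yield list(zip(forms, choice))


def count_upper_bound(n_forms: int, n_syntactic_forms: int) -> int:
    """Upper bound on distinct realizations of one schema.

    Each form is either nil (one way) or T-conveys / T-elicits, each of
    which may take any of n_syntactic_forms sentence structures.
    """
    return (2 * n_syntactic_forms + 1) ** n_forms
```

For a three-form schema, `instantiations(["P1", "P2", "P3"])` yields 27 assignments before any syntactic choice is made. The bound grows as a power of the number of forms, which is consistent with the claim that a small set of schemata can yield a large set of dialogues.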
Although human tutors don't need to, the mechanized tutor always ends a turn with an explicit question so that the student knows when to respond. Turn-taking rules work in person-to-person conversation because we are socialized to understand and use them (Sacks, Schegloff & Jefferson, 1974), but people have different expectations from a computer (Dahlbäck & Jönsson, 1991). The question may belong to either the content-oriented section of the response or the new material. Since each part of the turn must be contiguous, the question can be part of the response only when the turn contains no new material.
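The contiguity argument can be made concrete with a small sketch. It assumes a turn is laid out as a response to the student followed by optional new material, which is one reading of the description above. The `Turn` class and the punctuation-based question test are hypothetical stand-ins for whatever the generator actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    response: List[str] = field(default_factory=list)      # reply to student
    new_material: List[str] = field(default_factory=list)  # next plan step


def is_question(sentence: str) -> bool:
    # Crude surface test, used only for illustration.
    return sentence.rstrip().endswith("?")


def ends_with_question(turn: Turn) -> bool:
    """Check that the turn closes with an explicit question."""
    sentences = turn.response + turn.new_material
    return bool(sentences) and is_question(sentences[-1])


def question_location(turn: Turn) -> Optional[str]:
    """Say which part of the turn holds the closing question, if any.

    Because the response precedes the new material and each part is
    contiguous, the last sentence lies in the new material whenever any
    new material exists.
    """
    if not ends_with_question(turn):
        return None
    return "new material" if turn.new_material else "response"
```

With the tutor's second turn from example (1), the response is "Right." and the closing question falls in the new material. A turn with no new material can end with a question only if the response itself supplies one.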
An acknowledgment may take one of two forms, or a combination of the two:
If the student gives a correct answer, even if the language used is not precisely what the tutor would like, the content-oriented reply, if provided, usually takes one of the following forms:
If the student's reply is wrong, the tutor will probably issue one of these common forms for negative content-oriented replies:
After replying to the student's statement, the tutor returns to the tutorial plan and generates the next turn's worth of semantic forms.
In the following example, the student does not give the right answer to the initial question. The tutor gives a negative acknowledgment ("not quite"), then responds directly to the student's error before going on to ask the question again. The second time the student gives the correct response ("by neural control"), permitting the tutor to continue with the correct-neural schema.
(5) T: In what way is CC controlled?
S: It's controlled by the volume of blood in the compartment and
affected by inotropic changes. [wrong answer]
--> T: Not quite. Changing the volume changes the performance of the
muscle via the length/tension relationship, i.e. Starling's Law.
Changing the inotropic state of the myocardium is what we mean when
we refer to CC. By what mechanism is CC controlled, then?
S: By neural control?
T: So how will CC be affected in DR?
Since utterances once spoken cannot be retracted, the text generated by the initial instantiation of T-elicits remains part of the conversation. Thus the surface form of a conversation does not necessarily resemble the underlying schema or schemata.
(6) T: I need to remind you. Things work according to the way that they
are controlled. How is HR controlled?
S: Autonomic nervous system.
T: This is DR. How will HR change?
S: MAP [mean arterial pressure] changing affects baroreceptor reflex
changing, affecting HR. [wrong answer]
--> T: In DR no reflex changes have occurred yet.
S: So HR will not change.
T: Correct...
In addition to the case of wrong answers, the tutor often changes schemata when the student says something which is on the path to a correct answer but which needs further dialogue. In that case the new schema usually contains a more detailed discussion of the physiology involved. The following excerpt starts out with the interactive explanation pattern illustrated in (1). The student's answer is correct but incomplete. To help the student trace the response back to a core variable, the tutor uses an interactive explanation pattern which is a variant of the one in (1).
(7) T: What is the primary mechanism of control of TPR?
S: Radius of arterioles.
--> T: Yes. And what is the primary mechanism by which arteriolar radius
is controlled?
S: Sympathetic nervous system.
T: Yes. And we're dealing with the period before any change in nervous
activity occurs. So what do you think about TPR now?
Hume, G. D., Michael, J. A., Rovick, A. A. and Evens, M. W. (1993). Use of hints as a tutorial tactic. In Proceedings of the 15th Annual Conference of the Cognitive Science Society, Boulder. Hillsdale, NJ: Lawrence Erlbaum.
Sacks, H., Schegloff, E. A. and Jefferson, G. (1974). A simplest systematics for the organization of turn-taking in conversation. Language, 50(4), 696-735.
Sanders, G. A. (1995). Generation of Explanations and Multi-turn Discourse Structures in Tutorial Dialogue, Based on Transcript Analysis. Doctoral dissertation. Chicago: Illinois Institute of Technology, Department of Computer Science.
Sinclair, J. M. and Coulthard, R. M. (1975). Towards an Analysis of Discourse: The English Used by Teachers and Pupils. London: Oxford University Press.
Last updated by Reva Freedman freedman@delta.eecs.nwu.edu on 4/25/96.