Friendly Medical Database System Interface in Chinese

Da-Jinn Wang wangdaj@minna.acc.iit.edu
Tsong-Yi Chen chentso@minna.acc.iit.edu
Martha W. Evens mwe@minna.acc.iit.edu

Department of Computer Science and Applied Mathematics
Illinois Institute of Technology, Chicago, IL 60616

Abstract

We have built a friendly Chinese-language interface for a medical database implemented in dBASE. The interface accepts a query in Chinese and uses a parser, a set of grammar rules, and an underlying lexicon to understand the query and translate it into a dBASE command. It then executes the translated query and displays the result to the user. The most important merit of the system is that it can also serve as an interface for an expert system: after the user enters a query, the system asks for further information related to that query and then makes a medical decision based on the answers.

Introduction

Many friendly interfaces exist, but most of them offer only a restricted set of predefined functions. Users who want to do something outside those functions must spend a long time learning how to phrase a query the system understands. Access to the database must be easy [1], and we can no longer expect users to invest that time just to retrieve their data. Users need a natural language interface. A few natural language interfaces exist for English [2,3], but we know of none for Chinese. We therefore built a natural language interface that translates queries in Chinese into database queries. Users need no training in formulating legal commands before they use it. They simply enter a query in Chinese, e.g., '§ÚªºÀY«Üµh' (my head is very painful), and the system responds '§A¥i¯à±o¤FÀYµh¯g©Î°¾ÀYµh' (you may have a headache or migraine). The result is the same as if the user had entered the dBASE command DISPLAY ALL FOR SYMPTOM='ÀYµh'.

System Architecture

How do we build a friendly interface that understands Chinese and translates it into commands the database system can execute? The interface needs a lexicon to supply word meanings and a set of grammar rules to parse the user's sentences. Our interface is divided into five parts: INPUT, WDSG, PRS, IN_ACT and OUTPUT. The INPUT routine helps the user enter a query in Chinese. WDSG uses the lexicon to segment the input into words, because Chinese sentences do not contain spaces between words. The lexicon is also used to translate these words into the equivalent dBASE command words. PRS uses these dBASE words and the grammar rules to formulate a query that follows dBASE syntax. IN_ACT takes the translated query from PRS, asks the user for related information needed to make a medical decision, and modifies the query accordingly. OUTPUT executes the modified query produced by IN_ACT and formats the answer. Fig. 1 shows the architecture of the interface:

INPUT     ==> WDSG            ==>PRS          ==>IN_ACT          ==>OUTPUT
(query in     (uses lexicon      (parses using   (asks questions    (formulates
Chinese)      to segment input   grammar rules)  to make a medical  the
              into words and                     decision)          answer)
              find dBASE terms)

An Example: Medical Database System

As one application of our interface, we designed a simple friendly interface in Chinese for a medical database system built on dBASE. The medical database consists of two data tables: DISEASE.DBF and SYMPTOM.DBF.

Syntax of dBASE Commands

The format of dBASE commands is 'VERB SCOPE FIELD FOR (or WHILE) CONDITION', where 'CONDITION' is made up of 'FIELD COMPARISON VALUE' or 'VALUE1 COMPARISON VALUE2' or 'CONDITION LOGICAL CONDITION'. For example, 'Display All Item For Price>100' is a legal dBASE command. Here 'Display' is 'VERB', 'All' is 'SCOPE', 'Item' is 'FIELD', and 'Price>100' is 'CONDITION'.
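
The command format above can be illustrated with a small sketch. It is not part of the described system; the names `Condition` and `build_command` are hypothetical, and the sketch assumes that an empty SCOPE or FIELD slot is simply left out, as in the command DISPLAY ALL FOR SYMPTOM=... quoted in the Introduction.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """One 'FIELD COMPARISON VALUE' condition (hypothetical helper)."""
    field: str
    comparison: str
    value: str

    def render(self) -> str:
        return f"{self.field}{self.comparison}{self.value}"


def build_command(verb: str, scope: str, field: str,
                  condition: Condition, keyword: str = "FOR") -> str:
    """Assemble 'VERB SCOPE FIELD FOR (or WHILE) CONDITION' as one string."""
    if keyword.upper() not in ("FOR", "WHILE"):
        raise ValueError("keyword must be FOR or WHILE")
    parts = [verb, scope, field, keyword, condition.render()]
    # Assumption: empty slots (for example, no FIELD list) are skipped.
    return " ".join(part for part in parts if part)


command = build_command("Display", "All", "Item", Condition("Price", ">", "100"))
print(command)
```

With the slots from the example in the text, the sketch reproduces the same command, apart from the letter case of the FOR keyword.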

WDSG

Some Chinese speakers equate the concept 'word' with a single Chinese character, but Chinese linguists give a different definition. We use the definition that a 'word' in Chinese is the smallest meaningful unit that can be used freely [6]. This unit is often a sequence of characters. For example, '¸ê®Æ' (data) is one 'word' although it has two Chinese characters. A Chinese sentence is composed of concatenated words with no space between adjacent words [7]. We therefore first need to do word segmentation to decide where the word boundaries are. We put many Chinese words into a lexicon. WDSG searches the lexicon for matching words and places each word it finds in its own row of a temporary table.
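
The text does not say how WDSG chooses among competing lexicon matches. As a rough sketch only, the following uses forward maximum matching (take the longest lexicon word at each position), a common segmentation technique that is an assumption here and not necessarily the authors' method. The function name `segment` is hypothetical, and the lexicon contains romanized stand-ins chosen for illustration, not entries from the paper's lexicon.

```python
def segment(sentence, lexicon):
    """Forward maximum matching: at each position take the longest lexicon word."""
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink it one character at a time.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in lexicon:
                break
        else:
            # No lexicon word starts here: fall back to a single character.
            candidate = sentence[i]
        words.append(candidate)
        i += len(candidate)
    return words


# Stand-in lexicon: "tou" is a prefix of "touhun", so the longest match matters.
LEXICON = {"wo", "tou", "touhun", "de"}
rows = segment("wotouhun", LEXICON)  # one list item per row of the temporary table
print(rows)
```

Each item of the returned list corresponds to one row of the temporary table mentioned above.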

Lexicon

Each lexicon entry has five fields: UNIT, CATEGORY, SUBCAT, TRANS, and ROOT.

The following is a row of the lexicon table:

  UNIT  CATEGORY     SUBCAT     TRANS      ROOT
  ¦C¥X   VERB        N          Display    Åã¥Ü 

The row means that '¦C¥X' (display) is a verb, that it has no sub-category, that it can be translated to the dBASE command word 'Display', and that its synonym is 'Åã¥Ü'. There are some important ideas in our lexicon:

WDSG Action

For example, a user inputs '§ÚÀY©ü' (I am dizzy).

WDSG uses a lexicon to do word segmentation:
     UNIT              CATEGORY       SUBCAT       TRANS     ROOT
     §Ú(I)             NONE           S            '§Ú'
     ÀY©ü(dizziness)   VALU           S            'ÀY©ü'    'ÀY·w'
WDSG does synonym replacement:
     UNIT              CATEGORY       SUBCAT      TRANS    ROOT
     §Ú(I)             NONE           S           '§Ú'
     ÀY·w(dizziness)   VALU           S           'ÀY·w' 
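
The synonym replacement step shown in the two tables can be sketched as follows. This is an illustration under stated assumptions, not the system's code: `LexEntry` and `replace_synonyms` are hypothetical names, the strings are romanized stand-ins, and the sketch assumes that TRANS is replaced by ROOT only when TRANS merely repeats UNIT (as for the VALU row above), so that a dBASE keyword stored in TRANS would be left alone.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LexEntry:
    """One row with the five lexicon fields UNIT, CATEGORY, SUBCAT, TRANS, ROOT."""
    unit: str
    category: str
    subcat: str
    trans: str
    root: Optional[str] = None


def replace_synonyms(rows):
    """Replace each word that has a ROOT by that root form."""
    result = []
    for row in rows:
        if row.root:
            # Assumption: TRANS follows UNIT only when it was a copy of UNIT.
            new_trans = row.root if row.trans == row.unit else row.trans
            row = replace(row, unit=row.root, trans=new_trans, root=None)
        result.append(row)
    return result


segmented = [
    LexEntry("wo", "NONE", "S", "wo"),
    LexEntry("touhun", "VALU", "S", "touhun", root="touyun"),
]
for row in replace_synonyms(segmented):
    print(row)
```

Mapping every synonym to one root form means the database only needs to store a single spelling of each symptom.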

PRS

The output of WDSG supplies the keywords for the dBASE command. Parsing needs only the CATEGORY, SUBCAT and TRANS fields. We chose a bottom-up parsing strategy because the command keywords are already available in TRANS, and we developed rules that reorganize these keywords into legal commands.

Grammar Rules

There are two kinds of grammar rules. The first group reorganizes keywords into a legal dBASE command; the second group determines the proper meaning of ambiguous words in a sentence. For example, a rule in the first group starts from a row such as:

    CATEGORY       SUBCAT         TRANS
     VALU          S              'ÀYµh' (headache)
After PRS runs, the row is changed to:
    CATEGORY       SUBCAT         TRANS
    COND           S              SYMPTOM->SYMPTOM='ÀYµh'

How does the second group of rules work? Consider the rows:

    CATEGORY       SUBCAT        TRANS
    VERB           S             '·Pı' (feel)
    VALU           SD            'ÀYµh' (headache)

After PRS runs, the correct meaning of 'ÀYµh' is decided and the rows become:

    CATEGORY       SUBCAT        TRANS
    VERB           S             '·Pı' (feel)
    COND           S             SYMPTOM->SYMPTOM='ÀYµh'

PRS Action

PRS extracts CATEGORY, SUBCAT, TRANS from the result of WDSG:

    CATEGORY    SUBCAT  TRANS
    NONE        S       '§Ú'
    VALU        S       'ÀY·w'
Rule: VALU(S) -> COND(S)
    CATEGORY    SUBCAT  TRANS
    NONE        S       '§Ú'
    COND        S       SYMPTOM->SYMPTOM='ÀY·w'
Rule: NONE ->
    CATEGORY    SUBCAT  TRANS
    COND        S       SYMPTOM->SYMPTOM='ÀY·w'
Rule: COND(S) -> SENT(E)
    CATEGORY    SUBCAT  TRANS
    SENT        E       DISP ALL FOR SYMPTOM->SYMPTOM='ÀY·w'
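
The three rewriting steps traced above can be sketched as a chain of small functions. This is a simplified illustration, not the authors' parser: the names `Row`, `value_to_condition`, `drop_none`, `condition_to_sentence` and `parse` are hypothetical, the rules are applied in a fixed order, and the symptom string is a romanized stand-in.

```python
from typing import NamedTuple


class Row(NamedTuple):
    category: str
    subcat: str
    trans: str


def value_to_condition(rows):
    """Rule VALU(S) -> COND(S): a symptom value becomes a condition."""
    return [
        Row("COND", "S", f"SYMPTOM->SYMPTOM='{r.trans}'")
        if (r.category, r.subcat) == ("VALU", "S") else r
        for r in rows
    ]


def drop_none(rows):
    """Rule NONE -> (nothing): rows with no dBASE meaning are removed."""
    return [r for r in rows if r.category != "NONE"]


def condition_to_sentence(rows):
    """Rule COND(S) -> SENT(E): a lone condition becomes a complete command."""
    if len(rows) == 1 and (rows[0].category, rows[0].subcat) == ("COND", "S"):
        return [Row("SENT", "E", f"DISP ALL FOR {rows[0].trans}")]
    return rows


def parse(rows):
    """Apply the three rules bottom-up, in the order used in the trace."""
    for rule in (value_to_condition, drop_none, condition_to_sentence):
        rows = rule(rows)
    return rows


result = parse([Row("NONE", "S", "wo"), Row("VALU", "S", "touyun")])
print(result[0].trans)
```

A fuller parser would presumably keep applying rules until no rule matches; the fixed order here only mirrors the single example in the text.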

IN_ACT

At this point the system has translated the user's Chinese query into a dBASE command. Executing the command now would display a result, but that result is not friendly enough. For example, for the Chinese query '§ÚÀY·w' (I am dizzy), the result is:

   Record#     DISEASE       SYMPTOM    WEIGHT
       34      ¥Ø¯t          ÀY·w       20
       37      ¤¤´»          ÀY·w       20

This result has some disadvantages. The first is that most users cannot understand what it means, so the first purpose of IN_ACT is to format the result in a friendlier way. The second is that the result above lists two diseases, ¥Ø¯t (dizzy) and ¤¤´» (sun-stroke), and the user cannot tell which one he has; there is not enough information to make a medical decision. The second purpose of IN_ACT is therefore to improve the output and serve as an interface for an expert system. It is a common experience that when we see a doctor we mention only our most evident symptoms, and the doctor then checks whether we have others. Our system likewise assumes that users will not generally enter all their symptoms at one time. IN_ACT takes the symptoms the user has already reported as a base and asks the user about further possible symptoms. For example,

    Disease D1 has symptoms S1 (weight 60), S2 (weight 30), S3 (weight 10)
    Disease D2 has symptoms S1 (weight 70), S4 (weight 30)
    Disease D3 has symptoms S1 (weight 65), S2 (weight 15), S4 (weight 20)
    Disease D4 has symptoms S2 (weight 75), S5 (weight 25)

If the user says that he has symptom S1, IN_ACT finds that diseases D1, D2 and D3 have symptom S1. IN_ACT then asks the user whether he has symptoms S2, S3 and S4, the other symptoms of those three diseases. It does not ask about symptom S5, which belongs only to D4, a disease that does not include S1. If the user says that he also has symptom S4, IN_ACT sums, for each candidate disease, the weights of the symptoms the user has reported:

    The sum of disease D1's weight is 60.
    The sum of disease D2's weight is 100.
    The sum of disease D3's weight is 85.
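
The weighted example above can be reproduced with a short sketch. It only mirrors the arithmetic described in the text; the function names `candidates`, `follow_up_symptoms` and `scores` are hypothetical, and the in-memory dictionary stands in for the database tables.

```python
# Symptom weights per disease, copied from the example in the text.
DISEASES = {
    "D1": {"S1": 60, "S2": 30, "S3": 10},
    "D2": {"S1": 70, "S4": 30},
    "D3": {"S1": 65, "S2": 15, "S4": 20},
    "D4": {"S2": 75, "S5": 25},
}


def candidates(initial):
    """Diseases that list at least one of the initially reported symptoms."""
    return [d for d, weights in DISEASES.items() if initial & set(weights)]


def follow_up_symptoms(initial, cands):
    """Other symptoms of the candidate diseases, which IN_ACT asks about."""
    asked = set()
    for disease in cands:
        asked.update(DISEASES[disease])
    return sorted(asked - initial)


def scores(confirmed, cands):
    """Sum the weights of the confirmed symptoms for each candidate disease."""
    return {
        d: sum(w for symptom, w in DISEASES[d].items() if symptom in confirmed)
        for d in cands
    }


cands = candidates({"S1"})
questions = follow_up_symptoms({"S1"}, cands)
totals = scores({"S1", "S4"}, cands)
best = max(totals, key=totals.get)
print(cands, questions, totals, best)
```

Running the sketch gives the candidates D1, D2 and D3, the follow-up questions S2, S3 and S4, and the totals 60, 100 and 85, so D2 has the highest weight.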

IN_ACT then selects the disease with the highest total weight, here D2, as its decision. The following is the action of IN_ACT for the query '§ÚÀY·w' (I am dizzy).

IN_ACT Action

    IN_ACT asks user                                        User's answer
    §A¬O§_¤ß±ª (Do you feel palpitations?)                  ¬O(yes)
    §A¬O§_¥Ö½§µo¬õ (Is your skin hot?)                           §_(no)
    §A¬O§_¥ØÐV (Are you dizzy?)                             ¬O(yes)
    §A¬O§_¦Õ»ï (Do you feel a buzzing in your ears?)             §_(no)
    §A¬O§_©I§l«æ«P (Are you breathing fast?)                ¬O(yes)
    §A¬O§_¹Ã¦R (Do you throw up?)                                §_(no)
    §A¬O§_äú¤ß (Do you have nausea?)                        ¬O(yes)

OUTPUT

After IN_ACT has run, the final result is much friendlier. The user is shown only the one disease that he may have, and the output includes additional explanatory text to help him understand the result. The final result for the query '§ÚÀY·w' (I am dizzy) is:

    §A¦³¯gª¬: ÀY·w ¤ß±ª ¥Ø¯t ©I§l«æ«P äú¤ß
    (You have symptoms: vertigo, palpitation, dizziness, rapid respiration, nausea)
    §A¥i¯à±o¤F: ¤¤´»
    (You may have: heat stroke)
    ¤¤´»ªº¯gª¬: ÀY·w ©I§l«æ«P ¥Ö½§µo¬õ ¥Ø¯t ¤ß±ª äú¤ß
    (Symptoms of heat stroke: vertigo, rapid respiration, skin fever, dizziness, palpitation, nausea)
    ¬ÛÃöª¾ÃÑ: ….
    (What heat stroke is: )
    «æ±Ï¤èªk: ….
    (First-aid methods)

Conclusion

We have described how a friendly Chinese interface for a medical database system works. A friendly natural language interface benefits users who are not familiar with formulating queries: they can extract useful information residing in a database simply by using the interface. The friendly medical database system interface not only translates the user's Chinese query into a dBASE command, but also formats the output in a friendly way. The model of the lexicon and the grammar rules can also be used with other databases, for example, a personnel database, and the structure of the lexicon can be extended for other applications.

References

[1]  Templeton, Marjorie (1979); "EUFID: A Friendly and Flexible Front-End for Data Management Systems";
     17th Annual Meeting of the Association for Computational Linguistics, 91-93.
[2]  Burger, J., Leal, A., and Shoshani, A. (1975); "Semantic Based Parsing and a Natural-Language Interface
     for Interactive Data Management"; AJCL Microfiche 32, 58-71.
[3]  Grishman, R. and Hirschman, L. (1978); "Question Answering from Natural Language Medical Data Bases";
     Artificial Intelligence 11, 25-43.
[4]  Shue, Chu-Chie (1980); Family First-Aid Methods; Shing-Fong Company,
     Taipei, Taiwan.
[5]  Wang, Li-Fang (1980); General Family Medical Knowledge; National Bookstore Company,
     Taipei, Taiwan.
[6]  Liang, N.Y. (1989); "On the Automatic Segmentation of Chinese Words and the
     Related Theory"; R.O.C. Computational Linguistics Conference II, 23-27.
[7]  Lin, Wei-Huang and Martha Evens (1995); "Statistical Approaches to Chinese Word Segmentation";
     Proceedings of the 1995 Midwest Artificial Intelligence and Cognitive Science Society Conference,
     83-87.

COPYRIGHT (c), 1996