A202 / I211 Assignment 5

Robust applications

Due Thursday, October 7th, 3:00 PM

Individual work this week.

In lab

  1. The simple word count application from your last lab was neither robust nor user friendly: it did not behave well when the user made an error. For example, suppose the user forgets the .txt extension on the test file used in the last assignment's example:
    >python lab4.py Monty test
    Traceback (most recent call last):
      File "lab4.py", line 12, in ?
        main()
      File "lab4.py", line 7, in main
        file = open(fileName)
    IOError: [Errno 2] No such file or directory: 'test'

    This is not friendly to the user. Though the message generated by the system does indicate the problem is that the file does not exist, this is buried in a lot of information that may help a programmer to debug programs, but is likely to confuse or even alarm most users.

    This program can be used to count not just single words, but also phrases consisting of words separated by spaces (but not newlines). However, if the user fails to put quotes around the phrase, the result is a message that is of no help to anyone who is not a programmer.

    >python lab4.py Monty Python test.txt
    Traceback (most recent call last):
      File "lab4.py", line 12, in ?
        main()
      File "lab4.py", line 6, in main
        word, fileName = sys.argv[1:]
    ValueError: unpack list of wrong size
    The shell (command line interpreter) parsed Monty and Python as two separate arguments, so the program was passed three arguments instead of the two it was designed for. Giving a command the wrong number of arguments is a common error, usually due to the user simply forgetting an argument. If the program had been written differently, it might have simply ignored the extra argument and tried to use the word Python as the file name, resulting in a different error message that might be even more confusing to the user.

    (Though widely used software should never present users with unintelligible error messages, it is all too common to encounter them. Knowing some programming often helps in understanding such messages, even if one is not familiar with the program or even the language it is written in. This is another benefit of learning some programming.)

    Start with your lab4.py solution, or the one posted on the course web. Rename it lab5.py and modify it to print a helpful message if the number of arguments is wrong or the file does not exist. For example:

    >python lab5.py Monty test
    Error: test is not a readable file
    >python lab5.py Monty Python test.txt
    Usage: python lab5.py word file
    Returns the number of times word occurs in file.
    The word argument may be a phrase containing spaces if it is quoted.
    >python lab5.py "Monty Python" test.txt
    1
    Notice that the file name was included in the first error message. The second error message might have been just Error: wrong number of arguments, but instead it follows a practice commonly used by Unix utilities of printing brief documentation for the command, beginning with a Usage line.

    Of course you will need to use a conditional statement to determine whether the right number of arguments was given. Before trying to open the file, use the expression os.access(fileName, os.R_OK), which returns true if and only if the file whose name (path) is stored in the variable fileName exists and is readable. Of course the os module must be imported first.
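
    The two checks just described might be combined as in the following minimal sketch. The function name check_arguments and its convention of returning None on failure are assumptions made for illustration, not part of the posted solution; the messages are modeled on the sample session above.

```python
import os
import sys


def check_arguments(argv):
    """Return (word, fileName) if argv is usable; otherwise print a message and return None.

    argv is expected to look like sys.argv: the script name followed by
    the word and the file name. (Hypothetical helper, for illustration only.)
    """
    args = argv[1:]
    # Exactly two arguments are required: the word (or quoted phrase) and the file.
    if len(args) != 2:
        print("Usage: python lab5.py word file")
        return None
    word, fileName = args
    # os.access with os.R_OK tests that the path exists and can be read.
    if not os.access(fileName, os.R_OK):
        print("Error: " + fileName + " is not a readable file")
        return None
    return word, fileName
```

    A main function could call check_arguments(sys.argv) and simply return when the result is None.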

    You may wish to use another feature of Python: if the first expression or statement in a module is a string, it is stored in a module variable named __doc__. If the program prints general documentation of its usage in response to errors, the documentation can then be placed where the programmer most expects to find it, at the beginning of the program module, and not have to be repeated for the error message. Python has a number of such handy features that enhance the pleasure of programming.
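
    As a rough sketch of this feature, the module below begins with a string, and the same text is printed through __doc__ when the argument count is wrong. Having main take argv and return a status code is an assumption made for illustration, and the counting itself is left out.

```python
"""Usage: python lab5.py word file
Returns the number of times word occurs in file.
The word argument may be a phrase containing spaces if it is quoted."""

import sys


def main(argv):
    """Print the module documentation and return 1 unless exactly two arguments were given."""
    if len(argv[1:]) != 2:
        # __doc__ holds the string at the top of the module, so the usage
        # text is written once and serves as both documentation and message.
        print(__doc__)
        return 1
    word, fileName = argv[1:]
    # ... the word counting from lab 4 would go here (omitted in this sketch) ...
    return 0
```

    Calling main(sys.argv) at the end of the file would then print the three usage lines whenever the user supplies too few or too many arguments.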

  2. Next add the numbersToStrings and getTitle functions of lab3.py to your lab5.py file so you can make them more robust. (Of course, as always when we build on prior work, you can use either your version or the solution on the web.) To allow testing of these functions using F5 without running the above application, temporarily comment out the main call at the end of the file. Here is a sample of how the lab 3 versions of these functions probably behave when they are given bad arguments:
    >>> numbersToStrings(3)
    
    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        lab3.numbersToStrings(3)
      File "C:\home\202\a\5\lab3.py", line 11, in numbersToStrings
        for n in numList:
    TypeError: iteration over non-sequence
    >>> getTitle('<html><body><titl>Holy Grail</titl>')
    '<body><titl>Holy Grail</titl'
    

    For a function intended to be used by programmers, rather than an application often used by non-programmers, the traceback information is appropriate, for it helps the programmer know where the error occurred. But the TypeError: iteration over non-sequence message is not very helpful, even to a programmer. It would be nice to know what the non-sequence was and that it was supposed to be a list. The behavior of getTitle is even less satisfactory from a programmer's perspective: it silently returns garbage! This will surely result in a problem at some later point in the program that calls this function, and it may be difficult then to determine where the error originated.
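
    One plausible way the garbage arises, sketched under the assumption that getTitle locates the tags with the string find method: find returns -1 when the tag is missing, and the slice then quietly uses that -1 as a position. The function names getTitleUnchecked and getTitleChecked are hypothetical, and your lab 3 version may be written differently.

```python
startTitleTag = '<title>'
endTitleTag = '</title>'


def getTitleUnchecked(string):
    """Hypothetical lab 3 style version: trusts that both tags are present."""
    # find returns -1 for a missing tag, so start becomes -1 + 7 = 6 ...
    start = string.find(startTitleTag) + len(startTitleTag)
    # ... and end becomes -1, which a slice treats as "one before the end".
    end = string.find(endTitleTag)
    return string[start:end]


def getTitleChecked(string):
    """Same slicing, but stops with an error when find reports a missing tag."""
    start = string.find(startTitleTag)
    end = string.find(endTitleTag)
    if end == -1:
        raise ValueError('No ' + endTitleTag + ' in ' + string)
    if start == -1:
        raise ValueError('No ' + startTitleTag + ' in ' + string)
    return string[start + len(startTitleTag):end]
```

    With the misspelled tags from the sample, the unchecked version returns the slice from position 6 up to the last character, while the checked version reports the missing tag at the point where the problem is first detectable.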

    Modify these functions so they handle bad arguments more appropriately, as illustrated by

    >>> numbersToStrings(3)
    
    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        lab5.numbersToStrings(3)
      File "C:\home\202\a\5\lab5.py", line 11, in numbersToStrings
        raise 'Not a list: ' + str(numList)
    Not a list: 3
    >>> numbersToStrings((3,4))
    ['3', '4']
    >>> numbersToStrings(['three', False])
    ['three', 'False']
    >>> getTitle('<html><body><titl>Holy Grail</titl>')
    
    Traceback (most recent call last):
      File "<pyshell#19>", line 1, in -toplevel-
        lab5.getTitle('<html><body><titl>Holy Grail</titl>')
      File "C:\home\202\a\5\lab5.py", line 26, in getTitle
        raise 'No ' + endTitleTag + ' in ' + string
    No </title> in <html><body><titl>Holy Grail</titl>

    This is much better, but does not try to catch every possible error. It turns out, at least the way the sample program was written, that numbersToStrings will work properly given a tuple instead of a list. There is probably no harm in allowing this, and it might be appreciated. (However, using features of a program that are not documented, but just happen to work, is always dangerous. Such features may, without warning, fail to work when the program is revised.) Also observe that this function actually converts a list of any values at all, not just numbers, to their printed representations. Taking advantage of this is even more likely to lead to trouble. If you wish, you may add a test to generate an error if the sequence elements are not numbers. Hint: the expression type(x) == list is true only if x is a list. Corresponding expressions may be used to test for other types (such as int, float, or str).
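
    The type tests from the hint could be applied as in the sketch below, which is one possible version rather than the posted solution. Two assumptions are worth stating: tuples are accepted along with lists, and the optional element test is included. The sample session above raises a string, which older Python versions allowed; current versions require an exception object, so this sketch raises TypeError instead.

```python
def numbersToStrings(numList):
    """Return a list holding the printed representation of each number in numList."""
    # Accept a list or, as a convenience, a tuple; reject everything else.
    if type(numList) != list and type(numList) != tuple:
        raise TypeError('Not a list: ' + str(numList))
    result = []
    for n in numList:
        # Optional stricter test: every element must be an int or a float.
        if type(n) != int and type(n) != float:
            raise TypeError('Not a number: ' + str(n))
        result.append(str(n))
    return result
```

    Because type(False) is bool rather than int, this version rejects ['three', False] at its first element, whereas the sample session shows the more permissive behavior of converting any values.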

    Though it is often handy when the values of offending data are printed as part of an error message, it is possible that the bad data may be such a large data structure that it takes many pages to print its value! Though this sort of problem must sometimes be taken seriously, techniques for dealing with it are beyond the scope of this assignment.

When you have completed the last exercise above, or 15 minutes before the lab ends, whichever comes first, submit your lab5.py file as lab 5 in Vincent.

Assignment

  1. Modify the wc.py program of assignment 4 to print an error message for each argument that is not the name of a readable file, but continue to process other files.
    >python wc.py test.txt Monty Python wc.py
          1       4      23 test.txt
    Error: Monty is not the name of a readable file
    Error: Python is not the name of a readable file
         34     120    1006 wc.py
         35     124    1029 total
  2. Add to wc.py the initials, date, and percentToLetterGrade functions from assignment 3, and make them robust. They should raise an appropriately named exception if their argument is not of the proper type and form, and include the argument (converted to a string if necessary) in the error message.
Hints
  1. You could use the os.access function in an if statement to determine whether a file exists with read permission, but this will not catch much less common errors, such as a bad disk that causes the file read to fail. It is often better to use a try statement to catch all problems with processing a given file.
  2. The split method makes the date function easier.
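
    The first hint might look roughly like the sketch below. The function name countFile, its return value, and the way lines, words, and characters are counted are all assumptions made for illustration; the sketch also assumes that failures to open or read the file are reported as IOError or OSError.

```python
def countFile(fileName):
    """Return (lines, words, characters) for fileName, or None if it cannot be read."""
    try:
        # Both opening and reading sit inside the try, so a failure at
        # either step is handled in the same place.
        with open(fileName) as f:
            text = f.read()
    except (IOError, OSError):
        print('Error: ' + fileName + ' is not the name of a readable file')
        return None
    return len(text.splitlines()), len(text.split()), len(text)
```

    A loop over the command line arguments could call countFile for each name and skip any that return None, so that one bad argument does not stop the remaining files from being processed.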

When you are done, submit your final wc.py file as a5 using Vincent.