A202 / I211 Assignment 4

Word-count application

Due Thursday, September 30th, 3:00 PM

Use pair programming in lab and on the assignment this week. As always, your partner has to be someone in your lab, and this time your partner must be different from your partner in the second assignment.

Recall from class the STAIR steps to problem solving:

  1. State the problem (understand it thoroughly)
  2. Review the Tools you have to solve the problem
  3. Devise an Algorithm
  4. Implement your algorithm
  5. Test and Refine your solution, returning to previous steps as necessary
     

In lab

  1. In a file named lab4.py, define an application that takes a word and a file name as its arguments and prints the number of occurrences of the word in the file. Use the file read method to read the entire file at once. To test your application, open a Windows shell using Start > Run > cmd. Then change the shell's current directory to the one your lab4.py program is in. Assuming you have put it in a directory named lab4 on your CFS drive, the shell command for this is cd m:\lab4. Also create a plain-text file in this directory, named something like test.txt, to use as test input. You can use the IDLE editor for this if you like. Then you can test your application with a shell command like the one below, which assumes test.txt contains two instances of the word Monty.
    C:\home\202\a\3>python lab4.py Monty test.txt
    2

    Hints:

    1. Recall the first application example in class that prints the sum of its arguments
      C:\home\202\src>python sum.py 1 2 3.3
      6.3
      With solution:
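      import sys
      def main():
          # Accumulate the sum of the command-line arguments.
          total = 0
          # sys.argv[0] is the script name, so start at index 1.
          for arg in sys.argv[1:]:
              total += float(arg)
          print(total)
      main()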
    2. There is a string module function that counts substrings. Use the on-line documentation to find it. Whenever a problem calls for something that seems like a simple-to-specify and frequently useful operation such as this, check the library module documentation. You will often find it has been done for you! Such re-use of library code is one of the most important keys to managing software complexity.
  2. Modify your solution to the problem above so that the file is read a line at a time. Hint: use the readlines method, instead of read, and use an accumulator to count occurrences as you iterate over the lines with a loop.
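
The readlines-plus-accumulator pattern in the hint can be sketched on a different problem: adding up a file that holds one number per line. This is only an illustration, not the lab solution. The names sum_numbers and demo and the temporary input file are assumptions made for this example.

```python
import os
import tempfile

def sum_numbers(file_name):
    """Return the sum of the numbers in a file that has one number per line."""
    infile = open(file_name)
    lines = infile.readlines()   # a list of strings, one per line
    infile.close()
    total = 0.0                  # the accumulator starts at zero
    for line in lines:
        total += float(line)     # float() ignores the trailing newline
    return total

def demo():
    # Write a small hypothetical input file, sum it, then clean up.
    handle, path = tempfile.mkstemp(suffix=".txt")
    os.write(handle, b"1\n2\n3.5\n")
    os.close(handle)
    result = sum_numbers(path)
    os.remove(path)
    return result

print(demo())
```

The loop body is the only part that changes from problem to problem: replace float(line) with whatever quantity each line contributes.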

When you have completed the last exercise above, or 15 minutes before the lab ends, whichever comes first, submit your lab4.py file as lab 4 in Vincent.

Assignment

  1. In a file named wc.py, define a main function and add a call to it at the end of the file, which together turn the file into an application. Of course, feel free to add to the file any import statements, global variables, and comments appropriate to your program. The application's arguments are file names, and for each one it prints a line with the number of lines, words, and characters in the file, followed by the file name. Finally, it prints the total number of lines, words, and characters in all the files, with the word total instead of a file name. A word is defined to be any sequence of characters surrounded by whitespace. The line, word, and character counts are right-justified in seven-column fields, each followed by a space, and then comes the file name.
    C:\home\202\a\3>python wc.py futval.py test.txt wc.py
         22     103     699 futval.py
          2       4      22 test.txt
         52     223    1683 wc.py
         76     330    2404 total
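
One possible way to produce the seven-column layout shown above is the % string formatting operator, where %7d pads an integer with spaces on the left to a width of 7. This is a sketch of the formatting step only. The function name format_counts is hypothetical, and other approaches (such as the string rjust method) would work just as well.

```python
def format_counts(lines, words, chars, label):
    """Return one output row: three right-justified 7-column counts, then a label."""
    # Each %7d field is followed by a single space, matching the sample output.
    return "%7d %7d %7d %s" % (lines, words, chars, label)

print(format_counts(22, 103, 699, "futval.py"))
print(format_counts(76, 330, 2404, "total"))
```
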
    
Hints
  1. Don't know how to write the whole thing? Then start by solving a simpler problem that has some of the same elements. In this case, start by writing a "word-count" application that only takes one file name and prints the counts associated with that file. Loop over the lines in the file with three accumulators.
  2. Now finish the assignment, adding an outer loop with three more accumulators for the totals.
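
The definition of a word used in this assignment (any sequence of characters surrounded by whitespace) matches what the string split method does when called with no arguments. The short sketch below only illustrates that behavior on a sample string. The helper name words_in and the sample text are assumptions made for this example.

```python
def words_in(text):
    """Split text into words, treating any run of whitespace as one separator."""
    return text.split()

# Leading spaces, a tab, repeated spaces, and a newline all count as whitespace.
sample = "  the quick\tbrown   fox\n"
print(words_in(sample))
print(len(words_in(sample)))
```
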

Motivation: Besides Windows, the other most common family of operating systems is Unix (and Linux is the most popular flavor of Unix). This application's behavior is similar to the popular Unix command wc (word count). One difference: wc counts newline (a.k.a. line feed) and return characters as separate characters, while wc.py counts them as one character if they appear together at the end of a line. Though wc is smart enough not to print a total when there is only one file name (or none), that requires a conditional test that has not been introduced yet in this course, so print the total line in every case. If there are no file name arguments, the totals will be zero.
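
The difference in character counts can be seen by reading the same file two ways. The sketch below assumes Python 3, where a file opened in the default text mode has each return/line feed pair translated to a single "\n", while binary mode returns every byte unchanged. The function name count_both_ways and the sample data are assumptions made for this example.

```python
import os
import tempfile

def count_both_ways(data):
    """Return (byte_count, text_count) for the given bytes written to a file."""
    handle, path = tempfile.mkstemp()
    os.write(handle, data)
    os.close(handle)

    raw = open(path, "rb")       # binary mode: no newline translation
    byte_count = len(raw.read())
    raw.close()

    text = open(path, "r")       # text mode: "\r\n" is read as "\n"
    text_count = len(text.read())
    text.close()

    os.remove(path)
    return byte_count, text_count

# Two lines, each ending in a return followed by a line feed.
print(count_both_ways(b"hi\r\nyo\r\n"))
```

The first number counts the return and the line feed separately, as wc does. The second counts each pair once, which is the behavior described for wc.py.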

When you are done, submit your final wc.py file as a4 using Vincent. As always, if you cannot finish all of the assignment, be sure to submit, before the due time, a version of your file that does as much as you can get working.