A202 / I211 Assignment 8

Word to txt format conversion

Due Thursday, October 28th, 3:00 PM

You may use pair programming in this assignment if you like. You may choose any partner in your lab, or work on your own if you prefer.

In lab

  1. Write a simple application lab8.py with a main method that takes a keyword argument named test that defaults to False. If the argument is true, it prints the message Testing one two; otherwise it prints Not testing. As usual, end the program with a call to main with no arguments. Try it with F5 and confirm that it prints the Not testing message. Try it again after changing the call to main(test=True), and also with main(test=False), main(True), and main(False).
  2. Add to your program a function openFile that takes three keyword arguments: fileName, prompt, and mode, which default to None, 'File name: ', and 'r', respectively. It uses the prompt to input a file name if there is none, and then tries to open the file with the given (or default) mode. If the open fails with exception class IOError (most likely because the file name is bad), it prints the message Bad file: , followed by the file name, prompts for a new file name as if none was given, and tries again to open the file. This is repeated until the file open succeeds. Test this with some code in your main method that is invoked if you are in testing mode.
  3. Add a function replaceTabs that takes an input file object and an output file object as arguments and copies the contents of the input file (which is assumed to be a plain text file) to the output file a line at a time, replacing all tabs with spaces in each line. To specify the number of spaces that replace each tab, the function has an optional third keyword argument named numSpaces, which defaults to eight. Write this function using a for statement of the form for line in infile: ..., which works just as well if infile is a sequence of strings ending with newline characters rather than a file object.

    Test this function by calling it with a string sequence in place of the input file argument and sys.stdout as the output file. (sys.stdout is a special file object that sends whatever is written to it on to the program's standard output. By default, standard output is the user's console. The console may be the IDLE shell window or an operating system command shell window.) Try it with and without specifying the number of spaces in the call.
  4. Modify the application so that its usage is python lab8.py [ infilename outfilename ]. That is, it takes either zero or two command-line arguments indicating the input and output file names. If the file names are omitted, the application inputs them using the prompts Input file name:  and Output file name:. The files are opened using your openFile function. The application copies the input file to the output file, replacing each tab with eight spaces.

    You will need to create a test file with tabs in it. You can do this either with a plain-text editor that does not insert spaces when you use the tab key (such as Notepad, but not IDLE in its default configuration), or by adding test code that writes such a file from a string literal.

No examples are provided for this lab exercise. This is deliberate. Examples are often helpful, but it is important to learn to read specifications very carefully and do exactly what is specified. Examples seldom can convey everything in a specification, so the specification needs to be read carefully in any case. Ask your lab instructor if these instructions are not clear to you.

When you have completed the last exercise above, or 15 minutes before the lab ends, whichever comes first, submit your lab8.py file as lab 8 in Vincent. (The file you submit will reflect your effort on part 3 above, not part 2.) If you have time, start the assignment below.

Assignment

To motivate this assignment, cut several paragraphs of text from a Word document or web page and paste them into a plain-text editor such as an IDLE edit window. Notice that the result is unusable because all the characters in a paragraph are usually in one long line. This is not a problem in a web browser or a word processor such as Word because they automatically wrap long lines by starting new lines at appropriate points where there are spaces.

For display as plain text it is necessary to explicitly change spaces to line breaks in appropriate places so the text fills a given number of columns as much as possible without going past the given number of columns. If the text contains a word (sequence of characters unbroken by a space) that is longer than the column limit, that word should be on a line of its own which cannot help extending beyond the column limit.
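
The rule above can be worked through on a single line of text. The following is only a sketch, assuming Python 3; the helper name wrapOneLine is hypothetical and is not part of the required interface. It breaks a line greedily at the last space that keeps each piece within the width, and lets an over-long word stand on a line of its own.

```python
def wrapOneLine(text, width):
    """Hypothetical helper: split one line into pieces of at most width
    characters where possible. A word longer than width gets its own piece."""
    pieces = []
    text = text.strip()
    while len(text) > width:
        # Last space at index <= width, so the piece before it fits.
        cut = text.rfind(' ', 0, width + 1)
        if cut == -1:
            # The first word is longer than width: break after it instead.
            cut = text.find(' ')
            if cut == -1:
                break  # the remainder is a single long word
        pieces.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        pieces.append(text)
    return pieces
```

Stripping the whitespace around each break is one design choice among several; it guarantees that no piece consists only of whitespace.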

Write an application named word2txt.py that satisfies the following concise documentation:

Usage: python word2txt.py [-w width | /w width] infile [outfile]
    Copy infile to outfile wrapping long lines for a given maximum line width
    (default 80). outfile defaults to standard output.
    Perform unit test if test is true.
In your program use the openFile function developed in lab and write a function wrapLines that takes input and output files as in the replaceTabs function, and a keyword width argument that specifies the wrapping column width, which defaults to 80. If the application is called with improper usage, print the above concise documentation and terminate the application. If the file names are bad, the openFile function will prompt repeatedly for a file name until a good one is entered. Note that both Unix- and Windows-style switches are supported, the switch is optional, and the output file name is optional as well. The square brackets mean optional, as usual, and the vertical bar means one or the other of the possibilities on either side of it.

Paragraph breaks are indicated in the input by a new line, and in the output by a blank line.

If there are word breaks consisting of multiple whitespace characters, you may handle this in a variety of ways, such as allowing lines that appear to break prematurely (as in the example below) or reducing multiple whitespace characters to one throughout. But one thing is required: this should never result in an output line that contains only whitespace, for that looks like a blank line, which indicates a new paragraph. A blank line should be output only when going from one input line to the next.
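
One of the permitted options, reducing multiple whitespace characters to one, can be sketched as below. This assumes Python 3, and the helper name collapseWhitespace is hypothetical. Note that choosing this option changes the output for lines that already fit within the width, so it would not reproduce the unchanged copy shown for the default width in the example below.

```python
def collapseWhitespace(line):
    """Hypothetical helper: replace each run of whitespace with one space
    and drop leading and trailing whitespace (including the newline)."""
    # split() with no argument splits on any run of whitespace and
    # discards empty strings, so an all-whitespace line becomes ''.
    return ' '.join(line.split())
```

A line that collapses to the empty string contains no words at all, which is a convenient way to detect input that must not be written out as a whitespace-only line.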

For example, given that the file test.txt contains

This is a file
with_a_very_long_word
and      extra spaces. Try with a width of 6.

A second paragraph.

a partial test follows:

>python word2txt.py /w 12 test.txt out.txt
>type out.txt
This is a
file
with_a_very_long_word
and
extra
spaces. Try
with a width
of 6.
>python word2txt.py test.txt
This is a file
with_a_very_long_word
and      extra spaces. Try with a width of 6.
>python word2txt.py -w 6 test.txt
This
is a
file
with_a_very_long_word
and
extra
spaces.
Try
with a
width
of 6.

A second
paragraph.
>python word2txt.py
Usage: python word2txt.py [-w width | /w width] infile [outfile]
    Copy infile to outfile wrapping long lines for a given maximum line width
    (default 80). outfile defaults to standard output.
    Perform unit test if test is true.

Your application's main method should take a keyword test parameter that defaults to false as in the lab application. Use this to include a test of your wrapLines function. You will want to change the call to main at the end of the program to perform this test, but be sure to change it back to a call to main with no arguments before submitting your assignment. (It is fine to leave testing code in a program you submit, but it should always be disabled, either by commenting it out or by bypassing it, as in this case, with a test that is false when the program is run as it is submitted.)
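
The overall shape of such a test switch might look like the sketch below, assuming Python 3. The wrapLines body here is only a stand-in that copies lines unchanged, and runTests is a hypothetical helper; neither is the required implementation. The sketch shows that a list of strings can take the place of the input file and that an io.StringIO object can capture what would otherwise go to a file.

```python
import io


def wrapLines(infile, outfile, width=80):
    """Stand-in for illustration only: copies lines unchanged.
    The real function must wrap each line to the given width."""
    for line in infile:
        outfile.write(line)


def runTests():
    """Hypothetical helper: call wrapLines on a list of strings and
    return everything it wrote as a single string."""
    sample = ['short line\n', 'another\n']
    captured = io.StringIO()
    wrapLines(sample, captured, width=80)
    return captured.getvalue()


def main(test=False):
    if test:
        # Testing mode: show exactly what wrapLines produced.
        print(repr(runTests()))
    else:
        pass  # normal command-line processing would go here


main()  # change to main(test=True) while testing, then change it back
```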

When you are done, submit your final word2txt.py file as a8 using Vincent.

Hints:

  1. If the optional switch arguments are present, after processing them you can delete them from sys.argv so you can process the file name(s) the same way whether the switch is present or absent.
  2. The string.rfind function is useful.
  3. Work some line wrapping examples by hand, being sure to test exceptional conditions such as long words. Only try to program something when you are confident you know how to do it manually. Then programming is just teaching the computer what you already know.
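
Hint 1 can be illustrated with the sketch below, assuming Python 3. The function name parseArgs and its return convention (None for improper usage) are hypothetical choices made for this illustration. It works on a copy of the argument list rather than modifying sys.argv itself, which makes it easy to try with hand-written lists.

```python
def parseArgs(argv):
    """Hypothetical helper: return (width, infile, outfile), or None if
    the usage is improper. outfile is None when it is omitted."""
    args = list(argv[1:])          # copy, skipping the program name
    width = 80                     # default width from the usage message
    if args and args[0] in ('-w', '/w'):
        # The switch must be followed by a non-negative integer.
        if len(args) < 2 or not args[1].isdigit():
            return None
        width = int(args[1])
        del args[:2]               # remove the switch and its value
    # What remains is handled the same way with or without the switch.
    if len(args) not in (1, 2):
        return None
    infile = args[0]
    outfile = args[1] if len(args) == 2 else None
    return width, infile, outfile
```

In a real program the argument would be sys.argv, and a None result would lead to printing the usage message and terminating.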