A202 / I211 Assignment 9

txt to Word format conversion

Due Thursday, November 4th, 3:00 PM

Pair programming is permitted in this assignment if you like. You may choose any partner in your lab, or work on your own if you prefer.

In lab

Motivation: Writing custom code to process optional command switches and keyword arguments is tedious and error-prone even when the order of the arguments is fixed, and it is considerably more involved when they may appear in any order, as should be the case. In such situations the thing to do is to abstract the problem and solve it once and for all. In this lab we do that for the processing of command keywords and arguments. In the process you will get practice using dictionaries, keyword arguments, loops, and conditional logic.

We assume the usual form of Unix and Windows command usage: a collection of switch and keyword options, followed by some number of other arguments. Switch and keyword options are both introduced by an argument identifying the option, which starts with a dash (for folks used to Unix) or a slash (for Windows folks), followed by an identifier (a sequence of characters without whitespace). A switch is just one argument, and by its presence indicates an associated option value of True rather than a default value of False, while a keyword option consists of the identifying argument followed by an argument that provides the associated option value (always a string), instead of a default string value associated with the option. For example, the width option in the last assignment was a keyword option and the Windows dir command /b option used as an example in class is a switch option.
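
The distinction between option and non-option arguments can be sketched as a small helper. The name optionIdentifier is hypothetical and not part of the required interface. The sketch assumes that an option is any argument of at least two characters that begins with a dash or a slash, so a lone "-" is treated as a non-option argument.

```python
def optionIdentifier(arg):
    """Return the identifier of an option argument, or None for a non-option.

    Assumption: an option is a dash or slash followed by at least one
    character; anything else is a non-option argument.
    """
    if len(arg) > 1 and arg[0] in '-/':
        return arg[1:]   # drop the one-character prefix
    return None
```

With this helper, '/b' and '-b' both yield 'b', while a file name such as 'in.txt' yields None.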

Write a function named commandArgs, in a module with the same name (hence in a file named commandArgs.py), that takes a list, args, of command arguments (typically sys.argv[1:]), followed by the following (optional) keyword arguments:

  1. minArgs, indicating the minimum number of non-option arguments that can follow the option arguments in args (default zero),
  2. maxArgs, indicating the maximum number of non-option arguments that can follow the option arguments in args (default None, indicating no limit),
  3. switches, a list of switch identifiers (without the slash or dash prefix), which defaults to the empty list, and
  4. keywords, a dictionary associating keyword option identifiers (again without prefix) with their default values (strings), which defaults to the empty dictionary. Assume the keyword and switch identifiers are all different.

This function returns a triple (a three-element tuple) containing a dictionary, a list of the non-option arguments, and a list of error message strings (in that order, of course). The dictionary associates each switch and keyword identifier with either its default value, if the option is omitted in args, or otherwise the indicated value (True for a switch and the following argument for a keyword option).

Do not bother to detect option arguments that might appear in error after the first non-option argument.
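
The minArgs and maxArgs checks might be sketched as follows. The helper name countErrors is hypothetical. The first message matches the sample output in this handout; the wording of the second message is an assumption, since the handout does not show it.

```python
def countErrors(nonOptionArgs, minArgs=0, maxArgs=None):
    """Return a list of error messages about the number of non-option arguments."""
    errors = []
    if len(nonOptionArgs) < minArgs:
        errors.append('Not enough non-option arguments')
    # maxArgs of None means there is no upper limit.
    if maxArgs is not None and len(nonOptionArgs) > maxArgs:
        errors.append('Too many non-option arguments')  # assumed wording
    return errors
```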

For example, the following (minimal) test code

import sys

def main():
    switches = ['a', 'b', 'c']
    keywords = {'k1': 'k1_value', 'k2': 'k2_value'}
    # The second positional argument (3) is minArgs.
    options, args, errors = commandArgs(sys.argv[1:], 3, switches=switches,
                                        keywords=keywords)
    print(options)
    print(args)
    print(errors)

if __name__ == '__main__':
    # '_' stands in for the program name in sys.argv[0].
    sys.argv = '_ /c -k2 new_k2_value -a /bogus more args'.split()
    main()
prints
{'a': True, 'k2': 'new_k2_value', 'k1': 'k1_value', 'b': False, 'c': True}
['more', 'args']
['Not a keyword or switch: /bogus', 'Not enough non-option arguments']

As always, break the problem into smaller chunks and solve as many as you have time for. You might start with just handling switches, then add the minimum and maximum number of arguments, and finally if there is time work on the keyword arguments.
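
As one possible first chunk, here is a minimal sketch that handles switches only. The name parseSwitches is hypothetical, and it is not the posted solution; it assumes options end at the first argument that does not start with a dash or slash, and it reports unknown options in the same style as the sample output above.

```python
def parseSwitches(args, switches=()):
    """Return (options, nonOptionArgs, errors), handling switch options only."""
    # Every switch defaults to False until it is seen in args.
    options = dict((name, False) for name in switches)
    errors = []
    i = 0
    # Consume leading option arguments; stop at the first non-option.
    while i < len(args) and len(args[i]) > 1 and args[i][0] in '-/':
        name = args[i][1:]
        if name in options:
            options[name] = True
        else:
            errors.append('Not a keyword or switch: ' + args[i])
        i += 1
    return options, args[i:], errors
```

Keyword options would need one more branch in the loop that also consumes the following argument as the option value.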

A solution that you can use in the assignment below (and any that follow in the course) will be posted shortly after the last lab.

When you have completed the last exercise above, or 15 minutes before the lab ends, whichever comes first, submit your commandArgs.py file as lab 9 in Vincent. Also, before the end of the lab take time to read the assignment below, and ask questions if, after some reflection, it is not clear what is required.

Assignment

In the last assignment you implemented an application that addresses the common need to transform text cut from a Word document, web page, or other text viewer with automatic line wrapping into a format with explicit line wrapping, which is suitable for plain text display environments that do not do automatic line wrapping.

In this assignment you are to write an application for the equally common task of reversing this process. That is, converting text that has explicit line breaks into a form that is suitable for environments that do paragraph formatting by starting new lines at points that fit the current text display width (which cannot be predicted in advance). For example, if you cut text from most email reading environments and paste it into a Word document you will have lots of line breaks in the wrong place, which are very tedious to delete one-by-one. (Try it if you aren't convinced!)
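
The core idea of removing explicit line breaks can be illustrated with a simplified sketch. The name unwrapLines is hypothetical; the sketch assumes that a line that is empty when trimmed separates paragraphs, and it ignores options, markers, and the exact spacing that the assignment's sample output requires.

```python
def unwrapLines(lines):
    """Join consecutive non-blank lines into one string per paragraph."""
    paragraphs = []
    current = []            # trimmed lines of the paragraph being built
    for line in lines:
        text = line.strip()
        if text:
            current.append(text)
        elif current:
            # A blank line ends the current paragraph.
            paragraphs.append(' '.join(current))
            current = []
    if current:             # flush the last paragraph
        paragraphs.append(' '.join(current))
    return paragraphs
```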

One complication that sometimes needs to be addressed is that some of the original text may be of a form, such as program fragments or other examples of plain-text output, in which line breaks need to be preserved. This can be accommodated with simple markers to indicate where the "preformatted" text begins and ends.
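
Recognizing a marker line might look like the sketch below. The name isMarkerLine is hypothetical; the test itself follows the rule in the usage message later in this handout, namely that a marker line contains only the marker once trailing whitespace is removed.

```python
def isMarkerLine(line, marker='.'):
    """Return True if line holds only the marker, ignoring trailing whitespace."""
    # rstrip() removes the newline and any trailing blanks, but leading
    # whitespace is kept, so ' .' is not a marker line.
    return line.rstrip() == marker
```

Under this rule '. ' followed by a newline counts as a marker line, while ' .' does not.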

With the following (non-indented) text in the file in.txt

This is a file
to test txt2word.py
which has lots of
short lines
.
and some lines
between the .
markers.
 .
But the line above is not a marker
. 
to test marker
handling. You can't see it, but
there is a blank after the above marker
to test right stripping.

This is a second paragraph
which should be
on a new line.
  
And a third paragraph with
a line containing just spaces before
it.
the (minimal) test code
    sys.argv = '_ -h'.split()
    main()
    sys.argv = '_ -b in.txt'.split()
    main()
produces the following output:
    Usage: python txt2word.py [options] [infilename [outfilename]]
    where options is one or more of
    -m marker  defines the marker (default '.'), see below
    -b         paragraphBlankLine (default False), see below
    -h         print this help message and quit
    Options may be prefixed by / (Windows style) instead of -.
    infilename and outfilename default to standard input and standard
    output, respectively.
    Normally, trim lines and turn newlines to spaces, unless a line
    is empty (when trimmed), in which case the current line is terminated and,
    if paragraphBlankLine is True, a blank line is output.
    Exception: lines between marker lines, which contain only
    the marker when trailing whitespace is removed, are output verbatim,
    and the marker lines are not output.
    An error is raised if there is an unmatched marker.
    
This is a file to test txt2word.py which has lots of short lines 
and some lines
between the .
markers.
 .
But the line above is not a marker
to test marker handling. You can't see it, but there is a blank after the above marker to test right stripping. 

This is a second paragraph which should be on a new line. 

And a third paragraph with a line containing just spaces before it. 
Implement an application that is consistent with the above usage message and behavior.

Use a helper function that begins:

def txt2word(infile=sys.stdin, outfile=sys.stdout, paragraphBlankLine=False,
             marker='.'):

When you are done, submit your final txt2word.py file as a9 using Vincent.