
Week 4

Files and Applications

Indiana University

Computer Science A202 / A598

and Informatics I211

 

This week’s success strategy

- A leading scientific study examined a great many factors to see which contributed most to student success in college. The three leading factors were:
    - Time on task: OK, so this one is obvious. But it's hard to put in study time if you're not enjoying it; you won't enjoy it if you're not succeeding; and you won't succeed if you're not studying effectively and taking advantage of the other factors that contribute to success.
    - Studying with other students: Many students rate their social life as the most valuable part of their college experience, but they often fail to extend this to their studies. Studying together can be more fun, and more productive too. You can really relate to each other's learning needs. And when you help a friend, the material becomes more lively in your own mind.
    - Involvement with faculty: Most faculty wish students would seek their help more often. They have a deeper and broader perspective on the material and the learning process, and an enthusiasm for the material that they are eager to share with students. Naturally you learn more, and faculty enjoy working with you more, if you make a good effort to learn what you reasonably can by reading and attempting exercises on your own. Then please do come with any questions you are stuck on.
- This and other strategies are on the course web strategies page!

This week
- Problem solving
- Simple file I/O
- Writing and using Python applications
    - not covered by your text
- Parameter passing


Problem solving
- STAIR steps to problem solving
    - State the problem (understand it thoroughly)
    - Review the Tools you have to solve the problem
        - do this in conjunction with the next two steps as needed
        - some tools you should commit to memory, like much of appendix A
        - others you look up in books or online documentation as needed
    - Devise an Algorithm
        - a step-by-step procedure to solve the problem
        - may be expressed in pseudo-code (structured English)
    - Implement your algorithm
    - Test and Refine your solution as necessary
        - return to previous steps as necessary
- Try to think of problem solving as a puzzle, not a struggle!

Problem solving example: multiplication table
- Problem: print a multiplication table
  1   2   3   4   5   6   7   8   9
  2   4   6   8  10  12  14  16  18
  3   6   9  12  15  18  21  24  27
  4   8  12  16  20  24  28  32  36
  5  10  15  20  25  30  35  40  45
  6  12  18  24  30  36  42  48  54
  7  14  21  28  35  42  49  56  63
  8  16  24  32  40  48  56  64  72
  9  18  27  36  45  54  63  72  81
- Pseudo-code
    - for each line of the table
        - for each column of the table
            - print the product corresponding to the row and column (right-justified in 4 columns)

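- Solution
def multTable():
    for n in range(1, 10):          # n: row number
        for m in range(1, 10):      # m: column number
            # '%3d' right-justifies the product in 3 columns; the trailing comma keeps
            # Python 2's print on the same line, printing a space before the next item
            print '%3d' % (n * m),
        print                       # end the row with a newline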
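For comparison, here is a sketch that builds the whole table as a string, right-justifying each product with the string method rjust, and prints it once at the end. It uses only single-argument print calls, so it should run under Python 2 or Python 3. The function name multTableString and its size and width parameters are illustrative, not from the text.

```python
def multTableString(size=9, width=4):
    """Return a size-by-size multiplication table as one string.

    Each product is right-justified in a field `width` characters wide,
    and every row, including the last, ends with a newline.
    """
    rows = []
    for n in range(1, size + 1):            # one row per n
        # the products n*1, n*2, ..., n*size, each padded on the left
        row = ''.join(str(n * m).rjust(width) for m in range(1, size + 1))
        rows.append(row)
    return '\n'.join(rows) + '\n'

print(multTableString())
```

Every field here is exactly 4 characters wide, so each row starts with one more space than in the print-statement version. Returning a string instead of printing also makes the result easy to test or to write to a file.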

Problem solving example: getTitle
- Problem: get the title text from the title markup of HTML text
>>> getTitle("<html><head><title>Elmer's Wonderful Web</title></head>")
"Elmer's Wonderful Web"
- Tools: the string methods lower and find
- Pseudo-code
    - store a lower-case copy of the string in a local variable (so tags such as <TITLE> are found too)
    - in the copy, find the start index: just past the end of the <title> tag
    - in the copy, find the end index: the beginning of the </title> tag
    - return the substring of the original string between the start and end indices

- Solution
startTitleTag = '<title>'
endTitleTag = '</title>'

def getTitle(string):
    lowerString = string.lower()    # search case-insensitively
    start = lowerString.find(startTitleTag) + len(startTitleTag)
    end = lowerString.find(endTitleTag)
    return string[start : end]      # slice the original string to keep its case
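The solution assumes both tags are present. If one is missing, find returns -1 and the slice is meaningless: for example, getTitle('<html></html>') returns '</html'. Below is a sketch of a more defensive variant (the name getTitleOrNone is ours, not the text's) that checks each find result and returns None when a tag is missing; it also searches for the closing tag only after the opening one.

```python
startTitleTag = '<title>'
endTitleTag = '</title>'

def getTitleOrNone(string):
    """Return the text between <title> and </title>, or None if either tag is missing."""
    lowerString = string.lower()                 # search case-insensitively
    tagStart = lowerString.find(startTitleTag)
    if tagStart == -1:                           # find returns -1 when there is no match
        return None
    start = tagStart + len(startTitleTag)
    end = lowerString.find(endTitleTag, start)   # search only after the opening tag
    if end == -1:
        return None
    return string[start:end]                     # slice the original to keep its case

print(getTitleOrNone("<html><head><title>Elmer's Wonderful Web</title></head>"))
print(getTitleOrNone('<html></html>'))
```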

Problem solving example: numbersToStrings
- Problem: convert a list of numbers to a list of strings
>>> numbersToStrings([1, 2, 3.5])
['1', '2', '3.5']
- Pseudo-code
    - initialize an accumulator to the empty list
    - for each element of the number list
        - convert the number to a string and append it to the string list
    - return the accumulated list

- Solution
def numbersToStrings(numList):
    strList = []                # the accumulator
    for n in numList:
        strList += [str(n)]     # same effect as strList.append(str(n))
    return strList
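The accumulator pattern above is worth knowing well. For simple element-by-element conversions, Python also offers a list comprehension, which builds the same list in a single expression. A sketch (numbersToStrings2 is a made-up name, chosen so it does not replace the version above):

```python
def numbersToStrings2(numList):
    # a list comprehension: str(n) for each n in numList, in order
    return [str(n) for n in numList]

print(numbersToStrings2([1, 2, 3.5]))
```

This prints ['1', '2', '3.5'], the same result as numbersToStrings.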

Problem solving exercise: reverse
- Write a function reverse that takes a list and returns a new list with the same elements, but in reverse order
>>> reverse([1, 2, 3])
[3, 2, 1]

- A solution
def reverse(lst):
    newList = []
    # visit indices len(lst)-1, len(lst)-2, ..., 0 (the stop value -1 is excluded)
    for i in range(len(lst)-1, -1, -1):
        newList.append(lst[i])
    return newList
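Writing the loop yourself is good practice. In everyday code, a slice with a step of -1 produces the same reversed copy; a minimal sketch (reverseBySlice is an illustrative name):

```python
def reverseBySlice(lst):
    # lst[::-1] walks lst from the last element to the first, building a new list
    return lst[::-1]

original = [1, 2, 3]
print(reverseBySlice(original))   # a new, reversed list
print(original)                   # the original list is unchanged
```

By contrast, the list method reverse changes the list itself and returns None, so lst.reverse() is not a drop-in substitute for this function.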

What's a file, and why have them?
- A file is just a sequence of bytes
    - file input does not involve data type checking
        - file name extensions often suggest the type of data in the file
            - examples: paper.doc, notes.txt, and program.exe
    - you get meaningless results if a file is not read in a way that is consistent with the way it was written
- Files are usually stored on disk
    - disk storage is persistent
        - data on disk is not lost when the power is turned off
        - data on disk is usually not lost when a system "crashes"
        - most computer "main" memory is volatile (not persistent)
    - disk storage is much less expensive per byte than computer memory
        - disks typically hold much more data than main memory
        - example: 512MB of memory for $60 vs. a 200GB disk for $90, so disk storage is more than 250 times cheaper per byte
    - disks are slower than memory. How much?

How fast, or slow?
- Disk random access is about a million times slower than memory!!
    - random access: reaching any (randomly chosen) element in the same amount of time
    - disks are not perfectly random access
        - access time depends on which track the data is on, and even on where the disk is in its rotation
    - once the data is found, disk data transfer can be pretty fast, so data is often moved in blocks of a few thousand bytes at a time
    - disk "pseudo-random" access times are 10's of milliseconds
    - memory access times are 10's of nanoseconds
- Digression: small units of time
    - millisecond: a thousandth (10**-3) of a second
    - microsecond: a millionth (10**-6) of a second
    - nanosecond: a billionth (10**-9) of a second
        - light travels about a foot in a nanosecond
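The "million times" figure is just the ratio of the two typical access times above, taking 10 of each unit as representative:

```latex
\frac{\text{disk access time}}{\text{memory access time}}
  \approx \frac{10\ \text{ms}}{10\ \text{ns}}
  = \frac{10 \times 10^{-3}\ \text{s}}{10 \times 10^{-9}\ \text{s}}
  = 10^{6}
```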

File objects
- Before reading or writing a file, it is necessary to open it
    - this creates a file object that is used for file I/O
    - the file object keeps track of such things as
        - your position in the file
        - whether you are allowed to read or write the file
        - data you may have recently written to the file but that has not yet been saved to disk
    - on some operating systems (notably Windows), other programs may be unable to delete or rename a file while it is open
- When you are done with a file, you should close it
    - this writes any unsaved data to disk
    - it lets other programs use the file freely again
    - it releases an operating system resource associated with each open file; the number of files that can be open at once is limited
    - closing is supposedly done automatically when your program terminates, but don't count on it, and that may not be soon enough

Opening and closing files
- Function open(<file name> [, <mode>])
    - returns a file object
    - both arguments are strings
    - the optional mode is a string indicating how the file will be used
        - 'r' for reading the file (the default)
        - 'w' for writing the file (first discarding the contents of any existing file of the same name)
        - 'a' for appending (writing, starting at the end of an existing file)
        - a few other possibilities we won't go into
- When done with an open file object, close it using a close() method call
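A short sketch of the three modes in action. To stay self-contained it works on a throwaway file in a new temporary directory created with the standard tempfile module; the file name log.txt is arbitrary.

```python
import os
import tempfile

# a throwaway file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'log.txt')

f = open(path, 'w')      # 'w': create the file (or empty an existing one)
f.write('one\n')
f.close()

f = open(path, 'a')      # 'a': keep the contents and write at the end
f.write('two\n')
f.close()

f = open(path)           # no mode given, so 'r': read
contents = f.read()
f.close()
print(contents)          # one, then two, each on its own line

f = open(path, 'w')      # 'w' again: the old contents are discarded
f.close()
emptySize = os.path.getsize(path)
```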

"File variable" terminology in your text is misleading
- File method calls are messages to a file object, not to a variable
    - even though file objects are usually stored in variables
- Sometimes it is important to know the difference between a variable and the value it contains
    - for example, any variable can be assigned a new value, but the object it contains may not be mutable
        - you can assign a new string to a variable containing a string, but you cannot change the string itself
    - multiple variables may contain the same object
        - this is called aliasing: more on this in chapter 6
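A small sketch of both points. It uses a list (mutable) and a string (immutable) rather than file objects, so it needs no files:

```python
# Aliasing: two variables containing the same list object
a = [1, 2, 3]
b = a              # no copy is made; a and b name one list
b.append(4)        # change the list through b...
print(a)           # ...and the change shows through a: [1, 2, 3, 4]

# Strings are immutable: the variable can change, the string cannot
s = 'abc'
t = s              # t and s contain the same string
s = s.upper()      # upper builds a new string; s now contains 'ABC'
print(t)           # t still contains 'abc'
try:
    s[0] = 'x'     # trying to change a string in place...
except TypeError:
    print('a string cannot be changed')   # ...raises TypeError
```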

Some file object methods
- read() returns the entire remaining contents of the file in a string
    - "the file" is of course the object (or "target") of the method call
- readline() returns the next line of the file in a string
    - the newline character is included at the end of the string
        - on Microsoft operating systems, when a file is read in text mode, a return character just before a newline character is dropped (not placed in the string)
    - it is not named readLine
- readlines() returns a list of the remaining lines of the file
- write(string) writes the contents of the string to the file
    - write does not add a newline, so to write a line the string must end with a newline character
- close() closes the file and returns nothing
    - it is an error to read or write a closed file
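The sketch below exercises these methods on a small throwaway file (again in a temporary directory, with an arbitrary name). It shows that read, readline, and readlines all return strings, whatever the file holds, and that reading a closed file fails.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')   # arbitrary throwaway file

f = open(path, 'w')
f.write('3\n')            # write adds no newline of its own
f.write('4\n')
f.close()

f = open(path)
firstLine = f.readline()  # '3\n': the newline is kept
rest = f.readlines()      # ['4\n']: only the lines not yet read
f.close()

f = open(path)
everything = f.read()     # '3\n4\n': the whole file in one string
f.close()

# file input is just text: + joins strings, it does not add numbers
joined = firstLine + rest[0]             # '3\n4\n'
total = int(firstLine) + int(rest[0])    # int() ignores surrounding whitespace

try:
    f.read()              # f is closed now
except ValueError:
    print('cannot read a closed file')
```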

File processing example
- Problem: read a file named usernames, with one username per line, and write a file named iuaddresses containing the corresponding IU email addresses.
- Solution:
def main():
    inFile = open('usernames', 'r')
    outFile = open('iuaddresses', 'w')

    for line in inFile.readlines():
        # strip removes the newline (and any other surrounding whitespace)
        outFile.write(line.strip() + '@indiana.edu\n')

    inFile.close()
    outFile.close()

main()
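Not covered in your text: since Python 2.6, the with statement can close files automatically, even if an error occurs partway through (opening two files in one with statement needs Python 2.7 or later). The sketch below is a variant of the solution above that takes the file names as parameters, reads the input one line at a time, and skips blank lines. The function name makeAddresses and the sample usernames are made up for the demonstration.

```python
import os
import tempfile

def makeAddresses(inName, outName):
    """Write one IU address for each non-blank username line of inName to outName."""
    # 'with' closes both files when the block ends, even if an error occurs
    with open(inName) as inFile, open(outName, 'w') as outFile:
        for line in inFile:           # a file object yields its lines one at a time
            username = line.strip()
            if username:              # skip blank lines instead of writing '@indiana.edu'
                outFile.write(username + '@indiana.edu\n')

# Demo with made-up usernames in a temporary directory
folder = tempfile.mkdtemp()
inName = os.path.join(folder, 'usernames')
outName = os.path.join(folder, 'iuaddresses')
with open(inName, 'w') as f:
    f.write('jdoe\n\nasmith\n')
makeAddresses(inName, outName)
with open(outName) as f:
    result = f.read()
print(result)
```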

Operating system command shells
- Most operating systems provide a command line interface
    - it is commonly called a shell, since it may be thought of as containing much of the operating system's power
    - shells provide an alternative to a GUI (Graphical User Interface) for many operations
    - some things can be done only via a shell, or more conveniently than via a GUI
    - shells make it much easier to automate many operations, especially when dealing with files
- The shell repeatedly prints a prompt, reads command line text, interprets certain special characters, and invokes the indicated command with the given arguments
    - a Windows CoMmanD shell may be started with Start > Run > cmd
    - command line syntax: <command name> <argument> …
        - here <argument> … means zero or more arguments separated by spaces
- Some common Windows commands
    - change directory: cd <directory name>
    - output contents of file: type <file name>
    - list information about files: dir <file name> …
        - with no file name, lists information about all (non-hidden) files in the current directory
- Applications are programs that can be run as shell commands

Python applications
- (this is not in your text)
- Python programs can be invoked as applications
    - command line syntax: python <Python file> <argument> …
- Arguments are passed to the program as a list of strings in sys.argv (variable argv of module sys)
    - this often eliminates the need to prompt for information
    - sys.argv[0] is the Python file name from the command line
    - the v in argv stands for vector, another name for a list or array
    - module sys contains a lot of other "system" related stuff
- Python applications, like Python statements, do not return a value
    - actually, an exit status (error code) is returned to the shell, but it is usually ignored
- Example: tsvToCsvApp.py takes file names as arguments, instead of prompting for them
    - try the command: python tsvToCsvApp.py address.tsv address.csv
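To see exactly what a program receives, here is a small sketch that lists its command line arguments. Put it in a file (say argvDemo.py, a hypothetical name) and try python argvDemo.py address.tsv 42: note that 42 arrives as the string '42', not as a number. The formatting is kept in a separate function, describeArgs (also our name), so it can be tried from the interpreter without a shell.

```python
import sys

def describeArgs(argv):
    """Return one line of text per entry in an argv-style list of strings."""
    lines = ['program: ' + argv[0]]            # argv[0] is the Python file name
    for i in range(1, len(argv)):
        lines.append('argument ' + str(i) + ': ' + argv[i])
    return lines

def main():
    for line in describeArgs(sys.argv):
        print(line)

main()
```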

Exercise
- Write a Python application that prints the sum of its arguments
C:\home\202\src>python sum.py 1 2 3.3
6.3
- Answer
import sys

def main():
    total = 0                   # not named sum, which would hide the built-in function sum
    for arg in sys.argv[1:]:    # skip sys.argv[0], the file name
        total += float(arg)     # arguments arrive as strings
    print total

main()
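One common refinement, sketched here as an option rather than the expected answer: move the arithmetic into its own function, sumArgs (our name), so it can be tested from the interpreter, and call main only when the file is run as an application. Python sets the variable __name__ to '__main__' in that case, but not when the file is imported as a module.

```python
import sys

def sumArgs(args):
    """Return the sum of a list of numeric strings, such as sys.argv[1:]."""
    total = 0.0
    for arg in args:
        total += float(arg)    # each argument arrives as a string
    return total

def main():
    print(sumArgs(sys.argv[1:]))

if __name__ == '__main__':    # true when run as an application, false when imported
    main()
```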

Digression: tsv and csv formats
- Data in tab-separated-value (tsv) format is common
    - each line is a data record
    - fields (values) in each record are separated by tabs
    - example: address.tsv
Sue Smith       10 Arbor Way    Pleasant View   NY
John Jones      1934 Main St.   Summerville     TX
- What is the advantage of this format?
- Answer: data appears in nice columns if tabs are set appropriately
- What is a disadvantage of this format?
- Answer: data appearance is a mess if tabs are not set appropriately
    - tabs may even look like spaces
- Common alternative: comma-separated-value (csv) format
    - uses commas instead of tabs
- What if commas, or tabs, can be part of field values?
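To see why that question matters, here is a sketch that converts one tsv record to csv by splitting on tabs and joining with commas. The record is a made-up example whose first field contains a comma.

```python
# a hypothetical tsv record whose first field contains a comma
line = 'Martin Luther King, Jr.\tAtlanta\tGA\n'

fields = line.rstrip('\n').split('\t')   # 3 fields
csvLine = ','.join(fields)               # 'Martin Luther King, Jr.,Atlanta,GA'
print(csvLine.split(','))                # splitting it again gives 4 fields, not 3

# quoting each field keeps the comma inside the name
quotedLine = ','.join('"' + field + '"' for field in fields)
print(quotedLine)                        # "Martin Luther King, Jr.","Atlanta","GA"
```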

File processing application example
- Application tsvToCsv1.py converts files from tsv to csv format
    - example: m:\>python tsvToCsv1.py address.tsv address.csv
- Problem: what if the file is too big to fit in memory (twice: once as tsv text and once as csv text)?
    - solution: read and write a line, or even a character, at a time
        - a bit more complicated to program and requires conditional tests
- Problem: what if a name contains a comma?
    - fairly common, for example: Martin Luther King, Jr.
    - common solution: put double quotes around field values
        - assumes field values do not contain double quotes
    - exercise: modify tsvToCsv1.py to quote fields
        - example: "Sue Smith","10 Arbor Way","Pleasant View","NY"
        - solution: tsvToCsv.py
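A sketch of the line-at-a-time approach combined with quoting. Iterating over a file object with for yields one line at a time, so only the current line needs to be in memory. To keep the example self-contained it uses io.StringIO objects, which behave like text files, in place of real files; it therefore assumes Python 3, where io.StringIO accepts ordinary strings. Like tsvToCsv.py, it assumes the fields contain no double quotes. The function name tsvToCsvStream and the second sample record are made up.

```python
import io

def tsvToCsvStream(inFile, outFile):
    """Copy tsv records from inFile to outFile as quoted csv, one line at a time."""
    for line in inFile:                          # one line per iteration
        fields = line.rstrip('\r\n').split('\t')
        outFile.write('"' + '","'.join(fields) + '"\n')

source = io.StringIO('Sue Smith\t10 Arbor Way\tPleasant View\tNY\n'
                     'Martin Luther King, Jr.\t1 Main St.\tAtlanta\tGA\n')
target = io.StringIO()
tsvToCsvStream(source, target)
print(target.getvalue())
```

With real files you would pass open(inputFileName) and open(outputFileName, 'w') instead, and close both afterwards.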

 

lab3.py

import sys

def multTable():
    for n in range(1, 10):
        for m in range(1, 10):
            print '%3d' % (n * m),
        print

def numbersToStrings(numList):
    strList = []
    for n in numList:
        strList += [str(n)]
    return strList

startTitleTag = '<title>'
endTitleTag = '</title>'

def getTitle(string):
    lowerString = string.lower()
    start = lowerString.find(startTitleTag) + len(startTitleTag)
    end = lowerString.find(endTitleTag)
    return string[start : end]


tmp.py

def main():
    inFile = open('usernames', 'r')
    outFile = open('iuaddresses', 'w')
    for line in inFile.readlines():
        outFile.write(line.strip() + '@indiana.edu\n')
    inFile.close()
    outFile.close()

main()

tsvToCsv.py

'''
Usage: python tsvToCsv.py <tsv file> <csv file>

Convert the file named <tsv file>, in tab-separated-value (tsv) format, to a new
one named <csv file>, in comma-separated-value (csv) format.
Comma-separated values are enclosed in double quotes,
which are assumed not to be in the original file.
'''

import sys

def main():
    inputFileName, outputFileName = sys.argv[1:]

    inputFile = open(inputFileName)
    tsvTextLines = inputFile.readlines()
    inputFile.close()

    outputFile = open(outputFileName, 'w')
    for line in tsvTextLines:
        # drop the trailing newline (line[:-1] would cut a real character from a
        # last line without one), then separate fields with "," instead of tabs
        line = line.rstrip('\n').replace('\t', '","')
        outputFile.write('"' + line + '"\n')
    outputFile.close()

main()
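Python's standard csv module (not covered in your text) can do the quoting for you. In the sketch below, csv.reader splits tab-delimited lines (quoting=csv.QUOTE_NONE makes it treat double quotes in the tsv text as ordinary characters), and csv.writer with quoting=csv.QUOTE_ALL quotes every field and doubles any double quote inside a field, so the no-quotes assumption above is no longer needed. It is a minimal Python 3 sketch that works on strings via io.StringIO; the function name tsvToCsvText is ours. With real files, the csv documentation recommends opening them with newline=''.

```python
import csv
import io

def tsvToCsvText(tsvText):
    """Convert tab-separated text to comma-separated text with every field quoted."""
    reader = csv.reader(io.StringIO(tsvText), delimiter='\t', quoting=csv.QUOTE_NONE)
    output = io.StringIO()
    # lineterminator='\n' matches the other examples (the writer's default is '\r\n')
    writer = csv.writer(output, quoting=csv.QUOTE_ALL, lineterminator='\n')
    for row in reader:        # each row is a list of field strings
        writer.writerow(row)
    return output.getvalue()

print(tsvToCsvText('Sue Smith\t10 Arbor Way\tPleasant View\tNY\n'))
```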

tsvToCsv1.py

'''
Usage: python tsvToCsv1.py <tsv file> <csv file>

Convert the file named <tsv file>, in tab-separated-value (tsv) format, to a new
one named <csv file>, in comma-separated-value (csv) format.
'''

import sys

def main():
    inputFileName, outputFileName = sys.argv[1:]

    inputFile = open(inputFileName)
    tsvText = inputFile.read()
    inputFile.close()

    csvText = tsvText.replace('\t', ',')

    outputFile = open(outputFileName, 'w')
    outputFile.write(csvText)
    outputFile.close()

main()

usernamesToIUAddresses.py

'''
Read a file named usernames with one line per username and output a
corresponding file named iuaddresses containing the corresponding IU email
addresses.
'''

def main():
    inFile = open('usernames', 'r')
    outFile = open('iuaddresses', 'w')

    for line in inFile.readlines():
        outFile.write(line.strip() + '@indiana.edu\n')

    inFile.close()
    outFile.close()

main()