Week 4
Files and Applications
Computer Science A202 / A598 and Informatics I211
This week's success strategy
- A leading scientific study examined a great many factors to see what contributed most to student success in college. The three leading factors were:
  - Time on task: OK, so this one is obvious. But it's hard to put in time studying if you're not enjoying it because you're not succeeding, and you won't succeed if you're not studying effectively and taking advantage of the other factors contributing to success.
  - Studying with other students: Many students rate their social life as the most valuable part of their college experience, but they often fail to extend it to their studies. Studying together can be more fun, and more productive too. You can really relate to each other's learning needs. And when you help a friend, the material becomes more lively in your own mind.
  - Involvement with faculty: Most faculty wish students would seek their help more often. They have a deeper and broader perspective on the material and the learning process, and an enthusiasm for the material that they are eager to share with students. Naturally you learn more, and faculty enjoy working with you more, if you make a good effort to learn what you reasonably can by reading and attempting exercises on your own. Then please do come with any questions you are stuck on.
- This and other strategies are on the course web strategies page!

This week
- Problem solving
- Simple file I/O
- Writing and using Python applications
  - not covered by your text
- Parameter passing

Problem solving
- STAIR steps to problem solving
  - State the problem (understand it thoroughly)
  - Review the Tools you have to solve the problem
    - do this in conjunction with the next two steps as needed
    - some tools you should commit to memory, like much of appendix A
    - others you look up in books or online documentation as needed
  - Devise an Algorithm
    - a step-by-step procedure to solve the problem
    - may be expressed in pseudo-code (structured English)
  - Implement your algorithm
  - Test and Refine your solution as necessary
    - return to previous steps as necessary
- Try to think of problem solving as a puzzle, not a struggle!

Problem solving example: multiplication table
- Problem: print a multiplication table

      1   2   3   4   5   6   7   8   9
      2   4   6   8  10  12  14  16  18
      3   6   9  12  15  18  21  24  27
      4   8  12  16  20  24  28  32  36
      5  10  15  20  25  30  35  40  45
      6  12  18  24  30  36  42  48  54
      7  14  21  28  35  42  49  56  63
      8  16  24  32  40  48  56  64  72
      9  18  27  36  45  54  63  72  81

- Pseudo-code
  - for each line of the table
    - for each column of the table
      - print the product corresponding to the row and column (right-justified in 4 columns)
- Solution

    def multTable():
        for n in range(1, 10):
            for m in range(1, 10):
                print '%3d' % (n * m),
            print

Problem solving example: getTitle
- Problem: get the title from HTML text

    >>> getTitle("<html><head><title>Elmer's Wonderful Web</title></head>")
    "Elmer's Wonderful Web"

- Tools: the string methods lower and find
- Pseudo-code
  - store a lower-case copy of the string in a local variable
  - in the copy, find the start index of the title: the index just past the end of the <title> tag
  - in the copy, find the end index of the title: the index where the </title> tag begins
  - return the substring of the original string between the start and end indices (the title)
- Solution

    startTitleTag = '<title>'
    endTitleTag = '</title>'

    def getTitle(string):
        lowerString = string.lower()
        start = lowerString.find(startTitleTag) + len(startTitleTag)
        end = lowerString.find(endTitleTag)
        return string[start : end]
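One weakness of this solution: if the text contains no <title> tag, find returns -1, so start and end are computed from -1 and the slice can return a meaningless substring. A minimal sketch of a more defensive variant follows. The name getTitleOrNone and the choice to return None when either tag is missing are illustrative assumptions, not part of the original problem; the code is written to run under both Python 2 and Python 3.

```python
startTitleTag = '<title>'
endTitleTag = '</title>'

def getTitleOrNone(string):
    """Return the text between <title> and </title>, or None if either tag is missing.

    Tags are matched case-insensitively by searching a lower-case copy,
    but the title is sliced from the original string so its capitalization
    is preserved.
    """
    lowerString = string.lower()
    tagStart = lowerString.find(startTitleTag)
    if tagStart == -1:                        # no opening tag at all
        return None
    start = tagStart + len(startTitleTag)     # index just past <title>
    end = lowerString.find(endTitleTag, start)  # search only after the opening tag
    if end == -1:                             # opening tag but no closing tag
        return None
    return string[start:end]

print(getTitleOrNone("<html><head><title>Elmer's Wonderful Web</title></head>"))
print(getTitleOrNone("<html><body>no title here</body></html>"))
```

Returning None rather than an empty string lets a caller tell "no title" apart from an empty title such as <title></title>.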

Problem solving example: numbersToStrings
- Problem: convert a list of numbers to a list of strings

    >>> numbersToStrings([1, 2, 3.5])
    ['1', '2', '3.5']

- Pseudo-code
  - initialize an accumulator to the empty list
  - for each element of the number list
    - convert the number to a string and append it to the accumulator
  - return the accumulated list
- Solution

    def numbersToStrings(numList):
        strList = []
        for n in numList:
            strList += [str(n)]
        return strList
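The accumulator pattern above works for any loop and is worth mastering. Python also has more compact ways to build the same list; the sketch below shows two of them, a list comprehension and the built-in function map. The function names are hypothetical, and map is wrapped in list() so the result is a list under Python 3 as well as Python 2.

```python
def numbersToStringsComprehension(numList):
    # Build a new list holding str(n) for each n, in the original order
    return [str(n) for n in numList]

def numbersToStringsMap(numList):
    # map applies str to every element; list() is needed under Python 3,
    # where map returns an iterator rather than a list
    return list(map(str, numList))

print(numbersToStringsComprehension([1, 2, 3.5]))  # ['1', '2', '3.5']
print(numbersToStringsMap([1, 2, 3.5]))            # ['1', '2', '3.5']
```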

Problem solving exercise: reverse
- Write a function reverse that takes a list and returns a new list with the same elements, but in reverse order

    >>> reverse([1, 2, 3])
    [3, 2, 1]

- A solution

    def reverse(lst):
        newList = []
        for i in range(len(lst) - 1, -1, -1):
            newList.append(lst[i])
        return newList
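Once your own loop works, it is worth knowing the built-in shortcuts. A brief sketch of two other ways to get the same result, with hypothetical function names; both return a new list and leave the argument unchanged, unlike the list method reverse(), which reverses a list in place and returns None.

```python
def reverseBySlice(lst):
    # A slice with step -1 runs from the last element back to the first
    return lst[::-1]

def reverseByBuiltin(lst):
    # reversed() yields the elements in reverse order; list() collects them
    return list(reversed(lst))

original = [1, 2, 3]
print(reverseBySlice(original))    # [3, 2, 1]
print(reverseByBuiltin(original))  # [3, 2, 1]
print(original)                    # [1, 2, 3] (unchanged)
```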

What's a file, and why have files?
- A file is just a sequence of bytes
  - file input does not involve data type checking
    - file name extensions often suggest the type of data in the file
      - examples: paper.doc, notes.txt, and program.exe
  - the results are meaningless if a file is not read in a way that is consistent with the way it was written
- Files are usually stored on disk
  - disk storage is persistent
    - data on disk is not lost when the power is turned off
    - data on disk is usually not lost when a system "crashes"
    - most computer "main" memory is volatile (not persistent)
  - disk storage is much less expensive per byte than computer memory
    - disk storage is much larger than main memory
    - example: 512MB of memory for $60 vs. a 200GB disk for $90, that is, about $120 per GB of memory vs. $0.45 per GB of disk, so this disk is more than 250 times cheaper per byte than this memory
  - disks are slower than memory. How much slower?
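The price comparison can be checked with a few lines of arithmetic. This sketch uses only the example prices given above, assumes 1 GB = 1024 MB, and computes the cost per GB of each, which shows that it is the disk that is cheaper per byte:

```python
# Example prices from the slide (not current prices); assumes 1 GB = 1024 MB
memoryMB, memoryPrice = 512, 60.0   # 512MB of memory for $60
diskGB, diskPrice = 200, 90.0       # 200GB disk for $90

memoryPerGB = memoryPrice / (memoryMB / 1024.0)  # $60 / 0.5 GB
diskPerGB = diskPrice / diskGB                  # $90 / 200 GB
ratio = memoryPerGB / diskPerGB                 # how many times cheaper the disk is per byte

print('memory: $%.2f per GB' % memoryPerGB)     # memory: $120.00 per GB
print('disk: $%.2f per GB' % diskPerGB)         # disk: $0.45 per GB
print('disk is about %d times cheaper per byte' % int(round(ratio)))
```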

How fast, or slow?
- Disk random access is about a million times slower than memory!!
  - random access: reaching any (randomly chosen) element in the same amount of time
  - disks are not perfectly random access
    - the access time depends on which track the data is on, and even on where the disk is in its rotation
  - once the data is reached, disk data transfer can be pretty fast, so data is often moved in blocks of a few thousand bytes at a time
  - disk "pseudo-random" access times are tens of milliseconds
  - memory access times are tens of nanoseconds
- Digression: small units of time
  - millisecond: a thousandth (10**-3) of a second
  - microsecond: a millionth (10**-6) of a second
  - nanosecond: a billionth (10**-9) of a second
    - light travels about a foot in a nanosecond

File objects
- Before reading or writing a file, it is necessary to open it
  - this creates a file object that is used for file I/O
  - the file object keeps track of such things as
    - your position in the file
    - whether you are allowed to read or write the file
    - data you have recently written that may not yet be saved to disk
  - on some operating systems, it also locks the file on disk so other programs cannot use it while it is open
- When you are done with a file, you should close it
  - this writes any unsaved data to disk
  - it unlocks the file, if it was locked, so others can use it
  - it releases an operating system resource associated with each open file, of which the whole system has only a limited supply
  - closing is supposedly done automatically when your program terminates, but don't count on it, and that may not be soon enough

Opening and closing files
- Function open(<file name>[, <mode>])
  - returns a file object
  - both arguments are strings
  - the optional mode is a string indicating how the file will be used
    - 'r' for reading the file (the default)
    - 'w' for writing the file (first discarding any existing contents of a file with the same name)
    - 'a' for appending (writing, starting at the end of an existing file)
    - a few other possibilities we won't go into
- When done with an open file object, close it using a close() method call
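The modes described above can be seen in action with a short script. A sketch, assuming a scratch file in the system's temporary directory (the file name week4_modes_demo.txt is made up for this demo); it uses only open, write, read, and close, and runs under both Python 2 and Python 3:

```python
import os
import tempfile

# Scratch file in the temporary directory; the name is only for this demo
fileName = os.path.join(tempfile.gettempdir(), 'week4_modes_demo.txt')

outFile = open(fileName, 'w')   # 'w': create the file, or empty an existing one
outFile.write('first line\n')
outFile.close()

outFile = open(fileName, 'a')   # 'a': keep the existing contents and write at the end
outFile.write('second line\n')
outFile.close()

inFile = open(fileName)         # no mode given, so the file is opened for reading ('r')
contents = inFile.read()
inFile.close()
print(repr(contents))           # 'first line\nsecond line\n'

outFile = open(fileName, 'w')   # 'w' again: the old contents are discarded
outFile.write('only line\n')
outFile.close()

inFile = open(fileName, 'r')
finalContents = inFile.read()
inFile.close()
print(repr(finalContents))      # 'only line\n'

os.remove(fileName)             # clean up the scratch file
```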

"File variable" terminology in your text is misleading
- File method calls are messages to a file object, not to a variable
  - even though file objects are usually stored in variables
- Sometimes it is important to know the difference between a variable and the value it contains
  - for example, all variables are mutable, but they may contain objects that are not mutable
    - you can assign to a variable containing a string, but you cannot change the string itself
  - multiple variables may contain the same object
    - this is called aliasing: more on this in chapter 6

Some file object methods
- read() returns the entire remaining contents of the file in a string
  - "the file" is of course the object (or "target") of the method call
- readline() returns the next line of the file in a string
  - the newline character is included at the end of the string
    - on Microsoft operating systems, if a return character just precedes a newline character in the file, it is ignored (not placed in the string)
  - note that it is not named readLine
- readlines() returns a list of the remaining lines of the file
- write(string) writes the contents of the string to the file
  - to write a line, there must be a newline character at the end of the string
- close() closes the file and returns nothing
  - it is an error to read or write a closed file

File processing example
- Problem: read a file named usernames with one line per username and output a corresponding file named iuaddresses containing the corresponding IU email addresses.
- Solution:

    def main():
        inFile = open('usernames', 'r')
        outFile = open('iuaddresses', 'w')
        for line in inFile.readlines():
            outFile.write(line.strip() + '@indiana.edu\n')
        inFile.close()
        outFile.close()

    main()

Operating system command shells
- Most operating systems provide a command line interface
  - such an interface is commonly called a shell, since it may be thought of as containing much of the operating system's power
  - shells provide an alternative to a GUI (Graphical User Interface) for many operations
  - some things can be done only via a shell, or more conveniently than via a GUI
  - shells make it much easier to automate many operations, especially when dealing with files
- The shell repeatedly prints a prompt, reads command line text, interprets certain special characters, and invokes the indicated command with the given arguments
  - a Windows CoMmanD shell may be started with Start > Run > cmd
  - command line syntax: <command name> <argument> …
    - here <argument> … means zero or more arguments separated by spaces
- Some common Windows commands
  - change directory: cd <directory name>
  - output the contents of a file: type <file name>
  - list information about files: dir <file name> …
    - lists information about all (non-hidden) files in the current directory if no file name is given
- Applications are programs that can be run as shell commands

Python applications
- (this is not in your text)
- Python programs can be invoked as applications
  - command line syntax: python <Python file> <argument> …
- Arguments are passed to the program as a list of strings in sys.argv (variable argv of module sys)
  - this often eliminates the need to prompt for information
  - sys.argv[0] is the Python file name from the command line
  - the v in argv stands for vector, another name for a list or array
  - module sys contains a lot of other "system" related stuff
- Python applications, like Python statements, do not return a value
  - actually, an error code (exit status) may be returned to the shell, but it is usually ignored
- Example: tsvToCsvApp.py takes file names as arguments, instead of prompting for them
  - try the command: python tsvToCsvApp.py address.tsv address.csv

Exercise
- Write a Python application that prints the sum of its arguments

    C:\home\202\src>python sum.py 1 2 3.3
    6.3

- Answer

    import sys

    def main():
        total = 0  # named total to avoid hiding the built-in function sum
        for arg in sys.argv[1:]:
            total += float(arg)
        print total

    main()

Digression: tsv and csv formats
- Data in tab-separated-value (tsv) format is common
  - each line is a data record
  - the fields (values) in each record are separated by tabs
  - example: address.tsv

        Sue Smith    John Jones    1934

- What is the advantage of this format?
  - Answer: data appears in nice columns if tabs are set appropriately
- What is a disadvantage of this format?
  - Answer: data appearance is a mess if tabs are not set appropriately
    - tabs may even look like spaces
- Common alternative: comma-separated-value (csv) format
  - uses commas instead of tabs
- What if commas, or tabs, can be part of field values?

File processing application example
- Application tsvToCsv1.py converts files from tsv to csv format
  - example: m:\>python tsvToCsv1.py address.tsv address.csv
- Problem: what if the file is too big to fit in memory (twice!, once as the tsv text and again as the converted csv text)?
  - solution: read and write a line, or even a character, at a time
    - a bit more complicated to program, and requires conditional tests
- Problem: what if a name contains a comma?
  - fairly common, for example: Martin Luther King, Jr.
  - common solution: put double quotes around field values
    - assumes field values do not contain double quotes
  - exercise: modify tsvToCsv1.py to quote fields
    - example: "Sue Smith","
    - solution: tsvToCsv.py
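One way to approach the quoting exercise is to convert a single line at a time, which also avoids holding the whole file in memory. A minimal sketch, assuming (as the slide does) that no field value contains a double quote; the function name tsvLineToCsv is hypothetical, and this is not the posted tsvToCsv.py solution:

```python
def tsvLineToCsv(line):
    """Convert one line of tsv text to a csv line with every field in double quotes.

    Assumes no field value contains a double quote character.
    """
    line = line.rstrip('\r\n')                     # remove the line ending, if any
    fields = line.split('\t')                      # tabs separate the fields
    quoted = ['"' + field + '"' for field in fields]
    return ','.join(quoted) + '\n'                 # commas separate the quoted fields

print(repr(tsvLineToCsv('Sue Smith\tMartin Luther King, Jr.\n')))
# '"Sue Smith","Martin Luther King, Jr."\n'
```

For real data, Python's standard csv module does this job: csv.writer with quoting=csv.QUOTE_ALL quotes every field and also handles embedded double quotes by doubling them.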

    import sys

    def multTable():
        for n in range(1, 10):
            for m in range(1, 10):
                print '%3d' % (n * m),
            print

    def numbersToStrings(numList):
        strList = []
        for n in numList:
            strList += [str(n)]
        return strList

    startTitleTag = '<title>'
    endTitleTag = '</title>'

    def getTitle(string):
        lowerString = string.lower()
        start = lowerString.find(startTitleTag) + len(startTitleTag)
        end = lowerString.find(endTitleTag)
        return string[start : end]

    def main():
        inFile = open('usernames', 'r')
        outFile = open('iuaddresses', 'w')
        for line in inFile.readlines():
            outFile.write(line.strip() + '@indiana.edu\n')
        inFile.close()
        outFile.close()

    main()
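Because multTable prints its output, it is awkward to check automatically. A sketch of a variant that builds the same rows and returns them as one string instead; the name multTableString and its size parameter are hypothetical additions. Each product is formatted with '%3d' and the fields are joined with single spaces, which reproduces the spacing that multTable's print statements produce.

```python
def multTableString(size=9):
    """Return a size-by-size multiplication table as a single string.

    Each product is right-justified in a 3-character field, and fields are
    separated by single spaces, like the output of multTable.
    """
    rows = []
    for n in range(1, size + 1):            # for each line (row) of the table
        fields = []
        for m in range(1, size + 1):        # for each column of the table
            fields.append('%3d' % (n * m))  # right-justify the product
        rows.append(' '.join(fields))
    return '\n'.join(rows)

print(multTableString())
```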

    '''
    Usage: python tsvToCsv.py followed by the input and output file names

    Convert the input file, in tab-separated-value (tsv) format, to a new
    output file in comma-separated-value (csv) format.
    '''

    import sys

    def main():
        inputFileName, outputFileName = sys.argv[1:]
        inputFile = open(inputFileName)
        tsvText = inputFile.read()
        inputFile.close()
        csvText = tsvText.replace('\t', ',')
        outputFile = open(outputFileName, 'w')
        outputFile.write(csvText)
        outputFile.close()

    main()
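tsvToCsv.py reads the whole file with read() and then builds a second string with replace, so both copies must fit in memory at once. A sketch of the line-at-a-time alternative; the file name tsvToCsvLines.py and the function name convert are hypothetical. Like tsvToCsv.py it only replaces tabs with commas and does no quoting, and instead of failing it prints a usage message when it is not given exactly two file names.

```python
import sys

def convert(inputFileName, outputFileName):
    """Copy a tsv file to a csv file one line at a time.

    Only one line is held in memory at a time, so this works for files
    too big to read with a single read() call.
    """
    inputFile = open(inputFileName, 'r')
    outputFile = open(outputFileName, 'w')
    for line in inputFile:                 # a file object yields one line per iteration
        outputFile.write(line.replace('\t', ','))
    inputFile.close()
    outputFile.close()

def main():
    args = sys.argv[1:]
    if len(args) != 2:
        print('Usage: python tsvToCsvLines.py <input file> <output file>')
        return
    convert(args[0], args[1])

main()
```

For example: python tsvToCsvLines.py address.tsv address.csv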

    '''
    Read a file named usernames with one line per username and output a
    corresponding file named iuaddresses containing the corresponding IU
    email addresses.
    '''

    def main():
        inFile = open('usernames', 'r')
        outFile = open('iuaddresses', 'w')
        for line in inFile.readlines():
            outFile.write(line.strip() + '@indiana.edu\n')
        inFile.close()
        outFile.close()

    main()
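The solution above closes both files explicitly, as the slides recommend. Python 2.7 and later (including Python 3) also let a with statement close files automatically, even if an error occurs part way through. A sketch under those assumptions; the function name writeAddresses and its domain parameter are hypothetical, and skipping blank lines is a small change from the original, which would write a bare @indiana.edu for an empty line.

```python
def writeAddresses(inFileName, outFileName, domain='@indiana.edu'):
    """Write one email address per username, reading usernames one line at a time."""
    # The with statement closes both files when the block ends,
    # even if an error occurs part way through
    with open(inFileName, 'r') as inFile, open(outFileName, 'w') as outFile:
        for line in inFile:               # iterate directly; no readlines() list needed
            userName = line.strip()
            if userName:                  # skip blank lines (a change from the original)
                outFile.write(userName + domain + '\n')
```

For a file with no blank lines, writeAddresses('usernames', 'iuaddresses') produces the same output as main() above.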