References to objects, not objects themselves, are stored in variables
and passed around
the references are actually memory addresses, but Python programmers
can never see them this way
Every time a list literal is evaluated, including the empty list [], a new
list is created
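A small sketch of the point about list literals. The names first, second, and make_empty are made up for this illustration:

```python
# Each evaluation of a list literal builds a brand-new list object.
first = []
second = []

same_value = (first == second)    # True: both lists are empty
same_object = (first is second)   # False: they are two distinct objects


def make_empty():
    # The literal is evaluated again on every call,
    # so every call returns a different list.
    return []


x = make_empty()
y = make_empty()
x.append(1)   # changes only the list that x refers to

print(x)   # [1]
print(y)   # []
```

Comparing with == asks whether the contents are equal; comparing with is asks whether two references point to the same object.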
Picturing aliasing
Picture references as arrows to the object referred to, so a list stored
"in" a variable is pictured as an arrow from the variable box to a
representation of the list as a series of adjacent boxes (the
list elements, which contain the list values, which may themselves be arrows
pointing to other lists)
There may be many references to (think "arrows pointing to") the same
object
when an object is mutated, every arrow pointing to the object "sees"
the change
When there is more than one reference to the same object, the references
are all "aliases" for the same object, so this situation is called
aliasing
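Aliasing can be seen directly in a few lines. This is only a sketch; the variable names are arbitrary:

```python
a = [1, 2, 3]
b = a            # b is an alias: a second arrow to the same list
c = [1, 2, 3]    # a separate list that happens to have equal contents

b.append(4)      # mutate the object through one alias

print(a)         # [1, 2, 3, 4]  (a "sees" the change made through b)
print(c)         # [1, 2, 3]     (c is a different object)

# Elements of a list are references too, so they can be aliases as well.
nested = [a, a]
nested[0].append(5)
print(nested[1])   # [1, 2, 3, 4, 5]
```

Assignment copies the arrow, not the list, which is why b.append(4) is visible through a.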
What's in a file?
A file is a sequence of bytes.
A byte is 8 bits, representing a character, number, or part of a
multi-byte representation of a character, number, or other data.
in this course, they will always represent characters
File input does not involve data type checking in most operating systems.
File name extensions often suggest the type of data in the file.
examples: paper.doc, notes.txt, and program.exe
You get meaningless results if files are not read in a way that is consistent
with the way they were written.
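To illustrate reading a file inconsistently with how it was written, the sketch below writes one number as four raw bytes and then reads those bytes back as characters. It goes beyond what these notes use: the struct module and the binary modes 'wb' and 'rb' are assumptions chosen for the demonstration, and open is used in place of file so the code also runs under Python 3.

```python
import os
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'number.bin')

# Write one number as 4 raw bytes (big-endian unsigned integer).
out_file = open(path, 'wb')
out_file.write(struct.pack('>I', 1094861636))
out_file.close()

# Read the same 4 bytes back.
in_file = open(path, 'rb')
raw = in_file.read()
in_file.close()

# Consistent with how the file was written: we get the number back.
as_number = struct.unpack('>I', raw)[0]

# Inconsistent: treating the same bytes as characters gives 'ABCD'.
as_text = raw.decode('ascii')

print(as_number)   # 1094861636
print(as_text)     # ABCD
```

Nothing in the file says which reading is right; the bytes are the same either way.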
Why have files?
Files are persistent.
traditionally stored on disk, but increasingly in flash memory
persistent storage is not lost when the power is turned off
persistent data is usually not lost when a system "crashes"
most computer "main" memory is volatile (not persistent)
Disk storage is much less expensive per byte than computer memory.
disk storage is much larger than main memory
example: 1GB memory for $50 vs 500GB disk for $100, so this disk
is about 250 times cheaper per byte
disks are slower than random access memory (RAM). How much?
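The price comparison in the example above can be checked with a few lines of arithmetic. Prices are kept in cents so everything stays in whole numbers; the variable names are made up for this sketch:

```python
# Figures from the example: 1GB of memory for $50, 500GB of disk for $100.
memory_cents = 50 * 100
memory_gb = 1
disk_cents = 100 * 100
disk_gb = 500

memory_cents_per_gb = memory_cents // memory_gb   # 5000 cents, i.e. $50 per GB
disk_cents_per_gb = disk_cents // disk_gb         # 20 cents, i.e. $0.20 per GB

# How many times more expensive memory is than disk, per unit of storage.
ratio = memory_cents_per_gb // disk_cents_per_gb
print(ratio)   # 250
```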
How fast, or slow?
Disk random access is about a million times slower than memory!
random access: reaching any (randomly chosen) element in the same
amount of time
disks are not perfectly random access
disk "pseudo-random" access times are several milliseconds
memory access times are several nanoseconds
Digression: small units of time
millisecond: a thousandth (10**-3) of a second
microsecond: a millionth (10**-6) of a second
nanosecond: a billionth (10**-9) of a second
light travels about a foot in a nanosecond
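The "million times slower" figure follows from the units. In the sketch below, "several" is assumed to mean 5 in both cases, purely for illustration; the speed of light and the length of a foot are the standard defined values.

```python
NS_PER_MS = 10**6   # nanoseconds in one millisecond

# Assumed illustrative figures: "several" is taken to be 5.
disk_access_ns = 5 * NS_PER_MS   # 5 milliseconds, expressed in nanoseconds
memory_access_ns = 5             # 5 nanoseconds

slowdown = disk_access_ns // memory_access_ns
print(slowdown)   # 1000000

# Distance light travels in one nanosecond, in feet.
SPEED_OF_LIGHT_M_PER_S = 299792458
METERS_PER_FOOT = 0.3048
feet_per_ns = SPEED_OF_LIGHT_M_PER_S * 1e-9 / METERS_PER_FOOT
print(feet_per_ns)   # a little under 1 foot
```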
Example: get_file
def get_file(file_name):
    """Return the contents of the file in a string.
    >>> print get_file('stats_data.txt')
    1
    3
    2
    <BLANKLINE>
    >>>
    """
    infile = file(file_name, 'r')
    text = infile.read()
    infile.close()
    return text
Note how blank output lines are indicated when transcripts are embedded
in documentation strings
File objects and opening files
Before reading or writing a file it is necessary to open it
this creates a file object that is used for all file Input/Output (I/O)
the file object keeps track of such things as
your position in the file
whether you are allowed to read or write the file
data buffers that may store data in memory before it is read or
written
opening a file usually locks it so other programs cannot use it while
it is open
Opening files in Python
Use the built-in function file(file_name,mode) to open a file
returns a file object
both arguments are strings
the mode is a string indicating how the file will be used
'r' for reading the file (the default)
'w' for writing the file (first removing any existing file of the
same name)
a few other possibilities we won't go into
When a file name does not include any directory information, the file
must be in the "current directory", which is often (but not always) the
directory in which the program itself is stored.
FYI: the file name may be a "path", containing directory information,
in which case the file may be in a directory other than the current one
file is the same as the older open function used in your text
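A short sketch of the two modes. It uses open rather than file (the notes treat them as the same) so that it also runs under Python 3; the temporary directory and the file name notes.txt are assumptions made for the demonstration:

```python
import os
import tempfile

# A path: directory information plus a file name.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

f = open(path, 'w')        # 'w' creates the file
f.write('first version\n')
f.close()

f = open(path, 'w')        # 'w' again: the existing contents are discarded
f.write('second version\n')
f.close()

f = open(path)             # mode omitted, so it defaults to 'r'
contents = f.read()
f.close()

print(contents)            # only the second version is left
```

Opening an existing file with 'w' is destructive as soon as the file is opened, even before anything is written.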
Closing files
When you are done with a file, you should close it, which
writes to disk any unsaved data in buffers
unlocks it so others can use it
releases some limited operating system memory associated with the open file
there is a limit on the total number of open files in the entire
operating system
Closing open files is done automatically when your program terminates,
but sometimes that is not soon enough
File object read methods
read() method returns the entire remaining contents of the file (the
object of the call) in a string.
readline() method returns the next line of the file in a string.
the newline character at the end of the line is included at the end of
the string
unless it is the last line and the file does not end with a newline
if there are no more characters, the empty string is returned
this is called the end-of-file, or EOF, condition
There are a few more we won't use.
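The behavior of read() and readline() described above can be tried on a small file. This sketch uses open in place of file so it also runs under Python 3; the file name and contents are made up:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'stats_data.txt')

# Make a three-line file whose last line has no newline at the end.
f = open(path, 'w')
f.write('1\n3\n2')
f.close()

f = open(path, 'r')
line1 = f.readline()    # '1\n'   the newline is included
rest = f.read()         # '3\n2'  everything that remains
at_eof = f.readline()   # ''      nothing left: the EOF condition
f.close()

# The usual loop: call readline() until it returns the empty string.
lines = []
f = open(path, 'r')
while True:
    line = f.readline()
    if line == '':      # EOF
        break
    lines.append(line)
f.close()

print(lines)   # ['1\n', '3\n', '2']
```

A blank line in the middle of a file is read as '\n', not '', so the empty string is an unambiguous end-of-file signal.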
File object write and close methods
write(string) method writes the contents of the string to the file.
to write a line there must be a newline (\n) character at the end of
the string
close() method closes the file and returns nothing.
it is an error to read or write a closed file
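A sketch of both points: write() adds no newline of its own, and a closed file cannot be written. It uses open in place of file so it also runs under Python 3; the file name is made up:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.txt')

f = open(path, 'w')
f.write('no newline yet,')
f.write(' so this continues the same line\n')
f.write('second line\n')
f.close()

# Writing after close() is an error (a ValueError in Python).
try:
    f.write('too late\n')
    error_seen = False
except ValueError:
    error_seen = True

f = open(path, 'r')
text = f.read()
f.close()

print(error_seen)   # True
print(text)
```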
"The file" can mean several things
The contents of the file
bytes of data
The file itself
space on disk managed as a unit by the operating system
The name of the file
represented in programs as a string
An opening of the file in a program
represented by a program object
connected with operating system information
Example: iu_addresses
def iu_addresses(in_file_name, out_file_name):
    """Read a file containing one user id per line and output a file
    containing on each line the corresponding IU email address, obtained by
    appending @indiana.edu to the user id. Whitespace is stripped from user
    ids.
    >>> print file('userids.txt').read()
    jcleese
    tgilliam
    gchapman
    <BLANKLINE>
    >>> iu_addresses('userids.txt', 'iuaddresses.txt')
    >>> print file('iuaddresses.txt').read()
    jcleese@indiana.edu
    tgilliam@indiana.edu
    gchapman@indiana.edu
    <BLANKLINE>
    >>>
    """
    in_file = file(in_file_name, 'r')
    out_file = file(out_file_name, 'w')
    while True:
        line = in_file.readline()
        if line == '': # EOF
            break
        out_file.write(line.strip() + '@indiana.edu\n')
    in_file.close()
    out_file.close()
It is OK not to close files in tests
In the function documentation strings above, shell transcripts use
expressions of the form
file(<name>).read()
This is a convenient way to inspect the contents of a file in the shell,
and is OK there because closing the shell will close files opened in this
way. However, expressions of this kind are generally bad programming
practice, because they do not allow the close() method to be used.
print get_file(<name>) could be used instead
A file must be opened before you can
A. read it
B. write it
C. both A and B
D. neither A nor B
Answer: C
The file write method message is sent to
A. a string that names a file
B. a file object
C. both of the above
D. none of the above: write is a function, not a method
Answer: B
Using the basic file methods it is possible to
A. open a file, both read and write it, and then close it
B. write a list of numbers directly to a file (without converting the list
to a string)
C. write many lines to a file with one method call
D. all of the above
E. none of the above
Answer: C
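Two of the choices in this question can be tried out directly. The sketch uses open in place of file so it also runs under Python 3; the file name is made up:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'quiz.txt')

f = open(path, 'w')

# One write call can write many lines: the string just needs several newlines.
f.write('line 1\nline 2\nline 3\n')

# A list cannot be written directly; write() accepts only a string.
try:
    f.write([1, 2, 3])
    list_rejected = False
except TypeError:
    list_rejected = True

# Converting the list to a string first works.
f.write(str([1, 2, 3]) + '\n')
f.close()

f = open(path, 'r')
text = f.read()
f.close()

print(list_rejected)   # True
print(text)
```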
If your program runs a long time, but uses a file for only a short time, and you fail to close it, what might happen?
A. some data written to the file is lost when the system crashes
B. another program that tries to use the file cannot do so
C. a program tries to open a file and cannot because of an operating system limit
D. all of the above
E. none of the above
Answer: D
Exercise: remove_blank_lines
def remove_blank_lines(in_file_name, out_file_name):
    """Copy the input file to the output file, removing all lines containing only
    whitespace.
    """
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
Solution: remove_blank_lines
def remove_blank_lines(in_file_name, out_file_name):
    """Copy the input file to the output file, removing all lines containing only
    whitespace.
    """
    in_file = file(in_file_name, 'r')
    out_file = file(out_file_name, 'w')
    while True:
        line = in_file.readline()
        if line == '': # EOF
            break
        if line.strip() != '':
            out_file.write(line)
    in_file.close()
    out_file.close()
Exercise: file_replace
def file_replace(in_file_name, out_file_name, old, new):
    """Copy the input file to the output file, replacing all occurrences of the
    string old with the string new.
    """
    # Hint: use the file read() and string replace(old, new) methods
    #.
    #.
    #.
    #.
    #.
    #.
    #.
Solution: file_replace
def file_replace(in_file_name, out_file_name, old, new):
    """Copy the input file to the output file, replacing all occurrences of the
    string old with the string new.
    """
    # Hint: use the file read() and string replace(old, new) methods
    in_file = file(in_file_name, 'r')
    text = in_file.read()
    in_file.close()
    out_file = file(out_file_name, 'w')
    text = text.replace(old, new)
    out_file.write(text)
    out_file.close()
Tables
Tables are one of the most common data structures
individual elements are accessed by row and column indices
if all the values in a table are of the same type, it is often called a
matrix
for example, spreadsheets and database components
Tables in Python
The most natural representation of a table in Python is as a list of
lists
the entire table is a list of rows
the length of each row is the same
Accessing an individual element is done with an expression of the form
table[row_index][column_index]
first access the row, and then the column within the row
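A minimal sketch of a table as a list of lists, with made-up values:

```python
# A table with 2 rows and 3 columns: a list of rows, each row a list.
table = [
    [1, 2, 3],
    [4, 5, 6],
]

row_count = len(table)         # 2
column_count = len(table[0])   # 3 (every row has the same length)

# table[row_index][column_index]: first pick the row, then the column.
element = table[1][2]          # 6

# The same access in two steps.
row = table[1]                 # [4, 5, 6]
same_element = row[2]          # 6

print(element)   # 6
```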
Inventory table example
A very simple business order table might have a row for each item and
columns indicating item names, prices, and quantities
in database terminology each row is a record and the columns
correspond to the fields in each record
abstract view of a sample order:
Blue shirt $19.95 2
Brown hat $8.00 1
Gray sock pair $5.00 3
sample order as a Python list of lists (coding prices in cents):
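One way to write the sample order above as a list of lists, with prices in cents. The names order and total_cents are assumptions for this sketch:

```python
# Each row is one record: [item name, price in cents, quantity].
order = [
    ['Blue shirt', 1995, 2],
    ['Brown hat', 800, 1],
    ['Gray sock pair', 500, 3],
]

# Total cost of the order: price times quantity, summed over the rows.
total_cents = 0
row_index = 0
while row_index < len(order):
    row = order[row_index]
    total_cents = total_cents + row[1] * row[2]
    row_index = row_index + 1

print(total_cents)   # 6290, i.e. $62.90
```

Storing prices as whole cents avoids the rounding surprises of floating-point dollars.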
An almost universal way of storing tables in plain-text files is the
comma-separated-value (CSV) format
Examples
spreadsheets
database tables, such as Microsoft Outlook contacts and address book
In a CSV file
each line represents a table row
values within a row (corresponding to the columns of the table) are
separated by commas
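Reading this simple form of CSV back into a table takes little more than split. The sketch below assumes that no value contains a comma or a quotation mark; parse_csv_line is a hypothetical helper name, and the text is held in a string rather than a file to keep the example short:

```python
def parse_csv_line(line):
    """Split one CSV line into a list of string values (no quoting support)."""
    return line.rstrip('\n').split(',')


# Two lines of CSV text, as they might appear in a file.
text = 'Blue shirt,1995,2\nBrown hat,800,1\n'

table = []
for line in text.splitlines():
    table.append(parse_csv_line(line))

print(table)   # [['Blue shirt', '1995', '2'], ['Brown hat', '800', '1']]
```

Every value comes back as a string; turning '1995' into a number again is a separate step, for example with int().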
FYI: Common CSV format variations
if values contain commas (and sometimes even if they do not), the
values are surrounded in double quotation marks
double quotation marks inside a quoted value are escaped, either by
doubling them or, in some variants, with a backslash
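Python's standard csv module implements one of these variations. The sketch below is written for Python 3 and uses an in-memory io.StringIO buffer instead of a real file; the sample values are made up. With its default settings the module quotes only the values that need it and escapes a double quotation mark by doubling it rather than with a backslash.

```python
import csv
import io

# Write one row whose middle value contains both a comma and quotation marks.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Gray sock pair', 'size "L", cotton', 500])

line = buffer.getvalue()
print(repr(line))   # 'Gray sock pair,"size ""L"", cotton",500\r\n'

# Reading the line back recovers the original values, all as strings.
rows = list(csv.reader(io.StringIO(line)))
print(rows)         # [['Gray sock pair', 'size "L", cotton', '500']]
```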
Exercise: write_csv_table
def write_csv_table(table, file_name):
    """Write the contents of the table to the file in comma-separated
    format. The table is a list of lists of values, where the string representation
    of each value is assumed to contain no comma or quote characters.
    >>> write_csv_table([[1, 2], ['a string', 'another string']], 'table.csv')
    >>> print get_file('table.csv')
    1,2
    a string,another string
    <BLANKLINE>
    >>>
    """
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
Solution: write_csv_table
def write_csv_table(table, file_name):
    """Write the contents of the table to the file in comma-separated
    format. The table is a list of lists of values, where the string representation
    of each value is assumed to contain no comma or quote characters.
    >>> write_csv_table([[1, 2], ['a string', 'another string']], 'table.csv')
    >>> print get_file('table.csv')
    1,2
    a string,another string
    <BLANKLINE>
    >>>
    """
    file_object = file(file_name, 'w')
    row_index = 0
    while row_index < len(table):
        row = table[row_index]
        # accumulate a row
        s = ''
        column_index = 0
        while column_index < len(row):
            s = s + str(row[column_index])
            if column_index < len(row) - 1:
                s = s + ','
            column_index = column_index + 1
        file_object.write(s + '\n')
        row_index = row_index + 1
    file_object.close()
If the documentation of a function says that it returns something, then its body
A. must have at least one return statement
B. may or may not have a return statement, depending on whether it uses
print
C. needs a return statement only if it returns from somewhere
other than the end of the function body
Answer: A
If the documentation of a function says (or implies one way or another) that it mutates a data structure, and does not say that it returns anything, then its body
A. must contain a return statement
B. needs a return statement only if it needs to exit from somewhere
other than the end of the function body
C. may use a return statement only if it does not include an expression,
or the expression's value is None
D. may not contain a return statement
E. B and C
Answer: E
If the documentation of a function says that it does something with "corresponding" elements of two or more sequences, and you are not using recursion, you know that its implementation requires