Assignment 12
Histograms
You may work in pairs or individually on this lab and assignment.
In Lab
Fill in the ellipses (the runs of "#." lines):
# lab12.py, by chaynes@indiana.edu
def int_histogram(int_list):
    """Prints a histogram of the frequency of occurrences of values in the given
    list of integers. Prints one line for each distinct integer value in the
    list. If there are n occurrences of an integer i, the format of the
    corresponding line is the value i, right justified in 3 columns, followed by
    a space, and then n stars. The lines are in increasing order of the data
    values.
    >>> data = [8, 5, 2, 8, 4, 5, 5, 2, 8, 4, 11, 5]
    >>> int_histogram(data)
      2 **
      4 **
      5 ****
      8 ***
     11 *
    >>>
    """
    # Hints: when passed a list of numbers, the built-in functions min and max
    # return the minimum and maximum number, respectively.
    #
    # The string method call s.rjust(n) returns a string that right justifies s
    # in n columns.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
def find_letter(s, start_index):
    """Find the first letter (alpha character) in the string s, starting
    the search at the given index start_index. Return the index of this letter,
    or -1 if there is no letter at or after the start index.
    For example, in the following test the search starts with the second
    character, so the 'a' character is ignored, and the index of 'Z' is returned.
    >>> find_letter('a1 Z2', 1)
    3
    >>>
    """
    # Hint: the string method call s.isalpha() returns True if s is non-empty
    # and all the characters in s are letters.
    #.
    #.
    #.
    #.
    #.
    #.
def find_non_letter(s, start_index):
    """Find the first non-letter (non-alpha character) in the string s, starting
    the search at the given index start_index. Return the index of this
    non-letter, or -1 if there is no non-letter at or after the start index.
    In the following test the search starts with the second character, and the
    index of the character '1' is returned.
    >>> find_non_letter('Palin123', 1)
    5
    >>>
    """
    # Hint: this is almost the same as the find_letter function
    #.
    #.
    #.
    #.
    #.
    #.
def word_list(s):
    """Return a list of the (non-overlapping) words in string s,
    where a word is defined to be a contiguous sequence of letters that is as
    long as possible.
    >>> word_list('Knights of Ni: Ni! Ni!! Ni! Ni!')
    ['Knights', 'of', 'Ni', 'Ni', 'Ni', 'Ni', 'Ni']
    >>>
    """
    # Hint: Use the find_letter and find_non_letter functions and an indexed
    # accumulating loop.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
def test():
    """Include some tests of the functions in this file here."""
test()
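The hints in the skeleton above name a few built-ins without showing them in use. The following is a minimal sketch of how they behave on their own, not a solution to any of the exercises. The variable names (values, label, bar, flags) are illustrative assumptions and do not come from lab12.py.

```python
# Illustrative values only; none of these names come from lab12.py.
values = [8, 5, 2, 8, 4, 5, 5, 2, 8, 4, 11, 5]

# min and max give the smallest and largest number in the list.
lowest = min(values)
highest = max(values)

# str.rjust pads on the left with spaces, so numbers line up in a
# fixed-width column.
label = str(highest).rjust(3)

# Repeating a one-character string builds a bar of stars.
bar = '*' * values.count(highest)

# isalpha is True only for a non-empty string made up entirely of letters.
flags = ['Z'.isalpha(), '2'.isalpha(), ' '.isalpha(), ''.isalpha()]

print(label + ' ' + bar)
print(flags)
```

Note that rjust works on strings, so an integer has to be converted with str first.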
Submit your work via Oncourse as lab12.
Assignment
# a12.py, by chaynes@indiana.edu
import lab12
COUNT_COLUMN_INDEX = 0
WORD_COLUMN_INDEX = 1
def find_row(table, column_index, value):
    """Return the index of the first row in the table that contains value in the
    column with the given index, or -1 if there is no such row.
    >>> table = [[1, 'knights'], [1, 'of'], [5, 'ni']]
    >>> find_row(table, WORD_COLUMN_INDEX, 'ni')
    2
    >>> find_row(table, 0, 'Ni')
    -1
    >>>
    """
    #.
    #.
    #.
    #.
def word_count(list_of_words):
    """Return a table with one row for each distinct word in the list of words,
    where the second column is the word and the first is the number of times the
    word occurs in the list. Words are converted to lower case for case
    insensitivity.
    >>> words = lab12.word_list('Knights of Ni: Ni! Ni!! Ni! Ni!')
    >>> word_count(words)
    [[1, 'knights'], [1, 'of'], [5, 'ni']]
    >>>
    """
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
def max_word_length(word_table):
    """Return the length of the longest word in a word table of the form
    returned by the word_count function.
    >>> table = [[1, 'knights'], [1, 'of'], [5, 'ni']]
    >>> max_word_length(table)
    7
    >>>
    """
    #.
    #.
    #.
    #.
    #.
    #.
def word_histogram(text_file_name, num_words):
    """Prints a histogram of the frequency of the most common words in the text
    file. Each histogram line contains the word followed by a space and then a
    star for each use of the word. The lines are printed in order of decreasing
    word frequency and the histogram is cut off after num_words lines. The words
    are right-justified in the number of columns required for the longest
    histogram word.
    >>> print file('brave_sir_robin.txt').read()
    Bravely bold Sir Robin
    Brought forth from Camelot.
    He was not afraid to die,
    Oh, brave Sir Robin!
    He was not at all afraid to be killed in nasty ways.
    Brave, brave, brave Sir Robin.
    <BLANKLINE>
    >>> # Sir Robin text from the movie "Monty Python and the Holy Grail"
    >>> word_histogram('brave_sir_robin.txt', 4)
    brave ****
      sir ***
    robin ***
      was **
    >>>
    """
    # Hint: use the list methods sort and reverse, and lab12.word_list
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
def test():
    """Tests of the functions in this file."""
    # Put at least one test for each function here.
test()
Here is pseudocode for the word_histogram function:
read the file
convert the text to a list of words
make a word count table
sort the rows of the word count table
reverse the row order of the word count table
eliminate all but the first num_words rows of the table
find the longest word in the table and save its length
for each row of the table
    right justify the row's word to the length of the longest word
    print the justified word, a space, and one histogram star per occurrence counted in the row
Recall that the list sort and reverse methods mutate the list.
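As a reminder of what mutation means here, the short sketch below uses a throwaway list of numbers (the names rows, result, and first_two are illustrative assumptions, not part of a12.py). Both methods change the list in place and return None, so assigning their result to a variable is a common mistake.

```python
rows = [3, 1, 2]

# sort rearranges the existing list and returns None.
result = rows.sort()
print(result)   # None
print(rows)     # [1, 2, 3]

# reverse also works in place.
rows.reverse()
print(rows)     # [3, 2, 1]

# Slicing, by contrast, builds a new list and leaves rows unchanged.
first_two = rows[:2]
print(first_two)   # [3, 2]
```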
You may use your own lab12 solution or the solution posted via Oncourse. (It is only used to make a word list, so you can develop and test most of this assignment with a small literal word list.)
Due to the lexicographic nature of list sorting, the table in this function is sorted first by word count and then (when the word counts are equal) by the word strings. The word count is stored first in each row in order for the count to have this priority in the sort. Otherwise, it would be a little more natural for table rows to have the string first and then the number, since the strings are used as "keys" to find the row when accumulating the counts.
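To see this ordering at work, the sketch below sorts a small literal table built from the rows of the word_count doctest, first with the count in the first column and then, for contrast, with the word first. The variable names count_first and word_first are illustrative assumptions.

```python
# Rows are [count, word], as in the word_count table.
count_first = [[1, 'of'], [5, 'ni'], [1, 'knights']]
count_first.sort()
print(count_first)   # [[1, 'knights'], [1, 'of'], [5, 'ni']]

# Reversing the sorted table puts the most frequent word first.
count_first.reverse()
print(count_first)   # [[5, 'ni'], [1, 'of'], [1, 'knights']]

# With the word first, the sort is alphabetical and the counts only
# matter for rows whose words are equal.
word_first = [['of', 1], ['ni', 5], ['knights', 1]]
word_first.sort()
print(word_first)    # [['knights', 1], ['ni', 5], ['of', 1]]
```

Rows are compared element by element, so the second column only breaks ties in the first.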
It is easy to make an interesting test of the word histogram function by creating a plain text file that contains at least a few hundred words (it could be thousands). You can do this, for example, by saving a web page or word processor document as a (plain) text file.
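For a small, repeatable test, a text file can also be written from a program. The sketch below writes the Sir Robin text from the word_histogram doctest to a file and reads it back. The helper name write_sample_text and the constant SAMPLE_TEXT are hypothetical, and the file is placed in a temporary directory on the assumption that you do not want to overwrite an existing file.

```python
import os
import tempfile

# The sample text shown in the word_histogram doctest.
SAMPLE_TEXT = (
    "Bravely bold Sir Robin\n"
    "Brought forth from Camelot.\n"
    "He was not afraid to die,\n"
    "Oh, brave Sir Robin!\n"
    "He was not at all afraid to be killed in nasty ways.\n"
    "Brave, brave, brave Sir Robin.\n"
)


def write_sample_text(file_name):
    """Write the sample text to file_name (hypothetical helper)."""
    with open(file_name, 'w') as text_file:
        text_file.write(SAMPLE_TEXT)


# Write into a fresh temporary directory, then read the file back.
sample_path = os.path.join(tempfile.mkdtemp(), 'brave_sir_robin.txt')
write_sample_text(sample_path)
with open(sample_path) as text_file:
    contents = text_file.read()
print(contents)
```

To run the doctest as written, the file would instead need to be saved as brave_sir_robin.txt in the directory from which the tests are run.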
Submit your program (not your test data) as Assignment 12 via Oncourse.