A201

Assignment 12

Histograms

You may work in pairs or individually on this lab and assignment.

In Lab

Fill in the ellipses (the lines marked #.) in the following file:

# lab12.py, by chaynes@indiana.edu

def int_histogram(int_list):
    """Prints a histogram of the frequency of occurrences of values in the
    given list of integers. Print one line for each distinct integer value in
    the list. If there are n occurrences of an integer i, the format of the
    corresponding line is the value i, right-justified in 3 columns, followed
    by a space, and then n stars. The lines are in increasing order of the
    data values.

    >>> data = [8, 5, 2, 8, 4, 5, 5, 2, 8, 4, 11, 5]
    >>> int_histogram(data)
      2 **
      4 **
      5 ****
      8 ***
     11 *
    >>>
    """
    # Hints: when passed a list of numbers, the built-in functions min and max
    # return the minimum and maximum number, respectively.
    #
    # The string method call s.rjust(n) returns a string that right justifies s
    # in n columns.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.

def find_letter(s, start_index):
    """Find the first letter (alphabetic character) in the string s, starting
    the search at index start_index. Return the index of this letter, or -1
    if there is no letter at or after start_index.

    For example, in the following test the search starts with the second
    character, so the 'a' character is ignored, and the index of 'Z' is returned.

    >>> find_letter('a1 Z2', 1)
    3
    >>>

    """
    # Hint: the string method call s.isalpha() returns True if s is non-empty
    # and all the characters in s are letters.
    #.
    #.
    #.
    #.
    #.
    #.

def find_non_letter(s, start_index):
    """Find the first non-letter (non-alphabetic character) in the string s,
    starting the search at index start_index. Return the index of this
    character, or -1 if there is no non-letter at or after start_index.

    In the following test the search starts with the second character, and the
    index of the character '1' is returned.

    >>> find_non_letter('Palin123', 1)
    5
    >>>
    """
    # Hint: this is almost the same as the find_letter function
    #.
    #.
    #.
    #.
    #.
    #.

def word_list(s):
    """Return a list of the (non-overlapping) words in string s, 
    where a word is defined to be a contiguous sequence of letters that is as
    long as possible.

    >>> word_list('Knights of Ni: Ni!  Ni!!  Ni!  Ni!')
    ['Knights', 'of', 'Ni', 'Ni', 'Ni', 'Ni', 'Ni']
    >>>
    """
    # Hint: Use the find_letter and find_non_letter functions and an indexed
    # accumulating loop.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.

def test():
    """Include some tests of the functions in this file here."""

test()
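For reference, here is one possible sketch of int_histogram, not the posted lab solution, that follows the min/max and rjust hints. It tries every integer from the smallest to the largest value and prints a line only for values that actually occur, so the lines come out in increasing order. The helper int_histogram_lines is a hypothetical addition that returns the lines as strings so they are easy to test.

```python
def int_histogram_lines(int_list):
    """Hypothetical helper: build the histogram lines as a list of strings."""
    lines = []
    if not int_list:  # min() and max() raise ValueError on an empty list
        return lines
    # Try every integer from the smallest to the largest value; values that
    # never occur are skipped, so the lines are in increasing order.
    for value in range(min(int_list), max(int_list) + 1):
        count = int_list.count(value)
        if count > 0:
            # Value right-justified in 3 columns, a space, one star per occurrence.
            lines.append(str(value).rjust(3) + ' ' + '*' * count)
    return lines


def int_histogram(int_list):
    """Print the histogram described in the int_histogram docstring."""
    for line in int_histogram_lines(int_list):
        print(line)


int_histogram([8, 5, 2, 8, 4, 5, 5, 2, 8, 4, 11, 5])
```

Scanning the whole min-to-max range is simple, but it calls count once for every integer in that range, so widely spread values (say 1 and 1000000) cause a lot of wasted work. Looping over sorted(set(int_list)) instead would visit only the values that occur.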

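The three string-scanning functions can be sketched as follows, again as one possible approach rather than the posted solution. Each finder is an indexed loop that tests one character at a time with isalpha, and word_list alternates between them: find where a word starts, find where it ends, slice it out, and continue searching from the end. Treating a -1 from find_non_letter as "the word runs to the end of s" is this sketch's own choice.

```python
def find_letter(s, start_index):
    """Index of the first letter in s at or after start_index, else -1."""
    index = start_index
    while index < len(s):
        if s[index].isalpha():
            return index
        index = index + 1
    return -1


def find_non_letter(s, start_index):
    """Index of the first non-letter in s at or after start_index, else -1."""
    index = start_index
    while index < len(s):
        if not s[index].isalpha():
            return index
        index = index + 1
    return -1


def word_list(s):
    """List of the maximal runs of letters (words) in s, in order."""
    words = []
    index = find_letter(s, 0)        # start of the first word, or -1
    while index != -1:
        end = find_non_letter(s, index)
        if end == -1:                # the word runs to the end of s
            end = len(s)
        words.append(s[index:end])
        index = find_letter(s, end)  # start of the next word, or -1
    return words


print(word_list('Knights of Ni: Ni!  Ni!!  Ni!  Ni!'))
```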

Submit your work via Oncourse as lab12.

Assignment

Due 3pm, April 10th

# a12.py, by chaynes@indiana.edu

import lab12

COUNT_COLUMN_INDEX = 0
WORD_COLUMN_INDEX = 1

def find_row(table, column_index, value):
    """Return the index of the first row in the table that contains value in the
    column with the given index, or -1 if there is no such row.

    >>> table = [[1, 'knights'], [1, 'of'], [5, 'ni']]
    >>> find_row(table, WORD_COLUMN_INDEX, 'ni')
    2
    >>> find_row(table, 0, 'Ni')
    -1
    >>>
    """
    #.
    #.
    #.
    #.

def word_count(list_of_words):
    """Return a table with one row for each distinct word in the list of words,
    where the first column is the number of times the word occurs in the list
    and the second column is the word. Words are converted to lower case so
    that counting is case-insensitive. Rows appear in the order in which the
    words first occur.

    >>> words = lab12.word_list('Knights of Ni: Ni!  Ni!!  Ni!  Ni!')
    >>> word_count(words)
    [[1, 'knights'], [1, 'of'], [5, 'ni']]
    >>>
    """
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.

def max_word_length(word_table):
    """Return the length of the longest word in a word table of the form
    returned by the word_count function.

    >>> table = [[1, 'knights'], [1, 'of'], [5, 'ni']]
    >>> max_word_length(table)
    7
    >>>
    """
    #.
    #.
    #.
    #.
    #.
    #.

def word_histogram(text_file_name, num_words):
    """Prints a histogram of the frequency of the most common words in the text
    file. Each histogram line contains the word followed by a space and then a
    star for each use of the word. The lines are printed in order of decreasing
    word frequency, and the histogram is cut off after num_words lines. The
    words are right-justified in the number of columns required for the longest
    word in the histogram.

    >>> print(open('brave_sir_robin.txt').read())
    Bravely bold Sir Robin 
    Brought forth from Camelot. 
    He was not afraid to die, 
    Oh, brave Sir Robin! 
    He was not at all afraid to be killed in nasty ways. 
    Brave, brave, brave Sir Robin. 
    <BLANKLINE>
    >>> # Sir Robin text from the movie "Monty Python and the Holy Grail"
    >>> word_histogram('brave_sir_robin.txt', 4)
    brave ****
      sir ***
    robin ***
      was **
    >>>
    """
    # Hint: use the list methods sort and reverse, and lab12.word_list
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.

def test():
    """Tests of the functions in this file."""
    # Put at least one test for each function here.

test()
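A possible sketch of find_row, word_count, and max_word_length, assuming rows of the form [count, word] as in the docstrings. word_count uses find_row on the word column to locate an existing row, then either increments its count or appends a new [1, word] row, which keeps the rows in order of first occurrence. Returning 0 from max_word_length for an empty table is an assumption; the docstring does not say. A small literal word list stands in for lab12.word_list.

```python
COUNT_COLUMN_INDEX = 0
WORD_COLUMN_INDEX = 1


def find_row(table, column_index, value):
    """Index of the first row whose column_index entry equals value, else -1."""
    for row_index in range(len(table)):
        if table[row_index][column_index] == value:
            return row_index
    return -1


def word_count(list_of_words):
    """Table of [count, word] rows, one per distinct lower-cased word,
    in order of first occurrence."""
    table = []
    for word in list_of_words:
        word = word.lower()          # case-insensitive counting
        row_index = find_row(table, WORD_COLUMN_INDEX, word)
        if row_index == -1:
            table.append([1, word])  # first occurrence: new row with count 1
        else:
            table[row_index][COUNT_COLUMN_INDEX] += 1
    return table


def max_word_length(word_table):
    """Length of the longest word in a word_count-style table (0 if empty)."""
    longest = 0
    for row in word_table:
        longest = max(longest, len(row[WORD_COLUMN_INDEX]))
    return longest


# A small literal word list stands in for lab12.word_list(...).
table = word_count(['Knights', 'of', 'Ni', 'Ni', 'Ni', 'Ni', 'Ni'])
print(table)                   # [[1, 'knights'], [1, 'of'], [5, 'ni']]
print(max_word_length(table))  # 7
```

Because find_row scans the table from the start for every word, this runs in time proportional to (number of words) x (number of distinct words), which is fine for a few thousand words.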

Here is pseudocode for the word_histogram function:

read the file
convert the text to a list of words
make a word count table
sort the rows of the word count table
reverse the row order of the word count table
eliminate all but the first num_words rows of the table
find the longest word in the table and save its length
for each row of the table
    right-justify the row's word to the length of the longest word
    print the justified word, a space, and one star for each occurrence counted in the row
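The steps from "sort the rows" onward can be sketched as a helper that starts from an existing word count table. The name histogram_from_table is hypothetical; in word_histogram the table would come from reading the file, lab12.word_list, and word_count. The helper also returns the lines it prints, purely for testing convenience. Note that it sorts and reverses the table it is given, so the caller's table is changed.

```python
def histogram_from_table(word_table, num_words):
    """Hypothetical helper for the last steps of the pseudocode.
    word_table holds [count, word] rows; returns the printed lines."""
    word_table.sort()                  # ascending by count, then word (in place)
    word_table.reverse()               # now descending (in place)
    top_rows = word_table[:num_words]  # keep only the first num_words rows
    width = 0                          # length of the longest remaining word
    for count, word in top_rows:
        width = max(width, len(word))
    lines = []
    for count, word in top_rows:
        line = word.rjust(width) + ' ' + '*' * count
        print(line)
        lines.append(line)
    return lines


# Some of the rows word_count would build for the Sir Robin text.
histogram_from_table([[1, 'bold'], [3, 'sir'], [3, 'robin'],
                      [4, 'brave'], [2, 'was'], [2, 'he']], 4)
```

With these rows the sketch prints the same four lines as the word_histogram docstring example.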

Recall that the list sort and reverse methods mutate the list in place and return None.
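A short illustration of what this means in practice: both methods change the list itself and return None, so assigning their result back to the variable would lose the table.

```python
rows = [[3, 'sir'], [4, 'brave'], [2, 'was']]
result = rows.sort()  # sorts rows in place
print(result)         # None: sort does not return the sorted list
rows.reverse()        # reverses rows in place, also returning None
print(rows)           # [[4, 'brave'], [3, 'sir'], [2, 'was']]

# Pitfall: rows = rows.sort() would replace the list with None.
# The built-in sorted(rows) is the non-mutating alternative; it returns a new list.
```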

You may use your own lab12 solution or the solution posted via Oncourse. (It is only used to make a word list, so you can develop and test most of this assignment with a small literal word list.)

Due to the lexicographic nature of list sorting, the table in this function is sorted first by word count and then (when the word counts are equal) by the word strings. The word count is stored first in each row in order for the count to have this priority in the sort. Otherwise, it would be a little more natural for table rows to have the string first and then the number, since the strings are used as "keys" to find the row when accumulating the counts.
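The effect of putting the count first can be seen by sorting a few rows directly. Lists compare element by element, so rows with equal counts are ordered by word; after reverse, those ties come out in reverse alphabetical order, which is why sir precedes robin in the word_histogram docstring example.

```python
# Rows are [count, word]: the count decides first, the word only breaks ties.
table = [[2, 'was'], [3, 'sir'], [4, 'brave'], [3, 'robin'], [2, 'he']]
table.sort()
print(table)    # [[2, 'he'], [2, 'was'], [3, 'robin'], [3, 'sir'], [4, 'brave']]
table.reverse()
print(table)    # [[4, 'brave'], [3, 'sir'], [3, 'robin'], [2, 'was'], [2, 'he']]

# With [word, count] rows the word would dominate, giving alphabetical order.
swapped = sorted([[word, count] for count, word in table])
print(swapped)  # [['brave', 4], ['he', 2], ['robin', 3], ['sir', 3], ['was', 2]]
```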

It is easy to make an interesting test of the word histogram function by creating a plain text file that contains at least a few hundred (or even thousands of) words. You can do this, for example, by saving a web page or word processor document as a plain text file.

Submit your program (not test data) as Assignment 12 via Oncourse.