Week 9

String Methods and Data Mining

Author: Christopher Haynes
Email: chaynes@indiana.edu
Affiliation: Indiana University
Course: BL CSCI A201
Date: 2008-03-06

String method demo

>>> s = 'I never, never did anything...'
>>> s.upper()
'I NEVER, NEVER DID ANYTHING...'
>>> import math
>>> math.sqrt(4)
2.0
>>> s.lower()
'i never, never did anything...'
>>> s.find('never')
2
>>> s.find('Never')
-1
>>> s.index('never')
2
>>> s.index('Never')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> s.find('never', 3)
9
>>> s.index('never', 3)
9
>>> word = s[0: s.index('never')]
>>> word
'I '
>>> word.strip()
'I'
>>> s[0 : s.index('never')].strip()
'I'
>>> '123'.isdigit()
True
>>> ' 123 '.isdigit()
False
>>> ' 123 '.strip().isdigit()
True
>>>
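The optional start argument shown in the demo (s.find('never', 3)) can be used to walk through every occurrence of a substring. The following is a minimal sketch of that idea; the helper name find_all is hypothetical and is not part of the original slides.

```python
def find_all(s, sub):
    """Return a list of every index at which sub occurs in s."""
    positions = []
    start = s.find(sub)              # first occurrence, or -1 if absent
    while start != -1:
        positions.append(start)
        # Resume the search one character past the previous match.
        start = s.find(sub, start + 1)
    return positions


s = 'I never, never did anything...'
matches = find_all(s, 'never')       # the demo above found these at 2 and 9
```

Because find returns -1 rather than raising an exception, the loop needs no error handling; the same loop written with index would have to catch ValueError to stop.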

Method call syntax

Method call semantics

Optional arguments

Some string methods

Since strings are immutable, describing a string method as changing the string s means that a copy of s is returned with just the indicated changes; s itself is never modified.
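As a small illustration of this point, the sketch below calls upper() on the demo string and checks that the original is left alone. The variable name shouted is made up for the example.

```python
s = 'I never, never did anything...'
shouted = s.upper()      # upper() builds and returns a new string

# The original string is untouched; only the returned copy differs.
assert s == 'I never, never did anything...'
assert shouted == 'I NEVER, NEVER DID ANYTHING...'

# To "change" s, rebind the name to the returned copy.
s = s.upper()
assert s == shouted
```

Calling s.upper() on a line by itself, without using the result, therefore has no lasting effect.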

Other string methods and functions

clicker Which of the following is a method call to find the index of 'a' in s?

  A. find(s, 'a')
  B. string.find(s, 'a')
  C. s.find('a')
  D. a.find(s)
  E. A or B

Answer: C

clicker What does the following print?

s = 'AIR RAID SIRENS'
s.lower()
print s

  A. 'AIR RAID SIRENS'
  B. 'air raid sirens'
  C. None of the above, there is an error

Answer: A

clicker What does '[812]'[1:4].isdigit() return?

  A. True
  B. False
  C. '812'
  D. None of the above, there is an error

Answer: A

Data mining background

Uniform Resource Locators (URLs)

HTML

Data mining

Accessing the web with Python

Getting the world population, Part 1

Getting the world population, Part 2

Some limitations of this data mining methodology

Exercise: similar

def similar(s1, s2):
    """
    Return a boolean value indicating if strings s1 and s2 are equal,
    ignoring case (case insensitive) and leading and trailing whitespace.

    >>> similar('  Ni  ', 'ni')
    True
    >>> similar('eggs', ' EGGS')
    True
    >>> similar('eggs', 'egg')
    False
    >>>
    """
    #...

Solution: similar

def similar(s1, s2):
    """
    Return a boolean value indicating if strings s1 and s2 are equal,
    ignoring case (case insensitive) and leading and trailing whitespace.

    >>> similar('  Ni  ', 'ni')
    True
    >>> similar('eggs', ' EGGS')
    True
    >>> similar('eggs', 'egg')
    False
    >>>
    """
    return s1.lower().strip() == s2.lower().strip()
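The examples in the docstring are written in the format understood by Python's doctest module, so they can be checked automatically. The sketch below is one way to do that, assuming the examples are meant as doctests; it repeats the solution so that it is self-contained.

```python
import doctest


def similar(s1, s2):
    """
    >>> similar('  Ni  ', 'ni')
    True
    >>> similar('eggs', ' EGGS')
    True
    >>> similar('eggs', 'egg')
    False
    """
    return s1.lower().strip() == s2.lower().strip()


# Parse the docstring into a test, run it, and collect the counts.
parser = doctest.DocTestParser()
test = parser.get_doctest(similar.__doc__, {'similar': similar},
                          'similar', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)   # a (failed, attempted) pair
```

With verbose=False the runner prints nothing when every example passes and prints a report for each example that fails.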

Exercise: inches1

def inches1(length):
    """
    Return the number of inches (an int) in length, a string that should be
    in the format <inches>", where <inches> is a non-negative integer. Return None
    if length is not a string in the proper format.

    >>> inches1('3"')
    3
    >>> inches1('3') # None returned by this and following tests
    >>> inches1('+3"')
    >>>
    """
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.

Solution: inches1

def inches1(length):
    """
    Return the number of inches (an int) in length, a string that should be
    in the format <inches>", where <inches> is a non-negative integer. Return None
    if length is not a string in the proper format.

    >>> inches1('3"')
    3
    >>> inches1('3') # None returned by this and following tests
    >>> inches1('+3"')
    >>>
    """
    if length == '' or length[len(length) - 1] != '"':
        return None # no double quote
    else:
        # remove the double quote character
        length = length[0 : len(length) - 1]
        if not length.isdigit():
            return None # <inches> is not an integer
    return int(length)
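For comparison, the same checks can be written more compactly with the endswith method and a negative slice index. This is an alternative sketch, not the solution from the slides, and the name inches1_compact is hypothetical.

```python
def inches1_compact(length):
    """Variant of inches1 using endswith and a negative slice index."""
    # endswith is False for the empty string, so no separate '' test is needed.
    if not length.endswith('"'):
        return None              # no double quote at the end
    digits = length[:-1]         # everything except the last character
    if not digits.isdigit():
        return None              # <inches> is not a non-negative integer
    return int(digits)
```

It is intended to return the same results as the solution above for the docstring examples: 3 for '3"', and None for '3' and '+3"'.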

Exercise: inches2

def inches2(length):
    """
    Return the number of inches (an int) in length, a string in the format
    <feet>'<inches>", where both <feet> and <inches> are non-negative integers.
    Return None if length is not in the proper format.

    >>> inches2('2\'3"')
    27
    >>> inches2("bogus") # None returned by this and following tests
    >>> inches2('11"')
    >>>
    """
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.
    #.

Solution: inches2

def inches2(length):
    """
    Return the number of inches (an int) in length, a string in the format
    <feet>'<inches>", where both <feet> and <inches> are non-negative integers.
    Return None if length is not in the proper format.

    >>> inches2('2\'3"')
    27
    >>> inches2("bogus") # None returned by this and following tests
    >>> inches2('11"')
    >>>
    """
    index = length.find("'")
    if index == -1:
        return None # Missing ' mark
    else:
        feet_string = length[0 : index]
        inches_string = length[index + 1 : len(length)]
        if feet_string.isdigit():
            feet = int(feet_string)
        else:
            return None # <feet> is not an integer
    if inches_string == '':
        return None # no <inches>"
    elif inches_string[len(inches_string) - 1] != '"':
        return None # no double quote at end
    else:
        # remove the double quote character
        inches_string = inches_string[0 : len(inches_string) - 1]
        if inches_string.isdigit():
            inches = int(inches_string)
        else:
            return None # <inches> is not an integer
    return feet * 12 + inches
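The same validation can be sketched more briefly by splitting the string at the first ' mark with the partition method. This is an alternative illustration rather than the solution from the slides; the name inches2_compact is hypothetical.

```python
def inches2_compact(length):
    """Variant of inches2 that splits at the first ' using partition."""
    # partition returns (before, separator, after); the separator is ''
    # when the ' mark does not occur in the string at all.
    feet_string, mark, rest = length.partition("'")
    if mark == '' or not feet_string.isdigit():
        return None              # missing ' mark, or <feet> is not an integer
    if not rest.endswith('"') or not rest[:-1].isdigit():
        return None              # missing " at the end, or bad <inches>
    return int(feet_string) * 12 + int(rest[:-1])
```

Like the solution above, it splits at the first ' only, so any further ' in the remainder makes the inches part fail the isdigit test.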

clicker What is the value of 'a'.strip == 'a' ?

  A. True
  B. False
  C. there is a syntax error
  D. there is a runtime error

Answer: B
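The reason is that 'a'.strip without parentheses names the method itself rather than calling it. A short sketch of the difference, using made-up variable names:

```python
without_call = 'a'.strip     # a bound method object, not a string
with_call = 'a'.strip()      # the method is called, giving the string 'a'

# A method object never compares equal to a string.
method_equals_string = (without_call == 'a')
result_equals_string = (with_call == 'a')
```

Here method_equals_string is False and result_equals_string is True; neither line raises an error, which is why a forgotten pair of parentheses can go unnoticed.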

clicker What does the test print?

def mystery(s, n):
    """
    >>> mystery('Chamilot', 2)
    #...
    >>>
    """
    while 0 < n:
        s = s[1 : len(s) - 2]
        n = n - 1
    print s

  A. Chamilot
  B. hamilo
  C. am
  D. nothing (but a newline)
  E. none of the above, but no error
  F. none of the above, there is an error

Answer: C
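To see why, it helps to record the value of s after each pass through the loop. The sketch below is a tracing variant of mystery that returns the intermediate strings instead of printing the last one; the name mystery_trace is hypothetical.

```python
def mystery_trace(s, n):
    """Return the list of values s takes on, starting with the original."""
    steps = [s]
    while 0 < n:
        # Drop the first character and the last two characters.
        s = s[1 : len(s) - 2]
        steps.append(s)
        n = n - 1
    return steps


trace = mystery_trace('Chamilot', 2)
# trace is ['Chamilot', 'hamil', 'am']
```

Each pass removes one character from the front and two from the end, so the eight-character string shrinks to five characters and then to two.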