On this page:
1 Selection sort
2 DNA decompression
3 DNA compression
4 Challenges
8.12

Lab 11: Generative recursion 1🔗

Remember to follow (and when necessary, adapt) the design recipe for each function. When you write a generative recursive function, you must provide a termination argument. A termination argument must say why the function always eventually stops, not just when it stops.

1 Selection sort🔗

This week we start to learn about generative recursion, which is more advanced than structural recursion.
  • Structural recursion is decomposing a problem non-creatively—by following the template for processing the input.

  • Generative recursion is decomposing a problem creatively—by not following the template for processing the input.

Exercise 1. Recall from Lecture 19: Follow the template that insertion sort is a basic sorting algorithm that decomposes the sorting problem in a non-creative way—by following the template for processing a list:

How does insertion sort decompose the sorting problem above into smaller sorting problems? Finish the tree on a piece of paper. Make sure to put each sorting problem in a box, and don’t put anything else in a box. Keep decomposing until the list is empty.

Exercise 2. Selection sort is a different and slightly creative sorting algorithm. When it sees a non-empty list, it does not decompose the problem into the rest of the list like insertion sort does. Instead, it removes the smallest number from the list:

How does selection sort decompose the sorting problem above into smaller sorting problems? Finish the tree on a piece of paper. Make sure to put each sorting problem in a box, and don’t put anything else in a box. Keep decomposing until the list is empty.

Exercise 3.
; A SelectTree is one of:
; - (make-done)
; - (make-select Number SelectTree)
(define-struct done [])
(define-struct select [chosen remain])
Define the SelectTree you just made as a constant. Call it st1.

Exercise 4. Design a function select-tree-result that takes a SelectTree and produces the sorted list of numbers. Hint: Instead of generative recursion, stick with structural recursion and follow the template for processing a SelectTree.

The goal of the rest of this section is to generate a SelectTree from a list of numbers. The problem decomposition can use some helper functions.

Exercise 5. Design a function smallest-number that takes a non-empty list of numbers and finds the smallest number in it. (You did this in Problem set 8: Abstraction. You can do it here with or without any helper function.)

Exercise 6. Design a function remove-number that takes a non-empty list of numbers and a number in it, and removes the given number to produce a shorter list. If the given number occurs multiple times in the given list, then remove the first occurrence only.

Exercise 7. Design a function generate-select-tree that decomposes a given list of numbers into a SelectTree. Follow the decomposition strategy in Exercise 2.

Hint: Use generative recursion. Use smallest-number and remove-number. When you write a generative recursive function, you must provide a termination argument. A termination argument must say why the function always eventually stops, not just when it stops.

Exercise 8. Design a function that sorts a list of numbers using generate-select-tree and select-tree-result.

Challenge. The data definition above for a SelectTree uses the defined structures done and select. Change the data definition to use the built-in structures empty and cons instead, and revise all the functions in this section to match. Why would select-tree-result be no longer needed?

2 DNA decompression🔗

DNA is often modeled by sequences of the characters A, C, G and T. These sequences are very long and so often need to be compressed. A simple way to do this is run-length encoding. It means to replace identical consecutive characters with the character followed by its count. For example, the run-length encoding of AAAAAAAAAAGCCCCC is A10G1C5. This is the only run-length encoding—something like A4A6G1C5 is not valid, because each count must cover as many identical consecutive characters as possible.

The goal of this section is decoding, to convert run-length encodings to DNA sequences. For example, we want to convert the run-length encoding A10G1C5 to the DNA sequence AAAAAAAAAAGCCCCC, and we want to convert the empty run-length encoding to the empty DNA sequence. We will decompose the problem of decoding A10G1C5 to the subproblem of decoding G1C5.

We will represent both the run-length encoding and the DNA sequence as [ListOf 1String]. Recall that a 1String is a string of length 1, in other words, a character.

Exercise 9. Design a helper function drop-digits, which drops all the digits at the beginning of a given [ListOf 1String].
(check-expect (drop-digits (explode "10G1C5")) (explode "G1C5"))
The example above is written using the built-in function explode. Recall that explode converts a String to a [ListOf 1String]. Use it to write at least two more examples: one whose input starts with a non-digit character, and one whose input is empty.

Hint: Your definition of drop-digits can use the built-in function string-numeric?

Exercise 10. Design another helper function take-digits, which collects all the digits at the beginning of a given [ListOf 1String].
(check-expect (take-digits (explode "10G1C5")) (explode "10"))

Exercise 11. Next, you will use the helpers above to decompose decoding problems via a DNATree defined below:
; A DNATree is one of:
; - (make-end)
; - (make-run 1String NaturalNumber DNATree)
(define-struct end [])
(define-struct run [base count rest])
Below is an example of a DNATree:
(define dt1
  (make-run "A" 10
    (make-run "G" 1
      (make-run "C" 5 (make-end)))))
Design a function generate-decoding that takes a run-length encoding as input and decomposes its decoding problem into a DNATree. Here are two examples:
(check-expect (generate-decoding (explode "A10G1C5")) dt1)
(check-expect (generate-decoding empty) (make-end))

Hint: Use the helper functions you designed in Exercises 9 and 10 to decompose the problem (explode "A10G1C5") into the subproblem (explode "G1C5"). Also, use the built-in functions string->number and implode:
(check-expect (string->number (implode (list "1" "0"))) 10)

Exercise 12. Design a function decode-dna-tree that takes a DNATree as input and produces the DNA sequence. Here are two examples:
(check-expect (decode-dna-tree dt1) (explode "AAAAAAAAAAGCCCCC"))
(check-expect (decode-dna-tree (make-end)) empty)
Hint: Follow the template for processing a DNATree. Also, use the built-in functions make-list and append:
(check-expect (make-list 10 "A")
              (explode "AAAAAAAAAA"))
(check-expect (append (make-list 10 "A") (list "G" "C" "C" "C" "C" "C"))
              (explode "AAAAAAAAAAGCCCCC"))

Exercise 13. Combine generate-decoding and decode-dna-tree to design a function decode that consumes the run-length encoding of a DNA sequence and produces the DNA sequence. Here are two examples:
(check-expect (decode (explode "A10G1C5")) (explode "AAAAAAAAAAGCCCCC"))
(check-expect (decode empty) empty)
Hint: The definition of decode is very short, does not use recursion, and can be written just by looking at the signatures of generate-decoding and decode-dna-tree.

3 DNA compression🔗

The goal of this section is encoding, to convert DNA sequences to run-length encodings. For example, we want to convert the DNA sequence AAAAAAAAAAGCCCCC to the run-length encoding A10G1C5, and we want to convert the empty DNA sequence to the empty run-length encoding. We will decompose the problem of encoding AAAAAAAAAAGCCCCC to the subproblem of encoding GCCCCC.

Again, we will represent both the DNA sequence and the run-length encoding as [ListOf 1String], in other words, a list of characters.

Exercise 14. Design a helper function drop-letter, which drops all the copies of a given 1String at the beginning of a given [ListOf 1String].
(check-expect (drop-letter "A" (explode "AAAAAAAAAAGCCCCC"))
              (explode "GCCCCC"))
Write at least two more examples: one whose input starts with a different character, and one whose list input is empty.

Hint: Your definition of drop-letter can use the built-in function string=?

Exercise 15. Design another helper function take-letter, which collects all the copies of a given 1String at the beginning of a given [ListOf 1String].
(check-expect (take-letter "A" (explode "AAAAAAAAAAGCCCCC"))
              (explode "AAAAAAAAAA"))

Exercise 16. Next, you will use the helpers above to decompose encoding problems via the same DNATree that you used in the previous section to decompose decoding problems.

Design a function generate-encoding that takes a DNA sequence as input and decomposes its encoding problem into a DNATree. Assume that the given DNA sequence will only consist of upper-case A, C, G and T. Here are two examples:
(check-expect (generate-encoding (explode "AAAAAAAAAAGCCCCC")) dt1)
(check-expect (generate-encoding empty) (make-end))
Hint: Use the helper functions you designed in Exercises 14 and 15 to decompose the problem (explode "AAAAAAAAAAGCCCCC") into the subproblem (explode "GCCCCC").

Exercise 17. Design a function encode-dna-tree that takes a DNATree as input and produces the run-length encoding. Here are two examples:
(check-expect (encode-dna-tree dt1) (explode "A10G1C5"))
(check-expect (encode-dna-tree (make-end)) empty)
Hint: Follow the template for processing a DNATree. Also, use the built-in functions number->string and explode:
(check-expect (explode (number->string 10)) (list "1" "0"))

Exercise 18. Combine generate-encoding and encode-dna-tree to design a function encode that consumes a DNA string and produces its run-length encoding. Again, assume that the given DNA string will only consist of upper-case A, C, G and T. Here are two examples:
(check-expect (encode (explode "AAAAAAAAAAGCCCCC")) (explode "A10G1C5"))
(check-expect (encode empty) empty)
Hint: The definition of encode is very short, does not use recursion, and can be written just by looking at the signatures of generate-encoding and encode-dna-tree.

4 Challenges🔗

Exercise 19. Abstract from the helpers drop-digits and drop-letter you designed in Exercises 9 and 14.

Exercise 20. Abstract from the helpers take-digits and take-letter you designed in Exercises 10 and 15.

Exercise 21. Think: on what kinds of input is run-length encoding most efficient (compresses the most) versus least efficient (compresses the least)? Express your answer by defining two constant DNA sequences very-efficient and very-inefficient, such that
(> (compression-ratio very-efficient) 100)
(< (compression-ratio very-inefficient) 1)
 
; compression-ratio : [ListOf 1String] -> Number
; returns how many times shorter encode makes the given string
(check-expect (compression-ratio (make-list 4 "A")) 2)
(check-expect (compression-ratio (make-list 2 "A")) 1)
(define (compression-ratio dna)
  (/ (length dna) (length (encode dna))))