;; CMPU101, SPRING 2013 ;; Lab 10 - Working with DNA strings (display "\n CS101 Lab 10, Spring 2013") (display "\n PLEASE WRITE YOUR NAME HERE\n\n") ; ; Computer science is very much a part of almost every other ; scientific discipline (and many non-scientific disciplines, ; for that matter). The rapidly emerging field of bioinformatics ; is currently one of the areas where computer scientists are ; needed to develop algorithms that process strings of data. ; In particular, processing DNA sequences is one of the new ; challenges for computer scientists. ; ; The genetic code for all living organisms is carried in DNA -- ; a molecule with the remarkable capacity to replicate its own ; structure. The DNA molecule itself consists of a long strand of ; chemical bases, wound together with a complementary molecule to ; form a double-helix pattern. The four bases in its structure-- ; adenine, cytosine, guanine, and thymine--combine with each other ; only in the following ways: ; ; Adenine links only with Thymine, and vice versa. ; ; Cytosine links only with Guanine and vice versa. ; ; ; Typically, biologists abbreviate the names of the bases so that ; each is represented by its initial letter: A, C, G, or T. ; ; Inside the cell, a DNA strand acts as a template to which other ; DNA strands can attach themselves. As an example, suppose that ; you have the following sequence of bases, in which the position ; of each base has been numbered as it would be in a string: ; ; ; T A A C G G T A C G ; 0 1 2 3 4 5 6 7 8 9 ; ; ; One of the problems that biologists need to solve is finding ; a location on which a short DNA strand can attach itself to a ; longer one. If, for example, you are trying to find a match for ; the strand ; ; T T G C C (the matching complement is A A C G G) ; ; ; The rules for DNA replication dictate that this strand can bind ; to the longer one at position 1: ; ; T T G C C ; T A A C G G T A C G ; 0 1 2 3 4 5 6 7 8 9 ; ; ; In this lab, you will start writing functions for a program ; that can be used to find matching sites on DNA strands. You will ; finish this program as part of an upcoming lab or homework. ; ; Follow the design recipe and write contract, header, purpose and ; check-expect statements for each function. Post-function printf ; statements are not necessary but you may choose to use them. (display "\n\n----------------------------\n") (display "Problem 1: STRING->UPPERCASE") (display "\n----------------------------\n") ; ; Write a function called STRING->UPPERCASE that consumes a string ; and converts all the characters to uppercase. You can assume ; that the input string is either all letters or empty. ; ; (check-expect ; (string->uppercase "AgctAC") "AGCTAC") ; ; (check-expect ; (string->uppercase "cccttaaag") "CCCTTAAAG") ; ; (check-expect ; (string->uppercase "") "") ; ; (check-expect ; (string->uppercase "yaodkf") "YAODKF") ; ; ; The following version descriptions are just suggestions. You do ; NOT have to write two versions of this function. Also, you may ; choose not to use either suggestion and write the function your ; own way. ; ; 1st version suggestion: ; ; Use a local helper function that has a position number, pos, ; and uses string recursion to convert the char at pos to ; upper-case using the char-upcase function. String-append ; an accumulator onto the string form of each upper-case ; character, one character string at a time, until pos is equal ; to the length of the string. You may also choose not to use ; an accumulator. ; ; 2nd version suggestion: ; ; Use explode to create a list of 1-char strings. Then use ; string-ref to convert each 1-char string to a char by ; calling char-upcase on the char. Use map to process each ; char, and then call implode to put all the characters back ; together into a string. ; ; Since there is no string-upcase function in Racket, you may ; want to write a small helper function that converts a single ; 1-character string to uppercase by using string-ref and char- ; upcase. It will make writing the function using map easier. ; (display "\n\n----------------------------\n") (display "Problem 2: ALL-VALID-BASES?") (display "\n----------------------------\n") ; ; Write a function that takes in a string in all capital letters ; and that checks if all the characters in the string are valid bases: ; A, G, C, T. Call this function all-valid-bases? ; ; ;; pre-function tests: ; (check-expect (all-valid-bases? "GCFTA") #f) ; ; (check-expect (all-valid-bases? "VIUOP") #f) ; ; (check-expect (all-valid-bases? "GATTACC") #t) ; ; (check-expect (all-valid-bases? "") #f) ; ; Hint 1: There are many ways to write this function. One uses ; the higher-order function andmap, along with explode ; and member? Pay attention to the case in which the ; input string is empty because in this case your func- ; tion should return false ; ; Hint 2: Another version of this function uses a helper function ; called valid-base? that checks if each character in ; the string is one of the bases A, G, C, or T. In this ; version, you can write a helper function to keep track ; of the current position in the string and stop when ; pos is equal to the length of the intput string. ; ; You may assume that the input string is all letters and is in ; uppercase. Your function should return false for the empty ; string. ; (display "\n\n----------------------------\n") (display "Problem 3: CONVERT-TO-COMPLEMENT") (display "\n----------------------------\n") ; ; Write a function called convert-to-complement that takes in ; valid uppercase strings of nucleotide bases and produces ; the valid uppercase matching string. For an input of ; "A" (or #\A), your function should produce "T" ; "T" (or #\T), your function should produce "A" ; "C" (or #\C), your function should produce "G" ; "G" (or #\G), your function should produce "C" ; ; Note: You can assume in the function contract that all strings ; entered will be all upper-case and will contain only the ; letters A, C, T, G. You do not have to check for the empty ; string. ; ; ; ;; pre-function tests: ; (check-expect (convert-to-complement "ACGTTA") "TGCAAT") ; ; (check-expect (convert-to-complement "TATATC") "ATATAG") ; ; ; Hint: Use a local function that contains the current position ; you are converting and an accumulator, with recursive ; clauses for each base. ; (display "\n\n----------------------------\n") (display "Problem 4: FIND-MATCH-POS") (display "\n----------------------------\n") ; ; Write a function called find-match-pos that consumes two ; strings of letters representing the bases in DNA strands. The ; first argument should be a valid DNA string in all upper-case ; that is less than or equal to the length of the second argument. ; The second argument should be the longer sequence, also all ; in upper-case. ; ; The function should return the first index position in the longer ; DNA strand to which the second strand matches or -1 if there is no ; match. The shorter string is not complemented, so your function ; can check for a direct match. ; ; ;; pre-function tests: ; (check-expect (find-match-pos "ACG" "TTATGC") -1) ; no match found ; (check-expect (find-match-pos "TGC" "TTATGC") 3) ; match at pos 3 ; (check-expect (find-match-pos "TT" "CTTGG") 1) ; match at pos 1 ; (check-expect (find-match-pos "A" "CCCCCA") 5) ; match at pos 5 ; (check-expect (find-match-pos "C" "AAA") -1) ; no match found ; ; ;; Algorithm: find-match-pos (short long) ; ;; 1. ; ;; 2. cond ; ;; 3. [((pos + length of shorter string) > (length of longer string)) -1] ; ;; 4. [(string=? (short and the substring of long starting at pos and ; ;; going to pos + string-length short)) pos] ; ;; 5. [else do a recursive call with incremented pos] ;