BIOL/CMPU 353 - Bioinformatics
Smith and Schwarz
Spring 2012
Assigned: Thu, Feb 9
Due: Tue, Feb 14
This assignment is loosely based on the first assignment. Instead of using index and substring functions, you will use regular expressions (Regex) to produce the sample output.
It may be easier to start with your Perl script from the first assignment than to start from scratch. Alternatively, you can copy/paste our solution to Project 1, listed below.
A sample output is shown below. Your program’s output need not be identical, but you should print the information in the same order, and your answers should agree with the sample.
+++++++++ Upstream and Genic Report ++++++++++++++++
Starting sequence is: cgccatataatgctcgtccgcgcccta
Converted to uppercase: CGCCATATAATGCTCGTCCGCGCCCTA
Length of starting sequence is: 27
----------------------------------------------------
Upstream sequence is: CGCCATATA
Gene sequence is: ATGCTCGTCCGCGCCCTA
Codon 1 = ATG
Codon 2 = CTC
Codon 3 = GTC
----------------------------------------------------
Upstream length (bp): 9
Gene length (bp): 18
----------------------------------------------------
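As one possible starting point (a sketch, not the required approach), the upstream and gene lines of the report above can be produced with a single regular expression and two capture groups. The helper name split_at_start_codon is hypothetical and does not appear in the Project 1 solution; the sketch assumes the sequence is already uppercase and contains no newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the Project 1 solution): split a
# sequence into (upstream, gene) at its first ATG using a regex.
sub split_at_start_codon {
    my ($seq) = @_;
    # ^(.*?)   lazily captures everything before the first ATG;
    # (ATG.*)$ captures the start codon and everything after it.
    if ($seq =~ /^(.*?)(ATG.*)$/) {
        return ($1, $2);
    }
    return;    # empty list when the sequence has no ATG
}

my ($upstream, $gene) =
    split_at_start_codon(uc "cgccatataatgctcgtccgcgcccta");

print "Upstream sequence is: $upstream\n";
print "Gene sequence is: $gene\n";
print "Upstream length (bp): ", length($upstream), "\n";
print "Gene length (bp): ", length($gene), "\n";
```

The lazy quantifier (.*?) matters here: a greedy (.*) would split at the last ATG in the sequence rather than the first.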
Your completed program is due on Tue, Feb 14. Submit your program electronically, using the submit353 script.
If you would like to start by modifying the code from Project 1, here is some help getting started...
#!/usr/bin/perl
use strict;
use warnings;

#================================================================
#
# BIOL/CMPU-353
# Spring 2012
# Project 1 Solution
#
# Summary: This Perl program isolates the upstream and genic
#          regions of a sequence. A report is printed, a sample
#          of which is shown below:
#
#          (you paste a sample of your program's output here)
#
# Programmer: Marc Smith
#
# Date Last Modified:
#   02/11/2008 -- started program
#
#===============================================================

print "+++++++++ Upstream and Genic Report ++++++++++++++++\n\n";

my $someSequence;

# upstream and start of a gene ...
$someSequence = "cgccatataatgctcgtccgcgcccta";
print "Starting sequence is: $someSequence \n";

# convert all nucleotides to uppercase
$someSequence = uc($someSequence);
print "Converted to uppercase: $someSequence \n\n";

my $seqLength = length($someSequence);
print "Length of starting sequence is: $seqLength \n";
print "----------------------------------------------------\n\n";

# get the position of the start codon "ATG"
my $ATGPosition = index($someSequence, "ATG");
my $codon2Pos = $ATGPosition + 3;
my $codon3Pos = $ATGPosition + 6;

# get the first three codons
my $codon2 = substr($someSequence, $codon2Pos, 3);
my $codon3 = substr($someSequence, $codon3Pos, 3);

print "ATG start codon begins in position (bp) ", $ATGPosition+1, "\n";
print " followed by codon $codon2 in position (bp) ", $codon2Pos+1, "\n";
print " followed by codon $codon3 in position (bp) ", $codon3Pos+1, "\n\n";
print "----------------------------------------------------\n\n";

my $upStream;
$upStream = substr($someSequence, 0, $ATGPosition);
print "Upstream sequence is: $upStream \n\n";

my $upStreamLength = length($upStream);
print "Upstream length (bp): $upStreamLength \n\n";
print "----------------------------------------------------\n\n";

my $genicSeq = substr($someSequence, $ATGPosition);
my $genicSeqLen = length($genicSeq);
print "Gene sequence is: $genicSeq\n\n";
print "Gene length (bp): $genicSeqLen\n\n";
print "----------------------------------------------------\n\n";

my $reverseCompSeq = reverse($someSequence);
$reverseCompSeq =~ tr/ACTG/TGAC/;
print "Gene + Strand: $someSequence\n\n";
print "Gene - Strand: $reverseCompSeq\n\n";
print "----------------------------------------------------\n\n";

my $origSeqHilighted = lc($someSequence);
$origSeqHilighted =~ s/atg/ATG/;
print "Original sequence highlighted: $origSeqHilighted \n\n";
print "----------------------------------------------------\n\n";

my $numA = $upStream =~ tr/A/A/;
my $numT = $upStream =~ tr/T/T/;
print "Measures of AT-richness:\n";
print "\tA:\t$numA\n";
print "\tT:\t$numT\n\n";
print "----------------------------------------------------\n\n";
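Separately from the Project 1 solution above (and not part of the code to copy), here is a minimal sketch of how the substr calls that pull out codons might be replaced by a regular expression. The helper name codons is hypothetical, and the sketch assumes the gene sequence is uppercase and starts in frame at ATG.

```perl
use strict;
use warnings;

# Hypothetical helper: return the codons of a gene sequence, in order.
# Call it in list context to get the list of triplets.
sub codons {
    my ($gene) = @_;
    # With /g in list context, the match returns every captured triplet.
    # \G anchors each match where the previous one ended, so reading
    # stays in frame and stops at the first non-ACGT base. A trailing
    # partial codon (1 or 2 bases) is ignored.
    return $gene =~ /\G([ACGT]{3})/g;
}

my @codons = codons("ATGCTCGTCCGCGCCCTA");

# Print the first three codons, numbered from 1.
for my $i (0 .. 2) {
    print "Codon ", $i + 1, " = $codons[$i]\n";
}
```

For the sample sequence this prints Codon 1 = ATG, Codon 2 = CTC, and Codon 3 = GTC, which agrees with the sample output.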
Copy/paste the Project 1 solution above into jEdit, then save it via secure FTP on junior:
project2, then
Once your program works properly, use the submit353 script to submit your program electronically. As a reminder, here’s what to do:
cd ~/bioinf
Use the submit353 command to submit your project2 directory (or whatever you named your project directory, if different from project2):
submit353 project2