Kzebrows Week 3

From LMU BioDB 2015
Revision as of 01:23, 22 September 2015 by Kzebrows (Talk | contribs) (+2 and +3 reading frames.)


Complement of a Strand

I decided to use the E. coli file for practice in the first part of the assignment. Initially, my instinct was to use sed substitutions to replace the letters in the sequence one at a time. I needed to replace A with T, T with A, C with G, and G with C; however, I realized that chained substitutions run one after another, so every A that had just been changed to T would be changed right back to A by the next substitution, defeating the purpose of the command. I then remembered that I can use sed "y/<original characters>/<new characters>/" to replace everything in one go.

I opened the prokaryote file using cat infA-E.coli-K12.txt, which gave me the DNA sequence of the mRNA-like strand. Since this strand is read from 5' to 3', I needed to create the complementary strand. I then added the sed rule indicating what I wanted to replace, which gave me the complementary strand. The complete command was

cat infA-E.coli-K12.txt | sed "y/atcg/tagc/"
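
To illustrate why the y command is needed, here is a minimal sketch that runs both approaches on a made-up four-letter sequence (the sequence and the variable names are placeholders, not taken from infA-E.coli-K12.txt):

```shell
# Hypothetical sample sequence, not from the assignment file
seq="atcg"

# Chained substitutions: each one also acts on the output of the previous one,
# so the letters changed by s/a/t/g are swept up again by s/t/a/g, and so on.
chained=$(echo "$seq" | sed "s/a/t/g" | sed "s/t/a/g" | sed "s/c/g/g" | sed "s/g/c/g")

# Transliteration: every letter is mapped exactly once, in a single pass.
complement=$(echo "$seq" | sed "y/atcg/tagc/")

echo "chained:    $chained"      # aacc (wrong)
echo "y command:  $complement"   # tagc (the complement)
```

The chained version ends up with only a's and c's, while the y version maps each position to its complementary base.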

Reading Frames

First I opened the file and replaced all of the T's with U's using

sed "s/t/u/g" 

which gave me the DNA sequence transcribed into mRNA. This still gave me one long string of letters, so I used

sed "s/.../& /g" 

to indicate that I wanted a space every three letters, separating the sequence into codons.
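
As a quick check of these two steps, here is a small sketch on a made-up six-letter sequence (a placeholder, not the contents of the E. coli file):

```shell
# Hypothetical sample sequence, not from the assignment file
seq="atggcc"

# Step 1: replace every t with u (DNA letters to mRNA letters)
mrna=$(echo "$seq" | sed "s/t/u/g")

# Step 2: "..." matches any three characters and "&" stands for the match,
# so each group of three is rewritten as itself followed by a space
codons=$(echo "$mrna" | sed "s/.../& /g")

echo "$mrna"     # auggcc
echo "$codons"   # aug gcc (with a trailing space after the last codon)
```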

Then, from looking at the file genetic-code.sed, which lists each codon with the letter of the corresponding amino acid, I knew that this file needed to be added to the list of commands with sed -f so that its rules would be applied to the sequence from infA-E.coli-K12.txt. The final string of commands for the +1 reading frame then looks like this:

cat infA-E.coli-K12.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed
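
To show how sed -f applies a file of rules, here is a rough sketch that uses a tiny two-rule stand-in for genetic-code.sed. The stand-in file, its exact rule format, and the sample sequence are assumptions for illustration; the real genetic-code.sed covers every codon and may be written differently.

```shell
# Hypothetical stand-in for genetic-code.sed with only two codon rules
# (aug codes for M, gcc codes for A in the standard genetic code)
rules=$(mktemp)
printf '%s\n' 's/aug/M/g' 's/gcc/A/g' > "$rules"

# Same pipeline shape as above, run on a made-up sequence instead of the file
protein=$(echo "atggcc" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f "$rules")

echo "$protein"   # M A (with a trailing space)

rm -f "$rules"
```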

For the +2 reading frame I needed to figure out how to delete the first nucleotide in the sequence. I did this by adding the command

sed "s/^.//g" 

This command has to come right after the file is opened. Each (.) after the caret (^) matches one character at the beginning of the line, and replacing the match with nothing deletes it. I also realized that there would be a nucleotide or two left over at the end, so I needed to remove them so that only the codons translated into amino acids would show. This is done with the following command, which deletes any nucleotide letters (a, c, u, g) still left after translation:

sed "s/[acug]//g"

I then proceeded with the rest of the commands so the list of commands looked like this:

cat infA-E.coli-K12.txt | sed "s/^.//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"

To get the +3 reading frame I added one more (.) after the caret so the command was

sed "s/^..//g"

indicating that I wanted to delete the first two characters of the line. The entire command was:

cat infA-E.coli-K12.txt | sed "s/^..//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
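
To compare the three reading frames side by side, here is a minimal sketch on a made-up eight-letter sequence. The helper name to_protein, the sample sequence, and the six inline codon rules standing in for genetic-code.sed are all assumptions for illustration and are not part of the assignment files.

```shell
# Hypothetical sample sequence, not from the assignment file
seq="atggccaa"

# Hypothetical helper: transcribe, split into codons, translate with a small
# stand-in for genetic-code.sed, then drop leftover nucleotide letters
to_protein() {
  sed "s/t/u/g" | sed "s/.../& /g" |
    sed -e "s/aug/M/g" -e "s/gcc/A/g" -e "s/ugg/W/g" \
        -e "s/cca/P/g" -e "s/ggc/G/g" -e "s/caa/Q/g" |
    sed "s/[acug]//g"
}

frame1=$(echo "$seq" | to_protein)                    # starts at nucleotide 1
frame2=$(echo "$seq" | sed "s/^.//"  | to_protein)    # drops 1 nucleotide
frame3=$(echo "$seq" | sed "s/^..//" | to_protein)    # drops 2 nucleotides

echo "+1: $frame1"   # M A
echo "+2: $frame2"   # W P
echo "+3: $frame3"   # G Q
```

Each result keeps a trailing space, because the spaces inserted between codons are not removed. In frames +1 and +2 the leftover nucleotides (aa and a) are stripped by the final sed "s/[acug]//g" step.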