Jkuroda Week 3

From LMU BioDB 2015
Revision as of 22:59, 17 September 2015 by Jkuroda (Talk | contribs) (first edit)

Complement of a Strand

To get the complement, I immediately thought of replacing each nucleotide with its complement, and so I used sed with "y/atcg/tagc/" to implement my idea.

cat sequence file | sed "y/atcg/tagc/"
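
As a quick sanity check on the transliteration, the sketch below wraps the same sed expression in a helper and runs it on a short made-up sequence. The name complement is hypothetical and not part of the assignment.

```shell
# complement: hypothetical wrapper around the y command above.
# y/atcg/tagc/ maps a->t, t->a, c->g, g->c one character at a time.
complement() {
    sed "y/atcg/tagc/"
}

# Made-up input, not the assignment's sequence file.
echo "atgc" | complement   # prints tacg
```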

Reading Frames

For the first reading frame, I started by replacing the t's with u's and then let the genetic-code.sed file do the rest. This did not work: sed applies each rule in the file to the whole line, so a rule could match three bases that straddle two codons. I was left with a messy line of lone nucleotides with amino acid abbreviations between them. I realized that I could solve this by separating each base triplet with a space, so that only whole codons match. That fixed the translation, but a couple of stray nucleotides remained at the end of the line, so I added a final sed command to remove any leftover nucleotides.

+1
cat sequence file | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[aucg]//g"
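
The effect of the codon spacing can be seen on a toy sequence. This is only a sketch: the two substitution rules below are hypothetical stand-ins for genetic-code.sed, whose actual contents are not shown here, and the input sequences are made up.

```shell
# toy_rules: hypothetical stand-in for "sed -f genetic-code.sed",
# covering only two codons (aug -> M, cgu -> R).
toy_rules() {
    sed -e "s/aug/M/g" -e "s/cgu/R/g"
}

# Without spacing, "aug" is matched across a codon boundary (ga|ugc|gu).
echo "gatgcgt" | sed "s/t/u/g" | toy_rules                      # prints gMR
# With spacing, the real codons are gau gcg u, so neither rule fires.
echo "gatgcgt" | sed "s/t/u/g" | sed "s/.../& /g" | toy_rules   # prints gau gcg u
# In frame, both codons translate and the stray base is stripped,
# leaving "M R " (with a trailing space).
echo "atgcgta" | sed "s/t/u/g" | sed "s/.../& /g" | toy_rules | sed "s/[aucg]//g"
```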

For the next reading frame, the overall process is the same with one small addition: a sed command that deletes the first character of the sequence before anything else happens.

+2
cat sequence file | sed "s/^.//" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[aucg]//g"

Similarly for this reading frame, I just added an extra character to be deleted from the beginning of the sequence.

+3
cat sequence file | sed "s/^..//" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[aucg]//g"
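
The +2 and +3 frames differ from +1 only in how many leading bases are skipped, so the offset can be made a parameter. This is a minimal sketch, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do). The helper name drop_leading is hypothetical, and the input is made up.

```shell
# drop_leading N: delete the first N characters of each input line.
drop_leading() {
    sed -E "s/^.{$1}//"
}

echo "atgcgt" | drop_leading 1   # prints tgcgt  (start of frame +2)
echo "atgcgt" | drop_leading 2   # prints gcgt   (start of frame +3)
```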

For the next three reading frames, I remembered that there was a handy rev command for reversing the characters in a sequence, so I placed that command before the usual sequence of commands. Since rev reads from the pipe, it does not need the file name a second time. Reversing alone only reads the same strand backwards; the minus frames belong to the opposite strand, so the reversed sequence is also complemented with the sed "y/atcg/tagc/" command from the complement step.

-1
cat sequence file | rev | sed "y/atcg/tagc/" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[aucg]//g"

Now that the reverse command is in place, the rest of the commands are similar to the previous reading frames, with the deletion of the first character for -2 and of the first two characters for -3.

-2
cat sequence file | rev | sed "y/atcg/tagc/" | sed "s/^.//" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed |
    sed "s/[aucg]//g"

-3
cat sequence file | rev | sed "y/atcg/tagc/" | sed "s/^..//" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed |
    sed "s/[aucg]//g"
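
The six pipelines share most of their steps, which the sketch below collects into one helper. It rests on several assumptions: the helper name frame_codons is hypothetical, the minus frames are taken from the reverse complement of the strand, sed accepts -E, and the input is a made-up sequence. It stops at the codon-split RNA because the contents of genetic-code.sed are not shown here.

```shell
# frame_codons SEQ FRAME: print the codon-split RNA for one reading frame.
# SEQ is a lowercase DNA string; FRAME is one of 1 2 3 -1 -2 -3.
frame_codons() {
    seq=$1
    frame=$2
    case $frame in
        -*) # minus frames: reverse the strand, then complement it
            seq=$(printf '%s\n' "$seq" | rev | sed "y/atcg/tagc/")
            skip=$(( 0 - frame - 1 ))
            ;;
        *)  skip=$(( frame - 1 ))
            ;;
    esac
    # Skip the leading bases, transcribe t -> u, then split into triplets.
    printf '%s\n' "$seq" | sed -E "s/^.{$skip}//" | sed "s/t/u/g" | sed "s/.../& /g"
}

frame_codons atgcgtaa 1    # prints aug cgu aa
frame_codons atgcgtaa -1   # prints uua cgc au
```

Piping the output through sed -f genetic-code.sed and the final sed "s/[aucg]//g" would then complete the translation as in the commands above.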