Troque Week 4
Transcription and Translation “Taken to the Next Level”
First, log in to the LMU CS server using ssh. Type the following in a command prompt (Windows) or terminal (Mac) window:
ssh <username>@my.cs.lmu.edu
Enter your password. Note: Nothing appears on the screen as you type your password, so just keep typing. Then change to dondi's directory using the following command to find the practice files and other miscellaneous files:
cd ~dondi/xmlpipedb/data
Here, you can use the command ls to see the list of files in the directory. Then we can start manipulating some files.
Note: I collaborated with Lena Olufson when starting this assignment. We first decided to use grep to see where the pattern would be (it was actually my fault since we could've jumped to using sed right away, but I didn't read the assignment description thoroughly; I didn't notice that we were supposed to add the tags).
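As a rough sketch of the grep step mentioned in the note, the -o option prints each match on its own line, which makes it easy to see what the pattern finds before any tags are added. The sequence below is made up for illustration and is not taken from the practice files.

```shell
# Toy sequence, invented for this example (not from the practice files).
seq="ggcattgacatcgatttacaccg"

# -o prints only the matching text, one match per line.
matches=$(printf '%s\n' "$seq" | grep -o "tt[gt]ac[at]")
printf '%s\n' "$matches"
```

On this toy sequence, grep reports two matches, ttgaca and tttaca.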
In this assignment, we will be manipulating the file infA-E.coli-K12.txt.
Adding tags to the strand
We start off by adding the -35 box of the promoter. The tag that we will add is
...<minus35box>...</minus35box>...
We do this by using the sed command to replace each match of the pattern with the same text wrapped in the tags. For this part, we are looking for the pattern tt[gt]ac[at].
Type the following command to insert the tags around the second occurrence of the pattern on each line (the 2 at the end of the command tells sed to replace only the second match):
sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/2" infA-E.coli-K12.txt
Since we want to keep the matched pattern and add the tags around it, we use the & symbol, which stands for the text that was matched. We also add a new line after the closing tag using \n. This will be especially useful later on when we are looking for the -10 box. Note: we cannot simply type a forward slash (/) inside the closing tag; sed uses the forward slash to separate the parts of the s command, so we have to "escape" it using the escape character backslash (\). After running the command above, the output looks something like this:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattacccc gctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttag ccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatctttactta<minus35box>tttaca</minus35box> gaacttcggcattatcttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaa ggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaa atgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtc ttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
Note: This is actually just 2 lines of bases: the second line starts after the tag </minus35box>. For visual purposes, I decided to break up the strand into lines for this wiki.
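The pieces of the sed command used above (the & symbol, the trailing 2, and the escaped slash) can be tried on a short sequence first. The following is a minimal sketch: the sequence and the <box> tag name are made up for illustration. It also shows that a different delimiter, here |, is an alternative to escaping the forward slash.

```shell
# Toy sequence, invented for this example; it contains two matches
# of the pattern: ttgaca and tttaca.
seq="aattgacaggtttacacc"

# & stands for the matched text; the trailing 2 replaces only the
# second match on the line. With | as the delimiter, the / in the
# closing tag needs no escaping.
tagged=$(printf '%s\n' "$seq" | sed "s|tt[gt]ac[at]|<box>&</box>|2")
printf '%s\n' "$tagged"

# Same substitution with / as the delimiter, so the / in the
# closing tag has to be escaped as \/.
tagged_escaped=$(printf '%s\n' "$seq" | sed "s/tt[gt]ac[at]/<box>&<\/box>/2")
printf '%s\n' "$tagged_escaped"
```

Both commands print aattgacagg<box>tttaca</box>cc, leaving the first match (ttgaca) untouched.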
Next, we can add the -10 box. We will use a command similar to the one we used for the -35 box:
sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g"
When we pipe the commands for the -35 and -10 boxes together,
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/2" | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/g"
the tags are added around the patterns found, so we have the following strand:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt<minus10box>tataat</minus10box> tgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgc tcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaa atctttactta<minus35box>tttaca</minus35box> gaacttcgg<minus10box>cattat</minus10box> cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaa acgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactac atccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctga ttgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
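To check how many -10 box candidates a pipeline like this tags, the opening tags can be counted with grep -o and wc -l. This is only a sketch on a made-up sequence (not the infA data), and it leaves out the \n and the 2 flag to keep the output on one line.

```shell
# Toy sequence, invented for this example: it has one -35 box
# candidate (ttgaca) and two -10 box candidates (tataat, cattat).
seq="ggtataatccttgacaggcattatcc"

# The first sed tags the -35 box; its output is piped into the
# second sed, which tags every -10 box candidate (g flag).
tagged=$(printf '%s\n' "$seq" \
  | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/" \
  | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/g")
printf '%s\n' "$tagged"

# Count the opening -10 tags: grep -o prints one per line,
# wc -l counts the lines, tr strips any padding spaces.
count=$(printf '%s\n' "$tagged" | grep -o "<minus10box>" | wc -l | tr -d ' ')
printf '%s\n' "$count"
```

Here the count is 2, one tag before the -35 box and one after it.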
Notice that sed found 2 matches for the pattern for the -10 box. We can determine which is the correct one by remembering that the -35 box and -10 box are generally 17 bases apart. This means that, from the end of the -35 box to the start of the -10 box, there are 17 bases. We use the sed
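A spacing rule like this can be written directly into the pattern. The sketch below is an assumption about one way to express it, not necessarily the command used in this assignment: it relies on sed -E (extended regular expressions) and on a made-up sequence with a 17-base spacer, and it tags a -10 box only when it starts exactly 17 bases after the end of the -35 box.

```shell
# Made-up 17-base spacer containing no a or t, so it cannot match
# either box pattern.
spacer="gcgcgcgcgcgcgcgcg"

# Toy sequence: a decoy -10 candidate (tataat) before the -35 box,
# then ttgaca, the spacer, and a second tataat.
seq="cctataatggttgaca${spacer}tataatcc"

# \1 is the -35 box, \2 the 17 bases in between, \3 the -10 box.
tagged=$(printf '%s\n' "$seq" | sed -E \
  "s/(tt[gt]ac[at])(.{17})([ct]at[at]at)/<minus35box>\1<\/minus35box>\2<minus10box>\3<\/minus10box>/")
printf '%s\n' "$tagged"
```

Only the tataat that follows the spacer is tagged; the decoy at the start of the sequence is left alone because it does not sit 17 bases after a -35 box.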