|
|
− | Human genes
| + | [[User:Eyanosch | Erich Yanoschik]] |
| | | |
− | about 50% repeated bp's
| + | [[Week 4]] |
| | | |
− | about 5% code for proteins
| + | '''#Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene:''' |
| | | |
− | 45% are regulatory (differentiation)
| + | cat infA-E.coli-K12.txt | grep "[ct]at[at]at" | grep "tt[gt]ac[at]" | sed "s/tttact/ <MINUS35BOX> & <\/MINUS35BOX> /g" | sed "s/cattat/ <MINUS10BOX> & <\/MINUS10BOX> /" | sed |
| + | "s/cttgcc/&\n/g" | sed "2s/g/ <TSS>&<\/TSS>/1" | sed ':a;N;$!ba;s/\n//g' | sed "s/gagg/ <RBS>&<\/RBS> /g" | grep "aaaaggt.*gcctttt" | sed "s/aaaaggtc.*tttttatt/ <TERMINATOR>& |
| + | <\/TERMINATOR> /g" | sed "s/tac/ <START CODON>&<\/START CODON> /7" | sed "s/tga/ <STOP CODON>&<\/STOP CODON> /10" |
| + | ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgtt |
| + | ttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc <MINUS35BOX> tttact </MINUS35BOX> tatttacagaacttcgg <MINUS10BOX> cattat </MINUS10BOX> cttgcc <TSS>g</TSS>gttcaaattacggta |
| + | gtga <START CODON>tac</START CODON> ccca <RBS>gagg</RBS> attagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgt |
| + | agagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgc |
| + | <STOP CODON>tga</STOP CODON> ttgttttaccgcctgatgggcgaagagaaagaacgagt <TERMINATOR>aaaaggtcggtttaaccggcctttttatt</TERMINATOR> ttat |
| | | |
− | 3 billion bp's = human genome size | + | # The first part of this exercise was started in class, with my partner and Professor Dionisio assisting us; the -35 box, -10 box, and TSS were finished then. I later went back and edited my TSS, because I had counted from the wrong nucleotide in the -10 box when first locating it. |
| + | # The start and stop codons were picked after I went back and looked at the codon sequences to see which groups of three nucleotides were actual start and stop codons. |
| + | # By using sed and choosing a distinctive sequence of characters, I could insert the location markers fairly easily. |
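The tagging idea in the notes above can be tried on a small scale. This is only a minimal sketch on a made-up toy sequence (not the infA gene), with hypothetical <START> and <STOP> tag names: in the replacement, & stands for the matched text, and a trailing number tells sed to tag only that occurrence.

```shell
# Toy sequence, invented for illustration only.
seq="ccatgaaatgattgacc"

# Wrap the 1st "atg" and then the 2nd remaining "tga" in tags.
# "&" re-inserts whatever the pattern matched.
tagged=$(printf '%s\n' "$seq" \
  | sed 's/atg/<START>&<\/START>/1' \
  | sed 's/tga/<STOP>&<\/STOP>/2')

printf '%s\n' "$tagged"
```

Counting occurrences by hand is error-prone, so it helps to check each stage of the pipe on its own before chaining the next sed.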
| | | |
− | Roughly ~ 20,296 genes (protein coding genes)
| |
| | | |
− | humans have about 100,000 different proteins in our cells
| + | '''#What is the exact mRNA sequence that is transcribed from this gene?''' |
| | | |
| + | cat infA-E.coli-K12.txt | grep "[ct]at[at]at" | grep "tt[gt]ac[at]" | sed "s/tttact/ <MINUS35BOX> & <\/MINUS35BOX> /g" | sed "s/cattat/ <MINUS10BOX> & <\/MINUS10BOX> /" | sed |
| + | "s/cttgcc/&\n/g" | sed "2s/g/ <TSS>&<\/TSS>/1" | sed ':a;N;$!ba;s/\n//g' | sed "s/gagg/ <RBS>&<\/RBS> /g" | grep "aaaaggt.*gcctttt" | sed "s/aaaaggtc.*tttttatt/ <TERMINATOR>& |
| + | <\/TERMINATOR> /g" | sed "s/tac/ <START CODON>&<\/START CODON> /7" | sed "s/tga/ <STOP CODON>&<\/STOP CODON> /10" | sed "s/t/u/g" | sed "s/ <TSS>/\n/g" | sed |
| + | "s/<\/TSS>/\n/g;s/ <START CODON>/\n/g;s/<\/START CODON> /\n/g;s/ <RBS>/\n/g;s/<\/RBS> /\n/g;s/ <STOP CODON>/\n/g;s/<\/STOP CODON> /\n/g;s/ |
| + | <TERMINATOR>/\n/g;s/<\/TERMINATOR> /\n/g" | sed "1D;11D" |
| + | g |
| + | guucaaauuacgguaguga |
| + | uac |
| + | ccca |
| + | gagg |
| + | auuagauggccaaagaagacaauauugaaaugcaagguaccguucuugaaacguugccuaauaccauguuccgcguagaguuagaaaacggucacgugguuacugcacacaucuccgguaaaaugcgcaaaaacuacauccgcauccugacgggcgacaaagug acuguugaacugaccccguacgaccugagcaaaggccgc |
| + | auu |
| + | gucuuccguagucgcugauuguuuuaccgccugaugggcgaagagaaagaacgagu |
| + | aaaaggucgguuuaaccggccuuuuuauu |
| | | |
− | Open reading frame and doing translation = area coded for a certain gene
| + | *First I had to get rid of all the markers. The easiest way was to put each desired stretch of sequence on its own line and then erase the labels and excess sequence, which I did with a chain of sed commands that builds a new list of lines. The format sed "y/actg/tgac/;s/t/u/g" was used. |
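A shorter route to the same kind of result might look like the sketch below. It is an alternative to the newline-and-delete chain described above, not the method used on this page, and it runs on a made-up tagged string: everything before the TSS tag is dropped, the remaining tags and spaces are deleted, and t is rewritten as u. It assumes, as the pipeline above does, that the stored strand matches the mRNA apart from t/u, and that the tag names are uppercase so "s/t/u/g" cannot touch them.

```shell
# Toy tagged string, invented for illustration only.
tagged="cc <TSS>g</TSS>gtt <RBS>gagg</RBS> atta"

mrna=$(printf '%s\n' "$tagged" \
  | sed 's/.*<TSS>/<TSS>/' \
  | sed 's/<[^>]*>//g; s/ //g' \
  | sed 's/t/u/g')
# 1st sed: discard everything upstream of the transcription start site.
# 2nd sed: delete every <...> tag, then every space.
# 3rd sed: write the sequence as RNA.

printf '%s\n' "$mrna"
```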
| | | |
− | '''Bioinformatics'''
| |
| | | |
− | Database analytical tools - available access to lots of data
| + | cat infA-E.coli-K12.txt | grep "[ct]at[at]at" | grep "tt[gt]ac[at]" | sed "s/tttact/ <MINUS35BOX> & <\/MINUS35BOX> /g" | sed "s/cattat/ <MINUS10BOX> & <\/MINUS10BOX> /" | sed |
| + | "s/cttgcc/&\n/g" | sed "2s/g/ <TSS>&<\/TSS>/1" | sed ':a;N;$!ba;s/\n//g' | sed "s/gagg/ <RBS>&<\/RBS> /g" | grep "aaaaggt.*gcctttt" | sed "s/aaaaggtc.*tttttatt/ <TERMINATOR>& |
| + | <\/TERMINATOR> /g" | sed "s/tac/ <START CODON>&<\/START CODON> /7" | sed "s/tga/ <STOP CODON>&<\/STOP CODON> /10" | sed "s/t/u/g" | sed "s/ <TSS>/\n/g" | sed |
| + | "s/<\/TSS>/\n/g;s/ <START CODON>/\n/g;s/<\/START CODON> /\n/g;s/ <RBS>/\n/g;s/<\/RBS> /\n/g;s/ <STOP CODON>/\n/g;s/<\/STOP CODON> /\n/g;s/ |
| + | <TERMINATOR>/\n/g;s/<\/TERMINATOR> /\n/g" | sed "1D;11D" | sed ':a;N;$!ba;s/\n//g' | sed "s/.../& /g" | sed "s/aug/\n &/1" | sed "1D" | grep "uga" |
| + | aug gcc aaa gaa gac aau auu gaa aug caa ggu acc guu cuu gaa acg uug ccu aau acc aug uuc cgc gua gag uua gaa aac ggu cac gug guu acu gca cac auc ucc ggu aaa aug cgc aaa aac uac auc cgc |
| + | auc cug acg ggc gac aaa gug acu guu gaa cug acc ccg uac gac cug agc aaa ggc cgc auu guc uuc cgu agu cgc uga uug uuu uac cgc cug aug ggc gaa gag aaa gaa cga gua aaa ggu cgg uuu aac cgg |
| + | ccu uuu uau u |
| | | |
− | *information
| |
| | | |
− | **representation, organization, manipulation, distribution, maintenance | + | *I used grep to find the stop codon. To remove the nucleotides before the aug start codon, sed "s/aug/\n &/1" takes the first aug that appears and creates a new line in front of it; sed "1D" then erases the first line, which holds everything before the start codon. |
| | | |
− | transcends all science, interdisciplinary
| + | eyanosch@ab201:/nfs/home/dondi/xmlpipedb/data$ cat infA-E.coli-K12.txt | grep "[ct]at[at]at" | grep "tt[gt]ac[at]" | sed "s/tttact/ <MINUS35BOX> & <\/MINUS35BOX> /g" | sed "s/cattat/ |
| + | <MINUS10BOX> & <\/MINUS10BOX> /" | sed "s/cttgcc/&\n/g" | sed "2s/g/ <TSS>&<\/TSS>/1" | sed ':a;N;$!ba;s/\n//g' | sed "s/gagg/ <RBS>&<\/RBS> /g" | grep "aaaaggt.*gcctttt" | sed |
| + | "s/aaaaggtc.*tttttatt/ <TERMINATOR>&<\/TERMINATOR> /g" | sed "s/tac/ <START CODON>&<\/START CODON> /7" | sed "s/tga/ <STOP CODON>&<\/STOP CODON> /10" | sed "s/t/u/g" | |
| + | sed "s/ <TSS>/\n/g" | sed "s/<\/TSS>/\n/g;s/ <START CODON>/\n/g;s/<\/START CODON> /\n/g;s/ <RBS>/\n/g;s/<\/RBS> /\n/g;s/ <STOP CODON>/\n/g;s/<\/STOP CODON> /\n/g;s/ |
| + | <TERMINATOR>/\n/g;s/<\/TERMINATOR> /\n/g" | sed "1D;11D" | sed ':a;N;$!ba;s/\n//g' | sed "s/.../& /g" | sed "s/aug/\n &/1" | sed "1D" | grep "uga" | sed "s/uga/&\n/g" |
| + | aug gcc aaa gaa gac aau auu gaa aug caa ggu acc guu cuu gaa acg uug ccu aau acc aug uuc cgc gua gag uua gaa aac ggu cac gug guu acu gca cac auc ucc ggu aaa aug cgc aaa aac uac auc cgc |
| + | auc cug acg ggc gac aaa gug acu guu gaa cug acc ccg uac gac cug agc aaa ggc cgc auu guc uuc cgu agu cgc uga |
| + | uug uuu uac cgc cug aug ggc gaa gag aaa gaa cga gua aaa ggu cgg uuu aac cgg ccu uuu uau u |
| | | |
− | check out Data life cycle (plan, collect, assure, describe, preserver, discover, integrate, analyze)
| + | *Utilized grep to find the stop codon and sed to create a new line after, which will then be erased by the sed "2D" function |
| | | |
− | *Key Concepts | + | cat infA-E.coli-K12.txt | grep "[ct]at[at]at" | grep "tt[gt]ac[at]" | sed "s/tttact/ <MINUS35BOX> & <\/MINUS35BOX> /g" | sed "s/cattat/ <MINUS10BOX> & <\/MINUS10BOX> /" | sed |
| + | "s/cttgcc/&\n/g" | sed "2s/g/ <TSS>&<\/TSS>/1" | sed ':a;N;$!ba;s/\n//g' | sed "s/gagg/ <RBS>&<\/RBS> /g" | grep "aaaaggt.*gcctttt" | sed "s/aaaaggtc.*tttttatt/ <TERMINATOR> |
| + | &<\/TERMINATOR> /g" | sed "s/tac/ <START CODON>&<\/START CODON> /7" | sed "s/tga/ <STOP CODON>&<\/STOP CODON> /10" | sed "s/t/u/g" | sed "s/ <TSS>/\n/g" | sed |
| + | "s/<\/TSS>/\n/g;s/ <START CODON>/\n/g;s/<\/START CODON> /\n/g;s/ <RBS>/\n/g;s/<\/RBS> /\n/g;s/ <STOP CODON>/\n/g;s/<\/STOP CODON> /\n/g;s/ |
| + | <TERMINATOR>/\n/g;s/<\/TERMINATOR> /\n/g" | sed "1D;11D" | sed ':a;N;$!ba;s/\n//g' | sed "s/.../& /g" | sed "s/aug/\n &/1" | sed "1D" | grep "uga" | sed "s/uga/&\n/g" | sed "2D" |
| + | aug gcc aaa gaa gac aau auu gaa aug caa ggu acc guu cuu gaa acg uug ccu aau acc aug uuc cgc gua gag uua gaa aac ggu cac gug guu acu gca cac auc ucc ggu aaa aug cgc aaa aac uac auc cgc |
| + | auc cug acg ggc gac aaa gug acu guu gaa cug acc ccg uac gac cug agc aaa ggc cgc auu guc uuc cgu agu cgc uga |
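The trimming steps above can be sketched on a toy mRNA (made up for illustration, not the infA transcript). The sketch assumes GNU sed, where \n in the replacement means a newline, as in the commands on this page. It differs from the pipeline above in one respect: the sequence is cut at the first aug before it is split into triplets, and the stop codon is matched with surrounding spaces, so an out-of-frame "uga" (such as the one inside "aug aaa") is not mistaken for a stop.

```shell
# Toy mRNA, invented for illustration only.
mrna="ggaugaaaccuugauuu"

# Put the first aug at the start of a new line, delete the line before it,
# then split what is left into space-separated triplets.
codons=$(printf '%s\n' "$mrna" \
  | sed 's/aug/\n&/' \
  | sed '1d' \
  | sed 's/.../& /g')

# Keep everything up to and including the first in-frame uga.
orf=$(printf '%s\n' "$codons" | sed 's/ uga .*/ uga/')

printf '%s\n' "$orf"
```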
| | | |
− | ** ID's = identifiers
| |
− | ** Record = entry in a database
| |
− | ** searching a database is executing a query
| |
− | ** Different databases use different file formats
| |
| | | |
| | | |
− | Pertinent types for this class
| + | '''#What is the amino acid sequence that is translated from this mRNA?''' |
− | -Sequence
| + | |
− | -3d structure
| + | |
− | -Model organism databases
| + | |
− | -etc.
| + | |
| | | |