|
|
| *Looking over the [[Week 6| Week 6 Assignment Page]], I found that <code>curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip</code> can be used, while I am in the ''/nfs/home/blitvak'' directory, to place a .zip containing the data files directly into my personal folder. I found that <code>unzip UCM054599.zip</code> unzips the files into the personal folder. | | *Looking over the [[Week 6| Week 6 Assignment Page]], I found that <code>curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip</code> can be used, while I am in the ''/nfs/home/blitvak'' directory, to place a .zip containing the data files directly into my personal folder. I found that <code>unzip UCM054599.zip</code> unzips the files into the personal folder. |
| *I also downloaded and installed pgAdmin III from [http://www.pgadmin.org/ http://www.pgadmin.org/] | | *I also downloaded and installed pgAdmin III from [http://www.pgadmin.org/ http://www.pgadmin.org/] |
| + | *I spent some time reviewing [[PostgreSQL Tutorial | the PostgreSQL Tutorial]] |
| + | *I booted up PuTTy and unzipped the files into my home folder |
| | | |
| + | ===Working with ''application.txt''=== |
| | | |
| + | ====Opening and Reviewing the File==== |
| + | *I opened up and reviewed the ''application.txt'' and ''Product.txt'' files using <code>more <filename.txt></code>; I found that the data column labels for ''application.txt'' are: |
| + | ApplNo ApplType SponsorApplicant MostRecentLabelAvailableFlag CurrentPatentFlag ActionType Chemical_Type Ther_Potential Orphan_Code |
| + | *Reviewing the actual data, and with PostgreSQL in mind, I found that the variable type for each column should be: |
| + | ApplNo: ''int(primary key)'' ApplType: ''varchar'' SponsorApplicant: ''varchar'' MostRecentLabelAvailableFlag: ''boolean'' CurrentPatentFlag: ''boolean'' ActionType: ''varchar'' Chemical_Type: ''int'' |
| + | Ther_Potential: ''varchar'' Orphan_Code: ''varchar'' |
| + | *I realized that any empty data spaces in ''application.txt'' will have to be turned into <code>null</code> |
| + | *I realized that <code>sed "1D"</code> will have to be executed in order to remove the first row (which is column labeling) |
| + | *Referencing the [[Week 6| Week 6 Assignment Page]], I learned that the data within these text files is separated by ''tabs'' (<code>\t</code>) instead of commas |
| + | ====Modifying ''application.txt''==== |
| + | *I opened the file and decided to try to turn the tabs into commas using <code>cat application.txt | sed "s/\t/,/g"</code> |
| + | **Adding onto that, I decided to get rid of the other spaces between the data by using <code>sed -e "s/\s\{4,\}//g"</code>, which matches 4 whitespaces and removes them. This command was found in a [http://superuser.com/questions/112834/how-to-match-whitespace-in-sed StackExchange] post. |
| + | *At this point, I noticed that many lines had extra commas either at the ends of the lines or in the middle, indicating missing or nonexistent values |
| + | **I used <code>sed "s/,,,\r$/,null,null,null/;s/,,\r$/,null,null/;s/,\r$/,null/1" | sed "s/,,/,null/;s/,,,/,null,null/g"</code>, along with what I already have, in order to turn any extra commas (missing values) into ''null'' |
| + | *My focus now turned to making sure that all data that was identified as being ''varchar'' has apostrophes wrapping around it. |
| + | **Working with the ''ApplType'' data and my previous work, I used <code>sed "s/......,/&'/1" | sed "s/'./&'/1"</code> to surround it with apostrophes. |
| + | **Working with the ''SponsorApplicant'' data, I added <code>sed "s/',/&'/g" | sed "s/,./'&/3"</code> to wrap apostrophes around the third data type in each line |
| + | **I used <code>sed "s/,/&'/5" | sed "s/'../&'/4"</code>, to place apostrophes around the two character data under ''ActionType'' |
| + | *Using <code>grep "nullP"</code>, <code>grep "nullS"</code>, and <code>grep "nullV"</code> with the current pipeline of commands: |
| + | **<code>cat application.txt | sed "1D" | sed "s/\t/,/g" | sed -e "s/\s\{4,\}//g" | sed "s/,,,\r$/,null,null,null/;s/,,\r$/,null,null/;s/,\r$/,null/1" | sed "s/,,/,null/;s/,,,/,null,null/g" | sed "s/......,/&'/1" | sed "s/'./&'/1" | sed "s/',/&'/g" | sed "s/,./'&/3" | sed "s/,/&'/5" | sed "s/'../&'/4" </code> |
| + | **I noticed that some of the null values are not separated from adjacent data with commas; I added <code>sed "s/nullS/null,S/g" | sed "s/nullV/null,V/g" | sed "s/nullP/null,P/g"</code> to fix this issue and I checked the result using <code>grep</code> |
| + | **With the ''Ther_Potential'' data being now completely separated by commas from the other data, I then proceeded to try to surround it with apostrophes. I first added <code>sed "s/,'..',.,/&'/g"</code> to add the first apostrophe, and I noticed that this command led some null values to gain an apostrophe. I added <code>sed "s/'null/null/g"/<code> to clean them up. I later noticed that some ''Ther_potential'' values have asterisks tied to them, I used <code>grep "P\*"</code> and <code>grep "S\*"</code> to confirm the presence of asterisks. Finally, I added <code>sed "s/'S,null/'S',null/g" | sed "s/'P,null/'P',null/g" | sed "s/'S,V/'S',V/g" | sed "s/'P,V/'P',V/g" | sed "s/'P\*/'P\*'/g" | sed "s/'S\*/'S\*'/g"</code> to the pipeline to fully surround the ''Ther_Potential'' values. |
| + | **I surrounded the ''Orphan_Code'' variable with apostrophes by adding <code>sed "s/,V/,'V'/g"</code> to the pipeline (''Orphan_Code'' is often null but when it is present, it is always a ''V'') |
| + | *I finished formatting the file by adding <code>sed "s/^/insert into applications(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | rev | sed -r "s/llun|'V'/;)&/1" | rev</code> to the pipeline. I decided to reverse the file in this set of commands because I could not get <code>sed</code> to add '');'' to the very end of each line; I decided upon creating a multiple choice <code>sed</code> command that worked on the very first match of a line. |
| | | |
| + | ====Generating the ''application.sql.txt'' file==== |
| + | *I generated the ''application.sql.txt'' file using <code>cat application.txt | sed "1D" | sed "s/\t/,/g" | sed -e "s/\s\{4,\}//g" | sed "s/,,,\r$/,null,null,null/;s/,,\r$/,null,null/;s/,\r$/,null/1" | sed "s/,,/,null/;s/,,,/,null,null/g" | sed "s/......,/&'/1" | sed "s/'./&'/1" | sed "s/',/&'/g" | sed "s/,./'&/3" | sed "s/,/&'/5" | sed "s/'../&'/4" | sed "s/nullS/null,S/g" | sed "s/nullV/null,V/g" | sed "s/nullP/null,P/g" | sed "s/,'..',.,/&'/g" | sed "s/'null/null/g" | sed "s/'S,null/'S',null/g" | sed "s/'P,null/'P',null/g" | sed "s/'S,V/'S',V/g" | sed "s/'P,V/'P',V/g" | sed "s/'P\*/'P\*'/g" | sed "s/'S\*/'S\*'/g" | sed "s/,V/,'V'/g" | sed "s/^/insert into applications(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | rev | sed -r "s/llun|'V'/;)&/1" | rev > ~/public_html/application.sql.txt</code> |
| + | *I proceeded to input <code>http://my.cs.lmu.edu/~blitvak/application.sql.txt</code> into my browser and I copied all of the data |
| | | |
| + | ====Testing ''application.sql.txt'' in pgAdmin III==== |
| + | *I created a table in postgreSQL using <code>create table applications (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar)</code> |
| + | *I pasted the data into the program and noticed these types of errors: |
| | | |
| + | 1. ''Type 1'' |
| + | ERROR: syntax error at or near "lse" |
| + | LINE 17882: ... values(125422,'B','THROMBOGENICS', INC,False,'Fa'lse,AP,nul... |
| | | |
− | ===Defining the appropriate tables for the ''Application'' and ''Product entities''===
| + | 2. ''Type 2'' |
| + | ERROR: column "s" does not exist |
| + | LINE 161: ...e) values(008319,'N','NOVARTIS',False,False,'AP',14,S,null); |
| + | ERROR: column "s" does not exist |
| + | LINE 360: ...10721,'N','CITRON PHARMA LLC',False,False,'AP',null,S,null); |
| + | ERROR: column "p" does not exist |
| + | LINE 617: ...de) values(013025,'N','HOSPIRA',False,False,'AP',14,P,null); |
| | | |
Exception encountered, of type "Error"
[ac88b632] /biodb/fall2015/index.php?diff=cur&oldid=2176&title=Blitvak_Week_6 Error from line 434 of /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php: Call to undefined function each()
Backtrace:
#0 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(544): DiffEngine->diag()
#1 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(344): DiffEngine->compareSeq()
#2 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(227): DiffEngine->diffLocal()
#3 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(721): DiffEngine->diff()
#4 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(859): Diff->__construct()
#5 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(980): MappedDiff->__construct()
#6 /apps/xmlpipedb/biodb/fall2015/includes/diff/TableDiffFormatter.php(194): WordLevelDiff->__construct()
#7 /apps/xmlpipedb/biodb/fall2015/includes/diff/DiffFormatter.php(140): TableDiffFormatter->changed()
#8 /apps/xmlpipedb/biodb/fall2015/includes/diff/DiffFormatter.php(111): DiffFormatter->block()
#9 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(888): DiffFormatter->format()
#10 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(802): DifferenceEngine->generateTextDiffBody()
#11 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(733): DifferenceEngine->generateContentDiffBody()
#12 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(662): DifferenceEngine->getDiffBody()
#13 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(632): DifferenceEngine->getDiff()
#14 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(453): DifferenceEngine->showDiff()
#15 /apps/xmlpipedb/biodb/fall2015/includes/page/Article.php(795): DifferenceEngine->showDiffPage()
#16 /apps/xmlpipedb/biodb/fall2015/includes/page/Article.php(506): Article->showDiffPage()
#17 /apps/xmlpipedb/biodb/fall2015/includes/actions/ViewAction.php(44): Article->view()
#18 /apps/xmlpipedb/biodb/fall2015/includes/MediaWiki.php(395): ViewAction->show()
#19 /apps/xmlpipedb/biodb/fall2015/includes/MediaWiki.php(273): MediaWiki->performAction()
#20 /apps/xmlpipedb/biodb/fall2015/includes/MediaWiki.php(566): MediaWiki->performRequest()
#21 /apps/xmlpipedb/biodb/fall2015/includes/MediaWiki.php(414): MediaWiki->main()
#22 /apps/xmlpipedb/biodb/fall2015/index.php(44): MediaWiki->run()
#23 {main}