Difference between revisions of "Vpachec3 Week 6"
(added section application text file) |
(→Product text file: added a NOTE) |
||
(22 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
==application text file== | ==application text file== | ||
− | My partner,Kevin, and I met on Sunday at 1:30pm in Seaver 120 to begin our assignment. Downloading and unzipping the file was not a problem. The problem occurred when we had to condense the information. What took me a while to understand is how to get rid of the tabs and spaces efficiently. | + | My partner,[https://xmlpipedb.cs.lmu.edu/biodb/fall2015/index.php/User:Kwyllie Kevin], and I met on Sunday at 1:30pm in Seaver 120 to begin our assignment. Downloading and unzipping the file was not a problem. The problem occurred when we had to condense the information. What took me a while to understand is how to get rid of the tabs and spaces efficiently and where to go from there. We had massive kickstart help from [https://xmlpipedb.cs.lmu.edu/biodb/fall2015/index.php/User:Anuvarsh Anu]!!! We are very grateful she was there on Sunday to help get the thought process going. |
Line 27: | Line 27: | ||
adding apostrophes to the varchar components in the file. We need the apostrophes so that they can be read as text in SQL. Booleans don't need the single quotes so we are in the clear for those. We also have to make sure that the nulls do not have apostrophes. The following pipeline shows that we need to have go by instance of each tilde to add in the apostrophe. And at the end, we had to make sure that the nulls didn't have apostrophes. | adding apostrophes to the varchar components in the file. We need the apostrophes so that they can be read as text in SQL. Booleans don't need the single quotes so we are in the clear for those. We also have to make sure that the nulls do not have apostrophes. The following pipeline shows that we need to have go by instance of each tilde to add in the apostrophe. And at the end, we had to make sure that the nulls didn't have apostrophes. | ||
− | cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g" | + | cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g" |
Line 37: | Line 37: | ||
− | + | To get to our final command we just added the wording and syntax necessary for SQL in front of the line so that we can just copy and paste all the commands straight into SQL. http://my.cs.lmu.edu/~vpachec3/application.sql.txt | |
− | FINAL COMMAND | + | |
+ | '''FINAL COMMAND''' | ||
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"|sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_code) values(/g" |sed "s/$/);/g" > ~/public_html/application.sql.txt | cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"|sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_code) values(/g" |sed "s/$/);/g" > ~/public_html/application.sql.txt | ||
+ | ==Product text file== | ||
+ | cat Product.txt | sed "s/\t/~/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/);/g" > ~/public_html/application.sql.txt | ||
+ | I used the same method to go about this pipe from the last pipe. However the above pipe left me with problems that need fixing. Here were the thing I needed to fix: | ||
+ | #There were two warning labels that became a problem. | ||
+ | # There were plenty of instances where there was a description similar to "5'-Phosphate" where the ' was meant to stand for prime. This was a problem because 'is also used as language so I had to take care of that. | ||
+ | #The column TECode, I had put as varchar when creating the table but I didn't put the column enteries with single quotes so that showed up as a problem. | ||
+ | #Finally, I need to have a command that got ride of the single quotes around the nulls. | ||
+ | |||
+ | Here is where I left off on Monday 10/12 | ||
+ | |||
+ | Thus, after playing around with the different commands several times, I got the pipe that worked. | ||
+ | http://my.cs.lmu.edu/~vpachec3/Product.sql.txt | ||
+ | |||
+ | '''FINAL COMMAND''': | ||
+ | cat Product.txt | sed "s/\t/~/g"|sed "s/'/\"/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/5"| sed s"/~/'~/6"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/'null'/null/g"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/');/g"|sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" > ~/public_html/Product.sql.txt | ||
+ | |||
+ | |||
+ | *'''NOTE''': I would like to acknowledge that after sending my pipe, Kevin and I realized that there was an easier way to get to the commands we need. Adding the tilde as a place holder was not needed. We used the tilde so we could count the instances so we could put in the commas and single quotes but it could have been better by replacing the tabs with commas and using the comma instances for putting in the single quotes. | ||
+ | |||
+ | == Questions to Answer == | ||
+ | |||
+ | 1. Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables. | ||
+ | |||
+ | |||
+ | '''create table application(ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar)''' | ||
+ | |||
+ | |||
+ | |||
+ | '''create table Product(ApplNo int, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus varchar, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar)''' | ||
+ | |||
+ | |||
+ | |||
+ | 2. Provide the <code>sed</code> command sequences that you used to convert the raw text files into sequences of SQL <code>insert</code> statements. | ||
+ | |||
+ | |||
+ | cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"|sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_code) values(/g" |sed "s/$/);/g" > ~/public_html/application.sql.txt | ||
+ | |||
+ | |||
+ | |||
+ | cat Product.txt | sed "s/\t/~/g"|sed "s/'/\"/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/5"| sed s"/~/'~/6"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/'null'/null/g"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/');/g"|sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" > ~/public_html/Product.sql.txt | ||
+ | |||
+ | |||
+ | |||
+ | 3. Using the command line, how can you determine the number of records in each file? Provide the command. | ||
+ | |||
+ | Use this command:cat application.txt| wc and the first set of numbers is the number of records. For this case, you subtract one because the first line is just the column names. So there are 19746 lines which means there are 19746 records for the application text file. For the Product text file, it is 32770. | ||
+ | |||
+ | |||
+ | 4. Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement. | ||
+ | |||
+ | '''select count(*) from Product''' The number was the same 32770 | ||
+ | '''select count(*) from application''' The number was the same 19746 | ||
+ | |||
+ | |||
+ | 5. In your database, are these numbers the same or different? Explain why you think so. | ||
+ | |||
+ | Initially, they would have been one off but I remember having to delete the first row in SQL because the first row is just the titles of the columns. | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | 6. What are the names of the drug products that are administered in the ''form'' <code>INJECTABLE;INTRAVENOUS, SUBCUTANEOUS</code>? | ||
+ | |||
+ | '''select drugname from Product where form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS'''' | ||
+ | |||
+ | ##LOVENOX | ||
+ | ##VELCADE | ||
+ | ##VIDAZA | ||
+ | ##ENOXAPARIN SODIUM | ||
+ | ##ACTEMRA | ||
+ | ##AZACITIDINE | ||
+ | |||
+ | |||
+ | 7. What are the names of the drug products whose active ingredient (''activeingred'') is <code>ATROPINE</code>? | ||
+ | |||
+ | '''select drugname from Product where activeingred = 'ATROPINE'''' | ||
+ | |||
+ | ##ATROPEN | ||
+ | ##ATROPINE | ||
+ | |||
+ | |||
+ | 8. In what ''form''s and ''dosage''s can the drug product named <code>BENADRYL</code> be administered? | ||
+ | |||
+ | '''select form, dosage from Product where drugname = 'BENADRYL'''' | ||
+ | |||
+ | ##CAPSULE;ORAL;50MG | ||
+ | ##ELIXIR;ORAL;12.5MG/5ML | ||
+ | ##CAPSULE;ORAL";25MG" | ||
+ | ##INJECTABLE;INJECTION;10MG/ML | ||
+ | ##INJECTABLE;INJECTION";50MG/ML | ||
+ | |||
+ | |||
+ | 9. Which drug products have a name ''ending'' in <code>ESTROL</code>? | ||
+ | |||
+ | '''select drugname from Product where drugname like '%ESTROL%'''' | ||
+ | |||
+ | ##DIETHYLSTILBESTROL | ||
+ | ##STILBESTROL | ||
+ | ##DIENESTROL | ||
+ | ##MEGESTROL ACETATE | ||
+ | |||
+ | |||
+ | 10. Produce a table listing all of the known values for the ''therapeutic_potential'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''therapeutic_potential'' codes are explained in the ''ReviewClass_Lookup.txt'' file, in case you’re interested.) | ||
+ | |||
+ | select ther_potential, count (*) from application where ther_potential like '%' group by ther_potential | ||
+ | |||
+ | ##S*;3 | ||
+ | ##S;3459 | ||
+ | ##P;928 | ||
+ | ##P*;26 | ||
+ | |||
+ | |||
+ | 11. Produce a table listing all of the known values for the ''chemical_type'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''chemical_type'' codes are explained in the ''ChemTypeLookup.txt'' file, in case you’re interested.) | ||
+ | |||
+ | I realized that I had put varchar for the chemical_type instead of int, so I had to go and change my table and to be able to do this command. After I did that, here is the command for SQL:'''select chemical_type, count (*) from application where chemical_type >=0 or chemical_type <0 group by chemical_type''' | ||
+ | |||
+ | 6;87 | ||
+ | |||
+ | 14;34 | ||
+ | |||
+ | 34;9 | ||
+ | |||
+ | 8;27 | ||
+ | |||
+ | 1;1271 | ||
+ | |||
+ | 2;139 | ||
+ | |||
+ | 3;1464 | ||
+ | |||
+ | 24;8 | ||
+ | |||
+ | 7;35 | ||
+ | |||
+ | 10;2 | ||
+ | |||
+ | 23;8 | ||
+ | |||
+ | 4;323 | ||
+ | |||
+ | 5;1046 | ||
+ | |||
+ | 9;1 | ||
+ | |||
+ | |||
+ | 12. What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>? | ||
+ | |||
+ | '''select drugname from product inner join application on (product.applno = application.applno) where sponsorapplicant = 'MERCK' group by drugname''' | ||
+ | |||
+ | ##DECADRON | ||
+ | ##HUMORSOL | ||
+ | ##NEO-HYDELTRASOL | ||
+ | ##PRINIVIL | ||
+ | ##MAXALT-MLT | ||
+ | ##HYDROCORTONE | ||
+ | ##PERIACTIN | ||
+ | ##PROPECIA | ||
+ | ##PROSCAR | ||
+ | ##CLINORIL | ||
+ | ##PRINZIDE | ||
+ | ##ELSPAR | ||
+ | ##ALDOMET | ||
+ | ##ALDORIL D30 | ||
+ | ##EMEND | ||
+ | ##DIUPRES-500 | ||
+ | ##NEODECADRON | ||
+ | ##ALDOCLOR-250 | ||
+ | ##MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER | ||
+ | ##ARAMINE | ||
+ | ##TIAMATE | ||
+ | ##BLOCADREN | ||
+ | ##CANCIDAS | ||
+ | ##CHIBROXIN | ||
+ | ##CORTONE | ||
+ | ##PEPCID | ||
+ | ##TRUSOPT | ||
+ | ##REDISOL | ||
+ | ##VIOXX | ||
+ | ##FLOROPRYL | ||
+ | ##ALPHAREDISOL | ||
+ | ##DECADRON-LA | ||
+ | ##MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER" | ||
+ | ##PEPCID RPD | ||
+ | ##NOROXIN | ||
+ | ##ZOCOR | ||
+ | ##LERITINE | ||
+ | ##DECADERM | ||
+ | ##DOLOBID" | ||
+ | ##MANNITOL 25% | ||
+ | ##DECADRON W/ XYLOCAINE | ||
+ | ##ALDORIL D50 | ||
+ | ##TIMOLIDE 10-25 | ||
+ | ##CYCLAINE | ||
+ | ##HYDROPRES 25 | ||
+ | ##AMINOHIPPURATE SODIUM | ||
+ | ##MEVACOR | ||
+ | ##MODURETIC 5-50 | ||
+ | ##ALDORIL 25 | ||
+ | ##SINGULAIR | ||
+ | ##COLBENEMID | ||
+ | ##DIUPRES-250 | ||
+ | ##HYDELTRA-TBA | ||
+ | ##PRIMAXIN | ||
+ | ##BENEMID | ||
+ | ##MAXALT | ||
+ | ##FOSAMAX PLUS D | ||
+ | ##HYDELTRASOL | ||
+ | ##HYDROPRES 50 | ||
+ | ##ZOLINZA | ||
+ | ##ALDORIL 15 | ||
+ | ##FOSAMAX | ||
+ | ##DECASPRAY | ||
+ | ##COGENTIN | ||
+ | ##ALDOCLOR-150 | ||
+ | ##PEPCID PRESERVATIVE FREE | ||
+ | |||
+ | |||
+ | 13. Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>? | ||
+ | |||
+ | '''select sponsorapplicant from application inner join product on (application.applno = product.applno) where sponsorapplicant like '%LABS%' and activeingred like '%ASPIRIN%' and activeingred like '%CAFFEINE%' group by sponsorapplicant''' | ||
+ | |||
+ | ##ACTAVIS LABS UT INC | ||
+ | ##WATSON LABS | ||
==Links== | ==Links== | ||
[https://xmlpipedb.cs.lmu.edu/biodb/fall2015/index.php/User:Vpachec3 Vpachec3 User Page] | [https://xmlpipedb.cs.lmu.edu/biodb/fall2015/index.php/User:Vpachec3 Vpachec3 User Page] | ||
+ | |||
+ | {{Template:Vpachec3 journal links}} | ||
+ | |||
[[Category: Journal Entry]] | [[Category: Journal Entry]] |
Latest revision as of 23:30, 14 October 2015
application text file
My partner,Kevin, and I met on Sunday at 1:30pm in Seaver 120 to begin our assignment. Downloading and unzipping the file was not a problem. The problem occurred when we had to condense the information. What took me a while to understand is how to get rid of the tabs and spaces efficiently and where to go from there. We had massive kickstart help from Anu!!! We are very grateful she was there on Sunday to help get the thought process going.
Here is the break down for each section of the command:
cat application.txt
This opened the text file.
cat application.txt|sed "s/\t/~/g"
The added sed command replaced each tab instance with a tilde. We used a tilde as a filler because it was easier to put the tilde instead of another marker such as an asterick in terms of syntax. Now we need to get rid of the extra spaces in the line. However, we can't just use a command to get rid of the the spaces because some of the company names have spaces in them. Since the names only have one space we can make the command to get rid of spaces in pairs.
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"
The second sed command would allow for any pairs of spaces to reduce.
I am going to jump in my explanation and explain the next four added commands (see below). I am lumping the explanation because they all are going for the same goal. The fist of the four gets rid of the remaining space in front of the tilde by replacing it with just a tilde, no space. The following step was to make sure that the columns with no answer in them (any empty entry in the columns) have null instead of an empty section. We need to do this exact same command twice because the computer will only read the groups of two if there were a greater number it would only count the first two. The last command of the four newly added makes sure that the end of line has a null if there is no information in the column for each row.
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"
adding apostrophes to the varchar components in the file. We need the apostrophes so that they can be read as text in SQL. Booleans don't need the single quotes so we are in the clear for those. We also have to make sure that the nulls do not have apostrophes. The following pipeline shows that we need to have go by instance of each tilde to add in the apostrophe. And at the end, we had to make sure that the nulls didn't have apostrophes.
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"
change the tilde into commas.
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"
To get to our final command we just added the wording and syntax necessary for SQL in front of the line so that we can just copy and paste all the commands straight into SQL. http://my.cs.lmu.edu/~vpachec3/application.sql.txt
FINAL COMMAND
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"|sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_code) values(/g" |sed "s/$/);/g" > ~/public_html/application.sql.txt
Product text file
cat Product.txt | sed "s/\t/~/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/);/g" > ~/public_html/application.sql.txt
I used the same method to go about this pipe from the last pipe. However the above pipe left me with problems that need fixing. Here were the thing I needed to fix:
- There were two warning labels that became a problem.
- There were plenty of instances where there was a description similar to "5'-Phosphate" where the ' was meant to stand for prime. This was a problem because 'is also used as language so I had to take care of that.
- The column TECode, I had put as varchar when creating the table but I didn't put the column enteries with single quotes so that showed up as a problem.
- Finally, I need to have a command that got ride of the single quotes around the nulls.
Here is where I left off on Monday 10/12
Thus, after playing around with the different commands several times, I got the pipe that worked. http://my.cs.lmu.edu/~vpachec3/Product.sql.txt
FINAL COMMAND:
cat Product.txt | sed "s/\t/~/g"|sed "s/'/\"/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/5"| sed s"/~/'~/6"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/'null'/null/g"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/');/g"|sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" > ~/public_html/Product.sql.txt
- NOTE: I would like to acknowledge that after sending my pipe, Kevin and I realized that there was an easier way to get to the commands we need. Adding the tilde as a place holder was not needed. We used the tilde so we could count the instances so we could put in the commas and single quotes but it could have been better by replacing the tabs with commas and using the comma instances for putting in the single quotes.
Questions to Answer
1. Provide the DDL (create table
) statements that you used for your application and product tables.
create table application(ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar)
create table Product(ApplNo int, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus varchar, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar)
2. Provide the sed
command sequences that you used to convert the raw text files into sequences of SQL insert
statements.
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"|sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_code) values(/g" |sed "s/$/);/g" > ~/public_html/application.sql.txt
cat Product.txt | sed "s/\t/~/g"|sed "s/'/\"/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/5"| sed s"/~/'~/6"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/'null'/null/g"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/');/g"|sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" > ~/public_html/Product.sql.txt
3. Using the command line, how can you determine the number of records in each file? Provide the command.
Use this command:cat application.txt| wc and the first set of numbers is the number of records. For this case, you subtract one because the first line is just the column names. So there are 19746 lines which means there are 19746 records for the application text file. For the Product text file, it is 32770.
4. Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL select
statement.
select count(*) from Product The number was the same 32770 select count(*) from application The number was the same 19746
5. In your database, are these numbers the same or different? Explain why you think so.
Initially, they would have been one off but I remember having to delete the first row in SQL because the first row is just the titles of the columns.
6. What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS
?
select drugname from Product where form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS'
- LOVENOX
- VELCADE
- VIDAZA
- ENOXAPARIN SODIUM
- ACTEMRA
- AZACITIDINE
7. What are the names of the drug products whose active ingredient (activeingred) is ATROPINE
?
select drugname from Product where activeingred = 'ATROPINE'
- ATROPEN
- ATROPINE
8. In what forms and dosages can the drug product named BENADRYL
be administered?
select form, dosage from Product where drugname = 'BENADRYL'
- CAPSULE;ORAL;50MG
- ELIXIR;ORAL;12.5MG/5ML
- CAPSULE;ORAL";25MG"
- INJECTABLE;INJECTION;10MG/ML
- INJECTABLE;INJECTION";50MG/ML
9. Which drug products have a name ending in ESTROL
?
select drugname from Product where drugname like '%ESTROL%'
- DIETHYLSTILBESTROL
- STILBESTROL
- DIENESTROL
- MEGESTROL ACETATE
10. Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
select ther_potential, count (*) from application where ther_potential like '%' group by ther_potential
- S*;3
- S;3459
- P;928
- P*;26
11. Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
I realized that I had put varchar for the chemical_type instead of int, so I had to go and change my table and to be able to do this command. After I did that, here is the command for SQL:select chemical_type, count (*) from application where chemical_type >=0 or chemical_type <0 group by chemical_type
6;87
14;34
34;9
8;27
1;1271
2;139
3;1464
24;8
7;35
10;2
23;8
4;323
5;1046
9;1
12. What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK
?
select drugname from product inner join application on (product.applno = application.applno) where sponsorapplicant = 'MERCK' group by drugname
- DECADRON
- HUMORSOL
- NEO-HYDELTRASOL
- PRINIVIL
- MAXALT-MLT
- HYDROCORTONE
- PERIACTIN
- PROPECIA
- PROSCAR
- CLINORIL
- PRINZIDE
- ELSPAR
- ALDOMET
- ALDORIL D30
- EMEND
- DIUPRES-500
- NEODECADRON
- ALDOCLOR-250
- MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER
- ARAMINE
- TIAMATE
- BLOCADREN
- CANCIDAS
- CHIBROXIN
- CORTONE
- PEPCID
- TRUSOPT
- REDISOL
- VIOXX
- FLOROPRYL
- ALPHAREDISOL
- DECADRON-LA
- MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER"
- PEPCID RPD
- NOROXIN
- ZOCOR
- LERITINE
- DECADERM
- DOLOBID"
- MANNITOL 25%
- DECADRON W/ XYLOCAINE
- ALDORIL D50
- TIMOLIDE 10-25
- CYCLAINE
- HYDROPRES 25
- AMINOHIPPURATE SODIUM
- MEVACOR
- MODURETIC 5-50
- ALDORIL 25
- SINGULAIR
- COLBENEMID
- DIUPRES-250
- HYDELTRA-TBA
- PRIMAXIN
- BENEMID
- MAXALT
- FOSAMAX PLUS D
- HYDELTRASOL
- HYDROPRES 50
- ZOLINZA
- ALDORIL 15
- FOSAMAX
- DECASPRAY
- COGENTIN
- ALDOCLOR-150
- PEPCID PRESERVATIVE FREE
13. Which sponsor applicant companies have the text LABS
in their names and have products whose active ingredients (activeingred) include both ASPIRIN
and CAFFEINE
?
select sponsorapplicant from application inner join product on (application.applno = product.applno) where sponsorapplicant like '%LABS%' and activeingred like '%ASPIRIN%' and activeingred like '%CAFFEINE%' group by sponsorapplicant
- ACTAVIS LABS UT INC
- WATSON LABS
Links
- Individual Journal Week 2
- Individual Journal Week 3
- Individual Journal Week 4
- Individual Journal Week 5
- Individual Journal Week 6
- Individual Journal Week 7
- Individual Journal Week 8
- Individual Journal Week 9
- Individual Journal Week 10
- Individual Journal Week 11
- Individual Journal Week 12
- Individual Journal Week 13
- Individual Journal Week 14
- Individual Journal Week 15