Difference between revisions of "Vpachec3 Week 6"

From LMU BioDB 2015
Jump to: navigation, search
(Questions to Answer: made that into a section instead of a subsection)
(Questions to Answer: added answers to the questions)
Line 63: Line 63:
 
== Questions to Answer ==
 
== Questions to Answer ==
  
# Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables.
+
1. Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables.
# Provide the <code>sed</code> command sequences that you used to convert the raw text files into sequences of SQL <code>insert</code> statements.
+
create table application(ApplNo int,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_code)
# Using the command line, how can you determine the number of records in each file? Provide the command.
+
create table Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred)
# Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement.
+
 
# In your database, are these numbers the same or different? Explain why you think so.
+
2. Provide the <code>sed</code> command sequences that you used to convert the raw text files into sequences of SQL <code>insert</code> statements.
#: ''For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.''
+
 
# What are the names of the drug products that are administered in the ''form'' <code>INJECTABLE;INTRAVENOUS, SUBCUTANEOUS</code>?
+
 
# What are the names of the drug products whose active ingredient (''activeingred'') is <code>ATROPINE</code>?
+
3. Using the command line, how can you determine the number of records in each file? Provide the command.
# In what ''form''s and ''dosage''s can the drug product named <code>BENADRYL</code> be administered?
+
Use this command:cat application.txt| wc and the first set of numbers is the number of records. For this case, you subtract one because the first line is just the column names. So there are 19746 lines which means there are 19746 records for the application text file. For the Product text file, it is 32770.
# Which drug products have a name ''ending'' in <code>ESTROL</code>?
+
 
# Produce a table listing all of the known values for the ''therapeutic_potential'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''therapeutic_potential'' codes are explained in the ''ReviewClass_Lookup.txt'' file, in case you’re interested.)
+
 
# Produce a table listing all of the known values for the ''chemical_type'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''chemical_type'' codes are explained in the ''ChemTypeLookup.txt'' file, in case you’re interested.)
+
4. Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement.
# What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>?
+
'''select count(*) from Product''' The number was the same 32770
# Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>?
+
'''select count(*) from application''' The number was the same 19746
 +
 
 +
 
 +
5. In your database, are these numbers the same or different? Explain why you think so.
 +
Initially, they would have been one off but I remember having to delete the first row in SQL because the first row is just the titles of the columns.  
 +
 
 +
 
 +
 
 +
 
 +
6. What are the names of the drug products that are administered in the ''form'' <code>INJECTABLE;INTRAVENOUS, SUBCUTANEOUS</code>?
 +
 
 +
'''select drugname from Product where form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS''''
 +
 
 +
##LOVENOX
 +
##VELCADE
 +
##VIDAZA
 +
##ENOXAPARIN SODIUM
 +
##ACTEMRA
 +
##AZACITIDINE
 +
 
 +
 
 +
7. What are the names of the drug products whose active ingredient (''activeingred'') is <code>ATROPINE</code>?
 +
 
 +
'''select drugname from Product where activeingred = 'ATROPINE''''
 +
 
 +
##ATROPEN
 +
##ATROPINE
 +
 
 +
 
 +
8. In what ''form''s and ''dosage''s can the drug product named <code>BENADRYL</code> be administered?
 +
 
 +
'''select form, dosage from Product where drugname = 'BENADRYL''''
 +
 
 +
##CAPSULE;ORAL;50MG
 +
##ELIXIR;ORAL;12.5MG/5ML
 +
##CAPSULE;ORAL";25MG"
 +
##INJECTABLE;INJECTION;10MG/ML
 +
##INJECTABLE;INJECTION";50MG/ML
 +
 
 +
 
 +
9. Which drug products have a name ''ending'' in <code>ESTROL</code>?  
 +
 
 +
'''select drugname from Product where drugname like '%ESTROL%''''
 +
 
 +
##DIETHYLSTILBESTROL
 +
##STILBESTROL
 +
##DIENESTROL
 +
##MEGESTROL ACETATE
 +
 
 +
 
 +
10. Produce a table listing all of the known values for the ''therapeutic_potential'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''therapeutic_potential'' codes are explained in the ''ReviewClass_Lookup.txt'' file, in case you’re interested.)
 +
 
 +
select ther_potential, count (*) from application where ther_potential like '%' group by ther_potential
 +
 
 +
##S*;3
 +
##S;3459
 +
##P;928
 +
##P*;26
 +
 
 +
 
 +
11. Produce a table listing all of the known values for the ''chemical_type'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''chemical_type'' codes are explained in the ''ChemTypeLookup.txt'' file, in case you’re interested.)
 +
 
 +
 
 +
12. What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>?
 +
 
 +
'''select drugname from product inner join application on (product.applno = application.applno) where sponsorapplicant = 'MERCK' group by drugname'''
 +
 
 +
##DECADRON
 +
##HUMORSOL
 +
##NEO-HYDELTRASOL
 +
##PRINIVIL
 +
##MAXALT-MLT
 +
##HYDROCORTONE
 +
##PERIACTIN
 +
##PROPECIA
 +
##PROSCAR
 +
##CLINORIL
 +
##PRINZIDE
 +
##ELSPAR
 +
##ALDOMET
 +
##ALDORIL D30
 +
##EMEND
 +
##DIUPRES-500
 +
##NEODECADRON
 +
##ALDOCLOR-250
 +
##MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER
 +
##ARAMINE
 +
##TIAMATE
 +
##BLOCADREN
 +
##CANCIDAS
 +
##CHIBROXIN
 +
##CORTONE
 +
##PEPCID
 +
##TRUSOPT
 +
##REDISOL
 +
##VIOXX
 +
##FLOROPRYL
 +
##ALPHAREDISOL
 +
##DECADRON-LA
 +
##MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER"
 +
##PEPCID RPD
 +
##NOROXIN
 +
##ZOCOR
 +
##LERITINE
 +
##DECADERM
 +
##DOLOBID"
 +
##MANNITOL 25%
 +
##DECADRON W/ XYLOCAINE
 +
##ALDORIL D50
 +
##TIMOLIDE 10-25
 +
##CYCLAINE
 +
##HYDROPRES 25
 +
##AMINOHIPPURATE SODIUM
 +
##MEVACOR
 +
##MODURETIC 5-50
 +
##ALDORIL 25
 +
##SINGULAIR
 +
##COLBENEMID
 +
##DIUPRES-250
 +
##HYDELTRA-TBA
 +
##PRIMAXIN
 +
##BENEMID
 +
##MAXALT
 +
##FOSAMAX PLUS D
 +
##HYDELTRASOL
 +
##HYDROPRES 50
 +
##ZOLINZA
 +
##ALDORIL 15
 +
##FOSAMAX
 +
##DECASPRAY
 +
##COGENTIN
 +
##ALDOCLOR-150
 +
##PEPCID PRESERVATIVE FREE
 +
 
 +
 
 +
13. Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>?
 +
 
 +
'''select sponsorapplicant from application inner join product on (application.applno = product.applno) where sponsorapplicant like '%LABS%' and activeingred like '%ASPIRIN%' and activeingred like '%CAFFEINE%' group by sponsorapplicant'''
 +
 
 +
##ACTAVIS LABS UT INC
 +
##WATSON LABS
  
 
==Links==
 
==Links==

Revision as of 06:00, 14 October 2015

application text file

My partner,Kevin, and I met on Sunday at 1:30pm in Seaver 120 to begin our assignment. Downloading and unzipping the file was not a problem. The problem occurred when we had to condense the information. What took me a while to understand is how to get rid of the tabs and spaces efficiently and where to go from there. We had massive kickstart help from Anu!!! We are very grateful she was there on Sunday to help get the thought process going.


Here is the break down for each section of the command:

cat application.txt 

This opened the text file.

cat application.txt|sed "s/\t/~/g"

The added sed command replaced each tab instance with a tilde. We used a tilde as a filler because it was easier to put the tilde instead of another marker such as an asterick in terms of syntax. Now we need to get rid of the extra spaces in the line. However, we can't just use a command to get rid of the the spaces because some of the company names have spaces in them. Since the names only have one space we can make the command to get rid of spaces in pairs.

cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"

The second sed command would allow for any pairs of spaces to reduce.

I am going to jump in my explanation and explain the next four added commands (see below). I am lumping the explanation because they all are going for the same goal. The fist of the four gets rid of the remaining space in front of the tilde by replacing it with just a tilde, no space. The following step was to make sure that the columns with no answer in them (any empty entry in the columns) have null instead of an empty section. We need to do this exact same command twice because the computer will only read the groups of two if there were a greater number it would only count the first two. The last command of the four newly added makes sure that the end of line has a null if there is no information in the column for each row.

cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"


adding apostrophes to the varchar components in the file. We need the apostrophes so that they can be read as text in SQL. Booleans don't need the single quotes so we are in the clear for those. We also have to make sure that the nulls do not have apostrophes. The following pipeline shows that we need to have go by instance of each tilde to add in the apostrophe. And at the end, we had to make sure that the nulls didn't have apostrophes.

cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"


change the tilde into commas.

cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"


To get to our final command we just added the wording and syntax necessary for SQL in front of the line so that we can just copy and paste all the commands straight into SQL. http://my.cs.lmu.edu/~vpachec3/application.sql.txt

FINAL COMMAND

cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"|sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_code) values(/g" |sed "s/$/);/g"  > ~/public_html/application.sql.txt

Product text file

Here is where I left off on Monday 10/12

cat Product.txt | sed "s/\t/~/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/);/g"  > ~/public_html/application.sql.txt

I used the same method to go about this pipe from the last pipe. However the above pipe left me with problems that need fixing. Here were the thing I needed to fix:

  1. There were two warning labels that became a problem.
  2. There were plenty of instances where there was a description similar to "5'-Phosphate" where the ' was meant to stand for prime. This was a problem because 'is also used as language so I had to take care of that.
  3. The column TECode, I had put as varchar when creating the table but I didn't put the column enteries with single quotes so that showed up as a problem.
  4. Finally, I need to have a command that got ride of the single quotes around the nulls.


Thus, after playing around with the different commands several times, I got the pipe that worked.


FINAL COMMAND:

 cat Product.txt | sed "s/\t/~/g"|sed "s/'/\"/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/5"| sed s"/~/'~/6"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/'null'/null/g"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/');/g"|sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" > ~/public_html/Product.sql.txt

Questions to Answer

1. Provide the DDL (create table) statements that you used for your application and product tables. create table application(ApplNo int,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_code) create table Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred)

2. Provide the sed command sequences that you used to convert the raw text files into sequences of SQL insert statements.


3. Using the command line, how can you determine the number of records in each file? Provide the command. Use this command:cat application.txt| wc and the first set of numbers is the number of records. For this case, you subtract one because the first line is just the column names. So there are 19746 lines which means there are 19746 records for the application text file. For the Product text file, it is 32770.


4. Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL select statement. select count(*) from Product The number was the same 32770 select count(*) from application The number was the same 19746


5. In your database, are these numbers the same or different? Explain why you think so. Initially, they would have been one off but I remember having to delete the first row in SQL because the first row is just the titles of the columns.



6. What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS?

select drugname from Product where form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS'

    1. LOVENOX
    2. VELCADE
    3. VIDAZA
    4. ENOXAPARIN SODIUM
    5. ACTEMRA
    6. AZACITIDINE


7. What are the names of the drug products whose active ingredient (activeingred) is ATROPINE?

select drugname from Product where activeingred = 'ATROPINE'

    1. ATROPEN
    2. ATROPINE


8. In what forms and dosages can the drug product named BENADRYL be administered?

select form, dosage from Product where drugname = 'BENADRYL'

    1. CAPSULE;ORAL;50MG
    2. ELIXIR;ORAL;12.5MG/5ML
    3. CAPSULE;ORAL";25MG"
    4. INJECTABLE;INJECTION;10MG/ML
    5. INJECTABLE;INJECTION";50MG/ML


9. Which drug products have a name ending in ESTROL?

select drugname from Product where drugname like '%ESTROL%'

    1. DIETHYLSTILBESTROL
    2. STILBESTROL
    3. DIENESTROL
    4. MEGESTROL ACETATE


10. Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)

select ther_potential, count (*) from application where ther_potential like '%' group by ther_potential

    1. S*;3
    2. S;3459
    3. P;928
    4. P*;26


11. Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)


12. What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK?

select drugname from product inner join application on (product.applno = application.applno) where sponsorapplicant = 'MERCK' group by drugname

    1. DECADRON
    2. HUMORSOL
    3. NEO-HYDELTRASOL
    4. PRINIVIL
    5. MAXALT-MLT
    6. HYDROCORTONE
    7. PERIACTIN
    8. PROPECIA
    9. PROSCAR
    10. CLINORIL
    11. PRINZIDE
    12. ELSPAR
    13. ALDOMET
    14. ALDORIL D30
    15. EMEND
    16. DIUPRES-500
    17. NEODECADRON
    18. ALDOCLOR-250
    19. MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER
    20. ARAMINE
    21. TIAMATE
    22. BLOCADREN
    23. CANCIDAS
    24. CHIBROXIN
    25. CORTONE
    26. PEPCID
    27. TRUSOPT
    28. REDISOL
    29. VIOXX
    30. FLOROPRYL
    31. ALPHAREDISOL
    32. DECADRON-LA
    33. MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER"
    34. PEPCID RPD
    35. NOROXIN
    36. ZOCOR
    37. LERITINE
    38. DECADERM
    39. DOLOBID"
    40. MANNITOL 25%
    41. DECADRON W/ XYLOCAINE
    42. ALDORIL D50
    43. TIMOLIDE 10-25
    44. CYCLAINE
    45. HYDROPRES 25
    46. AMINOHIPPURATE SODIUM
    47. MEVACOR
    48. MODURETIC 5-50
    49. ALDORIL 25
    50. SINGULAIR
    51. COLBENEMID
    52. DIUPRES-250
    53. HYDELTRA-TBA
    54. PRIMAXIN
    55. BENEMID
    56. MAXALT
    57. FOSAMAX PLUS D
    58. HYDELTRASOL
    59. HYDROPRES 50
    60. ZOLINZA
    61. ALDORIL 15
    62. FOSAMAX
    63. DECASPRAY
    64. COGENTIN
    65. ALDOCLOR-150
    66. PEPCID PRESERVATIVE FREE


13. Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE?

select sponsorapplicant from application inner join product on (application.applno = product.applno) where sponsorapplicant like '%LABS%' and activeingred like '%ASPIRIN%' and activeingred like '%CAFFEINE%' group by sponsorapplicant

    1. ACTAVIS LABS UT INC
    2. WATSON LABS

Links

Vpachec3 User Page