Difference between revisions of "Troque Week 6"
(Starting to answer the questions) |
(Updated journal with solutions) |
||
Line 80: | Line 80: | ||
OrphanCode) values (/g" | | OrphanCode) values (/g" | | ||
sed "1d" > ~/public_html/application.sql.txt | sed "1d" > ~/public_html/application.sql.txt | ||
+ | |||
+ | But then I realized that there was an even easier way of doing this assignment since PostgreSQL accepts quotes around Boolean values and numbers. (I didn't notice the tips that were before the questions so I didn't realize right away that we could use less commands): | ||
+ | cat application.txt | | ||
+ | sed "s/\t/','/g" | | ||
+ | sed "s/ *'/'/g" | | ||
+ | sed "s/\r$/');/g" | | ||
+ | sed "s/''/null/g" | | ||
+ | sed "s/'//1" | | ||
+ | sed "1d" | | ||
+ | sed "s/^/insert into application (ApplNo,ApplType,SponsorApplicant,MostRecentLabelFlag,CurrentPatentFlag,ActionType, | ||
+ | ChemicalType,Ther_Potential,OrphanCode) values (/g" > ~/public_html/application.sql.txt | ||
=== Formatting product.txt file === | === Formatting product.txt file === | ||
Line 85: | Line 96: | ||
cat Product.txt | | cat Product.txt | | ||
− | + | sed "s/'/\"/g" | | |
− | sed "s/ | + | sed "s/\t/','/g"| |
− | sed "s/\t/, | + | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
sed "s/\r$/');/g" | | sed "s/\r$/');/g" | | ||
− | sed "s/' | + | sed "s/''/null/g" | |
+ | sed "s/'//1" | | ||
sed "s/^/insert into product (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode, | sed "s/^/insert into product (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode, | ||
− | + | ReferenceDrug,drugname,activeingred) values (/g" > ~/public_html/Product.sql.txt | |
− | + | ||
=== Questions to answer === | === Questions to answer === | ||
− | * Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables. | + | * '''Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables.''' |
'''application.txt:''' | '''application.txt:''' | ||
create table application (ApplNo int primary key, ApplType varchar, | create table application (ApplNo int primary key, ApplType varchar, | ||
Line 110: | Line 111: | ||
ActionType varchar, ChemicalType int, Ther_Potential varchar, OrphanCode varchar); | ActionType varchar, ChemicalType int, Ther_Potential varchar, OrphanCode varchar); | ||
− | ''' | + | '''Product.txt:''' |
create table product (ApplNo int references application, ProductNo int, Form varchar, | create table product (ApplNo int references application, ProductNo int, Form varchar, | ||
Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int, | Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int, | ||
drugname varchar, activeingred varchar); | drugname varchar, activeingred varchar); | ||
− | * Provide the <code>sed</code> command sequences that you used to convert the raw text files into sequences of SQL <code>insert</code> statements. | + | * '''Provide the <code>sed</code> command sequences that you used to convert the raw text files into sequences of SQL <code>insert</code> statements.''' |
'''application.txt:''' | '''application.txt:''' | ||
cat application.txt | | cat application.txt | | ||
− | sed "s/\t/ | + | sed "s/\t/','/g" | |
− | + | sed "s/ *'/'/g" | | |
− | + | sed "s/\r$/');/g" | | |
− | sed "s/ | + | sed "s/''/null/g" | |
− | + | sed "s/'//1" | | |
− | + | sed "1d" | | |
− | sed "s/ | + | sed "s/^/insert into application (ApplNo,ApplType,SponsorApplicant,MostRecentLabelFlag,CurrentPatentFlag,ActionType, |
− | + | ChemicalType,Ther_Potential,OrphanCode) values (/g" > ~/public_html/application.sql.txt | |
− | + | ||
− | + | ||
− | + | ||
− | sed "s/ | + | |
− | + | ||
− | sed "s | + | |
− | + | ||
− | sed " | + | |
− | + | ||
− | sed "s/^/insert into application (ApplNo,ApplType,SponsorApplicant,MostRecentLabelFlag, | + | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
'''Product.txt:''' | '''Product.txt:''' | ||
cat Product.txt | | cat Product.txt | | ||
− | + | sed "s/'/\"/g" | | |
− | sed "s/ | + | sed "s/\t/','/g"| |
− | sed "s/\t/, | + | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
sed "s/\r$/');/g" | | sed "s/\r$/');/g" | | ||
− | sed "s/' | + | sed "s/''/null/g" | |
+ | sed "s/'//1" | | ||
sed "s/^/insert into product (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode, | sed "s/^/insert into product (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode, | ||
− | + | ReferenceDrug,drugname,activeingred) values (/g" > ~/public_html/Product.sql.txt | |
− | + | ||
− | * Using the command line, how can you determine the number of records in each file? Provide the command. | + | |
+ | * '''Using the command line, how can you determine the number of records in each file? Provide the command.''' | ||
In order to determine how many records are in each file, we use the <code>wc</code> since this gives us the number of lines, word count, and byte count for each file. The number of records in each file, therefore, is the number of lines minus 1 since the first line for each is actually a row of labels. application.txt = 19746 records and Product.txt = 32770 records. | In order to determine how many records are in each file, we use the <code>wc</code> since this gives us the number of lines, word count, and byte count for each file. The number of records in each file, therefore, is the number of lines minus 1 since the first line for each is actually a row of labels. application.txt = 19746 records and Product.txt = 32770 records. | ||
The commands are as follows: | The commands are as follows: | ||
+ | |||
'''application.txt:''' | '''application.txt:''' | ||
cat application.txt | wc | cat application.txt | wc | ||
Line 169: | Line 149: | ||
cat product.txt | wc | cat product.txt | wc | ||
− | * Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement. | + | * '''Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement.''' |
In SQL we have to use the keyword <code>count</code>. The result will be in a table with the column label ''count bigint''. | In SQL we have to use the keyword <code>count</code>. The result will be in a table with the column label ''count bigint''. | ||
Using SQL, the commands are as follows: | Using SQL, the commands are as follows: | ||
Line 182: | Line 162: | ||
So both numbers match up with what the command line gave. | So both numbers match up with what the command line gave. | ||
− | * In your database, are these numbers the same or different? Explain why you think so. | + | * '''In your database, are these numbers the same or different? Explain why you think so. |
− | The numbers are the same since what <code>wc</code> gives since the SQL table ''technically'' gives the records + the 1 row of labels that was in the first line of each of the files. | + | The numbers are the same since what <code>wc</code> gives since the SQL table ''technically'' gives the records + the 1 row of labels that was in the first line of each of the files.''' |
*: ''For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.'' | *: ''For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.'' | ||
− | * What are the names of the drug products that are administered in the ''form'' <code>INJECTABLE;INTRAVENOUS, SUBCUTANEOUS</code>? | + | |
− | * What are the names of the drug products whose active ingredient (''activeingred'') is <code>ATROPINE</code>? | + | * '''What are the names of the drug products that are administered in the ''form'' <code>INJECTABLE;INTRAVENOUS, SUBCUTANEOUS</code>?''' |
+ | (a) The drugs are "LOVENOX", "VELCADE", "VIDAZA", "ENOXAPARIN SODIUM", "ACTEMRA", and "AZACITIDINE". <br> | ||
+ | (b) <code>select drugname from product where form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS';</code>. | ||
+ | * '''What are the names of the drug products whose active ingredient (''activeingred'') is <code>ATROPINE</code>?''' | ||
+ | (a) There were only two, ATROPEN and ATROPINE with the active ingredient <code>ATROPINE</code>. Note: There were 4 instances of ATROPEN in the database. <br> | ||
+ | (b) select drugname from product where activeingred = 'ATROPINE'; | ||
* In what ''form''s and ''dosage''s can the drug product named <code>BENADRYL</code> be administered? | * In what ''form''s and ''dosage''s can the drug product named <code>BENADRYL</code> be administered? | ||
− | * Which drug products have a name ''ending'' in <code>ESTROL</code>? | + | (a) |
− | * Produce a table listing all of the known values for the ''therapeutic_potential'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''therapeutic_potential'' codes are explained in the ''ReviewClass_Lookup.txt'' file, in case you’re interested.) | + | {| class="wikitable" style="text-align: center; width: 200px; height: 200px;" |
− | * Produce a table listing all of the known values for the ''chemical_type'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''chemical_type'' codes are explained in the ''ChemTypeLookup.txt'' file, in case you’re interested.) | + | |- |
− | * What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>? | + | ! |
− | * Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>? | + | ! form |
+ | ! dosage | ||
+ | |- | ||
+ | !1 | ||
+ | | "CAPSULE;ORAL" || "50MG" | ||
+ | |- | ||
+ | !2 | ||
+ | | "ELIXIR;ORAL" || "12.5MG/5ML" | ||
+ | |- | ||
+ | !3 | ||
+ | | "CAPSULE;ORAL" || "25MG" | ||
+ | |- | ||
+ | !4 | ||
+ | | "INJECTABLE;INJECTION" || "10MG/ML" | ||
+ | |- | ||
+ | !5 | ||
+ | | "INJECTABLE;INJECTION" || "50MG/ML" | ||
+ | |- | ||
+ | |} | ||
+ | (b) <code>select form, dosage from product where drugname = 'BENADRYL';</code> | ||
+ | * '''Which drug products have a name ''ending'' in <code>ESTROL</code>?''' | ||
+ | (a) "DIETHYLSTILBESTROL", "STILBESTROL", and "DIENESTROL". <br> | ||
+ | (b) <code>select drugname from product where drugname ~ 'ESTROL$';</code> | ||
+ | * '''Produce a table listing all of the known values for the ''therapeutic_potential'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''therapeutic_potential'' codes are explained in the ''ReviewClass_Lookup.txt'' file, in case you’re interested.)''' | ||
+ | (a) | ||
+ | (b) | ||
+ | * '''Produce a table listing all of the known values for the ''chemical_type'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''chemical_type'' codes are explained in the ''ChemTypeLookup.txt'' file, in case you’re interested.)''' | ||
+ | (a) | ||
+ | (b) | ||
+ | * '''What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>?''' | ||
+ | (a) | ||
+ | (b) | ||
+ | * '''Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>?''' | ||
+ | (a) | ||
+ | (b) | ||
+ | |||
{{Template:Troque_Journal}} | {{Template:Troque_Journal}} |
Revision as of 02:45, 15 October 2015
Contents
Creating the Table on PostgreSQL
First, login to the LMU CS server using ssh. Type in the following in a command prompt (Windows) or terminal (Mac) window:
ssh <username@my.cs.lmu.edu>
Enter your password. Note: You will not visibly see the cursor move when typing in your password so just keep typing. You will end up on your home directory.
Here, you can use the command ls
in order to see the list of files in the directory. Then we can start manipulating some files.
On the PostgreSQL command window, type the following to create a table called "application":
create table application (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelFlag boolean, CurrentPatentFlag boolean, ActionType varchar, ChemicalType int, Ther_Potential varchar, OrphanCode varchar);
To create the table for the product data, enter the following command on the PostgreSQL window:
create table product (ApplNo int references application, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
Processing the Data on the Command Line
Formatting the application.txt file
Before starting this assignment, I didn't realized that I could easily remove the spaces using the regex for the tab character \t
, and so I tried to take out the unnecessary blank spaces using the regex \s*
. The result of using this method is the following (which didn't really work out since I got frustrated from all the outliers in this txt file and I eventually trashed this set of commands):
cat application.txt | sed -e 's/\s\+/,/2' | sed -e 's/\s/,/1'| sed "s/,/,'/1" | sed "s/,/',/2" | sed "s/,/,'/2" | sed "s/\s*False/',False/1" | sed "s/False\s*False/FalseFalse/g" | sed "s/False\s*/False,/g" | sed "s/,/,'/5" | sed "s/,'../&',/3" | sed "s/,\s*/,/6" | sed "s/\t/,'/" | sed "s/\t/'/" | sed "s/P',$/&,,/g" | sed "s/TA',$/&,,/g" | sed "s//null/g" | sed "s/V\r$/,'V'/g" | sed "s/'S'/&,null/g" | sed "s/'P'/&,null/g" | sed "1d" | sed "s/\t//g" | sed "s/[0-9],null/&,null/" | sed "s/^/insert into application (ApplNo,ApplType,SponsorApplicant,MostRecentLabelFlag,CurrentPatentFlag, ActionType,ChemicalType,Ther_Potential,OrphanCode) values (/g" | sed "s/\r$/);/" | sed "s/null$/&);/g" | sed "s/'V'$/&);/" > ~/public_html/application.sql.txt
From doing this, I realized how badly formatted these data are so I decided to start over:
cat application.txt | sed "s/\t/,/g" | sed "s/,/,'/1" | sed "s/,'[A-Z]/&'/1" | sed "s/,/,'/2" | sed "s/\s*,False/',False/1" | sed "s/,/,'/5 " | sed "s/,/',/6" | sed "s/,,/,null,/" | sed "s/,,/,null,/" | sed "s/,\r$/,null/g" | sed "s/,S,/,'S',/g" | sed "s/,P,/,'P',/g" | sed "s/,V/,'V'/g" | sed "s/P\*/'P\*'/g" | sed "s/S\*/'S\*'/g" | sed "s/,AP,/,'AP',/g" | sed "s/$/);/g" | sed "s/^/insert into application (ApplNo,ApplType,SponsorApplicant,MostRecentLabelFlag, CurrentPatentFlag,ActionType,ChemicalType,Ther_Potential, OrphanCode) values (/g" | sed "1d" > ~/public_html/application.sql.txt
But then I realized that there was an even easier way of doing this assignment since PostgreSQL accepts quotes around Boolean values and numbers. (I didn't notice the tips that were before the questions so I didn't realize right away that we could use less commands):
cat application.txt | sed "s/\t/','/g" | sed "s/ *'/'/g" | sed "s/\r$/');/g" | sed "s//null/g" | sed "s/'//1" | sed "1d" | sed "s/^/insert into application (ApplNo,ApplType,SponsorApplicant,MostRecentLabelFlag,CurrentPatentFlag,ActionType, ChemicalType,Ther_Potential,OrphanCode) values (/g" > ~/public_html/application.sql.txt
Formatting product.txt file
Doing this file proved easier than I thought since I already knew to look out for the outliers and inspect the data first before attempting to do the processing:
cat Product.txt | sed "s/'/\"/g" | sed "s/\t/','/g"| sed "s/\r$/');/g" | sed "s//null/g" | sed "s/'//1" | sed "s/^/insert into product (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode, ReferenceDrug,drugname,activeingred) values (/g" > ~/public_html/Product.sql.txt
Questions to answer
- Provide the DDL (
create table
) statements that you used for your application and product tables.
application.txt:
create table application (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelFlag boolean, CurrentPatentFlag boolean, ActionType varchar, ChemicalType int, Ther_Potential varchar, OrphanCode varchar);
Product.txt:
create table product (ApplNo int references application, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
- Provide the
sed
command sequences that you used to convert the raw text files into sequences of SQLinsert
statements.
application.txt:
cat application.txt | sed "s/\t/','/g" | sed "s/ *'/'/g" | sed "s/\r$/');/g" | sed "s//null/g" | sed "s/'//1" | sed "1d" | sed "s/^/insert into application (ApplNo,ApplType,SponsorApplicant,MostRecentLabelFlag,CurrentPatentFlag,ActionType, ChemicalType,Ther_Potential,OrphanCode) values (/g" > ~/public_html/application.sql.txt
Product.txt:
cat Product.txt | sed "s/'/\"/g" | sed "s/\t/','/g"| sed "s/\r$/');/g" | sed "s//null/g" | sed "s/'//1" | sed "s/^/insert into product (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode, ReferenceDrug,drugname,activeingred) values (/g" > ~/public_html/Product.sql.txt
- Using the command line, how can you determine the number of records in each file? Provide the command.
In order to determine how many records are in each file, we use the wc
since this gives us the number of lines, word count, and byte count for each file. The number of records in each file, therefore, is the number of lines minus 1 since the first line for each is actually a row of labels. application.txt = 19746 records and Product.txt = 32770 records.
The commands are as follows:
application.txt:
cat application.txt | wc
Product.txt:
cat product.txt | wc
- Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL
select
statement.
In SQL we have to use the keyword count
. The result will be in a table with the column label count bigint.
Using SQL, the commands are as follows:
application.txt
select count(*) from application;
The result is 19746.
Product.txt
select count(*) from product;
The result is 32770. So both numbers match up with what the command line gave.
- In your database, are these numbers the same or different? Explain why you think so.
The numbers are the same since what wc
gives since the SQL table technically gives the records + the 1 row of labels that was in the first line of each of the files.
- For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
- What are the names of the drug products that are administered in the form
INJECTABLE;INTRAVENOUS, SUBCUTANEOUS
?
(a) The drugs are "LOVENOX", "VELCADE", "VIDAZA", "ENOXAPARIN SODIUM", "ACTEMRA", and "AZACITIDINE".
(b) select drugname from product where form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS';
.
- What are the names of the drug products whose active ingredient (activeingred) is
ATROPINE
?
(a) There were only two, ATROPEN and ATROPINE with the active ingredient ATROPINE
. Note: There were 4 instances of ATROPEN in the database.
(b) select drugname from product where activeingred = 'ATROPINE';
- In what forms and dosages can the drug product named
BENADRYL
be administered?
(a)
form | dosage | |
---|---|---|
1 | "CAPSULE;ORAL" | "50MG" |
2 | "ELIXIR;ORAL" | "12.5MG/5ML" |
3 | "CAPSULE;ORAL" | "25MG" |
4 | "INJECTABLE;INJECTION" | "10MG/ML" |
5 | "INJECTABLE;INJECTION" | "50MG/ML" |
(b) select form, dosage from product where drugname = 'BENADRYL';
- Which drug products have a name ending in
ESTROL
?
(a) "DIETHYLSTILBESTROL", "STILBESTROL", and "DIENESTROL".
(b) select drugname from product where drugname ~ 'ESTROL$';
- Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
(a) (b)
- Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
(a) (b)
- What are the names of the drug products that are sponsored (sponsor applicant column) by
MERCK
?
(a) (b)
- Which sponsor applicant companies have the text
LABS
in their names and have products whose active ingredients (activeingred) include bothASPIRIN
andCAFFEINE
?
(a) (b)
Assignment Links
Weekly Assignments
- Week 1
- Week 2
- Week 3
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Week 9
- Week 10
- Week 11
- Week 12
- No Week 13 Assignment
- Week 14
- Week 15
Individual Journal Entries
- Week 1 - This is technically the user page.
- Week 2
- Week 3
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Week 9
- Week 10
- Week 11
- Week 12
- No Week 13 Assignment
- Week 14
- Week 15