Difference between revisions of "Vpachec3 Week 6"
(Added questions to answer section) |
(→Questions to Answer: made that into a section instead of a subsection) |
||
Line 61: | Line 61: | ||
cat Product.txt | sed "s/\t/~/g"|sed "s/'/\"/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/5"| sed s"/~/'~/6"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/'null'/null/g"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/');/g"|sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" > ~/public_html/Product.sql.txt | cat Product.txt | sed "s/\t/~/g"|sed "s/'/\"/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/5"| sed s"/~/'~/6"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/'null'/null/g"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/');/g"|sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" > ~/public_html/Product.sql.txt | ||
− | + | == Questions to Answer == | |
# Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables. | # Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables. | ||
Line 77: | Line 77: | ||
# What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>? | # What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>? | ||
# Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>? | # Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>? | ||
− | |||
==Links== | ==Links== |
Revision as of 04:33, 14 October 2015
application text file
My partner,Kevin, and I met on Sunday at 1:30pm in Seaver 120 to begin our assignment. Downloading and unzipping the file was not a problem. The problem occurred when we had to condense the information. What took me a while to understand is how to get rid of the tabs and spaces efficiently and where to go from there. We had massive kickstart help from Anu!!! We are very grateful she was there on Sunday to help get the thought process going.
Here is the break down for each section of the command:
cat application.txt
This opened the text file.
cat application.txt|sed "s/\t/~/g"
The added sed command replaced each tab instance with a tilde. We used a tilde as a filler because it was easier to put the tilde instead of another marker such as an asterick in terms of syntax. Now we need to get rid of the extra spaces in the line. However, we can't just use a command to get rid of the the spaces because some of the company names have spaces in them. Since the names only have one space we can make the command to get rid of spaces in pairs.
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"
The second sed command would allow for any pairs of spaces to reduce.
I am going to jump in my explanation and explain the next four added commands (see below). I am lumping the explanation because they all are going for the same goal. The fist of the four gets rid of the remaining space in front of the tilde by replacing it with just a tilde, no space. The following step was to make sure that the columns with no answer in them (any empty entry in the columns) have null instead of an empty section. We need to do this exact same command twice because the computer will only read the groups of two if there were a greater number it would only count the first two. The last command of the four newly added makes sure that the end of line has a null if there is no information in the column for each row.
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"
adding apostrophes to the varchar components in the file. We need the apostrophes so that they can be read as text in SQL. Booleans don't need the single quotes so we are in the clear for those. We also have to make sure that the nulls do not have apostrophes. The following pipeline shows that we need to have go by instance of each tilde to add in the apostrophe. And at the end, we had to make sure that the nulls didn't have apostrophes.
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"
change the tilde into commas.
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"
To get to our final command we just added the wording and syntax necessary for SQL in front of the line so that we can just copy and paste all the commands straight into SQL. http://my.cs.lmu.edu/~vpachec3/application.sql.txt
FINAL COMMAND
cat application.txt|sed "s/\t/~/g"|sed -r "s/( ){2}//g"| sed "s/ \~/~/g"| sed "s/~~/~null~/g"| sed "s/~~/~null~/g"| sed "s/~\r$/~null/g"| sed "s/~/~'/1"| sed "s/~/'~'/2" | sed "s/~/'~/3"| sed "s/~/~'/5"|sed "s/~/'~/6"| sed "s/~/~'/7"|sed "s/~/'~/8"|sed "s/~'null'~/~null~/g"| sed "s/\r$/'/g"| sed "s/~/~'/8"| sed "s/~'null/~null/g"|sed "s/~/,/g"|sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_code) values(/g" |sed "s/$/);/g" > ~/public_html/application.sql.txt
Product text file
Here is where I left off on Monday 10/12
cat Product.txt | sed "s/\t/~/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/);/g" > ~/public_html/application.sql.txt
I used the same method to go about this pipe from the last pipe. However the above pipe left me with problems that need fixing. Here were the thing I needed to fix:
- There were two warning labels that became a problem.
- There were plenty of instances where there was a description similar to "5'-Phosphate" where the ' was meant to stand for prime. This was a problem because 'is also used as language so I had to take care of that.
- The column TECode, I had put as varchar when creating the table but I didn't put the column enteries with single quotes so that showed up as a problem.
- Finally, I need to have a command that got ride of the single quotes around the nulls.
Thus, after playing around with the different commands several times, I got the pipe that worked.
FINAL COMMAND:
cat Product.txt | sed "s/\t/~/g"|sed "s/'/\"/g"|sed "s/~~/~null~/g"| sed "s/~/~'/2"|sed "s/~/'~'/3"|sed "s/~/'~/4"|sed "s/~/~'/5"| sed s"/~/'~/6"|sed "s/~/~'/7"| sed "s/~/'~'/8"|sed "s/'null'/null/g"|sed "s/~/,/g"|sed "s/^/insert into Product(ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |sed "s/\r$/');/g"|sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" > ~/public_html/Product.sql.txt
Questions to Answer
- Provide the DDL (
create table
) statements that you used for your application and product tables. - Provide the
sed
command sequences that you used to convert the raw text files into sequences of SQLinsert
statements. - Using the command line, how can you determine the number of records in each file? Provide the command.
- Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL
select
statement. - In your database, are these numbers the same or different? Explain why you think so.
- For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
- What are the names of the drug products that are administered in the form
INJECTABLE;INTRAVENOUS, SUBCUTANEOUS
? - What are the names of the drug products whose active ingredient (activeingred) is
ATROPINE
? - In what forms and dosages can the drug product named
BENADRYL
be administered? - Which drug products have a name ending in
ESTROL
? - Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
- Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
- What are the names of the drug products that are sponsored (sponsor applicant column) by
MERCK
? - Which sponsor applicant companies have the text
LABS
in their names and have products whose active ingredients (activeingred) include bothASPIRIN
andCAFFEINE
?