Kevin Wyllie Week 6
From LMU BioDB 2015
Revision as of 03:37, 13 October 2015 by Kwyllie (Talk | contribs) (Just saving my progress in case of timeout.)
Questions
Electronic Journal Assignment
- Veronica and I initially worked quite a bit with Anu. We spent approximately 3.5 hours (Sunday, 10/11/15) attempting to convert application.txt into an SQL-friendly file, to no avail.
What We Tried First...
- Although columns **were** separated by tabs, in many cases they were also padded with a variable number of spaces (depending on the length of the SponsorApplicant name). At the time, we thought this was our main obstacle in rewriting the lines for pgAdmin. The command:
sed "s/\t/, /g"
was an attempt to replace every tab between columns with a comma followed by a space (we chose a comma since SQL uses commas to separate values). This was when we discovered the extra spaces. The formatting of the resulting rows was also inconsistent: only a minority of the rows came out the way we expected. This stressed us out too.
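A minimal sketch of the problem, using a made-up row rather than a real line from application.txt. The names `tab`, `sample_row` and `with_commas` are only for illustration, and the tab is passed to sed as a literal character so the sketch does not depend on sed understanding `\t`:

```shell
# A literal tab character, so the pattern works even where sed does not expand \t.
tab="$(printf '\t')"

# Hypothetical row: tab-separated fields, with the sponsor name padded by four spaces.
sample_row="$(printf '000001\tN\tEXAMPLE CO    \tFalse')"

# Replace every tab with a comma followed by a space.
with_commas="$(printf '%s\n' "$sample_row" | sed "s/${tab}/, /g")"

# The padding spaces survive and now sit in front of the comma.
printf '%s\n' "$with_commas"
```

The tabs are gone, but the padding after the sponsor name is still there, which is the leftover-space problem described above.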
- Opting to address the parts of the file that I **did** know how to address, my next command was to remove the four spaces between the ApplType and SponsorApplicant columns, something that was consistent across all of the rows:
sed -r "s/( ){4}//g"
What I had failed to realize was that this would still remove groups of four spaces from the aforementioned gaps between the SponsorApplicant and MostRecentLabelAvailableFlag columns, leaving the gaps as either doublet or triplet spaces. Seeing this, I added two more commands to remove these remaining spaces:
sed -r "s/( ){3}//g" | sed -r "s/( ){2}//g"
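A rough sketch of why fixed-width removals cannot clear every gap, again on a made-up row (here with five padding spaces). The last command is an alternative that was not part of this attempt: it strips any run of spaces sitting directly before a comma. `-E` is used as the portable spelling of `-r`.

```shell
# Hypothetical row after the tab replacement: five padding spaces before the comma.
row="000001, N, EXAMPLE CO     , False"

# Remove groups of four, then three, then two spaces, in sequence.
after_removal="$(printf '%s\n' "$row" \
  | sed -E "s/( ){4}//g" \
  | sed -E "s/( ){3}//g" \
  | sed -E "s/( ){2}//g")"

# Five spaces minus one group of four leaves a single space that none of the patterns match.
printf '%s\n' "$after_removal"

# Alternative (an assumption, not the approach used here): trim any run of spaces before a comma.
trimmed="$(printf '%s\n' "$row" | sed -E "s/ +,/,/g")"
printf '%s\n' "$trimmed"
```

Because the alternative is anchored on the comma, the single spaces inside a name such as "EXAMPLE CO" are left alone.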
- But there were still remaining spaces. So we used
sed "s/ , False/, False/g" | sed "s/ , True/, True/g" | sed "s/, /,/g" | sed "s/,/','/g" | sed "s/^/'/g"
to eliminate these spaces and wrap each entry in single quotes (entries that don't **need** single quotes still work properly with them).
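To make the effect of that pipeline concrete, here is a sketch that runs it on a made-up row. One thing it brings out is that the pipeline as written opens a quote at the start of the line but never closes the last entry; the final `s/$/'/` step below is an assumed fix, not something from the original attempt.

```shell
# Hypothetical row with one leftover space before the comma.
row="000001, N, EXAMPLE CO , False"

# The pipeline from above: tidy the spaces, then turn each comma into ',' and open the first quote.
quoted="$(printf '%s\n' "$row" \
  | sed "s/ , False/, False/g" \
  | sed "s/ , True/, True/g" \
  | sed "s/, /,/g" \
  | sed "s/,/','/g" \
  | sed "s/^/'/g")"
printf '%s\n' "$quoted"

# Assumed extra step: append the closing quote at the end of the line.
closed="$(printf '%s\n' "$quoted" | sed "s/\$/'/")"
printf '%s\n' "$closed"
```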