A Quick Relational Database Tour
This page gives you a tutorial-style walkthrough of a relational database, specifically PostgreSQL. The walkthrough assumes that you are at a Seaver 120 lab computer.
Contents
Overview
This diagram is a simplified rendition of how many web databases are structured. Most of the time, we only interact with these systems through a web browser. Later this semester, we will interact with the web service layer. This tutorial gives you a taste of what it’s like to interact directly with a database.
Running PostgreSQL on the Lab Machines
- Login to the computer as usual
- Launch pgAdmin
- The pgAdmin window starts with a hierarchical view on the left that starts with three layers:
- Server Groups
- Servers (1)
- PostgreSQL 9.6 (localhost:5432)
- Servers (1)
- Server Groups
- Double-click on PostgreSQL 9.6 (localhost:5432) to connect to the database server
- The password to start the server will be given in class
Creating a Database
Once the server is running, the red x disappears from the PostgreSQL 9.6 (localhost:5432) icon, and additional icons appear beneath it. If you click on the + button to the left of the Databases icon, you will see the databases that are currently available. Initially, you will see a single database called postgres.
To create a database, right-click on the Databases icon and choose New Database... from the menu that appears. In the New Database dialog, the only information you need to supply is your new database’s name.
When you click OK, you will return to the main pgAdmin window and you should see your new database underneath the Databases icon.
Note that you only need to go through this creation process once; that database will remain available until it is explicitly deleted.
Connecting to a Database
To start using a database, click on its icon. The red x disappears from the database icon and you should now be able to work.
Creating a Table
It’s hard to use a database without any data—so let’s bring some in. For this tour, we’ll use some publicly-available data: the FDA Drug Database.
Download the linked .zip file in the Download File section of that page. Or curl it from the command line! You can use curl -O to automatically save the download as a file.
When you decompress the file, you’ll see that it’s a bunch of .txt files. Many databases are distributed in this format for maximum interoperability. However, this means that we have to do some work to get it into an actual relational database management system like PostgreSQL.
First, we have to know what is in the database. The FDA provides this in the “Data Definitions and ERD for Drugs@FDA” section of the above-linked page. If you click it, you’ll see a diagram of how the data in the files are related and the column definitions of each table.
We don’t have time to load everything, so we’ll just take one: the Products table. We see its definition on the page, and we can translate this into an SQL create table
statement:
create table product(appl_no varchar(6) not null, product_no varchar(6) not null, form varchar(255), strength varchar(240), reference_drug int, drug_name varchar(125), active_ingredient varchar(255), reference_standard int, primary key (appl_no, product_no))
To do this in pgAdmin, click the SQL button in the toolbar. A new window with an SQL Editor tab appears. You can type (or copy-paste) the above command into that tab.
As always, watch out for typos! When ready, click on the Execute query button in the toolbar. (its button looks like a green play button)
Upon executing the query, a success message should appear in the Messages tab of the Output Pane in the bottom half of the window.
To assure yourself that the product table is indeed there, type and execute this query:
select * from product
The Data Output tab of the Output Pane should now show an empty tabular display with headings for appl_no, product_no, form, strength, reference_drug, drug_name, active_ingredient, and reference_standard.
Inserting Data
The SQL insert
statement allows you to add a record to a table. But if you look at the _Products.txt_ file, you’ll notice that it is what’s called a _tab-delimited text file_. How does one build an insert
statement for each of these lines without having to do a lot of typing? One solution involves a command you’ve seen before: sed!
cat Products.txt | sed "1d" | sed "s/'/''/g" | sed "s/\t/','/g" | sed "s/^/insert into product(appl_no, product_no, form, strength, reference_drug, drug_name, active_ingredient, reference_standard) values('/g" | sed "s/\r/');/g" | sed "s/'','');$/'');/g" | sed "s/'');$/'0');/g"
Try this command to see what comes out. There isn’t time right now to explain every single command, but having used sed before, you should have some inkling of how this is being done. One note of interest: the last two sed steps are there solely to accommodate some bad data in the file. You can try this whole sequence without those two steps to see what happens.
This command produces quite a few insert
statements! To get them into PostgreSQL, you can save this into a file:
cat Products.txt | sed "1d" | sed "s/'/''/g" | sed "s/\t/','/g" | sed "s/^/insert into product(appl_no, product_no, form, strength, reference_drug, drug_name, active_ingredient, reference_standard) values('/g" | sed "s/\r/');/g" | sed "s/'','');$/'');/g" | sed "s/'');$/'0');/g" > Products.sql
Then, open the file in a text editor and copy/paste the commands into pgAdmin’s SQL Editor' tab. Click Execute query.
SQL Injection
First, a little leadoff cartoon: http://xkcd.com/327
SQL injection is the term for a security exploit that attempts to bypass the web front end of a database and go directly to an SQL query. The exploit works on websites that do not—as stated in the XKCD strip—“sanitize their inputs.” In other words, a site is at risk for SQL injection if its query text field does not check for characters that might “break out of” the search term and thus spill into the SQL code.
This is probably best illustrated with an example…