Difference between revisions of "The Web from the Command Line"

From LMU BioDB 2017
(More curl content.)
(Add POST example.)

Revision as of 03:53, 11 September 2017

Due to the “need-to-know” approach that we take in this course, our study of the command line necessarily takes a leap to a fairly powerful and advanced command. This command brings the web to the command line, and learning it helps one understand what is truly happening behind the scenes when we visit websites with our web browsers. That command is curl. Strictly speaking, it is spelled cURL because its name is intended to mean “see URL.” “See” and “c”: get it?

Basic Usage

Put simply, curl performs a single web request and displays the response provided by the contacted web server without further processing or layout. In its simplest form, one can just give it a URL:

curl http://www.lmu.edu

For most URLs, invoking this command will produce a flood of text—a perfect use case for more/less, or for output redirection in case you want to save the file to your computer:

curl http://www.lmu.edu > lmu.html

That’s it, really. Using curl, you can get the content of a web page purely at the data level. No images are loaded, no layout is done, no visuals are rendered. This command is like the very first step that a web browser takes when visiting a website, except that it goes no further than that.
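The save-to-file step above can also be wrapped in a small shell function for reuse in scripts. The following is a minimal sketch, not a definitive recipe: the function name save_page is hypothetical, and the extra options are additions beyond the commands above (-s hides the progress meter, -S still reports errors, -L follows a redirect if the server sends one, and -o names the output file instead of using > redirection).

```shell
#!/bin/sh
# save_page is a hypothetical helper name, not part of curl itself.
# Usage: save_page URL OUTPUT_FILE
save_page() {
    # -s hides the progress meter, -S still reports errors,
    # -L follows redirects, -o writes the response body to a file.
    curl -sSL -o "$2" "$1"
}

# Example call (needs network access), left commented out:
# save_page http://www.lmu.edu lmu.html
```

Calling save_page with a URL and a file name should leave the raw response body in that file, much like the redirection example above.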

Why curl?

One look at this might raise the question of why the command exists at all, especially when we have web browsers that work perfectly fine for daily use. Daily use is the operative term here: of course curl is not meant to be a web browser replacement. Instead, as a command line program that performs a simple request/response cycle, curl can be used for scripting, automation, and other types of processing that go beyond the visual consumption of a website’s content.

curl Mimics Requests in the Network Developer Tools Tab

One such use of curl is to trigger requests that you can’t perform just by typing a URL into a web browser. The location bar in a web browser only performs what are called GET requests, which are meant to retrieve content from a server. Some requests, however, have a different method, such as POST or PUT; these are meant to submit data to a server. In web browsers, you do this implicitly by filling out forms and clicking on some Submit button. If you don’t want to go through a browser or would like to do this automatically, you can use curl.

For example, interacting with this wiki makes use of POST requests (go ahead, work with this wiki with the Network tab selected in developer tools). This means that one can’t just edit pages by typing something into the web browser’s location bar, which is probably a good thing. Instead, you are required to type into an editor area, then click one of the buttons at the bottom in order to process the request.

With curl, you can perform the same request directly:

curl -X POST -d "title=Sample_Page&action=submit" https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php

This sends a POST request to our wiki server with two data items: title, whose value is Sample_Page, and action, whose value is submit. If you read through the response, you’ll eventually see that the wiki server did correctly interpret this request as one to edit a page, but it refused to do so because a user needs to be logged in. If you search the text, you’ll see that it includes the message “You do not have permission to edit this page:”

curl -X POST -d "title=Sample_Page&action=submit" https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php | grep "permission"

Logging in is a whole other issue, but the point of this example is to show that you can simulate a web browser action that you wouldn’t otherwise be able to do without interacting with the web browser directly.
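In a script, the same search can be turned into a yes/no check rather than something you read by eye. This is a minimal sketch under stated assumptions: the function name edit_was_refused is hypothetical, and it assumes the response still contains the message text quoted above. It relies on grep -q, which prints nothing and only reports through its exit status whether the text was found.

```shell
#!/bin/sh
# edit_was_refused is a hypothetical helper name used for illustration.
# It reads a server response on standard input and succeeds only when
# the response contains the permission message quoted above.
edit_was_refused() {
    grep -q "You do not have permission to edit this page"
}

# Example call against the wiki (needs network access), left commented out:
# curl -s -X POST -d "title=Sample_Page&action=submit" \
#     https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php \
#     | edit_was_refused && echo "The wiki refused the edit."
```

The -s option in the commented call is an addition: it keeps curl's progress meter out of the terminal while the response is piped along.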

If you can see a web request in the Network tab of a web browser’s developer tools, then you can simulate this request with curl.
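As a rough illustration of that idea, the method, URL, and form data that the Network tab shows for a submit-style request can be passed to curl as parameters. The sketch below is an assumption-laden example, not a complete replay tool: replay_request is a hypothetical name, and it ignores headers, cookies, and anything else a real request might carry.

```shell
#!/bin/sh
# replay_request is a hypothetical helper name used for illustration.
# Usage: replay_request METHOD URL BODY
replay_request() {
    # -X sets the request method shown in the Network tab,
    # -d sends BODY as form data in the request body,
    # -sS hides the progress meter but still reports errors.
    curl -sS -X "$1" -d "$3" "$2"
}

# Example call (needs network access), left commented out:
# replay_request POST https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php \
#     "title=Sample_Page&action=submit"
```

The commented call reproduces the POST example from this section; other requests seen in the Network tab would need their own method, URL, and data filled in.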