HTTP Introduction and Debugging
| History | Author: Gordon McKinney http://gmckinney.info/ |
| TBA | Update with latest Charles screengrabs |
| TBA | Will introduce details of HTTP/1.1 and anatomy of cookies |
| 27 Jul 2007 | Updated Wireshark project (was Ethereal) |
| 27 Jul 2007 | Republished on new server |
| 09 Mar 2004 | Added several new links to Further Reading |
| 30 Nov 2003 | Added Charles screenshots and HTTP/1.1 through proxies |
| 31 Oct 2003 | Added links to HTTP/TCPIP books (see end) |
| 04 Feb 2001 | First Release |
1 Intro
"Hypertext Transfer Protocol" - HTTP is the single most important
technology that drives the web and yet remains virtually transparent. Without
this protocol HTML and XML could not perform the myriad tasks that we put them
to on the web daily.
This article aims to cover the key concepts of HTTP, the tools needed for debugging
and where to find the relevant Internet standards for more detail. Throughout
later sections full graphical examples are given illustrating each concept with
live web sites.
2 Intended Audience
Why bother learning about HTTP when, for the most part, developers manage
to produce web sites and never have to deal with HTTP directly?
The simple answer is that every form that's posted and every cookie that you
rely on is sent over HTTP. Wouldn't it be nice to see exactly what is being
sent to and from the web server and know for certain what is working and what
is broken?
Here are situations where knowing how HTTP works will save you time and gain
you credibility with the client.
- For developers this will provide an invaluable aid to diagnosing and testing
your code, as you'll be able to construct your own HTTP messages with nothing
more than notepad and telnet.
- Security audits become simpler for non-secure sites as you'll be able to
determine where the weaknesses are.
- Troubleshooting such problems as "The site is not responding" when
you know the server is running and pinging.
3 What is HTTP
HTTP is a protocol run over TCP with all the features necessary for interacting
with web servers that hold text and binary resources.
TCP guarantees that data travelling to and from the web server arrives error
free and in the right order. It does not, however, guarantee that the data
arrives at all, whatever the network conditions. When communications are
congested or unavailable, web page delivery is slow and can time out.
The sections below outline the key features of HTTP and therefore how a browser
and web server interact. Knowing this when writing client side JavaScript and
server side ASP or Vignette will save many hours of head scratching when code
doesn't work.
3.1 Asynchronous Protocol
Asynchronous means "not at the same time". This is the basis of the
request-response architecture of HTTP.
A request is issued and the response will return some time later. The web browser
does not actively wait for the response; it leaves the line of communication
to the server open until the response arrives or a timeout occurs.
It is important to note that every exchange starts with a request from the client
web browser. A server cannot send unsolicited responses. There are web-push
technologies, but they are not discussed here.
A web browser is configured to have no more than two outstanding requests open
concurrently to any one server. This limit is a recommendation in the HTTP/1.1
specification (RFC2616) and is designed to prevent any single client from
overloading the server.
3.2 The Request
The most frequently used requests are GET and POST. The GET fetches a resource
from the web server using a path, not a fully qualified URL as the server is
implicitly the second party in the communication.
GET /index.html HTTP/1.1
Host: www.aserver.org
Virtual hosting lets one server answer for several sites, so the intended host
from the URL is included in the message header with the "Host" directive.
That's it! That is all that's required to fetch a resource from the web server.
The above example can be typed by hand into telnet (port 80), followed by a
blank line, and the web page will be returned. HTTP/1.1 requires the Host
directive; only an HTTP/1.0 request to a server that has no virtual hosting
can get by with the first line alone.
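The hand-typed telnet session can also be scripted. The following is a minimal sketch rather than a definitive client: the function names `build_get_request` and `send_raw` are made up for illustration, www.aserver.org is the article's placeholder host, and the `Connection: close` line is an addition so that the read loop ends when the server has finished sending.

```python
import socket


def build_get_request(host: str, path: str = "/index.html") -> bytes:
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close when done
        "",                   # the empty strings produce the blank line
        "",                   # that terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def send_raw(host: str, request: bytes, port: int = 80, timeout: float = 10.0) -> bytes:
    """Send raw request bytes over TCP and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Print the request text exactly as it would be typed into telnet.
    print(build_get_request("www.aserver.org").decode("ascii"))
```

Calling `send_raw("www.aserver.org", build_get_request("www.aserver.org"))` would return the response header and body as one byte string, provided the host exists and answers on port 80.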
When writing interactive web pages we will need to pass details to the web
server using URL encoding. An example below illustrates this with AccountNo=276
and Amount=50 accessing the balance page:
GET /balance.html?AccountNo=276&Amount=50 HTTP/1.1
Host: www.aserver.org
You'll notice the page now has a ? to indicate where the page path ends and
the variables begin. Each variable in turn is delimited by an & character.
Special characters like space and ampersand that have to be part of the variable
name or value must be escaped with a % sign followed by the character's two-digit
hex ASCII code, so a space becomes %20 and an ampersand becomes %26.
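As a small illustration of the escaping rule, Python's standard `urllib.parse` module can build and decode such query strings. The value "Fish & Chips" and the variable name "Name" are made-up examples; the account values come from the request above.

```python
from urllib.parse import parse_qs, quote, urlencode

# Build the query string from the balance example.
query = urlencode({"AccountNo": 276, "Amount": 50})
path = "/balance.html?" + query

# Escape a value that itself contains spaces and an ampersand.
escaped = quote("Fish & Chips", safe="")

# Decoding on the server side reverses the escaping.
decoded = parse_qs("Name=" + escaped)

print(path)     # /balance.html?AccountNo=276&Amount=50
print(escaped)  # Fish%20%26%20Chips
print(decoded)  # {'Name': ['Fish & Chips']}
```

Note that `urlencode` writes a space as `+`, the convention for form data, while `quote` writes `%20`; both forms decode to a space in a query string.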
So why have POST at all when we can pass information by the GET command? The
answer is twofold. First, there is a practical limit to the length of a GET
request, imposed by browsers and servers, and therefore to the amount of
information you can pass to the server. Second, the GET is visible in the web
browser address bar, so it can be read by anyone watching and could
potentially be bookmarked.
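A POST looks very similar to a GET except the variables (right of the ?) are
in the message body, which is separated from the header by a blank line. The
Content-Type and Content-Length directives describe the body:
POST /balance.html HTTP/1.1
Host: www.aserver.org
Content-Type: application/x-www-form-urlencoded
Content-Length: 23

AccountNo=276&Amount=50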
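A POST like this can also be assembled programmatically. The sketch here is illustrative only: `build_post_request` is a hypothetical helper, and it assumes an ordinary HTML form submission, where the body is URL-encoded and its byte length is declared in a Content-Length directive so the server knows where the body ends.

```python
from urllib.parse import urlencode


def build_post_request(host: str, path: str, fields: dict) -> bytes:
    """Return the raw bytes of a form POST with a URL-encoded body."""
    body = urlencode(fields).encode("ascii")
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # byte count of the body
        "",                              # blank line separates header and body
        "",
    ]).encode("ascii")
    return head + body


if __name__ == "__main__":
    raw = build_post_request(
        "www.aserver.org", "/balance.html", {"AccountNo": 276, "Amount": 50}
    )
    print(raw.decode("ascii"))
```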
Now that we have the basics of a request we'll move on to the response.
3.3 The Response
The response consists of a header and body just like the request. The first line
of the header carries a status code and the body contains the resource.
The response codes fall into several classes:
1xx Informational
2xx Successful
3xx Redirection
4xx Client Error
5xx Server Error
The most common are code "200 OK" and "404 Not Found" with
redirections taking code "301" or "302".
The response can also set other header directives that we will see next.
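To make the header, body and status classes concrete, here is a rough sketch that splits a raw response and names the class of its status code. The helper names and the sample response are invented for illustration, and the parser is deliberately simplified: repeated header names overwrite each other and folded header lines are not handled.

```python
from typing import Dict, Tuple

STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_response(raw: bytes) -> Tuple[int, str, Dict[str, str], bytes]:
    """Split a raw HTTP response into status code, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 404 Not Found
    _version, code, reason = (lines[0].split(" ", 2) + [""])[:3]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # later repeats overwrite
    return int(code), reason, headers, body


def status_class(code: int) -> str:
    """Map a status code to the class named by its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


if __name__ == "__main__":
    sample = (
        b"HTTP/1.1 404 Not Found\r\n"
        b"Content-Type: text/html\r\n"
        b"\r\n"
        b"<h1>Not Found</h1>"
    )
    code, reason, headers, body = parse_response(sample)
    print(code, reason, status_class(code), headers["content-type"], body)
```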
3.4 Cookies for State Management
Since HTTP is a request-response protocol, a conversation lives only as long as
the request is outstanding. How, then, do we manage state information for each
user?
The term "cookie" is very familiar but what is it? It's not part
of the HTTP specification but rather an add-on that is described in other specifications.
Cookies exist in the client browser's cache and are transmitted to the server
in a request header directive marked: "Cookie:".
The server can tell the client to set a cookie by using the "Set-Cookie:"
directive. The web browser is responsible for transmitting appropriate cookies
to the server that set them.
Three types of cookie exist and they are based on the lifetime:
- Session, indicates as long as the web browser is open.
- Expiring, indicates that it has a fixed time to live.
- Permanent, indicates that it will live until it is deleted by the server.
In all cases a user can clear their cookies no matter what life expectancy
they were set to.
Rather than give examples here we dive into this subject when tracing a live
web site below.
3.5 Caching
The last directive controls caching. This is a huge subject, but in its simplest
form you can stop a page from being served out of a cache by using one response
header directive:
Cache-Control: no-cache
Once it has been received, proxies and the client's local cache must check back
with the server before reusing the page for any subsequent request, which
prevents them from serving stale data.
In fact some web servers (Netscape) have caching components built in to the
server-side that need to have the directive set to prevent sensitive personalised
information from being sent to multiple users.
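How a cache might act on this directive can be sketched in a few lines. This is a simplification under stated assumptions, not how any particular browser or proxy is implemented: the function names are hypothetical, and only the `no-cache` and `no-store` directives are considered, ignoring expiry rules such as `max-age`.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (such as no-cache) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_reuse_without_revalidation(value: str) -> bool:
    """Return False when the header forbids serving a stored copy as-is."""
    directives = parse_cache_control(value)
    return not any(name in directives for name in ("no-cache", "no-store"))


if __name__ == "__main__":
    for header in ("no-cache", "private, max-age=600"):
        print(header, "->", may_reuse_without_revalidation(header))
```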
3.6 HTTP/1.1 Through Proxies
The subject of HTTP/1.1 will be covered in more detail in future updates to this article.
For now it is important to know that when using a proxy server (regularly or when debugging)
IE will default to HTTP/1.0 which has poor performance compared to HTTP/1.1. So before using
a tool such as Charles it is important to enable HTTP/1.1 in IE, "Tools | Internet Options | Advanced".
4 Charles - Debugging Proxy
Enough with the theory! Below are live examples of how debugging HTTP can give
a deep insight into web applications.
A request is shown as a GET for each resource: images, HTML and so on.

A response is returned below with the response header (upper pane) and response data (lower pane):

Charles untangles all the communication and represents it in site-structure format, making
it easy to locate resources. IE will use two conversations concurrently whilst accessing a site
for increased performance. Charles handles this concurrency automatically.
Viewing HTTP becomes very useful when tracking information submitted in
an HTML form:

No prizes for guessing the search string. That was a GET submission of a form.
Below is a POST; notice how the data is carried in the body of the request. The
query used here is "internet", and you'll also notice the "book" variable being
set to "dictionary".

This allows more data to be sent to the web server, as GET has a finite
limit. For completeness the response to the POST is below. Notice that the server
is Apache/1.3.27 running on UNIX.

When there are problems, you can see which resource failed with a dreaded 404.
Notice how the web server returns an HTML fragment that the browser can
optionally display.

And when a site is down an exception is reported and the response stays empty.

Cookies travel in the request and response headers. Here is an example of two cookies
being set by the server:

Each subsequent request by the client now includes the cookie (cookie name = "Site").
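The round trip in the trace can be imitated with Python's standard `http.cookies` module. This is only a sketch: the header values are hypothetical (only the cookie name "Site" comes from the trace), the helper names are made up, and a real browser additionally checks domain, path and expiry before sending a cookie back.

```python
from http.cookies import SimpleCookie


def remember(jar: SimpleCookie, set_cookie_value: str) -> None:
    """Store the cookie carried by one Set-Cookie response header."""
    jar.load(set_cookie_value)


def cookie_header(jar: SimpleCookie) -> str:
    """Build the Cookie request header value: name=value pairs only."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


jar = SimpleCookie()
# Hypothetical values: the first cookie expires after an hour, the second
# has no lifetime attribute and therefore behaves as a session cookie.
remember(jar, "Site=example123; Path=/; Max-Age=3600")
remember(jar, "Visit=1; Path=/")

print(jar["Site"]["max-age"])           # lifetime in seconds, as set by the server
print("Cookie: " + cookie_header(jar))  # what later requests would carry
```

The attributes (Path, Max-Age) stay in the browser; only the name and value pairs are sent back to the server.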

5 Troubleshooting
5.1 Performance and Networking Problems
We have seen that when a server is down the resources are simply never returned;
it is up to the browser to time out and give up. This can take 60 seconds or 5
minutes depending on the configuration.
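The give-up behaviour can be reproduced outside the browser with a short probe. The following is a sketch using Python's standard `http.client`; the function name `probe` and the 5 second default are assumptions chosen for illustration, not values taken from any browser.

```python
import http.client
import socket


def probe(host: str, port: int = 80, path: str = "/", timeout: float = 5.0) -> str:
    """Return the status of a GET, or a short note when nothing usable comes back."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return f"{response.status} {response.reason}"
    except socket.timeout:
        # The connection opened but no response arrived in time.
        return "timed out"
    except (OSError, http.client.HTTPException) as err:
        # Refused connection, DNS failure, reset, malformed response, ...
        return f"failed: {err}"
    finally:
        conn.close()


if __name__ == "__main__":
    print(probe("www.aserver.org", timeout=5.0))
```

A result of "timed out" corresponds to the case described above, where the server accepts the connection but the resource is never returned.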
When your site and all its resources are located with and managed by one provider
the number of potential problems is reduced. Your site is either up or down.
This changes when your web pages rely on third parties such as advert providers.
Ad providers can supply simple GIF images or more complex dynamic HTML. Each
failure scenario is covered below:
An Ad provider not serving GIF images will cause that request to fail after
a timeout. This will cause the web page to load more slowly as one of the two
concurrent download slots is busy waiting for the failed Ad server.
When an Ad provider is supplying dynamic HTML you will see JavaScript source
being requested. A page can stall completely if the JavaScript loads but then
requests two more resources, for example an image rollover where both images
are unavailable. Remember that only two resources can be requested at a time,
leaving both slots waiting for a dead Ad server.
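The effect of the two download slots can be estimated with a small model. This is a simplified sketch, not a description of how any browser schedules requests: it assumes resources are started in page order, each taking whichever slot frees up first, and that a request to a dead server lasts for the full timeout. All timings are invented and given in milliseconds.

```python
import heapq
from typing import List


def page_load_ms(durations_ms: List[int], slots: int = 2) -> int:
    """Time until the last resource finishes when at most `slots` load at once."""
    free_at = [0] * slots            # time at which each slot becomes free
    heapq.heapify(free_at)
    finished = 0
    for duration in durations_ms:
        start = heapq.heappop(free_at)   # take the earliest free slot
        end = start + duration
        heapq.heappush(free_at, end)
        finished = max(finished, end)
    return finished


if __name__ == "__main__":
    timeout = 60_000  # assumed 60 second browser timeout
    print(page_load_ms([200, 300, 100, 150]))            # healthy page: 450
    print(page_load_ms([200, timeout, 300, 100, 150]))   # one dead ad: 60000
    print(page_load_ms([timeout, timeout, 300]))         # rollover stall: 60300
```

In the last case nothing else on the page can start until both dead requests time out, which is the complete stall described above.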
Both cases should cause concern as a third party can effectively disable a
web site by having their servers fail. Charles can provide the evidence
for this sort of failure within seconds as it clearly shows the requests that
have failed with an exception, or simply displays the "Active Connections"
that may be outstanding.

5.2 Dynamic HTML
As discussed above Ad providers can include JavaScript instead of static images
to produce a rich experience for the user and hopefully get that much sought
after 'click-through'.
While having an Ad server fail can cause a headache and possible loss of a
site there is another problem. This comes in the form of JavaScript errors and
incompatibilities. All HTML developers have hit NS/IE problems and most times
have had to use some tricks to make them co-exist after some heavy testing.
Ad providers can upset a page of working HTML by introducing JavaScript with
DHTML content. The browser normally hides the provided code when it loads the
page but Charles can list all requested resources and importantly all responses,
including any suspect JavaScript.
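A related check can be run on saved page source to see which scripts a page pulls from other hosts. The sketch below uses Python's standard `html.parser`; the class and function names, the sample markup and the host ads.example.net are all hypothetical, and scripts injected at run time by other scripts would only show up in a proxy trace, not in this static scan.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_scripts(page_url: str, html: str) -> List[str]:
    """Return absolute URLs of scripts served from a host other than the page's."""
    collector = ScriptCollector()
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    absolute = [urljoin(page_url, src) for src in collector.sources]
    return [url for url in absolute if urlparse(url).hostname != page_host]


if __name__ == "__main__":
    sample = (
        '<html><head><script src="/js/menu.js"></script>'
        '<script src="http://ads.example.net/banner.js"></script></head></html>'
    )
    print(third_party_scripts("http://www.aserver.org/index.html", sample))
```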
5.3 Limitations of Charles
Remember that Charles is a non-caching proxy that does affect the communications
between client and server (as does any proxy). Use it for diagnosing problems but
never for testing.
6 Internet Standards
Below is a list of the Internet standards that define HTTP:
HTTP Related documents:
RFC2616 -- Hypertext Transfer Protocol -- HTTP/1.1
RFC2965 -- HTTP State Management Mechanism (Cookies)
RFC2964 -- Use of HTTP State Management
RFC2936 -- HTTP MIME Type Handler Detection
RFC2817 -- Upgrading to TLS Within HTTP/1.1
RFC2617 -- HTTP Authentication: Basic and Digest Access Authentication
Multipurpose Internet Mail Extensions (MIME):
RFC2045 -- Part 1: Format of Internet Message Bodies
RFC2046 -- Part 2: Media Types
RFC2047 -- Part 3: Message Header Extensions for Non-ASCII Text
RFC2048 -- Part 4: Registration Procedures
RFC2049 -- Part 5: Conformance Criteria and Examples
Read RFCs at: RFC Denmark or FAQs.org
7 Downloads and Tools
Charles - Web Debugging
Charles is an HTTP proxy / HTTP monitor / Reverse Proxy / WAN Simulator with full NTLM support that enables a developer to view all of the HTTP traffic between their machine and the Internet. This includes requests, responses and the HTTP headers (which contain the cookies and caching information). The WAN simulator allows the simulation of high latency and low bandwidth links and provides detailed page timing statistics for analysis. Charles is a great all-in-one tool for debugging and performance tuning.
Wireshark
Wireshark is a free network protocol analyzer for Unix and Windows. It allows you to examine data from a live network or from a capture file on disk. You can interactively browse the capture data, viewing summary and detail information for each packet. Wireshark has several powerful features, including a rich display filter language and the ability to view the reconstructed stream of a TCP session.
8 Further Reading
Understanding Application Layer Protocols
Extract from a chapter covering TCP-based services such as HTTP, UDP services like DNS, and applications that use a combination of TCP and UDP, such as the Real Time Streaming Protocol (RTSP). Finally, we'll look at how these types of applications can be secured using Secure Sockets Layer (SSL).
HTTP: The Definitive Guide
Web technology has become the foundation for all sorts of critical networked applications and far-reaching methods of data exchange, and beneath it all is a fundamental protocol: HyperText Transfer Protocol, or HTTP. HTTP: The Definitive Guide documents everything that technical people need for using HTTP efficiently. A reader can understand how web applications work, how the core Internet protocols and architectural building blocks interact, and how to correctly implement Internet clients and servers.
HTTP Pocket Reference
All web programmers, administrators, and application developers need to be familiar with HTTP in order to work effectively. The HTTP Pocket Reference provides a solid conceptual foundation of HTTP, and also serves as a quick reference to each of the headers and status codes that compose an HTTP transaction. For those who need to get "beyond the browser," this book is the place to start.
TCP/IP Illustrated, Volume 1: The Protocols
TCP/IP Illustrated, Volume 1 is a complete and detailed guide to the entire TCP/IP protocol suite - with an important difference from other books on the subject. Rather than just describing what the RFCs say the protocol suite should do, this unique book uses a popular diagnostic tool so you may actually watch the protocols in action.
By forcing various conditions to occur - such as connection establishment, timeout and retransmission, and fragmentation - and then displaying the results, TCP/IP Illustrated gives you a much greater understanding of these concepts than words alone could provide. Whether you are new to TCP/IP or you have read other books on the subject, you will come away with an increased understanding of how and why TCP/IP works the way it does, as well as enhanced skill at developing applications that run over TCP/IP.
Ethereal Packet Sniffing (new name is Wireshark)
Ethereal offers more protocol decoding and reassembly than any free sniffer out there and ranks well among the commercial tools. You’ve all used tools like tcpdump or windump to examine individual packets, but Ethereal makes it easier to make sense of a stream of ongoing network communications. Ethereal not only makes network troubleshooting work far easier, but also aids greatly in network forensics, the art of finding and examining an attack, by giving a better "big picture" view. Ethereal Packet Sniffing will show you how to make the most out of your use of Ethereal.