HTTP & WWW servers

Bogdan Iakym, xiakym@fi.muni.cz

Content

HTTP Protocol

- Hypertext Transfer Protocol is an application-layer protocol, primarily designed for transferring hypertext documents and later extended to support any type of data. Its development was coordinated by the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C).

- The communication model is client-server: the client sends an HTTP request message to the server, which handles the requested resource and sends back an HTTP response message. The server typically listens on TCP port 80 (443 in the case of HTTPS).

- Example session using telnet:
$ telnet localhost 80
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.0
Host: localhost

HTTP/1.1 200 OK
Date: Tue, 23 Oct 2012 06:31:30 GMT
Server: Apache/2.2.22 (Fedora)
Last-Modified: Tue, 23 Oct 2012 06:30:51 GMT
Accept-Ranges: bytes
Content-Length: 124
Connection: close
Content-Type: text/html; charset=UTF-8

<html>
<head><title>Server test</title></head>
<body>
<h1>Congratulations</h1>
<p>Server works fine !!!</p>
</body>
</html>
Connection closed by foreign host.
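The same exchange can be reproduced programmatically. The sketch below uses only Python's standard socket module; build_request, split_response and fetch are hypothetical helper names, and the parsing assumes a well-formed response with CRLF line endings.

```python
import socket

def build_request(host, path="/"):
    # HTTP/1.0 request as typed in the telnet session: lines end with CRLF,
    # and an empty line terminates the header section
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")

def split_response(raw):
    # The header section ends at the first empty line (CRLF CRLF)
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: "<version> <code> <reason phrase>" (reason may be empty)
    parts = status_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason, header_lines, body

def fetch(host, port=80, path="/"):
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        # With HTTP/1.0 the server closes the connection after the response,
        # so reading until EOF yields the complete message
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

Calling fetch("localhost") against a running server returns the protocol version, status code, reason phrase, raw header lines and body.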

Versions Overview

Methods

- The method (HTTP command) tells the server what the request is meant to do with the resource

GET      Requests the specified resource; only retrieves data, with no other effect
HEAD     Same as GET, but the response has no body
POST     Submits data to be processed by the server
PUT      Uploads (creates or replaces) the specified resource on the server
DELETE   Deletes the specified resource
OPTIONS  Returns the HTTP methods supported for the specified URL
TRACE    Echoes the received request back, so the client can see what changes intermediate servers (proxies) have made to it
CONNECT  Establishes a tunnel, e.g. for SSL-encrypted communication (HTTPS) through an unencrypted HTTP proxy
PATCH    Applies partial modifications to a resource
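HEAD differs from GET only in the missing response body. The sketch below starts a throwaway local server with Python's http.server (a stand-in for Apache, chosen so the example is self-contained), sends both methods with http.client and returns the status, Content-Length header and body for each; DemoHandler, fetch and demo are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body><h1>Congratulations</h1></body></html>\n"

class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # Same headers as GET, but no body is written
        self._send_headers()

    def log_message(self, format, *args):
        pass  # silence the default request logging to stderr

def fetch(method, host, port, path="/"):
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        body = resp.read()  # http.client returns b"" for HEAD responses
        return resp.status, resp.getheader("Content-Length"), body
    finally:
        conn.close()

def demo():
    # Port 0 lets the OS pick a free port
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("HEAD", "127.0.0.1", port), fetch("GET", "127.0.0.1", port)
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    head, get = demo()
    print("HEAD:", head)
    print("GET: ", get)
```

Both responses announce the same Content-Length, which is what makes HEAD useful for checking a resource without downloading it.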

Header fields

- Header fields are components of the header section of request and response messages; they define the operating parameters of an HTTP transaction. Each header field has the form:

header field: value

- Four groups:

Request headers

Accept           Acceptable Content-Types (e.g. text/plain)
Accept-Charset   Acceptable character sets (e.g. utf-8)
Accept-Encoding  Acceptable encodings (e.g. gzip)
Accept-Language  Acceptable languages for the response (e.g. en-US)
Host             Domain name of the server and, optionally, the TCP port on which it is listening (e.g. example.com:80)
......

Response headers

Content-Location  Location of the returned content (e.g. /image.jpg)
Content-Language  Language of the returned content (e.g. cs)
Content-Length    Length of the returned content in bytes (e.g. 408)
Expires           Date/time after which the response is considered stale (e.g. Sun, 16 Sep 2012 18:00:00 GMT)
WWW-Authenticate  Authentication scheme that should be used to access the requested entity (e.g. Basic)
......
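Since every field follows the "name: value" pattern and field names are case-insensitive, a header section can be parsed into a dictionary keyed by lower-case names. A minimal sketch with the hypothetical helper parse_header_fields; as a simplification it ignores folded (multi-line) fields and keeps only the last value of a repeated field.

```python
def parse_header_fields(header_block):
    fields = {}
    for line in header_block.splitlines():  # handles both CRLF and LF
        if not line.strip():
            continue
        # Split at the first colon only: values such as dates contain colons
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Field names are case-insensitive, so normalise them to lower case;
        # a repeated field overwrites the earlier value (simplification)
        fields[name.strip().lower()] = value.strip()
    return fields
```

Applied to the headers from the telnet session, fields["content-length"] yields the string "124".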

Status codes

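- The numeric status code is located in the first line of the response (the status line) and reports the result of processing the request.
- Five groups of codes, determined by the first digit:

1xx  Informational (e.g. 100 Continue)
2xx  Successful (e.g. 200 OK)
3xx  Redirection (e.g. 301 Moved Permanently)
4xx  Client error (e.g. 404 Not Found)
5xx  Server error (e.g. 500 Internal Server Error)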
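The class of a response can be read from the first digit of the status code alone, while Python's http.HTTPStatus enum supplies the standard reason phrases. A small sketch; describe_status is a hypothetical helper.

```python
from http import HTTPStatus

# The first digit of the three-digit status code selects its class
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def describe_status(code):
    status_class = CLASSES.get(code // 100, "Unknown class")
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. "Not Found"
    except ValueError:
        phrase = "unregistered code"  # valid class, but not a registered code
    return f"{code} {phrase} ({status_class})"
```

For example, describe_status(404) returns "404 Not Found (Client error)".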

Apache

- Still one of the most popular HTTP servers (according to estimates made in September 2012, it serves 54.98% of all active websites).
- Open-source software developed and maintained by the Apache Software Foundation; originally based on the code of NCSA HTTPd, one of the first web servers.

Installation:

Fedora/CentOS: # yum install httpd
RHEL: # up2date httpd
Debian: # apt-get install apache2

Configuration notes

- Configuration parameters are defined in the main configuration file, whose name and location depend on the Linux distribution (e.g. /etc/httpd/conf/httpd.conf on Fedora, /etc/apache2/apache2.conf on Debian)
- Three groups of directives:

Some directives

ServerAdmin           E-mail address to which problems should be reported (e.g. root@example.com)
ServerName            Host name and port of the server (e.g. www.example.com:80)
KeepAlive             Enables persistent connections (On or Off)
MaxKeepAliveRequests  Maximum number of requests allowed on a single persistent connection
DocumentRoot          Directory from which the served documents are taken (e.g. /var/www/html)
......
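With KeepAlive On, a client can send several requests over one TCP connection instead of opening a new one for each; MaxKeepAliveRequests caps how many. The effect can be observed with the sketch below, which uses Python's built-in http.server as a stand-in for Apache (it does not read Apache directives) and records the client's (address, port) pair for every request; KeepAliveHandler and run_demo are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class KeepAliveHandler(BaseHTTPRequestHandler):
    # With HTTP/1.1 the connection stays open unless "Connection: close" is sent
    protocol_version = "HTTP/1.1"
    clients = []  # client (address, port) seen for every handled request

    def do_GET(self):
        KeepAliveHandler.clients.append(self.client_address)
        body = b"ok\n"
        self.send_response(200)
        # A persistent connection needs an explicit length so the client
        # knows where this response ends and the next one begins
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def run_demo(requests=3):
    KeepAliveHandler.clients.clear()
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        for _ in range(requests):
            conn.request("GET", "/")
            conn.getresponse().read()  # read fully before reusing the connection
    finally:
        conn.close()  # closing lets the server leave its per-connection loop
        server.shutdown()
        server.server_close()
    return list(KeepAliveHandler.clients)
```

All recorded entries share the same client port, i.e. the three requests travelled over a single TCP connection.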

Virtual Hosts

- Apache allows running more than one web site on a single machine.
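- IP-based vs. name-based virtual hosts: IP-based hosts are distinguished by the IP address on which the request arrives, while name-based hosts share one IP address and are distinguished by the Host header sent by the client.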

Example of Name-based configuration:

NameVirtualHost *:80

<VirtualHost *:80>
  ServerName www.company.com
  ServerAlias company.com *.company.com
  DocumentRoot /www/company
</VirtualHost>

<VirtualHost *:80>
  ServerName www.other.company.com
  DocumentRoot /www/othercompany
</VirtualHost>
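The selection of a name-based virtual host can be modelled in a few lines: Apache compares the Host header with the ServerName and ServerAlias entries of the virtual hosts in configuration order, uses the first match, and falls back to the first listed virtual host when nothing matches. The sketch below is a simplified model of that rule, not Apache's code; VHOSTS and select_vhost are hypothetical, and fnmatch stands in for Apache's wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory model of the two name-based virtual hosts above
VHOSTS = [
    {"names": ["www.company.com", "company.com", "*.company.com"], "root": "/www/company"},
    {"names": ["www.other.company.com"], "root": "/www/othercompany"},
]

def select_vhost(host_header, vhosts=VHOSTS):
    # Strip an optional ":port"; assumes a host name or IPv4 address,
    # not an IPv6 literal. Host names are case-insensitive.
    name = host_header.rsplit(":", 1)[0].lower() if host_header else ""
    # First virtual host (in configuration order) with a matching name wins
    for vhost in vhosts:
        if any(fnmatchcase(name, pattern.lower()) for pattern in vhost["names"]):
            return vhost["root"]
    # No match: the first listed virtual host acts as the default
    return vhosts[0]["root"]
```

Under this model the order of the VirtualHost blocks matters: www.other.company.com also matches the *.company.com alias of the first block, so it would be served from /www/company unless the more specific block came first. The virtual host settings Apache actually parsed can be checked with httpd -S.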

Example of IP-based Configuration:

<VirtualHost 193.29.200.130>
  DocumentRoot /www/company1
  ServerAdmin webmaster@first.company.com
  ServerName www.first.company.com
</VirtualHost>
 
<VirtualHost 193.29.200.140>
  DocumentRoot /www/company2
  ServerAdmin webmaster@second.company.com
  ServerName www.second.company.com
</VirtualHost>

Modules

- Apache functionality can be easily extended by loadable modules
- More than 500 modules for different purposes; some are developed by the Apache Software Foundation, others by third-party open-source developers
- Modules can be compiled statically into the Apache core or loaded dynamically using the LoadModule directive
- Some of the modules:

SSL

- SSL provides cryptographically secure transactions between the web server and its clients
- In most cases only the server is authenticated, which gives the client a guarantee that the server is who it claims to be
- Self-signed certificates vs. certificates signed by a certificate authority (CA)
- Problem with virtual hosts: the server presents its certificate during the SSL handshake, before it sees the Host header, so without the SNI (Server Name Indication) extension only one SSL virtual host can be served per IP address and port.
- mod_ssl must be loaded, as it provides SSL support based on the OpenSSL library
- SSL Configuration example:

Listen 443
NameVirtualHost *:443

<VirtualHost *:443>
  SSLEngine on
  SSLCertificateFile    /etc/pki/tls/certs/localhost.cert
  SSLCertificateKeyFile /etc/pki/tls/private/localhost.key
</VirtualHost>
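On the client side, server authentication means verifying the certificate chain and checking that the certificate matches the requested host name. A sketch using Python's ssl module; make_client_context and fetch_over_tls are hypothetical helpers, and trusting a self-signed certificate by passing its PEM file as cafile is an assumption about how the example server is set up.

```python
import socket
import ssl

def make_client_context(cafile=None):
    # create_default_context() requires a valid certificate chain and checks
    # that the certificate matches the host name. With cafile=None the system
    # CA store is used; a PEM file (e.g. a self-signed server certificate)
    # makes the context trust only the certificates in that file instead.
    return ssl.create_default_context(cafile=cafile)

def fetch_over_tls(host, port=443, path="/", cafile=None):
    context = make_client_context(cafile)
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # server_hostname enables host name checking and sends the SNI extension
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = tls_sock.recv(4096)
                if not data:  # server closed the connection (HTTP/1.0)
                    break
                chunks.append(data)
    return b"".join(chunks)
```

With a self-signed certificate, the connection only succeeds if the certificate file is passed as cafile and the certificate was issued for the name used in the request; otherwise wrap_socket raises ssl.SSLCertVerificationError.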

CGI

- CGI (Common Gateway Interface) - a mechanism that allows the web server to delegate the generation of web pages to executable programs, i.e. CGI scripts
- The programs are typically written in a scripting language
- The mod_cgi module must be loaded for CGI support
- By common convention, CGI scripts are located in a cgi-bin/ directory or have a .cgi extension

Example of defining a cgi-bin/ directory:

ScriptAlias /cgi-bin/ "/usr/local/apache2/cgi-bin/"

Configuration that allows execution of any CGI script ending in .cgi in users' home directories:

<Directory /home/*/public_html>
  Options +ExecCGI
  AddHandler cgi-script .cgi
</Directory>
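A CGI script receives request data from the server through environment variables such as REQUEST_METHOD and QUERY_STRING and writes its response (header lines, an empty line, then the body) to standard output. A minimal sketch of a hypothetical hello.cgi written in Python; it avoids the cgi standard module, which was removed in Python 3.13, and the file must be executable (chmod +x) for Apache to run it.

```python
#!/usr/bin/env python3
# hello.cgi - hypothetical example CGI script (the file name is an assumption)
import html
import os
import sys
from urllib.parse import parse_qs

def build_response(environ):
    # The web server passes request data to CGI programs via environment variables
    method = environ.get("REQUEST_METHOD", "GET")
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = (
        "<html><body>"
        f"<p>Hello, {html.escape(name)}! (method: {html.escape(method)})</p>"
        "</body></html>\n"
    )
    # CGI output: header lines, an empty line, then the body
    return "Content-Type: text/html; charset=UTF-8\n\n" + body

if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

Placed in the ScriptAlias directory, a request for /cgi-bin/hello.cgi?name=Ann would return a page greeting Ann.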

Directory-level Configuration

- Decentralized management of web server configuration
- The .htaccess (hypertext access) file; its original purpose was to allow per-directory access control
- Global configuration settings can be overridden by those defined in .htaccess, as far as the AllowOverride directive permits
- Usage:
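Apache looks for .htaccess files in each directory on the path to the requested file (where AllowOverride permits), and settings from deeper directories override those from higher ones. A simplified sketch that lists the candidate files from the DocumentRoot downwards, in the order they are applied; htaccess_chain is a hypothetical helper, and the model ignores directories above DocumentRoot, which are usually excluded with AllowOverride None.

```python
from pathlib import PurePosixPath

def htaccess_chain(document_root, requested_file):
    root = PurePosixPath(document_root)
    target_dir = PurePosixPath(requested_file).parent
    # Raises ValueError if the requested file lies outside the DocumentRoot
    relative = target_dir.relative_to(root)
    chain = [root / ".htaccess"]
    current = root
    # Walk down one directory at a time; later entries override earlier ones
    for part in relative.parts:
        current = current / part
        chain.append(current / ".htaccess")
    return [str(path) for path in chain]
```

For /var/www/html/a/b/page.html this yields the .htaccess candidates in /var/www/html, /var/www/html/a and /var/www/html/a/b, in that order.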

Alternatives
