Tuesday, November 10, 2009

Web Server

There are two common features of the web servers: One is Virtual Hosting to serve many web sites using one IP address and another is Bandwidth Throttling to limit the speed of responses in order to not saturate the network and to be able to serve more clients.

Virtual Hosting

Virtual hosting is a method that servers such as web servers use to host more than one domain name on the same computer, sometimes on the same IP address. Virtual web hosting is one of the most popular hosting options available at the moment probably because it is one of the most cost effective options on the market. Also known as shared web hosting, virtual hosting allows a website owner to have a site hosted on a web server that is shared with other websites. In simple terms, the virtual hosting company's server will allocate out hosting services and bandwidth to more than one website. Virtual web hosting is a cheaper hosting option because you won't have to pay for a dedicated server to host just your website. Virtual web hosting is a good solution for small- to medium-sized (and even some larger) websites that aren't constantly being visited or that have reasonable bandwidth needs.

There are two basic methods of accomplishing virtual hosting: name-based, and IP address or ip-based.

Name Based :

Name-based virtual hosts use multiple host names for the same webserver IP address. With web browsers that support HTTP/1.1 (as nearly all now do), upon connecting to a webserver, the browsers send the address that the user typed into their browser's address bar (the URL). The server can use this information to determine which web site, as well as page, to show the user. The browser specifies the address by setting the Host HTTP header with the host specified by the user. The Host header is required in all HTTP/1.1 requests.

If the Domain Name System (DNS) is not properly functioning, it becomes much harder to access a virtually-hosted website. The user could try to fall back to using the IP address to contact the system, as in http://10.23.45.67/. The web browser doesn't know which hostname to use when this happens; moreover, since the web server relies on the web browser client telling it what server name (vhost) to use, the server will respond with a default website—often not the site the user expects.

A workaround in this case is to add the IP address and hostname to the client system's hosts file. Accessing the server with the domain name should work again. Users should be careful when doing this, however, as any changes to the true mapping between hostname and IP address will be overridden by the local setting. This workaround is not really useful for an average web user, but may be of some use to a site administrator while fixing DNS records.

Another issue with virtual hosting is the inability to host multiple secure websites running Secure Sockets Layer or SSL. Because the SSL handshake takes place before the expected hostname is sent to the server, the server doesn't know which certificate to present when the connection is made. One workaround is to run multiple web server programs, each listening to a different incoming port, which still allows the system to just use a single IP address. If running multiple web server programs is considered clumsy, a more efficient solution is to select TLS. Another option is to do IP aliasing, where a single computer listens on more than one IP address.

IP Address

In IP-based virtual hosting each site (either a DNS hostname or a group of DNS hostnames that act the same) points to a unique IP address. The web server is configured with multiple physical network interfaces, virtual network interfaces on the same physical interface or multiple IP addresses on one interface.

The web server can obtain the address the TCP connection was intended for using a standard API and use this to determine which website to serve. The client is not involved in this process and therefore (unlike with name based virtual hosting) there are no compatibility issues.

Path Installation

Web servers are able to map the path component of a Uniform Resource Locator (URL) into:

  • a local file system resource (for static requests);
  • an internal or external program name (for dynamic requests).

For a static request the URL path specified by the client is relative to the Web server's root directory. Consider the following URL as it would be requested by a client:

http://www.example.com/path/file.html

The client's web browser will translate it into a connection to www.example.com with the following HTTP 1.1 request:

GET /path/file.html HTTP/1.1
Host: www.example.com

The web server on www.example.com will append the given path to the path of its root directory. On Unix machines, this is commonly /var/www. The result is the local file system resource:

/var/www/path/file.html

The web server will then read the file, if it exists, and send a response to the client's web browser. The response will describe the content of the file and contain the file itself.

Bandwidth Throttaling :

Bandwidth throttling is a method of ensuring a bandwidth intensive device, such as a server, will limit ("throttle") the quantity of data it transmits and/or accepts within a specified period of time. For website servers and web applications, bandwidth throttling helps limit network congestion and server crashes, whereas for ISP's, bandwidth throttling can be used to limit users' speeds across certain applications (such as BitTorrent), or limit upload speeds.

A server, such as a web server, is a host computer connected to a network, such as the Internet, which provides data in response to requests by client computers. Understandably, there are periods where client requests may peak (certain hours of the day, for example). Such peaks may cause congestion of data (bottlenecks) across the connection or cause the server to crash, resulting in downtime. In order to prevent such issues, a server administrator may implement bandwidth throttling to control the number of requests a server responds to within a specified period of time.

When a server using bandwidth throttling has reached the allowed bandwidth set by the administrator, it will block further read attempts, usually moving them into a queue to be processed once the bandwidth use reaches an acceptable level. Bandwidth throttling will usually continue to allow write requests (such as a user submitting a form) and transmission requests, unless the bandwidth continues to fail to return to an acceptable level.

Likewise, some software, such as peer-to-peer (P2P) network programs, have similar bandwidth throttling features, which allow a user to set desired maximum upload and download rates, so as not to consume the entire available bandwidth of his or her Internet connection.

Load Limits:

A web server (program) has defined load limits, because it can handle only a limited number of concurrent client connections (usually between 2 and 80,000, by default between 500 and 1,000) per IP address (and TCP port) and it can serve only a certain maximum number of requests per second depending on:

  • its own settings;
  • the HTTP request type;
  • content origin (static or dynamic);
  • the fact that the served content is or is not cached;
  • the hardware and software limits of the OS where it is working.
When a web server is near to or over its limits, it becomes overloaded and thus unresponsive.

Kernal-mode and user-mode web servers:

A web server can be either implemented into the OS kernel, or in user space (like other regular applications).

An in-kernel web server (like TUX on Linux or Microsoft IIS on Windows) will usually work faster because, as part of the system, it can directly use all the hardware resources it needs, such as:

  • non-paged memory;
  • CPU time-slices;
  • network adapters buffers.

Web servers that run in user-mode have to ask the system the permission to use more memory or more CPU resources. Not only these requests to the kernel take time, but they are not always satisfied because the system reserves resources for its own usage and has the responsibility to share hardware resources with all the other running applications.

Also applications cannot access the system internal buffers, which is causing useless buffer copies that create another handicap for user-mode web servers. As a consequence, the only way for a user-mode web server to match kernel-mode performances is to raise the quality of its code to much higher standards than the code used into another web server that runs in the kernel.

This is more difficult under Windows than under Linux where the user-mode overhead is 6 times smaller than under Windows.

Overload Causes:

At any time web servers can be overloaded because of:

  • Too much legitimate web traffic. Thousands or even millions of clients hitting the web site in a short interval of time. (e.g. Slashdot effect);
  • DDoS. Distributed Denial of Service attacks;
  • Computer worms that sometimes cause abnormal traffic because of millions of infected computers (not coordinated among them);
  • XSS viruses can cause high traffic because of millions of infected browsers and/or web servers;
  • Internet web robots. Traffic not filtered/limited on large web sites with very few resources (bandwidth, etc.);
  • Internet (network) slowdowns, so that client requests are served more slowly and the number of connections increases so much that server limits are reached;
  • Web servers (computers) partial unavailability. This can happen because of required or urgent maintenance or upgrade, hardware or software failures, back-end (e.g. DB) failures, etc.; in these cases the remaining web servers get too much traffic and become overloaded.
Mostly Used Server in the World:

Vendor Product Web Sites Hosted Percent
Apache Apache 108,078,535 46.90%
Microsoft IIS 49,723,999 21.58%
Tencent qq.com 30,069,136 13.05%
Google GWS 13,819,947 6.0%
nginx nginx 13,813,997 5.99%


No comments:

Post a Comment