Doctor Thesis Abstract

Studies on Adaptive Network for Flash Crowds Alleviation

Chenyu Pan
September, 2006

As the Internet becomes more integrated into our everyday lives, the availability of information services built on top of it becomes increasingly important. Content Distribution Networks (CDN) is a widely used Internet technique to improve the performance and availability of web sites by deploying geographically-dispersed server surrogates and distributing client requests to an "appropriate" server based on various considerations.

Traditionally, a CDN system has used a fixed number of dedicated servers based on capacity planning, which works well if the request load is relatively consistent and matches the planned capacity. However, web requests could be very explosive. Recently, with the rapid spread of information and ubiquitous access of browsers, flash crowd, a sudden, unanticipated surge in the volume of request rates towards a particular web site, presents new challenges to maintain high accessibility of the information. Flash crowds are short-term dramatic load spikes. For such situations, over-provisioning a web site with CDN is inefficient and uneconomical, particularly to small web sites.

To handle flash crowds flexibly, we advocate a more cost-effective approach: FCAN -- Flash Crowds Alleviation Network, an adaptive CDN network that dynamically optimizes the system structure between peer-to-peer (P2P) and client-server (C/S) configurations. We utilize an Internet infrastructure of cache proxies to organize a temporal proxy cloud, among which perform P2P cooperative caching. This P2P mode is invoked on the fly when flash crowd comes but get out of the way when normal C/S configuration works fine.

In this dissertation, we first study the characteristics of flash crowds. We show that network bandwidth is the most serious bottleneck and a small amount of objects is responsible for a larger amount of requests. These observations implicate flooded requests must be redirected away from sever network and caching these flashed objects may be a possible way, on which the design of FCAN is based. The related works show there still leaves much to be satisfied in handling flash crowds problem. We benefit from the state-of-art extensively and propose our improved design, moving the P2P overlay from the client side to cache proxy layer, and using this layer to offload the flash traffic from origin web server.

There are three main challenges in conducting this research: (1) How to organize the temporal proxy cloud quickly while impact little on cache surrogates during their normal period of operations? We propose an implicit P2P overlay to minimize the warm-up time and use simple, unstructured 1st generation P2P system instead of DHT-based 2nd generation P2P system to decrease the operation complexity and computation cost. (2) How to detect the coming and leaving of a load spike properly while avoid the oscillations of the system architecture? We employ threshold-based detection but treat load increase and decrease differently. A pair of threshold watermarks is used instead of just a single one. When detecting load increase we react quickly based on the current load level; when detecting load decrease, we prefer a blunt react to keep system stable. (3) How to direct the client requests to the proxy cloud whose surrogates are temporally organized and geographically distributed? DNS-based redirection mechanism is adopted in this context. DNS returns the addresses of proxy cloud which have been recently verified first-hand and waits being propagated to the client side. We prefer load balance to the client response time since system throughput becomes more crucial under the flash crowds conditions.

FCAN has the following advantages. Firstly, scalable -- the edge resource of cache proxies on the Internet is second to the clients. A web server can expand its capacity by employing more and more cache proxies. Secondly, optimum -- C/S architecture is simple but incurs single point of failure, while P2P architecture is scalable but difficult to control. Both the limitations of the C/S and the P2P architecture can be minimized through the dynamic transition. Thirdly, economic -- the P2P overlay is built on the top of existing cache proxies on the Internet. There is no additional hardware investment or cost. Fourthly, reliable -- compared with transient clients, cache proxies are managed by network administrators. They are much more stable and reliable to delivery objects. Finally, transparent -- being an intermediate solution, clients do not need to aware FCAN. Their browsers keep unchanged.

Our preliminary simulation results, with both the artificial workload and real trace driven workload, show that the system is overall well behaved. With a proper setting of parameters, FCAN helps a web site's availability under heavy load from 45.91% up to 99.85%. These results validate the correctness of our basic design, however, also point out the scalability limitation of the current design. The fixed pre-determined parameters are not suitable because the peak load is hard to predict in advance. So we make an extending design of a scalable proxy cloud that dynamically shrink and enlarge according to the changing load instead of involving all the cache proxies all the time. We leave the concrete extending design, quantitative evaluation as well as other open issues as our future works.

The main contributions of this dissertation are: (1) We study the flash crowds phenomena and its related work extensively. We analyze the main characteristics of flash crowds and point out the need of an adaptive network to handle short-term Internet congestion. (2) Our design idea of changing the network architecture between C/S and P2P configuration is original and unique. We present the design detail along with the mechanisms for proxy-based P2P object dissemination, DNS redirection and dynamic transition. (3) Different from common detection algorithms, we propose a strategic load detection to treat the load increase and decrease differently. (4) We model a flash traffic generator and use a home-grown simulator to do the experiments in verifying the correctness of the design. (5) Finally, we give an extending design in amending the scalability limitation of the system.

The content in this dissertation is organized as: Chapter 1 introduces the background and motivations of this research. Chapter 2 takes a closer look at flash crowds phenomena. Chapter 3 discusses the related works. Chapter 4 presents the overall design including the mechanisms for proxy cloud organization, DNS redirection, dynamic transition, application protocols and implementation outline. Chapter 5 describes the preliminary simulation and its results. Chapter 6 gives an extending design and other open issues. Chapter 7 is the conclusion and future works.


Theses Index (in Japanese)