"The stretched exponential distribution of Internet media access patterns"

Lei Guo, Enhua Tan, Songqing Chen, Zhen Xiao, and Xiaodong Zhang 

Proceedings of the 27th ACM Symposium on Principles of Distributed Computing
(PODC 2008), Toronto, Canada, August 18-21, 2008. 


The commonly accepted Zipf-like access pattern of Web workloads is mainly
based on Internet measurements taken when text-based content dominated Web
traffic.  However, with the dramatic increase of media traffic on the
Internet, an inconsistency between the access patterns of media objects and
the Zipf model has been observed in a number of studies.  An insightful
understanding of media access patterns is essential to guide Internet 
system design and management, including resource provisioning and 
performance optimizations.  

In this paper, we study a large variety of media workloads
collected from both client and server sides in different media systems 
with different delivery methods.  Through extensive analysis and modeling, 
we find: (1) the object reference ranks of all these workloads follow the 
stretched exponential (SE) distribution despite their different media 
systems and delivery methods; (2) one parameter of this distribution
characterizes the media file sizes well, and the other characterizes
the aging of media accesses; (3) some biased measurements may lead to
Zipf-like observations of media access patterns; and (4) the deviation
of the media access patterns from the Zipf model in these workloads
grows with the workload duration.
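
For orientation, the rank form of a stretched exponential distribution can be
sketched as follows.  The symbols x_0, c, a, b, K, and alpha are notation
assumed here for illustration and do not appear in the abstract; the
parameterization used in the paper itself may differ in detail.

```latex
% Stretched exponential (Weibull-type) complementary CDF of the
% per-object reference count X, with stretch factor 0 < c < 1
P(X \ge x) = \exp\!\left[-\left(\frac{x}{x_0}\right)^{c}\right]

% If the rank i of an object with y_i references is proportional to
% P(X \ge y_i), with proportionality constant K, then
i = K \exp\!\left[-\left(\frac{y_i}{x_0}\right)^{c}\right]
\;\Longrightarrow\;
y_i^{c} = -a \log i + b, \qquad a = x_0^{c}, \quad b = a \log K

% Zipf-like model for comparison
\log y_i = -\alpha \log i + \mathrm{const}
```

Under these assumptions, SE-distributed data fall on a straight line when
y^c is plotted against log i, whereas Zipf-like data fall on a straight line
in log-log scale.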

We further analyze the effectiveness of media caching with a
mathematical model.  Compared with Web caching under the Zipf model, 
media caching under the SE model is far less effective unless the cache
is extremely large.  This indicates that many previous studies
based on a Zipf-like assumption have potentially overestimated the
benefit of media caching, and that an effective media caching system must
be able to scale its storage size to accommodate the growth of media
content over a long time.  Our study provides an analytical basis for
applying a P2P model rather than a client-server model to build
large-scale Internet media delivery systems.
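
The caching comparison can be illustrated with a small numerical sketch.  It
is not the paper's mathematical model: it assumes an idealized cache that
holds the k most popular of n equally sized objects, a Zipf-like rank
distribution y_i proportional to i^(-alpha), and an SE rank distribution of
the form y_i^c = -a log i + b with b = 1 + a log n (so the least popular
object has one reference).  The function names and the values of n, alpha,
a, and c are hypothetical choices made only for illustration.

```python
import math


def zipf_counts(n, alpha):
    """Reference counts proportional to i**(-alpha) for ranks 1..n."""
    return [i ** -alpha for i in range(1, n + 1)]


def se_counts(n, a, c):
    """Reference counts from y_i**c = -a*ln(i) + b, with b set so y_n = 1."""
    b = 1.0 + a * math.log(n)
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n + 1)]


def hit_ratio(counts, k):
    """Fraction of all references that go to the k most popular objects.

    Assumes counts is ordered by rank (most popular first).
    """
    return sum(counts[:k]) / sum(counts)


# Illustrative parameters only; they are not taken from the paper.
N = 100_000
zipf = zipf_counts(N, alpha=0.8)
se = se_counts(N, a=0.5, c=0.3)

for frac in (0.01, 0.1, 0.5):
    k = int(N * frac)
    print(f"cache {frac:>4.0%}: "
          f"Zipf-like {hit_ratio(zipf, k):.3f}, SE {hit_ratio(se, k):.3f}")
```

With these assumed parameters, a cache holding 1% of the objects captures a
smaller share of references under the SE curve than under the Zipf-like
curve; other parameter choices will shift the gap.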