The Size and Growth Rate of the Internet by K. G. Coffman and A. M. Odlyzko


The public Internet is currently far smaller, in both capacity and traffic, than the switched voice network. The private line networks are considerably larger in aggregate capacity than the Internet. They are about as large as the voice network in the U. S., but carry less traffic. On the other hand, the growth rate of traffic on the public Internet, while lower than is often cited, is still about 100% per year, much higher than for traffic on other networks. Hence, if present growth trends continue, data traffic in the U. S. will overtake voice traffic around the year 2002 and will be dominated by the Internet.


Contents

Introduction
What To Measure and How
Costs and Prices and the Decline of Distance
Units of Measurement
Voice Networks
The Public Internet
Private Line Networks
Rates and Sources of Growth
Conclusions
Notes


There are many predictions of when data traffic will overtake voice. It either happened yesterday, or will happen today, tomorrow, next week, or perhaps only in 2007. There are also wildly differing estimates for the growth rate of the Internet. The number of Internet users is variously given as increasing at 20 or 50 percent per year, and the traffic on the Internet is sometimes reported as doubling every three months, even in sober government reports [1]. Often the same source contains seemingly contradictory information. For example, John Sidgmore, the chief operating officer for WorldCom, and the person in charge of all its Internet activities, was interviewed early in 1998 [2]. He stated that revenues from Internet operations at WorldCom were about doubling each year. Later in the interview, he said that the bandwidth of UUNet's Internet links was increasing 10-fold each year. Since the prices that UUNet charges have not decreased recently, certainly not by a factor of five, both of these claims can be correct only if something unusual is happening to the WorldCom network.

Since there is no comprehensive source for information on the size and growth rate of the Internet, it seemed worthwhile to do as careful an analysis as the fragmentary publicly available data allows. This is especially important because, as we point out below, the growth rate of the Internet has not been stable.

In this study we focus on the sizes of networks (measured by their transmission capacity) and the traffic they carry (measured in bytes). We find that the public Internet (the part of the Internet that is not restricted to users from any single organization) is still small, whether measured in capacity of links or in traffic, when compared to the voice telephone network. (This may not be true for all routes. There are frequent reports, for example, that there is more Internet traffic than voice traffic between the U. S. and Scandinavia.) In contrast, the private parts of the Internet (largely corporate private line networks) already have capacity close to that of the voice network. On the other hand, traffic on private line networks is still much smaller than that on the voice network, possibly not much bigger than traffic on the public Internet.

In modeling the transition to a network dominated by data it appears important to recognize three distinct growth rates:

  1. Interstate voice traffic (which carries some fax and modem data) has recently been growing at about 8% a year when measured in minutes of use. This is an acceleration from the 4% rates of the early 1990s, but not as fast as the record 23% increase in 1984, or the average of 10.3% per year that was observed during the 1980s [3].

  2. Capacity and presumably traffic on private line networks have been growing 15% to 20% per year in the last few years [4].

  3. Traffic and capacity of the public Internet grew at rates of about 100% per year in the early 1990s. There was then a brief period of explosive growth in 1995 and 1996. During those two years, traffic grew by a factor of about 100, which is about 1,000% per year. In 1997, it appears that traffic growth has slowed down to about 100% per year.

Traffic on the Frame Relay semi-public data networks is also growing about 100% per year [5].
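
The growth rates quoted above can be put on a common footing with a few lines of arithmetic. The following is a minimal sketch in Python; the function names are ours and purely illustrative, and the only inputs are figures already quoted in the text (a factor of about 100 over two years, the "doubling every three months" claim, and the 8%, 15% to 20%, and 100% annual rates).

```python
import math

def annual_growth_percent(factor, years):
    """Annual percentage increase implied by a total growth `factor` over `years`."""
    return (factor ** (1.0 / years) - 1.0) * 100.0

def doubling_time_years(annual_percent):
    """Years needed to double at a steady annual percentage growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_percent / 100.0)

# Factor of about 100 over the two years 1995 and 1996.
explosive = annual_growth_percent(100.0, 2.0)
# "Doubling every three months" expressed as an annual rate.
quarterly_doubling = annual_growth_percent(2.0, 0.25)

print(f"100x in two years: {explosive:.0f}% per year")
print(f"doubling every 3 months: {quarterly_doubling:.0f}% per year")

# Doubling times for the three growth regimes listed in the text.
for label, rate in [("interstate voice", 8.0),
                    ("private line (low)", 15.0),
                    ("private line (high)", 20.0),
                    ("public Internet", 100.0)]:
    print(f"{label}: {rate:.0f}%/year doubles in {doubling_time_years(rate):.1f} years")
```

A factor of 100 in two years is a tenfold increase each year, that is, a 900% annual increase, which the text rounds to about 1,000%.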


Reports, such as the U. S. Department of Commerce's The Emerging Digital Economy, which claim 1,000% growth rates for the Internet, appear to be inaccurate today, since they are based on a brief period of anomalously rapid growth a short while ago [6]. Still, even a doubling each year is fantastically fast by the standards of the communications industry.

If traffic on the Internet continues to double each year, data should exceed voice on U. S. long distance networks around the year 2002. (We are using data here to refer to packet traffic, and voice to the circuit switched traffic, whether those are used to carry voice or data.) There are obvious uncertainties in making such projections, but in the "Rates and Sources of Growth" section of this paper we discuss the long historical trend of consistently rapid growth of the Internet, and the reasons we expect it to continue.
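
As a rough illustration of how such a projection works, the sketch below solves for the time at which an exponentially growing data stream catches up with a slowly growing voice stream. It is not the calculation behind our estimate. The starting values are assumptions chosen for illustration: 40,000 TB/month of interstate voice traffic at the end of 1997 (from the "Voice Networks" section) and 2,000 TB/month of Internet traffic (the lower bound from the sanity check in "The Public Internet" section), with voice growing 8% and data 100% per year. Private line and Frame Relay traffic are ignored here, and the function name is hypothetical.

```python
import math

def crossover_years(voice0, data0, voice_growth, data_growth):
    """Years until data0*(1+data_growth)^t equals voice0*(1+voice_growth)^t."""
    return math.log(voice0 / data0) / math.log((1.0 + data_growth) / (1.0 + voice_growth))

# Assumed end-of-1997 starting points, in TB/month (see the lead-in above).
VOICE_TB_PER_MONTH = 40_000.0
DATA_TB_PER_MONTH = 2_000.0

years = crossover_years(VOICE_TB_PER_MONTH, DATA_TB_PER_MONTH,
                        voice_growth=0.08, data_growth=1.00)
print(f"crossover about {years:.1f} years after the end of 1997")

# Hypothetical sensitivity check: a starting data volume twice as large.
years_if_doubled = crossover_years(VOICE_TB_PER_MONTH, 2 * DATA_TB_PER_MONTH,
                                   voice_growth=0.08, data_growth=1.00)
print(f"with twice the starting data traffic: {years_if_doubled:.1f} years")
```

Because data traffic is assumed to double each year, doubling the starting estimate moves the crossover earlier by only a little more than a year, so the projected date is not very sensitive to errors in the initial traffic estimate.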

Our estimates of the transition date from domination by voice to domination by data differ from many published ones. Most of the claims (such as [7], which says that the bandwidth of data networks will equal that devoted to voice in 2000) are not substantiated by detailed analysis and appear incorrect, since that transition in bandwidth appears to be occurring about now. Mutooni and Tennenhouse's analysis [8] appears to go astray by assuming utilization rates of data and voice networks are about the same. However, as is shown in the companion paper [9] and discussed at greater length later in this paper in the section "Private Line Networks," data networks typically are used much less intensively than voice networks. Thus network capacities do not represent the amount of traffic those networks carry. Finally, there are claims which may agree with our prediction of a transition around 2002, but they are not accompanied by detailed arguments.

 


What To Measure and How

Many studies of the Internet look at the number of users [10]. This is the most relevant measure for some purposes, although it is inadequate for others, as it does not say anything about the intensity of usage. Other studies measure the Internet's size by the number of computers connected to it (cf. [11] and Table 7 in this paper, both based on data from [12]). In this study we focus on the sizes of networks and the traffic they carry.

Since the Internet is a loose collection of networks, it is hard to decide what to include in estimating its size. See Fig. 1 for a sketch of the universe of data networks and the role that the public Internet plays in it. A key point is that what is commonly thought of as the Internet, namely the public backbones connecting all the local networks, is only a small part of the data networking universe. Most data traffic, just like most voice phone traffic, is local. Also, most of the cost of data and voice communications is associated with local facilities. For example, universities tend to devote between 10% and 20% of their network budget (counting the cost of people as well as equipment and services obtained from carriers) to Internet connections. In voice telephony, of the approximately $200 billion that is spent in the U. S. each year, only about $80 billion is for long-distance (inter-LATA) calls. Moreover, of that $80 billion, about $30 billion is paid to the local carriers in access charges, so the true long-distance component of the cost to the public is around $50 billion, only about a quarter of the total for the voice system.


Although most of the cost of telecommunications networks is for local facilities, we consider long-distance transport only. The economics and patterns of usage of local facilities are different (see [13] for more extensive discussion of this point). Studies of data as well as voice communications have historically concentrated on long haul circuits. They are the most heavily utilized and also the most difficult or expensive to upgrade, so sophisticated engineering and pricing approaches are most appropriate there. We will follow this precedent, and will not consider the LANs and WANs. Similarly, we will not consider local access links, such as the phone lines used by residential Internet users to connect to their ISPs or to make local voice calls. While we cannot avoid comparing apples to oranges, we try not to compare apples to orange trees.

Unlike most studies of the Internet, we consider not only the public Internet, but also the long-distance private line networks. These networks are estimated to mostly use IP (Internet Protocol), and, as we show, are far larger in both cost and bandwidth (but not necessarily in traffic carried) than the public Internet. The evolution of the Internet in the next few years is likely to be determined by those private networks, especially by the rate at which they are replaced by VPNs (Virtual Private Networks) running over the public Internet. Thus it is important to understand how large they are and how they behave.

In this study, we consider only U. S. networks. These networks still account for between 60% and 70% of users and host computers in the world [14], and almost surely a higher fraction of capacity and traffic, since transmission costs much less in the U. S. than in most of the world [15]. It is also easier to obtain data for North America. Further, most countries are striving to lower their telecommunications costs, and so the patterns we observe in the U. S. are likely to be replicated elsewhere in the next few years. The geographical restriction does mean, however, that the equality of data and voice traffic we predict for 2002 applies only to the U. S., and the crossover is likely to be somewhat later in most other countries.

Even in the U. S., we exclude government networks from consideration. We also do not consider research networks such as vBNS, since they carry little traffic, although they do have substantial capacity (as we will mention).

We consider only lines that are used for carrying voice and data traffic. The capacities of the underlying fiber networks (cf. [16]) are far higher than those we will be listing. It would take us too far afield to try to explain the disparity in the estimates, but they have to do with differences between air distances and fiber route distances, presence of dark fiber, restoration capacity, and other factors.


Costs and Prices and the Decline of Distance

We study capacities of networks and the traffic they carry. We measure these in Gbps (gigabits per second, 10^9 bits per second) and TB/month (terabytes per month, 10^12 bytes per month). However, it seems intuitive that a terabyte carried between Baltimore and Philadelphia is not equivalent to a terabyte between Baltimore and San Francisco. Thus a more complete description of communications traffic should incorporate a measure of how far that traffic travels. Distance plays an important role in the evolution of networks. For example, an important reason cited for the migration of corporate data traffic to the public Frame Relay networks is that charging for Frame Relay is insensitive to distance, making it much less expensive for long distance communications [17].

While distance does play a role in telecommunications, it is a decreasing role [18]. The monthly tariffs for interstate T1 leased lines from MCI (quoted from [19]) consist of a fixed fee of $3,234 and $3.87 for each mile. (The corresponding figures for a T3 are $22,236 and $52.07, respectively, so the distance dependence is stronger for higher capacity links, a phenomenon we expect to continue.) For the typical 300-mile leased circuit distance, the fixed fee is 73.6% of the cost of a T1, and 58.7% for a T3, and a more representative cost estimate would show much greater share for fixed costs, since it would include local connections and lease of terminating equipment. (The figures in Table 3 are for all-inclusive costs of private lines of various speeds, and for a 56 Kbps line include 57% for local access costs. It should be noted, as is detailed in [20], for example, that customers typically lower their costs by up to 50% through long-term contracts and bulk purchase discounts.) The historical trend in pricing of a T1 connection is shown in Figure 2, which displays the tariffed rates for a T1 line of 700 miles, broken down into fixed and mileage-sensitive components, from the time this service was first offered until the end of 1997. Even at such a large distance, the distance-sensitive part of the price has decreased rapidly.
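
The fixed-fee shares quoted above follow directly from the two-part tariff. The sketch below reproduces them from the MCI figures in the text; the function names are illustrative, and the 700-mile case simply applies the same tariff to the distance used in Figure 2.

```python
def monthly_cost(fixed_fee, per_mile, miles):
    """Monthly tariff for a leased line: fixed fee plus a per-mile charge."""
    return fixed_fee + per_mile * miles

def fixed_share(fixed_fee, per_mile, miles):
    """Fraction of the monthly tariff that does not depend on distance."""
    return fixed_fee / monthly_cost(fixed_fee, per_mile, miles)

# Interstate tariffs quoted in the text: (fixed fee in $, $ per mile).
T1_TARIFF = (3234.00, 3.87)
T3_TARIFF = (22236.00, 52.07)

for miles in (300, 700):
    t1 = fixed_share(*T1_TARIFF, miles)
    t3 = fixed_share(*T3_TARIFF, miles)
    print(f"{miles} miles: fixed fee is {t1:.1%} of a T1, {t3:.1%} of a T3")
```

At 300 miles this gives 73.6% for a T1 and 58.7% for a T3, matching the figures above.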


The decreasing role of distance can also be seen in the voice telephony price structure. As is shown in Table 14.2 of [21], in the early 1980s there were many rates for interstate calls, depending on both distance and time of day. Today there is still some variation depending on time of day, even though that is also decreasing (see [22] for a discussion), but calls are priced independently of distance (within the U. S.). This reflects the decreasing fraction of the price that goes to cover network costs. It is estimated that of the 12 cents a minute that carriers collect on average for a voice call within the U. S., only about 1.5 cents is needed to pay for the network. By far the largest component of cost is the approximately 5 cents a minute of access charges paid to local carriers largely to subsidize local service.

The discussion above was intended to show that while our analysis does ignore important factors of distance, it is reasonable to do so as a first approximation. In further defense of our approach, let us mention that most communication is still local. This was apparently first noted in the 19th century by Carey and others [51], but is best known from the work of Zipf [23], who collected a variety of statistics on communication and transportation patterns. Zipf observed that whether one measured phone calls, car travel, or mail usage, the interaction between two cities with populations A and B at distance D appeared to be proportional to A*B/D^α, with α = 1. Other investigators since that time have found better fit for other values of α, typically with 1 ≤ α ≤ 2. Even services with distance-insensitive fees, such as mail, appear to be closely tied to social interactions, and are mostly local. That is certainly the case for usage of the telephone network. Interstate voice calls on average go over 500 air miles, while private lines are on average about 300 air miles in length.
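
To make the form of Zipf's relation concrete, here is a minimal sketch of the A*B/D^α model. The city populations, the distances, and the proportionality constant k are hypothetical values chosen only to show how the exponent changes the fall-off with distance; they are not data from Zipf or from this paper.

```python
def interaction(pop_a, pop_b, distance, alpha=1.0, k=1.0):
    """Zipf-style interaction between two cities: k * A * B / D**alpha."""
    return k * pop_a * pop_b / distance ** alpha

# Hypothetical pair of cities with one and two million inhabitants.
CITY_A = 1_000_000
CITY_B = 2_000_000

for alpha in (1.0, 1.5, 2.0):
    near = interaction(CITY_A, CITY_B, distance=100.0, alpha=alpha)
    far = interaction(CITY_A, CITY_B, distance=200.0, alpha=alpha)
    # Doubling the distance divides the interaction by 2**alpha.
    print(f"alpha={alpha}: doubling the distance leaves {far / near:.0%} of the interaction")
```

With α = 1, doubling the distance halves the predicted interaction; with α = 2 it falls to a quarter.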

A fascinating topic for further research is whether Zipf's observations will apply to the Internet. In voice telephony, the last 20 years have seen a growth in interstate and intrastate toll calls from 8 minutes per line per day to 14 minutes, while local calling has stayed about constant at 40 minutes (Table 12.2 of [24]). This was presumably caused by the decline in long-distance prices and the greater mobility of the population. Still, most calls are local, and even in the interstate calling case, the intensity of calls drops off with distance, as Zipf observed. The Internet is a world-wide network, and much traffic comes from downloading from popular Web servers, many of which appear to be located in California. On the other hand, a disproportionate share of Internet traffic is within California in any event (as is seen by examining the backbone maps in [25]), so it could be that most traffic is local even on the Internet. Even if that is not the case now, the penetration of the Internet into everyday life may mean that Internet traffic will again follow patterns of our everyday social and economic interactions, and be largely local in the future. Further, even if there is no trend towards local information sources, the spread of caching may mean that most packets will be transported over short distances. Additional investigation is clearly desirable, especially since the speed with which it is economical to deploy some novel transport technologies depends on the distances over which traffic is to be carried.


Units of Measurement

It will be convenient to state some conversion factors between different units and between the bandwidth of a connection and the traffic carried by that connection. Since there is substantial uncertainty about many estimates, we will not attempt to achieve precision, and will often not worry about 10% differences.

Voice on phone networks is carried in digitized form at 64,000 bits per second. Each voice call occupies two channels, one in each direction, so takes up 128,000 bps of network bandwidth. Thus one minute of a voice call takes 60*128*1000 bits, or 937.5 KB (kilobytes, units of 1024 = 2^10 bytes). Rounding this off, we get

1 minute of switched voice traffic ≈ 1 MB.

(There is a discrepancy between the meaning of the "k" or "K" prefixes, which commonly denote 1000 in communication and 1024 = 2^10 in computing. Given the lack of precision in most of the estimates we will be dealing with, this difference will be immaterial and will be ignored.) Compression can reduce that to a much smaller figure, and is used to some extent on high-cost international circuits, as well as on some corporate private line networks. As far as the network is concerned, though, it is carrying almost 1 MB of digital data for each minute of a voice call. Further, most data traffic can also be compressed, so we will ignore this factor.
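
The voice conversion factor can be checked with a short calculation. This sketch uses only the figures given above (64,000 bps per direction, two directions, 60 seconds); the constant and function names are illustrative.

```python
VOICE_CHANNEL_BPS = 64_000   # one direction of a digitized voice call
CHANNELS_PER_CALL = 2        # one channel in each direction

def voice_minute_bytes():
    """Bytes of network capacity occupied by one minute of a voice call."""
    bits = VOICE_CHANNEL_BPS * CHANNELS_PER_CALL * 60
    return bits / 8

bytes_per_minute = voice_minute_bytes()
print(f"{bytes_per_minute:,.0f} bytes per minute")
print(f"{bytes_per_minute / 1024:.1f} KB (units of 1024 bytes)")
print(f"{bytes_per_minute / 1e6:.2f} MB (units of 10^6 bytes)")
```

Whether one counts in units of 1000 or 1024, the result is within about 10% of 1 MB per minute.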

A T3 (or DS3) line operates at 45 Mbps (actually, closer to 43 Mbps, but again we won't worry about this discrepancy) in each direction, so that if it were fully loaded, it would carry 90 Mbps. Over a full month of 30 days, that comes to 29 TB (terabytes, which are 10^12 bytes). We will say that

full capacity of a T3 link ≈ 30 TB/month.

A T1 line (1.5 Mbps) is 1/28th of a T3, and we will say that

full capacity of a T1 link ≈ 1 TB/month.
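
The same kind of check applies to the line capacities. The sketch below converts a line rate into the traffic a fully loaded link would carry in a 30-day month, counting both directions, as in the text; the function name is illustrative, and the T1 is taken at the nominal 1.5 Mbps rate quoted above.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # a 30-day month

def full_capacity_tb_per_month(mbps_each_direction):
    """TB (10^12 bytes) per month carried by a fully loaded two-way link."""
    bits_per_second = 2 * mbps_each_direction * 1e6
    return bits_per_second * SECONDS_PER_MONTH / 8 / 1e12

t3_tb = full_capacity_tb_per_month(45.0)
t1_tb = full_capacity_tb_per_month(1.5)
print(f"T3: {t3_tb:.1f} TB/month")
print(f"T1: {t1_tb:.2f} TB/month")
```

Both results round to the 30 TB/month and 1 TB/month figures used in the rest of the paper.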


Voice Networks

The FCC collects and publishes comprehensive statistics on long-distance switched voice networks. They show that at the end of 1997, U. S. carriers had about 40 billion voice minutes per month of interstate traffic on their public networks (Table 11.1 of [26]), which is about 40,000 TB/month. This number has recently been growing at about 8% per year (Table 12.1 of [27]). Including local and intrastate toll traffic boosts the estimate to about 275,000 TB/month. (Table 12.1 of [28] also shows that since 1980, intrastate and interstate toll calls have grown from 7% and 8%, respectively, of switched voice minutes, to 11% and 15%, another sign of the declining role of distance.)


The 40,000 TB/month figure for long distance switched network traffic includes a large but unknown fraction of fax and modem calls, which carry data. However, since they appear on the network as switched calls, they will be counted as voice.

Defining long distance traffic is easy compared to defining what is meant by long distance network capacity. There are various special connections for operator services, 800 number services, and the like. We consider just the long distance lines between large switches. Then known distributions of traffic over a week, achievable busy hour utilizations, and reserve capacity rules of thumb (all described in the literature, for example in [29]) show that the average utilization of such links is around 33%. Combined with the traffic estimates above, this shows that all the switched voice networks in the U. S. had capacity of around 350 Gbps at the end of 1997.
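
The step from traffic to capacity can be sketched as follows. This is an illustrative reconstruction under simple assumptions, not the detailed calculation from the literature cited above: it spreads 40 billion voice minutes evenly over a 30-day month at 128,000 bps per call and then divides by the 33% average utilization. The names are ours.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # a 30-day month
CALL_BPS = 128_000                   # both directions of a voice call

def average_rate_gbps(minutes_per_month):
    """Average bit rate, in Gbps, if the minutes were spread evenly over the month."""
    bits = minutes_per_month * 60 * CALL_BPS
    return bits / SECONDS_PER_MONTH / 1e9

def required_capacity_gbps(minutes_per_month, utilization):
    """Capacity needed to carry the traffic at the given average utilization."""
    return average_rate_gbps(minutes_per_month) / utilization

avg = average_rate_gbps(40e9)
capacity = required_capacity_gbps(40e9, utilization=0.33)
print(f"average load: {avg:.0f} Gbps")
print(f"implied capacity at 33% utilization: {capacity:.0f} Gbps")
```

The result is consistent with the figure of around 350 Gbps, given the level of precision adopted in this paper.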


The Public Internet

There are many estimates of the size and growth rate of the Internet that are either implausible, or inconsistent, or even clearly wrong. We already cited the Sidgmore interview [30] as an example. It was claimed in early 1997 that one third of Internet traffic went through the MAE East peering point [31]. Actually, although about one third of the traffic that went through public peering points went through MAE East, this traffic was only a part of total Internet backbone traffic.

The major reason for the uncertainties in measuring the Internet is that carriers do not release detailed information about their networks. As a result, any estimates made from publicly available data will necessarily have a large error margin.

As a first step, to provide a "sanity check" on other estimates, we consider the traffic generated by residential users accessing the Internet with a modem. There are about 20 million of them (or, more precisely, there are about 20 million active accounts) and according to the latest information from America Online and other services, on average an account is connected about 25 hours each month. These users download data at a rate of about 5 Kbps when they are online (with considerably smaller average upload rates), which generates traffic load of just about 1,000 TB/month. (To the voice phone network, which dedicates 128 Kbps for each connection, the load appears as 26,000 TB/month, about 10% of the total load of voice calls, local and long distance.) Since there are more PCs in corporate environments than at home, we should expect to see total Internet traffic of at least twice that, or 2,000 TB/month.
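
The sanity check above amounts to multiplying accounts, connect time, and data rate. The sketch below repeats it with the figures from the text (20 million accounts, 25 hours per month, 5 Kbps download rate, 128 Kbps occupied on the voice network), treating "K" as 1000; the function name is illustrative.

```python
def monthly_traffic_tb(accounts, hours_per_month, bits_per_second):
    """TB (10^12 bytes) per month generated by all accounts at a steady bit rate."""
    seconds_online = accounts * hours_per_month * 3600
    return seconds_online * bits_per_second / 8 / 1e12

ACCOUNTS = 20e6
HOURS_PER_MONTH = 25

# Data actually downloaded at about 5 Kbps while connected.
data_tb = monthly_traffic_tb(ACCOUNTS, HOURS_PER_MONTH, 5_000)
# Load as seen by the voice network, which dedicates 128 Kbps per connection.
voice_load_tb = monthly_traffic_tb(ACCOUNTS, HOURS_PER_MONTH, 128_000)

print(f"modem data traffic: {data_tb:,.0f} TB/month")
print(f"load on the voice network: {voice_load_tb:,.0f} TB/month")
```

The first figure is just over 1,000 TB/month. The second comes out within roughly 10% of the 26,000 TB/month quoted above, a difference of the size we have agreed not to worry about.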

We next consider traffic through the public peering points. Statistics for them are available [32], often going back a year or more. The five largest ones are shown in Table 1. The traffic estimates are for the early part of December 1997 (to avoid the Christmas and New Year holiday effects). The other public peering points are much smaller. Total traffic through all the public peering points is dominated by that through the five points in Table 1, and comes to about 4 Gbps, or 1,200 TB/month. For comparison, in mid-1996, traffic through these points was about 1.6 Gbps, or 500 TB/month. Growth has been uneven, with especially rapid increase in traffic at the Chicago NAP. That peering point had traffic of only around 0.2 Gbps as late as October 1997, but by April 1998 was carrying about 0.7 Gbps. Overall, though, aggregate traffic through the NAPs and MAEs appears to have been growing at about 100% per year from late 1996 through April 1998. This agrees with the 100% growth rates for 1997 for MCI and an unnamed ISP [33].

Table 1: Major public exchange point traffic, end of year 1997

peering point                   traffic in Gbps
Sprint NAP (New York City)      0.5
Ameritech NAP (Chicago)         0.7
Pac Bell NAP (San Francisco)    0.5
MAE East (Washington, DC)       1.1
MAE West (San Jose)