A Digital Library as a Digital Island

Hal Berghel

February, 2001

Welcome to the second installment of DL Pearls.

Last month, you may recall, we covered, albeit lightly, the history of the concept of a "digital library." This month, I'll begin by trying to give you some sense of the sheer scale of this undertaking.

I used the metaphor of an island in an anatomical sense. Just as anatomical islands are clusters of cells that differ in structure or function from those of the surrounding tissue, so digital libraries are clusters of documents whose structure or function differs from that of other documents within the surrounding digital network. Digital libraries are more than collections of documents or clusters of hyperlinks. Their documents all share the same object-level structure; that is, they form interwoven hypertext sites in which all underlying components are related to one another by means of a common infrastructure (in the case of the ACM DL, Oracle's Intermedia now serves as the indexing and query support engine). The ACM DL is indeed an island in the path of what society has found to be the most important economic trade route in history.

To give you some idea of the population of our island, the ACM DL currently contains 865 proceedings volumes representing 45,806 individual articles, and 1,857 journal issues offering another 18,210 articles. Between the conference proceedings and the journal articles, there are at this writing 64,016 individual articles in the ACM Digital Library database, and there's plenty more to come: the full ACM archives are not scheduled to be online until late spring 2001. This is not to mention the 18,175 article abstracts that are included and the 2,654 reviews of articles from Computing Reviews, with about 500 more coming online soon. Reflect for a moment on the volume of information in the DL - the magnitude of this endeavor is quite imposing. Further, in terms of online and indexed content, 56,926 of the 64,016 articles are now available for full-text download (the remainder are being readied as this column appears), as are most of the abstracts and reviews.
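For readers who like to check the arithmetic, here is a small sketch that recomputes the totals from the figures quoted above. The variable names are mine, chosen only for illustration; the numbers are the ones given in this column.

```python
# Collection figures as quoted in the column (early 2001).
proceedings_articles = 45_806
journal_articles = 18_210
full_text_available = 56_926

# Total article count, articles still awaiting full text, and the
# share of the collection that is already downloadable.
total_articles = proceedings_articles + journal_articles
pending = total_articles - full_text_available
full_text_share = full_text_available / total_articles

print(f"total articles:     {total_articles:,}")
print(f"awaiting full text: {pending:,}")
print(f"full-text share:    {full_text_share:.1%}")
```

In other words, a little under nine in ten articles were already available in full text at the time of writing.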

Given these impressive demographics, let's look at visitation. On average, 45,345 different computers (either clients or intermediary servers) visit the ACM DL 75,543 times and download 602,342 articles each month. Expressed as a ratio, that is roughly eight articles per visit: every visit in which any article is downloaded produces enough interest to download seven more. At a more abstract level, on average 95,464 computers visit the DL 175,731 times monthly to download 702,395 citations relating to ACM publications, and 56,744 computers visit 109,351 times to download 245,930 Tables of Contents. In the last thirteen months alone, there have been a total of 21,709,332 downloads of all types from more than two-and-a-half million individual visits to the ACM Digital Library. We are talking some non-trivial transaction levels here, folks.
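The ratios behind these figures are easy to reproduce. The following is only a sketch: the dictionary layout and the helper name per_visit are my own, while the monthly averages are the ones quoted above.

```python
# Monthly averages as quoted in the column.
monthly = {
    "articles": {"computers": 45_345, "visits": 75_543, "downloads": 602_342},
    "citations": {"computers": 95_464, "visits": 175_731, "downloads": 702_395},
    "tables_of_contents": {"computers": 56_744, "visits": 109_351, "downloads": 245_930},
}


def per_visit(row):
    """Average number of downloads per visit for one content type."""
    return row["downloads"] / row["visits"]


for kind, row in monthly.items():
    visits_per_computer = row["visits"] / row["computers"]
    print(f"{kind}: {per_visit(row):.2f} downloads per visit, "
          f"{visits_per_computer:.2f} visits per computer")
```

The article ratio comes out just under eight per visit, which is the "one article plus seven more" pattern; citations run at about four per visit and Tables of Contents at a little over two.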

Of course, all of this is for naught if users can't get the information they need in a timely fashion. That comes about via the streamlined interface of the ACM Digital Library. Let's see how this works.

First, we log in to the ACM DL. In this case, we're interested in finding out everything we can about a current LAN technology, so "fast Ethernet" is our search term. Since we want a quick overview of all available content, we'll select all search parameters: relevant titles, full-text articles, abstracts, reviews, and index terms. Our search produces 40 matches, 24 of which are displayed on the first page of the DL report. Were we so inclined, we could click on any title (they're all sensitized links) and download the PDF version of the article.

Instead, this time we want to save some time by developing a most-recent-first chronology of these articles, so that we see what's most current first and then drill down into the more dated material. We do this by changing the order of the listing from the default, alphabetical order by publication, to a chronological listing by selecting ORDER BY: publication date. Now we've got the listings in the desired order.
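To make the two orderings concrete, here is a minimal sketch of the idea. It is not the DL's actual implementation: the Hit record, the sample titles and dates, and both function names are hypothetical stand-ins for a page of search results.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hit:
    """One search result; a hypothetical record, not the DL's schema."""
    title: str
    publication: str
    published: date


# Made-up sample results, for illustration only.
hits = [
    Hit("Sample article A", "Journal X", date(1997, 5, 1)),
    Hit("Sample article B", "Conference Y", date(2000, 11, 1)),
    Hit("Sample article C", "Journal X", date(1999, 3, 1)),
]


def order_by_publication(results):
    """The default described above: alphabetical by publication name."""
    return sorted(results, key=lambda h: (h.publication, h.title))


def order_by_date(results):
    """Most-recent-first chronology, as in ORDER BY: publication date."""
    return sorted(results, key=lambda h: h.published, reverse=True)


for hit in order_by_date(hits):
    print(hit.published.isoformat(), hit.publication, "-", hit.title)
```

The only difference between the two listings is the sort key, which is why switching between them costs the user a single menu selection.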

But we still lack sufficient information to determine whether a download is called for. We want to avoid unnecessary downloads because PDF files tax our available bandwidth. Like all document formats that render beautifully, they tend to be bloated. So, we expand the individual entries in our list by choosing a full listing from the VIEW menu. At this point, the meta-level content is laid before us (see image, below). We have links to the article's citation, its abstract, and the applicable index terms, along with a link to the full-text article, all tastefully laid out together with the file size so that we can estimate download time.
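Estimating the download time from the listed file size is simple division. The sketch below assumes an idealized link with no protocol overhead; the function name, the 1.2 MB file size, and the 56 kbps modem speed are all hypothetical examples, not figures from the DL.

```python
def download_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time: file size in bits divided by link speed.

    Ignores protocol overhead, latency, and server load, so a real
    download will usually take somewhat longer.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return size_bytes * 8 / link_bits_per_second


# Hypothetical example: a 1.2 MB PDF over a 56 kbps dial-up modem.
seconds = download_seconds(1_200_000, 56_000)
print(f"about {seconds / 60:.1f} minutes")
```

At dial-up speeds even a modest PDF costs a few minutes, which is the reason to read the abstract and index terms before committing to the download.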

To review, the trick to making the first pass at personalizing the search results from the ACM Digital Library is to take advantage of the SEARCH, ORDER BY, and VIEW presentation parameters. Try it. You'll like it.

