Three Dimensional Data Web Set To Emerge

New Protocols Enable Manipulation of Quantitative Data by Data Web Browsers

Open Standards Likely to Give Huge Boost to Data Mining Activities

Work Pushed By Terabyte Challenge Consortium Enables Remote Interaction of Data Sets

pp. 1 - 12

A new web is emerging. The data web will likely exceed the document web in size and in its impact on Internet infrastructure. We interview Robert Grossman, CEO of Magnify, Inc. and Director of the Laboratory for Advanced Computing at the University of Illinois at Chicago. Grossman has played a pioneering role in the use of high performance computer networks to assist scientists in their analysis of extremely large data sets. He has built a layered view of how data mining - a process of data analysis and real time decision making - could be carried out over the Internet.

Many businesses have extensive data sets about their customers, including purchasing histories. Grossman explains his role in catalyzing the Data Mining Group, a consortium made up of Angoss, IBM, Magnify, Microsoft, Mineit, NCR, Oracle, Salford Systems, SGI, SPSS, and Xchange. The group consists predominantly of vendors of proprietary data mining software packages. These vendors are now joined in an effort to develop a set of open standards that should lead to much new software and to a vast increase in the amount of data mining. Furthermore, with the spread of XML, the markup language used to display rows and columns of data on the web, these developments are expected to lead to the take-off of a public data web. This will mean the growth of sites with publicly accessible data sets, where visitors with client browsers equipped to interact with the site's data servers can retrieve data that can be manipulated as data, rather than examined but not changed as is the case with an HTML page. The result will be the data web, or what Grossman calls Data Space.

As Grossman explains: "From the user's perspective, Data Space works like the document web. You can use a browser to examine remote and distributed data. And you can analyze and mine it with a point and click interface. Web sites can use Data Space services such as personalization and predictive modeling to provide a site with interactions which are created on the fly for each individual visitor.

From the vendor's perspective, Data Space is also like the document web, it simply uses a richer suite of services, including services for moving data (DSTP) and real time scoring (PSUP), and specialized XML languages for working with data, including the Predictive Model Markup Language (PMML) and the Data Extraction and Transformation Language (DXML)."

Data Space uses open standards to provide the Internet infrastructure necessary to work with scientific, engineering, business, and health care data. "Unlike HTTP and HTML which are designed for multi-media documents, Data Space is somewhat more complicated because you have higher expectations when you work with data than when you work with documents."

"A document you only have to read. With data you have to analyze, score and make decisions. What everyone interested in tracking and planning for the further growth and development of Internet infrastructure needs to understand is that so far the current Internet barely scratches the surface of what you will be able to do with data as Data Space and similar infrastructure begins to be deployed. I'm sure that the data web will be an important driver of bandwidth over the next few years."

Grossman also explains the Terabyte Challenge, which for the past four years has been used both as a test bed for the basic protocols, languages and tools for Data Space and as a test bed for different ways to scale data intensive applications, especially remote and distributed data intensive applications. The focus has been on developing an open infrastructure for working with large and distributed data sets. Grossman's group has developed a process of striping that allows large data sets to interact with each other in real time at sustained bandwidth usage of more than 250 megabytes per second.
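The core idea behind striping can be illustrated with a short sketch. This is not the Laboratory for Advanced Computing's actual code; it is a minimal illustration, under the assumption that "striping" here means dealing fixed-size blocks of a large data set round-robin across several parallel channels so that the aggregate transfer rate approaches the sum of the per-channel rates:

```python
def stripe(data: bytes, channels: int, block_size: int) -> list:
    """Deal fixed-size blocks of `data` round-robin across `channels` lanes."""
    lanes = [[] for _ in range(channels)]
    for i in range(0, len(data), block_size):
        lanes[(i // block_size) % channels].append(data[i:i + block_size])
    return lanes

def unstripe(lanes: list) -> bytes:
    """Reassemble the original byte stream from round-robin lanes."""
    out = []
    rounds = max(len(lane) for lane in lanes)
    for r in range(rounds):
        for lane in lanes:
            if r < len(lane):
                out.append(lane[r])
    return b"".join(out)

payload = bytes(range(256)) * 4          # stand-in for a large data set
lanes = stripe(payload, channels=4, block_size=64)
assert unstripe(lanes) == payload        # round-trip preserves the data
```

In a real deployment each lane would be a separate network connection; the round-robin bookkeeping is what lets the receiver reassemble the stream in order.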

The data space transfer protocol (DSTP) is the protocol used to move data between nodes in the data web. The data extraction and transformation markup language (DXML) describes how to clean, transform, and shape data - usually one of the most labor intensive tasks when working with data. Statistical models are built using statistical and data mining applications. The predictive model markup language (PMML) describes the output of such systems in an open format. Scoring is the process of using statistical models to make decisions. The Predictive Scoring and Update Protocol (PSUP) can be used both for online real time scoring and updates and for scoring in an offline batch environment.
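The point of PMML is that a model built in one vendor's tool can be consumed by another vendor's scoring engine, because the model is just XML. The sketch below is only loosely modeled on PMML's structure - the field names and version number are invented for illustration, and exact element names vary by PMML release - but it shows the interoperability idea: any tool that can parse XML can discover what inputs a model expects.

```python
import xml.etree.ElementTree as ET

# A toy document loosely in the shape of PMML: a data dictionary
# declaring the fields a predictive model operates on.
pmml_doc = """
<PMML version="1.1">
  <DataDictionary>
    <DataField name="age" optype="continuous"/>
    <DataField name="purchases" optype="continuous"/>
    <DataField name="churn" optype="categorical"/>
  </DataDictionary>
</PMML>
"""

root = ET.fromstring(pmml_doc)
# A consuming application can enumerate the model's fields
# without knowing anything about the tool that produced them.
fields = [f.get("name") for f in root.iter("DataField")]
print(fields)  # ['age', 'purchases', 'churn']
```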

When PMML was adopted as an open standard by the likes of IBM and other major players earlier this year, the trade press published a flurry of articles. However, our interview with Grossman represents the first article to cover the entire extent of what he is doing.

ISOC Summarizes ICANN Dilemma

p. 12

On December 9 - 10 the ISOC Board of Trustees met in San Diego, California. Their minutes provide a frank assessment of ICANN's lack of authority.

IPv6 from the Viewpoint of Mobile Wireless

Continued Cell Phone Growth to Cause Deployment of IPv6 Nets Interconnected to Non-Disappearing IPv4 Infrastructure

Substantial Work Remains to Bring V6 and Data to Cell Phones

pp. 13 - 19

We interview Charlie Perkins, a Research Fellow in the Wireless Internet Mobility Group at the Nokia Communication Systems Laboratory. Perkins offers a fresh point of view on the issue of IPv6 deployment. He explains that independent nodes running IPv6 already exist and will spread. "IPv4 and IPv6 can co-exist in the same general network because they do not collide with each other. They just have to know how to address each other. For example, you can have a router that routes IPv6 packets and IPv4 packets on the same network."

"The whole thing about IPv6 to begin with was to develop a protocol, deploy it, and do what the IETF does well, which is to get interoperability testing going and then just start to build it. People want to buy solutions to the problems facing them, be they IPv4 problems or IPv6 problems. Eventually the solutions for the IPv4 problems may become more expensive than the solutions for the IPv6 problems. This will be true in part because the IPv6 solutions that are already available will become cheaper as IPv6 grows in market share."

NAT will not suddenly disappear. "Having large domains of both IPv4 and IPv6 is merely one way to partition the overall IP address space. In such a situation, with the right kind of firewall NAT platform, you can even envision translating IPv6 addresses into IPv4 addresses at the border of the domains, so that IPv4 applications can in effect be tricked into believing that what is going on is only an interaction between two IPv4 applications." Talking about the inordinate expense of converting an IPv4 internet into IPv6 is asking the wrong question, because v6 can be meaningfully deployed in an Internet where v4 continues to function.
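One concrete piece of how the two address spaces relate in a dual world: every IPv4 address has a standard embedding into the IPv6 space as an "IPv4-mapped" address of the form ::ffff:a.b.c.d. This is only an illustration of the addressing relationship using Python's standard library, not a depiction of any particular firewall or NAT product Perkins has in mind:

```python
import ipaddress

v4 = ipaddress.IPv4Address("192.0.2.1")          # a plain IPv4 address
mapped = ipaddress.IPv6Address("::ffff:192.0.2.1")  # its IPv4-mapped v6 form

# The mapping is reversible: the embedded v4 address can be recovered
# from the v6 representation.
print(mapped.ipv4_mapped)        # 192.0.2.1
print(mapped.ipv4_mapped == v4)  # True
```

Because every v4 address fits inside the v6 space this way, a border device can carry v4 endpoints through a v6 domain (or vice versa) without either side's applications needing to know.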

However, the arrival of a billion cell phones over the next 18 months will force much more serious deployment of IPv6, which is the only reasonable means of doing both voice and data over a single cell phone.

According to Perkins: "We have answers for most of what we have looked at but, as we look, we see more and more problems. For example there are a lot of problems in security and a lot of problems in quality of service. There are also a lot of problems in header compression. Also the way in which the base stations are coordinated to manage spectrum most effectively is historically not very friendly towards the IP model."

"All of these things add up to a situation where, as I mentioned before, you can employ IPv6 now. But for specific applications like voice, IPv6 cannot currently match the performance of analog voice over the air as a part of the PSTN. Now we're going to change this and believe that we will be able to equal or exceed the current capabilities of analog voice over the air as part of the PSTN." Adding mobility to the mix of necessary protocol development for IPv6 data phones complicates the technical issues involved. According to Perkins: "There are a lot of people who want to use v4. But I don't think we will ever get to global deployment of mobile IPv4 for voice-over-IP. I think by the time voice-over-IP really comes into play, we will be using largely IPv6."

Of several interesting protocols being developed, the most interesting comes from an IETF working group called AAA (authentication, authorization and accounting). RADIUS only works for static objects and has some other difficulties as well. Consequently the AAA working group is building a replacement protocol for RADIUS. The AAA protocol will come with features such as session measurement and accounting. Tied in with IPsec, AAA will do authorization and accounting for services such as mobile IP.

Among Scaling Issues IPv6 Solves Only IP Number Problem

NATs Depend on Both IP Numbers and Routing Issues

Since IPv4 and v6 Interconnect But Do Not Interoperate, Introducing v6 Means Running Two Networks -- 3G Makes v6 Cellular Viable

pp. 20 - 22

Yet another IETF discussion this time with interesting new information on levels of complexity of NATs and levels of address allocation.

Klensin Internet Drafts Propose Radical DNS Revamp

New Class Means New Root -- Drafts Are Aftermath of Network Solutions Split With IETF on ENUM and Internationalized Domain Names

pp. 23 - 26

DNS issues are less settled than ever before. Uncertainty about the fate of the protocol, and ICANN's failure to generate any consensus on issues of Internet governance, have led to a situation where Network Solutions' new VeriSign owners are doing something that before ICANN would have been unthinkable. Namely, it has instituted an ENUM trial that flies squarely in the face of the IETF-ITU agreement on the ENUM standard. On December 18, Tony Rutkowski, speaking for VeriSign Network Solutions at the NTIA Roundtable on ENUM, dismissed the IETF-ITU model of national ENUM administrators for the namespace and advocated a model of industry control, with NetNumber and the other unsuccessful applicants for ENUM-like gTLDs lined up in opposition to Richard Shockey, Neustar, the IETF and the ITU. Observers of the meeting seem in agreement that there now is no agreement on ENUM and that deployment in the US will be seriously postponed. On December 20 Net Sol announced the opening of its ENUM trials with a statement long on hype and short on substance.

The area of Internationalized Domain Names is even more contentious than ENUM. There Net Sol got another head start with its own proprietary solution and has been registering .com, .net and .org names in Chinese, Japanese and Korean characters since November 10th. These Asian nations in the meantime have begun to register names according to their own systems. In effect we are getting the pollution of the name space that ICANN has warned about, whether ICANN likes it or not. The conflicts between the opposing sides appear to be intractable, and the IETF IDN standards process has bogged down.

In the midst of this, something intriguing happened with the publication by John Klensin, Chair of the IAB, of three Internet Drafts. The most important of these came on December 13: draft-klensin-i18n-newclass-00.txt, "Internationalizing the DNS - A New Class."

Klensin states "The [draft] proposal is radical in the sense that it implies a major restructuring of DNS usage and, indeed, of the Internet, to make the DNS seamlessly capable of working with multinational character sets. Such a restructuring is, and should be, quite frightening. It is worth considering only if the long-term risks and problems of other proposals are severe enough to justify a radical approach. It is the working hypothesis of this document that they are."

Klensin goes on to call for the creation of a new universal class in the DNS, one designed for the UTF-8 character set. By calling for a universal class Klensin is, in effect, calling for a new root into which the old ASCII-based root would be folded as a subset of the new 'order'. Over time every DNS resolver would be obsoleted and replaced. It would be a bit like rebooting the Internet. But so convoluted have the DNS wars grown that at the very apex of power the thought has suddenly become one of sweeping everything aside and starting afresh. Klensin noted that a mailing list has been initiated for discussion of this draft, its successors, and closely related issues; subscription is by sending a message with the single word "subscribe" (without the quotes) in the body. We predict that this mailing list could become one of the most important lists in a long, long time.