History of the Internet

“The internet has revolutionised the computer and communications world like never before. The invention of the telegraph, telephone, radio, and computer set the stage for this unprecedented integration of capabilities. The internet is at once a worldwide broadcasting capability, a mechanism for information dissemination, and a medium for collaboration and interaction between individuals and their computers without regard for geographic location.”


 - Barry M. Leiner (1999) A Brief History of the Internet


The concept of the internet was first envisioned by J.C.R. Licklider of MIT in 1962. His idea was to create a ‘Galactic Network’ which would allow computers all over the globe to share both programs and information. His idea, decades ahead of the technology which could support it, bears a striking resemblance to today’s World Wide Web. The first step towards achieving Licklider’s vision came when Leonard Kleinrock, also at MIT, researched and documented his concept of a packet switching (rather than circuit switching) computer communication model between 1961 and 1964. This was the first step towards allowing multiple computers to communicate in a routed network environment. Following the invention of the concept of packet switching networks, the next step was the development of protocols which could give these computers a common language and set of rules by which to communicate. Work in this area began to gain pace in 1966 when Lawrence Roberts joined DARPA and began work on his plan for the ARPANET. This plan was published a year later in 1967. He presented his work at a conference alongside two British researchers, Donald Davies and Roger Scantlebury, who were delivering their own research on the packet switching concept. The parallel and often independent nature of computer networking research characterised its early years. Concepts such as packet switching and software systems such as communication protocols were usually prototyped and developed by multiple separate groups. The research community at large would then review their peers’ work, and whose solution progressed further was decided in a very democratic and organic manner. Historians in the field agree that this free and independent exchange of ideas and robust peer review process had a direct positive impact on the growth of the internet, allowing it to become the global information commodity that it is today.


The ARPANET is generally considered the first incarnation of the internet but it was not the only competitor in the early days of networking. These early networks had little or no interconnectivity or inter-compatibility. Early competitors of ARPANET included the NSFNET developed by the National Science Foundation, XNS from Xerox and SNA from IBM. Bear in mind this list is in no way exhaustive. The three examples I have given are purpose-built networks. They were each designed to help achieve particular goals for the organisations that created them and were never intended to scale globally and encompass people from all walks of life. ARPANET was different in that from the outset it was intended to be scalable.


During the early 1980s important steps were taken to align the fields of operating system design and networking protocol development. It was during this period that the Transmission Control Protocol/Internet Protocol (TCP/IP) suite was first integrated into the UNIX kernel at Berkeley. This proved to be a critical step in the distribution of the protocol to the greater academic and commercial community of network researchers. It was also during this period that the number of computers available to students, businesses and even homes began to skyrocket. The IBM PC in particular came on the market in 1981 and was very successful. The IBM PC and the internet both owe much of their success to their open architectures. Like the developers of the internet, IBM decided to open up its design to the public, making it a more attractive option for the academic community. The IBM PC architecture is still the most popular format in use today.


The NSFNET was the second largest network at the time and work to integrate it with ARPANET was ongoing. As more and more researchers and campuses joined ARPANET the number of people with valuable contributions to the technology began to increase rapidly. A decision was made to divide authority over the project into three groups. The first and most important of these groups was the International Cooperation Board (ICB), which coordinated the highest level activities of the research teams. There were also the internet engineering and architecture task forces at DARPA and the network technical advisory group, who worked together under the ICB to ensure the ongoing compatibility of the ARPANET and the NSFNET. The Internet Activities Board (IAB) was later set up from the head researchers of the task forces. The IAB was populated with most of the same members as the Internet Configuration Control Board (ICCB), whose job it was to coordinate the activities of the other boards. This level of cooperation was made possible primarily by the use of a common protocol suite on both networks, TCP/IP. NSFNET first adopted TCP/IP in 1981 and ARPANET followed in 1983. By this point the ARPANET had already grown to such a size that the switch to TCP/IP had to be planned for three years before it could actually be implemented. The switchover had to be coordinated across the whole network and implemented simultaneously. This made January 1st 1983 a major event in the evolution of the internet, as the protocol switchover also allowed the network to be partitioned into two distinct elements. The military portion of the network took on the name MILNET while the civilian and academic portion carried on the name ARPANET.


By the mid-1980s the internet was a widely adopted technology interconnecting a wide range of computers with a global social reach. Much of this success is owed to the open nature of research into the technology. However, unlike other academic research subjects the internet progressed very rapidly from the start, so the normal mechanism of publication and peer review was too slow to keep up. As early as 1969 it became clear that a new model would be required. This model was known as the ‘Request for Comments’ (RFC) series. It allowed for a more rapid distribution of ideas and feedback. Although the series was originally distributed by mail it has since moved to FTP, then email, then the World Wide Web.


During the 1980s commercial interest in the internet began to match that of the academics. It became necessary to include the private sector in the process of protocol and technology development. This culminated in the formation of the Internet Society in 1992. This organisation’s purpose was to ensure the “open development and evolution of the internet in manner that would benefit all people throughout the world” (Wikipedia: Internet Society). As the internet was now branching out into wider society there were people using the network to communicate who were not from the research or commercial communities. With the help of the World Wide Web new communities were forming who needed the same benefits from standardisation. This led to the formation of the World Wide Web Consortium (W3C). This organisation was initially led by Tim Berners-Lee and Al Vezza. Its purpose is to provide a single authority on standards and protocols associated with the WWW. The W3C is the current authority on all things relating to web standardisation. The galactic network concept has seen over four decades of evolution, not just in standards and technology but also in research and development methodology and central organisation.


The part of the internet we know today as the World Wide Web was pioneered by Tim Berners-Lee at CERN with the development of Hypertext Markup Language (HTML) and Hypertext Transfer Protocol (HTTP). HTML allowed for the creation of web pages stored on networked computers. HTTP allowed these pages to be requested and retrieved over the network. The most important mechanism of HTML is the hyperlink; with it, references to HTML documents can be embedded in other HTML documents. According to Berners-Lee the concept of hypertext can be traced back as far as the 1945 essay by Vannevar Bush in which he described a Memex machine which would use binary coding, instant photography and photocells to allow the reader to follow microfilm cross references. The term hypertext, however, wasn’t coined until 1965 with the work of Ted Nelson. The first system that could be considered a forerunner to Berners-Lee’s WWW would be the Enquire system, which he developed at the European particle physics laboratory (CERN). This system allowed for navigable document cross referencing, or hyperlinks. However, this system was by no means complete. Although it allowed Berners-Lee to organise his research, it was not designed to be used on a wide area network where interoperability between different network architectures was essential. Discussion as to the requirements and architecture of the WWW began in 1990. Quickly a few core criteria emerged. According to Berners-Lee these were:

1)      The system must be able to record ad-hoc associations between arbitrary objects.

2)      Making links between objects must be a scalable effort.

3)      Users must not be limited to particular languages or operating systems.

4)      Users mustn’t have to deal with information in the same way as computers do; it must be intuitive to the way the mind works, not the processor.

5)      Information must be easy to input to the system and update once it’s there.

Previous hypertext systems were centralised: documents, document identifiers and links between documents were all stored centrally. This was not very scalable, as authority over a system and its administration was controlled via a central database. It did however guarantee referential integrity. If a page was removed, so too could any links pointing to it. The loss of this guarantee of integrity was a fundamental compromise that Berners-Lee had to make in order to make the WWW the globally scalable information system it has since become. The next stage in creating a global information space was defining a common addressing system. This system came in the form of Universal (now Uniform) Resource Identifiers (URI). A URI can be either a uniform resource locator (URL) or a uniform resource name (URN). A URL identifies both a resource and a method of accessing it. For example, take the URL ‘http://www.google.com’:


The ‘www.google.com’ identifies the resource and the ‘http://’ identifies the method for accessing it. A uniform resource name identifies a resource but not a method of accessing it. For example, the URN for the book PHP in a Nutshell takes the form ‘urn:isbn:’ followed by the book’s ISBN.
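
The split between access method and resource can be sketched with Python’s standard urllib.parse module. This is only an illustration: the helper names split_url and is_urn are hypothetical, and ‘urn:example:a-book’ is a placeholder name rather than the identifier of any real book.

```python
from urllib.parse import urlparse


def split_url(url):
    """Return the access method (scheme) and the host that holds the resource."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc


def is_urn(identifier):
    """A URN names a resource without saying how to reach it."""
    return urlparse(identifier).scheme.lower() == "urn"


scheme, host = split_url("http://www.google.com")
print(scheme)  # http
print(host)    # www.google.com

# Placeholder URN, used here only to show the shape of a name.
print(is_urn("urn:example:a-book"))
print(is_urn("http://www.google.com"))
```

The URL yields both a method and a location, whereas the URN yields only a name that some other service would have to resolve.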


This addressing mechanism allowed the WWW to develop as a completely decentralised system. URIs are unique in that any new address space can be integrated into the URI address space if its addressing mechanism can be represented as a human readable string and its name can be mapped onto the start of the URI address. Next the web required an editor and a browser capable of creating and requesting web pages on the network. The first version of this application was written by Berners-Lee for the NeXTSTEP system. The first data to appear on the Web was sourced from a legacy contact information database at CERN. This led people to believe that the web was a glorified phone directory with a peculiar interface. This first web browser was soon supplemented with clones for other systems developed by the nascent community. ViolaWWW and Cello were two notable early browsers, for the X Window System and Windows respectively. However, the first browser to gain a respectable level of popularity was Mosaic, due to its ease of use and its ability to support inline images. These small steps seem trivial compared to the incredibly fast pace of change in today’s internet, but it is important to remember that at this stage only a fraction of today’s number of people were actively contributing to its development. As time progressed and the web grew, rules of etiquette and social protocol on the network began to form. The use of www in the first part of a URL is one of the early Web conventions from this period that has stuck around to this day. Public interest in the technology also began to pick up. The web no longer relied on computer access to grow but instead started to drive computer uptake in schools, businesses and even homes.
The web owes a great deal of its success to the adaptability of HTTP. It can be used to communicate essentially any data format, and has since grown from being a mechanism for communicating text and imagery to being used for such things as delivering application code written in Java and digitised representations of virtual worlds written in VRML.


Unfortunately as the web grew so did the variation in supporting clients. In today’s internet web developers often have to implement the same functionality many times over to accommodate the nuances of the wide range of browsers available. Perhaps the most infamous of these nuances is the ‘broken box model’. For reasons unbeknown to the rest of us, the developers of older versions of Internet Explorer chose to treat the stated width of a CSS-styled element as including its padding and border. The W3C convention, followed by every other browser, treats the stated width as the width of the content alone, with padding and border added on top. So, for example, an element with a width of 100px, a padding of 10px and a border of 2px has a total width of 124px under the W3C convention (100 + 2 × 10 + 2 × 2) but only 100px in the affected versions of Internet Explorer, leaving just 76px for the content. The end result is that if you want your page to display consistently in Internet Explorer and every other browser you often have to create two separate style sheets. This complicates maintenance and often leads to inconsistent rendering. JavaScript also has a very inconsistent implementation across browsers. The W3C strives to minimise this cross-client unpredictability as much as possible. Fortunately it is organised in an ideal fashion to achieve this. The W3C is made up of around 150 members. These members come from academia, private industry and governments from all around the globe. It is a neutral forum where members of competing companies and even countries can work together for the common interest of the community.
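
The arithmetic for the 100px, 10px and 2px example can be sketched as follows. The function names are hypothetical; the two modes borrow the names ‘content-box’ and ‘border-box’ from the CSS box-sizing property, where the first treats the stated width as content only and the second treats it as already including padding and border. Padding and border are assumed to apply equally on the left and right.

```python
def rendered_width(width, padding, border, box_sizing="content-box"):
    """Total horizontal space the element occupies, in pixels."""
    if box_sizing == "content-box":
        # Stated width covers the content only; padding and border
        # are added on both the left and the right.
        return width + 2 * padding + 2 * border
    if box_sizing == "border-box":
        # Stated width already includes padding and border.
        return width
    raise ValueError("unknown box_sizing: " + box_sizing)


def content_width(width, padding, border, box_sizing="content-box"):
    """Horizontal space left over for the element's content, in pixels."""
    if box_sizing == "content-box":
        return width
    if box_sizing == "border-box":
        return width - 2 * padding - 2 * border
    raise ValueError("unknown box_sizing: " + box_sizing)


for mode in ("content-box", "border-box"):
    print(mode,
          rendered_width(100, 10, 2, mode),
          content_width(100, 10, 2, mode))
# content-box 124 100
# border-box 100 76
```

The 24px gap between the two totals is what forces layouts to be adjusted separately for each interpretation.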


As the net grew the next logical step was to begin facilitating financial transactions. Research into this area began in 1992 with the publication of ‘Future Shop: How New Technologies Will Change the Way We Shop and What We Buy’ by Jim Snider and Terra Ziporyn. This book predicted the decline of bricks and mortar commerce in the face of cheaper and easier online shopping options. Ecommerce got its first real world practical application in 1994 with the launch of Pizza Hut’s online ordering system. Food seems to be a bizarre first implementation of this new financial technology but it worked; Pizza Hut are still offering food online to this day. It was at this time that adult material became available for purchase online. The massive surge in online transactions that came about as a result exposed security flaws in the system that were addressed later in 1994 by the introduction of Secure Sockets Layer (SSL). SSL is designed to let hosts on the internet communicate while protecting against eavesdropping, tampering and message forgery.


In 1996 the internet took another big step forward with the beginnings of the Google search engine, which started that year as a university research project. Since the web’s inception the amount of content on the net has grown exponentially. Very quickly this content grew large enough that new ways of cataloguing and categorising it became necessary. The most popular method to achieve this was the search engine, but most of the early search engines didn’t scale well when faced with the ever larger body of content available. A functional method of calculating the value of a page’s content for any given search term was hard to come by. Google however was different. It used a democratic mechanism for rating content which it called PageRank. The PageRank system used the web of hyperlinks that make up the web as a kind of voting system. In this system each page has a value. The page votes a share of its value to every page it links to and is in turn voted for by every page that links to it. When a user provides a search term Google searches its database of known web content and finds every page that has content matching that search term. The content is then ordered based on the value calculation described above and presented as a list of options to the user. Over the years this algorithm has been constantly supplemented to stay ahead of website developers who dedicate a large amount of their development time to figuring out how to get their content to the top of this list regardless of its actual value to the end user. Today the internet is reportedly so large that Google can only recalculate every page’s value four times a year. Also, given the cyclical nature of this link based voting system, when the PageRank for the whole internet is recalculated the algorithm has to be run several times over in order for the results to stabilise. This is because the act of recalculating a given page’s PageRank will change the values of the votes that page casts to other pages. We don’t know how many times over the algorithm is run.
We don’t even know exactly how it currently works. This is because Google has to be very cryptic about its exact nature to prevent abuse.
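
The voting scheme and its repeated runs can be sketched as a simple iteration. This is a simplified illustration and not Google’s actual implementation, which is not public. The damping factor of 0.85, the convergence tolerance, the handling of pages with no outgoing links and the four-page example graph are all assumptions made for the sketch.

```python
def pagerank(links, damping=0.85, tolerance=1e-6, max_rounds=200):
    """Iterate the link-voting scheme until the values stop changing.

    links maps each page to the list of pages it links to. Every linked
    page must also appear as a key. Returns (values, rounds_used).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal values

    for rounds_used in range(1, max_rounds + 1):
        # Every page keeps a small baseline value (assumed convention).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no links spreads its vote over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share

        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tolerance:
            break  # results have stabilised

    return rank, rounds_used


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
values, rounds = pagerank(example_links)
for page in sorted(values, key=values.get, reverse=True):
    print(page, round(values[page], 3))
print("rounds needed:", rounds)
```

Because each round changes the votes cast in the next, a single pass is not enough; the loop repeats until the total change between rounds falls below the tolerance.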


Unfortunately, Google, like every other search engine, suffers from an as yet insurmountable flaw. When a user provides a search term they are presented with an enormous list of results, many of which have nothing to do with the subject being researched. This is because Google searches based on a combination of keyword density and PageRank. Unfortunately, just because a page has a high density of a given keyword does not mean that the keyword is the subject of the page. For example, if one were to make a page featuring the statement ‘This is not a page about France’ frequently then that page would likely rank highly for the search ‘information about France’. Google does not yet parse the meaning of the sentences within a page. Berners-Lee documented a potential solution to this in 1998 with the publication of ‘A high level plan of the architecture of the semantic WWW’. This document proposes a mechanism by which not just the content of the web can be mapped but also the meaning of this content. Under such a system the page featuring a high concentration of the phrase ‘This is not a page about France’ would not feature as a result for the search ‘information about France’. The advantage of such a system is that it would only return provably correct answers. The disadvantage is that the processing power required to ascertain the meaning of all the content of all the pages on the Web is immeasurably huge and growing daily. Once the semantic web is implemented the potential of the WWW will grow enormously. Currently a computer can receive information from a web server but it can assess its meaning and act on it to achieve user goals only to a limited degree. If the web server also delivered the meaning of the information in a semantic fashion, a computer could understand what message the information is trying to convey and could therefore perform tasks previously only achievable by truly intelligent agents. Currently the only truly intelligent agents are people.
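
The France example can be made concrete with a naive keyword density measure. This is a deliberately crude sketch and is not how any real search engine scores pages; the function name and both sample pages are invented for illustration.

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in text that are exactly the keyword."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# A page that repeats a statement denying it is about France.
decoy_page = "This is not a page about France. " * 5

# A short page that really is about France (invented sample text).
real_page = ("France is a country in western Europe. "
             "Its capital, Paris, is on the Seine.")

decoy_score = keyword_density(decoy_page, "France")
real_score = keyword_density(real_page, "France")
print(decoy_score > real_score)  # True
```

A measure that only counts words scores the decoy above the genuine page, because nothing in the count reflects the word ‘not’.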


The business world is now so dependent on internet communication that the financial damage caused by even a relatively short network outage is unacceptable. The network on which the WWW operates was first devised under funding from the US military, who hoped to achieve a fault tolerant communication network capable of surviving a nuclear war. The reliability achievable by internet servers has grown continuously since the dawn of wide area networking. However, it has not yet become possible to achieve 100% reliability for any one network node. Some basic research into the hosting packages available will show that few providers will guarantee more than 99.9% uptime. The most effective method devised so far to bring a web service close to 100% reliability is the use of content delivery networks (CDN). In a CDN content is mirrored across multiple servers on multiple backbones. When a user requests content from the CDN it is delivered from the nearest available server. When an administrator updates the content on any one CDN server the change is propagated amongst the others.
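
A rough sketch of why mirroring helps is given below. It assumes that servers fail independently of one another, which is a simplification, and the server records used for the nearest-available lookup are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability):
    """Expected yearly downtime for a given uptime fraction (0.999 = 99.9%)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def mirrored_availability(availability, mirrors):
    """Chance that at least one of several independent mirrors is up."""
    return 1.0 - (1.0 - availability) ** mirrors


def nearest_available(servers):
    """Pick the closest server that is currently up."""
    candidates = [s for s in servers if s["up"]]
    if not candidates:
        raise RuntimeError("no mirror is available")
    return min(candidates, key=lambda s: s["distance_km"])


print(round(downtime_hours_per_year(0.999), 2))  # 8.76
print(mirrored_availability(0.999, 3))

# Hypothetical mirrors: the closest one is down, so the next is chosen.
mirrors = [
    {"name": "mirror-1", "distance_km": 40, "up": False},
    {"name": "mirror-2", "distance_km": 300, "up": True},
    {"name": "mirror-3", "distance_km": 5200, "up": True},
]
print(nearest_available(mirrors)["name"])  # mirror-2
```

Under these assumptions a single 99.9% node is down for nearly nine hours a year, while three mirrors are all down together only a tiny fraction of that time. The combined figure approaches 100% but never reaches it.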

