Thursday, February 22, 2007

Origin of Unicode

Unicode has the explicit aim of transcending the limitations of traditional character encodings, such as those defined by the ISO 8859 standard which find wide usage in various countries of the world but remain largely incompatible with each other. Many traditional character encodings share a common problem in that they allow bilingual computer processing (usually using Roman characters and the local language) but not multilingual computer processing (computer processing of arbitrary languages mixed with each other).
Unicode, in intent, encodes the underlying characters — graphemes and grapheme-like units — rather than the variant glyphs (renderings) for such characters. In the case of Chinese characters, this sometimes leads to controversies over distinguishing the underlying character from its variant glyphs.

In text processing, Unicode takes the role of providing a unique code point — a number, not a glyph — for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape, font or style) to other software, such as a web browser or word processor. This simple aim becomes complicated, however, by concessions made by Unicode's designers in the hope of encouraging a more rapid adoption of Unicode.
The first 256 code points were made identical to the content of ISO 8859-1 so as to make it trivial to convert existing western text. A lot of essentially identical characters were encoded multiple times at different code points to preserve distinctions used by legacy encodings and therefore allow conversion from those encodings to Unicode (and back) without losing any information. For example, the "fullwidth forms" section of code points encompasses a full Latin alphabet that is separate from the main Latin alphabet section. In Chinese, Japanese and Korean (CJK) fonts, these characters are rendered at the same width as CJK ideographs rather than at half the width. For other examples, see Duplicate characters in Unicode.
Also, while Unicode allows for combining characters it also contains precomposed versions of most letter/diacritic combinations in normal use. These make conversion to and from legacy encodings simpler and allow applications to use Unicode as an internal text format without having to implement combining characters. For example é can be represented in Unicode as U+0065 (Latin small letter e) followed by U+0301 (combining acute) but it can also be represented as the precomposed character U+00E9 (Latin small letter e with acute).
The Unicode standard also includes a number of related items, such as character properties, text normalisation forms and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic or Hebrew, and left-to-right scripts).

Unicode

Unicode is an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, Unicode consists of a character repertoire, an encoding methodology and set of standard character encodings, a set of code charts for visual reference, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and rules for normalization, decomposition, collation and rendering.

The Unicode Consortium, the non-profit organization that coordinates Unicode's development, has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes, as many of the existing schemes are limited in size and scope and are incompatible with multilingual environments. Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including XML, the Java programming language and modern operating systems.

ICANN

The Internet Corporation for Assigned Names and Numbers (ICANN) is the authority that coordinates the assignment of unique identifiers on the Internet, including domain names, Internet protocol addresses, and protocol port and parameter numbers. A globally unified namespace (i.e., a system of names in which there is one and only one holder of each name) is essential for the Internet to function. ICANN is headquartered in Marina del Rey, California, but is overseen by an international board of directors drawn from across the Internet technical, business, academic, and non-commercial communities.

The US government continues to have the primary role in approving changes to the root zone file that lies at the heart of the domain name system. Because the Internet is a distributed network comprising many voluntarily interconnected networks, the Internet, as such, has no governing body. ICANN's role in coordinating the assignment of unique identifiers distinguishes it as perhaps the only central coordinating body on the global Internet, but the scope of its authority extends only to the Internet's systems of domain names, Internet protocol addresses, and protocol port and parameter numbers.

On Nov. 16, 2005, the World Summit on the Information Society, held in Tunis, established the Internet Governance Forum (IGF) to discuss Internet-related issues.

Internet Structure

There have been many analyses of the Internet and its structure. For example, it has been determined that the Internet IP routing structure and hypertext links of the World Wide Web are examples of scale-free networks.

Similar to the way the commercial Internet providers connect via Internet exchange points, research networks tend to interconnect into large subnetworks such as:
GEANT GLORIAD Abilene Network JANET (the UK's Joint Academic Network aka UKERNA) These in turn are built around relatively smaller networks. See also the list of academic computer network organizations

In network schematic diagrams, the Internet is often represented by a cloud symbol, into and out of which network communications can pass.

Creation of the Internet

The USSR's launch of Sputnik spurred the United States to create the Advanced Research Projects Agency (ARPA, later known as the Defense Advanced Research Projects Agency, or DARPA) in February 1958 to regain a technological lead. ARPA created the Information Processing Technology Office (IPTO) to further the research of the Semi Automatic Ground Environment (SAGE) program, which had networked country-wide radar systems together for the first time. J. C. R. Licklider was selected to head the IPTO, and saw universal networking as a potential unifying human revolution.

In 1950, Licklider moved from the Psycho-Acoustic Laboratory at Harvard University to MIT where he served on a committee that established MIT Lincoln Laboratory. He worked on the SAGE project. In 1957 he became a Vice President at BBN, where he bought the first production PDP-1 computer and conducted the first public demonstration of time-sharing.

Licklider recruited Lawrence Roberts to head a project to implement a network, and Roberts based the technology on the work of Paul Baran who had written an exhaustive study for the U.S. Air Force that recommended packet switching (as opposed to Circuit switching) to make a network highly robust and survivable. After much work, the first node went live at UCLA on October 29, 1969 on what would be called the ARPANET, one of the "eve" networks of today's Internet. Following on from this, the British Post Office, Western Union International and Tymnet collaborated to create the first international packet switched network, referred to as the International Packet Switched Service (IPSS), in 1978. This network grew from Europe and the US to cover Canada, Hong Kong and Australia by 1981.

The first TCP/IP wide area network was operational by 1 January 1983, when the United States' National Science Foundation (NSF) constructed a university network backbone that would later become the NSFNet. (This date is held by some to be technically that of the birth of the Internet.) It was then followed by the opening of the network to commercial interests in 1985. Important, separate networks that offered gateways into, then later merged with, the NSFNet include Usenet, BITNET and the various commercial and educational X.25 Compuserve and JANET. Telenet (later called Sprintnet), was a large privately-funded national computer network with free dialup access in cities throughout the U.S. that had been in operation since the 1970s. This network eventually merged with the others in the 1990s as the TCP/IP protocol became increasingly popular. The ability of TCP/IP to work over these pre-existing communication networks, especially the international X.25 IPSS network, allowed for a great ease of growth. Use of the term "Internet" to describe a single global TCP/IP network originated around this time.

The network gained a public face in the 1990s. On August 6, 1991 CERN, which straddles the border between France and Switzerland publicized the new World Wide Web project, two years after Tim Berners-Lee had begun creating HTML, HTTP and the first few Web pages at CERN.
An early popular Web browser was ViolaWWW based upon HyperCard. It was eventually replaced in popularity by the Mosaic Web Browser. In 1993 the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign released version 1.0 of Mosaic and by late 1994 there was growing public interest in the previously academic/technical Internet. By 1996 the word "Internet" was coming into common daily usage, frequently misused to refer to the World Wide Web.

Meanwhile, over the course of the decade, the Internet successfully accommodated the majority of previously existing public computer networks (although some networks such as FidoNet have remained separate). This growth is often attributed to the lack of central administration, which allows organic growth of the network, as well as the non-proprietary open nature of the Internet protocols, which encourages vendor interoperability and prevents any one company from exerting too much control over the network.

Internet and WWW

The Internet and the World Wide Web are not synonymous: the Internet is a collection of interconnected computer networks, linked by copper wires, fiber-optic cables, wireless connections, etc.; the Web is a collection of interconnected documents and other resources, linked by hyperlinks and URLs. The World Wide Web is accessible via the Internet, as are many other services including e-mail, file sharing, and others described below.

The best way to define and distinguish between these terms is with reference to the Internet protocol suite. This collection of standards and protocols is organized into layers such that each layer provides the foundation and the services required by the layer above. In this conception, the term Internet refers to computers and networks that communicate using IP (Internet protocol) and TCP (transfer control protocol). Once this networking structure is established, then other protocols can run “on top.” These other protocols are sometimes called services or applications. Hypertext transfer protocol, or HTTP, is the application layer protocol that links and provides access to the files, documents and other resources of the World Wide Web.