Impact of the World Wide Web
o Transformation of lives in industrialized and unindustrialized countries
o Daily use for communication, shopping, and information gathering
o Role in social and political demonstrations and revolutions
Downsides of the Web
o Easier access to harmful content (e.g., pornography and gambling)
o Ease of spreading destructive ideas
Upsides of the Web
o Communication with friends, relatives, and business associates
o Online shopping for a wide variety of products
o Access to limitless information on various topics
1. A Brief Introduction to the Internet
1.1 Origins
The U.S. Department of Defense (DoD) developed a large-scale computer network in the
1960s for communications, program sharing, and remote computer access for defense-
related research.
The network was required to be robust enough to continue functioning even if some
network nodes were lost due to sabotage, war, or other causes.
ARPA (later renamed DARPA) funded the construction of the first network, ARPAnet,
connecting about a dozen research laboratories and universities, with the first node
established at UCLA in 1969.
ARPAnet was primarily used for text-based communications through electronic mail and was
only available to ARPA-funded research institutions.
Other networks, such as BITNET and CSNET, were developed in the late 1970s and early
1980s due to limited ARPAnet access.
NSFnet, sponsored by the National Science Foundation (NSF), was created in 1986 and
eventually replaced ARPAnet for most nonmilitary uses. By 1992, NSFnet connected over 1
million computers worldwide.
In 1995, a small part of NSFnet returned to being a research network, with the rest becoming
known as the Internet.
1.2 What Is the Internet?
The Internet is a vast collection of interconnected computers and devices of various sizes,
configurations, and manufacturers.
The Transmission Control Protocol/Internet Protocol (TCP/IP) is the single, low-level protocol
that allows diverse devices to communicate with each other. TCP/IP became the standard for
computer network connections in 1982.
The Internet is a network of networks, with individual computers within an organization
connected to each other in a local network, which is then connected to the Internet.
All devices connected to the Internet must be uniquely identifiable.
TCP/IP is not the only communication protocol used by the Internet; User Datagram
Protocol/Internet Protocol (UDP/IP) is an alternative used in some situations.
1.3 Internet Protocol Addresses
Internet nodes are identified by names (for people) and by numeric addresses (for computers)
Internet Protocol (IP) address is a unique 32-bit number for machines connected to the
Internet
Written as four 8-bit numbers separated by periods
Organizations assigned blocks of IP addresses for their machines that need Internet access
IPv6 standard approved in late 1998, expanding address size from 32 bits to 128 bits
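A minimal sketch of the dotted notation described above: the Python below packs four 8-bit numbers into one 32-bit integer and unpacks it again. The function names and the sample address 192.0.2.1 are illustrative assumptions, not part of the original notes.

```python
def dotted_to_int(address: str) -> int:
    """Pack a dotted IPv4 address (four 8-bit numbers) into one 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        # Shift the accumulated bits left by 8 and append the next 8-bit part.
        value = (value << 8) | part
    return value


def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer into four 8-bit numbers separated by periods."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    number = dotted_to_int("192.0.2.1")  # sample address chosen for illustration
    print(number, int_to_dotted(number))
```

Python's standard ipaddress module offers the same conversion; the manual version is shown only to make the 4 x 8-bit layout visible.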
1.4 Domain Names
Machines on the Internet have textual names due to people's difficulty dealing with numbers
Domain names start with the host machine name, followed by progressively larger enclosing
collections of machines called domains
Last domain name identifies the type of organization in which the host resides
Fully qualified domain name (FQDN) is the combination of the hostname and all domain
names
FQDN must be converted to an IP address before message transmission over the Internet
Domain Name System (DNS) and name servers handle FQDN to IP address conversion
FQDNs and IP addresses must be unique
Figure 1: Domain name conversion process
telnet can be used to determine the IP address of a Web site
Variety of protocols running on top of TCP/IP developed by the mid-1980s to support
different Internet uses
World Wide Web emerged as a better approach to access the Internet's advantages
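The structure of a fully qualified domain name described in this section can be sketched in Python as follows. The name www.cs.example.edu and the function names are hypothetical, chosen only for illustration. The resolve function simply hands the name to the operating system's resolver, which in turn consults DNS name servers; it needs network access and is not called automatically here.

```python
import socket


def split_fqdn(fqdn: str) -> dict:
    """Split an FQDN into its host name and the enclosing domains."""
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 2 or any(not label for label in labels):
        raise ValueError(f"not a fully qualified domain name: {fqdn!r}")
    return {
        "host": labels[0],        # first name: the host machine
        "domains": labels[1:],    # progressively larger enclosing domains
        "top_level": labels[-1],  # last name: the largest domain
    }


def resolve(fqdn: str) -> str:
    """Ask the system resolver for an IPv4 address of the given name."""
    return socket.gethostbyname(fqdn)


if __name__ == "__main__":
    print(split_fqdn("www.cs.example.edu"))  # hypothetical name
```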
2. The World Wide Web
2.1 Origins
In 1989, Tim Berners-Lee and a small group at CERN proposed a new protocol and document
access system for the Internet called the World Wide Web
The Web aimed to allow scientists worldwide to exchange documents describing their work
The system enabled users to search for and retrieve documents from databases on any
document-serving computer connected to the Internet
The Web used hypertext, which is text with embedded links to other documents, allowing
nonsequential browsing
The first implementation of the Web was on a NeXT computer at CERN in late 1990, and it
was released to the world in 1991
2.2 Web or Internet?
The Internet and the Web are not the same thing
The Internet is a collection of computers and devices connected by communication
equipment
The Web is a collection of software and protocols installed on most, if not all, computers on
the Internet
Web servers provide documents, while Web clients (browsers) request and display
documents to users
The Internet was useful before the Web, and it remains useful without it, but most users now
access the Internet through the Web
3. Web Browsers
Web operates in a client-server configuration, with browsers acting as clients and servers
providing documents
Early browsers were text-based, limiting the growth of the Web
Mosaic, released in early 1993, was the first browser with a graphical user interface,
developed at NCSA, University of Illinois
Mosaic's interface provided convenient access to the Web for non-scientist and non-
developer users
Versions of Mosaic for Apple Macintosh and Microsoft Windows systems were released in
late 1993, leading to explosive growth in Web usage
Browsers initiate communication with servers, which respond to requests
Servers may provide static documents or request user input through the browser
The Hypertext Transfer Protocol (HTTP) is the most common protocol used by the Web for
communication between browsers and servers
The most commonly used browsers are Microsoft Internet Explorer (IE), Firefox, and Chrome;
this text focuses on these three
4. Web Servers
4.1 Web Server Operation
Web servers provide documents to requesting browsers
Servers act when requests are made by browsers running on other computers
Most commonly used web servers are Apache and Microsoft's Internet Information Server
(IIS)
Web browsers initiate network communications with servers by sending URLs
URLs can specify a data file or a program stored on the server
All communications between a web client and a web server use the standard web protocol,
Hypertext Transfer Protocol (HTTP)
Web servers monitor a communications port, accept HTTP commands, and perform specified
operations
4.2 General Server Characteristics
File structure of a web server has two separate directories: document root and server root
Document root stores web documents to which the server has direct access and normally
serves to clients
Server root stores the server and its support software
Files stored directly in the document root are available to clients through top-level URLs
Virtual document trees allow part of the servable document collection to be stored outside
the directory at the document root
Contemporary servers provide a wide variety of client services, including support for virtual
hosts and proxy servers
Many servers can interact with database systems through server-side scripts
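How a server might map a URL path onto a file under its document root can be sketched as below. This is only an illustration under stated assumptions: the document root /var/www/html and the home page name index.html are hypothetical values, and real servers such as Apache and IIS take them from their configuration.

```python
import posixpath

DOCUMENT_ROOT = "/var/www/html"  # hypothetical document root
HOME_PAGE = "index.html"         # hypothetical home page file name


def url_path_to_file(url_path: str, document_root: str = DOCUMENT_ROOT) -> str:
    """Map the path part of a URL to a file path below the document root."""
    # Normalizing relative to "/" collapses "." and ".." segments so that a
    # request cannot climb out of the document root.
    cleaned = posixpath.normpath("/" + url_path.lstrip("/"))
    file_path = posixpath.join(document_root, cleaned.lstrip("/"))
    # A path ending in a slash names a directory; serve its home page.
    if url_path.endswith("/"):
        file_path = posixpath.join(file_path, HOME_PAGE)
    return file_path


if __name__ == "__main__":
    for path in ("/petunias.html", "/bulbs/", "/../../etc/passwd"):
        print(path, "->", url_path_to_file(path))
```

A file stored directly in the document root is therefore reachable through a top-level URL path such as /petunias.html.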
4.3 Apache
Began as the NCSA server, httpd, with added features
Most widely used web server due to its speed, reliability, and open-source nature
Offers a long list of services beyond serving documents to clients
Configuration information is read from a file when Apache begins execution
Three configuration files: httpd.conf, srm.conf, and access.conf
httpd.conf stores the directives that control Apache server behavior
4.4 IIS
Most popular server on Windows platforms
Supplied as part of Windows and considered a reasonably good server
Apache and IIS provide similar services
IIS is controlled by a window-based management program, IIS snap-in
IIS snap-in controls both IIS and FTP, allowing site managers to set server parameters
Accessed through Control Panel, Administrative Tools, and IIS Manager on Windows XP and
Vista
5. Uniform Resource Locators (URLs)
5.1 URL Formats
All URLs have the same general format: scheme:object-address
Common schemes include http, ftp, gopher, telnet, file, mailto, and news
HTTP protocol is used to request and send HTML documents
For HTTP, the part of the URL after the scheme has the form:
//fully-qualified-domain-name/path-to-document
File protocol is used for documents residing on the machine running the browser:
file://path-to-document
Host name is the name of the server computer that stores the document or provides access
to it
Default port number of Web server processes is 80; if a server uses a different port number,
it must be attached to the host name in the URL
URLs cannot contain embedded spaces or certain special characters; such a character must be
coded as a percent sign (%) followed by its two-digit hexadecimal ASCII code
E.g., if the name ‘RV CE’ has to be specified in a URL, it is written as ‘RV%20CE’ (20 is the
hexadecimal ASCII code for a space)
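The URL parts and the percent coding described in this section can be tried out with Python's standard urllib.parse module. This is a small sketch; the URLs and the describe_url helper are hypothetical examples, not taken from the notes.

```python
from urllib.parse import quote, unquote, urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into scheme, host, port, and decoded path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # No port in the URL: assume the default Web server port, 80.
        "port": parts.port or 80,
        # Turn %XX sequences back into the characters they stand for.
        "path": unquote(parts.path),
    }


if __name__ == "__main__":
    print(quote("RV CE"))  # a space is coded as %20
    print(describe_url("http://www.example.com:8080/docs/RV%20CE/index.html"))
```

Here quote("RV CE") yields RV%20CE, matching the example above.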
5.2 URL Paths
Path to the document for the HTTP protocol is similar to a path to a file or directory in an
operating system's file system
Path is given by a sequence of directory names and a file name, separated by forward slashes in
the URL, regardless of the server's own file system separator (forward slashes on UNIX servers,
backward slashes on Windows servers)
Path in a URL can be complete (includes all directories along the way) or partial (relative to
some base path specified in the server's configuration files)
If the specified document is a directory, its name is followed immediately by a slash
If a directory does not have a file that the server recognizes as a home page, a directory
listing is constructed and returned to the browser
6. Multipurpose Internet Mail Extensions (MIME)
6.1 Type Specifications
MIME was developed to specify the format of documents sent via Internet mail
Adopted by the Web to specify document types transmitted over the Web
MIME format specification attached to the beginning of a document by the Web server
MIME specifications have the form: type/subtype
Common MIME types: text, image, video
Common text subtypes: plain, html
Common image subtypes: gif, jpeg
Common video subtypes: mpeg, quicktime
Servers determine the type of a document by using the file name's extension as the key into
a table of types
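The table lookup described above can be sketched as a Python dictionary keyed by file name extension. The table contents and the fallback type application/octet-stream are assumptions made for illustration; a real server reads its table from configuration.

```python
import os

# Hypothetical extension-to-MIME table using the types named in this section.
MIME_TYPES = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".mpeg": "video/mpeg",
    ".mov": "video/quicktime",
}
DEFAULT_TYPE = "application/octet-stream"  # assumed fallback for unknown extensions


def mime_type_for(file_name: str) -> str:
    """Return the type/subtype specification for a file, keyed by its extension."""
    _, extension = os.path.splitext(file_name)
    return MIME_TYPES.get(extension.lower(), DEFAULT_TYPE)


if __name__ == "__main__":
    for name in ("index.html", "logo.GIF", "notes.xyz"):
        print(name, "->", mime_type_for(name))
```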
6.2 Experimental Document Types
Experimental subtypes begin with x-, e.g., video/x-msvideo
Web providers can add an experimental subtype to the list of MIME specifications stored in
their server
Browser must supply a program (helper application or plug-in) to display the contents of
experimental document types
Browsers have a set of MIME specifications they can handle, and an error message is
displayed if they cannot render a document
Browsers can indicate to the server their preferred document types to receive (discussed in
Section 7)
7. The Hypertext Transfer Protocol (HTTP)
All Web communications use HTTP
Current version of HTTP is 1.1 (RFC 2616)
HTTP consists of two phases: request and response
Each HTTP communication has two parts: header and body
7.1 The Request Phase
General form: request line (HTTP method, path part of the URL, HTTP version), header fields,
blank line, message body
Commonly used HTTP request methods: GET, HEAD, POST, PUT, DELETE
GET: returns the contents of the specified document
HEAD: returns the header information for the specified document
POST: executes the specified document using the enclosed data
PUT: replaces the specified document with the enclosed data
DELETE: deletes the specified document
Most common request methods: GET and POST
Header fields provide additional information, such as Accept (preferred MIME type) and Host
(host name)
If-Modified-Since: date request field specifies that the requested file should be sent only if it
has been modified since the given date
Content-length field gives the length of the message body in bytes
Header of a request must be followed by a blank line
Browser not necessary to communicate with a Web server; telnet can be used instead
7.2 The Response Phase
General form of HTTP response consists of status line, response header fields, blank line, and
response body
Status line includes HTTP version, three-digit status code, and short textual explanation of
status code
Status codes categorized into five groups:
o Informational (1xx)
o Success (2xx)
o Redirection (3xx)
o Client error (4xx)
o Server error (5xx)
Common status codes:
o 200 OK: request handled without error
o 404 Not Found: requested file not found
o 500 Internal Server Error: server encountered a problem
Response header contains several lines of information about the response, with Content-
type being the essential field
Response header followed by a blank line, then the response body (e.g., HTML file)
HTTP 1.1 default operation keeps the connection open for a time, allowing multiple requests
without reestablishing the connection, increasing web efficiency
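A response with the form described in this section can be taken apart with a few lines of Python. This is a minimal sketch: the parse_response and status_class names and the sample response are illustrative assumptions, and a production parser would handle many more cases.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Name the group a three-digit status code belongs to, by its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


def parse_response(raw: bytes) -> dict:
    """Split a raw HTTP response into status line, header fields, and body."""
    # The first blank line separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)  # version, status code, textual explanation
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": parts[0],
        "code": int(parts[1]),
        "reason": parts[2] if len(parts) > 2 else "",
        "headers": headers,
        "body": body,
    }


if __name__ == "__main__":
    sample = (b"HTTP/1.1 200 OK\r\n"
              b"Content-type: text/html\r\n"
              b"\r\n"
              b"<html></html>")
    response = parse_response(sample)
    print(response["code"], status_class(response["code"]), response["headers"])
```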
8. Security
Internet and Web are prone to security problems
Web server side: anyone can request software execution or access data on the server
Browser end: any server can download software to be executed on the browser host
machine
Security issues for transactions (e.g., a credit card purchase):
o Privacy: it should not be possible to steal the data in transit
o Integrity: it should not be possible to modify the data in transit
o Authentication: it should be possible for both ends to be certain of each other's
identity
o Nonrepudiation: it should be possible to legally prove that the message was actually
sent and received
Encryption is the basic tool to support privacy and integrity
Public-key encryption uses a public key and a private key to encrypt and decrypt messages
RSA is the most widely used public-key algorithm
Intentional and malicious destruction of data on Internet-connected computers is another
security problem
Denial-of-service (DoS) attacks, viruses, and worms cause billions of dollars in damage
DoS attacks flood a Web server with requests, overwhelming its ability to operate effectively
Viruses replicate and overwrite memory, destroying programs and data
Worms spread on their own and damage memory
Protection against viruses and worms is provided by antivirus software, which must be
updated frequently
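Returning to public-key encryption: the idea of a public key for encrypting and a private key for decrypting can be shown with a toy RSA calculation. This is only a classroom sketch under loud assumptions: the primes 61 and 53 and the exponent 17 are tiny illustrative values, there is no padding, and nothing here is secure. Real systems use vetted cryptographic libraries and keys of thousands of bits.

```python
# Toy RSA with deliberately tiny numbers; for illustration only, NOT secure.
P, Q = 61, 53              # two small primes (kept secret in real use)
N = P * Q                  # modulus, part of both keys
PHI = (P - 1) * (Q - 1)    # used only to derive the private exponent
E = 17                     # public exponent, chosen coprime to PHI
D = pow(E, -1, PHI)        # private exponent: modular inverse (Python 3.8+)


def encrypt(message: int, public_key: tuple = (E, N)) -> int:
    """Encrypt an integer smaller than N with the public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(cipher: int, private_key: tuple = (D, N)) -> int:
    """Recover the integer with the matching private key."""
    exponent, modulus = private_key
    return pow(cipher, exponent, modulus)


if __name__ == "__main__":
    secret = 65
    cipher = encrypt(secret)
    print(cipher, decrypt(cipher))
```

Anyone may know the pair (E, N) and encrypt with it, but only the holder of D can decrypt.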
9. The Web Programmer's Toolbox
HTML: A markup language used to describe the form and layout of documents for display in
a browser
XML: A meta-markup language used to define custom markup languages
JavaScript: A client-side scripting language used for creating dynamic web content
PHP: A server-side scripting language used for creating dynamic web content
Ruby: A server-side scripting language used for creating dynamic web content
JSF (JavaServer Faces): A Java-based framework for building web applications
ASP.NET: A Microsoft framework for building web applications
Rails (Ruby on Rails): A Ruby-based framework for building web applications
Flash: A technology for creating and displaying graphics and animation in HTML documents
Ajax (Asynchronous JavaScript and XML): A web technology used for creating dynamic and
interactive web applications
9.1 Overview of HTML
HTML is not a programming language; it cannot describe computations
HTML documents consist of content and controls specified by tags
Tags delimit particular kinds of content and form elements
Some tags include attribute specifications for additional information
9.2 Tools for Creating HTML Documents
HTML documents can be created with a general-purpose text editor
HTML editors provide shortcuts for producing repetitious tags and may include spell-
checkers, syntax-checkers, and color-coding
WYSIWYG (What You See Is What You Get) HTML editors allow users to see the formatted
document while writing the HTML code
Examples of WYSIWYG HTML editors: Microsoft FrontPage and Adobe Dreamweaver
9.3 Plug-ins and Filters
Plug-ins: Programs that can be integrated with a word processor to add new capabilities,
such as creating HTML documents with WYSIWYG features
Filters: Converters that transform an existing document into HTML format; they are not part
of the editor or word processor that created the document
Neither plug-ins nor filters produce HTML documents with identical appearance to the
original word processor document
Using plug-ins or filters allows easy conversion of existing documents to HTML and the use of
familiar word processors for creating HTML documents
HTML output produced by converters often needs modification, leading to version problems
during maintenance
9.4 Overview of XML
XML (eXtensible Markup Language) is a simplified version of SGML (Standard Generalized
Markup Language) for creating custom markup languages
XML-based markup languages describe data and its meaning through individualized tags and
attributes, while HTML describes overall layout and presentation
XML allows application programs to process specific kinds of data based on tag meanings
and validate documents before processing
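How an application program can process data according to the meaning of its tags is sketched below with Python's standard xml.etree.ElementTree module. The books markup (tags books, book, title, price and the isbn attribute) is a hypothetical custom language invented for this example. Note that this parser checks only that the document is well formed; validating against a schema or DTD would need additional tooling.

```python
import xml.etree.ElementTree as ET

# A document in a hypothetical custom markup language describing books.
DOCUMENT = """<books>
  <book isbn="111">
    <title>First Title</title>
    <price>12.50</price>
  </book>
  <book isbn="222">
    <title>Second Title</title>
    <price>30.00</price>
  </book>
</books>"""


def list_books(xml_text: str) -> list:
    """Return (isbn, title, price) for every book element in the document."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    return [
        (book.get("isbn"), book.findtext("title"), float(book.findtext("price")))
        for book in root.iter("book")
    ]


def total_price(xml_text: str) -> float:
    """Use the meaning of the price tag to compute a sum over all books."""
    return sum(price for _, _, price in list_books(xml_text))


if __name__ == "__main__":
    print(list_books(DOCUMENT))
    print(total_price(DOCUMENT))
```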
9.5 Overview of JavaScript
JavaScript is a client-side scripting language used for validating form data, building Ajax-
enabled HTML documents, and creating dynamic HTML documents
JavaScript is dynamically typed, unlike statically typed languages such as C++ and Java
JavaScript code is embedded in HTML documents and interpreted by the browser on the
client-side
JavaScript defines an object hierarchy for accessing and modifying elements of an HTML
document, enabling dynamic document creation and manipulation
SUMMARY
Internet and Web Fundamentals
The Internet began as ARPAnet in the late 1960s, which was later replaced by NSFnet for
nonmilitary users
The Internet connects millions of computers worldwide through the TCP/IP protocol, making
them appear the same at the lowest level
Two kinds of addresses are used on the Internet: IP addresses (four-part numbers for
computers) and fully qualified domain names (words separated by periods for people)
Fully qualified domain names are translated to IP addresses by name servers running DNS
A number of information interchange protocols have been created, including telnet, ftp, and
mailto
Web Basics
The Web started in the late 1980s at CERN as a means for physicists to share results
efficiently with colleagues at other locations
The fundamental idea of the Web is to transfer hypertext documents among computers
using the HTTP protocol on the Internet
Browsers request HTML documents from Web servers and display them for users
URLs are used to address all documents on the Internet; a URL includes the specific protocol,
the fully qualified domain name, and the file path to the specific document on the server
Web servers find and send requested documents to browsers
The type of a document delivered by a Web server appears as a MIME specification in the
first line of the document
Web Programming Languages and Tools
Web programmers use several languages to create documents that servers can provide to
browsers
HTML is the standard markup language for describing how Web documents should be
presented by browsers
Tools like plug-ins and filters can be used without specific knowledge of HTML to create
HTML documents
XML is a meta-markup language that provides a standard way to define new markup
languages
JavaScript is a client-side scripting language that can be embedded in an HTML document to
describe simple computations and change elements dynamically
Flash is a framework for building animation into HTML documents
Ajax is an approach to building Web applications in which partial document requests are
handled asynchronously
PHP is a server-side scripting language used primarily for form processing and database
access from browsers
Servlets and JSP are server-side Java programs used for form processing, database access, or
building dynamic documents
ASP.NET is a Web development framework using any .NET programming language
Ruby, with the Rails framework, is used for building Web applications that access databases,
simplifying the development process