Module-1 Traditional HTML and XHTML
MODULE 1
Introduction to HTML and XHTML
HTML and XHTML are two of the most widely used markup languages for developing web
pages and applications. HTML is the standard markup language for creating web pages,
while XHTML is a stricter reformulation of HTML that follows the syntax rules of XML.
Both support a wide range of features, such as multimedia, styling, and scripting.
HTML and XHTML both have features to create rich and interactive web pages and
applications. Some of the most popular HTML and XHTML features include:
■ Support for multimedia: Both HTML and XHTML support various forms of
multimedia, such as images, video, and audio. HTML also supports animated
images and graphics.
■ Styling: Both HTML and XHTML offer a wide range of options for styling web
pages. CSS (Cascading Style Sheets) is the most commonly used style sheet
language, and it can be used to style both HTML and XHTML documents.
■ Scripting: HTML and XHTML both support various forms of scripting, such
as JavaScript. Scripting can be used to add interactivity to web pages and
applications.
■ Forms: Forms are one of the most important features of HTML and XHTML.
Forms allow users to input data, which can then be processed by a server-side
script.
■ Tables: Tables are another important feature of HTML and XHTML. Tables can
be used to display tabular data, such as product information or financial data.
■ Links: Links are one of the most basic features of HTML and XHTML. Links
allow users to navigate between web pages.
■ Metadata: Metadata is information about a web page or document. It can include
information such as the author, keywords, and description.
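The fragment below sketches three of the features listed above (a link, a table, and a form) as they might appear in a page body. The file name products.html, the URL /search, and the field name q are hypothetical placeholders chosen for illustration.

```html
<!-- A link to another page (products.html is a placeholder file name) -->
<p><a href="products.html">View all products</a></p>

<!-- A table of tabular data: one header row and two data rows -->
<table border="1">
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Notebook</td><td>40</td></tr>
  <tr><td>Pen</td><td>10</td></tr>
</table>

<!-- A form whose data is sent to a server-side script (/search is hypothetical) -->
<form action="/search" method="get">
  <label for="q">Search:</label>
  <input type="text" id="q" name="q">
  <input type="submit" value="Go">
</form>
```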
Dept. of CSE, DSCE Page 1
HTML and XHTML Specifications
The HTML and XHTML specifications are the standards that define the syntax and semantics of
the Hypertext Markup Language (HTML) and the Extensible Hypertext Markup Language
(XHTML), respectively. HTML 4.01 and XHTML 1.0 were published by the World Wide Web
Consortium (W3C). The current version of HTML is commonly called HTML5 and is maintained
as a living standard by the WHATWG; its XML syntax is sometimes informally called XHTML5.
This standard is the basis for all modern web browsers and defines how HTML documents are
structured and processed.
First Look at HTML and XHTML
An HTML tag is a piece of markup that indicates the beginning or end of an HTML element
in an HTML document. Tags act like keywords that tell the web browser how to format and
display the content.
In the case of HTML, markup instructions found within a Web page relay the structure of the
document to the browser software. For example, if you want to emphasize a portion of text,
you enclose it within the tags <em> and </em>, as shown here:
<em>This is important text!</em>
When a Web browser reads a document that has HTML markup in it, it determines
how to render it onscreen by considering the HTML elements embedded within the
document:
Hello HTML and XHTML World
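A minimal page along these lines is sketched below. The title reuses the caption above and the emphasized sentence comes from the text; the heading and the remaining wording are illustrative assumptions.

```html
<!DOCTYPE html>
<html>
<head>
<title>Hello HTML and XHTML World</title>
</head>
<body>
<!-- h1 marks the main heading; the wording is illustrative -->
<h1>Welcome to HTML</h1>
<!-- em marks the emphasized portion of the sentence -->
<p>Markup describes structure. <em>This is important text!</em></p>
</body>
</html>
```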
What is HTML?
■ HTML stands for Hyper Text Markup Language
■ HTML is the standard markup language for creating Web pages
■ HTML describes the structure of a Web page
■ HTML consists of a series of elements
■ HTML elements tell the browser how to display the content
■ HTML elements label pieces of content such as "this is a heading", "this is a
paragraph", "this is a link", etc.
Structure of HTML Document
A Simple HTML Document
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>
■ The <!DOCTYPE html> declaration defines that this document is an HTML5
document
■ The <html> element is the root element of an HTML page
■ The <head> element contains meta information about the HTML page
■ The <title> element specifies a title for the HTML page (which is shown in the
browser's title bar or in the page's tab)
■ The <body> element defines the document's body, and is a container for all the visible
contents, such as headings, paragraphs, images, hyperlinks, tables, lists, etc.
■ The <h1> element defines a large heading
■ The <p> element defines a paragraph
An HTML element is defined by a start tag, some content, and an end tag:
<tagname> Content goes here... </tagname>
The HTML element is everything from the start tag to the end tag:
<h1>My First Heading</h1>
<p>My first paragraph.</p>
HTML Syntax:
Elements and Attributes
HTML documents are composed of textual content and HTML elements.
The term HTML element is often used interchangeably with the term tag.
However, an HTML element consists of the element name within angle brackets (i.e., the
tag) and the content within the tag.
A tag consists of the element name within angle brackets.
The element name appears in both the beginning tag and the closing tag.
The closing tag contains a forward slash followed by the element name, all enclosed
within angle brackets.
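These terms can be seen in a one-line sketch. The paragraph text and the presence of a style attribute follow the description in this section; the attribute value color: red is an assumption made for illustration.

```html
<!-- start tag with a style attribute, then the content, then the end tag -->
<p style="color: red;">Hello I am a paragraph</p>
```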
In the above example, <p> is the tag and “Hello I am a paragraph” is the content. HTML
elements can also contain attributes.
An HTML attribute is a ‘name = value’ pair that provides more information about the
HTML element. In the above example, style is an attribute.
An element which does not contain any text or image content is called an empty element.
It is an instruction to the browser to do something.
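A few common empty elements are sketched below. The file name logo.gif and the sample text are hypothetical placeholders.

```html
<!-- br forces a line break and has no content of its own -->
<p>First line<br>Second line</p>
<!-- hr draws a horizontal rule -->
<hr>
<!-- img embeds an image; logo.gif is a placeholder file name -->
<img src="logo.gif" alt="Company logo">
```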
Character entities
These are special characters for symbols that either cannot easily be typed on a
keyboard (such as the copyright symbol or accented characters) or have a reserved
meaning in HTML (for instance the “<” or “>” symbols).
There are many HTML character entities. They can be used in an HTML document by
using the entity name or the entity number.
The Most Common Character Entities:
Symbol   Description      Entity Name   Number Code
"        quotation mark   &quot;        &#34;
'        apostrophe       &apos;        &#39;
&        ampersand        &amp;         &#38;
<        less-than        &lt;          &#60;
>        greater-than     &gt;          &#62;
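As a brief illustration, the paragraph below uses entity names and one entity number so that reserved characters appear as text instead of being parsed as markup. The sentence itself is made up for this example.

```html
<!-- Displays: Use <p> & </p> to mark a paragraph, "quoted" text, and © -->
<p>Use &lt;p&gt; &amp; &lt;/p&gt; to mark a paragraph, &quot;quoted&quot; text, and &#169;</p>
```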
Lists
HTML provides simple and effective ways to specify lists in documents.
There are three types of lists:
■ Unordered lists. Collections of items in no particular order; these are by default rendered by
the browser as a bulleted list. However, it is common to use CSS to style unordered lists without
the bullets. Unordered lists have become the conventional way to mark up navigational menus.
■ Ordered lists. Collections of items that have a set order; these are by default rendered by the
browser as a numbered list.
■ Definition lists. Collections of name and definition pairs. These tend to be used infrequently.
Perhaps the most common example would be a FAQ list.
<ol>
<li> Paragraphs </li>
<li> Anchor Tag </li>
<li> Images </li>
</ol>
<ul>
<li> Paragraphs </li>
<li> Anchor Tag </li>
<li> Images </li>
</ul>
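Definition lists have no example above, so here is a minimal sketch using the dl, dt, and dd elements. The two terms and their definitions are illustrative.

```html
<dl>
  <!-- dt holds the name (term), dd holds its definition -->
  <dt>HTML</dt>
  <dd>The standard markup language for creating Web pages.</dd>
  <dt>XHTML</dt>
  <dd>HTML defined as an XML application.</dd>
</dl>
```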
HTML and XHTML: Version History
What is XHTML?
XHTML stands for Extensible Hyper Text Markup Language
XHTML is a stricter, more XML-based version of HTML
XHTML is HTML defined as an XML application
XHTML is supported by all major browsers
Why XHTML?
XML is a markup language in which all documents must be marked up correctly (be
"well-formed").
XHTML was developed to make HTML more extensible and easier to combine with other
XML-based data formats. In addition, browsers ignore errors in HTML pages and try to
display the page even if its markup contains errors. XHTML, by contrast, comes with much
stricter error handling.
The Most Important Differences from HTML
<!DOCTYPE> is mandatory
The xmlns attribute in <html> is mandatory
<html>, <head>, <title>, and <body> are mandatory
Elements must always be properly nested
Elements must always be closed
Element names must always be in lowercase
Attribute names must always be in lowercase
Attribute values must always be quoted
Attribute minimization is forbidden
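Several of these rules are easiest to see side by side. In the sketch below, each comment quotes a form that HTML browsers tolerate, and the line after it shows the XHTML-conformant equivalent. The file name photo.gif and the sample text are placeholders.

```html
<!-- Nesting: <b><i>bold italic</b></i> crosses tags; nest them instead -->
<b><i>bold italic</i></b>

<!-- Closing: <br> and an unclosed <p> must both be closed in XHTML -->
<p>First line<br />Second line</p>

<!-- Lowercase and quoting: <IMG SRC=photo.gif> becomes lowercase with quoted values -->
<img src="photo.gif" alt="A photo" />

<!-- Minimization: <input type="checkbox" checked> needs a full name="value" pair -->
<input type="checkbox" checked="checked" />
```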
Document Structure Elements
(X)HTML documents should begin with a <!DOCTYPE> declaration. This statement
identifies the type of markup that is supposedly used in a document. For example,
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
This declaration indicates that we are using the transitional variation of HTML 4.01, which
starts with a root element html. In other words, an <html> tag will serve as the ultimate
parent of all the content and elements within this document.
In the case of an XHTML document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
There are numerous doctype declarations that are found in HTML and XHTML documents:
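For reference, several commonly encountered doctype declarations, besides the two transitional forms shown above, are listed below, each preceded by a comment naming the markup variant. This is a listing, not a single document; a real page uses exactly one declaration, on its first line.

```html
<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML 4.01 Frameset -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">

<!-- XHTML 1.0 Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- XHTML 1.0 Frameset -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

<!-- HTML5 -->
<!DOCTYPE html>
```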
(X)HTML Document Structure:
Given the HTML 4.01 DTD, a basic document template can be derived from the specification
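A template of this kind is sketched below, assuming the HTML 4.01 Transitional doctype. The title and body text are placeholders.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<!-- Declares the MIME type and character set of the document -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Page title goes here</title>
</head>
<body>
<p>Page content goes here.</p>
</body>
</html>
```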
Within a root html element, the basic structure of a document reveals two elements: the head
and the body. The head element contains information and tags describing the document, such
as its title, while the body element houses the document itself, with associated markup
required to specify its structure.
The structure of an XHTML document is much the same, except for a different
<!DOCTYPE> declaration and an xmlns (XML namespace) attribute added to the
html tag so that XML can be intermixed more easily into the XHTML document:
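An XHTML template is sketched below, assuming the XHTML 1.0 Transitional doctype shown earlier. The title and body text are placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- The xmlns attribute places the elements in the XHTML namespace -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<!-- Empty elements are closed with a trailing slash in XHTML -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<title>Page title goes here</title>
</head>
<body>
<p>Page content goes here.</p>
</body>
</html>
```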
Alternatively, in HTML 4.01 or XHTML 1.0 (but not in HTML5), the <body> tag can be
replaced with a <frameset> tag. Each frame in turn references another HTML/XHTML
document containing either a standard document, complete with <html>, <head>, and
<body> tags, or perhaps yet another framed document. The <frameset> tag should also
include a noframes element, which contains a <body> tag that provides a version of the
page for browsers that do not support frames. A visual representation of this idea is
shown here:
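A framed document of this kind might be written as in the following sketch, which assumes the HTML 4.01 Frameset doctype. The file names nav.html and main.html and the frame names are hypothetical.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
<title>Framed page</title>
</head>
<!-- Two side-by-side frames; each src points to a separate document -->
<frameset cols="25%,75%">
  <frame src="nav.html" name="nav">
  <frame src="main.html" name="main">
  <!-- Fallback content for browsers that do not support frames -->
  <noframes>
    <body>
      <p>This page uses frames. See <a href="main.html">the main content</a>.</p>
    </body>
  </noframes>
</frameset>
</html>
```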
The structure of a non-framed (X)HTML document:
The Document Head
The information in the head element of an (X)HTML document is very important because it
is used to describe or augment the content of the document. The element acts like the front
matter or cover page of a document. In many cases, the information contained within the
head element is information about the page that is useful for visual styling, defining
interactivity, setting the page title, and providing other useful information that describes or
controls the document.
The title Element
A single title element is required in the head element and is used to set the text that most
browsers display in their title bar. The value within a title is also used in a browser’s history
system, recorded when the page is bookmarked, and consulted by search engine robots to
help determine page meaning. In short, it is pretty important to have a syntactically correct,
descriptive, and appropriate page title. For example, given
<title>Simple HTML Title Example</title>
most browsers display "Simple HTML Title Example" in the title bar or tab.
<meta>: Specifying Content Type, Character Set, and More
A <meta> tag has a number of uses. For example, it can specify values that are
equivalent to HTTP response headers. To make sure that the MIME type and character
set of an English-language HTML document are set, you could use
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
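Beyond content type, <meta> is also commonly used for descriptive metadata such as the author, keywords, and description mentioned at the start of this module. The values below are illustrative placeholders.

```html
<head>
<!-- Equivalent to an HTTP Content-Type response header -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<!-- Descriptive metadata as name/content pairs -->
<meta name="author" content="A. Student">
<meta name="keywords" content="HTML, XHTML, markup">
<meta name="description" content="Lecture notes on traditional HTML and XHTML">
<title>Meta example</title>
</head>
```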
Other Elements in the head
In addition to the title and meta elements, under the HTML 4.01 and XHTML 1.0 strict
DTDs, the elements allowed within the head element include base, link, object, script, and
style. Comments are also allowed.
<base>
A <base> tag specifies an absolute URL address that is used to provide server and
directory information for partially specified URL addresses, called relative links, used
within the document:
<base href="http://htmlref.com/basexeample" >
<link>
A <link> tag specifies a special relationship between the current document and
another document. Most commonly, it is used to specify a style sheet used by the
document:
<link rel="stylesheet" media="screen" href="global.css" type="text/css" >
<object>
An <object> tag allows programs and other binary objects to be directly embedded in
a Web page. Here, for example, a nonvisible Flash object is being referenced for some
use:
<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
width="0" height="0" id="HiddenFlash" > <param name="movie"
value="flashlib.swf" /> </object>
<script>
A <script> tag allows scripting-language code to be either embedded directly within a
document or linked from an external file. An embedded script looks like this:
<script type="text/javascript"> alert("Hi from JavaScript!"); /* more code below */ </script>
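A script can also be kept in a separate file and referenced through the src attribute. In this sketch the file name scripts.js is a hypothetical placeholder.

```html
<!-- Loads and runs the code stored in the external file scripts.js -->
<script type="text/javascript" src="scripts.js"></script>
```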
<style>
A <style> tag is used to enclose document-wide style specifications, typically in
Cascading Style Sheet (CSS) format, relating to fonts, colors, positioning, and other
aspects of content presentation:
<style type="text/css" media="screen">
h1 {font-size: xx-large; color: red; font-style: italic;}
/* all h1 elements render as big, red and italic */
</style>
Comments
Comments are often found in the head of a document. Following SGML syntax, a
comment starts with <!-- and ends with --> and may encompass many lines:
<!-- Hi I am a comment -->
The complete syntax of the markup allowed in the head element under strict (X)HTML
is shown here:
Example:
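A head section that combines the elements described above might look like the following sketch. The base URL and the file names global.css and scripts.js are placeholders, not values taken from a real site.

```html
<head>
<!-- Character set and MIME type -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Sample Head Element</title>
<!-- Base address used to resolve relative links -->
<base href="http://www.example.com/notes/">
<!-- External style sheet -->
<link rel="stylesheet" media="screen" href="global.css" type="text/css">
<!-- Document-wide style rule -->
<style type="text/css" media="screen">
  h1 {font-size: xx-large; color: red; font-style: italic;}
</style>
<!-- External script -->
<script type="text/javascript" src="scripts.js"></script>
</head>
```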
The Document Body
After the head section, the body of a document is delimited by <body> and </body>. The
body of a Web document contains a variety of element types. For example, block-level
elements define structural content blocks such as paragraphs (p) or headings (h1-h6).
Block-level elements generally introduce line breaks visually. Special forms of blocks, such
as unordered lists (ul), can be used to create lists of information. Within nonempty blocks,
inline elements are found. There are numerous inline elements, such as bold (b), italic (i),
strong (strong), emphasis (em), and others.
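The distinction between block-level and inline elements is sketched in the body below. The heading text and list items are illustrative.

```html
<body>
<!-- Block-level elements: each one starts on a new line -->
<h1>Course Notes</h1>
<!-- Inline elements sit inside a block without breaking the line -->
<p>This paragraph contains <b>bold</b>, <i>italic</i>,
<strong>strong</strong>, and <em>emphasized</em> inline elements.</p>
<!-- A list is a special form of block -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
</body>
```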
The Rules of (X)HTML
1. HTML Is Not Case Sensitive, XHTML Is
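Under HTML, the following are all equivalent:
<B>Go boldly</B>
<B>Go boldly</b>
<b>Go boldly</B>
<b>Go boldly</b>
Under XHTML, element and attribute names must be lowercase, so only the last form is valid.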
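2. Attribute Values May Be Case Sensitive
Attribute values such as file names and URLs may be case sensitive, depending on the
attribute and on the server that hosts the file. Separately, white space is collapsed:
any run of white space between characters displays as a single space. This includes all
tabs, line breaks, and carriage returns.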
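The white space behavior can be seen by comparing two paragraphs that differ only in spacing; browsers render both the same way. The sample sentence is illustrative.

```html
<p>Go     boldly
      where no    one has gone</p>
<!-- Renders identically to the paragraph above: runs of spaces, tabs,
     and line breaks collapse to a single space -->
<p>Go boldly where no one has gone</p>
```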
3. (X)HTML Follows a Content Model
All forms of markup support a content model that specifies that certain elements are
supposed to occur only within other elements.
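As a sketch of what a content model means in practice, a ul element may directly contain only li elements. The paragraph text below is illustrative.

```html
<!-- Not allowed: ul may contain only li elements, not p directly -->
<ul>
  <p>This paragraph is misplaced.</p>
</ul>

<!-- Allowed: the paragraph sits inside a list item -->
<ul>
  <li><p>This paragraph is correctly placed.</p></li>
</ul>
```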
4. Elements Should Have Close Tags Unless Empty
Some elements have optional close tags. For example, both of the paragraphs here are
allowed, although the second one is better:
<p>This isn't closed
<p>This is</p>
However, given the content model, the close of the top paragraph can be inferred
since its content model doesn’t allow for another <p> tag to occur within it. A few
elements, like the horizontal rule (hr) and line break (br), do not have close tags
because they do not enclose any content. However, under XHTML you must always
close tags, so you would have to write <br></br> or, more commonly, the
self-closing form <br />.
5. Unused Elements May Minimize
Sometimes tags may not appear to have any effect in a document. Browsers may
minimize (collapse) empty elements, so markup such as
<p></p><p></p><p></p>
does not produce numerous blank lines and is best avoided.
6. Elements Should Nest
A simple rule states that tags should nest, not cross; thus
<b><i>is in error as tags cross</b></i>
whereas
<b><i>is not since tags nest</i></b>
and thus is syntactically correct
7. Attributes Should Be Quoted
Quotes should be used under transitional markup forms and are required under strict
forms like XHTML; so,
<img src="robot.gif" height="10" width="10" alt="robot" />
would be the correct form of the tag
8. Entities Should Be Used for Special Characters
Instead of writing these potentially parse-dangerous characters in the document, they
should be escaped out using a character entity. For example, instead of <, use &lt; or
the numeric equivalent &#60;. Instead of >, use &gt; or &#62;.
9. Browsers Ignore Unknown Attributes and Elements
The browsers will ignore unknown elements and attributes; so,
<bogus>this text will display on screen</bogus>
and markup such as
<p id="myPara" obviouslybadattribute="TRUE">will also render fine.</p>
Browsers make best guesses at structuring malformed content and tend to ignore code
that is obviously wrong. The permissive nature of browsers has resulted in a massive
number of malformed HTML documents on the Web.
Major Themes of (X)HTML
1. Logical and Physical Markup:
Physical markup refers to using a markup language such as (X)HTML to make pages
look a particular way; logical markup refers to using (X)HTML to specify the
structure or meaning of content while using another technology, such as CSS, to
designate the look of the page.
Physical markup is obvious; if you want to highlight something that is important to
the reader, you might embolden it by enclosing it within a <b> tag:
<b>This is important!</b>
Logical markup is a little less obvious; to indicate the importance of the phrase, it
should be enclosed in the logical strong element:
<strong>This is important.</strong>
Remember, the <strong> tag is used to say that something is important content, not to
indicate how it looks. If a CSS rule were defined to say that important items should be
big, red, and italic, as in
<style type="text/css"> strong {font-size: xx-large; color: red; font-style: italic;}
</style>
then every strong element would take on that appearance without any change to the markup.
2. Standards vs. Practice
Many Web developers simply do not know or care about standards. As long as their
page looks right in their favorite browser, they are happy and will continue to go on
abusing HTML tags like <table> and using various tricks and proprietary elements.
Standards provide needed consistency. The Web needs standards, but standards have
to acknowledge what people actually do.
HTML tags: for the individual HTML tags, refer to the slides and w3schools.