XHTML Web Design for Beginners

What XHTML is and how you can use it to start producing the next generation of Web pages.

MIS Web Design
Lecturer

XHTML Web Design for Beginners: Introduction

XHTML Web Design Introduction

This article is for readers who have either no prior experience of Web Design or very little. If you have dabbled with exporting HTML from Microsoft Word, or played around with FrontPage a little and want to understand what you are doing then this article is for you. I will teach you what XHTML is and how you can use it to start producing the next generation of Web pages.

If you have difficulty with any part of this article or can't get an example to work feel free to email me at info @miswebdesign.com. I'll do my best to answer you as quickly as possible.

If you want to skip this introduction and get on with it feel free. Just go to the Hello World section and get started. But please come back and read the rest of this introduction later when you have time.

Color

I have used color in the example XHTML throughout this article to make it easier for you to understand the code. The color is purely there for this reason and serves no other purpose.

No Programs

I will not be showing you how to use any programs to write XHTML for you. I have a firm belief that the best way to write Web pages is to get your hands dirty and write the code yourself. I've been doing it for seven years so far and it hasn't let me down yet. Here are the main reasons I believe this.

Programs that produce HTML for you often do so badly. What I mean is that they often produce Web pages that go the long way round about doing things. When you code your pages by hand you have an intimate understanding of what you are doing and can make the actual size of the Web page file as small as possible. This reduces download times so your pages load quicker and your users are happier.

When you use a program to generate HTML for you, you do not understand how your page is built internally because it does it for you. This is not a problem as long as everything works. But what about when it doesn't? If you find that your Web page doesn't display properly in Internet Explorer 4, and many of your users use that browser, you are going to have to sort it out. This means forgetting about the program and looking at the code yourself. Do you see the problem? You've been using the program to code the page for you so when the problem occurs you haven't got the knowledge you need to fix it. And problems will occur.

The Internet is no longer limited to people with computers viewing Web sites through one or two different Web browsers. Everything has a Web browser in it these days. Mobile phones, Televisions, Personal Digital Assistants, Cars, even fridges. Blind users "view" Web sites using speech synthesis or Braille devices. There is no way you can test each page you produce in all of the possible ways it may be used. But there is a way to give you the best chance that they will work. This is achieved through producing pages using the standards laid out by the World Wide Web Consortium (W3C), the people who work on XHTML and other Internet standards. Once you have produced your pages the W3C provide a validation service to check that your page meets the standards and therefore has the best chance of being used on any device. I do not know of any HTML generation programs that produce valid code.

I hope that has persuaded you that the learning curve for XHTML is worth it. If you decide to use a program to do it then that will have a learning curve too, so you might as well take the code option and save yourself hassle in the future.

Why XHTML?

Since 1990 HTML or Hyper Text Markup Language has been the language recommended for writing Web pages in. And it has been very successful (you didn't need me to tell you that). But HTML has its problems. Without going into specifics, as it's not the subject of this article, HTML has become a mess. To sort this mess out the World Wide Web Consortium, the standards body for the Web, came up with XHTML in 1999. XHTML stands for eXtensible Hyper Text Markup Language and is written in a language called XML or eXtensible Markup Language.

As the name implies XHTML has the capability of being extended. You can use extra modules to do things with your pages that weren't possible with HTML. The long-term goal is that your Web pages will be able to be understood by computers as well as humans. If this doesn't make sense, allow me to explain.

You may be thinking that computers already understand Web pages because you use a computer to view them. This is true. But computers only understand how to display your pages, not what they mean. Imagine if computers understood what they meant. You could tell your computer to go and visit all of your local supermarkets' Web sites and tell you which one is the cheapest for this week's shopping. Your computer could visit the news sites around the world and bring back the latest headlines that relate to things you are interested in. The possibilities are endless.

Hopefully you now see why XHTML is important. I decided to write this tutorial to teach you XHTML from scratch.

XHTML Web Design for Beginners: Hello World

Hello World

No beginner's guide would be complete without showing you how to say "Hello World". With XHTML this is pretty simple. Don't worry if you don't understand everything; it will all become clear in time. Your "Hello World" Web page code looks like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Hello World</title>
    </head>
    <body>
        <p>My first Web page.</p>
    </body>
</html>

In a visual browser such as Internet Explorer the page above would look something like this:

Figure 1-1: A Microsoft Internet Explorer window. The title bar contains the text "Hello World". The page contains the text "My first Web page.".

We are not going to worry for the time being about the first two lines or the extra text inside the <html> tag (shown in grey in the original article). All you need to know for now is that this code tells the computer that this page is in XHTML and the language used is English. It needs to be in every page that you produce and release on a live Web site, but I'm going to leave it out until we deal with it later so you can learn without it getting in the way. Don't sweat it.

XHTML is called a Markup language because that's what you do with it. You mark up areas of text to indicate what they mean so the browser can know what to do with them. This is done by using elements. An element consists of two tags, an opening tag and a closing tag. Tags use the angle brackets < and > to show they are tags, and the closing tag also has a slash /.

Let's look at the document we just saw to demonstrate this. The <title> element is used to indicate the title of a page. In Internet Explorer this is displayed in the bar at the top of the window. Our title element looks like this:

<title>Hello World</title>

The <title> tag means we are starting a new title element. This is then followed by the text that we want the title to be. In this case the title will be "Hello World". To tell the browser that we have finished with the title we use a closing tag of </title>. As mentioned above the only difference between a start tag and an end tag is the slash /. This is essential as it is the only way the computer knows whether you are starting a new tag or finishing a previous one.

The name of the opening and closing tags must be the same, so:

<title>Hello World</heading>

is invalid and will not work.

As well as containing text such as "Hello World" above, elements can contain other elements. If we look just outside the <title> element we can see that it is inside a <head> element like so:

<head>
<title>Hello World</title>
</head>

This means that the <title> is part of the <head> of the document, because it is inside it. There is no limit to how many elements another element can contain, as long as you follow the rules that we will look at later.

The <head> of a document is used to tell the computer things about your document rather than things that should be in it. The <title> is not part of the page itself; it describes what the document is, so it goes in the <head>. All XHTML documents must have a <head> element that must contain one <title> element, although others are allowed that we will look at later.

After the <head> comes the <body>. The <body> is the part of the document that contains the page itself. All XHTML documents must have one <body> element. The body contains things like paragraphs, bulleted lists, pictures and links to other documents. All of the stuff you view when you visit a site is contained in the <body> element.

Our <body> element is very simple; it contains a single element <p>:

<body>
<p>My first Web page.</p>
</body>

Have you guessed what the <p> element is used for? The <p> element is used to mark a paragraph, so our page will have one paragraph with the text "My first Web page." in it. If we wanted to add another paragraph we could do it like this:

<body>
<p>My first Web page.</p>
<p>I hope you like it.</p>
</body>

In a visual browser such as Internet Explorer the page above would look something like this:

Figure 1-2: The page section only of a Web browser window. The text "My first Web page.", followed by some space, and below it the text "I hope you like it.".

There's one more essential ingredient that we haven't covered. The <head> and <body> elements are contained by an element <html> which contains the entire document (the <head> and <body> elements). Our <html> element above looks like this:

<html>
    <head>
        <title>Hello World</title>
    </head>
    <body>
        <p>My first Web page.</p>
    </body>
</html>

The <html> element must contain one <head> element and one <body> element.

You may be wondering why there is extra space at the start of some of the lines. This is purely for our benefit and makes no difference to the computer processing your pages. The idea is to add tabs or a set amount of spaces at the start of each line to match the level of your tags. Look at the code above: <html> is not contained in any element so there is no space. <head> is contained by one element, <html>, so it has one tab. <title> is contained by two elements, <html> and <head>, so it has two tabs, and so on. Trust me, when your documents get big, it makes life a lot easier.

Now it's your turn

If you feel up to it, have a go at doing some pages yourself before reading any more. If not just skip to the next section.

First of all try the "Hello world" example that we just looked at. Here's how.

Open up a text editor of your choice. If you're using Windows then

Start > Programs > Accessories > Notepad

will get you into Notepad, but any text editor will do. Please note that Microsoft Word and other word processors are not text editors and are not suitable for this task.

Now type in the code below. I recommend that you type the code in yourself rather than copy and paste as it will help you to understand what you are doing. The tab key (for the spacing) is usually located above "Caps Lock" on the left of your keyboard.

<html>
    <head>
        <title>Hello World</title>
    </head>
    <body>
        <p>My first Web page.</p>
    </body>
</html>

Once you have typed the code into your text editor you will need to save it as a Web page file. Web page files have their own "extension" (the period and the letters after the file name) to distinguish them from other files such as Microsoft Word (.doc) or Adobe Acrobat (.pdf).

Web pages use an extension of either .htm or .html. I prefer to use .html as it matches the name of the language. The choice is yours. Some old systems will not save files with four letter extensions so .htm may be your only choice.

Once you have saved the file, open it up in your Web browser. On Windows this can usually be done by double-clicking the file in Windows Explorer. If you have typed it in correctly then you will see something similar to Figure 1-1 above.

Now that you have your page, try adding some more paragraphs to it like this:

<html>
    <head>
        <title>Hello World</title>
    </head>
    <body>
        <p>My first Web page.</p>
        <p>A second paragraph.</p>
        <p>Yet another paragraph.</p>
    </body>
</html>

Save your document again and refresh your Web browser. You should see the extra paragraphs appear after the first one.

Summary

That's it for your "Hello World" page. As I said when we started, don't worry if you didn't take it all in, we're going to be looking at each area in greater detail, but hopefully that has given you an idea of how Web pages work. In the next section, we're going to take a closer look at elements and tags and how they are used to build your documents.

XHTML Web Design for Beginners: XHTML Building Blocks

XHTML Building Blocks

Elements and tags are the building blocks of XHTML. You need to fully understand both of these concepts to be able to write Web pages properly. We already touched on how they work in our example above but we're going to take a closer look now.

An element is used to mark sections of your document in order to tell the computer what that section is. This can range from marking the entire document as with the <html> element to marking a single word as important. The concept is the same in all cases.

Elements

Elements are made up of two tags: a start tag and an end tag. Between these tags is the element content.

<title>Hello World</title>

This element tells the computer that its content "Hello World" is the title for the document. Without the start and end tags the computer would have no way of knowing what to do with this text.

Start Tags

A start tag is made up of a left angle bracket followed by the name of the element and then a right angle bracket.

<title>

A start tag tells the computer that we are starting a new element and that it should regard everything it now encounters as part of that element start tag until it reaches the right angle bracket.

End Tags

End tags are made up of a left angle bracket and a slash followed by the name of the element and then a right angle bracket.

</title>

Once the computer gets to the end tag for an element it knows that element is finished. The slash is necessary to distinguish it from the start tag.

Case Sensitivity

When you are entering your tags you must make sure that the names use lower case letters only. XHTML is what we call case-sensitive. This means that the following tags are all different:

  1. <title>
  2. <Title>
  3. <tITLE>

Only number 1 is an XHTML tag, the rest do not exist. All tags in XHTML are in lower-case so it is not difficult to remember, just be careful and make sure you get it right.
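
Here is a quick sketch of the rule in practice (this snippet is an extra illustration, not one of the numbered examples). Because names are case-sensitive, the start and end tags must match in case as well as in spelling, so both tags below are written in lower case:

```html
<title>Hello World</title>
```

Writing <Title>Hello World</title> instead would not be valid XHTML, because <Title> is not a recognised tag and it does not match the closing </title>.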

Empty Elements

Certain elements do not have any content. For these empty elements a special syntax is provided. Instead of inserting an end tag immediately after the start tag has finished, all we have to do is put a slash before the right angle bracket of the start tag to tell the computer that this element is finished.

The <br> element is used to insert a line break into your document. This tells the computer to stop the text at that point and start a new line. As you may have guessed the <br> element does not have any content so instead of entering the element like this:

<br></br>

we use a single tag with a slash at the end of the tag to show that it is an empty element:

<br />

Not only does this save typing, it also makes the code easier to read and more manageable. The space before the slash is necessary to support older Web browsers that do not understand empty elements and will simply ignore the slash as long as there is a space before it.
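
To see the empty element syntax in context, here is a minimal sketch of a paragraph that is split over two lines. The rhyme is just placeholder text chosen for illustration:

```html
<p>Roses are red,<br />
Violets are blue.</p>
```

In a visual browser this should appear as one paragraph with a line break after "Roses are red,".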

Content

The element we have just looked at only contained the text "Hello World". But elements can contain a lot more than just text. If they couldn't then XHTML wouldn't be very useful.

Other than text, most of your elements will also contain other elements. In fact a number of elements must contain certain other elements to work properly. We will look at each of these later.

An element that contains another element looks like this.

<head>
<title>The document title</title>
</head>

Here we have a <head> element that contains a <title> element. As we go on you will see elements containing more and more elements as you build up your knowledge and produce larger, more complex documents.

Nesting

No, we're not talking about preparing for babies. Nesting means the way in which elements contain elements. When we say that elements are properly nested we mean that each element is completely contained within the elements that contain it, and it completely contains the elements it contains. Try and say that after a night out.

That might sound confusing, but it's really quite simple, as these examples will demonstrate. We are going to be using the elements <em> and <strong> which give text emphasis and strong emphasis respectively. We'll look at them in detail later.

<em>The Lord Of The Rings is a <strong>fantastic</strong> story.</em>

This is valid XHTML.

<em>The Lord Of The Rings is a <strong>fantastic</em> story. </strong>

This is not. The <em> starts outside the <strong> but finishes inside it. The tags are not properly nested. Think of elements as being like boxes. A box can have a box inside it, or can be inside a box, but it can't be inside a box and outside it as well. Neither can your elements.
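
One informal way to check your nesting is to write each element on its own line and indent it to show which box it sits in. The sketch below is the valid example from above laid out this way and wrapped in a <p>; the layout is only a suggestion, and the page should display the same way because browsers generally treat runs of spaces and line breaks as a single space:

```html
<p>
    <em>
        The Lord Of The Rings is a
        <strong>fantastic</strong>
        story.
    </em>
</p>
```

If the closing tags do not line up with their opening tags when you do this, the elements are probably not properly nested.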

Required Elements

There are four elements that all XHTML documents must contain. We have already seen that you must have a <head> and it must contain a <title>. I've also mentioned the <html> and <body> elements. We're going to look at each of these elements in turn, starting from the top.

<html>

The <html> element is the container for your whole document. It starts first and finishes last. It tells the computer that this is an <html> document and must always be present.

<head>

After <html> the next element should always be <head>. The head contains elements that are about the document rather than elements that are displayed in the page itself. This includes things like the document title, information to be given to search engines and how this document relates to others on your site.

<title>

Within the <head> of your document you must have a <title> which describes what the document is. Without a <title> your document is not valid.

<body>

Finally your document must have a <body>. The <body> is the Web page itself. It comes after the <head> and is the only other element that can go in your <html> element. Anything that you want to put in your page goes in here.

You can think of an XHTML document as being like a human body. All people are people from head to toe (<html>), they have a head that contains information you don't see when you look at them (<head>), they have a name (<title>) and they have a body (<body>).

Putting them all together

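When we put all of these together we get the basic structure for an XHTML document. Here it is.

<html>
    <head>
        <title>The document title</title>
    </head>
    <body>
        <p>The page content.</p>
    </body>
</html>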

Every XHTML document you produce will have that same basic structure. All other elements go in either the <head> or the <body>.

Attributes

Often an element can't convey enough information about itself through its name alone. For example, the <img> element, which is used to display an image, is no use on its own. You also need to tell the browser where to find the image file, and other things like a text description for users who don't get the image for one reason or another.

This is achieved with attributes. Attributes are added to the start tag of your element and come in the form of a name="value" pair. The name is the name of the attribute you are using, value is replaced with the value you wish to provide for the attribute. Let's take a closer look.

name="value"

As with element names, all attribute names are in lower case. You have a choice of using either double quotes " or single quotes ' as long as you use the same before and after the value. You must enclose the value in one form of quotes or the other. Without them your document will not be valid and may not work as intended.

Let's look at an example to see an attribute in action. Below is a simple <img> element that tells the browser to fetch an image from /images/logo.gif.

<img src="/images/logo.gif" />
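
An element can carry more than one attribute, each separated by a space. As a small sketch, the first line below adds the alt attribute, which XHTML uses for the text description mentioned earlier; the second line is the same element written with single quotes. The description "Company logo" is made up for illustration:

```html
<img src="/images/logo.gif" alt="Company logo" />
<img src='/images/logo.gif' alt='Company logo' />
```

Both lines mean the same thing. Whichever style of quotes you pick, it is probably easiest to stick to it throughout your documents.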

You will see attributes used a lot and you'll soon get the hang of them so again, don't sweat it.

Summary

We have seen that there are rules to be followed when writing your XHTML documents, and we've looked at the basic building blocks of XHTML. As long as you follow these rules, plus others that I will mention as we go along, you are on your way to creating XHTML web pages. We're now going to add some elements to your arsenal that are used to mark up text.

XHTML Web Design for Beginners: Text That Says Something

Text That Says Something

Congratulations! What for? For getting to here, you've got past the hardest part. Whether you understood everything you read so far or just absorbed as much as you could, the next few sections should be a lot easier going as we look at the different elements in your XHTML arsenal and the meaning that they have.

We're going to start with giving more meaning to your text. This includes emphasis, citations, abbreviations and acronyms, quotes, computer text and document changes.

Marking Paragraphs with <p>

Before we dive into those, let's take another look at the paragraph element <p>. The <p> element is used to contain your paragraphs. It is what we call a block or box element. This means that when it occurs in your document (in a visual browser) it will start on a new line, and when it finishes the next element will start below it. This is best described by the example below.

Take a look at the code below which you have already seen in our first example.

<body>
<p>My first Web page.</p>
<p>I hope you like it.</p>
</body>

Here we have two paragraphs. Let's take another look at the way in which they would be displayed to understand what the <p> element is doing. I've added three blue bars to the picture to highlight the spacing and the new line that has been created from using the <p> element.

The page section only of a Web browser window. The text "My first Web page", followed by some space, and below it the text "I hope you like it.". The areas above, between and below the text blocks are highlighted.

Without the <p> elements there would be no spacing and the text would just be in one long line.

This kind of element is called a box or block element because there is an (often invisible) box around the element that separates it from the rest of the page. This is essential to make your document readable instead of just being one big kludge of text.

The second type of element is called an inline element. This is an element that does not have its own box and does not affect the flow of text in any way. The elements we are looking at in this section are inline elements unless otherwise stated.
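
To make the difference concrete, here is a minimal sketch with two block elements and one inline element. The wording of the paragraphs is invented for illustration:

```html
<p>This sentence has <em>one emphasised phrase</em> in the middle of it.</p>
<p>This second paragraph starts on a new line.</p>
```

In a visual browser each <p> should start on its own line, while the <em> stays within the flow of the first sentence without starting a new line.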

Now let's add some further meaning to our text.

Adding Emphasis with <em>

First let's look at <em>. <em> is used to indicate text that should be given greater emphasis. It is more important than the text around it. In the paragraph below the phrase "The Lord Of The Rings" is considered more important so it is given more emphasis using <em>.

<p><em>The Lord Of The Rings</em> was written by JRR Tolkien.</p>

The way in which <em> is handled by a Web browser will vary. A visual browser such as Internet Explorer will usually display the text in italics whereas an audio browser such as an in-car Web browser or a browser used by blind people may speak the word in a louder voice. Later on we will look at ways that you can specify how your elements should be displayed but for now we will let the browser decide for us.

Adding Strong Emphasis with <strong>

The <strong> element is similar to <em> except that it indicates a stronger emphasis. Let's alter the example above to give the text "JRR Tolkien" a strong emphasis.

<p><em>The Lord Of The Rings</em> was written by <strong>JRR Tolkien</strong>.</p>

As with <em>, the way in which the <strong> element is handled depends on the browser being used. Visual browsers will usually display the text in bold; a speech browser may use a louder voice than it does for <em>.

Defining citations with <cite>

<cite> is used to indicate a citation or a reference to another source such as for further information. For example:

<p><cite>Homer Simpson</cite> said, "Operator, give me the number for nine-one-one!".</p>

In a visual browser the <cite> element will often be displayed in italics; an audio browser may inform the listener that this is a citation.

Abbreviations and Acronyms with <abbr> and <acronym>

In many fields today abbreviations and acronyms are common. But not everyone knows what they mean. Using the <abbr> and <acronym> elements enables you to provide their full meaning without cluttering your page.

Both the <abbr> and <acronym> elements work in the same way, and are interchangeable. There is no clear definition of the difference between an abbreviation and an acronym so use whichever you feel most suitable. I will talk about the <abbr> element but read this as meaning one or the other.

The <abbr> element uses an optional title attribute to show the full version of the abbreviation. For example:

<p>This document uses <abbr title="eXtensible Hyper Text Markup Language">XHTML</abbr>.</p>

A visual browser will often alert a user that an explanation of an abbreviation is available; a tool-tip then appears when the user moves their mouse over the term. A speech browser may speak the full version of the abbreviation on request.

Please be aware that Internet Explorer on the PC does not support the <abbr> element up to and including version 6. If you are using this browser then you will not see any visual difference for <abbr> in the example above. However most other recent browsers, including Internet Explorer for the Macintosh, do support this element.
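
For completeness, here is a minimal sketch of the <acronym> element, which takes the same optional title attribute as <abbr>. The sentence is made up for illustration:

```html
<p>XHTML is maintained by the <acronym title="World Wide Web Consortium">W3C</acronym>.</p>
```

As with <abbr>, a visual browser will often show the full version as a tool-tip when the user moves their mouse over "W3C".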

Quotes using <q> and <blockquote>

These elements are used to indicate text quoted from another source. <q> is an inline element (it does not break the text flow) and <blockquote> is a block element (it starts and finishes with a new line).

Let's start with <q>. <q> is used for short quotes that you want to include in a sentence or paragraph. <q> uses an optional cite attribute to indicate the location of a source for the quotation. For example:

<p>Homer Simpson said, <q cite="http://personal.inet.fi/taide/karjalainen/homer.html">Operator, give me the number for nine-one-one!</q>.</p>

The cite attribute shows that the quote originally came from http://personal.inet.fi/taide/karjalainen/homer.html. Visual browsers should add quotation marks for you around the quoted text. Speech browsers may indicate that this is a quotation.

The <blockquote> element works in the same way as the <q> element except it is a block element so it starts and finishes with a new line. It is used for longer quotes:

<p>Homer Simpson said:</p>
<blockquote cite="http://personal.inet.fi/taide/karjalainen/homer.html">The code of the schoolyard, Marge! The rules that teach a boy to be a man. Let's see. Don't tattle. Always make fun of those different from you. Never say anything, unless you're sure everyone feels exactly the same way you do. What else..</blockquote>

Visual browsers display a <blockquote> with extra space on the right and left of the block (it is indented). Speech browsers may indicate that it is a quote. The cite attribute shows where the quote originally came from.
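
The example above puts its text directly inside the <blockquote>, which the Transitional version of XHTML 1.0 used in this article allows. The Strict version expects block elements such as <p> inside a <blockquote>, so if you want to be on the safe side you could write it along these lines (a shortened sketch of the same quote):

```html
<blockquote cite="http://personal.inet.fi/taide/karjalainen/homer.html">
    <p>The code of the schoolyard, Marge! The rules that teach a boy to be a man.</p>
</blockquote>
```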

Computer Text with <code>, <samp>, <kbd> and <var>

These elements are used to indicate text that relates to a computer in a certain way, as follows:

<code> indicates computer program code
<samp> indicates sample output from a computer program
<kbd> indicates text that a user of a program should enter
<var> indicates a computer program variable or argument

If the above explanations mean nothing to you, don't worry. If you don't know what they mean you're not likely to be using them in your documents. Just remember that they exist.
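
If you are curious, here is a rough sketch of how the four elements might be used together. The command, the output and the program statement are all invented for illustration:

```html
<p>Type <kbd>dir</kbd> at the prompt and the computer will reply with something like <samp>File Not Found</samp>.</p>
<p>In the statement <code>total = price * quantity</code>, <var>price</var> is a variable.</p>
```

Visual browsers will often display <code>, <samp> and <kbd> in a fixed-width font and <var> in italics, but as with the other elements in this section that is up to the browser.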

Marking Document Changes with <ins> and <del>

Once you have released a document onto your Web site you may find that some information changes and you need to add or remove sections of text from your documents. While there is nothing to stop you from simply adding or removing text from your document, the <ins> and <del> elements can be used to mark added text and deleted text respectively.

For example, the following text has a section of each type of text:

<p>The code of the schoolyard, Marge! The rules that teach a boy to be a man. Let's see. <del>Don't tattle.</del> Always make fun of those different from you. <ins>Never say anything, unless you're sure everyone feels exactly the same way you do.</ins> What else..</p>

Visual browsers will often underline <ins> elements and put a line through <del> elements. Speech browsers may indicate that the text has been added or removed respectively.
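
Both elements also accept an optional datetime attribute in XHTML 1.0 that records when the change was made. The sketch below shows how that might look; the sentence and the date are made up for illustration:

```html
<p>The meeting is on <del datetime="2003-05-01T09:00:00Z">Tuesday</del> <ins datetime="2003-05-01T09:00:00Z">Wednesday</ins>.</p>
```

Most browsers will not display the date, but it is there for any program that wants to know when the text was changed.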

Using Elements for their Intended Purpose

As you viewed the examples in this section you may have thought of using the elements purely for their visual effect on the text. For example the <del> element above will often be displayed with a line through the marked text. You should not use any element purely for its visual effect. Later on we will be looking at style sheets, which will give you full control over the way in which your text is displayed. Elements should only be used to mark text that has that meaning. This is called the semantics of your documents.

Summary

That's it for elements that are specific to certain types of text. Have a go at using them to create a document and get used to creating XHTML documents.

That's also the end of the first part of this article. I hope you enjoyed it. Part Two is now available.

If you are looking for a book on XHTML then you should take a look at HTML & XHTML: The Definitive Guide: Fifth Edition by Chuck Musciano & Bill Kennedy. I have been using it since the first edition and can highly recommend it for Web authors of all levels.

XHTML Web Design for Beginners: Advanced XHTML Building Blocks

Advanced XHTML Building Blocks

Before we look at any more elements there are a few more basic building blocks of XHTML that we need to cover in order for you to understand the topics we will examine. Hopefully you now have an understanding of elements, start tags, end tags, the basic structure of an XHTML document and the text elements we looked at in the previous section.

In this section we will be looking at the topics listed below. Don't worry if the topic titles look a bit scary; they'll make sense when you get to them, and the titles will make it easier to check back later.

Character References and Entity References

Character references aren't as scary as they sound (no need to sweat). Let's find out why they exist, and then we can look at how you code them and use them.

Take a look at your keyboard, can you type a copyright symbol © or an inverted exclamation mark ¡? Unless you're using a pretty strange keyboard then the answer is no.

Imagine you are a Web browser (User Agent) reading a Web page file and you come across a left angle bracket <. How do you know if it is the start of a tag or an angle bracket used in the content of the document? Answer, you don't.

The solution to these two problems? Entity references and character references (funny, that's also the title of this section).

Entity references and character references are extremely similar in XHTML, and people often confuse the two names. Basically they tell a Web browser (User Agent) that it should insert a certain character in their place.

If you don't know what a character is, it's a catch-all word for a letter, number, punctuation mark and so on. A is one character, AB is two characters, N!P 3 is five characters (if you counted four, you forgot the space). You get the idea.

A character reference or entity reference represents one character in XHTML. Entity references can represent more than one character in SGML or XML, but that's another story that you don't need to worry about right now.

The difference between a character reference and an entity reference is this. Character references use numbers while entity references use names. Let's look at the copyright symbol we saw above. To insert a copyright symbol into your document you would use either of the following:

&copy;

Try the &copy; entity reference

&#169;

Try the &#169; character reference

If you try the examples above (and your Web browser (User Agent) isn't broken) you will see a copyright symbol for both examples. As I said before, the entity reference uses names (copy), the character reference uses numbers (169). Observant readers will notice that the character reference also has a sharp symbol #. Let's take a closer look.

An entity reference begins with an ampersand. This is followed by the name of the entity reference, which is followed by a semi-colon, in much the same way that you use a left angle bracket and a right angle bracket to denote (delimit) the start and finish of a tag.

XHTML Entity Reference Syntax - An ampersand & followed by a name (e.g. copy) followed by a semi-colon ;

Character references begin with an ampersand followed by a sharp symbol. This is then followed by the number of the character reference, which is again followed by a semi-colon.

XHTML Character Reference Syntax - An ampersand & followed by a sharp symbol # followed by a number (e.g. 169) followed by a semi-colon ;

Whether you use an entity reference or a character reference is up to you. I tend to use entity references because I find names easier to remember than numbers but the choice is yours. Just don't forget that you need the sharp symbol with the character reference and not with the entity reference.

I will be explaining some of the entity and character references available to you in later sections, but I will not be showing you all of them individually as there are too many (approximately two hundred and fifty). For your reference I have prepared three articles detailing the three sets available to you. These are at the following locations.

Not all of them work in all browsers so be sure to test the ones you choose to use.

Ampersands and Left Angle Brackets

Although it is possible to type ampersands & and left angle brackets < on most keyboards, you should always use an entity or character reference when they appear in your content. This is for the reason I have already mentioned: a computer has no way to tell an ampersand in your content from the start of an entity or character reference, or a left angle bracket in your content from the start of a tag. Using character or entity references for those characters avoids this problem.

The following code contains an ampersand and a left angle bracket:

<p>Never use a < or an & directly in your content.</p>

The above code is wrong and should be written in one of the two following ways, firstly with entity references and then with character references:

<p>Never use a &lt; or an &amp; directly in your content.</p>

View example 2

<p>Never use a &#60; or an &#38; directly in your content.</p>

View example 3
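To show how this looks in everyday content, here is a small sketch. The sentences are made up purely for illustration. The first paragraph should display as "Fish & chips for < 5 pounds", and the second should display as "Type &copy; to get the © symbol", because the escaped ampersand stops the browser from treating the first copy as an entity reference.

```html
<p>Fish &amp; chips for &lt; 5 pounds</p>
<p>Type &amp;copy; to get the &copy; symbol</p>
```

The second line is a handy trick when you want to write about entity references themselves, which is exactly what an article like this one has to do.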

White Space

White space means any characters in your document that do not serve any purpose other than creating space. This includes spaces, tabs, line breaks and zero-width spaces. A line break is the character (or pair of characters) at the end of each line that tells the computer to start a new line. A zero-width space is used to separate words in languages such as Thai.

There are two issues relating to white space that you need to be aware of.

White Space Between Words

No matter how much space you use between your words, Web browsers (User Agents) will always reduce it to a single space character. There is one exception to this that we will cover in the next section. When I say words I mean any characters that are not white space and have no white space between them.

That might sound a bit complicated, but it's not. It just sounds complicated when you try to describe it. An example should help you to understand.

<p>This content

  has  a lot
 of white   space    
between the

words.</p>

View example 4

If you view the above example in a visual Web browser (User Agent) you will see that all of the content is on a single line with a single space between each word. That's all there is to it.

This feature comes in handy. It means that you can use tabs, spaces and new lines to make your code easier to read without worrying about your document looking funny in a visual Web browser (User Agent).
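As a rough sketch of how you might take advantage of this, the fragment below indents the content of a paragraph and breaks it over several lines. The wording is just an illustration. In a visual browser it should appear as one ordinary paragraph with a single space between each word, exactly as if it had been typed on one line.

```html
<p>
    Tabs, spaces and new lines in the code
    make long paragraphs easier to edit,
    but the browser still shows one run of text
    with a single space between the words.
</p>
```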

Space Around Tags

You need to be careful about putting white space around your tags. Once you get used to the following rule it will become second nature.

If you want a space before or after a word that is contained by an element you should put that space outside the element. By this I mean before the start tag and after the end tag. If you put it inside you might not get any white space between your words.

<p>Always leave white space <strong>outside</strong> your elements when you want it and not<strong> inside </strong>.</p>

In the example above the strong element containing the word outside has white space outside the tags, which is the way it should be. The strong element containing the word inside has white space inside the tags and not outside. On some Web browsers (User Agents) there may not be any space displayed between the words not and inside.

I have not linked to an example for this because most Web browsers will display the content without problems, but they don't have to, so it's better to get into the habit of doing it right.

Comments

When you are creating your documents you may want to leave information for yourself, or for others who read the document code rather than viewing the document in a Web browser (User Agent). To do this you use what we call a comment. A comment has the following syntax:

XHTML Comment Syntax - A left angle bracket < followed by an exclamation mark ! and two dashes --, then the text of the comment, then two dashes -- followed by a right angle bracket >

You should be careful not to use two dashes together within your comments as this could be taken as the end of the comment (even without the right angle bracket).

Here's an example:

<!-- This is the first Web page I ever created. -->
<p>My first Web page.</p>
<!-- This is a comment
spread over two lines. -->

View example 5

As you will see if you view the above example, the text in the comments is ignored. Comments are useful for leaving yourself reminders for later such as what still needs doing to a document.
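Here is a further sketch showing comments used as reminders while staying clear of the double dash problem. The reminder text and the paragraph are invented for illustration. Notice that a single dash, a comma or a colon is perfectly safe inside a comment; it is only two dashes in a row that you need to avoid.

```html
<!-- To do: add the opening hours, then check the phone number. -->
<p>Contact us by phone.</p>
<!-- A single dash - like this one - is fine inside a comment. -->
```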

Summary

In this section we have completed our look at the basic building blocks of XHTML. We've seen how to use special characters in our pages with character references and entity references, we've looked at the way white space is handled and we've also seen how you can add comments to your code.

In the next section we're going to continue our coverage of the elements you can use that relate to text including, amongst others, headings, line breaks and pre-formatted text.

XHTML Web Design for Beginners: Text That Says Something 2

Text That Says Something 2

In this section we will be looking at more of the elements (and a couple of entity references) in the XHTML arsenal that relate to text, further to those covered in the section "Text That Says Something".

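Specifically we will be covering:

  • Headings with <h1> through to <h6>
  • Subscripts and Superscripts with <sub> and <sup>
  • Line Breaks with <br>
  • Non-breaking space with &nbsp;
  • Soft Hyphens with &shy;
  • Pre-formatted text with <pre>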

Before we start I would like to reiterate an important point: all elements should be used for their meaning and not their visual effect. You can make any element look any way you want using style sheets, and we'll be covering them later on. So please, do yourself a favour and use elements for the purpose they're intended for.

There are many benefits to this, the two most important being that it makes your site much more accessible to disabled users and those who are using alternative browsers such as Personal Digital Assistants and in-car browsers. It also helps your search engine placement.

So now that rant's over and done with let's get on with it.

Headings with <h1> through to <h6>

Any document longer than a few sentences needs to be split up into sections to be usable. This is not a concept invented for the Web; it was probably conceived not long after writing was invented.

To mark headings in your XHTML there are six elements that each relate to deeper levels of subheadings as the number goes up. For clarity the six elements are:

  • <h1>,
  • <h2>,
  • <h3>,
  • <h4>,
  • <h5> and
  • <h6>.

You should always start with <h1>, followed by <h2> for sub-headings, <h3> for sub-sub-headings, and so on. You get the idea. You should never start with <h1> and then go straight to <h3>, or start with <h2>.

In the past Web designers have started with <h2> or <h3> because they wanted the visual effect of smaller text than commonly offered by <h1> but, as already mentioned (getting sick of it yet?), this can be achieved with style sheets and is not a valid reason for starting your headings with anything other than <h1>.

Headings are block level elements, so they have space above and below them, as you'd expect.

It is important that you use the heading elements to mark your headings as it ensures users of all user agents can understand your document structure. It also helps you get higher rankings in search engines as the search engines have a better idea what the document is about by examining the headings.

Here's a sample three-level document. I'm sure you can work out what a document with deeper levels would look like.

<h1>XHTML Web Design for Beginners: Introduction</h1>
<h2>Introduction</h2>
<p>This article is for readers who have either no prior experience...</p>
<h3>Colour</h3>
<p>I have used colour in the example...</p>
<h3>No Programs</h3>
<p>I will not be showing you how...</p>

View example 1

In general, most XHTML documents should have only a single <h1> element. If you decide to use more than one then you should be sure that they cover separate topics and that you have a good reason for having them on the same page. If two topics are on the same page then usually they are connected, and you should have a single <h1> describing both topics and then an <h2> for each sub-topic. It is very rare, if it happens at all, that a page should have two <h1> elements.
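To make that concrete, here is a sketch of a page that covers two connected topics under one <h1>. The titles and the placeholder paragraphs are made up for illustration. Each topic gets an <h2>, deeper sections get an <h3>, and no level is skipped on the way down.

```html
<h1>Our Coffee Guide</h1>
<h2>Choosing Beans</h2>
<p>...</p>
<h2>Brewing Methods</h2>
<h3>Filter</h3>
<p>...</p>
<h3>Espresso</h3>
<p>...</p>
```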

A user agent for the blind will often use headings as a way of giving the user an overview of the document so they can decide which part they wish to hear.

Subscripts and Superscripts with <sub> and <sup>

Subscripts are letters or digits which appear smaller and at the bottom of the line such as the 2 in H2O. Superscripts are again smaller and appear at the top such as the th in the 13th of February.

To mark subscripts and superscripts in XHTML you use the <sub> and <sup> elements respectively. An example should make it clear:

<p>The symbol for water is H<sub>2</sub>O.</p>
<p>This example was written on the 13<sup>th</sup> of February.</p>

View example 2
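A few more cases in the same spirit may help. These sentences are my own illustrations rather than linked examples. The first uses a subscript in a chemical formula, and the other two use superscripts for a power and for an ordinal.

```html
<p>Carbon dioxide is written CO<sub>2</sub>.</p>
<p>Einstein's famous equation is E = mc<sup>2</sup>.</p>
<p>She finished in 1<sup>st</sup> place.</p>
```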

Line Breaks with <br>

When you are writing your documents you may want to indicate that there should be a new line started without closing a paragraph. To do this you can use the <br> element. <br> is an empty element so you must ensure that you use the empty element syntax by writing it as <br />.

Here's an example:

<p>
The Road goes ever on and on<br />
Down from the door where it began.<br />
Now far ahead the Road has gone,<br />
And I must follow, if I can,<br />
Pursuing it with eager feet,<br />
Until it joins some larger way<br />
Where many paths and errands meet.<br />
And whither then? I cannot say.
</p>

View example 3

This is an element that has no effect outside visual browsers.
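Poetry is not the only place where the line breaks are part of the content. As a sketch, a postal address is another common case. The address below is entirely made up. Every line except the last ends with <br />, and the whole address stays inside one paragraph.

```html
<p>
Example Company<br />
1 Example Street<br />
Exampletown
</p>
```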

Non-breaking space with &nbsp;

Web browsers may split a set of words onto two lines. Sometimes this is not what you want. The solution is the entity reference &nbsp; which stands for non-breaking space.

If you insert a &nbsp; between your words instead of a space, with no spaces on either side, that text will be kept together on a single line and never be split across two. Here's an example:

<p>This&nbsp;is&nbsp;a&nbsp;solid&nbsp;line.</p>

View example 4

If you view the example in a visual browser, try making your browser window thin and see if you can make the text go on to two lines. You can't. Now try with normal spaces.

<p>This is not a solid line.</p>

View example 5

Like <br>, this entity reference has no effect outside visual browsers.
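In practice you will rarely want a whole sentence glued together. A more typical use, sketched below with a made-up sentence, is to keep just two things together, such as a number and its unit. The rest of the sentence uses normal spaces and can still wrap as usual.

```html
<p>The walk is about 10&nbsp;km and starts at 9&nbsp;am.</p>
```

With this markup a browser should never leave "10" at the end of one line and "km" at the start of the next.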

Soft Hyphens with &shy;

Soft hyphens are used to indicate a point in a word where you would like it to be split on to two lines if that is necessary. This simply makes for a nicer appearance when space is limited, such as when you have text in a thin column (which we'll be covering later).

To use it you simply insert it in the word at the point where you would like the potential split to be. Here's an example:

<p>I have no idea what antidisestablishment&shy;arianism means.</p>

View example 6

In a visual browser, if you collapse your browser window so that the long word (which I won't repeat) is against the right hand edge of the window then it should split the word onto two lines at the point where the soft hyphen occurs.

Like &nbsp;, this entity reference has no effect outside visual browsers.
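You are not limited to one soft hyphen per word. The sketch below marks several possible split points in the same long word. The choice of split points is my own, so check a dictionary for the proper hyphenation of the words you use. A visual browser should only split the word at one of these points, and only when the word would not otherwise fit on the line.

```html
<p>anti&shy;dis&shy;establish&shy;ment&shy;arian&shy;ism</p>
```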

Pre-formatted text with <pre>

Remember when we covered white space in the last section and I told you that it is always collapsed into a single space? Well, there's one exception: the <pre> element allows you to lay out your text exactly as you want it to appear in a visual user agent. <pre> is a block level element which, to remind you, means that it has space above it and below it.

Using <pre> is simple, let's redo the example we did with <br> above using <pre> instead:

<pre>The Road goes ever on and on
Down from the door where it began.
Now far ahead the Road has gone,
And I must follow, if I can,
Pursuing it with eager feet,
Until it joins some larger way
Where many paths and errands meet.
And whither then? I cannot say.</pre>

View example 7

You've guessed it, this is another element that has no effect outside visual browsers.
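Because <pre> keeps runs of spaces as well as line breaks, it can also hold text that has been lined up in columns. Here is a sketch with made-up items and prices. Visual browsers normally show <pre> content in a fixed-width font, so the prices should line up on the right just as they do in the code.

```html
<pre>Item        Price
Coffee       2.50
Tea          2.00</pre>
```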

Summary

That's nearly it for text elements. Hopefully you now understand most of the elements and entity references that you can use in your XHTML documents to mark up your text. In part three we will be looking at XHTML's three different types of lists: ordered lists, unordered lists and definition lists. We'll also see how you can add graphics to your pages and how you link your documents together and to other documents/sites.

Part three will be available soon. To be informed straight away when part three is released please sign up for our newsletter.
