Web Bits: HTML/XHTML

April 29th, 2008

This first installment of the Web Bits series will cover the basis of all output on the web: HTML. HTML stands for HyperText Markup Language and is the core markup behind the visual rendering of a web page by a browser. HTML has been around since 1990 in one form or another. It has been revised repeatedly as use of the Internet has grown, bugs have been found, and other needs have been identified. The latest version of HTML is 4.01, released in 1999. The most current standard is an integration of XML with HTML, yielding XHTML 1.0 (more on that later). For more information on the history of HTML and its specification, please see the W3C site and their documentation. Much of the information here has been drawn from their records.

What is HTML?

The W3C informational page describes HTML as

the lingua franca for publishing hypertext on the World Wide Web

…but what does that actually mean? As I mentioned before, HTML is a markup language. That means that it is used to denote sets of elements and how they should display when viewed by a web browser. HTML consists of pieces of markup code (called tags) that, when read into a web browser, produce a visual result on your screen. Think of it in writing terms; let’s say we have a document that we’ve written in Microsoft Word, or your word processor of choice, that looks something like this:

As you can see we have some common document elements, such as headings of various sizes, paragraphs, bold and italicized font, images, and lists. HTML contains tags that mimic the visual representation of these elements, but in a universal format that is understood by all web browsers. So, if we were to recreate this document in web page format using HTML, the markup tags needed would look something like this:

<h1>Document Title</h1>
<p>A brief <strong>introduction</strong> of the document contents.</p>
<h2>Section 1</h2>
<img src="smiley_face.jpg">
<p>Some information about this section of the document.</p>
<p>Another paragraph of <em>interesting</em> information.</p>
<h3>Sub Section 1</h3>
<ul><li>Line Item 1</li><li>Line Item 2</li><li>Line Item 3</li></ul>

Nifty, yah? The majority of HTML works by encapsulation, meaning that it consists of sets of tags that are placed before and after something on the page. Let’s go back to the example above to look at how this works in practice. Let’s take the first item, the <h1> or Heading 1 tag:

<h1>Document Title</h1>

The <h1></h1> tag set specifies that whatever information is contained between the pair will display with the formatting of a level 1 headline: namely, very big and very bold. The opening <h1> tag specifies the start of the encapsulation, and the closing </h1> tag signifies where it stops. The majority of HTML tags must have an opening and closing tag set. The closing tag is always marked with a “/” immediately after the “<”. The same pattern applies to the Heading 2 (<h2>), Heading 3 (<h3>), and Paragraph (<p>) tags.

You may notice that the line of code that defines the image on the page operates somewhat differently:

<img src="smiley_face.jpg">

This tag introduces two new concepts for us in HTML: singular tags and properties. The <img> tag is an example of a singular tag (the HTML specification calls these empty elements). This means that it is not an encapsulating tag; it produces a result on its own without having to wrap around anything on the page, so it has no closing tag. However, many singular tags cannot do anything useful until they are given additional settings inside the tag. The HTML specification calls these settings attributes; I’ll refer to them as properties here. The property used with the <img> tag in our case is “src”, short for “source”. This tells the web browser that we’re placing an image on the page, and that the image to use comes from whatever location we specify in the “src” property.

There are many properties that can be applied to HTML tags, and nearly every tag accepts at least a couple of them. For instance, the <img> tag also has properties for “width” and “height” that can be used to define the dimensions of the image on the page.
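As a small illustration of several properties on one tag, the sketch below adds “width” and “height” to the earlier image. The pixel values are made up for the example. The “alt” property (alternative text used when the image cannot be displayed) is an addition not covered above; HTML 4.01 requires it on every <img> tag.

```html
<!-- src: where the image file lives (same sample file as before) -->
<!-- width/height: display size in pixels; 64 is an assumed value -->
<!-- alt: text used if the image cannot be shown -->
<img src="smiley_face.jpg" width="64" height="64" alt="Smiley face">
```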

For those of you keeping score, you’ll notice that I’ve excluded the bit of code on the page that is responsible for the creation of the bulleted list:

<ul><li>Line Item 1</li><li>Line Item 2</li><li>Line Item 3</li></ul>

This group of tags introduces the final critical concept of HTML markup: nesting. Nested tags are tags that are placed within each other, often represented by indentation in the code on the page. The visual result varies, but it usually combines the effects of each level of nested tags. Let’s break down the code behind the list:

<ul></ul>

By itself, this tag set displays nothing. It tells the browser that whatever is placed between these tags is to be displayed as an unordered list, commonly represented with bullets along the left-hand side. To define the bulleted items inside the list, another set of tags must be nested inside this tag set:

<ul><li>Line Item 1</li></ul>

The <li> tag set tells the browser that everything contained within it is to be displayed as a list item inside the unordered list, resulting in a single bulleted item. To add more list items, we simply duplicate this code over and over, resulting in:

<ul><li>Line Item 1</li><li>Line Item 2</li><li>Line Item 3</li></ul>

…yielding three bulleted list items.
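Since nesting is easier to see with indentation, here is the same list written one tag per line; the line breaks and spaces between the tags are not shown on the page, so the result should still be three bullets. The second block is an illustrative extension that is not part of the sample document: a whole list nested inside a list item to produce sub-bullets. The item names in it are made up.

```html
<!-- The same list as above, indented to show the nesting -->
<ul>
  <li>Line Item 1</li>
  <li>Line Item 2</li>
  <li>Line Item 3</li>
</ul>

<!-- Hypothetical extension: a second list nested inside a list item -->
<ul>
  <li>Line Item 1
    <ul>
      <li>Sub Item A</li>
      <li>Sub Item B</li>
    </ul>
  </li>
  <li>Line Item 2</li>
</ul>
```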

XHTML

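I mentioned earlier that HTML has been combined with XML to yield XHTML. XML stands for eXtensible Markup Language, so the combination is the eXtensible HyperText Markup Language. XML is an extremely flexible method of defining custom tags and storing data in a hierarchical (tree-like) format. XHTML 1.0 is HTML 4 rewritten to follow XML’s stricter syntax rules: tag and attribute names are lowercase, every tag is closed, and attribute values are always quoted. Because the result is well-formed XML, XHTML pages can be read and processed by standard XML tools.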
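To make the difference concrete, here is a rough side-by-side sketch. The first block is markup that HTML 4.01 tolerates (uppercase tags, an unquoted value, unclosed paragraphs); the second is the same content written the way XHTML 1.0 expects, with lowercase names, quoted values, and every tag closed, including the singular <img> tag, which closes itself with “ />”. The paragraph text and the “alt” value are made up for the example.

```html
<!-- Loose markup that HTML 4.01 accepts -->
<IMG SRC=smiley_face.jpg ALT="Smiley face">
<P>First paragraph
<P>Second paragraph

<!-- The same content written as XHTML 1.0 -->
<img src="smiley_face.jpg" alt="Smiley face" />
<p>First paragraph</p>
<p>Second paragraph</p>
```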

Wow, that’s all really cool…so how do I use it?

Creating HTML is an exceedingly simple affair. For the purposes of what I’m covering in this article, there are two directions that can be taken to create a web page using HTML:

  1. Use a WYSIWYG program

  2. Use an IDE or text editor

Don’t know what either of those are? No problem; that’s why you’re reading this article.

WYSIWYG (What You See Is What You Get)

If you’re familiar with any word processor, you’ve used a WYSIWYG interface. WYSIWYG typically contains a toolbar that looks something like this:

To make something bold, you highlight it and click the BOLD button. To italicize text, highlight it and select the ITALICS button. To make a set of list items, select some lines of text and hit the LIST button. Easy as that. The idea is that you are making visual changes on the page without having to see the code and processes behind what you are doing, because you are either A) too busy to code it yourself or B) not in need of the technical details behind the scenes.

The WYSIWYG interface has been used in word processors and other office programs (like Excel, PowerPoint, etc.) for a long time. It was eventually included in Microsoft FrontPage, which is a relatively common web editing program included with most past versions of Microsoft Office (it was discontinued in favor of a stand-alone program as of Office 2007).

As with anything, there are advantages and disadvantages to using a WYSIWYG editing program to produce your web pages. The advantage is the speed and ease of use described above. The disadvantage is that, since the program is automagically making visual changes for you, you lack control over exactly how it is doing these things behind the scenes. This often results in sloppy code, which can have an impact on things like search engine optimization (SEO), accessibility for disabled users, and ease of maintenance. The code produced can oftentimes be proprietary, which will end up tying you to that particular program for your web editing, as is the case with Microsoft FrontPage. It doesn’t play well with others.
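As a rough, made-up illustration (not the actual output of FrontPage or any other specific program), the first snippet below shows the general style of markup that visual editors have been known to generate, and the second shows a hand-written version with the same intent. The text content is hypothetical.

```html
<!-- Hypothetical editor-generated markup: styling repeated on every line -->
<p><font face="Arial" size="3"><b>Welcome</b></font></p>
<p><font face="Arial" size="3">Thanks for <i>visiting</i> my page.</font></p>
<p>&nbsp;</p>

<!-- Hand-written version using the tags introduced earlier -->
<h2>Welcome</h2>
<p>Thanks for <em>visiting</em> my page.</p>
```

The second version is shorter, and it tells a search engine or a screen reader that “Welcome” is a heading rather than just bold text.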

IDE (Integrated Development Environment)

The alternative to using a WYSIWYG interface is to use an IDE. IDEs typically are little more than text editors at their core, but normally come with many sidebars and panels chock full of tools to make creation of web pages and scripting as efficient as possible. IDEs are characterized by code hinting (popups that show you what options for code you can enter), the ability to accept plugins, and support for a wide range of programming languages. IDEs are not typically limited to web page creation and can also serve as a tool for developing full-fledged software applications.
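To give a feel for what hand-coding looks like, here is a minimal sketch of a complete page that could be typed into any text editor, saved under a name such as example.html (a hypothetical filename), and opened in a browser. The <!DOCTYPE>, <html>, <head>, <title>, and <body> tags are wrapper tags not covered earlier in this article; the content inside <body> reuses part of the sample document, with an assumed “alt” value on the image.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <!-- The title appears in the browser's title bar, not on the page -->
  <title>Document Title</title>
</head>
<body>
  <!-- Everything visible on the page goes inside body -->
  <h1>Document Title</h1>
  <p>A brief <strong>introduction</strong> of the document contents.</p>
  <h2>Section 1</h2>
  <p><img src="smiley_face.jpg" alt="Smiley face"></p>
  <p>Some information about this section of the document.</p>
</body>
</html>
```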

Which type should I use?

Your preference for creating your web page largely depends on your technical skill level. If you’re a beginner to web page production, and specifically to code, it is recommended that you choose a WYSIWYG editing program to get started. Common WYSIWYG programs are:

  • Adobe Dreamweaver

  • Microsoft FrontPage (replaced by Microsoft Expression Web)

  • Nvu

If you’re more familiar with code or have done HTML in the past and want to pick it up again, it is recommended that you use an IDE. Coding a web page yourself, as opposed to letting a program code it for you, gives you far more control over the code that is produced, hopefully resulting in cleaner markup and better-optimized, more accessible sites. Some popular IDEs include:

  • Aptana

  • Eclipse

  • PSPad

Many WYSIWYG programs also contain IDE capabilities, allowing you to seamlessly switch between visual editing and code editing views. If you’re not certain which you prefer, I highly recommend either Adobe Dreamweaver or Microsoft Expression Web, as they both produce reasonably well-formed code from their WYSIWYG modes and also excel as IDE code development platforms.

Conclusion

HTML is, on the whole, easy to understand, and there is a plethora of ways to go about creating it, for beginners and experts alike, which makes it very easy to get your feet wet.

Ryan Burrell

