Creating Web Pages

This page contains a bit of info about creating web pages at the School of Statistics at the University of Minnesota.

Contents

HTML

In order to make a web page, you need to create a file in HTML (HyperText Markup Language), the language of web pages.

If you want to know a lot about HTML, you need to buy a book. One I can recommend is HTML & XHTML: The Definitive Guide, 6th Edition from O'Reilly (the company that does the ``nutshell handbook'' series).

The definitive reference is the HTML 4.01 Specification. Like all reference manuals, it is hard to read unless you know what you are looking for. Don't look at it yet.

If you want to know just an eensy-teensy bit, the following may help.

HTML Elements

HTML defines lots of thingies called elements. Most elements consist of three parts: a start tag, content, and an end tag. For some elements the end tag is optional. A few elements have only a start tag, no content and no end tag.

The FOO element has start tag <FOO> and end tag </FOO>, where ``FOO'' stands for any element name. Element names are case-insensitive. <FOO>, <foo>, and <fOo> refer to the same element. Use whatever style you prefer.

Some elements control fonts. Others make headings. For example, the HTML

<H3>A Heading.</H3>
<P>
Some stuff, blah, blah. Some <EM>italicized blah.</EM> Some <B>boldface blah.</B>

is displayed by the browser as

A Heading.

Some stuff, blah, blah. Some italicized blah. Some boldface blah.

The P element is a paragraph. So a paragraph starts with <P> and ends with </P>, but the </P> end tag is optional.

The <BR> tag causes a line break. This is one of the rare elements with no content and no end tag.
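As a small sketch of the difference between the two (the text is made up for illustration), the following fragment of a page makes two paragraphs, the second of which has forced line breaks.

```html
<!-- Two paragraphs.  The </P> end tags are optional but harmless. -->
<P>
This is the first paragraph.  The browser wraps it
to fit the window, however the lines are broken here.
</P>
<P>
School of Statistics<BR>
University of Minnesota<BR>
Minneapolis, Minnesota
</P>
```

A browser should show the address on three separate lines, because each BR forces a break, while the line break in the source of the first paragraph is treated like any other white space.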

Overall Structure of an HTML Page

The whole page is an HTML element with start tag <HTML> at the top of the page and end tag </HTML> at the bottom. The page is divided into HEAD and BODY elements.

The HEAD element, with start tag <HEAD> and end tag </HEAD>, contains content that is not displayed in the browser window but can be used in other ways by the browser or by other applications, such as internet search services. It can contain keywords for search services, ratings for internet censorship services, and style sheets that tell web browsers how to lay out your web pages. But the only element we will illustrate here is the TITLE element. The contents of the TITLE element are not displayed in the browser window, but the browser displays them somewhere. On our workstations, Netscape puts the title of a page in the title bar of its window.

After the HEAD element comes the BODY element that contains the stuff that is displayed in the browser window.

At the very top of the page is a document type declaration that says what variant of HTML is being used. This is required, although many browsers do the right thing if it is omitted.

Thus a typical web page looks like this

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<HTML>
<HEAD>
<TITLE>
My Very Own Web Page
</TITLE>
</HEAD>
<BODY>
<H1>My Very Own Web Page</H1>
<P>
Blah, blah, blah.  Bleat, bleat, bleat.
</BODY>
</HTML>
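
The other kinds of HEAD content mentioned above can be sketched the same way. This is only an illustration: the keywords, the description, and the style rule are invented, and none of them is required.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<HTML>
<HEAD>
<TITLE>
My Very Own Web Page
</TITLE>
<!-- Hypothetical keywords and description for search services. -->
<META NAME="keywords" CONTENT="statistics, Minnesota, blah">
<META NAME="description" CONTENT="A page that says blah and bleat.">
<!-- A tiny style sheet: center the top-level headings. -->
<STYLE TYPE="text/css">
H1 { text-align: center }
</STYLE>
</HEAD>
<BODY>
<H1>My Very Own Web Page</H1>
<P>
Blah, blah, blah.  Bleat, bleat, bleat.
</BODY>
</HTML>
```

None of the added HEAD content shows up in the browser window. Only the effect of the style sheet on the heading is visible.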

Putting Your Pages On the Server

The School of Statistics web server at http://www.stat.umn.edu/ serves to the world a bunch of pages owned and controlled by the system administrators and — this is the important part — a bunch of pages owned and written by ordinary users like you.

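To put a page on the net, put the HTML file, say foo.html, in the subdirectory public_html of your home directory, and make sure that the file and both directories are world readable (see Problems below).

That's all there is to it. The file is now a world wide web page, readable anywhere in the world. If your login name is fred, the location (URL) of this page is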
http://www.stat.umn.edu/~fred/foo.html
We'll assume your login name is fred for the rest of this page. Wherever you see fred, replace it with your own login name. For example, this file in my directory is http://www.stat.umn.edu/~charlie/foo.html.

Problems

If you see

File Not Found

The requested URL /~fred/foo.html was not found on this server.

you haven't followed directions. There is no file ~fred/public_html/foo.html (still writing fred for your login name).

If you see

Forbidden

You don't have permission to access /~fred/foo.html on this server.

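the file permissions are wrong. Your home directory, the subdirectory public_html, and the file foo.html must be world readable, and the two directories must also be world searchable (that is what the x is for).
   chmod a+rx ~
   chmod a+rx ~/public_html
   chmod a+r ~/public_html/foo.html
will fix that. (The a makes the change apply to everyone, whatever your umask is.)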

Links to Your Pages from Other Pages

Your page is now on the world wide web. Anyone anywhere in the world can see it, if they know about it already. They can type the URL http://www.stat.umn.edu/~fred/foo.html into their web browser and go to your page. But there are as yet no links to your page from other people's pages, links that allow web surfers to find your page by mousing around the web. Since you don't control any other pages, you have to ask other people to make links to your page.

If you want a link from the department web pages, like the links on

http://www.stat.umn.edu/People/Students.html

to various student home pages, or the links on

http://www.stat.umn.edu/People/Faculty.html

to various faculty home pages, or the links on

http://www.stat.umn.edu/Courses/ClassWebPages.html

to various class home pages, ask the system administrators.

If you want links from elsewhere, you'll have to ask the owners of those pages.

Subdirectories

Anything in your public_html directory is served to the web (if it is world readable). You can make lots of subdirectories if you want. The file qux.html in the subdirectory baz of the subdirectory bar of public_html has location (URL)

http://www.stat.umn.edu/~fred/bar/baz/qux.html
This is derivable from the actual location of the file in the UNIX file system, which is
~fred/public_html/bar/baz/qux.html
You omit /public_html from the file address and add http://www.stat.umn.edu/ in front of ~fred. This is a convention of our server. It's how it translates URLs for user files.

index.html

Another secret code, not part of HTML, enforced by our web server (and many others) is that if a URL refers to a directory and there is a file index.html in that directory, then it displays the web page specified by this file. If there is no file index.html, then it displays a listing of the files in the directory.

You can use this to shorten URLs. If you make your home page

~fred/public_html/index.html
then you can use the URL
http://www.stat.umn.edu/~fred/
to refer to it. (If index.html were tacked on the end here, it would be redundant.)

.htaccess

Yet another secret code, not part of HTML, enforced by our web server is access control. A file .htaccess placed in a subdirectory of ~/public_html controls access to the files in that directory.

For example, an .htaccess file containing

     order deny,allow
     deny from all
     allow from .stat.umn.edu
will only allow persons running web browsers on machines in the stat.umn.edu domain to see the pages.

This section and the preceding one are specific to the Apache HTTP Server, which is what we are currently running.

More HTML

Attributes

Many HTML elements also have attributes. Attributes are assigned values by attribute=value pairs inserted in the start tag of the element. For example if the FOO element has attributes BAR and QUX, the start tag <FOO BAR="baz" QUX=42> assigns the character string "baz" to the BAR attribute and the number 42 to the QUX attribute.

The HTML reference manual says that all values are supposed to be delimited by double or single quotation marks, although values that contain only letters and numbers can be specified without the quotation marks, as the 42 here. The manual goes on to say ``We recommend using quotation marks even when it is possible to eliminate them.''

Like element names, attribute names are case-insensitive. Attribute values are usually case-insensitive, but not always.

Some real HTML examples

<H1 ALIGN="center">A Centered Heading</H1>
makes a centered heading. The BGCOLOR attribute of the BODY element sets a background color for the page, for example
<BODY BGCOLOR="fuchsia">
sets the background color to ``fuchsia'', one of the 16 colors HTML knows by name. Other color specifications are possible. See the reference manual.
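
One of the other color specifications is a hexadecimal red-green-blue triple. As a sketch (the page content is made up), the following should give the same background as the named color above, since fuchsia is defined as #FF00FF.

```html
<!-- #FF00FF is full red, no green, full blue: the color named fuchsia. -->
<BODY BGCOLOR="#FF00FF">
<H1 ALIGN="center">A Centered Heading</H1>
<P>
Blah, blah, blah.
</BODY>
```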

Links

We finally get to the whole point of web pages. Web pages are hypertext joined by links. How do you make links between your pages and other pages?

Links are made with the HTML A element. The ``A'' is for anchor, which is what HTML calls this element. Anchors come in two kinds: source and destination. The first is far more important.

Source Anchors

A source anchor makes a hyperlink that, when clicked on, moves to the location (URL) specified by the HREF attribute. The content of the A element is displayed as underlined text in a special color that indicates a hyperlink. For example, the HTML

Here we have a link to the <A HREF="http://www.stat.umn.edu/">School of Statistics</A> at the <A HREF="http://www.umn.edu/">University of Minnesota</A>.
<P>
Here we have an example of how <EM>not</EM> to write a link. To go to the School of Statistics <A HREF="http://www.stat.umn.edu/">click here</A>.

is displayed by the browser as

Here we have a link to the School of Statistics at the University of Minnesota.

Here we have an example of how not to write a link. To go to the School of Statistics click here.

Destination Anchors

The typical source anchor (colloquially called a ``link'') refers to a whole web page. When you click on the link, you go to the top of the referenced page.

A more complicated use of the anchor element is to mark a destination within a web page. This uses the NAME attribute of the A element. For example the heading of this section is specified in HTML by

<H4><A NAME="dest">Destination Anchors</A></H4>
Then this particular spot in the page is specified by the URL
http://www.stat.umn.edu/~charlie/web.html#dest
The ``#dest'' on the end specifies the position of this anchor within the page http://www.stat.umn.edu/~charlie/web.html.

Every heading in this page is a destination anchor like this. They are all referred to by links in the table of contents section at the top of the page.

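You can use the ID attribute instead of the NAME attribute to specify destination anchors. According to the reference manual, the two share one name space, so any particular value can be used only once within a web page. The practical difference is that ID can be put on almost any element, the heading itself for example, whereas a NAME destination needs an A element.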
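
Here is a sketch of how links to destination anchors might look. The fragment is illustrative rather than copied from this page: ``dest'' is the anchor name from the example above, and the name ``urls'' is hypothetical.

```html
<!-- A link to a destination anchor in another page. -->
<A HREF="http://www.stat.umn.edu/~charlie/web.html#dest">Destination
Anchors</A>

<!-- Within the same page, the fragment identifier alone is enough. -->
<A HREF="#dest">Destination Anchors</A>

<!-- A destination made with ID instead of NAME, put directly on
     the heading, and a link to it. -->
<H3 ID="urls">URLs</H3>
<A HREF="#urls">URLs</A>
```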

URLs

From the examples we have seen above, we know most of what is useful about URLs. A URL (Uniform Resource Locator) like our example

http://www.stat.umn.edu/~charlie/web.html#dest
consists of four parts
  1. the access protocol (http)
  2. the host name of the server (www.stat.umn.edu)
  3. the path to the resource on the server (/~charlie/web.html)
  4. the fragment identifier (#dest)
URLs for web pages use the HTTP (HyperText Transfer Protocol) access protocol or its encrypted variant HTTPS. So a web page URL begins http:// (or https://) followed by the server name. The fragment identifier is usually absent.

URLs can also be specified relative to the URL of the current page. The first two parts (access protocol and server host name) are absent and the path is given relative to the path of the current web page.

Some examples are
whatsis.html
   The file whatsis.html in the same directory as the current page.
frammis/whatsis.html
   The file whatsis.html in the subdirectory frammis of the directory containing the current page.
../frammis/whatsis.html
   The file whatsis.html in the subdirectory frammis of the directory above the directory containing the current page. (.. is UNIX for ``one directory up''. It is used the same way in URLs.)
/frammis/whatsis.html
   The file whatsis.html in the subdirectory frammis of what the server considers its root directory.
The last example specifies the same page as the absolute URL http://www.stat.umn.edu/frammis/whatsis.html. The others depend on the URL of the current page.
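
As a sketch, suppose the current page is http://www.stat.umn.edu/~fred/bar/index.html (a hypothetical page, continuing with the login name fred). Then the four kinds of relative URL above could be used in links like these. The comments say what each should resolve to.

```html
<!-- http://www.stat.umn.edu/~fred/bar/whatsis.html -->
<A HREF="whatsis.html">same directory</A>

<!-- http://www.stat.umn.edu/~fred/bar/frammis/whatsis.html -->
<A HREF="frammis/whatsis.html">subdirectory</A>

<!-- http://www.stat.umn.edu/~fred/frammis/whatsis.html -->
<A HREF="../frammis/whatsis.html">up one directory, then down</A>

<!-- http://www.stat.umn.edu/frammis/whatsis.html -->
<A HREF="/frammis/whatsis.html">from the server's root</A>
```

The advantage of the first three is that they keep working if the whole directory tree is moved to another place or another server.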

Images

Images are included on a page with the IMG element. The SRC attribute gives the URL of the image, usually in JPEG or GIF format. The HEIGHT and WIDTH attributes give the height and width of the image in pixels, used by the browser to lay out the rest of the page while the image is loading. Omit HEIGHT and WIDTH if you don't know what they are.

The ALT attribute gives a short description to be displayed for users that have images turned off or are visually impaired (see Accessibility). The ALT attribute should always be included. The reference manual says it must be included. Omitting ALT is just rude.

The ALIGN attribute can be used to place the image to the left or right of the text or to align the image even with the top, middle, or bottom of the line of text in which it occurs.
For example, the HTML

<IMG SRC="bell.gif" ALT="The Bell Curve" WIDTH=100 HEIGHT=61 ALIGN="left"> Here we have the famous bell curve, much used and abused.
<P>
Everyone believes in the law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact (G. Lippmann).

is displayed by the browser as

[image: The Bell Curve] Here we have the famous bell curve, much used and abused.

Everyone believes in the law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact (G. Lippmann).
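
Two further cases can be sketched as follows. The file names are hypothetical. The first image has no HEIGHT and WIDTH because we are pretending not to know them. The second is pure decoration, and for that case the reference manual recommends an empty ALT rather than irrelevant text.

```html
<!-- Size unknown: omit HEIGHT and WIDTH, keep ALT. -->
<IMG SRC="histogram.gif" ALT="Histogram of the residuals">

<!-- Decorative image: ALT is present but empty. -->
<IMG SRC="squiggle.gif" ALT="">
```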

Use the Source, Luke

You can see how any web page works by looking at the HTML for it. In Firefox, the ``Page Source'' option on the ``View'' menu shows the source (the HTML) for the current page.

HTML Validation

To debug your HTML, it is helpful to have a computer check it for validity. Several web sites do this, the most official being the one at W3C (the World Wide Web Consortium). Go to
http://validator.w3.org/
type in a URL, and it will tell you whether it is valid HTML.

Accessibility

``Accessibility'' is computer industry jargon for making computers usable by people with disabilities. Basically, it is just common sense, but common sense that has escaped many web page designers. It includes For more information, see the Web Content Accessibility Guidelines produced by the Web Accessibility Initiative.

See also the University of Minnesota administrative policy on Accessibility of Information Technology.

Content

It should go without saying that the point of a web page is to say something the world wants to see, at least some people out there. But it doesn't. Content is far more important than all the bells and whistles, but page after page are full of bells and whistles signifying nothing.

Before you start worrying about colors and layout, worry about content. Before you start worrying about frames or JavaScript or VBScript or the other bells and whistles du jour, you might consider that these are already obsolete. There are better ways to do all these things that are appearing in the latest releases of web browsers. Why join the kluge of the month club?

Some pages about how not to write web pages are the HTML Hell Page, the Top Ten Mistakes in Web Design, the Top Ten Mistakes Revisited Three Years Later, and The Top Ten New Mistakes of Web Design.


Author: Charles Geyer (charlie@stat.umn.edu). Comments or corrections gratefully accepted.

Valid HTML 4.01 Transitional