This page contains a bit of info about creating web pages at the School of Statistics at the University of Minnesota.
In order to make a web page, you need to create a file in HTML (HyperText Markup Language), the language of web pages.
If you want to know a lot about HTML, you need to buy a book. One I can recommend is HTML & XHTML: The Definitive Guide, 6th Edition from O'Reilly (the company that does the ``nutshell handbook'' series).
The definitive reference is the HTML 4.01 Specification. Like all reference manuals, it is hard to read unless you know what you are looking for. Don't look at it yet.
If you want to know just an eensy-teensy bit, the following may help.
HTML defines lots of thingies called elements. Most elements consist of three parts: a start tag, content, and an end tag. For some elements the end tag is optional. A few elements have only a start tag, no content and no end tag.
The FOO element has start tag <FOO> and end tag </FOO>, where ``FOO'' stands for any element name. Element names are case-insensitive. <FOO>, <foo>, and <fOo> refer to the same element. Use whatever style you prefer.
Some elements control fonts. Others make headings.
HTML | Browser Display |
---|---|
<H3>A Heading.</H3> <P> Some stuff, blah, blah. Some <EM>italicized blah.</EM> Some <B>boldface blah.</B> | A Heading. Some stuff, blah, blah. Some italicized blah. Some boldface blah. |
The P element is a paragraph. So a paragraph starts with <P> and ends with </P>, but the </P> end tag is optional.
The <BR> tag causes a line break. This is one of the rare elements with no content and no end tag.
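As a small illustration of where a line break is handy, here is a sketch of a postal address inside a single paragraph. The name and address are made up for the example.

```html
<!-- Each <BR> starts a new line without starting a new paragraph.
     The name and address are hypothetical. -->
<P>
Fred Example<BR>
123 Main Street<BR>
Anytown
</P>
```

Without the <BR> tags the browser would run the three lines together, since it treats line breaks in the HTML file as ordinary spaces.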
The whole page is an HTML element with start tag <HTML> at the top of the page and end tag </HTML> at the bottom. The page is divided into HEAD and BODY elements. The HEAD element, with start tag <HEAD> and end tag </HEAD>, contains content that is not displayed by the web browser but can be used in other ways by the browser or by other applications, such as internet search services. It can contain keywords for search services, ratings for internet censorship services, and style sheets that tell web browsers how to lay out your web pages. But the only element we will illustrate here is the TITLE element. The contents of the TITLE element are not displayed in the browser window, but the browser displays them somewhere. On our workstations, Netscape puts the title of a page in the title bar of its window.
After the HEAD element comes the BODY element that contains the stuff that is displayed in the browser window.
At the very top of the page is a document type declaration that says what variant of HTML is being used. This is required, although many browsers do the right thing if it is omitted.
Thus a typical web page looks like this
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <HTML>
    <HEAD>
    <TITLE> My Very Own Web Page </TITLE>
    </HEAD>
    <BODY>
    <H1>My Very Own Web Page</H1>
    <P>
    Blah, blah, blah.  Bleat, bleat, bleat.
    </BODY>
    </HTML>
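The HEAD can hold more than a TITLE. The following is only a sketch of what the keywords and style sheet mentioned above might look like, using the META and LINK elements of HTML 4.01. The keyword list and the file name mystyle.css are assumptions made up for the example.

```html
<HEAD>
<TITLE>My Very Own Web Page</TITLE>
<!-- Keywords that search services may read (hypothetical values) -->
<META NAME="keywords" CONTENT="statistics, homework, fred">
<!-- A style sheet telling browsers how to lay out the page.
     mystyle.css is a hypothetical file in the same directory. -->
<LINK REL="stylesheet" TYPE="text/css" HREF="mystyle.css">
</HEAD>
```

Like <BR>, the META and LINK elements have only a start tag. None of this shows up in the browser window.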
The School of Statistics web server at http://www.stat.umn.edu/ serves to the world a bunch of pages owned and controlled by the system administrators and — this is the important part — a bunch of pages owned and written by ordinary users like you.
To put a page on the net, put the file, say foo.html, in the subdirectory public_html of your home directory. Its URL is then

    http://www.stat.umn.edu/~fred/foo.html

We'll assume your login name is fred for the rest of this page. Wherever you see fred, replace it with your own login name. For example, this file in my directory is http://www.stat.umn.edu/~charlie/foo.html.
If you see

    File Not Found
    The requested URL /~fred/foo.html was not found on this server.

you haven't followed directions. There is no file ~fred/public_html/foo.html (still writing fred for your login name).
If you see

    Forbidden
    You don't have permission to access /~fred/foo.html on this server.

the file permissions are wrong. Your home directory, the subdirectory public_html, and the file foo.html must be world readable.

    chmod +rx ~
    chmod +rx ~/public_html
    chmod +r ~/public_html/foo.html

will fix that.
Your page is now on the World Wide Web. Anyone anywhere in the world can see it, if they already know about it. They can type the URL http://www.stat.umn.edu/~fred/foo.html into their web browser and go to your page. But there are as yet no links to your page from other people's pages, links that allow web surfers to find your page by mousing around the web. Since you don't control any other pages, you have to ask other people to make links to your page.
If you want a link from the department web pages, like the links to various student home pages, to various faculty home pages, or to various class home pages, ask the system administrators.
If you want links from elsewhere, you'll have to ask the owners of those pages.
Anything in your public_html directory is served to the web (if it is world readable). You can make lots of subdirectories if you want. The file qux.html in the subdirectory baz of the subdirectory bar of public_html has location (URL)
    http://www.stat.umn.edu/~fred/bar/baz/qux.html

This is derivable from the actual location of the file in the UNIX file system, which is

    ~fred/public_html/bar/baz/qux.html

You omit /public_html from the file address and add http://www.stat.umn.edu/ in front of ~fred. This is a convention of our server. It's how it translates URLs for user files.
Another secret code, not part of HTML, enforced by our web server (and many others) is that if a URL refers to a directory and there is a file index.html in that directory, then it displays the web page specified by this file. If there is no file index.html, then it displays a listing of the files in the directory.
You can use this to shorten URLs. If you make your home page
    ~fred/public_html/index.html

then you can use the URL

    http://www.stat.umn.edu/~fred/

to refer to it. (If index.html were tacked on the end here, it would be redundant.)
Yet another secret code, not part of HTML, enforced by our web server is access control. A file .htaccess placed in a subdirectory of ~/public_html controls access to the files in that directory.
For example, an .htaccess file containing
    order deny,allow
    deny from all
    allow from .stat.umn.edu

will only allow persons running web browsers on machines in the stat.umn.edu domain to see the pages.
This section and the preceding one are specific to the Apache HTTP Server, which is what we are currently running.
Many HTML elements also have attributes. Attributes are assigned values by attribute=value pairs inserted in the start tag of the element. For example, if the FOO element has attributes BAR and QUX, the start tag <FOO BAR="baz" QUX=42> assigns the character string "baz" to the BAR attribute and the number 42 to the QUX attribute.
The HTML reference manual says that all values are supposed to be delimited by double or single quotation marks, although values that contain only letters and numbers can be specified without the quotation marks, as the 42 here. The manual goes on to say ``We recommend using quotation marks even when it is possible to eliminate them.''
Like element names, attribute names are case-insensitive. Attribute values are usually case-insensitive, but not always.
Some real HTML examples:

    <H1 ALIGN="center">A Centered Heading</H1>

makes a centered heading. The BGCOLOR attribute of the BODY element sets a background color for the page. For example,

    <BODY BGCOLOR="fuchsia">

sets the background color to ``fuchsia'', one of the 16 colors HTML knows by name. Other color specifications are possible. See the reference manual.
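One of those other specifications is a six-digit hexadecimal code giving the amounts of red, green, and blue. As a sketch, the following start tag asks for the same color as the name fuchsia does.

```html
<!-- "#FF00FF" is full red, no green, full blue: the same color as fuchsia.
     The quotation marks are required here because the value contains "#",
     which is neither a letter nor a number. -->
<BODY BGCOLOR="#FF00FF">
```

This is a case where the quotation marks around the value cannot be eliminated.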
We finally get to the whole point of web pages. Web pages are hypertext joined by links. How do you make links between your pages and other pages?
Links are made with the HTML A element. The ``A'' is for anchor, which is what HTML calls this element. Anchors come in two kinds: source and destination. The first is far more important.
A source anchor makes a hyperlink that when clicked on moves to the location (URL) specified by the HREF attribute. The content of the A element is underlined text in a special color that indicates a hyperlink.
HTML | Browser Display |
---|---|
Here we have a link to the <A HREF="http://www.stat.umn.edu/">School of Statistics</A> at the <A HREF="http://www.umn.edu/">University of Minnesota</A>. | Here we have a link to the School of Statistics at the University of Minnesota. |
<P> Here we have an example of how <EM>not</EM> to write a link. To go to the School of Statistics <A HREF="http://www.stat.umn.edu/">click here</A>. | Here we have an example of how not to write a link. To go to the School of Statistics click here. |
The typical source anchor (colloquially called a ``link'') refers to a whole web page. When you click on the link, you go to the top of the referenced page.
A more complicated use of the anchor element is to mark a destination within a web page. This uses the NAME attribute of the A element. For example the heading of this section is specified in HTML by
    <H4><A NAME="dest">Destination Anchors</A></H4>

Then this particular spot in the page is specified by the URL

    http://www.stat.umn.edu/~charlie/web.html#dest

The ``#dest'' on the end specifies the position of this anchor within the page http://www.stat.umn.edu/~charlie/web.html.
Every heading in this page is a destination anchor like this. They are all referred to by links in the table of contents section at the top of the page.
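You can use the ID attribute instead of the NAME attribute to specify destination anchors. The main difference is that ID can be put on almost any element, so a heading can be a destination by itself, while NAME makes a destination only on an A element. ID values and anchor NAME values share the same name space, and each value must be unique within a web page.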
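To make this concrete, here is a sketch of a tiny table of contents with links to two destinations in the same page, one marked with NAME and one with ID. The section titles and anchor names are made up for the example.

```html
<!-- Source anchors: an HREF that is only "#name" points into the current page.
     The names "intro" and "links" are hypothetical. -->
<UL>
<LI><A HREF="#intro">Introduction</A>
<LI><A HREF="#links">Making Links</A>
</UL>

<!-- Destination marked with the NAME attribute of an A element -->
<H4><A NAME="intro">Introduction</A></H4>
<P>
Blah, blah, blah.

<!-- Destination marked with the ID attribute of the heading itself -->
<H4 ID="links">Making Links</H4>
<P>
Bleat, bleat, bleat.
```

Clicking on either entry in the list moves the browser to the corresponding heading.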
From the examples we have seen above, we know most of what is useful about URLs. A URL (Uniform Resource Locator) like our example
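    http://www.stat.umn.edu/~charlie/web.html#dest

consists of four bits: the protocol (http), the server name (www.stat.umn.edu), the path to the file on that server (/~charlie/web.html), and the position within the page (#dest). A URL in a link can leave out the leading bits, in which case they are taken from the URL of the current page.

Some examples are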
URL | Meaning |
---|---|
whatsis.html | The file whatsis.html in the same directory as the current page. |
frammis/whatsis.html | The file whatsis.html in the subdirectory frammis of the directory containing the current page. |
../frammis/whatsis.html | The file whatsis.html in the subdirectory frammis of the directory above the directory containing the current page. (.. is UNIX for ``one directory up''. It is used the same way in URLs.) |
/frammis/whatsis.html | The file whatsis.html in the subdirectory frammis of what the server considers its root directory. |
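Here is a sketch of how these four forms might look in actual links. It assumes, purely for illustration, that the current page is http://www.stat.umn.edu/~fred/bar/index.html. The comments show the full URL each link would then refer to.

```html
<!-- Assumed current page: http://www.stat.umn.edu/~fred/bar/index.html -->

<!-- Refers to http://www.stat.umn.edu/~fred/bar/whatsis.html -->
<A HREF="whatsis.html">same directory</A>

<!-- Refers to http://www.stat.umn.edu/~fred/bar/frammis/whatsis.html -->
<A HREF="frammis/whatsis.html">subdirectory frammis</A>

<!-- Refers to http://www.stat.umn.edu/~fred/frammis/whatsis.html -->
<A HREF="../frammis/whatsis.html">up one directory, then into frammis</A>

<!-- Refers to http://www.stat.umn.edu/frammis/whatsis.html -->
<A HREF="/frammis/whatsis.html">starting from the server's root directory</A>
```

Links written the first three ways keep working if you move a whole directory of pages somewhere else, which is a good reason to prefer them for links among your own pages.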
Images are included on a page with the IMG element. The SRC attribute gives the URL of the image, usually in JPEG or GIF format. The HEIGHT and WIDTH attributes give the height and width of the image in pixels, used by the browser to lay out the rest of the page while the image is loading. Omit HEIGHT and WIDTH if you don't know what they are.
The ALT attribute gives a short description to be displayed for users that have images turned off or are visually impaired (see Accessibility). The ALT attribute should always be included. The reference manual says it must be included. Omitting ALT is just rude.
The ALIGN attribute can be used to place the image to the left or right of the text or to align the image even with the top, middle, or bottom of the line of text in which it occurs.
HTML | Browser Display |
---|---|
<IMG SRC="bell.gif" ALT="The Bell Curve" WIDTH=100 HEIGHT=61 ALIGN="left"> Here we have the famous bell curve, much used and abused. <P> Everyone believes in the law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact (G. Lippmann). | Here we have the famous bell curve, much used and abused. Everyone believes in the law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact (G. Lippmann). |
You can see how any web page works by looking at the HTML for it. In Firefox, the ``Page Source'' option on the ``View'' menu shows the source (the HTML) for the current page.
To check a page, go to

    http://validator.w3.org/

type in a URL, and it will tell you whether the page is valid HTML.
See also the University of Minnesota administrative policy on Accessibility of Information Technology.
It should go without saying that the point of a web page is to say something the world wants to see, or at least some people out there. But it doesn't. Content is far more important than all the bells and whistles, but page after page is full of bells and whistles signifying nothing.
Before you start worrying about colors and layout, worry about content. Before you start worrying about frames or JavaScript or VBScript or the other bells and whistles du jour, you might consider that these are already obsolete. Better ways to do all these things are appearing in the latest releases of web browsers. Why join the kluge of the month club?
Some pages about how not to write web pages are the HTML Hell Page, the Top Ten Mistakes in Web Design, the Top Ten Mistakes Revisited Three Years Later, and The Top Ten New Mistakes of Web Design.