The <META> tag
==============

<META> tags can only appear in the <HEAD> element of a page. (Note that whereas the <BODY> element holds the CONTENTS OF the page, the <HEAD> element holds information ABOUT the page: this can usefully include guidelines for "how to index/catalogue this page".)

Each <META> tag is self-contained: the two attributes are inside the angle-brackets, and there is no end-tag required.

The tag contains just two attributes:
  The first is either NAME= or HTTP-EQUIV=
  The second is always CONTENT=
(It may also contain a third: LANG= )

HTTP-EQUIV is used whenever the information is relevant to the server, the browser, or the transfer protocol they use; if the information has no such relevance, then NAME should be used.

The idea (and the origin of the name META) is that an arbitrary amount of extra information can be furnished in an open-ended way. Hence a (contrived) <META> tag with a given NAME and CONTENT behaves like a (non-existent!) tag of that name enclosing that content.

The only NAMEs which are worth considering are those which do have an accepted (ie W3C-defined) usage. Here are some examples, whose meanings are largely self-explanatory:

Note the first one, which seems to be new to HTML4.0, and is part of the ability to specify the character set in use, in anticipation of eventual use of "Unicode" or ISO-10646. Documents in Latin-1 for English and Western European languages should use ISO-8859-1; the Acorn Latin-1 character set is identical to this EXCEPT for character codes 128 to 159, which are NOT defined in ISO Latin 1 (so don't use bullets, "sexed" quote marks, em-dashes, etc!)

The second one gives an example of the LANG attribute; here "English".

Search Engine Robots
--------------------

Probably the most useful purpose of some of these META tags is to furnish Search Engines with information that enables them to catalogue your site "correctly" (ie, in the category you would prefer).
In the absence of any other information, they would probably analyse only the <TITLE> and the first 50 words or so of the <BODY>; but careful choice of the CONTENT associated with the NAME="Description" and NAME="Keywords" items can make them work in a far less haphazard way. Please note, however, that if you put a ridiculously large amount of CONTENT in the "Keywords" tag, some Search Engines will assume you are trying to pull a fast one, and reject the site completely!

There is a further <META> NAME which a small but increasing number of Search Engine "spiders" are beginning to respond to:

  <meta name="ROBOTS" content="ALL | INDEX | NOFOLLOW | NOINDEX">

where the CONTENT can contain one item from the list, or two separated by a comma.

Thus, if you have a site where you would prefer some of the leaf-pages NOT to be accessed directly out of context, but only via your main index page, you could put a tag in the <HEAD> part of those leaf pages saying

  <meta name="ROBOTS" content="NOINDEX">

Or, if you did not want ANY of the leaf pages catalogued, put

  <meta name="ROBOTS" content="INDEX, NOFOLLOW">

in the <HEAD> section of your index.html page.

However, please don't rely on all robots doing what you tell them:
  You can't force a robot to catalogue you if it doesn't feel like it;
  You can't force a robot to ignore you if it ignores META tags anyway!

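To make the search-engine tags concrete, here is a minimal sketch of a leaf page that combines them. The page title, description text, keyword list and the index.html link target are invented purely for illustration, and "Keywords" is written with NAME, following the HTML 4.0 specification's own example. It shows only where the tags go; as noted above, a given robot may or may not honour them.

```html
<!-- Illustrative leaf page: title, description and keywords are made up -->
<HTML>
<HEAD>
<TITLE>Widget care: cleaning and storage</TITLE>
<!-- A short summary that a search engine may show in its listings -->
<META NAME="Description" CONTENT="How to clean and store widgets safely.">
<!-- A modest keyword list; a very long one may get the site rejected -->
<META NAME="Keywords" CONTENT="widgets, cleaning, storage">
<!-- Ask robots not to catalogue this leaf page directly -->
<META NAME="ROBOTS" CONTENT="NOINDEX">
</HEAD>
<BODY>
<P>...page contents...</P>
<P><A HREF="index.html">Back to the main index</A></P>
</BODY>
</HTML>
```

For the other case described above, the leaf pages would carry no ROBOTS tag at all, and the index page alone would carry CONTENT="INDEX, NOFOLLOW".
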
Obscure Meta Tags: Refresh
--------------------------

Not surprisingly, there are plenty of instances of "extension" META tag attributes being invented; one which crops up from time to time (and is presumably put there to deliberately confuse visitors who are NOT using the particular commercial browser whose vendors introduced it) is

  <META HTTP-EQUIV="refresh" CONTENT="[no_of_seconds][comma][full_URL]">

At least, that's what the W3C HTML4.0 specs say; but it appears that the form that gets used in practice, and is more likely to be understood, is

  <META HTTP-EQUIV="REFRESH" CONTENT="[no_of_seconds][semicolon]URL=[URL]">

The intention here is that the index page containing that tag will be displayed for a given number of seconds after being fetched, and then automatically replaced by a further page as referenced; eg

  <META http-equiv="refresh" content="5; url='indx2.html'">

where the single quote marks are optional and best omitted.

W3C have heard of this tag, and strongly recommend that such an index page should contain a hyperlink to the next page, so that visitors using a browser that does not implement that extension are not left stuck looking at an unhelpful "transient" page; but do any of the perpetrators of this sort of thing remember to do so?

(Off-topic) robots.txt
----------------------

A Web server may contain a file called "robots.txt" in its root, which can contain instructions to visiting spiders/robots, and allows much more control than the simpler <META> method above. However, it ONLY works if it is in the root directory of the server, and so is NOT applicable to sites within a user directory.

John Alldred   31 January 1999
john@protovale.co.uk
http://www.protovale.co.uk/john/
http://www.argonet.co.uk/users/protovale/john.html