What is HyperText Markup Language (HTML)?

Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications. Together with Cascading Style Sheets (CSS) and JavaScript (JS), it forms a triad of cornerstone technologies for the World Wide Web (WWW). A web browser typically receives HTML documents from a web server or from local storage and renders them into multimedia web pages for the user. HTML describes the structure of a web page semantically, and it originally also included cues for the document's appearance.

HTML elements are the building blocks of a web page, which can contain anything from images to interactive forms. HTML gives text its structure by marking up semantics such as headings, paragraphs, lists, links, and quotes. Elements are delineated by tags, which are keywords surrounded by angle brackets. Some tags, such as <img /> and <input />, introduce content into the page directly, while others, such as <p>…</p>, surround text and describe it. The browser never displays the tags themselves; it only uses them to interpret the page's content.
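As a rough illustration of the elements described above, a minimal page might look like the following sketch. The file name photo.jpg, the example.com URL, and the field name "visitor" are placeholders invented for this example.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example page</title>
</head>
<body>
  <!-- Heading and paragraph: paired tags that surround text -->
  <h1>My first page</h1>
  <p>This paragraph is wrapped in an opening and a closing tag.</p>

  <!-- A list containing a link -->
  <ul>
    <li>First item</li>
    <li><a href="https://example.com/">A link to another page</a></li>
  </ul>

  <!-- A quote -->
  <blockquote>Quoted text goes here.</blockquote>

  <!-- Elements that introduce content themselves: an image and a form field -->
  <img src="photo.jpg" alt="A placeholder photo">
  <form>
    <label>Name: <input type="text" name="visitor"></label>
  </form>
</body>
</html>
```

Opened in a browser, this shows only the text, the image, and the form field; none of the tags appear on screen.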

HTML can also embed programs written in a scripting language such as JavaScript, which affect the behavior of the page, and include Cascading Style Sheets, which define how the content looks and is laid out.
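A small sketch of what such embedding can look like, assuming a page with one paragraph and one button. The ids "greeting" and "change" are names made up for this example.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Styles and scripts</title>
  <!-- CSS embedded in the document: controls appearance -->
  <style>
    p { color: navy; font-family: sans-serif; }
  </style>
</head>
<body>
  <p id="greeting">Hello</p>
  <button type="button" id="change">Change the text</button>

  <!-- JavaScript embedded in the document: controls behavior -->
  <script>
    document.getElementById("change").addEventListener("click", function () {
      document.getElementById("greeting").textContent = "Hello, world!";
    });
  </script>
</body>
</html>
```

The style rule changes how the paragraph looks, while the script changes what it says when the button is clicked.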

Markup

Markup in HTML is more than just tags, although tags play a major part. Attributes are another significant component, since they customize and further define a tag. Markup also includes character-based data types, character references, and entity references. Most tags come in pairs, like the <p>…</p> pair shown earlier: the first is called the start (or opening) tag and the second the end (or closing) tag. Some tags, such as <img />, represent empty elements and are therefore unpaired.

Another important component is the document type declaration, which triggers standards mode rendering. A browser will still display a page without it, but it falls back to a legacy "quirks mode" that may render the page inconsistently. The general structure of an HTML element is <tag attribute1="value1" attribute2="value2">content</tag>. Empty elements such as <img /> differ in that they take attributes but no content (<img src="image.jpg" />). The line break <br> is also an empty element; it may carry attributes like any other tag, but it is usually written bare because it needs none.
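The pieces described above can be seen side by side in a short sketch. The URL, the file name image.jpg, and the text content are placeholders, not taken from any real page.

```html
<!DOCTYPE html> <!-- document type declaration: requests standards mode -->
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Tag anatomy</title>
</head>
<body>
  <!-- Paired tags: opening tag with attributes, content, closing tag -->
  <a href="https://example.com/" title="Example site">Visit the example site</a>

  <!-- Empty element: attributes only, no content and no closing tag -->
  <img src="image.jpg" alt="Placeholder image">

  <!-- Empty element usually written with nothing but its name -->
  <p>First line<br>Second line</p>
</body>
</html>
```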

Structural markup indicates the purpose of text rather than how it should be rendered. For instance, <h2>Heading</h2> marks “Heading” as a second-level heading. The markup itself does not prescribe a specific appearance, but browsers apply default styles, so the exact size varies by browser while a second-level heading typically appears smaller than <h1>Heading</h1>.

Presentational markup, on the other hand, indicates the visual appearance of content, such as bold, italic, or underlined text, regardless of its purpose. Note, however, that most presentational markup was deprecated as of HTML 4.0 in favor of CSS styling.
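To make the contrast concrete, here is a hedged sketch of structural markup, deprecated presentational markup, and a CSS equivalent. The text and the chosen colors and sizes are arbitrary examples.

```html
<!-- Structural markup: states what the text is; the browser picks a default look -->
<h1>Main heading</h1>
<h2>Second-level heading</h2>

<!-- Presentational markup (deprecated in HTML 4): states how the text looks -->
<center><font color="red" size="5">Sale ends today</font></center>

<!-- A similar appearance with CSS, leaving the markup structural -->
<p style="text-align: center; color: red; font-size: 1.5em;">Sale ends today</p>
```

In practice the CSS would usually live in a stylesheet rather than in a style attribute; the inline form is used here only to keep the example short.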

Hypertext markup turns parts of a document into links to other documents, whether stored locally or on a server, or to another part of the same document.

Attributes are name-value pairs written inside a tag, and their meaning depends on the tag. A few attributes are common to nearly all tags and serve to identify or describe elements, for example so that a stylesheet or script can target an element and its content. These are: id, which provides a document-wide unique identifier for the element and is commonly used with stylesheets and scripts; class, which groups similar elements under a shared name for semantic or presentational purposes, so that unlike id the same value may appear on many elements; title, which attaches a short explanation to an element and is typically shown by browsers as a tooltip when the user hovers over it; and lang, which identifies the natural language of the element's content.
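The three kinds of link and the four common attributes can be sketched as follows. All file names, URLs, and attribute values here are invented for illustration.

```html
<!-- Link to a page on another server -->
<a href="https://example.com/about.html">About (remote page)</a>

<!-- Link to a page stored alongside this one -->
<a href="contact.html">Contact (local page)</a>

<!-- Link to another part of the same document, targeted by id -->
<a href="#summary">Jump to the summary</a>

<!-- id: unique in the document; class: shared by similar elements;
     title: typically shown as a tooltip; lang: language of the content -->
<p id="summary" class="note" title="A short recap" lang="fr">Bonjour tout le monde</p>
<p class="note">Another element in the same class.</p>
```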

As of version 4.0, HTML defines 252 character entity references and 1,114,050 numeric character references, both of which allow characters to be written via simple markup rather than literally. A literal character and its markup counterpart are considered equivalent and are rendered identically. This matters most for characters such as < and &, which can be written as &lt; and &amp; so that the browser treats them as text rather than as markup.
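A few character references in use, as a minimal sketch with arbitrary example sentences:

```html
<!-- Named entity references -->
<p>5 &lt; 7 &amp;&amp; 7 &gt; 5</p>
<!-- displayed as: 5 < 7 && 7 > 5 -->

<!-- One named and two numeric references (decimal and hexadecimal) -->
<p>&copy; &#169; &#xA9;</p>
<!-- all three display the copyright sign -->

<!-- Writing a tag as visible text instead of markup -->
<p>Use &lt;br&gt; for a line break.</p>
<!-- displayed as: Use <br> for a line break. -->
```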

HTML Versions

Semantic HTML is a way of writing HTML that emphasizes the meaning of the encoded information over its presentation. HTML has included semantic markup from its inception, but it has also included presentational tags such as <font>, <i>, and <center>. Good semantic HTML improves the accessibility of web documents, and most presentational tags have since been deprecated in the HTML and XHTML recommendations.

HTML has gone through several notable versions, and in its early years no clear standards existed. Although HTML was originally conceived as a semantic language, browser vendors pushed many presentational elements and attributes into it. To return HTML to its semantic role, the World Wide Web Consortium (W3C) developed style languages such as CSS and the Extensible Stylesheet Language (XSL) to take over presentation. HTML 4 in particular exists in several variations, which differ along two axes: SGML-based HTML versus XML-based HTML (Extensible HTML, or XHTML) on one, and Strict versus Transitional (or Loose) versus Frameset on the other.

SGML-based HTML and XHTML differ in the syntax rules each one follows. The W3C intended XHTML 1.0 to be identical to HTML 4.01, except where the limitations of XML compared with the more complex SGML require workarounds. Because the two are so similar, people sometimes confuse them, but HTML 4.01 and XHTML are different languages. XHTML is also sometimes loosely called XML; strictly speaking, it is an application of XML rather than XML itself.
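The post does not list the specific differences, but a few well-known ones can be sketched with the same fragment written both ways. The list items and text are placeholders.

```html
<!-- HTML 4.01 (SGML-based): some end tags may be omitted, tag names are
     case-insensitive, and empty elements have no closing slash -->
<UL>
  <LI>First item
  <LI>Second item
</UL>
<p>Line one<br>Line two

<!-- XHTML 1.0 (XML-based): every element is closed, names are lowercase,
     and attribute values are always quoted -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>Line one<br />Line two</p>
```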

HTML 4 defines three versions of the language: Strict, Transitional (or Loose), and Frameset. The Strict version is intended for new documents and is considered best practice, while the Transitional and Frameset versions were created to ease the transition of documents written to older HTML specifications (or to none at all) to a version of HTML 4. Accordingly, the latter two allow presentational markup, which is omitted from the Strict version.
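A document selects one of these versions through its document type declaration. The three HTML 4.01 declarations are listed together here for comparison only; a real document would carry exactly one of them, as its first line.

```html
<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML 4.01 Transitional (Loose) -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 Frameset -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">
```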
