name | Standard Generalized Markup Language |
---|---|
mime | application/sgml, text/sgml |
uniform type | public.xml |
owner | ISO |
genre | Markup Language |
extended from | GML |
extended to | HTML, XML |
standard | ISO 8879 |
The Standard Generalized Markup Language (ISO 8879:1986 SGML) is an ISO-standard technology for defining generalized markup languages for documents. ISO 8879 Annex A.1 defines generalized markup:
Generalized markup is based on two novel postulates: Markup should be declarative: it should describe a document's structure and other attributes, rather than specify the processing to be performed on it. Declarative markup better anticipates unforeseen future processing needs and techniques. Markup should be rigorous so that the techniques available for processing rigorously-defined objects like programs and databases can be used for processing documents as well.
SGML is part of a trio of enabling ISO standards for electronic documents developed by ISO/IEC JTC1/SC34 (ISO/IEC Joint Technical Committee 1, Subcommittee 34 – Document description and processing languages):
HyTime: Generalized hypertext and scheduling.
SGML is supported by various technical reports, in particular:
ISO/IEC TR 9573 – Information processing – SGML support facilities – Techniques for using SGML
A conforming SGML document must be either a type-valid SGML document, a tag-valid SGML document, or both. Note: A user may wish to enforce additional constraints on a document, such as whether a document instance is integrally-stored or free of entity references.
A type-valid SGML document is defined by the standard as
An SGML document in which, for each document instance, there is an associated document type declaration (DTD) to whose DTD that instance conforms.
A tag-valid SGML document is defined by the standard as
An SGML document, all of whose document instances are fully tagged. There need not be a document type declaration associated with any of the instances. Note: If there is a document type declaration, the instance can be parsed with or without reference to it.
The SGML emphasis on validity supports the requirement for generalized markup that ''markup should be rigorous.'' (ISO 8879 A.1)
An SGML document may be composed from many entities (discrete pieces of text). In SGML, the entities and element types used in the document may be specified with a DTD; the different character sets, features, delimiter sets, and keywords are specified in the SGML Declaration to create the ''concrete syntax'' of the document.
Although full SGML allows implicit markup and some other kinds of tags, the XML specification (s4.3.1) states:
For introductory information on basic, modern SGML syntax, see XML. The following material concentrates on features not in XML and is not a comprehensive summary of SGML syntax.
Many SGML features relate to markup minimization. Other features relate to parallel asynchronous markup (CONCUR), to linking processing attributes (LINK), and to embedding SGML documents within SGML documents (SUBDOC).
The notion of customizable features was not appropriate for Web use, so one goal of XML was to minimize optional features. However, XML's well-formedness rules cannot support Wiki-like languages, leaving them unstandardized and difficult to integrate with non-text information systems.
SGML provides an ''abstract syntax'' that can be implemented in many different types of ''concrete syntax''. Although the norm is to use angle brackets as start- and end-tag delimiters in an SGML document (per the standard-defined ''reference concrete syntax''), it is possible to use other characters, provided a suitable ''concrete syntax'' is defined in the document's SGML declaration. For example, an SGML interpreter might be programmed to parse GML, wherein tags are delimited with a colon on the left and a full stop on the right, and an '':e'' prefix denotes an end tag: :xmp.Hello, world:exmp. According to the reference syntax, letter case (upper or lower) is not distinguished in tag names, so the three tags (i) <quote>, (ii) <QUOTE>, and (iii) <quOtE> are equivalent. (''NOTE:'' A concrete syntax might ''change'' this rule via the NAMECASE NAMING declarations.)
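The idea that the same abstract markup can be carried by different delimiters can be illustrated with a small Python sketch that rewrites GML-style tags into the angle-bracket form. This is only an illustration, not an SGML parser: the `KNOWN_ELEMENTS` set and the `gml_to_reference` function are hypothetical names, and folding names to lower case is an arbitrary choice made here so that differently cased spellings compare equal.

```python
import re

# Hypothetical set of element names known to the application. It is needed
# because ":exmp." could otherwise be read as a start tag for "exmp".
KNOWN_ELEMENTS = {"xmp", "quote"}

# A tag opens with ":" and closes with "."; letter case is ignored in names.
TAG = re.compile(r":([A-Za-z][A-Za-z0-9]*)\.")


def gml_to_reference(text):
    """Rewrite colon/full-stop delimited tags using angle-bracket delimiters."""

    def rewrite(match):
        name = match.group(1).lower()  # fold case so QUOTE and quOtE are equal
        if name in KNOWN_ELEMENTS:
            return f"<{name}>"
        if name.startswith("e") and name[1:] in KNOWN_ELEMENTS:
            return f"</{name[1:]}>"  # ":e" prefix marks an end tag
        return match.group(0)  # not a recognised tag: leave the text untouched

    return TAG.sub(rewrite, text)


print(gml_to_reference(":xmp.Hello, world:exmp."))  # <xmp>Hello, world</xmp>
```

A real system would take the delimiters from the SGML declaration rather than hard-coding them in a regular expression.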
Whether an element must be written as a pair of tags (as in the <QUOTE></QUOTE> pair example) or can occur singly (as an HTML <HR>) is specified in the DTD for the document (provided the OMITTAG feature is enabled). In this case, the XML counterpart would be the specific ''empty tag'' <hr/>, equivalent to the SGML NET-enabling start-tag, introduced in the TC2 (International Standard ISO 8879:1986, Technical Corrigendum 2, November 1999).
“ ” (LIT) or single ’ ’ (LITA), so that the previous markup example could be written:
One feature of SGML markup languages is the "presumptuous empty tagging", such that the empty end tag </> in <ITALICS>this</> "inherits" its value from the nearest previous full start tag, which, in this example, is <ITALICS> (in other words, it closes the most recently opened item). The expression is thus equivalent to <ITALICS>this</ITALICS>.
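The "closes the most recently opened item" rule is essentially a stack discipline. The following Python sketch shows one way it could be expanded mechanically; `expand_empty_end_tags` is a hypothetical helper, and it assumes tags without attributes and correctly nested input.

```python
import re

# Matches a start tag <NAME>, a full end tag </NAME>, or the empty end tag </>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)?>")


def expand_empty_end_tags(text):
    """Replace each </> with the end tag of the most recently opened element."""
    open_elements = []  # stack of element names that are currently open
    out = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])  # copy the text between tags
        pos = m.end()
        is_end, name = m.group(1) == "/", m.group(2)
        if not is_end and name:
            open_elements.append(name)
            out.append(m.group(0))
        elif is_end and open_elements:
            current = open_elements.pop()
            out.append(f"</{name or current}>")  # fill in the missing name
        else:
            out.append(m.group(0))  # nothing to infer: keep as written
    out.append(text[pos:])
    return "".join(out)


print(expand_empty_end_tags("<ITALICS>this</>"))  # <ITALICS>this</ITALICS>
```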
<ITALICS/this/, which is structurally equivalent to <ITALICS>this</ITALICS>.
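The slash-delimited form can likewise be rewritten into a full start and end tag. The sketch below is a rough approximation using a regular expression; `expand_net` is a hypothetical name, and it assumes the input contains only this SGML form (no attributes, no XML-style `<name/>` tags, and no slashes in the element content).

```python
import re

# "<NAME/" opens the element and enables the null end tag;
# the next "/" then closes it.
NET_FORM = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/<]*)/")


def expand_net(text):
    """Rewrite <NAME/content/ as <NAME>content</NAME>."""
    return NET_FORM.sub(r"<\1>\2</\1>", text)


print(expand_net("<ITALICS/this/"))  # <ITALICS>this</ITALICS>
```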
can be written as:
Wherein the first slash ( / ) stands for the NET-enabling “start-tag close” (NESTC), and the second slash stands for the NET. NOTE: XML defines NESTC with a / , and NET with a > (angled bracket); hence the corresponding construct in XML appears as
The third feature is 'text on the same line', allowing a markup item to be ended with a line-end; especially useful for headings and such, requiring using either SHORTREF or DATATAG minimization. For example, if the DTD includes the following declarations:
(and "&#RE;&#RS;" is a short-reference delimiter in the concrete syntax), then:
is equivalent to:
.
Formal characterization
SGML has many features that defied convenient description with the popular formal automata theory and the contemporary parser technology of the 1980s and the 1990s. The standard warns in Annex H:
There appears to be no definitive classification of full SGML against a known class of formal grammar. Plausible classes may include tree-adjoining grammars and adaptive grammars.
XML is described as being generally parsable like a two-level grammar for non-validated XML and a Conway-style pipeline of coroutines (lexer, parser, validator) for valid XML. The SGML productions in the ISO standard are reported to be LL(3) or LL(4). XML-class subsets are reported to be expressible using a W-grammar. According to one paper, and probably considered at an ''information set'' or parse tree level rather than a character or delimiter level:
The SGML standard does not define SGML with formal data structures, such as parse trees; however, an SGML document is constructed of a rooted directed acyclic graph (RDAG) of physical storage units known as “entities”, which is parsed into an RDAG of structural units known as “elements”. The physical graph is loosely characterized as an ''entity tree'', but entities might appear multiple times. Moreover, the structure graph is also loosely characterized as an ''element tree'', but the ID/IDREF markup allows arbitrary arcs.
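To see why ID/IDREF markup turns the element tree into a more general graph, consider the following Python sketch. It uses the standard library's XML parser on an XML-subset document, since Python ships no SGML parser; the sample document is hypothetical, and the attribute names `id` and `ref` are simply assumed to be declared as ID and IDREF, because `xml.etree.ElementTree` does not read attribute types from a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; "id" is assumed to be an ID and "ref" an IDREF.
DOC = """<doc>
  <sec id="s1"><title>Intro</title></sec>
  <sec id="s2"><p>See <xref ref="s1"/>.</p></sec>
</doc>"""

root = ET.fromstring(DOC)

# Tree arcs: one (parent, child) pair for each containment relation.
tree_arcs = [(parent.tag, child.tag) for parent in root.iter() for child in parent]

# Cross-reference arcs: an IDREF points at whichever element carries that ID.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
ref_arcs = [
    (el.tag, by_id[el.get("ref")].tag, el.get("ref"))
    for el in root.iter()
    if el.get("ref") in by_id
]

print(tree_arcs)
print(ref_arcs)
```

The containment arcs alone form a tree; the reference arc from `xref` back to the first `sec` is the kind of extra edge that the ID/IDREF mechanism permits.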
The results of parsing can also be understood as a data tree in different notations; where the document is the root node, and entities in other notations (text, graphics) are child nodes. SGML provides apparatus for linking to and annotating external non-SGML entities.
The SGML standard describes it in terms of ''maps'' and ''recognition modes'' (s9.6.1). Each entity, and each element, can have an associated ''notation'' or ''declared content type'', which determines the kinds of references and tags which will be recognized in that entity and element. Also, each element can have an associated ''delimiter map'' (and ''short reference map''), which determines which characters are treated as delimiters in context. The SGML standard characterizes parsing as a state machine switching between recognition modes. During parsing, there is a stack of maps that configure the scanner, while the tokenizer relates to the recognition modes.
Parsing involves traversing the dynamically-retrieved entity graph, finding/implying tags and the element structure, and validating those tags against the grammar. An unusual aspect of SGML is that the grammar (DTD) is used both passively — to ''recognize'' lexical structures, and actively — to ''generate'' missing structures and tags that the DTD has declared optional. End- and start- tags can be omitted, because they can be inferred. Loosely, a series of tags can be omitted only if there is a single, possible path in the grammar to imply them. It was this active use of grammars that made concrete SGML parsing difficult to formally characterize.
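The active use of the grammar can be sketched with a toy example in Python. The two tables below are a hypothetical stand-in for a DTD, and the function implies only omitted end tags (not start tags), handles lower-case names without attributes, and does not validate character data, so it illustrates the idea rather than the algorithm defined by the standard.

```python
import re

# Hypothetical stand-in for a DTD: which children each element may contain,
# and which elements are declared with an omissible end tag.
ALLOWED_CHILDREN = {"ul": {"li"}, "li": set()}
END_TAG_OMISSIBLE = {"li"}

TOKEN = re.compile(r"<(/?)([a-z]+)>|([^<]+)")


def imply_end_tags(text):
    """Return the text with omitted end tags generated from the grammar."""
    stack, out = [], []

    def close_until(can_stop):
        # Generate end tags the author left out, if the grammar permits it.
        while stack and not can_stop(stack[-1]):
            if stack[-1] not in END_TAG_OMISSIBLE:
                raise ValueError(f"end tag for <{stack[-1]}> may not be omitted")
            out.append(f"</{stack.pop()}>")

    for slash, name, data in TOKEN.findall(text):
        if data:
            out.append(data)
        elif slash:
            # Explicit end tag: first close anything still open inside it.
            close_until(lambda top: top == name)
            if not stack:
                raise ValueError(f"unexpected end tag </{name}>")
            out.append(f"</{stack.pop()}>")
        else:
            # Start tag: close open elements that cannot contain this one.
            close_until(lambda top: name in ALLOWED_CHILDREN.get(top, set()))
            stack.append(name)
            out.append(f"<{name}>")
    close_until(lambda top: False)  # document end closes whatever is still open
    return "".join(out)


print(imply_end_tags("<ul><li>one<li>two</ul>"))
# <ul><li>one</li><li>two</li></ul>
```

Here the second `<li>` start tag is recognized passively, while the missing `</li>` end tags are generated because the grammar leaves no other way to continue.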
SGML uses the term ''validation'' for both recognition and generation. XML does not use the grammar (DTD) to change delimiter maps or to inform the parse modes, and does not allow tag omission; consequently, XML validation of elements is not active in the sense that SGML validation is active. SGML ''without'' a DTD (e.g. simple XML) is a grammar or a language; SGML ''with'' a DTD is a metalanguage. SGML with an SGML declaration is, perhaps, a meta-metalanguage, since it is a metalanguage whose declaration mechanism ''is'' a metalanguage.
SGML has an abstract syntax implemented by many possible concrete syntaxes; however, this is not the same usage as in an abstract syntax tree and a concrete syntax tree. In the SGML usage, a concrete syntax is a set of specific delimiters, while the abstract syntax is the set of names for the delimiters. The XML Infoset corresponds more to the programming language notion of abstract syntax introduced by John McCarthy.
The W3C XML (Extensible Markup Language) is a profile (subset) of SGML designed to ease the implementation of the parser compared to a full SGML parser, primarily for use on the World Wide Web. In addition to disabling many SGML options present in the reference syntax (such as omitting tags and nested subdocuments), XML adds a number of restrictions on the kinds of SGML syntax. For example, despite enabling SGML shortened tag forms, XML does not allow unclosed start or end tags. It also relies on many of the additions made by the WebSGML Annex. XML is currently more widely used than full SGML. XML has lightweight internationalization based on Unicode. Applications of XML include XHTML, XQuery, XSLT, XForms, XPointer, JSP, SVG, RSS, Atom, XML-RPC, RDF/XML, and SOAP.
The charter for the recently revived World Wide Web Consortium HTML Working Group says, "the Group will not assume that an SGML parser is used for 'classic HTML'". Although HTML syntax closely resembles SGML syntax with the default ''reference concrete syntax'', HTML 5 (reportedly) abandons conformance with SGML and explicitly defines its own serialization; it also defines an alternative XML-based XHTML 5 serialization, which does conform to SGML (WWW).
Several modern programming languages support tags as primitive token types, or now support Unicode and regular expression pattern-matching. An example is the Scala programming language.
SP and Jade, the associated DSSSL processors, are maintained by the OpenJade project, and are common parts of Linux distributions. A general archive of SGML software and materials resides at SUNET. The original HTML parser class, in Sun Microsystems' implementation of Java, is a limited-features SGML parser, using SGML terminology and concepts.
This text is licensed under the Creative Commons CC-BY-SA License. This text was originally published on Wikipedia and was developed by the Wikipedia community.