Brackets are tall punctuation marks used in matched pairs within text, to set apart or interject other text. Used unqualified, brackets refer to different types of brackets in different parts of the world and in different contexts.
The chevron was the earliest type to appear in written English. Desiderius Erasmus coined the term lunula to refer to the rounded parentheses (), recalling the round shape of the moon.[2]
In addition to referring to the class of all types of brackets, the unqualified word bracket is most commonly used to refer to a specific type of bracket. In modern American usage this is usually the square bracket.
In American usage, parentheses are usually considered separate from other brackets, and calling them "brackets" at all is unusual even though they serve a similar function. In more formal usage "parenthesis" may refer to the entire bracketed text, not just to the punctuation marks used (so all the text in this set of round brackets may be said to be a parenthesis or a parenthetical).[3]
According to early typographic practice, brackets are never set in italics, even when the surrounding characters are italic.[4]
Parentheses (/pəˈrɛnθɨsiːz/) (singular, parenthesis (/pəˈrɛnθɨsɨs/)) – also called simply brackets, or round brackets, curved brackets, oval brackets, or, colloquially, parens – contain material that could be omitted without destroying or altering the meaning of a sentence. In most writing, overuse of parentheses is usually a sign of a badly structured text. A milder effect may be obtained by using a pair of commas as the delimiter, though if the sentence contains commas for other purposes visual confusion may result.
Parentheses may be used in formal writing to add supplementary information, such as "Sen. John McCain (R., Arizona) spoke at length." They can also indicate shorthand for "either singular or plural" for nouns – e.g., "the claim(s)" – or for "either masculine or feminine" in some languages with grammatical gender.[5]
Parenthetical phrases have been used extensively in informal writing and stream of consciousness literature. Of particular note is the southern American author William Faulkner (see Absalom, Absalom! and the Quentin section of The Sound and the Fury) as well as poet E. E. Cummings. Parentheses have historically been used where the dash is currently used – that is, in order to depict alternatives, such as "parenthesis)(parentheses". Examples of this usage can be seen in editions of Fowler's.
Parentheses may also be nested (generally with one set (such as this) inside another set). This is not commonly used in formal writing (though sometimes other brackets [especially square brackets] will be used for one or more inner set of parentheses [in other words, secondary {or even tertiary} phrases can be found within the main parenthetical sentence]).[6]
Any punctuation inside parentheses or other brackets is independent of the rest of the text: "Mrs. Pennyfarthing (What? Yes, that was her name!) was my landlady." In this usage the explanatory text in the parentheses is a parenthesis. (Parenthesized text is usually short and within a single sentence. Where several sentences of supplemental material are used in parentheses the final full stop would be within the parentheses. Again, the parenthesis implies that the meaning and flow of the text is supplemental to the rest of the text and the whole would be unchanged were the parenthesized sentences removed.)
Parentheses in mathematics signify a different precedence of operators. Normally, 2 + 3 × 4 would be 14, since the multiplication is done before the addition. On the other hand, (2 + 3) × 4 is 20, because the parentheses override normal precedence, causing the addition to be done first. Some authors follow the convention in mathematical equations that, when parentheses have one level of nesting, the inner pair are parentheses and the outer pair are square brackets, for example [(2 + 3) × 4]² = 400.
A related convention is that when parentheses have two levels of nesting, braces are the outermost pair.
Parentheses are also used to set apart the arguments in mathematical functions. For example, f(x) is the function f applied to the variable x. In coordinate systems parentheses are used to denote a set of coordinates; so in the Cartesian coordinate system (4, 7) may represent the point located at 4 on the x-axis and 7 on the y-axis. Parentheses may also represent intervals; (0,5), for example, is the interval between 0 and 5, not including 0 or 5.
Parentheses may also be used to represent a binomial coefficient, and in chemistry to denote a polyatomic ion.
In Chinese and Japanese, the marks 【 】, which combine features of brackets and parentheses and are called 方頭括號 in Chinese and sumitsuki in Japanese, are used for inference in Chinese and in titles and headings in Japanese.
Square brackets – also called simply brackets (US) – are mainly used to enclose explanatory or missing material usually added by someone other than the original author, especially in quoted text.[7] Examples include: "I appreciate it [the honor], but I must refuse", and "the future of psionics [see definition] is in doubt". They may also be used to modify quotations. For example, if referring to someone's statement "I hate to do laundry", one could write: He "hate[s] to do laundry".
The bracketed expression "[sic]" is used after a quote or reprinted text to indicate the passage appears exactly as in the original source; a bracketed ellipsis [...] is often used to indicate deleted material; bracketed comments indicate when original text has been modified for clarity: "I'd like to thank [several unimportant people] and my parentals [sic] for their love, tolerance [...] and assistance [emphasis added]".[8]
Brackets are used in mathematics in a variety of notations, including standard notations for intervals, commutators, the floor function, the Lie bracket, the Iverson bracket, and matrices.
In translated works, brackets are used to signify the same word or phrase in the original language to avoid ambiguity.[9] For example: He is trained in the way of the open hand [karate].
When nested parentheses are needed, brackets are used as a substitute for the inner pair of parentheses within the outer pair.[10] When deeper levels of nesting are needed, convention is to alternate between parentheses and brackets at each level.
In linguistics, phonetic transcriptions are generally enclosed within brackets,[11] often using the International Phonetic Alphabet, while phonemic transcriptions typically use paired slashes.
Brackets can also be used in chemistry to represent the concentration of a chemical substance or to denote distributed charge in a complex ion.
Brackets (called move-left symbols or move-right symbols) are added to the sides of text in proofreading to indicate changes in indentation:
Instruction | Example |
---|---|
Move left | [To Fate I sue, of other means bereft, the only refuge for the wretched left. |
Center | ]Paradise Lost[ |
Move up | |
Brackets are used to denote parts of the text that need to be checked when preparing drafts prior to finalizing a document. They often denote points that have not yet been agreed to in legal drafts and the year in which a report was made for certain case law decisions.
Curly brackets – also called braces (US) or squiggly brackets (UK, informally[citation needed]) – are sometimes used in prose to indicate a series of equal choices:[citation needed] "Select your animal {goat, sheep, cow, horse} and follow me". They are used in specialized ways in poetry and music (to mark repeats or joined lines). The musical terms for this mark joining staves are accolade and "brace"; it connects two or more lines of music that are played simultaneously.[12] In mathematics they delimit sets. In many programming languages, they enclose groups of statements. Such languages (C being one of the best-known examples) are therefore called curly bracket languages. Some people use a brace to signify movement in a particular direction.[citation needed]
Presumably due to the similarity of the words brace and bracket (although they do not share an etymology), many people mistakenly treat brace as a synonym for bracket. Therefore, when it is necessary to avoid any possibility of confusion, such as in computer programming, it may be best to use the term curly bracket rather than brace. However, general usage in North American English favours the latter form.[citation needed] Indian programmers often use the name "flower bracket".[13]
In classical mechanics, curly brackets are often also used to denote the Poisson bracket between two quantities. It is defined as follows:
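A common textbook form of this definition is sketched below. The symbols are assumptions made for illustration and are not taken from the text: f and g are functions on phase space, and q_i and p_i are the N canonical coordinates and momenta.

```latex
\{f, g\} = \sum_{i=1}^{N} \left(
    \frac{\partial f}{\partial q_i} \frac{\partial g}{\partial p_i}
  - \frac{\partial f}{\partial p_i} \frac{\partial g}{\partial q_i}
\right)
```

With this sign convention, the bracket of a coordinate with its own momentum is \{q_i, p_i\} = 1.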
Chevrons ⟨ ⟩[14] are often used to enclose highlighted material. Some dictionaries use chevrons to enclose short excerpts illustrating the usage of words.
In physical sciences, chevrons are used to denote an average over time or over another continuous parameter. For example, ⟨v⟩ may denote the time average of a velocity v.
The inner product of two vectors is commonly written as ⟨a, b⟩, but the notation (a, b) is also used.
In mathematical physics, especially quantum mechanics, it is common to write the inner product between elements as ⟨a | b⟩, as a short version of ⟨a |·| b⟩, or ⟨a | Ô | b⟩, where Ô is an operator. This is known as Dirac notation or bra-ket notation.
In set theory, chevrons or parentheses are used to denote ordered pairs and other tuples, whereas curly brackets are used for unordered sets.
In linguistics, chevrons indicate orthography, as in "The English word /kæt/ is spelled ⟨cat⟩." In epigraphy, they may be used for mechanical transliterations of a text into the Latin script.
In textual criticism, and hence in many editions of pre-modern works, chevrons denote sections of the text which are illegible or otherwise lost; the editor will often insert his own reconstruction where possible within them.
Chevrons are infrequently used to denote dialogue that is thought instead of spoken.
The mathematical or logical symbols for greater-than (>) and less-than (<) are inequality symbols, and are not punctuation marks when so used. However, true chevrons are not available on a typical computer keyboard, whereas the less-than and greater-than symbols are, so the latter are often substituted. In this case they are loosely referred to as angle brackets or chevrons.
Single and double pairs of comparison operators (<<, >>) (meaning much smaller than and much greater than) are sometimes used instead of guillemets («, ») (used as quotation marks in many languages) when the proper glyphs are not available.
In comic books, chevrons are often used to mark dialogue that has been translated notionally from another language; in other words, if a character is speaking another language, instead of writing in the other language and providing a translation, one writes the translated text within chevrons. Of course, since no foreign language is actually written, this is only notionally translated.[citation needed]
Chevron-like symbols are part of standard Chinese and Korean punctuation, where they generally enclose the titles of books: ︿ and ﹀ or ︽ and ︾ for traditional vertical printing, and 〈 and 〉 or 《 and 》 for horizontal printing. See also non-English usage of quotation marks.
In East Asian punctuation, angle brackets are used as quotation marks. Half brackets are used in English to mark added text, such as in translations: "Bill saw ⌊her⌋".
The corner brackets ⌈ and ⌉ have at least two uses in mathematical logic: first, as a generalization of quotation marks, and second, to denote the Gödel number of the enclosed expression.
In editions of papyrological texts, half brackets enclose text which is lacking in the papyrus due to damage, but can be restored by virtue of another source, such as an ancient quotation of the text transmitted by the papyrus.[15] For example, Callimachus Iambus 1.2 reads: ἐκ τῶν ὅκου βοῦν κολλύ⌊βου π⌋ιπρήσκουσιν. A hole in the papyrus has obliterated βου π, but these letters are supplied by an ancient commentary on the poem.
In formal semantics, double brackets, ⟦ ⟧, also called Strachey brackets, are used to indicate the semantic evaluation function. The CJK glyphs 〚 〛 look identical except they have added width. They can be typeset in LaTeX with the package stmaryrd.
Representations of various kinds of brackets in ASCII, Unicode and HTML are given below.
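Usage | Unicode | Name | Character, SGML/HTML/XML entity | Sample |
---|---|---|---|---|
Quotation (Western texts) | U+00AB | Left double guillemet | « | « words » |
Quotation (Western texts) | U+00BB | Right double guillemet | » | |
Quotation (Western texts) | U+2039 | Left single guillemet | ‹ | ‹ x › |
Quotation (Western texts) | U+203A | Right single guillemet | › | |
General purpose | U+0028 | Left parenthesis | ( &lparen; | (parenthesis) |
General purpose | U+0029 | Right parenthesis | ) &rparen; | |
General purpose | U+005B | Left square bracket | [ | [sic] |
General purpose | U+005D | Right square bracket | ] | |
Technical/mathematical (common) | U+003C | Less-than sign | < &lt; | <HTML> |
Technical/mathematical (common) | U+003E | Greater-than sign | > &gt; | |
Technical/mathematical (common) | U+007B | Left curly bracket | { | {round, square, curly} |
Technical/mathematical (common) | U+007D | Right curly bracket | } | |
Technical/mathematical (specialized) | U+2308 | Left ceiling | ⌈ | ⌈ceiling⌉ |
Technical/mathematical (specialized) | U+2309 | Right ceiling | ⌉ | |
Technical/mathematical (specialized) | U+230A | Left floor | ⌊ | ⌊floor⌋ |
Technical/mathematical (specialized) | U+230B | Right floor | ⌋ | |
Technical/mathematical (specialized) | U+27E8 | Mathematical left angle bracket | ⟨ &lang;* | ⟨a, b⟩ |
Technical/mathematical (specialized) | U+27E9 | Mathematical right angle bracket | ⟩ &rang;* | |
Quotation (halfwidth East-Asian texts) | U+2329 | Left pointing angle bracket | 〈 &lang;* | 〈deprecated〉 |
Quotation (halfwidth East-Asian texts) | U+232A | Right pointing angle bracket | 〉 &rang;* | |
Quotation (halfwidth East-Asian texts) | U+FF62 | Halfwidth left corner bracket | ｢ | ｢ｶﾀｶﾅ｣ |
Quotation (halfwidth East-Asian texts) | U+FF63 | Halfwidth right corner bracket | ｣ | |
Quotation (fullwidth East-Asian texts) | U+3008 | Left angle bracket | 〈 | 〈한〉 |
Quotation (fullwidth East-Asian texts) | U+3009 | Right angle bracket | 〉 | |
Quotation (fullwidth East-Asian texts) | U+300A | Left double angle bracket | 《 | 《한한》 |
Quotation (fullwidth East-Asian texts) | U+300B | Right double angle bracket | 》 | |
Quotation (fullwidth East-Asian texts) | U+300C | Left corner bracket | 「 | 「白八櫨」 |
Quotation (fullwidth East-Asian texts) | U+300D | Right corner bracket | 」 | |
Quotation (fullwidth East-Asian texts) | U+300E | Left white corner bracket | 『 | 『カタカナ』 |
Quotation (fullwidth East-Asian texts) | U+300F | Right white corner bracket | 』 | |
Quotation (fullwidth East-Asian texts) | U+3010 | Left thick square bracket | 【 | 【ひらがな】 |
Quotation (fullwidth East-Asian texts) | U+3011 | Right thick square bracket | 】 | |
General purpose (fullwidth East-Asian) | U+FF08 | Fullwidth left parenthesis | （ | （Wiki） |
General purpose (fullwidth East-Asian) | U+FF09 | Fullwidth right parenthesis | ） | |
General purpose (fullwidth East-Asian) | U+FF3B | Fullwidth left square bracket | ［ | ［sic］ |
General purpose (fullwidth East-Asian) | U+FF3D | Fullwidth right square bracket | ］ | |
Technical/mathematical (fullwidth East-Asian) | U+FF1C | Fullwidth less-than sign | ＜ | ＜HTML＞ |
Technical/mathematical (fullwidth East-Asian) | U+FF1E | Fullwidth greater-than sign | ＞ | |
Technical/mathematical (fullwidth East-Asian) | U+FF5B | Fullwidth left curly bracket | ｛ | ｛1、2｝ |
Technical/mathematical (fullwidth East-Asian) | U+FF5D | Fullwidth right curly bracket | ｝ | |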
*&lang; and &rang; were tied to the deprecated symbols U+2329 and U+232A in HTML4 and MathML2, but are being migrated to U+27E8 and U+27E9 for HTML5 and MathML3.
Braces (curly brackets) first became part of a character set with the 8-bit code of the IBM 7030 Stretch.[16]
The angle brackets or chevrons at U+27E8 and U+27E9 are for mathematical use and Western languages, while U+3008 and U+3009 are for East Asian languages. The chevrons at U+2329 and U+232A are deprecated in favour of the U+3008 and U+3009 East Asian angle brackets. Unicode discourages their use for mathematics and in Western texts[17] because they are canonically equivalent to the CJK code points U+300x and thus likely to render as double-width symbols. The less-than and greater-than symbols are often used as replacements for chevrons.
These various bracket characters are frequently used in many computer languages as operators or for other syntax markup. The more common uses follow.
Parentheses group subexpressions and enclose arguments:
a*(b+c) has subexpressions a and b+c, whereas a*b+c has subexpressions a*b and c.
substring($val,10,1) passes three arguments to a function.
(cons a b) is a Lisp expression delimited by parentheses.
Square brackets index and build sequences, among other uses:
queue[3] refers to an element of an array.
[5, 10, 15] is a literal list or array.
In Forth, [ 2 3 + ] literal causes the compiler to switch to interpreter mode, calculate the expression 2+3, leave the result on the stack and resume compilation. As a result, a literal constant "5" is compiled into the definition instead of the whole expression.
http://[2001:db8:3c4d:15::abcd:ef12]:8080 encloses an IPv6 address within a URL.
The less-than and greater-than symbols are used in pairs as if they were brackets:
<div> is an HTML tag.
<name> ::= <first-name> <last-name> is a rule in Backus-Naur form.
(In the CSS selector ul.main>li the > is not paired; it targets every li element that is a direct child of the ul.main element.)
When not used in pairs to delimit text (not acting as brackets):
<> denotes an inequation ("not equal to").
<< or >> may represent bit shift operators or, in C++, stream input/output operators.
In normal writing (prose) an opening bracket is rarely left hanging at the end of a line of text, nor is a closing bracket permitted to start one. However, in computer code this is often done intentionally to aid readability. For example, a bracketed list of items separated by semicolons may be written with the brackets on separate lines, and the items, followed by the semicolon, each on one line.
A common error in programming is mismatching braces; accordingly, many IDEs have braces matching to highlight matching pairs.
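The kind of check behind brace matching can be sketched with a stack. The following Python is a minimal illustration, not how any particular IDE implements it; the function name `first_mismatch` and its return convention are hypothetical, and the sketch deliberately ignores brackets inside string literals and comments.

```python
# Hypothetical helper: name and return convention are illustrative only.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def first_mismatch(text):
    """Return the index of an unbalanced bracket, or None if all match."""
    stack = []  # (opening character, index) for every bracket still open
    for index, char in enumerate(text):
        if char in OPENERS:
            stack.append((char, index))
        elif char in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack[-1][0] != PAIRS[char]:
                return index
            stack.pop()
    # Anything left on the stack was opened but never closed;
    # report the earliest such opener.
    return stack[0][1] if stack else None
```

For balanced input such as `f(a[1], {x})` the function returns `None`; for `f(a[1)` it reports the position of the stray `)`.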
In addition to the use of parentheses to specify the order of operations, both parentheses and brackets are used to denote an interval, also referred to as a half-open range. The notation [a,c) is used to indicate an interval from a to c that is inclusive of a but exclusive of c. That is, [5, 12) would be the set of all real numbers between 5 and 12, including 5 but not 12. The numbers may come as close as they like to 12, including 11.999 and so forth (with any finite number of 9s), but 12.0 is not included. In Europe, the notation [5, 12[ is also used for this. The endpoint adjoining the bracket is known as closed, while the endpoint adjoining the parenthesis is known as open. If both types of brackets are the same, the entire interval may be referred to as closed or open as appropriate. Whenever +∞ or −∞ is used as an endpoint, it is normally considered open and adjoined to a parenthesis. See Interval (mathematics) for a more complete treatment.
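The half-open convention described above maps directly onto a chained comparison. This is a small Python sketch; the helper name `in_half_open` is made up for illustration. Python's built-in `range` uses the same convention for integers.

```python
def in_half_open(x, a, c):
    """True when x lies in [a, c): a is included, c is excluded."""
    return a <= x < c


# The closed endpoint belongs to the interval, the open one does not.
includes_5 = in_half_open(5, 5, 12)
includes_11_999 = in_half_open(11.999, 5, 12)
includes_12 = in_half_open(12.0, 5, 12)

# range(5, 12) starts at 5 and stops before 12.
integers = list(range(5, 12))
```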
In quantum mechanics, chevrons are also used as part of Dirac's formalism, bra-ket notation, to denote vectors from dual spaces: the bra ⟨A| and the ket |B⟩. Mathematicians will also commonly write ⟨a, b⟩ for the inner product of two vectors. In statistical mechanics, chevrons denote an ensemble or time average. Chevrons are used in group theory to write group presentations, and to denote the subgroup generated by a collection of elements. Obtuse-angled chevrons are not always distinguished from a pair of less-than and greater-than signs <>, which are sometimes used as a typographic approximation of chevrons.
In group theory and ring theory, brackets denote the commutator. In group theory, the commutator [g, h] is commonly defined as g⁻¹h⁻¹gh. In ring theory, the commutator [a, b] is defined as ab − ba. Furthermore, in ring theory, braces denote the anticommutator, where {a, b} is defined as ab + ba. The bracket is also used to denote the Lie derivative, or more generally the Lie bracket in any Lie algebra.
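The ring-theory definitions can be tried out on 2×2 integer matrices, which form a non-commutative ring. The Python below is only a sketch: the helper names and the matrices `A` and `B` are chosen for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]


def anticommutator(a, b):
    """{a, b} = ab + ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]


A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
bracket = commutator(A, B)      # [[1, 0], [0, -1]]
brace = anticommutator(A, B)    # [[1, 0], [0, 1]]
```

A non-zero commutator, as here, shows that the two elements do not commute.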
Various notations, such as the vinculum, have a similar effect to brackets in specifying order of operations, or otherwise grouping several characters together for a common purpose.
In the Z formal specification language, braces define a set and chevrons define a sequence.
Traditionally in accounting, negative amounts are placed in parentheses.
Brackets are used in some countries in the citation of law reports to identify parallel citations to non-official reporters. For example: Chronicle Pub. Co. v. Superior Court, (1998) 54 Cal.2d 548, [7 Cal.Rptr. 109]. In some other countries (such as England and Wales), square brackets are used to indicate that the year is part of the citation, as opposed to optional information. For example, National Coal Board v England [1954] AC 403, (1954) 98 Sol Jo 176 – the case report is in the 1954 volume of the Appeal Cases reports (year not optional) and in volume 98 of the Solicitor's Journal (year optional, since the volumes are numbered, and so given in round brackets).
When quoted material is in any way altered, the alterations are enclosed in brackets within the quotation. For example: Plaintiff asserts his cause is just, stating, "[m]y causes is [sic] just." While in the original quoted sentence the word "my" was capitalized, it has been modified in the quotation and the change signalled with brackets. Similarly, where the quotation contained a grammatical error, the quoting author signalled that the error was in the original with "[sic]" (Latin for "thus"). (California Style Manual, section 4:59 (4th ed.))
Tournament brackets, the diagrammatic representation of the series of games played during a tournament usually leading to a single winner, are so named for their resemblance to brackets or braces.
In roleplaying, and writing, brackets are used for out-of-speech sentences (otherwise known as OOC, out-of-character). Example:
(What's your name?)
To avoid ambiguity as to whether this is an in-character parenthetical statement or an out-of-character statement, in many circles double brackets are used, as they are unheard of in standard writing.
((How long have you played here?))
In computing, a newline,[1] also known as a line break or end-of-line (EOL) marker, is a special character or sequence of characters signifying the end of a line of text. The name comes from the fact that the next character after the newline will appear on a new line—that is, on the next line below the text immediately preceding the newline. The actual codes representing a newline vary across operating systems, which can be a problem when exchanging text files between systems with different newline representations.
There is also some confusion whether newlines terminate or separate lines. If a newline is considered a separator, there will be no newline after the last line of a file. The general convention on most systems is to add a newline even after the last line, i.e. to treat newline as a line terminator. Some programs have problems processing the last line of a file if it is not newline terminated. Conversely, programs that expect newline to be used as a separator will interpret a final newline as starting a new (empty) line.
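The two interpretations can be seen side by side in Python, used here purely as an illustration: `str.split("\n")` treats the newline as a separator, while `str.splitlines()` treats it as a terminator.

```python
text = "first\nsecond\n"  # two lines, each ended by a newline

# Separator view: the final newline starts a new, empty line.
as_separator = text.split("\n")    # ['first', 'second', '']

# Terminator view: the final newline merely ends the last line.
as_terminator = text.splitlines()  # ['first', 'second']
```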
In text intended primarily to be read by humans using software which implements the word wrap feature, a newline character typically only needs to be stored if a line break is required independent of whether the next word would fit on the same line, such as between paragraphs and in vertical lists. See hard return and soft return.
Software applications and operating systems usually represent a newline with one or two control characters:
Most textual Internet protocols (including HTTP, SMTP, FTP, IRC and many others) mandate the use of ASCII CR+LF (0x0D 0x0A) on the protocol level, but recommend that tolerant applications recognize lone LF as well. In practice, there are many applications that erroneously use the C newline character '\n' instead (see section Newline in programming languages below). This leads to problems when trying to communicate with systems adhering to a stricter interpretation of the standards; one such system is the qmail MTA that actively refuses to accept messages from systems that send bare LF instead of the required CR+LF.[2]
FTP has a feature to transform newlines between CR+LF and LF only when transferring text files. This must not be used on binary files. Usually binary files and text files are recognised by checking their filename extension.
The Unicode standard defines a large number of characters that conforming applications should recognize as line terminators:[3]
LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
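As a rough check of what recognizing these terminators looks like in practice, the Python sketch below feeds each sequence from the list to `str.splitlines()`. The dictionary and helper names are made up for illustration, and `splitlines()` is used only as a convenient stand-in for a conforming application; it also treats a few further control characters (U+001C to U+001E) as line boundaries.

```python
# Each terminator from the list above, keyed by its abbreviation.
TERMINATORS = {
    "LF": "\n",
    "VT": "\v",
    "FF": "\f",
    "CR": "\r",
    "CR+LF": "\r\n",
    "NEL": "\u0085",
    "LS": "\u2028",
    "PS": "\u2029",
}


def splits_cleanly(sequence):
    """True when the sequence is treated as exactly one line break."""
    return ("a" + sequence + "b").splitlines() == ["a", "b"]


results = {name: splits_cleanly(seq) for name, seq in TERMINATORS.items()}
```

Note that CR+LF counts as a single break rather than two.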
This may seem overly complicated compared to an approach such as converting all line terminators to a single character, for example LF. However, Unicode was designed to preserve all information when converting a text file from any existing encoding to Unicode and back. Therefore, Unicode should contain characters included in existing encodings. NEL is included in ISO-8859-1[citation needed] and EBCDIC (0x15). The approach taken in the Unicode standard allows round-trip transformation to be information-preserving while still enabling applications to recognize all possible types of line terminators.
Recognizing and using the newline codes greater than 0x7F is not often done. They are multiple bytes in UTF-8 and the code for NEL has been used as the ellipsis ('…') character in Windows-1252. For instance:
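Both points can be checked with a few lines of Python, shown here only as an illustration of the byte values involved:

```python
# The terminators above 0x7F need more than one byte in UTF-8.
nel_utf8 = "\u0085".encode("utf-8")   # b'\xc2\x85'
ls_utf8 = "\u2028".encode("utf-8")    # b'\xe2\x80\xa8'
ps_utf8 = "\u2029".encode("utf-8")    # b'\xe2\x80\xa9'

# The single byte 0x85 is NEL in ISO-8859-1 but an ellipsis in Windows-1252.
as_latin1 = b"\x85".decode("latin-1")  # '\x85' (NEL)
as_cp1252 = b"\x85".decode("cp1252")   # '…'
```

A program that guesses the wrong legacy encoding will therefore see a line break where the author typed an ellipsis, or the reverse.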
ASCII was developed simultaneously by the ISO and the ASA, the predecessor organization to ANSI. During the period of 1963–1968, the ISO draft standards supported the use of either CR+LF or LF alone as a newline, while the ASA drafts supported only CR+LF.
The sequence CR+LF was in common use on many early computer systems that had adopted Teletype machines, typically a Teletype Model 33 ASR, as a console device, because this sequence was required to position those printers at the start of a new line. On these systems, text was often routinely composed to be compatible with these printers, since the concept of device drivers hiding such hardware details from the application was not yet well developed; applications had to talk directly to the Teletype machine and follow its conventions.
Most minicomputer systems from DEC used this convention. CP/M used it as well, to print on the same terminals that minicomputers used. From there MS-DOS (1981) adopted CP/M's CR+LF in order to be compatible, and this convention was inherited by Microsoft's later Windows operating system.
The separation of the two functions concealed the fact that the print head could not return from the far right to the beginning of the next line in one-character time. That is why the sequence was always sent with the CR first. In fact, it was often necessary to send extra characters (extraneous CRs or NULs, which are ignored) to give the print head time to move to the left margin. Even many early video displays required multiple character times to scroll the display.
The Multics operating system began development in 1964 and used LF alone as its newline. Multics used a device driver to translate this character to whatever sequence a printer needed (including extra padding characters), and the single byte was much more convenient for programming. The seemingly more obvious choice of CR was not used, as a plain CR provided the useful function of overprinting one line with another, and thus it was useful to not translate it. Unix followed the Multics practice, and later systems followed Unix.
To facilitate the creation of portable programs, programming languages provide some abstractions to deal with the different types of newline sequences used in different environments.
The C programming language provides the escape sequences '\n' (newline) and '\r' (carriage return). However, these are not required to be equivalent to the ASCII LF and CR control characters. The C standard only guarantees two things:
On Unix platforms, where C originated, the native newline sequence is ASCII LF (0x0A), so '\n' was simply defined to be that value. With the internal and external representation being identical, the translation performed in text mode is a no-op, and text mode and binary mode behave the same. This has caused many programmers who developed their software on Unix systems simply to ignore the distinction completely, resulting in code that is not portable to different platforms.
The C library function fgets() is best avoided in binary mode because any file not written with the UNIX newline convention will be misread. Also, in text mode, any file not written with the system's native newline sequence (such as a file created on a UNIX system, then copied to a Windows system) will be misread as well.
Another common problem is the use of '\n' when communicating using an Internet protocol that mandates the use of ASCII CR+LF for ending lines. Writing '\n' to a text mode stream works correctly on Windows systems, but produces only LF on Unix, and something completely different on more exotic systems. Using "\r\n" in binary mode is slightly better.
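One way to sidestep the problem is to build protocol lines as bytes with an explicit CR+LF, so that no text-mode translation is involved. The sketch below uses Python rather than C; the function name `build_request` and the host `example.org` are placeholders, and the request is only assembled, not sent.

```python
CRLF = b"\r\n"  # explicit bytes 0x0D 0x0A, independent of the platform


def build_request(host):
    """Assemble a minimal HTTP/1.1 request with CR+LF line endings."""
    lines = [
        b"GET / HTTP/1.1",
        b"Host: " + host.encode("ascii"),
        b"Connection: close",
        b"",  # an empty line ends the header block
    ]
    return CRLF.join(lines) + CRLF


payload = build_request("example.org")
```

Because the payload is a byte string, it should be written to a socket or a binary-mode stream unchanged.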
Many languages, such as C++, Perl,[6] and Haskell provide the same interpretation of '\n' as C.
Java, PHP,[7] and Python[8] provide the '\r\n' sequence (for ASCII CR+LF). In contrast to C, '\r' and '\n' are guaranteed to represent the values U+000D and U+000A, respectively.
The Java I/O libraries do not transparently translate these into platform-dependent newline sequences on input or output. Instead, they provide functions for writing a full line that automatically add the native newline sequence, and functions for reading lines that accept any of CR, LF, or CR+LF as a line terminator (see BufferedReader.readLine()). The System.getProperty() method can be used to retrieve the underlying line separator.
Example:
String eol = System.getProperty("line.separator");
String lineColor = "Color: Red" + eol;
Python permits "Universal Newline Support" when opening a file for reading, when importing modules, and when executing a file.[9]
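In Python 3 this behaviour is controlled by the `newline` argument of the built-in `open()`. The sketch below writes a temporary file that mixes the three ASCII conventions and reads it back twice; the file contents are invented for illustration.

```python
import os
import tempfile

raw = b"one\r\ntwo\rthree\n"  # mixes CR+LF, CR and LF endings

# Write the bytes untouched, then close the file so it can be reopened.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(raw)
    path = handle.name

try:
    # Default (newline=None): every ending is translated to "\n" on input.
    with open(path, "r", encoding="ascii") as f:
        translated = f.read()
    # newline="": lines are still recognized, but endings are left as-is.
    with open(path, "r", encoding="ascii", newline="") as f:
        untranslated = f.read()
finally:
    os.remove(path)
```

Here `translated` holds `"one\ntwo\nthree\n"`, while `untranslated` keeps the original mixture.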
Some languages have created special variables, constants, and subroutines to facilitate newlines during program execution.
The different newline conventions often cause text files that have been transferred between systems of different types to be displayed incorrectly. For example, files originating on Unix or Apple Macintosh systems may appear as a single long line on some Windows programs. Conversely, when viewing a file originating from a Windows computer on a Unix system, the extra CR may be displayed as ^M at the end of each line or as a second line break.
The problem can be hard to spot if some programs handle the foreign newlines properly while others do not. For example, a compiler may fail with obscure syntax errors even though the source file looks correct when displayed on the console or in an editor. On a Unix system, the command cat -v myfile.txt will send the file to stdout (normally the terminal) and make the ^M visible, which can be useful for debugging. Modern text editors generally recognize all flavours of CR / LF newlines and allow the user to convert between the different standards. Web browsers are usually also capable of displaying text files and websites which use different types of newlines.
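When no suitable viewer is at hand, the newline style of a file can also be checked by counting byte sequences. The Python sketch below is a rough diagnostic with a hypothetical function name; it works on raw bytes and only considers the CR and LF conventions.

```python
def count_newlines(data):
    """Count CR+LF pairs, lone CRs and lone LFs in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }


sample = b"a\r\nb\rc\n"
summary = count_newlines(sample)  # {'CRLF': 1, 'CR': 1, 'LF': 1}
```

More than one non-zero count suggests a file with mixed line endings.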
The File Transfer Protocol can automatically convert newlines in files being transferred between systems with different newline representations when the transfer is done in "ASCII mode". However, transferring binary files in this mode usually has disastrous results: Any occurrence of the newline byte sequence—which does not have line terminator semantics in this context, but is just part of a normal sequence of bytes—will be translated to whatever newline representation the other system uses, effectively corrupting the file. FTP clients often employ some heuristics (for example, inspection of filename extensions) to automatically select either binary or ASCII mode, but in the end it is up to the user to make sure his or her files are transferred in the correct mode. If there is any doubt as to the correct mode, binary mode should be used, as then no files will be altered by FTP, though they may display incorrectly.
Text editors are often used for converting a text file between different newline formats; most modern editors can read and write files using at least the different ASCII CR/LF conventions. The standard Windows editor Notepad is not one of them (although WordPad and the MS-DOS Editor are).
Editors are often unsuitable for converting larger files. For larger files (on Windows NT/2000/XP) the following command is often used:
TYPE unix_file | FIND "" /V > dos_file
On many Unix systems, the dos2unix (sometimes named fromdos or d2u) and unix2dos (sometimes named todos or u2d) utilities are used to translate between ASCII CR+LF (DOS/Windows) and LF (Unix) newlines. Different versions of these commands vary slightly in their syntax. However, the tr command is available on virtually every Unix-like system and is used to perform arbitrary replacement operations on single characters. A DOS/Windows text file can be converted to Unix format by simply removing all ASCII CR characters with
tr -d '\r' < inputfile > outputfile
or, if the text has only CR newlines, by converting all CR newlines to LF with
tr '\r' '\n' < inputfile > outputfile
The same tasks are sometimes performed with awk, sed, tr, or in Perl if the platform has a Perl interpreter:
awk '{sub("$","\r\n"); printf("%s",$0);}' inputfile > outputfile  # UNIX to DOS (adding CRs on Linux- and BSD-based systems without GNU extensions)
awk '{gsub("\r",""); print;}' inputfile > outputfile  # DOS to UNIX (removing CRs on Linux- and BSD-based systems without GNU extensions)
sed -e 's/$/\r/' inputfile > outputfile  # UNIX to DOS (adding CRs on Linux-based systems with GNU extensions)
sed -e 's/\r$//' inputfile > outputfile  # DOS to UNIX (removing CRs on Linux-based systems with GNU extensions)
cat inputfile | tr -d "\r" > outputfile  # DOS to UNIX (removing CRs using tr(1); not Unicode compliant)
perl -pe 's/\r?\n|\r/\r\n/g' inputfile > outputfile  # Convert to DOS
perl -pe 's/\r?\n|\r/\n/g' inputfile > outputfile  # Convert to UNIX
perl -pe 's/\r?\n|\r/\r/g' inputfile > outputfile  # Convert to old Mac
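The conversions above can also be sketched in Python. The example below is a minimal illustration, not a replacement for the utilities named in the text; the function name convert_newlines is hypothetical. It works on bytes so that Python's own newline translation does not interfere, and it matches CR+LF before a lone CR or LF so that a CR+LF pair counts as one newline.

```python
import re

# Match CR+LF first so the pair is treated as a single newline,
# then fall back to a lone CR (old Mac) or a lone LF (Unix).
_ANY_NEWLINE = re.compile(rb"\r\n|\r|\n")


def convert_newlines(data: bytes, newline: bytes = b"\n") -> bytes:
    """Replace every CR+LF, CR or LF in data with the given newline bytes."""
    # A function is used as the replacement so that the bytes are inserted
    # literally, without any escape processing by the re module.
    return _ANY_NEWLINE.sub(lambda match: newline, data)


mixed = b"first\r\nsecond\rthird\n"

print(convert_newlines(mixed, b"\n"))    # convert to Unix
print(convert_newlines(mixed, b"\r\n"))  # convert to DOS
print(convert_newlines(mixed, b"\r"))    # convert to old Mac
```

To convert a file, its contents would be read and written in binary mode ("rb" and "wb"), since text mode may translate newlines on its own.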
To identify what type of line breaks a text file contains, the file command can be used. The editor Vim can then be used to convert a file so that it is compatible with the Windows Notepad text editor. For example:
[prompt] > file myfile.txt
myfile.txt: ASCII English text
[prompt] > vim myfile.txt
within vim
:set fileformat=dos
:wq
[prompt] > file myfile.txt
myfile.txt: ASCII English text, with CRLF line terminators
The following grep commands print the filename (in this case myfile.txt) if the file is of the specified style:
grep -PL $'\r\n' myfile.txt  # show UNIX style file (LF terminated)
grep -Pl $'\r\n' myfile.txt  # show DOS style file (CRLF terminated)
For Debian-based systems, these commands are used:
egrep -L $'\r\n' myfile.txt  # show UNIX style file (LF terminated)
egrep -l $'\r\n' myfile.txt  # show DOS style file (CRLF terminated)
The above grep commands work under Unix systems or in Cygwin under Windows. Note that these commands make some assumptions about the kinds of files that exist on the system: specifically, they assume that only UNIX-style and DOS-style files are present, and no Mac OS 9-style files.
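A check that does not rely on the assumption of only two styles can be sketched in Python by counting each kind of newline separately. The function name count_newline_styles and the dictionary keys are hypothetical, chosen here for illustration.

```python
def count_newline_styles(data: bytes) -> dict:
    """Count CR+LF, lone LF and lone CR newlines in data."""
    crlf = data.count(b"\r\n")
    # Every CR+LF pair contains one CR and one LF, so subtract the pairs
    # to obtain the number of lone LF and lone CR bytes.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


sample = b"dos line\r\nunix line\nold mac line\rdos again\r\n"
print(count_newline_styles(sample))
```

A file with a non-zero count in more than one category has mixed line endings, which the grep commands above would not report.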
This technique is often combined with find to list files recursively. For instance, the following command checks all "regular files" (that is, it excludes directories, symbolic links, etc.) to find all UNIX-style files in a directory tree, starting from the current directory (.), and saves the results in the file unix_files.txt, overwriting it if the file already exists:
find . -type f -exec grep -PL '\r\n' {} \; > unix_files.txt
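The same recursive search can be sketched in Python using os.walk. The function name find_unix_style_files is hypothetical. Like the grep -L approach, this sketch treats any regular file that contains no CR+LF pair as UNIX-style, so empty files and CR-only files are listed too, and it reads each file fully into memory, which may be unsuitable for very large files.

```python
import os


def find_unix_style_files(root: str = ".") -> list:
    """Return sorted paths of regular files under root that contain no CR+LF."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # Read as bytes so that no newline translation takes place.
            with open(path, "rb") as handle:
                data = handle.read()
            if b"\r\n" not in data:
                matches.append(path)
    return sorted(matches)
```

Calling find_unix_style_files(".") and writing the returned paths to a file, one per line, would give a result similar to unix_files.txt.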
This example finds C source and header files (*.c and *.h) and converts them to LF-style line endings:
find . -name '*.[ch]' -exec fromdos {} \;
The file command also detects the type of EOL used:
file myfile.txt
myfile.txt: ASCII text, with CRLF line terminators
Other tools permit the user to visualise the EOL characters:
od -a myfile.txt
cat -e myfile.txt
hexdump -c myfile.txt
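Where these tools are not available, a similar view can be approximated in Python. The sketch below loosely imitates the output style of cat -e, showing each CR as ^M and marking the end of each line with $. The function name show_line_endings is hypothetical, and decoding as ASCII with replacement of other bytes is a simplification.

```python
def show_line_endings(data: bytes) -> str:
    """Return data as text with CR shown as ^M and each LF preceded by $."""
    # Non-ASCII bytes are replaced rather than interpreted (a simplification).
    text = data.decode("ascii", errors="replace")
    return text.replace("\r", "^M").replace("\n", "$\n")


print(show_line_endings(b"dos line\r\nunix line\n"), end="")
```

For the sample input this displays the first line as "dos line^M$" and the second as "unix line$", making the extra CR of the DOS-style line visible.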
The utilities dos2unix, unix2dos, mac2unix, unix2mac, mac2dos and dos2mac convert between the DOS, Unix and Mac conventions indicated by their names. The flip[10] command is also often used.