C: A Reference Manual
PART 1 The C Language
1 Introduction
Dennis Ritchie
2.1 CHARACTER SET
A C source file is a sequence of characters selected from a character set. C programs are written using the following characters:
1). the 52 Latin capital and small letters: A~Z and a~z
2). the 10 digits: 0~9
3). the space
4). the horizontal tab (HT), vertical tab (VT), and form feed (FF) control characters
5). the 29 graphic characters and their official names:
! exclamation mark
# number sign
% percent sign
^ circumflex accent
& ampersand
* asterisk
( left parenthesis
_ low line (underscore)
) right parenthesis
- hyphen-minus
+ plus sign
= equals sign
~ tilde
[ left square bracket
] right square bracket
' apostrophe
| vertical line
\ reverse solidus (backslash)
; semicolon
: colon
" quotation mark
{ left curly bracket
} right curly bracket
, comma
. full stop
< less-than sign
> greater-than sign
/ solidus(slash, divide sign)
? question mark
Some countries have national character sets that do not include all of the graphic characters above. Standard C defines trigraphs and token respellings to allow C programs to be written in the ISO 646-1983 Invariant Code Set.
6). additional characters that are sometimes used in C source programs, including
a). formatting characters such as the backspace (BS) and carriage return (CR) characters
b). additional Basic Latin characters, including the characters $, @, and ` (grave accent)
The formatting characters are treated as spaces and do not otherwise affect the source program. The additional graphic characters may appear only in comments, character constants, string constants, and file names.
Execution Character Set
The character set interpreted during the execution of a C program is not necessarily the same as the one in which the C program is written (for example, when a cross-compiler is used). Characters in the execution character set are represented by their equivalents in the source character set or by special character escape sequences that begin with the backslash (\) character.
In addition to the standard characters mentioned before, the execution character set must also include:
1). a null character that must be encoded as the value 0, which is used to mark the end of strings.
2). a newline character that is used as the end-of-line marker, which divides character streams into lines during input/output.
3). the alert, backspace, and carriage return characters.
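The roles of the null and newline characters listed above can be made concrete with a small sketch. The two helper names, count_until_null and count_lines, are hypothetical and not taken from the manual; the first is a hand-written stand-in for strlen, shown only to make the terminating null visible.

```c
#include <stddef.h>

/* Count characters up to, but not including, the terminating null
   character. Hypothetical helper; behaves like strlen. */
size_t count_until_null(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {   /* '\0' is the null character, value 0 */
        n++;
    }
    return n;
}

/* Count the lines in a text buffer by counting newline characters,
   the end-of-line marker of the execution character set. */
size_t count_lines(const char *text)
{
    size_t lines = 0;
    size_t i;
    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] == '\n') {
            lines++;
        }
    }
    return lines;
}
```

The alert, backspace, and carriage return characters have no graphic form in the source character set, so they are written with the escape sequences '\a', '\b', and '\r'.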
Whitespace and Line Termination
In C source programs the blank (space), end-of-line, vertical tab, form feed, and horizontal tab characters are known collectively as whitespace characters. (Comments are also whitespace.) These characters are ignored except insofar as they are used to separate adjacent tokens.
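As an illustration of the rule above, the following sketch defines the same addition twice, once in a conventional layout and once with extra spaces, line breaks, and a comment between tokens. The function names are hypothetical; the third function shows a case where whitespace is needed to keep two tokens apart.

```c
/* Conventional layout. */
int add_plain(int a, int b)
{
    return a + b;
}

/* Same tokens, with spaces, newlines, and a comment between them. */
int
add_spaced  (  int a  ,
               int/* a comment separates tokens like a space */b  )
{
    return a
        +
        b ;
}

/* Whitespace matters where it separates tokens: "a - -b" is a
   subtraction of a negated value, while "a--b" would be read as
   "a--" followed by "b" and would not compile. */
int minus_negated(int a, int b)
{
    return a - -b;
}
```

Both add_plain and add_spaced compile to the same meaning, since the compiler sees an identical token sequence in each.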
Character Encoding
A common C programming error is to assume that a particular encoding is in use when in fact another one holds.
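A typical instance of this error is computing c - 'a' to get a letter's position, which works in ASCII but not in encodings such as EBCDIC, where the letters are not contiguous. The sketch below, with hypothetical function names, contrasts the digit case, which Standard C does guarantee to be contiguous, with a table lookup for letters that makes no encoding assumption.

```c
#include <string.h>

/* Digits '0'..'9' are guaranteed by Standard C to be contiguous and
   in increasing order, so this subtraction is portable.
   Returns -1 for a non-digit. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return -1;
}

/* Letters carry no such guarantee, so look the letter up in an
   explicit table instead of computing c - 'a'.
   Returns 0..25 for a lowercase letter, -1 otherwise. */
int lowercase_index(char c)
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    const char *p;

    if (c == '\0') {
        return -1;   /* strchr would otherwise match the terminating null */
    }
    p = strchr(letters, c);
    return p ? (int)(p - letters) : -1;
}
```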
Trigraphs
A set of trigraphs is included in Standard C so that programs may be written using only the ISO 646-1983 Invariant Code Set, a subset of the seven-bit ASCII code set and a code set that is common to many non-English national character sets. The trigraphs, each introduced by two consecutive question mark characters, are listed below:
??( [
??) ]
??< {
??> }
??/ \
??! |
??' ^
??- ~
??= #
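The replacement a compiler performs on trigraphs can be sketched as an ordinary string transformation. This is only an illustration under simplifying assumptions, not how any particular compiler implements it; the names trigraph_replacement and replace_trigraphs are hypothetical.

```c
#include <stddef.h>

/* Map the third character of a trigraph to its replacement,
   or return 0 if "??x" is not a trigraph. */
static char trigraph_replacement(char third)
{
    switch (third) {
    case '(':  return '[';
    case ')':  return ']';
    case '<':  return '{';
    case '>':  return '}';
    case '/':  return '\\';
    case '!':  return '|';
    case '\'': return '^';
    case '-':  return '~';
    case '=':  return '#';
    default:   return 0;
    }
}

/* Copy src to dst, replacing each trigraph with its single character.
   dst must have room for strlen(src) + 1 characters. */
void replace_trigraphs(const char *src, char *dst)
{
    size_t i = 0, j = 0;
    while (src[i] != '\0') {
        char r = 0;
        if (src[i] == '?' && src[i + 1] == '?') {
            r = trigraph_replacement(src[i + 2]);
        }
        if (r != 0) {
            dst[j++] = r;         /* emit the replacement character */
            i += 3;               /* skip the whole trigraph */
        } else {
            dst[j++] = src[i++];  /* ordinary character, copy as is */
        }
    }
    dst[j] = '\0';
}
```

When passing test input to such a function, the question marks in a string literal are best written with the \? escape, as in "\?\?(", so that a compiler with trigraphs enabled does not replace them before the function ever sees them.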
Digraphs
<: [
:> ]
<% {
%> }
%: #
%:%: ##
To finish, a Hello world program written with digraphs and trigraphs:
%:include <stdio.h>

int main(void)
<%
    char buf<:??) = "Hello world !";
    printf("%s\n", buf);
    return 0;
??>