2. Lexical Elements
Note
The contents of this section are informational.
2:1 The text of a Rust program consists of modules organized into source files. The text of a source file is a sequence of lexical elements, each composed of characters, whose rules are presented in this chapter.
2.1. Character Set
2.1:1 The program text of a Rust program is written using the Unicode character set.
Syntax
2.1:2 A character is defined by this document for each cell in the coding space described by Unicode, regardless of whether or not Unicode allocates a character to that cell.
2.1:3 A whitespace character is one of the following characters:
2.1:4 0x09 (horizontal tabulation)
2.1:5 0x0A (new line)
2.1:6 0x0B (vertical tabulation)
2.1:7 0x0C (form feed)
2.1:8 0x0D (carriage return)
2.1:9 0x20 (space)
2.1:10 0x85 (next line)
2.1:11 0x200E (left-to-right mark)
2.1:12 0x200F (right-to-left mark)
2.1:13 0x2028 (line separator)
2.1:14 0x2029 (paragraph separator)
2.1:15 A whitespace string is a string that consists of one or more whitespace characters.
2.1:16 An AsciiCharacter is any Unicode character in the range 0x00 - 0x7F, both inclusive.
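The character classes above can be sketched as small predicates. This is only an illustration: the function names are hypothetical and not part of any library, and the code points are copied directly from the list above.

```rust
/// Returns true if `c` is one of the eleven whitespace characters listed above.
/// Hypothetical helper, written out explicitly rather than relying on a library call.
fn is_whitespace_character(c: char) -> bool {
    matches!(
        c,
        '\u{09}' | '\u{0A}' | '\u{0B}' | '\u{0C}' | '\u{0D}' | '\u{20}'
            | '\u{85}' | '\u{200E}' | '\u{200F}' | '\u{2028}' | '\u{2029}'
    )
}

/// A whitespace string consists of one or more whitespace characters.
fn is_whitespace_string(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_whitespace_character)
}

/// An AsciiCharacter lies in the range 0x00 - 0x7F, both inclusive.
fn is_ascii_character(c: char) -> bool {
    (c as u32) <= 0x7F
}

fn main() {
    println!("{}", is_whitespace_string(" \t\r\n"));
    println!("{}", is_whitespace_character('\u{00A0}'));
    println!("{}", is_ascii_character('\u{80}'));
}
```

The set is spelled out by hand because it need not coincide with what a general-purpose Unicode whitespace test accepts; for instance, U+00A0 (no-break space) is not in the list above.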
Legality Rules
2.1:17 The coded representation of a character is tool-defined.
2.2. Lexical Elements, Separators, and Punctuation
Syntax
LexicalElement
::=Comment
|Identifier
|Keyword
|Literal
|Punctuation
Punctuation
::= Delimiter
| + | - | * | / | % | ^ | ! | & | | | && | || | << | >> | += | -= | *= | /= | %= | ^= | &= | |= | <<= | >>= | = | == | != | > | < | >= | <= | @ | _ | . | .. | ... | ..= | , | ; | : | :: | -> | => | # | $ | ?
Delimiter
::= { | } | [ | ] | ( | )
Legality Rules
2.2:1 The text of a source file is a sequence of separate lexical elements. The meaning of a program depends only on the particular sequence of lexical elements, excluding non-doc comments.
2.2:2 A lexical element is the most basic syntactic element in program text.
2.2:3 The text of a source file is divided into lines.
2.2:4 A line is a sequence of zero or more characters followed by an end of line.
2.2:5 The representation of an end of line is tool-defined.
2.2:6 A separator is a character or a string that separates adjacent lexical elements. A whitespace string is a separator.
2.2:7 A simple punctuator is one of the following special characters:
+ - * / % ^ ! & | = > < @ _ . , ; : # $ ? { } [ ] ( )
2.2:8 A compound punctuator is one of the following two or more adjacent special characters:
&& || << >> += -= *= /= %= ^= &= |= <<= >>= == != >= <= .. ... ..= :: -> =>
2.2:9 The following compound punctuators are flexible compound punctuators.
&& || << >>
2.2:10 A flexible compound punctuator may be treated as a single compound punctuator or two adjacent simple punctuators.
2.2:11 Each of the special characters listed for simple punctuators is a simple punctuator, unless the character is used as a character of a compound punctuator, or as a character of a character literal, a comment, a numeric literal, or a string literal.
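As a rough illustration of flexible compound punctuators, the sketch below uses the same character pairs once as a single compound punctuator and once as two adjacent simple punctuators. The function names are made up for this example.

```rust
/// `>>` closes two generic argument lists here, so it is read as two `>`.
fn nested() -> Vec<Vec<u8>> {
    vec![vec![1, 2], vec![3]]
}

/// `>>` is the single compound punctuator "shift right" here.
fn shift_right(value: u8) -> u8 {
    value >> 2
}

/// `&&` is two adjacent borrows here, giving a reference to a reference.
fn double_borrow(value: i32) -> i32 {
    let twice: &&i32 = &&value;
    **twice
}

/// `&&` is the single compound punctuator "lazy boolean and" here.
fn both(a: bool, b: bool) -> bool {
    a && b
}

/// `||` is an empty closure parameter list here, that is, two `|`.
fn answer() -> i32 {
    let f = || 42;
    f()
}

fn main() {
    println!("{:?}", nested());
    println!("{}", shift_right(16));
    println!("{}", double_borrow(5));
    println!("{}", both(true, false));
    println!("{}", answer());
}
```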
2.2:12 The following names are used when referring to punctuators:
2.2:13  punctuator  name
2.2:14  +           Plus
2.2:15  -           Minus
2.2:16  *           Star
2.2:17  /           Slash
2.2:18  %           Percent
2.2:19  ^           Caret
2.2:20  !           Not
2.2:21  &           And
2.2:22  |           Or
2.2:23  &&          And and, lazy boolean and
2.2:24  ||          Or or, lazy boolean or
2.2:25  <<          Shift left
2.2:26  >>          Shift right
2.2:27  +=          Plus equals
2.2:28  -=          Minus equals
2.2:29  *=          Star equals
2.2:30  /=          Slash equals
2.2:31  %=          Percent equals
2.2:32  ^=          Caret equals
2.2:33  &=          And equals
2.2:34  |=          Or equals
2.2:35  <<=         Shift left equals
2.2:36  >>=         Shift right equals
2.2:37  =           Equals
2.2:38  ==          Equals equals, logical equality
2.2:39  !=          Not equals
2.2:40  >           Greater than
2.2:41  <           Less than
2.2:42  >=          Greater than equals, greater than or equal to
2.2:43  <=          Less than equals, less than or equal to
2.2:44  @           At
2.2:45  _           Underscore
2.2:46  .           Dot
2.2:47  ..          Dot dot, exclusive range
2.2:48  ...         Dot dot dot, ellipsis
2.2:49  ..=         Dot dot equals, inclusive range
2.2:50  ,           Comma
2.2:51  ;           Semicolon
2.2:52  :           Colon
2.2:53  ::          Colon colon, path separator
2.2:54  ->          Right arrow
2.2:55  =>          Fat arrow, Hashrocket
2.2:56  #           Pound
2.2:57  $           Dollar sign
2.2:58  ?           Question mark
2.2:59  {           Left curly brace
2.2:60  }           Right curly brace
2.2:61  [           Left square bracket
2.2:62  ]           Right square bracket
2.2:63  (           Left parenthesis
2.2:64  )           Right parenthesis
2.3. Identifiers
Syntax
Identifier
::= NonKeywordIdentifier
| RawIdentifier
IdentifierList
::= Identifier (, Identifier)* ,?
NonKeywordIdentifier
::= PureIdentifier
| WeakKeyword
RawIdentifier
::= r# (PureIdentifier | RawIdentifierKeyword)
PureIdentifier
::= XID_Start XID_Continue*
| _ XID_Continue+
IdentifierOrUnderscore
::= Identifier
| _
Renaming
::= as IdentifierOrUnderscore
2.3:1 A RawIdentifierKeyword is any keyword in category Keyword, except crate, self, Self, and super.
2.3:2 XID_Start and XID_Continue are defined in Unicode Standard Annex #31.
Legality Rules
2.3:3 An identifier is a lexical element that refers to a name.
2.3:4 A pure identifier is an identifier that does not include weak keywords.
2.3:5 A pure identifier shall follow the specification in Unicode Standard Annex #31 for Unicode version 13.0, with the following profile:
2.3:6 Start = XID_Start, plus character 0x5F (low line).
2.3:7 Continue = XID_Continue
2.3:8 Medial = empty
2.3:9 Characters 0x200C (zero width non-joiner) and 0x200D (zero width joiner) shall not appear in a pure identifier.
2.3:10 A pure identifier shall be restricted to characters in category AsciiCharacter in the following contexts:
2.3:11 Crate imports,
2.3:12 Names of external crates represented in a simple path, when the simple path starts with namespace qualifier ::,
2.3:13 Names of outline modules that lack attribute path,
2.3:14 Names of items that are subject to attribute no_mangle,
2.3:15 Names of items within external blocks.
2.3:16 Identifiers are normalized using Normalization Form C as defined in Unicode Standard Annex #15.
2.3:17 Two identifiers are considered the same if they consist of the same sequence of characters after performing normalization.
2.3:18 Declarative macros and procedural macros shall receive normalized identifiers in their input.
Examples
foo
_identifier
r#true
Москва
東京
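The identifier forms above can be tried out in a small program. The sketch below is illustrative only; the function bodies and return values are invented for the example.

```rust
// `match` is a strict keyword, so it needs the `r#` prefix to serve as an identifier.
// The parameter name `type` is made usable in the same way.
fn r#match(r#type: u32) -> u32 {
    r#type + 1
}

// An underscore followed by XID_Continue characters is a pure identifier.
fn _identifier() -> &'static str {
    "underscore"
}

// Non-ASCII identifiers are accepted outside the ASCII-only contexts listed above.
fn 東京() -> &'static str {
    "Tokyo"
}

fn main() {
    println!("{}", r#match(1));
    println!("{}", _identifier());
    println!("{}", 東京());
}
```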
2.4. Literals
Syntax
Literal
::=BooleanLiteral
|ByteLiteral
|ByteStringLiteral
|CStringLiteral
|CharacterLiteral
|NumericLiteral
|StringLiteral
Legality Rules
2.4:1 A literal is a fixed value in program text.
2.4.1. Byte Literals
Syntax
ByteLiteral
::= b' ByteContent '
ByteContent
::= ByteCharacter
| ByteEscape
ByteEscape
::= \0 | \" | \' | \t | \n | \r | \\ | \x HexadecimalDigit HexadecimalDigit
2.4.1:1 A ByteCharacter is any character in category AsciiCharacter except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).
Legality Rules
2.4.1:2 A byte literal is a literal that denotes a fixed byte value.
2.4.1:3 The type of a byte literal is u8.
Examples
b'h'
b'\n'
b'\x1B'
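A short sketch of what these byte literals evaluate to; the constant names are chosen for this example only.

```rust
// Each byte literal has type `u8` and denotes the byte value of its content.
const H: u8 = b'h';
const NEWLINE: u8 = b'\n';
const ESCAPE: u8 = b'\x1B';

fn main() {
    println!("{} {} {}", H, NEWLINE, ESCAPE);
}
```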
2.4.2. Byte String Literals
Syntax
ByteStringLiteral
::=RawByteStringLiteral
|SimpleByteStringLiteral
Legality Rules
2.4.2:1 A byte string literal is a literal that consists of multiple AsciiCharacters.
2.4.2:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a byte string literal.
2.4.2.1. Simple Byte String Literals
Syntax
SimpleByteStringLiteral
::= b" SimpleByteStringContent* "
SimpleByteStringContent
::= ByteEscape
| SimpleByteStringCharacter
| StringContinuation
2.4.2.1:1 A SimpleByteStringCharacter is any character in category AsciiCharacter except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).
Legality Rules
2.4.2.1:2 A simple byte string literal is a byte string literal that consists of multiple AsciiCharacters.
2.4.2.1:3 The type of a simple byte string literal of size N is &'static [u8; N].
Examples
b""
b"a\tb"
b"Multi\
line"
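The following sketch shows the array types and contents of the literals above, including the effect of a string continuation. The constant names are illustrative.

```rust
// The size N in `&'static [u8; N]` is the number of bytes after escapes are processed.
const EMPTY: &[u8; 0] = b"";
const TABBED: &[u8; 3] = b"a\tb";
// A string continuation (reverse solidus, new line) drops the line break from the value.
const CONTINUED: &[u8; 9] = b"Multi\
line";

fn main() {
    println!("{:?}", EMPTY);
    println!("{:?}", TABBED);
    println!("{:?}", CONTINUED);
}
```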
2.4.2.2. Raw Byte String Literals
Syntax
RawByteStringLiteral
::= br RawByteStringContent
RawByteStringContent
::= NestedRawByteStringContent
| " AsciiCharacter* "
NestedRawByteStringContent
::= # RawByteStringContent #
Legality Rules
2.4.2.2:1 A raw byte string literal is a byte string literal that does not recognize escaped characters.
2.4.2.2:2 The type of a raw byte string literal of size N is &'static [u8; N].
Examples
br""
br#""#
br##"left #"# right"##
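To see that escapes are not recognized, a raw byte string can be compared with its simple counterpart. This is a small sketch with made-up constant names.

```rust
// In a raw byte string the reverse solidus is an ordinary byte, so this has two bytes.
const RAW: &[u8] = br"\n";
// In a simple byte string the same text is an escape for a single new line byte.
const ESCAPED: &[u8] = b"\n";
// Two `#` on each side allow `"#` to appear inside the content.
const HASHED: &[u8] = br##"left #"# right"##;

fn main() {
    println!("{:?}", RAW);
    println!("{:?}", ESCAPED);
    println!("{:?}", HASHED);
}
```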
2.4.3. C String Literals
Syntax
CStringLiteral
::=RawCStringLiteral
|SimpleCStringLiteral
Legality Rules
2.4.3:1 A c string literal is a literal that consists of multiple characters with an implicit 0x00 byte appended to it.
2.4.3:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a c string literal.
2.4.3.1. Simple C String Literals
Syntax
SimpleCStringLiteral
::= c" SimpleCStringContent* "
SimpleCStringContent
::= AsciiEscape
| SimpleStringCharacter
| StringContinuation
| UnicodeEscape
2.4.3.1:1 The characters of a simple c string literal may be any Unicode character except characters 0x0D (carriage return), 0x22 (quotation mark), 0x5C (reverse solidus) and 0x00 (null byte).
Legality Rules
2.4.3.1:2 A simple c string literal is a c string literal where the characters are Unicode characters.
2.4.3.1:3 The type of a simple c string literal is &'static core::ffi::CStr.
Examples
c""
c"cat"
c"\tcol\nrow"
c"bell\x07"
c"\u{B80a}"
c"\
multi\
line\
string"
2.4.3.2. Raw C String Literals
Syntax
RawCStringLiteral
::= cr RawCStringContent
RawCStringContent
::= NestedRawCStringContent
| " ~[\r]* "
NestedRawCStringContent
::= # RawCStringContent #
Legality Rules
2.4.3.2:1 A raw c string literal is a c string literal that does not recognize escaped characters.
2.4.3.2:2 The type of a raw c string literal is &'static core::ffi::CStr.
Examples
cr""
cr#""#
cr##"left #"# right"##
2.4.4. Numeric Literals
Syntax
NumericLiteral
::=FloatLiteral
|IntegerLiteral
Legality Rules
2.4.4:1 A numeric literal is a literal that denotes a number.
2.4.4.1. Integer Literals
Syntax
IntegerLiteral
::= IntegerContent IntegerSuffix?
IntegerContent
::= BinaryLiteral
| DecimalLiteral
| HexadecimalLiteral
| OctalLiteral
BinaryLiteral
::= 0b BinaryDigitOrUnderscore* BinaryDigit BinaryDigitOrUnderscore*
BinaryDigitOrUnderscore
::= BinaryDigit
| _
BinaryDigit
::= [0-1]
DecimalLiteral
::= DecimalDigit DecimalDigitOrUnderscore*
DecimalDigitOrUnderscore
::= DecimalDigit
| _
DecimalDigit
::= [0-9]
HexadecimalLiteral
::= 0x HexadecimalDigitOrUnderscore* HexadecimalDigit HexadecimalDigitOrUnderscore*
HexadecimalDigitOrUnderscore
::= HexadecimalDigit
| _
HexadecimalDigit
::= [0-9 a-f A-F]
OctalLiteral
::= 0o OctalDigitOrUnderscore* OctalDigit OctalDigitOrUnderscore*
OctalDigitOrUnderscore
::= OctalDigit
| _
OctalDigit
::= [0-7]
IntegerSuffix
::= SignedIntegerSuffix
| UnsignedIntegerSuffix
SignedIntegerSuffix
::= i8 | i16 | i32 | i64 | i128 | isize
UnsignedIntegerSuffix
::= u8 | u16 | u32 | u64 | u128 | usize
Legality Rules
2.4.4.1:1 An integer literal is a numeric literal that denotes a whole number.
2.4.4.1:2 A binary literal is an integer literal in base 2.
2.4.4.1:3 A decimal literal is an integer literal in base 10.
2.4.4.1:4 A hexadecimal literal is an integer literal in base 16.
2.4.4.1:5 An octal literal is an integer literal in base 8.
2.4.4.1:6 An integer suffix is a component of an integer literal that specifies an explicit integer type.
2.4.4.1:7 A suffixed integer is an integer literal with an integer suffix.
2.4.4.1:8 An unsuffixed integer is an integer literal without an integer suffix.
2.4.4.1:9 The type of a suffixed integer is determined by its integer suffix as follows:
2.4.4.1:10 Suffix i8 specifies type i8.
2.4.4.1:11 Suffix i16 specifies type i16.
2.4.4.1:12 Suffix i32 specifies type i32.
2.4.4.1:13 Suffix i64 specifies type i64.
2.4.4.1:14 Suffix i128 specifies type i128.
2.4.4.1:15 Suffix isize specifies type isize.
2.4.4.1:16 Suffix u8 specifies type u8.
2.4.4.1:17 Suffix u16 specifies type u16.
2.4.4.1:18 Suffix u32 specifies type u32.
2.4.4.1:19 Suffix u64 specifies type u64.
2.4.4.1:20 Suffix u128 specifies type u128.
2.4.4.1:21 Suffix usize specifies type usize.
2.4.4.1:22 The type of an unsuffixed integer is determined by type inference as follows:
2.4.4.1:23 If an integer type can be uniquely determined from the surrounding program context, then the unsuffixed integer has that type.
2.4.4.1:24 If the program context under-constrains the type, then the inferred type is i32.
2.4.4.1:25 If the program context over-constrains the type, then this is considered a static error.
Examples
0b0010_1110_u8
1___2_3
0x4D8a
0o77_52i128
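The values and types of these literals can be checked with a small sketch. The constant and function names are hypothetical, and the size of the value is used as a stand-in for observing the inferred type.

```rust
use std::mem::size_of_val;

// Underscores are ignored when computing the value; the suffix fixes the type.
const BINARY: u8 = 0b0010_1110_u8;
const DECIMAL: i32 = 1___2_3;
const HEXADECIMAL: i32 = 0x4D8a;
const OCTAL: i128 = 0o77_52i128;

/// With no other constraint, an unsuffixed integer falls back to `i32` (4 bytes).
fn default_integer_size() -> usize {
    let unconstrained = 1___2_3;
    size_of_val(&unconstrained)
}

/// Here the surrounding context (the `u64` annotation) determines the type instead.
fn inferred_integer_size() -> usize {
    let constrained: u64 = 1___2_3;
    size_of_val(&constrained)
}

fn main() {
    println!("{} {} {} {}", BINARY, DECIMAL, HEXADECIMAL, OCTAL);
    println!("{} {}", default_integer_size(), inferred_integer_size());
}
```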
2.4.4.2. Float Literals
Syntax
FloatLiteral
::= DecimalLiteral .
| DecimalLiteral FloatExponent
| DecimalLiteral . DecimalLiteral FloatExponent?
| DecimalLiteral (. DecimalLiteral)? FloatExponent? FloatSuffix
FloatExponent
::= ExponentLetter ExponentSign? ExponentMagnitude
ExponentLetter
::= e | E
ExponentSign
::= + | -
ExponentMagnitude
::= DecimalDigitOrUnderscore* DecimalDigit DecimalDigitOrUnderscore*
FloatSuffix
::= f32 | f64
Legality Rules
2.4.4.2:1 A float literal is a numeric literal that denotes a fractional number.
2.4.4.2:2 A float suffix is a component of a float literal that specifies an explicit floating-point type.
2.4.4.2:3 A suffixed float is a float literal with a float suffix.
2.4.4.2:4 An unsuffixed float is a float literal without a float suffix.
2.4.4.2:5 The type of a suffixed float is determined by the float suffix as follows:
2.4.4.2:6 Suffix f32 specifies type f32.
2.4.4.2:7 Suffix f64 specifies type f64.
2.4.4.2:8 The type of an unsuffixed float is determined by type inference as follows:
2.4.4.2:9 If a floating-point type can be uniquely determined from the surrounding program context, then the unsuffixed float has that type.
2.4.4.2:10 If the program context under-constrains the type, then the inferred type is f64.
2.4.4.2:11 If the program context over-constrains the type, then this is considered a static error.
Examples
45.
8E+1_820
3.14e5
8_031.4_e-12f64
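A minimal sketch of how float literals of these shapes behave, using hypothetical names and the size of the value as a proxy for the inferred type. The literal 8E+1_820 is left out on purpose: it matches the grammar, but its magnitude is far beyond the range of f64, so a compiler is likely to reject it when it is used as a value.

```rust
use std::mem::size_of_val;

// A trailing dot, an exponent, and a suffix are all accepted forms.
const TRAILING_DOT: f64 = 45.;
const WITH_EXPONENT: f64 = 3.14e5;
const SUFFIXED: f64 = 8_031.4_e-12f64;

/// With no other constraint, an unsuffixed float falls back to `f64` (8 bytes).
fn default_float_size() -> usize {
    let unconstrained = 45.;
    size_of_val(&unconstrained)
}

/// Here the surrounding context (the `f32` annotation) determines the type instead.
fn inferred_float_size() -> usize {
    let constrained: f32 = 45.;
    size_of_val(&constrained)
}

fn main() {
    println!("{} {} {}", TRAILING_DOT, WITH_EXPONENT, SUFFIXED);
    println!("{} {}", default_float_size(), inferred_float_size());
}
```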
2.4.5. Character Literals
Syntax
CharacterLiteral
::= ' CharacterContent '
CharacterContent
::= AsciiEscape
| CharacterLiteralCharacter
| UnicodeEscape
AsciiEscape
::= \0 | \" | \' | \t | \n | \r | \\ | \x OctalDigit HexadecimalDigit
2.4.5:1 A CharacterLiteralCharacter is any Unicode character except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).
2.4.5:2 A UnicodeEscape starts with a \u{ literal, followed by 1 to 6 instances of a HexadecimalDigit, inclusive, followed by a } character. It can represent any Unicode codepoint between U+0000 and U+10FFFF, inclusive, except Unicode surrogate codepoints, which lie in the range U+D800 to U+DFFF, inclusive.
Legality Rules
2.4.5:3 A character literal is a literal that denotes a fixed Unicode character.
2.4.5:4 The type of a character literal is char.
Examples
'a'
'\t'
'\x1b'
'\u{1F30}'
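The sketch below checks the code points denoted by these literals and uses the standard conversion from u32 to char to illustrate the surrogate exclusion. The constant and function names are illustrative.

```rust
const LETTER: char = 'a';
const TAB: char = '\t';
const ESCAPE: char = '\x1b';
const GREEK: char = '\u{1F30}';

/// Returns true if `code_point` is a value that a `char` can hold.
/// Surrogates (U+D800 to U+DFFF) and values above U+10FFFF are rejected.
fn is_representable(code_point: u32) -> bool {
    std::char::from_u32(code_point).is_some()
}

fn main() {
    println!("{:?} {:?} {:?} {:?}", LETTER, TAB, ESCAPE, GREEK);
    println!("{}", is_representable(0xD800));
}
```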
2.4.6. String Literals
Syntax
StringLiteral
::=RawStringLiteral
|SimpleStringLiteral
Legality Rules
2.4.6:1 A string literal is a literal that consists of multiple characters.
2.4.6:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a string literal.
2.4.6.1. Simple String Literals
Syntax
SimpleStringLiteral
::= " SimpleStringContent* "
SimpleStringContent
::= AsciiEscape
| SimpleStringCharacter
| StringContinuation
| UnicodeEscape
2.4.6.1:1 A SimpleStringCharacter is any Unicode character except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).
2.4.6.1:2 StringContinuation is the character sequence 0x5C 0x0A (reverse solidus, new line).
Legality Rules
2.4.6.1:3 A simple string literal is a string literal where the characters are Unicode characters.
2.4.6.1:4 The type of a simple string literal is &'static str.
Examples
""
"cat"
"\tcol\nrow"
"bell\x07"
"\u{B80a}"
"\
multi\
line\
string"
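As a sketch of how escapes and string continuations affect the resulting value, the literals above can be bound to constants and inspected. The constant names are made up for the example.

```rust
const EMPTY: &str = "";
const ESCAPES: &str = "\tcol\nrow";
const BELL: &str = "bell\x07";
const UNICODE: &str = "\u{B80a}";
// Each string continuation (reverse solidus, new line) drops the line break from the value.
const CONTINUED: &str = "\
multi\
line\
string";

fn main() {
    println!("{:?}", EMPTY);
    println!("{:?}", ESCAPES);
    println!("{:?}", BELL);
    println!("{:?}", UNICODE);
    println!("{:?}", CONTINUED);
}
```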
2.4.6.2. Raw String Literals
Syntax
RawStringLiteral
::= r RawStringContent
RawStringContent
::= NestedRawStringContent
| " ~[\r]* "
NestedRawStringContent
::= # RawStringContent #
Legality Rules
2.4.6.2:1 A raw string literal is a string literal that does not recognize escaped characters.
2.4.6.2:2 The type of a raw string literal is &'static str.
Examples
r""
r#""#
r##"left #"# right"##
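A brief sketch contrasting raw and simple string literals; the constant names are illustrative.

```rust
// In a raw string the reverse solidus is an ordinary character, so this has two characters.
const RAW: &str = r"\n";
// Two `#` on each side allow `"#` to appear inside the content.
const HASHED: &str = r##"left #"# right"##;
// The same value written as a simple string literal needs an escaped quotation mark.
const SAME_AS_HASHED: &str = "left #\"# right";

fn main() {
    println!("{}", RAW);
    println!("{}", HASHED);
    println!("{}", HASHED == SAME_AS_HASHED);
}
```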
2.4.7. Boolean Literals
Syntax
BooleanLiteral
::=
false
| true
Legality Rules
2.4.7:1 A boolean literal is a literal that denotes the truth values of logic and Boolean algebra.
2.4.7:2 The type of a boolean literal is bool.
Examples
true
2.6. Keywords
Syntax
Keyword
::=ReservedKeyword
|StrictKeyword
|WeakKeyword
Legality Rules
2.6:1 A keyword is a word in program text that has special meaning.
2.6:2 Keywords are case sensitive.
2.6.1. Strict Keywords
Syntax
StrictKeyword
::=
as
| async
| await
| break
| const
| continue
| crate
| dyn
| enum
| extern
| false
| fn
| for
| if
| impl
| in
| let
| loop
| match
| mod
| move
| mut
| pub
| ref
| return
| self
| Self
| static
| struct
| super
| trait
| true
| type
| unsafe
| use
| where
| while
Legality Rules
2.6.1:1 A strict keyword is a keyword that always holds its special meaning.
2.6.2. Reserved Keywords
Syntax
ReservedKeyword
::=
abstract
| become
| box
| do
| final
| macro
| override
| priv
| try
| typeof
| unsized
| virtual
| yield
Legality Rules
2.6.2:1 A reserved keyword is a keyword that is not yet in use.
2.6.3. Weak Keywords
Syntax
WeakKeyword
::=
macro_rules
| 'static
| union
Legality Rules
2.6.3:1 A weak keyword is a keyword whose special meaning depends on the context.
2.6.3:2 Word macro_rules acts as a keyword only when used in the context of a MacroRulesDefinition.
2.6.3:3 Word 'static acts as a keyword only when used in the context of a LifetimeIndication.
2.6.3:4 Word union acts as a keyword only when used in the context of a UnionDeclaration.
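The context dependence of a weak keyword can be illustrated with union, which appears below both as the introducer of a union declaration and as an ordinary function name. The type and function names are invented for this sketch.

```rust
// `union` introduces a union declaration here, so it acts as a keyword.
union IntOrFloat {
    int: u32,
    float: f32,
}

// `union` is an ordinary identifier here, naming a function.
fn union(a: u32, b: u32) -> u32 {
    a | b
}

/// Writes the bit pattern of 1.0 as an integer and reads it back as a float.
fn bits_as_float() -> f32 {
    let value = IntOrFloat { int: 0x3F80_0000 };
    // Reading a union field requires `unsafe`; both fields are plain 32-bit values.
    unsafe { value.float }
}

// `'static` acts as a keyword here because it is used as a lifetime.
fn greeting() -> &'static str {
    "hello"
}

fn main() {
    println!("{}", union(0b01, 0b10));
    println!("{}", bits_as_float());
    println!("{}", greeting());
}
```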
2.5. Comments
Syntax
Legality Rules
2.5:1 A comment is a lexical element that acts as an annotation or an explanation in program text.
2.5:2 A block comment is a comment that spans one or more lines.
2.5:3 A line comment is a comment that spans exactly one line.
2.5:4 An inner block doc is a block comment that applies to an enclosing non-comment construct.
2.5:5 An inner line doc is a line comment that applies to an enclosing non-comment construct.
2.5:6 An inner doc comment is either an inner block doc or an inner line doc.
2.5:7 An outer block doc is a block comment that applies to a subsequent non-comment construct.
2.5:8 An outer line doc is a line comment that applies to a subsequent non-comment construct.
2.5:9 An outer doc comment is either an outer block doc or an outer line doc.
2.5:10 A doc comment is a comment class that includes inner block docs, inner line docs, outer block docs, and outer line docs.
2.5:11 Character 0x0D (carriage return) shall not appear in a comment.
2.5:12 Block comments, inner block docs, and outer block docs shall extend one or more lines.
2.5:13 Line comments, inner line docs, and outer line docs shall extend exactly one line.
2.5:14 Outer block docs and outer line docs shall apply to a subsequent non-comment construct.
2.5:15 Inner block docs and inner line docs shall apply to an enclosing non-comment construct.
2.5:16 Inner block docs and inner line docs are equivalent to attribute doc of the form #![doc = content], where content is a string literal form of the comment without the leading //!, /*! and trailing */ characters.
2.5:17 Outer block docs and outer line docs are equivalent to attribute doc of the form #[doc = content], where content is a string literal form of the comment without the leading ///, /** and trailing */ characters.
Examples
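The following sketch, with made-up item names, places each kind of doc comment next to an attribute form that should be equivalent according to the two rules above.

```rust
/// Adds one to `x`.
fn with_outer_line_doc(x: i32) -> i32 {
    x + 1
}

/** Adds one to `x`. */
fn with_outer_block_doc(x: i32) -> i32 {
    x + 1
}

// Attribute form of the outer docs above: the content keeps the leading space.
#[doc = " Adds one to `x`."]
fn with_outer_attribute(x: i32) -> i32 {
    x + 1
}

mod with_inner_line_doc {
    //! Documents the enclosing module.
    pub fn value() -> i32 {
        7
    }
}

mod with_inner_attribute {
    #![doc = " Documents the enclosing module."]
    pub fn value() -> i32 {
        7
    }
}

fn main() {
    println!("{}", with_outer_line_doc(1));
    println!("{}", with_outer_block_doc(1));
    println!("{}", with_outer_attribute(1));
    println!("{}", with_inner_line_doc::value());
    println!("{}", with_inner_attribute::value());
}
```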