2. Lexical Elements

Note

The contents of this section are informational.

2:1 The text of a Rust program consists of modules organized into source files. The text of a source file is a sequence of lexical elements, each composed of characters, whose rules are presented in this chapter.

2.1. Character Set

2.1:1 The program text of a Rust program is written using the Unicode character set.

Syntax

2.1:2 A character is defined by this document for each cell in the coding space described by Unicode, regardless of whether or not Unicode allocates a character to that cell.

2.1:3 A whitespace character is one of the following characters:

  • 2.1:4 0x09 (horizontal tabulation)

  • 2.1:5 0x0A (new line)

  • 2.1:6 0x0B (vertical tabulation)

  • 2.1:7 0x0C (form feed)

  • 2.1:8 0x0D (carriage return)

  • 2.1:9 0x20 (space)

  • 2.1:10 0x85 (next line)

  • 2.1:11 0x200E (left-to-right mark)

  • 2.1:12 0x200F (right-to-left mark)

  • 2.1:13 0x2028 (line separator)

  • 2.1:14 0x2029 (paragraph separator)

2.1:15 A whitespace string is a string that consists of one or more whitespace characters.

2.1:16 An AsciiCharacter is any Unicode character in the range 0x00 - 0x7F, both inclusive.
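
As an informal illustration, the character classes above can be written as predicates. This is a minimal sketch: the function names are hypothetical and not part of this document. The whitespace list here is not the same set as the one accepted by the standard library method char::is_whitespace, so the sketch spells the list out explicitly.

```rust
/// Returns true if `c` is one of the whitespace characters listed in 2.1:4 through 2.1:14.
fn is_whitespace_character(c: char) -> bool {
    matches!(
        c,
        '\u{09}' | '\u{0A}' | '\u{0B}' | '\u{0C}' | '\u{0D}' | '\u{20}'
            | '\u{85}' | '\u{200E}' | '\u{200F}' | '\u{2028}' | '\u{2029}'
    )
}

/// A whitespace string consists of one or more whitespace characters (2.1:15).
fn is_whitespace_string(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_whitespace_character)
}

/// An AsciiCharacter lies in the range 0x00 - 0x7F, both inclusive (2.1:16).
fn is_ascii_character(c: char) -> bool {
    (c as u32) <= 0x7F
}
```

For example, is_whitespace_string(" \t\n") holds, while the empty string is rejected because a whitespace string needs at least one character.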

Legality Rules

2.1:17 The coded representation of a character is tool-defined.

2.2. Lexical Elements, Separators, and Punctuation

Syntax

LexicalElement ::=
    Comment
  | Identifier
  | Keyword
  | Literal
  | Punctuation

Punctuation ::=
    Delimiter
  | +
  | -
  | *
  | /
  | %
  | ^
  | !
  | &
  | |
  | &&
  | ||
  | <<
  | >>
  | +=
  | -=
  | *=
  | /=
  | %=
  | ^=
  | &=
  | |=
  | <<=
  | >>=
  | =
  | ==
  | !=
  | >
  | <
  | >=
  | <=
  | @
  | _
  | .
  | ..
  | ...
  | ..=
  | ,
  | ;
  | :
  | ::
  | ->
  | =>
  | #
  | $
  | ?

Delimiter ::=
    {
  | }
  | [
  | ]
  | (
  | )

Legality Rules

2.2:1 The text of a source file is a sequence of separate lexical elements. The meaning of a program depends only on the particular sequence of lexical elements, excluding non-doc comments.

2.2:2 A lexical element is the most basic syntactic element in program text.

2.2:3 The text of a source file is divided into lines.

2.2:4 A line is a sequence of zero or more characters followed by an end of line.

2.2:5 The representation of an end of line is tool-defined.

2.2:6 A separator is a character or a string that separates adjacent lexical elements. A whitespace string is a separator.

2.2:7 A simple punctuator is one of the following special characters:

+
-
*
/
%
^
!
&
|
=
>
<
@
_
.
,
;
:
#
$
?
{
}
[
]
(
)

2.2:8 A compound punctuator is one of the following two or more adjacent special characters:

&&
||
<<
>>
+=
-=
*=
/=
%=
^=
&=
|=
<<=
>>=
==
!=
>=
<=
..
...
..=
::
->
=>

2.2:9 The following compound punctuators are flexible compound punctuators.

&&
||
<<
>>

2.2:10 A flexible compound punctuator may be treated as a single compound punctuator or two adjacent simple punctuators.
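
The two readings of a flexible compound punctuator can be seen in ordinary code. The following sketch uses illustrative function names that are assumptions of this example. It shows >> and && once as a single compound punctuator and once as two adjacent simple punctuators.

```rust
/// `>>` closes two generic argument lists here, so it is read as two `>` punctuators.
fn nested_vectors() -> Vec<Vec<u8>> {
    vec![vec![1, 2], vec![3]]
}

/// `>>` is read as the single compound punctuator "shift right" here.
fn shift_right_by_two(value: u8) -> u8 {
    value >> 2
}

/// `&&` is read as two `&` punctuators here: a borrow of a borrow.
fn double_borrow() -> i32 {
    let value = 7;
    let twice: &&i32 = &&value;
    **twice
}

/// `&&` is read as the single compound punctuator "lazy boolean and" here.
fn both(left: bool, right: bool) -> bool {
    left && right
}
```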

2.2:11 Each of the special characters listed for simple punctuators is a simple punctuator, except when that character is used as a character of a compound punctuator, a character literal, a comment, a numeric literal, or a string literal.

2.2:12 The following names are used when referring to punctuators:

2.2:13  punctuator  name
2.2:14  +           Plus
2.2:15  -           Minus
2.2:16  *           Star
2.2:17  /           Slash
2.2:18  %           Percent
2.2:19  ^           Caret
2.2:20  !           Not
2.2:21  &           And
2.2:22  |           Or
2.2:23  &&          And and, lazy boolean and
2.2:24  ||          Or or, lazy boolean or
2.2:25  <<          Shift left
2.2:26  >>          Shift right
2.2:27  +=          Plus equals
2.2:28  -=          Minus equals
2.2:29  *=          Star equals
2.2:30  /=          Slash equals
2.2:31  %=          Percent equals
2.2:32  ^=          Caret equals
2.2:33  &=          And equals
2.2:34  |=          Or equals
2.2:35  <<=         Shift left equals
2.2:36  >>=         Shift right equals
2.2:37  =           Equals
2.2:38  ==          Equals equals, logical equality
2.2:39  !=          Not equals
2.2:40  >           Greater than
2.2:41  <           Less than
2.2:42  >=          Greater than equals, greater than or equal to
2.2:43  <=          Less than equals, less than or equal to
2.2:44  @           At
2.2:45  _           Underscore
2.2:46  .           Dot
2.2:47  ..          Dot dot, exclusive range
2.2:48  ...         Dot dot dot, ellipsis
2.2:49  ..=         Dot dot equals, inclusive range
2.2:50  ,           Comma
2.2:51  ;           Semicolon
2.2:52  :           Colon
2.2:53  ::          Colon colon, path separator
2.2:54  ->          Right arrow
2.2:55  =>          Fat arrow, Hashrocket
2.2:56  #           Pound
2.2:57  $           Dollar sign
2.2:58  ?           Question mark
2.2:59  {           Left curly brace
2.2:60  }           Right curly brace
2.2:61  [           Left square bracket
2.2:62  ]           Right square bracket
2.2:63  (           Left parenthesis
2.2:64  )           Right parenthesis

2.3. Identifiers

Syntax

Identifier ::=
    NonKeywordIdentifier
  | RawIdentifier

IdentifierList ::=
    Identifier (, Identifier)* ,?

NonKeywordIdentifier ::=
    PureIdentifier
  | WeakKeyword

RawIdentifier ::=
    r# (PureIdentifier | RawIdentifierKeyword)

PureIdentifier ::=
    XID_Start XID_Continue*
  | _ XID_Continue+

IdentifierOrUnderscore ::=
    Identifier
  | _

Renaming ::=
    as IdentifierOrUnderscore

2.3:1 A RawIdentifierKeyword is any keyword in category Keyword, except crate, self, Self, and super.

2.3:2 XID_Start and XID_Continue are defined in Unicode Standard Annex #31.

Legality Rules

2.3:3 An identifier is a lexical element that refers to a name.

2.3:4 A pure identifier is an identifier that does not include weak keywords.

2.3:5 A pure identifier shall follow the specification in Unicode Standard Annex #31 for Unicode version 13.0, with the following profile:

  • 2.3:6 Start = XID_Start, plus character 0x5F (low line).

  • 2.3:7 Continue = XID_Continue

  • 2.3:8 Medial = empty

2.3:9 Characters 0x200C (zero width non-joiner) and 0x200D (zero width joiner) shall not appear in a pure identifier.

2.3:10 A pure identifier shall be restricted to characters in category AsciiCharacter in the following contexts:

2.3:16 Identifiers are normalized using Normalization Form C as defined in Unicode Standard Annex #15.

2.3:17 Two identifiers are considered the same if they consist of the same sequence of characters after performing normalization.

2.3:18 Declarative macros and procedural macros shall receive normalized identifiers in their input.

Examples

foo
_identifier
r#true
Москва
東京
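
A short sketch of how these identifier forms might appear in a program. All function and variable names are invented for illustration. The sketch shows a raw identifier built from a strict keyword, a raw identifier built from a pure identifier, and a non-ASCII pure identifier.

```rust
/// `match` and `type` are strict keywords, so they need the `r#` prefix to act as names.
fn r#match(r#type: u32) -> u32 {
    r#type + 1
}

/// A raw identifier built from a pure identifier names the same entity as the plain form.
fn plain_and_raw() -> u32 {
    let total = 41;
    r#total + 1
}

/// A pure identifier may start with any XID_Start character, not only ASCII letters.
fn non_ascii() -> u32 {
    let 東京 = 2;
    東京 * 2
}

/// An identifier that starts with `_` needs at least one XID_Continue character after it.
fn leading_underscore() -> u32 {
    let _identifier = 5;
    _identifier
}
```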

2.4. Literals

Syntax

Literal ::=
    BooleanLiteral
  | ByteLiteral
  | ByteStringLiteral
  | CharacterLiteral
  | NumericLiteral
  | StringLiteral

Legality Rules

2.4:1 A literal is a fixed value in program text.

2.4.1. Byte Literals

Syntax

ByteLiteral ::=
    b' ByteContent '

ByteContent ::=
    ByteCharacter
  | ByteEscape

ByteEscape ::=
    \0
  | \"
  | \'
  | \t
  | \n
  | \r
  | \\
  | \x HexadecimalDigit HexadecimalDigit

2.4.1:1 A ByteCharacter is any character in category AsciiCharacter except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).

Legality Rules

2.4.1:2 A byte literal is a literal that denotes a fixed byte value.

2.4.1:3 The type of a byte literal is u8.

Examples

b'h'
b'\n'
b'\x1B'
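
To make the denoted values concrete, the sketch below collects a few byte literals into an array of u8. The function name is hypothetical.

```rust
/// Each byte literal has type u8 and denotes the value of its character or escape.
fn byte_values() -> [u8; 4] {
    [b'h', b'\n', b'\x1B', b'\\']
}
```

Under this sketch, byte_values() yields the bytes 0x68, 0x0A, 0x1B, and 0x5C.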

2.4.2. Byte String Literals

Syntax

ByteStringLiteral ::=
    RawByteStringLiteral
  | SimpleByteStringLiteral

Legality Rules

2.4.2:1 A byte string literal is a literal that consists of multiple AsciiCharacters.

2.4.2:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a byte string literal.

2.4.2.1. Simple Byte String Literals

Syntax

SimpleByteStringLiteral ::=
    b" SimpleByteStringContent* "

SimpleByteStringContent ::=
    ByteEscape
  | SimpleByteStringCharacter
  | StringContinuation

2.4.2.1:1 A SimpleByteStringCharacter is any character in category AsciiCharacter except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).

Legality Rules

2.4.2.1:2 A simple byte string literal is a byte string literal that consists of multiple AsciiCharacters.

2.4.2.1:3 The type of a simple byte string literal of size N is &'static [u8; N].

Examples

b""
b"a\tb"
b"Multi\
line"
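
The size N in the type of a simple byte string literal counts the bytes that remain after escapes and string continuations are processed. A minimal sketch with illustrative function names:

```rust
/// The empty byte string literal has size 0.
fn empty() -> &'static [u8; 0] {
    b""
}

/// The escape `\t` contributes a single byte, so the size is 3.
fn tabbed() -> &'static [u8; 3] {
    b"a\tb"
}

/// The string continuation (reverse solidus, new line) contributes no bytes.
fn continued() -> &'static [u8; 9] {
    b"Multi\
line"
}
```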

2.4.2.2. Raw Byte String Literals

Syntax

RawByteStringLiteral ::=
    br RawByteStringContent

RawByteStringContent ::=
    NestedRawByteStringContent
  | " AsciiCharacter* "

NestedRawByteStringContent ::=
    # RawByteStringContent #

Legality Rules

2.4.2.2:1 A raw byte string literal is a byte string literal that does not recognize escaped characters.

2.4.2.2:2 The type of a raw byte string literal of size N is &'static [u8; N].

Examples

br""
br#""#
br##"left #"# right"##
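
The following sketch, with hypothetical function names, shows that a reverse solidus stays an ordinary byte in a raw byte string literal, and that the surrounding # characters let quotation marks appear in the content.

```rust
/// No escape is recognized: the content is the two bytes `\` and `n`.
fn raw_backslash() -> &'static [u8; 2] {
    br"\n"
}

/// Two `#` on each side allow both `"` and `"#` to appear inside the content.
fn raw_with_quotes() -> &'static [u8; 14] {
    br##"left #"# right"##
}
```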

2.4.3. Numeric Literals

Syntax

NumericLiteral ::=
    FloatLiteral
  | IntegerLiteral

Legality Rules

2.4.3:1 A numeric literal is a literal that denotes a number.

2.4.3.1. Integer Literals

Syntax

IntegerLiteral ::=
    IntegerContent IntegerSuffix?

IntegerContent ::=
    BinaryLiteral
  | DecimalLiteral
  | HexadecimalLiteral
  | OctalLiteral

BinaryLiteral ::=
    0b BinaryDigitOrUnderscore* BinaryDigit BinaryDigitOrUnderscore*

BinaryDigitOrUnderscore ::=
    BinaryDigit
  | _

BinaryDigit ::=
    [0-1]

DecimalLiteral ::=
    DecimalDigit DecimalDigitOrUnderscore*

DecimalDigitOrUnderscore ::=
    DecimalDigit
  | _

DecimalDigit ::=
    [0-9]

HexadecimalLiteral ::=
    0x HexadecimalDigitOrUnderscore* HexadecimalDigit HexadecimalDigitOrUnderscore*

HexadecimalDigitOrUnderscore ::=
    HexadecimalDigit
  | _

HexadecimalDigit ::=
    [0-9 a-f A-F]

OctalLiteral ::=
    0o OctalDigitOrUnderscore* OctalDigit OctalDigitOrUnderscore*

OctalDigitOrUnderscore ::=
    OctalDigit
  | _

OctalDigit ::=
    [0-7]

IntegerSuffix ::=
    SignedIntegerSuffix
  | UnsignedIntegerSuffix

SignedIntegerSuffix ::=
    i8
  | i16
  | i32
  | i64
  | i128
  | isize

UnsignedIntegerSuffix ::=
    u8
  | u16
  | u32
  | u64
  | u128
  | usize

Legality Rules

2.4.3.1:1 An integer literal is a numeric literal that denotes a whole number.

2.4.3.1:2 A binary literal is an integer literal in base 2.

2.4.3.1:3 A decimal literal is an integer literal in base 10.

2.4.3.1:4 A hexadecimal literal is an integer literal in base 16.

2.4.3.1:5 An octal literal is an integer literal in base 8.

2.4.3.1:6 An integer suffix is a component of an integer literal that specifies an explicit integer type.

2.4.3.1:7 A suffixed integer is an integer literal with an integer suffix.

2.4.3.1:8 An unsuffixed integer is an integer literal without an integer suffix.

2.4.3.1:9 The type of a suffixed integer is determined by its integer suffix as follows:

  • 2.4.3.1:10 Suffix i8 specifies type i8.

  • 2.4.3.1:11 Suffix i16 specifies type i16.

  • 2.4.3.1:12 Suffix i32 specifies type i32.

  • 2.4.3.1:13 Suffix i64 specifies type i64.

  • 2.4.3.1:14 Suffix i128 specifies type i128.

  • 2.4.3.1:15 Suffix isize specifies type isize.

  • 2.4.3.1:16 Suffix u8 specifies type u8.

  • 2.4.3.1:17 Suffix u16 specifies type u16.

  • 2.4.3.1:18 Suffix u32 specifies type u32.

  • 2.4.3.1:19 Suffix u64 specifies type u64.

  • 2.4.3.1:20 Suffix u128 specifies type u128.

  • 2.4.3.1:21 Suffix usize specifies type usize.

2.4.3.1:22 The type of an unsuffixed integer is determined by type inference as follows:

  • 2.4.3.1:23 If an integer type can be uniquely determined from the surrounding program context, then the unsuffixed integer has that type.

  • 2.4.3.1:24 If the program context under-constrains the type, then the inferred type is i32.

  • 2.4.3.1:25 If the program context over-constrains the type, then this is considered a static error.
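
The inference rules for unsuffixed integers can be observed through the size of the resulting value. This is a sketch only: the function names are assumptions, and std::mem::size_of_val is used as an indirect way to see which type was inferred.

```rust
use std::mem::size_of_val;

/// The annotation on `_total` uniquely determines the type of `n` as u64 (8 bytes).
fn constrained_width() -> usize {
    let n = 5;
    let _total: u64 = n;
    size_of_val(&n)
}

/// Nothing constrains `n`, so the inferred type is i32 (4 bytes).
fn default_width() -> usize {
    let n = 5;
    size_of_val(&n)
}

// An over-constrained literal such as `let n: u8 = 5i32;` is rejected
// as a static error, so it is left out of the compiled sketch.
```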

Examples

0b0010_1110_u8
1___2_3
0x4D8a
0o77_52i128
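
For orientation, the values denoted by literals of this shape can be checked directly. The function names in this sketch are illustrative; underscores only separate digits and do not change the value.

```rust
/// Binary literal with a u8 suffix: 0b0010_1110 is 46.
fn binary() -> u8 {
    0b0010_1110_u8
}

/// Decimal literal with repeated underscores: the digits read 123.
fn decimal() -> i32 {
    1___2_3
}

/// Hexadecimal digits may mix upper and lower case: 0x4D8a is 19850.
fn hexadecimal() -> i32 {
    0x4D8a
}

/// Octal literal with an i128 suffix: 0o7752 is 4074.
fn octal() -> i128 {
    0o77_52i128
}
```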

2.4.3.2. Float Literals

Syntax

FloatLiteral ::=
    DecimalLiteral .
  | DecimalLiteral FloatExponent
  | DecimalLiteral . DecimalLiteral FloatExponent?
  | DecimalLiteral (. DecimalLiteral)? FloatExponent? FloatSuffix

FloatExponent ::=
    ExponentLetter ExponentSign? ExponentMagnitude

ExponentLetter ::=
    e
  | E

ExponentSign ::=
    +
  | -

ExponentMagnitude ::=
    DecimalDigitOrUnderscore* DecimalDigit DecimalDigitOrUnderscore*

FloatSuffix ::=
    f32
  | f64

Legality Rules

2.4.3.2:1 A float literal is a numeric literal that denotes a fractional number.

2.4.3.2:2 A float suffix is a component of a float literal that specifies an explicit floating-point type.

2.4.3.2:3 A suffixed float is a float literal with a float suffix.

2.4.3.2:4 An unsuffixed float is a float literal without a float suffix.

2.4.3.2:5 The type of a suffixed float is determined by the float suffix as follows:

  • 2.4.3.2:6 Suffix f32 specifies type f32.

  • 2.4.3.2:7 Suffix f64 specifies type f64.

2.4.3.2:8 The type of an unsuffixed float is determined by type inference as follows:

  • 2.4.3.2:9 If a floating-point type can be uniquely determined from the surrounding program context, then the unsuffixed float has that type.

  • 2.4.3.2:10 If the program context under-constrains the type, then the inferred type is f64.

  • 2.4.3.2:11 If the program context over-constrains the type, then this is considered a static error.

Examples

45.
8E+1_820
3.14e5
8_031.4_e-12f64
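
A small sketch of float literal forms and of the f64 fallback for unsuffixed floats. The function names are hypothetical, and std::mem::size_of_val is used only as an indirect view of the inferred type.

```rust
use std::mem::size_of_val;

/// A decimal literal followed by a dot is a float literal.
fn trailing_dot() -> f64 {
    45.
}

/// Fraction and exponent combined: 3.14e5 denotes 314000.
fn with_exponent() -> f64 {
    3.14e5
}

/// Underscores may appear in every digit group, and the suffix fixes the type.
fn suffixed() -> f64 {
    8_031.4_e-12f64
}

/// With no suffix and no other constraint, the inferred type is f64 (8 bytes).
fn default_width() -> usize {
    let x = 2.5;
    size_of_val(&x)
}

/// The f32 suffix selects the 4-byte floating-point type.
fn suffixed_width() -> usize {
    let x = 2.5f32;
    size_of_val(&x)
}
```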

2.4.4. Character Literals

Syntax

CharacterLiteral ::=
    ' CharacterContent '

CharacterContent ::=
    AsciiEscape
  | CharacterLiteralCharacter
  | UnicodeEscape

AsciiEscape ::=
    \0
  | \"
  | \'
  | \t
  | \n
  | \r
  | \\
  | \x OctalDigit HexadecimalDigit

2.4.4:1 A CharacterLiteralCharacter is any Unicode character except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).

2.4.4:2 A UnicodeEscape starts with a \u{ literal, followed by 1 to 6 instances of a HexadecimalDigit, followed by a } character. It can represent any Unicode codepoint between U+0000 and U+10FFFF, inclusive, except Unicode surrogate codepoints, which lie in the range U+D800 to U+DFFF, inclusive.

Legality Rules

2.4.4:3 A character literal is a literal that denotes a fixed Unicode character.

2.4.4:4 The type of a character literal is char.

Examples

'a'
'\t'
'\x1b'
'\u{1F30}'
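
The sketch below maps character literals to their code points and uses the standard library function char::from_u32 to illustrate the surrogate exclusion. The wrapper function names are assumptions of this example.

```rust
/// Code points denoted by a plain character, an escape, an ASCII escape, and a Unicode escape.
fn code_points() -> [u32; 4] {
    ['a' as u32, '\t' as u32, '\x1b' as u32, '\u{1F30}' as u32]
}

/// Converts a code point to a char, or None if no char has that value.
fn from_code_point(value: u32) -> Option<char> {
    char::from_u32(value)
}
```

Here from_code_point(0xD800) is None because surrogate code points are not values of type char, which is why a Unicode escape cannot denote them.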

2.4.5. String Literals

Syntax

StringLiteral ::=
    RawStringLiteral
  | SimpleStringLiteral

Legality Rules

2.4.5:1 A string literal is a literal that consists of multiple characters.

2.4.5:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a string literal.

2.4.5.1. Simple String Literals

Syntax

SimpleStringLiteral ::=
    " SimpleStringContent* "

SimpleStringContent ::=
    AsciiEscape
  | SimpleStringCharacter
  | StringContinuation
  | UnicodeEscape

2.4.5.1:1 A SimpleStringCharacter is any Unicode character except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).

2.4.5.1:2 StringContinuation is the character sequence 0x5C 0x0A (reverse solidus, new line).

Legality Rules

2.4.5.1:3 A simple string literal is a string literal where the characters are Unicode characters.

2.4.5.1:4 The type of a simple string literal is &'static str.

Examples

""
"cat"
"\tcol\nrow"
"bell\x07"
"\u{B80a}"
"\
multi\
line\
string"
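
A minimal sketch of how escapes and string continuations affect the resulting string. The function names are illustrative only.

```rust
/// `\t` and `\n` each denote one character, so the string has 8 characters.
fn escapes() -> &'static str {
    "\tcol\nrow"
}

/// A Unicode escape denotes a single character, here U+B80A.
fn unicode_escape() -> &'static str {
    "\u{B80a}"
}

/// Each string continuation (reverse solidus, new line) is dropped from the value.
fn continued() -> &'static str {
    "\
multi\
line\
string"
}
```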

2.4.5.2. Raw String Literals

Syntax

RawStringLiteral ::=
    r RawStringContent

RawStringContent ::=
    NestedRawStringContent
  | " ~[\r]* "

NestedRawStringContent ::=
    # RawStringContent #

Legality Rules

2.4.5.2:1 A raw string literal is a string literal that does not recognize escaped characters.

2.4.5.2:2 The type of a raw string literal is &'static str.

Examples

r""
r#""#
r##"left #"# right"##
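
As a brief illustration with hypothetical function names, a raw string literal keeps the reverse solidus as an ordinary character, so its value can be compared with an equivalent simple string literal that escapes it.

```rust
/// The content is the two characters `\` and `n`, not a new line.
fn raw_backslash() -> &'static str {
    r"\n"
}

/// The same content as the simple string literal "left #\"# right".
fn raw_with_quotes() -> &'static str {
    r##"left #"# right"##
}
```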

2.4.6. Boolean Literals

Syntax

BooleanLiteral ::=
    false
  | true

Legality Rules

2.4.6:1 A boolean literal is a literal that denotes the truth values of logic and Boolean algebra.

2.4.6:2 The type of a boolean literal is bool.

Examples

true

2.5. Comments

Syntax

Comment ::=
    BlockCommentOrDoc
  | LineCommentOrDoc

BlockCommentOrDoc ::=
    BlockComment
  | InnerBlockDoc
  | OuterBlockDoc

LineCommentOrDoc ::=
    LineComment
  | InnerLineDoc
  | OuterLineDoc

LineComment ::=
    //
  | // (~[! /] | //) ~[\n]*

BlockComment ::=
    /* (~[! *] | ** | BlockCommentOrDoc) (BlockCommentOrDoc | ~[*/])* */
  | /**/
  | /***/

InnerBlockDoc ::=
    /*! (BlockCommentOrDoc | ~[*/ \r])* */

InnerLineDoc ::=
    //! ~[\n \r]*

OuterBlockDoc ::=
    /** (~[*] | BlockCommentOrDoc) (BlockCommentOrDoc | ~[*/ \r])* */

OuterLineDoc ::=
    /// (~[/] ~[\n \r]*)?

Legality Rules

2.5:1 A comment is a lexical element that acts as an annotation or an explanation in program text.

2.5:2 A block comment is a comment that spans one or more lines.

2.5:3 A line comment is a comment that spans exactly one line.

2.5:4 An inner block doc is a block comment that applies to an enclosing non-comment construct.

2.5:5 An inner line doc is a line comment that applies to an enclosing non-comment construct.

2.5:6 An inner doc comment is either an inner block doc or an inner line doc.

2.5:7 An outer block doc is a block comment that applies to a subsequent non-comment construct.

2.5:8 An outer line doc is a line comment that applies to a subsequent non-comment construct.

2.5:9 An outer doc comment is either an outer block doc or an outer line doc.

2.5:10 A doc comment is a comment class that includes inner block docs, inner line docs, outer block docs, and outer line docs.

2.5:11 Character 0x0D (carriage return) shall not appear in a comment.

2.5:12 Block comments, inner block docs, and outer block docs shall extend one or more lines.

2.5:13 Line comments, inner line docs, and outer line docs shall extend exactly one line.

2.5:14 Outer block docs and outer line docs shall apply to a subsequent non-comment construct.

2.5:15 Inner block docs and inner line docs shall apply to an enclosing non-comment construct.

2.5:16 Inner block docs and inner line docs are equivalent to attribute doc of the form #![doc = content], where content is a string literal form of the comment without the leading //!, /*! and trailing */ characters.

2.5:17 Outer block docs and outer line docs are equivalent to attribute doc of the form #[doc = content], where content is a string literal form of the comment without the leading ///, /** and trailing */ characters.

Examples

// This is a stand-alone line comment. So is the next line.

////

/* This is a stand-alone
   block comment. */

/*
  /* This is a nested block comment */
*/

/// This outer line comment applies to commented_module.

/** This outer block comment applies to commented_module,
    and is considered documentation. */

pub mod commented_module {

    //! This inner line comment applies to commented_module.

    /*! This inner block comment applies to commented_module,
        and is considered documentation. */
}
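
The equivalence between doc comments and attribute doc can be sketched by writing the same documentation both ways. All item names below are invented for this illustration. The plain comments inside the function bodies are not documentation and do not affect the meaning of the program.

```rust
/// Adds one to `n`.
pub fn documented_by_comment(n: u32) -> u32 {
    n + 1 // A plain line comment.
}

#[doc = " Adds one to `n`."]
pub fn documented_by_attribute(n: u32) -> u32 {
    /* A block comment /* with a nested block comment */ inside it. */
    n + 1
}

pub mod documented_module {
    //! Inner line doc for the enclosing module.

    /// Outer line doc for the constant below.
    pub const ANSWER: u32 = 42;
}

pub mod attributed_module {
    #![doc = " Inner line doc for the enclosing module."]

    #[doc = " Outer line doc for the constant below."]
    pub const ANSWER: u32 = 42;
}
```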

2.6. Keywords

Syntax

Keyword ::=
    ReservedKeyword
  | StrictKeyword
  | WeakKeyword

Legality Rules

2.6:1 A keyword is a word in program text that has special meaning.

2.6:2 Keywords are case sensitive.

2.6.1. Strict Keywords

Syntax

StrictKeyword ::=
    as
  | async
  | await
  | break
  | const
  | continue
  | crate
  | dyn
  | enum
  | extern
  | false
  | fn
  | for
  | if
  | impl
  | in
  | let
  | loop
  | match
  | mod
  | move
  | mut
  | pub
  | ref
  | return
  | self
  | Self
  | static
  | struct
  | super
  | trait
  | true
  | type
  | unsafe
  | use
  | where
  | while

Legality Rules

2.6.1:1 A strict keyword is a keyword that always holds its special meaning.

2.6.2. Reserved Keywords

Syntax

ReservedKeyword ::=
    abstract
  | become
  | box
  | do
  | final
  | macro
  | override
  | priv
  | try
  | typeof
  | unsized
  | virtual
  | yield

Legality Rules

2.6.2:1 A reserved keyword is a keyword that is not yet in use.

2.6.3. Weak Keywords

Syntax

WeakKeyword ::=
    macro_rules
  | 'static
  | union

Legality Rules

2.6.3:1 A weak keyword is a keyword whose special meaning depends on the context.

2.6.3:2 Word macro_rules acts as a keyword only when used in the context of a MacroRulesDefinition.

2.6.3:3 Word 'static acts as a keyword only when used in the context of a LifetimeIndication.

2.6.3:4 Word union acts as a keyword only when used in the context of a UnionDeclaration.
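
The context dependence of a weak keyword can be seen with union, which may name an ordinary function and also introduce a union declaration in the same program. This is a sketch under that assumption; the type and function names other than union are hypothetical.

```rust
/// Outside a union declaration, `union` is an ordinary identifier.
fn union(left: u8, right: u8) -> u8 {
    left | right
}

/// Here `union` acts as a keyword that introduces a union declaration.
union IntOrFloat {
    int: u32,
    float: f32,
}

/// Reads the bit pattern of 1.0f32 through the integer field.
fn bits_of_one() -> u32 {
    let value = IntOrFloat { float: 1.0 };
    // SAFETY: both fields are 4 bytes wide and every bit pattern is a valid u32.
    unsafe { value.int }
}
```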