Caution

You’re reading a draft of the Ferrocene Language Specification. Some parts of this document might be missing, incomplete or incorrect. Our aim is to have the specification ready by the end of 2022.

2. Lexical Elements

Note

The contents of this section are informational.

2:1 The text of a Rust program consists of modules organized into source files. The text of a source file is a sequence of lexical elements, each composed of characters, whose rules are presented in this chapter.

2.1. Character Set

2.1:1 The program text of a Rust program is written using the Unicode character set.

Syntax

2.1:2 A character is defined by this document for each cell in the coding space described by Unicode, regardless of whether or not Unicode allocates a character to that cell.

2.1:3 A whitespace character is one of the following characters:

  • 2.1:4 0x09 (horizontal tabulation)

  • 2.1:5 0x0A (new line)

  • 2.1:6 0x0B (vertical tabulation)

  • 2.1:7 0x0C (form feed)

  • 2.1:8 0x0D (carriage return)

  • 2.1:9 0x20 (space)

  • 2.1:10 0x85 (next line)

  • 2.1:11 0x200E (left-to-right mark)

  • 2.1:12 0x200F (right-to-left mark)

  • 2.1:13 0x2028 (line separator)

  • 2.1:14 0x2029 (paragraph separator)

2.1:15 A whitespace string is a string that consists of one or more whitespace characters.
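The whitespace set in 2.1:4 through 2.1:14 can be written as a character predicate. The sketch below is illustrative only. `is_lexical_whitespace` and `is_whitespace_string` are hypothetical names, not part of any library. The set differs from the standard `char::is_whitespace`, which tests the Unicode `White_Space` property: U+00A0 (no-break space) has that property but is not listed above, while U+200E (left-to-right mark) is listed above but lacks it.

```rust
/// Hypothetical helper: true exactly for the whitespace characters
/// listed in 2.1:4 through 2.1:14.
fn is_lexical_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{09}' | '\u{0A}' | '\u{0B}' | '\u{0C}' | '\u{0D}' | '\u{20}' // ASCII controls and space
            | '\u{85}' // next line
            | '\u{200E}' | '\u{200F}' // left-to-right and right-to-left marks
            | '\u{2028}' | '\u{2029}' // line and paragraph separators
    )
}

/// Hypothetical helper: a whitespace string has one or more whitespace characters.
fn is_whitespace_string(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_lexical_whitespace)
}
```

For example, `is_whitespace_string(" \t\n")` holds, while `is_whitespace_string("")` does not, because a whitespace string needs at least one character.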

2.1:16 An AsciiCharacter is any Unicode character in the range 0x00 to 0x7F, inclusive.

Legality Rules

2.1:17 The coded representation of a character is tool-defined.

2.2. Lexical Elements, Separators, and Punctuation

Syntax

LexicalElement ::=
    Comment
  | Identifier
  | Keyword
  | Literal
  | Punctuation

Punctuation ::=
    Delimiter
  | +
  | -
  | *
  | /
  | %
  | ^
  | !
  | &
  | |
  | &&
  | ||
  | <<
  | >>
  | +=
  | -=
  | *=
  | /=
  | %=
  | ^=
  | &=
  | |=
  | <<=
  | >>=
  | =
  | ==
  | !=
  | >
  | <
  | >=
  | <=
  | @
  | _
  | .
  | ..
  | ...
  | ..=
  | ,
  | ;
  | :
  | ::
  | ->
  | =>
  | #
  | $
  | ?

Delimiter ::=
    {
  | }
  | [
  | ]
  | (
  | )

Legality Rules

2.2:1 The text of a source file is a sequence of separate lexical elements. The meaning of a program depends only on the particular sequence of lexical elements, excluding non-doc comments.

2.2:2 A lexical element is the most basic syntactic element in program text.

2.2:3 The text of a source file is divided into lines.

2.2:4 A line is a sequence of zero or more characters followed by an end of line.

2.2:5 The representation of an end of line is tool-defined.

2.2:6 A separator is a character or a string that separates adjacent lexical elements. A whitespace string is a separator.

2.2:7 A simple punctuator is one of the following characters:

+
-
*
/
%
^
!
&
|
=
>
<
@
_
.
,
;
:
#
$
?
{
}
[
]
(
)

2.2:8 A compound punctuator is one of the following two or more adjacent special characters:

&&
||
<<
>>
+=
-=
*=
/=
%=
^=
&=
|=
<<=
>>=
==
!=
>=
<=
..
...
..=
::
->
=>

2.2:9 The following compound punctuators are flexible compound punctuators.

&&
||
<<
>>

2.2:10 A flexible compound punctuator may be treated as a single compound punctuator or two adjacent simple punctuators.

2.2:11 Each of the special characters listed for simple punctuators is a simple punctuator, except when it is used as a character of a compound punctuator, a character literal, a comment, a numeric literal, or a string literal.
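The following sketch shows flexible and ordinary compound punctuators in ordinary Rust code; the function name is hypothetical. In a type, `>>` is treated as two `>` that close nested generic argument lists, and `&&` as two `&` that form a reference to a reference, while in an expression `<<` stays a single shift-left punctuator. Adjacent characters such as `-=` form a compound punctuator even without surrounding whitespace.

```rust
/// Hypothetical example exercising flexible and ordinary compound punctuators.
fn flexible_punctuators() -> (usize, i32, u32, i32) {
    // `>>` is treated as two `>`, closing `Vec<Vec<u8>>`.
    let nested: Vec<Vec<u8>> = vec![vec![1, 2]];
    // `&&` is treated as two `&`: a reference to a reference to an `i32`.
    let r: &&i32 = &&5;
    // `<<` is treated as a single compound punctuator (shift left).
    let shifted = 1u32 << 3;
    // `x-=1` is lexed as `x`, `-=`, `1`.
    let mut x = 10;
    x-=1;
    (nested[0].len(), **r, shifted, x)
}
```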

2.2:12 The following names are used when referring to punctuators:

2.2:13  Punctuator  Name
2.2:14  +           Plus
2.2:15  -           Minus
2.2:16  *           Star
2.2:17  /           Slash
2.2:18  %           Percent
2.2:19  ^           Caret
2.2:20  !           Not
2.2:21  &           And
2.2:22  |           Or
2.2:23  &&          And and, lazy boolean and
2.2:24  ||          Or or, lazy boolean or
2.2:25  <<          Shift left
2.2:26  >>          Shift right
2.2:27  +=          Plus equals
2.2:28  -=          Minus equals
2.2:29  *=          Star equals
2.2:30  /=          Slash equals
2.2:31  %=          Percent equals
2.2:32  ^=          Caret equals
2.2:33  &=          And equals
2.2:34  |=          Or equals
2.2:35  <<=         Shift left equals
2.2:36  >>=         Shift right equals
2.2:37  =           Equals
2.2:38  ==          Equals equals
2.2:39  !=          Not equals
2.2:40  >           Greater than
2.2:41  <           Less than
2.2:42  >=          Greater than equals
2.2:43  <=          Less than equals
2.2:44  @           At
2.2:45  _           Underscore
2.2:46  .           Dot
2.2:47  ..          Dot dot, exclusive range
2.2:48  ...         Dot dot dot, ellipsis
2.2:49  ..=         Dot dot equals, inclusive range
2.2:50  ,           Comma
2.2:51  ;           Semicolon
2.2:52  :           Colon
2.2:53  ::          Path separator
2.2:54  ->          Right arrow
2.2:55  =>          Fat arrow, Hashrocket
2.2:56  #           Pound
2.2:57  $           Dollar sign
2.2:58  ?           Question mark
2.2:59  {           Left curly brace
2.2:60  }           Right curly brace
2.2:61  [           Left square bracket
2.2:62  ]           Right square bracket
2.2:63  (           Left parenthesis
2.2:64  )           Right parenthesis

2.3. Identifiers

Syntax

Identifier ::=
    NonKeywordIdentifier
  | RawIdentifier

IdentifierList ::=
    Identifier (, Identifier)* ,?

NonKeywordIdentifier ::=
    PureIdentifier
  | WeakKeyword

RawIdentifier ::=
    r# (PureIdentifier | RawIdentifierKeyword)

PureIdentifier ::=
    XID_Start XID_Continue*
  | _ XID_Continue+

IdentifierOrUnderscore ::=
    Identifier
  | _

Renaming ::=
    as IdentifierOrUnderscore

2.3:1 A RawIdentifierKeyword is any keyword in category Keyword, except crate, self, Self, and super.
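A short illustration of RawIdentifier: the strict keywords `match` and `type` become ordinary names when written with the `r#` prefix. The function `r#match` and its parameter are hypothetical names chosen only for this example.

```rust
// `match` and `type` are strict keywords, so they can only be used as
// names through the `r#` prefix.
fn r#match(r#type: u32) -> u32 {
    r#type * 2
}

// Rejected by the compiler: `crate`, `self`, `Self`, and `super`
// cannot be raw identifiers.
// fn r#self() {}
```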

2.3:2 XID_Start and XID_Continue are defined in Unicode Standard Annex #31.

Legality Rules

2.3:3 An identifier is a lexical element that refers to a name.

2.3:4 A pure identifier is an identifier that is not a weak keyword.

2.3:5 A pure identifier shall follow the specification in Unicode Standard Annex #31 for Unicode version 13.0, with the following profile:

  • 2.3:6 Start = XID_Start, plus character 0x5F (low line).

  • 2.3:7 Continue = XID_Continue.

  • 2.3:8 Medial = empty.

2.3:9 Characters 0x200C (zero width non-joiner) and 0x200D (zero width joiner) shall not appear in a pure identifier.
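Checking XID_Start and XID_Continue in general requires Unicode property tables that the Rust standard library does not expose. The sketch below therefore covers only the ASCII case: within ASCII, XID_Start is the letters and XID_Continue is the letters, the digits, and 0x5F (low line). The helper names are hypothetical, and the check does not reject keywords.

```rust
/// Hypothetical helper: ASCII subset of XID_Continue.
fn is_ascii_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Hypothetical helper: checks the PureIdentifier grammar restricted to ASCII.
fn is_ascii_pure_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // XID_Start XID_Continue*
        Some(c) if c.is_ascii_alphabetic() => chars.all(is_ascii_continue),
        // _ XID_Continue+ (a lone `_` is not an identifier)
        Some('_') => {
            let rest = chars.as_str();
            !rest.is_empty() && rest.chars().all(is_ascii_continue)
        }
        _ => false,
    }
}
```

With this sketch, `foo`, `_identifier`, and `__` are accepted, while `_`, `1abc`, and the non-ASCII `Москва` are rejected.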

2.3:10 A pure identifier shall be restricted to characters in category AsciiCharacter in the following contexts:

2.3:16 Identifiers are normalized using Normalization Form C as defined in Unicode Standard Annex #15.

2.3:17 Two identifiers are considered the same if they consist of the same sequence of characters after performing normalization.

2.3:18 Procedural macros and declarative macros shall receive normalized identifiers in their input.

Examples

foo
_identifier
r#true
Москва
東京

2.4. Literals

Syntax

Literal ::=
    BooleanLiteral
  | ByteLiteral
  | ByteStringLiteral
  | CharacterLiteral
  | NumericLiteral
  | StringLiteral

Legality Rules

2.4:1 A literal is a fixed value in program text.

2.4.1. Byte Literals

Syntax

ByteLiteral ::=
    b' ByteContent '

ByteContent ::=
    ByteCharacter
  | ByteEscape

ByteEscape ::=
    \0
  | \"
  | \'
  | \t
  | \n
  | \r
  | \\
  | \x HexadecimalDigit HexadecimalDigit

2.4.1:1 A ByteCharacter is any character in category AsciiCharacter except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).

Legality Rules

2.4.1:2 A byte literal is a literal that denotes a fixed byte value.

2.4.1:3 The type of a byte literal is u8.
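As a small illustration (the function name is hypothetical), every byte literal denotes one `u8` value, so byte literals can be stored directly in a `[u8; N]` array.

```rust
/// Hypothetical example: each byte literal denotes a single `u8`.
fn byte_literal_values() -> [u8; 4] {
    // The type of a byte literal is `u8`.
    let letter: u8 = b'h';
    // Escapes: new line, a hexadecimal escape (ESC), and an apostrophe.
    [letter, b'\n', b'\x1B', b'\'']
}
```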

Examples

b'h'
b'\n'
b'\x1B'

2.4.2. Byte String Literals

Syntax

ByteStringLiteral ::=
    RawByteStringLiteral
  | SimpleByteStringLiteral

Legality Rules

2.4.2:1 A byte string literal is a literal that consists of zero or more AsciiCharacters.

2.4.2:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a byte string literal.

2.4.2.1. Simple Byte String Literals

Syntax

SimpleByteStringLiteral ::=
    b" SimpleByteStringContent* "

SimpleByteStringContent ::=
    ByteEscape
  | SimpleByteStringCharacter
  | StringContinuation

2.4.2.1:1 A SimpleByteStringCharacter is any character in category AsciiCharacter except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).

Legality Rules

2.4.2.1:2 A simple byte string literal is a byte string literal that consists of zero or more AsciiCharacters.

2.4.2.1:3 The type of a simple byte string literal of size N is &'static [u8; N].
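A sketch of how N relates to the source text (the function name is hypothetical): N counts bytes after escapes are processed. The second literal also relies on rustc behavior not spelled out in this section: a StringContinuation removes the new line together with the whitespace at the start of the next line.

```rust
/// Hypothetical example: N is the number of bytes after processing.
fn simple_byte_strings() -> (&'static [u8; 3], &'static [u8; 9]) {
    // `\t` is a single byte, so N = 3.
    let tabbed: &'static [u8; 3] = b"a\tb";
    // The continuation and the next line's leading whitespace are dropped,
    // so this equals b"Multiline" and N = 9.
    let joined: &'static [u8; 9] = b"Multi\
                                      line";
    (tabbed, joined)
}
```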

Examples

b""
b"a\tb"
b"Multi\
line"

2.4.2.2. Raw Byte String Literals

Syntax

RawByteStringLiteral ::=
    br RawByteStringContent

RawByteStringContent ::=
    NestedRawByteStringContent
  | " AsciiCharacter* "

NestedRawByteStringContent ::=
    # RawByteStringContent #

Legality Rules

2.4.2.2:1 A raw byte string literal is a byte string literal that does not recognize escaped characters.

2.4.2.2:2 The type of a raw byte string literal of size N is &'static [u8; N].
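The following hypothetical function shows that a raw byte string literal keeps backslashes literally, and that the number of `#` characters determines which `"` ends the literal.

```rust
/// Hypothetical example of raw byte string literals.
fn raw_byte_strings() -> (&'static [u8; 2], &'static [u8; 14]) {
    // No escapes are recognized: these are the two bytes `\` and `n`.
    let backslash_n: &'static [u8; 2] = br"\n";
    // With two `#`, the sequence `"#` does not end the literal.
    let nested: &'static [u8; 14] = br##"left #"# right"##;
    (backslash_n, nested)
}
```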

Examples

br""
br#""#
br##"left #"# right"##

2.4.3. Numeric Literals

Syntax

NumericLiteral ::=
    FloatLiteral
  | IntegerLiteral

Legality Rules

2.4.3:1 A numeric literal is a literal that denotes a number.

2.4.3.1. Integer Literals

Syntax

IntegerLiteral ::=
    IntegerContent IntegerSuffix?

IntegerContent ::=
    BinaryLiteral
  | DecimalLiteral
  | HexadecimalLiteral
  | OctalLiteral

BinaryLiteral ::=
    0b BinaryDigitOrUnderscore* BinaryDigit BinaryDigitOrUnderscore*

BinaryDigitOrUnderscore ::=
    BinaryDigit
  | _

BinaryDigit ::=
    [0-1]

DecimalLiteral ::=
    DecimalDigit DecimalDigitOrUnderscore*

DecimalDigitOrUnderscore ::=
    DecimalDigit
  | _

DecimalDigit ::=
    [0-9]

HexadecimalLiteral ::=
    0x HexadecimalDigitOrUnderscore* HexadecimalDigit HexadecimalDigitOrUnderscore*

HexadecimalDigitOrUnderscore ::=
    HexadecimalDigit
  | _

HexadecimalDigit ::=
    [0-9 a-f A-F]

OctalLiteral ::=
    0o OctalDigitOrUnderscore* OctalDigit OctalDigitOrUnderscore*

OctalDigitOrUnderscore ::=
    OctalDigit
  | _

OctalDigit ::=
    [0-7]

IntegerSuffix ::=
    SignedIntegerSuffix
  | UnsignedIntegerSuffix

SignedIntegerSuffix ::=
    i8
  | i16
  | i32
  | i64
  | i128
  | isize

UnsignedIntegerSuffix ::=
    u8
  | u16
  | u32
  | u64
  | u128
  | usize

Legality Rules

2.4.3.1:1 An integer literal is a numeric literal that denotes a whole number.

2.4.3.1:2 A binary literal is an integer literal in base 2.

2.4.3.1:3 A decimal literal is an integer literal in base 10.

2.4.3.1:4 A hexadecimal literal is an integer literal in base 16.

2.4.3.1:5 An octal literal is an integer literal in base 8.

2.4.3.1:6 An integer suffix is a component of an integer literal that specifies an explicit integer type.

2.4.3.1:7 A suffixed integer is an integer literal with an integer suffix.

2.4.3.1:8 An unsuffixed integer is an integer literal without an integer suffix.
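The integer literal grammar can be turned into a small value computation. The sketch below is a hypothetical helper, not part of any compiler API. It assumes the input is a single IntegerLiteral token, strips an optional IntegerSuffix, selects the radix from the prefix, and ignores underscores. It computes the value in `u128` and does not check that the value fits the type named by the suffix.

```rust
/// Hypothetical helper: the value denoted by an IntegerLiteral, or `None`
/// if the text does not match the grammar or does not fit in `u128`.
fn integer_literal_value(lit: &str) -> Option<u128> {
    const SUFFIXES: [&str; 12] = [
        "i128", "u128", "isize", "usize", "i64", "u64", "i32", "u32", "i16", "u16", "i8", "u8",
    ];
    // Strip an optional IntegerSuffix.
    let body = SUFFIXES
        .iter()
        .find_map(|suffix| lit.strip_suffix(*suffix))
        .unwrap_or(lit);
    // The prefix selects the radix; a DecimalLiteral has no prefix and
    // must start with a decimal digit.
    let (radix, digits) = if let Some(rest) = body.strip_prefix("0b") {
        (2, rest)
    } else if let Some(rest) = body.strip_prefix("0o") {
        (8, rest)
    } else if let Some(rest) = body.strip_prefix("0x") {
        (16, rest)
    } else if body.starts_with(|c: char| c.is_ascii_digit()) {
        (10, body)
    } else {
        return None;
    };
    // Underscores only separate digits; at least one digit is required.
    let cleaned: String = digits.chars().filter(|&c| c != '_').collect();
    if cleaned.is_empty() || !cleaned.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    u128::from_str_radix(&cleaned, radix).ok()
}
```

Applied to the examples below, `0b0010_1110_u8` yields 46 and `1___2_3` yields 123, while `0x_` is rejected because it has no digit.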

2.4.3.1:9 The type of a suffixed integer is determined by its integer suffix as follows:

  • 2.4.3.1:10 Suffix i8 specifies type i8.

  • 2.4.3.1:11 Suffix i16 specifies type i16.

  • 2.4.3.1:12 Suffix i32 specifies type i32.

  • 2.4.3.1:13 Suffix i64 specifies type i64.

  • 2.4.3.1:14 Suffix i128 specifies type i128.

  • 2.4.3.1:15 Suffix isize specifies type isize.

  • 2.4.3.1:16 Suffix u8 specifies type u8.

  • 2.4.3.1:17 Suffix u16 specifies type u16.

  • 2.4.3.1:18 Suffix u32 specifies type u32.

  • 2.4.3.1:19 Suffix u64 specifies type u64.

  • 2.4.3.1:20 Suffix u128 specifies type u128.

  • 2.4.3.1:21 Suffix usize specifies type usize.

2.4.3.1:22 The type of an unsuffixed integer is determined by type inference as follows:

  • 2.4.3.1:23 If an integer type can be uniquely determined from the surrounding program context, then the unsuffixed integer has that type.

  • 2.4.3.1:24 If the program context under-constrains the type, then the inferred type is i32.

  • 2.4.3.1:25 If the program context over-constrains the type, then this is considered a static type error.
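The three inference outcomes can be observed with the standard `std::mem::size_of_val`, which reports the size of the type each literal received. The function names are hypothetical.

```rust
use std::mem::size_of_val;

/// Suffixed: the suffix `u8` fixes the type (1 byte).
fn suffixed_size() -> usize {
    size_of_val(&7u8)
}

/// Unsuffixed, constrained by the annotation: the type is `u16` (2 bytes).
fn constrained_size() -> usize {
    let c: u16 = 7;
    size_of_val(&c)
}

/// Unsuffixed and under-constrained: the type is inferred as `i32` (4 bytes).
fn fallback_size() -> usize {
    let d = 7;
    size_of_val(&d)
}

// Over-constrained, rejected at compile time:
// let e: u8 = 7i32; // error: mismatched types
```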

Examples

0b0010_1110_u8
1___2_3
0x4D8a
0o77_52i128

2.4.3.2. Float Literals

Syntax

FloatLiteral ::=
    DecimalLiteral .
  | DecimalLiteral FloatExponent
  | DecimalLiteral . DecimalLiteral FloatExponent?
  | DecimalLiteral (. DecimalLiteral)? FloatExponent? FloatSuffix

FloatExponent ::=
    ExponentLetter ExponentSign? ExponentMagnitude

ExponentLetter ::=
    e
  | E

ExponentSign ::=
    +
  | -

ExponentMagnitude ::=
    DecimalDigitOrUnderscore* DecimalDigit DecimalDigitOrUnderscore*

FloatSuffix ::=
    f32
  | f64

Legality Rules

2.4.3.2:1 A float literal is a numeric literal that denotes a fractional number.

2.4.3.2:2 A float suffix is a component of a float literal that specifies an explicit floating-point type.

2.4.3.2:3 A suffixed float is a float literal with a float suffix.

2.4.3.2:4 An unsuffixed float is a float literal without a float suffix.

2.4.3.2:5 The type of a suffixed float is determined by the float suffix as follows:

  • 2.4.3.2:6 Suffix f32 specifies type f32.

  • 2.4.3.2:7 Suffix f64 specifies type f64.

2.4.3.2:8 The type of an unsuffixed float is determined by type inference as follows:

  • 2.4.3.2:9 If a floating-point type can be uniquely determined from the surrounding program context, then the unsuffixed float has that type.

  • 2.4.3.2:10 If the program context under-constrains the type, then the inferred type is f64.

  • 2.4.3.2:11 If the program context over-constrains the type, then this is considered a static type error.
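The same technique applies to float literals: `std::mem::size_of_val` shows whether a literal became `f32` (4 bytes) or `f64` (8 bytes). The function names are hypothetical, and the last function shows a few of the literal forms from the grammar.

```rust
use std::mem::size_of_val;

/// Suffixed: the suffix `f32` fixes the type.
fn float_suffixed_size() -> usize {
    size_of_val(&1.5f32)
}

/// Unsuffixed, constrained by the annotation: the type is `f32`.
fn float_constrained_size() -> usize {
    let y: f32 = 1.5;
    size_of_val(&y)
}

/// Unsuffixed and under-constrained: the type is inferred as `f64`.
fn float_fallback_size() -> usize {
    let x = 1.5;
    size_of_val(&x)
}

/// A trailing dot, an exponent, and underscores in the integer part.
fn float_forms() -> (f64, f64, f64) {
    (45., 2.5e1, 1_000.5)
}
```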

Examples

45.
8E+1_820
3.14e5
8_031.4_e-12f64

2.4.4. Character Literals

Syntax

CharacterLiteral ::=
    ' CharacterContent '

CharacterContent ::=
    AsciiEscape
  | CharacterLiteralCharacter
  | UnicodeEscape

AsciiEscape ::=
    \0
  | \"
  | \'
  | \t
  | \n
  | \r
  | \\
  | \x OctalDigit HexadecimalDigit

UnicodeEscape ::=
    \u{ (HexadecimalDigit _*)1-6 }

2.4.4:1 A CharacterLiteralCharacter is any Unicode character except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).

Legality Rules

2.4.4:2 A character literal is a literal that denotes a fixed Unicode character.

2.4.4:3 The type of a character literal is char.
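A brief sketch (hypothetical function name) of the three kinds of CharacterContent. Because an AsciiEscape is limited to 0x00 through 0x7F, a character such as U+0080 must be written with a UnicodeEscape instead.

```rust
/// Hypothetical example: each character literal denotes one `char`.
fn character_literals() -> [char; 4] {
    // Plain character, AsciiEscape `\t`, AsciiEscape `\x41`, UnicodeEscape.
    ['a', '\t', '\x41', '\u{1F30}']
}

// Rejected: '\x80' is outside the AsciiEscape range; use '\u{80}' instead.
```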

Examples

'a'
'\t'
'\x1b'
'\u{1F30}'

2.4.5. String Literals

Syntax

StringLiteral ::=
    RawStringLiteral
  | SimpleStringLiteral

Legality Rules

2.4.5:1 A string literal is a literal that consists of zero or more characters.

2.4.5:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a string literal.

2.4.5.1. Simple String Literals

Syntax

SimpleStringLiteral ::=
    " SimpleStringContent* "

SimpleStringContent ::=
    AsciiEscape
  | SimpleStringCharacter
  | StringContinuation
  | UnicodeEscape

2.4.5.1:1 A SimpleStringCharacter is any Unicode character except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).

2.4.5.1:2 StringContinuation is the character sequence 0x5C 0x0A (reverse solidus, new line).

Legality Rules

2.4.5.1:3 A simple string literal is a string literal where the characters are Unicode characters.

2.4.5.1:4 The type of a simple string literal is &'static str.
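The hypothetical function below shows what simple string literals denote. As with byte strings, rustc removes a StringContinuation together with the whitespace at the start of the following line, a detail this section does not state explicitly.

```rust
/// Hypothetical examples of simple string literals.
fn simple_strings() -> (&'static str, &'static str, &'static str) {
    // `\t` and `\n` each denote one character: 8 bytes in total.
    let escaped = "\tcol\nrow";
    // One character, encoded as three bytes of UTF-8.
    let unicode = "\u{B80A}";
    // The continuation and the leading whitespace are removed: "multiline".
    let continued = "multi\
                     line";
    (escaped, unicode, continued)
}
```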

Examples

""
"cat"
"\tcol\nrow"
"bell\x07"
"\u{B80a}"
"\
multi\
line\
string"

2.4.5.2. Raw String Literals

Syntax

RawStringLiteral ::=
    r RawStringContent

RawStringContent ::=
    NestedRawStringContent
  | " ~[\r]* "

NestedRawStringContent ::=
    # RawStringContent #

Legality Rules

2.4.5.2:1 A raw string literal is a string literal that does not recognize escaped characters.

2.4.5.2:2 The type of a raw string literal is &'static str.
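A short sketch (hypothetical function name) of raw string literals: backslashes are kept as ordinary characters, and a single `#` allows the literal to contain `"`.

```rust
/// Hypothetical examples of raw string literals.
fn raw_strings() -> (&'static str, &'static str) {
    // Backslashes are ordinary characters here.
    let path = r"C:\temp\new";
    // One `#` lets the literal contain `"` characters.
    let quoted = r#"He said "hi""#;
    (path, quoted)
}
```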

Examples

r""
r#""#
r##"left #"# right"##

2.4.6. Boolean Literals

Syntax

BooleanLiteral ::=
    false
  | true

Legality Rules

2.4.6:1 A boolean literal is a literal that denotes the truth values of logic and Boolean algebra.

2.4.6:2 The type of a boolean literal is bool.

Examples

true

2.5. Comments

Syntax

Comment ::=
    BlockCommentOrDoc
  | LineCommentOrDoc

BlockCommentOrDoc ::=
    BlockComment
  | InnerBlockDoc
  | OuterBlockDoc

LineCommentOrDoc ::=
    LineComment
  | InnerLineDoc
  | OuterLineDoc

LineComment ::=
    //
  | // (~[! /] | //) ~[\n]*

BlockComment ::=
    /* (~[! *] | ** | BlockCommentOrDoc) (BlockCommentOrDoc | ~[*/])* */
  | /**/
  | /***/

InnerBlockDoc ::=
    /*! (BlockCommentOrDoc | ~[*/ \r])* */

InnerLineDoc ::=
    //! ~[\n \r]*

OuterBlockDoc ::=
    /** (~[*] | BlockCommentOrDoc) (BlockCommentOrDoc | ~[*/ \r])* */

OuterLineDoc ::=
    /// (~[/] ~[\n \r]*)?

Legality Rules

2.5:1 A comment is a lexical element that acts as an annotation or an explanation in program text.

2.5:2 A block comment is a comment that spans one or more lines.

2.5:3 A line comment is a comment that spans exactly one line.

2.5:4 An inner block doc is a block comment that applies to an enclosing non-comment construct.

2.5:5 An inner line doc is a line comment that applies to an enclosing non-comment construct.

2.5:6 An inner doc comment is either an inner block doc or an inner line doc.

2.5:7 An outer block doc is a block comment that applies to a subsequent non-comment construct.

2.5:8 An outer line doc is a line comment that applies to a subsequent non-comment construct.

2.5:9 An outer doc comment is either an outer block doc or an outer line doc.

2.5:10 A doc comment is a comment class that includes inner block docs, inner line docs, outer block docs, and outer line docs.

2.5:11 Character 0x0D (carriage return) shall not appear in a comment.

2.5:12 Block comments, inner block docs, and outer block docs shall extend one or more lines.

2.5:13 Line comments, inner line docs, and outer line docs shall extend exactly one line.

2.5:14 Outer block docs and outer line docs shall apply to a subsequent non-comment construct.

2.5:15 Inner block docs and inner line docs shall apply to an enclosing non-comment construct.

2.5:16 Inner block docs and inner line docs are equivalent to attribute doc of the form #![doc = content], where content is a string literal form of the comment without the leading //! or /*! and the trailing */ characters.

2.5:17 Outer block docs and outer line docs are equivalent to attribute doc of the form #[doc = content], where content is a string literal form of the comment without the leading ///, /** and trailing */ characters.
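The equivalence in 2.5:16 and 2.5:17 can be seen by writing the same documentation both ways. In the sketch below, the names `add_one`, `add_one_attr`, and `documented` are hypothetical. Note that the content keeps the space that follows `///`.

```rust
/// Returns one more than `x`.
fn add_one(x: i32) -> i32 {
    x + 1
}

// The outer line doc above is equivalent to this attribute form.
#[doc = " Returns one more than `x`."]
fn add_one_attr(x: i32) -> i32 {
    x + 1
}

// The inner line doc below documents the enclosing module, like
// #![doc = " Documentation for the enclosing module."].
mod documented {
    //! Documentation for the enclosing module.

    /* A non-doc block comment /* may nest */ and is ignored. */
    pub fn two() -> i32 {
        2
    }
}
```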

Examples

// This is a stand-alone line comment. So is the next line.

////

/* This is a stand-alone
   block comment. */

/*
  /* This is a nested block comment */
*/

/// This outer line comment applies to commented_module.

/** This outer block comment applies to commented_module,
    and is considered documentation. */

pub mod commented_module {

    //! This inner line comment applies to commented_module.

    /*! This inner block comment applies to commented_module,
        and is considered documentation. */
}

2.6. Keywords

Syntax

Keyword ::=
    ReservedKeyword
  | StrictKeyword
  | WeakKeyword

Legality Rules

2.6:1 A keyword is a word in program text that has special meaning.

2.6:2 Keywords are case sensitive.

2.6.1. Strict Keywords

Syntax

StrictKeyword ::=
    as
  | async
  | await
  | break
  | const
  | continue
  | crate
  | dyn
  | else
  | enum
  | extern
  | false
  | fn
  | for
  | if
  | impl
  | in
  | let
  | loop
  | match
  | mod
  | move
  | mut
  | pub
  | ref
  | return
  | self
  | Self
  | static
  | struct
  | super
  | trait
  | true
  | type
  | unsafe
  | use
  | where
  | while

Legality Rules

2.6.1:1 A strict keyword is a keyword that always holds its special meaning.

2.6.2. Reserved Keywords

Syntax

ReservedKeyword ::=
    abstract
  | become
  | box
  | do
  | final
  | macro
  | override
  | priv
  | try
  | typeof
  | unsized
  | virtual
  | yield

Legality Rules

2.6.2:1 A reserved keyword is a keyword that is not yet in use.

2.6.3. Weak Keywords

Syntax

WeakKeyword ::=
    macro_rules
  | 'static
  | union

Legality Rules

2.6.3:1 A weak keyword is a keyword whose special meaning depends on the context.

2.6.3:2 Word macro_rules acts as a keyword only when used in the context of a MacroRulesDefinition.

2.6.3:3 Word 'static acts as a keyword only when used in the context of a LifetimeIndication.

2.6.3:4 Word union acts as a keyword only when used in the context of a UnionDeclaration.
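A sketch of weak keywords in context, using hypothetical names: `union` introduces a UnionDeclaration, yet the same word is an ordinary identifier elsewhere, and `'static` acts as a keyword in a lifetime position.

```rust
// Here `union` acts as a keyword: this is a UnionDeclaration.
union IntOrFloat {
    i: u32,
    f: f32,
}

/// Reinterprets the bits of an `f32` through the union.
fn bits_of(x: f32) -> u32 {
    let value = IntOrFloat { f: x };
    // Reading a union field is unsafe; both fields are 4-byte plain data.
    unsafe { value.i }
}

/// Outside a UnionDeclaration, `union` is an ordinary identifier.
fn weak_keyword_as_name() -> i32 {
    let union = 21;
    union * 2
}

/// `'static` used as a lifetime.
fn greeting() -> &'static str {
    "hi"
}
```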