Caution
You’re reading a draft of the Ferrocene Language Specification. Some parts of this document might be missing, incomplete or incorrect. Our aim is to have the specification ready by the end of 2022.
2. Lexical Elements¶
Note
The contents of this section are informational.
2:1 The text of a Rust program consists of modules organized into source files. The text of a source file is a sequence of lexical elements, each composed of characters, whose rules are presented in this chapter.
2.1. Character Set¶
2.1:1 The program text of a Rust program is written using the Unicode character set.
Syntax
2.1:2 A character is defined by this document for each cell in the coding space described by Unicode, regardless of whether or not Unicode allocates a character to that cell.
2.1:3 A whitespace character is one of the following characters:
2.1:4 0x09 (horizontal tabulation)
2.1:5 0x0A (new line)
2.1:6 0x0B (vertical tabulation)
2.1:7 0x0C (form feed)
2.1:8 0x0D (carriage return)
2.1:9 0x20 (space)
2.1:10 0x85 (next line)
2.1:11 0x200E (left-to-right mark)
2.1:12 0x200F (right-to-left mark)
2.1:13 0x2028 (line separator)
2.1:14 0x2029 (paragraph separator)
2.1:15 A whitespace string is a string that consists of one or more whitespace characters.
2.1:16 An AsciiCharacter is any Unicode character in the range 0x00 - 0x7F, both inclusive.
Legality Rules
2.1:17 The coded representation of a character is tool-defined.
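As an illustration only (not part of the specification), the whitespace set of 2.1:4 through 2.1:14 and the AsciiCharacter range of 2.1:16 can be sketched as predicates. The function names are hypothetical. Note that this set is not the same as the one tested by the standard library's `char::is_whitespace`, which follows the Unicode `White_Space` property.

```rust
/// Hypothetical helper: true if `c` is one of the whitespace characters
/// listed in 2.1:4 through 2.1:14.
pub fn is_spec_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'     // horizontal tabulation
        | '\u{000A}'   // new line
        | '\u{000B}'   // vertical tabulation
        | '\u{000C}'   // form feed
        | '\u{000D}'   // carriage return
        | '\u{0020}'   // space
        | '\u{0085}'   // next line
        | '\u{200E}'   // left-to-right mark
        | '\u{200F}'   // right-to-left mark
        | '\u{2028}'   // line separator
        | '\u{2029}'   // paragraph separator
    )
}

/// Hypothetical helper: true if `s` is a whitespace string (2.1:15),
/// that is, one or more whitespace characters.
pub fn is_whitespace_string(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_spec_whitespace)
}

/// Hypothetical helper: true if `c` is an AsciiCharacter (2.1:16).
pub fn is_ascii_character(c: char) -> bool {
    (c as u32) <= 0x7F
}
```

For example, U+200E (left-to-right mark) is in the list above but is not `White_Space` in Unicode, while U+00A0 (no-break space) is `White_Space` but is not in the list.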
2.2. Lexical Elements, Separators, and Punctuation¶
Syntax
LexicalElement
::=Comment
|Identifier
|Keyword
|Literal
|Punctuation
Punctuation
::=Delimiter
| + | - | * | / | % | ^ | ! | & | | | && | || | << | >> | += | -= | *= | /= | %= | ^= | &= | |= | <<= | >>= | = | == | != | > | < | >= | <= | @ | _ | . | .. | ... | ..= | , | ; | : | :: | -> | => | # | $ | ?
Delimiter
::= { | } | [ | ] | ( | )
Legality Rules
2.2:1 The text of a source file is a sequence of separate lexical elements. The meaning of a program depends only on the particular sequence of lexical elements, excluding non-doc comments.
2.2:2 A lexical element is the most basic syntactic element in program text.
2.2:3 The text of a source file is divided into lines.
2.2:4 A line is a sequence of zero or more characters followed by an end of line.
2.2:5 The representation of an end of line is tool-defined.
2.2:6 A separator is a character or a string that separates adjacent lexical elements. A whitespace string is a separator.
2.2:7 A simple punctuator is one of the following special characters:
+ - * / % ^ ! & | = > < @ _ . , ; : # $ ? { } [ ] ( )
2.2:8 A compound punctuator is one of the following two or more adjacent special characters:
&& || << >> += -= *= /= %= ^= &= |= <<= >>= == != >= <= .. ... ..= :: -> =>
2.2:9 The following compound punctuators are flexible compound punctuators.
&& || << >>
2.2:10 A flexible compound punctuator may be treated as a single compound punctuator or two adjacent simple punctuators.
2.2:11 Each of the special characters listed for simple punctuators is a simple punctuator, unless the character is used as a character of a compound punctuator, a character literal, a comment, a numeric literal, or a string literal.
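The following sketch shows, under the reading of 2.2:10 given above, how the same character pair can act either as one compound punctuator or as two adjacent simple punctuators depending on context. All function names are hypothetical, and how a particular tool performs the split is not specified here.

```rust
/// `>>` closes two generic argument lists: two adjacent `>` punctuators.
pub fn nested() -> Vec<Vec<i32>> {
    vec![vec![1, 2], vec![3]]
}

/// `>>` is a single compound punctuator: shift right.
pub fn shifted(x: u32) -> u32 {
    x >> 2
}

/// `&&` is two adjacent `&` punctuators: a reference to a reference.
pub fn double_ref(x: i32) -> i32 {
    let r: &&i32 = &&x;
    **r
}

/// `&&` is a single compound punctuator: lazy boolean and.
pub fn both(a: bool, b: bool) -> bool {
    a && b
}

/// `||` is two adjacent `|` punctuators: an empty closure parameter list.
pub fn answer() -> i32 {
    let f = || 42;
    f()
}

/// `||` is a single compound punctuator: lazy boolean or.
pub fn either(a: bool, b: bool) -> bool {
    a || b
}
```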
2.2:12 The following names are used when referring to punctuators:
2.2:13   punctuator   name
2.2:14   +            Plus
2.2:15   -            Minus
2.2:16   *            Star
2.2:17   /            Slash
2.2:18   %            Percent
2.2:19   ^            Caret
2.2:20   !            Not
2.2:21   &            And
2.2:22   |            Or
2.2:23   &&           And and, lazy boolean and
2.2:24   ||           Or or, lazy boolean or
2.2:25   <<           Shift left
2.2:26   >>           Shift right
2.2:27   +=           Plus equals
2.2:28   -=           Minus equals
2.2:29   *=           Star equals
2.2:30   /=           Slash equals
2.2:31   %=           Percent equals
2.2:32   ^=           Caret equals
2.2:33   &=           And equals
2.2:34   |=           Or equals
2.2:35   <<=          Shift left equals
2.2:36   >>=          Shift right equals
2.2:37   =            Equals
2.2:38   ==           Equals equals
2.2:39   !=           Not equals
2.2:40   >            Greater than
2.2:41   <            Less than
2.2:42   >=           Greater than equals
2.2:43   <=           Less than equals
2.2:44   @            At
2.2:45   _            Underscore
2.2:46   .            Dot
2.2:47   ..           Dot dot, exclusive range
2.2:48   ...          Dot dot dot, ellipsis
2.2:49   ..=          Dot dot equals, inclusive range
2.2:50   ,            Comma
2.2:51   ;            Semicolon
2.2:52   :            Colon
2.2:53   ::           Path separator
2.2:54   ->           Right arrow
2.2:55   =>           Fat arrow, Hashrocket
2.2:56   #            Pound
2.2:57   $            Dollar sign
2.2:58   ?            Question mark
2.2:59   {            Left curly brace
2.2:60   }            Right curly brace
2.2:61   [            Left square bracket
2.2:62   ]            Right square bracket
2.2:63   (            Left parenthesis
2.2:64   )            Right parenthesis
2.3. Identifiers¶
Syntax
Identifier
::= NonKeywordIdentifier
| RawIdentifier
IdentifierList
::= Identifier (, Identifier)* ,?
NonKeywordIdentifier
::= PureIdentifier
| WeakKeyword
RawIdentifier
::= r# (PureIdentifier | RawIdentifierKeyword)
PureIdentifier
::= XID_Start XID_Continue*
| _ XID_Continue+
IdentifierOrUnderscore
::= Identifier
| _
Renaming
::= as IdentifierOrUnderscore
2.3:1 A RawIdentifierKeyword is any keyword in category Keyword, except crate, self, Self, and super.
2.3:2 XID_Start and XID_Continue are defined in Unicode Standard Annex #31.
Legality Rules
2.3:3 An identifier is a lexical element that refers to a name.
2.3:4 A pure identifier is an identifier that does not include weak keywords.
2.3:5 A pure identifier shall follow the specification in Unicode Standard Annex #31 for Unicode version 13.0, with the following profile:
2.3:6 Start = XID_Start, plus character 0x5F (low line).
2.3:7 Continue = XID_Continue
2.3:8 Medial = empty
2.3:9 Characters 0x200C (zero width non-joiner) and 0x200D (zero width joiner) shall not appear in a pure identifier.
2.3:10 A pure identifier shall be restricted to characters in category AsciiCharacter in the following contexts:
2.3:11 Crate imports,
2.3:12 Names of external crates represented in a simple path, when the simple path starts with namespace qualifier ::,
2.3:13 Names of outline modules that lack attribute path,
2.3:14 Names of items that are subject to attribute no_mangle,
2.3:15 Names of items within external blocks.
2.3:16 Identifiers are normalized using Normalization Form C as defined in Unicode Standard Annex #15.
2.3:17 Two identifiers are considered the same if they consist of the same sequence of characters after performing normalization.
2.3:18 Declarative macros and procedural macros shall receive normalized identifiers in their input.
Examples
foo
_identifier
r#true
Москва
東京
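A short sketch of how the identifier forms above appear in a program. The function and variable names are chosen for illustration only. The first function uses raw identifiers so that the strict keywords `match` and `type` can serve as ordinary names; the second uses a non-ASCII pure identifier and one that starts with a low line.

```rust
/// `r#match` and `r#type` are raw identifiers built from strict keywords.
pub fn r#match(r#type: &str) -> usize {
    r#type.len()
}

/// `東京` is a pure identifier made of non-ASCII XID characters;
/// `_identifier` starts with 0x5F (low line) followed by XID_Continue characters.
pub fn sum_of_names() -> i32 {
    let 東京 = 1;
    let _identifier = 2;
    東京 + _identifier
}
```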
2.4. Literals¶
Syntax
Literal
::=BooleanLiteral
|ByteLiteral
|ByteStringLiteral
|CharacterLiteral
|NumericLiteral
|StringLiteral
Legality Rules
2.4:1 A literal is a fixed value in program text.
2.4.1. Byte Literals¶
Syntax
ByteLiteral
::= b' ByteContent '
ByteContent
::= ByteCharacter
| ByteEscape
ByteEscape
::= \0 | \" | \' | \t | \n | \r | \\ | \x OctalDigit HexadecimalDigit
2.4.1:1 A ByteCharacter is any character in category AsciiCharacter except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).
Legality Rules
2.4.1:2 A byte literal is a literal that denotes a fixed byte value.
2.4.1:3 The type of a byte literal is u8.
Examples
b'h'
b'\n'
b'\x1B'
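As an illustration, the three examples above can be checked against their numeric values. The function name is hypothetical.

```rust
/// Returns the example byte literals as `u8` values (2.4.1:3).
pub fn byte_values() -> [u8; 3] {
    [
        b'h',    // a ByteCharacter: 0x68
        b'\n',   // a ByteEscape for new line: 0x0A
        b'\x1B', // a hexadecimal ByteEscape: 0x1B
    ]
}
```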
2.4.2. Byte String Literals¶
Syntax
ByteStringLiteral
::=RawByteStringLiteral
|SimpleByteStringLiteral
Legality Rules
2.4.2:1 A byte string literal is a literal that consists of multiple AsciiCharacters.
2.4.2:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a byte string literal.
2.4.2.1. Simple Byte String Literals¶
Syntax
SimpleByteStringLiteral
::= b" SimpleByteStringContent* "
SimpleByteStringContent
::= ByteEscape
| SimpleByteStringCharacter
| StringContinuation
2.4.2.1:1 A SimpleByteStringCharacter is any character in category AsciiCharacter except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).
Legality Rules
2.4.2.1:2 A simple byte string literal is a byte string literal that consists of multiple AsciiCharacters.
2.4.2.1:3 The type of a simple byte string literal of size N is &'static [u8; N].
Examples
b""
b"a\tb"
b"Multi\
line"
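The sketch below shows the type rule of 2.4.2.1:3 on the examples above: the array length N in the return type is the number of bytes after escapes and string continuations have been processed. The function names are hypothetical.

```rust
/// The empty simple byte string literal has size 0.
pub fn empty() -> &'static [u8; 0] {
    b""
}

/// `\t` is one ByteEscape, so the literal has size 3.
pub fn with_escape() -> &'static [u8; 3] {
    b"a\tb"
}

/// The StringContinuation (reverse solidus, new line) contributes no bytes;
/// rustc also skips the whitespace that follows it on the next line.
pub fn continued() -> &'static [u8; 9] {
    b"Multi\
      line"
}
```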
2.4.2.2. Raw Byte String Literals¶
Syntax
RawByteStringLiteral
::= br RawByteStringContent
RawByteStringContent
::= NestedRawByteStringContent
| " AsciiCharacter* "
NestedRawByteStringContent
::= # RawByteStringContent #
Legality Rules
2.4.2.2:1 A raw byte string literal is a simple byte string literal that does not recognize escaped characters.
2.4.2.2:2 The type of a raw byte string literal of size N is &'static [u8; N].
Examples
br""
br#""#
br##"left #"# right"##
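To illustrate 2.4.2.2:1, the following sketch compares a raw byte string literal with a simple one containing the same source characters. The function names are hypothetical.

```rust
/// In a raw byte string literal, `\n` stays as two bytes: 0x5C and 0x6E.
pub fn raw_backslash_n() -> &'static [u8] {
    br"\n"
}

/// In a simple byte string literal, `\n` is a ByteEscape for one byte: 0x0A.
pub fn simple_backslash_n() -> &'static [u8] {
    b"\n"
}

/// Two `#` on each side let the content contain `"#` without ending the literal.
pub fn nested() -> &'static [u8] {
    br##"left #"# right"##
}
```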
2.4.3. Numeric Literals¶
Syntax
NumericLiteral
::=FloatLiteral
|IntegerLiteral
Legality Rules
2.4.3:1 A numeric literal is a literal that denotes a number.
2.4.3.1. Integer Literals¶
Syntax
IntegerLiteral
::= IntegerContent IntegerSuffix?
IntegerContent
::= BinaryLiteral
| DecimalLiteral
| HexadecimalLiteral
| OctalLiteral
BinaryLiteral
::= 0b BinaryDigitOrUnderscore* BinaryDigit BinaryDigitOrUnderscore*
BinaryDigitOrUnderscore
::= BinaryDigit
| _
BinaryDigit
::= [0-1]
DecimalLiteral
::= DecimalDigit DecimalDigitOrUnderscore*
DecimalDigitOrUnderscore
::= DecimalDigit
| _
DecimalDigit
::= [0-9]
HexadecimalLiteral
::= 0x HexadecimalDigitOrUnderscore* HexadecimalDigit HexadecimalDigitOrUnderscore*
HexadecimalDigitOrUnderscore
::= HexadecimalDigit
| _
HexadecimalDigit
::= [0-9 a-f A-F]
OctalLiteral
::= 0o OctalDigitOrUnderscore* OctalDigit OctalDigitOrUnderscore*
OctalDigitOrUnderscore
::= OctalDigit
| _
OctalDigit
::= [0-7]
IntegerSuffix
::= SignedIntegerSuffix
| UnsignedIntegerSuffix
SignedIntegerSuffix
::= i8 | i16 | i32 | i64 | i128 | isize
UnsignedIntegerSuffix
::= u8 | u16 | u32 | u64 | u128 | usize
Legality Rules
2.4.3.1:1 An integer literal is a numeric literal that denotes a whole number.
2.4.3.1:2 A binary literal is an integer literal in base 2.
2.4.3.1:3 A decimal literal is an integer literal in base 10.
2.4.3.1:4 A hexadecimal literal is an integer literal in base 16.
2.4.3.1:5 An octal literal is an integer literal in base 8.
2.4.3.1:6 An integer suffix is a component of an integer literal that specifies an explicit integer type.
2.4.3.1:7 A suffixed integer is an integer literal with an integer suffix.
2.4.3.1:8 An unsuffixed integer is an integer literal without an integer suffix.
2.4.3.1:9 The type of a suffixed integer is determined by its integer suffix as follows:
2.4.3.1:10 Suffix i8 specifies type i8.
2.4.3.1:11 Suffix i16 specifies type i16.
2.4.3.1:12 Suffix i32 specifies type i32.
2.4.3.1:13 Suffix i64 specifies type i64.
2.4.3.1:14 Suffix i128 specifies type i128.
2.4.3.1:15 Suffix isize specifies type isize.
2.4.3.1:16 Suffix u8 specifies type u8.
2.4.3.1:17 Suffix u16 specifies type u16.
2.4.3.1:18 Suffix u32 specifies type u32.
2.4.3.1:19 Suffix u64 specifies type u64.
2.4.3.1:20 Suffix u128 specifies type u128.
2.4.3.1:21 Suffix usize specifies type usize.
2.4.3.1:22 The type of an unsuffixed integer is determined by type inference as follows:
2.4.3.1:23 If an integer type can be uniquely determined from the surrounding program context, then the unsuffixed integer has that type.
2.4.3.1:24 If the program context under-constrains the type, then the inferred type is i32.
2.4.3.1:25 If the program context over-constrains the type, then this is considered a static error.
Examples
0b0010_1110_u8
1___2_3
0x4D8a
0o77_52i128
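The sketch below evaluates the example literals and illustrates the typing rules 2.4.3.1:9 through 2.4.3.1:24. All function names are hypothetical, and `type_name_of` is a small helper written for this illustration; it relies on `std::any::type_name`, whose output is intended for diagnostics.

```rust
/// Hypothetical helper: reports the static type of a value by name.
pub fn type_name_of<T>(_: &T) -> &'static str {
    std::any::type_name::<T>()
}

/// Suffixed binary literal; underscores do not affect the value.
pub fn binary() -> u8 {
    0b0010_1110_u8
}

/// Unsuffixed decimal literal with repeated underscores.
pub fn decimal() -> i32 {
    1___2_3
}

/// Hexadecimal digits may mix upper and lower case.
pub fn hexadecimal() -> i32 {
    0x4D8a
}

/// Suffixed octal literal of type i128.
pub fn octal() -> i128 {
    0o77_52i128
}

/// Nothing else constrains `x`, so the inferred type is i32 (2.4.3.1:24).
pub fn under_constrained() -> &'static str {
    let x = 1;
    type_name_of(&x)
}

/// The later use as a `u64` uniquely determines the type of `y` (2.4.3.1:23).
pub fn determined_by_context() -> &'static str {
    let y = 1;
    let _z: u64 = y;
    type_name_of(&y)
}
```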
2.4.3.2. Float Literals¶
Syntax
FloatLiteral
::= DecimalLiteral .
| DecimalLiteral FloatExponent
| DecimalLiteral . DecimalLiteral FloatExponent?
| DecimalLiteral (. DecimalLiteral)? FloatExponent? FloatSuffix
FloatExponent
::= ExponentLetter ExponentSign? ExponentMagnitude
ExponentLetter
::= e | E
ExponentSign
::= + | -
ExponentMagnitude
::= DecimalDigitOrUnderscore* DecimalDigit DecimalDigitOrUnderscore*
FloatSuffix
::= f32 | f64
Legality Rules
2.4.3.2:1 A float literal is a numeric literal that denotes a fractional number.
2.4.3.2:2 A float suffix is a component of a float literal that specifies an explicit floating-point type.
2.4.3.2:3 A suffixed float is a float literal with a float suffix.
2.4.3.2:4 An unsuffixed float is a float literal without a float suffix.
2.4.3.2:5 The type of a suffixed float is determined by the float suffix as follows:
2.4.3.2:6 Suffix f32 specifies type f32.
2.4.3.2:7 Suffix f64 specifies type f64.
2.4.3.2:8 The type of an unsuffixed float is determined by type inference as follows:
2.4.3.2:9 If a floating-point type can be uniquely determined from the surrounding program context, then the unsuffixed float has that type.
2.4.3.2:10 If the program context under-constrains the type, then the inferred type is f64.
2.4.3.2:11 If the program context over-constrains the type, then this is considered a static error.
Examples
45.
8E+1_820
3.14e5
8_031.4_e-12f64
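A minimal sketch of float literal forms and their types. The function names are hypothetical, and `type_name_of` is a helper written for this illustration. The literals are chosen so that their values are exactly representable, which keeps the equality checks in the usage note meaningful.

```rust
/// Hypothetical helper: reports the static type of a value by name.
pub fn type_name_of<T>(_: &T) -> &'static str {
    std::any::type_name::<T>()
}

/// A DecimalLiteral followed by a dot and nothing else.
pub fn trailing_dot() -> f64 {
    45.
}

/// A fractional part followed by a FloatExponent.
pub fn with_exponent() -> f64 {
    3.14e5
}

/// A FloatSuffix selects the type explicitly.
pub fn suffixed() -> f32 {
    2.5e-1f32
}

/// Nothing else constrains `x`, so the inferred type is f64.
pub fn under_constrained() -> &'static str {
    let x = 1.5;
    type_name_of(&x)
}
```

Under these definitions, `trailing_dot()` equals `45.0`, `with_exponent()` equals `314000.0`, and `suffixed()` equals `0.25_f32`.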
2.4.4. Character Literals¶
Syntax
CharacterLiteral
::= ' CharacterContent '
CharacterContent
::= AsciiEscape
| CharacterLiteralCharacter
| UnicodeEscape
AsciiEscape
::= \0 | \" | \' | \t | \n | \r | \\ | \x OctalDigit HexadecimalDigit
UnicodeEscape
::= \u{ (HexadecimalDigit _*)1-6 }
2.4.4:1 A CharacterLiteralCharacter is any Unicode character except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).
Legality Rules
2.4.4:2 A character literal is a literal that denotes a fixed Unicode character.
2.4.4:3 The type of a character literal is char.
Examples
'a'
'\t'
'\x1b'
'\u{1F30}'
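As an illustration, the example character literals can be mapped to their code points. The function name is hypothetical.

```rust
/// Returns the code points denoted by the example character literals.
pub fn code_points() -> [u32; 4] {
    [
        'a' as u32,         // a CharacterLiteralCharacter
        '\t' as u32,        // an AsciiEscape
        '\x1b' as u32,      // a hexadecimal AsciiEscape (at most 0x7F)
        '\u{1F30}' as u32,  // a UnicodeEscape
    ]
}
```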
2.4.5. String Literals¶
Syntax
StringLiteral
::=RawStringLiteral
|SimpleStringLiteral
Legality Rules
2.4.5:1 A string literal is a literal that consists of multiple characters.
2.4.5:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a string literal.
2.4.5.1. Simple String Literals¶
Syntax
SimpleStringLiteral
::= " SimpleStringContent* "
SimpleStringContent
::= AsciiEscape
| SimpleStringCharacter
| StringContinuation
| UnicodeEscape
2.4.5.1:1 A SimpleStringCharacter is any Unicode character except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).
2.4.5.1:2 StringContinuation is the character sequence 0x5C 0x0A (reverse solidus, new line).
Legality Rules
2.4.5.1:3 A simple string literal is a string literal where the characters are Unicode characters.
2.4.5.1:4 The type of a simple string literal is &'static str.
Examples
""
"cat"
"\tcol\nrow"
"bell\x07"
"\u{B80a}"
"\
multi\
line\
string"
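The sketch below shows how escapes and string continuations contribute to the value of a simple string literal. The function names are hypothetical.

```rust
/// `\t` and `\n` are AsciiEscapes, each denoting one character.
pub fn with_escapes() -> &'static str {
    "\tcol\nrow"
}

/// A UnicodeEscape is written with braces, per the UnicodeEscape grammar.
pub fn with_unicode_escape() -> &'static str {
    "\u{B80a}"
}

/// Each StringContinuation (reverse solidus, new line) contributes nothing;
/// rustc also skips the whitespace that follows it on the next line.
pub fn continued() -> &'static str {
    "\
    multi\
    line\
    string"
}
```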
2.4.5.2. Raw String Literals¶
Syntax
RawStringLiteral
::= r RawStringContent
RawStringContent
::= NestedRawStringContent
| " ~[\r]* "
NestedRawStringContent
::= # RawStringContent #
Legality Rules
2.4.5.2:1 A raw string literal is a simple string literal that does not recognize escaped characters.
2.4.5.2:2 The type of a raw string literal is &'static str.
Examples
r""
r#""#
r##"left #"# right"##
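To illustrate 2.4.5.2:1, the following sketch contrasts a raw string literal with a simple string literal and shows how the surrounding `#` characters allow quotation marks in the content. The function names are hypothetical.

```rust
/// In a raw string literal, `\n` stays as two characters.
pub fn raw_backslash_n() -> &'static str {
    r"\n"
}

/// One `#` on each side lets the content contain quotation marks.
pub fn quoted() -> &'static str {
    r#"say "hi""#
}

/// Two `#` on each side let the content contain `"#`.
pub fn nested() -> &'static str {
    r##"left #"# right"##
}
```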
2.4.6. Boolean Literals¶
Syntax
BooleanLiteral
::=
false
| true
Legality Rules
2.4.6:1 A boolean literal is a literal that denotes the truth values of logic and Boolean algebra.
2.4.6:2 The type of a boolean literal is bool.
Examples
true
2.6. Keywords¶
Syntax
Keyword
::=ReservedKeyword
|StrictKeyword
|WeakKeyword
Legality Rules
2.6:1 A keyword is a word in program text that has special meaning.
2.6:2 Keywords are case sensitive.
2.6.1. Strict Keywords¶
Syntax
StrictKeyword
::=
as
| async
| await
| break
| const
| continue
| crate
| dyn
| enum
| extern
| false
| fn
| for
| if
| impl
| in
| let
| loop
| match
| mod
| move
| mut
| pub
| ref
| return
| self
| Self
| static
| struct
| super
| trait
| true
| type
| unsafe
| use
| where
| while
Legality Rules
2.6.1:1 A strict keyword is a keyword that always holds its special meaning.
2.6.2. Reserved Keywords¶
Syntax
ReservedKeyword
::=
abstract
| become
| box
| do
| final
| macro
| override
| priv
| try
| typeof
| unsized
| virtual
| yield
Legality Rules
2.6.2:1 A reserved keyword is a keyword that is not yet in use.
2.6.3. Weak Keywords¶
Syntax
WeakKeyword
::=
macro_rules
| 'static
| union
Legality Rules
2.6.3:1 A weak keyword is a keyword whose special meaning depends on the context.
2.6.3:2 Word macro_rules acts as a keyword only when used in the context of a MacroRulesDefinition.
2.6.3:3 Word 'static acts as a keyword only when used in the context of a LifetimeIndication.
2.6.3:4 Word union acts as a keyword only when used in the context of a UnionDeclaration.
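As an illustration of 2.6.3:4, the sketch below uses `union` once as a keyword and once as an ordinary identifier. The type name `Bits` and the function names are hypothetical.

```rust
/// Here `union` acts as a keyword: it introduces a union declaration.
pub union Bits {
    pub int: u32,
    pub float: f32,
}

/// Reads the bit pattern of 1.0_f32 through the union.
pub fn bits_of_one() -> u32 {
    let b = Bits { float: 1.0 };
    // Reading a union field requires `unsafe`. Both fields are 4 bytes wide
    // and every bit pattern is a valid u32, so this read is well defined.
    unsafe { b.int }
}

/// Here `union` is an ordinary identifier: the name of a function.
pub fn union(a: u32, b: u32) -> u32 {
    a | b
}
```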
2.5. Comments¶
Syntax
Legality Rules
2.5:1 A comment is a lexical element that acts as an annotation or an explanation in program text.
2.5:2 A block comment is a comment that spans one or more lines.
2.5:3 A line comment is a comment that spans exactly one line.
2.5:4 An inner block doc is a block comment that applies to an enclosing non-comment construct.
2.5:5 An inner line doc is a line comment that applies to an enclosing non-comment construct.
2.5:6 An inner doc comment is either an inner block doc or an inner line doc.
2.5:7 An outer block doc is a block comment that applies to a subsequent non-comment construct.
2.5:8 An outer line doc is a line comment that applies to a subsequent non-comment construct.
2.5:9 An outer doc comment is either an outer block doc or an outer line doc.
2.5:10 A doc comment is a comment class that includes inner block docs, inner line docs, outer block docs, and outer line docs.
2.5:11 Character 0x0D (carriage return) shall not appear in a comment.
2.5:12 Block comments, inner block docs, and outer block docs shall extend one or more lines.
2.5:13 Line comments, inner line docs, and outer line docs shall extend exactly one line.
2.5:14 Outer block docs and outer line docs shall apply to a subsequent non-comment construct.
2.5:15 Inner block docs and inner line docs shall apply to an enclosing non-comment construct.
2.5:16 Inner block docs and inner line docs are equivalent to attribute doc of the form #![doc = content], where content is a string literal form of the comment without the leading //!, /*! and trailing */ characters.
2.5:17 Outer block docs and outer line docs are equivalent to attribute doc of the form #[doc = content], where content is a string literal form of the comment without the leading ///, /** and trailing */ characters.
Examples
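The following is an illustrative sketch of the comment classes and of the equivalence described in 2.5:16 and 2.5:17; it is not taken from the specification. All item names are hypothetical.

```rust
// A line comment. It is not a doc comment and applies to nothing.

/* A block comment,
   spanning two lines. */

/// An outer line doc applying to the function that follows.
pub fn documented_with_comment(x: i32) -> i32 {
    x + 1
}

// The attribute form that corresponds to the outer line doc above.
#[doc = " An outer line doc applying to the function that follows."]
pub fn documented_with_attribute(x: i32) -> i32 {
    x + 1
}

/** An outer block doc applying to the module that follows. */
pub mod inner {
    //! An inner line doc applying to the enclosing module `inner`.
    /*! An inner block doc, also applying to `inner`. */

    pub fn two() -> i32 {
        2
    }
}
```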