Fix lexer to skip utf-8 whitespaces #2339

Merged (1 commit) on Jun 29, 2023
13 changes: 11 additions & 2 deletions gcc/rust/lex/rust-lex.cc
@@ -420,7 +420,10 @@ Lexer::build_token ()
{
/* ignore whitespace characters for tokens but continue updating
* location */
- case '\n': // newline
+ case '\n': // newline
+ case 0x0085: // next line
+ case 0x2028: // line separator
+ case 0x2029: // paragraph separator
current_line++;
current_column = 1;
// tell line_table that new line starts
@@ -432,10 +435,16 @@ Lexer::build_token ()
case ' ': // space
current_column++;
continue;
- case '\t': // tab
+ case '\t': // horizontal tab
// width of a tab is not well-defined, assume 8 spaces
current_column += 8;
continue;
+ case '\v': // vertical tab
+ case 0x000c: // form feed
+ case 0x200e: // left-to-right mark
+ case 0x200f: // right-to-left mark
+ // Ignored.
+ continue;

// punctuation - actual tokens
case '=':
16 changes: 16 additions & 0 deletions gcc/testsuite/rust/compile/torture/utf8_whitespaces.rs
@@ -0,0 +1,16 @@
fn main() {
// FORM FEED

// LINE TABULATION (vt)

// NEXT LINE (nel)

// LEFT-TO-RIGHT MARK
// RIGHT-TO-LEFT MARK
// LINE SEPARATOR

// PARAGRAPH SEPARATOR

}
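
For readers skimming the diff, the grouping of code points in the patched switch can be sketched as a standalone classifier. This is only an illustration under stated assumptions: WsKind and classify_whitespace are hypothetical names that do not appear in rust-lex.cc, and the sketch covers only the cases visible in the hunks above (carriage return, for example, is not shown in the diff and is therefore left out).

```cpp
#include <cstdint>

// Hypothetical names for illustration; not part of the gccrs lexer.
enum class WsKind
{
  NotWhitespace, // anything not listed in the hunks shown above
  NewLine,       // lexer bumps current_line and resets current_column to 1
  Space,         // lexer advances current_column by 1
  Tab,           // lexer advances current_column by 8
  Ignored        // lexer skips the character without touching the column
};

// Mirrors the case labels in the patched switch, nothing more.
constexpr WsKind
classify_whitespace (std::uint32_t cp)
{
  switch (cp)
    {
    case 0x000A: // newline
    case 0x0085: // next line
    case 0x2028: // line separator
    case 0x2029: // paragraph separator
      return WsKind::NewLine;
    case 0x0020: // space
      return WsKind::Space;
    case 0x0009: // horizontal tab
      return WsKind::Tab;
    case 0x000B: // vertical tab
    case 0x000C: // form feed
    case 0x200E: // left-to-right mark
    case 0x200F: // right-to-left mark
      return WsKind::Ignored;
    default:
      return WsKind::NotWhitespace;
    }
}
```

The sketch takes a decoded code point rather than a byte, since values such as 0x2028 do not fit in a single UTF-8 byte; whether the real lexer decodes before this switch is not visible in the hunks shown.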