Understanding Chrome V8 — Chapter 4: Scanner, Token generation

Written by huidou | Published 2022/08/14

TL;DR: V8_INLINE Token::Value Scanner::ScanSingleToken() is where token scanning starts. During initialization, c0_ was already set to point to the first character of the JS stream. Once a token has been read, the next step is to determine whether it is a keyword or an identifier. Figure 1 is the call stack of these methods, and Figure 2 is the variable window, where we can see that one_byte_char is c0_. Figure 3 shows that the eight scanned characters are exactly the first string, function.

1. Test Case

Note: The test case is very simple, so it can’t cover the full flow of the scanner.

2. Token: function, JsPrint

V8_INLINE Token::Value Scanner::ScanSingleToken() is where token scanning starts. During initialization, c0_ was already set to point to the first character of the JS stream. In our test case, c0_ points to “f”. Let’s see how it works.

(1): Line 5 of the above code checks whether c0_ is an ASCII character. In our case, it is (c0_ is ‘f’).

(2): Line 6 of the above code uses one_char_tokens, a predefined lookup table in V8: an array that maps each ASCII character to a token type. It is below:

one_char_tokens is built from the above three parts of the code. In our case, ‘f’ satisfies IsAsciiIdentifier(c), so its type is Token::IDENTIFIER. IsAsciiIdentifier is below:
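As a rough, self-contained model of steps (1) and (2), the sketch below builds a 128-entry table indexed by ASCII code and looks up the first character in it. It is not V8's source: the reduced Token enum, BuildOneCharTokens, kOneCharTokens and ClassifyFirstChar are hypothetical names chosen for illustration, and only a handful of punctuators are modeled.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical, reduced token set; the real Token::Value has many more members.
enum class Token : std::uint8_t {
  ILLEGAL, LPAREN, RPAREN, LBRACE, RBRACE, SEMICOLON, STRING, NUMBER, IDENTIFIER
};

constexpr bool IsDecimalDigit(int c) { return c >= '0' && c <= '9'; }

// ASCII characters that may appear in an identifier: letters, digits, '$', '_'.
constexpr bool IsAsciiIdentifier(int c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         IsDecimalDigit(c) || c == '$' || c == '_';
}

// Maps one ASCII character to the token type it starts. Digits are tested
// before identifiers, so '7' becomes NUMBER rather than IDENTIFIER.
constexpr Token GetOneCharToken(int c) {
  return c == '(' ? Token::LPAREN
       : c == ')' ? Token::RPAREN
       : c == '{' ? Token::LBRACE
       : c == '}' ? Token::RBRACE
       : c == ';' ? Token::SEMICOLON
       : (c == '"' || c == '\'') ? Token::STRING
       : IsDecimalDigit(c) ? Token::NUMBER
       : IsAsciiIdentifier(c) ? Token::IDENTIFIER
       : Token::ILLEGAL;
}

// Fills the table once, so classifying a character is a single array read.
std::array<Token, 128> BuildOneCharTokens() {
  std::array<Token, 128> table{};
  for (int c = 0; c < 128; ++c) {
    table[static_cast<std::size_t>(c)] = GetOneCharToken(c);
  }
  return table;
}

const std::array<Token, 128> kOneCharTokens = BuildOneCharTokens();

// Step (1): ASCII check. Step (2): table lookup.
// In this sketch, non-ASCII input is simply reported as ILLEGAL.
Token ClassifyFirstChar(int c0) {
  if (c0 < 0 || c0 > 127) return Token::ILLEGAL;
  return kOneCharTokens[static_cast<std::size_t>(c0)];
}
```

With this model, ClassifyFirstChar('f') yields Token::IDENTIFIER, which is the case the test case hits.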

(3): Look at ScanSingleToken() again. The type of c0_ is Token::IDENTIFIER, so ScanIdentifierOrKeyword() is called. ScanIdentifierOrKeyword leads through several methods, which mainly wrap calls between the classes involved. Figure 1 is the call stack of these methods.

Search for ScanSingleToken in the V8 source and set a breakpoint on it to reproduce the call stack in Figure 1.

(4): Before scanning the next character, ‘f’ needs to be saved. The following is the method that saves characters.

Figure 2 is the variable window. We can see that one_byte_char is c0_, which is stored in backing_store_.
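To make the saving step concrete, here is a minimal sketch of a growable one-byte literal buffer. The member names backing_store_ and one_byte_char follow the text above; everything else (the class layout, the std::vector storage, the doubling growth policy, ToString) is an assumption made for illustration and not V8's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified literal buffer: collects the characters of the token being scanned.
class LiteralBuffer {
 public:
  // Appends one one-byte character, growing the store when it is full.
  void AddOneByteChar(std::uint8_t one_byte_char) {
    if (position_ >= backing_store_.size()) ExpandBuffer();
    backing_store_[position_++] = one_byte_char;
  }

  std::size_t length() const { return position_; }

  // Copies only the used part of the store into a string.
  std::string ToString() const {
    return std::string(backing_store_.begin(),
                       backing_store_.begin() +
                           static_cast<std::ptrdiff_t>(position_));
  }

 private:
  // Hypothetical growth policy: start at 16 bytes, then double.
  void ExpandBuffer() {
    backing_store_.resize(backing_store_.empty() ? 16
                                                 : backing_store_.size() * 2);
  }

  std::vector<std::uint8_t> backing_store_;
  std::size_t position_ = 0;
};
```

Calling AddOneByteChar('f') on an empty buffer leaves it holding the single character f, ready for the following characters to be appended.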

Let’s start scanning the second character. The code is below.

The AdvanceUntil method reads characters one at a time and stops when it encounters a terminator, that is, a character that cannot belong to the current token.
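The idea behind AdvanceUntil can be sketched as a loop driven by a predicate. The MiniScanner below is a hypothetical stand-in written for this article, not V8's Scanner: it works on a std::string instead of a character stream, and the helper names IsAsciiIdentifierPart, ScanIdentifierText and SkipSpaces are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <utility>

constexpr int kEndOfInput = -1;

// Characters allowed inside an ASCII identifier in this sketch.
bool IsAsciiIdentifierPart(int c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '$' || c == '_';
}

class MiniScanner {
 public:
  // Like the scanner described above, c0_ points at the first character
  // as soon as initialization is done.
  explicit MiniScanner(std::string source) : source_(std::move(source)) {
    Advance();
  }

  int c0() const { return c0_; }

  // Moves c0_ to the next character, or to kEndOfInput when none is left.
  void Advance() {
    c0_ = cursor_ < source_.size()
              ? static_cast<unsigned char>(source_[cursor_++])
              : kEndOfInput;
  }

  // Keeps advancing until the predicate reports a terminator.
  template <typename Predicate>
  void AdvanceUntil(Predicate is_terminator) {
    while (c0_ != kEndOfInput && !is_terminator(c0_)) Advance();
  }

  // Collects identifier characters; the first other character stops the scan
  // and stays in c0_ for whoever scans the next token.
  std::string ScanIdentifierText() {
    std::string literal;
    AdvanceUntil([&literal](int c) {
      if (!IsAsciiIdentifierPart(c)) return true;
      literal.push_back(static_cast<char>(c));
      return false;
    });
    return literal;
  }

  void SkipSpaces() {
    AdvanceUntil([](int c) { return c != ' '; });
  }

 private:
  std::string source_;
  std::size_t cursor_ = 0;
  int c0_ = kEndOfInput;
};
```

For the input "function JsPrint(a)", the first call to ScanIdentifierText() collects the eight characters of function and stops at the space; after SkipSpaces(), a second call collects JsPrint and stops at the opening parenthesis.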

In Figure 3, we can see that these eight characters are exactly the first string in our case, function. Now that we have a complete token, the next step is to determine whether it is a keyword or an identifier. The code is below.

GetToken is used to determine whether a token is a keyword or an identifier. The determination is very simple: it gets the token type by looking up a predefined hash table. Figure 4 shows the important members of the hash table.
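The lookup can be pictured with the sketch below. It uses std::unordered_map as a stand-in for the predefined hash table, which is an assumption made for brevity; the function name KeywordOrIdentifierToken and the four-keyword list are hypothetical and far smaller than the real keyword set.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, reduced token set.
enum class Token { IDENTIFIER, FUNCTION, IF, RETURN, VAR };

// Returns the keyword token if the text is a known keyword,
// and Token::IDENTIFIER for everything else.
Token KeywordOrIdentifierToken(const std::string& text) {
  static const std::unordered_map<std::string, Token> kKeywords = {
      {"function", Token::FUNCTION},
      {"if", Token::IF},
      {"return", Token::RETURN},
      {"var", Token::VAR},
  };
  const auto it = kKeywords.find(text);
  return it == kKeywords.end() ? Token::IDENTIFIER : it->second;
}
```

Under these assumptions, "function" maps to Token::FUNCTION, while "JsPrint" is not in the table and falls back to Token::IDENTIFIER. The match is case-sensitive, so "Function" is also an identifier.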

So far, the token function has been generated, and its type is Token::FUNCTION. The token JsPrint is handled similarly; its type is Token::IDENTIFIER.

3. Difference between function and JsPrint

We know that function is a keyword, and JsPrint is a custom identifier. Let’s see how the parser distinguishes between the two.

The above code parses these two tokens. To be clear, the scanner is only responsible for generating tokens; it is the parser that distinguishes between them.

The above method is called from ParserBase<Impl>::ParseHoistableDeclaration(). In our case, ParseIdentifier uses the following symbol method when parsing function and JsPrint.

The symbol table saves the correspondence between identifiers and JS source code. More precisely, it stores the pairing between an identifier and its declaration information. In our case, function and JsPrint form a pair in the symbol table. Figure 5 is the call stack of the symbol table.

It is precisely because of the symbol table that we can see the source code when debugging a program. The symbol table is a data structure generated during compilation. To be precise, it is created in the lexical analysis phase and supplemented in the syntax analysis phase.
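As a generic illustration of what "pairing an identifier with its declaration information" can look like, here is a minimal symbol table sketch. It is a textbook-style model and not V8's data structure: DeclarationInfo, its two fields, and the Declare and Lookup methods are all hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical declaration record: what kind of thing the name is,
// and where in the source it was declared.
struct DeclarationInfo {
  std::string kind;
  int source_position;
};

class SymbolTable {
 public:
  // Records a declaration. Returns false if the name was already declared,
  // in which case the first declaration is kept.
  bool Declare(const std::string& name, DeclarationInfo info) {
    return entries_.emplace(name, std::move(info)).second;
  }

  // Returns the stored declaration, or nullptr for an unknown name.
  const DeclarationInfo* Lookup(const std::string& name) const {
    const auto it = entries_.find(name);
    return it == entries_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, DeclarationInfo> entries_;
};
```

In this model, the entry for JsPrint would carry the kind "function" together with its position in the source text, which is the sort of information a debugger needs to map a name back to the code.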

Okay, that wraps it up for this share. I’ll see you guys next time, take care!

My blog is cncyclops.com. Please reach out to me if you have any issues.

WeChat: qq9123013 Email: v8blink@outlook.com




Written by huidou | a big fan of chrome V8
Published by HackerNoon on 2022/08/14