Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bytecode generation for var statements in for-in statements #52

Merged
merged 1 commit into from
Feb 19, 2019
Merged

Fix bytecode generation for var statements in for-in statements #52

merged 1 commit into from
Feb 19, 2019

Conversation

pmarkee
Copy link
Contributor

@pmarkee pmarkee commented Jan 10, 2019

Fixes #29

Signed-off-by: Peter Marki [email protected]

@pmarkee pmarkee changed the title Do not generate byte code for functions if it was already generated [WIP] Do not generate byte code for functions if it was already generated Jan 10, 2019

// Do not generate bytecode for this expression if it has already been generated.
if (!blk)
return;
Copy link
Contributor

@robertsipka robertsipka Jan 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could not we avoid to step into these generateExpressionByteCode functions (in case of FunctionExpression or ArrowFunctionExpression nodes) at a second time? I think these functions will be called twice in the generateStatementByteCode function of ForInSatementNode class. Can`t we just skip it at the second time? For example, if the left value of the forInStatement is a variable declaration or something else.
That's why I have another question: What about if I create an object without function expressions:

for ( var id_0 = { a: 2, b: 10 } in Array.toString ) { }

Will the engine also generate bytecodes for a and b twice or this case is handled? My assumption is that the duplication appears here as well, even though no assert will occur.

Note: these are just questions, however, this patch may be the right solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[FYI] by setting ENV variable DUMP_BYTECODE to any value (e.g. export DUMP_BYTECODE=1), escargot print the generated bytecode.

Source Code

for ( var id_0 = { toString : function ( ) {} } in Array.toString ) { } 

Bytecode

register info.. (stack variable size(1)) [X,X,X,X,X,X,`this`,]
0       createobject -> r3 | CreateObject (line: 1:20)
48      create function  -> r4 | CreateFunction (line: 1:40)
104     object define own property with name r3.toString <- r4 | ObjectDefineOwnPropertyWithNameOperation (line: 1:20)
160     set global object global.id_0 <- r3 | SetGlobalObject (line: 1:11)
232     get global object r2 <- global.Array | GetGlobalObject (line: 1:52)
304     get object r1 <- r2.toString | GetObjectPreComputedCase (line: 1:52)
432     load r2 <- undefined | LoadLiteral (line: 1:5)
488     equal r2 <- r2 , r1 | BinaryEqual (line: 1:5)
544     jump if true r2 -> -983881656 | JumpIfTrue (line: 1:5)
600     load r2 <- null | LoadLiteral (line: 1:5)
656     equal r2 <- r2 , r1 | BinaryEqual (line: 1:5)
712     jump if true r2 -> -983881656 | JumpIfTrue (line: 1:5)
768     enumerate object r1, r5 | EnumerateObject (line: 1:5)
816     check if key is last r5 | CheckIfKeyIsLast (line: 1:5)
872     enumerate object key r1 r5 | EnumerateObjectKey (line: 1:5)
920     createobject -> r2 | CreateObject (line: 1:20)
968     object define own property with name r2.toString <- r3 | ObjectDefineOwnPropertyWithNameOperation (line: 1:20)
xxx     create function  -> r3 | CreateFunction (line: 1:40) // duplicated generation of FunctionExpression
1024    set global object global.id_0 <- r2 | SetGlobalObject (line: 1:11)
1096    jump -983882048 | Jump (line: 1:5)
1152    jump -983881656 | Jump (line: 1:5)
1208    end | End (line: 1:1)

The above shows the generated bytecode of the source code (by default, it occurs an assert error before printing the bytecode, so I reconstruct it)
In this example, generation of FunctionExpression bytecode is duplicated twice(as you commented), one for initialization of the global variable id_0 and the other for for-in statement. which is the main problem of the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bytecode (FunctionExpression ast node) for generation of each FunctionExpression should be exist only at once.
Based on this assumption, function generateExpressionByteCode of FunctionExpressionNode sequentially allocates the pre-generated CodeBlock for each FunctionExpressionNode.
But, in the above case, when second identical FunctionExpression node about to generate a bytecode, it can not find the CodeBlock because first FunctionExpression node already take the CodeBlock and increase the counter (one CodeBlock is pre-generated for each FunctionExpression).

for (size_t i = 0; i < len; i++) {
    CodeBlock* c = childBlocks[i];
    if (c->isFunctionExpression()) {
        if (cnt == counter) {
            blk = c;
            break;
        }
        cnt++;
    }
}

From the above code in generateExpressionByteCode function, it cannot match the related CodeBlock due to the already increased counter and finally assert error occurred.

Copy link
Contributor

@clover2123 clover2123 Jan 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicated FunctionExpression node issue is only related with for-in statement (maybe same for for-of statement, but for-of is not yet implemented).
I think parsing for for-of statement should be fixed to not to generate duplicated variable declaration nodes.
However, this patch which just returns for the case of no matched CodeBlock is not a complete solution because it can not check the rule that one FunctionExpression node for one FunctionExpression generation. And this patch also removes the second FunctionExpression bytecode, assigning an empty object to variable id_0 during the for-in iteration.

Copy link
Contributor Author

@pmarkee pmarkee Jan 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the in-depth explanation @clover2123 ! I tried to simply just skip the second generation but it makes some tests fail and causes segfaults, so I continue working on the issue and try to come up with a good solution.

@pmarkee pmarkee changed the title [WIP] Do not generate byte code for functions if it was already generated [WIP] Do not generate byte code twice for global variables in for-in statements Jan 15, 2019
@pmarkee
Copy link
Contributor Author

pmarkee commented Jan 15, 2019

@robertsipka @clover2123 I have updated the PR. I decided to simply get rid of the first generation of the bytecode, as the second generation will generate it and put it in the global scope anyway. I'm not sure if this is the right approach but I tried it with some test suites and it all passed.

@clover2123
Copy link
Contributor

clover2123 commented Jan 17, 2019

@pmarkee In my opinion, the first variable declaration should be generated and executed before for-in iteration as it is. And the second variable declaration need to be fixed such that the result of for-in operation is allocated in the variable without executing the right expression.

var object = {a: 1, b: 2, c: 3}; 
function func_counter() {
    print("func_counter() called")
}
for (var property = func_counter() in object) {
    print("property : "  + property);
}

result

func_counter() called
property : a
property : b
property : c

From the above example which is obtained from JSC engine, we can find that the variable declaration is executed before the for-in iteration and func_counter() is called only once.

@pmarkee pmarkee changed the title [WIP] Do not generate byte code twice for global variables in for-in statements Fix bytecode generation for var statements in for-in statements Feb 1, 2019
@pmarkee
Copy link
Contributor Author

pmarkee commented Feb 1, 2019

@clover2123 @robertsipka I have updated the PR, I found another possible approach. I simply set the init of the variable declarator node to nullptr, so the initial value of the variable will be completely ignored and the for-in statement will work as expected.

@@ -4758,6 +4758,7 @@ class Parser : public gc {

if (declarations.size() == 1 && this->matchKeyword(InKeyword)) {
RefPtr<VariableDeclaratorNode> decl = declarations[0];
decl->setInitPtr(nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it would be the best to handle it here. I think it works because you avoid to call the generateExpressionByteCode call for this init node in here by set its value to nullptr. Maybe you could skip the bytecode generation there instead of overwrite its value here. I did not investigate it fully, so I would also like to hear the opinions of others and you.

@@ -4758,6 +4758,7 @@ class Parser : public gc {

if (declarations.size() == 1 && this->matchKeyword(InKeyword)) {
RefPtr<VariableDeclaratorNode> decl = declarations[0];
decl->setInitPtr(nullptr);
// if (decl->init() && (decl.id.type === Syntax.ArrayPattern || decl.id.type === Syntax.ObjectPattern || this->context->strict)) {
if (decl->init() && (decl->id()->type() == ArrayExpression || decl->id()->type() == ObjectExpression || this->context->strict)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After setting the m_init to null value, this if statement that checks some syntax errors will be always ignored. Instead of your approach, how about just not generating the bytecode of m_init in ForInStatementNode::generateStatementByteCode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't really be skipped in ForInStatementNode::generateStatementByteCode, because the first time it generates the initial value for m_left, then the second time it will assign the result of the object enumeration to it (the value that will be used in the current iteration). Both are needed to work correctly, so we can't really skip it here. I'm leaning more towards the suggestion by @robertsipka . I'm currently investigating whether the if check here is necessary at all, we may be able to skip that altogether.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmarkee
As you commented, generating a store bytecode of m_left at the second time would be the proper target point. It may be possible not to generate a store bytecode of m_init in VariableDeclarationNode::generateStoreByteCode (instead calling a new function generateStoreByteCodeWithoutInit?).

@pmarkee
Copy link
Contributor Author

pmarkee commented Feb 15, 2019

@clover2123 @robertsipka I updated the pull request, I removed the if statement in VariableDeclarationNode::generateStoreByteCode() as suggested, we don't need that IMO.

Copy link
Contributor

@robertsipka robertsipka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add the following test case from the issue #29?

for ( var id_0 = { toString : function ( ) {} } in Array.toString ) { }

LGTM (informally).

@pmarkee
Copy link
Contributor Author

pmarkee commented Feb 18, 2019

Thanks @robertsipka , I have added the test case to the regression test.

Copy link
Contributor

@clover2123 clover2123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@yichoi yichoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yichoi yichoi merged commit 250d95d into Samsung:master Feb 19, 2019
@pmarkee pmarkee deleted the issue-29 branch February 20, 2019 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants