-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bytecode generation for var statements in for-in statements #52
Conversation
|
||
// Do not generate bytecode for this expression if it has already been generated. | ||
if (!blk) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could not we avoid to step into these generateExpressionByteCode
functions (in case of FunctionExpression
or ArrowFunctionExpression
nodes) at a second time? I think these functions will be called twice in the generateStatementByteCode
function of ForInSatementNode
class. Can`t we just skip it at the second time? For example, if the left value of the forInStatement is a variable declaration or something else.
That's why I have another question: What about if I create an object without function expressions:
for ( var id_0 = { a: 2, b: 10 } in Array.toString ) { }
Will the engine also generate bytecodes for a
and b
twice or this case is handled? My assumption is that the duplication appears here as well, even though no assert will occur.
Note: these are just questions, however, this patch may be the right solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[FYI] by setting ENV variable DUMP_BYTECODE
to any value (e.g. export DUMP_BYTECODE=1
), escargot print the generated bytecode.
Source Code
for ( var id_0 = { toString : function ( ) {} } in Array.toString ) { }
Bytecode
register info.. (stack variable size(1)) [X,X,X,X,X,X,`this`,]
0 createobject -> r3 | CreateObject (line: 1:20)
48 create function -> r4 | CreateFunction (line: 1:40)
104 object define own property with name r3.toString <- r4 | ObjectDefineOwnPropertyWithNameOperation (line: 1:20)
160 set global object global.id_0 <- r3 | SetGlobalObject (line: 1:11)
232 get global object r2 <- global.Array | GetGlobalObject (line: 1:52)
304 get object r1 <- r2.toString | GetObjectPreComputedCase (line: 1:52)
432 load r2 <- undefined | LoadLiteral (line: 1:5)
488 equal r2 <- r2 , r1 | BinaryEqual (line: 1:5)
544 jump if true r2 -> -983881656 | JumpIfTrue (line: 1:5)
600 load r2 <- null | LoadLiteral (line: 1:5)
656 equal r2 <- r2 , r1 | BinaryEqual (line: 1:5)
712 jump if true r2 -> -983881656 | JumpIfTrue (line: 1:5)
768 enumerate object r1, r5 | EnumerateObject (line: 1:5)
816 check if key is last r5 | CheckIfKeyIsLast (line: 1:5)
872 enumerate object key r1 r5 | EnumerateObjectKey (line: 1:5)
920 createobject -> r2 | CreateObject (line: 1:20)
968 object define own property with name r2.toString <- r3 | ObjectDefineOwnPropertyWithNameOperation (line: 1:20)
xxx create function -> r3 | CreateFunction (line: 1:40) // duplicated generation of FunctionExpression
1024 set global object global.id_0 <- r2 | SetGlobalObject (line: 1:11)
1096 jump -983882048 | Jump (line: 1:5)
1152 jump -983881656 | Jump (line: 1:5)
1208 end | End (line: 1:1)
The above shows the generated bytecode of the source code (by default, it occurs an assert error before printing the bytecode, so I reconstruct it)
In this example, generation of FunctionExpression bytecode is duplicated twice(as you commented), one for initialization of the global variable id_0
and the other for for-in
statement. which is the main problem of the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bytecode (FunctionExpression ast node) for generation of each FunctionExpression should be exist only at once.
Based on this assumption, function generateExpressionByteCode
of FunctionExpressionNode
sequentially allocates the pre-generated CodeBlock for each FunctionExpressionNode.
But, in the above case, when second identical FunctionExpression node about to generate a bytecode, it can not find the CodeBlock because first FunctionExpression node already take the CodeBlock and increase the counter (one CodeBlock is pre-generated for each FunctionExpression).
for (size_t i = 0; i < len; i++) {
CodeBlock* c = childBlocks[i];
if (c->isFunctionExpression()) {
if (cnt == counter) {
blk = c;
break;
}
cnt++;
}
}
From the above code in generateExpressionByteCode
function, it cannot match the related CodeBlock due to the already increased counter and finally assert error occurred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicated FunctionExpression node issue is only related with for-in statement (maybe same for for-of statement, but for-of is not yet implemented).
I think parsing for for-of statement should be fixed to not to generate duplicated variable declaration nodes.
However, this patch which just returns for the case of no matched CodeBlock is not a complete solution because it can not check the rule that one FunctionExpression node for one FunctionExpression generation. And this patch also removes the second FunctionExpression bytecode, assigning an empty object to variable id_0
during the for-in iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the in-depth explanation @clover2123 ! I tried to simply just skip the second generation but it makes some tests fail and causes segfaults, so I continue working on the issue and try to come up with a good solution.
@robertsipka @clover2123 I have updated the PR. I decided to simply get rid of the first generation of the bytecode, as the second generation will generate it and put it in the global scope anyway. I'm not sure if this is the right approach but I tried it with some test suites and it all passed. |
@pmarkee In my opinion, the first variable declaration should be generated and executed before for-in iteration as it is. And the second variable declaration need to be fixed such that the result of for-in operation is allocated in the variable without executing the right expression. var object = {a: 1, b: 2, c: 3};
function func_counter() {
print("func_counter() called")
}
for (var property = func_counter() in object) {
print("property : " + property);
} result
From the above example which is obtained from JSC engine, we can find that the variable declaration is executed before the for-in iteration and |
@clover2123 @robertsipka I have updated the PR, I found another possible approach. I simply set the |
src/parser/esprima_cpp/esprima.cpp
Outdated
@@ -4758,6 +4758,7 @@ class Parser : public gc { | |||
|
|||
if (declarations.size() == 1 && this->matchKeyword(InKeyword)) { | |||
RefPtr<VariableDeclaratorNode> decl = declarations[0]; | |||
decl->setInitPtr(nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it would be the best to handle it here. I think it works because you avoid to call the generateExpressionByteCode call for this init
node in here by set its value to nullptr
. Maybe you could skip the bytecode generation there instead of overwrite its value here. I did not investigate it fully, so I would also like to hear the opinions of others and you.
src/parser/esprima_cpp/esprima.cpp
Outdated
@@ -4758,6 +4758,7 @@ class Parser : public gc { | |||
|
|||
if (declarations.size() == 1 && this->matchKeyword(InKeyword)) { | |||
RefPtr<VariableDeclaratorNode> decl = declarations[0]; | |||
decl->setInitPtr(nullptr); | |||
// if (decl->init() && (decl.id.type === Syntax.ArrayPattern || decl.id.type === Syntax.ObjectPattern || this->context->strict)) { | |||
if (decl->init() && (decl->id()->type() == ArrayExpression || decl->id()->type() == ObjectExpression || this->context->strict)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After setting the m_init
to null value, this if statement that checks some syntax errors will be always ignored. Instead of your approach, how about just not generating the bytecode of m_init
in ForInStatementNode::generateStatementByteCode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't really be skipped in ForInStatementNode::generateStatementByteCode
, because the first time it generates the initial value for m_left
, then the second time it will assign the result of the object enumeration to it (the value that will be used in the current iteration). Both are needed to work correctly, so we can't really skip it here. I'm leaning more towards the suggestion by @robertsipka . I'm currently investigating whether the if check here is necessary at all, we may be able to skip that altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmarkee
As you commented, generating a store bytecode of m_left
at the second time would be the proper target point. It may be possible not to generate a store bytecode of m_init in VariableDeclarationNode::generateStoreByteCode
(instead calling a new function generateStoreByteCodeWithoutInit
?).
@clover2123 @robertsipka I updated the pull request, I removed the if statement in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also add the following test case from the issue #29?
for ( var id_0 = { toString : function ( ) {} } in Array.toString ) { }
LGTM (informally).
Fixes #29 Signed-off-by: Peter Marki [email protected]
Thanks @robertsipka , I have added the test case to the regression test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixes #29
Signed-off-by: Peter Marki [email protected]