-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature/loop without var decl #474
base: main
Are you sure you want to change the base?
feature/loop without var decl #474
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! It looks like there are some more ways to use for-of (and for-in) loops: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...of#examples for example exotic cases like for (let [a,b,c] of foo)
or for (a.b.c of bar)
. I wonder if we should at least already plan for supporting those as well.
Given that the LHS cannot be an arbitrary expression but must be one of a few cases (variable definition, variable, desctructuring pattern, property access, ...), probably the best/right/only way to do this is by supporting different types of these loops, with different inputs and possibly additional data (e.g. the property name as string or the shape of the desctructuring pattern). So then I think there are three high-level options:
-
Have a single BeginForOfLoop with some sort of "type" enum that indicates whether we declare a new variable, use a destructuring pattern, etc.
-
Have different BeginForOfLoop ops, e.g. BeginPlainForOfLoop, BeginForOfLoopWithReassignment, BeginForOfLoopWithPropertyAccess etc. That might be most consistent with other operations, and indeed I just realized we already have a
BeginForOfLoopWithDestruct
operation -
Design it like plain for loops, with a
BeginForOfLoopHeader
andBeginForOfLoopBody
. I think this will be difficult though because in the header, we would need exactly one additional operation of a specific kind, which is hard to implement in FuzzIL (typically we don't want such constraints so that we can just copy instructions between programs or minimize them away, etc.)
I'm currently leaning towards (2), mostly because it's consistent with what we already have and because it keeps things a bit simpler in the sense that a given operation always has the same number of inputs and outputs. WDYT?
I wanted to implement case 3 with commit 89d2d77 I noticed that there are still some persisting compilation errors. See this before and after: before
after
I assume this error stems from the JavascriptLifter in the assign method because there we create the line "let v3 = c;" This will be fixed soon |
This is ready for the final review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks really good, just 2 questions.
loopVar = existingVar | ||
} else { | ||
loopVar = emit(LoadNamedVariable(identifier.name)).output | ||
map(identifier.name, to: loopVar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do this here but not in the case above (line 436)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the case starting in line 438 is used for global variables and the case in line 436 for non-global variables. In both cases, the compiler just has an identfier as input. For global variables, lookupIdentifer will return nil because the identifier is not associted with a FuzzIL variable. Either because the global Variable has not been set but also when it has actually been set before, i.e. with StoreNamedVariable (see example "a" and "b")
Illustration
for (a of ["new a"]) {} // line 469. lookupIdentifier didn't return FuzzIL Variable because a is used for the first time here
output("value of a: " + a);
b = "old b";
for (b of ["new b"]) {} // line 469. lookupIdentifier didn't return FuzzIL Variable because the b identifier was not mapped to a FuzzIL variable
output("value of b: " + b);
var c = "old c";
for (c of ["new c"]) {} // line 467 because lookupIdentifier returned the FuzzIL Variable that the previous line mapped the identifier 'c' to
output("value of c: " + c);
for (var d of ["new d"]) {} // line 457. lhs of for loop header is not an identfier...
output("value of d: " + d);
loopVar = existingVar | ||
} else { | ||
loopVar = emit(LoadNamedVariable(identifier.name)).output | ||
map(identifier.name, to: loopVar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
@@ -886,7 +886,7 @@ public class JavaScriptLifter: Lifter { | |||
w.assign(expr, to: instr.output) | |||
|
|||
case .loadNamedVariable(let op): | |||
w.assign(Identifier.new(op.variableName), to: instr.output) | |||
w.assign(Identifier.new(op.variableName), to: instr.output, forceInlining: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed here? To support reasignment to global variables? Why don't we inline there without this, i.e. what kind of code would we generate?
I feel like this is maybe not the right way to approach it. In our IL, I guess we would currently generate code like this:
v2 <- LoadNamedVariable 'foo'
ForInLoopWithReassign v1, v2
...
EndForInLoop
So intuitively, I'd expect that to look more like this in JS
let v2 = foo;
for (v2 in v1) {
...
}
I don't really have a great alternative though. I guess with the current way variables work, we could add an explicit StoreNamedVariable:
v2 <- LoadNamedVariable 'foo'
ForInLoopWithReassign v1, v2
StoreNamedVariable v2, 'foo'
...
EndForInLoop
Probably this isn't super important at the moment (?). Maybe in the future we'll want to redesign the way our variables work (because that's also causing a number of other issues in the compiler), and then we can also solve this properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to justify this with an example. Let's pretend the newly added forceInlining parameter doesn't exist. Then we would have an issue in the part where we lift FuzzIL->JS
JS before compiling to FuzzIL
for (a of ["new a"]) {}
b = "old b";
for (b of ["new b"]) {}
JS after lifting FuzzIL
const v1 = ["new a"];
let v2 = a;
for (v2 of v1) {
}
b = "old b";
const v5 = ["new b"];
let v6 = b;
for (v6 of v5) {
}
As we can see here, the lifting was not successful because the iterator in the loops became a new js variable even though we want to use the global variable. The reason why inlining can not happen here is a bit specific: The For*LoopWithReassignment loops we defined in JSOperations.swift have the property reassigning defined in Semantics.swift. This causes shouldTryInlining to return false. This is why I introduced forceInlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that's what I thought. I think we should be able to handle this differently though: can you try using w.declare(instr.output, as: op.variableName)
and see if that works, too?
More high-level, my (small) concern with this is that the lifted JS doesn't really fit the FuzzIL code, because there we have a LoadNamedVariable
, which does basically create a new local variable with the content of a named variable. So it's somewhat unintuitive that later assignments to that new local variable also change the named variable. However, after thinking about this for a bit, I think what we should actually change is the way named variables work in FuzzIL. That should also allow us to (eventually) fix a number of other correctness issues with the compiler. What I would suggest is replacing LoadNamedVariable
, StoreNamedVariable
, and DefineNamedVariable
with a single operation:
v3 <- CreateNamedVariable "foo", $declarationMode, [initialValue]
Where the declaration mode is something like
enum NamedVariableDeclarationMode {
case none
case global
case var
case let
}
And so reflects the different types of JS variables. Then, the lifter will simply lift this variable to its name (and potentially create a var
or let
declaration statement), and there is no local variable anymore (which is what we want here and elsewhere). For your example above, we would simply compile it to
v2 <- LoadString "old b"
v3 <- CreateNamedVariable "b", .global, v2
ForInLoopWithReassign v1, v3
...
EndForInLoop
What do you think about this? We'd certainly want to do this in a follow-up PR, but I think it would be good to do this somewhat soon-ish so that we can see if it works out and because it might be important for fixing a number of other correctness issues in the compiler (basically for every program where the names of variables are important, e.g. because they are referenced from eval'd code, inside a with
, used in an object literal, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I gave that idea a shot now: #486 let me know what you think and if that should solve these issues here. No rush though :)
Without loss of generality, we will only talk about for-of loops in the following, even though everything applies to for-in loops as well. The problem (Expected variable declaration as init part of a for-of loop, found Identifier) addressed with this PR was the most common reason why compilation of JS files in test/mjsunit failed, excluding the yet to be supported different parameter types and currently pending PRs.
Description of the problem
Previously we could compile this
case 1
but not this
case 2
or even this
case 3
Solution in a nutshell
This PR only addresses case 2 so far. To support this new feature, in a nutshell, we first verify if the identifier points to a declared variable (so we know we are not in case 3) and if so, we give the variable corresponding to the identifier as input into BeginForInLoop and do not create a new inner output for the iterator. During lifting to FuzzIL and JS we check the number of inputs to know if the iterator is the second input (as in case 2) or the inner output (as in case 1). Because iterating over an object with the iterator inherently changes the iterators value, we need to mark this FuzzIL instruction as having the "reassigning" property. Otherwise inlining would happen because the lifter thinks that the iterator still has the previous value, e.g. 'foo' in case 2
Some questions and next steps
The current design choice is to give BeginForInLoop a parameter called usesPredeclaredIterator to distinguish case 1 from case 2. However, there is also the option of heaving unique IL instructions for each case. I like the current solution more but if you like the other option better I can change it.
case 3 is not supported. I want to change that by utilizing LoadNamedVariable and StoreNamedVariable
Writing tests. Do you have suggestions where to implement it`? Otherwise I would write the obligatory CompilerTest and now, since we also changed FuzzIL itself, also tests in FuzzilliTests