Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature/loop without var decl #474

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

TobiasWienand
Copy link
Contributor

@TobiasWienand TobiasWienand commented Nov 20, 2024

Without loss of generality, we will only talk about for-of loops in the following, even though everything applies to for-in loops as well. The problem (Expected variable declaration as init part of a for-of loop, found Identifier) addressed with this PR was the most common reason why compilation of JS files in test/mjsunit failed, excluding the yet to be supported different parameter types and currently pending PRs.

Description of the problem

Previously we could compile this

case 1

for (var c of ["bar"]){
}
console.log(c); // bar

but not this

case 2

var c = "foo";
for ( c of ["bar"]){
}
console.log(c); // bar

or even this

case 3

for ( c of ["bar"]){
}
console.log(c); // bar
console.log(this.c); // bar 

Solution in a nutshell

This PR only addresses case 2 so far. To support this new feature, in a nutshell, we first verify if the identifier points to a declared variable (so we know we are not in case 3) and if so, we give the variable corresponding to the identifier as input into BeginForInLoop and do not create a new inner output for the iterator. During lifting to FuzzIL and JS we check the number of inputs to know if the iterator is the second input (as in case 2) or the inner output (as in case 1). Because iterating over an object with the iterator inherently changes the iterators value, we need to mark this FuzzIL instruction as having the "reassigning" property. Otherwise inlining would happen because the lifter thinks that the iterator still has the previous value, e.g. 'foo' in case 2

Some questions and next steps

  1. The current design choice is to give BeginForInLoop a parameter called usesPredeclaredIterator to distinguish case 1 from case 2. However, there is also the option of heaving unique IL instructions for each case. I like the current solution more but if you like the other option better I can change it.

  2. case 3 is not supported. I want to change that by utilizing LoadNamedVariable and StoreNamedVariable

  3. Writing tests. Do you have suggestions where to implement it`? Otherwise I would write the obligatory CompilerTest and now, since we also changed FuzzIL itself, also tests in FuzzilliTests

@TobiasWienand TobiasWienand changed the title Feature/loop without var decl feature/loop without var decl Nov 20, 2024
Copy link
Collaborator

@saelo saelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! It looks like there are some more ways to use for-of (and for-in) loops: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...of#examples for example exotic cases like for (let [a,b,c] of foo) or for (a.b.c of bar). I wonder if we should at least already plan for supporting those as well.

Given that the LHS cannot be an arbitrary expression but must be one of a few cases (variable definition, variable, desctructuring pattern, property access, ...), probably the best/right/only way to do this is by supporting different types of these loops, with different inputs and possibly additional data (e.g. the property name as string or the shape of the desctructuring pattern). So then I think there are three high-level options:

  1. Have a single BeginForOfLoop with some sort of "type" enum that indicates whether we declare a new variable, use a destructuring pattern, etc.

  2. Have different BeginForOfLoop ops, e.g. BeginPlainForOfLoop, BeginForOfLoopWithReassignment, BeginForOfLoopWithPropertyAccess etc. That might be most consistent with other operations, and indeed I just realized we already have a BeginForOfLoopWithDestruct operation

  3. Design it like plain for loops, with a BeginForOfLoopHeader and BeginForOfLoopBody. I think this will be difficult though because in the header, we would need exactly one additional operation of a specific kind, which is hard to implement in FuzzIL (typically we don't want such constraints so that we can just copy instructions between programs or minimize them away, etc.)

I'm currently leaning towards (2), mostly because it's consistent with what we already have and because it keeps things a bit simpler in the sense that a given operation always has the same number of inputs and outputs. WDYT?

Sources/Fuzzilli/Lifting/JavaScriptLifter.swift Outdated Show resolved Hide resolved
Sources/Fuzzilli/Lifting/JavaScriptLifter.swift Outdated Show resolved Hide resolved
@TobiasWienand
Copy link
Contributor Author

I wanted to implement case 3 with commit 89d2d77

I noticed that there are still some persisting compilation errors. See this before and after:

before

c = "old";
for (c of ["new"]) {
}
if (c == "new" && this.c === "new")
    console.log("Test 5 successful") // success with v8

after

c = "old";
const v2 = ["new"];
let v3 = c; // should not have become a let
for (v3 of v2) {
}
if ((v3 == "new") && (this.c === "new")) {
    console.log("Test 5 successful");  // failure because this.c is "old" 
}

I assume this error stems from the JavascriptLifter in the assign method because there we create the line "let v3 = c;"

This will be fixed soon

@TobiasWienand
Copy link
Contributor Author

This is ready for the final review now.

Copy link
Collaborator

@saelo saelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks really good, just 2 questions.

loopVar = existingVar
} else {
loopVar = emit(LoadNamedVariable(identifier.name)).output
map(identifier.name, to: loopVar)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do this here but not in the case above (line 436)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the case starting in line 438 is used for global variables and the case in line 436 for non-global variables. In both cases, the compiler just has an identfier as input. For global variables, lookupIdentifer will return nil because the identifier is not associted with a FuzzIL variable. Either because the global Variable has not been set but also when it has actually been set before, i.e. with StoreNamedVariable (see example "a" and "b")

Illustration

for (a of ["new a"]) {} // line 469. lookupIdentifier didn't return FuzzIL Variable because a is used for the first time here
output("value of a: " + a);

b = "old b";
for (b of ["new b"]) {} // line 469. lookupIdentifier didn't return FuzzIL Variable because the b identifier was not mapped to a FuzzIL variable
output("value of b: " + b);

var c = "old c";
for (c of ["new c"]) {} // line 467 because lookupIdentifier returned the FuzzIL Variable that the previous line mapped the identifier 'c' to
output("value of c: " + c);

for (var d of ["new d"]) {} // line 457. lhs of for loop header is not an identfier...
output("value of d: " + d);

loopVar = existingVar
} else {
loopVar = emit(LoadNamedVariable(identifier.name)).output
map(identifier.name, to: loopVar)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@@ -886,7 +886,7 @@ public class JavaScriptLifter: Lifter {
w.assign(expr, to: instr.output)

case .loadNamedVariable(let op):
w.assign(Identifier.new(op.variableName), to: instr.output)
w.assign(Identifier.new(op.variableName), to: instr.output, forceInlining: true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed here? To support reasignment to global variables? Why don't we inline there without this, i.e. what kind of code would we generate?

I feel like this is maybe not the right way to approach it. In our IL, I guess we would currently generate code like this:

v2 <- LoadNamedVariable 'foo'
ForInLoopWithReassign v1, v2
    ...
EndForInLoop

So intuitively, I'd expect that to look more like this in JS

let v2 = foo;
for (v2 in v1) {
    ...
}

I don't really have a great alternative though. I guess with the current way variables work, we could add an explicit StoreNamedVariable:

v2 <- LoadNamedVariable 'foo'
ForInLoopWithReassign v1, v2
    StoreNamedVariable v2, 'foo'
    ...
EndForInLoop

Probably this isn't super important at the moment (?). Maybe in the future we'll want to redesign the way our variables work (because that's also causing a number of other issues in the compiler), and then we can also solve this properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to justify this with an example. Let's pretend the newly added forceInlining parameter doesn't exist. Then we would have an issue in the part where we lift FuzzIL->JS

JS before compiling to FuzzIL

for (a of ["new a"]) {}

b = "old b";
for (b of ["new b"]) {}

JS after lifting FuzzIL

const v1 = ["new a"];
let v2 = a;
for (v2 of v1) {
}

b = "old b";
const v5 = ["new b"];
let v6 = b;
for (v6 of v5) {
}

As we can see here, the lifting was not successful because the iterator in the loops became a new js variable even though we want to use the global variable. The reason why inlining can not happen here is a bit specific: The For*LoopWithReassignment loops we defined in JSOperations.swift have the property reassigning defined in Semantics.swift. This causes shouldTryInlining to return false. This is why I introduced forceInlining.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's what I thought. I think we should be able to handle this differently though: can you try using w.declare(instr.output, as: op.variableName) and see if that works, too?

More high-level, my (small) concern with this is that the lifted JS doesn't really fit the FuzzIL code, because there we have a LoadNamedVariable, which does basically create a new local variable with the content of a named variable. So it's somewhat unintuitive that later assignments to that new local variable also change the named variable. However, after thinking about this for a bit, I think what we should actually change is the way named variables work in FuzzIL. That should also allow us to (eventually) fix a number of other correctness issues with the compiler. What I would suggest is replacing LoadNamedVariable, StoreNamedVariable, and DefineNamedVariable with a single operation:

v3 <- CreateNamedVariable "foo", $declarationMode, [initialValue]

Where the declaration mode is something like

enum NamedVariableDeclarationMode {
    case none
    case global
    case var
    case let
}

And so reflects the different types of JS variables. Then, the lifter will simply lift this variable to its name (and potentially create a var or let declaration statement), and there is no local variable anymore (which is what we want here and elsewhere). For your example above, we would simply compile it to

v2 <- LoadString "old b"
v3 <- CreateNamedVariable "b", .global, v2
ForInLoopWithReassign v1, v3
    ...
EndForInLoop

What do you think about this? We'd certainly want to do this in a follow-up PR, but I think it would be good to do this somewhat soon-ish so that we can see if it works out and because it might be important for fixing a number of other correctness issues in the compiler (basically for every program where the names of variables are important, e.g. because they are referenced from eval'd code, inside a with, used in an object literal, etc.).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I gave that idea a shot now: #486 let me know what you think and if that should solve these issues here. No rush though :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants