Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vz: implement auto save/restore #2900

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions pkg/store/filenames/filenames.go
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ const (
HostAgentStderrLog = "ha.stderr.log"
VzIdentifier = "vz-identifier"
VzEfi = "vz-efi" // efi variable store
VzMachineState = "vz-machine-state" // machine state file
QemuEfiCodeFD = "qemu-efi-code.fd" // efi code; not always created
AnsibleInventoryYAML = "ansible-inventory.yaml"

Expand Down
18 changes: 18 additions & 0 deletions pkg/vz/save_restore.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
//go:build darwin && !arm64 && !no_vz

package vz

import (
"fmt"
"runtime"

"github.com/Code-Hex/vz/v3"
)

func saveVM(vm *vz.VirtualMachine, machineStatePath string) error {
return fmt.Errorf("save is not supported on the vz driver for the architecture %s", runtime.GOARCH)
}

func restoreVM(vm *vz.VirtualMachine, machineStatePath string) error {
return fmt.Errorf("restore is not supported on the vz driver for the architecture %s", runtime.GOARCH)
}
76 changes: 76 additions & 0 deletions pkg/vz/save_restore_arm64.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
//go:build darwin && arm64 && !no_vz

package vz

import (
"errors"
"fmt"
"os"

"github.com/Code-Hex/vz/v3"
"github.com/sirupsen/logrus"
)

func saveVM(vm *vz.VirtualMachine, machineStatePath string) error {
if !vm.CanPause() {
return fmt.Errorf("can't pause the VZ machine")
}

// Remove the old machine state file if it exists,
// because saving the machine state will fail if the file already exists.
Copy link
Member

@nirs nirs Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing to avoid failure on save is not a good reason, save can replace existing file.

if err := os.Remove(machineStatePath); err != nil && !errors.Is(err, os.ErrNotExist) {
logrus.WithError(err).Errorf("Failed to remove the old VZ machine state file %q", machineStatePath)
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can write machine state to vz-machine-state.tmp and then replace the current state by renaming vz-machine-state.tmp to vz-machine-state. This ensures that we always have consistent state, either the previous one or the new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I don't see the value in keeping the vz-machine-state after a resume. This is because the current implementation does not save disk snapshots, and there is no mechanism to revert the guest OS's changes to the diffdisk after a resume to restore the state to match when the vz-machine-state was saved.

By the time we reach this point, the guest OS should already have gone through StatusRunning, so I don't think it's worth complicating things to maintain consistency with the previous vz-machine-state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Maybe add a comment here to explain why we don't keep the state?


logrus.Info("Pausing VZ")
norio-nomura marked this conversation as resolved.
Show resolved Hide resolved
if err := vm.Pause(); err != nil {
return err
}

// If we can't stop the machine after pausing, saving the machine state will be useless.
// So we should check this before saving the machine state.
if !vm.CanStop() {
return fmt.Errorf("can't stop the VZ machine after pausing")
}
norio-nomura marked this conversation as resolved.
Show resolved Hide resolved

logrus.Info("Saving VZ machine state for resuming later")
norio-nomura marked this conversation as resolved.
Show resolved Hide resolved
if err := vm.SaveMachineStateToPath(machineStatePath); err != nil {
// If we fail to save the machine state, we should resume the machine to call RequestStop() later
logrus.WithError(err).Errorf("Failed to save the machine state to %q", machineStatePath)
if resumeError := vm.Resume(); resumeError != nil {
return resumeError
}
norio-nomura marked this conversation as resolved.
Show resolved Hide resolved
return err
}

logrus.Info("Stopping VZ")
if err := vm.Stop(); err != nil {
// If we fail to stop the machine, we should resume the machine to call RequestStop() later
logrus.WithError(err).Error("Failed to stop the VZ machine")
if resumeError := vm.Resume(); resumeError != nil {
return resumeError
}
return err
}
return nil
}

func restoreVM(vm *vz.VirtualMachine, machineStatePath string) error {
if _, err := os.Stat(machineStatePath); err != nil {
return err
}
logrus.Info("Saved VZ machine state found, resuming VZ")
norio-nomura marked this conversation as resolved.
Show resolved Hide resolved
if err := vm.RestoreMachineStateFromURL(machineStatePath); err != nil {
return err
}
if err := vm.Resume(); err != nil {
return err
}
// Remove the machine state file after resuming the machine
if err := os.Remove(machineStatePath); err != nil {
// We should log the error but continue the process, because the machine state is already restored
logrus.WithError(err).Errorf("Failed to remove the VZ machine state file %q", machineStatePath)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we always remove the saved state? what if you want to always start with the same saved state?

Copy link
Contributor Author

@norio-nomura norio-nomura Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if you want to always start with the same saved state?

As I mentioned in a reply to another comment earlier, achieving that would require a snapshot feature that includes saving and restoring the diffdisk.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we can add this later.

return nil
}
15 changes: 11 additions & 4 deletions pkg/vz/vm_darwin.go
Original file line number Diff line number Diff line change
Expand Up @@ -58,10 +58,17 @@ func startVM(ctx context.Context, driver *driver.BaseDriver) (*virtualMachineWra
if err != nil {
return nil, nil, err
}

err = machine.Start()
if err != nil {
return nil, nil, err
machineStatePath := filepath.Join(driver.Instance.Dir, filenames.VzMachineState)
if err := restoreVM(machine, machineStatePath); err != nil {
if errors.Is(err, os.ErrNotExist) {
logrus.Info("Saved VZ machine state not found, starting VZ")
} else {
logrus.WithError(err).Warn("Failed to restore VZ. Falling back to starting")
}
err = machine.Start()
if err != nil {
return nil, nil, err
}
}

wrapper := &virtualMachineWrapper{VirtualMachine: machine, stopped: false}
Expand Down
8 changes: 8 additions & 0 deletions pkg/vz/vz_driver_darwin.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ import (
"github.com/lima-vm/lima/pkg/driver"
"github.com/lima-vm/lima/pkg/limayaml"
"github.com/lima-vm/lima/pkg/reflectutil"
"github.com/lima-vm/lima/pkg/store/filenames"
)

var knownYamlProperties = []string{
Expand Down Expand Up @@ -192,6 +193,13 @@ func (l *LimaVzDriver) RunGUI() error {
}

func (l *LimaVzDriver) Stop(_ context.Context) error {
machineStatePath := filepath.Join(l.Instance.Dir, filenames.VzMachineState)
if err := saveVM(l.machine.VirtualMachine, machineStatePath); err != nil {
logrus.WithError(err).Warn("Failed to save VZ. Falling back to shutdown")
norio-nomura marked this conversation as resolved.
Show resolved Hide resolved
} else {
return nil
}

logrus.Info("Shutting down VZ")
canStop := l.machine.CanRequestStop()

Expand Down