The codebase for the ICLR 2023 paper: Simple Emergent Action Representation from Multi-Task Policy Training.
Our project page is at:
Requirements:
- Python 3.8
- PyTorch 1.8
- gym 0.24.0
- MuJoCo 2.1.0
- mujoco_py
- posix_ipc
- tensorboardX
- tabulate
- seaborn
# train EAR in HalfCheetah-Vel
python starter/ --config config/train/halfcheetah-vel.json --id EAR_SAC --seed 0 --worker_nums 10 --eval_worker_nums 10
The training curves can be plotted as follows:
python torchrl/utils/ --id EXP_NAMES --env_name HalfCheetah-Vel --entry "Running_Average_Rewards" --add_tag POSTFIX_FOR_OUTPUT_FILES --seed SEEDS
Replace "Running_Average_Rewards" with another logged entry to plot the curve for that entry.
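For intuition about what a curve such as "Running_Average_Rewards" might represent, here is a minimal sketch of a windowed running average over episode rewards. It assumes the entry is a mean over the most recent episodes; the function name `running_average` and the `window` parameter are hypothetical and are not taken from this repository's logging code.

```python
from collections import deque
from typing import Deque, Iterable, List


def running_average(rewards: Iterable[float], window: int) -> List[float]:
    """Return the mean of the last `window` rewards after each episode.

    Hypothetical helper for illustration; the repository may compute
    its logged entry differently.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    recent: Deque[float] = deque(maxlen=window)  # drops the oldest value automatically
    averages: List[float] = []
    for reward in rewards:
        recent.append(reward)
        averages.append(sum(recent) / len(recent))
    return averages


# Made-up episode rewards, used only to show the call.
episode_rewards = [1.0, 3.0, 5.0, 7.0]
smoothed = running_average(episode_rewards, window=2)
print(smoothed)
```

A larger window gives a smoother curve at the cost of reacting more slowly to changes in performance.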
# adapt to new tasks in HalfCheetah-Vel
python starter/ --config config/adapt/halfcheetah-vel.json --id EAR_SAC --seed 0
# interpolate two tasks in HalfCheetah-Vel
python starter/ --config config/interpolate/halfcheetah-vel.json --id EAR_SAC --seed 0
# compose two tasks in HalfCheetah-RunJump
python starter/ --config config/interpolate/halfcheetah-runjump.json --id EAR_SAC --seed 0
Before running the interpolation or composition commands, first initialize LTE_1 and LTE_2 in with the two task embeddings to be interpolated (or composed).
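As a rough illustration of what interpolating two task embeddings can look like, the sketch below blends two vectors linearly. This is an assumption made for illustration: the function `interpolate_embeddings`, the coefficient `beta`, and the toy 3-dimensional vectors are hypothetical, and the repository's actual interpolation scheme (for example, any normalization applied afterwards) may differ.

```python
from typing import List, Sequence


def interpolate_embeddings(
    lte_1: Sequence[float], lte_2: Sequence[float], beta: float
) -> List[float]:
    """Linearly blend two embeddings: (1 - beta) * lte_1 + beta * lte_2.

    Hypothetical helper; beta = 0 returns lte_1 and beta = 1 returns lte_2.
    """
    if len(lte_1) != len(lte_2):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - beta) * a + beta * b for a, b in zip(lte_1, lte_2)]


# Toy embeddings standing in for LTE_1 and LTE_2 (not real trained values).
LTE_1 = [1.0, 0.0, 0.0]
LTE_2 = [0.0, 1.0, 0.0]

midpoint = interpolate_embeddings(LTE_1, LTE_2, beta=0.5)
print(midpoint)
```

Sweeping `beta` from 0 to 1 would trace a path between the two tasks, which is the kind of intermediate behavior the interpolation command is meant to explore.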