
Quick Start with SERL in Sim

This is a minimal MuJoCo simulation environment for training with SERL. The environment consists of a Panda robot arm and a cube. The goal is to lift the cube to a target position. The environment is implemented with franka_sim and exposes a gym interface.


Install Franka Sim library

    cd franka_sim
    pip install -e .
    pip install -r requirements.txt

Check that franka_sim runs via python franka_sim/franka_sim/test/

Before beginning, please make sure that the simulation environment with franka_sim is working.

Note: set MUJOCO_GL to egl if you are doing off-screen rendering. You can do so with export MUJOCO_GL=egl; remember to also set the rendering argument to False in the script. If you receive a "Cannot initialize a EGL device display" error caused by GLIBCXX not being found, try running conda install -c conda-forge libstdcxx-ng (ref)
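
If you prefer to select the backend from Python rather than from the shell, the following is a minimal sketch. The helper name use_egl_rendering is hypothetical and not part of franka_sim or SERL; it only sets the same MUJOCO_GL environment variable described above, and to be safe it should run before mujoco (or anything that imports it) is imported.

```python
import os


def use_egl_rendering(environ=os.environ):
    """Request the EGL backend unless MUJOCO_GL is already set.

    `use_egl_rendering` is a hypothetical helper, not a franka_sim API.
    """
    # setdefault keeps any value already exported in the shell.
    environ.setdefault("MUJOCO_GL", "egl")
    return environ["MUJOCO_GL"]


# Call this before importing mujoco or any module that imports it.
backend = use_egl_rendering()
print(f"MUJOCO_GL={backend}")
```

Using setdefault means an explicit export in the shell still takes precedence over the script.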

Optionally install tmux: sudo apt install tmux

1. Training from state observation example

✨ One-liner launcher (requires tmux) ✨

bash examples/async_sac_state_sim/

To kill the tmux session, run tmux kill-session -t serl_session.

Without using one-liner tmux launcher

You can opt for running the commands individually in 2 different terminals.

cd examples/async_sac_state_sim

Run learner node:


Run actor node with rendering window:

# add --ip x.x.x.x if running on a different machine

You can optionally launch the learner and actor on separate machines. For example, if the learner node is running on a PC with ip=x.x.x.x, you can launch the actor node on a different machine with network access to ip=x.x.x.x and add --ip x.x.x.x to the commands in
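
Before starting the actor on a second machine, it can help to confirm that the learner's address is reachable at all. The sketch below is a generic TCP check using only the Python standard library. The function name can_reach is hypothetical, and the port you pass in is an assumption you must supply yourself: this document does not state which port the learner listens on.

```python
import socket


def can_reach(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Hypothetical helper; the learner's port is not given in this README,
    so the caller has to provide it.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A False result points to a firewall, routing, or address problem rather than to SERL itself.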

Remove the --debug flag to upload training stats to wandb.

2. Training from image observation example

✨ One-liner launcher (requires tmux) ✨

bash examples/async_drq_sim/

Without using one-liner tmux launcher

You can opt for running the commands individually in 2 different terminals.

cd examples/async_drq_sim

# to use pre-trained ResNet weights, please download

Run learner node:


Run actor node with rendering window:

# add --ip x.x.x.x if running on a different machine

3. Training from image observation with 20 demo trajectories example

✨ One-liner launcher (requires tmux) ✨

bash examples/async_drq_sim/

Without using one-liner tmux launcher

You can opt for running the commands individually in 2 different terminals.

cd examples/async_drq_sim

# to use pre-trained ResNet weights, please download
# note: manual download is only needed for now; once the repo is public, auto download will work

# download 20 demo trajectories
wget \

Run the learner node, providing the path to the demo trajectories via the --demo_path argument.

bash --demo_path franka_lift_cube_image_20_trajs.pkl
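
To sanity-check the downloaded demo file before passing it to --demo_path, a small inspection script can help. This is a sketch under assumptions: the internal layout of the .pkl file is not described here, so the hypothetical helper summarize_demos only reports the top-level type and length. Only unpickle files from a source you trust, since pickle can execute code on load.

```python
import pickle
from pathlib import Path


def summarize_demos(path):
    """Load a pickle file and return (top-level type name, length or None).

    Hypothetical helper; it makes no assumption about the demo format
    beyond the file being a valid pickle.
    """
    with Path(path).open("rb") as f:
        demos = pickle.load(f)
    size = len(demos) if hasattr(demos, "__len__") else None
    return type(demos).__name__, size


if __name__ == "__main__":
    demo_file = Path("franka_lift_cube_image_20_trajs.pkl")
    if demo_file.exists():
        print(summarize_demos(demo_file))
```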

Run actor node with rendering window:

# add --ip x.x.x.x if running on a different machine

Use RLDS logger to save and load trajectories

This provides a way to save and load trajectories for SERL training, using the TensorFlow RLDS dataset format. This standard is compliant with the RTX datasets, so the saved data can potentially be used for other robot learning tasks.


This requires additional installation of oxe_envlogger:

git clone
cd oxe_envlogger
pip install -e .


Save the trajectories

With the example above, we can save the data from the replay buffer by providing the --log_rlds_path argument:

./ --log_rlds_path /path/to/save

This will save the data to the specified path in the following format:

 - /path/to/save
    - dataset_info.json
    - features.json
    - serl_rlds_dataset-train.tfrecord-00000
    - serl_rlds_dataset-train.tfrecord-00001
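
The layout above can be checked with a few lines of standard-library Python before trying to preload it. This is a minimal sketch: the helper name check_rlds_dir is hypothetical, and it only verifies that the files listed above exist, not that their contents are valid.

```python
from pathlib import Path


def check_rlds_dir(path):
    """Return (missing metadata files, sorted tfrecord shard names).

    Hypothetical helper based only on the directory listing shown above.
    """
    root = Path(path)
    required = ("dataset_info.json", "features.json")
    missing = [name for name in required if not (root / name).is_file()]
    # Shards are named like serl_rlds_dataset-train.tfrecord-00000.
    shards = sorted(p.name for p in root.glob("*.tfrecord-*"))
    return missing, shards
```

An empty missing list together with at least one shard suggests the logger wrote a complete dataset directory.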

Load the trajectories

With the example above, we can load previously saved data into the replay buffer by providing the --preload_rlds_path argument:

./ --preload_rlds_path /path/to/load

This is similar to the examples/async_rlpd_drq_sim/ script, which uses the --demo_path argument to load offline demo trajectories from a .pkl file.


  1. If you receive an Out of Memory error, try reducing the batch size by adding the --batch_size argument to the script. For example, bash --batch_size 64.
  2. If the provided offline RLDS data throws an error, this usually means the data is not compatible with the current SERL format. You can provide a custom data transform with the data_transform(data, metadata) -> data function in the examples/async_drq_sim/ script.
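
As an illustration of the data_transform(data, metadata) -> data hook mentioned above, here is a minimal sketch. Only the signature comes from this document. The assumption that data is a dict, and the key names in KEY_MAP, are hypothetical placeholders; replace them with whatever your dataset and the SERL replay buffer actually use.

```python
# Hypothetical mapping from keys in a foreign dataset to the keys the
# training script expects; these names are placeholders, not SERL's schema.
KEY_MAP = {"obs": "observations", "act": "actions"}


def data_transform(data, metadata):
    """Rename top-level keys of `data`, leaving values untouched.

    Assumes `data` is a dict; `metadata` is accepted to match the
    signature but is unused in this sketch.
    """
    return {KEY_MAP.get(key, key): value for key, value in data.items()}
```

Keys that are not in the mapping pass through unchanged, so the transform is a no-op on data that already matches.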