Your First Environment#
Let’s run your first GenesisLab environment and understand what’s happening.
Running a Pre-built Environment#
import gymnasium as gym

import genesislab.envs  # Registers GenesisLab environments

# Create environment
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)

# Get environment information
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")
print(f"Number of environments: {env.unwrapped.num_envs}")

# Reset environment
obs, info = env.reset()
print(f"Observation shape: {obs.shape}")

# Run simulation
for step in range(1000):
    # Sample random action
    action = env.action_space.sample()

    # Step environment
    obs, reward, terminated, truncated, info = env.step(action)

    # Print episode statistics when episodes end
    if "episode" in info:
        print(f"Step {step}: Episode reward: {info['episode']['r']:.2f}, "
              f"Length: {info['episode']['l']}")

# Cleanup
env.close()
Understanding the Output#
Observation Space#
Box(-inf, inf, (48,), float32)
This means:
Type: Continuous (Box)
Shape: (48,) - 48-dimensional observation vector
Range: Unbounded
What’s in the observation? For the Go2 robot on flat terrain:
Base linear velocity (3)
Base angular velocity (3)
Projected gravity (3)
Command velocities (3)
Joint positions (12)
Joint velocities (12)
Last actions (12)
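These blocks sum to 48. As a sketch of how you might split a single observation vector into named parts — the ordering follows the list above and is an assumption about the layout, not a documented guarantee:

import numpy as np

def split_observation(obs: np.ndarray) -> dict:
    # Hypothetical layout - mirrors the ordering listed above.
    names_and_sizes = [
        ("base_lin_vel", 3),
        ("base_ang_vel", 3),
        ("projected_gravity", 3),
        ("commands", 3),
        ("joint_pos", 12),
        ("joint_vel", 12),
        ("last_actions", 12),
    ]
    parts, start = {}, 0
    for name, size in names_and_sizes:
        parts[name] = obs[start:start + size]
        start += size
    assert start == 48  # the blocks sum to the full 48-dim vector
    return parts

parts = split_observation(np.zeros(48))
print({name: v.shape for name, v in parts.items()})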
Action Space#
Box(-1.0, 1.0, (12,), float32)
This means:
Type: Continuous (Box)
Shape: (12,) - one action per joint of the 12-DOF robot
Range: [-1, 1] - normalized action space
What do actions represent?
Joint position targets (relative to default position)
Actions are scaled and added to default joint positions
PD controller converts targets to motor torques
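A minimal sketch of that pipeline, with made-up action scale, gains, and default pose — the real values live in the task configuration, not in this snippet:

import numpy as np

# Placeholder constants, not GenesisLab's actual values.
ACTION_SCALE = 0.25
DEFAULT_JOINT_POS = np.zeros(12)  # nominal standing pose
KP, KD = 20.0, 0.5                # PD gains

def action_to_torque(action, joint_pos, joint_vel):
    # 1. Scale the normalized action and offset it from the default pose.
    target = DEFAULT_JOINT_POS + ACTION_SCALE * action
    # 2. A PD controller turns the position target into motor torques.
    return KP * (target - joint_pos) - KD * joint_vel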
Running with Visualization#
To see what’s happening:
import time

import gymnasium as gym

import genesislab.envs

# Create environment with viewer
env = gym.make(
    "GenesisLab-Go2-Flat-v0",
    num_envs=1,      # Use 1 env for easier visualization
    headless=False,  # Enable viewer
)

obs, info = env.reset()

# Run and watch
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    time.sleep(0.01)  # Slow down for visualization

env.close()
Understanding the Simulation Loop#
1. Reset#
obs, info = env.reset()
What happens:
Robot state is initialized (position, velocity, etc.)
Commands are sampled (target velocities)
Terrain is reset (if using curriculum)
Domain randomization events fire
Initial observations are computed
2. Step#
obs, reward, terminated, truncated, info = env.step(action)
What happens:
Actions are processed and scaled
PD controller computes motor torques
Genesis physics simulation steps forward
Sensors are updated
Observations are computed
Rewards are computed
Termination conditions are checked
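Every value that step returns is batched over the environment dimension. A quick shape check, assuming 1024 environments and the spaces shown earlier:

import gymnasium as gym

import genesislab.envs

env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)

# Expected shapes (batched over num_envs):
print(obs.shape)         # (1024, 48)
print(reward.shape)      # (1024,)
print(terminated.shape)  # (1024,)
print(truncated.shape)   # (1024,)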
3. Episode End#
When terminated or truncated is True:
Episode statistics are logged
Environment auto-resets
New episode begins
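Because of the auto-reset, you never call reset yourself during training; you can still see which environments finished from the flags. A sketch, assuming the flags come back as NumPy arrays as in the other examples on this page:

import gymnasium as gym
import numpy as np

import genesislab.envs

env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = np.logical_or(terminated, truncated)
    if done.any():
        # These environments were already auto-reset when step returned.
        print(f"{int(done.sum())} episodes ended this step")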
Exploring Different Environments#
Flat Terrain#
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
Good for:
Initial training
Policy debugging
Fast iteration
Rough Terrain#
env = gym.make("GenesisLab-Go2-Rough-v0", num_envs=1024)
Features:
Procedurally generated height maps
Stairs, slopes, stepping stones
Terrain curriculum
Custom Configuration#
import gymnasium as gym

import genesislab.envs
from genesislab.tasks.go2_flat import Go2FlatEnvCfg

# Modify configuration
cfg = Go2FlatEnvCfg()
cfg.scene.num_envs = 8192
cfg.scene.env_spacing = 5.0
cfg.rewards.forward_vel.weight = 2.0

# Create environment
env = gym.make("GenesisLab-Go2-Flat-v0", cfg=cfg)
Training with a Policy#
Random Policy#
import gymnasium as gym
import torch

import genesislab.envs

class RandomPolicy:
    def __init__(self, action_dim):
        self.action_dim = action_dim

    def __call__(self, obs):
        batch_size = obs.shape[0]
        return torch.rand(batch_size, self.action_dim) * 2 - 1  # Uniform in [-1, 1]

# Create policy
policy = RandomPolicy(action_dim=12)

# Run simulation
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

for _ in range(1000):
    action = policy(torch.from_numpy(obs))
    obs, reward, terminated, truncated, info = env.step(action.numpy())
Simple MLP Policy#
import gymnasium as gym
import torch
import torch.nn as nn

import genesislab.envs

class MLPPolicy(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # Output in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

# Create and use policy (untrained, so its actions are effectively random)
policy = MLPPolicy(obs_dim=48, action_dim=12).cuda()

env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()
obs_tensor = torch.from_numpy(obs).cuda()

for _ in range(1000):
    with torch.no_grad():
        action = policy(obs_tensor)
    obs, reward, terminated, truncated, info = env.step(action.cpu().numpy())
    obs_tensor = torch.from_numpy(obs).cuda()
Accessing Environment Information#
Scene and Managers#
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
# Access the LabScene
scene = env.unwrapped.scene
# Access managers
obs_manager = scene.observation_manager
reward_manager = scene.reward_manager
action_manager = scene.action_manager
# Get manager info
print(f"Observation terms: {list(obs_manager.terms.keys())}")
print(f"Reward terms: {list(reward_manager.terms.keys())}")
Episode Statistics#
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

episode_rewards = []
episode_lengths = []

for _ in range(10000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    if "episode" in info:
        episode_rewards.append(info["episode"]["r"])
        episode_lengths.append(info["episode"]["l"])

print(f"Mean episode reward: {sum(episode_rewards) / len(episode_rewards):.2f}")
print(f"Mean episode length: {sum(episode_lengths) / len(episode_lengths):.0f}")
Individual Reward Terms#
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

# Step once to get rewards
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)

# Access individual reward terms
reward_manager = env.unwrapped.scene.reward_manager
for term_name, term_value in reward_manager.term_rewards.items():
    mean_value = term_value.mean().item()
    print(f"{term_name}: {mean_value:.4f}")
Common Issues#
Environment Creation Fails#
Problem: Importing genesislab fails, or gym.make cannot find "GenesisLab-Go2-Flat-v0".
Solution: If the import itself fails, check that GenesisLab is installed in the active Python environment. If the environment ID is not found, make sure you ran import genesislab.envs first; that import is what registers the environments with Gymnasium.
Simulation is Slow#
Problem: Low FPS
Solutions:
Increase num_envs (better GPU utilization)
Use headless=True (no visualization)
Check that the GPU is actually being used
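One quick sanity check combines the last two points: confirm PyTorch can see the GPU, then measure raw stepping throughput with rendering disabled. A rough sketch; absolute numbers depend on your hardware:

import time

import gymnasium as gym
import torch

import genesislab.envs

print(f"CUDA available: {torch.cuda.is_available()}")

env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024, headless=True)
obs, info = env.reset()
num_envs = env.unwrapped.num_envs

start = time.time()
for _ in range(100):
    env.step(env.action_space.sample())
elapsed = time.time() - start

# Throughput counts every parallel environment as one step.
print(f"{100 * num_envs / elapsed:.0f} env-steps/sec")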
Robot Falls Immediately#
Problem: With random actions, the robot falls over immediately
This is normal! Random actions don’t produce useful behavior. The robot needs to be trained with RL to learn walking.
Next Steps#
Learn basic concepts to understand how it works
Customize environment configuration
Follow training tutorial to train a policy
Explore advanced topics