Your First Environment#
Let’s run your first GenesisLab environment and understand what’s happening.
Running a Pre-built Environment#
import gymnasium as gym

import genesislab.envs  # Registers GenesisLab environments

# Create environment
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)

# Get environment information
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")
print(f"Number of environments: {env.unwrapped.num_envs}")

# Reset environment
obs, info = env.reset()
print(f"Observation shape: {obs.shape}")

# Run simulation
for step in range(1000):
    # Sample random action
    action = env.action_space.sample()

    # Step environment
    obs, reward, terminated, truncated, info = env.step(action)

    # Print episode statistics when episodes end
    if "episode" in info:
        print(f"Step {step}: Episode reward: {info['episode']['r']:.2f}, "
              f"Length: {info['episode']['l']}")

# Cleanup
env.close()
Understanding the Output#
Observation Space#
Box(-inf, inf, (48,), float32)
This means:
Type: Continuous (Box)
Shape: (48,) - 48-dimensional observation vector
Range: Unbounded
What’s in the observation? For the Go2 robot on flat terrain:
Base linear velocity (3)
Base angular velocity (3)
Projected gravity (3)
Command velocities (3)
Joint positions (12)
Joint velocities (12)
Last actions (12)
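These blocks sum to 48. As a sketch of how you might split a single observation vector into named parts — the ordering follows the list above and is an assumption about the layout, not a documented guarantee:

import numpy as np

def split_observation(obs: np.ndarray) -> dict:
    # Hypothetical layout - mirrors the ordering listed above.
    names_and_sizes = [
        ("base_lin_vel", 3),
        ("base_ang_vel", 3),
        ("projected_gravity", 3),
        ("commands", 3),
        ("joint_pos", 12),
        ("joint_vel", 12),
        ("last_actions", 12),
    ]
    parts, start = {}, 0
    for name, size in names_and_sizes:
        parts[name] = obs[start:start + size]
        start += size
    assert start == 48  # the blocks sum to the full 48-dim vector
    return parts

parts = split_observation(np.zeros(48))
print({name: v.shape for name, v in parts.items()})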
Action Space#
Box(-1.0, 1.0, (12,), float32)
This means:
Type: Continuous (Box)
Shape: (12,) - one action per joint of the 12-DOF robot
Range: [-1, 1] - normalized action space
What do actions represent?
Joint position targets (relative to default position)
Actions are scaled and added to default joint positions
PD controller converts targets to motor torques
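A minimal sketch of that pipeline, with made-up action scale, gains, and default pose — the real values live in the task configuration, not in this snippet:

import numpy as np

# Placeholder constants, not GenesisLab's actual values.
ACTION_SCALE = 0.25
DEFAULT_JOINT_POS = np.zeros(12)  # nominal standing pose
KP, KD = 20.0, 0.5                # PD gains

def action_to_torque(action, joint_pos, joint_vel):
    # 1. Scale the normalized action and offset it from the default pose.
    target = DEFAULT_JOINT_POS + ACTION_SCALE * action
    # 2. A PD controller turns the position target into motor torques.
    return KP * (target - joint_pos) - KD * joint_vel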
Running with Visualization#
To see what’s happening:
import time

import gymnasium as gym

import genesislab.envs

# Create environment with viewer
env = gym.make(
    "GenesisLab-Go2-Flat-v0",
    num_envs=1,      # Use 1 env for easier visualization
    headless=False,  # Enable viewer
)

obs, info = env.reset()

# Run and watch
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    time.sleep(0.01)  # Slow down for visualization

env.close()
Understanding the Simulation Loop#
1. Reset#
obs, info = env.reset()
What happens:
Robot state is initialized (position, velocity, etc.)
Commands are sampled (target velocities)
Terrain is reset (if using curriculum)
Domain randomization events fire
Initial observations are computed
2. Step#
obs, reward, terminated, truncated, info = env.step(action)
What happens:
Actions are processed and scaled
PD controller computes motor torques
Genesis physics simulation steps forward
Sensors are updated
Observations are computed
Rewards are computed
Termination conditions are checked
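Every value that step returns is batched over the environment dimension. A quick shape check, assuming 1024 environments and the spaces shown earlier:

import gymnasium as gym

import genesislab.envs

env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)

# Expected shapes (batched over num_envs):
print(obs.shape)         # (1024, 48)
print(reward.shape)      # (1024,)
print(terminated.shape)  # (1024,)
print(truncated.shape)   # (1024,)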
3. Episode End#
When terminated or truncated is True:
Episode statistics are logged
Environment auto-resets
New episode begins
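Because of the auto-reset, you never call reset yourself during training; you can still see which environments finished from the flags. A sketch, assuming the flags come back as NumPy arrays as in the other examples on this page:

import gymnasium as gym
import numpy as np

import genesislab.envs

env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = np.logical_or(terminated, truncated)
    if done.any():
        # These environments were already auto-reset when step returned.
        print(f"{int(done.sum())} episodes ended this step")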
Exploring Different Environments#
Flat Terrain#
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
Good for:
Initial training
Policy debugging
Fast iteration
Rough Terrain#
env = gym.make("GenesisLab-Go2-Rough-v0", num_envs=1024)
Features:
Procedurally generated height maps
Stairs, slopes, stepping stones
Terrain curriculum
Custom Configuration#
import gymnasium as gym

import genesislab.envs
from genesislab.tasks.go2_flat import Go2FlatEnvCfg

# Modify configuration
cfg = Go2FlatEnvCfg()
cfg.scene.num_envs = 8192
cfg.scene.env_spacing = 5.0
cfg.rewards.forward_vel.weight = 2.0

# Create environment
env = gym.make("GenesisLab-Go2-Flat-v0", cfg=cfg)
Training with a Policy#
Random Policy#
import gymnasium as gym
import torch

import genesislab.envs

class RandomPolicy:
    def __init__(self, action_dim):
        self.action_dim = action_dim

    def __call__(self, obs):
        batch_size = obs.shape[0]
        return torch.rand(batch_size, self.action_dim) * 2 - 1  # Uniform in [-1, 1]

# Create policy
policy = RandomPolicy(action_dim=12)

# Run simulation
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

for _ in range(1000):
    action = policy(torch.from_numpy(obs))
    obs, reward, terminated, truncated, info = env.step(action.numpy())
Simple MLP Policy#
import gymnasium as gym
import torch
import torch.nn as nn

import genesislab.envs

class MLPPolicy(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # Output in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

# Create and use policy (untrained, so its actions are effectively random)
policy = MLPPolicy(obs_dim=48, action_dim=12).cuda()

env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()
obs_tensor = torch.from_numpy(obs).cuda()

for _ in range(1000):
    with torch.no_grad():
        action = policy(obs_tensor)
    obs, reward, terminated, truncated, info = env.step(action.cpu().numpy())
    obs_tensor = torch.from_numpy(obs).cuda()
Accessing Environment Information#
Scene and Managers#
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
# Access the LabScene
scene = env.unwrapped.scene
# Access managers
obs_manager = scene.observation_manager
reward_manager = scene.reward_manager
action_manager = scene.action_manager
# Get manager info
print(f"Observation terms: {list(obs_manager.terms.keys())}")
print(f"Reward terms: {list(reward_manager.terms.keys())}")
Episode Statistics#
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

episode_rewards = []
episode_lengths = []

for _ in range(10000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    if "episode" in info:
        episode_rewards.append(info["episode"]["r"])
        episode_lengths.append(info["episode"]["l"])

print(f"Mean episode reward: {sum(episode_rewards) / len(episode_rewards):.2f}")
print(f"Mean episode length: {sum(episode_lengths) / len(episode_lengths):.0f}")
Individual Reward Terms#
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

# Step once to get rewards
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)

# Access individual reward terms
reward_manager = env.unwrapped.scene.reward_manager
for term_name, term_value in reward_manager.term_rewards.items():
    mean_value = term_value.mean().item()
    print(f"{term_name}: {mean_value:.4f}")
Common Issues#
Environment Creation Fails#
Problem: Importing genesislab fails, or gym.make cannot find "GenesisLab-Go2-Flat-v0".
Solution: If the import itself fails, check that GenesisLab is installed in the active Python environment. If the environment ID is not found, make sure you ran import genesislab.envs first; that import is what registers the environments with Gymnasium.
Simulation is Slow#
Problem: Low FPS
Solutions:
Increase num_envs (better GPU utilization)
Use headless=True (no visualization)
Check that the GPU is actually being used
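One quick sanity check combines the last two points: confirm PyTorch can see the GPU, then measure raw stepping throughput with rendering disabled. A rough sketch; absolute numbers depend on your hardware:

import time

import gymnasium as gym
import torch

import genesislab.envs

print(f"CUDA available: {torch.cuda.is_available()}")

env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024, headless=True)
obs, info = env.reset()
num_envs = env.unwrapped.num_envs

start = time.time()
for _ in range(100):
    env.step(env.action_space.sample())
elapsed = time.time() - start

# Throughput counts every parallel environment as one step.
print(f"{100 * num_envs / elapsed:.0f} env-steps/sec")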
Robot Falls Immediately#
Problem: With random actions, the robot falls over immediately
This is normal! Random actions don’t produce useful behavior. The robot needs to be trained with RL to learn walking.
Next Steps#
Learn basic concepts to understand how it works
Customize environment configuration
Follow training tutorial to train a policy
Explore advanced topics