# Your First Environment

Let's run your first GenesisLab environment and understand what's happening.

## Running a Pre-built Environment

```python
import gymnasium as gym
import genesislab.envs  # Registers GenesisLab environments

# Create environment
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)

# Get environment information
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")
print(f"Number of environments: {env.unwrapped.num_envs}")

# Reset environment
obs, info = env.reset()
print(f"Observation shape: {obs.shape}")
print(f"Initial reward: {info.get('episode', {}).get('r', 0)}")

# Run simulation
for step in range(1000):
    # Sample random action
    action = env.action_space.sample()

    # Step environment
    obs, reward, terminated, truncated, info = env.step(action)

    # Print episode statistics when episodes end
    if "episode" in info:
        print(f"Step {step}: Episode reward: {info['episode']['r']:.2f}, "
              f"Length: {info['episode']['l']}")

# Cleanup
env.close()
```

## Understanding the Output

### Observation Space

```python
Box(-inf, inf, (48,), float32)
```

This means:

- **Type**: Continuous (Box)
- **Shape**: (48,) - 48-dimensional observation vector
- **Range**: Unbounded

**What's in the observation?** For the Go2 robot on flat terrain:

- Base linear velocity (3)
- Base angular velocity (3)
- Projected gravity (3)
- Command velocities (3)
- Joint positions (12)
- Joint velocities (12)
- Last actions (12)

### Action Space

```python
Box(-1.0, 1.0, (12,), float32)
```

This means:

- **Type**: Continuous (Box)
- **Shape**: (12,) - 12 DOF robot
- **Range**: [-1, 1] - normalized action space

**What do actions represent?**

- Joint position targets (relative to the default position)
- Actions are scaled and added to default joint positions
- A PD controller converts targets to motor torques

## Running with Visualization

To see what's happening:

```python
import time

import gymnasium as gym
import genesislab.envs

# Create environment with viewer
env = gym.make(
    "GenesisLab-Go2-Flat-v0",
    num_envs=1,      # Use 1 env for easier visualization
    headless=False,  # Enable viewer
)

obs, info = env.reset()

# Run and watch
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    # Slow down for visualization
    time.sleep(0.01)

env.close()
```

## Understanding the Simulation Loop

### 1. Reset

```python
obs, info = env.reset()
```

**What happens:**

1. Robot state is initialized (position, velocity, etc.)
2. Commands are sampled (target velocities)
3. Terrain is reset (if using curriculum)
4. Domain randomization events fire
5. Initial observations are computed

### 2. Step

```python
obs, reward, terminated, truncated, info = env.step(action)
```

**What happens:**

1. Actions are processed and scaled
2. The PD controller computes motor torques
3. The Genesis physics simulation steps forward
4. Sensors are updated
5. Observations are computed
6. Rewards are computed
7. Termination conditions are checked

### 3. Episode End

When `terminated` or `truncated` is True:

- Episode statistics are logged
- The environment auto-resets
- A new episode begins

## Exploring Different Environments

### Flat Terrain

```python
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
```

Good for:

- Initial training
- Policy debugging
- Fast iteration

### Rough Terrain

```python
env = gym.make("GenesisLab-Go2-Rough-v0", num_envs=1024)
```

Features:

- Procedurally generated height maps
- Stairs, slopes, stepping stones
- Terrain curriculum

### Custom Configuration

```python
from genesislab.tasks.go2_flat import Go2FlatEnvCfg

# Modify configuration
cfg = Go2FlatEnvCfg()
cfg.scene.num_envs = 8192
cfg.scene.env_spacing = 5.0
cfg.rewards.forward_vel.weight = 2.0

# Create environment
env = gym.make("GenesisLab-Go2-Flat-v0", cfg=cfg)
```

## Training with a Policy

### Random Policy

```python
import torch

class RandomPolicy:
    def __init__(self, action_dim):
        self.action_dim = action_dim

    def __call__(self, obs):
        batch_size = obs.shape[0]
        return torch.rand(batch_size, self.action_dim) * 2 - 1  # [-1, 1]

# Create policy
policy = RandomPolicy(action_dim=12)

# Run simulation
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

for _ in range(1000):
    action = policy(torch.from_numpy(obs))
    obs, reward, terminated, truncated, info = env.step(action.numpy())
```

### Simple MLP Policy

```python
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # Output in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

# Create and use policy
policy = MLPPolicy(obs_dim=48, action_dim=12).cuda()
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()
obs_tensor = torch.from_numpy(obs).cuda()

for _ in range(1000):
    with torch.no_grad():
        action = policy(obs_tensor)
    obs, reward, terminated, truncated, info = env.step(action.cpu().numpy())
    obs_tensor = torch.from_numpy(obs).cuda()
```

## Accessing Environment Information

### Scene and Managers

```python
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)

# Access the LabScene
scene = env.unwrapped.scene

# Access managers
obs_manager = scene.observation_manager
reward_manager = scene.reward_manager
action_manager = scene.action_manager

# Get manager info
print(f"Observation terms: {list(obs_manager.terms.keys())}")
print(f"Reward terms: {list(reward_manager.terms.keys())}")
```

### Episode Statistics

```python
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

episode_rewards = []
episode_lengths = []

for _ in range(10000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    if "episode" in info:
        episode_rewards.append(info["episode"]["r"])
        episode_lengths.append(info["episode"]["l"])

print(f"Mean episode reward: {sum(episode_rewards) / len(episode_rewards):.2f}")
print(f"Mean episode length: {sum(episode_lengths) / len(episode_lengths):.0f}")
```

### Individual Reward Terms

```python
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

# Step once to get rewards
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)

# Access individual reward terms
reward_manager = env.unwrapped.scene.reward_manager
for term_name, term_value in reward_manager.term_rewards.items():
    mean_value = term_value.mean().item()
    print(f"{term_name}: {mean_value:.4f}")
```

## Common Issues

### Environment Creation Fails

**Problem**: `ImportError: cannot import name 'genesislab'`

**Solution**: Make sure GenesisLab is installed, and that you have imported `genesislab.envs` to register the environments before calling `gym.make`.

### Simulation is Slow

**Problem**: Low FPS

**Solutions**:

1. Increase `num_envs` (better GPU utilization)
2. Use `headless=True` (no visualization)
3. Check that the GPU is being used

### Robot Falls Immediately

**Problem**: With random actions, the robot falls

**This is normal!** Random actions don't produce useful behavior. The robot needs to be trained with RL to learn walking.

## Next Steps

- Learn [basic concepts](basic_concepts.md) to understand how it works
- Customize [environment configuration](environment_configuration.md)
- Follow the [training tutorial](../tutorials/basic_locomotion.md) to train a policy
- Explore [advanced topics](../advanced_topics/index.md)
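As a closing sanity check, the per-term observation layout listed earlier for the Go2 robot on flat terrain can be verified to sum to the 48-dimensional vector reported by `env.observation_space`. The dictionary below is purely illustrative (its keys are not a GenesisLab API); only the dimensions come from the tutorial:

```python
# Per-term dimensions from "What's in the observation?" above.
# Illustrative only: the key names are not GenesisLab identifiers.
OBS_LAYOUT = {
    "base_lin_vel": 3,
    "base_ang_vel": 3,
    "projected_gravity": 3,
    "commands": 3,
    "joint_pos": 12,
    "joint_vel": 12,
    "last_actions": 12,
}

obs_dim = sum(OBS_LAYOUT.values())
print(obs_dim)  # → 48, matching Box(-inf, inf, (48,), float32)
```

Running this kind of check after editing an observation configuration is a cheap way to catch a mismatch between your policy's `obs_dim` and the environment's observation space before training starts.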