# Your First Environment

Let's run your first GenesisLab environment and understand what's happening.

## Running a Pre-built Environment

```python
import gymnasium as gym
import genesislab.envs  # Registers GenesisLab environments

# Create environment
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)

# Get environment information
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")
print(f"Number of environments: {env.unwrapped.num_envs}")

# Reset environment
obs, info = env.reset()
print(f"Observation shape: {obs.shape}")
print(f"Initial reward: {info.get('episode', {}).get('r', 0)}")

# Run simulation
for step in range(1000):
    # Sample random action
    action = env.action_space.sample()

    # Step environment
    obs, reward, terminated, truncated, info = env.step(action)

    # Print episode statistics when episodes end
    if "episode" in info:
        print(f"Step {step}: Episode reward: {info['episode']['r']:.2f}, "
              f"Length: {info['episode']['l']}")

# Cleanup
env.close()
```

## Understanding the Output

### Observation Space

```python
Box(-inf, inf, (48,), float32)
```

This means:

- **Type**: Continuous (Box)
- **Shape**: (48,) - 48-dimensional observation vector
- **Range**: Unbounded

**What's in the observation?** For the Go2 robot on flat terrain:

- Base linear velocity (3)
- Base angular velocity (3)
- Projected gravity (3)
- Command velocities (3)
- Joint positions (12)
- Joint velocities (12)
- Last actions (12)

### Action Space

```python
Box(-1.0, 1.0, (12,), float32)
```

This means:

- **Type**: Continuous (Box)
- **Shape**: (12,) - 12 DOF robot
- **Range**: [-1, 1] - normalized action space

**What do actions represent?**

- Joint position targets (relative to the default position)
- Actions are scaled and added to default joint positions
- A PD controller converts targets to motor torques

## Running with Visualization

To see what's happening:

```python
import time

import gymnasium as gym
import genesislab.envs

# Create environment with viewer
env = gym.make(
    "GenesisLab-Go2-Flat-v0",
    num_envs=1,      # Use 1 env for easier visualization
    headless=False,  # Enable viewer
)

obs, info = env.reset()

# Run and watch
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    # Slow down for visualization
    time.sleep(0.01)

env.close()
```

## Understanding the Simulation Loop

### 1. Reset

```python
obs, info = env.reset()
```

**What happens:**

1. Robot state is initialized (position, velocity, etc.)
2. Commands are sampled (target velocities)
3. Terrain is reset (if using curriculum)
4. Domain randomization events fire
5. Initial observations are computed

### 2. Step

```python
obs, reward, terminated, truncated, info = env.step(action)
```

**What happens:**

1. Actions are processed and scaled
2. The PD controller computes motor torques
3. The Genesis physics simulation steps forward
4. Sensors are updated
5. Observations are computed
6. Rewards are computed
7. Termination conditions are checked

### 3. Episode End

When `terminated` or `truncated` is True:

- Episode statistics are logged
- The environment auto-resets
- A new episode begins

## Exploring Different Environments

### Flat Terrain

```python
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
```

Good for:

- Initial training
- Policy debugging
- Fast iteration

### Rough Terrain

```python
env = gym.make("GenesisLab-Go2-Rough-v0", num_envs=1024)
```

Features:

- Procedurally generated height maps
- Stairs, slopes, stepping stones
- Terrain curriculum

### Custom Configuration

```python
from genesislab.tasks.go2_flat import Go2FlatEnvCfg

# Modify configuration
cfg = Go2FlatEnvCfg()
cfg.scene.num_envs = 8192
cfg.scene.env_spacing = 5.0
cfg.rewards.forward_vel.weight = 2.0

# Create environment
env = gym.make("GenesisLab-Go2-Flat-v0", cfg=cfg)
```

## Training with a Policy

### Random Policy

```python
import torch

class RandomPolicy:
    def __init__(self, action_dim):
        self.action_dim = action_dim

    def __call__(self, obs):
        batch_size = obs.shape[0]
        return torch.rand(batch_size, self.action_dim) * 2 - 1  # [-1, 1]

# Create policy
policy = RandomPolicy(action_dim=12)

# Run simulation
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

for _ in range(1000):
    action = policy(torch.from_numpy(obs))
    obs, reward, terminated, truncated, info = env.step(action.numpy())
```

### Simple MLP Policy

```python
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # Output in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

# Create and use policy
policy = MLPPolicy(obs_dim=48, action_dim=12).cuda()
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()
obs_tensor = torch.from_numpy(obs).cuda()

for _ in range(1000):
    with torch.no_grad():
        action = policy(obs_tensor)
    obs, reward, terminated, truncated, info = env.step(action.cpu().numpy())
    obs_tensor = torch.from_numpy(obs).cuda()
```

## Accessing Environment Information

### Scene and Managers

```python
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)

# Access the LabScene
scene = env.unwrapped.scene

# Access managers
obs_manager = scene.observation_manager
reward_manager = scene.reward_manager
action_manager = scene.action_manager

# Get manager info
print(f"Observation terms: {list(obs_manager.terms.keys())}")
print(f"Reward terms: {list(reward_manager.terms.keys())}")
```

### Episode Statistics

```python
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

episode_rewards = []
episode_lengths = []

for _ in range(10000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    if "episode" in info:
        episode_rewards.append(info["episode"]["r"])
        episode_lengths.append(info["episode"]["l"])

print(f"Mean episode reward: {sum(episode_rewards) / len(episode_rewards):.2f}")
print(f"Mean episode length: {sum(episode_lengths) / len(episode_lengths):.0f}")
```

### Individual Reward Terms

```python
env = gym.make("GenesisLab-Go2-Flat-v0", num_envs=1024)
obs, info = env.reset()

# Step once to get rewards
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)

# Access individual reward terms
reward_manager = env.unwrapped.scene.reward_manager
for term_name, term_value in reward_manager.term_rewards.items():
    mean_value = term_value.mean().item()
    print(f"{term_name}: {mean_value:.4f}")
```

## Common Issues

### Environment Creation Fails

**Problem**: `ImportError: cannot import name 'genesislab'`

**Solution**: Make sure GenesisLab is installed, and that you have imported `genesislab.envs` to register the environments before calling `gym.make`.

### Simulation is Slow

**Problem**: Low FPS

**Solutions**:

1. Increase `num_envs` (better GPU utilization)
2. Use `headless=True` (no visualization)
3. Check that the GPU is being used

### Robot Falls Immediately

**Problem**: With random actions, the robot falls

**This is normal!** Random actions don't produce useful behavior. The robot needs to be trained with RL to learn walking.

## Next Steps

- Learn [basic concepts](basic_concepts.md) to understand how it works
- Customize [environment configuration](environment_configuration.md)
- Follow the [training tutorial](../tutorials/basic_locomotion.md) to train a policy
- Explore [advanced topics](../advanced_topics/index.md)
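As a closing sanity check, the per-term observation layout listed earlier for the Go2 robot on flat terrain can be verified to sum to the 48-dimensional vector reported by `env.observation_space`. The dictionary below is purely illustrative (its keys are not a GenesisLab API); only the dimensions come from the tutorial:

```python
# Per-term dimensions from "What's in the observation?" above.
# Illustrative only: the key names are not GenesisLab identifiers.
OBS_LAYOUT = {
    "base_lin_vel": 3,
    "base_ang_vel": 3,
    "projected_gravity": 3,
    "commands": 3,
    "joint_pos": 12,
    "joint_vel": 12,
    "last_actions": 12,
}

obs_dim = sum(OBS_LAYOUT.values())
print(obs_dim)  # → 48, matching Box(-inf, inf, (48,), float32)
```

Running this kind of check after editing an observation configuration is a cheap way to catch a mismatch between your policy's `obs_dim` and the environment's observation space before training starts.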