Abstract
This project applies the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to reinforcement learning in the CartPole-v1 environment. The agent optimizes the weights of a linear policy by evolutionary search, so training requires neither gradients nor a neural network.
Technical
- Environment: CartPole-v1 (Gymnasium)
- Policy: Linear mapping from the 4-dimensional state to one of two discrete actions
- Algorithm: CMA-ES (evolution strategy)
- Population: 16
- Generations: 100
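The setup listed above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's code: `linear_policy`, `run_episode`, and `evolve` are hypothetical names; the sign-threshold action rule is one possible reading of "linear mapping"; the dynamics are a simplified re-implementation of the textbook cart-pole model rather than the Gymnasium environment; and the update is a plain elite-mean evolution strategy with a fixed step size, which stands in for CMA-ES and omits its covariance and step-size adaptation.

```python
import math
import random

# Constants of a standard cart-pole model (assumed, not taken from the project).
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
HALF_POLE_LEN, FORCE, DT = 0.5, 10.0, 0.02
X_LIMIT, THETA_LIMIT = 2.4, math.radians(12)


def linear_policy(weights, state):
    """Return action 1 (push right) if w . s > 0, else 0 (push left)."""
    return 1 if sum(w * s for w, s in zip(weights, state)) > 0 else 0


def run_episode(weights, rng, max_steps=500):
    """Simulate one episode; the return equals the number of steps survived."""
    x, x_dot, theta, theta_dot = (rng.uniform(-0.05, 0.05) for _ in range(4))
    total_mass = CART_MASS + POLE_MASS
    for step in range(1, max_steps + 1):
        action = linear_policy(weights, (x, x_dot, theta, theta_dot))
        force = FORCE if action == 1 else -FORCE
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        temp = (force + POLE_MASS * HALF_POLE_LEN * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
            HALF_POLE_LEN * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
        x_acc = temp - POLE_MASS * HALF_POLE_LEN * theta_acc * cos_t / total_mass
        # Explicit Euler integration over one time step.
        x, x_dot = x + DT * x_dot, x_dot + DT * x_acc
        theta, theta_dot = theta + DT * theta_dot, theta_dot + DT * theta_acc
        if abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT:
            return float(step)  # pole fell or cart left the track
    return float(max_steps)  # episode cut off at the step limit


def evolve(pop_size=16, generations=100, sigma=0.5, seed=0):
    """Simplified elite-mean evolution strategy (NOT full CMA-ES)."""
    rng = random.Random(seed)
    mean = [0.0] * 4  # one weight per state variable
    history = []      # best return seen in each generation
    for _ in range(generations):
        # Sample candidates from an isotropic Gaussian around the current mean.
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(pop_size)]
        scored = sorted(((run_episode(w, rng), w) for w in population),
                        key=lambda pair: pair[0], reverse=True)
        # Move the mean to the average of the top quarter of candidates.
        elite = [w for _, w in scored[:max(1, pop_size // 4)]]
        mean = [sum(w[i] for w in elite) / len(elite) for i in range(4)]
        history.append(scored[0][0])
    return mean, history
```

Calling `evolve()` with its defaults uses the population size and generation count listed above. Each candidate is scored on a single episode here, which keeps the sketch short but makes the fitness noisy; averaging several episodes per candidate is a common alternative.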
Results
- Mean Reward: 500.0 (optimal; CartPole-v1 caps each episode at 500 steps)
- Evaluation Episodes: 10
- Convergence: Within 5 generations
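The figures above can be read as follows: because CartPole-v1 gives a reward of 1 per step and cuts episodes off at 500 steps, a mean of 500.0 over 10 episodes means every evaluation episode reached the cap. The sketch below shows one way such numbers might be computed. The names `evaluate` and `first_generation_at_target` are hypothetical, `run_episode` stands for any callable that returns one episode's total reward, and the sample history in the usage lines is made up for illustration and is not a result from this project.

```python
from statistics import mean

MAX_RETURN = 500.0  # CartPole-v1: reward 1 per step, episodes capped at 500 steps


def evaluate(run_episode, n_episodes=10):
    """Run n_episodes with distinct seeds; return (mean return, per-episode returns)."""
    returns = [float(run_episode(seed)) for seed in range(n_episodes)]
    return mean(returns), returns


def first_generation_at_target(mean_returns, target=MAX_RETURN):
    """Return the 1-based generation whose mean return first reaches target, or None."""
    for generation, value in enumerate(mean_returns, start=1):
        if value >= target:
            return generation
    return None


# Usage with placeholder inputs (illustrative only, not measured data).
avg, per_episode = evaluate(lambda seed: MAX_RETURN)
solved_at = first_generation_at_target([23.0, 180.5, 500.0])
```

Since no single episode can exceed the cap, the mean equals `MAX_RETURN` only when all entries of `per_episode` do.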
References
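- Hansen, N., & Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2), 159-195.
- Salimans, T. et al. (2017). Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864.
- Brockman, G. et al. (2016). OpenAI Gym. arXiv:1606.01540.
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.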