CMA-ES Agent | Minimal Overview

Abstract

This project demonstrates the use of Covariance Matrix Adaptation Evolution Strategy (CMA-ES) for reinforcement learning in the CartPole-v1 environment. The agent optimizes a linear policy using evolutionary methods, requiring no gradients or neural networks.

Technical

Environment: CartPole-v1 (Gymnasium)
Policy: Linear mapping from state to action
Algorithm: CMA-ES (evolutionary strategy)
Population: 16
Generations: 100

Results

Mean Reward: 500.0 (optimal)
Evaluation Episodes: 10
Convergence: Within 5 generations

References

Hansen, N., & Ostermeier, A. (2001). Evolutionary Computation, 9(2), 159-195.
Salimans, T. et al. (2017). arXiv:1703.03864.
Brockman, G. et al. (2016). arXiv:1606.01540.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.