The Soft Actor-Critic (SAC) model with Structured Data is a reinforcement learning algorithm that combines the advantages of actor-critic methods and maximum entropy reinforcement learning. It is designed for environments with continuous action spaces, making it well-suited to continuous-control problems such as robotic manipulation and locomotion.
In this model, an agent learns to interact with an environment by taking actions that maximize the expected cumulative reward over time. The model comprises three key components: an actor network, a critic network, and an entropy regularization term.
The actor network is responsible for selecting actions given the current state of the environment. It is trained to maximize the expected cumulative reward while also maximizing entropy to encourage exploration. The critic network, on the other hand, estimates the value function, which represents the expected cumulative reward achievable from a given state. Lastly, the entropy regularization term helps to balance exploration and exploitation by encouraging the policy to be stochastic.
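For reference, the objective SAC optimizes can be written in the standard maximum-entropy form, where the temperature coefficient α controls how strongly the entropy bonus is weighted against the reward:

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
$$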
To incorporate structured data into the SAC model, the input state space is expanded to include structured features alongside the raw observations. These features can supply extra context or domain knowledge that improves the agent's decision-making. The model learns to leverage both the raw sensor data and the structured features to choose informed actions in the environment.
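As an illustration, one simple way to realize this is to concatenate the two inputs into a single state vector before it reaches the actor and critic networks. The names below (`raw_obs`, `structured_features`, `build_state`) are hypothetical, and concatenation is only one of several possible fusion strategies:

```python
import numpy as np

def build_state(raw_obs: np.ndarray, structured_features: np.ndarray) -> np.ndarray:
    """Concatenate raw sensor observations with structured (tabular) features.

    Both inputs are assumed to be 1-D float arrays; the combined vector is
    what the actor and critic networks would receive as the state.
    """
    return np.concatenate([raw_obs.astype(np.float32),
                           structured_features.astype(np.float32)])

# Hypothetical example: 8 raw sensor readings plus 3 structured/domain features.
raw_obs = np.random.randn(8)
structured_features = np.array([1.0, 0.0, 0.5])  # e.g. a one-hot category plus a ratio
state = build_state(raw_obs, structured_features)
print(state.shape)  # (11,)
```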
Pros:
- Off-policy learning with a replay buffer makes SAC comparatively sample-efficient.
- The entropy term encourages exploration and tends to yield stable, robust training.
- Structured features can inject domain knowledge without changing the core algorithm.

Cons:
- The algorithm involves several networks and hyperparameters (e.g. the temperature), so it is more complex to implement and tune than simpler methods.
- It targets continuous action spaces; discrete-action problems require adaptation.
- Preparing and encoding the structured features adds engineering effort.
A well-maintained implementation of SAC is available in the Stable Baselines3 library.
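A minimal training sketch with Stable Baselines3 might look like the following. The Pendulum-v1 environment is just a stand-in for a custom environment whose observation vector already contains the structured features, and hyperparameters are left at their defaults:

```python
import gymnasium as gym
from stable_baselines3 import SAC

# Any continuous-action Gymnasium environment works here; a real application
# would substitute a custom env whose observations include the structured features.
env = gym.make("Pendulum-v1")

model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Use the trained policy deterministically at evaluation time.
obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```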