
A collection of reinforcement learning algorithms implemented from scratch. JointGYM is designed for the effortless integration of these algorithms into various applications.


dwipddalal/joint-gym



Joint-Gym is a PyBullet-based wrapper for reinforcement learning tasks.

Building the simulation environment

  • We used PyBullet to build a physics-based simulation environment.

PyBullet is a physics engine for simulating rigid-body dynamics with contacts, and it is widely used in robotics and machine learning research. It is written in C++ but provides a Python API.

The Python API can be used to create, simulate, and control a PyBullet simulation. It provides functions for creating and manipulating objects, applying forces and torques, and retrieving information about the simulation state. It also integrates with Python libraries such as OpenAI Gym, TensorFlow, and PyTorch for reinforcement learning tasks.

In summary, PyBullet is a powerful physics engine that can be used to simulate robotic arms and other multi-body systems, and its Python API allows it to be easily integrated with Python-based machine-learning libraries.
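
Below is a minimal sketch of how the API fits together. The KUKA arm URDF ships with the optional pybullet_data package; the joint index and position target are arbitrary illustrative values.

```python
import pybullet as p
import pybullet_data

# Connect to the physics server (p.GUI opens a visualizer; p.DIRECT runs headless).
client = p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

# Load a ground plane and a robot arm bundled with pybullet_data.
plane_id = p.loadURDF("plane.urdf")
robot_id = p.loadURDF("kuka_iiwa/model.urdf", useFixedBase=True)

# Command joint 0 toward a position target, then step the simulation.
p.setJointMotorControl2(robot_id, jointIndex=0,
                        controlMode=p.POSITION_CONTROL,
                        targetPosition=0.5)
for _ in range(240):  # one simulated second at the default 240 Hz timestep
    p.stepSimulation()

# Read back the joint state: position, velocity, reaction forces, motor torque.
joint_pos, joint_vel, _, _ = p.getJointState(robot_id, 0)
print(joint_pos, joint_vel)

p.disconnect()
```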

Some experiments we ran to check the robustness of the PyBullet environment:

  • Controlling a 3R (three-revolute-joint) robot in PyBullet

Concepts of Reinforcement Learning

Two major classes of algorithms:

  • Model-based algorithms
  • Model-free algorithms

Two classes of learning:

  • Online Learning
  • Offline Learning

Two classes of policy:

  • On-policy
  • Off-policy

Q Function

The action-value function, also known as the Q-function, is a fundamental concept in reinforcement learning that maps a state-action pair to the expected total reward of taking that action in that state and following a specific policy thereafter.

Formally, the action-value function is defined as:

$Q(s, a) = \mathbb{E}[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \ldots | S_t = s, A_t = a]$

where $s$ is the current state, $a$ is the action taken in that state, $R_t$ is the reward received at time $t$, $\gamma$ is a discount factor that determines the importance of future rewards, and $\mathbb{E}[\cdot]$ is the expectation operator.

The Q-function represents the quality of taking a particular action in a specific state. By computing the Q-values for all actions in each state, an agent can determine the best action to take in each state and thus optimize its behavior to maximize its expected total reward.
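
Equivalently, the Q-function can be written recursively (the Bellman equation), which is the form that value-based methods such as Q-learning exploit:

$Q^{\pi}(s, a) = \mathbb{E}[R_{t+1} + \gamma \, Q^{\pi}(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a]$

and the greedy action in a state is $a^{*} = \arg\max_{a} Q(s, a)$.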

Objective in reinforcement learning

In RL, the objective is to learn an approximate function that maps each state-action pair to its expected return.

To create a reinforcement learning model for solving the inverse kinematics problem of a 2R robotic arm, we can use a deep reinforcement learning algorithm such as deep Q-learning.

Steps to build an RL model for this case:

Define the state space: The state space is the set of all possible states that the robot arm can be in. In this case, the state space can consist of the initial and final positions of the end-effector, as well as the angles of the two joints.
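
As an illustrative sketch of such a state encoding (the flat-vector layout and the variable names are our assumptions, not fixed by the repo):

```python
import numpy as np

def make_state(theta1, theta2, target_x, target_y):
    # State: the two joint angles plus the target end-effector position.
    # A flat float vector keeps it directly usable as network input.
    return np.array([theta1, theta2, target_x, target_y], dtype=np.float32)
```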

Define the action space: The action space is the set of all possible actions that the robot can take. In this case, the action space can consist of the change in angle for each of the two joints.
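
One common way to discretize this action space (the step size and the 3 x 3 grid are illustrative assumptions):

```python
import itertools

DELTA = 0.05  # radians per step (illustrative choice)

# Each action nudges each joint by -DELTA, 0, or +DELTA,
# giving 3 * 3 = 9 discrete actions for the two joints.
ACTIONS = list(itertools.product((-DELTA, 0.0, DELTA), repeat=2))

def apply_action(theta1, theta2, action_index):
    d1, d2 = ACTIONS[action_index]
    return theta1 + d1, theta2 + d2
```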

Define the reward function: The reward function is used to evaluate the goodness of a particular action taken in a given state. In this case, we can define the reward as the negative Euclidean distance between the current position of the end-effector and the target position.
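
A sketch of that reward, using the standard planar 2R forward kinematics (the link lengths l1 and l2 are assumed parameters):

```python
import numpy as np

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    # End-effector (x, y) of a planar 2R arm (textbook formula).
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return np.array([x, y])

def reward(theta1, theta2, target):
    # Negative Euclidean distance: values closer to 0 are better.
    return -np.linalg.norm(forward_kinematics(theta1, theta2) - target)
```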

Train the model: We can use deep Q-learning to train the model by iteratively updating the Q-values for each state-action pair. The Q-value represents the expected future reward for taking a particular action in a given state.
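
The core of such an update, sketched with PyTorch (the network size, learning rate, and discount factor are illustrative; a full DQN would add experience replay and a target network):

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 9))  # 4-dim state, 9 actions
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_update(state, action, r, next_state, done):
    # One gradient step toward the TD target: r + gamma * max_a' Q(s', a').
    q_pred = q_net(state)[action]
    with torch.no_grad():
        q_target = r + gamma * q_net(next_state).max() * (1.0 - float(done))
    loss = (q_pred - q_target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```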

Test the model: Once the model is trained, we can test it by inputting a new initial and final position for the end-effector and having the model output the optimal angles for the two joints to reach the final position.
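
Testing then amounts to a greedy rollout of the learned Q-network, for example (reusing the hypothetical helpers sketched above):

```python
import torch

def solve_ik(target, steps=200):
    # Greedily follow argmax-Q actions from a zero pose toward the target.
    theta1, theta2 = 0.0, 0.0
    for _ in range(steps):
        state = torch.as_tensor(make_state(theta1, theta2, *target))
        action = int(q_net(state).argmax())
        theta1, theta2 = apply_action(theta1, theta2, action)
    return theta1, theta2
```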

Analogy with Deep learning

An episode in RL plays a role analogous to an epoch in deep learning.

Results

https://drive.google.com/file/d/1lKRV3yjGzJWdM20cQ53CsYYHYlah5ED9/view?usp=sharing

Code Style

The code follows the PEP 8 style guide.
