
Development Notes

This document is a place for developers to collaborate on various aspects of this project.

Project 1: Clean up

Summary: This project consolidates numerous smaller projects. I am starting a cleanup with the following goals:

  • The sim runs slowly. Vectorize the execution and store data more efficiently.
  • Define a custom venv with documented deps/requirements.
  • Build a single config file that feeds the entire simulation.
  • I use a mix of scripting and object-oriented modules; make better use of object-oriented design.
  • Standardize RL with a base class that applies to all techniques.
  • Standardize the planners' inputs and outputs.
  • Store data online.
  • Make the sim more robust to incorrect configs.
  • Build unit tests.
  • Build logging module.
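
The base-class goal above could take roughly the following shape. This is only a sketch: `BaseLearner`, `select_action`, `update`, and the toy `HillClimbLearner` are hypothetical names, not taken from the repo, and the real interface will depend on what the existing learners need.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseLearner(ABC):
    """Hypothetical common interface for every learning technique."""

    def __init__(self, n_agents, seed=None):
        self.n_agents = n_agents
        self.rng = np.random.default_rng(seed)

    @abstractmethod
    def select_action(self, agent):
        """Return the action for one agent."""

    @abstractmethod
    def update(self, agent, action, reward):
        """Update the learner from one (action, reward) sample."""


class HillClimbLearner(BaseLearner):
    """Toy subclass (illustration only): keeps the best action seen per agent."""

    def __init__(self, n_agents, step=0.1, seed=None):
        super().__init__(n_agents, seed)
        self.step = step
        self.best_action = np.zeros(n_agents)
        self.best_reward = np.full(n_agents, -np.inf)

    def select_action(self, agent):
        # perturb the best known action with Gaussian noise
        return self.best_action[agent] + self.step * self.rng.standard_normal()

    def update(self, agent, action, reward):
        # keep the sample only if it beats the best reward so far
        if reward > self.best_reward[agent]:
            self.best_action[agent] = action
            self.best_reward[agent] = reward
```

With an interface like this, the orchestrator would only call `select_action` and `update`, so swapping one technique for another would not require touching the planners.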

Project 2: Learning

Proposed: Inspired by recent work entitled "Latent space modeling of parametric and time-dependent PDEs using neural ODEs", we are working on ways to learn the latent space that is currently defined explicitly in lemni_tools.py.

Project 3: Hypospace learning

Summary: This project builds on a technique, initially proposed in this repo, for controlling the orientation of a Lemniscatic Arch swarm using reinforcement learning; specifically, we use Continuous Action Learning Automata (CALA) to tune offset angles around the $x$- and $z$-axes.

We have since integrated this technique into the broader multi-agent framework.
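
For readers new to CALA, here is a minimal sketch of one update step in its textbook form, for a single scalar action such as one offset angle. It is not a copy of learner/CALA_control.py, which may differ; the function name `cala_step` and the parameters `lr`, `sigma_min`, and `penalty` are assumptions made for illustration.

```python
import numpy as np


def cala_step(mu, sigma, reward_fn, rng, lr=0.05, sigma_min=0.01, penalty=1.0):
    """One textbook CALA update (hypothetical helper, not the repo's code)."""
    phi = max(sigma, sigma_min)      # effective std, floored at sigma_min
    x = rng.normal(mu, phi)          # explore: sample an action around the mean
    beta_x = reward_fn(x)            # reward for the sampled action
    beta_mu = reward_fn(mu)          # reward for the current mean
    gain = (beta_x - beta_mu) / phi
    z = (x - mu) / phi
    # move the mean toward actions that beat it
    mu_new = mu + lr * gain * z
    # adapt the spread; the penalty term pulls sigma toward sigma_min
    sigma_new = sigma + lr * gain * (z**2 - 1.0) - lr * penalty * (sigma - sigma_min)
    return mu_new, sigma_new


# toy usage: the reward peaks at an offset of 0.5 rad
rng = np.random.default_rng(0)
mu, sigma = 0.0, 0.5
for _ in range(2000):
    mu, sigma = cala_step(mu, sigma, lambda a: np.exp(-(a - 0.5) ** 2), rng)
```

In this toy setup the mean is expected to drift toward the reward peak while the spread shrinks, but the rate depends heavily on `lr` and `penalty`.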

Files in play

The following files are in play for this project:

# where the learning happens       
├── learner/CALA_control.py 

# we use a manually placed obstacle as a reference landmark for learning. The swarm is trained to aim itself at this landmark. The axis for alignment is defined by a line drawn from the center of mass of the swarm to the center of the lemniscate trajectory.
├── obstacles/obstacles.py

# this is where the lemniscate trajectory is defined (and uses the learned offsets)
├── planner/techniques/lemni_tools.py                 

# all this drives a desired trajectory, which is stored here 
├── planner/trajectory.py        

# the underlying control to follow this trajectory is orchestrated by a master controller here (integrates all other components)
├── orchestrator.py         

Where the work needs to be done

Inter-agent communication

For now, all agents share their statistics and the learned parameter. This is just to get things working. We'll loosen this up later.

This happens near line 165 of learner/CALA_control.py, in negotiate_with_neighbours(). Setting leader_follower = True and leader = 0 makes agent 0 do all the learning.
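
As a rough illustration of what leader-follower sharing means here, the sketch below has every agent adopt the leader's learned parameter. The helper `share_from_leader` is hypothetical and is not the body of negotiate_with_neighbours(); it assumes one scalar learned offset per agent.

```python
import numpy as np


def share_from_leader(params, leader=0):
    """Return a new array where every agent holds the leader's parameter.

    params: shape (n_agents,), one learned offset per agent (assumed layout).
    """
    params = np.asarray(params, dtype=float)
    return np.full_like(params, params[leader])


offsets = np.array([0.30, -0.10, 0.05])
shared = share_from_leader(offsets, leader=0)   # all entries equal offsets[0]
```

Loosening this later could mean replacing the copy with an average over each agent's neighbours, which is a design choice still to be made.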

Applying the learned offsets

Around line 263 in planner/techniques/lemni_tools.py, we define the lemniscate trajectory. This is where we integrate the learned offsets from CALA. This ultimately drives the orientation of the swarm.

# -------------------------- #
# offset by learned parameter
# -------------------------- #

lemni[0, m] = base_theta

if 'x' in learning_axes:
    lemni[0, m] += learn_actions.get('x', np.zeros(nVeh))[m]

if 'z' in learning_axes:
    lemni[1, m] = learn_actions.get('z', np.zeros(nVeh))[m]
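
In line with the vectorization goal of Project 1, the same step could be written without indexing each vehicle `m`. This is a sketch under assumptions: the function name `apply_learned_offsets` is hypothetical, and `lemni` is allocated here as a `(2, n_veh)` array for illustration, whereas the real array in lemni_tools.py may have a different shape.

```python
import numpy as np


def apply_learned_offsets(base_theta, learn_actions, learning_axes, n_veh):
    """Vectorized version of the per-vehicle offset step (illustrative only).

    Assumes row 0 holds the x-axis angle and row 1 the z-axis angle.
    """
    lemni = np.zeros((2, n_veh))
    lemni[0, :] = base_theta
    zeros = np.zeros(n_veh)

    if 'x' in learning_axes:
        # add the learned x-offset on top of the base angle
        lemni[0, :] += learn_actions.get('x', zeros)

    if 'z' in learning_axes:
        # the z-offset is assigned directly, as in the loop version
        lemni[1, :] = learn_actions.get('z', zeros)

    return lemni


lemni = apply_learned_offsets(
    base_theta=0.5,
    learn_actions={'x': np.array([0.1, 0.2, 0.3])},
    learning_axes=['x', 'z'],
    n_veh=3,
)
```

A missing axis key falls back to zeros, matching the `.get(..., np.zeros(nVeh))` default in the loop version.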

I think we should instead learn some kind of policy rather than the offset directly. Learning the offset directly sort of works, but I think we can do better.

Reward structure

For now, we couple the rewards in the $x$- and $z$-axes by setting the flag reward_coupling = 2.

Around line 331 in learner/CALA_control.py, we define the reward structure.

def update_reward_increment(self, k_node, state, centroid, focal, target, mode):
    
    if self.reward_mode == 'target':
        
        reference   = 'global'      # 'global' (default),   'local' (not working yet)
        reward_form = 'dot'         # 'dot'(default),       'angle' (not working yet)
        
        # compute the heading vector (from the focal position to the swarm centroid)
        v_centroid      = centroid[0:3, 0]
        v_focal         = focal[0:3]
        v_heading       = v_centroid - v_focal
        
        # compute the target vector (from the focal position to the target;
        # falls back to the origin when no target is given)
        if target.shape[1] == 0:
            v_target = - v_focal
        else:
            v_target = target[0:3, 0] - v_focal
        
        # =============
        # when coupled
        # =============
        
        if reward_coupling == 2:
            
            if reference == 'global':
                
                v1 = v_heading
                v2 = v_target
    
                
                if reward_form == 'dot':
    
                    v1 /= (np.linalg.norm(v1) + epsilon)
                    v2 /= (np.linalg.norm(v2) + epsilon)
                    reward = (np.dot(v1, v2) + 1) / 2


                elif reward_form == 'angle':
                    
                    reward_sigma = 0.5
                    v1 /= (np.linalg.norm(v1) + epsilon)
                    v2 /= (np.linalg.norm(v2) + epsilon)
                    angle_diff = np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0))
                    reward = np.exp(-angle_diff**2 / reward_sigma**2)  # Gaussian bump at 0

Essentially, we are attempting to maximize the dot product between the normalized heading vector (from the focal agent to the centroid of the swarm) and the normalized target vector (from the focal agent to the target landmark). This sort of works, but I think the reward needs more work.
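
To make the two reward forms easier to experiment with, here is a standalone sketch of the same computation. The function name `alignment_reward` and its signature are hypothetical; it takes plain 3-vectors rather than the arrays used in learner/CALA_control.py. The 'dot' form maps $\cos\theta$ from $[-1, 1]$ to $[0, 1]$, and the 'angle' form is a Gaussian bump centered at $\theta = 0$.

```python
import numpy as np


def alignment_reward(centroid, focal, target, form='dot', sigma=0.5, eps=1e-9):
    """Reward in [0, 1] for aligning (centroid - focal) with (target - focal)."""
    focal = np.asarray(focal, dtype=float)
    v1 = np.asarray(centroid, dtype=float) - focal   # heading vector
    v2 = np.asarray(target, dtype=float) - focal     # target vector

    # normalize; eps guards against zero-length vectors
    v1 = v1 / (np.linalg.norm(v1) + eps)
    v2 = v2 / (np.linalg.norm(v2) + eps)
    cos_angle = np.clip(np.dot(v1, v2), -1.0, 1.0)

    if form == 'dot':
        return (cos_angle + 1.0) / 2.0
    if form == 'angle':
        angle_diff = np.arccos(cos_angle)
        return np.exp(-angle_diff**2 / sigma**2)
    raise ValueError(f"unknown reward form: {form!r}")


origin = [0.0, 0.0, 0.0]
r_aligned = alignment_reward([1, 0, 0], origin, [2, 0, 0])    # close to 1
r_perp = alignment_reward([1, 0, 0], origin, [0, 3, 0])       # 0.5
r_opposite = alignment_reward([1, 0, 0], origin, [-1, 0, 0])  # close to 0
```

The 'dot' form still pays 0.5 when the swarm points 90 degrees off target, whereas the 'angle' form falls off much faster; `sigma` controls how sharply.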

Results so far

Available here