Categories
dev evolution Locomotion simulation

Evolution (Part 1)

Ok next we’re going to be looking at https://github.com/bulletphysics/bullet3/tree/master/examples/Evolution

According to some blog bit: https://gitmemory.com/issue/bulletphysics/bullet3/2118/468943185 there might be an issue: “The problem with this demo is the following: It can not optimize since the robots are not properly restored. Their physical settings are not properly restored and thus their performance varies. I think it should only serve as a bullet example that does some simulation. If you can improve it to restore the physics correctly, that is fine, but this might be hard to do if you are new to bullet. It would be best to reimplement it in pybullet, because it features deterministic restoring of robots.”

So will have to keep that in mind.

So starting in the NN3DWalkersTimeWarp code, we quickly run into ERP and CFM http://www.ode.org/ode-latest-userguide.html#sec_3_7_0 – These are error reduction parameter and constraint force mixing and help fix errors caused by joints that aren’t set up perfectly.

The code also loads a number of Constraint Solvers:

btSequentialImpulseConstraintSolver.h
btNNCGConstraintSolver.h
btMultiBodyConstraintSolver.h
btMultiBodyDynamicsWorld.h
btDantzigSolver.h
btSolveProjectedGaussSeidel.h
btLemkeSolver.h
btMLCPSolver.h

Hmm. Not sure what they do, but they enforce the joint constraints in the physics we’ve set up.

The TimeWarp class mostly deals with configuration settings for running the simulation at different speeds, and with different solvers, etc.

The setupBasicParamInterface() sets up the UI sliders. Then,

m_guiHelper->setUpAxis(1); // Set Y axis as Up axis

createEmptyDynamicsWorld(); // create an empty dynamic world

m_guiHelper->autogenerateGraphicsObjects(m_dynamicsWorld);

The Dynamics world is configured with

btDefaultCollisionConfiguration in a btCollisionDispatcher

a default btDbvtBroadphase (a sweeping style collision detection)

and whichever solver is being used.

The Dynamics world is initiated as a btDiscreteDynamicsWorld

There’s timeWarpSimulation which we override, and the stepSimulation which is in a loop called the ‘canonical game loop’. It updates any variables changed by the UI parameters, then calls our code, and then updates and time variables that need refreshing, and gives you a chance to update some graphics. It gets to the end of the loop, checks the time, to decide whether to render the graphics, or to do the loop again.

That’s sufficient for now, to move on with the CPP file code.

https://github.com/bulletphysics/bullet3/blob/master/examples/Evolution/NN3DWalkers.cpp

Another interesting project I saw now, using OGRE, which is some other physics engine, I think, is https://github.com/benelot/minemonics – has some evolved life too.

Jumper 1

But we are evolving something that looks more like these creepy rainbow spider bombs, that ‘we made’ in MotorDemo.cpp.

class NNWalker
{
	btDynamicsWorld* m_ownerWorld;
	btCollisionShape* m_shapes[BODYPART_COUNT];
	btRigidBody* m_bodies[BODYPART_COUNT];
	btTransform m_bodyRelativeTransforms[BODYPART_COUNT];
	btTypedConstraint* m_joints[JOINT_COUNT];
	btHashMap<btHashPtr, int> m_bodyTouchSensorIndexMap;
	bool m_touchSensors[BODYPART_COUNT];
	btScalar m_sensoryMotorWeights[BODYPART_COUNT * JOINT_COUNT];

	bool m_inEvaluation;
	btScalar m_evaluationTime;
	bool m_reaped;
	btVector3 m_startPosition;
	int m_index;

These above are the variables for the walker, which is similar to the MotorDemo critter. The code has been upgraded here and there since MotorDemo.cpp. It’s pretty great that people share their hard work so that we can just waltz on in and hack a quick robot together. This guy https://github.com/erwincoumans founded the Bullet project, and now works for Google Brain. (on Google Brain?)

So these are some of the public methods for the WalkersExample

	// Evaluation

	void update(const btScalar timeSinceLastTick);

	void updateEvaluations(const btScalar timeSinceLastTick);

	void scheduleEvaluations();

	void drawMarkings();



	// Reaper

	void rateEvaluations();

	void reap();

	void sow();

	void crossover(NNWalker* mother, NNWalker* father, NNWalker* offspring);

	void mutate(NNWalker* mutant, btScalar mutationRate);

	NNWalker* getRandomElite();

	NNWalker* getRandomNonElite();

	NNWalker* getNextReaped();

	void printWalkerConfigs();

Let’s just go through these for now.

We want to evolve behaviours based on stimulus, and the NNWalkers even have m_touchSensors, which we’ll have a look at later, to wee what they let us do.

Interlude: http://news.mit.edu/2015/algorithm-helps-robots-handle-uncertainty-0602 here says the following, and also has a video with sick drone rock music. This is a good part to remember, that either the robot is pausing, evaluating, pausing, evaluating, to accomplish the behaviours using ‘online’ feedback and offline policy processing.

There’s an offline planning phase where the agents can figure out a policy together that says, ‘If I take this set of actions, given that I’ve made these observations during online execution, and you take these other sets of actions, given that you’ve made these observations, then we can all agree that the whole set of actions that we take is pretty close to optimal,’” says Shayegan Omidshafiei, an MIT graduate student in aeronautics and astronautics and first author on the new paper. “There’s no point during the online phase where the agents stop and say, ‘This is my belief. This is your belief. Let’s come up with a consensus on the best overall belief and replan.’ Each one just does its own thing.

So the parts to do with Evaluation are:

// Evaluation

void update(const btScalar timeSinceLastTick);

void updateEvaluations(const btScalar timeSinceLastTick);

void scheduleEvaluations();

void drawMarkings();

And here’s the code for all the evolution stuff:

bool fitnessComparator(const NNWalker* a, const NNWalker* b)
{
return a->getFitness() > b->getFitness(); // sort walkers descending
}

So we are maximising fitness, and this is the Comparator function

The ratingEvaluations function orders by fitness, and prints the square root of the best performing individual’s distance fitness. Then we iterate through the walkers, update the time series canvas tick (next generation). Zero the times, and counters.

void NN3DWalkersExample::rateEvaluations()
 {
     m_walkersInPopulation.quickSort(fitnessComparator);  // Sort walkers by fitness
b3Printf("Best performing walker: %f meters", btSqrt(m_walkersInPopulation[0]->getDistanceFitness()));

for (int i = 0; i < NUM_WALKERS; i++)
{
    m_timeSeriesCanvas->insertDataAtCurrentTime(btSqrt(m_walkersInPopulation[i]->getDistanceFitness()), 0, true);
}
m_timeSeriesCanvas->nextTick();

for (int i = 0; i < NUM_WALKERS; i++)
{
    m_walkersInPopulation[i]->setEvaluationTime(0);
}
m_nextReaped = 0;

}

The reap function uses the REAP_QTY (0.3) to iterate backwards through the worst 30%, set them to reaped.

void NN3DWalkersExample::reap()
{
int reaped = 0;
for (int i = NUM_WALKERS - 1; i >= (NUM_WALKERS - 1) * (1 - REAP_QTY); i--)
{ // reap a certain percentage
m_walkersInPopulation[i]->setReaped(true);
reaped++;
b3Printf("%i Walker(s) reaped.", reaped);
}
}

getRandomElite and getRandomNonElite use SOW_ELITE_QTY (set to 0.2). The functions return one in the first 20% and the last 80%.

NNWalker* NN3DWalkersExample::getRandomElite()
{
return m_walkersInPopulation[((NUM_WALKERS - 1) * SOW_ELITE_QTY) * (rand() / RAND_MAX)];
}
NNWalker* NN3DWalkersExample::getRandomNonElite()
{
return m_walkersInPopulation[(NUM_WALKERS - 1) * SOW_ELITE_QTY + (NUM_WALKERS - 1) * (1.0f - SOW_ELITE_QTY) * (rand() / RAND_MAX)];
}

The getNextReaped() function checks if we’ve reaped the REAP_QTY percentage yet. If not, increment counter and return the next reaped individual.

NNWalker* NN3DWalkersExample::getNextReaped()
 {
     if ((NUM_WALKERS - 1) - m_nextReaped >= (NUM_WALKERS - 1) * (1 - REAP_QTY))
     {
         m_nextReaped++;
     }

if (m_walkersInPopulation[(NUM_WALKERS - 1) - m_nextReaped + 1]->isReaped())
 {
     return m_walkersInPopulation[(NUM_WALKERS - 1) - m_nextReaped + 1];
 }
 else
 {
     return NULL;  // we asked for too many
 }
}

Next is sow()…

SOW_CROSSOVER_QTY is 0.2,
SOW_ELITE_PARTNER is 0.8.
SOW_MUTATION_QTY is 0.5.
MUTATION_RATE is 0.5.
SOW_ELITE_QTY is 0.2.
So we iterate over 20% of the population, increase sow count, get random elite mother, and 80% of the time, a random elite father (20% non-elite). Grab a reaped individual, and make it into a crossover of the parents.

Then mutations are performed on the population from 20% to 70%, passing in some scalar to the mutate().

Finally, the REAP_QTY – SOW_CROSSOVER_QTY is 10%, who are sown as randomized motor weight individuals.

Ok…

void NN3DWalkersExample::sow()

{

 int sow = 0;

 for (int i = 0; i < NUM_WALKERS * (SOW_CROSSOVER_QTY); i++)

 {  // create number of new crossover creatures

 sow++;

 b3Printf("%i Walker(s) sown.", sow);

 NNWalker* mother = getRandomElite();                                                                  // Get elite partner (mother)

 NNWalker* father = (SOW_ELITE_PARTNER < rand() / RAND_MAX) ? getRandomElite() : getRandomNonElite();  //Get elite or random partner (father)

 NNWalker* offspring = getNextReaped();

 crossover(mother, father, offspring);

 }

 

 for (int i = NUM_WALKERS * SOW_ELITE_QTY; i < NUM_WALKERS * (SOW_ELITE_QTY + SOW_MUTATION_QTY); i++)

 {  // create mutants

 mutate(m_walkersInPopulation[i], btScalar(MUTATION_RATE / (NUM_WALKERS * SOW_MUTATION_QTY) * (i - NUM_WALKERS * SOW_ELITE_QTY)));

 }

 

 for (int i = 0; i < (NUM_WALKERS - 1) * (REAP_QTY - SOW_CROSSOVER_QTY); i++)

 {

 sow++;

 b3Printf("%i Walker(s) sown.", sow);

 NNWalker* reaped = getNextReaped();

 reaped->setReaped(false);

 reaped->randomizeSensoryMotorWeights();

 }

}


Crossover goes through the joints, and sets half of the motor weights to the mom and half to the dad, randomly.

void NN3DWalkersExample::crossover(NNWalker* mother, NNWalker* father, NNWalker* child)
{
	for (int i = 0; i < BODYPART_COUNT * JOINT_COUNT; i++)
	{
		btScalar random = ((double)rand() / (RAND_MAX));

		if (random >= 0.5f)
		{
			child->getSensoryMotorWeights()[i] = mother->getSensoryMotorWeights()[i];
		}
		else
		{
			child->getSensoryMotorWeights()[i] = father->getSensoryMotorWeights()[i];
		}
	}
}

mutate() takes a mutation rate, and randomizes motor weights at that rate.


void NN3DWalkersExample::mutate(NNWalker* mutant, btScalar mutationRate)
{
	for (int i = 0; i < BODYPART_COUNT * JOINT_COUNT; i++)
	{
		btScalar random = ((double)rand() / (RAND_MAX));

		if (random >= mutationRate)
		{
			mutant->getSensoryMotorWeights()[i] = ((double)rand() / (RAND_MAX)) * 2.0f - 1.0f;
		}
	}
}

Ok good enough.

Let’s move on to the evaluation code…