Genetic Programming with Hadoop
Dan Rosanova
In addition to being an excellent Big Data tool, Hadoop is a powerful platform for distributed computing and lends itself well to Monte Carlo simulations. This class will focus on applying the concepts of genetic algorithms and genetic programming in the Hadoop environment. Concepts, tools, and strategies will be discussed in general form, including program representation, population creation, cross-breeding, and fitness selection. Specific applications for both graph routing and trading algorithms will be detailed and demonstrated running in a Hadoop cluster.

Genetic programming is a branch of evolutionary computation in which programs, in effect, write themselves. Candidate programs converge on a solution by following biological principles: the characteristics of successful programs are propagated and cross-bred from one generation to the next. Genetic programs have been widely used in scientific and engineering applications and are beginning to find their way into widespread computing applications, including routing, pricing, trading strategies, and vision systems.
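As a rough illustration of the cycle described above (program representation, population creation, cross-breeding, and fitness selection), here is a minimal single-machine sketch in Python. It is not the material presented in the class and does not use Hadoop. Every name in it (`random_tree`, `crossover`, `evolve`, the operator set, the depth limit) is a hypothetical choice made for this example, and it assumes programs are small arithmetic expression trees scored against a target function.

```python
import operator
import random

# Assumed primitive set for this sketch: three binary operators, three terminals.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0]
MAX_DEPTH = 5  # hypothetical cap that keeps offspring from growing without bound


def random_tree(rng, depth=3):
    """Program representation: a terminal, or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Run a program tree on the input x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def fitness(tree, target, samples):
    """Sum of squared errors against the target; lower is better."""
    total = 0.0
    for x in samples:
        err = evaluate(tree, x) - target(x)
        total += err * err
    return total


def depth_of(tree):
    if isinstance(tree, tuple):
        return 1 + max(depth_of(tree[1]), depth_of(tree[2]))
    return 0


def paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))


def subtree_at(tree, path):
    for index in path:
        tree = tree[index]
    return tree


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(rng, a, b):
    """Cross-breeding: graft a random subtree of b onto a random point in a."""
    child = replace_at(a, rng.choice(list(paths(a))),
                       subtree_at(b, rng.choice(list(paths(b)))))
    return child if depth_of(child) <= MAX_DEPTH else a


def evolve(target, generations=20, size=50, seed=0):
    """Create a population, then repeat fitness selection and cross-breeding."""
    rng = random.Random(seed)
    samples = [i / 2 for i in range(-4, 5)]
    population = [random_tree(rng) for _ in range(size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, target, samples))
        survivors = population[: size // 2]  # fitness selection (elitist)
        children = [crossover(rng, rng.choice(survivors), rng.choice(survivors))
                    for _ in range(size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda t: fitness(t, target, samples))
    return best, fitness(best, target, samples)


if __name__ == "__main__":
    program, error = evolve(lambda x: x * x + x)
    print(program, error)
```

In this sketch each call to `fitness` is independent of the others, which is the kind of work that could plausibly be spread across a cluster; how the class actually maps these steps onto Hadoop is not shown here.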

Hadoop provides the opportunity to scale genetic programming into more complex areas by allowing the developer to focus on the problem space and not on the intricacies of distributed computing. This class will introduce genetic programming and show two applications: a simple ant/search program and a currency trading model. The genetic representations of these programs and their runtime experience on Hadoop will be examined and compared to other approaches such as MPI, HPC, and GPU.

Level: Advanced