Lab 8: Evolving a Detection Predicate Using Genetic Programming (GP)

Your goal is to use Genetic Programming (GP) to evolve a detection predicate with low false positive and low false negative rates for the Trojan Java application from the sixth lab. It would be interesting to see whether an algorithm can come up with a better detection predicate than the one created by a human in the fifth lab.

Write a GP-based tool that evolves a detection predicate suitable for finding elusive malware features while keeping false positives low. The predicate evolved by your GP will likely be more complicated than one that simply conjoins the sensor ranges. Ideally, it will have fewer false positives and false negatives than the detection predicate that was created for the fifth lab.
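
To make the difference between the two kinds of predicate concrete, here is a minimal sketch of one possible representation: a Boolean expression tree whose leaves are sensor-range tests. It is an illustration under assumptions, not the required design. The sensor names (cpu, net, disk), the thresholds, and the class names are all hypothetical and do not come from the earlier labs.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical sample format: sensor name -> one reading.
Sample = Dict[str, float]


@dataclass(frozen=True)
class InRange:
    """Leaf node: true when the sensor reading lies in [low, high]."""
    sensor: str
    low: float
    high: float

    def evaluate(self, sample: Sample) -> bool:
        return self.low <= sample[self.sensor] <= self.high


@dataclass(frozen=True)
class Not:
    child: "Node"

    def evaluate(self, sample: Sample) -> bool:
        return not self.child.evaluate(sample)


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"

    def evaluate(self, sample: Sample) -> bool:
        return self.left.evaluate(sample) and self.right.evaluate(sample)


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"

    def evaluate(self, sample: Sample) -> bool:
        return self.left.evaluate(sample) or self.right.evaluate(sample)


Node = Union[InRange, Not, And, Or]


def size(node: Node) -> int:
    """Count the nodes in a tree (useful later for penalizing bloat)."""
    if isinstance(node, InRange):
        return 1
    if isinstance(node, Not):
        return 1 + size(node.child)
    return 1 + size(node.left) + size(node.right)


# A hand-written style predicate: a plain conjunction of sensor ranges.
baseline = And(InRange("cpu", 0.6, 1.0), InRange("net", 0.2, 0.9))

# The kind of richer tree a GP run might produce (made-up values).
evolved = Or(
    baseline,
    And(InRange("disk", 0.8, 1.0), Not(InRange("net", 0.0, 0.1))),
)
```

With this representation, crossover can swap subtrees between two parents and mutation can replace a subtree or nudge the low and high bounds of a leaf. Other encodings, such as linear or stack-based programs, are equally valid choices.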

This idea has never been tried before, so if your results are good, you should be able to publish a research paper on this technique. If the GP approach does not work well, you can try something else. Remember, this is research. Please consult the GP Tutorial and GP Resources websites that can be accessed from the course website. You might want to take advantage of the substantial computational resources of the CS TUX cluster to evolve your solution. Remember that GP is embarrassingly parallel: the fitness of each individual in a population can be evaluated independently of the others.
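
As a rough illustration of why GP parallelizes so easily, the sketch below scores a population either serially or across worker processes on a single machine. The function name evaluate_population and its parameters are hypothetical. Spreading the work over several cluster nodes would need a job scheduler or a distributed library, which this sketch does not cover.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Individual = TypeVar("Individual")


def evaluate_population(
    population: Sequence[Individual],
    fitness_fn: Callable[[Individual], float],
    workers: int = 1,
    chunksize: int = 8,
) -> List[float]:
    """Return one fitness score per individual, in population order.

    Each score depends only on its own individual, so the calls can be
    farmed out to separate processes without any coordination.
    """
    if workers <= 1:
        # Serial fallback: handy for debugging and for small populations.
        return [fitness_fn(individual) for individual in population]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order; chunksize batches the work to
        # reduce inter-process communication overhead.
        return list(pool.map(fitness_fn, population, chunksize=chunksize))
```

When workers is greater than 1, fitness_fn and the individuals must be picklable (for example, a module-level function rather than a lambda), and on platforms that spawn new interpreter processes the call should sit under an `if __name__ == "__main__":` guard.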

The key is to design a good fitness function that can yield a good result in a reasonable amount of time. You are encouraged to experiment with several fitness functions until you find one that works well. The rest of the GP design is straightforward, but you can try different options. For example, the Boolean operators of the evolved predicate can be limited to just NOT and AND instead of a full set of Boolean operators, or the sensor values can be normalized across all sensors, which may make crossover and mutation easier.
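
As a starting point for that experimentation, here is one candidate fitness function together with a min-max normalization helper. It is a sketch under assumptions rather than a recommended answer: the weights, the size penalty, the labeled-sample format, and all function names are hypothetical, and a predicate is modeled simply as a callable from a sample to a Boolean.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, float]              # hypothetical: sensor name -> reading
Predicate = Callable[[Sample], bool]   # True means "flag as malicious"
Labeled = Sequence[Tuple[Sample, bool]]  # (sample, is_malicious) pairs


def error_rates(predicate: Predicate, labeled: Labeled) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) on labeled data."""
    fp = fn = positives = negatives = 0
    for sample, is_malicious in labeled:
        flagged = predicate(sample)
        if is_malicious:
            positives += 1
            if not flagged:
                fn += 1
        else:
            negatives += 1
            if flagged:
                fp += 1
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def fitness(
    predicate: Predicate,
    labeled: Labeled,
    size: int = 0,
    fp_weight: float = 1.0,
    fn_weight: float = 1.0,
    size_penalty: float = 0.001,
) -> float:
    """Higher is better: 1.0 means no errors and no size penalty.

    The weights trade off the two error types; the size term discourages
    bloated predicates. All three knobs are assumptions to be tuned.
    """
    fpr, fnr = error_rates(predicate, labeled)
    weighted_error = (fp_weight * fpr + fn_weight * fnr) / (fp_weight + fn_weight)
    return 1.0 - weighted_error - size_penalty * size


def normalize(samples: Sequence[Sample]) -> List[Sample]:
    """Min-max scale every sensor to [0, 1] so thresholds share one range."""
    if not samples:
        return []
    names = list(samples[0])
    lows = {n: min(s[n] for s in samples) for n in names}
    highs = {n: max(s[n] for s in samples) for n in names}
    scaled = []
    for s in samples:
        scaled.append({
            # A constant sensor carries no information; map it to 0.0.
            n: (s[n] - lows[n]) / (highs[n] - lows[n]) if highs[n] > lows[n] else 0.0
            for n in names
        })
    return scaled
```

Using rates rather than raw counts keeps the score meaningful when benign and malicious samples are unevenly represented. If sensor values are normalized for training, the same scaling bounds must be applied to any data the evolved predicate is later tested on.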