Question

How can one select the proper number of parameters for a genetic algorithm to model a given system?

For example, say you want to optimize car production, and you have 1,000 measurements of hourly efficiency at various tasks for each of 1,000 different employees. So, you have 1,000,000 data points. Most of these are likely to be only weakly correlated with the overall efficiency of your factory, but not so weakly that you can dismiss them as irrelevant with statistical confidence. How do you go about picking inputs for your GA so that you don't end up with 1,000,000+ degrees of freedom, resulting in very slow convergence or no convergence at all?

Specifically, what are the algorithms one could use to pre-select or selectively eliminate features?

One approach I have used myself in this scenario is to evolve the parameter selection itself, so I might have parents like {a,b,c}, {b,d,e,q,x,y,z}, and so on. I would then mutate the children to add or drop features. This works well for a few dozen features, but it becomes inefficient when there are many degrees of freedom. With n candidate features there are 2^n possible subsets (in the example above, 2^1,000,000), which makes some pre-filtering of features critical to get any kind of useful performance.
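A minimal sketch of the subset-evolution approach described above. Each individual is a frozenset of feature indices, and mutation adds or drops one feature, as in the question. The uniform crossover, tournament selection, elitism, and the toy fitness function (`toy_fitness`, which rewards a hypothetical set `USEFUL` and penalizes subset size) are assumptions made for illustration; in practice the fitness would come from evaluating your own model on the selected inputs. All function names are hypothetical.

```python
import random


def mutate(subset, n_features, rng, p_add=0.3, p_drop=0.3):
    """Randomly add one unused feature and/or drop one existing feature (never emptying the subset)."""
    child = set(subset)
    if rng.random() < p_add and len(child) < n_features:
        while True:  # rejection sampling; cheap while subsets are small relative to n_features
            f = rng.randrange(n_features)
            if f not in child:
                child.add(f)
                break
    if rng.random() < p_drop and len(child) > 1:
        child.remove(rng.choice(sorted(child)))
    return frozenset(child)


def crossover(a, b, rng):
    """Uniform crossover on sets: keep shared features, take each unshared one with probability 0.5."""
    child = set(a & b)
    for f in sorted(a ^ b):  # sorted so a seeded run is reproducible
        if rng.random() < 0.5:
            child.add(f)
    if not child:
        child.add(rng.choice(sorted(a | b)))
    return frozenset(child)


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def evolve(fitness, n_features, pop_size=50, generations=200, init_size=3, seed=0):
    """Evolve feature subsets; `fitness` maps a frozenset of indices to a score (higher is better)."""
    rng = random.Random(seed)
    pop = [frozenset(rng.sample(range(n_features), init_size)) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(s) for s in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        new_pop = [elite]  # elitism: the best subset always survives unchanged
        while len(new_pop) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            new_pop.append(mutate(crossover(a, b, rng), n_features, rng))
        pop = new_pop
    return max(pop, key=fitness)


USEFUL = frozenset({3, 17, 42})  # hypothetical "relevant" features, for the demo only


def toy_fitness(subset):
    # Stand-in for a real model evaluation: reward relevant features, penalize subset size.
    return len(subset & USEFUL) - 0.1 * len(subset)


if __name__ == "__main__":
    best = evolve(toy_fitness, n_features=50)
    print(sorted(best), toy_fitness(best))
```

Storing individuals as sparse index sets keeps each one small even when n_features is 1,000,000, but it does nothing to shrink the number of possible subsets the GA has to search, which is the motivation for pre-filtering.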

Explanation / Answer

I have never done this before and obviously don't have access to your data, but a potentially good way to do this would be through clustering. For each employee, we have an n-dimensional vector, where each dimension corresponds to a different task. We can then use clustering to group "similar" employees together. However, how well this works depends entirely on your data: with only 1,000 employees, it's quite possible that clustering will yield groups of employees that aren't really all that related. So while we may reduce the number of employees (and therefore inputs) the GA has to deal with, it may come at the expense of information loss.
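The answer does not name a specific clustering algorithm; the sketch below assumes plain k-means (Lloyd's algorithm) with squared Euclidean distance, which is only one option. Each employee is a vector of per-task efficiencies, and each cluster centroid acts as one "typical employee" profile, so a GA could work with k profiles instead of 1,000 employees. The function names and the random demo data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means. points: one list of per-task efficiencies per employee."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialize from k random employees
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Update step: move the centroid to the mean of its members, task by task.
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: 1,000 employees x 10 tasks (fewer tasks than the question, for speed).
    employees = [[rng.random() for _ in range(10)] for _ in range(1000)]
    labels, centroids = kmeans(employees, k=5)
    print(len(centroids), "cluster profiles replace", len(employees), "employee rows")
```

The choice of k is where the information-loss trade-off mentioned above shows up: a small k gives the GA few inputs but may lump together employees who are not really alike.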