Artificial Intelligence (AI) and Machine Learning (ML) play integral roles in our lives.  In fact, many of you probably came across this blog post because of one of these systems.  AI is the idea that machines should be taught to do tasks (everything from running search engines to driving cars).  ML is an application of AI in which machines learn for themselves from available data.

ML is gaining popularity in the evaluation of job candidates because, given large enough datasets, the process can find small, but predictive, bits of data and maximize their use.  This idea of letting the data guide decisions is not new.  I/O psychologists used this kind of process when developing work/life inventories (biodata) and examining response patterns of test items (item response theory, or IRT).  The approaches have their advantages (being atheoretical, they are free from preconceptions) and problems (the number of participants needs to be very large so that results do not reflect peculiarities of the sample).  ML has accelerated the ideas behind both biodata and IRT, which I think has led to solutions that don’t generalize well.  But that’s for another blog post.

What is important here is the data made available and whether that data is biased.  For instance, if your hiring algorithm includes ZIP codes or a classification of college/university attended, it has race baked in.  This article has several examples of how ML systems get well trained on only the data that goes in, leading to all kinds of biases (and not just human ones).  So, if your company wants to avoid bias based on race, sex, and age, it needs to dig into each element the ML is looking at to see if it is a proxy for something else (for instance, many hobbies are sex-specific).  You then have to ask yourself whether the predictive value of that element is worth the bias it carries.
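
For readers who want to try this kind of proxy check, here is a minimal sketch in Python.  It is only one possible approach, not a complete audit: it measures how strongly a single categorical input is associated with a protected attribute using Cramér's V, where 0 means no association and 1 means the input fully reveals the attribute.  The function name cramers_v and the tiny hobby, shift, and sex lists are hypothetical and made up for illustration.

```python
from collections import Counter
from math import sqrt


def cramers_v(feature, protected):
    """Cramér's V between two categorical columns (0 = none, 1 = perfect)."""
    if len(feature) != len(protected):
        raise ValueError("columns must have the same length")
    n = len(feature)
    cell_counts = Counter(zip(feature, protected))  # observed joint counts
    row_totals = Counter(feature)
    col_totals = Counter(protected)
    smaller_dim = min(len(row_totals), len(col_totals))
    if n == 0 or smaller_dim < 2:
        return 0.0  # a constant column cannot act as a proxy
    # Chi-square statistic: compare observed counts with the counts
    # expected if the feature and the attribute were independent.
    chi2 = 0.0
    for f_value, f_count in row_totals.items():
        for p_value, p_count in col_totals.items():
            expected = f_count * p_count / n
            observed = cell_counts[(f_value, p_value)]
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (smaller_dim - 1)))


# Hypothetical, made-up candidate data for illustration only.
sex = ["F", "F", "M", "M"]
hobby = ["knitting", "knitting", "football", "football"]  # mirrors sex
shift = ["day", "night", "day", "night"]                  # unrelated to sex

print(cramers_v(hobby, sex))  # 1.0: hobby is a perfect proxy for sex here
print(cramers_v(shift, sex))  # 0.0: shift carries no information about sex
```

A high value does not by itself mean the element must be dropped, and real samples need to be far larger than four people.  It simply flags which elements deserve the "is the predictive value worth the bias?" question.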

Systemic bias in hiring is insidious and we need to hunt it down.  It is not enough to say, “We have a data-driven system” and presume that it is not discriminatory.  If the ML driving it was trained on data that carries inadvertent bias, it will perpetuate that bias.  We need to check the elements that go into these systems to ensure that they are valid and fair to candidates.

I’d like to thank Dennis Adsit for recommending the article from The Economist to me.