Often, we have to choose one out of two options based on statistics.
But many times, statistical data has hidden lurking variables that have not been considered.
Imagine you need to choose between 2 hospitals for surgery.
Out of each hospital's last 1000 patients, 900 survived at Hospital A, while only 500 survived at Hospital B. (This is aggregated data)
So, it would seem like Hospital A is the better choice. Let’s find out!
Do all the patients arrive at the hospital with the same level of health? No.
If we divide each hospital's last 1000 patients into those who arrived in good health and those who arrived in poor health, our decision may start to change.
Suppose most patients at Hospital A arrived in good health, yet it still lost 100 of them, whereas most patients at Hospital B arrived in poor health, and it still saved 500 lives!
So, the condition of the patient on arrival is the lurking variable in this situation.
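
To make the split concrete, here is a small Python sketch. Only the totals (900 of 1000 survived at Hospital A, 500 of 1000 at Hospital B) come from the example above; the per-group counts are invented purely for illustration, chosen so that they add up to those totals.

```python
# Hypothetical breakdown by health on arrival.
# Only the totals (900/1000 and 500/1000) come from the example;
# the per-group counts below are made up for illustration.
data = {
    # condition: (survived, patients)
    "Hospital A": {"good": (880, 950), "poor": (20, 50)},
    "Hospital B": {"good": (98, 100), "poor": (402, 900)},
}


def survival_rate(survived, patients):
    """Fraction of patients who survived."""
    return survived / patients


def aggregate_rate(groups):
    """Survival rate after lumping all health groups together."""
    survived = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return survival_rate(survived, patients)


for hospital, groups in data.items():
    print(f"{hospital}: overall {aggregate_rate(groups):.1%}")
    for condition, (survived, patients) in groups.items():
        rate = survival_rate(survived, patients)
        print(f"  {condition} health: {rate:.1%} ({survived}/{patients})")
```

With these made-up numbers, Hospital A still looks far better overall, yet Hospital B has the higher survival rate in both the good-health and the poor-health group. The overall figure mostly reflects which kind of patient each hospital receives.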
Now you might consider Hospital B the better option. This kind of reversal is known as Simpson's Paradox.
It often occurs when we combine data in a way that hides a conditioning variable, sometimes known as a lurking variable, i.e. a hidden additional factor that significantly influences the results.
So how do we avoid falling for the paradox?
Unfortunately, there's no single answer. All we can do is carefully examine the actual situation and the stats, along with any lurking variables that may exist.
Otherwise, we’ll leave ourselves vulnerable to those who would manipulate data to promote their own agendas.