Intuition out of counter-intuition
刘未鹏
C++的罗浮宫(http://blog.csdn.net/pongba)

Lately I stumbled across an interesting article about Bayes' Theorem (you can find it here - please read it first, as it's a pretty enjoyable read; otherwise you might not know what I'm talking about). The article is entitled "It's not so easy to predict murder, do the math".

As interesting as the writing is, the most intriguing part of it is the application of Bayes' Theorem to calculate the chance of a correct guess when predicting whether someone is a potential murderer.

It turns out that, when the percentage of people who are potential murderers becomes low, the odds that the psychiatrist makes a wrong guess about whether someone is a potential murderer become high.

For instance, let's assume that the "sensitivity" of a murder-prediction made by a psychiatrist is 99.9% and the "specificity" is 99.99%; then for, say, 10,000 people, of whom 150 are potential murderers, the chance that someone, when predicted to be a murderer, actually is one is about 150/151, which is pretty high.
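A quick back-of-the-envelope check of that figure, using the numbers just stated: of the 150 actual potential murderers, about 150*0.999 ~= 149.85 get flagged, while of the other 9,850 people about 9,850*0.0001 ~= 0.985 get wrongly flagged, so the chance that a flagged person is the real thing is roughly 149.85/(149.85+0.985) ~= 150/151.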

But then it dives into the counter-intuitive part.

Now assume only 1 out of 10,000 people is actually a potential murderer; then the chance that a predicted murderer is actually a potential one drops to 1/2 (50%), which means there is a pretty high chance that the psychiatrist is wrong.

However, the point is, when someone says that something is counter-intuitive, there's a good chance that we can find something underneath that is intuitive again. That is, the reason people call it counter-intuitive is often just that they get something wrong along the way, which eventually leads to the apparent counter-intuition.

In this particular case, the confusion is rooted in the understanding of the specificity/sensitivity of a murder-prediction. As stated, the specificity and the sensitivity are both extremely high. This can give us the false belief that the prediction is a highly accurate one, which, because of the vagueness of natural language, can in turn lead us to the belief that, whatever the context is, a predicted murderer is, with probability 99.9%, an actual potential murderer. And therein lies the problem.

Let's recap the definitions of "specificity" and "sensitivity": when we say that the sensitivity of a prediction is 99.9%, that means that if one actually is something, then there's a pretty high chance (99.9%) that he/she is predicted as that something. Similarly, a specificity of 99.9% means that when one isn't that something, there's a pretty high chance (99.9%) that he/she isn't predicted as it. In mathematical terms: P(Pred(A)|A) = 99.9% and P(~Pred(A)|~A) = 99.9%.
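Concretely (just to illustrate the two definitions with round, hypothetical numbers): if 1,000 people really are potential murderers, a 99.9%-sensitive test flags about 999 of them; if 1,000,000 people are not, a 99.9%-specific test clears about 999,000 of them - but also wrongly flags about 1,000, already as many as the actual potential murderers. Neither figure, by itself, says anything about P(A|Pred(A)).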

Now recall that we thought of this differently: we actually thought that "a predicted murderer is, with probability 99.9%, an actual potential murderer". In mathematical terms, that is P(A|Pred(A)) = 99.9%, which is the reverse of the conditional in the definition of sensitivity. And this is exactly the source of all the counter-intuition.

Once we've captured the essence of the definitions of "specificity" and "sensitivity" (note that, when the two happen to be equal, they are sometimes referred to collectively as the "accuracy" of the test), the remaining job is easy - we just need to apply Bayes' Theorem mechanically:

Let A = "one is a potential murderer"; Pred(A) = "one is predicted to be a potential murderer".

preconditions (taking both the sensitivity and the specificity to be 99.9% for now):
P(Pred(A)|A) = 99.9%; P(~Pred(A)|A) = 0.1%;
P(~Pred(A)|~A) = 99.9%; P(Pred(A)|~A) = 0.1%;

Bayes' Theorem application:
P(A|Pred(A)) = (P(Pred(A)|A)*P(A))/P(Pred(A));
where P(Pred(A)) = P(A)*P(Pred(A)|A) + P(~A)*P(Pred(A)|~A).

Now assume we have 10,000 people taking the test, only one of whom is actually a potential murderer.
Then we'd have P(A) = 0.0001 and P(~A) = 1-P(A) = 0.9999. Plugging them into the equation above, we have:

P(A|Pred(A)) = (0.999*0.0001)/(0.0001*0.999+0.9999*0.001) ~= 1/10.
This implies that, if one is predicted to be a potential murderer, there's only about a 1/10 probability that he/she actually is one. Pretty embarrassing result, isn't it?

And if we adjust P(~Pred(A)|~A) - the specificity - to 99.99%, which is the original setting of the article in question, this becomes:
P(A|Pred(A)) = (0.999*0.0001)/(0.0001*0.999+0.9999*0.0001) ~= 1/2,
which is still pretty rough.
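If you'd like to verify the arithmetic mechanically, here is a minimal C++ sketch of the same calculation (the posterior() helper and its parameter names are mine, made up just for illustration; the numbers are the ones used above):

#include <cstdio>

// Bayes' Theorem: P(A|Pred(A)) = P(Pred(A)|A)*P(A) / P(Pred(A)),
// where P(Pred(A)) = P(A)*P(Pred(A)|A) + P(~A)*P(Pred(A)|~A).
double posterior(double base_rate, double sensitivity, double specificity) {
    double true_pos  = sensitivity * base_rate;                   // P(Pred(A)|A) * P(A)
    double false_pos = (1.0 - specificity) * (1.0 - base_rate);   // P(Pred(A)|~A) * P(~A)
    return true_pos / (true_pos + false_pos);
}

int main() {
    const double base_rate = 0.0001;  // 1 out of 10,000 actually is a potential murderer
    std::printf("specificity 99.9%%:  P(A|Pred(A)) = %.4f\n",
                posterior(base_rate, 0.999, 0.999));    // ~0.09, i.e. roughly 1/10
    std::printf("specificity 99.99%%: P(A|Pred(A)) = %.4f\n",
                posterior(base_rate, 0.999, 0.9999));   // ~0.50, i.e. roughly 1/2
    return 0;
}

Running it reproduces the ~1/10 and ~1/2 figures, and makes it easy to play with the base rate and the number of nines.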

As it turns out, when the percentage of people who actually are potential murderers becomes very low, the specificity becomes critical and practically dominates the result. That's why, in scenarios where the samples satisfying a particular condition are rare, the specificity of the test is extremely important; simply put, when the sample set is large and the percentage of target samples is very low, one more (or fewer) '9' at the tail of the specificity changes the result dramatically.
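To see how dramatically, keep the base rate at 1/10,000 and the sensitivity at 99.9% as above: a specificity of 99.9% gives a posterior of about 1/10, 99.99% gives about 1/2, and 99.999% would push it up to roughly 9/10.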

A related example comes from data mining, where you may construct a predictor/classifier to predict whether someone has cancer. Because the percentage of patients who actually have cancer is tiny, a seemingly high sensitivity or specificity isn't enough: the classifier may handle those who don't have cancer correctly with very high accuracy, but as long as even a handful of predictions concerning the patients who do have cancer go wrong, the result would be bad. Hence, in those situations, other measures are often used as supplements - for instance P(A|Pred(A)) itself, which data miners call the precision of the classifier (the sensitivity being its recall).

But, you may ask, why isn't the accuracy of a prediction defined as P(A|Pred(A)) in the first place? That way we'd never have to do such tedious calculations. The reason is actually a simple one: in a general population the condition A is usually rare (for instance, how many people in a random group actually have AIDS?), so the set of predicted positives, Pred(A), is far too small to draw a reasonably accurate estimate of P(A|Pred(A)) from. A sample of known A's (and known ~A's), on the other hand, can be made large enough to draw reasonably accurate approximations of P(Pred(A)|A) and P(~Pred(A)|~A) from.

Another way to look at this issue:

Consider Bayes' Theorem:

P(A|B) = (P(B|A)*P(A))/P(B).

Let's rewrite it a little bit:

P(A|B) = P(B|A)* (P(A)/P(B)).

This way we can see clearly that P(A|B) is proportional to P(B|A), provided that P(A) and P(B) are fixed. An immediate conclusion is that the higher (or lower) the sensitivity of the murder-prediction is, the higher (or lower) the probability that one actually is a potential murderer when predicted to be one, and vice versa. This actually conforms to our intuition: the higher the probability that B occurs when A occurs, the higher the probability that A occurs when B occurs.

The tricky part, though, lies in the proportionality factor (i.e. P(A)/P(B)).

Let's still take murder-prediction as our example: A would mean "one actually is a potential murderer" and B "one is predicted to be a potential murderer" (i.e. Pred(A)). If we're concerned about the precise number, we must take into account the proportion of actual potential murderers as opposed to that of those who are not. Given the original setting (i.e. 1 out of 10,000 people is actually a potential murderer), we can readily calculate P(A)/P(B), and that factor is what drives the final result.
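To put numbers on that factor, using the 99.9%/99.9% figures from the calculation above: P(B) = 0.0001*0.999 + 0.9999*0.001 ~= 0.0011, so P(A)/P(B) ~= 0.09, and therefore P(A|B) = 0.999 * 0.09 ~= 1/10 - the same answer as before, now read as a very high sensitivity scaled down by a very small proportionality factor.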

This would be even clearer with a little Venn diagram, but I lack the time and patience to draw one, so you may draw it and see for yourself.
