I am trying to understand the ROC curve. Say you have N0 samples, of which N+ are diseased and N- are healthy. You perform a lab test and find that some test positive (flagged as diseased) and some test negative.
You can make a 2x2 table.
| Test | Diseased (+) | Normal (-) |
|---|---|---|
| Test + | TP | FP |
| Test - | FN | TN |
But a laboratory test relies on some cutoff to declare a result positive or negative, and using a different cutoff changes the counts TP, FP, FN, and TN.
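To make the cutoff's role concrete, here is a minimal sketch that counts TP, FP, FN, and TN at a single cutoff. The test values, disease labels, and the cutoff of 3.0 are made-up numbers for illustration:

```python
# Counting the four cells of the 2x2 table at one cutoff.
# All data below are hypothetical, purely for illustration.

test_values = [2.1, 3.8, 1.4, 4.6, 2.9, 0.8, 3.2, 1.9]          # lab measurements
diseased    = [False, True, False, True, True, False, True, False]

cutoff = 3.0  # declare "test positive" when the value exceeds the cutoff

TP = sum(v > cutoff and d      for v, d in zip(test_values, diseased))
FP = sum(v > cutoff and not d  for v, d in zip(test_values, diseased))
FN = sum(v <= cutoff and d     for v, d in zip(test_values, diseased))
TN = sum(v <= cutoff and not d for v, d in zip(test_values, diseased))

print(TP, FP, FN, TN)  # 3 0 1 4 for this cutoff; a different cutoff shifts the counts
```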
Clinical sensitivity is defined as the fraction of true positives among all samples with the disease. In other words, sensitivity is the true positive rate, which is TP/(TP+FN). You want 100% sensitivity, which happens exactly when FN = 0.
Specificity is the fraction of true negatives among all samples without the disease, so it is defined as TN/(FP+TN). Again, a value of 1 is highly desirable.
1 - specificity = 1 - TN/(FP+TN) = FP/(FP+TN), which is the false positive rate.
The denominators in the two cases are the group totals: TP+FN is the number of diseased samples, and FP+TN is the number of undiseased samples.
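As a quick sanity check of these definitions, here is a short sketch using hypothetical counts (not real data):

```python
# Sensitivity, specificity, and 1 - specificity from hypothetical counts.
TP, FP, FN, TN = 45, 5, 10, 40

sensitivity = TP / (TP + FN)   # true positive rate, 45/55 ~ 0.818
specificity = TN / (FP + TN)   # true negative rate, 40/45 ~ 0.889
fpr = FP / (FP + TN)           # false positive rate, 5/45 ~ 0.111

# 1 - specificity equals FP/(FP+TN), as derived above
assert abs((1 - specificity) - fpr) < 1e-12
```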
The ROC curve is constructed by sweeping the cutoff. At each cutoff value you get a pair of values (true positive rate, false positive rate); the curve plots the true positive rate on the y-axis against the false positive rate on the x-axis. When you set the cutoff very high, very few samples are declared positive, so TP is small (and among the undiseased samples, nearly all values fall below the cutoff, so they are all declared negative). You therefore have a very low true positive rate, and FP is also very low. As you lower the cutoff, more diseased samples are classified as TP, so the true positive rate increases; at the same time, false positives start to come in, so the false positive rate rises too.
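Here is a sketch of that sweep, assuming made-up scores and labels; it records one (false positive rate, true positive rate) pair per cutoff:

```python
# Tracing an ROC curve: sweep the cutoff from high to low and record
# (FPR, TPR) at each step. Scores and labels are hypothetical.

scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    1,    0,    1,    0,    0,    0]  # 1 = diseased

n_pos = sum(labels)            # TP + FN: total diseased
n_neg = len(labels) - n_pos    # FP + TN: total undiseased

roc_points = []
for cutoff in sorted(set(scores), reverse=True):  # highest cutoff first
    tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
    roc_points.append((fp / n_neg, tp / n_pos))   # (FPR, TPR) pair

print(roc_points)  # starts near (0, 0) at a high cutoff and ends at (1, 1)
```

Lowering the cutoff can only add newly declared positives, so both coordinates move monotonically from the bottom-left toward the top-right of the plot.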
An ideal ROC curve rises as a vertical line straight up the y-axis: 100% sensitivity (all diseased samples declared as TP) at a zero false positive rate.