# Research reports and the lingo

I have some experience with research reports, both writing and reading.  In this forum, we come across many and are trying to make sense of them.  One of the best tips I ever got about research was to think of it in the following four steps.

1. Practical problem (does X affect Y?)

2. Statistical problem (does a sample population show a relationship between X and Y?)

3. Statistical solution (what is the correlation between X and Y?)

4. Practical solution (eliminate X due to its relationship with Y).

A couple of key thoughts on the generic research report.  Most of the research that I have seen on PC is correlational research, which asks whether there is a relationship between two variables.  There are two VERY important things to consider when looking at correlation results: the sample population and the significance of the relationship.

Each correlation study has a sample group of men (1,000, 2,000, etc.), and the researcher hopes this sample represents the population. The gold standard of research is a “pure random sample” of participants. Most samples are not pure, and some are better than others.  Age, comorbidity, the region where participants live, ethnicity, etc. all affect how well the sample represents the population.  Remember that in the end the statistical solution refers to the larger population: we make an inference from the sample about the variable in question in the population.
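To show why the makeup of the sample matters, here is a minimal Python sketch. All the numbers are invented for illustration (two age groups, made-up outcome rates, and a sample that over-represents older men); none of them come from a real study.

```python
# Hypothetical numbers, chosen only to illustrate the idea.
# Share of each age group in the full population, and the outcome rate per group.
population_share = {"under_65": 0.50, "65_and_over": 0.50}
outcome_rate = {"under_65": 0.10, "65_and_over": 0.30}

# A hypothetical sample that over-represents older men.
sample_share = {"under_65": 0.20, "65_and_over": 0.80}


def overall_rate(share, rate):
    """Weighted average of the group rates, using each group's share as the weight."""
    return sum(share[group] * rate[group] for group in share)


true_rate = overall_rate(population_share, outcome_rate)
sample_estimate = overall_rate(sample_share, outcome_rate)

print(f"Rate in the population: {true_rate:.0%}")
print(f"Rate seen in the skewed sample: {sample_estimate:.0%}")
```

The rate inside each age group is identical in both cases. Only the mix of ages differs, and that alone shifts the overall estimate.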

The second critical factor is the significance of the relationship.  A perfect correlation is 1 (or -1 for a perfect inverse relationship); no correlation is 0.  See https://onlinecourses.science.psu.edu/stat100/node/35 for more on significance.  The real key is that most often research reports a positive finding, meaning that based on the statistical test there is a relationship. However, I have often heard and read of results where there is a statistical correlation but not a practical one.
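A rough sketch of how a correlation can be statistically significant without being practically important. The values of r and n are hypothetical, and the 1.96 cutoff is the usual large-sample approximation for a two-sided test at the 5% level, so treat this as an illustration rather than a full analysis.

```python
import math


def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation r from n pairs differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical: the same weak correlation measured in a small and a large sample.
r = 0.05
results = {}
for n in (100, 10_000):
    t = t_statistic(r, n)
    # For large samples, |t| above roughly 1.96 corresponds to p < 0.05 (two-sided).
    results[n] = (t, abs(t) > 1.96)
    print(f"n = {n}: t = {t:.2f}, significant at the 5% level: {abs(t) > 1.96}")

# r squared is the share of the variation in Y that X accounts for.
variance_explained = r ** 2
print(f"Share of variation explained: {variance_explained:.2%}")
```

With enough participants even a very weak relationship passes the significance test, yet it still accounts for only a tiny share of the variation. That is one way to end up with a statistical correlation that is not a practical one.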

One other thought is bias: researcher bias, sample bias, and many other biases.  I am not suggesting intentional bias. When I did my dissertation, I had a strong hypothesis.  The results came in completely opposite to what I believed. At first, I thought I had failed. However, disproving a hypothesis, or looking for another variable that could explain the result, is as valuable as confirming my initial belief about the relationship between variables.  I suggest most researchers have bias, and the good ones declare it and ensure the research plan accounts for it.

This source has some good thoughts and definitions for many of the terms in research reports:  http://researchbasics.education.uconn.edu/correlation/

I was taught to read research reports with a healthy skepticism.  Look for flaws, look at the makeup of the sample and how well the sample represents the population.  Also look at the significance of the findings.  One key to research is where it is published.  Peer-reviewed is the gold standard, but there are different levels of peers.  Being published in the AMA is held to a more rigorous standard than being published by a local group that formed a website to share their thoughts.

I am not a statistician but do have some experience with research design and interpretation.  Please feel free to add your thoughts and experience to this thread.  The better we are at understanding the importance of a study, the better decisions we will make.  Thanks, Denis

• invert the numbers

One trick that helps me is to invert the numbers.

When the PROTECT trial says that the "number needed to treat" is 33 men to keep one extra man alive or free of metastases after 10 years, I turn it around:  32 men take on the risk of all the negative outcomes (impotence, incontinence, urinary strictures, heart attacks, strokes, etc.) so that one man can get some benefit.
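The inversion can be written out in a few lines of Python. This is only a sketch: it uses the standard definition that the number needed to treat is the reciprocal of the absolute risk reduction, and takes the figure of 33 from the sentence above as given.

```python
def absolute_risk_reduction(nnt):
    """Number needed to treat (NNT) is the reciprocal of the absolute risk reduction."""
    return 1 / nnt


nnt = 33  # figure quoted in the post, taken as given
arr = absolute_risk_reduction(nnt)
treated_without_benefit = nnt - 1

print(f"Absolute risk reduction: {arr:.1%}")
print(f"Of every {nnt} men treated, {treated_without_benefit} do not gain on that outcome")
```

Note that this counts benefit only on the one outcome the NNT was calculated for.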

When a paper claims that a test is 95% accurate, that means 5% of the time it is not accurate.

Here's a simple example, one that 9 out of 10 doctors will get wrong.

There is a test for a rare disease.  The disease occurs in 1 person in 1000 in the general population.  The test is extremely good, unrealistically good, to make the problem easier.  The test is right 95% of the time: if the test says you have the disease, it is right 95% of the time.  If the test says you don't have the disease, it is 100% accurate.  (That's the unrealistic part; no test is that good.)

Your doctor gives you the test.  There are no other indications that you have the disease.

The test comes back positive.  What are the chances you have the disease?

The answer is closer to 2%.

The exact answer uses Bayes' theorem for conditional probabilities, but there is a shortcut that works in simple cases like this one.

Give the test to 1000 people.  You expect 1 of those 1000 people to have the disease.  But because the test is 95% accurate, 5% of the people will get a positive answer.  In the group of 1000, that's 50 people, plus the one person who actually has it.

So if you have a positive test, the question is whether you are the one person or one of the 50 people.  The odds that you are the one person who has the disease are

1 / (50 + 1), a bit less than 2%.

See how reversing the 95% right to 5% wrong makes the problem easier to understand and solve?
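For comparison with the shortcut, here is a minimal Python sketch of the exact Bayes' theorem calculation. It assumes one particular reading of the example: everyone who has the disease tests positive, and 5% of the people who do not have it also test positive. The function name and parameter names are my own.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability of having the disease given a positive test (Bayes' theorem)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Assumed reading of the example: 1 in 1000 has the disease, the test never
# misses a real case, and 5% of healthy people get a positive result anyway.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=1.0, false_positive_rate=0.05)
shortcut = 1 / (50 + 1)

print(f"Exact (Bayes):  {ppv:.4%}")
print(f"Shortcut 1/51:  {shortcut:.4%}")
```

The two figures differ slightly because the shortcut counts 5% of all 1000 people as false positives, while the exact version counts 5% of the 999 who are healthy.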

• SPT said:

invert the numbers

I see where you are going with this, but there are several problems with your math and logic.  The most obvious is this: "But because the test is 95% accurate, 5% of the people will get a positive answer."  This is really nonsense.  Earlier you stated that "if the test says you have the disease, it is right 95% of the time."  Then later your math says it is only right 2% of the time if it is positive.  At the very least your explicit definition of the test's accuracy is wrong.  A test that is 100% accurate would, by your logic, present a positive result 0% of the time.

My own math would say that if a test is 95% accurate for positive results and 100% accurate for negative results and you tested 19,000 people (of whom statistically you would expect 19 to have the disease) then 20 people would test positive: 19 correctly and 1 incorrectly.  The other 18,980 would have accurate negative results.

-Tom
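The arithmetic in Tom's reading can be checked with a short Python sketch. It assumes, as his post does, that a positive result is right 95% of the time, that there are no false negatives, and that 1 person in 1000 has the disease. The last lines work out how rarely a healthy person would have to test positive for those numbers to hold.

```python
tested = 19_000
diseased = tested // 1000          # 1 in 1000 has the disease -> 19 people
true_positives = diseased          # negatives are 100% accurate, so no case is missed

# If 95% of positive results are correct, the 19 true positives are 95% of all positives.
total_positives = round(true_positives / 0.95)
false_positives = total_positives - true_positives
negatives = tested - total_positives

# Share of healthy people who would get a (false) positive under this reading.
healthy = tested - diseased
implied_false_positive_rate = false_positives / healthy

print(f"Positives: {total_positives} ({true_positives} correct, {false_positives} incorrect)")
print(f"Negatives: {negatives}")
print(f"Implied false positive rate among healthy people: {implied_false_positive_rate:.4%}")
```

Under this reading only about 1 healthy person in 19,000 gets a false positive, which is a far more demanding test than one where 5% of healthy people test positive. The two posts are describing two different meanings of "95%".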

• SPT said:

invert the numbers

The 95% number refers to the confidence interval; simply stated, the sample results should fall within the population results with that confidence.  It is not a grade that suggests the results are really accurate; it is more about confidence in the sample.  Denis

• Tom said:

I see where you are going with this but there are several problems with your math and logic. [...]

False positives and false negatives

There are two numbers for pretty much every test: the number of false positives and the number of false negatives.  In the very simplified example I gave, the false positive rate is 5%, and the false negative rate is 0%.  I defined what the 95% means:  if the test is positive, there is a 95% chance you actually have the disease.  Invert that number, and it means that 5% of people will get a positive result but not have the disease. That's why when you test 1000 people at random, you expect 51 positives: 50 false positives and 1 genuine positive.

Doctors and medical research papers use the terms specificity and sensitivity, which are 1 minus the false positive and false negative rates, respectively.
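Here is a small Python sketch of how those terms relate to the counts. The counts are my own scaling of the simplified example to 100,000 people so that everything comes out in whole numbers: 100 have the disease, none are missed, and 5% of the 99,900 healthy people test positive.

```python
# Hypothetical counts for 100,000 people under the simplified example.
true_positives = 100       # have the disease, test positive
false_negatives = 0        # have the disease, test negative
false_positives = 4_995    # healthy, test positive (5% of 99,900)
true_negatives = 94_905    # healthy, test negative

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

false_negative_rate = false_negatives / (true_positives + false_negatives)
false_positive_rate = false_positives / (true_negatives + false_positives)

# Chance that a person with a positive result actually has the disease.
positive_predictive_value = true_positives / (true_positives + false_positives)

print(f"Sensitivity: {sensitivity:.0%} (1 - false negative rate of {false_negative_rate:.0%})")
print(f"Specificity: {specificity:.0%} (1 - false positive rate of {false_positive_rate:.0%})")
print(f"Chance a positive result is real: {positive_predictive_value:.1%}")
```

Sensitivity and specificity describe the test itself. The chance that a positive result is real also depends on how common the disease is.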

Wiki has a write up with the complete math.

There's nothing controversial or difficult about the math; what is distressing is that so few doctors understand the concept of conditional probability.

• SubDenis said:

The 95% number refers to the confidence interval; simply stated, the sample results should fall within the population results with that confidence.  It is not a grade that suggests the results are really accurate; it is more about confidence in the sample.  Denis

No, it is not the confidence interval

Confidence is another topic entirely.  An important one, but this example was about false positives and conditional probability, not about confidence intervals.
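To make the distinction concrete, here is a rough Python sketch of what a 95% confidence interval is. It describes the uncertainty in an estimate taken from a sample, not how often a test result is right. The sample figures are invented, and the formula is the common normal approximation for a proportion, which is only one of several ways to compute such an interval.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical sample: 200 of 1000 men in a study have some outcome.
low, high = proportion_confidence_interval(successes=200, n=1000)
print(f"Observed rate: 20.0%, 95% confidence interval: {low:.1%} to {high:.1%}")
```

A larger sample narrows the interval. It says nothing about false positives or false negatives, which is why the two topics are separate.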