A Guide on How To Evaluate Papers in Computer Science

[TL;DR  Try answering all questions below.]

When I started graduate school, I found out the hard way that “we’re not in Kansas anymore”.  Many professors told me it was all about the research, and not at all about grades.  Grudgingly, I stopped obsessing over assignments and started obsessing over research papers.  However, the transition wasn’t easy.  Learning how to evaluate research papers critically is a continuous process, and I searched online for a starting point on how to evaluate a research paper in Computer Science, to no avail.  Each scientist must find their own “groove” or “style” when evaluating another scientist’s work.

I’ve been fortunate enough to be surrounded by really bright colleagues who have shaped my thought process, as well as really bright mentors who have taught me how to think critically.  This blog post unites their collective wisdom into one guide: A Guide on How To Evaluate Papers in Computer Science.

I warn you that this is not a be-all, end-all, one-stop solution for evaluating papers.  I am sure there are millions of ways to evaluate a paper.  What I hope to provide with this guide is essentially a starting point for your “thinking process” (which I could not find elsewhere).

I suggest you use this guide by answering the questions below after reading the paper.  If you decide to answer them as you read, go back to the questions when you’re done and make sure your answers still hold; you must verify that you captured the essence of the paper.

1. What are the main points?

Every self-respecting evaluation of a paper will have at least a summary of the work that was presented.  The key is to not cite the paper at all in your description; paraphrase the ideas and think of how you would explain the idea to someone in Computer Science outside your area of expertise.

2. What did I not understand?

This is usually tough for scientists because it explicitly indicates a lack of understanding.  Don’t worry.  It’s not that you aren’t smart; rather, you are currently unsure of what the article is stating.  The key to this question is not saying “I don’t get X”, but rather, “I don’t get X, but I think it means Y.”

3. If I were to build a system out of this, how would I do it?

The beauty of computer science is that even if the content of the paper is highly theoretical, at some point the technical knowledge can be made concrete.  Ask yourself, “How can we do that?”  Often you’ll find implementation flaws that indicate theoretical flaws… this feeds back on itself, because you can then identify ways to improve the theory.

** If the paper itself discusses an implementation, question 5 will be particularly relevant.

4. Does this paper solve the field?

The answer to this question should be no.  The point of this highly rhetorical question is to inject (what I think is) healthy satire into your analysis.  Unless the paper proves that P = NP, it hasn’t solved the field.  This leads us to the next sub-question.

  • If not, what did they do wrong?

You might be thinking “oh well, this is just trying to break this paper”.  And that is exactly what you are doing.  Science is not art.  It is subject to scrutiny, review, and analysis of hypotheses, methodology, and results (more on this later).

Find something that they did wrong, or did not do at all.  I’m sure there is something.

  • If so, what did they (the authors) consider that no one else in the universe did?

Once again, this is a very rhetorical question.  If you’re answering this question, it is highly likely that you’re not being critical enough.  Go back and review more!

5. Is there any other way to do what the authors did?

This is very relevant if the authors are describing a system in their paper.  If not, the question should be focused towards other methodologies to evaluate the same hypothesis.  The previous statement is actually a perfect set-up for my next set of questions.

6. Does the experimental claim have validity?

Recall the field you (presumably) are studying.  Computer Science.  Computer Science.  Computer Science.

Science.

This is a scientific field and we’re still at the mercy of the scientific process.  The paper should try to assess some scientific claim.  If you’re rusty on your scientific terminology, recall that this claim is the research hypothesis.

Go back to your “main points” question (the first one).  If you did not write down the hypothesis of the article, that’s a bad sign for the paper.  If you can guess the hypothesis but can’t answer any of the following questions because you can’t find the answers, that’s another bad sign.

Caveat:
There are many papers in Computer Science that don’t evaluate any claim.  Survey and position papers are examples.  However, the good papers that don’t evaluate anything usually state that explicitly, or something similar.  What you have to watch out for are unwarranted or unsupported claims.

As a general heuristic, you should ask yourself: “Is this paper worth anything?”

  • Measurement validity: “Can we trust the process?”

This basically asks whether the process makes sense.  What you’re looking for is whether or not the operationalization of the phenomenon makes sense.  In other words, does the phenomenon they study map to what they measure?

  • Internal validity: “Can we trust causal assertions?”

This question asks whether or not we can trust that the phenomenon we are measuring is a direct consequence of an external variable.  Recall that the hypothesis revolves around observing an effect as a direct consequence of another one, and depending on how well the authors have crafted their experiment, your trust in their causal assertions should vary.  Internal validity can be broken up into the next three points.

  • Is there temporal ordering? (“Does x come before y?”)

This tries to establish the dependent variable in terms of the independent variable.  Does the authors’ claim (related to the “y”) come as a result of their perturbation (related to the “x”)?

  • Is there association? (“Do x and y move in a pattern?”)

If the authors’ introduced change (“x”) really does alter the phenomenon they are making claims about (“y”), then you should be able to predict the next “y” for a change in “x”.  If they move erratically, that’s a red flag.
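This association check can be made concrete with a quick back-of-the-envelope computation.  The sketch below uses the Pearson correlation coefficient as one simple measure of whether x and y move in a pattern; the data points are entirely made up for illustration.

```python
# Toy sketch of an association check: do x (the authors' perturbation)
# and y (the claimed effect) move together in a pattern?
# Here we use the Pearson correlation coefficient as one simple measure.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical measurements (invented for this example):
# response time y as some tunable parameter x grows.
x = [1, 2, 4, 8, 16]
y = [90, 70, 52, 40, 31]

r = pearson(x, y)
print(f"r = {r:.2f}")  # strongly negative: x and y move in a clear pattern
```

A value of r near +1 or -1 suggests a consistent pattern, while a value near 0 (or wildly different values across reruns) is the “moving erratically” red flag.  Of course, correlation alone never establishes causation; it only tells you whether the association sub-question is even satisfied.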

  • Can we rule out any and all rival explanations?

As one of my mentors once explained to me, the answer to this question is “proof by lack of imagination”.  It is impossible to list all the ways the “y” could be changed due to changes in factors x1, x2, x3, …, xn.

That being said, that is no excuse for the authors not doing their due diligence and thinking really hard to eliminate or account for all possible factors.  If anything exists that was not considered, its impact should be minimal, or it should affect the entire experiment equally (a bias, not an error).  This implies you can “factor it out” when analyzing.

  • External validity: “Can we generalize the findings?”

This depends directly on how they obtained their sample for observing the “y”.  Look at the sample size used (often a weak point in many papers), the way participants were recruited, and the statistical tests used.  If these three things check out, the findings may well generalize.

Phew!  That was a brain-dump.  Having exposed my method, I will close with this: the scientific process is not perfect.  It is really easy to think that after such rigorous, in-depth analysis, an article that has the required scientific backing must be fact.  If only it were so easy.  The scientific process is a man-made construct that tries to impose order and logic on the chaos that is the universe.  Let’s just say it might miss a few things.

Playing both sides of the devil’s advocate role: even though the process is not perfect, it’s the best process we have.  It is our role as scientists to critically evaluate the work of others and not take anything for granted.  Doing so is the only way to find the paths that lead humanity to bigger and better places!