Introduction into Evidence-based Management: How to Evaluate the Research Quality of Scientific Studies?
As you continue to develop your evidence-based management skills, learning to evaluate research for its quality and applicability to your own management practice is paramount. While peer-reviewed, empirical research is generally of much higher quality than information published elsewhere, not all journal articles are created equal. Some research findings are fascinating and provocative, yet hard to reproduce; others may sound exciting but not be relevant to your own industry or management practice (Creswell, 2002).
How can you, as an evidence-based manager, sift through published work to find articles of value and relevance? The key is learning some basics about evaluating the quality of research. This CQ Dossier will explain two of the core attributes that all valuable, useful research must exhibit: reliability and validity (Kmet et al, 2004).
Study reliability. Put broadly, reliability is all about the consistency and reproducibility of results. A single empirical study is never sufficient to prove that an intervention works, or that a relationship between factors is robust and genuine (Aguinis et al, 2017). This is because findings sometimes result from random human error, statistical abnormalities, the selection of a very specific and unique sample, and any number of other issues. In order to truly have faith that a research trend is real, it must be found multiple times, in a variety of contexts. This is at the core of evaluating a study’s reliability (Creswell, 2002).
Measure reliability. In addition to pertaining to how reproducible a study’s findings are, reliability can also be used to examine a measure or instrument (Santos, 1999). A reliable measure of worker satisfaction, for example, will yield consistent, similar scores over time, when administered to the same individuals. An unreliable measure will not show such consistency – a single employee may seem highly satisfied, according to the measure, on one day, and highly dissatisfied the next. This can taint statistical analysis and conclusions about results in an irreparable way.
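One way to make this kind of consistency concrete is to correlate the scores the same employees received on two occasions. The sketch below is only an illustration: the article does not name a specific statistic for this, so the use of the Pearson correlation here is an assumption, and the function name `pearson_r` and all scores are hypothetical.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two lists of scores for the same people."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_a = sum(first) / len(first)
    mean_b = sum(second) / len(second)
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    spread_a = sqrt(sum((a - mean_a) ** 2 for a in first))
    spread_b = sqrt(sum((b - mean_b) ** 2 for b in second))
    return cross / (spread_a * spread_b)


# Hypothetical satisfaction scores (1-10) for five employees.
week_1 = [7, 4, 8, 5, 6]
week_5_consistent = [7, 5, 8, 4, 6]    # employees keep roughly the same rank
week_5_inconsistent = [3, 8, 4, 7, 5]  # scores swing widely between sessions

print("consistent measure:  ", round(pearson_r(week_1, week_5_consistent), 2))
print("inconsistent measure:", round(pearson_r(week_1, week_5_inconsistent), 2))
```

A correlation close to 1 suggests the measure ranks the same people similarly on both occasions; a value near zero or below suggests the scores are not stable enough to build conclusions on.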
Manager take-aways. As a manager, you can use reliability to guide your work in several ways. First, only consider implementing an intervention if it has been shown to work in a variety of studies, conducted in a variety of settings (Aguinis et al, 2017; Kmet et al, 2004). A single test is never sufficient proof that a strategy works. Second, when seeking methods of measuring factors such as employee performance, personality, or motivation, make sure to select a measure that has been tested for reliability in empirical research papers. Cronbach’s alpha is the statistic commonly used to test a measure’s reliability (Santos, 1999); look for values of 0.80 or higher to indicate a robust, consistent test.
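For readers who want to see what sits behind the number, Cronbach’s alpha can be computed from raw questionnaire responses using its standard formula: alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The following is a minimal sketch under stated assumptions; the function name `cronbach_alpha` and the survey data are hypothetical and do not come from the cited studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, one row of item scores per respondent."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in items)
    # Variance of each respondent's total score across all items.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers from five employees to a four-item satisfaction scale (1-5).
survey = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(survey)
print(f"Cronbach's alpha: {alpha:.2f}")
print("meets the 0.80 guideline" if alpha >= 0.80 else "below the 0.80 guideline")
```

With this made-up data the items move together closely, so the result comes out above the 0.80 guideline. In practice you would rely on the value reported in the published validation study rather than computing it yourself.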
Validity refers to a variety of crucial research attributes, which any quality study should exhibit. In a general sense, a study can be said to be “valid” if the results are accurately reported, the study is well-designed, and the results can be used to draw useful, broad conclusions. The converse is also true: if a study is misreported, not well controlled and designed, or does not provide practical and applicable conclusions, it is likely to be invalid. Each of these attributes is discussed below.
Cause-effect validity. A study’s cause-effect validity, often referred to as internal validity, is the degree to which the study has identified a genuine cause and effect relationship. In management research, it can be very difficult to prove conclusively that an intervention or strategy had a direct effect on outcomes the manager cares about (Landers & Behrend, 2015). This is because the average workplace is dynamic and complicated, with many other factors impacting employee behavior.
In order to truly prove that a factor has a causal impact on workplace outcomes, a study must a) manipulate the variable that is believed to be the cause, under controlled conditions; b) demonstrate that a change in outcomes followed the manipulation of the causal variable; and c) rule out any plausible alternative explanations of why a change in outcomes was observed (Landers & Behrend, 2015). If a study does not provide such information, and does not test an intervention under such rigorous conditions, you do not have conclusive proof that it is effective. In that case, its results should be viewed with skepticism.
Design validity. When reading an empirical study, pay close attention to the details about the method. Make sure you can answer the following questions: was the study conducted in an actual workplace, or in a laboratory? Were the participants actual employees, or random volunteers? Were external factors measured and controlled for? Did the authors consider, and address, criticisms of their conclusions? These questions will help to give you a sense of the research quality (Becker et al, 2016).
Many psychological studies are conducted in laboratories, with volunteer samples that may not resemble the employees you, as a manager, will be working with (Becker et al, 2016). When studies are conducted in organizational settings, they tend to be a bit less well-controlled, and any observed effects may be a statistical fluke. Some researchers, in addition, are not adept at acknowledging that their conclusions may be only one interpretation of the results, among many possible alternatives. Make sure to base your own management decisions on research that is carefully conducted, with results that are reported fairly and limitations that are acknowledged.
External validity. A study can be said to be “externally valid” if the researcher’s findings apply easily to the outside world. This type of validity is not absolute: findings that may be valid for one industry or group of people may not be valid for another (Green & Glasgow, 2006). For example, if you are seeking to introduce an employee wellness program, it is probably best to select one that has been tested in the industry you occupy, in an organization with a similar size and comparable demographics to yours (Kessler & Vesterlund, 2015).
Organizational research, as mentioned above, is conducted in a wide variety of settings, with a wide variety of types of people. As a result, some conclusions may not be applicable to your own management practice or your organization (Green & Glasgow, 2006). Make sure to pay close attention to a study’s demographics, setting, organization size, and country of origin; the more factors that make your organization distinct from the one being tested, the less likely it is that the researcher’s findings will apply to you. Conversely, if a strategy or intervention has worked in many places, industries, and cultures, it is likelier to be relevant and useful in your workplace (Kessler & Vesterlund, 2015).
While peer-reviewed research is generally rigorous, it’s important to learn how to determine an individual study’s quality
A reliable research finding is one that has been reproduced, typically in a variety of settings
A reliable measure is one which shows great consistency, when administered to the same people multiple times over a prolonged period
A valid study should provide strong evidence that an intervention or strategy directly causes an improvement in outcomes
Valid studies must also be well-controlled, and ought to be conducted in true organizational settings
A researcher’s conclusions may not be valid for your own management practice if the research was conducted in an industry, country, or culture very different from your own
Aguinis, H., Cascio, W. F., & Ramani, R. S. (2017). Science’s reproducibility and replicability crisis: International business is not immune.
Becker, T. E., Atinc, G., Breaugh, J. A., Carlson, K. D., Edwards, J. R., & Spector, P. E. (2016). Statistical control in correlational studies: 10 essential recommendations for organizational researchers. Journal of Organizational Behavior, 37(2), 157-167.
Creswell, J. W. (2002). Educational research: Planning, conducting, and evaluating quantitative (pp. 146-166). Upper Saddle River, NJ: Prentice Hall.
Green, L. W., & Glasgow, R. E. (2006). Evaluating the relevance, generalization, and applicability of research: Issues in external validation and translation methodology. Evaluation & the Health Professions, 29(1), 126-153.
Kessler, J., & Vesterlund, L. (2015). The external validity of laboratory experiments: The misleading emphasis on quantitative effects. Handbook of Experimental Economic Methodology, Oxford University Press, Oxford, UK.
Kmet, L. M., Lee, R. C., & Cook, L. S. (2004). Standard quality assessment criteria for evaluating primary research papers from a variety of fields.
Landers, R. N., & Behrend, T. S. (2015). An inconvenient truth: Arbitrary distinctions between organizational, Mechanical Turk, and other convenience samples. Industrial and Organizational Psychology, 8(2), 142-164.
Santos, J. R. A. (1999). Cronbach’s alpha: A tool for assessing the reliability of scales. Journal of Extension, 37(2), 1-5.
Erika Price is a social psychologist, writer, and statistical and methodological consultant based in Chicago, Illinois, USA. Erika's research has focused on the psychology of political tolerance and open-mindedness. In addition to conducting experimental and survey-based research on these topics, Erika helps clients use methodological and data analytic tools to answer pressing questions that challenge their organization.