
Correlational Research Designs in Theses and Dissertations
What is Correlational Research?
Correlational Research is a type of non-experimental research in which the researcher assesses the relationship between two or more variables in a single group without controlling or manipulating them.
It is a quantitative research method that establishes a link between two quantitative variables. Therefore, the two variables must be represented numerically.
Correlational research measures the association between the variables using the correlation coefficient (r), a calculated value that ranges from -1 to +1.
What is the Purpose of a Correlational Research Design?
A Correlational Design aims to assess three aspects;
(i) Whether a statistically significant association exists between two variables.
The p-value shows the statistical significance of the association between the variables. At the conventional 0.05 significance level, a p-value <= 0.05 indicates the association is statistically significant, whereas a p-value > 0.05 implies the association is not statistically significant.
(ii) The direction of the relationship
The positive (+) or negative (-) sign of the correlation coefficient (r) indicates the direction of the association.
A positive sign shows that the two variables change in the same direction; that is, as one variable increases, the other also increases. Similarly, as one variable decreases, the other decreases.
A negative sign shows that the two variables change in opposite directions. As one variable increases, the other decreases, and vice versa.
(iii) The strength of the relationship
The nearer the correlation coefficient is to +1 or -1, the stronger the association between the two variables. As the value approaches zero, the association weakens.
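The three aspects above (significance, direction, and strength) can be illustrated with a minimal sketch. The data are hypothetical, and the p-value here comes from a simple permutation test, which is a stand-in chosen so the example needs only the standard library; statistical packages usually report a p-value based on the t-distribution instead.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided p-value: share of shuffles whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: weekly study hours and exam marks for eight students.
hours = [2, 4, 5, 7, 8, 10, 12, 14]
marks = [50, 55, 54, 63, 68, 70, 79, 85]

r = pearson_r(hours, marks)
p = permutation_p_value(hours, marks)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f}, direction = {direction}, p = {p:.4f}")
```

The sign of `r` gives the direction, its distance from zero gives the strength, and `p` is compared against the 0.05 threshold.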
When should Correlational Research be Used in a Dissertation Project?
Correlational research is widely used in psychology, medicine, and market research to collect data from a sample of a population, examine associations between various factors, and generalize the findings to the population.
Below is a guide on when to conduct a correlational study;
(i) Investigating Non-Causal Relationships:
Correlational research is valuable when examining relationships between related variables that may not affect each other. In this setup, the researcher is an observer, collecting and analyzing the data in its natural state, with no manipulation or control.
Example
For example, a correlational research method in psychology may assess the association between social media use and loneliness. Though social media may reduce loneliness through social connections, it may also increase it depending on usage trends and experiences.
From a sample, the study collects data on the time spent on social media and feelings of loneliness.
Time is a continuous variable and may be represented in hours, minutes, or seconds, depending on the study’s granularity.
Time may also be measured as a range, for instance, < 1 hour a day, 1 to < 3 hours/day, 3 to < 5 hours/day, and >= 5 hours/day. In such a case, numerical values are assigned to the groups in ordinal format, that is, <1 hour a day =1 and 1 to < 3 hours/day =2, and so forth.
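The ordinal coding described above can be sketched as a small helper. The function name and the label dictionary are hypothetical; the cut-offs follow the ranges in the example.

```python
# Hypothetical labels for the four usage ranges, keyed by their ordinal code.
LABELS = {
    1: "< 1 hour/day",
    2: "1 to < 3 hours/day",
    3: "3 to < 5 hours/day",
    4: ">= 5 hours/day",
}


def code_daily_hours(hours):
    """Return the ordinal code (1 to 4) for a daily usage time given in hours."""
    if hours < 0:
        raise ValueError("hours cannot be negative")
    if hours < 1:
        return 1
    if hours < 3:
        return 2
    if hours < 5:
        return 3
    return 4


reported_hours = [0.5, 2.0, 3.0, 6.5]
codes = [code_daily_hours(h) for h in reported_hours]
print(codes)  # [1, 2, 3, 4]
```

Because the codes keep the order of the ranges, they can be analyzed with a rank-based coefficient.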
Feelings of loneliness may be measured using a predetermined questionnaire-like scale with closed responses, usually in Likert format. This allows for numerical representation and analysis.
Then, the study examines the correlation between social media use and loneliness to determine whether an association exists.
(ii) Exploring Causal Relationships When Experiments are Not Feasible:
When modifying the quantitative variables is unrealistic or unethical, the researcher might choose to conduct correlational research rather than experimental research or another type of research.
In these circumstances, the study uses correlational data for exploratory purposes before examining a causal relationship.
Example
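For example, a study investigates the influence of socioeconomic status on academic achievement. Because socioeconomic status cannot be assigned to participants, an experiment is not possible. Correlating various socioeconomic factors with academic performance could help identify possible causal links.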
(iii) Testing New Measurement Tools:
Correlational research can also check the reliability and validity of measurement instruments.
How Correlation Checks Reliability and Validity of Instruments
(i) Reliability
Test-Retest Reliability:
Test-retest Reliability assesses the consistency of a measurement instrument over time.
The test involves administering a questionnaire to the same participants at two different time points. Correlating the two sets of scores provides the test-retest reliability. A high, positive, and significant correlation coefficient indicates higher reliability.
Inter-Rater Reliability:
Inter-rater Reliability evaluates the consistency of ratings between multiple raters.
For example, in studies using multiple raters to score the questionnaire responses, the researcher should assess this reliability to maintain high accuracy. Correlating the raters’ ratings provides the inter-rater reliability.
(ii) Validity
Convergent Validity:
Correlation can also check convergent validity.
Convergent validity measures the degree to which measures of the same construct correlate.
For example, a researcher can correlate a new instrument that measures anxiety with a well-developed anxiety instrument. A high, positive, and significant correlation suggests that the new measure effectively measures the same construct as the established measure.
What are the Types of Correlational Research Design?
Correlational research is categorized into three types based on the direction of the correlation coefficient.
The types of correlational research are positive, negative, or zero correlation.
(i) Positive Correlational Research
Positive Correlational Research is a type of research in which the correlation between the variables of interest is greater than zero (0 < r ≤ 1) and statistically significant (p < 0.05). In this design, a change in one variable signals a change in the same direction in the other variable. That is, when one variable increases, the other variable also increases, or both variables decrease.
An example of a positive correlation study could be research that explores the relationship between the study hours and the marks obtained by students in a particular subject. A positive correlational research would indicate that longer study hours are associated with higher marks and vice versa.
(ii) Negative Correlational Research
In a Negative Correlational Research design, the correlation between the study variables is less than zero (-1 ≤ r < 0) and statistically significant (p < 0.05). It implies that a change in one variable signals a change in the other variable but in the opposite direction. For instance, an increase in one variable signals a decrease in the other variable.
An example of a negative correlational study is one that examines the relationship between price and demand for a product. A negative correlation would indicate that a higher price is associated with lower demand for the product.
(iii) Zero Correlational Research
Zero correlational research implies no association between the variables investigated. It is a research design where the change in one variable is inconsequential to the other variable, indicating no relationship.
This research design examines variables with unclear statistical relationships. For example, income and patience may be unrelated, and a correlational study of the two would be expected to produce a coefficient near zero.
In practice, the coefficient is rarely exactly zero. The variables typically show a small correlation that is not statistically significant (p > 0.05), indicating no reliable association. A significant correlation between unrelated variables can still occur by chance rather than because one variable is related to the other.
How is Data Collected in Correlational Research?
The data collection method implemented in a study can also be used to classify correlational research designs. Using this approach, three types of correlational research designs are commonly applied in dissertations: Survey, Observational, and Archival research.
Before collecting data, it is essential to carefully select the most appropriate methods for obtaining a representative sample from the population of interest and eliminating research biases to ensure reliable and valid findings.
Also, it is best to consider the ethical implications of each data collection method to avoid harming participants.
What is Survey Research?
Survey research collects structured data from a population sample through a predetermined questionnaire.
The researcher can administer the questionnaire through different approaches;
- Self-administered questionnaires: Participants complete questionnaires independently.
- Interviewer-administered questionnaires: The researchers directly ask participants questions in person or by phone.
- Online surveys: The researcher uses an online platform to facilitate the data collection from the questionnaire.
In dissertations, the online survey is more popular than directly interviewing the participants by phone or in person.
What are the Key Advantages of Survey Research?
- Survey research is a fast and convenient way to collect data from a vast population.
- Uses a representative sample to generalize findings to the larger population.
- Surveys can collect data on various concepts such as attitudes, beliefs, behaviors, and experiences.
Critical Limitations of Survey Research:
- Social desirability bias: Participants may give answers they consider socially acceptable rather than answers that reflect their own beliefs.
- Recall bias: Participants may have difficulty accurately remembering past events.
- Non-response bias: Participants with similar characteristics may be more likely to respond, leaving other groups under-represented and skewing the results.
Naturalistic Observation Research
Naturalistic observation is a type of field research that examines real-world phenomena or behaviors without manipulation.
It involves observing, recording, and grouping events or actions as they occur in their natural context.
For instance, observing people’s behavior in a public environment such as a grocery store or workplace.
In naturalistic observation studies, the participants are not informed of the research to ensure they act naturally; otherwise, they may deviate from their true selves.
For this research method to be considered ethical, the researcher must maintain the participants’ anonymity and ensure that the data is collected in a public setting where the individuals observed do not expect complete privacy in their actions or behaviors.
What are the Key Advantages of Naturalistic Observation?
- Naturalistic observation is used to observe genuine behavior in real-life settings.
- It can prompt new insights and thus be a kind of generative research.
- It allows researchers to understand social and cultural behaviors.
What are the Key Limitations of Naturalistic Observation?
- Observer bias: The observer’s own biases can sway the observations and findings.
- Reactivity: Participants may change their behavior if they realize they are being observed.
- Tedious and time-consuming: It can be difficult and time-consuming to record and compile observational data.
What is Archival Research?
Archival research involves gathering and analyzing secondary data. The data is pre-existing and available for research.
What are the Common Sources of Archived Data for Dissertations?
For dissertations, archived or secondary data can be found in the following sources;
- Government publications: Includes economic statistics, census data, and health reports.
- Academic journals and books: Published articles and books.
- Market research reports: Consumer surveys and industry reports.
- Organizational records: Financial statements, sales data, customer databases.
- Online databases: Google Scholar, PubMed, and JSTOR.
What are the Key Advantages of Archival Research?
- Archival research is non-reactive, as the participants cannot influence the data after it has been collected.
- It is a fast and cost-effective method for collecting research data because the data is already available.
- Further, it provides a broader scope of information, including data from people who may be unwilling to participate in the research or who are no longer alive.
- The research allows researchers to explore the evolution of phenomena or behavior. With archival data, the researcher can examine the change in the correlation between variables (behaviors or phenomena) over time.
- Moreover, researchers use archival data in their dissertation projects to conduct a meta-analysis.
What are the Key Limitations of Archival Research?
- Archival research offers limited control over the data quality and completeness.
- The archived data may be biased, swaying the findings.
- Some archived data require access permissions, which can be challenging to obtain.
What are the Analysis Methods in Correlational Research?
The analysis methods used in correlational research compute a correlation coefficient that describes the association between the variables.
The appropriate method depends on the variables’ measurement levels, the linearity of the relationship, and the data distribution.
| Correlation coefficient | Type of relationship | Levels of measurement | Data distribution |
| Pearson’s r | Linear | Two quantitative (interval or ratio) variables | Normal distribution |
| Spearman’s rho | Monotonic | Two ordinal, interval, or ratio variables | Any distribution |
| Kendall’s tau | Monotonic | Two ordinal, interval, or ratio variables | Any distribution |
| Point-biserial | Linear | One dichotomous variable and one continuous variable | Normal distribution |
| Phi | Non-linear | Two dichotomous variables | Any distribution |
| Cramér’s V | Non-linear | Two nominal variables | Any distribution |
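The table can be read as a decision rule. The sketch below encodes it as a lookup function; the function name and the level labels ("nominal", "dichotomous", "ordinal", "continuous") are hypothetical, and combinations the table does not cover return `None`.

```python
CATEGORICAL = {"nominal", "dichotomous"}


def suggest_coefficient(level_x, level_y, linear=True, normal=True):
    """Suggest a coefficient from the table, given two measurement levels.

    "continuous" stands for interval or ratio data. `linear` and `normal`
    describe the relationship and the distribution of the continuous data.
    """
    levels = {level_x, level_y}
    if levels == {"dichotomous"}:
        return "Phi"
    if levels <= CATEGORICAL:
        return "Cramer's V"
    if levels == {"dichotomous", "continuous"}:
        return "Point-biserial" if normal else None
    if levels == {"continuous"} and linear and normal:
        return "Pearson's r"
    if levels <= {"ordinal", "continuous"}:
        return "Spearman's rho or Kendall's tau"
    return None  # combination not covered by the table


print(suggest_coefficient("continuous", "continuous"))
print(suggest_coefficient("ordinal", "continuous"))
print(suggest_coefficient("continuous", "continuous", normal=False))
```

A lookup like this is only a starting point; the assumption checks for each coefficient still need to be carried out on the actual data.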
The following section examines each measure in detail;
(i) Pearson’s Correlation Coefficient
Pearson’s correlation coefficient (r) measures the linear relationship between two continuous variables.
It is the most common statistic for measuring the correlation between two quantitative variables.
For Pearson’s correlation coefficient to detect a relationship between the variables studied, the variables analyzed must meet particular assumptions.
What are the Assumptions of Pearson’s Correlation Coefficient?
Pearson’s correlation coefficient is a parametric method with the following assumptions;
(a) Continuous Variables:
- Both variables should be measured on a continuous scale, either interval or ratio data.
- These scales have equal, measurable distances between values, which enables meaningful comparisons.
Examples of Interval or Ratio Scale:
- Age, income, weight, and test scores.
(b) Independence of Observations:
- The observations should be independent of one another. One observation should not influence the other.
- The data can be randomly collected to ensure independence.
(c) Normality:
Both variables should be normally distributed.
- Assumption check:
- To inspect normality visually, use histograms or normal Q-Q plots.
- To statistically test normality, use the skewness and kurtosis values, the Shapiro-Wilk test, or the Kolmogorov-Smirnov test.
- Addressing Normality:
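- Transform the data using a log or Box-Cox transformation. These techniques can reduce skewness and bring the distribution closer to normal.
- If the data remain non-normal, use a non-parametric measure such as Spearman’s rank correlation or Kendall’s Tau.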
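As a rough illustration of the skewness and kurtosis check, the sketch below computes both values from population moments and compares a right-skewed variable before and after a log transformation. The income figures are hypothetical, and no cut-off values are applied because rules of thumb vary between sources.

```python
import math


def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) based on population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skew, excess_kurtosis


# Hypothetical right-skewed incomes (in thousands) and their log transform.
incomes = [18, 20, 22, 25, 27, 30, 35, 42, 60, 120]
logged = [math.log(v) for v in incomes]

skew_raw, kurt_raw = skewness_and_kurtosis(incomes)
skew_log, kurt_log = skewness_and_kurtosis(logged)
print(f"raw:    skewness = {skew_raw:.2f}, excess kurtosis = {kurt_raw:.2f}")
print(f"logged: skewness = {skew_log:.2f}, excess kurtosis = {kurt_log:.2f}")
```

Values near zero for both statistics are consistent with normality; here the log transformation moves the skewness closer to zero without removing it entirely.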
(d) Homoscedasticity:
The spread of the data points around the regression line should be roughly constant across the range of values.
- Assumption check:
- To visually inspect homoscedasticity, create a scatterplot and check whether the data points are spread roughly equally along the regression line.
- To statistically test homoscedasticity, use the Breusch-Pagan test or the White test.
- Addressing Heteroscedasticity:
- Transform data using log or Box-Cox transformation. These techniques can stabilize the variance.
- Use Weighted Least Squares, which gives less weight to observations with larger variance.
(e) Linearly Related:
The relationship between the two variables should be linear, i.e., it can be represented in a straight line (regression line).
- Assumption check:
- Create a scatterplot using one variable on the x-axis and another on the y-axis to visualize the relationship.
- A linear pattern indicates that the assumption is met.
- Addressing Linearity (Nonlinear Relationships):
- If the pattern observed is nonlinear, for instance, curvilinear, Pearson’s correlation is inappropriate, and a non-parametric technique like Spearman’s rank correlation or Kendall’s Tau is applied.
- Nonlinear regression can also be used to examine nonlinear relationships.
(f) No Outliers:
- Outliers are extreme values that lie far from most of the other data points. These values can affect the correlation coefficient, leading to inaccurate conclusions.
How to Identify Outliers:
- Assumption check:
- To detect outliers visually, create a scatterplot and inspect the points. Data points far away from the other points are outliers.
- To check for outliers using statistical methods, calculate Z-scores (a z-score above +3 or below -3 is an outlier) or create box plots (data points that fall far outside the interquartile range, IQR, are outliers).
- Handling Outliers:
- Check whether the outlier is a data entry error or another anomaly. If it is, it can be corrected, deleted, or replaced with an average measure such as mean, median, or mode.
- The outlier may be retained in the analysis if it is a valid data point.
- Transforming the data can reduce the effect of outliers.
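The two statistical checks can be sketched as follows. The scores are hypothetical, and the 1.5 × IQR fences used for the box-plot rule are the common convention rather than something specified in this guide. Note that with very small samples a z-score beyond ±3 cannot occur, so the z-score rule is only informative for larger samples.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Values whose z-score is above +threshold or below -threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs((v - mean) / sd) > threshold]


def iqr_outliers(values, k=1.5):
    """Values beyond k * IQR from the first or third quartile (box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical test scores for 20 participants, one of them extreme.
scores = [48, 50, 51, 49, 52, 50, 47, 53, 50, 49,
          51, 50, 48, 52, 50, 49, 51, 50, 52, 95]

print(z_score_outliers(scores))  # [95]
print(iqr_outliers(scores))      # [95]
```

Whether a flagged value is removed, corrected, or retained is still a judgment for the researcher, as described above.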
How to Interpret Pearson’s Correlation Coefficient
- Pearson’s Correlation Coefficient ranges from -1 to +1.
- Coefficient values closer to +1 or -1 indicate a stronger association, whereas values near zero signal a weak or no relationship.
- Correlation coefficient values less than zero indicate a negative association, and values above zero indicate a positive association. In a positive association, as one variable increases, the other also increases, whereas in a negative association, as one variable increases, the other decreases.
| Degree of Linear Relationship | Pearson’s Correlation Coefficient (r) range (+ or -) |
| Weak / no correlation | 0 ≤ r < 0.3 |
| Low correlation | 0.3 ≤ r < 0.5 |
| Moderate correlation | 0.5 ≤ r < 0.7 |
| High correlation | 0.7 ≤ r ≤ 1.0 |
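The table can be applied mechanically with a small helper. The function name is hypothetical, and assigning the boundary values 0.3, 0.5, and 0.7 to the higher band is an assumption.

```python
def describe_strength(r):
    """Map a correlation coefficient to the strength labels in the table."""
    size = abs(r)  # the sign gives direction, not strength
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if size < 0.3:
        return "weak / no correlation"
    if size < 0.5:
        return "low correlation"
    if size < 0.7:
        return "moderate correlation"
    return "high correlation"


print(describe_strength(-0.82))  # high correlation
print(describe_strength(0.41))   # low correlation
```

Because only the absolute value is used, r = -0.82 and r = +0.82 receive the same strength label.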
(ii) Spearman’s Rank Correlation Coefficient (Spearman’s rho):
Spearman’s rho is a non-parametric test that examines the monotonic association between two ordinal / ranked variables.
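Unlike a linear relationship, where the variables change at a constant rate, the variables in a monotonic association move consistently in one direction but not necessarily at a constant rate. Therefore, a monotonic relationship may follow a curve rather than a straight line, as long as the curve never reverses direction.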
When to use Spearman’s Rank Correlation Coefficient?
- Spearman’s Rank Correlation Coefficient is appropriate for ordinal data.
- It is non-parametric and can also be used on continuous data that did not meet the assumptions for conducting Pearson’s correlation.
How to Interpret Spearman’s Rank Correlation Coefficient
Spearman’s rank correlation coefficient has the same range as Pearson’s correlation and has a similar interpretation.
- Positive monotonic: as one variable increases, the other also increases.
- Negative monotonic: as one variable increases, the other decreases.
- No monotonic relationship: the variables show no consistent increasing or decreasing pattern.
| Degree of Monotonic Relationship | Spearman’s Rank Correlation Coefficient (ρ) range (+ or -) |
| Weak / no correlation | 0 ≤ ρ < 0.3 |
| Low correlation | 0.3 ≤ ρ < 0.5 |
| Moderate correlation | 0.5 ≤ ρ < 0.7 |
| High correlation | 0.7 ≤ ρ ≤ 1.0 |
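One standard way to obtain Spearman’s rho is to rank each variable (giving tied values the average of their ranks) and then compute Pearson’s correlation on the ranks. The sketch below follows that approach with hypothetical data in which the relationship is monotonic but curved.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always rises with x, but not at a constant rate.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(f"Spearman's rho = {spearman_rho(x, y):.3f}")
print(f"Pearson's r    = {pearson_r(x, y):.3f}")
```

Because the ordering of `y` matches the ordering of `x` exactly, rho reaches its maximum even though the relationship is not a straight line, while Pearson’s r falls short of 1.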
(iii) Kendall’s Tau:
Kendall’s Tau is also a non-parametric correlation coefficient that measures the monotonic relationship between two ordinal / ranked variables.
When to use Kendall’s Tau Correlation Coefficient?
- Kendall’s Tau Correlation Coefficient is appropriate for ordinal data.
- It is non-parametric and can also be used on continuous data that did not meet the assumptions for conducting Pearson’s correlation.
- It is less sensitive to outliers and thus suitable for such data.
- It is better than Spearman’s correlation in handling data ties.
- It is also preferred for small sample sizes.
How to Interpret Kendall’s Tau Correlation Coefficient
Kendall’s Tau correlation coefficient has the same range as Pearson’s correlation and has a similar interpretation.
- Positive monotonic: as one variable increases, the other also increases.
- Negative monotonic: as one variable increases, the other decreases.
- No monotonic relationship: the variables show no consistent increasing or decreasing pattern.
| Degree of Monotonic Relationship | Kendall’s Tau Correlation Coefficient (τ) range (+ or -) |
| Weak / no correlation | 0 ≤ τ < 0.3 |
| Low correlation | 0.3 ≤ τ < 0.5 |
| Moderate correlation | 0.5 ≤ τ < 0.7 |
| High correlation | 0.7 ≤ τ ≤ 1.0 |
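Kendall’s Tau is built from counts of concordant and discordant pairs. The sketch below implements the tau-b variant, which adjusts for ties; choosing tau-b is an assumption, since the guide does not name a variant, and the ratings are hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b from pairwise comparisons (simple O(n^2) version)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: contributes to neither term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1  # the pair is ordered the same way on both
            else:
                discordant += 1  # the pair is ordered in opposite ways
    untied = concordant + discordant
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator


# Hypothetical ordinal ratings (1 to 5) from six participants on two items.
item_a = [1, 2, 2, 3, 4, 5]
item_b = [1, 3, 2, 3, 5, 4]

tau = kendall_tau_b(item_a, item_b)
print(f"Kendall's tau-b = {tau:.3f}")
```

For these ratings there are 12 concordant pairs, 1 discordant pair, and one tie on each item, which gives tau-b = 11/14.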
(iv) Point-Biserial Correlation
Point-biserial correlation measures the linear relationship between a continuous variable and a dichotomous variable, i.e., a variable with only two categories, such as “yes” and “no,” or “male” and “female.”
What are the Assumptions of Point-Biserial Correlation?
Point-biserial correlation is a parametric technique with the following assumptions;
(a) A Continuous and Dichotomous Variable:
- One variable should be measured on a continuous scale, either interval or ratio data, and the other dichotomous, i.e., with two categories.
(b) Independence of Observations:
- The observations should be independent of one another. One observation should not influence the other.
- The data can be randomly collected to ensure independence.
(c) Normality of the Continuous Variable:
The continuous variable should be normally distributed within each category of the dichotomous variable.
- Assumption check:
- To visually inspect normality, create histograms or normal Q-Q plots.
- To statistically test normality use the skewness and kurtosis values, the Shapiro-Wilk test, or the Kolmogorov-Smirnov test.
- Addressing Normality:
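- Transform the continuous variable using a log or Box-Cox transformation. These techniques can reduce skewness and bring the distribution closer to normal.
- If the variable remains non-normal after transformation, a non-parametric method may be more appropriate.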
(d) Homogeneity of Variances:
The continuous variable should have approximately equal variances across the two categories of the dichotomous variable.
- Assumption check:
- To statistically check for homogeneity of variances, use Levene’s test.
- Addressing Heterogeneity:
- Transform the continuous variable using log or Box-Cox transformation. These techniques can stabilize the variance.
- Weighted Least Squares.
(e) Linearly Related:
Although point-biserial correlation examines the relationship between a continuous variable and a dichotomous variable, it assumes a linear relationship between the two variables.
This implies that as the continuous variable changes, i.e., increases or decreases, the probability of the dichotomous variable being in one category linearly increases or decreases.
- Assumption check:
- To visualize the relationship, create a scatterplot using the continuous variable on the y-axis and the dichotomous variable on the x-axis.
- A linear pattern based on the categories indicates that the assumption is met.
- Addressing Linearity (Non-Linear Relationships):
- Transform the continuous variable using a log or Box-Cox transformation, which can bring the relationship closer to linear.
(f) No Outliers:
Outliers are extreme values that lie far from most of the other data points. These values can affect the correlation coefficient, leading to inaccurate conclusions.
- Assumption check:
- To check for outliers using statistical methods, calculate Z-scores (a z-score above +3 or below -3 is an outlier) or create box plots (data points that fall far outside the interquartile range, IQR, are outliers).
- To visually detect outliers, create a scatterplot and inspect the points. Data points far away from the other points in each category are outliers.
- Handling Outliers:
- Check whether the outlier is a data entry error or another anomaly. If it is, it can be corrected, deleted, or replaced with an average measure such as mean, median, or mode.
- The outlier may be retained in the analysis if it is a valid data point.
- Transforming the data can reduce the effect of outliers.
When to Use Point-Biserial Correlation:
- To determine the strength and direction of the relationship between a continuous and a dichotomous variable.
- The continuous variable should meet the assumptions of a linear relationship.
How to Interpret Point-Biserial Correlation Coefficient
- The point-Biserial Correlation Coefficient ranges from -1 to +1.
- Coefficient values closer to +1 or -1 indicate a stronger association, whereas values near zero signal a weak or no relationship.
- Correlation coefficient values less than zero indicate a negative correlation, and values above zero indicate a positive correlation.
- In a positive correlation, higher values of the continuous variable are associated with the category coded “1” (when the dichotomous variable is coded “0” and “1”).
- In a negative correlation, higher values of the continuous variable are associated with the category coded “0”.
| Degree of Linear Relationship | Point-Biserial Correlation Coefficient (r) range (+ or -) |
| Weak / no correlation | 0 ≤ r < 0.3 |
| Low correlation | 0.3 ≤ r < 0.5 |
| Moderate correlation | 0.5 ≤ r < 0.7 |
| High correlation | 0.7 ≤ r ≤ 1.0 |
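The point-biserial coefficient can be computed from the two group means. The sketch below uses the standard formula, the difference in means divided by the overall (population) standard deviation and multiplied by the square root of the product of the group proportions, which gives the same value as Pearson’s r on the 0/1 codes. The tutoring data are hypothetical.

```python
import math


def point_biserial(groups, scores):
    """Point-biserial r for 0/1 group codes and continuous scores."""
    n = len(scores)
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    mean_all = sum(scores) / n
    # Population standard deviation of all scores (divides by n, not n - 1).
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    p = len(ones) / n  # proportion of cases coded 1
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd_all * math.sqrt(p * (1 - p))


# Hypothetical data: 1 = attended tutoring, 0 = did not; scores are exam marks.
attended = [0, 0, 0, 0, 1, 1, 1, 1]
exam = [55, 60, 58, 62, 70, 75, 72, 78]

r_pb = point_biserial(attended, exam)
print(f"point-biserial r = {r_pb:.3f}")
```

The positive sign indicates that higher exam marks are associated with the category coded 1.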
(v) Phi Correlation
Phi coefficient measures the strength and direction of the association between two binary variables. It is similar to Pearson’s correlation coefficient but designed for a 2×2 contingency table.
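Its magnitude is calculated by dividing the chi-square value by the sample size and then taking the square root of the result. Because a square root is never negative, the sign of phi is taken from the pattern of counts in the 2×2 table.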
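As a minimal sketch, the chi-square route and the signed cell-count formula can be compared on a hypothetical 2×2 table with cells a, b (first row) and c, d (second row). The cell-count formula is the standard one for phi; the function names and the counts are illustrative.

```python
import math


def phi_coefficient(a, b, c, d):
    """Signed phi for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the same 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: rows = exercised (yes/no), columns = slept well (yes/no).
a, b, c, d = 30, 10, 15, 45
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
phi_from_chi2 = math.sqrt(chi_square_2x2(a, b, c, d) / n)
print(f"phi = {phi:.3f}, sqrt(chi-square / n) = {phi_from_chi2:.3f}")
```

The two values agree in magnitude; only the cell-count formula carries the sign, which becomes negative when the off-diagonal cells (b and c) dominate.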
When to Use Phi Coefficient:
- Both variables should be dichotomous (having exactly two categories).
- The data can be represented in a 2×2 contingency table.
How to Interpret Phi Correlation Coefficient
- Phi Correlation Coefficient ranges from -1 to +1.
- Coefficient values closer to +1 or -1 indicate a stronger association, whereas values near zero signal a weak or no relationship.
- Values less than zero indicate a negative correlation, and values above zero indicate a positive correlation.
- In a positive correlation, the two variables tend to occur together: cases coded “1” on one variable tend to be coded “1” on the other.
- In a negative correlation, the two variables tend to occur separately: cases coded “1” on one variable tend to be coded “0” on the other.
| Degree of Relationship | Phi Correlation Coefficient (φ) range (+ or -) |
| Weak / no correlation | 0 ≤ φ < 0.3 |
| Low correlation | 0.3 ≤ φ < 0.5 |
| Moderate correlation | 0.5 ≤ φ < 0.7 |
| High correlation | 0.7 ≤ φ ≤ 1.0 |
(vi) Cramer’s V Correlation
Cramer’s V measures the strength of association between two nominal variables. It is an alternative to phi correlation in larger than 2×2 tables.
When to Use Cramer’s V Coefficient:
- Both variables should be nominal.
- The data can be represented in a larger than 2×2 contingency table.
How to Interpret Cramer’s V Coefficient
- Cramer’s V Correlation Coefficient ranges from 0 to +1, with no negative values.
- Coefficient values greater than 0.15 indicate a strong association, whereas values near zero signal a weak or no relationship.
| Cramer V | Interpretation |
| > 0.25 | Very strong relationship |
| > 0.15 | Strong relationship |
| > 0.10 | Moderate relationship |
| > 0.05 | Weak relationship |
| ≥ 0 | No or very weak relationship |
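Cramer’s V is usually computed as the square root of chi-square divided by n × (k - 1), where n is the sample size and k is the smaller of the number of rows and columns. The sketch below applies that formula to hypothetical contingency tables given as lists of rows.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller of rows and columns
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical 3x3 tables: region (rows) by preferred product (columns).
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]   # each region picks one product
unrelated = [[10, 20, 30], [20, 40, 60], [5, 10, 15]]  # same proportions per row
mixed = [[20, 10, 5], [10, 20, 10], [5, 10, 20]]

print(f"perfect association: V = {cramers_v(perfect):.3f}")
print(f"no association:      V = {cramers_v(unrelated):.3f}")
print(f"mixed:               V = {cramers_v(mixed):.3f}")
```

The first two tables mark the ends of the scale, and any other table falls between them.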
What are the Limitations of Correlational Research?
No Causation
Correlational research does not establish causation between the variables investigated; it only describes the strength and direction of their association.
Directionality Limitations
Even when two variables are correlated, it is impossible to conclude from the correlation alone which variable influences the other.
Excludes Extraneous Variables
Furthermore, correlation is a bivariate analysis and does not control for extraneous variables, which may impact the relationship between the variables of interest. These factors limit its application in research.
What is the Difference between Correlational and Experimental Research?
Experimental research is often compared with correlational research because both use quantitative analysis to explore the relationship between variables. However, the two research methods feature unique characteristics in terms of research intent, data collection, and validity.
In experimental research, the primary intent is to establish cause-and-effect relationships between two or more variables. To achieve this, the researcher manipulates one or more independent variables and observes the effect on one or more dependent variables. Standardized control measures, such as having a control group and using random assignment to groups, are implemented in the study to minimize bias and ensure reliable and valid results.
Unlike experimental research, correlational research examines how variables associate with each other, not the causal relationship between them. In correlational research, the researcher neither manipulates nor controls the variables or the study’s settings. The variables are measured, and correlation is assessed to gauge their association.
The following table summarizes the key differences between correlational and experimental research.
| Correlational Research | Experimental Research |
| Aims to evaluate the direction and strength of association between variables. | Intends to establish causal relationships between variables. |
| No manipulation of variables. | Independent variables are manipulated to examine their effect on the dependent variable. |
| No control over research variables. | The researcher controls the research variables. Extraneous variables are controlled to reduce their impact on the variables of interest. |
| High external validity: findings are easier to generalize to other populations. | High internal validity: supports conclusions about causal relationships. |
Understanding the critical differences between correlational and experimental research is essential when choosing the correct research design for a dissertation project.
