Learn Before
Sensitivity of the Mean to Outliers
A key characteristic of the mean is its high sensitivity to extreme scores, or outliers. In a highly skewed distribution, the mean is pulled away from the median in the direction of the skew (toward the longer tail). Because extreme scores heavily influence its value, the mean may no longer accurately represent the typical score in such a dataset. This is why researchers often prefer to report the median for skewed distributions.
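The following is a minimal Python sketch of this effect using the standard-library `statistics` module. The reaction times are hypothetical values invented for illustration, not data from any study. The example adds one extreme slow response and compares how far the mean and the median move.

```python
from statistics import mean, median

def summarize(scores):
    """Return (mean, median) for a list of scores."""
    return mean(scores), median(scores)

# Hypothetical reaction times in ms (invented for illustration):
# most participants respond quickly and fairly consistently.
typical = [250, 280, 300, 310, 320, 340, 360]
# Adding one extreme slow response creates a long tail toward high values.
with_outlier = typical + [1800]

m0, md0 = summarize(typical)
m1, md1 = summarize(with_outlier)
print(f"without outlier: M = {m0:.1f} ms, median = {md0:.1f} ms")
print(f"with outlier:    M = {m1:.1f} ms, median = {md1:.1f} ms")
# The single outlier shifts the mean far more than the median, and it pulls
# the mean above the median, toward the longer (positive) tail.
```

In this sketch, one outlier moves the mean from about 308.6 ms to 495.0 ms. The median moves only from 310.0 ms to 315.0 ms.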
Tags
Ch.2 Psychological Research - Psychology @ OpenStax
Psychology @ OpenStax
Introduction to Psychology @ OpenStax Course
OpenStax
OpenStax Psychology (2nd ed.) Textbook
Psychology
Social Science
Empirical Science
Science
KPU
Research Methods in Psychology - 4th American Edition @ KPU
Related
Positively Skewed Distribution
Sensitivity of the Mean to Outliers
Negatively Skewed Distribution
Which of the following best describes the shape of a skewed distribution?
A psychology researcher measures the reaction times of participants on a memory task. They find that while most participants respond very quickly (between 200 and 400 milliseconds), a small number of participants take significantly longer (over 1,500 milliseconds). Because the scores cluster at the lower end of the scale and the tail trails off toward the higher values, this distribution is __________ skewed.
A researcher conducting a study on digital literacy finds that most participants are highly proficient and score near the top of a 100-point scale. However, a small number of participants have almost no experience with technology and score very low. This creates a negatively skewed distribution. Analyze the relationship between the measures of central tendency in this scenario and arrange them in order from the lowest numerical value to the highest numerical value.
A researcher finds that a distribution of participant reaction times is heavily positively skewed, with a mean of 1,200 ms. The researcher justifies reporting the mean as the 'typical' reaction time by arguing that it is the most comprehensive measure because it captures the specific performance of every slow-responding participant in the long tail. This justification represents a scientifically valid evaluation of how to represent the central tendency of this data.
In a skewed distribution, the 'tail' refers to the prominent peak where the majority of scores cluster most heavily.
In a skewed distribution, how does the 'tail' relate to the overall shape and clustering of the data?
Match each distribution type to the description that correctly identifies where scores cluster and where the tail extends.
A psychology researcher is analyzing the shapes of different data distributions. Match each research data distribution characteristic to its corresponding pattern of score clustering and tail direction.
When analyzing the shape of a skewed data distribution, a researcher finds a prominent peak where scores cluster heavily toward one end. To identify the direction of the skew, the researcher must locate the relative position of the _____, which trails off toward the opposite end.
Order the steps a researcher must take to evaluate the shape of a dataset's distribution and determine if it is skewed.
Excluding Outliers
Handling Valid Extreme Outliers
Example of an Outlier
Sensitivity of the Mean to Outliers
Defining Outliers using z Scores
Reaction Time Outlier Example
Impact of Outliers on the Range
Identifying Outliers Using z Scores
Handling Outliers
What term is used to describe an extreme score that falls significantly above or below the rest of the scores within a distribution?
In psychological research, an outlier in a dataset always indicates that an error occurred during data collection, such as an equipment malfunction or participant misunderstanding.
In psychological research, outliers can arise from various sources. Match each research scenario with the most likely reason for the extreme score described.
A researcher identifies a score in a dataset that falls significantly below the rest of the distribution. Arrange the following steps in the logical analytical sequence used to investigate and address this extreme value.
You are constructing a research protocol for a study on the cognitive effects of extreme stress. To ensure that your final dataset can systematically distinguish between genuine cases of 'stress resilience' (valid outliers) and 'task confusion' (measurement errors), which design element should you integrate into your data-collection phase?
Sexual Partners Survey Outlier Example
Outlier in Beck Depression Inventory Scores
A researcher is evaluating a dataset where a single participant's score is , while all other scores are between and . After confirming the participant understood the task perfectly and no errors occurred, the researcher decides to retain the score. This decision reflects a judgment that the extreme value is a(n) _____ which, despite its potential to skew the mean, represents a genuine and valid case within the study.
A researcher measures anxiety in 50 participants and finds that 49 scores fall between 20 and 45, but one participant scores 95. This extreme value, which may reflect a genuinely anxious individual or a data collection error, is called a(n) _____.
A researcher administering a depression inventory notices one participant scored far higher than the rest of the sample. If the researcher confirms this participant is indeed clinically depressed and no data entry or equipment errors occurred, this extreme score is no longer considered an outlier.
Match each description of a research scenario that produces an extreme score with its corresponding source of outlier.
When conducting data cleaning on a newly collected psychological dataset, order the stages a researcher should follow to systematically identify and categorize an extreme score.
Define what an outlier is in the context of a dataset's distribution, and describe the two broad categories of sources that can produce these extreme scores in psychological research.
Based on the provided context, diagnose the nature of this student's extreme score. Is this outlier a result of an error, or does it represent a genuine case? Explain your reasoning.
A researcher asks participants to report their age in years. Most responses range from to . One participant enters . Apply the definition of an outlier to explain what this value represents and identify the most likely reason for this extreme score based on common sources of outliers.
Example of Finding the Median
Sensitivity of the Mean to Outliers
What does the median represent in a distribution of scores?
A psychologist is analyzing the number of words recalled by a group of participants in a memory study. Arrange the steps below in the correct order to determine the median value of these recall scores.
A psychologist is evaluating how modifications to a dataset of reaction times (measured in milliseconds) affect the median. Match each potential data change to its specific analytical impact on the median value.
A psychologist studying cognitive test scores finds that most participants score near 50, but one participant scores 0 due to a computer error. The psychologist’s decision to report the median as the representative value is methodologically sound because the median’s focus on the middlemost position prevents this extreme score from distorting the perceived typical performance of the group.
A psychology researcher is designing a set of hypothetical results for a textbook to illustrate how the median remains stable despite extreme outliers. They need to construct a dataset of reaction times (measured in milliseconds) that satisfies two specific design criteria: (1) the median must be exactly , and (2) the mean must be at least to demonstrate a significant positive skew. Which of the following datasets correctly assembles these parameters?
Example of Calculating the Median
For any distribution of scores, the median must always be one of the actual observed scores present in the original dataset.
A psychological researcher is analyzing different sets of participant scores. Match each dataset scenario with the correct rule or property used to identify or describe its median.
A researcher measures the reaction times (in seconds) of six participants completing a cognitive task. The recorded times are: 14, 8, 21, 10, 18, and 12. The median reaction time for this group of participants is _____.
A psychology researcher analyzes a set of reaction times and observes that the median is a value that does not equal any of the actual observed scores in the dataset. From this outcome, the researcher can deduce that the dataset contains a(n) _____ number of scores.
A researcher is determining the most appropriate measure of central tendency to report for a dataset of cognitive processing times. Arrange the steps in the correct order to evaluate the data properties and select the representative metric.
In the context of analyzing psychological data, define the median as a measure of central tendency. Specifically explain the procedural steps a researcher must take to identify the median when working with a dataset that contains an even number of scores.
Based on the researcher's finding that the median is 4, explain what this value indicates about the distribution of aggressive play behaviors among the children in the classroom.
A clinical psychologist administers a short anxiety inventory to four patients. The recorded scores are 12, 18, 9, and 15. Briefly explain how the psychologist should calculate the median for this specific set of scores and provide the final median value.
Sensitivity of the Mean to Outliers
Usefulness of the Mean in Statistical Analyses
t-Test
Statistical Mean Formula
Hypothetical Population Mean
z Score
Standard Deviation
Which measure of central tendency is calculated by finding the sum of all scores in a distribution and dividing that sum by the total number of scores?
In a psychological study, if a researcher increases the value of one participant's score while keeping the total number of participants the same, the mean (M) of the distribution will also increase.
A researcher is comparing the number of errors made by four different groups of participants on a cognitive task. Based on the scores provided for each group, arrange the groups in order from the lowest mean (M) number of errors to the highest mean (M) number of errors.
A psychology researcher is evaluating how various modifications to a dataset impact the calculated mean (M). Match each modification to its specific logical consequence for the resulting mean.
A cognitive psychologist is designing a study to test a new mnemonic device. To establish baseline equivalence, the researcher must construct two pilot groups ( each) where the mean (M) recall score for the 'Experimental' group is designed to be exactly points higher than the mean (M) for the 'Control' group. Which of the following data generation plans successfully constructs these groups to meet this research requirement?
Formula for the Mean
In psychological research, the mean (symbolized as M) is a widely used measure of central tendency because it has mathematical properties that are valuable for inferential statistics.
A researcher is evaluating a peer's statistical report and discovers that the mean (M) was calculated by dividing the sum of all scores by the total number of participants minus two. The researcher would judge this resulting value to be a(n) _____ representation of the distribution's average.
A cognitive psychologist records the reaction times of three participants as 4, 6, and 8 seconds. Match each component of the statistical mean formula (M = ΣX/N) to its applied value or role in this dataset.
A researcher is calculating a z score for a participant. According to the relationship between descriptive statistics, the researcher must find the difference between that individual's score and the distribution's _____ before dividing by the standard deviation.
A psychologist needs to evaluate whether a new therapy group has a higher average wellness score than a control group. Arrange the steps the psychologist should take to calculate the mean (M) for the therapy group to begin this evaluation.
Define the statistical mean as a measure of central tendency. In your response, explicitly state how it is calculated and provide two reasons why it is widely used in psychological research.
Based on the researcher's calculation method, identify which measure of central tendency she is using and explain why this specific measure is necessary for her planned follow-up analyses (calculating z scores and performing inferential statistics).
A clinical psychologist measures the number of sleep interruptions for a patient over three nights, recording scores of 2, 4, and 6 interruptions. Using the standard statistical formula (M = ΣX/N), calculate the mean number of sleep interruptions for this patient.
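The related cards above rely on three calculations:

* the mean, M = ΣX/N
* the median, including the even-N rule of averaging the two middle scores
* the z score, (X − M)/SD

The following is a minimal Python sketch of all three. The helper names and the recall scores are hypothetical and chosen for illustration. The z score uses the population standard deviation (dividing by N) via `statistics.pstdev`. That divisor is an assumption, because some texts divide by N − 1 instead.

```python
from statistics import pstdev

def mean_score(scores):
    """M = sum(X) / N: add all scores, then divide by the number of scores."""
    return sum(scores) / len(scores)

def median_score(scores):
    """Middle score after sorting; for an even N, the mean of the two middle scores."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def z_score(x, scores):
    """z = (X - M) / SD, assuming the population SD (divide by N)."""
    return (x - mean_score(scores)) / pstdev(scores)

# Hypothetical recall scores (invented for illustration), even N = 6.
recall = [9, 4, 2, 11, 6, 4]
print(mean_score(recall))          # sum 36 / N 6
print(median_score(recall))        # middle scores 4 and 6 after sorting
print(round(z_score(11, recall), 2))
```

With an even number of scores, the median here (5.0) is not one of the observed scores. This can only happen when N is even and the two middle scores differ.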
Learn After
A real estate agent is marketing a neighborhood of 10 homes. Nine of the homes are valued at approximately $200,000 each, while one newly built mansion is valued at $3,200,000. The agent advertises the 'average home value' in the neighborhood as $500,000. Which statement best evaluates the agent's use of this figure?
Example of the Mean's Sensitivity to Outliers
In a highly skewed distribution, how does the presence of extreme scores typically affect the mean?
In a distribution of reaction times, the mean will typically be more significantly shifted by the presence of a few extreme outliers than the median will.
Match each research scenario with the expected relationship between the mean and the median, based on how the presence of extreme scores (outliers) influences the mean.
A researcher is analyzing a dataset of participants' reaction times that is positively skewed due to a few participants who took an exceptionally long time to respond (outliers). Based on the sensitivity of the mean to these extreme scores, arrange the following values in order from the lowest numerical value to the highest numerical value within this specific distribution.
A researcher is developing a reporting protocol for a psychology study on reaction times. One participant's response is times slower than everyone else's. Arrange the following steps in the correct order to create a data-summary protocol that ensures the final report accurately represents the 'typical' participant by accounting for the mean's sensitivity to extreme scores.
In a skewed distribution, the mean is typically pulled away from the median in the direction of the shorter tail of the distribution.
A researcher is examining the weekly study hours of a group of psychology students. Most students study between and hours per week, but two students report studying hours per week. Which statement best explains why the mean of this dataset is not a good representation of the typical student's study hours?
A researcher is evaluating the central tendency of a dataset containing several extreme outliers that have significantly shifted the average value. After reviewing the data, the researcher concludes that the mean is a(n) _____ representation of the typical score because its high sensitivity to those outliers causes it to no longer accurately reflect the center of the distribution.
Analyze the impact of different distribution characteristics on central tendency measures by matching each concept with the statement that best describes its behavior or role in a dataset.
A researcher is evaluating a reaction time dataset and finds that a few extremely long response times have created a highly skewed distribution. The researcher must determine which measure of central tendency provides the most accurate representation of the typical score. Because the mean is pulled away from the center by these extreme scores, the researcher decides that the mean is not an accurate representation and evaluates that they should report the _____ instead.
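The real estate scenario above shows how one extreme value inflates the mean. The following is a minimal Python sketch of that arithmetic. It assumes each of the nine ordinary homes is worth exactly $200,000; the question only says "approximately."

```python
from statistics import mean, median

# Assumption: nine homes at exactly $200,000 each, plus one $3,200,000 mansion.
home_values = [200_000] * 9 + [3_200_000]

advertised_average = mean(home_values)    # (9 * 200,000 + 3,200,000) / 10
typical_value = median(home_values)       # middle of the sorted values
below_mean = sum(v < advertised_average for v in home_values)

print(f"mean:   ${advertised_average:,.0f}")
print(f"median: ${typical_value:,.0f}")
print(f"{below_mean} of {len(home_values)} homes are worth less than the mean")
```

Under this assumption, the mean reproduces the advertised $500,000, yet 9 of the 10 homes are worth less than that figure. The median of $200,000 is much closer to the typical home.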