Very often in practice we are called upon to make decisions about populations on the basis of sample information. Such decisions are called statistical decisions. For example, we may wish to decide on the basis of sample data whether one psychological procedure is better than another, whether the findings from survey data are representative of the population, whether the conclusions drawn from an experiment are valid, etc.
What is a Hypothesis?
A hypothesis is a conjecture that can be used to explain observations: a hunch, or educated guess, advanced for the purpose of being tested. A hypothesis provides us with a guiding idea for determining what is relevant and what is irrelevant. This mode of accounting for problems has three steps:
1. the proposal of a hypothesis to account for a phenomenon
2. the deduction from the hypothesis that certain phenomena should be observed in given circumstances
3. the checking of this deduction by observation.
Example
We may reason that a deprived family background causes low reading attainment in children. We may then produce empirical evidence that low family income and overcrowding are associated with poor reading attainment. If no such evidence were forthcoming, the hypothesis would have to be rejected. But if the predicted relationship were found, could we conclude that the hypothesis was correct, i.e. that poor family background does cause low reading attainment? The answer must be ‘no’. It might equally be the case that a low level of school resources is to blame. The main point to note is that the scientific process never leads to certainty in explanation, only to the rejection of existing hypotheses and the construction of new ones.
How do we Formulate Hypotheses?
Generally, a hypothesis is derived from theory or the literature on the problem. But regardless of source, a hypothesis must meet one criterion: it must be stated in such a way that it can be confirmed or refuted. In other words, it must be amenable to test.
Example “Is a student’s authoritarian behaviour directly related to his or her attitudes concerning punishment received from his or her parents?”
Although the statement of purpose is quite detailed, it conveys little information unless the conceptual propositions are also detailed: what do we mean by ‘authoritarian behaviour’, ‘attitudes concerning punishment’ and ‘received’? Although most individuals may know the everyday meanings, these lack scientific precision.
Once the conceptual propositions have been established (i.e. the meanings in scientific terms), we then need an operational proposition that defines the concepts in such a way that they can be observed and measured. An operational measure may be derived from a score achieved on a particular scale of authoritarianism. Similarly, we may study the relationship between childhood aggression and exposure to violent television programmes, but we still need to define both variables under study – aggression and television violence – in operational terms. The former might be simply a tally of aggressive acts such as hitting, fighting, damaging property, etc., or it might be based on the analysis of projective test material (e.g. the Thematic Apperception Test). A panel of judges might develop an operational definition of aggression by watching a child in a free-play situation and then rating the child’s aggressiveness on a five-point scale. Alternatively, we could observe children as they play with a selection of toys we had previously classified as aggressive (guns, tanks, knives, etc.) and toys classified as non-aggressive (cars, dolls, puzzles, etc.).
Defining violence may be a little more difficult to agree on. What constitutes television violence? The problem here is partly cultural and partly the difference in precision between what the general public will accept when defining a term and what researchers will accept. To operationalize the concept of television violence we could use a checklist of items, such as “Was there physical contact of an aggressive nature?”, “Has an illegal act taken place?”, etc. We might then establish a criterion that a television programme must have five or more items checked ‘yes’ for it to be considered violent.
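Such a checklist criterion is easy to express in code. Here is a minimal sketch; the first two items come from the text, the remaining items and the threshold of five are purely illustrative:

```python
# Hypothetical checklist for operationalizing "television violence".
# Each answer is True ('yes') or False ('no') for a given programme.
CHECKLIST = [
    "Was there physical contact of an aggressive nature?",  # from the text
    "Has an illegal act taken place?",                      # from the text
    "Was a weapon shown being used?",                       # invented item
    "Was anyone injured?",                                  # invented item
    "Was property deliberately damaged?",                   # invented item
    "Were verbal threats made?",                            # invented item
]

def is_violent(answers, threshold=5):
    """Classify a programme as violent if at least `threshold` items are 'yes'."""
    return sum(answers) >= threshold

# Five 'yes' answers meet the criterion; two do not.
print(is_violent([True, True, True, True, True, False]))    # True
print(is_violent([True, True, False, False, False, False])) # False
```

The point of the sketch is that once the criterion is written down this explicitly, two raters applying it to the same programme should reach the same classification.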
From the General to the Operational
Problem or General Hypothesis: You expect some children to read better than others because they come from homes in which there are positive values and attitudes towards education.
Research Hypothesis: Reading ability in nine-year-old children is related to parental attitudes towards education.
Operational Hypothesis: There is a significant relationship between reading ability in nine-year-old children living in Carlisle, as measured by a standardized reading test (state test), and parental attitudes to education, as measured by the attitudinal scale derived from test (state test).
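Testing an operational hypothesis of this kind typically comes down to computing a correlation between the two measures. A minimal sketch, using invented scores (in practice the numbers would come from the named reading test and attitude scale):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical reading scores and parental-attitude scores for ten children.
reading  = [88, 94, 75, 82, 90, 68, 79, 85, 92, 71]
attitude = [30, 34, 22, 28, 33, 18, 25, 29, 35, 20]
print(round(pearson_r(reading, attitude), 3))
```

A strong positive r would be consistent with the hypothesis, but, as the family-background example above showed, it would not by itself establish causation.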
Criteria for Judging Hypotheses
1. Hypotheses should be clearly stated. General terms such as personality, self-esteem, moral fibre, etc. should be avoided: the statement demands concise, scientific definitions of the concepts and terms:
“Personality as measured by the Eysenck Personality Inventory . . .”
2. Hypotheses predict an outcome, so an obvious necessity is that instruments exist to provide valid and reliable measures of the variables involved.
3. Hypotheses should state differences or relationships between variables. A satisfactory hypothesis is one in which the expected relationship is made explicit.
4. Hypotheses should be limited in scope. Hypotheses of global significance are not required. Those that are specific and relatively simple to test are preferable.
5. Hypotheses should not be inconsistent with known facts. All hypotheses should be grounded in past knowledge. The hypothesis should not lead the cynical reader to say: “Whatever led you to expect that?” or “You made this one up after you collected the data.”
Unconfirmed Hypotheses
Does an unconfirmed hypothesis invalidate prior knowledge or the literature? Well, either the hypothesis is false, or some of the previous information is erroneous, or other information has been overlooked, or some information has been misinterpreted by the researcher, or the experimental design was incorrect. A new hypothesis may need to be formulated and tested in a different study – scientific progress through the development of alternative paradigms! Even if the hypothesis is refuted, knowledge is advanced.
Statistical Hypotheses
In attempting to reach decisions, it is useful to make assumptions about the population involved in the study. Such assumptions, which may or may not be true, are called statistical hypotheses and are in general statements about the probability distributions of the population. In many instances we formulate a statistical hypothesis for the sole purpose of rejecting or nullifying it. For example, if we want to decide whether one psychological procedure is better than another, we formulate the hypothesis that there is no significant difference between the procedures (i.e. any observed differences are merely due to sampling error from the same population). Such hypotheses are called null hypotheses and are denoted by the symbol H0. Any hypothesis that differs from a given hypothesis is called an alternative hypothesis. A hypothesis alternative to the null hypothesis is denoted H1.
Tests of Hypotheses and Significance
If, on the supposition that a particular hypothesis is true, we find that results observed in a random sample differ markedly from those expected under the hypothesis on the basis of chance (using sampling theory), then we say that the observed differences are significant and we would be inclined to reject the hypothesis (or at least not accept it on the basis of the evidence obtained). Procedures that enable us to decide whether to accept or reject hypotheses, or to determine whether observed samples differ significantly from expected results, are called tests of hypotheses, tests of significance or rules of decision.
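A concrete sketch of such a decision rule: a two-sample z-test comparing the two procedures from the earlier example. The data are invented, and using sample standard deviations in place of population values is a rough approximation that is reasonable only for fairly large samples:

```python
from statistics import NormalDist, mean, stdev

def two_sample_z(a, b):
    """Approximate two-sample z statistic and two-tailed p-value.
    Sample standard deviations stand in for population values."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
    return z, p

# Hypothetical scores under two procedures.
proc_a = [52, 55, 58, 60, 61, 63, 64, 66, 68, 70]
proc_b = [48, 50, 51, 53, 54, 56, 57, 59, 60, 62]
z, p = two_sample_z(proc_a, proc_b)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: the difference is significant at the 5% level.")
```

The decision rule is exactly the one described above: if the sample result would be very improbable under H0, we reject H0.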
Type I and Type II Errors
If we reject a hypothesis when it should be accepted, we say that a Type I error has been made. On the other hand, if we accept a hypothesis when it should be rejected, we say that a Type II error has been made. In either case a wrong decision, or error of judgement, has occurred. For tests of hypotheses to be sound, they must be designed so as to minimize these errors. The probability of making a Type I error is fixed before the experiment: it is limited by choosing a level of significance, and a level of significance of 0.05 implies a 5 per cent chance of rejecting the hypothesis when it is in fact true. The probability of making a Type II error generally cannot be determined before the experiment. It is possible to avoid risking a Type II error altogether simply by never accepting any hypothesis, but this is rarely practical. For any given sample size, an attempt to reduce one type of error is usually accompanied by an increase in the other; the only way to reduce both types of error is to increase the sample size, which may or may not be possible.
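The meaning of the 5 per cent level can be checked by simulation: if H0 is true and we reject whenever the z statistic exceeds the two-tailed critical value, we should reject in roughly 5 per cent of repeated experiments. A rough sketch (sample size, number of trials and the known standard deviation are all illustrative):

```python
import random
from statistics import NormalDist, mean

random.seed(1)
CRIT = NormalDist().inv_cdf(0.975)  # two-tailed critical value at alpha = 0.05

trials, rejections = 5000, 0
n, sigma = 30, 1.0
for _ in range(trials):
    # Draw a sample from the null population (mean 0, known sigma),
    # so every rejection here is, by construction, a Type I error.
    sample = [random.gauss(0, sigma) for _ in range(n)]
    z = mean(sample) / (sigma / n ** 0.5)
    if abs(z) > CRIT:
        rejections += 1

print(rejections / trials)  # should be close to 0.05
```

Estimating the Type II error rate the same way would require committing to a specific alternative hypothesis, which is why it cannot be fixed in advance the way alpha can.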
One-Tailed and Two-Tailed Tests
Often we are interested in extreme values of the statistic X, or of its corresponding standard z score, on both sides of the mean, i.e. in both “tails” of the distribution. For this reason such tests are called two-tailed tests. In such tests the combined area of the critical regions in the two tails is equal to the level of significance, with half of it in each tail (e.g. 0.025 in each tail at the 0.05 level). However, we may be interested only in extreme values to one side of the mean, for example when we are testing the hypothesis that one procedure is better than another. Such tests are called one-tailed tests, and the critical region on one side of the distribution has an area equal to the full level of significance.
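The difference shows up directly in the critical values. At the 0.05 level a two-tailed test puts 0.025 in each tail, while a one-tailed test puts the whole 0.05 in one tail, giving a smaller critical value:

```python
from statistics import NormalDist

alpha = 0.05
z_two = NormalDist().inv_cdf(1 - alpha / 2)  # 0.025 in each tail
z_one = NormalDist().inv_cdf(1 - alpha)      # all 0.05 in one tail

print(f"two-tailed critical z: +/-{z_two:.3f}")  # +/-1.960
print(f"one-tailed critical z:    {z_one:.3f}")  #   1.645
```

This is why a directional (one-tailed) prediction is easier to confirm at a given significance level, and why the direction must be specified before the data are collected.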
Thanks