Appendix A
METHODOLOGY TABLES
- Table A-1: General Criteria for Selecting Studies for In-Depth Review
- Table A-2: Additional Criteria for In-Depth Review of Studies on Assessment Methods
- Table A-3: Additional Criteria for In-Depth Review of Studies on Intervention Methods
- Table A-4: Interpreting the Sensitivity and Specificity of Assessment Tests
- Table A-5: Strength of Evidence Ratings for Guideline Recommendations
- Table A-6: Ratings for Applicability of Evidence to the Guideline Topic
- Table A-7: Characteristics of Single-Subject Methodology Used to Study Behavioral Interventions
- Table A-8: Criteria for In-Depth Review for Single-Subject Design Studies of Interventions for Children with Autism
Table A-1
General Criteria for Selecting Studies for In-Depth Review
The panel established quality criteria for the in-depth review of scientific articles considered to contain adequate evidence about efficacy of assessment or intervention methods.
To be selected for in-depth review by the panel, a scientific article had to meet all of the general criteria for in-depth review given below, as well as all the additional criteria for either assessment or intervention methods given in Tables A-2 and A-3, respectively.
To meet the general criteria for in-depth review, studies had to:
- Be published in English in a peer-reviewed scientific/academic publication
- Provide original data about efficacy of an assessment or intervention method for autism (or be a systematic synthesis of such data from other studies)
- Evaluate an assessment or intervention method currently available to providers in the U.S. (and not evaluate just an obsolete or clearly experimental method)
- Provide an adequate description of the assessment or intervention methods evaluated, or provide a reference where such a description could be found
- Evaluate subjects of appropriate age (see Table A-6)
Table A-2
Additional Criteria for In-Depth Review of Studies on Assessment Methods
To be considered as adequate evidence about efficacy, studies of assessment methods had to meet all criteria given in A and B below:
A. Meet all the general criteria for in-depth review given in Table A-1,
and
B. Meet all the following additional criteria for studies of assessment methods:
- Compare the findings of the test evaluated to an adequate reference standard *
- Give the sensitivity and specificity of the test compared to an adequate reference standard or provide enough data so that these can be calculated
* The clinical judgment of an experienced, qualified professional using DSM-III, DSM-III-R, or DSM-IV was considered to be an adequate reference standard.
Table A-3
Additional Criteria for In-Depth Review of Studies on Intervention Methods
To be considered as adequate evidence about efficacy, studies of intervention methods had to meet all criteria given in A and B below:
A. Meet all the general criteria for in-depth review given in Table A-1,
and
B. Meet all the following additional criteria for studies of intervention methods:
All intervention studies had to:
- Evaluate functional outcomes that are important to a child's overall health or development or are important for the family or society
Studies using group designs had to:
- Be controlled trials evaluating a group receiving the intervention compared to a group(s) receiving no intervention or a different intervention
- Assign subjects to groups either randomly or using a method that did not appear to significantly bias the results
- Use equivalent methods for measuring baseline subject characteristics and outcomes for all groups studied
Studies using single-subject designs had to:
- Use an acceptable research design (see Table A-8 for more detailed information about the criteria for acceptable single-subject design studies)
- Report on at least three subjects
Table A-4
Interpreting the Sensitivity and Specificity of Assessment Tests
The established method for evaluating the efficacy (or accuracy) of an assessment test is to determine its sensitivity and specificity compared to an adequate reference standard.
- Reference Standard: An alternative method to determine if a subject actually has the condition that the test is attempting to identify. It is important that the reference standard be independent of the test being evaluated. It is also presumed that the reference standard is a more accurate way to identify the condition than is the test being evaluated. To be useful in calculating sensitivity and specificity, a reference standard has to have specified diagnostic criteria to determine if a person does or does not have the condition.
- Sensitivity: The percentage of all persons who have the condition (according to the reference standard) and are correctly identified by a positive test result (the true positive rate).
- Specificity: The percentage of all persons who do not have the condition (according to the reference standard) and are correctly identified by a negative test result (the true negative rate).
- Criteria for determining positive or negative tests: The rules to interpret test results and determine if the test is positive (indicating the individual has the condition) or negative (indicating that the person does not have the condition). If a test provides numerical measurements, then a specific number or "cut-off score" is used to determine positive and negative tests. If a test uses a set of more descriptive criteria, these are often referred to as the "cut-off criteria" for the test.
Methods for calculation of sensitivity and specificity

Sensitivity and specificity are calculated from a 2 x 2 table comparing the results of the test being evaluated to the reference standard:

| Test result | Condition present* | Condition absent* |
| --- | --- | --- |
| Positive | True positive (TP) | False positive (FP) |
| Negative | False negative (FN) | True negative (TN) |

*According to the reference standard

Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)

- The higher the sensitivity and specificity, the greater the accuracy of the test

The perfect test would have both sensitivity and specificity of 100%, with no false positive or false negative results: it would correctly identify all those with the condition and all those without it.
- What is "acceptable" for sensitivity or specificity depends on the situation

In the real world, assessment methods for screening and early identification of a disorder rarely have perfect sensitivity and specificity. There is no general agreement about acceptable levels of sensitivity and specificity for an assessment test. Acceptable levels vary depending upon the intent of the test, the setting of testing (for example, general population or a specific subgroup at risk for the condition), the prevalence of the condition in the group being tested, alternate methods of assessment, and the costs and benefits of testing.
- Effect of different test or reference standard criteria on sensitivity / specificity

To calculate the sensitivity and specificity of a test, the reference standard must employ specific criteria for determining if a person does or does not have a condition. Also, the cut-off criteria must be given for both the test being evaluated and the reference standard. Using different cut-off criteria for either the test or the reference standard (or using a different reference standard) will result in different values for sensitivity and specificity for the test.
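As a concrete sketch of the calculation described above, the following Python snippet computes sensitivity and specificity from the four cells of the 2 x 2 comparison. The counts are hypothetical, invented for illustration, and are not data from any study reviewed here.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Compute (sensitivity, specificity) from the four cells of a
    2 x 2 table comparing a test to a reference standard.

    tp: test positive, condition present (true positives)
    fp: test positive, condition absent  (false positives)
    fn: test negative, condition present (false negatives)
    tn: test negative, condition absent  (true negatives)
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity

# Hypothetical screening example: of 50 children with the condition
# (per the reference standard), 40 screen positive; of 950 children
# without the condition, 900 screen negative.
sens, spec = sensitivity_specificity(tp=40, fp=50, fn=10, tn=900)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.1%}")
```

Note that changing the cut-off score of the test shifts counts between the cells (e.g., a stricter cut-off moves cases from TP to FN and from FP to TN), which is why different cut-off criteria yield different sensitivity and specificity values.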
Table A-5
Strength of Evidence Ratings for Guideline Recommendations
Each of the guideline recommendations in Chapters III and IV is followed by one of the four "strength of evidence" ratings described below. These strength of evidence ratings indicate the amount, general quality, and clinical applicability (to the guideline topic) of scientific evidence used as the basis for each guideline recommendation.
[A] = Strong evidence is defined as evidence from two or more studies that met criteria for adequate evidence about efficacy and had at least moderate applicability to the topic, with the evidence consistently and strongly supporting the recommendation.

[B] = Moderate evidence is defined as evidence from at least one study that met criteria for adequate evidence about efficacy and had at least moderate applicability to the topic, and where the evidence supports the recommendation.

[C] = Limited evidence is defined as evidence from at least one study that met criteria for adequate evidence about efficacy and had at least minimally acceptable applicability to the topic, and where the evidence supports the recommendation.

[D] = Panel consensus opinion (either [D1] or [D2] below):

[D1] = Panel consensus opinion based on information not meeting criteria for adequate evidence about efficacy, on topics where a systematic review of the literature was done

[D2] = Panel consensus opinion on topics where a systematic literature review was not done
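The ratings above can be read as a simple decision procedure. The sketch below encodes one plausible reading of the rules in Python; the function name and inputs are our own, and the panel's requirement that strong evidence "consistently and strongly" support a recommendation is reduced here to a single flag, so this is illustrative only.

```python
def strength_of_evidence(n_moderate_applic, n_minimal_applic,
                         consistent_strong=False, review_done=True):
    """One reading of the strength-of-evidence rules (illustrative).

    n_moderate_applic: adequate-evidence studies with at least
        moderate applicability to the topic
    n_minimal_applic: adequate-evidence studies with at least
        minimally acceptable applicability (includes the above)
    consistent_strong: whether the evidence consistently and
        strongly supports the recommendation
    review_done: whether a systematic literature review was done
    """
    if not review_done:
        return "D2"
    if n_moderate_applic >= 2 and consistent_strong:
        return "A"
    if n_moderate_applic >= 1:
        return "B"
    if n_minimal_applic >= 1:
        return "C"
    return "D1"  # review done, but no adequate evidence found
```

For example, two adequate studies of at least moderate applicability with consistent, strong support yield an [A]; a single such study yields a [B].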
Table A-6
Ratings for Applicability of Evidence to the Guideline Topic
This table provides criteria for rating a study's evidence in terms of its applicability to the guideline topic (assessment and treatment/intervention of autism in children from birth to 3 years old). The categories below for acceptable and unacceptable applicability refer to the children's ages at the beginning of the study. For studies that present data separately for age subgroups, the applicability rating applies to the evidence used rather than to the entire study.
Acceptable ratings of applicability for this guideline, by age groupings:
High = All children under 3 years old
Moderately high = All children under 4 years old
Moderate = Some children under 3 years old; all children under 6 years old
Moderately low = All children from 4 to 6 years old
Low = Some children under and some over 6 years old

Unacceptable applicability for this guideline:

Unacceptable = No children under 6 years old
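For illustration, the age rules above can be encoded as a function of the youngest and oldest subject ages at the start of a study. This is a hypothetical helper, not part of the guideline; age profiles the rubric does not cover return None.

```python
def applicability_rating(min_age, max_age):
    """Map a study's subject age range (in years, at study start) to
    an applicability rating. Illustrative sketch only.
    """
    if min_age >= 6:                   # no children under 6
        return "Unacceptable"
    if max_age < 3:                    # all children under 3
        return "High"
    if max_age < 4:                    # all children under 4
        return "Moderately high"
    if min_age < 3 and max_age < 6:    # some under 3; all under 6
        return "Moderate"
    if min_age >= 4 and max_age <= 6:  # all from 4 to 6
        return "Moderately low"
    if max_age > 6:                    # some under and some over 6
        return "Low"
    return None                        # age profile not covered

print(applicability_rating(2, 5))  # → Moderate
```

Note the ordering matters: the "Unacceptable" check must come first, since a study with no children under 6 would otherwise match the "Low" rule.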
Table A-7
Characteristics of Single-Subject Methodology Used to Study Behavioral Interventions
Introduction to single-subject methodology
-
Single-subject methodology is an approach to determine the effect of an intervention on an individual. In contrast, group research designs focus on differences between groups resulting from different interventions.
Acceptable single-subject study designs are based on repeated controlled application of the intervention to demonstrate its effectiveness and the use of appropriate control conditions to evaluate the degree of change as compared to pre-intervention behavior variability. Case series and anecdotal reports simply present "pre-post" information and do not meet these research standards.
Single-subject design studies involve systematically observing and recording an individual's specific behaviors. Repeated measurements of the behavior (the frequency of the behavior within a discrete period of time) are then recorded on a graph. Patterns are visually analyzed to determine if the changes in behavior are due to the intervention.
Single-subject methodology is used by researchers to evaluate the general effectiveness of specific intervention techniques. Single-subject methodology can also be used by the professional providing the intervention to evaluate the effect of a specific intervention for an individual child.
In studying specific behavioral interventions, single-subject design methods have the following advantages:
- Focusing on the process of change in an individual's behavior
- Being compatible with and complementing group research
- Addressing the confounding variables present in anecdotal or simple case studies (such as the effects of development and maturation, behavior variation and reactivity, coincidental events, measurement error, expectation and unintended observer bias)
- Examining the specific relationship between the intervention and directly observable and quantifiable change displayed by the individual receiving intervention
- Following a methodology that can be specifically replicated by other clinicians and researchers
- Being flexible and able to adapt to the individual and interventions that are under study
Commonly used single-subject designs
Commonly used single-subject designs include:
- Within Series (sequential implementation):
  - A-B-A
  - A-B-A-B
  - Other A-B permutations
- Between Series (concurrent implementation):
  - Alternating treatments
  - Simultaneous treatments
- Combined Series (multiple baseline):
  - Multiple baseline across behaviors
  - Multiple baseline across settings
  - Multiple baseline across individuals
  - Permutations of the above
The purpose of all these study designs is to differentiate between normally occurring variation in the person's behavior and the effects of the intervention. As used above, "A" identifies a period when no treatment is given, and "B" refers to the intervention phase, in which the treatment procedure is introduced in a controlled fashion. Multiple baseline designs conduct measurement of several dependent variables, subjects, or settings simultaneously to examine possible extraneous factors that may be influencing behavior change.
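To make the A/B logic concrete, here is a small Python sketch with invented session counts (not data from any reviewed study) that summarizes a hypothetical A-B-A-B study by computing the mean behavior frequency in each phase. In an actual study, these session-by-session measurements would be graphed and the patterns analyzed visually rather than reduced to phase means.

```python
# Invented session-by-session counts of a target behavior in an
# A-B-A-B design: A phases are baseline (no treatment), B phases
# are intervention.
phases = {
    "A1": [9, 8, 10, 9],  # first baseline
    "B1": [4, 3, 3, 2],   # first intervention phase
    "A2": [8, 9, 8, 9],   # withdrawal: behavior returns toward baseline
    "B2": [2, 2, 1, 2],   # reintroduction: the effect is replicated
}

# Mean frequency per phase; a drop in each B phase relative to the
# surrounding A phases, replicated across both A-B pairs, supports
# attributing the change to the intervention.
means = {phase: sum(obs) / len(obs) for phase, obs in phases.items()}
for phase, m in means.items():
    print(f"{phase}: mean frequency = {m:.2f}")
```

The replication is the key control: if the behavior change appeared in B1 but did not reverse in A2 and reappear in B2, coincidental events or maturation would remain plausible explanations.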
Table A-8
Criteria for In-Depth Review for Single-Subject Design Studies of Interventions for Children with Autism
Articles that used a single-subject design methodology to evaluate an intervention were considered to contain adequate evidence about efficacy for that method and were selected for in-depth review if all the following criteria were met. The study had to:
- Evaluate a behavioral intervention method or technique
- Evaluate three or more subjects with autism, including at least one child under 6 years old
- Report on functional outcomes important for the child or the family (or some intermediate outcome demonstrated to be related to a functional outcome)
- Use one of the following standard single-subject research designs:
- multiple baseline
- alternating treatment
- A-B-A-B
- combination of the above designs