Statistical Research FAQ
Statistical Research FAQ Answers
-
In a study of 58,288 students, Carini, Hayek, Kuh and Ouimet found through
multivariate regression analysis that mode effects were generally small.
However, students did tend to respond slightly more favorably on a web-based
form. (1) In addition, a 1993 study by Cates found that individual items were
highly correlated across the paper and computerized instruments, with almost
identically high reliability. (2)
-
Perhaps the best research we found was from McGourty, J., Scoles, K., & Thorpe, S.:
"An investigation of non-response bias was conducted at Drexel University for Fall 2001 term to
determine if some types of students were more likely to complete their course
evaluations than others. The study examined various student demographic
characteristics including student sex, minority status, class standing, and
cumulative grade point average (GPA). The study revealed that student sex,
class standing, and cumulative GPA were predictors of student completion of the
course evaluation process. Interestingly, women were more likely than men to
complete the course evaluation process. Women completed about 54% of the
potential course evaluations whereas the male students completed only 49% of
their assigned course evaluations.
Juniors and seniors were more likely to complete
course evaluations than sophomore and pre-junior students (the majority of
students at Drexel University are enrolled in five year programs). The seniors
completed 54% of their assigned course evaluations, while juniors completed
46% of their potential course evaluations. Interestingly, the freshmen
completed the greatest percentage of assigned course evaluations, 58%.
Since the courses enrolled tend to follow class
standing, these results suggest that additional encouragement should be
directed to sophomore and pre-junior classes by having faculty in those classes
encourage students to complete the evaluation process.
Perhaps not surprising, students with higher
cumulative GPAs were more likely to complete the online course evaluations than
those students with lower cumulative GPAs. This finding may affect the results
of the course evaluation process since students who are performing less well
academically are not equally represented in the course evaluation responses.
However, it is not clear what effect the underrepresentation of this
population has on the overall course evaluations." (3)
-
Marsh, H.W. & Roche, L. A. (1997), in their review of multisection studies,
summarize the relationships found between student ratings and background
characteristics. Their summary concludes the following:
- Popular classes that have higher interest are rated more favorably, although
it is not always clear whether interest existed before the start of the course
or was generated by the course or the instructor.
- Class average grades ARE correlated with class average student ratings.
Interpretation depends on whether grades are due to a leniency effect, superior
learning, or pre-existing differences.
- Elective courses, and courses in which more students enroll out of general
interest, tend to be rated higher.
- Difficult courses that require more time and effort are rated somewhat more favorably.
- Smaller classes are rated somewhat more favorably than larger classes.
- Graduate level courses are rated more favorably than undergraduate courses,
and upper level courses are rated higher than lower level courses.
- Ratings are somewhat higher if it is known that they are used for
tenure/promotion decisions.
- Ratings are somewhat higher if not anonymous and if the instructor is present
when the ratings are being completed.
- Mixed findings and/or little or no effect are found for the factors of
instructor rank, gender of instructor or student, academic discipline, and
students' personality.
-
Marsh, H.W. & Roche, L. A. (1997) argue that teaching is multidimensional
and thus the validity and usefulness of student evaluation information depends
on the content and the coverage of the items contained in a rating instrument.
Poorly worded or inappropriate items will not provide useful information,
whereas scores averaged across an ill-defined assortment of items offer no
basis for knowing what is being measured. Their methodology focuses on factor
analytic studies of many ratings instruments and identifies nine factors that
show up with regularity. The factors are learning/value, instructor
enthusiasm, organization/clarity, group interaction, individual rapport,
breadth of coverage, examinations/grading, assignments/readings, and
workload/difficulty. Marsh & Roche have developed an
instrument based on these factors. More important, however, is their argument
that student ratings comprised of only global items are bound to miss the
richness and complexity of teaching. They believe that this richness and
complexity should be taken into account when ratings are used for
administrative and personnel purposes. In other words, committees and administrators
should take the time to understand what the ratings might be saying about a
faculty member's teaching. (4)
-
Richard Naylor of Burns Owens Partnership studied some of the methodological principles
underlying the kind of work he is involved in. In an ideal world, he
explained, it would be possible to take a census approach to
research, and survey everyone in the population universe that
you wish to examine. In practice, it is never possible to achieve a 100%
response rate, not even when the Government conducts an official census.
Instead, you must work with a survey sample that accurately
represents the population universe.
Survey results are typically 'grossed-up' to provide figures for the whole
population. For example, the BARB figures recording UK TV audiences rely on
just 52,000 interviews per year to estimate viewing figures for more than
24 million TV-owning households.
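As a rough illustration of grossing-up, the sketch below scales a count observed in a sample up to the whole population. It assumes a simple random sample in which every respondent represents the same number of households. The function name gross_up and all figures are hypothetical and are not BARB data.

```python
def gross_up(sample_count: int, sample_size: int, population_size: int) -> float:
    """Scale a count observed in a sample up to the whole population.

    Assumes a simple random sample, so the share seen in the sample
    is taken as the share in the population.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= sample_count <= sample_size:
        raise ValueError("sample_count must be between 0 and sample_size")
    sample_share = sample_count / sample_size
    return sample_share * population_size


if __name__ == "__main__":
    # Hypothetical figures: 1,300 of 5,200 sampled households watched a
    # programme, in a population of 24,000,000 households.
    estimate = gross_up(1_300, 5_200, 24_000_000)
    print(f"Estimated households: {estimate:,.0f}")
```

Real surveys usually weight respondents unequally so that the sample matches the known makeup of the population; equal weighting is used here only to keep the arithmetic visible.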
The tricky part is that unless you know the makeup of your population
universe, you won't be able to structure a representative sample. But how
reliable and statistically significant does your sample need to be?
Professional researchers have written many an academic paper investigating the
minutiae of this question, but if you want to keep things relatively simple,
Naylor highlights two crucial criteria to keep in mind.
Two Magic Numbers
100
Naylor gives this as the minimum sample size for which you can achieve a 95% confidence interval. (5)
95% Confidence Interval
Confidence intervals (reflecting error levels tolerated) are indicators of reliability, and surveys should be able to demonstrate a confidence interval of 95% or better. According to the
National Statistics web site, "a 95% confidence interval is a range within which the true population would fall for 95% of the times the sample survey was repeated. It is a standard way of expressing the statistical accuracy of a survey based
estimate." The larger your sample size, the more accurate you can expect your survey to be.
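To make the link between sample size and accuracy concrete, the sketch below computes the margin of error of a 95% confidence interval for a survey proportion. It assumes a simple random sample and the normal approximation (z = 1.96), neither of which is stated in the source. The function names are illustrative.

```python
import math

Z_95 = 1.96  # standard normal critical value for a two-sided 95% interval


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


def confidence_interval(p: float, n: int, z: float = Z_95) -> tuple[float, float]:
    """Lower and upper bounds of the interval, clipped to the range [0, 1]."""
    moe = margin_of_error(p, n, z)
    return max(0.0, p - moe), min(1.0, p + moe)


if __name__ == "__main__":
    # p = 0.5 is the worst case: it gives the widest interval for a given n.
    for n in (100, 400, 1000):
        low, high = confidence_interval(0.5, n)
        print(f"n={n}: {low:.3f} to {high:.3f}")
```

Under these assumptions, a sample of 100 gives a margin of error of about 9.8 percentage points for a 50% result, and quadrupling the sample to 400 halves it to about 4.9 points.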
Research
- Carini, R.M., Hayek, J.C., Kuh, G.D., & Ouimet, J.A. (2003). College student responses to web and paper surveys: Does mode matter? Research in Higher Education, 44 (1), 1-19.
- Cates, W.M. (1993). A small-scale comparison of the equivalence of paper-and-pencil and computerized versions of student end-of-course evaluations. Computers in Human Behavior, 9, 401-409.
- McGourty, J., Scoles, K. & Thorpe, S. (2002, November). Web-based student evaluation: comparing the experience at two universities. Paper presented at the 32nd ASEE/IEEE Frontiers in Education Conference, Boston, MA.
- Marsh, H.W. & Roche, L. A. (1997) www.smith.edu/deanoffaculty/Al.html
- Richard Naylor of Burns Owens Partnership presented an introduction to the principles and practice of designing, delivering and analysing online surveys. www.nmk.co.uk/article/2004/03/03/online-research