Measurement & Evaluation in Education

Course description: Concepts of measurement and evaluation, classroom test construction, creation and use of derived scores, selection and use of published measurement instruments, current issues. Educational Testing and Measurement was the textbook for the course.

Attachment: EDF6432 David Norman create objective test questions.pdf (20.45 KB)

Argument for testing in schools

As a graduate of the Texas public education system, I was required, starting in 3rd grade, to pass the Texas Assessment of Academic Skills (TAAS) for grade advancement and the Texas Academic Skills Program (TASP) test for entry into The University of Texas for my bachelor's degree. I have been around students who did not pass one or both exams. Opponents of school testing cite research showing long-term damage to students who are retained, and in place of standardized tests they suggest strong support systems, high expectations, caring, and longer-term evaluations of concept mastery (White, 2005, ¶9). I thought in grade school, and still think, that the high-profile exams I was required to pass were appropriate measures of my knowledge.

Thesis

Criterion-referenced tests are essential to create credibility and value for diplomas.

Proof

High-stakes tests affect many important decisions, including grade promotion, high school graduation, administrative incentives and penalties, and teacher placement (Kubiszyn & Borich, 2003, p. 21). An increasing number of states require students to pass a test to graduate from high school, without necessarily providing remediation for students who fail (Kubiszyn & Borich, 2003, p. 21).

Texas has established the Texas Essential Knowledge and Skills (TEKS) academic standards for reading, math, writing, science, and social studies (Student Assessment, Texas Education Agency). In the past, Texas administered the TAAS; it now administers the Texas Assessment of Knowledge and Skills (TAKS) on a criterion-referenced basis. Students and parents receive a report outlining the student's strengths and weaknesses and a "pass," "pass with academic recognition," or "fail" result for each subject area. These reports help students, parents, and teachers pinpoint where students and teachers need to concentrate their efforts. Over an eight-year period, Texas was able to demonstrate, through the TAAS, an increase in the percentage of students meeting minimum state academic expectations in reading, mathematics, and writing (Texas Assessment of Academic Skills, 2002).

Teachers should be encouraged to break out of the narrow subject-area focus they inherited from industrial-age teaching models (Marshak, 2003). A case study in co-teaching revealed how teachers in states with high-stakes tests felt compelled to teach only test-related material (Mastropieri et al., 2005). Rather than dropping non-test material, the teachers coordinated with one another to compound the educational value of classroom activities, in this case a computer class. In addition to teaching computer modeling, the computer teacher created an activity that used computer modeling to teach world history. Students' test scores improved as a result, and students requested copies of the software to take home for additional practice in other subject areas.

Refutation

Opponents of high-stakes testing assume that high standards are meant to drive low-performing, disadvantaged students to work harder. This high-standards argument also assumes that all students are capable of earning a high school diploma. While an ideal education system would graduate every student as an educated adult, not all students have the mental capacity, the personal motivation and persistence, or the willingness to fulfill the requirements set by the education system. Deficient students are given the opportunity to retake high-stakes tests, and they should not graduate simply because they showed improvement, a capacity to learn, a willingness to participate, or the persistence to retake tests and fail. A high school diploma should represent the ability to read, write, and do math, and it should go to students who can demonstrate mastery of state academic standards.

An Education Week survey showed that "66 percent of teachers thought state tests were forcing them to concentrate too much on what was tested to the detriment of other important topics, and nearly half said they spent a 'great deal of time' helping students prepare for tests" (Doherty, 2002, ¶7). While students should learn more than what is on state tests, it is important that they master essential subject areas. The requirements of state exams should indeed supersede the educational interests of individual teachers.

The American Educational Research Association issued a statement on high-stakes testing recommending that accommodations be made for students who are not proficient in English (Kubiszyn & Borich, 2003, p. 21). In a country where government, business, and educational transactions are conducted primarily in English, however, high-stakes tests double as an assessment not only of content mastery but also of communication skills. This added communication assessment is important, not something to protest. Surveys and research conducted by the government of Manitoba, Canada, The Wall Street Journal, and the Association of Legal Administrators found communication and interpersonal skills to be among the skills most in demand in job applicants (Johnstone & Williams, 2003; Perry, 2002; The Association of Legal Administrators Competency-Based Education Job/Needs Analysis, 2004). Allowing students to avoid gaining proficiency in English benefits neither industries with shortages of job applicants nor the applicants in those industries. Knowing algebra is worthless if you cannot understand the question that explains the details of the algebraic problem.

Conclusion

Criterion-referenced testing is appropriate for schools when it is linked to the state's academic standards. Teachers can collaborate to compound the value of classroom activities and make up for instructional time spent preparing students for high-stakes tests. Texas has shown that when test content is linked to state academic objectives, teachers and students can work together to improve education.

References

The Association of Legal Administrators Competency-Based Education Job/Needs Analysis. (2004, June 28). Association of Legal Administrators. Retrieved June 27, 2005, from http://www.alanet.org/education/knowledgesurvey.html

Doherty, K. M. (2002, February 27). Assessment. Education Week on the Web. Retrieved June 27, 2005, from http://www.edweek.org/rc/issues/assessment/

Johnstone, P., & Williams, A. (2003, June 19). Manitoba Employer Survey 2000. Government of Canada. Retrieved June 27, 2005, from http://www.hrsdc.gc.ca/en/mb/survey/employer-shortage.shtml

Kubiszyn, T., & Borich, G. (2003). Educational Testing and Measurement: Classroom Application and Practice (7th ed.). Hoboken, NJ: John Wiley & Sons, Inc.

Marshak, D. (2003, November). No Child Left Behind: A Foolish Race Into the Past. Phi Delta Kappan, 85(3), 229-231.

Mastropieri, M. A., Scruggs, T. E., Graetz, J., Norland, J., Gardizi, W., & McDuffie, K. (2005, May). Case Studies in Co-Teaching in Content Areas: Successes, Failures, and Challenges. Intervention in School and Clinic, 40(5), 260-270.

Perry, D. (2002, May 20). Do You Have the Skills Most in Demand Today? The Wall Street Journal. Retrieved June 27, 2005, from http://www.careerjournal.com/columnists/perspective/20020520-fmp.html

Student Assessment. (n.d.). Texas Education Agency. Retrieved June 28, 2005, from http://www.tea.state.tx.us/student.assessment/

Texas Assessment of Academic Skills. (2002). Texas Education Agency. Retrieved June 28, 2005, from http://www.tea.state.tx.us/student.assessment/reporting/results/swresults/august/g310nse_au.pdf

White, J. (2005, June 30). Activity 1-Argument Against Testing [Msg 1]. Message posted to http://webct.ucf.edu/

Different types of reliability

Internal consistency
  • When: to assess a single dimension
  • How: correlate each individual item score with the total score
  • What it shows: all the items on your test assess the same construct

Interrater
  • When: to find consistency in the rating of some outcome
  • How: examine the percentage of agreement between raters
  • What it shows: the reliability coefficient for your test indicates a poor, moderate, or high degree of agreement between raters

Parallel forms
  • When: to compare several different forms of a test to see if they are equivalent
  • How: correlate the scores from one form of the test with the scores from a second form with the same content
  • What it shows: two forms of your test are equivalent to one another

Test-retest
  • When: to establish reliability over time
  • How: correlate the scores from time 1 with the scores from time 2
  • What it shows: the test gives consistent results when the same participants take it at two different times
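The "How" entries above reduce to two calculations: a correlation between two lists of scores, and a percentage of matching ratings. The following is a minimal sketch, assuming scores are stored as plain Python lists; the function names and the sample data are hypothetical. Test-retest and parallel forms use the same Pearson correlation, just with different pairs of score lists. The item-total version here is uncorrected (each item is included in the total it is correlated with); a "corrected" version subtracts the item from the total first. Percent agreement also does not adjust for agreement that happens by chance; Cohen's kappa is a common alternative when that matters.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of scores.

    Used for test-retest (time 1 vs. time 2) and parallel forms
    (form A vs. form B) reliability.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when every score is identical")
    return cov / math.sqrt(var_x * var_y)


def item_total_correlations(item_scores: Sequence[Sequence[float]]) -> List[float]:
    """Internal consistency: correlate each item with the total test score.

    item_scores[s][i] is student s's score on item i. The total includes
    the item itself (the uncorrected item-total correlation).
    """
    totals = [sum(row) for row in item_scores]
    n_items = len(item_scores[0])
    return [pearson_r([row[i] for row in item_scores], totals) for i in range(n_items)]


def percent_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Interrater: percentage of cases on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


if __name__ == "__main__":
    # Hypothetical scores for five students (illustration only).
    time1 = [78, 85, 92, 64, 70]
    time2 = [80, 83, 95, 60, 72]
    print("Test-retest r:", round(pearson_r(time1, time2), 3))

    # Rows are students; columns are right (1) or wrong (0) on four items.
    items = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    print("Item-total r:", [round(r, 2) for r in item_total_correlations(items)])

    rater_a = ["high", "moderate", "poor", "high", "moderate"]
    rater_b = ["high", "moderate", "moderate", "high", "moderate"]
    print("Agreement:", percent_agreement(rater_a, rater_b), "%")
```

The same `pearson_r` function serves both test-retest and parallel forms; only the source of the second score list changes.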

Portfolios

A good portfolio is both formative and summative in nature. Contributions to the portfolio should be evaluated while the portfolio is being created and again in a final evaluation, as part of a continuous process.

Portfolios should reflect the immediate assignment task and the overall area of study. The content should reflect the interests, in addition to the abilities, of the student.

Sample essay question

Cognitive Objectives:

The student should be able to:

  • Identify and describe common characteristics of programming frameworks
  • Identify and describe issues related to standardized and open source frameworks in a clear and concise manner
  • Reference and provide a short description of at least two popular PHP frameworks
  • Evaluate the quality of named frameworks

Test Item:

In an article titled "Why PHP sucks" on bitstorm.org, Edwin Martin complained about the lack of a standardized framework for PHP. He cited Struts for JSP developers and .NET for ASP developers as models for a standardized PHP framework. Instead of a standardized framework, PHP has a multitude of open source frameworks, including Midgard, Horde, Blueshoes, Cake, Seagull, Sourdough, binarycloud, SMART, and many others. Compare and contrast having one standardized framework versus multiple independently developed frameworks. Use your analysis to build an argument defending or refuting Martin's complaint. Limit your response to two pages.

Score Scheme:

Points Score Basis
10 Identified and described common characteristics of programming frameworks
10 Identified and described issues relating to standardized and open source frameworks in a clear and concise manner
10 Referenced and provided a short description of at least two popular PHP frameworks
15 Effectively evaluated the quality of named frameworks
10 Essay follows APA style guidelines, contains no spelling or grammatical errors, and is completed on time.
55 Total Points

Sample test blueprint

Test Blueprint

Content outline (test item numbers in parentheses)
1. The student can state the purposes for various string, database, and graphics functions in PHP. (2, 3, 5, 6, 8, 9)
2. Given a line of code, the student will be able to identify parse errors. (1, 4, 7, 10)
3. The student can distinguish standard from PECL modules. (11, 12)
4. Given a programmatic situation, the student can apply a PHP function as a solution. (13, 14, 15, 16)
5. Given generic character types for regular expressions, the student can identify the function of each escape sequence. (17, 18, 19, 20)

Number of items by objective and category

Objective | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation | Total | Percentage
1 | 6 | | | | | | 6 | 30
2 | | | | 4 | | | 4 | 20
3 | 2 | | | | | | 2 | 10
4 | | | 4 | | | | 4 | 20
5 | 4 | | | | | | 4 | 20
Total | 12 | 0 | 4 | 4 | 0 | 0 | 20 |
Percentage | 60 | | 20 | 20 | | | 100 |
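Every percentage in the blueprint is a count divided by the 20 items on the test, times 100. The following is a minimal sketch that recomputes the row and column totals and percentages from the blueprint's item counts and confirms that the item numbers listed in the content outline cover items 1 through 20 exactly once. The counts and item numbers are taken from the blueprint; the dictionary layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple

CATEGORIES = ["Knowledge", "Comprehension", "Application",
              "Analysis", "Synthesis", "Evaluation"]

# Number of items per objective and category, taken from the blueprint.
BLUEPRINT: Dict[int, Dict[str, int]] = {
    1: {"Knowledge": 6},
    2: {"Analysis": 4},
    3: {"Knowledge": 2},
    4: {"Application": 4},
    5: {"Knowledge": 4},
}

# Test item numbers listed after each objective in the content outline.
ITEMS: Dict[int, List[int]] = {
    1: [2, 3, 5, 6, 8, 9],
    2: [1, 4, 7, 10],
    3: [11, 12],
    4: [13, 14, 15, 16],
    5: [17, 18, 19, 20],
}


def summarize(blueprint: Dict[int, Dict[str, int]]
              ) -> Tuple[int, Dict[int, float], Dict[str, int], Dict[str, float]]:
    """Return total items, row percentages, column totals, and column percentages."""
    row_totals = {obj: sum(cells.values()) for obj, cells in blueprint.items()}
    total = sum(row_totals.values())
    row_pct = {obj: 100 * n / total for obj, n in row_totals.items()}
    col_totals = {c: sum(cells.get(c, 0) for cells in blueprint.values())
                  for c in CATEGORIES}
    col_pct = {c: 100 * n / total for c, n in col_totals.items()}
    return total, row_pct, col_totals, col_pct


def check_item_mapping(blueprint: Dict[int, Dict[str, int]],
                       items: Dict[int, List[int]]) -> None:
    """Raise ValueError if an objective's item list disagrees with its count,
    or if the listed item numbers do not cover 1..total exactly once."""
    for obj, cells in blueprint.items():
        expected = sum(cells.values())
        if len(items[obj]) != expected:
            raise ValueError(f"objective {obj}: {len(items[obj])} items listed, "
                             f"{expected} in blueprint")
    listed = sorted(n for nums in items.values() for n in nums)
    total = sum(sum(cells.values()) for cells in blueprint.values())
    if listed != list(range(1, total + 1)):
        raise ValueError("item numbers must cover 1..total exactly once")


if __name__ == "__main__":
    check_item_mapping(BLUEPRINT, ITEMS)
    total, row_pct, col_totals, col_pct = summarize(BLUEPRINT)
    print("Total items:", total)
    for obj, pct in row_pct.items():
        print(f"Objective {obj}: {pct:.0f}%")
    for c in CATEGORIES:
        print(f"{c}: {col_totals[c]} items ({col_pct[c]:.0f}%)")
```

Keeping the counts in one structure means a change to any cell updates both the row and column percentages, so the two margins cannot drift apart.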

 

Directions
Circle the best answer that correctly completes each statement or answers each question.

1. Identify the line of code with a parse error.
  1. echo 'Hello World!';
  2. echo 'Hello "World"!';
  3. echo "Hello World's!";
  4. echo 'Hello World's';
2. The purpose of strpos() is to
  1. find the first position of an array in a string
  2. find the first occurrence of a string
  3. insert a string in a specific position of a different string
  4. delete the first occurrence of a string
3. The purpose of join() is to
  1. add two variables together mathematically.
  2. join array elements with a string.
  3. concatenate to strings.
  4. merge two arrays.
4. Identify the line of code with a parse error.
  1. if(strcmp($foo, $bar)) {
  2. if strcmp($foo, $bar) {
  3. if(substr($foo, $bar)) {
  4. if(strtr($foo, $bar)) {
5. The purpose of ksort() is to
  1. sort a string starting with capital letters
  2. sort an array by values
  3. sort an array by keys
  4. reverse sort an array by keys
6. $mysqli = new mysqli('localhost', 'root', 'password'); will
  1. attempt to open a connection to MySQL
  2. return boolean TRUE on success
  3. produce a parse error
  4. return an array of connection information on success
6. The purpose of mysqli_autocommit() is to
  1. turn on or off auto-committing database modifications on InnoDB tables
  2. turn on or off auto-committing database modifications on MyISAM tables
  3. turn on or off auto-committing database modifications on HEAP tables
  4. commit uncommitted SQL transactions
7. Identify the line of code with a parse error.
  1. $stmt = $mysqli->prepare("SELECT * FROM foo WHERE bar=?");
  2. $stmt = $mysqli->prepare('SELECT * FROM foo WHERE bar=?');
  3. $stmt = mysqli_prepare($link, "SELECT * FROM foo WHERE bar=?");
  4. $stmt = mysqli->prepare('SELECT * FROM foo WHERE bar=?');
8. break 3; will _________ of the enclosing for, foreach, while, do-while, or switch structures
  1. end execution for 3 levels
  2. pause execution for 2 levels and resume at the third level
  3. pause execution for 3 seconds
  4. cause a parse error during the execution
9. The require statement
  1. requires a specific variable parameter for continued execution
  2. includes and evaluates a specific file
  3. requires an end user to enter a password
  4. will not cause a parse error on failure
10. Identify the line of code with a parse error.
  1. $i += $factor;
  2. $i *= $factor;
  3. $i #= $factor;
  4. $i |= $factor;

Directions
Each statement is either true or false. Circle True for each statement that is true and False for each statement that is false.

11. GD2 is a standard extension
  1. True
  2. False
12. mailparse is a PECL extension
  1. True
  2. False

Directions
Write the name of the PHP function that answers each question in the space below it.

13. What function would most accurately compare two string passwords?
            strcmp()

14. What function would calculate the sha1 hash of a file?
            sha1_file()

15. What function would convert a string into variables?
            parse_str()

16. How would an administrator remove erroneous spaces from the end of a string?
            rtrim()

Directions
Complete each sentence with the type of character that the PCRE escape sequence matches.

17. \s represents any _________ character/digit
            whitespace

18. \W represents any __________ character/digit
            non-word

19. \d represents any _________ character/digit
            decimal

20. \w represents any __________ character/digit
            word