kitchen table math, the sequel: 3/31/13 - 4/7/13

Thursday, April 4, 2013

automated essay grading

In the Times today:

Essay-Grading Software Offers Professors a Break, by John Markoff
Published: April 4, 2013 | New York Times

I'm actually in favor of essay grading software, in theory. I've been interested in automated essay scoring ever since reading Richard Hudson's paper Measuring Maturity in Writing (which I need to re-read, so nothing more on that at the moment):
Abstract
The chapter reviews the anglophone research literature on the 'formal' differences (identifiable in terms of grammatical or lexical patterns) between relatively mature and relatively immature writing (where maturity can be defined in terms of independent characteristics including the writer's age and examiners' gradings of quality). The measures involve aspects of vocabulary as well as both broad and detailed patterns of syntax. In vocabulary, maturity correlates not only with familiar measures of lexical diversity, sophistication and density, but also with 'nouniness' (not to be confused with 'nominality'), the proportion of word tokens that are nouns. In syntax, it correlates not only with broad measures such as T-unit length and subordination (versus coordination), but also with the use of more specific patterns such as apposition. At present these measures are empirically grounded but have no satisfactory theoretical explanation, but we can be sure that the eventual explanation will involve mental growth in at least two areas: working memory capacity and knowledge of language.
Maturity of writing, in this sense, can be measured by software, and I would be using automated scoring software myself if I could buy essay-scoring software on Amazon. EdX says it's giving the software away free to 'institutions' (does that leave out individuals?), so I'll have to see whether my department might throw its hat in the ring.
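Just to make concrete what "measurable by software" means here, below is a toy sketch -- my own, not Hudson's code and certainly not EdX's engine -- of a few of the surface measures the abstract mentions: lexical diversity, nouniness (the share of word tokens that are nouns), and average sentence length as a very rough stand-in for T-unit length (real T-unit analysis needs a syntactic parse).

```python
# Toy illustration of Hudson-style "maturity" surface measures.
# Assumes NLTK is installed (pip install nltk) plus the 'punkt' and
# 'averaged_perceptron_tagger' resources (nltk.download(...) on first run).
import nltk

def maturity_measures(text: str) -> dict:
    sentences = nltk.sent_tokenize(text)
    # keep only alphabetic word tokens (drop punctuation and numbers)
    tokens = [t for t in nltk.word_tokenize(text) if t.isalpha()]
    if not tokens or not sentences:
        return {}
    tagged = nltk.pos_tag(tokens)
    # Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS)
    noun_count = sum(1 for _, tag in tagged if tag.startswith("NN"))
    return {
        "lexical_diversity": len(set(t.lower() for t in tokens)) / len(tokens),
        "nouniness": noun_count / len(tokens),
        "avg_sentence_length": len(tokens) / len(sentences),  # crude T-unit proxy
    }

if __name__ == "__main__":
    sample = ("The committee's evaluation of the proposal, a document of "
              "considerable length, emphasized clarity and precision.")
    print(maturity_measures(sample))
```

Obviously this is nowhere near a grading engine; it's just the kind of counting that makes "formal maturity" tractable for a computer at all.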

That said, a lot of this is nonsense:
Anant Agarwal, an electrical engineer who is president of EdX, predicted that the instant-grading software would be a useful pedagogical tool, enabling students to take tests and write essays over and over and improve the quality of their answers. He said the technology would offer distinct advantages over the traditional classroom system, where students often wait days or weeks for grades.

[snip]

“It allows students to get immediate feedback on their work, so that learning turns into a game, with students naturally gravitating toward resubmitting the work until they get it right,” said Daphne Koller, a computer scientist and a founder of Coursera.

[snip]

“One of our focuses is to help kids learn how to think critically,” said Victor Vuchic, a program officer at the Hewlett Foundation. “It’s probably impossible to do that with multiple-choice tests. The challenge is that this requires human graders, and so they cost a lot more and they take a lot more time.”
None of these things is going to happen. Students aren't going to write essay responses "over and over again"; if they do write essay responses over and over again, it's not going to feel like a fun game; and nobody's going to learn to think critically from automated essay-scoring software.

Oy.

Tuesday, April 2, 2013

words page

In honor of the latest bulletin from palisadesk, I've started a new Blogger page.

uh oh

A note from palisadesk:
Unfortunately, it is true that using Lucy Calkins' methods can raise test scores, due to the design of the current generation of "authentic assessments" (aka holistic assessment, standards-based assessment, performance assessment). I know several schools (including my own) where test scores rose substantially when they STOPPED doing systematic synthetic phonics and moved to a workshop model instead.

So to prove the instructivist stuff works you also need to have in place testing that assesses actual skills -- phonemic decoding, vocabulary, grammar, spelling, arithmetic, etc.

It's really not all that unbelievable, if you consider how the testing has changed. Schools used to use norm-referenced measures (like the IOWA, the CTBS, Metropolitan Achievement Test, etc.) which also have definite limitations, but different ones.

Once they replaced those (as many states have done) with "constructed-response" item tests, variously known as performance assessments, holistic assessments, standards-based assessments and so on, a more fuzzy teaching approach also yielded benefits. These open-response items are usually scored on a rubric basis, based on anchor papers or exemplars, according to certain criteria for reasoning, conventions of print, organization, and so forth. These are variously weighted, but specifics like sentence structure, spelling, grammar, paragraph structure etc. generally carry less weight than such things as "providing two details from the selection to support your argument."

The open responses often mimic journal writing -- it is personal in tone, calls for the student to express an opinion, and many elements of what we would call good writing (or correct reading) count for little or even nothing.

The same is true in math. A local very exclusive private school which is famous for its high academic achievement recently switched from traditional math to Everyday Math and saw its test scores soar on these assessments (probably not on norm-referenced measures, but they aren't saying).

Another school where I worked implemented good early reading instruction with a strong decoding base (and not minimizing good literature, either), but saw its scores on the tests go down almost 25%. I think the reason for that is that teaching children to write all this rubbish for the "holistic assessments" is very time consuming, and if you spend your instructional time teaching the basic skills -- which aren't of much value on these tests -- your kids will do poorly.

So yes, you can post [my email], not referring to me of course. You can say -- because I don't think I've mentioned it publicly anywhere -- that I have been involved in the past in field-testing these assessments so have a more complete picture of how they are put together and evaluated, and what they do and do not measure. Different states have made up their own but they share many similarities.
I was surprised when I read this ... somehow I had assumed that, basics being basic, absence of basics would make any test hard to pass.

Apparently not.
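To make palisadesk's point about rubric weighting concrete, here's a toy calculation. The categories, weights, and scores are entirely made up -- not from any actual state assessment -- but they show how, when "ideas and support" dominates the rubric, an answer that supplies the two details from the selection but mangles spelling and grammar still outscores a mechanically solid answer that is thin on quoted details.

```python
# Hypothetical rubric: categories and weights invented for illustration only.
weights = {"ideas_and_support": 0.6, "organization": 0.2, "conventions": 0.2}

def rubric_score(scores: dict) -> float:
    """Weighted average of 0-4 scores on each rubric category."""
    return sum(weights[category] * scores[category] for category in weights)

# Strong on "two details from the selection," weak on spelling and grammar:
fuzzy_but_on_rubric = {"ideas_and_support": 4, "organization": 3, "conventions": 1}

# Mechanically solid (spelling, grammar, structure), thin on quoted details:
skills_but_off_rubric = {"ideas_and_support": 2, "organization": 3, "conventions": 4}

print(rubric_score(fuzzy_but_on_rubric))    # 3.2
print(rubric_score(skills_but_off_rubric))  # 2.6
```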