I continue to be fascinated by the disconnect between the depth of detail available in the ME DoE Data Warehouse – and in the “School Grades” data collection effort itself – and the thoughtless application of that data to uni-dimensional and somewhat arbitrary A-F grades. The essence of what the grading system was apparently intended to communicate was evident from yesterday’s newspaper headlines:
Failure and shame were the watchwords of the day. It was somewhat serendipitous that Slate.com published an article yesterday entitled, “The Case Against Grades: They Lower Self-Esteem, Discourage Creativity, and Reinforce the Class Divide.” Many of the same dynamics appeared poised to occur as a result of the ME DoE decision to try to “improve” Maine schools through unfairly labeling them with golden As through scarlet Fs.
And what do we mean by unfair? A large part of it certainly has to do with the fact that the “school grades” relate so strongly to income. If most of the variation in school scoring can be explained by local wealth, then all these grades do is affirm poor towns as failures and wealthy towns as wonderful and desirable. (As if everything else in American culture isn’t already reinforcing this belief.)
However, there were smaller and stranger injustices lodged within the system that I discovered when I looked at the raw scores before the assignment of letter grades.
How were the raw scores created? For both the high school and sub-high school grading methodologies, the DoE took into account math proficiency, growth in math proficiency, reading proficiency and growth in reading proficiency. For high schools, the DoE also looked at four and five year graduation rates and ranked them on a 0-50 scale. For elementary and middle/junior high schools, the DoE looked at growth in reading and math proficiency for the lowest 25% of student test-takers.
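The composite described above can be sketched in a few lines of code. To be clear, the point scales and weights below are my own illustrative assumptions; the DoE's actual formula is not published in a form I can reproduce here, beyond the 0-50 component scale mentioned for graduation rates.

```python
# Hypothetical sketch of a high-school composite "raw score": proficiency
# and growth in math and reading, plus 4- and 5-year graduation rates.
# The equal 0-50 point scale per component is an assumption, not the
# DoE's actual weighting.

def high_school_raw_score(math_prof, math_growth, read_prof, read_growth,
                          grad_4yr, grad_5yr, points_per_component=50):
    """Each proficiency/growth rate (expressed 0-1) is scaled to a 0-50
    component; the two graduation rates share a single 0-50 component."""
    academic = sum(r * points_per_component
                   for r in (math_prof, math_growth, read_prof, read_growth))
    graduation = ((grad_4yr + grad_5yr) / 2) * points_per_component
    return academic + graduation

# A hypothetical school: middling proficiency, decent graduation rates.
score = high_school_raw_score(0.6, 0.5, 0.7, 0.55, 0.85, 0.9)
```

With these assumed weights, the maximum possible score would be 250, and the hypothetical school above lands at 161.25 – the kind of raw number the DoE then converted into a letter.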
I noted in my addition to yesterday’s post that the state academies — which really are examples of innovation in relatively impoverished areas, the kind of thing that Commissioner Bowen and Governor LePage give lip-service to appreciating — were heavily penalized for lower standardized test participation, something that state law had permitted them to do. This is likely to matter a lot to the state academies, too, because they subsidize their education by attracting full-tuition students. The decision to arbitrarily punish them for something they had been allowed to do is likely to cost them real dollars in terms of reduced full-tuition enrollment.
The penalty for not having more students take a test which they were not required to take was to drop them a grade lower than their score would have otherwise predicted. For example, look at this section of my Excel sheet:
Ranking of raw scores and grades don’t necessarily correspond. Why? Because the ME DoE decided that student participation rate had to “count” somehow. This was the bluntest and least logical way to achieve that end.
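The mechanics of that penalty are blunt enough to sketch in a few lines. The 95% participation threshold below is an assumption for illustration; the source describes only the effect, a one-letter drop regardless of the underlying score.

```python
# Sketch of the participation penalty as described above: a school below
# the participation threshold is bumped down one letter from whatever its
# raw score would otherwise earn. The 0.95 threshold is an assumed value.
GRADES = ["A", "B", "C", "D", "F"]

def assign_grade(raw_grade, participation_rate, threshold=0.95):
    """Drop the school one letter grade if test participation fell
    below the threshold; an F cannot drop further."""
    if participation_rate >= threshold:
        return raw_grade
    idx = GRADES.index(raw_grade)
    return GRADES[min(idx + 1, len(GRADES) - 1)]

assign_grade("B", 0.97)  # unaffected: "B"
assign_grade("B", 0.80)  # dropped one letter: "C"
```

Note what this does not do: it makes no distinction between a school at 94% participation and one at 50%, and it ignores how far the school's score sat above the cut point. A B-grade school one point above the B/C line and one twenty points above it are penalized identically.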
Another strange injustice, however, was the decision to create arbitrary new cut points to determine which schools would get each of the “school grades.” Looking at the distribution of the raw scores, there was no obvious way to produce the particular set of grades that the DoE decided to assign. Here, for example, is the distribution of scores, expressed as the proportion of the maximum possible DoE score that each school achieved.
Under a traditional grading scheme, this would mean that most of the schools fail and the second largest group gets Ds. Now, let it be said that there’s nothing particularly necessary about the grading methodology that the DoE developed. This was their baby – they could have chosen any weighting, any grading scheme. Yet even within their own grading scheme, the cut points are arbitrary and don’t communicate anything really accurate, in terms of proportions, about what their own dataset produced. They didn’t say “most of Maine schools fail.” Instead, they said something different:
In blue, you see how schools would have been graded if you took the scores the DoE found they achieved, through the DoE’s own grading system, at face value. In red, you see what the ME DoE decided to assign. The proportions are nowhere near what their own scores found. Instead of reporting a clear assessment of how schools were doing according to some presumably meaningful set of criteria, the DoE decided that it wanted to see a certain structure to its outcomes — mostly Cs, with a meaningful number of Bs and Ds, and fewer As and Fs. Since this profile is not naturally available in the data they collected, they just…made it happen.
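The difference between the two approaches is easy to demonstrate: grade the same set of scores once with traditional percentage cutoffs, and once by slicing the rank-ordered list into whatever proportions you want to see. The scores and the target proportions below are invented for illustration; only the contrast in method reflects what the DoE did.

```python
def grade_traditional(pct_of_max):
    """Classic 90/80/70/60 cutoffs on percent of maximum possible score."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if pct_of_max >= cutoff:
            return letter
    return "F"

def grade_to_fit(scores, target=(("A", 0.10), ("B", 0.20), ("C", 0.45),
                                 ("D", 0.20), ("F", 0.05))):
    """Force a predetermined grade profile by slicing the ranked scores
    into the target proportions (proportions here are illustrative)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    grades = [None] * len(scores)
    start = 0
    for letter, share in target:
        n = round(share * len(scores))
        for i in ranked[start:start + n]:
            grades[i] = letter
        start += n
    for i in ranked[start:]:  # rounding leftovers get the lowest letter
        grades[i] = "F"
    return grades

# Invented scores, as percent of maximum possible.
scores = [42, 55, 61, 48, 70, 38, 52, 66, 45, 58]
traditional = [grade_traditional(s) for s in scores]  # mostly Fs
forced = grade_to_fit(scores)                         # mostly Cs, by fiat
```

On this toy data, the traditional cutoffs yield seven Fs, two Ds, and one C; the forced profile turns the very same numbers into one A, two Bs, four Cs, two Ds, and one F. Nothing about the schools changed – only the reporting decision did.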
All those headlines about “75% of the state at C or less” were absolutely meaningless, because the only reason the grades look that way is that the DoE decided to report them as such.
Okay. With that very relevant issue out of the way, I wanted to spend a little more time looking at the fit between the poverty/school score relationship and its residuals. I looked at high schools, junior high/middle schools, and elementary schools and found that there were three very different kinds of relationship between poverty and school scores depending on the school level.
For high schools, poverty levels were very, very determinative of outcome. I said in my post yesterday that poverty levels explained around a third of school grades. Looking at more finely-grained outcomes – the raw school scores – and more finely-grained inputs – just high schools – I found that poverty accounts for nearly two-thirds of raw school scores.
That is really, really high. Most of a score for each individual high school, across its level of variation, can be predicted by how many kids get free lunch. The biggest outlier in this trend – that one little dot at the top – is the Maine School of Math and Science, which is so sui generis it probably shouldn’t even be included in this kind of measurement.
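For readers who want to check this kind of claim, the R² of a simple one-variable regression takes only a few lines to compute. The data below are invented for illustration; the real inputs would be each high school's free/reduced lunch percentage and its raw score.

```python
# Illustrative data only: percent of students on free/reduced lunch (x)
# and raw school score as percent of maximum (y).
poverty = [10, 20, 30, 40, 50, 60, 70, 80]
score = [72, 70, 60, 58, 50, 47, 40, 35]

n = len(poverty)
x_mean = sum(poverty) / n
y_mean = sum(score) / n

# Ordinary least squares fit: score = a * poverty + b
a = sum((x - x_mean) * (y - y_mean) for x, y in zip(poverty, score)) \
    / sum((x - x_mean) ** 2 for x in poverty)
b = y_mean - a * x_mean

predicted = [a * x + b for x in poverty]
residuals = [y - p for y, p in zip(score, predicted)]

# R^2: the share of the variance in scores explained by poverty.
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - y_mean) ** 2 for y in score)
r_squared = 1 - ss_res / ss_tot
```

An R² of roughly two-thirds, as I found for high schools, means exactly this: most of the spread in scores disappears once you know the poverty figure, and only the leftover (the residuals) reflects anything a school itself might be doing.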
Giving high schools grades without accounting for poverty levels is really just telling kids that they are failures for being poor and teachers and administrators that they are failures for choosing to work in impoverished areas.
Wouldn’t it be more interesting to find out how much individual schools add (or subtract) value relative to what was predicted by average school poverty level? On that subject, here are the residuals that I calculated for this relationship. I threw in the ME DoE scores next to the residuals just for funsies. What different characteristics of “good schools” do these two different kinds of measurement measure?
As I described in yesterday’s post, looking at the residuals allows us to control for large, important variables like poverty levels in order to start looking at other contributing factors – teaching methods, relationships with the community, curricular innovations, you know. Things that we’re trying to do in order to PRODUCE better schools. Looking at the residuals, Waterville Senior High is scoring 10% higher, relative to the maximum possible score, than its poverty levels would predict – similar to Cape Elizabeth High, in fact. Why is that? Wouldn’t you like to know?
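To make the contrast between the two measurements concrete, here is a small sketch with invented school names and numbers: the same data ranked by raw score (the DoE's measure) and by residual (performance relative to what poverty alone predicts).

```python
# Invented schools: name -> (poverty %, raw score % of maximum).
schools = {
    "Wealthyville HS": (15, 70),
    "Midtown HS": (45, 55),
    "Milltown HS": (75, 48),
    "Harborside HS": (60, 42),
}

xs = [p for p, _ in schools.values()]
ys = [s for _, s in schools.values()]
x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)

# Least-squares line: score = a * poverty + b
a = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
    / sum((x - x_mean) ** 2 for x in xs)
b = y_mean - a * x_mean

# Residual: actual score minus what poverty alone predicts.
residuals = {name: s - (a * p + b) for name, (p, s) in schools.items()}

by_raw = sorted(schools, key=lambda name: schools[name][1], reverse=True)
by_residual = sorted(residuals, key=residuals.get, reverse=True)
```

In this toy example the raw-score ranking simply tracks wealth, with Wealthyville HS on top, while the residual ranking puts high-poverty Milltown HS first: despite a middling raw score, it outperforms what its poverty level predicts. That is the kind of question the flat letter grades bury.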
Returning to the question of the strength of the poverty/score relationship, this does start changing when you look at different school levels. Here is the scatterplot for elementary schools:
Note how much more distant many dots are from the trendline in the middle, compared with the tighter cluster in the high school chart. That is what a weaker relationship looks like. Here, the R^2 tells us that only around a quarter of the variation in school scores is explained by the school’s percentage of students receiving free or reduced lunch. That is VERY interesting, and worthy of further attention.
Here are the residuals for elementary schools.
It is also interesting that the strength of the relationship between middle/junior high school scores and poverty falls between the other two school groups, though much closer to the high-school strength of relationship.
What is it that happens between elementary school and middle school that leads to such different outcomes? Is it difference in the standardized tests? Something truly different about how middle school and elementary school students are taught? Something else?
Another fascinating, fascinating set of questions which are not approached – not even raised – by the arbitrary, unfair flat “school grades” that our state DoE decided to issue. What a missed opportunity.
It might be interesting to take a look at multi-building districts compared to single-building ones (Portland compared to Van Buren, for example), or pK-8 compared to pK-12 systems (Caswell v. Auburn), or rural v. urban.
I’m very interested in the rural/urban question myself, though I believe that most of the effect would be swallowed up by poverty levels. The question of the best format for schools – smaller or larger? multi-building? – and the effects of customized learning, which I’m personally very interested in, would all be good to look at once income was controlled for.
The Millinocket superintendent, plus others, got their schools’ grades removed because there had been configuration changes over the past couple of years. What bothers me about this is that the DoE apparently never accounted for that sort of scenario even though they could have. What else didn’t they consider?
I also wondered about what the distribution would look like if it measured only the ways in which the school added value. It seemed that the most valid measure was the change in test scores, rather than the actual achievement of the students.
Was the methodology used by the Maine DoE copied from Florida, or did they come up with it on their own?
Excellent and interesting analysis, by the way.
Very interesting! Thanks for doing the extra work to help put this information in context.
Would you consider exploring a little further the middle/jr high school issues that you touched on above? I wish you had done a breakdown for them as you did for elementary and high school. I think one of the more complicated issues with this cohort is that some middle/jr high schools are three years and some are only two. I do not believe that the governor’s grading system was at all accurate for those two-year middle schools, since the only scores they are instructionally responsible for come from the fall of the 8th grade year. Because 9th graders do not take the standardized tests, the middle school does not get the benefit of having measured the growth that resulted from its full two years of instructional time.