Vacuous (Great Word, BTW) Thinking, Student Agency, and a Stunted Notion of Transfer

Although I enjoyed many of the readings throughout this unit, I was drawn to two in particular:  George Hillocks Jr.’s “How State Assessments Lead to Vacuous Thinking and Writing” and Tim McGee’s chapter “Taking a Spin on the Intelligent Essay Assessor.”  What stood out to me in these readings is that they go well beyond merely demonstrating the negative effects of standardized tests and/or AES systems on student writing; they take it a step further and explore how these assessments actually influence students’ thinking (although this is more implicit in McGee’s piece).

In my Teaching College English class at UNC Charlotte, I remember Dr. Tony Scott asking us early in the semester why we liked writing–I gave what many thought was an intriguing answer.  I mentioned how I was fascinated by writing’s connection to thought:  how it helps us explore our ideas, how we learn to communicate our thoughts in a comprehensible fashion, and so on.  I concluded with an assertion I still heartily believe in (although there are exceptions):  great thinkers are not necessarily great writers, but all great writers are great thinkers as well.  If you think about it through a certain lens, this makes complete sense.  We oftentimes want to break writing down into standard components such as organization, syntax, and tone.  However, usually when we think something is poorly written, we also object to the line(s) of thought contained within.

That being said, I had never really drawn any direct connection between standardized testing and the influence it has on students’ actual thought processes.  Yet, these two pieces definitely demonstrate the perverse influence these kinds of assessments and/or software have on students’ critical thinking skills.  McGee details how the Intelligent Essay Assessor, which is supposed to read for content, is actually anything but intelligent.  Even though he went in with some hopes, they were soon dashed by the machine’s inability to actually read for content.  He inverted the order of one essay so that it no longer made logical sense, and he provided factually incorrect information in a history essay while still including some keywords.  Each time, the computer provided a similar score–in one instance, the score was the same because a one-point reduction in content was balanced by a one-point increase in mechanics for the gibberish!
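
Just to make that blind spot concrete for myself, here is a little illustrative sketch (my own toy example, not the IEA’s actual model–the IEA is built on Latent Semantic Analysis, but the underlying issue is similar):  any score computed from word-occurrence statistics alone cannot tell an essay from its scrambled, nonsensical twin, because shuffling the words never changes the word counts.

```python
# A minimal sketch, not the IEA's actual algorithm: a toy bag-of-words
# similarity score to show why word-occurrence statistics can't detect
# that an essay has been scrambled into gibberish.

from collections import Counter
import math
import random

def bag_of_words_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between simple term-frequency vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical prompt-relevant reference text and student essay.
reference = ("the colonists protested taxation without representation and "
             "organized a boycott of british goods before the revolution")
essay = ("before the revolution the colonists organized a boycott of british "
         "goods because they protested taxation without representation")

# Scramble the essay's words so it makes no logical sense at all.
words = essay.split()
random.shuffle(words)
gibberish = " ".join(words)

print(bag_of_words_similarity(reference, essay))      # some score
print(bag_of_words_similarity(reference, gibberish))  # exactly the same score
```

McGee’s inverted essay works the same way on a larger scale:  reordering sentences or paragraphs leaves the word counts untouched, so an order-blind model has nothing to object to.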

Hillocks Jr.’s research was even more alarming, however.  Perhaps this was because he drew on the example essays used for grade norming and as student models.  One exemplary essay provided sparse (if any) support for its claims; sure, it was developed, but the logic of the argument really didn’t hold.  The example of a passing exam in Texas directly contradicted itself (sure, Hillocks Jr. takes a slight stab at Bush, but–c’mon–there was a gold mine of humor there he should have taken advantage of!).  Think about that–you can directly contradict yourself in an essay and be considered a proficient writer in the state of Texas.

Overall, I started to contemplate how standardized testing and AES may have an even more drastic influence on student agency than I had ever considered.  Not only is the writing regimented, but the actual manner of critical thinking is stunted as well.  Hillocks Jr.’s article came at the right time for me–the last batch of papers my students had submitted for my class had been less than stellar.  After reading his article, I quickly realized that much of what he was pointing out matched the issues my students were having:  an abundance of facts that really didn’t serve as evidence, just padding; claims that directly contradicted previous assertions; not really thinking through an issue but just babbling.  At first, I was frustrated by these papers, yet, after reading his article, I started to realize–this is what they were taught to do.  In all honesty, I did make some pedagogical mistakes that I believe also influenced the poor results; however, everything he discussed in his article appeared in my students’ writing.

This demonstrated, for me at least, that the influence of poorly designed standardized writing tests extends far beyond students’ writing–it corrupts their thinking as well!  While this is disheartening, I did see some light at the end of the tunnel.

Primarily, I saw an opportunity for writing assessment scholars to gain support from other disciplines (the Fellowship of Non-Vacuous Thinking, so to speak).  If we could demonstrate how the behaviors learned on standardized writing tests are “transferring” into classes outside of English and composition, we could make an excellent case for a unified front.  Surely scholars in other fields care about student writing; however, if we could also demonstrate to them how these tests stunt students’ critical thinking skills, we might gain more traction.  It might be wishful thinking, but I like to believe it is possible.

 

Portfolio Assessment–Can Classroom and Large-Scale Assessment Converge in an Ethical Fashion?

One of the greatest tensions I found when reading about assessing writing programs was the use of portfolios for program- and/or institution-wide assessment.  Although portfolios seem rather apt for this purpose, considering how they exhibit multiple examples of student work and frequently include reflective writing that demonstrates students’ learning, I struggled with their use in this way.  Namely, I felt that repurposing portfolios assembled for classroom assessment for program and/or institutional assessments creates a possible dilemma:  if the portfolios are to be used for large-scale assessment, a certain amount of standardization would appear to be necessary.

While I wouldn’t necessarily contend that such standardization is inherently inappropriate, I do believe that this creates quite a tension in regard to student agency–specifically, in designing portfolio assessments for both classroom and program assessment purposes, are we restricting our students’ agency in order to make sure we have the data we require?

In his chapter “Eportfolios as Tools for Facilitating and Assessing Knowledge Transfer,” Carl Whithaus advocates using portfolios in just this way, indicating how beneficial they can be for assessing whether learning outcomes are being met and, even more importantly in his estimation, for inquiring into issues of transfer.  Other readings described Eportfolios used for large-scale assessment as well; however, in many of these articles, it was unclear to me whether the large-scale assessments were simply taking classroom portfolios and re-purposing them or whether the classroom portfolios were being designed with the large-scale assessment in mind.  If the former, I see nothing wrong with this; if the latter, I find it severely troubling.

Intriguingly, when reading for our next unit, I found that Michael shares some of these same concerns.  In recounting his experiences at Clemson (I presume), he describes the general education Eportfolio he was forced to use.  Reluctant at first, he became even more concerned when, as he notes, “…it became clear to me over the course of the year that student learning was secondary to administrative assessment needs in this particular system” (83).  This was my fear as I read many of the other articles–was the need for large-scale assessment driving classroom instruction?

One of my favorite aspects of Eportfolios is the ability of the individual instructor to tailor the assessment tool to his/her classroom and, further, for students to then tailor it to their own individual purposes.  Yet, when these portfolios also become data for large-scale assessments, some semblance of standardization seems almost paramount.  It stands to reason that such standardization would, in turn, restrict student agency.  Thus, a conflict arises–in order to use Eportfolios as data in large-scale assessment (presumably to create better environments for our students), we must simultaneously restrict our students.

This led me to wonder about assessment design.  Is there a method for using portfolios for large-scale assessment that does not require taking agency over the Eportfolios away from instructors/students?  I realize that my commentary on the need for standardization is indeed a major assumption, yet I cannot think of any manner in which Eportfolios could be implemented in the classroom at instructor and/or student discretion without producing a logistical nightmare for the large-scale assessment.  By standardizing the Eportfolios, it would appear as if an “end justifies the means” logic is being employed, which I am not necessarily comfortable with; however, if the Eportfolios are specific to particular classroom contexts and students, it would seem to create validity and reliability issues for the large-scale assessment.

If we can’t have our cake and eat it too, I would always advocate for the classroom (my bias towards classroom assessment being patently obvious here).  I feel as if autonomy for instructors and agency for our students in designing and implementing these portfolios should trump the needs of the program or institution.  Yet, I can’t help thinking that I am missing something here or, perhaps, creating a rather reductive binary.  Thoughts?  Solutions?


The Ethics of Making Discrimination Claims

On the first day of my Teaching College English course at UNC Charlotte, Dr. Tony Scott went over some issues with how he wanted us to compose our forum posts.  Initially, as students, we reacted somewhat harshly to this—it seemed rather controlling.  However, reflecting on this a few years later, I can see the point Tony was trying to make.  One of his slides was of a cat making an angry face with the words “I disagree!” overlapping the picture.  He referred to this as the contrarian and admitted that he had been an exceptional example of this during his graduate career.  I immediately identified with this caricature, but—over the last few years—I have tried to balance this tendency.  When I read, I try to be both accepting and critical of the author(s)’ claims.

As I did the readings for this unit, though, I found my reactions to the articles to be rather polarized.  I was either nodding my head in agreement or utterly confused about how someone could make such unsubstantiated claims.  Perhaps my two favorite pieces, as I’ve mentioned in class, were Chris Anson’s (I’m an admittedly biased, huge Anson fan!) “Black Holes” and Johnson and VanBrackle’s “Linguistic Discrimination in Writing Assessment.”  Reflecting on why I favored these articles in particular, I came to a realization—I appreciated them the most because I thought their methodologies were well executed.

Racism and discrimination are issues in writing assessment that need to be critically examined; if they are not, the tendency for writing assessments to become more oppressive is quite frightening.  That being said, I have always felt that claims of racism and discrimination need to be handled in an ethical manner.  Such accusations carry a serious gravity that can tarnish the reputation of the accused.  In many cases, the accused becomes stigmatized regardless of the validity of the accusation.  Hence, research conducted on racism and discrimination should be diligent and well-reasoned—the margin for error or sloppy work is much smaller.

Anson does a brilliant job of showing the enormous lack of research on racial issues in regard to WAC.  He documents this quite well, analyzing various major publications, collections, etc. to show the paucity of discussions of racial issues.  This results, in my estimation, in an ethical and fair claim—WAC research and scholarship ignore issues of race all too frequently.

As impressed as I was with Anson’s work, Johnson and VanBrackle impressed me even more.  As I read their article, I found myself pushing back at times; however, every objection I raised was immediately met with sound reasoning behind their methods.  Johnson and VanBrackle accounted for unforeseen variables and possibly faulty assumptions.  This was primarily a result of their research design, which examined three different categories of error–AAE, ESL, and SAE–in relation to one another.

This removed a potential problem I initially saw with this line of research.  When Johnson and VanBrackle first discussed their categories of errors, I was skeptical of the differentiation between AAE and SAE; primarily, I felt that AAE errors were far more salient (my term, not theirs) than SAE errors.  As we read, we tend to hear the text in our own heads.  Thus, errors such as plural issues and “incorrect” uses of the “to be” form are quite noticeable; semi-colons in place of commas are not as “internally audible.”  So, as I read, I immediately objected to this necessarily being a form of discrimination; instead, it seemed a result of the aforementioned process, which might reflect a bias in the way we read, but not necessarily that the raters were biased due to racial issues in particular.  However, quite brilliantly, when Johnson and VanBrackle brought in the ESL examples, my objections were rendered completely null and void.  These errors are also quite “internally audible,” which means that the discrepancy in how raters responded to AAE errors versus ESL errors was quite indicative of discrimination.  When paired with the SAE scores, the trend became quite noticeable.  Johnson and VanBrackle’s claim that “Based on previous research, it is not surprising that our results indicate linguistic discrimination, but what is surprising and provocative is the extent of the discrimination against AAE errors” is well-founded (47).  Their methodology was sound, and such a claim is valid based on their extensively thought-out research design.

When I contrasted these pieces with Whithaus et al.’s claims in “Keyboarding Compared with Handwriting,” I felt there was a discernible difference in the quality of the methodology and, consequently, the quality of the claims.  While not necessarily about race, this example is, in my opinion, a prime illustration of poorly founded claims resting on problematic methodology.  In this article, the authors conclude, “In the present study, one aspect of the data that interested the researchers was the 3% higher pass rate of the typed exams. This statistic could indicate that students perform better when keyboarding, that raters – despite their claims to the contrary – deduct points from typed exams less frequently than from handwritten ones, or a combination of the two scenarios” (14).  This seems fine on the surface; however, it glaringly omits another possible variable behind this discrepancy.

Sure, it is quite possible that students do perform better when keyboarding and that raters may have a bias, yet the data do not entitle the authors to this claim.  Another variable was not accounted for—students who prefer to keyboard could, quite possibly, be more comfortable with computers because they grew up with them in their households.  Those students likely came from families with higher incomes, which in turn likely meant better educational opportunities.  Hence, the discrepancy might have nothing to do with Whithaus et al.’s hypotheses; instead, a likely scenario is that these students were afforded more opportunities growing up, which led to better development of their writing skills.  This still demonstrates societal inequity, but it also suggests that:  1) keyboarding might not be inherently better and 2) the raters may not be biased.  Much like the ice cream sales and drowning deaths example discussed in class, correlation does not necessarily mean causation.  In the Whithaus et al. article, access and educational opportunity are the “more people swim in the summer when more ice cream is sold” factor—in essence, there might be something unaccounted for.
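
To make the confounding argument concrete, here is a hedged little simulation (the numbers are entirely invented and have nothing to do with Whithaus et al.’s data):  a lurking “opportunity” variable drives both the choice to keyboard and underlying writing skill, and a pass-rate gap appears even though the exam mode has no causal effect at all.

```python
# Toy simulation of a confounder, with invented probabilities: prior access /
# opportunity influences both the choice to keyboard and writing skill, while
# the exam mode itself has zero effect on passing.

import random

random.seed(1)

def simulate_student():
    advantaged = random.random() < 0.5                      # lurking variable
    keyboards = random.random() < (0.7 if advantaged else 0.3)
    skill = random.gauss(0.6 if advantaged else 0.4, 0.15)  # writing ability
    passes = skill > 0.5                                    # mode never enters in
    return keyboards, passes

results = [simulate_student() for _ in range(100_000)]
typed = [passed for keyed, passed in results if keyed]
handwritten = [passed for keyed, passed in results if not keyed]

print(f"typed pass rate:       {sum(typed) / len(typed):.1%}")
print(f"handwritten pass rate: {sum(handwritten) / len(handwritten):.1%}")
# The typed group passes noticeably more often, yet keyboarding caused none of it.
```

The gap in the printout comes entirely from who ends up in each group, which is precisely the alternative explanation the authors leave unexamined.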

In their defense, Whithaus et al. did not make a damaging claim of racism and/or discrimination.  Still, their claims demonstrate how failing to account for other possible variables can produce an erroneous conclusion.  When it comes to race and discrimination in relation to writing assessment, we have an imperative to examine the potentially biased and discriminatory nature of our writing assessments.  Yet, in doing so, I feel that we also have an ethical obligation to make sure that any claims regarding race and discrimination are based on fundamentally sound empirical methodologies.  Such charges have a profound impact—our knowledge claims on these issues must extend from meticulous and well-reasoned research, not from a desire to have our hypotheses proved.  Just as we have an obligation to design ethical writing assessments that are not racist or discriminatory, we also have an obligation to research possible claims of racism and discrimination in an ethical and just manner.

Efficiency and Agency: At Loggerheads?

My research interests revolve around writing assessment and student agency. Although my focus at the moment is primarily on response, I envision my scholarship throughout my career as addressing the intersection of the two in various assessment settings. Oftentimes, writing assessment is viewed as having potentially debilitating effects on those who are assessed; whether the issue is racial bias, improper placement, denial of access to various educational opportunities, etc., as a field, we realize the potential dangers of inappropriate, unethical assessment practices. Yet, perhaps the most ominous threat many current writing assessments present, in my estimation, is their potential to severely limit and hinder student agency.

Frequently, we judge an assessment by what it tells us about students’ writing (i.e., “does it measure what it purports to measure?”); however, while this is quite critical, writing assessment scholarship does not address what assessments tell students about writing as often as it probably should. Through the process of assessment washback (see Perelman and Anson for discussions of this), in many instances, writing assessments work in reverse—the curriculum teaches what the test purports to measure. Moreover, when high-stakes testing and writing assessments carry such gravity in decisions about access, it is only natural that students, realizing they are being judged for admissions or advancement on these assessments, would internalize the expectations of these writing assessments as emblematic of what “good” writing should look like. As Michael Neal notes, “I am concerned that the types of writing assessments we use are mechanizing the minds of students in ways that are contrary to what we are or should be teaching them about writing” (24).

The root cause of this, I contend, is the desire for efficiency. In his article “The Worship of Efficiency: Untangling Theoretical and Practical Considerations in Writing Assessment,” Michael Williamson discusses how efficiency has become the “god-term,” in a sense, of the assessment community. With diminishing budgets, coupled with a somewhat paradoxical desire for large-scale assessments in education, efficiency becomes priority number one—how can we assess students’ learning (in particular, writing abilities) as efficiently and cost-effectively as possible? When contemplating the popularity of the SAT, Michael Neal contends, “That’s the almost sinister beauty of the SAT verbal writing assessment. At a time when resources are scarcer than ever, an efficient writing assessment appears that shifts the financial burden away from institutions…” (37). This efficiency is rather productive from an economic standpoint, yet it fails to account for the assessment technologies that are necessary to produce a writing assessment on such a large scale.

To ensure inter-rater reliability, the prompt needs to call for a genre (i.e., the “marvelous” five-paragraph theme) that is rather reductive; furthermore, such reduction inevitably leads to calls for machine scoring. Both the “Orwellian” trained scorers and the machines create a view of writing that is the direct antithesis of what we are attempting to teach students about writing (not to mention that the more reductive an assessment is, the more easily it can be “gamed”). But, unfortunately, these concerns are, more often than not, trumped by the myopic desire for efficiency.

Racking my brain, I had a difficult time coming up with an example of a writing assessment currently in use that is efficient yet promotes a rich, authentic view of writing while allowing for student agency. The writing portion of the SAT is quite efficient, but I do not believe any scholars are willing to champion the agency granted to students (please note sarcasm) via the five-paragraph theme. Portfolio assessments, in most instances, do promote a rich, authentic view of writing, yet—for the most part—they are not especially efficient from either a time or an economic standpoint (while Dr. Neal and co. might have gotten the cost of assessing a portfolio down into the $3-a-portfolio range, I’m sure they still cannot compete with the SAT in this regard). These thoughts have led me to a rather vexing set of questions: Can we design writing assessments that are efficient while simultaneously bolstering student agency? Are efficiency and student agency natural gladiatorial combatants in the arena of writing assessment? Does one necessarily have to receive the “thumbs down” in order for the other to triumph?

 

Power to the People!

*This blog post conveys some strong sentiments.  They are not meant to offend or point fingers; rather, they are kept in their emotive form to express personal conviction and evoke critical reflection within the field.

While I enjoyed the readings for this unit, I was particularly drawn to Linda Adler-Kassner and Peggy O’Neill’s Reframing Writing Assessment.  Specifically, I thought they did a great job explicating current assessment frames as well as highlighting what I, personally, view as our field’s major liability as a discipline–in spite of the content of our discipline, we (ironically) oftentimes fail as public rhetors.  In the end, we are rather talented at articulating our viewpoints amongst each other and at gaining a consensus (even if it is not perfect) about major issues.  However, when we try to express these issues and opinions to a wider audience, we all too often fail.

One of the main reasons I am drawn to writing assessment is that I believe issues in regard to writing assessment (and educational assessment in general)–and the subsequent debates that emerge as a result–are critical to Rhetoric and Composition’s future.  In my estimation, this will be our generation of Rhetoric and Composition scholars’ “battleground.”  Yet, I am increasingly disappointed by our efforts to effect change.  While many of the histories we have read portray a movement toward more localized, contextual assessments, the proliferation of state and national writing assessments has continued, especially in the K-12 setting.  We may have improved many assessment practices at the collegiate level, but–in my opinion–the shit is going to start rolling uphill.  As Adler-Kassner and O’Neill note, “…researchers who have either scrutinized the ways writing instructors engage with audiences outside the academy and/or have engaged these audiences themselves have noted, we are notoriously inexperienced with this work” (103).

As we fight to gain traction and have our voices heard in assessment related issues, it seems as if Rhetoric and Composition is ill-equipped to “rhetorically rumble” with the stakeholders and testing companies that present major obstacles to developing rich and authentic assessment models.  Unfortunately, these stakeholders and testing companies are much more PR savvy and, to be honest, know how to appeal to, and connect with, the general public much better than we do.

So, for this blog post, I thought I would create an “addendum,” of sorts, to Adler-Kassner and O’Neill’s suggestions for improving communications with the media.  My suggestions, though, will focus more on appealing to the general public at large.

1)  Stop, as a field, being so averse to numbers!  Ok, so I am definitely riding Haswell’s coat-tails with this (see the Elliot and Perelman collection in reference to this), but I am continually frustrated by how critical Rhetoric and Composition is of statistics.  Haswell aptly demonstrates how numbers can be rhetorical tools–even if our attachment to post-modern theory makes us skeptical of absolutes and distrusting of the “objectivity” of numbers, the fact is, they convey messages rather well.  As a field, I believe we are well positioned to use numbers ethically and are quite capable of combating the numbers produced by our opposition (see previous blog post for how we could rather quickly mess with these companies’ reliability figures).  Presented with charts, graphs, and overwhelming statistics, the average citizen outside our field is more likely to respond than if we theorize them to death!

2)  Learn to communicate with the general public.  Ok, so I’m venting a bit here but please indulge me.  Nothing frustrates me more about Rhetoric and Composition than its constant discussions of equality, scholarship on class differences, morally superior stances, etc.  Do not get me wrong–these are all admirable; however, many scholars are so out of touch with the working class that these efforts are hypocritical at best and disingenuous at worst.

While we purport to care about these disenfranchised individuals, how often do many scholars actually associate with them?  We tend to favor (and this is a broad, broad generalization) eating at fancy restaurants, having exquisite tastes, engaging in overly intellectualized conversations, etc.  Furthermore, some actually seem to look down on those who engage in less high-brow activities (i.e., eating at chain restaurants, less “refined” tastes, and trivial conversation).  Until we, as a field, become better at interacting with those we so adamantly insist we are defending, we are going to continue to come off as snobbish elites who are generally unlikeable.  In Adler-Kassner and O’Neill’s words, “We can, of course, use theory from our own field to frame the issue, but we must remember (as rhetorical theory reminds us) the roles of audience, purpose, and context in shaping the message” (103).  If we continue to be detached from the general public, how can we possibly appeal to them?

As much as possible, we need to engage our friends and family in discussions of the issues that are important to us and frame them in ways they can understand.  This is not as difficult as one might think (I can personally attest to this), yet it requires valuing genuine communication over the appearance of intellectual superiority.  We need to address these issues in relatable ways and, overall, be more relatable ourselves.

3)  Conduct more research on the adverse effects unethical assessment practices have on parents and students.  This one seems rather simple, yet we sometimes tend to focus on why many of these writing assessments are inherently flawed (validity, validity, validity!) rather than on the damage they are actually doing.  This particular point was inspired by Condon’s article.  As I read about how his assessment enabled WSU to gain valuable insight into how students viewed their education, I couldn’t help but wonder about the benefits that exploring the effects of these tests could have.  This research is definitely out there, but we need more of it, and we need to market it better, which leads me to my next point.

4)  Establish a sub-discipline within Rhetoric and Composition that merges our work with public relations.  In essence, we need to train and develop a PR team!  Writing assessment, gender studies, visual rhetoric, etc. are all subdivisions (in a sense) of Rhetoric and Composition.  Perhaps we need to start developing a subdivision that studies and performs public relations tasks.  This kind of work is ripe for analysis from our field and would serve to develop a set of scholars specifically trained to work with the general public and the media.  As a result, we could effectively get our message across to a broader audience.

5)  Contemplate how ridiculous our aversion to “non-scholarly” publication is.  This one really is quite simple–if these issues affect the public at large, it might be better for our publications to, well, address the general public!  Within our field, tenure decisions and academic credibility seem to be contingent on publishing in academic journals and composing books meant for academic audiences.

This can be quite productive–it serves to develop our pool of knowledge.  Yet, what good is it if we only ever communicate amongst ourselves?  Overall, we need to embrace scholars writing articles in magazines, publishing books for the mainstream, etc.  The pretentiousness of the current scholarly publishing trend is in direct opposition to our goals!  If you want to communicate with the general public, write for them.

Overall, admittedly, these points are all based on a prominent belief of mine–if those affected by poor assessment practices (i.e., parents, students, teachers) knew as much as we do about the dire effects, the public backlash would be substantial.  I firmly believe that the general public is not indifferent to these matters; they merely do not understand them enough to see what is at stake.  We need to, as a field, bring our message to the public so we can form committed alliances that, through sheer force of numbers and resolve, can take on corrupt stakeholders and various financial interests to ensure a better educational future for our students!

Bullshit!

Many of us are probably familiar with the card game bullshit.  Essentially, players attempt to discard their hands into the pile based on the card they are supposed to throw on their turn.  However, as the name of the game suggests, the trick is not only to get rid of the cards you have that correspond to the card you are supposed to throw; you also discard other cards (along with or in place of the correct one for your turn) in an attempt to deceive your opponents.  If another player suspects you are “bullshitting,” they can call bullshit.  When they are right, you must pick up all of the discarded cards; when they are wrong, they must pick them up.

While the game fits the accepted societal definition of bullshit, its players are–essentially–lying.  For philosopher Harry Frankfurt, though, lying and bullshitting are two separate acts.  Les Perelman has used Frankfurt’s work in discussions of mass-market writing assessments and summarizes the distinction well:  “As Frankfurt argues, bullshit is actually more dangerous than outright lying because while the liar knows the truth, although he wants to lead his audience away from it, the bullshitter is unconcerned with the truth…” (426).  This conceptualization of bullshit seemed rather apt (to me at least) for Cherry and Meyer’s discussion of the tertium quid.

This process, in which a third evaluator is brought in and, subsequently, the “bad” rating is thrown out, is absolute and utter bullshit!  As Cherry and Meyer note, “It is precisely the big misses that most contribute to low reliability” (41).  If you are truly concerned with reliability, you need to account for these misses; however, if you merely care about having an impressive number, the tertium quid is the best method of bullshit possible.  Disregard outliers–it’s just that simple.  Now, I realize that disregard for outliers is acceptable practice for some statistical calculations (I can’t think of any off the top of my head, but I have encountered them), but, in this instance, the outliers are essential to determining the “truth.”  To disregard them is to be unconcerned with the reliability of your test.
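
To see how much work that simple disregard can do, here is a quick sketch with invented scores (not anyone’s actual data):  compute the inter-rater correlation on all of the paired ratings, then recompute it after the big misses have been “resolved” away.

```python
# A minimal sketch with invented rater scores: discarding the "big misses"
# (pairs where two raters disagree by more than a point) dramatically inflates
# the reported inter-rater correlation.

from statistics import correlation  # Python 3.10+

rater_1 = [4, 3, 5, 2, 4, 6, 3, 5, 2, 4, 1, 6]
rater_2 = [4, 3, 5, 2, 5, 6, 3, 4, 6, 1, 5, 2]  # last four pairs are big misses

print(f"all pairs:             r = {correlation(rater_1, rater_2):.2f}")  # near zero

# Tertium quid logic: pairs differing by more than a point are "resolved" by a
# third reader, so the original discrepant ratings vanish from the statistic.
kept = [(a, b) for a, b in zip(rater_1, rater_2) if abs(a - b) <= 1]
kept_1, kept_2 = (list(t) for t in zip(*kept))
print(f"big misses discarded:  r = {correlation(kept_1, kept_2):.2f}")    # roughly 0.9
```

The disagreement hasn’t gone anywhere; only the number that gets reported has changed.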

In this sense, test designers are not really lying:  the correlation coefficient is perfectly accurate given the manner in which they calculate it.  Yet, by not making their methods transparent, they are bullshitting.  This is why I concurred with Cherry and Meyer’s call to make methods of calculation more transparent.  Furthermore, as Perelman has contended, “…few recent College Board Research, ETS Research, or Pearson Educational Research Reports discuss failings or problems in any of the standardized tests owned, developed, or administered by these companies…Moreover, it has become commonplace for the abstracts of many of these research reports to include assertions that are not supported by data in the actual report” (432, emphasis added).  Hence, the research being conducted on these tests shows no concern for any semblance of truth–it is merely propaganda!

Indirectly, this leads me to a second point I’d like to address.  According to Lorrie Shepard, “Test makers are not responsible for negative consequences following test misuse, nor should such effects be folded into the validity evaluation” (8).  Taken at face value, this contention seems fair; however, if we examine it more closely, it is rather unethical.  To illustrate my point, I’ll take up Cherry and Meyer’s metaphor of missile accuracy calculations.  They equate the tertium quid to missile accuracy tests discounting “…projectiles that missed their targets by more than 100 feet” (41).  During our last invasion of Iraq, so-called “smart bombs” did not live up to expectations–in fact, they were responsible for extensive collateral damage as well as many Iraqi civilian casualties.

Now, let us say that these missiles were not intended for close-quarters urban combat (I am not sure whether this was the case; I use it merely for hypothetical purposes).  By Shepard’s logic, the defense contractors are in no way responsible.  Yet, since they projected such accuracy, it can only be assumed that the military would employ the missiles in this environment and–based on the statistical evidence–be confident in their ability to hit their targets without collateral damage.  Even though this wasn’t an intended use, it was a logical deployment of the technology based on the data provided.  Apologies to Lorrie Shepard, but I find these defense contractors highly responsible for the damage their technology caused, regardless of the intended use.  By providing inaccurate information, they made such a use seem plausible.

The old saying goes, “Liars, damned liars, and statisticians.”  Perhaps a more fitting phrase for our discussion would be, “Bullshitters, damned bullshitters, and mass-market test designers.”  The more I read about assessment and the many unethical practices that result from it, the more I have come to believe that we need to start calling “bullshit” on these people.  However, as Dr. Neal mentioned in class, our field’s ability to communicate outside of itself is rather lacking at times.  Presented with fancy charts and figures or some academic babbling about theory, the general public will most likely respond to the former.  I guess my main questions are:  How can we tailor our rhetoric to compete with those appeals without falling into the ethical pitfalls of our opponents?  Is there a way to “fight fire with fire” without being reduced to their level?

Reliability and Ethics

Huot and Neal’s “Writing Assessment:  A Techno-History” caused me to contemplate the concept of reliability–particularly the methods by which we determine reliability and its ongoing tension with validity.  Having been a reader for an assessment Dr. Neal designed, I find the influence of William L. Smith on his thinking quite apparent.  Two other readers and I were asked to read students’ compositions and evaluate them based on where we felt the students should be placed in FSU’s FYC program.  Generally, two of us read each student essay, with the third weighing in if there was any disagreement.  Before we began, Michael had us read a small batch of student essays to determine how we would evaluate them; amazingly, we were rather reliable scorers from the start–little disagreement took place among the three of us.

However, reading through the numerous histories of writing assessment, I became intrigued by what would have happened if we had not agreed.  According to Huot and Neal, “The technology of testing in this century has moved dramatically toward valuing objectivity over subjectivity in decision making, adding to the strength of the psychometric, educational assessment community…” (423).  Reliability is usually viewed as a rather objective enterprise.  Inter-rater reliability refers to the consistency of scorers; generally speaking, the closer the coefficient is to 1, the better.  Yet, in none of the discussions of reliability throughout our readings did I find any ethical consideration of how reliability is achieved.

Take, for instance, the assessment I worked on for Dr. Neal.  If we hadn’t agreed as scorers–and, for hypothetical purposes, let’s say I was the root cause of this–how might reliability have been achieved?  If Dr. Neal had discussed why my scores were different and, through intellectual discussion, guided me toward scoring within the intentions of the assessment, this would seem to be an ethical route to reliability.  Presumably, this is the method Dr. Neal would have undertaken.

But what if Dr. Neal had used his institutional authority over me?  I was compensated for this assessment; furthermore, Dr. Neal is my major professor.  Had Dr. Neal insisted that I score and/or rate these essays in a certain way, threatening to take me off the assessment, I would have risked losing compensation and disappointing–even infuriating–someone critical to my academic success.  I consider myself a principled person, yet–to be honest–I would have backed down in a heartbeat no matter how right I believed I was.

In both cases, Dr. Neal would have a strong correlation coefficient, yet the methods of obtaining this number seem drastically different from an ethical standpoint.  Furthermore, in many ways, the second hypothetical method is frequently used on scorers of standardized writing tests–ETS will norm them and, if they do not fall in line with the pack, they do not keep their jobs.  In essence, reliability is coerced.  Hence, these assessments are not as objective as they seem.

This led me to question whether the reliability vs. validity argument shouldn’t also include ethical discussions of how reliability is achieved.  I’m sure this has happened in the past, but it does not seem to be a major factor in discussions of reliability.  Essentially, it seems not to matter how the @*#& you get the number, just whether you actually have it.  Validity is a nice counterbalance, but discussions of the ethical ramifications of achieving reliability–in my opinion–would be quite beneficial in exposing nefarious assessment practices.

What are your thoughts?  Am I way off base?