What Works in Schools: A Research-Based Approach to School Improvement
Presenter: Bob Marzano, Marzano and Associates, Centennial, CO
This session is presented in separate parts.
Part Two
BOB MARZANO: If the school employs an effective feedback system (well, let me put it differently), if they don't employ an effective feedback system, they really don't know how students are learning, or whether they are learning at all. Now, this is a tough one. Remember I said these school-level factors are listed in rank order. I say these first two are the key. That is why I am spending some time on them, because I think a lot of schools are already doing well in the other three. Now, this one is arguably the most difficult, because we are in an environment now where the primary vehicle for feedback is something that, by definition, does not provide good feedback. Let me state it differently. Out there in the public, with legislators and parents, what do they look to as the evidence that students in a school are learning or not? They look to some type of external test, right? Standardized tests, State-level tests, or the two wired together. Now, I am a testing person. By that I mean that external tests have their place, and they have an important place. But not as a feedback mechanism, and not as the primary tool to make decisions about how a school is doing. And certainly never as a tool to make decisions about how an individual student is doing. To back that up, let me go back to Measurement 101, back in your undergrad days. You all remember the terms "reliability" and "validity"? Remember the term "standard error of measurement"? Do you remember what it means? Some of you remember all too well; for some of you, it's a little fuzzy. Well, for those of you who, like me, find it a little fuzzy, let me remind you. Basically, everything we do relative to assessment in education and in psychology, and in the social sciences for the most part, revolves around classical test theory. And classical test theory really boils down to a fairly basic, simple equation.
Here is the equation: your score on any test (or any assessment, I should say; it's called the observed score) is made up of two parts: true plus error. The true score, plus error. Your true score is the score that truly represents what you knew on that assessment at that particular time. Does that make sense? Plus error. And error can work either to inflate your observed score or to deflate it artificially. Let me illustrate. I am your student. I am in your class. You give a 100-point test on the midterm. My observed score, the score I get, is a 75. But let's say my true score is 80. How can that happen? I got a 75; I should have gotten an 80. Well, can you see where maybe I was tired? Can you see where it was a timed test and I didn't get to the last few items? Can you see the possibility that you were tired when you scored the test? That's error. Let's do it the other way around. I got a 75; my true score was a 70. How can that happen? Well, for a little comic relief, I always throw in Elayne Boosler here. She is a comic, and I remember seeing her routine once. She said she hated those math tests where you had two pieces of paper. On one piece of paper you put your answer; on the other piece of paper you had to show how you got your answer. She said that on the second piece of paper she would always have to draw a picture of herself looking at the paper of the kid next to her. [Laughter] BOB MARZANO: That's error working for you, obviously. Now, the standard error of measurement tells you how much wobble you can expect in any test. Let me use a test that is pretty commonly known and widely used. The ITBS: how many of you are familiar with the Iowa Test of Basic Skills? [A show of hands] BOB MARZANO: Now, let's say that your sixth-grade son comes home with his Iowa test scores. As the good parent that you are, you start scanning through the subtests, and here is what you find: on capitalization, his grade equivalent is a 5.0.
So, he's a sixth-grader, and his grade equivalent on capitalization is a 5.0. Are you happy or are you sad? You're sad, right? Maybe you're a little mad. No TV for you tonight, young man. We're going to capitalize. We've got a year of capitalization to build in here. [Laughter] BOB MARZANO: Until you look at the standard error on the ITBS. The standard error is about .8 for that subtest. Actually, this is the older version; I am not sure what the newer one is. My guess is it's pretty much the same. Now, here is what you do with that standard error. You form what are called confidence bands, or confidence intervals, around the observed score. Now, I am not being completely mathematically accurate here, but close enough. If you want to form a rough 95 percent confidence interval, you add and subtract about two standard errors. So, if the standard error is .8, what are two standard errors? 1.6, right? And you add that to and subtract it from the observed score. So, the observed score was 5.0, plus 1.6, minus 1.6. You are 95 percent sure the true score is anywhere between a 3.4 and a 6.6. Are you happy or are you sad? You don't know, right? He might be worse off than you thought. He might be two years behind. [Laughter] BOB MARZANO: You're going to capitalize at night and in the morning. Or it might be time for Baskin-Robbins, right? I mean, he is six months ahead of the game. Now, I've shown this to parents and legislators in places where they use the ITBS, and they say, well, let's get rid of the ITBS. I say, well, you don't get it; that's as good as it gets. The ITBS is a great test. I recommend it all the time. I think it is a wonderful test. There is error in every test out there. You cannot get away from it. No matter how good the test is, there is always wobble. You're never sure. What usually really brings this home to people is the SAT. Does anybody know the standard error for the combined verbal and quantitative score on the SAT?
If I'm correct, it's about 33 points. So, your daughter, in this case, comes home with an SAT score of 900. If you want to form the 95 percent confidence interval, you add and subtract 66 points. So, you are 95 percent sure the true score is anywhere from 834 to 966. Look up the percentile difference between an 834 and a 966 sometime, and you will see how much wobble there is on a fine test like the SAT. Anybody here take the GRE? [A show of hands] BOB MARZANO: A few hands. Anybody here who, like me, did not get accepted to the doctoral program you really wanted because your GRE scores were too low? [No response] BOB MARZANO: I'm the only one in the house, apparently. [Laughter] BOB MARZANO: Anybody know the standard error for the GRE? According to Monty Neill, it's about 45 points. So, take your cumulative GRE score, add and subtract 90 points, and look up the percentile difference between your low score and your high score. So, I'm telling you something you already know, but it is crazy when you think about it, isn't it? Now, I mentioned Monty Neill, of FairTest. If you go to his Web site: first of all, Monty and that particular Web site are the best I've ever seen for keeping track of high-stakes testing. He is a wonderful guy. But we are in a crazy situation. I believe, at last count, the number of States that had high-stakes assessments (by high stakes I mean that if students don't do well on the assessment, something not very good happens, as in they don't get a certain type of diploma)... That is crazy. That is an absolutely ludicrous place to be. The American Psychological Association and the American Educational Research Association regularly (regularly, about every five years) put out a little document that is really a set of standards relative to assessment. And in every single one of those documents, either implicitly or explicitly, and most often explicitly, they say never, never, never make a decision about an individual based on a single test or a set of tests.
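In classical test theory terms, the observed score is the true score plus error (X = T + E), and the band described above is simply the observed score plus or minus about two standard errors of measurement. A minimal Python sketch of that arithmetic follows; the function name `confidence_band` is hypothetical, and z = 2 is the speaker's rounding of the normal-curve value 1.96. The standard errors are the approximate figures quoted in the talk (0.8 for the older ITBS capitalization subtest, 33 points for the combined SAT), not official published values.

```python
from __future__ import annotations


def confidence_band(observed: float, sem: float, z: float = 2.0) -> tuple[float, float]:
    """Return (low, high) bounds around an observed score.

    Classical test theory treats observed = true + error. Adding and
    subtracting z standard errors of measurement (sem) gives a band that is
    likely to contain the true score; z = 2.0 roughly matches 95 percent
    (the exact normal-curve value is 1.96).
    """
    half_width = z * sem
    return observed - half_width, observed + half_width


# Approximate standard errors as quoted in the talk (assumed, not official).
EXAMPLES = [
    ("ITBS capitalization, grade equivalent", 5.0, 0.8),
    ("SAT, verbal + quantitative", 900.0, 33.0),
]

if __name__ == "__main__":
    for label, observed, sem in EXAMPLES:
        low, high = confidence_band(observed, sem)
        print(f"{label}: observed {observed:g}, band {low:g} to {high:g}")
```

Running it prints bands of 3.4 to 6.6 and 834 to 966, matching the figures in the talk. With the GRE standard error of about 45 points, the same call gives a band of plus or minus 90 points around whatever cumulative score is passed in.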
Yet I think that is already happening all across the country. And I think the trend will get stronger and stronger. Also, at the school level, how many of you are in States where you have a report card on your school and, if you don't do well, something happens? [A show of hands] BOB MARZANO: That's funny. I believe this started in Kentucky, with the KIRIS model, although Kentucky has been very forward-thinking in this, and I think things have changed somewhat from the anecdote I'm going to give. But the idea has been to get a score for a school, usually on a very complex index. You collect baseline data. Then, after a while, you are measured again. If you have met expected growth, things are good. If you have exceeded it, things are even better. If you haven't, you're put into a category. Now, my information is somewhat dated, but I remember that when it all first started, I began informally collecting the names of the categories you were put into if your school did not meet expectations in terms of growth. I believe Kentucky started with the term "a school in crisis." Now, I don't know if I like that or not. They have changed that term, but the image I got was of the ambulance coming up with the siren and that type of thing. I think Michigan at one time was playing with the term "reconstructed school." Now, I don't think I want to be reconstructed. I don't know about you, but that scared me a little bit. So, what is the alternative? What do we do? I think there is an option. One option, which is good but which I am not a terrific fan of, is to do more testing. And some places do that: they give end-of-quarter tests. And there is certainly a logic to that. You get a company, or you do it yourself, to create end-of-quarter tests for different subject areas. And, like I said, there is a logic to that. I would rather do that than the current system, but I think there is a better way of doing business.
I think there is a vehicle out there that is just sitting there, waiting to be used by us, to get better feedback on students, and to use that feedback to set challenging goals for all students and for our school. And that is the second most sacred cow in public education. I already gored the first: individual teachers' flexibility, or freedom, to teach what they want. The second would be individual teachers' flexibility, or freedom, to grade any way they want. So, let me talk about grading for a minute here. Do you know who Seymour Papert is? He wrote the book "Mindstorms." He is an artificial intelligence genius. I never heard him speak, but my friend Bob Ely did. Papert, basically, is the developer of Logo. Do you remember Logo? The thinking in those days (and Abelson and Chait were also involved in Logo) was that if you can get kids playing with computers, they will start to think in ways that you and I couldn't, because we didn't have access to that technology. They could actually see algorithms and functions being played out on the screen. Now, when my friend Bob Ely heard Papert speak a number of years ago, Papert offered a metaphor that was really telling. He said that if you took a physician from 100 years ago and put that physician in a current operating theater, the physician would have little idea what was going on; that much has changed in the operating room. However, if you took a teacher from 100 years ago and put the teacher in a current classroom, a lot would be different, but an awful lot would be familiar. Now, one of the things that would be familiar (maybe not from 100 years ago, but from 80 years ago) is the report cards that we use right now.
Now, with the advent of technology, computerized gradebooks, and computer-generated report cards, I would assert that teachers can keep track of students on specific knowledge and skills while doing no more record-keeping than they do right now. I'll say it again: you can keep track at a very specific level with no more record-keeping than you do right now, if two things are in place. The first is that you're lean and mean about what teachers are asked to keep track of. Can you see the first factor, a guaranteed and viable curriculum, playing in here? I hope so. If you really address the area of viability, and then add to that some computer software that helps teachers keep track of things (that's the second), they can keep track at a very specific level with no more record-keeping time than they spend right now; I would even assert less. Well, what would such a report card look like? I believe you can all see this fairly well. Here is a prototype of what a report card might look like. Notice at the top that you still have your overall grades. What is very different, though, is that for each subject area, you also have scores. Now, please don't get hung up on the 4-point scale that I'm using. Can you see where that could be a 100-point scale? Do you see where those could be A's, B's, and C's? I happen to like a 4-point scale, and I'm going to try to convince you of its utility. So, the idea here is that you still have your overall grades, but you also break them down into finer detail as to how students are doing on this subject, this topic, this learning goal, this standard, whatever you want to call it. Now, a supplemental transcript that would go along with it might look something like this. For those who can't read it very well, look at the area called mathematics. The first standard is called numeric problem solving. Do you see the average rating? It's a 2.4. Do you see where it says "number of ratings"? That's 5. Here is what it means.
Five teachers in five different quarters gave a rating for that particular student on numeric problem solving. This might have taken two years to put together. Is there error in the individual teachers' ratings? Yes, there is. But you can't get away from it. In the aggregate, though, what you start to see are patterns that I would assert have less error than that single test. Now, just let me play with this for a second. I'm going to spend a few minutes on this, but I just want to get your reaction to it. Assume that this was the report card used in your school. I know you have a lot of questions about it, but I've given you the basic feel. And assume that along with it comes a description of what the 1, the 2, the 3, and the 4 mean. If parents want, it would also come with a description of what was actually covered in each of the areas. So, assume that is there. Here is my question to you: what do you like about this report card, and what don't you like? But please answer from the perspective of a parent. Your son or daughter brings this home. What do you like, and what don't you like? Thirty seconds; just turn to the person next to you, please. [Pause] BOB MARZANO: So, what are some of the things you don't like? Well, I've asked the question of enough people that I can guess. How many said something like parents won't understand it? [A show of hands] BOB MARZANO: That's a valid concern. Well, let me tell you. Over the last 10 years that we've been playing with this at McREL, we have been pleasantly surprised at how open parents are to it. Really. And there are parents who just say, hey, I don't care about all that information; what grade did my son get? So, you will notice there is an overall grade up there. By the way, let me address that for a second. I cannot justify an overall grade on any grounds from a measurement perspective. I cannot justify an overall grade on any grounds from a learning perspective.
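The claim that many ratings, taken together, carry less error than any single one can be illustrated with the same true-plus-error model from earlier in the talk. The Python sketch below is an illustration under stated assumptions, not the report card's actual computation: the five ratings are hypothetical values chosen to average 2.4 like the example transcript, and it assumes each rating's error is independent with the same spread (a hypothetical standard deviation of 0.5 rubric points). Under that assumption the error in an average of n ratings shrinks by a factor of the square root of n; biases that teachers share would make the real gain smaller.

```python
from __future__ import annotations

import math
import random


def average_rating(ratings: list[float]) -> float:
    """Average of all rubric ratings recorded for one standard."""
    return sum(ratings) / len(ratings)


def error_of_average(rating_sd: float, n_ratings: int) -> float:
    """Standard error of an average of n ratings, assuming each rating's
    error is independent and has the same standard deviation."""
    return rating_sd / math.sqrt(n_ratings)


def simulate_error_of_average(true_score: float, rating_sd: float, n_ratings: int,
                              trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check of error_of_average: each rating is the true score
    plus normally distributed error; return the spread of the averages."""
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        ratings = [true_score + rng.gauss(0.0, rating_sd) for _ in range(n_ratings)]
        averages.append(average_rating(ratings))
    mean = sum(averages) / trials
    return math.sqrt(sum((a - mean) ** 2 for a in averages) / trials)


# Hypothetical ratings from five teachers on "numeric problem solving".
ratings = [2.0, 2.5, 2.0, 3.0, 2.5]

if __name__ == "__main__":
    print("average rating:", average_rating(ratings), "from", len(ratings), "ratings")
    for n in (1, 5):
        print(n, "rating(s): predicted", round(error_of_average(0.5, n), 3),
              "simulated", round(simulate_error_of_average(2.4, 0.5, n), 3))
```

The simulated spread should land close to the predicted one, and the five-rating average shows less than half the error of a single rating under these assumptions.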
Why would I give you an example of a report card with an overall grade, then? Here is the reason. I'm a chicken. [Laughter] BOB MARZANO: I don't think that we're ready to give up the overall grade yet. If you can pull that off, God bless, and more power to you. But I have just seen too many horrible battles when the issue of the overall grade was raised. So, I've taken the position: okay, let's give the overall grade, but give more information, and over time people will start to ask, so, what does that overall grade mean? My good friend and my boss at McREL, Tim Waters, had a saying he commonly used when he first joined McREL: are you willing to die on the hill for that battle? And of course, he was harkening back to the Native Americans who would stake themselves out in front of the village and either win or die. So, when I ask myself whether I am willing to die on the hill for the battle of the overall grade at this point in time, the answer is no. Because I know I will die on the hill. So, discretion is the better part of valor. Maybe it's not the time. So, let's give the overall grade, but then be more specific. Another thing people come up with: they say, well, isn't this too much work? Well, I said to answer from the perspective of a parent, not a teacher, but I know that comes up. Again, my qualifier is: don't do this if it requires more record-keeping time than it does right now. I say that, with the proper software and by being really lean and mean about what you keep track of, it doesn't. How about things you like about this? How many of you like the specificity? [A show of hands] BOB MARZANO: How many see this as a way of giving feedback to teachers, students, and parents? Can you see the possibility of a system like this?
If you had it in place, you actually could say, hey, Bobby didn't do well on the State test, but we have three years' worth of evidence that, in mathematics, he knows this standard, this standard, and this standard, and we stand by him. See, that's the advantage. I know the cost is great in terms of time and energy. This is not a simple thing to do. But, for me, we can't get away from cost, because the cost of doing what we do right now is that the external test will continue to make decisions about our students, about you, and about your schools. So, as I say, it's a lose-lose situation in terms of energy here. If we keep doing what we're doing, the price we pay is that external tests will be the indicator of student achievement. The cost on the other side will be some time and energy up front. Now, thank God there are schools that have already done this. I recommend to you John Kendall's work with numerous schools and districts. He has helped them identify what is essential versus what is supplemental. There are software programs out there that do precisely what I've described. The group we've been working with at ASCD is Excelsior Software, and Excelsior is in the convention hall. And there are other software companies. So, we have the wherewithal to do it now. What would it look like for an individual classroom teacher, though? Well, let me show you, briefly. The gradebook would look something like this. Now, for those who can't see it well, I will try to explain it to you. First of all, let me contrast it with the current gradebook. The current gradebook is the one where you write in the scores by hand. In the current gradebook, what do the columns represent? Assignments, right? Some people do it by day: first day, second day, third day. More commonly, people have a quizzes section, but that is also in time sequence. And then the homework, et cetera, et cetera. At face value, it works really well. All you do is put in your points there, right?
At the end of the quarter, you add it all up, divide by the total number of points, and turn that into a grade. By the way, humor me and say you remember the formula: observed score equals true score plus error. Remember that? Do you realize how much error is built in just from adding up the numbers and dividing? If you're a high school teacher (and I know that in California I've met high school teachers with over 200 students), unless you check your numbers, unless you recompute each student's score, there is probably a lot of error built in just from adding up the numbers. Which, again, is crazy when you think about it: why not let a computer do that? As long as you enter the data correctly, it does it automatically. So, in the current gradebook, the columns are the assignments, the assessments, whatever. The rows are the students. This gradebook is very different. There is a page for each student. On this particular page (and it doesn't show the student's name), the columns represent the standards, the outcomes. You follow me? And the rows represent the assessments, or assignments. Now, let's take a look at a specific one. Look at row i. First of all, that was a quiz on October 1st. What two things did that quiz measure? Do you see it? Help me out. Ocean currents and... oh, that's it. I can't see that far. Let's do j. That was another homework assignment. It measured two things, measurement and reading tables, correct? Did I get that right? Now, you say, well, how did the teacher get that? Well, here is how the teacher got that. When he or she put the test together, the first question the teacher asked was, what standards, what outcomes, whatever you call them, does this test measure? That's more work. You can't do it the way I used to do it, "it" being designing a test. You can't go to the back of the chapter and pull out 25 items. You have to think about which items measure which standards, which items measure which outcomes. So, we're losing time here, right?
That's more time than it currently takes to put a test together, at least the way I used to put tests together. Now, when the teacher scores the test, the teacher gives two scores, not just one: in this case, one score for measurement of temperature and one for reading tables. Now, I'm using rubric scores, but could you see that these could be percentage scores, too? Don't get sucked into my bias, although I wish you would. I like rubrics a lot. Now, if a rubric is used, here is what it might look like. This is a generic rubric, but I say rubrics can be used, even generic rubrics, quite effectively and quite precisely, realizing that we can never be totally precise. Just scan through this and get a sense of it. Start with the 3, actually. Get a sense of the 3. [Pause] BOB MARZANO: Do you see what the 3 says? The student has the content, but not in great detail. There are no misconceptions. Look at a 4. A 4 says the student has the content, in great detail, with no misconceptions. Look at a 2. A 2 says there are some misconceptions, but the student is still in the ballpark. Look at a 1. There are so many misconceptions that the student really doesn't have it. Now, let me apply this to this teacher assessing the test on the two factors, measurement of temperature and reading tables. What the teacher does is look at the items on temperature, and she doesn't score every item with the rubric. She looks at all the items that deal with that topic, then takes a step back and asks, in the aggregate, as a group, what do these tell me about this student's knowledge of that topic? Are you with me on this? Now, is there error in that? Yes. But remember, you can't get away from error. And I would assert that a teacher can look at student responses and, from a global perspective, make a fairly precise judgment as to the student's level of knowledge or skill. And that's where a rubric comes in.
Because a rubric is a description of levels of knowledge or skill. Actually, I still teach. I teach at Cardinal Stritch University in Milwaukee, via teleconferencing. This is the way I teach. At the beginning of the semester, I identify the topics that I want to cover. That's the hard part. I have to know where I'm going before I start. For each of the topics I'm going to cover, I give my doctoral students a rubric. And every time I give an assessment, I score it on the rubric. Now, it is hard at first. I mean, it is a little bit clumsy, because it's a different way of thinking. But teachers have reported, and I have found myself, that once you get used to it, you actually get very fast. Counterintuitively, here is what I found and what teachers have told me. If they have an assessment that measures two things, they read each assessment twice. In other words, they read every paper once for topic number one, in a kind of mental set that allows them just to go right through it. And then they read each paper a second time, in a different mental set, for topic number two. I've found that also with my own scoring of tests. And I would assert that I can do that faster than putting in points and adding up the points. Just to go a little more specifically here: what if you say, gee, on these four or five items on measurement of temperature, it's not a 2, because I don't see any great misconceptions, but it's not a 3 either, because I see some things wrong? What score do you give? How about a 2.5? Now, at face value, people say, well, that's crazy. Isn't that arbitrary? Remember, you cannot get away from error. That's why I spent a lot of time on fine tests like the ITBS and the SAT. You cannot get away from error. Does the half point improve the precision? Absolutely.
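The student-by-standard gradebook described above can be sketched as a small data structure: one page per student, one row per assessment, and one holistic rubric score for each standard the assessment measures, on the 1 to 4 scale with the half points just discussed. The following Python sketch is hypothetical throughout (class names, the "unit test" row, and all scores are illustrative assumptions, not the design of any particular gradebook product), and the October 1 quiz row records only ocean currents because its second standard was not legible in the talk. It also includes the traditional add-the-points-and-divide calculation for contrast.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean

# Generic rubric levels as described in the talk; half points are allowed.
RUBRIC = {
    4.0: "has the content in great detail, no misconceptions",
    3.0: "has the content but not in great detail, no misconceptions",
    2.0: "some misconceptions, but still in the ballpark",
    1.0: "so many misconceptions that the student doesn't have it",
}
VALID_SCORES = {1.0 + 0.5 * step for step in range(7)}  # 1.0, 1.5, ..., 4.0


def traditional_percent(points_earned: list[float], points_possible: list[float]) -> float:
    """Old-style gradebook: add up the points and divide by the total possible."""
    return 100 * sum(points_earned) / sum(points_possible)


@dataclass
class Assessment:
    label: str
    scores: dict[str, float]  # standard -> holistic rubric score for that standard


@dataclass
class StudentPage:
    """One page per student: rows are assessments, columns are standards."""
    student: str
    rows: list[Assessment] = field(default_factory=list)

    def record(self, label: str, scores: dict[str, float]) -> None:
        """Add one assessment row with a separate score for each standard it measures."""
        for standard, score in scores.items():
            if score not in VALID_SCORES:
                raise ValueError(f"{score} is not a valid rubric score for {standard!r}")
        self.rows.append(Assessment(label, dict(scores)))

    def column(self, standard: str) -> list[float]:
        """All scores recorded for one standard, in the order they were entered."""
        return [row.scores[standard] for row in self.rows if standard in row.scores]

    def summary(self) -> dict[str, tuple[float, int]]:
        """Average score and number of scores for each standard on the page."""
        standards = sorted({s for row in self.rows for s in row.scores})
        return {s: (mean(self.column(s)), len(self.column(s))) for s in standards}


if __name__ == "__main__":
    page = StudentPage("hypothetical student")
    page.record("quiz, Oct 1", {"ocean currents": 3.0})
    page.record("homework", {"measurement of temperature": 2.5, "reading tables": 3.0})
    page.record("unit test", {"measurement of temperature": 3.0, "reading tables": 3.5})
    for standard, (avg, count) in page.summary().items():
        print(f"{standard}: average {avg:g} over {count} score(s)")
    print(f"traditional grade: {traditional_percent([18, 9, 40], [20, 10, 50]):.1f}%")
```

Keeping one score per standard per assessment is what lets the summary report an average and a count for each standard, the same two numbers shown on the supplemental transcript, while a single points total cannot be broken back down by standard.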
I believe Lee Cronbach was once asked by the State of California, back in the old CLAS days, how to make its 4-point rubric more precise. Now, Lee Cronbach: go to your stat text, open it up, and you'll see Cronbach's coefficient alpha. This is a person with a statistic named after him; obviously he knows a little bit about measurement. When he was asked by the State of California, I believe, how to make the 4-point rubric more accurate, his comment was: define a 1, define a 2, define a 3, and define a 4, but let teachers use half-point scores. So, even from the top there is an intuitive logic to this. Now, can you see a teacher, over time, giving assessments that cover more than one thing, keeping track on a very specific level, and using computerized gradebooks that allow them to enter the data? The advantages of that are just amazing when you think about it. And the time saved is just ridiculous. One of the biggest issues you will have to deal with if you go down this track is what to do with the academic versus the nonacademic factors. Let me talk about that a little bit. How many are in schools where they give overall grades? [A show of hands] BOB MARZANO: The vast majority. If you're not in a classroom now, think back to when you were. Here is my question to you: when you put together an overall grade, what are some things other than the academic content that you considered? Do you understand my question? Again, turn to the person next to you. This will wake you up a little bit here. What are the things other than the academic content that you put into the overall grade? [Pause] BOB MARZANO: How many said things like effort? [A show of hands] BOB MARZANO: How many said things like cooperation? [A show of hands] BOB MARZANO: How many did attendance? [A show of hands] BOB MARZANO: How many did things like behavior? [A show of hands] BOB MARZANO: Okay, you got most of them.
Let me show you what most research studies focus on. This is a study we did at McREL a few years back. We simply asked classroom teachers that very simple question: in places where you have to give an overall grade, what do you consider other than the academic content? And you can see the numbers there: the percentage of kindergarten teachers who included effort, behavior, et cetera, et cetera. Can you see why the overall grade doesn't mean anything? I mean, no kidding. You see, I teach fifth-grade math, and you teach fifth-grade math. You include effort; I don't. I include cooperation, and you don't. You include behavior; I don't. And this doesn't even tell us how they're weighted. Say we both include effort: I weight it 25 percent, and you weight it 15 percent. Without exaggeration, I believe that in the last decade I have talked to upwards of 1,000 K through 12 teachers. And almost to a person, when I asked them to explain their grading policy, it made a lot of sense to me. It really did. There was a logic to it. And I'm sure they did a great job of communicating that to their students, and they probably did a great job of communicating it to parents. But, unfortunately, when the student moves on to another class, the teacher is not there anymore to explain the grade. All you are left with is the overall grade. So, this freedom we have, the freedom of the individual teacher to come up with his or her own scheme for grading, even a good scheme, has really worked against us. Because nobody trusts the grades anymore. And actually, we have boxed ourselves into this corner. Can you see there is no alternative other than to use some external test? We cannot use a grading scheme. So, for me, this record-keeping is no small thing. It allows us to get out of the box we're in. Is it valid to grade kids on these non-achievement, or nonacademic, factors? Absolutely.
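The weighting problem can be made concrete with a little arithmetic. In the Python sketch below, one hypothetical student has the same academic score under every teacher's scheme; only the nonacademic components and their weights change, using the 25 percent and 15 percent effort weights mentioned above plus other illustrative weights. All component scores and schemes are assumptions for illustration, and the weights are integer percentages so the results come out exact.

```python
from __future__ import annotations


def overall_grade(scores: dict[str, int], weights_pct: dict[str, int]) -> float:
    """Weighted overall grade on a 100-point scale.

    scores: component name -> score out of 100
    weights_pct: component name -> integer weight in percent (must total 100)
    """
    if sum(weights_pct.values()) != 100:
        raise ValueError("weights must total 100 percent")
    total = sum(scores[name] * weight for name, weight in weights_pct.items())
    return total / 100


# One hypothetical student; the academic score is identical in every scheme.
student = {"academic": 80, "effort": 100, "cooperation": 60, "behavior": 70}

# Hypothetical grading schemes for four teachers of the same course.
schemes = {
    "academic only": {"academic": 100},
    "effort 25%": {"academic": 75, "effort": 25},
    "effort 15%, behavior 10%": {"academic": 75, "effort": 15, "behavior": 10},
    "cooperation 20%": {"academic": 80, "cooperation": 20},
}

if __name__ == "__main__":
    for name, weights in schemes.items():
        print(f"{name:>26}: overall {overall_grade(student, weights):.0f}")
    # Reporting the components separately keeps the academic signal intact.
    print("separate report:", student)
```

The same academic performance comes out as 80, 85, 82, or 76 depending on the scheme, which is why an overall grade stops carrying a clear meaning once the teacher who designed the scheme is no longer there to explain it. Keeping the nonacademic ratings in their own columns, as the separate report line does, sidesteps the problem.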
Remember the SCANS report? Remember that? It was the early nineties, I believe, when they went out to the business world and asked, what do you want K through 12 schools to teach students? And they said academics, but not as much as you would think. They wanted some real basics, but they also said: teach them responsibility. Teach them to put effort into what they do. Teach them to show up on time. We will teach them the content, as long as they have some basic academics. But it is the other things, these lifelong learning areas, that we cannot teach them. Now, if we are going to give kids feedback on these, let's do it as accurately as possible. And again, I'll give you my bias. I don't think a point system works with these nonacademic factors. I don't see how it works, as a matter of fact. Characteristically, what happens is that, for effort, points are provided as extra credit. Is that right? What you find, though, is a lot of variance in how an individual teacher treats different students. For one student, the extra credit is 10 points; for another student, the extra credit is 25 points, for a lot of personal reasons. Behavior is the one where it really doesn't work well. I have seen very elaborate schemes from teachers, and it usually goes something like this: you all start with 100 points for the quarter, but I will take off points for behavior. And it sounds good. I've even seen very detailed plans: this infraction is this number of points, that infraction is that number of points. Well, what happens is Johnny does something last week, and it's 5 points off for him. Mary does exactly the same thing this week, and it's 25 points off for her, and you're out of here. Well, why? I like Johnny. Mary I don't like as much. Besides, I didn't get a lot of sleep last night. So, what's a better system? I'll go to rubrics again: rubrics that are designed specifically for these nonacademic areas. Just scan through that. This is a rubric for assignments. [Pause]