Plagiarism-Proof Assignments: The Up-Goer Five Challenge

Ok, so there’s probably no such thing as a plagiarism-proof assignment, but I think I’ve got a reasonable approximation thereof.

It originated with my frustration over the perpetual struggle to get students in my distance education classes to answer questions in their own words. My students are using their textbooks to answer questions, and many seem to feel that a textbook is the exception to the rule when it comes to plagiarism. Some simply don’t understand that they’re doing anything wrong. From experience, I can tell you that many people who are not my students also see it that way, and complaining about it is a great way to be branded as unreasonable. The problem, as I’ve documented before, is that students who copy from their textbook also tend to fail the class. After last term, I’ve decided that it’s in my best interest to consume alcohol before grading assignments. I’m not allowed to ignore plagiarism, but what I don’t see…

Absent blissful ignorance, the only way to deal with plagiarism (without causing myself a variety of problems) is to change the assignments so that plagiarism isn’t possible. Now, if you’ve attempted to do this, you know it isn’t easy. A search online will give you tips like having students put themselves in the position of a person experiencing a historical event and explain their perspective on the matter. That’s something students (most likely) can’t copy from the internet. But suggestions like that are not especially helpful when the topic is how volcanoes work. (Although now that I think about it, “Imagine you are an olivine crystal in a magma chamber…”)

The solution came from my online source of comfort, xkcd. Randall Munroe, the creator of the webcomic, set himself the challenge of labeling a diagram of NASA’s Saturn V rocket (the “Up Goer Five”) using only the 1000 most commonly used words in the English language. Soon after, members of the geoscience community took up the challenge of explaining their own fields of research the same way. Here are two examples from a blog post by hydrogeologist Anne Jefferson. Anne writes:

“So I decided to see if I could explain urban hydrology and why I study it using only the words in the list. Here’s what I came up with:

I study how water moves in cities and other places. Water is under the ground and on top of it, and when we build things we change where it can go and how fast it gets there. This can lead to problems like wet and broken roads and houses. Our roads, houses, and animals, can also add bad things to the water. My job is to figure out what we have done to the water and how to help make it better. I also help people learn how to care about water and land. This might seem like a sad job, because often the water is very bad and we are not going to make things perfect, but I like knowing that I’m helping make things better.

Science, teach, observe, measure, buildings, and any synonym for waste/feces were among the words I had to write my way around. If I hadn’t had access to “water”, I might have given up in despair.

But my challenge was nothing compared to that faced by Chris, as he explained paleomagnetism without the word magnet:

I study what rocks tell us about how the ground moves and changes over many, many (more than a hundred times a hundred times a hundred) years. I can do this because little bits hidden inside a rock can remember where they were when they formed, and can give us their memories if we ask them in the right way. From these memories we can tell how far and how fast the rocks have moved, and if they have been turned around, in the time since they were made. It is important to know the stories of the past that rocks tell, because it is only by understanding that story that we really understand the place where we live, how to find the things that we need to live there, and how it might change in the years to come. We also need to know these things so we can find the places where the ground can move or shake very fast, which can be very bad for us and our homes.”

Is that brilliant, or what?! To make it even better, Theo Sanderson developed a text editor to check whether only those words have been used. This is what happened when I typed part of the introduction to the chapter on volcanoes:

Up-Goer Five text editor

Yes, fortunately it has the word “rock.”
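
In case you’re wondering what’s going on under the hood, the checking logic is simple in principle: split the text into words and flag anything that isn’t on the permitted list. Here’s a minimal sketch in Python; the words.txt file and the normalization details are my own assumptions, not how Sanderson’s editor actually works.

```python
# A toy Up-Goer-Five-style checker; words.txt (hypothetical) holds one
# permitted word per line.
import re

def load_permitted(path="words.txt"):
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def flag_forbidden(text, permitted):
    """Return the words in `text` that are not on the permitted list."""
    # Lowercase and strip punctuation before checking, so putting quotes
    # around a word (or a title in front of it) doesn't hide it.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return sorted({w for w in words if w not in permitted})

permitted = load_permitted()
print(flag_forbidden("Hot rock comes up and out of the ground.", permitted))
```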

I decided to test-drive this with my class. I gave them the option of answering their assignment questions in this way. It’s difficult, so they got bonus points for doing it. A handful attempted it, and that was probably the most fun I’ve ever had grading assignments. If you’d like to give this kind of assignment a shot, there are a few things to keep in mind:

  • Students (and colleagues) may be skeptical. Explain that the exercise requires a solid knowledge of the subject matter (in contrast to paraphrasing the textbook) and is a very effective way for students to diagnose whether they know what they think they know. In my books, that gives it a high score in the learning-per-unit-time category.
  • The text editor has some work-arounds, like putting single quotes around a word, or adding “Mr” or “Mrs” in front of a word (e.g., Mr Magma). Head those off at the pass, or you’ll get “But you didn’t say we couldn’t!”
  • You may wish to allow certain words for the assignment or for specific questions, depending on your goals. For example, if I were less diabolical, I might consider allowing the use of “lava.” The other reason for not allowing “lava” is that I want to be sure they know what it means. In contrast, I probably wouldn’t make them struggle with “North America.”
  • Make it clear that simple language does not mean simple answers. I found that students tended to give imprecise answers that didn’t address important details. I don’t think they were trying to cut corners—they just didn’t think it was necessary. If I were to do this again I would give them a rubric with examples of what is and isn’t adequate.
  • Recommend that they write out the key points of their answers in normal language first, in a separate document, and then attempt to translate them.
  • Suggest that they use analogies or comparisons if they are stuck. For example, Randall Munroe refers to hydrogen as “the kind of air that once burned a big sky bag.”
  • Make the assignment shorter than you might otherwise, and focus on key objectives. Doing an assignment this way is a lot of work and time-consuming.
  • And finally, (as with all assignments) try it yourself first.

In that spirit:

I like to make stories with numbers to learn what happens when things go into the air that make air hot. Very old rocks from deep under water say things that help make number stories. The number stories are not perfect but they still tell us important ideas about how our home works. Some day the number stories about how old air got hot might come true again, but maybe if people know the old number stories, they will stop hurting the air. If they don’t stop hurting the air, it will be sad for us because our home will change in bad ways.

Time: The Final Frontier

Timefleet Academy logo: a winged hourglass made of ammonites

A logo begging for a t-shirt

Here it is: the final incarnation of my design project for Design and Development of Educational Technology—the Timefleet Academy. It’s a tool to assist undergraduate students of historical geology with remembering events in Earth history, and how those events fit into the Geological Time Scale. Much of their work consists of memorizing a long list of complicated happenings. While memorizing is not exactly at the top of Bloom’s Taxonomy (it’s exactly at the bottom, in fact), it is necessary. One could approach this task by reading the textbook over and over, and hoping something will stick, but I think there’s a better way.

I envision a tool with three key features:

  • A timeline that incorporates the Geological Time Scale, and “zooms” to show events that occur over widely varying timescales
  • The ability to add events from a pre-existing library onto a custom timeline
  • Assessments to help students focus their efforts effectively

Here’s an introduction to the problem, and a sketch of my solution. If your sensors start to detect something familiar about this enterprise, then you’re as much of a nerd as I am.

Timefleet Academy is based on the constructionist idea that building is good for learning. Making a representation of something (in this case, Earth history) is a way of distilling its essential features. That means analyzing what those features are, working out how they are related, and expressing them explicitly. Ultimately this translates to the intuitive notion that it is best to approach a complex topic by breaking it into small, digestible pieces.

Geological Time Scale

This is what you get to memorize.

As challenging as the Geological Time Scale is to memorize, it does lend itself to “chunking” because the Time Scale comes already subdivided. Even better, those subdivisions are designed to reflect meaningful stages (and therefore meaningful groupings of events) in Earth history.

There is an official convention regarding the colours in the Geological Time Scale (so no, it wasn’t my choice to put red, fuchsia, and salmon next to each other), and I’ve used it on the interface for two reasons. One is that it’s employed on diagrams and geological maps, so students might as well become familiar with it. The other is that students can take advantage of colour association as a memory tool.

Assessments

Assessments are a key difference between Timefleet Academy and other “zoomable” timelines that already exist. The assessments would come in two forms.

1. Self-assessment checklists

These allow users to document their progress through the list of resources attached to individual events. This might seem like a trivial housekeeping matter, but mentally constructing a map of what resources have been used costs cognitive capital. Answering the question “Have I been here already?” has a non-zero cognitive load, and one that doesn’t move the user toward the goal of learning historical geology.

2. Drag-and-drop drills

The second kind of assessment involves drill-type exercises where users drag and drop objects representing events, geological time periods, and dates, to place them in the right order. The algorithm governing how drills are set would take into account the following (a rough sketch of such an algorithm appears after this list):

  • The user’s previous errors: It would allow for more practice in those areas.
  • Changes in the user’s skill level: It would adjust by making tasks more or less challenging. For example, the difficulty level could be increased by going from arranging events in chronological order to arranging them chronologically and situating them in the correct spots on the Geological Time Scale. Difficulty could also be increased by placing time limits on the exercise, requiring that the user apply acquired knowledge rather than looking up the information.
  • The context of events: If drills tend to focus on the same group of events, the result could be overly contextualized knowledge. In other words, if the student were repeatedly drilled on the order of events A, B, and C separately from the order of events D, E, and F, and were then asked to put A, B, and E in the right order, there could be a problem.
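
Here’s a rough sketch of how such an algorithm might choose drill content. The events are real, but the weighting scheme and the numbers are invented for illustration: weight the draw by past errors, then reserve a slot for random events to keep the context fresh.

```python
import random

# Hypothetical record of how often this user has missed each event.
error_counts = {
    "Great Oxidation Event": 3,
    "Cambrian explosion": 0,
    "End-Permian extinction": 1,
    "Chicxulub impact": 0,
    "Pleistocene glaciations": 2,
}

def build_drill(error_counts, n_items=4, n_random=1):
    """Pick events for a drill: favour past trouble spots, but reserve
    a few slots for random events so knowledge doesn't get locked to
    one familiar grouping (the over-contextualization problem)."""
    events = list(error_counts)
    weights = [error_counts[e] + 1 for e in events]  # +1: never zero
    chosen = set()
    while len(chosen) < n_items - n_random:
        chosen.add(random.choices(events, weights=weights, k=1)[0])
    leftovers = [e for e in events if e not in chosen]
    chosen.update(random.sample(leftovers, n_random))
    return list(chosen)  # the interface would shuffle these for ordering

print(build_drill(error_counts))
```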

The feedback from drills would consist of correct answers and errors indicated at the end of each exercise, and a marker placed on the timeline to show where (when) errors occurred. Students would earn points toward a promotion within Timefleet Academy for completing drills, and for correct answers.

Who wouldn’t want a cool new uniform?

How do you know if it works?

1. Did learning outcomes improve?

This could be tested by comparing the performance of a group of students who used the tool to that of a control group who didn’t. Performance measures could be results from a multiple-choice exam. They could also be scores derived from an interview in which each student is asked questions to gauge not only how well he or she recalls events, but also whether he or she can explain the larger context of an event, including causal relationships. It would be interesting to compare exam and interview scores for students within each group to see how closely the results of a recall test track the results of a test focused on understanding.

For the group of students who have access to the tool, it would be important to have a measure of how they used it, and how often. For example, did they use it once and lose interest? Did they use it for organizing events but not do drills? Or did they work at it regularly, adding events and testing themselves throughout? Without this information, it would be difficult to know how to interpret differences (or a lack of differences) in performance between the two groups.

2. Do they want to use it?

This is an important indicator of whether students perceive that the tool is helpful, but also of their experience interacting with it. Students could be surveyed about which parts of the tool were useful and which weren’t, and asked for feedback about what changes would make it better. (The option to print out parts of the timeline, maybe?) They could be asked specific questions about aspects of the interface, such as whether their drill results were displayed effectively, whether the controls were easy to use, etc. It might be useful to ask them if they would use the tool again, either in its current form, or if it were redesigned to take into account their feedback.

Timefleet in the bigger picture

Writing a test

All set to pass the test of time

Timefleet Academy is ostensibly a tool to aid in memorizing the details of Earth history, but it actually does something more than that. It introduces students to a systematic way of learning: identifying key features within an ocean of details, organizing those features, and then testing their knowledge.

The point system rewards students for testing their knowledge regardless of whether they get all of the answers right. The message is twofold: testing one’s knowledge is valuable because it provides information about what to do next; and testing one’s knowledge counts as progress toward a goal even if you don’t get the right answers every time. Maybe it’s threefold: if you do enough tests, eventually you get a cape, and a shirt with stars on it.

Building Assessments into a Timeline Tool for Historical Geology

In my last post I wrote about the challenges faced by undergraduate students in introductory historical geology. They are required to know an overwhelming breadth and depth of information about the history of the Earth, from 4.5 billion years ago to present. They must learn not only what events occurred, but also the name of the interval of the Geological Time Scale in which they occurred. This is a very difficult task! The Geological Time Scale itself is a challenge to memorize, and the events that fit on it often involve processes, locations, and organisms that students have never heard of. If you want to see a case of cognitive overload, just talk to a historical geology student.

My proposed solution was a scalable timeline. A regular old timeline is helpful for organizing events in chronological order, and it could be modified to include the divisions of the Geological Time Scale. However, a regular old timeline is simply not up to the task of displaying the relevant timescales of geological events, which vary over at least six orders of magnitude. It is also not up to the job of displaying the sheer number of events that students must know about. A scalable timeline would solve those problems by allowing students to zoom in and out to view different timescales, and by changing which events are shown depending on the scale. It would work just like Google Maps, where the type and amount of geographic information that is displayed depends on the map scale.
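
To make the Google Maps analogy concrete, here’s one simple way to get that behaviour: give each event a prominence rank, and let the width of the visible time window decide which ranks get drawn. The events below are real, but the ranks and thresholds are invented for illustration.

```python
# Level-of-detail filtering for a zoomable timeline (ranks and
# thresholds invented for illustration).
EVENTS = [
    # (name, age in millions of years, prominence rank; 1 = always shown)
    ("Formation of the Earth", 4540, 1),
    ("Great Oxidation Event", 2400, 2),
    ("Cambrian explosion", 539, 1),
    ("End-Permian extinction", 252, 2),
    ("First flowering plants", 130, 3),
]

def visible_events(events, window_ma):
    """Show more minor events as the visible window narrows.
    (Filtering by the window's start and end ages is left out.)"""
    if window_ma > 1000:
        max_rank = 1        # whole-of-Earth-history view
    elif window_ma > 100:
        max_rank = 2        # era- or period-scale view
    else:
        max_rank = 3        # zoomed right in
    return [name for name, age, rank in events if rank <= max_rank]

print(visible_events(EVENTS, window_ma=4600))  # the big picture
print(visible_events(EVENTS, window_ma=50))    # everything shown
```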

Doesn’t that exist already?

My first round of Google searches didn’t turn anything up, but more recently round two hit paydirt… sort of. Timeglider is a tool for making “zoomable” timelines, and allows the user to embed media. It also has the catchphrase “It’s like Google Maps but for time,” which made me wonder if my last post was re-inventing the wheel.

ChronoZoom was designed with Big History in mind, which is consistent with the range of timescales that I would need. I experimented with this tool a little, and discovered that users can build timelines by adding exhibits, which appear as nodes on the timeline. Users can zoom in on an exhibit and access images, videos, etc.

If I had to choose, I’d use ChronoZoom because it’s free, and because students could create their own timelines and incorporate timelines or exhibits that I’ve made. Both Timeglider and ChronoZoom would help students organize information, and ChronoZoom already has a Geological Time Scale, but there are still features missing. One of those features is adaptive formative assessments that are responsive to students’ choices about what is important to learn.

Learning goals

There is a larger narrative in geological history, involving intricate feedbacks and cause-and-effect relationships, but very little of that richness is apparent until students have done a lot of memorization. My timeline tool would assist students in the following learning goals:

  • Memorize the Geological Time Scale and the dates of key event boundaries.
  • Memorize key events in Earth history.
  • Place individual geological events in the larger context of Earth history.

These learning goals fit right at the bottom of Bloom’s Taxonomy, but that doesn’t mean they aren’t important to accomplish. Students can’t move on to understanding why things happened without first having a good feeling for the events that took place. It’s like taking a photo with the lens cap on: you just don’t get the picture.

And why assessments?

This tool is intended to help students organize and visualize the information they must remember, but they still have to practice remembering it in order for it to stick. Formative assessments would give students that practice, and students could use the feedback from those assessments to gauge their knowledge and direct their study to the greatest advantage.

How it would work

The assessments would address events on a timeline that the students construct for themselves (My Timeline) by selecting from many hundreds of events on a Master Timeline. The figure below is a mock-up of what My Timeline would look like when the scale is limited to a relatively narrow 140-million-year window. When students select events, related resources (videos, images, etc.) would also become accessible through My Timeline.

Timeline interface

A mock-up of My Timeline. A and B are pop-up windows designed to show students which resources they have used. C is access to practice exercises, and D is how the tool would show students where they need more work.

Students would benefit from two kinds of assessments:

Completion checklists and charts

The problem with having abundant resources is keeping track of which ones you’ve already looked at. Checklists and charts would show students which resources they have used. A mouse-over of a particular event would pop up a small window (A in the image above) with the date (or range of dates) of the event and a pie chart with sections representing the number of resources that are available for that event. A mouse-over on the pie chart would pop up a hyperlinked list of those resources (B). Students would choose whether to check off a particular resource once they are satisfied that they have what they need from it, or perhaps flag it if they find it especially helpful. If a resource is relevant for more than one event, and shows up on multiple checklists, then checks and flags would appear for all instances.
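
The bookkeeping that makes those shared checks and flags work is worth spelling out: store each resource once, and have every relevant event point to the same object, so its state changes everywhere at once. A minimal sketch (the class names are mine, not from the mock-up):

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """One video/image/reading. The checked and flagged state lives
    here, so setting it under one event shows under every event."""
    title: str
    checked: bool = False  # "I have what I need from this"
    flagged: bool = False  # "especially helpful"

@dataclass
class Event:
    name: str
    age_ma: float
    resources: list = field(default_factory=list)  # shared references

# One resource attached to two events:
video = Resource("Animation: flood basalt eruptions")
traps = Event("Siberian Traps eruptions", 252, [video])
extinction = Event("End-Permian extinction", 252, [video])

video.checked = True                    # check it off under one event...
print(extinction.resources[0].checked)  # ...and it's checked under both
```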

Drag-and-drop exercises

Some of my students construct elaborate sets of flashcards so they can arrange events or geological time intervals spatially. Why not save them the trouble of making flashcards?

Students could opt to practice remembering by visiting the Timefleet Academy (C). They would do exercises such as:

  • Dragging coloured blocks labeled with Geological Time Scale divisions to put them in the right order
  • Dragging events to either put them in the correct chronological order (lower difficulty) or to position them in the correct location on the timeline (higher difficulty)
  • Dragging dates from a bank of options onto the Geological Time Scale or onto specific events (very difficult)

Upon completion of each drag-and-drop exercise, students would see which parts of their responses were correct. Problem areas (for example, a geological time period in the wrong order) would be marked on My Timeline with a white outline (D) so students could review those events in the appropriate context. White outlines could be cleared directly by the student, or else by successfully completing Timefleet Academy exercises with those components.
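
Grading an ordering exercise like this is mechanical: sort the same events by age, compare against the student’s sequence, and report the positions that disagree (those are the ones that would get the white outline). A sketch, where the grading function is my own illustration rather than the actual tool:

```python
# Ages in millions of years for the events in this (toy) drill.
AGES_MA = {
    "Great Oxidation Event": 2400,
    "Cambrian explosion": 539,
    "End-Permian extinction": 252,
    "Chicxulub impact": 66,
}

def grade_order(answer):
    """Compare the student's oldest-to-youngest sequence against the
    true order; return (all_correct, events sitting in the wrong slot)."""
    correct = sorted(answer, key=AGES_MA.get, reverse=True)
    misplaced = [got for got, want in zip(answer, correct) if got != want]
    return answer == correct, misplaced

ok, wrong = grade_order([
    "Great Oxidation Event", "End-Permian extinction",
    "Cambrian explosion", "Chicxulub impact",
])
print(ok, wrong)  # False ['End-Permian extinction', 'Cambrian explosion']
```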

Drag-and-drop exercises would include some randomly selected content, as well as items that the student has had difficulty with in the past. The difficulty of the exercises could be scaled to respond to increasing skill, either by varying the type of drag-and-drop task, or by placing time limits on the exercise. Because a student could become very familiar with one stretch of geologic time without knowing others very well, the tool would have to track skill level separately across the timeline and respond accordingly.

A bit of motivation

Students would earn points for doing Timefleet Academy exercises. To reward persistence, they would earn points for completing the exercises, in addition to points for correct responses. Points would accumulate toward a progression through Timefleet Academy ranks, beginning with Time Cadet, and culminating in Time Overlord (and who wouldn’t want to be a Time Overlord?). Progressive ranks could be illustrated with an avatar that changes appearance, or a badging system. As much as I’d like to show you some avatars and badges, I am flat out of creativity, so I will leave it to your imagination for now.
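
Since I can’t draw you a badge, here’s the arithmetic instead. Time Cadet and Time Overlord are from the design; the intermediate ranks, the thresholds, and the point values below are all invented:

```python
# Points-to-rank ladder (intermediate ranks and all numbers invented).
RANKS = [
    (0, "Time Cadet"),
    (100, "Time Ensign"),
    (300, "Time Lieutenant"),
    (700, "Time Commander"),
    (1500, "Time Overlord"),
]

def award(points, completed, n_correct):
    """Completing a drill pays even with errors (persistence counts),
    and each correct response pays a little more."""
    return points + (10 if completed else 0) + 2 * n_correct

def rank(points):
    current = RANKS[0][1]
    for threshold, name in RANKS:
        if points >= threshold:
            current = name
    return current

points = 0
for _ in range(12):              # a dozen drills, 8 correct answers each
    points = award(points, completed=True, n_correct=8)
print(points, rank(points))      # 312 Time Lieutenant
```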

When Good Grades Are Bad Information

Assignment grades versus exam grades

This week I set out to test a hypothesis. In one of my distance education courses, I regularly get final exam scores that could pass for pant sizes. I have a few reasons to suspect that the exam itself is not to blame. First, it consists of multiple-choice questions that tend toward definitions, and general queries about “what,” rather than “why” or “how.” Second, the exam questions come directly from the learning objectives, so there are no surprises. Third, if the students did nothing but study their assignments thoroughly, they would have enough knowledge to score well above the long-term class average. My hypothesis is that students do poorly because the class is easy to put on the back burner. When the exam comes around, they find themselves cramming a term’s worth of learning into a few days.

Part of the reason the class is easy to ignore is that the assignments can be accomplished with a perfunctory browsing of the textbook. In my defense, there isn’t much I can do to fix the assignments. Someone above my pay grade would have to start the machinery of course designers, contracts, and printing services. In defense of the course author, if a student were so inclined (and some have been), the assignments could be effective learning tools.

Another problem is that students tend to paraphrase the right part of the textbook. Even if I suspect that they don’t understand what they’ve written, I have few clues about what to remedy. The final result is that students earn high grades on their assignments. If they place any weight at all on those numbers, I fear they seriously overestimate their learning, and seriously underestimate the amount of work they need to put into the class.

So, back to testing my hypothesis: I decided to compare students’ averages on assignments with their final exam scores. I reasoned that a systematic relationship would indicate that assignment scores reflected learning, and that the low exam scores therefore meant the exam was just too difficult. (Because all of the questions came undisguised from the learning objectives, I eliminated the possibility that a lack of relationship would mean the exam didn’t actually test the course material.)

I also went one step further, and compared the results from this course (let’s call it the paraphrasing course) with another where assignments required problem-solving, and would presumably be more effective as learning tools (let’s call that the problem-solving course).

My first impression is that the paraphrasing course results look like a shotgun blast, while the problem-solving course results look more systematic. An unsophisticated application of Excel’s line fitting agrees: 67% of the variance in exam scores (R² = 0.67) can be explained by assignment grades in the problem-solving course, but only 27% in the paraphrasing course.
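
For anyone who wants to reproduce that kind of number without Excel: the fit is an ordinary least-squares line, and the “percent explained” figure is R². A sketch with made-up scores (not the real class data):

```python
import numpy as np

# Made-up (assignment average, final exam) pairs, not the real data.
assign = np.array([88, 92, 75, 95, 90, 83, 97, 79], dtype=float)
exam = np.array([52, 61, 40, 70, 58, 49, 75, 45], dtype=float)

# Ordinary least-squares line: exam ~ slope * assign + intercept
slope, intercept = np.polyfit(assign, exam, 1)

# R^2 = fraction of variance in exam scores explained by the line.
predicted = slope * assign + intercept
ss_res = np.sum((exam - predicted) ** 2)
ss_tot = np.sum((exam - exam.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"slope={slope:.2f}, intercept={intercept:.1f}, R^2={r_squared:.2f}")
```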

I’m hesitant to call the hypothesis confirmed yet, because the results don’t really pass the thumb test. In the thumb test, you cover various data points with your thumb to see if your first impression holds. For example, if you cover the lowest exam score in the paraphrasing course with your thumb, the distribution could look a little more systematic, albeit with a high standard deviation. If you cover the two lowest exam scores in the problem-solving course, the distribution looks a little less so. There is probably a statistically sound version of the thumb test (something that measures how much the fit depends on any particular point or set of points, and gives low scores if the fit is quite sensitive), but googling “thumb test” hasn’t turned it up yet.
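
As it turns out, there is a statistically sound version: in regression it goes by influence analysis, and Cook’s distance is the classic measure of how much a fit leans on a single point. The thumb test can also be automated directly, by refitting with each point left out and watching how much R² moves. A sketch, using the same made-up scores as above:

```python
import numpy as np

# Same made-up scores as in the previous sketch.
assign = np.array([88, 92, 75, 95, 90, 83, 97, 79], dtype=float)
exam = np.array([52, 61, 40, 70, 58, 49, 75, 45], dtype=float)

def r_squared(x, y):
    """R^2 for an ordinary least-squares line through (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    return 1 - np.sum((y - predicted) ** 2) / np.sum((y - y.mean()) ** 2)

baseline = r_squared(assign, exam)
print(f"all points: R^2 = {baseline:.2f}")

# The automated thumb test: drop each point in turn and refit.
for i in range(len(assign)):
    r2 = r_squared(np.delete(assign, i), np.delete(exam, i))
    print(f"without point {i}: R^2 = {r2:.2f} (change {r2 - baseline:+.2f})")
```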

From looking at the results, I’ve decided that I would consider a course to be wildly successful if the grades on a reasonably set exam were systematically higher than the grades on reasonably set assignments: it would mean that the students learned something from the errors they made on their assignments, and were able to build on that knowledge.