
Data Mining Primer

A Data Mining Primer and Implications for School Library Media Specialists
Knowledge Quest, May/June 2004

Last night I attended a district school board meeting where I watched an elementary principal give his building’s annual report. Things have changed. Numbers rule.

I have watched these annual school reports for a dozen years. PowerPoint, collaborative presentations by site leadership teams, videos, and photos have all been used to inform the board about the happenings and progress of individual schools. But last night, the story was told in a quite different way.

Of the 30 or so slides shown, over half were charts, graphs, and columns of numbers showing the growth of student achievement as measured by a battery of state and national test scores. New curriculum, exciting projects, special events, and staff development efforts were given short shrift. As a result of No Child Left Behind (NCLB) and state accountability requirements, the quantifiable “results” of schools’ educational effectiveness are driving building goals, budgets and principals’ priorities.

While relying on test data as the sole means of determining school quality should make all educators and parents uncomfortable for a number of reasons, we have to accept that “data” will continue to dominate the educational landscape.

It’s important for school library media specialists to understand the theory of data warehousing and data mining as they are beginning to be practiced by individual teachers, building leaders, and school district officials. Below are a short primer on data mining and some reflections on its possible implications for school librarians.

The Primer
Schools gather, store and use an increasingly large amount of data. Keeping track of everything from bus routes to building access codes to test scores to sports equipment to vaccination records is done with the help of specialized database programs. For some time school administrators have used large databases designed for budgeting and student recordkeeping.

Student information systems such as SASIxp, Skyward, PowerSchool, and other commercial and home-grown programs are primarily designed to hold a current school year’s data, conduct simple searches and create pre-constructed reports. But a continuing trend toward data-driven decision-making and educational accountability requires school leaders not only to store more data but to use it in increasingly sophisticated ways.

Spotting trends in dropout rates, grade inflation, gender or racial bias, and truancy is possible using properly created and interpreted reports generated from student information system data. But, fueled by the requirements of NCLB, accountability-hungry state governments, and “continuous improvement” efforts, teams of curriculum specialists, assessment specialists, building administrators, and technology department staff are realizing that student information systems alone are not powerful enough, and they are working toward implementing new data mining solutions.

Tools for data storage and access
Three common terms are often used (and confused). Data warehousing is simply storing information about students and programs in an accurate and uniform fashion. Most data warehousing applications store data about students over the course of their school careers. As you can imagine, this can amount to literally thousands of pieces of data for each student. Data mining is the ability to extract and interpret specific data held within the data warehouse. Data mining software often allows the user to select subgroups of students, track cohort groups, create queries, and build visual representations of data patterns or trends. Data-driven decision making refers to the use of extracted data to make changes to current educational programs and to make long-range plans. (Eyes glazing over yet?)

Databases that provide data warehousing and programs that do data mining may do the following:

Keep information from multiple assessments (primarily tests), participation in special programs, and demographic data about individual students from year to year;
Disaggregate data to determine the performance of individual students or groups of students;
Track, identify and isolate the strategies, programs, teachers and interventions that affect student performance; and
Analyze the effectiveness of building and district programs and improvement plans.

Data-driven decision making
The concept behind data-driven decision making is that certain sets of data (indicators) can be used to determine whether programs or circumstances (interventions) have an effect on certain types of students (identifiers).

Data mining software needs to enable the user to find and understand the data through sorting, filtering, and summarizing. At a basic level, the user can sort by any combination of these three areas:

Identifiers: Identification of the person or group. These are factors that are not changeable or controllable. (Name, ethnicity, gender, grade level, date of enrollment, length of time in district, socio-economic background, past attendance rate, results on past tests, etc.)
Interventions: The programs, strategies or other factors that may cause or may be correlated with change. (Summer school, special reading programs, Title One, special education programs, ESL programs, gifted and talented program participation, specific teachers, etc.)
Indicators: The data that indicate the extent to which change has occurred. (Current state test scores, standardized test scores, course grades, GPA, recent attendance, etc.)
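To make the three categories concrete, here is a minimal Python sketch of the kind of query a data mining tool performs. It filters by an identifier (grade level), splits by an intervention (summer school attendance), and summarizes an indicator (average state test score). The record layout, the names StudentRecord and compare_intervention, and the sample scores are all hypothetical. They are not drawn from any particular student information system or data warehouse product.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical record layout: one row per student for the current year.
@dataclass
class StudentRecord:
    student_id: str
    grade: int            # identifier: not changeable by the school
    summer_school: bool   # intervention: a program the student did or did not receive
    state_score: float    # indicator: evidence of change


def compare_intervention(records, grade):
    """Filter on an identifier, split on an intervention, average an indicator."""
    cohort = [r for r in records if r.grade == grade]
    attended = [r.state_score for r in cohort if r.summer_school]
    did_not = [r.state_score for r in cohort if not r.summer_school]
    return {
        "attended": mean(attended) if attended else None,
        "did_not_attend": mean(did_not) if did_not else None,
    }


if __name__ == "__main__":
    sample = [
        StudentRecord("s1", 4, True, 210.0),
        StudentRecord("s2", 4, False, 200.0),
        StudentRecord("s3", 4, True, 220.0),
        StudentRecord("s4", 5, False, 230.0),
    ]
    print(compare_intervention(sample, grade=4))
    # prints {'attended': 215.0, 'did_not_attend': 200.0}
```

A gap between the two averages shows only that the intervention and the indicator are correlated. By itself, it does not show that summer school caused the difference, because the students who attended may differ from the others in other ways.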

The goal of Mankato schools’ data warehousing and data mining is to help us answer complex questions like these, which were generated by our building improvement teams:

How many years is a student likely to be in a Title I program?
Is an individual student showing growth? What interventions has the student already received?
How does a group of students (e.g., a grade level) compare to the same group of students at a different point in time? For example, compare fourth-grade standardized test results over three to five years.
Which students are likely to benefit from a specific intervention based on tests and prior interventions?
Which individual students changed quartile bands on standardized tests from ninth to eleventh grade?
What does a group analysis of how students changed quartile bands on standardized tests from ninth to eleventh grade indicate?
Are all-day kindergarten students better prepared for grade one than half-day kindergarten students? Are kindergarten readiness test scores higher for these students? Are there fewer Title I students coming from all-day kindergarten?
How do students who attended summer school compare to students who did not attend summer school?
Can we predict that a failing grade two student will also fail in grade twelve? Can we get statistics to convince parents that intervention is needed?
Do students identified early as gifted/talented retain high scores on later tests?
Are low scores on one test predictive of low scores on future tests?
Compare the growth of reading ability between kindergarten and grade three between students who did and did not receive early intervention in reading. Is the difference increasing, decreasing or staying the same?
Are boys and girls showing the same level of performance?
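The ninth-to-eleventh-grade quartile questions can be answered at both the individual level and the group level. This Python sketch makes two assumptions: scores are reported as national percentile ranks from 1 to 99, and the quartile bands are 1-25, 26-50, 51-75, and 76-99. The cut points and the function names quartile_band and band_changes are illustrative only.

```python
from collections import Counter


def quartile_band(percentile):
    """Map a national percentile rank (1-99) to quartile band 1-4 (assumed cut points)."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    if percentile <= 25:
        return 1
    if percentile <= 50:
        return 2
    if percentile <= 75:
        return 3
    return 4


def band_changes(grade9, grade11):
    """Compare two {student_id: percentile_rank} dicts.

    Returns (changed, summary): changed maps each student who moved to a
    (grade-9 band, grade-11 band) pair; summary counts how many students
    moved up, moved down, or stayed in the same band.
    """
    changed = {}
    summary = Counter(up=0, down=0, same=0)
    for sid in grade9.keys() & grade11.keys():  # students tested in both years
        before = quartile_band(grade9[sid])
        after = quartile_band(grade11[sid])
        if after > before:
            summary["up"] += 1
        elif after < before:
            summary["down"] += 1
        else:
            summary["same"] += 1
        if after != before:
            changed[sid] = (before, after)
    return changed, summary


if __name__ == "__main__":
    ninth = {"a": 20, "b": 60, "c": 80, "d": 45}
    eleventh = {"a": 30, "b": 55, "c": 70, "e": 90}
    changed, summary = band_changes(ninth, eleventh)
    print(changed)        # individual students who changed bands
    print(dict(summary))  # group-level picture
```

The sketch compares only students with scores in both years. Students who moved into or out of the district between the two tests are therefore left out of the analysis.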

Access to data by multiple stakeholders
The desire for access to student and school data is growing. Teachers want access to determine the past history and growth of individuals and groups of students and to help create site-driven school improvement plans. Parents want to compare the performance of their own children and school to nationally normed groups of similar students and to the performance of students in other area schools. Communities and states want to identify high- and low-performing schools. Increasingly, these groups expect appropriate, accurate, and secure access to specific information on the Web.

Implications for librarians

1. Know and support your building’s and district’s planning efforts that may be driven by data interpretation, and advocate for meaningful decision-making. I have long argued that a library media program’s goals should be directly tied to building and district educational goals. If these goals are now being developed as the result of test score data, we need to be aware of and involved in these processes. Having a place on building improvement committees and planning teams has never been more important. Few educators are statisticians at heart, but making meaning out of raw data is a skill that teachers and parents, not just administrators, need to have. Differentiating between causation and correlation, knowing whether a number is statistically significant, and identifying trends are all abilities that require training and practice. Data that are misinterpreted or overemphasized can lead to bad decisions. As budgets tighten, these skills are becoming increasingly important in determining which programs improve student performance and should be funded or changed. What is our role as school library media specialists in helping others correctly interpret data?

2. Select resources and tailor lessons to address the learning needs of specific students identified through data mining efforts. By disaggregating test score data, specific subgroups of students who are not making adequate yearly gains in reading, writing, math or other tested areas can be identified. Our library programs and materials need to target these groups. If our students for whom English is not a first language are struggling, what materials and activities should our library programs be providing them? If interpreting expository writing is shown to be a weak area, are we promoting non-fiction in our book talks and reading promotions? In other words, what specific things can our library programs do to help raise student test scores? A direct link between our efforts and measured school effectiveness can improve the perceived value of the school library.

Joyce Kasman Valenza at Springfield Township High School (PA) reports that the data mining efforts at her school have resulted in goals that are student-centered and performance-based, including:

Increasing scholastic achievement of transfer-in students
Improving academic achievement of average students
Increasing overall participation in school community
“As the librarian,” Joyce reports, “I work with teachers to develop targeted strategies to achieve these goals, particularly with our ‘average’ students.”

3. Include library programs as an intervention that may be having an impact on student achievement. While it is difficult, if not impossible, to measure the impact of an effective library media program on test scores within a single building, districts that have differing levels of library service among their buildings can look at library service as a factor in student performance. Keith Curry Lance and others have effectively and persuasively done this on a statewide basis. Now we may have the ability to do local studies. While I do not know of a district that has done this, I am excited by the possibility.

4. Advocate for other means of student assessment beyond standardized tests. One of the cruelest aspects of education’s drive toward accountability is its reliance on standardized test scores as the sole measure of student educational attainment. This educational monoculture results in an overemphasis on memorization and test-taking skills and ignores the importance of the ability to apply knowledge to solve problems and make informed decisions. Bright, creative children, who may have incredible talents that the school may well have helped develop, may show up as educational failures in schools that use only test scores to measure their effectiveness.

Strong information literacy curricula use performance-based assessments. Unlike paper-and-pencil tests, authentic assessment tools such as rubrics, checklists, and conferencing allow educators to measure students’ higher-level thinking skills, including application, transfer, and judgment. Although it seems as though we are swimming against strong educational tides in this area, we must continue to advocate for authentic assessment in both the library and the classroom. We also need to find ways to convert these sometimes subjective assessments into results that can be quantified.

In the Mankato District, for example, our elementary library media specialists authentically assess students’ attainment of benchmarked library and technology skills, but they also record and report each student’s performance on those skills. Since each student’s level of attainment is kept in the progress report database, we can show teachers, administrators, and the community the percentage of students meeting or exceeding benchmarks in specific skill areas. Unlike standardized tests, these assessments measure the ability to actually apply skills and allow students to practice skills until they are attained. But we still have numbers to show that skills are being taught and learned.
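Once each rubric result is stored, a percentage of this kind is simple to compute. The structure of the Mankato progress report database is not described here, so this Python sketch rests on several assumptions:

* Records are (student, skill, level) tuples in the order they were recorded.
* Scoring uses a hypothetical four-level rubric in which level 3 counts as meeting the benchmark.
* The skill names are made up for illustration.

Because students may practice and be reassessed, the sketch keeps only each student's most recent level for a skill.

```python
from collections import defaultdict

# Hypothetical four-level rubric: 1 = beginning, 2 = developing,
# 3 = meets benchmark, 4 = exceeds benchmark.
MEETS_BENCHMARK = 3


def percent_meeting(records):
    """records: (student_id, skill, rubric_level) tuples in the order recorded.

    Keeps each student's latest level per skill (students may be reassessed
    after practice) and returns {skill: percent at or above MEETS_BENCHMARK}.
    """
    latest = {}
    for student, skill, level in records:
        latest[(student, skill)] = level  # a later entry replaces an earlier one

    assessed = defaultdict(int)
    meeting = defaultdict(int)
    for (_student, skill), level in latest.items():
        assessed[skill] += 1
        if level >= MEETS_BENCHMARK:
            meeting[skill] += 1
    return {skill: round(100 * meeting[skill] / assessed[skill], 1) for skill in assessed}


if __name__ == "__main__":
    records = [
        ("s1", "note-taking", 3),
        ("s2", "note-taking", 2),
        ("s3", "note-taking", 4),
        ("s4", "note-taking", 1),
        ("s1", "citing sources", 4),
        ("s2", "citing sources", 2),
        ("s2", "note-taking", 3),  # s2 reassessed after more practice
    ]
    print(percent_meeting(records))
    # prints {'note-taking': 75.0, 'citing sources': 50.0}
```

Keeping only the latest level matches the practice-until-attained approach. A district that also wants to show growth could keep the full history and report first and latest levels side by side.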

5. Remember and remind others that “school” has always been more than simply memorizing content.
Donald Norman, in his excellent book Things That Make Us Smart: Defending Human Attributes in the Age of the Machine (Perseus Publishing, 1993), gently reminds us:

The final result is that technology aids our thoughts and civilized lives, but it also provides a mind-set that artificially elevates some aspects of life and ignores others, not based upon their real importance but rather by the arbitrary condition of whether they can be measured scientifically and objectively by today’s tools. Consequently, science and technology tend to deal solely with the products of their measurements, they divorce themselves from the real world. The danger is that things that cannot be measured play no role in scientific work and are judged to be of little importance. Science and technology do what they can do and ignore the rest. They are superb at what they do, but what is left out can be of equal or greater importance.

Are we then, as educators, in danger of judging things like building the love of learning, student self-esteem, the ability to work with others, the experiences of play and fun, the value of sportsmanship, and the appreciation of beauty, courage, and humanity to be of small importance because we cannot measure them? A school’s success should be based on more than its test scores. We need to remember that data mining may be an important piece in evaluating the quality of our schools, but it is only a piece.

We as school library media specialists must continue to tell our stories of how our programs have had an impact on individual students, not just aggregate groups: how Luisa is reading more confidently as a result of having access to high-interest materials; how George became excited about science after doing a WebQuest we created; how, for certain children, the library is the one place in the school where they feel comfortable and safe. Decision makers are moved not just by numbers but by anecdotal, personal information about students and what we do. Stories and numbers should both be considered as evidence of the impact of any school program, especially library programs.

Conclusion
Data-driven decision-making, with test scores as the primary source of data, will be a permanent part of education. As school library media specialists we need to become knowledgeable about both the philosophy and tools behind it if we are to remain vital players in education. We have a dual role: to assist in the development of sound planning that may result from data mining and to remind both the educational and general community that not all things of value can be neatly quantified and measured. It is more important than ever that school library media specialists bring both their minds and hearts to their buildings’ planning efforts.

Posted on Sunday, July 8, 2007 at 06:37PM by Doug Johnson in Knowledge Quest
