Written By M. Scott Thompson
M. Scott Thompson is a Digital Curator at The Center for Digital Antiquity. He received his PhD from Arizona State University in May 2014.
Dissertation data should remain alive in the digital age. I am trying to maintain my dissertation data as living, usable data by curating multiple data sets in a widely accessible digital repository – the Digital Archaeological Record (tDAR). Let me tell you how and why I ditched the appendix.
Curating Dissertation Data in a Digital Repository
Recently, I completed my dissertation, titled “Interaction with the Incorporeal in the Mississippian and Ancestral Puebloan Worlds.” The project is a comparative examination of the performance of mortuary ritual in the Prehispanic American Southeast and Southwest, undertaken to understand the identities of the spirits of the dead in these two cultural environments. The examination involved the collection, management, and analysis of large amounts of mortuary data spanning multiple archaeological culture areas.
I decided to present the dissertation data solely through tDAR (no appendices necessary). You can view the dissertation project page at the following URL: http://core.tdar.org/project/380979. Here is how I “published” all that data online.
First, I curated the dissertation’s primary, raw data in tDAR. I uploaded the complete relational database that I used for collecting and managing the project’s information. This made the primary data available immediately to other researchers who are interested in the dissertation. Moreover, I continue to manage and enhance the primary data and all the associated metadata. I am still documenting the large amount of metadata that describes the database.
Second, I wanted to curate the processed data sets that I used in each of the study’s analyses, as well as the metric results that each statistical analysis returned. In the dissertation project, I conducted a series of multivariate exploratory data analysis (EDA) procedures to characterize particular aspects of mortuary ritual within large mortuary samples. To perform these analyses, I had to process and format the raw data extensively. During the course of the analyses, I gathered analysis results (such as multiple correspondence analysis [MCA] and multidimensional scaling [MDS] scores) and then continued to manipulate that information to interpret it. I needed to present these data in a way that allowed other researchers to obtain and use them with no additional effort.
I uploaded the processed data and the results for each multivariate analysis to tDAR. These data are directly linked to the figures and tables that present analysis results in the document. I placed persistent URLs in relevant figure captions and in the text to direct readers to the appropriate tDAR resources and pages. You can view several of the processed/analysis results data sets at the following URLs: https://core.tdar.org/dataset/391946 and https://core.tdar.org/dataset/391948.
I hope that curating my dissertation data with tDAR ensures that these data are widely available in easily accessible, active formats. Like everyone else who has spent too many years to count on a dissertation project, I want the data to be used. I want other researchers to continue to analyze the information, to build upon or perhaps refute my study’s results, and to discover novel ways to approach these data in order to answer other questions.
Thinking Beyond the Appendix to Save Your Dissertation Data
In the paper age, authoring a dissertation presented many challenges for publishing the associated data. The document itself was often the only venue for presenting these data, and a manuscript does not offer ideal or even suitable formats for publishing large amounts of data. Presenting data in a dissertation forces an author to make difficult decisions about simplification simply to fit the information into neat tables, which then span page after page after page. Flattening the data into tables also eliminates any relationships that exist among the pieces of information. Finally, it lengthens a manuscript that, as your chair and your committee often remind you, is already long enough.
The dissertation’s primary vehicle for data presentation was, and typically still is, the dreaded appendix. Lurking beyond the dissertation’s references, appendices are often a no man’s land of supplementary information. They are long halls of formatted tables, with lists of categorical variables, numbers, and codes. Because they are printed, they require researchers to spend hours recreating the data in a format that can be manipulated and used. Thus, the appendices are visited only by those researchers who need to understand a dissertation’s primary data so urgently that they are willing to digitize and re-analyze them.
In the digital age, there are new and emerging ways to disseminate dissertation data. These technologies and digital venues can lift dissertation data from the depths of appendices and place the information in curated formats that are widely discoverable. Through digital data repositories, authors can preserve their primary data in perpetuity and make them widely available. Most importantly, though, they can use digital repository tools to ensure that the data are usable right away.
It’s Still Alive
Let’s make the printed dissertation appendix a vestigial structure. With new digital technologies and venues, we have an opportunity to move beyond the simple publishing of data.
We have tools that allow us to curate and present primary data in increasingly flexible and creative formats. These tools enable authors and other researchers to interact with primary data in the formats in which they were originally created. More importantly, they allow researchers to interact with primary data in new and exciting ways, which can promote and even demand collaboration, continued manipulation, and growth of existing data. Let’s consider the management and presentation of dissertation data as a living process.
Dissertation data should not become the undead. Dissertation data should remain alive.