TEXT AND DATA MINING (TDM) is the computational analysis of vast quantities of digital information, whether free-form natural language text or structured data. Using specialized software, researchers can extract data, identify trends, look for patterns and better understand the relationships of terms within and between documents. Analysis might focus on word frequency, words that frequently appear near each other, contextual information for key words, common phrases and other patterns. Materials to be analyzed range from websites (such as publicly available Facebook posts), 16th C. manuscripts, DNA sequences, to old newspapers.
Mapping the Republic of Letters seeks to describe and visualize the dissemination and criticism of ideas, the spread of political news and the circulation of people and objects during a time before instant communication systems, professional associations and the internet.
Seeks to explore the dramatic and often traumatic changes as well as the sometimes surprising continuities in the social and political life of Civil War Richmond. It uses as its evidence nearly the full run of the Richmond Daily Dispatch, an archive of over 112,000 pieces consisting of nearly 24 million words. It uses as its principle methodology topic modeling, a computational, probabilistic technique to uncover categories and discover patterns in and among texts.
The Bomb Sight project is mapping the London WW2 bomb census between 7/10/1940 and 06/06/1941. Previously available only by viewing in the Reading Room at The National Archives, Bomb Sight is making the maps available to citizen researchers, academics, and students. They will be able to explore where the bombs fell and to discover memories and photographs from the period.
By comparing and contrasting the top songs on Billboard's year-end Hot 100 charts from 1965-2015, Walker uncovered common elements that most chart-toppers share and identified the musical elements that characterize the genre decade to decade. By and large our songs have gotten longer, wordier and far more explicit as time has gone on.
Lord of the Rings Project, commonly shortened LotrProject, is a creative web project dedicated to the works of J.R.R. Tolkien. It is perhaps most known for the extensive and ever updating genealogy, the historical timeline of Middle-Earth and the statistics of the population of Middle-Earth.
Visualization of how the linguistic standard of the presidential address has declined over time. Using the Flesch-Kincaid readability test the Guardian has tracked the reading level of every State of the Union.
Several years ago, more or less on a lark, a group of researchers from England used a computer program to analyze the emotional content of books from every year of the 20th century — close to a billion words in millions of books.
Inspired by visualizations of particle collisions at CERN, wordcollider accelerates two phrases against each other. The collision splits the words up in their letters, their elementary particles. After collision, wordcollider visualizes a signature for each letter, based on their phonetic characteristics. Wordcollider is the result of a processing workshop with Steffen Fiedler. http://steffenfiedler.com/