We used the R package phrasemachine to look at phrases as well as single words in the songs, and we used structural topic modeling to extract a set of time-varying topics. A topic in this context is a set of words and phrases that occur together in songs. We wanted the set to be large enough not to miss any meaningful topic and small enough to leave out topics that seem uninterpretable. It turned out that this magic number was about 12, and thus we present twelve such topics here. For each topic we show the words and phrases that define it (drawn with the R package wordcloud), its changing prevalence over time, a sentiment analysis of its lyrics, and a set of five bands and five sub-genres that stand out in it. The sentiment analysis shows how much more or less frequently than average certain key sentiments are expressed in each topic. It was done with the tidytext package in R, using the NRC Emotion Lexicon from Saif Mohammad and Peter Turney. All plots were done with ggplot2 (duh!) and the poster with Inkscape.
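To make the "more or less frequently than average" idea concrete, here is a minimal sketch of one way such a comparison could be computed. It is not our actual R code: it is written in Python, the lexicon is a made-up toy stand-in for the NRC Emotion Lexicon, and the names `TOY_LEXICON`, `emotion_rates` and `relative_sentiment` are hypothetical. It assumes the measure is a topic's emotion rate divided by the corpus-wide rate, minus one.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for the NRC Emotion Lexicon.
TOY_LEXICON = {
    "death": {"fear", "sadness"},
    "blood": {"fear"},
    "war": {"anger", "fear"},
    "love": {"joy"},
    "sun": {"joy"},
}


def emotion_rates(tokens):
    """Return, per emotion, the share of tokens tagged with that emotion."""
    counts = Counter()
    for tok in tokens:
        for emotion in TOY_LEXICON.get(tok, ()):
            counts[emotion] += 1
    total = len(tokens)
    return {emo: n / total for emo, n in counts.items()} if total else {}


def relative_sentiment(topic_tokens, all_tokens):
    """Topic rate relative to the corpus rate: 0 is average, 1.0 is twice as frequent."""
    topic = emotion_rates(topic_tokens)
    overall = emotion_rates(all_tokens)
    return {emo: topic.get(emo, 0.0) / rate - 1.0 for emo, rate in overall.items()}


# Made-up tokens grouped by (hypothetical) topic, for illustration only.
songs_by_topic = {
    "battle": ["war", "blood", "death", "war"],
    "romance": ["love", "sun", "love", "death"],
}
all_tokens = [tok for toks in songs_by_topic.values() for tok in toks]
battle = relative_sentiment(songs_by_topic["battle"], all_tokens)
```

In this toy data, anger comes out well above average in the "battle" topic and joy well below it, which is the kind of contrast the sentiment panels on the poster are meant to show.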
Notice that all of the software used is FREE software. And so our gratitude goes to all the magnificent people who have made this software available and who have had the decency to provide a public good instead of exploiting our wallets.
This is a work in progress, so all thoughts and comments are welcome. We will be tinkering with this further, updating the data, doing some analyses that we have not done yet and hopefully writing a longer story about this.
A higher-resolution version of the poster is available here, and a long-format version, which is better to look at on a computer screen, is available here.