Sings The Future – Song Popularity Prediction

Z/Yen were asked whether PropheZy could predict song popularity from some historic song data. Two spreadsheets below provide some positive results, though the test data was sparse. The historic song data provided was from the 1980s: song, genre, artist, position in chart, week in year, date. Pre-processing turned text into numbers. Because PropheZy tends to perform better on classes than on exact values in sparse data sets, the number of weeks in the charts was also banded into clumps of 10, e.g. band 1 = 1-9 weeks. We then built a model excluding the last three weeks and tried to predict those.
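The banding step can be sketched in Python as follows. This is only an illustration, not PropheZy's own pre-processing: the function name `week_band` is hypothetical, and the rule `weeks // 10 + 1` is an assumption inferred from the example "band 1 = 1-9 weeks" (so band 2 would be 10-19 weeks, band 3 would be 20-29 weeks, and so on).

```python
def week_band(weeks: int) -> int:
    """Map a number of weeks in the charts to a band of width 10.

    Assumed rule: band 1 = 1-9 weeks, band 2 = 10-19 weeks, band 3 = 20-29 weeks.
    """
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    return weeks // 10 + 1


# Print the band for a few sample week counts.
for weeks in (8, 10, 15, 16, 24):
    print(weeks, "weeks -> band", week_band(weeks))
```

Under this assumed rule a song with 9 weeks and a song with 10 weeks land in different bands, which is why values near a multiple of 10 are sensitive to small prediction errors.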

The first sheet (General) just uses the predictor with no class. The results are good, but consistently high: PropheZy predicts 16, 24 and 16 weeks in the charts for the last three songs, but they were 10, 15 and 8 respectively, so the predictions overshoot by 60%, 60% and 100%. Likewise it predicts bands 2, 3 and 2 when the actual bands were 2, 2 and 1. These cases fell on band boundaries. This is a case of predicting from very little data. PropheZy is predicting weeks in the charts using only position, major genre and minor genre. There are no duplicate artists (so no information there), and including the date actually reduces the predictability a bit because the data set is so limited.
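The arithmetic behind this comparison can be checked with a short Python sketch. The predicted and actual week counts are the ones quoted above; the helper names and the banding rule `weeks // 10 + 1` (band 1 = 1-9 weeks) are assumptions made for illustration, not part of PropheZy.

```python
predicted_weeks = [16, 24, 16]  # PropheZy predictions quoted in the text
actual_weeks = [10, 15, 8]      # actual weeks in the charts quoted in the text


def week_band(weeks: int) -> int:
    # Assumed banding rule: band 1 = 1-9 weeks, band 2 = 10-19 weeks, ...
    return weeks // 10 + 1


def overshoot_pct(predicted: int, actual: int) -> float:
    # Percentage by which the prediction exceeds the actual value.
    return (predicted - actual) * 100 / actual


overshoots = [overshoot_pct(p, a) for p, a in zip(predicted_weeks, actual_weeks)]
predicted_bands = [week_band(p) for p in predicted_weeks]
actual_bands = [week_band(a) for a in actual_weeks]

for p, a, o in zip(predicted_weeks, actual_weeks, overshoots):
    print(f"predicted {p:2d}, actual {a:2d}, overshoot {o:.0f}%, "
          f"bands {week_band(p)} vs {week_band(a)}")
```

The predicted ordering of the three songs matches the actual one except that the first and third predictions tie, which fits the observation that the model picks up the pattern while overestimating the level.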

The second sheet (Class) tries to predict the week band. At first it looks no better: it predicts bands 2, 2 and 2 when the actual bands were 2, 2 and 1 (so still one off). However, looking behind the prediction, it is actually picking up the pattern of those three songs (10, 15, 8), but the banding reduces it to the same general class. In summary, while lots more data would help, PropheZy has said that the last three songs would have performed about where they did.

Little should be read into song prediction without a much richer dataset, but this example does show that small datasets do have some predictive power.