Last.fm normaliser - algorithm change

02 July 2007 | Matt Perdeaux

After much head-scratching and a few hairy moments with the database this afternoon, we have updated the Last.fm Normaliser to use median track length values in its calculations, rather than the arithmetic mean values used previously.

Hopefully, this should smooth out a lot of the issues people were reporting with a handful of extra-long or extra-short tracks skewing the figures for a particular artist or album.
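To illustrate why the switch helps, here is a minimal Python sketch comparing the two averages. It is not the Normaliser's actual code; the track lengths and the `estimated_listening_time` helper are made up for the example.

```python
from statistics import mean, median

# Hypothetical track lengths in seconds for one artist: four ordinary
# songs plus one extra-long live recording.
track_lengths = [210, 225, 240, 255, 1800]

mean_length = mean(track_lengths)      # pulled upwards by the 1800 s track
median_length = median(track_lengths)  # middle value, not moved by the outlier


def estimated_listening_time(play_count, typical_length):
    """Estimate total seconds listened as plays times a typical track length."""
    return play_count * typical_length


if __name__ == "__main__":
    print("mean:", mean_length, "median:", median_length)
    print("100 plays, mean-based:", estimated_listening_time(100, mean_length))
    print("100 plays, median-based:", estimated_listening_time(100, median_length))
```

With these made-up numbers the single long track more than doubles the mean, while the median stays at the length of an ordinary song.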

26 Comments

Your XML needs to escape ampersands:

see http://www.associativetrails.com/stuff/normalisefm/index.cfm?user=inkcow&chart=artist&format=xml

for an example.

02 Jul 2007 at 10:28 PM | Ed Summers
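As a rough illustration of the escaping being asked for, here is a small Python sketch using the standard library. The `artist` element name and the example artist are assumptions for the sketch, not the Normaliser's real output format.

```python
from xml.sax.saxutils import escape


def artist_element(name):
    """Return an <artist> element with &, < and > escaped in the text.

    The element name is hypothetical; only the escaping step matters here.
    """
    return "<artist>" + escape(name) + "</artist>"


if __name__ == "__main__":
    # A bare ampersand would make the XML ill-formed; escape() emits &amp;.
    print(artist_element("Simon & Garfunkel"))
```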

This is really cool. For artists, I think the median makes the results much more realistic. For albums (maybe just for my listening habits), the arithmetic mean was much more accurate than the median, because if I've listened to, say, 80 tracks from an album with 8 tracks, most likely I've listened to each track 10 times.

Maybe consider some kind of outlier-removal algorithm (e.g. take an average track length of all tracks except those > 1 std dev from the mean).

03 Jul 2007 at 12:56 PM | harveydrone
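The outlier-removal idea suggested above could be sketched roughly as follows in Python. This is only one reading of the suggestion (population standard deviation, a cut-off of exactly one standard deviation), and the function name is hypothetical; it is not something the Normaliser is known to do.

```python
from statistics import mean, pstdev


def trimmed_mean_length(lengths):
    """Mean of the track lengths that lie within 1 std dev of the plain mean.

    Raises statistics.StatisticsError if `lengths` is empty.
    """
    mu = mean(lengths)
    if len(lengths) < 2:
        return mu
    sigma = pstdev(lengths)  # population standard deviation (an assumption)
    kept = [x for x in lengths if abs(x - mu) <= sigma]
    # Fall back to the plain mean if rounding ever leaves nothing to average.
    return mean(kept) if kept else mu


if __name__ == "__main__":
    # Hypothetical lengths in seconds; the 1800 s track is dropped.
    print(trimmed_mean_length([210, 225, 240, 255, 1800]))
```

On an album where every track is a similar length, nothing is dropped and the result is the ordinary mean, which matches the album case described in the comment.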

Hi, I love this normaliser approach. The only thing I find weird now is that, in my case, I have Mahler at the top and Bruckner in 3rd, when I'm positive that I have listened to Bruckner for more time than to Mahler. The median approach is not very precise in my particular case.

06 Jul 2007 at 09:26 PM | henriquemaia

I think that KoRn can't be normalised on this page. I think the problem is in the KoRn name, because the 'R' in the name of the band is reserve. Can you repair it?

12 Jul 2007 at 08:02 AM | error

I wonder whether you count tracks with length < 30 seconds when calculating the median. It seems to me that they are currently taken into account (take the artist Le Scrawl, for example). But last.fm doesn't count them, so it seems to me you shouldn't either.

13 Jul 2007 at 04:05 PM | Rednyrg721
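A minimal Python sketch of the filtering proposed in the comment above: drop very short tracks before taking the median. The 30-second threshold comes from the comment, and the function name and sample data are made up; whether the Normaliser applies any such filter is not stated here.

```python
from statistics import median

# Threshold taken from the comment above; treat it as an assumption.
MIN_SCROBBLE_SECONDS = 30


def median_scrobblable_length(lengths, min_seconds=MIN_SCROBBLE_SECONDS):
    """Median length of tracks at least `min_seconds` long, or None if none qualify."""
    eligible = [x for x in lengths if x >= min_seconds]
    return median(eligible) if eligible else None


if __name__ == "__main__":
    # Hypothetical artist with three 4-second interludes and three songs.
    lengths = [4, 4, 4, 180, 200, 220]
    print("unfiltered median:", median(lengths))
    print("filtered median:", median_scrobblable_length(lengths))
```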

It seems that when switching to albums, the no. 1 album is lost from the list...

20 Jul 2007 at 09:59 PM | adrian

One of my top-rated last.fm artists still doesn't show up in the normalized top..

I am last.fm/user/Monkbel, and N.R.M. doesn't show up, though their musicbrainz page is full of info.

28 Jul 2007 at 07:43 AM | Monk

Great tool!

Could you please escape special chars? See here for related XML error with the ñ char, for instance:

http://www.validome.org/xml/validate/?lang=en&url=http://www.associativetrails.com/stuff/normalisefm/index.cfm%3fuser=csaavedra%26chart=album%26format=xml

31 Jul 2007 at 02:33 PM | Claudio

How about giving a default average to artists that aren't listed in the database? That way they don't drop off completely from the list.

14 Aug 2007 at 12:29 PM | Jason Daniels
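The fallback suggested above might look something like this Python sketch. The 240-second default, the function name, and the sample data are all hypothetical; this shows the idea only and is not how the Normaliser ranks artists.

```python
# Hypothetical fallback length for artists with no track data.
DEFAULT_TRACK_SECONDS = 240


def normalised_chart(play_counts, median_lengths, default=DEFAULT_TRACK_SECONDS):
    """Rank artists by estimated seconds listened, longest first.

    Artists missing from `median_lengths` use `default` instead of
    being dropped from the chart.
    """
    totals = {
        artist: plays * median_lengths.get(artist, default)
        for artist, plays in play_counts.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    plays = {"A": 100, "B": 50, "C": 80}  # made-up play counts
    lengths = {"A": 200, "B": 600}        # "C" has no known track lengths
    for artist, seconds in normalised_chart(plays, lengths):
        print(artist, seconds)
```

The trade-off is that artists ranked via the default are only rough guesses, so a real chart would probably want to mark them as estimated.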

Would it be possible to take the top 100 (or maybe the top 75) list to create a normalised top 50 chart? I think it'd be interesting to see which artists would make the top 50 lengthwise.. (Paysage d'Hiver <3)

20 Aug 2007 at 05:21 AM | Sergej Barbarisch

Awesome!

Could you make the normaliser work for users with spaces in their names?

16 Sep 2007 at 03:39 PM | ice cream

I've got a little list of my top 15 normalized artists going next to my overall top artists, but it's a real pain to update it every week. It would be neat if there was a way to auto-update this list! But I guess that would be pretty intensive for the author... Just a thought :)

20 Sep 2007 at 02:08 PM | Steve

The normaliser should now be pumping out proper UTF-8 XML, and will work for last.fm usernames with spaces.

24 Sep 2007 at 09:27 AM | Matt Perdeaux

I run a group for independent artists at Last.fm.

Just wondering why all the artists who do not yet have a 'known name' dropped off my statistics on this.

Like: Titee, Electromagnetic Impulses, Remergence, Man Made Man, Phillip Wilkerson, DJ Satori The Nucleator - all in my Top Ten artists at Last.fm and nowhere to be seen on the results here.

27 Sep 2007 at 04:33 AM | Pixieguts

@Pixieguts

There are numerous reasons why a certain artist may not appear in the normalised rankings. The Help/FAQ page goes through the various causes, but if you're talking about emerging artists, it's probably that they are either not in MusicBrainz yet, or the Last.fm data feed hasn't made the link to the MusicBrainz ID.

27 Sep 2007 at 06:31 AM | Matt Perdeaux

I don't really understand how I can make an artist that's stored with numerous albums at MusicBrainz (Nick Cave & The Bad Seeds) appear in my ranking, because it's not there.
And clicking on "Refresh Cache" doesn't change anything.

18 Nov 2007 at 06:47 AM | inkbot

@inkbot

I get the same problem with Nick Cave & The Bad Seeds. The problem is that the data feed from last.fm does not contain a unique Musicbrainz ID for the band - see http://ws.audioscrobbler.com/1.0/user/iinkbot/topartists.xml

19 Nov 2007 at 05:44 AM | Matt Perdeaux

Can this be fixed?

21 Nov 2007 at 02:59 PM | inkbot

Is there a way for us to get a "feed" of the normalized data calculated? It would be useful to do something with this data.

04 Dec 2007 at 10:03 PM | Lawliet

Oh, my bad, I just saw the XML link. Please ignore. =X

04 Dec 2007 at 10:05 PM | Lawliet

Some of the artists do not show on the normalised charts. The Angelic Process, for example.

15 Dec 2007 at 11:10 PM | Herbstlied

She Wants Revenge seems to be a bit too short. I've listened to them over 400 times, but it only estimates like 30 minutes. I looked at the thing, and it's apparently counting all the 4 second silence tracks, and there's a lot of them. Maybe you could make it so it doesn't count tracks under like 10 seconds or so. Because none of the 400 tracks I've listened to were the silence ones, so SWR is seriously underrepresented on my chart.

07 Mar 2008 at 05:05 PM | Angel

Get a widget going! This would be way cooler if you could attach a mini version to your profile

14 Apr 2008 at 07:20 AM | http://www.last.fm/user/umbrella-ella/

Hey

You currently use the following CSS for the input boxes in your forms:

background: #fff url("../_img/fieldbg.gif") repeat-x top;

You forgot to declare the foreground colour. My theme says the text colour is white, so now I am typing this white-on-white.... It is rather annoying, please fix.

(I hope I did not make many typos here...)

20 Apr 2008 at 09:35 AM | aoeu

Great tool! :) And yet another artist invisible to the Normaliser, though it is in the MusicBrainz database: Cinq G - http://www.lastfm.pl/music/Cinq+G I hope it will be fixed in the future. :)

26 May 2008 at 02:43 PM | fnoll

This would be much more interesting if it considered the top 100 or 200 in constructing the normalized top 50

14 Jun 2008 at 03:52 PM | nick



www.associativetrails.com