Last.fm normaliser - algorithm change

02 July 2007 | Matt Perdeaux

After much head-scratching and a few hairy moments with the database this afternoon, we have updated the Last.fm Normaliser to use median track length values in its calculations, rather than the arithmetic mean values used previously.

Hopefully, this should smooth out a lot of the issues people were reporting with a handful of extra-long or extra-short tracks skewing the figures for a particular artist or album.
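A minimal sketch of why the switch matters. It assumes the normaliser estimates listening time as play count multiplied by one representative track length per artist or album; the post doesn't spell out the exact formula. The track lengths are made-up values: nine songs of about four minutes plus one 20-minute track.

```python
from statistics import mean, median


def estimated_listening_seconds(play_count, track_lengths, use_median=True):
    """Estimate listening time as play count x one typical track length.

    Assumption: one representative length per artist/album, as described
    in the lead-in; this is not the normaliser's actual code.
    """
    typical = median(track_lengths) if use_median else mean(track_lengths)
    return play_count * typical


# Made-up album: nine ~4-minute songs plus one 20-minute track (seconds).
lengths = [240, 235, 250, 245, 238, 242, 260, 230, 248, 1200]

print(estimated_listening_seconds(100, lengths, use_median=False))  # mean-based
print(estimated_listening_seconds(100, lengths))                    # median-based
```

With these lengths the single long track pulls the mean up to 338.8 seconds, while the median stays at 243.5 seconds.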

46 Comments

Your XML needs to escape ampersands:

see http://www.associativetrails.com/stuff/normalisefm/index.cfm?user=inkcow&chart=artist&format=xml

for an example.

02 Jul 2007 at 10:28 PM | Ed Summers
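Ed's point, sketched in Python: any artist or album name written into the XML needs `&`, `<` and `>` replaced by entities. The `<artist>`/`<name>` element names are hypothetical, not the normaliser's actual schema; `xml.sax.saxutils.escape` is the standard-library helper for this.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def artist_element(name):
    """Build a (hypothetical) <artist> element as text, escaping the name.

    escape() replaces &, < and > with &amp;, &lt; and &gt;.
    """
    return "<artist><name>" + escape(name) + "</name></artist>"


xml_text = artist_element("Nick Cave & The Bad Seeds")
print(xml_text)
# Parsing only succeeds because the ampersand was escaped.
print(ET.fromstring(xml_text).find("name").text)
```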

This is really cool. For artists, I think the median makes the results much more realistic. For albums (maybe just because of my listening habits), the arithmetic mean was much more accurate than the median, because if I've listened to, say, 80 tracks from an album with 8 tracks, I've most likely listened to each track 10 times.

Maybe consider some kind of outlier-removal algorithm (e.g. take the average length of all tracks except those more than 1 standard deviation from the mean).

03 Jul 2007 at 12:56 PM | harveydrone
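A sketch of harveydrone's two ideas, using made-up track lengths in seconds. First, if 80 plays are spread evenly over an 8-track album, plays × mean reproduces the exact total, which is why the mean suited albums; plays × median does not. Second, the suggested outlier filter: drop tracks more than one standard deviation from the mean, then average the rest. Using the population standard deviation (`pstdev`) is an assumption; the comment doesn't specify which.

```python
from statistics import mean, median, pstdev


def mean_without_outliers(lengths, k=1.0):
    """Mean of the lengths lying within k population std devs of the mean."""
    mu = mean(lengths)
    sigma = pstdev(lengths)
    kept = [x for x in lengths if abs(x - mu) <= k * sigma]
    return mean(kept) if kept else mu


# Case 1: 80 plays spread evenly over a made-up 8-track album (seconds).
album = [200, 210, 190, 400, 180, 220, 205, 195]
exact_total = sum(10 * x for x in album)  # every track played 10 times
print(exact_total, 80 * mean(album), 80 * median(album))

# Case 2: one 20-minute track among nine ~4-minute songs.
lengths = [240, 235, 250, 245, 238, 242, 260, 230, 248, 1200]
print(mean(lengths), median(lengths), mean_without_outliers(lengths))
```

In the second case the 20-minute track lies far outside one standard deviation, so the filtered mean is just the average of the nine shorter songs.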

Hi, I love this normaliser approach. The only thing I find weird now is that, in my case, I have Mahler at the top and Bruckner in 3rd, when I'm positive that I have spent more time listening to Bruckner than to Mahler. The median approach is not very precise in my particular case.

06 Jul 2007 at 09:26 PM | henriquemaia

I think that KoRn can't be normalised on this page. I think the problem is the KoRn name, because the 'R' in the band's name is reserved. Can you fix it?

12 Jul 2007 at 08:02 AM | error

I wonder whether you count tracks shorter than 30 seconds when calculating the median. It seems to me that they are currently taken into account (take the artist Le Scrawl, for example). But Last.fm doesn't count them, so it seems to me you shouldn't either.

13 Jul 2007 at 04:05 PM | Rednyrg721
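A sketch of Rednyrg721's suggestion: drop tracks that Last.fm would not scrobble before taking the median. The 30-second threshold follows the comment; the helper name and the sample lengths (six 4-second silence tracks plus three songs) are hypothetical.

```python
from statistics import median

MIN_SCROBBLE_SECONDS = 30  # threshold taken from the comment above


def median_scrobblable_length(lengths, min_seconds=MIN_SCROBBLE_SECONDS):
    """Median length of tracks longer than min_seconds, or None if none qualify."""
    eligible = [x for x in lengths if x > min_seconds]
    return median(eligible) if eligible else None


# Made-up discography: six 4-second silence tracks plus three songs (seconds).
lengths = [4, 4, 4, 4, 4, 4, 200, 210, 190]
print(median(lengths))                     # silence tracks dominate
print(median_scrobblable_length(lengths))  # only real songs counted
```

Without the filter the silence tracks are the majority and the median collapses to 4 seconds; with it, the median is 200 seconds.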

It seems that when switching to albums, the no. 1 album is lost from the list...

20 Jul 2007 at 09:59 PM | adrian

One of my top-rated Last.fm artists still doesn't show up in the normalised top...

I am last.fm/user/Monkbel, and N.R.M. doesn't show up, though their MusicBrainz page is full of info.

28 Jul 2007 at 07:43 AM | Monk

Great tool!

Could you please escape special characters? See here for a related XML error with the ñ character, for instance:

http://www.validome.org/xml/validate/?lang=en&url=http://www.associativetrails.com/stuff/normalisefm/index.cfm%3fuser=csaavedra%26chart=album%26format=xml

31 Jul 2007 at 02:33 PM | Claudio

How about giving a default average to artists that aren't listed in the database? That way they don't drop off the list completely.

14 Aug 2007 at 12:29 PM | Jason Daniels
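One way Jason's fallback could work, as a sketch: when an artist has no median length in the database, use the median of all known artist medians instead. The fallback rule, the function name and the sample figures are assumptions, not the normaliser's behaviour.

```python
from statistics import median


def normalised_seconds(plays_by_artist, median_by_artist):
    """Estimated seconds per artist; unknown artists get a fallback length.

    Hypothetical fallback: the median of all known per-artist medians.
    """
    fallback = median(list(median_by_artist.values()))
    return {
        artist: plays * median_by_artist.get(artist, fallback)
        for artist, plays in plays_by_artist.items()
    }


plays = {"Artist A": 10, "Artist B": 5, "Unlisted C": 7}
medians = {"Artist A": 200, "Artist B": 600, "Artist D": 240}  # seconds
print(normalised_seconds(plays, medians))
```

"Unlisted C" stays in the chart with an estimate based on the 240-second fallback instead of disappearing.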

Would it be possible to take the top 100 (or maybe the top 75) list to create a normalised top 50 chart? I think it'd be interesting to see which artists would make the top 50 by length... (Paysage d'Hiver <3)

20 Aug 2007 at 05:21 AM | Sergej Barbarisch
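A sketch of Sergej's idea: start from a longer play-count chart (e.g. a top 100), convert each play count to estimated seconds, re-sort, and keep the top 50. It assumes per-artist median lengths are available as a dictionary; artists without one are skipped, mirroring how they currently drop off. Names and numbers are hypothetical.

```python
def normalised_top(chart, median_by_artist, size=50):
    """Re-rank (artist, plays) pairs by estimated seconds and keep `size`.

    `chart` would be the longer play-count chart (e.g. a top 100); artists
    with no known median length are skipped.
    """
    timed = [
        (artist, plays * median_by_artist[artist])
        for artist, plays in chart
        if artist in median_by_artist
    ]
    timed.sort(key=lambda pair: pair[1], reverse=True)
    return timed[:size]


# Made-up chart, ordered by play count; lengths in seconds.
chart = [("Pop Act", 30), ("Mid Act", 10), ("Drone Act", 5)]
medians = {"Pop Act": 180, "Mid Act": 300, "Drone Act": 1500}
print(normalised_top(chart, medians, size=2))
```

"Drone Act" is last by plays but first by time; had the input been cut to the top 2 by plays first, it would never have been considered.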

Awesome!

Could you make the normaliser work for users with spaces in their names?

16 Sep 2007 at 03:39 PM | ice cream
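Usernames with spaces need percent-encoding before they go into the query string. This sketch reuses the `user`, `chart` and `format` parameters visible in the XML URL from the first comment; whether the server expects `%20` or `+` for a space is an assumption.

```python
from urllib.parse import quote, urlencode

# Endpoint and parameter names as in the example URL from the first comment.
BASE_URL = "http://www.associativetrails.com/stuff/normalisefm/index.cfm"


def normaliser_url(user, chart="artist", fmt="xml"):
    """Build a normaliser URL, percent-encoding the username (space -> %20)."""
    query = urlencode({"user": user, "chart": chart, "format": fmt}, quote_via=quote)
    return BASE_URL + "?" + query


print(normaliser_url("ice cream"))
```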

Next to my overall top artists, I keep a little list of my top 15 normalized artists, but it's a real pain to update it every week. It would be neat if there were a way to auto-update this list! But I guess that would be pretty intensive for the author... Just a thought :)

20 Sep 2007 at 02:08 PM | Steve

The normaliser should now be pumping out proper UTF-8 XML, and will work for Last.fm usernames with spaces.

24 Sep 2007 at 09:27 AM | Matt Perdeaux
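A sketch of emitting UTF-8 XML with Python's standard library, assuming a hypothetical `<chart>`/`<artist>`/`<name>`/`<seconds>` layout rather than the normaliser's real schema (the seconds values are made up). `ElementTree` escapes `&` in text and writes non-ASCII characters such as `ë` as UTF-8 bytes under a matching declaration.

```python
import xml.etree.ElementTree as ET


def chart_xml(user, rows):
    """Serialise (name, seconds) rows as UTF-8 XML bytes with a declaration.

    The element names are hypothetical, not the normaliser's real schema.
    """
    root = ET.Element("chart", user=user)
    for name, seconds in rows:
        artist = ET.SubElement(root, "artist")
        ET.SubElement(artist, "name").text = name  # & is escaped automatically
        ET.SubElement(artist, "seconds").text = str(seconds)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


data = chart_xml("csaavedra", [("DJ Tiësto", 36000), ("Tegan & Sara", 12000)])
print(data.decode("utf-8"))
```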

I run a group for independent artists at Last.fm.

Just wondering why all the artists who don't yet have a 'known name' have dropped off my statistics here.

For example: Titee, Electromagnetic Impulses, Remergence, Man Made Man, Phillip Wilkerson, DJ Satori The Nucleator. All of them are in my Top Ten artists at Last.fm and nowhere to be seen in the results here.

27 Sep 2007 at 04:33 AM | Pixieguts

@Pixieguts

There are numerous reasons why a certain artist may not appear in the normalised rankings. The Help/FAQ page goes through the various causes, but if you're talking about emerging artists, it's probably that they are either not in MusicBrainz yet, or the Last.fm data feed hasn't linked them to a MusicBrainz ID.

27 Sep 2007 at 06:31 AM | Matt Perdeaux

I don't really understand how I can make an artist that is already stored with numerous albums at MusicBrainz (Nick Cave & The Bad Seeds) appear in my ranking, because it's not there.
And clicking on "Refresh Cache" doesn't change anything.

18 Nov 2007 at 06:47 AM | inkbot

@inkbot

I get the same problem with Nick Cave & The Bad Seeds. The problem is that the data feed from Last.fm does not contain a unique MusicBrainz ID for the band; see http://ws.audioscrobbler.com/1.0/user/iinkbot/topartists.xml

19 Nov 2007 at 05:44 AM | Matt Perdeaux
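A sketch for checking which artists in a top-artists feed lack a MusicBrainz ID. The element layout (`<artist>` containing `<name>` and `<mbid>`) is an assumption about the Audioscrobbler 1.0 feed, and the sample XML, including its all-zero placeholder ID and play counts, is made up.

```python
import xml.etree.ElementTree as ET

# Made-up feed in the assumed layout; the all-zero mbid is a placeholder.
SAMPLE_FEED = """<topartists user="example">
  <artist>
    <name>Radiohead</name>
    <mbid>00000000-0000-0000-0000-000000000000</mbid>
    <playcount>120</playcount>
  </artist>
  <artist>
    <name>Nick Cave &amp; The Bad Seeds</name>
    <mbid></mbid>
    <playcount>95</playcount>
  </artist>
</topartists>"""


def artists_missing_mbid(feed_xml):
    """Names of <artist> entries whose <mbid> is absent or blank."""
    root = ET.fromstring(feed_xml)
    missing = []
    for artist in root.iter("artist"):
        mbid = (artist.findtext("mbid") or "").strip()
        if not mbid:
            missing.append(artist.findtext("name"))
    return missing


print(artists_missing_mbid(SAMPLE_FEED))
```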

Can this be fixed?

21 Nov 2007 at 02:59 PM | inkbot

Is there a way for us to get a "feed" of the calculated normalized data? It would be useful for doing something with this data.

04 Dec 2007 at 10:03 PM | Lawliet

Oh, my bad, I just saw the XML link. Please ignore. =X

04 Dec 2007 at 10:05 PM | Lawliet

Some artists do not show up on the normalised charts; The Angelic Process, for example.

15 Dec 2007 at 11:10 PM | Herbstlied

She Wants Revenge seems to be a bit too short. I've listened to them over 400 times, but it only estimates about 30 minutes. I looked into it, and it's apparently counting all the 4-second silence tracks, and there are a lot of them. Maybe you could make it so it doesn't count tracks under 10 seconds or so. None of the 400 tracks I've listened to were silence tracks, so SWR is seriously underrepresented on my chart.

07 Mar 2008 at 05:05 PM | Angel
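Angel's figures are consistent with the silence tracks dominating the median. Assuming the estimate is plays × median track length, about 30 minutes over 400 plays implies a median of:

$$
\frac{30\ \text{min} \times 60\ \text{s/min}}{400\ \text{plays}} = 4.5\ \text{s per play}
$$

which is close to the length of the 4-second silence tracks.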

Get a widget going! This would be way cooler if you could attach a mini version to your profile.

14 Apr 2008 at 07:20 AM | http://www.last.fm/user/umbrella-ella/

Hey,

Your form input boxes currently use the following CSS:

background: #fff url("../_img/fieldbg.gif") repeat-x top;

You forgot to declare the foreground colour. My theme sets the text colour to white, so I am now typing this white-on-white... It is rather annoying; please fix.

(I hope I did not make many typos here...)

20 Apr 2008 at 09:35 AM | aoeu

Great tool! :) And yet another artist is invisible to the Normaliser, though it is in the MusicBrainz database: Cinq G (http://www.lastfm.pl/music/Cinq+G). I hope it will be fixed in the future. :)

26 May 2008 at 02:43 PM | fnoll

This would be much more interesting if it considered the top 100 or 200 in constructing the normalized top 50.

14 Jun 2008 at 03:52 PM | nick

You need to support Unicode for non-English artists like ?????????.

06 Jul 2008 at 07:38 PM | Freeman

I think this is just great, especially for people who don't only listen to mainstream artists who make 3-minute songs. On your site, AMM is my second favorite band, while it's my 12th on Last.fm, and I think AMM *actually* is my first or second favorite band. This method should be used on Last.fm, or at least one should be able to choose between this method and the other. Everything by Teenage Jesus and the Jerks is 20 minutes long and contains 12 songs, while Amarok by Mike Oldfield is 1 hour long and is just a single song; in this context, I think the current Last.fm method is not really fair.

11 Jul 2008 at 11:14 AM | DeadLugosi
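DeadLugosi's comparison in numbers, assuming each record is played straight through once:

$$
\frac{\text{plays}(\textit{Everything})}{\text{plays}(\textit{Amarok})} = \frac{12}{1} = 12,
\qquad
\frac{\text{time}(\textit{Everything})}{\text{time}(\textit{Amarok})} = \frac{20\ \text{min}}{60\ \text{min}} = \frac{1}{3}
$$

So a play-count chart weights the Teenage Jesus and the Jerks record twelve times as heavily, while a time-based chart weights Amarok three times as heavily.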

What's going on? It doesn't work any more...

16 Jul 2008 at 08:04 PM | malpa

Hey, nice idea you had :)

However, I would like to point out two little problems I have with the 'normalisation' of my charts, concerning two classical music composers: Camille Saint-Saëns and Edvard Grieg.

The normaliser says Saint-Saëns should be considerably higher (8 positions higher), but I mostly listen to "Le carnaval des animaux", in which most songs are less than 2 minutes long.

At the same time, it says Edvard Grieg should be lower (3 positions lower), as if his songs were shorter than average, but I've very often listened to his "Concerto For Piano In A Minor, Op 16, 1st movement: Allegro Molto Moderato", which is around 13 minutes long.

So, I would suggest you recalculate the average song length for each of these two composers, if possible.

Great job besides that :)

10 Nov 2008 at 03:48 PM | Aragorn_54

Perfect idea, except one of my artists isn't represented. Nicklas 'Nifflas' Nygren isn't on my list, so do you have him as an artist?

03 Dec 2008 at 10:12 PM | Justin T.

I've been listening to In Rainbows by Radiohead a lot lately, but this album does not appear in my top albums listing, even though it is (obviously) in the MusicBrainz database... My Last.fm username is avinash, by the way, if you want to check.

Apart from that, congratulations :-)

05 Dec 2008 at 01:08 AM | Avinash Meetoo

I have a problem with some bands not showing up in my profile. To be specific, Innocens are one of my top ten artists and they're not there.
At first I thought it might be because the album/song length information was missing on Last.fm, but it's still the same after it was added. Then I read the page about MusicBrainz, checked whether the band was there, and added the missing album to the database (the other one was already there). I refreshed it on this site, but Innocens still don't appear in the charts... I waited a few weeks, refreshed again, waited again... Nothing happens. What's wrong?
I've read something about Last.fm connecting the feeds to MusicBrainz. Is there a way I can add the relations? I would do so for more bands in the future whenever I find something missing.

05 Dec 2008 at 02:18 PM | John Beak

@Justin T - I can't find Nicklas 'Nifflas' Nygren in the MusicBrainz database.

@Avinash - I've checked a few Last.fm feeds (including my own), and "In Rainbows" doesn't have a MusicBrainz ID associated with it. Shame really.

@John Beak - Innocens do seem to be in the cache. Does your feed associate a MusicBrainz ID with them?

12 Dec 2008 at 05:47 AM | Matt Perdeaux

I checked the feed I get for weekly charts, and the mbid element for Innocens seems broken (only the closing tag is there). I added various information about the band to MusicBrainz a day or two ago; it was applied to the site and already shows up on Last.fm. I'm still waiting to see whether it kicks in in the feed (I shall see tomorrow with the next update). If it still doesn't work, I will probably need help, or something.
Is there anything more I can do?

21 Dec 2008 at 05:41 PM | John Beak

Happy New Year! (At least I'm supposed to say it, being the first one to post here.) Bump. It still doesn't work.

16 Jan 2009 at 08:57 AM | John Beak

Pretty cool stuff, I like this! :)

18 Jan 2009 at 06:42 AM | lauschemaedchen

@John Beak - It looks like the problem is that Innocens don't have a MusicBrainz ID attached to them in the Last.fm database. The XML still shows a blank tag. I don't know how often Last.fm reviews its MusicBrainz IDs, sorry :(

23 Jan 2009 at 06:35 AM | Matt Perdeaux

Okay, thanks Matt. Looks like there's nothing we can do but wait now. Thanks anyway :)

24 Jan 2009 at 08:54 AM | John Beak

Why not make the normaliser able to build a chart that combines the top list by play count (the Last.fm one) and the top list by time (the normaliser one)? This chart would take each artist's rank in the Last.fm chart and add its rank in the normaliser chart; the lower the sum, the higher the artist would rank in this new chart.

This would be the best of both worlds.

08 Feb 2009 at 11:08 AM | exdeath
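A sketch of exdeath's combined chart as a simple rank sum (similar in spirit to a Borda count): each artist's position in the play-count chart plus its position in the time chart, lowest total first. Keeping only artists that appear in both charts and breaking ties by play-count position are assumptions not stated in the comment.

```python
def combined_chart(play_rank, time_rank):
    """Order artists by (play-count position + time position), lowest first.

    Both arguments are lists of artist names, best first. Only artists in
    both charts are kept; ties are broken by play-count position.
    """
    play_pos = {artist: i + 1 for i, artist in enumerate(play_rank)}
    time_pos = {artist: i + 1 for i, artist in enumerate(time_rank)}
    common = [artist for artist in play_rank if artist in time_pos]
    return sorted(common, key=lambda a: (play_pos[a] + time_pos[a], play_pos[a]))


print(combined_chart(["A", "B", "C", "D"], ["C", "A", "D", "B"]))
```

Here "A" scores 1 + 2 = 3 and "C" scores 3 + 1 = 4, so "A" leads the combined chart.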

Nick Cave & the Bad Seeds is listed under that name at MusicBrainz, but on Last.fm it autocorrects to Nick Cave AND the Bad Seeds, hence it dropping off the charts. Please can we fix this? Maybe make it so 'and' and '&' are treated as interchangeable; this would probably save a lot of problems with other bands, Tegan & Sara, for example.

09 Jun 2009 at 04:14 AM | miriam
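A sketch of miriam's suggestion: compare names through a key that ignores case and treats `&` and `and` as the same word. It would not catch every variant (differences in punctuation, for instance), and the function name is hypothetical.

```python
import re


def name_key(name):
    """Case-insensitive key in which '&' and 'and' compare equal."""
    key = name.casefold()
    key = re.sub(r"\s*&\s*", " and ", key)  # '&' -> ' and '
    return " ".join(key.split())            # collapse repeated spaces


print(name_key("Nick Cave & The Bad Seeds") == name_key("Nick Cave and the Bad Seeds"))
print(name_key("Tegan & Sara") == name_key("Tegan and Sara"))
```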

One thing that bugs me is the disc separation made by MusicBrainz. I listen to an album, not just one disc, so this screws up my charts. Last.fm has started to gather discs into albums. I certainly hope MusicBrainz will use this model as well.

This site is great... MusicBrainz isn't.

28 Jun 2009 at 12:52 PM | J. Lizard
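A sketch of merging per-disc entries back into one album before ranking. It assumes disc titles end in a suffix like ` (disc 1)`; that pattern, the album titles and the seconds values are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed title pattern for per-disc releases, e.g. "Some Album (disc 2)".
DISC_SUFFIX = re.compile(r"\s*\(disc \d+[^)]*\)\s*$", re.IGNORECASE)


def merge_discs(seconds_by_title):
    """Sum listening time for titles that differ only by a disc suffix."""
    merged = defaultdict(float)
    for title, seconds in seconds_by_title.items():
        merged[DISC_SUFFIX.sub("", title)] += seconds
    return dict(merged)


print(merge_discs({"Some Album (disc 1)": 3000, "Some Album (disc 2)": 2800, "Other Album": 1000}))
```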

I don't believe your source data is very accurate. It totally misses my most played album, Eyes Open by Snow Patrol. This is not a rare or brand new album, and it is spelled and listed correctly on Last.fm.

28 Jul 2009 at 09:51 PM | Angela Close

Love this, but it's completely unusable for me, since the vast majority of my artists show up as ??????

Please support UNICODE!

01 Aug 2009 at 03:57 PM | KZ

It completely ignores my scrobbles of DJ Tiësto (my no. 1 artist by play count).

05 Sep 2009 at 01:53 AM | Kennet Klotkuk


www.associativetrails.com