Spike Lee and Netflix


Having just finished and enjoyed Inside Man, it occurred to me I've been neglecting Spike Lee's work of late. So I was checking out his Netflix page to see what I should rent next, when I was struck by how low his ratings seem. I guess they aren't too bad, but no movies crack four stars, and quite a few seem to hover between two and three. And Inside Man gets the best mark? Finally, no DVD release of She's Gotta Have It?

Oh my goodness! The NYTimes reports that your pickle has come in and your prayers have been answered.

Netflix, the popular online movie rental service, is planning to award $1 million to the first person who can improve the accuracy of movie recommendations based on personal preferences.

To win the prize, which is to be announced today, a contestant will have to devise a system that is more accurate than the company’s current recommendation system by at least 10 percent. And to improve the quality of research, Netflix is making available to the public 100 million of its customers’ movie ratings, a database the company says is the largest of its kind ever released.


“The data set is the big deal here,” Mr. [James] Bennett said.

Netflix has already used its data set to test the accuracy of its existing recommendation system, so it will be able to gauge the accuracy of each entrant’s set of predictions, executives said.

Mr. [Reed] Hastings said he thought it was important to make the ratings database widely available. “Unless you work at Microsoft research or Yahoo research or for Jim Bennett here at Netflix, you won’t have access to a large data set,” he said. “The beauty of the Netflix prize is you can be a mathematician in Romania or a statistician in Taiwan, and you could be the winner.”

The downside is that you are still a mathematician in Romania or a statistician in Taiwan.

Another problem is that Netflix owns the submissions. It might be more profitable to solve the problem and then license it and/or sell it to the high(er) bidder.

I totally know how to win this.

Man, winning would be impressive indeed. Folks have been throwing money and smart people at this problem for 15 years, so if you have a 10% improvement hidden up your sleeve I would run, not walk, straight to the patent office.
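For a sense of what "10 percent more accurate" means in practice: as far as I understand the contest rules, accuracy is scored as root mean squared error (RMSE) between predicted and actual ratings, so a 10% improvement means an RMSE 10% lower than the current system's. Here is a minimal Python sketch of that arithmetic. The function names and the sample numbers are made up for illustration and are not taken from Netflix.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement_pct(baseline_rmse, new_rmse):
    """Percent reduction in RMSE relative to the baseline system."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Made-up numbers, purely for illustration.
actual = [4, 3, 5, 2]
baseline = [3, 3, 4, 3]    # hypothetical "current system" predictions
candidate = [4, 3, 4, 3]   # hypothetical challenger predictions

base_score = rmse(baseline, actual)
new_score = rmse(candidate, actual)
print(round(improvement_pct(base_score, new_score), 1))
```

On four toy ratings a big jump is easy. The hard part is doing it across 100 million real ones.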

I was going to download the files myself and have a look, even though I'm sure I'm very out of my league. But somebody posted the README that comes with the files, and it describes the data you'll receive. It doesn't seem to me like there's enough information there. All you get is 17,770 movies, 480,189 users, and how those users rated those movies on a scale from 1 to 5 (plus the date they did the rating). The killer is that not only do you get no meta-info on the movies, but there are precious few ways of retrieving that meta-info from a third party:

- MovieID do not correspond to actual Netflix movie ids or IMDB movie ids.
- YearOfRelease can range from 1890 to 2005 and may correspond to the release of corresponding DVD, not necessarily its theaterical release.
- Title is the Netflix movie title and may not correspond to titles used on other sites. Titles are in English.

I dunno, but doesn't that take all the fun out of it? No genres, even! Can you still do it?
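To be fair, you can predict something from ratings alone. Here is a rough Python sketch of the simplest baseline I can think of: guess each movie's average rating. The `Rating` record just mirrors the four fields described above (user, movie, 1 to 5 stars, date). The field names, the sample rows, and the 3.0 fallback are my own assumptions, not anything from the README.

```python
from collections import defaultdict
from typing import NamedTuple

class Rating(NamedTuple):
    user_id: int
    movie_id: int
    stars: int   # integer from 1 to 5
    date: str    # kept as plain text in this sketch

def movie_means(ratings):
    """Average star rating per movie, using nothing but the ratings."""
    totals = defaultdict(lambda: [0, 0])   # movie_id -> [sum, count]
    for r in ratings:
        totals[r.movie_id][0] += r.stars
        totals[r.movie_id][1] += 1
    return {movie: total / count for movie, (total, count) in totals.items()}

def predict(means, movie_id, fallback=3.0):
    """Guess a rating: the movie's mean, or a fallback for unseen movies."""
    return means.get(movie_id, fallback)

# Invented sample rows for illustration only.
sample = [
    Rating(1, 10, 5, "2005-01-01"),
    Rating(2, 10, 3, "2005-01-02"),
    Rating(1, 20, 2, "2005-01-03"),
]
means = movie_means(sample)
print(predict(means, 10), predict(means, 99))
```

Anything that wins the prize would have to beat a baseline like this by a wide margin, which is where the missing metadata starts to hurt.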

I think if Netflix really wanted to solve this problem they would expose all their data—movie info, customer info, ratings, lists, etc.—as an API and let the web apps flourish.

I have no problem with things being thrown at me.

I actually don't think that Netflix wants anyone (else) to "solve" this problem. I'm fairly sure that they don't think that there is a solution. They don't plan on ever paying up... and a $50,000 "Progress Prize" annual investment in disingenuous insurance is pretty cheap. Much like the Cuban missive crisis I think this is a way to conduct a self-selected focus-group of very smart/driven people. Not only does Netflix benefit from all of this... they get the intellectual property rights.

Reading the description of the data set just convinces me of this. They obviously don't want to reveal any of their proprietary techniques. They seem to want people to spend their time tinkering and tuning some kind of Bayesian algorithm to make Cinematch™ purr like a kitten. 10% is far too high a bar to reach based upon the data supplied by Netflix. So it all seems very much like a contest to tune/tie your combination sitar/bow tie... and hand out packs of gum to the "winners." I suspected as much.

Okay... I didn't expect to get only user id, title, rating and date. I thought that there would be more (much more) metadata.

I actually think that "genres" serve to confuse things more than they clarify. I know that Netflix has clusters (and almonds and nougat) but that seems a very Yahoo-index kind of solution. I think that this is what Google is slaving away at: transforming a search into a recommendation. Search is all about a database, its contents and how they relate. Google has done a great job making search (and tons of other things/apps) easy to use. Recommend is all about an individual user and the synergy she has with the database, etc. Soon all of your cookies will be saved/stored on-line.

(I think that) To improve how a database works you can improve the data set to be collected. You can make the data itself more accurate. You can invent a better processing method for the data you want to end up working with. Sift the info more intelligently. Create the best algorithm you can think of. And then, if you still have the energy, create a feedback-analysis process. But it all really has to be done in (sorta) that order.

Remember the old saying: "Garbage in, cribbage out."

I think I've thought of ways to improve/rework all of those steps. For instance: I think I could add a decimal point (maybe even two!) to the accuracy of an individual's ratings. Right now it is integers from 1 to 5. Think what fun we could have if we knew that your *Four Star* rating for The Mask was actually a 3.88? But I had planned to shake that info out of the metadata.

I also believe I have some strategies to improve the relational structure of their library database. I think there's a way to make The Mask and Lemony Snicket's A Series of Unfortunate Events much more closely related to Man on the Moon and Dumb & Dumber than they are to Son of the Mask... in spite of the fact that they belong in either the Children's or Sequels categories.

Genres... they only serve to obscure and screw things up. The day that Netflix (or any other capitalist running dog) lets a thousand flowers/web apps bloom is the day I turn in my little red book/envelope.

Whoa, maybe Netflix is looking at money well spent. 5.5% improvement isn't too shabby!

The day that Netflix (or any other capitalist running dog) lets a thousand flowers/web apps bloom is the day I turn in my little red book/envelope.

You don't think we're well on our way already? Amazon's API is already quite rich, and the flowers are a-bloomin'. Heck, I even grew a little dandelion of my very own...

And a very nice app it is.

I'd like to know how much entrée Amazon provides. I may not understand the landscape and/or you but I was referring to Netflix exposing "all their data-movie info, customer info, ratings, lists, etc.-as an API" and more. I was thinking about a company providing all of their raw data... and possibly their proprietary methods. In my metaphorical mind this means access to the kitchen and all of its ingredients... and possibly the recipes.

I doubt that Netflix is going to pop the top off of Cinematch™ and let people make their own dandelion green salads which they could then pass around the table. Even if they provided just the raw data (all of it) you'd have the ingredients to reverse engineer a recommendation cookbook. If I were Netflix there is no way that I would allow people to input movie ratings and get a personal recommendation of what dish to order next... unless, that is, all of the ordering is done through Netflix.

You know better than I if Amazon is being open source about its ordering/pricing practices. It strikes me that if it was this would allow/encourage people to order directly from used booksellers. Maybe this is the case and it is only the lazy/uninformed who order off of Amazon's menu, in which case they would be getting their...

...just desserts.

But mostly it was a chance to make a Chairman Mao joke.

Amazon isn't completely open source. Much of how they crunch their own data, how their warehouses operate, etc. is under wraps. But they have a variety of web services, and they expose a TON of their data (that's just a quick overview of the bigger picture). They are also doing very interesting work in sharing their vast computing resources with developers via S3 and EC2. Check out the SmugMug success story (also, JungleDisk looks very cool as a consumer desktop app using S3).

You can even piggyback on their fulfillment skills these days. You could start the next Amazon pretty much on Amazon's back, if you had the right idea.

As for the used bookstores, I don't think Amazon puts you directly in touch with them, but they are more than happy to have you order through them. As a consumer, this arrangement makes me pretty happy. I'm much happier to have my credit card in one company's hands (Amazon's) than many (little Mom & Pop used bookstores all over the place).

Well... now I'm confused. Perhaps that's always been the case.

I think I understand the silver bells, but the cockle-shells are very confusing and I've never seen the sense in putting pretty maids all in a row

And I still can't escape the feeling that Amazon is one giant pyramid scheme.

Oh, and I like this particular flower.

You do? I'd go with Fimoculous' suggestion: Add porn titles to the database. Er, I'd go with Fimoculous' suggestion if that would win me a million dollars and not contribute to the destruction of our society.

Speaking of Spike Lee, New York Magazine has a new 6-page spread on the Spikester.

I noticed that, thanks!

You haven't seen 25th Hour? IMHO it's Lee's best.

I only think 25th Hour is his second best (you just can't beat Do The Right Thing in my book) but 25th Hour is still pretty fawesome.

if you are looking for a good spike lee movie i think you should consider seeing malcolm x, by my opinion its quite good and except for the obvious ending not very predictable as far as concern the directing and presenting the story by lee. i like some other his works too but i think that its just me :))) (like clockers, do the right thing and mo'better blues)

but you should know that inside man was quite a different movie from what we were used to from lee.
so don't think to find that in any other of his films

well hope to be helpful and enjoy your viewings :D

"inside man was quite a different movie from what we were used to from lee." Exactly. It is the least challenging of his films. Perhaps I should say "most audience-friendly." That, combined with the small sample of (assumedly) early adopter Spike Lee fans, might explain the high rating.

Besides, how many "Urban" Netflix offerings get high ratings? (Use your decoder ring.) She's Gotta Have It would definitely look out of place in the Urban Comedies category. Just another case of the Man trying to keep us down.

Do The Right Thing is one of the greatest, if not the greatest, American movies ever made. As such I'm surprised that it gets such a high rating. I think that Spike Lee never tries to make a "crowd pleaser" of a movie. Quite the opposite, I think he wants to make an audience "Wake Uuuup!"

I can't wait to see When the Levees Broke. But I'll have to...

I was already wearing my decoder ring when I posed the question. The "people who rate like me" pickle makes it even more interesting.

Oh yes, oh yes, I too am looking forward to Levees!

Thanks! So far I've seen:

Do the Right Thing
Get on the Bus
Inside Man
Jungle Fever
Malcolm X
Mo' Better Blues
The Original Kings of Comedy
She's Gotta Have It

... and I thought they ranged from good to excellent, which is why I was surprised by the low Netflix ratings.

I've seen most of his classics, but more recently I've missed his last five movies or so before finally catching Inside Man. There may be good reason for that though, as those are generally his lowest-rated movies.

Have I missed any good ones?

Of what you've not watched and I have seen I'd say School Daze (1988), Summer of Sam (1999) and 25th Hour (2002) are worth watching. Keep in mind School Daze is a musical!

You've seen way more Spike Lee than I have. I hope you don't mind, I posted your observations on my netflixfan blog, along with a screen cap of my ratings.

Don't mind at all, thanks for the link!

I'm amazed at the amount of output Spike's had in the last 29 years. IMDB has him listed as directing 41 titles. I realize not all of them are feature films but you have to admit he's not squandering his talent by a long shot. I also really like the fact that Spike has always elevated and promoted other African-American talent in front of and behind the camera.

When I go to Spike Lee's page on Netflix I see the following...

25th Hour 3 3/4 Stars
4 Little Girls 4 1/4 Stars
Bamboozled 3 1/4 Stars
Clockers 3 1/2 Stars
Crooklyn 3 7/8 Stars
Do the Right Thing 3 1/2 Stars
Get on the Bus 3 3/4 Stars
Girl 6 2 3/4 Stars
He Got Game 3 Stars
Inside Man 3 7/8 Stars
Jungle Fever 3 1/8 Stars
Malcolm X 3 7/8 Stars
Mo' Better Blues 3 1/4 Stars
School Daze 3 1/8 Stars
She Hate Me 2 7/8 Stars
Summer of Sam 3 1/8 Stars

Do those look like what you are seeing?

No, neat... I put mine next to yours in parens. Where my ratings are better I made them bold. The rest are worse.

25th Hour 3 3/4 Stars (3 1/2)
4 Little Girls 4 1/4 Stars (3 7/8)
Bamboozled 3 1/4 Stars (2 1/2)
Clockers 3 1/2 Stars (3 1/8)
Crooklyn 3 7/8 Stars (3)
Do the Right Thing 3 1/2 Stars (3 3/4)
Get on the Bus 3 3/4 Stars (2 1/2)
Girl 6 2 3/4 Stars (2 7/8)
He Got Game 3 Stars (3 1/4)
Inside Man 3 7/8 Stars (4)
Jungle Fever 3 1/8 Stars (2 7/8)
Malcolm X 3 7/8 Stars (3 1/2)
Mo' Better Blues 3 1/4 Stars (2 7/8)
School Daze 3 1/8 Stars (2 1/8)
She Hate Me 2 7/8 Stars (2 3/4)
Summer of Sam 3 1/8 Stars (2 7/8)
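For anyone who would rather check the comparison by machine than by eyeball, here is a small Python sketch that parses the fractional star values and lists the titles where the parenthesized rating is the higher one. The numbers are copied from the list above. The helper name `parse_stars` is just something I made up.

```python
from fractions import Fraction

def parse_stars(text):
    """Turn '3 7/8', '2 3/4', or '4' into an exact Fraction."""
    return sum((Fraction(part) for part in text.split()), Fraction(0))

# (title, first rating, parenthesized rating), copied from the list above.
RATINGS = [
    ("25th Hour", "3 3/4", "3 1/2"),
    ("4 Little Girls", "4 1/4", "3 7/8"),
    ("Bamboozled", "3 1/4", "2 1/2"),
    ("Clockers", "3 1/2", "3 1/8"),
    ("Crooklyn", "3 7/8", "3"),
    ("Do the Right Thing", "3 1/2", "3 3/4"),
    ("Get on the Bus", "3 3/4", "2 1/2"),
    ("Girl 6", "2 3/4", "2 7/8"),
    ("He Got Game", "3", "3 1/4"),
    ("Inside Man", "3 7/8", "4"),
    ("Jungle Fever", "3 1/8", "2 7/8"),
    ("Malcolm X", "3 7/8", "3 1/2"),
    ("Mo' Better Blues", "3 1/4", "2 7/8"),
    ("School Daze", "3 1/8", "2 1/8"),
    ("She Hate Me", "2 7/8", "2 3/4"),
    ("Summer of Sam", "3 1/8", "2 7/8"),
]

# Titles where the parenthesized rating beats the first one.
higher = [title for title, first, second in RATINGS
          if parse_stars(second) > parse_stars(first)]
print(higher)
# ['Do the Right Thing', 'Girl 6', 'He Got Game', 'Inside Man']
```

So four of the sixteen come out higher in parentheses and the other twelve come out lower.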

Netflix displays the red stars you see according to how you rate. "People who rate like you" are giving him those ratings. When I look at his page, I see only two 2-star movies, the rest are 3, 3.5 or 4. I'm going to blog about this and see if anyone else sees it any differently.
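Netflix hasn't published how it actually computes those stars, so take this only as a toy Python sketch of the general "people who rate like you" idea: weight every other viewer's rating by how closely their past ratings agree with yours. The similarity formula, the function names, and the sample viewers are all my own assumptions, not Netflix's method.

```python
def similarity(mine, theirs):
    """Agreement between two viewers: 1.0 for identical ratings on shared
    movies, shrinking toward 0 as the average gap grows. 0.0 if no overlap."""
    shared = mine.keys() & theirs.keys()
    if not shared:
        return 0.0
    gap = sum(abs(mine[m] - theirs[m]) for m in shared) / len(shared)
    return 1.0 / (1.0 + gap)

def personalized_stars(me, others, movie):
    """Similarity-weighted average of other viewers' ratings for one movie.
    Returns None when nobody comparable has rated it."""
    weighted = 0.0
    total_weight = 0.0
    for other in others:
        if movie not in other:
            continue
        w = similarity(me, other)
        weighted += w * other[movie]
        total_weight += w
    return weighted / total_weight if total_weight else None

# Invented viewers for illustration: two people with opposite tastes.
others = [
    {"A": 5, "B": 1, "C": 4},
    {"A": 1, "B": 5, "C": 2},
]
viewer_one = {"A": 5, "B": 1}
viewer_two = {"A": 1, "B": 5}

print(personalized_stars(viewer_one, others, "C"))
print(personalized_stars(viewer_two, others, "C"))
```

In this sketch the two viewers see different stars for the same movie "C", even though the plain average of its ratings is 3. That is the same effect as two of us seeing different numbers on the same Spike Lee page.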

So, if I disagree with the ratings I see, does this mean that "people who rate like" me are idiots?

I suspected as much.

Yup, 'fraid so.

Wow, I had no idea, thanks for the insight!