I'm Shameless, Selfish and Stubborn. That about covers 90% of my personality. The other 10% is at war with the current social revolution that is slowly killing our individualism.
As I've only done self-study of such matters, and only worked on private projects, I've never gotten any feedback on it. Thus I've kept bad habits from tutorials on the web, but I'm open-minded, and have no hard feelings about changing my coding style. Since I'm aiming for open source, cross-platform projects using g++, I should opt in to anything that is considered better coding practice.
Once again, thank you for filling me in on it, and I'll make sure to optimize my coding for the better ;)
Out of pure curiosity though, do you work on any open source projects?
Like I said, the programmer helping me asks me lots of things, and I ask him things back. I also avoid being vague with what I want.
The problem with "strongly disagree, disagree, neutral, agree, strongly agree" is that it doesn't allow for vast differences in opinion. If you disagree with something mildly you vote that you disagree, and then when you disagree ten times as much as that you vote that you strongly disagree, but you're just stuck when you disagree another ten times as much as that. This may not be a problem requiring much normalization in such surveys (which are likely more meant to suggest how to avoid backlash from extremists), but very many people want ratings for art to be useful to them, and as it is now the minority that votes much lower than average has disproportionate power to influence the ratings. There's also the problem of people who only vote extremely having a disproportionate effect on the ratings. I favor normalization (without forcing a certain proportion of all votes to be negative or positive) because it lets people express their polarities equally without limiting their intensities. It's true that some people are very intense about almost everything, but they don't deserve disproportionate weight from my perspective.
It's not that things suddenly become better (something that is impossible, as things do not have qualitative values on their own, but are ascribed qualitative values by humans). The confidence in an atypical rating being representative becomes greater as it gains more votes. This applies to both typically liked and typically disliked things.
The proof is in the law of large numbers. You're simply disagreeing with statistics having any worth if you disagree with this. There's no cut off point at which a sample becomes accurate short of a 100% census. It's simply that moving from a sample of 1 to a sample of 2 makes a big difference, moving from 2 to 3 a lesser big difference, and so on. It's subjective where you cut off lesser bigness from the beginning of smallness, but that's not important, since things can be defined in terms of the standard error, which is huge for tiny samples, but only a few percent for larger ones. Of course this only applies strictly to random samples, which is why Bayesian estimates are used for self-selected ones.
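To put rough numbers on the point about shrinking returns, here is a minimal sketch of the standard error of the mean for a random sample. The spread of 2.0 rating points is purely an assumed value for illustration, and the function name is hypothetical.

```python
import math


def standard_error(sample_std: float, n: int) -> float:
    """Standard error of the mean for a random sample of size n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sample_std / math.sqrt(n)


# Assumed spread of votes on a 1-10 scale (illustrative only).
ASSUMED_STD = 2.0

for n in (1, 2, 3, 10, 100, 10_000):
    print(f"n={n:>6}: standard error = {standard_error(ASSUMED_STD, n):.3f}")
```

Each extra vote helps less than the one before it, and none of this corrects for self-selection, which is where the Bayesian estimate comes in.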
My view of users is also harsh, which is why I don't expect them to do what I want, but what they want. It's my problem if I can't help them, since I'm the one trying to do so. They're only trying to be honest, excepting some bad apples that are my responsibility to deal with (as opposed to treating everyone like scum in the first place, which is counterproductive from a user usefulness standpoint). Anyway, the system pissing people off actually is a system integrity problem, as people will simply do whatever it takes to limit their distress, which will not be what is intended by the system designer.
I believe in proof, too, but I look at the entire picture. If a system favors a type of accuracy that is commonly rejected to the point of rebellion then it's not as good as a less perfect but more free system that won't incite rebellions. Likewise with the suggestions I make to people. If they simply will not adopt them regardless of their ideality then there is no sense in suggesting them. Baby steps...
Like I said, you can't achieve objectivity in any real sense anyway. Everything will only be a probability unless the samples are 100% censuses, so it's best to not muddle things by not allowing people to give you some important information just because it creates inequality. Equality won't ever be perfect anyway, but it's possible to work well with the users rather than requiring them to work well with you in this case, statistically speaking.
I know a programmer who tells me that he's coding at times, so I can't know if programming is really the extremely dominantly used word. After all, I only have a sample of two! Another programmer I know is slowly helping me with my site, and he's very careful and thoughtful, so it should be okay.
I think you're misunderstanding the context in which I made that suggestion. Your system simply has no chance in hell of being implemented. I have had very similar thoughts myself (albeit with less restrictive and more constructive solutions), but there's no sense in suggesting them here, or almost anywhere. They simply will be too computationally and programmatically intensive for anyone to want to implement them. Even IMDB doesn't care about the fact that it is being sexist by using a weighting system, and it has millions of users, the vast majority of whom are disenfranchised, as laughable as the benefit of that may be, given that the remaining users cheat/troll/hate even more than is usual. Of course IMDB wants to have its top 250 list be full of "quality" things, but that has only partly worked, not to mention that quality is a fantasy in the first place.
You're also ignoring that the law of large numbers commonly evens out randomness (your examples using three voters of course would not be very accurate at all), and that a sensible Bayesian estimate can help to account for the biases of self-selected samples. It's also the case that it's hard for things to achieve high ratings without being "well done" in some sense, and it's harder for things to be favorites or otherwise highly valued if they aren't "well done". It's still better to use MAL's ratings to sift out most of the trash (even at some cost) than to throw your hands up in the air and not care. ;p
Anyway, personalized correlations are better, but MAL isn't going to start generating them any time soon. Criticker does generate them, but it does so sloppily, and things have ended up being biased towards niche things (which of course attract more genre/style fans than haters relative to something popular) that haven't really appealed to me that much. It also can't understand my strong dislike for The Godfather, including well after I rated it. If you join Criticker I must recommend adjusting your Films In Common Minimum from the default of 10% up to 25% immediately. This forces recommenders to prove themselves to you seriously, as well as eliminating those who have seen many fewer movies than you from "wasting" one of your 1000 slots, when they simply won't have that many new recommendations to give you, which of course means that they drag down the number of average recommendations for a title, which of course decreases accuracy further. It's hard enough with Criticker only counting the first ten recommendations, and even doing so sloppily, without that number decreasing, too, sometimes below the minimum of three. Of course you can simply average or even weightedly average more recommendations yourself, but that's time intensive, and you really just want to mine some useful ore from the mountain to follow up with your own processing, since it's too time intensive to process the whole mountain both directly and in an extremely careful manner.
By the way, you omitted "2 10". ;p
"A simple efficient way to punish users is to create tension. Basically a situation where either way is good and also bad, while the middle is average. In the context of MAL, this should be something like:
* more input (you can think of it as rating it close to maximum) makes comparison with other people's lists more accurate and gives series on your list a better score, but affects series on MAL's ranking much less
* less input (i.e. akin to using middle numbers) makes comparison with other people harder and less effective, and also lowers the scores on your list. So as an analogy to the current system, your favorite show might show closer to a 7 than the 10 your fanboy self wants.
When faced with a problem like this people tend to be honest, because by being honest they gain the most of both ends."
I hope you're not falling into the "objective quality" fallacy here. The fact is that I already carefully segregate my votes and still was forced to use the entire scale. That doesn't mean that I'm just being a fangirl and a hater, but that those things seriously are that far from neutral from my perspective, and don't make sense to merge with ranks closer to neutral to me.
"Thumbs Up, Neutral and Thumbs Down" actually sucks pretty hard at filtering the "okay" from the "truly desirable". Either you vote everything neutral and thus have little power or ability to compare your ratings with those of others, or you vote little neutral and end up calling something amazing merely decent, while you also call something only mildly enjoyable the same, this also distorting your ability to compare your ratings with those of others. Even taking up the burden of artificially rating each vote as often as the others doesn't work, because some things are simply well outside of the "it was/wasn't worth it" category. I tried Pandora, so I know that this simply isn't enough information, even in conjunction with their "professional" aspect categorization system.
"Next you go through a series of common elements and choose a polarity for each one. Something like..." too many things with no incentive. People won't do it unless they understand how they benefit from doing so. It would be better to not require any elemental ratings, but still show which elements caused a recommendation, so that a user could know what went wrong and thus try to fix it. Most people would rather take the chance that the system would learn their tastes and take the cost of having to personally correct it (or not) sometimes than the permanent heavy burden of having to review everything in depth on an elemental level. Your three vote system would only frustrate them more, if only subconsciously, because it wouldn't be able to understand how much importance those ratings had in any given cases very well at all.
"You need to choose all of them to actually get to 0 points. Incidentally you also need to polarize all of them to get a perfect 10 on your list for the show."
Yeah, I thought so. That's just bad, no offense. A perfect rating wouldn't be implied by merely "liking" all categories. Never mind that they have differing importances and some may even be impossible to like to some people, but there's also the issue of who gets to define which categories officially exist, whether for each individual item or in general. Just because I liked fewer things about something doesn't mean that I liked it less overall. Some things are just more important than others. Evaluations are not subjectively determined by merely checking off a group of boxes and then combining them all together equally. If you forced this people would simply become frustrated at the very wrong order of the things in their lists, and (assuming they even wanted to stay after that, or even before seeing the results of their lack of subjective control on their own subjective rankings) would simply game the results to force the correct order, which of course would make your system extremely commonly cheated, except by some fans of "objective quality".
Your graph really doesn't explain things well at all, in my opinion. There's no way to tell how strongly any minority felt about any aspect. Also, people can't easily weightedly combine these aspects in a way that actually helps them (since you don't want anything to have either vertical or horizontal intensity relative to anything else, but only pass/fail quantities), so it would just produce "WTF?" results for most people.
Beyond that, there's no such thing as a true strong or weak point, only personal and group opinions. Just because someone is in the minority on most or even all aspects doesn't justify anti-recommending the title to that person. Instead it would be better to treat every single series as a potential statistical fluke, but try to minimize the standard error across things of and not of a voter's tastes. This means bringing back intensity and weighting, but you again can't expect voters to know their own personalities extremely well, but they can likely express themselves well enough across a large enough sample.
It's not enough to say that you like comedy, for example. Comedy, like all other qualitative elements, is subjective, and it's entirely possible to love comedy, but simply have not happened to come across many things with the type of comedy that you enjoy. If you were recommended comedies because you rated your few favorite comedies high in their comedy aspects then you should simply be told why it gave you a higher proportion of bad comedies as recommendations. Then you could fix the problem by giving the system more data, rather than being expected to use only three votes to constantly rate many subjective aspects without freedom in your aspect choices, something that people just won't want to do.
The scale really doesn't matter much if you normalize intelligently. There's no need to simplify it to nearly its most basic extreme. With normalization one person could vote 7/[0-10], another could vote 5/[1-5], and another could vote e/[√2-pi]. The only trick is to not allow absurd love or hate spamming or concentration easily, while still allowing people to like or dislike lots of things without some or any of them being counted as having the wrong polarity, or without their votes counting unfairly in general. I have solutions, but I'm keeping them to myself, since I have a dream of creating a recommendation/rating website. I can't program such things, though.
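As a rough illustration of why the scale itself matters little, the three example votes can be mapped onto a common range with a plain linear rescale. This is only the simplest possible stand-in and an assumption on my part: it is not the undisclosed method mentioned above, and it does nothing about love or hate spamming. The function name is hypothetical.

```python
import math


def rescale(vote: float, low: float, high: float) -> float:
    """Linearly map a vote from its own scale [low, high] onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= vote <= high:
        raise ValueError("vote lies outside its scale")
    return (vote - low) / (high - low)


# The three example votes from the paragraph above: (vote, scale low, scale high).
votes = [
    (7.0, 0.0, 10.0),
    (5.0, 1.0, 5.0),
    (math.e, math.sqrt(2), math.pi),
]

for vote, low, high in votes:
    print(f"{vote:.3f} on [{low:.3f}, {high:.3f}] -> {rescale(vote, low, high):.3f}")
```

Once every vote sits on the same range, the harder question is how to compare one voter's habits with another's, which a linear rescale alone does not address.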
Anyway, who's to say what's a masterpiece? Just let people say how they felt directly (without forcing it) and you'll have better information for them. Should R2 have a shit rating because a lot of "objective quality" fans hate on many of its key aspects? Obviously not, unless you don't really want to be very helpful. And is Naruto just more or less "average"? It's obvious that it would be silly not to recommend Naruto before a great percentage of other things to a random voter, but you need to use Bayesian estimates or similar to express that.
"good is something that has a lot more positive aspects and very few negative aspects."
Nope. Good is more often something with at least one very intensely good aspect. Just because everything else was forgettable or mediocre doesn't necessarily weigh it down highly overall. The idea of good being merely "lacking offensive flaws" leads to things that don't appeal strongly to most people being placed first. That makes no sense, as I'd easily trade 10 90% sure minimal hits for 1 10% chance of something amazing to me, and I know that others more typically than not tend to feel the same. You can always drop crappy things, but once you've never heard of something due to it being hidden you're just screwed out of ever enjoying it.
"masterpiece is..." not real. Sorry, but there's no such objective thing, and your system would only produce results that favored a lack of negatives rather than favoring the existence of any strong, truly appealing positives, since you disallow both intensity and importance within and for aspects. Also, I can't see how it would become popular enough to have tons of useful data for most people in general. That's not a true flaw per se, but it limits the feasibility of its existence and continuation to cater only to a niche group that will do all the things you ask for while not minding the severe lack of control (without just cheating to fix the problem).
I hope you don't mind my reply too much, but I'm rather sure that it's possible to allow more freedom than your suggestion appears to without having to force people to pick between the frying pan (absurd personal rankings) and the fire (even less useful recommendations than could be ideally calculated from a large enough opt-in sample).
I want to tell you that I appreciate your thorough, rational, and fair posting style, and agree with your conclusions. MAL in fact accepts that low votes have easily abusably disproportionate weight by not normalizing votes, along with accepting cheating in general by disenfranchising users extremely commonly and not using a nontrivial Bayesian vote count (~10k ~7s would currently solve the "drop boosted and niche nonsense top rank pollution problem") when calculating its ratings. If you're interested you can find my poll supporting the end of noncheater disenfranchisement (which was naturally attacked by many, including dishonestly in the poll...) on MAL on my profile or in my signature. I wouldn't mind being friends, either. ;p
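For anyone wondering what "~10k ~7s" would do in practice, here is a minimal sketch that reads it as a prior of 10,000 imaginary votes at 7 blended into each title's average. The weighted-average form is a common way to write a Bayesian estimate, not something confirmed to be MAL's actual formula, and the function name and the two sample titles are made up.

```python
def bayesian_rating(mean: float, votes: int,
                    prior_mean: float = 7.0, prior_votes: int = 10_000) -> float:
    """Blend a title's raw mean with a prior of `prior_votes` votes at `prior_mean`."""
    if votes < 0 or prior_votes < 0 or votes + prior_votes == 0:
        raise ValueError("vote counts must be non-negative and not both zero")
    return (votes * mean + prior_votes * prior_mean) / (votes + prior_votes)


# Hypothetical titles: (label, raw mean, number of votes).
titles = [
    ("niche title, few votes", 9.5, 200),
    ("popular title, many votes", 8.8, 300_000),
]

for label, mean, votes in titles:
    print(f"{label}: raw {mean:.2f} -> adjusted {bayesian_rating(mean, votes):.2f}")
```

With a prior that heavy, a title with a few hundred votes stays close to 7 no matter how high its raw mean is, while a title with hundreds of thousands of votes barely moves.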
oh wait...
Merry Christmas! My present to everyone on MAL. My hard work went into making this gif for everyone to enjoy :3
Going by your username... Are you also excited for that new game, Vindictus!? :D
Thanks.
so do you have any favorite anime?
i started in 2005 but it was in 2008 that i saw 137 out of the 264 anime shows that are on my completed list.
so what kind of anime do you like?
so how long have you been watching anime?