bugs

Forum > Bugs > Investigate correctness issues with tournament

Ender [1] Administrator 2 2015-02-26 01:38:44 🔗 [10 years, 159 days ago]	As discussed on the tournament announcement thread, there's some skepticism about the results, Category 1 in particular. Here's the plan for dealing with this:
1. Get per-bot results posted - This will show how each bot did against each other bot in its category. In particular, it will show the percent HP remaining of the winner, which can be used as a rough estimate of how close the battle was.
2. Sift through per-bot results for anomalies - Look for examples of weak bots defeating many stronger bots. One-off variance is to be expected, so the focus of this is statistically improbable events, such as a level 2xx bot that's expected to lose 100 of 100 fights against a series of level 3xx bots, but then defeats them all with 100% HP remaining.
3. Replay battles for any suspect matchups in a loop to determine actual probability of winning - This will help separate "lucky" (e.g. win 1 in 4) from "uncommon" (e.g. win 1 in 10) from "rare" (e.g. win 1 in 40) from "super rare" (e.g. win 1 in 1000). Note that with 5,206 battles having taken place in the tournament, it's probably not unexpected to see some 1 in 1,000 upsets.
Note that there's a high degree of variance with bots4 tournaments compared to bots2 tournaments because each pair of bots only fights once (I think each pair fought 5 times in bots2). This variance brings in more of a luck factor and a greater chance to see upsets. Whether this needs to or should be tuned is a separate discussion, but I'm pointing this out because it's a possible explanation for a "favorite" not winning.
Anyway, I should have time to work on step 1 tomorrow night. The data is there, I just need to build a view for it, which shouldn't be too much work. Sit tight until then.
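The "replay in a loop" step from the plan above could look roughly like the Python sketch below. It is only an illustration: `fight` is a hypothetical stand-in for whatever replays one battle in the real engine, the function names are invented here, and the cutoffs in `classify` are assumptions that simply reuse the 1 in 4 / 1 in 10 / 1 in 40 examples from the post. The Wilson interval is added to show how much uncertainty is left after a given number of replays.

```python
import math
import random


def estimate_win_rate(fight, replays, seed=0):
    """Replay one matchup `replays` times.

    `fight(rng)` is a hypothetical callable that returns True when the
    underdog wins one replay. Returns (wins, win rate, 95% Wilson interval).
    """
    rng = random.Random(seed)
    wins = sum(1 for _ in range(replays) if fight(rng))
    rate = wins / replays

    # Wilson score interval for a binomial proportion (z = 1.96 for ~95%).
    z = 1.96
    denom = 1 + z * z / replays
    centre = (rate + z * z / (2 * replays)) / denom
    half = z * math.sqrt(rate * (1 - rate) / replays
                         + z * z / (4 * replays ** 2)) / denom
    return wins, rate, (max(0.0, centre - half), min(1.0, centre + half))


def classify(rate):
    """Bucket a win rate; the cutoffs are assumed, not taken from the game."""
    if rate >= 1 / 4:
        return "lucky"
    if rate >= 1 / 10:
        return "uncommon"
    if rate >= 1 / 40:
        return "rare"
    return "super rare"
```

For example, `estimate_win_rate(lambda rng: rng.random() < 0.1, 10000)` replays a made-up matchup where the underdog wins about 1 time in 10. A wide interval is a sign that more replays are needed before calling a result "rare" rather than "uncommon".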
Jans [90] 5 2015-02-26 01:46:57 🔗 [10 years, 159 days ago]	viewing per-bot results = fun \o/
ActiveX [270] Head Moderator 18 2015-02-26 03:13:32 🔗 [10 years, 159 days ago]	nice plan of action wiggin ^_^ Excited about seeing the per bot results. Will make for some interesting reading! I expect to see AX take his rightful top spot in cat 1 after this investigation ^_~
Benny [44] 16 2015-02-26 03:42:21 🔗 [10 years, 159 days ago]	Nonsense Hun. You know you have to have the word "user" in your botname to win cat 1 😉. Anyway, I hope this will get sorted.
PeachCobbler [255] <Mount Wario> 20 2015-02-26 03:50:25 🔗 [10 years, 159 days ago]	I expect to see AX take his rightful top spot in cat 1 after this investigation ^_~ https://www.youtube.com/watch?v=FopyRHHlt3M
Gpof2 [130] <Lusitania> 127 2015-02-26 09:06:56 🔗 [10 years, 159 days ago]	Replay battles for any suspect matchups in a loop to determine actual probability of winning - This will help separate "lucky" (e.g. win 1 in 4) from "uncommon" (e.g. win 1 in 10) from "rare" (e.g. win 1 in 40) from "super rare" (e.g. win 1 in 1000). Note that with 5,206 battles having taken place in the tournament, it's probably not unexpected to see some 1 in 1,000 upsets. Just a random tidbit I felt like saying; The number of those fights can be deceiving when only looking at the big picture. The number of potential upset battles is significantly lower since many of those fights will be bots that are actually on-par.
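To put rough numbers on the point above, here is a small Python sketch of the arithmetic. It assumes every lopsided battle is independent and has the same upset chance, which is a simplification, and the function name is made up for illustration. The 5,206 figure comes from the thread; any smaller count of truly lopsided battles is a guess you would have to supply yourself.

```python
def upset_odds(lopsided_battles, upset_chance):
    """Return (expected number of upsets, chance of at least one upset).

    Assumes independent battles that all share the same upset chance.
    """
    expected = lopsided_battles * upset_chance
    at_least_one = 1 - (1 - upset_chance) ** lopsided_battles
    return expected, at_least_one


# All 5,206 tournament battles treated as 1 in 1,000 matchups (upper bound).
all_battles = upset_odds(5206, 1 / 1000)

# Hypothetical: only 500 of the battles were really that lopsided.
fewer_battles = upset_odds(500, 1 / 1000)
```

The fewer battles that really are lopsided, the less a string of 1 in 1,000 upsets can be put down to the sheer number of fights.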
Myriad [349] <Lusitania> 350 2015-02-27 06:08:13 🔗 [10 years, 158 days ago]	Although there were many cat winners who ended up having the best, or at least one of the best builds in their respective cat, and you could attribute the other winners to being just lucky, cat 1 allows for much less variability given the inflated life pools. The fact that it had the most illogical results (eg. more level 200-250 bots in the top 10 than level 300+ bots) points to some kind of error in the code. However, in scrolling through the forums I haven't seen much discussion as to why these results were awry - what went wrong, essentially. I've been thinking about it, and the only conclusion that I can come to is that gear damage played a much bigger role in this tournament than it does in actual fights. For example, if you look at the top 3-4 bots in cat 1, they all have excess str/dex for their gear. I'm talking 600+ dex, 500+ str etc. at the expense of con. This obviously means they have a much lower chance of their gear taking damage relative to other bots. My theory is that there was a bug in the code, which somehow made gear damage more prominent. For example, the gear damage formula was off (taking gear damage more frequently), the block formula was off (more blocks = more weapon damage), the condition formula was off (eg. reducing condition more per instance of gear damage), or any combination of the above. The end result? That the more heavily freaked bots were, ironically, at more of a disadvantage. My reasoning is that there were many good bots above 300 who must have lost 50%+ of fights to lower opponents. Personally I only scored 7 wins, so I must have lost at least 10/17 fights against the bots below 300, many of which are below 250. The only way I can see this happening is if my weapon broke during the fights, and my opponents subsequently had ample time to chip away at my defenseless bot. 
Obviously this doesn't explain everything, but it was the best explanation I could come up with. Anyone else have any theories?
Benny [44] 16 2015-02-27 07:39:38 🔗 [10 years, 158 days ago]	I guess if the gear wasn't repaired after each fight it would eventually cause results like we see. Add then the other factors like Pat mentioned, and things would get freaky (pun intended).
TheSteelRat [87] 4 2015-02-27 07:56:53 🔗 [10 years, 158 days ago]	Well, a gear condition bug would apply the same way to lower level bots too. So higher level bots would still have a bigger chance to win, just because of their higher level and more stat points.
Leader2 [85] 21 2015-02-27 09:51:46 🔗 [10 years, 158 days ago]	I also had a think about what it could be and, like you just said Myriad, I also thought about the condition and durability. But reverting back, I thought this wouldn't just affect the bots in cat 1. However, like you said, if the bot's weapon has broken in battle this leaves you with no chance against the other bots, therefore giving the other bot an easy way towards victory. But I will say everyone was warned that whatever armour they had on was the armour they would be wearing. This can only make people think a little deeper when it comes to winning tournaments.
ReneDescartes [220] 103 2015-02-27 10:44:38 🔗 [10 years, 158 days ago]	I've already sent a bmail to Ender and made a post in the Escapism clan forum detailing some of the issues with the tournament, but I thought I'd put snippets of both here. There are some major inconsistencies in the tournament results. It isn't just cat 1, it's a systemic problem, but the flaw hasn't affected every cat equally and that seems to have added legitimacy to some of the results. For example, several of the bots in cat 1 finished in their expected position, but many others did not. This fact has made it hard for me to see a pattern in the results, but there do seem to be trends. Below I'll detail my analysis of each cat. My point of view is unique in that I entered without question the strongest possible build into every cat from 2-16. I don't just know the top builds, I know the second, third and fourth strongest builds for different bot types. I know the win % between these builds; I know the win % of dumpers against these builds. You could argue successfully that my 53-47 win % against the second strongest build means nothing in a 1 fight tourney, but you'd have a harder time arguing my bots shouldn't finish near/at the top in every cat. Cat 1: Obvious issues. User Name winning is a bit unexpected. Lyrad/neps/TheCause being among the top bots is expected. A level 209 bot with >150 int placing just after them is statistically near impossible. For several other of these vastly inferior bots (We're talking 100+ levels difference here) to finish above bots like Myriad and Eucliwood is invalid. Cat 2: Appears to be correct. Draoi winning 8/14 is a bit unlucky and warrants a closer look. Cat 3: Some inconsistencies here. My bot Singularis winning 11/19 doesn't seem right, particularly when Detroit Dream, Xploded and Xplode are very much inferior builds and each won more than me. 
Interesting to note that Nos' 155-160 bots all finished with half the wins of Detroit Dream (A similar but slightly inferior build). Inconsistent. Cat 4: Somewhat questionable. Beer winning is possible but lucky. Samuli/Malachi winning 8/24 fights combined is very inconsistent. Cat 5: A cat that can be quite inconsistent owing to the near identical build everybody uses, but again some strange results. If I told you I went 23-4 against the second placed bot your reasoning would be sound if you thought I finished 1st (These results can be confirmed by the owner). I didn't though; I won 9/19 compared to the 15/19 won by second place. Thoros won 11/19 with a sub-par build. With a better build Fishwick won half as much as Thoros, compared to Esvrainzas (Who has an identical build) finishing with twice as many wins as Fishwick. Perhaps Rivan can confirm, is Duriel even workshop freaked? I could fight it to find out but I'd prefer to ask. Cat 6: Again problems. Benny and I entered the same top build and combined we won 22/40 fights. Highly unlikely, worth a closer look. Cat 7: I want to talk in detail here, there are some major flaws. My bot wins ~54% of the time against the 3rd strongest build; the build entered by NaturalBornWinner (Nobody entered the 2nd best build). The winner of this cat blas entered a bot that is nowhere near any of those top builds. I don't want to speculate as I don't know specifics but we're talking a win % against my bot of something like 30-35% at best. To further illustrate this point earlier I went 42-8 against blas (Again this can be confirmed by the owner). Yes that's a small number of fights in the grand scheme of things, yes it's statistically insignificant and yes it's possible it could win in a 1 fight tourney. But blas ended up with 60/62 wins compared to my 31/62. You might be thinking the winner had a better win % against other types of bots in cat 7. You'd be wrong though. 
I strategically entered ten level 114 unfreaked, standard, sub-optimal dumpers as I know they win ~14% of the time against my build and ~18% of the time against NaturalBornWinner (The next strongest bot). Remember, the winner did not have this build and so the win % against him is greater. Somehow though, all ten of my 114 unfreaked bots finished well above my tourney bot. I'm not talking by a little bit, I'm talking two of those bots winning 47/62 compared to 31/62. Those vastly inferior level 114 bots also consistently outperformed Nos' level 119 bots which I believe are a slightly stronger variant of my build but with 5 levels of con. One possible explanation is that Nos' dumpers had condition damage from fighting, perhaps you could confirm that? Nos' half decent showroom freaked bots were also destroyed by the masses of dumpers. The results in this cat are invalid, we're talking major variations from the expected outcome that are nearly statistically impossible. Cat 8: An unlikely result but possible. I'd like to know how you dealt with ties Ender. Cat 9: Problems here. Benny's dumpers did far too well considering they are a sub-optimal, standard answerer build very much inferior to the normal build. Yet his tourney bot only won 15/33 fights with a build that wins well over 60% of the time against normal answerers (Again, the dumpers in this same cat weren't). There were a couple of other bots that finished well below those dumpers that shouldn't have and need to be investigated. Cat 10: Appears somewhat legit. One or two bots unlucky and worth investigating. Cat 11: Appears legitimate. Cat 12: Appears legitimate. Cat 13: Appears legitimate. One bot worth investigating. Cat 14: Appears legitimate. Cat 15: A lucky winner but appears legitimate. Cat 16: Appears legitimate. So, there are two somewhat consistent trends. The lower levels are largely unaffected by statistically unlikely variation.
Cats with dumpers (i.e. unfreaked builds) led to inconsistent results. Bots with max gear damage (25%) didn't appear to lose more often, but I looked at this thoroughly and it is very inconsistent. In summary, I'd need to look at the results in more detail to see if a pattern and error emerges. It might be a 1 fight tourney but that doesn't explain some of the results. Being able to see individual fight logs might help a little bit. Sorry about the wall of text, I'm not one for posting on the forum so when I do it I make sure I do it well. Take your time reading it and try to think about an individual cat you might have entered a bot, not necessarily every cat as I have.
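Claims like "nearly statistically impossible" in the post above can be sanity-checked with a binomial tail, sketched below in Python. The sketch assumes each fight is independent with one fixed win chance across all opponents, which the thread does not establish (the quoted ~14% and ~18% figures are against specific builds only), so any win chance plugged in is a guess rather than a measured value. The function name is invented for illustration.

```python
from math import comb


def prob_at_least(wins, fights, win_chance):
    """P(X >= wins) for X ~ Binomial(fights, win_chance).

    Assumes independent fights with a single fixed win chance.
    """
    return sum(
        comb(fights, k) * win_chance ** k * (1 - win_chance) ** (fights - k)
        for k in range(wins, fights + 1)
    )


# Hypothetical: how likely is 47 or more wins out of 62 for a bot that
# would win half of its fights on average?
tail = prob_at_least(47, 62, 0.5)
```

A very small tail for any plausible win chance means luck alone is an unlikely explanation. It does not say what the cause is.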
Nosferatu [277] <Valhalla> 53 2015-02-27 17:30:32 🔗 [10 years, 157 days ago]	I entered Duriel, it is 364 str Blasters. Not workshopped, only showroom. The 119s (I'm assuming you mean the Ant Colony bots) aren't mine, they belong to Benny. I agree Cat 2 and the lowest cats seem to be on par, but I also speculated more than Cat 1 was an issue. Very nice details Rene. Thanks for dropping some insight into this thread.
Adepto [250] <The Rivan Graveyard> 17 2015-02-27 23:14:18 🔗 [10 years, 157 days ago]	So after logging on this bot, and taking a gander in the showroom. I noticed all my armor, my weapon, and my shield had below 100% condition. I have not used it, so my only conclusion is that it happened during the tournament. If this is the case, and armors didn't return to 100% after each fight (in instances where bots didn't have kudos to pay for the repairs after each fight), then this would explain a lot of my bots doing poorly (and quite possibly explain a lot of issues throughout the tournament). This could be something to take a look at Ender.
Boondock Saints [120] 22 2015-02-28 00:21:24 🔗 [10 years, 157 days ago]	Detroit Dream was a bot built by Nos for me. I'm surprised that my other 120 did worse than this bot, as it has a different build but the same weaps.
Ender [1] Administrator 2 2015-02-28 00:41:55 🔗 [10 years, 157 days ago]	Well damn, I totally fucked up the tournament. Thanks very much to the several people that suggested looking into the equipment condition theory. That was indeed the problem. Tournament battles were correctly affected by condition during battles, but condition did not properly reset after battles. What this means is that you might start battle 1 with 100% condition on your primary weapon, battle 2 with 92% condition on it, battle 3 with 76% condition, and so on. Category 1 was probably the most affected by this because of the high HP of bots in that category, and the therefore longer fights, and the therefore increased chances of taking equipment condition hits. There may have also been a bias that worked in favor of bots with higher ids. Because of the order that battles run, I think bots with higher ids were more likely to be matched up against "tired" bots (read: bots that have had their equipment condition worn out more), therefore giving them some advantage. And freaked bots surely got screwed by this. Consider this as me all but officially announcing that tournament edition 1 is now null and void. I will almost surely be re-running it this weekend once the code is fixed and doing some additional testing. I'll post more concrete and official plans tomorrow after thinking about this some. Sorry about this everyone. I take pride in the quality of this game and this is probably the most glaring issue bots4 has ever had so far in its 4.5 year stretch. It was just subtle enough to be able to trick an untrained eye during testing (read: me), but was obvious enough to anyone fully investing their time and energy into this game (read: you). :(
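The bug described above can be illustrated with a toy round-robin in Python. Everything in it is an assumption made for illustration and not the bots4 code: the `Bot` fields, the win formula, the amount of wear per battle, the floor on condition, and the battle order (each bot fights every higher-id bot in turn). With `reset_after_battle=False` condition carries over between battles, as in the broken tournament. With `True` every bot starts each battle at 100%.

```python
import random
from dataclasses import dataclass

MIN_CONDITION = 0.1  # assumed floor so effective strength never reaches zero


@dataclass
class Bot:
    bot_id: int
    power: float            # hypothetical stand-in for level, stats and gear
    condition: float = 1.0  # 1.0 means 100% equipment condition


def make_bots(count, power=100.0):
    """Create `count` equally strong bots with ids 1..count."""
    return [Bot(bot_id=i, power=power) for i in range(1, count + 1)]


def fight(a, b, rng):
    """Toy battle: win chance is proportional to power * condition."""
    strength_a = a.power * a.condition
    strength_b = b.power * b.condition
    winner = a if rng.random() < strength_a / (strength_a + strength_b) else b
    for bot in (a, b):
        # Both sides take an invented amount of equipment wear in battle.
        bot.condition = max(MIN_CONDITION, bot.condition - rng.uniform(0.0, 0.1))
    return winner


def run_tournament(bots, reset_after_battle, seed=0):
    """Each pair fights once, in id order. Returns wins keyed by bot id."""
    rng = random.Random(seed)
    wins = {bot.bot_id: 0 for bot in bots}
    for i, a in enumerate(bots):
        for b in bots[i + 1:]:
            wins[fight(a, b, rng).bot_id] += 1
            if reset_after_battle:
                # The fix: restore condition before the next battle.
                a.condition = b.condition = 1.0
    return wins
```

`run_tournament` changes the bots it is given, so build a fresh list with `make_bots` for each run. Comparing both modes over many seeds with equal-power bots is one way to check whether win counts drift by id when wear carries over, but the toy numbers say nothing about the real engine.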
Jans [90] 5 2015-02-28 02:11:09 🔗 [10 years, 157 days ago]	Good thing you found the problem :) Looking forward to the new results!
Nosferatu [278] <Valhalla> 53 2015-02-28 02:16:46 🔗 [10 years, 157 days ago]	It takes a very big person to admit he was wrong Ender, and you have proven time and time again that when you make a mistake you are willing to accept the responsibility of it. I understand, though, that with the rush that was the tournament, things would be overlooked. You all but stated in your initial comment about getting the tournament up and running that it would likely not go off without some sort of hitch. I'm just glad things were noticed, the issue was discovered, and moving forward THIS issue won't be one any more.
ActiveX [270] Head Moderator 18 2015-02-28 02:43:10 🔗 [10 years, 157 days ago]	nice work wiggin :)
Myriad [349] <Lusitania> 350 2015-02-28 07:28:40 🔗 [10 years, 157 days ago]	All good Ender, I'm just glad that we have tournaments at all.
DeTRoiTwhore [170] 9 2015-02-28 13:03:18 🔗 [10 years, 157 days ago]	With this being the very first tournament, having issues is to be expected, honestly. Ender, it sucks that you took so much flak over this. Thanks for providing us with more content, and good job on everyone that helped to figure out the issues with the tournament. Will be fun to see the new results!
Ender [1] Administrator 2 2015-03-01 00:27:48 🔗 [10 years, 156 days ago]	So after logging on this bot, and taking a gander in the showroom. I noticed all my armor, my weapon, and my shield had below 100% condition. This was the only thing I hadn't been able to explain until now. With the tournament bug, condition hits incorrectly carried over between battles, but as best I could tell, these condition hits were not being saved to the database. I figured out what happened though. You were actually mistaken about not using the bot. You made 2 attacks with it on 2/26 as a freaked bot and didn't have many kudos. Your condition went down as a result of this and couldn't be auto-repaired.
Star [111] 1 2015-03-01 08:24:38 🔗 [10 years, 156 days ago]	Thanks Ender for your hard work and swiftness in finding a resolution.
