#exult@irc.freenode.net logs for 25 Oct 2011 (GMT)



[01:46:22] --> shazza has joined #exult
[01:48:24] <-- shazza has left IRC (Client Quit)
[02:48:52] <-- Dominus2 has left IRC (Read error: Connection reset by peer)
[02:48:57] --> Dominus has joined #exult
[02:48:57] --- ChanServ gives channel operator status to Dominus
[02:53:31] <-- Darrenor64 has left IRC ()
[04:56:47] <-- Colourless has left IRC (Quit: casts improved invisibility)
[05:12:10] --> SHODAN has joined #exult
[05:49:12] <-- Sevalecan has left IRC (Ping timeout: 240 seconds)
[06:35:05] --> Sevalecan has joined #exult
[06:38:05] --> Colourless has joined #exult
[06:38:05] --- ChanServ gives channel operator status to Colourless
[07:36:08] <-- Matt_O has left IRC (Read error: Connection reset by peer)
[07:36:28] --> Matt_O has joined #exult
[12:25:42] <Marzo> Dominus: remember what I said about having to rewrite large chunks of code to use smart pointers because schedules use raw pointers to objects and that could be causing issues?
[12:26:10] <Marzo> Well, I haven't yet determined if it causes the disappearing objects, but it certainly causes crashes and other problems
[12:26:44] <Marzo> The only other solution would be to make schedules search for objects every time they are needed instead of storing pointers
[12:29:29] --> TheCycoONE has joined #exult
[13:28:27] <-- Kirben has left IRC ()
[13:34:27] <Dominus> hey marzo, first time I have some keyboard time, since friday :)
[13:34:34] <Marzo> Hi
[13:35:12] <Marzo> See above
[13:35:39] <Dominus> I guess the rewrite would be the better solution in the long run, right?
[13:35:53] <Marzo> I guess
[13:36:05] <Marzo> Otherwise, the error is just likely to pop up again
[13:36:43] <Dominus> hmm, so are you up for that?
[13:36:46] <Marzo> (someone might go: "this is too slow; we should cache these results!")
[13:36:58] <Marzo> It will take some time, but I have no idea how much
[13:37:11] <Dominus> I'm still wondering how to make a good test case for the problem
[13:37:26] <Marzo> I will start a branch and start doing stuff in it
[13:37:52] <Marzo> It is hard to get 'lucky' as it were
[13:38:32] <Marzo> In SI, it wasn't until I managed to trigger a cache-out of the chunk where the tables were that I managed to cause a problem with the waiter
[13:38:53] <Marzo> ... as the waiter was selecting a table to walk to, I mean
[13:39:06] <Dominus> so you actually got to see it yourself?
[13:39:27] <Marzo> I didn't see the disappearing objects yet, no
[13:39:39] <Marzo> I saw some memory corruption followed by a crash
[13:39:59] <Marzo> It may be related; but I would prefer to get this out of the way ASAP
[13:42:36] <Marzo> What I observed was this: cacheout causing a waiter to select a cached-out table; this caused invalid memory reads (trying to get shape and tile from cached-out tables)
[13:42:48] <Marzo> Stuff then got screwy pretty quickly, but I am not sure how
[13:43:39] <Marzo> Next thing, the dueling NPCs in the list field were trying to read data from deleted path-finder memory as if it were an NPC or object
[13:43:57] <-- SHODAN has left IRC (Remote host closed the connection)
[13:44:19] <Marzo> It crashed when an if decided it branched off to 0x0 -- a segmentation fault
[13:44:35] <Dominus> hmm, that sounds like another infinite loop thing...
[13:44:44] <Marzo> Things are connected, I just can't figure out how
[13:44:48] <Marzo> Yet
[13:47:08] <Dominus> hmm, I thought I had committed another infinite loop crash to the bug tracker. can't find it now :(
[13:48:01] <Marzo> This one I mentioned doesn't seem to be an infinite loop bug -- it seems to be a straight invalid write to the code segment
[13:48:46] <Dominus> ah, this one, not by me, is an infinite loop I guess http://sourceforge.net/tracker/?func=detail&atid=102335&aid=3306112&group_id=2335
[13:50:20] <Dominus> didn't check up on another save if it happens every time or just with the save from the user
[14:02:03] --> SHODAN has joined #exult
[14:24:02] <Marzo> OK, I can pretty reliably reproduce a crash with the waiter by moving far away to trigger cache-out, move close enough to cache the waiter NPC back in and then run fast enough to trigger another cache-out as the NPC is looking for a table
[14:24:47] <Marzo> It is very likely that this will also cause trouble with other schedules, and for the same reason: storing raw pointers to objects
[14:25:06] <Dominus> great. how far away is triggering cache-out?
[14:25:16] <Marzo> A few screens
[14:40:22] <-- SHODAN has left IRC (Quit: Shutting down.)
[15:44:23] <-- TheCycoONE has left IRC (Ping timeout: 260 seconds)
[16:09:47] --> TheCycoONE has joined #exult
[18:33:18] <sh4rm4> btw: the code which lets the npc's choose stairs is broken
[18:33:34] <sh4rm4> you can see a lot of guys sitting in the air...
[18:33:56] <sh4rm4> maybe that defect triggers the crash somehow
[18:34:28] <sh4rm4> s/stairs/chairs
[18:36:29] <sh4rm4> possibly fixing a few "visible" bugs may lead to the "phantom" bugs disappearing
[18:36:58] <sh4rm4> or they get discovered during that work...
[18:37:21] <sh4rm4> btw: the google address sanitizer i linked last time is a very powerful tool
[18:37:43] <sh4rm4> it hunts down _any_ illegal out of bounds write
[18:38:09] <sh4rm4> a rewrite because you cant find the bug sounds like the most stupid option, imo
[18:40:01] <sh4rm4> the asan will manipulate the stack during compilation and each heap allocation in a way that any illegal access causes an immediate segfault
[18:40:05] <Marzo> I am doing a "rewrite" because the bug is damn near impossible to fix without it
[18:40:08] <sh4rm4> perfect to hunt it down
[18:40:37] <Marzo> The problem is: schedules (among other things) hold raw pointers to things that can be cached out
[18:40:50] <sh4rm4> cached "out" ?
[18:41:03] <Marzo> Deleted because they are too far offscreen
[18:41:14] <Marzo> If the things *are* cached out, the schedules (and others) don't have a way of knowing that
[18:41:33] <sh4rm4> shouldn't they be null'd then ?
[18:41:35] <Marzo> There are lots of dangling pointers resulting from that, pointing to invalid stuff
[18:41:50] <Marzo> Like I said, there is no way the schedules know that
[18:42:26] <sh4rm4> how about disabling that cache ?
[18:42:32] <sh4rm4> *out-cache
[18:42:43] <Marzo> And have the memory requirements skyrocket?
[18:43:04] <sh4rm4> hmm... ultima 7 worked fine on my 486 with 4mb ram
[18:43:17] <Marzo> Ultima 7 also had such caching routines
[18:43:25] <Marzo> Theirs were even more aggressive
[18:43:45] <sh4rm4> probably
[18:44:01] <Marzo> Not probably: certainly
[18:44:12] <Marzo> You just need to check it with a memory watch
[18:44:26] <sh4rm4> but they had a way to tag their objects in a way that whatever accesses it knows that it is not available
[18:44:32] <Marzo> (for example, using a debug version of DOSBox)
[18:44:39] <sh4rm4> i see
[18:44:45] <Marzo> Which is what we are missing
[18:45:01] <Marzo> The "rewrite" I am doing will swap normal pointers for smart pointers; these will know whether or not the stuff they point to has been deleted
[18:45:20] <sh4rm4> hmm iirc smart pointers make stuff slow
[18:45:33] <Marzo> Which is why I am doing it on a separate branch
[18:45:37] <sh4rm4> not that i am a C++ expert...
[18:45:40] <TheCycoONE> that's a rash generalization anyway
[18:45:46] <Marzo> But then again, we have a Java version
[18:45:58] <Marzo> If it is not slow in Java, it probably won
[18:46:04] <Marzo> won't be in C++
[18:46:11] <Marzo> (mistaken line break)
[18:46:13] <sh4rm4> exult in java ???
[18:47:15] <sh4rm4> anyway, i really think that you should try google's address sanitizer before you waste a lot of time rewriting big parts of code
[18:47:21] <Marzo> The android port
[18:47:28] <sh4rm4> it seems like the perfect tool for the job
[18:47:47] <Marzo> It is useless for the job -- a substantial rewrite is required anyway
[18:47:48] <sh4rm4> i have compiled asan a few months ago
[18:47:55] <sh4rm4> took me like 20 minutes
[18:48:02] <Marzo> I *have* looked into the tool
[18:48:09] <Marzo> I already know what the problem is
[18:48:12] <sh4rm4> i followed the README and everything worked as described...
[18:48:19] <Marzo> There are only two ways to fix the problem
[18:48:27] <Marzo> One is the path I am taking
[18:48:55] <Marzo> The other is preventing all schedules from caching raw pointers and have them search for objects every time they need it
[18:49:04] <Marzo> (the original games probably did the latter)
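Editor's sketch of the second option (schedules store no pointers and look the object up on every use). All names here (`Object_registry`, `Schedule`, `target_id`) are hypothetical stand-ins, not Exult's actual classes, and the hash-map lookup is only one possible way to "search for objects":

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for a game object; not Exult's real class.
struct Game_object {
    int shape;
};

// Hypothetical registry of the objects that are currently loaded.
class Object_registry {
    std::unordered_map<int, Game_object> live;

public:
    void add(int id, int shape) { live[id] = Game_object{shape}; }

    // Simulates a cache-out: the object is destroyed.
    void cache_out(int id) { live.erase(id); }

    // Returns nullptr when the object is not loaded.
    Game_object* find(int id) {
        auto it = live.find(id);
        return it == live.end() ? nullptr : &it->second;
    }
};

// The schedule remembers only an id, never a pointer.
struct Schedule {
    int target_id;

    // Looks the target up on every use; returns -1 if it is gone.
    int target_shape(Object_registry& reg) const {
        Game_object* obj = reg.find(target_id);
        return obj ? obj->shape : -1;
    }
};
```

Each use pays for a lookup, which is the cost Marzo expects someone to want to "cache" away again later.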
[18:49:43] <sh4rm4> ok
[18:50:09] <Marzo> asan, cppcheck, valgrind are great for finding what the problem is
[18:50:28] <Marzo> If you already know what it is, none of them help
[18:50:51] <Marzo> I *do* know what I am doing :-)
[18:50:57] <sh4rm4> fine :)
[18:51:21] <Marzo> In fact, I had already seen this problem coming several years ago and posted about it in the dev mailing list
[18:51:44] <TheCycoONE> or to keep another reference list to all schedules referencing something in the cache but I think there's no advantage over auto_ptr.
[18:51:48] <Marzo> The response was the same as you gave -- that smart pointers might slow Exult down too much
[18:52:48] <sh4rm4> they certainly have a performance penalty
[18:53:11] <sh4rm4> but maybe it's not critical at all
[18:53:21] <Marzo> Again -- given that we have a port in Java, I am not that much concerned
[18:53:35] <Marzo> It depends on how well I implement it
[18:54:33] <Marzo> Who knows: if I had swapped to smart pointers back when I first suggested it, we might already have Exult 1.6...
[18:54:52] <sh4rm4> indeed
[18:56:23] <Marzo> The idea I am working on works as such: chunks and containers hold a shared_ptr to their objects; this is a reference counting pointer that will keep the objects alive as long as the parent is alive
[18:56:54] <Marzo> Usecode will also hold a shared_ptr to the object in question
[18:57:32] <Marzo> Schedules and such will hold weak_ptr instead -- this can be converted to shared_ptr, but only if the shared_ptr is still alive
[18:57:43] <Marzo> (use count > 0)
[18:58:12] <Marzo> The shared_ptr deletes the object automatically when use count = 0, so there are no memory leaks
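Editor's sketch of the ownership scheme described above, using the standard C++11 `std::shared_ptr` and `std::weak_ptr`. The classes `Game_object`, `Chunk` and `Schedule` are simplified, hypothetical stand-ins and do not reflect Exult's real code:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a game object.
struct Game_object {
    int shape;
    explicit Game_object(int s) : shape(s) {}
};

// A chunk owns its objects through shared_ptr.
struct Chunk {
    std::vector<std::shared_ptr<Game_object>> objects;

    // Dropping the owning pointers deletes every object that has no other owner.
    void cache_out() { objects.clear(); }
};

// A schedule only observes its target through weak_ptr.
struct Schedule {
    std::weak_ptr<Game_object> target;

    // lock() yields an empty shared_ptr once the object has been deleted,
    // so the schedule can detect the cache-out instead of dereferencing
    // a dangling pointer. Returns -1 in that case.
    int target_shape() const {
        if (std::shared_ptr<Game_object> obj = target.lock()) {
            return obj->shape;
        }
        return -1;
    }
};

// Returns true if the schedule sees the table before the cache-out
// and notices that it is gone afterwards.
bool demo_cache_out() {
    Chunk chunk;
    chunk.objects.push_back(std::make_shared<Game_object>(1000));

    Schedule waiter;
    waiter.target = chunk.objects.front();

    bool seen_before = waiter.target_shape() == 1000;
    chunk.cache_out();
    bool gone_after = waiter.target_shape() == -1;
    return seen_before && gone_after;
}
```

While the `shared_ptr` returned by `lock()` is held, the object stays alive even if the chunk is cached out in the meantime.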
[18:58:32] <TheCycoONE> boost then
[18:58:41] <Marzo> C++11 actually
[18:58:52] <Marzo> The new standards have been approved in August
[18:59:18] <Marzo> So for some time, it will be a mix between TR1 and C++11 until compilers have caught up
[18:59:30] <Marzo> (but yeah, essentially boost)
[19:00:10] <sh4rm4> hmm ,,, if you use boost, it will add a lot of dependencies
[19:00:15] <sh4rm4> making it less portable
[19:00:37] <sh4rm4> also template hell when a compile error happens
[19:00:58] <Marzo> > (04:59:18 PM) Marzo: So for some time, it will be a mix between TR1 and C++11 until compilers have caught up
[19:01:08] <sh4rm4> what is TR1 ?
[19:01:13] <Marzo> The smart pointers have been lifted from boost into the new standards
[19:01:39] <Marzo> Technical Report 1, extensions for C++ some of which have been adopted by the new standards
[19:01:44] <sh4rm4> oh ok
[19:01:57] <Marzo> (and most of which were available for some time in recent compilers)
[19:02:23] <sh4rm4> as you may have noticed, i'm not a big fan of C++ anyway
[19:02:28] <Marzo> Boost itself won't enter in the matter -- at most, we would have to distribute some header files until compilers have caught up
[19:03:27] <sh4rm4> (in fact i thought about a C only conversion)
[19:03:53] <Marzo> That would require extreme masochism to pull
[19:04:01] <sh4rm4> yep...
[19:04:33] <Marzo> In my opinion, C sucks
[19:05:21] <sh4rm4> not really. it gives you a very clean set of functionality
[19:05:26] <Marzo> Several things you can do in C++ are just too hard to do in C
[19:05:32] <sh4rm4> everything comes out exactly as designed...
[19:06:01] <sh4rm4> if your function is called "test", then the symbol will be called "test"
[19:06:09] <sh4rm4> couldnt get much easier
[19:06:59] <Marzo> In C++, you can define (say) an operator+ for several base classes (matrixes, vectors, complex numbers...) and use the addition notation
[19:07:10] <sh4rm4> stuff like inheritance just makes stuff hard to follow
[19:07:21] <sh4rm4> will get even worse with the new "auto" keyword
[19:07:22] <Marzo> In C, not only you can't, but you can't overload a function "add" for the job
[19:07:48] <Marzo> You have to have (say) matrix_add, complex_add, etc...
[19:07:48] <sh4rm4> well, it'd be called vector_add
[19:07:57] <sh4rm4> giving you a clear hint whats happening
[19:08:07] <Marzo> It just gets too unwieldy to write
[19:08:37] <Marzo> Like writing 'v + w' is not just as clear (if not more) with v and w declared as vectors
[19:09:24] <sh4rm4> but the reader has to find out what + does in that specific context
[19:09:26] <Marzo> Operator (and function) overloading is done all the time in math; C++ has the advantage over C of allowing you to use it as well
[19:09:29] <sh4rm4> giving bugs more room
[19:09:51] <Marzo> With type checking, I can't see how it allows bugs to come about
[19:10:22] <sh4rm4> well, that specific + may not be overloaded, leading to wrong assumptions
[19:10:43] <Marzo> Leading to compile errors, you mean
[19:10:50] <sh4rm4> it'll add two pointers instead of two vectors
[19:11:01] <Marzo> ^
[19:11:54] <Marzo> Vector u, v, w; u = v+w; // compile error: operator+(Vector, Vector) not defined
[19:12:09] <sh4rm4> mhm, how about Vector* ?
[19:12:22] <sh4rm4> it'll silently sum up the 2 pointers...
[19:12:52] <Marzo> You rarely use that anyway; that is what references are for
[19:13:17] <Marzo> You define a variable as Vector, and define your functions as taking Vector&
[19:13:38] <Marzo> (or, in some cases, Vector const&)
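Editor's sketch of the overloading being discussed: a free `operator+` taking `Vector const&`. The two-component `Vector` struct and the helper `sum_x` are made up for illustration:

```cpp
#include <cassert>

// Hypothetical two-component vector.
struct Vector {
    double x, y;
};

// Without this overload, `v + w` below does not compile.
Vector operator+(Vector const& a, Vector const& b) {
    return Vector{a.x + b.x, a.y + b.y};
}

// Adds two vectors and returns the x component of the result.
double sum_x() {
    Vector v{1.0, 2.0};
    Vector w{3.0, 4.0};
    Vector u = v + w;  // resolved at compile time to operator+(Vector const&, Vector const&)
    return u.x;
}

// For the Vector* case: with `Vector *p, *q;` the expression `p + q`
// is rejected by the compiler too, since C++ (like C) defines no
// addition of two pointers.
```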
[19:14:08] <sh4rm4> wouldn't you rather do Vector* u = new Vector() ?
[19:14:22] <Marzo> Only if I wanted to store the pointer
[19:14:26] <sh4rm4> (if you intend to pass it on)
[19:14:32] <sh4rm4> yep
[19:14:58] <Marzo> In C++, you can just declare Vector u and then pass the reference
[19:15:11] <Marzo> As long as it doesn't go out of scope, there are no problems
[19:15:16] <sh4rm4> yeah, that way it is stack alloced and a pointer will be passed on
[19:15:46] <Marzo> Only if you need the object to outlive the function you are on would you use a pointer
[19:16:08] <Marzo> And then, only if the variable is not a member of the current class
[19:16:34] <Marzo> You can also pass *vec if it is a pointer and the function takes a reference
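Editor's sketch of the reference-passing style described above: an object with automatic storage passed by reference, and a heap object passed to the same function via `*p`. The functions `scale`, `length_squared` and `demo_references` are hypothetical examples:

```cpp
#include <cassert>
#include <memory>

// Hypothetical two-component vector.
struct Vector {
    double x, y;
};

// Takes a non-const reference because it modifies its argument.
void scale(Vector& v, double k) {
    v.x *= k;
    v.y *= k;
}

// Takes a const reference because it only reads its argument.
double length_squared(Vector const& v) {
    return v.x * v.x + v.y * v.y;
}

double demo_references() {
    Vector u{3.0, 4.0};  // automatic storage; note `Vector u;`, not `Vector u();`
    scale(u, 2.0);       // passed by reference, no pointer syntax at the call site

    // Heap allocation only when the object must outlive the current scope;
    // unique_ptr releases it automatically.
    std::unique_ptr<Vector> p(new Vector{1.0, 0.0});
    scale(*p, 3.0);      // dereference the pointer to pass a reference

    return length_squared(u) + length_squared(*p);
}
```

Writing `Vector u();` would declare a function named `u` rather than an object, which is why the sketch uses `Vector u{...}`.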
[19:16:44] <Marzo> This also forces you to think about the code
[19:17:09] <sh4rm4> btw... http://harmful.cat-v.org/software/OO_programming/_pdf/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf
[19:21:02] <Marzo> Ah, yes, cache usage
[19:21:21] <Marzo> I once explained it to a professor in the Physics department
[19:21:52] <TheCycoONE> a C vs C++ war?
[19:22:05] <Marzo> He was having performance issues with his matrix diagonalization code -- for a very large matrix
[19:22:44] <Marzo> It was caused by repeated cache misses
[19:23:15] <Marzo> He changed the algorithm to a better one (he was using the naïve version) and the code became much faster
[19:24:08] <TheCycoONE> is that a dynamic programming problem?
[19:25:14] <sh4rm4> the problem is to allocate data in little chunks in specific objects, instead of one single big array
[19:25:59] <sh4rm4> thats the major cause for dwarf fortress performance problems..
[19:26:53] <sh4rm4> when it has to calculate how many blood splatters to display, what happens is for each visible entity { if entity->blood > 0 ... blah() }
[19:27:07] <Marzo> You will get no arguments from me (or most competent programmers) about storing large amounts of data in allocated memory elsewhere
[19:28:18] <Marzo> This is even another thing I had been thinking of to put in Exult -- sequential NPC and object allocation with a custom allocator
[19:28:31] <Marzo> I think I also wrote about it to the dev list some time back
[19:28:35] <sh4rm4> instead of for sizeof(splatter_coords) { if (invisible_area(splatter_coords[i])) blah() }
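Editor's sketch contrasting the two loops sh4rm4 describes. The `Entity` and `Splatter` layouts and the x-range visibility test are assumptions made for illustration; nothing here is taken from Dwarf Fortress or Exult:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "fat" entity: the loop below needs only `blood`,
// but every entity's unrelated state is pulled through the cache as well.
struct Entity {
    char other_state[256];
    int blood;
};

// Per-entity version: visits every entity to read a single field.
int count_bloody_entities(std::vector<Entity> const& entities) {
    int n = 0;
    for (Entity const& e : entities) {
        if (e.blood > 0) {
            ++n;
        }
    }
    return n;
}

// Compact record holding only what the splatter loop needs.
struct Splatter {
    int x, y;
};

// Contiguous version: walks one tightly packed array.
// The visibility test (an x range) is a placeholder assumption.
int count_splatters_in_view(std::vector<Splatter> const& splatters,
                            int min_x, int max_x) {
    int n = 0;
    for (std::size_t i = 0; i < splatters.size(); ++i) {
        if (splatters[i].x >= min_x && splatters[i].x <= max_x) {
            ++n;
        }
    }
    return n;
}
```

Both functions do a linear scan; the difference is how many bytes each iteration drags into the cache.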
[19:33:25] --> SiENcE has joined #exult
[19:34:51] <Marzo> The main problem with that pdf is that it fails to mention one thing -- that cache sizes are dependent on the processor, and so any given optimization strategy may cause worse performance on a different processor
[19:35:44] <Marzo> (for example, Intel processors usually have 2-3 MB L2 cache, while AMD ones usually have 512KB-1MB)
[19:36:36] <Marzo> Unless you know your target hardware (e.g., a console), trying to optimize the cache too much may cause harm
[19:38:05] <sh4rm4> uhm... i don't think that decreasing the number of cache misses hurts on any platform
[19:38:39] <sh4rm4> also, the code gets automatically simpler
[19:38:46] --> SHODAN has joined #exult
[19:39:27] <Marzo> Reducing the number of cache misses is one thing; trying to tune it to a CPU is another
[19:40:01] <sh4rm4> indeed, you should tune it to *any* cpu :)
[19:45:20] --> ParuNexus has joined #exult
[19:47:47] <-- ParuCodex has left IRC (Ping timeout: 260 seconds)
[19:58:26] <TheCycoONE> I think that pdf was talking about PS3 game development specifically
[20:00:20] <TheCycoONE> notably software that didn't have to be maintained for a long time. The arguments for OO are not based on performance, they're based on readability and maintainability of large projects.
[20:01:29] <TheCycoONE> Important because a lot of foss projects get dumped because they're an unmaintainable mess
[20:16:17] <-- SHODAN has left IRC (Ping timeout: 258 seconds)
[20:19:27] --> SHODAN has joined #exult
[21:20:05] <-- SHODAN has left IRC (Remote host closed the connection)
[21:33:48] <-- TheCycoONE has left IRC (Quit: KVIrc 4.0.4 Insomnia http://www.kvirc.net/)
[22:27:12] <-- SiENcE has left IRC (Ping timeout: 240 seconds)
[23:10:08] --> Kirben has joined #exult
[23:10:08] --- ChanServ gives channel operator status to Kirben
[23:10:28] --> SiENcE has joined #exult
[23:46:00] <-- SiENcE has left IRC (Ping timeout: 240 seconds)