- Removing "Template:Meetup graticule" from a page seems to make it load
- I made a new template (Template:Meetup graticule2) with the map-generating part removed, and it does NOT work (as evidenced in JoannaTest)
- I think this rules out caribiner as the source of the problem
- In fact, lots of general errors suggest it might be MediaWiki itself that is causing problems
- Problems started ???
- Apache logs show lots of undefined-variable warnings, but those were already there before the problems started (i.e. 6+ weeks ago)
- I can see occasional segfaults in PHP, origin unknown. They happen far less often than the problems I'm seeing, though
- The caching layer that was running is now OFF
- The database is functional and shows no signs of load issues, which further suggests the problem is at the PHP layer
- Thomcat noticed that after the coordinates became available on 30 Sept, pages without spaces in the title would load (sometimes slowly), but pages with spaces (or underscores) in the title would not load at all. Is this why WikiProblems was created without a space? --Thomcat (talk) 00:34, 1 October 2014 (EDT)
- No, it wasn't. For example, Template:New_on_the_wiki is still working fine
Would anyone object if I tried upgrading MediaWiki?
- Sounds reasonable to me. I have to admit the wiki seems pretty useless as it currently stands. Thanks for being willing to do this! Jiml (talk) 01:25, 2 October 2014 (EDT)
Finding the root
I've been looking at the meetup graticule template and tried running all the included templates individually in my sandbox. Surprisingly I managed to run them all pretty much without ever seeing the dreaded timeout (and even then it worked on the rerun). But the "big" template still doesn't work.
I noticed that the "meetup links" template was veeery sluggish for what it's supposed to do (print a couple of links and be done), so I tried something else. Mampfred/Sandbox simply includes that template a couple of times and, voilà, it consistently times out if you call the template multiple times (by including it with timeout=1) and consistently works if you call it only once (timeout=0).
This leads me to believe that the cause might not even be a change related to our wiki, but maybe something as simple as severely increased load from some other "project" running on the same infrastructure?
Unfortunately, without any visibility into the server logs or some knowledge of what else is running on the infrastructure, that's as far as I feel able to poke my head into the problem. Apologies if you guys already knew all this, but at least from the description above I figured some more investigation was in order. Back over to Joannac :) - Mampfred (talk) 08:10, 2 October 2014 (EDT)
- Another thing I just thought of: in the last couple of months I got lots of random 408 errors on the wiki (not the 504 we're seeing now, but a timeout as well). I wasn't sure whether that was just my side of things, but I didn't see any problems anywhere else, so eventually I raised it on IRC. Nobody confirmed the problem back then, but given the current situation it's likely that I was already seeing the first symptoms. Luckily my client keeps a log of my IRC conversations, so I know that I posted the problem on 2014-06-05. By then it had been going on for quite a while before I got annoyed enough to post it, so I'd say the root of the problem goes as far back as the beginning of May 2014. - Mampfred (talk) 13:47, 2 October 2014 (EDT)
Code analysis in IRC
A few hours ago, several hashers collaborated on IRC to identify the problem. It appears that the problem lies in retrieving the DOW value. We looked at the script and proposed a fix to Joannac; it still has to be tested. --Eupeodes (talk) 14:00, 2 October 2014 (EDT)
- The wiki is working MUCH better. I still saw a timeout on 2014-09 a couple of times, but maybe that is a pathological case with so many elements. Jiml (talk) 09:56, 4 October 2014 (EDT)