Server admin log
From Wikitech
July 20
- 18:57 paravoid: powering on owa1/owa2 again, they're being used as ganglia aggregators for swift
- 18:11 Ryan_Lane: starting gerrit
- 17:59 binasher: db48 is now replicating from db1048 (and db1048 from db48)
- 17:26 RobHalsell: authdns-update run for new servers
- 15:29 paravoid: powering off the rest of mw10[0-9][0-9]
- 14:35 mark: Added server hydrogen to the dns_rec eqiad LVS pool
- 14:22 paravoid: powering off owa1/2/3, unused
- 13:54 paravoid: powering off all of mw1[0-9][0-9][0-9].eqiad.wmnet, unused
- 13:36 mark: Reinstalling hydrogen
- 13:19 paravoid: powercycling hydrogen, down since yesterday
- 12:59 paravoid: powercycling nescio/ns2, unresponsive network & console
- 10:38 mark: Built new varnish 3.0.3~rc1+persistent1-wm1 packages and inserted them into the precise-wikimedia APT repository
- 02:23 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Fri Jul 20 02:23:39 UTC 2012
- 02:23 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Fri Jul 20 02:23:27 UTC 2012
July 19
- 22:48 logmsgbot: catrope Finished syncing Wikimedia installation... :
- 22:10 logmsgbot: catrope Started syncing Wikimedia installation... :
- 22:02 binasher: deploying new mobile redirector to pmtpa text squids (currently inactive)
- 21:39 logmsgbot: kaldari Started syncing Wikimedia installation... :
- 21:36 binasher: deploying new mobile redirector to eqiad text squids
- 21:36 logmsgbot: catrope synchronized php-1.20wmf7/extensions/LastModified 'Remove E3Experiments cruft from LastModified'
- 21:07 binasher: deploying new mobile redirector to esams text squids
- 18:52 preilly: fixing subdomain check for zero vs m
- 18:52 logmsgbot: preilly synchronized wmf-config/mobile.php 'fix subdomain check'
- 18:21 RobH: updating dns with new zonefiles for legally won domain names
- 18:15 Jeff_Green: storage3 dist-upgrade and reboot
- 17:01 AaronSchulz: Running copyFileBackend.php for commons (shards c-f)
- 16:28 cmjohnson1: mw60 powering down to replace DIMM B1 rt3287
- 15:48 RobH: hydrogen repaired per rt 3243
- 15:37 RobH: rebooting hydrogen to set bios redirection
- 14:18 logmsgbot: reedy synchronized wmf-config/
- 13:42 Jeff_Green: dist-upgrade and reboot hosts in payments cluster
- 13:12 Reedy: Pointed /h/w/c/php to php-1.20wmf7
- 13:10 Reedy: Deleted php-1.20wmf6/cache/l10n from mediawiki-installation
- 02:31 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Thu Jul 19 02:31:02 UTC 2012
- 02:23 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Thu Jul 19 02:23:27 UTC 2012
July 18
- 22:02 notpeter: ok, changed my mind one more time. removing srv194 from apaches pool for precise testing
- 21:41 notpeter: scratch that. removing srv289 from apaches pool for precise testing, not mw1
- 21:39 notpeter: removing mw1 from apaches pool to do precise test install
- 21:28 notpeter: adding php5 packages to precise-wikimedia repo
- 19:45 cmjohnson1: srv281 powering down for HW checks
- 19:09 cmjohnson1: srv278 powering down for hardware problems (random reboots); system already depooled
- 18:05 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: 295 remaining wikis to 1.20wmf7
- 17:37 cmjohnson1: mw8 powering down to replace DIMM A3 rt 3273
- 17:25 cmjohnson1: search32 powering down to replace DIMM B1 rt 3076
- 16:55 logmsgbot: reedy synchronized php-1.20wmf7/includes/api/ApiQuerySiteinfo.php
- 12:52 pp-pdf1: upgraded mwlib to 0.13.11, restarted all services
- 07:58 hashar: gallium: added Firefox 14 to Testswarm, disabled Firefox 13.
- 02:47 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Wed Jul 18 02:47:23 UTC 2012
- 02:24 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Wed Jul 18 02:24:14 UTC 2012
July 17
- 23:24 Tim: on srv193: removed core dump files, disabled core dumping, restarted apache
- 19:52 logmsgbot: mlitn Finished syncing Wikimedia installation... :
- 19:09 logmsgbot: mlitn Started syncing Wikimedia installation... :
- 15:18 mark: lvs1002 is back up and idling
- 15:15 mark: Fixed serial console redirection after boot to OFF on lvs1002
- 15:06 mark: lvs4 is back up and serving traffic
- 14:51 mark: Reinstalling lvs4 with Ubuntu Precise
- 14:48 mark: Stopped PyBal on lvs4, failing over traffic to lvs3
- 14:47 mark: Reinstalling lvs1002 with Ubuntu Precise
- 14:46 mark: Stopped PyBal on lvs1002, failing over traffic to lvs1005
- 14:44 mark: lvs1001 is back up and serving traffic
- 14:11 mark: Stopped PyBal on lvs1001, failing over traffic to lvs1004
- 13:09 maplebed: changed auth URL for swift to use load balancer rather than round robin DNS
- 13:08 mark: lvs1003 is back up and serving traffic
- 13:07 maplebed: changed order on ganglia swift view graphs to group by metric rather than host
- 13:02 logmsgbot: reedy synchronized wmf-config/
- 12:53 mark: Fixed boot order on lvs1003
- 12:34 mark: Reinstalling lvs1003 with Ubuntu Precise
- 12:28 mark: Stopped PyBal on lvs1003, failing over traffic to lvs1006
- 12:10 mark: amslvs1 is back up and serving traffic
- 11:44 mark: Reinstalling amslvs1 with Ubuntu Precise
- 11:40 mark: Stopped PyBal on amslvs1, failing over traffic to amslvs3
- 11:32 mark: amslvs2 is back up and serving traffic
- 10:47 mark: Reinstalling amslvs2 with Ubuntu Precise
- 10:36 mark: Stopped PyBal on amslvs2, failing over traffic to amslvs4
- 07:30 Tim: testing envvars/apache2.conf change on srv193
- 07:09 Tim: restarted apache on srv193
- 02:58 Tim: graceful restart of all apaches
- 02:52 logmsgbot: reedy synchronized wmf-config/CommonSettings.php
- 02:47 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Tue Jul 17 02:47:06 UTC 2012
- 02:23 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Tue Jul 17 02:23:41 UTC 2012
- 00:44 Tim: deploying new redirects.conf
- 00:28 Tim: testing new redirects.conf on mw1
July 16
- 23:03 pp-pdf1: restarted nserve
- 23:02 pp-pdf1: fixed nserve handling of filenames with whitespace
- 22:46 pp-pdf1: restart all services
- 22:45 pp-pdf2: restart all services
- 22:45 pp-pdf3: restart all services
- 22:45 pp-pdf1: update mwlib to 0.13.9
- 22:45 pp-pdf2: update mwlib to 0.13.9
- 22:45 pp-pdf3: update mwlib to 0.13.9
- 22:44 pp-pdf2: update greenlet to 0.4.0
- 22:44 pp-pdf1: update greenlet to 0.4.0
- 22:44 pp-pdf3: update greenlet to 0.4.0
- 22:44 pp-pdf2: upgrade qserve to 0.2.8
- 22:44 pp-pdf3: upgrade qserve to 0.2.8
- 22:43 pp-pdf1: upgrade qserve to 0.2.8
- 22:20 logmsgbot: awjrichards synchronized wmf-config/CommonSettings.php 'Disabling epub generation from Collection extension on all wikis except for simple and test'
- 22:19 logmsgbot: awjrichards synchronized wmf-config/InitialiseSettings.php 'Making epub generation for Collection extension configurable'
- 18:26 AaronSchulz: Running copyFileBackend.php for commons (shards 8-b)
- 18:02 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: enwiki to 1.20wmf7
- 17:08 LeslieCarr: reactivating XO transit connections on cr1-sdtpa
- 17:03 LeslieCarr: draining cr2-eqiad to cr1-sdtpa link for moving of fiber
- 15:56 mark: lvs3 is back up, and idling
- 15:56 mutante: fixing pt.planet after missing locale has been added
- 15:48 mutante: installing package upgrades on singer (planet)
- 15:43 cmjohnson1: mw8 shutting down and taking offline to run Dell's dset program
- 15:38 mutante: planet - svn up the config
- 15:29 mark: Reinstalling lvs3 with Ubuntu Precise
- 15:17 mark: lvs5 is back up and serving traffic
- 14:49 mark: Reinstalling lvs5 with Ubuntu Precise
- 14:45 mark: Stopped PyBal on lvs5 to failover traffic to lvs1
- 14:35 maplebed: adjusted swift rings; set new object servers to 20, new container servers to 100
- 14:33 mark: lvs6 is back up and serving traffic
- 14:12 mark: Reinstalling lvs6 with Ubuntu Precise
- 14:02 mark: Stopped PyBal on lvs6 to failover traffic to lvs2
- 13:58 mutante: mw8 - alright, most likely just needs new DIMM per cmjohnson
- 13:50 mutante: mw8 PHP fatal errors, running out of memory
- 12:56 mutante: sync-apache, graceful-all to fix wikipedia.cz redirect
- 05:31 apergos: reboot to upgrade kernel etc on mw8 since it's been flapping anyways
- 02:46 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Mon Jul 16 02:46:14 UTC 2012
- 02:23 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Mon Jul 16 02:23:04 UTC 2012
July 15
- 20:19 Aaron|home: Running copyFileBackend.php for commons (shard 7)
- 02:46 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Sun Jul 15 02:46:33 UTC 2012
- 02:23 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Sun Jul 15 02:23:27 UTC 2012
July 14
- 18:46 LeslieCarr: powercycling frozen db1029
- 18:01 LeslieCarr: rebooting unresponsive mw1111
- 16:41 logmsgbot: midom synchronized php-1.20wmf6/includes/GlobalFunctions.php 'message key analysis'
- 16:18 logmsgbot: midom synchronized php-1.20wmf6/includes/GlobalFunctions.php 'message key analysis'
- 03:07 Aaron|home: Running copyFileBackend.php for commons (shards 5-6)
- 02:47 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Sat Jul 14 02:47:32 UTC 2012
- 02:23 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Sat Jul 14 02:23:14 UTC 2012
- 00:25 logmsgbot: reedy synchronized php-1.20wmf7/includes/api/ApiDelete.php
July 13
- 18:08 logmsgbot: reedy synchronized php-1.20wmf7/includes/GlobalFunctions.php
- 18:08 logmsgbot: reedy synchronized php-1.20wmf6/includes/GlobalFunctions.php
- 17:58 logmsgbot: reedy synchronized wmf-config/
- 17:57 logmsgbot: reedy synchronized php-1.20wmf7/
- 17:56 logmsgbot: reedy synchronized php-1.20wmf6/
- 17:32 logmsgbot: reedy synchronized wmf-config/
- 17:21 AaronSchulz: Running copyFileBackend.php for commons (shards 1-4) (actually started yesterday)
- 15:28 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Disable aftv5 on en_labswikimedia due to out of date schema'
- 15:21 logmsgbot: reedy synchronized php-1.20wmf7/includes/GlobalFunctions.php
- 15:20 logmsgbot: reedy synchronized php-1.20wmf6/includes/GlobalFunctions.php
- 15:20 logmsgbot: reedy synchronized php-1.20wmf7/includes/DefaultSettings.php
- 15:19 logmsgbot: reedy synchronized php-1.20wmf6/includes/DefaultSettings.php
- 15:17 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Setting wgDBerrorLogInUTC = true'
- 15:07 hasharDeadmau5: unknown column 'af_is_featured' on en_labswikimedia@db25
- 14:55 mark: Increased all mobile varnish server weights from 10 to 100 to aid chash
- 14:54 mark: Added cp1041.eqiad.wmnet back into the mobile LVS pool
- 10:10 mutante: mw1016 - was down, reinstalling with precise
- 02:46 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Fri Jul 13 02:45:59 UTC 2012
July 12
- 22:15 Aaron|home: Running copyFileBackend.php for commons (shards 0,c-f)
- 20:58 Aaron|home: Running copyFileBackend.php for commons (shards 0,8-b)
- 14:22 mutante: apache children on srv193 keep segfaulting since yesterday (test.wp)
- 14:00 RobHalsell: shutting down srv206 for chris per rt241
- 11:55 logmsgbot: reedy synchronized wmf-config/
- 10:40 mark: Depooled mobile varnish server cp1041 for vlan/hostname change and reinstallation with precise
- 04:38 binasher: started hot backup of db48 to db1048
- 02:50 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Thu Jul 12 02:50:49 UTC 2012
- 02:29 maplebed: adjusted the swift rings a bit to move some more traffic to the new hosts. set object partitions to weight 10, account and container 60.
- 02:27 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Thu Jul 12 02:26:57 UTC 2012
July 11
- 22:52 AaronSchulz: Running copyFileBackend.php for commons (shards 0,4-7)
- 22:06 AaronSchulz: Running copyFileBackend.php for commons (shards 0,0-3)
- 21:36 binasher: running hotbackup of db48 -> db49 (otrs / external misc)
- 21:31 binasher: rebooting db49 for kernel upgrade
- 21:23 binasher: rebuilding db49 (otrs slave)
- 21:01 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php
- 20:16 logmsgbot: reedy synchronized php-1.20wmf7/extensions/CentralNotice
- 19:33 cmjohnson1: db10 swapping disk1 out rt3251
- 19:33 RobH: shutting down srv203 for hardware checking per rt3110
- 19:12 hashar: finally stopped using my production ssh key on labs.
- 19:11 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikiquote and wikiversity to 1.20wmf7
- 19:08 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikibooks and wikinews to 1.20wmf7
- 19:06 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wiktionary and wikisource to 1.20wmf7
- 19:03 cmjohnson1: search35 shutting down to reseat DIMM rt-3260
- 18:29 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Switch special, private and closed wikis from 1.20wmf6 to 1.20wmf7
- 15:16 logmsgbot: robh synchronized wmf-config/mc.php 'moving srv208 to down'
- 15:14 logmsgbot: robh synchronized wmf-config/mc.php
- 14:30 cmjohnson1: shutting down search32 to reseat DIMM rt-3076
- 11:16 mutante: yay @ svn removal
- 11:13 hashar: /home/wikipedia/conf/httpd cleaned out the svn repository. That directory is now 100% under git tracking, yeah!!! Thanks Tim!
- 11:11 mutante: rebooting srv266
- 11:05 mutante: purged wikipedie.cz/Experti_na_prirodu URLs with purgeList.php
- 10:52 mutante: apache-graceful-all
- 10:48 mutante: sync-apache to push fixed wikipedie.cz redirect
- 08:36 hashar: synced /h/w/conf/httpd git and svn repositories
- 08:33 mutante: chmod -R g+w /home/wikipedia/conf/httpd on fenari to fix group write on .git/objects
- 07:59 mutante: srv266 - package/kernel upgrades
- 07:35 mutante: powercycled downed owa3, installed kernel upgrades on owa1-3 (but they are also waiting to be repurposed, RT:2511)
- 04:05 Tim: on hume: running updateCollation.php --dry-run on ptwiki to determine whether there will be any key truncation issues with bug 35632
- 02:48 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Wed Jul 11 02:48:18 UTC 2012
- 02:25 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Wed Jul 11 02:25:19 UTC 2012
July 10
- 23:47 AaronSchulz: Doing shards c-f
- 23:47 maplebed: put two new swift front ends into rotation ms-fe3 and ms-fe4
- 23:40 AaronSchulz: Doing shards 8-b
- 23:34 AaronSchulz: Doing shards 4-7
- 23:31 AaronSchulz: copyFileBackend.php run rate for above processes is at /home/aaron/NFStoSwiftCopyRate, currently 50
- 23:26 AaronSchulz: Running copyFileBackend.php for zhwiki public zone shards 0-3 on hume
- 23:20 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php
- 22:17 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php
- 22:12 logmsgbot: asher synchronized wmf-config/CommonSettings.php 'moving parsercache from db40 to pc1'
- 22:05 maplebed: put swift backends ms-be9-12 in rotation for containers (weight 30) and objects (weight 5)
- 21:17 maplebed: rebooting new ms-be6-8 to change BIOS setting to boot from disk
- 20:26 logmsgbot: mlitn Finished syncing Wikimedia installation... :
- 20:26 RobH: shutting down sq36 for hardware troubleshooting
- 19:38 LeslieCarr: restarted frozen pdns on ns2
- 19:28 logmsgbot: mlitn Started syncing Wikimedia installation... :
- 19:15 cmjohnson1: sq36 requires a hard shutdown, unresponsive to mgmt. rt-3254
- 19:05 RoanKattouw: chmod g+w /home/wikipedia/common/php-1.20wmf7/cache/l10n/
- 19:03 RoanKattouw: chown l10nupdate:wikidev /home/wikipedia/common/php-1.20wmf7/cache/l10n/
- 18:58 hashar: reworked the git repository in /home/wikipedia/conf/httpd , manually synced changes from svn to the git repo
- 18:12 RobH: srv266 shutdown for chris
- 18:06 cmjohnson1: srv266 being brought down for an extended period of time to run diagnostic tests
- 17:25 cmjohnson1: srv266 shutting down for HW troubleshooting rt-2896
- 16:33 mutante: package upgrades and kernel on niobium
- 16:28 mutante: powercycling niobium
- 15:49 mutante: powercycling mw1008,mw1070,mw1073
- 14:10 logmsgbot: reedy synchronized hastidy
- 13:26 logmsgbot: tstarling synchronized live-1.5/robots.php
- 13:17 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 13:13 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 12:53 Tim: deploying favicon.php test alias
- 12:41 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 12:30 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 12:11 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 12:10 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 12:03 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 12:02 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 11:58 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 10:10 logmsgbot: reedy synchronized images/sul/ 'crushed'
- 10:05 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php
- 07:49 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 07:46 logmsgbot: tstarling synchronized live-1.5/favicon.php
- 07:41 logmsgbot: tstarling Finished syncing Wikimedia installation... :
- 06:32 logmsgbot: tstarling Started syncing Wikimedia installation... :
- 06:14 logmsgbot: tstarling Started syncing Wikimedia installation... :
- 06:06 Tim: removing php-*/cache/l10n/l10nupdate* and running scap with the new version of scap from I8bcd2817
- 05:01 Tim: on fenari: running git submodule update --init in /var/lib/l10nupdate/mediawiki/
- 03:56 Tim: testing new scheme for LU involving not pushing out LU files to all apaches
- 02:49 logmsgbot: LocalisationUpdate completed (1.20wmf7) at Tue Jul 10 02:49:16 UTC 2012
- 01:53 brion: test.wikipedia.org fails with 'TrustedXFF: hosts file missing.'
- 01:44 Tim: on manganese: ran date -s "`date`" to make sure that isn't the cause of the high CPU usage during clone, it wasn't
- 01:44 Tim: running rebuildLocalisationCache.php for testwiki
- 01:43 brion: test.wikipedia.org broken again, "No localisation cache found for English."
- 00:51 brion: test.wikipedia is broken loading Cite.php
July 9
- 23:53 Tim: deleting and recreating php-1.20wmf7 to test a script
- 20:51 logmsgbot: reedy synchronized php-1.20wmf7/includes/media/FormatMetadata.php
- 20:51 logmsgbot: reedy synchronized php-1.20wmf6/includes/media/FormatMetadata.php
- 20:30 pp-pdf3: upgraded mwlib.epub
- 20:30 pp-pdf1: upgraded mwlib.epub
- 20:30 pp-pdf2: upgraded mwlib.epub
- 20:27 cmjohnson1: db10 swapping disk1 for new disk
- 19:41 logmsgbot: reedy synchronized php-1.20wmf6/includes/WikiPage.php
- 19:37 logmsgbot: reedy synchronized php-1.20wmf7/includes/WikiPage.php
- 18:13 logmsgbot: reedy synchronized php-1.20wmf7/extensions/CodeReview/ 'CR to trunk to fix fatals'
- 18:05 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: mediawikiwiki and testwiki to 1.20wmf7
- 15:45 logmsgbot: reedy synchronized wmf-config/ExtensionMessages-1.20wmf7.php
- 15:43 logmsgbot: reedy Finished syncing Wikimedia installation... : test2wiki to 1.20wmf7 and rebuilding localisation caches
- 15:01 logmsgbot: reedy Started syncing Wikimedia installation... : test2wiki to 1.20wmf7 and rebuilding localisation caches
- 14:54 logmsgbot: reedy Started syncing Wikimedia installation... : test2wiki to 1.20wmf7 and rebuilding localisation caches
- 14:47 logmsgbot: reedy synchronized php-1.20wmf7
- 14:17 Reedy: Copying php-1.20wmf7 from /tmp to /h/w/c on fenari
- 13:26 Reedy: Running ddsh -cM -g mediawiki-installation -o -oSetupTimeout=30 -F30 -- "sudo -u mwdeploy rm -rf /usr/local/apache/common/php-1.20wmf3"
- 13:23 Reedy: Running ddsh -cM -g mediawiki-installation -o -oSetupTimeout=30 -F30 -- "sudo -u mwdeploy rm -rf /usr/local/apache/common/php-1.20wmf2"
- 13:13 Reedy: Removed wmf2 and wmf3 from bits docroots
- 13:11 mutante: powercycling and upgrading a couple more mw* servers
- 13:04 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php
- 13:01 mutante: labs.wikimedia.org is now a redirect to labsconsole
- 12:49 mutante: authdns-update to add labs.wm entry for redirect to labsconsole
- 08:34 logmsgbot: preilly synchronized wmf-config/CommonSettings.php 'adjust $wgCollectionFormats'
- 08:16 hashar: pallium, updating jenkins build script with gerrit 14666 & gerrit 14667
- 02:26 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Mon Jul 9 02:25:57 UTC 2012
July 8
- 12:17 apergos: removed HTCPpurger.log.1 and current log after restart of purger on ms6, / was full. people reporting thumb issues from europe
- 02:26 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Sun Jul 8 02:26:21 UTC 2012
July 7
- 16:39 logmsgbot: reedy synchronized docroot/bits/DolphinBrowser/js/ 'Remove BOM'
- 16:10 logmsgbot: reedy synchronized php-1.20wmf6/extensions/cldr/ 'remove bom'
- 02:27 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Sat Jul 7 02:27:22 UTC 2012
July 6
- 21:49 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 38227 - Please enable WikiLove on Turkish Wikipedia'
- 21:48 binasher: truncating pagetriage tables on enwiki (per bsitu)
- 21:47 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 38227 - Please enable WikiLove on Turkish Wikipedia'
- 21:46 Reedy: Created WikiLove tables on trwiki
- 21:39 logmsgbot: aaron synchronized php-1.20wmf6/includes/WikiPage.php
- 21:07 maplebed: removed a bit more load from the swift spinning media container servers by adjusting the ring weights
- 20:45 logmsgbot: aaron synchronized php-1.20wmf6/includes/WikiPage.php 'rev debug logging'
- 20:22 logmsgbot: aaron synchronized php-1.20wmf6/includes/Revision.php 'es debug logging'
- 20:20 logmsgbot: aaron synchronized php-1.20wmf6/includes/WikiPage.php 'es debug logging'
- 19:49 logmsgbot: aaron synchronized php-1.20wmf6/includes/WikiPage.php 'debug logging'
- 19:30 logmsgbot: aaron synchronized php-1.20wmf6/includes/WikiPage.php 'debug logging'
- 19:24 logmsgbot: aaron synchronized php-1.20wmf6/includes/WikiPage.php 'debug logging'
- 17:28 maplebed: adjusted swift ring files to move container listings off spinning disks (ms-be1-4)
- 17:04 logmsgbot: reedy synchronized wmf-config/
- 16:53 Reedy: Running updateCollation.php in foreachwiki on fenari in screen as reedy
- 16:45 Reedy: Running updateCollation.php against enwiki on fenari in screen as reedy
- 16:39 Reedy: Running updateCollation.php against commonswiki on fenari in screen as reedy
- 16:08 logmsgbot: reedy synchronized images/wiki-en.png
- 15:54 logmsgbot: reedy synchronized images/sul/meta.png
- 15:45 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php
- 15:36 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php
- 15:21 cmjohnson1: ms-be5 removing disk0 to replace with good disk
- 14:38 RobHalsell: authdns-update to correct typo in mgmt dns entry
- 13:22 mark: Inserted new pybal 1.04 package in the precise-wikimedia APT repository, and upgraded all precise LVS servers
- 11:31 logmsgbot: tstarling Finished syncing Wikimedia installation... :
- 11:25 logmsgbot: tstarling Started syncing Wikimedia installation... :
- 11:20 Tim: running scap to delete docroot files per I48502d90 and Ie3afd137
- 08:58 mutante: dist-upgrading (unused) db10xx servers
- 07:43 Tim: also trusted-xff.phps
- 07:39 Tim: removing some untracked junk from /home/wikipedia/common
- 07:29 mutante: continue to restart and upgrade downed mw10xx servers
- 02:26 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Fri Jul 6 02:26:22 UTC 2012
- 01:35 binasher: updated squid redirector to cover wiki(quotes|books|versity)
- 00:13 Ryan_Lane: restarting gerrit
- 00:09 Ryan_Lane: force running puppet on manganese, it'll restart gerrit
July 5
- 23:54 Ryan_Lane: force running puppet on formey and manganese, since a config change is involved, it's going to restart
- 23:45 Ryan_Lane: restarting gerrit on manganese
- 23:38 Ryan_Lane: upgrading gerrit on formey
- 23:21 Ryan_Lane: updating database for gerrit
- 23:09 Ryan_Lane: upgrading gerrit on manganese
- 23:08 Ryan_Lane: stopping gerrit on manganese and disabling puppet
- 23:07 Ryan_Lane: stopping gerrit on formey and disabling puppet
- 22:47 logmsgbot: preilly synchronized wmf-config/mobile.php 'add subdomain check'
- 22:34 logmsgbot: reedy synchronized wmf-config/
- 22:17 logmsgbot: reedy synchronized wmf-config/ 'Tidying config for randomrootpage'
- 21:55 logmsgbot: reedy synchronized wmf-config/ 'Strike 2'
- 21:50 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'revert'
- 21:48 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Random root page enabled everywhere but wikipedias'
- 21:47 logmsgbot: kaldari Finished syncing Wikimedia installation... :
- 21:34 logmsgbot: kaldari Started syncing Wikimedia installation... :
- 21:18 logmsgbot: kaldari synchronized wmf-config/InitialiseSettings.php 'Bumping AFTv5 lottery percentage to 20% of en.wiki'
- 21:15 LeslieCarr: powercycling unresponsive mw1116
- 21:09 maplebed: added new ms-be pmtpa hosts to DNS
- 20:03 logmsgbot: preilly synchronized php-1.20wmf6/extensions/MobileFrontend 'fix cache issue with token'
- 19:14 Jeff_Green: *somebody* purged binary logs on blondel
- 18:58 logmsgbot: preilly synchronized php-1.20wmf6/extensions/MobileFrontend 'fix cache issue'
- 18:54 mark: Upgraded pybal on all precise LVS servers
- 18:43 mark: Built new pybal_1.03 package and inserted it into the precise-wikimedia APT repository
- 17:41 mutante: powercycling db1048, mw1136, mw1046
- 17:32 logmsgbot: midom synchronized wmf-config/InitialiseSettings.php 'reenabling db40'
- 17:30 mutante: more powercycling and upgrading: mw1036, mw1134, mw1043 ..
- 17:26 mutante: powercycling db1009,db1010
- 17:22 mutante: powercycling db1027,db1028
- 17:14 mutante: powercycling db1013
- 16:50 mutante: powercycling downed db1015
- 16:44 mutante: powercycling and upgrading more mw10xx servers, 1017,1023,1025 ...
- 16:27 mutante: mw1002, mw1007,mw1009,mw1011 - crashed,powercycling,dist-upgrading+kernel,reboot
- 16:18 mutante: powercycling mw1002
- 15:37 mutante: argon back up with new kernel,mysql,grub,.. looks happy afaict
- 15:33 mutante: argon (limesurvey) fscked, dist-upgrading
- 15:28 mutante: powercycling argon
- 15:25 hashar: updated Jenkins configuration on gallium : Updating f407ebe..4b669b9
- 14:08 cmjohnson1: HUME replacing disk 0
- 14:07 logmsgbot: midom synchronized wmf-config/InitialiseSettings.php 'disabling parser cache for now'
- 11:54 mark: Inserted new pybal_1.02 package into APT distribution precise-wikimedia
- 11:17 mark: Installed new pybal snapshot build for testing on lvs1005
- 11:13 mutante: add missing Russian locales on singer, run localegen, run ru.planet update
- 02:26 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Thu Jul 5 02:26:45 UTC 2012
July 4
- 16:32 mutante: wikidata.org on now - redirect purged from squids
- 15:43 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'imports'
- 15:40 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Enable LQT on fiwikimedia'
- 15:39 Reedy: Killed php-1.20wmf5 localisation cache from mediawiki-installation group
- 15:37 Reedy: Created LQT tables on fiwikimedia
- 15:25 mutante: wikidata.org works now, besides old redirect may still be cached on cp* boxes (not purged by purgeList.php via multicast?). www.wikidata.org/?notcached
- 15:06 mutante: sync-common-file extract2.php, apache-graceful-all
- 15:04 logmsgbot: dzahn synchronized extract2.php
- 13:23 mutante: apache-graceful-all to add wikidata.org virtual host
- 13:21 mutante: svn committing gerrit 9874, sync-apache
- 13:14 mutante: git pull in /h/w/common/docroot, adding wikidata.org files on fenari, then "sync-docroot"
- 12:44 logmsgbot: dzahn synchronized php/cache/interwiki.cdb 'Updating interwiki cache'
- 12:44 mutante: updating/syncing interwiki cache
- 09:32 hashar: swift-container-auditor seems to go down from time to time. Nagios reporting 0 processes at 8:15am and 9:25am UTC (I guess it gets restarted automatically by puppet)
- 07:34 hashar: updating Jenkins copy of integration/jenkins from 0f069c3 to e264d1b. Bring new ant script + update to testswarm fetcher
- 07:09 Tim: on srv193: ran dpkg --set-selections to revert holds on php5 packages and ran apt-get upgrade
- 07:07 Tim: on srv193: fixing broken PHP packages causing puppet failure, nothing in the server admin log about them so I assume they were installed by accident
- 06:21 Tim: deployed Idb6d9a8b and restarting apaches
- 06:11 Tim: deployed Id7008681 and restarting apaches
- 05:45 Tim: reniced apache processes to level 0
- 05:04 Tim: deploying apache nice level change per RT #664
- 03:59 Tim: on mw1: experimenting with renice methods for RT 664
- 02:26 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Wed Jul 4 02:26:39 UTC 2012
July 3
- 23:03 logmsgbot: mlitn synchronized php-1.20wmf6/extensions/ArticleFeedbackv5/
- 22:36 logmsgbot: preilly synchronized php-1.20wmf6/extensions/MobileFrontend 'weekly update'
- 22:27 hashar: updating testswarm submitter on gallium
- 21:01 logmsgbot: mlitn Finished syncing Wikimedia installation... :
- 20:37 logmsgbot: mlitn Started syncing Wikimedia installation... :
- 20:18 logmsgbot: asher synchronized wmf-config/db.php 'lowering db32 weight'
- 19:34 logmsgbot: asher synchronized wmf-config/db.php 'lowering db36 weight'
- 19:31 logmsgbot: asher synchronized wmf-config/db.php 're-add db36, db32 (low weight), es3 (innodb)'
- 18:11 RobH: virt1006 mgmt serial not set correctly, fixed
- 17:51 RobH: investigating stat1001 power issue
- 17:22 RobH: fluorine offlining to test disks
- 17:22 RobH: pulling helium offline for disk testing with fluorine disks
- 17:10 RobH: db1047 disk0 rebuild in progress
- 17:04 RobH: replacing bad disk in db1047
- 16:45 logmsgbot: reedy synchronized wmf-config/ 'Various config changes'
- 16:30 logmsgbot: reedy synchronized docroot/mediawiki/xml/export-0.7.xsd
- 16:15 Jeff_Green: silicon gets dist-upgrade & reboot
- 16:15 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'wgCheckSerialized is deaded'
- 03:53 logmsgbot: tstarling synchronized wmf-config/CommonSettings.php 're-enable API action=purge on commonswiki'
- 02:27 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Tue Jul 3 02:27:02 UTC 2012
July 2
- 22:33 binasher: rebooted db36 for kernel upgrade
- 22:25 logmsgbot: asher synchronized wmf-config/db.php 'temp pulling db36'
- 22:02 brion: fun with routers in tampa, wikis down
- 21:48 maplebed: rebooted emery - it's been unresponsive for 3 days.
- 18:34 logmsgbot: reedy synchronized wmf-config/CommonSettings.php
- 18:30 hashar: set up ignore file in httpd configuration directory
- 18:23 logmsgbot: reedy synchronized wmf-config/ 'Enable WikimediaShopLink on enwiki'
- 18:11 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: 273 wikipedias to 1.20wmf6
- 18:03 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: enwiki to 1.20wmf6
- 15:21 logmsgbot: hashar synchronized wmf-config/CommonSettings.php '/etc/wikimedia-realm detection https://gerrit.wikimedia.org/r/13888'
- 15:18 logmsgbot: hashar synchronized docroot/bits/static-master '(bug 37245) docroot 'static-master' for beta bits'
- 15:04 mutante: authdns-update to switch jobs.wm redirect to wikimedia-lb to fix SSL cert mismatch (RT-3071)
- 14:55 mark: Reboot of cr1-sdtpa did not fix the RE packet loss issue... therefore unlikely to be leap second related
- 14:41 mark: Rebooting cr1-sdtpa
- 14:37 mark: Shutdown PyBal BGP sessions on cr1-sdtpa
- 14:34 mark: Shutdown BGP session to 2828 on cr1-sdtpa
- 13:36 hashar: db12 suffering some 1400sec (and growing) replag. mysqldump in progress on that host.
- 12:35 mutante: installing upgrades on fenari (linux-firmware linux-libc-dev..)
- 12:27 mutante: rebooting gallium one more time to install kernel
- 12:26 mutante: upgrading kernel on gallium
- 12:23 logmsgbot: hashar synchronized live-1.5/CREDITS
- 11:31 mark: Now we have packet loss within pmtpa/sdtpa... reverting change
- 10:57 mark: Problems on one of two pmtpa-eqiad waves; raised OSPF metric to 60 to failover traffic to the other link
- 10:50 Tim: fixing leap second issue on bastion1 by rebooting it
- 10:47 Tim: fixed leap second issue on bastion-restricted
- 09:56 Tim: fixing leap second issue on virt1,virt2,virt3,virt4,virt5
- 09:52 Tim: fixing leap second issue on aluminium,gallium,manganese
- 09:47 Tim: fixing leap second issue on formey,grosley,hooper,sanger,sockpuppet
- 09:43 Tim: on fenari: fixed leap second issue with the mozilla method
- 09:39 apergos: rebooting gallium, it's pretty unhappy (maybe related to leap second issue)
- 08:14 logmsgbot: hashar: srv190 srv266 srv281 timeouts on sync-file
- 08:14 logmsgbot: hashar synchronized wmf-config/InitialiseSettings.php 'Bug 37457 - fix import sources for viwikibooks'
- 08:11 hashar: Stopped Jenkins on gallium. It is not doing anything anyway. Asked to reboot box RT #3208
- 02:53 logmsgbot: LocalisationUpdate completed (1.20wmf5) at Mon Jul 2 02:53:51 UTC 2012
- 02:28 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Mon Jul 2 02:28:48 UTC 2012
- 01:48 Tim: kill -CONT on populateRevisionSha1.php processes
- 00:47 Tim: on nfs1: trying leap second fix suggested at https://bugzilla.mozilla.org/show_bug.cgi?id=769972#c5
- 00:26 logmsgbot: tstarling synchronized wmf-config/db.php 'reduce db32 read load to zero due to persistent lag'
- 00:12 Tim: switched enwiki back to r/w
- 00:12 logmsgbot: tstarling synchronized wmf-config/db.php
- 00:06 Tim: on hume: stopped all populateRevisionSha1.php processes with kill -STOP
- 00:03 logmsgbot: reedy synchronized wmf-config/db.php 's1/enwiki into readonly'
July 1
- 19:12 logmsgbot: reedy synchronized php-1.20wmf6/extensions/WikimediaMaintenance/ 'Update to master for hashar'
- 17:55 logmsgbot: aaron synchronized php-1.20wmf5/includes/WikiPage.php 'more logging'
- 17:45 logmsgbot: aaron synchronized php-1.20wmf5/includes/WikiPage.php 'more logging'
- 17:43 logmsgbot: aaron synchronized php-1.20wmf5/includes/WikiPage.php 'more logging'
- 17:32 logmsgbot: aaron synchronized php-1.20wmf5/includes/WikiPage.php
- 17:30 logmsgbot: aaron synchronized php-1.20wmf5/includes/WikiPage.php
- 16:53 logmsgbot: aaron synchronized php-1.20wmf5/includes/WikiPage.php
- 16:48 logmsgbot: aaron synchronized php-1.20wmf5/includes/WikiPage.php 'logging'
- 12:54 notpeter: also going to reboot all pmtpa search nodes. not in prod, but are still freaking out from leap second bug.
- 05:33 logmsgbot: aaron synchronized php-1.20wmf5/includes/WikiPage.php 'logging'
- 04:25 logmsgbot: LocalisationUpdate completed (1.20wmf5) at Sun Jul 1 04:25:25 UTC 2012
- 04:06 Ryan_Lane: virt1000 is back up, rebooting virt0
- 04:01 Ryan_Lane: rebooting virt1000
- 03:16 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Sun Jul 1 03:16:39 UTC 2012
- 01:43 notpeter: that worked. restarting all remaining search nodes.
- 01:39 notpeter: problem with lucene persisting through service restart, but not node restart. restarting en pool nodes.
- 01:20 paravoid: restarting opendj (nfs1/nfs2), load spike, possibly related to leap second
- 00:51 notpeter: search1004 dead. powercycling.
- 00:50 notpeter: based on ganglia evidence, lucene seems to have been affected by leap second bug. restarting each instance, one minute wait in between
June 30
- 16:12 mark: Temporarily added path 6939+ 14907+ to AVOID-PATHs on cr2-knams
- 02:53 logmsgbot: LocalisationUpdate completed (1.20wmf5) at Sat Jun 30 02:53:46 UTC 2012
- 02:28 maplebed: corrected LVS pdns_recursor config error causing DNS queries to fail on LVS servers in gerrit r13554 and r13555.
- 02:27 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Sat Jun 30 02:27:08 UTC 2012
June 29
- 19:49 hashar: restarting Jenkins to fix an issue with "parameterized builds" plugin. Updated git plugin as well.
- 19:35 RobH: dns update via authdns-update for vanadium ip
- 18:05 Jeff_Green: sync-apache and apache-graceful-all for donate.wikimedia.org-->https redirect
- 16:02 RobH: ms-be1001 and ms-be1002 powering down for ssd installation
- 15:59 RobH: authdns-update run
- 15:47 RobH: updating dns
- 15:16 mutante: dist-upgrading srv280,srv270,srv264
- 15:11 Jeff_Green: apache-graceful-all for redirect conf change
- 15:10 Jeff_Green: sync-apache to push out new foundation.conf
- 14:50 mark: Reinstalled chromium with precise
- 13:46 hashar: fixed interwiki on wikisource.org/ main page by hacking a script in production and refreshing cache
- 13:46 logmsgbot: hashar synchronized php-1.20wmf6/cache/interwiki.cdb 'Updating interwiki cache for 1.20wmf6'
- 13:32 logmsgbot: hashar synchronized php/cache/interwiki.cdb 'Updating interwiki cache'
- 12:38 logmsgbot: dzahn synchronized php/cache/interwiki.cdb 'Updating interwiki cache'
- 12:38 mutante: dumping interwiki and updating interwiki cache (to fix broken interwiki links, like wikisource.org -> wikipedia.org)
- 09:31 hashar: Jenkins: deployed gitsqlhaschanged patch ( d04f779 0f069c3 integration/jenkins.git )
- 07:56 logmsgbot: hashar synchronized wmf-config/CommonSettings.php 'send header from CS.php only for non CLI scripts gerrit 13435'
- 07:08 mutante: upgrading apt packages on brewster
- 02:51 logmsgbot: LocalisationUpdate completed (1.20wmf5) at Fri Jun 29 02:51:13 UTC 2012
- 02:26 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Fri Jun 29 02:26:25 UTC 2012
June 28
- 23:30 logmsgbot: reedy synchronized php-1.20wmf6/includes/resourceloader/ResourceLoader.php
- 23:15 binasher: completed aft offload_large_feedback migration on enwiki
- 23:03 logmsgbot: asher synchronized wmf-config/db.php 'returning db36'
- 21:50 logmsgbot: kaldari Finished syncing Wikimedia installation... :
- 21:39 logmsgbot: kaldari Started syncing Wikimedia installation... :
- 21:13 logmsgbot: asher synchronized wmf-config/db.php 'temp pulling db36'
- 20:38 binasher: ran aftv5 offload_large_feedback migrations on testwiki and en_labswikimedia
- 20:14 RobH: dns update for pc1-pc3
- 19:54 logmsgbot: kaldari Finished syncing Wikimedia installation... :
- 19:08 logmsgbot: kaldari Started syncing Wikimedia installation... :
- 18:47 hashar: the internal change to CommonSettings.php caused a lack of stylesheet for less than a minute on most wikis. I did test on test.wikipedia.org and beta project, but there must be a logic error somewhere that messes with the prod projects. Revert changes have been sent out in gerrit and merged in master.
- 18:35 hashar: so the nicely reviewed changes broke the enwiki stylesheets :/ reverted change :-(((
- 18:34 logmsgbot: hashar synchronized wmf-config/CommonSettings.php
- 18:33 hashar: srv190 and srv281 got ssh timeout
- 18:31 logmsgbot: hashar synchronized wmf-config/CommonSettings.php
- 18:30 hashar: did various tests using eval.php. Most important is $realm -> production. $cluster -> pmtpa. Syncing
- 18:25 hashar: updating mediawiki-config to grab a12545d edceb4c & eee97ad
- 18:23 RobHalsell: swapped bad psu out of ms1001-array3, redundant so no downtime
- 15:40 RobHalsell: pulling the following servers, relocating to payments rack: payments1001-1004, boron, beryllium, lithium
- 15:34 RobHalsell: dns updated
- 15:31 RobHalsell: boron appears to be unallocated, pulling IP allocation, rack allocation, moving to payments per 1227
- 14:40 mutante: svn server is rebooting. brb
- 14:38 mutante: dist-upgrading formey (svn/gerrit), rebooting soon
- 14:38 Jeff_Green: manganese rebooted for kernel update
- 14:37 RobH: allocating yttrium to payments rack per rt 1227
- 14:24 Jeff_Green: manganese dist-upgrade
- 14:15 Ryan_Lane: restarting apache on manganese
- 14:15 Ryan_Lane: restarting gerrit
- 05:08 Tim: srv266 was flooding the fatal error log, complaining about a missing file. Killed apache and ran sync-common.
- 05:03 Tim: fixed fatal.log on fenari, socat was writing to a deleted file
- 02:58 logmsgbot: LocalisationUpdate completed (1.20wmf5) at Thu Jun 28 02:58:42 UTC 2012
- 02:30 logmsgbot: LocalisationUpdate completed (1.20wmf6) at Thu Jun 28 02:30:03 UTC 2012
June 27
- 23:50 K4-7131: sync'd payments cluster to 592e0a5ba195
- 23:43 logmsgbot: preilly synchronized wmf-config/InitialiseSettings.php 'add rule for mediawiki'
- 22:37 K4-7131: sync'd payments cluster to 7e9072c2d571c
- 22:23 binasher: temporarily pulling srv211 from pybal
- 21:56 RobH: mw1102 has no nic0, rather than troubleshoot it for a long time, reinstall! (rt 3058)