Mon, 26 Jun 2006

Starting down the ActiveSalesforce path, my first goal was to do a simple dump of a class of objects to yaml using the API.


$ gem install activesalesforce
$ cat - <<EOF > dump_accounts.rb
#!/usr/bin/ruby

require 'rubygems'
require_gem 'activerecord'
require_gem 'activesalesforce'

ActiveRecord::Base.logger = Logger.new(STDERR)

ActiveRecord::Base.establish_connection(
  :adapter => "activesalesforce",
  :url => "https://www.salesforce.com/services/Soap/u/7.0",
  :username => "yourlogin@yourdomain.com",
  :password => "yourpassword"
)

class Account < ActiveRecord::Base
end

puts Account.find(:all).to_yaml
EOF
$ chmod u+rx dump_accounts.rb
$ ./dump_accounts.rb > accounts.yml

Next step: figure out how to handle user authentication for Account Contacts...

Wed, 21 Jun 2006

The following is a post I've just made to the pgcluster-general mailing list. As it is blog worthy, it seemed appropriate to post here.

I've been testing a pgcluster running 1.5.0rc7 with pgbench 8.0.2.

I have 6 servers in a cluster:

  • 2 pglb servers (2.6 kernel debian, amd sempron 2800, 1G RAM, 2 IDE drives software RAID1)
  • 2 pgreplicate servers (2.6 kernel debian, amd64x2, 4G RAM, 2 IDE drives software RAID1)
  • 2 postgres database servers (2.6 kernel debian, amd64x2, 4G RAM, 4 IDE drives software RAID10)

The pgbench page is:

http://www.sitening.com/tools/postgresql-benchmark/

It's a simple build:

$ wget http://www.sitening.com/pgbench.8.0.2.c
$ gcc -I/usr/include/postgresql -o pgbench pgbench.8.0.2.c -lpq -lm

After a bit of postgresql tweaking, I'm finally getting some good numbers (see below).

Things to remember when installing pgcluster:

  1. Your fully qualified hostnames must resolve and match the config.
    • add entries to /etc/hosts if you must, but make sure everything uses actual resolvable hostnames.
  2. Watch your user process limit (ulimit -u unlimited).
    • on the pglb master: pglb will spawn a thread for each pooled connection.
    • on the pgreplicate master: pgreplicate goes absolutely insane with threads.
    • on db nodes: postgres forks a backend process for each incoming connection.
  3. Don't forget about the cluster.conf buried in the postgres server configuration directory on the db nodes.
  4. When you run things with "-v", expect a huge slowdown.
    • pglb -v drops from 12k tps (using "pgbench -S" for select() only) to only 6 tps (more than 3 orders of magnitude).
    • pgreplicate -v drops to below 1 tps (2 orders of magnitude).
  5. Set up ssh key trust between servers using the userid that postgres runs as (usually "postgres").
  6. Remember to start slaves with pg_ctl -o "-i -R" the first time to pull down the rsync of the master.
    • this killed most of my "weird" deadlocks with select() only pgbench right away.
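
The hostname item in the checklist above is easy to verify before starting any daemons. The following is a minimal sketch, assuming plain Ruby with only the standard library; the helper name and the host list are placeholders, not anything shipped with pgcluster. It only checks that each name resolves, not that it matches what is written in the cluster config.

```ruby
require 'socket'

# Hypothetical helper: return the hostnames that do NOT resolve.
def unresolvable_hosts(hostnames)
  hostnames.reject do |name|
    begin
      # Addrinfo.getaddrinfo raises SocketError when a name cannot be resolved.
      !Addrinfo.getaddrinfo(name, nil).empty?
    rescue SocketError
      false
    end
  end
end

# Placeholder list; substitute the hostnames from your own cluster config.
cluster_hosts = %w[localhost]

bad = unresolvable_hosts(cluster_hosts)
if bad.empty?
  puts 'all cluster hostnames resolve'
else
  puts "unresolvable: #{bad.join(', ')}"
end
```

Running this on every node (pglb, pgreplicate, and the db nodes) catches the case where a name resolves on one box but not on another.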

Back to the pgbench numbers.

The fastest mode of operation is select() only (pgbench -S):


$ ./pgbench -S -n -v -c 10 -t 1000 -m 10
  transaction type: SELECT only
  scaling factor: 1
  number of clients: 10
  number of transactions per client: 1000

  number of transactions actually processed: 10000/10000
  tps = 8394.268729 (including connections establishing)
  tps = 12815.846538 (excluding connections establishing)

  mean tps = 8399.209915 (including connections establishing)
  standard deviation = 202.800265

  mean tps = 12574.325714 (excluding connections establishing)
  standard deviation = 428.241079

Running this in parallel with an insert()/select() mix doesn't seem to impact it much: the select()-only numbers drop by only 1k-2k tps or so.
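
When comparing several runs, it can help to pull the tps figures out of the pgbench summary rather than eyeballing them. This is a rough sketch, assuming the summary lines look exactly like the output quoted above; the helper name is made up for illustration.

```ruby
# Hypothetical helper: extract the two "tps = ..." figures from pgbench output.
# The "mean tps = ..." lines are deliberately not matched.
def parse_tps(output)
  result = {}
  pattern = /^\s*tps = ([\d.]+) \((including|excluding) connections establishing\)/
  output.scan(pattern) do |value, kind|
    result[kind.to_sym] = value.to_f
  end
  result
end

# Sample text copied from the select()-only run above.
sample = <<~OUT
  tps = 8394.268729 (including connections establishing)
  tps = 12815.846538 (excluding connections establishing)
OUT

tps = parse_tps(sample)
puts format('including: %.1f tps, excluding: %.1f tps', tps[:including], tps[:excluding])
```

Feeding it the saved output of a standalone run and of a run taken alongside a write workload makes the 1k-2k tps difference easy to track over time.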

To run an insert()/select() mix, run pgbench with the -N flag (it still updates accounts, but skips the updates to tellers and branches):


$ ./pgbench -N -n -v -c 10 -t 1000

  transaction type: Update only accounts
  scaling factor: 1
  number of clients: 10
  number of transactions per client: 1000

  number of transactions actually processed: 10000/10000
  tps = 115.539752 (including connections establishing)
  tps = 116.069260 (excluding connections establishing)

These numbers are to be expected with a synchronous replication system like pgcluster. As long as the select() to insert()/update() ratio is at least 9:1 things should be usable.
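
To get a feel for why the read/write ratio matters so much, here is a back-of-the-envelope sketch. It assumes a deliberately crude model that is mine, not pgcluster's: every transaction costs 1/tps seconds at the rate measured for its type, so the blended rate is a weighted harmonic mean. Real concurrent behaviour will differ.

```ruby
# Measured figures from the runs above (excluding connection setup).
SELECT_TPS = 12815.846538
WRITE_TPS  = 116.069260

# Assumed model: a transaction of a given type takes 1/tps seconds,
# so a mixed workload runs at the weighted harmonic mean of the two rates.
def blended_tps(read_fraction, read_tps = SELECT_TPS, write_tps = WRITE_TPS)
  1.0 / (read_fraction / read_tps + (1.0 - read_fraction) / write_tps)
end

[0.5, 0.9, 0.99].each do |fraction|
  percent = (fraction * 100).round
  puts format('%2d%% reads -> roughly %.0f tps', percent, blended_tps(fraction))
end
```

Under this model a 9:1 mix lands on the order of a thousand tps, while a 1:1 mix is barely better than the write-only figure, which is the intuition behind the 9:1 rule of thumb.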

In the full "TPC-B (sort of)" mode, pgbench also throws updates to the tellers and branches tables into the mix.

This is where pgbench starts to deadlock for me.

You can add the "-d" flag to pgbench to debug things if it seems frozen.

The first pgcluster "bug":

It looks like I deadlock almost immediately after spawning pgbench with the following arguments:


$ ./pgbench -n -v -c 1 -t 1000 -m 1 -d
  pghost:  pgport: (null) nclients: 1 nxacts: 1000 dbName:
  message type 0x43 arrived from server while idle
  message type 0x5a arrived from server while idle
  client 0 sending begin
  client 0 receiving
  client 0 sending update accounts set abalance = abalance + 216 where aid = 52606

  client 0 receiving
  client 0 sending select abalance from accounts where aid = 52606
  client 0 receiving
  client 0 sending update tellers set tbalance = tbalance + 216 where tid = 7

  client 0 receiving
  client 0 sending update branches set bbalance = bbalance + 216 where bid = 1

  *deadlock*

The odd part is that only that pgbench seems hung. I can spawn any number of "pgbench -S" and "pgbench -N" sessions I want while that one is stuck, and things seem to continue running.

My second pgcluster "bug":

While doing this testing, I've found that pglb chokes if you request more client connections than it can handle.

In my testbed, I upped the max connections to 300 per server (each server tuned to allow 500 client connections), leaving me with 600 pooled connection threads running on my pglb server.

If I hit pglb, with its 600 available pooled connections, with, say, 1000 pgbench connection attempts, pglb goes into a dead state and refuses to accept more connections, even after pgbench is killed.


$ ./pgbench -S -n -v -c 10 -t 10000 -m 1
  transaction type: SELECT only
  scaling factor: 1
  number of clients: 10
  number of transactions per client: 10000

  number of transactions actually processed: 100000/100000
  tps = 11486.270576 (including connections establishing)
  tps = 11991.783709 (excluding connections establishing)

$ ./pgbench -S -n -v -c 1000 -t 10 -m 1
  Connection to database '' failed.
  pglb could not connect to server: no cluster available.
$ ./pgbench -S -n -v -c 10 -t 10 -m 1
  Connection to database '' failed.
  pglb could not connect to server: no cluster available.
or
  Sorry, backend connection is full

After this state occurs, I need to kill off pglb and restart it, and sometimes this doesn't fix it (and I need to go through and restart the replication servers and the database servers).
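
Until pglb handles this gracefully, one workaround is simply to never ask for more clients than the pool can serve. The sketch below assumes the testbed figures mentioned above (2 db servers at 300 connections each); the helper names are hypothetical, and the script only prints the pgbench command rather than running it. It avoids triggering the dead state; it does not fix it.

```ruby
# Figures from the testbed described above; adjust for your own cluster.
DB_SERVERS         = 2
MAX_CONNS_PER_NODE = 300

# Total pooled connections pglb can hand out.
def pool_capacity(servers = DB_SERVERS, per_node = MAX_CONNS_PER_NODE)
  servers * per_node
end

# Hypothetical guard: clamp a requested client count to the pool size.
def safe_client_count(requested, capacity = pool_capacity)
  [requested, capacity].min
end

requested = 1000
clients = safe_client_count(requested)
warn "requested #{requested} clients, capping at #{clients}" if clients < requested

# Print the command instead of executing it.
puts "./pgbench -S -n -v -c #{clients} -t 10 -m 1"
```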

In conclusion:

I actually have 3 distinct pgclusters going here, each containing 6 of the aforementioned servers, for a total of 18 servers:

  • one dev cluster
  • one qa cluster
  • one production cluster

My question to the pgcluster list is: what version of pgcluster is "stable" enough to be used in a production environment?

I'd rather not need to go the direction of Slony-I with something like pgpool or dbbalance (to shunt writes to the master), first because of the complexity of managing these layers, and second because of the loss of data coherency between the master and slaves (I want atomic synchronous replication).

Then again, this is all for hosting a Ruby on Rails application. We can make application changes as needed.

I hope this helps.

Tue, 06 Jun 2006

Subversion is a wonderful revision control system, until it breaks.

Historically, the BDB store was often corrupted, and a "svnadmin recover" was required to rebuild the Berkeley databases.

With newer versions of Subversion, FSFS has taken over as the preferred repository store. Faster and much more reliable than BDB, it has made Subversion far more stable than it has been in the past.

Unfortunately, FSFS appears to continue to have rare random corruption issues. Subversion developers are actively working on tracking down the cause, but it remains elusive at the moment.

Though much rarer than its BDB counterpart, repairing a corrupt FSFS store isn't as simple as running "svnadmin recover" - recover only works with BDB.

Recently, we stumbled upon such a case. The symptoms look much like those in the FAQ entry on apr 0.9.6, with one difference: local file:// checkouts errored out just the same. As that is an Apache bug fix for poll(), and a local checkout really shouldn't be using poll() at all, it did not look like the same problem.

Sadly, my users were continuing to commit changes to their subtrees in the repository. As long as they didn't try checking out the corruption affected tree, they could continue doing their work.

Running "svnadmin verify" results in the same error as the checkout, and "svnadmin dump" fails just the same (as dump and verify appear to be very similar in nature). Without the ability to dump, it is nigh impossible to back up the revision history to restore to a repository elsewhere. The only daunting alternative was to check out the various subtrees and hand-commit each revision checkout to another repository... I was surprised to find that an automated script to do such a thing hadn't been written yet.

Step 1 was to visit #svn on irc.openprojects.net and voice my concern at potentially finding an unanswered bug.

The folks on #svn immediately replied "go to #svn-dev and be prepared to back your claims".

Step 2 then was to visit #svn-dev and lay out the above facts once again.

In the end, the developers suggested I send an email with all of the facts to the subversion user mailing list.

So I put the IRC chat together with the facts and posted "Repository corruption? Problem similar to FAQ#tiger-apr-0.9.6" to the subversion user list.

John Szakmeister soon replied with a suggestion to try:

http://www.szakmeister.net/fsfsverify/

This Python script verifies the transactions in a given FSFS revision, and potentially repairs some of the more common problems found.

When run, the following error presented itself:


$ fsfsverify.py db/revs/653
...
NodeRev Id: 1st.g.r653/17936924
 type: file
 pred: 1st.g.r611/34703561
 text: DELTA 653 1668558 16257066 24194048 c0bd2a8b7ee4db1ee816ea607392755d
 prop: UNKNOWN 405 9727810 53 0 113136892f2137aa0116093a524ade0b
 cpath: /cpp/Client/IE/Project/src/Observer.ncb
 copyroot: 178 /cpp/Client/IE
starting length: 16257062
offset: 1668583
Decoded too many bytes
total: 14384890
remaining: 1872172
Traceback (most recent call last):
  File "/root/fsfsverify.py", line 699, in ?
    process(noderev, rev_file, options.dump_instructions,
options.dump_windows)
  File "/root/fsfsverify.py", line 652, in verify
    dump_windows)
  File "/root/fsfsverify.py", line 289, in verify
    digest = parse_svndiff(f, self.length, dump_instructions, dump_windows)
  File "/root/fsfsverify.py", line 188, in parse_svndiff
    raise 'svndiff error'
svndiff error

John mentioned that Andrew MacKenzie had also reported a similar issue, but that he would need a copy of the revision to verify it was the same problem.

Attempting to use fsfsverify.py's "-f" or "--fix-read-length-line-error" option didn't seem to change anything at all.

I posted the revision somewhere John could access it.

After looking at it, John thinks this may be a new kind of corruption, and he'll work something into fsfsverify.py to fix it when he can find the time.

In the interim, John suggested truncating the file in the node revision using the following command:


fsfsverify --truncate=1st.g.r653/17936924 653

"This command will basically truncate the file to 0 length in that revision." - John.

Hopefully others will find this blog post along their search for an FSFS corruption fix and consider this potential "fix".

I'm not entirely sure what the data loss will entail, but it may very well solve your immediate problems dumping and restoring the repository elsewhere.

Note: always do an "svnadmin hotcopy" to make a test repository to test on, and immediately take a backup if you can.
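
As a rough illustration of that note, the sketch below builds the two svnadmin commands for a scratch-copy test and prints them for review instead of running them. The paths and the helper name are placeholders I made up; "svnadmin hotcopy" and "svnadmin verify" are the standard subcommands.

```ruby
require 'shellwords'

# Hypothetical helper: build (but do not run) the commands for a scratch test.
def scratch_test_commands(repo, scratch)
  [
    # Copy the live repository so any repair attempt touches only the copy.
    ['svnadmin', 'hotcopy', repo, scratch],
    # Expected to fail on the corrupt revision until the copy is repaired.
    ['svnadmin', 'verify', scratch]
  ]
end

# Placeholder paths; substitute your own repository and scratch locations.
scratch_test_commands('/srv/svn/repo', '/tmp/repo-scratch').each do |cmd|
  puts cmd.shelljoin
end
```

Any repair attempt, such as the truncate suggested above, would then be pointed at the scratch copy first, and the live repository touched only once verify passes there.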
