Wed, 30 Nov 2005

Copy on Write (CoW)

First off, let's decide how we're going to build our filesystems. While there is CopyOnWrite (CoW) support (LVM writable persistent snapshots), it isn't 100% reliable yet and doesn't handle out-of-space conditions very well. Because of this, I am going to avoid using it.

That doesn't mean we shouldn't understand it a bit first though:

Creating the "virgin" backing store volume:

        # lvcreate -n virgin -L 4G vg
        # mkfs -t xfs /dev/vg/virgin
        # mount /dev/vg/virgin /mnt
        # debootstrap sarge /mnt http://source.rfc822.org/debian
        # vi /mnt/etc/fstab 
        # umount /mnt

Creating a clone filesystem:

        # lvcreate -s -n myclonedisk1 -L 1G /dev/vg/virgin

This new volume ("myclonedisk1") can handle up to 1G of "block differences" before it runs out of space. To that end, you will need to periodically grow the block device depending on the space remaining:

        # lvextend -L +1G /dev/vg/myclonedisk1

Can you see the danger here? For each clone disk snapshot, you will need to monitor the space used to see if enough space remains, and grow it whenever usage approaches some threshold. If something goes crazy and rapidly makes changes to a filesystem, a monitoring script in dom0 may not catch the change in time, and you may end up with a fatally corrupted volume.
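To make that monitoring burden concrete, here is a minimal sketch of the kind of dom0 check it would take. The 80% threshold, the 1G step, and the function names are assumptions for illustration. The commented-out lvs call assumes your lvm2 build reports a snap_percent field, so check that before relying on it.

```shell
#!/bin/sh
# Sketch only: decide when an LVM snapshot should be grown.
# THRESHOLD (percent full) and STEP are illustrative assumptions.
THRESHOLD=${THRESHOLD:-80}
STEP=${STEP:-1G}

# needs_growth USED LIMIT: succeeds when USED >= LIMIT (both percentages).
# awk does the comparison because usage is reported as a fraction, e.g. "85.20".
needs_growth() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used + 0 >= limit + 0) }'
}

# plan_growth VG/LV USED: print the lvextend command to run, if any.
# It only prints, so a wrapper can log the command or execute it.
plan_growth() {
    if needs_growth "$2" "$THRESHOLD"; then
        echo "lvextend -L +$STEP /dev/$1"
    fi
}

# Example wiring (not run here), assuming lvs supports snap_percent:
#   used=$(lvs --noheadings -o snap_percent vg/myclonedisk1 | tr -d ' ')
#   plan_growth vg/myclonedisk1 "$used"
```

Even with something like this running from cron, a burst of writes between two runs can still fill the snapshot.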

For this reason, I am avoiding it.

XenU RAID1 vs dm-mirror

Rather than use the somewhat experimental dm-mirror support for mirrored volumes, we're going to leave the mirroring up to the XenU domains to do themselves.

Let's create a domain that runs on "node0", the first cluster node:

Create some volumes.

        # lvcreate -n blenke-web-00_mirror0 -L 4G vg /dev/md3
        # lvcreate -n blenke-web-00_mirror1 -L 4G vg /dev/etherd/e0.1

Fill the primary volume:


        # mkfs -t xfs /dev/vg/blenke-web-00_mirror0
        # mount /dev/vg/blenke-web-00_mirror0 /mnt
        # debootstrap sarge /mnt http://source.rfc822.org/debian
        # vi /mnt/etc/fstab
        # echo blenke-web-00 > /mnt/etc/hostname

Rather than running debootstrap for every domain, I strongly suggest doing this once and rsyncing other images from this base tree somewhere in your management infrastructure.

Now that the volumes exist, here is a XenU configuration that would use them:

        # cat - <<EOF > /etc/xen/auto/blenke-web-00
        kernel = "/boot/vmlinuz-2.6-xenU"
        memory = 64
        cpu = -1 # Xen should allocate a proc to run on.
        vcpus = 1 # We only want 1 CPU for this domain (Xen 3.0 SMP!)
        name = "blenke-web-00"
        nics = 1
        vif = [ 'mac=aa:00:0a:00:00:0a, bridge=xenbr0' ]
        ip = "10.0.0.10"
        disk = [ 'phy:vg/blenke-web-00_mirror0,sda1,w',
                 'phy:vg/blenke-web-00_mirror1,sda2,w' ]
        root = "/dev/md0 ro"
        EOF
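
One convention that fits the example above is to derive the MAC from the domain's IP address: an "aa:00" prefix followed by the four IP octets in hex, so 10.0.0.10 becomes aa:00:0a:00:00:0a. That convention is an assumption made for this illustration, not something Xen requires, and the helper names are made up. A minimal sketch:

```shell
#!/bin/sh
# Sketch: derive a vif MAC from a dotted-quad IP, using an assumed
# "aa:00:" prefix followed by the four IP octets in hex.
ip_to_mac() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into $1..$4 on the dots
    IFS=$old_ifs
    printf 'aa:00:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
}

# vif_line IP BRIDGE: print a vif entry in the format used in the config.
vif_line() {
    printf "vif = [ 'mac=%s, bridge=%s' ]\n" "$(ip_to_mac "$1")" "$2"
}

ip_to_mac 10.0.0.10        # prints aa:00:0a:00:00:0a
vif_line 10.0.0.10 xenbr0  # prints vif = [ 'mac=aa:00:0a:00:00:0a, bridge=xenbr0' ]
```

Generating the line this way keeps MACs unique as long as the IPs are.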

(more to come)

Wed, 30 Nov 2005

This is a summary of the GFS wiki instructions, as applied to our new cluster.

First, get fenced running:

        # fence_tool join

Next, create the GFS filesystem:

        # gfs_mkfs -p lock_dlm -t <ClusterName>:<FSName> -j <Journals> <Device>

        <ClusterName> must match the cluster name used in CCS config
        <FSName> is a unique name chosen now to distinguish this fs from others
        <Journals> the number of journals in the fs, one for each node to mount
        <Device> a block device, usually an LVM logical volume

For a 2-node setup ("node0" and "node1"), you might use:

On node0:

        # lvcreate -n shared_node0 -L 10G vg /dev/md3
        # lvcreate -n shared_node1 -L 10G vg /dev/etherd/e0.1

        # gfs_mkfs -p lock_dlm -t blenke:shared_node0 -j 2 /dev/vg/shared_node0
        # gfs_mkfs -p lock_dlm -t blenke:shared_node1 -j 2 /dev/vg/shared_node1

On both:

        # mkdir -p /shared/node0 /shared/node1
        # mount -t gfs /dev/vg/shared_node0 /shared/node0
        # mount -t gfs /dev/vg/shared_node1 /shared/node1
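
The pieces of the gfs_mkfs command line are easy to mix up, so here is a small sketch that assembles it from its four inputs. The function name is made up, and it only prints the command rather than formatting anything. It assumes the logical volumes live in the volume group named "vg", so their device nodes sit under /dev/vg.

```shell
#!/bin/sh
# Sketch: assemble the gfs_mkfs command line; nothing is formatted here.
# gfs_mkfs_cmd CLUSTER FSNAME NODES DEVICE
gfs_mkfs_cmd() {
    # -t takes <ClusterName>:<FSName>; -j gets one journal per mounting node
    echo "gfs_mkfs -p lock_dlm -t $1:$2 -j $3 $4"
}

# Print the commands for the two shared volumes of a 2-node cluster.
for n in 0 1; do
    gfs_mkfs_cmd blenke "shared_node$n" 2 "/dev/vg/shared_node$n"
done
```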

Remember: GFS filesystems, while accessible by both nodes, ARE NOT MIRRORED. You create the GFS filesystem on a shared block device. If the block device happens to be on one server or the other, when that server is rebooted, the other nodes will be unable to access that filesystem.

For cluster mirroring, look for dm-mirror and the lvcreate -m option. The dm-mirror kernel module is made up of dm-raid1 and dm-log, which RedHat is working on right now (see "LVM2 Mirroring for RHEL4"). Currently only pvmove and lvcreate -m use this kernel module (if you have a recent lvm2 build), and you're really on your own.

If you have a cluster of at least 3 nodes (at least 3 PVs in the cluster VG), you can create a mirrored volume. One PV will get one half of the mirror, one PV will get the other half of the mirror, and one PV will get the mirror log volume.

        # lvcreate -m 1 -n mirror1 --alloc anywhere -L 4G vg
        Logical volume "mirror1" created
        # lvscan
        ACTIVE            '/dev/vg/mirror1' [4.00 GB] anywhere
        ACTIVE            '/dev/vg/mirror1_mlog' [4.00 MB] anywhere
        ACTIVE            '/dev/vg/mirror1_mimage_0' [4.00 GB] inherit
        ACTIVE            '/dev/vg/mirror1_mimage_1' [4.00 GB] inherit

Wed, 30 Nov 2005

First, create a Physical Volume for the local RAID10 stripe, then for the remote RAID10 stripe via AoE:

    pvcreate /dev/md3
    pvcreate /dev/etherd/e0.1

This is where that extra RAID stripe comes in. The first pv is for the stripe on this cluster node, the second is for the stripe on the other cluster node.

Next, create a Volume Group that contains both Physical Volumes:

    vgcreate vg /dev/md3 /dev/etherd/e0.1

This creates a "vg" volume group that is visible from both cluster nodes, where volumes can be carved out as needed between them.

(Note: This does not mirror the pv's. That's what the -m flag to lvcreate is for. Alternatively, the XenU domain must do software RAID1 to accomplish this goal.)

Wed, 30 Nov 2005

lvm2 is an entirely userspace abstraction that uses the devmapper kernel module to present volumes carved out of physical block device space.

lvm2 has a cluster manager called "clvmd" that registers with cman to communicate with other cluster nodes to act in a cluster configuration. With clvmd, lvm2 becomes a cluster-wide naming system for volumes carved up out of network exposed block devices, and a locking engine for the same.

        # apt-get install lvm2

Or build from CVS:

        # cvs -d :pserver:cvs@sources.redhat.com:/cvs/lvm2 login cvs
        # cvs -d :pserver:cvs@sources.redhat.com:/cvs/lvm2 checkout LVM2
        # cd LVM2 ; ./configure --with-clvmd=cman --with-confdir=/etc/lvm --prefix=/usr && make && make install

After the cluster is configured and running ("ccsd" and "cman"), and lvm2 is installed, we need to edit /etc/lvm/lvm.conf to make this a cluster aware setup.

        # vi /etc/lvm/lvm.conf

In devices {}, Add:

        filter = [ "a|/dev/etherd/*|" ]
        types = [ "aoe", 1024 ]
        sysfs_scan = 0

In global {}, comment out:

        # locking_type = 1

just below that, in global {}, uncomment or add:

        locking_library = "liblvm2clusterlock.so"
        locking_type = 2
        library_dir = "/lib/lvm2"

Then save, and start up clvmd (make sure cman is running first, and the node is part of the cluster):

        # clvmd &
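
If you want to double-check the lvm.conf edit, a sketch like the following confirms that cluster locking is actually switched on. The function name is made up, and it only looks at uncommented locking_type lines.

```shell
#!/bin/sh
# Sketch: confirm that lvm.conf has cluster locking switched on.
# cluster_locking_enabled FILE: succeeds when an uncommented
# "locking_type = 2" line is present and no uncommented
# "locking_type = 1" line remains.
cluster_locking_enabled() {
    grep -Eq '^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*2([^0-9]|$)' "$1" &&
    ! grep -Eq '^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*1([^0-9]|$)' "$1"
}

# Usage:
#   cluster_locking_enabled /etc/lvm/lvm.conf && echo "cluster locking is on"
```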

You can now scan for volume groups:

        # vgscan

NOTE: lvm2 does not scan AoE devices by default. In fact, if you have sysfs_scan enabled it will not find AoE devices at all, even if you add a filter that matches them. Moreover, lvm2 will only find AoE devices with the major number listed in /proc/devices:

        # grep aoe /proc/devices
        152 aoechr
        152 aoe

This means that all of the AoE devices you wish to scan must start with a major number of 152. If you look at /dev/etherd, you will see 16 "partition" devices for each shelf/slot device by default. Using 16 partitions, as AoE assigns minor numbers linearly, the crossover to major 153 happens just after "e1.5p14". This means that you really only have all of one shelf visible to lvm2, and part of a second (a maximum of 16 devices.. not good for a large cluster of more than 16 nodes).

One "fix" is to edit drivers/block/aoe/aoe.h in your kernel source and replace "AOE_PARTITIONS 16" with "AOE_PARTITIONS 1":

        # perl -pi -e 's/(AOE_PARTITIONS 1)6/$1/g' drivers/block/aoe/aoe.h

Alternatively, set AOE_PARTITIONS=1 when building your kernel

        # make ARCH=xen AOE_PARTITIONS=1 oldconfig clean bzImage modules modules_install

Rebuild your kernel, then re-generate your /dev/etherd devices using the n_partitions variable:

        # n_partitions=1 aoe-mkdevs /dev/etherd

This really fixes the problem, and lvm2 can scan all of the AOE shelf/slot devices!
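
The arithmetic behind the limit is simple enough to sketch. It assumes only 256 minor numbers are usable under the single aoe major (152), which is what the 16-device ceiling above implies. The function name is made up for the illustration.

```shell
#!/bin/sh
# Sketch of the device-count arithmetic.
# Assumption: 256 minor numbers are available under the single aoe major.
MINORS_PER_MAJOR=256

# aoe_devices_visible PARTITIONS: how many shelf/slot devices fit
# when each device reserves PARTITIONS minor numbers.
aoe_devices_visible() {
    echo $(( MINORS_PER_MAJOR / $1 ))
}

aoe_devices_visible 16   # prints 16
aoe_devices_visible 1    # prints 256
```

Dropping the per-device partition count from 16 to 1 is what takes the ceiling from 16 devices to 256.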

Wed, 30 Nov 2005

When configuring the RedHat clustering, you must create a cluster.conf which will exist on every node.

 # vi /etc/cluster/cluster.conf

This is an example 2 node configuration, with manual fencing:

        <?xml version="1.0"?>
         <cluster name="blenke" config_version="1">
         <clusternodes>
          <clusternode name="smart" nodeid="1" votes="1">
           <fence>
            <method name="human">
             <device name="last_resort" ipaddr="smart.ssn.blenke.net"/>
            </method>
           </fence>
          </clusternode>
          <clusternode name="stupid" nodeid="2" votes="1">
           <fence>
            <method name="human">
             <device name="last_resort" ipaddr="stupid.ssn.blenke.net"/>
            </method>
           </fence>
          </clusternode>
         </clusternodes>
         <fencedevices>
          <fencedevice name="last_resort" agent="fence_manual"/>
         </fencedevices>
         <cman port="6809" two_node="1" expected_votes="1">
         </cman>
        </cluster>
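
One easy mistake in this file is a fence device name that doesn't match between a node's device entry and the fencedevices section. Here is a rough sketch of a check for that. The function name is made up, and it uses plain sed/grep pattern matching rather than a real XML parser, so it assumes one element per line as in the example.

```shell
#!/bin/sh
# Sketch: check that every <device name="X"> used by a node has a
# matching <fencedevice name="X"> definition. Pattern matching only,
# not a real XML parser.
# fence_names_ok FILE
fence_names_ok() {
    conf=$1
    for dev in $(sed -n 's/.*<device name="\([^"]*\)".*/\1/p' "$conf" | sort -u); do
        if ! grep -q "<fencedevice name=\"$dev\"" "$conf"; then
            echo "no fencedevice named $dev"
            return 1
        fi
    done
}

# Usage:
#   fence_names_ok /etc/cluster/cluster.conf && echo "fence names match"
```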

   

Once the config file is created, we start ccsd. The ccs daemon keeps the configuration in sync between cluster nodes.

    # /etc/init.d/ccsd start

Next, join the cluster with cman. The cman kernel module is the cluster manager. It uses dlm locking and a heartbeat thread to form a quorum of nodes that are part of the cluster.

    # cman_tool join

This will join, or create, a cluster.

Wed, 30 Nov 2005

On a cluster server, the goal is to share storage with other nodes in the cluster.

Each cluster server node is going to share the entire /dev/md3 stripe as a single large block device to the other clvm'ed nodes.

Each "shared" cluster stripe will be defined as an AoE shelf/slot.

# vblade 0 0 eth1 /dev/md3

This will create a device "/dev/etherd/e0.0" shared over the eth1 network interface between the cluster nodes on the shared private storage network. Only the other nodes will see this device; locally, you must continue to reference it as /dev/md3. LVM2 will automagically scan this device and include it when re-assembling the cluster volume group on boot.

For production use, as vblade doesn't fork, the easiest way to keep vblade running is to add it to inittab as respawn.

on node0:

# echo "e0:2:respawn:/usr/sbin/vblade 0 0 eth1 /dev/md3" >> /etc/inittab
# init q

on node1:


# echo "e1:2:respawn:/usr/sbin/vblade 0 1 eth1 /dev/md3" >> /etc/inittab
# init q

You should see output from vblade starting up appear in /var/log/daemon. On the other node, aoe-discover followed by aoe-stat should show the device:

# aoe-interfaces eth1
# aoe-discover
# aoe-stat
e0.0       306.440GB   eth1 up

Note: as this is at the end of /etc/inittab, and running in runlevel 2, the rc2 script will need to finish first before init starts respawning vblade. To expose the aoe device to the network before this point (if you really must), just put this line before the rc2 line in /etc/inittab.
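
Appending with echo is fine once, but running it a second time leaves a duplicate entry in inittab. Here is a minimal sketch of a safer version. The function names are made up, and the "e" plus slot number id just follows the two lines used above.

```shell
#!/bin/sh
# Sketch: build the inittab entry for a vblade export and append it
# only if it is not already present. Function names are illustrative.

# vblade_entry SHELF SLOT IFACE DEVICE
# The inittab id is "e" plus the slot number, as in the lines above.
vblade_entry() {
    echo "e$2:2:respawn:/usr/sbin/vblade $1 $2 $3 $4"
}

# append_once FILE LINE: add LINE to FILE unless an identical line exists.
append_once() {
    grep -Fxq -- "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}

# Usage on node1 (follow it with "init q" as before):
#   append_once /etc/inittab "$(vblade_entry 0 1 eth1 /dev/md3)"
```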

Wed, 30 Nov 2005

Both aoetools and vblade (the ATA over Ethernet target) have debian packages. If you can't apt-get install them straightaway, drop me an email, and I'll post the backports of these to woody.

This step needs a bit more documentation (will fill it in shortly).

Wed, 30 Nov 2005

During a "make dist", the build process looks in xen-unstable/dist/install/boot/config-2.6.12.6-xen0 (or -xenU) for the config file to use, and will override the default w/ those files if they exist.

It's generally best to remove the xen-unstable/linux-2.6.12-xen? directories between builds if the Xen tree has been updated; safer that way.

You will need to change the Xen0 2.6.12 kernel so that it builds with devmapper (dm) support, and ATA over Ethernet (AoE):


# cd xen-unstable/linux-2.6.12-xen0
# make ARCH=xen menuconfig clean bzImage modules
# cp -f arch/i386/boot/bzImage /boot/vmlinuz-2.6.12.6-xen0
# cp -f System.map /boot/System.map-2.6.12.6-xen0
# cp -f .config /boot/config-2.6.12.6-xen0

Once you're done rebuilding and preparing to install your kernel, you will also need to rebuild the "dlm" and "cman" kernel modules:

# cd cluster ; ./configure --kernel_src=`pwd`/../xen-unstable/linux-2.6.12-xen0
# make install

You will also need to add a boot menu option for this Xen kernel using the Xen 3.0 hypervisor:


# vi /boot/grub/menu.lst

Add a section like so:


title Xen 3.0 / XenLinux 2.6.12.6
kernel /boot/xen-3.0.gz dom0_mem=256000 console=vga apic_verbosity=verbose noapic
module /boot/vmlinuz-2.6.12.6-xen0 root=/dev/md0 noapic ro console=tty0

Note: this is why we don't use lilo. Getting lilo to work with command line arguments for both kernel (append=) and module (initrd=) is only the beginning of the pain. Use grub. Be happy.

You are now ready to reboot with a cluster-ready Xen kernel.

Wed, 30 Nov 2005

To build the source below, we will need a compiler, and cvs for the source checkouts.

# apt-get install gcc-3.4-dev libc6-dev cvs

Xen has a few dependencies:

# apt-get install libncurses5-dev bridge-utils hotplug iproute python2.3-dev zlib1g-dev

If you want to build the documentation as well, you'll need a few more (tetex, "ps2pdf" from gs-common, and "fig2dev" from transfig, and a recent version of perl with pod2man that supports the --name option).

# apt-get install tetex gs-common transfig perl

Now, grab the Xen "unstable" release and extract it. This includes a 2.6.12 kernel, which is required by the RedHat cluster tools (which we will discuss below).

# wget http://www.cl.cam.ac.uk/Research/SRG/netos/xen/downloads/xen-unstable-src.tgz
# tar xvzf xen-unstable-src.tgz

Now, build the userspace Xen tools and an initial Dom0 kernel (we will rebuild it in the next step, don't worry too much about the .config file right now):

# cd xen-unstable
# make dist (everything builds)
# ./install.sh
Installing Xen from './dist/install' to '/'...
All done.
Checking to see whether prerequisite tools are installed...
Xen CHECK-INSTALL  Wed Nov 23 22:46:09 EST 2005
Checking check_brctl: OK
Checking check_hotplug: OK
Checking check_iproute: OK
Checking check_python: OK
Checking check_zlib_lib: OK
All done.
# make install

Other bits that probably aren't required anymore:

Now we're done with the Xen kernel and userspace tools. Let's move on to the RedHat cluster tools, which we build against the Xen Dom0 kernel.

The stable RedHat cluster tools can be grabbed via CVS:

# cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster login cvs
Password: {enter "cvs"}
# cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster checkout -r STABLE cluster

When we build the cluster tools, we want to point the build at the source tree for the Xen Dom0 kernel so that it builds the appropriate kernel modules.

First, some dependencies:

# apt-get install libxml2-dev

Then a small fix to get around the fact that glibc 2.2 doesn't have ifaddrs.h or getifaddrs()/freeifaddrs(). You don't need to do this if you're running a system with glibc 2.3 or later:

# cat > /usr/include/ifaddrs.h <<EOF
#define getifaddrs(x)   -1
#define freeifaddrs(x)

struct ifaddrs {
    struct  ifaddrs *ifa_next;
    char    *ifa_name;
    struct sockaddr *ifa_addr;
};
EOF

Yeah, it's an ugly hack, but it fixes our woody enough to allow this to build. I'm a bad bad sysadmin.

In the latest CVS checkout, I also had to add an #include back to the top of cluster/cman/lib/libcman.c:

#include "libcman.h"

Then we build:

# cd cluster
# ./configure --kernel_src=`pwd`/../xen-unstable/linux-2.6.12-xen0
# make install

Now the software is ready. Both the Xen tools and the RedHat cluster tools are installed, and the Xen hypervisor and Dom0 kernel are built with the RedHat cluster kernel modules.

Wed, 30 Nov 2005

I use a Debian-based distro that I maintain in-house with an extensive hand-maintained repository of backports.

The auto-install platform is roughly based on the SystemImager package, only heavily hacked to simplify maintenance and unify the install script across all of our builds in a flexible way (I hope to open-source it here some day).

I strongly recommend that you have a running filesystem for root (/), usr, and var, that are NOT encapsulated with lvm. You will understand why later. This would be a slightly different layout than our standard NKS setup:


 /dev/md0 - RAID1 - root (/) (1G)
 /dev/md1 - RAID10 - /usr (4G)
 /dev/md2 - RAID10 - /var (16G)
 /dev/md3 - RAID10 - everything else.

You can do the following manually with a Knoppix CD if you really want to:

On a 4-drive Parallel ATA (PATA) setup you can generate the above using:


# cat - <<EOF | sfdisk /dev/hda
0,500,fd,*
,1000,82
,4000,83
,,5
,8000,83
,,83
EOF

Cryptic, yes, but simple.
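
To make the sfdisk input a little less cryptic, here is a minimal Ruby sketch that decodes it. It assumes the classic sfdisk input format, where each line is "start,size,id,bootable" and an empty field means "use the default". The helper name decode_layout and the TYPE_NAMES table are made up for illustration, and sizes are left in whatever unit sfdisk was told to use.

```ruby
# Type ids that appear in the layout above.
TYPE_NAMES = {
  "5"  => "extended",
  "82" => "Linux swap",
  "83" => "Linux",
  "fd" => "Linux raid autodetect"
}.freeze

LAYOUT = <<EOF
0,500,fd,*
,1000,82
,4000,83
,,5
,8000,83
,,83
EOF

# Hypothetical helper: turn each "start,size,id,bootable" line into a hash.
# The first four lines are the primary partitions (1-4); later lines are
# logical partitions inside the extended one, numbered from 5, so for this
# layout line N simply maps to partition N.
def decode_layout(text, disk = "/dev/hda")
  text.lines.map(&:strip).reject(&:empty?).each_with_index.map do |line, i|
    start, size, id, boot = line.split(",", -1)
    {
      device:   "#{disk}#{i + 1}",
      start:    start.empty? ? "default" : start,
      size:     size.empty? ? "rest of the available space" : size,
      type:     TYPE_NAMES.fetch(id, id),
      bootable: boot == "*"
    }
  end
end

decode_layout(LAYOUT).each do |part|
  flag = part[:bootable] ? " (bootable)" : ""
  puts "#{part[:device]}: start=#{part[:start]}, size=#{part[:size]}, type=#{part[:type]}#{flag}"
end
```

Read this way, hda1 is the small bootable RAID partition, hda2 is swap, hda4 is the extended container, and hda3, hda5 and hda6 are the data partitions that the arrays are built from.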

Repeat for each drive to partition. Then follow with mdadm to build the arrays:


# /sbin/mdadm --create /dev/md0 --force --run --level 1 --chunk 128 \
    --raid-devices 4 /dev/hda1 /dev/hdb1 /dev/hdc1 /dev/hdd1
# /sbin/mdadm --create /dev/md1 --force --run --level 10 --chunk 128 \
    --raid-devices 4 /dev/hda3 /dev/hdb3 /dev/hdc3 /dev/hdd3
# /sbin/mdadm --create /dev/md2 --force --run --level 10 --chunk 128 \
    --raid-devices 4 /dev/hda5 /dev/hdb5 /dev/hdc5 /dev/hdd5
# /sbin/mdadm --create /dev/md3 --force --run --level 10 --chunk 128 \
    --raid-devices 4 /dev/hda6 /dev/hdb6 /dev/hdc6 /dev/hdd6
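
The four mdadm invocations follow one pattern: the same partition number on every drive goes into one array. As a rough sketch (the ARRAYS table and the mdadm_create helper are hypothetical names, not part of any tool), the commands can be generated like this, which is handy if your drive names or partition numbers differ:

```ruby
DRIVES = %w[hda hdb hdc hdd].freeze

# One entry per array: md device number, RAID level, and the partition
# number used on every drive (taken from the commands above).
ARRAYS = [
  { md: 0, level: 1,  partition: 1 },
  { md: 1, level: 10, partition: 3 },
  { md: 2, level: 10, partition: 5 },
  { md: 3, level: 10, partition: 6 }
].freeze

# Hypothetical helper: build the mdadm command line for one array.
# It only returns a string; nothing is executed.
def mdadm_create(array, drives = DRIVES)
  members = drives.map { |drive| "/dev/#{drive}#{array[:partition]}" }
  "/sbin/mdadm --create /dev/md#{array[:md]} --force --run " \
    "--level #{array[:level]} --chunk 128 " \
    "--raid-devices #{members.size} #{members.join(' ')}"
end

ARRAYS.each { |array| puts mdadm_create(array) }
```

Printing the commands rather than running them leaves room to review them first, since mdadm --create with --force and --run will not stop to ask questions.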

Keeping with this scheme, booting single user, or init=/bin/bash, should give you at least md0 from which you can mount md1 and md2 to do rescue operations. This should be enough to fix most server deaths with RAID1 and without worrying about LVM.

Now format those arrays:


 # mke2fs -j /dev/md0
 # mkfs.xfs /dev/md1
 # mkfs.xfs /dev/md2

And mount them:


# mkdir /target
# mount /dev/md0 /target
# mkdir /target/usr
# mount /dev/md1 /target/usr
# mkdir /target/var
# mount /dev/md2 /target/var

Then fill it with debootstrap (or rsync, or whatever):


# debootstrap sarge /target http://source.rfc822.org/debian

Now edit /target/etc/fstab:


/dev/md0 / ext3 defaults 0 0
/dev/md1 /usr xfs defaults 0 0
/dev/md2 /var xfs defaults 0 0

and install a kernel (this is temporary):


# cp /etc/resolv.conf /target/etc/resolv.conf
# chroot /target apt-get update
# chroot /target apt-get install kernel-image-2.6.8

Now install grub as the MBR on all drives. Make them all bootable as hda, in case hda should die. NOTE: We do not use lilo, as it cannot handle booting the Xen hypervisor and Xen kernels without some ugliness.


# chroot /target apt-get install grub
# mkdir /target/boot/grub
# cp -a /lib/grub/i386-pc/ /target/boot/grub/
# cp /target/usr/share/doc/grub/examples/menu.lst /target/boot/grub/menu.lst
# grub
grub> root (hd0,0)
grub> setup (hd0)
grub> setup (hd1)
grub> setup (hd2)
grub> setup (hd3)

Edit your /target/boot/grub/menu.lst so that it points to the kernel.

Now you should have a bootable system. Unmount the /target mounted filesystems and reboot.

You should now be running a base install of a distribution of Linux on your server that boots with grub and has an unused md storage device that spans the majority of free space (/dev/md3). Xen requires the former, and lvm2/aoe will require the latter.

Tue, 29 Nov 2005

When building any Linux cluster, the first step is laying out the topology and shared storage.

To keep things simple, fast, and cheap, ATA over Ethernet (AoE) is really the best solution available at the moment.

For simplicity, each server in the cluster will be given two network interfaces: an "internal" protected storage network, and an "external" firewalled public network.

Tue, 29 Nov 2005

The goal: Make a manageable cluster of machines work together to provide 99.999% availability for a set of virtual machines in the fastest way possible with current cheap commodity hardware.

To this end, I've put a bit of energy into building a simple Xen cluster. This whitepaper is an attempt to document the effort.

Xen is a hypervisor. Think of it as a microkernel done right. There are Linux, NetBSD, and even OpenSolaris ports that run under the Xen hypervisor. The "host" machine is Domain 0 (Dom0), and is responsible for talking to hardware on the box and configuring and booting the Domain User (DomU) slices. Don't be confused by Dom0, however; the Xen hypervisor is the magician behind the scenes making this possible.

Xen 3.0 has migration features: you can move a Xen DomU instance between physical Xen servers. To do this, however, you need a shared storage system, or some method of NAS/SAN visible to all nodes in the cluster.

RedHat has a wonderful clustering platform with native clustered support for LVM2. Instead of GNBD, however, I've decided to use ATA-over-Ethernet for simplicity and speed. With this, we have a clusterable group of machines that share a common storage namespace (and can access each other's storage directly via the network), permitting native Xen domain migration.

The following guides formed the basis of the above decision:

Mon, 28 Nov 2005

It appears the Video Keg has been slashdotted.

This is a poor little user-mode-linux image running ruby on rails via fastcgi. The only thing saving me thus far is fragment caching within Rails.

Odd that something put together 3 years ago is getting slashdotted now.

I'll be watching this server closely today...

Tue, 22 Nov 2005
Tue, 15 Nov 2005

Brilliant. Apparently Sony's rootkit does DNS lookups. A brilliant soul took that fact and checked DNS caches and found hits on at least 568,200 nameservers.

Doxpara.com

and an image of the resultant data plot across this hemisphere:

Wow.

Sun, 13 Nov 2005

Andrew Escobar has found how to enable safe-sleep suspend-to-disk on Macs other than the newer PowerBooks. This may have been started by Matt Johnston, who has another great guide to this.

1. Set has-safe-sleep property

The first step is to enable the has-safe-sleep property in nvram:

sudo nvram nvramrc='" /" select-dev
" msh" encode-string " has-safe-sleep" property
unselect
'
sudo nvram "use-nvramrc?"=true

which should look like this in a terminal window:

Last login: Fri Nov 11 11:11:11 on ttyp1
Welcome to Darwin!
computer:~ User$ sudo nvram nvramrc='" /" select-dev
> " msh" encode-string " has-safe-sleep" property
> unselect
> '
computer:~ User$ sudo nvram "use-nvramrc?"=true

2. Enable Safe Sleep

Safe Sleep requires as much free disk space as physical memory, plus 750MB. To enable Safe Sleep, in the Terminal enter:

sudo pmset -a hibernatemode 3

"If you have secure virtual memory enabled, use 7 rather than 3 to disable encrypted hibernation. Encrypted hibernation does not work. Do not set it to 7 if you do not have secure virtual memory."

This will create the file /var/vm/sleepimage which will be used for the actual suspend-to-disk.
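
As a quick check of the space requirement mentioned above (physical memory plus 750MB), here is a small Ruby sketch. The function names are invented for illustration, and you would supply the memory and free-space figures yourself.

```ruby
# Extra space needed on top of physical memory, per the rule above.
SAFE_SLEEP_OVERHEAD_MB = 750

# Hypothetical helper: free disk space (in MB) needed for the sleep image.
def safe_sleep_space_needed_mb(physical_memory_mb)
  physical_memory_mb + SAFE_SLEEP_OVERHEAD_MB
end

# Hypothetical helper: true if the disk has room for the sleep image.
def enough_space_for_safe_sleep?(physical_memory_mb, free_disk_mb)
  free_disk_mb >= safe_sleep_space_needed_mb(physical_memory_mb)
end

puts safe_sleep_space_needed_mb(1024)          # 1024 + 750 = 1774
puts enough_space_for_safe_sleep?(1024, 2000)
```

So a machine with 1024MB of memory wants at least 1774MB free before you turn this on.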

Disabling Safe Sleep

To disable Safe Sleep enter in the Terminal:

sudo pmset -a hibernatemode 0

No need to restart.

For a more full undo, disable all nvramrc variables:

sudo nvram "use-nvramrc?"=false

For more info, visit Andrew Escobar's blog post and comments, or Matt Johnston's webpage.

Fri, 11 Nov 2005

As a pluggable daemon, mcp needs a flexible command syntax to permit both control of the plugins and passthrough of commands to the plugins for scripting.

First, we make a usage function to advise the user:

        def usage(argv,stdin,stdout,stderr)
                stderr.puts "Usage: mcp {command}
Where {command} is one of:

        plugin stop {plugin name}           - Stop a plugin thread
        plugin start {plugin name}          - Start a plugin thread
        plugin load {plugin name}           - Load a named plugin
        plugin unload {plugin name}         - Unload a named plugin
        plugin tell {plugin name} {command} - Tell a plugin a command
        thread list                         - List currently running threads
        exit                                - Kill mcpd
"
                1
        end

You probably want to use a "here document" for that multi-line print, but I'm having problems getting it to render in bluecloth (markdown) at the moment.
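
For reference, a minimal sketch of what the here-document version could look like. It assumes Ruby 2.3 or later for the <<~ form, which strips the common leading indentation; on older Rubies a plain <<EOS with flush-left text does the same job. The StringIO at the end is only there to show the call shape.

```ruby
require "stringio"

# Same signature and return value as usage() above; only the string
# literal changes. <<~ removes the indentation shared by all lines.
def usage(argv, stdin, stdout, stderr)
  stderr.puts <<~USAGE
    Usage: mcp {command}
    Where {command} is one of:

            plugin stop {plugin name}           - Stop a plugin thread
            plugin start {plugin name}          - Start a plugin thread
            plugin load {plugin name}           - Load a named plugin
            plugin unload {plugin name}         - Unload a named plugin
            plugin tell {plugin name} {command} - Tell a plugin a command
            thread list                         - List currently running threads
            exit                                - Kill mcpd
  USAGE
  1
end

# Capture the output instead of writing to the real stderr.
captured = StringIO.new
status = usage([], $stdin, $stdout, captured)
puts captured.string
puts "returned #{status}"
```

The text stays readable in the source, and there is no stray quote on a line by itself to trip up the renderer.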

Now for the real fun. All of the commands are passed to the command() method. This is where we handle each of the above:

        def command(argv,stdin,stdout,stderr)
                @command=argv.join(' ')
                begin
                        log("mcp #{@command}")
                        case @command
                                when /^quit$/i, /^exit$/i
                                        # Need more exit handling here!
                                        exit
                                when /^plugin list$/i
                                        @plugins.each_key { |plugin| stdout.puts "#{plugin}\n" }
                                when /^thread list$/i
                                        stdout.puts Thread.list.map { |t| "#{t.to_s} #{t['name']}\n" }
                                when /^plugin tell (\S+) (.*)$/i
                                        log("Telling #{$1} to #{$2}")
                                        @plugins[$1].command($2,stdin,stdout,stderr)
                                when /^plugin start (.*)$/i
                                        @plugins[$1].start()
                                when /^plugin stop (.*)$/i
                                        @plugins[$1].stop()
                                when /^plugin load (.*)$/i
                                        plugin_load($1)
                                when /^plugin unload (.*)$/i
                                        plugin_unload($1)
                                else
                                        usage(argv,stdin,stdout,stderr)
                        end
                rescue => detail
                        stderr.puts detail.message + "\n"
                        stderr.puts detail.backtrace.join("\n") + "\n"
                        1
                end
        end

Simple, eh? Now plugins are controllable from the command line.

Not bad for ~100 lines of ruby so far.

The next step is setting the thread['name'] properties for the "thread list" command. I'll cover that in the next post.
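
Until then, a rough sketch of the idea, which is not necessarily how mcp ends up doing it. Ruby threads carry a small per-thread hash, so a name can be stored under the 'name' key and read back by "thread list". The helper names start_named_thread and thread_listing are hypothetical.

```ruby
# Hypothetical helper: start a thread and tag it with a name.
# Setting the key from the creating thread avoids a race where the
# listing runs before the new thread has named itself.
def start_named_thread(name, &work)
  thread = Thread.new(&work)
  thread["name"] = name
  thread
end

# Same formatting as the "thread list" branch in command().
def thread_listing
  Thread.list.map { |t| "#{t} #{t['name']}" }
end

worker = start_named_thread("example-plugin") { sleep }
puts thread_listing
worker.kill
worker.join
```

Threads that were never named (the main thread, for instance) simply print with an empty name.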

Fri, 11 Nov 2005

The EFF is collecting a list of people who satisfy the following criteria:

  1. you have a Windows computer;
  2. First 4 Internet's "xcp" copy protection has been installed on your computer from a Sony CD (for more details, see our blog post http://www.eff.org/deeplinks/archives/004144.php referenced above or SysInternals blog http://www.sysinternals.com/blog/2005/10/bypass-traverse-checking-or-is-it.html);
  3. you reside in either California or New York;
  4. you are willing to participate in litigation.

They are considering litigation against Sony.

If you were affected, and fit the above criteria, look into it.

Fri, 11 Nov 2005

I've been running this little script for a while to smooth out traffic from my home network, but don't appear to have posted it anywhere.

This example doesn't use HTB to shape the traffic. Instead, it creates three priority queues under a prio qdisc, each backed by SFQ, which round-robins between flows to keep things fair among the packets in that class. Also, TCP sessions can jump from one traffic class to another based on their traffic pattern (window size changes).
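
To see why setting TOS bits is enough to steer packets into one of the three queues, here is a small Ruby sketch of the lookup the prio qdisc performs. It assumes the default priomap (1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1), which applies because the script below creates the prio qdisc without a priomap option, and the standard Linux TOS-to-priority mapping for the three TOS names the script uses. The names PRIOMAP, TOS_TO_PRIORITY and prio_class_for are made up for illustration.

```ruby
# Default prio qdisc priomap: index is the Linux packet priority (0-15),
# value is the band (0 is served first, 2 last).
PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1].freeze

# Linux priority derived from the TOS field, for the names used here:
# interactive = 6, bulk = 2, best effort = 0.
TOS_TO_PRIORITY = {
  "Minimize-Delay"      => 6,
  "Maximize-Throughput" => 2,
  "Normal-Service"      => 0
}.freeze

# Hypothetical helper: which class under "handle 1:" a packet lands in.
# Band 0 is class 1:1, band 1 is 1:2, band 2 is 1:3.
def prio_class_for(tos_name, handle = "1")
  band = PRIOMAP.fetch(TOS_TO_PRIORITY.fetch(tos_name))
  "#{handle}:#{band + 1}"
end

TOS_TO_PRIORITY.each_key do |tos|
  puts "#{tos} -> class #{prio_class_for(tos)}"
end
```

In other words, anything marked Minimize-Delay goes out first via 1:1, unmarked traffic sits in 1:2, and Maximize-Throughput bulk transfers wait in 1:3.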

The neat part about this is that applications also set the TOS bits, meaning that you can run a number of daemons for IP telephony that work perfectly alongside the TOS rules created below.

Here it is:


#!/bin/sh

INTERFACES=eth0

for interface in $INTERFACES ipsec0; do

 tc qdisc del root dev $interface

 tc qdisc add dev $interface root handle 1: prio
 tc qdisc add dev $interface parent 1:1 handle 11: sfq
 tc qdisc add dev $interface parent 1:2 handle 12: sfq
 tc qdisc add dev $interface parent 1:3 handle 13: sfq

 iptables -F -t mangle
 iptables -t mangle -X chkack
 iptables -t mangle -X chktos

 # Prioritize all ICMP traffic
 iptables -A PREROUTING -t mangle -p icmp -j TOS --set-tos Minimize-Delay

 # Prioritize all UDP traffic
 iptables -A PREROUTING -t mangle -p udp -j TOS --set-tos Minimize-Delay

 # Create "check TCP ack" chain. Small ACKs get priority, large ACKs are demoted.
 iptables -t mangle -N chkack
 iptables -t mangle -A chkack -m tos --tos ! Normal-Service -j RETURN
 iptables -t mangle -A chkack -p tcp -m length --length 0:128 -j TOS --set-tos Minimize-Delay
 iptables -t mangle -A chkack -p tcp -m length --length 128: -j TOS --set-tos Maximize-Throughput
 iptables -t mangle -A chkack -j RETURN

 # If a TCP ACK packet is being sent, run it through the "check TCP ack" chain first.
 iptables -A PREROUTING -t mangle -p tcp -m tcp --tcp-flags SYN,RST,ACK ACK -j chkack

 # Create "check TOS" chain. This adapts for things like ssh that use Minimize-Delay
 # by default, but should really use Maximize-Throughput for things like rsync-over-ssh.
 # This checks for more than 2 large TCP packets per second, and corrects their mislabeled
 # TOS appropriately (think top over ssh).
 iptables -t mangle -N chktos
 iptables -t mangle -A chktos -p tcp -m length --length 0:512 -j RETURN
 iptables -t mangle -A chktos -m limit --limit 2/s --limit-burst 10 -j RETURN
 iptables -t mangle -A chktos -j TOS --set-tos Maximize-Throughput
 iptables -t mangle -A chktos -j RETURN

 # Now, match all TCP streams, checking their TOS through the above rule.
 iptables -t mangle