Structure and Practices of the Video Relay Service Program
The YouTube Video You Don’t See
Shop with confidence across the web
Helicopter view of your driving directions on Google Maps
Google CIO and others talk DevOps and "Disaster Porn" at Surge
Burning Man 2011 - Yes we were there.
Getting Started on the Google API
CACertMan app to address DigiNotar & other bad CAs
Custom Class Loading in Dalvik
TWO REPORTS OF ADVISORY COMMITTEES ON DISABILITIES ISSUES RELEASED
Join the White House Disability Group Monthly Call on July 27
Multiple APK Support in Android Market
Forever alone involuntary flashmob
PS3 root key released - sign and run anything
Don't have a front-facing camera?
Mobile phone product testing: Models
How Can the LHC withstand 1 Petabyte of Data a Second?
Linus Torvalds is now officially a US Citizen
Portland bike lanes get mario symbols
Skype RC4 claimed reverse-engineered
Measurement Lab - Google IO BigQuery session is live querying 60 billion rows instantly
All you need is a little egotism, and $6
Convert IDN punycode to/from native characters
Sparkfun free day tomorrow: 1/7
Need a recursive DNS server? Use 8.8.8.8 and 8.8.4.4
JIQL - Java JDBC wrapper for Google DataStore
Unicorn == Mongrel delayed_job
Remus - Transparent HA for Xen
Crossbow Virtual Wire Demo Tool
Eucalyptus + MySQL + SOLR + RabbitMQ + Varnish == Nebula.nasa.gov
Apple drops ZFS due to legal concerns
Peering disputes between Cogent and Hurricane Electric
Equinix to acquire Switch and Data for $689 million
Project kxen renamed project HXEN
Lessconf Jacksonville - followed the next day by Barcamp
Stick-figure guide to advanced AES crypto
Why you should pay attention to Google Wave
rails-primer - how to easily host rails projects on appengine
AppEngine-JRuby on google code
Ruby on Google AppEngine: appengine-jruby video
Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine
Proxmox VE - OpenVZ KVM Cluster appliance management
Sun/Oracle kill of SXCE: Sysadmins everywhere cry in horror.
making water drinkable through nano-filtration
Pidgin 2.6.1 adds XMPP voice and video support
Setting up a Layer-3 tunnel VPN using ssh 4.3 and -w option tun devices
shadowserver.org - botnet hunting resources
OpenBSC - a Siemens BS-11 microBTS or an ip.access nanoBTS == your own GSM tower
Karesansui Project - a Xen management harness from Japan
Pygowave Server - Run your own Google Wave server
Xen clocksource0 time went backwards
Internet vs World Population stats
Apple pulls Google Voice app from iPhone - AT&T's fault
live-android boot ISO - very neat
How to update your GeoIP information in addition to SWIPping
Google Wave hackathon on 20th/21st, if you happen to be in Mountainview
Did I mention OTOY here before?
STuPiD - STUN/TURN using PHP in Despair
Browser based Server-side 3D gaming from OTOY
Cisco's replacement for the WRT54GL is the WRT160NL
Spinn3r.com - Index the blogosphere
Parts of galaxy Messier 87 are missing
DRAEGER ALCOTEST 7110 MKIII-C Evaluation of Breathalyzer Source Code
How Michael Osinski Helped Build the Bomb That Blew Up Wallstreet
Bruce Perens - A Cyber-Attack on an American City
How Google and Facebook are using R
adito - the new gpl fork of the old sslexplorer project
IP Address geolocation for free
Shapeways - $50 "3-D poem rings" until the end of the month
GrandCentral to become Google Voice
TurboVNC + VirtualGL == FAST network GL
Ben Rockwood's presentation at the OpenSolaris Storage Summit: ZFS in the trenches
The Crisis of Credit Visualized on Vimeo
10gen - a java based app hosting infrastructure
Engineyard Vertebra - another cloud infrastructure management harness
Eucalyptus - an opensource EC2 compatible hosting infrastructure
railsbrain.com <-- ajaxified rdoc
AP IMPACT: SWAT Teams Deployed in 911 fraud
Lessons learned by people who have quit Google
Makwana indicted for Fannie Mae malware
Zentific svn repo: alpha available
DACS - Distribution and Configuration System - version 2.0
Video of Cisco IOS attack talk at Chaos Computer Conference
Cosmic radio background noise 6 times higher than expected
Grow your own bioluminescent algae
Quartz Composer and Cruise Control status
Sunay Tripathi's Solaris Networking Blog
Merry Christmas from Chiron Beta Prime
Google's Native Client... the next ActiveX?
kenai.com - xVM Server Project site
58% Spam Drop from one colo shutdown
Xenomips - a Xen friendly domU version of Dynamips - Emulate a Cisco 7200
Debian and Android dual-boot on the G1
Sipper (SIPr) - a SIP testing framework in ruby
DBslayer - a SQL abstraction layer using JSON
Fingerworks keyboard in a MacBookPro
The Phoenix BIOS hypervisor is Xen
Do you live in a Constitution-Free zone?
Puppet presentation at NYCOSUG this month
XenSmartIO - Infiniband IO for Xen
Starting with b100, OpenSolaris has virtual consoles
OpenSolaris testfarm build server interface now available
Firefox M9 Fennec - Maemo alpha
SystemZ - aka Sirius - a port of OpenSolaris to IBM System Z mainframe OS running in z/VM mode
Solaris and ZFS on a Dell 2950, tweaking notes
Early Access Windows PV drivers for xVM
Economics: The Theory of Interstellar Trade
The Financial Crisis: What Happened and What's Next?
Cisco to run Windows 2008 on their appliance virtually for services
Packetfence: an OpenSource Network Access Control system
persist.js - an alternative to gears
Chinese building "impossible" EM drive
COMSTAR SMTF - solaris FC, SAS, and iSCSI targets
Flexiscale - yet another control panel?
RightScale - cloud control panels?
Critical ESXi remote vulnerability in openwsman
Copy on Write (CoW)
First off, let's decide how we're going to build our filesystems. While there is Copy-on-Write (CoW) support (LVM writable persistent snapshots), it isn't 100% reliable yet, and doesn't handle out-of-space conditions very well. Because of this, I am going to avoid using it.
That doesn't mean we shouldn't understand it a bit first though:
Creating the "virgin" backing store volume:
# lvcreate -n virgin -L 4G vg
# mkfs -t xfs /dev/vg/virgin
# mount /dev/vg/virgin /mnt
# debootstrap sarge /mnt http://source.rfc822.org/debian
# vi /mnt/etc/fstab
# umount /mnt
Creating a clone filesystem:
# lvcreate -s -n myclonedisk1 -L 1G /dev/vg/virgin
This new volume ("myclonedisk1") can handle up to 1G of "block differences" before it runs out of space. To that end, you will need to periodically grow the block device depending on the space remaining:
# lvextend -L +1G /dev/vg/myclonedisk1
Can you see the danger here? For each clone disk snapshot, you will need to monitor the space used and grow it whenever usage approaches some threshold. If something goes crazy and rapidly makes changes to a filesystem, a monitoring script in dom0 may not catch it in time, and you may end up with a fatally corrupted volume.
For this reason, I am avoiding it.
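To make that danger concrete, here is a minimal Ruby sketch of the decision such a dom0 monitoring script would make. It assumes you already know how full the snapshot's CoW area is (for example from the Snap% column of lvs); the 80% threshold and 1G growth step are hypothetical values, and the helper only builds the lvextend command rather than running it. The second function estimates how long a snapshot has left at a given write rate.

```ruby
# Hypothetical dom0 helpers for watching an LVM CoW snapshot.
# percent_used: how full the snapshot's CoW area is (0.0..100.0),
# e.g. as reported in the Snap% column of lvs.

# Returns the lvextend command to run, or nil if no action is needed yet.
def snapshot_extend_command(lv_path, percent_used, threshold: 80.0, step_gb: 1)
  return nil if percent_used < threshold
  "lvextend -L +#{step_gb}G #{lv_path}"
end

# Rough time (seconds) until the snapshot fills, assuming every MB written
# to the origin or the clone consumes one MB of snapshot space
# (a simplification; chunk granularity can make it worse).
def seconds_until_full(size_gb, percent_used, write_mb_per_s)
  remaining_mb = size_gb * 1024 * (1 - percent_used / 100.0)
  remaining_mb / write_mb_per_s
end
```

With a 1G snapshot half used and something writing 20MB/s, seconds_until_full(1, 50.0, 20.0) comes out to 25.6 seconds, well inside a typical cron polling interval.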
XenU RAID1 vs dm-mirror
Rather than use the somewhat experimental dm-mirror support for mirrored volumes, we're going to leave the mirroring up to the XenU domains to do themselves.
Let's create a domain that runs on "node0", the first cluster node:
Create some volumes.
# lvcreate -n blenke-web-00_mirror0 -L 4G vg /dev/md3
# lvcreate -n blenke-web-00_mirror1 -L 4G vg /dev/etherd/e0.1
Fill the primary volume:
# mkfs -t xfs /dev/vg/blenke-web-00_mirror0
# mount /dev/vg/blenke-web-00_mirror0 /mnt
# debootstrap sarge /mnt http://source.rfc822.org/debian
# vi /mnt/etc/fstab
# echo blenke-web-00 > /mnt/etc/hostname
# umount /mnt
Rather than using debootstrap, I strongly suggest doing this once and rsyncing other images from this base tree somewhere in your management infrastructure.
Now that the volumes exist, here is a XenU configuration that uses them:
# cat - <<EOF > /etc/xen/auto/blenke-web-00
kernel = "/boot/vmlinuz-2.6-xenU"
memory = 64
cpu = -1 # Xen should allocate a proc to run on.
vcpus = 1 # We only want 1 CPU for this domain (Xen 3.0 SMP!)
name = "blenke-web-00"
nics = 1
vif = [ 'mac=aa:00:0a:00:00:0a, bridge=xenbr0' ]
ip = "10.0.0.10"
disk = [ 'phy:vg/blenke-web-00_mirror0,sda1,w',
'phy:vg/blenke-web-00_mirror1,sda2,w' ]
root = "/dev/md0 ro"
EOF
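If you script domain creation from your management infrastructure, a small generator keeps every XenU config on the same two-volume layout. This is a hypothetical Ruby sketch that assumes the <name>_mirror0 / <name>_mirror1 naming used above; the kernel path and bridge are copied from the example and may differ on your systems.

```ruby
# Hypothetical generator for a mirrored XenU config like the one above.
# <name>_mirror0 lives on the local stripe, <name>_mirror1 on the remote
# AoE stripe; the domU assembles them into /dev/md0 itself.
def xenu_config(name, mac, ip, memory: 64)
  <<~CFG
    kernel = "/boot/vmlinuz-2.6-xenU"
    memory = #{memory}
    vcpus = 1
    name = "#{name}"
    vif = [ 'mac=#{mac}, bridge=xenbr0' ]
    ip = "#{ip}"
    disk = [ 'phy:vg/#{name}_mirror0,sda1,w',
             'phy:vg/#{name}_mirror1,sda2,w' ]
    root = "/dev/md0 ro"
  CFG
end
```

Writing the result to /etc/xen/auto/<name> is left to the caller.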
(more to come)
This is a summary of the GFS wiki instructions, as applied to our new cluster.
First, get fenced running:
# fence_tool join
Next, create the GFS filesystem:
# gfs_mkfs -p lock_dlm -t <ClusterName>:<FSName> -j <Journals> <Device>
<ClusterName> must match the cluster name used in CCS config
<FSName> is a unique name chosen now to distinguish this fs from others
<Journals> the number of journals in the fs, one for each node to mount
<Device> a block device, usually an LVM logical volume
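The same command can be assembled by a small helper, which is handy when you create one GFS filesystem per node. This hypothetical Ruby sketch only enforces what the list above says (one journal per mounting node, a <ClusterName>:<FSName> lock table) plus a ':' check of my own; it cannot verify that the cluster name really matches your CCS config.

```ruby
# Hypothetical helper that builds a gfs_mkfs command line.
# nodes: how many nodes will mount the filesystem (one journal each).
def gfs_mkfs_command(cluster_name, fs_name, nodes, device, lock_proto: "lock_dlm")
  raise ArgumentError, "need at least one journal" if nodes < 1
  # The lock table is "<ClusterName>:<FSName>", so neither part may be
  # empty or contain a ':' of its own.
  [cluster_name, fs_name].each do |part|
    raise ArgumentError, "bad name #{part.inspect}" if part.empty? || part.include?(":")
  end
  "gfs_mkfs -p #{lock_proto} -t #{cluster_name}:#{fs_name} -j #{nodes} #{device}"
end
```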
For a 2 node setup ("node0" and "node1"), you might use:
On node0:
# lvcreate -n shared_node0 -L 10G vg /dev/md3
# lvcreate -n shared_node1 -L 10G vg /dev/etherd/e0.1
# gfs_mkfs -p lock_dlm -t blenke:shared_node0 -j 2 /dev/vg/shared_node0
# gfs_mkfs -p lock_dlm -t blenke:shared_node1 -j 2 /dev/vg/shared_node1
On both:
# mkdir -p /shared/node0 /shared/node1
# mount /dev/vg/shared_node0 /shared/node0
# mount /dev/vg/shared_node1 /shared/node1
Remember: GFS filesystems, while accessible by both nodes, ARE NOT MIRRORED. You create the GFS filesystem on a shared block device. If that block device lives on one particular server, then whenever that server is rebooted, the other nodes will be unable to access the filesystem.
For cluster mirroring, look at dm-mirror and the lvcreate -m option. The dm-mirror kernel module is made up of dm-raid1 and dm-log, and RedHat is working on it right now (see "LVM2 Mirroring for RHEL4"). Currently only pvmove and lvcreate -m use this kernel module (if you have a recent lvm2 build), and you're really on your own.
If you have a cluster of at least 3 nodes (at least 3 PVs in the cluster VG), you can create a mirrored volume. One PV will get one half of the mirror, another PV will get the other half, and a third PV will get the mirror log volume.
# lvcreate -m 1 -n mirror1 --alloc anywhere -L 4G vg
Logical volume "mirror1" created
# lvscan
ACTIVE '/dev/vg/mirror1' [4.00 GB] anywhere
ACTIVE '/dev/vg/mirror1_mlog' [4.00 MB] anywhere
ACTIVE '/dev/vg/mirror1_mimage_0' [4.00 GB] inherit
ACTIVE '/dev/vg/mirror1_mimage_1' [4.00 GB] inherit
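A toy Ruby model (not LVM's real allocator) of the placement rule described above: each mirror image and the log want a PV of their own, so with only two PVs a normally allocated mirror has nowhere to put its log. The example above passes --alloc anywhere, which lets LVM relax that rule. The 4MB log size is taken from the lvscan output; the PV names and free-space figures in the test are made up.

```ruby
# Toy placement model: pvs is a hash of PV name => free space in MB.
# Returns where each part of the mirror would go, or nil if it can't
# be placed with every part on a distinct PV.
def place_mirror(pvs, size_mb, log_mb: 4)
  candidates = pvs.sort_by { |_, free| -free }
  images = candidates.select { |_, free| free >= size_mb }.first(2).map(&:first)
  return nil if images.size < 2
  # The log must go on a third PV that has room for it.
  log = candidates.map(&:first).find { |pv| !images.include?(pv) && pvs[pv] >= log_mb }
  return nil unless log
  { mimage_0: images[0], mimage_1: images[1], mlog: log }
end
```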
First, create a Physical Volume for the local RAID10 stripe, then for the remote RAID10 stripe via AoE:
# pvcreate /dev/md3
# pvcreate /dev/etherd/e0.1
This is where that extra RAID stripe comes in. The first pv is for the stripe on this cluster node, the second is for the stripe on the other cluster node.
Next, create a Volume Group that contains both Physical Volumes:
# vgcreate vg /dev/md3 /dev/etherd/e0.1
This creates a "vg" volume group that is visible from both cluster nodes, where volumes can be carved out as needed between them.
(Note: This does not mirror the PVs. That's what the -m flag to lvcreate is for. Alternatively, the XenU domain must do software RAID1 to accomplish this goal.)
lvm2 is an entirely userspace abstraction that uses the devmapper kernel module to present volumes carved out of physical block device space.
lvm2 has a cluster manager called "clvmd" that registers with cman to communicate with the other cluster nodes in a cluster configuration. With clvmd, lvm2 becomes a cluster-wide naming system for volumes carved out of network-exposed block devices, and a locking engine for the same.
# apt-get install lvm2
Or build from CVS:
# cvs -d :pserver:cvs@sources.redhat.com:/cvs/lvm2 login cvs
# cvs -d :pserver:cvs@sources.redhat.com:/cvs/lvm2 checkout LVM2
# cd LVM2 ; ./configure --with-clvmd=cman --with-confdir=/etc/lvm --prefix=/usr && make && make install
After the cluster is configured and running ("ccsd" and "cman"), and lvm2 is installed, we need to edit /etc/lvm/lvm.conf to make this a cluster aware setup.
# vi /etc/lvm/lvm.conf
In devices {}, add:
filter = [ "a|/dev/etherd/*|" ]
types = [ "aoe", 1024 ]
sysfs_scan = 0
In global {}, comment out:
# locking_type = 1
Just below that, in global {}, uncomment or add:
locking_library = "liblvm2clusterlock.so"
locking_type = 2
library_dir = "/lib/lvm2"
Then save, and start up clvmd (make sure cman is running first, and the node is part of the cluster):
# clvmd &
You can now scan for volume groups:
# vgscan
NOTE: lvm2 does not scan AoE devices by default. In fact, if you have sysfs scanning enabled, it will not find AoE devices at all, even if you add a filter that matches them. Moreover, lvm2 will only find AoE devices whose major number is listed in /proc/devices:
# grep aoe /proc/devices
152 aoechr
152 aoe
This means that all of the AoE devices you wish to scan must have a major number of 152. If you look at /dev/etherd, you will see 16 "partition" devices for each shelf/slot device by default. With 16 partitions, and AoE assigning minor numbers linearly, the crossover to major 153 happens just after "e1.5p14". This means that you really only have all of one shelf visible to lvm2, and part of a second (a maximum of 16 devices... not good for a large cluster of more than 16 nodes).
One "fix" is to edit drivers/block/aoe/aoe.h in your kernel source and replace "AOE_PARTITIONS 16" with "AOE_PARTITIONS 1":
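The arithmetic behind that limit is easy to check. This Ruby sketch rests on assumptions rather than the aoe driver source: 256 minors per major, each shelf/slot device using `partitions` consecutive minors, and 10 slots per shelf (chosen because it reproduces the e1.5 crossover quoted above). Under those assumptions, 16 partitions leave room for only 16 devices, the last being e1.5.

```ruby
# Assumptions (not read from the aoe driver): one major covers minors
# 0..255, each shelf/slot device uses `partitions` consecutive minors,
# and devices are numbered shelf by shelf with SLOTS_PER_SHELF slots each.
SLOTS_PER_SHELF  = 10
MINORS_PER_MAJOR = 256

# First minor number of device e<shelf>.<slot>.
def aoe_first_minor(shelf, slot, partitions)
  (shelf * SLOTS_PER_SHELF + slot) * partitions
end

# How many whole shelf/slot devices fit under a single major.
def devices_per_major(partitions)
  MINORS_PER_MAJOR / partitions
end

# Name of the last device that still fits under the first major.
def last_device_under_major(partitions)
  index = devices_per_major(partitions) - 1
  "e#{index / SLOTS_PER_SHELF}.#{index % SLOTS_PER_SHELF}"
end
```

With a single partition per device, last_device_under_major(1) gives e25.5, i.e. 256 devices under one major, which is why dropping to one partition makes the problem go away for any realistically sized cluster.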
# perl -pi -e 's/(AOE_PARTITIONS 1)6/$1/g' drivers/block/aoe/aoe.h
Alternatively, set AOE_PARTITIONS=1 when building your kernel
# make ARCH=xen AOE_PARTITIONS=1 oldconfig clean bzImage modules modules_install
Rebuild your kernel, then re-generate your /dev/etherd devices using the n_partitions variable:
# n_partitions=1 aoe-mkdevs /dev/etherd
This really fixes the problem, and lvm2 can scan all of the AoE shelf/slot devices!
When configuring the RedHat clustering, you must create a cluster.conf which will exist on every node.
# vi /etc/cluster/cluster.conf
This is an example 2 node configuration, with manual fencing:
<?xml version="1.0"?>
<cluster name="blenke" config_version="1">
  <clusternodes>
    <clusternode name="smart" nodeid="1" votes="1">
      <fence>
        <method name="human">
          <device name="last_resort" ipaddr="smart.ssn.blenke.net"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="stupid" nodeid="2" votes="1">
      <fence>
        <method name="human">
          <device name="last_resort" ipaddr="stupid.ssn.blenke.net"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="last_resort" agent="fence_manual"/>
  </fencedevices>
  <cman port="6809" two_node="1" expected_votes="1">
  </cman>
</cluster>
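A fence <device> whose name doesn't match any <fencedevice> (say "last_resort" vs "lastresort") is easy to miss in a hand-edited cluster.conf, so a quick check before starting ccsd can help. This Ruby sketch is hypothetical and regex-based rather than a real XML parser, so treat it as a lint, not validation.

```ruby
# Returns fence device names referenced by <device name="..."> that have
# no matching <fencedevice name="...">. Regex-based: a quick lint only.
def undefined_fence_devices(xml)
  used    = xml.scan(/<device\s[^>]*\bname="([^"]+)"/).flatten.uniq
  defined = xml.scan(/<fencedevice\s[^>]*\bname="([^"]+)"/).flatten
  used - defined
end

# Usage: ruby check_fence.rb /etc/cluster/cluster.conf
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  missing = undefined_fence_devices(File.read(ARGV[0]))
  abort "undefined fence devices: #{missing.join(', ')}" unless missing.empty?
  puts "fence devices OK"
end
```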
Once the config file is created, we start ccsd. The ccs daemon keeps the configuration in sync between cluster nodes.
# /etc/init.d/ccsd start
Next, join the cluster with cman. The cman kernel module is the cluster manager: it uses DLM locking and a heartbeat thread to form a quorum of the nodes that are part of the cluster.
# cman_tool join
This will join, or create, a cluster.
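Roughly, cman treats the cluster as quorate once the votes present reach a strict majority of the expected votes; two_node mode is the special case that lets a two-node cluster keep running on a single vote, which is why it is paired with an expected votes setting of 1. A simplified Ruby model of that rule (my simplification, not cman's code):

```ruby
# Simplified quorum model: strict majority of expected votes,
# except in two_node mode where a single vote is enough.
def quorum(expected_votes, two_node: false)
  return 1 if two_node
  expected_votes / 2 + 1
end

def quorate?(votes_present, expected_votes, two_node: false)
  votes_present >= quorum(expected_votes, two_node: two_node)
end
```

Without two_node, losing either node of a two-node cluster would drop it below the 2 votes it needs, and the survivor would stop.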
On a cluster server, the goal is to share storage with other nodes in the cluster.
Each cluster server node is going to share the entire /dev/md3 stripe as a single large block device to the other clvm'ed nodes.
Each "shared" cluster stripe will be defined as an AoE shelf/slot.
# vblade 0 0 eth1 /dev/md3
This will create a device "/dev/etherd/e0.0" shared over the eth1 network interface between the cluster nodes on the shared private storage network. Only the other nodes will see this device; you must continue to reference it as /dev/md3 locally. LVM2 will automagically scan this device and include it when re-assembling the cluster volume group on boot.
For production use, as vblade doesn't fork, the easiest way to keep vblade running is to add it to inittab as respawn.
on node0:
# echo "e0:2:respawn:/usr/sbin/vblade 0 0 eth1 /dev/md3" >> /etc/inittab
# init q
on node1:
# echo "e1:2:respawn:/usr/sbin/vblade 0 1 eth1 /dev/md3" >> /etc/inittab
# init q
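The numbering convention is simple enough to generate: node N exports its local stripe as shelf 0, slot N, and the other nodes see it as /dev/etherd/e0.N. A hypothetical Ruby helper producing the inittab lines above (the eth1 and /dev/md3 defaults come from this setup):

```ruby
# Hypothetical helper: the inittab respawn entry for node N's vblade.
# Node N exports its stripe as shelf 0, slot N.
def vblade_inittab_line(node, iface: "eth1", device: "/dev/md3", runlevel: 2, shelf: 0)
  id = "e#{node}" # inittab id field, 1-4 characters
  "#{id}:#{runlevel}:respawn:/usr/sbin/vblade #{shelf} #{node} #{iface} #{device}"
end

# The device name the *other* nodes will see for node N's export.
def aoe_device_for(node, shelf: 0)
  "/dev/etherd/e#{shelf}.#{node}"
end
```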
You should see output from vblade starting up in /var/log/daemon. On the other node, you should be able to run aoe-discover and then see the device with aoe-stat:
# aoe-interfaces eth1
# aoe-discover
# aoe-stat
e0.0 306.440GB eth1 up
Note: as this is at the end of /etc/inittab, and running in runlevel 2, the rc2 script will need to finish first before init starts respawning vblade. To expose the aoe device to the network before this point (if you really must), just put this line before the rc2 line in /etc/inittab.
Both aoetools and vblade (the ATA over Ethernet target) have debian packages. If you can't apt-get install them straightaway, drop me an email, and I'll post the backports of these to woody.
This step needs a bit more documentation (will fill it in shortly).
During a "make dist", the build process looks in xen-unstable/dist/install/boot/config-2.6.12.6-xen0 (or -xenU) for the config file to use, and will override the default w/ those files if they exist.
It's generally best to remove the xen-unstable/linux-2.6.12-xen? directories between builds if the Xen tree has been updated; it's safer that way.
You will need to change the Xen0 2.6.12 kernel so that it builds with devmapper (dm) support, and ATA over Ethernet (AoE):
# cd xen-unstable/linux-2.6.12-xen0
# make ARCH=xen menuconfig clean bzImage modules
# cp -f arch/i386/boot/bzImage /boot/vmlinuz-2.6.12.6-xen0
# cp -f System.map /boot/System.map-2.6.12.6-xen0
# cp -f .config /boot/config-2.6.12.6-xen0
Once you're done rebuilding and preparing to install your kernel, you will also need to rebuild the "dlm" and "cman" kernel modules:
# cd cluster ; ./configure --kernel_src=`pwd`/../xen-unstable/linux-2.6.12-xen0
# make install
You will also need to add a boot menu option for this Xen kernel using the Xen 3.0 hypervisor:
# vi /boot/grub/menu.lst
Add a section like so:
title Xen 3.0 / XenLinux 2.6.12.6
kernel /boot/xen-3.0.gz dom0_mem=256000 console=vga apic_verbosity=verbose noapic
module /boot/vmlinuz-2.6.12.6-xen0 root=/dev/md0 noapic ro console=tty0
Note: this is why we don't use lilo. Getting lilo to work with command line arguments for both kernel (append=) and module (initrd=) is only the beginning of the pain. Use grub. Be happy.
You are now ready to reboot with a cluster-ready Xen kernel.
To build the source below, we will need a compiler, and cvs for the source checkouts.
# apt-get install gcc-3.4-dev libc6-dev cvs
Xen has a few dependencies:
# apt-get install libncurses5-dev bridge-utils hotplug iproute python2.3-dev zlib1g-dev
If you want to build the documentation as well, you'll need a few more (tetex, "ps2pdf" from gs-common, and "fig2dev" from transfig, and a recent version of perl with pod2man that supports the --name option).
# apt-get install tetex gs-common transfig perl
Now, grab the Xen "unstable" release and extract it. This includes a 2.6.12 kernel, which is required by the RedHat cluster tools (which we will discuss below).
# wget http://www.cl.cam.ac.uk/Research/SRG/netos/xen/downloads/xen-unstable-src.tgz
# tar xvzf xen-unstable-src.tgz
Now, build the userspace Xen tools and an initial Dom0 kernel (we will rebuild it in the next step, don't worry too much about the .config file right now):
# cd xen-unstable
# make dist (everything builds)
# ./install.sh
Installing Xen from './dist/install' to '/'...
All done.
Checking to see whether prerequisite tools are installed...
Xen CHECK-INSTALL Wed Nov 23 22:46:09 EST 2005
Checking check_brctl: OK
Checking check_hotplug: OK
Checking check_iproute: OK
Checking check_python: OK
Checking check_zlib_lib: OK
All done.
# make install
Other bits that probably aren't required anymore:
Now we're done with the Xen kernel and userspace tools. Let's move on to the RedHat cluster tools, building them against the Xen Dom0 kernel.
The stable RedHat cluster tools can be grabbed via CVS:
# cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster login cvs
Password: {enter "cvs"}
# cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster checkout -r STABLE cluster
When we build the cluster tools, we want to point the build at the source tree for the Xen Dom0 kernel so that it builds the appropriate kernel modules.
First, some dependencies:
# apt-get install libxml2-dev
Then a small fix to get around the fact that glibc 2.2 doesn't have an ifaddrs.h or getifaddrs()/freeifaddrs(). You don't need to do this if you're running a glibc 2.3 or later system:
# cat > /usr/include/ifaddrs.h <<EOF
#define getifaddrs(x) -1
#define freeifaddrs(x)
struct ifaddrs {
struct ifaddrs *ifa_next;
char *ifa_name;
struct sockaddr *ifa_addr;
};
EOF
Yeah, it's an ugly hack, but it fixes our woody enough to allow this to build. I'm a bad bad sysadmin.
In the latest CVS checkout, I also had to add an #include back to the top of cluster/cman/lib/libcman.c:
#include "libcman.h"
Then we build:
# cd cluster
# ./configure --kernel_src=`pwd`/../xen-unstable/linux-2.6.12-xen0
# make install
Now the software is ready. Both the Xen tools and the RedHat cluster tools are installed, and the Xen hypervisor and Dom0 kernel are built with the RedHat cluster kernel modules.
I use a debian based distro that I maintain in-house with an extensive hand-maintained repository of backports.
The auto-install platform is roughly based on the SystemImager package, but heavily hacked to simplify maintenance and to unify the install script across all of our builds in a flexible way (someday I hope to open-source it here).
I strongly recommend that you have a running filesystem for root (/), usr, and var, that are NOT encapsulated with lvm. You will understand why later. This would be a slightly different layout than our standard NKS setup:
/dev/md0 - RAID1 - root (/) (1G)
/dev/md1 - RAID10 - /usr (4G)
/dev/md2 - RAID10 - /var (16G)
/dev/md3 - RAID10 - everything else.
You can do the following manually with a Knoppix CD if you really want to:
On a 4 drive Parallel ATA (PATA) setup you can generate the above using:
# cat - <<EOF | sfdisk /dev/hda
0,500,fd,*
,1000,82
,4000,83
,,5
,8000,83
,,83
EOF
Cryptic, yes, but simple.
Repeat for each drive to partition. Then follow with mdadm to build the arrays:
# /sbin/mdadm --create /dev/md0 --force --run --level 1 --chunk 128 \
--raid-devices 4 /dev/hda1 /dev/hdb1 /dev/hdc1 /dev/hdd1
# /sbin/mdadm --create /dev/md1 --force --run --level 10 --chunk 128 \
--raid-devices 4 /dev/hda3 /dev/hdb3 /dev/hdc3 /dev/hdd3
# /sbin/mdadm --create /dev/md2 --force --run --level 10 --chunk 128 \
--raid-devices 4 /dev/hda5 /dev/hdb5 /dev/hdc5 /dev/hdd5
# /sbin/mdadm --create /dev/md3 --force --run --level 10 --chunk 128 \
--raid-devices 4 /dev/hda6 /dev/hdb6 /dev/hdc6 /dev/hdd6
Keeping with this scheme, booting single user, or init=/bin/bash, should give you at least md0 from which you can mount md1 and md2 to do rescue operations. This should be enough to fix most server deaths with RAID1 and without worrying about LVM.
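To size the partitions for the layout above, it helps to remember how much usable space each level gives back. A small Ruby sketch, assuming equal partitions on every member and mdadm's default near-2 RAID10 layout:

```ruby
# Usable size of an md array built from `members` equal partitions.
# RAID1: every member holds a full copy, so you get one partition's worth.
# RAID10 (mdadm's default near-2 layout): two copies of everything,
# so you get half of the raw space.
def usable_capacity(level, members, partition_size)
  case level
  when 1  then partition_size
  when 10 then members * partition_size / 2
  else raise ArgumentError, "unsupported RAID level #{level}"
  end
end
```

So the 4G /usr array needs about 2G partitions on each of the four drives, while the 1G RAID1 root needs a full 1G on every drive; that full copy on each member is also why root lives on RAID1 here.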
Now format those arrays:
# mke2fs -j /dev/md0
# mkfs.xfs /dev/md1
# mkfs.xfs /dev/md2
And mount them:
# mkdir /target
# mount /dev/md0 /target
# mkdir /target/usr
# mount /dev/md1 /target/usr
# mkdir /target/var
# mount /dev/md2 /target/var
Then fill it with debootstrap (or rsync, or whatever):
# debootstrap sarge /target http://source.rfc822.org/debian
Now edit /target/etc/fstab:
/dev/md0 / ext3 defaults 0 0
/dev/md1 /usr xfs defaults 0 0
/dev/md2 /var xfs defaults 0 0
and install a kernel (this is temporary):
# cp /etc/resolv.conf /target/etc/resolv.conf
# chroot /target apt-get update
# chroot /target apt-get install kernel-image-2.6.8
Now install grub as the MBR on all drives. Make them all bootable as hda, in case hda should die. NOTE: We do not use lilo, as it cannot handle booting the Xen hypervisor and Xen kernels without some ugliness.
# chroot /target apt-get install grub
# mkdir /target/boot/grub
# cp -a /target/lib/grub/i386-pc/* /target/boot/grub/
# cp /target/usr/share/doc/grub/examples/menu.lst /target/boot/grub/menu.lst
# grub
grub> root (hd0,0)
grub> setup (hd0)
grub> setup (hd1)
grub> setup (hd2)
grub> setup (hd3)
Edit your /target/boot/grub/menu.lst so that it points to the kernel.
Now you should have a bootable system. Unmount the /target mounted filesystems and reboot.
You should now be running a base install of a distribution of Linux on your server that boots with grub and has an unused md storage device that spans the majority of free space (/dev/md3). Xen requires the former, and lvm2/aoe will require the latter.
When building any Linux cluster, the first step is laying out the topology and shared storage.
To keep things simple, fast, and cheap, ATA over Ethernet (AoE) is really the best solution available at the moment.
For simplicity, each server in the cluster will be given two network interfaces. An "internal" protected storage network, and an "external" firewalled public network.
The goal: make a manageable cluster of machines work together to provide 99.999% availability for a set of virtual machines in the fastest way possible with current cheap commodity hardware.
To this end, I've put a bit of energy into building a simple Xen cluster. This whitepaper is an attempt to document the effort.
Xen is a hypervisor. Think of it as a microkernel done right. There exists Linux, NetBSD, and even an OpenSolaris port that run under the Xen hypervisor. The "host" machine is Domain 0 (Dom0), and is responsible for talking to hardware on the box and configuring and booting the Domain User (DomU) slices. Don't be confused by Dom0, however; the Xen hypervisor is the magician behind the scenes making this possible.
Xen 3.0 has migration features: you can move a Xen DomU instance between physical Xen servers. To do this, however, you need a shared storage system, or some method of NAS/SAN visible to all nodes in the cluster.
RedHat has a wonderful clustering platform with native clustered support for LVM2. Instead of GNBD, however, I've decided to use ATA-over-Ethernet for simplicity and speed. With this, we have a clusterable group of machines that share a common storage namespace (and can access each other's storage directly via the network), permitting native Xen domain migration.
The following guides formed the basis of the above decision:
It appears the Video Keg has been slashdotted.
This is a poor little user-mode-linux image running ruby on rails via fastcgi. The only thing saving me thus far is fragment caching within Rails.
Odd that something put together 3 years ago is getting slashdotted now.
I'll be watching this server closely today...
Various people have advised me that the VideoKeg has been published on NewsForge.
Hooray!
Andrew Escobar has found how to enable safe-sleep suspend-to-disk on Macs other than the newer PowerBooks. This may have been started by Matt Johnston, who has another great guide to this.
1. Set has-safe-sleep property
The first step is to enable the has-safe-sleep property in nvram:
sudo nvram nvramrc='" /" select-dev
" msh" encode-string " has-safe-sleep" property
unselect
'
sudo nvram "use-nvramrc?"=true
which should look like this in a terminal window:
Last login: Fri Nov 11 11:11:11 on ttyp1
Welcome to Darwin!
computer:~ User$ sudo nvram nvramrc='" /" select-dev
> " msh" encode-string " has-safe-sleep" property
> unselect
> '
computer:~ User$ sudo nvram "use-nvramrc?"=true
2. Enable Sleep Safe
Sleep Safe requires as much free disk space as physical memory, plus 750MB. To enable Sleep Safe, in the Terminal enter:
sudo pmset -a hibernatemode 3
"If you have secure virtual memory enabled, use 7 rather than 3 to disable encrypted hibernation. Encrypted hibernation does not work. Do not set it to 7 if you do not have secure virtual memory."
This will create the file /var/vm/sleepimage which will be used for the actual suspend-to-disk.
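The free-space rule of thumb above, and the 3-vs-7 choice from the quoted advice, as a tiny Ruby sketch (the helper names are my own):

```ruby
# Rule of thumb quoted above: free disk >= physical memory + 750MB.
SAFE_SLEEP_OVERHEAD_MB = 750

def safe_sleep_space_ok?(ram_mb, free_disk_mb)
  free_disk_mb >= ram_mb + SAFE_SLEEP_OVERHEAD_MB
end

# Per the quoted advice: 7 with secure virtual memory enabled, otherwise 3.
def hibernatemode_for(secure_virtual_memory)
  secure_virtual_memory ? 7 : 3
end
```

A 2GB machine, for instance, needs at least 2048 + 750 = 2798MB free.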
Disabling Safe Sleep
To disable Safe Sleep enter in the Terminal:
sudo pmset -a hibernatemode 0
No need to restart.
For a more complete undo, also disable the nvramrc:
sudo nvram "use-nvramrc?"=false
For more info, visit Andrew Escobar's blog post and comments, or Matt Johnston's webpage.
As a pluggable daemon, mcp needs a flexible command syntax to permit both control of the plugins and passthrough of commands to the plugins for scripting.
First, we make a usage function to advise the user:
def usage(argv,stdin,stdout,stderr)
stderr.puts "Usage: mcp {command}
Where {command} is one of:
plugin stop {plugin name} - Stop a plugin thread
plugin start {plugin name} - Start a plugin thread
plugin load {plugin name} - Load a named plugin
plugin unload {plugin name} - Unload a named plugin
plugin tell {plugin name} {command} - Tell a plugin a command
plugin list - List loaded plugins
thread list - List currently running threads
exit - Kill mcpd
"
1
end
You probably want to use a "here document" for that multi-line print, but I'm having problems getting it to render in bluecloth (markdown) at the moment.
Now for the real fun. All of the commands are passed to the command() method. This is where we handle each of the above:
def command(argv,stdin,stdout,stderr)
@command=argv.join(' ')
begin
log("mcp #{@command}")
case @command
when /^quit$/i, /^exit$/i
# Need more exit handling here!
exit
when /^plugin list$/i
@plugins.each_key { |plugin| stdout.puts "#{plugin}\n" }
when /^thread list$/i
stdout.puts Thread.list.map { |t| "#{t.to_s} #{t['name']}\n" }
when /^plugin tell (\S+) (.*)$/i
log("Telling #{$1} to #{$2}")
@plugins[$1].command($2,stdin,stdout,stderr)
when /^plugin start (.*)$/i
@plugins[$1].start()
when /^plugin stop (.*)$/i
@plugins[$1].stop()
when /^plugin load (.*)$/i
plugin_load($1)
when /^plugin unload (.*)$/i
plugin_unload($1)
else
usage(argv,stdin,stdout,stderr)
end
rescue => detail
stderr.puts detail.message + "\n"
stderr.puts detail.backtrace.join("\n") + "\n"
1
end
end
Simple, eh? Now plugins are controllable from the command line.
Not bad for ~100 lines of ruby so far.
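As the command list grows, the case statement can also be turned into a table of patterns and handlers. The Ruby sketch below is an alternative layout, not the actual mcp code: MiniMcp and EchoPlugin are hypothetical stand-ins, only a few commands are wired up, and an unknown plugin name produces a readable error instead of a NoMethodError on nil.

```ruby
require "stringio"

# Hypothetical stand-in for an mcp plugin; it just records what it is told.
class EchoPlugin
  attr_reader :running, :told

  def initialize
    @running = false
    @told = []
  end

  def start
    @running = true
  end

  def stop
    @running = false
  end

  def command(cmd, _stdin, stdout, _stderr)
    @told << cmd
    stdout.puts "echo: #{cmd}"
  end
end

# Table-driven variant of command(): each route is a regexp plus a lambda
# that receives the MatchData and the output stream.
class MiniMcp
  def initialize(plugins)
    @plugins = plugins
    @routes = [
      [/\Aplugin list\z/i,            ->(_m, out) { @plugins.each_key { |name| out.puts name } }],
      [/\Aplugin start (\S+)\z/i,     ->(m, _out) { fetch_plugin(m[1]).start }],
      [/\Aplugin stop (\S+)\z/i,      ->(m, _out) { fetch_plugin(m[1]).stop }],
      [/\Aplugin tell (\S+) (.+)\z/i, ->(m, out) { fetch_plugin(m[1]).command(m[2], nil, out, $stderr) }]
    ]
  end

  # Returns 0 on success and 1 on a usage error or failure.
  def command(argv, stdout, stderr)
    line = argv.join(" ")
    @routes.each do |pattern, handler|
      m = pattern.match(line)
      next unless m
      handler.call(m, stdout)
      return 0
    end
    stderr.puts "Usage: mcp plugin {list|start|stop|tell} ..."
    1
  rescue StandardError => e
    stderr.puts e.message
    1
  end

  private

  # Raise a readable error instead of calling a method on nil.
  def fetch_plugin(name)
    @plugins.fetch(name) { raise ArgumentError, "no such plugin: #{name}" }
  end
end
```

For example, MiniMcp.new("echo" => EchoPlugin.new).command(%w[plugin tell echo hi], $stdout, $stderr) prints "echo: hi" and returns 0. Routes are tried in order, so more specific patterns should come first.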
The next step is setting the thread['name'] properties for the "thread list" command. I'll cover that in the next post.
The EFF is collecting a list of people who satisfy the following criteria:
They are considering litigation against Sony.
If you were affected, and fit the above criteria, look into it.
I've been running this little script for a while to "smooth-out" traffic from my home network, but don't appear to have posted it anywhere.
This example doesn't use HTB to shape the traffic. Instead, it creates three priority queues, each a round-robin (SFQ) queue to guarantee fairness between the flows in that class. Also, TCP sessions can move from one traffic class to another based on their traffic pattern (packet size and packet rate).
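The three queues come from the prio qdisc, which picks a band from each packet's priority and always empties lower-numbered bands first; the kernel derives that priority from the TOS byte. A simplified Ruby model of the lookup, covering only the three TOS values this script sets and using prio's default priomap as documented for tc-prio:

```ruby
# TOS byte -> kernel priority (TC_PRIO_* values), for the TOS values used here.
TOS_TO_PRIORITY = {
  0x00 => 0, # Normal-Service      -> TC_PRIO_BESTEFFORT
  0x08 => 2, # Maximize-Throughput -> TC_PRIO_BULK
  0x10 => 6  # Minimize-Delay      -> TC_PRIO_INTERACTIVE
}.freeze

# Default priomap of "tc qdisc add ... prio": priority 0..15 -> band 0..2.
DEFAULT_PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1].freeze

def prio_band(tos)
  DEFAULT_PRIOMAP.fetch(TOS_TO_PRIORITY.fetch(tos))
end

# Band 0 is class 1:1 (the sfq at handle 11:), band 1 is 1:2, band 2 is 1:3.
def prio_class(tos)
  "1:#{prio_band(tos) + 1}"
end
```

So Minimize-Delay traffic lands in 1:1 and is always served before Normal-Service (1:2) and Maximize-Throughput (1:3).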
The neat part about this is that applications also set the TOS bits, meaning that you can run a number of daemons for IP telephony that work perfectly alongside the TOS rules created below.
Here it is:
#!/bin/sh
INTERFACES=eth0
for interface in $INTERFACES ipsec0; do
tc qdisc del root dev $interface
tc qdisc add dev $interface root handle 1: prio
tc qdisc add dev $interface parent 1:1 handle 11: sfq
tc qdisc add dev $interface parent 1:2 handle 12: sfq
tc qdisc add dev $interface parent 1:3 handle 13: sfq
iptables -F -t mangle
iptables -t mangle -X chkack
iptables -t mangle -X chktos
# Prioritize all ICMP traffic
iptables -A PREROUTING -t mangle -p icmp -j TOS --set-tos Minimize-Delay
# Prioritize all UDP traffic
iptables -A PREROUTING -t mangle -p udp -j TOS --set-tos Minimize-Delay
# Classify the TOS of various protocols by well known port number.
# The TOS target does not stop rule traversal, so these static rules go first;
# placed after the adaptive chains below, they would overwrite their decisions
# (e.g. put rsync-over-ssh back into Minimize-Delay).
iptables -A PREROUTING -t mangle -p tcp --sport 22 -j TOS --set-tos Minimize-Delay
iptables -A PREROUTING -t mangle -p tcp --dport 22 -j TOS --set-tos Minimize-Delay
iptables -A PREROUTING -t mangle -p udp --sport 53 -j TOS --set-tos Minimize-Delay
iptables -A PREROUTING -t mangle -p udp --dport 53 -j TOS --set-tos Minimize-Delay
iptables -A PREROUTING -t mangle -p tcp --dport ftp -j TOS --set-tos Minimize-Delay
iptables -A PREROUTING -t mangle -p tcp --dport ftp-data -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --sport 110 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --dport 110 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --sport 143 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --dport 143 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --sport 993 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --dport 993 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --sport 995 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --dport 995 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --sport 1352 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --dport 1352 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --sport 80 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --dport 80 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --sport 443 -j TOS --set-tos Maximize-Throughput
iptables -A PREROUTING -t mangle -p tcp --dport 443 -j TOS --set-tos Maximize-Throughput
# Create "check TCP ack" chain. Small ACKs get priority, large ACKs are demoted.
iptables -t mangle -N chkack
iptables -t mangle -A chkack -m tos --tos ! Normal-Service -j RETURN
iptables -t mangle -A chkack -p tcp -m length --length 0:128 -j TOS --set-tos Minimize-Delay
iptables -t mangle -A chkack -p tcp -m length --length 128: -j TOS --set-tos Maximize-Throughput
iptables -t mangle -A chkack -j RETURN
# If a TCP ACK packet is being sent, run it through the "check TCP ack" chain.
iptables -A PREROUTING -t mangle -p tcp -m tcp --tcp-flags SYN,RST,ACK ACK -j chkack
# Create "check TOS" chain. This adapts for things like ssh that use Minimize-Delay
# by default, but should really use Maximize-Throughput for things like rsync-over-ssh.
# This checks for more than 2 large TCP packets per second, and corrects their mislabeled
# TOS appropriately (think top over ssh).
iptables -t mangle -N chktos
iptables -t mangle -A chktos -p tcp -m length --length 0:512 -j RETURN
iptables -t mangle -A chktos -m limit --limit 2/s --limit-burst 10 -j RETURN
iptables -t mangle -A chktos -j TOS --set-tos Maximize-Throughput
iptables -t mangle -A chktos -j RETURN
# Now, match all TCP streams still marked Minimize-Delay, checking their TOS through the above chain.
iptables -t mangle -A PREROUTING -p tcp -m tos --tos Minimize-Delay -j chktos
done
This really is a simple script, and while it works for me, a more generic script is beyond the scope of this post.
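The two adaptive chains are easier to reason about as plain functions. This Ruby model is my own simplification: the TOS constants are the standard byte values, and LimitMatch approximates the iptables limit match (2/s, burst 10) as a token bucket. One detail it makes visible: a 128-byte packet matches both length rules in chkack, and since the TOS target does not stop traversal, the later Maximize-Throughput rule wins.

```ruby
NORMAL_SERVICE      = 0x00
MAXIMIZE_THROUGHPUT = 0x08
MINIMIZE_DELAY      = 0x10

# chkack: only touches packets still marked Normal-Service.
def chkack(tos, length)
  return tos unless tos == NORMAL_SERVICE
  # 0:128 sets Minimize-Delay, then 128: overrides it, so 128 itself ends up bulk.
  length < 128 ? MINIMIZE_DELAY : MAXIMIZE_THROUGHPUT
end

# Approximation of "-m limit --limit 2/s --limit-burst 10" as a token bucket.
class LimitMatch
  def initialize(rate_per_s: 2.0, burst: 10)
    @rate = rate_per_s
    @burst = burst.to_f
    @tokens = @burst
    @last = nil
  end

  # True if a packet seen at time t (seconds) is still within the limit.
  def match?(t)
    @tokens = [@burst, @tokens + (t - @last) * @rate].min if @last
    @last = t
    return false if @tokens < 1.0
    @tokens -= 1.0
    true
  end
end

# chktos: small TCP packets keep their TOS; anything else keeps it only
# while under the rate limit, and is demoted to Maximize-Throughput after that.
def chktos(tos, tcp:, length:, limiter:, now:)
  return tos if tcp && length <= 512
  return tos if limiter.match?(now)
  MAXIMIZE_THROUGHPUT
end
```

Feeding twenty 1400-byte Minimize-Delay TCP packets through chktos at the same instant, the first ten ride the burst and the remaining ten are demoted, which is the rsync-over-ssh case the script comments describe.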
Sprint just broadcast an E911 Update for their Treo650 firmware.
My guess is that it has something to do with GPS triangulation enforcement when dealing with 911 calls. Other Qualcomm chipset based phones have a GPS "lock" when you dial 911 - my guess is that the 1.12 firmware doesn't have it, and due to new regulations they must - so the 1.12a firmware was born.
The various forums should have further info.