Sat, 29 Mar 2008

It's been a while since I've posted here. Let me bring you up to speed.

I've started deploying Solaris xVM in an attempt to use ZFS on the backend via iSCSI and the goodness of Xen 3.1.2 that is now in SXCE b85.

The first step in embracing xVM was deciphering the labyrinth that is Sun marketing.

OpenSolaris is the source based distribution. It relates to ON source trees and compiling things. If you state in #opensolaris that you are using "OpenSolaris", they assume you are building from source and are a developer.

The ON source trees are there for developers to build from. Likewise, the Blindingly Fast Updates (BFUs) are there for developers to update binaries between weekly builds, so they don't have to rebuild an entire tree. If you use BFUs, you break packaging and upgrades, and are effectively on your own.

What you are most interested in is Solaris Express Community Edition, otherwise known as SXCE.

SXCE is based on weekly build numbers, and is released every other week as an ISO image for mainstream users to play with. You can Live Upgrade between SXCE releases, and all packaging is handled properly.

SXDE is SXCE "frozen" quarterly. It is dead now. Ian killed it. Let me explain...

The Linux distribution Debian is pronounced /ˈde.bi.ən/. It comes from the names of the creator of Debian, Ian Murdock, and his wife, Debra.

Sun hired Ian Murdock. Ian has been changing things internally within Sun. Ian has championed a Linuxization of Solaris of sorts, which is a bit against the grain of most senior Solaris folks.

The new Indiana project is a culmination of this effort. It is effectively a repackaging of Solaris leaning more on the GNU tools and adopting a new "pkg" format that can update from repositories more readily. The current release of Indiana is Developer Preview 2, which is based off of SXCE b79. It is a live CD that can run a full desktop environment without actually installing it on a machine. There is an integrated "light" version of the caiman installer on the desktop that will allow you to install to a hard drive if you wish.

The "pkg" packaging in Indiana is a wonderful thing. Unfortunately, the repository doesn't appear to be updating every two weeks like SXCE... yet. They're planning on doing this soon, which should make updates relatively painless. Automatic dependency resolution and the ability to point to a server or media repos makes this very similar to the apt-get way of doing things, though it's a Python based system (rather than Perl) that is actively under development.

Indiana installs to a ZFS root. SXCE currently (as of b85) only installs to a UFS root.

While you can make SXCE boot to a ZFS root, you effectively break Live Upgrade, as it doesn't grok a ZFS root.

Indiana doesn't need Live Upgrade. The "pkg" system will soon automagically take ZFS snapshots to do upgrades (similar to the Nexenta apt-clone that I absolutely love), but you can approximate that now with minor effort.

... back to the explanation: Due to the advent of Indiana and Ian Murdock's influence, it looks like SXDE is effectively dead. There will reportedly be no future SXDE releases.

The default boot option of SXCE is "Solaris Express Developer Release". This is the caiman installer that is slightly bleeding edge and installs everything possible in a rather simple way.

The SXCE "Solaris Express" boot option is for the older more familiar Solaris installer. This allows you to fully specify what packages to install, and is more involved at install time.

Back to xVM: SXCE b89 will be the freeze point for Sun's xVM Server.

SXCE is currently in week b87, so in 2 more weeks there will be a deep freeze for that build.

Again, you have to pay attention to the community posts, flag days, and other things that let you get a feel for Sun's release cycle and marketing changes. I'm only just beginning to get a handle on it.

So, in conclusion, if you want to play with xVM, b85->b89 is a great time to get up to speed for the xVM Server product release.

Tue, 17 Apr 2007

Here's a Makefile I use to build freenx, posted here for others to use:

Makefile

To initialize, run "/opt/nx/2.1.0/nxsetup --install", and you're done.

You may have to edit nxloadconfig and/or nxserver to replace the /usr/NX path with the installed path of /opt/nx/2.1.0. You may also want to edit /data/nx/conf/nxnode.conf with site specific changes.

More to come.

Enjoy.

Mon, 26 Feb 2007

Picking the right virtualization technology requires a basic understanding of what is available out there today.

Rik Van Riel has put up the virt.kernelnewbies.org page that shows a number of the existing virtualization methods. You might want to peruse this first to get a feel.

"Bare Metal" or "Raw Iron"

Basic computing today typically occurs on "Bare Metal". This is where your operating system is installed directly on a given hardware platform. This "Raw Iron" role is how most people treat computing platforms today.

Some higher end hardware platforms offer "Hardware Partitioning". This is where the hardware platform is divvied up between multiple parallel operating systems at the same time. The hardware platform offers up CPUs, memory, and disk to independent operating systems that then run on the resources allocated to them. This isn't as much virtualization as it is resource partitioning. An example of this would be higher end Unix hardware like Sun T1 processor based servers: each hardware platform can be broken up into 32 "LDoms", each with its own install of Solaris.

VPS "Containers" - Security/Role based Virtualization

If your userspace applications don't require unique kernel services to operate, you get far more density with a VPS "Container" solution than with any other virtualization method. Simply put, all of your userspace applications share one kernel and are separated from each other via role based security mechanisms.

There are a number of different VPS technologies out there, each with its own benefits and limitations:

    OpenVZ/Virtuozzo
    Linux-Vserver
    Solaris Zones
    BSD Jails

Solaris Zones is the only VPS platform that supports running other flavors of Unix under its "BrandZ" containers. With it, you can run a number of 32bit Linux guest flavors alongside various Solaris/OpenSolaris versions.

OpenVZ has relatively new support for per-guest IPTables and IPSEC, as well as live migration.

Simply put, you should really spend some time verifying that a VPS solution won't solve your virtualization problems first. They are the best method of virtualizing with the least amount of overhead and the highest virtualization density.

User-Mode-Linux

If you need a unique kernel for each virtual machine, and don't mind a bit of overhead, User-Mode-Linux provides a secure jail with a Linux kernel, running entirely in userspace.

Using "skas0", a User-Mode-Linux kernel can boot and run under any Linux kernel without much host kernel support (usually only tuntap networking). The I/O performance of User-Mode-Linux does suffer somewhat, however, and RAM allocation per virtual image isn't as ideal as a VPS solution.

The obvious benefit is the ability to run and manage a User-Mode-Linux virtual server as userspace processes on any "standard" Linux kernel.

If you're going to use User-Mode-Linux, I strongly suggest trying Xen paravirtualization instead. The only thing that User-Mode-Linux buys you is the ability to oversubscribe memory, relying on the host kernel's virtual memory paging. Xen doesn't let you overcommit the RAM assigned to guests (though it does let you change a running guest's memory footprint on the fly, unlike User-Mode-Linux, which pre-allocates it from tmpfs).

User-Mode-Linux suffers from low I/O throughput, however, and tends to fall apart under load.
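To make the "no overcommit, but resizable on the fly" point concrete, here is a toy model in Python. It is only a sketch: the class name, method names, and megabyte figures are invented for illustration and are not part of Xen or its tools.

```python
class FixedRamHost:
    """Toy host that, like Xen as described above, refuses to overcommit guest RAM."""

    def __init__(self, total_mb):
        self.total_mb = total_mb
        self.guests = {}  # guest name -> MB currently assigned

    def free_mb(self):
        return self.total_mb - sum(self.guests.values())

    def set_memory(self, name, mb):
        """Create a guest or resize a running one, refusing any overcommit."""
        current = self.guests.get(name, 0)
        if mb - current > self.free_mb():
            raise MemoryError("not enough free RAM for %s" % name)
        self.guests[name] = mb


host = FixedRamHost(4096)
host.set_memory("web", 2048)
host.set_memory("db", 2048)   # the host is now fully allocated
host.set_memory("web", 1024)  # shrink a running guest on the fly
host.set_memory("db", 3072)   # the freed RAM can now go to another guest
```

A User-Mode-Linux style host would skip the check in set_memory and leave it to the host kernel's paging to cope, which is exactly the oversubscription trade-off described above.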

Paravirtualization

Paravirtualization uses a technique of "cooperative virtualization" between guests and a hypervisor. Simply put, a paravirtualized guest virtual machine is aware that it is running under a virtual environment, and adapts to this environment as appropriate.

Xen's hypercall API is well documented, and has been available to the community longer than VMWare's VMI interface. As such, there are a number of Xen "PV" ports including FreeBSD, OpenBSD, and OpenSolaris, as well as the native Linux port that Xen embraces as part of the current opensource Xen platform.

Xen is slowly being ported into the Linux kernel proper, but there is much developer pushback to each stage of the import effort. Instead, the Linux Kernel Maintainers are gung-ho about Rusty's lguest (previously known as "lhype") as a paravirtualization platform for future Linux kernels. At this time, lguest is very immature and quite slow, not nearly ready enough to consider for a production deployment.

VMWare opened up their VMI specification for everyone to use, to entice systems developers to standardize on a paravirtualization API. Providing this VMI interface would allow VMI aware guests to run under VMI aware hypervisors. Unfortunately, the device interface doesn't appear to have made the cut, so guests still need to be aware of paravirtualized devices as well.

Xen PV "backend" devices appear on a XenBus, and are accessed using a PV "frontend" device driver. Natively, the opensource Xen 3.0 only has Linux 2.6 PV drivers. The various Xen ports of FreeBSD, OpenBSD, and OpenSolaris each have their own PV "frontend" driver implementation.

VMWare ESX uses their LSI SCSI device driver and vmxnet networking driver to optimally talk to virtual devices. These are available for a number of operating systems and are far more mature than Xen's drivers.

Some of the benefits of a paravirtualized guest include the ability to reallocate resources on the fly from the hypervisor (changing memory footprint, hotplugging CPUs) and more integrated lifecycle management (reboot, suspend, migrate).

Both Xen and VMWare ESX are hypervisor approaches with the ability to run paravirtualized guests on intel class hardware.

Xen 2.0 initially offered only a paravirtualized "PV" mode of operation. Xen 3.0 offers it as well, alongside the Hardware Virtualized "HVM" mode that we will cover in the next section.

System Virtualization - Virtual Bare Metal

If VPS, User-Mode-Linux, and Paravirtualization aren't adequate to the task you have at hand, it might be time to consider full system virtualization.

This mode of operation is normally much more resource intensive, and is far less scalable than the earlier virtualization methods. However, for some Operating Systems (like Microsoft Windows), there really are no better choices at the moment.
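The selection order walked through so far can be sketched as a small decision function. The function name and its boolean parameters are made up for illustration, and real decisions involve more factors than these three.

```python
def pick_virtualization(needs_own_kernel,
                        guest_kernel_can_be_ported,
                        needs_memory_overcommit=False):
    """Return the lightest approach discussed in this article that fits the needs."""
    if not needs_own_kernel:
        # Shared kernel, least overhead, highest density.
        return "VPS container"
    if guest_kernel_can_be_ported:
        if needs_memory_overcommit:
            # The one thing User-Mode-Linux buys you over Xen PV.
            return "User-Mode-Linux"
        return "Xen paravirtualization"
    # Unmodifiable guests (for example Microsoft Windows) land here.
    return "full system virtualization"


print(pick_virtualization(needs_own_kernel=False, guest_kernel_can_be_ported=True))
print(pick_virtualization(needs_own_kernel=True, guest_kernel_can_be_ported=False))
```

The ordering of the checks is the point: each later branch costs more overhead per guest than the one before it.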

Full System Virtualization is done in a number of ways.

The entire virtual system memory address space is pre-allocated, and appears to the virtual machine to be a linear address space regardless of how it is actually mapped from the physical hardware address space.
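As a rough illustration of that mapping, the sketch below models a guest whose memory looks linear from the inside while being backed by scattered host page frames. The class name, the 4 KB page size, and the frame numbers are assumptions chosen for the example, not how any particular hypervisor stores its tables.

```python
PAGE_SIZE = 4096  # assumed page size for the example


class GuestMemoryMap:
    """Toy table: guest page N is backed by host frame host_frames[N]."""

    def __init__(self, host_frames):
        self.host_frames = list(host_frames)

    def size_bytes(self):
        return len(self.host_frames) * PAGE_SIZE

    def to_host(self, guest_addr):
        """Translate a guest address into the host address backing it."""
        if not 0 <= guest_addr < self.size_bytes():
            raise ValueError("address outside the guest's pre-allocated memory")
        page, offset = divmod(guest_addr, PAGE_SIZE)
        return self.host_frames[page] * PAGE_SIZE + offset


# Three guest pages, contiguous to the guest, scattered on the host.
guest = GuestMemoryMap([907, 12, 4410])
host_addr = guest.to_host(0x1004)  # guest page 1, offset 4 -> host frame 12
```

The guest only ever sees addresses 0 through size_bytes() - 1, regardless of where the frames actually live.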

A system BIOS boots inside this address space, much like a full PC's BIOS would boot, providing a real-mode int13 interface to emulated chipsets inside the virtual machine. The Operating System boots and loads device drivers to interface with the emulated chipsets. As far as the Operating System is concerned, it is running "Bare Metal".

There are a few methods of full system virtualization: software emulation only, software code-scanning and emulation, hardware only, hybrid software with hardware assistance. The difference is really in how each uses Intel VT (vmx) or AMD-V (svm) CPU virtualization.
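On a Linux host, a quick way to see which of these CPU features is present is to look for the vmx or svm flag in /proc/cpuinfo. The helper below is a minimal sketch; the function name is made up, and it only inspects text in the usual "flags : ..." layout that Linux uses on x86.

```python
from typing import Optional


def hardware_virt_flag(cpuinfo_text: str) -> Optional[str]:
    """Return "vmx" (Intel VT), "svm" (AMD-V), or None if neither flag is listed."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = value.split()
            if "vmx" in flags:
                return "vmx"
            if "svm" in flags:
                return "svm"
    return None


# Hand-written sample input, trimmed to the relevant line.
sample = "model name\t: Example CPU\nflags\t\t: fpu vme de pse msr vmx est\n"
print(hardware_virt_flag(sample))
```

On a real machine you would pass in the contents of /proc/cpuinfo instead of the sample string. A missing flag can also mean the feature is disabled in the BIOS rather than absent from the CPU.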

A CPU software emulation only approach is slow. QEMU (without kqemu), BOCHS, older versions of SoftPC for Mac, etc, are prime examples of this. The benefits are that a non-intel hardware platform can run emulated intel software, and that the emulation can be run entirely (if inefficiently) in userspace.

A CPU software code-scanning and emulation approach is much faster than software emulation only. Guest code pages are scanned for illegal instructions, and illegal code is "trapped" to handle opcodes and operations that would endanger other virtual machines outside of a given virtual machine sandbox. This method only works on like architectures (intel code scanning on intel hardware) and doesn't require any special CPU support for hardware emulation. QEMU (with kqemu), Win4Lin, Virtuozzo, and a number of other "pre-VT" system virtualization technologies used this approach.
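The scan-and-trap idea can be shown with a toy scanner over a page of textual instructions. This is a deliberately simplified sketch: real implementations work on binary code, and the set of "sensitive" opcodes below is an illustrative assumption, not a complete or authoritative list.

```python
# Illustrative subset of instructions a guest must not run directly.
SENSITIVE = {"cli", "sti", "hlt", "lgdt", "out"}


def scan_page(instructions):
    """Return a copy of the page with sensitive instructions replaced by traps."""
    patched = []
    for ins in instructions:
        parts = ins.split()
        if parts and parts[0] in SENSITIVE:
            # Hand control to the virtual machine monitor instead.
            patched.append("trap " + ins)
        else:
            # Safe code runs natively at full speed.
            patched.append(ins)
    return patched


page = ["mov eax, 1", "cli", "add eax, 2", "out 0x64, al"]
print(scan_page(page))
```

Everything that is not trapped runs directly on the CPU, which is why this approach only works when guest and host share an architecture.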

A CPU hardware assisted only solution is really limited to two implementations at present. The Linux kvm project allows full system guests to run under a linux host kernel using a modified QEMU to present the virtual emulated chipsets and other system features. Likewise, Xen's Hardware Virtual Machine (HVM) does the same, only running natively under the Xen hypervisor instead of under a Linux kernel.

A hybrid software with CPU hardware assistance approach can be a bit faster than hardware assisted virtualization alone. VirtualBox is the only opensource project of note at the moment that does this. Commercially, VMWare and Parallels both use this hybrid approach to accelerate system virtualization.

Of the full system virtualization technologies, VMWare is by far the most mature and fully featured. It is, however, commercially licensed. While you can get "Free" versions of VMWare Player and VMWare Server, there are real limitations as to how scalable either is, and what you can do with them.

VMWare Workstation is the "bleeding edge" version of VMWare. All innovations happen on that platform first. The stripped down player is based on VMWare Workstation. Eventually, many of these innovations make their way back into the server grade versions of VMWare.

IBM's power hypervisor is the oddball here, but it's important to mention. iSeries/pSeries have collapsed onto the Power5 hardware architecture with the hypervisor based i5/OS. Using Transitive's x86 emulation, this platform will (soon? already?) run "hundreds of virtual PCs" as well as AS/400, AIX5L, and native Linux on a single hardware platform. Heck, with Fundamental's FLEX-ES, UMX's Virtual Mainframe Facility, or even hercules, you can even emulate a zSeries mainframe.

Unfortunately, power5 hardware isn't commodity PC hosting gear. And that's probably the kind of hardware you're looking at, isn't it?

So, you really really want to use Xen?

First, let's consider the "flavors of Xen".

There are three primary "flavors" of Xen: Opensource Xen, XenSource Enterprise/Express, and Virtual Iron's Xen.

As we're still talking about full system virtualization rather than paravirtualization from this point on, it's important to realize the speed impact of using emulated chipset devices and generic device drivers rather than PV device drivers to access disk and network resources.

Xen uses QEMU to emulate an Intel PIIX3 IDE chipset (with some PIIX4 features), and a Realtek 8139 network card. While the IDE chipset emulation is bearable, it does incur a bit of CPU overhead in dom0 as QEMU emulates the chipset. The network emulation, on the other hand, is abysmal. Upload rates are "ok" at 6mbit+, but download rates are below 1mbit, running on standard commodity PC hardware. While it could be a mere IRQ issue, it is important that you realize that running with the IDE and RTL8139 drivers inside your guest is going to significantly impact your virtual system's performance.

This is where PV drivers come in.

OpenSource Xen and XenSource both have a XenBus upon which "PV devices" appear. Virtual Iron reworked their XenBus into NexBus, largely to support live migration of HVM guests, and likewise have their own unique "PV devices".

Each "flavor" of Xen needs a different set of PV device drivers.

OpenSource Xen 3.0 has been incorporated into a number of Linux Distributions: SuSE 10.1, RedHat Enterprise Linux 5, Fedora Core 6, Debian Etch, Ubuntu Edgy, and Gentoo are just a few.

The Xen project includes "unmodified_kernel" drivers for Linux 2.6. This means, if you want to run full system virtualization using Xen HVM, you only have the option of building Linux 2.6 PV drivers for your guest.

Only Novell's SuSE 10.2 commercial "Xen pilot" will have Windows PV drivers. There are no other OpenSource Xen device drivers for Windows at this time.

XenSource Enterprise/Express, on the other hand, have their own PV device drivers. While you can "almost" use the XenSource PV device drivers with the OpenSource Xen, there is much talk of data corruption and general "that just shouldn't work" messages on the IRC channel from XenSource developers. Simply put, if you run the commercial XenSource product, you should use the XenSource drivers.

Likewise, Virtual Iron has their own device drivers that are unique to their hosting platform. Their "vstools" support one version of SuSE 9 and one version of RedHat Enterprise Linux 4 (U2) in addition to their Windows drivers. While you can download the domu sources from their website, good luck trying to get them running on a linux kernel newer than around 2.6.9. I know. I've tried. If you want to run a Linux guest in Virtual Iron, you're pretty much limited to RHEL4U2. Good luck with anything else.

What if I just want to run Windows under OpenSource virtualization?

OpenSource Xen doesn't have the PV drivers yet. It will be too slow for you to really use in a production capacity.

VirtualBox.org would be my suggestion to you. It includes device drivers that seriously speed up the Windows experience and make it a viable full system virtualized environment for opensource based windows hosting.

If you don't mind forking out the coin, Virtual Iron has a good Windows virtualization platform that is much cheaper than VMWare, and is licensed per socket. With it, you get live migration and vendor support.

If you seriously have no qualms about the cost of the virtualization and want a mature top notch platform, fork out the cash for VMWare ESX.

If none of these solutions seem good to you, look at the "free" VMWare Server. It is based on mature VMWare GSX tech (though features have been whittled down in places). It doesn't scale as well as VMWare ESX, but the cost point is much easier to swallow (free as in beer).

Use the best tool for the job. Move on to the larger business problems. How is that SOA deployment going, anyway? ;)

Tue, 23 Jan 2007

Unlike AMD's V (svm) support, Intel's VT (vmx) mode requires BIOS support.

More specifically, your motherboard vendor (or system vendor) must allow enabling vmx mode in their BIOS. Without BIOS support, you cannot use vmx mode.

Vendors apparently can disable vmx support in their systems entirely by setting the lock bit in the Feature Control MSR. Some vendors like HP have taken to disabling VT support in laptops, claiming that they disable it because they don't test it before shipping...
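
If you can read the Feature Control MSR on your box, you can see what the BIOS left behind. Here is a minimal sketch, assuming a Linux host with the msr module loaded and the rdmsr tool from msr-tools available (both are my assumptions, as is the function name). It decodes MSR 0x3a, where bit 0 is the lock bit and bit 2 enables VMX outside SMX operation:

```shell
# Decode an IA32_FEATURE_CONTROL (MSR 0x3a) value given as hex digits
# without a 0x prefix, e.g. "5". The function name is hypothetical.
vt_msr_status() {
    val=$((0x$1))
    lock=$(( val & 1 ))          # bit 0: lock bit
    vmx=$(( (val >> 2) & 1 ))    # bit 2: VMX enabled outside SMX
    if [ "$lock" -eq 0 ]; then
        echo "unlocked"
    elif [ "$vmx" -eq 1 ]; then
        echo "locked with vmx enabled"
    else
        echo "locked with vmx disabled"
    fi
}

# Possible usage as root (assumes rdmsr prints bare hex):
#   vt_msr_status "$(rdmsr 0x3a)"
```

A result of "locked with vmx disabled" is the case described above, where nothing short of a BIOS change will help.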

If your system BIOS supports enabling VT, doing so does NOT immediately make VT mode available. In fact you must hard power cycle the CPU for this change to take effect.

While documented fairly frequently (based on my google results), this apparently continues to bite new Xen HVM users.

Even systems without BIOSes sometimes need fixes as well.

Some early Macs with VT support needed EFI firmware modifications to support VT mode. I suffered through this with my early Mac Mini Core Duo.

Fri, 19 Jan 2007

Oh dear. I've really messed things up this time. I am entirely off base, and have confused a large number of people (including myself, apparently).

Any reference you've seen from me regarding VMI being a device interface is entirely wrong.

Any reference you've seen from me about Rusty maintaining VMI is entirely wrong.

This is a recent dialog with aliguori, someone directly involved in kvm/xen development, enough to tell me that I'm entirely off base:

*aliguori* paravirt_ops is a low-level paravirtualization interface.
it doesn't make any hypercalls but allows for "modules" to hook that
paravirtualization interface and then translate to the underlying
hypervisor's paravirtualization interface
*aliguori* there is a paravirt_ops implementation for VMI, Xen, and KVM
at the moment
*aliguori* you can think of paravirt_ops as paravirtualization
infrastructure, and then xen/vmi/kvm's paravirt_ops implementation as
drivers for specific hypervisors
*aliguori* and btw, there is no such thing as VMI device drivers
*aliguori* VMI is strictly a CPU paravirtualization interface
*aliguori* Zachary Amsden is doing the VMI paravirt_ops implementation,
Jeremy Fitzhardinge is doing the Xen paravirt_ops implementation, and
Rusty is doing the lhype implementation (and I guess Ingo is sort of
doing the KVM implementation)

Argh. So, mea culpa. I really messed that one up now, didn't I?

Anything I said about virtual devices is apparently entirely off base. Now I get to ensure that future posts are accurate on this matter.

IOMMUs and the future of hardware virtualization

There is one last thing to think about: isolation capable IOMMUs. Soon, next generation Intel VT-d and AMD IOV capable systems should be out with isolation capable IOMMUs. This means that you will see huge speed improvements from IO virtualization, and the potential both to assign PCI devices to hardware virtualized operating systems and to have new "virtual aware" devices from hardware vendors that can be shared by multiple guests at a hardware level.

According to jnalley's post on the Xen developer IRC channel, "SR-IOV allows a PCI-e device to present virtual functions to the root complex. This would allow a guest OS (domU) to access the device directly."

Intel VT-d and AMD IOV should be out sometime Real Soon Now.

For more information on SR-IOV, visit the specifications for SR (and MR) IOV.

I hope this helps clear things up.

Again, my apologies to those who were misled by my misunderstanding.

Tue, 16 Jan 2007

Yesterday, someone stumbled into the #kvm channel and mentioned that VirtualBox has gone OpenSource.

After some frantic questions and listening to the #vbox channel, it became apparent that there are some benefits and limitations of VirtualBox worth noting.

VirtualBox can use Intel VT or AMD-V/SVM if available, but does not require it, much like VMWare, which takes the same hybrid software/hardware approach to virtualization. For 32bit guests, this can be much faster than pure VT/SVM.

VirtualBox (herein referred to as VBox) is similar to VMWare workstation or VMWare server, in that it has a ring0 kernel driver for a linux host.

This ring0 requirement means that it is not compatible with a Xen paravirtualized domU (and that includes dom0).

VBox leverages QEMU heavily for software emulation of real-mode and other critical code sections, as well as for hardware emulation.

QEMU has a closed source kernel module, kqemu, and a somewhat alpha quality opensource equivalent, qvm86, that do the software code-scanning method of virtualization. They do not require or recognize VT/SVM.

VBox's primary competitor is the kvm project, which provides QEMU based VT/SVM guests. The downside of kvm, of course, is the requirement for VT/SVM support from your CPU. VirtualBox has no such limitation.

VBox only supports 32bit host kernels and 32bit guest images. There is no 64bit support for either running under a 64bit Linux host kernel, or running a 64bit guest OS. The website does mention that 64bit support is under active development, however.

VBox has yet another virtual bus of virtual devices, akin to Xen's paravirtualized XenBus devices (or Virtual Iron's NexBus). While hardware devices are available (PCNet32, etc) using QEMU hardware emulation, VBox also has some excellent video/network/disk drivers that eliminate the hardware chipset emulation overhead.

VMWare tried to make VMI a standard for paravirtualized bus devices. The Linux kernel developer community initially balked, but VMI support lives on in Rusty's paravirt-ops patches. Recently, Ingo has been making great strides with paravirtualized kvm support.

One oddity is that VBox uses .VDI files for its disk images. Not QEMU's QCOW format, not VMWare's VMDK format, and not RAW disk image format.

And for the n00bs that keep popping in and asking about 3d support. No, VBox doesn't proxy 3d. No, QEMU doesn't proxy 3d. Yes, you can use a 3d card with a Xen paravirtualized domain (NOT with an HVM domain).

The only virtualization platform that supports 3d for Windows guests, that I am aware of, is VMWare 5.0 and later which have a somewhat crashy "beta" DirectX 3d support. (Simply add "mks.enable3d = TRUE" to your .vmx file by hand, for more info try googling for "mks.enable3d").

Parallels has promised 3d guests for 4th quarter of this year. If they deliver it, I will be pleasantly surprised.

If you really need 3d gaming for Windows games on a non-Windows platform, consider Transgaming's Cedega product line. Yes, it is Wine. Yes, there is a 50% overhead for the emulation. No, you're not going to do much better without running windows bare iron.

Where does this leave me? In limbo, mostly. I have a 32bit farm of Xen hosts moving toward a 64bit Xen hosting platform at the moment. Xen appears to be crawling while other tech like kvm and virtualbox keep popping up to challenge it. Xen's "maturity" is only really a year at best with its HVM support (quite a lead in tech terms), but I can see l-hype/kvm and virtualbox quickly overshadowing Xen in the near future.

Eventually, VMI/paravirt-ops is going to level the playing field with standardized guest device drivers, regardless of hosting platform. Until then, we continue to craft guests based on the virtualization platform under which they will be run.

Thu, 04 Jan 2007

While Xen is a wonderful virtualization platform, there are a number of lesser known limitations of Xen which aren't well documented. You learn these limitations from first-hand experience.

Xen modes of operation

There are 3 modes of operation for Xen:

  • 32bit
  • 32bit+pae
  • 64bit

The hypervisor mode must match the PV mode. As dom0 is a PV, that means it must match the mode of the hypervisor. This goes for all PV domains.

This means you can't run a pure 32bit PV under a 64bit hypervisor. Nor can you run a 32bit+pae PV under anything but a 32bit+pae hypervisor. It must match, all the way through.

The Xen developers are working to fix this, eventually.

The same is not true for HVM operation: you can run 32bit HVM domains under a 64bit hypervisor/dom0.

The easiest way to find out what modes are available to you is to run "xm info | grep xen_caps". That will tell you exactly what guests you can run with your current setup.
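
To turn that into a yes/no answer in a script, something like the following sketch may help. It assumes the xen_caps line is a colon-separated key followed by space-separated capability names such as hvm-3.0-x86_32; the function name is mine, and the sample capability names are illustrative rather than taken from any particular box:

```shell
# Succeed if the capability named in $1 appears on the xen_caps line
# read from standard input. Hypothetical helper, not part of Xen.
has_xen_cap() {
    grep '^xen_caps' | cut -d: -f2 | tr ' ' '\n' | grep -qxF -e "$1"
}

# Possible usage on a dom0:
#   xm info | has_xen_cap hvm-3.0-x86_32 && echo "32bit HVM guests OK"
```

The exact-match grep matters here, since a PAE capability name like xen-3.0-x86_32p would otherwise also match a search for plain xen-3.0-x86_32.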

Xen does not page

The Xen hypervisor does not page/swap to disk. In fact, the Xen hypervisor isn't directly aware of disk storage at all. All IO goes through the dom0 kernel which communicates with PCI devices.

Xen only manages available RAM.

By default, the Xen Balloon driver allows PV domains to be allocated some amount of RAM (up to maxmem) or reduced to some minimum amount of RAM (minmem), on the fly.

HVM domains allocate maxmem on start, and cannot be resized dynamically (you must restart the domain).

The Xen Balloon driver is shunned all over the xen-devel list historically. It has gotten better over time, though it still has some interesting behaviors.

With the current 3.0.4, for example, if you are running a PV domain with less than maxmem memory assigned and save that domain to migrate it, then when you restore the domain, it will allocate maxmem memory to it.

Every version of Xen tweaks the behavior of memory allocation just a little more. The full history of said behavior is still well beyond my understanding at this time.

Xen shared pages are limited

When a domU is started, there are a number of "shared pages" between the dom0 and the domU for them to communicate using a system of grants and page flipping between them.

Sadly, this grant space is limited. So limited in fact, that other Xen limits were introduced:

Xen 3.0.3 limits domUs to 3 network interfaces

This is due in part to the above shared page pool limitations.

People were using many many network interfaces, each incurring additional stress on the limited shared resources for inter-domain communication.

Apparently, part of the "fix" was to impose an artificial restriction of 3 network interfaces for all domUs in Xen 3.0.3.

Xen has a potential DoS condition if netloop isn't used

This one is particularly disturbing, and hard to explain or gauge how limiting it really is.

When a domU sends a packet to dom0, the ethernet frame is put into a shared page and access is granted for dom0 to use it.

While dom0 is using that page for the shared ethernet frame, there is a danger that a busy network might drain all available shared pages and Xen may panic.

As long as dom0 is immediately copying off frames to another network interface to be shipped off, there is no problem.

If, however, packets are destined to be processed by dom0 userspace, that skb sits in kernel space until the userspace daemon processes that packet's contents. This causes a strain and potential exhaustion of shared dom0/domU pages for these packets to sit around until they are handled.

Ouch.

This is where netloop comes in. Netloop is a Xen driver that provides a vif0.0/veth0 pair locally to the dom0 explicitly to be used to buffer those ethernet frames. By adding vif0.0 to a bridge along with the vif of a domU guest, any packets destined to be handled by dom0 userspace can take their sweet time and no problems will befall the system.
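
As a rough sketch of that bridge layout, the function below only prints the brctl commands rather than running them. The bridge name, the domU vif names, and the function name are all assumptions for illustration; your distribution's Xen network scripts may already do the equivalent for you:

```shell
# Print the brctl commands that would put dom0's netloop vif0.0 and the
# given domU vifs on one bridge. Dry run only; all names are hypothetical.
netloop_bridge_plan() {
    bridge=$1
    shift
    echo "brctl addbr $bridge"
    echo "brctl addif $bridge vif0.0"
    for vif in "$@"; do
        echo "brctl addif $bridge $vif"
    done
}

# Possible usage: review the plan, then run the lines by hand as root.
#   netloop_bridge_plan xenbr0 vif1.0 vif2.0
```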

If you have any dom0 servicing domUs with userspace daemons, and you're not using a netloop to copy the frames, you may want to rethink this immediately. This includes routed/bridged/natted configurations, anything where a packet is handled by a dom0 userspace daemon coming from a domU.

Xen schedulers

There are 3 schedulers in Xen:

  • BVT
  • SEDF
  • CREDIT

Both BVT and SEDF are "complex and buggy", and will go away in future releases.

CREDIT

  • Is the simplest of the bunch to use.
  • Handles SMP much more efficiently than both of the previous schedulers.
  • Doesn't have the real-time behavior of SEDF (time-sensitive guests can be impacted, such as VoIP or any RTP streaming applications)
  • Is the default scheduler in 3.0.3 and newer
  • Is the only one that will survive going forward

Xen HVM gotchas

HVM domains require an Intel VT or an AMD V (SVM) capable processor. You can check your cpuinfo flags for "vmx" or "svm" to see if your processor has support for this feature.
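
A quick sketch of that check, assuming a Linux system where /proc/cpuinfo has a "flags" line (the function name is mine):

```shell
# Print "vmx", "svm", or "none" depending on which hardware
# virtualization flag appears in a cpuinfo-style file.
# Defaults to /proc/cpuinfo; hypothetical helper name.
hvm_cpu_flag() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep '^flags' "$cpuinfo" | grep -qw vmx; then
        echo vmx
    elif grep '^flags' "$cpuinfo" | grep -qw svm; then
        echo svm
    else
        echo none
    fi
}

# Possible usage:
#   hvm_cpu_flag
```

Keep in mind that the flag only says the CPU has the feature. On Intel, the BIOS can still leave it disabled.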

The qemu bios used by xen is not patched for lba48, and you are limited to 160G disks.

You can use the commercial XenSource PV drivers (from XenExpress) to avoid the qemu-dm hardware emulation overhead.

HVM domains currently do not suspend/restore/migrate, much less live migrate. The announcement for 3.0.4 suggests that this is a feature slated for 3.0.5.

SMP support for HVM guests in 3.0.4 is better, as is support for other non-windows and non-linux guests, but I've yet to get SMP HVM guests working myself.

Xen volume size limits

There were numerous reports of 2TB limits with Xen vbd volumes as late as Xen 3.0.3, even with 64bit. No, I do not know if 3.0.4 addressed them.

Xen logical volume resizing

You can't resize LVM2 logical volumes on the fly and have the domU see them to allow them to resize their filesystems without rebooting.

This means downtime whenever I need to grow a domU's filesystem. I get to lvextend it, reboot the domU, then xfs_growfs the filesystem. In that order.
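
Since the order matters, here is a small sketch that just prints the three steps with the names filled in. The volume group, logical volume, domain name, and mount point are hypothetical arguments, the function name is mine, and "xm reboot" is my guess at how you would bounce the domU from dom0:

```shell
# Print, in order, the commands for growing a domU's XFS filesystem.
# Arguments: volume group, logical volume, size to add, domU name,
# mount point inside the domU. Dry run only; nothing is executed.
grow_domu_plan() {
    vg=$1
    lv=$2
    add=$3
    domu=$4
    mnt=$5
    echo "dom0# lvextend -L +$add /dev/$vg/$lv"
    echo "dom0# xm reboot $domu"
    echo "domU# xfs_growfs $mnt"
}

# Possible usage:
#   grow_domu_plan vg web-root 10G web /
```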

Frequency Scaling kills Xen

Just turn off any frequency scaling in your dom0 (like AMD powernowd, or cpufreq settings). It drives Xen crazy.

Xen's ACPI support

Xen has minimal ACPI support. Don't think you're going to get S3 or S5 sleep suspend/resume working with Xen on your laptop. If you do, LET ME KNOW.

Xen Xserver video drivers

The nVidia video driver needs the following patch to work with Xen.

There have been a couple of reports of symbol errors when loading this. No, I haven't tried it myself. This patch was from someone else via IRC (nick long forgotten):

patch-nv-1.0-9625-xenrt.txt

Xen PVs run ring1, not ring0

This means you can't run VMWare, QEMU/kqemu, or Linux kvm under a Xen PV (this includes dom0, which is a glorified PV).

In theory, you should be able to run VMWare or QEMU/kqemu under an HVM domU.

Xen supported kernels

Xen 3.0.3 ships with patches for Linux 2.6.16.29. Xen 3.0.4 ships with patches for Linux 2.6.16.33.

If you have a newer kernel running Xen, it's probably a distribution patched version.

This means, if you want a driver from 2.6.18 or 2.6.19, you either need to backport said driver to 2.6.16.x, or you need to bravely forge ahead and risk help from the xen-devel team.

Not that you're entirely unsupported, just that your distribution is bravely adopting a newer kernel with untested/unsupported patches.

In conclusion

Those are most of the biggies that people seem to clamor about the most. If you have any others, please drop me a line.

Thu, 28 Dec 2006

What exactly is the difference between Paravirtualization (PV) and Hardware Virtualization (HVM) with regard to Xen?

This question continues to come up again and again. Rather than answer it in a private email or rather useless IRC chat room, it seems best to summarize it in a blog post.

Paravirtualization means that guests "cooperate" with the virtualization they run under. This means that paravirtualized virtual machines are aware they are running in a virtual environment, and have special drivers or awareness of that environment as they run.

In the case of Xen, all guests run under the guidance of a tiny Xen "hypervisor".

Think of a hypervisor as a microkernel (remember when those were big?), that is responsible for allocating RAM, acting as an intermediary for IO, routing hardware interrupts, and scheduling a fair share of CPU time to each virtual machine.

By default in Xen, one virtual machine talks to the PCI hardware, doing all IO for the others. In Xen parlance, this is "domain 0" (or "dom0"), which is a master OS that talks to the hardware on the box and provides IO resources like networking and disk space to the other domains on the physical machine. There are Linux, OpenSolaris, and FreeBSD dom0s now; it isn't just Linux.

With Paravirtualization, your guest kernels need to be compiled to be aware of the Xen hypervisor, with a special Xen patch set. These kernels cannot run without the Xen hypervisor. They require Xen to operate.

The "new" thing out there is something that Xen calls Hardware Virtualized Machines (HVM). Normally, x86 based Operating Systems run with a kernel running in "ring0". Historically, only one Operating System can run as ring0 on an x86 based PC. Now, both Intel and AMD have added special VT/SVM CPU extensions that allow a special "privileged mode" of operation where a hypervisor can run multiple Operating Systems in ring0 at the same time.

Historically, without these VT/SVM instructions, you have to scan every code page for illegal instructions and/or trap instructions to emulate dangerous ones. This is how VMWare initially worked (today VMWare is a software/hardware hybrid that is aware of VT/SVM instructions). This is how QEMU, Parallels, Microsoft Virtual Server, and other virtualized PC platforms initially functioned.

With Hardware virtualization, you install an Operating System from CDROM just as if it is a physical machine. There is a BIOS, there is a VGA display (VNC/SDL), there are emulated IDE and RTL8139 network chipsets. The hardware is actually borrowed from the QEMU project, but thanks to the VT/SVM instructions, there is no need to scan the code or trap illegal CPU instructions the same way as the previous generation of PC virtualization had to.

The mainframe has had Hardware Virtualization since at least the OS/360 days. This is only something new to the PC platform.

While this is a fun essay, and I'd love to go on at length, I think this answers the initial question adequately.

If anyone else has any questions, please feel free to join us on ##xen on freenode, or drop me an email. Please don't be surprised if I post the answers here.

Xen documentation is sorely lacking. Let's try and change that, shall we?

Thu, 28 Dec 2006

I just finished backporting Xen 3.0.4 and a slew of 2.6.16.33 kernels to our standard platform (building patched debian packages along the way).

The new bits are an API change, better support for SMP and ACPI, some bug fixes, and framebuffer consoles for PVs (borrowing from HVM, it appears).

Here is the announcement from the list:

Folks, 

We're pleased to announce the official release of xen 3.0.4!

This is largely an opportunistic stabilising release for HVM guests, due to
the large amount of work in that area of the code since 3.0.3. These
enhancements have in particular improved support for SMP and ACPI Linux and
Windows operating systems.

Other highlights of this release include:
  - support for kexec/kdump of Xen and domain 0;
  - graphical framebuffer support for paravirtualised guests;
  - preview support for the new XenAPI management interfaces;
  - enhanced support for IA64 (IPF) and Power systems.

Since 3.0.4 is an interim release, certain features such as HVM save/restore
will now be part of Xen 3.0.5 which we expect to release in early 2007.

You can get the source using mercurial from:
  http://xenbits.xensource.com/xen-3.0.4-testing.hg

Source and binary tarballs, and RPMs, will be made available from:
  http://www.xensource.com/downloads

Cheers,
Keir (on behalf of the whole Xen dev team)

Thu, 16 Nov 2006

The process for converting a VMWare VMDK disk image to Xen HVM is quite easy. However, there are "gotchas" that you need to consider when doing this conversion.

First, and most importantly, identify if this is a SCSI or an IDE virtual disk. If you installed Windows to a SCSI disk under VMWare, it is unlikely that Windows has the IDE drivers appropriate for Xen HVM. To remedy this, you need to follow the guide documented in Microsoft KB314082.

Once you have ensured that your Windows image has IDE drivers installed, you can proceed to converting the image.

Next, you need "vmware-vdiskmanager" to convert newer VMWare VMDK files into a compatible format for further processing. This tool comes with VMWare 5.0 and VMWare Server 1.0. There is a similar (but different) method of doing this under VMWare ESX.

Identify the appropriate vmdk file to use that represents your disk. This will either be:

  1. The lone .vmdk file that is rather tiny and contains a number of lines of text describing the geometry and component series of files that comprise the whole .vmdk.
  2. The first .vmdk file in a series of 2G segmented files named with trailing -0001 style numbering.
  3. The last "snapshot" .vmdk file in a series (again, named with trailing -00001 style numbering).
  4. The latest "REDO" .vmdk file in a series of snapshots.

I'm sure there are more incarnations of this. It's rather hairy if you've not dealt with it before.

How do you find the right one? Look inside your ".vmx" file for a line beginning with:

scsi0:0.fileName = windows2003.vmdk

or

ide0:0.fileName = windows2003.vmdk
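
If you'd rather not eyeball the file, a sketch like this pulls the names out for you. It assumes the value may or may not be wrapped in double quotes, and the function name is mine. Note that it will also list anything else attached to a scsi or ide slot, such as a CDROM image:

```shell
# Print the file names attached to scsiN:M or ideN:M slots in a .vmx
# file ($1), with any surrounding double quotes stripped.
# Hypothetical helper name.
vmx_disks() {
    grep -E '^(scsi|ide)[0-9]+:[0-9]+\.fileName' "$1" |
        sed -e 's/^[^=]*=[ ]*//' -e 's/^"//' -e 's/"[ ]*$//'
}

# Possible usage:
#   vmx_disks windows2003.vmx
```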

That's all there is to it. Now, let's assume the name of our disk is "windows2003.vmdk".

$ vmware-vdiskmanager -r windows2003.vmdk -t 0 windows2003-flattened.vmdk

This will create a "single growable virtual disk" that is flattened into a single file.

The next step is to turn this flattened .vmdk file into a disk image with qemu-img from the QEMU project.

$ qemu-img convert windows2003-flattened.vmdk windows2003.img

When this completes, you will now have a windows2003.img file that might boot for you.

The unfortunate reality of running a Windows OS is that it makes a number of assumptions at install time as to your PC hardware. If you transplant the image, you may need to change the Hardware Abstraction Layer (HAL).

Windows 2003, for example, has 6 HALs:

HALMACPI.DLL - ACPI Multi processor PC
HALAACPI.DLL - ACPI Uniprocessor PC
HALACPI.DLL - Advanced Configuration and Power Interface (ACPI)
HALMPS.DLL - MPS Multiprocessor PC
HALAPIC.DLL - MPS Uniprocessor PC
HAL.DLL - Standard PC

Only one is selected and installed as \WINDOWS\SYSTEM32\HAL.DLL at install time.

It is possible to modify your C:\boot.ini to specify a different "/HAL=HAL.DLL", if you copy in the other DLLs so they can be referenced. In this way, it is possible to do some trial and error to see which of the above HALs work with which domU HVM configuration.

When you create your Xen configuration file, you have the opportunity to set four flags that critically interact with the above HALs, namely:

# enable/disable HVM guest PAE, default=0 (disabled)
pae=0

# enable/disable HVM guest ACPI, default=0 (disabled)
acpi=0

# enable/disable HVM guest APIC, default=0 (disabled)
apic=0

# The number of CPUs to assign to this domU
vcpus=1

The above configuration would be most at home with the "Standard PC" HAL.DLL.

For the MPS HALs, one would assume you would enable APIC.

For the ACPI HALs, one would assume you would enable ACPI.

Good luck figuring out which Xen configuration matches which HAL. At the moment, the only success I've really had with Xen 3.0.3's HVM is to use the "Standard PC" HAL.DLL.

When VMWare was used to build the Windows image, it detected ACPI and used an ACPI HAL. To revert this to the "Standard PC" HAL.DLL, I had to mount the image and replace this file:

# mount -o loop,offset=$((63*512)),rw windows2003.img /mnt
# find /mnt -name 'hal*.dll' -print
/mnt/WINDOWS/ServicePackFiles/i386/halaacpi.dll
/mnt/WINDOWS/ServicePackFiles/i386/hal.dll
/mnt/WINDOWS/ServicePackFiles/i386/halacpi.dll
/mnt/WINDOWS/ServicePackFiles/i386/halapic.dll
/mnt/WINDOWS/ServicePackFiles/i386/halmacpi.dll
/mnt/WINDOWS/ServicePackFiles/i386/halmps.dll
/mnt/WINDOWS/system32/hal.dll
# cp -f /mnt/WINDOWS/ServicePackFiles/i386/hal.dll /mnt/WINDOWS/system32/hal.dll
# umount /mnt
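
The offset=$((63*512)) in the mount command assumes the first partition starts at sector 63 with 512-byte sectors, which is where Windows of this vintage normally puts it. If your image differs, "fdisk -lu" on the image should show the real start sector. A tiny sketch of the arithmetic (the function name is mine):

```shell
# Print the byte offset of a partition given its start sector ($1) and
# an optional sector size in bytes ($2, default 512).
# Hypothetical helper name.
part_offset() {
    echo $(( $1 * ${2:-512} ))
}

# Possible usage:
#   mount -o loop,offset=$(part_offset 63),rw windows2003.img /mnt
```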

Now that you have a "fixed" img file representing the entire drive, you can dd it straight to a lvm logical volume to be used as a Xen phy: vbd device:

# ls -la windows2003.img
-rw-r--r--  1 root root 8589934592 2006-11-16 13:44 windows2003.img
# lvcreate -L 8G -n win2003-hda vg
# dd if=windows2003.img of=/dev/vg/win2003-hda bs=1M
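
The size handed to lvcreate has to cover the byte count that ls reports for the image, or dd will run out of room. A sketch of that arithmetic, rounding up to whole gigabytes (the function name is mine):

```shell
# Print the smallest whole number of GiB that holds $1 bytes.
# Hypothetical helper name.
lv_size_gib() {
    echo $(( ($1 + 1073741823) / 1073741824 ))
}

# Possible usage:
#   lvcreate -L "$(lv_size_gib 8589934592)G" -n win2003-hda vg
```

For the 8589934592 byte image above this works out to exactly 8, matching the 8G used in the lvcreate line.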

Now you are done. Start up your spiffy new HVM domain.

This, in a nutshell, is how you convert a VMWare image into a Xen HVM disk image.

Thu, 16 Nov 2006

Xen HVM uses the AMD SVM (Pacifica) and Intel VTX (Vanderpool) hardware CPU virtualization.

Both Parallels and VMWare now utilize the same VTX technologies in their products. Based on blogs I have read, VMWare added VTX support somewhere around or just before VMWare Workstation 5.5, and Parallels has supported Core-Duo Intel Macs since their beginning. No, I don't know if either supports AMD's SVM quite yet.

Aside from these products, I am currently unaware of anything else that uses today's modern CPU SVM or VTX features.

VMWare Workstation and Server normally run alongside a host OS, inserting a "vmmon" driver into Ring0. VMWare ESX has its own hypervisor, much like Xen, though you do need to embrace RedHat for their management harness. Hardware is emulated virtually in software (IDE, SCSI via Buslogic/LSI, Network via Pcnet32/VMX, etc). Guest OSes talk to these drivers as if they were running on a physical machine.

Xen is a small hypervisor that "paravirtualizes" CPU scheduling and assigns hardware resources to virtual "domains". The first domain, dom0, is responsible for talking to your PC's hardware directly. Each "guest" domain, or domU, can only talk directly to hardware if it has been configured to allow such access. Typically, a domU only has "frontend" drivers that talk to resources exposed by a "backend" typically from dom0. Things like virtual block devices and virtual network interfaces are handled by native Xen aware device drivers in such paravirtualized domUs.

Xen can also run in HVM mode. This means that instead of paravirtualized devices, a real set of virtual hardware is exposed to the domU to use real device drivers to talk to. Much like