Fenxi - Performance analysis made easy
Changing libgnomecups For Multiple Evolution Users
Happy National Sys Admin Appreciation Day!
ESX iSCSI Basic Configuration from the CLI
Tape Rants and Raves: LTO4 Rules
apparently you aren't dead until you start to stink
Charlie Goes to Candy Mountain
Seattle Scalability Conference, Pt II
How a VoIP E911 call is handled
MonetDB - a column based RDBMS, ideal for time series data
VMfaq's comparison of virtual storage IO
Xen and Solaris, a log of experience.
OpenSolaris CR#6654713 - 32G limit bug stemmed from bad USB hardware? Perhaps fixed?
OpenSolaris CommonArrayManager
Sharity-Light - smbfs derived samba clone
Drizzle, a thin mysql, generating buzz
VMWare to offer ESX hypervisor for free
Fan, the programming language.
Blackberry Thunder with Haptics keyboard
iPhone App Store Live Walkthrough now available
Overclocking tool for the Mac Pro
ADO.NET Entity Framework (Microsoft's new ORM) given a non-confidence vote by beta testers
Ruby interpreter flaws make the case for JRuby
AdvFS - Tru64 filesystem ported to Linux
OpenSolaris 2005.05 repository update to b91 - follow these instructions carefully
SXCE can ZFS install as of b90
Vertebra: EngineYard's Next Generation Cloud Computing Platform
Skype 4.0 beta overhauls video chat
Mozilla org receives traditional IE cake
Toyota Prius to go entirely Electric
Bill Gates steps down permanently for philanthropic activities
Men write code from Mars, Women write more helpful code from Venus
DRBD LVM Xen = Bug. A rather nasty one at that.
Intel unveils Ct as an extension for C/C++ to encourage threaded programming for multiple cores
VMWare ThinApp - Run any Windows app on any version of Windows
JRuby-Rack <-- a JRuby port of Rack
Rack <-- a lighter cousin to Merb, fully threaded and no Mutex.
Solaris Cluster Express (SCX) 6/08 released.
Changing solaris' default password hashing
Texas based service provider explosion affects 9,000 servers and 7,500 customers.
JRuby on Rails on Tomcat deployed as a WAR file
42 more of the best Linux games
Use Google's cached ajax libraries
Arduino microcontroller with OS/X
The metasploit page describing the full impact of the poor RNG.
Holger Bert's blog post on the openssl RNG fiasco
Cayac - Cherokee MySQL PHP5 phpMyAdmin
ZFS very slow under an xVM kernel
Dynamically editing libvirt xml configs while a VM is running to redefine reboot flags.
Chronoton - the time travelling robot whose best friend is a talking pie game
Rietveld - Google's code review tool
Opensource multitouch displays
Ono - an efficient way to locate nearby peers
Solaris CIFS integrated AD with ZFS acls
Samba Winbind and ZFS acl working together
Why's unholy Ruby to Python .pyc compiler
OpenSolaris 2008.05 final ISO image
Twitter abandoning Ruby on Rails
HP makes memory from a once-theoretical circuit
Setting Up an OpenSolaris NAS Box: Father-Son Bonding - The Video
Linux kernel Xen self-ballooning patch
Coolstack - Yet another group of solaris packages
SFE - Spec Files Extra - or, solaris's ports system
ksplice - live linux kernel patching
ZFS-102-A.pkg - binary package build of newer ZFS for Mac
Changing boot flags for a solaris domU guest
callflow - SIP callflow diagram generator
sdedit - quick sequence diagram editor
Milax - The OpenSolaris Small Live CD
Big Nerd Ranch on Windows/Linux/Leopard single signon
Sun touts big plans for OpenSolaris as first release nears
Heroku - EC2 based Rails hosting.
Meadowcourt's compiled WindowsXenPV driver, v0.8.8, as built from win-pvdrivers.hg repo
Network Solutions hijacks all customers' unused subdomains
ZFS speed bump: set zfs_nocacheflush = 1
We Don't Use Software That Costs Money Here
Hubble - a PlanetLab realtime Internet "blackhole" monitor
Citrix price jumps on rumors of potential IBM/Cisco bidding war
TechCrunch labs on their AppEngine deployment
pash - because powershell was too cool to let microsoft keep to itself
Brazil migrates 430 thousand voting machines to Linux
The Machine Emulator - TME can emulate a sparc4 with OBP
Google releases new GCC linker
Automatic generation of peephole superoptimizers
Xen.org Trademark Policy for Review
SXCE b85 has problems booting under Xen 3.2
VNRP == opensolaris quagga rbridges crossbow xVM
problems reprobing iscsi devices with solaris 10
LSI MegaRAID SAS/Dell PERC5 driver for Solaris
dm-band block IO bandwidth controller
Dojo.storage - Google Gears workalike?
ooma.com - free phone service after you buy their device
Hacking defibrillators shockingly easy
Microsoft working with Eclipse.
Pentagon attack last June stole an "amazing amount" of data
Solaris and Solaris Cluster on HP ProLiant Servers
Apple Introduces new MacBook and MacBook Pro models
Sun leaks 6-core Xeon, Nehalem details
Xen and Solaris - a journal of sorts
How to save the world with ZFS and 12 USB sticks
Xvm: a summary of creation of various Xen domU
OpenSolaris b82 comes with CoolStack
Dilbert PHB on Virtualization Consultants
Sun xVM Ops Center GA v1.0 tomorrow
KernelTrap on the 2.6.23 Xen merge
IETF XMPP/SIMPLE Interworking Draft
PSYCed - IRC/XMPP server that gateways transparently between both
OTR - Off The Record, Homepage. IM Encryption.
SIPE - Pidgin plugin for SIP/SIMPLE with Microsoft LCS compatibility hacks
Price Waterhouse Cooper's Global Cable Map
Solaris Windows iSCSI speedup disabling NAGLE
OpenSolaris Storage Developer Wish List
Nexenta Builder - build your own Nexenta based distribution
Microsoft to acquire SideKick maker Danger
Linux Kernel 2.6.23-2.6.24 vmsplice local root exploit
The evolution of Tech Company logos
Mindstorms NXT Rubiks Cube Solver
Cut four undersea cables, shame on you, cut a fifth, also shame on you
Koha - OpenSource Integrated Library System
SIPE - SIP Exchange protocol - or, how to get Pidgin to talk to Microsoft Live Communication Server
Amazon SimpleDB written in Erlang
Xen DR7 and CR4 Registers Multiple Local DoS vulnerabilities
XMLPulse - parse xen dom0/domu stats
The rise of the FOSS spinmeister
Smartphones patented - lawsuits immediately filed
H-Sphere cross-platform hosting control-panel
Mystery infestation strikes Linux/Apache web sites
GNU/Solaris - When the fun begins
KDE goes cross platform with Windows and Mac/OSX support.
Microsoft prints get-out-of-jail card for Vista Home
Tsung - an erlang based multi-protocol distributed load testing tool
Microsoft relents, ban on vista virtualization is lifted
Hyperic podcast talking smack with Luke Kanies of Puppet
The Mysql storage engines, and when they are appropriate.
MADOCA - Message And Database Oriented Control Architecture
SMP Xen HVM Windows guests need timer_mode=1
James Randi is coming to Tampa
Information Of Those Who Appealed Watch List Compromised
Tata Nano - $2500 world's cheapest car
Air Travel with Spare Batteries? Check the changes to what is permitted starting tomorrow.
Open Configuration and Management Layer
FiveRuns RM-Manage - rails project monitoring
VLDB - Very Large Data Base Endowment Inc - nonprofit
Elastix - a more friendly Trixbox fork
A Glimpse and a Hook - a take on resumes
Xirrus - LISA used 7 arrays to provide WiFi
dopd - an easier way to keep drbd primary/secondaries in sync
OpenSIM - run your own SecondLife grid.
$4million in hardware lost in London data center heist
iscsi block device script for /etc/xen/scripts
Quaqua - Aqua look and feel widgets for jvm
Chimps beat humans in memory tests.
Level 3 needs technicians with FIREBALLS
10 steps to close down an open society
Longer flights to avoid air traffic control charges
News release from Six Apart about LJ sale to SUP
Optimus keyboard is finally available
pkgGen and logGen and Packagemaker - repackage os/x packages to deploy
Jumpbox.com - virtual appliances
TelegraphCQ - Berkeley database research - adaptive dataflow capture, combine, analyze
UK loses CD of private info on 25million citizens
Solaris Automatic Migration opensourced
AVS ZFS Demo <-- replicated ZFS pool
Xen Virtualization book not yet published for sale on Amazon
Phoenix BIOS releasing its own hypervisor
Andrew Warfield's other publications
Parallax - managing storage for a million virtual machines, from the Xen guys at Cambridge
Kepler project - GRID scientific workflow engine
Google Code Map/Reduce mini lectures
What 24 would have been like in 1994.
WaterRoof - Mac OS/X Firewall Manager
10 reasons why Oracle databases run best on VMWare
Google Caja - allow scripts in a 3rd party context
Xen Windows PV drivers - opensource mercurial repository
QuickSilver - opensourced 11/06/07
vmcasting.org - someone else "gets it"
ASUS EEEPC701 starts to appear
Perian - Opensource quicktime codecs
RSnapshot - an rsync based dirvish like tool
Flyback - a google code project equivalent to Apple's Time Machine, for Linux
Apple tablet PC is real, says Asus.
producten.hema.nl - wait for this one to load
Google rolls out the Open Handset Alliance
Cost analysis of Windows Vista Content Protection
Git - a Google Talk by Randal Schwartz
indeed.com - MIT search engine for jobs crawled from monster, dice, etc.
Tomshardware's RAID Migration Adventure
Theo de Raadt on Virtualization, and the state of OpenBSD Xen
Bitlbee - IRC gateway for all of your other IM traffic
Off The Record - encrypted IM overlay
SATA drive -> NES cartridge style
Amazon's one-click patents struck down
Morgan Stanley sells entire New York Times stake
Massive installation management tools
GULP: a unified logging architecture for authentication data
EC2 outage loses customer data
FutureOfWebApps conference underway
Microsoft releasing the Source Code for the .NET libraries
Windows 2003 Server Emergency Management Services (EMS) - Special Administration Console (SAC)
Catalyst - the Perl web framework analog to Rails
Fusion io - the power of 1000 harddrives in the palm of your hand
Proggyfonts.com - fixed width font downloads
BarCamp Orlando is this weekend
How to use CHDK to give your Canon digital camera RAW support
Cygnal - When Red5 just won't cut it for an RTMP server
IBM's CoScripter - automating web-based processes
AjaxWindows.com - Another Michael Robertson company
p0f passive fingerprinting IDS
Talking storage systems with Sun's ZFS team
SproutCore - a MVC scaffolding for actual Application development
Skype protocol obfuscation layer
Microsoft Silverlight and the Mono team at Novell join up to create the Moonlight project
Bitlbee - bridge IM client networks to an IRC channel.
EJBCA - The J2EE Certificate Authority
Mcell 3.5" drive has 1GB of DDR RAM 2.5" drive == 110MB/s transfer rates
OpenSolaris Xen domU with a linux dom0
Tentakel: distributed command execution
Ganeti: Opensource virtual server management software for Xen
Seamless dynamic image resizing
It's been a while since I've posted here. Let me bring you up to speed.
I've started deploying Solaris xVM in an attempt to use ZFS on the backend via iSCSI and the goodness of Xen 3.1.2 that is now in SXCE b85.
The first step in embracing xVM was deciphering the labyrinth that is Sun marketing.
OpenSolaris is the source based distribution. It relates to ON source trees and compiling things. If you state in #opensolaris that you are using "OpenSolaris", they assume you are building from source and are a developer.
The ON source trees are there for developers to build from. Likewise, the Blindingly Fast Updates (BFUs) are there for developers to update binaries between weekly builds, so they don't have to rebuild an entire tree. If you use BFUs, you break packaging and upgrades, and are effectively on your own.
What you are most interested in is Solaris Express Community Edition, otherwise known as SXCE.
SXCE is based on weekly build numbers, and is released every other week as an ISO image for mainstream users to play with. You can LiveUpdate between SXCE releases, and all packaging is handled properly.
SXDE is SXCE "frozen" quarterly. It is dead now. Ian killed it. Let me explain...
The Linux distribution Debian is pronounced /ˈde.bi.ən/. It comes from the names of the creator of Debian, Ian Murdock, and his wife, Debra.
Sun hired Ian Murdock. Ian has been changing things internally within Sun. Ian has championed a Linuxization of Solaris of sorts, which is a bit against the grain of most senior Solaris folks.
The new Indiana project is a culmination of this effort. It is effectively a repackaging of Solaris leaning more on the GNU tools and adopting a new "pkg" format that can update from repositories more readily. The current release of Indiana is Developer Preview 2, which is based off of SXCE b79. It is a live CD that can run a full desktop environment without actually installing it on a machine. There is an integrated "light" version of the caiman installer on the desktop that will allow you to install to harddrive media if you wish.
The "pkg" packaging in Indiana is a wonderful thing. Unfortunately, the repository doesn't appear to be updating every two weeks like SXCE... yet. They're planning on doing this soon, which should make updates relatively painless. Automatic dependency resolution and the ability to point to a server or media repos make this very similar to the apt-get way of doing things, though it's a Python based system (rather than Perl) that is actively under development.
Indiana installs to a zfs root. SXCE currently (as of b85) only installs to a UFS root.
While you can make SXCE boot to a zfs root, you effectively break LiveUpdate, as it doesn't grok zfs root.
Indiana doesn't need LiveUpdate. The "pkg" system will soon automagically do zfs snapshots to do upgrades (similar to the Nexenta apt-clone that I absolutely love), but you can approximate that now with minor effort.
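One way to approximate this by hand is to take a recursive ZFS snapshot of the root dataset before upgrading. The following is a minimal Python sketch, not the pkg system's own mechanism. The dataset name rpool/ROOT/opensolaris and the snapshot naming scheme are assumptions for illustration only. It just builds the `zfs snapshot -r` command unless dry_run is turned off.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_command(dataset, when=None):
    """Build a recursive 'zfs snapshot' command for the given dataset.

    The 'pre-upgrade-<timestamp>' naming scheme is an assumption,
    chosen only so that snapshots sort by creation time.
    """
    when = when or datetime.now(timezone.utc)
    name = when.strftime("pre-upgrade-%Y%m%d-%H%M%S")
    return ["zfs", "snapshot", "-r", f"{dataset}@{name}"]


def snapshot_before_upgrade(dataset, dry_run=True):
    """Return the snapshot command, and run it only when dry_run is False."""
    cmd = snapshot_command(dataset)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises if zfs exits non-zero
    return cmd


if __name__ == "__main__":
    # Hypothetical root dataset name; substitute your own.
    print(" ".join(snapshot_before_upgrade("rpool/ROOT/opensolaris")))
```

If the upgrade goes badly, the snapshot gives you something to roll back to or clone from.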
... back to the explanation: Due to the advent of Indiana and Ian Murdock's influence, it looks like SXDE is effectively dead. There will reportedly be no future SXDE releases.
The default boot option of SXCE is "Solaris Express Developer Release". This is the caiman installer that is slightly bleeding edge and installs everything possible in a rather simple way.
The SXCE "Solaris Express" boot option is for the older more familiar Solaris installer. This allows you to fully specify what packages to install, and is more involved at install time.
Back to xVM: SXCE b89 will be the freeze point for Sun's xVM Server.
SXCE is currently in week b87, so in 2 more weeks there will be a deep freeze for that build.
Again, you have to pay attention to the community posts, flag days, and other things that let you get a feel for Sun's release cycle and marketing changes. I'm only just beginning to get a handle on it.
So, in conclusion, if you want to play with xVM, b85->b89 is a great time to get up to speed for the xVM Server product release.
Here's a Makefile I use to build freenx, posted here for others to use:
To initialize, run "/opt/nx/2.1.0/nxsetup --install", and you're done.
You may have to edit nxloadconfig and/or nxserver to replace the /usr/NX path with the installed path of /opt/nx/2.1.0. You may also want to edit /data/nx/conf/nxnode.conf with site specific changes.
More to come.
Enjoy.
Picking the right virtualization technology requires a basic understanding of what is available out there today.
Rik Van Riel has put up the virt.kernelnewbies.org page that shows a number of the existing virtualization methods. You might want to peruse this first to get a feel.
"Bare Metal" or "Raw Iron"
Basic computing today typically occurs on "Bare Metal". This is where your Operating System is installed directly on a given hardware platform. This "Raw Iron" role is how most people treat computing platforms today.
Some higher end hardware platforms offer "Hardware Partitioning". This is where the hardware platform is divvied up between multiple parallel operating systems at the same time. The hardware platform offers up CPUs, memory, and disk to independent operating systems that then run on the resources allocated to them. This isn't as much virtualization as it is resource partitioning. An example of this would be higher end Unix hardware like Sun T1 processor based servers: each hardware platform can be broken up into 32 "LDoms", each with its own install of Solaris.
VPS "Containers" - Security/Role based Virtualization
If your userspace applications don't require unique kernel services to operate, you get far more density with a VPS "Container" solution than with any other virtualization method. Simply put, all of your userspace applications share one kernel and are separated from each other via role based security mechanisms.
There are a number of different VPS technologies out there, each with its own benefits and limitations:
OpenVZ/Vserver
Linux-Vserver
Solaris Zones
BSD Jails
Solaris Zones is the only VPS platform that supports running other flavors of Unix under its "BrandZ" containers. With it, you can run a number of 32bit Linux guest flavors alongside various Solaris/OpenSolaris versions.
OpenVZ has relatively new support for IPTables as well as IPSEC independent to guests, as well as live migration.
Simply put, you should really spend some time verifying that a VPS solution won't solve your virtualization problems first. They are the best method of virtualizing with the least amount of overhead and the highest virtualization density.
User-Mode-Linux
If you need a unique kernel for each virtual machine, and don't mind a bit of overhead, User-Mode-Linux provides a secure jail with a Linux kernel, running entirely in userspace.
Using "skas0", a User-Mode-Linux kernel can boot and run under any Linux kernel without much host kernel support (usually only tuntap networking). The I/O performance of User-Mode-Linux does suffer somewhat, however, and RAM allocation per virtual image isn't as ideal as a VPS solution.
The obvious benefit is the ability to run and manage a User-Mode-Linux virtual server as userspace processes on any "standard" Linux kernel.
If you're going to use User-Mode-Linux, I strongly suggest trying Xen paravirtualization instead. The only thing that User-Mode-Linux buys you is the ability to oversubscribe memory based on host kernel virtual memory paging. Xen doesn't let you overcommit the RAM allocated to guests (though it does let you change the running memory footprint on the fly, unlike User-Mode-Linux which pre-allocates it from tmpfs).
User-Mode-Linux suffers from low I/O throughput however, and tends to fall apart under load.
Paravirtualization
Paravirtualization uses a technique of "cooperative virtualization" between guests and a hypervisor. Simply put, a paravirtualized guest virtual machine is aware that it is running under a virtual environment, and adapts to this environment as appropriate.
Xen's hypercall API is well documented, and has been available to the community longer than VMWare's VMI interface. As such, there are a number of Xen "PV" ports including FreeBSD, OpenBSD, and OpenSolaris, as well as the native Linux port that Xen embraces as part of the current opensource Xen platform.
Xen is slowly being ported into the Linux kernel proper, but there is much developer pushback to each stage of the import effort. Instead, the Linux Kernel Maintainers are gung-ho about Rusty's l-guest (previously known as "l-hype") as a paravirtualization platform for future Linux kernels. At this time, l-guest is very immature and quite slow, not nearly ready enough to consider for a production deployment.
VMWare opened up their VMI specification for everyone to use, to entice systems developers to standardize on a paravirtualization API. Providing this VMI interface would allow VMI aware guests to run under VMI aware hypervisors. Unfortunately, the device interface doesn't appear to have made the cut, so guests still need to be aware of paravirtualized devices as well.
Xen PV "backend" devices appear on a XenBus, and are accessed using a PV "frontend" device driver. Natively, the opensource Xen 3.0 only has Linux 2.6 PV drivers. The various Xen ports of FreeBSD, OpenBSD, and OpenSolaris each have their own PV "frontend" driver implementation.
VMWare ESX uses their LSI SCSI device driver and VMX networking driver to optimally talk to virtual devices. These are available for a number of operating systems and are far more mature than Xen.
Some of the benefits of a paravirtualized guest include the ability to reallocate resources on the fly from the hypervisor (changing memory footprint, hotplugging CPUs) and more integrated lifecycle management (reboot, suspend, migrate).
Both Xen and VMWare ESX are hypervisor approaches with the ability to run paravirtualized guests on intel class hardware.
Xen 2.0 initially offered only a paravirtualized "PV" mode of operation. Xen 3.0 offers it as well, alongside the Hardware Virtualized "HVM" mode that we will cover in the next section.
System Virtualization - Virtual Bare Metal
If VPS, User-Mode-Linux, and Paravirtualization aren't adequate to the task you have at hand, it might be time to consider full system virtualization.
This mode of operation is normally much more resource intensive, and is far less scalable than the earlier virtualization methods. However, for some Operating Systems (like Microsoft Windows), there really are no better choices at the moment.
Full System Virtualization is done in a number of ways.
The entire virtual system memory address space is pre-allocated, and appears to the virtual machine to be a linear address space regardless of how it is actually mapped from the physical hardware address space.
A system BIOS boots inside this address space, much like a full PC's BIOS would boot, providing a real-mode int13 interface to emulated chipsets inside the virtual machine. The Operating System boots and loads device drivers to interface with the emulated chipsets. As far as the Operating System is concerned, it is running "Bare Metal".
There are a few methods of full system virtualization: software emulation only, software code-scanning and emulation, hardware only, hybrid software with hardware assistance. The difference is really in how each uses Intel VT (vmx) or AMD V (svm) CPU virtualization.
A CPU software emulation only approach is slow. QEMU (without kqemu), BOCHS, older versions of SoftPC for Mac, etc, are prime examples of this. The benefits are that a non-intel hardware platform can run emulated intel software, and that the emulation can be run entirely (if inefficiently) in userspace.
A CPU software code-scanning and emulation approach is much faster than software emulation only. Guest code pages are scanned for illegal instructions, and illegal code is "trapped" to handle opcodes and operations that would endanger other virtual machines outside of a given virtual machine sandbox. This method only works on like architectures (intel code scanning on intel hardware) and doesn't require any special CPU support for hardware emulation. QEMU (with kqemu), Win4Lin, Virtuozzo, and a number of other "pre-VT" system virtualization technologies used this approach.
A CPU hardware assisted only solution is really limited to two implementations at present. The Linux kvm project allows full system guests to run under a linux host kernel using a modified QEMU to present the virtual emulated chipsets and other system features. Likewise, Xen's Hardware Virtual Machine (HVM) does the same, only running natively under the Xen hypervisor instead of under a Linux kernel.
A hybrid software with CPU hardware assistance approach can be a bit faster than hardware assisted virtualization alone. VirtualBox is the only opensource project of note at the moment that does this. Commercially, VMWare and Parallels both use this hybrid approach to accelerate system virtualization.
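To keep the four approaches straight, here is a small Python sketch that encodes them as data and filters them by what your host can do. It is only a summary of the paragraphs above, not a tool from any of these projects. Marking the hybrid approach as "optional" for VT/SVM is my reading of how those products behave, so treat that field as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    name: str
    vt_or_svm: str            # "no", "optional", or "required"
    cross_architecture: bool  # can it run intel guests on non-intel hosts?
    examples: tuple


APPROACHES = (
    Approach("software emulation only", "no", True,
             ("QEMU (without kqemu)", "BOCHS", "SoftPC")),
    Approach("software code-scanning and emulation", "no", False,
             ("QEMU (with kqemu)", "Win4Lin", "Virtuozzo")),
    Approach("hardware assisted only", "required", False,
             ("kvm", "Xen HVM")),
    # "optional" is an assumption: hybrids use VT/SVM when it is there.
    Approach("hybrid software with hardware assistance", "optional", False,
             ("VirtualBox", "VMWare", "Parallels")),
)


def candidates(cpu_has_vt_or_svm, same_architecture):
    """List the approach names usable on a host with the given properties."""
    return [
        a.name for a in APPROACHES
        if (cpu_has_vt_or_svm or a.vt_or_svm != "required")
        and (same_architecture or a.cross_architecture)
    ]


if __name__ == "__main__":
    # An intel host without VT/SVM running intel guests.
    for name in candidates(cpu_has_vt_or_svm=False, same_architecture=True):
        print(name)
```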
Of the full system virtualization technologies, VMWare is by far the most mature and fully featured. It is, however, commercially licensed. While you can get "Free" versions of VMWare Player and VMWare Server, there are real limitations as to how scalable either are, and what you can do with them.
VMWare Workstation is the "bleeding edge" version of VMWare. All innovations happen on that platform first. The stripped down player is based on VMWare Workstation. Eventually, many of these innovations make their way back into the server grade versions of VMWare.
IBM's power hypervisor is the oddball here, but it's important to mention. iSeries/pSeries have collapsed onto the Power5 hardware architecture with the hypervisor based i5/OS. Using Transitive's x86 emulation, this platform will (soon? already?) run "hundreds of virtual PCs" as well as AS/400, AIX5L, and native Linux on a single hardware platform. Heck, with Fundamental's FLEX-ES, UMX's Virtual Mainframe Facility, or even hercules, you can even emulate a zSeries mainframe.
Unfortunately, power5 hardware isn't commodity PC hosting gear. And that's probably the kind of hardware you're looking at, isn't it?
So, you really really want to use Xen?
First, let's consider the "flavors of Xen".
There are three primary "flavors" of Xen: Opensource Xen, XenSource Enterprise/Express, and Virtual Iron's Xen.
As we're still talking about full system virtualization rather than paravirtualization from this point on, it's important to realize the speed impact of using emulated chipset devices and generic device drivers rather than PV device drivers to access disk and network resources.
Xen uses QEMU to emulate an Intel PIIX3 IDE chipset (with some PIIX4 features), and a Realtek 8139 network card. While the IDE chipset emulation is bearable, it does incur a bit of CPU overhead in dom0 as QEMU emulates the chipset. The network emulation, on the other hand, is abysmal. Upload rates are "ok" at 6mbit+, but download rates are below 1mbit in speed, running on standard commodity PC hardware. While it could be a mere IRQ issue, it is important to realize that running with the IDE drivers and RTL8139 drivers inside your guest is going to significantly impact your virtual system's performance.
This is where PV drivers come in.
OpenSource Xen and XenSource both have a XenBus upon which "PV devices" appear. Virtual Iron reworked their XenBus into NexBus, largely to support live migration of HVM guests, and likewise have their own unique "PV devices".
Each "flavor" of Xen needs a different set of PV device drivers.
OpenSource Xen 3.0 has been incorporated into a number of Linux Distributions: SuSE 10.1, RedHat Enterprise Linux 5, Fedora Core 6, Debian Etch, Ubuntu Edgy, and Gentoo are just a few.
The Xen project includes "unmodified_kernel" drivers for Linux 2.6. This means, if you want to run full system virtualization using Xen HVM, you only have the option of building Linux 2.6 PV drivers for your guest.
Only Novell's SuSE 10.2 commercial "Xen pilot" will have Windows PV drivers. There are no other OpenSource Xen device drivers for Windows at this time.
XenSource Enterprise/Express, on the other hand, have their own PV device drivers. While you can "almost" use the XenSource PV device drivers with the OpenSource Xen, there is much talk of data corruption and general "that just shouldn't work" messages on the IRC channel from XenSource developers. Simply put, if you run the commercial XenSource product, you should use the XenSource drivers.
Likewise, Virtual Iron has their own device drivers that are unique to their hosting platform. Their "vstools" support one version of SuSE 9 and one version of RedHat Enterprise Linux 4 (U2) in addition to their Windows drivers. While you can download the domu sources from their website, good luck trying to get them running on a linux kernel newer than around 2.6.9. I know. I've tried. If you want to run a Linux guest in Virtual Iron, you're pretty much limited to RHEL4U2. Good luck with anything else.
What if I just want to run Windows under OpenSource virtualization?
OpenSource Xen doesn't have the PV drivers yet. It will be too slow for you to really use in a production capacity.
VirtualBox.org would be my suggestion to you. It includes device drivers that seriously speed up the Windows experience and make it a viable full system virtualized environment for opensource based windows hosting.
If you don't mind forking out the coin, Virtual Iron has a good Windows virtualization platform that is much cheaper than VMWare, and is licensed per socket. With it, you get live migration and vendor support.
If you seriously have no qualms about the cost of the virtualization and want a mature top notch platform, fork out the cash for VMWare ESX.
If none of these solutions seem good to you, look at the "free" VMWare Server. It is based on mature VMWare GSX tech (though features have been whittled down in places). It doesn't scale as well as VMWare ESX, but the cost point is much easier to swallow (free as in beer).
Use the best tool for the job. Move on to the larger business problems. How is that SOA deployment going, anyway? ;)
Unlike AMD's V (svm) support, Intel's VT (vmx) mode requires BIOS support.
More specifically, your motherboard vendor (or system vendor) must allow enabling vmx mode in their BIOS. Without BIOS support, you cannot use vmx mode.
Vendors apparently can disable vmx support in their systems entirely by setting the lock bit in the Feature Control MSR. Some vendors like HP have taken to disabling VT support in laptops, claiming that they disable it because they don't test it before shipping...
If your system BIOS supports enabling VT, doing so does NOT immediately make VT mode available. In fact you must hard power cycle the CPU for this change to take effect.
While documented fairly frequently (based on my google results), this apparently continues to bite new Xen HVM users.
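A quick first check on a Linux host is whether the CPU advertises vmx or svm at all. The sketch below is a minimal Python example that assumes a Linux-style /proc/cpuinfo with a "flags" line; the function name is mine. A flag showing up here does not by itself prove the BIOS has left VT enabled, so treat it as a necessary check rather than a sufficient one.

```python
from pathlib import Path


def hw_virt_flags(cpuinfo_text):
    """Return the hardware virtualization flags ('vmx', 'svm') that appear
    on any 'flags' line of /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    flags = hw_virt_flags(cpuinfo.read_text()) if cpuinfo.exists() else set()
    if flags:
        print("CPU advertises:", ", ".join(sorted(flags)))
    else:
        print("No vmx/svm flag visible")
```

If the flag is there and HVM guests still refuse to start, go back to the BIOS setting and the hard power cycle described above.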
Even systems without BIOSes sometimes need fixes as well.
Some early Macs with VT support needed EFI modifications to support VT mode. I suffered through this with my early Mac Mini Core Duo.
Oh dear. I've really messed things up this time. I am entirely off base, and have confused a large number of people (including myself, apparently).
Any reference you've seen from me regarding VMI being a device interface is entirely wrong.
Any reference you've seen from me about Rusty maintaining VMI is entirely wrong.
This is a recent dialog with aliguori, someone directly involved in kvm/xen development, enough to tell me that I'm entirely off base:
*aliguori* paravirt_ops is a low-level paravirtualization interface.
it doesn't make any hypercalls but allows for "modules" to hook that
paravirtualization interface and then translate to the underlying
hypervisor's paravirtualization interface
*aliguori* there is a paravirt_ops implementation for VMI, Xen, and KVM
at the moment
*aliguori* you can think of paravirt_ops as paravirtualization
infrastructure, and then xen/vmi/kvm's paravirt_ops implementation as
drivers for specific hypervisors
*aliguori* and btw, there is no such thing as VMI device drivers
*aliguori* VMI is strictly a CPU paravirtualization interface
<aliguori> Zachary Amsden is doing the VMI paravirt_ops implementation,
Jeremy Fitzhardinge is doing the Xen paravirt_ops implementation, and
Rusty is doing the lhype implementation (and I guess Ingo is sort of
doing the KVM implementation)
Argh. So, mea culpa. I really messed that one up now, didn't I.
Anything I said about virtual devices is apparently entirely off base. Now I get to ensure that future posts are accurate on this matter.
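To make aliguori's description concrete for myself, here is a toy Python model of the idea: one fixed set of hook points, with a per-hypervisor implementation plugged in behind them. This is an analogy only and not kernel code. The hook name flush_tlb, the returned strings, and the "native" entry are my own assumptions for illustration.

```python
from typing import Callable, Dict

# Each hypervisor supplies its own implementation of the same hook points.
# The hook name and the "native" entry are hypothetical, for illustration.
IMPLEMENTATIONS: Dict[str, Dict[str, Callable[[], str]]] = {
    "native": {"flush_tlb": lambda: "native: run the privileged instruction directly"},
    "xen": {"flush_tlb": lambda: "xen: translate to Xen's paravirtualization interface"},
    "vmi": {"flush_tlb": lambda: "vmi: translate to VMWare's VMI interface"},
    "kvm": {"flush_tlb": lambda: "kvm: translate to KVM's paravirtualization interface"},
}


class ParavirtOps:
    """The infrastructure layer: it makes no hypercalls itself and only
    forwards each hook to whichever implementation was plugged in."""

    def __init__(self, hypervisor: str):
        self._ops = IMPLEMENTATIONS[hypervisor]

    def flush_tlb(self) -> str:
        return self._ops["flush_tlb"]()


if __name__ == "__main__":
    for name in ("native", "xen", "vmi", "kvm"):
        print(ParavirtOps(name).flush_tlb())
```

Note what is absent: nothing in this model is a device driver. The hooks are all CPU-level operations, which is exactly the point I got wrong.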
IOMMUs and the future of hardware virtualization
There is one last thing to think about: isolation capable IOMMUs. Soon next generation Intel VT-d and AMD SR-IOV capable CPUs should be out with isolation capable IOMMUs. This means that you will see huge speed improvements from IO virtualization, and the potential to both assign PCI devices to hardware virtualized operating systems and have new "virtual aware" devices from hardware vendors that can be shared by multiple guests at a hardware level.
According to jnalley's post on the Xen developer IRC channel, "SR-IOV allows a PCI-e device to present virtual functions to the root complex. This would allow a guest OS (domU) to access the device directly."
Intel VT-d and AMD IOV should be out sometime Real Soon Now.
For more information on SR-IOV, visit the specifications for SR (and MR) IOV.
I hope this helps clear things up.
Again, my apologies to those who were misled by my misunderstanding.
Yesterday, someone stumbled into the #kvm channel and mentioned that VirtualBox has gone OpenSource.
After some frantic questions and listening to the #vbox channel, it became apparent that there are some benefits and limitations of VirtualBox worth noting.
VirtualBox can use Intel VT or AMD-V/SVM if available, but does not require it, much like VMWare, which takes the same hybrid software/hardware approach to virtualization. For 32bit guests, this can be much faster than pure VT/SVM.
VirtualBox (herein referred to as VBox) is similar to VMWare workstation or VMWare server, in that it has a ring0 kernel driver for a linux host.
This ring0 requirement means that it is not compatible with a Xen paravirtualized domU (and that includes dom0).
VBox leverages QEMU heavily for software emulation of real-mode and other critical code sections, as well as for hardware emulation.
QEMU has a closed source kernel module, kqemu, and a somewhat alpha quality opensource equivalent, qvm86, that do the software code-scanning method of virtualization. They do not require or recognize VT/SVM.
VBox's primary competitor is the kvm project, which provides QEMU based VT/SVM guests. The downside of kvm, of course, is the requirement for VT/SVM support from your CPU. VirtualBox has no such limitation.
VBox only supports 32bit host kernels and 32bit guest images. There is no 64bit support for either running under a 64bit Linux host kernel, or running a 64bit guest OS. The website does mention that 64bit support is under active development, however.
VBox has yet another virtual bus of virtual devices, akin to Xen's paravirtualized XenBus devices (or Virtual Iron's NexBus). While hardware devices are available (PCNet32, etc) using QEMU hardware emulation, VBox also has some excellent video/network/disk drivers that eliminate the hardware chipset emulation overhead.
VMWare tried to make VMI a standard for paravirtualized bus devices. The Linux kernel developer community initially balked, but VMI support lives on in Rusty's paravirt-ops patches. Recently, Ingo has been making great strides with paravirtualized kvm support.
One oddity is that VBox uses .VDI files for its disk images. Not QEMU's QCOW format, not VMWare's VMDK format, and not RAW disk image format.
And for the n00bs that keep popping in and asking about 3d support. No, VBox doesn't proxy 3d. No, QEMU doesn't proxy 3d. Yes, you can use a 3d card with a Xen paravirtualized domain (NOT with an HVM domain).
The only virtualization platform that supports 3d for Windows guests, that I am aware of, is VMWare 5.0 and later which have a somewhat crashy "beta" DirectX 3d support. (Simply add "mks.enable3d = TRUE" to your .vmx file by hand, for more info try googling for "mks.enable3d").
Parallels has promised 3d guests for 4th quarter of this year. If they deliver it, I will be pleasantly surprised.
If you really need 3d gaming for Windows games on a non-Windows platform, consider Transgaming's Cedega product line. Yes, it is Wine. Yes, there is a 50% overhead for the emulation. No, you're not going to do much better without running windows bare iron.
Where does this leave me? In limbo, mostly. I have a 32bit farm of Xen hosts moving toward a 64bit Xen hosting platform at the moment. Xen appears to be crawling while other tech like kvm and virtualbox keep popping up to challenge it. Xen's "maturity" is only really a year at best with its HVM support (quite a lead in tech terms), but I can see l-hype/kvm and virtualbox quickly overshadowing Xen in the near future.
Eventually, VMI/paravirt-ops is going to level the playing field with standardized guest device drivers, regardless of hosting platform. Until then, we continue to craft guests based on the virtualization platform under which they will be run.
While Xen is a wonderful virtualization platform, there are a number of lesser known limitations of Xen which aren't well documented. You learn these limitations from first-hand experience.
Xen modes of operation
There are 3 modes of operation for Xen:
The hypervisor mode must match the PV mode. As dom0 is a PV, that means it must match the mode of the hypervisor. This goes for all PV domains.
This means you can't run a pure 32bit PV under a 64bit hypervisor. Nor can you run a 32bit+pae PV under anything but a 32bit+pae hypervisor. It must match, all the way through.
The Xen developers are working to fix this, eventually.
The same is not true for HVM operation: you can run 32bit HVM domains under a 64bit hypervisor/dom0.
The easiest way to find out what modes are available to you is to run "xm info | grep xen_caps". That will tell you exactly what guests you can run with your current setup.
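If you want to check this from a script rather than by eye, the xen_caps line can be split into its individual entries. This is only a minimal sketch: the sample text below is illustrative rather than captured from a real host, and the helper names parse_xen_caps and describe_cap are made up for this example.

```python
def parse_xen_caps(xm_info_output):
    """Return the list of capability strings from `xm info` style output."""
    for line in xm_info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "xen_caps":
            return value.split()
    return []


def describe_cap(cap):
    """Split an entry such as 'hvm-3.0-x86_32p' into (guest kind, arch)."""
    kind, _version, arch = cap.split("-", 2)
    # Entries starting with "xen" are paravirtualized guests.
    return ("PV" if kind == "xen" else kind.upper(), arch)


# Illustrative sample only, not output from a real machine.
SAMPLE = """\
xen_major              : 3
xen_minor              : 0
xen_caps               : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
"""

if __name__ == "__main__":
    for cap in parse_xen_caps(SAMPLE):
        print(describe_cap(cap))
```

On a host like the sample, only one PV entry shows up (matching the hypervisor mode), while several HVM entries are listed.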
Xen does not page
The Xen hypervisor does not page/swap to disk. In fact, the Xen hypervisor isn't directly aware of disk storage at all. All IO goes through the dom0 kernel which communicates with PCI devices.
Xen only manages available RAM.
By default, the Xen Balloon driver allows PV domains to be allocated some amount of RAM (up to maxmem) or reduced to some minimum amount of RAM (minmem), on the fly.
HVM domains allocate maxmem on start, and cannot be resized dynamically (you must restart the domain).
The Xen Balloon driver is shunned all over the xen-devel list historically. It has gotten better over time, though it still has some interesting behaviors.
With the current 3.0.4, for example, if you are running a PV domain with less than maxmem memory assigned and you save that domain to migrate it, then when you restore the domain, it will allocate maxmem memory to it.
Every version of Xen tweaks the behavior of memory allocation just a little more. The full history of said behavior is still well beyond my understanding at this time.
Xen shared pages are limited
When a domU is started, there are a number of "shared pages" between the dom0 and the domU for them to communicate using a system of grants and page flipping between them.
Sadly, this grant space is limited. So limited in fact, that other Xen limits were introduced:
Xen 3.0.3 limits domUs to 3 network interfaces
This is due in part to the above shared page pool limitations.
People were using many many network interfaces, each incurring additional stress on the limited shared resources for inter-domain communication.
Apparently, part of the "fix" was to impose an artificial restriction of 3 network interfaces for all domUs in Xen 3.0.3.
Xen has a potential DoS condition if netloop isn't used
This one is particularly disturbing, and hard to explain or gauge how limiting it really is.
When a domU sends a packet to dom0, the ethernet frame is put into a shared page and access is granted for dom0 to use it.
While dom0 is using that page for the shared ethernet frame, there is a danger that a busy network might drain all available shared pages and Xen may panic.
As long as dom0 is immediately copying off frames to another network interface to be shipped off, there is no problem.
If, however, packets are destined to be processed by dom0 userspace, that skb sits in kernel space until the userspace daemon processes that packet's contents. This causes a strain and potential exhaustion of shared dom0/domU pages for these packets to sit around until they are handled.
Ouch.
This is where netloop comes in. Netloop is a Xen driver that provides a vif0.0/veth0 pair locally to the dom0 explicitly to be used to buffer those ethernet frames. By adding vif0.0 to a bridge along with the vif of a domU guest, any packets destined to be handled by dom0 userspace can take their sweet time and no problems will befall the system.
If you have any dom0 servicing domUs with userspace daemons, and you're not using a netloop to copy the frames, you may want to rethink this immediately. This includes routed/bridged/natted configurations, anything where a packet is handled by a dom0 userspace daemon coming from a domU.
Xen schedulers
There are 3 schedulers in Xen:
Both BVT and SEDF are "complex and buggy", and will go away in future releases.
CREDIT
Xen HVM gotchas
HVM domains require an Intel VT or an AMD-V (SVM) capable processor. You can check your cpuinfo flags for "vmx" or "svm" to see if your processor has support for this feature.
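The cpuinfo check can be scripted. The sketch below assumes the usual Linux /proc/cpuinfo layout, with a "flags" line per CPU; the function name hvm_flags and the sample text are invented for illustration.

```python
def hvm_flags(cpuinfo_text):
    """Return whichever of 'vmx' / 'svm' appear on the flags lines."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


# Illustrative sample only; a real flags line is much longer.
SAMPLE = """\
processor : 0
model name : Example CPU
flags : fpu vme de pse tsc msr pae mce cx8 apic vmx
"""

if __name__ == "__main__":
    print(sorted(hvm_flags(SAMPLE)))
```

On a real host you would pass in the contents of /proc/cpuinfo instead of the sample. An empty result means neither flag was found, though some BIOSes also let you disable the feature, so the flag alone may not tell the whole story.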
The qemu bios used by xen is not patched for lba48, and you are limited to 160G disks.
You can use the commercial XenSource PV drivers (from XenExpress) to avoid the qemu-dm hardware emulation overhead.
HVM domains currently do not suspend/restore/migrate, much less live migrate. The announcement for 3.0.4 suggests that this is a feature slated for 3.0.5.
SMP support for HVM guests in 3.0.4 is better, as is support for other non-windows and non-linux guests, but I've yet to get SMP HVM guests working myself.
Xen volume size limits
There were numerous reports of 2TB limits with Xen vbd volumes as late as Xen 3.0.3, even with 64bit. No, I do not know if 3.0.4 addressed them.
Xen logical volume resizing
You can't resize an LVM2 logical volume on the fly and have the domU see the new size. The domU has to be rebooted before it can resize its filesystem.
This means downtime whenever I need to grow a domU's filesystem. I get to lvextend it, reboot the domU, then xfs_growfs the filesystem. In that order.
Frequency Scaling kills Xen
Just turn off any frequency scaling in your dom0 (like AMD powernowd, or cpufreq settings). It drives Xen crazy.
Xen's ACPI support
Xen has minimal ACPI support. Don't think you're going to get S3 or S5 sleep suspend/resume working with Xen on your laptop. If you do, LET ME KNOW.
Xen Xserver video drivers
The nVidia video driver needs the following patch to work with Xen.
There have been a couple of reports of symbol errors when loading this. No, I haven't tried it myself, this patch was from someone else via IRC (nick long forgotten):
Xen PVs run ring1, not ring0
This means you can't run VMWare, QEMU/kqemu, or Linux kvm under a Xen PV (this includes dom0, which is a glorified PV).
In theory, you should be able to run VMWare or QEMU/kqemu under an HVM domU.
Xen supported kernels
Xen 3.0.3 ships with patches for Linux 2.6.16.29. Xen 3.0.4 ships with patches for Linux 2.6.16.33.
If you have a newer kernel running Xen, it's probably a distribution patched version.
This means, if you want a driver from 2.6.18 or 2.6.19, you either need to backport said driver to 2.6.16.x, or you need to bravely forge ahead and risk losing help from the xen-devel team.
Not that you're entirely unsupported, just that your distribution is bravely adopting a newer kernel with untested/unsupported patches.
In conclusion
Those are most of the biggies that people seem to clamor about the most. If you have any others, please drop me a line.
What exactly is the difference between Paravirtualization (PV) and Hardware Virtualization (HVM) with regard to Xen?
This question continues to come up again and again. Rather than answer it in a private email or rather useless IRC chat room, it seems best to summarize it in a blog post.
Paravirtualization means that guests "cooperate" with the virtualization they run under. This means that paravirtualized virtual machines are aware they are running in a virtual environment, and have special drivers or awareness of that environment as they run.
In the case of Xen, all guests run under the guidance of a tiny Xen "hypervisor".
Think of a hypervisor as a microkernel (remember when those were big?), that is responsible for allocating RAM, acting as an intermediary for IO, routing hardware interrupts, and scheduling a fair share of CPU time to each virtual machine.
By default in Xen, one virtual machine talks to the PCI hardware, doing all IO for the others. In Xen parlance, this is "domain 0" (or "dom0"), which is a master OS that talks to the hardware on the box and provides IO resources like networking and disk space to the other domains on the physical machine. There are Linux, OpenSolaris, and FreeBSD dom0s now, it isn't just linux.
With Paravirtualization, your guest kernels need to be compiled to be aware of the Xen hypervisor, with a special Xen patch set. These kernels cannot run without the Xen hypervisor. They require Xen to operate.
The "new" thing out there is something that Xen coins Hardware Virtualized Machines (HVM). Normally, x86 based Operating Systems run with a kernel running in "ring0". Historically, only one Operating System can run as ring0 on a x86 based PC. Now, both Intel/AMD have added special VT/SVM CPU extensions that allow a special "privileged mode" of operation where a hypervisor can run multiple Operating Systems in ring0 at the same time.
Historically, without these VT/SVM instructions, you have to scan every code page for illegal instructions and/or trap instructions to emulate dangerous ones. This is how VMWare initially worked (today VMWare is a software/hardware hybrid that is aware of VT/SVM instructions). This is how QEMU, Parallels, Microsoft Virtual Server, and other virtualized PC platforms initially functioned.
With Hardware virtualization, you install an Operating System from CDROM just as if it is a physical machine. There is a BIOS, there is a VGA display (VNC/SDL), there are emulated IDE and RTL8139 network chipsets. The hardware is actually borrowed from the QEMU project, but thanks to the VT/SVM instructions, there is no need to scan the code or trap illegal CPU instructions the same way as the previous generation of PC virtualization had to.
The mainframe has had Hardware Virtualization since at least the OS/360 days. This is only something new to the PC platform.
While this is a fun essay, and I'd love to go on at length, I think this answers the initial question adequately.
If anyone else has any questions, please feel free to join us on ##xen on freenode, or drop me an email. Please don't be surprised if I post the answers here.
Xen documentation is sorely lacking. Let's try and change that, shall we?
I just finished backporting Xen 3.0.4 and a slew of 2.6.16.33 kernels to our standard platform (building patched debian packages along the way).
The new bits are an API change, better support for SMP and ACPI, some bug fixes, and framebuffer consoles for PVs (borrowing from HVM, it appears).
Here is the announcement from the list:
Folks,
We're pleased to announce the official release of xen 3.0.4! This is largely an opportunistic stabilising release for HVM guests, due to the large amount of work in that area of the code since 3.0.3. These enhancements have in particular improved support for SMP and ACPI Linux and Windows operating systems.
Other highlights of this release include:
- support for kexec/kdump of Xen and domain 0;
- graphical framebuffer support for paravirtualised guests;
- preview support for the new XenAPI management interfaces;
- enhanced support for IA64 (IPF) and Power systems.
Since 3.0.4 is an interim release, certain features such as HVM save/restore will now be part of Xen 3.0.5 which we expect to release in early 2007.
You can get the source using mercurial from: http://xenbits.xensource.com/xen-3.0.4-testing.hg
Source and binary tarballs, and RPMs, will be made available from: http://www.xensource.com/downloads
Cheers,
Keir (on behalf of the whole Xen dev team)
The process for converting a VMWare VMDK disk image to Xen HVM is quite easy. However, there are "gotchas" that you need to consider when doing this conversion.
First, and most importantly, identify if this is a SCSI or an IDE virtual disk. If you installed Windows to a SCSI disk under VMWare, it is unlikely that Windows has the IDE drivers appropriate for Xen HVM. To remedy this, you need to follow the guide documented by Microsoft kb314082.
Once you have ensured that your Windows image has IDE drivers installed, you can proceed to converting the image.
Next, you need "vmware-vdiskmanager" to convert newer VMWare VMDK files into a compatible format for further processing. This tool comes with VMWare 5.0 and VMWare Server 1.0. There is a similar (but different) method of doing this under VMWare ESX.
Identify the appropriate vmdk file to use that represents your disk. This will either be:
I'm sure there are more incarnations of this. It's rather hairy if you've not dealt with it before.
How do you find the right one? Look inside your ".vmx" file for a line beginning with:
scsi0:0.fileName = "windows2003.vmdk"
or
ide0:0.fileName = "windows2003.vmdk"
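If you have a pile of .vmx files to go through, the lookup can be scripted. This is a rough sketch that assumes the "bus:unit.fileName = value" layout shown above, quoted or not; the function name find_disks and the sample text are made up for this example. Note that CD-ROM devices use the same key, so expect entries that are not hard disks.

```python
import re

# Matches lines such as: scsi0:0.fileName = "windows2003.vmdk"
DISK_LINE = re.compile(
    r'^\s*((?:scsi|ide)\d+:\d+)\.fileName\s*=\s*"?([^"\r\n]*?)"?\s*$',
    re.IGNORECASE,
)


def find_disks(vmx_text):
    """Map each 'bus:unit' device (e.g. 'scsi0:0') to its fileName value."""
    disks = {}
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line)
        if match:
            disks[match.group(1).lower()] = match.group(2)
    return disks


# Illustrative sample only.
SAMPLE = """\
memsize = "512"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "windows2003.vmdk"
"""

if __name__ == "__main__":
    for device, filename in find_disks(SAMPLE).items():
        bus = "SCSI" if device.startswith("scsi") else "IDE"
        print(device, bus, filename)
```

A device name starting with "scsi" is the cue to sort out the IDE drivers inside Windows before converting.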
That's all there is to it. Now, let's assume the name of our disk is "windows2003.vmdk".
$ vmware-vdiskmanager -r windows2003.vmdk -t 0 windows2003-flattened.vmdk
This will create a "single growable virtual disk" that is flattened into a single file.
The next step is to turn this flattened .vmdk file into a raw disk image with qemu-img from the QEMU project.
$ qemu-img convert windows2003-flattened.vmdk windows2003.img
When this completes, you will now have a windows2003.img file that might boot for you.
The unfortunate reality of running a Windows OS is that it makes a number of assumptions at install time as to your PC hardware. If you transplant the image, you may need to change the Hardware Abstraction Layer (HAL).
Windows 2003, for example has 6 HALs:
HALMACPI.DLL - ACPI Multi processor PC
HALAACPI.DLL - ACPI Uniprocessor PC
HALACPI.DLL - Advanced Configuration and PowerInterface (ACPI)
HALMPS.DLL - MPS Multiprocessor PC
HALAPIC.DLL - MPS Uniprocessor PC
HAL.DLL - Standard PC
Only one is selected and installed as \WINDOWS\SYSTEM32\HAL.DLL at install time.
It is possible to modify your C:\boot.ini to specify a different "/HAL=HAL.DLL", if you copy in the other DLLs so they can be referenced. In this way, it is possible to do some trial and error to see which of the above HALs work with which domU HVM configuration.
When you create your Xen configuration file, you have the opportunity to set four flags that critically interact with the above HALs, namely:
# enable/disable HVM guest PAE, default=0 (disabled)
pae=0
# enable/disable HVM guest ACPI, default=0 (disabled)
acpi=0
# enable/disable HVM guest APIC, default=0 (disabled)
apic=0
# The number of CPUs to assign to this domU
vcpus=1
The above configuration would be most at home with the "Standard PC" HAL.DLL.
For the MPS HALs, one would assume you would enable APIC.
For the ACPI HALs, one would assume you would enable ACPI.
Good luck figuring out which Xen configuration matches which HAL. At the moment, the only success I've really had with Xen 3.0.3's HVM is to use the "Standard PC" HAL.DLL.
When VMWare was used to build the Windows image, it detected ACPI and used an ACPI HAL. To revert this to the "Standard PC" HAL.DLL, I had to mount the image and replace this file:
# mount -o loop,offset=$((63*512)),rw windows2003.img /mnt
# find /mnt -name 'hal*.dll' -print
/mnt/WINDOWS/ServicePackFiles/i386/halaacpi.dll
/mnt/WINDOWS/ServicePackFiles/i386/hal.dll
/mnt/WINDOWS/ServicePackFiles/i386/halacpi.dll
/mnt/WINDOWS/ServicePackFiles/i386/halapic.dll
/mnt/WINDOWS/ServicePackFiles/i386/halmacpi.dll
/mnt/WINDOWS/ServicePackFiles/i386/halmps.dll
/mnt/WINDOWS/system32/hal.dll
# cp -f /mnt/WINDOWS/ServicePackFiles/i386/hal.dll /mnt/WINDOWS/system32/hal.dll
# umount /mnt
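The offset=$((63*512)) in the mount command above is the byte offset of the first partition: start sector 63 times 512 bytes per sector. If your image was partitioned differently, the start sector can be read out of the partition table. The following is a sketch under the assumption of a classic MBR partition table and 512 byte sectors; the function name partition_offsets is invented here.

```python
import struct

SECTOR_SIZE = 512  # assumed sector size


def partition_offsets(mbr):
    """Return byte offsets of the non-empty primary partitions in an MBR."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not an MBR boot sector")
    offsets = []
    for i in range(4):
        # The four 16 byte partition entries start at byte 446.
        entry = mbr[446 + 16 * i:446 + 16 * (i + 1)]
        part_type = entry[4]
        # Bytes 8..11 of an entry hold the starting LBA, little endian.
        start_lba = struct.unpack_from("<I", entry, 8)[0]
        if part_type != 0:
            offsets.append(start_lba * SECTOR_SIZE)
    return offsets


def fake_mbr(start_lba, part_type=0x07):
    """Build a synthetic MBR with one partition, for demonstration only."""
    mbr = bytearray(512)
    mbr[446 + 4] = part_type
    struct.pack_into("<I", mbr, 446 + 8, start_lba)
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr)


if __name__ == "__main__":
    print(partition_offsets(fake_mbr(63)))
```

Against a real image you would pass the first 512 bytes of the .img file, read in binary mode, and use the first value as the offset= mount option.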
Now that you have a "fixed" img file representing the entire drive, you can dd it straight to a lvm logical volume to be used as a Xen phy: vbd device:
# ls -la win2003.img
-rw-r--r-- 1 root root 8589934592 2006-11-16 13:44 win2003.img
# lvcreate -L 8G -n win2003-hda vg
# dd if=win2003.img of=/dev/vg/win2003-hda bs=1M
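The 8G passed to lvcreate comes from the image size in the ls output: 8589934592 bytes is exactly 8 * 1024^3. For images that are not a round size, you want to round up so the logical volume is never smaller than the image. A small sketch of that arithmetic, assuming lvcreate's G means units of 1024^3 bytes; lv_size_gib is a name made up for this example.

```python
GIB = 1024 ** 3


def lv_size_gib(image_bytes):
    """Smallest whole number of GiB that holds image_bytes (ceiling division)."""
    return -(-image_bytes // GIB)


if __name__ == "__main__":
    size = 8589934592  # size of win2003.img from the ls output above
    print("lvcreate -L %dG -n win2003-hda vg" % lv_size_gib(size))
```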
Now you are done. Start up your spiffy new HVM domain.
This, in a nutshell, is how you convert a VMWare image into a Xen HVM disk image.
Xen HVM uses the AMD SVM (Pacifica) and Intel VTX (Vanderpool) hardware CPU virtualization.
Both Parallels and VMWare now utilize the same VTX technologies in their products. Based on blogs I have read, VMWare added VTX support somewhere around or just before VMWare Workstation 5.5, and Parallels has supported Core-Duo Intel Macs since their beginning. No, I don't know if either supports AMD's SVM quite yet.
Aside from these products, I am currently unaware of anything else that use today's modern CPU SVM or VTX features.
VMWare Workstation and Server normally run alongside a host OS, inserting a "vmmon" driver into Ring0. VMWare ESX has its own hypervisor, much like Xen, though you do need to embrace RedHat for their management harness. Hardware is emulated virtually in software (IDE, SCSI via Buslogic/LSI, Network via Pcnet32/VMX, etc). Guest OSes talk to these drivers as if they were running on a physical machine.
Xen is a small hypervisor that "paravirtualizes" CPU scheduling and assigns hardware resources to virtual "domains". The first domain, dom0, is responsible for talking to your PC's hardware directly. Each "guest" domain, or domU, can only talk directly to hardware if it has been configured to allow such access. Typically, a domU only has "frontend" drivers that talk to resources exposed by a "backend" typically from dom0. Things like virtual block devices and virtual network interfaces are handled by native Xen aware device drivers in such paravirtualized domUs.
Xen can also run in HVM mode. This means that instead of paravirtualized devices, a real set of virtual hardware is exposed to the domU to use real device drivers to talk to. Much like