Mon, 25 Jul 2005

Please excuse this brain dump. As ideas come up, I continue to edit this node. Eventually, some structure will be enforced.

Inspired by SSHFS and SHFS, what would it take to make a filesystem that spans a cluster of servers and exposes aggregate diskspace while still mirroring data?

Exposing a filesystem with FUSE on a master node would be ideal, with some form of WebDAV network access (using something as simple as Apache mod_dav) for client access.

Most distributed filesystems have the idea of a "master" for metadata:

  • Google's Filesystem has a master model with distributed "chunk servers" for the data. Not OpenSource. Also not POSIX, it's a programming API interface, you can't "mount" it AFAIK. They could probably throw a FUSE filesystem together in short order if they really wanted to.
  • HDFS (previously NDFS), or the Hadoop (Nutch) Distributed Filesystem, is a Java knockoff of the Google Filesystem. As a backend for the Apache Lucene Nutch project, it is a programmatic API interface filesystem. While you can't mount it, writing a FUSE frontend wouldn't be hard.
  • PVFS v1 has one master, v2 has multiple masters, but no mirroring - meant for high-IO scientific clusters.
  • OpenAFS has many servers, and mirrors at the volume level, but requires a complex Kerberos infrastructure and much manual volume creation to balance the layout. There is only one read/write volume; the rest of the volume replicas are read-only. Don't think I'm not tempted by OpenAFS, it just doesn't solve the need we have at the moment (long story).
  • CODA (sometimes referred to as AFSv3) offers disconnected roaming, but mirrors at the server level - not at a volume level.
  • Lustre has a master model, but mirrors on a volume level.
  • Intermezzo was Peter J. Braam's predecessor to Lustre. Ideal for straight mirroring, not distributing files throughout a cluster.
  • Both GFS and OpenGFS use a DLM cluster arrangement with shared storage to present a shared filesystem. CLVM mirroring is very young (lvcreate -m is undocumented at best, allocation is impossible to specify, and you can't have more than one mirror log volume yet). Boy was this fun to play with.
  • CXFS is SGI's Clustered XFS. Very similar to GFS, only cross platform and very scalable.
  • OpenSSI's CFS is little more than network mirroring across whatever underlying filesystem to present a unified root image for the OpenSSI cluster. Not what we're looking for.
  • MFS and DFSA are from Mosix / openMosix. MFS is the feature of openMosix that gives you access to remote filesystems as if they were locally mounted. With DFSA enabled, system calls are executed on the remote node without migrating the process back to its home node.

There are others, but these are the "big boys" that I can think of.

There are a couple of distributed filesystems that run without a master server. This isn't trivial to implement:

  • GPFS is IBM's General Parallel File System. What it claims is downright nirvana. I've not had the time (or money) to play with it. Seriously, read this page. I want a copy. Not OpenSource. ;)
  • xFS is Berkeley's Serverless Network File Service. Basically, a log based network striped filesystem with metadata "map" servers that trade "write tokens" to update files between each other.

Storage servers in the cluster might each have some space set aside to this purpose. The easiest way would be to create and mount a loopback file filesystem with the space to be shared:

storage-node$ mkdir -p /data/cornfs/spool/ /data/cornfs/export/storage
storage-node$ dd if=/dev/zero of=/data/cornfs/spool/storage_fs bs=1M count=5k
storage-node$ mke2fs -F /data/cornfs/spool/storage_fs
storage-node$ mount -o loop /data/cornfs/spool/storage_fs /data/cornfs/export/storage

On the Master, each storage server's remote filesystem would be mounted based on the master's config (which is modeled likewise in a filesystem tree):

master-node$ mkdir -p /data/cornfs/cfgs/nodes
master-node$ cd /data/cornfs/cfgs/nodes
master-node$ echo /data/cornfs/export/storage > storage-node1
master-node$ echo /data/cornfs/export/storage > storage-node2

master-node$ mkdir -p /data/cornfs/import
master-node$ for node in * ; do mkdir -p /data/cornfs/import/$node ; shfsmount $node:`cat $node` /data/cornfs/import/$node ; done

The beauty of this is that shfs caches files and works with pretty much any host you can ssh into (including Windows via Cygwin). There are some shortcomings to shfs: "df -i" doesn't work, extended attributes aren't maintained, and it only works from linux kernels (were there only a Mac port ;)

Each file in the master tree is identified below by FILE: its full pathname within the tree, including the filename.

Ideally, each file would have at least two copies. For our purposes, I'll suggest that this filesystem should endeavor to track two mirrors for every file, and clean up any "extra" copies.

The Master itself should have a few trees for the metadata. This leaves us with a few directory trees:

/data/cornfs/metadata/state/FILE
- the FILE has the same owner, group, permissions, ctime/atime/mtime, and size as the actual FILE (as a sparse file). 
- Extended attributes make a great storage for things like the primary and secondary mirror server names (setxattr/getxattr).

/data/cornfs/import/SERVER/FILE
- contains the actual file, if SERVER is one of the FILE mirrors.

/data/cornfs/metadata/SERVER/FILE
- this is a sparse version of the above file, used as a sanity check and for regenerating a SERVER from scratch. 
- This local metadata replica of a remote server is the master's opinion of what the server actually holds. 
- If something does not exist in this copy, but exists on the server, it should be removed from that server. 
- If something exists in this copy but not on the server, corruption has occurred.

/data/cornfs/metadata/cache/FILE
- a directory tree containing the past N days worth of accessed FILEs (pruned via cron)

This ends up requiring more than twice the number of actual file inodes to represent the full filesystem on the master. One full copy of the entire metadata state, one copy spread across all of the servers for their metadata state replica on the master server, and some fraction of the filesystem in cache for frequent and/or recent file access.
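A rough sketch of what a state/ entry could look like, assuming Python as a stand-in for the eventual Perl prototype. The function names and the "user.cornfs." attribute prefix are made up for illustration. Note that ctime can't be set from user space, so only permissions and atime/mtime are carried over here, and ownership is skipped because chown needs privileges.

```python
import os
import shutil


def make_state_entry(real_path, state_path):
    """Create a sparse placeholder at state_path mirroring real_path's metadata."""
    st = os.stat(real_path)
    os.makedirs(os.path.dirname(state_path), exist_ok=True)
    # Extend an empty file to the real size without writing any data;
    # on filesystems that support holes this occupies almost no blocks.
    with open(state_path, "wb") as fh:
        fh.truncate(st.st_size)
    # Copy permission bits and atime/mtime (not ownership, not ctime).
    shutil.copystat(real_path, state_path)


def tag_mirror(state_path, slot, server):
    """Record a mirror's server name in an extended attribute.

    Linux only; "user.cornfs.<slot>" is a hypothetical attribute name, and the
    filesystem holding state_path must have user xattrs enabled.
    """
    os.setxattr(state_path, "user.cornfs." + slot, server.encode())
```

Something like tag_mirror(path, "mirror1", "storage-node1") would then be called as each copy completes.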

The Master filesystem would be mounted somewhere handy to be filled, like /master:

master$ mkdir /master
master$ /opt/cornfs/current/bin/cornfs /master

Any new files created under /master would be written to the cache until the user closes the file. On file close, the Master needs to:

  1. Lock the file in the metadata state tree so that no two close operations can occur in parallel. Run a "df" on all of the /data/cornfs/import/ filesystems to see which two have the most available space, then fork off a copy to each of those filesystems.
  2. Create a /data/cornfs/metadata/state/ sparse file.
  3. Tag the /data/cornfs/metadata/state/ file with a "mirror1" extended attribute when the first copy completes (setxattr). Update the /data/cornfs/metadata/SERVER/ file to mark that the copy was successful.
  4. Tag the /data/cornfs/metadata/state/ file with a "mirror2" extended attribute when the second copy completes (setxattr). Update the /data/cornfs/metadata/SERVER/ file to mark that the copy was successful.
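The "which two have the most space" part of step 1 might be sketched like this in Python. It assumes statvfs() reports sane numbers through the shfs mounts (I haven't verified that), and free_bytes and pick_mirrors are hypothetical names. Ties are broken by node name so the choice is repeatable.

```python
import os
from typing import Dict, List


def free_bytes(mount_point: str) -> int:
    """Space available to unprivileged users on a mounted filesystem."""
    st = os.statvfs(mount_point)
    return st.f_bavail * st.f_frsize


def pick_mirrors(free_by_node: Dict[str, int], copies: int = 2) -> List[str]:
    """Return the `copies` node names with the most free space."""
    if len(free_by_node) < copies:
        raise ValueError("not enough storage nodes for %d mirrors" % copies)
    # Sort by free space descending, then by name for a stable tie-break.
    ranked = sorted(free_by_node, key=lambda n: (-free_by_node[n], n))
    return ranked[:copies]
```

In use, the dict would be built by calling free_bytes() on each directory under /data/cornfs/import/ while the state lock is held.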

When release() is called for a file, if any write() calls were used on the file, it should have been flagged as "dirty" (by an associative array in memory, along with an extended attribute just in case the running daemon is killed). If a file is dirty, it needs to be written out to the mirrors on release(). If a file is clean, don't do anything at all! The file is handily in the cache for the next access.
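The in-memory half of the dirty tracking could be as small as the sketch below (Python, hypothetical class and method names). The extended-attribute backup mentioned above is left out; this only shows the decision release() has to make.

```python
class DirtyTracker:
    """Remember which open files have seen a write() since they were opened."""

    def __init__(self):
        self._dirty = set()

    def on_write(self, path):
        # Called from the write() handler: flag the file as dirty.
        self._dirty.add(path)

    def on_release(self, path):
        """Return True if the caller must copy path out to its mirrors."""
        if path in self._dirty:
            self._dirty.discard(path)
            return True
        # Clean file: nothing to do, it simply stays in the cache.
        return False
```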

When reading a file:

  1. Check /data/cornfs/metadata/cache/ for the file. Open if it exists.
  2. If the file does not exist, one of the mirrors would be selected for the file.
  3. Copy the file to the cache. There is nothing wrong with allowing the client to read, as long as it doesn't try to read more data than has been streamed from the mirror server so far (seek or read() past the EOF as the cache file grows). In that case, the read or seek should block until the entire file is in the cache.
  4. If no mirrors are accessible, an error would be returned.
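Here is a simplified sketch of that read path in Python. It takes the easy way out and copies the whole file into the cache before handing back a handle, rather than letting the client read while the copy streams in. open_for_read is a hypothetical name, rel_path must be relative (no leading slash), and the ".partial" suffix is my own convention for half-copied files.

```python
import errno
import os
import shutil


def open_for_read(rel_path, cache_root, import_root, mirrors):
    """Open rel_path from the cache, pulling it from a mirror if needed."""
    cached = os.path.join(cache_root, rel_path)
    if os.path.exists(cached):
        return open(cached, "rb")
    for server in mirrors:
        source = os.path.join(import_root, server, rel_path)
        partial = cached + ".partial"
        try:
            os.makedirs(os.path.dirname(cached), exist_ok=True)
            shutil.copyfile(source, partial)
            # Rename into place so readers never see a half-copied file.
            os.replace(partial, cached)
            return open(cached, "rb")
        except OSError:
            # This mirror is missing or unreachable; try the next one.
            continue
    raise OSError(errno.EIO, "no reachable mirror", rel_path)
```

The mirrors list would come from the "mirror1"/"mirror2" extended attributes on the state/ file.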

When moving a file/directory:

  1. Move the state/ copy of the file, if it exists. If this fails for any reason, pass the error code up.
  2. Move the cache/ copy of the file, if it exists.
  3. Iterate through the local metadata/SERVER, moving the file, if it exists.
  4. Iterate through the remote import/SERVER, moving the file, if it exists.

When unlinking (removing) a file/directory:

  1. Remove the state/ copy of the file, if it exists. If this fails for any reason, pass the error code up.
  2. Remove any cache/ copy of the file, if it exists.
  3. Iterate through the local metadata/SERVER, removing the file/dir, if it exists.
  4. Iterate through the remote import/SERVER, removing the file/dir, if it exists.
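The unlink ordering might be sketched as follows in Python (plain files only; directories would need rmdir). unlink_everywhere is a hypothetical name and root stands for /data/cornfs. Any failure on the state/ copy is passed up, a missing state/ entry included, while the other trees are best effort. A move would walk the same trees in the same order with os.rename instead.

```python
import os


def unlink_everywhere(rel_path, root, servers):
    """Remove rel_path from state/, then cache/, metadata/SERVER, import/SERVER."""
    # Step 1: the state/ copy is authoritative; let any error propagate.
    os.unlink(os.path.join(root, "metadata", "state", rel_path))

    # Steps 2-4: everything else is removed only if it exists.
    candidates = [os.path.join(root, "metadata", "cache", rel_path)]
    candidates += [os.path.join(root, "metadata", s, rel_path) for s in servers]
    candidates += [os.path.join(root, "import", s, rel_path) for s in servers]

    removed = []
    for path in candidates:
        try:
            os.unlink(path)
            removed.append(path)
        except FileNotFoundError:
            pass
    return removed
```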

Changing permissions, access times, or ownership would really only affect the /data/cornfs/metadata/state/ sparse file.

Most metadata information would use the state sparse file.

A "helper daemon" needs to run periodically to make sure that servers are accessible.

  1. If a server becomes unreachable but has not timed out as "dead", read()s fail over to the other mirror (or fail if both mirrors are unreachable - such operations should probably trigger a mirror copy() as well), and write()s move the unreachable mirror of a file over to another reachable server.
  2. If a server is totally inaccessible for a period of time to mark it as "dead", the helper daemon needs to refer to the /data/cornfs/metadata/SERVER/ tree and create a new mirrored copy for each file across the farm. In the process, the metadata/SERVER tree will be pruned.
  3. A "sanity" script must be periodically run against each metadata/SERVER tree to see if a copy of a file exists on the server that does NOT exist in the metadata/SERVER tree. If so, that's an orphaned mirror, and should be deleted. Orphans would happen when the master's metadata state for a server says something shouldn't be there, but the server has been down during the time when the mirror would have been removed.
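The "sanity" check in item 3 boils down to a set difference between two directory trees. A minimal Python sketch, with hypothetical function names, that reports both orphans (on the server but not in the master's replica) and missing files (in the replica but not on the server, i.e. corruption):

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def audit_server(metadata_root, import_root):
    """Compare metadata/SERVER against import/SERVER.

    Returns (orphans, missing) as sorted lists of relative paths.
    """
    expected = relative_files(metadata_root)
    actual = relative_files(import_root)
    return sorted(actual - expected), sorted(expected - actual)
```

It only reports; deleting orphans or re-mirroring missing files is left to the caller.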

As metadata state is updated, locking must be used to ensure atomic operations on the metadata tree. We would not want multiple updates to a file to occur out of order due to a delay in a copy operation to a server in the field.

Speed and availability should be consistently monitored to select faster responding mirrors (if possible) and/or noting that nodes are unreachable for file operations to trigger a mirror for a file with a broken mirror.

Symlinks, block/character devices, and other non-files are stored in the metadata state/ tree alongside the sparse files that represent the actual files that are being distributed.

There is no "inode" construct per se, outside of the metadata state/ tree. That is the "master metadata" that most filesystem operations use. Only when reading/writing, opening/closing, moving, or unlinking, do the mounted server filesystems under import/ get involved to hold the data.

Making this a single instance store (ideal for backups) would require just a bit more logic to include an SHA1/MD5 hash encoded as a directory tree (broken up by octet to a path tree structure); something like:

/data/cornfs/metadata/state/SHA1/MD5/object
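One way to read "broken up by octet" is sketched below in Python, using SHA1 only. The depth parameter (how many leading octets become directory levels) and the name object_path are my own assumptions; the exact layout is still undecided.

```python
import hashlib
import os


def object_path(data: bytes, depth: int = 2) -> str:
    """Map file contents to a relative path derived from their SHA1 digest.

    The first `depth` octets (two hex characters each) become directory
    levels; the remaining hex digits become the file name.
    """
    digest = hashlib.sha1(data).hexdigest()
    octets = [digest[i:i + 2] for i in range(0, 2 * depth, 2)]
    return os.path.join(*octets, digest[2 * depth:])
```

Identical contents always land on the same path, which is what makes the store single instance. A real implementation would hash the file in chunks rather than reading it all into memory.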

Another neat extension would be to build a "revision history" of documents in the filesystem by:

  1. On close(), if a file has changed, it should be archived.
  2. Move original version of files into a revision/ metadata tree by hash ID.
  3. Copy in the new version of file from the cache to the mirrors.
  4. Tag the state/ tree of the new file with an extended attribute as to the "previous revision"'s SHA1/MD5 HASH in the revision/ metadata tree.

This would address files that change, but would not save us from directory trees that are removed. For this, we would want an archive/ metadata tree by datestamp:

  1. On unlink(), create an archive/TIMESTAMP/ metadata tree and move the file there.
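A small Python sketch of the archive-on-unlink step, assuming state/ and archive/ live on the same filesystem so a rename is enough (and keeps the extended attributes). archive_on_unlink and the timestamp format are hypothetical choices.

```python
import os
import time


def archive_on_unlink(rel_path, state_root, archive_root, now=None):
    """Move a state/ entry into archive/TIMESTAMP/ instead of deleting it."""
    # UTC timestamp; `now` is seconds since the epoch (defaults to current time).
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    src = os.path.join(state_root, rel_path)
    dst = os.path.join(archive_root, stamp, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # rename() fails with EXDEV if the two trees are on different filesystems.
    os.rename(src, dst)
    return dst
```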

Moving files and/or directory trees around in state/ would maintain the extended attributes, effectively retaining the revisionist history FOR FREE! When files are moved, the mirrors must be moved as well.

Reconstructing things from the revision/ and archive/ trees would be interesting, but well beyond the initial scope of this endeavor.

The quickest way to throw this together would be with the Fuse.pm perl module. I'm actively writing code now.

The eventual goal would be to write a thread aware C version based on the above prototype, primarily for speed reasons.

More to come.. SOON..
