Intel, Adobe plan a chicken in every pot, Flash on every HDTV
Move over, Eee: Android now running on HP Mini-Note 2133
MSI unveils ultra-thin X-Slim 320, fits snugly into manila envelope
Nano-powered "FreeStyle" netbook hands-on
Minoru 3D Webcam ships this week, still looks freaky
Wisair's Wireless USB Display Adapter Set coming soon for $129
Helpless: A New Tilt-Shift Time-Lapse Video by Keith Loutit
Useless Monkey Webcam Smiles and Cheers You On [Cute]
WorkBay Chair Helps Keep Annoying Workmates At Bay [From The Desks Of Agoraphobes]
Sonic Chair Now Includes Touchscreen iMac [Apple]
Socket Deer: Antlers For Your Outlets [Outlet Antlers]
Gmail Gets a Built-in PDF Reader, Lets You Avoid Acrobat Reader [Gmail]
OLPC Ad Goes For the Jugular With Child Laborers, Child Prostitutes, Child Warriors [Olpc]
Tilt-Shift Photography On the iPhone, Sorry Starving Artists [IPhone Apps]
Huge Hole Found on Earth's Magnetic Field, Run Around In Panic Now [Nasa]
Macabre Plush Toys Are Perfect Xmas Gift for Future Psychokillers [Bloody Xmas]
How to Re-Enable Unlock and Jailbreak in Mac OS X 10.5.6 [IPhone]
Grow your own bioluminescent algae
Quartz Composer and Cruise Control status
Sunay Tripathi's Solaris Networking Blog
Merry Christmas from Chiron Beta Prime
Google's Native Client... the next ActiveX?
kenai.com - xVM Server Project site
58% Spam Drop from one colo shutdown
Xenomips - a Xen friendly domU version of Dynamips - Emulate a Cisco 7200
Debian and Android dual-boot on the G1
Sipper (SIPr) - a SIP testing framework in ruby
DBslayer - a SQL abstraction layer using JSON
Fingerworks keyboard in a MacBookPro
The Phoenix BIOS hypervisor is Xen
Do you live in a Constitution-Free zone?
Puppet presentation at NYCOSUG this month
XenSmartIO - Infiniband IO for Xen
Starting with b100, OpenSolaris has virtual consoles
OpenSolaris testfarm build server interface now available
Firefox M9 Fenric - Maemo alpha
SystemZ - aka Sirius - a port of OpenSolaris to IBM System Z mainframe OS running in z/VM mode
Solaris and ZFS on a Dell 2950, tweaking notes
Early Access Windows PV drivers for xVM
Economics: The Theory of Interstellar Trade
The Financial Crisis: What Happened and What's Next?
Cisco to run Windows 2008 on their appliance virtually for services
Packetfence: an OpenSource Network Access Control system
persist.js - an alternative to gears
Chinese building "impossible" EM drive
COMSTAR SMTF - solaris FC, SAS, and iSCSI targets
Flexiscale - yet another control panel?
RightScale - cloud control panels?
Critical ESXi remote vulnerability in openwsman
Microsoft FUD on VMWare: vmwarecostswaytoomuch.com
nmap builds zenmap topology maps
Don't forget about BarCampTampaBay
The LHC accelerates, and that's what it's all about.
Sun's launch of xVM, live webinar
Microsoft to give away Hyper-V for free, live migration by 2010
Ubuntu's Intrepid Ibex will be followed by Jaunty Jackalope
Why Xen traps negative segment offsets
Rails 2.1.1 more REXML bug fixes
Indiana OS2008.03 RN3 released - based on nv_b96
Skype Mobile Phone (Not in the US)
Youtube gets closed captioning support
Getting xVM to work on OpenSolaris 2008.05
How a VoIP E911 call is handled
MonetDB - a column based RDBMS, ideal for time series data
VMfaq's comparison of virtual storage IO
Xen and Solaris, a log of experience.
OpenSolaris CR#6654713 - 32G limit bug stemmed from bad USB hardware? Perhaps fixed?
OpenSolaris CommonArrayManager
Sharity-Light - smbfs derived samba clone
Drizzle, a thin mysql, generating buzz
VMWare to offer ESX hypervisor for free
Fan, the programming language.
Blackberry Thunder with Haptics keyboard
iPhone App Store Live Walkthrough now available
Overclocking tool for the Mac Pro
ADO.NET Entity Framework (Microsoft's new ORM) given a non-confidence vote by beta testers
Ruby interpreter flaws make the case for JRuby
AdvFS - Tru64 filesystem ported to Linux
OpenSolaris 2005.05 repository update to b91 - follow these instructions carefully
SXCE can ZFS install as of b90
Vertebra: EngineYard's Next Generation Cloud Computing Platform
Skype 4.0 beta overhauls video chat
Mozilla org receives traditional IE cake
Toyota Prius to go entirely Electric
Bill Gates steps down permanently for philanthropic activities
Men write code from Mars, Women write more helpful code from Venus
DRBD LVM Xen = Bug. A rather nasty one at that.
Intel unveils Ct as an extension for C/C++ to encourage threaded programming for multiple cores
VMWare ThinApp - Run any Windows app on any version of Windows
JRuby-Rack <-- a JRuby port of Rack
Rack <-- a lighter cousin to Merb, fully threaded and no Mutex.
Solaris Cluster Express (SCX) 6/08 released.
Changing solaris' default password hashing
Texas based service provider explosion affects 9,000 servers and 7,500 customers.
JRuby on Rails on Tomcat deployed as a WAR file
42 more of the best Linux games
Use Google's cached ajax libraries
Arduino microcontroller with OS/X
The metasploit page describing the full impact of the poor RNG.
Holger Bert's blog post on the openssl RNG fiasco
Cayac - Cherokee MySQL PHP5 phpMyAdmin
ZFS very slow under an xVM kernel
Dynamically editing libvirt xml configs while a VM is running to redefine reboot flags.
Chronoton - the time travelling robot who's best friend is a talking pie game
Rietveld - Google's code review tool
Opensource multitouch displays
Ono - an efficient way to locate nearby peers
Solaris CIFS integrated AD with ZFS acls
Samba Winbind and ZFS acl working together
Why's unholy Ruby to Python .pyc compiler
OpenSolaris 2008.05 final ISO image
Twitter abandoning Ruby on Rails
HP makes memory from a once-theoretical circuit
Setting Up an OpenSolaris NAS Box: Father-Son Bonding - The Video
Linux kernel Xen self-ballooning patch
Coolstack - Yet another group of solaris packages
SFE - Spec Files Extra - or, solaris's ports system
ksplice - live linux kernel patching
ZFS-102-A.pkg - binary package build of newer ZFS for Mac
Changing boot flags for a solaris domU guest
callflow - SIP callflow diagram generator
sdedit - quick sequence diagram editor
Milax - The OpenSolaris Small Live CD
Big Nerd Ranch on Windows/Linux/Leopard single signon
Sun touts big plans for OpenSolaris as first release nears
Heroku - EC2 based Rails hosting.
Meadowcourt's compiled WindowsXenPV driver, v0.8.8, as built from win-pvdrivers.hg repo
Network Solutions hijacks all customer's unused subdomains
ZFS speed bump: set zfs_nocacheflush = 1
We Don't Use Software That Costs Money Here
Hubble - a PlanetLab realtime Internet "blackhole" monitor
Citrix price jumps on rumors of potential IBM/Cisco bidding war
TechCrunch labs on their AppEngine deployment
pash - because powershell was too cool to let microsoft keep to itself
Brazil migrates 430 thousand voting machines to Linux
The Machine Emulator - TME can emulate a sparc4 with OBP
Google releases new GCC linker
Automatic generation of peephole superoptimizers
Xen.org Trademark Policy for Review
SXCE b85 has problems booting under Xen 3.2
VNRP == opensolaris quagga rbridges crossbow xVM
problems reprobing iscsi devices with solaris 10
LSI MegaRAID SAS/Dell PERC5 driver for Solaris
dm-band block IO bandwidth controller
Dojo.storage - Google Gears workalike?
ooma.com - free phone service after you buy their device
Hacking defibrillators shockingly easy
Microsoft working with Eclipse.
Pentagon attack last June stole an "amazing amount" of data
Solaris and Solaris Cluster on HP ProLiant Servers
Apple Introduces new MacBook and MacBook Pro models
Sun leaks 6-core Xeon, Nehalem details
Xen and Solaris - a journal of sorts
How to save the world with ZFS and 12 USB sticks
Xvm: a summary of creation of various Xen domU
OpenSolaris b82 comes with CoolStack
Dilbert PHB on Virtualization Consultants
Sun xVM Ops Center GA v1.0 tomorrow
KernelTrap on the 2.6.23 Xen merge
IETF XMPP/SIMPLE Interworking Draft
PSYCed - IRC/XMPP server that gateways transparently between both
OTR - Off The Record, Homepage. IM Encryption.
SIPE - Pidgin plugin for SIP/SIMPLE with Microsoft LCS compatibility hacks
Price Waterhouse Cooper's Global Cable Map
Solaris Windows iSCSI speedup disabling NAGLE
OpenSolaris Storage Developer Wish List
Nexenta Builder - build your own Nexenta based distribution
Microsoft to acquire SideKick maker Danger
Linux Kernel 2.6.23-2.6.24 vmsplice local root exploit
The evolution of Tech Company logos
Mindstorms NXT Rubiks Cube Solver
Cut four undersea cables, shame on you, cut a fifth, also shame on you
Koha - OpenSource Integrated Library System
SIPE - SIP Exchange protocol - or, how to get Pidgin to talk to Microsoft Live Communication Server
Amazon SimpleDB written in Erlang
Xen DR7 and CR4 Registers Multiple Local DoS vulnerabilities
XMLPulse - parse xen dom0/domu stats
The rise of the FOSS spinmeister
Smartphones patented - lawsuits immediately filed
H-Sphere cross-platform hosting control-panel
Mystery infestation strikes Linux/Apache web sites
GNU/Solaris - When the fun begins
KDE goes cross platform with Windows and Mac/OSX support.
Microsoft prints get-out-of-jail card for Vista Home
Tsung - an erlang based multi-protocol distributed load testing tool
Microsoft relents, ban on vista virtualization is lifted
Hyperic podcast talking smack with Luke KAnies of Puppet
The Mysql storage engines, and when they are appropriate.
MADOCA - Message And Database Oriented Control Architecture
SMP Xen HVM Windows guests need timer_mode=1
James Randi is coming to Tampa
Information Of Those Who Appealed Watch List Compromised
Tata Nano - $2500 world's cheapest car
Air Travel with Spare Batteries? Check the changes to what is permitted starting tomorrow.
Open Configuration and Management Layer
FiveRuns RM-Manage - rails project monitoring
VLDB - Very Large Data Base Endowment Inc - nonprofit
Elastix - a more friendly Trixbox fork
A Glimpse and a Hook - a take on resumes
Xirrus - LISA used 7 arrays to provide WiFi
dopd - an easier way to keep drbd primary/secondaries in sync
OpenSIM - run your own SecondLife grid.
$4million in hardware lost in London data center heist
iscsi block device script for /etc/xen/scripts
Quaqua - Aqua look and feel widgets for jvm
Chimps beat humans in memory tests.
Level 3 needs technicians with FIREBALLS
10 steps to close down an open society
Longer flights to avoid air traffic control charges
News release from Six Apart about LJ sale to SUP
Optimus keyboard is finally available
pkgGen and logGen and Packagemaker - repackage os/x packages to deploy
Jumpbox.com - virtual appliances
TelegraphCQ - Berkeley database research - adaptive dataflow capture, combine, analyze
UK loses CD of private info on 25million citizens
Solaris Automatic Migration opensourced
AVS ZFS Demo <-- replicated ZFS pool
Xen Virtualization book not yet published for sale on Amazon
Phoenix BIOS releasing its own hypervisor
Andrew Warfield's other publications
Parallax - managing storage for a million virtual machines, from the Xen guys at Cambridge
Kepler project - GRID scientific workflow engine
Google Code Map/Reduce mini lectures
What 24 would have been like in 1994.
WaterRoof - Mac OS/X Firewall Manager
10 reasons why Oracle databases run best on VMWare
Google Caja - allow scripts in a 3rd party context
Xen Windows PV drivers - opensource mercurial repository
QuickSilver - opensourced 11/06/07
After filling a CornFS volume for a couple of days now, I found a few problems that really begged for another release.
I'm still building cornfs with debug flags and running it under gdb to catch any segfaults in the new caching code. Sure enough, it turned up a segfault or two, so I cleaned up my pointer handling a bit. cache_insert() now survives a rather huge cache_inventory() run without incident.
There was also a bug with the statfs() setup in the cache upload function. Instead of statfs()ing the /data/cornfs/import/{servername} directory, it was handily using /data/cornfs/import. All of the servers appeared to have the same remaining free space, which caused the last two servers to fill to the brim.
Like I said, there will likely be some rapid releases this week as I stumble upon more nits to pick.
For now you can download cornfs-v0.0.5.1.tar.bz2 and have at it.
I've been working on cornfs this weekend a bit to speed things up.
With the help of gprof and gcc -pg, I found that the caching routines were causing a huge performance hit. Every read() and write() was doing a linear linked list search through every cached entry. This is fixed.
Along the way, I found it difficult to debug things with one huge cornfs.c source file. So I've split that up into numerous .c source files to fix this.
I also updated the Makefile to build on its own without building under fuse/examples as before. It is now friendly with FUSE 2.5.2, and compiles with 22 ABI compatibility. I'll see about adding the 25 ABI functions shortly.
So, download cornfs-v0.0.5.0.tar.bz2, extract, and build with make.
A few folks have mentioned via private email that they were playing with cornfs. With this latest version, and NFS over ssh, NKS is finally running this in a production environment.
Look forward to some rapid updates here in the near future.
I've had serious problems using shfs, sshfs, and sfs. The first two fall apart under load, and the latter is a nightmare to get working in our environment (a PAM nightmare, that is).
Rather than dealing with something crazy, I decided to go back to a faithful old standard: NFS. As the remote storage nodes are accessible only via ssh, ssh was the ideal transport for the NFS mounts.
How do you do this? With a little port trickery and some inittab craziness to hold the tunnels up.
NFS v3 and newer have a TCP transport mode that makes it possible to tunnel using ssh. Older versions of NFS use a UDP-based ONC RPC transport. Make sure you have kernel support for TCP and NFS v3 before you continue.
On the remote nodes, install NFS:
# apt-get install nfs-kernel-server nfs-common portmap
Then set up an exports file sharing something to localhost (the rest of this walkthrough mounts /export):
# echo "/export localhost(rw,async,insecure,no_root_squash)" >> /etc/exports
We need to have mountd start on a known port to set up the ssh tunnel from the master. The "-p" flag is used for this. Debian keeps the RPCMOUNTDOPTS flags in /etc/default/nfs-kernel-server, easily updated with this perl one-liner:
# perl -pi -e 's/^(RPCMOUNTDOPTS)=.*$/$1="-p 32767"/' /etc/default/nfs-kernel-server
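It's also a good idea to block portmap requests from anything but localhost with tcpwrappers, just in case your firewall rules happen to be down for some reason. The allow rule only restricts anything when paired with a matching deny rule, since tcpwrappers grants access when neither file matches:
# echo "portmap: LOCAL" >> /etc/hosts.allow
# echo "portmap: ALL" >> /etc/hosts.deny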
Now restart things and make sure the mountpoint is being exported:
# /etc/init.d/nfs-kernel-server stop
# /etc/init.d/nfs-common stop
# /etc/init.d/portmap stop
# /etc/init.d/portmap start
# /etc/init.d/nfs-common start
# /etc/init.d/nfs-kernel-server start
# rpcinfo -p localhost
# showmount -e localhost
The remote server is now ready to mount. Return to your central master cornfs server, which will act as the client, and set up an ssh tunnel.
Step 1: Install nfs-client
# apt-get install nfs-client
Step 2: Setup key trust with the remote server:
# ssh-keygen -f ~/.ssh/id_dsa-cornfs -P'' -t dsa -b 1024
# cat ~/.ssh/id_dsa-cornfs.pub | ssh remoteserver 'mkdir ~/.ssh; cat - >> ~/.ssh/authorized_keys'
Step 3: Setup the SSH tunnel with an inittab respawn
# echo 'N0:23:respawn:/usr/bin/ssh -c blowfish -L 10000:localhost:2049 -L 11000:localhost:32767 remoteserver vmstat 300' >> /etc/inittab
# telinit q
Now you should see an ssh tunnel running in a process listing. Check your system logs to see if there are any problems.
Step 4: Add fstab entries for NFS:
# echo 'localhost:/export /data/cornfs/import/remoteserver nfs rw,bg,soft,port=10000,mountport=11000,tcp 0 0' >> /etc/fstab
# mount /data/cornfs/import/remoteserver
You should now see your remote server's /export filesystem mounted under /data/cornfs/import/remoteserver.
Each remote server will need to have a unique nfs and mountd port assignment. Repeat steps 3 and 4 for each.
I started at 10000 and 11000 and worked my way up from there. The next server's port assignments are 10001 and 11001, etc.
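With more than a couple of servers, the port scheme above is easier to generate than to type. Here is a minimal sketch, assuming a hypothetical helper named cornfs_tunnel_lines and servers numbered from 0. It only prints the inittab and fstab lines from steps 3 and 4 so they can be reviewed before being appended by hand.

```shell
# Hypothetical helper (not part of cornfs): print the inittab and fstab
# lines for the Nth remote server, where N starts at 0, following the
# 10000+N (nfsd) and 11000+N (mountd) port scheme described above.
cornfs_tunnel_lines() {
    n=$1
    server=$2
    nfs_port=$((10000 + n))
    mountd_port=$((11000 + n))
    # inittab entry: respawn an ssh tunnel to nfsd (2049) and mountd (32767)
    printf '%s\n' "N${n}:23:respawn:/usr/bin/ssh -c blowfish -L ${nfs_port}:localhost:2049 -L ${mountd_port}:localhost:32767 ${server} vmstat 300"
    # fstab entry: mount the tunnelled export over TCP
    printf '%s\n' "localhost:/export /data/cornfs/import/${server} nfs rw,bg,soft,port=${nfs_port},mountport=${mountd_port},tcp 0 0"
}
```

For example, `cornfs_tunnel_lines 1 storage2` prints the pair of lines using ports 10001 and 11001. Keep in mind that the inittab id field is traditionally limited to a few characters, so this naming only stretches so far.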
This works surprisingly well, and appears to be quite stable (far more stable than the other alternatives).
That's not to say things are as fast as they could possibly be, but it works.
Latest version: v0.0.5.0
The braindump for CORNFS explains many things about this project.
CORNFS is an attempt at creating a distributed filesystem that mirrors N copies of files across a group of M number of servers. Everything in CORNFS is stored as a file.
At any time, it is possible to reconstruct the entire filesystem via a simple overlay rsync from the remote filesystems - there is no "special database" to worry about.
Rather than mirroring at the volume or block level, CORNFS mirrors at the file level, tracking what servers a file is mirrored on. CORNFS works with locally cached copies of files and a central metadata state directory.
Extended attributes are used to mark metadata state files with information CORNFS uses to track the mirrors for a particular file, as well as cached files that are marked as "dirty" (for copying back to remote servers when a cached file is modified).
As files are written, the servers with the most available disk space are used for new files (a braindead simple algorithm for the moment). When a cached file is modified, the file is copied back to its mirrors (or to new mirrors should a server be unavailable). CORNFS keeps metadata centrally to keep a sane filesystem state. Every remote server's metadata state is known by the central server. The central server's metadata state is authoritative: remote servers may go offline, and when they come back online, any files that were updated while they were unavailable will have been removed from that server in the central metadata and will not be referred to (such "orphaned files" will need to be pruned periodically).
As a last resort, the master's cached copy is authoritative. If mirrors cannot be written to, the cache file will remain dirty, and will not be expired.
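The "most available disk space" rule is simple enough to sketch from the shell. This is only an illustration of the idea, not the C code inside cornfs; the helper names cornfs_free_space and cornfs_pick_mirrors are made up for the example. Note that df is asked about each server's own directory under import/, never the parent directory, which would report the same free space for every server.

```shell
# Hypothetical helper: print "SERVER FREE_KB" for every server directory
# under the import root, asking df about each server's own mount point.
cornfs_free_space() {
    import_root=${1:-/data/cornfs/import}
    for dir in "$import_root"/*/; do
        [ -d "$dir" ] || continue
        # df -Pk prints one POSIX-format line per filesystem in 1K blocks;
        # field 4 of the second line is the available space.
        free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
        printf '%s %s\n' "$(basename "$dir")" "$free_kb"
    done
}

# Hypothetical helper: read "SERVER FREE_KB" lines on stdin and print the
# names of the N servers with the most free space, largest first.
cornfs_pick_mirrors() {
    sort -k2,2nr | head -n "$1" | awk '{ print $1 }'
}
```

Something like `cornfs_free_space | cornfs_pick_mirrors 2` would then name the two servers to use for a new file's mirrors.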
This is a production running release, as used by my employer today.
The history of development so far:
The first (broken) release.
A number of fixes make this version _usable_. There are most definitely corner cases
that have not been dealt with yet, though it seems to suffer an rsync/rm well now.
Adds partial read()s while the copy is underway during an open() (until I figure out
how to spawn a pthread() for the copy, this does not really do much yet).
Added pthread_mutex_lock(&corn_copy_lock) to copy_file. Added corn_magic and
USE_MAGIC wrappers for magic file identification.
The dynamic expiring cache code is now present. Added cache_inventory(),
cache_insert(), cache_update(), cache_expire_to_limit(), cache_rename(),
and cache_remove().
Added S_ISREG check to read() and write(). Any non-regular file read/write calls
are now mapped correctly to state files. Also fixed cache_insert.
Turned off debugging, removed hardcoded size limit.
Remove stat()ing of cached files, replace with cache_exists(), particularly in read()
and write(). Move as many dirty checks to corn_cache as possible. cache_mark_dirty(),
cache_mark_clean(), cache_is_clean(). Fixed some more mallocs.
Add dirty check to cache_expire
(do not expire something from cache if it does not have a good mirror!!!)
Add copy_file_thread, copy_file_wait, and copy_file_nowait.
Copying is now threadable!
Add fsck_thread and xmp_init/xmp_destroy. The fsck_handler_*
functions are not done yet, but are ready to fill in.
Relabeled all xmp_ functions to cornfs_;
Reworked open() function quite a bit: moved much
of the copy to cache logic to download_to_cache();
Defined corn_file_info struct, used to pass open()
file descriptor to read() and write();
Moved code from release() to upload_from_cache(),
added to fsck_cache_handler()
Filled in fsck_meta_handler() and fsck_state_handler()
Fixed some logic errors in fsck_import_handler()
Filesystem appears to fsck correctly now.
Add control_file_read()/write() and corn_file_info structure updates to handle control file IO.
Profiled code, found cache_update()/cache_insert() biggest culprit
Made corn_cache a two-way linked list to remedy above.
Split up cornfs.c into numerous .c source files to simplify coding
Now in a tarball because of above.
Or grab the latest cornfs.tar.bz2 with everything you need.
Things to fix:
The easiest way to build this is to grab fuse-2.5.2.tar.gz and extract it:
$ tar xvzf fuse-2.5.2.tar.gz
Then extract the cornfs.tar.bz2 somewhere and build it:
$ cd /tmp
$ wget http://ian.blenke.com/projects/cornfs/cornfs.tar.bz2
$ cd /tmp ; tar xvjf cornfs.tar.bz2
$ make -C /tmp/cornfs
You should now have a "cornfs" runtime. If not, drop me an email.
The directory tree to make this usable is hardcoded at the moment into the runtime (constants toward the top of the source file).
$ mkdir -p /data/cornfs/cfgs/servers
$ echo /remote/path > /data/cornfs/cfgs/servers/SERVERNAME
$ mkdir -p /data/cornfs/metadata/state
$ mkdir -p /data/cornfs/metadata/cache
$ mkdir -p /data/cornfs/metadata/SERVERNAME
$ mkdir -p /data/cornfs/import/SERVERNAME
The only missing bits are mounting the import/SERVERNAME directories for each filesystem configured in cfgs/servers/. You can use SHFS, NFS, DAVFS2, or whatever the heck your Linux kernel has support for. CORNFS strives to be filesystem agnostic.
$ cd /data/cornfs/cfgs/servers
$ for server in * ; do mkdir -p /data/cornfs/import/$server ; shfsmount $server:`cat $server` /data/cornfs/import/$server ; done
Now start the cornfs server with a reference path:
$ cd /tmp/cornfs
$ mkdir /mnt/cornfs
$ ./cornfs /mnt/cornfs -d
The "-d" flag adds FUSE debugging.
The lower you set the DEBUG level when building cornfs, the more debugging info will appear. It's an enum, so that can easily be reversed (Verbosity).
By default, the DEBUG level isn't set at all. In that mode, all debugging is macroed away to oblivion to speed things up.
CornFS is being used in production with NFS over SSH instead of SHFS for stability. If you plan on using CornFS in a production role, please let me know.
Enjoy.
Please excuse this brain dump. As ideas come up, I continue to edit this node. Eventually, some structure will be enforced.
Inspired by SSHFS and SHFS, what would it take to make a filesystem that spans a cluster of servers and exposes aggregate diskspace while still mirroring data?
Exposing a filesystem with FUSE on a master node would be ideal, with some form of WebDAV network access (using something as simple as Apache mod_dav) for client access.
Most distributed filesystems have the idea of a "master" for metadata:
There are others, but these are the "big boys" that I can think of.
There are a couple of distributed filesystems that run without a master server. This isn't trivial to implement:
Storage servers in the cluster might each have some space set aside to this purpose. The easiest way would be to create and mount a loopback file filesystem with the space to be shared:
storage-node$ mkdir -p /data/cornfs/spool/ /data/cornfs/export/storage
storage-node$ dd if=/dev/zero of=/data/cornfs/spool/storage_fs bs=1M count=5k
storage-node$ mke2fs -F /data/cornfs/spool/storage_fs
storage-node$ mount -o loop /data/cornfs/spool/storage_fs /data/cornfs/export/storage
On the Master, each storage server's remote filesystem would be mounted based on the master's config (which is modeled likewise in a filesystem tree):
master-node$ mkdir -p /data/cornfs/cfgs/nodes
master-node$ cd /data/cornfs/cfgs/nodes
master-node$ echo /data/cornfs/export/storage > storage-node1
master-node$ echo /data/cornfs/export/storage > storage-node2
master-node$ mkdir -p /data/cornfs/import
master-node$ for node in * ; do mkdir -p /data/cornfs/import/$node ; shfsmount $node:`cat $node` /data/cornfs/import/$node ; done
The beauty of this is that shfs caches files and works with pretty much any host you can ssh into (including Windows via Cygwin). There are some shortcomings to shfs: "df -i" doesn't work, extended attributes aren't maintained, and it only works from linux kernels (were there only a Mac port ;)
Each file in the master tree will have a FILE pathname, including the filename.
Ideally, each file would have at least two copies. For our purposes, I'll suggest that this filesystem should endeavor to track two mirrors for every file, and clean up any "extra" copies.
The Master itself should have a few trees for the metadata. This leaves us with a few directory trees:
/data/cornfs/metadata/state/FILE
- the FILE has the same owner, group, permissions, ctime/atime/mtime, and size as the actual FILE (as a sparse file).
- Extended attributes make a great storage for things like the primary and secondary mirror server names (setxattr/getxattr).
/data/cornfs/import/SERVER/FILE
- contains the actual file, if SERVER is one of the FILE mirrors.
/data/cornfs/metadata/SERVER/FILE
- this is a sparse version of the above file, used as a sanity check and for regenerating a SERVER from scratch.
- This local metadata replica of a remote server is the master's opinion of what the server actually holds.
- If something does not exist in this copy, but exists on the server, it should be removed from that server.
- If something exists in this copy but not on the server, corruption has occurred.
/data/cornfs/metadata/cache/FILE
- a directory tree containing the past N days worth of accessed FILEs (pruned via cron)
This ends up requiring more than twice the number of actual file inodes to represent the full filesystem on the master. One full copy of the entire metadata state, one copy spread across all of the servers for their metadata state replica on the master server, and some fraction of the filesystem in cache for frequent and/or recent file access.
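To make the sparse state files concrete, here is a rough sketch of how one could be produced from the shell. The helper name cornfs_make_state_file is hypothetical, and it assumes GNU coreutils (truncate, stat -c, chmod --reference). It leaves out owner and group, since copying those generally needs root, and ctime cannot be set directly at all.

```shell
# Hypothetical helper: create a sparse metadata state file that reports the
# same size, permissions and timestamps as a real file without storing any
# of its data. Assumes GNU coreutils.
cornfs_make_state_file() {
    src=$1
    state=$2
    mkdir -p "$(dirname "$state")"
    : > "$state"                                   # start from an empty file
    truncate -s "$(stat -c %s "$src")" "$state"    # extend to the same size as one big hole
    chmod --reference="$src" "$state"              # copy the permission bits
    touch -r "$src" "$state"                       # copy atime and mtime
}
```

The resulting file shows the right size in `ls -l` while `du` reports next to nothing, which is the whole point of the state/ tree.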
The Master filesystem would be mounted somewhere handy to be filled, like /master:
master$ mkdir /master
master$ /opt/cornfs/current/bin/cornfs /master
Any new files created under /master would be written to the cache until the user closes the file. On file close, the Master needs to:
When release() is called for a file, if any write() calls were used on the file, it should have been flagged as "dirty" (by an associative array in memory, along with an extended attribute just in case the running daemon is killed). If a file is dirty, it needs to be written out to the mirrors on release(). If a file is clean, don't do anything at all! The file is handily in the cache for the next access.
When reading a file:
When moving a file/directory:
When unlinking (removing) a file/directory:
Changing permissions, access times, or ownership would really only affect the /data/cornfs/metadata/state/ sparse file.
Most metadata information would use the state sparse file.
A "helper daemon" needs to run periodically to make sure that servers are accessible.
As metadata state is updated, locking must be used to ensure atomic operations on the metadata tree. We would not want multiple updates to a file to occur out of order due to a delay in a copy operation to a server in the field.
Speed and availability should be monitored continuously, both to select faster-responding mirrors (if possible) and to note when nodes are unreachable for file operations, so that a file with a broken mirror can be mirrored again elsewhere.
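The periodic accessibility check could start out as something this small. It is a sketch only: the helper names are invented, it assumes the coreutils `timeout` command, and it walks the server names in cfgs/servers (local files) rather than globbing the import tree, so that a hung mount cannot stall the loop itself. An import directory that simply is not mounted will still look reachable here, and a hard-mounted NFS hang may not respond to the timeout at all.

```shell
# Hypothetical helper: succeed if a server's import directory answers a
# directory listing within the timeout (in seconds, default 10).
cornfs_server_reachable() {
    timeout "${2:-10}" ls "$1" > /dev/null 2>&1
}

# Hypothetical helper: print the name of every configured server whose
# import directory does not respond.
cornfs_unreachable_servers() {
    cfg_dir=${1:-/data/cornfs/cfgs/servers}
    import_root=${2:-/data/cornfs/import}
    for cfg in "$cfg_dir"/*; do
        [ -e "$cfg" ] || continue
        name=$(basename "$cfg")
        cornfs_server_reachable "$import_root/$name" || printf '%s\n' "$name"
    done
}
```

Run from cron, the output of `cornfs_unreachable_servers` would be the list of servers whose files need a new mirror.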
Symlinks, block/character devices, and other non-files are stored in the metadata state/ tree alongside the sparse files that represent the actual files that are being distributed.
There is no "inode" construct per se, outside of the metadata state/ tree. That is the "master metadata" that most filesystem operations use. Only when reading/writing, opening/closing, moving, or unlinking, do the mounted server filesystems under import/ get involved to hold the data.
Making this a single instance store (ideal for backups) would require just a bit more logic to include an SHA1/MD5 hash encoded as a directory tree (broken up by octet to a path tree structure); something like:
/data/cornfs/metadata/state/SHA1/MD5/object
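Breaking a hash up by octet into a path is a one-liner. The sketch below uses hypothetical helper names and, to keep it short, keys the path on the SHA1 digest alone rather than the SHA1/MD5 pair shown above; it assumes the coreutils `sha1sum` command.

```shell
# Hypothetical helper: turn a hex digest into a directory path, one octet
# (two hex characters) per path component.
cornfs_hash_to_path() {
    printf '%s\n' "$1" | sed 's|\(..\)|\1/|g; s|/$||'
}

# Hypothetical helper: single-instance-store path for a file's content,
# keyed on its SHA1 digest only (a simplification of SHA1/MD5/object).
cornfs_object_path() {
    digest=$(sha1sum "$1" | awk '{ print $1 }')
    printf '%s\n' "/data/cornfs/metadata/state/$(cornfs_hash_to_path "$digest")/object"
}
```

Two files with identical content end up at the same object path, which is what lets the store keep a single copy.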
Another neat extension would be to build a "revision history" of documents in the filesystem by:
This would address files that change, but would not save us from directory trees that are removed. For this, we would want an archive/ metadata tree by datestamp:
Moving files and/or directory trees around in state/ would maintain the extended attributes, effectively retaining the revisionist history FOR FREE! When files are moved, the mirrors must be moved as well.
Reconstructing things from the revision/ and archive/ trees would be interesting, but well beyond the initial scope of this endeavor.
The quickest way to throw this together would be with the Fuse.pm perl module. I'm actively writing code now.
The eventual goal would be to write a thread aware C version based on the above prototype, primarily for speed reasons.
More to come.. SOON..