ZFS: the final straw

ian – Thu, 2008 – 02 – 21 11:45

It's official. I am finally pronouncing ZFS 'buggy'.

I do not trust the integrity of my data with this filesystem, and I suggest that you do not use it for any purpose.

I've tried three platforms so far, trying to find a stable ZFS system. They are:

OpenSolaris

Didn't support any common SATA controllers. Crashed if you looked at it the wrong way. Actively user-hostile. Tiny user community, most of it hostile to Linux users and noobs.

Linux ZFS-FUSE

Leaks memory and crashes about once a week, more if you use it. I can't complete a scrub or a complete copy of my storage pool (800GB) in one pass: the machine must be rebooted in the middle. Sometimes claims that the pools are fully intact but contain no data, which is somewhat disconcerting. Little hope for future updates as Sun has 'purchased' the developer that was working on the project.

FreeBSD 7

User-hostile, but not nearly as bad as OpenSolaris. Also leaks memory and crashes if you use it a lot, but not as badly as Linux. Still has the precious-data-disappearing bug. Barfs if the drive arrays aren't just right when you boot up. An array that I created and copied data to just did not exist the next time I booted the machine.

So what's next?

I don't know. I thought my problems were specific to the OS implementation - maybe they were buggy because they're not from the ZFS authors - but the fact that the exact same bugs appear across multiple platforms suggests to me that the problem lies with ZFS, not particular ports of it.

I've had enough. I really, really want something that does checksumming; I'm using this array for long-term storage and ZFS has demonstrated that bits do rot over time.
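For what it's worth, bit rot can at least be detected (though not repaired) above the filesystem by keeping a checksum manifest. Here is a minimal sketch in Python, assuming SHA-256 and made-up helper names (`file_sha256`, `build_manifest`, `verify_manifest`). It is not how ZFS does it: ZFS checksums every block and can heal from redundancy, while this only tells you which files changed.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def verify_manifest(root, manifest):
    """Return sorted relative paths that are missing or whose hash changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

The idea would be to build the manifest once after copying data onto the array, store it somewhere else, and re-run the verify step periodically. A file that was legitimately edited will also show up as changed, so this only suits archival data that is not supposed to change.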

XFS has the stability that I want, but no checksumming. Linux RAID is painful and dangerous. I'm tempted to just buy a hardware RAID controller, but I've found them to be even less reliable than software RAID, as well as costing a bundle. The promise of ZFS was for a simple, easy-to-administer disk storage system with redundancy and reliability on commodity hardware. Great idea, but poor implementation.

I've got my IntegriFS project, but of course, that's a long way off. And any filesystem that I write will be considered unstable for a long time anyway; I certainly wouldn't want to be keeping un-backed-up data on there.

Any suggestions?

Update 23 Feb 2008

Linux RAID6 + ext3 + USB drive boot. I've invested maybe three hours of labour into this (versus about a hundred for various ZFS schemes) and it works remarkably well. Linux RAID has come a long way since I last tried it - it's a whole lot harder to destroy an array with a typo. It's also quite flexible in the ways you can modify and resize an array after it has been created. Aaaand it has data scrubbing now, so you find out (eventually) about unreadable sectors and parity mismatches on your disks, though md has no per-block checksums the way ZFS does.
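The scrub can be kicked off by hand through the md driver's sysfs files. Here is a rough sketch in Python, assuming the array is /dev/md0 and the script runs as root. The function names are my own, and the `md_dir` parameter exists only so the path can be changed. Writing `check` to `sync_action` asks md to read every stripe and compare data against parity, and `mismatch_cnt` reports how much disagreed on the last pass.

```python
import os


def start_scrub(md_dir="/sys/block/md0/md"):
    """Ask md to read every stripe and compare data against parity."""
    with open(os.path.join(md_dir, "sync_action"), "w") as handle:
        handle.write("check\n")


def scrub_state(md_dir="/sys/block/md0/md"):
    """Return the current action, e.g. 'check' while scrubbing or 'idle'."""
    with open(os.path.join(md_dir, "sync_action")) as handle:
        return handle.read().strip()


def mismatch_count(md_dir="/sys/block/md0/md"):
    """Return md's mismatch counter from the most recent check."""
    with open(os.path.join(md_dir, "mismatch_cnt")) as handle:
        return int(handle.read().strip())
```

A cron job could call `start_scrub()` weekly and mail the result of `mismatch_count()` once `scrub_state()` goes back to idle. Many distributions already ship a scheduled check along these lines, so it is worth looking before rolling your own.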

ext3 is surprisingly fast on the large-files workload - much faster than ZFS. Some directory listings used to take twenty seconds, and with ext3 they're instant.

USB booting works perfectly, of course.

So I'm very happy with the end result and will probably sleep a lot better.


How about Vinum on FreeBSD? Perhaps it's too simple a suggestion on my part for your setup, but I've had luck with it in the past.

Anyway, good luck with your storage quest.

Bob C (not verified) – Thu, 2008 – 09 – 18 02:07

I forgot to say, this 8-port SATA controller is very highly regarded in the Solaris community and uses the same chipset as in the Sun Fire X4500, aka Thumper:

http://www.supermicro.com/products/accessories/addon/AOC-SAT2-MV8.cfm
http://www.newegg.com/Product/Product.aspx?Item=N82E16815121009

Simon Breden (not verified) – Tue, 2008 – 07 – 29 20:31

Hi Ian,

I've been running ZFS on OpenSolaris for the last 6 months and it has been rock solid.

I'm running a ZFS pool containing a single 4-disk RAIDZ1 vdev, and have experienced zero checksum errors so far. I'm using the SXCE flavour of OpenSolaris (Community Edition).

I researched my hardware carefully and installed a vanilla SXCE onto it. It's been superb so far, so no real complaints.

See more here, including info on hardware, rationale for choosing the OpenSolaris implementation of ZFS, setup, ethernet port trunking etc: http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/

I am also a long-term Linux user, but have not had any major problems learning Solaris. And the "ZFS discuss" user group on the opensolaris.org site has lots of helpful info and enthusiastic posters.

Regards,
Simon

Simon Breden (not verified) – Tue, 2008 – 07 – 29 20:03

While I agree about the tiny, insular, hostile community, I have to say that not everyone there is like that. Just some individuals, but unfortunately more than I would like. It is more pronounced because of how small the community is. The IRC channels, for example, have under 100 users, sometimes under 50.

The non-Solaris ZFS implementations are very buggy; that is a known fact and is advertised by their implementors. They warn against using them until they are ready. If you want ZFS, use Solaris.

As for getting it to run... admittedly Solaris crashes, and getting it installed and configured is like pulling teeth, but once you have it you are golden.

Sincerely
- Recently recovered noob.

TT (not verified) – Sat, 2008 – 06 – 14 05:40

I am in the same boat as you. I tried OpenSolaris, but it would lock up on me every week or so. I tried FreeBSD 7 and couldn't get it working on my hardware. I am currently still using Linux with ZFS-FUSE and it crashes on me every once in a while, but it doesn't bring down the OS, so I just wrote a little script to re-initialize the array. It's also quite slow. I really wish ZFS could have delivered on its promises.

Tim B (not verified) – Wed, 2008 – 04 – 16 20:26

Obviously you've had a run-in with some Solaris zealots. In exactly the same way as the Linux community has rabidly anti-Sun/Solaris members, some in the Solaris community are quite anti-Linux. It's hardly a majority, though, so just don't antagonise them and/or ignore them :)

That said, I'm surprised you had stability issues with Solaris. I'm running it on a number of non-Sun machines here (most of them just thrown together from whatever bits I have lying around) without any stability problems whatsoever. The only time in the last year that any of the machines has had any unplanned downtime was when a circuit breaker tripped. I suspect your problems come from using non-official Solaris distributions - I've briefly tried Nexenta and found it to be very unstable, having both application and kernel crashes. I don't know what they've done to the kernel, but it's massively worse than "vanilla" OpenSolaris.

The supported hardware list is smaller than Linux's, but has quite good coverage of server hardware. Even, yes, SATA. It supports JMicron, SiI, and AHCI controllers, which make up a huge chunk of the market.

Regarding ZFS, I won't disagree that the non-Solaris implementations are horribly buggy. It'll be interesting to see if Apple can break the trend. But you can't say ZFS is buggy based on 3rd-party implementations. That'd be like saying Windows is buggy because Wine has problems (OK, bad example). The Solaris ZFS implementation is rock-solid, IME. The machines I'm running have, combined, over 8 TB of data stored in ZFS volumes (mostly raid-z).

Finally, Solaris is no more user-hostile than any other *nix. You've clearly got a lot of experience using Linux, so to you Linux seems "easy". Solaris/*BSD/Windows/OSX/etc all do things in their own (different) way, which, once you're used to them, seem just as intuitive and normal as when you're using Linux. I also came from a Linux environment, and initially found Solaris weird and confusing ("what do you mean, 'top: command not found'?!?!?"). In time, however, I've got used to it, and if anything find it better thought out than Linux.

ZFS is not the universal solution for all file storage. It's not great for large-file sequential access, you can't expand raid-z pools (though this will be fixed Real Soon Now with bp rewrite, which will also allow migration, vdev deletion, and lots more fun stuff), and it can interact in odd ways with some access patterns (particularly databases) to give you poor performance. But I would hardly call it buggy.

If you have the time, I'd recommend that you try one of the official OpenSolaris releases, or even Solaris 10. Hopefully you'll find it nicer to use than the ugly stepchildren (Nexenta et al).

On a completely separate topic (and the one that originally brought me here), have you found anywhere where you can get CPLDs or FPGAs at sane prices? :)

Anonymous (not verified) – Sun, 2008 – 04 – 06 03:29
