BigAdmin System Administration Portal
XPert Session - Implementing ZFS
Page 3 (31-40 of 40 questions)
Last Updated February 02, 2007
** Note: Some information in these answers is outdated.
For up-to-date information about ZFS, see the ZFS FAQ.
 
 
 

Q: Could I use HAStoragePlus as the data service for ZFS's volume in Sun Cluster 3.X? I'm exploring using Sun Cluster with ZFS.

A: Sun Cluster 3.2 will support ZFS via HAStoragePlus. It will only support fail-over for ZFS.

October 30, 2006


Q: Have you played with ZFS in Sun Cluster 3.2 much? I'm curious how to make a ZFS file system global since you don't really "mount" a ZFS file system, and we're not supposed to use vfstab for ZFS in Sun Cluster 3.2. I can only see my pool (which is on a dual-hosted 3510 JBOD) from one node in the cluster. But I want all nodes to see it!

A: Sun Cluster 3.2 does not support ZFS as a global file system. It only supports it as a fail-over file system.

October 30, 2006


Q: I once experienced a neat feature in Veritas Volume Manager -- I did a new Solaris install (not upgrade) and installed the same or newer version of Veritas Volume Manager. Then we were able to have VVM inspect all the external disks and discover the volumes and recreate the device links automatically. Then we just reused the vfstab entries from the old Solaris install, and we were back up and in business.

Does or will ZFS have the capability to (non-destructively) inspect disks and re-discover ZFS pools, etc.? VVM had a private partition on disks to hold its metadata -- does ZFS do anything like that? For example, if all ZFS storage is on external JBODs and the host dies a bad death, we would like to attach the JBODs to a different host and bring up the ZFS pools, etc. (easily, with ZFS and the Solaris OS doing as much work as possible).

A: Yes, ZFS will do this.

Running "zpool import" with no arguments searches all the attached disks and lists every pool that is available to the system. A pool can then be imported by name or by ID.

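As a rough sketch of that workflow, the Python helper below only builds the "zpool import" command lines; it does not run them. The function name zpool_import_argv, the pool name tank, and the numeric ID are illustrative assumptions, not values from this session. The -d option (search a directory for devices) and the -f option (force the import of a pool that its previous host never exported) are standard zpool import options.

```python
import shlex


def zpool_import_argv(pool=None, search_dir=None, force=False):
    """Build (but do not run) a `zpool import` command line.

    pool: pool name or numeric pool ID; None lists importable pools.
    search_dir: directory to search for devices (-d) instead of the default.
    force: add -f, needed when the pool was not exported by its old host.
    """
    argv = ["zpool", "import"]
    if search_dir is not None:
        argv += ["-d", search_dir]
    if force:
        argv.append("-f")
    if pool is not None:
        argv.append(str(pool))
    return argv


if __name__ == "__main__":
    # 1. Discover pools on the attached disks.
    print(shlex.join(zpool_import_argv()))
    # 2. Import by name; -f because the dead host never exported the pool.
    print(shlex.join(zpool_import_argv("tank", force=True)))
    # 3. Import by numeric ID (placeholder value), useful if names collide.
    print(shlex.join(zpool_import_argv(1234567890)))
```

In the dead-host scenario from the question, the forced form is likely the one needed, because the pool will still be marked as in use by the failed system.
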
November 6, 2006


Q: How long do you expect it will be before ZFS performance is on par with VxFS?

A: I've not measured VxFS performance against ZFS. However, for general file system operations I would not expect ZFS to be wildly worse than VxFS now. Looking at the comparisons between UFS and ZFS, particularly where UFS is using directio, it is clear there is some way to go. But for more general file system operations the performance should be as good as, and often better than, UFS.

November 8, 2006


Q: I've started to play with ZFS for our web server but I discovered that I will have a big problem with the quota notion of ZFS compared to a traditional UNIX quota. I've tried to write something about this in my blog: http://blog.lgb.hu/post160.

For quite a large redundant storage capacity (which can be extended online), with good FS snapshot-making ability, etc., ZFS would be perfect. However, on the storage itself I have several traditional UNIX daemons operating on discrete UNIX UIDs and need kernel-level quotas. There are more than 100,000 users, with heavy modifications done. I can't find a better solution than UNIX quotas (only an ioctl() is enough), which are not provided by ZFS.

IMHO a UNIX environment should not break standard UNIX traditions, only extend them; this quota thing is an issue for me.

A: Moving to a model where you have a file system per user is a change, but not one that daemons should have a problem with. Typically programs cope better with the concept of a file system being full rather than a user over quota, so apart from a daemon that is manipulating a user quota, there would be no requirement to change anything.

It is true that the Solaris OS currently has difficulty managing very large numbers of file systems with acceptable performance, but that is being worked on and it should be possible to fix.

November 12, 2006


Q: There is a very strong case for file size quotas, not the least of which is mail spool files. We implement user-level quotas on our current VERITAS FS set aside for mail spool storage, with a 40 GB per-user quota. This means that each user has a maximum of 40 GB of mail spool storage space at a file level. How do I implement this in ZFS, when the local mail delivery agent only knows how to deliver to files in a directory, not individual file systems or directories for each user?

A: I'm not sure I agree that mail spool files are that strong a case. Yes, ZFS changes how quotas for users work, and that may require changes to mail delivery programs, like sendmail.

However, we already have people wanting to apply a single quota for a user that covers all their disk usage, both mail spool and home directory. In the past, people have implemented this by making /var/mail a symbolic link that points into the same file system that contains the users' home directories. However, this does not scale beyond the case where all user directories can fit in a single file system.

A much better solution would be to have the mail delivery program be able to deliver to a file other than /var/mail/$LOGNAME. If you could configure this to be /tank/users/$LOGNAME/mail/in-box, and then have users' home directories in /tank/users/$LOGNAME/homedir, both could then live in the same file system /tank/users/$LOGNAME, or they could be separate file systems -- one for mail and one for the home directory, depending on the site's administrative choices.

I have filed an RFE to have sendmail support this:

6488870 sendmail needs to support mail delivery to a directory (file system) per user.
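
To make the file-system-per-user layout concrete, here is a minimal Python sketch that generates the zfs commands for one user and computes the in-box path suggested above. It only builds command lines and never executes them. The function names, the user name alice, and the assumption that the parent file system tank/users already exists are all illustrative; the 40G quota mirrors the figure in the question. Whether a given mail delivery agent can deliver to such a path is exactly what the RFE above asks for, so treat this as a layout sketch only.

```python
def per_user_commands(pool, user, quota="40G"):
    """Return zfs command lines (argv lists) for one user's file system.

    Assumes the parent file system <pool>/users already exists.
    """
    dataset = f"{pool}/users/{user}"
    return [
        ["zfs", "create", dataset],                 # one file system per user
        ["zfs", "set", f"quota={quota}", dataset],  # cap its total size
    ]


def inbox_path(pool, user):
    """Hypothetical in-box location, following the layout in the answer."""
    return f"/{pool}/users/{user}/mail/in-box"


if __name__ == "__main__":
    for argv in per_user_commands("tank", "alice"):
        print(" ".join(argv))
    print(inbox_path("tank", "alice"))
```

Because the quota applies to the whole file system, it covers both the mail and home directory data when they share /tank/users/$LOGNAME, which is the single-quota behavior described earlier in the answer.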

November 16, 2006


Q: What is the largest file system size currently supported for ZFS?

A: The limit is practically defined by the size of the storage you can attach to a system. The limits are:

  • 2^48 snapshots in any file system
  • 2^48 files in any individual file system
  • 16 exabyte file systems
  • 16 exabyte files
  • 16 exabyte attributes
  • 3x10^23 petabyte storage pools
  • 2^48 attributes for a file
  • 2^48 files in a directory
  • 2^64 devices in a storage pool
  • 2^64 storage pools per system
  • 2^64 file systems per storage pool
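
For a sense of scale, the short Python sketch below works through the arithmetic behind these figures. It rests on two assumptions that are not stated in the answer: that the units are binary (1 petabyte = 2^50 bytes, 1 exabyte = 2^60 bytes), and that the pool limit corresponds to a 128-bit address space (2^128 bytes). Under those assumptions the 16 exabyte and 3x10^23 petabyte figures fall out directly.

```python
PIB = 2**50  # bytes in one petabyte, assuming binary units
EIB = 2**60  # bytes in one exabyte, assuming binary units

count_limit_48 = 2**48    # e.g. snapshots or files per file system
count_limit_64 = 2**64    # e.g. devices per pool, pools per system
max_file_bytes = 16 * EIB
max_pool_bytes = 2**128   # assumption: 128-bit pool address space

print(f"2^48 = {count_limit_48:,}")
print(f"2^64 = {count_limit_64:,}")
print(f"16 exabytes equals 2^64 bytes: {max_file_bytes == 2**64}")
print(f"2^128 bytes = {max_pool_bytes / PIB:.2e} petabytes")
```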

December 07, 2006


Q: So user quotas will not be implemented... Well, that brings us to a position of an unusable file system for user purposes, because how can I export 50,000 file systems via NFS, one for each user?

A: It is a bit early to jump to that conclusion. There is work going on to make issues around the thousands of file systems go away. When NFSv4 clients can cross server mount points without needing the automounter and a server with many thousands of file systems can boot in a reasonable time, the need for user quotas disappears.

** Information is outdated. Please see Note.

December 12, 2006


Q: With Solaris 10 (11/06) out, what has changed with regard to ZFS since its initial release?

A: The following patches will get you to the same level of ZFS support as there is in the Solaris 10 (11/06) release. Please note there may be later versions of patches available:

Solaris 10 Update 3 (11/06) Patches

SPARC Patches

  • 118833-36 SunOS 5.10: kernel patch
  • 124204-03 SunOS 5.10: zfs patch
  • 122660-07 SunOS 5.10: zones jumbo patch
  • 120986-07 SunOS 5.10: mkfs and newfs patch
  • 123839-01 SunOS 5.10: Fault Manager patch

i386 Patches

  • 118855-36 SunOS 5.10_x86: kernel Patch
  • 122661-05 SunOS 5.10_x86: zones jumbo patch
  • 124205-04 SunOS 5.10_x86: zfs/zpool patch
  • 120987-07 SunOS 5.10_x86: mkfs, newfs, other ufs utils patch
  • 123840-01 SunOS 5.10_x86: Fault Manager patch

These patches deliver the following new features and bug fixes:

ZFS Features/Projects

  • PSARC 2006/223 ZFS Hot Spares
  • PSARC 2006/303 ZFS Clone Promotion
  • PSARC 2006/388 snapshot -r
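
As a brief illustration of how these three features surface on the command line, the Python sketch below builds (without running) one command for each. The helper names and the pool, device, clone, and snapshot names are hypothetical; the subcommands themselves (zpool add ... spare, zfs promote, zfs snapshot -r) are the ones these projects introduced.

```python
def hot_spare_add(pool, device):
    """Add a device to a pool as a hot spare."""
    return ["zpool", "add", pool, "spare", device]


def clone_promote(clone):
    """Promote a clone so it no longer depends on its origin snapshot."""
    return ["zfs", "promote", clone]


def recursive_snapshot(dataset, snapname):
    """Snapshot a file system and all of its descendants in one step."""
    return ["zfs", "snapshot", "-r", f"{dataset}@{snapname}"]


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    print(" ".join(hot_spare_add("tank", "c1t2d0")))
    print(" ".join(clone_promote("tank/test-clone")))
    print(" ".join(recursive_snapshot("tank/users", "nightly")))
```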

ZFS Bug Fixes/RFEs

  • 4034947 anon_swap_adjust() should call kmem_reap() if availrmem is low.
  • 6276916 support for "clone swap"
  • 6288488 du reports misleading size on RAID-Z
  • 6354408 libdiskmgt needs to handle sysevent failures in miniroot or failsafe environments better
  • 6366301 CREATE with owner_group attribute is not set correctly with NFSv4/ZFS
  • 6373978 want to take lots of snapshots quickly ('zfs snapshot -r')
  • 6385436 zfs set returns an error, but still sets property value
  • 6393490 libzfs should be a real library
  • 6397148 fbufs debug code should be removed from buf_hash_insert()
  • 6401400 zfs(1) usage output is excessively long
  • 6405330 swap on zvol isn't added during boot
  • 6405966 Hot Spare support in ZFS
  • 6409228 typo in aclutils.h
  • 6409302 passing a non-root vdev via zpool_create() panics system
  • 6415739 assertion failed: !(zio->io_flags & 0x00040)
  • 6416482 filebench oltp workload hangs in zfs
  • 6416759 ::dbufs does not find bonus buffers anymore
  • 6416794 zfs panics in dnode_reallocate during incremental zfs restore
  • 6417978 double parity RAID-Z a.k.a. RAID6
  • 6420204 root filesystem's delete queue is not running
  • 6421216 ufsrestore should use acl_set() for setting ACLs
  • 6424554 full block re-writes need not read data in
  • 6425111 detaching an offline device can result in import confusion
  • 6425740 assertion failed: new_state != old_state
  • 6430121 3-way deadlock involving tc_lock within zfs
  • 6433208 should not be able to offline/online a spare
  • 6433264 crash when adding spare: nvlist_lookup_string(cnv, "path", &path) == 0
  • 6433406 zfs_open() can leak memory on failure
  • 6433408 namespace_reload() can leak memory on allocation failure
  • 6433679 zpool_refresh_stats() has poor error semantics
  • 6433680 changelist_gather() ignores libuutil errors
  • 6433717 offline devices should not be marked persistently unavailable
  • 6435779 6433679 broke zpool import
  • 6436502 fsstat needs to support file systems greater than 2TB
  • 6436514 zfs share on /var/mail needs to be run explicitly after system boots
  • 6436524 importing a bogus pool config can panic system
  • 6436526 delete_queue thread reporting drained when it may not be true
  • 6436800 ztest failure: spa_vdev_attach() returns EBUSY instead of ENOTSUP
  • 6439102 assertion failed: dmu_buf_refcount(dd->dd_dbuf) == 2 in dsl_dir_destroy_check()
  • 6439370 assertion failures possible in dsl_dataset_destroy_sync()
  • 6440499 zil should avoid txg_wait_synced() and use dmu_sync() to issue parallel IOs when fsyncing
  • 6443585 zpool create of poolname > 250 and < 256 characters panics in debug printout
  • 6444346 zfs promote fails in zone
  • 6446569 deferred list is hooked on flintstone vitamins
  • 6447377 ZFS prefetch is inconsistent
  • 6447381 dnode_free_range() does not handle non-power-of-two blocksizes correctly
  • 6451860 'zfs rename' a filesystem|clone to its direct child will cause internal error
  • 6447452 re-creating zfs files can lead to failure to unmount
  • 6448371 'zfs promote' of a volume clone fails with EBUSY
  • 6448999 panic: used == ds->ds_phys->ds_unique_bytes
  • 6449033 PIT nightly fails due to the fix for 6436514
  • 6449078 Makefile for fsstat contains '-g' option
  • 6450292 unmount original file system, 'zfs promote' cause system panic.
  • 6451124 assertion failed: rc->rc_count >= number
  • 6451412 renaming snapshot with 'mv' makes unmounting snapshot impossible
  • 6452372 assertion failed: dnp->dn_nlevels == 1
  • 6452420 zfs_get_data() of page data panics when blocksize is less than pagesize
  • 6452923 really out of space panic even though ms_map.sm_space > 0
  • 6453304 s10u3_03 integration for 6405966 breaks on10-patch B3 feature build
  • 6458781 random spurious ENOSPC failures

January 26, 2007


Q: We use EMC SAN storage, which internally handles mirroring and RAID, and we are looking at using ZFS on these. Is there a best practice for using ZFS on SAN storage?

A: Ideally you should present simple striped LUNs to ZFS and let ZFS handle the redundancy so that it can spot any data errors on one path and recover from the other.
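
One way to picture that recommendation is a pool built from two-way mirrors, where each half of a mirror is a separate LUN. The Python sketch below only assembles the zpool create command line and does not run it. The function name and the device names are hypothetical placeholders, and pairing LUNs from different arrays or controllers is an assumption about how a site might lay this out, not something the answer prescribes.

```python
def mirrored_pool_argv(pool, lun_pairs):
    """Build `zpool create` for a pool of two-way mirrors, one per LUN pair.

    lun_pairs: iterable of (first_lun, second_lun) device names.
    """
    pairs = list(lun_pairs)
    if not pairs:
        raise ValueError("at least one LUN pair is required")
    argv = ["zpool", "create", pool]
    for first, second in pairs:
        # Each "mirror a b" group gives ZFS a second copy to repair from.
        argv += ["mirror", first, second]
    return argv


if __name__ == "__main__":
    # Placeholder device names; ideally each pair spans two arrays.
    pairs = [("c4t0d0", "c5t0d0"), ("c4t1d0", "c5t1d0")]
    print(" ".join(mirrored_pool_argv("tank", pairs)))
```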

If you cannot do this and the EMC storage provides the redundancy, ZFS will still check the data. However, since there is no second copy available to ZFS, a data integrity problem would be reported to the application as an error rather than repaired.

Currently such a configuration (with no redundancy at the ZFS level) should be avoided, because if a write is issued to the device and all the paths to the device are lost, the system will panic. Obviously, if you have multiple paths, which would be normal, the probability of this is small. Also, other file systems don't cope much better, if at all. Nonetheless, ZFS is being worked on so that it will be able to recover from such a failure more gracefully than with a panic.

February 02, 2007

