Patch Name: PHCO_23651

Patch Description: s700_800 11.00 fsck_vxfs(1M) cumulative patch

Creation Date: 01/03/21

Post Date: 01/03/21

Hardware Platforms - OS Releases:
	s700: 11.00
	s800: 11.00

Products: N/A

Filesets:
	JournalFS.VXFS-BASE-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP

Automatic Reboot?: No

Status: General Release

Critical:
	Yes
	PHCO_23651: CORRUPTION
		A significantly fragmented VxFS filesystem may
		be corrupted by fsck_vxfs(1M). Two consecutive
		invocations of fsck_vxfs(1M) can result in
		directory corruption.
	PHCO_22453: CORRUPTION OTHER
		fsck may incorrectly report corruption in an LCT
		entry that is correct. Attempting to fix it will
		cause that entry to become corrupt.

		fsck may fail to recover a filesystem after a
		system hang or panic and the filesystem will
		fail to mount.
	PHCO_20882: CORRUPTION
		When invoked by extendfs(1M) during the process
		of extending a VxFS file system, fsck(1M) may
		corrupt the file system.  Subsequent invocations
		of fsck(1M) will fail to repair the file system
		and the file system will fail to mount.
	PHCO_13411: OTHER
		This is a backward compatibility issue. Users of
		OmniStore (a DMAPI application) will experience
		data corruption.
	PHCO_13377: OTHER
		The vxfs fsck command will sometimes fail to
		recover the file system with the message "no
		valid ILISTS for fset 999".  After this, one
		is unable to mount the file system.

Category Tags:
	defect_repair general_release critical corruption

Path Name: /hp-ux_patches/s700_800/11.X/PHCO_23651

Symptoms:
	PHCO_23651:
	1. Data corruption can occur when fsck_vxfs(1M) is run on
	a significantly fragmented filesystem. During the fsck run
	a message similar to the following may be displayed:
	"fileset 999 iau 5 summary incorrect - fix? (ynq) y"
	However, instead of fixing the IAU summary, fsck may
	overwrite an arbitrary block. fsck then reports that
	everything has been fixed and allows the filesystem to be
	mounted. Unpredictable behavior may result from this
	error, depending on the data contained in the corrupted
	block.

	2. If fsck_vxfs(1M) is run twice in a row in log replay
	mode, it may corrupt directory entries. A full fsck is
	then required, and lost data must be restored manually
	from lost+found.

	PHCO_22453:
	1. fsck deletes user data stored in extended attribute
	inodes. For each file that has such attributes, the
	following message is displayed:
	fileset 999 primary inode ???? has
				invalid attributes clear? (ynq)
	By default these are cleared.

	2. fsck incorrectly detects LCT corruption:
	pass0 - checking structural files
	pass1 - checking inode sanity and blocks
	pass2 - checking directory linkage
	pass3 - checking reference counts
	Fileset 999 LCT entries are incorrect, fix? (ynq)n
	pass4 - checking resource maps
	OK to clear log? (ynq)n

	3. After a system hang or panic, fsck may fail
	with the following error:
	log replay in progress
	vxfs fsck: file system does not contain a valid log
	vxfs fsck: cannot perform log replay
	pass0 - checking structural files
	pass1 - checking inode sanity and blocks
	vxfs fsck: fileset 999 primary inode xxxx reorg extent
		   list overflow
	file system check failure, aborting ...

	PHCO_20882:
	fsck fails to repair a filesystem during an
	extendfs operation. This problem was introduced
	in PHCO_15037.

	fsck goes into an infinite loop, as follows:
	# fsck -F vxfs -o /dev/vg01/rlvol2
	replay in progress
	file system is not clean, full fsck required

	pass0 - checking structural files
	vxfs fsck: structural inode 97 (Primary Ilist 1) failed
	validation clear? (ynq)y
	pass1 - checking inode sanity and blocks
	pass2 - checking directory linkage
	pass3 - checking reference counts
	rebuild structural files? (ynq)y
	pass0 - checking structural files
	vxfs fsck: structural inode 97 (Primary Ilist 1) failed
	validation clear? (ynq)y
	Pass2 ...

	fsck -m fails to detect an insane filesystem when run
	through the fs_wrapper.

	fsck fails to repair the filesystem after conversion
	to vxfs by vxconvert.

	PHCO_17556:
	fsck fails with the following message:
	fileset 1 primary inode 65 has invalid size (2190737408)
	fileset 1 primary inode 97 has invalid size (2190737408)
	no valid ILISTs for fileset 999
	file system check failure, aborting ...

	PHCO_17009:
	fsck fails with the following message:
	log replay in progress
	pass0 - checking structural file
	pass1 - checking inode sanity and blocks
	pass2 - checking directory linkage
	pass3 - checking reference counts
	vxfs fsck: invalid LCT extent

	PHCO_15037:
	After a system crash, a full fsck was performed on
	filesystems even though fsck reported that a
	log replay was not required and the filesystem
	was clean.

	fsck incorrectly marks IFQUO, IFILIST, IFIAU and IFEMR
	inodes as bad if they are sparse (that is, if they have
	fewer blocks allocated to them than their size in
	blocks).

	The reorg pointer (rlp) is not properly incremented
	as it traverses the reorg list.

	If a resize operation is in progress when a system
	failure occurs then fsck cannot clean the filesystem.

	Assertions were added to the HOLD_BP() and RELE_BP()
	macros to verify that the hold count on the buffer is sane.

	Fsck was not correctly validating the inodes against
	the CUT value.

	An additional change was made to the PHCO_13377 fix
	for the failure of fsck to recover a file system with
	the message "no valid ILISTS for fset 999".

	PHCO_13411:
	OmniStore makes extensive use of extended
	attributes via the DMAPI interface.  Its use
	of the vx_attr_direct structure requires that
	the member ad_len be a 32-bit unsigned type.
	Its current 64-bit type has broken compatibility
	with the 10.20/10.10/10.01 releases.
	Currently, only OmniStore uses the vx_attr_direct
	structure.

	This patch defines a new structure
	vx_attr_direct2 instead of modifying vx_inode.h.

	This patch requires the installation of PHKL_13387.

	PHCO_13377:
	The vxfs fsck command may complain about invalid
	LCT entries.

	The vxfs fsck command fails to recover a file
	system resulting in the message "no valid ILISTS
	for fset 999".

Defect Description:
	PHCO_23651:
	1. fsck_vxfs(1M) does not handle odd-sized extents
	correctly. On a significantly fragmented filesystem,
	extent sizes can become odd, so the IAU summary block
	may not immediately follow the IAU header. However,
	fsck assumes that the two are consecutive blocks
	and overwrites the block immediately following
	the IAU header.

	2. In log replay mode, fsck_vxfs attempts to optimize
	the directory structure by moving entries towards the
	beginning of a list when there is room. If fsck_vxfs is
	interrupted and restarted in log replay mode, it is not
	aware of the moved entries and, following the intent
	log, writes to the old locations of the moved entries.

	Resolution:
	1. fsck_vxfs(1M) was modified to correctly handle
	extents in which the IAU summary does not
	immediately follow the IAU header.

	2. Directory optimization during log replay was
	removed from fsck_vxfs(1M).
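	The IAU problem in item 1 can be illustrated as a
	logical-to-physical block lookup. This is a minimal sketch,
	not the fsck_vxfs(1M) source: struct extent, map_block() and
	the block numbers are hypothetical, and it assumes the IAU
	summary is the logical block right after the IAU header. It
	shows why computing the summary location as "header + 1" is
	only safe when both blocks lie in the same extent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent descriptor: 'len' blocks starting at physical block 'start'. */
struct extent {
    long start;
    long len;
};

/* Translate a logical block number within a file into a physical block
 * number by walking the extent list. Returns -1 if the block is unmapped. */
long map_block(const struct extent *ext, size_t n, long lblk)
{
    for (size_t i = 0; i < n; i++) {
        if (lblk < ext[i].len)
            return ext[i].start + lblk;
        lblk -= ext[i].len;
    }
    return -1;
}

/* Faulty assumption: the summary is physically adjacent to the header
 * (logical block 0). */
long summary_block_naive(const struct extent *ext, size_t n)
{
    return map_block(ext, n, 0) + 1;
}

/* Safer approach: map the summary's own logical block (assumed to be 1). */
long summary_block_mapped(const struct extent *ext, size_t n)
{
    return map_block(ext, n, 1);
}
```

	With a fragmented extent list such as {1000, 1}, {5000, 7},
	the naive calculation yields block 1001, which may hold
	unrelated data, while the mapped lookup yields block 5000.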

	PHCO_22453:
	1. fsck deletes extended attributes containing user data
	(created by a third-party application). The problem is
	that a 32-bit value is passed to a 64-bit function
	without a proper cast.

	2. lct_check() calls iget() for each inode
	corresponding to an entry in the LCT buffer
	without holding the LCT buffer. Sometimes the
	LCT buffer is reused for inode data, after which
	fsck very quickly discovers "incorrect entries".
	If fsck is allowed to "fix" these entries, it
	actually corrupts the filesystem.

	3. The fsck failure "fileset 999 primary inode xxx
	reorg extent list overflow" was caused by a limit of
	1000 extents per reorg inode in fsck and by an
	incorrect algorithm for detecting inconsistencies
	between the reorg inode and the original inode. This
	failure may occur after a system crash or hang. Two
	such documented system failures were fixed in
	PHKL_21941 and PHKL_22393.
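	The patch text does not show the offending call, so the
	sketch below only illustrates one common way a 32-bit value
	is damaged when widened to 64 bits without an explicit cast:
	a signed 32-bit quantity with its high bit set is
	sign-extended. set_attr_len64() is a hypothetical stand-in
	for the 64-bit function, and whether this is the exact
	failure in fsck is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit consumer of an attribute length. */
uint64_t set_attr_len64(uint64_t len)
{
    return len;
}

/* No explicit cast: converting a negative int32_t to uint64_t
 * sign-extends, so the bit pattern 0x80000000 becomes 0xFFFFFFFF80000000. */
uint64_t widen_unsafe(int32_t len32)
{
    return set_attr_len64(len32);
}

/* Casting through uint32_t first zero-extends the original bit pattern. */
uint64_t widen_safe(int32_t len32)
{
    return set_attr_len64((uint64_t)(uint32_t)len32);
}
```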

	Resolution:
	1. The variable is now cast to 64 bits.
	2. A hold is now placed on the LCT buffer while it is in use.
	3. A new algorithm was implemented for reorg inode checking.
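	The need for resolution 2 can be sketched with a toy buffer
	cache. Everything here (struct buf, get_buf(), the two-slot
	cache and round-robin reuse) is hypothetical and much simpler
	than the real fsck buffer cache; the point is only that a
	buffer with a zero hold count may be recycled for other data,
	while a held buffer keeps its contents.

```c
#include <assert.h>
#include <string.h>

#define NBUF  2
#define BSIZE 16

/* Hypothetical buffer-cache entry: block number, hold count and data. */
struct buf {
    long blkno;
    int  hold;
    char data[BSIZE];
};

static struct buf cache[NBUF];
static int next_victim;            /* round-robin replacement cursor */

void cache_init(void)
{
    for (int i = 0; i < NBUF; i++) {
        cache[i].blkno = -1;
        cache[i].hold = 0;
        memset(cache[i].data, 0, BSIZE);
    }
    next_victim = 0;
}

/* Return the buffer for blkno; on a miss, recycle the next slot whose
 * hold count is zero. Returns NULL if every slot is held. */
struct buf *get_buf(long blkno, const char *contents)
{
    for (int i = 0; i < NBUF; i++)
        if (cache[i].blkno == blkno)
            return &cache[i];
    for (int tries = 0; tries < NBUF; tries++) {
        struct buf *bp = &cache[next_victim];
        next_victim = (next_victim + 1) % NBUF;
        if (bp->hold == 0) {
            bp->blkno = blkno;
            strncpy(bp->data, contents, BSIZE - 1);
            bp->data[BSIZE - 1] = '\0';
            return bp;
        }
    }
    return NULL;
}
```

	Without a hold, reading two more blocks recycles the LCT
	slot, so the old LCT pointer now sees inode data; with the
	hold count raised first, the LCT contents survive.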

	PHCO_20882:
	fsck fails to repair a filesystem during an extendfs
	operation.  The bug is the use of an incorrect variable
	name when switching the org type to TYPED. This
	problem was introduced in PHCO_15037.

	fsck fails to validate structural inodes, which sends
	it into an infinite loop. The file system remains
	unmountable.

	fsck -m fails to detect an insane file system when
	it is run through the fs_wrapper, which is the normal
	way of running fsck. The problem occurred because
	of an uninitialized data structure.

	After conversion from hfs to vxfs on a 128GB or larger
	system, fsck fails to fix the filesystem, leaving it
	unmountable. The problem is an incorrect buffer
	offset calculation.

	Resolution:
	The variable name was corrected so that fsck no longer
	fails during extendfs operations.

	Code was added to validate structural inodes
	and to fix corrupted ones.

	Code was added to initialize a data structure
	that was previously left uninitialized.

	A one-line change corrects the calculation of the
	offset into the buffer.

	PHCO_17556:
	fsck fails on a filesystem with more than 8 million
	inodes when the largefiles option is not set.
	For a filesystem to accommodate more than 8 million
	inodes, the structural ILIST file must be larger than
	2GB. The fsck failure occurs during validation of the
	inode referencing the structural ILIST file: the inode's
	size field is greater than 2GB, and because the largefiles
	option is not set, an error flag is set.

	Resolution:
	This fix requires both an fsck command patch and a kernel
	patch. An additional test has been added to the fsck inode
	validation routine to check whether the inode references a
	structural file before setting an error condition when the
	size field is greater than 2GB and the largefiles option
	is not set.
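	The 8 million inode threshold and the fix can be checked
	with a short sketch. It assumes a 256-byte on-disk inode,
	under which 8M (8 x 2^20) inodes occupy exactly 2^31 bytes
	(2GB), and the size 2190737408 reported in the PHCO_17556
	symptom would correspond to 8,557,568 inodes. size_invalid()
	is a hypothetical model of the validation test, not the fsck
	code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VX_INODE_SIZE 256ULL        /* assumed on-disk inode size in bytes */
#define SIZE_2GB      (1ULL << 31)  /* 2^31 bytes */

/* Bytes occupied by an inode list holding 'ninodes' inodes. */
uint64_t ilist_bytes(uint64_t ninodes)
{
    return ninodes * VX_INODE_SIZE;
}

/* Hypothetical model of the size check. Before the fix, any inode larger
 * than 2GB was flagged when largefiles was off; the fix exempts inodes
 * that reference structural files such as the ILIST. */
bool size_invalid(uint64_t size, bool largefiles, bool structural)
{
    if (largefiles || size <= SIZE_2GB)
        return false;
    return !structural;
}
```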

	This patch requires the installation of kernel patches
	PHKL_17869 and PHKL_14764.

	PHCO_17009:
	fsck incorrectly handled extents smaller than 8K.
	Version 3 filesystems allow variable-sized indirect
	extents, and an extent smaller than 8K can occur
	because of filesystem fragmentation.
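	A hedged sketch of a version-aware check follows. The
	function name, the rule that pre-version-3 layouts use a
	fixed 8K indirect extent, and the rejection of non-positive
	sizes are assumptions made for illustration; the text above
	only states that version 3 allows variable-sized indirect
	extents.

```c
#include <assert.h>
#include <stdbool.h>

#define FIXED_INDIR_EXT 8192L  /* assumed fixed indirect extent size before version 3 */

/* Hypothetical check on an indirect extent's size in bytes. */
bool indirect_extent_ok(int layout_version, long ext_bytes)
{
    if (ext_bytes <= 0)
        return false;                      /* empty or negative extents are never valid */
    if (layout_version >= 3)
        return true;                       /* version 3: variable-sized indirect extents */
    return ext_bytes == FIXED_INDIR_EXT;   /* older layouts: fixed size (assumption) */
}
```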

	PHCO_15037:
	After a system crash, a full fsck was performed on
	filesystems even though fsck reported that a
	log replay was not required and the filesystem
	was clean. This occurred on filesystems tagged
	as "dusty". The filesystems were correctly
	tagged; however, fsck incorrectly marked the
	filesystem for a full fsck.

	fsck incorrectly marks IFQUO, IFILIST, IFIAU and IFEMR
	inodes as bad if they are sparse (that is, if they have
	fewer blocks allocated to them than their size in
	blocks). These inode types, along with regular inodes
	and inodes with IMMED organization, are allowed to be
	sparse.
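	The sparse-inode rule above can be expressed as a small
	predicate. The enum values reuse the inode type names from
	the text, but the enums, ORG_EXT and the functions are
	hypothetical; IFDIR is included only as an example of a type
	that is not on the list of types allowed to be sparse.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode type and organization tags. */
enum itype { IFREG, IFDIR, IFQUO, IFILIST, IFIAU, IFEMR };
enum iorg  { ORG_EXT, ORG_IMMED };

/* Types and organizations that may legitimately be sparse. */
bool may_be_sparse(enum itype type, enum iorg org)
{
    if (org == ORG_IMMED)
        return true;
    switch (type) {
    case IFREG: case IFQUO: case IFILIST: case IFIAU: case IFEMR:
        return true;
    default:
        return false;
    }
}

/* Flag an inode only if it has fewer blocks than its size in blocks
 * and is not one of the types allowed to be sparse. */
bool sparse_is_error(enum itype type, enum iorg org,
                     long blocks_allocated, long size_in_blocks)
{
    return blocks_allocated < size_in_blocks && !may_be_sparse(type, org);
}
```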

	The reorg pointer (rlp) was not properly incremented
	as it traversed the reorg list.

	If a resize operation is in progress when a system
	failure occurs then fsck cannot clean the filesystem.
	Fsck will now complete (or undo for pre-v3 filesystems)
	a pending resize operation when run with the -y command
	line option.

	Assertions were added to the HOLD_BP() and RELE_BP()
	macros to verify that the hold count on the buffer is sane.
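	A minimal sketch of what such assertions might look like,
	assuming a buffer header with an integer hold count. The
	field name b_hold and these macro bodies are hypothetical;
	only the macro names come from the text.

```c
#include <assert.h>

/* Hypothetical buffer header; only the hold count matters here. */
struct buf {
    int b_hold;
};

/* Take a hold; the count must never be negative on entry. */
#define HOLD_BP(bp)                    \
    do {                               \
        assert((bp)->b_hold >= 0);     \
        (bp)->b_hold++;                \
    } while (0)

/* Drop a hold; releasing a buffer that is not held is a bug. */
#define RELE_BP(bp)                    \
    do {                               \
        assert((bp)->b_hold > 0);      \
        (bp)->b_hold--;                \
    } while (0)
```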

	Fsck was not correctly validating the inodes against
	the CUT value.

	An additional change was made to the PHCO_13377 fix
	for the failure of fsck to recover a file system with
	the message "no valid ILISTS for fset 999".

	PHCO_13411:
	The vx_attr_direct structure is defined in the
	vx_inode.h file. This structure is used by
	OmniStore. During one of numerous integration
	cycles with DFS, we redefined the ad_len member
	of the vx_attr_direct structure as the
	DFS-compatible data type vxhyper_t
	(a 64-bit value). Unfortunately, the disk layout
	impact was overlooked. OmniStore makes extensive
	use of extended attributes via DMAPI, and most of
	those attributes are direct. The vx_attr_direct
	structure should keep ad_len as a 32-bit
	unsigned type to maintain backward
	compatibility with the 10.20/10.10/10.01
	releases.
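	The on-disk impact can be seen by comparing two layouts.
	This is a hypothetical sketch: only ad_len and vxhyper_t come
	from the text, vxhyper_t is assumed here to be an unsigned
	64-bit integer, and ad_data stands in for whatever follows
	ad_len. Widening ad_len moves every later field, so data
	written by the 10.x releases is no longer read at the
	expected offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t vxhyper_t;   /* hypothetical stand-in: assumed unsigned 64-bit */

/* Layout with a 32-bit unsigned ad_len, as in the 10.x releases. */
struct attr_direct_v32 {
    uint32_t ad_len;
    char     ad_data[4];      /* illustrative field following ad_len */
};

/* Layout after ad_len was redefined as vxhyper_t. */
struct attr_direct_v64 {
    vxhyper_t ad_len;
    char      ad_data[4];     /* same illustrative field, now shifted */
};
```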

	PHCO_13377:
	The vxfs fsck command may complain about invalid
	LCT entries.  It complained because the disk had LCT
	counts of 0 with the lcd_free bit set, while the incore
	copy of the table also had a count of 0 but without the
	free bit set.  Since fsck never sets the free bit
	incore, if this is the only difference, the disk is
	correct.  For every fileset, fsck calls lct_check, which
	checks the LCT for all filesets.  Either lct_check should
	process only one fileset, or fsck should call it only once.
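	The comparison rule described above can be sketched as
	follows. struct lct_entry and lct_entry_matches() are
	hypothetical; the sketch only encodes the rule that a zero
	count with lcd_free set on disk matches a zero count without
	the free bit incore.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical LCT entry: link count plus the lcd_free flag. */
struct lct_entry {
    unsigned count;
    bool     lcd_free;
};

/* Compare the on-disk entry with fsck's incore copy. fsck never sets the
 * free bit incore, so a zero count with the free bit set on disk and a
 * zero count without it incore are treated as matching. */
bool lct_entry_matches(struct lct_entry disk, struct lct_entry incore)
{
    if (disk.count != incore.count)
        return false;
    if (disk.count == 0 && disk.lcd_free && !incore.lcd_free)
        return true;
    return disk.lcd_free == incore.lcd_free;
}
```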

	The vxfs fsck command fails to recover a file system
	resulting in the message "no valid ILISTS for fset 999".
	The ilist for fset 999 has gone double indirect.  While
	adding the fileset 999, the primary and replica inodes
	for the fileset ilist are read in buffers in the buffer
	cache and then compared.  Because the ilist is double
	indirect, the extent maps of both inodes are compared by
	calling a function that uses the inodes in the buffer
	cache.  One of the calls to bread() within that function
	does not pass a buffer to read the data into;
	consequently, bread() reads the data into random memory,
	causing corruption.  The fix is to pass the missing
	buffer argument to bread().

SR:
	8606147578 5003459271 5003460253 8606125754 1653291369
	1653276618 4701376574

Patch Files:
	
	JournalFS.VXFS-BASE-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,
		v=HP:
	/sbin/fs/vxfs/fsck

what(1) Output:
	
	JournalFS.VXFS-BASE-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,
		v=HP:
	/sbin/fs/vxfs/fsck:
		$ PATCH/11.00:PHCO_19491  Aug  9 1999 09:49:32 $
		PATCH_11_00: aggr.o attr.o bmap.o dir.o extent.o
			extern.o fset.o inode.o links.o lwrite.o
			machdep.o main.o olt.o readi.o replay.o subr.o
			subreplay.o super.o 01/03/21

cksum(1) Output:
	
	JournalFS.VXFS-BASE-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,
		v=HP:
	4242980044 491520 /sbin/fs/vxfs/fsck

Patch Conflicts: None

Patch Dependencies:
	s700: 11.00: PHKL_17869
	s800: 11.00: PHKL_17869

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
	PHCO_13377 PHCO_13411 PHCO_15037 PHCO_17009 PHCO_17556 PHCO_20882
	PHCO_22453

Equivalent Patches: None

Patch Package Size: 510 KBytes

Installation Instructions:
	Please review all instructions and the Hewlett-Packard
	SupportLine User Guide or your Hewlett-Packard support terms
	and conditions for precautions, scope of license,
	restrictions, and limitation of liability and warranties,
	before installing this patch.
	------------------------------------------------------------
	1. Back up your system before installing a patch.

	2. Login as root.

	3. Copy the patch to the /tmp directory.

	4. Move to the /tmp directory and unshar the patch:

		cd /tmp
		sh PHCO_23651

	5. Run swinstall to install the patch:

		swinstall -x autoreboot=true -x patch_match_target=true \
			  -s /tmp/PHCO_23651.depot

	By default swinstall will archive the original software in 
	/var/adm/sw/save/PHCO_23651.  If you do not wish to retain a
	copy of the original software, use the patch_save_files option:

		swinstall -x autoreboot=true -x patch_match_target=true \
			  -x patch_save_files=false -s /tmp/PHCO_23651.depot

	WARNING: If patch_save_files is false when a patch is installed,
		 the patch cannot be deinstalled.  Please be careful
		 when using this feature.

	For future reference, the contents of the PHCO_23651.text file are
	available in the product readme:

		swlist -l product -a readme -d @ /tmp/PHCO_23651.depot

	To put this patch on a magnetic tape and install from the
	tape drive, use the command:

		dd if=/tmp/PHCO_23651.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions: None

