VCSA file-based backup troubleshooting
In this short blog post I describe how to troubleshoot a VCSA backup issue, with the aim of understanding the inner workings of the VCSA BackupManager.
While KB316609 describes the most common VCSA backup issues, my customer encountered one that is not on that list. Instead, they observed the following three behaviors:
No update on Data…
No progress on Status (resulting in error)
And third, no updates whatsoever in /var/log/vmware/applmgmt/backup.log
Knowing that the VCSA backup relies on several scripts, listed below, I decided to use a ps one-liner to find out which script or process was failing.
../backup_restore/py/vmware/appliance/backup_restore/BackupManager.py
../backup_restore/py/vmware/appliance/backup_restore/plugins/../util/Calculate.py
../backup_restore/scripts/local_storage_io.py
../backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py
../backup_restore/py/vmware/appliance/backup_restore/util/Proc.py
../backup_restore/py/vmware/appliance/backup_restore/util/Net.py
Now, on a working VCSA, we can monitor the execution of these scripts with the following one-liner:
# watch -d 'ps -ef | tail -25'
As soon as we open the Backup Now dialog, we notice the following processes popping up:
/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py --size seat
/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py --size common
/etc/vmware/backup/component-scripts/imagebuilder/backup_restore.py --size
/opt/vmware/vpostgres/current/bin/pg_dump -U postgres -d VCDB -p 5432 -F custom --compress 0 --schema-only
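Watching the full process list by eye works, but the relevant entries scroll by quickly. As a minimal sketch, assuming Python 3.7 or later is available where it runs, the same `ps -ef` output can be reduced to the backup-related entries. The function names and the keyword list are illustrative assumptions derived from the process names listed above; they are not part of the VCSA tooling.

```python
import subprocess

# Assumed keywords, derived from the process names listed above.
BACKUP_KEYWORDS = ("backup_restore", "pg_dump")


def filter_backup_processes(ps_output, keywords=BACKUP_KEYWORDS):
    """Return the lines of `ps -ef` output that mention any keyword."""
    return [
        line
        for line in ps_output.splitlines()
        if any(keyword in line for keyword in keywords)
    ]


def snapshot_backup_processes():
    """Run `ps -ef` once and return only the backup-related lines."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return filter_backup_processes(result.stdout)
```

Calling `snapshot_backup_processes()` repeatedly while the Backup Now dialog opens should give roughly the same view as the `watch` one-liner, without the unrelated processes.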
It appears that the values for the Data fields in the dialog come from the first two processes, the BackupManager.py --size calls. This is the moment to run them directly on the command line and find the reason the backup fails:
# /usr/bin/python /usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py --size seat
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 185
netAddrFamily = 'ipv4'
^
IndentationError: unexpected indent
This IndentationError is the root cause of the backup failure. When we discussed it with the customer, they explained that they had put an internal workaround [iKB381922] in place to mitigate a PNID resolution issue. After double-checking the code in BackupManager.py we found the indentation error, corrected it, and saved the fixed version. Running the same commands now produces the expected output (130/248):
# /usr/bin/python /usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py --size seat
130
# /usr/bin/python /usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py --size common
248
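An indentation error like the one found here can also be detected without running the script, by asking Python to compile the source only. The following is a minimal sketch, assuming Python 3; `check_syntax` and `check_file` are hypothetical helper names, and any source passed in is compiled but never executed.

```python
def check_syntax(source, filename="<string>"):
    """Compile source without executing it.

    Returns None if the source compiles, otherwise a short description
    of the first syntax problem found.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        # IndentationError is a subclass of SyntaxError, so it lands here too.
        return "{}: {} (line {})".format(
            type(err).__name__, err.msg, err.lineno
        )
    return None


def check_file(path):
    """Read a Python file from disk and run check_syntax on its contents."""
    with open(path, "r") as handle:
        return check_syntax(handle.read(), filename=path)
```

The standard library offers a similar check from the shell with `python -m py_compile <file>`, which can be worth running after hand-editing any of the backup scripts.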
The Backup Now dialog also shows the progress icon again, followed by the same numbers (130/248).
Subsequent backups will now succeed. Enjoy!