CPT: ignore deleted linked chr blk fifo nodes
Ignore unlinked but referenced pipes, character and block device nodes.
The restore process will create them itself.
Bug #455855
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
CPT: Dump fake hardlinks on inotify watch's inodes
When a watch is attached to an unlinked and closed file, it
will not be restored, since the inode will not be in the image.
To fix this, the proposal is to create a fake link to the
inode in a temp dir and dump it.
Bug #454944
CPT: Add ioctl CPT_HARDLNK_ON for rst
vzctl has to call ioctl CPT_HARDLNK_ON to enable opening of hardlinked
files by the kernel during restore.
This protection is needed to prevent mixing a new kernel with an old vzctl
(which doesn't do the cleanup). In other words, it prevents creating/opening
files which will never be removed, which could otherwise lead to a
security problem.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
CPT: Create hard links to "deleted but referenced" during checkpoint
For "deleted but referenced" files, the kernel creates a hard link in
the directory that was set via CPT_LINKDIR_ADD, named in the format:
.cpt_hardlink.xxxxxxxx
where each x is a digit from 0 to 9.
Note - this policy is used only when no other way of dumping the unlinked
file helped.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
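A minimal userspace sketch of the naming format described in the entry above. The helper name cpt_hardlink_name() and the use of a sequence number to fill the eight digits are assumptions for illustration; the commit message does not say how the kernel picks the digits.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): build
 * "<dir>/.cpt_hardlink.NNNNNNNN", where N are eight decimal digits.
 * Returns 0 on success, -1 if seq needs more than eight digits or
 * the result does not fit into buf.
 */
int cpt_hardlink_name(char *buf, size_t size, const char *dir, unsigned int seq)
{
	int n;

	if (seq > 99999999u)
		return -1;
	n = snprintf(buf, size, "%s/.cpt_hardlink.%08u", dir, seq);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

The directory argument stands in for whatever was passed via CPT_LINKDIR_ADD.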
CPT: Add ioctl CPT_LINKDIR_ADD for cpt
vzctl has to call ioctl CPT_LINKDIR_ADD to tell the kernel where to
create hardlinked files during checkpoint. Without this ioctl the
kernel assumes that creating hardlinked files is off.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
CPT: restart local_kernel_thread in case of -ERESTARTNOINTR
This is essential in case of migration to an SLM node.
We can bump into a situation where SLM refuses to fork during the
undumping process because it thinks that the subgroup's resources
are to be redistributed. When this happens, fork is delayed with
the -ERESTARTNOINTR error and the undumping process fails.
As Den (den@) noticed, userspace is not intended to see the
-ERESTARTNOINTR error so we should h...
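The restart named in the subject line can be sketched in userspace as a bounded retry loop. Everything here is an assumption for illustration: the wrapper start_thread_with_restart(), the retry bound, and the stand-in flaky_create() are hypothetical, and ERESTARTNOINTR is defined locally because userspace headers do not export it.

```c
/* Kernel-internal restart code; defined locally for this sketch. */
#define ERESTARTNOINTR 513

typedef int (*thread_fn_t)(void *arg);

/*
 * Hypothetical wrapper: call create() again while it reports
 * -ERESTARTNOINTR, at most max_tries times. Returns the last result.
 */
int start_thread_with_restart(thread_fn_t create, void *arg, int max_tries)
{
	int ret = -ERESTARTNOINTR;

	while (max_tries-- > 0) {
		ret = create(arg);
		if (ret != -ERESTARTNOINTR)
			break;
	}
	return ret;
}

/* Stand-in for a fork that is refused *arg times before succeeding. */
int flaky_create(void *arg)
{
	int *refusals_left = arg;

	if (*refusals_left > 0) {
		(*refusals_left)--;
		return -ERESTARTNOINTR;
	}
	return 0;
}
```

With three pending refusals and a bound of ten, the wrapper returns 0 on the fourth call; with a bound lower than the number of refusals, the restart code is passed back to the caller.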
CPT: save/restore only classic task flags
Task flags were restored as they were saved in the image. That is not correct, as
the flags differ between the 2.6.9, 2.6.16 and 2.6.18 kernels.
Actually we just need to save/restore only the classic flags (PF_EXITING, PF_DEAD,
PF_FORKNOEXEC, PF_SUPERPRIV, PF_DUMPCORE and PF_SIGNALED).
Problems can occur because during migration from a 2.6.9 to a 2.6.18 kernel the
flag PF_USED_MATH was not restored on tsk->flags ...
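A minimal sketch of masking task flags down to the classic set on save and merging them back on restore. The numeric values are illustrative assumptions (check include/linux/sched.h of the target tree), and the names CPT_CLASSIC_FLAGS, cpt_filter_flags() and rst_merge_flags() are hypothetical.

```c
/* Illustrative flag values; assumed, not taken from the patch. */
#define PF_EXITING    0x00000004UL
#define PF_DEAD       0x00000008UL
#define PF_FORKNOEXEC 0x00000040UL
#define PF_SUPERPRIV  0x00000100UL
#define PF_DUMPCORE   0x00000200UL
#define PF_SIGNALED   0x00000400UL
#define PF_USED_MATH  0x00002000UL	/* example of a non-classic flag */

/* Hypothetical mask of the flags listed in the commit message. */
#define CPT_CLASSIC_FLAGS (PF_EXITING | PF_DEAD | PF_FORKNOEXEC | \
			   PF_SUPERPRIV | PF_DUMPCORE | PF_SIGNALED)

/* Dump side: keep only the classic flags in the image. */
unsigned long cpt_filter_flags(unsigned long flags)
{
	return flags & CPT_CLASSIC_FLAGS;
}

/*
 * Restore side: take classic flags from the image and leave every
 * other bit of the live task untouched.
 */
unsigned long rst_merge_flags(unsigned long live, unsigned long saved)
{
	return (live & ~CPT_CLASSIC_FLAGS) | (saved & CPT_CLASSIC_FLAGS);
}
```

Bits outside the mask in the saved value are ignored, so a flag whose number means something else on the destination kernel cannot leak into tsk->flags.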
CPT: udp sockets restore fix
Some applications (like ntpd) set sk_reuse to 1 on udp sockets, so any other
application can bind to the same port. During restore we must skip this
check and restore and bind all sockets. On IPv6 we must also force the DAD
(Duplicate Address Detection) procedure to be sure that the IFA_F_TENTATIVE flag
is cleared on the IPv6 address and the socket can be bound to it.
http://bugzilla.openvz.org/show_bu...
CPT: screw up udev bindmounts knot
Ubuntu's udev on boot does:
if ! mountpoint -q /dev; then
    # initramfs didn't mount /dev, so we'll need to do that
    mount -n --bind /dev /etc/udev
    mount -n -t tmpfs -o mode=0755 udev /dev
    mkdir -m 0700 -p /dev/.static/dev
    mount -n --move /etc/udev /dev/.static/dev
fi
So, workaround is dumping "/dev" as bindmount's sourc...
CPT: restore dead tasks proc files
If some process opens /proc/<pid>/<somefile> and the process with <pid> dies
some time later, then checkpoint fails with the error:
Can not dump VE: Invalid argument
Error: d_path cannot be looked up /proc/125/cmdline
The fix is to catch this situation at dump time, mark the image accordingly
and restore a fake file on restore.
http://bugzilla.openvz.org/show_bug.cgi?id=1047
Sig...
CPT: adjust vfsmounts restore order
The idea is to dump a parent before its children.
This order is needed when checkpointing/restoring a setup like:
mount /A /B -o bind
mount none /C -t tmpfs
mkdir /C/D
mount /B /C/D --move
After this, checkpoint (w/o this patch) will dump vfsmounts in order:
- vfsmount, bind to /A, mounted to /C/D
- vfsmount, mounted to /C (tmpfs)
and will restore in the same order, that cause...
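The "parent before children" idea can be sketched as a small ordering routine. This is a userspace illustration under assumptions: struct mnt_rec and order_parents_first() are hypothetical names, and mounts are identified by array index rather than by real vfsmount pointers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: one mount and the index of its parent mount. */
struct mnt_rec {
	const char *mountpoint;
	int parent;	/* index into the same array, negative for a root */
};

/*
 * Fill order[] with indices of m[] so that every parent precedes its
 * children. Returns 0 on success, -1 on allocation failure, a parent
 * index out of range, or a cycle.
 */
int order_parents_first(const struct mnt_rec *m, size_t n, size_t *order)
{
	char *emitted = calloc(n ? n : 1, 1);
	size_t count = 0;
	int progress = 1;

	if (!emitted)
		return -1;
	while (count < n && progress) {
		progress = 0;
		for (size_t i = 0; i < n; i++) {
			int p = m[i].parent;

			if (emitted[i])
				continue;
			/* Wait until the parent has been emitted. */
			if (p >= 0 && ((size_t)p >= n || !emitted[p]))
				continue;
			emitted[i] = 1;
			order[count++] = i;
			progress = 1;
		}
	}
	free(emitted);
	return count == n ? 0 : -1;
}
```

For the example above, with records given in the problematic order (root, the bind mount at /C/D, the tmpfs at /C), the routine emits the tmpfs at /C before the mount at /C/D.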
CPT: dont cpt requiresdev fs
Don't allow checkpointing a VE with mounted ext2/ext3, etc. filesystems.
Allow checkpoint only for mounted nodev and "external" filesystems.
This check protects from the following error on restore:
CPT ERR: ffff810007113000,102 :-2 mounting /root/some_dir ext3 40000000
as do_one_mount() doesn't pass mntdev to mount().
[xemul: actually, the reason we don't support filesystems other than
virtual and tmpfs ...
CPT: Restore information about tcp listening sockets
Not all options are important. Only a missing ipv6only can cause an
error if another application wants to listen on the same port for the IPv4 any address.
tp->XXX are inherited by children (noticed by Alexey Kuznetsov), so we also need
to restore these options.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Comment from Alexey:
It [everything before] was not OK. The feature which are broken are important...
CPT: put 'expect' after insert to the 'conntrack'
When restoring a conntrack, we need to put the expect after allocating
ip_conntrack_expect and doing something with it. The expect will then be
freed either immediately (if nobody else holds it) or during cleanup/timer
hooks. Otherwise the expect will never be freed.
Note: Approaches for kernels 2.6.18 and 2.6.9 are different. For example
see help() in "net/ipv4/netfilter/ip_conntrack_netbios_ns.c"
Signed-off-by: Vi...
CPT: Fix ip_conntrack_ftp usage counter leak
Function ip_conntrack_helper_find_get() takes a reference on the module
(increments its usage counter). So put the conntrack after inserting it
into the hash and handling the conntrack's expect list.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
CPT: dump and restore global snmp statistics
Per-device statistics exist for ipv6 only and are probably not used now, but
anyway - I'll do it later.
This patch adds new section CPT_SECT_SNMP_STATS that is populated
with CPT_OBJ_BITS set of objects - one for each type of statistics.
Objects have variable length. Stats are stored as a plain array of
__u32 numbers and thus the order in which stats types are stored is
implicitly hard-coded.
In case we...
CPT: Fix memory corruption if cpt_family is wrong.
During restore, if the parent socket is AF_INET but cpt_family is
wrong (not initialized, see bug #95113), then treating the request as
AF_INET6 is not right and leads to memory corruption.
As there are a lot of buggy images, we can't just check for the values
AF_INET and AF_INET6.
Decision:
- Check the request for AF_INET6 first, and consider the
request as AF_INET by default.
- Additionally c...
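The first point of the decision can be sketched as below. The function name rst_request_family() is hypothetical; only the "AF_INET6 on exact match, AF_INET otherwise" rule comes from the text, and the truncated additional check is not covered.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: trust the saved family only when it is exactly
 * AF_INET6. Any other value, including uninitialized garbage from a
 * buggy image, is treated as AF_INET.
 */
int rst_request_family(int cpt_family)
{
	return cpt_family == AF_INET6 ? AF_INET6 : AF_INET;
}
```

Defaulting to AF_INET means a garbage value can never select the larger IPv6 request layout by accident.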
CPT: fix restoring of /dev/null opened early by init
The problem is the following:
* init from fc9 starts and opens /dev/null for its stdin, stdout
and stderr
* udev starts and overmounts /dev with tmpfs
After this cpt cannot dump this ve, since one process holds a file,
that is inaccessible from ve root.
The proposed solution is the following:
1. allow for /dev/null to be over-mounted
2. restore init's file in two stages:
stage1: *before*...
CPT: lock sock before restoring its synwait queue
This new socket already has all the necessary TCP timers armed,
so tcp_keepalive_timer can fire during rst_restore_synwait_queue
and (since the latter is lockless) can corrupt the queue.
Bug #118912
CPT: add check for presence of module slm_dmprst if SLM is enabled
Add a check in "checks" for presence of module slm_dmprst if SLM is enabled.
The check will be performed for both source and destination nodes. Changes in
vzmigrate are not needed.
Bug #114312
CPT: add diagnostics in case of iptables-restore fail
It is not clear right now what is wrong if iptables-restore fails.
Add some diagnostics in case of error.
Bug #95952
CPT: fix check in decode_tuple()
The tuple structure can be used as a mask, and protonum can be 0xffff in the 2.6.9
kernel. In the 2.6.18 kernel all masks for protonum are 0xff, and 0xffff will
be shrunk to 0xff.
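A minimal sketch of such a check, assuming the decoder knows whether it is handling a mask. The function decode_protonum() and its is_mask parameter are hypothetical; the real decode_tuple() signature is not shown in this log.

```c
#include <stdint.h>

/*
 * Hypothetical check: accept an 8-bit protocol number as is, and
 * accept the 16-bit all-ones value only for masks, shrinking it to
 * 0xff. Returns 0 on success, -1 for any other out-of-range value.
 */
int decode_protonum(uint16_t saved, int is_mask, uint8_t *out)
{
	if (saved == 0xffff && is_mask) {
		*out = 0xff;
		return 0;
	}
	if (saved > 0xff)
		return -1;
	*out = (uint8_t)saved;
	return 0;
}
```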
CPT: fix restore of conntrack expect timer
One more fix of the conntrack restore procedure.
The following code:
	if (ct->helper->timeout && !del_timer(&exp->timeout)) {
		...
	}
can lead to an oops, as exp->timeout is not initialized at this point.
Actually this optimization is not needed at all.
If the expectation is dying, then we will let it die on its own.
Also in ip_conntrack_expect_insert() there is an initialization of
exp->timeout. And ...
CPT: convert conntrack tuple from 2.6.9 kernel image
Add conversion for the conntrack tuple from a 2.6.9 kernel image.
A check for a correct value is added in decode_tuple().
CPT: convert conntrack image from 2.6.9 to 2.6.18
The CPT structure for conntracks in the image file is different in the 2.6.9 and 2.6.18
kernels (array cpt_help_data was enlarged in the middle of the structure), so
conntracks from a 2.6.9 kernel are restored incorrectly on a 2.6.18 kernel and
lead to a kernel oops.
A simple conversion from 2.6.9 to 2.6.18 is introduced to restore conntracks
correctly on a 2.6.18 kernel.
Bug #113290
CPT: create kernel threads in VE0 context
In the current implementation the master process which performs checkpointing has
owner_env set to VE0 and exec_env set to VE. All auxiliary kernel threads
are created with exec_env set to VE and owner_env set to VE0, so after
do_fork_pid() we have the following:
* new thread has owner_env == ve0, exec_env == ve
* its pid belongs to ve (pid->veid != 0)
That is why if ve_enter() in thread fails, ...
CPT: restore rlimits correctly during 32bit-64bit migration
During 32bit to 64bit migration rlimits were restored incorrectly due to
the different size of long on 32bit and 64bit archs. Now a simple conversion is
introduced for 32bit-64bit migration. Infinity values are restored as
infinity values. An error is returned if a value greater than RLIM_INFINITY32 is
found in the dump during restore on a 32bit arch.
Bug #111965
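The conversion rules above can be sketched in userspace as follows. The helper names rlim_32_to_64() and rlim_64_to_32() are hypothetical, and the infinity constants are assumed to be all-ones values of the respective width.

```c
#include <stdint.h>

/* Assumed infinity markers: all ones in 32 and 64 bits. */
#define RLIM_INFINITY32 0xffffffffULL
#define RLIM_INFINITY64 0xffffffffffffffffULL

/* Image from a 32bit node restored on a 64bit node. */
uint64_t rlim_32_to_64(uint32_t v)
{
	return v == RLIM_INFINITY32 ? RLIM_INFINITY64 : v;
}

/*
 * Image from a 64bit node restored on a 32bit node.
 * Returns 0 on success, -1 if a finite value is greater than
 * RLIM_INFINITY32. A value equal to RLIM_INFINITY32 passes and
 * therefore reads as infinity on the 32bit side.
 */
int rlim_64_to_32(uint64_t v, uint32_t *out)
{
	if (v == RLIM_INFINITY64) {
		*out = (uint32_t)RLIM_INFINITY32;
		return 0;
	}
	if (v > RLIM_INFINITY32)
		return -1;
	*out = (uint32_t)v;
	return 0;
}
```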
CPT: restore packet control block from kernels with and without IPv6
A more generic mechanism for restoring packet control blocks. Unfortunately we
do not save the length of the control block in the dump and can only try to calculate
it during restore. This method is based on the knowledge that the flags value in
the TCP control block is not zero for all packets in the queue.
Starting with this image version, the TCP control block will be saved in IPv6 form
regardless of the IPv6 config option.
Restor...
CPT: add binfmt_misc fs in supported list
Just add binfmt_misc to the list of supported file systems. With this small
quick fix migration will be allowed, but all binfmt_misc entries will
be dropped during migration.
This fix is only a temporary measure. Later a generic mechanism for
checkpointing/restore of external modules will be implemented, and this quick
fix will be replaced with full support for binfmt_misc in CPT.
Bugs #100709, #101061
CPT: relax check for several bind mounts on the same mount point
Relax the check for special bind mounts which are mounted several times on the same
mount point. We need to check only the dentry; the mount check can be skipped in this
case.
We can't remove the mount check completely, as there exist cases when we need
to check mnt too. E.g. /dev is mounted with NODEV over /dev and some file is
opened from the underlying mount. If the mount check is removed, then we will be able
to ch...
CPT: fix reopen dentries procedure
Dentries were not reopened correctly during checkpointing and restore.
Two bugs fixed:
1. In case of huge files (more than 2 GB) dentry_open() returns -EFBIG if the
O_LARGEFILE flag is not set. This flag should be used for temporary files
used during the checkpointing and restore process.
Bug #99544
https://bugzilla.sw.ru/show_bug.cgi?id=99544
2. In dump_content_regular() we have following co...
CPT: fix save/restore of open requests
Open requests were sometimes saved and restored incorrectly:
1. The family of the open request was not saved (commented out).
2. Restore was broken and would crash because rsk_ops was cleared by memset.
3. And finally, all the code restoring open requests was skipped.
Tested with http_load.
Bug #95113
http://bugzilla.openvz.org/show_bug.cgi?id=784
cpt: add lost dcache_lock protection around __d_path()
Protect the __d_path() call with the dcache_lock spinlock.
Protect other checks with env->op_sem semaphore.
Bug #98833
cpt: fix restore of inotify on symlink
Inside a VE the file /etc/mtab is a symlink to /proc/mounts.
FreeNX server with KDE creates an inotify watch on the /etc/mtab file.
To restore such an inotify watch we need to obtain the dentry with path_lookup() and
restore inotify on it.
Bug #96464
ve: Don't check for CAP_SETVEID - use more ... imagination
This patch:
The proposed check correctly detects the root in ve0.
However, we lose the ability to create containers with
some fancy tool, that has the CAP_SETVEID capability
*only*, but we don't have such.
The cap itself is declared obsolete, but there's
no need to rewrite vzctl in a rush - things will still
work. If we want to manipulate audit caps from the
...
fairsched: Sanitize fairsched manipulations on ve startup
First of all, we won't be able to call them after we fix the
capability checks. Second, taking the fairsched
mutex 4 times on startup is overkill.
ve-net: permit changing of netdev's tx_queue_len from inside a CT
In particular it makes OpenVPN happy.
Bug #457318
Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>