Pavel Tikhomirov authored and Konstantin Khorenko committed 1a31e40d3ee
ve: use host ipt_mask for pseudosuper in setsockopt and request module

We have a quite tricky design problem:

1) CRIU wants to run both the CT init task and its parent CRIU task in
   the VEX cgroup, so that most restore steps obey VEX rules;
2) The VEX cgroup allows only _one_ process to join at the CT
   preparation step;
3) (1) + (2) -> We clone the CT init task while already in VEX;
4) Mounts in CRIU are restored in a temporary mntns, and later they are
   copied to their CT mount namespaces;
5) When a mount is copied to another mntns, the owner userns must be
   the same for the src and dst mntnses, else the mounts become locked;
6) (4) + (5) -> The temporary mntns should be owned by the CT userns;
7) (6) -> CRIU clones the CT init task with CLONE_NEWUSER|CLONE_NEWNS,
   creating both the CT userns and the temporary mntns;
8) If a mount is created in VEX, we mark it as owned by VEX; we use
   these marks in kernel security checks;
9) (3) + (7) + (8) -> All mounts in the temporary mntns are marked as
   owned by VEX;
10) CRIU wants to set up iptables rules to block the network;
11) A VZ CT can have netfilter=disabled, in which case iptables
    configuration is prohibited in the CT (setsockopt and
    request_module);
12) (10) + (11) -> On restore we try to restore these blocking rules,
    so we need to temporarily enter VE0 for it;
13) (9) + (12) -> We exec the iptables-restore binary from VE0 on a
    mount marked as VEX, and fail the kernel security check.

There are a lot of places where we could change the design and fix the
problem, but I chose step (12), as for now it looks like the simplest
way to do it.

After the patch, CRIU will no longer be required to enter VE0 for the
iptables-restore operation.

I will also prepare a corresponding patch for CRIU.

https://jira.sw.ru/browse/PSBM-98702

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
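
The idea behind changing step (12) can be sketched as a small user-space C model. This is only an illustration under stated assumptions, not the patch itself: `struct ve_sketch`, its fields, the `IPT_SKETCH_*` bits, and both helper functions are hypothetical names invented here, and "pseudosuper" is assumed to mean a restore-time state in which the CT is treated with host-level permissions. The real kernel structures and check sites may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a container (VE) descriptor; not the real kernel struct. */
struct ve_sketch {
    uint64_t ipt_mask;    /* netfilter features permitted inside this VE */
    bool is_pseudosuper;  /* assumed: set while CRIU is restoring the CT */
};

/* Hypothetical feature bits, for illustration only. */
#define IPT_SKETCH_FILTER (1ULL << 0)
#define IPT_SKETCH_NAT    (1ULL << 1)

/* In this sketch the host (VE0) permits every netfilter feature. */
static const struct ve_sketch ve0_sketch = {
    .ipt_mask = ~0ULL,
    .is_pseudosuper = false,
};

/* Pick the mask to check against: the host mask while the VE is in the
 * pseudosuper state, the VE's own mask otherwise. */
static uint64_t effective_ipt_mask(const struct ve_sketch *ve)
{
    return ve->is_pseudosuper ? ve0_sketch.ipt_mask : ve->ipt_mask;
}

/* Model of the permission check that would guard setsockopt and
 * request_module: all requested feature bits must be permitted. */
static bool ipt_request_allowed(const struct ve_sketch *ve, uint64_t needed)
{
    return (effective_ipt_mask(ve) & needed) == needed;
}
```

In this model, a CT with netfilter disabled (`ipt_mask == 0`) is refused iptables configuration in normal operation, but the same request is allowed while `is_pseudosuper` is set, so the restoring task would not need to switch to VE0 first.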