OpenVZ Source code
vzkernel

Public
Author, Commit, Message, Commit date, Issues
Eric Dumazet
  ed56cda1474 ms/af_key: do not use GFP_KERNEL in atomic contexts
  pfkey_broadcast() might be called from non process contexts, we can not use GFP_KERNEL in these cases [1]. This patch partially reverts commit ba51b6be38c1 ("net: Fix RCU splat in af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock() section. [1] : syzkaller reported : in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439 3 locks held by syzkaller183439/2932: #0...
  Issues: PSBM-94668
David Ahern
  a90b876e439 ms/net: Fix RCU splat in af_key
  Hit the following splat testing VRF change for ipsec: [ 113.475692] =============================== [ 113.476194] [ INFO: suspicious RCU usage. ] [ 113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted [ 113.477545] ------------------------------- [ 113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-s...
  Issues: PSBM-94668
Daniel Borkmann
  8be3459fe13 ms/tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter
  [ Upstream commit 5a5abb1fa3b05dd6aa821525832644c1e7d2905f ] Sasha Levin reported a suspicious rcu_dereference_protected() warning found while fuzzing with trinity that is similar to this one: [ 52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage! [ 52.765688] other info that might help us debug this: [ 52.765695] rcu_scheduler_active = 1, debug_locks = ...
  Issues: PSBM-94755
Konstantin Khorenko
  fd48f655e5b OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.9
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Ben Hutchings
  6c68a869397 ms/tcp: Clear sk_send_head after purging the write queue
  Denis Andzakovic discovered a potential use-after-free in older kernel versions, using syzkaller. tcp_write_queue_purge() frees all skbs in the TCP write queue and can leave sk->sk_send_head pointing to freed memory. tcp_disconnect() clears that pointer after calling tcp_write_queue_purge(), but tcp_connect() does not. It is (surprisingly) possible to add to the write queue between disconnec...
  Issues: PSBM-97312
Konstantin Khorenko
  46f1803c0be Revert "venet: add venet_free_stat callback"
  This reverts commit 4275964f99a30bfaad72a1e8c97a800655f45769. We don't have closed source modules long ago, so don't need venet_free_stat() callback. Fixes: a6fce566d713 ("drivers/net/ve: venet network device introduced") https://jira.sw.ru/browse/PSBM-69078 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
  Issues: PSBM-69078
Konstantin Khorenko
  91d8c3e8a14 Revert "fs: use original vfsmount for touch_atime"
  This reverts commit c03ccabe433a3cf811477358975e9d0a4bd0c144. The patch being reverted had been introduced in the scope of https://jira.sw.ru/browse/PSBM-51009 Now it's reverted in the scope of https://jira.sw.ru/browse/PSBM-78863 because Red Hat had backported 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") long ago and now xfstests should wor...
  Issues: 2 Jira Issues
Konstantin Khorenko
  b4f99d63887 Revert "ve/fs/fadvise: introduce FADV_DEACTIVATE flag"
  This reverts commit 114935b7b36b5d228fdce234f885701fe53774b2. Drop our home brew fadvise FADV_DEACTIVATE flag which was introduced in the scope of https://jira.sw.ru/browse/PSBM-57915 Reasoning: it won't be used because of performance degradation: https://pmc.acronis.com/browse/VSTOR-22963 https://jira.sw.ru/browse/PSBM-94829 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
  Issues: 3 Jira Issues
Konstantin Khorenko
  6407127a944 Revert "mm/page_alloc.c: check if page cgroup still in use during alloc/free."
  This reverts commit cf8aaacbc2c2b191b39ef9f677dbd5cbba3729a1. Reverting the debug patch: it did not provide us any single case, at the same time issues are not reported anymore, probably it was fixed as a side effect. Anyway, we don't need a debug BUG_ON() in the release code. https://jira.sw.ru/browse/PSBM-96036 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
  Issues: PSBM-96036
Konstantin Khorenko
  fecffac6eda OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.8
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Ildar Ismagilov
  b9ae1b8f967 fs/fuse kio: don't wait read requests in case of fsync/flush
  In this patch, the KIO requests are divided into two types: read and write. And in case of fsync/flush we only wait for completion write requests. https://pmc.acronis.com/browse/VSTOR-11372 Signed-off-by: Ildar Ismagilov <ildar.ismagilov@virtuozzo.com> Acked-by: Alexey Kuznetsov <kuznet@acronis.com>
  Issues: VSTOR-11372
Pavel Tikhomirov
  1a31e40d3ee ve: use host ipt_mask for pseudosuper in setsockopt and request module
  We have a quiet tricky design problem: 1) CRIU want's to run both the CT init task and it's parent CRIU task in VEX cgroup so that most restore steps obey VEX rules; 2) VEX cgroup allows only _one_ process to join on CT preparation step; 3) (1) + (2) -> We clone CT init task while in VEX already; 4) Mounts in CRIU are restored in temporary mntns, and later they are copied to their CT mount na...
  Issues: PSBM-98702
Vasily Averin
  e54d6e25df6 ms/netfilter: nf_tables: avoid global info storage
  ML commit 2a43ecf96ba6a6eed70dbcd99d0888fc0ad3b82b Author: Florian Westphal <fw@strlen.de> Date: Wed Jul 11 13:45:13 2018 +0200 netfilter: nf_tables: avoid global info storage This works because all accesses are currently serialized by nfnl nf_tables subsys mutex. If we want to have per-netns locking, we need to make this scratch area pernetns or allocate it on demand. This does the l...
  Issues: PSBM-98682
Vasily Averin
  42463cc52c9 rdma/i40iw: hide high-order-allocation warning in i40iw_save_msix_info()
  Size of struct i40iw_msix_vector was increased in RHEL7.7 kernels due to ML commit 43731753c4b7 ("RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint"). It triggers high-order-allocation warning in i40iw_save_msix_info(). The patch disables this warning. https://pmc.acronis.com/browse/VSTOR-27273 Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
  Issues: VSTOR-27273
Pavel Tikhomirov
  1b77ab0ea80 ve/exec: allow trusted exec change both on boot and on running system
  Can be configured either with no_trusted_exec boot option of via /proc/sys/fs/trusted_exec sysctl, by default it is enabled. (When "fs.trusted_exec" is enabled (==1) it means, the defense is "on"). https://jira.sw.ru/browse/PSBM-98702 Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
  Issues: PSBM-98702
Pavel Tikhomirov
  ff3f7964a87 ve/fs/exec: send SIGSEGV to a process trying to execute untrusted files
  It can help faster find out the cause of the problem in case userspace is executing CT binary from host. Logs are not enough sometimes. Avoid disk overflown with coredumps by ratelimiting them to 3 times a day. https://jira.sw.ru/browse/PSBM-98702 Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
  Issues: PSBM-98702
Konstantin Khorenko
  0bae0c7c67a OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.7
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Pavel Tikhomirov
  7efd5dd465b sysfs/ploop: add new device option trusted
  After the patch "ve/fs/exec: don't allow a privileged user to execute untrusted files" we need a way to execute files on trusted ploop. A new file is added on sysfs (default 0 - exec security check enabled): /sys/devices/virtual/block/ploopXXXXX/ptune/trusted Writing 1 to the file will allow execution. On PLOOP_IOC_STOP ioctl (ve stop) the value is dropped back to default. Note: execution o...
  Issues: PSBM-98234
Pavel Tikhomirov
  d85a5b7c31b ve/fs/exec: don't allow a privileged user to execute untrusted files
  If we run some binary (exploit) from CT on host, it can easily give a user in these CT an ability to do anything on host sending commands through unix socket to the exploit. Such an exploit can mimic to bash, ip, systemd, ping or some other "trusted" utility. I've tested with these patch that we don't call from VE0 any binaries from CT-fs on start, stop, enter, suspend, resume or migration. Bu...
  Issues: PSBM-98094
Konstantin Khorenko
  0bc3343d3db OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.6
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Andrey Ryabinin
  8bec805e4ab mm/slab: Fix deadlock on attempt to shrink slab.
  echo 1 > /sys/kernel/slab/<name>/shrink deadlocks as kmem_cache_shrink() attempts to lock slab_mutex which is already held by caller. Replace slab_mutex locking with [get,put]_online_mems(). This is what the sane kernel does. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Andrey Ryabinin
  a12399732ed mm/memcg: restore lost css_put() in memcg_kmem_cache_create_func()
  Restore lost css_put() in memcg_kmem_cache_create_func() otherwise mem cgroup cannot be destroyed. https://jira.sw.ru/browse/PSBM-98444 Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
  Issues: PSBM-98444
Andrey Ryabinin
  ff6496ddc93 ve/net/netfilter/core: Don't allow container to crash the kernel.
  The expression BUG_ON(!ve_is_super(get_exec_env())); basically says that we allow to crash the kernel if we are in container. This doesn't make any sense, remove this idiocy. https://jira.sw.ru/browse/PSBM-98211 Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
  Issues: PSBM-98211
Andrey Ryabinin
  081f6e9e8f7 ve/kmod, nf_tables: allow nf_tables.ko autoloading on request from ve.
  Allow nf_tables.ko module autloading from CT. Needed for iptables in centos 8. https://jira.sw.ru/browse/PSBM-98211 Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
  Issues: PSBM-98211
Konstantin Khorenko
  5308f492456 OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.5
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Kirill Tkhai
  1323f47d486 ploop: Kill noop complete_merge method
  It's unused after previous patch. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Kirill Tkhai
  54ab284ad59 ploop: Assign holes_bitmap before merge on merge-no-return point
  Since we merge clusters back into base image, we should take into account holes into base image and reuse them firstly. Otherwise base image may grow unexpected. Thus, we move populate_holes_bitmap() into start_merge. https://jira.sw.ru/browse/PSBM-98313 Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
  Issues: PSBM-98313
Benjamin Coddington
  9c4ba52547a ms/NFS: Fix a double unlock from nfs_match,get_client
  Now that nfs_match_client drops the nfs_client_lock, we should be careful to always return it in the same condition: locked. Fixes: 950a578c6128 ("NFS: make nfs_match_client killable") Reported-by: syzbot+228a82b263b5da91883d@syzkaller.appspotmail.com Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> https://jira.sw.ru/browse/PS...
  Issues: PSBM-98297
Benjamin Coddington
  31b27ff4659 ms/NFS: Cleanup if nfs_match_client is interrupted
  Don't bail out before cleaning up a new allocation if the wait for searching for a matching nfs client is interrupted. Memory leaks. Reported-by: syzbot+7fe11b49c1cc30e3fce2@syzkaller.appspotmail.com Fixes: 950a578c6128 ("NFS: make nfs_match_client killable") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> https://jira...
  Issues: PSBM-98297
Roberto Bergantinos Corpas
  0b58690841b ms/NFS: make nfs_match_client killable
  Actually we don't do anything with return value from nfs_wait_client_init_complete in nfs_match_client, as a consequence if we get a fatal signal and client is not fully initialised, we'll loop to "again" label This has been proven to cause soft lockups on some scenarios (no-carrier but configured network interfaces) Signed-off-by: Roberto Bergantinos Corpas <rbergant@...
  Issues: PSBM-98297
Jan Dakinevich
  bdf4b08ccaf ms/perf/x86/intel: make reusable LBR initialization code, part 2/2
  This patch introduces globally visible intel_pmu_lbr_fill() routine, which gathers information which LBR MSRs are support for specific CPU family/model. It is supposed that the routine would be used in KVM code, using guest CPU information as an input. By this reason, it should not have any side effect which could affect host system. https://jira.sw.ru/browse/PSBM-75679 Signed-off-by: Jan Dak...
  Issues: 2 Jira Issues
Jan Dakinevich
  bacfeef9a1a ms/perf/x86/intel: make reusable LBR initialization code, part 1/2
  Move LBR information from `struct x86_pmu' to separate structure `struct x86_pmu_lbr'. LBR initialization is nailed to perf subsystem and to global 'boot_x86_pmu' structure. To reuse this code and keep these changes readable the work splited into to parts. https://jira.sw.ru/browse/PSBM-75679 Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
  Issues: PSBM-75679
Jan Dakinevich
  a636b35364d Revert "ms/perf/x86/intel: make reusable LBR initialization code"
  This reverts commit 59528e58cb39978b4b7c7a2ba57a1ba25749456e. Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Kan Liang
  aff22705c65 ms/x86/CPU: Add more Icelake model numbers
  Add the CPUID model numbers of Icelake (ICL) desktop and server processors to the Intel family list. [ Qiuxu: Sort the macros by model number. ] Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <pete...
Peter Zijlstra
  20cb398db5a ms/x86/cpu: Sanitize FAM6_ATOM naming
  Going primarily by: https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors with additional information gleaned from other related pages; notably: - Bonnell shrink was called Saltwell - Moorefield is the Merriefield refresh which makes it Airmont The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE for i in `git grep -l FAM6_ATOM` ; do sed -i -e 's/ATOM_PINEVIEW/ATOM_BONN...
Andy Shevchenko
  2ea6be4fde1 ms/x86/cpu: Keep model defines sorted by model number
  For better maintenance keep it sorted by numeric model ID. Add new lines to seperate model groups. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20170316155045.50389-1-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> (cherry picked from commit c238f2343441e399...
Jan Dakinevich
  24af5f2bfde ms/KVM: x86: set ctxt->have_exception in x86_decode_insn()
  x86_emulate_instruction() takes into account ctxt->have_exception flag during instruction decoding, but in practice this flag is never set in x86_decode_insn(). Fixes: 6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn") Cc: stable@vger.kernel.org Cc: Denis Lunev <den@virtuozzo.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Signed...
  Issues: PSBM-68018
Jan Dakinevich
  6bee5035ac2 ms/KVM: x86: always stop emulation on page fault
  inject_emulated_exception() returns true if and only if nested page fault happens. However, page fault can come from guest page tables walk, either nested or not nested. In both cases we should stop an attempt to read under RIP and give guest to step over its own page fault handler. This is also visible when an emulated instruction causes a #GP fault and the VMware backdoor is enabled. To han...
  Issues: PSBM-68018
Paolo Bonzini
  d551d9537b7 ms/KVM: x86: inject exceptions produced by x86_decode_insn
  Sometimes, a processor might execute an instruction while another processor is updating the page tables for that instruction's code page, but before the TLB shootdown completes. The interesting case happens if the page is in the TLB. In general, the processor will succeed in executing the instruction and nothing bad happens. However, what if the instruction is an MMIO access? If *that* happe...
  Issues: PSBM-68018
Jan Dakinevich
  bf38702c790 ve/cpu: handle sysfs attributes for CTs
  Add support for 'offline', 'online', 'possible' and 'present' attributes under '/sys/devices/system/cpu' for CTs. All allowed CPUs are assumed as posible and online, list of offline CPUs is empty. This allows lscpu utility to correctly show an amount of CPUs in its output, however topology information is still not provided. https://jira.sw.ru/browse/PSBM-91808 Signed-off-by: Jan Dakinevich <j...
  Issues: PSBM-91808
Vasily Averin
  b78765469af OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.4
  Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Vladimir Davydov
  55186bf6ecf ms/pagemap: Port diff-ms-pagemap-do-not-leak-physical-addresses-to-non-privileged-userspace
  Author: Konstantin Khorenko Email: khorenko@parallels.com Subject: ms/pagemap: do not leak physical addresses to non-privileged userspace Date: Mon, 23 Mar 2015 19:21:49 +0400 ms commit: ab676b7d6fbf4b294bf198fb27ade5b0e865c7ce Original thread in LKML: https://lkml.org/lkml/2015/3/9/864 https://jira.sw.ru/browse/PSBM-32308 Signed-off-by: Konstantin Khorenko <khorenko@openvz.org> From: "K...
  Issues: 3 Jira Issues
Vasily Averin
  195a4bcca6e OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.3
  Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Julien Gomes
  d6e63886029 ms/tun: allow positive return values on dev_get_valid_name() call
  ML commit: 5c25f65fd1e42685f7ccd80e0621829c105785d9 If the name argument of dev_get_valid_name() contains "%d", it will try to assign it a unit number in __dev__alloc_name() and return either the unit number (>= 0) or an error code (< 0). Considering positive values as error values prevent tun device creations relying this mechanism, therefor we should only consider negative values as errors h...
  Issues: 2 Jira Issues
Cong Wang
  4e12e25348e ms/tun: call dev_get_valid_name() before register_netdevice()
  ML commit: 0ad646c81b2182f7fa67ec0c8c825e0ee165696d register_netdevice() could fail early when we have an invalid dev name, in which case ->ndo_uninit() is not called. For tun device, this is a problem because a timer etc. are already initialized and it expects ->ndo_uninit() to clean them up. We could move these initializations into a ->ndo_init() so that register_netdevice() knows better, h...
  Issues: 2 Jira Issues
Jens Axboe
  0d55e95a45b nbd: don't allow invalid blocksize settings
  syzbot reports a divide-by-zero off the NBD_SET_BLKSIZE ioctl. We need proper validation of the input here. Not just if it's zero, but also if the value is a power-of-2 and in a valid range. Add that. Cc: stable@vger.kernel.org Reported-by: syzbot <syzbot+25dbecbec1e62c6b0dd4@syzkaller.appspotmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik
  5306bf2d892 nbd: stop leaking sockets
  This was introduced in the multi-connection patch, we've been leaking socket's ever since. Fixes: 9561a7a ("nbd: add multi-connection support") cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 6a8a21546507a3ec88e81c2ec927a3fb63efa8ff) Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Josef Bacik
  ae53d3c3ab0 nbd: cleanup workqueue on error properly
  If we fail to register the blockdev we need to make sure to destroy the recv workqueue. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 6330a2d0b465527d621a9d95cad6b2fc0a959f13) Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Josef Bacik
  dd49f8c1784 nbd: set the logical and physical blocksize properly
  We noticed when trying to do O_DIRECT to an export on the server side that we were getting requests smaller than the 4k sectorsize of the device. This is because the client isn't setting the logical and physical blocksizes properly for the underlying device. Fix this up by setting the queue blocksizes and then calling bd_set_size. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jen...
Josef Bacik
  8ad986dacbe nbd: cleanup ioctl handling
  Break the ioctl handling out into helper functions, some of these things are getting pretty big and unwieldy. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 9442b739207aab6b1053abf858a238e7642fbcd1) Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
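
Commit 0d55e95a45b in the list above describes validating the NBD_SET_BLKSIZE input: the value must be non-zero, a power of two, and within a valid range. The following is only a userspace sketch of that kind of check, not the driver's actual code. The function names and the 512 to 4096 byte bounds are assumptions chosen for illustration; the commit message does not state the real limits.

```c
#include <stdbool.h>

/* Hypothetical bounds for illustration; the real driver's limits may differ. */
#define EXAMPLE_MIN_BLKSIZE 512UL
#define EXAMPLE_MAX_BLKSIZE 4096UL

/* A non-zero value is a power of two when exactly one bit is set. */
static bool example_is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Reject zero (the divide-by-zero case from the commit message),
 * values that are not a power of two, and values outside the
 * assumed range.
 */
static bool example_blksize_is_valid(unsigned long blksize)
{
    if (blksize == 0)
        return false;
    if (!example_is_power_of_2(blksize))
        return false;
    return blksize >= EXAMPLE_MIN_BLKSIZE && blksize <= EXAMPLE_MAX_BLKSIZE;
}
```

A caller would run such a check before storing the block size and return an error (for example -EINVAL in kernel code) when it fails, so that later size calculations never divide by an invalid value.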