OpenVZ Source code

vzkernel

Author | Commit | Message | Commit date | Issues

Konstantin Khorenko
6c2ebd90e9c OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.11
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Ross Zwisler
9ad39a201b1 ms/ext4: use jbd2_inode dirty range scoping
Use the newly introduced jbd2_inode dirty range scoping to prevent us from waiting forever when trying to complete a journal transaction.
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
(cherry picked from commit 73131fbb003b3691cfcf9656f234b00da497fcd6)
Signed-off-by: Andrey Ryabinin ...

Ross Zwisler
69e7012e8e2 ms/jbd2: introduce jbd2_inode dirty range scoping
Currently both journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() operate on the entire address space of each of the inodes associated with a given journal entry. The consequence of this is that if we have an inode where we are constantly appending dirty pages we can end up waiting for an indefinite amount of time in journal_finish_inode_data_buffers() while we wait fo...

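The dirty range scoping named in the two jbd2/ext4 commits above can be illustrated with a small userspace model. This is only a sketch under assumptions: the class `InodeDirtyRange` and its methods are hypothetical names, not the kernel's data structures. It tracks a single inclusive byte range per inode, so that a commit waits only on data dirtied before the snapshot instead of on the whole address space.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InodeDirtyRange:
    """Hypothetical model of a per-inode dirty byte range (not kernel code)."""

    start: Optional[int] = None
    end: Optional[int] = None  # inclusive last dirty byte

    def mark_dirty(self, offset: int, length: int) -> None:
        """Widen the tracked range so it covers [offset, offset + length - 1]."""
        if offset < 0 or length <= 0:
            raise ValueError("offset must be >= 0 and length must be > 0")
        last = offset + length - 1
        if self.start is None:
            self.start, self.end = offset, last
        else:
            self.start = min(self.start, offset)
            self.end = max(self.end, last)

    def take(self) -> Optional[Tuple[int, int]]:
        """Return the range a commit should wait on, then reset the tracking."""
        if self.start is None:
            return None
        snapshot = (self.start, self.end)
        self.start = self.end = None
        return snapshot
```

In this model, anything marked dirty after `take()` belongs to the next snapshot, so the wait for the current one has a fixed upper bound even if the file keeps growing.
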
Ross Zwisler
7634b6c6cfc ms/mm: add filemap_fdatawait_range_keep_errors()
In the spirit of filemap_fdatawait_range() and filemap_fdatawait_keep_errors(), introduce filemap_fdatawait_range_keep_errors() which both takes a range upon which to wait and does not clear errors from the address space.
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
(cherry picked ...

Jan Kara
9cf08f4c696 ms/ext4: do not ask jbd2 to write data for delalloc buffers
Currently we ask jbd2 to write all dirty allocated buffers before committing a transaction when doing writeback of delay allocated blocks. However this is unnecessary since we move all pages to writeback state before dropping a transaction handle and then submit all the necessary IO. We still need the transaction commit to wait for all the outstanding writeback before flushing disk caches durin...

Jan Kara
ecc4310112d ms/jbd2: add support for avoiding data writes during transaction commits
Currently when filesystem needs to make sure data is on permanent storage before committing a transaction it adds inode to transaction's inode list. During transaction commit, jbd2 writes back all dirty buffers that have allocated underlying blocks and waits for the IO to finish. However when doing writeback for delayed allocated data, we allocate blocks and immediately submit the data. Thus as...

Andrey Ryabinin
f7bdb500460 fs-writeback: add cond_resched() in wb_do_writeback()
In case of too many writeback works we might not reschedule for too long. Add cond_resched() just in case.
https://jira.sw.ru/browse/PSBM-97743
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Issues: PSBM-97743

Konstantin Khorenko
849fc13df16 Revert "fs: avoid writeback busy-loop if redirty"
This reverts commit 351104c35fa9e976d68c80169bfd62bba26385a6.
The patch made sense previously because fuse_writepages() called redirty_page_for_writepage(), but nowadays it does not, so let's remove the hack.
Found while working in the scope of https://jira.sw.ru/browse/PSBM-97743
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Issues: PSBM-97743

Konstantin Khorenko
2b1fcd5a2ea OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.10
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Pavel Tikhomirov
7606beb9a2c ve: add a comment about possible pseudosuper race
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Pavel Tikhomirov
d478d513af0 blk-wbt: increase maximum queue depth to increase performance of writes
With wbt patches on simple test:
    #!/bin/bash
    rm -rf /my/filetree
    echo 3 > /proc/sys/vm/drop_caches
    sync
    time tar -xzf /my/filetree.zip -C /my
    time sync
we have a performance degradation of ~20-50% of last sync. That looks connected with the fact that SATA devices always have a small queue depth (request_queue->queue_depth == 31) and thus wbt is limiting the maximum number of inf...
Issues: PSBM-96243

Pavel Tikhomirov
3995bb3ca52 block: enable CONFIG_BLK_WBT*
https://jira.sw.ru/browse/PSBM-96243
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
=====================
Patchset description: block: backport writeback throttling
We have a problem that if we run heavy write load on one cpu simultaneously with short direct reads on other cpu, the latter will hang significantly. Writeback throttling looks like a solution for these reads, as ...
Issues: PSBM-96243

Jens Axboe
785b8b84150 ms/block: hook up writeback throttling
Enable throttling of buffered writeback to make it a lot more smooth, and has way less impact on other system activity. Background writeback should be, by definition, background activity. The fact that we flush huge bundles of it at the time means that it potentially has heavy impacts on foreground workloads, which isn't ideal. We can't easily limit the sizes of writes that we do, since that wo...
Issues: PSBM-96243

Jens Axboe
716d6e8c15a ms/blk-wbt: add general throttling mechanism
We can hook this up to the block layer, to help throttle buffered writes.
wbt registers a few trace points that can be used to track what is happening in the system:
    wbt_lat: 259:0: latency 2446318
    wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1, wmean=518866, wmin=15522, wmax=5330353, wsamples=57
    wbt_step: 259:0: step down: step=1, window=72727272, backg...
Issues: PSBM-96243

Omar Sandoval
5890478e4fe ms/block: get rid of struct blk_issue_stat
struct blk_issue_stat squashes three things into one u64:
- The time the driver started working on a request
- The original size of the request (for the io.low controller)
- Flags for writeback throttling
It turns out that on x86_64, we have a 4 byte hole in struct request which we can fill with the non-timestamp fields from blk_issue_stat, simplifying things quite a bit.
Signed-off-by: Omar...
Issues: PSBM-96243

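To make "squashes three things into one u64" concrete, here is a hedged sketch of packing a timestamp, a size, and flags into a single 64-bit value. The field widths (3 flag bits, 12 size bits, 49 time bits) and the function names are assumptions chosen for illustration only; the commit text above does not state the actual layout.

```python
# Illustrative field widths (assumed, not taken from the commit message).
FLAG_BITS = 3
SIZE_BITS = 12
TIME_BITS = 64 - FLAG_BITS - SIZE_BITS  # 49 bits left for the timestamp

TIME_MASK = (1 << TIME_BITS) - 1
SIZE_MASK = (1 << SIZE_BITS) - 1
FLAG_MASK = (1 << FLAG_BITS) - 1


def pack_issue_stat(time_ns: int, size: int, flags: int) -> int:
    """Pack three fields into one unsigned 64-bit integer."""
    if not 0 <= time_ns <= TIME_MASK:
        raise ValueError("time_ns does not fit in TIME_BITS")
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("size does not fit in SIZE_BITS")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags do not fit in FLAG_BITS")
    return (flags << (SIZE_BITS + TIME_BITS)) | (size << TIME_BITS) | time_ns


def unpack_issue_stat(value: int) -> tuple:
    """Split a packed value back into (time_ns, size, flags)."""
    time_ns = value & TIME_MASK
    size = (value >> TIME_BITS) & SIZE_MASK
    flags = (value >> (TIME_BITS + SIZE_BITS)) & FLAG_MASK
    return time_ns, size, flags
```

Every read or update of one field in such a layout needs shifting and masking, which is the kind of bookkeeping that goes away once the non-timestamp fields get storage of their own.
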
Jens Axboe
724cd8b5da0 ms/writeback: track if we're sleeping on progress in balance_dirty_pages()
Note in the bdi_writeback structure whenever a task ends up sleeping waiting for progress. We can use that information in the lower layers to increase the priority of writes.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
https://jira.sw.ru/browse/PSBM-96243
(cherry picked from commit b57d74aff9ab92fbfb7c197c384d1adfa2827b2e)
Signed-off-by: Andrey Ryabinin <ary...
Issues: PSBM-96243

Jens Axboe
78b88036902 ms/writeback: mark background writeback as such
If we're doing background type writes, then use the appropriate background write flags for that.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
https://jira.sw.ru/browse/PSBM-96243
(cherry picked from commit 13edd5e7315a26b448c5f7f33fc7721b1e0c17ef)
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
=====================
Patchset description: ...
Issues: PSBM-96243

Jens Axboe
7f4e4674f3c ms/writeback: add wbc_to_write_flags()
Add wbc_to_write_flags(), which returns the write modifier flags to use, based on a struct writeback_control. No functional changes in this patch, but it prepares us for factoring other wbc fields for write type.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
https://jira.sw.ru/browse/PSBM-96243
(cherry picked from co...
Issues: PSBM-96243

Jens Axboe
4fef6861798 ms/block: add REQ_BACKGROUND
This adds a new request flag, REQ_BACKGROUND, that callers can use to tell the block layer that this is background (non-urgent) IO.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
https://jira.sw.ru/browse/PSBM-96243
(cherry picked from commit 1d796d6a9641fbfcd90fcfaf6fb4894a13d0304f)
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by...
Issues: PSBM-96243

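The three writeback commits above (REQ_BACKGROUND, wbc_to_write_flags(), and marking background writeback) can be pictured together with a small userspace model. It is a sketch, not the backported code: the field names `sync_mode`, `for_kupdate` and `for_background` and the order of the checks are assumed from the upstream helper, and the flag bit values are placeholders.

```python
from dataclasses import dataclass
from enum import Enum, IntFlag


class SyncMode(Enum):
    WB_SYNC_NONE = 0  # do not wait on anything
    WB_SYNC_ALL = 1   # data-integrity writeback, wait on every mapping


class ReqFlags(IntFlag):
    NONE = 0
    REQ_SYNC = 1 << 0        # placeholder bit, not the kernel's value
    REQ_BACKGROUND = 1 << 1  # placeholder bit, not the kernel's value


@dataclass
class WritebackControl:
    """Tiny stand-in for struct writeback_control (assumed fields only)."""

    sync_mode: SyncMode = SyncMode.WB_SYNC_NONE
    for_kupdate: bool = False
    for_background: bool = False


def wbc_to_write_flags(wbc: WritebackControl) -> ReqFlags:
    """Map a writeback request to the write modifier flags it should carry."""
    if wbc.sync_mode is SyncMode.WB_SYNC_ALL:
        return ReqFlags.REQ_SYNC
    if wbc.for_kupdate or wbc.for_background:
        return ReqFlags.REQ_BACKGROUND
    return ReqFlags.NONE
```

Keeping the decision in one helper means submitters only pass the control structure along, and a lower layer such as a throttling mechanism can tell urgent writes from background ones by looking at the flags.
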
Pavel Tikhomirov
8632b85239d x86/asm: remove the unused get_limit() method
https://jira.sw.ru/browse/PSBM-96243
(inspired by commit 72d64cc76941cde45e65e2a5b9fb81d527963645)
To fix compilation of ("blk-wbt: add general throttling mechanism")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
=====================
Patchset description: block: backport writeback throttling
We have a problem that if we run heavy write load on one cpu simultaneously with sho...
Issues: PSBM-96243

Konstantin Khorenko
2de980dd472 ve/net/core: allow to call setsockopt(SO_SNDBUFFORCE) from Containers
"nft" util (in CentOS 8 environment) does use setsockopt(SO_SNDBUFFORCE) unconditionally, so we have to allow it from inside a Container. At the same time we don't want to allow a Container to set too much memory for a socket, so just treat SO_SNDBUFFORCE like SO_SNDBUF if called inside a Container.
Simple rule to test:
    # nft add rule filter INPUT ct state related,established accept
https:...
Issues: PSBM-98794

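The policy in the commit above, "treat SO_SNDBUFFORCE like SO_SNDBUF if called inside a Container", can be sketched as a toy decision function. This is a model under assumptions, not the kernel code path: it assumes that SO_SNDBUF caps the request at a system-wide maximum (net.core.wmem_max on Linux) while SO_SNDBUFFORCE does not, and it ignores capability checks and any adjustment the kernel applies to the stored value. The function and parameter names are hypothetical.

```python
def effective_sndbuf(requested: int, wmem_max: int,
                     force: bool, in_container: bool) -> int:
    """Return the send buffer size this toy model would honour.

    force        -- True models SO_SNDBUFFORCE, False models SO_SNDBUF
    in_container -- True when the caller runs inside a Container
    """
    if requested < 0 or wmem_max < 0:
        raise ValueError("sizes must be non-negative")
    if force and not in_container:
        # Host-side forced request: the system-wide cap is bypassed.
        return requested
    # Plain SO_SNDBUF, or a forced request demoted inside a Container.
    return min(requested, wmem_max)
```

Under this model the call succeeds inside a Container, so a tool that issues it unconditionally keeps working, but the Container cannot claim more socket memory than the ordinary limit allows.
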
Eric Dumazet
ed56cda1474 ms/af_key: do not use GFP_KERNEL in atomic contexts
pfkey_broadcast() might be called from non process contexts, we can not use GFP_KERNEL in these cases [1].
This patch partially reverts commit ba51b6be38c1 ("net: Fix RCU splat in af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock() section.
[1] : syzkaller reported :
    in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
    3 locks held by syzkaller183439/2932:
    #0...
Issues: PSBM-94668

David Ahern
a90b876e439 ms/net: Fix RCU splat in af_key
Hit the following splat testing VRF change for ipsec:
    [ 113.475692] ===============================
    [ 113.476194] [ INFO: suspicious RCU usage. ]
    [ 113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted
    [ 113.477545] -------------------------------
    [ 113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-s...
Issues: PSBM-94668

Daniel Borkmann
8be3459fe13 ms/tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter
[ Upstream commit 5a5abb1fa3b05dd6aa821525832644c1e7d2905f ]
Sasha Levin reported a suspicious rcu_dereference_protected() warning found while fuzzing with trinity that is similar to this one:
    [ 52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
    [ 52.765688] other info that might help us debug this:
    [ 52.765695] rcu_scheduler_active = 1, debug_locks = ...
Issues: PSBM-94755

Konstantin Khorenko
fd48f655e5b OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.9
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Ben Hutchings
6c68a869397 ms/tcp: Clear sk_send_head after purging the write queue
Denis Andzakovic discovered a potential use-after-free in older kernel versions, using syzkaller. tcp_write_queue_purge() frees all skbs in the TCP write queue and can leave sk->sk_send_head pointing to freed memory. tcp_disconnect() clears that pointer after calling tcp_write_queue_purge(), but tcp_connect() does not. It is (surprisingly) possible to add to the write queue between disconnec...
Issues: PSBM-97312

Konstantin Khorenko
46f1803c0be Revert "venet: add venet_free_stat callback"
This reverts commit 4275964f99a30bfaad72a1e8c97a800655f45769.
We don't have closed source modules long ago, so don't need venet_free_stat() callback.
Fixes: a6fce566d713 ("drivers/net/ve: venet network device introduced")
https://jira.sw.ru/browse/PSBM-69078
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Issues: PSBM-69078

Konstantin Khorenko
91d8c3e8a14 Revert "fs: use original vfsmount for touch_atime"
This reverts commit c03ccabe433a3cf811477358975e9d0a4bd0c144.
The patch being reverted had been introduced in the scope of https://jira.sw.ru/browse/PSBM-51009
Now it's reverted in the scope of https://jira.sw.ru/browse/PSBM-78863 because Red Hat had backported 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") long ago and now xfstests should wor...
Issues: 2 Jira Issues

Konstantin Khorenko
b4f99d63887 Revert "ve/fs/fadvise: introduce FADV_DEACTIVATE flag"
This reverts commit 114935b7b36b5d228fdce234f885701fe53774b2.
Drop our home brew fadvise FADV_DEACTIVATE flag which was introduced in the scope of https://jira.sw.ru/browse/PSBM-57915
Reasoning: it won't be used because of performance degradation:
https://pmc.acronis.com/browse/VSTOR-22963
https://jira.sw.ru/browse/PSBM-94829
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Issues: 3 Jira Issues

Konstantin Khorenko
6407127a944 Revert "mm/page_alloc.c: check if page cgroup still in use during alloc/free."
This reverts commit cf8aaacbc2c2b191b39ef9f677dbd5cbba3729a1.
Reverting the debug patch: it did not provide us any single case, at the same time issues are not reported anymore, probably it was fixed as a side effect. Anyway, we don't need a debug BUG_ON() in the release code.
https://jira.sw.ru/browse/PSBM-96036
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Issues: PSBM-96036

Konstantin Khorenko
fecffac6eda OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.8
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Ildar Ismagilov
b9ae1b8f967 fs/fuse kio: don't wait read requests in case of fsync/flush
In this patch, the KIO requests are divided into two types: read and write. And in case of fsync/flush we only wait for completion write requests.
https://pmc.acronis.com/browse/VSTOR-11372
Signed-off-by: Ildar Ismagilov <ildar.ismagilov@virtuozzo.com>
Acked-by: Alexey Kuznetsov <kuznet@acronis.com>
Issues: VSTOR-11372

Pavel Tikhomirov
1a31e40d3ee ve: use host ipt_mask for pseudosuper in setsockopt and request module
We have a quite tricky design problem:
1) CRIU wants to run both the CT init task and its parent CRIU task in VEX cgroup so that most restore steps obey VEX rules;
2) VEX cgroup allows only _one_ process to join on CT preparation step;
3) (1) + (2) -> We clone CT init task while in VEX already;
4) Mounts in CRIU are restored in temporary mntns, and later they are copied to their CT mount na...
Issues: PSBM-98702

Vasily Averin
e54d6e25df6 ms/netfilter: nf_tables: avoid global info storage
ML commit 2a43ecf96ba6a6eed70dbcd99d0888fc0ad3b82b
Author: Florian Westphal <fw@strlen.de>
Date: Wed Jul 11 13:45:13 2018 +0200
netfilter: nf_tables: avoid global info storage
This works because all accesses are currently serialized by nfnl nf_tables subsys mutex. If we want to have per-netns locking, we need to make this scratch area pernetns or allocate it on demand. This does the l...
Issues: PSBM-98682

Vasily Averin
42463cc52c9 rdma/i40iw: hide high-order-allocation warning in i40iw_save_msix_info()
Size of struct i40iw_msix_vector was increased in RHEL7.7 kernels due to ML commit 43731753c4b7 ("RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint"). It triggers high-order-allocation warning in i40iw_save_msix_info(). The patch disables this warning.
https://pmc.acronis.com/browse/VSTOR-27273
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Issues: VSTOR-27273

Pavel Tikhomirov
1b77ab0ea80 ve/exec: allow trusted exec change both on boot and on running system
Can be configured either with no_trusted_exec boot option or via /proc/sys/fs/trusted_exec sysctl, by default it is enabled. (When "fs.trusted_exec" is enabled (==1) it means, the defense is "on").
https://jira.sw.ru/browse/PSBM-98702
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Issues: PSBM-98702

Pavel Tikhomirov
ff3f7964a87 ve/fs/exec: send SIGSEGV to a process trying to execute untrusted files
It can help faster find out the cause of the problem in case userspace is executing CT binary from host. Logs are not enough sometimes. Avoid disk overflown with coredumps by ratelimiting them to 3 times a day.
https://jira.sw.ru/browse/PSBM-98702
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Issues: PSBM-98702

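Rate limiting "to 3 times a day", as the commit above describes for coredumps, can be sketched with a fixed-window counter. This is an illustration under assumptions: the class name, the fixed-window strategy, and the injected clock are hypothetical, and the commit message does not say which limiter the patch uses.

```python
class WindowRateLimit:
    """Allow at most `burst` events per `interval` seconds (fixed window)."""

    def __init__(self, burst: int = 3, interval: float = 24 * 60 * 60):
        self.burst = burst
        self.interval = interval
        self.window_start = None  # start time of the current window
        self.count = 0            # events allowed in the current window

    def allow(self, now: float) -> bool:
        """Return True if an event at time `now` (in seconds) may proceed."""
        if self.window_start is None or now - self.window_start >= self.interval:
            # First event ever, or the previous window expired: start anew.
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False
```

A caller would consult `allow()` before producing each coredump and skip the dump when it returns False, which bounds disk usage at three dumps per day in this model.
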
Konstantin Khorenko
0bae0c7c67a OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.7
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Pavel Tikhomirov
7efd5dd465b sysfs/ploop: add new device option trusted
After the patch "ve/fs/exec: don't allow a privileged user to execute untrusted files" we need a way to execute files on trusted ploop. A new file is added on sysfs (default 0 - exec security check enabled):
    /sys/devices/virtual/block/ploopXXXXX/ptune/trusted
Writing 1 to the file will allow execution. On PLOOP_IOC_STOP ioctl (ve stop) the value is dropped back to default.
Note: execution o...
Issues: PSBM-98234

Pavel Tikhomirov
d85a5b7c31b ve/fs/exec: don't allow a privileged user to execute untrusted files
If we run some binary (exploit) from CT on host, it can easily give a user in this CT an ability to do anything on host sending commands through unix socket to the exploit. Such an exploit can mimic bash, ip, systemd, ping or some other "trusted" utility. I've tested with this patch that we don't call from VE0 any binaries from CT-fs on start, stop, enter, suspend, resume or migration. Bu...
Issues: PSBM-98094

Konstantin Khorenko
0bc3343d3db OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.6
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Andrey Ryabinin
8bec805e4ab mm/slab: Fix deadlock on attempt to shrink slab.
echo 1 > /sys/kernel/slab/<name>/shrink deadlocks as kmem_cache_shrink() attempts to lock slab_mutex which is already held by caller. Replace slab_mutex locking with [get,put]_online_mems(). This is what the sane kernel does.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>

Andrey Ryabinin
a12399732ed mm/memcg: restore lost css_put() in memcg_kmem_cache_create_func()
Restore lost css_put() in memcg_kmem_cache_create_func() otherwise mem cgroup cannot be destroyed.
https://jira.sw.ru/browse/PSBM-98444
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Issues: PSBM-98444

Andrey Ryabinin
ff6496ddc93 ve/net/netfilter/core: Don't allow container to crash the kernel.
The expression BUG_ON(!ve_is_super(get_exec_env())); basically says that we allow to crash the kernel if we are in container. This doesn't make any sense, remove this idiocy.
https://jira.sw.ru/browse/PSBM-98211
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Issues: PSBM-98211

Andrey Ryabinin
081f6e9e8f7 ve/kmod, nf_tables: allow nf_tables.ko autoloading on request from ve.
Allow nf_tables.ko module autoloading from CT. Needed for iptables in centos 8.
https://jira.sw.ru/browse/PSBM-98211
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Issues: PSBM-98211

Konstantin Khorenko
5308f492456 OpenVZ kernel rh7-3.10.0-1062.1.2.vz7.114.5
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Kirill Tkhai
1323f47d486 ploop: Kill noop complete_merge method
It's unused after previous patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Kirill Tkhai
54ab284ad59 ploop: Assign holes_bitmap before merge on merge-no-return point
Since we merge clusters back into base image, we should take into account holes into base image and reuse them firstly. Otherwise base image may grow unexpected. Thus, we move populate_holes_bitmap() into start_merge.
https://jira.sw.ru/browse/PSBM-98313
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Issues: PSBM-98313

Benjamin Coddington
9c4ba52547a ms/NFS: Fix a double unlock from nfs_match,get_client
Now that nfs_match_client drops the nfs_client_lock, we should be careful to always return it in the same condition: locked.
Fixes: 950a578c6128 ("NFS: make nfs_match_client killable")
Reported-by: syzbot+228a82b263b5da91883d@syzkaller.appspotmail.com
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
https://jira.sw.ru/browse/PS...
Issues: PSBM-98297

Benjamin Coddington
31b27ff4659 ms/NFS: Cleanup if nfs_match_client is interrupted
Don't bail out before cleaning up a new allocation if the wait for searching for a matching nfs client is interrupted. Memory leaks.
Reported-by: syzbot+7fe11b49c1cc30e3fce2@syzkaller.appspotmail.com
Fixes: 950a578c6128 ("NFS: make nfs_match_client killable")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
https://jira...
Issues: PSBM-98297