OpenVZ Source code

vzkernel


Commits

Author | Commit | Message | Commit Date | Issues

Kirill Tkhai
7fded4f7cd9 ms/ext4: Actually request zeroing of inode table after grow
[This is reviewed and goes to ms] The number of block groups can never decrease, since only online grow is supported. But after a grow occurs, we have to zero the inode tables of the newly created block groups. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz>
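A minimal sketch of the decision this message describes: because online resize only grows, zeroing is needed exactly when the new group count exceeds the old one, and it starts at the first new group. The helper name request_itable_zeroing() is hypothetical; it is not the ext4 code, which hands the work off to its own lazy inode-table initialization machinery.

```c
#include <assert.h>

/* Hypothetical helper: returns the first block group whose inode table
 * must be zeroed after a resize, or -1 if no groups were added. */
static long request_itable_zeroing(unsigned long old_groups,
                                   unsigned long new_groups)
{
    /* Online resize only grows, so "new > old" is the only case with work:
     * the freshly created groups are old_groups .. new_groups - 1. */
    if (new_groups > old_groups)
        return (long)old_groups;
    return -1; /* nothing was added, nothing to zero */
}
```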

Konstantin Khorenko
4893e61b315 OpenVZ kernel rh7-3.10.0-957.10.1.vz7.94.17
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Kirill Tkhai
1a7f38dfe63 ploop: kaio: Enter standby mode on EIO as well
vstorage may return EIO on lease loss (in addition to EBUSY and ENOTCONN). It's difficult to make vstorage and fastpath return only EBUSY and ENOTCONN in such situations, so Andrei suggested entering standby mode on EIO too. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Andrey Zaitsev <azaitsev@virtuozzo.com>
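The error classification described above can be sketched as a small predicate. The function name kaio_err_means_standby() is hypothetical and the real kaio code may be structured differently; the sketch only shows the set of errors (EBUSY, ENOTCONN and now EIO) that trigger standby, using the kernel convention of negative errno values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical predicate: should this I/O error put ploop into standby?
 * err follows the kernel convention of a negative errno value. */
static bool kaio_err_means_standby(int err)
{
    switch (err) {
    case -EBUSY:
    case -ENOTCONN:
    case -EIO: /* added: vstorage may also report lease loss as EIO */
        return true;
    default:
        return false;
    }
}
```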

Kirill Tkhai
75afcf0f1cd ploop: Clear abort bit on replace delta
The new delta should perform I/O correctly, so clear the bit to allow bio handling. https://pmc.acronis.com/browse/VSTOR-22414 Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Andrey Zaitsev <azaitsev@virtuozzo.com>
Issues: VSTOR-22414

Kirill Tkhai
7c9779b74d9 ploop: Repopulate holes_bitmap on changing delta
The new delta is a black box, and holes may be anywhere, so we have to repopulate holes_bitmap in this case. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Michael S. Tsirkin
6ce387b733b DMA-API: exceeded 7 overlapping mappings of cacheline share
OK, so I enabled CONFIG_DMA_API_DEBUG: and now I get:
[ 178.789451] ------------[ cut here ]------------
[ 178.789558] DMA-API: exceeded 7 overlapping mappings of cacheline 0x000000001a161a80
[ 178.789578] WARNING: CPU: 7 PID: 1223 at kernel/dma/debug.c:523 add_dma_entry+0x1f6/0x200
[ 178.789580] Modules linked in: kvm_intel nouveau kvm iwlmvm iwlwifi
[ 178.789592] CPU: 7 PID: 1223 Comm: ...
Issues: PSBM-93919

Konstantin Khorenko
950c4e29f37 config.OpenVZ.debug: disable CONFIG_AMD_MEM_ENCRYPT
With CONFIG_AMD_MEM_ENCRYPT and KASAN enabled, the debug kernel does not boot on a modern AMD EPYC node. https://jira.sw.ru/browse/PSBM-93777 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Issues: PSBM-93777

Konstantin Khorenko
9f45321bb50 OpenVZ kernel rh7-3.10.0-957.10.1.vz7.94.16
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Michal Hocko
290eb4a984c ms/vmpressure: make sure there are no events queued after memcg is offlined
vmpressure is called synchronously from reclaim, where the target_memcg is guaranteed to be alive, but the eventfd is signaled from the work queue context. This means that the memcg (along with the vmpressure structure embedded in it) might go away while the work item is pending, which would result in a use-after-release bug. There are two possible ways to fix this. Either vmpressure pins me...
Issues: PSBM-93884

Chris Wilson
0745ffa3744 ms/mm/slub.c: run free_partial() outside of the kmem_cache_node->list_lock
With debugobjects enabled and using SLAB_DESTROY_BY_RCU, when a kmem_cache_node is destroyed the call_rcu() may trigger a slab allocation to fill the debug object pool (__debug_object_init:fill_pool). Everywhere but during kmem_cache_destroy(), discard_slab() is performed outside of the kmem_cache_node->list_lock and avoids a lockdep warning about potential recursion: ======================...
Issues: PSBM-93885

Konstantin Khorenko
8f2bfcd1dd0 kernfs: keep kernfs node alive for __kernfs_remove()
__kernfs_remove(), which is called under kernfs_mutex, assumes nobody kills the kernfs node while it's working on it and "get"s the current kernfs node for that. But we hit a warning in kernfs_get(), with kn->counter == 0 already:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 63923 at fs/kernfs/dir.c:377 kernfs_get+0x2f/0x40
...
Call Trace:
[<ffffffffa7f92e67>] dump_stack+0x19/0x1b
...
Issues: PSBM-93611

Konstantin Khorenko
42689274ab5 OpenVZ kernel rh7-3.10.0-957.10.1.vz7.94.15
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Kirill Tkhai
c6aa2b675fe ploop: Fix locking imballance
Move file_start_write() up. Fixes: 6b526783df00 ("ploop: Fallocate cluster in cached_submit() during hole reuse") https://jira.sw.ru/browse/PSBM-93873 Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Issues: PSBM-93873

Konstantin Khorenko
55dbe48bbce OpenVZ kernel rh7-3.10.0-957.10.1.vz7.94.14
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Kirill Tkhai
3209b332026 ploop: Add check that ploop don't grow endlessly
Even during resize, ploop must respect its limitations. Add a sanity check for that. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Kirill Tkhai
5b36c6bd1ec net: Allow autoloading conntrack nft-helper-* modules
Otherwise, CT migration fails if the destination node does not have these modules loaded. https://jira.sw.ru/browse/PSBM-90319 Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Issues: PSBM-90319

Pavel Butsykin
18533eab294 fs/fuse kio: add warning about jumbo chunks
KIO doesn't support jumbo chunks yet, so all requests to jumbo chunks are silently redirected to user space. Until support is added to KIO, it will be useful to see a message about this. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Acked-by: Alexey Kuznetsov <kuznet@virtuozzo.com> ================================== Patchset description: fix jumbo chunk warning Initially the w...

Pavel Butsykin
4046801685a fs/fuse kio: sync pcs_mds_sys_info struct
For some reason the pcs_mds_sys_info structure differs between the kernel and userspace. Let's synchronize it to avoid inaccuracies and discrepancies in the future. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Acked-by: Alexey Kuznetsov <kuznet@virtuozzo.com> ================================== Patchset description: fix jumbo chunk warning Initially the warning was added incorrectly du...

Pavel Butsykin
b57aff4506d fs/fuse kio: export io_locality
We will need this option for performance analysis. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> ===================== Patchset description: KIO performance fixes This patch-set aims to fix the performance issue with single-thread sequential async reads. https://pmc.acronis.com/browse/VSTOR-11050 Acked-by: Alexey Kuznetsov <kuznet@virtuozzo.com> Pavel Butsykin (4): fs/fuse ki...
Issues: VSTOR-11050

Pavel Butsykin
b6133ff6b30 fs/fuse kio: add missed sock write in pcs_sock_sendmsg()
If the write_queue list is empty, we need to write the ready data to the socket instead of rescheduling it. This helps maintain a balance between recv and send, because after rescheduling, the receive side is called first. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> ===================== Patchset description: KIO performance fixes This patch-set aims to fix the performance issu...
Issues: VSTOR-11050

Pavel Butsykin
d943b6f98c2 fs/fuse kio: relax congestion avoidance limits (backport from usermode)
Investigation of the VZ US-QA cluster shows that reducing the congestion window after idle periods makes the window reopen too slowly once data starts to flow again. So, introduce ssthresh to let the window open faster after idle periods. Maybe even this is not enough and the window should be opened more aggressively; further observations will show. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com...
Issues: VSTOR-11050
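The role of ssthresh here can be illustrated with the classic slow-start/congestion-avoidance scheme: after an idle period the window shrinks, but the remembered threshold lets it grow by one per completion (roughly doubling per round trip) until it is back where it was, after which it grows linearly. This is a generic sketch under that assumption, not the KIO implementation; the struct, its fields, CWND_MIN and both functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical congestion state; names are illustrative only. */
struct cong_state {
    uint32_t cwnd;     /* current window, in requests */
    uint32_t ssthresh; /* slow-start threshold */
    uint32_t cwnd_max; /* hard upper bound on the window */
    uint32_t acked;    /* completions counted toward linear growth */
};

#define CWND_MIN 1u

/* Idle period: shrink the window but remember how large it was. */
static void cong_on_idle(struct cong_state *cs)
{
    if (cs->cwnd > cs->ssthresh)
        cs->ssthresh = cs->cwnd;
    cs->cwnd = CWND_MIN;
    cs->acked = 0;
}

/* One request completed: grow the window. */
static void cong_on_ack(struct cong_state *cs)
{
    if (cs->cwnd >= cs->cwnd_max)
        return;
    if (cs->cwnd < cs->ssthresh) {
        cs->cwnd++; /* slow start: fast reopen up to the old size */
        return;
    }
    /* congestion avoidance: +1 only after a full window of completions */
    if (++cs->acked >= cs->cwnd) {
        cs->acked = 0;
        cs->cwnd++;
    }
}
```

Without the threshold, the window would restart in linear growth after every idle period, which matches the "too slow window open" symptom the message describes.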

Pavel Butsykin
ae9c6c60aa5 fs/fuse kio: fix a typo in worth_to_grow()
The function was supposed to return true if less than netlat_cutoff has elapsed since the request was sent. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> ===================== Patchset description: KIO performance fixes This patch-set aims to fix the performance issue with single-thread sequential async reads. https://pmc.acronis.com/browse/VSTOR-11050 Acked-by: Alexey Kuzn...
Issues: VSTOR-11050
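The intended semantics of worth_to_grow() reduce to a single comparison. The signature below (send time, current time, cutoff, all in the same monotonic unit) is an assumption made for illustration; the real function takes KIO-specific arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical signature: all times in the same monotonic unit. */
static bool worth_to_grow(uint64_t sent_at, uint64_t now, uint64_t netlat_cutoff)
{
    /* True only while the request is "young": strictly less than
     * netlat_cutoff has elapsed since it was sent. */
    return now - sent_at < netlat_cutoff;
}
```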

Konstantin Khorenko
7449d6da90a config.OpenVZ.minimal: enable R8169, PATA_ATIIXP, X86_AMD_PLATFORM_DEVICE, NET_DEVLINK
For some reason NET_DEVLINK was not enabled in the minimal config, so bring it back. The other options are required to boot on one of our AMD nodes.
+CONFIG_NET_DEVLINK=y
+CONFIG_X86_AMD_PLATFORM_DEVICE=y
+CONFIG_PATA_ATIIXP=y
+CONFIG_R8169=y
https://jira.sw.ru/browse/PSBM-93812 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Issues: PSBM-93812

Konstantin Khorenko
ef5f88eb5f7 tty/vt: avoid high order pages allocation on GIO_UNIMAP ioctl
GIO_UNIMAP can easily result in a high-order allocation; a 6th-order allocation was seen on radeondrmfb:
fbcon: radeondrmfb (fb0) is primary device
Console: switching to colour frame buffer device 160x64
radeon 0000:01:05.0: fb0: radeondrmfb frame buffer device
WARNING: CPU: 0 PID: 78661 at mm/page_alloc.c:3532 __alloc_pages_nodemask+0x1b1/0x600
order 6 >= 3, gfp 0x40d0
At the same time ...
Issues: PSBM-93812

Pavel Butsykin
93b7b846b6b fs/fuse kio: export fastpath protocol version
To move the fallback decision logic to user space, export the fastpath protocol version. https://jira.sw.ru/browse/PSBM-93637 Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Issues: PSBM-93637

Konstantin Khorenko
b9cb7cedbe7 mm/netlink: Make all netlink skb memory be kmem accounted
We are now able to account vmalloc() memory, so drop the prohibition on using vmalloc() for big netlink messages in Containers and switch the allocation flags to their accountable versions. As a side effect, we get rid of high-order page allocations for big netlink messages. Fixes: 84708b8d44e9 ("mm/netlink: Make all in-cg memory be kmem accounted") https://jira.sw.ru/browse/PSBM-93761 Signed-...
Issues: PSBM-93761

Konstantin Khorenko
732455ed5c9 OpenVZ kernel rh7-3.10.0-957.10.1.vz7.94.13
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Kirill Tkhai
98c63fbc561 ploop: Make PLOOP_IOC_FREEBLKS always return 0
This ioctl passes balloon file blocks to the kernel. The new native discard scheme should send discards to these blocks. For now, make it return 0 to keep the tests passing (later, a loop sending discards to these blocks will need to be implemented). Note that the ioctl won't return any data to userspace anymore, since in the new scheme it's not needed to call PLOOP_IOC_RELOCBLKS after that (everything should be ...

Kirill Tkhai
cdc9c8f4750 ploop: Disable ioctl(PLOOP_IOC_BALLOON)
This ioctl implicitly enters discard maintenance mode. The discard logic has been rewritten, so we disable it. v2: Keep entering PLOOP_MNTN_BALLOON, since it protects ploop against entering another maintenance mode (grow, etc). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Kirill Tkhai
00f131b7082 ploop: Add native discard support parameter
This adds a way to determine whether the driver supports native discard (without maintenance mode). Currently, we expose features as module parameters, a practice that started with "large_disk_support". This patch continues that approach, but the parameter is made RW to allow disabling the feature on the fly. https://jira.sw.ru/browse/PSBM-93731 Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Issues: PSBM-93731

Kirill Gorkunov
7ac4df2b35d prctl: Fix false positive in validate_prctl_map
While validating a new map, we require @start_data to be strictly less than @end_data, which is fine for regular applications (this is why this nit didn't trigger for so long). These members are set by executable loaders such as ELF handlers; still, it is perfectly valid to have a loadable data section with zero size in the file, in which case start_data is equal to end_data once the kernel loader f...
Issues: PSBM-93526
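The false positive comes down to using a strict comparison where an empty range is legitimate. A minimal sketch of the relaxed check, assuming a hypothetical struct that holds only the two data-segment fields (the real struct prctl_mm_map carries many more):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of a prctl memory map: only the data range. */
struct data_range {
    unsigned long start_data;
    unsigned long end_data;
};

static bool data_range_valid(const struct data_range *m)
{
    /* A strict "<" rejects start_data == end_data, which a loader
     * legitimately produces for a zero-size data section. "<=" accepts
     * the empty range and still rejects an inverted one. */
    return m->start_data <= m->end_data;
}
```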

Konstantin Khorenko
ba5547a703c OpenVZ kernel rh7-3.10.0-957.10.1.vz7.94.12
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Konstantin Khorenko
e30e977ad5f proc/vestat: show correct maxlat in /proc/vz/vestat
Don't show a pointer as a latency value; it does not look valid. Fixes: 9ecc9b390a5d ("/proc/<pid>/vz_latency: Show maximal allocation latency in the last second.") https://jira.sw.ru/browse/PSBM-93675 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Issues: PSBM-93675

Martin Zhang
9fcf5efe9ec ms/net: use skb_clone to avoid alloc_pages failure.
1. The new skb only needs dst and the IP address (v4 or v6). 2. skb_copy may need high-order pages, which are very rare on a long-running server. Signed-off-by: Junwei Zhang <linggao.zjw@alibaba-inc.com> Signed-off-by: Martin Zhang <martinbj2008@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> https://jira.sw.ru/browse/PSBM-93713 (cherry picked ...
Issues: PSBM-93713

Eric Dumazet
36e86920560 ms/tcp: fix potential huge kmalloc() calls in TCP_REPAIR
tcp_send_rcvq() is used for re-injecting data into the tcp receive queue. Problems:
- No check against size is performed, allowing a user to fool the kernel into attempting very large memory allocations, eventually triggering OOM when memory is fragmented.
- In case of a fault during the copy, we do not return the correct errno.
Let's use alloc_skb_with_frags() to cook optimal skbs. Fixes: 292e8d8c8538 ("...
Issues: PSBM-93672

Andrey Ryabinin
85d3da078dd net/skuff: WARN if kmalloc_reserve() fails to allocate memory.
Commit c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves") removed the memory allocation failure message in cases where pfmemalloc reserves are not allowed to be used. A warning makes it easier to observe network performance issues that might be caused by allocation failures. So remove __GFP_NOWARN unless we can use the fallback allocation with __GFP_MEMALLOC. Also remove __GFP_NOMEMALLOC ...
Issues: VSTOR-21390

Andrey Ryabinin
fe0b01d2eea net/skbuff: Don't waste memory reserves
We were observing network performance issues due to packets being dropped by sk_filter_trim_cap() since the 'skb' was allocated from pfmemalloc reserves:
    /*
     * If the skb was allocated from pfmemalloc reserves, only
     * allow SOCK_MEMALLOC sockets to use it as this socket is
     * helping free memory
     */
    if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
        retu...
Issues: VSTOR-21390

Peter Shier
e2257ac3583 ms/KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
Bugzilla: 1671904 There are multiple code paths where an hrtimer may have been started to emulate an L1 VMX preemption timer that can result in a call to free_nested without an intervening L2 exit, where the hrtimer is normally cancelled. Unconditionally cancel it in free_nested to cover all cases. Embargoed until Feb 7th 2019. Signed-off-by: Peter Shier <pshier@google.com> Reported-by: Jim Matt...
Issues: 2 JIRA Issues

Jann Horn
68b39a9a930 ms/kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
kvm_ioctl_create_device() does the following:
1. creates a device that holds a reference to the VM object (with a borrowed reference, the VM's refcount has not been bumped yet)
2. initializes the device
3. transfers the reference to the device to the caller's file descriptor table
4. calls kvm_get_kvm() to turn the borrowed reference to the VM into a real reference
The ownership transfe...
Issues: 2 JIRA Issues
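The hazard in the ordering above is that step 3 makes the device visible to userspace before step 4 makes the VM reference real, so a concurrent close() of the new fd can drop the device (and its borrowed VM reference) first. A simplified userspace model of the safe ordering, taking the reference before publishing and dropping it if publishing fails, is sketched below. struct vm, vm_get(), vm_put(), publish_fd() and create_device() are hypothetical stand-ins for the KVM objects and helpers; device teardown on failure is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VM object and its refcount. */
struct vm { int refcount; };

static void vm_get(struct vm *vm) { vm->refcount++; }
static void vm_put(struct vm *vm) { vm->refcount--; }

/* Hypothetical stand-in for installing the device fd; may fail. */
static int publish_fd(bool fail) { return fail ? -1 : 3; }

static int create_device(struct vm *vm, bool publish_fails)
{
    /* Take the real VM reference BEFORE the fd becomes visible: once it
     * is published, userspace may close it at any time. */
    vm_get(vm);
    int fd = publish_fd(publish_fails);
    if (fd < 0) {
        vm_put(vm); /* undo so the count stays balanced on failure */
        return fd;
    }
    return fd; /* the device now owns one counted VM reference */
}
```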

Konstantin Khorenko
f93b00328df OpenVZ kernel rh7-3.10.0-957.10.1.vz7.94.11
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Kirill Tkhai
eb55312dac3 ms/fuse: Wake up req->waitq of only if not background
ms commit 5e0fed717a38 Currently, we wait on req->waitq only in the request_wait_answer() function, and it's never used for background requests. Since wake_up() is not a lightweight macro but expands into a real function call whose locking operations take some CPU cycles, let's avoid calling it when we know for certain it's useless. Signed-off-by: Mikl...

Kirill Tkhai
0c05175bb91 ms/fuse: Optimize request_end() by not taking fiq->waitq.lock
ms commit 217316a60101 We take the global fiq->waitq.lock every time we are in this function, but interrupted requests are just a small subset of all requests. This patch optimizes request_end() so that it takes the lock only when it's really needed. queue_interrupt() needs a small change for that: after req is linked to the interrupt list, we do smp_mb() and check for FR_FINISHED again. In case of ...

Kirill Tkhai
40a3ff3f708 ms/fuse: Kill fasync only if interrupt is queued in queue_interrupt()
ms commit 8da6e9183275 We should send a signal only if the interrupt is really queued. Not a real problem, but this makes the code clearer and more intuitive. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Konstantin Khorenko
d508dab219c OpenVZ kernel rh7-3.10.0-957.10.1.vz7.85.10
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Kirill Tkhai
dad342fde54 ploop: Fix huge grow with head clusters relocation
If the resize is huge, the start of the data clusters is relocated. In this case we can rely on data_off_in_clusters(), so we should move all bits from the new start bit to 0 (this would be a fix). But this is inconvenient and makes the code complicated. So, we won't try to save 1 / (8 * 1024 * 1024) of the bitmap size like we do now, and we will populate the bitmap with absolute bit numbers (from start of ...
Issues: PSBM-93243

Kirill Tkhai
60b4323aebd ploop: Populate and maintain holes bitmap
The holes bitmap is needed to allocate the next free cluster; otherwise we don't know where to take the cluster number from. TODO: Flag to handle broken allocs (set bit back) Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> ===================== Patchset description: ploop: Discard with zeroing of ploop1 indexes support https://jira.sw.ru/browse/PSBM-92367 https://pmc.acronis.com/browse/VS...
Issues: 2 JIRA Issues
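A holes bitmap turns "where does the next cluster come from" into "find the next set bit and clear it". The sketch below assumes a set bit means the cluster with that absolute number is a hole, as in the "absolute numbers of bits" approach described for dad342fde54. All names are hypothetical; kernel code would use helpers such as find_next_bit() rather than this bit-by-bit loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical layout: bit N set means cluster N (absolute) is a hole. */
static void hole_set(unsigned long *bm, size_t n)
{
    bm[n / BITS_PER_WORD] |= 1UL << (n % BITS_PER_WORD);
}

static void hole_clear(unsigned long *bm, size_t n)
{
    bm[n / BITS_PER_WORD] &= ~(1UL << (n % BITS_PER_WORD));
}

static int hole_test(const unsigned long *bm, size_t n)
{
    return (bm[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL;
}

/* Allocate the first hole at or after start; returns nbits if none. */
static size_t alloc_cluster(unsigned long *bm, size_t nbits, size_t start)
{
    for (size_t n = start; n < nbits; n++) {
        if (hole_test(bm, n)) {
            hole_clear(bm, n); /* the hole is now in use */
            return n;
        }
    }
    return nbits;
}
```

The TODO in the message (setting a bit back on a failed allocation) would correspond to calling hole_set() again for the returned cluster.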

Kirill Tkhai
6b526783df0 ploop: Fallocate cluster in cached_submit() during hole reuse
__map_extent_bmap() is for the raw format, where we don't know whether a cluster is present. Ploop1 must allocate all the space at the beginning of the cached_submit() function; otherwise, we can't control what is going on. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> ===================== Patchset description: ploop: Discard with zeroing of ploop1 indexes support https://jira.sw.ru/br...
Issues: 2 JIRA Issues

Kirill Tkhai
fc532d43521 ploop: Zero indexes on discard
Switch preq into the PLOOP_E_DATA_WBI state to continue execution after the discard's write is finished, and zero the index at that stage. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> ===================== Patchset description: ploop: Discard with zeroing of ploop1 indexes support https://jira.sw.ru/browse/PSBM-92367 https://pmc.acronis.com/browse/VSTOR-19972 Kirill Tkhai (10): ploop: Export m...
Issues: 2 JIRA Issues

Kirill Tkhai
5b417b4fe22 ploop: Add .complete_merge method
It will be used to reallocate the holes bitmap after a merge. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> ===================== Patchset description: ploop: Discard with zeroing of ploop1 indexes support https://jira.sw.ru/browse/PSBM-92367 https://pmc.acronis.com/browse/VSTOR-19972 Kirill Tkhai (10): ploop: Export map defines to separate header file ploop: Make submit_alloc(...
Issues: 2 JIRA Issues

Kirill Tkhai
e3bf8d5acf2 ploop: Introduce data_off_in_clusters() helper
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> ===================== Patchset description: ploop: Discard with zeroing of ploop1 indexes support https://jira.sw.ru/browse/PSBM-92367 https://pmc.acronis.com/browse/VSTOR-19972 Kirill Tkhai (10): ploop: Export map defines to separate header file ploop: Make submit_alloc() return int value ploop: Introduce ploop_submit_all...
Issues: 2 JIRA Issues