OpenVZ Source code / vzkernel

Author	Commit	Message	Commit date	Issues
Konstantin Khorenko	567f230d52c	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.17 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	27 Mar 2024
Alexey Kuznetsov	94358e57e7a	fs/fuse: pcs: improve workqueue scheduling. Add configurable flags to control workqueue scheduling and job forwarding to workqueues. 1. enable_cpu_wq - create the second workqueue pool, which is intended to perform cpu intensive never sleeping works. Default: 1 2. worker_flags - allow to tune flags on main workqueue. 0 - nothing. Default 1 - WQ_CPU_INTENSIVE 2 - WQ_UNBOUND 3. cpu_worker_flags - like worker_flags, but for...	27 Mar 2024	VSTOR-83607
Alexey Kuznetsov	647f0205973	fs/fuse: pcs: force use of crc32c-intel. This is a workaround, we try to load crc32c-intel tfm explicitly. And when it is present (and it is), it will be used. It is enough for now. Nevertheless, this small patch deserves some comments. I see this as a problem to be solved. The problem is that at the time when fuse module is loaded module crc32c-intel is not, so fuse uses very slow default crc implementation. I do not know any good solutio...	27 Mar 2024	VSTOR-83607
Alexey Kuznetsov	46f5388b054	fs/fuse: pcs: badly aligned requests were not handled correctly. Mistakenly set error condition was not properly reset and did not allow properly finished request to complete clean. https://pmc.acronis.work/browse/VSTOR-83928 Signed-off-by: Alexey Kuznetsov <kuznet@acronis.com> Feature: vStorage	27 Mar 2024	VSTOR-83928
Konstantin Khorenko	380f2e4d674	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.16 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	11 Mar 2024
Keith Busch	ad39b46573b	ms/nvme-pci: add BOGUS_NID for Intel 0a54 device. These ones claim cmic and nmic capable, so need special consideration to ignore their duplicate identifiers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217981 Reported-by: welsh@cassens.com Signed-off-by: Keith Busch <kbusch@kernel.org> https://pmc.acronis.work/browse/TTASK-64922 (cherry picked from commit 5c3f4066462a5f6cac04d3dd81c9f551fabbc6c7) Signed-off-by: Konstantin Khorenko <kh...	12 Oct 2023	TTASK-64922
Konstantin Khorenko	6fcea7107c7	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.15 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	08 Mar 2024
Alexey Kuznetsov	a39ef62694a	fs/fuse/pcs: remove useless fuse call. fuse makes an entirely noop yet wasting lots of time call to user space every fsync(). Actually, it was a surprise from mainstream, which used to corrupt vstorage in vz9 and late vz7. We put a strut to prevent corruption bd7d3266.. "fuse: illegal access to file in vstorage" Now it is time to remove this strut completely altogether with this nonsensical call. NOTE: for vstorage only, flag close_...	27 Feb 2024	VSTOR-79527
Alexey Kuznetsov	acfd366ca1f	fs/ext4: relax inappropriate memory alignment check. The bug is ancient, ext4 has been broken in mainstream since 2014, I guess this means nobody cares. Though right place of this patch is mainstream of course. It destroys performance of direct io in vz9 compared to vz7, where this place was still correct. Note, our memory buffers are aligned to 512 and we physically cannot get it even more coarse and there are exactly no reasons it would be not...	27 Feb 2024	VSTOR-79527
Pavel Tikhomirov	6ceb6de48cc	mm: migrate page private for high-order folios in swap cache correctly. We have a check in do_swap_page that page from lookup_swap_cache should have private field equal to the swapcache index we searched it at (page_private(page) != entry.val). So folio_migrate_mapping should preserve page private for each page of a huge folio to satisfy this check else we get infinite loop in: +-> mmap_read_lock +-> __get_user_pages_locked +-> for-loop # taken once ...	06 Mar 2024	PSBM-153264
Charan Teja Kalla	15c99f6db9e	ms/mm: migrate high-order folios in swap cache correctly. Large folios occupy N consecutive entries in the swap cache instead of using multi-index entries like the page cache. However, if a large folio is re-added to the LRU list, it can be migrated. The migration code was not aware of the difference between the swap cache and the page cache and assumed that a single xas_store() would be sufficient. This leaves potentially many stale pointers to th...	06 Mar 2024	PSBM-153264
Konstantin Khorenko	ef743830155	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.14 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	19 Feb 2024
Andrey Zhadchenko	eea68f04f8e	drivers/vhost/blk: increase max queues to 32. 8 queues is too few, for example vhost-scsi has 128. Increase to 32 queues max, as we do not want to eat too much memory. As for performance, 32 queues will top around 3M iops, which should be fine for almost all cases. https://virtuozzo.atlassian.net/browse/PSBM-152241 Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk ...	19 Feb 2024	PSBM-152241
Pavel Tikhomirov	9d5cbd19fa5	mm/vmscan: add rcu_read_lock to replace released shrinker_rwsem. After commit [1] we release shrinker_rwsem for nfs while processing do_shrink_slab, we need this to mitigate blocked shrinker_rwsem due to a hang in nfs shrinker. After that we lack shrinker_rwsem and rcu_read_lock in these stacks: +-< rcu_dereference_protected(lockdep_is_held(&shrinker_rwsem)) +-< shrinker_info_protected +-< xchg_nr_deferred_memcg +-< xchg_nr_deferred ...	06 Feb 2024	PSBM-153973
Konstantin Khorenko	18e8246e00e	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.12 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	09 Feb 2024
Konstantin Khorenko	731f0a067d1	drm: Disable modeset by default. 1. VHS and HCI are server products => nice graphical console output is not that important. 2. QXL driver is buggy, we face issues with L1 VM hangs during console writing using QXL driver. => let's disable framebuffer by default (like we always have "nomodeset" kernel boot option set). To reverse the situation, we add "modeset" kernel boot option which enables back framebuffer drivers u...	09 Feb 2024	VSTOR-81614
Konstantin Khorenko	4a866400040	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.11 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	05 Feb 2024
Alexey Kuznetsov	7dee4f82540	fs/fuse: hashed write buckets. Previous write record was 3.5G/sec and this ceiling could not be penetrated even though eventloop had lots of spare cpu. The bottleneck is diagnosed as saturation of thread copying requests from kernel. So, we have to switch to spreading it over multiple threads, similar to scheme used for reads. So, for writes we introduce two-level table, keyed by request size and by inode hash. New record is...	05 Feb 2024	VSTOR-79527
Konstantin Khorenko	2efaade9626	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.10 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	05 Feb 2024
Matthew Wilcox (Oracle)	b669125e80b	ms/mm/swap: convert add_to_swap_cache() to take a folio. With all callers using folios, we can convert add_to_swap_cache() to take a folio and use it throughout. Link: https://lkml.kernel.org/r/20220902194653.1739778-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit a4c366f01f10073e0220656561b875627ff7cd90) https://virtuozzo.atlassi...	01 Feb 2024	PSBM-153264
Matthew Wilcox (Oracle)	3bb87414bb5	ms/mm/swap: convert __read_swap_cache_async() to use a folio. Remove a few hidden (and one visible) calls to compound_head(). Link: https://lkml.kernel.org/r/20220902194653.1739778-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit a0d3374b070776e985bbd7b165b178fa688bf37a) Change: Also update vz specific hunk SetPageActive->folio_set_acti...	01 Feb 2024	PSBM-153264
Chen Aotian	74de53465d6	ms/netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT. For memory alloc that store user data from nla[NFTA_OBJ_USERDATA], use GFP_KERNEL_ACCOUNT is more suitable. mFixes: 33758c891479 ("memcg: enable accounting for nft objects") Signed-off-by: Chen Aotian <chenaotian2@163.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> https://virtuozzo.atlassian.net/browse/PSBM-153598 (cherry picked from commit af0acf22aea359e04412237d68787401f96bb58...	06 Apr 2023	PSBM-153598
Konstantin Khorenko	0a48c8a012c	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.9 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	24 Jan 2024
Alexey Kuznetsov	f297f03d878	fs/fuse: enhanced splice support, fixes. Two bugs have been noticed by Kui Liu <kui.liu@acronis.com>: 1. We used only 2 words of 8 in onstack copy of user array 2. fdput in error path was missing, we could leak open file when daemon would supply non-pipe file descriptor https://pmc.acronis.work/browse/VSTOR-79527 Fixes: 72dcce0c8d21 ("fs/fuse: enhanced splice support") Signed-off-by: Alexey Kuznetsov <kuznet@acronis.com> Featur...	23 Jan 2024	VSTOR-79527
Alexey Kuznetsov	12779e53c69	net: zerocopy for unix socket, fixups. We do not want to deal with SOCK_SEQPACKET sockets, as was noticed by Pavel Tikhomirov <ptikhomirov@virtuozzo.com>. Fallback for occasional splicing of zerocopied pages did not work, returned EINVAL. Not essential as we do not use it, still tests revealed this situation. So, repairing this. vstorage specific note: soon we enable zerocopy at server side and will have to choose between zerocopy ...	23 Jan 2024	VSTOR-79527
Konstantin Khorenko	3680b6bf6d6	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.8 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	23 Jan 2024
Pavel Tikhomirov	ac39f3b02e3	ms/netfilter: bridge: replace physindev with physinif in nf_bridge_info. An skb can be added to a neigh->arp_queue while waiting for an arp reply. Where original skb's skb->dev can be different to neigh's neigh->dev. For instance in case of bridging dnated skb from one veth to another, the skb would be added to a neigh->arp_queue of the bridge. As skb->dev can be reset back to nf_bridge->physindev and used, and as there is no explicit mechanism that prevents this p...	23 Jan 2024	PSBM-153269
Pavel Tikhomirov	a2baff1c605	ms/netfilter: propagate net to nf_bridge_get_physindev. This is a preparation patch for replacing physindev with physinif on nf_bridge_info structure. We will use dev_get_by_index_rcu to resolve device, when needed, and it requires net to be available. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit a54e721970...	23 Jan 2024	PSBM-153269
Pavel Tikhomirov	71ce14d3e16	ms/netfilter: nf_queue: remove excess nf_bridge variable. We don't really need nf_bridge variable here. And nf_bridge_info_exists is better replacement for nf_bridge_info_get in case we are only checking for existence. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit aeaa44075f8e49e2e0ad4507d925e690b7950145) http...	23 Jan 2024	PSBM-153269
Pavel Tikhomirov	9663dd170f2	ms/netfilter: nfnetlink_log: use proper helper for fetching physinif. We don't use physindev in __build_packet_message except for getting physinif from it. So let's switch to nf_bridge_get_physinif to get what we want directly. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit c3f9fd54cd87233f53bdf0e191a86b3a5e960e02) https:/...	23 Jan 2024	PSBM-153269
Pavel Tikhomirov	17ad03ac78b	drivers/vhost: fix missing rcu_read_lock in vhost_work_queue. In this stack: +-> vhost_vsock_dev_ioctl +-> vhost_vsock_start +-> vhost_work_queue +-> xas_find +-> xas_load +-> xas_start +-> xa_head +-> rcu_dereference_check We require either rcu_read_lock or xa_lock but have none. Let's fix it by calling xa_find, which is a wrapper for xas_find having proper rcu and also xas_retry ...	22 Jan 2024	PSBM-153264
Thomas Zeitlhofer	d9dd6ae87cd	ms/net: neigh: decrement the family specific qlen. Commit 0ff4eb3d5ebb ("neighbour: make proxy_queue.qlen limit per-device") introduced the length counter qlen in struct neigh_parms. There are separate neigh_parms instances for IPv4/ARP and IPv6/ND, and while the family specific qlen is incremented in pneigh_enqueue(), the mentioned commit decrements always the IPv4/ARP specific qlen, regardless of the currently processed family, in pneigh_queu...	15 Nov 2022	2 Jira Issues
Konstantin Khorenko	9100cbc72cf	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.7 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	11 Jan 2024
Alexey Kuznetsov	72dcce0c8d2	fs/fuse: enhanced splice support. Unfortunately, existing support of splice in fuse is completely useless, it has many flaws, each of them is fatal, even taken separately. - it passes only single splice, which requires of user space to prepare one more splice to merge header. - ... and does not allow to use splices coming from TCP as they can be huge and do not fit to single pipe buffer. - it uses kvmalloc(!!!) for temp bu...	09 Jan 2024	VSTOR-79527
Alexey Kuznetsov	137e8807d5b	net: zerocopy over unix sockets. Observation is that af_unix sockets today became slower and eat a lot more cpu than 100G ethernet. So, implement MSG_ZEROCOPY over af_unix sockets to be able to talk to local services without collapse of performance. Unexpectedly, this makes sense! F.e. zerocopy cannot be done in TCP over loopback, because skbs when passing over loopback change ownership. But unix sockets traditionally impl...	09 Jan 2024	VSTOR-79527
Alexey Kuznetsov	105a147a0c2	fs/fuse: fuse queue routing. Generic fuse multiqueue support. It improves previously existing per-cpu routing and makes it extensible. At the moment three routing tactics are implemented and tested: 1. Old per-cpu routing. Deprecated, but left for performance comparisons. Also it still can be good in some situations. 2. Size buckets to support large fuse writes. Userspace selects it as default for fuse writes. 3. Ha...	09 Jan 2024	VSTOR-79527
Konstantin Khorenko	3cb059b77cd	OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.6 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>	09 Jan 2024
Konstantin Khorenko	b6a336efca8	configs: Enable in-kernel accelerator for virtio-blk guests in configs dir. We store precompiled config files for the convenience, so enable VHOST_BLK module there as well. https://virtuozzo.atlassian.net/browse/PSBM-139414 https://virtuozzo.atlassian.net/browse/PSBM-152375 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests	09 Jan 2024	2 Jira Issues
Konstantin Khorenko	04de95c1036	configs: Enable in-kernel accelerator for virtio-blk guests. https://virtuozzo.atlassian.net/browse/PSBM-139414 https://virtuozzo.atlassian.net/browse/PSBM-152375 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests	15 Sep 2022	2 Jira Issues
Andrey Zhadchenko	e21e142cb12	drivers/vhost: vhost-blk accelerator for virtio-blk guests. Although QEMU virtio is quite fast, there is still some room for improvements. Disk latency can be reduced if we handle virtio-blk requests in host kernel instead of passing them to QEMU. The patch adds vhost-blk kernel module to do so. Some test setups: fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128 QEMU drive options: cache=none filesystem: xfs SSD: \| r...	04 Jan 2024	4 Jira Issues
Andrey Zhadchenko	5f60cbb11d0	drivers/vhost: add ioctl to increase the number of workers. Finally add ioctl to allow userspace to create additional workers. For now only allow to increase the number of workers. https://jira.sw.ru/browse/PSBM-139414 Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> ====== Patchset description: vhost-blk: in-kernel accelerator for virtio-blk guests Although QEMU virtio-blk is quite fast, there is still some room for improvements. Dis...	04 Jan 2024	4 Jira Issues
Mike Christie	5271bf51f1b	ms/vhost: replace single worker pointer with xarray. The next patch allows userspace to create multiple workers per device, so this patch replaces the vhost_worker pointer with an xarray so we can store multiple workers and look them up. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-15-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ======== Also rework vhost_work_...	04 Jan 2024	2 Jira Issues
Mike Christie	ab2f6961e8b	ms/vhost: convert poll work to be vq based. This has the drivers pass in their poll to vq mapping and then converts the core poll code to use the vq based helpers. In the next patches we will allow vqs to be handled by different workers, so to allow drivers to execute operations like queue, stop, flush, etc on specific polls/vqs we need to know the mappings. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <2023062...	04 Jan 2024	2 Jira Issues
Andrey Zhadchenko	1bcb1e6e6d1	drivers/vhost: attach cgrous to specififc worker. Update vhost_attach_cgroups() to operate with specific worker rather than global vhost device functions https://virtuozzo.atlassian.net/browse/PSBM-152375 https://virtuozzo.atlassian.net/browse/PSBM-139414 Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests	04 Jan 2024	2 Jira Issues
Mike Christie	b33e9080a1e	ms/vhost: take worker or vq for flushing. This patch has the core work flush function take a worker. When we support multiple workers we can then flush each worker during device removal, stoppage, etc. It also adds a helper to flush specific virtqueues, so vhost-scsi can flush IO vqs from its ctl vq. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-7-michael.christie@oracle.com> Signed-off-...	04 Jan 2024	2 Jira Issues
Mike Christie	db46389987f	ms/vhost: take worker or vq instead of dev for queueing. This patch has the core work queueing function take a worker for when we support multiple workers. It also adds a helper that takes a vq during queueing so modules can control which vq/worker to queue work on. This temp leaves vhost_work_queue. It will be removed when the drivers are converted in the next patches. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <2023062...	04 Jan 2024	2 Jira Issues
Mike Christie	312bc762cd9	ms/vhost, vhost_net: add helper to check if vq has work. In the next patches each vq might have different workers so one could have work but others do not. For net, we only want to check specific vqs, so this adds a helper to check if a vq has work pending and converts vhost-net to use it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230626232307.97930-5-michael.christie@oracle....	04 Jan 2024	2 Jira Issues
Mike Christie	ee7a2282666	ms/vhost: add vhost_worker pointer to vhost_virtqueue. This patchset allows userspace to map vqs to different workers. This patch adds a worker pointer to the vq so in later patches in this set we can queue/flush specific vqs and their workers. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ========= (cherry picked from...	04 Jan 2024	2 Jira Issues
Mike Christie	a25b680efbd	ms/vhost: dynamically allocate vhost_worker. This patchset allows us to allocate multiple workers, so this has us move from the vhost_worker that's embedded in the vhost_dev to dynamically allocating it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ========= Half of this commit is already present. Add the res...	04 Jan 2024	2 Jira Issues
Konstantin Khorenko	e10e2fafaa6	FD: vhost-blk: in-kernel accelerator for virtio-blk guests. https://jira.sw.ru/browse/PSBM-139414 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests	04 Jan 2024	PSBM-139414
