OpenVZ Source code

vzkernel

Public
Author | Commit | Message | Commit date | Issues
Konstantin Khorenko
567f230d52c  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.17
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Alexey Kuznetsov, Konstantin Khorenko
94358e57e7a  fs/fuse: pcs: improve workqueue scheduling
  Add configurable flags to control workqueue scheduling and job forwarding to workqueues.
  1. enable_cpu_wq - create the second workqueue pool, which is intended to perform cpu intensive never sleeping works. Default: 1
  2. worker_flags - allow to tune flags on main workqueue. 0 - nothing. Default 1 - WQ_CPU_INTENSIVE 2 - WQ_UNBOUND
  3. cpu_worker_flags - like worker_flags, but for...
  Issues: VSTOR-83607
Alexey Kuznetsov, Konstantin Khorenko
647f0205973  fs/fuse: pcs: force use of crc32c-intel
  This is workaround, we try to load crc32c-intel tfm explicitly. And when it is present (and it is), it will be used. It is enough for now. Nevertheless, this small patch deserves some comments. I see this as a problem to be solved. The problem is that at time when fuse module is loaded module crc32c-intel is not, so fuse uses very slow default crc implementation. I do not know any good solutio...
  Issues: VSTOR-83607
Alexey Kuznetsov, Konstantin Khorenko
46f5388b054  fs/fuse: pcs: badly aligned requests were not handled correctly
  Mistakenly set error condition was not properly reset and did not allow properly finished request to complete clean.
  https://pmc.acronis.work/browse/VSTOR-83928
  Signed-off-by: Alexey Kuznetsov <kuznet@acronis.com>
  Feature: vStorage
  Issues: VSTOR-83928
Konstantin Khorenko
380f2e4d674  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.16
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Keith Busch, Konstantin Khorenko
ad39b46573b  ms/nvme-pci: add BOGUS_NID for Intel 0a54 device
  These ones claim cmic and nmic capable, so need special consideration to ignore their duplicate identifiers.
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=217981
  Reported-by: welsh@cassens.com
  Signed-off-by: Keith Busch <kbusch@kernel.org>
  https://pmc.acronis.work/browse/TTASK-64922
  (cherry picked from commit 5c3f4066462a5f6cac04d3dd81c9f551fabbc6c7)
  Signed-off-by: Konstantin Khorenko <kh...
  Issues: TTASK-64922
Konstantin Khorenko
6fcea7107c7  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.15
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Alexey Kuznetsov, Konstantin Khorenko
a39ef62694a  fs/fuse/pcs: remove useless fuse call
  fuse makes an entirely noop yet wasting lots of time call to user space every fsync(). Actually, it was a surprise from mainstream, which used to corrupt vstorage in vz9 and late vz7. We put a strut to prevent corruption bd7d3266.. "fuse: illegal access to file in vstorage" Now it is time to remove this strut completely altogether with this nonsensical call. NOTE: for vstorage only, flag close_...
  Issues: VSTOR-79527
Alexey Kuznetsov, Konstantin Khorenko
acfd366ca1f  fs/ext4: relax inappropriate memory alignment check
  The bug is ancient, ext4 has been broken in mainstream since 2014, I guess this means nobody cares. Though right place of this patch is mainstream of course. It destroys performance of direct io in vz9 compared to vz7, where this place was still correct. Note, our memory buffers are aligned to 512 and we physically cannot get it even more coarse and there are exactly no reasons it would be not...
  Issues: VSTOR-79527
Pavel Tikhomirov, Konstantin Khorenko
6ceb6de48cc  mm: migrate page private for high-order folios in swap cache correctly
  We have a check in do_swap_page that page from lookup_swap_cache should have private field equal to the swapcache index we searched it at (page_private(page) != entry.val). So folio_migrate_mapping should preserve page private for each page of a huge folio to satisfy this check else we get infinite loop in: +-> mmap_read_lock +-> __get_user_pages_locked +-> for-loop # taken once ...
  Issues: PSBM-153264
Charan Teja Kalla, Konstantin Khorenko
15c99f6db9e  ms/mm: migrate high-order folios in swap cache correctly
  Large folios occupy N consecutive entries in the swap cache instead of using multi-index entries like the page cache. However, if a large folio is re-added to the LRU list, it can be migrated. The migration code was not aware of the difference between the swap cache and the page cache and assumed that a single xas_store() would be sufficient. This leaves potentially many stale pointers to th...
  Issues: PSBM-153264
Konstantin Khorenko
ef743830155  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.14
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Andrey Zhadchenko, Konstantin Khorenko
eea68f04f8e  drivers/vhost/blk: increase max queues to 32
  8 queues is too few, for example vhost-scsi has 128. Increase to 32 queues max, as we do not want to eat too much memory. As for performance, 32 queues will top around 3M iops, which should be fine for almost all cases.
  https://virtuozzo.atlassian.net/browse/PSBM-152241
  Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
  Feature: vhost-blk: in-kernel accelerator for virtio-blk ...
  Issues: PSBM-152241
Pavel Tikhomirov, Konstantin Khorenko
9d5cbd19fa5  mm/vmscan: add rcu_read_lock to replace released shrinker_rwsem
  After commit [1] we release shrinker_rwsem for nfs while processing do_shrink_slab, we need this to mitigate blocked shrinker_rwsem due to a hang in nfs shrinker. After that we lack shrinker_rwsem and rcu_read_lock in these stacks: +-< rcu_dereference_protected(lockdep_is_held(&shrinker_rwsem)) +-< shrinker_info_protected +-< xchg_nr_deferred_memcg +-< xchg_nr_deferred ...
  Issues: PSBM-153973
Konstantin Khorenko
18e8246e00e  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.12
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Konstantin Khorenko
731f0a067d1  drm: Disable modeset by default
  1. VHS and HCI are server products => nice graphical console output is not that important.
  2. QXL driver is buggy, we face issues with L1 VM hangs during console writing using QXL driver.
  => let's disable framebuffer by default (like we always have "nomodeset" kernel boot option set). To reverse the situation, we add "modeset" kernel boot option which enables back framebuffer drivers u...
  Issues: VSTOR-81614
Konstantin Khorenko
4a866400040  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.11
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Alexey Kuznetsov, Konstantin Khorenko
7dee4f82540  fs/fuse: hashed write buckets
  Previous write record was 3.5G/sec and this ceiling could not be penetrated even though eventloop had lots of spare cpu. The bottleneck is diagnosed as saturation of thread copying requests from kernel. So, we have to switch to spreading it over multiple threads, similar to scheme used for reads. So, for writes we introduce two-level table, keyed by request size and by inode hash. New record is...
  Issues: VSTOR-79527
Konstantin Khorenko
2efaade9626  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.10
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Matthew Wilcox (Oracle), Konstantin Khorenko
b669125e80b  ms/mm/swap: convert add_to_swap_cache() to take a folio
  With all callers using folios, we can convert add_to_swap_cache() to take a folio and use it throughout.
  Link: https://lkml.kernel.org/r/20220902194653.1739778-13-willy@infradead.org
  Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  (cherry picked from commit a4c366f01f10073e0220656561b875627ff7cd90)
  https://virtuozzo.atlassi...
  Issues: PSBM-153264
Matthew Wilcox (Oracle), Konstantin Khorenko
3bb87414bb5  ms/mm/swap: convert __read_swap_cache_async() to use a folio
  Remove a few hidden (and one visible) calls to compound_head().
  Link: https://lkml.kernel.org/r/20220902194653.1739778-12-willy@infradead.org
  Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  (cherry picked from commit a0d3374b070776e985bbd7b165b178fa688bf37a)
  Change: Also update vz specific hunk SetPageActive->folio_set_acti...
  Issues: PSBM-153264
Chen Aotian, Konstantin Khorenko
74de53465d6  ms/netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT
  For memory alloc that store user data from nla[NFTA_OBJ_USERDATA], use GFP_KERNEL_ACCOUNT is more suitable.
  mFixes: 33758c891479 ("memcg: enable accounting for nft objects")
  Signed-off-by: Chen Aotian <chenaotian2@163.com>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  https://virtuozzo.atlassian.net/browse/PSBM-153598
  (cherry picked from commit af0acf22aea359e04412237d68787401f96bb58...
  Issues: PSBM-153598
Konstantin Khorenko
0a48c8a012c  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.9
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Alexey Kuznetsov, Konstantin Khorenko
f297f03d878  fs/fuse: enhanced splice support, fixes
  Two bugs have been noticed by Kui Liu <kui.liu@acronis.com>:
  1. We used only 2 words of 8 in onstack copy of user array
  2. fdput in error path was missing, we could leak open file when daemon would supply non-pipe file descriptor
  https://pmc.acronis.work/browse/VSTOR-79527
  Fixes: 72dcce0c8d21 ("fs/fuse: enhanced splice support")
  Signed-off-by: Alexey Kuznetsov <kuznet@acronis.com>
  Featur...
  Issues: VSTOR-79527
Alexey Kuznetsov, Konstantin Khorenko
12779e53c69  net: zerocopy for unix socket, fixups
  We do not want to deal with SOCK_SEQPACKET sockets, as was noticed by Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Fallback for occasional splicing of zerocopied pages did not work, returned EINVAL. Not essential as we do not use it, still tests revealed this situation. So, repairing this. vstorage specific note: soon we enable zerocopy at server side and will have to choose between zerocopy ...
  Issues: VSTOR-79527
Konstantin Khorenko
3680b6bf6d6  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.8
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Pavel Tikhomirov, Konstantin Khorenko
ac39f3b02e3  ms/netfilter: bridge: replace physindev with physinif in nf_bridge_info
  An skb can be added to a neigh->arp_queue while waiting for an arp reply. Where original skb's skb->dev can be different to neigh's neigh->dev. For instance in case of bridging dnated skb from one veth to another, the skb would be added to a neigh->arp_queue of the bridge. As skb->dev can be reset back to nf_bridge->physindev and used, and as there is no explicit mechanism that prevents this p...
  Issues: PSBM-153269
Pavel Tikhomirov, Konstantin Khorenko
a2baff1c605  ms/netfilter: propagate net to nf_bridge_get_physindev
  This is a preparation patch for replacing physindev with physinif on nf_bridge_info structure. We will use dev_get_by_index_rcu to resolve device, when needed, and it requires net to be available.
  Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
  Reviewed-by: Simon Horman <horms@kernel.org>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  (cherry picked from commit a54e721970...
  Issues: PSBM-153269
Pavel Tikhomirov, Konstantin Khorenko
71ce14d3e16  ms/netfilter: nf_queue: remove excess nf_bridge variable
  We don't really need nf_bridge variable here. And nf_bridge_info_exists is better replacement for nf_bridge_info_get in case we are only checking for existence.
  Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
  Reviewed-by: Simon Horman <horms@kernel.org>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  (cherry picked from commit aeaa44075f8e49e2e0ad4507d925e690b7950145)
  http...
  Issues: PSBM-153269
Pavel Tikhomirov, Konstantin Khorenko
9663dd170f2  ms/netfilter: nfnetlink_log: use proper helper for fetching physinif
  We don't use physindev in __build_packet_message except for getting physinif from it. So let's switch to nf_bridge_get_physinif to get what we want directly.
  Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
  Reviewed-by: Simon Horman <horms@kernel.org>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  (cherry picked from commit c3f9fd54cd87233f53bdf0e191a86b3a5e960e02)
  https:/...
  Issues: PSBM-153269
Pavel Tikhomirov, Konstantin Khorenko
17ad03ac78b  drivers/vhost: fix missing rcu_read_lock in vhost_work_queue
  In this stack: +-> vhost_vsock_dev_ioctl +-> vhost_vsock_start +-> vhost_work_queue +-> xas_find +-> xas_load +-> xas_start +-> xa_head +-> rcu_dereference_check We require either rcu_read_lock or xa_lock but have none. Let's fix it by calling a xa_find, which is a wrapper for xas_find having proper rcu and also xas_retry ...
  Issues: PSBM-153264
Thomas Zeitlhofer, Konstantin Khorenko
d9dd6ae87cd  ms/net: neigh: decrement the family specific qlen
  Commit 0ff4eb3d5ebb ("neighbour: make proxy_queue.qlen limit per-device") introduced the length counter qlen in struct neigh_parms. There are separate neigh_parms instances for IPv4/ARP and IPv6/ND, and while the family specific qlen is incremented in pneigh_enqueue(), the mentioned commit decrements always the IPv4/ARP specific qlen, regardless of the currently processed family, in pneigh_queu...
  Issues: 2 Jira Issues
Konstantin Khorenko
9100cbc72cf  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.7
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Alexey Kuznetsov, Konstantin Khorenko
72dcce0c8d2  fs/fuse: enhanced splice support
  Unfortunately, existing support of splice in fuse is completely useless, it has many flaws, each of them is fatal, even taken separately.
  - it passes only single splice, which requires of user space to prepare one more splice to merge header.
  - ... and does not allow to use splices coming from TCP as they can be huge and do not fit to single pipe buffer.
  - it uses kvmalloc(!!!) for temp bu...
  Issues: VSTOR-79527
Alexey Kuznetsov, Konstantin Khorenko
137e8807d5b  net: zerocopy over unix sockets
  Observation is that af_unix sockets today became slower and eat a lot of more cpu than 100G ethernet. So, implement MSG_ZEROCOPY over af_unix sockets to be able to talk to local services without collapse of performance. Unexpectedly, this makes sense! F.e. zerocopy cannot be done in TCP over loopback, because skbs when passing over loopback change ownership. But unix sockets traditionally impl...
  Issues: VSTOR-79527
Alexey Kuznetsov, Konstantin Khorenko
105a147a0c2  fs/fuse: fuse queue routing
  Generic fuse multiqueue support. It improves previously existing per-cpu routing and makes it extensible. At the moment three routing tactics are implemented and tested:
  1. Old per-cpu routing. Deprecated, but left for performance comparisons. Also it still can be good in some situations.
  2. Size buckets to support large fuse writes. Userspace selects it as default for fuse writes.
  3. Ha...
  Issues: VSTOR-79527
Konstantin Khorenko
3cb059b77cd  OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.6
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Konstantin Khorenko
b6a336efca8  configs: Enable in-kernel accelerator for virtio-blk guests in configs dir
  We store precompiled config files for the convenience, so enable VHOST_BLK module there as well.
  https://virtuozzo.atlassian.net/browse/PSBM-139414
  https://virtuozzo.atlassian.net/browse/PSBM-152375
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
  Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
  Issues: 2 Jira Issues
Konstantin Khorenko
04de95c1036  configs: Enable in-kernel accelerator for virtio-blk guests
  https://virtuozzo.atlassian.net/browse/PSBM-139414
  https://virtuozzo.atlassian.net/browse/PSBM-152375
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
  Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
  Issues: 2 Jira Issues
Andrey Zhadchenko, Konstantin Khorenko
e21e142cb12  drivers/vhost: vhost-blk accelerator for virtio-blk guests
  Although QEMU virtio is quite fast, there is still some room for improvements. Disk latency can be reduced if we handle virtio-blk requests in host kernel instead of passing them to QEMU. The patch adds vhost-blk kernel module to do so.
  Some test setups: fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128 QEMU drive options: cache=none filesystem: xfs SSD: | r...
  Issues: 4 Jira Issues
Andrey Zhadchenko, Konstantin Khorenko
5f60cbb11d0  drivers/vhost: add ioctl to increase the number of workers
  Finally add ioctl to allow userspace to create additional workers. For now only allow to increase the number of workers.
  https://jira.sw.ru/browse/PSBM-139414
  Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
  ======
  Patchset description: vhost-blk: in-kernel accelerator for virtio-blk guests
  Although QEMU virtio-blk is quite fast, there is still some room for improvements. Dis...
  Issues: 4 Jira Issues
Mike Christie, Konstantin Khorenko
5271bf51f1b  ms/vhost: replace single worker pointer with xarray
  The next patch allows userspace to create multiple workers per device, so this patch replaces the vhost_worker pointer with an xarray so we can store multiple workers and look them up.
  Signed-off-by: Mike Christie <michael.christie@oracle.com>
  Message-Id: <20230626232307.97930-15-michael.christie@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  ========
  Also rework vhost_work_...
  Issues: 2 Jira Issues
Mike Christie, Konstantin Khorenko
ab2f6961e8b  ms/vhost: convert poll work to be vq based
  This has the drivers pass in their poll to vq mapping and then converts the core poll code to use the vq based helpers. In the next patches we will allow vqs to be handled by different workers, so to allow drivers to execute operations like queue, stop, flush, etc on specific polls/vqs we need to know the mappings.
  Signed-off-by: Mike Christie <michael.christie@oracle.com>
  Message-Id: <2023062...
  Issues: 2 Jira Issues
Andrey Zhadchenko, Konstantin Khorenko
1bcb1e6e6d1  drivers/vhost: attach cgrous to specififc worker
  Update vhost_attach_cgroups() to operate with specific worker rather than global vhost device functions
  https://virtuozzo.atlassian.net/browse/PSBM-152375
  https://virtuozzo.atlassian.net/browse/PSBM-139414
  Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
  Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
  Issues: 2 Jira Issues
Mike Christie, Konstantin Khorenko
b33e9080a1e  ms/vhost: take worker or vq for flushing
  This patch has the core work flush function take a worker. When we support multiple workers we can then flush each worker during device removal, stoppage, etc. It also adds a helper to flush specific virtqueues, so vhost-scsi can flush IO vqs from its ctl vq.
  Signed-off-by: Mike Christie <michael.christie@oracle.com>
  Message-Id: <20230626232307.97930-7-michael.christie@oracle.com>
  Signed-off-...
  Issues: 2 Jira Issues
Mike Christie, Konstantin Khorenko
db46389987f  ms/vhost: take worker or vq instead of dev for queueing
  This patch has the core work queueing function take a worker for when we support multiple workers. It also adds a helper that takes a vq during queueing so modules can control which vq/worker to queue work on. This temp leaves vhost_work_queue. It will be removed when the drivers are converted in the next patches.
  Signed-off-by: Mike Christie <michael.christie@oracle.com>
  Message-Id: <2023062...
  Issues: 2 Jira Issues
Mike Christie, Konstantin Khorenko
312bc762cd9  ms/vhost, vhost_net: add helper to check if vq has work
  In the next patches each vq might have different workers so one could have work but others do not. For net, we only want to check specific vqs, so this adds a helper to check if a vq has work pending and converts vhost-net to use it.
  Signed-off-by: Mike Christie <michael.christie@oracle.com>
  Acked-by: Jason Wang <jasowang@redhat.com>
  Message-Id: <20230626232307.97930-5-michael.christie@oracle....
  Issues: 2 Jira Issues
Mike Christie, Konstantin Khorenko
ee7a2282666  ms/vhost: add vhost_worker pointer to vhost_virtqueue
  This patchset allows userspace to map vqs to different workers. This patch adds a worker pointer to the vq so in later patches in this set we can queue/flush specific vqs and their workers.
  Signed-off-by: Mike Christie <michael.christie@oracle.com>
  Message-Id: <20230626232307.97930-4-michael.christie@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  =========
  (cherry picked from...
  Issues: 2 Jira Issues
Mike Christie, Konstantin Khorenko
a25b680efbd  ms/vhost: dynamically allocate vhost_worker
  This patchset allows us to allocate multiple workers, so this has us move from the vhost_worker that's embedded in the vhost_dev to dynamically allocating it.
  Signed-off-by: Mike Christie <michael.christie@oracle.com>
  Message-Id: <20230626232307.97930-3-michael.christie@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  =========
  Half of this commit is already present. Add the res...
  Issues: 2 Jira Issues
Konstantin Khorenko
e10e2fafaa6  FD: vhost-blk: in-kernel accelerator for virtio-blk guests
  https://jira.sw.ru/browse/PSBM-139414
  Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
  Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
  Issues: PSBM-139414