Commits
Author | Commit | Message | Commit date | Issues | |
---|---|---|---|---|---|
Konstantin Khorenko | 567f230d52c | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.17 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Alexey Kuznetsov | 94358e57e7a | fs/fuse: pcs: improve workqueue scheduling. Add configurable flags to control workqueue scheduling and job forwarding to workqueues. 1. enable_cpu_wq - create a second workqueue pool, intended for CPU-intensive, never-sleeping work. Default: 1. 2. worker_flags - tune flags on the main workqueue: 0 - nothing (default); 1 - WQ_CPU_INTENSIVE; 2 - WQ_UNBOUND. 3. cpu_worker_flags - like worker_flags, but for... | VSTOR-83607 | ||
Alexey Kuznetsov | 647f0205973 | fs/fuse: pcs: force use of crc32c-intel. This is a workaround: we try to load the crc32c-intel tfm explicitly, and when it is present (and it is), it will be used. It is enough for now. Nevertheless, this small patch deserves some comments. I see this as a problem to be solved. The problem is that at the time the fuse module is loaded, the crc32c-intel module is not, so fuse uses the very slow default crc implementation. I do not know any good solutio... | VSTOR-83607 | ||
Alexey Kuznetsov | 46f5388b054 | fs/fuse: pcs: badly aligned requests were not handled correctly. A mistakenly set error condition was not properly reset and did not allow a properly finished request to complete cleanly. https://pmc.acronis.work/browse/VSTOR-83928 Signed-off-by: Alexey Kuznetsov <kuznet@acronis.com> Feature: vStorage | VSTOR-83928 | ||
Konstantin Khorenko | 380f2e4d674 | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.16 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Keith Busch | ad39b46573b | ms/nvme-pci: add BOGUS_NID for Intel 0a54 device. These ones claim cmic and nmic capable, so need special consideration to ignore their duplicate identifiers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217981 Reported-by: welsh@cassens.com Signed-off-by: Keith Busch <kbusch@kernel.org> https://pmc.acronis.work/browse/TTASK-64922 (cherry picked from commit 5c3f4066462a5f6cac04d3dd81c9f551fabbc6c7) Signed-off-by: Konstantin Khorenko <kh... | TTASK-64922 | ||
Konstantin Khorenko | 6fcea7107c7 | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.15 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Alexey Kuznetsov | a39ef62694a | fs/fuse/pcs: remove useless fuse call. fuse makes an entirely no-op, yet very time-wasting, call to user space on every fsync(). Actually, it was a surprise from mainstream, which used to corrupt vstorage in vz9 and late vz7. We put in a strut to prevent corruption: bd7d3266.. "fuse: illegal access to file in vstorage". Now it is time to remove this strut completely, together with this nonsensical call. NOTE: for vstorage only, flag close_... | VSTOR-79527 | ||
Alexey Kuznetsov | acfd366ca1f | fs/ext4: relax inappropriate memory alignment check. The bug is ancient: ext4 has been broken in mainstream since 2014, so I guess nobody cares, though the right place for this patch is mainstream, of course. It destroys the performance of direct io in vz9 compared to vz7, where this place was still correct. Note, our memory buffers are aligned to 512 and we physically cannot get it even more coarse and there are exactly no reasons it would be not... | VSTOR-79527 | ||
Pavel Tikhomirov | 6ceb6de48cc | mm: migrate page private for high-order folios in swap cache correctly. We have a check in do_swap_page that a page from lookup_swap_cache should have its private field equal to the swapcache index we looked it up at (page_private(page) != entry.val). So folio_migrate_mapping should preserve page private for each page of a huge folio to satisfy this check; otherwise we get an infinite loop in: +-> mmap_read_lock +-> __get_user_pages_locked +-> for-loop # taken once ... | PSBM-153264 | ||
Charan Teja Kalla | 15c99f6db9e | ms/mm: migrate high-order folios in swap cache correctly. Large folios occupy N consecutive entries in the swap cache instead of using multi-index entries like the page cache. However, if a large folio is re-added to the LRU list, it can be migrated. The migration code was not aware of the difference between the swap cache and the page cache and assumed that a single xas_store() would be sufficient. This leaves potentially many stale pointers to th... | PSBM-153264 | ||
Konstantin Khorenko | ef743830155 | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.14 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Andrey Zhadchenko | eea68f04f8e | drivers/vhost/blk: increase max queues to 32. 8 queues is too few; for example, vhost-scsi has 128. Increase to 32 queues max, as we do not want to eat too much memory. As for performance, 32 queues will top around 3M iops, which should be fine for almost all cases. https://virtuozzo.atlassian.net/browse/PSBM-152241 Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk ... | PSBM-152241 | ||
Pavel Tikhomirov | 9d5cbd19fa5 | mm/vmscan: add rcu_read_lock to replace released shrinker_rwsem. After commit [1] we release shrinker_rwsem for nfs while processing do_shrink_slab; we need this to mitigate a blocked shrinker_rwsem due to a hang in the nfs shrinker. After that we lack both shrinker_rwsem and rcu_read_lock in these stacks: +-< rcu_dereference_protected(lockdep_is_held(&shrinker_rwsem)) +-< shrinker_info_protected +-< xchg_nr_deferred_memcg +-< xchg_nr_deferred ... | PSBM-153973 | ||
Konstantin Khorenko | 18e8246e00e | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.12 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Konstantin Khorenko | 731f0a067d1 | drm: Disable modeset by default. 1. VHS and HCI are server products => nice graphical console output is not that important. 2. The QXL driver is buggy: we face issues with L1 VMs hanging during console writes using the QXL driver. => let's disable framebuffer by default (like we always have "nomodeset" kernel boot option set). To reverse the situation, we add "modeset" kernel boot option which enables back framebuffer drivers u... | VSTOR-81614 | ||
Konstantin Khorenko | 4a866400040 | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.11 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Alexey Kuznetsov | 7dee4f82540 | fs/fuse: hashed write buckets. The previous write record was 3.5G/sec, and this ceiling could not be penetrated even though the eventloop had lots of spare cpu. The bottleneck is diagnosed as saturation of the thread copying requests from the kernel. So, we have to switch to spreading it over multiple threads, similar to the scheme used for reads. So, for writes we introduce a two-level table, keyed by request size and by inode hash. New record is... | VSTOR-79527 | ||
Konstantin Khorenko | 2efaade9626 | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.10 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Matthew Wilcox (Oracle) | b669125e80b | ms/mm/swap: convert add_to_swap_cache() to take a folio. With all callers using folios, we can convert add_to_swap_cache() to take a folio and use it throughout. Link: https://lkml.kernel.org/r/20220902194653.1739778-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit a4c366f01f10073e0220656561b875627ff7cd90) https://virtuozzo.atlassi... | PSBM-153264 | ||
Matthew Wilcox (Oracle) | 3bb87414bb5 | ms/mm/swap: convert __read_swap_cache_async() to use a folio. Remove a few hidden (and one visible) calls to compound_head(). Link: https://lkml.kernel.org/r/20220902194653.1739778-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit a0d3374b070776e985bbd7b165b178fa688bf37a) Change: Also update vz specific hunk SetPageActive->folio_set_acti... | PSBM-153264 | ||
Chen Aotian | 74de53465d6 | ms/netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT. For memory alloc that store user data from nla[NFTA_OBJ_USERDATA], use GFP_KERNEL_ACCOUNT is more suitable. mFixes: 33758c891479 ("memcg: enable accounting for nft objects") Signed-off-by: Chen Aotian <chenaotian2@163.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> https://virtuozzo.atlassian.net/browse/PSBM-153598 (cherry picked from commit af0acf22aea359e04412237d68787401f96bb58... | PSBM-153598 | ||
Konstantin Khorenko | 0a48c8a012c | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.9 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Alexey Kuznetsov | f297f03d878 | fs/fuse: enhanced splice support, fixes. Two bugs have been noticed by Kui Liu <kui.liu@acronis.com>: 1. We used only 2 words of 8 in the on-stack copy of the user array. 2. fdput in the error path was missing; we could leak an open file when the daemon supplied a non-pipe file descriptor. https://pmc.acronis.work/browse/VSTOR-79527 Fixes: 72dcce0c8d21 ("fs/fuse: enhanced splice support") Signed-off-by: Alexey Kuznetsov <kuznet@acronis.com> Featur... | VSTOR-79527 | ||
Alexey Kuznetsov | 12779e53c69 | net: zerocopy for unix socket, fixups. We do not want to deal with SOCK_SEQPACKET sockets, as was noticed by Pavel Tikhomirov <ptikhomirov@virtuozzo.com>. The fallback for occasional splicing of zerocopied pages did not work and returned EINVAL. This is not essential, as we do not use it, but tests revealed this situation, so we repair it. vstorage specific note: soon we enable zerocopy at server side and will have to choose between zerocopy ... | VSTOR-79527 | ||
Konstantin Khorenko | 3680b6bf6d6 | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.8 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Pavel Tikhomirov | ac39f3b02e3 | ms/netfilter: bridge: replace physindev with physinif in nf_bridge_info. An skb can be added to a neigh->arp_queue while waiting for an arp reply. Where original skb's skb->dev can be different to neigh's neigh->dev. For instance in case of bridging dnated skb from one veth to another, the skb would be added to a neigh->arp_queue of the bridge. As skb->dev can be reset back to nf_bridge->physindev and used, and as there is no explicit mechanism that prevents this p... | PSBM-153269 | ||
Pavel Tikhomirov | a2baff1c605 | ms/netfilter: propagate net to nf_bridge_get_physindev. This is a preparation patch for replacing physindev with physinif on nf_bridge_info structure. We will use dev_get_by_index_rcu to resolve device, when needed, and it requires net to be available. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit a54e721970... | PSBM-153269 | ||
Pavel Tikhomirov | 71ce14d3e16 | ms/netfilter: nf_queue: remove excess nf_bridge variable. We don't really need nf_bridge variable here. And nf_bridge_info_exists is better replacement for nf_bridge_info_get in case we are only checking for existence. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit aeaa44075f8e49e2e0ad4507d925e690b7950145) http... | PSBM-153269 | ||
Pavel Tikhomirov | 9663dd170f2 | ms/netfilter: nfnetlink_log: use proper helper for fetching physinif. We don't use physindev in __build_packet_message except for getting physinif from it. So let's switch to nf_bridge_get_physinif to get what we want directly. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit c3f9fd54cd87233f53bdf0e191a86b3a5e960e02) https:/... | PSBM-153269 | ||
Pavel Tikhomirov | 17ad03ac78b | drivers/vhost: fix missing rcu_read_lock in vhost_work_queue. In this stack: +-> vhost_vsock_dev_ioctl +-> vhost_vsock_start +-> vhost_work_queue +-> xas_find +-> xas_load +-> xas_start +-> xa_head +-> rcu_dereference_check We require either rcu_read_lock or xa_lock but have none. Let's fix it by calling xa_find, which is a wrapper for xas_find having proper rcu and also xas_retry ... | PSBM-153264 | ||
Thomas Zeitlhofer | d9dd6ae87cd | ms/net: neigh: decrement the family specific qlen. Commit 0ff4eb3d5ebb ("neighbour: make proxy_queue.qlen limit per-device") introduced the length counter qlen in struct neigh_parms. There are separate neigh_parms instances for IPv4/ARP and IPv6/ND, and while the family specific qlen is incremented in pneigh_enqueue(), the mentioned commit decrements always the IPv4/ARP specific qlen, regardless of the currently processed family, in pneigh_queu... | 2 Jira Issues | ||
Konstantin Khorenko | 9100cbc72cf | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.7 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Alexey Kuznetsov | 72dcce0c8d2 | fs/fuse: enhanced splice support. Unfortunately, the existing support of splice in fuse is completely useless: it has many flaws, each of them fatal, even taken separately. - it passes only a single splice, which requires user space to prepare one more splice to merge the header. - ... and does not allow the use of splices coming from TCP, as they can be huge and do not fit into a single pipe buffer. - it uses kvmalloc(!!!) for temp bu... | VSTOR-79527 | ||
Alexey Kuznetsov | 137e8807d5b | net: zerocopy over unix sockets. The observation is that af_unix sockets today have become slower and eat a lot more cpu than 100G ethernet. So, implement MSG_ZEROCOPY over af_unix sockets to be able to talk to local services without a collapse of performance. Unexpectedly, this makes sense! E.g. zerocopy cannot be done in TCP over loopback, because skbs change ownership when passing over loopback. But unix sockets traditionally impl... | VSTOR-79527 | ||
Alexey Kuznetsov | 105a147a0c2 | fs/fuse: fuse queue routing. Generic fuse multiqueue support. It improves the previously existing per-cpu routing and makes it extensible. At the moment three routing tactics are implemented and tested: 1. Old per-cpu routing. Deprecated, but left for performance comparisons. It can also still be good in some situations. 2. Size buckets to support large fuse writes. Userspace selects it as default for fuse writes. 3. Ha... | VSTOR-79527 | ||
Konstantin Khorenko | 3cb059b77cd | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.6 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Konstantin Khorenko | b6a336efca8 | configs: Enable in-kernel accelerator for virtio-blk guests in configs dir. We store precompiled config files for convenience, so enable the VHOST_BLK module there as well. https://virtuozzo.atlassian.net/browse/PSBM-139414 https://virtuozzo.atlassian.net/browse/PSBM-152375 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests | 2 Jira Issues | ||
Konstantin Khorenko | 04de95c1036 | configs: Enable in-kernel accelerator for virtio-blk guests https://virtuozzo.atlassian.net/browse/PSBM-139414 https://virtuozzo.atlassian.net/browse/PSBM-152375 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests | 2 Jira Issues | ||
Andrey Zhadchenko | e21e142cb12 | drivers/vhost: vhost-blk accelerator for virtio-blk guests. Although QEMU virtio is quite fast, there is still some room for improvements. Disk latency can be reduced if we handle virtio-blk requests in the host kernel instead of passing them to QEMU. The patch adds the vhost-blk kernel module to do so. Some test setups: fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128 QEMU drive options: cache=none filesystem: xfs SSD: | r... | 4 Jira Issues | ||
Andrey Zhadchenko | 5f60cbb11d0 | drivers/vhost: add ioctl to increase the number of workers. Finally add an ioctl to allow userspace to create additional workers. For now, only allow increasing the number of workers. https://jira.sw.ru/browse/PSBM-139414 Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> ====== Patchset description: vhost-blk: in-kernel accelerator for virtio-blk guests Although QEMU virtio-blk is quite fast, there is still some room for improvements. Dis... | 4 Jira Issues | ||
Mike Christie | 5271bf51f1b | ms/vhost: replace single worker pointer with xarray. The next patch allows userspace to create multiple workers per device, so this patch replaces the vhost_worker pointer with an xarray so we can store multiple workers and look them up. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-15-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ======== Also rework vhost_work_... | 2 Jira Issues | ||
Mike Christie | ab2f6961e8b | ms/vhost: convert poll work to be vq based. This has the drivers pass in their poll to vq mapping and then converts the core poll code to use the vq based helpers. In the next patches we will allow vqs to be handled by different workers, so to allow drivers to execute operations like queue, stop, flush, etc on specific polls/vqs we need to know the mappings. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <2023062... | 2 Jira Issues | ||
Andrey Zhadchenko | 1bcb1e6e6d1 | drivers/vhost: attach cgroups to specific worker. Update vhost_attach_cgroups() to operate with a specific worker rather than global vhost device functions. https://virtuozzo.atlassian.net/browse/PSBM-152375 https://virtuozzo.atlassian.net/browse/PSBM-139414 Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests | 2 Jira Issues | ||
Mike Christie | b33e9080a1e | ms/vhost: take worker or vq for flushing. This patch has the core work flush function take a worker. When we support multiple workers we can then flush each worker during device removal, stoppage, etc. It also adds a helper to flush specific virtqueues, so vhost-scsi can flush IO vqs from its ctl vq. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-7-michael.christie@oracle.com> Signed-off-... | 2 Jira Issues | ||
Mike Christie | db46389987f | ms/vhost: take worker or vq instead of dev for queueing. This patch has the core work queueing function take a worker for when we support multiple workers. It also adds a helper that takes a vq during queueing so modules can control which vq/worker to queue work on. This temp leaves vhost_work_queue. It will be removed when the drivers are converted in the next patches. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <2023062... | 2 Jira Issues | ||
Mike Christie | 312bc762cd9 | ms/vhost, vhost_net: add helper to check if vq has work. In the next patches each vq might have different workers so one could have work but others do not. For net, we only want to check specific vqs, so this adds a helper to check if a vq has work pending and converts vhost-net to use it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230626232307.97930-5-michael.christie@oracle.... | 2 Jira Issues | ||
Mike Christie | ee7a2282666 | ms/vhost: add vhost_worker pointer to vhost_virtqueue. This patchset allows userspace to map vqs to different workers. This patch adds a worker pointer to the vq so in later patches in this set we can queue/flush specific vqs and their workers. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ========= (cherry picked from... | 2 Jira Issues | ||
Mike Christie | a25b680efbd | ms/vhost: dynamically allocate vhost_worker. This patchset allows us to allocate multiple workers, so this has us move from the vhost_worker that's embedded in the vhost_dev to dynamically allocating it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ========= Half of this commit is already present. Add the res... | 2 Jira Issues | ||
Konstantin Khorenko | e10e2fafaa6 | FD: vhost-blk: in-kernel accelerator for virtio-blk guests https://jira.sw.ru/browse/PSBM-139414 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests | PSBM-139414 |
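
The counter mismatch described in commit d9dd6ae87cd (family-specific qlen incremented on enqueue, but the IPv4/ARP qlen always decremented) may be easier to see in a toy model. The sketch below is not kernel code: `NeighParms`, `simulate`, and the `"arp"`/`"nd"` keys are hypothetical names chosen for illustration, and only the counter arithmetic stated in the commit message is modeled.

```python
from dataclasses import dataclass


@dataclass
class NeighParms:
    """Toy stand-in for one neigh_parms instance; only qlen is modeled."""
    qlen: int = 0


def simulate(packets, family_specific_decrement):
    """Enqueue then dequeue each packet, returning (arp_qlen, nd_qlen).

    packets: sequence of family names, "arp" (IPv4) or "nd" (IPv6).
    family_specific_decrement: False models the behavior the commit
    message describes as buggy (always decrement the ARP counter);
    True models decrementing the counter of the packet's own family.
    """
    parms = {"arp": NeighParms(), "nd": NeighParms()}

    # Enqueue: the family-specific counter is incremented.
    for family in packets:
        parms[family].qlen += 1

    # Dequeue: choose which counter is decremented.
    for family in packets:
        target = family if family_specific_decrement else "arp"
        parms[target].qlen -= 1

    return parms["arp"].qlen, parms["nd"].qlen


if __name__ == "__main__":
    ipv6_only = ["nd", "nd", "nd"]
    print("always-ARP decrement:", simulate(ipv6_only, False))
    print("family-specific decrement:", simulate(ipv6_only, True))
```

With IPv6-only traffic, the always-ARP variant leaves the ND counter permanently raised and drives the ARP counter away from zero, while the family-specific variant returns both counters to zero. How the real kernel types and limits behave is outside the scope of this sketch.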