Commits
Author | Commit | Message | Issues |
---|---|---|---|
Konstantin Khorenko | 9100cbc72cf | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.7. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Alexey Kuznetsov | 72dcce0c8d2 | fs/fuse: enhanced splice support. Unfortunately, the existing splice support in fuse is completely useless: it has many flaws, each of them fatal even taken separately. - It passes only a single splice, which requires user space to prepare one more splice to merge the header. - ... and it does not allow using splices coming from TCP, as they can be huge and do not fit into a single pipe buffer. - It uses kvmalloc(!!!) for temp bu... | VSTOR-79527 | ||
Alexey Kuznetsov | 137e8807d5b | net: zerocopy over unix sockets. The observation is that af_unix sockets have become slower and eat a lot more CPU than 100G Ethernet. So, implement MSG_ZEROCOPY over af_unix sockets to be able to talk to local services without a collapse in performance. Unexpectedly, this makes sense! E.g., zerocopy cannot be done in TCP over loopback, because skbs change ownership when passing over loopback. But unix sockets traditionally impl... | VSTOR-79527 | ||
Alexey Kuznetsov | 105a147a0c2 | fs/fuse: fuse queue routing. Generic fuse multiqueue support. It improves the previously existing per-cpu routing and makes it extensible. At the moment, three routing tactics are implemented and tested: 1. Old per-cpu routing. Deprecated, but kept for performance comparisons; it can also still be good in some situations. 2. Size buckets to support large fuse writes. Userspace selects it as the default for fuse writes. 3. Ha... | VSTOR-79527 | ||
Konstantin Khorenko | 3cb059b77cd | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.6. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Konstantin Khorenko | b6a336efca8 | configs: Enable in-kernel accelerator for virtio-blk guests in configs dir. We store precompiled config files for convenience, so enable the VHOST_BLK module there as well. https://virtuozzo.atlassian.net/browse/PSBM-139414 https://virtuozzo.atlassian.net/browse/PSBM-152375 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests | 2 Jira Issues | ||
Konstantin Khorenko | 04de95c1036 | configs: Enable in-kernel accelerator for virtio-blk guests. https://virtuozzo.atlassian.net/browse/PSBM-139414 https://virtuozzo.atlassian.net/browse/PSBM-152375 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests | 2 Jira Issues | ||
Andrey Zhadchenko | e21e142cb12 | drivers/vhost: vhost-blk accelerator for virtio-blk guests. Although QEMU virtio is quite fast, there is still some room for improvement. Disk latency can be reduced if we handle virtio-blk requests in the host kernel instead of passing them to QEMU. The patch adds the vhost-blk kernel module to do so. Some test setups: fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128 QEMU drive options: cache=none filesystem: xfs SSD: \| r... | 4 Jira Issues | ||
Andrey Zhadchenko | 5f60cbb11d0 | drivers/vhost: add ioctl to increase the number of workers. Finally, add an ioctl to allow userspace to create additional workers. For now, only allow increasing the number of workers. https://jira.sw.ru/browse/PSBM-139414 Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> ====== Patchset description: vhost-blk: in-kernel accelerator for virtio-blk guests. Although QEMU virtio-blk is quite fast, there is still some room for improvement. Dis... | 4 Jira Issues | ||
Mike Christie | 5271bf51f1b | ms/vhost: replace single worker pointer with xarray. The next patch allows userspace to create multiple workers per device, so this patch replaces the vhost_worker pointer with an xarray so we can store multiple workers and look them up. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-15-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ======== Also rework vhost_work_... | 2 Jira Issues | ||
Mike Christie | ab2f6961e8b | ms/vhost: convert poll work to be vq based. This has the drivers pass in their poll-to-vq mapping and then converts the core poll code to use the vq-based helpers. In the next patches we will allow vqs to be handled by different workers, so to allow drivers to execute operations like queue, stop, flush, etc. on specific polls/vqs, we need to know the mappings. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <2023062... | 2 Jira Issues | ||
Andrey Zhadchenko | 1bcb1e6e6d1 | drivers/vhost: attach cgroups to specific worker. Update vhost_attach_cgroups() to operate on a specific worker rather than with global vhost device functions. https://virtuozzo.atlassian.net/browse/PSBM-152375 https://virtuozzo.atlassian.net/browse/PSBM-139414 Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests | 2 Jira Issues | ||
Mike Christie | b33e9080a1e | ms/vhost: take worker or vq for flushing. This patch has the core work flush function take a worker. When we support multiple workers, we can then flush each worker during device removal, stoppage, etc. It also adds a helper to flush specific virtqueues, so vhost-scsi can flush IO vqs from its ctl vq. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-7-michael.christie@oracle.com> Signed-off-... | 2 Jira Issues | ||
Mike Christie | db46389987f | ms/vhost: take worker or vq instead of dev for queueing. This patch has the core work queueing function take a worker for when we support multiple workers. It also adds a helper that takes a vq during queueing so modules can control which vq/worker to queue work on. This temporarily leaves vhost_work_queue. It will be removed when the drivers are converted in the next patches. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <2023062... | 2 Jira Issues | ||
Mike Christie | 312bc762cd9 | ms/vhost, vhost_net: add helper to check if vq has work. In the next patches each vq might have a different worker, so one could have work while others do not. For net, we only want to check specific vqs, so this adds a helper to check if a vq has work pending and converts vhost-net to use it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230626232307.97930-5-michael.christie@oracle.... | 2 Jira Issues | ||
Mike Christie | ee7a2282666 | ms/vhost: add vhost_worker pointer to vhost_virtqueue. This patchset allows userspace to map vqs to different workers. This patch adds a worker pointer to the vq so in later patches in this set we can queue/flush specific vqs and their workers. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ========= (cherry picked from... | 2 Jira Issues | ||
Mike Christie | a25b680efbd | ms/vhost: dynamically allocate vhost_worker. This patchset allows us to allocate multiple workers, so this has us move from the vhost_worker that's embedded in the vhost_dev to dynamically allocating it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ========= Half of this commit is already present. Add the res... | 2 Jira Issues | ||
Konstantin Khorenko | e10e2fafaa6 | FD: vhost-blk: in-kernel accelerator for virtio-blk guests. https://jira.sw.ru/browse/PSBM-139414 Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests | PSBM-139414 | ||
Liu Kui | 14faea33294 | fs/fuse kio: destroy rdma_cm_id immediately in case cm fails during connection establishment. Previously, if cm failed after the rio had been created, the rdma_cm_id would not be destroyed immediately. However, cm_id->context could still point to rc->id, which would no longer be valid. This delay creates a window during which cm_id->context holds an illegal pointer. If an RDMA cm event arrives during this window, an illegal pointer dereference will happen, crashing the system. htt... | VSTOR-79838 | ||
Konstantin Khorenko | ff5a8ce86ad | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.5. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Alexey Kuznetsov | c4e2490aa51 | fs/fuse: multithread fuse write. Fuse user space creates a cloned channel device and binds it to a CPU. The kernel routes WRITE requests to these channels, which allows us to offload expensive reads from the fuse device to multiple threads. At the moment we see significant improvements, about 30% in some major ostor workloads. Signed-off-by: Alexey Kuznetsov <kuznet@acronis.com> Feature: fuse: multithread fuse write | |||
Konstantin Khorenko | 544856295cf | mm: Drop swap_cache_info reporting in vzstat. Mainstream has dropped swap_cache_info statistics: 442701e7058b ("mm/swap: remove swap_cache_info statistics"). So we are dropping its reporting via the /proc/vz/stats interface. We could leave the format of the /proc/vz/stats file the same (it is an interface after all and should be stable), but since vz9 brings so many changes, the vzstat utility should be rewritten as well, so it's a good time to drop... | PSBM-152466 | ||
Konstantin Khorenko | 77567b1b78a | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.4. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Konstantin Khorenko | cd020db2561 | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.3. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Yuriy | a9f0b04bb37 | fs/fuse kio: skip truncating dropped cslists. If try_cslist_get returns false, it indicates that the cslist has already been dropped and the map has been truncated. So, this cslist should not be handled. https://pmc.acronis.work/browse/VSTOR-76384 Signed-off-by: Yuriy Vasilev <yuriy.vasilev@virtuozzo.com> Acked-by: Alexey Kuznetsov <kuznet@virtuozzo.com> | VSTOR-76384 | ||
Yuriy | 063c30f17ff | fs/fuse kio: skip handling dropped cslists in pcs_map_notify_addr_change. If try_cslist_get returns false, it indicates that the cslist has been dropped and should not be handled without holding cs->lock. https://pmc.acronis.work/browse/VSTOR-76384 Signed-off-by: Yuriy Vasilev <yuriy.vasilev@virtuozzo.com> Acked-by: Alexey Kuznetsov <kuznet@virtuozzo.com> | VSTOR-76384 | ||
Yuriy | a184bf61849 | fs/fuse kio: introduce try_cslist_get(). This function allows checking if the cslist has been dropped before usage. https://pmc.acronis.work/browse/VSTOR-76384 Signed-off-by: Yuriy Vasilev <yuriy.vasilev@virtuozzo.com> Acked-by: Alexey Kuznetsov <kuznet@virtuozzo.com> | VSTOR-76384 | ||
Yuriy | 5c38a1637ee | fs/fuse kio: do not allow getting cslist when refcnt is equal to 0. When the refcnt of a cslist is equal to 0, it indicates that the cslist has been dropped and is going to be freed. In such cases, let's trigger a BUG_ON to prevent use after free. https://pmc.acronis.work/browse/VSTOR-76384 Signed-off-by: Yuriy Vasilev <yuriy.vasilev@virtuozzo.com> Acked-by: Alexey Kuznetsov <kuznet@virtuozzo.com> | VSTOR-76384 | ||
Konstantin Khorenko | 3b5521b69b3 | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.2. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Alexander Atanasov | a86d1d9f0ea | ext4/mfsync: do not BUG_ON on wrong set of files. mfsync(...) cannot sync files from different filesystems; if passed such a set of files, it BUG_ONs. Instead of BUG, return -EINVAL. https://pmc.acronis.work/browse/VSTOR-78331 Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com> Acked-by: Alexey Kuznetsov <kuznet@virtuozzo.com> | VSTOR-78331 | ||
Pavel Tikhomirov | 8cf1c11d447 | mm/memcontrol: prohibit writing to memory.numa_migrate from container. We might want to put containers on designated NUMA nodes for optimal performance; this would all be ruined if a container could force its memory pages to move to any node it wants. The memory.numa_migrate file was originally made for vcmmd, which works from ve0, so we should be fine with this additional restriction. Fixes: dfc0b63bfd50c ("mm: memcontrol: add memory.numa_migrate file") https://virtu... | PSBM-152372 | ||
Konstantin Khorenko | 4d08995e658 | sched: Do not set LBF_NEED_BREAK flag if scanned all the tasks. After ms commit b0defa7ae03e ("sched/fair: Make sure to try to detach at least one movable task"), detach_tasks() does not stop on the condition (env->loop > env->loop_max) when no movable task is found. Instead (if there are no movable tasks in the rq), exits always happen on the loop_break check, thus with the LBF_NEED_BREAK flag set. It's not a problem for mainstream because load_balanc... | |||
Konstantin Khorenko | a486357cc95 | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.35.1. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Konstantin Khorenko | 5c2b8d6367e | OpenVZ kernel rh9-5.14.0-362.8.1.vz9.30.14. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> | |||
Konstantin Khorenko | a970393ab4a | configs: commit actual Virtuozzo 9 release and debug configs. Those configs are generated by the following command: `# cd redhat/configs/ && ARCH_MACH=x86_64 ./build_configs.sh kernel rhel` Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Feature: internal | |||
Alexey Kuznetsov | fe71f02f6fa | fuse: pcs: dangerous typo in commit_sync_info(). Unpleasant: it shows the code was never in this place before. Fixes: 3202fa19f30e ("fuse: a protocol to reenable optimizations after replication finished") https://pmc.acronis.work/browse/VSTOR-77923 Signed-off-by: Alexey Kuznetsov <kuznet@acronis.com> | VSTOR-77923 | ||
Kui Liu | 60bf2c64fdd | fs/fuse kio: always ack RIO_MSG_RDMA_READ_REQ received from csd. In our userspace RDMA implementation, every RIO_MSG_RDMA_READ_REQ msg must be acked strictly in order. However, this rule can be broken due to a bug in kio, though the bug is only triggered by very abnormal hardware behaviour in which a WR can take a very long time (>10s) to complete. This happens in the read workload with a large block size, where the client needs to issue RDMA R... | 4 Jira Issues | ||
Kui Liu | 9573bf29e31 | fs/fuse: make size of qhash and limit of each bucket module parameters. Both the size of qhash and the limit of each bucket can significantly affect the performance of certain workloads. There is no single set of values that would be best for all workloads; we may need to choose values based on the workload, so it is better to make them configurable. Here we choose the defaults to be 16 (qhash size) x 256 (bucket limit). Signed-off-by: Liu Kui <Kui.Liu@acronis.com> Acked-by: A... | |||
Ilpo Järvinen | da615639c35 | ms/tty: Make ->set_termios() old ktermios const. There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220816115739.10928-9-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Getting rid of compilation warnings. ht... | PSBM-148793 | ||
Ilpo Järvinen | 7fef9ee2ece | ms/usb: serial: Make ->set_termios() old ktermios const. There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220816115739.10928-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Getting rid of compilation warnings. ht... | PSBM-148793 | ||
Ilpo Järvinen | 5067e1dd0a4 | ms/tty: Make ldisc ->set_termios() old ktermios const. There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220816115739.10928-6-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Getting rid of compilation warnings. ht... | PSBM-148793 | ||
Ilpo Järvinen | 05e671ac027 | ms/serial: dz: Assume previous baudrate is valid. Assume previously used termios has a valid baudrate and use it directly. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220816115739.10928-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Getting ... | PSBM-148793 | ||
Kees Cook | e188502de22 | ms/treewide: Replace open-coded flex arrays in unions. In support of enabling -Warray-bounds and -Wzero-length-bounds and correctly handling run-time memcpy() bounds checking, replace all open-coded flexible arrays (i.e. 0-element arrays) in unions with the DECLARE_FLEX_ARRAY() helper macro. This fixes warnings such as: fs/hpfs/anode.c: In function 'hpfs_add_sector_to_btree': fs/hpfs/anode.c:209:27: warning: array subscript 0 is outside the bound... | PSBM-148793 | ||
Alexander Atanasov | 46ad581166d | ve/userns: remove all hashed entries before freeing user_namespace. 548df8b4b57b (ve/userns: associate user_struct with the user_namespace, 2017-03-13) introduced a dynamically allocated per-userns uid hashtable instead of a global static hash table. The problem with that allocated hashtable is that the life cycles of the two objects differ: both structs use reference counts, but they are counted separately. The contained objects (user_struct) are not refe... | PSBM-150648 | ||
Alexey Kuznetsov | 55c440bad3d | fuse: scalable queue limiting. This is the missing element of the previous scalability patch. We removed any limits on direct io submitted to the cluster there, which is not the right thing to do. The problem is not trivial. Experiments show we cannot take _any_ shared spinlock in this path; even an empty lock-unlock added there halves performance! So, we have to come up with a scalable solution not using locks. Several approaches were tried,... | VSTOR-54040 | ||
Alexey Kuznetsov | 34cba88f29f | fuse: skip bg_queue for async direct io pcs requests. There is a major problem in the fuse pcs implementation. While requests scale by CPU, we still have contention on bg_lock, and all the requests go through a single bottleneck at bg_queue. Of course we had inferior performance due to this, but we ignored the problem as the performance was still good. But recently it was found that under some realistic circumstances we get a collapse of performance, it ... | VSTOR-54040 | ||
Alexey Kuznetsov | 8620fd2a3a1 | fuse: pcs: new rpc affinity mode - RSS. The mode aligns socket io jobs to RSS: receive/transmit jobs are scheduled on the CPUs to which RSS maps the rpc socket. The precondition is a multiqueue device with RSS and XPS enabled. If RSS and XPS are enabled, sockets are entirely localized to one CPU and are not accessed from other CPUs, which minimizes lock contention and keeps perfect cache locality for socket data. Nevertheless, we have ... | VSTOR-54040 | ||
Alexey Kuznetsov | 153b63ea657 | fuse: pcs: split trace_printk. trace_printk() is a function that is not desired in release kernels: if it is referenced from a module, even if it is not actually used, it allocates lots of memory and scares people with some messages. What can we do? 1. Surround it with an ifdef turned off in release kernels? No. We need this in customers' environments to investigate actual problems; modules cannot be replaced with debuggin... | PSBM-146513 | ||
Alexey Kuznetsov | 33acf7985c5 | fuse: pcs: rpc timeout was incoherent. The code from user space was ported incorrectly, without understanding how it actually works. This can result in a lockup of a failing connection. We have two timeouts: a per-message timeout, on which we cancel the timed-out request but assume this is due to the semantics of the request, e.g. the CS needs to talk to another CS or to the MDS and that communication fails, which obviously does not mean _this_ conn... | VSTOR-54040 | ||
Alexey Kuznetsov | fbdab838e5b | fuse: do not accelerate writes with unknown dirty state. If we have no sync seq numbers, we cannot force the dirty status, so route writes via the slow path. When CSes reply with dirty seqs, we will be able to take the shortcut. https://pmc.acronis.work/browse/VSTOR-54040 Signed-off-by: Alexey Kuznetsov <kuznet@acronis.com> Feature: vStorage | VSTOR-54040 |
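The kio commits on cslist reference counting (introducing `try_cslist_get()`, hitting BUG_ON when the refcnt is 0, and skipping dropped cslists) follow the common "get unless zero" pattern. Below is a minimal user-space sketch of that pattern with C11 atomics. It assumes a plain integer refcount; `struct cslist_sketch` and the `*_sketch` functions are hypothetical and are not taken from the kio sources, which this log does not show. `assert()` stands in for the kernel's BUG_ON.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted cslist; not the kio definition. */
struct cslist_sketch {
	atomic_int refcnt;
};

/*
 * Unconditional get: taking a reference on a cslist whose refcnt is
 * already 0 is a bug, so assert() plays the role of BUG_ON here.
 */
static void cslist_get_sketch(struct cslist_sketch *csl)
{
	int old = atomic_fetch_add(&csl->refcnt, 1);

	assert(old != 0);
	(void)old; /* silence unused warning when NDEBUG is set */
}

/*
 * Conditional get: take a reference only while refcnt > 0. Returns
 * false if the cslist was already dropped, so the caller must skip it.
 */
static bool try_cslist_get_sketch(struct cslist_sketch *csl)
{
	int old = atomic_load(&csl->refcnt);

	do {
		if (old == 0)
			return false; /* dropped: never resurrect it */
		/* on failure, CAS reloads 'old' with the current value */
	} while (!atomic_compare_exchange_weak(&csl->refcnt, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the last one went away. */
static bool cslist_put_sketch(struct cslist_sketch *csl)
{
	return atomic_fetch_sub(&csl->refcnt, 1) == 1;
}
```

The compare-and-swap loop is what makes the conditional get safe. A plain increment could move a dropped cslist from 0 back to 1 while another thread is freeing it, whereas the loop gives up as soon as it observes 0. Inside the kernel, `refcount_inc_not_zero()` provides the same semantics.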
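The fs/fuse commit that makes the qhash size and per-bucket limit module parameters quotes defaults of 16 (qhash size) x 256 (bucket limit). In the kernel, such knobs would normally be declared with `module_param()`. The user-space sketch below only illustrates how the two numbers interact. The struct, the function names, and the choice of hash key are hypothetical, because the log does not show what fuse hashes into qhash or what a bucket holds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Defaults quoted in the commit message: 16 buckets x 256 per bucket. */
#define QHASH_SIZE_DEFAULT  16u
#define QHASH_LIMIT_DEFAULT 256u

struct qhash_sketch {
	unsigned int size;      /* number of buckets (qhash size) */
	unsigned int limit;     /* max in-flight requests per bucket */
	unsigned int *inflight; /* per-bucket in-flight counters */
};

static int qhash_init(struct qhash_sketch *q, unsigned int size,
		      unsigned int limit)
{
	if (size == 0 || limit == 0)
		return -EINVAL;
	q->inflight = calloc(size, sizeof(*q->inflight));
	if (!q->inflight)
		return -ENOMEM;
	q->size = size;
	q->limit = limit;
	return 0;
}

/* Admit a request hashed by key; false means its bucket is full. */
static bool qhash_try_admit(struct qhash_sketch *q, uint64_t key)
{
	unsigned int b = (unsigned int)(key % q->size);

	if (q->inflight[b] >= q->limit)
		return false;
	q->inflight[b]++;
	return true;
}

/* Mark a previously admitted request with the same key as finished. */
static void qhash_complete(struct qhash_sketch *q, uint64_t key)
{
	unsigned int b = (unsigned int)(key % q->size);

	if (q->inflight[b] > 0)
		q->inflight[b]--;
}

static void qhash_destroy(struct qhash_sketch *q)
{
	free(q->inflight);
	q->inflight = NULL;
}
```

In this sketch the quoted defaults allow up to 16 x 256 = 4096 requests in flight in total, but no more than 256 in any one bucket. This is why the two values trade off differently depending on how evenly a workload spreads across buckets.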
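The fs/fuse queue routing commit lists "size buckets to support large fuse writes" as one of its routing tactics. One way to picture this tactic is as a lookup from request size to queue index. The sketch below shows that idea. The function name and the bucket bounds are hypothetical, since the log does not say which thresholds fuse uses or how userspace configures them.

```c
#include <stddef.h>

/*
 * Hypothetical size-bucket router: return the index of the first queue
 * whose upper bound (in bytes) is >= len. Requests larger than every
 * bound go to an extra overflow queue with index nbounds.
 * bounds[] must be sorted in ascending order.
 */
static size_t route_by_size(size_t len, const size_t *bounds, size_t nbounds)
{
	for (size_t i = 0; i < nbounds; i++)
		if (len <= bounds[i])
			return i;
	return nbounds; /* overflow queue for the largest requests */
}
```

With bounds {4096, 65536, 1048576}, a 4 KiB write lands in queue 0 and a 2 MiB write lands in overflow queue 3. Large writes therefore do not share a queue with small ones.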