[IOPRIO] request gets a beancounter
Previously, we could release a beancounter that had requests in flight.
Now every request gets a ub, so a beancounter can disappear only after
all of its requests have completed. This reasoning concerns async requests
only. Also note that the final put of a beancounter can now occur with
the queue lock held, so we modify the release_beancounter() function so that it
always uses another thread for the actual relea...
As per debug session and discussions with Pavel, dealing with kmemsize
precharges was done a bit early, i.e. kmem_cache_free(sighand) could
take a shortcut and bump the kmemsize precharge again when freeing sighand, after
ub_task_uncharge() had dealt with it.
So we need to deal with the kmemsize precharge in ub_task_put().
[IOPRIO] get queue lock on async queues put
When a beancounter dies, all async queues associated with it are removed
from the hash list. This operation should be protected by the queue lock.
[IOPRIO] excess list_del in forced dispatching case
We don't need to delete cfq_bc_data from the active list in the forced
dispatching case, because it happens automatically:
cfq_forced_dispatch_cfqqs() -> cfq_dispatch_insert() ->
cfq_remove_request() -> cfq_del_crq_rb() -> cfq_del_cfqq_rr() -> ...
[VETH] MAC filtering on veth device
1. Enabled setting the MAC address on veth devices from inside a VE.
2. Enabled MAC filtering on veth by default.
3. Added correctness checks (non-zero and non-multicast) for the address.
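The correctness check in item 3 can be illustrated with a minimal userspace sketch. The helper names below are hypothetical and are not taken from the patch; the sketch only assumes the standard Ethernet convention that the group (multicast) bit is the least significant bit of the first octet.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ETH_ALEN 6 /* length of an Ethernet address in octets */

/* True if every octet of the address is zero. */
bool toy_mac_is_zero(const uint8_t addr[TOY_ETH_ALEN])
{
    uint8_t acc = 0;

    for (int i = 0; i < TOY_ETH_ALEN; i++)
        acc |= addr[i];
    return acc == 0;
}

/* True if the group bit (lowest bit of the first octet) is set. */
bool toy_mac_is_multicast(const uint8_t addr[TOY_ETH_ALEN])
{
    return (addr[0] & 0x01) != 0;
}

/* Hypothetical combined check: accept only non-zero, non-multicast addresses. */
bool toy_mac_is_acceptable(const uint8_t addr[TOY_ETH_ALEN])
{
    return !toy_mac_is_zero(addr) && !toy_mac_is_multicast(addr);
}
```

With this sketch, the all-zero address and the broadcast address ff:ff:ff:ff:ff:ff are both rejected, while an ordinary unicast address is accepted.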
Fix for over-optimization of OTHERSOCKBUF accounting. For those sockets there is no protection by socket lock.
The bug was provoked by an optimization of charging/uncharging othersockbufs:
diff-ubc-tcpsndopt-20060429
In brief, the idea is the following: the optimization is based on the assumption that a socket
is always locked by lock_sock and thus protected from being used by more
than one user simultaneously. But this assumption is wrong for datagram
sockets (for example PF_UNIX ones), which are not locked in the majo...
[PATCH] buffer: memorder fix
unlock_buffer(), like unlock_page(), must not clear the lock without
ensuring that the critical section is closed.
Mingming later sent the same patch, saying:
We are running SDET benchmark and saw double free issue for ext3 extended
attributes block, which complains the same xattr block already being freed (in
ext3_xattr_release_block()). The problem could also be triggered by
multiple thr...
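The ordering requirement on unlock_buffer() can be sketched in userspace with C11 atomics, which stand in here for the kernel's own barrier primitives. The struct, the bit value, and the function names are hypothetical; the point is only that the lock bit is cleared with release ordering, so stores made inside the critical section cannot become visible after the lock appears free.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_LOCK_BIT 0x1u /* hypothetical lock bit */

struct toy_buffer {
    atomic_uint state; /* holds the lock bit */
    int payload;       /* data protected by the lock bit */
};

/* Try to take the lock bit. Acquire ordering keeps the critical
 * section from being reordered before the lock is taken. */
bool toy_trylock_buffer(struct toy_buffer *bh)
{
    unsigned int old = atomic_fetch_or_explicit(&bh->state, TOY_LOCK_BIT,
                                                memory_order_acquire);
    return (old & TOY_LOCK_BIT) == 0;
}

/* Clear the lock bit. Release ordering closes the critical section:
 * all earlier writes are visible before the bit is seen as clear. */
void toy_unlock_buffer(struct toy_buffer *bh)
{
    atomic_fetch_and_explicit(&bh->state, ~TOY_LOCK_BIT,
                              memory_order_release);
}
```

A plain, unordered clear of the bit would allow another CPU to take the lock and still observe stale data, which is the kind of race the patch description refers to.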
[PATCH] return ENOENT from ext3_link when racing with unlink
Return -ENOENT from ext[34]_link if we've raced with unlink and i_nlink is
0. Doing otherwise has the potential to corrupt the orphan inode list,
because we'd wind up with an inode with a non-zero link count on the list,
and it will never get properly cleaned up & removed from the orphan list
before it is freed.
[akpm@osdl.org: build fix]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: <...
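The check described above can be modelled with a toy inode. This is a sketch under assumptions, not the ext3 code: the struct and function are hypothetical and only show that a link attempt on an inode whose link count has already dropped to zero is refused instead of resurrecting the inode.

```c
#include <errno.h>

/* Hypothetical stand-in for the relevant part of an inode. */
struct toy_inode {
    unsigned int i_nlink; /* number of directory entries pointing here */
};

/* Add a link unless a racing unlink has already dropped the count to 0.
 * Returns 0 on success or -ENOENT when the inode is already unlinked. */
int toy_link(struct toy_inode *inode)
{
    if (inode->i_nlink == 0)
        return -ENOENT; /* leave the orphaned inode untouched */

    inode->i_nlink++;
    return 0;
}
```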
[PATCH] make static counters in new_inode and iunique be 32 bits
From: Jeff Layton <jlayton@redhat.com>
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
When a 32-bit program that was not compiled with large file offsets does a
stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
(correctly) generates an EOVERFLOW error. We can't do anything about fs's
with larger permanent inode numbers, but when we generate them on th...
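The effect of a 32-bit counter can be shown with a small sketch. The function names are hypothetical and the counter is passed by pointer for testability rather than kept as a static variable; the sketch only illustrates that a counter that is explicitly 32 bits wide wraps around instead of producing values that a 32-bit st_ino field cannot hold.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an inode number can be stored in a 32-bit st_ino field. */
bool toy_ino_fits_32bit(uint64_t ino)
{
    return ino <= UINT32_MAX;
}

/* Hand out the next inode number from a counter that is 32 bits wide.
 * Unsigned arithmetic wraps modulo 2^32, so the result always fits. */
uint32_t toy_next_ino(uint32_t *counter)
{
    return ++(*counter);
}
```

A 64-bit counter would eventually return values above UINT32_MAX, which is the case in which a 32-bit stat caller sees EOVERFLOW.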
[IOPRIO] A handle to switch off write prioritization
Adds a write_virt_mode file in /sys/block/<device>/queue/iosched/ that
allows switching off prioritization of async requests. Prioritization
is switched off by setting the owner of newly created
cfq_queues to UB0.
[IOPRIO] Switches UB context in places necessary for proper write scheduling
We need information about the owner of async requests. We get it from the
page marked by the IO Accounting feature and change context to the context
of the page's owner. Later cfq picks up the current context and handles it
properly (previous patch).
[IOPRIO] Support of prioritized write inside cfq
All ubioprio patches so far do fair prioritization only for sync
requests. For async requests a problem exists: who actually
produced the request? Using the OVZ IO Accounting feature we can obtain this
arcane knowledge. The IO Accounting feature marks each page with who
made it dirty. At the moment of request submission we change context to the
context of the owner of the page (this operati...
[IOPRIO] A handle to switch virtmode off/on
Adds a virt_mode file in /sys/block/<device>/queue/iosched/ that allows
switching off virtualized mode. Virtualization is switched off
by setting the owner of newly created cfq_queues to UB0. Note that this means
processes that already exist will still be scheduled in virtualized mode.
[IOPRIO] Configurable base timeslice duration
The elevator's parameters can be managed using files in
/sys/block/<device>/queue/iosched/. This patch adds a ub_slice file there
that controls the base_slice value. By default it is HZ/2.
[IOPRIO] Add userspace interface for beancounter ioprio management
This patch modifies the ioprio_set() syscall to accept the IOPRIO_WHO_UBC
constant as a second parameter, which indicates that ioprio should be
set for a beancounter. Additional information on using this syscall is
located in the Documentation/block/ioprio.txt file.
[IOPRIO] Introduce active beancounter switch
This is a very important patch that implements the actual scheduling of UBs.
UB holds a list of active UBs (UBs that have I/O requests at the moment)
in last-serviced order. This allows us to get the next UB for service in
O(1) time. The duration of the time slice depends on the UB priority in the
following way:
ub_slice = base_slice + (base_slice * (ioprio - UB_IOPRIO_MIN)) /
(UB_IOPRIO_MAX - UB_IOPRIO_MIN)...
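As a worked illustration of the formula above, here is a minimal sketch in integer arithmetic. The values chosen for TOY_UB_IOPRIO_MIN and TOY_UB_IOPRIO_MAX are assumptions made for the example and are not taken from the patch; the function name is hypothetical as well.

```c
/* Hypothetical priority bounds, chosen only for illustration. */
#define TOY_UB_IOPRIO_MIN 0
#define TOY_UB_IOPRIO_MAX 8

/* Time slice for a UB: base_slice at the lowest priority, growing
 * linearly to 2 * base_slice at the highest priority.
 * Integer division truncates, as it would in kernel code. */
unsigned long toy_ub_slice(unsigned long base_slice, int ioprio)
{
    return base_slice +
           (base_slice * (unsigned long)(ioprio - TOY_UB_IOPRIO_MIN)) /
           (TOY_UB_IOPRIO_MAX - TOY_UB_IOPRIO_MIN);
}
```

Under these assumed bounds and a base_slice of 500 ticks, priority 0 yields 500, priority 4 yields 750, and priority 8 yields 1000, so the highest priority gets twice the slice of the lowest.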
[IOPRIO] Handle forced dispatching case
Sometimes the driver asks the elevator to push all in-flight requests down to
it (to the driver). This is called forced dispatching. In CFQ it is handled
by separate functions, and this patch modifies them to work properly with
the ubioprio feature.
[IOPRIO] Switch cfq to use virtualized data structures
One of the main parts of the ubioprio feature is switching the CFQ algorithm from
working on per-device structures (cfq_data) to working on per-(device,
ub) structures (cfq_bc_data). This patch does it.
[IOPRIO] Introduce main data structures and functions
The patch defines new data structures and functions that form the basic
infrastructure of the ubioprio feature. cfq_bc_data is a structure that
represents a pair: (block device, UB). It owns all virtualized data from the
original cfq_data structure. Each device (cfq_data) holds a list of
cfq_bc_data structures. UB holds a list of active cfq_bc_data
structures. The sources contain a more detailed description. The ...
This patch just moves required declarations and definitions from cfq-iosched.c to external cfq-iosched.h file.
This patchset introduces I/O prioritization for beancounters.
The feature is implemented on the basis of the CFQ I/O scheduler. In the CFQ I/O
scheduler each process obtains a time slice whose duration depends on
the priority of the process. We use the same approach for UBs. All the logic
of I/O scheduling for processes belonging to a particular UB is preserved.
So we obtain something like 2-level CFQ scheduli...
vm86(2) was never implemented on x86_64 and always produced warnings. First the warning was rate-limited, then it was not rate-limited, then it became a half-assed warning that doesn't prevent dmesg spamming.
So, remove it completely.
OOM generation should be updated on mm destroy, not task exit.
This patch calculates OOM generations directly.
The counter is increased when the MM of a process killed by OOM is
finally destroyed.
[IPV6] MCAST: Fix joining all-node multicast group on device initialization.
From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date: Mon, 15 Jan 2007 05:48:40 +0000 (-0800)
Join all-node multicast group after assignment of dev->ip6_ptr
because it must be assigned when ipv6_dev_mc_inc() is called.
This fixes Bug#7817, reported by <gernoth@informatik.uni-erlangen.de>.
Closes: 7817
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.ne...
Missed ve context switch in NFS RPC code.
pipefs switches the context to ve0 and never returns to the ve context. This
happens in the __rpc_execute (net/sunrpc/sched.c) and svc_recvfrom
(net/sunrpc/svcsock.c) functions. It causes an oops when starting a ve whose
private area is placed on an NFS partition.
ext3 error behavior was broken in linux kernels since 2.5.x versions by the following patch:
2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
Default mount options from superblock for ext2/3 filesystems
http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ
When an ext3 file system is mounted with errors=continue (EXT3_ERRORS_CONTINUE),
errors should be ignored when possible. However, at present on any error the
kernel aborts the journal and remounts the filesystem to ...
The task puts its UBC before the task becomes invisible to everyone (e.g. /proc),
thus a task can be found on the list without exec_env/owner_env,
which should not happen.
Introduced by diff-ubc-dont-uncharge-in-RCU-20070212
EXT3_ERRORS_CONTINUE should be taken from the superblock as default value for error behaviour.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@openvz.org>
EXT2_ERRORS_CONTINUE should be read from the sb as default error behaviour. parse_option() should clean the alternative options and should not change default value taken from the superblock.
Signed-off-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@openvz.org>
Patch from mainstream: [SPARC64]: Fix Tomatillo/Schizo IRQ handling.
The code in schizo_irq_trans_init() should set irq_data->sync_reg
to the location of the SYNC register if this is Tomatillo, and set
it to zero otherwise. But that is not what it is doing.
As a result, non-Tomatillo systems were trying to access a
non-existent register resulting in bus errors at the first
PCI interrupt.
Thanks to Roland Stigge for the bug report.
Signed-off-by: David S. Mil...
Same story as with p4-clockmod. Driver does set_cpus_allowed(cpu), then checks for smp_processor_id() being equal to "cpu".
http://bugzilla.openvz.org/show_bug.cgi?id=467
fix umask when noACL kernel meets extN tuned for ACLs
Fix insecure default behaviour reported by Tigran Aivazian: if an ext2
or ext3 filesystem is tuned to mount with "acl", but mounted by
a kernel built without ACL support, then umask was ignored when creating
inodes - though root or user has umask 022, touch creates files as 0666,
and mkdir creates directories as 0777.
This appears to have worked right until 2.6.11, when a fix to the default
mo...
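The intended behaviour can be sketched as a small decision function. This is a model under assumptions, not the kernel code: the function and parameter names are hypothetical, and the sketch assumes that the umask may be skipped only when the filesystem is mounted with "acl" and the kernel really has ACL support to take over permission handling.

```c
#include <stdbool.h>

/* Compute the mode of a newly created inode.
 *   requested        - mode asked for by the caller (e.g. 0666 for touch)
 *   umask_bits       - the process umask (e.g. 022)
 *   fs_acl_option    - filesystem is tuned or mounted with "acl"
 *   kernel_has_acl   - kernel was built with ACL support
 * All names are hypothetical and used only for this illustration. */
unsigned int toy_create_mode(unsigned int requested, unsigned int umask_bits,
                             bool fs_acl_option, bool kernel_has_acl)
{
    /* Only a kernel that actually implements ACLs may leave the
     * permission bits to the ACL code. */
    if (fs_acl_option && kernel_has_acl)
        return requested;

    /* In every other case the umask must be applied. */
    return requested & ~umask_bits;
}
```

With a umask of 022 and a kernel without ACL support, this yields 0644 for files requested as 0666 and 0755 for directories requested as 0777, instead of the insecure 0666 and 0777 described above.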
Fix for shmem_truncate_range() BUG_ON()
Ran into BUG() while doing madvise(REMOVE) testing. If we are punching a
hole into a shared memory segment using madvise(REMOVE) and the entire hole
is below the indirect blocks, we hit the following assert.
BUG_ON(limit <= SHMEM_NR_DIRECT);
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: ...
make ppc64 current preempt-safe
Repeated -j20 kernel builds on a G5 Quad running an SMP PREEMPT kernel
would often collapse within a day, some exec failing with "Bad address".
In each case examined, load_elf_binary was doing a kernel_read, but
generic_file_aio_read's access_ok saw current->thread.fs.seg as USER_DS
instead of KERNEL_DS.
objdump of filemap.o shows gcc 4.1.0 emitting "mr r5,r13 ... ld r9,416(r5)"
here for get_p...
fix msync error on unmapped area
Fix the 2.6.18 sys_msync to report -ENOMEM correctly when an unmapped area
falls within its range, and not to overshoot: to satisfy LSB 3.1 tests and
to fix Debian Bug#394392. Took the 2.6.19 sys_msync as starting point
(including its cleanup of repeated "current->mm"s), reintroducing the
msync_interval and balance_dirty_pages_ratelimited_nr needed in 2.6.18.
The misbehaviour fixed here may n...
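The -ENOMEM rule can be modelled with a toy list of mappings. This is a simplified sketch, not the sys_msync code: the struct and function are hypothetical, the mappings are assumed to be sorted and non-overlapping, and the sketch only checks coverage of the range without syncing anything.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mapping descriptor covering [start, end). */
struct toy_vma {
    unsigned long start;
    unsigned long end;
};

/* Return 0 if [start, end) is fully covered by the mappings,
 * or -ENOMEM if any part of the range is unmapped.
 * The walk stops at 'end', so it never overshoots the range. */
int toy_msync_check(const struct toy_vma *vmas, size_t n,
                    unsigned long start, unsigned long end)
{
    unsigned long pos = start; /* first address not yet known to be mapped */

    for (size_t i = 0; i < n && pos < end; i++) {
        if (vmas[i].end <= pos)
            continue;       /* mapping lies entirely before pos */
        if (vmas[i].start > pos)
            return -ENOMEM; /* hole starting at pos */
        pos = vmas[i].end;  /* covered up to the end of this mapping */
    }
    return pos >= end ? 0 : -ENOMEM;
}
```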
read_zero_pagealigned() locking fix
Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
bugzilla 7645. Right: read_zero_pagealigned uses down_read of mmap_sem,
but another thread's racing read of /dev/zero, or a normal fault, can
easily set that pte again, in between zap_page_range and zeromap_page_range
getting there. It's been wrong ever since 2.4.3.
The simple fix is to use down_write instead, but tha...