net: bridge - process skbs that have already been substituted due to via_phys_dev
When via_phys_dev is enabled we substitute skb->dev with
master_dev and pass it back to bridge code. Instead of
dropping such an skb we should pass it up to the network stack
for processing.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
VE: fix idle time accounting
Make both accounting paths symmetric: idle time is accounted as idle or iowait,
depending on the number of tasks in the iowait state.
http://bugzilla.openvz.org/show_bug.cgi?id=1217
(#114633)
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
cfq: revalidate cached async queue
Async queues are stored on struct cfq_bc_data
and are cached on per-process struct cfq_io_context.
The cached queue may be invalid due to io_page beancounter
driven io-context switch.
So, cfq_io_context gets cached queue, but corresponding
cfq_bc and user_beancounter may be already destroyed -- all
this leads to oops at get_beancounter in cfq_set_request.
Add check for async queue owner and ...
cfq link cfq_bc_data without bc io sched
Fixes oops at first IO with CONFIG_BC_IO_SCHED=n.
The cfq_set_request wants to get ub by cfqq->cfq_bc->ub_iopriv,
so save ref to ub0 there.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
bc: fix permissions on /proc/bc
Reading /proc/bc/* is permitted only for those who
have the CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH capabilities
set. We should not mark the files as "group" or "other"
readable/executable, since they are not.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
simfs: don't work with buggy input
Some (buggy) filesystems (aufs for example) pass NULL as mnt to getattr
and hope for the best...
Let's not confuse the user with the oops at least.
http://bugzilla.openvz.org/show_bug.cgi?id=1054
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Replace swsoft mentions with parallels
Replaces COPYRIGHT statements, COPYING.SWsoft references and
the file itself and module authors if any.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
bridge: don't leak master device on brctl addif
If we add a second ethernet device to a bridge, the former one leaks.
http://bugzilla.openvz.org/show_bug.cgi?id=1145
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
net: NETIF_F_VIRTUAL intersects with NETIF_F_LRO
Fortunately, this is not a part of the user/kernel interface
[xemul picked 2.6.27's 4826fea3]
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
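A collision of this kind (two feature flags sharing a bit) can be caught mechanically. The sketch below is only an illustration: the `DEMO_F_*` values are made up for the example and are not the kernel's real NETIF_F_* constants, and `flags_overlap` is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative values only; NOT the real NETIF_F_* constants. */
#define DEMO_F_LRO     (UINT32_C(1) << 15)
#define DEMO_F_VIRTUAL (UINT32_C(1) << 28)

/* Compile-time guard: the build fails if the two flags share a bit. */
_Static_assert((DEMO_F_LRO & DEMO_F_VIRTUAL) == 0,
               "feature flags must not share bits");

/*
 * Runtime variant for a whole table of flags: returns 1 if any bit is
 * claimed by more than one entry, 0 otherwise.
 */
static int flags_overlap(const uint32_t *flags, int n)
{
    uint32_t seen = 0;

    for (int i = 0; i < n; i++) {
        if (seen & flags[i])
            return 1;
        seen |= flags[i];
    }
    return 0;
}
```

The compile-time check costs nothing at run time, which is why it is usually preferable when all flag values are known constants.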
ve: sanitize capability checks for namespaces creation
The existing hard check on the namespaces mask is too strict. The
intention was to ban namespace creation for containers, but
there already exists a proper security mechanism to govern this
question.
Switch to the existing capability-driven policy, thus allowing
namespace creation from the HN.
http://bugzilla.openvz.org/show_bug.cgi?id=1113
Signed-off-by: Konstantin Khlebnikov <khlebnikov@open...
NFS: NFS super blocks in different VEs should be different
Teach nfs_compare_super about this.
#265926
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
net: init init_net ve owner (to ve0)
http://bugzilla.openvz.org/show_bug.cgi?id=1128
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
netlink: Fix oops in netlink conntrack module
If we load conntrack modules after VE start, one pointer in ve_struct
is NULL and accessing it causes an oops.
This is handled in most places, except the netlink interface.
Fix this one as well.
http://bugzilla.openvz.org/show_bug.cgi?id=788
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
net: set ve context when init/exit method is called
Both pernet init and exit methods are called:
- from VE context when a VE is created;
- from VE0 context if a module registers pernet operations.
This difference in approaches leads to many nasty things, since the
init callback can actually be called with the wrong exec_env.
Unify both approaches.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
CPT: Use sock_create instead of sock_create_kern
sock_create_kern() uses init_net as the default net namespace. Therefore
sockets and net devices end up belonging to init_net, though they must belong
to the current net namespace.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
net: unix: fix inflight counting bug in garbage collector
Previously I assumed that the receive queues of candidates don't
change during the GC. This is only half true, nothing can be received
from the queues (see comment in unix_gc()), but buffers could be added
through the other half of the socket pair, which may still have file
descriptors referring to it.
This can result in inc_inflight_move_tail() erroneously increasing the
"inflight" counter fo...
net: Fix recursive descent in __scm_destroy().
__scm_destroy() walks the list of file descriptors in the scm_fp_list
pointed to by the scm_cookie argument.
Those, in turn, can close sockets and invoke __scm_destroy() again.
There is nothing which limits how deeply this can occur.
The idea for how to fix this is from Linus. Basically, we do all of
the fput()s at the top level by collecting all of the scm_fp_list
objects hit by an fput()....
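The approach described above (do all the releases at the top level by collecting objects hit during a release) can be modeled in userspace roughly as follows. This is a simplified sketch, not the kernel change: `demo_node`, `demo_ctx`, `demo_release` and `demo_destroy` are hypothetical stand-ins, and the depth counters exist only to make the bounded nesting observable.

```c
#include <stddef.h>

/* Stand-in for an object whose release may trigger another destroy. */
struct demo_node {
    struct demo_node *child;  /* released as a side effect of this node */
    struct demo_node *next;   /* link on the pending list */
    int released;
};

/* Stand-in for per-task state holding the pending work list. */
struct demo_ctx {
    struct demo_node *pending;
    int draining;             /* set while the top-level call drains */
    int depth;                /* current nesting of demo_destroy() */
    int max_depth;            /* deepest nesting observed */
};

static void demo_destroy(struct demo_ctx *c, struct demo_node *n);

/* Releasing a node may call back into demo_destroy() for its child. */
static void demo_release(struct demo_ctx *c, struct demo_node *n)
{
    n->released = 1;
    if (n->child)
        demo_destroy(c, n->child);
}

static void demo_destroy(struct demo_ctx *c, struct demo_node *n)
{
    c->depth++;
    if (c->depth > c->max_depth)
        c->max_depth = c->depth;

    /* Always queue the object first. */
    n->next = c->pending;
    c->pending = n;

    /* Only the outermost call drains; nested calls just return. */
    if (!c->draining) {
        c->draining = 1;
        while (c->pending) {
            struct demo_node *cur = c->pending;

            c->pending = cur->next;
            demo_release(c, cur);
        }
        c->draining = 0;
    }
    c->depth--;
}
```

In this model the nesting never exceeds two levels (the top-level call plus one queuing call), no matter how long the chain of objects is.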
Fix wrong size of ub0_percpu
struct percpu_data is dynamically allocated and has an array for only
1 CPU, so static usage of it does not work.
Plus rework macros for static percpu variables declaration and
initialization.
http://bugzilla.openvz.org/show_bug.cgi?id=1039
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
sunrpc: fix lost set_exec_env-back and unlock the op_sem
Any NFS connect over TCP-IPv4 from a VE blocks the VE stop process.
This patch adds the missing op_sem unlock and set_exec_env.
http://bugzilla.openvz.org/show_bug.cgi?id=1007
(picked from openvz ubuntu branch patch
0145-VE-add-missed-semaphore-up-and-set-exec-env.patch
2.6.18 not affected, 2.6.26+ already fixed by den@)
[NET]: sk_release_kernel needs to be exported to modules
Fixes:
ERROR: "sk_release_kernel" [net/ipv6/ipv6.ko] undefined!
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 45af1754bc09926b5e062bda24f789d7b320939f)
[NET]: Make netlink_kernel_release publically available as sk_release_kernel.
This stuff will be needed for non-netlink kernel sockets, which should
also not pin a namespace, like tcp_socket and icmp_socket.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit edf0208702007ec1f6a36756fdd005f771a4cf17)
[NETLINK]: No need for a separate __netlink_release call.
Merge it into netlink_kernel_release.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9dfbec1fb2bedff6b118504055cd9f0485edba45)
[NETNS]: Fix race between put_net() and netlink_kernel_create().
The comment about "race free view of the set of network
namespaces" was a bit hasty. Look (there even can be only
one CPU, as discovered by Alexey Dobriyan and Denis Lunev):
put_net()
  if (atomic_dec_and_test(&net->refcnt))
    /* true */
      __put_net(net);
        queue_work(...);
/*
 * note: the net now has refcnt 0, but still in
 * the global list of net namespaces
 */
== re-schedule ...
[NETNS]: Namespace stop vs 'ip r l' race.
backport mainline commit 775516bfa2bd7993620c9039191a0c30b8d8a496
During network namespace stop process kernel side netlink sockets
belonging to a namespace should be closed. They should not prevent
the namespace from stopping, so they do not increment the namespace usage
counter. Though this counter will be put during the last sock_put.
The replacement of the correct netns for init_ns solves the problem
only ...
[NETNS]: Consolidate kernel netlink socket destruction.
backport mainline commit b7c6ba6eb1234e35a74fb8ba8123232a7b1ba9e4
Create a specific helper for netlink kernel socket disposal. This just
makes the code look better and provides a ground for proper disposal
inside a namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[NETNS]: Double free in netlink_release.
Netlink protocol table is global for all namespaces. Some netlink
protocols have been virtualized, i.e. they have per/namespace netlink
socket. This difference can easily lead to double free if more than 1
namespace is started. Count the number of kernel netlink sockets to
track that this table is not used any more.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <ado...
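The counting idea from the netlink_release fix (a table shared by all namespaces is torn down only when its last kernel socket goes away) can be sketched in userspace. Everything here is a hypothetical model: `demo_nl_table`, `demo_nl_register` and `demo_nl_release` are not kernel names, and the `frees` field exists only so the behavior can be observed.

```c
#include <stdlib.h>

/* Model of one entry in a protocol table shared by all namespaces. */
struct demo_nl_table {
    int *listeners;           /* shared state, allocated on first use */
    unsigned int registered;  /* number of kernel sockets using it */
    int frees;                /* how many times listeners was freed */
};

/* Called once per namespace that creates its kernel socket. */
static int demo_nl_register(struct demo_nl_table *t)
{
    if (t->registered == 0) {
        t->listeners = calloc(1, sizeof(*t->listeners));
        if (!t->listeners)
            return -1;
    }
    t->registered++;
    return 0;
}

/* Called once per kernel socket release; frees only on the last one. */
static void demo_nl_release(struct demo_nl_table *t)
{
    if (t->registered == 0)
        return;               /* nothing registered, nothing to free */
    if (--t->registered == 0) {
        free(t->listeners);
        t->listeners = NULL;
        t->frees++;
    }
}
```

Without the counter, each namespace releasing its socket would free the same shared state, which is the double free the commit describes.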
[UBC]: Double free for UDP socket aka
A socket residing in the UB space waiting queue could be released. In this
case ub_snd_wakeup running on another CPU could hold/release that
socket, effectively hitting a zero refcount a second time.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
ubc: uncharging too much for TCPSNDBUF
It is not allowed to go to the label wait_for_memory with chargesize != 0
when this space has already been placed into the skb.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
[UBC]: Endless loop in __sk_stream_wait_memory.
The loop in __sk_stream_wait_memory when tcp_sendmsg asks to wait for
TCPSNDBUF space is endless when the timeout is not specified. The only way
out is to queue a signal for that process.
Let's return a status flag from ub_sock_snd_queue_add that UB space is
available. This is enough to make a correct decision to leave the cycle.
Signed-off-by: ...
Allow envID fields in /proc/self/status in VE. Also allow getting VPid, PNState, StopState, etc.
OpenVZ Bug #936
http://bugzilla.openvz.org/show_bug.cgi?id=936
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
futexes: fix fault handling in futex_lock_pi
commit 1b7558e457ed0de61023cfc913d2c342c7c3d9f2 upstream
This patch addresses a very sporadic pi-futex related failure in
highly threaded java apps on large SMP systems.
David Holmes reported that the pi_state consistency check in
lookup_pi_state triggered with his test application. This means that
the kernel internal pi_state and the user space f...
CPT: fix restore of inotify on symlink
Inside a VE, the file /etc/mtab is a symlink to /proc/mounts.
FreeNX server with KDE creates an inotify watch on the /etc/mtab file.
To restore such inotify we need to obtain dentry with path_lookup() and
restore inotify on it.
Bug #96464
CPT: fix EXIT_DEAD/TASK_DEAD checks
For one thing, EXIT_DEAD was moved to ->exit_state only.
For another, this task state is called TASK_DEAD now and lives in ->state.
VE: let ->ve_netns live a bit more
1. netns shutdown is done asynchronously
2. nsproxy free is done synchronously
which means we can't use "get_exec_env()->ve_ns->net_ns" construct
anywhere in netns teardown codepath. ->ve_ns will be NULL (fixable) or
will point to freed memory (hardly fixable).
The solution is to pin netns one more time, and use get_exec_env()->ve_netns.
get_exec_env() is always valid. Its ->ve_netns will al...
Memory leak on network namespace stop.
mainline commit 4f84d82f7a623f8641af2574425c329431ff158f
Network namespace allocates 2 kernel netlink sockets, fibnl &
rtnl. These sockets should be disposed properly, i.e. by
sock_release. Plain sock_put is not enough.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Backport "[NET]: Make rtnetlink infrastructure network namespace aware (v3)"
mainline commit 97c53cacf00d1f5aa04adabfebcc806ca8b22b10 + tweaks to get
netns from either netdevice or something else.
http://bugzilla.openvz.org/show_bug.cgi?id=905
[NET]: Make rtnetlink infrastructure network namespace aware (v3)
After this patch none of the netlink callbacks support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network...
netlink: fix lookup check
netlink_unicast() is done in init_net context because
a) rtnl socket is bound to init_net,
b) kernel-space socket is successfully looked up by any VE,
c) rtnl is a kernel-space socket.
which is b-r-o-k-e-n, because e.g. just about any manipulation with
netdevices via netlink will be projected onto VE0.
Fix (after per-netns rtnl socket patches)
http://bugzilla.openvz.org/show_bug.cgi?id=905
proc: fix proc_cwd_link
If d_root_check() in there fails, we shouldn't pretend everything is OK
and leave mnt uninitialized or NULL (in the /proc/*/cwd case).
http://bugzilla.openvz.org/show_bug.cgi?id=900
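The pattern behind the proc_cwd_link fix (propagate the check failure instead of returning success with outputs left unset) might look like this in a userspace model. The names `demo_path`, `demo_root_check` and `demo_cwd_link` are hypothetical, and the `inside_root` flag is an assumed stand-in for whatever the real d_root_check() verifies.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a (mnt, dentry) pair. */
struct demo_path {
    const char *mnt;
    const char *dentry;
    int inside_root;   /* assumed result of the real visibility check */
};

/* Stand-in for the root check: 0 on success, negative errno on failure. */
static int demo_root_check(const struct demo_path *p)
{
    return p->inside_root ? 0 : -EACCES;
}

/*
 * Fill *mnt and *dentry only when the check passes. On failure the
 * error is returned and the outputs are not touched, so the caller
 * can never mistake unset outputs for a valid result.
 */
static int demo_cwd_link(const struct demo_path *cwd,
                         const char **mnt, const char **dentry)
{
    int err = demo_root_check(cwd);

    if (err)
        return err;

    *mnt = cwd->mnt;
    *dentry = cwd->dentry;
    return 0;
}
```

A caller is then expected to test the return value before using either output.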