OpenVZ-legacy

linux-2.6.18-openvz

Author | Commit | Message | Commit date | Issues
Konstantin Khlebnikov
d32a0be81af utimes: synch sys_utimensat syscall with the upstream
Konstantin Ozerkov
b8bad68c3cd ubc: Fix compilation when CONFIG_UBC_DEBUG_KMEM enabled
http://bugzilla.openvz.org/show_bug.cgi?id=1048
Konstantin Ozerkov
8b3ce58411c Fix OOPS while stopping VE after binfmt_misc.ko loaded
ve_binfmt_fini() should check whether the current VE has registered the binfmt_misc fs. (This properly handles stopping a VE that was started before binfmt_misc.ko was loaded.)
http://bugzilla.openvz.org/show_bug.cgi?id=1028
Konstantin Ozerkov
65247bdf6a9 Fix broken kernel compilation after previous patch.
Replace unknown type bool with int, true with 1 and false with 0.
Signed-off-by: Konstantin Ozerkov <kozerkov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Pavel Emelyanov
ecc8634e8b6 vfs: fix lock inversion in drop_pagecache_sb()
Backport mainstream commit: eccb95cee4f0d56faa46ef22fb94dd4a3578d3eb originally done by Dmitry Monakhov.
Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock before calling __invalidate_mapping_pages(). We just have to make sure inode won't go away from under us by keeping reference to it and putting the reference only after we have safely resumed the scan of the in...
Marat Stanichenko
13024ea95f3 Fix the possibility to mount nfs inside CT without appropriate feature
It was possible to mount nfs from inside a VE without the nfs feature being turned on. The bug only affected vanilla kernels, not RHEL5-based ones.
http://bugzilla.openvz.org/show_bug.cgi?id=1018
OpenVZ team
b47a7de72d5 linux-2.6.18-028stab056.1 released
Konstantin Khlebnikov
91e59389b1c VE: move /proc/vz/veinfo from vznetdev to vzmon
Since some people wish to run openvz w/o venet device, but vzlist tool relies on /proc/vz/veinfo file presence, vzmon module is a better place for this file.
venet part of changes
http://bugzilla.openvz.org/show_bug.cgi?id=394
Signed-off-by: Konstantin Khlebnikov <khlebnikov@parallels.com>
Konstantin Khlebnikov
8ae0eb995cf VE: move /proc/vz/veinfo from vznetdev to vzmon
Since some people wish to run openvz w/o venet device, but vzlist tool relies on /proc/vz/veinfo file presence, vzmon module is a better place for this file.
http://bugzilla.openvz.org/show_bug.cgi?id=394
Signed-off-by: Konstantin Khlebnikov <khlebnikov@parallels.com>
Alexey Dobriyan
ea0848bfd9b sysfs: add kernel.dummy-pde to prevent invisible /proc/sys/kernel #112482
This is a mess. There are global and VE0-local sysctl trees. They are populated on-demand: creating creates necessary directories if needed, removing removes directories if they become empty. It can very well happen that VE0-local tree is empty before first module registers sysctl. Now, if someone instantiates /proc/sys/kernel inode while VE0-local sysctl tree doesn't contain /proc/sys/kern...
Denis V. Lunev
a1948cc826b VE: decrease ve_struct size in case of huge NR_CPUS #97575
kstat_lat_pcpu_struct contains an array of NR_CPUS elements. Replace it with alloc_percpu data, which helps to keep ve_struct relatively small and prevents failures of high-order allocations. Mostly relevant to IA64, where NR_CPUS=1024.
Signed-off-by: Denis Lunev <den@openvz.org>
Denis V. Lunev
848fa0baa1c NFS: NFS super blocks in different VEs should be different
Teach nfs_compare_super about this.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Pavel Emelianov
e00104cc01b IOACCT: Fix read accounting discrepancy #111808
Two places changed between the time we ported Andrew's patches and the time he submitted them to mainline:
1. ll_rw_block doesn't need to call acct since it calls submit_bio, which does this
2. read_cache_pages should not acct read bytes in case of error
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Konstantin Khlebnikov
485c61929ac quota: Produce correct nlink count for /proc/vz/vzaquota #115343
Use the count of mountpoints accessible from the VE as an upper estimate for the number of subdirectories inside /proc/vz/vzaquota. Concept stolen from vzdq_aquotd_readdir. Disable enumeration in VE0 for performance reasons (like in _readdir and _lookup).
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Konstantin Khlebnikov
7e994a0758a VE: add ve mem class to sysfs
Create the mem class in VE sysfs along with some of its devices: null, zero, full, random, urandom #99897
Required for the udev package in Ubuntu 8.04 and maybe some other new distros.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@parallels.com>
Kirill Shileev
72d4d4e4ffc ns: sys_getpriority() should return ESRCH in case global PID is used
sys_getpriority(..., pid) must return ESRCH when pid is a real (global) one and the call is made inside a CT (and not CT0). This is needed to avoid false-positive detection of root exploits by the chkrootkit package.
http://bugzilla.openvz.org/show_bug.cgi?id=736
Signed-off-by: Kirill Shileev <kshileev@sw.ru>
Konstantin Khlebnikov
a63527f014c proc: Fix proc root entry nlink count
* Add entries from local tree, similar as in proc_getattr;
* Use per-ve process count for VE's root, rather than the total number of processes in the system.
All of the above is an upper estimate, which is perfectly fine for the 'find' utility.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Konstantin Khlebnikov
4769308d614 proc: Fix proc entries' nlink during VE system startup
Correct nlink counters on proc_dir_entry-es at their initial tossing around local and global trees.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Konstantin Khlebnikov
8074d7cbaa4 proc: Fix i_nlink for proc files in getattr #114644
Move the nlink correction from proc_lookup to proc_getattr and change it right in the stat buffer instead of the inode nlink.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Vitaliy Gusev
43b24d1164a Proc: add empty /proc/devices to CT #114847
Maik said that some fancy tools are disappointed by its absence on the one hand, but do not care for its content on the other.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Konstantin Khlebnikov
3bbe8dd156e nfs: fix macro in ve_nfs.h compilation when CONFIG_VE=n
Signed-off-by: Konstantin Khlebnikov <khlebnikov@parallels.com>
Pavel Emelyanov
8e30d1bf710 netlink: Fix oops in netlink conntrack module
If we load conntrack modules after ve start, one pointer on ve_struct is NULL and accessing it causes an oops. This is handled in most places, but not in the netlink interface. Fix this one as well.
http://bugzilla.openvz.org/show_bug.cgi?id=788
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Konstantin Khlebnikov
ff1030c1b25 NETFILTER: Conntrack sysctl mess fix
The ip_conntrack_checksum sysctls were added in the middle of the table, shifting the indices (somewhere between 2.6.9 and 2.6.18). As a result, sysctl initialization in ip_conntrack_sysctl_init became broken, and sysctl net.ipv4.netfilter.* showed strange values. (In 2.6.24-ovz all is OK.)
http://bugzilla.openvz.org/show_bug.cgi?id=866
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Pavel Emelianov
9eb0bce0cff sit: Virtualize sit device #115411
This mostly looks like the sit netnsization patches I did for mainstream, but has some peculiarities:
1. sit is built into the ipv6 module in this kernel
2. VE_FEATURE_SIT controls the sit availability in VE
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Pavel Emelianov
55d2ea41dc3 ipv4: Use per-VE ipv4_devconf_dflt for new devices
Otherwise, setting sys.ipv4.conf.default inside VE won't have any effect and will confuse userspace.
http://bugzilla.openvz.org/show_bug.cgi?id=826
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Denis V. Lunev
f6869d0c8f0 IGMP: IGMP packets should be sent in the context of correct VE.
Timers by default are processed in the context of VE0. Obtain the context from device and send the packet inside it. (Add 2 more timers after tangaldi's patch).
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Denis Tangalitchev <tangaldi@medi-a.ru>
Pavel Emelyanov
346258106f0 Compilation fixes
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Pavel Emelyanov
90813987d94 NET: Re-register sysfs entry when moving netdevice into VE
Seems like we only have to call netdev_(un)register_sysfs() in correct order with proper locking #98423
The only problem we may have is when register sysfs fails - there is no way to gracefully handle this case, since this may be on netdev_del or ve_stop action when returning error is not an option and trying to register this thing back may fail as well. Known side effect after this patch - the /sys/class/net/<dev>/device symlink is broken (maybe some others will be too), but we can't pull all ...
Pavel Emelyanov
f6262c1f8bc VE: virtualize binfmt_misc #99599
Nothing special. SUN jdk complains since it can't use binfmt. Not serious and java surely works fine w/o it, but just to make it and its users happy let's virtualize binfmt_misc.
1. register ve start-stop hook
2. register per-ve filesystem
3. make status variable per-ve
4. make list of entries per-ve
5. make vfsmnt per-ve (for simple_pin/release_fs)
6. don't forget to genocide the entries on VE s...
Denis V. Lunev
ced9e9dc0d7 ub: dentry->dentry_bc.d_ub is unreliable after the sleep #116095
d_kill can sleep inside. In this case dentry->dentry_bc.d_ub saved before is unreliable as we can have dcache accounting on event during sleep. In this case we'll have saved ub == NULL and OOPS/leak inside dcache_uncharge. Another problem here is that we should decrement inuse count on the dentry appropriately.
Signed-off-by: Denis Lunev <den@openvz.org>
Denis V. Lunev
fc07c2966c0 ubc: uncharging too much for TCPSNDBUF
It is not allowed to go to the label wait_for_memory with chargesize != 0 when this space is already placed to the skb.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Pavel Emelyanov
60b8367d210 ub: Set correct permissions on beancounters' proc files
All these files can be read only by root, but the mode mask is r--r--r--, which sometimes confuses the user. Set the r-------- mask for all the bc proc files and the r-x------ for directories.
http://bugzilla.openvz.org/show_bug.cgi?id=782
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
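The mode masks named in this commit message can be illustrated with a short Python sketch. It only renders the quoted permission bits with the standard library's `stat.filemode`; it is not the kernel change itself, and the constant names and the `render` helper are made up for illustration.

```python
import stat

# Hypothetical names; the octal values follow the masks quoted in the commit message.
OLD_FILE_MODE = stat.S_IFREG | 0o444  # r--r--r--: looks readable by everyone
NEW_FILE_MODE = stat.S_IFREG | 0o400  # r--------: readable by the owner (root) only
NEW_DIR_MODE = stat.S_IFDIR | 0o500   # r-x------: owner may list and traverse


def render(mode):
    """Return the ls-style permission string for a mode value."""
    return stat.filemode(mode)


if __name__ == "__main__":
    for name, mode in [("old file", OLD_FILE_MODE),
                       ("new file", NEW_FILE_MODE),
                       ("new dir", NEW_DIR_MODE)]:
        print(name, render(mode))
```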
Konstantin Khlebnikov
5067ea276c6 UBC: fix ub proc link count
Override getattr callback on /proc/bc and ubc entries to get correct nlink.
http://bugzilla.openvz.org/show_bug.cgi?id=672
Signed-off-by: Konstantin Khlebnikov <khlebnikov@parallels.com>
Pavel Emelianov
8b2b66c396f Fix pte allocation OOPS on ppc64 box
The return code from do_pte_alloc() MUST be checked, because it is now likely to fail due to UBC constraints.
http://bugzilla.openvz.org/show_bug.cgi?id=680
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Konstantin Khlebnikov
5a20b9ea158 UBC: kmem pid charge size fix for debug case
Add missing CHARGE_SIZE macro.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@parallels.com>
Konstantin Khlebnikov
10895b2cd82 ubc: alloc_pid irq fix when CONFIG_USER_RESOURCE=n
Fix irq enable/disable sequence in case CONFIG_USER_RESOURCE=n
Signed-off-by: Konstantin Khlebnikov <khlebnikov@parallels.com>
Denis V. Lunev
5d73fa6beb1 ub: ub_sock_tcp_chargesend warning if called via tcp_fragment #115332
BUG: warning at kernel/ub/ub_net.c:335/__ub_skb_set_charge() (Tainted: P )
 [<c043b658>] ub_sock_tcp_chargesend+0x86/0x171
 [<c05db289>] tcp_fragment+0xaf/0x452
 [<c05d507a>] tcp_sacktag_write_queue+0x30f/0x71e
 [<c05d568f>] tcp_ack+0x206/0x184c
 [<c05dc61a>] __tcp_push_pending_frames+0x4ab/0x79e
 [<c05d9f6f>] tcp_rcv_established+0x76b/0x884
 ...
Konstantin Khlebnikov
4f009818e2e IOPRIO: per (bc, device)-pair delays between bc enqueuing and activation
This total wait time in milliseconds is shown in /proc/bc/<bc>/ioprio_queues.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Pavel Emelyanov
be4fb0db7aa UBC: Show BC IO scheduler activity in proc
Things that may be of interest for a beancounter are:
* the number of requests on the BC
* whether or not this BC is during a request dispatch
* whether or not this BC is active
(all above is per-queue).
Add the /proc/bc/<id>/ioprio_queues file with the information described above.
sda 1 DA hda 0 sda 2
Signed-...
Pavel Emelyanov
d79028028f6 UBC: Show BC current IO priority
Surprisingly, there is currently no way on a running system to find out what IO priority a VE... sorry - CT has. Add the /proc/bc/<id>/ioprio file with (currently only) this information.
prio: 4
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Konstantin Khlebnikov
dea5ac70bb0 UBC: add ub counter
Add counter of ubc. Protected with ub_hash_lock. Needed for correct proc n_link calculation.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@parallels.com>
Vitaliy Gusev
2c2a453d7b3 ub: Check skb on repeated charging
This is a (useful) debug patch to find charge memory leak.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Konstantin Khlebnikov
c7a1e4e5f83 CFQ: remove variable time-slices
Remove variable bc timeslice -- with fair queue it is useless. BC gets bigger timeslice, but bandwidth stays unchanged, because the bc is queued according to its iotime.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Konstantin Khlebnikov
2c3e1d1372b CFS: Implement bandwidth distribution control
It is based on a (precalculated) per-ioprio iotime ratio coefficient. Bluntly speaking, the bc iotime flows with different speed, depending on ioprio. BC with lower IO priority sinks faster in queue and gets less bandwidth. Ratio coefficients are selected to conform with the bandwidth distribution in the previous implementation:
ratio = 100./(1+ioprio/7.)
ioprio 0 -> ratio 100 -> weight 1
ioprio ...
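The ratio formula quoted in this commit message can be evaluated directly. The following Python sketch is only an illustration of that arithmetic, not the kernel's precalculated table; the function name is hypothetical, and the 0..7 range of ioprio values used in the loop is an assumption, since the message is truncated after the first row.

```python
def iotime_ratio(ioprio):
    """Ratio coefficient from the commit message: 100 / (1 + ioprio / 7)."""
    return 100.0 / (1.0 + ioprio / 7.0)


if __name__ == "__main__":
    # Assumed range 0..7; only the ioprio 0 row survives in the message.
    for prio in range(8):
        print(prio, round(iotime_ratio(prio), 1))
```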
Konstantin Khlebnikov
7a1de40ab88 CFQ: Correct the bc-data in-queue position
This patch eliminates BC IO overdose/underdose after bc start, wakeup or iotime overlap. Queue position (cfq_bc_position) is the upper edge of the previous active BC iotime. The position is updated at BC switch if it was before the new active BC iotime. A BC is enqueued according to its current iotime if that is within the interval [position -/+ maximum_slice]; otherwise the iotime is changed to the queue position. ...
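The enqueue rule stated in this message can be sketched as a small Python function. This is a toy model under stated assumptions, not the CFQ code: the function and parameter names are hypothetical, and treating the interval bounds as inclusive is a guess, since the message does not say.

```python
def enqueue_iotime(iotime, position, maximum_slice):
    """Return the iotime a BC is enqueued with.

    Keep the BC's own iotime when it lies within
    [position - maximum_slice, position + maximum_slice] (bounds assumed
    inclusive); otherwise reset it to the queue position.
    """
    if position - maximum_slice <= iotime <= position + maximum_slice:
        return iotime
    return position


if __name__ == "__main__":
    print(enqueue_iotime(105, 100, 10))  # inside the window, kept
    print(enqueue_iotime(50, 100, 10))   # far behind, moved to the position
```

A BC that slept for a long time would otherwise come back with a very small iotime and monopolize the queue; resetting it to the position is what the message calls eliminating the overdose.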
Konstantin Khlebnikov
191aa99d3c0 CFQ: Replace list of beancounters with rbtree #98276
Replace round-robin scheduling in CFQ BC level with "fair queuing" scheme based on used io time accounting, by replacing the list of per cfq bc-data with rb-tree based priority queue ordered by total used io time (cfq_bc_iotime). This iotime is a monotonically increasing counter of bc total used io time. On bc switch the iotime of previous active bc is updated according to its used time and the bc with sm...
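A priority queue ordered by used I/O time can be modelled in a few lines of Python. This sketch swaps the rb-tree for the standard library's `heapq` binary heap, and it assumes that the entry with the smallest iotime is served next (the message is truncated at that point). The class and method names are hypothetical.

```python
import heapq


class FairQueue:
    """Toy model: serve the beancounter with the smallest used I/O time."""

    def __init__(self):
        self._heap = []  # entries are (iotime, bc_id) tuples

    def enqueue(self, bc_id, iotime=0):
        heapq.heappush(self._heap, (iotime, bc_id))

    def run_one(self, used_time):
        """Pick the BC with the least iotime, charge used_time, requeue it."""
        iotime, bc_id = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (iotime + used_time, bc_id))
        return bc_id


if __name__ == "__main__":
    queue = FairQueue()
    queue.enqueue("a")
    queue.enqueue("b", iotime=25)
    print([queue.run_one(10) for _ in range(4)])
```

Unlike a round-robin list, a beancounter that has used less I/O time is picked repeatedly until its counter catches up with the others.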
Pavel Emelianov
77bd99cd195 UBC: show how much page beancounters each UB has #114660
Essentially, this is the per-UB rss value calculated (unlike physpages and privvmpages) w/o taking sharing into account. With this statistics (shown via /proc/bc/XXX/vmaux:rss) we can evaluate the portion of pages that are shared across beancounters (i.e. CTs) like this:
(\sum (bc.rss + bc.tmpfs_respages) - \sum (bc.physpages)) / (\sum (bc.rss + bc.tmpfs_respages))
Signed-off-by: P...
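The shared-page estimate in this message is plain arithmetic over per-beancounter counters. A minimal Python sketch follows; the dictionary layout and the function name are assumptions made for illustration, and the sample numbers are invented, not measurements.

```python
def shared_fraction(beancounters):
    """Portion of pages shared across beancounters, per the commit's formula.

    Each item is a dict with 'rss', 'tmpfs_respages' and 'physpages' keys
    (a hypothetical layout, not the /proc file format).
    """
    total_rss = sum(bc["rss"] + bc["tmpfs_respages"] for bc in beancounters)
    total_phys = sum(bc["physpages"] for bc in beancounters)
    if total_rss == 0:
        return 0.0  # nothing resident, so nothing can be shared
    return (total_rss - total_phys) / total_rss


if __name__ == "__main__":
    sample = [
        {"rss": 100, "tmpfs_respages": 0, "physpages": 75},
        {"rss": 100, "tmpfs_respages": 0, "physpages": 75},
    ]
    print(shared_fraction(sample))
```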
Denis V. Lunev
2e032d9c40f simfs: do not use s_root dentry of underlying for statfs #115232
The real problem is that s_root on the NFS super block is crap. Unfortunately, the original dentry (which is asked to be statfs-ed) is not available at this point. The only visible solution for this is to use the dentry that simfs points to.
Signed-off-by: Denis V. Lunev <den@parallels.com>
Michael Halcrow
91b894e71be tpm: Security vulnerability in tpm_write()
tpm_write() has an elementary security vulnerability that we should consider to be exploitable. An unsigned value is assigned to a signed value, and then a bounds check is made against the signed value. This patch changes the signed data type into its proper unsigned data type. The patch also corrects the data type in tpm_read().
This issue has already been announced in a public forum: http:/...
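The class of bug described here, an unsigned size stored in a signed variable before a bounds check, can be reproduced outside the kernel. The Python sketch below uses `ctypes` fixed-width integers to mimic the C conversions. It is not the driver code: `BUF_SIZE` and both function names are hypothetical, and a 32-bit C `int` is assumed.

```python
import ctypes

BUF_SIZE = 4096  # hypothetical buffer capacity, not taken from the driver


def checked_size_signed(size):
    """Buggy pattern: store an unsigned size in a signed int, then clamp."""
    in_size = ctypes.c_int(size).value  # large values wrap to negative
    if in_size > BUF_SIZE:
        in_size = BUF_SIZE
    return in_size  # a negative value slips past the upper-bound check


def checked_size_unsigned(size):
    """Fixed pattern: keep the size unsigned so the clamp always applies."""
    in_size = ctypes.c_size_t(size).value
    if in_size > BUF_SIZE:
        in_size = BUF_SIZE
    return in_size


if __name__ == "__main__":
    huge = 0xFFFFFFFF
    print(checked_size_signed(huge), checked_size_unsigned(huge))
```

In C, the negative result would later be converted back to a huge unsigned length when passed to a copy routine, which is why the check has to be made on the unsigned type.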
Masayuki Nakagawa
b3347b45812 tcp: Fix use-after-free of some sk_buff in tcp code #111423
Backport of ms patch fb7e2399ec17f1004c0e0ccfd17439f8759ede01:
I encountered a kernel panic with my test program, which is a very simple IPv6 client-server program. The server side sets IPV6_RECVPKTINFO on a listening socket, and the client side just sends a message to the server. Then the kernel panic occurs on the server. (If you need the test program, please let me know. I can provide it...