Tough one this. I've googled and posted elsewhere, but I thought I might try here.
Using 14.04 on my new W541, everything works fine but it crashes fairly regularly - looks like X/driver issues. So I took the advice to upgrade the HWE stack to the one from, I think, 15.04, using the instructions on the Ubuntu site.
Now everything works well except my battery life has been slashed. Power consumption was about 12W; now it's about double that. Laptop's hot, fan comes on at the drop of a hat.
No unusual processes listed under top
Powertop tunables are all good
Powersave governor on
intel_pstate driver in use, however disabling this makes no difference
Powertop reports lots of wakeups per second - a couple of hundred even when idle - but contrary to what the internet suggests I can't get it to tell me WHAT is responsible for the wakeups.
Stuck.
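One rough way to see what is behind the wakeups when powertop won't say: take two snapshots of /proc/interrupts and compare them. This is only a sketch in Python. Interrupt counts are a proxy for wakeups rather than the same thing, and the function names here are made up for illustration.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_name: (total_count, label)}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row has one column per CPU
    counts = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        fields = rest.split()
        nums = []
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break
            nums.append(int(tok))
        label = " ".join(fields[len(nums):])
        counts[name.strip()] = (sum(nums), label)
    return counts


def top_wakeup_sources(before, after, seconds, limit=10):
    """Return the busiest interrupt sources as (per_second, name, label)."""
    rates = []
    for name, (count, label) in after.items():
        prev = before.get(name, (0, label))[0]
        rate = (count - prev) / seconds
        if rate > 0:
            rates.append((rate, name, label))
    rates.sort(reverse=True)
    return rates[:limit]


def sample_live(seconds=10):
    """Snapshot the real /proc/interrupts twice, `seconds` apart (Linux only)."""
    def snapshot():
        with open("/proc/interrupts") as f:
            return parse_interrupts(f.read())
    first = snapshot()
    time.sleep(seconds)
    second = snapshot()
    return top_wakeup_sources(first, second, seconds)
```

Calling sample_live() on an idle machine and printing the result should show whether a timer, a network device, the GPU or USB is generating most of the interrupts.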
Dmesg then grep for power, wake etc?
Has this laptop got that funky Nvidia Optimus thing in it?
An integrated graphics controller for low performance but low power, and a super-fast Nvidia thing for gaming, which eats lots of power?
I think the support for this is a bit experimental....
Power consumption of what? Complete platform or just the CPU e.g. using the likwid tools to measure it?
(likwid-powermeter -s 30 and let the machine idle and post back what it spits out, along with the CPU model, I can't remember if it does that. I power monitor a bunch of machine cpus like this for various reasons, so depending on CPU I can guesstimate what I'd expect to see)
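On the platform-versus-CPU question, here is a hedged Python sketch of the arithmetic. It assumes the battery shows up as BAT0 under /sys/class/power_supply and that the kernel exposes the Intel RAPL counters under /sys/class/powercap. Both paths are assumptions about this particular machine, and the function names are invented for illustration.

```python
import time


def read_int(path):
    """Read a single integer from a sysfs file."""
    with open(path) as f:
        return int(f.read().strip())


def battery_watts(power_now_uw=None, current_now_ua=None, voltage_now_uv=None):
    """Whole-platform draw on battery, from sysfs values in micro-units.

    Some batteries expose power_now (microwatts); others only expose
    current_now (microamps) and voltage_now (microvolts).
    """
    if power_now_uw is not None:
        return power_now_uw / 1e6
    if current_now_ua is not None and voltage_now_uv is not None:
        return (current_now_ua / 1e6) * (voltage_now_uv / 1e6)
    raise ValueError("need power_now, or current_now and voltage_now")


def rapl_watts(energy_start_uj, energy_end_uj, seconds, max_range_uj=None):
    """Average CPU package power from two RAPL energy readings (microjoules)."""
    delta = energy_end_uj - energy_start_uj
    if delta < 0 and max_range_uj is not None:
        delta += max_range_uj  # the counter wrapped during the sample
    return delta / 1e6 / seconds


def sample_package_watts(seconds=30, zone="/sys/class/powercap/intel-rapl:0"):
    """Sample the assumed RAPL zone for `seconds`; may need root on some kernels."""
    start = read_int(zone + "/energy_uj")
    time.sleep(seconds)
    end = read_int(zone + "/energy_uj")
    return rapl_watts(start, end, seconds, read_int(zone + "/max_energy_range_uj"))
```

If the battery figure is around 24W while the package figure sits near the idle value expected for the chip, the extra draw is coming from somewhere other than the CPU.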
As above, optimus is a barrel of snakes best avoided.
I use a very similar machine (dell precision laptop) that has zero bother with linux BUT i disable optimus. Power savings of the integrated aren't worth the unreliability for me and I'm generally using external displays or needing the GPU anyway.
As a *rough* guide (i can be more specific with more info on the machine) I'd expect idle chip power reported by the likwid tools to be about 5W on the cpu. (if they don't run, try root or with/without the daemon options. You might need to run the daemon as root. The approach that works varies by machine IME. RTFM.)
Nvidia-smi will tell you what the GPU is up to. You can "watch -n 1 nvidia-smi" to monitor it (and e.g. pipe that to a file).
Other things that can cause issues:
How are your external ports wired, if you know? Direct to GPU? Switchable? MST nonsense?
Skylake CPU? Had issue with the GPU on those under 14.04.
There's a system tray thing which monitors the power governor mode, worth trying. I forget the name and I'm not sat in front of the right machine. The likwid tools would show up this issue though.
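Without the tray applet, the governor and scaling driver can be read straight from sysfs. A minimal Python sketch, assuming the usual cpufreq layout under /sys/devices/system/cpu (the helper names are hypothetical):

```python
import glob
import os
from collections import Counter


def read_cpufreq(field, root="/sys/devices/system/cpu"):
    """Collect one cpufreq field (e.g. scaling_governor) from every CPU."""
    values = []
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq", field)
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            values.append(f.read().strip())
    return values


def cpufreq_report(root="/sys/devices/system/cpu"):
    """Count how many CPUs use each scaling driver and each governor."""
    return {
        field: dict(Counter(read_cpufreq(field, root)))
        for field in ("scaling_driver", "scaling_governor")
    }
```

Printing cpufreq_report() after a reboot or a kernel change shows whether every core is still on powersave, or whether an update quietly switched some of them.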
It is Optimus, but it was working flawlessly yesterday, as said, with 9 hours battery life. (except for once daily crashing)
Powertop reports nvidia gpu powered down.
dmesg |grep wake
[ 0.206015] pci 0000:00:14.0: System wakeup disabled by ACPI
[ 0.206416] pci 0000:00:19.0: System wakeup disabled by ACPI
[ 0.206579] pci 0000:00:1a.0: System wakeup disabled by ACPI
[ 0.206962] pci 0000:00:1c.1: System wakeup disabled by ACPI
[ 0.207088] pci 0000:00:1c.2: System wakeup disabled by ACPI
[ 0.207379] pci 0000:00:1d.0: System wakeup disabled by ACPI
[ 0.793343] rtc_cmos 00:02: RTC can wake from S4
I don't have time to learn likwid now, will try this evening.
As said - everything worked perfectly before I upgraded the distribution. I think this is down to high wakeups, but I don't know how to trace why they are happening.
[code]
System baseline power is estimated at 31.9 W

Power est.   Usage         Device name
  12.2 W       0.0 pkts/s  Network interface: eth0 (e1000e)
  7.19 W     100.0%        Radio device: thinkpad_acpi
  5.66 W     100.0%        USB device: usb-device-8087-07dc
  3.46 W       0.0 pkts/s  nic:vmnet8
  1.48 W      10.2%        DRAM
   592 mW     10.2%        CPU core
   542 mW    100.0%        Display backlight
   310 mW      0.0 pkts/s  nic:vmnet1
   201 mW     10.5 ops/s   GPU core
   137 mW      7.1 pkts/s  Network interface: wlan0 (iwlwifi)
   113 mW     10.2%        CPU misc
  52.3 mW    100.0%        Radio device: iwlwifi
  7.94 mW     10.5 ops/s   GPU misc
[/code]
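A quick sanity check on that powertop table, sketched in Python (the parser and its names are illustrative, not part of powertop). The per-device rows add up to almost exactly the 31.9 W baseline. That suggests powertop is dividing a measured total between devices rather than measuring each one, so 12.2 W against an eth0 that is passing no packets is probably an artefact of the estimate, not real draw.

```python
import re

# A row starts with a number and a unit of W or mW; the rest is free text.
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(mW|W)\s+(.*)$")


def parse_power_rows(text):
    """Return [(watts, rest_of_row)] for each device row in powertop's table."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            scale = 0.001 if match.group(2) == "mW" else 1.0
            rows.append((float(match.group(1)) * scale, match.group(3).strip()))
    return rows


def total_watts(text):
    """Sum the estimated power of every device row."""
    return sum(watts for watts, _ in parse_power_rows(text))
```

Pasting the table into total_watts() gives about 31.9, matching the baseline line at the top.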
I don't have time to learn likwid now, will try this evening.
When you do, it's pretty easy. Just install it then
"likwid-powermeter -s 30"
to sample for 30s. If it doesn't work, try as root, and also try with -M 0 or -M 1. Worth doing "modprobe msr" first to be sure the module is loaded.
Get it here: https://github.com/RRZE-HPC/likwid
More info: https://code.google.com/p/likwid/wiki/LikwidPowermeter
I spent a lot of time getting Optimus to work properly. It does make a huge difference to the graphics performance, but it runs very hot (if you have a docking station and two or more monitors, you need Optimus to be working).
Have you done the easy stuff, such as checking the BIOS is bang up to date? And installed the latest Unix NVIDIA driver? Also, going from working flawlessly one day to broken the next, did you install any package updates?
working flawlessly
errr
once daily crashing
Doh, I was going to say run powertop, but if it's showing lots of wakeups then eventstat will dump out what's causing them.
I’m sorry Molgrips for laughing here but you do seem to be allergic to technology 😆
Rachel
1889 Total events, 188.90 events/sec (kernel: 63.20, userspace: 125.70)
I’m sorry Molgrips for laughing here but you do seem to be allergic to technology
It's not that - it's because I try and make it do what I want, and I have quite specific requirements. Most people don't give a crap. My colleagues are happy for their laptop to be burning away with all sorts of shite running on it.
Booted with the old kernel, 3.13 instead of 3.19, and battery usage has gone back to normal.
Did you check the nvidia driver version on both?
Auto kernel upgrades can break the nvidia module if you don't re-install the driver - had this problem on a recent ubuntu kernel upgrade. Might have been the same thing - try 3.19 and re-install the nvidia drivers so the module gets rebuilt and try that?
Good idea ta.
The driver tool thingy gives me the choice of 340 and 352 - the choices are the same post upgrade.
I'll try that later.
If you do build from the NVIDIA installer, make sure to use the --dkms option; that way it will survive kernel updates.
I'm running 346 with Optimus working well, however 352 should work as well
If you do build from the NVIDIA installer, make sure to use the --dkms option; that way it will survive kernel updates.
Should survive is more accurate. Some of our systems have been fine for many kernel updates, but a recent one broke the nvidia driver; re-installing it to force the modules to rebuild fixed it.
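To check whether the nvidia module actually got rebuilt for a given kernel, the output of "dkms status" can be parsed. This is a sketch in Python, not a definitive tool: the line format differs between dkms versions, so both layouts shown in the comments are assumptions, and the version numbers in them are made-up examples.

```python
def parse_dkms_status(text):
    """Parse `dkms status` output into (module, version, kernel, state) tuples.

    Assumed line formats (illustrative values only):
      nvidia-352, 352.63, 3.19.0-30-generic, x86_64: installed
      nvidia/352.63, 3.19.0-30-generic, x86_64: installed
    """
    entries = []
    for line in text.splitlines():
        left, sep, state = line.rpartition(":")
        words = state.split()
        if not sep or not words:
            continue
        # Newer dkms writes module/version; normalise that to the comma form.
        parts = [p.strip() for p in left.replace("/", ", ", 1).split(",")]
        if len(parts) < 3:
            continue  # e.g. "added" entries that name no kernel yet
        entries.append((parts[0], parts[1], parts[2], words[0]))
    return entries


def module_built_for(entries, module_prefix, kernel):
    """True if a module starting with module_prefix is installed for kernel."""
    return any(
        module.startswith(module_prefix) and built == kernel and state == "installed"
        for module, _version, built, state in entries
    )
```

Feeding it the real "dkms status" text together with platform.release() from the booted kernel would show whether nvidia (and bbswitch) are installed for the kernel in use, or only for the older one.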
I've been using Jockey, the gui thingy, to install nvidia drivers.
Well that's pissed me off.
3.13 was the original kernel that came with 14.04, and reverting to it always put battery life back to normal. Except now, I ran the software updater and now it's buggered too - double the power usage.
It's a Haswell CPU - Intel(R) Core(TM) i7-4810MQ
As such likwid doesn't work on it.
Right. IA wins.
If I install a new kernel, I have to *boot into it* and then fully purge and reinstall nvidia drivers. However - it's made a difference but not fixed it. Power is now about 19W or so with 3.19, instead of the previous 12W with 3.13.
If I switch to nvidia graphics, it stays the same.
EDIT oh, it's because nvidia is still powered up, must not've purged bbswitch properly.
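Since bbswitch is what powers the discrete card off, its state file is a quick way to confirm whether the nvidia GPU is really down. A small Python sketch, assuming bbswitch is loaded and exposes /proc/acpi/bbswitch in its usual "bus-id ON/OFF" form (the helper names are invented):

```python
def parse_bbswitch(text):
    """Parse bbswitch state text like '0000:01:00.0 OFF' into (bus_id, powered)."""
    fields = text.split()
    if len(fields) != 2 or fields[1] not in ("ON", "OFF"):
        raise ValueError("unexpected bbswitch state: %r" % text)
    return fields[0], fields[1] == "ON"


def discrete_gpu_powered(path="/proc/acpi/bbswitch"):
    """Return True/False for the card's power state, or None if bbswitch is absent."""
    try:
        with open(path) as f:
            return parse_bbswitch(f.read())[1]
    except FileNotFoundError:
        return None
```

A result of None means the bbswitch module isn't loaded at all, which would fit the "must not've purged bbswitch properly" theory.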
Good job on the update to document the fix.
Rather strange that I couldn't find any reference to this online, I've been searching for days. IA was the closest, tbh.
Could it be that it's just MY build?
Shit.. rebuilt nvidia properly, nvidia card off, still around 17W now. Something definitely going on.
Pfft. Spoke too soon. Both cards are back on again this morning 🙁
likwid does work on Haswell, I run it every day.
Did you:
1) remember to modprobe msr
2) Try the with/without daemon options
3) if you tried the daemon, is it setuid
One of the above will sort it.
It will tell you CPU power usage so rule out that (or tell you it's the governor).
I can tell you exactly what the idle on a 4810MQ is this afternoon, as I'll be sat beside one, but for now I can only tell you I'd expect it to be between 4 and 5 watts. Which TBH is good enough for you to debug.
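Points 1 and 3 of that checklist can be checked mechanically. A hedged Python sketch: the function names are invented, and where the likwid access daemon binary lives is an assumption that depends on how likwid was installed.

```python
import os
import stat


def module_loaded(name, proc_modules_text):
    """True if `name` appears as a loaded module in /proc/modules text."""
    return any(
        line.split()[0] == name
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


def msr_device_present(path="/dev/cpu/0/msr"):
    """The msr driver creates this device node, whether built in or a module."""
    return os.path.exists(path)


def is_setuid(path):
    """True if the file at `path` has the setuid bit set."""
    return bool(os.stat(path).st_mode & stat.S_ISUID)
```

For example, module_loaded("msr", open("/proc/modules").read()) covers the modprobe step, and is_setuid() pointed at wherever likwid-accessD was installed covers the daemon.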
Both cards are back on again this morning
Sounds more like optimus issues then, maybe some sort of version mismatch. And this sort of nonsense is why I don't go near optimus with someone else's bargepole.
I got an error message:
Query Turbo Mode only supported on Intel Nehalem/Westmere/SandyBridge/IvyBridge processors!
Dunno what you mean about daemon options. This is way above my linux pay grade.
I think it's Intel graphics driver related. Now I seem to be able to switch the nvidia card off in Intel mode, but it uses MORE power than if I use the nvidia card. If I switch to nvidia it uses pretty much what I used to see on nvidia before I upgraded.
And yes, optimus has been a pain, sadly I have no choice. TBH at this stage, Windows is beckoning, but I have no time to do a rebuild.
About once a year I consider installing Linux, then one of your threads puts me off. Thanks for this valuable public service 🙂
After a few years trying this, I can give my considered opinion.
If you are lucky, it's great. If not, it's a nightmare.
It's probably 50/50 though so it's worth a try 🙂 For a lot of people it goes well.
The trick to getting lucky is to use well supported hardware setups.
Sounds like you need a newer likwid version - grab the git latest and try that.
3.0.0 was the latest tar on their site.
I'd have thought Lenovo W series would be well supported....
Can you buy a Lenovo with Ubuntu? E.g. You can buy a Precision with Ubuntu. Though they'll disable Optimus (and indeed I do too)
By disabling optimus, you mean using integrated only graphics or discrete only graphics?
That was a BIOS option on my old W520, but it's not on this one. And besides, there are downsides. Integrated means no external displays, discrete means worse battery life. Just over half as much, on that old machine.
On 14.04 optimus works fine OOB, with nvidia-prime. External monitors worked with no config required.. Unless it was responsible in some way for the instability, which I don't suspect. But maybe.
I'm going to try re-installing the X stack and intel drivers.. if I get the chance because a training ride is at the top of the list 🙂
A thought - tried 15.10? There's a fair bit of hardware support* not backported yet that won't appear in LTS till 16.04....which is a few months away if you can wait.
*skylake on 14.04, arg! Faff.
I'm trying to stick to the LTS versions - they are the officially supported ones for work. And it takes a long time for them to approve and rebuild all the work spyware for them.
If reinstalling the X stack doesn't work I've got to go back to Windows. Can't handle a high pressure work situation with a flaky laptop. Still keeps crashing - folded up today a couple of times within an hour.
Although I should probably try and investigate more. Problem is there's about four different ways it crashes, you don't always get anything in logs.
Can't reinstall X packages. It complains about needing a version of some package (libcheese) which won't install, presumably because it's not in my repository. It appears that I upgraded the HWE stack, which included a load of packages labelled wily, but wily is not in my apt sources.
This can't be right surely?
What you want is a Macbook, then you get the benefits of Windows style reliability for all the graphics stuff just working, but can run native Linux apps etc as well.
Ah but if he needs a powerful machine apple don't make any, I use macs at home and Linux at work for this reason.
And that's only a slightly trolling response 😉
If you need reliable and Linux disable the Intel GPU and take the battery hit.
If you need battery life take the GPU performance hit.
If you need both, try Windows and good luck with all the things you needed Linux for? 😉
If you want Linux + raw power, just hire cycles on a cloud somewhere and remote the display...
Yeah well, my faith in Linux may have been misplaced, which might've influenced my previous decision had I known how bad 14.04 was going to be.
16GB limit on the Macbook would've been an issue, although not a major one.
If you need reliable and Linux disable the Intel GPU and take the battery hit. If you need battery life take the GPU performance hit.
I can't do either of those things. The BIOS option to do that is gone.
Anyway.
Managed to find a suggested set of packages that removed all the -vivid and/or -trusty related X/Intel packages and replaced them with the non version specific names. Which I assume means it's the latest one from the repo... which appears to be trusty. So.. in graphics terms at least I've effectively undone the HWE upgrade. But that was kind of broken, it seems.
I suspect something to do with OpenGL rendering. glxinfo reports hardware rendering, but server vendor string is SGI - suspicious, but I don't know how significant that is.
**** this shit.
Decision made. Windows at the weekend. Just have to make it through the week without an embarrassing crash in front of a client.
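On the glxinfo point: as far as I know, "SGI" is simply what the stock X.Org GLX module reports as its server vendor string, so on its own it probably says little. The renderer string is the more telling line. A rough Python sketch for pulling out the relevant fields (the function names and the list of software renderer names are my own assumptions):

```python
WANTED = (
    "direct rendering",
    "server glx vendor string",
    "OpenGL vendor string",
    "OpenGL renderer string",
)


def parse_glxinfo(text):
    """Extract the first occurrence of each interesting field from glxinfo output."""
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in WANTED and key not in found:
            found[key] = value.strip()
    return found


def looks_software_rendered(info):
    """Heuristic: Mesa's software renderers name themselves in the renderer string."""
    renderer = info.get("OpenGL renderer string", "").lower()
    return any(name in renderer for name in ("llvmpipe", "softpipe", "software rasterizer"))
```

If the renderer names the Intel or NVIDIA hardware and direct rendering says Yes, the OpenGL side is likely fine and the power problem lies elsewhere.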
If you want Linux + raw power, just hire cycles on a cloud somewhere and remote the display...
Also not really an option for many types of work that require raw power.
I think this machine's got faulty hardware. I've installed clean Windows on it (on a spare partition) and it's already fallen over more times than a Blackpool hen night.
Right, I think it might be RAM related.
The minidump from the Windows crash implicated the kernel, which according to the internet suggests a ram problem. I did a Windows ram check last night which came up with nothing, and today it's been fine.
I wonder if somehow exercising the entire RAM helped?
Very unlikely, unless it's not seated right and heating up helped it.
Reseat the ram?
Yeah, it's on my list.
About 19 hours without a crash now. Windows has also settled down from its initial update mania, so it's cooler.
I'm still on 12.04 after the installation of 14.04 said my graphics card wasn't up to it...