A lot of people think that defective RAM modules are easy to detect. They think that if you get a blue screen of death, the error message will reveal that the memory is bad, and that’s that. Problem is, it’s nowhere near that simple.
There are two steps to diagnosing bad RAM. The first is recognizing the possible symptoms of bad memory, and the second is using a tool to confirm that your theory is correct. Then you just replace the bad module, and off you go.
A defective memory stick can cause a LOT of different problems, some of which might surprise you. It may seem that your hard drive is corrupted or defective, when in fact it’s the RAM that’s bad! It can all get very confusing.
And so, here I shall endeavor to describe what to look out for with your RAM, and what to do about it.
The number one fact to keep in mind at the start is that Windows error messages are not always terribly “reliable”. Sure, most of the time they are very helpful because they tell you what went wrong. In the case of Windows 7, the operating system will often detect a problem and offer to fix it for you, and that actually works well most of the time!
But if you have just built a new computer with all new components, figuring out which new device is causing your problems can be a bit hairy. The error message you get may be hiding a deeper issue. Even if your computer is an oldie but goodie, bad RAM can manifest as all kinds of different problems that will have you reinstalling software and jumping through all kinds of hoops for no good reason. And that, my friends, is bad juju.
I recently built a computer, installed Windows 7, and handed it over to its happy new user. That’s when the fun began!
First, there were BSODs indicating that data on the hard drive was corrupted. Chkdsk fixed that automatically. Then there was a problem with the graphics card, or so it appeared! Then the machine worked fine for a day or two, but each morning when it was powered up, there would be another blue screen. Of the numerous BSODs encountered, only one showed a hexadecimal stop code (which I looked up on Google) that meant “Dude, you’ve got bad RAM”. The rest of the errors pointed to problems with practically every other device in the computer, as well as the firewall software, anti-malware software, etc!
Well, you might think that I had just constructed a real lemon of a computer. It turns out, though, that one of the two 2GB DDR3 memory modules was bad.
You see, there are many things you need to keep in mind about what the operating system is doing, and what the hardware itself is doing.
For example, Vista and Windows 7 include a wonderful feature called “address space layout randomization” (ASLR). In short, ASLR loads key parts of the operating system at randomized memory addresses every time you boot, instead of at the same fixed location. This means that malware which expects to find certain system code at a fixed address will not have any luck. It’s a nice little security feature that is also used in other operating systems.
So why does ASLR matter?
Well, when a memory stick is bad, it isn’t the entire module that’s kaput. It may be one teeny little cell that causes a single-bit error in a huge chunk of data. That’s enough to cause a blue screen, though, if the bit is “in the right place at the right time”. When a computer with bad RAM boots, sometimes it might work fine, and sometimes it might not. It could depend on whether a key system file is loaded into the portion of the RAM stick that has the bad bit. It could also depend on whether some service for, say, your 3rd-party firewall lands on the bad part of the RAM, or tries to access a system file that has been loaded into the faulty portion of the memory stick.
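To get a feel for how much damage one bad bit can do, here is a small Python sketch. It is only an illustration under made-up assumptions: a 4-byte length field holding the value 1000, and a hypothetical faulty cell at bit 20. Flipping that single bit turns a modest number into one over a million, which is exactly the kind of nonsense that makes a driver or service fall over.

```python
import struct

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, like a faulty RAM cell might do."""
    buf = bytearray(data)
    byte_pos, bit_pos = divmod(bit_index, 8)
    buf[byte_pos] ^= 1 << bit_pos
    return bytes(buf)

stored = struct.pack("<I", 1000)      # a hypothetical 4-byte little-endian length field
read_back = flip_bit(stored, 20)       # hypothetical bad cell at bit 20
value = struct.unpack("<I", read_back)[0]
print(value)  # 1049576 (1000 + 2**20)
```

One bit out of 32 changed, and the value is now off by 1,048,576.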
While ASLR is a nice feature for security, it can make diagnosing RAM problems a bit more tricky! Sometimes things work, sometimes they don’t. You might then falsely assume that your RAM is fine, and it must be the hard drive, or the graphics card, or that new piece of software you just installed. Just because the system crashes on loading a 3rd-party firewall doesn’t mean that the firewall is the problem!
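The “sometimes it works, sometimes it doesn’t” effect can be shown with a toy simulation. This is NOT how Windows actually places code in memory (real systems use virtual memory and page allocation, which this ignores). The page counts, the bad page number and the component size are all hypothetical values picked for illustration. Each simulated boot puts one component at a random position, and a boot “crashes” only if that position happens to cover the bad page.

```python
import random

TOTAL_PAGES = 1000      # hypothetical amount of RAM, in pages
BAD_PAGE = 742          # hypothetical page containing the faulty cell
COMPONENT_PAGES = 50    # hypothetical size of one system component, in pages

def covers_bad_page(start: int) -> bool:
    """True if a component loaded at page `start` overlaps the bad page."""
    return start <= BAD_PAGE < start + COMPONENT_PAGES

def simulate_boots(boots: int, seed: int = 7) -> list[bool]:
    """Pick a new random load position on every boot and record whether it hit the bad page."""
    rng = random.Random(seed)
    last_start = TOTAL_PAGES - COMPONENT_PAGES   # keep the whole component inside RAM
    return [covers_bad_page(rng.randint(0, last_start)) for _ in range(boots)]

results = simulate_boots(200)
print("".join("X" if crashed else "." for crashed in results))
print(f"{sum(results)} of {len(results)} simulated boots touched the bad page")
```

Most boots come out clean and a few do not, with no obvious pattern, which is exactly why it is so tempting to blame whatever happened to be loading at the time.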
Also, keep in mind that generally speaking, all data in the computer gets stuffed through the processor and RAM. So just because it looks like your hard drive is corrupting data doesn’t mean that the data wasn’t corrupted in RAM before it was sent to the hard drive.
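Here is a small Python sketch of that point, using a temporary file as a stand-in for the hard drive and a hypothetical single-bit flip at byte 1234 as a stand-in for the bad RAM. The drive stores exactly what it was handed; the damage happened before the write.

```python
import hashlib
import os
import tempfile

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b"important document contents" * 100
buffer = bytearray(original)   # the copy sitting in RAM just before the write
buffer[1234] ^= 0x04           # hypothetical single-bit flip caused by a bad cell

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:   # "hard drive" stores whatever it is given
        f.write(buffer)
    with open(path, "rb") as f:
        on_disk = f.read()
finally:
    os.remove(path)

print(sha256(on_disk) == sha256(bytes(buffer)))  # True: the drive stored it faithfully
print(sha256(on_disk) == sha256(original))       # False: but it was already corrupted
```

If you only ever look at the file on disk, the drive gets the blame.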
In any case, if you are experiencing a lot of strange behavior that doesn’t seem to make sense, stop pulling out your hair and download Memtest86+ v4.00.
The file is an ISO, so you’ll need to burn it to a blank CD (ImgBurn is a free tool that can do this). If you have some other app like Nero, you can use that to burn the ISO instead if you wish.
Once the CD is burned, pop it in your CD/DVD drive, reboot, and make sure to boot from the CD. The memory test will automatically fire up and start running. I recommend that you wait for at least one full pass, preferably 2 or 3 just to be sure. It can take a while!
You should see something like the following screenshot. Note that red entries mean it found an error!
Now, here’s where it gets fun: even if Memtest86+ doesn’t find ANY errors, that doesn’t mean your RAM is fine. Testing RAM thoroughly is not a quick and simple matter. To fully test a 2GB stick of RAM, various data patterns must be written to and read back from every byte on the stick to verify that each bit can be written and read without errors. A truly exhaustive test of every possible pattern would take far too long to be practical. But MOST of the time, tools like Memtest86+ will detect your bad RAM.
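To show why the choice of data patterns matters, here is a simplified pattern test in Python run against simulated memory. The memory size, the bad address and the “stuck at 0” defect are hypothetical, and the pattern list is only loosely in the spirit of pattern-based testers. It is not Memtest86+’s actual set of tests. Notice that a test that only writes zeros never notices a bit that is stuck at zero.

```python
class FaultyMemory:
    """Simulated byte-addressable RAM with one bit stuck at 0 (a hypothetical defect)."""

    def __init__(self, size: int, bad_addr: int, bad_bit: int):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.keep_mask = 0xFF & ~(1 << bad_bit)   # clears the stuck bit on every write

    def write(self, addr: int, value: int) -> None:
        if addr == self.bad_addr:
            value &= self.keep_mask
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.cells[addr]

# Solid fills, alternating bits (0x55/0xAA) and "walking ones" (a single 1 bit in each position).
DEFAULT_PATTERNS = (0x00, 0xFF, 0x55, 0xAA) + tuple(1 << b for b in range(8))

def pattern_test(mem: FaultyMemory, size: int, patterns=DEFAULT_PATTERNS) -> set:
    """Fill all of memory with each pattern, then read it all back and record mismatches."""
    bad_addrs = set()
    for pattern in patterns:
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            if mem.read(addr) != pattern:
                bad_addrs.add(addr)
    return bad_addrs

SIZE = 4096
print(pattern_test(FaultyMemory(SIZE, bad_addr=1234, bad_bit=3), SIZE))          # {1234}
print(pattern_test(FaultyMemory(SIZE, bad_addr=1234, bad_bit=3), SIZE, [0x00]))  # set()
```

Real testers also have to worry about cells that interfere with their neighbours, timing, and temperature, which a toy like this does not model at all. That is part of why a clean pass is good news but not a guarantee.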
To make sure that Memtest86+ is not missing something, power off your computer. Insert only one stick of RAM at a time in the first slot. Power up the puter, and let the RAM test run again on only one stick. Then rinse and repeat for the other stick(s).
Yes, this takes a lot of time, and it may seem kind of annoying and even useless. But trust me on this one: I have seen memory sticks test okay individually, but when installed as a pair, there were literally thousands of errors. When those 2 sticks were replaced with 2 new identical modules, all the errors stopped, and the machine has been working fine ever since!
I have also seen the reverse: together 2 sticks were fine, but individually one of them had errors. I replaced the bad stick, and everything worked great after that.
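If you like to keep things organized, the testing routine above can be written down as a tiny Python helper. The stick names and the wording of the verdicts are hypothetical; the logic just follows the two experiences described here: a stick that fails on its own gets replaced, and sticks that pass alone but fail together point toward replacing them with a new matched set.

```python
def build_test_plan(sticks: list[str]) -> list[tuple[str, ...]]:
    """Each stick alone (in the first slot), then all sticks installed together."""
    plan = [(stick,) for stick in sticks]
    if len(sticks) > 1:
        plan.append(tuple(sticks))
    return plan

def interpret(results: dict[tuple[str, ...], bool]) -> str:
    """`results` maps each tested combination to True if the memory test reported errors."""
    failing_alone = [combo[0] for combo, failed in results.items() if len(combo) == 1 and failed]
    if failing_alone:
        return "replace: " + ", ".join(failing_alone)
    if any(failed for combo, failed in results.items() if len(combo) > 1):
        return "each stick passes alone but the set fails: try a new matched set"
    return "no errors found in any configuration (still not absolute proof)"

plan = build_test_plan(["stick A", "stick B"])
print(plan)  # [('stick A',), ('stick B',), ('stick A', 'stick B')]
print(interpret({("stick A",): False, ("stick B",): True, ("stick A", "stick B"): False}))
# replace: stick B
```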
As you can see, RAM errors are not simple. There are tons of variables to consider. For example, does single-channel vs. dual-channel mode come into play? It might, given a specific memory controller and a specific make/model of RAM!
The most important point you should take away from all of this is that sometimes, diagnosing problems – especially RAM problems – is not as simple and clear cut as it may seem at first. You need to make sure that you are not making any assumptions, that you keep an open mind by trying new and different diagnostic techniques, and above all you must have lots of patience!
If you persist, you will succeed in the end. And remember to have fun!