When a disk is a single point of failure (SPOF), it's critical to know the condition of the system. When a system uses an HDD, it's important to know the HDD's condition to prevent failures, plan maintenance and so on.
This article is about the Samsung SSD 840 PRO series. We know this SSD has very good performance and lifetime, but before we use it in production we must know how to monitor its condition. There are many articles and technical specifications, but we still had a lot of questions without answers.
(Update: 10.10.2013)
(Update 03.12.2014: Samsung performance issues, read the new post)
For example:
- What happens when the normalized SMART value "Wear Leveling Count" (WLC) drops to 0?
- Will it affect disk performance?
- How does the temperature change while the disk is being used?
- How much data can we write to the disk without errors?
Hardware
- Samsung SSD 840 PRO - 128 GB
- Dell R320 server
- 32 GB ECC RAM
- 1x CPU, Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz (6 cores + HT)
- SAS controller: PERC H710 NV
- OS: Debian Squeeze
- Linux kernel 3.2.2
Tests
Our test consists of two stages that run in a loop.
Stage 1 is a performance test. It writes 5 GB of data per pattern (sequential read/write and random read/write), so during this stage we write roughly 20 GB of data. We measure speed, latency and IOPS.
Stage 2 is a fill test. It measures the same values as Stage 1 but fills the whole disk.
All stages write data with the "sync" option.
The test server is monitored separately for CPU, memory, system load average, network, active socket statistics, number of processes, I/O stats, free disk space and the number of connected users. We also monitor all SMART values from the SSD.
You may ask why we monitor all these values. The simple answer is: to know what is going on on the server. When we process the results of these tests, we may see strange values in some time intervals, and we need that additional information to decide whether a result is false or real behavior of the SSD.
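For the SMART part, the raw data can be read with smartmontools; this is only an illustrative invocation, not the exact monitoring script (the device name and the megaraid device number are assumptions):

smartctl -A /dev/sda                  # all SMART attributes: normalized VALUE and RAW_VALUE columns
smartctl -A -d megaraid,0 /dev/sda    # typical form when the drive sits behind a MegaRAID-based controller such as the PERC H710 (for SATA drives, -d sat+megaraid,N may be needed)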
I use the fio utility for all tests on the SSD.
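The exact fio job definitions were not published (see the comments below), but a minimal sketch of one loop could look like this; the target device /dev/sdb, the 4k block size and the job names are my assumptions:

# Stage 1 sketch: 5 GB per I/O pattern (this destroys data on the target device)
for RW in write read randwrite randread; do
    fio --name=stage1-$RW --filename=/dev/sdb --rw=$RW --bs=4k --size=5G --sync=1
done
# Stage 2 sketch: the fill test, same measurements but over the whole 128 GB disk
fio --name=stage2-fill --filename=/dev/sdb --rw=write --bs=4k --size=128G --sync=1
# fio reports bandwidth, IOPS and latency per job, i.e. the values measured in both stages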
Overall I have 58 graphs from the server, most of them SMART values.
Wear Leveling Count
What happens when it drops to 0?
Simply nothing :). The value drops to 1 after about 465 TB of data has been written. I calculated this from the number of tests and cross-checked it with the SMART value "Total LBAs Written". What I realized is that this is only a pre-fail indicator; there were no errors or sector reallocations. Since then the disk has written another 235 TB without any errors or reallocations, and the test still continues.
Wear Leveling Count vs. performance
As you can see in the graphs below, there is no performance decrease before or after reaching 1% WLC.
These graphs are only for a rough comparison with indicative values; the point is that there is no performance decrease during the test.
For exact values we have to look into the result logs from the fio utility; see the section "Graphs from FIO test utility" in this article.
Temperature
In our datacenter the temperature of the SSD is somewhere around 26 °C. When the SSD starts writing and reading, the temperature grows up to 39.9 °C.
Data written
As I mentioned above, the test isn't done yet. When I wrote this article, Total LBAs Written was 1503673300498. This value shows how many 512-byte blocks were written.
LBA_Value * 512 = Bytes written
So we have written about 700 TB to this 128 GB disk. One test loop writes about 148 GB (128 GB for the fill stage and 20 GB for the performance stage). We have run about 5000 loops, which I can see on my counter. When we divide 700 TB by 148 GB we get around 4870 loops. That's really close; the difference is around 3%.
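The same conversion can be done directly from the smartctl output (the device name here is an assumption; $10 is the RAW_VALUE column):

smartctl -A /dev/sda | awk '/Total_LBAs_Written/ { printf "%.1f TiB written\n", $10 * 512 / 1024^4 }'
# 1503673300498 * 512 bytes is roughly 770 * 10^12 bytes, i.e. about 700 TiB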
Bandwidth and IOPS
First, let's look at the manufacturer's site for the specifications.
Manufacturer specifications:
- Seq. read up to 530 MB/s
- Seq. write up to 390 MB/s
- Rand. read speed up to 97 000 IOPS
- Rand. write speed up to 90 000 IOPS
Let's see my measured values:
- Seq. read ~ 275 MB/s
- Seq. write ~ 300 MB/s
- Random read over 250 MB/s
- Random write ~ 100 MB/s
- Random read speed ~ 65 000 IOPS
- Random write speed ~ 28 000 IOPS
- Seq. read speed ~ 67 000 IOPS
- Seq. write speed ~ 75 000 IOPS
Quite interesting results. Some values are similar to the manufacturer's specifications and some are really different.
Graphs from FIO test utility
Conclusion
The test still continues; the disk has WLC at 1%, but no reserved blocks have been used, no reallocations, no errors.
By the time we reached 1% WLC, the disk had written about 465 TB of data.
That means if your server writes 20 GB of data daily, it will take about 65 years to get there. If you rewrite the whole disk every day, you reach 1% WLC in about 10 years.
If you plan to renew your hardware every 5 years, you are safe even if you rewrite the whole disk twice a day.
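The same estimate as quick shell arithmetic, taking the 465 TB from the test above; the daily write volumes are just example figures:

echo $(( 465 * 1024 / 20 )) days      # 20 GB per day                 -> 23808 days, about 65 years
echo $(( 465 * 1024 / 128 )) days     # one full 128 GB rewrite a day -> 3720 days, about 10 years
echo $(( 465 * 1024 / 256 )) days     # two full rewrites a day       -> 1860 days, about 5 years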
From what I've seen, it's good to monitor these values (a sample check follows the list):
- Normalized value of WLC
- Reallocated sector count
- Normalized value of Used Reserved Blocks Count
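A quick way to check exactly these attributes with smartmontools; the attribute names are as smartctl prints them for this drive family (verify against your own output), and the device name is an assumption:

smartctl -A /dev/sda | awk '$2 ~ /Wear_Leveling_Count|Reallocated_Sector_Ct|Used_Rsvd_Blk_Cnt/ { print $2, "normalized="$4, "raw="$10 }'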
Dictionary
Wear Leveling Count - the maximum number of erase operations performed on a single flash memory block.
Reallocated Sector Count - when encountering a read/write/check error, the device remaps a bad sector to a "healthy" one taken from a special reserve pool. The normalized value of the attribute decreases as the number of available spares decreases. On a regular hard drive, the raw value indicates the number of remapped sectors, which should normally be zero. On an SSD, the raw value indicates the number of failed flash memory blocks.
Used Reserved Blocks Count - on an SSD, this attribute describes the state of the reserve block pool. The normalized value shows the percentage of the pool remaining; the raw value sometimes contains the actual number of used reserve blocks.
UPDATE 10.10.2013
Our SSD finally died after almost 5 months of heavy write testing.
Some numbers
- more than 3 PB written
- rewritten more than 24 400 times
- stable temperature of about 37 °C
and some nice graphs:
UPDATE 03.12.2014: Samsung performance issues, read the new post.
Comments
That drive sure took a beating. Interesting to watch how it wore.
Thank you for sharing all that detailed and well put together info.
brilliant and useful article!
How did you generate the SMART graphs?
Hi, I used the Cacti project with my own scripts to retrieve SMART information from the disk.
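A minimal sketch of such a Cacti data input script (not the original scripts, which were never published; the device name and field names are assumptions):

#!/bin/sh
# print "name:value" pairs, the output format a Cacti script data source expects
SMART=$(smartctl -A /dev/sda)
WLC=$(echo "$SMART" | awk '/Wear_Leveling_Count/ { print $4 }')
LBAS=$(echo "$SMART" | awk '/Total_LBAs_Written/ { print $10 }')
echo "wlc:$WLC lbas_written:$LBAS"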
Thanks so much! My 840 Evo had a wear leveling count of 1 after only 20 days! I was horrified. Your work is the best proof I need that the drive is safe to use.
Hello,
thanks for sharing that information - you did a really impressive piece of work. Could you also share the fio scripts that you used to perform the tests? I want to try doing similar things with bigger 840 Pros.
Can you also share how you bypassed the server RAID controller? I am struggling with a Dell R620 and its PERC H710 RAID controller. It doesn't allow me to pass TRIM commands to the drives. Can you advise something about it?
Thank you
Hi Andrey,
many people ask me about the scripts. The original scripts are not ready for distribution; they were very specific. I plan to publish similar scripts on GitHub, both for graphing SMART values and as an example of the testing. So stay tuned ;)
Was it a pro or non-pro 840 you used?
Hi Chris, thank you for your comment. I updated the article; it was the PRO version.
Thank you for the great article! I don't have any doubts about buying this SSD. You've done a good job.
I was wondering how you normalize the Wear Leveling Count. For example I get:
177 Wear_Leveling_Count 0x0013 098 098 000 Pre-fail Always - 61
No information about normalized value.
Hi Michal,
this line consists of the columns "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE".
The VALUE column is the normalized form of the last column, which is the raw actual value. The best normalized value of a new disk is 100, and as data is written it slowly goes down towards 0. Your value is now 98, which indicates the disk is in good health.
Wow! So I have the 250 GB version, so I should expect it to handle about double the data written since you used a 128 GB one, correct?
As a rough estimation, yes; however, there may be a *proportionally* smaller area of flash for over-provisioning, but if you use TRIM then that shouldn't make much difference, depending on how often you push it into write amplification.
ReplyDeleteHello everyone,
I have a Samsung 840 EVO 120GB SSD. Today I realised I have a "used reserve block count error" in HDTune Pro.
Here is the screenshot:
http://imgur.com/Bms5qnR
Samsung Magician and SSDLife report the same SMART values but show no warning.
Any ideas about the meaning of this?
Waiting to hear from you guys.
Best
Thank you for that useful article.