Gossamer Forum

Large lists and speed

Hi GT folks,

I've been in the market for an MLM (mailing list manager) for quite some time now and am seriously considering your product. I already have Links SQL and GMail and I adore both.

We have very large mailing lists (1 million plus subscribers) growing daily by 1000s.

The dedicated server will have GMail and Glist and GCommunity only and is a dual Pentium IV 2.0 GHz with 1 GB of RAM. mod_perl is installed and works great with GMail.

1) How fast will the messages be sent out? I guess I am looking for a rate, i.e. how many to expect per hour. Can I, for example, expect 500,000 emails delivered per hour, assuming emails are sent when there is virtually zero activity with GMail (2:00 am-6:00 am)?
2) I am aware of some products out there that send email via sockets. Any thoughts in this regard? Does GL do this?
3) What would be the added performance benefit of adding an extra GB of RAM to the server?
4) Is there a benefit to using something other than sendmail, e.g. postfix or qmail?
5) With users' ability to have their own lists, is this a paid service we can offer? Can you please explain this further?

The server is hooked up to a superfast 10 Mbps dedicated connection, so there is no bandwidth restriction.


Thanks

Frank
Re: [frankLo] Large lists and speed In reply to
Hi,

Gossamer List does not currently do anything special for high volume mailings, and it's up to your mail server to handle things. That said, I would _not_ recommend sendmail for that size of mailings, but rather qmail (postfix is probably just as good; it's just that we have much more experience with qmail).

The rate depends on a lot of factors such as the list quality, the speed of recipients, the size of the mailing, the mail server used, the quality of the network, the amount of bandwidth, etc. On sites we've worked with we've seen qmail push out 5,000 messages a minute (which would get you about 300,000 an hour).
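
For what it's worth, the arithmetic above can be checked with a rough Python sketch. The 5,000 messages a minute figure and the 1 million subscriber list size come from this thread; the function names are made up for illustration, and the real rate will vary with all the factors listed above.

```python
def messages_per_hour(per_minute: int) -> int:
    """Convert a per-minute send rate into a per-hour rate."""
    return per_minute * 60


def hours_to_send(list_size: int, per_minute: int) -> float:
    """Estimate how many hours a full mailing takes at a steady rate."""
    return list_size / messages_per_hour(per_minute)


if __name__ == "__main__":
    rate = 5_000  # messages per minute, the qmail figure quoted above
    print(messages_per_hour(rate))                    # 300000
    print(round(hours_to_send(1_000_000, rate), 2))   # 3.33
```

So at that rate a 1 million subscriber mailing would take a bit over three hours, which fits inside a 2:00 am-6:00 am window but falls short of 500,000 an hour.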

With qmail, the important thing is to have fast disks. Make sure you have good SCSI drives. 1 GB of RAM would be adequate.

Hope this helps,

Alex
--
Gossamer Threads Inc.
Re: [Alex] Large lists and speed In reply to
Still SCSI? Get SATA! ;)
Max
The one with Mac OS X Server 10.4 :)
Re: [maxpico] Large lists and speed In reply to
No, SCSI drives are much faster than SATA. =)

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [maxpico] Large lists and speed In reply to
Yup. SCSI Drives are quite a bit faster than SATA in the real world.

Don't forget that SCSI drives are "smart" using high level commands (and its own language) to transfer data autonomously, and they take 95% less overhead than IDE/SATA drives to transfer data.

These days we use 15,000 rpm SCSI320 Drives - incredibly fast. There aren't any 15,000 rpm SATA drives.
Re: [webslicer] Large lists and speed In reply to
Much faster? I don't really agree. SCSI is an old technology nowadays... SATA has greater throughput with a tenth of the space required for cables (compare a SCSI connector and cable with a SATA one; this is very important on rack servers!).
Max
The one with Mac OS X Server 10.4 :)
Re: [maxpico] Large lists and speed In reply to
Max, I love those SATA cables!

But sorry, Max, physics and electrical engineering rule here. The sustained data transfer rate of a drive depends on the number of bytes of actual data per track (averaged over the drive, as tracks towards the center are shorter and hold less data in total).

No current drive can sustain transfers at the cable/interface rate.

SUSTAINED Transfer Rate(in Mbits/Second) = RPM/60 * average track data amount

Sustained throughput is therefore limited mainly by the RPM and data bit density, as well as the track-to-track stepping speed including settling, and that is why 4200 rpm laptop drives do not excel at video, etc.
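
To make the formula above concrete, here is a small Python sketch that plugs a few spindle speeds into it. The 4 Mbits per track figure is purely a made-up placeholder (real drives differ, and faster drives do not necessarily hold the same amount per track), so only the relative numbers mean anything.

```python
def sustained_rate(rpm: float, avg_track_data: float) -> float:
    """Sustained rate = revolutions per second * average data per track.

    The result is in whatever unit avg_track_data uses, per second.
    """
    return rpm / 60.0 * avg_track_data


if __name__ == "__main__":
    TRACK_MBITS = 4.0  # hypothetical average data per track, in Mbits
    for rpm in (4200, 7200, 10000, 15000):
        print(f"{rpm:>5} rpm -> {sustained_rate(rpm, TRACK_MBITS):.0f} Mbits/s")
```

With the per-track amount held equal, the sustained rate scales linearly with RPM, which is the point being made about 4200 rpm laptop drives.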

Random access performance also depends on buffer size, algorithm, head weight (which governs how fast the head can be moved between tracks), track-to-track delay, track spanning speed, and settling time (how long it takes for the head to settle into position, taking into account deceleration, etc.).

Since day 1, SCSI drives have had built-in microprocessors that execute a high-level data transfer language, which takes load off the CPU.
In fact, multiple SCSI drives can make transfers between themselves unattended once the commands are given. So multi-process situations are one place where SCSI drives destroy the competition. This helps in multi-access/multi-user DATABASES, web servers (!!!), etc.

Output can be increased by increasing RPM, increasing bit density, decreasing error correction data needed, and reducing CPU processing time.

And SCSI320 uses 64-bit controllers with quite a bit of smarts, and is MUCH, MUCH faster than SATA for web servers and all mid to hard use applications.

Present SCSI Drives are very fast at track to track stepping as well as random access, and the specs they have beat ALL presently made SATA drives, and that's not even mentioning the RPM speed advantage of going at 15,000 RPM.

In fact, they're 4+ times as fast in web servers under load.

Last edited by webslicer: Feb 2, 2004, 8:29 AM
Re: [webslicer] Large lists and speed In reply to
Yes, I know that the data rates promised by SATA are not being reached in the real world.
But I am very skeptical about benchmarks, as they really depend on which configuration you are using.
It's a fact, though, that SCSI will die in the near future and SATA will take its place, with SATA-II already being implemented. Modern rack servers (for web servers too) now have only ATA and SATA drives. Do you know about ATA-to-Fibre Channel interfaces? I work on video editing too and have tried a lot of systems... Those interfaces (like the ones Apple developed for their Xserve RAID) are among the fastest on the market.
Max
The one with Mac OS X Server 10.4 :)
Re: [maxpico] Large lists and speed In reply to
Hi,

Seek times are much more important for web servers and database servers than throughput (which is important for multimedia/video editing). Most web servers/database servers won't come close to needing the full throughput of a modern drive. Have a look at:

http://storagereview.com/...php?page=LeaderBoard

where 15k SCSI drives are almost twice as fast as the best ATA/SATA. Also, the majority of 1U rack servers use SCSI (look at Dell, IBM, Compaq, etc).

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [Alex] Large lists and speed In reply to
Have you heard about the most successful online music store, iTunes? It's completely based on WebObjects (a Java server solution from Apple) and, for the DB, Sybase, if I remember well. Well, this whole super-high-load system is completely based on Xserve and Xserve RAID. Databases, page displays, music transfer and so on... All on ATA drives.
I would not be as sure as you in affirming that SCSI is much faster than newer ATA drives. Not sure at all.
Max
The one with Mac OS X Server 10.4 :)
Re: [maxpico] Large lists and speed In reply to
Drive to Drive, the SCSI320's beat the SATA's. We have 10,000RPM SATA's on workstations, and we also have SCSI 320's on servers.

You're talking about a large commerce application.

A store like iTunes will be built on a full database commerce platform, which, as you point out, typically uses large drive arrays and distributed databases on multiple servers. They also use MANY GB of drive cache RAM or virtual RAM drives to speed things up, plus custom server tunings and special server software.

You get a lot of performance with a couple million of hardware invested, right?

Who says they are on ATA drives, though? Where did you hear that?

You must remember, Apple made SCSI very popular years back!

There was a very recent article / tech analysis comparing drive types on one of the hardware sites.
I was amazed at the difference in speed, because I had thought that the SATA drives would make a better showing. (not video streaming)
If I can find it, I will post it for your thoughts.

Oh, and final thought.... System to System, the Apple OS's are much faster with drive operations over Windows. Got a little bit to do with where the directory index files are placed on the disk, etc. Of course I'm not saying Linux or Unix.
Re: [maxpico] Large lists and speed In reply to
I'm not sure whether it's really worth arguing this with you or not, since it seems you think that the Apple XServe is the best thing since sliced bread (I don't have a thing against it - it looks like a really nice machine), but here goes anyways.

It's got nothing to do with Apple vs. PC or anything like that (though that wasn't really anyone's point). It does not really have much to do with Serial ATA vs SCSI interfaces either. It's mainly the fact that current top-of-the-line Serial ATA drives only spin at 10000RPM (the ones in Apple's XServe, to my knowledge, are only 7200RPM drives), while SCSI drives spin at 15000RPM. That difference in spindle speed makes a huge difference in latency, which is very important for database applications.

This link here compares the performance of 4 drives:
  • Fujitsu MAS3735 (one of the fastest 15k SCSI drives)
  • Maxtor Atlas IV (one of the fastest 10k SCSI drives)
  • Western Digital Raptor WD740GD (one of the fastest and only 10k SATA drives)
  • Hitachi Deskstar 7K250 (one of the fastest 7200rpm SATA drives)

From these numbers, looking at disk read times, it goes from 5.6ms for the 15k to 12.1ms for the 7200rpm drive. That makes the 15k drive more than twice as fast for an average read.
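
As a rough sanity check of why spindle speed matters so much here, the Python sketch below computes the average rotational latency (the usual textbook approximation: the time for half a revolution) for each spindle speed, plus the ratio of the two read times quoted above. The measured read times also include seek time, so they are larger than the rotational part alone; the function name is just for illustration.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: half of one revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


if __name__ == "__main__":
    for rpm in (15000, 10000, 7200):
        print(f"{rpm:>5} rpm: {rotational_latency_ms(rpm):.2f} ms")
    # Ratio of the read times quoted above (7200rpm drive vs 15k drive).
    print(f"read time ratio: {12.1 / 5.6:.2f}")
```

No interface change can reduce that half-revolution wait; only a faster spindle (or no spindle at all) does.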

You mention iTunes as your reason why SATA drives are faster than the 15K SCSI drives, but there's not much to go by there. We don't know what kind of setup they have running the operation. They could have a million Xserves running it, or just one.

I thought this would be fun to figure out. This article says that iTunes has sold over 20 million songs in around 8 months. On average, that's around 1 song download per second. Considering people aren't going to be evenly downloading songs throughout the day like that, take the peak, say they do 6 hours of this 'distributed' traffic in 1 hour, then that's around 6 songs purchased per second. If a song averages at about 6MB and the average download speed is 10KB/s, then it would take about 600 seconds for the download to complete. In each of those 600 seconds, you have 6 more people starting downloads, so that's around 3600 simultaneous downloads, downloading at a total of 36MB/s.

With 100,000+ songs at around 6MB a piece, that's at least 600GB of space needed. Triple that to be safe, so they need 2TB of space that can handle 36MB/s of sustained, random transfers. Sweet. :)
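
Here is the same back-of-the-envelope estimate as a Python sketch, in case anyone wants to play with the numbers. Every input is one of the guesses from this post (30-day months and decimal units, 1 MB = 1000 KB, are additional simplifying assumptions), so treat the output as a rough order of magnitude, not a measurement.

```python
SECONDS_PER_DAY = 86_400


def itunes_estimate(songs_sold=20_000_000, months=8, peak_factor=6,
                    song_mb=6, download_kb_s=10, catalog=100_000, safety=3):
    """Reproduce the rough load estimate using the guesses from the post."""
    avg_per_s = songs_sold / (months * 30 * SECONDS_PER_DAY)  # about 0.96
    peak_per_s = round(avg_per_s) * peak_factor   # rounded to 1/s, times 6
    download_s = song_mb * 1000 / download_kb_s   # seconds per download
    simultaneous = peak_per_s * download_s        # downloads in flight
    total_mb_s = simultaneous * download_kb_s / 1000
    storage_gb = catalog * song_mb / 1000 * safety
    return {
        "avg_per_s": avg_per_s,
        "peak_per_s": peak_per_s,
        "download_s": download_s,
        "simultaneous": simultaneous,
        "total_mb_s": total_mb_s,
        "storage_gb": storage_gb,
    }


if __name__ == "__main__":
    for name, value in itunes_estimate().items():
        print(f"{name}: {value:.2f}")
```

The 1800 GB it prints for storage is the "triple 600GB" figure, rounded up to 2TB above.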

Man this thread has gone waaaay off topic...

Adrian
Re: [brewt] Large lists and speed In reply to
Btw, Mac OS X is Unix (in the general sense, of course).
Read the Xserve RAID page. They are using that, and there you'll find ATA drives with super fast Fibre Channel interfaces. 3.5TB in 3U.
SCSI has been, and still is, a great technology, and you're right: as with USB and FireWire, Apple was the one that popularized SCSI. But this technology is no longer used on newer servers. The compromise between performance and various engineering needs has led companies like Apple to choose (successfully) ATA and SATA drives. Their performance really depends on what configuration you're using, just as I said.
That's why it's wrong to say (like many other things on planet Earth): SCSI is much faster than SATA.
Like MHz for microprocessors, RPMs are not the only measure of performance, Adrian.
Max
The one with Mac OS X Server 10.4 :)
Re: [maxpico] Large lists and speed In reply to
I never said the SCSI interface was faster than SATA. There is NO RPM myth. Latency and bandwidth are different and affect performance in different ways (e.g. video editing requires bandwidth, while databases need low latency). No changes in the interface (SCSI/ATA/etc) or bus will affect latency as much as an RPM improvement: a 7200RPM drive will never have as low latency as a 15000RPM drive, no matter what interface or bus is used. Note that we're only talking about the latency aspect of performance right now, as this is the most relevant with high performance database/web applications.

Adrian
Re: [brewt] Large lists and speed In reply to
No, that's the flaw. You can't talk about just one aspect of performance (like latency). The buffer memory size and the interface play a big role in that. RPM is a good measure of the drive itself, but the drive itself is useless without the computer. And for the computer, what really counts is the interface.
Performance is a statistic over all aspects (interface, RPM, memory buffer, etc.), for drives as well as for computers. And again, you can't just talk about one particular application like databases. It depends. And we're talking about the general performance of SCSI vs ATA/SATA drives.
The conclusion is that Alex's and webslicer's statement (SCSI is much faster than SATA) is basically wrong. An engineer would agree with me (as we see in what they produce in the real world: again, modern servers are ATA based).
Max
The one with Mac OS X Server 10.4 :)
Re: [maxpico] Large lists and speed In reply to
I'm concentrating on latency because we're talking about applications (databases and, to be on topic, a heavily loaded mail server) where it is one of the most important factors. A heavily loaded mail server will be taking many relatively small chunks of data and dumping these to disk as well as reading them from disk. Here, bandwidth is not as important, since most of the time will be spent getting the drive in position to read/write data. On latency, a 15000RPM drive will greatly outperform a 7200RPM drive (again, regardless of the interface).

Of course, the interface and bus to the system does make a difference, but this generally doesn't make such a performance impact as the latency does in such applications.

Here's some decent reading on SCSI vs ATA:
http://storagereview.com/...compPerformance.html
http://storagereview.com/.../if/compSummary.html

Adrian
Re: [brewt] Large lists and speed In reply to
Again it depends on configurations.
You can achieve great output performance with 7200 rpm drives. I gave a world-recognized example of that, and those pages are a bit outdated. :P
Max
The one with Mac OS X Server 10.4 :)
Re: [brewt] Large lists and speed In reply to
Hi,

Not to want to jump in here, but my previous ISP made a big point about putting the mail spoolers on solid-state hardware. I forget the details, but for high-performance you need RAM-DISK type access. These units are expensive, or at least they were ($50k and up at the time), but the performance obviated the need for additional servers/hardware. Depending on configuration speed increases of 200-800% are available.

There are now plug-in cards that can move "hot" files to solid-state access, while keeping the "cold" files on traditional disk. For email, outgoing mail would be "hot" while received mail would be "cold".

Anyway, just thought I'd mention it, since 300,000 and such numbers are really pushing the limits of mechanical devices.

I never quite understood the reason for the need for solid state devices for email, but this discussion made a few things click.

There is no way even the fastest mechanical spinning disk can approach the speeds of solid state access. Small files, stripped chunks, etc all point toward installing solid state hardware to improve massive outgoing email performance.


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [pugdog] Large lists and speed In reply to
Yeah, solid state disks are nice, but extremely expensive. Looking at the BiTMICRO solid state SCSI drives, they have access times of 33-48 usec (3.3 x 10^-5 to 4.8 x 10^-5 seconds) vs 5.6 ms (5.6 x 10^-3 seconds) for a 15k drive. The sustained transfer rates (34MB/s - 200MB/s) are pretty nice too.
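
To put those access times side by side, here is a quick Python sketch. It uses only the figures quoted above; the "operations per second" number assumes strictly serialized accesses with no queuing or caching, which is a simplification.

```python
def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the second access time is than the first."""
    return slow_seconds / fast_seconds


def serialized_ops_per_second(access_seconds: float) -> float:
    """Upper bound on back-to-back random accesses per second."""
    return 1.0 / access_seconds


if __name__ == "__main__":
    disk_15k = 5.6e-3                  # 15k drive read time quoted above
    ssd_best, ssd_worst = 33e-6, 48e-6  # solid state access times quoted above
    print(f"speedup: {speedup(disk_15k, ssd_worst):.0f}x "
          f"to {speedup(disk_15k, ssd_best):.0f}x")
    print(f"15k disk: ~{serialized_ops_per_second(disk_15k):.0f} ops/s")
    print(f"solid state: ~{serialized_ops_per_second(ssd_worst):.0f} ops/s or more")
```

That works out to a bit over two orders of magnitude in access time.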

But with applications like mail servers, it's often cheaper to just distribute the load to multiple servers. This also gives you redundancy which is always a nice thing.

Adrian
Re: [maxpico] Large lists and speed In reply to
Max;

Engineer?

Uh, I AM an engineer (electrical), and I WAS a computer manufacturer and computer designer. I've built and sold several hundred computers.

Drives are physical devices and the main physical limiting factors are head weight and drive rpm.

Even the most modern interfaces are held back by the real world physicality of the drives. Only RAM disks can exceed these. Buffers are tiny compared to total drive storage, not even 1/4 of 1 percent. A drive reading from multiple heads at once would increase streaming, but not affect latency, which is mostly affected by RPM.
Re: [webslicer] Large lists and speed In reply to
But you're not an Apple engineer are you?!@#

Adrian
Re: [brewt] Large lists and speed In reply to
>> But with applications like mail servers, it's often cheaper to just distribute the load to multiple servers.
>> This also gives you redundancy which is always a nice thing.

Actually, their argument was the opposite. The price of the solid-state hardware was more than made up for by the savings on extra mechanical devices (servers/CPUs, heat build-up, electricity, etc.), coupled with the performance gain and the decrease in support/technical issues involved in maintaining a "farm" of servers.

They needed quite large mass storage, for their mail server, but for a smaller operation, it might be possible to use a smaller solid state device, and feed it.

I was looking around, not at prices, but at stats. It seems that about 500 MB is where they peg most mail server loads (unless you are an ISP or routing hub). So, a 1 GB device would cover most needs. Most PCs can handle 2 GB or more of RAM, so it begs the issue, why not set up a RAM disk?

Since the concern is OUTGOING mail, generated to lists, if a device crashes, the _worst_ that happens is that a user or two would get a duplicate mailing.

If your "check" email addresses were the first and the last of any list, you'd know which lists started, and which succeeded pretty easily.

Like I said, I didn't want to jump in, but I've been doing a lot of "retro" work lately, digging into old servers (256k DOS days, etc) and migrating some legacy Unix servers to a more modern configuration. No one uses RAM disks any more... but maybe in this case they can work.


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [pugdog] Large lists and speed In reply to
Quote:
so it begs the issue, why not set up a RAM disk?

Putting your queue on a RAM disk would violate the RFC for an SMTP server. Your SMTP server is technically only allowed to say OK when the mail has been received and stored on disk, not just in cache. That is why qmail fsyncs the disk before saying OK: even if the disk crashed just after saying OK and you lost everything in the disk cache, you would still not have lost mail.
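
To illustrate the idea (this is NOT qmail's actual code, which is written in C; it's just a minimal Python sketch of the same ordering), the message is written to the queue directory and fsynced before the OK reply is produced. The function names, the `queue_dir` layout and the exact reply string are assumptions made up for the example.

```python
import os
import tempfile


def store_durably(queue_dir: str, data: bytes) -> str:
    """Write a message into the queue and force it to stable storage."""
    fd, path = tempfile.mkstemp(dir=queue_dir, prefix="msg-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to the disk
    return path


def accept_message(queue_dir: str, data: bytes) -> str:
    """Only acknowledge the message once it has been stored and synced."""
    store_durably(queue_dir, data)
    return "250 OK"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as queue:
        print(accept_message(queue, b"Subject: test\r\n\r\nhello\r\n"))
```

On a RAM disk the fsync call would still return, but it would not buy you anything, since the "disk" itself disappears on a crash or power loss.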

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [Alex] Large lists and speed In reply to
But we are talking about a niche application here: an outgoing list mailer. None of the messages is irreplaceable; the only "cost" is that some people might get a duplicate mailing. It would seem this is a "niche" which the RFC isn't directly looking at.

In that case, the risk of a ram disk/crash would be about the same as a hard-drive crash which is possible.

But, I don't understand the whole sendmail issue, since I've never needed more than basic/reasonable capacity from any mailer <G>, and our *mass* mailings are in the hundreds or low thousands at best. I understand in general how mail is sort of stripped then assembled, queued and cached, but not the specifics (and my first look at a sendmail config file turned my hair white!).

It seems that mail serving might be hard on a disk, actually, and solid state devices might pay for themselves in the long run. At what point that performance/savings kicks in, I don't have a clue :) I've indirectly load balanced server loads, but not email. It just seems that at the high volumes of outgoing mail mentioned, the amount of wear and tear on the disks would require replacement more often than their MTBF would suggest. Solid state devices should give 4-5 years of performance without degradation, longer if properly cooled and climate controlled (but at that point, the solid state technology would probably have improved enough to warrant an upgrade anyway).

I realize there are costs here, but I got the impression we weren't looking at farming 2 servers, but 3-5 servers, or even more, to handle the load in real time. At an 800% speed increase, or performance increase, that's a lot of duplicate hardware a solid state device can replace. Granted, I'm not sure where the 200% vs 800% speed increase occurs, but I would imagine the more disk I/O that is saved, the higher the performance.

Would mailing to a list like this be higher or lower on the spectrum of disk I/O ?? The higher up, the more cost effective solid state devices would be.


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.