Gossamer Forum
Home : Products : Gossamer Links : Version 1.x :

New Server - How Many Links Could It Hold?

(Page 1 of 2)
Hi,

We are on the verge of getting a new, dedicated webserver with the following config.

Dell PowerEdge II PIII/700
Double CPU Server
448MB RAM
12/24GB DDS-3 tape drive

How many links do you think this could hold? We are really aiming at the 2 million or more mark; can this handle it?
Re: New Server - How Many Links Could It Hold?
Aren't tape drives for backup?

How much regular hard disk space is there?

jerry
Re: New Server - How Many Links Could It Hold?
Sorry about that.

We are getting a 9.2GB SCSI hard disk. I think the server will be bogged down by the number of links long before it runs out of space.

We are really trying to get this machine to handle as much load as possible. mod_perl will be used, as well as anything else we can come up with to store all the links.
Re: New Server - How Many Links Could It Hold?
The short answer? Yes and no.

You didn't specify an OS. Under NT, I wouldn't even make a guess.

Under Unix, your bottleneck with this configuration is probably going to be disk I/O. For 2 million links and any reasonable number of users, you'd probably want at least two disks sharing the load -- even if one only did logging while the other did serving.

You might even want three, with one dedicated to swap, so that you don't get killed by head-positioning I/O bottlenecks.

But it depends on your traffic, too. I gather from the CPU configuration you are anticipating a high hit load.

Physically, the system should be able to contain the links system (I think Alex posted some stats on file size per link in a discussion about 2 weeks ago).

The concern is going to be "load" across the system. I think a single large disk is going to be much slower overall than multiple smaller disks. You're going to have loads of open files and multiple simultaneous reads/writes across the system.

I wish I could get some hard performance stats on webservers in real-world situations (not just Links). I've tried to plan, and I find a lot of it is guesswork based on current performance. It's not easy to figure, since some programs have start-up overhead that disappears under constant load, while others slow down and use more resources under the same conditions.

Going by what I've been able to get out of the mod_perl docs, the MySQL docs, my own stats, and so on, you really are going to need a second disk to split the writes across the system.

You've got a good database server, but if you are going to sustain a high hit volume, you might want to run two servers - a lean one to handle the standard HTML calls and act as the front door to the system, and a "workhorse" database server running the database and the mod_perl servers.

I'm looking at doing something similar, putting a higher powered machine behind the main server to run the database and cgi-calls, and keep the main machine lean and responsive. That way I can keep track of which parts of the system/network are bogging down and tune/upgrade that part to meet the need.

The needs of a database server are a bit different from the needs of an http server.

------------------
POSTCARDS.COM -- Everything Postcards on the Internet www.postcards.com
LinkSQL FAQ: www.postcards.com/FAQ/LinkSQL/

Re: New Server - How Many Links Could It Hold?
Sounds good, so two 9.2GB disks or something like that would be the way to go, eh...

I am not that good with web servers at the moment, and would have difficulty setting up Links to work across two hard disks, but that is definitely something I will work on.

The server will probably be running Red Hat Linux 6.2, but we may use FreeBSD. We are really looking for performance, and I am guessing FreeBSD is faster; I will look into the speeds of the different OSes and post here if I find anything worth talking about.

------------------
Michael Bray
....
Review your webhost, or find a new one at http://www.webhostarea.com


Re: New Server - How Many Links Could It Hold?
 
Quote:
The needs of a database server are a bit different than the needs of an http server.

OK - now you've got me curious. What different specs would you look for in a database server than in an http server?

I thought the server we are getting in the next few days was a very high-end, all-round database/http server.

Also - I heard that Links SQL can easily be ported to other databases... what other Linux-compatible databases exist that are better than MySQL? Is PostgreSQL (or something like that) better? Is Oracle better, and is it Linux-compatible?

We really need everything here to be capable of an enormous number of links, and of a quite substantial number of hits.

Thanks a lot. Sorry about all the questions. We really need a high-end system here.

PS - Any sites that would give me info on setting up Links (CGI in general) to run across two hard disks/servers?

Thanks!
Re: New Server - How Many Links Could It Hold?
Actually, under Unix, your software has no idea what "disks" it's on. Unix has a very abstract and quite bizarre (for the DOS/Windows world) concept of a "filesystem".

If you mount your logs on the "system disk" (let's call it) and put the Apache server software there as well, you can put the /usr filesystem and the /www filesystem on the second disk and mount them wherever you want. I've actually mounted the second disk as /www, and it gets the whole physical medium. Everything else (which is almost trivial load) runs off the other disk, which is partitioned into the rest of the filesystem. The logic was that the /www partition was the one that would grow uncontrollably, and the one I'd want physically portable if possible. My logs have 2+ gig of space on the main drive, and they are compressed and rotated weekly and archived every few months, so there's plenty of breathing room there as well - at least 2-3 months of neglect without worry.

I'm also using the server as a mail server, so I had to allow a reasonable amount of space for /var and /usr -- but again, with an 8 or 9 gig hard drive, it's not a problem. That's more than most Unix systems ever had <G> All the Unix books install onto an 800 meg to 1 gig hard drive, if that... and the authors hem and haw about how to partition larger drives; there's no consensus. But keep the suggested boot partitions, double everything else, triple your /var and /usr partitions, create a /logs partition so logging doesn't eat up your system, and then put everything else into your /www filesystem.

If you wanted, you could change the MySQL defaults to put the data directory onto the /www partition, or even onto its own partition. On a 9 gig disk, put 2 gig into the /www partition and 7 into the MySQL partition... ?? Keep the binaries in the default location; just move the data directories.
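Moving the MySQL data directory off the system disk is one setting in my.cnf (the path here is just an example, not from this thread):

```ini
# my.cnf -- keep binaries where they are, relocate only the data
[mysqld]
datadir = /www/mysql-data
```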

This is all just rambling... Smile There is _NO_ right way to do it, but there are probably a few wrong ones, and trying to run off one disk is wrong. Your logs are almost continual writes, so putting them on the system disk in their own partition makes great sense, and keeping the data/www files on a separate physical disk and partition makes it easy to move that disk if something happens to your system, as well as making it easier to back up... (backup /www)
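A two-disk layout along those lines might look like this in /etc/fstab (devices, sizes, and mount points are illustrative only):

```
# disk 1: system, swap, logs           disk 2: data
/dev/sda1   /        ext2   defaults   0 1
/dev/sda2   none     swap   sw         0 0
/dev/sda5   /var     ext2   defaults   0 2
/dev/sda6   /logs    ext2   defaults   0 2
/dev/sdb1   /www     ext2   defaults   0 2
```

This keeps the continual log writes on the first spindle and the served data on the second, so the heads aren't fighting each other.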

With a dual-processor system and 400+ meg of RAM, you are probably not going to hit a CPU crunch (except with NT <G> ).

What you need to think about is disk space, I/O bottlenecks, and bandwidth. Your system will _probably_ be able to saturate a 10 Mbps connection when it's humming along. If you expect that sort of traffic, you might want to make sure you have a 100 Mbps connection with "on demand" peaks. ISPs have all sorts of formulas for figuring out this stuff. A 486 running Unix could saturate a T1, so dual PIII 700s can do a heck of a lot more Smile
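The back-of-envelope arithmetic for that kind of estimate is straightforward. A sketch, with purely illustrative numbers (not measurements from this thread):

```shell
# Rough bandwidth estimate: average page size x sustained request rate.
PAGE_KB=30            # average page plus images, in kilobytes (illustrative)
REQS_PER_SEC=40       # sustained request rate (illustrative)
# kilobytes -> kilobits (x8), then /1000 for megabits (integer shell math)
MBPS=$(( PAGE_KB * 8 * REQS_PER_SEC / 1000 ))
echo "${MBPS} Mbps"   # roughly enough to saturate a 10 Mbps link
```

Plug in your own expected page weight and hit rate to see which side of the 10/100 Mbps line you land on.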

Granted, that's a lot of traffic. A T1's worth of bandwidth on 10 Mbps Ethernet is a reasonable baseline, and if your ISP averages usage over the 24-hour day and the month, then an hour or two of spiking can still average out to T1 monthly usage, even with 10 Mbps+ peaks...

Anyway, I'm still rambling, and coming around on my second 24 hours....
Re: New Server - How Many Links Could It Hold?
At the moment, we can really only get one high-end server like the one specced below.

I am guessing that if we got two servers - one with the specs below, and one a cheap Celeron 400 with 128MB RAM and the 9.2GB disk - we could use the cheap one to serve the HTML docs, and have the high-end one running the CGI and database stuff...

The second server is the only problem; we have budgeted for one server... and already made deals with the webhost to set that one up.

We'll make money anyway - the server shouldn't really be a problem...

I am worried about crashing the system now. I can just imagine how much fun it will be building a site that big Smile

------------------
Michael Bray
....
Review your webhost, or find a new one at http://www.webhostarea.com


Re: New Server - How Many Links Could It Hold?
As for a database server different from an HTTP server... think about what each is doing.

An Apache http server process is about 800k (at least mine was before PHP <G> ). If you add mod_perl, each process is about 4 meg (?). That's a bloated chunk of memory and a lot of start-up overhead, but faster execution once started - especially if the same CGI programs are being run, persistent database connections can be maintained, and most calls are CGI-based.

If you are doing mostly plain http calls - simple "kick the file out the port" type work - mod_perl-bloated servers are _not_ the way to go. In fact, that's part of the mod_perl project's effort to come up with guidelines on when it pays to run two servers (check their site).

A database server needs fast disk I/O and good CPU speed to process requests. It is also tuned to "serving" database files, so read/write access is optimized, along with all sorts of other memory, bus, and BIOS tweaks. The software is leaner, with fewer unnecessary services, and the machine is allowed to do what it was set up to do -- operate a database system and manage its own processes.

An http server, on the other hand, can be lean and lower powered. Disk I/O is not nearly as important as having enough RAM to let the servers load and keep a cache around. CPU load is very, very low. On my Apache system, the servers can't make the machine breathe hard even under heavy load. The database, on the other hand, can get MySQL up to almost 50% CPU usage when the beta spider is running (it usually hovers around 2-3% with Links SQL). I did that with only 3 spider processes.

Meanwhile, I can serve 40 or 50 http processes with lean Apache and have the CPU at about 2-4%, depending on how much CGI is being run.

So.... you want a server with a fast CPU and a system architecture set up, with the software optimized, to let the database access system resources without having to "share" with unrelated processes.

The other way to look at it is: you've got a really good database server, especially for this sort of read-mostly, write-rarely access. You won't be waiting on it much, if ever, even during peak usage. On the other hand, you've got a freakin' awesome http server that is probably overkill unless you were running a couple dozen active virtual domains off it.

If you combine the two on one machine, you've got something in the middle of each.

I guess a good analogy would be my DSL line. I've got a 640k line, because I really don't need faster (my cost/value falls off like a cliff after that point, I just want fast pages <G> ). But, I get 640k DOWN stream, and 90k UP stream, with the upstream given preference (I guess because my CPU is calling the shots). If I use 40k of my upstream bandwidth, I lose _at_least_ 300k of my downstream bandwidth.

The database system is like the upstream and the http is like the downstream: you will lose a lot more of your http capacity as the database sucks it away to keep itself running. Unix is good about making things share, but still - http uses very little resource per process (it's designed that way on purpose, and historically), while databases _need_ more resources, and every feature we ask to have incorporated makes them need more. The bigger the database package, the more horsepower it needs to do the same work (MySQL is _very_ efficient compared to things like Oracle, but it has fewer of the advanced features).

There are no _rules_ for this. Some applications coexist very, very well together, even when on the surface they might not appear to. Others don't get along at all, even if the raw resources are apparently there.

Links SQL is not a big hog of resources (though I only have a relatively small database), but the rest of my site is all CGI and qmail (we guide people through the postcards process, then have to send the cards).

My overall system usage is still under 10% even running the spider several hours every night.

Your mileage will vary Smile

Oh, BTW, there is _nothing_ better than MySQL out there for the price, and for most applications Oracle is overkill and MySQL performs better. Check the mysql.org site for some stats on that. If you don't need the advanced and "optional" features of Oracle, MySQL is the way to go. You can port your applications and databases UP to almost any other system should the need ever arise.

If I were building your system, and wanted to put everything in place and grow into it as you are doing, I'd add a disk drive to the machine you have, and consider going to a hardware RAID architecture if your traffic increases - you'd get better performance and data security as well. I'd put a solid front-end machine next to it to handle the basic http requests, and put the CGI filesystem on the database server with mod_perl (a basic guide is on the mod_perl site). What's a solid machine? If you want to stick with Intel architecture: a PIII 600+ with 256 meg of RAM and two 4-8 gig heavy-duty hard drives (hard to get anything smaller nowadays), set up with Apache and 25 or 30 waiting processes. You'll be able to monitor traffic on the http server as well as the database server, and you'll be able to tune each of them for better performance separately.

Can I give you specifics now? No. But I plan to be in your shoes in about 6 months, and I'm looking at a database server similar to yours, but based on Sparc architecture since my ISP is a Solaris shop. (If they ever move to Linux, that might change.) I'll keep the Sparc 10 I have now as the front end, and maybe beef up the RAM a bit.

Even though this machine is not breaking a sweat yet, I don't want to be in the situation of HAVING to do it all and make it work over a weekend, or in a crunch. I'd rather be there 4 to 6 months early, and have it all working and just scale up, not start out.

Make sense??

It would be nice if others who are running their own machines would jump in with stats -- hardware, usage, etc. Fewer and fewer sites publish that stuff anymore, so it's harder to get an idea of what the various sites are using and what is working best. Only the error messages hint at NT vs. Unix <G>

Re: New Server - How Many Links Could It Hold?
Hey, if you are "growing," you don't need the second server now. But you may find that in 3 months you will - or that it would have been nice to do it that way.

Set up the server for the database, let the http take care of itself. It will. You can always add cheaper servers as the front end. It's the back end that kills most sites.

I did it kind of backwards, because I had to prove the idea worked. It does, so now I'm looking at putting a dedicated database server behind the machine that's currently running both the http server and the database.


Re: New Server - How Many Links Could It Hold?
I might do that then: set this one up as a database server, and then when we have some cash flowing in, set up another server for http requests.

Thanks for the help everyone!
Re: New Server - How Many Links Could It Hold?
are there really 2 million packages of web hosting on the internet? Smile

jerry
Re: New Server - How Many Links Could It Hold?
Webhost Area is on a $99 a year Virtual server from http://www.tera-byte.com Wink

LOL - I don't think there will be that much speed difference between the flat-file and SQL versions for Webhost Area, as I don't think it will ever exceed 5,000 packages. The thing is, it has a hell of a lot of columns to search.

------------------
Michael Bray
....
Review your webhost, or find a new one at http://www.webhostarea.com


Re: New Server - How Many Links Could It Hold?
My recommendation would be to use Linux, MySQL, and two copies of Apache. Get as much RAM as you can (hey, it's pretty cheap), as well as some quick drives. Set up mod_perl on one server, and a really light Apache with mod_proxy as the front-end server. Make sure the front end handles all image and HTML requests, while the mod_perl server only handles CGI requests. Try not to run any CGI on the front end, and you can then keep it really light.
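A rough sketch of that two-Apache split, assuming the mod_perl server sits on a second port of the same box (ports, paths, and directives shown are illustrative, not taken from this thread):

```apache
# --- light front-end httpd.conf: static pages and images only ---
Port 80
DocumentRoot /www/htdocs
# hand any CGI request through to the heavy mod_perl server
ProxyPass        /cgi-bin/ http://127.0.0.1:8080/cgi-bin/
ProxyPassReverse /cgi-bin/ http://127.0.0.1:8080/cgi-bin/

# --- mod_perl back-end httpd.conf: CGI only, a few fat children ---
Port 8080
Alias /cgi-bin/ /www/cgi-bin/
<Location /cgi-bin>
    SetHandler  perl-script
    PerlHandler Apache::Registry
</Location>
```

The front end stays small because it never loads mod_perl; only requests under /cgi-bin/ ever touch the big processes.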

Some minor tweaks to Links SQL would also help speed things up - like checking whether the category structure has changed, and using that to decide whether to build stats (which will take a while with 2 million links). (This can be done by a simple file test of the Category.ISM file in MySQL.)
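A minimal sketch of that file test: rebuild stats only when the Category table file is newer than a stamp left by the previous build. The paths below are stand-ins (a temp dir instead of the real mysql data directory) so the sketch is self-contained:

```shell
# Decide whether to rebuild category stats by comparing mtimes.
DATA=$(mktemp -d)                 # stand-in for the mysql data directory
touch "$DATA/Category.ISM"        # stand-in for the Category table file
STAMP="$DATA/.last_stats_build"   # touched every time stats are rebuilt

if [ ! -f "$STAMP" ] || [ "$DATA/Category.ISM" -nt "$STAMP" ]; then
    RESULT=rebuild                # structure changed (or first run)
    touch "$STAMP"                # remember we built against this version
else
    RESULT=skip                   # category table untouched since last build
fi
echo "$RESULT"
```

With 2 million links, skipping an unnecessary stats pass is a big win for the build time.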

Hope that helps,

Alex

Re: New Server - How Many Links Could It Hold?
OK then. Here's the story: we can only get one machine for the time being, but we can get more than one hard disk.

I am guessing this is not as good - but will the server with the specs below hold 2 million links and stay usable?

If not, without buying a new server, what upgrades would make it most usable?
Re: New Server - How Many Links Could It Hold?
There is a way to use two copies of Apache on one machine. You need to check the mod_perl site; the info is sketchy, but it's there. You use either two ports or two IPs.

Quote:
Similar to above, but one HTTPD runs bound to one IP address, while the other runs bound to another IP address. The only difference is that one machine runs both servers. Total memory usage is reduced because the majority of files are served by the smaller HTTPD processes, so there are fewer large mod_perl HTTPD processes sitting around.
Check the apache.org mod_perl site.
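In the two-IP variant, each httpd runs from its own config file, bound to one address. A minimal sketch (the addresses are documentation placeholders, not real ones):

```apache
# httpd-light.conf -- small static-file server on the first address
Listen 192.0.2.10:80
DocumentRoot /www/htdocs

# httpd-perl.conf -- mod_perl server on the second address
Listen 192.0.2.11:80
Alias /cgi-bin/ /www/cgi-bin/
```

Each instance is started separately with its own `-f` config file, so they stay completely independent.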
Re: New Server - How Many Links Could It Hold?
I'll do the reading - hopefully the webhosting company will be able to do it when it's set up.

I am thinking I will definitely have to build the pages rather than use page.cgi Wink I am pretty sure on that one.
Re: New Server - How Many Links Could It Hold?
It's very common to use two Apaches on one server with mod_perl (that's what I was mentioning earlier - you don't need two servers). In fact, that's how we are running our system.

Typically, you have a light Apache with only the minimum config needed to serve static pages and images. Any request for programming gets transparently passed to the larger (in terms of memory size) mod_perl server. This way you only need < 10 mod_perl children around, whereas Apache can typically have more than 70 (on a heavy site).

It's usually pretty easy to set this up on a dedicated server - not something that can be done on a virtual host, though.

Cheers,

Alex
Re: New Server - How Many Links Could It Hold?
OK then - sounds a LOT better than getting 2 servers!
Re: New Server - How Many Links Could It Hold?
It's a good start, or interim, before two servers.

You will still be swapping processes and sharing resources. But with 400+ meg of RAM, a couple of fast hard drives, and the dual processors, you'll be able to handle it for a while.

Down the road you'll find that pulling the lightweight Apache off onto a leaner machine makes more "high end" resources available to the database server, especially if you are really busy. Even though that Apache is lightweight, it's consuming resources on a high-end machine that could be served from a lesser one. Moving it would, in turn, make more resources available to the database (mostly RAM and I/O space) and speed it up -- preventing an "upgrade".

Make sure to avoid "localhost" and use full machine names. It will make upgrades easier, by allowing you to install the new machine, give it the lightweight server's IP address, and shut down the original server process.
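For instance, wherever a script or config names the database host (the hostname below is hypothetical):

```
# instead of:
#   host = localhost
host = db.yourdomain.com    # resolves to this machine for now; later,
                            # point the name at a dedicated database box
```

When the database eventually moves, only DNS changes - not every script and config file.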

If you consider the two Apaches to be two separate physical machines on two separate IP addresses, you'll be on your way to a distributed network.

It's not been a common thing, but more and more people are doing it, so more information will be available in the near future. I'm about ready to do it myself: run a PHP-enabled Apache for sysadmin, plus a lean Apache and a mod_perl-enhanced Apache for the public server.


Re: New Server - How Many Links Could It Hold?
We have decided to get the one server - but are getting it with 512MB RAM instead (we have already sent off the order to the host...).

I am going to send them a detailed email about how to set up the server, because they'll just use the normal single Apache otherwise.

I am hoping to put one Apache on each hard drive - but I'll see what the other people are thinking first.
Re: New Server - How Many Links Could It Hold?
Actually, you'll probably see the most benefit down the road by moving mysql to a separate machine that is dedicated and tuned to it.

Cheers,

Alex
Re: New Server - How Many Links Could It Hold?
A separate database server is the ultimate, but then the http server has to be a bit more powerful to handle the beefed-up Apache.

But setting up the machine with two IP addresses, and using full machine names instead of "localhost", would allow for moving both Apaches to a new machine while keeping the database server as it is.

My ideal would be to set up the database server, and get a machine like I'm running now - a Sparc 10 with 256 meg of RAM - to run the http daemons. That should be sufficient for most "read only" type situations.

If you intend to go "interactive" - adding user home pages, email, etc., with users having access to custom features that require two-way interaction - you will of course need more horsepower.

The caveat here is that this can of course all be set up on one machine and run pretty well. The parameters we were given, though, were:

Quote:
We are really aiming at the 2 million or more mark, can this handle it

If you have a very active portal-type site, there is no way one database server is going to handle 2 million links under full load, even with the http daemons on a PIII. I have no figures, so I won't make any up, but what sort of traffic are you planning on? That's the missing piece of the puzzle. I could run 2 million links on my desktop here; I just couldn't do it in a multi-user environment <G>

Also, I haven't seen many stories of MySQL with 1 million+ links. I'm sure by the time you grow to that size, the data will be in, and the program does keep evolving.

If you are planning an "archive" type site getting 50,000 unique "average" visits a day, you probably wouldn't outgrow that system before it needed replacing due to age. If you are looking at 250,000 or more visits a day, then use the bigger sites out there as a guide, and dig up the sort of networks they are running on. You are going to need distributed database servers so you can take them down for maintenance, and the same for the http daemons. You'll probably want mirrors on different ends of the country to prevent bottlenecking, etc., etc.

I didn't want to start a flame war, or worry you. I took you at your 2,000,000 link word, and growth.

If you are worried, you might want to look up a good small LAN/networking consultant, tell them what sort of loads you are expecting, and ask what sort of load your type of equipment can handle. They can advise you on the planning you should do _now_ to make the move up easier.

In all reality, if you do get up to 2 million links and overpower this server, you should be making _real_ money, because you will _need_ significant additional hardware, not just one server. Two servers as I initially described is just a way to lay the groundwork for the network to come, and to delay it by maybe 30%.

Computer expansion in the initial stages is not linear; it's closer to exponential. You go from one machine to 4. Then you add machines 2-3x as powerful (which skews the curve in units of hardware) with each upgrade, or you add a machine at each end of the process.

That being said,

Your system with both the light and mod_perl Apaches should be able to handle a lot of load. Unix is a very good OS, and will do its best for you. (Under NT, all bets are off. Seems NT servers across the Net have been having problems the past 2 days <sigh> )

I've actually tried to find some benchmarking for this, but haven't found anything useful.

I wish people would post their systems and stats!

I'm building a targeted site, so I'm not out for quantity. In a way I wish I was, because I'd like to see the programs under full load, in a real-world situation like yours Smile

You can't go by ISP virtual server offerings, because most virtual servers are completely underutilized, and as you see here, if your server starts to use any real resources, you are shut down or forced off by performance issues.

Anyway, keep us advised!!

I'm sure there are others just as interested, sitting silently by Smile


Re: New Server - How Many Links Could It Hold?
I'll keep you up to date. I am not sure how much traffic we are expecting (there are quite a few people in on this). Anyway - we are getting 300GB transfer or something like that; hopefully we will utilise most of it!
Re: New Server - How Many Links Could It Hold?
Hi pugdog,

My System config is:

PIII-500 single on a BX board
512 MB Ram
Adaptec LVD
2 x IBM 9.2 GB LVD 7200
Linux, Apache

I am using a software RAID solution, which speeds up file access by about 70%.

Currently there are 1,200 categories (build time 18 sec), and I am expecting an average of 200,000 links. The system is still offline.

Do you know a program to simulate a high load? I would like to benchmark the system before putting it on the net.
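For basic load simulation, ApacheBench (`ab`), which ships with Apache, is a common starting point; `httperf` is another. An example invocation (the URL and numbers are hypothetical):

```shell
# 10,000 total requests, 50 concurrent, against one built category page
ab -n 10000 -c 50 http://www.example.com/links/Computers/
```

Both tools report requests per second and response-time figures, which you can compare before and after tuning.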