
-
Sep2
An Intro to Virtualization with VMware vSphere 4
Author: Susanta K Beura; Filed under: Computer & Internet
Background
VMware officially released vSphere 4 on the 21st of April, with all the fanfare you would expect to accompany a major product launch. However, much of the focus was on how it would enable enterprises to move to “cloud computing”. Whilst this all sounds great, it is of little relevance to the SMB IT admin, whose “datacentre” might run from a couple of servers to a couple of racks of servers, with a constrained budget to match.
Since our focus is on IT for those “Small to Medium Businesses”, I have been evaluating vSphere 4 from that angle, looking at the features which are most relevant for those considering whether it is right for their network. This is the first of a series of articles to help you with your evaluation, and it discusses the key features of vSphere 4. Subsequent articles will cover the actual process of installing and configuring the various components so you can test it yourself.
Virtualization Terminology
When discussing virtualization there are several key terms which can often cause confusion, in particular “server”, which is used interchangeably for physical and virtual systems. To avoid any confusion it is important to define the various terms employed by VMware and in this article:
Datacenters, clusters and hosts – The “host” is the basic building block of vSphere and refers to a physical server running the ESX hypervisor, whilst a cluster consists of two or more hosts with their associated resource pools, virtual machines and datastores. The datacenter is the largest unit of management in vSphere and contains one or more clusters.
Virtual Machine – The VM is the virtual equivalent of a physical server and as such has all the resources defined that you would usually expect when specifying physical hardware: CPU, memory, hard disks and networking. The operating system is then installed on the virtual machine in the usual way. The hardware is completely emulated by the hypervisor, so that to all intents and purposes there is no significant difference between a virtual and a physical machine.
Virtual Appliance – Many vendors now supply pre-built virtual machines ready to deploy as a “virtual appliance”, which may provide an easy way to test a server-based product or, increasingly, an alternative to a dedicated hardware appliance. A good example of this is Clearswift, who previously only supplied their web and email filtering systems as a hardware device but now offer a virtual appliance as an alternative. With vSphere, VMware have also introduced the “vApp”, which extends the idea of the virtual appliance to enterprise level by allowing for the creation of multi-VM vApps with linked service and resource policies.
Hypervisor – The hypervisor is the software which provides the sharing and translation layer between the server hardware and the virtual machines running on top of it. The latest generation of Intel and AMD CPUs incorporate virtualization extensions which enhance this functionality by allowing more efficient sharing of resources whilst improving the isolation of individual VMs, so that one VM cannot cause another to crash.
Datastore – ESX storage is divided into datastores which are formatted with its proprietary VMFS file system, one of the key features of which is its ability to handle concurrent access by multiple hosts. Datastores can be created from any type of storage visible to the hypervisor, but most of the advanced vSphere features require shared storage that is accessible to all the hosts in a cluster, typically a Storage Area Network. Datastore space is then provisioned into virtual disks which are attached to virtual machines to provide the storage for the operating systems installed on them.
Resource Pool – Many of the advanced features of vSphere revolve around the concept of a cluster of virtual hosts whose key resources (CPU MHz and memory) are concentrated into a central pool. This pool can be subdivided into more resource pools which can then be used to automatically manage the allocation of resources to virtual machines and enable prioritization of key VMs over less important ones.
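To make the relationship between these terms concrete, here is a minimal Python sketch of the hierarchy described above. It is only an illustration: the class names, host names and capacity figures are invented for the example and have nothing to do with VMware's own APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualMachine:
    name: str
    cpu_mhz: int      # CPU the VM is assumed to reserve (illustrative figure)
    memory_mb: int    # memory the VM is assumed to reserve (illustrative figure)


@dataclass
class Host:
    name: str
    cpu_mhz: int      # total CPU capacity of the physical server
    memory_mb: int    # total physical memory of the server
    vms: List[VirtualMachine] = field(default_factory=list)


@dataclass
class Cluster:
    name: str
    hosts: List[Host] = field(default_factory=list)

    def pooled_capacity(self):
        """Return (cpu_mhz, memory_mb) summed over every host in the cluster."""
        return (sum(h.cpu_mhz for h in self.hosts),
                sum(h.memory_mb for h in self.hosts))

    def reserved(self):
        """Return (cpu_mhz, memory_mb) reserved by all VMs in the cluster."""
        vms = [vm for h in self.hosts for vm in h.vms]
        return (sum(vm.cpu_mhz for vm in vms),
                sum(vm.memory_mb for vm in vms))


@dataclass
class Datacenter:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


# Hypothetical two-host cluster inside one datacenter
web = VirtualMachine("web01", cpu_mhz=1000, memory_mb=2048)
mail = VirtualMachine("mail01", cpu_mhz=2000, memory_mb=4096)
cluster = Cluster("office", hosts=[
    Host("esx01", cpu_mhz=8000, memory_mb=16384, vms=[web]),
    Host("esx02", cpu_mhz=8000, memory_mb=16384, vms=[mail]),
])
site = Datacenter("head-office", clusters=[cluster])

print(cluster.pooled_capacity())  # (16000, 32768)
print(cluster.reserved())         # (3000, 6144)
```

The point of the sketch is simply that the pool is a property of the cluster rather than of any one host, which is why VMs can be placed wherever resources are free.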
Virtualization – Fundamental Concepts
vSphere 4 is VMware’s name for the suite of applications that make up the latest major release of their virtualization solution, and it replaces their previous “Virtual Infrastructure” product suite. It is built around their ESX 4.0 hypervisor and vCenter Server, which manages the ESX hosts to provide the advanced functionality. VMware’s direct competitors are Citrix, with their “Essentials” suite based on the XenServer hypervisor, and Microsoft with their Hyper-V hypervisor and System Center Virtual Machine Manager. Such is the competitiveness of the virtualization market, and the obvious value of gaining market share, that all three vendors are now giving away their core hypervisors for free. Although migrating from one solution to another is not impossible, it is a major undertaking, which underlines the importance of choosing the right vendor from the start.
The fundamental concepts of virtualization are the same irrespective of vendor and require a significant change of mindset for the IT Manager used to the concept of a “server” being a package of hardware and software together. Turning the software “server” into a “virtual machine” makes it hardware independent as it just requires a hypervisor providing the necessary resources, and turning the hardware “server” into a “virtual host” by installing the hypervisor on it allows it to run as many virtual machines as it has resources for. Separating the software from the hardware in this way opens up a whole new spectrum of possibilities and also brings features that were previously “enterprise only” well within the reach of the smaller IT infrastructure.

ESXi - Virtualization for Free
Thanks to the competition between the major vendors you can make the initial move to virtualization without any financial outlay, except for the time required to deploy it. You can register with VMware to obtain a free ESXi 4.0 license key for as many hosts as you like, and you can manage those hosts effectively using the incorporated vSphere client. This will enable you to take advantage of a fundamental benefit of virtualization: the ability to maximize hardware resource usage by installing multiple VMs on a single physical host. Even in a single-server setup it’s worth considering. Once you have virtualized your server into a VM, upgrading to new server hardware in future becomes a one-hour job: (i) buy the new server, (ii) install ESXi, (iii) copy your VM from the old server to the new one, with no need to worry about updating device drivers etc. The same also applies to disaster recovery scenarios: restoring to new hardware is much quicker and more reliable when you no longer have to worry about changing hardware specifications.
vSphere 4 – Advanced Virtualization Functionality
So if you can do virtualization for free with ESXi, why should you be thinking about spending money on vSphere 4? Basically, if you are just running a single-host setup then there is little reason to consider upgrading, because virtually all the key features require at least a two-host cluster in order to work. There is also one other key requirement which shouldn’t be omitted when considering the costs of implementation: you will require a SAN to provide shared storage to all your hosts. The factors to consider when specifying a suitable SAN are complex, including performance, fault tolerance and connectivity; suffice to say that they will be covered in a separate article.
Once your infrastructure meets the basic requirements these are the main features vSphere has to offer:
VM Migration
Even with shared storage, moving a VM from one host to another with ESXi isn’t just a one-click process, but vSphere makes it that easy. With vMotion you can even move VMs whilst they are online, with only a brief pause in operation and no need to reboot. Another feature, “Storage vMotion”, allows you to move a live VM from one datastore to another without taking it offline. These features help to minimize the disruption of hardware maintenance: should you need to shut down a host server in order to upgrade or replace it, you can move your VMs to other hosts in advance so there is no interruption in service.

High Availability
One of the key vSphere features for the SMB IT manager, High Availability allows you to promise service uptime levels that were previously only attainable with enterprise clustering solutions. Each VM and host is monitored continuously, and should one go offline vSphere will try to recover it, either by forcing a reset of the VM (e.g. if a BSOD occurred), or, in the event of the original host failing, by restarting its VMs on another host.

Fault Tolerance
Fault Tolerance is a new feature introduced in vSphere 4 which takes advantage of new functions in the latest Intel CPU architectures to go one step beyond High Availability. Although HA can greatly reduce downtime it does not eliminate it, as the VM still has to be restarted, during which period it is effectively unavailable. Fault Tolerance uses “lockstep” to maintain a second clone VM on another host, which is kept continuously up to date as the complete VM state is mirrored. Should the primary host fail for whatever reason, the secondary VM image activates immediately, retaining user session states and allowing work to continue with no time or data loss.
Data Recovery
Sometimes even Fault Tolerance isn’t enough and you may want to restore an older version of a VM, or have to recover from a backup, so a good Disaster Recovery solution is essential. You should continue to use your existing backup application for file-level backups, running within the VM, but often there are DR situations where the fastest solution will be to recover an entire VM image. VMware Consolidated Backup is still available as part of vSphere, which enables you to integrate your backup system with your virtual infrastructure for efficient VM backups. New with vSphere 4 is “Data Recovery”, which provides a complete VM backup solution supporting scheduling and data de-duplication. Since this is a significant addition to vSphere’s capabilities it is fully covered in a separate article.
Distributed Resource and Power Management
Although less of an issue for smaller businesses, one of the key virtues often touted for virtualization is the ability to dramatically reduce IT systems’ power consumption, and with it cooling, with the associated reduction in electricity bills. vSphere’s Distributed Power Management (DPM) allows you to specify how aggressively you want it to reduce power consumption, and will take advantage of the latest CPU functions such as core throttling. It uses vMotion to move VMs between hosts in order to minimize power requirements whilst still providing the necessary resources, and is even able to put hosts into standby and then re-awaken them when required. The Distributed Resource Scheduler (DRS) is a similar feature but works instead to ensure that all VMs get the resources they require; again it will use vMotion to move VMs between hosts in order to achieve the most efficient balance. DRS and DPM are reactive processes that will respond to changing patterns of usage, so if one particular VM comes under sustained load it may be moved to another host with more resources available.
Conclusions
You should now have a much better idea of the benefits vSphere has to offer you and whether you should consider implementing it in your network. Although the basic ESXi hypervisor is free the rest of the suite and related features aren’t, and there are various licensing options designed to suit different size implementations. Subsequent articles in this series will help you to assess the potential cost of virtualizing your network, and also show you how to set up your own vSphere cluster in order to experience and evaluate it yourself.
-
Sep2
What is IMAP?
Author: Susanta K Beura; Filed under: Computer & Internet
IMAP stands for Internet Message Access Protocol. It is a method of accessing electronic mail or bulletin board messages that are kept on a (possibly shared) mail server. In other words, it permits a “client” email program to access remote message stores as if they were local. For example, email stored on an IMAP server can be manipulated from a desktop computer at home, a workstation at the office, and a notebook computer while traveling, without the need to transfer messages or files back and forth between these computers.
IMAP’s ability to access messages (both new and saved) from more than one computer has become extremely important as reliance on electronic messaging and use of multiple computers increase, but this functionality cannot be taken for granted: the widely used Post Office Protocol (POP) works best when one has only a single computer, since it was designed to support “offline” message access, wherein messages are downloaded and then deleted from the mail server. This mode of access is not compatible with access from multiple computers since it tends to sprinkle messages across all of the computers used for mail access. Thus, unless all of those machines share a common file system, the offline mode of access that POP was designed to support effectively ties the user to one computer for message storage and manipulation.
Key goals for IMAP include:
- Be fully compatible with Internet messaging standards, e.g. MIME.
- Allow message access and management from more than one computer.
- Allow access without reliance on less efficient file access protocols.
- Provide support for “online”, “offline”, and “disconnected” access modes.
- Support concurrent access to shared mailboxes.
- Client software needs no knowledge about the server’s file store format.
The protocol includes operations for creating, deleting, and renaming mailboxes; checking for new messages; permanently removing messages; setting and clearing flags; server-based RFC 2822 and MIME parsing (so clients don’t need to do it themselves) and searching; and selective fetching of message attributes, texts, and portions thereof for efficiency.
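As a rough illustration of server-side searching and selective fetching, the following sketch uses Python's standard imaplib module. The host name, user name and password are placeholders you would supply yourself, and the helper names parse_search_reply and unseen_subjects are invented for this example.

```python
import imaplib


def parse_search_reply(data):
    """Turn an IMAP SEARCH reply such as [b'1 2 5'] into [b'1', b'2', b'5']."""
    if not data or not data[0]:
        return []
    return data[0].split()


def unseen_subjects(host, user, password):
    """List the Subject headers of unseen messages without downloading bodies."""
    with imaplib.IMAP4_SSL(host) as conn:            # IMAP over TLS (port 993)
        conn.login(user, password)
        conn.select("INBOX", readonly=True)          # open the mailbox read-only
        _status, data = conn.search(None, "UNSEEN")  # the server does the searching
        subjects = []
        for num in parse_search_reply(data):
            # Selective fetch: only the Subject header is transferred, and
            # BODY.PEEK leaves the message's \Seen flag untouched.
            _status, parts = conn.fetch(
                num, "(BODY.PEEK[HEADER.FIELDS (SUBJECT)])")
            for part in parts:
                if isinstance(part, tuple):
                    subjects.append(
                        part[1].decode("utf-8", "replace").strip())
        return subjects
```

Because the messages stay on the server and nothing is marked as read, the same call could be made from a desktop, an office workstation and a notebook and each would see the same mailbox state.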
IMAP was originally developed in 1986 at Stanford University. However, it did not command the attention of mainstream email vendors until a decade later, and it is still not as well-known as earlier and less-capable alternatives such as POP, though that is rapidly changing, as articles in the trade press and the implementation of IMAP in more and more software products show.
There is a companion protocol to IMAP, developed at Carnegie Mellon University. It is called the “Application Configuration Access Protocol”, or ACAP, and provides the same location-independent access to configuration files, address books, bookmark lists, etc., that IMAP offers for mailboxes.
-
Sep2
Installing Squid
Author: Susanta K Beura; Filed under: Computer & Internet, Proxy Server
Hardware Requirements
Caching stresses certain hardware subsystems more than others. Although the key to good cache performance is good overall system performance, the following list is arranged in order of decreasing importance:
- Disk random seek time
- Amount of system memory
- Sustained disk throughput
- CPU power
Do not drastically underpower any one subsystem, or performance will suffer.
In the case of catastrophic hardware failure you must have a ready supply of alternate parts. When your cache is critical, you should have a (working!) standby machine with operating system and Squid installed. This can be kept ready for nearly instantaneous swap-out. This will, of course, increase your costs, something that you may want to take into account. Chapter 13 covers standby procedures in detail.
Gathering Statistics
When deciding on your cache’s horsepower, many factors must be taken into account. To decide on your machine, you need an idea of the load that it will need to sustain: the peak number of requests per minute. This number indicates the number of ‘objects’ downloaded in a minute by clients, and can be used to get an idea of your cache load.
Computing the peak number of requests is difficult, since it depends on the browsing habits of users. This, in turn, makes deciding on the required hardware difficult. If you don’t have many statistics as to your Internet usage, it is probably worth your while installing a test cache server (on any machine that you have handy) and pointing some of your staff at it. Using ratios you can estimate the number of requests with a larger user base.
When gathering statistics, make sure that you judge the ‘peak’ number of requests, rather than an average value. You shouldn’t take the number of requests per day and divide, since your peak (during, for example, lunch hour) can be many times your average number of requests.
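The ratio-based estimate can be sketched in a few lines of Python. The per-minute counts and user numbers below are made up for illustration, and the function name estimate_peak_requests is an invention of this example; a real pilot would take its figures from the test cache's access log.

```python
def estimate_peak_requests(pilot_peak_per_minute, pilot_users, total_users):
    """Scale the pilot group's peak request rate up to the full user base."""
    if pilot_users <= 0:
        raise ValueError("pilot_users must be positive")
    return pilot_peak_per_minute * total_users / pilot_users


# Hypothetical requests-per-minute samples logged by a pilot cache
pilot_counts = [40, 55, 300, 280, 60, 35]

pilot_peak = max(pilot_counts)                          # busiest minute seen
pilot_average = sum(pilot_counts) / len(pilot_counts)   # misleadingly low

# 20 pilot users, 500 users in the eventual deployment (assumed figures)
print(estimate_peak_requests(pilot_peak, pilot_users=20, total_users=500))
# 7500.0
```

In this made-up sample the peak is more than twice the average, which is why sizing from a daily total divided evenly over the day would under-provision the machine.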
It’s a very good idea to over-estimate hardware requirements. Stay ahead of the growth curve too, since an overloaded cache can spiral out of control due to a transient network problem. If a cache cannot deal with incoming requests for some reason (say a DNS outage), it still continues to accept incoming requests, in the hope that it can deal with them. If no requests can be handled, the number of concurrent connections will increase at the rate that new requests arrive.
If your cache runs close to capacity, a temporary glitch can increase the number of concurrent, waiting, requests tremendously. If your cache can’t cope with this number of established connections, it may never be able to recover, with current connections never being cleared while it tries to deal with a huge backlog.
Squid 2.0 may be configured to use threads to perform asynchronous Input/Output on operating systems that support POSIX threads. Including async-IO can dramatically reduce your cache latency, allowing you to use a less powerful machine. Unfortunately not all systems support POSIX threads correctly, so your choice of hardware can depend on the abilities of your operating system. Your choice of operating system is discussed in the next section, where you can check whether your system supports threads.
Hard Disks
Disk Speed
There are numerous things to consider when buying disks. Earlier on we mentioned the importance of disks with a fast random-seek time, and with high sustained-throughput. Having the world’s fastest drive is not useful, though, if it holds a tiny amount of data. To cache effectively you need disks that can hold a significant amount of downloaded data, but that are fast enough to not slow your cache to a crawl.
Seek time is one of the most important considerations if your cache is going to be loaded. If you have a look at a disk’s documentation there is normally a random seek time figure. The smaller this value the better: it is the average time that the disk’s heads take to move from a random track to another (in milliseconds).
Operating systems do all sorts of interesting things (which are not covered here) to attempt to speed up disk access times: waiting for disks can slow a machine down dramatically. These operating system features make it difficult to estimate how many requests per second your cache can handle before being slowed by disk access times (rather than by network speed). In the next few paragraphs we ignore operating system readahead, inode update seeks and more: it’s a back of the envelope approximation for your use.
If your cache does not use asynchronous Input/Output (described in the Operating System section shortly) then it loses a lot of the advantage gained by multiple disks. If your cache is going to be loaded (or is running anywhere near capacity according to the formulae below) you must ensure that your operating system supports POSIX threads!
A cache with one disk has to seek at least once per request (ignoring RAM caching of the disk and inode update times). If you have only one disk, the formula for working out seeks per second (and hence requests per second) is quite simple:
requests per second = 1000/seek time
Squid load-balances writes between multiple cache disks, so if you have more than one data disk your seeks-per-second per disk will be lower. On almost all operating systems the number of random seeks you can perform per second grows in a semi-linear fashion as you add more disks, though some may impose a small performance penalty. If you add more disks to the equation, the requests per second value becomes even more approximate! To simplify things in the meantime, we are going to assume that you use only disks with the same seek time. Our formula thus becomes:
theoretical requests per second = 1000/(seek time / number of disks)
Let's consider a less theoretical example: I have three disks - all have 12ms seek times. I can thus (theoretically, as always) handle: requests per second = 1000/(12/3) = 1000/4 = 250 requests per second
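The same back-of-the-envelope formula can be written as a small Python function, which makes it easy to try other disk counts and seek times. The function name is simply chosen for this example, and the result inherits every simplification made above (identical disks, no RAM caching, no readahead).

```python
def theoretical_requests_per_second(seek_time_ms, number_of_disks=1):
    """Apply: requests per second = 1000 / (seek time / number of disks)."""
    if seek_time_ms <= 0:
        raise ValueError("seek_time_ms must be positive")
    if number_of_disks < 1:
        raise ValueError("number_of_disks must be at least 1")
    return 1000 / (seek_time_ms / number_of_disks)


# The worked example from the text: three disks with 12 ms seek times
print(theoretical_requests_per_second(12, 3))   # 250.0

# A single disk with the same seek time, for comparison
print(round(theoretical_requests_per_second(12), 1))   # 83.3
```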
While we are on this topic: many people query the use of IDE disks in caches. IDE disks these days generally have very similar seek times to SCSI disks, and (with DMA-compatible IDE controllers) approach SCSI data transfer speeds without slowing the whole machine down.
Deciding how much disk space to allocate to Squid is difficult. For the pilot project you can simply allocate a few megabytes, but this is unlikely to be useful on a production cache.
Disk Space
The amount of disk space required depends on quite a few factors.
Assume that you were to run a cache just for yourself. If you were to allocate 1 gig of disk, and you browse pages at a rate of 10 megabytes per day, it will take at least 100 days for you to fill the cache.
You can thus see that the rate of incoming cache queries influences the amount of disk to allocate.
If you examine the other end of the scale, where you have 10 megabytes of disk, and 10 incoming queries per second, you will realize that at this rate your disk space will not last very long. Objects are likely to be pushed out of the cache as they arrive, so getting a hit would require two people to be downloading the object at almost exactly the same time. Note that the latter is definitely not impossible, but it happens only occasionally on loaded caches.
The above certainly appears simple, but many people do not extrapolate. The same relationships govern the expulsion of objects from your cache at larger cache store sizes. When deciding on the amount of disk space to allocate, you should determine approximately how much data will pass through the cache each day. If you are unable to determine this, you could simply use the theoretical maximum transfer rate of your line as a basis. A 1 Mbit/s line can transfer about 125 000 bytes per second. If all clients were set up to access the cache, disk would be used at about 125 KB per second, which translates to about 450 megabytes per hour. If the bulk of your traffic is transferred during an eight-hour working day, you are probably transferring 3.6 gigabytes per day. If your line was 100% used, however, you would probably have upgraded it a while ago, so let’s assume you transfer 2 gigabytes per day. If you wanted to keep ALL data for a day, you would have to have 2 gigabytes of disk for Squid.
The feasibility of caching depends on two or more users visiting the same page while the object is still on disk. This is quite likely to happen with the large sites (search engines, and the default home pages in respective browsers), but the chances of a second user visiting the same obscure page are slim, simply due to the volume of pages. In many cases the obscure pages are on the slowest links, frustrating users. Depending on the number of users requesting pages you should keep pages for longer, so that the chances of different users accessing the same page are higher. Determining this value, however, is difficult, since it also depends on the average object size, which, in turn, depends on user habits.
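The line-speed arithmetic above is easy to reproduce for your own link. The sketch below is only an upper-bound estimate: the eight busy hours and the utilisation figure are assumptions you would replace with your own, and the function name is invented for this example.

```python
def daily_cache_volume_bytes(line_bits_per_second, busy_hours_per_day,
                             utilisation=1.0):
    """Estimate the bytes passing through the cache in one day."""
    bytes_per_second = line_bits_per_second / 8      # 8 bits per byte
    bytes_per_hour = bytes_per_second * 3600         # 3600 seconds per hour
    return bytes_per_hour * busy_hours_per_day * utilisation


# 1 Mbit/s line, fully used for 8 busy hours (the theoretical maximum)
print(daily_cache_volume_bytes(1_000_000, 8))          # 3600000000.0

# The same line at an assumed 50% utilisation
print(daily_cache_volume_bytes(1_000_000, 8, 0.5))     # 1800000000.0
```

Whatever figure comes out is the disk space needed to keep roughly one day of traffic; multiply it by the number of days you would like objects to survive in the cache.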
Some people use RAID systems on their caches. This can dramatically increase availability, but a RAID-5 system can reduce disk throughput significantly. If you are really concerned with uptime, you may find a RAID system useful. Since the actual data in the cache store is not vital, though, you may prefer to manually fail-over the cache, simply re-formatting or replacing drives. Sure, your cache may have a lower hit-ratio for a short while, but you can easily balance this minute cost against what hardware to do automatic failover would have cost you.
You should probably base your purchase on the bandwidth description above, and gather data to decide when to add more disk space.
Memory/Ram Requirements
Squid keeps an in-memory table of objects in RAM. Because of the way that Squid checks if objects are in the file store, fast access to the table is very important. Squid slows down dramatically when parts of the table are in swap.
Since Squid is one large process, swapping is particularly bad. If the operating system has to swap data, Squid is placed on the ‘sleeping tasks’ queue and cannot service other established connections while it waits.
Each object stored on disk uses about 75 bytes of RAM in the index. The average size of an object on the Internet is about 13 KB, so if you have a gigabyte of disk space you will probably store around 80 000 objects.
At 75 bytes of RAM per object, 80 000 objects require about six megabytes of RAM. If you have 8 gigabytes of disk you will need about 48 MB of RAM just for the object index. It is important to note that this excludes memory for your operating system, the Squid binary, memory for in-transit objects and spare RAM for disk cache.
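The index estimate can be reproduced with a short Python function. The defaults below are the approximate figures quoted above (13 KB average object, about 75 bytes of index per object); treat them as assumptions rather than exact values for any particular Squid version, and note that the function name is invented for this example.

```python
def index_ram_bytes(disk_bytes, avg_object_bytes=13_000, bytes_per_object=75):
    """Estimate the RAM needed for Squid's in-memory object index."""
    objects = disk_bytes // avg_object_bytes   # how many objects fit on disk
    return objects * bytes_per_object


one_gb = 10**9
print(index_ram_bytes(one_gb))        # 5769225  (a little under 6 MB)
print(index_ram_bytes(8 * one_gb))    # 46153800 (about 46 MB)
```

The text rounds the object count up to 80 000 per gigabyte, which is why it arrives at the slightly higher figures of 6 MB and 48 MB; either way the index grows in direct proportion to the cache store.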
So, what should the sustained throughput of your disks be? Squid tends to read in small blocks, so throughput is of lesser importance than random seek times. Generally disks with fast seeks have high throughput, and most disks (even IDE disks these days) can transfer data faster than clients can download it from you. Don’t blow a year’s budget on really high-speed disks; go for lower seek times instead, or add more disks.
CPU Power
Squid is not generally CPU intensive. On startup Squid can use a lot of CPU while it works out what is in the cache, and a slow CPU can slow down access to the cache for the first few minutes after startup. A Pentium 133 machine generally runs pretty idle while receiving 7 TCP requests a second.
A multiprocessor machine generally doesn’t increase speed dramatically: only certain portions of the Squid code are threaded. These sections of code are not processor intensive either: they are the code paths where Squid is waiting for the operating system to complete something. A multiprocessor machine generally does not reduce these wait times: more memory (for caching of data) and more disks may help more.
Choosing an Operating System
Where I work, we run many varieties of Unix. When I first installed Squid it was on my desktop Linux machine – if I break it by mistake it’s not going to cause users hassles, so I am free to do on it what I wish.
Once I had tested Squid, we decided to allow general access to the cache. I installed Squid on the fastest unused machine we had available at the time: a (then, at least) top of the range Pentium 133 with 128Mb of RAM running FreeBSD.
I was much more familiar with Linux at that stage, and eventually installed Linux on the public cache machine. Though running Linux caused some inconveniences (specifically with low per-process filehandle limits), it was the right choice, simply because I could maintain the machine better. Many times my experience with Linux has gotten me out of potentially sticky situations.
If your choice of operating system saves you time, and runs Squid, use it! Just as I didn’t use Digital Unix (Squid is developed on funded Digital Unix machines at NLANR), you don’t need to use Linux just because I do.
Most modern operating systems sport both similar performance and similar feature sets. If your system is commonly used and roughly Posix compliant at the source level, it will almost certainly be supported by Squid.
When was the last time you had an outage due to hardware failure? Unless you are particularly unlucky, the interval between hardware failures is long. While the quality of hardware has increased dramatically, software often does not keep pace. Many outages are caused by faulty application or operating system software. You must thus be able to pick up the pieces if your operating system crashes for some reason.
Experience
If you normally work on a specific operating system, you should probably not use your cache as a system to experiment with a new ‘flavor’ of Unix. If you have more experience in an operating system, you should use that system as the basis for your cache server. Customers rapidly turn off caching when a cache stops accepting requests (while you learn your way around some ‘feature’).
Your cache system will almost certainly form a core part of your network as soon as it is stable. You must be able to return the system to working order in minimal time in the event of a system failure, and this is where your existing experience becomes crucial. If the failure happens out of business hours you may not be able to get technical support from your vendor. A dialup ISP’s hours of business differ dramatically from those of operating system vendors.
Features
Though most operating systems support similar features, there are often no standards for functions required for some of the less commonly used operating system features. One example is transparency: many operating systems can now support transparent redirection to a local program, but almost all of them function in a different way, since there is not a real standard for the way an operating system is supposed to function in this scenario.
If you are unable to find information about Squid on your operating system, you may want to organize a trial hardware installation (assuming that you are using a commercial operating system) as a test. Only when you have the system running can you be sure that your operating system supports the required features.
Squid works on the following systems:
- Linux
- FreeBSD
- Slackware (CaPunG)
- (? List ?)
If you are using Squid without extensions like transparency and ARP access control lists, you should not have problems. For your convenience a table of operating system support of specific features is included. Since Squid is constantly being developed, it’s likely that this list will change.
Compilers
Squid is written on Digital Unix (?version ?) machines running the GNU C compiler (GCC). GCC is included with free operating systems such as Linux and FreeBSD, and is easily available for many other operating systems and hardware platforms. The GNU compiler adheres as closely to the ANSI C standard as possible, so if a different compiler is included with your operating system, it may (or may not) have trouble interpreting Squid’s source code, depending on its level of ANSI compliance. In practice, most compilers work fine.
Some commercial compilers choose backward compatibility with older versions over ANSI compliance. These compilers generally support an option that turns on ‘ANSI compliant mode’. If you have trouble compiling Squid you may have to turn this mode on. (? is this still valid? I remember things like this back in the Borland C days – though I seem to remember this on a Unix system too… ?) In the worst possible scenario you may have to compile GCC with your existing compiler and use GCC to compile Squid.
If you do not have a compiler, you may be able to find a precompiled version of GCC for your system on the Internet. Be very careful when installing software from untrusted sources. This is discussed shortly in the “precompiled binary” section.
If you cannot find versions of GCC for your platform, you may have to factor in the cost of the compiler when deciding on your operating system and hardware.
Basic System Setup
Before you even install the operating system, it’s best to get an idea as to how the system will look once Squid is up and running. This will allow you to partition the disks on the machine so that their mount path will match Squid’s default configuration.
Default Squid directory structure
Normally Squid’s directory tree looks like this:
/usr/local/squid/
    /bin/
    /cache/
    /etc/
    /src/squid-2.0/
Working through each directory below /usr/local/squid in the order presented above:
Back to the cache directory: if you have more than one partition for the cached data, you can make subdirectories for each of the filesystems in the cache directory. Normally people name these directories cache1, cache2, cache3 and so forth. Your cache directories should be mounted somewhere like /usr/local/squid/cache/1/ and /usr/local/squid/cache/2/. If you have only one cache disk, you can simply name the directory /usr/local/squid/cache/.
In Squid-1.1 cache directories had to be identical in size. This is no longer the case, so if you are upgrading to Squid 2.0 you may be able to resize your cache partitions. To do this, however, you may have to repartition disks and reformat.
When you upgrade to the latest version of Squid, it’s a good idea to keep the old working compiled source tree somewhere. If you upgrade to the latest Squid and encounter problems, simply kill Squid, change to the previous source directory and reinstall the old binaries. This is a lot faster than trying to remember which source tree you were running, downloading it, compiling it, applying local patches and then reinstalling.
User and Group IDs
Squid, like most daemon processes on Unix machines, normally runs as the user nobody and with the group nogroup.
For maximum flexibility in allowing root and non-root users to manipulate the Squid configuration, you should create a new user and two new groups specifically for the Squid system, rather than using the nobody and nogroup IDs. Throughout this book we assume that you have done so: that a user and a group (both called squid) have been created, along with a second, admin group called squidadm. The squid user’s primary group should be squid, and the user’s home directory should be /usr/local/squid (the default Squid software install destination).
When you have multiple administrators of a cache machine, it is useful to have a dedicated squidadm group, with sub-administrators added to this group. This way, you don’t have to change to the root user whenever you want to make changes to the Squid config. It’s possible for users in the squidadm group to gain root access, so you shouldn’t place people without root access in the squidadm group.
When the config file has been changed, a signal has to be sent to the Squid process to inform it that the config files are to be re-read. Sending signals to running processes isn’t possible when the signal sender isn’t the same user-id as the receiver. Other config file maintainers need permission to change their user-id (either by using the su command, or by logging in with another session) to either the root user or to the user Squid is running as.
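On a Linux system, the accounts described above could be created along the following lines. This is only a sketch: groupadd and useradd are the Linux account tools, and other Unix flavours use different commands. The run helper and the DRY_RUN variable are hypothetical names introduced here so that the commands are printed rather than executed until you are ready.

```shell
#!/bin/sh
# Sketch: create the squid user plus the squid and squidadm groups.
# Assumes the Linux groupadd/useradd tools; 'run' and DRY_RUN are
# illustrative helpers, not part of Squid.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_squid_accounts() {
    run groupadd squid       # primary group for the squid user
    run groupadd squidadm    # group for people who edit the config
    # -g sets the primary group, -d the home directory
    run useradd -g squid -d /usr/local/squid squid
}
```

Calling create_squid_accounts as-is only prints the three commands. Running it as root with DRY_RUN=0 executes them.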
In some environments cache software maintainers aren’t trusted with root access, and the user nobody isn’t allowed to login. The best solution is to allow users that need to make changes to the config file access to a reload script using sudo. Sudo is available for many systems, and source code is available.
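A reload script of the kind mentioned above might be as small as the following. It is a sketch under assumptions: the binary path /usr/local/squid/bin/squid follows the default install destination used in this text, SQUID_BIN and reload_squid are names made up for this example, and squid -k reconfigure is the command-line switch that asks a running Squid to re-read its configuration. How you allow members of squidadm to run the script through sudo depends on your sudoers setup and is not shown.

```shell
#!/bin/sh
# Sketch of a reload helper meant to be run through sudo.
# SQUID_BIN is an assumed path; adjust it to your installation.
SQUID_BIN=${SQUID_BIN:-/usr/local/squid/bin/squid}

reload_squid() {
    if [ ! -x "$SQUID_BIN" ]; then
        echo "cannot execute $SQUID_BIN" >&2
        return 1
    fi
    # -k reconfigure signals the running Squid to re-read its config files
    "$SQUID_BIN" -k reconfigure
}
```

Keeping the script this small makes it easier to audit, which matters for anything that is allowed to run with elevated rights.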
In Chapter 4 we go through the process of changing the user-id that Squid runs as, so that files Squid creates are owned by the squid user-id, and by the group squid. Binaries are owned by root, and config files are changeable by the squidadm group.
Getting Squid
Now that your machine is ready for your Squid install, you need to download and install the Squid program. This can be done in two ways: you can download a source version and compile it, or you can download a precompiled binary version and install that, relying on someone else to do the compilation for you.
Binary versions of Squid are generally easier to install than source code versions, specifically if your operating system vendor distributes a package which you can simply install.
Installing Squid from source code is recommended. This method allows you to turn on compile-time options that may not be included in distributed binary versions (one of many examples: SNMP support is not compiled in unless it is specifically enabled, and most binary versions available do not include SNMP support). If your operating system has been optimized so that Squid can run better (let’s say you have increased the number of open filehandles per process) a precompiled binary will not take advantage of this tuning, since your compiler header files are probably different from the ones where the binaries were compiled.
It’s also a little worrying running binaries that other people distribute (unless, of course, they are officially supplied by your operating system vendor): what if they have placed a trojan into the binary version? To ensure the security of your system it is recommended that you compile from the official source tree.
Since we suggest installing from source code first, we cover that first: if you have to download a Squid binary from somewhere, simply skip to the next sub-section: Getting a binary version of Squid.
Getting the Squid source code
Squid source is mirrored by numerous sites. For a list of mirrors, have a look at http://www.squid-cache.org/Mirrors/
Deciding which of the available files to download can become an issue, especially if you are not familiar with the Squid version naming convention. Squid is (as of this writing) in version 2. As features are added, the minor version number is incremented (Squid 2.0 becomes Squid 2.1, then Squid 2.2 and so on). Since new features may introduce new bugs, the first version including new features is distributed as a pre-release (or beta) version. The first pre-release of Squid 2.1 is called squid-2.1.PRE1-src.tar.gz. The second is squid-2.1.PRE2-src.tar.gz. Once Squid is considered stable, a general release version is distributed: the first release version is called squid-2.0.RELEASE-src.tar.gz, the second (which would include minor bug fixes) squid-2.0.RELEASE2-src.tar.gz.
In short, files are named as follows: squid-2.minor-version-number.stability-info.release-number.tar.gz. Unless you are a Squid developer, you should download the last available RELEASE version: you are less likely to encounter bugs this way.
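To make the naming convention concrete, here is a small sketch that splits an archive name into its parts using only shell parameter expansion. It assumes names shaped like the examples given above (such as squid-2.1.PRE2-src.tar.gz), and treats a missing release number as 1, following the squid-2.0.RELEASE example. The function name describe_squid_archive is invented for illustration.

```shell
#!/bin/sh
# Sketch: describe a Squid 2.x source archive name.
# Assumes names like squid-2.1.PRE2-src.tar.gz or squid-2.0.RELEASE-src.tar.gz.
describe_squid_archive() {
    name=${1%-src.tar.gz}    # strip the "-src.tar.gz" suffix if present
    name=${name%.tar.gz}     # or a plain ".tar.gz" suffix
    name=${name#squid-}      # e.g. "2.1.PRE2"
    major=${name%%.*}        # "2"
    rest=${name#*.}          # "1.PRE2"
    minor=${rest%%.*}        # "1"
    tag=${rest#*.}           # "PRE2"
    case $tag in
        PRE*)     kind=pre-release; number=${tag#PRE} ;;
        RELEASE*) kind=release;     number=${tag#RELEASE} ;;
        *)        kind=unknown;     number= ;;
    esac
    # The first release carries no number in its name; count it as 1.
    if [ "$kind" != unknown ] && [ -z "$number" ]; then
        number=1
    fi
    printf 'version=%s.%s kind=%s number=%s\n' "$major" "$minor" "$kind" "$number"
}
```

For example, describe_squid_archive squid-2.1.PRE2-src.tar.gz reports version 2.1, a pre-release, number 2, which tells you at a glance that it is not the file most people should download.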
Squid source is normally available via FTP (the File Transfer Protocol), so you should be able to download Squid source by using the ftp program, available on almost every Unix system. If you are not familiar with ftp, you can simply select the mirror closest to you with your browser and save the Squid source to your disk by right-clicking on the filename and selecting save as (do not simply click on the filename – many browsers attempt to extract compressed files, printing the tar file to your browser window: this is definitely not what you want!). Once the download is complete, transfer the file to the cache machine.
Getting Binary Versions of Squid
Finding binary versions of Squid to install is easy: deciding which binary to trust is more difficult. If you do not choose carefully, someone could undermine your system security. If you cannot compile Squid cache, but know (and trust) someone that can do it for you, get them to help. It’s better than downloading a version contributed by someone that you don’t know.
Compiling Squid
Compiling Squid is quite easy: you need the right tools to do the job, though. First, let’s go through getting the tools, then you can extract the source code package, include optional Squid components (using the configure command) and then actually compile the distributed code into a binary format.
A word of warning, though: this is the stage where most people run into problems. If you haven’t compiled source before, try and follow the next section in order – it shouldn’t be too bad. If you don’t manage to get Squid running, at least you have gained experience.
Compilation Tools
GNU utilities mentioned below are available via FTP from the official GNU ftp site or one of its mirrors. A list of mirrors is available at http://www.gnu.org/, or download them directly from ftp://ftp.gnu.org/.
GCC (the GNU C Compiler) is the recommended compiler: the developers wrote Squid with it, and it is available for almost all systems. The GNU compiler is only distributed as source, which creates a chicken-and-egg problem if you do not have a compiler: you may have to do an Internet search (using one of the standard search engines) to find a binary copy of the GNU compiler for your system. The Squid source is distributed in compressed form. First a standard tar file is created. This file is then compressed with the GNU gzip program. To decompress this file you need a copy of gzip.
You will also need the make program, of which there is also a GNU version easily available.
If possible, install a C debugger: the GNU debugger (GDB) is available for most platforms. A debugger is not necessary for installation, but it is very useful in the case of software bugs (as discussed in chapter 13).
Unpacking the Source Archive
Earlier we looked at the tree structure of the /usr/local/squid directory. I suggest extracting the Squid source to the /usr/local/squid/src directory. So, create the directory and copy the downloaded Squid tar.gz file into it.
First let’s decompress the file. Some versions of tar can decompress the file in one step, but for compatibility’s sake we are going to do it in two steps. Decompress the tar file by running gzip -dv squid-version.tar.gz. If all has gone well you should have a file called squid-version.tar in the current directory. To get the files out of the “tarball”, run tar xvf squid-version.tar.
Tar automatically puts the files into a subdirectory: something like squid-2.1.PRE2. Change into the extracted directory, and we can start configuring the Squid source.
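The unpacking steps above can be collected into one small helper. This is a sketch rather than a required procedure: unpack_source is a made-up function name, and the two arguments (the downloaded archive and the source directory, for instance /usr/local/squid/src) are whatever applies on your machine.

```shell
#!/bin/sh
# Sketch: copy a downloaded archive into the source directory and unpack it
# in two steps (gzip, then tar), as described above.
unpack_source() {
    archive=$1     # path to the downloaded squid-version.tar.gz
    src_dir=$2     # e.g. /usr/local/squid/src
    base=$(basename "$archive")

    mkdir -p "$src_dir" || return 1
    cp "$archive" "$src_dir/" || return 1
    # Work in a subshell so the caller's current directory is unchanged.
    (
        cd "$src_dir" || exit 1
        gzip -dv "$base" || exit 1     # leaves squid-version.tar behind
        tar xvf "${base%.gz}"          # extracts into a squid-version/ subdirectory
    )
}
```

Because the archive is copied first, the original download stays untouched if you need to start again.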
Running configure
Now that you have decided which options to use, it’s time to run configure. Here’s an example:
./configure --enable-err-language=Bulgarian --prefix=/usr/local
Running ./configure with the options that you have chosen should go smoothly. In the unlikely event that configure returns with an error message, here are some suggestions that may help.
Broken compilers
The most common problem for new installers is that there is a problem with the installed compiler (or the headers) for the system.
To test this theory simply try and run configure with no options at all. If you still get an error message it is almost certainly a compiler or header file problem.
To make sure, try to compile a program that uses some of the less commonly used system calls, and see whether it compiles.
If your compiler doesn’t compile files correctly, you might want to check if the header files exist, and if they do, permissions on the directory and the include files themselves.
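One way to run such a check is sketched below. The helper writes a tiny C program to a temporary directory and tries to compile it with the compiler named in CC (gcc if unset), honouring CFLAGS. The choice of getrlimit() as the test call is my own assumption, picked because per-process file descriptor limits matter to a cache; check_compiler is an invented name, and mktemp -d is assumed to be available on your system.

```shell
#!/bin/sh
# Sketch: check that the compiler and system headers can build a small
# program. The test program and the function name are illustrative only.
check_compiler() {
    cc=${CC:-gcc}
    tmpdir=$(mktemp -d) || return 1
    cat > "$tmpdir/conftest.c" <<'EOF'
#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    /* Ask the system for the per-process open file limit. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 1;
    printf("max open files: %lu\n", (unsigned long) rl.rlim_cur);
    return 0;
}
EOF
    # CFLAGS is left unquoted on purpose so several options split into words.
    if $cc $CFLAGS -o "$tmpdir/conftest" "$tmpdir/conftest.c"; then
        echo "compiler check passed"
        status=0
    else
        echo "compiler check failed: inspect the compiler and header files" >&2
        status=1
    fi
    rm -rf "$tmpdir"
    return $status
}
```

If this fails while configure also fails without options, the compiler or header installation is the likely culprit rather than Squid itself.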
If you have installed GCC in a non-standard directory, or if you are cross compiling, you may need configure to append options to the GCC command it uses during its tests. You can get configure to append options to the GCC command line by setting the ‘CFLAGS’ shell variable prior to running configure. If, for example, your compiler only works when you modify the default include directory, you can get configure to append that option to the default command line with (Bourne Shell) commands like:
CFLAGS=-I/usr/people/staff/oskar/gcc/include
export CFLAGS
Incompatible Options
Some configuration options exclude the use of others. This is another common cause of problems. To test this you should just try and run configure without any options at all, and see if the problem disappears. If so, you can try and work out which option is causing the conflict by adding each option to the configure command line one-by-one. You may find that you have to choose between two options (for example Async-IO and the DL-Malloc routines). In this case you may have to decide which of the options is the most important in your setup.
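The one-by-one search described above is easy to automate. The sketch below adds each option in turn to the options already accepted and stops at the first one that makes configure fail. find_conflicting_option and the CONFIGURE variable are hypothetical names; the script assumes it is run from the extracted source directory and that no option contains spaces.

```shell
#!/bin/sh
# Sketch: add configure options one at a time and report the first one
# that causes configure to fail. Names here are illustrative only.
find_conflicting_option() {
    configure=${CONFIGURE:-./configure}
    accepted=
    for opt in "$@"; do
        # $accepted is unquoted on purpose: it holds several options.
        if $configure $accepted "$opt" >/dev/null 2>&1; then
            accepted="$accepted $opt"
        else
            echo "configure first fails after adding: $opt"
            return 1
        fi
    done
    echo "all options accepted:$accepted"
}
```

You would call it with the same options you originally passed to configure, for example find_conflicting_option --enable-err-language=Bulgarian followed by your other choices. The option it names conflicts with one of those listed before it.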
Compiling the Squid Source
Now that you have configured Squid, you need to make the Squid binaries. You should simply have to run make in the extracted source directory, and a binary will be created as src/squid.
cache:/ # cd /usr/local/squid/src/squid-2.2.RELEASE
cache:/usr/local/squid/src/squid-2.2.RELEASE # make
If the compilation fails, it may be because of conflicting configure options as described in the configure section. Follow the same instructions described there to find the offending option. (You should run make clean between configure runs, to ensure that old binaries are removed.) As a start, try running configure without any options at all and then see if make completes. If this works, try additional configure options one at a time to see which one causes the problem.
Installing the Squid binary
The make command creates the binary, but doesn’t install it.
Running make install creates the /usr/local/squid/bin and /usr/local/squid/etc subdirectories, and copies the binaries and default config files into the appropriate directories. Permissions may not be set correctly, but we will work through all created directories and set them up correctly shortly.
This command also copies the relevant config files into the default directories. The standard config file included with the source is placed in the etc subdirectory, as are the mime.types file and the default Squid MIB file (squid.mib).
If you are upgrading (or reinstalling), make install will overwrite binary files in the bin directory, but will not overwrite your painfully manipulated configuration files. If the destination configuration file exists, make install will instead create a file called filename.default. This allows you to check if useful options have been added by comparing config files.
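Comparing your config file with the freshly installed .default copy can be done with diff. The helper below is a sketch: compare_with_default is an invented name, and the path you pass in (for instance /usr/local/squid/etc/squid.conf) is an assumption based on the default layout described in this text.

```shell
#!/bin/sh
# Sketch: show how a config file differs from the .default copy that
# make install leaves next to it. Lines starting with "+" exist only in
# the new default file.
compare_with_default() {
    conf=$1
    if [ -f "$conf.default" ]; then
        diff -u "$conf" "$conf.default"
    else
        echo "no $conf.default found; nothing to compare" >&2
        return 2
    fi
}
```

Note that diff exits with status 1 when the files differ, which is the normal outcome here and not an error.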
If all has gone well you should have a fully installed (but unconfigured) Squid system setup.
Congratulations!



