Thursday, July 28, 2005

Setting up transparent proxy server

Hi all,

Today, while assigning IP addresses on my internal network, I found that I had run out of IPs and that internet access was very slow. I had run into a "bottleneck": a situation like a road that narrows under heavy traffic. How do you speed this up? The answer is a proxy server.

On with the theory
A proxy server can cache the web pages that clients visit. When a client requests a website, the proxy server fetches it on the client's behalf and caches it. The next time that client, or any other client, requests the same site, the proxy server simply serves the cached copy, which reduces the response time compared with fetching from the actual site. Dynamic web pages are generally not cached.
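
To make the hit/miss idea concrete, here is a toy sketch in shell. It is not how Squid works internally; it only models "fetch and store on the first request, serve the stored copy afterwards". The names CACHE_DIR, fetch_from_origin and proxy_get are made up for this illustration.

```shell
#!/bin/sh
# Toy model of a caching proxy: the first request for a URL is a MISS
# (fetched from the origin and stored), later requests are a HIT.
# CACHE_DIR, fetch_from_origin and proxy_get are hypothetical names.

CACHE_DIR=$(mktemp -d)

# Stand-in for a real download from the origin web server.
fetch_from_origin() {
    printf 'content of %s\n' "$1"
}

# Print HIT or MISS for a URL, storing the page on a miss.
proxy_get() {
    url=$1
    # Derive a file name from the URL; cksum is used only as a simple key.
    key=$(printf '%s' "$url" | cksum | cut -d' ' -f1)
    if [ -f "$CACHE_DIR/$key" ]; then
        echo "HIT"
    else
        fetch_from_origin "$url" > "$CACHE_DIR/$key"
        echo "MISS"
    fi
}
```

Calling proxy_get twice with the same URL prints MISS the first time and HIT the second time, no matter which client asks.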

Transparent proxy
With a normal proxy, you have to configure each client manually to use the proxy. That is not practical if you have many workstations and many applications that connect to the internet. A more practical solution is a "transparent proxy": clients need no configuration because the server intercepts their web traffic, and this is where iptables comes into play.

What do you have to do first?
1. Set up a server
OS: Linux (whatever flavor you want)
Proxy server: Squid (install the latest one)
Utilities: netfilter packages (for iptables)

Squid.conf
The location of squid.conf depends on how you installed the Squid package. If you compiled it from source without changing the ./configure options, it is in /usr/local/squid/etc. If you used your package manager, it is under /etc. Wherever it is, you have to edit it before you can use Squid as a transparent proxy.
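
If you are not sure which case applies, a small sketch like this can check a few candidate paths. The list of paths is an assumption (the source-install location mentioned above plus two common package locations), and find_squid_conf is a made-up helper name.

```shell
#!/bin/sh
# Print the first existing file among the paths given as arguments.
# find_squid_conf is a hypothetical helper, not part of Squid.
find_squid_conf() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; add your distro's path if it differs.
find_squid_conf /usr/local/squid/etc/squid.conf \
                /etc/squid/squid.conf \
                /etc/squid.conf || echo "squid.conf not found in the usual places"
```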

What to edit
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
acl lan src 192.168.1.1 192.168.2.0/24
http_access allow localhost
http_access allow lan

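Change the addresses in the "lan" ACL to suit your network environment. Note that the httpd_accel_* directives above are Squid 2.5 syntax; Squid 2.6 and later replaced them with an option on the http_port line (transparent, renamed intercept in Squid 3.1). The squid.conf file is heavily documented, so read the comments before you change anything unless you know what you're doing.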

I won't explain in detail how to set up Linux on your server; please consult the HOWTOs and FAQs for your specific Linux distribution. After you have finished setting up Linux, you should set up Squid. For more information on Squid, visit http://www.squid-cache.org. Squid is usually already packaged for your distro, so first check whether you can simply install it from the CD. If not, you have to download it from the link above.

After you have edited squid.conf, this is the iptables command you should run on the proxy server.

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128
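This assumes that your proxy server listens on port 3128; if not, change the port accordingly. It also assumes that eth0 is the interface facing the internal network and that the clients' web traffic is routed through this server (for example, it is their default gateway); otherwise the rule never sees their packets.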
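
As a sketch, the same rule can be built from variables so the interface and port are easy to change. LAN_IF, SQUID_PORT and redirect_rule are names made up for this example; the script only prints the command so you can review it before running it as root.

```shell
#!/bin/sh
# Build the redirect rule from variables instead of hard-coding it.
# LAN_IF and SQUID_PORT are assumptions; adjust them to your setup.
LAN_IF=eth0        # interface facing the internal network
SQUID_PORT=3128    # must match the port Squid listens on

# Print the iptables command for a given interface and proxy port.
redirect_rule() {
    echo "iptables -t nat -A PREROUTING -i $1 -p tcp --dport 80 -j REDIRECT --to-port $2"
}

# Print the rule for review; run the printed command as root to apply it.
redirect_rule "$LAN_IF" "$SQUID_PORT"

# After applying it, the rule should appear in the output of:
#   iptables -t nat -L PREROUTING -n -v
```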

5 comments:

gendit said...

Hi,
I have successfully installed Squid 3.1.14 on Ubuntu Server 11.10 and redirect traffic to the proxy using a pfSense firewall, and it works. For squid.conf I used a default configuration that is posted on most websites. But there is a problem: why does my Squid always report TCP_MISS in access.log? Is there any configuration that is missing or that I need to add? Please assist :)

zamri said...

TCP_MISS means Squid looked for the object in the cache but didn't find it, so it downloaded it from the internet. It is closely related to the browsing habits of your users and to the fact that more and more websites serve dynamic content, which Squid does not cache. So it looks pretty normal to me. Over time, you should see TCP_HIT increase, and so will TCP_MISS.
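
One way to watch that trend is to count the result codes in access.log. The sketch below assumes Squid's native log format, where the fourth field looks like TCP_MISS/200; hit_summary is a made-up function name, and the log path in the usage note is an assumption that varies by distro.

```shell
#!/bin/sh
# Count cache hits and misses in a Squid access.log (native log format),
# where the fourth field looks like TCP_MISS/200 or TCP_HIT/200.
# hit_summary is a hypothetical helper name.
hit_summary() {
    awk '
        { split($4, code, "/") }            # code[1] is e.g. TCP_MISS
        code[1] ~ /^TCP_.*HIT/  { hits++ }   # TCP_HIT, TCP_MEM_HIT, ...
        code[1] ~ /^TCP_.*MISS/ { misses++ } # TCP_MISS, TCP_REFRESH_MISS, ...
        END { printf "hits=%d misses=%d\n", hits, misses }
    ' "$1"
}
```

For example, hit_summary /var/log/squid3/access.log prints a single line with both counters. Run it again after a few days of normal browsing to see whether the share of hits grows.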

gendit said...

Hi,
thanks for the reply...

I'm very new to Squid and this is my first time installing it. I have several questions in mind:
1. Is there any configuration I can make so that TCP_HIT improves? I already googled it, and there are 2-3 websites that suggest modifying refresh_pattern, but they seem to declare each website separately in order to improve caching, for example:
refresh_pattern .gif 4320 50% 43200
refresh_pattern .jpg 4320 50% 43200
refresh_pattern .tif 4320 50% 43200
refresh_pattern ^http://www.friendster.com/.* 720 100% 4320
refresh_pattern ^http://mail.yahoo.com/.* 720 100% 4320
refresh_pattern ^http://mail1.plasa.com/.* 720 100% 4320
refresh_pattern ^http://*.yahoo.*/.* 720 100% 4320
refresh_pattern ^http://*.friendster.*/.* 720 100% 4320
refresh_pattern ^http://www.yahoo.com/.* 720 100% 4320
refresh_pattern ^ftp: 10080 95% 241920 reload-into-ims override-lastmod
refresh_pattern . 180 95% 120960 reload-into-ims override-lastmod
Or is there another way?

2. How can I view the cache that is already stored on my HDD? I use Ubuntu Server 11.10 with no GUI, so any command would be helpful :)

zamri said...

1. You can test it. It may increase TCP_HIT, but there is no guarantee. And putting many websites into refresh_pattern is a pain in the ass. :)

2. I don't know how to view the cache, but you can see the files in your cache_dir as specified in squid.conf.
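
A rough sketch for finding that directory from the command line: cache_dir lines in squid.conf have the form "cache_dir <type> <directory> <options>", so the directory is the third field. The helper name cache_dirs and the config path in the usage note are assumptions.

```shell
#!/bin/sh
# Print the directory of every active cache_dir line in a squid.conf file.
# Directive layout: cache_dir <type> <directory> <options...>
# Commented-out lines are skipped because their first field starts with "#".
# cache_dirs is a hypothetical helper name.
cache_dirs() {
    awk '$1 == "cache_dir" { print $3 }' "$1"
}
```

For example, du -sh "$(cache_dirs /etc/squid3/squid.conf)" shows how much disk space the cache uses when exactly one cache_dir is configured. If nothing is printed, the directive is commented out and Squid is using its built-in default. If the cache manager is reachable, squidclient mgr:objects should list the cached objects themselves.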

gendit said...

OK, thanks for the reply... Happy Hari Raya :)